proposal for regression testing #490
🔍 Catalogue's Preview Site Deployed
Your changes have been deployed to the preview site:
🔗 Preview URL: https://esa-apex.github.io/apex-algorithms-catalogue-web/pr-preview/pr-490/
This preview will be updated automatically when you push new changes to your PR.
@JanssenBrm @VictorVerhaert ready to check. I have opted for a more adaptive benchmark that looks at the average and the standard deviation. The benchmark becomes more deterministic as the number of successful runs increases.
@JanssenBrm @JeroenVerstraelen @VictorVerhaert all feedback is welcome
VictorVerhaert
left a comment
Two small optional comments, both aimed at preventing false failures. One more question: could you try running it with GitHub Actions to see how it behaves in practice?
Otherwise the PR looks clean.
scaled_mad = 1.4826 * _median([abs(v - median) for v in values])

k = _adaptive_k(min(n, 10))
threshold = median + k * scaled_mad
Consider adding an absolute buffer for the cost metric here. I wouldn't let a benchmark fail if it suddenly costs 9 instead of 8.
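A minimal sketch of what the suggestion above could look like, assuming the threshold is computed from a plain list of historical values. The function name `threshold_with_buffer`, the fixed `k`, and the `abs_buffer` value are hypothetical and not taken from the PR; the PR derives `k` from `_adaptive_k`, whose behaviour is not shown here, and `statistics.median` stands in for the PR's own `_median` helper.

```python
from statistics import median as _median  # stand-in for the PR's _median helper


def threshold_with_buffer(values, k=3.0, abs_buffer=2.0):
    """Upper limit for a metric, never tighter than median + abs_buffer.

    k and abs_buffer are illustrative defaults, not values from the PR.
    """
    if not values:
        raise ValueError("need at least one historical value")
    med = _median(values)
    # Scaled median absolute deviation, as in the snippet under review.
    scaled_mad = 1.4826 * _median([abs(v - med) for v in values])
    # Take the looser of the robust limit and the absolute floor.
    return max(med + k * scaled_mad, med + abs_buffer)


# A perfectly stable history has a MAD of 0, so without the buffer
# any increase at all (8 -> 9) would exceed the threshold.
history = [8, 8, 8, 8, 8]
limit = threshold_with_buffer(history)
print(9 <= limit)   # True: a cost of 9 is tolerated
print(11 <= limit)  # False: a cost of 11 still fails
```

Taking the maximum of the two limits means the buffer only matters when the history is very stable; for noisy histories the MAD-based limit remains in charge.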
)


def load_scenario_history(
Consider adding a date cutoff field here, set to the updated field in the record. That way, updating a benchmark resets its performance test history.
This is not required and could overcomplicate things, but without it we might get a lot of false failures.
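As a rough illustration of the date cutoff idea, the sketch below drops every historical run that predates the benchmark's last update. The record shape (a `timestamp` ISO 8601 string and a `costs` value), the function name `runs_since_update`, and the sample data are all assumptions made for this example; the actual signature of `load_scenario_history` is not visible in the snippet.

```python
from datetime import datetime


def runs_since_update(runs, updated):
    """Keep only the runs recorded at or after the `updated` timestamp.

    `runs` is assumed to be a list of dicts with an ISO 8601 "timestamp";
    this layout is hypothetical, not the PR's actual history format.
    """
    cutoff = datetime.fromisoformat(updated)
    return [r for r in runs if datetime.fromisoformat(r["timestamp"]) >= cutoff]


# Illustrative data: the benchmark was updated on 2025-03-01, after which
# its cost changed. Older runs should no longer influence the threshold.
runs = [
    {"timestamp": "2025-01-10T08:00:00+00:00", "costs": 8},
    {"timestamp": "2025-03-02T08:00:00+00:00", "costs": 12},
    {"timestamp": "2025-03-09T08:00:00+00:00", "costs": 12},
]
recent = runs_since_update(runs, updated="2025-03-01T00:00:00+00:00")
print([r["costs"] for r in recent])  # [12, 12]
```

One consequence of this design is that the history is short right after an update, so the threshold logic needs to cope with only a handful of runs.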
Idea for starting to include regression benchmarks.
@JanssenBrm I would also need input on how best to expose this so that we can keep a log in the service catalogue.