This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Increase the --filter-short threshold
ClosedPublic

Authored by SjoerdMeijer on Feb 15 2023, 6:14 AM.

Details

Summary

This proposes to increase the --filter-short threshold from 0.6 to 1.0 seconds. The reason is that the LLVM test-suite is quite noisy and this is one mitigation for that. I.e., if we filter out all apps with a runtime of less than a second, then the variation between runs is a lot less and we get less noisy results. I appreciate this replaces one arbitrary number with another arbitray number, but a runtime of at least 1 second for a benchmark seemed reasonable to me.

As a result, less results/benchmarks will be reported. On the system that I tested it, this means that for the MultiSource benchmarks the following apps will be excluded in addition to the ones that already got excluded with --filter-short:

Program                                         exec_time
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD            0.92
Benchmarks...VC/Expansion-flt/Expansion-flt     0.84
Applications/viterbi/viterbi                    0.78
Benchmarks...ataFlow-flt/GlobalDataFlow-flt     0.76
Benchmarks/Trimaran/enc-rc4/enc-rc4             0.74
Benchmarks/BitBench/five11/five11               0.97
Benchmarks/Prolangs-C++/life/life               0.65
Benchmarks...VC/Symbolics-flt/Symbolics-flt     0.73
Benchmarks...Fhourstones-3.1/fhourstones3.1     0.69
Benchmarks/VersaBench/dbms/dbms                 0.85
Benchmarks.../DOE-ProxyApps-C++/CLAMR/CLAMR     0.81
Benchmarks...roxyApps-C/SimpleMOC/SimpleMOC     0.88
Benchmarks...alencing-dbl/Equivalencing-dbl     0.63
Benchmarks/nbench/nbench                        0.89
Benchmarks/ASCI_Purple/SMG2000/smg2000          0.94
Benchmarks...aran/netbench-crc/netbench-crc     0.66

Diff Detail

Repository
rT test-suite

Event Timeline

SjoerdMeijer created this revision.Feb 15 2023, 6:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 6:14 AM
Herald added a subscriber: StephenFan. · View Herald Transcript
SjoerdMeijer requested review of this revision.Feb 15 2023, 6:14 AM
fhahn added a comment.Mar 7 2023, 1:46 AM

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

I like this idea. Where would the argument come from in the buildbots? CMake?

fhahn added a comment.Mar 7 2023, 1:58 AM

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

I like this idea. Where would the argument come from in the buildbots? CMake?

I think the script is mostly (only?) used for local analysis of the results.json files generated by LNT/running the benchmarks using llvm-lit, so I would expect this to be provided by the person running the script in most cases. I might be missing some potential use cases though.

Thanks for the comments, I also like this idea!
I will prepare a new revision to support this.

Option --filter-short now accepts an optional arguments, and it defaults to 1.0s.
Some special care had to be taken if this optional argument is omitted, it then needs to recognise that the FILE arguments is not the optional argument to --filter-short, as also explained in the comments.

rengolin added inline comments.Mar 9 2023, 3:32 AM
utils/compare.py
342

what happens to filter_short_threshold on exception? Is it a stable behaviour?

SjoerdMeijer added inline comments.Mar 9 2023, 3:47 AM
utils/compare.py
342

It defaults to filter_short_threshold = 1.0 on line 329, so that should always be defined (but not always used).
I think that makes it stable behaviour, but I am not a Python programmer, so might be missing something. Let me know what you think, I could do some more python research on try/except behaviour.

This looks good to me. @fhahn ?

utils/compare.py
342

Yeah, just worried the assignment won't occur before float() succeeds. If you pass "foobar" as the option and it's still 1.0, than that's good enough. :)

Thanks for your help and reviews! I will commit this tomorrow if there will be no further comments.

This revision was not accepted when it landed; it landed in state Needs Review.Mar 20 2023, 6:11 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.