Diff Detail
- Repository: rL LLVM
Event Timeline
I'd like to add: thanks for doing this! We should definitely encourage adding performance tests like this.
:)
The loop looks fine. Someone else should check that the build system and related changes are correct.
Sorry to be this guy: this benchmark runs for too long! We should aim for 0.5-1s runtimes for our benchmarks, and the 1000000 iteration count looks arbitrary to me. (For me this takes nearly 3x the time of salsa20, the next slowest benchmark in SingleSource/Benchmarks/Misc.)
Just lowering the iteration count is the way to go. Aiming for a specific wall time is counterproductive, at least today, as we also have modes where we look at profile data and performance counters and want to compare them between runs.
(Long term we should have something like Google Benchmark for our microbenchmarking here, which would run the function just often enough to get stable timing results. Maybe by tweaking it to run a fixed number of times for the cases with an external profiling tool.)
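For illustration, the adaptive-timing idea looks roughly like the sketch below. This is just a minimal sketch, not the test-suite's harness or Google Benchmark's actual API; `kernel_under_test`, `kMinSeconds` and `FIXED_ITERS` are hypothetical names. The `FIXED_ITERS` path stands in for the "fixed number of times" mode so external profiling runs do identical work and stay comparable.

```cpp
// Sketch: double the iteration count until the measured runtime is long
// enough to be stable, or honor a fixed count for profiling runs.
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical benchmark body; a real benchmark would put its kernel here.
static volatile unsigned sink;
static void kernel_under_test() {
  unsigned x = 0;
  for (int i = 0; i < 1000; ++i)
    x += i * i;
  sink = x;
}

// Run the kernel `iters` times and return the elapsed wall time in seconds.
static double run_iters(uint64_t iters) {
  auto start = std::chrono::steady_clock::now();
  for (uint64_t i = 0; i < iters; ++i)
    kernel_under_test();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

int main() {
#ifdef FIXED_ITERS
  // Profiling / perf-counter mode: fixed work per run, comparable between runs.
  double secs = run_iters(FIXED_ITERS);
  std::printf("%llu iterations in %f s\n",
              (unsigned long long)FIXED_ITERS, secs);
#else
  const double kMinSeconds = 0.5;  // target runtime for stable timing
  uint64_t iters = 1;
  double secs = run_iters(iters);
  while (secs < kMinSeconds) {
    iters *= 2;
    secs = run_iters(iters);
  }
  std::printf("%llu iterations in %f s (%g s/iter)\n",
              (unsigned long long)iters, secs, secs / iters);
#endif
  return 0;
}
```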
For the record: this was in response to Michael's comment on llvm-commits, which Phabricator ignored...
FWIW, I did an experiment a while back on a few AArch64 and X86 machines to see what the minimum running time should be for the programs in the test-suite so that they wouldn't be noisy just because they run for too short a time.
My experiments show that, across the machines I tested, as soon as a program runs for longer than 0.01 seconds there is no noise caused by the shortness of its run-time. This is using "lnt runtest nt --use-perf=1" on Linux.
So, in my experience, a 0.1s runtime still leaves an order-of-magnitude safety margin, so that may be a good execution time to aim for.
My back-of-the-envelope calculation from a bit more than a year ago is that if we could make all programs in the test-suite run for about 0.1s, the test-suite would execute about 200 times faster than today, and probably produce results of the same quality. See slide 26 in http://llvm.org/devmtg/2015-10/slides/Beyls-AutomatedPerformanceTrackingOfLlvmGeneratedCode.pdf. In other words, a single run on a Cortex-A53 would take about 30s instead of almost 2 hours, and full multi-run test-suite runs for every commit would become feasible.
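For reference, the arithmetic behind that estimate: almost 2 hours is roughly 7200 s, and a ~200x speedup gives 7200 s / 200 ≈ 36 s, i.e. on the order of the ~30 s quoted above.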