This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Add regression test for indirect branch critical edge splitting
ClosedPublic

Authored by mkuper on Feb 23 2017, 3:18 PM.

Details

Summary

This is a regression test benchmark for D29916.

Eli, does this seem reasonable?
For reference, without D29916 this takes ~12 seconds on my machine, and with D29916, about 3.5 seconds.
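
The diff contents are not reproduced in this archive. As a hypothetical sketch (not the committed test), the kind of kernel such a benchmark stresses is a computed-goto dispatch loop: every goto through a label pointer is an indirect branch with multiple successors, producing exactly the critical edges that D29916 teaches the compiler to split. The 1000000 iteration count below mirrors the constant discussed later in this review.

    /* Hypothetical sketch of an indirect-branch-heavy kernel; the actual
       committed benchmark may differ. Uses the GNU computed-goto
       extension, supported by both Clang and GCC. */
    #include <stdio.h>

    int main(void) {
      void *dispatch[] = { &&op_add, &&op_sub, &&op_done };
      const unsigned char program[] = { 0, 1, 0, 0, 1, 2 }; /* tiny "bytecode" */
      long acc = 0;
      for (long iter = 0; iter < 1000000; ++iter) {
        const unsigned char *pc = program;
        /* Each indirect branch below has every opcode label as a
           successor; without critical edge splitting, later passes
           cannot sink or duplicate code along those edges. */
        goto *dispatch[*pc++];
      op_add:
        acc += 1;
        goto *dispatch[*pc++];
      op_sub:
        acc -= 1;
        goto *dispatch[*pc++];
      op_done:
        ;
      }
      printf("%ld\n", acc);
      return 0;
    }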

Diff Detail

Event Timeline

mkuper created this revision.Feb 23 2017, 3:18 PM

I'd like to add: thanks for doing this! We should definitely encourage adding performance tests like this.

You should be thanking Eli, not me; he's pretty much forcing me to do this. :-)

Okay. Thanks, Eli! :-)

efriedma edited edge metadata.Feb 23 2017, 3:38 PM

:)

The loop looks fine. Someone else should check that the build system changes, etc., are correct.

hfinkel accepted this revision.Feb 23 2017, 3:44 PM

LGTM

This revision is now accepted and ready to land.Feb 23 2017, 3:44 PM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Mar 2 2017, 10:28 AM

Sorry to be this guy: this benchmark runs for too long! We should aim for 0.5-1s runtimes for our benchmarks, and the 1000000 looks arbitrary to me. (This takes nearly 3x the time of salsa20, the next slowest benchmark in SingleSource/Benchmarks/Misc, for me.)

Just lowering the iteration count is the way to go. Aiming for a specific wall time is counterproductive, at least today, as we also have modes where we look at profile data and performance counters and want to compare them between runs.

(Long term we should have something like Google Benchmark for our microbenchmarking here, which would run the function just often enough to get stable timing results. Maybe by tweaking it to run a fixed number of times for the cases with an external profiling tool.)
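
As an illustration of that adaptive-iteration idea, here is a minimal sketch (editor's example, assuming a Google-Benchmark-style strategy; not test-suite code): double the iteration count until a single sample runs long enough to time reliably, then report time per iteration. The fixed-count mode mentioned above for external profiling tools would simply skip the doubling.

    /* Sketch of adaptive benchmark iteration, in the style of Google
       Benchmark (illustrative only; not part of the test-suite). */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static double now_sec(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    /* Stand-in for the code under test. */
    static void kernel(int64_t iters) {
      volatile int64_t sink = 0;
      for (int64_t i = 0; i < iters; ++i)
        sink += i;
    }

    int main(void) {
      const double min_time = 0.5; /* target wall time per sample */
      int64_t iters = 1;
      for (;;) {
        double start = now_sec();
        kernel(iters);
        double elapsed = now_sec() - start;
        if (elapsed >= min_time) {
          printf("%lld iters in %.3fs -> %.1f ns/iter\n",
                 (long long)iters, elapsed, 1e9 * elapsed / iters);
          break;
        }
        iters *= 2; /* too fast to time reliably; run longer */
      }
      return 0;
    }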

For the record: this was in response to Michael's comment on llvm-commits, which Phabricator ignored...

FWIW, I did an experiment a while back on a few AArch64 and X86 machines to see what the minimum running time for the programs in the test-suite should be so that they aren't noisy simply because they run too briefly.
My experiments show that across the machines I tested, as soon as a program runs for longer than 0.01 seconds, there is no noise attributable to the shortness of its run-time. This is when using "lnt runtest nt --use-perf=1" on Linux.
So, in my experience, a 0.1s runtime still leaves an order-of-magnitude safety margin, which makes it a good execution time to aim for.

My back-of-the-envelope calculation from a bit more than a year ago is that if we could make all programs in the test-suite run for about 0.1s, the test-suite would execute about 200 times faster than it does today, and probably produce results of the same quality. See slide 26 in http://llvm.org/devmtg/2015-10/slides/Beyls-AutomatedPerformanceTrackingOfLlvmGeneratedCode.pdf. In other words, a single run on a Cortex-A53 would take about 30s instead of almost 2 hours, which would make full multi-run test-suite runs for every commit feasible.