This check was present before https://reviews.llvm.org/D126061. I eliminated it
as part of that patch because it seemed superfluous; the check has no influence
on behavior. However, as @yurai007 points out in
https://reviews.llvm.org/D128097, this change may have had a negative impact on
compile time, so I'm adding the check back in.
Details
- Reviewers
- None
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Marking as {WIP] for the time being because I'm uploading this change for discussion.
Not seeing a statistically significant impact from this change: http://llvm-compile-time-tracker.com/compare.php?from=8b6f69a4da5baaf3748798a84dd16a2481b7ca7f&to=797ba50f5fd88017925fe765427b1f5f136c3310&stat=instructions Maybe it's an improvement at the 0.05% level, but it may just as well be noise. As such, I probably wouldn't bother with the change.
Thanks for performing the measurement for me!
Just a question for my understanding before I abandon the change: Shouldn't the "instructions" count be relatively resistant to noise? (I was assuming this is based on performance counters? What would cause noise in this metric?)
Edit: As an empirical experiment, I ran perf stat -e instructions,cpu-cycles multiple times on the same Clang invocation. (I assume perf is what the tracker uses to obtain the numbers it reports?) I did indeed observe variations in the instruction count from run to run, with a spread of about 0.15% between the smallest and largest value. The cycle count was really noisy (as you already pointed out to me on https://reviews.llvm.org/D128097), with a spread of over 7%.
I found these docs that state that the retired instruction counter can be incremented, among other things, on hardware interrupts and page faults. I assume this is one source for the variation?
I had previously assumed that instruction counts would be identical from run to run because that is what I observed with Callgrind (which is what I've been using to do most of my profiling). I assume that Callgrind uses some other mechanism, not based on hardware performance counters, to count instructions?
Anyway, given that the measured difference for this patch is below the noise level, I will abandon this change.
There are two primary sources of noise. The first is kernel activity, as the instructions metric is a combined userspace and kernel metric. I do collect userspace-only as well (this is instructions:u). On that metric this does look like a small improvement just above the noise floor: http://llvm-compile-time-tracker.com/compare.php?from=8b6f69a4da5baaf3748798a84dd16a2481b7ca7f&to=797ba50f5fd88017925fe765427b1f5f136c3310&stat=instructions:u
The other is ASLR. If ASLR is disabled, the instructions:u measurements become perfectly noise-free. This is actually bad, because a lot of changes will then show statistically significant and perfectly reproducible compile-time changes that are ultimately just down to how exactly the address space gets laid out. So this is intentional noise :)
And yes, execution statistics are collected using perf. callgrind uses a CPU simulator instead, which is a much more controlled environment...
Ah -- that makes sense.
On that metric this does look like a small improvement just above the noise floor: http://llvm-compile-time-tracker.com/compare.php?from=8b6f69a4da5baaf3748798a84dd16a2481b7ca7f&to=797ba50f5fd88017925fe765427b1f5f136c3310&stat=instructions:u
So based on that, would you recommend that I un-abandon this revision after all and submit it for review?
The other is ASLR. If ASLR is disabled, the instructions:u measurements become perfectly noise-free. This is actually bad, because a lot of changes will then show statistically significant and perfectly reproducible compile-time changes that are ultimately just down to how exactly the address space gets laid out. So this is intentional noise :)
And yes, execution statistics are collected using perf. callgrind uses a CPU simulator instead, which is a much more controlled environment...
Thanks for the clarification. I learned a lot about performance measurement from this discussion!
Indeed, measuring performance is tricky and chasing slow downs can be challenging especially those at the level of ~0.2%-0.1% or less.
Personally, every time when I need to use perf on Clang I follow those short guidelines to reduce noise and get relevant numbers: https://llvm.org/docs/Benchmarking.html#linux
It contains some extra information about controlling mentioned ASLR (via *randomize_va_space), SMT, cpu isolation and more.