This is an archive of the discontinued LLVM Phabricator instance.

[reassociate] Fix excessive revisits when processing long chains of reassociatable instructions.
ClosedPublic

Authored by dsanders on Apr 17 2018, 1:20 PM.

Details

Summary

Some of our internal testing detected a major compile time regression which I've
tracked down to:

r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst".

It appears that processing long chains of reassociatable instructions causes
non-linear (potentially exponential) growth in the number of times an
instruction is revisited. For example, the included test revisits instructions
220 times in a 20-instruction test.

It appears that r278938 reversed the order instructions were visited and that
this is preventing scheduled revisits from being cancelled as a result of
visiting the instructions naturally during normal processing. However, simply
reversing the order also harmed the generated code. Upon closer inspection, it
was discovered that revisits occurred in the opposite order to the first pass
(Thanks to escha for spotting that).

This patch makes the revisit order consistent with the first pass which allows
more revisits to be cancelled. This does appear to have a small impact on the
generated code in few cases but it significantly reduces compile-time.

After this patch, our internal test that was most affected by the regression
dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46%
of the runtime it had before (99.54% improvement).

Here's the summaries reported by lnt for the LLVM test-suite with --benchmarking-only:

metricgeomean before patchgeomean after patchdelta
compile time0.19560.1261-35.54%
execution time0.32400.3237-
code size7365.44597365.6079-

The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:

Performance Regressions - compile_timeΔPreviousCurrent
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk9.82%2.04732.2483

Diff Detail

Event Timeline

dsanders created this revision.Apr 17 2018, 1:20 PM

I have some performance figures from the LLVM test-suite now using the --benchmarking-only option. Here's the summaries reported by lnt:

metricgeomean before patchgeomean after patchdelta
compile time0.19560.1261-35.54%
execution time0.32400.3237-
code size7365.44597365.6079-

The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:

Performance Regressions - compile_timeΔPreviousCurrent
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk9.82%2.04732.2483
dberlin accepted this revision.May 2 2018, 9:30 AM
This revision is now accepted and ready to land.May 2 2018, 9:30 AM

This looks right.
There is an optimal order to visit and revisit, and if you switch them, you will definitely take an extra N iterations per instruction :)

dberlin added inline comments.May 2 2018, 9:42 AM
test/Transforms/Reassociate/long-chains.ll
31

Is it possible to check the number of visits is within a range?

(IE 100-200, and not 1 million)?

I don't remember if lit can do this :(

Thanks

test/Transforms/Reassociate/long-chains.ll
31

I don't think it can do ranges but orders of magnitude are possible by matching a fixed number of digits with something like:

; CHECK: {{[1-9][0-9]}} reassociate - Number of insts reassociated

Would you like me to change it to that?

dberlin added inline comments.May 2 2018, 9:50 AM
test/Transforms/Reassociate/long-chains.ll
31

Yeah, why don't we do that just so it doesn't get messed up by random changes.
We really care about the order of magnitude, not the exact number.

(I think :-P)

dsanders added inline comments.May 2 2018, 10:28 AM
test/Transforms/Reassociate/long-chains.ll
31

Ok. For the Number of insts reassociated checks I'll go with [1-9][0-9] (effectively >=10 and <= 99) and for the Number of multiplies factored I'll go with [3-9] so the number of changes made can't go down without us noticing.

dsanders edited the summary of this revision. (Show Details)May 2 2018, 11:02 AM
This revision was automatically updated to reflect the committed changes.