This issue raises in https://reviews.llvm.org/D26855: when we tried to use Machine Combiner instead of DAG Combiner for fast-math operations (in our case it was fdiv as a fisrt step) we got significant increasing of compile time - from about 2s up to 18s. I used a very specific and worst possible test for div operations (see attachment) but in any case it was very unpleasant result. After compiler profiling I discovered that the reason of the slowdown is huge number of creations of MC traces (even if don't use them). This patch moves the trace creation in the place when it's really needed.
Diff Detail
Event Timeline
LGTM.
Please put a perf measurement in the commit message (or add a test to test-suite so we don't regress?) - something like:
Before this patch (testing on N GHz Haswell):
$ time llc -o /dev/null -enable-unsafe-fp-math foo.ll
X secs
After this patch:
$ time llc -o /dev/null -enable-unsafe-fp-math foo.ll
Y secs
foo.ll is attached to D29627
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
449–450 | How about making the comment more direct, so we don't lose track of the reason. Something like: |
Just for your info: I collected the perf numbers.
DAGCombiner - trunk
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.685s
DAGCombiner + Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.655s
MachineCombiner w/o Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m21.614s
MachineCombiner + Speed patch
time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
real 0m1.593s
How about making the comment more direct, so we don't lose track of the reason. Something like:
"Calculating the trace metrics may be expensive, so only do this when necessary."