A = ? op ? B = A op X C = B op Y --> A = ? op ? B = X op Y C = A op B
'op' is currently limited to x86 AVX scalar FP adds (with fast-math on), but in theory, it could be any associative math/logic op (see TODO in code comment).
This patch generalizes the pattern match to ignore the instruction that defines 'A'. So instead of a sequence of 3 adds, we now only need to find 2 dependent adds and decide if it's worth reassociating them.
This generalization has a compile-time cost because we can now match more instruction sequences and we rely more heavily on the machine combiner to discard sequences where reassociation doesn't improve the critical path.
For example, in the new test case:
A = M div N B = A add X C = B add Y
We'll match 2 reassociation patterns, but this transform doesn't reduce the critical path:
A = M div N B = A add Y C = B add X
We need the combiner to reject that pattern but select this:
A = M div N B = X add Y C = B add A
On Mehdi's (hopefully degenerate for x86) test case from the r236031 post-commit thread, the compile-time increases from ~0.2 sec to 5.0 sec because the combiner completes 3963 reassociations. Using test-suite's benchmarking subset, however, the only test where this completes more than 4 times is linpack; there it reassociates 14 times (used to be 11). But I don't see any compile-time difference from doing that extra optimization work.