The Reassociate pass currently does two things. First, it factors out common sub-expressions and otherwise *optimizes* the expression tree. Second, it blindly linearizes the result, regardless of whether any *optimizations* were performed.
This second part is problematic, as we don't have a robust scheduler anywhere else in the pipeline, and a linear execution order is distinctly non-optimal on modern CPUs. Consider the following (toy) example:
for (int i = 0; i < N; i++) {
int tmp = a[i]; tmp += b; acquire_fence(); sum += tmp;
}
In this case, we end up with a nicely unrolled loop, but due to the linearization of the add expressions, we emit a long chain of additions with a single target register. This form bottlenecks in the scheduler on modern X86 chips for a 25% performance slow down over the original form.
Note that to do this, I had remove a stale piece of code which tried to aggressively re-try expression formation after removing uses. Given that code has been disabled since 2012, I'm not too worried about that.
An alternate approach to this problem would be to invest in building a scheduler for associative expressions which can properly balance ILP and register pressure. I'm hoping not to have to solve that problem. :)