This patch allows PRE of the following type of loads:
  preheader:
    br label %loop

  loop:
    br i1 ..., label %merge, label %clobber

  clobber:
    call foo() // Clobbers %p
    br label %merge

  merge:
    ...
    br i1 ..., label %loop, label %exit

Into

  preheader:
    %x0 = load %p
    br label %loop

  loop:
    %x.pre = phi(x0, x2)
    br i1 ..., label %merge, label %clobber

  clobber:
    call foo() // Clobbers %p
    %x1 = load %p
    br label %merge

  merge:
    x2 = phi(x.pre, x1)
    ...
    br i1 ..., label %loop, label %exit

So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern this addresses is a hot loop in which all code is inlined and
provably free of side effects, with some side-effecting calls on a cold path.
In the worst case, when the clobber block is taken on every iteration, we execute one
extra load overall (the one in the preheader). This only matters if the loop has very
few iterations. If the clobber block is skipped at least once, the transform is neutral
or profitable.
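
The cost argument above can be checked with a small counting model. This is only an illustrative sketch, not code from the patch: the function names and the `clobbered` vector (one entry per iteration, true when that iteration goes through the clobber block) are assumptions made for the example, and the loop is assumed to run at least once, as in the IR above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the loop before the transform: %p is loaded once
// per iteration, no matter which path the iteration takes.
std::size_t loadsBeforePRE(const std::vector<bool> &clobbered) {
  return clobbered.size();
}

// Hypothetical model of the loop after the transform: one load in the
// preheader (%x0), plus one reload (%x1) each time the clobber block runs.
std::size_t loadsAfterPRE(const std::vector<bool> &clobbered) {
  std::size_t loads = 1; // %x0 = load %p in the preheader
  for (bool tookClobber : clobbered)
    if (tookClobber)
      ++loads; // %x1 = load %p in the clobber block
  return loads;
}
```

With three iterations that all take the clobber block the model gives 3 loads before and 4 after (the worst case); if one of those iterations skips the clobber block, both versions execute 3 loads; and if no iteration takes it, the transformed loop executes a single load.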
This opens up several prospects for further improvement:
- We can sometimes be smarter in loop-exiting blocks by splitting critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know whether their combined frequency is colder than the header.
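
The multiple-clobber condition from the last item could look roughly like the check below. It is a hypothetical sketch only: `clobbersColderThanHeader` is not an existing LLVM function, and the frequencies are plain integers standing in for whatever block frequency info would provide.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical profitability check (not part of the patch): reloading in
// every clobber block is worthwhile only if the clobber blocks, taken
// together, execute less often than the loop header.
// Assumes the sum of frequencies does not overflow uint64_t.
bool clobbersColderThanHeader(const std::vector<std::uint64_t> &clobberFreqs,
                              std::uint64_t headerFreq) {
  std::uint64_t sum = std::accumulate(clobberFreqs.begin(), clobberFreqs.end(),
                                      std::uint64_t{0});
  return sum < headerFreq;
}
```

For example, two clobber blocks with frequencies 10 and 20 under a header with frequency 100 would pass the check, while frequencies 60 and 50 would not.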