This patch allows PRE of the following type of loads:
preheader: br label %loop loop: br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p br label %merge merge: ... br i1 ..., label %loop, label %exit
Into
preheader: %x0 = load %p br label %loop loop: %x.pre = phi(x0, x2) br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p %x1 = load %p br label %merge merge: x2 = phi(x.pre, x1) ... br i1 ..., label %loop, label %exit
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know if their sum is colder than the header.
Extend this comment to emphasize that this means we have proven the load must execute if the loop is entered, and is thus safe to hoist to the end of the preheader without introducing a new fault. Similarly, the in-loop clobber must be dominated by the original load and is thus fault safe.
Er, hm, there's an issue here I just realized. This isn't sound. The counter example here is when the clobber is a call to free and the loop actually runs one iteration.
You need to prove that LI is safe to execute in both locations. You have multiple options in terms of reasoning, I'll let you decide which you want to explore: speculation safety, must execute, or unfreeable allocations. The last (using allocas as an example for test), might be the easiest.