Super short version: this is a loop pass that does trivial CFG simplification on a loop, which Chandler requested as the solution to the real problem described below. It isn't wired into the pass manager yet. Right now it only merges consecutive blocks; it doesn't do anything fancier, but could in the future.
Details:
This IR has a perfectly reasonable nested loop that rotate -> unroll does not actually unroll all the way:
define i32 @foo(i32* %P, i64 *%Q) {
entry:
  br label %outer

outer:
  %y.2 = phi i32 [ 0, %entry ], [ %y.inc2, %outer.latch2 ]
  br label %inner

inner:
  %x.2 = phi i32 [ 0, %outer ], [ %inc2, %inner ]
  %inc2 = add nsw i32 %x.2, 1
  %exitcond2 = icmp eq i32 %inc2, 3
  store i32 %x.2, i32* %P
  br i1 %exitcond2, label %outer.latch, label %inner

outer.latch:
  %y.inc2 = add nsw i32 %y.2, 1
  %exitcond.outer = icmp eq i32 %y.inc2, 3
  store i32 %y.2, i32* %P
  br i1 %exitcond.outer, label %exit, label %outer.latch2

outer.latch2:
  %t = sext i32 %y.inc2 to i64
  store i64 %t, i64* %Q
  br label %outer

exit:
  ret i32 0
}
This is because after unrolling the inner loop, the outer loop has two header blocks. That form is valid and canonical in terms of LCSSA, but it is not what loop rotate understands. The hack solution is to run rotate -> unroll -> simplifycfg -> rotate -> unroll, which is bad. The slightly less hacky option is to put this simplification into LoopSimplify. Chandler argues that is a bad idea, because LoopSimplify specifically performs only the simplifications that maintain the canonical form and nothing else (and we may want to run LoopSimplifyCFG in other places for other reasons). Chandler suggests that the most general solution is just to add a much-needed LoopSimplifyCFG, which I did.
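To make "merges consecutive blocks" concrete, here is a minimal sketch on a toy CFG. It is not the code in this patch and uses no LLVM APIs: ToyBlock, ToyCFG, countPreds and mergeConsecutiveBlocks are hypothetical names made up for illustration, and the toy ignores phi nodes and terminators. It assumes block 0 is the entry block and folds a block into its predecessor only when the predecessor has exactly one successor and the block has exactly one predecessor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a basic block (hypothetical; not llvm::BasicBlock).
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Insts;  // straight-line body, terminator omitted
  std::vector<std::size_t> Succs;  // indices of successor blocks
  bool Erased;

  ToyBlock(std::string N, std::vector<std::string> I,
           std::vector<std::size_t> S)
      : Name(std::move(N)), Insts(std::move(I)), Succs(std::move(S)),
        Erased(false) {}
};

// Blocks are addressed by index; index 0 is assumed to be the entry block.
using ToyCFG = std::vector<ToyBlock>;

// Number of edges into Target coming from blocks that are still live.
std::size_t countPreds(const ToyCFG &G, std::size_t Target) {
  std::size_t N = 0;
  for (const ToyBlock &B : G)
    if (!B.Erased)
      N += static_cast<std::size_t>(
          std::count(B.Succs.begin(), B.Succs.end(), Target));
  return N;
}

// Fold S into P whenever P's only successor is S and S's only predecessor
// is P. Repeats until nothing changes; returns the number of merges.
std::size_t mergeConsecutiveBlocks(ToyCFG &G) {
  std::size_t Merged = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t P = 0; P < G.size(); ++P) {
      if (G[P].Erased || G[P].Succs.size() != 1)
        continue;
      std::size_t S = G[P].Succs[0];
      assert(S < G.size() && "successor index out of range");
      // Skip self-loops, the entry block, and blocks with other predecessors.
      if (S == P || S == 0 || countPreds(G, S) != 1)
        continue;
      // Append S's body to P and let P take over S's outgoing edges.
      G[P].Insts.insert(G[P].Insts.end(), G[S].Insts.begin(),
                        G[S].Insts.end());
      G[P].Succs = G[S].Succs;
      G[S].Succs.clear();
      G[S].Erased = true;
      ++Merged;
      Changed = true;
    }
  }
  return Merged;
}
```

On a CFG shaped like the outer loop above once the inner loop has been fully unrolled (entry -> outer -> inner -> outer.latch -> {exit, outer.latch2}, outer.latch2 -> outer), this sketch folds inner and outer.latch into outer. The outer block itself stays separate because it has two predecessors, entry and outer.latch2.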
The problem with using this right now is that in practice, you need a pipeline that looks like this to make use of it:
LoopPassManager:
- Loop SimplifyCFG
- Loop Rotate
- Loop Unroll
Currently, the PassManagerBuilder causes the LPMs to be split up, because the analyses they require get inserted in between (Chandler is working on this). However, with a shim that requires the associated analyses up front, this does work in practice in our out-of-tree pipeline, and a test just for this pass is included.
This is important to us because we have critical benchmark code that takes a form similar to this and similarly fails to unroll, resulting in catastrophic performance regressions.
Minor: convention is to not indent namespaces: http://llvm.org/docs/CodingStandards.html#namespace-indentation
Also, why not make this a struct?