Remove a FIXME comment: allow all cold regions of a function to be outlined.
Diff Detail
Event Timeline
@sebpop thanks for this patch! I don't see any problems with it (although I would prefer that the test explicitly check that outlined functions contain the correct instructions).
At a higher-level, I'm seeing some problems with the forward/back cold propagation done in getHotBlocks on internal projects. The propagation seems to stop when it encounters simple control flow, like an if-then-else or a for loop, after which cold code is unconditionally executed.
I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.
I was discussing with Aditya over the llvm dev meeting that it would be good to move the static analysis of hot/cold blocks to an analysis pass instead of carrying it in the hot/cold split pass.
That way we will have a smaller transform pass and other passes could use the static analysis of hot/cold blocks.
Btw, here's an example. ToT does not outline given this code:
extern void sideeffect(int); extern void __attribute__((noreturn)) sink(); void foo(int cond) { if (cond) { while (cond > 10) { --cond; sideeffect(0); } sink(); } sideeffect(1); }
With my prototype, the loop may be outlined:
Outlined Region: ; Function Attrs: minsize nounwind optsize ssp uwtable define internal void @__outlined_foo_if.then(i32 %cond) #3 { newFuncRoot: br label %if.then if.then: ; preds = %newFuncRoot %cmp3 = icmp sgt i32 %cond, 10 br i1 %cmp3, label %while.body.preheader, label %while.end while.body.preheader: ; preds = %if.then br label %while.body while.body: ; preds = %while.body.preheader, %while.body %cond.addr.04 = phi i32 [ %dec, %while.body ], [ %cond, %while.body.preheader ] %dec = add nsw i32 %cond.addr.04, -1 tail call void @sideeffect(i32 0) #4 %cmp = icmp sgt i32 %cond.addr.04, 11 br i1 %cmp, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body br label %while.end while.end: ; preds = %while.end.loopexit, %if.then tail call void (...) @sink() #5 unreachable } HotColdSplitting: Outlined 12 insts
I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.
As noted in my latest comment on D53534, the current naming scheme of extracted functions is such that for IR from a release built clang we would get identical names if multiple functions are extracted from the same original function.
@sebpop are you interested in rebasing this on the new cold block propagation code? If not, I'd be happy to give it a try. Is the main challenge in teaching CodeExtractor to update the DT and PDT?
Please go ahead and give it a try. I haven't thought about maintaining DT and PDT: you are right they need to be maintained after your last change.
For splitting more than one cold region, maintaining a DT maybe expensive but we don't have to do that. All we need is to 'color/mark' the blocks which we want to outline. In the next iteration the colored blocks need not be considered. It may be slightly non-trivial in a general case where coloring SEME and reasoning about DT/PDT of blocks which are non-colored. What we can do is in the subsequent iterations we can take a sub-graph which do not intersect anywhere except at entry or exit. This way DT/PDT will still be preserved in the non-intersecting regions. I think jump-threading also works(should work) on similar lines.
I'm prototyping a patch along these lines and will try to share it soon. Outlining non-intersecting sub-graphs is key to the approach. The idea is to build a worklist of all (possibly multiple-entry) outlining regions up front, then extract+outline sub-graphs from each region until the worklist is empty.