This is an archive of the discontinued LLVM Phabricator instance.

[hot-cold-split] split more than a cold region per function
AbandonedPublic

Authored by sebpop on Oct 23 2018, 10:10 AM.

Details

Summary

Remove a FIXME comment: allow all cold regions of a function to be outlined.

Diff Detail

Event Timeline

sebpop created this revision.Oct 23 2018, 10:10 AM
vsk added a comment.Oct 23 2018, 10:39 AM

@sebpop thanks for this patch! I don't see any problems with it (although I would prefer that the test explicitly check that outlined functions contain the correct instructions).

At a higher-level, I'm seeing some problems with the forward/back cold propagation done in getHotBlocks on internal projects. The propagation seems to stop when it encounters simple control flow, like an if-then-else or a for loop, after which cold code is unconditionally executed.

I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.

In D53588#1272789, @vsk wrote:

@sebpop thanks for this patch! I don't see any problems with it (although I would prefer that the test explicitly check that outlined functions contain the correct instructions).

At a higher-level, I'm seeing some problems with the forward/back cold propagation done in getHotBlocks on internal projects. The propagation seems to stop when it encounters simple control flow, like an if-then-else or a for loop, after which cold code is unconditionally executed.

I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.

I was discussing with Aditya over the llvm dev meeting that it would be good to move the static analysis of hot/cold blocks to an analysis pass instead of carrying it in the hot/cold split pass.
That way we will have a smaller transform pass and other passes could use the static analysis of hot/cold blocks.

vsk added a comment.Oct 23 2018, 11:22 AM
In D53588#1272789, @vsk wrote:

@sebpop thanks for this patch! I don't see any problems with it (although I would prefer that the test explicitly check that outlined functions contain the correct instructions).

At a higher-level, I'm seeing some problems with the forward/back cold propagation done in getHotBlocks on internal projects. The propagation seems to stop when it encounters simple control flow, like an if-then-else or a for loop, after which cold code is unconditionally executed.

Btw, here's an example. ToT does not outline given this code:

extern void sideeffect(int);

extern void __attribute__((noreturn)) sink();

void foo(int cond) {
  if (cond) {
    while (cond > 10) {
      --cond;
      sideeffect(0);
    }

    sink();
  }

  sideeffect(1);
}

With my prototype, the loop may be outlined:

Outlined Region:
; Function Attrs: minsize nounwind optsize ssp uwtable
define internal void @__outlined_foo_if.then(i32 %cond) #3 {
newFuncRoot:
  br label %if.then

if.then:                                          ; preds = %newFuncRoot
  %cmp3 = icmp sgt i32 %cond, 10
  br i1 %cmp3, label %while.body.preheader, label %while.end

while.body.preheader:                             ; preds = %if.then
  br label %while.body

while.body:                                       ; preds = %while.body.preheader, %while.body
  %cond.addr.04 = phi i32 [ %dec, %while.body ], [ %cond, %while.body.preheader ]
  %dec = add nsw i32 %cond.addr.04, -1
  tail call void @sideeffect(i32 0) #4
  %cmp = icmp sgt i32 %cond.addr.04, 11
  br i1 %cmp, label %while.body, label %while.end.loopexit

while.end.loopexit:                               ; preds = %while.body
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %if.then
  tail call void (...) @sink() #5
  unreachable
}
HotColdSplitting: Outlined 12 insts

I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.

vsk added a comment.Oct 23 2018, 11:22 AM
In D53588#1272789, @vsk wrote:

@sebpop thanks for this patch! I don't see any problems with it (although I would prefer that the test explicitly check that outlined functions contain the correct instructions).

At a higher-level, I'm seeing some problems with the forward/back cold propagation done in getHotBlocks on internal projects. The propagation seems to stop when it encounters simple control flow, like an if-then-else or a for loop, after which cold code is unconditionally executed.

I have a prototype of a different propagation scheme which overcomes some of these limitations. The idea is to mark blocks which are post-dominated by a cold block, or are dominated by a cold block, as cold. This is able to handle the control flow I described, and isn't limited to requiring a single exit block. Could you give me a day to evaluate it further, run benchmarks etc. and report back? If it turns out to be promising, istm that it'd make sense to rebase this patch on top of it.

I was discussing with Aditya over the llvm dev meeting that it would be good to move the static analysis of hot/cold blocks to an analysis pass instead of carrying it in the hot/cold split pass.
That way we will have a smaller transform pass and other passes could use the static analysis of hot/cold blocks.

Aditya mentioned this to me as well :). Definitely on my list!

As noted in my latest comment on D53534, the current naming scheme of extracted functions is such that for IR from a release built clang we would get identical names if multiple functions are extracted from the same original function.

vsk added a comment.Oct 25 2018, 3:45 PM

@sebpop are you interested in rebasing this on the new cold block propagation code? If not, I'd be happy to give it a try. Is the main challenge in teaching CodeExtractor to update the DT and PDT?

In D53588#1276693, @vsk wrote:

@sebpop are you interested in rebasing this on the new cold block propagation code? If not, I'd be happy to give it a try. Is the main challenge in teaching CodeExtractor to update the DT and PDT?

Please go ahead and give it a try. I haven't thought about maintaining DT and PDT: you are right they need to be maintained after your last change.

For splitting more than one cold region, maintaining a DT maybe expensive but we don't have to do that. All we need is to 'color/mark' the blocks which we want to outline. In the next iteration the colored blocks need not be considered. It may be slightly non-trivial in a general case where coloring SEME and reasoning about DT/PDT of blocks which are non-colored. What we can do is in the subsequent iterations we can take a sub-graph which do not intersect anywhere except at entry or exit. This way DT/PDT will still be preserved in the non-intersecting regions. I think jump-threading also works(should work) on similar lines.

vsk added a comment.Oct 30 2018, 9:59 AM

For splitting more than one cold region, maintaining a DT maybe expensive but we don't have to do that. All we need is to 'color/mark' the blocks which we want to outline. In the next iteration the colored blocks need not be considered. It may be slightly non-trivial in a general case where coloring SEME and reasoning about DT/PDT of blocks which are non-colored. What we can do is in the subsequent iterations we can take a sub-graph which do not intersect anywhere except at entry or exit. This way DT/PDT will still be preserved in the non-intersecting regions. I think jump-threading also works(should work) on similar lines.

I'm prototyping a patch along these lines and will try to share it soon. Outlining non-intersecting sub-graphs is key to the approach. The idea is to build a worklist of all (possibly multiple-entry) outlining regions up front, then extract+outline sub-graphs from each region until the worklist is empty.

sebpop abandoned this revision.Nov 1 2018, 11:22 AM
brzycki removed a subscriber: brzycki.