Page MenuHomePhabricator

[HotColdSplitting] Outline more than once per function
ClosedPublic

Authored by vsk on Oct 30 2018, 1:26 PM.

Details

Summary

Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is full, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 383KB post-patch (+ ~2.8x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Diff Detail

Repository
rL LLVM

Event Timeline

vsk created this revision.Oct 30 2018, 1:26 PM
hiraditya added inline comments.Oct 31 2018, 8:47 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗(On Diff #171773)

I think, this can go before the previous check.

489 ↗(On Diff #171773)

Add optimize for size attribute?

vsk added inline comments.Oct 31 2018, 9:03 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗(On Diff #171773)

I think that would cause splitting to fail when a predecessor outside of the cold region is the entry block. Am I missing something?

489 ↗(On Diff #171773)

Good point.

hiraditya added inline comments.Oct 31 2018, 9:08 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
480 ↗(On Diff #171773)

Creating scope '{}' may not be required.

517 ↗(On Diff #171773)

Maybe print the region which didn't get outlined, or at least the root-block.

hiraditya added inline comments.Oct 31 2018, 9:18 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗(On Diff #171773)

When SinkPostDom is true and PredBB is the entry, we can just return.

tejohnson added inline comments.Oct 31 2018, 9:56 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
522 ↗(On Diff #171773)

The last argument should be changed from a constant 1 to a running count of outlined regions. Otherwise all outlined functions will have the same name.

Maybe you can add the testcase from my previous patch: https://reviews.llvm.org/D53588

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
378 ↗(On Diff #171773)

s/anu/any/

517 ↗(On Diff #171773)

I think that would generate too much debug output. Maybe completely remove the LLVM_DEBUG stmt.

vsk updated this revision to Diff 172456.Nov 2 2018, 4:15 PM
vsk marked 5 inline comments as done.
vsk edited the summary of this revision. (Show Details)
  • While doing performance testing I found a miscompile in ./SingleSource/Regression/C++/EH/Regression-C++-class_hierarchy. I'll file a PR with more details by next week. It looks like it could be an existing bug that surfaces due to more aggressive outlining.
  • Added tests (including the one from @sebpop's earlier patch).
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗(On Diff #171773)

I see what you mean.

480 ↗(On Diff #171773)

It's not, I just find it easier to read multiple statements when there are surrounding curly braces.

517 ↗(On Diff #171773)

I think some form of debug statement here would help to tune the outlining threshold. Could we keep it for now? I'll remove some of the other redundant debug statements to keep the output reasonable.

522 ↗(On Diff #171773)

Thanks for pointing this out.

vsk edited the summary of this revision. (Show Details)Nov 2 2018, 4:38 PM

Can we push this patch? This is not enabled by default so we can continue development in subsequent patches.

vsk added a comment.EditedNov 4 2018, 3:07 PM

Can we push this patch? This is not enabled by default so we can continue development in subsequent patches.

I'd like to form a plan for addressing llvm.org/PR39545 first, to avoid inadvertently leaving the EH regression in tree.

Edit: Starting to think that the underlying bug in PR39545 is faulty lowering of llvm.eh.typeid.for in outlined thunks. At a conceptual level it shouldn't be hard to fix. I really think we should tackle that first as looks like a pretty bad miscompile affecting all in-tree clients of CodeExtractor.

junbuml added inline comments.Nov 5 2018, 1:20 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
160 ↗(On Diff #172456)

This function set MinSize, so I think the function name should be something like markMinSize().

410 ↗(On Diff #172456)

We should have additional check before adding SuccBB into Blocks (maybe successors of SuccBB). If a dom-frontier have a phi taking different incoming values from multiple cold blocks, it will assert. In the example below, %if.end takes different incoming values from %coldbb and %coldbb2.

define void @foo(i32 %cond) {
entry:
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %if.end, label %coldbb

coldbb:
  call void (...) @sink()
  br i1 undef, label %if.end, label %coldbb2

coldbb2:
  br label %if.end

if.end:
  %p = phi i32 [0, %entry], [1, %coldbb], [3, %coldbb2]
  ret void
}

declare void @sink(...) cold
511 ↗(On Diff #172456)

I don't think it's good idea to add MinSize in this case. It is possible that a hot function or main() itself can be successfully ended up with exit(0).

hiraditya added inline comments.Nov 6 2018, 9:36 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
511 ↗(On Diff #172456)

The static analysis here is pretty conservative, unless the runtime-profile information is broken, it seems unlikely that this pass will mark a hot function as cold.

junbuml added inline comments.Nov 6 2018, 11:21 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
511 ↗(On Diff #172456)

I agree that this pass will unlikely mark hot blocks as cold. But, in line 509~511, it mark MinSize on the function, if all blocks are post-domed by a single cold block (e.g., mark-the-whole-func-cold.ll). In the example below, we may not want to mark MinSize on main().

void main() {
  // .. 
  // hot loop .. 
  // 
  exit (0);
}
junbuml added inline comments.Nov 6 2018, 11:52 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗(On Diff #172456)

Considering the case below, I believed we should not mark the MinSize here as well.

void foo(bool c) {
  if(c) {
    ....
    return;
  }

   // hot code
   exit (0);
}
vsk added inline comments.Nov 6 2018, 11:58 AM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
160 ↗(On Diff #172456)

That's true, but as a follow-up, I'd like to have this helper add Attribute::Cold, and move the function to the end of the text segment. Would it be all right to keep the aspirational name?

282 ↗(On Diff #172456)

I think this is illustrative of a more general problem: I don't think we should treat NoReturn functions as cold at all. On Darwin, functions like longjmp are NoReturn but warm. Our kernel uses continuation functions (conceptually similar to longjmp) to reduce stack usage: these are also NoReturn but hot.

IMO, noreturn functions which are actually cold should be marked as such (abort, etc., but not exit). That's what we do on Darwin.

410 ↗(On Diff #172456)

As noted in llvm.org/PR39545, I'll send out a separate review to address this.

vsk added inline comments.Nov 6 2018, 12:02 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗(On Diff #172456)

If the concern is that it's hard to add attributes to functions in libc, we could special-case them as needed. But I think treating all noreturn functions as cold when they might not be will cause problems, or at least prevent us from attaching MinSize, etc.

junbuml added inline comments.Nov 6 2018, 1:10 PM
llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗(On Diff #172456)

IMHO, I doubt if unlikelyExecuted() is conservative enough without profile data. Except a specific cold mark(Attribute::Cold), all other checks may not perfectly clear to say the blocks are really cold.

vsk updated this revision to Diff 172867.Nov 6 2018, 3:28 PM
vsk retitled this revision from [HotColdSplitting] [WIP] Outline more than once per function to [HotColdSplitting] Outline more than once per function.
vsk edited the summary of this revision. (Show Details)

Removing the WIP label, as I'm confident in the results now.

  • Fix an iterator invalidation bug in takeSingleEntrySubRegion.
  • I've addressed the immediate miscompile reported in llvm.org/PR39545, and plan on improving handling of eh.typeid.for in a follow-up.
  • @junbuml has concerns about applying MinSize to functions which call exit. IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).

IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).

I'm not sure if handling noreturn is a right fix. A block containing exit(0) will have "unreachable", so it must be still considered as a cold block even after removing noreturn from unlikelyExecuted().

vsk added a comment.Nov 7 2018, 12:55 PM

IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).

I'm not sure if handling noreturn is a right fix. A block containing exit(0) will have "unreachable", so it must be still considered as a cold block even after removing noreturn from unlikelyExecuted().

I had in mind:

if (blockEndsInUnreachable(BB)) {
  // Calls to noreturn functions are followed by an unreachable inst, but
  // the call itself may be warm (e.g. longjmp, or exit).
  if (auto *CI =
          dyn_cast_or_null<CallInst>(BB.getTerminator()->getPrevNode()))
    if (CI->hasFnAttr(Attribute::NoReturn))
      return false;
  return true;
}

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

vsk added a comment.Nov 7 2018, 12:59 PM

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

vsk added a comment.EditedNov 7 2018, 2:36 PM
In D53887#1290587, @vsk wrote:

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

So, I defined unlikelyExecuted(BB) = true, and found a bug while running through the test-suite. The issue is that two do two DFS's (one on predecessor blocks, one on successor blocks). The second DFS can mark a block already marked by the first DFS. In practice this isn't a problem, because CodeExtractor maintains a set of blocks. But it's weird to have duplicate blocks in the extraction list, and it's a simple issue to fix.

Other than that, I found no miscompiles. The test suite passed cleanly.

Edit: the unlikelyExecuted(BB) = true experiment is superseded by the one described below.

In D53887#1290587, @vsk wrote:

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

So, I defined bool Cold = !pred_empty(BB), and this resulted in 29MB of text being outlined in test suite (out of 43MB total). I found a small issue in D54189, but no actual miscompilations. All tests passed and matched the reference output.

vsk added a comment.Nov 14 2018, 12:20 PM

Ping, are there any outstanding concerns about this one? It'd be nice to have this in-tree, as I have a few follow-ups based on it.

vsk updated this revision to Diff 176518.Dec 3 2018, 5:12 PM
vsk edited the summary of this revision. (Show Details)

Friendly ping. I've rebased this on top of r348205, which fixes the assertion failure pointed out in llvm.org/PR39564.

I've stress-tested this by:

  • Building LNT+externals with hot/cold splitting enabled. I forced outlining to occur whenever a block has more than 1 predecessor, so long as it wouldn't result in the entire function being outlined. All output validation tests still passed.
  • Running check-llvm in a stage2 build with hot/cold splitting enabled in the same way described above, but with stack coloring disabled due to llvm.org/PR39671.
hiraditya accepted this revision.Dec 7 2018, 12:04 PM

LGTM, if there are outstanding comments from other reviewers we can address them in subsequent patches.

This revision is now accepted and ready to land.Dec 7 2018, 12:04 PM
This revision was automatically updated to reflect the committed changes.