This is an archive of the discontinued LLVM Phabricator instance.

Hot cold splitting pass
ClosedPublic

Authored by hiraditya on Aug 13 2018, 11:55 AM.

Details

Summary

Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.

Diff Detail

Event Timeline

hiraditya created this revision.Aug 13 2018, 11:55 AM

This looks good. Do you have a testcase that demonstrates what this pass does?

lib/Transforms/IPO/HotColdSplitting.cpp
210

Another thing for the future development would be to find the largest cold section of the code: for example, if there are two adjacent loops that are both in a cold side of an if-statement, both of them could be outlined into the same function.

This looks good. Do you have a testcase that demonstrates what this pass does?

Working on it.

lib/Transforms/IPO/HotColdSplitting.cpp
210

That's a good idea! Thanks. I'll work on that. Another improvement I'm thinking of is to outline multiple regions from a function.

sebpop added inline comments.Aug 16 2018, 1:34 PM
lib/Transforms/IPO/HotColdSplitting.cpp
210

find cold blocks like this:

+ bool unlikelyExecuted(BasicBlock &BB) {
+    if (blockEndsInUnreachable(&BB))
+      return true;
+    // Exception handling blocks are unlikely executed.
+    if (BB.isEHPad())
+      return true;
+    for (const Instruction &I : BB)
+      if (const CallInst *CI = dyn_cast<CallInst>(&I)) {
+        // The block is cold if it calls functions tagged as cold or noreturn.
+        if (CI->hasFnAttr(Attribute::Cold) ||
+            CI->hasFnAttr(Attribute::NoReturn))
+          return true;
+
+        // Assume that inline assembly is hot code.
+        if (isa<InlineAsm>(CI->getCalledValue()))
+          return false;
+      }

and then propagate the cold block property to all the blocks only reachable from the cold blocks:

+  DenseMap<const BasicBlock *, bool> ColdBlock;
+  // First mark all function basic blocks as hot or cold.
+  for (BasicBlock &BB : *F)
+    ColdBlock[&BB] = unlikelyExecuted(BB);
+
+  // Then adjust the hot/cold partitions by adding to the cold partition all the
+  // blocks that are only reachable from cold blocks.
+  forwardPropagation(ColdBlock, F);
+
+static void
+forwardPropagation(DenseMap<const BasicBlock *, bool> &ColdBlock,
+                   Function *F) {
+  SmallVector<BasicBlock *, 8> WL;
+  DenseMap<const BasicBlock *, bool> Visited;
+  for (BasicBlock &BB : *F)
+    Visited[&BB] = false;
+
+  BasicBlock *It = &F->front();
+  if (!ColdBlock[It]) {
+    Visited[It] = true;
+    WL.push_back(It);
+    while (WL.size() > 0) {
+      It = WL.pop_back_val();
+      for (BasicBlock *Succ : It->successors())
+        // Do not visit blocks that are cold.
+        if (!ColdBlock[Succ] && !Visited[Succ]) {
+          Visited[Succ] = true;
+          WL.push_back(Succ);
+        }
+    }
+  }
+
+  // Mark all the blocks that were not visited as cold.
+  for (MachineBasicBlock &BB : *F)
+    if (!Visited[&BB])
+      ColdBlock[&BB] = true;
+}

Thanks @sebpop for the patches. I'll update them. Having a static analysis to split the blocks will definitely help when we don;t have profile.

hiraditya updated this revision to Diff 161380.Aug 18 2018, 1:07 PM
hiraditya edited the summary of this revision. (Show Details)

Updated with patches from @sebpop

sebpop added inline comments.Aug 18 2018, 3:35 PM
lib/Transforms/IPO/HotColdSplitting.cpp
276

or ColdBlock[BB]

278

As Krzysztof mentioned, you need a different code here to get the largest cold region that could contain sequential loops.

You can extend the region as long as SingleExit is cold: then you ask for another exit block that post-dominates Exit, etc. recursively.

305

you need to pass ColdBlock map to outlineColdBlocks().

hiraditya edited the summary of this revision. (Show Details)

Forward propagate profile information to all cold-blocks.

sebpop added inline comments.Aug 20 2018, 7:05 AM
lib/Transforms/IPO/HotColdSplitting.cpp
166

As AllColdBlocks is the complement of Visited, you could return Visited and avoid this loop.
Then you would ask the negative: a block is cold if it is not in Visited.

168

I think this TODO should be done in the code generation where you outline cold basic blocks: see below in outlineColdBlocks ...

300

... implement the expansion to larger cold regions here.
Again, you can extend the region as long as SingleExit is cold: then you ask for another exit block that post-dominates Exit, etc. recursively.
Like so:

 if (PSI->isColdBB(BB, BFI) || ColdBlock.count(BB)) {
   SmallVector<BasicBlock *, 4> Region;
   BasicBlock *ExitColdRegion = nullptr;
   BasicBlock *Exit = (*PDT)[BB]->getIDom()->getBlock();
   // Estimate cold region between a BB and its dom-frontier.
   while (isSingleEntrySingleExit(BB, Exit, DT, PDT, Region) &&
              isOutlineCandidate(Region, Exit)) {
     ExitColdRegion = Exit;
     // Update Exit recursively to its dom-frontier.
     Exit = (*PDT)[Exit]->getIDom()->getBlock();
     Region.clear();
   }
   if (ExitColdRegion) {
     ++NumColdSESEFound;
     Region.clear();
     isSingleEntrySingleExit(BB, ExitColdRegion, DT, PDT, Region);
     return extractColdRegion(Region, DT, BFI, ORE);
   }
}
sebpop commandeered this revision.Aug 23 2018, 2:21 PM
sebpop updated this revision to Diff 162268.
sebpop edited reviewers, added: hiraditya; removed: sebpop.
hiraditya commandeered this revision.Aug 25 2018, 11:33 AM
hiraditya updated this revision to Diff 162560.
hiraditya edited reviewers, added: sebpop; removed: hiraditya.

Suggestions from Sebastian:

Only populate hot-regions and find cold as complement of hot-region (compile time optimization)

hiraditya marked 4 inline comments as done.Aug 25 2018, 11:36 AM
sebpop added inline comments.Aug 25 2018, 2:37 PM
lib/Transforms/IPO/HotColdSplitting.cpp
295

In the patch that I posted this if condition was outside the while loop.

Please move this check outside the while loop body, otherwise you won't iterate the while loop to discover a larger cold region than the first one that we discover with (*PDT)[BB]->getIDom()->getBlock().

hiraditya updated this revision to Diff 162599.Aug 26 2018, 4:20 PM

Fixed copy-paste error.

hiraditya marked an inline comment as done.Aug 26 2018, 4:21 PM
hiraditya updated this revision to Diff 162629.Aug 27 2018, 1:27 AM

Added testcase and disable recursive outlining of cold-functions.

hiraditya updated this revision to Diff 162631.Aug 27 2018, 1:34 AM

Check calling convention of outlined function to discard it form hot-cold split optimization

Hello. What version of llvm is this based on? It's still using DEBUG macros. It will need rebasing onto trunk.

I'll rebase onto top of trunk.

hiraditya updated this revision to Diff 162753.Aug 27 2018, 2:53 PM

Rebase onto top of trunk

If there are no-more suggestions for this first draft I'd like to get this in (disabled by default) such that people can try and give feedback. Thanks!

sebpop accepted this revision.Sep 5 2018, 1:33 PM

lgtm, thanks!

This revision is now accepted and ready to land.Sep 5 2018, 1:33 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Sep 11 2018, 7:03 AM

The new test fails like so on my mac laptop:

********************
FAIL: LLVM :: Transforms/HotColdSplit/split-cold-1.ll (22545 of 27683)
******************** TEST 'LLVM :: Transforms/HotColdSplit/split-cold-1.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /Users/thakis/src/llvm-build-goma/bin/opt -hotcoldsplit -S < /Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll | /Users/thakis/src/llvm-build-goma/bin/FileCheck /Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll
--
Exit Code: 1

Command Output (stderr):
--
/Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll:10:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: call coldcc void @foo
              ^
<stdin>:18:11: note: scanning from here
codeRepl: ; preds = %if.end
          ^
<stdin>:26:5: note: possible intended match here
define internal void @foo_if.then12() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    ^

--

Output of the pass is at http://codepad.org/R7iwxBeR

I see that there is a new Pass Manager version of your new pass, but I don't see that it is ever being enabled in the new pass manager pipeline (or tested). Are you planning to add that?

I see that there is a new Pass Manager version of your new pass, but I don't see that it is ever being enabled in the new pass manager pipeline (or tested). Are you planning to add that?

Pinging question. It should be straightforward to add to the new PM, is that something you plan to do soon as a follow-on?

I see that there is a new Pass Manager version of your new pass, but I don't see that it is ever being enabled in the new pass manager pipeline (or tested). Are you planning to add that?

Pinging question. It should be straightforward to add to the new PM, is that something you plan to do soon as a follow-on?

Sure, we will add the pass to the newPM together with a flag.
The pass will be disabled by default until we get all the bugs fixed.

gyiu added a subscriber: gyiu.Oct 28 2019, 1:41 PM

@hiraditya This is very similar to the PartialInlining pass based on PSI that can outline multiple cold regions in a function, something I implemented a couple of years ago. Is there a chance we can merge these two passes into one?

https://reviews.llvm.org/rL319398

Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2019, 1:41 PM