Find cold blocks based on profile information (or optionally with static analysis).
Forward propagate profile information to all cold-blocks.
Outline a cold region.
Set calling conv and prof hint for the callsite of the outlined function.
Details
Diff Detail
Event Timeline
This looks good. Do you have a testcase that demonstrates what this pass does?
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
209 ↗ | (On Diff #160423) | Another thing for the future development would be to find the largest cold section of the code: for example, if there are two adjacent loops that are both in a cold side of an if-statement, both of them could be outlined into the same function. |
Working on it.
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
209 ↗ | (On Diff #160423) | That's a good idea! Thanks. I'll work on that. Another improvement I'm thinking of is to outline multiple regions from a function. |
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
209 ↗ | (On Diff #160423) | find cold blocks like this: + bool unlikelyExecuted(BasicBlock &BB) { + if (blockEndsInUnreachable(&BB)) + return true; + // Exception handling blocks are unlikely executed. + if (BB.isEHPad()) + return true; + for (const Instruction &I : BB) + if (const CallInst *CI = dyn_cast<CallInst>(&I)) { + // The block is cold if it calls functions tagged as cold or noreturn. + if (CI->hasFnAttr(Attribute::Cold) || + CI->hasFnAttr(Attribute::NoReturn)) + return true; + + // Assume that inline assembly is hot code. + if (isa<InlineAsm>(CI->getCalledValue())) + return false; + } and then propagate the cold block property to all the blocks only reachable from the cold blocks: + DenseMap<const BasicBlock *, bool> ColdBlock; + // First mark all function basic blocks as hot or cold. + for (BasicBlock &BB : *F) + ColdBlock[&BB] = unlikelyExecuted(BB); + + // Then adjust the hot/cold partitions by adding to the cold partition all the + // blocks that are only reachable from cold blocks. + forwardPropagation(ColdBlock, F); + +static void +forwardPropagation(DenseMap<const BasicBlock *, bool> &ColdBlock, + Function *F) { + SmallVector<BasicBlock *, 8> WL; + DenseMap<const BasicBlock *, bool> Visited; + for (BasicBlock &BB : *F) + Visited[&BB] = false; + + BasicBlock *It = &F->front(); + if (!ColdBlock[It]) { + Visited[It] = true; + WL.push_back(It); + while (WL.size() > 0) { + It = WL.pop_back_val(); + for (BasicBlock *Succ : It->successors()) + // Do not visit blocks that are cold. + if (!ColdBlock[Succ] && !Visited[Succ]) { + Visited[Succ] = true; + WL.push_back(Succ); + } + } + } + + // Mark all the blocks that were not visited as cold. + for (MachineBasicBlock &BB : *F) + if (!Visited[&BB]) + ColdBlock[&BB] = true; +} |
Thanks @sebpop for the patches. I'll update them. Having a static analysis to split the blocks will definitely help when we don;t have profile.
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
275 ↗ | (On Diff #161380) | or ColdBlock[BB] |
277 ↗ | (On Diff #161380) | As Krzysztof mentioned, you need a different code here to get the largest cold region that could contain sequential loops. You can extend the region as long as SingleExit is cold: then you ask for another exit block that post-dominates Exit, etc. recursively. |
304 ↗ | (On Diff #161380) | you need to pass ColdBlock map to outlineColdBlocks(). |
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
166 ↗ | (On Diff #161418) | As AllColdBlocks is the complement of Visited, you could return Visited and avoid this loop. |
168 ↗ | (On Diff #161418) | I think this TODO should be done in the code generation where you outline cold basic blocks: see below in outlineColdBlocks ... |
300 ↗ | (On Diff #161418) | ... implement the expansion to larger cold regions here. if (PSI->isColdBB(BB, BFI) || ColdBlock.count(BB)) { SmallVector<BasicBlock *, 4> Region; BasicBlock *ExitColdRegion = nullptr; BasicBlock *Exit = (*PDT)[BB]->getIDom()->getBlock(); // Estimate cold region between a BB and its dom-frontier. while (isSingleEntrySingleExit(BB, Exit, DT, PDT, Region) && isOutlineCandidate(Region, Exit)) { ExitColdRegion = Exit; // Update Exit recursively to its dom-frontier. Exit = (*PDT)[Exit]->getIDom()->getBlock(); Region.clear(); } if (ExitColdRegion) { ++NumColdSESEFound; Region.clear(); isSingleEntrySingleExit(BB, ExitColdRegion, DT, PDT, Region); return extractColdRegion(Region, DT, BFI, ORE); } } |
Suggestions from Sebastian:
Only populate hot-regions and find cold as complement of hot-region (compile time optimization)
lib/Transforms/IPO/HotColdSplitting.cpp | ||
---|---|---|
294 ↗ | (On Diff #162560) | In the patch that I posted this if condition was outside the while loop. Please move this check outside the while loop body, otherwise you won't iterate the while loop to discover a larger cold region than the first one that we discover with (*PDT)[BB]->getIDom()->getBlock(). |
Check calling convention of outlined function to discard it form hot-cold split optimization
Hello. What version of llvm is this based on? It's still using DEBUG macros. It will need rebasing onto trunk.
If there are no-more suggestions for this first draft I'd like to get this in (disabled by default) such that people can try and give feedback. Thanks!
The new test fails like so on my mac laptop:
******************** FAIL: LLVM :: Transforms/HotColdSplit/split-cold-1.ll (22545 of 27683) ******************** TEST 'LLVM :: Transforms/HotColdSplit/split-cold-1.ll' FAILED ******************** Script: -- : 'RUN: at line 1'; /Users/thakis/src/llvm-build-goma/bin/opt -hotcoldsplit -S < /Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll | /Users/thakis/src/llvm-build-goma/bin/FileCheck /Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll -- Exit Code: 1 Command Output (stderr): -- /Users/thakis/src/llvm-rw/test/Transforms/HotColdSplit/split-cold-1.ll:10:15: error: CHECK-NEXT: expected string not found in input ; CHECK-NEXT: call coldcc void @foo ^ <stdin>:18:11: note: scanning from here codeRepl: ; preds = %if.end ^ <stdin>:26:5: note: possible intended match here define internal void @foo_if.then12() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) { ^ --
Output of the pass is at http://codepad.org/R7iwxBeR
I broke the check in https://reviews.llvm.org/rL341838 and fixed it in https://reviews.llvm.org/rL341839
I see that there is a new Pass Manager version of your new pass, but I don't see that it is ever being enabled in the new pass manager pipeline (or tested). Are you planning to add that?
Pinging question. It should be straightforward to add to the new PM, is that something you plan to do soon as a follow-on?
Sure, we will add the pass to the newPM together with a flag.
The pass will be disabled by default until we get all the bugs fixed.
@hiraditya This is very similar to the PartialInlining pass based on PSI that can outline multiple cold regions in a function, something I implemented a couple of years ago. Is there a chance we can merge these two passes into one?