This is an archive of the discontinued LLVM Phabricator instance.

Differential D53887

[HotColdSplitting] Outline more than once per function
ClosedPublic

Authored by vsk on Oct 30 2018, 1:26 PM.

Details

Reviewers

sebpop
hiraditya
tejohnson
junbuml
kachkov98

Commits

rG03aaa3e2aa37: [HotColdSplitting] Outline more than once per function
rL348639: [HotColdSplitting] Outline more than once per function

Summary

Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is non-empty, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.
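
The worklist scheme in the summary can be sketched with a toy model. This is only an illustration under stated assumptions: blocks are plain integer ids, a region is a set of ids, and buildWorklist/drainWorklist are hypothetical names that do not appear in the patch. The sketch also treats each region as a single unit, whereas the patch peels single-entry sub-regions off each region.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model (assumption): a basic block is an int id, a region is a set of ids.
using Region = std::set<int>;

// Keep a candidate only if it shares no block with a region kept earlier.
std::vector<Region> buildWorklist(const std::vector<Region> &Candidates) {
  std::set<int> Claimed; // blocks already owned by a kept region
  std::vector<Region> Worklist;
  for (const Region &R : Candidates) {
    bool Overlaps = false;
    for (int BB : R) {
      if (Claimed.count(BB)) {
        Overlaps = true;
        break;
      }
    }
    if (Overlaps)
      continue; // discard the overlapping candidate
    Claimed.insert(R.begin(), R.end());
    Worklist.push_back(R);
  }
  return Worklist;
}

// Drain the worklist. Popping a region stands in for an outlining attempt;
// because regions are disjoint, handling one does not touch the others.
unsigned drainWorklist(std::vector<Region> Worklist) {
  unsigned NumProcessed = 0;
  while (!Worklist.empty()) {
    Worklist.pop_back();
    ++NumProcessed;
  }
  return NumProcessed;
}
```

With candidates {1,2,3}, {3,4} and {5,6}, the second candidate is dropped because block 3 is already claimed, so two regions remain to be processed.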

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 383KB post-patch (+ ~2.8x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Diff Detail

Repository: rL LLVM

Event Timeline

vsk created this revision.Oct 30 2018, 1:26 PM

hiraditya added inline comments.Oct 31 2018, 8:47 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗	(On Diff #171773)	I think, this can go before the previous check.
489 ↗	(On Diff #171773)	Add optimize for size attribute?

vsk added inline comments.Oct 31 2018, 9:03 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗	(On Diff #171773)	I think that would cause splitting to fail when a predecessor outside of the cold region is the entry block. Am I missing something?
489 ↗	(On Diff #171773)	Good point.

hiraditya added inline comments.Oct 31 2018, 9:08 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
480 ↗	(On Diff #171773)	Creating scope '{}' may not be required.
517 ↗	(On Diff #171773)	Maybe print the region which didn't get outlined, or at least the root-block.

hiraditya added inline comments.Oct 31 2018, 9:18 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗	(On Diff #171773)	When SinkPostDom is true and PredBB is the entry, we can just return.

tejohnson added inline comments.Oct 31 2018, 9:56 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
522 ↗	(On Diff #171773)	The last argument should be changed from a constant 1 to a running count of outlined regions. Otherwise all outlined functions will have the same name.

Maybe you can add the testcase from my previous patch: https://reviews.llvm.org/D53588

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
378 ↗	(On Diff #171773)	s/anu/any/
517 ↗	(On Diff #171773)	I think that would generate too much debug output. Maybe completely remove the LLVM_DEBUG stmt.

brzycki added a subscriber: brzycki.Nov 1 2018, 11:24 AM

While doing performance testing I found a miscompile in ./SingleSource/Regression/C++/EH/Regression-C++-class_hierarchy. I'll file a PR with more details by next week. It looks like it could be an existing bug that surfaces due to more aggressive outlining.
Added tests (including the one from @sebpop's earlier patch).

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
348 ↗	(On Diff #171773)	I see what you mean.
480 ↗	(On Diff #171773)	It's not, I just find it easier to read multiple statements when there are surrounding curly braces.
517 ↗	(On Diff #171773)	I think some form of debug statement here would help to tune the outlining threshold. Could we keep it for now? I'll remove some of the other redundant debug statements to keep the output reasonable.
522 ↗	(On Diff #171773)	Thanks for pointing this out.
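
The naming concern raised on line 522 can be illustrated with a small sketch. It assumes the outlined function's name is the original name plus a "cold.<count>" suffix; the exact format and the helper names outlinedName/nameOutlinedRegions are assumptions for illustration, not code from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: name of the Count-th function outlined from OrigName.
// The ".cold.<count>" format is an assumption for this sketch.
std::string outlinedName(const std::string &OrigName, unsigned Count) {
  return OrigName + ".cold." + std::to_string(Count);
}

// Name NumRegions outlined functions using a running count, so that every
// outlined function receives a distinct name.
std::vector<std::string> nameOutlinedRegions(const std::string &OrigName,
                                             unsigned NumRegions) {
  std::vector<std::string> Names;
  unsigned Count = 1; // running count of outlined regions
  for (unsigned I = 0; I < NumRegions; ++I)
    Names.push_back(outlinedName(OrigName, Count++));
  return Names;
}
```

Passing a constant 1 instead of the running count would produce the same string for every region, which is the collision the comment describes.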

vsk edited the summary of this revision.Nov 2 2018, 4:38 PM

Can we push this patch? This is not enabled by default so we can continue development in subsequent patches.

In D53887#1286544, @hiraditya wrote:

Can we push this patch? This is not enabled by default so we can continue development in subsequent patches.

I'd like to form a plan for addressing llvm.org/PR39545 first, to avoid inadvertently leaving the EH regression in tree.

Edit: Starting to think that the underlying bug in PR39545 is faulty lowering of llvm.eh.typeid.for in outlined thunks. At a conceptual level it shouldn't be hard to fix. I really think we should tackle that first as looks like a pretty bad miscompile affecting all in-tree clients of CodeExtractor.

junbuml added inline comments.Nov 5 2018, 1:20 PM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
160 ↗	(On Diff #172456)	This function sets MinSize, so I think the function name should be something like markMinSize().
410 ↗	(On Diff #172456)	We should have an additional check before adding SuccBB into Blocks (maybe on the successors of SuccBB). If a dom-frontier has a phi taking different incoming values from multiple cold blocks, it will assert. In the example below, %if.end takes different incoming values from %coldbb and %coldbb2.

define void @foo(i32 %cond) {
entry:
  %tobool = icmp eq i32 %cond, 0
  br i1 %tobool, label %if.end, label %coldbb

coldbb:
  call void (...) @sink()
  br i1 undef, label %if.end, label %coldbb2

coldbb2:
  br label %if.end

if.end:
  %p = phi i32 [0, %entry], [1, %coldbb], [3, %coldbb2]
  ret void
}

declare void @sink(...) cold

511 ↗	(On Diff #172456)	I don't think it's a good idea to add MinSize in this case. It is possible for a hot function, or main() itself, to end successfully with exit(0).
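
The phi situation described for line 410 can be modelled without LLVM. The sketch below is a toy under stated assumptions: a phi is a map from predecessor block name to incoming value, and hasDistinctIncomingFromRegion is a hypothetical helper, not part of the patch or of CodeExtractor.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy phi (assumption): predecessor block name -> incoming value.
using Phi = std::map<std::string, int>;

// Return true if the phi receives more than one distinct value from
// predecessors that lie inside Region.
bool hasDistinctIncomingFromRegion(const Phi &P,
                                   const std::set<std::string> &Region) {
  std::set<int> Values;
  for (const auto &Incoming : P)
    if (Region.count(Incoming.first))
      Values.insert(Incoming.second);
  return Values.size() > 1;
}
```

For the %p phi in the example, a region made of coldbb and coldbb2 contributes the values 1 and 3, so the check reports a conflict; a region containing only coldbb does not.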

hiraditya added inline comments.Nov 6 2018, 9:36 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
511 ↗	(On Diff #172456)	The static analysis here is pretty conservative, unless the runtime-profile information is broken, it seems unlikely that this pass will mark a hot function as cold.

junbuml added inline comments.Nov 6 2018, 11:21 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
511 ↗	(On Diff #172456)	I agree that this pass is unlikely to mark hot blocks as cold. But in lines 509~511, it marks the function MinSize if all blocks are post-dominated by a single cold block (e.g., mark-the-whole-func-cold.ll). In the example below, we may not want to mark main() as MinSize.

void main() {
  // ..
  // hot loop ..
  //
  exit (0);
}

junbuml added inline comments.Nov 6 2018, 11:52 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗	(On Diff #172456)	Considering the case below, I believe we should not mark MinSize here either.

void foo(bool c) {
  if (c) {
    ....
    return;
  }
  // hot code
  exit (0);
}

vsk added inline comments.Nov 6 2018, 11:58 AM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
160 ↗	(On Diff #172456)	That's true, but as a follow-up, I'd like to have this helper add Attribute::Cold, and move the function to the end of the text segment. Would it be all right to keep the aspirational name?
282 ↗	(On Diff #172456)	I think this is illustrative of a more general problem: I don't think we should treat NoReturn functions as cold at all. On Darwin, functions like longjmp are NoReturn but warm. Our kernel uses continuation functions (conceptually similar to longjmp) to reduce stack usage: these are also NoReturn but hot. IMO, noreturn functions which are actually cold should be marked as such (abort, etc., but not exit). That's what we do on Darwin.
410 ↗	(On Diff #172456)	As noted in llvm.org/PR39545, I'll send out a separate review to address this.

vsk added inline comments.Nov 6 2018, 12:02 PM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗	(On Diff #172456)	If the concern is that it's hard to add attributes to functions in libc, we could special-case them as needed. But I think treating all noreturn functions as cold when they might not be will cause problems, or at least prevent us from attaching MinSize, etc.

junbuml added inline comments.Nov 6 2018, 1:10 PM

llvm/lib/Transforms/IPO/HotColdSplitting.cpp
282 ↗	(On Diff #172456)	IMHO, I doubt that unlikelyExecuted() is conservative enough without profile data. Apart from an explicit cold marking (Attribute::Cold), none of the other checks clearly establishes that the blocks are really cold.

Removing the WIP label, as I'm confident in the results now.

Fix an iterator invalidation bug in takeSingleEntrySubRegion.
I've addressed the immediate miscompile reported in llvm.org/PR39545, and plan on improving handling of eh.typeid.for in a follow-up.
@junbuml has concerns about applying MinSize to functions which call exit. IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).
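
The source does not say what the iterator invalidation bug in takeSingleEntrySubRegion was. As a hedged illustration only, the sketch below shows one way to pull a sub-region out of a scored block list with the erase-remove idiom while also tracking the best remaining entry point. Block ids are plain integers, and takeSubRegion is a hypothetical name for this toy.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Block = std::pair<int, unsigned>; // (block id, entry-point score)

// Move every block accepted by InSubRegion out of Blocks and return their ids.
// NextEntry is set to the highest-scoring block left behind, or -1 if no
// remaining block has a non-zero score.
template <typename Pred>
std::vector<int> takeSubRegion(std::vector<Block> &Blocks, Pred InSubRegion,
                               int &NextEntry) {
  std::vector<int> SubRegion;
  NextEntry = -1;
  unsigned NextScore = 0;
  auto NewEnd =
      std::remove_if(Blocks.begin(), Blocks.end(), [&](const Block &B) {
        bool Take = InSubRegion(B.first);
        if (Take) {
          SubRegion.push_back(B.first);
        } else if (B.second > NextScore) {
          NextEntry = B.first;
          NextScore = B.second;
        }
        return Take;
      });
  // Erase using iterators obtained after remove_if has finished.
  Blocks.erase(NewEnd, Blocks.end());
  return SubRegion;
}
```

The design choice here is to do the removal and the search for the next entry point in a single pass, so the list is never mutated while a separate loop is still iterating over it.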

IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).

I'm not sure if handling noreturn is a right fix. A block containing exit(0) will have "unreachable", so it must be still considered as a cold block even after removing noreturn from unlikelyExecuted().

In D53887#1290571, @junbuml wrote:

IMHO the right fix is to not treat noreturn calls as cold (also as a follow-up).

I'm not sure if handling noreturn is a right fix. A block containing exit(0) will have "unreachable", so it must be still considered as a cold block even after removing noreturn from unlikelyExecuted().

I had in mind:

if (blockEndsInUnreachable(BB)) {
  // Calls to noreturn functions are followed by an unreachable inst, but
  // the call itself may be warm (e.g. longjmp, or exit).
  if (auto *CI =
          dyn_cast_or_null<CallInst>(BB.getTerminator()->getPrevNode()))
    if (CI->hasFnAttr(Attribute::NoReturn))
      return false;
  return true;
}

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

In D53887#1290584, @junbuml wrote:

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

In D53887#1290587, @vsk wrote:

In D53887#1290584, @junbuml wrote:

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

So, I defined unlikelyExecuted(BB) = true, and found a bug while running through the test-suite. The issue is that we do two DFSes (one on predecessor blocks, one on successor blocks). The second DFS can mark a block already marked by the first DFS. In practice this isn't a problem, because CodeExtractor maintains a set of blocks. But it's weird to have duplicate blocks in the extraction list, and it's a simple issue to fix.
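
The duplicate-marking issue can be reproduced in a toy graph. The sketch below is an assumption-laden model, not the pass itself: blocks are integer ids, the dominance and post-dominance checks are omitted, and neighbors/collect/markRegion are hypothetical names. A shared set prevents the forward walk from re-adding a block the backward walk already marked.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Graph = std::map<int, std::vector<int>>; // block id -> adjacent block ids

// Adjacent blocks of BB, or an empty list if BB has none.
std::vector<int> neighbors(const Graph &Edges, int BB) {
  auto It = Edges.find(BB);
  if (It == Edges.end())
    return {};
  return It->second;
}

// Depth-first walk over Edges from the blocks in Stack. A block is appended to
// Out only the first time it is seen, even across separate walks that share
// the same Seen set.
void collect(const Graph &Edges, std::vector<int> Stack, std::set<int> &Seen,
             std::vector<int> &Out) {
  while (!Stack.empty()) {
    int BB = Stack.back();
    Stack.pop_back();
    if (!Seen.insert(BB).second)
      continue; // already marked, possibly by the other traversal
    Out.push_back(BB);
    for (int Next : neighbors(Edges, BB))
      Stack.push_back(Next);
  }
}

// Mark Sink, then its ancestors, then its descendants, without duplicates.
std::vector<int> markRegion(const Graph &Preds, const Graph &Succs, int Sink) {
  std::set<int> Seen = {Sink};
  std::vector<int> Region = {Sink};
  collect(Preds, neighbors(Preds, Sink), Seen, Region);
  collect(Succs, neighbors(Succs, Sink), Seen, Region);
  return Region;
}
```

In a two-block loop where block 1 is both a predecessor and a successor of the sink block 2, the region contains each block exactly once.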

Other than that, I found no miscompiles. The test suite passed cleanly.

Edit: the unlikelyExecuted(BB) = true experiment is superseded by the one described below.

In D53887#1290587, @vsk wrote:

In D53887#1290584, @junbuml wrote:

I guess it would be good to give some stress test on this pass to see if there is any hidden bug by relaxing conditions in unlikelyExecuted(). As we treat more blocks as cold without being limited on unlikelyExecuted(), we maybe able to expose hidden issues with it.

That's a great idea. I'll do that now with the test-suite + externals. FWIW, I also built all of iOS with this pass enabled, and the only compiler crash I found was https://bugs.llvm.org/show_bug.cgi?id=39564. I'm still investigating possible miscompiles.

So, I defined bool Cold = !pred_empty(BB), and this resulted in 29MB of text being outlined in test suite (out of 43MB total). I found a small issue in D54189, but no actual miscompilations. All tests passed and matched the reference output.

Ping, are there any outstanding concerns about this one? It'd be nice to have this in-tree, as I have a few follow-ups based on it.

vsk mentioned this in D54189: [HotColdSplitting] Ensure PHIs have unique incoming values.Dec 3 2018, 1:16 PM

Friendly ping. I've rebased this on top of r348205, which fixes the assertion failure pointed out in llvm.org/PR39564.

I've stress-tested this by:

Building LNT+externals with hot/cold splitting enabled. I forced outlining to occur whenever a block has more than 1 predecessor, so long as it wouldn't result in the entire function being outlined. All output validation tests still passed.
Running check-llvm in a stage2 build with hot/cold splitting enabled in the same way described above, but with stack coloring disabled due to llvm.org/PR39671.

vsk added a reviewer: kachkov98.Dec 3 2018, 5:13 PM

LGTM, if there are outstanding comments from other reviewers we can address them in subsequent patches.

This revision is now accepted and ready to land.Dec 7 2018, 12:04 PM

Closed by commit rL348639: [HotColdSplitting] Outline more than once per function (authored by vedantk).Dec 7 2018, 12:27 PM

This revision was automatically updated to reflect the committed changes.

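Revision Contents

Path	Size
llvm/trunk/lib/Transforms/IPO/HotColdSplitting.cpp	452 lines
llvm/trunk/test/Transforms/HotColdSplit/eh-pads.ll	39 lines
llvm/trunk/test/Transforms/HotColdSplit/extraction-subregion-breaks-phis.ll	63 lines
llvm/trunk/test/Transforms/HotColdSplit/forward-dfs-reaches-marked-block.ll	29 lines
llvm/trunk/test/Transforms/HotColdSplit/mark-the-whole-func-cold.ll	64 lines
llvm/trunk/test/Transforms/HotColdSplit/outline-disjoint-diamonds.ll	57 lines
llvm/trunk/test/Transforms/HotColdSplit/outline-multiple-entry-region.ll	81 lines
llvm/trunk/test/Transforms/HotColdSplit/outline-while-loop.ll	49 lines
llvm/trunk/test/Transforms/HotColdSplit/phi-with-distinct-outlined-values.ll	35 lines
llvm/trunk/test/Transforms/HotColdSplit/region-overlap.ll	65 lines
llvm/trunk/test/Transforms/HotColdSplit/succ-block-with-self-edge.ll	56 lines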

Diff 177277

llvm/trunk/lib/Transforms/IPO/HotColdSplitting.cpp

//===- HotColdSplitting.cpp -- Outline Cold Regions -------------- C++ --===//		//===- HotColdSplitting.cpp -- Outline Cold Regions -------------- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Outline cold regions to a separate function.		// Outline cold regions to a separate function.
// TODO: Update BFI and BPI		// TODO: Update BFI and BPI
// TODO: Add all the outlined functions to a separate section.		// TODO: Add all the outlined functions to a separate section.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/CFG.h"		#include "llvm/Analysis/CFG.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
[... 106 lines not shown ...]	static bool unlikelyExecuted(const BasicBlock &BB) {
return false;		return false;
}		}

/// Check whether it's safe to outline \p BB.		/// Check whether it's safe to outline \p BB.
static bool mayExtractBlock(const BasicBlock &BB) {		static bool mayExtractBlock(const BasicBlock &BB) {
return !BB.hasAddressTaken();		return !BB.hasAddressTaken();
}		}

/// Check whether \p BB is profitable to outline (i.e. its code size cost meets		/// Check whether \p Region is profitable to outline.
/// the threshold set in \p MinOutliningThreshold).		static bool isProfitableToOutline(const BlockSequence &Region,
static bool isProfitableToOutline(const BasicBlock &BB,
TargetTransformInfo &TTI) {		TargetTransformInfo &TTI) {
		if (Region.size() > 1)
		return true;

int Cost = 0;		int Cost = 0;
		const BasicBlock &BB = *Region[0];
for (const Instruction &I : BB) {		for (const Instruction &I : BB) {
if (isa<DbgInfoIntrinsic>(&I) || &I == BB.getTerminator())		if (isa<DbgInfoIntrinsic>(&I) || &I == BB.getTerminator())
continue;		continue;

Cost += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize);		Cost += TTI.getInstructionCost(&I, TargetTransformInfo::TCK_CodeSize);

if (Cost >= (MinOutliningThreshold * TargetTransformInfo::TCC_Basic))		if (Cost >= (MinOutliningThreshold * TargetTransformInfo::TCC_Basic))
return true;		return true;
}		}
return false;		return false;
}		}

/// Identify the maximal region of cold blocks which includes \p SinkBB.		/// Mark \p F cold. Return true if it's changed.
///		static bool markEntireFunctionCold(Function &F) {
/// Include all blocks post-dominated by \p SinkBB, \p SinkBB itself, and all		assert(!F.hasFnAttribute(Attribute::OptimizeNone) && "Can't mark this cold");
/// blocks dominated by \p SinkBB. Exclude all other blocks, and blocks which		bool Changed = false;
/// cannot be outlined.		if (!F.hasFnAttribute(Attribute::MinSize)) {
///		F.addFnAttr(Attribute::MinSize);
/// Return an empty sequence if the cold region is too small to outline, or if		Changed = true;
/// the cold region has no warm predecessors.
static BlockSequence findMaximalColdRegion(BasicBlock &SinkBB,
TargetTransformInfo &TTI,
DominatorTree &DT,
PostDomTree &PDT) {
// The maximal cold region.
BlockSequence ColdRegion = {};

// The ancestor farthest-away from SinkBB, and also post-dominated by it.
BasicBlock *MaxAncestor = &SinkBB;
unsigned MaxAncestorHeight = 0;

// Visit SinkBB's ancestors using inverse DFS.
auto PredIt = ++idf_begin(&SinkBB);
auto PredEnd = idf_end(&SinkBB);
while (PredIt != PredEnd) {
BasicBlock &PredBB = **PredIt;
bool SinkPostDom = PDT.dominates(&SinkBB, &PredBB);

// If SinkBB does not post-dominate a predecessor, do not mark the
// predecessor (or any of its predecessors) cold.
if (!SinkPostDom || !mayExtractBlock(PredBB)) {
PredIt.skipChildren();
continue;
}

// Keep track of the post-dominated ancestor farthest away from the sink.
unsigned AncestorHeight = PredIt.getPathLength();
if (AncestorHeight > MaxAncestorHeight) {
MaxAncestor = &PredBB;
MaxAncestorHeight = AncestorHeight;
}

ColdRegion.push_back(&PredBB);
++PredIt;
}

// CodeExtractor requires that all blocks to be extracted must be dominated
// by the first block to be extracted.
//
// To avoid spurious or repeated outlining, require that the max ancestor
// has a predecessor. By construction this predecessor is not in the cold
// region, i.e. its existence implies we don't outline the whole function.
//
// TODO: If MaxAncestor has no predecessors, we may be able to outline the
// second largest cold region that has a predecessor.
if (pred_empty(MaxAncestor) ||
MaxAncestor->getSinglePredecessor() == MaxAncestor)
return {};

// Filter out predecessors not dominated by the max ancestor.
//
// TODO: Blocks not dominated by the max ancestor could be extracted as
// other cold regions. Marking outlined calls as noreturn when appropriate
// and outlining more than once per function could achieve most of the win.
auto EraseIt = remove_if(ColdRegion, [&](BasicBlock *PredBB) {
return PredBB != MaxAncestor && !DT.dominates(MaxAncestor, PredBB);
});
ColdRegion.erase(EraseIt, ColdRegion.end());

// Add SinkBB to the cold region.
ColdRegion.push_back(&SinkBB);

// Ensure that the first extracted block is the max ancestor.
if (ColdRegion[0] != MaxAncestor) {
auto AncestorIt = find(ColdRegion, MaxAncestor);
*AncestorIt = ColdRegion[0];
ColdRegion[0] = MaxAncestor;
}

// Find all successors of SinkBB dominated by SinkBB using DFS.
auto SuccIt = ++df_begin(&SinkBB);
auto SuccEnd = df_end(&SinkBB);
while (SuccIt != SuccEnd) {
BasicBlock &SuccBB = **SuccIt;
bool SinkDom = DT.dominates(&SinkBB, &SuccBB);

// If SinkBB does not dominate a successor, do not mark the successor (or
// any of its successors) cold.
if (!SinkDom || !mayExtractBlock(SuccBB)) {
SuccIt.skipChildren();
continue;
}

ColdRegion.push_back(&SuccBB);
++SuccIt;
}

if (ColdRegion.size() == 1 && !isProfitableToOutline(*ColdRegion[0], TTI))
return {};

return ColdRegion;
}

/// Get the largest cold region in \p F.
static BlockSequence getLargestColdRegion(Function &F, ProfileSummaryInfo &PSI,
BlockFrequencyInfo *BFI,
TargetTransformInfo &TTI,
DominatorTree &DT, PostDomTree &PDT) {
// Keep track of the largest cold region.
BlockSequence LargestColdRegion = {};

for (BasicBlock &BB : F) {
// Identify cold blocks.
if (!mayExtractBlock(BB))
continue;
bool Cold =
PSI.isColdBlock(&BB, BFI) || (EnableStaticAnalyis && unlikelyExecuted(BB));
if (!Cold)
continue;

LLVM_DEBUG({
dbgs() << "Found cold block:\n";
BB.dump();
});

// Find a maximal cold region we can outline.
BlockSequence ColdRegion = findMaximalColdRegion(BB, TTI, DT, PDT);
if (ColdRegion.empty()) {
LLVM_DEBUG(dbgs() << " Skipping (block not profitable to extract)\n");
continue;
}

++NumColdRegionsFound;

LLVM_DEBUG({
llvm::dbgs() << "Identified cold region with " << ColdRegion.size()
<< " blocks:\n";
for (BasicBlock *BB : ColdRegion)
BB->dump();
});

// TODO: Outline more than one region.
if (ColdRegion.size() > LargestColdRegion.size())
LargestColdRegion = std::move(ColdRegion);
}		}
		// TODO: Move this function into a cold section.
return LargestColdRegion;		return Changed;
}		}

class HotColdSplitting {		class HotColdSplitting {
public:		public:
HotColdSplitting(ProfileSummaryInfo *ProfSI,		HotColdSplitting(ProfileSummaryInfo *ProfSI,
function_ref<BlockFrequencyInfo *(Function &)> GBFI,		function_ref<BlockFrequencyInfo *(Function &)> GBFI,
function_ref<TargetTransformInfo &(Function &)> GTTI,		function_ref<TargetTransformInfo &(Function &)> GTTI,
std::function<OptimizationRemarkEmitter &(Function &)> *GORE)		std::function<OptimizationRemarkEmitter &(Function &)> *GORE)
: PSI(ProfSI), GetBFI(GBFI), GetTTI(GTTI), GetORE(GORE) {}		: PSI(ProfSI), GetBFI(GBFI), GetTTI(GTTI), GetORE(GORE) {}
bool run(Module &M);		bool run(Module &M);

private:		private:
bool shouldOutlineFrom(const Function &F) const;		bool shouldOutlineFrom(const Function &F) const;
		bool outlineColdRegions(Function &F, ProfileSummaryInfo &PSI,
		BlockFrequencyInfo *BFI, TargetTransformInfo &TTI,
		DominatorTree &DT, PostDomTree &PDT,
		OptimizationRemarkEmitter &ORE);
Function *extractColdRegion(const BlockSequence &Region, DominatorTree &DT,		Function *extractColdRegion(const BlockSequence &Region, DominatorTree &DT,
BlockFrequencyInfo *BFI, TargetTransformInfo &TTI,		BlockFrequencyInfo *BFI, TargetTransformInfo &TTI,
OptimizationRemarkEmitter &ORE, unsigned Count);		OptimizationRemarkEmitter &ORE, unsigned Count);
SmallPtrSet<const Function *, 2> OutlinedFunctions;		SmallPtrSet<const Function *, 2> OutlinedFunctions;
ProfileSummaryInfo *PSI;		ProfileSummaryInfo *PSI;
function_ref<BlockFrequencyInfo *(Function &)> GetBFI;		function_ref<BlockFrequencyInfo *(Function &)> GetBFI;
function_ref<TargetTransformInfo &(Function &)> GetTTI;		function_ref<TargetTransformInfo &(Function &)> GetTTI;
std::function<OptimizationRemarkEmitter &(Function &)> *GetORE;		std::function<OptimizationRemarkEmitter &(Function &)> *GetORE;
[... 49 lines not shown ...]

Function *HotColdSplitting::extractColdRegion(const BlockSequence &Region,		Function *HotColdSplitting::extractColdRegion(const BlockSequence &Region,
DominatorTree &DT,		DominatorTree &DT,
BlockFrequencyInfo *BFI,		BlockFrequencyInfo *BFI,
TargetTransformInfo &TTI,		TargetTransformInfo &TTI,
OptimizationRemarkEmitter &ORE,		OptimizationRemarkEmitter &ORE,
unsigned Count) {		unsigned Count) {
assert(!Region.empty());		assert(!Region.empty());
LLVM_DEBUG(for (auto *BB : Region)
llvm::dbgs() << "\nExtracting: " << *BB;);

// TODO: Pass BFI and BPI to update profile information.		// TODO: Pass BFI and BPI to update profile information.
CodeExtractor CE(Region, &DT, /* AggregateArgs */ false, /* BFI */ nullptr,		CodeExtractor CE(Region, &DT, /* AggregateArgs */ false, /* BFI */ nullptr,
/* BPI */ nullptr, /* AllowVarArgs */ false,		/* BPI */ nullptr, /* AllowVarArgs */ false,
/* AllowAlloca */ false,		/* AllowAlloca */ false,
/* Suffix */ "cold." + std::to_string(Count));		/* Suffix */ "cold." + std::to_string(Count));

SetVector<Value *> Inputs, Outputs, Sinks;		SetVector<Value *> Inputs, Outputs, Sinks;
[... 15 lines not shown ...]	if (Function *OutF = CE.extractCodeRegion()) {
if (TTI.useColdCCForColdCall(*OutF)) {		if (TTI.useColdCCForColdCall(*OutF)) {
OutF->setCallingConv(CallingConv::Cold);		OutF->setCallingConv(CallingConv::Cold);
CS.setCallingConv(CallingConv::Cold);		CS.setCallingConv(CallingConv::Cold);
}		}
CI->setIsNoInline();		CI->setIsNoInline();

// Try to make the outlined code as small as possible on the assumption		// Try to make the outlined code as small as possible on the assumption
// that it's cold.		// that it's cold.
assert(!OutF->hasFnAttribute(Attribute::OptimizeNone) &&		markEntireFunctionCold(*OutF);
"An outlined function should never be marked optnone");
OutF->addFnAttr(Attribute::MinSize);

LLVM_DEBUG(llvm::dbgs() << "Outlined Region: " << *OutF);		LLVM_DEBUG(llvm::dbgs() << "Outlined Region: " << *OutF);
ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "HotColdSplit",		return OptimizationRemark(DEBUG_TYPE, "HotColdSplit",
&*Region[0]->begin())		&*Region[0]->begin())
<< ore::NV("Original", OrigF) << " split cold code into "		<< ore::NV("Original", OrigF) << " split cold code into "
<< ore::NV("Split", OutF);		<< ore::NV("Split", OutF);
});		});
return OutF;		return OutF;
}		}

ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "ExtractFailed",		return OptimizationRemarkMissed(DEBUG_TYPE, "ExtractFailed",
&*Region[0]->begin())		&*Region[0]->begin())
<< "Failed to extract region at block "		<< "Failed to extract region at block "
<< ore::NV("Block", Region.front());		<< ore::NV("Block", Region.front());
});		});
return nullptr;		return nullptr;
}		}

		/// A pair of (basic block, score).
		using BlockTy = std::pair<BasicBlock *, unsigned>;

		/// A maximal outlining region. This contains all blocks post-dominated by a
		/// sink block, the sink block itself, and all blocks dominated by the sink.
		class OutliningRegion {
		/// A list of (block, score) pairs. A block's score is non-zero iff it's a
		/// viable sub-region entry point. Blocks with higher scores are better entry
		/// points (i.e. they are more distant ancestors of the sink block).
		SmallVector<BlockTy, 0> Blocks = {};

		/// The suggested entry point into the region. If the region has multiple
		/// entry points, all blocks within the region may not be reachable from this
		/// entry point.
		BasicBlock *SuggestedEntryPoint = nullptr;

		/// Whether the entire function is cold.
		bool EntireFunctionCold = false;

		/// Whether or not \p BB could be the entry point of an extracted region.
		static bool isViableEntryPoint(BasicBlock &BB) { return !BB.isEHPad(); }

		/// If \p BB is a viable entry point, return \p Score. Return 0 otherwise.
		static unsigned getEntryPointScore(BasicBlock &BB, unsigned Score) {
		return isViableEntryPoint(BB) ? Score : 0;
		}

		/// These scores should be lower than the score for predecessor blocks,
		/// because regions starting at predecessor blocks are typically larger.
		static constexpr unsigned ScoreForSuccBlock = 1;
		static constexpr unsigned ScoreForSinkBlock = 1;

		OutliningRegion(const OutliningRegion &) = delete;
		OutliningRegion &operator=(const OutliningRegion &) = delete;

		public:
		OutliningRegion() = default;
		OutliningRegion(OutliningRegion &&) = default;
		OutliningRegion &operator=(OutliningRegion &&) = default;

		static OutliningRegion create(BasicBlock &SinkBB, const DominatorTree &DT,
		const PostDomTree &PDT) {
		OutliningRegion ColdRegion;

		SmallPtrSet<BasicBlock *, 4> RegionBlocks;

		auto addBlockToRegion = [&](BasicBlock *BB, unsigned Score) {
		RegionBlocks.insert(BB);
		ColdRegion.Blocks.emplace_back(BB, Score);
		assert(RegionBlocks.size() == ColdRegion.Blocks.size() && "Duplicate BB");
		};

		// The ancestor farthest-away from SinkBB, and also post-dominated by it.
		unsigned SinkScore = getEntryPointScore(SinkBB, ScoreForSinkBlock);
		ColdRegion.SuggestedEntryPoint = (SinkScore > 0) ? &SinkBB : nullptr;
		unsigned BestScore = SinkScore;

		// Visit SinkBB's ancestors using inverse DFS.
		auto PredIt = ++idf_begin(&SinkBB);
		auto PredEnd = idf_end(&SinkBB);
		while (PredIt != PredEnd) {
		BasicBlock &PredBB = **PredIt;
		bool SinkPostDom = PDT.dominates(&SinkBB, &PredBB);

		// If the predecessor is cold and has no predecessors, the entire
		// function must be cold.
		if (SinkPostDom && pred_empty(&PredBB)) {
		ColdRegion.EntireFunctionCold = true;
		return ColdRegion;
		}

		// If SinkBB does not post-dominate a predecessor, do not mark the
		// predecessor (or any of its predecessors) cold.
		if (!SinkPostDom || !mayExtractBlock(PredBB)) {
		PredIt.skipChildren();
		continue;
		}

		// Keep track of the post-dominated ancestor farthest away from the sink.
		// The path length is always >= 2, ensuring that predecessor blocks are
		// considered as entry points before the sink block.
		unsigned PredScore = getEntryPointScore(PredBB, PredIt.getPathLength());
		if (PredScore > BestScore) {
		ColdRegion.SuggestedEntryPoint = &PredBB;
		BestScore = PredScore;
		}

		addBlockToRegion(&PredBB, PredScore);
		++PredIt;
		}

		// Add SinkBB to the cold region. It's considered as an entry point before
		// any sink-successor blocks.
		addBlockToRegion(&SinkBB, SinkScore);

		// Find all successors of SinkBB dominated by SinkBB using DFS.
		auto SuccIt = ++df_begin(&SinkBB);
		auto SuccEnd = df_end(&SinkBB);
		while (SuccIt != SuccEnd) {
		BasicBlock &SuccBB = **SuccIt;
		bool SinkDom = DT.dominates(&SinkBB, &SuccBB);

		// Don't allow the backwards & forwards DFSes to mark the same block.
		bool DuplicateBlock = RegionBlocks.count(&SuccBB);

		// If SinkBB does not dominate a successor, do not mark the successor (or
		// any of its successors) cold.
		if (DuplicateBlock || !SinkDom || !mayExtractBlock(SuccBB)) {
		SuccIt.skipChildren();
		continue;
		}

		unsigned SuccScore = getEntryPointScore(SuccBB, ScoreForSuccBlock);
		if (SuccScore > BestScore) {
		ColdRegion.SuggestedEntryPoint = &SuccBB;
		BestScore = SuccScore;
		}

		addBlockToRegion(&SuccBB, SuccScore);
		++SuccIt;
		}

		return ColdRegion;
		}

		/// Whether this region has nothing to extract.
		bool empty() const { return !SuggestedEntryPoint; }

		/// The blocks in this region.
		ArrayRef<std::pair<BasicBlock *, unsigned>> blocks() const { return Blocks; }

		/// Whether the entire function containing this region is cold.
		bool isEntireFunctionCold() const { return EntireFunctionCold; }

		/// Remove a sub-region from this region and return it as a block sequence.
		BlockSequence takeSingleEntrySubRegion(DominatorTree &DT) {
		assert(!empty() && !isEntireFunctionCold() && "Nothing to extract");

		// Remove blocks dominated by the suggested entry point from this region.
		// During the removal, identify the next best entry point into the region.
		// Ensure that the first extracted block is the suggested entry point.
		BlockSequence SubRegion = {SuggestedEntryPoint};
		BasicBlock *NextEntryPoint = nullptr;
		unsigned NextScore = 0;
		auto RegionEndIt = Blocks.end();
		auto RegionStartIt = remove_if(Blocks, [&](const BlockTy &Block) {
		BasicBlock *BB = Block.first;
		unsigned Score = Block.second;
		bool InSubRegion =
		BB == SuggestedEntryPoint || DT.dominates(SuggestedEntryPoint, BB);
		if (!InSubRegion && Score > NextScore) {
		NextEntryPoint = BB;
		NextScore = Score;
		}
		if (InSubRegion && BB != SuggestedEntryPoint)
		SubRegion.push_back(BB);
		return InSubRegion;
		});
		Blocks.erase(RegionStartIt, RegionEndIt);

		// Update the suggested entry point.
		SuggestedEntryPoint = NextEntryPoint;

		return SubRegion;
		}
		};
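
The sub-region extraction in takeSingleEntrySubRegion can be tried out in isolation. The following is a minimal sketch, not the patch's code: blocks are plain ints, `IDom` is a hypothetical immediate-dominator array standing in for DominatorTree, and `ToyRegion` is an invented name. It assumes, as the code above does, that a remaining block needs a non-zero score to become the next entry point.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins (not LLVM types): a block is an int, and IDom[B] is
// the immediate dominator of block B, or -1 for the root.
using Block = int;
using ScoredBlock = std::pair<Block, unsigned>;

// True if A dominates B (reflexively) in the toy dominator tree.
static bool dominates(const std::vector<int> &IDom, Block A, Block B) {
  for (int Cur = B; Cur != -1; Cur = IDom[Cur])
    if (Cur == A)
      return true;
  return false;
}

struct ToyRegion {
  std::vector<ScoredBlock> Blocks;
  Block Entry = -1; // -1 means there is nothing left to extract.

  bool empty() const { return Entry == -1; }

  // Remove every block dominated by Entry and return those blocks, with
  // Entry first. While scanning, remember the best-scoring block that stays
  // behind; it becomes the entry point for the next call.
  std::vector<Block> takeSingleEntrySubRegion(const std::vector<int> &IDom) {
    std::vector<Block> Sub = {Entry};
    Block NextEntry = -1;
    unsigned NextScore = 0;
    auto NewEnd = std::remove_if(
        Blocks.begin(), Blocks.end(), [&](const ScoredBlock &SB) {
          bool InSub = dominates(IDom, Entry, SB.first);
          if (!InSub && SB.second > NextScore) {
            NextEntry = SB.first;
            NextScore = SB.second;
          }
          if (InSub && SB.first != Entry)
            Sub.push_back(SB.first);
          return InSub;
        });
    Blocks.erase(NewEnd, Blocks.end());
    Entry = NextEntry;
    return Sub;
  }
};
```

Calling the method repeatedly until empty() returns true peels the region into single-entry pieces, which is the shape of the do/while loop in outlineColdRegions.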

		bool HotColdSplitting::outlineColdRegions(Function &F, ProfileSummaryInfo &PSI,
		BlockFrequencyInfo *BFI,
		TargetTransformInfo &TTI,
		DominatorTree &DT, PostDomTree &PDT,
		OptimizationRemarkEmitter &ORE) {
		bool Changed = false;

		// The set of cold blocks.
		SmallPtrSet<BasicBlock *, 4> ColdBlocks;

		// The worklist of non-intersecting regions left to outline.
		SmallVector<OutliningRegion, 2> OutliningWorklist;

		// Set up an RPO traversal. Experimentally, this performs better (outlines
		// more) than a PO traversal, because we prevent region overlap by keeping
		// the first region to contain a block.
		ReversePostOrderTraversal<Function *> RPOT(&F);

		// Find all cold regions.
		for (BasicBlock *BB : RPOT) {
		// Skip blocks which can't be outlined.
		if (!mayExtractBlock(*BB))
		continue;

		// This block is already part of some outlining region.
		if (ColdBlocks.count(BB))
		continue;

		bool Cold = PSI.isColdBlock(BB, BFI) ||
		(EnableStaticAnalyis && unlikelyExecuted(*BB));
		if (!Cold)
		continue;

		LLVM_DEBUG({
		dbgs() << "Found a cold block:\n";
		BB->dump();
		});

		auto Region = OutliningRegion::create(*BB, DT, PDT);
		if (Region.empty())
		continue;

		if (Region.isEntireFunctionCold()) {
		LLVM_DEBUG(dbgs() << "Entire function is cold\n");
		return markEntireFunctionCold(F);
		}

		// If this outlining region intersects with another, drop the new region.
		//
		// TODO: It's theoretically possible to outline more by only keeping the
		// largest region which contains a block, but the extra bookkeeping to do
		// this is tricky/expensive.
		bool RegionsOverlap = any_of(Region.blocks(), [&](const BlockTy &Block) {
		return !ColdBlocks.insert(Block.first).second;
		});
		if (RegionsOverlap)
		continue;

		OutliningWorklist.emplace_back(std::move(Region));
		++NumColdRegionsFound;
		}

		// Outline single-entry cold regions, splitting up larger regions as needed.
		unsigned OutlinedFunctionID = 1;
		while (!OutliningWorklist.empty()) {
		OutliningRegion Region = OutliningWorklist.pop_back_val();
		assert(!Region.empty() && "Empty outlining region in worklist");
		do {
		BlockSequence SubRegion = Region.takeSingleEntrySubRegion(DT);
		if (!isProfitableToOutline(SubRegion, TTI)) {
		LLVM_DEBUG({
		dbgs() << "Skipping outlining; not profitable to outline\n";
		SubRegion[0]->dump();
		});
		continue;
		}

		LLVM_DEBUG({
		dbgs() << "Hot/cold splitting attempting to outline these blocks:\n";
		for (BasicBlock *BB : SubRegion)
		BB->dump();
		});

		Function *Outlined =
		extractColdRegion(SubRegion, DT, BFI, TTI, ORE, OutlinedFunctionID);
		if (Outlined) {
		++OutlinedFunctionID;
		OutlinedFunctions.insert(Outlined);
		Changed = true;
		}
		} while (!Region.empty());
		}

		return Changed;
		}
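
The overlap filter and the worklist order can be illustrated with a small standalone sketch. It is a simplification under stated assumptions: regions are vectors of ints, each region is outlined whole, and extraction always succeeds (the real loop splits regions into sub-regions and only bumps the ID when extractColdRegion returns a function). The names `keepDisjointRegions` and `drainWorklist` are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Block = int;
using Region = std::vector<Block>;

// Keep a candidate only if none of its blocks was claimed earlier. As in the
// any_of/insert pattern above, claiming and testing happen in a single pass,
// so blocks visited before the first collision stay claimed even when the
// candidate itself is dropped.
static std::vector<Region>
keepDisjointRegions(const std::vector<Region> &Candidates) {
  std::set<Block> Claimed;
  std::vector<Region> Worklist;
  for (const Region &R : Candidates) {
    bool Overlaps = std::any_of(R.begin(), R.end(), [&](Block B) {
      return !Claimed.insert(B).second;
    });
    if (!Overlaps)
      Worklist.push_back(R);
  }
  return Worklist;
}

// Drain the worklist from the back, numbering results from 1. The region
// found last therefore receives the lowest ID.
static std::vector<std::pair<unsigned, Region>>
drainWorklist(std::vector<Region> Worklist) {
  std::vector<std::pair<unsigned, Region>> Out;
  unsigned ID = 1;
  while (!Worklist.empty()) {
    Region R = std::move(Worklist.back());
    Worklist.pop_back();
    Out.emplace_back(ID++, std::move(R));
  }
  return Out;
}
```

With candidates {1,2}, {7,1}, {7,8}, {4,5}, this sketch keeps only {1,2} and {4,5}: {7,1} collides on block 1 after already claiming 7, which in turn causes {7,8} to be dropped. Draining then assigns ID 1 to {4,5} and ID 2 to {1,2}.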

 bool HotColdSplitting::run(Module &M) {
   bool Changed = false;
+  OutlinedFunctions.clear();
   for (auto &F : M) {
     if (!shouldOutlineFrom(F)) {
-      LLVM_DEBUG(llvm::dbgs() << "Not outlining in " << F.getName() << "\n");
+      LLVM_DEBUG(llvm::dbgs() << "Skipping " << F.getName() << "\n");
       continue;
     }
 
     LLVM_DEBUG(llvm::dbgs() << "Outlining in " << F.getName() << "\n");
     DominatorTree DT(F);
     PostDomTree PDT(F);
     PDT.recalculate(F);
     BlockFrequencyInfo *BFI = GetBFI(F);
     TargetTransformInfo &TTI = GetTTI(F);
 
-    BlockSequence ColdRegion = getLargestColdRegion(F, *PSI, BFI, TTI, DT, PDT);
-    if (ColdRegion.empty())
-      continue;
-
     OptimizationRemarkEmitter &ORE = (*GetORE)(F);
-    Function *Outlined =
-        extractColdRegion(ColdRegion, DT, BFI, TTI, ORE, /*Count=*/1);
-    if (Outlined) {
-      OutlinedFunctions.insert(Outlined);
-      Changed = true;
-    }
+    Changed |= outlineColdRegions(F, *PSI, BFI, TTI, DT, PDT, ORE);
   }
   return Changed;
 }
 
 bool HotColdSplittingLegacyPass::runOnModule(Module &M) {
   if (skipModule(M))
     return false;
   ProfileSummaryInfo *PSI =

llvm/trunk/test/Transforms/HotColdSplit/eh-pads.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@foo(
				; CHECK: landingpad
				; CHECK: sideeffect(i32 2)

				; CHECK-LABEL: define {{.*}}@foo.cold.1(
				; CHECK: sideeffect(i32 0)
				; CHECK: sideeffect(i32 1)
				; CHECK: sink

				define void @foo(i32 %cond) personality i8 0 {
				entry:
				invoke void @llvm.donothing() to label %normal unwind label %exception

				exception:
				; Note: EH pads are not candidates for region entry points.
				%cleanup = landingpad i8 cleanup
				br label %continue_exception

				continue_exception:
				call void @sideeffect(i32 0)
				call void @sideeffect(i32 1)
				call void @sink()
				ret void

				normal:
				call void @sideeffect(i32 2)
				ret void
				}

				declare void @sideeffect(i32)

				declare void @sink() cold

				declare void @llvm.donothing() nounwind readnone

llvm/trunk/test/Transforms/HotColdSplit/extraction-subregion-breaks-phis.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@foo(
				; CHECK: call {{.*}}@foo.cold.1(
				; CHECK: unreachable

				; CHECK-LABEL: define {{.*}}@foo.cold.1(
				; CHECK: switch i32 undef, label %sw.epilog.i
				define void @foo(i32 %QMM) {
				entry:
				switch i32 %QMM, label %entry.if.end16_crit_edge [
				i32 1, label %if.then
				]

				entry.if.end16_crit_edge: ; preds = %entry
				br label %if.end16

				if.then: ; preds = %entry
				br i1 undef, label %cond.true.i.i, label %_ZN10StringView8popFrontEv.exit.i

				cond.true.i.i: ; preds = %if.then
				ret void

				_ZN10StringView8popFrontEv.exit.i: ; preds = %if.then
				switch i32 undef, label %sw.epilog.i [
				i32 81, label %if.end16
				i32 82, label %sw.bb4.i
				i32 83, label %sw.bb8.i
				i32 84, label %sw.bb12.i
				i32 65, label %if.end16
				i32 66, label %sw.bb20.i
				i32 67, label %sw.bb24.i
				i32 68, label %sw.bb28.i
				]

				sw.bb4.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.bb8.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.bb12.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.bb20.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.bb24.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.bb28.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				sw.epilog.i: ; preds = %_ZN10StringView8popFrontEv.exit.i
				br label %if.end16

				if.end16: ; preds = %sw.epilog.i, %sw.bb28.i, %sw.bb24.i, %sw.bb20.i, %sw.bb12.i, %sw.bb8.i, %sw.bb4.i, %_ZN10StringView8popFrontEv.exit.i, %_ZN10StringView8popFrontEv.exit.i, %entry.if.end16_crit_edge
				%0 = phi i8 [ 0, %entry.if.end16_crit_edge ], [ 0, %_ZN10StringView8popFrontEv.exit.i ], [ 0, %_ZN10StringView8popFrontEv.exit.i ], [ 1, %sw.bb4.i ], [ 2, %sw.bb8.i ], [ 3, %sw.bb12.i ], [ 1, %sw.bb20.i ], [ 2, %sw.bb24.i ], [ 3, %sw.bb28.i ], [ 0, %sw.epilog.i ]
				unreachable
				}

llvm/trunk/test/Transforms/HotColdSplit/forward-dfs-reaches-marked-block.ll

				; RUN: opt -hotcoldsplit -S < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@fun
				; CHECK: call {{.*}}@fun.cold.1(
				define void @fun() {
				entry:
				br i1 undef, label %if.then, label %if.else

				if.then:
				; This will be marked by the inverse DFS on sink-predecessors.
				br label %sink

				sink:
				call void @sink()

				; Do not allow the forward-DFS on sink-successors to mark the block again.
				br i1 undef, label %if.then, label %if.then.exit

				if.then.exit:
				ret void

				if.else:
				ret void
				}

				declare void @sink() cold

llvm/trunk/test/Transforms/HotColdSplit/mark-the-whole-func-cold.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				; Source:
				;
				; extern __attribute__((cold)) void sink();
				; extern void sideeffect(int);
				; void foo(int cond1, int cond2) {
				; if (cond1) {
				; if (cond2) {
				; sideeffect(0);
				; } else {
				; sideeffect(1);
				; }
				; sink();
				; } else {
				; sideeffect(2);
				; }
				; sink();
				; }

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK: define {{.*}}@_Z3fooii{{.*}}#[[outlined_func_attr:[0-9]+]]
				; CHECK-NOT: _Z3fooii.cold
				; CHECK: attributes #[[outlined_func_attr]] = { {{.*}}minsize
				define void @_Z3fooii(i32, i32) {
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				store i32 %0, i32* %3, align 4
				store i32 %1, i32* %4, align 4
				%5 = load i32, i32* %3, align 4
				%6 = icmp ne i32 %5, 0
				br i1 %6, label %7, label %13

				; <label>:7: ; preds = %2
				%8 = load i32, i32* %4, align 4
				%9 = icmp ne i32 %8, 0
				br i1 %9, label %10, label %11

				; <label>:10: ; preds = %7
				call void @_Z10sideeffecti(i32 0)
				br label %12

				; <label>:11: ; preds = %7
				call void @_Z10sideeffecti(i32 1)
				br label %12

				; <label>:12: ; preds = %11, %10
				call void @_Z4sinkv() #3
				br label %14

				; <label>:13: ; preds = %2
				call void @_Z10sideeffecti(i32 2)
				br label %14

				; <label>:14: ; preds = %13, %12
				call void @_Z4sinkv() #3
				ret void
				}

				declare void @_Z10sideeffecti(i32)

				declare void @_Z4sinkv() cold

llvm/trunk/test/Transforms/HotColdSplit/outline-disjoint-diamonds.ll

				; RUN: opt -S -hotcoldsplit < %s 2>&1 | FileCheck %s

				; CHECK-LABEL: define {{.*}}@fun
				; CHECK: call {{.*}}@fun.cold.2(
				; CHECK-NEXT: ret void
				; CHECK: call {{.*}}@fun.cold.1(
				; CHECK-NEXT: ret void
				define void @fun() {
				entry:
				br i1 undef, label %A.then, label %A.else

				A.else:
				br label %A.then4

				A.then4:
				br i1 undef, label %A.then5, label %A.end

				A.then5:
				br label %A.cleanup

				A.end:
				br label %A.cleanup

				A.cleanup:
				%A.cleanup.dest.slot.0 = phi i32 [ 1, %A.then5 ], [ 0, %A.end ]
				unreachable

				A.then:
				br i1 undef, label %B.then, label %B.else

				B.then:
				ret void

				B.else:
				br label %B.then4

				B.then4:
				br i1 undef, label %B.then5, label %B.end

				B.then5:
				br label %B.cleanup

				B.end:
				br label %B.cleanup

				B.cleanup:
				%B.cleanup.dest.slot.0 = phi i32 [ 1, %B.then5 ], [ 0, %B.end ]
				unreachable
				}

				; CHECK-LABEL: define {{.*}}@fun.cold.1(
				; CHECK: %B.cleanup.dest.slot.0 = phi i32 [ 1, %B.then5 ], [ 0, %B.end ]
				; CHECK-NEXT: unreachable

				; CHECK-LABEL: define {{.*}}@fun.cold.2(
				; CHECK: %A.cleanup.dest.slot.0 = phi i32 [ 1, %A.then5 ], [ 0, %A.end ]
				; CHECK-NEXT: unreachable

llvm/trunk/test/Transforms/HotColdSplit/outline-multiple-entry-region.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				; Source:
				;
				; extern __attribute__((cold)) void sink();
				; extern void sideeffect(int);
				; void foo(int cond1, int cond2) {
				; while (true) {
				; if (cond1) {
				; sideeffect(0); // This is cold (it reaches sink()).
				; break;
				; }
				; if (cond2) {
				; sideeffect(1); // This is cold (it reaches sink()).
				; break;
				; }
				; sideeffect(2);
				; return;
				; }
				; sink();
				; sideeffect(3);
				; }

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@_Z3fooii.cold.1
				; CHECK: call void @_Z10sideeffecti(i32 1)
				; CHECK: call void @_Z10sideeffecti(i32 11)

				; CHECK-LABEL: define {{.*}}@_Z3fooii.cold.2
				; CHECK: call void @_Z10sideeffecti(i32 0)
				; CHECK: call void @_Z10sideeffecti(i32 10)

				; CHECK-LABEL: define {{.*}}@_Z3fooii.cold.3
				; CHECK: call void @_Z4sinkv
				; CHECK: call void @_Z10sideeffecti(i32 3)

				define void @_Z3fooii(i32, i32) {
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				store i32 %0, i32* %3, align 4
				store i32 %1, i32* %4, align 4
				br label %5

				; <label>:5: ; preds = %2
				%6 = load i32, i32* %3, align 4
				%7 = icmp ne i32 %6, 0
				br i1 %7, label %8, label %9

				; <label>:8: ; preds = %5
				call void @_Z10sideeffecti(i32 0)
				call void @_Z10sideeffecti(i32 10)
				br label %14

				; <label>:9: ; preds = %5
				%10 = load i32, i32* %4, align 4
				%11 = icmp ne i32 %10, 0
				br i1 %11, label %12, label %13

				; <label>:12: ; preds = %9
				call void @_Z10sideeffecti(i32 1)
				call void @_Z10sideeffecti(i32 11)
				br label %14

				; <label>:13: ; preds = %9
				call void @_Z10sideeffecti(i32 2)
				br label %15

				; <label>:14: ; preds = %12, %8
				call void @_Z4sinkv() #3
				call void @_Z10sideeffecti(i32 3)
				br label %15

				; <label>:15: ; preds = %14, %13
				ret void
				}

				declare void @_Z10sideeffecti(i32)

				declare void @_Z4sinkv() cold

llvm/trunk/test/Transforms/HotColdSplit/outline-while-loop.ll

		while.end: ; preds = %while.end.loopexit, %while.cond.preheader
		tail call void (...) @sink()
		ret void

		if.end: ; preds = %entry
		tail call void @sideeffect(i32 1)
		ret void
		}

		; This is the same as @foo, but the while loop comes after the sink block.
		; CHECK-LABEL: define {{.*}}@while_loop_after_sink(
		; CHECK: br i1 {{.*}}, label %if.end, label %codeRepl
		; CHECK-LABEL: codeRepl:
		; CHECK-NEXT: call void @while_loop_after_sink.cold.1
		; CHECK-LABEL: if.end:
		; CHECK: call void @sideeffect(i32 1)
		define void @while_loop_after_sink(i32 %cond) {
		entry:
		%tobool = icmp eq i32 %cond, 0
		br i1 %tobool, label %if.end, label %sink

		sink:
		tail call void (...) @sink()
		br label %while.cond.preheader

		while.cond.preheader:
		%cmp3 = icmp sgt i32 %cond, 10
		br i1 %cmp3, label %while.body.preheader, label %while.end

		while.body.preheader: ; preds = %while.cond.preheader
		br label %while.body

		while.body: ; preds = %while.body.preheader, %while.body
		%cond.addr.04 = phi i32 [ %dec, %while.body ], [ %cond, %while.body.preheader ]
		%dec = add nsw i32 %cond.addr.04, -1
		tail call void @sideeffect(i32 0) #3
		%cmp = icmp sgt i32 %dec, 10
		br i1 %cmp, label %while.body, label %while.end.loopexit

		while.end.loopexit: ; preds = %while.body
		br label %while.end

		while.end: ; preds = %while.end.loopexit, %while.cond.preheader
		ret void

		if.end: ; preds = %entry
		tail call void @sideeffect(i32 1)
		ret void
		}

		; CHECK-LABEL: define {{.*}}@foo.cold.1
		; CHECK: phi i32
		; CHECK-NEXT: add nsw i32
		; CHECK-NEXT: call {{.*}}@sideeffect
		; CHECK-NEXT: icmp
		; CHECK-NEXT: br

		; CHECK-LABEL: define {{.*}}@while_loop_after_sink.cold.1
		; CHECK: call {{.*}}@sink
		; CHECK: phi i32
		; CHECK-NEXT: add nsw i32
		; CHECK-NEXT: call {{.*}}@sideeffect
		; CHECK-NEXT: icmp
		; CHECK-NEXT: br

		declare void @sideeffect(i32)

		declare void @sink(...) cold

llvm/trunk/test/Transforms/HotColdSplit/phi-with-distinct-outlined-values.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@foo(
				; CHECK: phi i32 [ 0, %entry ], [ %p.ce.reload, %codeRepl ]

				; CHECK-LABEL: define {{.*}}@foo.cold.1(
				; CHECK: call {{.*}}@sink
				; CHECK: %p.ce = phi i32 [ 1, %coldbb ], [ 3, %coldbb2 ]
				; CHECK-NEXT: store i32 %p.ce, i32* %p.ce.out

				define void @foo(i32 %cond) {
				entry:
				%tobool = icmp eq i32 %cond, 0
				br i1 %tobool, label %if.end, label %coldbb

				coldbb:
				call void @sink()
				call void @sideeffect()
				call void @sideeffect()
				br i1 undef, label %if.end, label %coldbb2

				coldbb2:
				br label %if.end

				if.end:
				%p = phi i32 [0, %entry], [1, %coldbb], [3, %coldbb2]
				ret void
				}

				declare void @sink() cold

				declare void @sideeffect()

llvm/trunk/test/Transforms/HotColdSplit/region-overlap.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				; Source:
				;
				; __attribute__((cold)) extern void sink(int);
				; extern void sideeffect(int);
				; void foo(int cond1, int cond2) {
				; if (cond1) {
				; if (cond2) { // This is the first cold region we visit.
				; sideeffect(0);
				; sideeffect(10);
				; sink(0);
				; }
				;
				; // There's a larger, overlapping cold region here. But we ignore it.
				; // This could be improved.
				; sideeffect(1);
				; sideeffect(11);
				; sink(1);
				; }
				; }

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@_Z3fooii
				; CHECK: call {{.*}}@_Z3fooii.cold.1
				; CHECK-NOT: _Z3fooii.cold
				define void @_Z3fooii(i32, i32) {
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				store i32 %0, i32* %3, align 4
				store i32 %1, i32* %4, align 4
				%5 = load i32, i32* %3, align 4
				%6 = icmp ne i32 %5, 0
				br i1 %6, label %7, label %12

				; <label>:7: ; preds = %2
				%8 = load i32, i32* %4, align 4
				%9 = icmp ne i32 %8, 0
				br i1 %9, label %10, label %11

				; <label>:10: ; preds = %7
				call void @_Z10sideeffecti(i32 0)
				call void @_Z10sideeffecti(i32 10)
				call void @_Z4sinki(i32 0) #3
				br label %11

				; <label>:11: ; preds = %10, %7
				call void @_Z10sideeffecti(i32 1)
				call void @_Z10sideeffecti(i32 11)
				call void @_Z4sinki(i32 1) #3
				br label %12

				; <label>:12: ; preds = %11, %2
				ret void
				}

				; CHECK-LABEL: define {{.*}}@_Z3fooii.cold.1
				; CHECK: call void @_Z10sideeffecti(i32 0)
				; CHECK: call void @_Z10sideeffecti(i32 10)

				declare void @_Z10sideeffecti(i32)

				declare void @_Z4sinki(i32) cold

llvm/trunk/test/Transforms/HotColdSplit/succ-block-with-self-edge.ll

				; RUN: opt -S -hotcoldsplit < %s | FileCheck %s

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-apple-macosx10.14.0"

				; CHECK-LABEL: define {{.*}}@exit_block_with_same_incoming_vals
				; CHECK: call {{.*}}@exit_block_with_same_incoming_vals.cold.1(
				; CHECK-NOT: br i1 undef
				; CHECK: phi i32 [ 0, %entry ], [ %p.ce.reload, %codeRepl ]
				define void @exit_block_with_same_incoming_vals(i32 %cond) {
				entry:
				%tobool = icmp eq i32 %cond, 0
				br i1 %tobool, label %if.end, label %coldbb

				coldbb:
				call void @sink()
				call void @sideeffect()
				call void @sideeffect()
				br i1 undef, label %if.end, label %coldbb2

				coldbb2:
				%p2 = phi i32 [0, %coldbb], [1, %coldbb2]
				br i1 undef, label %if.end, label %coldbb2

				if.end:
				%p = phi i32 [0, %entry], [1, %coldbb], [1, %coldbb2]
				ret void
				}

				; CHECK-LABEL: define {{.*}}@exit_block_with_distinct_incoming_vals
				; CHECK: call {{.*}}@exit_block_with_distinct_incoming_vals.cold.1(
				; CHECK-NOT: br i1 undef
				; CHECK: phi i32 [ 0, %entry ], [ %p.ce.reload, %codeRepl ]
				define void @exit_block_with_distinct_incoming_vals(i32 %cond) {
				entry:
				%tobool = icmp eq i32 %cond, 0
				br i1 %tobool, label %if.end, label %coldbb

				coldbb:
				call void @sink()
				call void @sideeffect()
				call void @sideeffect()
				br i1 undef, label %if.end, label %coldbb2

				coldbb2:
				%p2 = phi i32 [0, %coldbb], [1, %coldbb2]
				br i1 undef, label %if.end, label %coldbb2

				if.end:
				%p = phi i32 [0, %entry], [1, %coldbb], [2, %coldbb2]
				ret void
				}

				declare void @sink() cold

				declare void @sideeffect()