This is an archive of the discontinued LLVM Phabricator instance.


Differential D86832

[HotColdSplit][WIP] Add support for outlining Itanium EH blocks by hoisting calls to eh.typeid.for intrinsic
Needs Review · Public

Authored by rjf on Aug 28 2020, 10:03 PM.


Details

Reviewers

hiraditya
rcorcs
vsk

Summary

Currently, Hot Cold Splitting does not support outlining exception handling regions.

The difficulties of outlining EH regions are as follows:

1. We cannot extract the block containing the invoke; otherwise we may extract the hot branch as well.
2. We cannot extract the entire landing pad block, because the landingpad instruction must be the first non-PHI instruction of the block that the unwind edge targets.
3. It seems possible to split the lpad block in two right after the landingpad instruction and start outlining from there; this is analogous to issue #4, which we address below.
4. The catch.dispatch block may contain a series of calls to the eh.typeid.for intrinsic, which use function-specific type information to decide whether a catch clause matches. As a result, CodeExtractor cannot extract these calls (see the detailed discussion and example in https://bugs.llvm.org/show_bug.cgi?id=39545). Making eh.typeid.for outlining-friendly seems difficult in general: the patch proposed in bug 39545 introduces an entirely new pass to do so.
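A minimal sketch of the lpad split mentioned above, using a hypothetical string-based toy block rather than LLVM IR (`ToyBlock` and `splitAfterLandingPad` are illustrative names, not part of the patch). It assumes the landing pad block has no PHI nodes, so the landingpad is its first instruction, and names the tail `<name>.split` to match the `lpad.split` name in the diff's debug output.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR (not LLVM's API): a block is a name plus an ordered
// list of instruction strings; the last entry is the terminator.
struct ToyBlock {
  std::string Name;
  std::vector<std::string> Insts;
};

// Models splitting a landing pad block at the instruction right after the
// landingpad: the head keeps only the landingpad and gains an unconditional
// branch to the tail; the tail ("<name>.split") receives every remaining
// instruction, including the original terminator.
// Assumption: no PHI nodes, so the landingpad is the first instruction.
std::pair<ToyBlock, ToyBlock> splitAfterLandingPad(const ToyBlock &LPad) {
  assert(LPad.Insts.size() >= 2 && "need a landingpad and a terminator");
  assert(LPad.Insts.front().find("landingpad") != std::string::npos &&
         "a landing pad block must start with its landingpad instruction");
  const std::size_t SplitIdx = 1; // index of the instruction after landingpad
  ToyBlock Tail{LPad.Name + ".split",
                std::vector<std::string>(LPad.Insts.begin() + SplitIdx,
                                         LPad.Insts.end())};
  ToyBlock Head{LPad.Name,
                std::vector<std::string>(LPad.Insts.begin(),
                                         LPad.Insts.begin() + SplitIdx)};
  Head.Insts.push_back("br label %" + Tail.Name);
  return {Head, Tail};
}
```

After the split, the head block still begins with the landingpad and ends in an unconditional branch, while the tail block carries no landingpad constraint and can serve as the entry of an outlined region.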

What remains is the idea of hoisting the eh.typeid.for intrinsic calls further up
the control flow graph. Once the region no longer contains calls to the eh.typeid.for
intrinsic, we can outline it safely; the values computed by the hoisted calls simply
become inputs to the extracted region.

To implement this strategy safely, we need to hoist these calls into the highest
post-landingpad block that dominates them. Otherwise, nested try/catch constructs
may be miscompiled.
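A minimal sketch of the hoisting step on a hypothetical string-based toy CFG rather than LLVM IR (`Block`, `CFG`, `computeDominators`, `isTypeIdFor`, `hoistTypeIdCalls` and `makeExample` are illustrative names, not from the patch). It computes dominator sets with the textbook iterative data-flow equations, where the patch instead queries `DominatorTree::getDescendants`, and moves every `llvm.eh.typeid.for` call in a block dominated by the landing pad to just before the landing pad's terminator. The move keeps SSA valid under the assumption that the intrinsic's operand is a constant type-info global and every use lies in a block the landing pad dominates.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR (not LLVM's API): instructions are strings, the last
// one is the terminator, and successors are listed by block name.
struct Block {
  std::vector<std::string> Insts;
  std::vector<std::string> Succs;
};
using CFG = std::map<std::string, Block>;
using DomSets = std::map<std::string, std::set<std::string>>;

// Textbook iterative dominator sets:
//   Dom(Entry) = {Entry}
//   Dom(B)     = {B} union (intersection of Dom(P) over predecessors P of B)
// Assumes every block is reachable from Entry.
DomSets computeDominators(const CFG &G, const std::string &Entry) {
  std::map<std::string, std::vector<std::string>> Preds;
  std::set<std::string> All;
  for (const auto &[Name, B] : G) {
    All.insert(Name);
    for (const auto &S : B.Succs)
      Preds[S].push_back(Name);
  }
  DomSets Dom;
  for (const auto &Name : All)
    Dom[Name] = All;
  Dom[Entry] = {Entry};
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (const auto &Name : All) {
      if (Name == Entry)
        continue;
      std::set<std::string> New = All;
      for (const auto &P : Preds[Name]) {
        std::set<std::string> Meet; // New intersected with Dom(P)
        for (const auto &D : New)
          if (Dom[P].count(D))
            Meet.insert(D);
        New = std::move(Meet);
      }
      New.insert(Name);
      if (New != Dom[Name]) {
        Dom[Name] = std::move(New);
        Changed = true;
      }
    }
  }
  return Dom;
}

bool isTypeIdFor(const std::string &Inst) {
  return Inst.find("@llvm.eh.typeid.for") != std::string::npos;
}

// Move every eh.typeid.for call found in a block dominated by LPad (LPad
// included) to just before LPad's terminator.
void hoistTypeIdCalls(CFG &G, const std::string &Entry,
                      const std::string &LPad) {
  DomSets Dom = computeDominators(G, Entry);
  std::vector<std::string> Hoisted;
  for (auto &[Name, B] : G) {
    if (!Dom[Name].count(LPad))
      continue; // LPad does not dominate this block; leave it alone.
    std::vector<std::string> Kept;
    for (const auto &I : B.Insts)
      (isTypeIdFor(I) ? Hoisted : Kept).push_back(I);
    B.Insts = std::move(Kept);
  }
  std::vector<std::string> &Insts = G.at(LPad).Insts;
  assert(!Insts.empty() && "landing pad block needs a terminator");
  Insts.insert(Insts.end() - 1, Hoisted.begin(), Hoisted.end());
}

// A single try/catch(int), loosely modeled on the IR Clang emits.
CFG makeExample() {
  CFG G;
  G["entry"] = {{"invoke void @f() to label %cont unwind label %lpad"},
                {"cont", "lpad"}};
  G["lpad"] = {{"%lp = landingpad { ptr, i32 } catch ptr @_ZTIi",
                "br label %catch.dispatch"},
               {"catch.dispatch"}};
  G["catch.dispatch"] = {{"%sel = extractvalue { ptr, i32 } %lp, 1",
                          "%tid = call i32 @llvm.eh.typeid.for(ptr @_ZTIi)",
                          "%match = icmp eq i32 %sel, %tid",
                          "br i1 %match, label %catch, label %eh.resume"},
                         {"catch", "eh.resume"}};
  G["catch"] = {{"br label %cont"}, {"cont"}};
  G["eh.resume"] = {{"resume { ptr, i32 } %lp"}, {}};
  G["cont"] = {{"ret void"}, {}};
  return G;
}
```

In `makeExample()`, `catch.dispatch` is dominated by `lpad`, so its type-ID call moves into `lpad` just before the branch, whereas `cont` is also reachable from `entry` and is left untouched. The patch performs the hoist after splitting the landing pad, so the moved calls end up in the head block, outside the `lpad.split` region.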

Empirical evaluation with -Os on Firefox yields the following data:

-Os, EH outlining enabled: 142678 cold regions detected / 62985 cold regions outlined, code size: 2.187481424 GB

-Os, EH outlining disabled: 142081 cold regions detected/62982 cold regions outlined, code size: 2.188262032 GB
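For scale, the two -Os data points differ as follows. This is plain arithmetic on the reported numbers; if "GB" here means 10^9 bytes, the size difference is roughly 0.78 MB.

```latex
\begin{aligned}
\Delta_{\text{size}} &= 2.188262032\ \text{GB} - 2.187481424\ \text{GB} = 0.000780608\ \text{GB} \\
\frac{\Delta_{\text{size}}}{2.188262032\ \text{GB}} &\approx 3.57 \times 10^{-4} \approx 0.036\% \\
\Delta_{\text{detected}} &= 142678 - 142081 = 597 \\
\Delta_{\text{outlined}} &= 62985 - 62982 = 3
\end{aligned}
```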

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rjf created this revision · Aug 28 2020, 10:03 PM

Herald added a project: Restricted Project · Aug 28 2020, 10:03 PM

Herald added subscribers: llvm-commits, danielkiss.

rjf requested review of this revision · Aug 28 2020, 10:03 PM

This is marked WIP because it might be better to do the code transformation inside CodeExtractor instead of HotColdSplitting. In HCS, the transformation currently runs before code extraction, so the IR is rewritten even if the region later turns out not to be worth extracting. If we do it in CodeExtractor instead, we'll need another implementation of OutliningRegion to detect the SESE region enclosing the EH blocks.

Update comments.

Add test case for EH outlining

Harbormaster completed remote builds in B70010: Diff 288758 · Aug 28 2020, 10:41 PM

Harbormaster completed remote builds in B70013: Diff 288761 · Aug 28 2020, 11:04 PM

Harbormaster completed remote builds in B70011: Diff 288759 · Aug 28 2020, 11:10 PM

Matt added a subscriber: Matt · Mar 3 2021, 7:06 AM

Revision Contents

Path: llvm/lib/Transforms/IPO/HotColdSplitting.cpp

Size: 96 lines

Diff 288758

llvm/lib/Transforms/IPO/HotColdSplitting.cpp

@@ 63 lines not shown @@
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Cloning.h"		#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/CodeExtractor.h"		#include "llvm/Transforms/Utils/CodeExtractor.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ValueMapper.h"		#include "llvm/Transforms/Utils/ValueMapper.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
		#include <set>
#include <string>		#include <string>

#define DEBUG_TYPE "hotcoldsplit"		#define DEBUG_TYPE "hotcoldsplit"

STATISTIC(NumColdRegionsFound, "Number of cold regions found.");		STATISTIC(NumColdRegionsFound, "Number of cold regions found.");
STATISTIC(NumColdRegionsOutlined, "Number of cold regions outlined.");		STATISTIC(NumColdRegionsOutlined, "Number of cold regions outlined.");

using namespace llvm;		using namespace llvm;
@@ 12 lines not shown @@ cl::desc("Enable placement of extracted cold functions"
" into a separate section after hot-cold splitting."));		" into a separate section after hot-cold splitting."));

static cl::opt<std::string>		static cl::opt<std::string>
ColdSectionName("hotcoldsplit-cold-section-name", cl::init("__llvm_cold"),		ColdSectionName("hotcoldsplit-cold-section-name", cl::init("__llvm_cold"),
cl::Hidden,		cl::Hidden,
cl::desc("Name for the section containing cold functions "		cl::desc("Name for the section containing cold functions "
"extracted by hot-cold splitting."));		"extracted by hot-cold splitting."));

		static cl::opt<bool>
		OutlineEH("hotcoldsplit-outline-eh", cl::init(false), cl::Hidden,
		cl::desc("Perform outlining for Itanium ABI-based"
		" exception handling blocks."));

namespace {		namespace {
// Same as blockEndsInUnreachable in CodeGen/BranchFolding.cpp. Do not modify		// Same as blockEndsInUnreachable in CodeGen/BranchFolding.cpp. Do not modify
// this function unless you modify the MBB version as well.		// this function unless you modify the MBB version as well.
//		//
/// A no successor, non-return block probably ends in unreachable and is cold.		/// A no successor, non-return block probably ends in unreachable and is cold.
/// Also consider a block that ends in an indirect branch to be a return block,		/// Also consider a block that ends in an indirect branch to be a return block,
/// since many targets use plain indirect branches to return.		/// since many targets use plain indirect branches to return.
bool blockEndsInUnreachable(const BasicBlock &BB) {		bool blockEndsInUnreachable(const BasicBlock &BB) {
if (!succ_empty(&BB))		if (!succ_empty(&BB))
return false;		return false;
if (BB.empty())		if (BB.empty())
return true;		return true;
const Instruction *I = BB.getTerminator();		const Instruction *I = BB.getTerminator();
return !(isa<ReturnInst>(I) || isa<IndirectBrInst>(I));		return !(isa<ReturnInst>(I) || isa<IndirectBrInst>(I));
}		}

bool unlikelyExecuted(BasicBlock &BB, ProfileSummaryInfo *PSI,		bool unlikelyExecuted(BasicBlock &BB, ProfileSummaryInfo *PSI,
BlockFrequencyInfo *BFI) {		BlockFrequencyInfo *BFI) {
// Exception handling blocks are unlikely executed.		// Exception handling blocks are unlikely executed.
if (BB.isEHPad() || isa<ResumeInst>(BB.getTerminator()))		if (!OutlineEH && (BB.isEHPad() || isa<ResumeInst>(BB.getTerminator())))
return true;		return true;

// The block is cold if it calls/invokes a cold function. However, do not		// The block is cold if it calls/invokes a cold function. However, do not
// mark sanitizer traps as cold.		// mark sanitizer traps as cold.
for (Instruction &I : BB)		for (Instruction &I : BB)
if (auto *CB = dyn_cast<CallBase>(&I))		if (auto *CB = dyn_cast<CallBase>(&I))
if (CB->hasFnAttr(Attribute::Cold) && !CB->getMetadata("nosanitize"))		if (CB->hasFnAttr(Attribute::Cold) && !CB->getMetadata("nosanitize"))
return true;		return true;
@@ 207 lines not shown @@ Function *HotColdSplitting::extractColdRegion(
int OutliningBenefit = getOutliningBenefit(Region, TTI);		int OutliningBenefit = getOutliningBenefit(Region, TTI);
int OutliningPenalty =		int OutliningPenalty =
getOutliningPenalty(Region, Inputs.size(), Outputs.size());		getOutliningPenalty(Region, Inputs.size(), Outputs.size());
LLVM_DEBUG(dbgs() << "Split profitability: benefit = " << OutliningBenefit		LLVM_DEBUG(dbgs() << "Split profitability: benefit = " << OutliningBenefit
<< ", penalty = " << OutliningPenalty << "\n");		<< ", penalty = " << OutliningPenalty << "\n");
if (OutliningBenefit <= OutliningPenalty)		if (OutliningBenefit <= OutliningPenalty)
return nullptr;		return nullptr;

		LLVM_DEBUG(dbgs() << "Attempting to outline region into function\n");
		LLVM_DEBUG(dbgs() << "Region size = " << Region.size() << "\n");
		LLVM_DEBUG(dbgs() << "Region entry block = " << Region[0]->getName() << "\n");

Function *OrigF = Region[0]->getParent();		Function *OrigF = Region[0]->getParent();
if (Function *OutF = CE.extractCodeRegion(CEAC)) {		if (Function *OutF = CE.extractCodeRegion(CEAC)) {
User *U = *OutF->user_begin();		User *U = *OutF->user_begin();
CallInst *CI = cast<CallInst>(U);		CallInst *CI = cast<CallInst>(U);
NumColdRegionsOutlined++;		NumColdRegionsOutlined++;
if (TTI.useColdCCForColdCall(*OutF)) {		if (TTI.useColdCCForColdCall(*OutF)) {
OutF->setCallingConv(CallingConv::Cold);		OutF->setCallingConv(CallingConv::Cold);
CI->setCallingConv(CallingConv::Cold);		CI->setCallingConv(CallingConv::Cold);
@@ 14 lines not shown @@ ORE.emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "HotColdSplit",		return OptimizationRemark(DEBUG_TYPE, "HotColdSplit",
&*Region[0]->begin())		&*Region[0]->begin())
<< ore::NV("Original", OrigF) << " split cold code into "		<< ore::NV("Original", OrigF) << " split cold code into "
<< ore::NV("Split", OutF);		<< ore::NV("Split", OutF);
});		});
return OutF;		return OutF;
}		}

		LLVM_DEBUG(dbgs() << "CodeExtractor failed to extract the region\n");
ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "ExtractFailed",		return OptimizationRemarkMissed(DEBUG_TYPE, "ExtractFailed",
&*Region[0]->begin())		&*Region[0]->begin())
<< "Failed to extract region at block "		<< "Failed to extract region at block "
<< ore::NV("Block", Region.front());		<< ore::NV("Block", Region.front());
});		});
return nullptr;		return nullptr;
}		}
@@ 206 lines not shown @@ bool HotColdSplitting::outlineColdRegions(Function &F, bool HasProfileSummary) {
BlockFrequencyInfo *BFI = nullptr;		BlockFrequencyInfo *BFI = nullptr;
if (HasProfileSummary)		if (HasProfileSummary)
BFI = GetBFI(F);		BFI = GetBFI(F);

TargetTransformInfo &TTI = GetTTI(F);		TargetTransformInfo &TTI = GetTTI(F);
OptimizationRemarkEmitter &ORE = (*GetORE)(F);		OptimizationRemarkEmitter &ORE = (*GetORE)(F);
AssumptionCache *AC = LookupAC(F);		AssumptionCache *AC = LookupAC(F);

		// For each landing pad, hoist the calls to the eh.typeid.for
		// intrinsic from the blocks it dominates (e.g. catch.dispatch)
		// into the landingpad block for outlining purposes.

		std::set<BasicBlock *> LPadSuccessors;

		// Split EH pad blocks into a landing pad block and the
		// rest. We can start outlining at the first non-landingpad
		// instruction.

		// The EH outlining strategy below only works with Itanium-style EH.
		// WinEH outlining is not supported. If the personality function is
		// WinEH's (CxxFrameHandler3), we skip EH outlining; otherwise we try it.
		// TODO find a better way of finding out if the function uses WinEH
		// handling, or find a way to outline WinEH code.
		if (OutlineEH && F.hasPersonalityFn() &&
		!F.getPersonalityFn()->getName().endswith("CxxFrameHandler3")) {
		for (BasicBlock *BB : RPOT)
		if (BB->isEHPad()) {
		LLVM_DEBUG({
		dbgs() << "Found an EH Basic Block: ";
		BB->dump();
		dbgs() << "------------------------\n";
		});

		if (!DT)
		DT = std::make_unique<DominatorTree>(F);
		std::vector<Instruction *> EHIntrinsicCalls;
		SmallVector<BasicBlock *, 2> Descendants;
		DT->getDescendants(BB, Descendants);
		for (BasicBlock *SuccBB : Descendants) {
		for (Instruction &I : *SuccBB) {
		if (isa<CallInst>(&I)) {
		const CallInst *CI = dyn_cast<CallInst>(&I);
		if (CI->getIntrinsicID() == Intrinsic::eh_typeid_for)
		EHIntrinsicCalls.push_back(&I);
		}
		}
		}
		Instruction *LPadInst = BB->getLandingPadInst()->getNextNode();
		BasicBlock *NewSuccessorBlock = SplitBlock(BB, LPadInst, DT.get());
		for (size_t I = 0; I < EHIntrinsicCalls.size(); I++) {
		EHIntrinsicCalls[I]->removeFromParent();
		// Insert eh.typeid.for call after the landingpad instruction.
		// We split \p BB from the next instruction after the landingpad
		// instruction, so the landingpad instruction's successor
		// must be the terminating unconditional branch.
		Instruction *PreBranchInst = BB->getTerminator()->getPrevNode();
		BB->getInstList().insertAfter(PreBranchInst->getIterator(),
		EHIntrinsicCalls[I]);
		}
		LLVM_DEBUG({
		dbgs()
		<< "EH Outliner: Split BB into lpad and lpad.split, lpad.split: ";
		NewSuccessorBlock->dump();
		dbgs() << "---------------------\n";
		dbgs() << "EH Outliner: lpad:";
		BB->dump();
		dbgs() << "---------------------\n";
		});
		LPadSuccessors.insert(NewSuccessorBlock);
		}
		}

// Find all cold regions.		// Find all cold regions.
for (BasicBlock *BB : RPOT) {		for (BasicBlock *BB : RPOT) {
// This block is already part of some outlining region.		// This block is already part of some outlining region.
if (ColdBlocks.count(BB))		if (ColdBlocks.count(BB))
continue;		continue;

bool Cold = (BFI && PSI->isColdBlock(BB, BFI)) ||		bool Cold = (BFI && PSI->isColdBlock(BB, BFI)) ||
(EnableStaticAnalysis && unlikelyExecuted(*BB, PSI, BFI));		(EnableStaticAnalysis && unlikelyExecuted(*BB, PSI, BFI));

		if (OutlineEH && EnableStaticAnalysis && BB->getSinglePredecessor() &&
		BB->getSinglePredecessor()->isEHPad()) {
		LLVM_DEBUG(dbgs() << "EH Outliner: block " << BB->getName()
		<< " has EHPad predecessor and marked as cold\n");
		Cold = true;
		}

		// if BB is a split EH-pad block
		if (OutlineEH && LPadSuccessors.find(BB) != LPadSuccessors.end()) {
		LLVM_DEBUG(dbgs() << "EH Outliner: Found a LPad successor block "
		<< BB->getName() << "\n");
		Cold = true;
		}

if (!Cold)		if (!Cold)
continue;		continue;

LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "Found a cold block:\n";		dbgs() << "Found a cold block:\n";
BB->dump();		BB->dump();
});		});

@@ 110 lines not shown @@ auto LookupAC = [this](Function &F) -> AssumptionCache * {
if (auto *ACT = getAnalysisIfAvailable<AssumptionCacheTracker>())		if (auto *ACT = getAnalysisIfAvailable<AssumptionCacheTracker>())
return ACT->lookupAssumptionCache(F);		return ACT->lookupAssumptionCache(F);
return nullptr;		return nullptr;
};		};

return HotColdSplitting(PSI, GBFI, GTTI, &GetORE, LookupAC).run(M);		return HotColdSplitting(PSI, GBFI, GTTI, &GetORE, LookupAC).run(M);
}		}

PreservedAnalyses		PreservedAnalyses HotColdSplittingPass::run(Module &M,
HotColdSplittingPass::run(Module &M, ModuleAnalysisManager &AM) {		ModuleAnalysisManager &AM) {
auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();		auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();

auto LookupAC = [&FAM](Function &F) -> AssumptionCache * {		auto LookupAC = [&FAM](Function &F) -> AssumptionCache * {
return FAM.getCachedResult<AssumptionAnalysis>(F);		return FAM.getCachedResult<AssumptionAnalysis>(F);
};		};

auto GBFI = [&FAM](Function &F) {		auto GBFI = [&FAM](Function &F) {
return &FAM.getResult<BlockFrequencyAnalysis>(F);		return &FAM.getResult<BlockFrequencyAnalysis>(F);
@@ 32 lines not shown @@
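The two EH-specific decisions in this diff, the personality-name gate in `outlineColdRegions` and the extra coldness rules, can be modeled in isolation. This is a simplified sketch with hypothetical names (`endsWith`, `shouldTryEHOutlining`, `BlockInfo`, `isStaticallyCold`); it assumes `EnableStaticAnalysis` is on and no profile data is present, and it omits the remaining, unchanged checks in `unlikelyExecuted()`. Regarding the TODO about detecting WinEH, LLVM's `classifyEHPersonality()` and `isFuncletEHPersonality()` helpers may be a sturdier alternative to matching on the personality name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Suffix test standing in for StringRef::endswith in the diff.
bool endsWith(const std::string &S, const std::string &Suffix) {
  return S.size() >= Suffix.size() &&
         S.compare(S.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// EH outlining is attempted only when the flag is on and the function has a
// personality whose name does not end in "CxxFrameHandler3".
bool shouldTryEHOutlining(bool OutlineEH, bool HasPersonality,
                          const std::string &PersonalityName) {
  return OutlineEH && HasPersonality &&
         !endsWith(PersonalityName, "CxxFrameHandler3");
}

// Hypothetical per-block facts used by the simplified coldness rule below.
struct BlockInfo {
  std::string Name;
  bool IsEHPad = false;
  bool EndsInResume = false;
  bool SinglePredIsEHPad = false;
  bool CallsColdFunction = false; // a `cold` callee without !nosanitize
};

// Simplified static coldness as read from the diff (assumptions above).
bool isStaticallyCold(const BlockInfo &BB, bool OutlineEH,
                      const std::set<std::string> &LPadSuccessors) {
  // Without the flag, EH pads and resume blocks are cold, as before.
  if (!OutlineEH && (BB.IsEHPad || BB.EndsInResume))
    return true;
  if (BB.CallsColdFunction)
    return true;
  // With the flag, the landing pad itself is not forced cold, but the block
  // split off after it (or any block whose single predecessor is an EH pad)
  // is treated as cold.
  return OutlineEH &&
         (BB.SinglePredIsEHPad || LPadSuccessors.count(BB.Name) != 0);
}
```

The design point the model makes visible is that, with the flag on, the landing pad block stays in the original function while its `.split` successor becomes a cold seed for region growing.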