This is an archive of the discontinued LLVM Phabricator instance.

Implement callsite-hotness based inline cost for Sample-based PGO
Closed, Public

Authored by danielcdh on Jul 7 2016, 4:07 PM.

Details

Reviewers

dnovillo
davidxl
eraman

Commits

rG9232f98279eb: Implement callsite-hotness based inline cost for Sample-based PGO
rL275073: Implement callsite-hotness based inline cost for Sample-based PGO

Summary

For sample-based PGO, using BFI to calculate the callsite count is sometimes inaccurate. With a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency is calculated inaccurately because the cold branches have too few samples.

E.g.

if (A1 && A2 && A3 && ..... && A10) {
  for (i=0; i < 100000000; i++) {
    callsite();
  }
}

Assume that A1 to A10 are all 100% taken, and callsite has 1000 samples and thus is considered hot. Because the loop's trip count is huge, it's normal that the branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming that the static heuristic thinks each branch is 50% taken, the count calculated from BFI will be 1/(2^10) of the actual value.
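
The 1/(2^10) figure can be checked with a small standalone sketch. This is a simplified model, not BFI itself: `survivingFraction` and `estimatedCount` are hypothetical helpers that only multiply the assumed per-branch probabilities together.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the fraction of the true count that
// remains when each of NumGuards always-taken branches is instead assumed to
// be taken with probability AssumedProb.
double survivingFraction(unsigned NumGuards, double AssumedProb) {
  assert(AssumedProb >= 0.0 && AssumedProb <= 1.0 && "not a probability");
  double Fraction = 1.0;
  for (unsigned I = 0; I < NumGuards; ++I)
    Fraction *= AssumedProb;
  return Fraction;
}

// Count this simplified model would report for a callsite whose sampled
// count is ActualCount.
double estimatedCount(uint64_t ActualCount, unsigned NumGuards,
                      double AssumedProb) {
  return static_cast<double>(ActualCount) *
         survivingFraction(NumGuards, AssumedProb);
}
```

Under these assumptions, 10 guards at 50% scale a callsite with 1000 samples down to 1000/1024, which is less than one, so the callsite no longer looks hot.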

To get a more accurate callsite count, we annotate the weight directly on the call instruction and use that weight when checking callsite hotness.

Note that this mechanism can also be shared by instrumentation-based callsite hotness analysis. A side benefit is that it removes the Inliner's dependency on BFI, because the call count is embedded in the IR.

Event Timeline

danielcdh updated this revision to Diff 63154. Jul 7 2016, 4:07 PM

danielcdh retitled this revision from to Implement callsite-hotness based inline cost for Sample-based PGO.

danielcdh updated this object.

danielcdh added reviewers: davidxl, eraman, dnovillo.

danielcdh added a subscriber: llvm-commits.

LGTM with a couple of small changes.

lib/IR/Metadata.cpp
1323	This will leave TotalVal in an undefined state. Perhaps initialize it to 0 at the start?
1327	Likewise.

This revision is now accepted and ready to land. Jul 11 2016, 7:48 AM

integrate Diego's comments.

danielcdh closed this revision. Jul 11 2016, 9:56 AM

Revision Contents

Path                                            Size
include/llvm/IR/Instruction.h                   5 lines
lib/Analysis/InlineCost.cpp                     9 lines
lib/IR/Metadata.cpp                             25 lines
lib/Transforms/IPO/SampleProfile.cpp            13 lines
test/Transforms/Inline/inline-hot-callsite.ll   52 lines

Diff 63525

include/llvm/IR/Instruction.h

[216 lines not shown; context: public:]
   /// Sets the metadata on this instruction from the AAMDNodes structure.
   void setAAMetadata(const AAMDNodes &N);

   /// Retrieve the raw weight values of a conditional branch or select.
   /// Returns true on success with profile weights filled in.
   /// Returns false if no metadata or invalid metadata was found.
   bool extractProfMetadata(uint64_t &TrueVal, uint64_t &FalseVal);

+  /// Retrieve total raw weight values of a branch.
+  /// Returns true on success with profile total weights filled in.
+  /// Returns false if no metadata was found.
+  bool extractProfTotalWeight(uint64_t &TotalVal);
+
   /// Set the debug location information for this instruction.
   void setDebugLoc(DebugLoc Loc) { DbgLoc = std::move(Loc); }

   /// Return the debug location for this node as a DebugLoc.
   const DebugLoc &getDebugLoc() const { return DbgLoc; }

   /// Set or clear the nsw flag on this instruction, which must be an operator
   /// which supports this flag. See LangRef.html for the meaning of this flag.
[333 lines not shown]

lib/Analysis/InlineCost.cpp

[627 lines not shown; context: if (DefaultInlineThreshold.getNumOccurrences() > 0) {]
     // If -inline-threshold is not given, listen to the optsize and minsize
     // attributes when they would decrease the threshold.
     if (Caller->optForMinSize() && OptMinSizeThreshold < Threshold)
       Threshold = OptMinSizeThreshold;
     else if (Caller->optForSize() && OptSizeThreshold < Threshold)
       Threshold = OptSizeThreshold;
   }

+  bool HotCallsite = false;
+  uint64_t TotalWeight;
+  if (CS.getInstruction()->extractProfTotalWeight(TotalWeight) &&
+      PSI->isHotCount(TotalWeight))
+    HotCallsite = true;
+
   // Listen to the inlinehint attribute or profile based hotness information
   // when it would increase the threshold and the caller does not need to
   // minimize its size.
   bool InlineHint = Callee.hasFnAttribute(Attribute::InlineHint) ||
-                    PSI->isHotFunction(&Callee);
+                    PSI->isHotFunction(&Callee) ||
+                    HotCallsite;
   if (InlineHint && HintThreshold > Threshold && !Caller->optForMinSize())
     Threshold = HintThreshold;

   bool ColdCallee = PSI->isColdFunction(&Callee);
   // Command line argument for DefaultInlineThreshold will override the default
   // ColdThreshold. If we have -inline-threshold but no -inlinecold-threshold,
   // do not use the default cold threshold even if it is smaller.
   if ((DefaultInlineThreshold.getNumOccurrences() == 0 ||
[898 lines not shown]
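
The threshold bump in this hunk can be modelled without LLVM as a rough sketch. `CallsiteInfo`, `isHotCount` and `bumpThreshold` below are hypothetical stand-ins, and the rule that a count is hot once it reaches a fixed threshold is an assumption made for illustration, not a statement about how `ProfileSummaryInfo` decides hotness.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs for one inlining decision; none of these are LLVM APIs.
struct CallsiteInfo {
  bool CalleeHasInlineHint;
  bool CalleeIsHotFunction;
  bool CallerOptForMinSize;
  bool HasProfWeight;       // true if the call carries a "branch_weights" total
  uint64_t ProfTotalWeight; // meaningful only when HasProfWeight is true
};

// Assumed hotness rule: a count is hot when it reaches the threshold.
bool isHotCount(uint64_t Count, uint64_t HotCountThreshold) {
  assert(HotCountThreshold > 0 && "sketch assumes a profile summary exists");
  return Count >= HotCountThreshold;
}

// Returns the inline threshold after the hint/hotness bump from the hunk.
int bumpThreshold(int Threshold, int HintThreshold, uint64_t HotCountThreshold,
                  const CallsiteInfo &CS) {
  bool HotCallsite =
      CS.HasProfWeight && isHotCount(CS.ProfTotalWeight, HotCountThreshold);
  bool InlineHint =
      CS.CalleeHasInlineHint || CS.CalleeIsHotFunction || HotCallsite;
  if (InlineHint && HintThreshold > Threshold && !CS.CallerOptForMinSize)
    return HintThreshold;
  return Threshold;
}
```

With a base threshold of 0 and a hint threshold of 100, a callsite weight of 300 gets the higher threshold in this model while a weight of 1 does not, given an assumed hot-count threshold of 100.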

lib/IR/Metadata.cpp

[1,306 lines not shown; context: if (!CITrue || !CIFalse)]
     return false;

   TrueVal = CITrue->getValue().getZExtValue();
   FalseVal = CIFalse->getValue().getZExtValue();

   return true;
 }

+bool Instruction::extractProfTotalWeight(uint64_t &TotalVal) {
+  assert((getOpcode() == Instruction::Br ||
+          getOpcode() == Instruction::Select ||
+          getOpcode() == Instruction::Call) &&
+         "Looking for branch weights on something besides branch");
+
+  TotalVal = 0;
+  auto *ProfileData = getMetadata(LLVMContext::MD_prof);
+  if (!ProfileData)
+    return false;
+
+  auto *ProfDataName = dyn_cast<MDString>(ProfileData->getOperand(0));
+  if (!ProfDataName || !ProfDataName->getString().equals("branch_weights"))
+    return false;
+
+  TotalVal = 0;
+  for (int i = 1; i < ProfileData->getNumOperands(); i++) {
+    auto *V = mdconst::dyn_extract<ConstantInt>(ProfileData->getOperand(i));
+    if (!V)
+      return false;
+    TotalVal += V->getValue().getZExtValue();
+  }
+  return true;
+}
+
 void Instruction::clearMetadataHashEntries() {
   assert(hasMetadataHashEntry() && "Caller should check");
   getContext().pImpl->InstructionMetadata.erase(this);
   setHasMetadataHashEntry(false);
 }

 void GlobalObject::getMetadata(unsigned KindID,
                                SmallVectorImpl<MDNode *> &MDs) const {
[107 lines not shown]
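
The control flow of extractProfTotalWeight can be exercised outside LLVM with a minimal model. `ProfNode` is a hypothetical stand-in for an `!prof` node (a name followed by integer weights), and the sketch leaves out the non-integer-operand case that the real function rejects.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an !prof metadata node; not an LLVM type.
struct ProfNode {
  std::string Name;
  std::vector<uint64_t> Weights;
};

// Sums all weights of a "branch_weights" node. TotalVal is zeroed up front so
// that it is well defined on every failure path.
bool extractTotalWeight(const ProfNode *Prof, uint64_t &TotalVal) {
  TotalVal = 0;
  if (!Prof)
    return false;
  if (Prof->Name != "branch_weights")
    return false;
  for (uint64_t W : Prof->Weights) {
    TotalVal += W;
    assert(TotalVal >= W && "weight sum overflowed");
  }
  return true;
}
```

A call annotated with a single weight yields that weight as its total, while a branch with several weights yields their sum.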

lib/Transforms/IPO/SampleProfile.cpp

[967 lines not shown; context: void SampleProfileLoader::propagateWeights(Function &F) {]

   // Generate MD_prof metadata for every branch instruction using the
   // edge weights computed during propagation.
   DEBUG(dbgs() << "\nPropagation complete. Setting branch weights\n");
   LLVMContext &Ctx = F.getContext();
   MDBuilder MDB(Ctx);
   for (auto &BI : F) {
     BasicBlock *BB = &BI;
+
+    if (BlockWeights[BB]) {
+      for (auto &I : BB->getInstList()) {
+        if (CallInst *CI = dyn_cast<CallInst>(&I)) {
+          if (!dyn_cast<IntrinsicInst>(&I)) {
+            SmallVector<uint32_t, 1> Weights;
+            Weights.push_back(BlockWeights[BB]);
+            CI->setMetadata(LLVMContext::MD_prof,
+                            MDB.createBranchWeights(Weights));
+          }
+        }
+      }
+    }
     TerminatorInst *TI = BB->getTerminator();
     if (TI->getNumSuccessors() == 1)
       continue;
     if (!isa<BranchInst>(TI) && !isa<SwitchInst>(TI))
       continue;

     DEBUG(dbgs() << "\nGetting weights for branch at line "
                  << TI->getDebugLoc().getLine() << ".\n");
[254 lines not shown]
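
The annotation step added here amounts to copying the block weight onto every non-intrinsic call in a block with a nonzero weight. A minimal sketch of that rule follows, using hypothetical `Inst` and `Block` types in place of the LLVM classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for an instruction and a basic block; not LLVM types.
struct Inst {
  bool IsCall;
  bool IsIntrinsic;
  bool HasProfWeight;
  uint32_t ProfWeight;
};

struct Block {
  uint32_t Weight; // sample weight of the block, 0 if it has none
  std::vector<Inst> Insts;
};

// Copies the block weight onto each non-intrinsic call in the block and
// returns how many calls were annotated.
unsigned annotateCalls(Block &BB) {
  if (BB.Weight == 0)
    return 0;
  unsigned NumAnnotated = 0;
  for (Inst &I : BB.Insts) {
    assert((!I.IsIntrinsic || I.IsCall) && "an intrinsic is a kind of call");
    if (!I.IsCall || I.IsIntrinsic)
      continue;
    I.HasProfWeight = true;
    I.ProfWeight = BB.Weight;
    ++NumAnnotated;
  }
  return NumAnnotated;
}
```

In this model, blocks with a zero weight leave their calls unannotated, so a later hotness query on those calls finds no weight.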

test/Transforms/Inline/inline-hot-callsite.ll

This file was added.

				; RUN: opt < %s -inline -inline-threshold=0 -inlinehint-threshold=100 -S | FileCheck %s

				; This tests that a hot callsite gets the (higher) inlinehint-threshold even
				; without inline hints and gets inlined because the cost is less than
				; inlinehint-threshold. A cold callee with identical body does not get inlined because
				; cost exceeds the inline-threshold

				define i32 @callee1(i32 %x) {
				%x1 = add i32 %x, 1
				%x2 = add i32 %x1, 1
				%x3 = add i32 %x2, 1

				ret i32 %x3
				}

				define i32 @callee2(i32 %x) {
				; CHECK-LABEL: @callee2(
				%x1 = add i32 %x, 1
				%x2 = add i32 %x1, 1
				%x3 = add i32 %x2, 1

				ret i32 %x3
				}

				define i32 @caller2(i32 %y1) {
				; CHECK-LABEL: @caller2(
				; CHECK: call i32 @callee2
				; CHECK-NOT: call i32 @callee1
				; CHECK: ret i32 %x3.i
				%y2 = call i32 @callee2(i32 %y1), !prof !22
				%y3 = call i32 @callee1(i32 %y2), !prof !21
				ret i32 %y3
				}

				!llvm.module.flags = !{!1}
				!21 = !{!"branch_weights", i64 300}
				!22 = !{!"branch_weights", i64 1}

				!1 = !{i32 1, !"ProfileSummary", !2}
				!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
				!3 = !{!"ProfileFormat", !"InstrProf"}
				!4 = !{!"TotalCount", i64 10000}
				!5 = !{!"MaxCount", i64 1000}
				!6 = !{!"MaxInternalCount", i64 1}
				!7 = !{!"MaxFunctionCount", i64 1000}
				!8 = !{!"NumCounts", i64 3}
				!9 = !{!"NumFunctions", i64 3}
				!10 = !{!"DetailedSummary", !11}
				!11 = !{!12, !13, !14}
				!12 = !{i32 10000, i64 100, i32 1}
				!13 = !{i32 999000, i64 100, i32 1}
				!14 = !{i32 999999, i64 1, i32 2}