This is an archive of the discontinued LLVM Phabricator instance.

Fine tuning of sample profile propagation algorithm.
Closed, Public

Authored by danielcdh on Aug 5 2016, 1:29 PM.

Details

Reviewers

dnovillo
davidxl

Commits

rGc0a1e432c756: Fine tuning of sample profile propagation algorithm.
rL278522: Fine tuning of sample profile propagation algorithm.

Summary

The refined propagation algorithm is more accurate and robust.

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 67009. Aug 5 2016, 1:29 PM

danielcdh retitled this revision to "Fine tuning of sample profile propagation algorithm."

danielcdh updated this object.

danielcdh added reviewers: dnovillo, davidxl.

danielcdh added a subscriber: llvm-commits.

Could you describe this in *much* more detail?

What changes?
What's the idea behind the change?
High-level overview of the change.

It's really hard to review dry-code without the motivation behind the changes and a description.

The changes include:

Do not use branch instructions or intrinsic instructions for annotation, because their debug info is usually incorrect.
Hoist the check for whether a callsite was inlined in the profile but not in the annotated code, because we are confident that these callsites are cold.
If any BB in an equivalence class is annotated, mark the entire equivalence class as annotated.
When setting the weight, add one to it so that all weights are at least 1. This avoids propagation errors when the weight is 0.
Add a propagateThroughEdges pass with all edge weights reset to 0, to recompute edge weights from the propagated block weights.
Add a propagateThroughEdges pass in which a basic block count can be amended when it is apparently incorrect.
If a computed unknown edge weight exceeds its pred/succ block's weight, reduce it to the pred/succ block's weight.
When a block count is 0, set all of its pred/succ edge counts to 0.
Adjust loop header block weights to be no less than the weight of any basic block inside the loop.
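
The two edge-weight rules in the list above (clamping an inferred edge weight to the neighbouring block's weight, and forcing zero-weight edges around a zero-count block) can be sketched in isolation. This is a minimal illustration, not the code from SampleProfile.cpp: the function name and its parameters are hypothetical, and the real pass works on maps keyed by basic blocks and edges rather than on plain integers.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not part of the patch: infer the weight of the single
// unknown edge of a block, then clamp it to the weight of the block at the
// other end of the edge when that block is already annotated.
uint64_t inferUnknownEdgeWeight(uint64_t BlockWeight, uint64_t KnownEdgeTotal,
                                bool OtherBlockAnnotated,
                                uint64_t OtherBlockWeight) {
  // A block with count 0 can only have zero-weight edges.
  if (BlockWeight == 0)
    return 0;
  // Unknown edge = block weight minus the known edges, floored at 0.
  uint64_t Weight =
      BlockWeight >= KnownEdgeTotal ? BlockWeight - KnownEdgeTotal : 0;
  // An edge should never be heavier than a block it connects.
  if (OtherBlockAnnotated)
    Weight = std::min(Weight, OtherBlockWeight);
  return Weight;
}
```

For example, a block of weight 100 with 30 already accounted for would give its remaining edge a weight of 70, or 50 if the block on the other side is annotated with weight 50.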

These changes all come from tuning Google applications over a long period of time, so it will be hard to write unit tests that show the effect of each one individually. Combined, however, they make a big difference: in the existing unit tests, the changes have corrected some incorrect annotations.

Apologies for the delay, Dehao. Thanks for the description. That needs to be added to the code, so we have it for future reference. I suspect that several of the spots where I was confused and asked for comments are going to be good candidates to sprinkle that high-level description around.

Thanks.

lib/Transforms/IPO/SampleProfile.cpp
463–466	Why are we ignoring branches? Is it because they don't matter for weight calculation? Should we return 0 for them, or simply ignore them?
474	And this is because we will redo that inlining decision later, right? What about the inlining decisions we decide not to redo?
810	s/could/should/ here, right?
907	Convention is camel case for locals => OtherEC
921	Comment here, please. What is this doing?
1014	Comment here, please. What is this doing?
1039	The block below also needs some commenting. Why the multiple stages of propagation? Can this be re-factored a bit?

add more comments

lib/Transforms/IPO/SampleProfile.cpp
463–466	Because a branch instruction's debug info is usually attributed to sources outside the basic block, we simply ignore all branches when annotating.
474	At this point, all necessary inlining has been redone. If we decided not to redo it, it means the callsite is not hot, so we mark its count as 0 to prevent it from being inlined in a later inlining phase.
1039	Comments added. This just calls the same function three times; I am not sure that using a loop to do it three times would simplify the code much.
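
For reference, the loop-based refactoring discussed at line 1039 could look roughly like the sketch below. It is only an illustration under stated assumptions: `PropagationStage`, `runStages`, `Step`, and `ClearEdges` are hypothetical names that do not appear in the patch. `Step` stands in for propagateThroughEdges(F, UpdateBlockCount), and `ClearEdges` stands in for VisitedEdges.clear(). As in the patch, the iteration counter is shared by all stages rather than reset between them.

```cpp
#include <functional>
#include <vector>

// Hypothetical description of one propagation stage.
struct PropagationStage {
  bool ResetEdges;       // Forget previously computed edge weights first.
  bool UpdateBlockCount; // Allow already-annotated block counts to change.
};

// Runs each stage until it stops changing anything or the shared iteration
// budget is exhausted. Returns the number of times Step was called.
unsigned runStages(const std::vector<PropagationStage> &Stages,
                   unsigned MaxIterations,
                   const std::function<bool(bool)> &Step,
                   const std::function<void()> &ClearEdges) {
  unsigned I = 0;
  unsigned Calls = 0;
  for (const PropagationStage &S : Stages) {
    if (S.ResetEdges)
      ClearEdges();
    bool Changed = true;
    while (Changed && I++ < MaxIterations) {
      Changed = Step(S.UpdateBlockCount);
      ++Calls;
    }
  }
  return Calls;
}
```

The three passes in the patch would then correspond to the stage list {false, false}, {true, false}, {false, true}. Whether this reads better than three explicit loops is a judgment call, as the reply above notes.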

Thanks! LGTM.

This revision is now accepted and ready to land. Aug 12 2016, 6:53 AM

danielcdh closed this revision. Aug 12 2016, 9:30 AM

Revision Contents

Path	Size
lib/Transforms/IPO/SampleProfile.cpp	131 lines
test/Transforms/SampleProfile/ (nine files; names not captured)	2 lines, 1 line, 8 lines, 8 lines, 8 lines, 2 lines, 12 lines, 4 lines, 12 lines

Diff 67792

lib/Transforms/IPO/SampleProfile.cpp

[... 119 lines not shown ...]	protected:
void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);		void printBlockEquivalence(raw_ostream &OS, const BasicBlock *BB);
bool computeBlockWeights(Function &F);		bool computeBlockWeights(Function &F);
void findEquivalenceClasses(Function &F);		void findEquivalenceClasses(Function &F);
void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,		void findEquivalencesFor(BasicBlock *BB1, ArrayRef<BasicBlock *> Descendants,
DominatorTreeBase<BasicBlock> *DomTree);		DominatorTreeBase<BasicBlock> *DomTree);
void propagateWeights(Function &F);		void propagateWeights(Function &F);
uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);		uint64_t visitEdge(Edge E, unsigned NumUnknownEdges, Edge UnknownEdge);
void buildEdges(Function &F);		void buildEdges(Function &F);
bool propagateThroughEdges(Function &F);		bool propagateThroughEdges(Function &F, bool UpdateBlockCount);
void computeDominanceAndLoopInfo(Function &F);		void computeDominanceAndLoopInfo(Function &F);
unsigned getOffset(unsigned L, unsigned H) const;		unsigned getOffset(unsigned L, unsigned H) const;
void clearFunctionData();		void clearFunctionData();

/// \brief Map basic blocks to their computed weights.		/// \brief Map basic blocks to their computed weights.
///		///
/// The weight of a basic block is defined to be the maximum		/// The weight of a basic block is defined to be the maximum
/// of all the instruction weights in that block.		/// of all the instruction weights in that block.
[... 318 lines not shown ...]	SampleProfileLoader::getInstWeight(const Instruction &Inst) const {
const DebugLoc &DLoc = Inst.getDebugLoc();		const DebugLoc &DLoc = Inst.getDebugLoc();
if (!DLoc)		if (!DLoc)
return std::error_code();		return std::error_code();

const FunctionSamples *FS = findFunctionSamples(Inst);		const FunctionSamples *FS = findFunctionSamples(Inst);
if (!FS)		if (!FS)
return std::error_code();		return std::error_code();

// Ignore all dbg_value intrinsics.		// Ignore all intrinsics and branch instructions.
const IntrinsicInst *II = dyn_cast<IntrinsicInst>(&Inst);		// Branch instruction usually contains debug info from sources outside of
if (II && II->getIntrinsicID() == Intrinsic::dbg_value)		// the residing basic block, thus we ignore them during annotation.
		if (isa<BranchInst>(Inst) || isa<IntrinsicInst>(Inst))
return std::error_code();		return std::error_code();

		// If a call instruction is inlined in profile, but not inlined here,
		// it means that the inlined callsite has no sample, thus the call
		// instruction should have 0 count.
		const CallInst *CI = dyn_cast<CallInst>(&Inst);
		if (CI && findCalleeFunctionSamples(*CI))
		return 0;

const DILocation *DIL = DLoc;		const DILocation *DIL = DLoc;
unsigned Lineno = DLoc.getLine();		unsigned Lineno = DLoc.getLine();
unsigned HeaderLineno = DIL->getScope()->getSubprogram()->getLine();		unsigned HeaderLineno = DIL->getScope()->getSubprogram()->getLine();

uint32_t LineOffset = getOffset(Lineno, HeaderLineno);		uint32_t LineOffset = getOffset(Lineno, HeaderLineno);
uint32_t Discriminator = DIL->getDiscriminator();		uint32_t Discriminator = DIL->getDiscriminator();
ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);		ErrorOr<uint64_t> R = FS->findSamplesAt(LineOffset, Discriminator);
if (R) {		if (R) {
bool FirstMark =		bool FirstMark =
CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());		CoverageTracker.markSamplesUsed(FS, LineOffset, Discriminator, R.get());
if (FirstMark) {		if (FirstMark) {
const Function *F = Inst.getParent()->getParent();		const Function *F = Inst.getParent()->getParent();
LLVMContext &Ctx = F->getContext();		LLVMContext &Ctx = F->getContext();
emitOptimizationRemark(		emitOptimizationRemark(
Ctx, DEBUG_TYPE, *F, DLoc,		Ctx, DEBUG_TYPE, *F, DLoc,
Twine("Applied ") + Twine(*R) + " samples from profile (offset: " +		Twine("Applied ") + Twine(*R) + " samples from profile (offset: " +
Twine(LineOffset) +		Twine(LineOffset) +
((Discriminator) ? Twine(".") + Twine(Discriminator) : "") + ")");		((Discriminator) ? Twine(".") + Twine(Discriminator) : "") + ")");
}		}
DEBUG(dbgs() << " " << Lineno << "." << DIL->getDiscriminator() << ":"		DEBUG(dbgs() << " " << Lineno << "." << DIL->getDiscriminator() << ":"
<< Inst << " (line offset: " << Lineno - HeaderLineno << "."		<< Inst << " (line offset: " << Lineno - HeaderLineno << "."
<< DIL->getDiscriminator() << " - weight: " << R.get()		<< DIL->getDiscriminator() << " - weight: " << R.get()
<< ")\n");		<< ")\n");
} else {
// If a call instruction is inlined in profile, but not inlined here,
// it means that the inlined callsite has no sample, thus the call
// instruction should have 0 count.
const CallInst *CI = dyn_cast<CallInst>(&Inst);
if (CI && findCalleeFunctionSamples(*CI))
R = 0;
}		}
return R;		return R;
}		}

/// \brief Compute the weight of a basic block.		/// \brief Compute the weight of a basic block.
///		///
/// The weight of basic block \p BB is the maximum weight of all the		/// The weight of basic block \p BB is the maximum weight of all the
/// instructions in BB.		/// instructions in BB.
[... 184 lines not shown ...]	void SampleProfileLoader::findEquivalencesFor(
DominatorTreeBase<BasicBlock> *DomTree) {		DominatorTreeBase<BasicBlock> *DomTree) {
const BasicBlock *EC = EquivalenceClass[BB1];		const BasicBlock *EC = EquivalenceClass[BB1];
uint64_t Weight = BlockWeights[EC];		uint64_t Weight = BlockWeights[EC];
for (const auto *BB2 : Descendants) {		for (const auto *BB2 : Descendants) {
bool IsDomParent = DomTree->dominates(BB2, BB1);		bool IsDomParent = DomTree->dominates(BB2, BB1);
bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2);		bool IsInSameLoop = LI->getLoopFor(BB1) == LI->getLoopFor(BB2);
if (BB1 != BB2 && IsDomParent && IsInSameLoop) {		if (BB1 != BB2 && IsDomParent && IsInSameLoop) {
EquivalenceClass[BB2] = EC;		EquivalenceClass[BB2] = EC;
		// If BB2 is visited, then the entire EC should be marked as visited.
		if (VisitedBlocks.count(BB2)) {
		VisitedBlocks.insert(EC);
		}

// If BB2 is heavier than BB1, make BB2 have the same weight		// If BB2 is heavier than BB1, make BB2 have the same weight
// as BB1.		// as BB1.
//		//
// Note that we don't worry about the opposite situation here		// Note that we don't worry about the opposite situation here
// (when BB2 is lighter than BB1). We will deal with this		// (when BB2 is lighter than BB1). We will deal with this
// during the propagation phase. Right now, we just want to		// during the propagation phase. Right now, we just want to
// make sure that BB1 has the largest weight of all the		// make sure that BB1 has the largest weight of all the
// members of its equivalence set.		// members of its equivalence set.
Weight = std::max(Weight, BlockWeights[BB2]);		Weight = std::max(Weight, BlockWeights[BB2]);
}		}
}		}
		if (EC == &EC->getParent()->getEntryBlock()) {
		BlockWeights[EC] = Samples->getHeadSamples() + 1;
		} else {
BlockWeights[EC] = Weight;		BlockWeights[EC] = Weight;
}		}
		}

/// \brief Find equivalence classes.		/// \brief Find equivalence classes.
///		///
/// Since samples may be missing from blocks, we can fill in the gaps by setting		/// Since samples may be missing from blocks, we can fill in the gaps by setting
/// the weights of all the blocks in the same equivalence class to the same		/// the weights of all the blocks in the same equivalence class to the same
/// weight. To compute the concept of equivalence, we use dominance and loop		/// weight. To compute the concept of equivalence, we use dominance and loop
/// information. Two blocks B1 and B2 are in the same equivalence class if B1		/// information. Two blocks B1 and B2 are in the same equivalence class if B1
/// dominates B2, B2 post-dominates B1 and both are in the same loop.		/// dominates B2, B2 post-dominates B1 and both are in the same loop.
[... 73 lines not shown ...]
///		///
/// If the weight of a basic block is known, and there is only one edge		/// If the weight of a basic block is known, and there is only one edge
/// with an unknown weight, we can calculate the weight of that edge.		/// with an unknown weight, we can calculate the weight of that edge.
///		///
/// Similarly, if all the edges have a known count, we can calculate the		/// Similarly, if all the edges have a known count, we can calculate the
/// count of the basic block, if needed.		/// count of the basic block, if needed.
///		///
/// \param F Function to process.		/// \param F Function to process.
		/// \param UpdateBlockCount Whether we should update basic block counts that
		/// has already been annotated.
///		///
/// \returns True if new weights were assigned to edges or blocks.		/// \returns True if new weights were assigned to edges or blocks.
bool SampleProfileLoader::propagateThroughEdges(Function &F) {		bool SampleProfileLoader::propagateThroughEdges(Function &F,
		bool UpdateBlockCount) {
bool Changed = false;		bool Changed = false;
DEBUG(dbgs() << "\nPropagation through edges\n");		DEBUG(dbgs() << "\nPropagation through edges\n");
for (const auto &BI : F) {		for (const auto &BI : F) {
const BasicBlock *BB = &BI;		const BasicBlock *BB = &BI;
const BasicBlock *EC = EquivalenceClass[BB];		const BasicBlock *EC = EquivalenceClass[BB];

// Visit all the predecessor and successor edges to determine		// Visit all the predecessor and successor edges to determine
// which ones have a weight assigned already. Note that it doesn't		// which ones have a weight assigned already. Note that it doesn't
[... 75 lines not shown ...]	for (unsigned i = 0; i < 2; i++) {
}		}
} else if (NumUnknownEdges == 1 && VisitedBlocks.count(EC)) {		} else if (NumUnknownEdges == 1 && VisitedBlocks.count(EC)) {
// If there is a single unknown edge and the block has been		// If there is a single unknown edge and the block has been
// visited, then we can compute E's weight.		// visited, then we can compute E's weight.
if (BBWeight >= TotalWeight)		if (BBWeight >= TotalWeight)
EdgeWeights[UnknownEdge] = BBWeight - TotalWeight;		EdgeWeights[UnknownEdge] = BBWeight - TotalWeight;
else		else
EdgeWeights[UnknownEdge] = 0;		EdgeWeights[UnknownEdge] = 0;
		const BasicBlock *OtherEC;
		if (i == 0)
		OtherEC = EquivalenceClass[UnknownEdge.first];
		else
		OtherEC = EquivalenceClass[UnknownEdge.second];
		// Edge weights should never exceed the BB weights it connects.
		if (VisitedBlocks.count(OtherEC) &&
		EdgeWeights[UnknownEdge] > BlockWeights[OtherEC])
		EdgeWeights[UnknownEdge] = BlockWeights[OtherEC];
VisitedEdges.insert(UnknownEdge);		VisitedEdges.insert(UnknownEdge);
Changed = true;		Changed = true;
DEBUG(dbgs() << "Set weight for edge: ";		DEBUG(dbgs() << "Set weight for edge: ";
printEdgeWeight(dbgs(), UnknownEdge));		printEdgeWeight(dbgs(), UnknownEdge));
}		}
		} else if (VisitedBlocks.count(EC) && BlockWeights[EC] == 0) {
		// If a block Weights 0, all its in/out edges should weight 0.
		if (i == 0) {
		for (auto *Pred : Predecessors[BB]) {
		Edge E = std::make_pair(Pred, BB);
		EdgeWeights[E] = 0;
		VisitedEdges.insert(E);
		}
		} else {
		for (auto *Succ : Successors[BB]) {
		Edge E = std::make_pair(BB, Succ);
		EdgeWeights[E] = 0;
		VisitedEdges.insert(E);
		}
		}
} else if (SelfReferentialEdge.first && VisitedBlocks.count(EC)) {		} else if (SelfReferentialEdge.first && VisitedBlocks.count(EC)) {
uint64_t &BBWeight = BlockWeights[BB];		uint64_t &BBWeight = BlockWeights[BB];
// We have a self-referential edge and the weight of BB is known.		// We have a self-referential edge and the weight of BB is known.
if (BBWeight >= TotalWeight)		if (BBWeight >= TotalWeight)
EdgeWeights[SelfReferentialEdge] = BBWeight - TotalWeight;		EdgeWeights[SelfReferentialEdge] = BBWeight - TotalWeight;
else		else
EdgeWeights[SelfReferentialEdge] = 0;		EdgeWeights[SelfReferentialEdge] = 0;
VisitedEdges.insert(SelfReferentialEdge);		VisitedEdges.insert(SelfReferentialEdge);
Changed = true;		Changed = true;
DEBUG(dbgs() << "Set self-referential edge weight to: ";		DEBUG(dbgs() << "Set self-referential edge weight to: ";
printEdgeWeight(dbgs(), SelfReferentialEdge));		printEdgeWeight(dbgs(), SelfReferentialEdge));
}		}
		if (UpdateBlockCount && !VisitedBlocks.count(EC) && TotalWeight > 0) {
		BlockWeights[EC] = TotalWeight;
		VisitedBlocks.insert(EC);
		Changed = true;
		}
}		}
}		}

return Changed;		return Changed;
}		}

/// \brief Build in/out edge lists for each basic block in the CFG.		/// \brief Build in/out edge lists for each basic block in the CFG.
///		///
[... 43 lines not shown ...]
/// minus the weight of the other incoming edges to that block (if		/// minus the weight of the other incoming edges to that block (if
/// known).		/// known).
void SampleProfileLoader::propagateWeights(Function &F) {		void SampleProfileLoader::propagateWeights(Function &F) {
bool Changed = true;		bool Changed = true;
unsigned I = 0;		unsigned I = 0;

// Add an entry count to the function using the samples gathered		// Add an entry count to the function using the samples gathered
// at the function entry.		// at the function entry.
F.setEntryCount(Samples->getHeadSamples());		F.setEntryCount(Samples->getHeadSamples() + 1);

		// If BB weight is larger than its corresponding loop's header BB weight,
		// use the BB weight to replace the loop header BB weight.
		for (auto &BI : F) {
		BasicBlock *BB = &BI;
		Loop *L = LI->getLoopFor(BB);
		if (!L) {
		continue;
		}
		BasicBlock *Header = L->getHeader();
		if (Header && BlockWeights[BB] > BlockWeights[Header]) {
		BlockWeights[Header] = BlockWeights[BB];
		}
		}

// Before propagation starts, build, for each block, a list of		// Before propagation starts, build, for each block, a list of
// unique predecessors and successors. This is necessary to handle		// unique predecessors and successors. This is necessary to handle
// identical edges in multiway branches. Since we visit all blocks and all		// identical edges in multiway branches. Since we visit all blocks and all
// edges of the CFG, it is cleaner to build these lists once at the start		// edges of the CFG, it is cleaner to build these lists once at the start
// of the pass.		// of the pass.
buildEdges(F);		buildEdges(F);

// Propagate until we converge or we go past the iteration limit.		// Propagate until we converge or we go past the iteration limit.
while (Changed && I++ < SampleProfileMaxPropagateIterations) {		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
Changed = propagateThroughEdges(F);		Changed = propagateThroughEdges(F, false);
		}

		// The first propagation propagates BB counts from annotated BBs to unknown
		// BBs. The 2nd propagation pass resets edges weights, and use all BB weights
		// to propagate edge weights.
		VisitedEdges.clear();
		Changed = true;
		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
		Changed = propagateThroughEdges(F, false);
		}

		// The 3rd propagation pass allows adjust annotated BB weights that are
		// obviously wrong.
		Changed = true;
		while (Changed && I++ < SampleProfileMaxPropagateIterations) {
		Changed = propagateThroughEdges(F, true);
}		}

// Generate MD_prof metadata for every branch instruction using the		// Generate MD_prof metadata for every branch instruction using the
// edge weights computed during propagation.		// edge weights computed during propagation.
DEBUG(dbgs() << "\nPropagation complete. Setting branch weights\n");		DEBUG(dbgs() << "\nPropagation complete. Setting branch weights\n");
LLVMContext &Ctx = F.getContext();		LLVMContext &Ctx = F.getContext();
MDBuilder MDB(Ctx);		MDBuilder MDB(Ctx);
for (auto &BI : F) {		for (auto &BI : F) {
[... 29 lines not shown ...]	for (unsigned I = 0; I < TI->getNumSuccessors(); ++I) {
DEBUG(dbgs() << "\t"; printEdgeWeight(dbgs(), E));		DEBUG(dbgs() << "\t"; printEdgeWeight(dbgs(), E));
// Use uint32_t saturated arithmetic to adjust the incoming weights,		// Use uint32_t saturated arithmetic to adjust the incoming weights,
// if needed. Sample counts in profiles are 64-bit unsigned values,		// if needed. Sample counts in profiles are 64-bit unsigned values,
// but internally branch weights are expressed as 32-bit values.		// but internally branch weights are expressed as 32-bit values.
if (Weight > std::numeric_limits<uint32_t>::max()) {		if (Weight > std::numeric_limits<uint32_t>::max()) {
DEBUG(dbgs() << " (saturated due to uint32_t overflow)");		DEBUG(dbgs() << " (saturated due to uint32_t overflow)");
Weight = std::numeric_limits<uint32_t>::max();		Weight = std::numeric_limits<uint32_t>::max();
}		}
Weights.push_back(static_cast<uint32_t>(Weight));		// Weight is added by one to avoid propagation errors introduced by
		// 0 weights.
		Weights.push_back(static_cast<uint32_t>(Weight + 1));
if (Weight != 0) {		if (Weight != 0) {
if (Weight > MaxWeight) {		if (Weight > MaxWeight) {
MaxWeight = Weight;		MaxWeight = Weight;
MaxDestLoc = Succ->getFirstNonPHIOrDbgOrLifetime()->getDebugLoc();		MaxDestLoc = Succ->getFirstNonPHIOrDbgOrLifetime()->getDebugLoc();
}		}
}		}
}		}

// Only set weights if there is at least one non-zero weight.		// Only set weights if there is at least one non-zero weight.
// In any other case, let the analyzer set weights.		// In any other case, let the analyzer set weights.
if (MaxWeight > 0) {
DEBUG(dbgs() << "SUCCESS. Found non-zero weights.\n");		DEBUG(dbgs() << "SUCCESS. Found non-zero weights.\n");
TI->setMetadata(llvm::LLVMContext::MD_prof,		TI->setMetadata(llvm::LLVMContext::MD_prof,
MDB.createBranchWeights(Weights));		MDB.createBranchWeights(Weights));
DebugLoc BranchLoc = TI->getDebugLoc();		DebugLoc BranchLoc = TI->getDebugLoc();
emitOptimizationRemark(		emitOptimizationRemark(
Ctx, DEBUG_TYPE, F, MaxDestLoc,		Ctx, DEBUG_TYPE, F, MaxDestLoc,
Twine("most popular destination for conditional branches at ") +		Twine("most popular destination for conditional branches at ") +
((BranchLoc) ? Twine(BranchLoc->getFilename() + ":" +		((BranchLoc) ? Twine(BranchLoc->getFilename() + ":" +
Twine(BranchLoc.getLine()) + ":" +		Twine(BranchLoc.getLine()) + ":" +
Twine(BranchLoc.getCol()))		Twine(BranchLoc.getCol()))
: Twine("<UNKNOWN LOCATION>")));		: Twine("<UNKNOWN LOCATION>")));
} else {
DEBUG(dbgs() << "SKIPPED. All branch weights are zero.\n");
}
}		}
}		}

/// \brief Get the line number for the function header.		/// \brief Get the line number for the function header.
///		///
/// This looks up function \p F in the current compilation unit and		/// This looks up function \p F in the current compilation unit and
/// retrieves the line number where the function is defined. This is		/// retrieves the line number where the function is defined. This is
/// line 0 for all the samples read from the profile file. Every line		/// line 0 for all the samples read from the profile file. Every line
[... 205 lines not shown ...]

test/Transforms/SampleProfile/Inputs/branch.prof

	main:15680:0			main:15680:2500
	1: 2500			1: 2500
	4: 1000			4: 1000
	5: 1000			5: 1000
	6: 800			6: 800
	7: 500			7: 500
	9: 10226			9: 10226
	10: 2243			10: 2243
	16: 0			16: 0
	18: 0			18: 0

test/Transforms/SampleProfile/Inputs/fnptr.binprof

test/Transforms/SampleProfile/Inputs/fnptr.prof

	_Z3fooi:7711:610			_Z3fooi:7711:610
	1: 610			1: 610
	_Z3bari:20301:1437			_Z3bari:20301:1437
	1: 1437			1: 1437
	main:184019:0			main:184019:0
				3: 0
	4: 534			4: 534
	6: 2080			6: 2080
	9: 2064 _Z3bari:1471 _Z3fooi:631			9: 2064 _Z3bari:1471 _Z3fooi:631
	5.1: 1075			5.1: 1075
	5: 1075			5: 1075
	7: 534			7: 534
	4.2: 534			4.2: 534

test/Transforms/SampleProfile/branch.ll

[... 43 lines not shown ...]	entry:
store i32 0, i32* %retval, align 4		store i32 0, i32* %retval, align 4
store i32 %argc, i32* %argc.addr, align 4		store i32 %argc, i32* %argc.addr, align 4
call void @llvm.dbg.declare(metadata i32* %argc.addr, metadata !16, metadata !17), !dbg !18		call void @llvm.dbg.declare(metadata i32* %argc.addr, metadata !16, metadata !17), !dbg !18
store i8** %argv, i8*** %argv.addr, align 8		store i8** %argv, i8*** %argv.addr, align 8
call void @llvm.dbg.declare(metadata i8*** %argv.addr, metadata !19, metadata !17), !dbg !20		call void @llvm.dbg.declare(metadata i8*** %argv.addr, metadata !19, metadata !17), !dbg !20
%0 = load i32, i32* %argc.addr, align 4, !dbg !21		%0 = load i32, i32* %argc.addr, align 4, !dbg !21
%cmp = icmp slt i32 %0, 2, !dbg !23		%cmp = icmp slt i32 %0, 2, !dbg !23
br i1 %cmp, label %if.then, label %if.end, !dbg !24		br i1 %cmp, label %if.then, label %if.end, !dbg !24
; CHECK: edge entry -> if.then probability is 0x4ccccccd / 0x80000000 = 60.00%		; CHECK: edge entry -> if.then probability is 0x4ccf6b16 / 0x80000000 = 60.01%
; CHECK: edge entry -> if.end probability is 0x33333333 / 0x80000000 = 40.00%		; CHECK: edge entry -> if.end probability is 0x333094ea / 0x80000000 = 39.99%

if.then: ; preds = %entry		if.then: ; preds = %entry
store i32 1, i32* %retval, align 4, !dbg !25		store i32 1, i32* %retval, align 4, !dbg !25
br label %return, !dbg !25		br label %return, !dbg !25

if.end: ; preds = %entry		if.end: ; preds = %entry
call void @llvm.dbg.declare(metadata double* %result, metadata !26, metadata !17), !dbg !27		call void @llvm.dbg.declare(metadata double* %result, metadata !26, metadata !17), !dbg !27
call void @llvm.dbg.declare(metadata i32* %limit, metadata !28, metadata !17), !dbg !29		call void @llvm.dbg.declare(metadata i32* %limit, metadata !28, metadata !17), !dbg !29
%1 = load i8**, i8*** %argv.addr, align 8, !dbg !30		%1 = load i8**, i8*** %argv.addr, align 8, !dbg !30
%arrayidx = getelementptr inbounds i8*, i8** %1, i64 1, !dbg !30		%arrayidx = getelementptr inbounds i8*, i8** %1, i64 1, !dbg !30
%2 = load i8*, i8** %arrayidx, align 8, !dbg !30		%2 = load i8*, i8** %arrayidx, align 8, !dbg !30
%call = call i32 @atoi(i8* %2) #4, !dbg !31		%call = call i32 @atoi(i8* %2) #4, !dbg !31
store i32 %call, i32* %limit, align 4, !dbg !29		store i32 %call, i32* %limit, align 4, !dbg !29
%3 = load i32, i32* %limit, align 4, !dbg !32		%3 = load i32, i32* %limit, align 4, !dbg !32
%cmp1 = icmp sgt i32 %3, 100, !dbg !34		%cmp1 = icmp sgt i32 %3, 100, !dbg !34
br i1 %cmp1, label %if.then.2, label %if.else, !dbg !35		br i1 %cmp1, label %if.then.2, label %if.else, !dbg !35
; CHECK: edge if.end -> if.then.2 probability is 0x66666666 / 0x80000000 = 80.00%		; CHECK: edge if.end -> if.then.2 probability is 0x6652c748 / 0x80000000 = 79.94%
; CHECK: edge if.end -> if.else probability is 0x1999999a / 0x80000000 = 20.00%		; CHECK: edge if.end -> if.else probability is 0x19ad38b8 / 0x80000000 = 20.06%

if.then.2: ; preds = %if.end		if.then.2: ; preds = %if.end
call void @llvm.dbg.declare(metadata double* %s, metadata !36, metadata !17), !dbg !38		call void @llvm.dbg.declare(metadata double* %s, metadata !36, metadata !17), !dbg !38
%4 = load i8**, i8*** %argv.addr, align 8, !dbg !39		%4 = load i8**, i8*** %argv.addr, align 8, !dbg !39
%arrayidx3 = getelementptr inbounds i8*, i8** %4, i64 2, !dbg !39		%arrayidx3 = getelementptr inbounds i8*, i8** %4, i64 2, !dbg !39
%5 = load i8*, i8** %arrayidx3, align 8, !dbg !39		%5 = load i8*, i8** %arrayidx3, align 8, !dbg !39
%call4 = call i32 @atoi(i8* %5) #4, !dbg !40		%call4 = call i32 @atoi(i8* %5) #4, !dbg !40
%conv = sitofp i32 %call4 to double, !dbg !40		%conv = sitofp i32 %call4 to double, !dbg !40
[... 160 lines not shown ...]
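
The shifted CHECK values for the entry block of test/Transforms/SampleProfile/branch.ll are consistent with the new "+1" adjustments. As a rough check, assume (this is inferred from branch.prof and the patch, not stated in the review) that the entry block weight becomes 2500 + 1 = 2501 and that the entry -> if.end edge weighs 1000. The entry -> if.then edge then gets 1501, and adding one to each emitted branch weight gives 1502 and 1001. The helper below is hypothetical and only mimics round-to-nearest scaling onto the 2^31 denominator shown in the CHECK lines.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper, not LLVM code: converts a branch weight into a
// numerator over the fixed denominator 2^31, rounding to nearest.
uint32_t probabilityNumerator(uint64_t Weight, uint64_t TotalWeight) {
  assert(TotalWeight != 0 && Weight <= TotalWeight);
  const uint64_t Denominator = 1ull << 31;
  return static_cast<uint32_t>((Weight * Denominator + TotalWeight / 2) /
                               TotalWeight);
}
```

With these assumed weights, probabilityNumerator(1502, 2503) yields 0x4ccf6b16 and probabilityNumerator(1001, 2503) yields 0x333094ea, matching the updated CHECK lines. The pre-patch weights 1500 and 1000 give 0x4ccccccd and 0x33333333, matching the old 60.00% / 40.00% expectations.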

test/Transforms/SampleProfile/calls.ll

[... 42 lines not shown ...]	entry:
br label %while.cond, !dbg !13		br label %while.cond, !dbg !13

while.cond: ; preds = %if.end, %entry		while.cond: ; preds = %if.end, %entry
%0 = load i32, i32* %i, align 4, !dbg !14		%0 = load i32, i32* %i, align 4, !dbg !14
%inc = add nsw i32 %0, 1, !dbg !14		%inc = add nsw i32 %0, 1, !dbg !14
store i32 %inc, i32* %i, align 4, !dbg !14		store i32 %inc, i32* %i, align 4, !dbg !14
%cmp = icmp slt i32 %0, 400000000, !dbg !14		%cmp = icmp slt i32 %0, 400000000, !dbg !14
br i1 %cmp, label %while.body, label %while.end, !dbg !14		br i1 %cmp, label %while.body, label %while.end, !dbg !14
; CHECK: edge while.cond -> while.body probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]		; CHECK: edge while.cond -> while.body probability is 0x7ffa4e20 / 0x80000000 = 99.98% [HOT edge]
; CHECK: edge while.cond -> while.end probability is 0x00000000 / 0x80000000 = 0.00%		; CHECK: edge while.cond -> while.end probability is 0x0005b1e0 / 0x80000000 = 0.02%

while.body: ; preds = %while.cond		while.body: ; preds = %while.cond
%1 = load i32, i32* %i, align 4, !dbg !16		%1 = load i32, i32* %i, align 4, !dbg !16
%cmp1 = icmp ne i32 %1, 100, !dbg !16		%cmp1 = icmp ne i32 %1, 100, !dbg !16
br i1 %cmp1, label %if.then, label %if.else, !dbg !16		br i1 %cmp1, label %if.then, label %if.else, !dbg !16
; Without discriminator information, the profiler used to think that		; Without discriminator information, the profiler used to think that
; both branches out of while.body had the same weight. In reality,		; both branches out of while.body had the same weight. In reality,
; the edge while.body->if.then is taken most of the time.		; the edge while.body->if.then is taken most of the time.
;		;
; CHECK: edge while.body -> if.else probability is 0x00000000 / 0x80000000 = 0.00%		; CHECK: edge while.body -> if.else probability is 0x0005b1e0 / 0x80000000 = 0.02%
; CHECK: edge while.body -> if.then probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]		; CHECK: edge while.body -> if.then probability is 0x7ffa4e20 / 0x80000000 = 99.98% [HOT edge]


if.then: ; preds = %while.body		if.then: ; preds = %while.body
%2 = load i32, i32* %i, align 4, !dbg !18		%2 = load i32, i32* %i, align 4, !dbg !18
%3 = load i32, i32* %s, align 4, !dbg !18		%3 = load i32, i32* %s, align 4, !dbg !18
%call = call i32 @_Z3sumii(i32 %2, i32 %3), !dbg !18		%call = call i32 @_Z3sumii(i32 %2, i32 %3), !dbg !18
store i32 %call, i32* %s, align 4, !dbg !18		store i32 %call, i32* %s, align 4, !dbg !18
br label %if.end, !dbg !18		br label %if.end, !dbg !18
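
The CHECK lines in this hunk print each branch probability as a fixed-point numerator over 0x80000000 (2^31). As a reading aid, here is a minimal C++ sketch of that conversion. It is not code from this patch: `numeratorToPercent` and `weightToNumerator` are hypothetical helpers, round-to-nearest scaling is an assumption about how the printed numerator is derived, and the weights 5753 and 1 mentioned in the note below are assumed values picked because they reproduce the new numerators for `while.cond`. The profile input itself is not shown in this hunk.

```cpp
#include <cassert>
#include <cstdint>

// Denominator used by the printed probabilities (0x80000000 == 2^31).
const uint64_t kDenominator = 1ull << 31;

// Convert a printed numerator such as 0x7ffa4e20 into a percentage.
double numeratorToPercent(uint32_t numerator) {
  return 100.0 * static_cast<double>(numerator) /
         static_cast<double>(kDenominator);
}

// Scale weight/total to a numerator over 2^31, rounding to nearest.
// Assumes weight <= total and that weight * 2^31 fits in 64 bits.
uint32_t weightToNumerator(uint64_t weight, uint64_t total) {
  assert(total != 0 && "total weight must be nonzero");
  return static_cast<uint32_t>((weight * kDenominator + total / 2) / total);
}
```

Under those assumptions, `weightToNumerator(1, 5754)` gives 0x0005b1e0 and `weightToNumerator(5753, 5754)` gives 0x7ffa4e20, which matches the 0.02% / 99.98% split in the updated expectations. The old expectations (0x00000000 and 0x80000000) are what the same formula yields when the cold edge has weight 0.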

test/Transforms/SampleProfile/discriminator.ll

entry:
store i32 %i, i32* %i.addr, align 4		store i32 %i, i32* %i.addr, align 4
store i32 0, i32* %x, align 4, !dbg !10		store i32 0, i32* %x, align 4, !dbg !10
br label %while.cond, !dbg !11		br label %while.cond, !dbg !11

while.cond: ; preds = %if.end, %entry		while.cond: ; preds = %if.end, %entry
%0 = load i32, i32* %i.addr, align 4, !dbg !12		%0 = load i32, i32* %i.addr, align 4, !dbg !12
%cmp = icmp slt i32 %0, 100, !dbg !12		%cmp = icmp slt i32 %0, 100, !dbg !12
br i1 %cmp, label %while.body, label %while.end, !dbg !12		br i1 %cmp, label %while.body, label %while.end, !dbg !12
; CHECK: edge while.cond -> while.body probability is 0x7ebb907a / 0x80000000 = 99.01% [HOT edge]		; CHECK: edge while.cond -> while.body probability is 0x7d83ba68 / 0x80000000 = 98.06% [HOT edge]
; CHECK: edge while.cond -> while.end probability is 0x01446f86 / 0x80000000 = 0.99%		; CHECK: edge while.cond -> while.end probability is 0x027c4598 / 0x80000000 = 1.94%

while.body: ; preds = %while.cond		while.body: ; preds = %while.cond
%1 = load i32, i32* %i.addr, align 4, !dbg !14		%1 = load i32, i32* %i.addr, align 4, !dbg !14
%cmp1 = icmp slt i32 %1, 50, !dbg !14		%cmp1 = icmp slt i32 %1, 50, !dbg !14
br i1 %cmp1, label %if.then, label %if.end, !dbg !14		br i1 %cmp1, label %if.then, label %if.end, !dbg !14
; CHECK: edge while.body -> if.then probability is 0x06666666 / 0x80000000 = 5.00%		; CHECK: edge while.body -> if.then probability is 0x07878788 / 0x80000000 = 5.88%
; CHECK: edge while.body -> if.end probability is 0x7999999a / 0x80000000 = 95.00% [HOT edge]		; CHECK: edge while.body -> if.end probability is 0x78787878 / 0x80000000 = 94.12% [HOT edge]

if.then: ; preds = %while.body		if.then: ; preds = %while.body
%2 = load i32, i32* %x, align 4, !dbg !17		%2 = load i32, i32* %x, align 4, !dbg !17
%dec = add nsw i32 %2, -1, !dbg !17		%dec = add nsw i32 %2, -1, !dbg !17
store i32 %dec, i32* %x, align 4, !dbg !17		store i32 %dec, i32* %x, align 4, !dbg !17
br label %if.end, !dbg !17		br label %if.end, !dbg !17

if.end: ; preds = %if.then, %while.body		if.end: ; preds = %if.then, %while.body

test/Transforms/SampleProfile/entry_counts.ll

	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/entry_counts.prof -S | FileCheck %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/entry_counts.prof -S | FileCheck %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/entry_counts.prof -S | FileCheck %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/entry_counts.prof -S | FileCheck %s

	; According to the profile, function empty() was called 13,293 times.			; According to the profile, function empty() was called 13,293 times.
	; CHECK: {{.*}} = !{!"function_entry_count", i64 13293}			; CHECK: {{.*}} = !{!"function_entry_count", i64 13294}

	define void @empty() !dbg !4 {			define void @empty() !dbg !4 {
	entry:			entry:
	ret void, !dbg !9			ret void, !dbg !9
	}			}

	; This function does not have profile, check if function_entry_count is 0			; This function does not have profile, check if function_entry_count is 0
	; CHECK: {{.*}} = !{!"function_entry_count", i64 0}			; CHECK: {{.*}} = !{!"function_entry_count", i64 0}

test/Transforms/SampleProfile/fnptr.ll

	; The two profiles used in this test are the same but encoded in different			; The two profiles used in this test are the same but encoded in different
	; formats. This checks that we produce the same profile annotations regardless			; formats. This checks that we produce the same profile annotations regardless
	; of the profile format.			; of the profile format.
	;			;
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/fnptr.prof | opt -analyze -branch-prob | FileCheck %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/fnptr.prof | opt -analyze -branch-prob | FileCheck %s
	; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/fnptr.binprof | opt -analyze -branch-prob | FileCheck %s			; RUN: opt < %s -sample-profile -sample-profile-file=%S/Inputs/fnptr.binprof | opt -analyze -branch-prob | FileCheck %s

	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/fnptr.prof | opt -analyze -branch-prob | FileCheck %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/fnptr.prof | opt -analyze -branch-prob | FileCheck %s
	; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/fnptr.binprof | opt -analyze -branch-prob | FileCheck %s			; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/fnptr.binprof | opt -analyze -branch-prob | FileCheck %s

	; CHECK: edge for.body3 -> if.then probability is 0x1a4f3959 / 0x80000000 = 20.55%			; CHECK: edge for.body3 -> if.then probability is 0x1a56a56a / 0x80000000 = 20.58%
	; CHECK: edge for.body3 -> if.else probability is 0x65b0c6a7 / 0x80000000 = 79.45%			; CHECK: edge for.body3 -> if.else probability is 0x65a95a96 / 0x80000000 = 79.42%
	; CHECK: edge for.inc -> for.inc12 probability is 0x20dc8dc9 / 0x80000000 = 25.67%			; CHECK: edge for.inc -> for.inc12 probability is 0x000fdc50 / 0x80000000 = 0.05%
	; CHECK: edge for.inc -> for.body3 probability is 0x5f237237 / 0x80000000 = 74.33%			; CHECK: edge for.inc -> for.body3 probability is 0x7ff023b0 / 0x80000000 = 99.95%
	; CHECK: edge for.inc12 -> for.end14 probability is 0x00000000 / 0x80000000 = 0.00%			; CHECK: edge for.inc12 -> for.end14 probability is 0x40000000 / 0x80000000 = 50.00%
	; CHECK: edge for.inc12 -> for.cond1.preheader probability is 0x80000000 / 0x80000000 = 100.00%			; CHECK: edge for.inc12 -> for.cond1.preheader probability is 0x40000000 / 0x80000000 = 50.00%

	; Original C++ test case.			; Original C++ test case.
	;			;
	; #include <stdlib.h>			; #include <stdlib.h>
	; #include <math.h>			; #include <math.h>
	; #include <stdio.h>			; #include <stdio.h>
	;			;
	; #define N 10000			; #define N 10000

test/Transforms/SampleProfile/offset.ll

	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%a.addr = alloca i32, align 4			%a.addr = alloca i32, align 4
	store i32 %a, i32* %a.addr, align 4			store i32 %a, i32* %a.addr, align 4
	call void @llvm.dbg.declare(metadata i32* %a.addr, metadata !11, metadata !12), !dbg !13			call void @llvm.dbg.declare(metadata i32* %a.addr, metadata !11, metadata !12), !dbg !13
	%0 = load i32, i32* %a.addr, align 4, !dbg !14			%0 = load i32, i32* %a.addr, align 4, !dbg !14
	%cmp = icmp sgt i32 %0, 0, !dbg !18			%cmp = icmp sgt i32 %0, 0, !dbg !18
	br i1 %cmp, label %if.then, label %if.else, !dbg !19			br i1 %cmp, label %if.then, label %if.else, !dbg !19
	; CHECK: edge entry -> if.then probability is 0x0147ae14 / 0x80000000 = 1.00%			; CHECK: edge entry -> if.then probability is 0x0167ba82 / 0x80000000 = 1.10%
	; CHECK: edge entry -> if.else probability is 0x7eb851ec / 0x80000000 = 99.00% [HOT edge]			; CHECK: edge entry -> if.else probability is 0x7e98457e / 0x80000000 = 98.90% [HOT edge]

	if.then: ; preds = %entry			if.then: ; preds = %entry
	store i32 10, i32* %retval, align 4, !dbg !20			store i32 10, i32* %retval, align 4, !dbg !20
	br label %return, !dbg !20			br label %return, !dbg !20

	if.else: ; preds = %entry			if.else: ; preds = %entry
	store i32 20, i32* %retval, align 4, !dbg !22			store i32 20, i32* %retval, align 4, !dbg !22
	br label %return, !dbg !22			br label %return, !dbg !22
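
The shift from 1.00% / 99.00% to 1.10% / 98.90% in this hunk is consistent with the "add one to every weight" rule listed in the summary. The following C++ sketch illustrates that reading. It is an assumption-laden illustration, not code from the patch: `edgeNumerators` is a hypothetical helper, and the raw weights 10 and 990 are assumed values chosen because they reproduce the old numerators. The profile input for this test is not shown on this page.

```cpp
#include <cstdint>
#include <utility>

// Returns the two edge probabilities as numerators over 2^31, after adding
// `bias` to each raw weight. bias == 1 models the "weights are at least 1"
// rule from the summary; bias == 0 models the behaviour before the change.
// Rounding to nearest is an assumption about how the numerator is derived.
std::pair<uint32_t, uint32_t> edgeNumerators(uint64_t w0, uint64_t w1,
                                             uint64_t bias) {
  const uint64_t denom = 1ull << 31;
  const uint64_t a = w0 + bias;
  const uint64_t b = w1 + bias;
  const uint64_t total = a + b;
  if (total == 0)
    return {0u, 0u}; // No information at all: avoid dividing by zero.
  return {static_cast<uint32_t>((a * denom + total / 2) / total),
          static_cast<uint32_t>((b * denom + total / 2) / total)};
}
```

With the assumed weights, `edgeNumerators(10, 990, 0)` yields 0x0147ae14 and 0x7eb851ec (the old expectations), while `edgeNumerators(10, 990, 1)` yields 0x0167ba82 and 0x7e98457e (the new ones). The bias matters most for small weights, which is why the cold edge moves from 1.00% to 1.10% while the hot edge barely changes.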

test/Transforms/SampleProfile/propagate.ll

for.cond: ; preds = %for.inc17, %if.else
br i1 %cmp1, label %for.body, label %for.end19, !dbg !38		br i1 %cmp1, label %for.body, label %for.end19, !dbg !38

for.body: ; preds = %for.cond		for.body: ; preds = %for.cond
%6 = load i64, i64* %i, align 8, !dbg !39		%6 = load i64, i64* %i, align 8, !dbg !39
%7 = load i64, i64* %N.addr, align 8, !dbg !42		%7 = load i64, i64* %N.addr, align 8, !dbg !42
%div = sdiv i64 %7, 3, !dbg !43		%div = sdiv i64 %7, 3, !dbg !43
%cmp2 = icmp sgt i64 %6, %div, !dbg !44		%cmp2 = icmp sgt i64 %6, %div, !dbg !44
br i1 %cmp2, label %if.then3, label %if.end, !dbg !45		br i1 %cmp2, label %if.then3, label %if.end, !dbg !45
; CHECK: edge for.body -> if.then3 probability is 0x51451451 / 0x80000000 = 63.49%		; CHECK: edge for.body -> if.then3 probability is 0x51292fa6 / 0x80000000 = 63.41%
; CHECK: edge for.body -> if.end probability is 0x2ebaebaf / 0x80000000 = 36.51%		; CHECK: edge for.body -> if.end probability is 0x2ed6d05a / 0x80000000 = 36.59%

if.then3: ; preds = %for.body		if.then3: ; preds = %for.body
%8 = load i32, i32* %x.addr, align 4, !dbg !46		%8 = load i32, i32* %x.addr, align 4, !dbg !46
%dec = add nsw i32 %8, -1, !dbg !46		%dec = add nsw i32 %8, -1, !dbg !46
store i32 %dec, i32* %x.addr, align 4, !dbg !46		store i32 %dec, i32* %x.addr, align 4, !dbg !46
br label %if.end, !dbg !47		br label %if.end, !dbg !47

if.end: ; preds = %if.then3, %for.body		if.end: ; preds = %if.then3, %for.body
%9 = load i64, i64* %i, align 8, !dbg !48		%9 = load i64, i64* %i, align 8, !dbg !48
%10 = load i64, i64* %N.addr, align 8, !dbg !50		%10 = load i64, i64* %N.addr, align 8, !dbg !50
%div4 = sdiv i64 %10, 4, !dbg !51		%div4 = sdiv i64 %10, 4, !dbg !51
%cmp5 = icmp sgt i64 %9, %div4, !dbg !52		%cmp5 = icmp sgt i64 %9, %div4, !dbg !52
br i1 %cmp5, label %if.then6, label %if.else7, !dbg !53		br i1 %cmp5, label %if.then6, label %if.else7, !dbg !53
; CHECK: edge if.end -> if.then6 probability is 0x5dbaa1dc / 0x80000000 = 73.23%		; CHECK: edge if.end -> if.then6 probability is 0x5d89d89e / 0x80000000 = 73.08%
; CHECK: edge if.end -> if.else7 probability is 0x22455e24 / 0x80000000 = 26.77%		; CHECK: edge if.end -> if.else7 probability is 0x22762762 / 0x80000000 = 26.92%

if.then6: ; preds = %if.end		if.then6: ; preds = %if.end
%11 = load i32, i32* %y.addr, align 4, !dbg !54		%11 = load i32, i32* %y.addr, align 4, !dbg !54
%inc = add nsw i32 %11, 1, !dbg !54		%inc = add nsw i32 %11, 1, !dbg !54
store i32 %inc, i32* %y.addr, align 4, !dbg !54		store i32 %inc, i32* %y.addr, align 4, !dbg !54
%12 = load i32, i32* %x.addr, align 4, !dbg !56		%12 = load i32, i32* %x.addr, align 4, !dbg !56
%add = add nsw i32 %12, 3, !dbg !56		%add = add nsw i32 %12, 3, !dbg !56
store i32 %add, i32* %x.addr, align 4, !dbg !56		store i32 %add, i32* %x.addr, align 4, !dbg !56
br label %if.end16, !dbg !57		br label %if.end16, !dbg !57

if.else7: ; preds = %if.end		if.else7: ; preds = %if.end
call void @llvm.dbg.declare(metadata i64* %j, metadata !58, metadata !12), !dbg !62		call void @llvm.dbg.declare(metadata i64* %j, metadata !58, metadata !12), !dbg !62
store i64 0, i64* %j, align 8, !dbg !62		store i64 0, i64* %j, align 8, !dbg !62
br label %for.cond8, !dbg !63		br label %for.cond8, !dbg !63

for.cond8: ; preds = %for.inc, %if.else7		for.cond8: ; preds = %for.inc, %if.else7
%13 = load i64, i64* %j, align 8, !dbg !64		%13 = load i64, i64* %j, align 8, !dbg !64
%cmp9 = icmp slt i64 %13, 100, !dbg !67		%cmp9 = icmp slt i64 %13, 100, !dbg !67
br i1 %cmp9, label %for.body10, label %for.end, !dbg !68		br i1 %cmp9, label %for.body10, label %for.end, !dbg !68
; CHECK: edge for.cond8 -> for.body10 probability is 0x7e985735 / 0x80000000 = 98.90% [HOT edge]		; CHECK: edge for.cond8 -> for.body10 probability is 0x7e941a89 / 0x80000000 = 98.89% [HOT edge]
; CHECK: edge for.cond8 -> for.end probability is 0x0167a8cb / 0x80000000 = 1.10%		; CHECK: edge for.cond8 -> for.end probability is 0x016be577 / 0x80000000 = 1.11%


for.body10: ; preds = %for.cond8		for.body10: ; preds = %for.cond8
%14 = load i64, i64* %j, align 8, !dbg !69		%14 = load i64, i64* %j, align 8, !dbg !69
%15 = load i32, i32* %x.addr, align 4, !dbg !71		%15 = load i32, i32* %x.addr, align 4, !dbg !71
%conv11 = sext i32 %15 to i64, !dbg !71		%conv11 = sext i32 %15 to i64, !dbg !71
%add12 = add nsw i64 %conv11, %14, !dbg !71		%add12 = add nsw i64 %conv11, %14, !dbg !71
%conv13 = trunc i64 %add12 to i32, !dbg !71		%conv13 = trunc i64 %add12 to i32, !dbg !71

Revision Contents

Diff 67792

lib/Transforms/IPO/SampleProfile.cpp

test/Transforms/SampleProfile/Inputs/branch.prof

test/Transforms/SampleProfile/Inputs/fnptr.binprof

test/Transforms/SampleProfile/Inputs/fnptr.prof

test/Transforms/SampleProfile/branch.ll

test/Transforms/SampleProfile/calls.ll

test/Transforms/SampleProfile/discriminator.ll

test/Transforms/SampleProfile/entry_counts.ll

test/Transforms/SampleProfile/fnptr.ll

test/Transforms/SampleProfile/offset.ll

test/Transforms/SampleProfile/propagate.ll
