This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Use metadata info before any other heuristics
Closed · Public

Authored by skatkov on Mar 5 2017, 8:27 PM.


Details

Reviewers

chandlerc
sanjoy
vsk
junbuml

Commits

rG2616bbb16d8a: [BPI] Use metadata info before any other heuristics
rL300440: [BPI] Use metadata info before any other heuristics

Summary

Metadata is potentially more precise than any of the heuristics we use, so
it makes sense to use metadata first when it is available. However, it should still
be checked against other strong heuristics, such as the unreachable heuristic.
If the edge leading to an unreachable block has a higher probability than the
unreachable heuristic expects, we use the heuristic instead, and the remaining probability
is distributed equally among the other, reachable successors.
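The rule described above can be sketched in C++ as follows. This is a minimal, hypothetical model rather than the BPI code: Frac, lessThan and chooseProbabilities are made-up names, branch weights are assumed to be already scaled to fit in 32 bits, and the heuristic weights are assumed to be 1 (toward unreachable) and 2^20 - 1 (toward reachable), which matches the (1, 2^20) value used in this summary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Exact fraction N/D. Hypothetical helper, not llvm::BranchProbability.
struct Frac {
  uint64_t N, D;
};

// Exact A < B by cross-multiplication (values are small enough not to overflow).
bool lessThan(Frac A, Frac B) { return A.N * B.D < B.N * A.D; }

// Assumed unreachable-heuristic weights: taken (unreachable) vs. not taken.
const uint64_t kTaken = 1;
const uint64_t kNotTaken = 1024 * 1024 - 1;

// Returns one probability per successor. Metadata wins unless it makes some
// unreachable edge hotter than the heuristic allows; then the heuristic wins
// and the reachable edges share the rest equally.
std::vector<Frac> chooseProbabilities(const std::vector<uint32_t> &Weights,
                                      const std::vector<bool> &Unreachable) {
  assert(Weights.size() == Unreachable.size());
  uint64_t Sum = 0, NumUnreach = 0;
  for (size_t I = 0; I < Weights.size(); ++I) {
    Sum += Weights[I];
    NumUnreach += Unreachable[I] ? 1 : 0;
  }
  assert(Sum > 0 && Sum <= UINT32_MAX && "sketch assumes pre-scaled weights");
  uint64_t NumReach = Weights.size() - NumUnreach;

  // Probabilities exactly as the metadata states them.
  std::vector<Frac> Meta;
  for (uint32_t W : Weights)
    Meta.push_back({W, Sum});
  if (NumUnreach == 0 || NumReach == 0)
    return Meta; // nothing to compare against the heuristic

  Frac HeurUnreach{kTaken, (kTaken + kNotTaken) * NumUnreach};
  bool TooHot = false;
  for (size_t I = 0; I < Weights.size(); ++I)
    if (Unreachable[I] && lessThan(HeurUnreach, Meta[I]))
      TooHot = true;
  if (!TooHot)
    return Meta;

  // Fall back to the heuristic for every successor.
  Frac HeurReach{kNotTaken, (kTaken + kNotTaken) * NumReach};
  std::vector<Frac> Result;
  for (size_t I = 0; I < Weights.size(); ++I)
    Result.push_back(Unreachable[I] ? HeurUnreach : HeurReach);
  return Result;
}
```

Weights {0, 100} with the first successor unreachable are returned unchanged, because the metadata already treats that edge as colder than the heuristic does; weights {50, 50} fall back to 1/2^20 and (2^20 - 1)/2^20.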

An example where metadata can be stronger than the unreachable heuristic is as follows.
Suppose there are two branches. For branch A, the metadata says that the probability of
the first edge is (0, 2^25); for branch B, it is (1, 2^25).
So the expectation is that the first edge of B is hotter than the first edge of A,
because the first edge of A was never executed.
If the first edge of A points to an unreachable block, the unreachable heuristic sets
its probability to (1, 2^20), and now the edge of A becomes hotter than the edge of B.
This is unexpected behavior.
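The arithmetic of this example can be checked with exact fractions. The helper below is a hypothetical sketch, not LLVM code; it only restates the three probabilities quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Exact fraction N/D (hypothetical helper for illustration only).
struct Frac {
  uint64_t N, D;
};

// Exact A < B by cross-multiplication.
bool lessThan(Frac A, Frac B) { return A.N * B.D < B.N * A.D; }

// Edge probabilities from the example.
const Frac kMetaA{0, 1u << 25}; // first edge of A per metadata: never taken
const Frac kMetaB{1, 1u << 25}; // first edge of B per metadata: taken once
const Frac kHeurA{1, 1u << 20}; // value the unreachable heuristic gives A

// With metadata alone, A's edge is colder than B's, as expected.
bool metadataOrdersAColder() { return lessThan(kMetaA, kMetaB); }

// After the heuristic overrides A, the order flips: A's edge is hotter.
bool heuristicMakesAHotter() { return lessThan(kMetaB, kHeurA); }
```

Once the heuristic replaces A's value with 1/2^20, A's edge is 2^5 = 32 times more likely than B's, the inversion the summary describes.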

This fixes most of the problem reported in https://bugs.llvm.org/show_bug.cgi?id=32214.

Diff Detail

Event Timeline

skatkov created this revision. Mar 5 2017, 8:27 PM

The problem I see here is that calcUnreachableHeuristics computes PostDominatedByUnreachable, and if metadata is present we skip this computation. On the other hand, if metadata is present, it is better to use it. So the problem arises when metadata is present for some branches but not for all of them.

I haven't worked on this area much, but this seems like a reasonable change.

The test case should be stronger because it would also pass if the change from D30633 were applied. Maybe you could use branch weight metadata which states that Pr[entry -> deopt] = 1, then check that we actually report that.

@skatkov you had a concern that this patch would cause PostDominatedByUnreachable to not be computed as often. What kinds of problems would this cause?

Hi Vedant,
I picked you as a reviewer because you have touched this code.

Thank you for the review, and good point about the test case. I will update it after gathering a bit more review feedback.

As I understand it, PostDominatedByUnreachable is computed inside calcUnreachableHeuristics. If metadata is available for some block on the path that is post-dominated by an unreachable block, that block is handled in calcMetadataWeights and is never added to PostDominatedByUnreachable. Its predecessors then do not see it as post-dominated by unreachable, so the analysis is incomplete. The trouble happens when metadata is present, but not for every branch.

To resolve this, we can run calcUnreachableHeuristics, remember the result, and then force calcMetadataWeights to run and overwrite our heuristic probabilities. After calcMetadataWeights we can re-check the result of calcUnreachableHeuristics and bail out if either of them handled the block.
It is not clean code, but it works.

I've updated the test as Vedant suggested; I like it better.

Thanks for explaining. It looks like PostDominatedByUnreachable needs to be updated every time we visit a BB. I think we should factor out the logic that updates PostDominatedByUnreachable, and make sure that the update happens unconditionally every time a BB is visited. You could save some of the computations and forward them to calcUnreachableHeuristics; there is no need to overwrite any edge probabilities.
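A toy model of this suggestion, assuming an acyclic CFG visited successors-first (post order). Block, computePDU and the UpdateAlways flag are hypothetical; the free function updatePostDominatedByUnreachable here mirrors only the basic rule of the method with the same name in the diff (it ignores the invoke and deoptimize cases). With UpdateAlways set to false, a block whose probabilities come from metadata never reaches the set update, which is the gap described in the previous comments.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG node (hypothetical stand-in for llvm::BasicBlock).
struct Block {
  std::vector<int> Succs;         // indices of successor blocks
  bool EndsInUnreachable = false; // terminator is 'unreachable'
  bool HasMetadata = false;       // branch weights present
};

// A block joins the set if it ends in unreachable, or if all of its
// successors are already in the set.
void updatePostDominatedByUnreachable(const std::vector<Block> &CFG, int BB,
                                      std::set<int> &PDU) {
  const Block &B = CFG[BB];
  if (B.Succs.empty()) {
    if (B.EndsInUnreachable)
      PDU.insert(BB);
    return;
  }
  for (int S : B.Succs)
    if (!PDU.count(S))
      return;
  PDU.insert(BB);
}

// Visits blocks successors-first. UpdateAlways = true runs the set update for
// every block; false models skipping it when metadata already handled the block.
std::set<int> computePDU(const std::vector<Block> &CFG,
                         const std::vector<int> &PostOrder, bool UpdateAlways) {
  std::set<int> PDU;
  for (int BB : PostOrder) {
    if (!UpdateAlways && CFG[BB].HasMetadata)
      continue;
    updatePostDominatedByUnreachable(CFG, BB, PDU);
  }
  return PDU;
}
```

In a CFG where the entry branches to a metadata-annotated block whose two successors both end in unreachable, the unconditional update records that block as post-dominated by unreachable; the skipping variant does not, so the entry block loses that information.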

Please note that PostDominatedByColdCall has the same potential issue...

This is somewhat redundant computation if metadata is present everywhere. So the main question here, I would say, is whether metadata can be present for some branches but not for all. And if so, do we still want precise information?

Note that it is possible for some BBs to have metadata while others do not (e.g. with __builtin_expect). Your patch may break in those cases, because the PostDominatedByUnreachable computation is skipped with this change.

That is what I was talking about.
So it seems that it would be right if we compute the domination information for both PostDominatedByUnreachable and PostDominatedByColdCall for all BBs, correct?

In addition to the issue David pointed out, I don't understand the motivation yet.

It would be much more helpful to describe in the patch exactly what motivates the change so that we don't have to guess. =]

Relatedly, I think there are several heuristics that are actually more accurate than any metadata. For example, even if there is metadata that says code which is post-dominated by unreachable is hot, it seems much more likely that the metadata is wrong as we are *guaranteed* unreachable is, er, not reached. =] If this is the heuristic you're trying to change, I suspect that there is instead a bug in how we are computing it, and it isn't just about metadata being more reliable.

Hi Chandler, please take a look at my example from D30633. The story is the same: profiling data in metadata may tell us that the probability of the unreachable block is zero (which is more accurate than our heuristic), while we override this profiling data with our heuristic value, making the unreachable block hotter than the "normal" exit from the loop.

I tried to generalize the summary and avoid relying on a specific example. If you want, I can add an example to the description in the next version of the patch.

In general, to me, metadata is something the user of LLVM would like us to follow. I do not see any reason to override the user's choice in this case unless it breaks something. If the metadata is wrong, the user should fix the metadata; there is no need to fix it on our side.

skatkov updated this revision to Diff 90799. Mar 6 2017, 11:29 PM

skatkov edited the summary of this revision.

skatkov edited the summary of this revision. Mar 7 2017, 4:17 AM

FYI: Serguei is going to file an upstream bug with a clear illustration of where loop rotation goes wrong due to the issue identified here. Essentially, for a sufficiently long-running loop, the static heuristic for unreached blocks is not biased strongly enough. In our case, we have branch weights specified which are more strongly biased than the static heuristic result. Using the static heuristic by itself is clearly wrong, but I do see Chandler's point about the static heuristics providing useful information. Possibly we should be using the stronger of the two sources of information?

I have prepared an example illustrating the bad loop rotation behavior in the block-placement pass caused by incorrect BPI behavior, to file as a bug, but I do not have a Bugzilla account. I have requested an account, and as soon as I get it I will file the bug.

I will add an option that makes the unreachable heuristic run first, before metadata.

Added an option to apply the unreachable heuristic first.
Added a test for the option.
Split updatePostDominated for clarity.

I still do not have a Bugzilla account. I will file the bug as soon as I get one.

Thanks to Artur, who filed the bug on my behalf because I still have not received an account: https://bugs.llvm.org/show_bug.cgi?id=32214. The bug demonstrates the unexpected BPI behavior. Please take a look.

Hi, is there anything more I can do to make progress?

Chandler, any comments here?

Serguei and I talked offline a bit about your concerns. He's going to post a patch which uses the minimum frequency computed from either the static heuristic or the profile data for a block ending in unreachable. That seems like it addresses your concern to me, do you agree?

Please review. To simplify the review, I can split the patch in two: a refactoring of how post-domination information is collected, and the fix itself. Please let me know if that makes sense.

Given lack of response from Chandler following the update from Serguei, I am going to move forward with the review of this patch. I do not intend to hold the patch any longer for Chandler's response. Note that Serguei made one major change in the approach: rather than having the metadata weight unconditionally win, he now has the patch structured so that a branch to unreachable takes the *minimum* frequency produced by either the static heuristic or the metadata.
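Under that revised approach, the probability of an edge into an unreachable block is the smaller of the two estimates. A minimal sketch, assuming exact fractions and heuristic weights of 1 and 2^20 - 1; Frac, lessThan and unreachableEdgeProb are hypothetical names, not BPI code.

```cpp
#include <cassert>
#include <cstdint>

// Exact fraction N/D (hypothetical helper, not llvm::BranchProbability).
struct Frac {
  uint64_t N, D;
};

// Exact A < B by cross-multiplication.
bool lessThan(Frac A, Frac B) { return A.N * B.D < B.N * A.D; }

// Probability for one edge into an unreachable-terminated block: the colder of
// the metadata value and the static heuristic. The heuristic uses assumed
// weights 1 (taken) and 2^20 - 1 (not taken), shared among all such edges.
Frac unreachableEdgeProb(Frac FromMetadata, uint64_t NumUnreachableEdges) {
  assert(NumUnreachableEdges > 0);
  const uint64_t Taken = 1, NotTaken = 1024 * 1024 - 1;
  Frac Heuristic{Taken, (Taken + NotTaken) * NumUnreachableEdges};
  return lessThan(FromMetadata, Heuristic) ? FromMetadata : Heuristic;
}
```

Metadata of 0/2^25 is kept because it is already colder than the heuristic, while metadata of 1/2 is lowered to 1/2^20.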

In D30631#716486, @skatkov wrote:

Please review. To simplify the review, I potentially can split the patch to two ones: refactoring of collection of post domination information and fix itself. Please let me know if it makes sense.

Serguei, please split off the refactoring patch. It will make my life much easier as the reviewer.

Also, please update the description of this review thread to make it clear we're taking the minimum of the static heuristic and the metadata. The current description reflects the original patch, not the updated one.

In D30631#716698, @reames wrote:

Given lack of response from Chandler following the update from Serguei

FWIW, I was travelling back to the US. Sorry for the delay. I should have a response to this patch today or tomorrow at the latest.

The refactoring part has been split out into https://reviews.llvm.org/D31701.
This revision now contains only the fix. Please review.

skatkov added a parent revision: D31701: [BPI] Refactor post domination calculation and simple fix for ColdCall. Apr 5 2017, 3:24 AM

skatkov added a child revision: D31704: [BPI] NFC: reorder ifs to bail out earlier. Apr 5 2017, 3:39 AM

First off, thanks for the new approach. I like this direction a lot. Some more tactical comments here.

lib/Analysis/BranchProbabilityInfo.cpp
327–332	There is a lot of code here. I wonder, is it possible to share the logic here with the logic above that is used in the absence of metadata?
334	To avoid re-hitting this set for every successor, you could above append the successor indices that are in this set to a list, and then loop over that list here. The size of the list would still give you the count of unreachable successors vs. reachable.
336–339	I feel like it would be nicer to just adjust the weight downward such that the probability is essentially the minimum of the two sources of information. That way we don't lose the metadata's weights for the different successors that don't go to unreachable. Consider the test case (in pseudo C code):

    for (...) {
      switch (cond) {
      default:
        unreachable
      case 2: // HOT
        // something tiny
        continue;
      case 3: // COLD
        // huge pile of ugly code
        continue;
      }
    }

If, for whatever reason, we end up with one sample in the metadata going to unreachable, we'll completely lose the metadata that distinguishes between hot and cold here. Does that make sense?
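The suggestion in this comment can be sketched as follows: lower only the unreachable edges and keep the metadata-derived values of the reachable ones, adding the removed probability back to them in equal shares. This is an illustrative sketch under stated assumptions, not the committed code: probabilities are hypothetical integer numerators over a fixed denominator of 2^31, the cap is assumed to be the heuristic's 1/2^20, and capUnreachable, kDenom and kUnreachableCap are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-point representation: probability = numerator / kDenom.
const uint64_t kDenom = 1ull << 31;
// Assumed cap for an edge into unreachable: the heuristic's 1/2^20.
const uint64_t kUnreachableCap = kDenom >> 20;

std::vector<uint64_t> capUnreachable(std::vector<uint64_t> BP,
                                     const std::vector<bool> &Unreachable) {
  assert(BP.size() == Unreachable.size());
  std::vector<size_t> ReachableIdxs;
  for (size_t I = 0; I < BP.size(); ++I)
    if (!Unreachable[I])
      ReachableIdxs.push_back(I);
  // With no reachable successor there is nowhere to move probability.
  if (ReachableIdxs.empty())
    return BP;

  // Lower each too-hot unreachable edge to the cap, remembering the excess.
  uint64_t ToDistribute = 0;
  for (size_t I = 0; I < BP.size(); ++I)
    if (Unreachable[I] && BP[I] > kUnreachableCap) {
      ToDistribute += BP[I] - kUnreachableCap;
      BP[I] = kUnreachableCap;
    }

  // Add an equal share to every reachable edge on top of its metadata value;
  // the integer-division leftover goes to the first reachable edge.
  uint64_t PerEdge = ToDistribute / ReachableIdxs.size();
  for (size_t I : ReachableIdxs) {
    BP[I] += PerEdge;
    ToDistribute -= PerEdge;
  }
  BP[ReachableIdxs[0]] += ToDistribute;
  return BP;
}
```

For the switch above, a default (unreachable) edge with probability 1/4, a hot case with 1/2 and a cold case with 1/4 come out as 1/2^20 for the default edge, with the hot case still clearly ahead of the cold one and the total unchanged.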

skatkov updated this revision to Diff 94647. Apr 10 2017, 12:51 AM

skatkov marked 2 inline comments as done. Apr 10 2017, 12:57 AM

skatkov added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
336–339	It is possible; however, I tried to follow simpler logic here. The main question is: if we do not trust that the metadata represents the value for the unreachable edge correctly (and fix it by adjusting the weight downward), why do we trust that the data for the hot/cold edges is valid and continue using it? However, if you still insist, I would propose creating a follow-up patch implementing this approach and leaving this patch as is. Is that OK with you?

chandlerc added inline comments. Apr 10 2017, 2:31 PM

lib/Analysis/BranchProbabilityInfo.cpp
336–339	It's not that I don't trust the metadata edge, it's about what is the strongest signal to the optimizer. When we have an unreachable, we don't need to wonder about what the metadata says because we have a control flow reason to know we shouldn't optimize that path. It isn't that the metadata is definitely wrong or bad, it is that the CFG analysis is definitely sufficient. So we shouldn't throw out the metadata for the reachable successors IMO. I think it would be most clear to do it in this patch. Is there a problem with doing that?

It will make the patch more complex, but OK, I'll do that.

Hi Chandler, please review. I've also added a couple of new tests for switch case.

Hi Chandler, could you please take a look into the last version where I addressed your concern?

Sorry I couldn't get back to it sooner, first chance I had.

However, this looks really, really nice. Thanks for seeing it all the way through. I love the test cases where we nicely zero out the unreachable bits of the switch but leave the clear hot path based on metadata.

Some really minor code suggestions below. Feel free to land with those.

lib/Analysis/BranchProbabilityInfo.cpp
240	Didn't this get factored out into a separate patch? Not a big deal, but seems like a clear thing to factor out.
336–347	Lift all of this into the if for there being some unreachable and some reachable successors? Just seems worth skipping the ToDistribute checks in the case where none of this matters.
343	Is it better to do this in the loop or to multiply by size and subtract that once? It seems simpler to write the latter way inside the addition below: BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
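The two ways of handling the division leftover raised in this comment produce the same result. A small sketch with plain integers standing in for BranchProbability numerators (which, as noted later in this thread, cannot be multiplied by a scalar); distributeLoop and distributeOnce are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Loop form: subtract each share from ToDistribute as it is handed out,
// then give whatever is left to the first reachable edge.
std::vector<uint64_t> distributeLoop(std::vector<uint64_t> BP,
                                     const std::vector<size_t> &ReachableIdxs,
                                     uint64_t ToDistribute) {
  assert(!ReachableIdxs.empty());
  uint64_t PerEdge = ToDistribute / ReachableIdxs.size();
  for (size_t I : ReachableIdxs) {
    BP[I] += PerEdge;
    ToDistribute -= PerEdge;
  }
  BP[ReachableIdxs[0]] += ToDistribute;
  return BP;
}

// Suggested form: compute the leftover once with a multiplication.
std::vector<uint64_t> distributeOnce(std::vector<uint64_t> BP,
                                     const std::vector<size_t> &ReachableIdxs,
                                     uint64_t ToDistribute) {
  assert(!ReachableIdxs.empty());
  uint64_t PerEdge = ToDistribute / ReachableIdxs.size();
  for (size_t I : ReachableIdxs)
    BP[I] += PerEdge;
  BP[ReachableIdxs[0]] += ToDistribute - PerEdge * ReachableIdxs.size();
  return BP;
}
```

Splitting 7 between two reachable edges gives each a share of 3 and gives the leftover 1 to the first reachable edge in both versions.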

This revision is now accepted and ready to land. Apr 14 2017, 12:19 AM

Thank you, Chandler, for your time!

skatkov marked an inline comment as done. Apr 14 2017, 12:26 AM

skatkov added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
240	It will be in the next patch, which you have already reviewed, but I made that patch depend on this one, so I will handle it after this patch lands.
336–347	ok
343	Will do.

skatkov added inline comments. Apr 14 2017, 1:15 AM

lib/Analysis/BranchProbabilityInfo.cpp
343	Funny, BranchProbability does not have a multiplication-by-scalar operation... I will leave it as is for now and upload one more patch implementing BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size()); BTW, I guess the compiler should optimize it anyway and move ToDistribute -= PerEdge; out of the loop. But who knows :)

Two comments addressed. I will not submit it until Monday.

Chandler, if you have a chance please let me know if you are ok with my suggestion to update
BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
in a follow-up patch.

Closed by commit rL300440: [BPI] Use metadata info before any other heuristics (authored by skatkov). Apr 16 2017, 9:45 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                             Size
include/llvm/Analysis/BranchProbabilityInfo.h    2 lines
lib/Analysis/BranchProbabilityInfo.cpp           163 lines
test/Analysis/BranchProbabilityInfo/basic.ll     109 lines

Diff 93830

include/llvm/Analysis/BranchProbabilityInfo.h

[… 158 unchanged lines not shown (context: private:) …]
const Function *LastF;		const Function *LastF;

/// \brief Track the set of blocks directly succeeded by a returning block.		/// \brief Track the set of blocks directly succeeded by a returning block.
SmallPtrSet<const BasicBlock *, 16> PostDominatedByUnreachable;		SmallPtrSet<const BasicBlock *, 16> PostDominatedByUnreachable;

/// \brief Track the set of blocks that always lead to a cold call.		/// \brief Track the set of blocks that always lead to a cold call.
SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;		SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;

		void updatePostDominatedByUnreachable(const BasicBlock *BB);
		void updatePostDominatedByColdCall(const BasicBlock *BB);
bool calcUnreachableHeuristics(const BasicBlock *BB);		bool calcUnreachableHeuristics(const BasicBlock *BB);
bool calcMetadataWeights(const BasicBlock *BB);		bool calcMetadataWeights(const BasicBlock *BB);
bool calcColdCallHeuristics(const BasicBlock *BB);		bool calcColdCallHeuristics(const BasicBlock *BB);
bool calcPointerHeuristics(const BasicBlock *BB);		bool calcPointerHeuristics(const BasicBlock *BB);
bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);		bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);
bool calcZeroHeuristics(const BasicBlock *BB);		bool calcZeroHeuristics(const BasicBlock *BB);
bool calcFloatingPointHeuristics(const BasicBlock *BB);		bool calcFloatingPointHeuristics(const BasicBlock *BB);
bool calcInvokeHeuristics(const BasicBlock *BB);		bool calcInvokeHeuristics(const BasicBlock *BB);
[… 50 unchanged lines not shown …]

lib/Analysis/BranchProbabilityInfo.cpp

[… 102 unchanged lines not shown …]
static const uint32_t IH_TAKEN_WEIGHT = 1024 * 1024 - 1;		static const uint32_t IH_TAKEN_WEIGHT = 1024 * 1024 - 1;

/// \brief Invoke-terminating normal branch not-taken weight.		/// \brief Invoke-terminating normal branch not-taken weight.
///		///
/// This is the weight for branching to the unwind destination of an invoke		/// This is the weight for branching to the unwind destination of an invoke
/// instruction. This is essentially never taken.		/// instruction. This is essentially never taken.
static const uint32_t IH_NONTAKEN_WEIGHT = 1;		static const uint32_t IH_NONTAKEN_WEIGHT = 1;

/// \brief Calculate edge weights for successors lead to unreachable.		/// \brief Add \p BB to PostDominatedByUnreachable set if applicable.
///		void
/// Predict that a successor which leads necessarily to an		BranchProbabilityInfo::updatePostDominatedByUnreachable(const BasicBlock *BB) {
/// unreachable-terminated block as extremely unlikely.
bool BranchProbabilityInfo::calcUnreachableHeuristics(const BasicBlock *BB) {
const TerminatorInst *TI = BB->getTerminator();		const TerminatorInst *TI = BB->getTerminator();
if (TI->getNumSuccessors() == 0) {		if (TI->getNumSuccessors() == 0) {
if (isa<UnreachableInst>(TI) ||		if (isa<UnreachableInst>(TI) ||
// If this block is terminated by a call to		// If this block is terminated by a call to
// @llvm.experimental.deoptimize then treat it like an unreachable since		// @llvm.experimental.deoptimize then treat it like an unreachable since
// the @llvm.experimental.deoptimize call is expected to practically		// the @llvm.experimental.deoptimize call is expected to practically
// never execute.		// never execute.
BB->getTerminatingDeoptimizeCall())		BB->getTerminatingDeoptimizeCall())
PostDominatedByUnreachable.insert(BB);		PostDominatedByUnreachable.insert(BB);
return false;		return;
		}

		// If the terminator is an InvokeInst, check only the normal destination block
		// as the unwind edge of InvokeInst is also very unlikely taken.
		if (auto *II = dyn_cast<InvokeInst>(TI)) {
		if (PostDominatedByUnreachable.count(II->getNormalDest())) {
		PostDominatedByUnreachable.insert(BB);
		}
		return;
}		}

		for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
		// If any of successor is not post dominated then BB is also not.
		if (!PostDominatedByUnreachable.count(*I))
		return;
		}
		PostDominatedByUnreachable.insert(BB);
		}

		/// \brief Add \p BB to PostDominatedByColdCall set if applicable.
		void
		BranchProbabilityInfo::updatePostDominatedByColdCall(const BasicBlock *BB) {
		const TerminatorInst *TI = BB->getTerminator();
		if (TI->getNumSuccessors() == 0)
		return;

		// If the terminator is an InvokeInst, check only the normal destination block
		// as the unwind edge of InvokeInst is also very unlikely taken.
		if (auto *II = dyn_cast<InvokeInst>(TI)) {
		if (PostDominatedByColdCall.count(II->getNormalDest())) {
		PostDominatedByColdCall.insert(BB);
		}
		return;
		}

		bool MarkColdCall = true;
		for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
		// If any of successor is not post dominated then BB is also not.
		if (!PostDominatedByColdCall.count(*I)) {
		MarkColdCall = false;
		break;
		}
		}
		if (MarkColdCall) {
		PostDominatedByColdCall.insert(BB);
		} else {
		// Otherwise, if the block itself contains a cold function, add it to the
		// set of blocks post-dominated by a cold call.
		assert(!PostDominatedByColdCall.count(BB));
		for (BasicBlock::const_iterator I = BB->begin(), E = BB->end(); I != E; ++I)
		if (const CallInst *CI = dyn_cast<CallInst>(I))
		if (CI->hasFnAttr(Attribute::Cold)) {
		PostDominatedByColdCall.insert(BB);
		break;
		}
		}
		}

		/// \brief Calculate edge weights for successors lead to unreachable.
		///
		/// Predict that a successor which leads necessarily to an
		/// unreachable-terminated block as extremely unlikely.
		bool BranchProbabilityInfo::calcUnreachableHeuristics(const BasicBlock *BB) {
		const TerminatorInst *TI = BB->getTerminator();
		if (TI->getNumSuccessors() == 0)
		return false;

		// Return false here so that edge weights for InvokeInst could be decided
		// in calcInvokeHeuristics().
		if (isa<InvokeInst>(TI))
		return false;

SmallVector<unsigned, 4> UnreachableEdges;		SmallVector<unsigned, 4> UnreachableEdges;
SmallVector<unsigned, 4> ReachableEdges;		SmallVector<unsigned, 4> ReachableEdges;

for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {		for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
if (PostDominatedByUnreachable.count(*I))		if (PostDominatedByUnreachable.count(*I))
UnreachableEdges.push_back(I.getSuccessorIndex());		UnreachableEdges.push_back(I.getSuccessorIndex());
else		else
ReachableEdges.push_back(I.getSuccessorIndex());		ReachableEdges.push_back(I.getSuccessorIndex());
}		}

// If all successors are in the set of blocks post-dominated by unreachable,
// this block is too.
if (UnreachableEdges.size() == TI->getNumSuccessors())
PostDominatedByUnreachable.insert(BB);

// Skip probabilities if this block has a single successor or if all were		// Skip probabilities if this block has a single successor or if all were
// reachable.		// reachable.
if (TI->getNumSuccessors() == 1 || UnreachableEdges.empty())		if (TI->getNumSuccessors() == 1 || UnreachableEdges.empty())
return false;		return false;

// If the terminator is an InvokeInst, check only the normal destination block
// as the unwind edge of InvokeInst is also very unlikely taken.
if (auto *II = dyn_cast<InvokeInst>(TI))
if (PostDominatedByUnreachable.count(II->getNormalDest())) {
PostDominatedByUnreachable.insert(BB);
// Return false here so that edge weights for InvokeInst could be decided
// in calcInvokeHeuristics().
return false;
}

if (ReachableEdges.empty()) {		if (ReachableEdges.empty()) {
BranchProbability Prob(1, UnreachableEdges.size());		BranchProbability Prob(1, UnreachableEdges.size());
for (unsigned SuccIdx : UnreachableEdges)		for (unsigned SuccIdx : UnreachableEdges)
setEdgeProbability(BB, SuccIdx, Prob);		setEdgeProbability(BB, SuccIdx, Prob);
return true;		return true;
}		}

auto UnreachableProb = BranchProbability::getBranchProbability(		auto UnreachableProb = BranchProbability::getBranchProbability(
UR_TAKEN_WEIGHT, (UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) *		UR_TAKEN_WEIGHT, (UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) *
uint64_t(UnreachableEdges.size()));		uint64_t(UnreachableEdges.size()));
auto ReachableProb = BranchProbability::getBranchProbability(		auto ReachableProb = BranchProbability::getBranchProbability(
UR_NONTAKEN_WEIGHT,		UR_NONTAKEN_WEIGHT,
(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * uint64_t(ReachableEdges.size()));		(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * uint64_t(ReachableEdges.size()));

for (unsigned SuccIdx : UnreachableEdges)		for (unsigned SuccIdx : UnreachableEdges)
setEdgeProbability(BB, SuccIdx, UnreachableProb);		setEdgeProbability(BB, SuccIdx, UnreachableProb);
for (unsigned SuccIdx : ReachableEdges)		for (unsigned SuccIdx : ReachableEdges)
setEdgeProbability(BB, SuccIdx, ReachableProb);		setEdgeProbability(BB, SuccIdx, ReachableProb);

return true;		return true;
}		}

// Propagate existing explicit probabilities from either profile data or		// Propagate existing explicit probabilities from either profile data or
// 'expect' intrinsic processing.		// 'expect' intrinsic processing. Examine metadata against unreachable
		// heuristic. If probability of the edge coming to unreachable block is
		// higher than it would be according to unreachable heuristic then metadata is
		// ignored.
bool BranchProbabilityInfo::calcMetadataWeights(const BasicBlock *BB) {		bool BranchProbabilityInfo::calcMetadataWeights(const BasicBlock *BB) {
const TerminatorInst *TI = BB->getTerminator();		const TerminatorInst *TI = BB->getTerminator();
if (TI->getNumSuccessors() == 1)		if (TI->getNumSuccessors() == 1)
return false;		return false;
if (!isa<BranchInst>(TI) && !isa<SwitchInst>(TI))		if (!isa<BranchInst>(TI) && !isa<SwitchInst>(TI))
return false;		return false;

MDNode *WeightsNode = TI->getMetadata(LLVMContext::MD_prof);		MDNode *WeightsNode = TI->getMetadata(LLVMContext::MD_prof);
if (!WeightsNode)		if (!WeightsNode)
return false;		return false;

[… 24 unchanged lines not shown (context: calcMetadataWeights) …]
assert(Weights.size() == TI->getNumSuccessors() && "Checked above");		assert(Weights.size() == TI->getNumSuccessors() && "Checked above");

// If the sum of weights does not fit in 32 bits, scale every weight down		// If the sum of weights does not fit in 32 bits, scale every weight down
// accordingly.		// accordingly.
uint64_t ScalingFactor =		uint64_t ScalingFactor =
(WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;		(WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;

WeightSum = 0;		WeightSum = 0;
		uint64_t UnreachableCount = 0;
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {
		if (PostDominatedByUnreachable.count(TI->getSuccessor(i)))
		UnreachableCount++;
Weights[i] /= ScalingFactor;		Weights[i] /= ScalingFactor;
WeightSum += Weights[i];		WeightSum += Weights[i];
}		}

		// Examine the metadata against unreachable heuristic.
		if (UnreachableCount > 0) {
		auto UnreachableProb =
		UnreachableCount == TI->getNumSuccessors()
		? BranchProbability::getBranchProbability(1, TI->getNumSuccessors())
		: BranchProbability::getBranchProbability(
		UR_TAKEN_WEIGHT,
		(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * UnreachableCount);
		if (WeightSum == 0) {
		if (UnreachableProb <
		BranchProbability::getBranchProbability(1, TI->getNumSuccessors()))
		return false;
		} else {
		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {
		if (!PostDominatedByUnreachable.count(TI->getSuccessor(i)))
		continue;
		if (UnreachableProb < BranchProbability::getBranchProbability(
		Weights[i], static_cast<uint32_t>(WeightSum)))
		return false;
		}
		}
		}

if (WeightSum == 0) {		if (WeightSum == 0) {
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
setEdgeProbability(BB, i, {1, e});		setEdgeProbability(BB, i, {1, e});
} else {		} else {
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
setEdgeProbability(BB, i, {Weights[i], static_cast<uint32_t>(WeightSum)});		setEdgeProbability(BB, i, {Weights[i], static_cast<uint32_t>(WeightSum)});
}		}

assert(WeightSum <= UINT32_MAX &&		assert(WeightSum <= UINT32_MAX &&
"Expected weights to scale down to 32 bits");		"Expected weights to scale down to 32 bits");

return true;		return true;
}		}

/// \brief Calculate edge weights for edges leading to cold blocks.		/// \brief Calculate edge weights for edges leading to cold blocks.
///		///
/// A cold block is one post-dominated by a block with a call to a		/// A cold block is one post-dominated by a block with a call to a
/// cold function. Those edges are unlikely to be taken, so we give		/// cold function. Those edges are unlikely to be taken, so we give
/// them relatively low weight.		/// them relatively low weight.
///		///
/// Return true if we could compute the weights for cold edges.		/// Return true if we could compute the weights for cold edges.
/// Return false, otherwise.		/// Return false, otherwise.
bool BranchProbabilityInfo::calcColdCallHeuristics(const BasicBlock *BB) {		bool BranchProbabilityInfo::calcColdCallHeuristics(const BasicBlock *BB) {
const TerminatorInst *TI = BB->getTerminator();		const TerminatorInst *TI = BB->getTerminator();
if (TI->getNumSuccessors() == 0)		if (TI->getNumSuccessors() == 0)
return false;		return false;

		// Return false here so that edge weights for InvokeInst could be decided
+  // in calcInvokeHeuristics().
+  if (isa<InvokeInst>(TI))
+    return false;

chandlerc: Is it better to do this in the loop, or to multiply by the size and subtract once? It seems simpler to write it the latter way inside the addition below: BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
skatkov (Author): Will do.
skatkov (Author): Funny, BranchProbability does not have a multiplication-by-scalar operation... I will leave it as is for now and upload one more patch implementing BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size()); BTW, I guess the compiler should optimize it anyway and move ToDistribute -= PerEdge; out of the loop. But who knows :)
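The single-subtraction form discussed in this thread can be sketched with plain integers. As skatkov notes, BranchProbability had no multiplication by a scalar. The sketch assumes probabilities are uint32_t numerators over 2^31. BP, ReachableIdxs, ToDistribute and PerEdge follow the names in the snippet, but the free function distributeExcess is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch only: BP holds one numerator (over 2^31) per successor, standing in
// for llvm::BranchProbability. ReachableIdxs lists successors that do not
// lead to unreachable. ToDistribute is the probability mass taken away from
// the unreachable edges.
void distributeExcess(std::vector<uint32_t> &BP,
                      const std::vector<unsigned> &ReachableIdxs,
                      uint32_t ToDistribute) {
  assert(!ReachableIdxs.empty() && "need at least one reachable successor");
  const uint32_t N = static_cast<uint32_t>(ReachableIdxs.size());
  const uint32_t PerEdge = ToDistribute / N;
  for (unsigned Idx : ReachableIdxs)
    BP[Idx] += PerEdge;
  // Hand the rounding remainder to the first reachable edge in one step,
  // instead of decrementing ToDistribute inside the loop.
  BP[ReachableIdxs[0]] += ToDistribute - PerEdge * N;
}
```

For example, take BP = {10, 20, 30}, reachable successors {1, 2} and ToDistribute = 7. Each reachable edge gets 3, and the leftover 1 goes to successor 1, giving {10, 24, 33}.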
   // Determine which successors are post-dominated by a cold block.
   SmallVector<unsigned, 4> ColdEdges;
   SmallVector<unsigned, 4> NormalEdges;
   for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I)
chandlerc: Lift all of this into the if for there being some unreachable and some reachable successors? It just seems worth skipping the ToDistribute checks in the case where none of this matters.
skatkov (Author): ok
     if (PostDominatedByColdCall.count(*I))
       ColdEdges.push_back(I.getSuccessorIndex());
     else
       NormalEdges.push_back(I.getSuccessorIndex());

-  // If all successors are in the set of blocks post-dominated by cold calls,
-  // this block is in the set post-dominated by cold calls.
-  if (ColdEdges.size() == TI->getNumSuccessors())
-    PostDominatedByColdCall.insert(BB);
-  else {
-    // Otherwise, if the block itself contains a cold function, add it to the
-    // set of blocks postdominated by a cold call.
-    assert(!PostDominatedByColdCall.count(BB));
-    for (BasicBlock::const_iterator I = BB->begin(), E = BB->end(); I != E; ++I)
-      if (const CallInst *CI = dyn_cast<CallInst>(I))
-        if (CI->hasFnAttr(Attribute::Cold)) {
-          PostDominatedByColdCall.insert(BB);
-          break;
-        }
-  }
-
-  if (auto *II = dyn_cast<InvokeInst>(TI)) {
-    // If the terminator is an InvokeInst, consider only the normal destination
-    // block.
-    if (PostDominatedByColdCall.count(II->getNormalDest()))
-      PostDominatedByColdCall.insert(BB);
-    // Return false here so that edge weights for InvokeInst could be decided
-    // in calcInvokeHeuristics().
-    return false;
-  }

   // Skip probabilities if this block has a single successor.
   if (TI->getNumSuccessors() == 1 || ColdEdges.empty())
     return false;

   if (NormalEdges.empty()) {
     BranchProbability Prob(1, ColdEdges.size());
     for (unsigned SuccIdx : ColdEdges)
       setEdgeProbability(BB, SuccIdx, Prob);
[366 lines not shown]
 void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI) {
   LastF = &F; // Store the last function we ran on for printing.
   assert(PostDominatedByUnreachable.empty());
   assert(PostDominatedByColdCall.empty());

   // Walk the basic blocks in post-order so that we can build up state about
   // the successors of a block iteratively.
   for (auto BB : post_order(&F.getEntryBlock())) {
     DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");
-    if (calcUnreachableHeuristics(BB))
-      continue;
+    updatePostDominatedByUnreachable(BB);
+    updatePostDominatedByColdCall(BB);
     if (calcMetadataWeights(BB))
       continue;
+    if (calcUnreachableHeuristics(BB))
+      continue;
     if (calcColdCallHeuristics(BB))
       continue;
     if (calcLoopBranchHeuristics(BB, LI))
       continue;
     if (calcPointerHeuristics(BB))
       continue;
     if (calcZeroHeuristics(BB))
       continue;
[44 lines not shown]
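calculate() updates the post-domination sets before any heuristic can continue. This matters for blocks that carry metadata, as a toy model of the test_cold_call_sites_with_prof function from basic.ll shows.

The Block and CFG types and all helper functions are hypothetical. The model keeps only the cold-call set. It omits:

- invokes,
- the unreachable set,
- the unreachable heuristic that ran first in the old loop.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy CFG block (hypothetical, not LLVM's). It records:
// - successor names,
// - whether the block calls a function marked cold,
// - whether its branch carries !prof metadata.
struct Block {
  std::vector<std::string> Succs;
  bool HasColdCall = false;
  bool HasMetadata = false;
};
using CFG = std::map<std::string, Block>;

Block makeBlock(std::vector<std::string> Succs, bool ColdCall = false,
                bool Metadata = false) {
  Block B;
  B.Succs = std::move(Succs);
  B.HasColdCall = ColdCall;
  B.HasMetadata = Metadata;
  return B;
}

// Simplified stand-in for the set update. A block joins the set if all of
// its successors are already in it, or if it contains a cold call. Blocks
// without successors are ignored in this toy.
void updateColdSet(const CFG &G, const std::string &BB,
                   std::set<std::string> &Cold) {
  const Block &B = G.at(BB);
  if (B.Succs.empty())
    return;
  bool AllSuccsCold = true;
  for (const std::string &S : B.Succs)
    if (!Cold.count(S))
      AllSuccsCold = false;
  if (AllSuccsCold || B.HasColdCall)
    Cold.insert(BB);
}

// UpdateFirst == true models the new loop: update the set, then run the
// heuristics. UpdateFirst == false models the old loop, where a successful
// calcMetadataWeights(BB) hit `continue` before the cold-call update ran.
std::set<std::string> coldSet(const CFG &G,
                              const std::vector<std::string> &PostOrder,
                              bool UpdateFirst) {
  std::set<std::string> Cold;
  for (const std::string &BB : PostOrder) {
    if (!UpdateFirst && G.at(BB).HasMetadata)
      continue; // old ordering: metadata short-circuits the update
    updateColdSet(G, BB, Cold);
  }
  return Cold;
}

// CFG of test_cold_call_sites_with_prof: "then" carries !prof !7 and
// "join" calls @coldfunc.
CFG coldCallTestCFG() {
  CFG G;
  G["entry"] = makeBlock({"then", "else"});
  G["then"] = makeBlock({"then2", "else2"}, false, true);
  G["then2"] = makeBlock({"join"});
  G["else2"] = makeBlock({"join"});
  G["join"] = makeBlock({"exit"}, true);
  G["else"] = makeBlock({"exit"});
  G["exit"] = makeBlock({});
  return G;
}

// Post-order from entry, visiting successors in the listed order.
const std::vector<std::string> kPostOrder = {"exit", "join", "then2", "else2",
                                             "then", "else", "entry"};
```

With the new ordering, then joins the set even though its branch carries !prof !7, so the edge entry -> then can still be treated as cold. With the old ordering, the metadata check skips then, and the cold call in join no longer propagates up to entry.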

test/Analysis/BranchProbabilityInfo/basic.ll

[135 lines not shown]
else:
br label %exit
; CHECK: edge else -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

exit:
%result = phi i32 [ %a, %then ], [ %b, %else ]
ret i32 %result
}

		define i32 @test_unreachable_with_prof_greater(i32 %a, i32 %b) {
		; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_greater'
		entry:
		%cond = icmp eq i32 %a, 42
		br i1 %cond, label %exit, label %unr, !prof !3

		; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
		; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

		unr:
		unreachable

		exit:
		ret i32 %b
		}

		!3 = !{!"branch_weights", i32 0, i32 1}

		define i32 @test_unreachable_with_prof_equal(i32 %a, i32 %b) {
		; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_equal'
		entry:
		%cond = icmp eq i32 %a, 42
		br i1 %cond, label %exit, label %unr, !prof !4

		; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
		; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

		unr:
		unreachable

		exit:
		ret i32 %b
		}

		!4 = !{!"branch_weights", i32 1048575, i32 1}

		define i32 @test_unreachable_with_prof_zero(i32 %a, i32 %b) {
		; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_zero'
		entry:
		%cond = icmp eq i32 %a, 42
		br i1 %cond, label %exit, label %unr, !prof !5

		; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
		; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

		unr:
		unreachable

		exit:
		ret i32 %b
		}

		!5 = !{!"branch_weights", i32 0, i32 0}

		define i32 @test_unreachable_with_prof_less(i32 %a, i32 %b) {
		; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_less'
		entry:
		%cond = icmp eq i32 %a, 42
		br i1 %cond, label %exit, label %unr, !prof !6

		; CHECK: edge entry -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]
		; CHECK: edge entry -> unr probability is 0x00000000 / 0x80000000 = 0.00%

		unr:
		unreachable

		exit:
		ret i32 %b
		}

		!6 = !{!"branch_weights", i32 1, i32 0}
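The four test_unreachable_with_prof_* functions above exercise the rule from the summary: the metadata is used unless it gives the edge to unr a higher probability than the unreachable heuristic does. The following is a minimal sketch of that rule for a two-way branch. It assumes:

- heuristic weights of 1 for the unreachable edge and 2^20 - 1 for the other edge, chosen because they reproduce the 0x800 numerator in the CHECK lines;
- round-to-nearest conversion to a numerator over 2^31.

Treating all-zero weights as "no information" is a simplification. It was chosen only because it matches the expected output of test_unreachable_with_prof_zero.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kDenom = 1ull << 31; // 0x80000000

// Converts the ratio Num/Den to a numerator over 2^31, rounding to nearest.
uint32_t toNumerator(uint64_t Num, uint64_t Den) {
  return static_cast<uint32_t>((Num * kDenom + Den / 2) / Den);
}

// Assumed unreachable-heuristic weights (see lead-in).
constexpr uint64_t kURTaken = 1, kURNotTaken = (1u << 20) - 1;

// Numerator for the entry -> unr edge of
//   br i1 %cond, label %exit, label %unr, !prof !{..., ExitW, UnrW}
uint32_t unreachableEdgeNumerator(uint32_t ExitW, uint32_t UnrW) {
  const uint32_t Heuristic = toNumerator(kURTaken, kURTaken + kURNotTaken);
  const uint64_t Sum = uint64_t(ExitW) + UnrW;
  if (Sum == 0)
    return Heuristic; // no usable metadata in this sketch
  const uint32_t FromMetadata = toNumerator(UnrW, Sum);
  // Keep the metadata unless it makes unr likelier than the heuristic.
  return FromMetadata < Heuristic ? FromMetadata : Heuristic;
}
```

The exit edge receives the rest. For test_unreachable_with_prof_greater, that is 0x80000000 - 0x800 = 0x7ffff800.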

declare i32 @regular_function(i32 %i)

		define i32 @test_cold_call_sites_with_prof(i32 %a, i32 %b, i1 %flag, i1 %flag2) {
		; CHECK: Printing analysis {{.*}} for function 'test_cold_call_sites_with_prof'
		entry:
		br i1 %flag, label %then, label %else
		; CHECK: edge entry -> then probability is 0x07878788 / 0x80000000 = 5.88%
		; CHECK: edge entry -> else probability is 0x78787878 / 0x80000000 = 94.12% [HOT edge]

		then:
		br i1 %flag2, label %then2, label %else2, !prof !7
		; CHECK: edge then -> then2 probability is 0x7ebb907a / 0x80000000 = 99.01% [HOT edge]
		; CHECK: edge then -> else2 probability is 0x01446f86 / 0x80000000 = 0.99%

		then2:
		br label %join
		; CHECK: edge then2 -> join probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		else2:
		br label %join
		; CHECK: edge else2 -> join probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		join:
		%joinresult = phi i32 [ %a, %then2 ], [ %b, %else2 ]
		call void @coldfunc()
		br label %exit
		; CHECK: edge join -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		else:
		br label %exit
		; CHECK: edge else -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		exit:
		%result = phi i32 [ %joinresult, %join ], [ %b, %else ]
		ret i32 %result
		}

		!7 = !{!"branch_weights", i32 100, i32 1}
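The expected numerators in test_cold_call_sites_with_prof can be checked by hand. This sketch makes two assumptions:

- the cold-call heuristic uses weights of 4 for the cold edge and 64 for the other edge, since 4/68 is the 5.88% in the CHECK line;
- weights are converted to a numerator over 2^31 with round-to-nearest.

The then block's !prof !7 weights of 100 and 1 are converted directly.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kDenom = 1ull << 31; // 0x80000000

// Converts the ratio Num/Den to a numerator over 2^31, rounding to nearest.
uint32_t toNumerator(uint64_t Num, uint64_t Den) {
  return static_cast<uint32_t>((Num * kDenom + Den / 2) / Den);
}

// Assumed cold-call heuristic weights (see lead-in).
constexpr uint64_t kColdTaken = 4, kColdNotTaken = 64;

// Branch weights from !7 on the branch in block "then".
constexpr uint64_t kThen2Weight = 100, kElse2Weight = 1;
```

Under these assumptions the four edges come out as 0x07878788, 0x78787878, 0x7ebb907a and 0x01446f86. Each pair sums to 0x80000000.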

define i32 @test_cold_call_sites(i32* %a) {
; Test that edges to blocks post-dominated by cold call sites
; are marked as not expected to be taken.
; TODO(dnovillo) The calls to regular_function should not be merged, but
; they are currently being merged. Convert this into a code generation test
; after that is fixed.

; CHECK: Printing analysis {{.*}} for function 'test_cold_call_sites'
[184 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Use metadata info before any other heuristicsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 93830

include/llvm/Analysis/BranchProbabilityInfo.h

lib/Analysis/BranchProbabilityInfo.cpp

test/Analysis/BranchProbabilityInfo/basic.ll
