This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Use metadata info before any other heuristics
ClosedPublic

Authored by skatkov on Mar 5 2017, 8:27 PM.


Details

Reviewers

chandlerc
sanjoy
vsk
junbuml

Commits

rG2616bbb16d8a: [BPI] Use metadata info before any other heuristics
rL300440: [BPI] Use metadata info before any other heuristics

Summary

Metadata is potentially more precise than any heuristic we use, so
it makes sense to use metadata info first if it is available. However, it also makes
sense to check it against other strong heuristics such as the unreachable one.
If an edge leading to an unreachable block has a higher probability than the one expected
by the unreachable heuristic, then we use the heuristic, and the remaining probability is distributed
equally among the other reachable blocks.

An example where metadata might be stronger than the unreachable heuristic is as follows:
suppose there are two branches, and for branch A the
metadata says that its probability is (0, 2^25), while for branch B the probability is (1, 2^25).
The expectation is then that the first edge of B is hotter than the first edge of A,
because the first edge of A was never executed.
If the first edge of A points to an unreachable block, then using the unreachable heuristic we would set
the probability for A to (1, 2^20), and now the edge of A becomes hotter than the edge of B.
This is unexpected behavior.
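
The ordering problem in this example can be checked with a few lines of standalone C++. This is only a sketch: `Prob` and `lessThan` are hypothetical helpers, not LLVM's `BranchProbability` API, and the three constants are simply the (numerator, denominator) pairs quoted above.

```cpp
#include <cstdint>

// Hypothetical stand-in for a probability written as Num / Den.
struct Prob {
  uint64_t Num;
  uint64_t Den;
};

// Compare two fractions exactly by cross-multiplication.
// Assumes the products fit in 64 bits, which holds for the values below.
inline bool lessThan(Prob L, Prob R) { return L.Num * R.Den < R.Num * L.Den; }

// Profile metadata for the first edge of branch A and of branch B.
constexpr Prob MetadataA{0, 1ull << 25};
constexpr Prob MetadataB{1, 1ull << 25};

// What the unreachable heuristic would assign to the first edge of A.
constexpr Prob HeuristicA{1, 1ull << 20};
```

With these values, `lessThan(MetadataA, MetadataB)` holds, so the profile says A's edge is colder than B's. But `lessThan(MetadataB, HeuristicA)` also holds, so overriding A's metadata with the heuristic makes A's edge look hotter than B's.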

This fixes the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214

Diff Detail

Event Timeline

skatkov created this revision. Mar 5 2017, 8:27 PM

The problem I see here is that calcUnreachableHeuristics computes PostDominatedByUnreachable, and if metadata is present then we miss this computation. On the other hand, if metadata is present then it is better to use it. So the problem arises when metadata is present in some cases but not in all cases.

I haven't worked on this area much, but this seems like a reasonable change.

The test case should be stronger because it would also pass if the change from D30633 were applied. Maybe you could use branch weight metadata which states that Pr[entry -> deopt] = 1, then check that we actually report that.

@skatkov you had a concern that this patch would cause PostDominatedByUnreachable to not be computed as often. What kinds of problems would this cause?

Hi Vedant,
I picked your name as someone who has touched this code.

Thank you for the review, and good point about the test case. I will update it after gathering a bit more review feedback.

As I understand it, PostDominatedByUnreachable is computed inside calcUnreachableHeuristics. If somewhere on the path there is metadata available for a block that is post-dominated by an unreachable block, we will handle it in calcMetadataWeights and this block will not be added to PostDominatedByUnreachable. So the predecessor of this block will not consider it as post-dominated by an unreachable block, and the analysis will not be complete. So the trouble happens if metadata is present but not for each branch.

To resolve it we can run calcUnreachableHeuristics, remember the result, and force calcMetadataWeights to run and overwrite our heuristics. After calcMetadataWeights we can re-check the result of calcUnreachableHeuristics and bail out if either of the previous ones handled the block.
It is not clean code, but it works.

Actually I've updated a test like Vedant suggested. I like it more.

Thanks for explaining. It looks like PostDominatedByUnreachable needs to be updated every time we visit a BB. I think we should factor out the logic that updates PostDominatedByUnreachable, and make sure that the update happens every time a BB is visited, unconditionally. You could save some of the computations and forward them to calcUnreachableHeuristics, no need to overwrite any edge probabilities.
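
A rough sketch of what such an unconditional update could look like, on a toy CFG rather than on LLVM IR. Everything here is hypothetical scaffolding for illustration (`ToyCFG`, block indices, the recursive walk, entry block at index 0); only the name `updatePostDominatedByUnreachable` is borrowed from this review, and the real function handles more cases than this.

```cpp
#include <set>
#include <vector>

// Hypothetical toy CFG: blocks are indices, Succs[B] lists the successors of B.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  std::vector<bool> EndsInUnreachable; // true if block B ends in 'unreachable'
};

// Classify a single block. Must run after all of its successors were visited
// (post-order), and independently of which probability source is used later.
void updatePostDominatedByUnreachable(const ToyCFG &G, int BB,
                                      std::set<int> &PDU) {
  const std::vector<int> &Succs = G.Succs[BB];
  if (Succs.empty()) {
    if (G.EndsInUnreachable[BB])
      PDU.insert(BB);
    return;
  }
  // A block with successors qualifies only if every successor already does.
  for (int Succ : Succs)
    if (!PDU.count(Succ))
      return;
  PDU.insert(BB);
}

// Post-order walk: successors first, then the block itself.
void visitPostOrder(const ToyCFG &G, int BB, std::vector<bool> &Seen,
                    std::set<int> &PDU) {
  Seen[BB] = true;
  for (int Succ : G.Succs[BB])
    if (!Seen[Succ])
      visitPostOrder(G, Succ, Seen, PDU);
  updatePostDominatedByUnreachable(G, BB, PDU); // unconditional, every block
}

// Assumes block 0 is the entry block.
std::set<int> computePostDominatedByUnreachable(const ToyCFG &G) {
  std::vector<bool> Seen(G.Succs.size(), false);
  std::set<int> PDU;
  if (!G.Succs.empty())
    visitPostOrder(G, 0, Seen, PDU);
  return PDU;
}
```

Because the set is updated for every visited block, a predecessor still sees the correct state even when one of its successors got its probabilities from metadata instead of from the unreachable heuristic.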

Please note that PostDominatedByColdCall has the same potential issue...

This is a somewhat redundant computation if all metadata is present. So the main question here, I would say, is whether it is possible for metadata to be present but not for all branches. And if so, do we still want to have the precise information?

Note that it is possible for some BBs to have metadata while others do not (e.g. with builtin_expect). Your patch may break in those cases, when the PostDominatedByUnreachable computation is skipped with this change.

That is what I am talking about.
So it seems that it would be right if we compute the domination information for both PostDominatedByUnreachable and PostDominatedByColdCall for all BBs, correct?

In addition to the issue David pointed out, I don't understand the motivation yet.

It would be much more helpful to describe in the patch exactly what motivates the change so that we don't have to guess. =]

Relatedly, I think there are several heuristics that are actually more accurate than any metadata. For example, even if there is metadata that says code which is post-dominated by unreachable is hot, it seems much more likely that the metadata is wrong as we are *guaranteed* unreachable is, er, not reached. =] If this is the heuristic you're trying to change, I suspect that there is instead a bug in how we are computing it, and it isn't just about metadata being more reliable.

Hi Chandler, please take a look at my example from D30633. The story is the same: profiling data in the metadata may tell us that the probability of the unreachable block is zero (and it is more accurate than our heuristic), while we override this profiling data with our heuristic value, making the unreachable block hotter than the "normal" exit from the loop.

I tried to generalize the summary and not use a specific example. I can add an example to the description with the next version of the patch if you want.

In general, to me the metadata is something a user of LLVM would like us to follow. I do not see any reason to violate the user's choice in this case unless it breaks something. If the metadata is wrong then the user should fix the metadata; there is no need to fix it on our side.

skatkov updated this revision to Diff 90799. Mar 6 2017, 11:29 PM

skatkov edited the summary of this revision.

skatkov edited the summary of this revision. Mar 7 2017, 4:17 AM

FYI: Serguei is going to file an upstream bug with a clear illustration of where loop rotation goes wrong due to the issue identified here. Essentially, for a sufficiently long running loop, the static heuristic for unreached blocks is not strongly biased enough. In our case, we have branch weights specified which are more strongly biased than the static heuristic result. Using the static heuristic by itself is clearly wrong, but I do see Chandler's point about the static heuristics providing useful information. Possibly we should be using the stronger of the two sources of information?

I have prepared an example illustrating the bad loop rotation behavior in the block-placement pass caused by the incorrect behavior of BPI, in order to file a bug, but I do not have a Bugzilla account. I have requested an account, and as soon as I get it I will file the bug.

I will add an option that makes the unreachable case the first one.

An option to select unreachable first has been added.
A test for the option has been added.
updatePostDominated has been split for clarity.

I still do not have a Bugzilla account. Will file a bug as soon as I get it.

Thanks to Artur, who filed the bug on my behalf because I still have not received an account: https://bugs.llvm.org/show_bug.cgi?id=32214. The bug describes the issue and demonstrates the unexpected BPI behavior. Please take a look.

Hi, is there anything more I can do to make progress?

Chandler, any comments here?

Serguei and I talked offline a bit about your concerns. He's going to post a patch which uses the minimum frequency computed from either the static heuristic or the profile data for a block ending in unreachable. That seems like it addresses your concern to me, do you agree?

Please review. To simplify the review, I can potentially split the patch into two: a refactoring of the collection of post-domination information, and the fix itself. Please let me know if it makes sense.

Given lack of response from Chandler following the update from Serguei, I am going to move forward with the review of this patch. I do not intend to hold the patch any longer for Chandler's response. Note that Serguei made one major change in the approach: rather than having the metadata weight unconditionally win, he now has the patch structured so that a branch to unreachable takes the *minimum* frequency produced by either the static heuristic or the metadata.
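
As a rough illustration of the "minimum of the two" approach, here is a standalone sketch that mirrors the logic of the calcMetadataWeights hunk in this diff, but with plain integer numerators over a fixed 2^31 denominator instead of LLVM's `BranchProbability`. The helper names are hypothetical, and the value 1 for `UR_TAKEN_WEIGHT` is an assumption (its definition is not shown in the diff), chosen because it is consistent with the 0x00000800 expectation in the tests.

```cpp
#include <cstdint>
#include <vector>

// Probabilities are numerators over this fixed denominator (2^31).
constexpr uint32_t Denominator = 1u << 31;

// Heuristic probability for each edge leading to an unreachable block.
uint32_t unreachableProb(uint64_t UnreachableCount) {
  const uint64_t Taken = 1;                  // assumed UR_TAKEN_WEIGHT
  const uint64_t NonTaken = 1024 * 1024 - 1; // UR_NONTAKEN_WEIGHT from the diff
  const uint64_t Den = (Taken + NonTaken) * UnreachableCount;
  // Scale Taken / Den to the fixed denominator, rounding to nearest.
  return static_cast<uint32_t>((Taken * Denominator + Den / 2) / Den);
}

// BP holds the metadata-derived numerators, one per successor. Edges into
// unreachable blocks are clamped to min(metadata, heuristic); whatever was
// removed is spread equally over the reachable edges.
void clampToUnreachableHeuristic(std::vector<uint32_t> &BP,
                                 const std::vector<unsigned> &UnreachableIdxs,
                                 const std::vector<unsigned> &ReachableIdxs) {
  if (UnreachableIdxs.empty() || ReachableIdxs.empty())
    return;
  const uint32_t UnreachableProb = unreachableProb(UnreachableIdxs.size());
  uint32_t ToDistribute = 0;
  for (unsigned I : UnreachableIdxs)
    if (UnreachableProb < BP[I]) {
      ToDistribute += BP[I] - UnreachableProb;
      BP[I] = UnreachableProb;
    }
  if (ToDistribute == 0)
    return;
  const uint32_t PerEdge =
      ToDistribute / static_cast<uint32_t>(ReachableIdxs.size());
  for (unsigned I : ReachableIdxs) {
    BP[I] += PerEdge;
    ToDistribute -= PerEdge;
  }
  // The rounding remainder goes to the first reachable edge.
  BP[ReachableIdxs[0]] += ToDistribute;
}
```

For a two-way branch whose metadata puts all the weight on the unreachable edge (`BP = {0, Denominator}`, successor 1 unreachable), this produces 0x7ffff800 and 0x00000800, which agrees with the CHECK lines of test_unreachable_with_prof_greater. When the metadata is already below the heuristic, as in test_unreachable_with_prof_less, nothing changes.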

In D30631#716486, @skatkov wrote:

Please review. To simplify the review, I can potentially split the patch into two: a refactoring of the collection of post-domination information, and the fix itself. Please let me know if it makes sense.

Serguei, please split off the refactoring patch. It will make my life much easier as the reviewer.

Also, please update the description of this review thread to make it clear we're taking the minimum of the static heuristic and the metadata. The current description reflects the original patch, not the updated one.

In D30631#716698, @reames wrote:

Given lack of response from Chandler following the update from Serguei

FWIW, I was travelling back to the US. Sorry for delay. I should have a response to this patch today or tomorrow at the latest.

The refactoring part has been split out into https://reviews.llvm.org/D31701.
This is only the fix part. Please review.

skatkov added a parent revision: D31701: [BPI] Refactor post domination calculation and simple fix for ColdCall. Apr 5 2017, 3:24 AM

skatkov added a child revision: D31704: [BPI] NFC: reorder ifs to bail out earlier. Apr 5 2017, 3:39 AM

First off, thanks for the new approach. I like this direction a lot. Some more tactical comments here.

lib/Analysis/BranchProbabilityInfo.cpp
320–325	There is a lot of code here. I wonder, is it possible to share the logic here with the logic above that is used in the absence of metadata?
327	To avoid re-hitting this set for every successor, you could above append the successor indices that are in this set to a list, and then loop over that list here. The size of the list would still give you the count of unreachable successors vs. reachable.
329–332	I feel like it would be nicer to just adjust the weight downward such that the probability is essentially the minimum of the two sources of information. That way we don't lose the metadata's weights for the different successors that don't go to unreachable. Consider the test case (in pseudo C code):

    for (...) {
      switch (cond) {
      default:
        unreachable
      case 2: // HOT
        // something tiny
        continue;
      case 3: // COLD
        // huge pile of ugly code
        continue;
      }
    }

If, for whatever reason, we end up with one sample in the metadata going to unreachable, we'll completely lose the metadata that distinguishes between hot and cold here. Does that make sense?

skatkov updated this revision to Diff 94647. Apr 10 2017, 12:51 AM

skatkov marked 2 inline comments as done. Apr 10 2017, 12:57 AM

skatkov added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
329–332	It is possible; however, I am trying to follow simpler logic here. The main question is: if we do not trust the metadata to represent the value for the unreachable edge correctly (we fix it by adjusting the weight downward), why do we trust that the data for the hot/cold edges is valid and continue using it? However, if you still insist on that, I propose creating a follow-up patch implementing this approach and leaving this patch as is. Is that OK with you?

chandlerc added inline comments. Apr 10 2017, 2:31 PM

lib/Analysis/BranchProbabilityInfo.cpp
329–332	It's not that I don't trust the metadata edge, it's about what is the strongest signal to the optimizer. When we have an unreachable, we don't need to wonder about what the metadata says because we have a control flow reason to know we shouldn't optimize that path. It isn't that the metadata is definitely wrong or bad, it is that the CFG analysis is definitely sufficient. So we shouldn't throw out the metadata for the reachable successors IMO. I think it would be most clear to do it in this patch. Is there a problem with doing that?

It will make the patch more complex, but OK, I'll do that.

Hi Chandler, please review. I've also added a couple of new tests for switch case.

Hi Chandler, could you please take a look into the last version where I addressed your concern?

Sorry I couldn't get back to it sooner, first chance I had.

However, this looks really, really nice. Thanks for seeing it all the way through. I love the test cases where we nicely zero out the unreachable bits of the switch but leave the clear hot path based on metadata.

Some really minor code suggestions below. Feel free to land with those.

lib/Analysis/BranchProbabilityInfo.cpp
254	Didn't this get factored out into a separate patch? Not a big deal, but seems like a clear thing to factor out.
330–341	Lift all of this into the if for there being some unreachable and some reachable successors? Just seems worth skipping the ToDistribute checks in the case where none of this matters.
337	Is it better to do this in the loop or to multiply by size and subtract that once? It seems simpler to write the latter way inside the addition below: BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());

This revision is now accepted and ready to land. Apr 14 2017, 12:19 AM

Thank you, Chandler, for your time!

skatkov marked an inline comment as done. Apr 14 2017, 12:26 AM

skatkov added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
254	It will be in the next patch, which you have already reviewed, but I made that patch depend on this one, so I will handle it after this patch has landed.
330–341	ok
337	Will do.

skatkov added inline comments. Apr 14 2017, 1:15 AM

lib/Analysis/BranchProbabilityInfo.cpp
337	Funny, BranchProbability does not have a multiplication by a scalar... I will leave it as is for now and upload one more patch implementing BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size()); BTW, I guess the compiler should optimize it anyway and move ToDistribute -= PerEdge; out of the loop. But who knows :)
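
The two ways of computing the leftover "tail" discussed for line 337 can be compared in isolation. This is a sketch with plain unsigned integers standing in for `BranchProbability` numerators; `tailViaLoop` and `tailClosedForm` are hypothetical names, and both assume at least one reachable successor.

```cpp
#include <cstdint>

// Form used in the patch: subtract PerEdge once per reachable successor.
uint32_t tailViaLoop(uint32_t ToDistribute, uint32_t NumReachable) {
  const uint32_t PerEdge = ToDistribute / NumReachable;
  for (uint32_t I = 0; I != NumReachable; ++I)
    ToDistribute -= PerEdge;
  return ToDistribute;
}

// Form suggested in review: multiply by the count and subtract once.
uint32_t tailClosedForm(uint32_t ToDistribute, uint32_t NumReachable) {
  const uint32_t PerEdge = ToDistribute / NumReachable;
  return ToDistribute - PerEdge * NumReachable;
}
```

Both return the remainder of the integer division, so with integer numerators the choice between them is about readability rather than about the result.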

Two comments addressed. I will not submit it until Monday.

Chandler, if you have a chance please let me know if you are ok with my suggestion to update
BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
in a follow-up patch.

Closed by commit rL300440: [BPI] Use metadata info before any other heuristics (authored by skatkov). Apr 16 2017, 9:45 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Analysis/BranchProbabilityInfo.cpp	100 lines
test/Analysis/BranchProbabilityInfo/basic.ll	225 lines

Diff 94803

lib/Analysis/BranchProbabilityInfo.cpp

[66 lines not shown]
/// \brief Unreachable-terminating branch not-taken weight.		/// \brief Unreachable-terminating branch not-taken weight.
///		///
/// This is the weight for a branch not being taken toward a block that		/// This is the weight for a branch not being taken toward a block that
/// terminates (eventually) in unreachable. Such a branch is essentially never		/// terminates (eventually) in unreachable. Such a branch is essentially never
/// taken. Set the weight to an absurdly high value so that nested loops don't		/// taken. Set the weight to an absurdly high value so that nested loops don't
/// easily subsume it.		/// easily subsume it.
static const uint32_t UR_NONTAKEN_WEIGHT = 1024*1024 - 1;		static const uint32_t UR_NONTAKEN_WEIGHT = 1024*1024 - 1;

		/// \brief Returns the branch probability for unreachable edge according to
		/// heuristic.
		///
		/// This is the branch probability being taken to a block that terminates
		/// (eventually) in unreachable. These are predicted as unlikely as possible.
		static BranchProbability getUnreachableProbability(uint64_t UnreachableCount) {
		assert(UnreachableCount > 0 && "UnreachableCount must be > 0");
		return BranchProbability::getBranchProbability(
		UR_TAKEN_WEIGHT,
		(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * UnreachableCount);
		}

		/// \brief Returns the branch probability for reachable edge according to
		/// heuristic.
		///
		/// This is the branch probability not being taken toward a block that
		/// terminates (eventually) in unreachable. Such a branch is essentially never
		/// taken. Set the weight to an absurdly high value so that nested loops don't
		/// easily subsume it.
		static BranchProbability getReachableProbability(uint64_t ReachableCount) {
		assert(ReachableCount > 0 && "ReachableCount must be > 0");
		return BranchProbability::getBranchProbability(
		UR_NONTAKEN_WEIGHT,
		(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * ReachableCount);
		}

/// \brief Weight for a branch taken going into a cold block.		/// \brief Weight for a branch taken going into a cold block.
///		///
/// This is the weight for a branch taken toward a block marked		/// This is the weight for a branch taken toward a block marked
/// cold. A block is marked cold if it's postdominated by a		/// cold. A block is marked cold if it's postdominated by a
/// block containing a call to a cold function. Cold functions		/// block containing a call to a cold function. Cold functions
/// are those marked with attribute 'cold'.		/// are those marked with attribute 'cold'.
static const uint32_t CC_TAKEN_WEIGHT = 4;		static const uint32_t CC_TAKEN_WEIGHT = 4;

[120 lines not shown]
bool BranchProbabilityInfo::calcUnreachableHeuristics(const BasicBlock *BB) {

if (ReachableEdges.empty()) {		if (ReachableEdges.empty()) {
BranchProbability Prob(1, UnreachableEdges.size());		BranchProbability Prob(1, UnreachableEdges.size());
for (unsigned SuccIdx : UnreachableEdges)		for (unsigned SuccIdx : UnreachableEdges)
setEdgeProbability(BB, SuccIdx, Prob);		setEdgeProbability(BB, SuccIdx, Prob);
return true;		return true;
}		}

auto UnreachableProb = BranchProbability::getBranchProbability(		auto UnreachableProb = getUnreachableProbability(UnreachableEdges.size());
UR_TAKEN_WEIGHT, (UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) *		auto ReachableProb = getReachableProbability(ReachableEdges.size());
uint64_t(UnreachableEdges.size()));
auto ReachableProb = BranchProbability::getBranchProbability(
UR_NONTAKEN_WEIGHT,
(UR_TAKEN_WEIGHT + UR_NONTAKEN_WEIGHT) * uint64_t(ReachableEdges.size()));

for (unsigned SuccIdx : UnreachableEdges)		for (unsigned SuccIdx : UnreachableEdges)
setEdgeProbability(BB, SuccIdx, UnreachableProb);		setEdgeProbability(BB, SuccIdx, UnreachableProb);
for (unsigned SuccIdx : ReachableEdges)		for (unsigned SuccIdx : ReachableEdges)
setEdgeProbability(BB, SuccIdx, ReachableProb);		setEdgeProbability(BB, SuccIdx, ReachableProb);

return true;		return true;
}		}

// Propagate existing explicit probabilities from either profile data or		// Propagate existing explicit probabilities from either profile data or
// 'expect' intrinsic processing.		// 'expect' intrinsic processing. Examine metadata against unreachable
		// heuristic. The probability of the edge coming to unreachable block is
		// set to min of metadata and unreachable heuristic.
bool BranchProbabilityInfo::calcMetadataWeights(const BasicBlock *BB) {		bool BranchProbabilityInfo::calcMetadataWeights(const BasicBlock *BB) {
const TerminatorInst *TI = BB->getTerminator();		const TerminatorInst *TI = BB->getTerminator();
if (TI->getNumSuccessors() == 1)		if (TI->getNumSuccessors() <= 1)
return false;		return false;
if (!isa<BranchInst>(TI) && !isa<SwitchInst>(TI))		if (!isa<BranchInst>(TI) && !isa<SwitchInst>(TI))
return false;		return false;

MDNode *WeightsNode = TI->getMetadata(LLVMContext::MD_prof);		MDNode *WeightsNode = TI->getMetadata(LLVMContext::MD_prof);
if (!WeightsNode)		if (!WeightsNode)
return false;		return false;

// Check that the number of successors is manageable.		// Check that the number of successors is manageable.
assert(TI->getNumSuccessors() < UINT32_MAX && "Too many successors");		assert(TI->getNumSuccessors() < UINT32_MAX && "Too many successors");

// Ensure there are weights for all of the successors. Note that the first		// Ensure there are weights for all of the successors. Note that the first
// operand to the metadata node is a name, not a weight.		// operand to the metadata node is a name, not a weight.
if (WeightsNode->getNumOperands() != TI->getNumSuccessors() + 1)		if (WeightsNode->getNumOperands() != TI->getNumSuccessors() + 1)
return false;		return false;

// Build up the final weights that will be used in a temporary buffer.		// Build up the final weights that will be used in a temporary buffer.
// Compute the sum of all weights to later decide whether they need to		// Compute the sum of all weights to later decide whether they need to
// be scaled to fit in 32 bits.		// be scaled to fit in 32 bits.
uint64_t WeightSum = 0;		uint64_t WeightSum = 0;
SmallVector<uint32_t, 2> Weights;		SmallVector<uint32_t, 2> Weights;
		SmallVector<unsigned, 2> UnreachableIdxs;
		SmallVector<unsigned, 2> ReachableIdxs;
Weights.reserve(TI->getNumSuccessors());		Weights.reserve(TI->getNumSuccessors());
for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {		for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
ConstantInt *Weight =		ConstantInt *Weight =
mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));		mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
if (!Weight)		if (!Weight)
return false;		return false;
assert(Weight->getValue().getActiveBits() <= 32 &&		assert(Weight->getValue().getActiveBits() <= 32 &&
"Too many bits for uint32_t");		"Too many bits for uint32_t");
Weights.push_back(Weight->getZExtValue());		Weights.push_back(Weight->getZExtValue());
WeightSum += Weights.back();		WeightSum += Weights.back();
		if (PostDominatedByUnreachable.count(TI->getSuccessor(i - 1)))
		UnreachableIdxs.push_back(i - 1);
		else
		ReachableIdxs.push_back(i - 1);
}		}
assert(Weights.size() == TI->getNumSuccessors() && "Checked above");		assert(Weights.size() == TI->getNumSuccessors() && "Checked above");

// If the sum of weights does not fit in 32 bits, scale every weight down		// If the sum of weights does not fit in 32 bits, scale every weight down
// accordingly.		// accordingly.
uint64_t ScalingFactor =		uint64_t ScalingFactor =
(WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;		(WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;

		if (ScalingFactor > 1) {
WeightSum = 0;		WeightSum = 0;
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {
Weights[i] /= ScalingFactor;		Weights[i] /= ScalingFactor;
WeightSum += Weights[i];		WeightSum += Weights[i];
}		}
		}

if (WeightSum == 0) {		if (WeightSum == 0 || ReachableIdxs.size() == 0) {
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
setEdgeProbability(BB, i, {1, e});		Weights[i] = 1;
} else {		WeightSum = TI->getNumSuccessors();
		}

		// Set the probability.
		SmallVector<BranchProbability, 2> BP;
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
setEdgeProbability(BB, i, {Weights[i], static_cast<uint32_t>(WeightSum)});		BP.push_back({ Weights[i], static_cast<uint32_t>(WeightSum) });

		// Examine the metadata against unreachable heuristic.
		// If the unreachable heuristic is more strong then we use it for this edge.
		auto ToDistribute = BranchProbability::getZero();
		if (UnreachableIdxs.size() > 0 && ReachableIdxs.size() > 0) {
		auto UnreachableProb = getUnreachableProbability(UnreachableIdxs.size());
		for (auto i : UnreachableIdxs)
		if (UnreachableProb < BP[i]) {
		ToDistribute += BP[i] - UnreachableProb;
		BP[i] = UnreachableProb;
		}
		}
		// If we modified the probability of some edges then we must distribute
		// the difference between reachable blocks.
		if (ToDistribute > BranchProbability::getZero()) {
		assert(ReachableIdxs.size() && "Must be at least one reachable successor");
		BranchProbability PerEdge = ToDistribute / ReachableIdxs.size();
		for (auto i : ReachableIdxs) {
		BP[i] += PerEdge;
		ToDistribute -= PerEdge;
		}
		// Tail goes to the first reachable edge.
		BP[ReachableIdxs[0]] += ToDistribute;
}		}

		for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
		setEdgeProbability(BB, i, BP[i]);

assert(WeightSum <= UINT32_MAX &&		assert(WeightSum <= UINT32_MAX &&
"Expected weights to scale down to 32 bits");		"Expected weights to scale down to 32 bits");

return true;		return true;
}		}

/// \brief Calculate edge weights for edges leading to cold blocks.		/// \brief Calculate edge weights for edges leading to cold blocks.
///		///
[401 lines not shown]
void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI) {
assert(PostDominatedByColdCall.empty());		assert(PostDominatedByColdCall.empty());

// Walk the basic blocks in post-order so that we can build up state about		// Walk the basic blocks in post-order so that we can build up state about
// the successors of a block iteratively.		// the successors of a block iteratively.
for (auto BB : post_order(&F.getEntryBlock())) {		for (auto BB : post_order(&F.getEntryBlock())) {
DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");		DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");
updatePostDominatedByUnreachable(BB);		updatePostDominatedByUnreachable(BB);
updatePostDominatedByColdCall(BB);		updatePostDominatedByColdCall(BB);
if (calcUnreachableHeuristics(BB))
continue;
if (calcMetadataWeights(BB))		if (calcMetadataWeights(BB))
continue;		continue;
		if (calcUnreachableHeuristics(BB))
		continue;
if (calcColdCallHeuristics(BB))		if (calcColdCallHeuristics(BB))
continue;		continue;
if (calcLoopBranchHeuristics(BB, LI))		if (calcLoopBranchHeuristics(BB, LI))
continue;		continue;
if (calcPointerHeuristics(BB))		if (calcPointerHeuristics(BB))
continue;		continue;
if (calcZeroHeuristics(BB))		if (calcZeroHeuristics(BB))
continue;		continue;
[44 lines not shown]

test/Analysis/BranchProbabilityInfo/basic.ll

[366 lines not shown]
	else:			else:
	br label %exit			br label %exit

	exit:			exit:
	%result = phi i32 [ %a, %then ], [ %b, %else ]			%result = phi i32 [ %a, %then ], [ %b, %else ]
	ret i32 %result			ret i32 %result
	}			}

				define i32 @test_unreachable_with_prof_greater(i32 %a, i32 %b) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_greater'
				entry:
				%cond = icmp eq i32 %a, 42
				br i1 %cond, label %exit, label %unr, !prof !4

				; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
				; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

				unr:
				unreachable

				exit:
				ret i32 %b
				}

				!4 = !{!"branch_weights", i32 0, i32 1}

				define i32 @test_unreachable_with_prof_equal(i32 %a, i32 %b) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_equal'
				entry:
				%cond = icmp eq i32 %a, 42
				br i1 %cond, label %exit, label %unr, !prof !5

				; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
				; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

				unr:
				unreachable

				exit:
				ret i32 %b
				}

				!5 = !{!"branch_weights", i32 1048575, i32 1}

				define i32 @test_unreachable_with_prof_zero(i32 %a, i32 %b) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_zero'
				entry:
				%cond = icmp eq i32 %a, 42
				br i1 %cond, label %exit, label %unr, !prof !6

				; CHECK: edge entry -> exit probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
				; CHECK: edge entry -> unr probability is 0x00000800 / 0x80000000 = 0.00%

				unr:
				unreachable

				exit:
				ret i32 %b
				}

				!6 = !{!"branch_weights", i32 0, i32 0}

				define i32 @test_unreachable_with_prof_less(i32 %a, i32 %b) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_prof_less'
				entry:
				%cond = icmp eq i32 %a, 42
				br i1 %cond, label %exit, label %unr, !prof !7

				; CHECK: edge entry -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]
				; CHECK: edge entry -> unr probability is 0x00000000 / 0x80000000 = 0.00%

				unr:
				unreachable

				exit:
				ret i32 %b
				}

				!7 = !{!"branch_weights", i32 1, i32 0}
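The four two-way tests above can be checked by hand with a small fixed-point calculation. The sketch below is not the code from the patch. It assumes, based on the CHECK lines and the "(1, 2^20)" figure in the summary, that probabilities are numerators over 2^31 and that an edge into an unreachable block is capped at 2^-20 (0x800 / 0x80000000). `twoWayWithUnreachable`, `Denom` and `UnreachableCap` are hypothetical names.

```cpp
#include <cstdint>
#include <utility>

// Probabilities are printed as N / 0x80000000, so the denominator is 2^31.
constexpr uint64_t Denom = 1ull << 31;
// Assumed cap for an edge into an unreachable block: 2^-20 of the total.
constexpr uint64_t UnreachableCap = Denom >> 20; // 0x800

// Returns {numerator(entry -> exit), numerator(entry -> unr)} for a two-way
// branch whose second successor is unreachable, given raw branch_weights.
std::pair<uint32_t, uint32_t> twoWayWithUnreachable(uint32_t WExit,
                                                    uint32_t WUnr) {
  const uint64_t Sum = uint64_t(WExit) + WUnr;
  if (Sum == 0) // unusable metadata: fall back to the plain heuristic split
    return {uint32_t(Denom - UnreachableCap), uint32_t(UnreachableCap)};
  // Scale the unreachable weight to 2^31 fixed point, rounding to nearest.
  uint64_t Unr = (uint64_t(WUnr) * Denom + Sum / 2) / Sum;
  // Metadata may lower the unreachable edge below the cap but not raise it.
  if (Unr > UnreachableCap)
    Unr = UnreachableCap;
  // With a single reachable edge, it receives all the remaining probability.
  return {uint32_t(Denom - Unr), uint32_t(Unr)};
}
```

Under these assumptions the weights `(0, 1)` and `(1048575, 1)` both give `0x7ffff800` and `0x00000800`, the all-zero weights fall back to the same split, and `(1, 0)` keeps the unreachable edge at `0x00000000`, matching the expected output of the four tests.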

				define i32 @test_unreachable_with_switch_prof1(i32 %i, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_switch_prof1'
				entry:
				switch i32 %i, label %case_a [ i32 1, label %case_b
				i32 2, label %case_c
				i32 3, label %case_d
				i32 4, label %case_e ], !prof !8
				; CHECK: edge entry -> case_a probability is 0x00000800 / 0x80000000 = 0.00%
				; CHECK: edge entry -> case_b probability is 0x07fffe01 / 0x80000000 = 6.25%
				; CHECK: edge entry -> case_c probability is 0x67fffdff / 0x80000000 = 81.25% [HOT edge]
				; CHECK: edge entry -> case_d probability is 0x07fffdff / 0x80000000 = 6.25%
				; CHECK: edge entry -> case_e probability is 0x07fffdff / 0x80000000 = 6.25%

				case_a:
				unreachable

				case_b:
				br label %exit
				; CHECK: edge case_b -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_c:
				br label %exit
				; CHECK: edge case_c -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_d:
				br label %exit
				; CHECK: edge case_d -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_e:
				br label %exit
				; CHECK: edge case_e -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				exit:
				%result = phi i32 [ %b, %case_b ],
				[ %c, %case_c ],
				[ %d, %case_d ],
				[ %e, %case_e ]
				ret i32 %result
				}

				!8 = !{!"branch_weights", i32 4, i32 4, i32 64, i32 4, i32 4}

				define i32 @test_unreachable_with_switch_prof2(i32 %i, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_switch_prof2'
				entry:
				switch i32 %i, label %case_a [ i32 1, label %case_b
				i32 2, label %case_c
				i32 3, label %case_d
				i32 4, label %case_e ], !prof !9
				; CHECK: edge entry -> case_a probability is 0x00000400 / 0x80000000 = 0.00%
				; CHECK: edge entry -> case_b probability is 0x00000400 / 0x80000000 = 0.00%
				; CHECK: edge entry -> case_c probability is 0x6aaaa800 / 0x80000000 = 83.33% [HOT edge]
				; CHECK: edge entry -> case_d probability is 0x0aaaa7ff / 0x80000000 = 8.33%
				; CHECK: edge entry -> case_e probability is 0x0aaaa7ff / 0x80000000 = 8.33%

				case_a:
				unreachable

				case_b:
				unreachable

				case_c:
				br label %exit
				; CHECK: edge case_c -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_d:
				br label %exit
				; CHECK: edge case_d -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_e:
				br label %exit
				; CHECK: edge case_e -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				exit:
				%result = phi i32 [ %c, %case_c ],
				[ %d, %case_d ],
				[ %e, %case_e ]
				ret i32 %result
				}

				!9 = !{!"branch_weights", i32 4, i32 4, i32 64, i32 4, i32 4}

				define i32 @test_unreachable_with_switch_prof3(i32 %i, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_switch_prof3'
				entry:
				switch i32 %i, label %case_a [ i32 1, label %case_b
				i32 2, label %case_c
				i32 3, label %case_d
				i32 4, label %case_e ], !prof !10
				; CHECK: edge entry -> case_a probability is 0x00000000 / 0x80000000 = 0.00%
				; CHECK: edge entry -> case_b probability is 0x00000400 / 0x80000000 = 0.00%
				; CHECK: edge entry -> case_c probability is 0x6e08fa2e / 0x80000000 = 85.96% [HOT edge]
				; CHECK: edge entry -> case_d probability is 0x08fb80e9 / 0x80000000 = 7.02%
				; CHECK: edge entry -> case_e probability is 0x08fb80e9 / 0x80000000 = 7.02%

				case_a:
				unreachable

				case_b:
				unreachable

				case_c:
				br label %exit
				; CHECK: edge case_c -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_d:
				br label %exit
				; CHECK: edge case_d -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				case_e:
				br label %exit
				; CHECK: edge case_e -> exit probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

				exit:
				%result = phi i32 [ %c, %case_c ],
				[ %d, %case_d ],
				[ %e, %case_e ]
				ret i32 %result
				}

				!10 = !{!"branch_weights", i32 0, i32 4, i32 64, i32 4, i32 4}

				define i32 @test_unreachable_with_switch_prof4(i32 %i, i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) {
				; CHECK: Printing analysis {{.*}} for function 'test_unreachable_with_switch_prof4'
				entry:
				switch i32 %i, label %case_a [ i32 1, label %case_b
				i32 2, label %case_c
				i32 3, label %case_d
				i32 4, label %case_e ], !prof !11
				; CHECK: edge entry -> case_a probability is 0x1999999a / 0x80000000 = 20.00%
				; CHECK: edge entry -> case_b probability is 0x1999999a / 0x80000000 = 20.00%
				; CHECK: edge entry -> case_c probability is 0x1999999a / 0x80000000 = 20.00%
				; CHECK: edge entry -> case_d probability is 0x1999999a / 0x80000000 = 20.00%
				; CHECK: edge entry -> case_e probability is 0x1999999a / 0x80000000 = 20.00%

				case_a:
				unreachable

				case_b:
				unreachable

				case_c:
				unreachable

				case_d:
				unreachable

				case_e:
				unreachable

				}

				!11 = !{!"branch_weights", i32 0, i32 4, i32 64, i32 4, i32 4}