This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Use metadata info before any other heuristics
ClosedPublic

Authored by skatkov on Mar 5 2017, 8:27 PM.

Details

Summary

Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is distributed
among other reachable blocks equally.

An example where metadata might be more strong then unreachable heuristic is as follows:
it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable heuristics we'll set
the probability for A to (1, 2^20) and now edge of A becomes hotter than edge of B.
This is unexpected behavior.

This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214

Diff Detail

Event Timeline

skatkov created this revision.Mar 5 2017, 8:27 PM

The problem I see here is that calcUnreachableHeuristics computes PostDominatedByUnreachable and if metadata is present then we miss this computation. From other point of view if metadata is present than it is better to use it. So the problem is when metadata is present in some cases but no in all cases.

vsk edited edge metadata.EditedMar 5 2017, 10:38 PM

I haven't worked on this area much, but this seems like a reasonable change.

The test case should be stronger because it would also pass if the change from D30633 were applied. Maybe you could use branch weight metadata which states that Pr[entry -> deopt] = 1, then check that we actually report that.

@skatkov you had a concern that this patch would cause PostDominatedByUnreachable to not be computed as often. What kinds of problems would this cause?

Hi Vedant,
I picked your name as one who touched this code.

Thank you, for the review and good point about the testcase. I will update it after gathring a bit more of review.

As I've undersood PostDominatedByUnreachable is computed inside calcUnreachableHeuristics, if on the path there will be some metadata available for some block which dominates unreachable block we will handle it in calcMetadataWeights and this block will not be added to PostDominatedByUnreachable. So the predecessor of this block will not consider it as dominating unreachable block. So the analysis will not be complete. So the trouble happens if metadata is present but not for each branch.

To resolve it we can run calcUnreachableHeuristics, rememeber the result and force running calcMetadataWeights to overwrite our heuristics. After calcMetadataWeights we can re-check the result of calcUnreachableHeuristics and bailout if any of previous ones handled block.
It is not clean from code but it works.

skatkov updated this revision to Diff 90650.Mar 5 2017, 11:09 PM
skatkov edited the summary of this revision. (Show Details)

Actually I've updated a test like Vedant suggested. I like it more.

vsk added a comment.Mar 5 2017, 11:40 PM

Thanks for explaining. It looks like PostDominatedByUnreachable needs to be updated every time we visit a BB. I think we should factor out the logic that updates PostDominatedByUnreachable, and make sure that the update happens every time a BB is visited, unconditionally. You could save some of the computations and forward them to calcUnreachableHeuristics, no need to overwrite any edge probabilities.

Please note that PostDominatedByColdCall has the same potential issue...

This is some kind of redundant computation if all metadata is present. So the main question here I would say whether it is possible the metadata is present but not for all branches. And if it is true, do we still want to have the precise information?

Note that it is possible to have some BB to have metadata while others do not (e.g with builtin_expect). Your patch may break in those case when PostDominatebyUnreachable computation is skipped with this change.

That is what I talking about.
So it seems that it would be right if we compute the domination information for both PostDominatedByUnreachable and PostDominatedByColdCall for all BBs, correct?

chandlerc edited edge metadata.Mar 6 2017, 1:18 PM

In addition to the issue David pointed out, I don't understand the motivation yet.

It would be much more helpful to describe in the patch exactly what motivates the change so that we don't have to guess. =]

Relatedly, I think there are several heuristics that are actually more accurate than any metadata. For example, even if there is metadata that says code which is post-dominated by unreachable is hot, it seems much more likely that the metadata is wrong as we are *guaranteed* unreachable is, er, not reached. =] If this is the heuristic you're trying to change, I suspect that there is instead a bug in how we are computing it, and it isn't just about metadata being more reliable.

Hi Chandler, please take a look at my example from D30633. The story is the same, profiling in metadata may say us that probability of unreachable block is zero (and it is more accurate than our heuristic) while we override this proflining data with our heuristic value causing the unreachable block is hotter than "normal" exit from the loop.

I tried to generailze the summary and do not use some specific example. I can put an example to the description if you want with the next version of the patch.

In general, to me the metadata is something user of LLVM would like us to follow. I do not see any reason to violate user's choice in this case until it breaks something. If metadata is wrong then user should fix the metadata, no need to fix it on our side.

skatkov updated this revision to Diff 90799.Mar 6 2017, 11:29 PM
skatkov edited the summary of this revision. (Show Details)
skatkov edited the summary of this revision. (Show Details)Mar 7 2017, 4:17 AM
reames added a subscriber: reames.Mar 9 2017, 1:42 PM

FYI: Serguei is going to file an upstream bug with a clear illustration of where loop rotations goes wrong due to the issue identified here. Essentially, for a sufficiently long running loop, the static heuristic for unreached blocks is not strongly biased enough. In our case, we have branch weights specified which are more strongly biased than the static heuristic result. Using the static heuristic by itself is clearly wrong, but I do see Chandler's point about the static heuristics providing useful information. Possibly we should be using the stronger of the two sources of information?

I have prepared an example illustrating the bad loop rotation behavior in block-placement pass due to incorrect behavior of BPI to file a bug but I do not have an account to bugzilla. I have requested an account and as soon I get it I will file a bug.

I will add an option which makes unreachable case first one.

skatkov updated this revision to Diff 91271.Mar 9 2017, 11:57 PM

Option to select unreachable first added.
Test for the option is added.
updatePostDominated is split for clearness.

I still do not have bugzilla account. Will file a bug as soon as I get it.

Thanks to Artur who filed a bug instead of me because I did not get an account till this moment: https://bugs.llvm.org/show_bug.cgi?id=32214. The bug describes the issue demonstrating the unexpected BPI behavior. Please take a look.

Hi, anything I can do more to make a progress?

Chandler, any comments here?

Serguei and I talked offline a bit about your concerns. He's going to post a patch which uses the minimum frequency computed from either the static heuristic or the profile data for a block ending in unreachable. That seems like it addresses your concern to me, do you agree?

skatkov updated this revision to Diff 93830.Apr 3 2017, 1:33 AM
skatkov edited the summary of this revision. (Show Details)

Please review. To simplify the review, I potentially can split the patch to two ones: refactoring of collection of post domination information and fix itself. Please let me know if it makes sense.

reames added a comment.Apr 3 2017, 7:48 AM

Given lack of response from Chandler following the update from Serguei, I am going to move forward with the review of this patch. I do not intend to hold the patch any longer for Chandler's response. Note that Serguei made one major change in the approach: rather than having the metadata weight unconditionally win, he now has the patch structured so that a branch to unreachable takes the *minimum* frequency produced by either the static heuristic or the metadata.

Please review. To simplify the review, I potentially can split the patch to two ones: refactoring of collection of post domination information and fix itself. Please let me know if it makes sense.

Serguei, please split off the refactoring patch. It will make my life much easier as the reviewer.

Also, please update the description of this review thread to make it clear we're taking the minimum of the static heuristic and the metadata. The current description reflects the original patch, not the updated one.

Given lack of response from Chandler following the update from Serguei

FWIW, I was travelling back to the US. Sorry for delay. I should have a response to this patch today or tomorrow at the latest.

skatkov updated this revision to Diff 94179.Apr 5 2017, 2:41 AM
skatkov edited the summary of this revision. (Show Details)

The re-factoring part has been split out in https://reviews.llvm.org/D31701.
This is only fix part. Please review.

First off, thanks for the new approach. I like this direction a lot. Some more tactical comments here.

lib/Analysis/BranchProbabilityInfo.cpp
320–325

There is a lot of code here. I wonder, is it possible to share the logic here with the logic above that is used in the absence of metadata?

327

To avoid re-hitting this set for every successor, you could above append the successor indices that are in this set to a list, and then loop over that list here. The size of the list would still give you the count of unreachable successors vs. reachable.

329–332

I feel like it would be nicer to just adjust the weight downward such that the probability is essentially the minimum of the two sources of information. That way we don't lose the metadata's weights for the different successors that *don't* go to unreachable.

Consider the test case (in pseudo C code):

for (...) {
  switch (cond) {
    default:
      unreachable
    case 2: // HOT
      // something tiny
      continue;
    case 3: // COLD
      // huge pile of ugly code
      continue;
    }
  }
}

If, for whatever reason, we end up with one sample in the metadata going to unreachable, we'll completely loose the metadata that distinguishes between hot and cold here.

Does that make sense?

skatkov updated this revision to Diff 94647.Apr 10 2017, 12:51 AM
skatkov marked 2 inline comments as done.Apr 10 2017, 12:57 AM
skatkov added inline comments.
lib/Analysis/BranchProbabilityInfo.cpp
329–332

It is possible, however I try to follow simpler logic here. So the main question if we do not trust that metadata represents the value for unreachable edge correctly (we fix it by the weight downward) why we trust that data for hot/cold edges is valid and continue using it?

However if you still insist on that I would propose I will create a follow up patch implementing this approach and leave this patch as is. Is it ok for you?

chandlerc added inline comments.Apr 10 2017, 2:31 PM
lib/Analysis/BranchProbabilityInfo.cpp
329–332

It's not that I don't trust the metadata edge, it's about what is the strongest signal to the optimizer.

When we have an unreachable, we don't need to wonder about what the metadata says because we have a control flow reason to know we shouldn't optimize that path. It isn't that the metadata is definitely wrong or bad, it is that the CFG analysis is definitely sufficient.

So we shouldn't throw out the metadata for the reachable successors IMO.

I think it would be most clear to do it in this patch. Is there a problem with doing that?

it will do the patch more complex but ok, I'll do that.

skatkov updated this revision to Diff 94803.Apr 11 2017, 4:03 AM
skatkov edited the summary of this revision. (Show Details)

Hi Chandler, please review. I've also added a couple of new tests for switch case.

Hi Chandler, could you please take a look into the last version where I addressed your concern?

chandlerc accepted this revision.Apr 14 2017, 12:19 AM

Sorry I couldn't get back to it sooner, first chance I had.

However, this looks really, really nice. Thanks for seeing it all the way through. I love the test cases where we nicely zero out the unreachable bits of the switch but leave the clear hot path based on metadata.

Some really minor code suggestions below. Feel free to land with those.

lib/Analysis/BranchProbabilityInfo.cpp
254

Didn't this get factored out into a separate patch? Not a big deal, but seems like a clear thing to factor out.

330–341

Lift all of this into the if for there being some unreachable and some reachable successors? Just seems worth skipping the ToDistribute checks in the case where none of this matters.

337

Is it better to do this in the loop or to multiply by size and subtract that once? It seems simpler to write the latter way inside the addition below:

BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
This revision is now accepted and ready to land.Apr 14 2017, 12:19 AM

Thank you, Chandler for your time!

skatkov marked an inline comment as done.Apr 14 2017, 12:26 AM
skatkov added inline comments.
lib/Analysis/BranchProbabilityInfo.cpp
254

It will be in the next patch which you have already reviewed but I made that patch to depend on this one, so I will handle it after this patch is landed.

330–341

ok

337

Will do.

skatkov added inline comments.Apr 14 2017, 1:15 AM
lib/Analysis/BranchProbabilityInfo.cpp
337

Funny, BranchProbability does not have an multiplication operation by scalar...
I will leave it as is for now and upload one more patch implementing BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());

BTW, I guess the compiler should optimize it anyway and move ToDistribute -= PerEdge; out of the loop. But who knows :)

skatkov updated this revision to Diff 95280.Apr 14 2017, 2:23 AM

Two comments addressed. I will not submit it until Monday.

Chandler, if you have a chance please let me know if you are ok with my suggestion to update
BP[ReachableIdxs[0]] += ToDistribute - (PerEdge * ReachableIdxs.size());
in a follow-up patch.

This revision was automatically updated to reflect the committed changes.