This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Reduce the probability of unreachable edge to minimal value greater than 0
ClosedPublic

Authored by skatkov on Mar 5 2017, 9:24 PM.

Details

Summary

The probability of edge coming to unreachable block should be as low as possible.
The change reduces the probability to minimal value greater than zero.

The bug https://bugs.llvm.org/show_bug.cgi?id=32214 show the example when
the probability of edge coming to unreachable block is greater than for edge
coming to out of the loop and it causes incorrect loop rotation.

Please note that with this change the behavior of unreachable heuristic is a bit different
than others. Specifically, before this change the sum of probabilities
coming to unreachable blocks have the same weight for all branches
(it was just split over all edges of this block coming to unreachable blocks).
With this change it might be slightly different but not to much due to probability of
taken branch to unreachable block is really small.

Diff Detail

Repository
rL LLVM

Event Timeline

skatkov created this revision.Mar 5 2017, 9:24 PM
chandlerc requested changes to this revision.Mar 6 2017, 11:50 AM

We intentionally didn't use zero because that seems more of a degenerate test case. What problem are you actually trying to solve here?

This revision now requires changes to proceed.Mar 6 2017, 11:50 AM

Hi Chandler,

Let's consider the following example: you have a loop with say throwing instruction (or @llvm.experimental.deoptimize) inside which is representable by unreachable basic block.
If we use any value greater than zero as probability of the branch coming to unreachable block it is possible that profiling will detect that normal exit from the loop has lower probability than unreachable block and this is incorrect.
So now we say that unreachable edge probability is (1, 2^20). If loop is executed > 2^20 iterations than edge coming to "normal" out of the loop basic block will have a probability (1, >2^20) and LLVM will consider unreachable edge be hotter than mormal exit.

This is unexpected behavior. And it is evident if we choose any constant greater than 0 we are able to find a loop iteration count breaking the expected behavior.
In reality as I understand, we expect that unreachable is actually never executed and this corresponds to 0 probability for corresponding edge.

Hope that it helps. If you'd like I could add this example to the summary section. I tried to put more general words in summary section.

skatkov updated this revision to Diff 99101.May 15 2017, 10:07 PM
skatkov edited edge metadata.
skatkov retitled this revision from [BPI] Unreachable branch has 0 probability to [BPI] Reduce the probability of unreachable edge to minimal value greater than 0.
skatkov edited the summary of this revision. (Show Details)

Please take a look, any comments are appreciated.

Please also note that I still think that zero probability for edge coming to unreachable edge still makes sense to me because unreachable edge is something we do not expect or do not want to point our attention to. If there are some pros to avoid zero probability, please let me know for my knowledge.

This at least seems like a very reasonable patch to me. I'd like to double check with David and Duncan as well (I've added them as reviewers). My only lingering concern would be if this will run the risk of underflow coming up more often, but that is probably not too much of an issue now that BFI does a better job of things.

Regarding using 'zero' as a viable probability: I'm mostly worried what other assumptions have crept into the codebase as a consequence of these being non-zero. But this is definitely something we can overcome if zero is a substantially better way to model this. The things I think would be needed:

  1. Some thorough testing of using zero in this way, both with and without an additional profile.
  2. Start a brief llvm-dev thread just to make sure others who may not be watching the reviews are paying attention.
  3. In #2, somewhat clearly spell out the reason why the approach in this patch is significantly improved by using 'zero'.

None of this is insurmountable, but it is somewhat expensive, so I only wanted us to go down this path if necessary.

davidxl edited edge metadata.May 16 2017, 8:55 AM

It is known that BFI can not handle zero weight and it will add 1 to zero weight when it sees it.

Here the unreachable Branch prob is not zero (without the change). Where does it get changed to zero?

It is known that BFI can not handle zero weight and it will add 1 to zero weight when it sees it.

Here the unreachable Branch prob is not zero (without the change). Where does it get changed to zero?

This patch doesn't.

In my response I suggested that *if* the author wants to go beyond the current patch (which keeps unreachable at >0) they should discuss it more broadly. I don't think you need to worry about that today, they'll send email if they want to go there.

I added you to this patch to make sure the technique of dropping to 1 and maxing out the other side makes sense (now that we factor in when metadata has more precise information).

That is not my question :)

My question is that without the patch, the probability of UR branch is actually lower (1, 0x8000....). Looks like it gets truncated somewhere? In other words, the fix can be put to where the truncation happens.

That is not my question :)

My question is that without the patch, the probability of UR branch is actually lower (1, 0x8000....). Looks like it gets truncated somewhere? In other words, the fix can be put to where the truncation happens.

I think I must be missing something ... Where is the probability of the UR branch lower before this patch? At least, the test cases updated, the probability of the UR branch is *higher* before this patch.

From the comments in the review thread, the goal here is to avoid the case where the UR branch probability ends up *higher* than the loop exit branch probability.

Chandler is right in all comments :)

  1. The main motivation of the patch is to avoid "the case where the UR branch probability ends up *higher* than the loop exit branch probability". In this terms I'm completely ok with this patch and there is not need to go with zero. The question about zero from my side was mostly about understanding why zero is not good.
  2. Th UR branch probability without patch is equal to 1/(2^20) while with this change it will be 1/(2^31), so with the patch it becomes lower.

However I need to check what happens if metadata provides a weight greater than 2^31. Will do it tomorrow.

dexonsmith edited edge metadata.May 16 2017, 10:42 AM

The testcase updates look good to me, and demonstrate the patch is doing what's intended. It might be worth adding a couple of tests for multiple branch targets (switch statements), since I'm not convinced everything adds up to 1 right now.

As an aside, it seems the logic around unreachable (with and without the patch) might pessimize code paths that lead to noreturn functions, such as posix_spawn(). Is that really what we want? Why?

lib/Analysis/BranchProbabilityInfo.cpp
202–206 ↗(On Diff #99101)

This doesn't look to me like it will add up to BranchProbability::getOne() in the end. Is that a problem?

As an aside, it seems the logic around unreachable (with and without the patch) might pessimize code paths that lead to noreturn functions, such as posix_spawn(). Is that really what we want? Why?

Heh, I thought about bringing this up, but it really seems orthogonal and something I wanted to look at.

Yeah, generally, the unreachable logic I wrote eons ago was dead wrong and this is still wrong in the case of nice "run forever" functions. I'll either send a patch or file a bug about this.

As an aside, it seems the logic around unreachable (with and without the patch) might pessimize code paths that lead to noreturn functions, such as posix_spawn(). Is that really what we want? Why?

Heh, I thought about bringing this up, but it really seems orthogonal and something I wanted to look at.

Yeah, generally, the unreachable logic I wrote eons ago was dead wrong and this is still wrong in the case of nice "run forever" functions. I'll either send a patch or file a bug about this.

I agree, no reason to hold the patch up.

The testcase updates look good to me, and demonstrate the patch is doing what's intended. It might be worth adding a couple of tests for multiple branch targets (switch statements), since I'm not convinced everything adds up to 1 right now.

The test @test2 represents the case with switch or you are talking about case when sum of probabilities is not equal to 1?

lib/Analysis/BranchProbabilityInfo.cpp
202–206 ↗(On Diff #99101)

The sum of probabilities might be different than 1 and this consistent with all other heuristics in this file.
It seems that it was not considered as problem before.

If you think it should be fixed I can come up with a follow-up patch fixing all cases. The question is actually where to add the remainder.

davidxl added inline comments.May 16 2017, 12:45 PM
lib/Analysis/BranchProbabilityInfo.cpp
291 ↗(On Diff #99101)

Not sure if we want to add a case to cover the meta data is smaller and is kept.

The only possible way is to use 0 weight, which actually may end up creating a a larger BP (after BFI bumping it up). So we should probably add a comment here and unconditionally update the Unreachable branch (i.e. ignoring 0 weight metadata).

skatkov added inline comments.May 16 2017, 9:53 PM
lib/Analysis/BranchProbabilityInfo.cpp
291 ↗(On Diff #99101)

I prefer to keep it as is due to current logic is correct semantically independent on what UR_TAKEN_PROB is. If someone in the future increases the UR_TAKEN_PROB the code will work correctly anyway.

Hello guys,

you are saying that BFI bumps the zero probability but I still see in MachineBlockPlacement pass that it actively uses branch probabilities and it seems that they are original ones. At least I see the zero probability if I set it in metadata.

Could you please point me to the place where it is bumped?

Thank you,
Serguei.

David, thank you very much for pointing me to place where the weight is bumped in BFI. Yes, it is really bumped for branch frequency computation but branch probability is not updated that is why weight 0 helps me for block placement but 1 is not (1 - helps to reduce the probability but does not help in common case).

Let me just explain one more time where the problem is.
Consider the test basing on https://bugs.llvm.org/show_bug.cgi?id=32214.
We have a loop
LoopHeader -> (UnreachableExit, Middleblock)
Middleblock -> (ColdPath, Backedge)
ColdPath ->Backedge
Backedge -> (NormalExit, LoopHeader)

MiddleBlock and ColdPath are only needed for triggering loop rotation.
The probability of the edge LoopHeader -> UnreachableExit is set to 1/2^20 without this patch and 1/2^31 within this patch.
Let's probability of edge Backedge -> NormalExit is obtained by real profile date and equal to 1/2^31.

MachineBlockPlacement pass tries to do a loop rotation. It will build a loop chain: LoopHeader->MiddleBlock->BackEdge->ColdPath.
To do a loop rotation it needs to find the best exit. It does it using the following (it reality it is a bit more complex but for explanation it is enough):
BlockFrequency ExitEdgeFreq = MBFI->getBlockFreq(MBB) * SuccProb;
There are two exits:
LoopHeader -> UnreachableExit
Backedge -> NormalExit

The frequency of LoopHeader is greater than frequency of Backedge due to mass comes out to UnreachableBlock (even f it's probability is zero due to bump of the weight in BFI).
So without this patch SuccProb for UnreachableExit is equal to 1/2^20 and chances the ExitEdgeFreq will be greater for UnreachableExit than for NormalExit more.
Within this patch the worst case is when probability of edge Backedge -> NormalExit is the lowest one (1/2^31).
In this case SuccProb will be the same for both exits. But taking into account that frequency of LoopHeader is greater than for Backedge the ExitEdgeFreq will be greater for UnreachableExit anyway.
However when SuccProb for UnreachableExit is equal to zero the NormalExit becomes the best exit.
That is why for corner case the zero unreachable probability helps to fix the loop rotation issue.

So, this patch makes sense anyway is due to it reduces the number of cases where loop rotation works incorrectly while for the corner cases Zero probability is required.

Any opinions? May be I missed anything?

BTW, one of the possible another solution might be to change MachineBlockPlacement pass to use frequency of exit block:
change
BlockFrequency ExitEdgeFreq = MBFI->getBlockFreq(MBB) * SuccProb;
to
BlockFrequency ExitEdgeFreq = MBFI->getBlockFreq(Succ);
In this case the hottest exit will be chosen basing on frequency of exit block...

Actually I tend to think that we do not need zero probability. Instead I will update loop rotation to prohibit the case when rotation breaks the hot path. In this case for corner case we just will not take the best case but still will not do worse.

Hello Guys,

To make a progress with landing this patch I need a feedback from you. As I see:

  1. Chandler had no issues with this patch
  2. David in his e-mail wrote that he is ok with this patch
  3. Duncan made a proposal to dd some tests with switch.

There are several tests currently with switch statement with unreachable involved:
basic.ll:@test_unreachable_with_switch_prof1
basic.ll:@test_unreachable_with_switch_prof2
basic.ll:@test_unreachable_with_switch_prof3
basic.ll:@test_unreachable_with_switch_prof4
noreturn.ll:@test2
some of them shows that sum of probabilities are not equal to one:
basic.ll:@test_unreachable_with_switch_prof4

Do you want to see some specific tests to be added?

Is there any thing else what prevent you from accepting this change? Please let me know and I'll fix it.

Hello Guys,

To make a progress with landing this patch I need a feedback from you. As I see:

  1. Chandler had no issues with this patch
  2. David in his e-mail wrote that he is ok with this patch
  3. Duncan made a proposal to dd some tests with switch.

I'm just waiting for Duncan to confirm he's happy with the tests. I was planning to give him at least another day as he may have just been busy today.

sure, thank you Chandler for your time!

dexonsmith accepted this revision.May 17 2017, 9:07 PM

Tests LGTM.

lib/Analysis/BranchProbabilityInfo.cpp
202–206 ↗(On Diff #99101)

A follow-up patch seems fine. I suggest spreading the remainder among the branches, 1 each until you run out, skipping the unreachable branch.

chandlerc accepted this revision.May 17 2017, 9:14 PM
This revision is now accepted and ready to land.May 17 2017, 9:14 PM
skatkov added inline comments.May 17 2017, 9:22 PM
lib/Analysis/BranchProbabilityInfo.cpp
202–206 ↗(On Diff #99101)

Yest It is clear, the question is the following:
we have 3 branches, 1 unreachable and 2 reachable. The probability of unreachable is 1/0x80000000. Each reachable branch has 0x3fffffff/0x80000000. The remainder is 1. (Please note that with the current patch this remainder will always less then number of reachable branches). So we have two reachable branches and remainder is 1. Where to put it into the first reachable branch or second?

In reality it might be more useful information that two reachable branches has the equal probability then sum of probabilities is equal to one.
Please also note that all other heuristics in these file do not add this remainder. Also please not that before this patch remainder for unreachable heuristic was not ignored as well.

So if I do follow-up patch, it makes sense to fix all other cases as well. However I do not know the clear answer at this moment where to add this remainder. That is the problem.

Duncan and Chandler, thank you for approval.

This revision was automatically updated to reflect the committed changes.