Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). Two more analyses (BlockFrequencyInfo and BranchProbabilityInfo) are needed then.
Details
- Reviewers
chandlerc davidxl dexonsmith - Commits
- rGb74d3b3b86c6: Update the branch weight metadata in JumpThreading pass.
rG7ab123a5cf7d: Update the branch weight metadata in JumpThreading pass.
rG3320bcd81569: Update the branch weight metadata in JumpThreading pass.
rL250345: Update the branch weight metadata in JumpThreading pass.
rL250204: Update the branch weight metadata in JumpThreading pass.
rL250089: Update the branch weight metadata in JumpThreading pass.
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1522 ↗ | (On Diff #29140) | Shouldn't you be using weights here instead of frequencies? Frequencies can be computed later out of the branch weights. |
Weights are relative values that can only be compared/mixed across
branches sharing the same source BB. For Jump-thread updating, the
split-out edge from NewBB to SuccBB is a new edge without profile
info. We need to use the profile info from edge Pred -> NewBB which
does not share the same source BB with BB->SuccBB. Frequency needs to
be used for scaling purpose here.
David
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
172 ↗ | (On Diff #29140) | Ideally the new passes in the pipeline is only needed when profile-instr-use or autofdo is turned on. |
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1522 ↗ | (On Diff #29140) | Weights are only meaningful in terms of the same source BB. Here I need to know how many weights on BB to SuccBB are contributed by the edge PredBB to BB, and I could not compare the weights on those two edges directly as they only make sense for PredBB and BB respectively. |
This update merges the patch that adds setBlockFreq() interfaces to BlockFrequencyInfo (http://reviews.llvm.org/D11104). As both BFI and BPI can be used without including passes with them, in jump-threading pass, we can update the edge weights only in PGO mode (i.e., only when we have the entry count of a function) without introducing new passes. To make sure both BFI/BPI have up-to-date information, for every CFG change, we need to update BFI and BPI accordingly. Two utility functions (JumpThreading::UpdateEdgeWeight() and JumpThreading::SplitBlockPreds()) are created to facilitate the update.
include/llvm/Analysis/BlockFrequencyInfoImpl.h | ||
---|---|---|
972 ↗ | (On Diff #29963) | It might be slightly better to split the interfaces into two: one is set and the other is update. |
lib/Analysis/BranchProbabilityInfo.cpp | ||
622 ↗ | (On Diff #29963) | It is probably better to call this one 'updateEdgeWeight' |
lib/Transforms/Scalar/JumpThreading.cpp | ||
150 ↗ | (On Diff #29963) | This function certainly does more than just edge weight update -- a different name may be better. |
1560 ↗ | (On Diff #29963) | assert here? |
1568 ↗ | (On Diff #29963) | fix typo. |
1569 ↗ | (On Diff #29963) | A side question: do we have a bug tracking that data insanity issue? |
1583 ↗ | (On Diff #29963) | Should probably assert here that BPI->getEdgeWeight(..) == W |
1583 ↗ | (On Diff #29963) | This is not really BBFreq, it should be BB2Succ Edge frequency |
1588 ↗ | (On Diff #29963) | Another insanity to track? |
1591 ↗ | (On Diff #29963) | I don't think this one is correct. You are using the New BB->Succ Frequency in the update which is wrong. The invariant is : PredBBFreq + NewBB2SuccBBEdgeFreq = OldBB2SuccBBEdgeFreq and the update is: NewWeight = (NewBB2SuccBBEdgeFreq/OldBB2EdgeFreq)* OldWeight combine them we have NewWeight = (1 - PredBBFreq/OldBB2EdgeFreq)*OldWeight |
I suggest also adding another test case (documented in the PR -- the one that has problem with jump-threading + loop-rotation + jump-threading).
This reminds me of a way to do profile sanity check/validation when debug check is turned on:
- compute the BB freq before the pass
- cfg transformation + BB freq update
- branch probability weight update
- recompute BFI using updated weight in 3). The result of 4) should match that of 2).
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1572 ↗ | (On Diff #30214) | change the name to OrigBBFreq |
1578 ↗ | (On Diff #30214) | Is this still an issue that needs the workaround here? |
1596 ↗ | (On Diff #30214) | Why this condition? |
1599 ↗ | (On Diff #30214) | It is unclear how can this happen. |
1614 ↗ | (On Diff #30214) | The scaling code looks unnecessarily complicated. How about this: a) Pre-normalization
b) update BB->SuccBB with (1- Freq(Pred->BB)/Freq(BB->SuccBB)) c) post-normalization: scale all weights such that the weight sum is 2^20 Some helper routine can be defined. |
You mean comparing result from 4) and 1)? What if a CFG node is split? If CFG is modified, we may not be able to find one-to-one BB match. To make a reasonable comparison, we may need to understand what the pass does...
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1578 ↗ | (On Diff #30214) | Yes. The issue emerges after jump-threading + loop-rotation + jump-threading will lead to this. |
1596 ↗ | (On Diff #30214) | I forget to add a comment when I did this. But I think this condition can be removed now. |
1599 ↗ | (On Diff #30214) | Consider the following CFG and assume the edge frequency is 100 for each edge: A B \ / C / \ D E In some pass C may be split into two blocks: A B | | C1 C2 | X | D E Then edges from C1/C2 have frequency 50. Now suppose in jump-threading pass we thread B->E across C2 (C2->D is impossible at runtime), then B has frequency 100 but C2->E has frequency 50. This is when this condition is not met. |
1614 ↗ | (On Diff #30214) | This method looks good. I have two questions:
|
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1599 ↗ | (On Diff #30214) | What I meant is that at runtime Pred->BB can only be followed by BB->Succ, then Freq(BB->SuccBB) can be proved to be no less than Freq(PredBB->BB) -- so that check is unnecessary. |
1614 ↗ | (On Diff #30214) |
|
Update the patch by normalizing a list of weights of uint64_t type using MachineBranchProbabilityInfo::normalizeEdgeWeights().
Please take a look.
As discussed, there is another more complicated jump-threading related
profile updating bug (also involving loop rotation). Do you plan to
address that in a different patch?
David
After reading the code again, I think the change needs to be restructured. There are a couple of main points we should stick to:
- When CFG is changed, the code range in which the in-memory profile data is in inconsistent state should be minimized -- ideally immediately after the CFG is changed -- so that the profile state before and after the CFG change is kosher.
- BFI and BPI update need to be bundled together
- weight meta data should be a direct translation of updated BPI.
I suggest reorganizing the change like this
- BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.
- after threading profile update, the meta data update should be simple.
There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1453 ↗ | (On Diff #31408) | Should the NewBB' freq = BFI->getBlockFreq(PredBB)*BPI->getEdgeProbability(PredBB, BB)? Where is the BPI update for PredBB->NewBB? |
1571 ↗ | (On Diff #31408) | Freq(PredBB->NewBB) / Orig_Freq(BB->SuccBB) which is basically Freq (NewBB)/Orig_Freq(BB->SuccBB) |
1585 ↗ | (On Diff #31408) | When NewBB is created, there should be no edge from PredBB to BB anymore. The code here assumes the BPI has not been updated which can be broken. I think the right way is to use NewBB's frequency which is already computed -- the update interface either needs to pass in NewBB's freq or NewBB (clone of BB). |
1591 ↗ | (On Diff #31408) | To avoid ambiguity of the name, it is better to change the name of the variable to BBNewFreq, otherwise it may be thought of as NewBB's frequency where NewBB is the cloned bb of 'BB'. Similarly change OrigBBFreq to BBOrigFreq. |
1595 ↗ | (On Diff #31408) | Here is the case where BB's frequency is updated, but the BPI for BB's successors are not updated. I think these updates needs to be wrapped into its own helper function independent of meta data weights update (mixing them makes the code hard to read and reason). |
1601 ↗ | (On Diff #31408) | This whole loop seems unnecessary. |
1621 ↗ | (On Diff #31408) | There does not seem to be a need to collect the old weights at all. As long as we have newly updated branch probabilty info (using the updated edge frequency), the new weights can be resynthezied. |
BFI and BPI are updated in different ways and it is difficult to write a general function doing this. However, I have greatly reduced the size of the function that updates BFI and BPI in this case.
- weight meta data should be a direct translation of updated BPI.
This is a good idea. I have used the edge frequencies to update edge weights. The only possible issue is that when the source BB's frequency is too low, we may suffer some precision loss. But I think it is OK because BB's frequency is low.
I suggest reorganizing the change like this
- BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.
This is now done in one function.
- after threading profile update, the meta data update should be simple.
Right.
There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.
Right. As we now use edge frequency to calculate edge weights, the only think we should take care is that we need to normalize those frequencies to let them fit in 32-bit.
lib/Transforms/Scalar/JumpThreading.cpp | ||
---|---|---|
1453 ↗ | (On Diff #31408) | You are right. I have corrected this part. The BPI update for PredBB->NewBB is unnecessary here as PredBB's old successor BB is replaced by NewBB and PredBB's weights list in BPI can remain unchanged (in BPI each edge weight is mapped with a BB-to-success_index pair). |
1585 ↗ | (On Diff #31408) | You are right. Here I should use NewBB to calculate the frequency of the edge from PredBB to BB. |
1591 ↗ | (On Diff #31408) | OK. |
1601 ↗ | (On Diff #31408) | This is removed now as we are using edge frequencies to calculate edge weights. |
1621 ↗ | (On Diff #31408) | This is done now. |
1596 ↗ | (On Diff #30214) | It was used to prevent the "Denominator cannot be 0!" error when BB2SuccBBFreq is used to build branch probability. I have also handled this case when scaling weights below. |
lib/Support/BlockFrequency.cpp | ||
---|---|---|
70 ↗ | (On Diff #33467) | No underflow check here? |
lib/Transforms/Scalar/JumpThreading.cpp | ||
1591 ↗ | (On Diff #33467) | It is better to call PredBB2BBFreq NewBBFreq (with comment: frequency of the newly split BB) |
1601 ↗ | (On Diff #33467) | operator - does not have underflow check (see comment above). |
1607 ↗ | (On Diff #33467) | Is this interface available in trunk? |
1616 ↗ | (On Diff #33467) | BPI for PredBB also needs to be updated for completeness -- as its successor BB is now NewBB |
lib/Support/BlockFrequency.cpp | ||
---|---|---|
70 ↗ | (On Diff #33467) | -= below is using the new operator defined above, so underflow is checked. |
lib/Transforms/Scalar/JumpThreading.cpp | ||
1591 ↗ | (On Diff #33467) | OK. |
1601 ↗ | (On Diff #33467) | But -= has underflow check which is used in operator -. |
1607 ↗ | (On Diff #33467) | Not yet :( Still waiting for the review of that patch. |
1616 ↗ | (On Diff #33467) | The old PredBB's successor BB is replaced by NewBB in the same position, so BPI doesn't have to be updated. (In BPI, the successor weights are stores in a map with the index of the successor as the key. The NewBB is set as a successor using the following code: if (PredTerm->getSuccessor(i) == BB) { BB->removePredecessor(PredBB, true); PredTerm->setSuccessor(i, NewBB); } So the position of this new successor is not changed.) |
lib/Support/BlockFrequency.cpp | ||
---|---|---|
70 ↗ | (On Diff #33467) | ok |
lib/Transforms/Scalar/JumpThreading.cpp | ||
1607 ↗ | (On Diff #33467) | Since the interfaces are under large rework, I suggest simply extract extract the minimal necessary normalization code from that patch here (and abandon that one). Is it doable? |
1616 ↗ | (On Diff #33467) | ok then. |
Update the patch by adding the definition of MachineBranchProbabilityInfo::normalizeEdgeWeights().