Page MenuHomePhabricator

Update the branch weight metadata in JumpThreading pass.
ClosedPublic

Authored by congh on Jul 6 2015, 5:22 PM.

Details

Summary

Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). Two more analyses (BlockFrequencyInfo and BranchProbabilityInfo) are needed then.

Diff Detail

Repository
rL LLVM

Event Timeline

congh updated this revision to Diff 29140.Jul 6 2015, 5:22 PM
congh retitled this revision from to Update the branch weight metadata in JumpThreading pass..
congh updated this object.
congh added reviewers: chandlerc, davidxl.
congh added a subscriber: llvm-commits.
dnovillo added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1522 ↗(On Diff #29140)

Shouldn't you be using weights here instead of frequencies? Frequencies can be computed later out of the branch weights.

davidxl edited edge metadata.Jul 7 2015, 1:08 PM
davidxl added a subscriber: davidxl.

Weights are relative values that can only be compared/mixed across
branches sharing the same source BB. For Jump-thread updating, the
split-out edge from NewBB to SuccBB is a new edge without profile
info. We need to use the profile info from edge Pred -> NewBB which
does not share the same source BB with BB->SuccBB. Frequency needs to
be used for scaling purpose here.

David

davidxl added inline comments.Jul 7 2015, 1:26 PM
lib/Transforms/Scalar/JumpThreading.cpp
172 ↗(On Diff #29140)

Ideally the new passes in the pipeline is only needed when profile-instr-use or autofdo is turned on.

congh added inline comments.Jul 7 2015, 1:45 PM
lib/Transforms/Scalar/JumpThreading.cpp
1522 ↗(On Diff #29140)

Weights are only meaningful in terms of the same source BB. Here I need to know how many weights on BB to SuccBB are contributed by the edge PredBB to BB, and I could not compare the weights on those two edges directly as they only make sense for PredBB and BB respectively.

congh updated this revision to Diff 29963.Jul 16 2015, 4:46 PM
congh added a reviewer: dexonsmith.

This update merges the patch that adds setBlockFreq() interfaces to BlockFrequencyInfo (http://reviews.llvm.org/D11104). As both BFI and BPI can be used without including passes with them, in jump-threading pass, we can update the edge weights only in PGO mode (i.e., only when we have the entry count of a function) without introducing new passes. To make sure both BFI/BPI have up-to-date information, for every CFG change, we need to update BFI and BPI accordingly. Two utility functions (JumpThreading::UpdateEdgeWeight() and JumpThreading::SplitBlockPreds()) are created to facilitate the update.

davidxl added inline comments.Jul 16 2015, 5:33 PM
include/llvm/Analysis/BlockFrequencyInfoImpl.h
972 ↗(On Diff #29963)

It might be slightly better to split the interfaces into two: one is set and the other is update.

lib/Analysis/BranchProbabilityInfo.cpp
622 ↗(On Diff #29963)

It is probably better to call this one 'updateEdgeWeight'

lib/Transforms/Scalar/JumpThreading.cpp
150 ↗(On Diff #29963)

This function certainly does more than just edge weight update -- a different name may be better.

1560 ↗(On Diff #29963)

assert here?

1568 ↗(On Diff #29963)

fix typo.

1569 ↗(On Diff #29963)

A side question: do we have a bug tracking that data insanity issue?

1583 ↗(On Diff #29963)

Should probably assert here that BPI->getEdgeWeight(..) == W

1583 ↗(On Diff #29963)

This is not really BBFreq, it should be BB2Succ Edge frequency

1588 ↗(On Diff #29963)

Another insanity to track?

1591 ↗(On Diff #29963)

I don't think this one is correct. You are using the New BB->Succ Frequency in the update which is wrong.

The invariant is :

PredBBFreq + NewBB2SuccBBEdgeFreq = OldBB2SuccBBEdgeFreq

and the update is:

NewWeight = (NewBB2SuccBBEdgeFreq/OldBB2EdgeFreq)* OldWeight

combine them we have

NewWeight = (1 - PredBBFreq/OldBB2EdgeFreq)*OldWeight

congh updated this revision to Diff 30017.Jul 17 2015, 11:15 AM

Update the patch according to David's comments.

davidxl added inline comments.Jul 17 2015, 11:23 AM
lib/Transforms/Scalar/JumpThreading.cpp
1543 ↗(On Diff #29963)

Can be skipped when Preds' s size == 1

1551 ↗(On Diff #29963)

Can be skipped if Preds's size is 1

congh updated this revision to Diff 30191.Jul 20 2015, 1:54 PM

Fixed a bug and let two newly added internal functions be private.

congh updated this revision to Diff 30214.Jul 20 2015, 4:49 PM

Fix precision loss problem when calculating new edge weights.

I suggest also adding another test case (documented in the PR -- the one that has problem with jump-threading + loop-rotation + jump-threading).

This reminds me of a way to do profile sanity check/validation when debug check is turned on:

  1. compute the BB freq before the pass
  2. cfg transformation + BB freq update
  3. branch probability weight update
  1. recompute BFI using updated weight in 3). The result of 4) should match that of 2).
lib/Transforms/Scalar/JumpThreading.cpp
1572 ↗(On Diff #30214)

change the name to OrigBBFreq

1578 ↗(On Diff #30214)

Is this still an issue that needs the workaround here?

1596 ↗(On Diff #30214)

Why this condition?

1599 ↗(On Diff #30214)

It is unclear how can this happen.

1614 ↗(On Diff #30214)

The scaling code looks unnecessarily complicated. How about this:

a) Pre-normalization

  1. first sum up all the weights before the update
  2. if the sum is too small, scale it up to a fixed val such as ScaledNumber<64>(1,20)
  3. compute the scale factor and rescale the weights

b) update BB->SuccBB with (1- Freq(Pred->BB)/Freq(BB->SuccBB))

c) post-normalization: scale all weights such that the weight sum is 2^20

Some helper routine can be defined.

congh added a comment.Jul 21 2015, 1:03 PM

I suggest also adding another test case (documented in the PR -- the one that has problem with jump-threading + loop-rotation + jump-threading).

This reminds me of a way to do profile sanity check/validation when debug check is turned on:

  1. compute the BB freq before the pass
  2. cfg transformation + BB freq update
  3. branch probability weight update
  4. recompute BFI using updated weight in 3). The result of 4) should match that of 2).

You mean comparing result from 4) and 1)? What if a CFG node is split? If CFG is modified, we may not be able to find one-to-one BB match. To make a reasonable comparison, we may need to understand what the pass does...

congh marked an inline comment as done.Jul 21 2015, 1:50 PM
congh added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1578 ↗(On Diff #30214)

Yes. The issue emerges after jump-threading + loop-rotation + jump-threading will lead to this.

1596 ↗(On Diff #30214)

I forget to add a comment when I did this. But I think this condition can be removed now.

1599 ↗(On Diff #30214)

Consider the following CFG and assume the edge frequency is 100 for each edge:

A   B
 \ /
  C
 / \
D   E

In some pass C may be split into two blocks:

A   B
|   |
C1 C2
| X |
D   E

Then edges from C1/C2 have frequency 50. Now suppose in jump-threading pass we thread B->E across C2 (C2->D is impossible at runtime), then B has frequency 100 but C2->E has frequency 50. This is when this condition is not met.

1614 ↗(On Diff #30214)

This method looks good. I have two questions:

  1. How to tell if the sum is too small? Can we just normalize all weights such that the sum of them is 2^20?
  2. Is c) necessary?
congh updated this revision to Diff 30372.Jul 22 2015, 11:14 AM

Fixed two bugs in the patch.

davidxl added inline comments.Jul 22 2015, 4:24 PM
lib/Transforms/Scalar/JumpThreading.cpp
1599 ↗(On Diff #30214)

What I meant is that at runtime Pred->BB can only be followed by BB->Succ, then Freq(BB->SuccBB) can be proved to be no less than Freq(PredBB->BB) -- so that check is unnecessary.

1614 ↗(On Diff #30214)
  1. I think it is fine to do normalization uncondionally
  1. c is not strictly necessary. However I do expect in the future we can assert that the weight sum at any given time is always normalized.
congh updated this revision to Diff 31408.Aug 5 2015, 3:54 PM

Update the patch by normalizing a list of weights of uint64_t type using MachineBranchProbabilityInfo::normalizeEdgeWeights().

Please take a look.

As discussed, there is another more complicated jump-threading related
profile updating bug (also involving loop rotation). Do you plan to
address that in a different patch?

David

After reading the code again, I think the change needs to be restructured. There are a couple of main points we should stick to:

  1. When CFG is changed, the code range in which the in-memory profile data is in inconsistent state should be minimized -- ideally immediately after the CFG is changed -- so that the profile state before and after the CFG change is kosher.
  2. BFI and BPI update need to be bundled together
  3. weight meta data should be a direct translation of updated BPI.

I suggest reorganizing the change like this

  1. BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.
  1. after threading profile update, the meta data update should be simple.

There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.

lib/Transforms/Scalar/JumpThreading.cpp
1453 ↗(On Diff #31408)

Should the NewBB' freq = BFI->getBlockFreq(PredBB)*BPI->getEdgeProbability(PredBB, BB)?

Where is the BPI update for PredBB->NewBB?

1571 ↗(On Diff #31408)

Freq(PredBB->NewBB) / Orig_Freq(BB->SuccBB)

which is basically Freq (NewBB)/Orig_Freq(BB->SuccBB)

1585 ↗(On Diff #31408)

When NewBB is created, there should be no edge from PredBB to BB anymore. The code here assumes the BPI has not been updated which can be broken.

I think the right way is to use NewBB's frequency which is already computed -- the update interface either needs to pass in NewBB's freq or NewBB (clone of BB).

1591 ↗(On Diff #31408)

To avoid ambiguity of the name, it is better to change the name of the variable to BBNewFreq, otherwise it may be thought of as NewBB's frequency where NewBB is the cloned bb of 'BB'.

Similarly change OrigBBFreq to BBOrigFreq.

1595 ↗(On Diff #31408)

Here is the case where BB's frequency is updated, but the BPI for BB's successors are not updated. I think these updates needs to be wrapped into its own helper function independent of meta data weights update (mixing them makes the code hard to read and reason).

1601 ↗(On Diff #31408)

This whole loop seems unnecessary.

1621 ↗(On Diff #31408)

There does not seem to be a need to collect the old weights at all. As long as we have newly updated branch probabilty info (using the updated edge frequency), the new weights can be resynthezied.

congh marked 3 inline comments as done.Aug 28 2015, 2:13 PM

After reading the code again, I think the change needs to be restructured. There are a couple of main points we should stick to:

  1. When CFG is changed, the code range in which the in-memory profile data is in inconsistent state should be minimized -- ideally immediately after the CFG is changed -- so that the profile state before and after the CFG change is kosher.
  2. BFI and BPI update need to be bundled together

BFI and BPI are updated in different ways and it is difficult to write a general function doing this. However, I have greatly reduced the size of the function that updates BFI and BPI in this case.

  1. weight meta data should be a direct translation of updated BPI.

This is a good idea. I have used the edge frequencies to update edge weights. The only possible issue is that when the source BB's frequency is too low, we may suffer some precision loss. But I think it is OK because BB's frequency is low.

I suggest reorganizing the change like this

  1. BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.

This is now done in one function.

  1. after threading profile update, the meta data update should be simple.

Right.

There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.

Right. As we now use edge frequency to calculate edge weights, the only think we should take care is that we need to normalize those frequencies to let them fit in 32-bit.

lib/Transforms/Scalar/JumpThreading.cpp
1453 ↗(On Diff #31408)

You are right. I have corrected this part.

The BPI update for PredBB->NewBB is unnecessary here as PredBB's old successor BB is replaced by NewBB and PredBB's weights list in BPI can remain unchanged (in BPI each edge weight is mapped with a BB-to-success_index pair).

1585 ↗(On Diff #31408)

You are right. Here I should use NewBB to calculate the frequency of the edge from PredBB to BB.

1591 ↗(On Diff #31408)

OK.

1601 ↗(On Diff #31408)

This is removed now as we are using edge frequencies to calculate edge weights.

1621 ↗(On Diff #31408)

This is done now.

1596 ↗(On Diff #30214)

It was used to prevent the "Denominator cannot be 0!" error when BB2SuccBBFreq is used to build branch probability. I have also handled this case when scaling weights below.

congh updated this revision to Diff 33467.Aug 28 2015, 2:15 PM
congh marked 3 inline comments as done.

Update the patch by using edge frequencies to update edge weights.

davidxl added inline comments.Aug 29 2015, 11:32 AM
lib/Support/BlockFrequency.cpp
70 ↗(On Diff #33467)

No underflow check here?

lib/Transforms/Scalar/JumpThreading.cpp
1591 ↗(On Diff #33467)

It is better to call PredBB2BBFreq NewBBFreq (with comment: frequency of the newly split BB)

1601 ↗(On Diff #33467)

operator - does not have underflow check (see comment above).

1607 ↗(On Diff #33467)

Is this interface available in trunk?

1616 ↗(On Diff #33467)

BPI for PredBB also needs to be updated for completeness -- as its successor BB is now NewBB

congh added inline comments.Aug 29 2015, 2:13 PM
lib/Support/BlockFrequency.cpp
70 ↗(On Diff #33467)

-= below is using the new operator defined above, so underflow is checked.

lib/Transforms/Scalar/JumpThreading.cpp
1591 ↗(On Diff #33467)

OK.

1601 ↗(On Diff #33467)

But -= has underflow check which is used in operator -.

1607 ↗(On Diff #33467)

Not yet :( Still waiting for the review of that patch.

1616 ↗(On Diff #33467)

The old PredBB's successor BB is replaced by NewBB in the same position, so BPI doesn't have to be updated. (In BPI, the successor weights are stores in a map with the index of the successor as the key. The NewBB is set as a successor using the following code:

if (PredTerm->getSuccessor(i) == BB) {
  BB->removePredecessor(PredBB, true);
  PredTerm->setSuccessor(i, NewBB);
}

So the position of this new successor is not changed.)

davidxl added inline comments.Aug 29 2015, 2:47 PM
lib/Support/BlockFrequency.cpp
70 ↗(On Diff #33467)

ok

lib/Transforms/Scalar/JumpThreading.cpp
1607 ↗(On Diff #33467)

Since the interfaces are under large rework, I suggest simply extract extract the minimal necessary normalization code from that patch here (and abandon that one). Is it doable?

1616 ↗(On Diff #33467)

ok then.

congh updated this revision to Diff 33602.Aug 31 2015, 11:14 AM

Update the patch by adding the definition of MachineBranchProbabilityInfo::normalizeEdgeWeights().

This revision was automatically updated to reflect the committed changes.