This is an archive of the discontinued LLVM Phabricator instance.

Update the branch weight metadata in JumpThreading pass.
ClosedPublic

Authored by congh on Jul 6 2015, 5:22 PM.

Details

Summary

Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). Two more analyses (BlockFrequencyInfo and BranchProbabilityInfo) are needed then.

Diff Detail

Event Timeline

congh updated this revision to Diff 29140.Jul 6 2015, 5:22 PM
congh retitled this revision from to Update the branch weight metadata in JumpThreading pass..
congh updated this object.
congh added reviewers: chandlerc, davidxl.
congh added a subscriber: llvm-commits.
dnovillo added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1522

Shouldn't you be using weights here instead of frequencies? Frequencies can be computed later out of the branch weights.

davidxl edited edge metadata.Jul 7 2015, 1:08 PM
davidxl added a subscriber: davidxl.

Weights are relative values that can only be compared/mixed across
branches sharing the same source BB. For Jump-thread updating, the
split-out edge from NewBB to SuccBB is a new edge without profile
info. We need to use the profile info from edge Pred -> NewBB which
does not share the same source BB with BB->SuccBB. Frequency needs to
be used for scaling purpose here.

David

davidxl added inline comments.Jul 7 2015, 1:26 PM
lib/Transforms/Scalar/JumpThreading.cpp
172

Ideally the new passes in the pipeline is only needed when profile-instr-use or autofdo is turned on.

congh added inline comments.Jul 7 2015, 1:45 PM
lib/Transforms/Scalar/JumpThreading.cpp
1522

Weights are only meaningful in terms of the same source BB. Here I need to know how many weights on BB to SuccBB are contributed by the edge PredBB to BB, and I could not compare the weights on those two edges directly as they only make sense for PredBB and BB respectively.

congh updated this revision to Diff 29963.Jul 16 2015, 4:46 PM
congh added a reviewer: dexonsmith.

This update merges the patch that adds setBlockFreq() interfaces to BlockFrequencyInfo (http://reviews.llvm.org/D11104). As both BFI and BPI can be used without including passes with them, in jump-threading pass, we can update the edge weights only in PGO mode (i.e., only when we have the entry count of a function) without introducing new passes. To make sure both BFI/BPI have up-to-date information, for every CFG change, we need to update BFI and BPI accordingly. Two utility functions (JumpThreading::UpdateEdgeWeight() and JumpThreading::SplitBlockPreds()) are created to facilitate the update.

davidxl added inline comments.Jul 16 2015, 5:33 PM
include/llvm/Analysis/BlockFrequencyInfoImpl.h
972

It might be slightly better to split the interfaces into two: one is set and the other is update.

lib/Analysis/BranchProbabilityInfo.cpp
622

It is probably better to call this one 'updateEdgeWeight'

lib/Transforms/Scalar/JumpThreading.cpp
158

This function certainly does more than just edge weight update -- a different name may be better.

1580

assert here?

1588

fix typo.

1589

A side question: do we have a bug tracking that data insanity issue?

1603

Should probably assert here that BPI->getEdgeWeight(..) == W

1603

This is not really BBFreq, it should be BB2Succ Edge frequency

1608

Another insanity to track?

1611

I don't think this one is correct. You are using the New BB->Succ Frequency in the update which is wrong.

The invariant is :

PredBBFreq + NewBB2SuccBBEdgeFreq = OldBB2SuccBBEdgeFreq

and the update is:

NewWeight = (NewBB2SuccBBEdgeFreq/OldBB2EdgeFreq)* OldWeight

combine them we have

NewWeight = (1 - PredBBFreq/OldBB2EdgeFreq)*OldWeight

congh updated this revision to Diff 30017.Jul 17 2015, 11:15 AM

Update the patch according to David's comments.

davidxl added inline comments.Jul 17 2015, 11:23 AM
lib/Transforms/Scalar/JumpThreading.cpp
1563

Can be skipped when Preds' s size == 1

1571

Can be skipped if Preds's size is 1

congh updated this revision to Diff 30191.Jul 20 2015, 1:54 PM

Fixed a bug and let two newly added internal functions be private.

congh updated this revision to Diff 30214.Jul 20 2015, 4:49 PM

Fix precision loss problem when calculating new edge weights.

I suggest also adding another test case (documented in the PR -- the one that has problem with jump-threading + loop-rotation + jump-threading).

This reminds me of a way to do profile sanity check/validation when debug check is turned on:

  1. compute the BB freq before the pass
  2. cfg transformation + BB freq update
  3. branch probability weight update
  1. recompute BFI using updated weight in 3). The result of 4) should match that of 2).
lib/Transforms/Scalar/JumpThreading.cpp
1590

change the name to OrigBBFreq

1596

Is this still an issue that needs the workaround here?

1614

Why this condition?

1617

It is unclear how can this happen.

1632

The scaling code looks unnecessarily complicated. How about this:

a) Pre-normalization

  1. first sum up all the weights before the update
  2. if the sum is too small, scale it up to a fixed val such as ScaledNumber<64>(1,20)
  3. compute the scale factor and rescale the weights

b) update BB->SuccBB with (1- Freq(Pred->BB)/Freq(BB->SuccBB))

c) post-normalization: scale all weights such that the weight sum is 2^20

Some helper routine can be defined.

congh added a comment.Jul 21 2015, 1:03 PM

I suggest also adding another test case (documented in the PR -- the one that has problem with jump-threading + loop-rotation + jump-threading).

This reminds me of a way to do profile sanity check/validation when debug check is turned on:

  1. compute the BB freq before the pass
  2. cfg transformation + BB freq update
  3. branch probability weight update
  1. recompute BFI using updated weight in 3). The result of 4) should match that of 2).

You mean comparing result from 4) and 1)? What if a CFG node is split? If CFG is modified, we may not be able to find one-to-one BB match. To make a reasonable comparison, we may need to understand what the pass does...

congh marked an inline comment as done.Jul 21 2015, 1:50 PM
congh added inline comments.
lib/Transforms/Scalar/JumpThreading.cpp
1596

Yes. The issue emerges after jump-threading + loop-rotation + jump-threading will lead to this.

1614

I forget to add a comment when I did this. But I think this condition can be removed now.

1617

Consider the following CFG and assume the edge frequency is 100 for each edge:

A   B
 \ /
  C
 / \
D   E

In some pass C may be split into two blocks:

A   B
|   |
C1 C2
| X |
D   E

Then edges from C1/C2 have frequency 50. Now suppose in jump-threading pass we thread B->E across C2 (C2->D is impossible at runtime), then B has frequency 100 but C2->E has frequency 50. This is when this condition is not met.

1632

This method looks good. I have two questions:

  1. How to tell if the sum is too small? Can we just normalize all weights such that the sum of them is 2^20?
  2. Is c) necessary?
congh updated this revision to Diff 30372.Jul 22 2015, 11:14 AM

Fixed two bugs in the patch.

davidxl added inline comments.Jul 22 2015, 4:24 PM
lib/Transforms/Scalar/JumpThreading.cpp
1617

What I meant is that at runtime Pred->BB can only be followed by BB->Succ, then Freq(BB->SuccBB) can be proved to be no less than Freq(PredBB->BB) -- so that check is unnecessary.

1632
  1. I think it is fine to do normalization uncondionally
  1. c is not strictly necessary. However I do expect in the future we can assert that the weight sum at any given time is always normalized.
congh updated this revision to Diff 31408.Aug 5 2015, 3:54 PM

Update the patch by normalizing a list of weights of uint64_t type using MachineBranchProbabilityInfo::normalizeEdgeWeights().

Please take a look.

As discussed, there is another more complicated jump-threading related
profile updating bug (also involving loop rotation). Do you plan to
address that in a different patch?

David

After reading the code again, I think the change needs to be restructured. There are a couple of main points we should stick to:

  1. When CFG is changed, the code range in which the in-memory profile data is in inconsistent state should be minimized -- ideally immediately after the CFG is changed -- so that the profile state before and after the CFG change is kosher.
  2. BFI and BPI update need to be bundled together
  3. weight meta data should be a direct translation of updated BPI.

I suggest reorganizing the change like this

  1. BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.
  1. after threading profile update, the meta data update should be simple.

There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.

lib/Transforms/Scalar/JumpThreading.cpp
1458

Should the NewBB' freq = BFI->getBlockFreq(PredBB)*BPI->getEdgeProbability(PredBB, BB)?

Where is the BPI update for PredBB->NewBB?

1578

Freq(PredBB->NewBB) / Orig_Freq(BB->SuccBB)

which is basically Freq (NewBB)/Orig_Freq(BB->SuccBB)

1592

When NewBB is created, there should be no edge from PredBB to BB anymore. The code here assumes the BPI has not been updated which can be broken.

I think the right way is to use NewBB's frequency which is already computed -- the update interface either needs to pass in NewBB's freq or NewBB (clone of BB).

1598

To avoid ambiguity of the name, it is better to change the name of the variable to BBNewFreq, otherwise it may be thought of as NewBB's frequency where NewBB is the cloned bb of 'BB'.

Similarly change OrigBBFreq to BBOrigFreq.

1602

Here is the case where BB's frequency is updated, but the BPI for BB's successors are not updated. I think these updates needs to be wrapped into its own helper function independent of meta data weights update (mixing them makes the code hard to read and reason).

1608

This whole loop seems unnecessary.

1628

There does not seem to be a need to collect the old weights at all. As long as we have newly updated branch probabilty info (using the updated edge frequency), the new weights can be resynthezied.

congh marked 3 inline comments as done.Aug 28 2015, 2:13 PM

After reading the code again, I think the change needs to be restructured. There are a couple of main points we should stick to:

  1. When CFG is changed, the code range in which the in-memory profile data is in inconsistent state should be minimized -- ideally immediately after the CFG is changed -- so that the profile state before and after the CFG change is kosher.
  2. BFI and BPI update need to be bundled together

BFI and BPI are updated in different ways and it is difficult to write a general function doing this. However, I have greatly reduced the size of the function that updates BFI and BPI in this case.

  1. weight meta data should be a direct translation of updated BPI.

This is a good idea. I have used the edge frequencies to update edge weights. The only possible issue is that when the source BB's frequency is too low, we may suffer some precision loss. But I think it is OK because BB's frequency is low.

I suggest reorganizing the change like this

  1. BFI and BPI update for threading should be in one place (one helper function). a) BBs involved: PredBB, NewBB, BB, and SuccBB b) Edges involved: Pred->BB : removed Pred->NewBB: new NewBB->SuccBB: new BB->SuccBB: need edge freq and bp update.

This is now done in one function.

  1. after threading profile update, the meta data update should be simple.

Right.

There should be no worry for overflow as subtraction is involved. Due to this, the scaling code can be simple.

Right. As we now use edge frequency to calculate edge weights, the only think we should take care is that we need to normalize those frequencies to let them fit in 32-bit.

lib/Transforms/Scalar/JumpThreading.cpp
1458

You are right. I have corrected this part.

The BPI update for PredBB->NewBB is unnecessary here as PredBB's old successor BB is replaced by NewBB and PredBB's weights list in BPI can remain unchanged (in BPI each edge weight is mapped with a BB-to-success_index pair).

1592

You are right. Here I should use NewBB to calculate the frequency of the edge from PredBB to BB.

1598

OK.

1608

This is removed now as we are using edge frequencies to calculate edge weights.

1614

It was used to prevent the "Denominator cannot be 0!" error when BB2SuccBBFreq is used to build branch probability. I have also handled this case when scaling weights below.

1628

This is done now.

congh updated this revision to Diff 33467.Aug 28 2015, 2:15 PM
congh marked 3 inline comments as done.

Update the patch by using edge frequencies to update edge weights.

davidxl added inline comments.Aug 29 2015, 11:32 AM
lib/Support/BlockFrequency.cpp
70

No underflow check here?

lib/Transforms/Scalar/JumpThreading.cpp
1591

It is better to call PredBB2BBFreq NewBBFreq (with comment: frequency of the newly split BB)

1601

operator - does not have underflow check (see comment above).

1607

Is this interface available in trunk?

1616

BPI for PredBB also needs to be updated for completeness -- as its successor BB is now NewBB

congh added inline comments.Aug 29 2015, 2:13 PM
lib/Support/BlockFrequency.cpp
70

-= below is using the new operator defined above, so underflow is checked.

lib/Transforms/Scalar/JumpThreading.cpp
1591

OK.

1601

But -= has underflow check which is used in operator -.

1607

Not yet :( Still waiting for the review of that patch.

1616

The old PredBB's successor BB is replaced by NewBB in the same position, so BPI doesn't have to be updated. (In BPI, the successor weights are stores in a map with the index of the successor as the key. The NewBB is set as a successor using the following code:

if (PredTerm->getSuccessor(i) == BB) {
  BB->removePredecessor(PredBB, true);
  PredTerm->setSuccessor(i, NewBB);
}

So the position of this new successor is not changed.)

davidxl added inline comments.Aug 29 2015, 2:47 PM
lib/Support/BlockFrequency.cpp
70

ok

lib/Transforms/Scalar/JumpThreading.cpp
1607

Since the interfaces are under large rework, I suggest simply extract extract the minimal necessary normalization code from that patch here (and abandon that one). Is it doable?

1616

ok then.

congh updated this revision to Diff 33602.Aug 31 2015, 11:14 AM

Update the patch by adding the definition of MachineBranchProbabilityInfo::normalizeEdgeWeights().

This revision was automatically updated to reflect the committed changes.