This is an archive of the discontinued LLVM Phabricator instance.

Codegen: MachineBlockPlacement Improve probability layout.
ClosedPublic

Authored by iteratee on Jul 27 2016, 5:06 PM.

Details

Reviewers
davidxl
Summary

The following pattern was being layed out poorly:

    A
   / \
  B   C
 / \ / \
D   E   ? (Doesn't matter)

Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)

The current algorithm gives:
A,B,C,E (D goes on worklist)

It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:

A,B,E, (C and D are in a worklist)

Diff Detail

Event Timeline

iteratee updated this revision to Diff 65842.Jul 27 2016, 5:06 PM
iteratee retitled this revision from to Codegen: MachineBlockPlacement Improve probability layout..
iteratee updated this object.
iteratee added a reviewer: davidxl.
iteratee set the repository for this revision to rL LLVM.
davidxl edited edge metadata.Jul 27 2016, 6:47 PM

This is a good change. I actually wanted to get rid of the forward check which is always true for case 2 and redundant for case 1. Now it is also bad for case 3.

Please add a test case also.

lib/CodeGen/MachineBlockPlacement.cpp
639

Nit: can you make the art work like the following to not split S2 into two 'blocks':

//

//      Head (Or Entry, Top)
//      / \
//     /   \
//   BB    Pred
//   / \   /  |
//   |  S1  |
//    \      /
//      S2   
//
643

Nit: can you make the art work like the following to not split S2 into two 'blocks':

//

//      Head (Or Entry, Top)
//      / \
//     /   \
//   BB    Pred
//   / \   /  |
//   |  S1  |
//    \      /
//      S2   
//
660

layed out --> laid out

661

Another way to explain in terms of savings instead of cost: the savings is the total freq of the fall through edges. In topo case, the savings is

freq(S->BB) + max(freq(Pred->S1), freq(Pred->S2). (1)

For non-top case, the saving is:

freq(S->BB) + freq(BB->S1) + freq(Pred->S2) (2)

When freq(Pred->S2) > freq(Pred->S1), (2) is strictly larger than (1). In the opposite case, the check below will also lead to (2) > (1)

679

For case 1 and 2

683

This comment does not fit here. With profile data, such check won't be skipped, but just does not need to be as biased.

iteratee updated this revision to Diff 65961.Jul 28 2016, 11:29 AM
iteratee edited edge metadata.
iteratee removed rL LLVM as the repository for this revision.

Added test cases. Improved comments are coming.

iteratee updated this revision to Diff 65972.Jul 28 2016, 12:50 PM
iteratee marked 6 inline comments as done.

Improved comments, and better ascii diagrams.

davidxl accepted this revision.Jul 28 2016, 1:18 PM
davidxl edited edge metadata.

lgtm with the test change suggestion.

test/CodeGen/X86/block-placement.ll
1334 ↗(On Diff #65972)

I think we probably don't need to test the exact boundary condition here to make the test more robust (say when hotprob is tuned). I suggest making the branch at then1 block to have (60%, 40%) branch prob distribution to make the case more obvious (such that the backward branch prob of edge then2->fork1 is around 70%.

This revision is now accepted and ready to land.Jul 28 2016, 1:18 PM
iteratee updated this revision to Diff 66030.Jul 28 2016, 2:55 PM
iteratee edited edge metadata.
iteratee marked an inline comment as done.

Made test give a wider berth to the HotProb threshold.

lib/CodeGen/MachineBlockPlacement.cpp
661

I think I'll stick with cost, as that's how the other 2 cases are explained.

iteratee updated this revision to Diff 66146.Jul 29 2016, 10:44 AM

Added fixes for 4 affected tests. Since they weren't testing layout, I added branch weights to get the layout that was produced before. In 3 cases, this was simply equal-weighting the branches, and in the 4th the expected layout required slight biasing.

iteratee closed this revision.Jul 29 2016, 11:18 AM