This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Allow small copyable blocks to "break" the CFG.
ClosedPublic

Authored by iteratee on Dec 13 2016, 5:00 PM.

Details

Summary

When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, because the copy means that the CFG won't actually be violated.

Diff Detail

Event Timeline

iteratee updated this revision to Diff 81329.Dec 13 2016, 5:00 PM
iteratee retitled this revision from to CodeGen: Allow small copyable blocks to "break" the CFG..
iteratee updated this object.
iteratee added a reviewer: davidxl.
iteratee set the repository for this revision to rL LLVM.

davidxl - This should be quite a bit shorter and easier to review. Thanks for the direction. There is a 2nd part, but it's also really short.

t.p.northover - Can you look at the arm cmpxchg tests? I mentioned it on IRC, but here it is concrete. This code causes the layout, and therefore fallthrough of the generated cmpxchg code to no longer to favor the success case. I think we should change the generating code to either supply a small hint, or to provide the inverted test and the branches in the other order. Both will put the fallthrough back the way it was.

davidxl edited edge metadata.Dec 14 2016, 12:38 PM

Please add one or two test cases for this feature.

lib/CodeGen/MachineBlockPlacement.cpp
568

&& --> || ?

In other words, do we care about block dup of simple blocks in layout mode?

581

Perhaps change the name to "canTailDuplicateUnplacedPreds" ?

593

We have have a situation where some of the predecessors can share a common successor (newly created), and the block can be tail-Duped into that block. It might be worth considering supporting.

iteratee updated this revision to Diff 81496.Dec 14 2016, 4:17 PM
iteratee edited edge metadata.
iteratee removed rL LLVM as the repository for this revision.

Fixes from review comments, and new test case.

iteratee added inline comments.Dec 14 2016, 4:18 PM
lib/CodeGen/MachineBlockPlacement.cpp
568

Simple blocks contain only an unconditional jump. I don't see a good reason to exclude them during layout.

581

Done. Thanks.

593

While I'll keep that in mind, that would basically be block copying. We should support that as well, but I think it's out of scope for what I'm trying to do here.

do you forget to include the new test in the patch, I still can not find it.

lib/CodeGen/MachineBlockPlacement.cpp
568

Basically you are trying to tail duplicate such simple blocks in layout -- what is the benefit? It does not create more fall-through opportunities.

iteratee updated this revision to Diff 82010.Dec 19 2016, 2:09 PM
iteratee edited edge metadata.

Put test into correct patch.

davidxl added inline comments.Dec 20 2016, 11:28 AM
lib/CodeGen/MachineBlockPlacement.cpp
568

Should the condition be just

if (BB->succ_size() == 1)

return false;

If not, can you come up with a test case showing tail-dup simple BB helps the layout?

test/CodeGen/PowerPC/tail-dup-layout.ll
97

The intention of this test case is not clear. I expect a simpler and straightforward test case for instance just a control flow with two concatenated triangles. The first triangle branch is annotated with profile data such that without this patch it will layout the blocks in topo order.

116

The term 'avoidable'/'unavoidable' is not well defined and can be confusing. I also suggest add this test case in a different patch.

iteratee added inline comments.Dec 21 2016, 4:28 PM
test/CodeGen/PowerPC/tail-dup-layout.ll
97

The straightline test that you're asking for will be redundant as soon as D20505 lands. (It removes the outlining flags from this file, and verifies the single triangle case by just doing multiple triangles)

This test is basically as small as possible. It needs a block that:
a) has predecessors that can accept a tail-duplicated copy
b) wouldn't be chosen based on some outlining algorithm
c) has at least 2 successors

The block then2 has 2 predecessors: else1, and test2

It has 2 successors: end1 and end2
and test2 has an alternate path around then2: else2

116

The goal of the test is a block that wouldn't be chosen due to some outlining strategy. Do you have a name you prefer?

davidxl added inline comments.Dec 21 2016, 6:19 PM
test/CodeGen/PowerPC/tail-dup-layout.ll
97

It may become redundant after D20505 lands, but this patch still needs a simplest test case possible in addition to this one.

116

The name ties to much to the implementation details of outlining heuristics. Just change the function name to something like tail_dup_test or something and add a comment that test2 has an alternate path around then2 to disable outllining. Perhaps add a check to make sure outlining does not happen?

iteratee updated this revision to Diff 83334.Jan 5 2017, 6:21 PM
iteratee edited edge metadata.

Added new test.

I added a simple test in a new file that doesn't pass the outlining flag to llc.

davidxl added inline comments.Jan 6 2017, 12:31 PM
test/CodeGen/PowerPC/tail-dup-break-cfg.ll
7 ↗(On Diff #83334)

This test demonstrates that this patch helps when test2 is more likely (but the branch probability is not biased enough to make the successor selection to pick test2 as successor without the patch).

What happens if body1 a more likely successor such that even with the patch, body1 will be selected as test1's successor. Later when block test2 is checked, it is still beneficial to tail duplicate it. Does it happen with the patch?

iteratee updated this revision to Diff 83415.Jan 6 2017, 1:36 PM
iteratee edited edge metadata.

Things that I had marked as done, but had missed, perhaps by putting them in D20505

iteratee added inline comments.Jan 6 2017, 1:36 PM
test/CodeGen/PowerPC/tail-dup-break-cfg.ll
7 ↗(On Diff #83334)

This patch only prefers blocks that can be tail-duplicated if they are more probable.

selectBestSuccessor looks at the blocks in order and picks the most probable block that doesn't have a CFG conflict, or is hot enough to ignore the CFG conflict. Most of that logic is in hasBetterLayoutPredecessor. This patch just adds "can tail-duplicate to unplaced preds" as another condition to allow CFG-breaking.

If body1 is laid out after test1, then test2 will not be copied into the end of test1, as test1 has more than one successor. Since test2 would be laid out after body1, it would not be tail-duplicated anywhere.

In an if-else case where both bodies go to a small test afterward:

   t1
  / \
b1   e1
  \ /
   t2

The existing code will duplicate the test without this patch, producing either

t1 b1 t2 ... e1(t2) or t1 e1 t2 ... b2(t2)

I can create that test if you'd like, but since it's not changed by this patch, it should go in as a separate change.

davidxl accepted this revision.Jan 6 2017, 2:34 PM
davidxl edited edge metadata.

lgtm

This revision is now accepted and ready to land.Jan 6 2017, 2:34 PM
iteratee updated this revision to Diff 83464.Jan 6 2017, 4:52 PM
iteratee edited edge metadata.

Simple tidying of whitespace.

This revision was automatically updated to reflect the committed changes.

A few comments if you wouldn't mind?

llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
570 ↗(On Diff #83881)

const arguments?

589 ↗(On Diff #83881)

Ditto.

595 ↗(On Diff #83881)

Ditto.

598 ↗(On Diff #83881)

Can you comment/explain the BlockFilter part here?

599 ↗(On Diff #83881)

Same with BlockToChain.

1961 ↗(On Diff #83881)

Maybe this should be above the previous line?