Page MenuHomePhabricator

Codegen: Tail-duplicate during placement.
ClosedPublic

Authored by iteratee on Mar 16 2016, 2:42 PM.

Details

Summary

The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. This change allows duplication after
layout decisions. Because we want the updated CFG to affect subsequent
placement decisions, duplication must occur during placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Fixes from review.

I have some questions, but most of the suggestions are done.

lib/CodeGen/BranchFolding.cpp
703

I believe ARM marks the actual return as an indirect branch.
I don't see a better way. Indirect tail-calls will likely get caught here as well.
Since what we're looking for is jump tables within the function, checking the successor size seems reasonable.
Would you like me to re-write the comment so that it seems more intentional?

707

It's one more than 20, from tail-duplication of the same pattern. I'm open to other suggestions as to how to make these two cooperate.

lib/CodeGen/TailDuplicator.cpp
545

No. With tail-duplication occurring only once, we had to duplicate those blocks when we got the chance. Now with layout-duplication we can wait and duplicate them later.

555

Yes, there are tests where the loop-latch is a switch. I presume it occurs in the real world as well.

558

No. Because this check is only necessary if you want to expand these blocks later in layout mode.
This change is required by the previous one above, because otherwise it breaks a test.

iteratee added inline comments.Aug 23 2016, 12:10 PM
lib/CodeGen/TailDuplicator.cpp
545

This was related to a performance regression I found. I can probably pull this out into a separate patch that gets committed at the same time. Would that be better?

davidxl added inline comments.Aug 23 2016, 1:22 PM
lib/CodeGen/TailDuplicator.cpp
545

Yes this should be in a different patch with different review (and test cases etc).

iteratee updated this revision to Diff 69036.Aug 23 2016, 2:05 PM

Changes missed from previous upload

davidxl added inline comments.Aug 23 2016, 2:10 PM
lib/CodeGen/BranchFolding.cpp
703

How abut Block1.back().isIndirectBranch() && !Block1.back().isReturn() ?

707

I think this value should be passed in either via constructor or a separate API:

setMergeThresholdForBBwithIndirectBr(...).

In tailDup, the value 20 should also be controlled by an option.

lib/CodeGen/MachineBlockPlacement.cpp
916

Why can't this part be folded into maybeTailDuplicateBlock? If not, the big chunk of code between if (Removed) { ..} belongs to its own method.

lib/CodeGen/TailDuplicator.cpp
558

I don't get it. This one looks like a heuristic change to not give block with return instruction (as indirect branch) a larger threshold for duplication, which is independent of this patch.

iteratee marked an inline comment as done.Aug 23 2016, 2:52 PM
iteratee added inline comments.
lib/CodeGen/TailDuplicator.cpp
558

I'll split these out, but they're all intertwined.

  1. Don't duplicate latches/pre-headers early.
  2. There are now tests that fail, so duplicate switches during layout.
  3. Because the indirect branch test is now post Reg-Alloc, it has to handle the actual return instructions.
davidxl added inline comments.Aug 23 2016, 3:29 PM
lib/CodeGen/TailDuplicator.cpp
558

thx for the expalnation. Getting Part-1 right is the key here.

iteratee updated this revision to Diff 69058.Aug 23 2016, 5:28 PM

Removed changes as requested. They'll show up in another patch shortly.

davidxl added inline comments.Aug 24 2016, 10:28 AM
lib/CodeGen/MachineBlockPlacement.cpp
917

BB = *std::prev(Chain.end())

1693

early return if not removed to reduce code nesting level

1694

remove 'into'

1699

std::prev

1703

This does not look right the place to update. The successors of DupBB may also get more unscheduled predecessors. The update should be done in one place where the total number of Preds (BB is dupped into) is known.

1708

Can you give more explanation on the iterative tailDup?

1744

Should an early return be done here?
if (!IsSimple && BB->succ_size() == 1) return false;
if (!TailDup.shouldTailDuplicate(...) return false;

1749

should also skip if Pred is in the same chain as BB (e.g, when BB is a loop head of another loop, and Pred is that loop's latch)

1752

Need to count the number of other Pred that BB can tail duplicate into.

If that number is zero, should return false.

1764

To early to do the update. Wait until the tailDup actually happens. Actually, do we really need to update? When tailDup happens, Pred will become NewSucc's new predecessor, and Pred is still not scheduled.

1769

Early return false when '!CanDupToChain'

1793

It is cleaner to use RemoveList.erase(remove_if(...)..) pattern

Also bypass if RemBB's unscheduled pred number is not zero.

iteratee updated this revision to Diff 69153.Aug 24 2016, 12:23 PM
iteratee marked 8 inline comments as done.

Simplifications from review.

Applied some suggestions, answered others.

lib/CodeGen/MachineBlockPlacement.cpp
1699

Can't. I need to actually update the iterator so that I can check to see if there is an additional block in the chain.

1703

You're correct, It only needs to be done once because only one block is actually going from unscheduled to scheduled: BB.
That makes the function simpler overall and easier to understand.

1749

No, even if they're in the same chain, blocks that end up with both as predecessors still have an additional predecessor.

I think you're thinking of the check that I do below of NewChain vs PredChain.

1752

No. because although we don't have predecessors to worry about (Loop filter), there may be unfiltered predecessors that we can duplicate into, and now is the only time to do so.

1764

We can't do it any later. BB may disappear after tail duplication.

I could make tail duplication return a vector of blocks that were duplicated into, and save the successor list before duplication.
That may be easier to follow, but wouldn't be any more correct.

Yes, we need to update. For each duplication the successors now have an additional unscheduled predecessor

1769

That would be incorrect. There are plenty of cases where we can't duplicate into the layout predecessor where we still want to perform tail duplication.

iteratee updated this revision to Diff 69176.Aug 24 2016, 3:14 PM

Remove unused variable.

davidxl added inline comments.Aug 24 2016, 3:55 PM
lib/CodeGen/MachineBlockPlacement.cpp
1752

It is not correct to do. BB and Pred are in the same nested loop that already been laid out. You can not duplicate BB into Pred here.

LPred

|
|

BB <-----

|               \
..                 \
Pred              |
 \__________|
1764

I think a list of Preds that BB can be duplicated into need to be passed in.

The reason doing update here is not bullet proof is that if later if there are other heuristic to decide not to tailDup, then the early update will be wrong. It needs to be tied to the actual transformation

1769

There are two problems here

  1. many Pred (other than LPred) has been skipped as illegal candidates, but tailDupiicator does not know about will happily do tailDup -- as the tailDuplicator does not know about the new constraints from MBP
  2. if the total number of candidate predecessors that BB can tailDuped into is fewer than 2, there is no point do the tail duplication
iteratee added inline comments.Aug 24 2016, 4:49 PM
lib/CodeGen/MachineBlockPlacement.cpp
1752

I don't see why not. There's not a correctness issue, and if BB was the loop top of the inner loop, it was never considered to be duplicated. Now is the chance that we have to do it.

1764

The goal of this loop is not to find constraints.

I agree it would be cleaner to get the list out of tail-duplication. I'll do that.

1769

I think you misunderstand the point of this loop. The point is not to find "legal candidates", but to find the candidates for which duplicating will require update.

Because of that, we don't have an accurate count, and don't want one. We'll gladly let tailDuplicator do its job.

davidxl added inline comments.Aug 24 2016, 5:15 PM
lib/CodeGen/MachineBlockPlacement.cpp
1708

This one needs a test case to cover.

1769

Ok.

Do you see the possibility that some Pred simply can not be selected as dup target bb (for legality or performance reasons)? Overall, the code should be organizied in the way that the list of BBs that MBP thinks it can/should be tailDup should match the actual transformations.

iteratee added inline comments.Aug 24 2016, 5:21 PM
lib/CodeGen/MachineBlockPlacement.cpp
1769

I can see that we might want to limit the set of predecessors duplicate to. A good case would be to filter out predecessors that are too cold.

The performance results have been reasonable so far without that change, and It would certainly make this more complicated.
For now, I think it would be best to simply go with the cleaner version of getting a list of blocks back from tailDuplicateAndUpdate.

This will make it easy to add filters later, and to add them inside of tailDuplicator as well without worrying about breaking this loop.

davidxl added inline comments.Aug 24 2016, 5:26 PM
lib/CodeGen/MachineBlockPlacement.cpp
1769

I am fine with unfiltered solution for now -- but the code structure should make future tuning easy to do :)

iteratee updated this revision to Diff 69262.Aug 25 2016, 10:08 AM

Update unscheduled predecessors after tail duplication.

iteratee updated this revision to Diff 69293.Aug 25 2016, 2:35 PM
iteratee marked an inline comment as done.

Changes missed in previous patch. Get list of duplicated predecessors from TailDuplicator and use that list to update unscheduled predecessors.

iteratee updated this revision to Diff 69434.Aug 26 2016, 2:02 PM

Remove extra debugging.

iteratee marked 2 inline comments as done.Aug 30 2016, 3:03 PM
iteratee added inline comments.
lib/CodeGen/BranchFolding.cpp
703

I checked. There are platforms where the "return" instruction does not return true for isReturn after epilog lowering.
Mips, Hexagon, Thumb.

I think I'll re-write the comment so that it sounds less like a workaround.

iteratee updated this revision to Diff 69762.Aug 30 2016, 3:04 PM

Add test case for repeated tail-duplication.

iteratee marked an inline comment as done.Aug 30 2016, 5:14 PM
davidxl added inline comments.Sep 12 2016, 1:48 PM
test/CodeGen/X86/tail-dup-repeat.ll
7 ↗(On Diff #69762)

Can this function's declaration be simplified? If the parameters are not necessary, remove them

11 ↗(On Diff #69762)

Clean up the parameters (shorten names etc)). Remove unnecessary attribute

15 ↗(On Diff #69762)

Please also clean up variable names and label names.

For label names, make them more meaningful. For instance, if a block is duplicated, the label for the block can be 'dup:'. For iterative tail dup, also name them in sequence numbers dup1, dup2 etc. Others can be cleaned up suchas header, loopexit etc.

Some simple comments in the test case (where tail dup happens) is also helpful.

iteratee updated this revision to Diff 71247.Sep 13 2016, 3:21 PM
iteratee marked 3 inline comments as done.

Found new, shorter test case for repeated tail duplication.

davidxl added inline comments.Sep 13 2016, 3:51 PM
lib/CodeGen/MachineBlockPlacement.cpp
1732

Is this comment correct? The TailBB may be dup'ed in to Lpred, but still not removed as there are some other predecessors it can not be dup'ed into ?

1821

Should DuplicatedToLPred == CanDupToChain?

iteratee updated this revision to Diff 71274.Sep 13 2016, 5:04 PM

Simplified computation of whether a block was duplicated to the layout pred.

lib/CodeGen/MachineBlockPlacement.cpp
1732

It is currently correct.
The tail duplication code only copies the block into the layout predecessor if it can be copied into all other predecessors as well.

1821

Even better, since we now return the list of predecessors that received a copy of BB, we should check to see if LPred is in that list. Basically free since we loop over the list anyway.

davidxl added inline comments.Sep 13 2016, 6:56 PM
lib/CodeGen/MachineBlockPlacement.cpp
917

The update of BB should be wrapped into 'repeatedlyTailDuplicateBlock'.

1696

Missing coma. ".. it was removed, markChainSuccessors .."

1706

Add a comment here that Chain End is updated when DupBB is removed by the removal call back.

There also seem to be a bug here: if TailDup happens, but DupBB is not removed, the loop will check the same BB again. Should it just break the loop when removed is not true?

1733

It is cleaner to make this interface return bool (either Removed or DuplicatedToLPred)

iteratee updated this revision to Diff 73330.Oct 3 2016, 1:20 PM
iteratee marked 2 inline comments as done.

Added comments.

Made changes from review:
Chain end is now updated in repeatedlyTailDuplicateBlock
MaybeTailDuplicate now returns Removed instead of using out parameter.

Please take a look. I cleaned up the issues you pointed out and added comments.

lib/CodeGen/MachineBlockPlacement.cpp
1696

I split the sentence into 2 and then added a comma. It's easier to read now.

1706

I added some comments.

Currently, duplication to layout predecessor only occurs if the block is removed. I added a comment about that as well.

davidxl accepted this revision.Oct 3 2016, 3:35 PM
davidxl edited edge metadata.

lgtm.

This is large change so it bounds to run into some corner case issues. Please test thoroughly and watch for any fallouts.

lib/CodeGen/MachineBlockPlacement.cpp
1722

Add an assert that Removed is true when DuplicatedToLPred is true

This revision was automatically updated to reflect the committed changes.