This is an archive of the discontinued LLVM Phabricator instance.

Codegen: Tail Merge: Be less aggressive with special cases.
ClosedPublic

Authored by iteratee on Jul 13 2016, 3:18 PM.

Details

Reviewers
davidxl
Summary

This change makes it possible for tail-duplication and tail-merging to
be disjoint.

Don't special case fallthrough if it requires replacing a conditional jump with an unconditional jump.

Diff Detail

Event Timeline

iteratee updated this revision to Diff 63866.Jul 13 2016, 3:18 PM
iteratee retitled this revision from to Codegen: Tail Merge: Be less aggressive with special cases..
iteratee updated this object.
iteratee added a reviewer: davidxl.
iteratee set the repository for this revision to rL LLVM.
iteratee added subscribers: echristo, llvm-commits.
davidxl added inline comments.Jul 13 2016, 3:37 PM
lib/CodeGen/BranchFolding.cpp
621

This looks fine but should it be done only in layout mode?

davidxl added inline comments.Jul 13 2016, 3:46 PM
lib/CodeGen/BranchFolding.cpp
621

looks like if you merge this patch back to D18226 with a slight modification like this:

if ((MBB1 == PredBB || MBB2 == PredBB) && (MBB1->succ_size() == 1 || !AfterPlacement)) {

}

D18226 will be good to go in .

iteratee added inline comments.Jul 13 2016, 3:54 PM
lib/CodeGen/BranchFolding.cpp
621

No, I don't think it should be only layout mode.
rdf-copy is a good example of why. Without this patch:

foo:                                    // @foo
// BB#0:                                // %entry
        {
                        r1 = #0 ; jump .LBB0_2
    }
         .p2align        4
 .LBB0_1:                                // %while.cond
                                         //   in Loop: Header=BB0_2 Depth=1
         {
                         r1 = r0
                         r0 = memw(r0 + #0)
         }
 .LBB0_2:                                // %while.cond
                                         // =>This Inner Loop Header: Depth=1
         {
                         if (p0.new) r0 = r1
                         p0 = cmp.eq(r0, #0)
                         if (p0.new) jumpr:nt r31
         }
         {
                         jump .LBB0_1
         }

Note the loop is 3 packets long. Now with the patch:

foo:                                    // @foo
// BB#0:                                // %entry
        {
                        r1 = #0
                        p0 = cmp.eq(r0, #0)
        }
        {
                        if (p0) r0 = r1
                        if (p0) jumpr:nt r31
        }
        .p2align        4
.LBB0_1:                                // %while.cond
                                        // =>This Inner Loop Header: Depth=1
        {
                        r1 = r0
                        r0 = memw(r0 + #0)
                        if (!cmp.eq(r0.new, #0)) jump:t .LBB0_1
        }
// BB#2:                                // %if.end
        {
                        r0 = r1
                        jumpr r31
        }

Now the loop is one packet long.
There are other cases where unanalyzable fallthrough prevents duplication, but we avoid the problem completely by not merging.

davidxl added inline comments.Jul 13 2016, 4:32 PM
lib/CodeGen/BranchFolding.cpp
621

But this fix is over-generalized

The root cause of the problem in that test is that one of tail sequence is in a loop while the other is not. I think this can only happen for loop top-testing and bottom testing, so it seems to me a better general fix is to check if MBB1 or MBB2 dominates the other.

davidxl edited edge metadata.Jul 13 2016, 6:49 PM
davidxl added a subscriber: davidxl.

This also looks like some problem in later pass that get exposed by tail
merging. For instance the two branches in the bottom test should be
swapped. Ideally, with tail merging, the code should look like the
following:

foo: @foo
BB#0: // %entry

{
                r1 = #0 ; jump .LBB0_2
}
.p2align        4

.LBB0_1: // %while.cond

                                //   in Loop: Header=BB0_2 Depth=1
{
                r1 = r0
                r0 = memw(r0 + #0)
}

.LBB0_2: // %while.cond

                                // =>This Inner Loop Header: Depth=1
{
                if (!cmp.eq(r0.new, #0)) jump .LBB0_1
}
{
                jumpr:nt r31
}

.LBB0_3: // %if.end

I am fine having special handling to disable cases like this, but not too
general.

David

I've looked over the results with and without this patch, and I've come to the conclusion that It's correct to make this less aggressive generally.

For an unconditional branch, ignoring the threshold because of fallthrough is correct, because there is no runtime cost if the jump happens earlier. It was going to happen anyway.
For a conditional branch this isn't true. We trade a conditional branch and a jump (which may not execute), for a jump (which always executes) and a conditional branch. Given this, I think it's reasonable to enforce the threshold in this case.

There are two independent issues here

  1. one issue is the interaction of tailDup and tailMerge. The right fix is to make the post-layout tailMerge really restrictive (such that it does not undo post-layout tailDup decisions). Your current patch with added check !AfterPlacement as I suggested can probably accomplish that
  1. the bug you mentioned that merges loop top and bottom tests.

We shall probably proceed with 1) and make tailDup enhancement your have in. The right way to approach 2) can be handled later. WDT?

iteratee added inline comments.Aug 9 2016, 11:19 AM
lib/CodeGen/BranchFolding.cpp
621

The forked diamond code is about to go in.

I still think that these changes are correct in general, but if you still disagree, I can drop this and roll it into D18226 only after placement.
I think the thresholds there need to be adjusted so that there is only one, but I can do that as well.

iteratee updated this revision to Diff 67444.Aug 9 2016, 5:57 PM
iteratee edited edge metadata.
iteratee removed rL LLVM as the repository for this revision.

Made one of the heuristics depend on layout mode.

iteratee added inline comments.Aug 9 2016, 6:01 PM
lib/CodeGen/BranchFolding.cpp
621

I think in this case the comment is just wrong. A simple count shows that even for fallthrough if the branch was conditional, we trade a certain uncond-branch to a cond-branch for a cond-branch and an uncond-branch. In the first case, we always execute both branches. In the second, we may not execute the uncond-branch.

642

I've made this one layout mode dependent, because there's a much weaker case for keeping it overall.

davidxl accepted this revision.Aug 9 2016, 6:09 PM
davidxl edited edge metadata.

lgtm with more suggested restriction.

lib/CodeGen/BranchFolding.cpp
621

Add AfterPlacement check here as well, and also add a FIXME comment to revisit making it more general later.

This revision is now accepted and ready to land.Aug 9 2016, 6:09 PM
iteratee closed this revision.Aug 10 2016, 1:20 PM