This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: If Convert blocks that would form a diamond when tail-merged.
ClosedPublic

Authored by iteratee on Jul 19 2016, 4:28 PM.

Details

Reviewers
davidxl
Summary

Some if conversion currently requires tail-merging to have run first.

As an example the following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
      %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
      br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
      %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
      %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
      br label %bb

bb:             ; preds = %cond_true, %bb.outer
      %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
      %tmp. = sub i32 0, %b_addr.021.0.ph
      %tmp.40 = mul i32 %indvar, %tmp.
      %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
      %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
      br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
      %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
      %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
      %indvar.next = add i32 %indvar, 1
      br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
      %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
      %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
      br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
      %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
      ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:

LBB1_1:                                 @ %bb
                                      @ =>This Inner Loop Header: Depth=1
      cmp     r0, r1
      ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                      @   in Loop: Header=BB1_1 Depth=1
      subs    r0, r0, r1
      cmp     r1, r0
      it      ne
      cmpne   r0, r1
      bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                      @   in Loop: Header=BB1_1 Depth=1
      subs    r1, r1, r0
      cmp     r1, r0
      bne     LBB1_1
LBB1_4:                                 @ %bb17
      bx      lr

With diamond-tail if conversion, but without tail-merging:

@ BB#0:                                 @ %entry
      cmp     r0, r1
      it      eq
      bxeq    lr
LBB1_1:                                 @ %bb
                                      @ =>This Inner Loop Header: Depth=1
      cmp     r0, r1
      ite     le
      suble   r1, r1, r0
      subgt   r0, r0, r1
      cmp     r1, r0
      bne     LBB1_1
@ BB#2:                                 @ %bb17
      bx      lr

Diff Detail

Event Timeline

iteratee updated this revision to Diff 64599.Jul 19 2016, 4:28 PM
iteratee retitled this revision from to CodeGen: If Convert blocks that would form a diamond when tail-merged..
iteratee updated this object.
iteratee added a reviewer: davidxl.
iteratee set the repository for this revision to rL LLVM.
iteratee updated this object.
iteratee updated this object.
iteratee updated this revision to Diff 64607.Jul 19 2016, 4:33 PM
iteratee removed rL LLVM as the repository for this revision.

Removed debugging, tidied spacing, added comments.

davidxl edited edge metadata.Jul 20 2016, 11:23 AM

I am torn about this change. While this looks like a useful thing to do, I suspect this is not the right way to approach the problem.

The example actually confirms the fact the pre-layout tailmerging is a good normalization/enabler pass for later optimizations. This is the reason why it should be run with lower threshold enabling as much optimization as possible, and later let TailDup to undo those that do not bring benefit and to improve layout.

Another question is that whether this patch can handle more cases where (tailMerge + ifcvt) can not handle. If not, it seems to me the patch seems to have duplicated logics (e.g, counting dups) in tailMerge which is not the right approach.

Is this patch required to enable your tailDup enhancement patch? I don't think this one is essential for it. We can probably focus on getting your tailDup patch in first (it is very close to get -- probably just to make your latest tailMerge tuning to be enabled only in post layout mode?)

I am torn about this change. While this looks like a useful thing to do, I suspect this is not the right way to approach the problem.

The example actually confirms the fact the pre-layout tailmerging is a good normalization/enabler pass for later optimizations. This is the reason why it should be run with lower threshold enabling as much optimization as possible, and later let TailDup to undo those that do not bring benefit and to improve layout.

I couldn't find any of the original commits about tail merging that refer to normalization. It's always about reducing code size. Also, it's unusual for a canonicalization pass to have a threshold.

Another question is that whether this patch can handle more cases where (tailMerge + ifcvt) can not handle. If not, it seems to me the patch seems to have duplicated logics (e.g, counting dups) in tailMerge which is not the right approach.

Two things:

  1. The code has to count duplicates anyway. Look at how IfConvertDiamond is written.
  2. There is a precedent for teaching optimization passes to look deeper, even if there is a canonicalization pass that would prevent the need.

Is this patch required to enable your tailDup enhancement patch? I don't think this one is essential for it. We can probably focus on getting your tailDup patch in first (it is very close to get -- probably just to make your latest tailMerge tuning to be enabled only in post layout mode?)

Not specifically. If tailMerge is made less aggressive only during layout, then this is not necessary.

iteratee updated this revision to Diff 64802.Jul 20 2016, 6:15 PM
iteratee edited edge metadata.

Refactor shared code between Diamond and Diamond with shared tail.

The validateDiamond refactoring change can be split out from the functional change into a different patch .

davidxl added inline comments.Jul 21 2016, 3:58 PM
lib/CodeGen/IfConversion.cpp
864

Document the difference between this pattern vs ValidDiamond?

Also since there is no common tail shared between truebb and falsebb, the shape is not really 'Diamond'. Perfhaps make it named "ValidDiamondWithTailCommonned' ? to indicate the shape will be diamond if the tail is commonned?

882

is 'fallthough' check needed here?

892

Can this check be moved earlier? If so, common check can be extracted and shared across this and ValidDiamond.

906

Merge these two :

if (TF == FT && TT == FF) {
    if (! Reversable)
         return false;
    reverse ...
 }
924

Better move this out of line to improve readability.

929

I have not looked in details here. Is there existing code that can be refactored/reused by any chance?

iteratee updated this revision to Diff 65399.Jul 25 2016, 12:12 PM
iteratee marked 5 inline comments as done.

Move closure to member function.

iteratee updated this revision to Diff 65406.Jul 25 2016, 1:14 PM

Renamed to ForkedDiamond, and add more documentation about how it differs from standard diamond.

lib/CodeGen/IfConversion.cpp
860–861

I renamed it ForkedDiamond, and added more comments about it.

882

No. It's just checking if the branch is analyzable.

892

It's short enough that I don't think it's worth factoring out. If there were more checks in common, then maybe, but there aren't.

929

The function that is the most similar is ScanInstructions. I think they're different enough that having them share code would be more confusing than helpful.

The code looks in pretty good shape now. I find the test case is little missing -- how about adding some more (including negative one)?

lib/CodeGen/IfConversion.cpp
90

nit: with a common tail that can be shared

489

Split out the independent fix with a test case if possible

1024

Document the parameters.

1024

Why can't this function be folded into existing FeasbilityAnalysis with a new flag : hasCommonForkedTail (which defaults to false) ?

2085

Why is this check not done for the forked case?

iteratee updated this revision to Diff 65830.Jul 27 2016, 4:14 PM
iteratee marked 3 inline comments as done.

Cleanups in response to comments.
Removed DebugLocation propagation.

lib/CodeGen/IfConversion.cpp
489

I've removed it. Finding a test case is enough work that I just don't have time.

2085

Good catch.

iteratee updated this revision to Diff 65834.Jul 27 2016, 4:28 PM

Add negative test.

Is there anything else that I need to do for this patch?

davidxl added inline comments.Jul 30 2016, 8:32 PM
lib/CodeGen/IfConversion.cpp
895

The variable names here does not seem to match the control flow graph drawn in the comment. Please make it consistent.

932

Clean up the comment -- last instruction of what?

933

Is this a good assumption to make? Any assert can be added countDuplicatedInstructions?

965

This skip code pattern has appeared many times -- good candidate to extract into an inline function.

973

Why is this check not done outside of this function (in ValidForkedForkedDiamond before countDuplicatedInstuctions as in ValidDiamond ?

978

Why is this not already computed?

981

feasbilityAnalysis already checks isUnpredicable bit -- why is it still done here?

1155

This code looks almost exactly the same as the regular diamond case. Perhaps defined a lamba function
auto DiamondFinder = [&](decltype(&IfConverter::ValidDiamond) Checker) {

  if (CanRevCond && (this->*Checker(..) && ...) {
 ....
}

};

DiamondFinder(&IfConverter::ValidDiamond);
DiamondFinder(&IfConverter::ValidForkedDiamond);

iteratee updated this revision to Diff 66406.Aug 1 2016, 5:37 PM
iteratee marked an inline comment as done.

Refactor and comments.

lib/CodeGen/IfConversion.cpp
895

Which comment specifically? The names in the graph are member names of BBInfo, and are assumed to coincide. Here we can't make that assumption.

The names are also logical: TT = TrueBBI.TrueBB, TF = TrueBBI.FalseBB, etc.

933

I've elaborated in the comment. The size is computed by ScanInstructions, and the duplicated portion is subtracted off, so there's no point in recomputing the size, we would get the same answer.

973

Because countDuplicatedInstructions adjusts the iterators so that we know exactly which instructions are duplicated. We only worry about the non-duplicated instructions that clobber the predicate info.

978

See above.

981

Same reason as above.

1155

It would take too many parameters, and the code would be less legible.

I would have to pass in TrueBBICalc, FalseBBICalc, hasCommonTail, and then conditionalize the calls, because
ValidDiamond takes a different number of arguments from ValidForkedDiamond.

I don't think it would be cleaner.

iteratee updated this revision to Diff 66409.Aug 1 2016, 5:48 PM

Missed changes from last patch. (Refactor and comments.)

davidxl added inline comments.Aug 2 2016, 10:24 AM
lib/CodeGen/IfConversion.cpp
895

Ok -- the naming convention makes sense.

933

My question in this comment is that countDuplicatiedInstruction does not document exit state of TIE and FIE, so add assertions to make sure TIE and FIE point to what you expect to see (pointing to identical instructions before and not identical after ?)

973

Should this be done for ValidDiamond too -- only check the non-shared portion?

Also I don't think it is ideal to have code duplication like this. Looks like you should re-use scanInstructions or part of it (by making it accepting BIB and BIE) ?

1155

I still think refactoring is better -- the main reason of doing the refactoring is to avoid code duplication which is better longer term. For instance, no need to worry about fixing bugs in multiple different places.

Another point is that Diamond and ForkedDiamond patterns are exclusive, so there is no need to do forked diamond check after diamond check returns true. In other words, the code can be further simplified to:

if (CanRevCond) {

if (!DiamondFinder(...::ValidDiamond)) {
    DiamondFilnder(... ::ValidForkedDiamond);

}

Also it seems to me you don't need to introduce TrueBBICalc and FalseBBICalc. How about in ValidForkedDiamond

  1. saving the initial value of TrueBBI and FalseBBI
  2. re-scan region of BB and update TrueBBI and FalseBBI
  3. if ValidForkedDiamond fails, restore TrueBBI etc value before returning.
iteratee updated this revision to Diff 66748.Aug 3 2016, 6:17 PM

More refactors and comments.

I split ScanInstructions in two, made the actual scan take iterator bounds, and removed RecaculateCostsAndClobbers.

lib/CodeGen/IfConversion.cpp
1183

I factored out the feasibility analysis. Take a look now.

iteratee updated this revision to Diff 66753.Aug 3 2016, 8:03 PM
iteratee marked an inline comment as done.

More refactoring.

davidxl accepted this revision.Aug 5 2016, 12:30 PM
davidxl edited edge metadata.

lgtm

When commtting, I suggest you carve out the refactoring changes into one or more NFC patches (debug skip for one, scanning for one, and the rest of refactoring) before committing the functional change.

This revision is now accepted and ready to land.Aug 5 2016, 12:30 PM
iteratee updated this revision to Diff 67595.Aug 10 2016, 2:15 PM
iteratee edited edge metadata.

Added bug fix from the revert.

iteratee closed this revision.Aug 11 2016, 2:21 PM

Committed in: r278287

I have a possible fix for the broken self hosting bot. I've split the fixed patch into 3: Fix (which can apply to top), Rescan diamonds, and Forked diamond. I'm going to re-commit all 3 and let the bots churn through them again.

Recommitted in r279670 and r279671