Page MenuHomePhabricator

InstCombinePHI: Partial simplification of identity operations
ClosedPublic

Authored by kuhar on Jul 30 2015, 8:00 AM.

Details

Reviewers
majnemer
jmolloy
Summary

Consider this code:

...
BB:
  %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
  %add = add nsw i32 %i, %b
  ...
`

In this common case the add can be moved to the %if.else basic block, because adding zero is an identity operation. If we go though %if.then branch it's always a win, because add is not executed; if not, the number of instructions stays the same.
This pattern applies also to other instructions like sub, shl, shr, ashr | 0, mul, sdiv, udiv | 1.

Diff Detail

Repository
rL LLVM

Event Timeline

kuhar updated this revision to Diff 31027.Jul 30 2015, 8:00 AM
kuhar retitled this revision from to InstCombinePHI: Partial simplification of identity operations.
kuhar updated this object.
kuhar set the repository for this revision to rL LLVM.
kuhar added a subscriber: llvm-commits.
jmolloy requested changes to this revision.Aug 3 2015, 2:51 AM
jmolloy added a reviewer: jmolloy.
jmolloy added a subscriber: jmolloy.

Hi Jakub,

Thanks for working on this! I've got a bunch of comments from this first round of review.

Cheers,

James

lib/Transforms/InstCombine/InstCombinePHI.cpp
784

"if *a* PHI node"

787

"we can hoist this instruction into only one of the incoming blocks"

The use of "hoist" is important, I think.

795

There are a lot of ternary operators (x ? y : z) going on in this function, and it's a bit difficult to assert that they're all correct.

What I'd like to see is some ascii-art of the situation, with each of the nodes having a name. Then variables refer to those names later, which means I don't get confused which node OpOperandIdx refers to!

C        V
  \      /
   Phi
X  |
 \  |
   Op

Something like that? ^

809

Remove braces and put break on its own line.

817

Same as line 809, and for the rest.

818

No parens () around the second expression.

838

You need to handle the case where dyn_cast returns nullptr here.

843

Same as line 838.

846

I think this second check is unnecessary. As the terminator leads to the block with the PHI in, any instruction that dominates the terminator must also dominate PN.

856

What is this doing? What situation can cause whatever this is checking against?

865

Better to use a cast<> rather than a C-style cast for PNUser.getOpcode(); that's safer:

auto *NewInst = Builder.CreateBinOp(cast<BinaryOperator>(PNUser).getOpcode(), LHS, RHS);
cast<BinaryOperator>(NewInst)->copyIRFlags(&PNUser);
This revision now requires changes to proceed.Aug 3 2015, 2:51 AM
kuhar marked 7 inline comments as done.Aug 4 2015, 2:44 AM
kuhar added inline comments.
lib/Transforms/InstCombine/InstCombinePHI.cpp
838

If it returns null, we are fine and can continue (ex. value that is a function parameter).

843

saa

846

Well, sometimes it is necessary when things get little more tricky. Consider duff's device with the do-while inside a switch. You end up with such IR:

for.cond:                                         ; preds = %for.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  %cmp = icmp eq i32 %i.0, 100
  br i1 %cmp, label %for.end, label %for.body

for.body:                                         ; preds = %for.cond
  %conv = trunc i32 %i.0 to i16
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds [100 x i16], [100 x i16]* %Array, i64 0, i64 %idxprom
  store i16 %conv, i16* %arrayidx, align 2
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %arraydecay = getelementptr inbounds [100 x i16], [100 x i16]* %Array, i64 0, i64 0
  br label %sw.bb.10.i

do.body.i:                                        ; preds = %sw.bb.10.i
  %incdec.ptr.i = getelementptr inbounds i16, i16* %incdec.ptr29.i, i64 1
  %0 = load i16, i16* %incdec.ptr29.i, align 2
  %add2.i = add i16 %add32.i, %0
  %incdec.ptr5.i = getelementptr inbounds i16, i16* %incdec.ptr.i, i64 1
  %1 = load i16, i16* %incdec.ptr.i, align 2
  %add8.i = add i16 %add2.i, %1
  %2 = add i16 %add8.i, %4
  br label %sw.bb.10.i

sw.bb.10.i:                                       ; preds = %do.body.i, %for.end
  %3 = phi i16 [ %add8.i, %do.body.i ], [ 0, %for.end ]
  %from.addr.2.i = phi i16* [ %incdec.ptr5.i, %do.body.i ], [ %arraydecay, %for.end ]
  %n.2.i = phi i32 [ %dec.i, %do.body.i ], [ 17, %for.end ]
  %incdec.ptr11.i = getelementptr inbounds i16, i16* %from.addr.2.i, i64 1
  %4 = load i16, i16* %from.addr.2.i, align 2
  %add14.i = add i16 %3, %4
  %incdec.ptr17.i = getelementptr inbounds i16, i16* %incdec.ptr11.i, i64 1
  %5 = load i16, i16* %incdec.ptr11.i, align 2
  %add20.i = add i16 %add14.i, %5
  %incdec.ptr23.i = getelementptr inbounds i16, i16* %incdec.ptr17.i, i64 1
  %6 = load i16, i16* %incdec.ptr17.i, align 2
  %add26.i = add i16 %add20.i, %6
  %incdec.ptr29.i = getelementptr inbounds i16, i16* %incdec.ptr23.i, i64 1
  %7 = load i16, i16* %incdec.ptr23.i, align 2
  %add32.i = add i16 %add26.i, %7
  %dec.i = add nsw i32 %n.2.i, -1
  %cmp.i = icmp sgt i32 %n.2.i, 1
  br i1 %cmp.i, label %do.body.i, label %sum.exit

Here you want to replace
%3 = phi i16 [ %add8.i, %do.body.i ], [ 0, %for.end ]
%add14.i = add i16 %3, %4
with
%2 = add i16 %add8.i, %4 (in for.end)
%3 = phi i16 [ %2, %do.body.i ], [ %4, %for.end ]
Which is illegal (detected by the domination check of Operand and PN).

856

It's checking if the newly inserted instruction would dominate all of its uses. I'm checking against Terminator not to add the new instruction and remove it afterwards (equivalent).
The if (PN.getIncomingBlock(IncomingValIdx) == PN.getParent()) is essentially detecting some subset of illegal cases of the check in the for loop, but it's the most common one and can be detected without using dominator tree at all.

kuhar updated this revision to Diff 31312.Aug 4 2015, 2:46 AM
kuhar edited edge metadata.
kuhar updated this revision to Diff 31313.Aug 4 2015, 3:18 AM
kuhar edited edge metadata.

Hi Jakub,

Thanks for making those changes. I just have one more tranche of changes to request; they're all to do with the testcase.

Cheers,

James

lib/Transforms/InstCombine/InstCombinePHI.cpp
846

Okay, after chatting with Jakub offline, the simpler case here is a single-block loop where "Operand" is inside the loop. It'll dominate the teminator, but not the PHI node because it is between them.

test/Transforms/InstCombine/phi.ll
636

Your CHECKs here need to be more explicit. I don't think they're tight enough now to validate that the test does what it's supposed to.

You should check that the if.end block contains only a PHI and a ret:

// CHECK: if.end:
// CHECK-NEXT: phi [ %{{.*}}, %if.else ], [ %v.addr.0, %entry ]
// CHECK-NEXT: ret
655

Same for all the rest of these; be more specific.

kuhar updated this revision to Diff 31351.Aug 5 2015, 7:49 AM

Thanks for the review, James.
Tests updated.

jmolloy accepted this revision.Aug 5 2015, 8:15 AM
jmolloy edited edge metadata.

LGTM!

This revision is now accepted and ready to land.Aug 5 2015, 8:15 AM
chatur01 closed this revision.Aug 13 2015, 5:39 AM
chatur01 added a subscriber: chatur01.

Landed r244887.

This commit resulted in PR24470, I reverted r244887 in r245194. Please
take a look when you get the chance.

kuhar updated this revision to Diff 32314.Aug 17 2015, 9:02 AM
kuhar added a reviewer: majnemer.
kuhar marked 2 inline comments as done.

Fixed bug reported as PR24470.

kuhar added a comment.Aug 17 2015, 9:05 AM

Thanks for reporting the bug and for the minimalistic test, it should be fixed now.

kuhar updated this revision to Diff 32315.Aug 17 2015, 9:08 AM

For clarity, this review has been reopened due to code reversion and now requires David's sayso to proceed.

majnemer added inline comments.Aug 18 2015, 2:09 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
882–884

Please do not create instructions that you end up destroying once you realize that the transform isn't appropriate. Instead, perform an upfront check.

kuhar updated this revision to Diff 32530.Aug 19 2015, 3:50 AM
kuhar updated this object.

Perform all checks before adding new instruction

kuhar marked an inline comment as done.Aug 19 2015, 3:54 AM

I've changed the check form using isSafeToSpeculativelyExecute to simply checking for division instruction - from all the cases I deal with only this one could result in trap.

majnemer edited edge metadata.Aug 19 2015, 4:10 AM

I've changed the check form using isSafeToSpeculativelyExecute to simply checking for division instruction - from all the cases I deal with only this one could result in trap.

Why is it OK to hoist if the incoming block unconditionally branches or ends in a switch or indirectbr?

kuhar updated this revision to Diff 32536.Aug 19 2015, 5:24 AM
kuhar edited edge metadata.

Don't perform the optimization if incoming edge's terminator is not a Branch Instruction.

kuhar added a comment.Aug 19 2015, 5:25 AM

I've changed the check form using isSafeToSpeculativelyExecute to simply checking for division instruction - from all the cases I deal with only this one could result in trap.

Why is it OK to hoist if the incoming block unconditionally branches or ends in a switch or indirectbr?

It may be not, good catch. Thanks.

majnemer added inline comments.Aug 19 2015, 10:31 AM
lib/Transforms/InstCombine/InstCombinePHI.cpp
846–855

Why not just check if BB->getUniqueSuccessor() != nullptr?

kuhar added inline comments.Aug 19 2015, 2:05 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
846–855

Do you also suggest removing this check for UDiv/SDiv with conditional branch?

kuhar added inline comments.Aug 24 2015, 2:12 AM
lib/Transforms/InstCombine/InstCombinePHI.cpp
846–855

I can either continue checking if an instruction is a branch and make sure that if it's conditional branch hoisting operation won't potentially cause a trap (sdiv/udiv); or I can only apply the optimization to unconditional branches.

Currently I follow the first option, so I cast the instruction to BranchInst and then check if it's conditional. getUniqueSuccessor won't simplify anything here (I think).

majnemer added inline comments.Aug 25 2015, 7:29 PM
lib/Transforms/InstCombine/InstCombinePHI.cpp
850–855

I think the correct logic would be:

// We can only push the instruction to a predecessor if our predecessor is guaranteed to have a single, unique successor.
if (BB->getUniqueSuccessor() != PN.getParent())
  return nullptr;

It should always be safe to hoist the instruction so long as it is inserted before the terminator.

kuhar updated this revision to Diff 33189.Aug 26 2015, 2:00 AM
kuhar marked an inline comment as done.
kuhar added a comment.Sep 11 2015, 7:02 AM

Ping! David, do you have any further comments?