Page MenuHomePhabricator

[InstCombine] try to fold binop with phi operands
Needs ReviewPublic

Authored by spatel on Wed, Jan 12, 6:11 AM.

Details

Summary

This is an alternate version of D115914 that handles/tests all binary opcodes. I think it is close to the same logic, but I was having a hard time making sense of the diffs/comments in that patch, so I just wrote the patch from scratch.

I suspect that we don't see these patterns too often because -simplifycfg would convert the minimal cases into selects rather than leave them in phi form (note: instcombine has logic holes for combining the select patterns too though, so that's another potential patch).

We only create a new binop in a predecessor that unconditionally branches to the final block.
https://alive2.llvm.org/ce/z/C57M2F
https://alive2.llvm.org/ce/z/WHwAoU (not safe to speculate an sdiv for example)
https://alive2.llvm.org/ce/z/rdVUvW (but it is ok on this path)

Diff Detail

Event Timeline

spatel created this revision.Wed, Jan 12, 6:11 AM
spatel requested review of this revision.Wed, Jan 12, 6:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Jan 12, 6:11 AM
This comment was removed by lebedev.ri.

What about the case where one PHI is fully constant: https://godbolt.org/z/16W8oGYv3 ?

What about the case where one PHI is fully constant: https://godbolt.org/z/16W8oGYv3 ?

Hmm...so we convert that into a binop with constant operand, and I thought we'd then get that pattern within InstCombinerImpl::foldOpIntoPhi(). But there's an extra clause in that code that is likely also needed here:

// If there is exactly one non-constant value, we can insert a copy of the
// operation in that block.  However, if this is a critical edge, we would be
// inserting the computation on some other paths (e.g. inside a loop).  Only
// do this if the pred block is unconditionally branching into the phi block.
// Also, make sure that the pred block is not dead code.
spatel updated this revision to Diff 399348.Wed, Jan 12, 9:00 AM
spatel edited the summary of this revision. (Show Details)

Patch updated:

  1. Added the necessary clause that prevents hoisting into a conditional predecessor (and more tests).
  2. Added a TODO for handling phi with 2 constant incoming values (not sure yet if that's reliably handled by simplifycfg and/or other instcombines).

is it possible that we're hoisting a non-speculatable operation past something like a call in the same block?

llvm/test/Transforms/InstCombine/zext-or-icmp.ll
42

these tests need to be changed to preserve testing of https://reviews.llvm.org/D30781

is it possible that we're hoisting a non-speculatable operation past something like a call in the same block?

Indeed. If the binop is not freely speculatable, then we need to check that it is guaranteed to be executed: https://alive2.llvm.org/ce/z/VqvC_r

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1299–1301

This is too restrictive (please add a fixme?)
https://alive2.llvm.org/ce/z/kKuiW9 vs https://alive2.llvm.org/ce/z/SJ9oYU

luna added inline comments.Wed, Jan 12, 5:38 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1329–1334

I'm wondering if it's idiomatic to validate that this instruction could also be simplified away in later iterations of instruction-combine, and only proceed with the fold if yes.

e.g., in https://godbolt.org/z/fKfondqrn, %res = or i64 %retval.sroa.3.0.extract.shift, %phi.cast instruction is inserted into if.then block, and could be simplified away.

lebedev.ri requested changes to this revision.Thu, Jan 13, 11:38 AM

(as per previous comments regarding UB introduction)

This revision now requires changes to proceed.Thu, Jan 13, 11:38 AM
spatel added inline comments.Thu, Jan 13, 12:35 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1329–1334

In general, instcombine tries to do the smallest transform that still results in an improvement. It can then chain together a series of minimal transforms to achieve the larger goal.

So if we can make this patch safe and still profitable while creating a new binop, then that's better than limiting a larger transform to the case where we eliminate the binop completely.

llvm/test/Transforms/InstCombine/zext-or-icmp.ll
42

I think these tests were already altered enough (there's no shift remaining in the existing CHECK lines) that they don't test the original problem (undefined shift). I'll try to come up with another way to cover that path.

spatel updated this revision to Diff 400014.Fri, Jan 14, 8:14 AM
spatel marked 3 inline comments as done.

Patch updated:

  1. Added check for isGuaranteedToTransferExecutionToSuccessor() to make sure the binop is executed.
  2. Added TODO comment/test for binop in different block than phi.
lebedev.ri added inline comments.Fri, Jan 14, 8:36 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1293–1297

the fixme

// TODO: This could miss matching a phi with 2 constant incoming values.

should be here, because uniform constant phi will be simplified into a plain constant

1328–1333

Don't we also need isSafeToSpeculativelyExecute() check?

spatel added inline comments.Fri, Jan 14, 9:02 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1328–1333

That's what the TODO comment and "mul_const_incoming0_speculatable" test are referring to (if I'm understanding the question).
Ie, for correctness, I think we only need to bail out for opcodes like div/rem. But for performance (and as a conservative first step), we probably don't want to hoist anything on this path if it isn't going to be executed unconditionally. Do we know that some other pass will sink the op later if possible?

llvm/test/Transforms/InstCombine/zext-or-icmp.ll
42

Extra test added here:
417f5f4633e

42

lost a char:
f417f5f4633e

spatel updated this revision to Diff 400042.Fri, Jan 14, 9:13 AM
spatel marked an inline comment as done.

Moved TODO comment higher to account for more potential patterns that could be handled here.

lebedev.ri added inline comments.Fri, Jan 14, 11:25 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
1297

Technically, this is wrong.

  1. for switch predecessor, we could have more than a single incoming edges
  2. This check is backwards. It's not really that it must have exactly two predecessors, but more that there must be at most a single predecessor with non-constant input value, otherwise we'd increase instruction count. And for constant incoming values we'll constant-fold.