This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] ease use constraint in tryFactorization()
ClosedPublic

Authored by spatel on Aug 22 2022, 2:32 PM.

Details

Summary

(x * y) + x --> x * (y + 1)
(x * y) - x --> x * (y - 1)

https://alive2.llvm.org/ce/z/eMhvQa

This is one of the IR transforms suggested in issue #57255.

This should be better in IR because it removes a use of a variable operand (we already fold the case with a constant multiply operand).
The backend should be able to re-distribute the multiply if that's better for the target.

Diff Detail

Event Timeline

spatel created this revision.Aug 22 2022, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 2:32 PM
spatel requested review of this revision.Aug 22 2022, 2:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 2:32 PM
hiraditya added inline comments.Aug 22 2022, 5:04 PM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

Can this cause a regression when Y is a power of two?
Some other optimization would rewrite x*y as y << n (when n == lg(y) at compile time)?

bcl5980 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

https://alive2.llvm.org/ce/z/LRobTp
It looks the answer is yes.

spatel added inline comments.Aug 23 2022, 5:56 AM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

That is correct - this transform could obscure a power-of-2 transform.
But it could also expose a power-of-2 multiply that we miss today:
https://alive2.llvm.org/ce/z/ZUjR5S

In the typical instcombine sequence, we would process the multiply operand before the add user of that operand, so I haven't found a way to show a regression from this patch yet.

We need a more general solution if we are going to get the optimal IR for all patterns.

But the example shows a simpler pattern that I missed initially:
https://alive2.llvm.org/ce/z/eMhvQa

I think we should do that transform for the same reason stated here - it reduces the number of uses of a variable operand.

And it looks like we might not need this patch because -reassociate gives us that simpler pattern.

spatel updated this revision to Diff 454824.Aug 23 2022, 6:52 AM
spatel edited the summary of this revision. (Show Details)

Patch updated:
Handle a simpler pattern - just looking for a mul with a common operand with an add.

bcl5980 added inline comments.Aug 23 2022, 7:36 AM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

This looks like a special case of distributive law: x * y + x * 1
How about move this to InstCombinerImpl::SimplifyUsingDistributiveLaws or InstCombinerImpl::tryFactorization ?

Allen added inline comments.Aug 23 2022, 8:15 AM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

so, the decomposition the const of a MUL should be improved first?

1423 ↗(On Diff #454824)

should we need consider the flag of NUW and NSW ?

spatel added inline comments.Aug 23 2022, 8:45 AM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1422 ↗(On Diff #454611)

We do try to match this pattern in that code, but we miss it because the operand 'x' has multiple uses. I'll try to make it work if that isn't too ugly...but this explicit pattern-match might be the clearer way to achieve the transform.

1422 ↗(On Diff #454611)

Ideally yes, we would have decomposition completed in the backend. But I also think this patch can uncover folds that we miss today (like the example where we created a power-of-2 multiply). So this patch should not be blocked on improving the backend. There could be both wins and losses.

1423 ↗(On Diff #454824)

I only found one case so far where we can partially propagate nuw:
https://alive2.llvm.org/ce/z/aGSyrK
Everything else seems to be illegal. Propagating the flag would be another reason to code this match explicitly rather than trying to make it work through tryFactorization().

bcl5980 added inline comments.Aug 23 2022, 9:03 AM
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1423 ↗(On Diff #454824)

If we don't move it to tryFactorization, do we need to add the similar pattern x*y - x --> (x-1) * y ?

1423 ↗(On Diff #454824)

sorry, should be x*y -x --> x * (y-1)

spatel updated this revision to Diff 454992.Aug 23 2022, 3:47 PM
spatel retitled this revision from [InstCombine] reassociate/factor add+mul with common operand to [InstCombine] ease use constraint in tryFactorization().
spatel edited the summary of this revision. (Show Details)

Patch updated:
Adjust the use checks in tryFactorization() to handle the motivating patterns. This also changes some existing tests with multi-uses that we ignored before. Those look like good changes to me for the same reason cited before - it reduces uses of variables.

Also, I didn't remember that we have code that tries to propagate nsw/nuw through those folds. It is there, but incomplete as noted with the TODO items in the subtract tests.

bcl5980 accepted this revision.Aug 23 2022, 8:42 PM

LGTM. But please wait for someone else.

This revision is now accepted and ready to land.Aug 23 2022, 8:42 PM
Allen added inline comments.Aug 24 2022, 12:24 AM
llvm/test/Transforms/InstCombine/sub.ll
2007

sorry for the naive question, does the nsw of mul implies that of add? As you just say the mul could retain nsw.

bcl5980 added inline comments.Aug 24 2022, 12:41 AM
llvm/test/Transforms/InstCombine/sub.ll
2007

https://alive2.llvm.org/ce/z/WjpNoD
add pattern can't retain nsw. overflow comes from -1 * INT_MIN

Allen added inline comments.Aug 24 2022, 1:41 AM
llvm/test/Transforms/InstCombine/sub.ll
2007

Thanks @bcl5980, as your case showed, the mul pattern also can't retain nsw ?

bcl5980 added inline comments.Aug 24 2022, 2:01 AM
llvm/test/Transforms/InstCombine/sub.ll
2007

Yeah, these two patterns are the only two cases we can keep mul flag. All add/sub flags will be lost.
https://alive2.llvm.org/ce/z/HDyrwb

spatel added inline comments.Aug 24 2022, 5:59 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
688–689

For reference, this is the block where we try propagating overflow flags. It can be adjusted independently of this patch.

Allen accepted this revision.Aug 24 2022, 7:16 AM
This revision was landed with ongoing or failed builds.Aug 24 2022, 9:11 AM
This revision was automatically updated to reflect the committed changes.
arsenm added a subscriber: arsenm.Nov 2 2022, 5:17 PM
arsenm added inline comments.
llvm/test/Transforms/InstCombine/add.ll
1759

Do we have a backend combine to undo this already?

arsenm added inline comments.Nov 2 2022, 5:31 PM
llvm/test/Transforms/InstCombine/add.ll
1759

Specifically, this is bad for us because it breaks integer mad matching

bcl5980 added inline comments.Nov 3 2022, 12:13 AM
llvm/test/Transforms/InstCombine/add.ll
1759

There is a similar pattern in DAGCombiner but the mul second op is also constant:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L4112

// fold (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2)
if (DAG.isConstantIntBuildVectorOrConstantInt(N1) &&
    N0.getOpcode() == ISD::ADD &&
    DAG.isConstantIntBuildVectorOrConstantInt(N0.getOperand(1)) &&
    isMulAddWithConstProfitable(N, N0, N1))
  return DAG.getNode(
      ISD::ADD, DL, VT,
      DAG.getNode(ISD::MUL, SDLoc(N0), VT, N0.getOperand(0), N1),
      DAG.getNode(ISD::MUL, SDLoc(N1), VT, N0.getOperand(1), N1));

And it looks AArch64 backend already has the pattern:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp#L15066

// Canonicalize X*(Y+1) -> X*Y+X and (X+1)*Y -> X*Y+Y,
// and in MachineCombiner pass, add+mul will be combined into madd.
// Similarly, X*(1-Y) -> X - X*Y and (1-Y)*X -> X - Y*X.

AMDGPU can add similar code, maybe you need to consider more for that (divergence? data type?).
https://alive2.llvm.org/ce/z/gJRkR5