This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Reassociate (C * X) * Y in foldICmpMulConstant
Needs ReviewPublic

Authored by junaire on Apr 13 2023, 3:25 AM.

Details

Summary

The existing logic in foldICmpMulConstant can only optimize the
constant if we have (X * Y) * C. However, reassociate pass currently
produces this form with the constant on the inner multiply, so let's
reassociate it in foldICmpMulConstant to enable further optimizations.

Related issue: https://github.com/llvm/llvm-project/issues/61538

Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Event Timeline

junaire created this revision.Apr 13 2023, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 3:25 AM
junaire requested review of this revision.Apr 13 2023, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 3:25 AM
junaire added inline comments.Apr 13 2023, 3:28 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2107

There's one corner case this doesn't handle, consider:
2 * X * Y > 0
2 * X will be folded into a shl. I'll appreciate it if anyone can suggest a more general way to match this.

junaire updated this revision to Diff 513154.Apr 13 2023, 3:30 AM

dummy update to trigger CI

nikic added a comment.Apr 13 2023, 3:43 AM

As I understand it, this is a reassociation problem. Once you reassociate the multiply from (X * C) * Y to (X * Y) * C this will automatically fold via the existing logic in foldICmpMulConstant(). I don't really want this logic to be duplicated.

It looks like reassociate currently produces this form with the constant on the inner multiply, but I'm not sure whether that's something that can be changed or not.

As I understand it, this is a reassociation problem. Once you reassociate the multiply from (X * C) * Y to (X * Y) * C this will automatically fold via the existing logic in foldICmpMulConstant(). I don't really want this logic to be duplicated.

It looks like reassociate currently produces this form with the constant on the inner multiply, but I'm not sure whether that's something that can be changed or not.

Thanks, that makes sense and solve my confusion when I wrote this. Should I try to fix the missing optimization by looking into the reassociate issue or the patch doesn't make sense at all?

nikic added a comment.EditedApr 13 2023, 5:24 AM

As I understand it, this is a reassociation problem. Once you reassociate the multiply from (X * C) * Y to (X * Y) * C this will automatically fold via the existing logic in foldICmpMulConstant(). I don't really want this logic to be duplicated.

It looks like reassociate currently produces this form with the constant on the inner multiply, but I'm not sure whether that's something that can be changed or not.

Thanks, that makes sense and solve my confusion when I wrote this. Should I try to fix the missing optimization by looking into the reassociate issue or the patch doesn't make sense at all?

I took a quick look at reassociate, and I don't think it's possible to change what it does. Reversing the order is not really compatible with the ranking it performs.

In that case, I think we do want to handle this here, but we should try to extract the common code from foldICmpMulConstant(). I'm thinking that the helper function should accept the multiply constant, the icmp constant, the predicate and the OBO flags, and then return the new predicate and constant if possible. Then we'll call that function to check whether the reassociated form can fold and if so, we can actually materialize the new multiply and the new icmp.

junaire planned changes to this revision.Apr 13 2023, 5:40 AM

As I understand it, this is a reassociation problem. Once you reassociate the multiply from (X * C) * Y to (X * Y) * C this will automatically fold via the existing logic in foldICmpMulConstant(). I don't really want this logic to be duplicated.

It looks like reassociate currently produces this form with the constant on the inner multiply, but I'm not sure whether that's something that can be changed or not.

Thanks, that makes sense and solve my confusion when I wrote this. Should I try to fix the missing optimization by looking into the reassociate issue or the patch doesn't make sense at all?

I took a quick look at reassociate, and I don't think it's possible to change what it does. Reversing the order is not really compatible with the ranking it performs.

In that case, I think we do want to handle this here, but we should try to extract the common code from foldICmpMulConstant(). I'm thinking that the helper function should accept the multiply constant, the icmp constant, the predicate and the OBO flags, and then return the new predicate and constant if possible. Then we'll call that function to check whether the reassociated form can fold and if so, we can actually materialize the new multiply and the new icmp.

Thanks, I'll try what you proposed.

junaire updated this revision to Diff 513961.Apr 15 2023, 9:41 PM

Make a helper function.

As I understand it, this is a reassociation problem. Once you reassociate the multiply from (X * C) * Y to (X * Y) * C this will automatically fold via the existing logic in foldICmpMulConstant(). I don't really want this logic to be duplicated.

It looks like reassociate currently produces this form with the constant on the inner multiply, but I'm not sure whether that's something that can be changed or not.

Thanks, that makes sense and solve my confusion when I wrote this. Should I try to fix the missing optimization by looking into the reassociate issue or the patch doesn't make sense at all?

I took a quick look at reassociate, and I don't think it's possible to change what it does. Reversing the order is not really compatible with the ranking it performs.

In that case, I think we do want to handle this here, but we should try to extract the common code from foldICmpMulConstant(). I'm thinking that the helper function should accept the multiply constant, the icmp constant, the predicate and the OBO flags, and then return the new predicate and constant if possible. Then we'll call that function to check whether the reassociated form can fold and if so, we can actually materialize the new multiply and the new icmp.

I tried to unify icmp((MulC * X) * Y, C) and icmp(X * MulC, C) but it looks like adds more complexity. Because we have to extract the MulC bit by bit in case 1 in order to also get the nsw flags. And for case 2 MulC is needed in the below folds (MulC can't be zero or X * MulC == C --> X == C/MulC doesn't work anymore). What's more C < 0 is OK for case 2 but it doesn't apply to case 1. That said, though these cases have almost the same effect, they don't actually share too much common code...

junaire updated this revision to Diff 513963.Apr 15 2023, 10:31 PM

Maybe we can just make helper function for icmp((MulC * X) * Y, C) as a special case

junaire updated this revision to Diff 513966.Apr 15 2023, 11:45 PM

early bail out

nikic added a comment.Apr 16 2023, 1:13 AM

I tried to unify icmp((MulC * X) * Y, C) and icmp(X * MulC, C) but it looks like adds more complexity. Because we have to extract the MulC bit by bit in case 1 in order to also get the nsw flags. And for case 2 MulC is needed in the below folds (MulC can't be zero or X * MulC == C --> X == C/MulC doesn't work anymore). What's more C < 0 is OK for case 2 but it doesn't apply to case 1. That said, though these cases have almost the same effect, they don't actually share too much common code...

I'm not sure what you mean here. Why does MulC need to be extracted bit by bit? Don't we just need to implement these two rules on whether nuw/nsw can be preserved during reassociation? https://alive2.llvm.org/ce/z/KKtP62 https://alive2.llvm.org/ce/z/jgrYGB

goldstein.w.n added inline comments.Apr 17 2023, 5:48 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2049

Swap order of this conditions.

2057

match(Mul, m_c_Mul(m_Mul(m_Value(X), m_APInt(C)), m_Value(Y))

You don't need m_c_Mul when you are searching for constant being of canonicalization.

Also to avoid excessive scope depth, can you invert if the if i.e:
if(!match(...)) return nullptr

Likewise below when checking nsw/MulC s> 0

2060

the "mul" tag is probably not needed. Also I think generally you shouldn't be creating instructions unless you know its going to be used. You can still return nullptr here.

2061

ConstantInt:: -> Constant::

bcl5980 added inline comments.Apr 17 2023, 10:52 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2056

It looks we need to limit both * to one-use.

bcl5980 added inline comments.Apr 17 2023, 10:55 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2056

At least outer mul need one-use.

junaire updated this revision to Diff 514956.Apr 19 2023, 7:54 AM

Try what @nikic proposed.

junaire marked an inline comment as done.Apr 19 2023, 7:59 AM

I tried to unify icmp((MulC * X) * Y, C) and icmp(X * MulC, C) but it looks like adds more complexity. Because we have to extract the MulC bit by bit in case 1 in order to also get the nsw flags. And for case 2 MulC is needed in the below folds (MulC can't be zero or X * MulC == C --> X == C/MulC doesn't work anymore). What's more C < 0 is OK for case 2 but it doesn't apply to case 1. That said, though these cases have almost the same effect, they don't actually share too much common code...

I'm not sure what you mean here. Why does MulC need to be extracted bit by bit? Don't we just need to implement these two rules on whether nuw/nsw can be preserved during reassociation? https://alive2.llvm.org/ce/z/KKtP62 https://alive2.llvm.org/ce/z/jgrYGB

Thanks, I think I get your ideas. Note I didn't add folds for nuw since I found the existing logic works great for it (https://alive2.llvm.org/ce/z/r_oHlt)
Note we should also reassociate for shl since X * Power_Of_N will be fold to X << N, I left it as a TODO. But I'm OK with that if you want to do it in this patch :)

junaire edited the summary of this revision. (Show Details)Apr 19 2023, 8:04 AM
junaire retitled this revision from [InstCombine] icmp(MulC * X * Y, C) --> icmp(X * Y, C) to [InstCombine] Reassociate (C * X) * Y in foldICmpMulConstant.
junaire updated this revision to Diff 514958.Apr 19 2023, 8:07 AM

Fix comments

junaire updated this revision to Diff 514994.Apr 19 2023, 9:38 AM

handle shl cuz why not

junaire updated this revision to Diff 515004.Apr 19 2023, 9:57 AM

clang-format

goldstein.w.n added inline comments.Apr 19 2023, 2:34 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2071

1 << C, not C << C?

llvm/test/Transforms/InstCombine/icmp.ll
5345

Add some tests for the shl case. Better yet split adding the shl logic to a follow up patch.

bcl5980 added inline comments.Apr 19 2023, 5:09 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

I'm not sure what are you going to do now. If you just want to reassociate C to outer mul, there is no extra condition. https://alive2.llvm.org/ce/z/v-f92Q
The code should be simple:

BinaryOperator *InnerMul;
Value *Y;
Constant *InnerC;
if (match(Mul, m_OneUse(m_c_Mul(m_BinOp(InnerMul), m_Value(Y)))) &&
    match(InnerMul, m_Mul(m_Value(X), m_Constant(InnerC)))) {
  Value *NewInner = Builder.CreateMul(X, Y);
  Value *NewMul = Builder.CreateMul(NewInner, InnerC);
  return new ICmpInst(Pred, NewMul, Mul->getOperand(1));
}

If you want to keep nsw/nuw flags, you can set the flag after create NewMul like:

if (!InnerC->isZeroValue() && Mul->hasNoUnsignedWrap() &&
    InnerMul->hasNoUnsignedWrap()) {
  cast<BinaryOperator>(NewInner)->setHasNoUnsignedWrap();
  cast<BinaryOperator>(NewMul)->setHasNoUnsignedWrap();
} else {
  Constant *Zero = Constant::getNullValue(InnerC->getType());
  Constant *InnerCSleZero =
      ConstantExpr::getCompare(ICmpInst::ICMP_SLE, InnerC, Zero);
  if (InnerCSleZero->isZeroValue() && Mul->hasNoSignedWrap() &&
      InnerMul->hasNoSignedWrap()) {
    cast<BinaryOperator>(NewInner)->setHasNoSignedWrap();
    cast<BinaryOperator>(NewMul)->setHasNoSignedWrap();
  }
}
2052

Do we really need llvm:: here? There is already using namespace llvm in this file at line 30.

junaire marked 8 inline comments as done.Apr 20 2023, 12:48 AM
junaire added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

IIUC we want to reassociate (X * C) * Y to (X * Y) * C if they both has nsw/nuw flags so they can be folded by the existing logic in foldICmpMulConstant.

I'm not sure what are you going to do now. If you just want to reassociate C to outer mul, there is no extra condition. https://alive2.llvm.org/ce/z/v-f92Q
The code should be simple:

BinaryOperator *InnerMul;
Value *Y;
Constant *InnerC;
if (match(Mul, m_OneUse(m_c_Mul(m_BinOp(InnerMul), m_Value(Y)))) &&
    match(InnerMul, m_Mul(m_Value(X), m_Constant(InnerC)))) {
  Value *NewInner = Builder.CreateMul(X, Y);
  Value *NewMul = Builder.CreateMul(NewInner, InnerC);
  return new ICmpInst(Pred, NewMul, Mul->getOperand(1));
}

The issue is that if the inner mul's constant operand is a power of 2, it will be folded to an shl. like:

%mul = shl nsw i8 %a, i8 1
%mul2 = mul nsw i8 %b, %mul

Then the pattern match you proposed doesn't work anymore.

If you want to keep nsw/nuw flags, you can set the flag after create NewMul like:

if (!InnerC->isZeroValue() && Mul->hasNoUnsignedWrap() &&
    InnerMul->hasNoUnsignedWrap()) {
  cast<BinaryOperator>(NewInner)->setHasNoUnsignedWrap();
  cast<BinaryOperator>(NewMul)->setHasNoUnsignedWrap();
} else {
  Constant *Zero = Constant::getNullValue(InnerC->getType());
  Constant *InnerCSleZero =
      ConstantExpr::getCompare(ICmpInst::ICMP_SLE, InnerC, Zero);
  if (InnerCSleZero->isZeroValue() && Mul->hasNoSignedWrap() &&
      InnerMul->hasNoSignedWrap()) {
    cast<BinaryOperator>(NewInner)->setHasNoSignedWrap();
    cast<BinaryOperator>(NewMul)->setHasNoSignedWrap();
  }
}
2052

Do we really need llvm:: here? There is already using namespace llvm in this file at line 30.

Thanks, fixed.

2071

1 << C, not C << C?

good catch, fixed. Thx.

llvm/test/Transforms/InstCombine/icmp.ll
5345

I thought I already added tests for mul(shl(X, C), Y)? I don't have a strong opinion about whether we should delay shl support, I think that's easy to do?

junaire updated this revision to Diff 515238.Apr 20 2023, 12:49 AM
junaire marked 3 inline comments as done.

Address comments.

bcl5980 added inline comments.Apr 20 2023, 1:18 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

SHL is not the main thing I discuss here, we can support shl similar to mul. What I mean is do we really need to limit the transform to nsw/nuw. Always hoist C to outer mul should be a more general solution I think.

2102

should be

return new ICmpInst(Pred, NewMul, Mul->getOperand(1));
junaire added inline comments.Apr 20 2023, 1:43 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

Yeah, we can always hoist C to outer mul but does that have any benefit? We hoist it here cuz we know we can eliminate it later in foldICmpMulConstant. Can we do the same thing for muls that doesn't have nsw/nuw? I took a quick look at foldICmpMulConstant and believe the answer it no. Correct me if I'm wrong.

2102

Why return it? It doesn't make sense to me. I thought what we want to do is reassociate (X * C) * Y to (X * Y) * C so the later code can recognize the pattern and fold it. Thus we reuse the existing logic.

bcl5980 added inline comments.Apr 20 2023, 5:18 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

That is not for foldICmpMulConstant only. Constant may match more patterns generally. For example, In foldICmpBinOp:
icmp eq/ne (X * C), (Y * C) --> icmp (X & Mask), (Y & Mask)
@nikic What do you think about this?

2102

After return you still can reuse existing. I'm not sure if currect code will break something or not but I don't think we have similar usage in instcombine.

junaire updated this revision to Diff 515564.Apr 20 2023, 8:12 PM

Always reassociate despite we have OBO flags or not.

junaire marked an inline comment as done.Apr 20 2023, 8:14 PM
junaire added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2043

OK! I updated the patch and added a test when we don't have nsw.

2102

Hi @bcl5980, thanks for your suggestion, I appreciate it. However, after I tried to replace replaceInstUsesWith(*Mul, NewMul); to return new ICmpInst(Pred, NewMul, Mul->getOperand(1));, the folds in my tests stop working. I took another deep look and I think your suggestion should be right, let me try to investigate why... (any more comments are welcome

junaire marked 3 inline comments as done.Apr 20 2023, 8:14 PM
junaire updated this revision to Diff 515582.Apr 20 2023, 9:11 PM

replace replaceInstUsesWith(*Mul, NewMul); with return new ICmpInst(Pred, NewMul, ConstantInt::get(Mul->getType(), C));

junaire marked 2 inline comments as done.Apr 20 2023, 9:13 PM
junaire added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2102

So that's new ICmpInst(Pred, NewMul, ConstantInt::get(Mul->getType(), C));, not return new ICmpInst(Pred, NewMul, Mul->getOperand(1));. Fixed, thanks!

goldstein.w.n added inline comments.Apr 25 2023, 4:34 PM
llvm/test/Transforms/InstCombine/icmp.ll
4970

I don't see any tests with nuw or for the comination of the flags.

junaire updated this revision to Diff 517850.Apr 28 2023, 2:47 AM

Add tests for nuw

junaire marked an inline comment as done.Apr 28 2023, 2:48 AM
junaire added inline comments.
llvm/test/Transforms/InstCombine/icmp.ll
4970

Thanks, I added more tests. PTAL :)

junaire marked an inline comment as done.May 1 2023, 12:40 AM

The review process seems to be stuck. What's your take on the patch? @nikic

gentle ping :)

junaire updated this revision to Diff 522116.May 15 2023, 4:04 AM

Rebase + ping :)

junaire updated this revision to Diff 522119.May 15 2023, 4:13 AM

Ooops, updated a wrong diff, sorry

nikic added a comment.May 16 2023, 8:51 AM

The reassociation code itself looks basically fine. The problem is how it's used. There is two ways we can do this:

  1. Say that InstCombine is always going to reassociate the constant to the outside, in opposition to -reassociate. (Is there precedent for this?)
  2. Factor out the logic for "will this fold" from the logic for "do the fold" in foldICmpMulConstant(). Then only "virtually" reassociate and only perform the transform if it's actually going to simplify.

The current implementation is in between those two: It only reassociates if there's a comparison, but there's no actual guarantee that the resulting reassociation will actually fold (unless I'm missing something).

Option 2 is what I was suggesting before. I'm open to considering option 1, but I don't full understand the implications. I think one problem could be that this breaks up a multiply that would otherwise be loop invariant.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
2047

IRBuilderBase &

2050

It looks like this variable declaration can be moved into the if block below.

2053

I'd probably accept CI as APInt here as well, and move creation of the ConstantInt into this helper.

2073

Unnecessary newline.

2078

Should check here that the shift is valid, otherwise we'll assert. (It will usually get simplified, but is not guaranteed.)

junaire updated this revision to Diff 524758.May 23 2023, 9:12 AM

Been busy working on other topics so just address some nit issues to indicate the patch isn't dead yet

junaire marked 4 inline comments as done.May 23 2023, 9:14 AM
junaire marked an inline comment as done.