This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold a mul with bool value into and
ClosedPublic

Authored by Allen on May 19 2022, 8:23 PM.

Diff Detail

Event Timeline

Allen created this revision.May 19 2022, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 8:23 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
Allen requested review of this revision.May 19 2022, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 8:23 PM
Allen updated this revision to Diff 430875.May 19 2022, 9:25 PM
Allen updated this revision to Diff 430876.
Allen updated this revision to Diff 430877.May 19 2022, 9:38 PM

zero already supported before

Allen updated this revision to Diff 430886.May 19 2022, 11:34 PM

update test case

grandinj added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

Surely the more general case is something like:

if (ComputeMaxSignificantBits(Op0) ==1 ||
    ComputeMaxSignificantBits(Op1)==1)

?

xbolva00 added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

Yeah, please more general case.

craig.topper added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

ComputeMaxSignificantBits(Op0) ==1 means the input is 0 or -1.

You would need computeKnownBits or MaskedValueIsZero.

Allen added inline comments.May 20 2022, 1:21 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

Thanks very much , I'll try to use the API ComputeMaxSignificantBits.

and there is some failure cases, for example, it 'll turn the -42 into 42.

; X * C (when X is a sext boolean) --> X ? -C : 0

define i32 @mul_sext_bool(i1 %x) {
; CHECK-LABEL: @mul_sext_bool(   
; CHECK-NEXT:    [[M:%.*]] = select i1 [[X:%.*]], i32 -42, i32 0
; CHECK-NEXT:    ret i32 [[M]]
;            
  %s = sext i1 %x to i32
  %m = mul i32 %s, 42
  ret i32 %m
}
Allen added inline comments.May 20 2022, 3:35 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

hi @craig.topper, base your comment, I tried the following change, and it works.

+    KnownBits XKnown = computeKnownBits(Op0, 0, &I);
+    KnownBits YKnown = computeKnownBits(Op1, 0, &I);
+    if ((XKnown.countMaxPopulation() == 1) &&
+        (YKnown.countMaxPopulation() == 1))
+      return BinaryOperator::CreateAnd(Op0, Op1);

but there is a new failure case mul_bools_use3 in file llvm/test/Transforms/InstCombine/mul.ll. And I can't sure which version is better, should I add extra condition to avoid touch this case ?

  • without above change
%r = select i1 %x, i32 %zy, i32 0
  • with above change
%r1 = and i1 %x, %y
%r = zext i1 %r1 to i32
Allen updated this revision to Diff 430938.May 20 2022, 4:08 AM

Place the new match rule after the following rule can void to touch other cases

// (zext bool X) * Y --> X ? Y : 0
// Y * (zext bool X) --> X ? Y : 0
xbolva00 added inline comments.May 20 2022, 4:44 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

So then we miss this transformation as well.

%r1 = and i1 %x, %y
%r = zext i1 %r1 to i32

>

%r = select i1 %x, i32 %zy, i32 0

but which form is better? target specific? @spatel

llvm/test/Transforms/InstCombine/trunc.ll
1025

I dont think trunc.ll file is suitable for these new tests. Find some other like mul.ll or just create new one.

Allen updated this revision to Diff 430946.May 20 2022, 5:26 AM

split the test into new file mull.ll

Allen added inline comments.May 20 2022, 5:27 AM
llvm/test/Transforms/InstCombine/trunc.ll
1025

thanks, apply your comment.

Allen marked 4 inline comments as done.May 20 2022, 5:41 AM
Allen added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

@xbolva00 @spatel

As now it doesn't have such regression after adjusting the place of pattern match, so I'll take another separate patch to fix it later.  so please remaind me if you think the prior **%r = select i1 %x, i32 %zy, i32 0** is better, thanks.
RKSimon added inline comments.May 20 2022, 5:57 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
392

this doesn't limit to 0 / 1 - it means at most 1 bit is set - so 0 or any-pow-2

KnownBits::getMaxValue().ule(1) might be what you want?

llvm/test/Transforms/InstCombine/mull.ll
2 ↗(On Diff #430946)

Maybe rename this file mul-bool.ll ?

3 ↗(On Diff #430946)

Do you need this?

5 ↗(On Diff #430946)

Maybe rename this file mul-bool.ll ?

7 ↗(On Diff #430946)

better descriptive test names - @test_mul_bit_x0_y0?

And add additional negativel test coverage such as

define i64 @test_mul_bit_x0_y1(i64 %x, i64 %y) {
  %and1 = and i64 %x, 1
  %and2 = and i64 %y, 2
  %mul = mul i64 %and1, %and2
  ret i64 %mul
}

Vector tests as well would be good.

19 ↗(On Diff #430946)

test_mul_bit_x0_yC

33 ↗(On Diff #430946)

Isn't this covered by existing tests?

spatel added inline comments.May 20 2022, 6:20 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

Those forms are not exactly equivalent (without freeze?), so we need more context to decide if something needs to be done.
The immediate problem should be fixed by adding a use constraint:
f0071d43e4d3

Please rebase.

Allen updated this revision to Diff 431112.May 20 2022, 7:00 PM
Allen marked 2 inline comments as done.

Add new vector test case and update with condition XKnown.getMaxValue().ule(1)

Allen added a comment.May 20 2022, 7:00 PM

Please rebase.

Done, thanks

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
312

thanks very much for detail demo, I'll try according it.

llvm/test/Transforms/InstCombine/mull.ll
33 ↗(On Diff #430946)

the combine is already supported before, but I don't find related case, so just add a case to guard that!
I can delete this case if you think this is redundant ?

Allen updated this revision to Diff 431114.May 20 2022, 7:10 PM

Add new case test_mul_bit_x0_yC

xbolva00 added inline comments.May 21 2022, 1:11 AM
llvm/test/Transforms/InstCombine/mul-bool.ll
4 ↗(On Diff #431114)

mull -> mul

LG

llvm/test/Transforms/InstCombine/mul-bool.ll
19 ↗(On Diff #431114)

Negativel test -> Negative test

nikic added a reviewer: nikic.May 21 2022, 1:42 AM
nikic added a subscriber: nikic.

What is the real-world motivation for this change? This seems very niche for a transform that introduces two unconditional computeKnownBits() calls.

Allen updated this revision to Diff 431128.May 21 2022, 2:36 AM

update typo for comment

Allen marked 2 inline comments as done.May 21 2022, 2:41 AM
Allen added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
392

Thanks very much, apply with your comment.

llvm/test/Transforms/InstCombine/mul-bool.ll
4 ↗(On Diff #431114)

done, thanks

19 ↗(On Diff #431114)

done, thanks

llvm/test/Transforms/InstCombine/mull.ll
2 ↗(On Diff #430946)

Done

3 ↗(On Diff #430946)

I delete it.

5 ↗(On Diff #430946)

Done

7 ↗(On Diff #430946)

Both negativel test and Vector test added, thanks

spatel edited the summary of this revision. (Show Details)May 22 2022, 8:43 AM
nikic added a comment.May 22 2022, 8:49 AM

Botan AES-128 benchmark, maybe others..

As far as I can tell, this wouldn't help Botan AES-128, because it needs the variant where only one of the operands is 0/1. That does look more generally useful, but is also somewhat unclear from a canonicalization perspective, because it increases instruction count. (We'd probably want to represent it as trunc(X) ? Y : 0 rather than -X & Y, but it's increasing the count either way.)

Botan AES-128 benchmark, maybe others..

As far as I can tell, this wouldn't help Botan AES-128, because it needs the variant where only one of the operands is 0/1. That does look more generally useful, but is also somewhat unclear from a canonicalization perspective, because it increases instruction count. (We'd probably want to represent it as trunc(X) ? Y : 0 rather than -X & Y, but it's increasing the count either way.)

Yes, note that we do have these related folds already:

// (zext bool X) * Y --> X ? Y : 0
// (lshr X, 31) * Y --> (ashr X, 31) & Y

With these comments on the 2nd fold:

// TODO: We are not checking one-use because the elimination of the multiply
//       is better for analysis?
// TODO: Should we canonicalize to '(X < 0) ? Y : 0' instead? That would be
//       more similar to what we're doing above.

If we really want this fold in IR using known bits, then it could be implemented to reduce cost in 2 ways:

  1. Check for 'nuw nsw' - if we know all of the high bits of both values are clear, then no-wrap should be set (and if they are not set, then there's no chance that computing known bits again is going to work here).
  2. Predicate the 2nd call on the 1st instead of trying both before the 'if'.

But given that we already have the zext/shift patterns covered, maybe we just want to add the specific match for and 1 as this patch was originally written. None of the tests are covering other patterns because we already get all of those?

Allen updated this revision to Diff 431281.May 22 2022, 7:47 PM
Allen marked 2 inline comments as done.

Thanks all. Based on the disccusion above, I roll back to the previous conservative handing method, so that further improvements can be made base on more special actual scenarious.

Allen added a comment.May 24 2022, 3:46 AM

any new comment ?

Given the existing folds near here, I'm fine with the small and direct match. But let's make sure others agree with the direction.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
305

I think it would be better to add the new match to this clause since they are folding to the same new value:

if (type is i1 || (match...))
  return new and
387–390

IIUC, the comment about trunc + select was about the case when only one operand has range {0,1}.
So I'd remove that and possibly just add something like:
// Note: We could use known bits to generalize this and related patterns with shifts/truncs.

391–392

Don't capture X and Y if we are not using them in the new instruction.

llvm/test/Transforms/InstCombine/mul-bool.ll
1–2 ↗(On Diff #431281)

It would be better to pre-commit these tests with baseline results in the same file as one of the related transforms. Add to "mul-masked-bits.ll" ?

Allen added inline comments.May 25 2022, 12:37 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
391–392

Thanks @spatel very much, as the m_And operation need 2 operands. How can I not match the 1st operand?
Do you mean use the Op0->gtOpcode() == Instruction::And && isOneConstant(Op0->getOperand(1)) ?

Allen added inline comments.May 25 2022, 1:00 AM
llvm/test/Transforms/InstCombine/mul-bool.ll
1–2 ↗(On Diff #431281)

hi @spatel, I try to pre-commit these tests with commit https://reviews.llvm.org/D126356, please help to review, thanks!

spatel added inline comments.May 25 2022, 4:57 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
391–392

If you don't need to capture the operand value, then you just leave the argument empty:

match(Op0, m_And(m_Value(), m_One()))
Allen updated this revision to Diff 431955.May 25 2022, 6:04 AM

rebase to the top after precommit the test cases

Allen updated this revision to Diff 431957.May 25 2022, 6:07 AM

leave the capture empty according comment

Allen added inline comments.May 25 2022, 6:09 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
391–392

thanks very much, apply your comment

Allen updated this revision to Diff 432246.May 26 2022, 5:03 AM
Allen marked 4 inline comments as done.
Allen edited the summary of this revision. (Show Details)

address a missing comment

Allen marked an inline comment as done.May 26 2022, 5:06 AM

hi @nikic, would you please help me check this change ? thanks!

Allen added a comment.EditedMay 28 2022, 3:24 PM

hi @spatel , I'm sorry to trouble you again.

Now I already address all your comment before, should we wait another reviewer to also agree with this direction?

I added a couple of minor comments.
Please update, and if we don't hear any objections, then I think we are good. (If not, it can always be reverted.)

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
307

"...can be only {0,1}."

llvm/test/Transforms/InstCombine/mul-masked-bits.ll
170 ↗(On Diff #432246)

Please add a test where both of the operands have an extra use. Something like this:

define i64 @scalar_mul_bit_x0_y0_uses(i64 %x, i64 %y) {
  %and1 = and i64 %x, 1
  call void @use(i64 %and1)
  %and2 = and i64 %y, 1
  call void @use(i64 %and2)
  %mul = mul i64 %and1, %and2
  ret i64 %mul
}
Allen updated this revision to Diff 432830.May 29 2022, 9:28 PM
Allen edited the summary of this revision. (Show Details)

Address comment and add a new test case scalar_mul_bit_x0_y0_uses

nikic accepted this revision.May 30 2022, 4:16 AM

LG

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
305

scenes -> scenarios?

This revision is now accepted and ready to land.May 30 2022, 4:16 AM
spatel accepted this revision.May 30 2022, 4:32 AM
This revision was landed with ongoing or failed builds.May 30 2022, 6:07 AM
This revision was automatically updated to reflect the committed changes.
Allen marked 2 inline comments as done.