This is an archive of the discontinued LLVM Phabricator instance.

Differential D136015

[InstCombine] Fold series of instructions into mull
ClosedPublic

Authored by Allen on Oct 15 2022, 5:01 AM.


Details

Reviewers

spatel
efriedma
RKSimon
nikic
bcl5980

Commits

rG81713e893a33: [InstCombine] Fold series of instructions into mull

Summary

The following sequence should be folded into in0 * in1:

In0Lo = in0 & 0xffffffff; In0Hi = in0 >> 32;
In1Lo = in1 & 0xffffffff; In1Hi = in1 >> 32;
m01 = In1Hi * In0Lo; m10 = In1Lo * In0Hi; m00 = In1Lo * In0Lo;
addc = m01 + m10;
ResLo = m00 + (addc << 32);
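As a sanity check on the arithmetic, the sequence can be written in plain C++ and compared against a direct multiply. This is only a sketch: `mulLowFromHalves` and `agreesWithMul` are illustrative names, not part of the patch, and the cross sum is shifted left by 32, which is what the patch's matcher (`m_Shl`) and the `shl` in the test expect.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of in0 * in1, built from 32-bit halves as in the summary.
// Unsigned arithmetic wraps modulo 2^64, like LLVM's i64 add/mul/shl.
uint64_t mulLowFromHalves(uint64_t in0, uint64_t in1) {
  const uint64_t mask = 0xffffffffULL;
  uint64_t In0Lo = in0 & mask, In0Hi = in0 >> 32;
  uint64_t In1Lo = in1 & mask, In1Hi = in1 >> 32;
  uint64_t m01 = In1Hi * In0Lo;
  uint64_t m10 = In1Lo * In0Hi;
  uint64_t m00 = In1Lo * In0Lo;
  uint64_t addc = m01 + m10;
  return m00 + (addc << 32);
}

// True when the half-width sequence equals the plain wrapping multiply.
bool agreesWithMul(uint64_t a, uint64_t b) {
  return mulLowFromHalves(a, b) == a * b;
}
```

The `In1Hi * In0Hi` term never appears because it would be shifted left by 64 bits and therefore vanishes modulo 2^64.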

Diff Detail

Event Timeline

Allen created this revision. · Oct 15 2022, 5:01 AM

Herald added a project: Restricted Project. · Oct 15 2022, 5:01 AM

Herald added a subscriber: hiraditya.

Allen requested review of this revision. · Oct 15 2022, 5:01 AM

Herald added a subscriber: llvm-commits. · Oct 15 2022, 5:01 AM

Harbormaster completed remote builds in B192342: Diff 468010. · Oct 15 2022, 5:46 AM

What is the motivation for this change? It feels a little strange to do this in InstCombine.
And if we really need to do this, we need more negative tests.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856	This pattern can work for any types with even bit width I think, not only i64.
864	Need one-use here for addc.

Thanks for your attention. I did this because the case at https://godbolt.org/z/x5jMhqW8s comes from our benchmark,
and the source is equivalent to a full multiply (mull) of two 64-bit integer values, so it should fold to similar assembly.
This is a first step toward generating the mul, so for now I only enable it for i64, matching the umulh instruction.

mul   x8,x0,x1
umulh x9,x0,x1
str   x8,[x2]
str   x9,[x3]
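The benchmark source behind the Compiler Explorer link is not reproduced in this thread. As an assumed stand-in, the sketch below shows what the `mul`/`umulh` pair computes: the low and high 64 bits of a 64x64-bit unsigned product, written with the schoolbook method on 32-bit halves. `Product128` and `mulFull` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Product128 {
  uint64_t lo; // what AArch64 `mul` returns
  uint64_t hi; // what AArch64 `umulh` returns
};

// Full 64x64 -> 128-bit unsigned multiply using only 64-bit arithmetic.
Product128 mulFull(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xffffffffULL;
  uint64_t xLo = x & mask, xHi = x >> 32;
  uint64_t yLo = y & mask, yHi = y >> 32;
  uint64_t m00 = xLo * yLo;
  uint64_t m01 = xLo * yHi;
  uint64_t m10 = xHi * yLo;
  uint64_t m11 = xHi * yHi;
  // Sum of the pieces that land in bits 32..63; its upper half is the
  // carry into the high word. The sum is below 3 * 2^32, so no overflow.
  uint64_t mid = (m00 >> 32) + (m01 & mask) + (m10 & mask);
  Product128 p;
  p.lo = x * y; // wraps modulo 2^64
  p.hi = m11 + (m01 >> 32) + (m10 >> 32) + (mid >> 32);
  return p;
}
```

The low word is the part this revision targets; the high word is the subject of the related D56214 mentioned later in the thread.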

In D136015#3860475, @Allen wrote:
Thanks for your attention. I did this because the case at https://godbolt.org/z/x5jMhqW8s comes from our benchmark,
and the source is equivalent to a full multiply (mull) of two 64-bit integer values, so it should fold to similar assembly.
This is a first step toward generating the mul, so for now I only enable it for i64, matching the umulh instruction.
mul   x8,x0,x1
umulh x9,x0,x1
str   x8,[x2]
str   x9,[x3]

Maybe you can do it in AArch64 SDAG if you are only interested in AArch64.
The pattern being detected is rather long for InstCombine, so I am a little worried about the change.
But I'm not senior enough to review the patch, so I will resign as reviewer.

Add condition m_OneUse(Addc)

Allen marked an inline comment as done. · Oct 17 2022, 8:02 AM

Allen added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856	The source at https://godbolt.org/z/x5jMhqW8s is equivalent to a full multiply (mull) of two 64-bit integer values, so it should fold to similar assembly. This is a first step toward generating the mul, so for now I only enable it for i64, matching the umulh instruction.

Harbormaster completed remote builds in B192499: Diff 468205. · Oct 17 2022, 8:51 AM

We're not creating a new multiply that is wider than we started with, so I'm assuming codegen can't be worse.
As mentioned earlier, the code should be generalized to handle any even bitwidth; we don't want highly type-specific transforms in IR canonicalization.
https://alive2.llvm.org/ce/z/2BqKLt

The commutative pattern matching doesn't look correct at first glance, so we need tests that exercise all of those possible patterns. The instructions with constants will always have the constant as operand 1, so you don't need to worry about those. But the 3 muls and 2 adds can all be commuted, so that's 16 potential patterns?

Since we are only creating a single new instruction, there's no need to check for m_OneUse on any of the existing values (but we should include at least one test with extra uses to show that works as expected).

Delete the m_OneUse and I.getType()->getIntegerBitWidth() == 64 conditions, and add relevant test cases

Allen marked an inline comment as done. · Oct 19 2022, 4:51 AM

Allen added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856	Deleted the I.getType()->getIntegerBitWidth() == 64 check, thanks.

Harbormaster completed remote builds in B192966: Diff 468860. · Oct 19 2022, 5:04 AM

spatel added inline comments. · Oct 19 2022, 7:40 AM

llvm/lib/Transforms/InstCombine/InstCombineInternal.h
550	There's no need to make a class function for this transform. Just create a static function above InstCombinerImpl::visitAdd(). Use the raw BinaryOperator::CreateMul() to return an Instruction, so we don't need to pass the Builder or use replaceInstUsesWith().
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856	The type check is insufficient in at least 2 ways and over-restrictive in another. So we need at least 3 more tests like this:

define i9 @mul9_low(i9 %in0, i9 %in1) {
  %In0Lo = and i9 %in0, 15
  %In0Hi = lshr i9 %in0, 4
  %In1Lo = and i9 %in1, 15
  %In1Hi = lshr i9 %in1, 4
  %m10 = mul i9 %In1Hi, %In0Lo
  %m01 = mul i9 %In1Lo, %In0Hi
  %m00 = mul i9 %In1Lo, %In0Lo
  %addc = add i9 %m10, %m01
  %shl = shl i9 %addc, 4
  %addc9 = add i9 %shl, %m00
  ret i9 %addc9
}

define <2 x i8> @mul_v2i8_low(<2 x i8> %in0, <2 x i8> %in1) {
  %In0Lo = and <2 x i8> %in0, <i8 15, i8 15>
  %In0Hi = lshr <2 x i8> %in0, <i8 4, i8 4>
  %In1Lo = and <2 x i8> %in1, <i8 15, i8 15>
  %In1Hi = lshr <2 x i8> %in1, <i8 4, i8 4>
  %m10 = mul <2 x i8> %In1Hi, %In0Lo
  %m01 = mul <2 x i8> %In1Lo, %In0Hi
  %m00 = mul <2 x i8> %In1Lo, %In0Lo
  %addc = add <2 x i8> %m10, %m01
  %shl = shl <2 x i8> %addc, <i8 4, i8 4>
  %addc9 = add <2 x i8> %shl, %m00
  ret <2 x i8> %addc9
}

define i128 @mul128_low(i128 %in0, i128 %in1) {
  %In0Lo = and i128 %in0, 18446744073709551615
  %In0Hi = lshr i128 %in0, 64
  %In1Lo = and i128 %in1, 18446744073709551615
  %In1Hi = lshr i128 %in1, 64
  %m10 = mul i128 %In1Hi, %In0Lo
  %m01 = mul i128 %In1Lo, %In0Hi
  %m00 = mul i128 %In1Lo, %In0Lo
  %addc = add i128 %m10, %m01
  %shl = shl i128 %addc, 64
  %addc9 = add i128 %shl, %m00
  ret i128 %addc9
}
866	The structure of these matches is confusing. I'd prefer to organize it more like this:

  // R = (CrossSum << HalfBits) + (XLo * YLo)
  Value *XLo, *YLo;
  Value *CrossSum;
  if (!match(&I, m_c_Add(m_Shl(m_Value(CrossSum), m_SpecificInt(HalfBits)),
                         m_Mul(m_Value(XLo), m_Value(YLo)))))
    return nullptr;

  // XLo = X & HalfMask
  // YLo = Y & HalfMask
  Value *X, *Y;
  if (!match(XLo, m_And(m_Value(X), m_SpecificInt(HalfMask))) ||
      !match(YLo, m_And(m_Value(Y), m_SpecificInt(HalfMask))))
    return nullptr;

  // CrossSum = (X' * (Y >> HalfBits)) + (Y' * (X >> HalfBits))
  ...

IIUC, X' can be either X or XLo in the pattern (and the same for Y'). You can probably use `m_CombineOr(m_Specific(), m_Specific())` to match that with minimal code.
llvm/test/Transforms/InstCombine/mul.ll
1578	The tests are incomplete for commutative patterns. As I said earlier, I think we need at least 16 tests to verify that the matching is working as expected. Once we have the right tests in place, please pre-commit the baseline tests (CHECK lines without the code change), so we will only show diffs in this patch.
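Two properties raised in these inline comments can be checked exhaustively on small widths: the fold is only valid when the shift amount is exactly half the bit width, and the cross products may use either the full operand (X) or its masked low half (XLo) without changing the result. The following is a plain C++ sketch with no LLVM types; `boxPattern` and `countMismatches` are hypothetical helpers, and how the i9 test is ultimately expected to behave in the patch is not something this sketch decides.

```cpp
#include <cassert>
#include <cstdint>

// Evaluate ((X' * YHi + Y' * XHi) << half) + XLo * YLo in a width-bit
// integer (small widths only), where X' is X or XLo depending on useFullX,
// and likewise for Y'. The final mask models wrapping modulo 2^width.
uint32_t boxPattern(uint32_t x, uint32_t y, unsigned width, unsigned half,
                    bool useFullX, bool useFullY) {
  const uint32_t typeMask = (1u << width) - 1;
  const uint32_t halfMask = (1u << half) - 1;
  uint32_t xLo = x & halfMask, xHi = x >> half;
  uint32_t yLo = y & halfMask, yHi = y >> half;
  uint32_t xPrime = useFullX ? x : xLo;
  uint32_t yPrime = useFullY ? y : yLo;
  uint32_t crossSum = xPrime * yHi + yPrime * xHi;
  return ((crossSum << half) + xLo * yLo) & typeMask;
}

// Number of (x, y) pairs for which the pattern differs from x * y.
unsigned countMismatches(unsigned width, unsigned half, bool useFullX,
                         bool useFullY) {
  const uint32_t typeMask = (1u << width) - 1;
  unsigned mismatches = 0;
  for (uint32_t x = 0; x <= typeMask; ++x)
    for (uint32_t y = 0; y <= typeMask; ++y)
      if (boxPattern(x, y, width, half, useFullX, useFullY) !=
          ((x * y) & typeMask))
        ++mismatches;
  return mismatches;
}
```

For i8 with a shift of 4, every variant agrees with the multiply. For i9 with a shift of 4, the `XHi * YHi` term lands at bit 8, which still fits in the type, so inputs such as x = y = 16 give 0 from the pattern but 256 from the multiply.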

1. Use BinaryOperator::CreateMul() to avoid the use of replaceInstUsesWith()
2. Add 3 more cases according to the comment
3. Use m_CombineOr to match that with minimal code
4. Create a static function above InstCombinerImpl::visitAdd()

Harbormaster completed remote builds in B193154: Diff 469121. · Oct 20 2022, 12:20 AM

Allen mentioned this in D136340: [tests] precommit tests for D136015. · Oct 20 2022, 5:21 AM

Update after precommitting the test cases

spatel added inline comments. · Oct 20 2022, 6:31 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1270	Add a better description for the full transform. Something like:
  /// Reduce a sequence of masked half-width multiplies to a single multiply.
  /// (((XLow * YHigh) + (YLow * XHigh)) << HalfBits) + (XLow * YLow) --> X * Y
1271	Function names should start with lower-case letter. "Simplify" has a distinct meaning in LLVM combining - it suggests that we are not creating a new instruction. Even though it is misused in other places including in this file, we shouldn't do that again. I suggest naming this "foldLongMultiply" or "foldBoxMultiply" ( https://www.ixl.com/math/grade-4/box-multiplication ) or something like that, so it's more obvious that we are reducing a sequence of mul and add to something else.
1275	I don't see a reason to exclude vectors from this transform. Just change this line? unsigned BitWidth = I.getType()->getScalarSizeInBits();
1277	Similarly, why exclude wide widths? We're already using APInt::getMaxValue(), so just use that APInt in the m_SpecificInt() calls?
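The last two suggestions amount to deriving the shift amount and the mask from the scalar bit width instead of hard-coding 32 and 0xffffffff. A minimal sketch of that derivation in plain C++ follows; `halfBitsFor` and `halfMaskFor` are hypothetical helpers, and the 64-bit cap is only a limitation of using `uint64_t` here, whereas `APInt` in the patch has no such limit.

```cpp
#include <cassert>
#include <cstdint>

// Half of the bit width, or 0 when this sketch cannot handle the width
// (zero, odd, or wider than 64 bits).
unsigned halfBitsFor(unsigned bitWidth) {
  if (bitWidth == 0 || bitWidth > 64 || bitWidth % 2 != 0)
    return 0;
  return bitWidth / 2;
}

// Mask with the low halfBits bits set. Expects halfBits in [1, 32], so the
// shift below is always well defined.
uint64_t halfMaskFor(unsigned halfBits) {
  return (uint64_t{1} << halfBits) - 1;
}
```

For i64 this reproduces the constants in the original patch (32 and 4294967295), and for i8 it gives the 4 and 15 used in the reviewers' test cases.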

Harbormaster completed remote builds in B193205: Diff 469189. · Oct 20 2022, 6:39 AM

any chance we could get vector support/tests please?

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	What about if the AND has been removed by SimplifyDemandedBits? Maybe also test for KnownBits known leading zeros?

Allen added inline comments. · Oct 20 2022, 7:36 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1277	I excluded the wide and vector types because it is unusual to get such IR from C/C++ code. Could they be enabled later when needed, or in a separate patch? We already need many test cases to cover the pattern.
llvm/lib/Transforms/InstCombine/InstCombineInternal.h
550	Done, thanks for the detailed suggestions.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856	Thanks for the detailed examples.
866	Applied your comment, thanks.
llvm/test/Transforms/InstCombine/mul.ll
1578	Addressed in D136340

a) Rename the function to foldBoxMultiply and update its description
b) Use APInt in m_SpecificInt directly
c) Replace getIntegerBitWidth with getScalarSizeInBits

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	Thanks for your suggestion. I'll record this issue and try out your suggestion later in a separate patch?

Allen marked 2 inline comments as done. · Oct 20 2022, 8:06 AM

RKSimon added inline comments. · Oct 20 2022, 8:39 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	A TODO comment is fine for now - cheers

Harbormaster completed remote builds in B193243: Diff 469236. · Oct 20 2022, 8:55 AM

spatel added inline comments. · Oct 20 2022, 10:36 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1277	There's really no difference in the testing - just change one test to i130 or something like that? And the code difference is just to remove that clause in the `if` on line 1278 - nothing else changes? But if you think there's some risk from handling that, then please add a TODO comment, so we can relax the constraint in a follow-up patch.

CC @chfast who was looking at something similar in D56214

In https://reviews.llvm.org/D56214 similar pattern match was applied in AggressiveInstCombine.

Do you want me to submit test cases from there?

Rebase onto the updated precommit tests

Allen marked 5 inline comments as done. · Oct 21 2022, 6:26 PM

Harbormaster completed remote builds in B193700: Diff 469842. · Oct 21 2022, 7:09 PM

In D136015#3875187, @chfast wrote:

In https://reviews.llvm.org/D56214 similar pattern match was applied in AggressiveInstCombine.

Do you want me to submit test cases from there?

Yes please @chfast, if you think we can just use this patch then maybe just move them (and tweak for -instcombine).

chfast mentioned this in rG119c34e7f9c6: [InstCombine][test] Add tests for mul combinations. · Oct 22 2022, 7:26 AM

In D136015#3876878, @RKSimon wrote:

In D136015#3875187, @chfast wrote:

In https://reviews.llvm.org/D56214 similar pattern match was applied in AggressiveInstCombine.

Do you want me to submit test cases from there?

Yes please @chfast, if you think we can just use this patch then maybe just move them (and tweak for -instcombine).

Added in https://reviews.llvm.org/rG119c34e7f9c66dbdb77f69d67bb50507c91dc2ef.

@Allen please can you rebase?

Allen mentioned this in rG770d5e89ba89: [tests] precommit tests for D136015. · Oct 23 2022, 6:41 AM

Rebase on top of the precommitted tests

In D136015#3877593, @RKSimon wrote:

@Allen please can you rebase?

Done, thanks @RKSimon/@chfast for your precommit tests.

Harbormaster completed remote builds in B193852: Diff 470033. · Oct 23 2022, 7:51 PM

spatel mentioned this in rG41c42f5b1825: [InstCombine] adjust mul tests to avoid reliance on other folds; NFC. · Oct 24 2022, 6:20 AM

spatel mentioned this in rG56c6b612aed1: [InstCombine] vary commuted patterns for mul fold; NFC.

Please rebase again after 41c42f5b1825 / 56c6b612aed1.
If I did that correctly, we won't see any changes for the final value in each test from this revision, but we'll test this patch directly and get a better coverage for commuted patterns.
After that, I think this patch will be complete.

rebase after 41c42f5b1825 / 56c6b612aed1

chfast added inline comments. · Oct 24 2022, 7:15 AM

llvm/test/Transforms/InstCombine/mul_full_64.ll
452 ↗	(On Diff #470137)	Interestingly, it hasn't folded this one.

In D136015#3879133, @spatel wrote:

Please rebase again after 41c42f5b1825 / 56c6b612aed1.
If I did that correctly, we won't see any changes for the final value in each test from this revision, but we'll test this patch directly and get a better coverage for commuted patterns.
After that, I think this patch will be complete.

Done, thanks very much for your changes. One thing I don't completely understand: why is the use call needed at the beginning of a function? For example:

define i8 @mul8_low_A0_B2(i8 %in0, i8 %p) {
  %in1 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization
  %In0Lo = and i8 %in0, 15
  %In0Hi = lshr i8 %in0, 4
  %In1Lo = and i8 %in1, 15
  %In1Hi = lshr i8 %in1, 4
  %m10 = mul i8 %In1Hi, %in0
  %m01 = mul i8 %in1, %In0Hi
  %m00 = mul i8 %In1Lo, %In0Lo
  %addc = add i8 %m01, %m10
  %shl = shl i8 %addc, 4
  %retLo = add i8 %shl, %m00
  ret i8 %retLo
}

Harbormaster completed remote builds in B193924: Diff 470137. · Oct 24 2022, 8:26 AM

LGTM

In D136015#3879280, @Allen wrote:
Done, thanks very much for your changes. One thing I don't completely understand: why is the use call needed at the beginning of a function? For example:
define i8 @mul8_low_A0_B2(i8 %in0, i8 %p) {
  %in1 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization

If you remove that line, notice that the values in the later multiply get commuted. That happens before we reach this transform, so the test is trying to ensure that the exact placement of the values at runtime is the same as specified in the test.

llvm/test/Transforms/InstCombine/mul_full_64.ll
452 ↗	(On Diff #470137)	This patch assumes we are ending with an "add", but this test changes to an "or". We'd need to add another check for hasNoCommonBitsSet() to catch it? Here's another potential fold: https://alive2.llvm.org/ce/z/hUm56R ...but it needs to freeze the inputs to be poison-safe because they have multiple uses.

This revision is now accepted and ready to land. · Oct 24 2022, 9:16 AM

If you remove that line, notice that the values in the later multiply get commuted. That happens before we reach this transform, so the test is trying to ensure that the exact placement of the values at runtime is the same as specified in the test.

Thanks very much for your guidance.

Closed by commit rG81713e893a33: [InstCombine] Fold series of instructions into mull (authored by Allen). · Oct 24 2022, 10:10 AM

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG81713e893a33: [InstCombine] Fold series of instructions into mull.

Allen added a subscriber: tgt. · Oct 24 2022, 6:56 PM

Allen added inline comments.

llvm/test/Transforms/InstCombine/mul_full_64.ll

452 ↗	(On Diff #470137)	Hi @chfast, I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7

Hi @spatel, for the case at https://alive2.llvm.org/ce/z/hUm56R, the result is not equal to mul i8 %y, %x, so it needs some other logic to match, maybe in a new helper function. See the details at https://alive2.llvm.org/ce/z/FEgEU7

```
define i8 @tgt(i8 %x, i8 %y) {
  %m = mul i8 %y, %x
  ret i8 %m
}
```

Allen mentioned this in D136661: [InstCombine] Fold series of instructions into mull for more types. · Oct 24 2022, 10:06 PM

chfast added inline comments. · Oct 24 2022, 11:56 PM

llvm/test/Transforms/InstCombine/mul_full_64.ll
452 ↗	(On Diff #470137)	"I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7" There is a typo in the example. You changed `or` to `and`, but the original pattern starts at `add`. That is, all patterns starting at `add`, `or`, and `xor` should work; the one starting at `and` should not. https://alive2.llvm.org/ce/z/y26zaW I'm not sure it is worth expanding the matching to `or` and `xor`.
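The reason `add`, `or`, and `xor` are interchangeable here can be illustrated with a small exhaustive check. It assumes, as the earlier hasNoCommonBitsSet() remark suggests, that the two operands of the final instruction in that test have no set bits in common; the sketch does not verify that for the actual @mullo test. `addOrXorAgreeWhenDisjoint` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// For every pair of 8-bit values with no common set bits, check that
// add, or, and xor all produce the same result.
bool addOrXorAgreeWhenDisjoint() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      if ((a & b) != 0)
        continue; // only disjoint operands are of interest
      uint8_t sum = static_cast<uint8_t>(a + b);
      if (sum != (a | b) || sum != (a ^ b))
        return false;
    }
  }
  return true;
}
```

With disjoint operands no carries occur, so the three operations coincide, while `and` of the same operands is always zero, which is why a pattern starting at `and` cannot be equivalent.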

Allen mentioned this in rG620cff096aba: [InstCombine] Fold series of instructions into mull for more types. · Oct 25 2022, 8:05 AM

RKSimon mentioned this in D56214: AggressiveInstCombine: Fold full mul i64 x i64 -> i128. · Oct 26 2022, 3:20 AM

Allen mentioned this in rGf58311796c49: [InstCombine] refactor the SimplifyUsingDistributiveLaws NFC. · Oct 30 2022, 6:06 AM

Allen added inline comments. · Oct 31 2022, 7:32 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	Hi @RKSimon, now that this revision is accepted, it is time to consider your refactoring suggestion. Do you have any ideas about the extra tests? Thanks.
llvm/test/Transforms/InstCombine/mul_full_64.ll
452 ↗	(On Diff #470137)	Thanks @chfast for your case. I took a closer look at it: apart from the add vs. or difference above, there are some other differences from my initial case. https://alive2.llvm.org/ce/z/ZKmrJB

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	3 lines
llvm/lib/Transforms/InstCombine/InstCombineInternal.h	3 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp	29 lines
llvm/test/Transforms/InstCombine/mul.ll	18 lines

Diff 468205

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

[1,261 lines hidden]
    if (auto *NewI = dyn_cast<BinaryOperator>(NewMath)) {
      NewI->setHasNoUnsignedWrap(HasNUW);
    }
    auto *NewShl = BinaryOperator::CreateShl(NewMath, ShAmt);
    NewShl->setHasNoSignedWrap(HasNSW);
    NewShl->setHasNoUnsignedWrap(HasNUW);
    return NewShl;
  }

  Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
    if (Value *V = simplifyAddInst(I.getOperand(0), I.getOperand(1),
                                   I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),
                                   SQ.getWithInstruction(&I)))
      return replaceInstUsesWith(I, V);

    if (SimplifyAssociativeOrCommutative(I))
      return &I;

    if (Instruction *X = foldVectorBinop(I))
      return X;

    if (Instruction *Phi = foldBinopWithPhiOperands(I))
      return Phi;

    // (A*B)+(A*C) -> A*(B+C) etc
    if (Value *V = SimplifyUsingDistributiveLaws(I))
      return replaceInstUsesWith(I, V);

+   if (Value *V = SimplifyMull(I))
+     return replaceInstUsesWith(I, V);
+
    if (Instruction *R = factorizeMathWithShlOps(I, Builder))
      return R;

    if (Instruction *X = foldAddWithConstant(I))
      return X;

    if (Instruction *X = foldNoWrapAdd(I, Builder))
      return X;

    Value *LHS = I.getOperand(0), *RHS = I.getOperand(1);
    Type *Ty = I.getType();
    if (Ty->isIntOrIntVectorTy(1))
      return BinaryOperator::CreateXor(LHS, RHS);
[1,240 lines hidden]

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

[540 lines hidden]
  public:
    /// operation distributes over.
    ///
    /// It does this by either by factorizing out common terms (eg "(A*B)+(A*C)"
    /// -> "A*(B+C)") or expanding out if this results in simplifications (eg: "A
    /// & (B | C) -> (A&B) | (A&C)" if this is a win). Returns the simplified
    /// value, or null if it didn't simplify.
    Value *SimplifyUsingDistributiveLaws(BinaryOperator &I);

+   /// Tries to simplify a few sequence operations into MULL
+   Value *SimplifyMull(BinaryOperator &I);
+
    /// Tries to simplify add operations using the definition of remainder.
    ///
    /// The definition of remainder is X % C = X - (X / C ) * C. The add
    /// expression X % C0 + (( X / C0 ) % C1) * C0 can be simplified to
    /// X % (C0 * C1)
    Value *SimplifyAddWithRemainder(BinaryOperator &I);

    // Binary Op helper for select operations where the expression can be
[264 lines hidden]

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

[846 lines hidden]
      if (R && R == ConstantExpr::getBinOpIdentity(InnerOpcode, R->getType())) {
        A->takeName(&I);
        return A;
      }
    }

    return SimplifySelectsFeedingBinaryOp(I, LHS, RHS);
  }

+ Value *InstCombinerImpl::SimplifyMull(BinaryOperator &I) {
+   if (!(I.getType()->isIntegerTy() && I.getType()->getIntegerBitWidth() == 64))
+     return nullptr;
+
+   Value *In0, *In1;
+   Value *M01, *M10, *M00, *Addc;
+
+   // addc = m01 + m10;
+   // ResLo = m00 + (addc << 32);
+   bool IsMulLow = match(&I, m_c_Add(m_Value(M00),
+                                     m_Shl(m_Value(Addc), m_SpecificInt(32)))) &&
+                   match(Addc, m_OneUse(m_c_Add(m_Value(M01), m_Value(M10))));
+
+   // In0Lo = in0 & 0xffffffff; In0Hi = in0 >> 32;
+   // In1Lo = in1 & 0xffffffff; In1Hi = in1 >> 32;
+   // m01 = In1Hi * In0Lo; m10 = In1Lo * In0Hi; m00 = In1Lo * In0Lo;
+   if (IsMulLow &&
+       match(M00, m_c_Mul(m_c_And(m_Value(In1), m_SpecificInt(4294967295)),
+                          m_c_And(m_Value(In0), m_SpecificInt(4294967295)))) &&
+       match(M01, m_c_Mul(m_LShr(m_Specific(In1), m_SpecificInt(32)),
+                          m_Specific(In0))) &&
+       match(M10, m_c_Mul(m_LShr(m_Specific(In0), m_SpecificInt(32)),
+                          m_Specific(In1)))) {
+     return Builder.CreateMul(In0, In1);
+   }
+
+   return nullptr;
+ }
+
  Value *InstCombinerImpl::SimplifySelectsFeedingBinaryOp(BinaryOperator &I,
                                                          Value *LHS,
                                                          Value *RHS) {
    Value *A, *B, *C, *D, *E, *F;
    bool LHSIsSelect = match(LHS, m_Select(m_Value(A), m_Value(B), m_Value(C)));
    bool RHSIsSelect = match(RHS, m_Select(m_Value(D), m_Value(E), m_Value(F)));
    if (!LHSIsSelect && !RHSIsSelect)
      return nullptr;
[3,830 lines hidden]

llvm/test/Transforms/InstCombine/mul.ll

[1,568 lines hidden]
  ; CHECK-NEXT: [[R:%.*]] = mul i32 [[ZX]], -16777216
  ; CHECK-NEXT: ret i32 [[R]]
  ;
    %zx = zext i8 %x to i32
    call void @use32(i32 %zx)
    %r = mul i32 %zx, -16777216 ; -1 << 24
    ret i32 %r
  }

+ define i64 @mul64_low(i64 noundef %in0, i64 noundef %in1) {
+ ; CHECK-LABEL: @mul64_low(
+ ; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[IN0:%.*]], [[IN1:%.*]]
+ ; CHECK-NEXT: ret i64 [[TMP1]]
+ ;
+   %In0Lo = and i64 %in0, 4294967295
+   %In0Hi = lshr i64 %in0, 32
+   %In1Lo = and i64 %in1, 4294967295
+   %In1Hi = lshr i64 %in1, 32
+   %m10 = mul i64 %In1Hi, %In0Lo
+   %m01 = mul i64 %In1Lo, %In0Hi
+   %m00 = mul i64 %In1Lo, %In0Lo
+   %addc = add i64 %m10, %m01
+   %shl = shl i64 %addc, 32
+   %addc9 = add i64 %shl, %m00
+   ret i64 %addc9
+ }
