This is an archive of the discontinued LLVM Phabricator instance.


Differential D136015

[InstCombine] Fold series of instructions into mull
Closed, Public

Authored by Allen on Oct 15 2022, 5:01 AM.


Details

Reviewers

spatel
efriedma
RKSimon
nikic
bcl5980

Commits

rG81713e893a33: [InstCombine] Fold series of instructions into mull

Summary

The following sequence should be folded into in0 * in1:

In0Lo = in0 & 0xffffffff; In0Hi = in0 >> 32;
In1Lo = in1 & 0xffffffff; In1Hi = in1 >> 32;
m01 = In1Hi * In0Lo; m10 = In1Lo * In0Hi; m00 = In1Lo * In0Lo;
addc = m01 + m10;
ResLo = m00 + (addc << 32);
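As a quick sanity check of the summary, the sequence can be written out in plain C++ and compared with an ordinary 64-bit multiply. This is only a sketch: the function names are hypothetical, and the cross sum is shifted left by 32 bits, following the comment in the patch itself (`ResLo = (CrossSum << HalfBits) + (YLo * XLo)`).

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of in0 * in1, assembled from 32-bit halves as in the summary.
// Unsigned arithmetic wraps modulo 2^64, which is exactly what the fold needs.
uint64_t boxMulLow64(uint64_t in0, uint64_t in1) {
  uint64_t In0Lo = in0 & 0xffffffffULL, In0Hi = in0 >> 32;
  uint64_t In1Lo = in1 & 0xffffffffULL, In1Hi = in1 >> 32;
  uint64_t m01 = In1Hi * In0Lo;
  uint64_t m10 = In1Lo * In0Hi;
  uint64_t m00 = In1Lo * In0Lo;
  uint64_t addc = m01 + m10;
  return m00 + (addc << 32);
}

// Hypothetical helper: does the sequence agree with a plain multiply?
bool matchesPlainMul(uint64_t in0, uint64_t in1) {
  return boxMulLow64(in0, in1) == in0 * in1;
}
```

The Hi * Hi product never appears because it would be scaled by 2^64 and therefore vanishes in 64-bit arithmetic.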

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Allen created this revision. Oct 15 2022, 5:01 AM

Herald added a project: Restricted Project. Oct 15 2022, 5:01 AM

Herald added a subscriber: hiraditya.

Allen requested review of this revision. Oct 15 2022, 5:01 AM

Herald added a subscriber: llvm-commits. Oct 15 2022, 5:01 AM

Harbormaster completed remote builds in B192342: Diff 468010. Oct 15 2022, 5:46 AM

What is the motivation for this change? It feels a little strange to do this in InstCombine.
And if we really need to do this, we need more negative tests.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856 ↗	(On Diff #468010)	This pattern can work for any types with even bit width I think, not only i64.
864 ↗	(On Diff #468010)	Need one-use here for addc.

Thanks for your attention. I did this because the case at https://godbolt.org/z/x5jMhqW8s comes from our benchmark,
and the source is equivalent to a mull operation on two 64-bit integer values, so it should be folded to similar assembly.
This is the first step toward generating the mul, so for now I only enable it for i64, to pair with the umulh instruction.

mul   x8,x0,x1
umulh x9,x0,x1
str   x8,[x2]
str   x9,[x3]
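For readers less familiar with the pair: on AArch64, `mul` produces the low 64 bits of the product and `umulh` the high 64 bits. The godbolt source is not reproduced in this thread, so the following is only an assumed shape of such code, a portable full 64 x 64 multiply written with 32-bit halves. The names `Wide128` and `fullMul64` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: both halves of a 128-bit product.
struct Wide128 {
  uint64_t Lo; // what `mul` would return
  uint64_t Hi; // what `umulh` would return
};

// Schoolbook multiply on 32-bit halves; no intermediate overflows 64 bits.
Wide128 fullMul64(uint64_t A, uint64_t B) {
  const uint64_t Mask = 0xffffffffULL;
  uint64_t LoLo = (A & Mask) * (B & Mask);
  uint64_t HiLo = (A >> 32) * (B & Mask);
  uint64_t LoHi = (A & Mask) * (B >> 32);
  uint64_t HiHi = (A >> 32) * (B >> 32);
  // Middle column plus the carry out of the lowest column.
  uint64_t Cross = (LoLo >> 32) + (HiLo & Mask) + LoHi;
  Wide128 R;
  R.Lo = (Cross << 32) | (LoLo & Mask);
  R.Hi = (HiLo >> 32) + (Cross >> 32) + HiHi;
  return R;
}
```

The fold in this revision only targets the low half; recognizing the high half is a separate pattern.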

In D136015#3860475, @Allen wrote:
Thanks for your attention. I did this because the case at https://godbolt.org/z/x5jMhqW8s comes from our benchmark,
and the source is equivalent to a mull operation on two 64-bit integer values, so it should be folded to similar assembly.
This is the first step toward generating the mul, so for now I only enable it for i64, to pair with the umulh instruction.
mul   x8,x0,x1
umulh x9,x0,x1
str   x8,[x2]
str   x9,[x3]

Maybe you can do it in AArch64 SDAG if you are only interested in AArch64.
I think the pattern to detect is too long for InstCombine, so I am a little worried about the change.
But I'm not senior enough to review the patch, so I will resign as reviewer.

Add condition m_OneUse(Addc)

Allen marked an inline comment as done. Oct 17 2022, 8:02 AM

Allen added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856 ↗	(On Diff #468010)	The source at https://godbolt.org/z/x5jMhqW8s is equivalent to a mull operation on two 64-bit integer values, so it should be folded to similar assembly. This is the first step toward generating the mul, so for now I only enable it for i64, to pair with the umulh instruction.

Harbormaster completed remote builds in B192499: Diff 468205. Oct 17 2022, 8:51 AM

We're not creating a new multiply that is wider than we started with, so I'm assuming codegen can't be worse.
As mentioned earlier, the code should be generalized to handle any even bitwidth; we don't want highly type-specific transforms in IR canonicalization.
https://alive2.llvm.org/ce/z/2BqKLt

The commutative pattern matching doesn't look correct at first glance, so we need tests that exercise all of those possible patterns. The instructions with constants will always have the constant as operand 1, so you don't need to worry about those. But the 3 muls and 2 adds can all be commuted, so that's 16 potential patterns?

Since we are only creating a single new instruction, there's no need to check for m_OneUse on any of the existing values (but we should include at least one test with extra uses to show that works as expected).

Delete the m_OneUse condition and the I.getType()->getIntegerBitWidth() == 64 check, and add relevant test cases

Allen marked an inline comment as done. Oct 19 2022, 4:51 AM

Allen added inline comments.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856 ↗	(On Diff #468010)	Deleted the check I.getType()->getIntegerBitWidth() == 64, thanks.

Harbormaster completed remote builds in B192966: Diff 468860. Oct 19 2022, 5:04 AM

spatel added inline comments. Oct 19 2022, 7:40 AM

llvm/lib/Transforms/InstCombine/InstCombineInternal.h
550 ↗	(On Diff #468860)	There's no need to make a class function for this transform. Just create a static function above InstCombinerImpl::visitAdd(). Use the raw BinaryOperator::CreateMul() to return an Instruction, so we don't need to pass the Builder or use replaceInstUsesWith().
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856 ↗	(On Diff #468860)	The type check is insufficient in at least 2 ways and over-restrictive in another. So we need at least 3 more tests like this:

define i9 @mul9_low(i9 %in0, i9 %in1) {
  %In0Lo = and i9 %in0, 15
  %In0Hi = lshr i9 %in0, 4
  %In1Lo = and i9 %in1, 15
  %In1Hi = lshr i9 %in1, 4
  %m10 = mul i9 %In1Hi, %In0Lo
  %m01 = mul i9 %In1Lo, %In0Hi
  %m00 = mul i9 %In1Lo, %In0Lo
  %addc = add i9 %m10, %m01
  %shl = shl i9 %addc, 4
  %addc9 = add i9 %shl, %m00
  ret i9 %addc9
}

define <2 x i8> @mul_v2i8_low(<2 x i8> %in0, <2 x i8> %in1) {
  %In0Lo = and <2 x i8> %in0, <i8 15, i8 15>
  %In0Hi = lshr <2 x i8> %in0, <i8 4, i8 4>
  %In1Lo = and <2 x i8> %in1, <i8 15, i8 15>
  %In1Hi = lshr <2 x i8> %in1, <i8 4, i8 4>
  %m10 = mul <2 x i8> %In1Hi, %In0Lo
  %m01 = mul <2 x i8> %In1Lo, %In0Hi
  %m00 = mul <2 x i8> %In1Lo, %In0Lo
  %addc = add <2 x i8> %m10, %m01
  %shl = shl <2 x i8> %addc, <i8 4, i8 4>
  %addc9 = add <2 x i8> %shl, %m00
  ret <2 x i8> %addc9
}

define i128 @mul128_low(i128 %in0, i128 %in1) {
  %In0Lo = and i128 %in0, 18446744073709551615
  %In0Hi = lshr i128 %in0, 64
  %In1Lo = and i128 %in1, 18446744073709551615
  %In1Hi = lshr i128 %in1, 64
  %m10 = mul i128 %In1Hi, %In0Lo
  %m01 = mul i128 %In1Lo, %In0Hi
  %m00 = mul i128 %In1Lo, %In0Lo
  %addc = add i128 %m10, %m01
  %shl = shl i128 %addc, 64
  %addc9 = add i128 %shl, %m00
  ret i128 %addc9
}
866 ↗	(On Diff #468860)	The structure of these matches is confusing. I'd prefer to organize it more like this:

  // R = (CrossSum << HalfBits) + (XLo * YLo)
  Value *XLo, *YLo;
  Value *CrossSum;
  if (!match(&I, m_c_Add(m_Shl(m_Value(CrossSum), m_SpecificInt(HalfBits)),
                         m_Mul(m_Value(XLo), m_Value(YLo)))))
    return nullptr;

  // XLo = X & HalfMask
  // YLo = Y & HalfMask
  Value *X, *Y;
  if (!match(XLo, m_And(m_Value(X), m_SpecificInt(HalfMask))) ||
      !match(YLo, m_And(m_Value(Y), m_SpecificInt(HalfMask))))
    return nullptr;

  // CrossSum = (X' * (Y >> HalfBits)) + (Y' * (X >> HalfBits))
  ...

IIUC, X' can be either X or XLo in the pattern (and the same for Y'). You can probably use `m_CombineOr(m_Specific(), m_Specific())` to match that with minimal code.
llvm/test/Transforms/InstCombine/mul.ll
1578 ↗	(On Diff #468860)	The tests are incomplete for commutative patterns. As I said earlier, I think we need at least 16 tests to verify that the matching is working as expected. Once we have the right tests in place, please pre-commit the baseline tests (CHECK lines without the code change), so we will only show diffs in this patch.
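One way to see why the requested @mul9_low test matters is to evaluate the sequence at several bit widths and compare it with a plain multiply. The sketch below is a brute-force check, not part of the patch; `boxMulLow` and `foldIsSoundAt` are hypothetical names, and the shift and mask are derived as Bits / 2, so i9 uses 4 as in the test.

```cpp
#include <cassert>
#include <cstdint>

// The masked half-width sequence evaluated modulo 2^Bits.
// Intended for 2 <= Bits <= 32 so every intermediate fits in uint64_t.
uint64_t boxMulLow(uint64_t X, uint64_t Y, unsigned Bits) {
  const uint64_t Full = (1ULL << Bits) - 1;
  const unsigned Half = Bits / 2;
  const uint64_t HalfMask = (1ULL << Half) - 1;
  X &= Full;
  Y &= Full;
  const uint64_t XLo = X & HalfMask, XHi = X >> Half;
  const uint64_t YLo = Y & HalfMask, YHi = Y >> Half;
  const uint64_t CrossSum = YHi * XLo + YLo * XHi;
  return ((CrossSum << Half) + YLo * XLo) & Full;
}

// Exhaustively compares the sequence with X * Y at a small width.
bool foldIsSoundAt(unsigned Bits) {
  const uint64_t Full = (1ULL << Bits) - 1;
  for (uint64_t X = 0; X <= Full; ++X)
    for (uint64_t Y = 0; Y <= Full; ++Y)
      if (boxMulLow(X, Y, Bits) != ((X * Y) & Full))
        return false;
  return true;
}
```

At an even width the dropped XHi * YHi term is scaled by 2^Bits and disappears. At i9 it is scaled by 2^8, which still fits in nine bits: for X = Y = 16 the sequence yields 0 while the real product is 256.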

1. Use BinaryOperator::CreateMul() to avoid the use of replaceInstUsesWith()
2. Add 3 more cases according to the comment
3. Use m_CombineOr to match that with minimal code
4. Create a static function above InstCombinerImpl::visitAdd()
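The m_CombineOr item rests on the observation that each cross product may use either the full operand or its masked low half. A small exhaustive check at i8 illustrates why both forms are equivalent; this is a sketch with hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// i8 version of the sequence. UseFullX selects X' = X (true) or X' = XLo
// (false) in the cross product, and UseFullY does the same for Y'.
uint8_t boxMulLow8(uint8_t X, uint8_t Y, bool UseFullX, bool UseFullY) {
  uint8_t XLo = X & 15, XHi = X >> 4;
  uint8_t YLo = Y & 15, YHi = Y >> 4;
  uint8_t XP = UseFullX ? X : XLo;
  uint8_t YP = UseFullY ? Y : YLo;
  uint8_t CrossSum = static_cast<uint8_t>(XP * YHi + YP * XHi);
  return static_cast<uint8_t>((CrossSum << 4) + XLo * YLo);
}

// Checks all four X'/Y' combinations against a plain i8 multiply.
bool allVariantsMatchMul() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      for (int V = 0; V < 4; ++V)
        if (boxMulLow8(X, Y, V & 1, V & 2) != static_cast<uint8_t>(X * Y))
          return false;
  return true;
}
```

Using X instead of XLo only adds XHi * YHi * 16 to the cross sum, and the following shift by 4 pushes that contribution out of the 8-bit result.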

Harbormaster completed remote builds in B193154: Diff 469121. Oct 20 2022, 12:20 AM

Allen mentioned this in D136340: [tests] precommit tests for D136015. Oct 20 2022, 5:21 AM

Update after precommitting the test cases

spatel added inline comments. Oct 20 2022, 6:31 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1270	Add a better description for the full transform. Something like:
/// Reduce a sequence of masked half-width multiplies to a single multiply.
/// ((XLow * YHigh) + (YLow * XHigh)) << HalfBits) + (XLow * YLow) --> X * Y
1271	Function names should start with lower-case letter. "Simplify" has a distinct meaning in LLVM combining - it suggests that we are not creating a new instruction. Even though it is misused in other places including in this file, we shouldn't do that again. I suggest naming this "foldLongMultiply" or "foldBoxMultiply" ( https://www.ixl.com/math/grade-4/box-multiplication ) or something like that, so it's more obvious that we are reducing a sequence of mul and add to something else.
1275	I don't see a reason to exclude vectors from this transform. Just change this line? unsigned BitWidth = I.getType()->getScalarSizeInBits();
1277	Similarly, why exclude wide widths? We're already using APInt::getMaxValue(), so just use that APInt in the m_SpecificInt() calls?

Harbormaster completed remote builds in B193205: Diff 469189. Oct 20 2022, 6:39 AM

Any chance we could get vector support/tests please?

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	What about if the AND has been removed by SimplifyDemandedBits? Maybe also test for KnownBits known leading zeros?
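To illustrate the concern: if the upper half of an operand is already known to be zero (for example after a zero-extension), the `and` with the half mask is a no-op that earlier simplification may delete, and a matcher that insists on the `and` then misses the pattern. The sketch below uses a toy stand-in for LLVM's KnownBits (which offers countMinLeadingZeros()); the struct and function names are hypothetical and fixed at i8.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for KnownBits at i8: a set bit in Zero means that bit of the
// value is known to be 0.
struct ToyKnownBits8 {
  uint8_t Zero;
};

// Number of high bits known to be zero, counted from the most significant.
unsigned minLeadingZeros(ToyKnownBits8 K) {
  unsigned N = 0;
  for (int Bit = 7; Bit >= 0 && ((K.Zero >> Bit) & 1); --Bit)
    ++N;
  return N;
}

// True when "V & 0x0F" cannot change any value V consistent with K, i.e.
// when V could serve directly as the low half in the fold.
bool lowHalfMaskIsRedundant(ToyKnownBits8 K) {
  return minLeadingZeros(K) >= 4;
}
```

A matcher extended along these lines would accept either an explicit `and` or a value whose known leading zeros cover the upper half.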

Allen added inline comments. Oct 20 2022, 7:36 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1277	I excluded the wide and vector types because it is unusual to get such IR from C/C++ code. Could they be enabled later when needed, or in a separate patch? We already need many cases to cover the pattern.
llvm/lib/Transforms/InstCombine/InstCombineInternal.h
550 ↗	(On Diff #468860)	Done, thanks for the detailed suggestions.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
856 ↗	(On Diff #468860)	Thanks for the detailed examples.
866 ↗	(On Diff #468860)	Applied your comment, thanks.
llvm/test/Transforms/InstCombine/mul.ll
1578 ↗	(On Diff #468860)	Addressed in D136340.

a) Rename the function to foldBoxMultiply and update its description
b) Use APInt in m_SpecificInt directly
c) Replace getIntegerBitWidth with getScalarSizeInBits

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	Thanks for your suggestion. I'll record this issue and try out your suggestions later in a separate patch?

Allen marked 2 inline comments as done. Oct 20 2022, 8:06 AM

RKSimon added inline comments. Oct 20 2022, 8:39 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	A TODO comment is fine for now - cheers

Harbormaster completed remote builds in B193243: Diff 469236. Oct 20 2022, 8:55 AM

spatel added inline comments. Oct 20 2022, 10:36 AM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1277	There's really no difference in the testing - just change one test to i130 or something like that? And the code difference is just to remove that clause in the `if` on line 1278 - nothing else changes? But if you think there's some risk from handling that, then please add a TODO comment, so we can relax the constraint in a follow-up patch.

CC @chfast who was looking at something similar in D56214

In https://reviews.llvm.org/D56214 similar pattern match was applied in AggressiveInstCombine.

Do you want me to submit test cases from there?

Rebase after the precommit tests were updated

Allen marked 5 inline comments as done. Oct 21 2022, 6:26 PM

Harbormaster completed remote builds in B193700: Diff 469842. Oct 21 2022, 7:09 PM

In D136015#3875187, @chfast wrote:

In https://reviews.llvm.org/D56214 similar pattern match was applied in AggressiveInstCombine.

Do you want me to submit test cases from there?

Yes please @chfast, if you think we can just use this patch then maybe just move them (and tweak for -instcombine).

chfast mentioned this in rG119c34e7f9c6: [InstCombine][test] Add tests for mul combinations. Oct 22 2022, 7:26 AM

Added in https://reviews.llvm.org/rG119c34e7f9c66dbdb77f69d67bb50507c91dc2ef.

@Allen please can you rebase?

Allen mentioned this in rG770d5e89ba89: [tests] precommit tests for D136015. Oct 23 2022, 6:41 AM

Rebase on top of the precommit tests

In D136015#3877593, @RKSimon wrote:

@Allen please can you rebase?

Done, thanks @RKSimon/@chfast for your precommit tests.

Harbormaster completed remote builds in B193852: Diff 470033. Oct 23 2022, 7:51 PM

spatel mentioned this in rG41c42f5b1825: [InstCombine] adjust mul tests to avoid reliance on other folds; NFC. Oct 24 2022, 6:20 AM

spatel mentioned this in rG56c6b612aed1: [InstCombine] vary commuted patterns for mul fold; NFC.

Please rebase again after 41c42f5b1825 / 56c6b612aed1.
If I did that correctly, we won't see any changes for the final value in each test from this revision, but we'll test this patch directly and get a better coverage for commuted patterns.
After that, I think this patch will be complete.

Rebase after 41c42f5b1825 / 56c6b612aed1

chfast added inline comments. Oct 24 2022, 7:15 AM

llvm/test/Transforms/InstCombine/mul_full_64.ll
452	Interestingly, it hasn't folded this one.

In D136015#3879133, @spatel wrote:

Please rebase again after 41c42f5b1825 / 56c6b612aed1.
If I did that correctly, we won't see any changes for the final value in each test from this revision, but we'll test this patch directly and get a better coverage for commuted patterns.
After that, I think this patch will be complete.

Done, thanks very much for your changes. One thing I don't completely understand: why do we need the use at the beginning of the function? For example:

define i8 @mul8_low_A0_B2(i8 %in0, i8 %p) {
  %in1 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization
  %In0Lo = and i8 %in0, 15
  %In0Hi = lshr i8 %in0, 4
  %In1Lo = and i8 %in1, 15
  %In1Hi = lshr i8 %in1, 4
  %m10 = mul i8 %In1Hi, %in0
  %m01 = mul i8 %in1, %In0Hi
  %m00 = mul i8 %In1Lo, %In0Lo
  %addc = add i8 %m01, %m10
  %shl = shl i8 %addc, 4
  %retLo = add i8 %shl, %m00
  ret i8 %retLo
}

Harbormaster completed remote builds in B193924: Diff 470137. Oct 24 2022, 8:26 AM

LGTM

In D136015#3879280, @Allen wrote:
Done, thanks very much for your changes. One thing I don't completely understand: why do we need the use at the beginning of the function? For example:
define i8 @mul8_low_A0_B2(i8 %in0, i8 %p) {
  %in1 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization

If you remove that line, notice that the values in the later multiply get commuted. That happens before we reach this transform, so the test is trying to ensure that the exact placement of the values at runtime is the same as specified in the test.

llvm/test/Transforms/InstCombine/mul_full_64.ll
452	This patch assumes we are ending with an "add", but this test changes to an "or". We'd need to add another check for hasNoCommonBitsSet() to catch it? Here's another potential fold: https://alive2.llvm.org/ce/z/hUm56R ...but it needs to freeze the inputs to be poison-safe because they have multiple uses.
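The reason an "add" can turn into an "or" is that the two operations agree whenever their operands share no set bits, since no carries can occur. A brute-force i8 check of that fact, as a sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// True when A and B have no bit position set in both.
bool noCommonBits(uint8_t A, uint8_t B) {
  return (A & B) == 0;
}

// For every disjoint pair, add, or and xor must all give the same result.
bool addOrXorAgreeWhenDisjoint() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      if (!noCommonBits(A, B))
        continue;
      uint8_t Sum = static_cast<uint8_t>(A + B);
      if (Sum != (A | B) || Sum != (A ^ B))
        return false;
    }
  return true;
}
```

A matcher that wanted to catch the "or" form would therefore need to prove the disjointness condition before treating the "or" as an "add".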

This revision is now accepted and ready to land. Oct 24 2022, 9:16 AM

If you remove that line, notice that the values in the later multiply get commuted. That happens before we reach this transform, so the test is trying to ensure that the exact placement of the values at runtime is the same as specified in the test.

Thanks very much for your guidance.

Closed by commit rG81713e893a33: [InstCombine] Fold series of instructions into mull (authored by Allen). Oct 24 2022, 10:10 AM

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG81713e893a33: [InstCombine] Fold series of instructions into mull.

Allen added a subscriber: tgt. Oct 24 2022, 6:56 PM

Allen added inline comments.

llvm/test/Transforms/InstCombine/mul_full_64.ll
452	Hi @chfast, I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7

Hi @spatel, for the case in https://alive2.llvm.org/ce/z/hUm56R, the result is not equal to mul i8 %y, %x, so it needs some other logic to match, maybe defined in a new helper function. See details at https://alive2.llvm.org/ce/z/FEgEU7

define i8 @tgt(i8 %x, i8 %y) {
  %m = mul i8 %y, %x
  ret i8 %m
}

Allen mentioned this in D136661: [InstCombine] Fold series of instructions into mull for more types. Oct 24 2022, 10:06 PM

chfast added inline comments. Oct 24 2022, 11:56 PM

llvm/test/Transforms/InstCombine/mul_full_64.ll
452	"I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7" There is a typo in the example. You changed `or` to `and`, but the original pattern starts at `add`. I.e. all patterns starting at `add`, `or` and `xor` should work; the one starting at `and` should not. https://alive2.llvm.org/ce/z/y26zaW I'm not sure it is worth expanding the matching to `or` and `xor`.

Allen mentioned this in rG620cff096aba: [InstCombine] Fold series of instructions into mull for more types. Oct 25 2022, 8:05 AM

RKSimon mentioned this in D56214: AggressiveInstCombine: Fold full mul i64 x i64 -> i128. Oct 26 2022, 3:20 AM

Allen mentioned this in rGf58311796c49: [InstCombine] refactor the SimplifyUsingDistributiveLaws NFC. Oct 30 2022, 6:06 AM

Allen added inline comments. Oct 31 2022, 7:32 PM

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1296	Hi @RKSimon, now that this revision is accepted, it is time to consider your refactoring suggestion. Do you have any ideas about the extra tests? Thanks.
llvm/test/Transforms/InstCombine/mul_full_64.ll
452	Thanks @chfast for your case. I took a closer look at it: apart from the add vs. or issue above, there are some other differences from my initial case. https://alive2.llvm.org/ce/z/ZKmrJB

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	46 lines
llvm/test/Transforms/InstCombine/mul_fold.ll	195 lines
llvm/test/Transforms/InstCombine/mul_full_64.ll	15 lines

Diff 470200

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

   if (auto *NewI = dyn_cast<BinaryOperator>(NewMath)) {
     NewI->setHasNoUnsignedWrap(HasNUW);
   }
   auto *NewShl = BinaryOperator::CreateShl(NewMath, ShAmt);
   NewShl->setHasNoSignedWrap(HasNSW);
   NewShl->setHasNoUnsignedWrap(HasNUW);
   return NewShl;
 }
 
+/// Reduce a sequence of masked half-width multiplies to a single multiply.
+/// ((XLow * YHigh) + (YLow * XHigh)) << HalfBits) + (XLow * YLow) --> X * Y
+static Instruction *foldBoxMultiply(BinaryOperator &I) {
+  if (!I.getType()->isIntegerTy())
+    return nullptr;
+
+  unsigned BitWidth = I.getType()->getScalarSizeInBits();
+  // Skip the odd bitwidth types and large bitwidth types
+  // TODO: Relax the constraint of wide/vectors types.
+  if ((BitWidth & 0x1) || (BitWidth > 128))
+    return nullptr;
+
+  unsigned HalfBits = BitWidth >> 1;
+  APInt HalfMask = APInt::getMaxValue(HalfBits);
+
+  // ResLo = (CrossSum << HalfBits) + (YLo * XLo)
+  Value *XLo, *YLo;
+  Value *CrossSum;
+  if (!match(&I, m_c_Add(m_Shl(m_Value(CrossSum), m_SpecificInt(HalfBits)),
+                         m_Mul(m_Value(YLo), m_Value(XLo)))))
+    return nullptr;
+
+  // XLo = X & HalfMask
+  // YLo = Y & HalfMask
+  // TODO: Refactor with SimplifyDemandedBits or KnownBits known leading zeros
+  // to enhance robustness
+  Value *X, *Y;
+  if (!match(XLo, m_And(m_Value(X), m_SpecificInt(HalfMask))) ||
+      !match(YLo, m_And(m_Value(Y), m_SpecificInt(HalfMask))))
+    return nullptr;
+
+  // CrossSum = (X' * (Y >> Halfbits)) + (Y' * (X >> HalfBits))
+  // X' can be either X or XLo in the pattern (and the same for Y')
+  if (match(CrossSum,
+            m_c_Add(m_c_Mul(m_LShr(m_Specific(Y), m_SpecificInt(HalfBits)),
+                            m_CombineOr(m_Specific(X), m_Specific(XLo))),
+                    m_c_Mul(m_LShr(m_Specific(X), m_SpecificInt(HalfBits)),
+                            m_CombineOr(m_Specific(Y), m_Specific(YLo))))))
+    return BinaryOperator::CreateMul(X, Y);
+
+  return nullptr;
+}
+
 Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
   if (Value *V = simplifyAddInst(I.getOperand(0), I.getOperand(1),
                                  I.hasNoSignedWrap(), I.hasNoUnsignedWrap(),
                                  SQ.getWithInstruction(&I)))
     return replaceInstUsesWith(I, V);
 
   if (SimplifyAssociativeOrCommutative(I))
     return &I;
 
   if (Instruction *X = foldVectorBinop(I))
     return X;
 
   if (Instruction *Phi = foldBinopWithPhiOperands(I))
     return Phi;
 
   // (AB)+(AC) -> A*(B+C) etc
   if (Value *V = SimplifyUsingDistributiveLaws(I))
     return replaceInstUsesWith(I, V);
 
+  if (Instruction *R = foldBoxMultiply(I))
+    return R;
+
   if (Instruction *R = factorizeMathWithShlOps(I, Builder))
     return R;
 
   if (Instruction *X = foldAddWithConstant(I))
     return X;
 
   if (Instruction *X = foldNoWrapAdd(I, Builder))
     return X;

llvm/test/Transforms/InstCombine/mul_fold.ll

	Show All 10 Lines

 ; The following 16 cases are used for cover the commuted operand ADD and MUL
 ; with extra uses to more of these tests to exercise those cases.
 ; The different _Ax suffix hints the variety of combinations MUL
 ; The different _Bx suffix hints the variety of combinations ADD
 ; 4 tests that use in0/in1 with different commutes
 define i8 @mul8_low_A0_B0(i8 %in0, i8 %in1) {
 ; CHECK-LABEL: @mul8_low_A0_B0(
-; CHECK-NEXT: [[IN0LO:%.*]] = and i8 [[IN0:%.*]], 15
-; CHECK-NEXT: [[IN0HI:%.*]] = lshr i8 [[IN0]], 4
-; CHECK-NEXT: [[IN1LO:%.*]] = and i8 [[IN1:%.*]], 15
-; CHECK-NEXT: [[IN1HI:%.*]] = lshr i8 [[IN1]], 4
-; CHECK-NEXT: [[M10:%.*]] = mul i8 [[IN1HI]], [[IN0]]
-; CHECK-NEXT: [[M01:%.*]] = mul i8 [[IN0HI]], [[IN1]]
-; CHECK-NEXT: [[M00:%.*]] = mul nuw i8 [[IN1LO]], [[IN0LO]]
-; CHECK-NEXT: [[ADDC:%.*]] = add i8 [[M10]], [[M01]]
-; CHECK-NEXT: [[SHL:%.*]] = shl i8 [[ADDC]], 4
-; CHECK-NEXT: [[RETLO:%.*]] = add i8 [[SHL]], [[M00]]
+; CHECK-NEXT: [[RETLO:%.*]] = mul i8 [[IN0:%.*]], [[IN1:%.*]]
 ; CHECK-NEXT: ret i8 [[RETLO]]
 ;
   %In0Lo = and i8 %in0, 15
   %In0Hi = lshr i8 %in0, 4
   %In1Lo = and i8 %in1, 15
   %In1Hi = lshr i8 %in1, 4
   %m10 = mul i8 %In1Hi, %in0
   %m01 = mul i8 %In0Hi, %in1
   %m00 = mul i8 %In1Lo, %In0Lo
   %addc = add i8 %m10, %m01
   %shl = shl i8 %addc, 4
   %retLo = add i8 %shl, %m00
   ret i8 %retLo
 }

 define i8 @mul8_low_A0_B1(i8 %p, i8 %in1) {
 ; CHECK-LABEL: @mul8_low_A0_B1(
 ; CHECK-NEXT: [[IN0:%.*]] = call i8 @use8(i8 [[P:%.*]])
-; CHECK-NEXT: [[IN0LO:%.*]] = and i8 [[IN0]], 15
-; CHECK-NEXT: [[IN0HI:%.*]] = lshr i8 [[IN0]], 4
-; CHECK-NEXT: [[IN1LO:%.*]] = and i8 [[IN1:%.*]], 15
-; CHECK-NEXT: [[IN1HI:%.*]] = lshr i8 [[IN1]], 4
-; CHECK-NEXT: [[M10:%.*]] = mul i8 [[IN0]], [[IN1HI]]
-; CHECK-NEXT: [[M01:%.*]] = mul i8 [[IN0HI]], [[IN1]]
-; CHECK-NEXT: [[M00:%.*]] = mul nuw i8 [[IN1LO]], [[IN0LO]]
-; CHECK-NEXT: [[ADDC:%.*]] = add i8 [[M10]], [[M01]]
-; CHECK-NEXT: [[SHL:%.*]] = shl i8 [[ADDC]], 4
-; CHECK-NEXT: [[RETLO:%.*]] = add i8 [[M00]], [[SHL]]
+; CHECK-NEXT: [[RETLO:%.*]] = mul i8 [[IN0]], [[IN1:%.*]]
 ; CHECK-NEXT: ret i8 [[RETLO]]
 ;
   %in0 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization
   %In0Lo = and i8 %in0, 15
   %In0Hi = lshr i8 %in0, 4
   %In1Lo = and i8 %in1, 15
   %In1Hi = lshr i8 %in1, 4
   %m10 = mul i8 %in0, %In1Hi
   %m01 = mul i8 %In0Hi, %in1
   %m00 = mul i8 %In1Lo, %In0Lo
   %addc = add i8 %m10, %m01
   %shl = shl i8 %addc, 4
   %retLo = add i8 %m00, %shl
   ret i8 %retLo
 }

 define i8 @mul8_low_A0_B2(i8 %in0, i8 %p) {
 ; CHECK-LABEL: @mul8_low_A0_B2(
 ; CHECK-NEXT: [[IN1:%.*]] = call i8 @use8(i8 [[P:%.*]])
-; CHECK-NEXT: [[IN0LO:%.*]] = and i8 [[IN0:%.*]], 15
-; CHECK-NEXT: [[IN0HI:%.*]] = lshr i8 [[IN0]], 4
-; CHECK-NEXT: [[IN1LO:%.*]] = and i8 [[IN1]], 15
-; CHECK-NEXT: [[IN1HI:%.*]] = lshr i8 [[IN1]], 4
-; CHECK-NEXT: [[M10:%.*]] = mul i8 [[IN1HI]], [[IN0]]
-; CHECK-NEXT: [[M01:%.*]] = mul i8 [[IN1]], [[IN0HI]]
-; CHECK-NEXT: [[M00:%.*]] = mul nuw i8 [[IN1LO]], [[IN0LO]]
-; CHECK-NEXT: [[ADDC:%.*]] = add i8 [[M01]], [[M10]]
-; CHECK-NEXT: [[SHL:%.*]] = shl i8 [[ADDC]], 4
-; CHECK-NEXT: [[RETLO:%.*]] = add i8 [[SHL]], [[M00]]
+; CHECK-NEXT: [[RETLO:%.*]] = mul i8 [[IN1]], [[IN0:%.*]]
 ; CHECK-NEXT: ret i8 [[RETLO]]
 ;
   %in1 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization
   %In0Lo = and i8 %in0, 15
   %In0Hi = lshr i8 %in0, 4
   %In1Lo = and i8 %in1, 15
   %In1Hi = lshr i8 %in1, 4
   %m10 = mul i8 %In1Hi, %in0
   %m01 = mul i8 %in1, %In0Hi
   %m00 = mul i8 %In1Lo, %In0Lo
   %addc = add i8 %m01, %m10
   %shl = shl i8 %addc, 4
   %retLo = add i8 %shl, %m00
   ret i8 %retLo
 }

 define i8 @mul8_low_A0_B3(i8 %p, i8 %q) {
 ; CHECK-LABEL: @mul8_low_A0_B3(
 ; CHECK-NEXT: [[IN0:%.*]] = call i8 @use8(i8 [[P:%.*]])
 ; CHECK-NEXT: [[IN1:%.*]] = call i8 @use8(i8 [[Q:%.*]])
-; CHECK-NEXT: [[IN0LO:%.*]] = and i8 [[IN0]], 15
-; CHECK-NEXT: [[IN0HI:%.*]] = lshr i8 [[IN0]], 4
-; CHECK-NEXT: [[IN1LO:%.*]] = and i8 [[IN1]], 15
-; CHECK-NEXT: [[IN1HI:%.*]] = lshr i8 [[IN1]], 4
-; CHECK-NEXT: [[M10:%.*]] = mul i8 [[IN0]], [[IN1HI]]
-; CHECK-NEXT: [[M01:%.*]] = mul i8 [[IN1]], [[IN0HI]]
-; CHECK-NEXT: [[M00:%.*]] = mul nuw i8 [[IN1LO]], [[IN0LO]]
-; CHECK-NEXT: [[ADDC:%.*]] = add i8 [[M01]], [[M10]]
-; CHECK-NEXT: [[SHL:%.*]] = shl i8 [[ADDC]], 4
-; CHECK-NEXT: [[RETLO:%.*]] = add i8 [[M00]], [[SHL]]
+; CHECK-NEXT: [[RETLO:%.*]] = mul i8 [[IN0]], [[IN1]]
 ; CHECK-NEXT: ret i8 [[RETLO]]
 ;
   %in0 = call i8 @use8(i8 %p) ; thwart complexity-based canonicalization
   %in1 = call i8 @use8(i8 %q) ; thwart complexity-based canonicalization
   %In0Lo = and i8 %in0, 15
   %In0Hi = lshr i8 %in0, 4
   %In1Lo = and i8 %in1, 15
   %In1Hi = lshr i8 %in1, 4
	Show All 12 Lines
	; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255			; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8
	; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255			; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]
	; CHECK-NEXT: call void @use16(i16 [[M10]])			; CHECK-NEXT: call void @use16(i16 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]
	; CHECK-NEXT: call void @use16(i16 [[M01]])			; CHECK-NEXT: call void @use16(i16 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i16 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i16 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i16 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ADDC]], 8
	; CHECK-NEXT: [[RETLO:%.*]] = add i16 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i16 [[RETLO]]			; CHECK-NEXT: ret i16 [[RETLO]]
	;			;
	%In0Lo = and i16 %in0, 255			%In0Lo = and i16 %in0, 255
	%In0Hi = lshr i16 %in0, 8			%In0Hi = lshr i16 %in0, 8
	%In1Lo = and i16 %in1, 255			%In1Lo = and i16 %in1, 255
	%In1Hi = lshr i16 %in1, 8			%In1Hi = lshr i16 %in1, 8
	%m10 = mul i16 %In0Lo, %In1Hi			%m10 = mul i16 %In0Lo, %In1Hi
	call void @use16(i16 %m10)			call void @use16(i16 %m10)
	Show All 11 Lines
	; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255			; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8
	; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255			; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]
	; CHECK-NEXT: call void @use16(i16 [[M10]])			; CHECK-NEXT: call void @use16(i16 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN0HI]], [[IN1LO]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN0HI]], [[IN1LO]]
	; CHECK-NEXT: call void @use16(i16 [[M01]])			; CHECK-NEXT: call void @use16(i16 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i16 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i16 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i16 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ADDC]], 8
	; CHECK-NEXT: [[RETLO:%.*]] = add i16 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i16 [[RETLO]]			; CHECK-NEXT: ret i16 [[RETLO]]
	;			;
	%In0Lo = and i16 %in0, 255			%In0Lo = and i16 %in0, 255
	%In0Hi = lshr i16 %in0, 8			%In0Hi = lshr i16 %in0, 8
	%In1Lo = and i16 %in1, 255			%In1Lo = and i16 %in1, 255
	%In1Hi = lshr i16 %in1, 8			%In1Hi = lshr i16 %in1, 8
	%m10 = mul i16 %In0Lo, %In1Hi			%m10 = mul i16 %In0Lo, %In1Hi
	call void @use16(i16 %m10)			call void @use16(i16 %m10)
	Show All 11 Lines
	; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255			; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8
	; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255			; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN1HI]], [[IN0LO]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN1HI]], [[IN0LO]]
	; CHECK-NEXT: call void @use16(i16 [[M10]])			; CHECK-NEXT: call void @use16(i16 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]
	; CHECK-NEXT: call void @use16(i16 [[M01]])			; CHECK-NEXT: call void @use16(i16 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i16 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i16 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i16 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ADDC]], 8
	; CHECK-NEXT: [[RETLO:%.*]] = add i16 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i16 [[RETLO]]			; CHECK-NEXT: ret i16 [[RETLO]]
	;			;
	%In0Lo = and i16 %in0, 255			%In0Lo = and i16 %in0, 255
	%In0Hi = lshr i16 %in0, 8			%In0Hi = lshr i16 %in0, 8
	%In1Lo = and i16 %in1, 255			%In1Lo = and i16 %in1, 255
	%In1Hi = lshr i16 %in1, 8			%In1Hi = lshr i16 %in1, 8
	%m10 = mul i16 %In1Hi, %In0Lo			%m10 = mul i16 %In1Hi, %In0Lo
	call void @use16(i16 %m10)			call void @use16(i16 %m10)
	Show All 11 Lines
	; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255			; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8
	; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255			; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i16 [[IN0LO]], [[IN1HI]]
	; CHECK-NEXT: call void @use16(i16 [[M10]])			; CHECK-NEXT: call void @use16(i16 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i16 [[IN1LO]], [[IN0HI]]
	; CHECK-NEXT: call void @use16(i16 [[M01]])			; CHECK-NEXT: call void @use16(i16 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i16 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i16 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i16 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ADDC]], 8
	; CHECK-NEXT: [[RETLO:%.*]] = add i16 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i16 [[RETLO]]			; CHECK-NEXT: ret i16 [[RETLO]]
	;			;
	%In0Lo = and i16 %in0, 255			%In0Lo = and i16 %in0, 255
	%In0Hi = lshr i16 %in0, 8			%In0Hi = lshr i16 %in0, 8
	%In1Lo = and i16 %in1, 255			%In1Lo = and i16 %in1, 255
	%In1Hi = lshr i16 %in1, 8			%In1Hi = lshr i16 %in1, 8
	%m10 = mul i16 %In0Lo, %In1Hi			%m10 = mul i16 %In0Lo, %In1Hi
	call void @use16(i16 %m10)			call void @use16(i16 %m10)
	%m01 = mul i16 %In1Lo, %In0Hi			%m01 = mul i16 %In1Lo, %In0Hi
	call void @use16(i16 %m01)			call void @use16(i16 %m01)
	%m00 = mul i16 %In1Lo, %In0Lo			%m00 = mul i16 %In1Lo, %In0Lo
	%addc = add i16 %m01, %m10			%addc = add i16 %m01, %m10
	%shl = shl i16 %addc, 8			%shl = shl i16 %addc, 8
	%retLo = add i16 %m00, %shl			%retLo = add i16 %m00, %shl
	ret i16 %retLo			ret i16 %retLo
	}			}

	; 4 tests that use In0Lo/in1 with different commutes			; 4 tests that use In0Lo/in1 with different commutes
	define i32 @mul32_low_A2_B0(i32 %in0, i32 %in1) {			define i32 @mul32_low_A2_B0(i32 %in0, i32 %in1) {
	; CHECK-LABEL: @mul32_low_A2_B0(			; CHECK-LABEL: @mul32_low_A2_B0(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1:%.*]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1:%.*]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]
	; CHECK-NEXT: call void @use32(i32 [[M10]])			; CHECK-NEXT: call void @use32(i32 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul i32 [[IN0HI]], [[IN1]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In1Hi, %In0Lo			%m10 = mul i32 %In1Hi, %In0Lo
	call void @use32(i32 %m10)			call void @use32(i32 %m10)
	%m01 = mul i32 %In0Hi, %in1			%m01 = mul i32 %In0Hi, %in1
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m10, %m01			%addc = add i32 %m10, %m01
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %shl, %m00			%retLo = add i32 %shl, %m00
	ret i32 %retLo			ret i32 %retLo
	}			}

	define i32 @mul32_low_A2_B1(i32 %in0, i32 %in1) {			define i32 @mul32_low_A2_B1(i32 %in0, i32 %in1) {
	; CHECK-LABEL: @mul32_low_A2_B1(			; CHECK-LABEL: @mul32_low_A2_B1(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1:%.*]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1:%.*]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]
	; CHECK-NEXT: call void @use32(i32 [[M10]])			; CHECK-NEXT: call void @use32(i32 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul i32 [[IN0HI]], [[IN1]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In1Hi, %In0Lo			%m10 = mul i32 %In1Hi, %In0Lo
	call void @use32(i32 %m10)			call void @use32(i32 %m10)
	%m01 = mul i32 %In0Hi, %in1			%m01 = mul i32 %In0Hi, %in1
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m10, %m01			%addc = add i32 %m10, %m01
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %m00, %shl			%retLo = add i32 %m00, %shl
	ret i32 %retLo			ret i32 %retLo
	}			}

	define i32 @mul32_low_A2_B2(i32 %in0, i32 %p) {			define i32 @mul32_low_A2_B2(i32 %in0, i32 %p) {
	; CHECK-LABEL: @mul32_low_A2_B2(			; CHECK-LABEL: @mul32_low_A2_B2(
	; CHECK-NEXT: [[IN1:%.*]] = call i32 @use32(i32 [[P:%.*]])			; CHECK-NEXT: [[IN1:%.*]] = call i32 @use32(i32 [[P:%.*]])
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN0LO]], [[IN1HI]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN0LO]], [[IN1HI]]
	; CHECK-NEXT: call void @use32(i32 [[M10]])			; CHECK-NEXT: call void @use32(i32 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul i32 [[IN1]], [[IN0HI]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN1]], [[IN0]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%in1 = call i32 @use32(i32 %p) ; thwart complexity-based canonicalization			%in1 = call i32 @use32(i32 %p) ; thwart complexity-based canonicalization
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In0Lo, %In1Hi			%m10 = mul i32 %In0Lo, %In1Hi
	call void @use32(i32 %m10)			call void @use32(i32 %m10)
	%m01 = mul i32 %in1, %In0Hi			%m01 = mul i32 %in1, %In0Hi
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m01, %m10			%addc = add i32 %m01, %m10
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %shl, %m00			%retLo = add i32 %shl, %m00
	ret i32 %retLo			ret i32 %retLo
	}			}

	define i32 @mul32_low_A2_B3(i32 %in0, i32 %p) {			define i32 @mul32_low_A2_B3(i32 %in0, i32 %p) {
	; CHECK-LABEL: @mul32_low_A2_B3(			; CHECK-LABEL: @mul32_low_A2_B3(
	; CHECK-NEXT: [[IN1:%.*]] = call i32 @use32(i32 [[P:%.*]])			; CHECK-NEXT: [[IN1:%.*]] = call i32 @use32(i32 [[P:%.*]])
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]
	; CHECK-NEXT: call void @use32(i32 [[M10]])			; CHECK-NEXT: call void @use32(i32 [[M10]])
	; CHECK-NEXT: [[M01:%.*]] = mul i32 [[IN1]], [[IN0HI]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN1]], [[IN0]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%in1 = call i32 @use32(i32 %p) ; thwart complexity-based canonicalization			%in1 = call i32 @use32(i32 %p) ; thwart complexity-based canonicalization
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In1Hi, %In0Lo			%m10 = mul i32 %In1Hi, %In0Lo
	call void @use32(i32 %m10)			call void @use32(i32 %m10)
	%m01 = mul i32 %in1, %In0Hi			%m01 = mul i32 %in1, %In0Hi
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m01, %m10			%addc = add i32 %m01, %m10
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %m00, %shl			%retLo = add i32 %m00, %shl
	ret i32 %retLo			ret i32 %retLo
	}			}

	; 4 tests that use in0/In1Lo with different commutes			; 4 tests that use in0/In1Lo with different commutes
	define i64 @mul64_low_A3_B0(i64 %in0, i64 %in1) {			define i64 @mul64_low_A3_B0(i64 %in0, i64 %in1) {
	; CHECK-LABEL: @mul64_low_A3_B0(			; CHECK-LABEL: @mul64_low_A3_B0(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i64 [[IN0:%.*]], 4294967295			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0:%.*]], 32
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32
	; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295			; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i64 [[IN1]], 32
	; CHECK-NEXT: [[M10:%.*]] = mul i64 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]
	; CHECK-NEXT: call void @use64(i64 [[M01]])			; CHECK-NEXT: call void @use64(i64 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i64 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i64 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i64 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADDC]], 32
	; CHECK-NEXT: [[RETLO:%.*]] = add i64 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i64 [[RETLO]]			; CHECK-NEXT: ret i64 [[RETLO]]
	;			;
	%In0Lo = and i64 %in0, 4294967295			%In0Lo = and i64 %in0, 4294967295
	%In0Hi = lshr i64 %in0, 32			%In0Hi = lshr i64 %in0, 32
	%In1Lo = and i64 %in1, 4294967295			%In1Lo = and i64 %in1, 4294967295
	%In1Hi = lshr i64 %in1, 32			%In1Hi = lshr i64 %in1, 32
	%m10 = mul i64 %In1Hi, %in0			%m10 = mul i64 %In1Hi, %in0
	%m01 = mul i64 %In0Hi, %In1Lo			%m01 = mul i64 %In0Hi, %In1Lo
	call void @use64(i64 %m01)			call void @use64(i64 %m01)
	%m00 = mul i64 %In1Lo, %In0Lo			%m00 = mul i64 %In1Lo, %In0Lo
	%addc = add i64 %m10, %m01			%addc = add i64 %m10, %m01
	%shl = shl i64 %addc, 32			%shl = shl i64 %addc, 32
	%retLo = add i64 %shl, %m00			%retLo = add i64 %shl, %m00
	ret i64 %retLo			ret i64 %retLo
	}			}

	define i64 @mul64_low_A3_B1(i64 %in0, i64 %in1) {			define i64 @mul64_low_A3_B1(i64 %in0, i64 %in1) {
	; CHECK-LABEL: @mul64_low_A3_B1(			; CHECK-LABEL: @mul64_low_A3_B1(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i64 [[IN0:%.*]], 4294967295			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0:%.*]], 32
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32
	; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295			; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i64 [[IN1]], 32
	; CHECK-NEXT: [[M10:%.*]] = mul i64 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]
	; CHECK-NEXT: call void @use64(i64 [[M01]])			; CHECK-NEXT: call void @use64(i64 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i64 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i64 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i64 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADDC]], 32
	; CHECK-NEXT: [[RETLO:%.*]] = add i64 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i64 [[RETLO]]			; CHECK-NEXT: ret i64 [[RETLO]]
	;			;
	%In0Lo = and i64 %in0, 4294967295			%In0Lo = and i64 %in0, 4294967295
	%In0Hi = lshr i64 %in0, 32			%In0Hi = lshr i64 %in0, 32
	%In1Lo = and i64 %in1, 4294967295			%In1Lo = and i64 %in1, 4294967295
	%In1Hi = lshr i64 %in1, 32			%In1Hi = lshr i64 %in1, 32
	%m10 = mul i64 %In1Hi, %in0			%m10 = mul i64 %In1Hi, %in0
	%m01 = mul i64 %In0Hi, %In1Lo			%m01 = mul i64 %In0Hi, %In1Lo
	call void @use64(i64 %m01)			call void @use64(i64 %m01)
	%m00 = mul i64 %In1Lo, %In0Lo			%m00 = mul i64 %In1Lo, %In0Lo
	%addc = add i64 %m10, %m01			%addc = add i64 %m10, %m01
	%shl = shl i64 %addc, 32			%shl = shl i64 %addc, 32
	%retLo = add i64 %m00, %shl			%retLo = add i64 %m00, %shl
	ret i64 %retLo			ret i64 %retLo
	}			}

	define i64 @mul64_low_A3_B2(i64 %p, i64 %in1) {			define i64 @mul64_low_A3_B2(i64 %p, i64 %in1) {
	; CHECK-LABEL: @mul64_low_A3_B2(			; CHECK-LABEL: @mul64_low_A3_B2(
	; CHECK-NEXT: [[IN0:%.*]] = call i64 @use64(i64 [[P:%.*]])			; CHECK-NEXT: [[IN0:%.*]] = call i64 @use64(i64 [[P:%.*]])
	; CHECK-NEXT: [[IN0LO:%.*]] = and i64 [[IN0]], 4294967295
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32
	; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295			; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i64 [[IN1]], 32
	; CHECK-NEXT: [[M10:%.*]] = mul i64 [[IN0]], [[IN1HI]]
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN0HI]], [[IN1LO]]
	; CHECK-NEXT: call void @use64(i64 [[M01]])			; CHECK-NEXT: call void @use64(i64 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i64 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i64 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i64 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADDC]], 32
	; CHECK-NEXT: [[RETLO:%.*]] = add i64 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i64 [[RETLO]]			; CHECK-NEXT: ret i64 [[RETLO]]
	;			;
	%in0 = call i64 @use64(i64 %p) ; thwart complexity-based canonicalization			%in0 = call i64 @use64(i64 %p) ; thwart complexity-based canonicalization
	%In0Lo = and i64 %in0, 4294967295			%In0Lo = and i64 %in0, 4294967295
	%In0Hi = lshr i64 %in0, 32			%In0Hi = lshr i64 %in0, 32
	%In1Lo = and i64 %in1, 4294967295			%In1Lo = and i64 %in1, 4294967295
	%In1Hi = lshr i64 %in1, 32			%In1Hi = lshr i64 %in1, 32
	%m10 = mul i64 %in0, %In1Hi			%m10 = mul i64 %in0, %In1Hi
	%m01 = mul i64 %In0Hi, %In1Lo			%m01 = mul i64 %In0Hi, %In1Lo
	call void @use64(i64 %m01)			call void @use64(i64 %m01)
	%m00 = mul i64 %In1Lo, %In0Lo			%m00 = mul i64 %In1Lo, %In0Lo
	%addc = add i64 %m01, %m10			%addc = add i64 %m01, %m10
	%shl = shl i64 %addc, 32			%shl = shl i64 %addc, 32
	%retLo = add i64 %shl, %m00			%retLo = add i64 %shl, %m00
	ret i64 %retLo			ret i64 %retLo
	}			}

	define i64 @mul64_low_A3_B3(i64 %p, i64 %in1) {			define i64 @mul64_low_A3_B3(i64 %p, i64 %in1) {
	; CHECK-LABEL: @mul64_low_A3_B3(			; CHECK-LABEL: @mul64_low_A3_B3(
	; CHECK-NEXT: [[IN0:%.*]] = call i64 @use64(i64 [[P:%.*]])			; CHECK-NEXT: [[IN0:%.*]] = call i64 @use64(i64 [[P:%.*]])
	; CHECK-NEXT: [[IN0LO:%.*]] = and i64 [[IN0]], 4294967295
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32
	; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295			; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i64 [[IN1]], 32
	; CHECK-NEXT: [[M10:%.*]] = mul i64 [[IN0]], [[IN1HI]]
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN1LO]], [[IN0HI]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i64 [[IN1LO]], [[IN0HI]]
	; CHECK-NEXT: call void @use64(i64 [[M01]])			; CHECK-NEXT: call void @use64(i64 [[M01]])
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i64 [[IN1LO]], [[IN0LO]]			; CHECK-NEXT: [[RETLO:%.*]] = mul i64 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i64 [[M01]], [[M10]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADDC]], 32
	; CHECK-NEXT: [[RETLO:%.*]] = add i64 [[M00]], [[SHL]]
	; CHECK-NEXT: ret i64 [[RETLO]]			; CHECK-NEXT: ret i64 [[RETLO]]
	;			;
	%in0 = call i64 @use64(i64 %p) ; thwart complexity-based canonicalization			%in0 = call i64 @use64(i64 %p) ; thwart complexity-based canonicalization
	%In0Lo = and i64 %in0, 4294967295			%In0Lo = and i64 %in0, 4294967295
	%In0Hi = lshr i64 %in0, 32			%In0Hi = lshr i64 %in0, 32
	%In1Lo = and i64 %in1, 4294967295			%In1Lo = and i64 %in1, 4294967295
	%In1Hi = lshr i64 %in1, 32			%In1Hi = lshr i64 %in1, 32
	%m10 = mul i64 %in0, %In1Hi			%m10 = mul i64 %in0, %In1Hi
	Show All 9 Lines
	define i32 @mul32_low_one_extra_user(i32 %in0, i32 %in1) {			define i32 @mul32_low_one_extra_user(i32 %in0, i32 %in1) {
	; CHECK-LABEL: @mul32_low_one_extra_user(			; CHECK-LABEL: @mul32_low_one_extra_user(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16			; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1:%.*]], 65535			; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1:%.*]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16			; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]			; CHECK-NEXT: [[M10:%.*]] = mul nuw i32 [[IN1HI]], [[IN0LO]]
	; CHECK-NEXT: [[M01:%.*]] = mul nuw i32 [[IN1LO]], [[IN0HI]]			; CHECK-NEXT: [[M01:%.*]] = mul nuw i32 [[IN1LO]], [[IN0HI]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M10]], [[M01]]			; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M10]], [[M01]]
	; CHECK-NEXT: call void @use32(i32 [[ADDC]])			; CHECK-NEXT: call void @use32(i32 [[ADDC]])
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN0]], [[IN1]]
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In1Hi, %In0Lo			%m10 = mul i32 %In1Hi, %In0Lo
	%m01 = mul i32 %In1Lo, %In0Hi			%m01 = mul i32 %In1Lo, %In0Hi
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m10, %m01			%addc = add i32 %m10, %m01
	call void @use32(i32 %addc)			call void @use32(i32 %addc)
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %shl, %m00			%retLo = add i32 %shl, %m00
	ret i32 %retLo			ret i32 %retLo
	}			}

	; The following tests cover the target pattern at a variety of types			; The following tests cover the target pattern at a variety of types
	; https://alive2.llvm.org/ce/z/2BqKLt			; https://alive2.llvm.org/ce/z/2BqKLt
	define i8 @mul8_low(i8 %in0, i8 %in1) {			define i8 @mul8_low(i8 %in0, i8 %in1) {
	; CHECK-LABEL: @mul8_low(			; CHECK-LABEL: @mul8_low(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i8 [[IN0:%.*]], 15			; CHECK-NEXT: [[RETLO:%.*]] = mul i8 [[IN0:%.*]], [[IN1:%.*]]
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i8 [[IN0]], 4
	; CHECK-NEXT: [[IN1LO:%.*]] = and i8 [[IN1:%.*]], 15
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i8 [[IN1]], 4
	; CHECK-NEXT: [[M10:%.*]] = mul i8 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul i8 [[IN0HI]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i8 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i8 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i8 [[ADDC]], 4
	; CHECK-NEXT: [[RETLO:%.*]] = add i8 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i8 [[RETLO]]			; CHECK-NEXT: ret i8 [[RETLO]]
	;			;
	%In0Lo = and i8 %in0, 15			%In0Lo = and i8 %in0, 15
	%In0Hi = lshr i8 %in0, 4			%In0Hi = lshr i8 %in0, 4
	%In1Lo = and i8 %in1, 15			%In1Lo = and i8 %in1, 15
	%In1Hi = lshr i8 %in1, 4			%In1Hi = lshr i8 %in1, 4
	%m10 = mul i8 %In1Hi, %In0Lo			%m10 = mul i8 %In1Hi, %In0Lo
	%m01 = mul i8 %In1Lo, %In0Hi			%m01 = mul i8 %In1Lo, %In0Hi
	%m00 = mul i8 %In1Lo, %In0Lo			%m00 = mul i8 %In1Lo, %In0Lo
	%addc = add i8 %m10, %m01			%addc = add i8 %m10, %m01
	%shl = shl i8 %addc, 4			%shl = shl i8 %addc, 4
	%retLo = add i8 %shl, %m00			%retLo = add i8 %shl, %m00
	ret i8 %retLo			ret i8 %retLo
	}			}

	define i16 @mul16_low(i16 %in0, i16 %in1) {			define i16 @mul16_low(i16 %in0, i16 %in1) {
	; CHECK-LABEL: @mul16_low(			; CHECK-LABEL: @mul16_low(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i16 [[IN0:%.*]], 255			; CHECK-NEXT: [[RETLO:%.*]] = mul i16 [[IN0:%.*]], [[IN1:%.*]]
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i16 [[IN0]], 8
	; CHECK-NEXT: [[IN1LO:%.*]] = and i16 [[IN1:%.*]], 255
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i16 [[IN1]], 8
	; CHECK-NEXT: [[M10:%.*]] = mul i16 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul i16 [[IN0HI]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i16 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i16 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i16 [[ADDC]], 8
	; CHECK-NEXT: [[RETLO:%.*]] = add i16 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i16 [[RETLO]]			; CHECK-NEXT: ret i16 [[RETLO]]
	;			;
	%In0Lo = and i16 %in0, 255			%In0Lo = and i16 %in0, 255
	%In0Hi = lshr i16 %in0, 8			%In0Hi = lshr i16 %in0, 8
	%In1Lo = and i16 %in1, 255			%In1Lo = and i16 %in1, 255
	%In1Hi = lshr i16 %in1, 8			%In1Hi = lshr i16 %in1, 8
	%m10 = mul i16 %In1Hi, %In0Lo			%m10 = mul i16 %In1Hi, %In0Lo
	%m01 = mul i16 %In1Lo, %In0Hi			%m01 = mul i16 %In1Lo, %In0Hi
	%m00 = mul i16 %In1Lo, %In0Lo			%m00 = mul i16 %In1Lo, %In0Lo
	%addc = add i16 %m10, %m01			%addc = add i16 %m10, %m01
	%shl = shl i16 %addc, 8			%shl = shl i16 %addc, 8
	%retLo = add i16 %shl, %m00			%retLo = add i16 %shl, %m00
	ret i16 %retLo			ret i16 %retLo
	}			}

	define i32 @mul32_low(i32 %in0, i32 %in1) {			define i32 @mul32_low(i32 %in0, i32 %in1) {
	; CHECK-LABEL: @mul32_low(			; CHECK-LABEL: @mul32_low(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i32 [[IN0:%.*]], 65535			; CHECK-NEXT: [[RETLO:%.*]] = mul i32 [[IN0:%.*]], [[IN1:%.*]]
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i32 [[IN0]], 16
	; CHECK-NEXT: [[IN1LO:%.*]] = and i32 [[IN1:%.*]], 65535
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i32 [[IN1]], 16
	; CHECK-NEXT: [[M10:%.*]] = mul i32 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul i32 [[IN0HI]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i32 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i32 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[ADDC]], 16
	; CHECK-NEXT: [[RETLO:%.*]] = add i32 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i32 [[RETLO]]			; CHECK-NEXT: ret i32 [[RETLO]]
	;			;
	%In0Lo = and i32 %in0, 65535			%In0Lo = and i32 %in0, 65535
	%In0Hi = lshr i32 %in0, 16			%In0Hi = lshr i32 %in0, 16
	%In1Lo = and i32 %in1, 65535			%In1Lo = and i32 %in1, 65535
	%In1Hi = lshr i32 %in1, 16			%In1Hi = lshr i32 %in1, 16
	%m10 = mul i32 %In1Hi, %In0Lo			%m10 = mul i32 %In1Hi, %In0Lo
	%m01 = mul i32 %In1Lo, %In0Hi			%m01 = mul i32 %In1Lo, %In0Hi
	%m00 = mul i32 %In1Lo, %In0Lo			%m00 = mul i32 %In1Lo, %In0Lo
	%addc = add i32 %m10, %m01			%addc = add i32 %m10, %m01
	%shl = shl i32 %addc, 16			%shl = shl i32 %addc, 16
	%retLo = add i32 %shl, %m00			%retLo = add i32 %shl, %m00
	ret i32 %retLo			ret i32 %retLo
	}			}

	define i64 @mul64_low(i64 %in0, i64 %in1) {			define i64 @mul64_low(i64 %in0, i64 %in1) {
	; CHECK-LABEL: @mul64_low(			; CHECK-LABEL: @mul64_low(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i64 [[IN0:%.*]], 4294967295			; CHECK-NEXT: [[RETLO:%.*]] = mul i64 [[IN0:%.*]], [[IN1:%.*]]
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i64 [[IN0]], 32
	; CHECK-NEXT: [[IN1LO:%.*]] = and i64 [[IN1:%.*]], 4294967295
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i64 [[IN1]], 32
	; CHECK-NEXT: [[M10:%.*]] = mul i64 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul i64 [[IN0HI]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i64 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i64 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADDC]], 32
	; CHECK-NEXT: [[RETLO:%.*]] = add i64 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i64 [[RETLO]]			; CHECK-NEXT: ret i64 [[RETLO]]
	;			;
	%In0Lo = and i64 %in0, 4294967295			%In0Lo = and i64 %in0, 4294967295
	%In0Hi = lshr i64 %in0, 32			%In0Hi = lshr i64 %in0, 32
	%In1Lo = and i64 %in1, 4294967295			%In1Lo = and i64 %in1, 4294967295
	%In1Hi = lshr i64 %in1, 32			%In1Hi = lshr i64 %in1, 32
	%m10 = mul i64 %In1Hi, %In0Lo			%m10 = mul i64 %In1Hi, %In0Lo
	%m01 = mul i64 %In1Lo, %In0Hi			%m01 = mul i64 %In1Lo, %In0Hi
	%m00 = mul i64 %In1Lo, %In0Lo			%m00 = mul i64 %In1Lo, %In0Lo
	%addc = add i64 %m10, %m01			%addc = add i64 %m10, %m01
	%shl = shl i64 %addc, 32			%shl = shl i64 %addc, 32
	%retLo = add i64 %shl, %m00			%retLo = add i64 %shl, %m00
	ret i64 %retLo			ret i64 %retLo
	}			}

	define i128 @mul128_low(i128 %in0, i128 %in1) {			define i128 @mul128_low(i128 %in0, i128 %in1) {
	; CHECK-LABEL: @mul128_low(			; CHECK-LABEL: @mul128_low(
	; CHECK-NEXT: [[IN0LO:%.*]] = and i128 [[IN0:%.*]], 18446744073709551615			; CHECK-NEXT: [[RETLO:%.*]] = mul i128 [[IN0:%.*]], [[IN1:%.*]]
	; CHECK-NEXT: [[IN0HI:%.*]] = lshr i128 [[IN0]], 64
	; CHECK-NEXT: [[IN1LO:%.*]] = and i128 [[IN1:%.*]], 18446744073709551615
	; CHECK-NEXT: [[IN1HI:%.*]] = lshr i128 [[IN1]], 64
	; CHECK-NEXT: [[M10:%.*]] = mul i128 [[IN1HI]], [[IN0]]
	; CHECK-NEXT: [[M01:%.*]] = mul i128 [[IN0HI]], [[IN1]]
	; CHECK-NEXT: [[M00:%.*]] = mul nuw i128 [[IN1LO]], [[IN0LO]]
	; CHECK-NEXT: [[ADDC:%.*]] = add i128 [[M10]], [[M01]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i128 [[ADDC]], 64
	; CHECK-NEXT: [[RETLO:%.*]] = add i128 [[SHL]], [[M00]]
	; CHECK-NEXT: ret i128 [[RETLO]]			; CHECK-NEXT: ret i128 [[RETLO]]
	;			;
	%In0Lo = and i128 %in0, 18446744073709551615			%In0Lo = and i128 %in0, 18446744073709551615
	%In0Hi = lshr i128 %in0, 64			%In0Hi = lshr i128 %in0, 64
	%In1Lo = and i128 %in1, 18446744073709551615			%In1Lo = and i128 %in1, 18446744073709551615
	%In1Hi = lshr i128 %in1, 64			%In1Hi = lshr i128 %in1, 64
	%m10 = mul i128 %In1Hi, %In0Lo			%m10 = mul i128 %In1Hi, %In0Lo
	%m01 = mul i128 %In1Lo, %In0Hi			%m01 = mul i128 %In1Lo, %In0Hi
	▲ Show 20 Lines • Show All 228 Lines • Show Last 20 Lines
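
The `mul*_low` tests above all rely on the same identity: the low half of a product can be rebuilt from half-width pieces, and everything that would land above the type's bit width is discarded. The following is a minimal Python sketch of that arithmetic, not part of the patch; the names `split_mul_low` and `check_all` are made up for illustration, and the masking stands in for LLVM's wrapping integer arithmetic.

```python
def split_mul_low(in0: int, in1: int, bits: int) -> int:
    """Rebuild the low `bits` bits of in0 * in1 from half-width pieces.

    Mirrors the instruction sequence in the mul*_low tests; every
    intermediate is wrapped to `bits` bits, as an iN value would be.
    """
    half = bits // 2
    mask = (1 << bits) - 1      # wrap to the full width
    lo_mask = (1 << half) - 1   # 15 for i8, 255 for i16, ...

    in0_lo, in0_hi = in0 & lo_mask, in0 >> half
    in1_lo, in1_hi = in1 & lo_mask, in1 >> half

    m10 = (in1_hi * in0_lo) & mask
    m01 = (in1_lo * in0_hi) & mask
    m00 = (in1_lo * in0_lo) & mask
    addc = (m10 + m01) & mask
    shl = (addc << half) & mask
    return (shl + m00) & mask


def check_all(bits: int) -> bool:
    """Exhaustively compare the split form against a plain wrapping multiply."""
    mask = (1 << bits) - 1
    return all(
        split_mul_low(a, b, bits) == (a * b) & mask
        for a in range(1 << bits)
        for b in range(1 << bits)
    )
```

Running `check_all(8)` walks all 65536 pairs of 8-bit inputs; wider types are only practical to spot-check this way, which is why the test file points at an Alive2 proof instead.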

llvm/test/Transforms/InstCombine/mul_full_64.ll

Show First 20 Lines • Show All 190 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I42]], [[MUL5]]		; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I42]], [[MUL5]]
; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[ADD]], 32		; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I41]], [[MUL]]		; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I41]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295		; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]		; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32		; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]		; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8		; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[ADD18:%.*]] = add i64 [[MUL6]], [[MUL5]]		; CHECK-NEXT: [[ADD19:%.*]] = mul i64 [[A]], [[B]]
; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADD18]], 32
; CHECK-NEXT: [[ADD19:%.*]] = add i64 [[SHL]], [[MUL7]]
; CHECK-NEXT: ret i64 [[ADD19]]		; CHECK-NEXT: ret i64 [[ADD19]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i45 = lshr i64 %a, 32		%shr.i45 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i43 = lshr i64 %b, 32		%shr.i43 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i43, %shr.i45		%mul = mul nuw i64 %shr.i43, %shr.i45
%mul5 = mul nuw i64 %conv3, %shr.i45		%mul5 = mul nuw i64 %conv3, %shr.i45
[...]

%u2 = add i64 %u0h, %t3		%u2 = add i64 %u0h, %t3

%hi = add i64 %u2, %u1h		%hi = add i64 %u2, %u1h
ret i64 %hi		ret i64 %hi
}		}


define i64 @mullo(i64 %x, i64 %y) {		define i64 @mullo(i64 %x, i64 %y) {
chfast: Interestingly, it hasn't folded this one.

spatel: This patch assumes we are ending with an "add", but this test changes to an "or". We'd need to add another check for hasNoCommonBitsSet() to catch it? Here's another potential fold: https://alive2.llvm.org/ce/z/hUm56R ...but it needs to freeze the inputs to be poison-safe because they have multiple uses.

Allen: Hi @chfast, I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7
Hi @spatel, for the case in https://alive2.llvm.org/ce/z/hUm56R, its result is not equal to `mul i8 %y, %x`, so it needs some other logic to match? Maybe it could be defined with a new helper function. See details at https://alive2.llvm.org/ce/z/FEgEU7

```
define i8 @tgt(i8 %x, i8 %y) {
  %m = mul i8 %y, %x
  ret i8 %m
}
```

chfast:
> I think the case @mullo should not be matched? https://alive2.llvm.org/ce/z/jH4kU7

There is a typo in the example. You changed `or` to `and` but the original pattern starts at `add`. I.e. all patterns starting at `add`, `or` and `xor` should work; the one starting at `and` should not. https://alive2.llvm.org/ce/z/y26zaW
I'm not sure it is worth expanding the matching to `or` and `xor`.

Allen: Thanks @chfast for your case. I took a closer look at it: apart from the add vs. or difference above, there are some other differences from my initial case. https://alive2.llvm.org/ce/z/ZKmrJB
; CHECK-LABEL: @mullo(		; CHECK-LABEL: @mullo(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295		; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]
; CHECK-NEXT: [[T1:%.*]] = mul i64 [[XH]], [[Y]]		; CHECK-NEXT: [[T1:%.*]] = mul i64 [[XH]], [[Y]]
; CHECK-NEXT: [[T2:%.*]] = mul i64 [[YH]], [[X]]		; CHECK-NEXT: [[T2:%.*]] = mul i64 [[YH]], [[X]]
[...]

%lo = or i64 %u1ls, %t0l		%lo = or i64 %u1ls, %t0l
ret i64 %lo		ret i64 %lo
}		}
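
The inline discussion on @mullo turns on why a final `or` (or `xor`) can stand in for the final `add`: when two operands share no set bits, no carry can occur, so all three operations produce the same value. The sketch below illustrates this for a value shifted into the high half combined with a value masked to the low half, which is the shape the names `%u1ls` and `%t0l` suggest (an assumption, since the middle of that test is elided here). `noCommonBits`, `Combined` and `combineHalves` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position is set in both values. This is the property a
// check such as the hasNoCommonBitsSet() mentioned in the review would have
// to establish before treating `or` like `add`.
bool noCommonBits(uint64_t a, uint64_t b) { return (a & b) == 0; }

// Results of combining the two operands with each of the three operations.
struct Combined {
  uint64_t viaAdd;
  uint64_t viaOr;
  uint64_t viaXor;
};

// Shifts hiPart into the upper 32 bits, masks loPart to the lower 32 bits,
// and combines them three ways.
Combined combineHalves(uint64_t hiPart, uint64_t loPart) {
  uint64_t shifted = hiPart << 32;                  // low 32 bits are zero
  uint64_t masked = loPart & UINT64_C(0xffffffff);  // high 32 bits are zero
  assert(noCommonBits(shifted, masked));            // disjoint by construction
  return {shifted + masked, shifted | masked, shifted ^ masked};
}
```

With overlapping operands the three results generally differ (for example 1 + 1, 1 | 1 and 1 ^ 1), which is why the disjointness check is needed before extending the match beyond `add`.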


define i64 @mullo_variant3(i64 %a, i64 %b) {		define i64 @mullo_variant3(i64 %a, i64 %b) {
; CHECK-LABEL: @mullo_variant3(		; CHECK-LABEL: @mullo_variant3(
; CHECK-NEXT: [[AL:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[LO:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[AH:%.*]] = lshr i64 [[A]], 32
; CHECK-NEXT: [[BL:%.*]] = and i64 [[B:%.*]], 4294967295
; CHECK-NEXT: [[BH:%.*]] = lshr i64 [[B]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[BL]], [[AL]]
; CHECK-NEXT: [[T1:%.*]] = mul i64 [[AH]], [[B]]
; CHECK-NEXT: [[T2:%.*]] = mul i64 [[BH]], [[A]]
; CHECK-NEXT: [[U1:%.*]] = add i64 [[T2]], [[T1]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[LO:%.*]] = add i64 [[U1LS]], [[T0]]
; CHECK-NEXT: ret i64 [[LO]]		; CHECK-NEXT: ret i64 [[LO]]
;		;
%al = and i64 %a, 4294967295		%al = and i64 %a, 4294967295
%ah = lshr i64 %a, 32		%ah = lshr i64 %a, 32
%bl = and i64 %b, 4294967295		%bl = and i64 %b, 4294967295
%bh = lshr i64 %b, 32		%bh = lshr i64 %b, 32

%t0 = mul nuw i64 %bl, %al		%t0 = mul nuw i64 %bl, %al
[...]


Revision Contents

Diff 470200

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

llvm/test/Transforms/InstCombine/mul_fold.ll

llvm/test/Transforms/InstCombine/mul_full_64.ll
