This is an archive of the discontinued LLVM Phabricator instance.


Differential D56214

AggressiveInstCombine: Fold full mul i64 x i64 -> i128
Abandoned · Public

Authored by chfast on Jan 2 2019, 12:15 PM.


Details

Reviewers

spatel
RKSimon
lebedev.ri

Summary

This patch tries to match the full multiplication pattern i64 x i64 -> i128, implemented with four i32 x i32 -> i64 multiplications whose results are then combined.

This pattern has two outputs, the high and low parts, which makes matching a bit difficult, especially since this is my first pattern matcher.

Currently the high and low parts are matched independently, which results in two multiplications being generated. I have three ideas for fixing this, but suggestions are welcome:

1. Find another pass capable of merging identical multiplications. I tried InstCombine, but instead of merging two identical i128 multiplications it truncates one of them.

2. Separate pattern matching from instruction rewriting. First find all patterns and remember them in a worklist; later, pair up the low and high patterns by their arguments.

3. When one of the patterns is found, try to find the pattern for the other part by traversing the basic block further.
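For reference, here is a minimal portable C++ sketch of the computation the matcher targets: a 64 x 64 -> 128-bit product assembled from four 32 x 32 -> 64-bit partial products. The intermediate names (t0 to t3, u0 to u2) follow the IR in the foldFullMul() doc comment of the diff; `fullMul64` and `U128Parts` are hypothetical names, not part of the patch, and the sketch assumes a C++14 compiler.

```cpp
#include <cstdint>

// Hypothetical result type: the two halves of a 128-bit product.
struct U128Parts {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Full 64 x 64 -> 128-bit multiplication from four 32 x 32 -> 64-bit partial
// products, following the IR pattern in the foldFullMul() doc comment.
constexpr U128Parts fullMul64(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t mask = 0xFFFFFFFFu;
  const std::uint64_t xl = x & mask, xh = x >> 32;
  const std::uint64_t yl = y & mask, yh = y >> 32;

  // Each partial product fits in 64 bits because both factors are < 2^32.
  const std::uint64_t t0 = yl * xl;
  const std::uint64_t t1 = yl * xh;
  const std::uint64_t t2 = yh * xl;
  const std::uint64_t t3 = yh * xh;

  const std::uint64_t t0l = t0 & mask;
  const std::uint64_t t0h = t0 >> 32;

  // Neither u0 nor u1 can overflow: each is at most (2^32 - 1)^2 + (2^32 - 1).
  const std::uint64_t u0 = t0h + t1;
  const std::uint64_t u0l = u0 & mask;
  const std::uint64_t u0h = u0 >> 32;

  const std::uint64_t u1 = u0l + t2;
  const std::uint64_t u1ls = u1 << 32;
  const std::uint64_t u1h = u1 >> 32;

  const std::uint64_t u2 = u0h + t3;

  return {u1ls | t0l, u2 + u1h};
}

// (2^64 - 1)^2 = 2^128 - 2^65 + 1, so hi = 2^64 - 2 and lo = 1.
static_assert(fullMul64(~std::uint64_t{0}, ~std::uint64_t{0}).hi ==
                  0xFFFFFFFFFFFFFFFEull,
              "high half of (2^64 - 1)^2");
static_assert(fullMul64(~std::uint64_t{0}, ~std::uint64_t{0}).lo == 1,
              "low half of (2^64 - 1)^2");
```

The `lo` field always equals the wrapped 64-bit product `x * y`, which is why the low half on its own can be replaced by a single plain `mul`.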

Diff Detail

Repository

rL LLVM

Build Status

Buildable 27901
Build 27900: arc lint + arc unit

Event Timeline

chfast created this revision. · Jan 2 2019, 12:15 PM

Herald added a subscriber: llvm-commits. · Jan 2 2019, 12:15 PM

Harbormaster completed remote builds in B26333: Diff 179921. · Jan 2 2019, 12:16 PM

Haven't taken a deep look yet, but here are some preliminary thoughts.
Also, I don't think this should be hardcoded to a particular bitwidth.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
254–264	I don't see why these need to be actual functions, lambdas will do?
311–315	if (match(&I, m_c_Or(m_LowPart(m_Value(t0)), m_Shl(m_c_Add(m_LowPart(m_c_Add(m_HighPart(m_Deferred(t0)), m_Value(t1))), m_Value(t2)), m_SpecificInt(32))))) {
330	and now you only have `t0`, no `t0a`
test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Please use `llvm/utils/update_test_checks.py`. And move the initial test case into another review, so this diff shows the change in the test output.

chfast edited the summary of this revision. · Jan 2 2019, 12:24 PM

chfast marked 2 inline comments as done. · Jan 2 2019, 12:33 PM

chfast added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
254–264	Yes, that's true. I was struggling with the matchers at first, so I wrote them by copy & paste. I will fix that, unless this pattern is useful to someone else.
311–315	That's cool. `m_Deferred` definitely lacks some documentation.

In D56214#1344055, @lebedev.ri wrote:

Also, I don't think this should be hardcoded to some particular bitwidth.

Yes, I agree. However, I don't know how to check for bitwidth in match().

Quuxplusone added a subscriber: Quuxplusone. · Jan 2 2019, 3:39 PM

Quuxplusone added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
266	Does it also match the other two versions I mentioned in the cfe-dev thread? Specifically, where you have (my version `TWO`): `%u1ls = shl i64 %u1, 32` and `%lo = or i64 %u1ls, %t0l`; my version `ONE` has `%lo = mul i64 %x, %y`; and my version `THREE` has `%u3 = add i64 %t2, %t1`, `%u3ls = shl i64 %u3, 32` and `%lo = add i64 %u3ls, %t0`. https://godbolt.org/z/_1pDoz
360	Remove debugging printf?
test/Transforms/AggressiveInstCombine/mul128.ll
8 ↗	(On Diff #179921)	Peanut gallery says: I doubt that this test captures everything that you want to test about the optimization. You just check that the output contains `mul nuw i128`, but what if it contains that instruction plus a bunch more unintended stuff? But I don't know anything about how LLVM optimizations are usually tested. Maybe this test is fine as-is.
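To illustrate the question about the three versions, here is a sketch (function names are hypothetical, and plain C++ stands in for the IR) showing that `ONE`, `TWO` and `THREE` compute the same low 64 bits. `TWO` has the shape of the patch's "long pattern", and `THREE` has the same shape as the "short pattern" that appears in the later Diff 185816.

```cpp
#include <cstdint>

constexpr std::uint64_t kMask32 = 0xFFFFFFFFu;

// Version ONE: a plain 64-bit multiply (wraps modulo 2^64).
constexpr std::uint64_t lowOne(std::uint64_t x, std::uint64_t y) {
  return x * y;
}

// Version TWO (long pattern): (u1 << 32) | lo(t0).
constexpr std::uint64_t lowTwo(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t xl = x & kMask32, xh = x >> 32;
  const std::uint64_t yl = y & kMask32, yh = y >> 32;
  const std::uint64_t t0 = yl * xl, t1 = yl * xh, t2 = yh * xl;
  const std::uint64_t u0 = (t0 >> 32) + t1;
  const std::uint64_t u1 = (u0 & kMask32) + t2;
  return (u1 << 32) | (t0 & kMask32);
}

// Version THREE (short pattern): ((t2 + t1) << 32) + t0.
// t2 + t1 may wrap, but a bit lost there would sit at bit 96 or above after
// the shift, so the low 64 bits are unaffected.
constexpr std::uint64_t lowThree(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t xl = x & kMask32, xh = x >> 32;
  const std::uint64_t yl = y & kMask32, yh = y >> 32;
  const std::uint64_t t0 = yl * xl, t1 = yl * xh, t2 = yh * xl;
  return ((t2 + t1) << 32) + t0;
}

static_assert(lowTwo(0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull) ==
                  lowOne(0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull),
              "TWO matches ONE");
static_assert(lowThree(0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull) ==
                  lowOne(0x123456789ABCDEF0ull, 0x0FEDCBA987654321ull),
              "THREE matches ONE");
```

Since all three agree, any of them can be replaced by `ONE` (a truncated full multiply, in the patch) once the partial products are proven to come from the same X and Y.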

craig.topper added a subscriber: craig.topper. · Jan 2 2019, 3:47 PM

craig.topper added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
296	Don't you need to check the type is i64 somewhere? Or did I miss it?

chfast marked 3 inline comments as done. · Jan 2 2019, 3:55 PM

chfast added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
266	As mentioned before, the optimization currently matches the patterns for the low and high parts independently, mostly because I don't yet know the best way to combine them. Currently, for the low part it replaces pattern `TWO` with `ONE`; `THREE` will not work. These are great tests; I will add them to the test suite in a separate review, as suggested.
296	Yes, I should. Is there a way to do this check with `match()`? I have not found any example doing this.
test/Transforms/AggressiveInstCombine/mul128.ll
8 ↗	(On Diff #179921)	@lebedev.ri already suggested how to generate better checks.

This also does nothing to guarantee that all (or most) of the instructions will be removed. They could have additional users.

If we're in 32-bit mode, then the X86 instruction that produces a 128-bit result doesn't exist, so this will get expanded into a bunch of smaller multiplies and adds. Do we produce something as good as or better than what we would get if we left the user code alone?

craig.topper added inline comments. · Jan 2 2019, 4:03 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
296	No. But you can check `I.getType()->isIntegerTy(64)`.

chfast marked 2 inline comments as done. · Jan 3 2019, 9:50 AM

chfast added inline comments.

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Tests added as https://reviews.llvm.org/D56277.

lebedev.ri added inline comments. · Jan 3 2019, 10:27 AM

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	(Yep, now this diff just needs to be based on top of that diff, so the tests show the difference.)

chfast marked 2 inline comments as done. · Jan 3 2019, 10:30 AM

chfast added inline comments.

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Can this be done in Phabricator?

chfast added a parent revision: D56277: AggressiveInstCombine: Add tests for full multiplication pattern match. · Jan 3 2019, 10:31 AM

lebedev.ri added inline comments. · Jan 3 2019, 10:37 AM

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Phabricator only displays the diff you upload. If you used git for this, simply keep these two diffs as two consecutive commits, and upload each one of them separately to their respective reviews. If svn, no idea.

Update with some intermediate changes.

Harbormaster completed remote builds in B26374: Diff 180132. · Jan 3 2019, 1:32 PM

I did a small update.

I rebased the diff on top of the review with tests.

I focused on merging the replacements for the low and high parts. Instead of blindly replacing the pattern with a single multiplication, the strategy is to first try to find an existing multiplication instruction of the desired form. This feels quite "manual", and I also have trouble placing the new mul instruction properly.

The types are not checked yet.
I plan to check the native integer size from the DataLayout, so this transform will be applied for i64 x i64 -> i128 only when i64 is native.

Answering the question: CodeGen generates the same pattern (I fixed some bugs there years ago; I will verify that claim later on). I don't see the benefit of applying this transform if CodeGen is going to revert it, unless you know of an optimization that this might enable.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
254–264	These must be templates, so lambdas will not work until C++14 (which introduces generic lambdas).

Hi again,

I believe I addressed most of the comments. The HI and LO parts are matched independently, but when both are matched they now use the same i128 multiplication.
The DataLayout is now also checked for the maximum integer size: the pass only creates multiplications up to twice the native integer size, e.g. at most an i64 multiplication on 32-bit targets.

The only thing left to do is to address the comment about other uses of intermediate values.
Would the most restrictive approach be to check, for every intermediate value, that its number of uses matches the pattern?
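The size checks described above can be summarized as a small predicate. This is a hedged sketch that mirrors the early exits at the top of foldFullMul() in Diff 185816; the function name is hypothetical, and the `largestLegalIntBits` parameter stands in for the value returned by `DataLayout::getLargestLegalIntTypeSizeInBits()`.

```cpp
// Hypothetical predicate mirroring the early exits in foldFullMul().
// sizeInBits is the width of the value being replaced (64 for the i64 halves
// of an i64 x i64 -> i128 multiply); largestLegalIntBits stands in for
// DataLayout::getLargestLegalIntTypeSizeInBits().
constexpr bool fullMulFoldAllowed(unsigned sizeInBits,
                                  unsigned largestLegalIntBits) {
  // At most 128 bits, so the half-width mask fits in a uint64_t; even widths
  // only, so the low/high split is trivial.
  constexpr unsigned MaxSizeInBits = 128;
  if (sizeInBits > MaxSizeInBits || sizeInBits % 2 != 0)
    return false;
  // The created multiply is 2 * sizeInBits wide, i.e. at most twice the
  // largest legal integer.
  return sizeInBits <= largestLegalIntBits;
}

// i386 datalayout from mul_full_32.ll ("n8:16:32"): the largest legal
// integer is 32 bits, so i32 halves fold but i64 halves do not.
static_assert(fullMulFoldAllowed(32, 32), "i32 x i32 -> i64 on a 32-bit target");
static_assert(!fullMulFoldAllowed(64, 32), "no i128 multiply on a 32-bit target");
```

This matches the expectations in mul_full_32.ll, where `@mul_full_64` is left alone and `@mul_full_32` is folded. On a target whose largest legal integer is 64 bits, `fullMulFoldAllowed(64, 64)` holds, which is the i64 x i64 -> i128 case the patch is aimed at.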

Herald added a project: Restricted Project. · Feb 5 2019, 5:24 AM

Harbormaster completed remote builds in B27732: Diff 185287. · Feb 5 2019, 5:24 AM

Update unit tests.

Harbormaster completed remote builds in B27736: Diff 185297. · Feb 5 2019, 6:28 AM

chfast marked 4 inline comments as done. · Feb 5 2019, 6:30 AM

Quuxplusone added inline comments. · Feb 5 2019, 9:09 AM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
254	> This also does nothing to guarantee that all (or most) of the instructions will be removed. [...] Do we produce something as good as or better than what we would get if we left the user code alone?
> The only thing left to do is to address the comment about other uses of intermediate values. The most restrictive approach would be to check all intermediate values if the number of uses matches the pattern?
IIUC, this is not a problem with _correctness_, right? We are protected against removing an instruction whose output still has live uses? But we're worried that the intermediate outputs will have so many uses that we'll end up generating our MUL and keeping all those intermediate instructions, so the codegen will be bigger than if we'd left it alone. If I've understood the problem correctly, then I think @chfast's proposed solution is correct: do this optimization only if every intermediate result is completely dead (or can be replaced by a corresponding intermediate result of the new code). In the vast majority of cases where we want this optimization to fire, all the intermediate results will be dead.
279	Here and lines 277 and 290: `const auto` should be something else (such as [simply `auto&&`](https://quuxplusone.github.io/blog/2018/12/15/autorefref-always-works/)), or else the `const` applies only to the copy you made of whatever the element type of `U->users()` was. I suspect you actually meant `const auto *` but I'm not sure.
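The `const auto` point can be checked at compile time. In this sketch a local array of `int *` stands in for `U->users()`, which also yields pointers; the `static_assert`s spell out what each loop-variable declaration deduces. The function name is hypothetical.

```cpp
#include <type_traits>

// A local array of int * stands in for U->users(), which also yields pointers.
constexpr int constAutoDemo() {
  int value = 1;
  int *users[] = {&value};

  for (const auto U : users) {
    // `const auto` deduces `int *const`: only the copied pointer is const.
    static_assert(std::is_same<decltype(U), int *const>::value, "int *const");
    *U = 2; // Still compiles: the pointee is mutable.
  }
  for (const auto *U : users) {
    // `const auto *` deduces `const int *`: the pointee is read-only.
    static_assert(std::is_same<decltype(U), const int *>::value, "const int *");
    (void)U;
  }
  for (auto &&U : users) {
    // `auto &&` binds a reference to the element itself: `int *&` here.
    static_assert(std::is_same<decltype(U), int *&>::value, "int *&");
    (void)U;
  }
  return value;
}

static_assert(constAutoDemo() == 2, "written through a `const auto` pointer");
```

So `const auto U` does not protect the `User` object at all; `const auto *U` is the spelling that makes the pointee const.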

Another round of changes.

I fixed some small defects and added more tests.

I'm also checking the number of uses of the different intermediate values. However, this check is not perfect; it is the best I could do within the current design of the pass. The main problem is that I try to match two different patterns: mullo (which actually has two variants) and umulhi. Depending on whether they occur together, the expected use counts differ. Let me know what you think about the current code.

Harbormaster completed remote builds in B27901: Diff 185816. · Feb 7 2019, 10:43 AM

chfast marked an inline comment as done. · Feb 7 2019, 10:45 AM

Quuxplusone added inline comments. · Feb 8 2019, 7:29 AM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
391	Nit: This expression isn't intuitively clear to me. Also, I would write `uint64_t{0}` as `uint64_t(0)`. I think you're getting really lucky here that `MaxSizeInBits / 2` just happens to be the same number of bits (64) as the width of `uint64_t{0}`; otherwise this math would be wrong. How about `assert(HalfSizeInBits <= 64); const auto LowMask = m_SpecificInt((uint64_t(1) << (HalfSizeInBits-1) << 1) - 1);`, or an existing utility function, if there is one, that computes `uint64_t(1) << HalfSizeInBits` directly?
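A compile-time check that the patch's mask expression and the suggested rewrite agree for every half width the transform can see (1 to 64). The function names are hypothetical, and `MaxSizeInBits = 128` is substituted into the patch's form. Splitting the shift as `<< (h - 1) << 1` avoids shifting a 64-bit value by 64, which is undefined behaviour in C++. LLVM's `llvm/Support/MathExtras.h` also provides `maskTrailingOnes<T>(N)`, which may be the kind of utility the comment asks about; the sketch does not depend on it.

```cpp
#include <cstdint>

// The patch's expression with MaxSizeInBits = 128 substituted. Valid for
// 1 <= h <= 64, where the shift amount 64 - h stays in [0, 63].
constexpr std::uint64_t lowMaskPatch(unsigned h) {
  return ~std::uint64_t{0} >> ((128 / 2) - h);
}

// The suggested form. Requires h >= 1; splitting the shift avoids the
// undefined shift by 64 when h == 64.
constexpr std::uint64_t lowMaskSuggested(unsigned h) {
  return (std::uint64_t(1) << (h - 1) << 1) - 1;
}

// Compare both forms for every half width the transform allows.
constexpr bool masksAgree() {
  for (unsigned h = 1; h <= 64; ++h)
    if (lowMaskPatch(h) != lowMaskSuggested(h))
      return false;
  return true;
}

static_assert(masksAgree(), "both forms set exactly the low h bits");
static_assert(lowMaskSuggested(32) == 0xFFFFFFFFull, "HalfSizeInBits == 32");
```

Both forms are correct within the patch's limits; the suggested one just removes the hidden dependency between `MaxSizeInBits / 2` and the width of `uint64_t`.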

@chfast What happened to this patch? I was looking at https://llvm.org/PR36243 and wondering if it'd be worth AggressiveInstCombine/InstCombine doing something similar for adds (in that case a 3 chain add i32 to add i96).

Herald added a project: Restricted Project. · Mar 13 2022, 8:31 AM

In D56214#3377947, @RKSimon wrote:

@chfast What happened to this patch? I was looking at https://llvm.org/PR36243 and wondering if it'd be worth AggressiveInstCombine/InstCombine doing something similar for adds (in that case a 3 chain add i32 to add i96).

I don't remember exactly, but I think I never received a clear answer on whether this is worth the complexity.

On the technical level, we should pick a cut-off point. For example, it may be OK to do the transformation for i128 when i64 is a native type (based on the data layout?).

So in the addc case, I don't think it makes sense to match i96 given that the biggest native type is i32. Otherwise, you will be matching a lot of multi-precision integer code and moving the work to legalization.

In my experience, LLVM has recently become pretty good at handling multi-precision workloads without builtins. However, I'm missing a generic addc/subc intrinsic (`__builtin_addc` is implemented by two uaddos).
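As a plain C++ illustration of the last two points (all names are hypothetical, and no compiler builtin is called): an add-with-carry built from two unsigned add-with-overflow steps, matching the "two uaddos" shape described above, chained over three 32-bit limbs like the i32-to-i96 add chain from PR36243.

```cpp
#include <cstdint>

// Hypothetical add-with-carry on 32-bit limbs; carryIn must be 0 or 1.
struct AddCarry {
  std::uint32_t sum;
  std::uint32_t carry;
};

constexpr AddCarry addc(std::uint32_t a, std::uint32_t b,
                        std::uint32_t carryIn) {
  const std::uint32_t s1 = a + b;
  const bool c1 = s1 < a; // first unsigned-overflow check
  const std::uint32_t s2 = s1 + carryIn;
  const bool c2 = s2 < s1; // second unsigned-overflow check
  // At most one of c1 and c2 can be set, so OR-ing them is exact.
  return {s2, static_cast<std::uint32_t>(c1 || c2)};
}

// A 96-bit value as three 32-bit limbs, least significant first.
struct U96 {
  std::uint32_t limb[3];
};

constexpr U96 add96(U96 x, U96 y) {
  U96 r{};
  std::uint32_t carry = 0;
  for (int i = 0; i < 3; ++i) {
    const AddCarry step = addc(x.limb[i], y.limb[i], carry);
    r.limb[i] = step.sum;
    carry = step.carry;
  }
  return r; // The carry out of bit 95 is dropped (wraps modulo 2^96).
}

// (2^64 - 1) + 1 = 2^64: the carry ripples into the top limb.
constexpr U96 kCarryChain =
    add96(U96{{0xFFFFFFFFu, 0xFFFFFFFFu, 0u}}, U96{{1u, 0u, 0u}});
static_assert(kCarryChain.limb[2] == 1, "carry reaches the top limb");
```

Each limb only needs native 32-bit operations, which is the argument for not fusing such a chain into a single i96 add on a target whose widest legal integer is i32.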

RKSimon mentioned this in D136015: [InstCombine] Fold series of instructions into mull. · Oct 21 2022, 8:17 AM

chfast mentioned this in rG119c34e7f9c6: [InstCombine][test] Add tests for mul combinations. · Oct 22 2022, 7:26 AM

@chfast What do you want to do with this patch now that D136015 landed?

In D56214#3885105, @RKSimon wrote:

@chfast What do you want to do with this patch now that D136015 landed?

I don't really need it any more (I was overoptimistic that this would be a portable way to get access to the 64x64→128 mul instruction). But if you think the change is good and has no flaws, I can finish it in my free time. Should it be moved to InstCombine then?

Probably abandon it for now? You can always resurrect it if you find a compelling case.

RKSimon resigned from this revision. · Dec 5 2022, 7:07 AM

This review may be stuck or dead; consider abandoning it if it is no longer relevant.
Removing myself as a reviewer in an attempt to clean up the dashboard.

Herald added a subscriber: StephenFan. · Jan 12 2023, 5:32 PM

chfast abandoned this revision. · Jan 13 2023, 12:59 AM

Revision Contents

Path	Size
lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	236 lines
test/Transforms/AggressiveInstCombine/mul_full_32.ll	31 lines
test/Transforms/AggressiveInstCombine/mul_full_64.ll	342 lines

Diff 185816

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

[245 lines not shown]	static bool foldAnyOrAllBitsSet(Instruction &I) {
  Value *And = Builder.CreateAnd(MOps.Root, Mask);		  Value *And = Builder.CreateAnd(MOps.Root, Mask);
  Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)		  Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)
                               : Builder.CreateIsNotNull(And);		                               : Builder.CreateIsNotNull(And);
  Value *Zext = Builder.CreateZExt(Cmp, I.getType());		  Value *Zext = Builder.CreateZExt(Cmp, I.getType());
  I.replaceAllUsesWith(Zext);		  I.replaceAllUsesWith(Zext);
  return true;		  return true;
}		}

		/// Finds the first instruction after both A and B.
		/// A and B are assumed to be either Instruction or Argument.
		static Instruction *getInstructionAfter(Value *A, Value *B, DominatorTree &DT) {
		  // TODO: Is there a better way to achieve this?
		  Instruction *I = nullptr;

		  if (auto *AI = dyn_cast<Instruction>(A))
		    I = AI->getNextNode();
		  else // If A is an Argument, use the first insertion point in the entry block.
		    I = &*cast<Argument>(A)->getParent()->front().getFirstInsertionPt();

		  auto *BI = dyn_cast<Instruction>(B);
		  if (BI && (BI == I || DT.dominates(I, BI)))
		    I = BI->getNextNode(); // After B.

		  return I;
		}

		/// Tries to find the full multiplication instruction pattern:
		/// mul(zext(X), zext(Y)).
		static Value *findFullMul(Value *X, Value *Y) {
		  auto *FullTy = IntegerType::get(X->getContext(),
		                                  X->getType()->getPrimitiveSizeInBits() * 2);
		  for (auto *U : X->users()) {
		    if (U->getType() == FullTy && match(U, m_ZExt(m_Specific(X)))) {
		      for (auto *V : U->users()) {
		        if (match(V, m_c_Mul(m_Specific(U), m_ZExt(m_Specific(Y)))))
		          return V;
		      }
		    }
		  }
		  return nullptr;
		}

		/// Tries to find the instruction mul(X, Y).
		static Value *findLowMul(Value *X, Value *Y) {
		  for (auto *U : X->users()) {
		    if (match(U, m_c_Mul(m_Specific(X), m_Specific(Y))))
		      return U;
		  }
		  return nullptr;
		}

		/// Tries to find a mul with X, Y as arguments. Creates a new one if not found.
		static Value *findOrCreateLowMul(Instruction &I, Value *X, Value *Y,
		                                 DominatorTree &DT) {
		  if (auto *Mul = findLowMul(X, Y))
		    return Mul;

		  if (auto *FullMul = findFullMul(X, Y)) {
		    IRBuilder<> Builder{&I};
		    return Builder.CreateTrunc(FullMul, X->getType(), "fullmul.lo");
		  }

		  // Create the multiplication instruction and place it just after its
		  // operands. This is the highest possible position, so it is safe to use
		  // as a replacement for all future matched patterns.
		  IRBuilder<> Builder{getInstructionAfter(X, Y, DT)};
		  return Builder.CreateMul(X, Y, "mul");
		}

		/// Tries to find the full mul with X, Y as arguments. If not found, it creates
		/// a new one. It also replaces the low mul, if found.
		static Value *findOrCreateFullMul(Value *X, Value *Y, DominatorTree &DT) {
		  if (auto *Mul = findFullMul(X, Y))
		    return Mul;

		  auto *MulTy = IntegerType::get(X->getContext(),
		                                 X->getType()->getPrimitiveSizeInBits() * 2);
		  IRBuilder<> Builder{getInstructionAfter(X, Y, DT)};
		  auto *FullMul = Builder.CreateNUWMul(
		      Builder.CreateZExt(X, MulTy, {"fullmul.", X->getName()}),
		      Builder.CreateZExt(Y, MulTy, {"fullmul.", Y->getName()}), "fullmul");

		  // If a low mul exists, replace it with a truncation of the full mul.
		  if (auto *LowMul = findLowMul(X, Y)) {
		    auto *FullMulLo =
		        Builder.CreateTrunc(FullMul, LowMul->getType(), "fullmul.lo");
		    LowMul->replaceAllUsesWith(FullMulLo);
		  }

		  return FullMul;
		}

		/// Matches the following pattern producing full multiplication:
		///
		/// %xl = and i64 %x, 4294967295
		/// %xh = lshr i64 %x, 32
		/// %yl = and i64 %y, 4294967295
		/// %yh = lshr i64 %y, 32
		///
		/// %t0 = mul nuw i64 %yl, %xl
		/// %t1 = mul nuw i64 %yl, %xh
		/// %t2 = mul nuw i64 %yh, %xl
		/// %t3 = mul nuw i64 %yh, %xh
		///
		/// %t0l = and i64 %t0, 4294967295
		/// %t0h = lshr i64 %t0, 32
		///
		/// %u0 = add i64 %t0h, %t1
		/// %u0l = and i64 %u0, 4294967295
		/// %u0h = lshr i64 %u0, 32
		///
		/// %u1 = add i64 %u0l, %t2
		/// %u1ls = shl i64 %u1, 32
		/// %u1h = lshr i64 %u1, 32
		///
		/// %u2 = add i64 %u0h, %t3
		///
		/// %lo = or i64 %u1ls, %t0l
		/// %hi = add i64 %u2, %u1h
		///
		static bool foldFullMul(Instruction &I, const DataLayout &DL,
		                        DominatorTree &DT) {
		  // We limit this to 128 bits so that the low-part mask is at most 64 bits
		  // (an m_SpecificInt() matcher limitation).
		  static constexpr unsigned MaxSizeInBits = 128;

		  auto *Ty = I.getType();
		  if (!Ty->isIntegerTy())
		    return false;

		  // Check the integer type size.
		  // Also make sure the size in bits is even to make the low-high split trivial.
		  const unsigned SizeInBits = Ty->getPrimitiveSizeInBits();
		  if (SizeInBits > MaxSizeInBits || SizeInBits % 2 != 0)
		    return false;

		  // Skip integers bigger than native.
		  if (SizeInBits > DL.getLargestLegalIntTypeSizeInBits())
		    return false;

		  const unsigned HalfSizeInBits = SizeInBits / 2; // Max 64.
		  const auto Half = m_SpecificInt(HalfSizeInBits);
		  const auto LowMask =
		      m_SpecificInt(~uint64_t{0} >> ((MaxSizeInBits / 2) - HalfSizeInBits));

		  Value *X = nullptr;
		  Value *Y = nullptr;
		  Value *T0 = nullptr;
		  Value *T1 = nullptr;
		  Value *T2 = nullptr;
		  Value *T3 = nullptr;
		  Value *U0 = nullptr;

		  // Match the low part of the full multiplication.
		  //
		  // First we match up to the multiplications t0, t1, t2.
		  // t0 is reachable by two edges and we _assume_ it's the same node,
		  // but in general it does not have to be.
		  //
		  // The long pattern is: ((t2 + lo(t1 + hi(t0))) << 32) | lo(t0).
		  bool LowLongPattern =
		      match(&I,
		            m_c_Or(m_OneUse(m_And(m_Value(T0), LowMask)),
		                   m_OneUse(m_Shl(
		                       m_c_Add(m_OneUse(m_And(
		                                   m_c_Add(m_OneUse(m_LShr(m_Deferred(T0), Half)),
		                                           m_OneUse(m_Value(T1))),
		                                   LowMask)),
		                               m_OneUse(m_Value(T2))),
		                       Half)))) &&
		      !T0->hasNUsesOrMore(3);

		  // The short pattern is: ((t2 + t1) << Half) + t0.
		  bool LowShortPattern =
		      match(&I,
		            m_c_Add(m_Value(T0),
		                    m_OneUse(m_Shl(m_OneUse(m_c_Add(m_Value(T1), m_Value(T2))),
		                                   Half)))) &&
		      !T0->hasNUsesOrMore(3) && !T1->hasNUsesOrMore(3) &&
		      !T2->hasNUsesOrMore(3);

		  if (LowLongPattern || LowShortPattern) {
		    // 1. Match t1 and remember its arguments. Start with t1 because it is asymmetric.
		    // 2. Require t2 to be a swapped version of t1.
		    // 3. Require t0 to have the same arguments as t1.
		    if (match(T1,
		              m_c_Mul(m_LShr(m_Value(X), Half), m_And(m_Value(Y), LowMask))) &&
		        match(T2, m_c_Mul(m_And(m_Specific(X), LowMask),
		                          m_LShr(m_Specific(Y), Half))) &&
		        match(T0, m_c_Mul(m_And(m_Specific(X), LowMask),
		                          m_And(m_Specific(Y), LowMask)))) {
		      // Replace with a single multiplication.
		      auto Mul = findOrCreateLowMul(I, X, Y, DT);
		      IRBuilder<> Builder{&I};
		      auto Low = Builder.CreateTrunc(Mul, Ty, "fullmul.lo");
		      I.replaceAllUsesWith(Low);
		      return true;
		    }
		  }

		  // Match the high part of the full multiplication.
		  //
		  // First we match up to the multiplications t2 and t3 and the u0 node.
		  // Then we check the u0 node.
		  // Finally we check all 4 multiplications, starting from the asymmetric
		  // ones, the same way as when matching the low part.
		  if (match(&I, m_c_Add(m_OneUse(m_LShr(
		                            m_c_Add(m_And(m_Value(U0), LowMask), m_Value(T2)),
		                            Half)),
		                        m_OneUse(m_c_Add(m_LShr(m_Deferred(U0), Half),
		                                         m_OneUse(m_Value(T3)))))) &&
		      match(U0, m_c_Add(m_LShr(m_Value(T0), Half), m_Value(T1))) &&
		      !T2->hasNUsesOrMore(3) && !T1->hasNUsesOrMore(3) &&
		      !T0->hasNUsesOrMore(3)) {

		    if (match(T1,
		              m_c_Mul(m_LShr(m_Value(X), Half), m_And(m_Value(Y), LowMask))) &&
		        match(T2, m_c_Mul(m_And(m_Specific(X), LowMask),
		                          m_LShr(m_Specific(Y), Half))) &&
		        match(T0, m_c_Mul(m_And(m_Specific(X), LowMask),
		                          m_And(m_Specific(Y), LowMask))) &&
		        match(T3, m_c_Mul(m_LShr(m_Specific(X), Half),
		                          m_LShr(m_Specific(Y), Half)))) {
		      auto mul = findOrCreateFullMul(X, Y, DT);
		      IRBuilder<> Builder{&I};
		      auto hi = Builder.CreateTrunc(Builder.CreateLShr(mul, HalfSizeInBits), Ty,
		                                    "fullmul.hi");
		      I.replaceAllUsesWith(hi);
		      return true;
		    }
		  }

		  return false;
		}

/// This is the entry point for folds that could be implemented in regular		/// This is the entry point for folds that could be implemented in regular
/// InstCombine, but they are separated because they are not expected to		/// InstCombine, but they are separated because they are not expected to
/// occur frequently and/or have more than a constant-length pattern match.		/// occur frequently and/or have more than a constant-length pattern match.
static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {		static bool foldUnusualPatterns(Function &F, const DataLayout &DL,
		                                DominatorTree &DT) {
  bool MadeChange = false;		  bool MadeChange = false;
  for (BasicBlock &BB : F) {		  for (BasicBlock &BB : F) {
    // Ignore unreachable basic blocks.		    // Ignore unreachable basic blocks.
    if (!DT.isReachableFromEntry(&BB))		    if (!DT.isReachableFromEntry(&BB))
      continue;		      continue;
    // Do not delete instructions under here and invalidate the iterator.		    // Do not delete instructions under here and invalidate the iterator.
    // Walk the block backwards for efficiency. We're matching a chain of		    // Walk the block backwards for efficiency. We're matching a chain of
    // use->defs, so we're more likely to succeed by starting from the bottom.		    // use->defs, so we're more likely to succeed by starting from the bottom.
    // Also, we want to avoid matching partial patterns.		    // Also, we want to avoid matching partial patterns.
    // TODO: It would be more efficient if we removed dead instructions		    // TODO: It would be more efficient if we removed dead instructions
    // iteratively in this loop rather than waiting until the end.		    // iteratively in this loop rather than waiting until the end.
    for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {		    for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
      MadeChange |= foldAnyOrAllBitsSet(I);		      MadeChange |= foldAnyOrAllBitsSet(I);
      MadeChange |= foldGuardedRotateToFunnelShift(I);		      MadeChange |= foldGuardedRotateToFunnelShift(I);
		      MadeChange |= foldFullMul(I, DL, DT);
    }		    }
  }		  }

  // We're done with transforms, so remove dead instructions.		  // We're done with transforms, so remove dead instructions.
  if (MadeChange)		  if (MadeChange)
    for (BasicBlock &BB : F)		    for (BasicBlock &BB : F)
      SimplifyInstructionsInBlock(&BB);		      SimplifyInstructionsInBlock(&BB);

  return MadeChange;		  return MadeChange;
}		}

/// This is the entry point for all transforms. Pass manager differences are		/// This is the entry point for all transforms. Pass manager differences are
/// handled in the callers of this function.		/// handled in the callers of this function.
static bool runImpl(Function &F, TargetLibraryInfo &TLI, DominatorTree &DT) {		static bool runImpl(Function &F, TargetLibraryInfo &TLI, DominatorTree &DT) {
  bool MadeChange = false;		  bool MadeChange = false;
  const DataLayout &DL = F.getParent()->getDataLayout();		  const DataLayout &DL = F.getParent()->getDataLayout();
  TruncInstCombine TIC(TLI, DL, DT);		  TruncInstCombine TIC(TLI, DL, DT);
  MadeChange |= TIC.run(F);		  MadeChange |= TIC.run(F);
  MadeChange |= foldUnusualPatterns(F, DT);		  MadeChange |= foldUnusualPatterns(F, DL, DT);
  return MadeChange;		  return MadeChange;
}		}

void AggressiveInstCombinerLegacyPass::getAnalysisUsage(		void AggressiveInstCombinerLegacyPass::getAnalysisUsage(
    AnalysisUsage &AU) const {		    AnalysisUsage &AU) const {
  AU.setPreservesCFG();		  AU.setPreservesCFG();
  AU.addRequired<DominatorTreeWrapperPass>();		  AU.addRequired<DominatorTreeWrapperPass>();
  AU.addRequired<TargetLibraryInfoWrapperPass>();		  AU.addRequired<TargetLibraryInfoWrapperPass>();
[53 lines not shown]

test/Transforms/AggressiveInstCombine/mul_full_32.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s		; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"		target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
target triple = "i386-unknown-linux-gnu"		target triple = "i386-unknown-linux-gnu"

		; This one should not be optimized. We don't want produce mul i128 when
		; the biggest native integer is 32.
define { i64, i64 } @mul_full_64(i64 %x, i64 %y) {		define { i64, i64 } @mul_full_64(i64 %x, i64 %y) {
; CHECK-LABEL: @mul_full_64(		; CHECK-LABEL: @mul_full_64(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295		; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]		; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]
[… 42 lines not shown …]	;

%res_lo = insertvalue { i64, i64 } undef, i64 %lo, 0		%res_lo = insertvalue { i64, i64 } undef, i64 %lo, 0
%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1		%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1
ret { i64, i64 } %res		ret { i64, i64 } %res
}		}
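The comment above ties the decision to the target's native integer widths, which the datalayout lists in its `n` component (`n8:16:32` for this i386 layout, `n8:16:32:64` in `mul_full_64.ll`). Both test files are consistent with a rule of "fold an N x N -> 2N multiply only when N is a native width", but that rule is an assumption here; this excerpt of the patch does not show the actual check. The sketch parses the `n` component from a layout string. Inside LLVM the same question would normally be asked through `DataLayout::isLegalInteger` rather than by parsing; `nativeIntWidths` and `shouldFoldFullMul` are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the native integer widths listed in the "n"
// component of an LLVM datalayout string, e.g. "n8:16:32" -> {8, 16, 32}.
std::vector<unsigned> nativeIntWidths(const std::string &layout) {
  std::vector<unsigned> widths;
  std::stringstream components(layout);
  std::string comp;
  while (std::getline(components, comp, '-')) {
    // Only "n<w>:<w>:..." lists native widths; "ni:..." names non-integral
    // address spaces and is skipped.
    if (comp.size() < 2 || comp[0] != 'n' || comp[1] == 'i') continue;
    std::stringstream sizes(comp.substr(1));
    std::string size;
    while (std::getline(sizes, size, ':'))
      widths.push_back(static_cast<unsigned>(std::stoul(size)));
  }
  return widths;
}

// Assumed rule (consistent with mul_full_32.ll and mul_full_64.ll, not taken
// from the patch): fold an N x N -> 2N full multiply only if N is native.
bool shouldFoldFullMul(const std::string &layout, unsigned operandBits) {
  for (unsigned w : nativeIntWidths(layout))
    if (w == operandBits) return true;
  return false;
}
```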

define { i32, i32 } @mul_full_32(i32 %x, i32 %y) {		define { i32, i32 } @mul_full_32(i32 %x, i32 %y) {
; CHECK-LABEL: @mul_full_32(		; CHECK-LABEL: @mul_full_32(
; CHECK-NEXT: [[XL:%.*]] = and i32 [[X:%.*]], 65535		; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i32 [[X:%.*]] to i64
; CHECK-NEXT: [[XH:%.*]] = lshr i32 [[X]], 16		; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i32 [[Y:%.*]] to i64
; CHECK-NEXT: [[YL:%.*]] = and i32 [[Y:%.*]], 65535		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i64 [[FULLMUL_X]], [[FULLMUL_Y]]
; CHECK-NEXT: [[YH:%.*]] = lshr i32 [[Y]], 16		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i64 [[FULLMUL]] to i32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i32 [[YL]], [[XL]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[FULLMUL]], 16
; CHECK-NEXT: [[T1:%.*]] = mul nuw i32 [[YL]], [[XH]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i64 [[TMP1]] to i32
; CHECK-NEXT: [[T2:%.*]] = mul nuw i32 [[YH]], [[XL]]		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i32, i32 } undef, i32 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[T3:%.*]] = mul nuw i32 [[YH]], [[XH]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i32, i32 } [[RES_LO]], i32 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[T0L:%.*]] = and i32 [[T0]], 65535
; CHECK-NEXT: [[T0H:%.*]] = lshr i32 [[T0]], 16
; CHECK-NEXT: [[U0:%.*]] = add i32 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i32 [[U0]], 65535
; CHECK-NEXT: [[U0H:%.*]] = lshr i32 [[U0]], 16
; CHECK-NEXT: [[U1:%.*]] = add i32 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i32 [[U1]], 16
; CHECK-NEXT: [[U1H:%.*]] = lshr i32 [[U1]], 16
; CHECK-NEXT: [[U2:%.*]] = add i32 [[U0H]], [[T3]]
; CHECK-NEXT: [[LO:%.*]] = or i32 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[HI:%.*]] = add i32 [[U2]], [[U1H]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i32, i32 } undef, i32 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i32, i32 } [[RES_LO]], i32 [[HI]], 1
; CHECK-NEXT: ret { i32, i32 } [[RES]]		; CHECK-NEXT: ret { i32, i32 } [[RES]]
;		;
%xl = and i32 %x, 65535		%xl = and i32 %x, 65535
%xh = lshr i32 %x, 16		%xh = lshr i32 %x, 16
%yl = and i32 %y, 65535		%yl = and i32 %y, 65535
%yh = lshr i32 %y, 16		%yh = lshr i32 %y, 16

%t0 = mul nuw i32 %yl, %xl		%t0 = mul nuw i32 %yl, %xl
[… 24 lines not shown …]
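Once folded, both halves come from a single double-width product: the low half is a plain truncation, and the high half is the product shifted right by the operand width (32 bits for i32 operands, 64 bits for i64 operands) and then truncated. A minimal C++ sketch of that folded i32 x i32 -> i64 form; `mulFull32Folded` is a hypothetical name.

```cpp
#include <cstdint>
#include <utility>

// Folded i32 x i32 -> i64 full multiply: zext both operands, multiply once,
// then split the wide product. Returns {lo, hi}.
std::pair<uint32_t, uint32_t> mulFull32Folded(uint32_t x, uint32_t y) {
  // Cannot overflow: (2^32 - 1)^2 < 2^64.
  uint64_t full = static_cast<uint64_t>(x) * static_cast<uint64_t>(y);
  uint32_t lo = static_cast<uint32_t>(full);        // trunc
  uint32_t hi = static_cast<uint32_t>(full >> 32);  // lshr by the operand width, then trunc
  return {lo, hi};
}
```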

test/Transforms/AggressiveInstCombine/mul_full_64.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s		; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

define { i64, i64 } @mul_full_64_variant0(i64 %x, i64 %y) {		define { i64, i64 } @mul_full_64_variant0(i64 %x, i64 %y) {
; CHECK-LABEL: @mul_full_64_variant0(		; CHECK-LABEL: @mul_full_64_variant0(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i64 [[X:%.*]] to i128
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i64 [[Y:%.*]] to i128
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_X]], [[FULLMUL_Y]]
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
; CHECK-NEXT: ret { i64, i64 } [[RES]]		; CHECK-NEXT: ret { i64, i64 } [[RES]]
;		;
%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32

%t0 = mul nuw i64 %yl, %xl		%t0 = mul nuw i64 %yl, %xl
[… 46 lines not shown …]
; return (uint64_t(lo(rlh + lo(rhl + hi(rll)))) << 32) + lo(rll);		; return (uint64_t(lo(rlh + lo(rhl + hi(rll)))) << 32) + lo(rll);
; #elif THREE		; #elif THREE
; return ((rlh + rhl) << 32) + rll;		; return ((rlh + rhl) << 32) + rll;
; #endif		; #endif
; }		; }

define i64 @mul_full_64_variant1(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant1(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant1(		; CHECK-LABEL: @mul_full_64_variant1(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL_A:%.*]] = zext i64 [[A:%.*]] to i128
; CHECK-NEXT: [[SHR_I43:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[FULLMUL_B:%.*]] = zext i64 [[B:%.*]] to i128
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_A]], [[FULLMUL_B]]
; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I41]], [[SHR_I43]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I43]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I41]], [[CONV]]		; CHECK-NEXT: store i64 [[FULLMUL_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]		; CHECK-NEXT: ret i64 [[FULLMUL_LO]]
; CHECK-NEXT: [[SHR_I40:%.*]] = lshr i64 [[MUL7]], 32
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I40]], [[MUL5]]
; CHECK-NEXT: [[SHR_I39:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I39]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MULLO:%.*]] = mul i64 [[B]], [[A]]
; CHECK-NEXT: ret i64 [[MULLO]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i43 = lshr i64 %a, 32		%shr.i43 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i41 = lshr i64 %b, 32		%shr.i41 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i41, %shr.i43		%mul = mul nuw i64 %shr.i41, %shr.i43
%mul5 = mul nuw i64 %conv3, %shr.i43		%mul5 = mul nuw i64 %conv3, %shr.i43
%mul6 = mul nuw i64 %shr.i41, %conv		%mul6 = mul nuw i64 %shr.i41, %conv
%mul7 = mul nuw i64 %conv3, %conv		%mul7 = mul nuw i64 %conv3, %conv
%shr.i40 = lshr i64 %mul7, 32		%shr.i40 = lshr i64 %mul7, 32
%add = add i64 %shr.i40, %mul5		%add = add i64 %shr.i40, %mul5
%shr.i39 = lshr i64 %add, 32		%shr.i39 = lshr i64 %add, 32
%add10 = add i64 %shr.i39, %mul		%add10 = add i64 %shr.i39, %mul
%conv14 = and i64 %add, 4294967295		%conv14 = and i64 %add, 4294967295
%add15 = add i64 %conv14, %mul6		%add15 = add i64 %conv14, %mul6
%shr.i = lshr i64 %add15, 32		%shr.i = lshr i64 %add15, 32
%add17 = add i64 %add10, %shr.i		%add17 = add i64 %add10, %shr.i
store i64 %add17, i64* %rhi, align 8		store i64 %add17, i64* %rhi, align 8
%mullo = mul i64 %b, %a		%mullo = mul i64 %b, %a
ret i64 %mullo		ret i64 %mullo
}		}
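`@mul_full_64_variant1` computes only the high word with the partial-product chain and takes the low word from a plain `mul i64`. The C++ rendering below follows the IR instruction by instruction (comments give the matching IR names) so the carry chain can be checked against a known result; `mulFull64Variant1` is a hypothetical name.

```cpp
#include <cstdint>

// Line-by-line C++ rendering of @mul_full_64_variant1: the high word is
// stored through rhi, the low word is an ordinary wrapping 64-bit multiply.
uint64_t mulFull64Variant1(uint64_t a, uint64_t b, uint64_t *rhi) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t al = a & mask, ah = a >> 32;  // %conv, %shr.i43
  uint64_t bl = b & mask, bh = b >> 32;  // %conv3, %shr.i41
  uint64_t mul = bh * ah;                // %mul
  uint64_t mul5 = bl * ah;               // %mul5
  uint64_t mul6 = bh * al;               // %mul6
  uint64_t mul7 = bl * al;               // %mul7
  uint64_t add = (mul7 >> 32) + mul5;    // %shr.i40, %add
  uint64_t add10 = (add >> 32) + mul;    // %shr.i39, %add10
  uint64_t add15 = (add & mask) + mul6;  // %conv14, %add15
  *rhi = add10 + (add15 >> 32);          // %shr.i, %add17
  return b * a;                          // %mullo
}
```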

define i64 @mul_full_64_variant2(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant2(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant2(		; CHECK-LABEL: @mul_full_64_variant2(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL_A:%.*]] = zext i64 [[A:%.*]] to i128
; CHECK-NEXT: [[SHR_I58:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[FULLMUL_B:%.*]] = zext i64 [[B:%.*]] to i128
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_A]], [[FULLMUL_B]]
; CHECK-NEXT: [[SHR_I56:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I56]], [[SHR_I58]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I58]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I56]], [[CONV]]		; CHECK-NEXT: store i64 [[FULLMUL_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]		; CHECK-NEXT: ret i64 [[FULLMUL_LO]]
; CHECK-NEXT: [[SHR_I55:%.*]] = lshr i64 [[MUL7]], 32
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I55]], [[MUL5]]
; CHECK-NEXT: [[SHR_I54:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I54]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I51:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I51]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[CONV24:%.*]] = shl i64 [[ADD15]], 32
; CHECK-NEXT: [[CONV26:%.*]] = and i64 [[MUL7]], 4294967295
; CHECK-NEXT: [[ADD27:%.*]] = or i64 [[CONV24]], [[CONV26]]
; CHECK-NEXT: ret i64 [[ADD27]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i58 = lshr i64 %a, 32		%shr.i58 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i56 = lshr i64 %b, 32		%shr.i56 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i56, %shr.i58		%mul = mul nuw i64 %shr.i56, %shr.i58
%mul5 = mul nuw i64 %conv3, %shr.i58		%mul5 = mul nuw i64 %conv3, %shr.i58
%mul6 = mul nuw i64 %shr.i56, %conv		%mul6 = mul nuw i64 %shr.i56, %conv
[… 10 lines not shown …]	;
%conv24 = shl i64 %add15, 32		%conv24 = shl i64 %add15, 32
%conv26 = and i64 %mul7, 4294967295		%conv26 = and i64 %mul7, 4294967295
%add27 = or i64 %conv24, %conv26		%add27 = or i64 %conv24, %conv26
ret i64 %add27		ret i64 %add27
}		}
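`@mul_full_64_variant2` also rebuilds the low word from the partial products: `%add15` holds the middle 32-bit column plus its carry, so shifting it left by 32 and OR-ing in the low half of `%mul7` yields exactly the bits of a truncating 64-bit multiply. That equivalence is what allows the low part to be replaced by a `trunc` of the wide product. A sketch of just the low-word computation; `mulLoVariant2` is a hypothetical name.

```cpp
#include <cstdint>

// Low-word computation of @mul_full_64_variant2, instruction by instruction.
uint64_t mulLoVariant2(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t al = a & mask, ah = a >> 32;  // %conv, %shr.i58
  uint64_t bl = b & mask, bh = b >> 32;  // %conv3, %shr.i56
  uint64_t mul5 = bl * ah;               // %mul5
  uint64_t mul6 = bh * al;               // %mul6
  uint64_t mul7 = bl * al;               // %mul7
  uint64_t add = (mul7 >> 32) + mul5;    // %add
  uint64_t add15 = (add & mask) + mul6;  // %add15
  // %conv24 | %conv26: the shifted value has zero low bits, so OR acts as ADD.
  return (add15 << 32) | (mul7 & mask);  // %add27
}
```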

define i64 @mul_full_64_variant3(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant3(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant3(		; CHECK-LABEL: @mul_full_64_variant3(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL_A:%.*]] = zext i64 [[A:%.*]] to i128
; CHECK-NEXT: [[SHR_I45:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[FULLMUL_B:%.*]] = zext i64 [[B:%.*]] to i128
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_A]], [[FULLMUL_B]]
; CHECK-NEXT: [[SHR_I43:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I43]], [[SHR_I45]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I45]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I43]], [[CONV]]		; CHECK-NEXT: store i64 [[FULLMUL_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]		; CHECK-NEXT: ret i64 [[FULLMUL_LO]]
; CHECK-NEXT: [[SHR_I42:%.*]] = lshr i64 [[MUL7]], 32
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I42]], [[MUL5]]
; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I41]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[ADD18:%.*]] = add i64 [[MUL6]], [[MUL5]]
; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADD18]], 32
; CHECK-NEXT: [[ADD19:%.*]] = add i64 [[SHL]], [[MUL7]]
; CHECK-NEXT: ret i64 [[ADD19]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i45 = lshr i64 %a, 32		%shr.i45 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i43 = lshr i64 %b, 32		%shr.i43 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i43, %shr.i45		%mul = mul nuw i64 %shr.i43, %shr.i45
%mul5 = mul nuw i64 %conv3, %shr.i45		%mul5 = mul nuw i64 %conv3, %shr.i45
%mul6 = mul nuw i64 %shr.i43, %conv		%mul6 = mul nuw i64 %shr.i43, %conv
[… 11 lines not shown …]	;
%shl = shl i64 %add18, 32		%shl = shl i64 %add18, 32
%add19 = add i64 %shl, %mul7		%add19 = add i64 %shl, %mul7
ret i64 %add19		ret i64 %add19
}		}
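`@mul_full_64_variant3` (which appears to match the `THREE` branch of the C source quoted above) skips the carry bookkeeping for the low word entirely. Modulo 2^64, `a * b = al*bl + (ah*bl + al*bh) * 2^32`, because the `ah*bh` term is a multiple of 2^64 and any overflow of the middle sum is shifted out. A sketch relying on C++'s well-defined wraparound for unsigned arithmetic; `mulLoVariant3` is a hypothetical name.

```cpp
#include <cstdint>

// Low-word computation of @mul_full_64_variant3. Unsigned arithmetic wraps
// modulo 2^64, matching the flag-free add/shl/add in the IR.
uint64_t mulLoVariant3(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t al = a & mask, ah = a >> 32;  // %conv, %shr.i45
  uint64_t bl = b & mask, bh = b >> 32;  // %conv3, %shr.i43
  uint64_t mul5 = bl * ah;               // middle product
  uint64_t mul6 = bh * al;               // middle product
  uint64_t mul7 = bl * al;               // low product
  // The ah*bh term is a multiple of 2^64 and drops out of the low word.
  return ((mul6 + mul5) << 32) + mul7;   // %add18, %shl, %add19
}
```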


define { i32, i32 } @mul_full_32(i32 %x, i32 %y) {		define { i32, i32 } @mul_full_32(i32 %x, i32 %y) {
; CHECK-LABEL: @mul_full_32(		; CHECK-LABEL: @mul_full_32(
; CHECK-NEXT: [[XL:%.*]] = and i32 [[X:%.*]], 65535		; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i32 [[X:%.*]] to i64
; CHECK-NEXT: [[XH:%.*]] = lshr i32 [[X]], 16		; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i32 [[Y:%.*]] to i64
; CHECK-NEXT: [[YL:%.*]] = and i32 [[Y:%.*]], 65535		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i64 [[FULLMUL_X]], [[FULLMUL_Y]]
; CHECK-NEXT: [[YH:%.*]] = lshr i32 [[Y]], 16		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i64 [[FULLMUL]] to i32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i32 [[YL]], [[XL]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[FULLMUL]], 16
; CHECK-NEXT: [[T1:%.*]] = mul nuw i32 [[YL]], [[XH]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i64 [[TMP1]] to i32
; CHECK-NEXT: [[T2:%.*]] = mul nuw i32 [[YH]], [[XL]]		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i32, i32 } undef, i32 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[T3:%.*]] = mul nuw i32 [[YH]], [[XH]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i32, i32 } [[RES_LO]], i32 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[T0L:%.*]] = and i32 [[T0]], 65535
; CHECK-NEXT: [[T0H:%.*]] = lshr i32 [[T0]], 16
; CHECK-NEXT: [[U0:%.*]] = add i32 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i32 [[U0]], 65535
; CHECK-NEXT: [[U0H:%.*]] = lshr i32 [[U0]], 16
; CHECK-NEXT: [[U1:%.*]] = add i32 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i32 [[U1]], 16
; CHECK-NEXT: [[U1H:%.*]] = lshr i32 [[U1]], 16
; CHECK-NEXT: [[U2:%.*]] = add i32 [[U0H]], [[T3]]
; CHECK-NEXT: [[LO:%.*]] = or i32 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[HI:%.*]] = add i32 [[U2]], [[U1H]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i32, i32 } undef, i32 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i32, i32 } [[RES_LO]], i32 [[HI]], 1
; CHECK-NEXT: ret { i32, i32 } [[RES]]		; CHECK-NEXT: ret { i32, i32 } [[RES]]
;		;
%xl = and i32 %x, 65535		%xl = and i32 %x, 65535
%xh = lshr i32 %x, 16		%xh = lshr i32 %x, 16
%yl = and i32 %y, 65535		%yl = and i32 %y, 65535
%yh = lshr i32 %y, 16		%yh = lshr i32 %y, 16

%t0 = mul nuw i32 %yl, %xl		%t0 = mul nuw i32 %yl, %xl
[… 27 lines not shown …]

; In the following test cases %x and %y are instructions, not arguments.		; In the following test cases %x and %y are instructions, not arguments.
; This tests the placement of mul i128 and zexts.		; This tests the placement of mul i128 and zexts.
; Instructions are also shuffled.		; Instructions are also shuffled.

define { i64, i64 } @mul_full_64_variant0_1() {		define { i64, i64 } @mul_full_64_variant0_1() {
; CHECK-LABEL: @mul_full_64_variant0_1(		; CHECK-LABEL: @mul_full_64_variant0_1(
; CHECK-NEXT: [[TMP1:%.*]] = call i64 @get_number()		; CHECK-NEXT: [[TMP1:%.*]] = call i64 @get_number()
; CHECK-NEXT: [[YL:%.*]] = and i64 [[TMP1]], 4294967295
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[TMP1]], 32
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @get_number()		; CHECK-NEXT: [[TMP2:%.*]] = call i64 @get_number()
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[TMP2]], 32		; CHECK-NEXT: [[FULLMUL_:%.*]] = zext i64 [[TMP2]] to i128
; CHECK-NEXT: [[XL:%.*]] = and i64 [[TMP2]], 4294967295		; CHECK-NEXT: [[FULLMUL_1:%.*]] = zext i64 [[TMP1]] to i128
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_]], [[FULLMUL_1]]
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]		; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP3]] to i64
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
; CHECK-NEXT: ret { i64, i64 } [[RES]]		; CHECK-NEXT: ret { i64, i64 } [[RES]]
;		;
%1 = call i64 @get_number()		%1 = call i64 @get_number()
%yl = and i64 %1, 4294967295		%yl = and i64 %1, 4294967295
%yh = lshr i64 %1, 32		%yh = lshr i64 %1, 32

%2 = call i64 @get_number()		%2 = call i64 @get_number()
%xh = lshr i64 %2, 32		%xh = lshr i64 %2, 32
[… 21 lines not shown …]	;
%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1		%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1
ret { i64, i64 } %res		ret { i64, i64 } %res
}		}

define { i64, i64 } @mul_full_64_variant0_2() {		define { i64, i64 } @mul_full_64_variant0_2() {
; CHECK-LABEL: @mul_full_64_variant0_2(		; CHECK-LABEL: @mul_full_64_variant0_2(
; CHECK-NEXT: [[X:%.*]] = call i64 @get_number()		; CHECK-NEXT: [[X:%.*]] = call i64 @get_number()
; CHECK-NEXT: [[Y:%.*]] = call i64 @get_number()		; CHECK-NEXT: [[Y:%.*]] = call i64 @get_number()
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y]], 4294967295		; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i64 [[X]] to i128
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i64 [[Y]] to i128
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_X]], [[FULLMUL_Y]]
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X]], 4294967295		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[FULLMUL]] to i64
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[XH]], [[YH]]		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[XL]], [[YH]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[XH]], [[YL]]		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[XL]], [[YL]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T1]], [[T0H]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U1:%.*]] = add i64 [[T2]], [[U0L]]
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U1H]], [[U2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[LO:%.*]] = or i64 [[T0L]], [[U1LS]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
; CHECK-NEXT: ret { i64, i64 } [[RES]]		; CHECK-NEXT: ret { i64, i64 } [[RES]]
;		;
%x = call i64 @get_number()		%x = call i64 @get_number()
%y = call i64 @get_number()		%y = call i64 @get_number()

%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
[… 20 lines not shown …]	;
%res_lo = insertvalue { i64, i64 } undef, i64 %lo, 0		%res_lo = insertvalue { i64, i64 } undef, i64 %lo, 0
%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1		%res = insertvalue { i64, i64 } %res_lo, i64 %hi, 1
ret { i64, i64 } %res		ret { i64, i64 } %res
}		}


define i64 @umulh_64(i64 %x, i64 %y) {		define i64 @umulh_64(i64 %x, i64 %y) {
; CHECK-LABEL: @umulh_64(		; CHECK-LABEL: @umulh_64(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i64 [[X:%.*]] to i128
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i64 [[Y:%.*]] to i128
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295		; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_X]], [[FULLMUL_Y]]
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]		; CHECK-NEXT: ret i64 [[FULLMUL_HI]]
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
; CHECK-NEXT: ret i64 [[HI]]
;		;
%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32

%t0 = mul nuw i64 %yl, %xl		%t0 = mul nuw i64 %yl, %xl
%t1 = mul nuw i64 %yl, %xh		%t1 = mul nuw i64 %yl, %xh
[… 13 lines not shown …]	;

%hi = add i64 %u2, %u1h		%hi = add i64 %u2, %u1h
ret i64 %hi		ret i64 %hi
}		}


define i64 @mullo(i64 %x, i64 %y) {		define i64 @mullo(i64 %x, i64 %y) {
; CHECK-LABEL: @mullo(		; CHECK-LABEL: @mullo(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: ret i64 [[MUL]]
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: ret i64 [[LO]]
;		;
%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32

%t0 = mul nuw i64 %yl, %xl		%t0 = mul nuw i64 %yl, %xl
%t1 = mul nuw i64 %yl, %xh		%t1 = mul nuw i64 %yl, %xh
[… 10 lines not shown …]	;

%lo = or i64 %u1ls, %t0l		%lo = or i64 %u1ls, %t0l
ret i64 %lo		ret i64 %lo
}		}


define i64 @mullo_variant3(i64 %a, i64 %b) {		define i64 @mullo_variant3(i64 %a, i64 %b) {
; CHECK-LABEL: @mullo_variant3(		; CHECK-LABEL: @mullo_variant3(
; CHECK-NEXT: [[AL:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[B:%.*]], [[A:%.*]]
; CHECK-NEXT: [[AH:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: ret i64 [[MUL]]
; CHECK-NEXT: [[BL:%.*]] = and i64 [[B:%.*]], 4294967295
; CHECK-NEXT: [[BH:%.*]] = lshr i64 [[B]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[BL]], [[AL]]
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[BL]], [[AH]]
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[BH]], [[AL]]
; CHECK-NEXT: [[U1:%.*]] = add i64 [[T2]], [[T1]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[LO:%.*]] = add i64 [[U1LS]], [[T0]]
; CHECK-NEXT: ret i64 [[LO]]
;		;
%al = and i64 %a, 4294967295		%al = and i64 %a, 4294967295
%ah = lshr i64 %a, 32		%ah = lshr i64 %a, 32
%bl = and i64 %b, 4294967295		%bl = and i64 %b, 4294967295
%bh = lshr i64 %b, 32		%bh = lshr i64 %b, 32

%t0 = mul nuw i64 %bl, %al		%t0 = mul nuw i64 %bl, %al
%t1 = mul nuw i64 %bl, %ah		%t1 = mul nuw i64 %bl, %ah
[… 9 lines not shown …]

declare void @eat_i64(i64)		declare void @eat_i64(i64)
declare void @eat_i128(i128)		declare void @eat_i128(i128)

define i64 @mullo_duplicate(i64 %x, i64 %y) {		define i64 @mullo_duplicate(i64 %x, i64 %y) {
; CHECK-LABEL: @mullo_duplicate(		; CHECK-LABEL: @mullo_duplicate(
; CHECK-NEXT: [[DUPLICATED_MUL:%.*]] = mul i64 [[X:%.*]], [[Y:%.*]]		; CHECK-NEXT: [[DUPLICATED_MUL:%.*]] = mul i64 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: call void @eat_i64(i64 [[DUPLICATED_MUL]])		; CHECK-NEXT: call void @eat_i64(i64 [[DUPLICATED_MUL]])
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X]], 4294967295		; CHECK-NEXT: ret i64 [[DUPLICATED_MUL]]
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y]], 4294967295
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: ret i64 [[LO]]
;		;
%duplicated_mul = mul i64 %x, %y		%duplicated_mul = mul i64 %x, %y
call void @eat_i64(i64 %duplicated_mul)		call void @eat_i64(i64 %duplicated_mul)

%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32
[… 16 lines not shown …]
}		}

define { i64, i64 } @mul_full_64_duplicate(i64 %x, i64 %y) {		define { i64, i64 } @mul_full_64_duplicate(i64 %x, i64 %y) {
; CHECK-LABEL: @mul_full_64_duplicate(		; CHECK-LABEL: @mul_full_64_duplicate(
; CHECK-NEXT: [[XX:%.*]] = zext i64 [[X:%.*]] to i128		; CHECK-NEXT: [[XX:%.*]] = zext i64 [[X:%.*]] to i128
; CHECK-NEXT: [[YY:%.*]] = zext i64 [[Y:%.*]] to i128		; CHECK-NEXT: [[YY:%.*]] = zext i64 [[Y:%.*]] to i128
; CHECK-NEXT: [[DUPLICATED_MUL:%.*]] = mul i128 [[XX]], [[YY]]		; CHECK-NEXT: [[DUPLICATED_MUL:%.*]] = mul i128 [[XX]], [[YY]]
; CHECK-NEXT: call void @eat_i128(i128 [[DUPLICATED_MUL]])		; CHECK-NEXT: call void @eat_i128(i128 [[DUPLICATED_MUL]])
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X]], 4294967295		; CHECK-NEXT: [[FULLMUL_LO:%.*]] = trunc i128 [[DUPLICATED_MUL]] to i64
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[DUPLICATED_MUL]], 32
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y]], 4294967295		; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[FULLMUL_LO]], 0
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[FULLMUL_HI]], 1
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
; CHECK-NEXT: ret { i64, i64 } [[RES]]		; CHECK-NEXT: ret { i64, i64 } [[RES]]
;		;
%xx = zext i64 %x to i128		%xx = zext i64 %x to i128
%yy = zext i64 %y to i128		%yy = zext i64 %y to i128
%duplicated_mul = mul i128 %xx, %yy		%duplicated_mul = mul i128 %xx, %yy
call void @eat_i128(i128 %duplicated_mul)		call void @eat_i128(i128 %duplicated_mul)

%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
[… 27 lines not shown …]	;
ret { i64, i64 } %res		ret { i64, i64 } %res
}		}
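In `@mul_full_64_duplicate` the input already contains `mul i128 (zext %x), (zext %y)`, and the updated CHECK lines take both halves from that existing `%duplicated_mul` instead of emitting a second wide multiply; `@mullo_duplicate` likewise ends up returning its existing `mul i64`. How the patch locates the existing instruction is not visible in this excerpt. One common way to get such reuse, sketched here purely as an illustration, is an expression table keyed on the opcode and operands, with commutative operands put in a canonical order (the idea behind value numbering and CSE). `ExprTable` is a hypothetical class that works on value names rather than LLVM `Value *`s, and it does not look through the `zext`s.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Toy expression table: maps (opcode, lhs, rhs) to the name of a value that
// already computes it. Operands of commutative ops are sorted so that
// "mul x, y" and "mul y, x" share one entry.
class ExprTable {
 public:
  // Returns the existing value for the expression, or records `fresh` as the
  // value computing it and returns `fresh`.
  std::string getOrInsert(const std::string &op, std::string lhs,
                          std::string rhs, const std::string &fresh) {
    if (isCommutative(op) && rhs < lhs) std::swap(lhs, rhs);
    auto key = std::make_tuple(op, lhs, rhs);
    auto it = table_.find(key);
    if (it != table_.end()) return it->second;
    table_.emplace(key, fresh);
    return fresh;
  }

 private:
  static bool isCommutative(const std::string &op) {
    return op == "mul" || op == "add" || op == "and" || op == "or";
  }
  std::map<std::tuple<std::string, std::string, std::string>, std::string> table_;
};
```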


 define i64 @umulhi_64_v2() {
 ; CHECK-LABEL: @umulhi_64_v2(
 ; CHECK-NEXT: [[X:%.*]] = call i64 @get_number()
 ; CHECK-NEXT: [[Y:%.*]] = call i64 @get_number()
-; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y]], 4294967295
-; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
-; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32
-; CHECK-NEXT: [[XL:%.*]] = and i64 [[X]], 4294967295
-; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[XH]], [[YH]]
-; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[XL]], [[YH]]
-; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[XH]], [[YL]]
-; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[XL]], [[YL]]
-; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
-; CHECK-NEXT: [[U0:%.*]] = add i64 [[T1]], [[T0H]]
-; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
-; CHECK-NEXT: [[U1:%.*]] = add i64 [[T2]], [[U0L]]
-; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
-; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
-; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
-; CHECK-NEXT: [[HI:%.*]] = add i64 [[U1H]], [[U2]]
-; CHECK-NEXT: ret i64 [[HI]]
+; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i64 [[X]] to i128
+; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i64 [[Y]] to i128
+; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_X]], [[FULLMUL_Y]]
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
+; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
+; CHECK-NEXT: ret i64 [[FULLMUL_HI]]
 ;
 %x = call i64 @get_number()
 %y = call i64 @get_number()

 %yl = and i64 %y, 4294967295
 %yh = lshr i64 %y, 32
 %xh = lshr i64 %x, 32
 %xl = and i64 %x, 4294967295
[14 lines not shown]

 ret i64 %hi
 }


 define i64 @umulhi_64_v3() {
 ; CHECK-LABEL: @umulhi_64_v3(
 ; CHECK-NEXT: [[X:%.*]] = call i64 @get_number()
-; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32
-; CHECK-NEXT: [[XL:%.*]] = and i64 [[X]], 4294967295
 ; CHECK-NEXT: [[Y:%.*]] = call i64 @get_number()
-; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y]], 4294967295
-; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32
-; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[XH]], [[YH]]
-; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[XL]], [[YH]]
-; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[XH]], [[YL]]
-; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[XL]], [[YL]]
-; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
-; CHECK-NEXT: [[U0:%.*]] = add i64 [[T1]], [[T0H]]
-; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
-; CHECK-NEXT: [[U1:%.*]] = add i64 [[T2]], [[U0L]]
-; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
-; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
-; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
-; CHECK-NEXT: [[HI:%.*]] = add i64 [[U1H]], [[U2]]
-; CHECK-NEXT: ret i64 [[HI]]
+; CHECK-NEXT: [[FULLMUL_X:%.*]] = zext i64 [[X]] to i128
+; CHECK-NEXT: [[FULLMUL_Y:%.*]] = zext i64 [[Y]] to i128
+; CHECK-NEXT: [[FULLMUL:%.*]] = mul nuw i128 [[FULLMUL_X]], [[FULLMUL_Y]]
+; CHECK-NEXT: [[TMP1:%.*]] = lshr i128 [[FULLMUL]], 32
+; CHECK-NEXT: [[FULLMUL_HI:%.*]] = trunc i128 [[TMP1]] to i64
+; CHECK-NEXT: ret i64 [[FULLMUL_HI]]
 ;
 %x = call i64 @get_number()
 %xh = lshr i64 %x, 32
 %xl = and i64 %x, 4294967295

 %y = call i64 @get_number()
 %yl = and i64 %y, 4294967295
 %yh = lshr i64 %y, 32
[17 lines not shown]
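The CHECK lines in these tests spell out the pattern the pass tries to recognize: each 64-bit operand is split into 32-bit halves, four `nuw` 32 x 32 -> 64 products (`t0`..`t3`) are formed, and the partial sums `u0`, `u1`, `u2` carry between the halves. As a hedged illustration (not code from the patch), the same arithmetic can be written in C++ and cross-checked against `unsigned __int128`, a GCC/Clang extension. The names `U128Parts`, `full_mul_64`, `umulhi_folded` and `check_against_int128` are hypothetical, and the low-word formula assumes `[[U1LS]]` is `u1 << 32` and `[[T0L]]` is `t0 & 0xffffffff`, because the lines defining them are not shown. Note that the high word of a 64 x 64 -> 128 product is the product shifted right by 64, the operand width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the IR's { i64, i64 } result:
// field 0 = low word, field 1 = high word.
struct U128Parts {
  uint64_t lo;
  uint64_t hi;
};

// Hypothetical helper: schoolbook 64 x 64 -> 128 multiply from four
// 32 x 32 -> 64 products, using the value names from the CHECK lines.
U128Parts full_mul_64(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xFFFFFFFFu;
  const uint64_t xl = x & mask, xh = x >> 32;
  const uint64_t yl = y & mask, yh = y >> 32;

  const uint64_t t0 = xl * yl;  // low(x)  * low(y)
  const uint64_t t1 = xh * yl;  // high(x) * low(y)
  const uint64_t t2 = xl * yh;  // low(x)  * high(y)
  const uint64_t t3 = xh * yh;  // high(x) * high(y)

  // Each sum stays below 2^64: a 32 x 32 product is at most 2^64 - 2^33 + 1,
  // and the term added to it is at most 2^32 - 1.
  const uint64_t u0 = t1 + (t0 >> 32);
  const uint64_t u1 = t2 + (u0 & mask);
  const uint64_t u2 = (u0 >> 32) + t3;

  U128Parts r;
  // Assumption: [[U1LS]] = u1 << 32 and [[T0L]] = t0 & mask (definitions elided).
  r.lo = (u1 << 32) | (t0 & mask);
  r.hi = (u1 >> 32) + u2;
  return r;
}

// Hypothetical helper mirroring a zext / mul / lshr / trunc fold of the
// high-only pattern: one wide multiply, then a shift by the operand width.
uint64_t umulhi_folded(uint64_t x, uint64_t y) {
  const unsigned __int128 full = (unsigned __int128)x * y;
  return (uint64_t)(full >> 64);
}

// Cross-check the split form and the folded form against a native 128-bit product.
void check_against_int128(uint64_t x, uint64_t y) {
  const unsigned __int128 p = (unsigned __int128)x * y;
  const U128Parts r = full_mul_64(x, y);
  assert(r.lo == (uint64_t)p);
  assert(r.hi == (uint64_t)(p >> 64));
  assert(r.hi == umulhi_folded(x, y));
}
```

For example, `full_mul_64(~0ull, ~0ull)` yields `lo == 1` and `hi == 0xFFFFFFFFFFFFFFFE`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.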

Revision Contents

Diff 185816

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

test/Transforms/AggressiveInstCombine/mul_full_32.ll

test/Transforms/AggressiveInstCombine/mul_full_64.ll
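Nothing in the decomposition depends on a particular bit width: only the half-width shift and mask change. The sketch below is a hedged, width-generic version (the `full_mul` template is hypothetical, not the patch's matcher). It assumes, without having the file at hand, that mul_full_32.ll exercises the i32 x i32 -> i64 analogue built from 16-bit halves, and it also covers the 64-bit case from mul_full_64.ll.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>
#include <utility>

// Hypothetical template: N x N -> 2N multiply returned as {low word, high word},
// built from four (N/2) x (N/2) -> N products. T must be an unsigned type at
// least as wide as `unsigned int`, so the partial products are not promoted
// to signed int (which could overflow).
template <typename T>
std::pair<T, T> full_mul(T x, T y) {
  static_assert(std::is_unsigned<T>::value, "T must be unsigned");
  static_assert(sizeof(T) >= sizeof(unsigned), "narrow types would promote to int");
  constexpr int kHalf = std::numeric_limits<T>::digits / 2;
  const T mask = (T(1) << kHalf) - 1;

  // Split both operands into halves.
  const T xl = x & mask, xh = x >> kHalf;
  const T yl = y & mask, yh = y >> kHalf;

  // Four half-width products; none can overflow T.
  const T t0 = xl * yl, t1 = xh * yl, t2 = xl * yh, t3 = xh * yh;

  // Carry propagation, same shape as the 64-bit CHECK lines.
  const T u0 = t1 + (t0 >> kHalf);
  const T u1 = t2 + (u0 & mask);
  const T u2 = (u0 >> kHalf) + t3;

  const T lo = (u1 << kHalf) | (t0 & mask);
  const T hi = (u1 >> kHalf) + u2;
  return {lo, hi};
}
```

Keeping the width as a template parameter mirrors how a matcher could be written against the operand type rather than a hardcoded i64.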
