This is an archive of the discontinued LLVM Phabricator instance.

Differential D56214

AggressiveInstCombine: Fold full mul i64 x i64 -> i128
AbandonedPublic

Authored by chfast on Jan 2 2019, 12:15 PM.

Details

Reviewers

spatel
RKSimon
lebedev.ri

Summary

This PR tries to match the full multiplication pattern i64 x i64 -> i128, implemented as four i32 x i32 -> i64 multiplications whose results are then combined.

This pattern has two outputs, the high and low parts, which makes the matching a bit difficult, especially considering this is my first pattern matcher.

Currently the high and low parts are matched independently, which results in two multiplications being generated. I have 3 ideas how to fix this, but suggestions are welcome:

1. Find another pass capable of merging identical multiplications. I tried InstCombine, but instead of merging the two identical i128 multiplications it truncates one of them.

2. Separate pattern matching from instruction rewriting. First find all patterns and remember them in a worklist. Later, try to pair the patterns for low and high by their arguments.

3. When one of the patterns is found, try to find the pattern for the other part by traversing the basic block further.
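
For readers unfamiliar with the pattern, here is a minimal sketch in plain C++ of the schoolbook 64x64 -> 128 multiplication that this revision tries to recognize. It follows the IR shape listed in the doc comment of the patch (`t0`..`t3`, `u0`..`u2`); the names `U128Parts` and `mulFull64` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical result type: the two 64-bit halves of a 128-bit product.
struct U128Parts {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Full 64x64 -> 128 multiplication built from four 32x32 -> 64 products.
U128Parts mulFull64(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t mask = 0xffffffffULL;
  const std::uint64_t xl = x & mask, xh = x >> 32;
  const std::uint64_t yl = y & mask, yh = y >> 32;

  // The four partial products; none of them can overflow 64 bits.
  const std::uint64_t t0 = yl * xl;
  const std::uint64_t t1 = yl * xh;
  const std::uint64_t t2 = yh * xl;
  const std::uint64_t t3 = yh * xh;

  // Carry propagation through the middle 32-bit columns.
  const std::uint64_t u0 = (t0 >> 32) + t1;
  const std::uint64_t u1 = (u0 & mask) + t2;
  const std::uint64_t u2 = (u0 >> 32) + t3;

  return {(u1 << 32) | (t0 & mask), u2 + (u1 >> 32)};
}
```

On targets with a native widening multiply, the intent of the fold is to replace all of this with one `mul nuw i128` on zero-extended operands, followed by a `trunc` for the low half and `lshr` + `trunc` for the high half.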

Diff Detail

Repository

rL LLVM

Build Status

Buildable 26374
Build 26373: arc lint + arc unit

Event Timeline

chfast created this revision. Jan 2 2019, 12:15 PM

Herald added a subscriber: llvm-commits. Jan 2 2019, 12:15 PM

Harbormaster completed remote builds in B26333: Diff 179921. Jan 2 2019, 12:16 PM

I haven't taken a deep look yet, but here are some preliminary thoughts.
Also, I don't think this should be hardcoded to some particular bitwidth.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
255–265	I don't see why these need to be actual functions, lambdas will do?
312–316	if (match(&I, m_c_Or(m_LowPart(m_Value(t0)), m_Shl(m_c_Add(m_LowPart(m_c_Add(m_HighPart(m_Deferred(t0)), m_Value(t1))), m_Value(t2)), m_SpecificInt(32))))) {
331	and now you only have `t0`, no `t0a`
test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Please use `llvm/utils/update_test_checks.py`. And move the initial test case into another review, so this diff shows the change in the test output.

chfast edited the summary of this revision. Jan 2 2019, 12:24 PM

chfast marked 2 inline comments as done. Jan 2 2019, 12:33 PM

chfast added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
255–265	Yes, that's true. I was struggling with the matchers at first so I did them by copy&paste. Will fix that, unless this pattern is useful to someone else.
312–316	That's cool. `m_Deferred` definitely lacks some documentation.

In D56214#1344055, @lebedev.ri wrote:

Also, I don't think this should be hardcoded to some particular bitwidth.

Yes, I agree. However, I don't know how to check for bitwidth in match().

Quuxplusone added a subscriber: Quuxplusone. Jan 2 2019, 3:39 PM

Quuxplusone added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
267	Does it also match the other two I mentioned in the cfe-dev thread? Specifically, where you have (my version `TWO`):
    %u1ls = shl i64 %u1, 32
    %lo = or i64 %u1ls, %t0l
my version `ONE` has:
    %lo = mul i64 %x, %y
and my version `THREE` has:
    %u3 = add i64 %t2, %t1
    %u3ls = shl i64 %u3, 32
    %lo = add i64 %u3ls, %t0
https://godbolt.org/z/_1pDoz
361	Remove debugging printf?
test/Transforms/AggressiveInstCombine/mul128.ll
8 ↗	(On Diff #179921)	Peanut gallery says: I doubt that this test captures everything that you want to test about the optimization. You just check that the output contains `mul nuw i128`, but what if it contains that instruction plus a bunch more unintended stuff? But I don't know anything about how LLVM optimizations are usually tested. Maybe this test is fine as-is.

craig.topper added a subscriber: craig.topper. Jan 2 2019, 3:47 PM

craig.topper added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
297	Don't you need to check the type is i64 somewhere? Or did I miss it?

chfast marked 3 inline comments as done. Jan 2 2019, 3:55 PM

chfast added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
267	As mentioned before, currently the optimization matches the patterns for low and high independently, mostly because I don't know yet what the best way to combine both is. Currently, for low it replaces pattern `TWO` with `ONE`. `THREE` will not work. These are great tests; I will add them to the test suite in a separate review as suggested.
297	Yes, I should. Is there a way to do this check with `match()`? I have not found any example doing this.
test/Transforms/AggressiveInstCombine/mul128.ll
8 ↗	(On Diff #179921)	@lebedev.ri already suggested how to generate better checks.
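
The three low-part variants discussed in this thread (`ONE`, `TWO`, `THREE`) compute the same 64-bit value. The sketch below illustrates this in plain C++, reusing the `lo`/`hi`/`rll`/`rlh`/`rhl` names from the C source quoted in the test file. The function names and the exact assignment of `rlh` and `rhl` to operand halves are assumptions made for illustration.

```cpp
#include <cstdint>

static std::uint64_t lo(std::uint64_t v) { return v & 0xffffffffULL; }
static std::uint64_t hi(std::uint64_t v) { return v >> 32; }

// Variant ONE: a plain wrapping 64-bit multiply.
std::uint64_t mulLoOne(std::uint64_t x, std::uint64_t y) { return x * y; }

// Variant TWO: low half assembled from masked partial sums.
std::uint64_t mulLoTwo(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t rll = lo(x) * lo(y);
  const std::uint64_t rlh = lo(x) * hi(y);
  const std::uint64_t rhl = hi(x) * lo(y);
  return (lo(rlh + lo(rhl + hi(rll))) << 32) + lo(rll);
}

// Variant THREE: rely on 64-bit wrap-around instead of explicit masking.
std::uint64_t mulLoThree(std::uint64_t x, std::uint64_t y) {
  const std::uint64_t rll = lo(x) * lo(y);
  const std::uint64_t rlh = lo(x) * hi(y);
  const std::uint64_t rhl = hi(x) * lo(y);
  return ((rlh + rhl) << 32) + rll;
}
```

Since all three agree modulo 2^64, a matcher for the low half could in principle canonicalize `TWO` and `THREE` to `ONE`. According to the reply above, the patch at this stage only handles `TWO`.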

This also does nothing to guarantee that all (or most) of the instructions will be removed. They could have additional users.

If we're in 32-bit mode then the 128-bit result producing X86 instruction doesn't exist. So this will get expanded to a bunch of smaller multiplies and adds. Do we produce something as good as or better than what we would get if we left the user code alone?

craig.topper added inline comments. Jan 2 2019, 4:03 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
297	No. But you can check I.getType()->isIntegerTy(64);

chfast marked 2 inline comments as done. Jan 3 2019, 9:50 AM

chfast added inline comments.

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Tests added as https://reviews.llvm.org/D56277.

lebedev.ri added inline comments. Jan 3 2019, 10:27 AM

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	(Yep, now this diff just needs to be based on top of that diff, so the tests show the difference)

chfast marked 2 inline comments as done. Jan 3 2019, 10:30 AM

chfast added inline comments.

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Can this be done in Phabricator?

chfast added a parent revision: D56277: AggressiveInstCombine: Add tests for full multiplication pattern match. Jan 3 2019, 10:31 AM

lebedev.ri added inline comments. Jan 3 2019, 10:37 AM

test/Transforms/AggressiveInstCombine/mul128.ll
1 ↗	(On Diff #179921)	Phabricator only displays the diff you upload. If you used git for this, simply keep these two diffs as two consecutive commits, and upload each one of them separately to their respective reviews. If svn, no idea.

Update with some intermediate changes.

Harbormaster completed remote builds in B26374: Diff 180132. Jan 3 2019, 1:32 PM

I did a small update.

I rebased the diff on top of the review with tests.

I focused on merging the replacements for the low and high parts. Instead of blindly replacing the pattern with a single multiplication, the strategy is to first try to find an existing multiplication instruction of the desired form. This feels quite "manual". I also have trouble with properly placing the new mul instruction.

The types are not checked yet.
I plan to check the native integer size from DataLayout, so this transform will be applied for i64xi64->i128 when i64 is native.

Answering the question: the CodeGen generates the same pattern (I was fixing some bugs there years ago, will verify that claim later on). I don't see a benefit in applying this transform if it is going to be reverted in CodeGen, unless you know of any optimization that this might enable.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
255–265	These must be templates, so lambdas will not work until C++14.

Hi again,

I believe I addressed most of the comments. Now HI and LO parts are matched independently, but when both are matched they will use the same i128 multiplication.
Now also DataLayout is checked for the max int size. The pass only replaces multiplication up to 2x native int size. E.g. it will produce max i64 multiplication on 32-bit targets.

The only thing left to do is to address the comment about other uses of intermediate values.
The most restrictive approach would be to check, for every intermediate value, whether its number of uses matches the pattern?
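
The DataLayout cut-off described in this comment can be modelled outside LLVM. The following is only a sketch under assumptions: it parses the native integer widths (the `n...` component) out of a data layout string by hand instead of using LLVM's `DataLayout` class, and the helper names `largestNativeIntWidth` and `mayFoldFullMul` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: largest width listed in the "n<w1>:<w2>:..."
// component of a data layout string, or 0 if there is none.
unsigned largestNativeIntWidth(const std::string &dataLayout) {
  unsigned maxWidth = 0;
  std::istringstream specs(dataLayout);
  std::string spec;
  while (std::getline(specs, spec, '-')) {
    // Only accept "n" followed by a digit (skips e.g. "ni:..." specs).
    if (spec.size() < 2 || spec[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(spec[1])))
      continue;
    std::istringstream widths(spec.substr(1));
    std::string width;
    while (std::getline(widths, width, ':'))
      maxWidth = std::max(maxWidth, static_cast<unsigned>(std::stoul(width)));
  }
  return maxWidth;
}

// The product is twice as wide as the operands, so "product width is at
// most 2x the native width" reduces to "operand width is native or smaller".
bool mayFoldFullMul(unsigned operandBits, const std::string &dataLayout) {
  return operandBits <= largestNativeIntWidth(dataLayout);
}
```

With the two data layout strings used by the tests in this revision, this model permits the i64 x i64 -> i128 fold on x86-64 and rejects it on i386, where only an i32 x i32 -> i64 fold would be allowed.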

Herald added a project: Restricted Project. Feb 5 2019, 5:24 AM

Harbormaster completed remote builds in B27732: Diff 185287. Feb 5 2019, 5:24 AM

Update unit tests.

Harbormaster completed remote builds in B27736: Diff 185297. Feb 5 2019, 6:28 AM

chfast marked 4 inline comments as done. Feb 5 2019, 6:30 AM

Quuxplusone added inline comments. Feb 5 2019, 9:09 AM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
255	> This also does nothing to guarantee that all (or most) of the instructions will be removed. [...] Do we produce something as good as or better than what we would get if we left the user code alone?

> The only thing left to do is to address the comment about other uses of intermediate values. The most restrictive approach would be to check all intermediate values if the number of uses matches the pattern?

IIUC, this is not a problem with _correctness_, right? We are protected against removing an instruction whose output still has live uses? But we're worried that the intermediate outputs will all have so many uses that we'll end up generating our MUL and keeping all those intermediate instructions, and so the codegen will be bigger than if we'd left it alone.

If I've understood the problem correctly, then I think @chfast's proposed solution is correct: you should do this optimization only if every intermediate result is completely dead (or can be replaced by a corresponding intermediate result of the new code). The vast majority of cases where we want this optimization to fire will be cases where all the intermediate results are dead.
280	Here and lines 277 and 290: `const auto` should be something else (such as [simply `auto&&`](https://quuxplusone.github.io/blog/2018/12/15/autorefref-always-works/)), or else the `const` applies only to the copy you made of whatever the element type of `U->users()` was. I suspect you actually meant `const auto *` but I'm not sure.

Another round of changes.

I fixed some small defects and added more tests.

I'm also checking the number of uses of different intermediate values. However, this check is not perfect. This is the best I could get in the current design of the pass. The main problem is that I try to match 2 different patterns: mullo (which actually has 2 variants) and umulhi. Depending on whether they go together, the use count differs. Let me know what you think about the current code.

Harbormaster completed remote builds in B27901: Diff 185816. Feb 7 2019, 10:43 AM

chfast marked an inline comment as done. Feb 7 2019, 10:45 AM

Quuxplusone added inline comments. Feb 8 2019, 7:29 AM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
392	Nit: This expression isn't intuitively clear to me. Also, I would write `uint64_t{0}` as `uint64_t(0)`. I think you're getting really lucky here that `MaxSizeInBits / 2` just happens to be the same number of bits (64) as the width of `uint64_t{0}`; otherwise this math would be wrong. How about
    assert(HalfSizeInBits <= 64);
    const auto LowMask = m_SpecificInt((uint64_t(1) << (HalfSizeInBits-1) << 1) - 1);
or if there's an existing utility function to compute `uint64_t(1) << HalfSizeInBits` directly.
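
The two-step shift suggested here can be checked in isolation. The sketch below is plain C++ and independent of the patch; `lowMask` is a hypothetical name. Shifting by `halfSizeInBits - 1` and then by 1 keeps each shift amount below 64, so the expression stays well defined even when `halfSizeInBits` is 64, where a single `uint64_t(1) << 64` would be undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `halfSizeInBits` bits set, valid for 1..64.
std::uint64_t lowMask(unsigned halfSizeInBits) {
  assert(halfSizeInBits >= 1 && halfSizeInBits <= 64);
  // For 64 the two shifts wrap to 0 and the subtraction yields all ones.
  return ((std::uint64_t(1) << (halfSizeInBits - 1)) << 1) - 1;
}
```

For example, `lowMask(32)` gives `0xffffffff`, the constant hardcoded in `m_LowPart` in the first version of the diff.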

@chfast What happened to this patch? I was looking at https://llvm.org/PR36243 and wondering if it'd be worth AggressiveInstCombine/InstCombine doing something similar for adds (in that case a 3 chain add i32 to add i96).

Herald added a project: Restricted Project. Mar 13 2022, 8:31 AM

In D56214#3377947, @RKSimon wrote:

@chfast What happened to this patch? I was looking at https://llvm.org/PR36243 and wondering if it'd be worth AggressiveInstCombine/InstCombine doing something similar for adds (in that case a 3 chain add i32 to add i96).

I don't remember exactly, but I think I never received a clear answer on whether this is worth the complexity.

On the technical level, we should pick a cut-off point. So it may be OK to do the transformation for i128 given that i64 is a native type (based on the data layout?).

So in the addc case I don't think it makes sense to match i96 given that the biggest native type is i32. Otherwise, you will be matching a lot of multi-precision integer code and moving the work to legalization.

In my experience, LLVM has recently been handling multi-precision workloads without builtins pretty well. However, I'm missing a generic addc/subc intrinsic (__builtin_addc is implemented by two uaddos).
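
As an illustration of the "two uaddos" shape mentioned above, here is a portable C++ sketch of an add-with-carry step. It does not use `__builtin_addc` or any LLVM intrinsic; each of the two additions has its own overflow check, which is roughly what a pair of unsigned add-with-overflow operations expresses. `AddcResult` and `addWithCarry` are hypothetical names.

```cpp
#include <cstdint>

struct AddcResult {
  std::uint64_t sum;
  bool carryOut;
};

// One limb of a multi-precision addition: a + b + carryIn.
AddcResult addWithCarry(std::uint64_t a, std::uint64_t b, bool carryIn) {
  // First overflow-checked add: a + b.
  const std::uint64_t partial = a + b;
  const bool carry1 = partial < a;
  // Second overflow-checked add: partial + carryIn.
  const std::uint64_t sum = partial + (carryIn ? 1u : 0u);
  const bool carry2 = sum < partial;
  // At most one of the two additions can overflow.
  return {sum, carry1 || carry2};
}
```

Chaining this function across limbs gives the kind of multi-precision add sequence that a wider-add fold (such as the i96 case raised above) would have to recognize.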

RKSimon mentioned this in D136015: [InstCombine] Fold series of instructions into mull. Oct 21 2022, 8:17 AM

chfast mentioned this in rG119c34e7f9c6: [InstCombine][test] Add tests for mul combinations. Oct 22 2022, 7:26 AM

@chfast What do you want to do with this patch now that D136015 landed?

In D56214#3885105, @RKSimon wrote:

@chfast What do you want to do with this patch now that D136015 landed?

I don't really need it any more (I was overoptimistic that this is a portable way to get access to the 64x64→128 mul instruction). But if you think the change is good and has no flaws, I can finish it in some free time. Should it be moved to InstCombine then?

Probably abandon it for now? You can always resurrect it if you find a compelling case.

RKSimon resigned from this revision. Dec 5 2022, 7:07 AM

This review may be stuck/dead, consider abandoning if no longer relevant.
Removing myself as reviewer in attempt to clean dashboard.

Herald added a subscriber: StephenFan. Jan 12 2023, 5:32 PM

chfast abandoned this revision. Jan 13 2023, 12:59 AM

Revision Contents

Path	Size
lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	156 lines
test/Transforms/AggressiveInstCombine/mul_full_32.ll	29 lines
test/Transforms/AggressiveInstCombine/mul_full_64.ll	101 lines

Diff 180132

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

[...]	static bool foldAnyOrAllBitsSet(Instruction &I) {
Value *And = Builder.CreateAnd(MOps.Root, Mask);		Value *And = Builder.CreateAnd(MOps.Root, Mask);
Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)		Value *Cmp = MatchAllBitsSet ? Builder.CreateICmpEQ(And, Mask)
: Builder.CreateIsNotNull(And);		: Builder.CreateIsNotNull(And);
Value *Zext = Builder.CreateZExt(Cmp, I.getType());		Value *Zext = Builder.CreateZExt(Cmp, I.getType());
I.replaceAllUsesWith(Zext);		I.replaceAllUsesWith(Zext);
return true;		return true;
}		}

		template <typename LHS>
		inline BinaryOp_match<LHS, specific_intval, Instruction::And>
		m_LowPart(const LHS &L) {
		  return m_And(L, m_SpecificInt(0xffffffff));
		}

		template <typename LHS>
		inline BinaryOp_match<LHS, specific_intval, Instruction::LShr>
		m_HighPart(const LHS &L) {
		  return m_LShr(L, m_SpecificInt(32));
		}

		static Value *findOrCreateFullMul(Instruction &I, Value *x, Value *y,
		                                  DominatorTree &DT) {
		  // Try to find the wanted multiplication instruction.
		  // FIXME: Check the multiplication type.
		  for (const auto U : x->users()) {
		    LLVM_DEBUG(dbgs() << "User1 " << *U << "\n");
		    if (match(U, m_ZExt(m_Specific(x)))) {
		      LLVM_DEBUG(dbgs() << "ZExt found: " << *U << "\n");
		      for (const auto V : U->users()) {
		        if (match(V, m_c_Mul(m_Specific(U), m_ZExt(m_Specific(y))))) {
		          LLVM_DEBUG(dbgs() << "Mul found: " << *V << "\n");
		          return cast<Instruction>(V);
		        }
		      }
		    }
		  }

		  // Create the full multiplication instruction and place it just after its
		  // operands. This position is the higher possible so will be safe to be used
		  // as a replacement for all future matched patterns.
		  // FIXME: All this placement probably don't even work.
		  Instruction *insertPoint = &I.getParent()->front();
		  auto xI = dyn_cast<Instruction>(x);
		  if (xI && DT.dominates(xI, insertPoint))
		    insertPoint = xI;
		  auto yI = dyn_cast<Instruction>(y);
		  if (yI && DT.dominates(yI, insertPoint))
		    insertPoint = yI;

		  LLVM_DEBUG(dbgs() << "Insert Point: " << *insertPoint << "\n");
		  IRBuilder<> Builder{insertPoint};
		  auto ex = Builder.CreateZExt(x, Builder.getInt128Ty());
		  auto ey = Builder.CreateZExt(y, Builder.getInt128Ty());
		  return cast<Instruction>(Builder.CreateNUWMul(ex, ey, "p"));
		}

		/// Matches the following pattern producing full multiplication:
		///
		/// %xl = and i64 %x, 4294967295
		/// %xh = lshr i64 %x, 32
		/// %yl = and i64 %y, 4294967295
		/// %yh = lshr i64 %y, 32
		///
		/// %t0 = mul nuw i64 %yl, %xl
		/// %t1 = mul nuw i64 %yl, %xh
		/// %t2 = mul nuw i64 %yh, %xl
		/// %t3 = mul nuw i64 %yh, %xh
		///
		/// %t0l = and i64 %t0, 4294967295
		/// %t0h = lshr i64 %t0, 32
		///
		/// %u0 = add i64 %t0h, %t1
		/// %u0l = and i64 %u0, 4294967295
		/// %u0h = lshr i64 %u0, 32
		///
		/// %u1 = add i64 %u0l, %t2
		/// %u1ls = shl i64 %u1, 32
		/// %u1h = lshr i64 %u1, 32
		///
		/// %u2 = add i64 %u0h, %t3
		///
		/// %lo = or i64 %u1ls, %t0l
		/// %hi = add i64 %u2, %u1h

		static bool foldMul(Instruction &I, DominatorTree &DT) {

		  Value *x = nullptr;
		  Value *y = nullptr;

		  Value *t0 = nullptr;
		  Value *t1 = nullptr;
		  Value *t2 = nullptr;

		  Value *t3 = nullptr;
		  Value *u0 = nullptr;

		  // Match low part of the full multiplication.
		  //
		  // First we match up to the multiplications t0, t1, t2.
		  // The t0 is reachable by two edges and we _assume_ it's the same node
		  // in general it does not have to be.
		  if (match(&I,
		            m_c_Or(m_LowPart(m_Value(t0)),
		                   m_Shl(m_c_Add(m_LowPart(m_c_Add(m_HighPart(m_Deferred(t0)),
		                                                   m_Value(t1))),
		                                 m_Value(t2)),
		                         m_SpecificInt(32))))) {

		    LLVM_DEBUG(dbgs() << "Lo found up to muls\n");
		    LLVM_DEBUG(dbgs() << *t0 << "\n");
		    LLVM_DEBUG(dbgs() << *t1 << "\n");
		    LLVM_DEBUG(dbgs() << *t2 << "\n");

		    // 1. Match t1 and remember its arguments. We start with t1 because it
		    //    is asymmetric.
		    // 2. Require t2 to be a swapped version of t1.
		    // 3. For t0 require to have the same arguments as t1.
		    if (match(t1, m_c_Mul(m_HighPart(m_Value(x)), m_LowPart(m_Value(y)))) &&
		        match(t2,
		              m_c_Mul(m_LowPart(m_Specific(x)), m_HighPart(m_Specific(y)))) &&
		        match(t0,
		              m_c_Mul(m_LowPart(m_Specific(x)), m_LowPart(m_Specific(y))))) {
		      LLVM_DEBUG(dbgs() << "Lo muls are ok\n");

		      // The whole pattern can be replaced with single multiplication.
		      auto mul = findOrCreateFullMul(I, x, y, DT);
		      IRBuilder<> Builder{&I};
		      auto lo = Builder.CreateTrunc(mul, I.getType(), "p.lo");
		      I.replaceAllUsesWith(lo);
		      return true;
		    }
		  }

		  // Match high part of the full multiplication.
		  //
		  // First we match up to multiplications t2 and t3 and u0 node.
		  // Then check the u0 node.
		  // In the end check all 4 multiplications starting from asymmetric ones
		  // the same as in matching the low part.
		  if (match(&I,
		            m_c_Add(m_HighPart(m_c_Add(m_LowPart(m_Value(u0)), m_Value(t2))),
		                    m_Add(m_HighPart(m_Deferred(u0)), m_Value(t3)))) &&
		      match(u0, m_c_Add(m_HighPart(m_Value(t0)), m_Value(t1)))) {
		    if (match(t1, m_c_Mul(m_HighPart(m_Value(x)), m_LowPart(m_Value(y)))) &&
		        match(t2,
		              m_c_Mul(m_LowPart(m_Specific(x)), m_HighPart(m_Specific(y)))) &&
		        match(t0,
		              m_c_Mul(m_LowPart(m_Specific(x)), m_LowPart(m_Specific(y)))) &&
		        match(t3,
		              m_c_Mul(m_HighPart(m_Specific(x)), m_HighPart(m_Specific(y))))) {
		      LLVM_DEBUG(dbgs() << "Hi found!!! (" << x << ", " << y << ")\n");

		      auto mul = findOrCreateFullMul(I, x, y, DT);
		      IRBuilder<> Builder{&I};
		      auto hi =
		          Builder.CreateTrunc(Builder.CreateLShr(mul, 64), I.getType(), "p.hi");
		      I.replaceAllUsesWith(hi);
		      return true;
		    }
		  }

		  return false;
		}

/// This is the entry point for folds that could be implemented in regular		/// This is the entry point for folds that could be implemented in regular
/// InstCombine, but they are separated because they are not expected to		/// InstCombine, but they are separated because they are not expected to
/// occur frequently and/or have more than a constant-length pattern match.		/// occur frequently and/or have more than a constant-length pattern match.
static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {		static bool foldUnusualPatterns(Function &F, DominatorTree &DT) {
bool MadeChange = false;		bool MadeChange = false;
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Do not delete instructions under here and invalidate the iterator.		// Do not delete instructions under here and invalidate the iterator.
// Walk the block backwards for efficiency. We're matching a chain of		// Walk the block backwards for efficiency. We're matching a chain of
// use->defs, so we're more likely to succeed by starting from the bottom.		// use->defs, so we're more likely to succeed by starting from the bottom.
// Also, we want to avoid matching partial patterns.		// Also, we want to avoid matching partial patterns.
// TODO: It would be more efficient if we removed dead instructions		// TODO: It would be more efficient if we removed dead instructions
// iteratively in this loop rather than waiting until the end.		// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {		for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
MadeChange |= foldAnyOrAllBitsSet(I);		MadeChange |= foldAnyOrAllBitsSet(I);
MadeChange |= foldGuardedRotateToFunnelShift(I);		MadeChange |= foldGuardedRotateToFunnelShift(I);
		MadeChange |= foldMul(I, DT);
}		}
}		}

// We're done with transforms, so remove dead instructions.		// We're done with transforms, so remove dead instructions.
if (MadeChange)		if (MadeChange)
for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
SimplifyInstructionsInBlock(&BB);		SimplifyInstructionsInBlock(&BB);

[...]

test/Transforms/AggressiveInstCombine/mul_full_32.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s			; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

	target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"			target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
	target triple = "i386-unknown-linux-gnu"			target triple = "i386-unknown-linux-gnu"

	define { i64, i64 } @mul_full_64(i64 %x, i64 %y) {			define { i64, i64 } @mul_full_64(i64 %x, i64 %y) {
	; CHECK-LABEL: @mul_full_64(			; CHECK-LABEL: @mul_full_64(
	; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295			; CHECK-NEXT: [[TMP1:%.*]] = zext i64 [[X:%.*]] to i128
	; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32			; CHECK-NEXT: [[TMP2:%.*]] = zext i64 [[Y:%.*]] to i128
	; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295			; CHECK-NEXT: [[P:%.*]] = mul nuw i128 [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32			; CHECK-NEXT: [[P_LO:%.*]] = trunc i128 [[P]] to i64
	; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]			; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[P]], 64
	; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]			; CHECK-NEXT: [[P_HI:%.*]] = trunc i128 [[TMP3]] to i64
	; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]			; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[P_LO]], 0
	; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]			; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[P_HI]], 1
	; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
	; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
	; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
	; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
	; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
	; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
	; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
	; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
	; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
	; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
	; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
	; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
	; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
	; CHECK-NEXT: ret { i64, i64 } [[RES]]			; CHECK-NEXT: ret { i64, i64 } [[RES]]
	;			;
	%xl = and i64 %x, 4294967295			%xl = and i64 %x, 4294967295
	%xh = lshr i64 %x, 32			%xh = lshr i64 %x, 32
	%yl = and i64 %y, 4294967295			%yl = and i64 %y, 4294967295
	%yh = lshr i64 %y, 32			%yh = lshr i64 %y, 32

	%t0 = mul nuw i64 %yl, %xl			%t0 = mul nuw i64 %yl, %xl
	[...]

test/Transforms/AggressiveInstCombine/mul_full_64.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s		; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

define { i64, i64 } @mul_full_64_variant0(i64 %x, i64 %y) {		define { i64, i64 } @mul_full_64_variant0(i64 %x, i64 %y) {
; CHECK-LABEL: @mul_full_64_variant0(		; CHECK-LABEL: @mul_full_64_variant0(
; CHECK-NEXT: [[XL:%.*]] = and i64 [[X:%.*]], 4294967295		; CHECK-NEXT: [[TMP1:%.*]] = zext i64 [[X:%.*]] to i128
; CHECK-NEXT: [[XH:%.*]] = lshr i64 [[X]], 32		; CHECK-NEXT: [[TMP2:%.*]] = zext i64 [[Y:%.*]] to i128
; CHECK-NEXT: [[YL:%.*]] = and i64 [[Y:%.*]], 4294967295		; CHECK-NEXT: [[P:%.*]] = mul nuw i128 [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[YH:%.*]] = lshr i64 [[Y]], 32		; CHECK-NEXT: [[P_LO:%.*]] = trunc i128 [[P]] to i64
; CHECK-NEXT: [[T0:%.*]] = mul nuw i64 [[YL]], [[XL]]		; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[P]], 64
; CHECK-NEXT: [[T1:%.*]] = mul nuw i64 [[YL]], [[XH]]		; CHECK-NEXT: [[P_HI:%.*]] = trunc i128 [[TMP3]] to i64
; CHECK-NEXT: [[T2:%.*]] = mul nuw i64 [[YH]], [[XL]]		; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[P_LO]], 0
; CHECK-NEXT: [[T3:%.*]] = mul nuw i64 [[YH]], [[XH]]		; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[P_HI]], 1
; CHECK-NEXT: [[T0L:%.*]] = and i64 [[T0]], 4294967295
; CHECK-NEXT: [[T0H:%.*]] = lshr i64 [[T0]], 32
; CHECK-NEXT: [[U0:%.*]] = add i64 [[T0H]], [[T1]]
; CHECK-NEXT: [[U0L:%.*]] = and i64 [[U0]], 4294967295
; CHECK-NEXT: [[U0H:%.*]] = lshr i64 [[U0]], 32
; CHECK-NEXT: [[U1:%.*]] = add i64 [[U0L]], [[T2]]
; CHECK-NEXT: [[U1LS:%.*]] = shl i64 [[U1]], 32
; CHECK-NEXT: [[U1H:%.*]] = lshr i64 [[U1]], 32
; CHECK-NEXT: [[U2:%.*]] = add i64 [[U0H]], [[T3]]
; CHECK-NEXT: [[LO:%.*]] = or i64 [[U1LS]], [[T0L]]
; CHECK-NEXT: [[HI:%.*]] = add i64 [[U2]], [[U1H]]
; CHECK-NEXT: [[RES_LO:%.*]] = insertvalue { i64, i64 } undef, i64 [[LO]], 0
; CHECK-NEXT: [[RES:%.*]] = insertvalue { i64, i64 } [[RES_LO]], i64 [[HI]], 1
; CHECK-NEXT: ret { i64, i64 } [[RES]]		; CHECK-NEXT: ret { i64, i64 } [[RES]]
;		;
%xl = and i64 %x, 4294967295		%xl = and i64 %x, 4294967295
%xh = lshr i64 %x, 32		%xh = lshr i64 %x, 32
%yl = and i64 %y, 4294967295		%yl = and i64 %y, 4294967295
%yh = lshr i64 %y, 32		%yh = lshr i64 %y, 32

%t0 = mul nuw i64 %yl, %xl		%t0 = mul nuw i64 %yl, %xl
[... 46 lines not shown ...]
; return (uint64_t(lo(rlh + lo(rhl + hi(rll)))) << 32) + lo(rll);		; return (uint64_t(lo(rlh + lo(rhl + hi(rll)))) << 32) + lo(rll);
; #elif THREE		; #elif THREE
; return ((rlh + rhl) << 32) + rll;		; return ((rlh + rhl) << 32) + rll;
; #endif		; #endif
; }		; }

define i64 @mul_full_64_variant1(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant1(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant1(		; CHECK-LABEL: @mul_full_64_variant1(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[TMP1:%.*]] = zext i64 [[A:%.*]] to i128
; CHECK-NEXT: [[SHR_I43:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[TMP2:%.*]] = zext i64 [[B:%.*]] to i128
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[P:%.*]] = mul nuw i128 [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[P]], 64
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I41]], [[SHR_I43]]		; CHECK-NEXT: [[P_HI:%.*]] = trunc i128 [[TMP3]] to i64
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I43]]		; CHECK-NEXT: store i64 [[P_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I41]], [[CONV]]
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]
; CHECK-NEXT: [[SHR_I40:%.*]] = lshr i64 [[MUL7]], 32
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I40]], [[MUL5]]
; CHECK-NEXT: [[SHR_I39:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I39]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL18:%.*]] = mul i64 [[B]], [[A]]		; CHECK-NEXT: [[MUL18:%.*]] = mul i64 [[B]], [[A]]
; CHECK-NEXT: ret i64 [[MUL18]]		; CHECK-NEXT: ret i64 [[MUL18]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i43 = lshr i64 %a, 32		%shr.i43 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i41 = lshr i64 %b, 32		%shr.i41 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i41, %shr.i43		%mul = mul nuw i64 %shr.i41, %shr.i43
[... 10 lines not shown ...]
%add17 = add i64 %add10, %shr.i		%add17 = add i64 %add10, %shr.i
store i64 %add17, i64* %rhi, align 8		store i64 %add17, i64* %rhi, align 8
%mul18 = mul i64 %b, %a		%mul18 = mul i64 %b, %a
ret i64 %mul18		ret i64 %mul18
}		}

define i64 @mul_full_64_variant2(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant2(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant2(		; CHECK-LABEL: @mul_full_64_variant2(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[TMP1:%.*]] = zext i64 [[A:%.*]] to i128
; CHECK-NEXT: [[SHR_I58:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[TMP2:%.*]] = zext i64 [[B:%.*]] to i128
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[P:%.*]] = mul nuw i128 [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[SHR_I56:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[P]], 64
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I56]], [[SHR_I58]]		; CHECK-NEXT: [[P_HI:%.*]] = trunc i128 [[TMP3]] to i64
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I58]]		; CHECK-NEXT: store i64 [[P_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I56]], [[CONV]]		; CHECK-NEXT: [[P_LO:%.*]] = trunc i128 [[P]] to i64
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]		; CHECK-NEXT: ret i64 [[P_LO]]
; CHECK-NEXT: [[SHR_I55:%.*]] = lshr i64 [[MUL7]], 32
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I55]], [[MUL5]]
; CHECK-NEXT: [[SHR_I54:%.*]] = lshr i64 [[ADD]], 32
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I54]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I51:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I51]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[CONV24:%.*]] = shl i64 [[ADD15]], 32
; CHECK-NEXT: [[CONV26:%.*]] = and i64 [[MUL7]], 4294967295
; CHECK-NEXT: [[ADD27:%.*]] = or i64 [[CONV24]], [[CONV26]]
; CHECK-NEXT: ret i64 [[ADD27]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i58 = lshr i64 %a, 32		%shr.i58 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
%shr.i56 = lshr i64 %b, 32		%shr.i56 = lshr i64 %b, 32
%mul = mul nuw i64 %shr.i56, %shr.i58		%mul = mul nuw i64 %shr.i56, %shr.i58
%mul5 = mul nuw i64 %conv3, %shr.i58		%mul5 = mul nuw i64 %conv3, %shr.i58
%mul6 = mul nuw i64 %shr.i56, %conv		%mul6 = mul nuw i64 %shr.i56, %conv
[... 10 lines not shown ...]
%conv24 = shl i64 %add15, 32		%conv24 = shl i64 %add15, 32
%conv26 = and i64 %mul7, 4294967295		%conv26 = and i64 %mul7, 4294967295
%add27 = or i64 %conv24, %conv26		%add27 = or i64 %conv24, %conv26
ret i64 %add27		ret i64 %add27
}		}

define i64 @mul_full_64_variant3(i64 %a, i64 %b, i64* nocapture %rhi) {		define i64 @mul_full_64_variant3(i64 %a, i64 %b, i64* nocapture %rhi) {
; CHECK-LABEL: @mul_full_64_variant3(		; CHECK-LABEL: @mul_full_64_variant3(
; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A:%.*]], 4294967295		; CHECK-NEXT: [[TMP1:%.*]] = zext i64 [[A:%.*]] to i128
		; CHECK-NEXT: [[TMP2:%.*]] = zext i64 [[B:%.*]] to i128
		; CHECK-NEXT: [[P:%.*]] = mul nuw i128 [[TMP1]], [[TMP2]]
		; CHECK-NEXT: [[CONV:%.*]] = and i64 [[A]], 4294967295
; CHECK-NEXT: [[SHR_I45:%.*]] = lshr i64 [[A]], 32		; CHECK-NEXT: [[SHR_I45:%.*]] = lshr i64 [[A]], 32
; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B:%.*]], 4294967295		; CHECK-NEXT: [[CONV3:%.*]] = and i64 [[B]], 4294967295
; CHECK-NEXT: [[SHR_I43:%.*]] = lshr i64 [[B]], 32		; CHECK-NEXT: [[SHR_I43:%.*]] = lshr i64 [[B]], 32
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[SHR_I43]], [[SHR_I45]]
; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I45]]		; CHECK-NEXT: [[MUL5:%.*]] = mul nuw i64 [[CONV3]], [[SHR_I45]]
; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I43]], [[CONV]]		; CHECK-NEXT: [[MUL6:%.*]] = mul nuw i64 [[SHR_I43]], [[CONV]]
; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]		; CHECK-NEXT: [[MUL7:%.*]] = mul nuw i64 [[CONV3]], [[CONV]]
; CHECK-NEXT: [[SHR_I42:%.*]] = lshr i64 [[MUL7]], 32		; CHECK-NEXT: [[TMP3:%.*]] = lshr i128 [[P]], 64
; CHECK-NEXT: [[ADD:%.*]] = add i64 [[SHR_I42]], [[MUL5]]		; CHECK-NEXT: [[P_HI:%.*]] = trunc i128 [[TMP3]] to i64
; CHECK-NEXT: [[SHR_I41:%.*]] = lshr i64 [[ADD]], 32		; CHECK-NEXT: store i64 [[P_HI]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[ADD10:%.*]] = add i64 [[SHR_I41]], [[MUL]]
; CHECK-NEXT: [[CONV14:%.*]] = and i64 [[ADD]], 4294967295
; CHECK-NEXT: [[ADD15:%.*]] = add i64 [[CONV14]], [[MUL6]]
; CHECK-NEXT: [[SHR_I:%.*]] = lshr i64 [[ADD15]], 32
; CHECK-NEXT: [[ADD17:%.*]] = add i64 [[ADD10]], [[SHR_I]]
; CHECK-NEXT: store i64 [[ADD17]], i64* [[RHI:%.*]], align 8
; CHECK-NEXT: [[ADD18:%.*]] = add i64 [[MUL6]], [[MUL5]]		; CHECK-NEXT: [[ADD18:%.*]] = add i64 [[MUL6]], [[MUL5]]
; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADD18]], 32		; CHECK-NEXT: [[SHL:%.*]] = shl i64 [[ADD18]], 32
; CHECK-NEXT: [[ADD19:%.*]] = add i64 [[SHL]], [[MUL7]]		; CHECK-NEXT: [[ADD19:%.*]] = add i64 [[SHL]], [[MUL7]]
; CHECK-NEXT: ret i64 [[ADD19]]		; CHECK-NEXT: ret i64 [[ADD19]]
;		;
%conv = and i64 %a, 4294967295		%conv = and i64 %a, 4294967295
%shr.i45 = lshr i64 %a, 32		%shr.i45 = lshr i64 %a, 32
%conv3 = and i64 %b, 4294967295		%conv3 = and i64 %b, 4294967295
[... 76 lines not shown ...]
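For readers following the CHECK lines above, here is a rough C++ sketch of what the test functions compute. It is not code from the patch. `mulFullSchoolbook` mirrors the instruction sequence visible in `@mul_full_64_variant0` (`%xl`, `%xh`, `%t0`..`%t3`, `%u0`..`%u2`). `mulFullWide` mirrors the folded form in the right-hand column (zext to i128, one `mul`, then `trunc` and `lshr 64`). `lowVariantThree` follows the `THREE` branch of the C comment, which is the low part that the variant3 CHECK lines still show unfolded. The struct and function names are made up for illustration, and `unsigned __int128` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two halves of a 128-bit product.
struct U128Parts {
  uint64_t lo;
  uint64_t hi;
};

// Schoolbook 64x64 -> 128 multiply from four 32x32 -> 64 products,
// following the value names used in @mul_full_64_variant0.
static U128Parts mulFullSchoolbook(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xFFFFFFFFull; // 4294967295
  uint64_t xl = x & mask, xh = x >> 32;
  uint64_t yl = y & mask, yh = y >> 32;

  uint64_t t0 = yl * xl; // none of these four products can overflow 64 bits
  uint64_t t1 = yl * xh;
  uint64_t t2 = yh * xl;
  uint64_t t3 = yh * xh;

  uint64_t t0l = t0 & mask, t0h = t0 >> 32;
  uint64_t u0 = t0h + t1;
  uint64_t u0l = u0 & mask, u0h = u0 >> 32;
  uint64_t u1 = u0l + t2;
  uint64_t u1ls = u1 << 32, u1h = u1 >> 32;
  uint64_t u2 = u0h + t3;

  return {u1ls | t0l, u2 + u1h};
}

// Folded form: widen both operands, multiply once, split the result.
static U128Parts mulFullWide(uint64_t x, uint64_t y) {
  unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
  return {static_cast<uint64_t>(p), static_cast<uint64_t>(p >> 64)};
}

// Low half as written in the THREE branch: ((rlh + rhl) << 32) + rll.
// Wrapping arithmetic makes this equal to the low 64 bits of x * y.
static uint64_t lowVariantThree(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xFFFFFFFFull;
  uint64_t xl = x & mask, xh = x >> 32;
  uint64_t yl = y & mask, yh = y >> 32;
  uint64_t rll = xl * yl, rlh = xl * yh, rhl = xh * yl;
  return ((rlh + rhl) << 32) + rll;
}

// Compares the three forms on a handful of edge-case inputs.
static bool agreeOnSamples() {
  const uint64_t samples[] = {0, 1, 0xFFFFFFFFull, 0x100000000ull,
                              0x123456789ABCDEF0ull, UINT64_MAX};
  for (uint64_t x : samples) {
    for (uint64_t y : samples) {
      U128Parts a = mulFullSchoolbook(x, y);
      U128Parts b = mulFullWide(x, y);
      if (a.lo != b.lo || a.hi != b.hi)
        return false;
      if (lowVariantThree(x, y) != b.lo)
        return false;
    }
  }
  return true;
}
```

Calling `agreeOnSamples()` from a small `main` is enough to spot-check that the expanded and folded forms agree. It is only a sanity check on a few inputs, not a proof of the transform.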

Revision Contents

Diff 180132

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

test/Transforms/AggressiveInstCombine/mul_full_32.ll

test/Transforms/AggressiveInstCombine/mul_full_64.ll
