This is an archive of the discontinued LLVM Phabricator instance.


Differential D123453

[InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor
Closed, Public

Authored by bcl5980 on Apr 9 2022, 4:40 AM.


Details

Reviewers

RKSimon
spatel
lebedev.ri
craig.topper

Commits

rG1fae4b492dd1: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a…

Summary

If c is divisible by (1 << ShAmtC), we can fold this pattern:
lshr (mul nuw x, c), ShAmtC -> mul nuw x, (c >> ShAmtC)

https://alive2.llvm.org/ce/z/ox4wAt

Fixes https://github.com/llvm/llvm-project/issues/54824
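The fold in the summary can be sanity-checked by brute force on a narrow type. The following is a minimal sketch, not LLVM code: it models i8 with plain C++ integers, and the helper names `mulNoUnsignedWrap` and `foldHoldsForI8` are made up for illustration. The alive2 link above remains the actual proof.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when `mul nuw i8 x, c` does not
// wrap, i.e. the mathematical product still fits in 8 bits.
bool mulNoUnsignedWrap(uint8_t x, uint8_t c) {
  return static_cast<unsigned>(x) * c <= 0xFFu;
}

// Exhaustively checks, for every i8 value x, that
//   lshr (mul nuw x, c), sh  ==  mul nuw x, (c >> sh)
// under the summary's precondition that c is a multiple of (1 << sh).
// Returns false if the precondition is not met or a mismatch is found.
bool foldHoldsForI8(uint8_t c, unsigned sh) {
  if (sh >= 8 || (c & ((1u << sh) - 1)) != 0)
    return false; // precondition of the fold not met
  const unsigned newC = static_cast<unsigned>(c) >> sh;
  for (unsigned x = 0; x <= 0xFFu; ++x) {
    if (!mulNoUnsignedWrap(static_cast<uint8_t>(x), c))
      continue; // the source would be poison, so any result is acceptable
    const unsigned before = (x * c) >> sh;
    const unsigned after = x * newC;
    if (after > 0xFFu || before != after)
      return false; // the new mul would wrap or compute a different value
  }
  return true;
}
```

For example, `foldHoldsForI8(52, 2)` corresponds to the `52 -> 13` case used in the tests of this revision.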

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bcl5980 created this revision. Apr 9 2022, 4:40 AM

Herald added a project: Restricted Project. Apr 9 2022, 4:40 AM

Herald added subscribers: StephenFan, hiraditya.

bcl5980 requested review of this revision. Apr 9 2022, 4:40 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B158846: Diff 421716. Apr 9 2022, 5:23 AM

The comment should be clearer.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	mul nuw, c??
1172	qc is what?

bcl5980 added inline comments. Apr 9 2022, 6:44 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	I'm sorry, it should be: lshr exact (mul nuw x, c * (1 << ShAmtC)), ShAmtC -> mul nuw, c. I will fix the comment later.

fix typo

xbolva00 added inline comments. Apr 9 2022, 7:01 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	mul nuw x, c?

bcl5980 added inline comments. Apr 9 2022, 7:16 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	yeah

Harbormaster completed remote builds in B158854: Diff 421727. Apr 9 2022, 7:37 AM

craig.topper added inline comments. Apr 9 2022, 8:36 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	c * (1 << ShAmtC) is (c << ShAmtC) isn't it?

craig.topper added inline comments. Apr 9 2022, 8:50 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1165–1166	This comment needs to move inside the first.

Please can you post the link to the general proof?

In D123453#3441088, @lebedev.ri wrote:

Please can you post the link to the general proof?

I'm sorry but can you help to explain what the general proof is?

Fix comment
Update alive2 link

In D123453#3441114, @bcl5980 wrote:

In D123453#3441088, @lebedev.ri wrote:

Please can you post the link to the general proof?

I'm sorry but can you help to explain what the general proof is?

The one you linked has hardcoded constants, yet the constants aren't hardcoded in the transform itself.

xbolva00 added inline comments. Apr 9 2022, 10:19 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1176	Once again: mul nuw x, c

Fix comment
Update general proof link

bcl5980 edited the summary of this revision. Apr 9 2022, 10:38 AM

bcl5980 edited the summary of this revision.

In D123453#3441118, @lebedev.ri wrote:

In D123453#3441114, @bcl5980 wrote:

In D123453#3441088, @lebedev.ri wrote:

Please can you post the link to the general proof?

I'm sorry but can you help to explain what the general proof is?

The one you linked has hardcoded constants, yet the constants aren't hardcoded in the transform itself.

Thanks for pointing that out. I'll pay attention to constants next time.

bcl5980 marked 6 inline comments as done. Apr 9 2022, 10:47 AM

bcl5980 added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1176	Sorry about that. I missed it again.

lebedev.ri edited the summary of this revision. Apr 9 2022, 10:49 AM

lebedev.ri edited the summary of this revision.

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	1. No one-use check needed, only a single instruction is produced.
	2. I believe, while that `lshr` is obviously `exact`, your own proof shows that there is no need to check that it is marked as such?

bcl5980 marked an inline comment as done. Apr 9 2022, 11:08 AM

bcl5980 added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

1177

If there is no one-use, we may generate 2 mul instead of mul+shr

define i64 @lshr_mul_negative_oneuse(i64 %0) {
; CHECK-LABEL: @lshr_mul_negative_oneuse(
; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP0:%.*]], 52
; CHECK-NEXT:    call void @use(i64 [[TMP2]])
; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP0]], 13
; CHECK-NEXT:    ret i64 [[TMP3]]
;
  %2 = mul nuw i64 %0, 52
  call void @use(i64 %2)
  %3 = lshr i64 %2, 2
  ret i64 %3
}

We need a condition to make sure ShAmtC is divisible by NewMulC. The proof uses shl, so that always holds there. But in the code we still need to check the exact flag.

bcl5980 added inline comments. Apr 9 2022, 11:12 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	> We need a condition to make sure ShAmtC is divisible by NewMulC. The proof uses shl, so that always holds there. But in the code we still need to check the exact flag.
	Should be "make sure MulC is divisible by NewMulC".

lebedev.ri added inline comments. Apr 9 2022, 11:16 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	> If there is no one-use, we may generate 2 mul instead of mul+shr
	We only create a single `mul` here, do we not?
	> We need a condition to make sure ShAmtC is divisible by NewMulC. The proof uses shl, so that always holds there. But in the code we still need to check the exact flag.
	Can you show the counter-proof that shows that not checking for `exact` is incorrect?

bcl5980 added inline comments. Apr 9 2022, 11:22 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	> > If there is no one-use, we may generate 2 mul instead of mul+shr
	> We only create a single `mul` here, do we not?
	Yeah, we create only a single mul, but most of the time mul should be heavier than lshr, am I right?
	> > We need a condition to make sure ShAmtC is divisible by NewMulC. The proof uses shl, so that always holds there. But in the code we still need to check the exact flag.
	> Can you show the counter-proof that shows that not checking for `exact` is incorrect?
	I don't know how to write a counter-proof with alive2, but this is the negative case on my machine after removing exact:

define i64 @lshr_mul_negative_noexact(i64 %0) {
; CHECK-LABEL: @lshr_mul_negative_noexact(
; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP0:%.*]], 13
; CHECK-NEXT:    ret i64 [[TMP2]]
;
  %2 = mul nuw i64 %0, 53
  %3 = lshr i64 %2, 2
  ret i64 %3
}

I see. Then the proof is wrong.
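To see concretely why the 53 case is a miscompile, the two computations can be compared with ordinary 64-bit arithmetic. This is only an illustrative sketch; the function names `original` and `foldedWithoutCheck` are made up here and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Source pattern: lshr (mul nuw i64 x, 53), 2
uint64_t original(uint64_t x) { return (x * 53) >> 2; }

// What a fold without any divisibility check would emit:
// mul nuw i64 x, (53 >> 2), i.e. x * 13
uint64_t foldedWithoutCheck(uint64_t x) { return x * 13; }
```

The two agree for x = 1 (both give 13) but differ for x = 4: the source yields (4 * 53) >> 2 = 53, while the folded form yields 4 * 13 = 52. The low bits of 53 that the constant shift discards still contribute to the product.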

Harbormaster completed remote builds in B158869: Diff 421745. Apr 9 2022, 11:28 AM

bcl5980 added inline comments. Apr 9 2022, 11:32 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

1177

If there is no one-use, we may generate 2 mul instead of mul+shr

We only create a single mul here, do we not?

declare i64 @use(i64, i64)
define i64 @lshr_mul_negative_oneuse(i64 %0) {
; CHECK-LABEL: @lshr_mul_negative_oneuse(
; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP0:%.*]], 52
; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP0]], 13
; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @use(i64 [[TMP2]], i64 [[TMP3]])
; CHECK-NEXT:    ret i64 [[TMP4]]
;
  %2 = mul nuw i64 %0, 52
  %3 = lshr i64 %2, 2
  %4 = call i64 @use(i64 %2, i64 %3)
  ret i64 %4
}

I think this case makes it clearer why we need the one-use check.

Fix test error

In D123453#3441148, @lebedev.ri wrote:

I see. Then the proof is wrong.

I'm sorry, I really don't know how to prove this with alive2. Can you teach me how to prove it?

Fix comment

In D123453#3441197, @bcl5980 wrote:

In D123453#3441148, @lebedev.ri wrote:

I see. Then the proof is wrong.

I'm sorry, I really don't know how to prove this with alive2. Can you teach me how to prove it?

You need to either adjust the fold to do what the proof says, or write another proof for the change in question.

Abstracting a bit, is there a generalization of this pattern where the shift amounts don't match?

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	The number of instructions didn't increase, so we're fine.

Harbormaster completed remote builds in B158875: Diff 421751. Apr 9 2022, 12:56 PM

For example, what about this:
~~https://alive2.llvm.org/ce/z/aov-GB~~
https://alive2.llvm.org/ce/z/ox4wAt (no need for exact, only nuw)

In D123453#3441216, @lebedev.ri wrote:

For example, what about this:
~~https://alive2.llvm.org/ce/z/aov-GB~~
https://alive2.llvm.org/ce/z/ox4wAt (no need for exact, only nuw)

%t0 = lshr i8 %C1, %C2
%t1 = shl i8 %t0, %C2
%precond = icmp eq i8 %t1, %C1
call void @llvm.assume(i1 %precond)

This part of the proof encodes exact, so we can't remove the exact check in the code. Otherwise, we need to do something like:

if (Op0->hasOneUse()) {
  APInt NewMulC = MulC->lshr(ShAmtC);
  if (MulC->eq(NewMulC.shl(ShAmtC)))
    return BinaryOperator::CreateNUWMul(X, ConstantInt::get(Ty, NewMulC));
}
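The `MulC->eq(NewMulC.shl(ShAmtC))` test above is a shift round trip: shift right, shift back left, and compare. A small sketch with plain 64-bit integers (not `APInt`; the function names are hypothetical and a shift amount below 64 is assumed) shows that this is the same as asking whether the low ShAmtC bits of the constant are all zero.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the shape of: NewMulC = MulC.lshr(ShAmtC); MulC == NewMulC.shl(ShAmtC)
// Assumes shAmt < 64 so the shifts are well defined.
bool shiftRoundTrips(uint64_t mulC, unsigned shAmt) {
  const uint64_t newMulC = mulC >> shAmt;
  return (newMulC << shAmt) == mulC;
}

// Equivalent formulation: the bits that the right shift discards are all zero.
bool lowBitsAreZero(uint64_t mulC, unsigned shAmt) {
  const uint64_t mask = (uint64_t{1} << shAmt) - 1;
  return (mulC & mask) == 0;
}
```

Both predicates accept 52 with a shift of 2 and reject 53 with a shift of 2.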

bcl5980 added inline comments. Apr 9 2022, 8:46 PM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1177	> The number of instructions didn't increase, so we're fine.
	I still insist on the one-use check; mul is slower than lshr on most targets.

In D123453#3441326, @bcl5980 wrote:
In D123453#3441216, @lebedev.ri wrote:

For example, what about this:
~~https://alive2.llvm.org/ce/z/aov-GB~~
https://alive2.llvm.org/ce/z/ox4wAt (no need for exact, only nuw)
%t0 = lshr i8 %C1, %C2
%t1 = shl i8 %t0, %C2
%precond = icmp eq i8 %t1, %C1
call void @llvm.assume(i1 %precond)
This part of the proof encodes exact, so we can't remove the exact check in the code. Otherwise, we need to do something like:
if (Op0->hasOneUse()) {
  APInt NewMulC = MulC->lshr(ShAmtC);
  if (MulC->eq(NewMulC.shl(ShAmtC)))
    return BinaryOperator::CreateNUWMul(X, ConstantInt::get(Ty, NewMulC));
}

Couldn't the exact be set because the LHS of the multiply is known to have trailing zero bits, having nothing to do with the trailing zeros of the constant on the RHS? Would the transform still be valid in that case?

In D123453#3441329, @craig.topper wrote:
In D123453#3441326, @bcl5980 wrote:
In D123453#3441216, @lebedev.ri wrote:

For example, what about this:
~~https://alive2.llvm.org/ce/z/aov-GB~~
https://alive2.llvm.org/ce/z/ox4wAt (no need for exact, only nuw)
%t0 = lshr i8 %C1, %C2
%t1 = shl i8 %t0, %C2
%precond = icmp eq i8 %t1, %C1
call void @llvm.assume(i1 %precond)
This part of the proof encodes exact, so we can't remove the exact check in the code. Otherwise, we need to do something like:
if (Op0->hasOneUse()) {
  APInt NewMulC = MulC->lshr(ShAmtC);
  if (MulC->eq(NewMulC.shl(ShAmtC)))
    return BinaryOperator::CreateNUWMul(X, ConstantInt::get(Ty, NewMulC));
}
Couldn't the exact be set because the LHS of the multiply is known to have trailing zero bits, having nothing to do with the trailing zeros of the constant on the RHS? Would the transform still be valid in that case?

Thanks for catching that. I hadn't anticipated this case. So we should use the check if (MulC->eq(NewMulC.shl(ShAmtC))) so that an exact flag that comes from the mul's LHS cannot trigger the fold.
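A concrete instance of the case raised above, as a hedged sketch with plain integers (all function names are invented for illustration): with x = 4 and a multiplier of 3, the shift by 2 is exact only because of x's trailing zeros, yet folding the constant alone would produce a multiply by zero.

```cpp
#include <cassert>
#include <cstdint>

// True when shifting v right by sh discards no set bits, i.e. an `exact`
// flag on the lshr would be justified for this particular value.
bool shiftIsExact(uint64_t v, unsigned sh) { return ((v >> sh) << sh) == v; }

// Source pattern: lshr exact (mul nuw x, c), sh
uint64_t source(uint64_t x, uint64_t c, unsigned sh) { return (x * c) >> sh; }

// A fold keyed only on the exact flag would emit: mul nuw x, (c >> sh)
uint64_t foldKeyedOnExact(uint64_t x, uint64_t c, unsigned sh) {
  return x * (c >> sh);
}

// The constant-only check that the thread settles on.
bool constantIsDivisible(uint64_t c, unsigned sh) {
  return ((c >> sh) << sh) == c;
}
```

Here `source(4, 3, 2)` is 3 and the shift is exact, but `foldKeyedOnExact(4, 3, 2)` is 0. The constant-only check rejects c = 3 with a shift of 2, so the fold does not fire.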

bcl5980 updated this revision to Diff 421765. Apr 9 2022, 9:17 PM

bcl5980 edited the summary of this revision.

bcl5980 updated this revision to Diff 421766. Apr 9 2022, 9:24 PM

Harbormaster completed remote builds in B158889: Diff 421766. Apr 9 2022, 10:16 PM

bcl5980 retitled this revision from [InstCombine] Fold mul nuw+lshr exact to a single multiplication when the latter is a factor to [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a factor. Apr 10 2022, 9:42 PM

craig.topper added inline comments. Apr 11 2022, 9:58 AM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	Not related to this patch, but this existing transform is incorrect if MulC happens to be 2 and BitWidth is 2 and ShAmtC is 1. MulC-1 would be 1, which is a power of 2, but it doesn't have 2 bits set like the transform expects. Hopefully we would always canonicalize the Mul to a `shl` first so this transform won't fire for that case, but I'm not sure. @spatel @lebedev.ri

spatel mentioned this in rG1206a18d417a: [InstCombine] guard against splat-mul corner case. Apr 11 2022, 12:53 PM

spatel added inline comments. Apr 11 2022, 12:56 PM

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1172	Good catch. I'm not sure how to write a test for it either, but I added a "BitWidth > 2" clause: 1206a18d417a
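The corner case described above can be replayed with plain integers. This is a rough model, not the LLVM implementation: the function names are hypothetical, `mulC >= 2` is assumed, and the predicate reproduces the splat-fold condition without the "BitWidth > 2" clause.

```cpp
#include <cassert>

// Splat-fold condition as it stood before the "BitWidth > 2" clause:
//   ShAmtC * 2 == BitWidth && (MulC - 1) is a power of 2 && log2(MulC) == ShAmtC
// Assumes mulC >= 2.
bool splatCondWithoutGuard(unsigned bitWidth, unsigned mulC, unsigned shAmt) {
  const unsigned m = mulC - 1;
  const bool isPow2 = m != 0 && (m & (m - 1)) == 0;
  unsigned log2 = 0;
  for (unsigned v = mulC; v > 1; v >>= 1)
    ++log2; // floor(log2(mulC))
  return shAmt * 2 == bitWidth && isPow2 && log2 == shAmt;
}

// Value of: lshr (mul x, mulC), shAmt computed in a bitWidth-bit integer.
unsigned sourceValue(unsigned bitWidth, unsigned x, unsigned mulC,
                     unsigned shAmt) {
  const unsigned mask = (1u << bitWidth) - 1;
  return ((x * mulC) & mask) >> shAmt;
}

// Value the splat fold produces: and X, (MulC - 2)
unsigned splatFolded(unsigned x, unsigned mulC) { return x & (mulC - 2); }
```

With BitWidth = 2, MulC = 2, ShAmtC = 1 the unguarded condition holds. For x = 1 the source computes (1 * 2) >> 1 = 1, while the fold computes 1 & 0 = 0. The intended shape, for example BitWidth = 8, MulC = 17, ShAmtC = 4, behaves as expected: x = 5 gives 5 on both sides.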

rebase

fix build error

Harbormaster completed remote builds in B159157: Diff 422123. Apr 11 2022, 10:43 PM

@lebedev.ri @craig.topper
What can I do now to continue the patch?

Ping.

Please pre-commit the tests with baseline CHECKs. Add one more positive test with vector type, so we know that works correctly -- at least for splat (uniform) constants. A negative test with 'nsw' would also be good. What happens if we have both 'nsw' and 'nuw'?

There's an existing fold for: (X * C2) << C1 --> X * (C2 << C1)

...and it does not check for one-use. For consistency, we probably don't want the one-use restriction here either. If there's already a multiply in the pattern before this transform, another one is probably fine? The backend could theoretically decompose it back to shift (but I don't think we have that transform currently).
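For comparison, the existing `shl` fold mentioned above needs no wrap flags at all, because both sides are the same product modulo 2^N. A brute-force sketch over i8 (the function name is hypothetical, and a shift amount below 8 is assumed):

```cpp
#include <cassert>
#include <cstdint>

// Checks (X * C2) << C1 == X * (C2 << C1) for every i8 value X, using
// wrapping 8-bit arithmetic. Assumes c1 < 8.
bool shlOfMulFoldHoldsForI8(uint8_t c2, unsigned c1) {
  const uint8_t newC = static_cast<uint8_t>(c2 << c1);
  for (unsigned x = 0; x <= 0xFFu; ++x) {
    const uint8_t product = static_cast<uint8_t>(x * c2);
    const uint8_t before = static_cast<uint8_t>(product << c1);
    const uint8_t after = static_cast<uint8_t>(x * newC);
    if (before != after)
      return false;
  }
  return true;
}
```

This only speaks to correctness; whether replacing a shift with a second multiply is profitable is the separate one-use question discussed in this thread.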

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1178–1179	Replace 'c' with 'MulC' in this comment to make it clearer.

void test(unsigned* pa, unsigned* pb, unsigned short i)
{
    unsigned a = i * 100;
    unsigned b = a >> 2;
    *pa = a;
    *pb = b;
}

I wrote a test to check whether the backend can optimize the pattern without the one-use check.
This is the AArch64 baseline with one-use:

// %bb.0:                               // %entry
	and	w8, w2, #0xffff
	mov	w9, #100
	mul	w8, w8, w9
	lsr	w9, w8, #2
	str	w8, [x0]
	str	w9, [x1]
	ret
                                        // -- End function

This is the AArch64 result when we remove the one-use check:

"?test@@YAXPEAI0G@Z":                   // @"?test@@YAXPEAI0G@Z"
// %bb.0:                               // %entry
	mov	w8, #100
	and	w9, w2, #0xffff
	mov	w10, #25
	mul	w8, w9, w8
	mul	w9, w9, w10
	str	w8, [x0]
	str	w9, [x1]
	ret
                                        // -- End function

This is the X86 baseline with one-use:

"?test@@YAXPEAI0G@Z":                   # @"?test@@YAXPEAI0G@Z"
# %bb.0:                                # %entry
	movzwl	%r8w, %eax
	imull	$100, %eax, %eax
	movl	%eax, (%rcx)
	shrl	$2, %eax
	movl	%eax, (%rdx)
	retq
                                        # -- End function

This is the X86 result when we remove the one-use check:

"?test@@YAXPEAI0G@Z":                   # @"?test@@YAXPEAI0G@Z"
# %bb.0:                                # %entry
	movzwl	%r8w, %eax
	imull	$100, %eax, %r8d
	leal	(%rax,%rax,4), %eax
	leal	(%rax,%rax,4), %eax
	movl	%r8d, (%rcx)
	movl	%eax, (%rdx)
	retq
                                        # -- End function

It is not easy for the backend to handle this case, as we would need to loop over all uses of the mul's operand 0 to find the candidate.

In D123453#3459579, @spatel wrote:

There's an existing fold for: (X * C2) << C1 --> X * (C2 << C1)

...and it does not check for one-use. For consistency, we probably don't want the one-use restriction here either. If there's already a multiply in the pattern before this transform, another one is probably fine? The backend could theoretically decompose it back to shift (but I don't think we have that transform currently).

Can we add one-use for this pattern also? https://godbolt.org/z/x3bo7q54j

bcl5980 mentioned this in rG8242fc7f8ad3: [InstCombine] add tests for mul+lshr; NFC. Apr 20 2022, 1:14 AM

update comments and rebase

Harbormaster completed remote builds in B160400: Diff 423835. Apr 20 2022, 2:21 AM

In D123453#3461148, @bcl5980 wrote:

It is not easy for the backend to handle this case, as we would need to loop over all uses of the mul's operand 0 to find the candidate.

Ok - I agree that the posted examples don't look like wins. There is no quick fix for the backend, so we can leave the one-use check in for now. Please add a code comment though to explain why we have the one-use limitation.

If you want to make the other transform (for "shl") consistent with this one by adding a one-use check to it too, that can be a follow-up patch. That transform has been around for a long time, so it is possible that someone may notice an independent performance difference from that change.

llvm/test/Transforms/InstCombine/shift-logic.ll
275	If we do not propagate the 'nsw' on this, we will not be able to recover it in general: https://alive2.llvm.org/ce/z/tbckkt
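That carrying `nsw` over to the new multiply is sound can be spot-checked by brute force on i8. This is a sketch under stated assumptions, not a substitute for the alive2 proof linked above: the helper names are made up, and signedness is modelled by hand.

```cpp
#include <cassert>
#include <cstdint>

// Interprets an 8-bit pattern as a signed i8 value (two's complement).
int asSigned(uint8_t v) { return v < 128 ? v : static_cast<int>(v) - 256; }

// For every i8 x where `mul nuw nsw x, c` does not wrap in either sense,
// checks that `mul x, (c >> sh)` also stays within the signed i8 range.
// Returns false if c is not a multiple of (1 << sh) or a violation is found.
bool nswPreservedForI8(uint8_t c, unsigned sh) {
  if (sh >= 8 || (c & ((1u << sh) - 1)) != 0)
    return false; // precondition of the fold not met
  const uint8_t newC = static_cast<uint8_t>(c >> sh);
  for (unsigned xi = 0; xi <= 0xFFu; ++xi) {
    const uint8_t x = static_cast<uint8_t>(xi);
    const bool nuw = static_cast<unsigned>(x) * c <= 0xFFu;
    const int signedProd = asSigned(x) * asSigned(c);
    const bool nsw = signedProd >= -128 && signedProd <= 127;
    if (!(nuw && nsw))
      continue; // the source would be poison
    const int newSignedProd = asSigned(x) * asSigned(newC);
    if (newSignedProd < -128 || newSignedProd > 127)
      return false; // nsw on the new mul would be wrong
  }
  return true;
}
```

This matches the direction taken in the final diff, where the new multiply copies the original's `nsw` flag.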

keep nsw flag

bcl5980 marked an inline comment as done. Apr 20 2022, 8:32 AM

LGTM - see inline remark for minor edit to a code comment.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
1176	I would phrase this differently:
	// The one-use check is not strictly necessary, but codegen may not be able
	// to invert the transform and perf may suffer with an extra mul instruction.

This revision is now accepted and ready to land. Apr 20 2022, 8:46 AM

Harbormaster completed remote builds in B160459: Diff 423915. Apr 20 2022, 9:07 AM

This revision was landed with ongoing or failed builds. Apr 20 2022, 9:15 AM

Closed by commit rG1fae4b492dd1: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a… (authored by bcl5980).

This revision was automatically updated to reflect the committed changes.

bcl5980 added a commit: rG1fae4b492dd1: [InstCombine] Fold mul nuw+lshr to a single multiplication when the latter is a….

bcl5980 mentioned this in D124183: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1). Apr 21 2022, 9:24 AM

bcl5980 mentioned this in rGb543d28df7b0: [InstCombine] Add one use limitation for (X * C2) << C1 --> X * (C2 << C1). Apr 21 2022, 9:34 AM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp	33 lines
llvm/test/Transforms/InstCombine/shift-logic.ll	26 lines

Diff 423931

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp

[1,156 lines hidden; enclosing context: if (match(Op0, m_OneUse(m_Trunc(m_Instruction(TruncSrc)))) &&]
         Value *Trunc = Builder.CreateTrunc(SumShift, Ty, I.getName());
 
         // If the first shift does not cover the number of bits truncated, then
         // we require a mask to get rid of high bits in the result.
         APInt MaskC = APInt::getAllOnes(BitWidth).lshr(ShAmtC);
         return BinaryOperator::CreateAnd(Trunc, ConstantInt::get(Ty, MaskC));
       }
     }
 
+    const APInt *MulC;
+    if (match(Op0, m_NUWMul(m_Value(X), m_APInt(MulC)))) {
       // Look for a "splat" mul pattern - it replicates bits across each half of
       // a value, so a right shift is just a mask of the low bits:
       // lshr i[2N] (mul nuw X, (2^N)+1), N --> and iN X, (2^N)-1
       // TODO: Generalize to allow more than just half-width shifts?
-    const APInt *MulC;
-    if (match(Op0, m_NUWMul(m_Value(X), m_APInt(MulC))) &&
-        BitWidth > 2 && ShAmtC * 2 == BitWidth && (*MulC - 1).isPowerOf2() &&
+      if (BitWidth > 2 && ShAmtC * 2 == BitWidth && (*MulC - 1).isPowerOf2() &&
           MulC->logBase2() == ShAmtC)
         return BinaryOperator::CreateAnd(X, ConstantInt::get(Ty, *MulC - 2));
 
+      // The one-use check is not strictly necessary, but codegen may not be
+      // able to invert the transform and perf may suffer with an extra mul
+      // instruction.
+      if (Op0->hasOneUse()) {
+        APInt NewMulC = MulC->lshr(ShAmtC);
+        // if c is divisible by (1 << ShAmtC):
+        // lshr (mul nuw x, MulC), ShAmtC -> mul nuw x, (MulC >> ShAmtC)
+        if (MulC->eq(NewMulC.shl(ShAmtC))) {
+          auto *NewMul =
+              BinaryOperator::CreateNUWMul(X, ConstantInt::get(Ty, NewMulC));
+          BinaryOperator *OrigMul = cast<BinaryOperator>(Op0);
+          NewMul->setHasNoSignedWrap(OrigMul->hasNoSignedWrap());
+          return NewMul;
+        }
+      }
+    }
 
     // Try to narrow a bswap:
     // (bswap (zext X)) >> C --> zext (bswap X >> C')
     // In the case where the shift amount equals the bitwidth difference, the
     // shift is eliminated.
     if (match(Op0, m_OneUse(m_Intrinsic<Intrinsic::bswap>(
                        m_OneUse(m_ZExt(m_Value(X))))))) {
       // TODO: If the shift amount is less than the zext, we could shift left.
       unsigned SrcWidth = X->getType()->getScalarSizeInBits();
[214 lines hidden]

llvm/test/Transforms/InstCombine/shift-logic.ll

[253 lines hidden]
   %sh1 = ashr exact i32 %x, 16
   %t0 = xor i32 %sh1, shl (i32 ptrtoint (i32* @g to i32), i32 16)
   %t27 = ashr exact i32 %t0, 16
   ret i32 %t27
 }
 
 define i64 @lshr_mul(i64 %0) {
 ; CHECK-LABEL: @lshr_mul(
-; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP0:%.*]], 52
-; CHECK-NEXT:    [[TMP3:%.*]] = lshr exact i64 [[TMP2]], 2
-; CHECK-NEXT:    ret i64 [[TMP3]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP0:%.*]], 13
+; CHECK-NEXT:    ret i64 [[TMP2]]
 ;
   %2 = mul nuw i64 %0, 52
   %3 = lshr i64 %2, 2
   ret i64 %3
 }
 
 define i64 @lshr_mul_nuw_nsw(i64 %0) {
 ; CHECK-LABEL: @lshr_mul_nuw_nsw(
-; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw nsw i64 [[TMP0:%.*]], 52
-; CHECK-NEXT:    [[TMP3:%.*]] = lshr exact i64 [[TMP2]], 2
-; CHECK-NEXT:    ret i64 [[TMP3]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw nsw i64 [[TMP0:%.*]], 13
+; CHECK-NEXT:    ret i64 [[TMP2]]
 ;
   %2 = mul nuw nsw i64 %0, 52
   %3 = lshr i64 %2, 2
   ret i64 %3
 }
 
 define <4 x i32> @lshr_mul_vector(<4 x i32> %0) {
 ; CHECK-LABEL: @lshr_mul_vector(
-; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw <4 x i32> [[TMP0:%.*]], <i32 52, i32 52, i32 52, i32 52>
-; CHECK-NEXT:    [[TMP3:%.*]] = lshr exact <4 x i32> [[TMP2]], <i32 2, i32 2, i32 2, i32 2>
-; CHECK-NEXT:    ret <4 x i32> [[TMP3]]
+; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw <4 x i32> [[TMP0:%.*]], <i32 13, i32 13, i32 13, i32 13>
+; CHECK-NEXT:    ret <4 x i32> [[TMP2]]
 ;
   %2 = mul nuw <4 x i32> %0, <i32 52, i32 52, i32 52, i32 52>
   %3 = lshr <4 x i32> %2, <i32 2, i32 2, i32 2, i32 2>
   ret <4 x i32> %3
 }
 
 define i64 @lshr_mul_negative_noexact(i64 %0) {
 ; CHECK-LABEL: @lshr_mul_negative_noexact(
[24 lines hidden]
 ; CHECK-NEXT:    [[TMP2:%.*]] = mul i64 [[TMP0:%.*]], 52
 ; CHECK-NEXT:    [[TMP3:%.*]] = lshr exact i64 [[TMP2]], 2
 ; CHECK-NEXT:    ret i64 [[TMP3]]
 ;
   %2 = mul i64 %0, 52
   %3 = lshr i64 %2, 2
   ret i64 %3
 }
+
+define i64 @lshr_mul_negative_nsw(i64 %0) {
+; CHECK-LABEL: @lshr_mul_negative_nsw(
+; CHECK-NEXT:    [[TMP2:%.*]] = mul nsw i64 [[TMP0:%.*]], 52
+; CHECK-NEXT:    [[TMP3:%.*]] = lshr exact i64 [[TMP2]], 2
+; CHECK-NEXT:    ret i64 [[TMP3]]
+;
+  %2 = mul nsw i64 %0, 52
+  %3 = lshr i64 %2, 2
+  ret i64 %3
+}
