This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
lib/CodeGen/SelectionDAG/
-
CodeGen/
-
SelectionDAG/
11/15
DAGCombiner.cpp
-
test/CodeGen/
-
CodeGen/
-
AArch64/
3/6
urem-seteq-nonzero.ll
-
RISCV/
2/2
addimm-mulimm.ll
-
X86/
-
urem-seteq-nonzero.ll

Differential D83153

[DAGCombiner] Prevent regression in isMulAddWithConstProfitable
Abandoned, Public

Authored by benshi001 on Jul 3 2020, 9:24 PM.


Details

Reviewers

spatel
lenary
craig.topper
bogner
t.p.northover

Summary

Prevent folding (mul (add x, c1), c2) to (add (mul x, c2), c1*c2)
if c1 can be encoded directly as an instruction immediate, while c1*c2
has to be materialized into a register.
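
To illustrate the heuristic in the summary, here is a minimal standalone sketch in plain C++. It is not the patch itself: `foldLooksProfitable` and `fitsSimm12` are hypothetical names, and the 12-bit signed range is only an assumed stand-in for a target's `isLegalAddImmediate` hook.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a target's add-immediate legality query.
using ImmPredicate = std::function<bool(int64_t)>;

// Assumed example target: a 12-bit signed add immediate.
bool fitsSimm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Returns false when folding (x + c1) * c2 into x * c2 + c1 * c2 would trade
// a cheap immediate c1 for a constant c1 * c2 that needs its own register.
bool foldLooksProfitable(int64_t c1, int64_t c2,
                         const ImmPredicate &isLegalAddImm) {
  // Multiply as unsigned so wraparound is well defined, then reinterpret.
  const uint64_t wrapped =
      static_cast<uint64_t>(c1) * static_cast<uint64_t>(c2);
  const int64_t c1c2 = static_cast<int64_t>(wrapped);
  return !(isLegalAddImm(c1) && !isLegalAddImm(c1c2));
}

// Usage sketch with small constants.
void demo() {
  assert(!foldLooksProfitable(1971, 19, fitsSimm12)); // 37449 exceeds simm12
  assert(foldLooksProfitable(3, 5, fitsSimm12));      // 15 still fits
}
```

Passing the predicate as a parameter keeps the sketch target-independent, which mirrors how the real decision depends on the target's lowering hooks.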

Diff Detail

Event Timeline

benshi001 created this revision. Jul 3 2020, 9:24 PM

Herald added a project: Restricted Project. Jul 3 2020, 9:24 PM

Herald added subscribers: llvm-commits, ecnelises, luismarques and 22 others.

My patch generates better code for at least x86, AArch64 and RISC-V.

For x86's test urem-seteq-nonzero.ll, fewer instructions are emitted.

For AArch64's test urem-seteq-nonzero.ll,

2 cases have one more instruction emitted,
2 other cases have one fewer instruction emitted,
9 other cases have the same instruction count, but have madd replaced by mul.

Since madd has higher latency than mul, I think my change is also a net improvement for AArch64.

There is no existing test case for RISC-V, but my change also improves RISC-V code.
For example,

%tmp0 = add i32 %x, 1971
%tmp1 = mul i32 %tmp0, 19

Without the patch, LLVM generates:

addi    a1, zero, 19
mul     a0, a0, a1
lui     a1, 9
addi    a1, a1, 585
add     a0, a0, a1

With my patch, this becomes:

addi    a0, a0, 1971
addi    a1, zero, 19
mul     a0, a0, a1
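
The constants in the two sequences above can be checked by hand: 1971 fits addi's 12-bit signed immediate, while 1971 * 19 = 37449 does not and is built as lui 9 plus addi 585 (9 * 4096 + 585). A small sketch of that arithmetic follows; `fitsSimm12` and `splitForLuiAddi` are illustrative helpers, not LLVM functions, and the split is only written for non-negative values.

```cpp
#include <cassert>
#include <cstdint>

// Range of a 12-bit signed immediate, as accepted by RISC-V addi.
bool fitsSimm12(int64_t v) { return v >= -2048 && v <= 2047; }

struct LuiAddi {
  int32_t hi20; // operand for lui
  int32_t lo12; // operand for the following addi
};

// Splits a non-negative constant into a lui/addi pair (illustrative helper).
LuiAddi splitForLuiAddi(int32_t value) {
  assert(value >= 0 && value <= INT32_MAX - 0x800);
  // Adding 0x800 rounds to the nearest 4096 so the remainder fits simm12.
  const int32_t hi = (value + 0x800) / 4096;
  const int32_t lo = value - hi * 4096;
  return {hi, lo};
}
```

For 37449 this yields the pair (9, 585), matching the `lui a1, 9` and `addi a1, a1, 585` instructions in the unpatched output.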

Harbormaster failed remote builds in B62890: Diff 275478! Jul 3 2020, 10:11 PM

The build failures are not related to my changes; today's LLVM master branch shows the same failures after running "make check-all".

benshi001 edited the summary of this revision. Jul 4 2020, 1:25 AM

lebedev.ri added a subscriber: lebedev.ri. Jul 4 2020, 3:03 AM

lebedev.ri added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15822–15826	You also need a second piece of the puzzle, because we would have already performed this transform before ever getting here: https://godbolt.org/z/TE_bzk You need to add an inverse transform into DAGCombiner.cpp.

For these kinds of patches where you add new tests which show a difference in code quality it's helpful if you can split the tests into a separate patch. You can then add that other patch as a dependency for this one, creating a Stack in Phabricator. That way it's easier to see the differences in the generated code.

spatel added inline comments. Jul 4 2020, 8:07 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817–15820	How do we know these constants are not wider than 64-bits?
15822–15826	Right - I asked if the motivation for this was GEP math: http://lists.llvm.org/pipermail/llvm-dev/2020-July/142979.html ...but I did not see an answer. If yes, then we should have GEP tests. If not, then this patch isn't enough.

benshi001 marked 3 inline comments as done. Jul 4 2020, 9:12 AM

benshi001 added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817–15820	In the real world, no machine encodes a large immediate into an instruction. Even x86 supports legal add immediates only up to 32 bits, and ARM and RISC-V only up to 12 bits. So I think using int64_t here is OK. If the real value exceeds that, the DAG falls back to the normal path and gets transformed. What's your opinion?
15822–15826	No. I am targeting ordinary IR transforms, not GEP. So do I need to add an inverse transform?

benshi001 marked an inline comment as done. Jul 4 2020, 9:37 AM

benshi001 added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15822–15826	Where is the other place that performs the same transform before reaching this code?

lebedev.ri added inline comments. Jul 4 2020, 9:46 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817–15820	"In the real world, no machine encodes a large immediate into an instruction. Even x86 supports legal add immediates only up to 32 bits, and ARM and RISC-V only up to 12 bits." I agree that `isLegalAddImmediate()` takes int64_t, but that is orthogonal to the question. The question being: when doing `int64_t C1 = C1Node->getSExtValue();`, is it guaranteed that the `APInt` value in `C1Node` is at most 64 bits wide? If not, this code will crash with an assertion.

Thanks for all of your suggestions. I have uploaded a new revision of the patch.

Separate the test cases into another patch to show the improvement.

Done. https://reviews.llvm.org/D83159

Make sure c1 and c2 do not exceed int64, to avoid an assertion failure.

Done. One more if-statement is added to check that.

Add an inverse transform in case "opt -instcombine" has already been performed.

Shall we split this inverse transform into another patch? At least this patch works
for "clang a.c -O2 -Wall --target=x86/riscv/aarch64" if a.c contains a code pattern like
"(a + 999) * 888", so it can prevent the regression in that situation.

benshi001 marked 2 inline comments as done. Jul 5 2020, 4:57 AM

MaskRay added inline comments. Jul 5 2020, 6:12 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817	This should be `>=`
15825	C1 * C2 may trigger a signed overflow. (C1 * C2) may be negative and left shifting a signed integer is UB before C++20. Please use llvm::SignExtend64(uint64_t, unsigned)
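
As a sketch of the point about undefined behavior: the product can be formed in unsigned arithmetic, where wraparound is well defined, and then sign-extended from the operand width without shifting a negative value. `signExtendLowBits` below is a hypothetical helper intended to behave like `llvm::SignExtend64(X, B)` for widths 1 to 64; it is written in plain C++ so it compiles on its own and is not copied from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Interprets the low `bits` bits of `x` as a two's-complement value.
int64_t signExtendLowBits(uint64_t x, unsigned bits) {
  assert(bits > 0 && bits <= 64 && "bit width out of range");
  if (bits == 64)
    return static_cast<int64_t>(x);
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  const uint64_t signBit = uint64_t{1} << (bits - 1);
  const uint64_t low = x & mask;
  // Flip the sign bit, then subtract it; all arithmetic stays unsigned.
  // The final conversion is modular on mainstream compilers (and by
  // definition since C++20).
  return static_cast<int64_t>((low ^ signBit) - signBit);
}

// c1 * c2 truncated to `bits` bits and read back as a signed value.
int64_t wrappedProduct(int64_t c1, int64_t c2, unsigned bits) {
  const uint64_t product =
      static_cast<uint64_t>(c1) * static_cast<uint64_t>(c2);
  return signExtendLowBits(product, bits);
}
```

For example, `wrappedProduct(1971, 19, 32)` gives 37449, and a product that wraps at 32 bits, such as 65536 * 65536, gives 0 rather than triggering signed overflow.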

benshi001 marked an inline comment as done. Jul 6 2020, 1:15 AM

benshi001 added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817	This should be >, not >=, otherwise riscv64/aarch64 will still regress for i64 add-mul.
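
The two comparisons discussed here differ only for types that are exactly 64 bits wide. A tiny sketch, with hypothetical function names, of which widths take the early exit under each variant:

```cpp
#include <cassert>
#include <cstdint>

// Early exit as written in the patch: only types wider than 64 bits skip.
bool skipsWithGreater(unsigned bits) { return bits > 8 * sizeof(int64_t); }

// The variant suggested in review: 64-bit types would skip as well.
bool skipsWithGreaterEqual(unsigned bits) {
  return bits >= 8 * sizeof(int64_t);
}

void demo() {
  assert(!skipsWithGreater(64));     // i64 still reaches the immediate check
  assert(skipsWithGreaterEqual(64)); // i64 would return early
  assert(skipsWithGreater(128) && skipsWithGreaterEqual(128));
}
```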

One more change:

Added a check for whether c1*c2 overflows.
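
The thread does not show how this overflow check was written. One possible way to detect signed 64-bit overflow, assuming a GCC or Clang toolchain, is the `__builtin_mul_overflow` builtin; `productOverflowsInt64` is a hypothetical name and this is only a sketch, not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when c1 * c2 does not fit in a signed 64-bit integer.
// __builtin_mul_overflow is a GCC/Clang builtin, not standard C++.
bool productOverflowsInt64(int64_t c1, int64_t c2) {
  int64_t result;
  return __builtin_mul_overflow(c1, c2, &result);
}

void demo() {
  assert(!productOverflowsInt64(1971, 19));
  assert(productOverflowsInt64(INT64_MAX, 2));
}
```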

benshi001 marked an inline comment as done. Jul 6 2020, 2:30 AM

Change list according to all your comments.

Separate the test cases into another patch to show the improvement.

Done. https://reviews.llvm.org/D83159

Make sure c1 and c2 do not exceed int64, to avoid an assertion failure.

Done. One more if-statement is added to check that.
(The condition should be >, not >=, otherwise riscv64 cannot be optimized.)

Check whether c1*c2 overflows.

Done. One more if-statement is added for that.

Add an inverse transform in case "opt -instcombine" has already been performed.

Shall we split this inverse transform into another patch? At least this patch improves
the test case urem-seteq-nonzero.ll, and the case in https://reviews.llvm.org/D83159

If it's not too much trouble, I've added D83159 as a parent patch, which are the tests for this change in the RISC-V backend.

I realise this shouldn't block changes to the other targets, but given the transform is target-independent, it makes sense to see how it affects more targets if possible.

Generally looks good, can you also add a test which will trigger an overflow with your previous revision?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15817	Nit: const unsigned Bits
15824	// Prevent the transform if c1*c2 overflows.
15825	ConstNode.getScalarValueSizeInBits() -> Bits
15828	Store `C1 * C2` in a variable. Please don't repeat multiplication.

benshi001 updated this revision to Diff 276023. Jul 7 2020, 5:48 AM

benshi001 edited the summary of this revision.

Change list according to all your comments.

Separate the test cases into another patch to show the improvement.

Done. https://reviews.llvm.org/D83159

Make sure c1 and c2 do not exceed int64, to avoid an assertion failure.

Done. One more if-statement is added to check that.

Check whether c1*c2 overflows.

Done. One more if-statement is added for that.

Add a new test case that triggers the overflow check.

I will do that in https://reviews.llvm.org/D83159

Add an inverse transform in case "opt -instcombine" has already been performed.

Shall we split this inverse transform into another patch? At least this patch improves
the test case urem-seteq-nonzero.ll, and the case in https://reviews.llvm.org/D83159

benshi001 updated this revision to Diff 277244. Jul 11 2020, 8:49 AM

benshi001 edited the summary of this revision.

Change list according to all your comments.

Separate the test cases into another patch to show the improvement.

Done. https://reviews.llvm.org/D83159, which has landed.

Make sure c1 and c2 do not exceed int64, to avoid an assertion failure.

Done. One more if-statement is added to check that.

Check whether c1*c2 overflows.

If we stop the transform when c1*c2 overflows, x86 will be affected a lot,
and I am afraid of introducing more regressions.

Add an inverse transform in case "opt -instcombine" has already been performed.

Shall we split this inverse transform into another patch? At least this patch improves
the test case urem-seteq-nonzero.ll, and the case llvm/test/CodeGen/RISCV/addimm-mulimm.ll

Some other small fixes.

Conclusion:

RISC-V is improved.

X86 is slightly improved.

For AArch64's test urem-seteq-nonzero.ll,

3.1. two cases have one more instruction emitted,
3.2. two other cases have one fewer instruction emitted,
3.3. nine other cases have the same instruction count, but have madd replaced by mul.
Since madd has higher latency than mul, I think my change is also a net improvement for AArch64.

ping

benshi001 updated this revision to Diff 279074. Jul 19 2020, 5:40 AM

spatel added subscribers: fhahn, efriedma, dmgreen. Jul 22 2020, 9:01 AM

spatel added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15822–15826	The general transform is happening in IR in instcombine, so for example the RISC-V tests are not canonical IR. That means we will not see the pattern that you are expecting in the usual case for "mul (add X, C1), C2" in source code. You can test that by compiling from source code to IR (and on to asm) for something like this:

$ cat mad.c
#include <stdint.h>
uint64_t f(uint64_t x) {
  return (x + 424242) * 15700;
}
$ clang -O1 mad.c -S -o - -emit-llvm
define i64 @f(i64 %x) {
  %t0 = mul i64 %x, 15700
  %mul = add i64 %t0, 6660599400
  ret i64 %mul
}

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	I think we need to hear from someone with more AArch64 knowledge if this is an improvement or acceptable. cc @dmgreen @efriedma @fhahn
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
74	Why is this test deleted?
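
The canonicalization spatel describes for mad.c can be checked numerically: 424242 * 15700 = 6660599400, and the two forms agree for every input because unsigned 64-bit arithmetic wraps modulo 2^64. A minimal sketch, with function names chosen only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Source form: add first, then multiply.
uint64_t addThenMul(uint64_t x) {
  return (x + UINT64_C(424242)) * UINT64_C(15700);
}

// Form seen in the emitted IR: multiply, then add c1 * c2.
uint64_t mulThenAdd(uint64_t x) {
  return x * UINT64_C(15700) + UINT64_C(6660599400);
}

void demo() {
  const uint64_t samples[] = {0, 1, 42, UINT64_C(1) << 63, UINT64_MAX};
  for (uint64_t x : samples)
    assert(addThenMul(x) == mulThenAdd(x)); // equal modulo 2^64
}
```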

efriedma added inline comments. Jul 22 2020, 11:04 AM

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	The new code is worse; madd is essentially the same cost as mul, so the new code has one more arithmetic instruction. Better to spend an extra instruction materializing a constant, particularly if the code is in a loop.
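
To make the two AArch64 sequences under discussion concrete, the sketch below models both shapes of the `urem i32 %X, 3` / `icmp eq ..., 1` test (t32_3_1) in plain C++. The constants are taken from the CHECK lines; the function names are illustrative, and the code only demonstrates that the two forms compute the same predicate, not which is faster.

```cpp
#include <cassert>
#include <cstdint>

// 0xAAAAAAAB is the multiplicative inverse of 3 modulo 2^32
// (mov #43691 followed by movk #43690, lsl #16 in the test).
constexpr uint32_t kInv3 = 0xAAAAAAABu;
constexpr uint32_t kLimit = 0x55555555u; // 1431655765

// Old codegen shape: one madd, with the addend held in a register.
bool rem3Is1ViaMadd(uint32_t x) { return x * kInv3 + kLimit < kLimit; }

// New codegen shape: sub with an immediate, then a plain mul.
bool rem3Is1ViaSubMul(uint32_t x) { return (x - 1u) * kInv3 < kLimit; }

void demo() {
  for (uint32_t x = 0; x < 1000; ++x) {
    assert(rem3Is1ViaMadd(x) == (x % 3 == 1));
    assert(rem3Is1ViaSubMul(x) == (x % 3 == 1));
  }
}
```

Both functions perform one multiply; the difference efriedma points at is whether the addition is folded into madd or issued as a separate sub.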

benshi001 marked 3 inline comments as done. Jul 22 2020, 7:18 PM

benshi001 added inline comments.

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	But there are also other cases where the instruction count is reduced; see lines 156-162 and lines 176-180. I think it is an overall improvement, though worse in specific cases.
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
74	I also added this test, expecting it to be affected by this patch. It turned out not to be, so I removed it.

benshi001 marked 2 inline comments as done. Jul 22 2020, 7:30 PM

benshi001 added inline comments.

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	Many AArch64 cases are affected; some got better and some got worse, but overall they are better. What's your opinion?

efriedma added inline comments. Jul 23 2020, 4:54 PM

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	156-162 is fewer instructions, but it's higher latency, and the current heuristic doesn't really have any way to usefully predict the result here. I'd prefer to keep this transform off for AArch64.

benshi001 abandoned this revision. Jul 23 2020, 6:16 PM

benshi001 marked an inline comment as done.

benshi001 added inline comments.

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll
7–11	I see, thanks. I also have other patches for ARM, and I would appreciate your help reviewing them.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	25 lines
llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll	96 lines
llvm/test/CodeGen/RISCV/addimm-mulimm.ll	33 lines
llvm/test/CodeGen/X86/urem-seteq-nonzero.ll	7 lines

Diff 277244

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


	Show First 20 Lines • Show All 15,804 Lines • ▼ Show 20 Lines
	// (A + c1) * c3			// (A + c1) * c3
	// (A + c2) * c3			// (A + c2) * c3
	// We're checking for cases where we have common "c3 * A" expressions.			// We're checking for cases where we have common "c3 * A" expressions.
	bool DAGCombiner::isMulAddWithConstProfitable(SDNode *MulNode,			bool DAGCombiner::isMulAddWithConstProfitable(SDNode *MulNode,
	SDValue &AddNode,			SDValue &AddNode,
	SDValue &ConstNode) {			SDValue &ConstNode) {
	APInt Val;			APInt Val;

	// If the add only has one use, this would be OK to do.			// If the add only has one use, do further check of c1 and c1*c2.
	if (AddNode.getNode()->hasOneUse())			if (AddNode.getNode()->hasOneUse()) {
				// There is no regression by the transform since both c1 and c2
				// are too large.
				const unsigned Bits = ConstNode.getScalarValueSizeInBits();
				if (Bits > 8 * sizeof(int64_t))
	return true;			return true;
				if (auto *C1Node = dyn_cast<ConstantSDNode>(AddNode.getOperand(1)))
				if (auto *C2Node = dyn_cast<ConstantSDNode>(ConstNode)) {
				const APInt &C1 = C1Node->getAPIntValue();
				const APInt &C2 = C2Node->getAPIntValue();
				const APInt C1C2 = C1 * C2;
				// Do sign extension for c1*c2 according to c2's type.
				int64_t C1C2SExt = llvm::SignExtend64(C1C2.getZExtValue(), Bits);
				// This transform will introduce regression, if c1 is legal add
				// immediate while c1*c2 isn't.
				const TargetLowering &TLI = DAG.getTargetLoweringInfo();
				if (TLI.isLegalAddImmediate(C1.getSExtValue()) &&
				!TLI.isLegalAddImmediate(C1C2SExt))
				return false;
				}
				// It is OK to do the transform.
				return true;
				}

	// Walk all the users of the constant with which we're multiplying.			// Walk all the users of the constant with which we're multiplying.
	for (SDNode *Use : ConstNode->uses()) {			for (SDNode *Use : ConstNode->uses()) {
	if (Use == MulNode) // This use is the one we're on right now. Skip it.			if (Use == MulNode) // This use is the one we're on right now. Skip it.
	continue;			continue;

	if (Use->getOpcode() == ISD::MUL) { // We have another multiply use.			if (Use->getOpcode() == ISD::MUL) { // We have another multiply use.
	SDNode *OtherOp;			SDNode *OtherOp;
	▲ Show 20 Lines • Show All 6,283 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/urem-seteq-nonzero.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s		; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s

define i1 @t32_3_1(i32 %X) nounwind {		define i1 @t32_3_1(i32 %X) nounwind {
; CHECK-LABEL: t32_3_1:		; CHECK-LABEL: t32_3_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w8, #43690, lsl #16		; CHECK-NEXT: sub w8, w0, #1 // =1
		; CHECK-NEXT: movk w9, #43690, lsl #16
		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #1431655765		; CHECK-NEXT: mov w9, #1431655765
; CHECK-NEXT: madd w8, w0, w8, w9
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 3		%urem = urem i32 %X, 3
%cmp = icmp eq i32 %urem, 1		%cmp = icmp eq i32 %urem, 1
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_3_2(i32 %X) nounwind {		define i1 @t32_3_2(i32 %X) nounwind {
; CHECK-LABEL: t32_3_2:		; CHECK-LABEL: t32_3_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w8, #43690, lsl #16		; CHECK-NEXT: sub w8, w0, #2 // =2
; CHECK-NEXT: mov w9, #-1431655766		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #1431655765		; CHECK-NEXT: mov w9, #1431655765
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 3		%urem = urem i32 %X, 3
%cmp = icmp eq i32 %urem, 2		%cmp = icmp eq i32 %urem, 2
ret i1 %cmp		ret i1 %cmp
}		}


define i1 @t32_5_1(i32 %X) nounwind {		define i1 @t32_5_1(i32 %X) nounwind {
; CHECK-LABEL: t32_5_1:		; CHECK-LABEL: t32_5_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #52429		; CHECK-NEXT: mov w9, #52429
; CHECK-NEXT: movk w8, #52428, lsl #16		; CHECK-NEXT: sub w8, w0, #1 // =1
		; CHECK-NEXT: movk w9, #52428, lsl #16
		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #858993459		; CHECK-NEXT: mov w9, #858993459
; CHECK-NEXT: madd w8, w0, w8, w9
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 5		%urem = urem i32 %X, 5
%cmp = icmp eq i32 %urem, 1		%cmp = icmp eq i32 %urem, 1
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_5_2(i32 %X) nounwind {		define i1 @t32_5_2(i32 %X) nounwind {
; CHECK-LABEL: t32_5_2:		; CHECK-LABEL: t32_5_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #52429		; CHECK-NEXT: mov w9, #52429
; CHECK-NEXT: movk w8, #52428, lsl #16		; CHECK-NEXT: sub w8, w0, #2 // =2
; CHECK-NEXT: mov w9, #1717986918		; CHECK-NEXT: movk w9, #52428, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #858993459		; CHECK-NEXT: mov w9, #858993459
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 5		%urem = urem i32 %X, 5
%cmp = icmp eq i32 %urem, 2		%cmp = icmp eq i32 %urem, 2
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_5_3(i32 %X) nounwind {		define i1 @t32_5_3(i32 %X) nounwind {
; CHECK-LABEL: t32_5_3:		; CHECK-LABEL: t32_5_3:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #52429		; CHECK-NEXT: mov w9, #52429
; CHECK-NEXT: movk w8, #52428, lsl #16		; CHECK-NEXT: sub w8, w0, #3 // =3
; CHECK-NEXT: mov w9, #-1717986919		; CHECK-NEXT: movk w9, #52428, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #858993459		; CHECK-NEXT: mov w9, #858993459
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 5		%urem = urem i32 %X, 5
%cmp = icmp eq i32 %urem, 3		%cmp = icmp eq i32 %urem, 3
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_5_4(i32 %X) nounwind {		define i1 @t32_5_4(i32 %X) nounwind {
; CHECK-LABEL: t32_5_4:		; CHECK-LABEL: t32_5_4:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #52429		; CHECK-NEXT: mov w9, #52429
; CHECK-NEXT: movk w8, #52428, lsl #16		; CHECK-NEXT: sub w8, w0, #4 // =4
; CHECK-NEXT: mov w9, #-858993460		; CHECK-NEXT: movk w9, #52428, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #858993459		; CHECK-NEXT: mov w9, #858993459
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 5		%urem = urem i32 %X, 5
%cmp = icmp eq i32 %urem, 4		%cmp = icmp eq i32 %urem, 4
ret i1 %cmp		ret i1 %cmp
}		}


define i1 @t32_6_1(i32 %X) nounwind {		define i1 @t32_6_1(i32 %X) nounwind {
; CHECK-LABEL: t32_6_1:		; CHECK-LABEL: t32_6_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w8, #43690, lsl #16		; CHECK-NEXT: sub w8, w0, #1 // =1
; CHECK-NEXT: mov w9, #1431655765		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: ror w8, w8, #1		; CHECK-NEXT: ror w8, w8, #1
; CHECK-NEXT: movk w9, #10922, lsl #16		; CHECK-NEXT: movk w9, #10922, lsl #16
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 6		%urem = urem i32 %X, 6
%cmp = icmp eq i32 %urem, 1		%cmp = icmp eq i32 %urem, 1
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_6_2(i32 %X) nounwind {		define i1 @t32_6_2(i32 %X) nounwind {
; CHECK-LABEL: t32_6_2:		; CHECK-LABEL: t32_6_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w8, #43690, lsl #16		; CHECK-NEXT: sub w8, w0, #2 // =2
; CHECK-NEXT: mov w9, #-1431655766		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: ror w8, w8, #1		; CHECK-NEXT: ror w8, w8, #1
; CHECK-NEXT: movk w9, #10922, lsl #16		; CHECK-NEXT: movk w9, #10922, lsl #16
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 6		%urem = urem i32 %X, 6
%cmp = icmp eq i32 %urem, 2		%cmp = icmp eq i32 %urem, 2
Show All 16 Lines	; CHECK-NEXT: ret
%urem = urem i32 %X, 6		%urem = urem i32 %X, 6
%cmp = icmp eq i32 %urem, 3		%cmp = icmp eq i32 %urem, 3
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_6_4(i32 %X) nounwind {		define i1 @t32_6_4(i32 %X) nounwind {
; CHECK-LABEL: t32_6_4:		; CHECK-LABEL: t32_6_4:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: mov w9, #21844		; CHECK-NEXT: sub w8, w0, #4 // =4
; CHECK-NEXT: movk w8, #43690, lsl #16		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: movk w9, #21845, lsl #16		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: madd w8, w0, w8, w9
; CHECK-NEXT: mov w9, #43690		; CHECK-NEXT: mov w9, #43690
; CHECK-NEXT: ror w8, w8, #1		; CHECK-NEXT: ror w8, w8, #1
; CHECK-NEXT: movk w9, #10922, lsl #16		; CHECK-NEXT: movk w9, #10922, lsl #16
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 6		%urem = urem i32 %X, 6
%cmp = icmp eq i32 %urem, 4		%cmp = icmp eq i32 %urem, 4
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t32_6_5(i32 %X) nounwind {		define i1 @t32_6_5(i32 %X) nounwind {
; CHECK-LABEL: t32_6_5:		; CHECK-LABEL: t32_6_5:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #43691		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: mov w9, #43689		; CHECK-NEXT: sub w8, w0, #5 // =5
; CHECK-NEXT: movk w8, #43690, lsl #16
; CHECK-NEXT: movk w9, #43690, lsl #16		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: madd w8, w0, w8, w9		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #43690		; CHECK-NEXT: mov w9, #43690
; CHECK-NEXT: ror w8, w8, #1		; CHECK-NEXT: ror w8, w8, #1
; CHECK-NEXT: movk w9, #10922, lsl #16		; CHECK-NEXT: movk w9, #10922, lsl #16
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i32 %X, 6		%urem = urem i32 %X, 6
%cmp = icmp eq i32 %urem, 5		%cmp = icmp eq i32 %urem, 5
ret i1 %cmp		ret i1 %cmp
}		}

;-------------------------------------------------------------------------------		;-------------------------------------------------------------------------------
; Other widths.		; Other widths.

define i1 @t16_3_2(i16 %X) nounwind {		define i1 @t16_3_2(i16 %X) nounwind {
; CHECK-LABEL: t16_3_2:		; CHECK-LABEL: t16_3_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: and w8, w0, #0xffff		; CHECK-NEXT: and w8, w0, #0xffff
		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w9, #43690, lsl #16		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: mov w10, #-1431655766		; CHECK-NEXT: sub w8, w8, #2 // =2
; CHECK-NEXT: madd w8, w8, w9, w10		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #1431655765		; CHECK-NEXT: mov w9, #1431655765
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i16 %X, 3		%urem = urem i16 %X, 3
%cmp = icmp eq i16 %urem, 2		%cmp = icmp eq i16 %urem, 2
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t8_3_2(i8 %X) nounwind {		define i1 @t8_3_2(i8 %X) nounwind {
; CHECK-LABEL: t8_3_2:		; CHECK-LABEL: t8_3_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: and w8, w0, #0xff		; CHECK-NEXT: and w8, w0, #0xff
		; CHECK-NEXT: mov w9, #43691
; CHECK-NEXT: movk w9, #43690, lsl #16		; CHECK-NEXT: movk w9, #43690, lsl #16
; CHECK-NEXT: mov w10, #-1431655766		; CHECK-NEXT: sub w8, w8, #2 // =2
; CHECK-NEXT: madd w8, w8, w9, w10		; CHECK-NEXT: mul w8, w8, w9
; CHECK-NEXT: mov w9, #1431655765		; CHECK-NEXT: mov w9, #1431655765
; CHECK-NEXT: cmp w8, w9		; CHECK-NEXT: cmp w8, w9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i8 %X, 3		%urem = urem i8 %X, 3
%cmp = icmp eq i8 %urem, 2		%cmp = icmp eq i8 %urem, 2
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @t64_3_2(i64 %X) nounwind {		define i1 @t64_3_2(i64 %X) nounwind {
; CHECK-LABEL: t64_3_2:		; CHECK-LABEL: t64_3_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov x8, #-6148914691236517206
; CHECK-NEXT: movk x8, #43691
; CHECK-NEXT: mov x9, #-6148914691236517206		; CHECK-NEXT: mov x9, #-6148914691236517206
; CHECK-NEXT: madd x8, x0, x8, x9		; CHECK-NEXT: sub x8, x0, #2 // =2
		; CHECK-NEXT: movk x9, #43691
		; CHECK-NEXT: mul x8, x8, x9
; CHECK-NEXT: mov x9, #6148914691236517205		; CHECK-NEXT: mov x9, #6148914691236517205
; CHECK-NEXT: cmp x8, x9		; CHECK-NEXT: cmp x8, x9
; CHECK-NEXT: cset w0, lo		; CHECK-NEXT: cset w0, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%urem = urem i64 %X, 3		%urem = urem i64 %X, 3
%cmp = icmp eq i64 %urem, 2		%cmp = icmp eq i64 %urem, 2
ret i1 %cmp		ret i1 %cmp
}		}
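As a sanity check on the t64_3_2 change above, both instruction sequences can be modelled in plain C++. This is only a sketch: the function and constant names are hypothetical, and the values are copied from the CHECK lines (0xAAAAAAAAAAAAAAAB as the multiplier, 0xAAAAAAAAAAAAAAAA as the addend used by madd, 0x5555555555555555 as the comparison bound). Under 64-bit wraparound arithmetic both forms should agree with `x % 3 == 2`.

```cpp
#include <cassert>
#include <cstdint>

// Constants copied from the CHECK lines of t64_3_2.
constexpr std::uint64_t kMul = 0xAAAAAAAAAAAAAAABULL;    // inverse of 3 modulo 2^64
constexpr std::uint64_t kAddend = 0xAAAAAAAAAAAAAAAAULL; // (-2 * kMul) modulo 2^64
constexpr std::uint64_t kBound = 0x5555555555555555ULL;  // (2^64 - 1) / 3

// Left column (before the patch): a single madd, x * kMul + kAddend.
constexpr bool urem3Eq2Folded(std::uint64_t x) {
  return x * kMul + kAddend < kBound;
}

// Right column (with the patch): sub #2 followed by mul.
constexpr bool urem3Eq2Unfolded(std::uint64_t x) {
  return (x - 2) * kMul < kBound;
}

// Hypothetical helper: compare both forms against the reference
// computation on a few samples, including the wraparound edge cases.
inline void checkUrem3Eq2Samples() {
  const std::uint64_t samples[] = {0, 1, 2, 3, 4, 5, 8, 1000001,
                                   UINT64_MAX - 2, UINT64_MAX - 1, UINT64_MAX};
  for (std::uint64_t x : samples) {
    const bool expected = (x % 3 == 2);
    assert(urem3Eq2Folded(x) == expected);
    assert(urem3Eq2Unfolded(x) == expected);
  }
}
```

Calling `checkUrem3Eq2Samples()` exercises both forms. The two only differ in where the constant 2 is applied, so the diff trades one constant materialization plus madd for a sub with an immediate plus mul.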

llvm/test/CodeGen/RISCV/addimm-mulimm.ll

	; RV64IM-NEXT: ret
%tmp0 = add i32 %x, 8953		%tmp0 = add i32 %x, 8953
%tmp1 = mul i32 %tmp0, 13		%tmp1 = mul i32 %tmp0, 13
ret i32 %tmp1		ret i32 %tmp1
}		}

define signext i32 @add_mul_trans_reject_1(i32 %x) {		define signext i32 @add_mul_trans_reject_1(i32 %x) {
; RV32IM-LABEL: add_mul_trans_reject_1		; RV32IM-LABEL: add_mul_trans_reject_1
; RV32IM: # %bb.0:		; RV32IM: # %bb.0:
		; RV32IM-NEXT: addi a0, a0, 1971
; RV32IM-NEXT: addi a1, zero, 19		; RV32IM-NEXT: addi a1, zero, 19
; RV32IM-NEXT: mul a0, a0, a1		; RV32IM-NEXT: mul a0, a0, a1
; RV32IM-NEXT: lui a1, 9
; RV32IM-NEXT: addi a1, a1, 585
; RV32IM-NEXT: add a0, a0, a1
; RV32IM-NEXT: ret		; RV32IM-NEXT: ret
;		;
; RV64IM-LABEL: add_mul_trans_reject_1		; RV64IM-LABEL: add_mul_trans_reject_1
; RV64IM: # %bb.0:		; RV64IM: # %bb.0:
		; RV64IM-NEXT: addi a0, a0, 1971
; RV64IM-NEXT: addi a1, zero, 19		; RV64IM-NEXT: addi a1, zero, 19
; RV64IM-NEXT: mul a0, a0, a1		; RV64IM-NEXT: mulw a0, a0, a1
; RV64IM-NEXT: lui a1, 9
; RV64IM-NEXT: addiw a1, a1, 585
; RV64IM-NEXT: addw a0, a0, a1
; RV64IM-NEXT: ret		; RV64IM-NEXT: ret
%tmp0 = add i32 %x, 1971		%tmp0 = add i32 %x, 1971
%tmp1 = mul i32 %tmp0, 19		%tmp1 = mul i32 %tmp0, 19
ret i32 %tmp1		ret i32 %tmp1
}		}
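The add_mul_trans_reject_1 case above can be checked numerically. The sketch below assumes the standard RISC-V 12-bit signed immediate range for addi (-2048 to 2047); the helper names are hypothetical and are not taken from DAGCombiner.cpp. It shows that c1 = 1971 fits that range while c1*c2 = 37449 does not, which matches the lui/addi pair in the left column, and that both orderings produce the same 32-bit result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does v fit a RISC-V addi immediate?
// Assumes the 12-bit signed range [-2048, 2047].
constexpr bool fitsSImm12(std::int64_t v) { return v >= -2048 && v <= 2047; }

// Right column: addi a0, a0, c1 and then mul by c2.
constexpr std::uint32_t addThenMul(std::uint32_t x, std::uint32_t c1,
                                   std::uint32_t c2) {
  return (x + c1) * c2;
}

// Left column: mul by c2 and then add the materialized product c1*c2.
constexpr std::uint32_t mulThenAdd(std::uint32_t x, std::uint32_t c1,
                                   std::uint32_t c2) {
  return x * c2 + c1 * c2;
}

// c1 from the test fits an addi immediate, the product c1*c2 does not.
static_assert(fitsSImm12(1971), "1971 fits in 12 signed bits");
static_assert(!fitsSImm12(1971 * 19), "37449 needs lui + addi");
// The lui/addi pair in the left column builds exactly c1*c2.
static_assert(1971 * 19 == (9 << 12) + 585, "lui a1, 9; addi a1, a1, 585");
```

Both functions return the same value for every 32-bit x because unsigned multiplication distributes over addition modulo 2^32; the only difference is which constant has to be encoded.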

define signext i32 @add_mul_trans_reject_2(i32 %x) {
spatel: Why is this test deleted?
benshi001 (author): I also added this test and expected this patch to affect it, but it does not, so I removed it.
; RV32IM: # %bb.0:
; RV32IM-NEXT: lui a1, 792
; RV32IM-NEXT: addi a1, a1, -1709
; RV32IM-NEXT: mul a0, a0, a1
; RV32IM-NEXT: lui a1, 1014660
; RV32IM-NEXT: addi a1, a1, -1891
; RV32IM-NEXT: add a0, a0, a1
; RV32IM-NEXT: ret
;
; RV64IM: # %bb.0:
; RV64IM-NEXT: lui a1, 792
; RV64IM-NEXT: addiw a1, a1, -1709
; RV64IM-NEXT: mul a0, a0, a1
; RV64IM-NEXT: lui a1, 1014660
; RV64IM-NEXT: addiw a1, a1, -1891
; RV64IM-NEXT: addw a0, a0, a1
; RV64IM-NEXT: ret
%tmp0 = add i32 %x, 1841231
%tmp1 = mul i32 %tmp0, 3242323
ret i32 %tmp1
}

llvm/test/CodeGen/X86/urem-seteq-nonzero.ll

	; X86-NEXT: xorl $2, %eax			; X86-NEXT: xorl $2, %eax
	; X86-NEXT: orl %edx, %eax			; X86-NEXT: orl %edx, %eax
	; X86-NEXT: sete %al			; X86-NEXT: sete %al
	; X86-NEXT: addl $12, %esp			; X86-NEXT: addl $12, %esp
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: t64_3_2:			; X64-LABEL: t64_3_2:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: addq $-2, %rdi
	; X64-NEXT: movabsq $-6148914691236517205, %rax # imm = 0xAAAAAAAAAAAAAAAB			; X64-NEXT: movabsq $-6148914691236517205, %rax # imm = 0xAAAAAAAAAAAAAAAB
	; X64-NEXT: imulq %rdi, %rax			; X64-NEXT: imulq %rdi, %rax
	; X64-NEXT: movabsq $-6148914691236517206, %rcx # imm = 0xAAAAAAAAAAAAAAAA			; X64-NEXT: movabsq $6148914691236517205, %rcx # imm = 0x5555555555555555
	; X64-NEXT: addq %rax, %rcx			; X64-NEXT: cmpq %rcx, %rax
	; X64-NEXT: movabsq $6148914691236517205, %rax # imm = 0x5555555555555555
	; X64-NEXT: cmpq %rax, %rcx
	; X64-NEXT: setb %al			; X64-NEXT: setb %al
	; X64-NEXT: retq			; X64-NEXT: retq
	%urem = urem i64 %X, 3			%urem = urem i64 %X, 3
	%cmp = icmp eq i64 %urem, 2			%cmp = icmp eq i64 %urem, 2
	ret i1 %cmp			ret i1 %cmp
	}			}

Revision Contents

Diff 277244
