This is an archive of the discontinued LLVM Phabricator instance.


Differential D120216

[DAG] try to convert multiply to shift via demanded bits
Closed (Public)

Authored by spatel on Feb 20 2022, 12:02 PM.


Details

Reviewers

asb
craig.topper
dmgreen
RKSimon
lebedev.ri

Commits

rG21d7c3bcc646: [DAG] try to convert multiply to shift via demanded bits

Summary

This is a fix for a regression discussed in:
https://github.com/llvm/llvm-project/issues/53829

We cleared more high multiplier bits with 995d400, but that can lead to worse codegen because we would fail to recognize the now-disguised multiplication by a negated power of 2 as a shift-left. The problem exists independently of the IR change whenever the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator.

This patch fills in the non-demanded high bits of the multiplier constant to expose the shift transform opportunity. An Alive2 attempt to show correctness:
https://alive2.llvm.org/ce/z/GgSKVX
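The Alive2 link is the real correctness check. As a rough, self-contained stand-in, the sketch below brute-forces the same identity for every 8-bit `x`, `y` and multiplier `c`. It assumes that HighMask is every bit above the most significant demanded bit (its definition is not part of this diff), and it models the demanded bits as the low `width` bits. An identity that holds on all of those bits also holds on any subset of them. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// True if v, read as an 8-bit two's-complement value, equals -(2^k) for some k.
bool isNegatedPowerOf2_i8(uint8_t v) {
  const unsigned neg = static_cast<uint8_t>(-v);  // -v modulo 256
  return neg != 0 && (neg & (neg - 1)) == 0;
}

// Returns k such that v == -(2^k) (8-bit wraparound); v must qualify.
unsigned log2OfNegated_i8(uint8_t v) {
  assert(isNegatedPowerOf2_i8(v));
  const unsigned neg = static_cast<uint8_t>(-v);
  unsigned k = 0;
  while ((neg >> k) != 1)
    ++k;
  return k;
}

// Exhaustively checks (y + x * c) == (y - (x << k)) on the demanded bits for
// every 8-bit x, y and every c where (c | highMask) == -(2^k), for each
// low-bit demanded mask of width 1..8.
bool mulToShlHoldsForAllI8() {
  for (unsigned width = 1; width <= 8; ++width) {
    const unsigned demanded = (1u << width) - 1;   // low `width` bits
    const unsigned highMask = 0xFFu & ~demanded;   // assumed HighMask
    for (unsigned c = 0; c < 256; ++c) {
      const uint8_t unmasked = static_cast<uint8_t>(c | highMask);
      if (!isNegatedPowerOf2_i8(unmasked))
        continue;
      const unsigned k = log2OfNegated_i8(unmasked);
      for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
          const unsigned viaMul = (y + x * c) & 0xFFu;
          const unsigned viaShl = (y - (x << k)) & 0xFFu;
          if ((viaMul & demanded) != (viaShl & demanded))
            return false;
        }
      }
    }
  }
  return true;
}
```

The check should return true. In the RISCV test's terms (i8, multiplier 14, demanded mask 15), `14 | 0xF0` is `0xFE`, which is -2, so k = 1. That matches the `slli a0, a0, 1` / `sub a0, a1, a0` output in the RISCV diff.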

The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra register move in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift).

There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common:
https://alive2.llvm.org/ce/z/7QY_Ga
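The TODO pattern, `Op0 - (X * MulC) --> Op0 + (X << log2(-MulC))`, rests on the same modular argument with the sign flipped. The following sketch spot-checks it exhaustively over 8-bit values. `subVariantHolds` is a hypothetical helper, and the constants are borrowed from the RISCV test (multiplier 14, demanded mask 15, so a shift of 1).

```cpp
#include <cassert>

// Checks Op0 - (X * MulC) == Op0 + (X << ShAmt) on the demanded bits for all
// 8-bit X and Op0. Intended for a MulC that becomes -(1 << ShAmt) once its
// non-demanded high bits are set.
bool subVariantHolds(unsigned mulC, unsigned shAmt, unsigned demanded) {
  assert(mulC < 256 && shAmt < 8 && demanded < 256);
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned op0 = 0; op0 < 256; ++op0) {
      const unsigned viaMul = (op0 - x * mulC) & 0xFFu;
      const unsigned viaShl = (op0 + (x << shAmt)) & 0xFFu;
      if ((viaMul & demanded) != (viaShl & demanded))
        return false;
    }
  }
  return true;
}
```

With all eight bits demanded, 14 is not -2, so `subVariantHolds(14, 1, 0xFF)` fails. The rewrite is only valid because the high bits are not demanded.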

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. Feb 20 2022, 12:02 PM

Herald added subscribers: luke957, frasercrmck, luismarques and 26 others. Feb 20 2022, 12:02 PM

spatel requested review of this revision. Feb 20 2022, 12:02 PM

Herald added a project: Restricted Project. Feb 20 2022, 12:02 PM

Herald added subscribers: llvm-commits, pcwang-thead, MaskRay.

Harbormaster completed remote builds in B150608: Diff 410171. Feb 20 2022, 12:03 PM

RKSimon added inline comments. Feb 20 2022, 12:32 PM

llvm/test/CodeGen/Mips/urem-seteq-illegal-types.ll
215	How did just a single add become a sub?

xbolva00 added a subscriber: xbolva00. Feb 20 2022, 12:47 PM

xbolva00 added inline comments.

llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗	(On Diff #410171)	also interesting change

spatel marked 2 inline comments as done. Feb 20 2022, 5:57 PM

spatel added inline comments.

llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗	(On Diff #410171)	Yes - this is the same test as for Mips above. After legalization, we have:
    t56: i64 = mul t2, Constant:i64<2>
    t58: i64 = add t55, t56
    ...
    t59: i64 = add t58, t57
    t65: i64 = and t59, Constant:i64<3>
So we are shifting a single meaningful demanded bit to bit 1, and I think the code is correct as shown here: https://alive2.llvm.org/ce/z/cqL3SC
Notice that the `sub` becomes an `add` in IR with instcombine. I fixed a similar gap in DAG folding with a2963d871ee5, but we need yet another demanded-bits fold or some other sub->add fold. I suspect it's a rare case, and it didn't seem harmful in these tests at least, so I figured it could be another follow-up if needed.
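To see why the add/sub churn in the Mips and PowerPC tests is harmless, here is a small sketch built from the DAG dump in this comment: a multiply by 2 feeding an `and` with 3. Only bit 1 of the product is meaningful, and 2*x and -2*x agree modulo 4. As a result, `y + x*2`, `y - (x << 1)` and `y + (x << 1)` all match on the demanded bits. `mulBy2FormsAgree` is a hypothetical helper, and `uint64_t` stands in for the i64 values.

```cpp
#include <cassert>
#include <cstdint>

// Compares (y + x * 2), (y - (x << 1)) and (y + (x << 1)) on the bits in mask.
// Each result bit depends only on the same and lower bits of x and y, so when
// mask fits in the low 8 bits, enumerating 8-bit x and y covers every case.
bool mulBy2FormsAgree(uint64_t mask) {
  assert(mask <= 0xFF);
  for (uint64_t x = 0; x < 256; ++x) {
    for (uint64_t y = 0; y < 256; ++y) {
      const uint64_t viaMul = (y + x * 2) & mask;
      const uint64_t viaSub = (y - (x << 1)) & mask;
      const uint64_t viaAdd = (y + (x << 1)) & mask;
      if (viaMul != viaSub || viaMul != viaAdd)
        return false;
    }
  }
  return true;
}
```

With mask 3 all three forms agree. With mask 7 they do not: for x = 1, the low three bits of 2*x and -2*x are 2 and 6.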

spatel marked an inline comment as done. Feb 20 2022, 6:00 PM

spatel added inline comments.

llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗	(On Diff #410171)	Alternatively, we could probably ignore any multiply by a power-of-2 constant in this fold since that should eventually become a `shl` by itself.

Patch updated:
Ignore power-of-2 multiplies. That eliminates the add/sub diffs and other reg allocation diffs, so now we just show real improvements in the tests.

Fixed a missed test update that didn't make it into the last revision. There really are no pure-noise test diffs. :)

Harbormaster completed remote builds in B150626: Diff 410204. Feb 20 2022, 7:10 PM

AArch64 tests look OK to me. They seem to be generating correct code.

LGTM - cheers

This revision is now accepted and ready to land. Feb 23 2022, 4:55 AM

Closed by commit rG21d7c3bcc646: [DAG] try to convert multiply to shift via demanded bits (authored by spatel). Feb 23 2022, 9:11 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG21d7c3bcc646: [DAG] try to convert multiply to shift via demanded bits.

Thanks Sanjay - I can confirm this fixes all code quality regressions across the GCC torture suite on RISC-V.

In D120216#3340745, @asb wrote:

Thanks Sanjay - I can confirm this fixes all code quality regressions across the GCC torture suite on RISC-V.

Great - thanks for the bug report and confirming the fix!

Revision Contents

Path                                                    Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp        40 lines
llvm/test/CodeGen/AArch64/mul_pow2.ll                   12 lines
llvm/test/CodeGen/Mips/urem-seteq-illegal-types.ll      68 lines
llvm/test/CodeGen/RISCV/mul.ll                          34 lines
llvm/test/CodeGen/X86/mul-demand.ll                     12 lines

Diff 410847

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp


@@ ... @@ if (C && !C->isAllOnes() && !C->isOne() &&
       // Disable the nsw and nuw flags. We can no longer guarantee that we
       // won't wrap after simplification.
       Flags.setNoSignedWrap(false);
       Flags.setNoUnsignedWrap(false);
       SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Neg1, Flags);
       return TLO.CombineTo(Op, NewOp);
     }

+    // Match a multiply with a disguised negated-power-of-2 and convert to a
+    // an equivalent shift-left amount.
+    // Example: (X * MulC) + Op1 --> Op1 - (X << log2(-MulC))
+    auto getShiftLeftAmt = [&HighMask](SDValue Mul) -> unsigned {
+      if (Mul.getOpcode() != ISD::MUL || !Mul.hasOneUse())
+        return 0;
+
+      // Don't touch opaque constants. Also, ignore zero and power-of-2
+      // multiplies. Those will get folded later.
+      ConstantSDNode *MulC = isConstOrConstSplat(Mul.getOperand(1));
+      if (MulC && !MulC->isOpaque() && !MulC->isZero() &&
+          !MulC->getAPIntValue().isPowerOf2()) {
+        APInt UnmaskedC = MulC->getAPIntValue() | HighMask;
+        if (UnmaskedC.isNegatedPowerOf2())
+          return (-UnmaskedC).logBase2();
+      }
+      return 0;
+    };
+
+    auto foldMul = [&](SDValue X, SDValue Y, unsigned ShlAmt) {
+      EVT ShiftAmtTy = getShiftAmountTy(VT, TLO.DAG.getDataLayout());
+      SDValue ShlAmtC = TLO.DAG.getConstant(ShlAmt, dl, ShiftAmtTy);
+      SDValue Shl = TLO.DAG.getNode(ISD::SHL, dl, VT, X, ShlAmtC);
+      SDValue Sub = TLO.DAG.getNode(ISD::SUB, dl, VT, Y, Shl);
+      return TLO.CombineTo(Op, Sub);
+    };
+
+    if (isOperationLegalOrCustom(ISD::SHL, VT)) {
+      if (Op.getOpcode() == ISD::ADD) {
+        // (X * MulC) + Op1 --> Op1 - (X << log2(-MulC))
+        if (unsigned ShAmt = getShiftLeftAmt(Op0))
+          return foldMul(Op0.getOperand(0), Op1, ShAmt);
+        // Op0 + (X * MulC) --> Op0 - (X << log2(-MulC))
+        if (unsigned ShAmt = getShiftLeftAmt(Op1))
+          return foldMul(Op1.getOperand(0), Op0, ShAmt);
+        // TODO:
+        // Op0 - (X * MulC) --> Op0 + (X << log2(-MulC))
+      }
+    }
+
     LLVM_FALLTHROUGH;
   }
   default:
     if (Op.getOpcode() >= ISD::BUILTIN_OP_END) {
       if (SimplifyDemandedBitsForTargetNode(Op, DemandedBits, DemandedElts,
                                             Known, TLO, Depth))
         return true;
       break;
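For readers without an LLVM build, the sketch below mirrors the `getShiftLeftAmt` lambda from the TargetLowering.cpp diff using plain integers. It is a hypothetical model, not LLVM code. `shiftLeftAmtModel` is an illustrative name, `uint64_t` stands in for `APInt`, and HighMask is assumed to be all bits above the most significant demanded bit, because its definition lies outside this diff. The one-use and opaque-constant checks have no equivalent here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of getShiftLeftAmt (uint64_t instead of APInt, so only
// bit widths 1..64). Returns the shift amount, or 0 when no fold applies.
unsigned shiftLeftAmtModel(uint64_t mulC, uint64_t demandedBits,
                           unsigned bitWidth) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  const uint64_t widthMask =
      bitWidth == 64 ? ~uint64_t{0} : (uint64_t{1} << bitWidth) - 1;
  mulC &= widthMask;
  demandedBits &= widthMask;

  // Like the patch: leave zero and power-of-2 multipliers alone.
  if (mulC == 0 || (mulC & (mulC - 1)) == 0)
    return 0;

  // Smear the demanded bits downward; everything above them is HighMask.
  uint64_t lowMask = demandedBits;
  for (unsigned s = 1; s < 64; s <<= 1)
    lowMask |= lowMask >> s;
  const uint64_t highMask = widthMask & ~lowMask;

  // UnmaskedC = MulC | HighMask; fold only if it equals -(2^k) in bitWidth bits.
  const uint64_t unmasked = mulC | highMask;
  const uint64_t negated = (~unmasked + 1) & widthMask;
  if (negated == 0 || (negated & (negated - 1)) != 0)
    return 0;

  unsigned k = 0;  // log2 of the negated value
  while ((negated >> k) != 1)
    ++k;
  return k;
}
```

Applied to the test constants, the model gives 6 for the AArch64 and x86 `muladd_demand` cases (multiplier 131008), 1 for the RISCV case (multiplier 14, mask 15), and 0 for the multiply by 2 from the Mips/PowerPC discussion, because power-of-2 multipliers are skipped. As in the real lambda, a result of 0 means "no fold".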

llvm/test/CodeGen/AArch64/mul_pow2.ll

@@ ... @@ ; GISEL-NEXT: ret

 %mul = mul nsw i32 %x, -16
 ret i32 %mul
 }

 define i32 @muladd_demand(i32 %x, i32 %y) {
 ; CHECK-LABEL: muladd_demand:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: mov w8, #131008
-; CHECK-NEXT: madd w8, w0, w8, w1
+; CHECK-NEXT: sub w8, w1, w0, lsl #6
 ; CHECK-NEXT: and w0, w8, #0x1ffc0
 ; CHECK-NEXT: ret
 ;
 ; GISEL-LABEL: muladd_demand:
 ; GISEL: // %bb.0:
 ; GISEL-NEXT: mov w8, #131008
 ; GISEL-NEXT: madd w8, w0, w8, w1
 ; GISEL-NEXT: and w0, w8, #0x1ffc0
 ; GISEL-NEXT: ret
 %m = mul i32 %x, 131008 ; 0x0001ffc0
 %a = add i32 %y, %m
 %r = and i32 %a, 131008
 ret i32 %r
 }

 define <4 x i32> @muladd_demand_commute(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: muladd_demand_commute:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: mov w8, #131008
-; CHECK-NEXT: dup v2.4s, w8
-; CHECK-NEXT: mla v1.4s, v0.4s, v2.4s
-; CHECK-NEXT: movi v0.4s, #1, msl #16
-; CHECK-NEXT: and v0.16b, v1.16b, v0.16b
+; CHECK-NEXT: movi v2.4s, #1, msl #16
+; CHECK-NEXT: shl v0.4s, v0.4s, #6
+; CHECK-NEXT: sub v0.4s, v1.4s, v0.4s
+; CHECK-NEXT: and v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
 ;
 ; GISEL-LABEL: muladd_demand_commute:
 ; GISEL: // %bb.0:
 ; GISEL-NEXT: adrp x8, .LCPI42_1
 ; GISEL-NEXT: ldr q2, [x8, :lo12:.LCPI42_1]
 ; GISEL-NEXT: adrp x8, .LCPI42_0
 ; GISEL-NEXT: mla v1.4s, v0.4s, v2.4s
 ; GISEL-NEXT: ldr q0, [x8, :lo12:.LCPI42_0]
 ; GISEL-NEXT: and v0.16b, v1.16b, v0.16b
 ; GISEL-NEXT: ret
 %m = mul <4 x i32> %x, <i32 131008, i32 131008, i32 131008, i32 131008>
 %a = add <4 x i32> %m, %y
 %r = and <4 x i32> %a, <i32 131071, i32 131071, i32 131071, i32 131071>
 ret <4 x i32> %r
 }

llvm/test/CodeGen/Mips/urem-seteq-illegal-types.ll

@@ ... @@
 ; }

 define i1 @test_urem_oversized(i66 %X) nounwind {
 ; MIPSEL-LABEL: test_urem_oversized:
 ; MIPSEL: # %bb.0:
 ; MIPSEL-NEXT: lui $1, 12057
 ; MIPSEL-NEXT: ori $1, $1, 37186
 ; MIPSEL-NEXT: multu $6, $1
-; MIPSEL-NEXT: mflo $2
-; MIPSEL-NEXT: mfhi $3
-; MIPSEL-NEXT: lui $7, 52741
-; MIPSEL-NEXT: ori $7, $7, 40665
-; MIPSEL-NEXT: multu $6, $7
-; MIPSEL-NEXT: mflo $8
+; MIPSEL-NEXT: mflo $1
+; MIPSEL-NEXT: mfhi $2
+; MIPSEL-NEXT: lui $3, 52741
+; MIPSEL-NEXT: ori $3, $3, 40665
+; MIPSEL-NEXT: multu $6, $3
+; MIPSEL-NEXT: mflo $7
+; MIPSEL-NEXT: mfhi $8
+; MIPSEL-NEXT: multu $5, $3
 ; MIPSEL-NEXT: mfhi $9
-; MIPSEL-NEXT: multu $5, $7
-; MIPSEL-NEXT: mfhi $10
-; MIPSEL-NEXT: mflo $11
-; MIPSEL-NEXT: addu $9, $11, $9
-; MIPSEL-NEXT: addu $12, $2, $9
-; MIPSEL-NEXT: sltu $9, $9, $11
-; MIPSEL-NEXT: sll $11, $12, 31
-; MIPSEL-NEXT: sltu $2, $12, $2
-; MIPSEL-NEXT: srl $13, $8, 1
-; MIPSEL-NEXT: sll $8, $8, 1
-; MIPSEL-NEXT: addu $2, $3, $2
-; MIPSEL-NEXT: or $3, $13, $11
-; MIPSEL-NEXT: srl $11, $12, 1
-; MIPSEL-NEXT: addu $9, $10, $9
-; MIPSEL-NEXT: mul $4, $4, $7
-; MIPSEL-NEXT: mul $1, $5, $1
-; MIPSEL-NEXT: sll $5, $6, 1
+; MIPSEL-NEXT: mflo $10
+; MIPSEL-NEXT: addu $8, $10, $8
+; MIPSEL-NEXT: addu $11, $1, $8
+; MIPSEL-NEXT: sltu $8, $8, $10
+; MIPSEL-NEXT: sll $10, $11, 31
+; MIPSEL-NEXT: sltu $1, $11, $1
+; MIPSEL-NEXT: srl $12, $7, 1
+; MIPSEL-NEXT: sll $7, $7, 1
+; MIPSEL-NEXT: addu $1, $2, $1
+; MIPSEL-NEXT: or $10, $12, $10
+; MIPSEL-NEXT: srl $2, $11, 1
+; MIPSEL-NEXT: addu $8, $9, $8
+; MIPSEL-NEXT: mul $3, $4, $3
+; MIPSEL-NEXT: sll $4, $6, 1
+; MIPSEL-NEXT: sll $5, $5, 1
 ; MIPSEL-NEXT: lui $6, 60010
 ; MIPSEL-NEXT: ori $6, $6, 61135
-; MIPSEL-NEXT: addu $2, $9, $2
-; MIPSEL-NEXT: addu $1, $1, $2
-; MIPSEL-NEXT: addu $2, $5, $4
-; MIPSEL-NEXT: addu $1, $1, $2
+; MIPSEL-NEXT: addu $1, $8, $1
+; MIPSEL-NEXT: subu $1, $1, $5
+; MIPSEL-NEXT: addu $3, $4, $3
+; MIPSEL-NEXT: addu $1, $1, $3
 ; MIPSEL-NEXT: andi $1, $1, 3
-; MIPSEL-NEXT: sll $2, $1, 31
-; MIPSEL-NEXT: or $4, $11, $2
-; MIPSEL-NEXT: sltiu $2, $4, 13
-; MIPSEL-NEXT: xori $4, $4, 13
-; MIPSEL-NEXT: sltu $3, $3, $6
-; MIPSEL-NEXT: movz $2, $3, $4
+; MIPSEL-NEXT: sll $3, $1, 31
+; MIPSEL-NEXT: or $3, $2, $3
+; MIPSEL-NEXT: sltiu $2, $3, 13
+; MIPSEL-NEXT: xori $3, $3, 13
+; MIPSEL-NEXT: sltu $4, $10, $6
+; MIPSEL-NEXT: movz $2, $4, $3
 ; MIPSEL-NEXT: srl $1, $1, 1
-; MIPSEL-NEXT: or $1, $1, $8
+; MIPSEL-NEXT: or $1, $1, $7
 ; MIPSEL-NEXT: andi $1, $1, 3
 ; MIPSEL-NEXT: jr $ra
 ; MIPSEL-NEXT: movn $2, $zero, $1
 ;
 ; MIPS64EL-LABEL: test_urem_oversized:
 ; MIPS64EL: # %bb.0:
 ; MIPS64EL-NEXT: lui $1, 6029
 ; MIPS64EL-NEXT: daddiu $1, $1, -14175
 ; MIPS64EL-NEXT: dsll $1, $1, 16
 ; MIPS64EL-NEXT: daddiu $1, $1, 26371
 ; MIPS64EL-NEXT: dsll $1, $1, 17
 ; MIPS64EL-NEXT: daddiu $1, $1, -24871
 ; MIPS64EL-NEXT: dmult $5, $1
 ; MIPS64EL-NEXT: mflo $2
 ; MIPS64EL-NEXT: dmultu $4, $1
 ; MIPS64EL-NEXT: mflo $1
 ; MIPS64EL-NEXT: mfhi $3
 ; MIPS64EL-NEXT: lui $5, 14
 ; MIPS64EL-NEXT: daddiu $5, $5, -5525
 ; MIPS64EL-NEXT: dsll $5, $5, 16
 ; MIPS64EL-NEXT: daddiu $5, $5, -4401
 ; MIPS64EL-NEXT: dsll $4, $4, 1
 ; MIPS64EL-NEXT: daddu $3, $3, $4
 ; MIPS64EL-NEXT: daddu $2, $3, $2
 ; MIPS64EL-NEXT: andi $3, $2, 3
 ; MIPS64EL-NEXT: dsll $2, $3, 63
 ; MIPS64EL-NEXT: dsrl $4, $1, 1
 ; MIPS64EL-NEXT: or $2, $4, $2
 ; MIPS64EL-NEXT: sltu $2, $2, $5
 ; MIPS64EL-NEXT: dsrl $3, $3, 1
 ; MIPS64EL-NEXT: dsll $1, $1, 1
 ; MIPS64EL-NEXT: or $1, $3, $1
 ; MIPS64EL-NEXT: andi $1, $1, 3
 ; MIPS64EL-NEXT: jr $ra
 ; MIPS64EL-NEXT: movn $2, $zero, $1
 %urem = urem i66 %X, 1234567890
 %cmp = icmp eq i66 %urem, 0
 ret i1 %cmp
 }

llvm/test/CodeGen/RISCV/mul.ll

@@ ... @@ ; RV64IM-NEXT: ret
 %4 = lshr i128 %3, 64
 %5 = trunc i128 %4 to i64
 ret i64 %5
 }

 define i8 @muladd_demand(i8 %x, i8 %y) nounwind {
 ; RV32I-LABEL: muladd_demand:
 ; RV32I: # %bb.0:
-; RV32I-NEXT: addi sp, sp, -16
-; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
-; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
-; RV32I-NEXT: mv s0, a1
-; RV32I-NEXT: li a1, 14
-; RV32I-NEXT: call __mulsi3@plt
-; RV32I-NEXT: add a0, s0, a0
+; RV32I-NEXT: slli a0, a0, 1
+; RV32I-NEXT: sub a0, a1, a0
 ; RV32I-NEXT: andi a0, a0, 15
-; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
-; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
-; RV32I-NEXT: addi sp, sp, 16
 ; RV32I-NEXT: ret
 ;
 ; RV32IM-LABEL: muladd_demand:
 ; RV32IM: # %bb.0:
-; RV32IM-NEXT: li a2, 14
-; RV32IM-NEXT: mul a0, a0, a2
-; RV32IM-NEXT: add a0, a1, a0
+; RV32IM-NEXT: slli a0, a0, 1
+; RV32IM-NEXT: sub a0, a1, a0
 ; RV32IM-NEXT: andi a0, a0, 15
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: muladd_demand:
 ; RV64I: # %bb.0:
-; RV64I-NEXT: addi sp, sp, -16
-; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
-; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
-; RV64I-NEXT: mv s0, a1
-; RV64I-NEXT: li a1, 14
-; RV64I-NEXT: call __muldi3@plt
-; RV64I-NEXT: addw a0, s0, a0
+; RV64I-NEXT: slliw a0, a0, 1
+; RV64I-NEXT: subw a0, a1, a0
 ; RV64I-NEXT: andi a0, a0, 15
-; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
-; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
-; RV64I-NEXT: addi sp, sp, 16
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: muladd_demand:
 ; RV64IM: # %bb.0:
-; RV64IM-NEXT: li a2, 14
-; RV64IM-NEXT: mulw a0, a0, a2
-; RV64IM-NEXT: addw a0, a1, a0
+; RV64IM-NEXT: slliw a0, a0, 1
+; RV64IM-NEXT: subw a0, a1, a0
 ; RV64IM-NEXT: andi a0, a0, 15
 ; RV64IM-NEXT: ret
 %m = mul i8 %x, 14
 %a = add i8 %y, %m
 %r = and i8 %a, 15
 ret i8 %r
 }

llvm/test/CodeGen/X86/mul-demand.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s

 define i64 @muladd_demand(i64 %x, i64 %y) {
 ; CHECK-LABEL: muladd_demand:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: imull $131008, %edi, %eax # imm = 0x1FFC0
-; CHECK-NEXT: addl %esi, %eax
+; CHECK-NEXT: movq %rsi, %rax
+; CHECK-NEXT: shll $6, %edi
+; CHECK-NEXT: subl %edi, %eax
 ; CHECK-NEXT: shlq $47, %rax
 ; CHECK-NEXT: retq
 %m = mul i64 %x, 131008 ; 0x0001ffc0
 %a = add i64 %m, %y
 %r = shl i64 %a, 47
 ret i64 %r
 }

 define <2 x i64> @muladd_demand_commute(<2 x i64> %x, <2 x i64> %y) {
 ; CHECK-LABEL: muladd_demand_commute:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: pmuludq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
-; CHECK-NEXT: paddq %xmm1, %xmm0
-; CHECK-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; CHECK-NEXT: psllq $6, %xmm0
+; CHECK-NEXT: psubq %xmm0, %xmm1
+; CHECK-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
+; CHECK-NEXT: movdqa %xmm1, %xmm0
 ; CHECK-NEXT: retq
 %m = mul <2 x i64> %x, <i64 131008, i64 131008>
 %a = add <2 x i64> %y, %m
 %r = and <2 x i64> %a, <i64 131071, i64 131071>
 ret <2 x i64> %r
 }