This is an archive of the discontinued LLVM Phabricator instance.

Differential D88201

[DAGCombiner] Add decomposition patterns for Mul-by-Imm.
ClosedPublic

Authored by Esme on Sep 24 2020, 12:54 AM.

Details

Reviewers

jsji
nemanjai
steven.zhang
spatel
RKSimon
lebedev.ri
arsenm
craig.topper

Group Reviewers

Restricted Project

Commits

rGe9fd8823baf5: [DAGCombiner] Add decomposition patterns for Mul-by-Imm.

Summary

This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:

mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))

The conversion is triggered when the multiplier is a large constant that the target cannot handle with a single multiplication instruction. This is controlled by the hook decomposeMulByConstant.
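
To make the two patterns concrete, here is a minimal standalone sketch on plain 64-bit integers. It is not the DAGCombiner code: `ShiftPair`, `decompose`, and `mulByShifts` are hypothetical names introduced for illustration, powers of two are deliberately rejected (a single shift covers them), and the target hook is ignored entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical illustration type, not an LLVM API.
struct ShiftPair {
  unsigned Hi; // N: the larger shift amount
  unsigned Lo; // M: the smaller shift amount
  bool IsAdd;  // true: (x << N) + (x << M); false: (x << N) - (x << M)
};

static bool isPow2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

static unsigned log2Exact(uint64_t V) {
  assert(isPow2(V) && "expected a power of two");
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Try to write C as 2^N + 2^M or 2^N - 2^M with N > M.
std::optional<ShiftPair> decompose(uint64_t C) {
  if (C == 0)
    return std::nullopt;
  // Strip trailing zeros; they become the smaller shift amount M.
  unsigned TZ = 0;
  while (((C >> TZ) & 1) == 0)
    ++TZ;
  uint64_t Odd = C >> TZ;
  if (Odd == 1) // C is a power of two: one shift is enough.
    return std::nullopt;
  if (isPow2(Odd - 1)) // Odd == 2^k + 1
    return ShiftPair{log2Exact(Odd - 1) + TZ, TZ, true};
  if (isPow2(Odd + 1)) { // Odd == 2^k - 1
    unsigned Hi = log2Exact(Odd + 1) + TZ;
    if (Hi < 64) // keep the shift amount in range
      return ShiftPair{Hi, TZ, false};
  }
  return std::nullopt;
}

// Evaluate the decomposed form; arithmetic wraps modulo 2^64 like mul does.
uint64_t mulByShifts(uint64_t X, const ShiftPair &P) {
  uint64_t A = X << P.Hi;
  uint64_t B = X << P.Lo;
  return P.IsAdd ? A + B : A - B;
}
```

For 0x8800 this sketch yields shifts 15 and 11 with an add, and for 0xf800 it yields shifts 16 and 11 with a subtract, matching the examples in the code comment added by the patch.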

Moreover, the conversion improves ILP, since the resulting shift instructions are independent of each other. A sequence like the following benefits further, since one shift instruction is saved:

*res1 = a * 0x8800;
*res2 = a * 0x8080;
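
The two constants above share a power of two: 0x8800 = 2^15 + 2^11 and 0x8080 = 2^15 + 2^7. The sketch below spells out the decomposed form on plain integers. It assumes that the saved shift mentioned above is the common `a << 15`, which is computed once and reused; `Results`, `mulBoth`, and `checkMulBoth` are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two products from the summary's example.
struct Results {
  uint64_t Res1; // a * 0x8800
  uint64_t Res2; // a * 0x8080
};

// Three shifts and two adds instead of two multiplies.
Results mulBoth(uint64_t A) {
  uint64_t Common = A << 15; // shared by both products
  return {Common + (A << 11), Common + (A << 7)};
}

// Sanity check against ordinary multiplication (wraps modulo 2^64).
void checkMulBoth(uint64_t A) {
  Results R = mulBoth(A);
  assert(R.Res1 == A * 0x8800ULL);
  assert(R.Res2 == A * 0x8080ULL);
}
```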

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Esme created this revision. · Sep 24 2020, 12:54 AM

Herald added a project: Restricted Project. · Sep 24 2020, 12:54 AM

Herald added subscribers: llvm-commits, ecnelises, shchenz and 2 others.

Esme requested review of this revision. · Sep 24 2020, 12:54 AM

Esme mentioned this in D87384: [PowerPC] Add ISEL patterns for Mul with Imm. · Sep 24 2020, 1:32 AM

Esme retitled this revision from "[PowerPC] Add patterns for Mul-by-Imm in DAGCombiner." to "[DAGCombiner][PowerPC] Add decomposition patterns for Mul-by-Imm." · Sep 24 2020, 1:38 AM

steven.zhang retitled this revision from "[DAGCombiner][PowerPC] Add decomposition patterns for Mul-by-Imm." to "[DAGCombiner] Add decomposition patterns for Mul-by-Imm." · Sep 24 2020, 1:44 AM

steven.zhang added reviewers: spatel, RKSimon.

Harbormaster completed remote builds in B72775: Diff 293952. · Sep 24 2020, 1:58 AM

Esme added reviewers: lebedev.ri, arsenm, craig.topper. · Sep 29 2020, 3:27 AM

Herald added a subscriber: wdng. · Sep 29 2020, 3:28 AM

RKSimon added inline comments. · Sep 29 2020, 3:38 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3648	MulC.lshrInPlace(TZeros);
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16067	Maybe better to use ConstNode->getAPIntValue().isSignedIntN() ?

@RKSimon Thanks. Updated.

Herald added a subscriber: pengfei. · Oct 1 2020, 8:02 PM

Harbormaster completed remote builds in B73733: Diff 295721. · Oct 1 2020, 8:14 PM

spatel added inline comments. · Oct 2 2020, 7:45 AM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16075	Just to confirm: no target besides PPC is going to see any diffs from this patch because they don't check the constant for trailing zeros yet? (We really should implement the `TODO` from the DAGCombiner code comment, so every target doesn't have to duplicate this logic.)

Esme added inline comments. · Oct 4 2020, 10:21 AM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16075	Thanks for your comments. @spatel Yes, RISCV and X86 implemented the hook `decomposeMulByConstant` and both of them don't check the constant for trailing zeros, therefore they will never return true for these constants. So yes, this patch has no effect on targets other than PPC. I will look into the `TODO` after my vacation (Oct 9). :D

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3602–3604	Independent of this patch, but if you are looking at making further changes in here...that looks strange/unnecessary. The other transforms around here are using `N1IsConst`; why is this one different?

This revision is now accepted and ready to land. · Oct 5 2020, 6:40 AM

RKSimon added inline comments. · Oct 5 2020, 6:53 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3602–3604	I think because this one has been updated to support non-uniform vectors but most of others haven't, or only match scalars or uniform/splat vectors.

Closed by commit rGe9fd8823baf5: [DAGCombiner] Add decomposition patterns for Mul-by-Imm. (authored by Esme). · Oct 9 2020, 1:52 AM

This revision was automatically updated to reflect the committed changes.

Esme added a commit: rGe9fd8823baf5: [DAGCombiner] Add decomposition patterns for Mul-by-Imm..

MaskRay mentioned this in rG2bd4730850cc: [PowerPC] Fix signed overflow in decomposeMulByConstant after D88201. · Oct 9 2020, 6:29 PM

MaskRay added a subscriber: MaskRay. · Oct 9 2020, 6:31 PM

MaskRay added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16079	LONG_MAX+1 and LONG_MIN-1 are signed overflows. I fixed it in 2bd4730850cc0f3ab7bd0c51b18c0a220e480dc7

saugustine added a subscriber: saugustine. · Oct 9 2020, 6:48 PM

saugustine added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16079	undefined-behaviour sanitizer reports an error at this line when executing the test at llvm-project/llvm/test/CodeGen/PowerPC/mul-const-i64.ll PPCISelLowering.cpp:16079:27: runtime error: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long'
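
The overflow reported above comes from evaluating `Imm + 1` and `Imm - 1` (and their negated forms) in a signed 64-bit type. As a hedged illustration, and not necessarily what commit 2bd4730850cc does, the same checks can be carried out in unsigned arithmetic, where wraparound is well defined. The sketch below is a standalone model of the hook's decision on a plain `int64_t`; `shouldDecompose`, `isPow2`, and `fitsInt16` are hypothetical names, and the type checks on `VT` from the real hook are omitted.

```cpp
#include <cassert>
#include <cstdint>

static bool isPow2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

static bool fitsInt16(int64_t V) { return V >= INT16_MIN && V <= INT16_MAX; }

// Hypothetical model of the profitability check, free of signed overflow.
bool shouldDecompose(int64_t Imm) {
  if (Imm == 0)
    return false;
  // Count trailing zeros on the unsigned bit pattern.
  uint64_t Bits = static_cast<uint64_t>(Imm);
  unsigned Shift = 0;
  while (((Bits >> Shift) & 1) == 0)
    ++Shift;
  assert(Shift < 64 && "a non-zero value has a set bit");
  // Arithmetic right shift: guaranteed since C++20, and what mainstream
  // compilers implement for signed types in earlier standards.
  Imm >>= Shift;
  // A multiplier that fits in 16 bits (after shifting) is left alone.
  if (fitsInt16(Imm))
    return false;
  // Do the +/-1 arithmetic modulo 2^64 so that INT64_MAX + 1 is not UB.
  uint64_t U = static_cast<uint64_t>(Imm);
  return isPow2(U + 1) || isPow2(U - 1) ||
         isPow2(uint64_t{1} - U) || // 1 - Imm
         isPow2(~U);                // -1 - Imm, since ~U == -U - 1
}
```

With this model, `shouldDecompose(INT64_MAX)` evaluates `U + 1` as 2^63 without undefined behaviour, which is the case the sanitizer flagged.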

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

SelectionDAG/

DAGCombiner.cpp

20 lines

Target/

PowerPC/

PPCISelLowering.h

3 lines

PPCISelLowering.cpp

26 lines

test/

CodeGen/

PowerPC/

mulli.ll

45 lines

Diff 297156

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

SDValue DAGCombiner::visitMUL(SDNode *N) {
// fold (mul x, -1) -> 0-x		// fold (mul x, -1) -> 0-x
if (N1IsConst && ConstValue1.isAllOnesValue()) {		if (N1IsConst && ConstValue1.isAllOnesValue()) {
SDLoc DL(N);		SDLoc DL(N);
return DAG.getNode(ISD::SUB, DL, VT,		return DAG.getNode(ISD::SUB, DL, VT,
DAG.getConstant(0, DL, VT), N0);		DAG.getConstant(0, DL, VT), N0);
}		}

// fold (mul x, (1 << c)) -> x << c		// fold (mul x, (1 << c)) -> x << c
if (isConstantOrConstantVector(N1, /*NoOpaques*/ true) &&		if (isConstantOrConstantVector(N1, /*NoOpaques*/ true) &&
DAG.isKnownToBeAPowerOfTwo(N1) &&		DAG.isKnownToBeAPowerOfTwo(N1) &&
(!VT.isVector() || Level <= AfterLegalizeVectorOps)) {		(!VT.isVector() || Level <= AfterLegalizeVectorOps)) {
SDLoc DL(N);		SDLoc DL(N);
SDValue LogBase2 = BuildLogBase2(N1, DL);		SDValue LogBase2 = BuildLogBase2(N1, DL);
EVT ShiftVT = getShiftAmountTy(N0.getValueType());		EVT ShiftVT = getShiftAmountTy(N0.getValueType());
SDValue Trunc = DAG.getZExtOrTrunc(LogBase2, DL, ShiftVT);		SDValue Trunc = DAG.getZExtOrTrunc(LogBase2, DL, ShiftVT);
return DAG.getNode(ISD::SHL, DL, VT, N0, Trunc);		return DAG.getNode(ISD::SHL, DL, VT, N0, Trunc);
}		}

// fold (mul x, -(1 << c)) -> -(x << c) or (-x) << c		// fold (mul x, -(1 << c)) -> -(x << c) or (-x) << c
if (N1IsConst && !N1IsOpaqueConst && (-ConstValue1).isPowerOf2()) {		if (N1IsConst && !N1IsOpaqueConst && (-ConstValue1).isPowerOf2()) {
unsigned Log2Val = (-ConstValue1).logBase2();		unsigned Log2Val = (-ConstValue1).logBase2();
SDLoc DL(N);		SDLoc DL(N);
// FIXME: If the input is something that is easily negated (e.g. a		// FIXME: If the input is something that is easily negated (e.g. a
// single-use add), we should put the negate there.		// single-use add), we should put the negate there.
return DAG.getNode(ISD::SUB, DL, VT,		return DAG.getNode(ISD::SUB, DL, VT,
DAG.getConstant(0, DL, VT),		DAG.getConstant(0, DL, VT),
DAG.getNode(ISD::SHL, DL, VT, N0,		DAG.getNode(ISD::SHL, DL, VT, N0,
DAG.getConstant(Log2Val, DL,		DAG.getConstant(Log2Val, DL,
getShiftAmountTy(N0.getValueType()))));		getShiftAmountTy(N0.getValueType()))));
}		}

// Try to transform multiply-by-(power-of-2 +/- 1) into shift and add/sub.		// Try to transform:
		// (1) multiply-by-(power-of-2 +/- 1) into shift and add/sub.
// mul x, (2^N + 1) --> add (shl x, N), x		// mul x, (2^N + 1) --> add (shl x, N), x
// mul x, (2^N - 1) --> sub (shl x, N), x		// mul x, (2^N - 1) --> sub (shl x, N), x
// Examples: x * 33 --> (x << 5) + x		// Examples: x * 33 --> (x << 5) + x
// x * 15 --> (x << 4) - x		// x * 15 --> (x << 4) - x
// x * -33 --> -((x << 5) + x)		// x * -33 --> -((x << 5) + x)
// x * -15 --> -((x << 4) - x) ; this reduces --> x - (x << 4)		// x * -15 --> -((x << 4) - x) ; this reduces --> x - (x << 4)
		// (2) multiply-by-(power-of-2 +/- power-of-2) into shifts and add/sub.
		// mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
		// mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
		// Examples: x * 0x8800 --> (x << 15) + (x << 11)
		// x * 0xf800 --> (x << 16) - (x << 11)
		// x * -0x8800 --> -((x << 15) + (x << 11))
		// x * -0xf800 --> -((x << 16) - (x << 11)) ; (x << 11) - (x << 16)
if (N1IsConst && TLI.decomposeMulByConstant(*DAG.getContext(), VT, N1)) {		if (N1IsConst && TLI.decomposeMulByConstant(*DAG.getContext(), VT, N1)) {
// TODO: We could handle more general decomposition of any constant by		// TODO: We could handle more general decomposition of any constant by
// having the target set a limit on number of ops and making a		// having the target set a limit on number of ops and making a
// callback to determine that sequence (similar to sqrt expansion).		// callback to determine that sequence (similar to sqrt expansion).
unsigned MathOp = ISD::DELETED_NODE;		unsigned MathOp = ISD::DELETED_NODE;
APInt MulC = ConstValue1.abs();		APInt MulC = ConstValue1.abs();
		// The constant `2` should be treated as (2^0 + 1).
		unsigned TZeros = MulC == 2 ? 0 : MulC.countTrailingZeros();
		MulC.lshrInPlace(TZeros);
if ((MulC - 1).isPowerOf2())		if ((MulC - 1).isPowerOf2())
MathOp = ISD::ADD;		MathOp = ISD::ADD;
else if ((MulC + 1).isPowerOf2())		else if ((MulC + 1).isPowerOf2())
MathOp = ISD::SUB;		MathOp = ISD::SUB;

if (MathOp != ISD::DELETED_NODE) {		if (MathOp != ISD::DELETED_NODE) {
unsigned ShAmt =		unsigned ShAmt =
MathOp == ISD::ADD ? (MulC - 1).logBase2() : (MulC + 1).logBase2();		MathOp == ISD::ADD ? (MulC - 1).logBase2() : (MulC + 1).logBase2();
		ShAmt += TZeros;
assert(ShAmt < VT.getScalarSizeInBits() &&		assert(ShAmt < VT.getScalarSizeInBits() &&
"multiply-by-constant generated out of bounds shift");		"multiply-by-constant generated out of bounds shift");
SDLoc DL(N);		SDLoc DL(N);
SDValue Shl =		SDValue Shl =
DAG.getNode(ISD::SHL, DL, VT, N0, DAG.getConstant(ShAmt, DL, VT));		DAG.getNode(ISD::SHL, DL, VT, N0, DAG.getConstant(ShAmt, DL, VT));
SDValue R = DAG.getNode(MathOp, DL, VT, Shl, N0);		SDValue R =
		TZeros ? DAG.getNode(MathOp, DL, VT, Shl,
		DAG.getNode(ISD::SHL, DL, VT, N0,
		DAG.getConstant(TZeros, DL, VT)))
		: DAG.getNode(MathOp, DL, VT, Shl, N0);
if (ConstValue1.isNegative())		if (ConstValue1.isNegative())
R = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), R);		R = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), R);
return R;		return R;
}		}
}		}

// (mul (shl X, c1), c2) -> (mul X, c2 << c1)		// (mul (shl X, c1), c2) -> (mul X, c2 << c1)
if (N0.getOpcode() == ISD::SHL &&		if (N0.getOpcode() == ISD::SHL &&

llvm/lib/Target/PowerPC/PPCISelLowering.h

public:
/// to just the constant itself.		/// to just the constant itself.
bool shouldConvertConstantLoadToIntImm(const APInt &Imm,		bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override;		Type *Ty) const override;

bool convertSelectOfConstantsToMath(EVT VT) const override {		bool convertSelectOfConstantsToMath(EVT VT) const override {
return true;		return true;
}		}

		bool decomposeMulByConstant(LLVMContext &Context, EVT VT,
		SDValue C) const override;

bool isDesirableToTransformToIntegerOp(unsigned Opc,		bool isDesirableToTransformToIntegerOp(unsigned Opc,
EVT VT) const override {		EVT VT) const override {
// Only handle float load/store pair because float(fpr) load/store		// Only handle float load/store pair because float(fpr) load/store
// instruction has more cycles than integer(gpr) load/store in PPC.		// instruction has more cycles than integer(gpr) load/store in PPC.
if (Opc != ISD::LOAD && Opc != ISD::STORE)		if (Opc != ISD::LOAD && Opc != ISD::STORE)
return false;		return false;
if (VT != MVT::f32 && VT != MVT::f64)		if (VT != MVT::f32 && VT != MVT::f64)
return false;		return false;

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

if (VT == MVT::ppcf128)
return false;		return false;

if (Fast)		if (Fast)
*Fast = true;		*Fast = true;

return true;		return true;
}		}

		bool PPCTargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT,
		SDValue C) const {
		// Check integral scalar types.
		if (!VT.isScalarInteger())
		return false;
		if (auto *ConstNode = dyn_cast<ConstantSDNode>(C.getNode())) {
		if (!ConstNode->getAPIntValue().isSignedIntN(64))
		return false;
		// This transformation will generate >= 2 operations. But the following
		// cases will generate <= 2 instructions during ISEL. So exclude them.
		// 1. If the constant multiplier fits 16 bits, it can be handled by one
		// HW instruction, ie. MULLI
		// 2. If the multiplier after shifted fits 16 bits, an extra shift
		// instruction is needed than case 1, ie. MULLI and RLDICR
		int64_t Imm = ConstNode->getSExtValue();
		unsigned Shift = countTrailingZeros<uint64_t>(Imm);
		Imm >>= Shift;
		if (isInt<16>(Imm))
		return false;
		if (isPowerOf2_64(Imm + 1) || isPowerOf2_64(Imm - 1) ||
		isPowerOf2_64(1 - Imm) || isPowerOf2_64(-1 - Imm))
		return true;
		}
		return false;
		}

bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,		bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
EVT VT) const {		EVT VT) const {
return isFMAFasterThanFMulAndFAdd(		return isFMAFasterThanFMulAndFAdd(
MF.getFunction(), VT.getTypeForEVT(MF.getFunction().getContext()));		MF.getFunction(), VT.getTypeForEVT(MF.getFunction().getContext()));
}		}

bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const Function &F,		bool PPCTargetLowering::isFMAFasterThanFMulAndFAdd(const Function &F,
Type *Ty) const {		Type *Ty) const {

llvm/test/CodeGen/PowerPC/mulli.ll

	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, -4866048			%y = mul i64 %x, -4866048
	ret i64 %y			ret i64 %y
	}			}

	define i64 @test5(i64 %x) {			define i64 @test5(i64 %x) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 4, 16			; CHECK-NEXT: sldi 4, 3, 12
	; CHECK-NEXT: ori 4, 4, 1			; CHECK-NEXT: sldi 3, 3, 32
	; CHECK-NEXT: sldi 4, 4, 12			; CHECK-NEXT: add 3, 3, 4
	; CHECK-NEXT: mulld 3, 3, 4
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, 4294971392			%y = mul i64 %x, 4294971392
	ret i64 %y			ret i64 %y
	}			}

	define i64 @test6(i64 %x) {			define i64 @test6(i64 %x) {
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 4, -17			; CHECK-NEXT: sldi 4, 3, 12
	; CHECK-NEXT: ori 4, 4, 65535			; CHECK-NEXT: sldi 3, 3, 32
	; CHECK-NEXT: sldi 4, 4, 12			; CHECK-NEXT: add 3, 3, 4
	; CHECK-NEXT: mulld 3, 3, 4			; CHECK-NEXT: neg 3, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, -4294971392			%y = mul i64 %x, -4294971392
	ret i64 %y			ret i64 %y
	}			}

	define i64 @test7(i64 %x) {			define i64 @test7(i64 %x) {
	; CHECK-LABEL: test7:			; CHECK-LABEL: test7:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 4, 31			; CHECK-NEXT: sldi 4, 3, 34
	; CHECK-NEXT: ori 4, 4, 65535			; CHECK-NEXT: sldi 3, 3, 13
	; CHECK-NEXT: sldi 4, 4, 13			; CHECK-NEXT: sub 3, 4, 3
	; CHECK-NEXT: mulld 3, 3, 4
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, 17179860992			%y = mul i64 %x, 17179860992
	ret i64 %y			ret i64 %y
	}			}

	define i64 @test8(i64 %x) {			define i64 @test8(i64 %x) {
	; CHECK-LABEL: test8:			; CHECK-LABEL: test8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: li 4, -4			; CHECK-NEXT: sldi 4, 3, 13
	; CHECK-NEXT: sldi 4, 4, 32			; CHECK-NEXT: sldi 3, 3, 34
	; CHECK-NEXT: ori 4, 4, 8192			; CHECK-NEXT: sub 3, 4, 3
	; CHECK-NEXT: mulld 3, 3, 4
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, -17179860992			%y = mul i64 %x, -17179860992
	ret i64 %y			ret i64 %y
	}			}

	define i64 @test9(i64 %x) {			define i64 @test9(i64 %x) {
	; CHECK-LABEL: test9:			; CHECK-LABEL: test9:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 4, 16			; CHECK-NEXT: sldi 4, 3, 12
				; CHECK-NEXT: sldi 5, 3, 32
				; CHECK-NEXT: add 4, 5, 4
	; CHECK-NEXT: li 5, 8193			; CHECK-NEXT: li 5, 8193
	; CHECK-NEXT: ori 4, 4, 1
	; CHECK-NEXT: sldi 5, 5, 19			; CHECK-NEXT: sldi 5, 5, 19
	; CHECK-NEXT: sldi 4, 4, 12
	; CHECK-NEXT: mulld 4, 3, 4
	; CHECK-NEXT: mulld 3, 3, 5			; CHECK-NEXT: mulld 3, 3, 5
	; CHECK-NEXT: sub 3, 4, 3			; CHECK-NEXT: sub 3, 4, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, 4294971392			%y = mul i64 %x, 4294971392
	%z = mul i64 %x, 4295491584			%z = mul i64 %x, 4295491584
	%res = sub i64 %y, %z			%res = sub i64 %y, %z
	ret i64 %res			ret i64 %res
	}			}

	define i64 @test10(i64 %x) {			define i64 @test10(i64 %x) {
	; CHECK-LABEL: test10:			; CHECK-LABEL: test10:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 4, 31			; CHECK-NEXT: sldi 4, 3, 34
	; CHECK-NEXT: lis 5, 16383			; CHECK-NEXT: sldi 3, 3, 30
	; CHECK-NEXT: ori 4, 4, 65535
	; CHECK-NEXT: ori 5, 5, 57344
	; CHECK-NEXT: sldi 4, 4, 13
	; CHECK-NEXT: mulld 4, 3, 4
	; CHECK-NEXT: mulld 3, 3, 5
	; CHECK-NEXT: sub 3, 4, 3			; CHECK-NEXT: sub 3, 4, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%y = mul i64 %x, 17179860992			%y = mul i64 %x, 17179860992
	%z = mul i64 %x, 1073733632			%z = mul i64 %x, 1073733632
	%res = sub i64 %y, %z			%res = sub i64 %y, %z
	ret i64 %res			ret i64 %res
	}			}
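
As a reading aid for the updated CHECK lines, the sketch below verifies at compile time how the multipliers in these tests break down into powers of two. It is only arithmetic on the constants shown above; `pow2`, `test10Model`, and `checkTest10` are hypothetical helper names, and the comments about 16-bit immediates refer to the exclusion described in the PPC hook's code comment.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t pow2(unsigned N) { return uint64_t{1} << N; }

// test5, test6, test9 (first multiplier): shifts by 32 and 12, then add.
static_assert(4294971392ULL == pow2(32) + pow2(12), "2^32 + 2^12");
// test7, test8, test10 (first multiplier): shifts by 34 and 13, then sub.
static_assert(17179860992ULL == pow2(34) - pow2(13), "2^34 - 2^13");
// test10 (second multiplier).
static_assert(1073733632ULL == pow2(30) - pow2(13), "2^30 - 2^13");
// test9 (second multiplier): 8193 fits in 16 bits once the 19 trailing
// zeros are shifted out, so the test keeps li/sldi/mulld for it.
static_assert(4295491584ULL == (8193ULL << 19), "8193 << 19");

// test10 computes x * (2^34 - 2^13) - x * (2^30 - 2^13); the 2^13 terms
// cancel, which matches the expected "sldi 34, sldi 30, sub" sequence.
constexpr uint64_t test10Model(uint64_t X) { return (X << 34) - (X << 30); }

// Compare the model with the two original multiplies (modulo 2^64).
void checkTest10(uint64_t X) {
  assert(test10Model(X) == X * 17179860992ULL - X * 1073733632ULL);
}
```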