This is an archive of the discontinued LLVM Phabricator instance.


Differential D153620

[X86] Combine MUL+SRL+TRUNC to MULX for i32 on 64-bit
Needs Review · Public

Authored by pengfei on Jun 23 2023, 4:10 AM.


Details

Reviewers

craig.topper
probinson
RKSimon

Summary

D153576 brought _mulx_u32 to 64-bit targets, but its lowering doesn't
satisfy the "does not read or write arithmetic flags" behavior described in the
intrinsic guide: https://godbolt.org/z/xb1fjf1sM

This patch completes the lowering by combining
(i32 (trunc (shr (mul (zext (i32 A)), (zext (i32 B))), 32)))
to result #1 (the high half) of (umul_lohi (i32 A), (i32 B))

It is also a general optimization.
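
As a plain C++ illustration of why the two forms are interchangeable (a sketch only, not code from the patch; `UMulLoHi`, `umul_lohi_u32`, `mul_hi_via_wide` and `check_mul_hi_equivalence` are names made up for this example):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two results of ISD::UMUL_LOHI on i32.
struct UMulLoHi {
  uint32_t Lo;
  uint32_t Hi;
};

// Models (umul_lohi (i32 A), (i32 B)): the full 64-bit product, split in halves.
UMulLoHi umul_lohi_u32(uint32_t A, uint32_t B) {
  uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  return {static_cast<uint32_t>(Wide), static_cast<uint32_t>(Wide >> 32)};
}

// Models (i32 (trunc (shr (mul (zext A), (zext B)), 32))).
uint32_t mul_hi_via_wide(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(A) * static_cast<uint64_t>(B)) >> 32);
}

// Spot-check that the matched pattern equals the high result of umul_lohi.
void check_mul_hi_equivalence() {
  const uint32_t Samples[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t A : Samples)
    for (uint32_t B : Samples)
      assert(mul_hi_via_wide(A, B) == umul_lohi_u32(A, B).Hi);
}
```

Calling `check_mul_hi_equivalence()` exercises a handful of boundary values; the low half of the same product corresponds to the plain `trunc` users that the combine also rewrites.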

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,050 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

pengfei created this revision. Jun 23 2023, 4:10 AM

Herald added a project: Restricted Project. Jun 23 2023, 4:10 AM

Herald added a subscriber: hiraditya.

pengfei requested review of this revision. Jun 23 2023, 4:10 AM

Herald added a subscriber: llvm-commits.

pengfei mentioned this in D153576: [Headers] Fix up some conditionals. Jun 23 2023, 4:12 AM

On https://github.com/llvm/llvm-project/issues/33580 there was concern whether MULX32 on 64-bit targets is slower than the alternatives?

Harbormaster completed remote builds in B240733: Diff 533919. Jun 23 2023, 4:46 AM

In D153620#4443954, @RKSimon wrote:

On https://github.com/llvm/llvm-project/issues/33580 there was concern whether MULX32 on 64-bit targets is slower than the alternatives?

From the current code base, I think we slightly prefer D80498 to D55565?

If not updating arithmetic flags is a requirement, then we should have an IR intrinsic. Relying on pattern matching that can be easily broken is not a good solution.

Most of these tests are for fixed point intrinsics. Maybe we should change the lowering of those to what we want instead of using a DAGCombine to get there?

"does not read or write arithmetic flags" described in intrinsic guide

How much do we actually care about honoring this? A program written in C can't tell if an intrinsic overwrites arithmetic flags; I'd feel free to just pick the fastest lowering, regardless of what the guide says.

In D153620#4444499, @craig.topper wrote:

If not updating arithmetic flags is a requirement, then we should have an IR intrinsic. Relying on pattern matching that can be easily broken is not a good solution.

You are right. I just found that it doesn't work at O0 because we use FastISel there. _mulx_u64 works because it falls back to DAGISel.
But I don't see much need to add an intrinsic, so maybe we just need to roll it back: D153681

Most of these tests are for fixed point intrinsics. Maybe we should change the lowering of those to what we want instead of using a DAGCombine to get there?

They are affected not by the DAGCombine but by the changes in Select. Is this what you expected here?

How much do we actually care about honoring this? A program written in C can't tell if an intrinsic overwrites arithmetic flags; I'd feel free to just pick the fastest lowering, regardless of what the guide says.

I don't know. But that might explain why we implemented these intrinsics in this way. I have no preference between this and D153681, either way is good to me.

Most of these tests are for fixed point intrinsics. Maybe we should change the lowering of those to what we want instead of using a DAGCombine to get there?

They are affected not by the DAGCombine but by the changes in Select. Is this what you expected here?

Why would it be the changes to Select? If the i8 or i16 umul_lohi existed before this change, wouldn’t we have failed selection?

craig.topper added inline comments. Jun 24 2023, 12:10 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5284	Shouldn't this be SMUL_LOHI?

In D153620#4445950, @craig.topper wrote:

Most of these tests are for fixed point intrinsics. Maybe we should change the lowering of those to what we want instead of using a DAGCombine to get there?

They are affected not by the DAGCombine but by the changes in Select. Is this what you expected here?

Why would it be the changes to Select? If the i8 or i16 umul_lohi existed before this change, wouldn’t we have failed selection?

I'm not sure I understand your question. I thought you meant the DAGCombine in X86ISelLowering, but now I guess you mean the one in DAGCombiner.
That code has been there for a long time. We removed i8 and i16 several years ago because of it.
So I think the answer is that we should give targets a chance to lower it themselves, hence the call to TLI.isOperationLegal(ISD::UMUL_LOHI, VT) before doing the expansion.
TBH, I didn't think that far when I changed the code; I just wanted to break the infinite combine loop between mul and mul_lohi.
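
The loop described here can be pictured with a toy model. This is not LLVM code: `Shape`, `genericCombine`, `targetCombine` and `reachesFixedPoint` are hypothetical names, and the real generic expansion also depends on the wider MUL being legal, which the sketch ignores.

```cpp
#include <cassert>

// Toy model only: two DAG "shapes" for the high half of an i32 multiply.
enum class Shape { MulLoHi, WideMulShift };

// Models the generic expansion in visitUMUL_LOHI. With the guard enabled, it
// leaves the node alone when the target reports UMUL_LOHI as legal.
Shape genericCombine(Shape S, bool LoHiLegal, bool Guarded) {
  if (S == Shape::MulLoHi && !(Guarded && LoHiLegal))
    return Shape::WideMulShift;
  return S;
}

// Models the target combine from this patch: fold the wide form to MulLoHi.
Shape targetCombine(Shape S) {
  return S == Shape::WideMulShift ? Shape::MulLoHi : S;
}

// Returns true if a full round of both combines eventually changes nothing.
bool reachesFixedPoint(bool LoHiLegal, bool Guarded, int MaxRounds = 16) {
  Shape S = Shape::WideMulShift;
  for (int I = 0; I < MaxRounds; ++I) {
    Shape AfterTarget = targetCombine(S);
    Shape AfterGeneric = genericCombine(AfterTarget, LoHiLegal, Guarded);
    bool Changed = (AfterTarget != S) || (AfterGeneric != AfterTarget);
    S = AfterGeneric;
    if (!Changed)
      return true;
  }
  return false; // The two rewrites kept undoing each other.
}
```

In this model `reachesFixedPoint(true, true)` settles on the `MulLoHi` shape, while `reachesFixedPoint(true, false)` keeps flipping between the two shapes until the round limit is hit.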

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5284	Yes. Thanks!

Fix typo.

Harbormaster completed remote builds in B240955: Diff 534200. Jun 24 2023, 6:20 AM

probinson mentioned this in D153681: [X86] Move back _mulx_u32 to 32-bit only. Jun 30 2023, 8:40 AM

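Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	6 lines
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp	12 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	46 lines
llvm/test/CodeGen/X86/bmi2.ll	24 lines
llvm/test/CodeGen/X86/smul_fix.ll	35 lines
llvm/test/CodeGen/X86/smul_fix_sat.ll	130 lines
llvm/test/CodeGen/X86/umul_fix.ll	10 lines
llvm/test/CodeGen/X86/umul_fix_sat.ll	120 lines
llvm/test/CodeGen/X86/vector-mulfix-legalize.ll	156 lines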

Diff 534200

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[... 5,274 lines not shown ...]
@@ SDValue DAGCombiner::visitSMUL_LOHI(SDNode *N) {

   // canonicalize constant to RHS (vector doesn't have to splat)
   if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&
       !DAG.isConstantIntBuildVectorOrConstantInt(N1))
     return DAG.getNode(ISD::SMUL_LOHI, DL, N->getVTList(), N1, N0);

   // If the type is twice as wide is legal, transform the mulhu to a wider
   // multiply plus a shift.
-  if (VT.isSimple() && !VT.isVector()) {
+  if (VT.isSimple() && !VT.isVector() &&
+      !TLI.isOperationLegal(ISD::SMUL_LOHI, VT)) {
       [inline comment] craig.topper: Shouldn't this be SMUL_LOHI?
       [inline comment] pengfei: Yes. Thanks!
     MVT Simple = VT.getSimpleVT();
     unsigned SimpleSize = Simple.getSizeInBits();
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), SimpleSize*2);
     if (TLI.isOperationLegal(ISD::MUL, NewVT)) {
       SDValue Lo = DAG.getNode(ISD::SIGN_EXTEND, DL, NewVT, N0);
       SDValue Hi = DAG.getNode(ISD::SIGN_EXTEND, DL, NewVT, N1);
       Lo = DAG.getNode(ISD::MUL, DL, NewVT, Lo, Hi);
       // Compute the high part as N1.
[... 33 lines not shown ...]
@@ SDValue DAGCombiner::visitUMUL_LOHI(SDNode *N) {
   // (umul_lohi N0, 1) -> (N0, 0)
   if (isOneConstant(N1)) {
     SDValue Zero = DAG.getConstant(0, DL, VT);
     return CombineTo(N, N0, Zero);
   }

   // If the type is twice as wide is legal, transform the mulhu to a wider
   // multiply plus a shift.
-  if (VT.isSimple() && !VT.isVector()) {
+  if (VT.isSimple() && !VT.isVector() &&
+      !TLI.isOperationLegal(ISD::UMUL_LOHI, VT)) {
     MVT Simple = VT.getSimpleVT();
     unsigned SimpleSize = Simple.getSizeInBits();
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), SimpleSize*2);
     if (TLI.isOperationLegal(ISD::MUL, NewVT)) {
       SDValue Lo = DAG.getNode(ISD::ZERO_EXTEND, DL, NewVT, N0);
       SDValue Hi = DAG.getNode(ISD::ZERO_EXTEND, DL, NewVT, N1);
       Lo = DAG.getNode(ISD::MUL, DL, NewVT, Lo, Hi);
       // Compute the high part as N1.
[... 22,097 lines not shown ...]

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

[... 5,249 lines not shown ...]
@@ case ISD::UMUL_LOHI: {

     unsigned Opc, MOpc;
     unsigned LoReg, HiReg;
     bool IsSigned = Opcode == ISD::SMUL_LOHI;
     bool UseMULX = !IsSigned && Subtarget->hasBMI2();
     bool UseMULXHi = UseMULX && SDValue(Node, 0).use_empty();
     switch (NVT.SimpleTy) {
     default: llvm_unreachable("Unsupported VT!");
+    case MVT::i8:
+      Opc = IsSigned ? X86::IMUL8r : X86::MUL8r;
+      MOpc = IsSigned ? X86::IMUL8m : X86::MUL8m;
+      LoReg = X86::AL;
+      HiReg = X86::AH;
+      break;
+    case MVT::i16:
+      Opc = IsSigned ? X86::IMUL16r : X86::MUL16r;
+      MOpc = IsSigned ? X86::IMUL16m : X86::MUL16m;
+      LoReg = X86::AX;
+      HiReg = X86::DX;
+      break;
     case MVT::i32:
       Opc = UseMULXHi ? X86::MULX32Hrr :
             UseMULX ? X86::MULX32rr :
             IsSigned ? X86::IMUL32r : X86::MUL32r;
       MOpc = UseMULXHi ? X86::MULX32Hrm :
              UseMULX ? X86::MULX32rm :
              IsSigned ? X86::IMUL32m : X86::MUL32m;
       LoReg = UseMULX ? X86::EDX : X86::EAX;
[... 971 lines not shown ...]
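
For readers less familiar with the one-operand x86 multiplies, the register pairs chosen in the new i8 and i16 cases follow from where the hardware puts the product. A rough C++ model of the unsigned forms (an illustration only; `HalfPair8`, `HalfPair16`, `mul8` and `mul16` are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>

struct HalfPair8 { uint8_t Lo; uint8_t Hi; };     // Lo ~ AL, Hi ~ AH
struct HalfPair16 { uint16_t Lo; uint16_t Hi; };  // Lo ~ AX, Hi ~ DX

// Models one-operand `mul r/m8`: AX = AL * src, so the high half lands in AH.
HalfPair8 mul8(uint8_t AL, uint8_t Src) {
  uint16_t AX = static_cast<uint16_t>(static_cast<unsigned>(AL) * Src);
  return {static_cast<uint8_t>(AX & 0xFF), static_cast<uint8_t>(AX >> 8)};
}

// Models one-operand `mul r/m16`: DX:AX = AX * src.
HalfPair16 mul16(uint16_t AX, uint16_t Src) {
  uint32_t Wide = static_cast<uint32_t>(AX) * static_cast<uint32_t>(Src);
  return {static_cast<uint16_t>(Wide & 0xFFFF),
          static_cast<uint16_t>(Wide >> 16)};
}
```

This is why the i8 case uses AL/AH while i16 and wider use the A/D register pair.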

llvm/lib/Target/X86/X86ISelLowering.cpp

[... 32,759 lines not shown ...]
       EVT ResVT = EVT::getVectorVT(*DAG.getContext(), MVT::i16,
                                    InVT.getVectorNumElements() / 2);
       return DAG.getNode(X86ISD::VPMADDUBSW, DL, ResVT, Ops[0], Ops[1]);
     };
     return SplitOpsAndApply(DAG, Subtarget, DL, VT, { ZExtIn, SExtIn },
                             PMADDBuilder);
   }

+  // Attempt to match MULX, which multiplies corresponding unsigned int and
+  // extracts high part and low part respectively.
+  //
+  // Which looks something like this:
+  // (i32 (trunc (shr (mul (zext (i32 A)), (zext (i32 B))), 32)))
+  static SDValue detectMULX(SDValue In, EVT VT, SelectionDAG &DAG,
+                            TargetLowering::DAGCombinerInfo &DCI,
+                            const X86Subtarget &Subtarget, const SDLoc &DL) {
+    if (VT != MVT::i32 || !Subtarget.hasBMI2() || In.getOpcode() != ISD::SRL)
+      return SDValue();
+    SDValue Op0 = In.getOperand(0);
+    auto *C = dyn_cast<ConstantSDNode>(In.getOperand(1));
+    if (!C || C->getZExtValue() != 32 || Op0.getOpcode() != ISD::MUL)
+      return SDValue();
+    SDValue Op00 = Op0.getOperand(0);
+    SDValue Op01 = Op0.getOperand(1);
+    if (Op00.getOpcode() != ISD::ZERO_EXTEND ||
+        Op00.getOperand(0).getValueType() != MVT::i32 ||
+        Op01.getOpcode() != ISD::ZERO_EXTEND ||
+        Op01.getOperand(0).getValueType() != MVT::i32)
+      return SDValue();
+    SmallVector<SDValue, 2> UserL;
+    for (SDNode *User : Op0->uses()) {
+      if (User == In.getNode())
+        continue;
+      if (User->getOpcode() != ISD::TRUNCATE || User->getValueType(0) != VT)
+        return SDValue();
+      UserL.push_back(SDValue(User, 0));
+    }
+    SDValue Lo = DAG.getNode(ISD::UMUL_LOHI, DL, DAG.getVTList(VT, VT),
+                             Op00.getOperand(0), Op01.getOperand(0));
+    for (SDValue U : UserL) {
+      DAG.ReplaceAllUsesOfValueWith(U, Lo);
+      DCI.recursivelyDeleteUnusedNodes(U.getNode());
+    }
+    return SDValue(Lo.getNode(), 1);
+  }
+
   static SDValue combineTruncate(SDNode *N, SelectionDAG &DAG,
+                                 TargetLowering::DAGCombinerInfo &DCI,
                                  const X86Subtarget &Subtarget) {
     EVT VT = N->getValueType(0);
     SDValue Src = N->getOperand(0);
     SDLoc DL(N);

     // Attempt to pre-truncate inputs to arithmetic ops instead.
     if (SDValue V = combineTruncatedArithmetic(N, DAG, Subtarget, DL))
       return V;

     // Try to detect AVG pattern first.
     if (SDValue Avg = detectAVGPattern(Src, VT, DAG, Subtarget, DL))
       return Avg;

     // Try to detect PMADD
     if (SDValue PMAdd = detectPMADDUBSW(Src, VT, DAG, Subtarget, DL))
       return PMAdd;

+    // Try to detect MULX
+    if (SDValue MulX = detectMULX(Src, VT, DAG, DCI, Subtarget, DL)) {
+      return MulX;
+    }
+
     // Try to combine truncation with signed/unsigned saturation.
     if (SDValue Val = combineTruncateWithSat(Src, VT, DL, DAG, Subtarget))
       return Val;

     // Try to combine PMULHUW/PMULHW for vXi16.
     if (SDValue V = combinePMULH(Src, VT, DL, DAG, Subtarget))
       return V;

[... 4,607 lines not shown ...]
   case ISD::UINT_TO_FP:
   case ISD::STRICT_UINT_TO_FP:
     return combineUIntToFP(N, DAG, Subtarget);
   case ISD::FADD:
   case ISD::FSUB: return combineFaddFsub(N, DAG, Subtarget);
   case X86ISD::VFCMULC:
   case X86ISD::VFMULC: return combineFMulcFCMulc(N, DAG, Subtarget);
   case ISD::FNEG: return combineFneg(N, DAG, DCI, Subtarget);
-  case ISD::TRUNCATE: return combineTruncate(N, DAG, Subtarget);
+  case ISD::TRUNCATE: return combineTruncate(N, DAG, DCI, Subtarget);
   case X86ISD::VTRUNC: return combineVTRUNC(N, DAG, DCI);
   case X86ISD::ANDNP: return combineAndnp(N, DAG, DCI, Subtarget);
   case X86ISD::FAND: return combineFAnd(N, DAG, Subtarget);
   case X86ISD::FANDN: return combineFAndn(N, DAG, Subtarget);
   case X86ISD::FXOR:
   case X86ISD::FOR: return combineFOr(N, DAG, DCI, Subtarget);
   case X86ISD::FMIN:
   case X86ISD::FMAX: return combineFMinFMax(N, DAG);
[... 1,487 lines not shown ...]

llvm/test/CodeGen/X86/bmi2.ll

[... 299 lines not shown ...]
 ; X86-NEXT: addl %edx, %edx
 ; X86-NEXT: addl %eax, %eax
 ; X86-NEXT: mulxl %eax, %eax, %edx
 ; X86-NEXT: movl %edx, (%ecx)
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: mulx32:
 ; X64: # %bb.0:
-; X64-NEXT: # kill: def $esi killed $esi def $rsi
+; X64-NEXT: movq %rdx, %rcx
 ; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: addl %edi, %edi
-; X64-NEXT: leal (%rsi,%rsi), %eax
-; X64-NEXT: imulq %rdi, %rax
-; X64-NEXT: movq %rax, %rcx
-; X64-NEXT: shrq $32, %rcx
-; X64-NEXT: movl %ecx, (%rdx)
-; X64-NEXT: # kill: def $eax killed $eax killed $rax
+; X64-NEXT: leal (%rdi,%rdi), %edx
+; X64-NEXT: addl %esi, %esi
+; X64-NEXT: mulxl %esi, %eax, %edx
+; X64-NEXT: movl %edx, (%rcx)
 ; X64-NEXT: retq
   %x1 = add i32 %x, %x
   %y1 = add i32 %y, %y
   %x2 = zext i32 %x1 to i64
   %y2 = zext i32 %y1 to i64
   %r1 = mul i64 %x2, %y2
   %h1 = lshr i64 %r1, 32
   %h = trunc i64 %h1 to i32
[... 10 lines not shown ...]
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: addl %edx, %edx
 ; X86-NEXT: mulxl (%eax), %eax, %edx
 ; X86-NEXT: movl %edx, (%ecx)
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: mulx32_load:
 ; X64: # %bb.0:
+; X64-NEXT: movq %rdx, %rcx
 ; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: leal (%rdi,%rdi), %eax
-; X64-NEXT: movl (%rsi), %ecx
-; X64-NEXT: imulq %rcx, %rax
-; X64-NEXT: movq %rax, %rcx
-; X64-NEXT: shrq $32, %rcx
-; X64-NEXT: movl %ecx, (%rdx)
-; X64-NEXT: # kill: def $eax killed $eax killed $rax
+; X64-NEXT: leal (%rdi,%rdi), %edx
+; X64-NEXT: mulxl (%rsi), %eax, %edx
+; X64-NEXT: movl %edx, (%rcx)
 ; X64-NEXT: retq
   %x1 = add i32 %x, %x
   %y1 = load i32, ptr %y
   %x2 = zext i32 %x1 to i64
   %y2 = zext i32 %y1 to i64
   %r1 = mul i64 %x2, %y2
   %h1 = lshr i64 %r1, 32
   %h = trunc i64 %h1 to i32
   %l = trunc i64 %r1 to i32
   store i32 %h, ptr %p
   ret i32 %l
 }

llvm/test/CodeGen/X86/smul_fix.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86

 declare i4 @llvm.smul.fix.i4 (i4, i4, i32)
 declare i32 @llvm.smul.fix.i32 (i32, i32, i32)
 declare i64 @llvm.smul.fix.i64 (i64, i64, i32)
 declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32>, <4 x i32>, i32)

 define i32 @func(i32 %x, i32 %y) nounwind {
 ; X64-LABEL: func:
 ; X64: # %bb.0:
-; X64-NEXT: movslq %esi, %rax
-; X64-NEXT: movslq %edi, %rcx
-; X64-NEXT: imulq %rax, %rcx
-; X64-NEXT: movq %rcx, %rax
-; X64-NEXT: shrq $32, %rax
-; X64-NEXT: shldl $30, %ecx, %eax
-; X64-NEXT: # kill: def $eax killed $eax killed $rax
+; X64-NEXT: movl %edi, %eax
+; X64-NEXT: imull %esi
+; X64-NEXT: shrdl $2, %edx, %eax
 ; X64-NEXT: retq
 ;
 ; X86-LABEL: func:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: imull {{[0-9]+}}(%esp)
 ; X86-NEXT: shrdl $2, %edx, %eax
 ; X86-NEXT: retl
[... 64 lines not shown ...]
 ; X86-NEXT: retl
   %tmp = call i64 @llvm.smul.fix.i64(i64 %x, i64 %y, i32 2)
   ret i64 %tmp
 }

 define i4 @func3(i4 %x, i4 %y) nounwind {
 ; X64-LABEL: func3:
 ; X64: # %bb.0:
-; X64-NEXT: shlb $4, %dil
-; X64-NEXT: sarb $4, %dil
+; X64-NEXT: movl %edi, %eax
+; X64-NEXT: shlb $4, %al
+; X64-NEXT: sarb $4, %al
 ; X64-NEXT: shlb $4, %sil
 ; X64-NEXT: sarb $4, %sil
-; X64-NEXT: movsbl %sil, %ecx
-; X64-NEXT: movsbl %dil, %eax
-; X64-NEXT: imull %ecx, %eax
-; X64-NEXT: movl %eax, %ecx
-; X64-NEXT: shrb $2, %cl
-; X64-NEXT: shrl $8, %eax
-; X64-NEXT: shlb $6, %al
-; X64-NEXT: orb %cl, %al
 ; X64-NEXT: # kill: def $al killed $al killed $eax
+; X64-NEXT: imulb %sil
+; X64-NEXT: movb %ah, %cl
+; X64-NEXT: shrb $2, %al
+; X64-NEXT: shlb $6, %cl
+; X64-NEXT: orb %cl, %al
 ; X64-NEXT: retq
 ;
 ; X86-LABEL: func3:
 ; X86: # %bb.0:
 ; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: shlb $4, %al
 ; X86-NEXT: sarb $4, %al
 ; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: shlb $4, %cl
 ; X86-NEXT: sarb $4, %cl
-; X86-NEXT: movsbl %cl, %ecx
-; X86-NEXT: movsbl %al, %eax
-; X86-NEXT: imull %ecx, %eax
-; X86-NEXT: shlb $6, %ah
+; X86-NEXT: imulb %cl
 ; X86-NEXT: shrb $2, %al
+; X86-NEXT: shlb $6, %ah
 ; X86-NEXT: orb %ah, %al
-; X86-NEXT: # kill: def $al killed $al killed $eax
 ; X86-NEXT: retl
   %tmp = call i4 @llvm.smul.fix.i4(i4 %x, i4 %y, i32 2)
   ret i4 %tmp
 }

 define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
 ; X64-LABEL: vec:
 ; X64: # %bb.0:
[... 277 lines not shown ...]
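
The new X64 sequence for `func` (`imull` followed by `shrdl $2, %edx, %eax`) computes the same bits as the old widen-multiply-shift sequence. A small C++ sketch of that equivalence, under the assumption that the scale is between 1 and 31 (the function names are made up for this example, and results are returned as raw 32-bit patterns):

```cpp
#include <cassert>
#include <cstdint>

// Reference: widen, multiply, drop `Scale` fractional bits, truncate to 32 bits.
uint32_t smul_fix_wide(int32_t X, int32_t Y, unsigned Scale) {
  uint64_t P = static_cast<uint64_t>(static_cast<int64_t>(X) * Y);
  return static_cast<uint32_t>(P >> Scale);
}

// Lo/hi form: what `imull` plus `shrdl $Scale, %edx, %eax` computes.
// Assumes 0 < Scale < 32.
uint32_t smul_fix_lohi(int32_t X, int32_t Y, unsigned Scale) {
  uint64_t P = static_cast<uint64_t>(static_cast<int64_t>(X) * Y);
  uint32_t Lo = static_cast<uint32_t>(P);        // EAX after imull
  uint32_t Hi = static_cast<uint32_t>(P >> 32);  // EDX after imull
  return (Lo >> Scale) | (Hi << (32 - Scale));   // double-precision shift right
}

// Spot-check both forms against each other on boundary values.
void check_smul_fix_forms() {
  const int32_t Samples[] = {0, 1, -1, 6, 10, -8, 3, 65536, -65536,
                             INT32_MAX, INT32_MIN};
  const unsigned Scales[] = {1u, 2u, 31u};
  for (int32_t X : Samples)
    for (int32_t Y : Samples)
      for (unsigned S : Scales)
        assert(smul_fix_wide(X, Y, S) == smul_fix_lohi(X, Y, S));
}
```

With scale 2, as in the test, `smul_fix_lohi(6, 10, 2)` yields 15, and a negative product such as -8 * 3 keeps its sign bits because they are shifted in from the high half.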

llvm/test/CodeGen/X86/smul_fix_sat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-linux \| FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-linux \| FileCheck %s --check-prefix=X64
	; RUN: llc < %s -mtriple=i686 -mattr=cmov \| FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686 -mattr=cmov \| FileCheck %s --check-prefix=X86

	declare i4 @llvm.smul.fix.sat.i4 (i4, i4, i32)			declare i4 @llvm.smul.fix.sat.i4 (i4, i4, i32)
	declare i32 @llvm.smul.fix.sat.i32 (i32, i32, i32)			declare i32 @llvm.smul.fix.sat.i32 (i32, i32, i32)
	declare i64 @llvm.smul.fix.sat.i64 (i64, i64, i32)			declare i64 @llvm.smul.fix.sat.i64 (i64, i64, i32)
	declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)			declare <4 x i32> @llvm.smul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)

	define i32 @func(i32 %x, i32 %y) nounwind {			define i32 @func(i32 %x, i32 %y) nounwind {
	; X64-LABEL: func:			; X64-LABEL: func:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movslq %esi, %rax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: movslq %edi, %rcx			; X64-NEXT: imull %esi
	; X64-NEXT: imulq %rax, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: cmpl $2, %edx
	; X64-NEXT: shrq $32, %rax			; X64-NEXT: movl $2147483647, %ecx # imm = 0x7FFFFFFF
	; X64-NEXT: shrdl $2, %eax, %ecx			; X64-NEXT: cmovgel %ecx, %eax
	; X64-NEXT: cmpl $2, %eax			; X64-NEXT: cmpl $-2, %edx
	; X64-NEXT: movl $2147483647, %edx # imm = 0x7FFFFFFF			; X64-NEXT: movl $-2147483648, %ecx # imm = 0x80000000
	; X64-NEXT: cmovll %ecx, %edx			; X64-NEXT: cmovll %ecx, %eax
	; X64-NEXT: cmpl $-2, %eax
	; X64-NEXT: movl $-2147483648, %eax # imm = 0x80000000
	; X64-NEXT: cmovgel %edx, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: func:			; X86-LABEL: func:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: imull {{[0-9]+}}(%esp)			; X86-NEXT: imull {{[0-9]+}}(%esp)
	; X86-NEXT: shrdl $2, %edx, %eax			; X86-NEXT: shrdl $2, %edx, %eax
	; X86-NEXT: cmpl $2, %edx			; X86-NEXT: cmpl $2, %edx
	▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines
	; X86-NEXT: retl			; X86-NEXT: retl
	%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 2)			%tmp = call i64 @llvm.smul.fix.sat.i64(i64 %x, i64 %y, i32 2)
	ret i64 %tmp			ret i64 %tmp
	}			}

	define i4 @func3(i4 %x, i4 %y) nounwind {			define i4 @func3(i4 %x, i4 %y) nounwind {
	; X64-LABEL: func3:			; X64-LABEL: func3:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: movl %edi, %eax
	; X64-NEXT: shlb $4, %sil			; X64-NEXT: shlb $4, %sil
	; X64-NEXT: sarb $4, %sil			; X64-NEXT: sarb $4, %sil
	; X64-NEXT: shlb $4, %dil			; X64-NEXT: shlb $4, %al
	; X64-NEXT: movsbl %dil, %eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: movsbl %sil, %ecx			; X64-NEXT: imulb %sil
	; X64-NEXT: imull %eax, %ecx			; X64-NEXT: movb %ah, %cl
	; X64-NEXT: movl %ecx, %eax
	; X64-NEXT: shrb $2, %al			; X64-NEXT: shrb $2, %al
	; X64-NEXT: shrl $8, %ecx			; X64-NEXT: movb %ah, %dl
	; X64-NEXT: movl %ecx, %edx
	; X64-NEXT: shlb $6, %dl			; X64-NEXT: shlb $6, %dl
	; X64-NEXT: orb %al, %dl			; X64-NEXT: orb %al, %dl
	; X64-NEXT: movzbl %dl, %eax			; X64-NEXT: movzbl %dl, %eax
	; X64-NEXT: cmpb $2, %cl			; X64-NEXT: cmpb $2, %cl
	; X64-NEXT: movl $127, %edx			; X64-NEXT: movl $127, %edx
	; X64-NEXT: cmovll %eax, %edx			; X64-NEXT: cmovll %eax, %edx
	; X64-NEXT: cmpb $-2, %cl			; X64-NEXT: cmpb $-2, %cl
	; X64-NEXT: movl $128, %eax			; X64-NEXT: movl $128, %eax
	; X64-NEXT: cmovgel %edx, %eax			; X64-NEXT: cmovgel %edx, %eax
	; X64-NEXT: sarb $4, %al			; X64-NEXT: sarb $4, %al
	; X64-NEXT: # kill: def $al killed $al killed $eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: func3:			; X86-LABEL: func3:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: shlb $4, %al
	; X86-NEXT: sarb $4, %al
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: shlb $4, %cl			; X86-NEXT: shlb $4, %cl
	; X86-NEXT: movsbl %cl, %ecx			; X86-NEXT: sarb $4, %cl
	; X86-NEXT: movsbl %al, %eax			; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: imull %ecx, %eax			; X86-NEXT: shlb $4, %al
				; X86-NEXT: imulb %cl
				; X86-NEXT: shrb $2, %al
	; X86-NEXT: movb %ah, %cl			; X86-NEXT: movb %ah, %cl
	; X86-NEXT: shlb $6, %cl			; X86-NEXT: shlb $6, %cl
	; X86-NEXT: shrb $2, %al			; X86-NEXT: orb %al, %cl
	; X86-NEXT: orb %cl, %al			; X86-NEXT: movzbl %cl, %ecx
	; X86-NEXT: movzbl %al, %ecx
	; X86-NEXT: cmpb $2, %ah			; X86-NEXT: cmpb $2, %ah
	; X86-NEXT: movl $127, %edx			; X86-NEXT: movl $127, %edx
	; X86-NEXT: cmovll %ecx, %edx			; X86-NEXT: cmovll %ecx, %edx
	; X86-NEXT: cmpb $-2, %ah			; X86-NEXT: cmpb $-2, %ah
	; X86-NEXT: movl $128, %eax			; X86-NEXT: movl $128, %eax
	; X86-NEXT: cmovgel %edx, %eax			; X86-NEXT: cmovgel %edx, %eax
	; X86-NEXT: sarb $4, %al			; X86-NEXT: sarb $4, %al
	; X86-NEXT: # kill: def $al killed $al killed $eax			; X86-NEXT: # kill: def $al killed $al killed $eax
	; X86-NEXT: retl			; X86-NEXT: retl
	%tmp = call i4 @llvm.smul.fix.sat.i4(i4 %x, i4 %y, i32 2)			%tmp = call i4 @llvm.smul.fix.sat.i4(i4 %x, i4 %y, i32 2)
	ret i4 %tmp			ret i4 %tmp
	}			}

	define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {			define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
	; X64-LABEL: vec:			; X64-LABEL: vec:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,3,3,3]
	; X64-NEXT: movd %xmm2, %eax
	; X64-NEXT: cltq
	; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[3,3,3,3]			; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[3,3,3,3]
				; X64-NEXT: movd %xmm2, %eax
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,3,3,3]
	; X64-NEXT: movd %xmm2, %ecx			; X64-NEXT: movd %xmm2, %ecx
	; X64-NEXT: movslq %ecx, %rdx			; X64-NEXT: imull %ecx
	; X64-NEXT: imulq %rax, %rdx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: cmpl $2, %edx
	; X64-NEXT: shrq $32, %rcx			; X64-NEXT: movl $2147483647, %ecx # imm = 0x7FFFFFFF
	; X64-NEXT: shrdl $2, %ecx, %edx			; X64-NEXT: cmovgel %ecx, %eax
	; X64-NEXT: cmpl $2, %ecx			; X64-NEXT: cmpl $-2, %edx
	; X64-NEXT: movl $2147483647, %eax # imm = 0x7FFFFFFF			; X64-NEXT: movl $-2147483648, %esi # imm = 0x80000000
	; X64-NEXT: cmovgel %eax, %edx			; X64-NEXT: cmovll %esi, %eax
	; X64-NEXT: cmpl $-2, %ecx			; X64-NEXT: movd %eax, %xmm2
	; X64-NEXT: movl $-2147483648, %ecx # imm = 0x80000000			; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm0[2,3,2,3]
	; X64-NEXT: cmovll %ecx, %edx			; X64-NEXT: movd %xmm3, %eax
	; X64-NEXT: movd %edx, %xmm2
	; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,2,3]			; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,2,3]
	; X64-NEXT: movd %xmm3, %edx			; X64-NEXT: movd %xmm3, %edx
	; X64-NEXT: movslq %edx, %rdx			; X64-NEXT: imull %edx
	; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm0[2,3,2,3]			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movd %xmm3, %esi
	; X64-NEXT: movslq %esi, %rsi
	; X64-NEXT: imulq %rdx, %rsi
	; X64-NEXT: movq %rsi, %rdx
	; X64-NEXT: shrq $32, %rdx
	; X64-NEXT: shrdl $2, %edx, %esi
	; X64-NEXT: cmpl $2, %edx			; X64-NEXT: cmpl $2, %edx
	; X64-NEXT: cmovgel %eax, %esi			; X64-NEXT: cmovgel %ecx, %eax
	; X64-NEXT: cmpl $-2, %edx			; X64-NEXT: cmpl $-2, %edx
	; X64-NEXT: cmovll %ecx, %esi			; X64-NEXT: cmovll %esi, %eax
	; X64-NEXT: movd %esi, %xmm3			; X64-NEXT: movd %eax, %xmm3
	; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]			; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
				; X64-NEXT: movd %xmm0, %eax
	; X64-NEXT: movd %xmm1, %edx			; X64-NEXT: movd %xmm1, %edx
	; X64-NEXT: movslq %edx, %rdx			; X64-NEXT: imull %edx
	; X64-NEXT: movd %xmm0, %esi			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movslq %esi, %rsi
	; X64-NEXT: imulq %rdx, %rsi
	; X64-NEXT: movq %rsi, %rdx
	; X64-NEXT: shrq $32, %rdx
	; X64-NEXT: shrdl $2, %edx, %esi
	; X64-NEXT: cmpl $2, %edx			; X64-NEXT: cmpl $2, %edx
	; X64-NEXT: cmovgel %eax, %esi			; X64-NEXT: cmovgel %ecx, %eax
	; X64-NEXT: cmpl $-2, %edx			; X64-NEXT: cmpl $-2, %edx
	; X64-NEXT: cmovll %ecx, %esi			; X64-NEXT: cmovll %esi, %eax
	; X64-NEXT: movd %esi, %xmm2			; X64-NEXT: movd %eax, %xmm2
	; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,1,1]
	; X64-NEXT: movd %xmm1, %edx
	; X64-NEXT: movslq %edx, %rdx
	; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,1]			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,1]
	; X64-NEXT: movd %xmm0, %esi			; X64-NEXT: movd %xmm0, %eax
	; X64-NEXT: movslq %esi, %rsi			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,1,1]
	; X64-NEXT: imulq %rdx, %rsi			; X64-NEXT: movd %xmm0, %edx
	; X64-NEXT: movq %rsi, %rdx			; X64-NEXT: imull %edx
	; X64-NEXT: shrq $32, %rdx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: shrdl $2, %edx, %esi
	; X64-NEXT: cmpl $2, %edx			; X64-NEXT: cmpl $2, %edx
	; X64-NEXT: cmovgel %eax, %esi			; X64-NEXT: cmovgel %ecx, %eax
	; X64-NEXT: cmpl $-2, %edx			; X64-NEXT: cmpl $-2, %edx
	; X64-NEXT: cmovll %ecx, %esi			; X64-NEXT: cmovll %esi, %eax
	; X64-NEXT: movd %esi, %xmm0			; X64-NEXT: movd %eax, %xmm0
	; X64-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]			; X64-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
	; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]			; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
	; X64-NEXT: movdqa %xmm2, %xmm0			; X64-NEXT: movdqa %xmm2, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: vec:			; X86-LABEL: vec:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl %ebp			; X86-NEXT: pushl %ebp
	[507 lines not shown]

llvm/test/CodeGen/X86/umul_fix.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
	; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86

	declare i4 @llvm.umul.fix.i4 (i4, i4, i32)			declare i4 @llvm.umul.fix.i4 (i4, i4, i32)
	declare i32 @llvm.umul.fix.i32 (i32, i32, i32)			declare i32 @llvm.umul.fix.i32 (i32, i32, i32)
	declare i64 @llvm.umul.fix.i64 (i64, i64, i32)			declare i64 @llvm.umul.fix.i64 (i64, i64, i32)
	declare <4 x i32> @llvm.umul.fix.v4i32(<4 x i32>, <4 x i32>, i32)			declare <4 x i32> @llvm.umul.fix.v4i32(<4 x i32>, <4 x i32>, i32)

	define i32 @func(i32 %x, i32 %y) nounwind {			define i32 @func(i32 %x, i32 %y) nounwind {
	; X64-LABEL: func:			; X64-LABEL: func:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %esi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: movl %edi, %ecx			; X64-NEXT: mull %esi
	; X64-NEXT: imulq %rax, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movq %rcx, %rax
	; X64-NEXT: shrq $32, %rax
	; X64-NEXT: shldl $30, %ecx, %eax
	; X64-NEXT: # kill: def $eax killed $eax killed $rax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: func:			; X86-LABEL: func:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: mull {{[0-9]+}}(%esp)			; X86-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NEXT: shrdl $2, %edx, %eax			; X86-NEXT: shrdl $2, %edx, %eax
	; X86-NEXT: retl			; X86-NEXT: retl
	[361 lines not shown]
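
For readers comparing the two X64 columns above, the following is a rough Python model of what the new `mull` + `shrdl $2, %edx, %eax` sequence appears to compute for `@llvm.umul.fix.i32` with scale 2. The function names are illustrative only and do not come from the patch, and the truncation to 32 bits models the lowered code rather than the intrinsic's formal semantics.

```python
MASK32 = 0xFFFFFFFF

def umul_fix_i32(x, y, scale):
    """Reference model: full 64-bit product, shifted right by scale, truncated to 32 bits."""
    product = (x & MASK32) * (y & MASK32)
    return (product >> scale) & MASK32

def mul_then_shrd(x, y, scale):
    """Model of `mull` (EDX:EAX = product) followed by `shrdl $scale, %edx, %eax`.

    Assumes 0 < scale < 32, as in the tests above (scale = 2).
    """
    product = (x & MASK32) * (y & MASK32)
    lo = product & MASK32   # EAX after mull
    hi = product >> 32      # EDX after mull
    # shrd shifts lo right and fills the vacated high bits from the low bits of hi
    return ((lo >> scale) | (hi << (32 - scale))) & MASK32
```

Both functions should agree for any pair of 32-bit inputs, which is why the 64-bit `imulq`/`shrq`/`shldl` sequence can be replaced by the 32-bit widening multiply.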

llvm/test/CodeGen/X86/umul_fix_sat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
	; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86

	declare i4 @llvm.umul.fix.sat.i4 (i4, i4, i32)			declare i4 @llvm.umul.fix.sat.i4 (i4, i4, i32)
	declare i32 @llvm.umul.fix.sat.i32 (i32, i32, i32)			declare i32 @llvm.umul.fix.sat.i32 (i32, i32, i32)
	declare i64 @llvm.umul.fix.sat.i64 (i64, i64, i32)			declare i64 @llvm.umul.fix.sat.i64 (i64, i64, i32)
	declare <4 x i32> @llvm.umul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)			declare <4 x i32> @llvm.umul.fix.sat.v4i32(<4 x i32>, <4 x i32>, i32)

	define i32 @func(i32 %x, i32 %y) nounwind {			define i32 @func(i32 %x, i32 %y) nounwind {
	; X64-LABEL: func:			; X64-LABEL: func:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movl %esi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: movl %edi, %ecx			; X64-NEXT: mull %esi
	; X64-NEXT: imulq %rax, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: cmpl $4, %edx
	; X64-NEXT: shrq $32, %rax			; X64-NEXT: movl $-1, %ecx
	; X64-NEXT: shrdl $2, %eax, %ecx			; X64-NEXT: cmovael %ecx, %eax
	; X64-NEXT: cmpl $4, %eax
	; X64-NEXT: movl $-1, %eax
	; X64-NEXT: cmovbl %ecx, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: func:			; X86-LABEL: func:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: mull {{[0-9]+}}(%esp)			; X86-NEXT: mull {{[0-9]+}}(%esp)
	; X86-NEXT: shrdl $2, %edx, %eax			; X86-NEXT: shrdl $2, %edx, %eax
	; X86-NEXT: cmpl $4, %edx			; X86-NEXT: cmpl $4, %edx
	[58 lines not shown]
	; X86-NEXT: retl			; X86-NEXT: retl
	%tmp = call i64 @llvm.umul.fix.sat.i64(i64 %x, i64 %y, i32 2)			%tmp = call i64 @llvm.umul.fix.sat.i64(i64 %x, i64 %y, i32 2)
	ret i64 %tmp			ret i64 %tmp
	}			}

	define i4 @func3(i4 %x, i4 %y) nounwind {			define i4 @func3(i4 %x, i4 %y) nounwind {
	; X64-LABEL: func3:			; X64-LABEL: func3:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: andl $15, %esi			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: shlb $4, %dil			; X64-NEXT: andb $15, %sil
	; X64-NEXT: movzbl %dil, %eax			; X64-NEXT: shlb $4, %al
	; X64-NEXT: imull %esi, %eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: movl %eax, %ecx			; X64-NEXT: mulb %sil
	; X64-NEXT: shrb $2, %cl			; X64-NEXT: shrb $2, %al
	; X64-NEXT: shrl $8, %eax			; X64-NEXT: movb %ah, %dl
	; X64-NEXT: movl %eax, %edx
	; X64-NEXT: shlb $6, %dl			; X64-NEXT: shlb $6, %dl
	; X64-NEXT: orb %cl, %dl			; X64-NEXT: orb %al, %dl
	; X64-NEXT: movzbl %dl, %ecx			; X64-NEXT: movzbl %dl, %edx
	; X64-NEXT: cmpb $4, %al			; X64-NEXT: cmpb $4, %ah
	; X64-NEXT: movl $255, %eax			; X64-NEXT: movl $255, %eax
	; X64-NEXT: cmovbl %ecx, %eax			; X64-NEXT: cmovbl %edx, %eax
	; X64-NEXT: shrb $4, %al			; X64-NEXT: shrb $4, %al
	; X64-NEXT: # kill: def $al killed $al killed $eax			; X64-NEXT: # kill: def $al killed $al killed $eax
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: func3:			; X86-LABEL: func3:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: andb $15, %al
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movzbl %al, %edx			; X86-NEXT: andb $15, %cl
	; X86-NEXT: shlb $4, %cl			; X86-NEXT: movzbl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: movzbl %cl, %eax			; X86-NEXT: shlb $4, %al
	; X86-NEXT: imull %edx, %eax			; X86-NEXT: mulb %cl
				; X86-NEXT: shrb $2, %al
	; X86-NEXT: movb %ah, %cl			; X86-NEXT: movb %ah, %cl
	; X86-NEXT: shlb $6, %cl			; X86-NEXT: shlb $6, %cl
	; X86-NEXT: shrb $2, %al			; X86-NEXT: orb %al, %cl
	; X86-NEXT: orb %cl, %al			; X86-NEXT: movzbl %cl, %ecx
	; X86-NEXT: movzbl %al, %ecx
	; X86-NEXT: cmpb $4, %ah			; X86-NEXT: cmpb $4, %ah
	; X86-NEXT: movl $255, %eax			; X86-NEXT: movl $255, %eax
	; X86-NEXT: cmovbl %ecx, %eax			; X86-NEXT: cmovbl %ecx, %eax
	; X86-NEXT: shrb $4, %al			; X86-NEXT: shrb $4, %al
	; X86-NEXT: # kill: def $al killed $al killed $eax			; X86-NEXT: # kill: def $al killed $al killed $eax
	; X86-NEXT: retl			; X86-NEXT: retl
	%tmp = call i4 @llvm.umul.fix.sat.i4(i4 %x, i4 %y, i32 2)			%tmp = call i4 @llvm.umul.fix.sat.i4(i4 %x, i4 %y, i32 2)
	ret i4 %tmp			ret i4 %tmp
	}			}

	define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {			define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
	; X64-LABEL: vec:			; X64-LABEL: vec:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,3,3,3]
	; X64-NEXT: movd %xmm2, %eax
	; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[3,3,3,3]			; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[3,3,3,3]
				; X64-NEXT: movd %xmm2, %eax
				; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[3,3,3,3]
	; X64-NEXT: movd %xmm2, %ecx			; X64-NEXT: movd %xmm2, %ecx
	; X64-NEXT: imulq %rax, %rcx			; X64-NEXT: mull %ecx
	; X64-NEXT: movq %rcx, %rax			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: shrq $32, %rax			; X64-NEXT: cmpl $4, %edx
	; X64-NEXT: shrdl $2, %eax, %ecx			; X64-NEXT: movl $-1, %ecx
	; X64-NEXT: cmpl $4, %eax			; X64-NEXT: cmovael %ecx, %eax
	; X64-NEXT: movl $-1, %eax			; X64-NEXT: movd %eax, %xmm2
	; X64-NEXT: cmovael %eax, %ecx
	; X64-NEXT: movd %ecx, %xmm2
	; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,2,3]
	; X64-NEXT: movd %xmm3, %ecx
	; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm0[2,3,2,3]			; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm0[2,3,2,3]
				; X64-NEXT: movd %xmm3, %eax
				; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,2,3]
	; X64-NEXT: movd %xmm3, %edx			; X64-NEXT: movd %xmm3, %edx
	; X64-NEXT: imulq %rcx, %rdx			; X64-NEXT: mull %edx
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: shrq $32, %rcx			; X64-NEXT: cmpl $4, %edx
	; X64-NEXT: shrdl $2, %ecx, %edx			; X64-NEXT: cmovael %ecx, %eax
	; X64-NEXT: cmpl $4, %ecx			; X64-NEXT: movd %eax, %xmm3
	; X64-NEXT: cmovael %eax, %edx
	; X64-NEXT: movd %edx, %xmm3
	; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]			; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
	; X64-NEXT: movd %xmm1, %ecx			; X64-NEXT: movd %xmm0, %eax
	; X64-NEXT: movd %xmm0, %edx			; X64-NEXT: movd %xmm1, %edx
	; X64-NEXT: imulq %rcx, %rdx			; X64-NEXT: mull %edx
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: shrq $32, %rcx			; X64-NEXT: cmpl $4, %edx
	; X64-NEXT: shrdl $2, %ecx, %edx			; X64-NEXT: cmovael %ecx, %eax
	; X64-NEXT: cmpl $4, %ecx			; X64-NEXT: movd %eax, %xmm2
	; X64-NEXT: cmovael %eax, %edx
	; X64-NEXT: movd %edx, %xmm2
	; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,1,1]
	; X64-NEXT: movd %xmm1, %ecx
	; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,1]			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,1,1]
				; X64-NEXT: movd %xmm0, %eax
				; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,1,1]
	; X64-NEXT: movd %xmm0, %edx			; X64-NEXT: movd %xmm0, %edx
	; X64-NEXT: imulq %rcx, %rdx			; X64-NEXT: mull %edx
	; X64-NEXT: movq %rdx, %rcx			; X64-NEXT: shrdl $2, %edx, %eax
	; X64-NEXT: shrq $32, %rcx			; X64-NEXT: cmpl $4, %edx
	; X64-NEXT: shrdl $2, %ecx, %edx			; X64-NEXT: cmovael %ecx, %eax
	; X64-NEXT: cmpl $4, %ecx			; X64-NEXT: movd %eax, %xmm0
	; X64-NEXT: cmovael %eax, %edx
	; X64-NEXT: movd %edx, %xmm0
	; X64-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]			; X64-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
	; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]			; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
	; X64-NEXT: movdqa %xmm2, %xmm0			; X64-NEXT: movdqa %xmm2, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X86-LABEL: vec:			; X86-LABEL: vec:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: pushl %ebp			; X86-NEXT: pushl %ebp
	[344 lines not shown]
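
As a reading aid for the saturating i32 checks above, here is a small Python sketch of what the new X64 sequence (`mull`, `shrdl $2`, `cmpl $4, %edx`, `cmovael`) seems to implement for `@llvm.umul.fix.sat.i32` with scale 2. The helper name is hypothetical and the model is inferred from the assembly shown, not taken from the patch itself.

```python
MASK32 = 0xFFFFFFFF

def umul_fix_sat_i32(x, y, scale):
    """Model of mull + shrdl + cmpl/cmovael; assumes 0 < scale < 32."""
    product = (x & MASK32) * (y & MASK32)
    lo = product & MASK32   # EAX after mull
    hi = product >> 32      # EDX after mull
    shifted = ((lo >> scale) | (hi << (32 - scale))) & MASK32
    # If hi >= 2**scale, the shifted product no longer fits in 32 bits,
    # so the result saturates to the unsigned maximum (the `movl $-1` value).
    return MASK32 if hi >= (1 << scale) else shifted
```

With scale 2 the threshold `1 << scale` is 4, which matches the `cmpl $4, %edx` in the checks; the result equals `min(product >> scale, 0xFFFFFFFF)`.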

llvm/test/CodeGen/X86/vector-mulfix-legalize.ll

	[37 lines not shown]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t = call <4 x i16> @llvm.umul.fix.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)			%t = call <4 x i16> @llvm.umul.fix.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)
	ret <4 x i16> %t			ret <4 x i16> %t
	}			}

	define <4 x i16> @smulfixsat(<4 x i16> %a) {			define <4 x i16> @smulfixsat(<4 x i16> %a) {
	; CHECK-LABEL: smulfixsat:			; CHECK-LABEL: smulfixsat:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: pextrw $2, %xmm0, %eax			; CHECK-NEXT: pextrw $1, %xmm0, %eax
	; CHECK-NEXT: cwtl			; CHECK-NEXT: movw $2, %cx
	; CHECK-NEXT: leal (%rax,%rax,2), %ecx			; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
	; CHECK-NEXT: movl %ecx, %edx			; CHECK-NEXT: imulw %cx
	; CHECK-NEXT: shrl $16, %edx			; CHECK-NEXT: movl %eax, %ecx
	; CHECK-NEXT: shldw $1, %cx, %dx			; CHECK-NEXT: shrdw $15, %dx, %cx
	; CHECK-NEXT: sarl $16, %ecx			; CHECK-NEXT: movswl %dx, %eax
	; CHECK-NEXT: cmpl $16384, %ecx # imm = 0x4000			; CHECK-NEXT: cmpl $16384, %eax # imm = 0x4000
	; CHECK-NEXT: movl $32767, %eax # imm = 0x7FFF			; CHECK-NEXT: movl $32767, %esi # imm = 0x7FFF
	; CHECK-NEXT: cmovgel %eax, %edx			; CHECK-NEXT: cmovgel %esi, %ecx
	; CHECK-NEXT: cmpl $-16384, %ecx # imm = 0xC000			; CHECK-NEXT: cmpl $-16384, %eax # imm = 0xC000
	; CHECK-NEXT: movl $32768, %ecx # imm = 0x8000			; CHECK-NEXT: movl $32768, %edi # imm = 0x8000
	; CHECK-NEXT: cmovll %ecx, %edx			; CHECK-NEXT: cmovll %edi, %ecx
	; CHECK-NEXT: pextrw $1, %xmm0, %esi			; CHECK-NEXT: movd %xmm0, %eax
	; CHECK-NEXT: leal (%rsi,%rsi), %edi			; CHECK-NEXT: movw $1, %dx
	; CHECK-NEXT: movswl %si, %r8d			; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
	; CHECK-NEXT: movl %r8d, %esi			; CHECK-NEXT: imulw %dx
	; CHECK-NEXT: shrl $16, %esi			; CHECK-NEXT: # kill: def $ax killed $ax def $eax
	; CHECK-NEXT: shldw $1, %di, %si			; CHECK-NEXT: shrdw $15, %dx, %ax
	; CHECK-NEXT: sarl $16, %r8d
	; CHECK-NEXT: cmpl $16384, %r8d # imm = 0x4000
	; CHECK-NEXT: cmovgel %eax, %esi
	; CHECK-NEXT: cmpl $-16384, %r8d # imm = 0xC000
	; CHECK-NEXT: cmovll %ecx, %esi
	; CHECK-NEXT: movd %xmm0, %edi
	; CHECK-NEXT: movswl %di, %edi
	; CHECK-NEXT: movl %edi, %r8d
	; CHECK-NEXT: shrl $16, %r8d
	; CHECK-NEXT: shldw $1, %di, %r8w
	; CHECK-NEXT: sarl $16, %edi
	; CHECK-NEXT: cmpl $16384, %edi # imm = 0x4000
	; CHECK-NEXT: cmovgel %eax, %r8d
	; CHECK-NEXT: cmpl $-16384, %edi # imm = 0xC000
	; CHECK-NEXT: cmovll %ecx, %r8d
	; CHECK-NEXT: movzwl %r8w, %edi
	; CHECK-NEXT: movd %edi, %xmm1
	; CHECK-NEXT: pinsrw $1, %esi, %xmm1
	; CHECK-NEXT: pinsrw $2, %edx, %xmm1
	; CHECK-NEXT: pextrw $3, %xmm0, %edx
	; CHECK-NEXT: movswl %dx, %edx			; CHECK-NEXT: movswl %dx, %edx
	; CHECK-NEXT: leal (,%rdx,4), %esi
	; CHECK-NEXT: movl %esi, %edi
	; CHECK-NEXT: shrl $16, %edi
	; CHECK-NEXT: shldw $1, %si, %di
	; CHECK-NEXT: sarl $14, %edx
	; CHECK-NEXT: cmpl $16384, %edx # imm = 0x4000			; CHECK-NEXT: cmpl $16384, %edx # imm = 0x4000
	; CHECK-NEXT: cmovgel %eax, %edi			; CHECK-NEXT: cmovgel %esi, %eax
	; CHECK-NEXT: cmpl $-16384, %edx # imm = 0xC000			; CHECK-NEXT: cmpl $-16384, %edx # imm = 0xC000
	; CHECK-NEXT: cmovll %ecx, %edi			; CHECK-NEXT: cmovll %edi, %eax
	; CHECK-NEXT: pinsrw $3, %edi, %xmm1			; CHECK-NEXT: movzwl %ax, %eax
				; CHECK-NEXT: movd %eax, %xmm1
				; CHECK-NEXT: pinsrw $1, %ecx, %xmm1
				; CHECK-NEXT: pextrw $2, %xmm0, %eax
				; CHECK-NEXT: movw $3, %cx
				; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
				; CHECK-NEXT: imulw %cx
				; CHECK-NEXT: # kill: def $ax killed $ax def $eax
				; CHECK-NEXT: shrdw $15, %dx, %ax
				; CHECK-NEXT: movswl %dx, %ecx
				; CHECK-NEXT: cmpl $16384, %ecx # imm = 0x4000
				; CHECK-NEXT: cmovgel %esi, %eax
				; CHECK-NEXT: cmpl $-16384, %ecx # imm = 0xC000
				; CHECK-NEXT: cmovll %edi, %eax
				; CHECK-NEXT: pinsrw $2, %eax, %xmm1
				; CHECK-NEXT: pextrw $3, %xmm0, %eax
				; CHECK-NEXT: movw $4, %cx
				; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
				; CHECK-NEXT: imulw %cx
				; CHECK-NEXT: # kill: def $ax killed $ax def $eax
				; CHECK-NEXT: shrdw $15, %dx, %ax
				; CHECK-NEXT: movswl %dx, %ecx
				; CHECK-NEXT: cmpl $16384, %ecx # imm = 0x4000
				; CHECK-NEXT: cmovgel %esi, %eax
				; CHECK-NEXT: cmpl $-16384, %ecx # imm = 0xC000
				; CHECK-NEXT: cmovll %edi, %eax
				; CHECK-NEXT: pinsrw $3, %eax, %xmm1
	; CHECK-NEXT: movdqa %xmm1, %xmm0			; CHECK-NEXT: movdqa %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t = call <4 x i16> @llvm.smul.fix.sat.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)			%t = call <4 x i16> @llvm.smul.fix.sat.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)
	ret <4 x i16> %t			ret <4 x i16> %t
	}			}


	define <4 x i16> @umulfixsat(<4 x i16> %a) {			define <4 x i16> @umulfixsat(<4 x i16> %a) {
	; CHECK-LABEL: umulfixsat:			; CHECK-LABEL: umulfixsat:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: pextrw $2, %xmm0, %eax			; CHECK-NEXT: pextrw $2, %xmm0, %eax
	; CHECK-NEXT: leal (%rax,%rax,2), %eax			; CHECK-NEXT: movw $3, %cx
	; CHECK-NEXT: movl %eax, %edx			; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
	; CHECK-NEXT: shrl $16, %edx			; CHECK-NEXT: mulw %cx
	; CHECK-NEXT: movl %edx, %ecx			; CHECK-NEXT: movl %eax, %ecx
	; CHECK-NEXT: shldw $1, %ax, %cx			; CHECK-NEXT: shrdw $15, %dx, %cx
				; CHECK-NEXT: movzwl %dx, %eax
				; CHECK-NEXT: cmpl $32768, %eax # imm = 0x8000
				; CHECK-NEXT: movl $65535, %esi # imm = 0xFFFF
				; CHECK-NEXT: cmovael %esi, %ecx
				; CHECK-NEXT: pextrw $1, %xmm0, %eax
				; CHECK-NEXT: movw $2, %dx
				; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
				; CHECK-NEXT: mulw %dx
				; CHECK-NEXT: # kill: def $ax killed $ax def $eax
				; CHECK-NEXT: shrdw $15, %dx, %ax
				; CHECK-NEXT: movzwl %dx, %edx
	; CHECK-NEXT: cmpl $32768, %edx # imm = 0x8000			; CHECK-NEXT: cmpl $32768, %edx # imm = 0x8000
	; CHECK-NEXT: movl $65535, %eax # imm = 0xFFFF			; CHECK-NEXT: cmovael %esi, %eax
	; CHECK-NEXT: cmovael %eax, %ecx
	; CHECK-NEXT: pextrw $1, %xmm0, %edx
	; CHECK-NEXT: addl %edx, %edx
	; CHECK-NEXT: movl %edx, %esi
	; CHECK-NEXT: shrl $16, %esi
	; CHECK-NEXT: movl %esi, %edi
	; CHECK-NEXT: shldw $1, %dx, %di
	; CHECK-NEXT: cmpl $32768, %esi # imm = 0x8000
	; CHECK-NEXT: cmovael %eax, %edi
	; CHECK-NEXT: movd %xmm0, %edx			; CHECK-NEXT: movd %xmm0, %edx
	; CHECK-NEXT: xorl %esi, %esi			; CHECK-NEXT: xorl %edi, %edi
	; CHECK-NEXT: shldw $1, %dx, %si			; CHECK-NEXT: shldw $1, %dx, %di
	; CHECK-NEXT: movl $32768, %edx # imm = 0x8000			; CHECK-NEXT: movl $32768, %edx # imm = 0x8000
	; CHECK-NEXT: negl %edx			; CHECK-NEXT: negl %edx
	; CHECK-NEXT: cmovael %eax, %esi			; CHECK-NEXT: cmovael %esi, %edi
	; CHECK-NEXT: movzwl %si, %edx			; CHECK-NEXT: movzwl %di, %edx
	; CHECK-NEXT: movd %edx, %xmm1			; CHECK-NEXT: movd %edx, %xmm1
	; CHECK-NEXT: pinsrw $1, %edi, %xmm1			; CHECK-NEXT: pinsrw $1, %eax, %xmm1
	; CHECK-NEXT: pinsrw $2, %ecx, %xmm1			; CHECK-NEXT: pinsrw $2, %ecx, %xmm1
	; CHECK-NEXT: pextrw $3, %xmm0, %ecx			; CHECK-NEXT: pextrw $3, %xmm0, %eax
	; CHECK-NEXT: shll $2, %ecx			; CHECK-NEXT: movw $4, %cx
	; CHECK-NEXT: movl %ecx, %edx			; CHECK-NEXT: # kill: def $ax killed $ax killed $eax
	; CHECK-NEXT: shrl $16, %edx			; CHECK-NEXT: mulw %cx
	; CHECK-NEXT: movl %edx, %esi			; CHECK-NEXT: # kill: def $ax killed $ax def $eax
	; CHECK-NEXT: shldw $1, %cx, %si			; CHECK-NEXT: shrdw $15, %dx, %ax
	; CHECK-NEXT: cmpl $32768, %edx # imm = 0x8000			; CHECK-NEXT: movzwl %dx, %ecx
	; CHECK-NEXT: cmovael %eax, %esi			; CHECK-NEXT: cmpl $32768, %ecx # imm = 0x8000
	; CHECK-NEXT: pinsrw $3, %esi, %xmm1			; CHECK-NEXT: cmovael %esi, %eax
				; CHECK-NEXT: pinsrw $3, %eax, %xmm1
	; CHECK-NEXT: movdqa %xmm1, %xmm0			; CHECK-NEXT: movdqa %xmm1, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%t = call <4 x i16> @llvm.umul.fix.sat.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)			%t = call <4 x i16> @llvm.umul.fix.sat.v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>, <4 x i16> %a, i32 15)
	ret <4 x i16> %t			ret <4 x i16> %t
	}			}