This is an archive of the discontinued LLVM Phabricator instance.


Differential D61830

[X86][SSE] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
Closed · Public

Authored by RKSimon on May 11 2019, 1:18 PM.


Details

Reviewers

andreadb
spatel
lebedev.ri
craig.topper

Commits

rZORG3e131336d3a7: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…
rZORGa915fd49484e: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…
rG3e131336d3a7: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…
rGa915fd49484e: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…
rGc2d9cfd9250d: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…
rL360684: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD…

Summary

D61068 handled vector shifts. This patch does the same for scalar shifts, on targets that have a similar number of pipes for shifts as for bit ops.

In practice this is almost entirely AMD targets, where the scalar ALUs are well balanced.

Avoiding this combine also avoids the AND immediate mask, which usually means we reduce encoding size.

Some tests show a (slow, scaled) LEA being used instead of SHL, but that's due to the particular shift immediates; the shift+mask sequences generate these just as easily.
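For illustration, a minimal C++ sketch of the two equivalent forms discussed above (the function names are hypothetical; this is not LLVM code). It uses 8-bit values like the i8 tests in shift-mask.ll and checks by brute force that the shift pair and the shift+mask form agree. The mask is simply the all-ones byte pushed through the same two shifts, which gives 0xE0 (-32) for lshr 3 / shl 5 and 0x38 (56) for lshr 5 / shl 3, the same immediates that appear in the X86 check lines.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM code. Both compute (x >> c1) << c2 on an
// 8-bit value, for shift amounts c1, c2 in [0, 7].

// Form kept on the AMD targets after this patch: a right/left shift pair.
std::uint8_t shiftPair(std::uint8_t x, unsigned c1, unsigned c2) {
  const std::uint8_t right = static_cast<std::uint8_t>(x >> c1);
  return static_cast<std::uint8_t>(right << c2);  // truncate to 8 bits
}

// Form produced by the generic fold: a single shift plus an AND mask.
std::uint8_t shiftAndMask(std::uint8_t x, unsigned c1, unsigned c2) {
  // All-ones byte pushed through the same shift pair,
  // e.g. 0xE0 for c1=3, c2=5 and 0x38 for c1=5, c2=3.
  const unsigned mask = static_cast<std::uint8_t>((0xFFu >> c1) << c2);
  const unsigned shifted = (c2 >= c1) ? (unsigned{x} << (c2 - c1))
                                      : (unsigned{x} >> (c1 - c2));
  return static_cast<std::uint8_t>(shifted & mask);
}

// Brute-force check that both forms agree for every byte value.
bool formsAgreeForAllBytes(unsigned c1, unsigned c2) {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    const std::uint8_t x = static_cast<std::uint8_t>(v);
    if (shiftPair(x, c1, c2) != shiftAndMask(x, c1, c2))
      return false;
  }
  return true;
}
```

The choice between the two is purely a cost question: both are correct, and the hook only decides which one instruction selection sees.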

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. May 11 2019, 1:18 PM

Herald added a project: Restricted Project. May 11 2019, 1:18 PM

RKSimon mentioned this in rL360533: [X86] Updated shift-mask test targets for D61830. May 11 2019, 1:29 PM

RKSimon mentioned this in rG3fa632a11236: [X86] Updated shift-mask test targets for D61830.

rebase

lebedev.ri added inline comments. May 13 2019, 2:12 PM

lib/Target/X86/X86.td
Line 430 (on Diff #199158): This is talking about 'scalar', but the hook is '..Constant..'. So this target feature is not specifically about fast shifts by immediate?

RKSimon marked an inline comment as done. May 14 2019, 1:49 AM

RKSimon added inline comments.

lib/Target/X86/X86.td
Line 430 (on Diff #199158): The scalar/vector feature flags indicate that shifts have a similar throughput to ANDs.

LG
I'm not sure why the [SSE] tag is in the subject; should it be [AMD]?

Also, do we want to unfold and (shift x, c1), c2 / shift (and x, c1), c2?
I'm not sure whether that belongs under !TLI.shouldFoldConstantShiftPairToMask() or under a new shouldFoldMaskToConstantShiftPair().
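Regarding the unfolding question above, a hypothetical sketch (not an existing LLVM hook or API; all names are made up for illustration) of the shape check such a transform would need for the and (shl x, s), m and and (srl x, s), m forms on 32-bit values. Each helper recovers a (c1, c2) pair such that (x >> c1) << c2 computes the same value for every x, or reports that the mask is not a suitable contiguous run of ones. The shift (and x, c1), c2 form is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helpers, not an LLVM API. Each returns (c1, c2) such that
// (x >> c1) << c2 equals the given shift+mask expression for every 32-bit x,
// or std::nullopt if the mask has the wrong shape.
using ShiftPair = std::pair<unsigned, unsigned>;
constexpr std::uint32_t AllOnes = ~std::uint32_t{0};

// Index of the lowest set bit; Bits must be non-zero.
unsigned lowestSetBit(std::uint32_t Bits) {
  unsigned K = 0;
  while (((Bits >> K) & 1u) == 0)
    ++K;
  return K;
}

// and (shl x, S), M  ->  (x >> (K - S)) << K
std::optional<ShiftPair> unfoldShlAndMask(unsigned S, std::uint32_t M) {
  if (S >= 32)
    return std::nullopt;
  // shl already clears the low S bits, so only these mask bits matter.
  const std::uint32_t Eff = M & (AllOnes << S);
  if (Eff == 0)
    return std::nullopt;
  const unsigned K = lowestSetBit(Eff);  // K >= S by construction
  // The shift pair keeps exactly bits [K, 32) here.
  if (Eff != (AllOnes << K))
    return std::nullopt;
  return ShiftPair{K - S, K};
}

// and (srl x, S), M  ->  (x >> (S + K)) << K
std::optional<ShiftPair> unfoldSrlAndMask(unsigned S, std::uint32_t M) {
  if (S >= 32)
    return std::nullopt;
  // srl already clears the high S bits, so only these mask bits matter.
  const std::uint32_t Eff = M & (AllOnes >> S);
  if (Eff == 0)
    return std::nullopt;
  const unsigned K = lowestSetBit(Eff);
  // The shift pair keeps exactly bits [K, 32 - S) here.
  if (Eff != ((AllOnes >> S) & (AllOnes << K)))
    return std::nullopt;
  return ShiftPair{S + K, K};
}
```

For example, the i32 sequences in shift-mask.ll (shll $2 / andl $-32 and shrl $2 / andl $-8) map back to the lshr 3 / shl 5 and lshr 5 / shl 3 pairs they were folded from.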

This revision is now accepted and ready to land. May 14 2019, 2:18 AM

Closed by commit rL360684: [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD… (authored by RKSimon). May 14 2019, 8:19 AM

This revision was automatically updated to reflect the committed changes.

sidorovd mentioned this in rGee42baea1108: [X86] Updated shift-mask test targets for D61830. May 30 2019, 9:11 AM

sidorovd mentioned this in rGc7b44895f85e: [X86] Updated shift-mask test targets for D61830. May 30 2019, 10:11 AM

Revision Contents

Path                                              Size
llvm/trunk/lib/Target/X86/X86.td                  16 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp     9 lines
llvm/trunk/lib/Target/X86/X86Subtarget.h          4 lines
llvm/trunk/test/CodeGen/X86/shift-mask.ll         161 lines

Diff 199454

llvm/trunk/lib/Target/X86/X86.td

[... 421 unchanged lines hidden ...]
 // instructions if a CPU implements horizontal operations (introduced with
 // SSE3) with better latency/throughput than the alternative sequence.
 def FeatureFastHorizontalOps
 : SubtargetFeature<
 "fast-hops", "HasFastHorizontalOps", "true",
 "Prefer horizontal vector math instructions (haddp, phsub, etc.) over "
 "normal vector instructions with shuffles", [FeatureSSE3]>;

+def FeatureFastScalarShiftMasks
+: SubtargetFeature<
+"fast-scalar-shift-masks", "HasFastScalarShiftMasks", "true",
+"Prefer a left/right scalar logical shift pair over a shift+and pair">;
+
 def FeatureFastVectorShiftMasks
 : SubtargetFeature<
 "fast-vector-shift-masks", "HasFastVectorShiftMasks", "true",
 "Prefer a left/right vector logical shift pair over a shift+and pair">;

 // Merge branches using three-way conditional code.
 def FeatureMergeToThreeWayBranch : SubtargetFeature<"merge-to-threeway-branch",
 "ThreewayBranchProfitable", "true",
[... 341 unchanged lines hidden ...] in: list<SubtargetFeature> BtVer1InheritableFeatures = [FeatureX87,
 Feature64Bit,
 FeatureCMPXCHG16B,
 FeaturePRFCHW,
 FeatureLZCNT,
 FeaturePOPCNT,
 FeatureSlowSHLD,
 FeatureLAHFSAHF,
 FeatureFast15ByteNOP,
+FeatureFastScalarShiftMasks,
 FeatureFastVectorShiftMasks];
 list<SubtargetFeature> BtVer1Features = BtVer1InheritableFeatures;

 // Jaguar
 list<SubtargetFeature> BtVer2AdditionalFeatures = [FeatureAVX,
 FeatureAES,
 FeaturePCLMUL,
 FeatureBMI,
[... 25 unchanged lines hidden ...] in: list<SubtargetFeature> BdVer1InheritableFeatures = [FeatureX87,
 FeatureNOPL,
 FeatureLZCNT,
 FeaturePOPCNT,
 FeatureXSAVE,
 FeatureLWP,
 FeatureSlowSHLD,
 FeatureLAHFSAHF,
 FeatureFast11ByteNOP,
+FeatureFastScalarShiftMasks,
 FeatureBranchFusion];
 list<SubtargetFeature> BdVer1Features = BdVer1InheritableFeatures;

 // PileDriver
 list<SubtargetFeature> BdVer2AdditionalFeatures = [FeatureF16C,
 FeatureBMI,
 FeatureTBM,
 FeatureFMA,
[... 35 unchanged lines hidden ...] in: list<SubtargetFeature> ZNFeatures = [FeatureADX,
 FeatureFXSR,
 FeatureNOPL,
 FeatureFastLZCNT,
 FeatureLAHFSAHF,
 FeatureLZCNT,
 FeatureFastBEXTR,
 FeatureFast15ByteNOP,
 FeatureBranchFusion,
+FeatureFastScalarShiftMasks,
 FeatureMMX,
 FeatureMOVBE,
 FeatureMWAITX,
 FeaturePCLMUL,
 FeaturePOPCNT,
 FeaturePRFCHW,
 FeatureRDRAND,
 FeatureRDSEED,
[... 200 unchanged lines hidden ...] in: foreach P = ["athlon-4", "athlon-xp", "athlon-mp"] in {
 def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureCMOV,
 FeatureSSE1, Feature3DNowA, FeatureFXSR, FeatureNOPL,
 FeatureSlowSHLD]>;
 }

 foreach P = ["k8", "opteron", "athlon64", "athlon-fx"] in {
 def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B,
 FeatureSSE2, Feature3DNowA, FeatureFXSR, FeatureNOPL,
-Feature64Bit, FeatureSlowSHLD, FeatureCMOV]>;
+Feature64Bit, FeatureSlowSHLD, FeatureCMOV,
+FeatureFastScalarShiftMasks]>;
 }

 foreach P = ["k8-sse3", "opteron-sse3", "athlon64-sse3"] in {
 def : Proc<P, [FeatureX87, FeatureSlowUAMem16, FeatureCMPXCHG8B, FeatureSSE3,
 Feature3DNowA, FeatureFXSR, FeatureNOPL, FeatureCMPXCHG16B,
-FeatureSlowSHLD, FeatureCMOV, Feature64Bit]>;
+FeatureSlowSHLD, FeatureCMOV, Feature64Bit,
+FeatureFastScalarShiftMasks]>;
 }

 foreach P = ["amdfam10", "barcelona"] in {
 def : Proc<P, [FeatureX87, FeatureCMPXCHG8B, FeatureSSE4A, Feature3DNowA,
 FeatureFXSR, FeatureNOPL, FeatureCMPXCHG16B, FeatureLZCNT,
 FeaturePOPCNT, FeatureSlowSHLD, FeatureLAHFSAHF, FeatureCMOV,
-Feature64Bit]>;
+Feature64Bit, FeatureFastScalarShiftMasks]>;
 }

 // Bobcat
 def : Proc<"btver1", ProcessorFeatures.BtVer1Features>;
 // Jaguar
 def : ProcessorModel<"btver2", BtVer2Model, ProcessorFeatures.BtVer2Features>;

 // Bulldozer
[... 110 unchanged lines hidden ...]
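The new SubtargetFeature def ties the feature string "fast-scalar-shift-masks" to the HasFastScalarShiftMasks member, next to the existing vector flag. In LLVM that mapping is generated by TableGen; the toy C++ model below (hypothetical struct and function names, not the generated code) only illustrates how an -mattr-style string such as "+fast-scalar-shift-masks" would switch the two shift-mask flags on or off independently of the CPU default.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy model, not LLVM's generated subtarget code: the two flags touched by
// this patch, keyed by the feature strings defined in X86.td.
struct ShiftMaskFeatures {
  bool HasFastScalarShiftMasks = false;  // "fast-scalar-shift-masks"
  bool HasFastVectorShiftMasks = false;  // "fast-vector-shift-masks"
};

// Applies a comma-separated list such as
// "+fast-scalar-shift-masks,-fast-vector-shift-masks".
// Entries without a leading '+' or '-' and unknown names are ignored here.
ShiftMaskFeatures applyFeatureString(ShiftMaskFeatures F,
                                     const std::string &Attrs) {
  std::istringstream In(Attrs);
  std::string Item;
  while (std::getline(In, Item, ',')) {
    if (Item.size() < 2 || (Item[0] != '+' && Item[0] != '-'))
      continue;
    const bool Enable = Item[0] == '+';
    const std::string Name = Item.substr(1);
    if (Name == "fast-scalar-shift-masks")
      F.HasFastScalarShiftMasks = Enable;
    else if (Name == "fast-vector-shift-masks")
      F.HasFastVectorShiftMasks = Enable;
  }
  return F;
}
```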

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


[... 5,015 unchanged lines hidden ...]

 bool X86TargetLowering::shouldFoldConstantShiftPairToMask(
     const SDNode *N, CombineLevel Level) const {
   assert(((N->getOpcode() == ISD::SHL &&
            N->getOperand(0).getOpcode() == ISD::SRL) ||
           (N->getOpcode() == ISD::SRL &&
            N->getOperand(0).getOpcode() == ISD::SHL)) &&
          "Expected shift-shift mask");
-  if (Subtarget.hasFastVectorShiftMasks() && N->getValueType(0).isVector()) {
+  EVT VT = N->getValueType(0);
+  if ((Subtarget.hasFastVectorShiftMasks() && VT.isVector()) ||
+      (Subtarget.hasFastScalarShiftMasks() && !VT.isVector())) {
     // Only fold if the shift values are equal - so it folds to AND.
-    // TODO - we should fold if either is non-uniform but we don't do the
-    // fold for non-splats yet.
+    // TODO - we should fold if either is a non-uniform vector but we don't do
+    // the fold for non-splats yet.
     return N->getOperand(1) == N->getOperand(0).getOperand(1);
   }
   return TargetLoweringBase::shouldFoldConstantShiftPairToMask(N, Level);
 }

 bool X86TargetLowering::shouldFoldMaskToVariableShiftPair(SDValue Y) const {
   EVT VT = Y.getValueType();

[... 39,516 unchanged lines hidden ...]
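A standalone C++ model of the post-patch decision in X86TargetLowering::shouldFoldConstantShiftPairToMask, with the SDNode and Subtarget queries replaced by plain booleans. The struct and function names are hypothetical, and genericDefault() is an assumed stand-in for the TargetLoweringBase default, taken here to return true (the generic combine folds unless a target opts out).

```cpp
#include <cassert>

// Simplified, hypothetical model of the X86 override; not the LLVM API.
struct ShiftPairQuery {
  bool IsVector;         // stands in for VT.isVector()
  bool SameShiftAmount;  // N->getOperand(1) == N->getOperand(0).getOperand(1)
};

struct ShiftMaskFlags {
  bool FastVectorShiftMasks;  // stands in for hasFastVectorShiftMasks()
  bool FastScalarShiftMasks;  // stands in for hasFastScalarShiftMasks()
};

// Assumed stand-in for TargetLoweringBase::shouldFoldConstantShiftPairToMask.
bool genericDefault() { return true; }

// true:  let the DAG combiner fold the shift pair into shift+and.
// false: keep the shift pair.
bool shouldFoldShiftPairToMask(const ShiftPairQuery &Q,
                               const ShiftMaskFlags &ST) {
  if ((ST.FastVectorShiftMasks && Q.IsVector) ||
      (ST.FastScalarShiftMasks && !Q.IsVector))
    // Only fold when both shift amounts match, so the result is a plain AND.
    return Q.SameShiftAmount;
  return genericDefault();
}
```

With the scalar flag set, a scalar pair with different amounts (such as lshr 3 / shl 5) is kept as two shifts, which is what the X64-SHIFT check lines show, while equal amounts still fold to a single AND.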

llvm/trunk/lib/Target/X86/X86Subtarget.h

[... 390 unchanged lines hidden ...] in: protected:
   bool HasPCONFIG = false;

   /// Processor has a single uop BEXTR implementation.
   bool HasFastBEXTR = false;

   /// Try harder to combine to horizontal vector ops if they are fast.
   bool HasFastHorizontalOps = false;

+  /// Prefer a left/right scalar logical shifts pair over a shift+and pair.
+  bool HasFastScalarShiftMasks = false;
+
   /// Prefer a left/right vector logical shifts pair over a shift+and pair.
   bool HasFastVectorShiftMasks = false;

   /// Use a retpoline thunk rather than indirect calls to block speculative
   /// execution.
   bool UseRetpolineIndirectCalls = false;

   /// Use a retpoline thunk or remove any indirect branch to block speculative
[... 238 unchanged lines hidden ...] in: public:
   }
   bool hasFastGather() const { return HasFastGather; }
   bool hasFastScalarFSQRT() const { return HasFastScalarFSQRT; }
   bool hasFastVectorFSQRT() const { return HasFastVectorFSQRT; }
   bool hasFastLZCNT() const { return HasFastLZCNT; }
   bool hasFastSHLDRotate() const { return HasFastSHLDRotate; }
   bool hasFastBEXTR() const { return HasFastBEXTR; }
   bool hasFastHorizontalOps() const { return HasFastHorizontalOps; }
+  bool hasFastScalarShiftMasks() const { return HasFastScalarShiftMasks; }
   bool hasFastVectorShiftMasks() const { return HasFastVectorShiftMasks; }
   bool hasMacroFusion() const { return HasMacroFusion; }
   bool hasBranchFusion() const { return HasBranchFusion; }
   bool hasERMSB() const { return HasERMSB; }
   bool hasSlowDivide32() const { return HasSlowDivide32; }
   bool hasSlowDivide64() const { return HasSlowDivide64; }
   bool padShortFunctions() const { return PadShortFunctions; }
   bool slowTwoMemOps() const { return SlowTwoMemOps; }
[... 206 unchanged lines hidden ...]

llvm/trunk/test/CodeGen/X86/shift-mask.ll

[... 37 unchanged lines hidden ...]
 define i8 @test_i8_shl_lshr_1(i8 %a0) {
 ; X86-LABEL: test_i8_shl_lshr_1:
 ; X86: # %bb.0:
 ; X86-NEXT: movb {{[0-9]+}}(%esp), %al
 ; X86-NEXT: shlb $2, %al
 ; X86-NEXT: andb $-32, %al
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i8_shl_lshr_1:
-; X64: # %bb.0:
-; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: leal (,%rdi,4), %eax
-; X64-NEXT: andb $-32, %al
-; X64-NEXT: # kill: def $al killed $al killed $eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i8_shl_lshr_1:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-MASK-NEXT: leal (,%rdi,4), %eax
+; X64-MASK-NEXT: andb $-32, %al
+; X64-MASK-NEXT: # kill: def $al killed $al killed $eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i8_shl_lshr_1:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: movl %edi, %eax
+; X64-SHIFT-NEXT: shrb $3, %al
+; X64-SHIFT-NEXT: shlb $5, %al
+; X64-SHIFT-NEXT: # kill: def $al killed $al killed $eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i8 %a0, 3
   %2 = shl i8 %1, 5
   ret i8 %2
 }

 define i8 @test_i8_shl_lshr_2(i8 %a0) {
 ; X86-LABEL: test_i8_shl_lshr_2:
 ; X86: # %bb.0:
 ; X86-NEXT: movb {{[0-9]+}}(%esp), %al
 ; X86-NEXT: shrb $2, %al
 ; X86-NEXT: andb $56, %al
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i8_shl_lshr_2:
-; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: shrb $2, %al
-; X64-NEXT: andb $56, %al
-; X64-NEXT: # kill: def $al killed $al killed $eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i8_shl_lshr_2:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: movl %edi, %eax
+; X64-MASK-NEXT: shrb $2, %al
+; X64-MASK-NEXT: andb $56, %al
+; X64-MASK-NEXT: # kill: def $al killed $al killed $eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i8_shl_lshr_2:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-SHIFT-NEXT: shrb $5, %dil
+; X64-SHIFT-NEXT: leal (,%rdi,8), %eax
+; X64-SHIFT-NEXT: # kill: def $al killed $al killed $eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i8 %a0, 5
   %2 = shl i8 %1, 3
   ret i8 %2
 }

 define i16 @test_i16_shl_lshr_0(i16 %a0) {
 ; X86-LABEL: test_i16_shl_lshr_0:
 ; X86: # %bb.0:
[... 17 unchanged lines hidden ...]
 ; X86-LABEL: test_i16_shl_lshr_1:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: shll $2, %eax
 ; X86-NEXT: andl $65504, %eax # imm = 0xFFE0
 ; X86-NEXT: # kill: def $ax killed $ax killed $eax
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i16_shl_lshr_1:
-; X64: # %bb.0:
-; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: leal (,%rdi,4), %eax
-; X64-NEXT: andl $65504, %eax # imm = 0xFFE0
-; X64-NEXT: # kill: def $ax killed $ax killed $eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i16_shl_lshr_1:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-MASK-NEXT: leal (,%rdi,4), %eax
+; X64-MASK-NEXT: andl $65504, %eax # imm = 0xFFE0
+; X64-MASK-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i16_shl_lshr_1:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: movzwl %di, %eax
+; X64-SHIFT-NEXT: shrl $3, %eax
+; X64-SHIFT-NEXT: shll $5, %eax
+; X64-SHIFT-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i16 %a0, 3
   %2 = shl i16 %1, 5
   ret i16 %2
 }

 define i16 @test_i16_shl_lshr_2(i16 %a0) {
 ; X86-LABEL: test_i16_shl_lshr_2:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: shrl $2, %eax
 ; X86-NEXT: andl $16376, %eax # imm = 0x3FF8
 ; X86-NEXT: # kill: def $ax killed $ax killed $eax
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i16_shl_lshr_2:
-; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: shrl $2, %eax
-; X64-NEXT: andl $16376, %eax # imm = 0x3FF8
-; X64-NEXT: # kill: def $ax killed $ax killed $eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i16_shl_lshr_2:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: movl %edi, %eax
+; X64-MASK-NEXT: shrl $2, %eax
+; X64-MASK-NEXT: andl $16376, %eax # imm = 0x3FF8
+; X64-MASK-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i16_shl_lshr_2:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: movzwl %di, %eax
+; X64-SHIFT-NEXT: shrl $5, %eax
+; X64-SHIFT-NEXT: shll $3, %eax
+; X64-SHIFT-NEXT: # kill: def $ax killed $ax killed $eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i16 %a0, 5
   %2 = shl i16 %1, 3
   ret i16 %2
 }

 define i32 @test_i32_shl_lshr_0(i32 %a0) {
 ; X86-LABEL: test_i32_shl_lshr_0:
 ; X86: # %bb.0:
[... 14 unchanged lines hidden ...]
 define i32 @test_i32_shl_lshr_1(i32 %a0) {
 ; X86-LABEL: test_i32_shl_lshr_1:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: shll $2, %eax
 ; X86-NEXT: andl $-32, %eax
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i32_shl_lshr_1:
-; X64: # %bb.0:
-; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: leal (,%rdi,4), %eax
-; X64-NEXT: andl $-32, %eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i32_shl_lshr_1:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-MASK-NEXT: leal (,%rdi,4), %eax
+; X64-MASK-NEXT: andl $-32, %eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i32_shl_lshr_1:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: movl %edi, %eax
+; X64-SHIFT-NEXT: shrl $3, %eax
+; X64-SHIFT-NEXT: shll $5, %eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i32 %a0, 3
   %2 = shl i32 %1, 5
   ret i32 %2
 }

 define i32 @test_i32_shl_lshr_2(i32 %a0) {
 ; X86-LABEL: test_i32_shl_lshr_2:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: shrl $2, %eax
 ; X86-NEXT: andl $-8, %eax
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i32_shl_lshr_2:
-; X64: # %bb.0:
-; X64-NEXT: movl %edi, %eax
-; X64-NEXT: shrl $2, %eax
-; X64-NEXT: andl $-8, %eax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i32_shl_lshr_2:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: movl %edi, %eax
+; X64-MASK-NEXT: shrl $2, %eax
+; X64-MASK-NEXT: andl $-8, %eax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i32_shl_lshr_2:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: # kill: def $edi killed $edi def $rdi
+; X64-SHIFT-NEXT: shrl $5, %edi
+; X64-SHIFT-NEXT: leal (,%rdi,8), %eax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i32 %a0, 5
   %2 = shl i32 %1, 3
   ret i32 %2
 }

 define i64 @test_i64_shl_lshr_0(i64 %a0) {
 ; X86-LABEL: test_i64_shl_lshr_0:
 ; X86: # %bb.0:
[... 17 unchanged lines hidden ...]
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: leal (,%ecx,4), %eax
 ; X86-NEXT: andl $-32, %eax
 ; X86-NEXT: shldl $2, %ecx, %edx
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i64_shl_lshr_1:
-; X64: # %bb.0:
-; X64-NEXT: leaq (,%rdi,4), %rax
-; X64-NEXT: andq $-32, %rax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i64_shl_lshr_1:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: leaq (,%rdi,4), %rax
+; X64-MASK-NEXT: andq $-32, %rax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i64_shl_lshr_1:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: movq %rdi, %rax
+; X64-SHIFT-NEXT: shrq $3, %rax
+; X64-SHIFT-NEXT: shlq $5, %rax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i64 %a0, 3
   %2 = shl i64 %1, 5
   ret i64 %2
 }

 define i64 @test_i64_shl_lshr_2(i64 %a0) {
 ; X86-LABEL: test_i64_shl_lshr_2:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
 ; X86-NEXT: shrdl $2, %edx, %eax
 ; X86-NEXT: andl $-8, %eax
 ; X86-NEXT: shrl $2, %edx
 ; X86-NEXT: retl
 ;
-; X64-LABEL: test_i64_shl_lshr_2:
-; X64: # %bb.0:
-; X64-NEXT: movq %rdi, %rax
-; X64-NEXT: shrq $2, %rax
-; X64-NEXT: andq $-8, %rax
-; X64-NEXT: retq
+; X64-MASK-LABEL: test_i64_shl_lshr_2:
+; X64-MASK: # %bb.0:
+; X64-MASK-NEXT: movq %rdi, %rax
+; X64-MASK-NEXT: shrq $2, %rax
+; X64-MASK-NEXT: andq $-8, %rax
+; X64-MASK-NEXT: retq
+;
+; X64-SHIFT-LABEL: test_i64_shl_lshr_2:
+; X64-SHIFT: # %bb.0:
+; X64-SHIFT-NEXT: shrq $5, %rdi
+; X64-SHIFT-NEXT: leaq (,%rdi,8), %rax
+; X64-SHIFT-NEXT: retq
   %1 = lshr i64 %a0, 5
   %2 = shl i64 %1, 3
   ret i64 %2
 }

 ;
 ; fold (lshr (shl x, c1), c2) -> (0) (and x, MASK) or
 ; (1) (and (shl x, (sub c1, c2), MASK) or
[... 281 unchanged lines hidden ...]
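The test comment at the end of the diff introduces the opposite direction, lshr (shl x, c1), c2, and is cut off in this view after case (1). As a sketch of the underlying identity only (hypothetical helper names; the c1 < c2 branch is derived here, not taken from the hidden lines), the following C++ checks on 8-bit values that the shift pair equals at most one shift followed by an AND with MASK, where MASK is again the all-ones byte pushed through the same pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for 8-bit values and shift amounts c1, c2 in [0, 7].

// lshr (shl x, c1), c2 computed literally (the shl truncates to 8 bits first).
std::uint8_t lshrOfShl(std::uint8_t x, unsigned c1, unsigned c2) {
  const std::uint8_t left = static_cast<std::uint8_t>(x << c1);
  return static_cast<std::uint8_t>(left >> c2);
}

// The same value as at most one shift followed by an AND with MASK.
std::uint8_t asShiftAndMask(std::uint8_t x, unsigned c1, unsigned c2) {
  // MASK: the all-ones byte pushed through the same shl/lshr pair.
  const unsigned mask = static_cast<std::uint8_t>(0xFFu << c1) >> c2;
  if (c1 == c2)  // case (0): and x, MASK
    return static_cast<std::uint8_t>(x & mask);
  if (c1 > c2)   // case (1): and (shl x, c1 - c2), MASK
    return static_cast<std::uint8_t>((unsigned{x} << (c1 - c2)) & mask);
  // c1 < c2: and (srl x, c2 - c1), MASK
  return static_cast<std::uint8_t>((unsigned{x} >> (c2 - c1)) & mask);
}

// Brute-force check over every byte value.
bool foldHoldsForAllBytes(unsigned c1, unsigned c2) {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    const std::uint8_t x = static_cast<std::uint8_t>(v);
    if (lshrOfShl(x, c1, c2) != asShiftAndMask(x, c1, c2))
      return false;
  }
  return true;
}
```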