This is an archive of the discontinued LLVM Phabricator instance.

Differential D77301

[TargetLowering] Improve expansion of FSHL/FSHR
ClosedPublic

Authored by foad on Apr 2 2020, 5:46 AM.

Details

Reviewers

RKSimon
craig.topper
arsenm
spatel

Commits

rG17941437a2ed: [TargetLowering] Improve expansion of FSHL/FSHR

Summary

Use an extra shift-by-1 instead of a compare and select to handle the
shift-by-zero case. This sometimes saves one instruction (if the compare
couldn't be combined with a previous instruction). It also works better
on targets that don't have good select instructions.
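
As a rough illustration of the idea in the summary, the sketch below models a 32-bit `fshl` on plain integers. The function names are hypothetical and the code is not taken from the patch; it only assumes the documented semantics of `llvm.fshl.i32` (shift the concatenation X:Y left by Z % 32 and keep the high half).

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: concatenate X:Y, shift left by Z % 32, keep the high 32 bits.
uint32_t fshl_ref(uint32_t x, uint32_t y, uint32_t z) {
  uint64_t concat = (static_cast<uint64_t>(x) << 32) | y;
  return static_cast<uint32_t>((concat << (z % 32)) >> 32);
}

// Old-style expansion: a compare and select guards the shift-by-32 case.
uint32_t fshl_select(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t amt = z % 32;
  if (amt == 0)
    return x;  // y >> 32 would be an out-of-range shift
  return (x << amt) | (y >> (32 - amt));
}

// New-style expansion: y >> 1 >> (31 - amt). Both shift amounts stay in [0, 31],
// so no compare or select is needed.
uint32_t fshl_extra_shift(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t amt = z % 32;
  return (x << amt) | ((y >> 1) >> (31 - amt));
}
```

When `amt` is 0, `(y >> 1) >> 31` is 0, so the result is just `x`, which is what the select used to produce.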

Note that this change currently doesn't affect most targets:
expandFunnelShift is not used, because funnel shift intrinsics are
lowered early in SelectionDAGBuilder. But there is work afoot to change
that; see D77152.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Apr 2 2020, 5:46 AM

Herald added a project: Restricted Project. Apr 2 2020, 5:46 AM

Herald added subscribers: llvm-commits, hiraditya, wdng.

foad added a parent revision: D77300: [X86] Improve combineVectorShiftImm. Apr 2 2020, 5:47 AM

foad marked an inline comment as done. Apr 2 2020, 5:49 AM

foad added inline comments.

llvm/test/CodeGen/X86/fshl.ll
123–124	The `andb` is redundant here and in a bunch of other i32/i64 test cases. I would have thought the simplifyDemandedBits machinery could work that out. Any ideas why this isn't already working?

Harbormaster failed remote builds in B51466: Diff 254496! Apr 2 2020, 7:01 AM

foad marked an inline comment as done. Apr 2 2020, 8:39 AM

foad added inline comments.

llvm/test/CodeGen/X86/fshl.ll
123–124	Answering my own question, `X86InstrCompiler.td` has patterns that match `(shift x (and y, 31)) ==> (shift x, y)` but it has no chance of matching when there is an intervening `sub` or `xor` as in `(shift x (xor (and y, 31), 31))`.
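
To make the redundancy concrete, here is a small sketch on plain bytes. The helper names are hypothetical. It relies only on the fact that a 32-bit x86 shift reads the low 5 bits of `%cl`; the check is that subtracting from 31 equals xor-ing with 31 once the amount is masked, and that dropping the `and` changes only bits the shift ignores.

```cpp
#include <cassert>
#include <cstdint>

// Inverse shift amount as the expansion builds it: (BW - 1) - (z & (BW - 1)).
uint8_t inv_amt_sub(uint8_t z) { return static_cast<uint8_t>(31 - (z & 31)); }

// The form seen in the generated code: subtracting from an all-ones mask is an xor.
uint8_t inv_amt_xor(uint8_t z) { return static_cast<uint8_t>((z & 31) ^ 31); }

// Without the `and`: the upper bits differ, but the low 5 bits are the same.
uint8_t inv_amt_no_and(uint8_t z) { return static_cast<uint8_t>(z ^ 31); }

// Model of the count masking applied by a 32-bit x86 shift.
uint8_t x86_shift_count32(uint8_t cl) { return static_cast<uint8_t>(cl & 31); }
```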

RKSimon added inline comments. Apr 2 2020, 10:24 AM

llvm/test/CodeGen/X86/fshl.ll
123–124	It might be possible to extend X86DAGToDAGISel.isUnneededShiftMask to handle this - @craig.topper might be able to advise.

RKSimon added inline comments. Apr 3 2020, 2:12 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6050	Incidentally, DAGCombiner::MatchRotate doesn't currently recognise this pattern to CREATE funnel shifts / rotates - X86 handles it in combineOrShiftToFunnelShift, which is the last part that needs to be moved to DAGCombiner before we can close PR40081.

RKSimon added inline comments. Apr 15 2020, 9:52 AM

llvm/test/CodeGen/X86/fshl.ll
123–124	@craig.topper What do you reckon? Could we handle the xor(and(x,bw-1),bw-1) pattern directly or even try to call SimplifyDemandedBits/SimplifyMultipleUseDemandedBits?

craig.topper added inline comments. Apr 15 2020, 10:52 AM

llvm/test/CodeGen/X86/fshl.ll
123–124	Node creation inside of Select requires maintaining topological sorting so calling SimplifyDemandedBits/SimplifyMultipleUseDemandedBits is problematic. Why isn't this combine from DAGCombiner.cpp visitXOR kicking in "fold (xor (and x, y), y) -> (and (not x), y) "? Is it because the (and x, y) has another use by the other shift until the isel pattern skips over it?
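
The fold quoted above is a plain bitwise identity. The sketch below checks it on 32-bit integers; the function names are hypothetical and nothing here models DAGCombiner itself or its one-use restrictions.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: (xor (and x, y), y).
uint32_t xor_of_and(uint32_t x, uint32_t y) { return (x & y) ^ y; }

// Right-hand side of the fold: (and (not x), y).
uint32_t and_of_not(uint32_t x, uint32_t y) { return ~x & y; }
```

With `y` set to 31 this maps `(z & 31) ^ 31` to `~z & 31`, which is the shape discussed in this thread.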

RKSimon added inline comments. Apr 15 2020, 11:45 AM

llvm/test/CodeGen/X86/fshl.ll
123–124	Almost certainly - I'm happy for the xor+and fix to be handled later if you are - the slow-case codegen looks better even with this.

RKSimon added inline comments. Apr 17 2020, 7:11 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6071–6082	Shouldn't this be using ShVT?

RKSimon mentioned this in D78935: [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine. Apr 27 2020, 8:37 AM

RKSimon mentioned this in rG96238486ed62: [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine. Apr 30 2020, 5:05 AM

ping? I'm waiting on this to copy the same for the GlobalISel expansion

RKSimon added inline comments. May 13 2020, 9:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6071–6082	SDValue One = DAG.getConstant(1, DL, ShVT);

Rebase. Fix VT -> ShVT.

Harbormaster completed remote builds in B56727: Diff 263983. May 14 2020, 7:32 AM

LGTM - cheers

This revision is now accepted and ready to land. May 14 2020, 7:43 AM

Closed by commit rG17941437a2ed: [TargetLowering] Improve expansion of FSHL/FSHR (authored by foad). May 14 2020, 8:38 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in D80466: [X86] Improve i8 + 'slow' i16 funnel shift codegen. May 23 2020, 3:09 AM

RKSimon mentioned this in rGcc65a7a5ea81: [X86] Improve i8 + 'slow' i16 funnel shift codegen. May 24 2020, 12:29 AM

RKSimon mentioned this in D80489: [TargetLowering] Improve expandFunnelShift shift amount masking. May 24 2020, 2:40 AM

RKSimon mentioned this in rG16031067252d: [TargetLowering] Improve expandFunnelShift shift amount masking. May 24 2020, 3:44 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	37 lines
llvm/test/CodeGen/X86/fshl.ll	211 lines
llvm/test/CodeGen/X86/fshr.ll	206 lines

Diff 264005

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

(6,040 lines not shown)
bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result,
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);

if (VT.isVector() && (!isOperationLegalOrCustom(ISD::SHL, VT) ||		if (VT.isVector() && (!isOperationLegalOrCustom(ISD::SHL, VT) ||
!isOperationLegalOrCustom(ISD::SRL, VT) ||		!isOperationLegalOrCustom(ISD::SRL, VT) ||
!isOperationLegalOrCustom(ISD::SUB, VT) ||		!isOperationLegalOrCustom(ISD::SUB, VT) ||
!isOperationLegalOrCustomOrPromote(ISD::OR, VT)))		!isOperationLegalOrCustomOrPromote(ISD::OR, VT)))
return false;		return false;

// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))		// fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW))
// fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))		// fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW)
SDValue X = Node->getOperand(0);		SDValue X = Node->getOperand(0);
SDValue Y = Node->getOperand(1);		SDValue Y = Node->getOperand(1);
SDValue Z = Node->getOperand(2);		SDValue Z = Node->getOperand(2);

unsigned EltSizeInBits = VT.getScalarSizeInBits();		unsigned EltSizeInBits = VT.getScalarSizeInBits();
bool IsFSHL = Node->getOpcode() == ISD::FSHL;		bool IsFSHL = Node->getOpcode() == ISD::FSHL;
SDLoc DL(SDValue(Node, 0));		SDLoc DL(SDValue(Node, 0));

EVT ShVT = Z.getValueType();		EVT ShVT = Z.getValueType();
SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT);		SDValue Mask = DAG.getConstant(EltSizeInBits - 1, DL, ShVT);
SDValue Zero = DAG.getConstant(0, DL, ShVT);

SDValue ShAmt;		SDValue ShAmt;
if (isPowerOf2_32(EltSizeInBits)) {		if (isPowerOf2_32(EltSizeInBits)) {
SDValue Mask = DAG.getConstant(EltSizeInBits - 1, DL, ShVT);		// Z % BW -> Z & (BW - 1)
ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask);		ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask);
} else {		} else {
		SDValue BitWidthC = DAG.getConstant(EltSizeInBits, DL, ShVT);
ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC);		ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC);
}		}
		SDValue InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt);

SDValue InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, ShAmt);		SDValue One = DAG.getConstant(1, DL, ShVT);
SDValue ShX = DAG.getNode(ISD::SHL, DL, VT, X, IsFSHL ? ShAmt : InvShAmt);		SDValue ShX, ShY;
SDValue ShY = DAG.getNode(ISD::SRL, DL, VT, Y, IsFSHL ? InvShAmt : ShAmt);		if (IsFSHL) {
SDValue Or = DAG.getNode(ISD::OR, DL, VT, ShX, ShY);		ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt);
		SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One);
// If (Z % BW == 0), then the opposite direction shift is shift-by-bitwidth,		ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt);
// and that is undefined. We must compare and select to avoid UB.		} else {
EVT CCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), ShVT);		SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One);
		ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt);
// For fshl, 0-shift returns the 1st arg (X).		ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt);
// For fshr, 0-shift returns the 2nd arg (Y).		}
SDValue IsZeroShift = DAG.getSetCC(DL, CCVT, ShAmt, Zero, ISD::SETEQ);		Result = DAG.getNode(ISD::OR, DL, VT, ShX, ShY);
Result = DAG.getSelect(DL, VT, IsZeroShift, IsFSHL ? X : Y, Or);
return true;		return true;
}		}

// TODO: Merge with expandFunnelShift.		// TODO: Merge with expandFunnelShift.
bool TargetLowering::expandROT(SDNode *Node, SDValue &Result,		bool TargetLowering::expandROT(SDNode *Node, SDValue &Result,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
unsigned EltSizeInBits = VT.getScalarSizeInBits();		unsigned EltSizeInBits = VT.getScalarSizeInBits();
(1,668 lines not shown)
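
The new column of `expandFunnelShift` can be mirrored on plain integers to see why it is safe for both directions and for bit widths that are not a power of two. The sketch below is a scalar model only: the `FunnelShift` struct and its member names are hypothetical, values are held in the low bits of a `uint64_t`, and the width is assumed to be between 1 and 32 so that the reference concatenation fits in 64 bits. It does not use any SelectionDAG API.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model: X, Y, Z are BW-bit values stored in the low bits of a uint64_t.
struct FunnelShift {
  unsigned bw;  // bit width, assumed to be in [1, 32]

  uint64_t trunc(uint64_t v) const { return v & ((uint64_t{1} << bw) - 1); }

  // Follows the structure of the new expansion: Mask, ShAmt, InvShAmt, then
  // an extra shift by one on the "opposite direction" operand.
  uint64_t expand(bool isFshl, uint64_t x, uint64_t y, uint64_t z) const {
    uint64_t mask = bw - 1;
    bool isPow2 = (bw & (bw - 1)) == 0;
    uint64_t shAmt = isPow2 ? (z & mask) : (z % bw);  // Z % BW
    uint64_t invShAmt = mask - shAmt;                 // BW - 1 - (Z % BW)
    uint64_t shX, shY;
    if (isFshl) {
      shX = trunc(x << shAmt);
      shY = (y >> 1) >> invShAmt;
    } else {
      shX = trunc(trunc(x << 1) << invShAmt);
      shY = y >> shAmt;
    }
    return shX | shY;
  }

  // Reference: shift the 2*BW-bit concatenation X:Y and take the relevant half.
  uint64_t reference(bool isFshl, uint64_t x, uint64_t y, uint64_t z) const {
    uint64_t concat = (x << bw) | y;
    uint64_t amt = z % bw;
    return isFshl ? trunc((concat << amt) >> bw) : trunc(concat >> amt);
  }
};
```

For example, `FunnelShift{24}.expand(true, 0xABCDEF, 0x123456, 4)` takes the non-power-of-two path and every individual shift amount stays below 24.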

llvm/test/CodeGen/X86/fshl.ll

	(59 lines not shown)
	; X86-FAST-NEXT: movzwl {{[0-9]+}}(%esp), %eax			; X86-FAST-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl			; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-FAST-NEXT: andb $15, %cl			; X86-FAST-NEXT: andb $15, %cl
	; X86-FAST-NEXT: shldw %cl, %dx, %ax			; X86-FAST-NEXT: shldw %cl, %dx, %ax
	; X86-FAST-NEXT: retl			; X86-FAST-NEXT: retl
	;			;
	; X86-SLOW-LABEL: var_shift_i16:			; X86-SLOW-LABEL: var_shift_i16:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %edi			; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: pushl %esi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %esi			; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl			; X86-SLOW-NEXT: andb $15, %cl
	; X86-SLOW-NEXT: andb $15, %dl			; X86-SLOW-NEXT: shll %cl, %edx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: shrl %eax
	; X86-SLOW-NEXT: movl %eax, %edi			; X86-SLOW-NEXT: xorb $15, %cl
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: shrl %cl, %eax
	; X86-SLOW-NEXT: shll %cl, %edi			; X86-SLOW-NEXT: orl %edx, %eax
	; X86-SLOW-NEXT: movb $16, %cl
	; X86-SLOW-NEXT: subb %dl, %cl
	; X86-SLOW-NEXT: shrl %cl, %esi
	; X86-SLOW-NEXT: testb %dl, %dl
	; X86-SLOW-NEXT: je .LBB1_2
	; X86-SLOW-NEXT: # %bb.1:
	; X86-SLOW-NEXT: orl %esi, %edi
	; X86-SLOW-NEXT: movl %edi, %eax
	; X86-SLOW-NEXT: .LBB1_2:
	; X86-SLOW-NEXT: # kill: def $ax killed $ax killed $eax			; X86-SLOW-NEXT: # kill: def $ax killed $ax killed $eax
	; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i16:			; X64-FAST-LABEL: var_shift_i16:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movl %edx, %ecx			; X64-FAST-NEXT: movl %edx, %ecx
	; X64-FAST-NEXT: movl %edi, %eax			; X64-FAST-NEXT: movl %edi, %eax
	; X64-FAST-NEXT: andb $15, %cl			; X64-FAST-NEXT: andb $15, %cl
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-FAST-NEXT: shldw %cl, %si, %ax			; X64-FAST-NEXT: shldw %cl, %si, %ax
	; X64-FAST-NEXT: # kill: def $ax killed $ax killed $eax			; X64-FAST-NEXT: # kill: def $ax killed $ax killed $eax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i16:			; X64-SLOW-LABEL: var_shift_i16:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: movzwl %si, %eax
	; X64-SLOW-NEXT: andb $15, %dl
	; X64-SLOW-NEXT: movl %edi, %esi
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: movl %edx, %ecx
	; X64-SLOW-NEXT: shll %cl, %esi			; X64-SLOW-NEXT: movzwl %si, %eax
	; X64-SLOW-NEXT: movb $16, %cl			; X64-SLOW-NEXT: andb $15, %cl
	; X64-SLOW-NEXT: subb %dl, %cl			; X64-SLOW-NEXT: shll %cl, %edi
				; X64-SLOW-NEXT: xorb $15, %cl
				; X64-SLOW-NEXT: shrl %eax
				; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-SLOW-NEXT: shrl %cl, %eax			; X64-SLOW-NEXT: shrl %cl, %eax
	; X64-SLOW-NEXT: orl %esi, %eax			; X64-SLOW-NEXT: orl %edi, %eax
	; X64-SLOW-NEXT: testb %dl, %dl
	; X64-SLOW-NEXT: cmovel %edi, %eax
	; X64-SLOW-NEXT: # kill: def $ax killed $ax killed $eax			; X64-SLOW-NEXT: # kill: def $ax killed $ax killed $eax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i16 @llvm.fshl.i16(i16 %x, i16 %y, i16 %z)			%tmp = tail call i16 @llvm.fshl.i16(i16 %x, i16 %y, i16 %z)
	ret i16 %tmp			ret i16 %tmp
	}			}
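
The new X86-SLOW sequence for the i16 `fshl` above can be read as straight-line integer code. The sketch below is a hand-written model of those instructions, with each statement annotated by the instruction it stands for. Which stack slot holds which argument is an inference (the offsets are hidden behind FileCheck regexes), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of llvm.fshl.i16.
uint16_t fshl16_ref(uint16_t x, uint16_t y, uint16_t z) {
  uint32_t concat = (static_cast<uint32_t>(x) << 16) | y;
  return static_cast<uint16_t>((concat << (z % 16)) >> 16);
}

// Model of the new X86-SLOW sequence, performed in 32-bit registers.
uint16_t fshl16_slow_model(uint16_t x, uint16_t y, uint8_t z) {
  uint32_t eax = y;                   // movzwl: y, zero-extended (assumed slot)
  uint32_t edx = x;                   // movl:   x (upper bits do not reach %ax)
  uint8_t cl = z;                     // movb:   shift amount
  cl &= 15;                           // andb $15, %cl
  edx <<= cl;                         // shll %cl, %edx
  eax >>= 1;                          // shrl %eax        (the extra shift by 1)
  cl ^= 15;                           // xorb $15, %cl    (15 - cl)
  eax >>= cl;                         // shrl %cl, %eax
  eax |= edx;                         // orl %edx, %eax
  return static_cast<uint16_t>(eax);  // result is read from %ax
}
```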

	define i32 @var_shift_i32(i32 %x, i32 %y, i32 %z) nounwind {			define i32 @var_shift_i32(i32 %x, i32 %y, i32 %z) nounwind {
	; X86-FAST-LABEL: var_shift_i32:			; X86-FAST-LABEL: var_shift_i32:
	; X86-FAST: # %bb.0:			; X86-FAST: # %bb.0:
	; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl			; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-FAST-NEXT: shldl %cl, %edx, %eax			; X86-FAST-NEXT: shldl %cl, %edx, %eax
	; X86-FAST-NEXT: retl			; X86-FAST-NEXT: retl
	;			;
	; X86-SLOW-LABEL: var_shift_i32:			; X86-SLOW-LABEL: var_shift_i32:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %edi
	; X86-SLOW-NEXT: pushl %esi
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl %eax, %edi			; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: shll %cl, %edi			; X86-SLOW-NEXT: shll %cl, %edx
	; X86-SLOW-NEXT: andb $31, %dl			; X86-SLOW-NEXT: shrl %eax
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: andb $31, %cl
	; X86-SLOW-NEXT: negb %cl			; X86-SLOW-NEXT: xorb $31, %cl
	; X86-SLOW-NEXT: shrl %cl, %esi			; X86-SLOW-NEXT: shrl %cl, %eax
	; X86-SLOW-NEXT: testb %dl, %dl			; X86-SLOW-NEXT: orl %edx, %eax
	; X86-SLOW-NEXT: je .LBB2_2
	; X86-SLOW-NEXT: # %bb.1:
	; X86-SLOW-NEXT: orl %esi, %edi
	; X86-SLOW-NEXT: movl %edi, %eax
	; X86-SLOW-NEXT: .LBB2_2:
	; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i32:			; X64-FAST-LABEL: var_shift_i32:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movl %edx, %ecx			; X64-FAST-NEXT: movl %edx, %ecx
	; X64-FAST-NEXT: movl %edi, %eax			; X64-FAST-NEXT: movl %edi, %eax
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-FAST-NEXT: shldl %cl, %esi, %eax			; X64-FAST-NEXT: shldl %cl, %esi, %eax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i32:			; X64-SLOW-LABEL: var_shift_i32:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: movl %esi, %eax
	; X64-SLOW-NEXT: movl %edi, %esi
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: movl %edx, %ecx
	; X64-SLOW-NEXT: shll %cl, %esi			; X64-SLOW-NEXT: movl %esi, %eax
	; X64-SLOW-NEXT: andb $31, %dl			; X64-SLOW-NEXT: shll %cl, %edi
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: shrl %eax
	; X64-SLOW-NEXT: negb %cl			; X64-SLOW-NEXT: andb $31, %cl
				; X64-SLOW-NEXT: xorb $31, %cl
				; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-SLOW-NEXT: shrl %cl, %eax			; X64-SLOW-NEXT: shrl %cl, %eax
	; X64-SLOW-NEXT: orl %esi, %eax			; X64-SLOW-NEXT: orl %edi, %eax
	; X64-SLOW-NEXT: testb %dl, %dl
	; X64-SLOW-NEXT: cmovel %edi, %eax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %z)			%tmp = tail call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %z)
	ret i32 %tmp			ret i32 %tmp
	}			}

	define i32 @var_shift_i32_optsize(i32 %x, i32 %y, i32 %z) nounwind optsize {			define i32 @var_shift_i32_optsize(i32 %x, i32 %y, i32 %z) nounwind optsize {
	; X86-LABEL: var_shift_i32_optsize:			; X86-LABEL: var_shift_i32_optsize:
	; X86: # %bb.0:			; X86: # %bb.0:
	(92 lines not shown)
	; X86-FAST-NEXT: retl			; X86-FAST-NEXT: retl
	;			;
	; X86-SLOW-LABEL: var_shift_i64:			; X86-SLOW-LABEL: var_shift_i64:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %ebp			; X86-SLOW-NEXT: pushl %ebp
	; X86-SLOW-NEXT: pushl %ebx			; X86-SLOW-NEXT: pushl %ebx
	; X86-SLOW-NEXT: pushl %edi			; X86-SLOW-NEXT: pushl %edi
	; X86-SLOW-NEXT: pushl %esi			; X86-SLOW-NEXT: pushl %esi
	; X86-SLOW-NEXT: subl $8, %esp			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx
	; X86-SLOW-NEXT: andl $63, %ebx			; X86-SLOW-NEXT: andl $63, %ebx
	; X86-SLOW-NEXT: movb $64, %dh			; X86-SLOW-NEXT: movb $64, %ch
	; X86-SLOW-NEXT: subb %bl, %dh			; X86-SLOW-NEXT: subb %bl, %ch
	; X86-SLOW-NEXT: movl %eax, (%esp) # 4-byte Spill
	; X86-SLOW-NEXT: movb %dh, %cl
	; X86-SLOW-NEXT: shrl %cl, %eax
	; X86-SLOW-NEXT: movb %dh, %dl
	; X86-SLOW-NEXT: andb $31, %dl
	; X86-SLOW-NEXT: movl %edx, %ecx
	; X86-SLOW-NEXT: negb %cl
	; X86-SLOW-NEXT: movl %esi, %ebp
	; X86-SLOW-NEXT: shll %cl, %ebp
	; X86-SLOW-NEXT: testb %dl, %dl
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-SLOW-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; X86-SLOW-NEXT: je .LBB5_2
	; X86-SLOW-NEXT: # %bb.1:
	; X86-SLOW-NEXT: orl %eax, %ebp
	; X86-SLOW-NEXT: movl %ebp, (%esp) # 4-byte Spill
	; X86-SLOW-NEXT: .LBB5_2:
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; X86-SLOW-NEXT: movl %ebp, %eax
	; X86-SLOW-NEXT: movl %ebx, %ecx
	; X86-SLOW-NEXT: shll %cl, %eax
	; X86-SLOW-NEXT: movb %bl, %ch
	; X86-SLOW-NEXT: andb $31, %ch
	; X86-SLOW-NEXT: movb %ch, %cl			; X86-SLOW-NEXT: movb %ch, %cl
	; X86-SLOW-NEXT: negb %cl			; X86-SLOW-NEXT: shrl %cl, %edx
				; X86-SLOW-NEXT: andb $31, %cl
				; X86-SLOW-NEXT: xorb $31, %cl
				; X86-SLOW-NEXT: addl %eax, %eax
				; X86-SLOW-NEXT: shll %cl, %eax
				; X86-SLOW-NEXT: movb %bl, %cl
				; X86-SLOW-NEXT: shll %cl, %ebp
				; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
				; X86-SLOW-NEXT: movl %esi, %edi
				; X86-SLOW-NEXT: shrl %edi
				; X86-SLOW-NEXT: andb $31, %cl
				; X86-SLOW-NEXT: xorb $31, %cl
	; X86-SLOW-NEXT: shrl %cl, %edi			; X86-SLOW-NEXT: shrl %cl, %edi
	; X86-SLOW-NEXT: testb %ch, %ch			; X86-SLOW-NEXT: movb %bl, %cl
	; X86-SLOW-NEXT: je .LBB5_4			; X86-SLOW-NEXT: shll %cl, %esi
	; X86-SLOW-NEXT: # %bb.3:
	; X86-SLOW-NEXT: orl %edi, %eax
	; X86-SLOW-NEXT: movl %eax, %ebp
	; X86-SLOW-NEXT: .LBB5_4:
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl %eax, %edi
	; X86-SLOW-NEXT: movl %ebx, %ecx
	; X86-SLOW-NEXT: shll %cl, %edi
	; X86-SLOW-NEXT: testb $32, %bl			; X86-SLOW-NEXT: testb $32, %bl
	; X86-SLOW-NEXT: je .LBB5_6			; X86-SLOW-NEXT: jne .LBB5_1
				; X86-SLOW-NEXT: # %bb.2:
				; X86-SLOW-NEXT: orl %edi, %ebp
				; X86-SLOW-NEXT: jmp .LBB5_3
				; X86-SLOW-NEXT: .LBB5_1:
				; X86-SLOW-NEXT: movl %esi, %ebp
				; X86-SLOW-NEXT: xorl %esi, %esi
				; X86-SLOW-NEXT: .LBB5_3:
				; X86-SLOW-NEXT: movb %ch, %cl
				; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi
				; X86-SLOW-NEXT: shrl %cl, %edi
				; X86-SLOW-NEXT: testb $32, %ch
				; X86-SLOW-NEXT: jne .LBB5_4
	; X86-SLOW-NEXT: # %bb.5:			; X86-SLOW-NEXT: # %bb.5:
	; X86-SLOW-NEXT: movl %edi, %ebp			; X86-SLOW-NEXT: orl %edx, %eax
				; X86-SLOW-NEXT: movl %eax, %ecx
				; X86-SLOW-NEXT: jmp .LBB5_6
				; X86-SLOW-NEXT: .LBB5_4:
				; X86-SLOW-NEXT: movl %edi, %ecx
	; X86-SLOW-NEXT: xorl %edi, %edi			; X86-SLOW-NEXT: xorl %edi, %edi
	; X86-SLOW-NEXT: .LBB5_6:			; X86-SLOW-NEXT: .LBB5_6:
	; X86-SLOW-NEXT: movb %dh, %cl			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: shrl %cl, %esi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: testb $32, %dh
	; X86-SLOW-NEXT: jne .LBB5_7
	; X86-SLOW-NEXT: # %bb.8:
	; X86-SLOW-NEXT: movl (%esp), %ecx # 4-byte Reload
	; X86-SLOW-NEXT: testl %ebx, %ebx
	; X86-SLOW-NEXT: jne .LBB5_10
	; X86-SLOW-NEXT: jmp .LBB5_11
	; X86-SLOW-NEXT: .LBB5_7:
	; X86-SLOW-NEXT: movl %esi, %ecx
	; X86-SLOW-NEXT: xorl %esi, %esi
	; X86-SLOW-NEXT: testl %ebx, %ebx			; X86-SLOW-NEXT: testl %ebx, %ebx
	; X86-SLOW-NEXT: je .LBB5_11			; X86-SLOW-NEXT: je .LBB5_8
	; X86-SLOW-NEXT: .LBB5_10:			; X86-SLOW-NEXT: # %bb.7:
	; X86-SLOW-NEXT: orl %esi, %ebp			; X86-SLOW-NEXT: orl %edi, %ebp
	; X86-SLOW-NEXT: orl %ecx, %edi			; X86-SLOW-NEXT: orl %ecx, %esi
	; X86-SLOW-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; X86-SLOW-NEXT: movl %ebp, %edx
	; X86-SLOW-NEXT: movl %edi, %eax			; X86-SLOW-NEXT: movl %esi, %eax
	; X86-SLOW-NEXT: .LBB5_11:			; X86-SLOW-NEXT: .LBB5_8:
	; X86-SLOW-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
	; X86-SLOW-NEXT: addl $8, %esp
	; X86-SLOW-NEXT: popl %esi			; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi			; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: popl %ebx			; X86-SLOW-NEXT: popl %ebx
	; X86-SLOW-NEXT: popl %ebp			; X86-SLOW-NEXT: popl %ebp
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i64:			; X64-FAST-LABEL: var_shift_i64:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movq %rdx, %rcx			; X64-FAST-NEXT: movq %rdx, %rcx
	; X64-FAST-NEXT: movq %rdi, %rax			; X64-FAST-NEXT: movq %rdi, %rax
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $rcx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-FAST-NEXT: shldq %cl, %rsi, %rax			; X64-FAST-NEXT: shldq %cl, %rsi, %rax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i64:			; X64-SLOW-LABEL: var_shift_i64:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
				; X64-SLOW-NEXT: movq %rdx, %rcx
	; X64-SLOW-NEXT: movq %rsi, %rax			; X64-SLOW-NEXT: movq %rsi, %rax
	; X64-SLOW-NEXT: movq %rdi, %rsi			; X64-SLOW-NEXT: shlq %cl, %rdi
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: shrq %rax
	; X64-SLOW-NEXT: shlq %cl, %rsi			; X64-SLOW-NEXT: andb $63, %cl
	; X64-SLOW-NEXT: andb $63, %dl			; X64-SLOW-NEXT: xorb $63, %cl
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-SLOW-NEXT: negb %cl
	; X64-SLOW-NEXT: shrq %cl, %rax			; X64-SLOW-NEXT: shrq %cl, %rax
	; X64-SLOW-NEXT: orq %rsi, %rax			; X64-SLOW-NEXT: orq %rdi, %rax
	; X64-SLOW-NEXT: testb %dl, %dl
	; X64-SLOW-NEXT: cmoveq %rdi, %rax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i64 @llvm.fshl.i64(i64 %x, i64 %y, i64 %z)			%tmp = tail call i64 @llvm.fshl.i64(i64 %x, i64 %y, i64 %z)
	ret i64 %tmp			ret i64 %tmp
	}			}

	;			;
	; Const Funnel Shift			; Const Funnel Shift
	;			;
	(234 lines not shown)

llvm/test/CodeGen/X86/fshr.ll

	(59 lines not shown)
	; X86-FAST-NEXT: movzwl {{[0-9]+}}(%esp), %eax			; X86-FAST-NEXT: movzwl {{[0-9]+}}(%esp), %eax
	; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl			; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-FAST-NEXT: andb $15, %cl			; X86-FAST-NEXT: andb $15, %cl
	; X86-FAST-NEXT: shrdw %cl, %dx, %ax			; X86-FAST-NEXT: shrdw %cl, %dx, %ax
	; X86-FAST-NEXT: retl			; X86-FAST-NEXT: retl
	;			;
	; X86-SLOW-LABEL: var_shift_i16:			; X86-SLOW-LABEL: var_shift_i16:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %edi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: pushl %esi			; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl			; X86-SLOW-NEXT: andb $15, %cl
	; X86-SLOW-NEXT: andb $15, %dl			; X86-SLOW-NEXT: shrl %cl, %edx
	; X86-SLOW-NEXT: movzwl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: addl %eax, %eax
	; X86-SLOW-NEXT: movl %eax, %edi			; X86-SLOW-NEXT: xorb $15, %cl
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: shll %cl, %eax
	; X86-SLOW-NEXT: shrl %cl, %edi			; X86-SLOW-NEXT: orl %edx, %eax
	; X86-SLOW-NEXT: movb $16, %cl
	; X86-SLOW-NEXT: subb %dl, %cl
	; X86-SLOW-NEXT: shll %cl, %esi
	; X86-SLOW-NEXT: testb %dl, %dl
	; X86-SLOW-NEXT: je .LBB1_2
	; X86-SLOW-NEXT: # %bb.1:
	; X86-SLOW-NEXT: orl %edi, %esi
	; X86-SLOW-NEXT: movl %esi, %eax
	; X86-SLOW-NEXT: .LBB1_2:
	; X86-SLOW-NEXT: # kill: def $ax killed $ax killed $eax			; X86-SLOW-NEXT: # kill: def $ax killed $ax killed $eax
	; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i16:			; X64-FAST-LABEL: var_shift_i16:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movl %edx, %ecx			; X64-FAST-NEXT: movl %edx, %ecx
	; X64-FAST-NEXT: movl %esi, %eax			; X64-FAST-NEXT: movl %esi, %eax
	; X64-FAST-NEXT: andb $15, %cl			; X64-FAST-NEXT: andb $15, %cl
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-FAST-NEXT: shrdw %cl, %di, %ax			; X64-FAST-NEXT: shrdw %cl, %di, %ax
	; X64-FAST-NEXT: # kill: def $ax killed $ax killed $eax			; X64-FAST-NEXT: # kill: def $ax killed $ax killed $eax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i16:			; X64-SLOW-LABEL: var_shift_i16:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: movzwl %si, %eax
	; X64-SLOW-NEXT: andb $15, %dl
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: movl %edx, %ecx
	; X64-SLOW-NEXT: shrl %cl, %eax			; X64-SLOW-NEXT: # kill: def $edi killed $edi def $rdi
	; X64-SLOW-NEXT: movb $16, %cl			; X64-SLOW-NEXT: movzwl %si, %edx
	; X64-SLOW-NEXT: subb %dl, %cl			; X64-SLOW-NEXT: andb $15, %cl
	; X64-SLOW-NEXT: shll %cl, %edi			; X64-SLOW-NEXT: shrl %cl, %edx
	; X64-SLOW-NEXT: orl %edi, %eax			; X64-SLOW-NEXT: leal (%rdi,%rdi), %eax
	; X64-SLOW-NEXT: testb %dl, %dl			; X64-SLOW-NEXT: xorb $15, %cl
	; X64-SLOW-NEXT: cmovel %esi, %eax			; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $ecx
				; X64-SLOW-NEXT: shll %cl, %eax
				; X64-SLOW-NEXT: orl %edx, %eax
	; X64-SLOW-NEXT: # kill: def $ax killed $ax killed $eax			; X64-SLOW-NEXT: # kill: def $ax killed $ax killed $eax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i16 @llvm.fshr.i16(i16 %x, i16 %y, i16 %z)			%tmp = tail call i16 @llvm.fshr.i16(i16 %x, i16 %y, i16 %z)
	ret i16 %tmp			ret i16 %tmp
	}			}
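
For the `fshr` direction, the extra shift by one lands on X and shows up in the new X64-SLOW column as `leal (%rdi,%rdi), %eax`, that is, x + x. The sketch below models that sequence on plain integers, assuming the usual x86-64 System V argument registers (x in `%edi`, y in `%esi`, z in `%edx`); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of llvm.fshr.i16: low 16 bits of (X:Y) >> (Z % 16).
uint16_t fshr16_ref(uint16_t x, uint16_t y, uint16_t z) {
  uint32_t concat = (static_cast<uint32_t>(x) << 16) | y;
  return static_cast<uint16_t>(concat >> (z % 16));
}

// Model of the new X64-SLOW sequence.
uint16_t fshr16_slow_model(uint16_t x, uint16_t y, uint8_t z) {
  uint8_t cl = z;                     // movl %edx, %ecx (only %cl is used)
  uint32_t edx = y;                   // movzwl %si, %edx
  cl &= 15;                           // andb $15, %cl
  edx >>= cl;                         // shrl %cl, %edx
  uint32_t eax = uint32_t{x} + x;     // leal (%rdi,%rdi), %eax  (x << 1)
  cl ^= 15;                           // xorb $15, %cl           (15 - cl)
  eax <<= cl;                         // shll %cl, %eax
  eax |= edx;                         // orl %edx, %eax
  return static_cast<uint16_t>(eax);  // result is read from %ax
}
```

With a shift amount of 0, x ends up shifted left by 16 in total, so none of it reaches `%ax` and the result is y.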

	define i32 @var_shift_i32(i32 %x, i32 %y, i32 %z) nounwind {			define i32 @var_shift_i32(i32 %x, i32 %y, i32 %z) nounwind {
	; X86-FAST-LABEL: var_shift_i32:			; X86-FAST-LABEL: var_shift_i32:
	; X86-FAST: # %bb.0:			; X86-FAST: # %bb.0:
	; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl			; X86-FAST-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-FAST-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-FAST-NEXT: shrdl %cl, %edx, %eax			; X86-FAST-NEXT: shrdl %cl, %edx, %eax
	; X86-FAST-NEXT: retl			; X86-FAST-NEXT: retl
	;			;
	; X86-SLOW-LABEL: var_shift_i32:			; X86-SLOW-LABEL: var_shift_i32:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %edi
	; X86-SLOW-NEXT: pushl %esi
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %dl
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl %eax, %edi			; X86-SLOW-NEXT: movb {{[0-9]+}}(%esp), %cl
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: shrl %cl, %edi			; X86-SLOW-NEXT: shrl %cl, %edx
	; X86-SLOW-NEXT: andb $31, %dl			; X86-SLOW-NEXT: addl %eax, %eax
	; X86-SLOW-NEXT: movl %edx, %ecx			; X86-SLOW-NEXT: andb $31, %cl
	; X86-SLOW-NEXT: negb %cl			; X86-SLOW-NEXT: xorb $31, %cl
	; X86-SLOW-NEXT: shll %cl, %esi			; X86-SLOW-NEXT: shll %cl, %eax
	; X86-SLOW-NEXT: testb %dl, %dl			; X86-SLOW-NEXT: orl %edx, %eax
	; X86-SLOW-NEXT: je .LBB2_2
	; X86-SLOW-NEXT: # %bb.1:
	; X86-SLOW-NEXT: orl %edi, %esi
	; X86-SLOW-NEXT: movl %esi, %eax
	; X86-SLOW-NEXT: .LBB2_2:
	; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i32:			; X64-FAST-LABEL: var_shift_i32:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movl %edx, %ecx			; X64-FAST-NEXT: movl %edx, %ecx
	; X64-FAST-NEXT: movl %esi, %eax			; X64-FAST-NEXT: movl %esi, %eax
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-FAST-NEXT: shrdl %cl, %edi, %eax			; X64-FAST-NEXT: shrdl %cl, %edi, %eax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i32:			; X64-SLOW-LABEL: var_shift_i32:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: movl %edi, %eax
	; X64-SLOW-NEXT: movl %esi, %edi
	; X64-SLOW-NEXT: movl %edx, %ecx
	; X64-SLOW-NEXT: shrl %cl, %edi
	; X64-SLOW-NEXT: andb $31, %dl
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: movl %edx, %ecx
	; X64-SLOW-NEXT: negb %cl			; X64-SLOW-NEXT: # kill: def $edi killed $edi def $rdi
				; X64-SLOW-NEXT: shrl %cl, %esi
				; X64-SLOW-NEXT: leal (%rdi,%rdi), %eax
				; X64-SLOW-NEXT: andb $31, %cl
				; X64-SLOW-NEXT: xorb $31, %cl
				; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-SLOW-NEXT: shll %cl, %eax			; X64-SLOW-NEXT: shll %cl, %eax
	; X64-SLOW-NEXT: orl %edi, %eax			; X64-SLOW-NEXT: orl %esi, %eax
	; X64-SLOW-NEXT: testb %dl, %dl
	; X64-SLOW-NEXT: cmovel %esi, %eax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i32 @llvm.fshr.i32(i32 %x, i32 %y, i32 %z)			%tmp = tail call i32 @llvm.fshr.i32(i32 %x, i32 %y, i32 %z)
	ret i32 %tmp			ret i32 %tmp
	}			}

	define i32 @var_shift_i32_optsize(i32 %x, i32 %y, i32 %z) nounwind optsize {			define i32 @var_shift_i32_optsize(i32 %x, i32 %y, i32 %z) nounwind optsize {
	; X86-LABEL: var_shift_i32_optsize:			; X86-LABEL: var_shift_i32_optsize:
	; X86: # %bb.0:			; X86: # %bb.0:
	[... 90 lines not shown ...]
	; X86-SLOW-LABEL: var_shift_i64:			; X86-SLOW-LABEL: var_shift_i64:
	; X86-SLOW: # %bb.0:			; X86-SLOW: # %bb.0:
	; X86-SLOW-NEXT: pushl %ebp			; X86-SLOW-NEXT: pushl %ebp
	; X86-SLOW-NEXT: pushl %ebx			; X86-SLOW-NEXT: pushl %ebx
	; X86-SLOW-NEXT: pushl %edi			; X86-SLOW-NEXT: pushl %edi
	; X86-SLOW-NEXT: pushl %esi			; X86-SLOW-NEXT: pushl %esi
	; X86-SLOW-NEXT: subl $8, %esp			; X86-SLOW-NEXT: subl $8, %esp
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx
	; X86-SLOW-NEXT: andl $63, %ebx			; X86-SLOW-NEXT: andl $63, %ebx
	; X86-SLOW-NEXT: movb $64, %al			; X86-SLOW-NEXT: movb $64, %ch
	; X86-SLOW-NEXT: subb %bl, %al			; X86-SLOW-NEXT: subb %bl, %ch
	; X86-SLOW-NEXT: movl %edx, (%esp) # 4-byte Spill
	; X86-SLOW-NEXT: movl %eax, %ecx
	; X86-SLOW-NEXT: shll %cl, %edx
	; X86-SLOW-NEXT: movb %al, %ch
	; X86-SLOW-NEXT: andb $31, %ch
	; X86-SLOW-NEXT: movb %ch, %cl			; X86-SLOW-NEXT: movb %ch, %cl
	; X86-SLOW-NEXT: negb %cl			; X86-SLOW-NEXT: shll %cl, %edx
	; X86-SLOW-NEXT: movl %esi, %edi			; X86-SLOW-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; X86-SLOW-NEXT: shrl %cl, %edi			; X86-SLOW-NEXT: movl %esi, %edx
	; X86-SLOW-NEXT: testb %ch, %ch			; X86-SLOW-NEXT: andb $31, %cl
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebp			; X86-SLOW-NEXT: xorb $31, %cl
	; X86-SLOW-NEXT: je .LBB5_2			; X86-SLOW-NEXT: shrl %esi
	; X86-SLOW-NEXT: # %bb.1:			; X86-SLOW-NEXT: shrl %cl, %esi
	; X86-SLOW-NEXT: orl %edi, %edx			; X86-SLOW-NEXT: movb %bl, %cl
	; X86-SLOW-NEXT: movl %edx, (%esp) # 4-byte Spill			; X86-SLOW-NEXT: shrl %cl, %eax
	; X86-SLOW-NEXT: .LBB5_2:			; X86-SLOW-NEXT: andb $31, %cl
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-SLOW-NEXT: xorb $31, %cl
	; X86-SLOW-NEXT: movl %ebx, %ecx
	; X86-SLOW-NEXT: shrl %cl, %edx
	; X86-SLOW-NEXT: movb %bl, %ah
	; X86-SLOW-NEXT: andb $31, %ah
	; X86-SLOW-NEXT: movb %ah, %cl
	; X86-SLOW-NEXT: negb %cl
	; X86-SLOW-NEXT: movl %ebp, %edi
	; X86-SLOW-NEXT: shll %cl, %edi
	; X86-SLOW-NEXT: testb %ah, %ah
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; X86-SLOW-NEXT: je .LBB5_4
	; X86-SLOW-NEXT: # %bb.3:
	; X86-SLOW-NEXT: orl %edx, %edi
	; X86-SLOW-NEXT: movl %edi, %ebp
	; X86-SLOW-NEXT: .LBB5_4:
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X86-SLOW-NEXT: movl %ebx, %ecx			; X86-SLOW-NEXT: leal (%edi,%edi), %ebp
				; X86-SLOW-NEXT: shll %cl, %ebp
				; X86-SLOW-NEXT: movb %bl, %cl
	; X86-SLOW-NEXT: shrl %cl, %edi			; X86-SLOW-NEXT: shrl %cl, %edi
	; X86-SLOW-NEXT: testb $32, %bl			; X86-SLOW-NEXT: testb $32, %bl
	; X86-SLOW-NEXT: je .LBB5_6			; X86-SLOW-NEXT: jne .LBB5_1
	; X86-SLOW-NEXT: # %bb.5:			; X86-SLOW-NEXT: # %bb.2:
				; X86-SLOW-NEXT: orl %eax, %ebp
				; X86-SLOW-NEXT: jmp .LBB5_3
				; X86-SLOW-NEXT: .LBB5_1:
	; X86-SLOW-NEXT: movl %edi, %ebp			; X86-SLOW-NEXT: movl %edi, %ebp
	; X86-SLOW-NEXT: xorl %edi, %edi			; X86-SLOW-NEXT: xorl %edi, %edi
				; X86-SLOW-NEXT: .LBB5_3:
				; X86-SLOW-NEXT: movb %ch, %cl
				; X86-SLOW-NEXT: shll %cl, %edx
				; X86-SLOW-NEXT: testb $32, %ch
				; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X86-SLOW-NEXT: jne .LBB5_4
				; X86-SLOW-NEXT: # %bb.5:
				; X86-SLOW-NEXT: movl %edx, (%esp) # 4-byte Spill
				; X86-SLOW-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
				; X86-SLOW-NEXT: orl %esi, %ecx
				; X86-SLOW-NEXT: jmp .LBB5_6
				; X86-SLOW-NEXT: .LBB5_4:
				; X86-SLOW-NEXT: movl %edx, %ecx
				; X86-SLOW-NEXT: movl $0, (%esp) # 4-byte Folded Spill
	; X86-SLOW-NEXT: .LBB5_6:			; X86-SLOW-NEXT: .LBB5_6:
	; X86-SLOW-NEXT: movl %eax, %ecx
	; X86-SLOW-NEXT: shll %cl, %esi
	; X86-SLOW-NEXT: testb $32, %al
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-SLOW-NEXT: jne .LBB5_7
	; X86-SLOW-NEXT: # %bb.8:
	; X86-SLOW-NEXT: movl (%esp), %eax # 4-byte Reload
	; X86-SLOW-NEXT: testl %ebx, %ebx
	; X86-SLOW-NEXT: jne .LBB5_10
	; X86-SLOW-NEXT: jmp .LBB5_11
	; X86-SLOW-NEXT: .LBB5_7:
	; X86-SLOW-NEXT: movl %esi, %eax
	; X86-SLOW-NEXT: xorl %esi, %esi
	; X86-SLOW-NEXT: testl %ebx, %ebx			; X86-SLOW-NEXT: testl %ebx, %ebx
	; X86-SLOW-NEXT: je .LBB5_11			; X86-SLOW-NEXT: je .LBB5_8
	; X86-SLOW-NEXT: .LBB5_10:			; X86-SLOW-NEXT: # %bb.7:
	; X86-SLOW-NEXT: orl %ebp, %esi			; X86-SLOW-NEXT: movl (%esp), %eax # 4-byte Reload
	; X86-SLOW-NEXT: orl %edi, %eax			; X86-SLOW-NEXT: orl %ebp, %eax
	; X86-SLOW-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; X86-SLOW-NEXT: orl %edi, %ecx
	; X86-SLOW-NEXT: movl %eax, %edx			; X86-SLOW-NEXT: movl %ecx, %edx
	; X86-SLOW-NEXT: .LBB5_11:			; X86-SLOW-NEXT: .LBB5_8:
	; X86-SLOW-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
	; X86-SLOW-NEXT: addl $8, %esp			; X86-SLOW-NEXT: addl $8, %esp
	; X86-SLOW-NEXT: popl %esi			; X86-SLOW-NEXT: popl %esi
	; X86-SLOW-NEXT: popl %edi			; X86-SLOW-NEXT: popl %edi
	; X86-SLOW-NEXT: popl %ebx			; X86-SLOW-NEXT: popl %ebx
	; X86-SLOW-NEXT: popl %ebp			; X86-SLOW-NEXT: popl %ebp
	; X86-SLOW-NEXT: retl			; X86-SLOW-NEXT: retl
	;			;
	; X64-FAST-LABEL: var_shift_i64:			; X64-FAST-LABEL: var_shift_i64:
	; X64-FAST: # %bb.0:			; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: movq %rdx, %rcx			; X64-FAST-NEXT: movq %rdx, %rcx
	; X64-FAST-NEXT: movq %rsi, %rax			; X64-FAST-NEXT: movq %rsi, %rax
	; X64-FAST-NEXT: # kill: def $cl killed $cl killed $rcx			; X64-FAST-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-FAST-NEXT: shrdq %cl, %rdi, %rax			; X64-FAST-NEXT: shrdq %cl, %rdi, %rax
	; X64-FAST-NEXT: retq			; X64-FAST-NEXT: retq
	;			;
	; X64-SLOW-LABEL: var_shift_i64:			; X64-SLOW-LABEL: var_shift_i64:
	; X64-SLOW: # %bb.0:			; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: movq %rdi, %rax			; X64-SLOW-NEXT: movq %rdx, %rcx
	; X64-SLOW-NEXT: movq %rsi, %rdi			; X64-SLOW-NEXT: shrq %cl, %rsi
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: leaq (%rdi,%rdi), %rax
	; X64-SLOW-NEXT: shrq %cl, %rdi			; X64-SLOW-NEXT: andb $63, %cl
	; X64-SLOW-NEXT: andb $63, %dl			; X64-SLOW-NEXT: xorb $63, %cl
	; X64-SLOW-NEXT: movl %edx, %ecx			; X64-SLOW-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-SLOW-NEXT: negb %cl
	; X64-SLOW-NEXT: shlq %cl, %rax			; X64-SLOW-NEXT: shlq %cl, %rax
	; X64-SLOW-NEXT: orq %rdi, %rax			; X64-SLOW-NEXT: orq %rsi, %rax
	; X64-SLOW-NEXT: testb %dl, %dl
	; X64-SLOW-NEXT: cmoveq %rsi, %rax
	; X64-SLOW-NEXT: retq			; X64-SLOW-NEXT: retq
	%tmp = tail call i64 @llvm.fshr.i64(i64 %x, i64 %y, i64 %z)			%tmp = tail call i64 @llvm.fshr.i64(i64 %x, i64 %y, i64 %z)
	ret i64 %tmp			ret i64 %tmp
	}			}
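
For readers comparing the two columns of the SLOW check lines above, the following is a minimal C++ sketch of the two expansion strategies they appear to reflect. It is not the code from TargetLowering.cpp. The function names `fshr_select` and `fshr_shift1` are hypothetical, and the mapping to the assembly (`negb`/`testb`/`cmove` on the left, `addl` or `leal` for the shift-by-1 plus `xorb $31` or `xorb $63` on the right) is inferred from the diff and the revision summary.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Old-style expansion (assumed): a shift by (BW - amt) is out of range when
// amt == 0, so the zero case is handled with a compare and select.
template <typename T>
T fshr_select(T x, T y, T z) {
  static_assert(std::is_unsigned<T>::value, "unsigned integer type expected");
  constexpr T bits = std::numeric_limits<T>::digits;
  T amt = z % bits;
  if (amt == 0)
    return y; // select: result is just the low operand
  return static_cast<T>(x << (bits - amt)) | static_cast<T>(y >> amt);
}

// New-style expansion (assumed): split the left shift into a shift by 1
// followed by a shift by (BW - 1 - amt). Both amounts are always in range,
// and for amt == 0 the high operand is shifted out entirely.
template <typename T>
T fshr_shift1(T x, T y, T z) {
  static_assert(std::is_unsigned<T>::value, "unsigned integer type expected");
  constexpr T mask = std::numeric_limits<T>::digits - 1;
  T amt = z & mask;   // andb $31 / andb $63
  T inv = amt ^ mask; // xorb $31 / xorb $63, i.e. BW - 1 - amt
  T hi = static_cast<T>(static_cast<T>(x << 1) << inv);
  return hi | static_cast<T>(y >> amt);
}
```

Both functions should agree for every shift amount, including multiples of the bit width, which is the case the compare and select used to cover.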

	;			;
	; Const Funnel Shift			; Const Funnel Shift
	;			;
	[... 233 lines not shown ...]