This is an archive of the discontinued LLVM Phabricator instance.

Differential D58009

[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
ClosedPublic

Authored by RKSimon on Feb 9 2019, 2:39 PM.

Details

Reviewers

spatel
lebedev.ri
nikic

Commits

rG5a82a788a28e: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
rL353645: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts

Summary

Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.
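
As an illustration of why these folds are sound, here is a minimal C++ sketch of the 32-bit funnel shift semantics. The helper names `fshl32` and `fshr32` are hypothetical reference models and not LLVM APIs. The constants 9 and 23 match the `_cst` tests in funnel-shift.ll. An `undef` operand may be chosen to be zero, so the same identities cover the undef cases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of llvm.fshl.i32: concatenate Hi:Lo,
// shift left by (Amt mod 32), and return the high 32 bits.
constexpr uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? Hi : (Hi << Amt) | (Lo >> (32 - Amt));
}

// Hypothetical reference model of llvm.fshr.i32: same concatenation,
// shifted right by (Amt mod 32), returning the low 32 bits.
constexpr uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? Lo : (Lo >> Amt) | (Hi << (32 - Amt));
}

// With one operand zero and a constant amount C in (0, 32), one half of
// the OR vanishes and a plain shift remains.
void checkConstantFolds(uint32_t X) {
  assert(fshl32(0, X, 9) == (X >> 23)); // fshl(0, N1, C) == lshr(N1, BW-C)
  assert(fshr32(0, X, 9) == (X >> 9));  // fshr(0, N1, C) == lshr(N1, C)
  assert(fshl32(X, 0, 9) == (X << 9));  // fshl(N0, 0, C) == shl(N0, C)
  assert(fshr32(X, 0, 9) == (X << 23)); // fshr(N0, 0, C) == shl(N0, BW-C)
}
```

Calling `checkConstantFolds` with any 32-bit value exercises all four identities. The case C == 0 is excluded because the funnel shift then returns one operand unchanged.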

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. Feb 9 2019, 2:39 PM

Herald added a project: Restricted Project. Feb 9 2019, 2:39 PM

Some thoughts.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7119	What should be done if N2 is `undef`? Pretend that it is `0`, or replace the entire op with `undef`?
7148	Since we can replace `undef` with something, e.g. `0`, we should do the same if it's `0` too.
7152	Same
7158–7159	Do `ISD::SRL` / `ISD::SHL` implicitly take the modulo of the shift amount?
7161	Same remark re `0`.
7163	Same remark re `0`.

nikic added a subscriber: nikic. Feb 10 2019, 1:13 AM

nikic added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7119	Pretending it is zero would be consistent with InstSimplify: https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/InstructionSimplify.cpp#L5129 Replacing with undef is not legal (consider for example N0=0, N1=0, which has zero as the only possible result, regardless of N2).
7148	This would also be consistent with InstCombine: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1996
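
The counterexample above can be checked mechanically. The following is a small sketch, using a hypothetical `fshl32` reference model rather than any LLVM API, which confirms that with both data operands zero the result is zero for every shift amount, so the whole operation cannot be replaced by `undef`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of llvm.fshl.i32 (not an LLVM API).
constexpr uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? Hi : (Hi << Amt) | (Lo >> (32 - Amt));
}

// Returns true if fshl(0, 0, Amt) is 0 for every Amt in [0, 256),
// i.e. the result does not depend on the (possibly undef) amount.
constexpr bool zeroInputsAlwaysYieldZero() {
  for (uint32_t Amt = 0; Amt < 256; ++Amt)
    if (fshl32(0, 0, Amt) != 0)
      return false;
  return true;
}
```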

RKSimon marked 2 inline comments as done. Feb 10 2019, 3:34 AM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7148	Sure, I can add support for zero as well.
7158–7159	Ah, good point! Will limit this to PowerOf2 cases that have passed the MaskedValueIsZero check above.

lebedev.ri added inline comments. Feb 10 2019, 4:40 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7158–7159	By modulo I meant that a funnel shift implicitly `urem`s the shift amount by the bitwidth. So

	%r = fshl %a, 0, %c
	=>
	%r = shl %a, %c

is a miscompile. I.e. this should be folded to

	%r = fshl i32 %a, 0, %c
	=>
	%cmodwidth = and i32 %c, 31 ; <- !!!
	%r = shl i32 %a, %cmodwidth
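
To make the hazard concrete, here is a hedged C++ sketch of the masked fold. `fshl32` and `shlMasked` are hypothetical helper names for illustration only. An unmasked `A << 32` is undefined behaviour in C++, much as an out-of-range `shl` does not yield a defined value in LLVM IR, which is why the amount has to be masked first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of llvm.fshl.i32 (not an LLVM API).
constexpr uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  return Amt == 0 ? Hi : (Hi << Amt) | (Lo >> (32 - Amt));
}

// The fold suggested in the review: reduce the amount modulo the bit
// width (a power of 2, so a mask suffices), then do a plain shift.
constexpr uint32_t shlMasked(uint32_t A, uint32_t C) {
  return A << (C & 31u);
}

// Returns true if the masked shift agrees with fshl(A, 0, C) for all
// amounts in [0, 128), including the out-of-range ones.
constexpr bool maskedFoldMatches(uint32_t A) {
  for (uint32_t C = 0; C < 128; ++C)
    if (fshl32(A, 0, C) != shlMasked(A, C))
      return false;
  return true;
}
```

For example, with an amount of 32 both sides return `A` unchanged, and with 33 both return `A << 1`.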

RKSimon mentioned this in rL353640: [X86] Add additional tests for funnel undef/zero argument combines. Feb 10 2019, 6:57 AM

RKSimon mentioned this in rG76683e7b5800: [X86] Add additional tests for funnel undef/zero argument combines.

Add support for folding cases with zero arguments.

Limited the variable cases to those where we know the shift amount is in range.

I'd prefer to deal with the undef shift amounts in a separate patch as that's mostly separate from this fold logic.

RKSimon added a reviewer: nikic. Feb 10 2019, 8:07 AM

Looks ok to me.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7161	Hmm, if `FeatureSlowSHLD` is set, or we have BMI2 (and thus sh[lr]x, not depending on `%cl` reg)? Also, to be noted `sub 32, n` should get folded to `neg n`, IIRC.
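
The `sub 32, n` to `neg n` remark relies on the shift count being reduced modulo 32. A minimal sketch, assuming a target such as x86 that uses only the low 5 bits of the count for 32-bit shifts (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Models a target that only looks at the low 5 bits of a 32-bit
// shift count (an assumption of this sketch).
constexpr uint32_t maskedAmount(uint32_t N) { return N & 31u; }

// (32 - N) and (0 - N) differ by exactly 32, so they agree in their
// low 5 bits and select the same effective shift amount.
constexpr bool subMatchesNeg(uint32_t N) {
  return maskedAmount(32u - N) == maskedAmount(0u - N);
}
```

For N = 9 both expressions give an effective amount of 23.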

This revision is now accepted and ready to land. Feb 10 2019, 8:19 AM

nikic added inline comments. Feb 10 2019, 8:26 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7132	Maybe pass AllowUndefs=true, as we want to deal with both undef/zero anyway?
7147	Should be s/lshr/shl in the two latter comments.
7163	Shouldn't this be either setting the `BitWidth - Log2_32(BitWidth)` high bits, or maybe use `getBitsSetFrom()` instead? I think right now this is checking too few bits.
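
The last remark is about which bits of the shift amount must be known zero to prove it is in range. The revised line under discussion is not shown on this page, so the following is only a sketch of the idea using plain 32-bit integers instead of `APInt`; all helper names are hypothetical. For a power-of-2 bit width, the amount is below the bit width exactly when every bit from position log2(BitWidth) upward is zero, which is `32 - log2(BitWidth)` high bits for a 32-bit amount.

```cpp
#include <cassert>
#include <cstdint>

// Integer log2 for a power of 2 (hypothetical helper).
constexpr unsigned log2u(uint32_t V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

// Bits of a 32-bit amount that must be zero for Amt < BitWidth,
// where BitWidth is a power of 2: everything above the low log2 bits.
constexpr uint32_t outOfRangeMask(uint32_t BitWidth) {
  return ~(BitWidth - 1u);
}

// KnownZero has a 1 for each bit of the amount proven to be zero.
// The amount is provably in range only if the whole mask is covered.
constexpr bool amountKnownInRange(uint32_t KnownZero, uint32_t BitWidth) {
  return (KnownZero & outOfRangeMask(BitWidth)) == outOfRangeMask(BitWidth);
}
```

An amount produced by `and i32 %c, 31` has known-zero bits `0xFFFFFFE0` and passes the check for a bit width of 32. An amount zero-extended from i8 has known-zero bits `0xFFFFFF00` and does not, because bits 5 to 7 may still be set.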

RKSimon marked an inline comment as done. Feb 10 2019, 8:54 AM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7163	Nice catch!

Closed by commit rL353645: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts (authored by RKSimon). Feb 10 2019, 9:03 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 353616)	33 lines
test/CodeGen/X86/funnel-shift.ll (revision 353628)	26 lines

Diff 186135

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[7,110 earlier lines not shown]	SDValue DAGCombiner::visitSRL(SDNode *N) {

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFunnelShift(SDNode *N) {		SDValue DAGCombiner::visitFunnelShift(SDNode *N) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
SDValue N2 = N->getOperand(2);		SDValue N2 = N->getOperand(2);
		lebedev.ri: What should be done if N2 is `undef`? Pretend that it is `0`, or replace the entire op with `undef`?
		nikic: Pretending it is zero would be consistent with InstSimplify: https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/InstructionSimplify.cpp#L5129 Replacing with undef is not legal (consider for example N0=0, N1=0, which has zero as the only possible result, regardless of N2).
bool IsFSHL = N->getOpcode() == ISD::FSHL;		bool IsFSHL = N->getOpcode() == ISD::FSHL;
unsigned BitWidth = VT.getScalarSizeInBits();		unsigned BitWidth = VT.getScalarSizeInBits();

// fold (fshl N0, N1, 0) -> N0		// fold (fshl N0, N1, 0) -> N0
// fold (fshr N0, N1, 0) -> N1		// fold (fshr N0, N1, 0) -> N1
if (isPowerOf2_32(BitWidth))		if (isPowerOf2_32(BitWidth))
if (DAG.MaskedValueIsZero(		if (DAG.MaskedValueIsZero(
N2, APInt(N2.getScalarValueSizeInBits(), BitWidth - 1)))		N2, APInt(N2.getScalarValueSizeInBits(), BitWidth - 1)))
return IsFSHL ? N0 : N1;		return IsFSHL ? N0 : N1;

// fold (fsh* N0, N1, c) -> (fsh* N0, N1, c % BitWidth)
if (ConstantSDNode *Cst = isConstOrConstSplat(N2)) {		if (ConstantSDNode *Cst = isConstOrConstSplat(N2)) {
		EVT ShAmtTy = N2.getValueType();

		nikic: Maybe pass AllowUndefs=true, as we want to deal with both undef/zero anyway?
		// fold (fsh* N0, N1, c) -> (fsh* N0, N1, c % BitWidth)
if (Cst->getAPIntValue().uge(BitWidth)) {		if (Cst->getAPIntValue().uge(BitWidth)) {
uint64_t RotAmt = Cst->getAPIntValue().urem(BitWidth);		uint64_t RotAmt = Cst->getAPIntValue().urem(BitWidth);
return DAG.getNode(N->getOpcode(), SDLoc(N), VT, N0, N1,		return DAG.getNode(N->getOpcode(), SDLoc(N), VT, N0, N1,
DAG.getConstant(RotAmt, SDLoc(N), N2.getValueType()));		DAG.getConstant(RotAmt, SDLoc(N), ShAmtTy));
}		}

		unsigned ShAmt = Cst->getZExtValue();
		if (ShAmt == 0)
		return IsFSHL ? N0 : N1;

		// fold fshl(undef, N1, C) -> lshr(N1, BW-C)
		// fold fshr(undef, N1, C) -> lshr(N1, C)
		// fold fshl(N0, undef, C) -> lshr(N0, C)
		// fold fshr(N0, undef, C) -> lshr(N0, BW-C)
		nikic: Should be s/lshr/shl in the two latter comments.
		if (N0.isUndef())
		lebedev.ri: Since we can replace `undef` with something, e.g. `0`, we should do the same if it's `0` too.
		nikic: This would also be consistent with InstCombine: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1996
		RKSimon (author): Sure, I can add support for zero as well.
		return DAG.getNode(ISD::SRL, SDLoc(N), VT, N1,
		DAG.getConstant(IsFSHL ? BitWidth - ShAmt : ShAmt,
		SDLoc(N), ShAmtTy));
		if (N1.isUndef())
		lebedev.ri: Same
		return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
		DAG.getConstant(IsFSHL ? ShAmt : BitWidth - ShAmt,
		SDLoc(N), ShAmtTy));
}		}

		// fold fshr(undef, N1, N2) -> lshr(N1, N2)
		// fold fshl(N0, undef, N2) -> shl(N0, N2)
		lebedev.ri: Do `ISD::SRL` / `ISD::SHL` implicitly take the modulo of the shift amount?
		RKSimon (author): Ah, good point! Will limit this to PowerOf2 cases that have passed the MaskedValueIsZero check above.
		lebedev.ri: By modulo I meant that a funnel shift implicitly `urem`s the shift amount by the bitwidth. So
			%r = fshl %a, 0, %c
			=>
			%r = shl %a, %c
		is a miscompile. I.e. this should be folded to
			%r = fshl i32 %a, 0, %c
			=>
			%cmodwidth = and i32 %c, 31 ; <- !!!
			%r = shl i32 %a, %cmodwidth
		// TODO: when is it worth doing SUB(BW, N2) as well?
		if (N0.isUndef() && !IsFSHL)
		lebedev.ri: Same remark re `0`.
		lebedev.ri: Hmm, if `FeatureSlowSHLD` is set, or we have BMI2 (and thus sh[lr]x, not depending on `%cl` reg)? Also, to be noted `sub 32, n` should get folded to `neg n`, IIRC.
		return DAG.getNode(ISD::SRL, SDLoc(N), VT, N1, N2);
		if (N1.isUndef() && IsFSHL)
		lebedev.ri: Same remark re `0`.
		nikic: Shouldn't this be either setting the `BitWidth - Log2_32(BitWidth)` high bits, or maybe use `getBitsSetFrom()` instead? I think right now this is checking too few bits.
		RKSimon (author): Nice catch!
		return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0, N2);

// fold (fshl N0, N0, N2) -> (rotl N0, N2)		// fold (fshl N0, N0, N2) -> (rotl N0, N2)
// fold (fshr N0, N0, N2) -> (rotr N0, N2)		// fold (fshr N0, N0, N2) -> (rotr N0, N2)
// TODO: Investigate flipping this rotate if only one is legal, if funnel shift		// TODO: Investigate flipping this rotate if only one is legal, if funnel shift
// is legal as well we might be better off avoiding non-constant (BW - N2).		// is legal as well we might be better off avoiding non-constant (BW - N2).
unsigned RotOpc = IsFSHL ? ISD::ROTL : ISD::ROTR;		unsigned RotOpc = IsFSHL ? ISD::ROTL : ISD::ROTR;
if (N0 == N1 && hasOperation(RotOpc, VT))		if (N0 == N1 && hasOperation(RotOpc, VT))
return DAG.getNode(RotOpc, SDLoc(N), VT, N0, N2);		return DAG.getNode(RotOpc, SDLoc(N), VT, N0, N2);

[12,383 later lines not shown]

test/CodeGen/X86/funnel-shift.ll

[376 earlier lines not shown]	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 %a1)		%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef0_cst(i32 %a0) nounwind {		define i32 @fshl_i32_undef0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef0_cst:		; X32-SSE2-LABEL: fshl_i32_undef0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shldl $9, %eax, %eax		; X32-SSE2-NEXT: shrl $23, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef0_cst:		; X64-AVX2-LABEL: fshl_i32_undef0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: shldl $9, %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
		; X64-AVX2-NEXT: shrl $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 9)		%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef1(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_undef1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef1:		; X32-SSE2-LABEL: fshl_i32_undef1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl		; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shldl %cl, %eax, %eax		; X32-SSE2-NEXT: shll %cl, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef1:		; X64-AVX2-LABEL: fshl_i32_undef1:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %esi, %ecx		; X64-AVX2-NEXT: movl %esi, %ecx
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx
; X64-AVX2-NEXT: shldl %cl, %eax, %eax		; X64-AVX2-NEXT: shll %cl, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %a1)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef1_cst(i32 %a0) nounwind {		define i32 @fshl_i32_undef1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef1_cst:		; X32-SSE2-LABEL: fshl_i32_undef1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shldl $9, %eax, %eax		; X32-SSE2-NEXT: shll $9, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef1_cst:		; X64-AVX2-LABEL: fshl_i32_undef1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shldl $9, %eax, %eax		; X64-AVX2-NEXT: shll $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 9)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef0(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_undef0(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef0:		; X32-SSE2-LABEL: fshr_i32_undef0:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl		; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shrdl %cl, %eax, %eax		; X32-SSE2-NEXT: shrl %cl, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef0:		; X64-AVX2-LABEL: fshr_i32_undef0:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %esi, %ecx		; X64-AVX2-NEXT: movl %esi, %ecx
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx
; X64-AVX2-NEXT: shrdl %cl, %eax, %eax		; X64-AVX2-NEXT: shrl %cl, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %a1)		%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef0_cst(i32 %a0) nounwind {		define i32 @fshr_i32_undef0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef0_cst:		; X32-SSE2-LABEL: fshr_i32_undef0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shrdl $9, %eax, %eax		; X32-SSE2-NEXT: shrl $9, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef0_cst:		; X64-AVX2-LABEL: fshr_i32_undef0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shrdl $9, %eax, %eax		; X64-AVX2-NEXT: shrl $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 9)		%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef1(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_undef1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef1:		; X32-SSE2-LABEL: fshr_i32_undef1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
[11 lines not shown]	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 %a1)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef1_cst(i32 %a0) nounwind {		define i32 @fshr_i32_undef1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef1_cst:		; X32-SSE2-LABEL: fshr_i32_undef1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shrdl $9, %eax, %eax		; X32-SSE2-NEXT: shll $23, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef1_cst:		; X64-AVX2-LABEL: fshr_i32_undef1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: shrdl $9, %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
		; X64-AVX2-NEXT: shll $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 9)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 9)
ret i32 %res		ret i32 %res
}		}

; With constant shift amount, this is 'shrd' or 'shld'.		; With constant shift amount, this is 'shrd' or 'shld'.

define i32 @fshr_i32_const_shift(i32 %x, i32 %y) nounwind {		define i32 @fshr_i32_const_shift(i32 %x, i32 %y) nounwind {
[117 later lines not shown]