This is an archive of the discontinued LLVM Phabricator instance.

Differential D58009

[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
Closed, Public

Authored by RKSimon on Feb 9 2019, 2:39 PM.

Details

Reviewers

spatel
lebedev.ri
nikic

Commits

rG5a82a788a28e: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts
rL353645: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts

Summary

Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero.
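To make the folds concrete, here is a minimal sketch of the 32-bit funnel shift semantics and the four constant-amount rewrites this patch targets. The helpers `fshl32`, `fshr32` and `checkConstantFolds` are hypothetical names for illustration and are not part of LLVM; the model concatenates the two operands, shifts by the amount modulo 32, and keeps the high (fshl) or low (fshr) half.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of fshl on i32: concatenate Hi:Lo, shift left by
// Amt % 32, keep the most significant 32 bits.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// Illustrative model of fshr on i32: concatenate Hi:Lo, shift right by
// Amt % 32, keep the least significant 32 bits.
uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>(Concat >> (Amt % 32));
}

// With one data operand zero and a constant amount C in [1, 31], each funnel
// shift collapses to a plain shift of the remaining operand.
void checkConstantFolds(uint32_t X, uint32_t C) {
  assert(C > 0 && C < 32);
  assert(fshl32(0, X, C) == (X >> (32 - C))); // fshl(0, X, C) -> lshr(X, BW-C)
  assert(fshr32(0, X, C) == (X >> C));        // fshr(0, X, C) -> lshr(X, C)
  assert(fshl32(X, 0, C) == (X << C));        // fshl(X, 0, C) -> shl(X, C)
  assert(fshr32(X, 0, C) == (X << (32 - C))); // fshr(X, 0, C) -> shl(X, BW-C)
}
```

Calling `checkConstantFolds(0xDEADBEEFu, 9)` exercises the same amount used by the `*_cst` tests in funnel-shift.ll, where `fshl` by 9 with an undef or zero first operand becomes `shrl $23`.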

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. Feb 9 2019, 2:39 PM

Herald added a project: Restricted Project. Feb 9 2019, 2:39 PM

Some thoughts.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7119 ↗	(On Diff #186135)	What should be done if N2 is `undef`? Pretend that it is `0`, or replace the entire op with `undef`?
7148 ↗	(On Diff #186135)	Since we can replace `undef` with something, e.g. `0`, we should do the same if it's `0` too.
7152 ↗	(On Diff #186135)	Same
7158–7159 ↗	(On Diff #186135)	Do `ISD::SRL` / `ISD::SHL` implicitly take the modulo of the shift amount?
7161 ↗	(On Diff #186135)	Same remark re `0`.
7163 ↗	(On Diff #186135)	Same remark re `0`.

nikic added a subscriber: nikic. Feb 10 2019, 1:13 AM

nikic added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7119 ↗	(On Diff #186135)	Pretending it is zero would be consistent with InstSimplify: https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/InstructionSimplify.cpp#L5129 Replacing with undef is not legal (consider for example N0=0, N1=0, which has zero as the only possible result, regardless of N2).
7148 ↗	(On Diff #186135)	This would also be consistent with InstCombine: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1996
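The legality argument in the comment on line 7119 can be checked with a small sketch. `fshl32` is a hypothetical reference model, not an LLVM API: when both data operands are zero, every possible shift amount yields zero, so a node with an undef amount cannot be replaced by undef (which may take any value).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of fshl on i32 (not an LLVM function).
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// Whatever value an undef shift amount takes, fshl(0, 0, Amt) is 0.
// A result of undef could be observed as non-zero, so that fold is illegal.
void checkZeroOperands() {
  for (uint32_t Amt = 0; Amt < 256; ++Amt)
    assert(fshl32(0, 0, Amt) == 0);
}
```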

RKSimon marked 2 inline comments as done. Feb 10 2019, 3:34 AM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7148 ↗	(On Diff #186135)	Sure, I can add support for zero as well.
7158–7159 ↗	(On Diff #186135)	Ah, good point! I will limit this to power-of-2 cases that have passed the `MaskedValueIsZero` check above.

lebedev.ri added inline comments. Feb 10 2019, 4:40 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7158–7159 ↗	(On Diff #186135)	By modulo I meant that a funnel shift implicitly `urem`s the shift amount by the bitwidth. So
    %r = fshl %a, 0, %c
    =>
    %r = shl %a, %c
is a miscompile. I.e. this should be folded to
    %r = fshl i32 %a, 0, %c
    =>
    %cmodwidth = and i32 %c, 31 ; <- !!!
    %r = shl i32 %a, %cmodwidth
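The miscompile described in this comment can be reproduced on concrete values. This is a sketch only: `fshl32` and `shlMasked` are hypothetical helpers, and the unmasked `shl` is deliberately not executed because an out-of-range shift is poison in LLVM IR and undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of fshl on i32 (not an LLVM function).
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// The safe rewrite from the comment: mask the amount, then shift.
uint32_t shlMasked(uint32_t A, uint32_t C) { return A << (C & 31u); }

void checkModuloLowering() {
  // An amount of 33 behaves like an amount of 1 for the funnel shift.
  assert(fshl32(1, 0, 33) == 2);
  // The masked shl agrees; a plain `A << 33` would be out of range.
  assert(shlMasked(1, 33) == 2);
  // An amount equal to the bit width is a no-op for both forms.
  assert(fshl32(5, 0, 32) == 5);
  assert(shlMasked(5, 32) == 5);
}
```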

RKSimon mentioned this in rL353640: [X86] Add additional tests for funnel undef/zero argument combines. Feb 10 2019, 6:57 AM

RKSimon mentioned this in rG76683e7b5800: [X86] Add additional tests for funnel undef/zero argument combines.

Add support for folding cases with zero arguments.

Limited the variable-amount cases to those where we know the shift amount is in range.

I'd prefer to deal with the undef shift amounts in a separate patch as that's mostly separate from this fold logic.

RKSimon added a reviewer: nikic. Feb 10 2019, 8:07 AM

Looks ok to me.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7169 ↗	(On Diff #186153)	Hmm, if `FeatureSlowSHLD` is set, or we have BMI2 (and thus sh[lr]x, not depending on `%cl` reg)? Also, to be noted `sub 32, n` should get folded to `neg n`, IIRC.
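One way to read the `sub 32, n` remark: when the shifter only looks at the low 5 bits of the count, as 32-bit x86 shifts do, `32 - n` and `-n` select the same amount. The sketch below models that with a hypothetical `shrLow5` helper and does not reflect any specific LLVM combine.

```cpp
#include <cassert>
#include <cstdint>

// Models a shifter that only reads the low 5 bits of its count.
uint32_t shrLow5(uint32_t X, uint32_t Count) { return X >> (Count & 31u); }

// (32 - N) and (0 - N) are congruent modulo 32, so they pick the same
// effective shift amount once the count is masked to 5 bits.
void checkSubVersusNeg() {
  for (uint32_t N = 0; N < 64; ++N)
    assert(shrLow5(0xDEADBEEFu, 32u - N) == shrLow5(0xDEADBEEFu, 0u - N));
}
```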

This revision is now accepted and ready to land. Feb 10 2019, 8:19 AM

nikic added inline comments. Feb 10 2019, 8:26 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7133 ↗	(On Diff #186153)	Maybe pass AllowUndefs=true, as we want to deal with both undef/zero anyway?
7155 ↗	(On Diff #186153)	Should be s/lshr/shl in the two latter comments.
7171 ↗	(On Diff #186153)	Shouldn't this be either setting the `BitWidth - Log2_32(BitWidth)` high bits, or maybe use `getBitsSetFrom()` instead? I think right now this is checking too few bits.
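Diff #186153 is not shown here, so the following is only one reading of the "too few bits" remark. A mask of `BitWidth - 1` covers the low bits of the amount, which answers "is the amount a multiple of the bit width"; knowing the amount is in range requires the complementary high bits to be zero, which is what the committed `~ModuloBits` checks. The sketch uses a hypothetical `maskedValueIsZero` on concrete values, whereas `DAG.MaskedValueIsZero` reasons about known bits.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for "every bit selected by Mask is zero in V" on a concrete value.
bool maskedValueIsZero(uint32_t V, uint32_t Mask) { return (V & Mask) == 0; }

// Low-bits mask (BitWidth - 1): true when Amt % 32 == 0, i.e. the shift is a no-op.
bool amountIsMultipleOfWidth(uint32_t Amt) { return maskedValueIsZero(Amt, 31u); }

// High-bits mask ~(BitWidth - 1): true when Amt < 32, i.e. the shift is in range.
bool amountIsInRange(uint32_t Amt) { return maskedValueIsZero(Amt, ~31u); }

void checkMasks() {
  assert(amountIsMultipleOfWidth(32) && !amountIsInRange(32));
  assert(!amountIsMultipleOfWidth(7) && amountIsInRange(7));
  assert(amountIsInRange(31) && !amountIsInRange(33));
}
```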

RKSimon marked an inline comment as done. Feb 10 2019, 8:54 AM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7171 ↗	(On Diff #186153)	Nice catch!

Closed by commit rL353645: [DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts (authored by RKSimon). Feb 10 2019, 9:03 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	43 lines
llvm/trunk/test/CodeGen/X86/funnel-shift.ll	76 lines

Diff 186154

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

... (7,121 lines not shown) ...	SDValue DAGCombiner::visitFunnelShift(SDNode *N) {

// fold (fshl N0, N1, 0) -> N0		// fold (fshl N0, N1, 0) -> N0
// fold (fshr N0, N1, 0) -> N1		// fold (fshr N0, N1, 0) -> N1
if (isPowerOf2_32(BitWidth))		if (isPowerOf2_32(BitWidth))
if (DAG.MaskedValueIsZero(		if (DAG.MaskedValueIsZero(
N2, APInt(N2.getScalarValueSizeInBits(), BitWidth - 1)))		N2, APInt(N2.getScalarValueSizeInBits(), BitWidth - 1)))
return IsFSHL ? N0 : N1;		return IsFSHL ? N0 : N1;

// fold (fsh* N0, N1, c) -> (fsh* N0, N1, c % BitWidth)		auto IsUndefOrZero = [](SDValue V) {
		if (V.isUndef())
		return true;
		if (ConstantSDNode *Cst = isConstOrConstSplat(V, /*AllowUndefs*/true))
		return Cst->getAPIntValue() == 0;
		return false;
		};

if (ConstantSDNode *Cst = isConstOrConstSplat(N2)) {		if (ConstantSDNode *Cst = isConstOrConstSplat(N2)) {
		EVT ShAmtTy = N2.getValueType();

		// fold (fsh* N0, N1, c) -> (fsh* N0, N1, c % BitWidth)
if (Cst->getAPIntValue().uge(BitWidth)) {		if (Cst->getAPIntValue().uge(BitWidth)) {
uint64_t RotAmt = Cst->getAPIntValue().urem(BitWidth);		uint64_t RotAmt = Cst->getAPIntValue().urem(BitWidth);
return DAG.getNode(N->getOpcode(), SDLoc(N), VT, N0, N1,		return DAG.getNode(N->getOpcode(), SDLoc(N), VT, N0, N1,
DAG.getConstant(RotAmt, SDLoc(N), N2.getValueType()));		DAG.getConstant(RotAmt, SDLoc(N), ShAmtTy));
}		}

		unsigned ShAmt = Cst->getZExtValue();
		if (ShAmt == 0)
		return IsFSHL ? N0 : N1;

		// fold fshl(undef_or_zero, N1, C) -> lshr(N1, BW-C)
		// fold fshr(undef_or_zero, N1, C) -> lshr(N1, C)
		// fold fshl(N0, undef_or_zero, C) -> shl(N0, C)
		// fold fshr(N0, undef_or_zero, C) -> shl(N0, BW-C)
		if (IsUndefOrZero(N0))
		return DAG.getNode(ISD::SRL, SDLoc(N), VT, N1,
		DAG.getConstant(IsFSHL ? BitWidth - ShAmt : ShAmt,
		SDLoc(N), ShAmtTy));
		if (IsUndefOrZero(N1))
		return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0,
		DAG.getConstant(IsFSHL ? ShAmt : BitWidth - ShAmt,
		SDLoc(N), ShAmtTy));
		}

		// fold fshr(undef_or_zero, N1, N2) -> lshr(N1, N2)
		// fold fshl(N0, undef_or_zero, N2) -> shl(N0, N2)
		// iff We know the shift amount is in range.
		// TODO: when is it worth doing SUB(BW, N2) as well?
		if (isPowerOf2_32(BitWidth)) {
		APInt ModuloBits(N2.getScalarValueSizeInBits(), BitWidth - 1);
		if (IsUndefOrZero(N0) && !IsFSHL && DAG.MaskedValueIsZero(N2, ~ModuloBits))
		return DAG.getNode(ISD::SRL, SDLoc(N), VT, N1, N2);
		if (IsUndefOrZero(N1) && IsFSHL && DAG.MaskedValueIsZero(N2, ~ModuloBits))
		return DAG.getNode(ISD::SHL, SDLoc(N), VT, N0, N2);
}		}

// fold (fshl N0, N0, N2) -> (rotl N0, N2)		// fold (fshl N0, N0, N2) -> (rotl N0, N2)
// fold (fshr N0, N0, N2) -> (rotr N0, N2)		// fold (fshr N0, N0, N2) -> (rotr N0, N2)
// TODO: Investigate flipping this rotate if only one is legal, if funnel shift		// TODO: Investigate flipping this rotate if only one is legal, if funnel shift
// is legal as well we might be better off avoiding non-constant (BW - N2).		// is legal as well we might be better off avoiding non-constant (BW - N2).
unsigned RotOpc = IsFSHL ? ISD::ROTL : ISD::ROTR;		unsigned RotOpc = IsFSHL ? ISD::ROTL : ISD::ROTR;
if (N0 == N1 && hasOperation(RotOpc, VT))		if (N0 == N1 && hasOperation(RotOpc, VT))
... (12,385 lines not shown) ...
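The variable-amount folds above only cover the two cases where the amount can be reused directly. As a sketch of why the `SUB(BW, N2)` direction needs more care (an observation about the semantics, not a statement of the author's reasoning): for `fshl(0, N1, N2)` the rewrite `lshr(N1, BW - N2)` matches for amounts 1 to 31, but at `N2 == 0` the funnel shift returns the zero operand while `BW - N2` is no longer a valid shift amount. `fshl32` and `subFoldMatches` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of fshl on i32 (not an LLVM function).
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// Compares fshl(0, X, N) with X >> (32 - N), masking the subtraction so the
// C++ shift stays defined (for N == 0 the masked amount becomes 0).
bool subFoldMatches(uint32_t X, uint32_t N) {
  return fshl32(0, X, N) == (X >> ((32u - N) & 31u));
}

void checkSubFold() {
  for (uint32_t N = 1; N < 32; ++N)
    assert(subFoldMatches(0xDEADBEEFu, N)); // in-range, non-zero amounts agree
  assert(!subFoldMatches(0xDEADBEEFu, 0));  // N == 0: fshl gives 0, the shift gives X
}
```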

llvm/trunk/test/CodeGen/X86/funnel-shift.ll

... (398 lines not shown) ...	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 %m)		%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 %m)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef0_cst(i32 %a0) nounwind {		define i32 @fshl_i32_undef0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef0_cst:		; X32-SSE2-LABEL: fshl_i32_undef0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shldl $9, %eax, %eax		; X32-SSE2-NEXT: shrl $23, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef0_cst:		; X64-AVX2-LABEL: fshl_i32_undef0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: shldl $9, %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
		; X64-AVX2-NEXT: shrl $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 9)		%res = call i32 @llvm.fshl.i32(i32 undef, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef1(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_undef1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef1:		; X32-SSE2-LABEL: fshl_i32_undef1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (12 lines not shown) ...	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %a1)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef1_msk(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_undef1_msk(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef1_msk:		; X32-SSE2-LABEL: fshl_i32_undef1_msk:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl
; X32-SSE2-NEXT: andl $7, %ecx		; X32-SSE2-NEXT: andb $7, %cl
; X32-SSE2-NEXT: # kill: def $cl killed $cl killed $ecx		; X32-SSE2-NEXT: shll %cl, %eax
; X32-SSE2-NEXT: shldl %cl, %eax, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef1_msk:		; X64-AVX2-LABEL: fshl_i32_undef1_msk:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %esi, %ecx		; X64-AVX2-NEXT: movl %esi, %ecx
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: andl $7, %ecx		; X64-AVX2-NEXT: andb $7, %cl
; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx
; X64-AVX2-NEXT: shldl %cl, %eax, %eax		; X64-AVX2-NEXT: shll %cl, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%m = and i32 %a1, 7		%m = and i32 %a1, 7
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %m)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 %m)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef1_cst(i32 %a0) nounwind {		define i32 @fshl_i32_undef1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef1_cst:		; X32-SSE2-LABEL: fshl_i32_undef1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shldl $9, %eax, %eax		; X32-SSE2-NEXT: shll $9, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_undef1_cst:		; X64-AVX2-LABEL: fshl_i32_undef1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shldl $9, %eax, %eax		; X64-AVX2-NEXT: shll $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 9)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 undef, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_undef2(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_undef2(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshl_i32_undef2:		; X32-SSE2-LABEL: fshl_i32_undef2:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (29 lines not shown) ...	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %a1)		%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef0_msk(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_undef0_msk(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef0_msk:		; X32-SSE2-LABEL: fshr_i32_undef0_msk:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movb {{[0-9]+}}(%esp), %cl
; X32-SSE2-NEXT: andl $7, %ecx		; X32-SSE2-NEXT: andb $7, %cl
; X32-SSE2-NEXT: # kill: def $cl killed $cl killed $ecx		; X32-SSE2-NEXT: shrl %cl, %eax
; X32-SSE2-NEXT: shrdl %cl, %eax, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef0_msk:		; X64-AVX2-LABEL: fshr_i32_undef0_msk:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %esi, %ecx		; X64-AVX2-NEXT: movl %esi, %ecx
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: andl $7, %ecx		; X64-AVX2-NEXT: andb $7, %cl
; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; X64-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx
; X64-AVX2-NEXT: shrdl %cl, %eax, %eax		; X64-AVX2-NEXT: shrl %cl, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%m = and i32 %a1, 7		%m = and i32 %a1, 7
%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %m)		%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 %m)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef0_cst(i32 %a0) nounwind {		define i32 @fshr_i32_undef0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef0_cst:		; X32-SSE2-LABEL: fshr_i32_undef0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shrdl $9, %eax, %eax		; X32-SSE2-NEXT: shrl $9, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef0_cst:		; X64-AVX2-LABEL: fshr_i32_undef0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: movl %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shrdl $9, %eax, %eax		; X64-AVX2-NEXT: shrl $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 9)		%res = call i32 @llvm.fshr.i32(i32 undef, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef1(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_undef1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef1:		; X32-SSE2-LABEL: fshr_i32_undef1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (33 lines not shown) ...	; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 %m)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 %m)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef1_cst(i32 %a0) nounwind {		define i32 @fshr_i32_undef1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef1_cst:		; X32-SSE2-LABEL: fshr_i32_undef1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: shrdl $9, %eax, %eax		; X32-SSE2-NEXT: shll $23, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_undef1_cst:		; X64-AVX2-LABEL: fshr_i32_undef1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: shrdl $9, %edi, %eax		; X64-AVX2-NEXT: movl %edi, %eax
		; X64-AVX2-NEXT: shll $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 9)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 undef, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_undef2(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_undef2(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_undef2:		; X32-SSE2-LABEL: fshr_i32_undef2:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (31 lines not shown) ...
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 0, i32 %a0, i32 %a1)		%res = call i32 @llvm.fshl.i32(i32 0, i32 %a0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_zero0_cst(i32 %a0) nounwind {		define i32 @fshl_i32_zero0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_zero0_cst:		; X32-SSE2-LABEL: fshl_i32_zero0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: xorl %eax, %eax		; X32-SSE2-NEXT: shrl $23, %eax
; X32-SSE2-NEXT: shldl $9, %ecx, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_zero0_cst:		; X64-AVX2-LABEL: fshl_i32_zero0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: xorl %eax, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shldl $9, %edi, %eax		; X64-AVX2-NEXT: shrl $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 0, i32 %a0, i32 9)		%res = call i32 @llvm.fshl.i32(i32 0, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_zero1(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_zero1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshl_i32_zero1:		; X32-SSE2-LABEL: fshl_i32_zero1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (13 lines not shown) ...
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 0, i32 %a1)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshl_i32_zero1_cst(i32 %a0) nounwind {		define i32 @fshl_i32_zero1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshl_i32_zero1_cst:		; X32-SSE2-LABEL: fshl_i32_zero1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: xorl %eax, %eax		; X32-SSE2-NEXT: shll $9, %eax
; X32-SSE2-NEXT: shrdl $23, %ecx, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshl_i32_zero1_cst:		; X64-AVX2-LABEL: fshl_i32_zero1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: xorl %eax, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shrdl $23, %edi, %eax		; X64-AVX2-NEXT: shll $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshl.i32(i32 %a0, i32 0, i32 9)		%res = call i32 @llvm.fshl.i32(i32 %a0, i32 0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_zero0(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_zero0(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_zero0:		; X32-SSE2-LABEL: fshr_i32_zero0:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (13 lines not shown) ...
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 0, i32 %a0, i32 %a1)		%res = call i32 @llvm.fshr.i32(i32 0, i32 %a0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_zero0_cst(i32 %a0) nounwind {		define i32 @fshr_i32_zero0_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_zero0_cst:		; X32-SSE2-LABEL: fshr_i32_zero0_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: xorl %eax, %eax		; X32-SSE2-NEXT: shrl $9, %eax
; X32-SSE2-NEXT: shldl $23, %ecx, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_zero0_cst:		; X64-AVX2-LABEL: fshr_i32_zero0_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: xorl %eax, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shldl $23, %edi, %eax		; X64-AVX2-NEXT: shrl $9, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 0, i32 %a0, i32 9)		%res = call i32 @llvm.fshr.i32(i32 0, i32 %a0, i32 9)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_zero1(i32 %a0, i32 %a1) nounwind {		define i32 @fshr_i32_zero1(i32 %a0, i32 %a1) nounwind {
; X32-SSE2-LABEL: fshr_i32_zero1:		; X32-SSE2-LABEL: fshr_i32_zero1:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
... (12 lines not shown) ...
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 0, i32 %a1)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 0, i32 %a1)
ret i32 %res		ret i32 %res
}		}

define i32 @fshr_i32_zero1_cst(i32 %a0) nounwind {		define i32 @fshr_i32_zero1_cst(i32 %a0) nounwind {
; X32-SSE2-LABEL: fshr_i32_zero1_cst:		; X32-SSE2-LABEL: fshr_i32_zero1_cst:
; X32-SSE2: # %bb.0:		; X32-SSE2: # %bb.0:
; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-SSE2-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-SSE2-NEXT: xorl %eax, %eax		; X32-SSE2-NEXT: shll $23, %eax
; X32-SSE2-NEXT: shrdl $9, %ecx, %eax
; X32-SSE2-NEXT: retl		; X32-SSE2-NEXT: retl
;		;
; X64-AVX2-LABEL: fshr_i32_zero1_cst:		; X64-AVX2-LABEL: fshr_i32_zero1_cst:
; X64-AVX2: # %bb.0:		; X64-AVX2: # %bb.0:
; X64-AVX2-NEXT: xorl %eax, %eax		; X64-AVX2-NEXT: movl %edi, %eax
; X64-AVX2-NEXT: shrdl $9, %edi, %eax		; X64-AVX2-NEXT: shll $23, %eax
; X64-AVX2-NEXT: retq		; X64-AVX2-NEXT: retq
%res = call i32 @llvm.fshr.i32(i32 %a0, i32 0, i32 9)		%res = call i32 @llvm.fshr.i32(i32 %a0, i32 0, i32 9)
ret i32 %res		ret i32 %res
}		}

; shift by zero		; shift by zero

define i32 @fshl_i32_zero2(i32 %a0, i32 %a1) nounwind {		define i32 @fshl_i32_zero2(i32 %a0, i32 %a1) nounwind {
... (147 lines not shown) ...