This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][x86] add transform/hook to vectorize: cast(extract V, Y)
Abandoned · Public

Authored by spatel on Jan 16 2019, 10:10 AM.

Details

Reviewers

RKSimon
craig.topper
lebedev.ri
andreadb

Summary

This is a fix for PR39974:
https://bugs.llvm.org/show_bug.cgi?id=39974

I didn't see any existing TLI hooks that capture what we need to know to decide whether this is profitable, so I'm proposing a new hook that includes the source and destination types of the cast op. This is enabled for x86 only here, but any target that wants to avoid a register file back-and-forth may find this useful.

The known bits diffs suggest that we can do better at simplifying based on vector demanded elements, but I'm assuming those are not the typical patterns.
We would also likely improve things by moving shuffles ahead of the cast in the case where we are not extracting from element 0.
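
To make the transform concrete, here is a minimal standalone sketch (plain C++, no LLVM dependencies) of the two forms the combine chooses between. The names `castOfExtract` and `extractOfCast` are hypothetical and only model the semantics for a signed int-to-float cast on a 4-lane vector; they say nothing about which form is cheaper on a given target.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<std::int32_t, 4>;
using V4f32 = std::array<float, 4>;

// Scalar form: extract lane Y, then convert.
// Models: sitofp (extractelement V, Y)
float castOfExtract(const V4i32 &V, std::size_t Y) {
  return static_cast<float>(V[Y]);
}

// Vector form: convert every lane, then extract lane Y.
// Models: extractelement (sitofp V), Y
float extractOfCast(const V4i32 &V, std::size_t Y) {
  V4f32 Cast{};
  for (std::size_t I = 0; I < V.size(); ++I)
    Cast[I] = static_cast<float>(V[I]);
  return Cast[Y];
}
```

Both functions return the same value for every lane. The point of the patch is that the vector form can stay in the vector register file, while the scalar form may need a move to a general-purpose register and back.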

Diff Detail

Event Timeline

spatel created this revision. Jan 16 2019, 10:10 AM

Herald added a subscriber: mcrosier. Jan 16 2019, 10:10 AM

spatel mentioned this in D56864: [x86] vectorize cast ops in lowering to avoid register file transfers. Jan 17 2019, 9:30 AM

Do any other backends want something like this?
@t.p.northover, @asb, @uweigand, others?

In D56796#1369071, @lebedev.ri wrote:

Do any other backends want something like this?
@t.p.northover, @asb, @uweigand, others?

For RISC-V, there's no vector support upstream currently (the spec is still in flux). @rkruppe may be able to comment on whether it's likely this hook would be useful.

xbolva00 added a subscriber: xbolva00. Jan 24 2019, 3:33 AM

xbolva00 added inline comments.

test/CodeGen/X86/known-signbits-vector.ll
158	Looks bad

A later, target-specific alternative to this patch is proposed in D56864.
As I mentioned in the summary, I'm not that concerned about the knownbits regression, but the other patch does sidestep that problem.

Thanks for the heads-up! This may indeed be interesting for SystemZ, but I think it's still probably preferable to do it in the back-end, as your alternative approach does; that will allow us to handle some special instruction selection issues we'll likely run into ...

In D56796#1369497, @uweigand wrote:

Thanks for the heads-up! This may indeed be interesting for SystemZ, but I think it's still probably preferable to do it in the back-end, as your alternative approach does; that will allow us to handle some special instruction selection issues we'll likely run into ...

Yes, I think we'll proceed with a target-specific approach for x86; it also has some custom opcode requirements.
Abandoning.

spatel mentioned this in rL353302: [x86] vectorize cast ops in lowering to avoid register file transfers. Feb 6 2019, 6:59 AM

spatel mentioned this in rGe84fbb67a1f0: [x86] vectorize cast ops in lowering to avoid register file transfers.

Revision Contents

Path                                         Size
include/llvm/CodeGen/TargetLowering.h        9 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp     31 lines
lib/Target/X86/X86ISelLowering.h             3 lines
lib/Target/X86/X86ISelLowering.cpp           22 lines
test/CodeGen/X86/known-bits-vector.ll        3 lines
test/CodeGen/X86/known-signbits-vector.ll    29 lines
test/CodeGen/X86/vec_int_to_fp.ll            197 lines

Diff 182087

include/llvm/CodeGen/TargetLowering.h

[2,409 lines not shown]	public:
}		}

/// Try to convert an extract element of a vector binary operation into an		/// Try to convert an extract element of a vector binary operation into an
/// extract element followed by a scalar operation.		/// extract element followed by a scalar operation.
virtual bool shouldScalarizeBinop(SDValue VecOp) const {		virtual bool shouldScalarizeBinop(SDValue VecOp) const {
return false;		return false;
}		}

		/// If the target supports the specified vector cast operation, use it to
		/// avoid the more expensive scalarized version of the cast operation.
		/// This is used pre-legalization, so the target may return true even if the
		/// types are not legal (legalization will transform the operation into a
		/// profitable vector instruction sequence).
		virtual bool useVectorCast(unsigned Opcode, EVT FromVT, EVT ToVT) const {
		return false;
		}

// Return true if it is profitable to use a scalar input to a BUILD_VECTOR		// Return true if it is profitable to use a scalar input to a BUILD_VECTOR
// even if the vector itself has multiple uses.		// even if the vector itself has multiple uses.
virtual bool aggressivelyPreferBuildVectorSources(EVT VecVT) const {		virtual bool aggressivelyPreferBuildVectorSources(EVT VecVT) const {
return false;		return false;
}		}

// Return true if CodeGenPrepare should consider splitting large offset of a		// Return true if CodeGenPrepare should consider splitting large offset of a
// GEP to make the GEP fit into the addressing mode and can be sunk into the		// GEP to make the GEP fit into the addressing mode and can be sunk into the
[1,491 lines not shown]

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[11,900 lines not shown]	static SDValue foldFPToIntToFP(SDNode *N, SelectionDAG &DAG,

if (N->getOpcode() == ISD::UINT_TO_FP && N0.getOpcode() == ISD::FP_TO_UINT &&		if (N->getOpcode() == ISD::UINT_TO_FP && N0.getOpcode() == ISD::FP_TO_UINT &&
N0.getOperand(0).getValueType() == VT)		N0.getOperand(0).getValueType() == VT)
return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));		return DAG.getNode(ISD::FTRUNC, SDLoc(N), VT, N0.getOperand(0));

return SDValue();		return SDValue();
}		}

		/// Given a scalar cast operation that is extracted from a vector, ask the
		/// target if it is profitable to vectorize the cast op followed by extraction.
		/// This may avoid an expensive round-trip between vector and scalar registers.
		static SDValue vectorizeExtractedCast(SDNode *N, SelectionDAG &DAG) {
		SDValue N0 = N->getOperand(0);
		EVT VT = N->getValueType(0);
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		if (!N0.hasOneUse() || N0.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
		return SDValue();

		SDValue VecOp = N0.getOperand(0);
		EVT FromVT = VecOp.getValueType();
		EVT ToVT = EVT::getVectorVT(*DAG.getContext(), VT,
		FromVT.getVectorNumElements());
		if (!TLI.useVectorCast(N->getOpcode(), FromVT, ToVT))
		return SDValue();

		// cast (extract V, Y) --> extract (cast V), Y
		SDLoc DL(N);
		SDValue VCast = DAG.getNode(N->getOpcode(), DL, ToVT, VecOp);
		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, VT, VCast, N0.getOperand(1));
		}

SDValue DAGCombiner::visitSINT_TO_FP(SDNode *N) {		SDValue DAGCombiner::visitSINT_TO_FP(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT OpVT = N0.getValueType();		EVT OpVT = N0.getValueType();

// fold (sint_to_fp c1) -> c1fp		// fold (sint_to_fp c1) -> c1fp
if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&		if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&
// ...but only if the target supports immediate floating-point values		// ...but only if the target supports immediate floating-point values
[38 lines not shown]	if (N0.getOpcode() == ISD::ZERO_EXTEND &&
N0.getOperand(0).getOperand(2) };		N0.getOperand(0).getOperand(2) };
return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);		return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);
}		}
}		}

if (SDValue FTrunc = foldFPToIntToFP(N, DAG, TLI))		if (SDValue FTrunc = foldFPToIntToFP(N, DAG, TLI))
return FTrunc;		return FTrunc;

		if (!LegalOperations)
		if (SDValue Extract = vectorizeExtractedCast(N, DAG))
		return Extract;

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitUINT_TO_FP(SDNode *N) {		SDValue DAGCombiner::visitUINT_TO_FP(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT OpVT = N0.getValueType();		EVT OpVT = N0.getValueType();

[26 lines not shown]	if (N0.getOpcode() == ISD::SETCC && !VT.isVector() &&
N0.getOperand(2) };		N0.getOperand(2) };
return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);		return DAG.getNode(ISD::SELECT_CC, DL, VT, Ops);
}		}
}		}

if (SDValue FTrunc = foldFPToIntToFP(N, DAG, TLI))		if (SDValue FTrunc = foldFPToIntToFP(N, DAG, TLI))
return FTrunc;		return FTrunc;

		if (!LegalOperations)
		if (SDValue Extract = vectorizeExtractedCast(N, DAG))
		return Extract;

return SDValue();		return SDValue();
}		}

// Fold (fp_to_{s/u}int ({s/u}int_to_fpx)) -> zext x, sext x, trunc x, or x		// Fold (fp_to_{s/u}int ({s/u}int_to_fpx)) -> zext x, sext x, trunc x, or x
static SDValue FoldIntToFPToInt(SDNode *N, SelectionDAG &DAG) {		static SDValue FoldIntToFPToInt(SDNode *N, SelectionDAG &DAG) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

[7,383 lines not shown]
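
As a reading aid for `vectorizeExtractedCast` above, the following is a rough standalone model of the rewrite on a toy expression graph. It does not use any LLVM types: `Node`, `Opc`, `make`, and `vectorizeExtractedCastModel` are all hypothetical names, the target hook is reduced to a boolean parameter, and only a single cast opcode is modeled.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

enum class Opc { Leaf, ExtractElt, SIntToFP };

struct Node {
  Opc Opcode = Opc::Leaf;
  std::vector<std::shared_ptr<Node>> Ops;
  unsigned Index = 0;   // lane index, meaningful for ExtractElt only
  unsigned NumUses = 0; // number of nodes that use this node as an operand
};
using NodeRef = std::shared_ptr<Node>;

// Create a node and bump the use count of each operand.
NodeRef make(Opc O, std::vector<NodeRef> Ops = {}, unsigned Index = 0) {
  for (auto &Op : Ops)
    ++Op->NumUses;
  auto N = std::make_shared<Node>();
  N->Opcode = O;
  N->Ops = std::move(Ops);
  N->Index = Index;
  return N;
}

// cast (extract V, Y) --> extract (cast V), Y
// Returns nullptr when the pattern does not match or the (modeled) target
// hook declines the transform.
NodeRef vectorizeExtractedCastModel(const NodeRef &N,
                                    bool TargetWantsVectorCast) {
  if (N->Opcode != Opc::SIntToFP)
    return nullptr;
  const NodeRef &N0 = N->Ops[0];
  if (N0->NumUses != 1 || N0->Opcode != Opc::ExtractElt)
    return nullptr;
  if (!TargetWantsVectorCast)
    return nullptr;
  NodeRef VCast = make(Opc::SIntToFP, {N0->Ops[0]});
  return make(Opc::ExtractElt, {VCast}, N0->Index);
}
```

The one-use check mirrors the `hasOneUse()` guard in the patch: if the extracted scalar has other users, the extract would have to stay anyway.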

lib/Target/X86/X86ISelLowering.h

[1,057 lines not shown]	public:
bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,		bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
unsigned Index) const override;		unsigned Index) const override;

/// Scalar ops always have equal or better analysis/performance/power than		/// Scalar ops always have equal or better analysis/performance/power than
/// the vector equivalent, so this always makes sense if the scalar op is		/// the vector equivalent, so this always makes sense if the scalar op is
/// supported.		/// supported.
bool shouldScalarizeBinop(SDValue) const override;		bool shouldScalarizeBinop(SDValue) const override;

		/// Vector casts can be used to avoid transfers between scalar and vector.
		bool useVectorCast(unsigned Opcode, EVT FromVT, EVT ToVT) const override;

bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,		bool storeOfVectorConstantIsCheap(EVT MemVT, unsigned NumElem,
unsigned AddrSpace) const override {		unsigned AddrSpace) const override {
// If we can replace more than 2 scalar stores, there will be a reduction		// If we can replace more than 2 scalar stores, there will be a reduction
// in instructions even after we add a vector constant load.		// in instructions even after we add a vector constant load.
return NumElem > 2;		return NumElem > 2;
}		}

bool isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT) const override;		bool isLoadBitCastBeneficial(EVT LoadVT, EVT BitcastVT) const override;
[518 lines not shown]

lib/Target/X86/X86ISelLowering.cpp

[4,914 lines not shown]	if (!isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), VecVT))
return true;		return true;

// If the vector op is supported, but the scalar op is not, the transform may		// If the vector op is supported, but the scalar op is not, the transform may
// not be worthwhile.		// not be worthwhile.
EVT ScalarVT = VecVT.getScalarType();		EVT ScalarVT = VecVT.getScalarType();
return isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), ScalarVT);		return isOperationLegalOrCustomOrPromote(VecOp.getOpcode(), ScalarVT);
}		}

		bool X86TargetLowering::useVectorCast(unsigned Opcode, EVT FromVT,
		EVT ToVT) const {
		switch (Opcode) {
		case ISD::SINT_TO_FP:
		// TODO: Handle wider types with AVX/AVX512.
		if (!Subtarget.hasSSE2() || FromVT != MVT::v4i32)
		return false;
		// CVTDQ2PS or (V)CVTDQ2PD (pre-AVX, this will legalize to 128-bit)
		return ToVT == MVT::v4f32 || ToVT == MVT::v4f64;

		case ISD::UINT_TO_FP:
		// TODO: Handle wider types and i64 elements.
		if (!Subtarget.hasAVX512() || FromVT != MVT::v4i32)
		return false;
		// VCVTUDQ2PS or VCVTUDQ2PD
		return ToVT == MVT::v4f32 || ToVT == MVT::v4f64;

		default:
		return false;
		}
		}

bool X86TargetLowering::isCheapToSpeculateCttz() const {		bool X86TargetLowering::isCheapToSpeculateCttz() const {
// Speculate cttz only if we can directly use TZCNT.		// Speculate cttz only if we can directly use TZCNT.
return Subtarget.hasBMI();		return Subtarget.hasBMI();
}		}

bool X86TargetLowering::isCheapToSpeculateCtlz() const {		bool X86TargetLowering::isCheapToSpeculateCtlz() const {
// Speculate ctlz only if we can directly use LZCNT.		// Speculate ctlz only if we can directly use LZCNT.
return Subtarget.hasLZCNT();		return Subtarget.hasLZCNT();
[37,867 lines not shown]
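
The decision table in `X86TargetLowering::useVectorCast` above can be restated outside of LLVM as a small predicate. This is only a sketch for readers: `Op`, `VecTy`, `Features`, and `useVectorCastModel` are hypothetical stand-ins for the opcode, `EVT`, and `Subtarget` queries in the patch, not real LLVM API.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used by the hook.
enum class Op { SIntToFP, UIntToFP, Other };
enum class VecTy { v4i32, v4f32, v4f64, Other };
struct Features {
  bool SSE2;
  bool AVX512;
};

// Mirrors the switch in the patch: only v4i32 sources are handled, the
// destination must be v4f32 or v4f64, and the required feature depends on
// the signedness of the cast.
bool useVectorCastModel(Op Opcode, VecTy From, VecTy To, Features F) {
  const bool ToOK = To == VecTy::v4f32 || To == VecTy::v4f64;
  switch (Opcode) {
  case Op::SIntToFP:
    return F.SSE2 && From == VecTy::v4i32 && ToOK;
  case Op::UIntToFP:
    return F.AVX512 && From == VecTy::v4i32 && ToOK;
  default:
    return false;
  }
}
```

For example, an unsigned cast on an SSE2-only subtarget is rejected, which matches the unchanged SSE and VEX check lines for the `uitofp` tests in vec_int_to_fp.ll.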

test/CodeGen/X86/known-bits-vector.ll

[19 lines not shown]	; X64-NEXT: retq
ret i32 %3		ret i32 %3
}		}

define float @knownbits_mask_extract_uitofp(<2 x i64> %a0) nounwind {		define float @knownbits_mask_extract_uitofp(<2 x i64> %a0) nounwind {
; X32-LABEL: knownbits_mask_extract_uitofp:		; X32-LABEL: knownbits_mask_extract_uitofp:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero		; X32-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: knownbits_mask_extract_uitofp:		; X64-LABEL: knownbits_mask_extract_uitofp:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vmovq %xmm0, %rax		; X64-NEXT: vmovq %xmm0, %rax
[631 lines not shown]

test/CodeGen/X86/known-signbits-vector.ll

[59 lines not shown]	; X64-NEXT: retq
%9 = sitofp <4 x i64> %8 to <4 x float>		%9 = sitofp <4 x i64> %8 to <4 x float>
ret <4 x float> %9		ret <4 x float> %9
}		}

define float @signbits_ashr_extract_sitofp_0(<2 x i64> %a0) nounwind {		define float @signbits_ashr_extract_sitofp_0(<2 x i64> %a0) nounwind {
; X32-LABEL: signbits_ashr_extract_sitofp_0:		; X32-LABEL: signbits_ashr_extract_sitofp_0:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: vextractps $1, %xmm0, %eax		; X32-NEXT: vpsrlq $32, %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_extract_sitofp_0:		; X64-LABEL: signbits_ashr_extract_sitofp_0:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vmovq %xmm0, %rax		; X64-NEXT: vmovq %xmm0, %rax
; X64-NEXT: shrq $32, %rax		; X64-NEXT: shrq $32, %rax
; X64-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0		; X64-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%1 = ashr <2 x i64> %a0, <i64 32, i64 32>		%1 = ashr <2 x i64> %a0, <i64 32, i64 32>
%2 = extractelement <2 x i64> %1, i32 0		%2 = extractelement <2 x i64> %1, i32 0
%3 = sitofp i64 %2 to float		%3 = sitofp i64 %2 to float
ret float %3		ret float %3
}		}

define float @signbits_ashr_extract_sitofp_1(<2 x i64> %a0) nounwind {		define float @signbits_ashr_extract_sitofp_1(<2 x i64> %a0) nounwind {
; X32-LABEL: signbits_ashr_extract_sitofp_1:		; X32-LABEL: signbits_ashr_extract_sitofp_1:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
		; X32-NEXT: vpsrlq $63, %xmm0, %xmm1
; X32-NEXT: vpsrlq $32, %xmm0, %xmm0		; X32-NEXT: vpsrlq $32, %xmm0, %xmm0
		; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [0,32768,0,0,1,0,0,0]		; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [0,32768,0,0,1,0,0,0]
; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0		; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0		; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_extract_sitofp_1:		; X64-LABEL: signbits_ashr_extract_sitofp_1:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vmovq %xmm0, %rax		; X64-NEXT: vmovq %xmm0, %rax
; X64-NEXT: shrq $32, %rax		; X64-NEXT: shrq $32, %rax
; X64-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0		; X64-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
; X64-NEXT: retq		; X64-NEXT: retq
%1 = ashr <2 x i64> %a0, <i64 32, i64 63>		%1 = ashr <2 x i64> %a0, <i64 32, i64 63>
%2 = extractelement <2 x i64> %1, i32 0		%2 = extractelement <2 x i64> %1, i32 0
%3 = sitofp i64 %2 to float		%3 = sitofp i64 %2 to float
ret float %3		ret float %3
}		}

define float @signbits_ashr_shl_extract_sitofp(<2 x i64> %a0) nounwind {		define float @signbits_ashr_shl_extract_sitofp(<2 x i64> %a0) nounwind {
; X32-LABEL: signbits_ashr_shl_extract_sitofp:		; X32-LABEL: signbits_ashr_shl_extract_sitofp:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
		; X32-NEXT: vpsrlq $60, %xmm0, %xmm1
; X32-NEXT: vpsrlq $61, %xmm0, %xmm0		; X32-NEXT: vpsrlq $61, %xmm0, %xmm0
		; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]		; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]
; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0		; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0		; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
		; X32-NEXT: vpsllq $16, %xmm0, %xmm1
; X32-NEXT: vpsllq $20, %xmm0, %xmm0		; X32-NEXT: vpsllq $20, %xmm0, %xmm0
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; X32-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm0		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_shl_extract_sitofp:		; X64-LABEL: signbits_ashr_shl_extract_sitofp:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vmovq %xmm0, %rax		; X64-NEXT: vmovq %xmm0, %rax
[13 lines not shown]
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax		; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-NEXT: shrdl $30, %ecx, %eax		; X32-NEXT: shrdl $30, %ecx, %eax
; X32-NEXT: sarl $30, %ecx		; X32-NEXT: sarl $30, %ecx
; X32-NEXT: vmovd %eax, %xmm0		; X32-NEXT: vmovd %eax, %xmm0
; X32-NEXT: vpinsrd $1, %ecx, %xmm0, %xmm0		; X32-NEXT: vpinsrd $1, %ecx, %xmm0, %xmm0
		; X32-NEXT: vpinsrd $2, {{[0-9]+}}(%esp), %xmm0, %xmm0
		[inline comment] xbolva00: Looks bad
		; X32-NEXT: vpinsrd $3, {{[0-9]+}}(%esp), %xmm0, %xmm0
; X32-NEXT: vpsrlq $3, %xmm0, %xmm0		; X32-NEXT: vpsrlq $3, %xmm0, %xmm0
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_insert_ashr_extract_sitofp:		; X64-LABEL: signbits_ashr_insert_ashr_extract_sitofp:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: sarq $30, %rdi		; X64-NEXT: sarq $30, %rdi
[63 lines not shown]	; X64-NEXT: retq
%6 = sitofp <2 x i64> %5 to <2 x double>		%6 = sitofp <2 x i64> %5 to <2 x double>
ret <2 x double> %6		ret <2 x double> %6
}		}

define float @signbits_ashr_sext_sextinreg_and_extract_sitofp(<2 x i64> %a0, <2 x i64> %a1, i32 %a2) nounwind {		define float @signbits_ashr_sext_sextinreg_and_extract_sitofp(<2 x i64> %a0, <2 x i64> %a1, i32 %a2) nounwind {
; X32-LABEL: signbits_ashr_sext_sextinreg_and_extract_sitofp:		; X32-LABEL: signbits_ashr_sext_sextinreg_and_extract_sitofp:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: pushl %eax		; X32-NEXT: pushl %eax
		; X32-NEXT: vpsrlq $60, %xmm0, %xmm1
; X32-NEXT: vpsrlq $61, %xmm0, %xmm0		; X32-NEXT: vpsrlq $61, %xmm0, %xmm0
		; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]		; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]
; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0		; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0		; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
; X32-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero		; X32-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
; X32-NEXT: vpand %xmm1, %xmm0, %xmm0		; X32-NEXT: vpand %xmm1, %xmm0, %xmm0
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm2, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_sext_sextinreg_and_extract_sitofp:		; X64-LABEL: signbits_ashr_sext_sextinreg_and_extract_sitofp:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpsrlq $61, %xmm0, %xmm0		; X64-NEXT: vpsrlq $61, %xmm0, %xmm0
[26 lines not shown]
; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]		; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
; X32-NEXT: vmovdqa {{.*#+}} xmm2 = [4,0,0,0,8,0,0,0]		; X32-NEXT: vmovdqa {{.*#+}} xmm2 = [4,0,0,0,8,0,0,0]
; X32-NEXT: vpxor %xmm2, %xmm0, %xmm0		; X32-NEXT: vpxor %xmm2, %xmm0, %xmm0
; X32-NEXT: vpsubq %xmm2, %xmm0, %xmm0		; X32-NEXT: vpsubq %xmm2, %xmm0, %xmm0
; X32-NEXT: vpmovsxdq %xmm1, %xmm1		; X32-NEXT: vpmovsxdq %xmm1, %xmm1
; X32-NEXT: vpand %xmm1, %xmm0, %xmm2		; X32-NEXT: vpand %xmm1, %xmm0, %xmm2
; X32-NEXT: vpor %xmm1, %xmm2, %xmm1		; X32-NEXT: vpor %xmm1, %xmm2, %xmm1
; X32-NEXT: vpxor %xmm0, %xmm1, %xmm0		; X32-NEXT: vpxor %xmm0, %xmm1, %xmm0
; X32-NEXT: vmovd %xmm0, %eax		; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
; X32-NEXT: vcvtsi2ssl %eax, %xmm3, %xmm0
; X32-NEXT: vmovss %xmm0, (%esp)		; X32-NEXT: vmovss %xmm0, (%esp)
; X32-NEXT: flds (%esp)		; X32-NEXT: flds (%esp)
; X32-NEXT: popl %eax		; X32-NEXT: popl %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: signbits_ashr_sextvecinreg_bitops_extract_sitofp:		; X64-LABEL: signbits_ashr_sextvecinreg_bitops_extract_sitofp:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpsrlq $60, %xmm0, %xmm2		; X64-NEXT: vpsrlq $60, %xmm0, %xmm2
[140 lines not shown]

test/CodeGen/X86/vec_int_to_fp.ll

[5,550 lines not shown]	; AVX-NEXT: retq
ret <4 x float> %res		ret <4 x float> %res
}		}

; Extract from int vector and convert to FP.		; Extract from int vector and convert to FP.

define float @extract0_sitofp_v4i32_f32(<4 x i32> %x) nounwind {		define float @extract0_sitofp_v4i32_f32(<4 x i32> %x) nounwind {
; SSE-LABEL: extract0_sitofp_v4i32_f32:		; SSE-LABEL: extract0_sitofp_v4i32_f32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movd %xmm0, %eax		; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2ssl %eax, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: extract0_sitofp_v4i32_f32:		; AVX-LABEL: extract0_sitofp_v4i32_f32:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovd %xmm0, %eax		; AVX-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <4 x i32> %x, i32 0		%e = extractelement <4 x i32> %x, i32 0
%r = sitofp i32 %e to float		%r = sitofp i32 %e to float
ret float %r		ret float %r
}		}

define double @extract0_sitofp_v4i32_f64(<4 x i32> %x) nounwind {		define double @extract0_sitofp_v4i32_f64(<4 x i32> %x) nounwind {
; SSE-LABEL: extract0_sitofp_v4i32_f64:		; SSE-LABEL: extract0_sitofp_v4i32_f64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movd %xmm0, %eax		; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2sdl %eax, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: extract0_sitofp_v4i32_f64:		; AVX-LABEL: extract0_sitofp_v4i32_f64:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovd %xmm0, %eax		; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
; AVX-NEXT: vcvtsi2sdl %eax, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <4 x i32> %x, i32 0		%e = extractelement <4 x i32> %x, i32 0
%r = sitofp i32 %e to double		%r = sitofp i32 %e to double
ret double %r		ret double %r
}		}

define float @extract0_uitofp_v4i32_f32(<4 x i32> %x) nounwind {		define float @extract0_uitofp_v4i32_f32(<4 x i32> %x) nounwind {
; SSE-LABEL: extract0_uitofp_v4i32_f32:		; SSE-LABEL: extract0_uitofp_v4i32_f32:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movd %xmm0, %eax		; SSE-NEXT: movd %xmm0, %eax
; SSE-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2ssq %rax, %xmm0		; SSE-NEXT: cvtsi2ssq %rax, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: extract0_uitofp_v4i32_f32:		; VEX-LABEL: extract0_uitofp_v4i32_f32:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovd %xmm0, %eax		; VEX-NEXT: vmovd %xmm0, %eax
; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0		; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512-LABEL: extract0_uitofp_v4i32_f32:		; AVX512F-LABEL: extract0_uitofp_v4i32_f32:
; AVX512: # %bb.0:		; AVX512F: # %bb.0:
; AVX512-NEXT: vmovd %xmm0, %eax		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512-NEXT: vcvtusi2ssl %eax, %xmm1, %xmm0		; AVX512F-NEXT: vcvtudq2ps %zmm0, %zmm0
; AVX512-NEXT: retq		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512VL-LABEL: extract0_uitofp_v4i32_f32:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vcvtudq2ps %xmm0, %xmm0
		; AVX512VL-NEXT: retq
		;
		; AVX512DQ-LABEL: extract0_uitofp_v4i32_f32:
		; AVX512DQ: # %bb.0:
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
		; AVX512DQ-NEXT: vcvtudq2ps %zmm0, %zmm0
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
		; AVX512DQ-NEXT: vzeroupper
		; AVX512DQ-NEXT: retq
		;
		; AVX512VLDQ-LABEL: extract0_uitofp_v4i32_f32:
		; AVX512VLDQ: # %bb.0:
		; AVX512VLDQ-NEXT: vcvtudq2ps %xmm0, %xmm0
		; AVX512VLDQ-NEXT: retq
%e = extractelement <4 x i32> %x, i32 0		%e = extractelement <4 x i32> %x, i32 0
%r = uitofp i32 %e to float		%r = uitofp i32 %e to float
ret float %r		ret float %r
}		}

define double @extract0_uitofp_v4i32_f64(<4 x i32> %x) nounwind {		define double @extract0_uitofp_v4i32_f64(<4 x i32> %x) nounwind {
; SSE-LABEL: extract0_uitofp_v4i32_f64:		; SSE-LABEL: extract0_uitofp_v4i32_f64:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movd %xmm0, %eax		; SSE-NEXT: movd %xmm0, %eax
; SSE-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: xorps %xmm0, %xmm0
; SSE-NEXT: cvtsi2sdq %rax, %xmm0		; SSE-NEXT: cvtsi2sdq %rax, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; VEX-LABEL: extract0_uitofp_v4i32_f64:		; VEX-LABEL: extract0_uitofp_v4i32_f64:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vmovd %xmm0, %eax		; VEX-NEXT: vmovd %xmm0, %eax
; VEX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0		; VEX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512-LABEL: extract0_uitofp_v4i32_f64:		; AVX512F-LABEL: extract0_uitofp_v4i32_f64:
; AVX512: # %bb.0:		; AVX512F: # %bb.0:
; AVX512-NEXT: vmovd %xmm0, %eax		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; AVX512-NEXT: vcvtusi2sdl %eax, %xmm1, %xmm0		; AVX512F-NEXT: vcvtudq2pd %ymm0, %zmm0
; AVX512-NEXT: retq		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512VL-LABEL: extract0_uitofp_v4i32_f64:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vcvtudq2pd %xmm0, %ymm0
		; AVX512VL-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
		; AVX512VL-NEXT: vzeroupper
		; AVX512VL-NEXT: retq
		;
		; AVX512DQ-LABEL: extract0_uitofp_v4i32_f64:
		; AVX512DQ: # %bb.0:
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
		; AVX512DQ-NEXT: vcvtudq2pd %ymm0, %zmm0
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
		; AVX512DQ-NEXT: vzeroupper
		; AVX512DQ-NEXT: retq
		;
		; AVX512VLDQ-LABEL: extract0_uitofp_v4i32_f64:
		; AVX512VLDQ: # %bb.0:
		; AVX512VLDQ-NEXT: vcvtudq2pd %xmm0, %ymm0
		; AVX512VLDQ-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
		; AVX512VLDQ-NEXT: vzeroupper
		; AVX512VLDQ-NEXT: retq
%e = extractelement <4 x i32> %x, i32 0		%e = extractelement <4 x i32> %x, i32 0
%r = uitofp i32 %e to double		%r = uitofp i32 %e to double
ret double %r		ret double %r
}		}

; Extract non-zero element from int vector and convert to FP.		; Extract non-zero element from int vector and convert to FP.

define float @extract3_sitofp_v4i32_f32(<4 x i32> %x) nounwind {		define float @extract3_sitofp_v4i32_f32(<4 x i32> %x) nounwind {
; SSE2-LABEL: extract3_sitofp_v4i32_f32:		; SSE-LABEL: extract3_sitofp_v4i32_f32:
; SSE2: # %bb.0:		; SSE: # %bb.0:
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]		; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
; SSE2-NEXT: movd %xmm0, %eax		; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,1,2,3]
; SSE2-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: retq
; SSE2-NEXT: cvtsi2ssl %eax, %xmm0
; SSE2-NEXT: retq
;
; SSE41-LABEL: extract3_sitofp_v4i32_f32:
; SSE41: # %bb.0:
; SSE41-NEXT: extractps $3, %xmm0, %eax
; SSE41-NEXT: xorps %xmm0, %xmm0
; SSE41-NEXT: cvtsi2ssl %eax, %xmm0
; SSE41-NEXT: retq
;		;
; AVX-LABEL: extract3_sitofp_v4i32_f32:		; AVX-LABEL: extract3_sitofp_v4i32_f32:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vextractps $3, %xmm0, %eax		; AVX-NEXT: vcvtdq2ps %xmm0, %xmm0
; AVX-NEXT: vcvtsi2ssl %eax, %xmm1, %xmm0		; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <4 x i32> %x, i32 3		%e = extractelement <4 x i32> %x, i32 3
%r = sitofp i32 %e to float		%r = sitofp i32 %e to float
ret float %r		ret float %r
}		}

define double @extract3_sitofp_v4i32_f64(<4 x i32> %x) nounwind {		define double @extract3_sitofp_v4i32_f64(<4 x i32> %x) nounwind {
; SSE2-LABEL: extract3_sitofp_v4i32_f64:		; SSE-LABEL: extract3_sitofp_v4i32_f64:
; SSE2: # %bb.0:		; SSE: # %bb.0:
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,1,2,3]		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; SSE2-NEXT: movd %xmm0, %eax		; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
; SSE2-NEXT: xorps %xmm0, %xmm0		; SSE-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
; SSE2-NEXT: cvtsi2sdl %eax, %xmm0		; SSE-NEXT: retq
; SSE2-NEXT: retq
;
; SSE41-LABEL: extract3_sitofp_v4i32_f64:
; SSE41: # %bb.0:
; SSE41-NEXT: extractps $3, %xmm0, %eax
; SSE41-NEXT: xorps %xmm0, %xmm0
; SSE41-NEXT: cvtsi2sdl %eax, %xmm0
; SSE41-NEXT: retq
;		;
; AVX-LABEL: extract3_sitofp_v4i32_f64:		; AVX-LABEL: extract3_sitofp_v4i32_f64:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vextractps $3, %xmm0, %eax		; AVX-NEXT: vcvtdq2pd %xmm0, %ymm0
; AVX-NEXT: vcvtsi2sdl %eax, %xmm1, %xmm0		; AVX-NEXT: vextractf128 $1, %ymm0, %xmm0
		; AVX-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
		; AVX-NEXT: vzeroupper
; AVX-NEXT: retq		; AVX-NEXT: retq
%e = extractelement <4 x i32> %x, i32 3		%e = extractelement <4 x i32> %x, i32 3
%r = sitofp i32 %e to double		%r = sitofp i32 %e to double
ret double %r		ret double %r
}		}

define float @extract3_uitofp_v4i32_f32(<4 x i32> %x) nounwind {		define float @extract3_uitofp_v4i32_f32(<4 x i32> %x) nounwind {
; SSE2-LABEL: extract3_uitofp_v4i32_f32:		; SSE2-LABEL: extract3_uitofp_v4i32_f32:
[12 lines not shown]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; VEX-LABEL: extract3_uitofp_v4i32_f32:		; VEX-LABEL: extract3_uitofp_v4i32_f32:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vextractps $3, %xmm0, %eax		; VEX-NEXT: vextractps $3, %xmm0, %eax
; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0		; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512-LABEL: extract3_uitofp_v4i32_f32:		; AVX512F-LABEL: extract3_uitofp_v4i32_f32:
; AVX512: # %bb.0:		; AVX512F: # %bb.0:
; AVX512-NEXT: vextractps $3, %xmm0, %eax		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
; AVX512-NEXT: vcvtusi2ssl %eax, %xmm1, %xmm0		; AVX512F-NEXT: vcvtudq2ps %zmm0, %zmm0
; AVX512-NEXT: retq		; AVX512F-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512VL-LABEL: extract3_uitofp_v4i32_f32:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vcvtudq2ps %xmm0, %xmm0
		; AVX512VL-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
		; AVX512VL-NEXT: retq
		;
		; AVX512DQ-LABEL: extract3_uitofp_v4i32_f32:
		; AVX512DQ: # %bb.0:
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
		; AVX512DQ-NEXT: vcvtudq2ps %zmm0, %zmm0
		; AVX512DQ-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
		; AVX512DQ-NEXT: vzeroupper
		; AVX512DQ-NEXT: retq
		;
		; AVX512VLDQ-LABEL: extract3_uitofp_v4i32_f32:
		; AVX512VLDQ: # %bb.0:
		; AVX512VLDQ-NEXT: vcvtudq2ps %xmm0, %xmm0
		; AVX512VLDQ-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,1,2,3]
		; AVX512VLDQ-NEXT: retq
%e = extractelement <4 x i32> %x, i32 3		%e = extractelement <4 x i32> %x, i32 3
%r = uitofp i32 %e to float		%r = uitofp i32 %e to float
ret float %r		ret float %r
}		}

define double @extract3_uitofp_v4i32_f64(<4 x i32> %x) nounwind {		define double @extract3_uitofp_v4i32_f64(<4 x i32> %x) nounwind {
; SSE2-LABEL: extract3_uitofp_v4i32_f64:		; SSE2-LABEL: extract3_uitofp_v4i32_f64:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
Show All 11 Lines
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; VEX-LABEL: extract3_uitofp_v4i32_f64:		; VEX-LABEL: extract3_uitofp_v4i32_f64:
; VEX: # %bb.0:		; VEX: # %bb.0:
; VEX-NEXT: vextractps $3, %xmm0, %eax		; VEX-NEXT: vextractps $3, %xmm0, %eax
; VEX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0		; VEX-NEXT: vcvtsi2sdq %rax, %xmm1, %xmm0
; VEX-NEXT: retq		; VEX-NEXT: retq
;		;
; AVX512-LABEL: extract3_uitofp_v4i32_f64:		; AVX512F-LABEL: extract3_uitofp_v4i32_f64:
; AVX512: # %bb.0:		; AVX512F: # %bb.0:
; AVX512-NEXT: vextractps $3, %xmm0, %eax		; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; AVX512-NEXT: vcvtusi2sdl %eax, %xmm1, %xmm0		; AVX512F-NEXT: vcvtudq2pd %ymm0, %zmm0
; AVX512-NEXT: retq		; AVX512F-NEXT: vextractf128 $1, %ymm0, %xmm0
		; AVX512F-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512VL-LABEL: extract3_uitofp_v4i32_f64:
		; AVX512VL: # %bb.0:
		; AVX512VL-NEXT: vcvtudq2pd %xmm0, %ymm0
		; AVX512VL-NEXT: vextractf128 $1, %ymm0, %xmm0
		; AVX512VL-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
		; AVX512VL-NEXT: vzeroupper
		; AVX512VL-NEXT: retq
		;
		; AVX512DQ-LABEL: extract3_uitofp_v4i32_f64:
		; AVX512DQ: # %bb.0:
		; AVX512DQ-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
		; AVX512DQ-NEXT: vcvtudq2pd %ymm0, %zmm0
		; AVX512DQ-NEXT: vextractf128 $1, %ymm0, %xmm0
		; AVX512DQ-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
		; AVX512DQ-NEXT: vzeroupper
		; AVX512DQ-NEXT: retq
		;
		; AVX512VLDQ-LABEL: extract3_uitofp_v4i32_f64:
		; AVX512VLDQ: # %bb.0:
		; AVX512VLDQ-NEXT: vcvtudq2pd %xmm0, %ymm0
		; AVX512VLDQ-NEXT: vextractf128 $1, %ymm0, %xmm0
		; AVX512VLDQ-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
		; AVX512VLDQ-NEXT: vzeroupper
		; AVX512VLDQ-NEXT: retq
%e = extractelement <4 x i32> %x, i32 3		%e = extractelement <4 x i32> %x, i32 3
%r = uitofp i32 %e to double		%r = uitofp i32 %e to double
ret double %r		ret double %r
}		}

Revision Contents

Diff 182087

include/llvm/CodeGen/TargetLowering.h

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/Target/X86/X86ISelLowering.h

lib/Target/X86/X86ISelLowering.cpp

test/CodeGen/X86/known-bits-vector.ll

test/CodeGen/X86/known-signbits-vector.ll

test/CodeGen/X86/vec_int_to_fp.ll
