This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Vectorize v2i32 to v2f64 conversions
Closed (Public)

Authored by RKSimon on Jun 14 2015, 8:33 AM.


Details

Reviewers: spatel, qcolombet, delena, andreadb

Commits

rGcae7b94cbd9d: [X86][SSE] Vectorize v2i32 to v2f64 conversions
rL239855: [X86][SSE] Vectorize v2i32 to v2f64 conversions

Summary

This patch lets signed v2i32 to v2f64 conversions (sitofp) use the CVTDQ2PD xmm instruction and stay on the SSE unit, instead of scalarizing, sign-extending each element to i64, and using scalar CVTSI2SDQ conversions.
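To make the summary concrete, here is a minimal, portable C++ model of the two code paths. It is not LLVM code; the function names are hypothetical and 128-bit registers are modelled as `std::array<std::int32_t, 4>`. `scalarPath` mirrors the old sequence (sign-extend each lane to i64, as cltq/movslq do, then a 64-bit int-to-double conversion, as CVTSI2SDQ does), and `cvtdq2pdModel` mirrors what CVTDQ2PD computes: the low two i32 lanes converted to two doubles. Because every 32-bit integer is exactly representable in a double (53-bit significand), the two paths should agree for every input, which is why the vector instruction is a safe replacement.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Old lowering, modelled: scalarize, sign-extend each i32 lane to i64
// (cltq / movslq), convert each i64 with a scalar 64-bit conversion
// (cvtsi2sdq), then reassemble the two doubles (unpcklpd).
std::array<double, 2> scalarPath(const std::array<std::int32_t, 4>& xmm) {
  std::array<double, 2> out{};
  for (int lane = 0; lane < 2; ++lane) {
    std::int64_t widened = static_cast<std::int64_t>(xmm[lane]);
    out[lane] = static_cast<double>(widened);
  }
  return out;
}

// New lowering, modelled: CVTDQ2PD reads only lanes 0 and 1 of the source
// register and converts each signed i32 directly to f64.
std::array<double, 2> cvtdq2pdModel(const std::array<std::int32_t, 4>& xmm) {
  return {static_cast<double>(xmm[0]), static_cast<double>(xmm[1])};
}

// True if both paths produce equal values for the same input register.
bool pathsAgree(const std::array<std::int32_t, 4>& xmm) {
  return scalarPath(xmm) == cvtdq2pdModel(xmm);
}
```

The extreme values INT32_MIN and INT32_MAX are the interesting inputs: they are where a narrower destination type (for example f32) would start rounding, but f64 still represents them exactly.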

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 27645. · Jun 14 2015, 8:33 AM

RKSimon retitled this revision to [X86][SSE] Vectorize v2i32 to v2f64 conversions.

RKSimon updated this object.

RKSimon edited the test plan for this revision.

RKSimon added reviewers: qcolombet, delena, andreadb, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: Unknown Object (MLST).

Hi Simon,

I am not sure this is the best way to fix this issue.
In particular, I wonder if there is an alternative approach that doesn't involve adding a new target opcode.

At least on AVX, you could add a canonicalization rule that converts the following DAG node sequence:

v4i32: A = ...
v2i32: B = extract_subvector A, 0
v2f64: C = sint_to_fp B

into:

v4i32: A = ...
v4f64: B = sint_to_fp A
v2f64: C = extract_subvector B, 0

Then, I think you can add an ISel pattern to match a VCVTDQ2PDrr from:

(v2f64 (extract_subvector (v4f64 (sint_to_fp v4i32:%V)), 0))
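A small C++ model can help check that the proposed canonicalization preserves values. It treats vectors as `std::array` and implements lane-wise SINT_TO_FP and EXTRACT_SUBVECTOR. The function names are hypothetical stand-ins for the DAG nodes, not LLVM APIs. Because the conversion is lane-wise, converting all four lanes and then keeping lanes 0-1 should yield the same two doubles as narrowing first. The sketch says nothing about legality or profitability; as noted above, this rewrite alone does not help non-AVX targets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-wise SINT_TO_FP: each signed i32 lane becomes an f64 lane.
template <std::size_t N>
std::array<double, N> sintToFp(const std::array<std::int32_t, N>& v) {
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = static_cast<double>(v[i]);
  return out;
}

// EXTRACT_SUBVECTOR: M consecutive lanes of `v` starting at lane `idx`.
template <std::size_t M, typename T, std::size_t N>
std::array<T, M> extractSubvector(const std::array<T, N>& v, std::size_t idx) {
  assert(idx + M <= N && "subvector out of range");
  std::array<T, M> out{};
  for (std::size_t i = 0; i < M; ++i) out[i] = v[idx + i];
  return out;
}

// Before: C = sint_to_fp (extract_subvector A, 0)    (v2i32 -> v2f64)
std::array<double, 2> narrowThenConvert(const std::array<std::int32_t, 4>& a) {
  return sintToFp(extractSubvector<2>(a, 0));
}

// After: C = extract_subvector (sint_to_fp A), 0      (v4i32 -> v4f64 -> v2f64)
std::array<double, 2> convertThenNarrow(const std::array<std::int32_t, 4>& a) {
  return extractSubvector<2>(sintToFp(a), 0);
}
```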

Unfortunately, the combine rule above would not fix the problem for non-AVX targets.
On those targets you will end up with a DAG that looks like this:
v2f64 = build_vector (f64 (sint_to_fp i32:A)), (f64 (sint_to_fp i32:B))

Where:

A: i32 = extract_vector_elt %InVec, i64 0
B: i32 = extract_vector_elt %InVec, i64 1

I am not sure if this would be a good approach, but I think one way to fix this is to add a (quite long) ISel pattern to match that sequence and select a CVTDQ2PDrr.
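The "quite long" ISel pattern described above is essentially a structural match. The sketch below uses a toy, hypothetical `Node` type (not LLVM's `SDNode`, and not TableGen) to show what such a matcher has to verify: a two-element build_vector whose lanes are sint_to_fp of extract_vector_elt at indices 0 and 1 of the same source vector, in that order. Value-type checks (i32 elements, f64 results) are deliberately omitted here; a real pattern would need them.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy DAG with only the opcodes this pattern needs.
enum class Op { Input, ExtractVectorElt, SintToFp, BuildVector };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Op op = Op::Input;
  std::vector<NodePtr> operands;
  int index = -1;  // lane index; meaningful only for ExtractVectorElt
};

NodePtr makeNode(Op op, std::vector<NodePtr> operands, int index = -1) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->operands = std::move(operands);
  n->index = index;
  return n;
}

// If `n` is (sint_to_fp (extract_vector_elt V, lane)), return V.
NodePtr matchConvertedLane(const NodePtr& n, int lane) {
  if (!n || n->op != Op::SintToFp || n->operands.size() != 1)
    return nullptr;
  const NodePtr& ext = n->operands[0];
  if (!ext || ext->op != Op::ExtractVectorElt || ext->index != lane ||
      ext->operands.size() != 1)
    return nullptr;
  return ext->operands[0];
}

// Match
//   build_vector (sint_to_fp (extract_vector_elt V, 0)),
//                (sint_to_fp (extract_vector_elt V, 1))
// and return V (the register CVTDQ2PD would read), or nullptr on mismatch.
NodePtr matchCvtdq2pd(const NodePtr& n) {
  if (!n || n->op != Op::BuildVector || n->operands.size() != 2)
    return nullptr;
  NodePtr lo = matchConvertedLane(n->operands[0], 0);
  NodePtr hi = matchConvertedLane(n->operands[1], 1);
  // Both lanes must be extracted, in order, from the same source vector.
  if (!lo || lo != hi)
    return nullptr;
  return lo;
}
```

Swapped lanes or lanes taken from two different vectors must be rejected, since CVTDQ2PD always converts lanes 0 and 1 of a single register in place.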

I hope it helps.
Andrea

test/CodeGen/X86/vec_int_to_fp.ll
Line 11 (on Diff #27645): I know that this is unrelated to your patch, but I noticed that on SSE2 this i64 extract element has been expanded to 'movd'. Shouldn't this be a 'movq' instead?
Line 14 (on Diff #27645): Same as above. Although this is unrelated to your patch, I think this should be 'movq'. Otherwise, we end up losing the upper half of the input quadword.
Line 26 (on Diff #27645): Again, this is unrelated to your patch, but this vxorps seems redundant. I haven't looked at the code, but I suspect that this may be caused by a sub-optimal build_vector lowering.
test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll
Lines 30–31 (on Diff #27645): You can get rid of that FIXME since you fixed it with this patch :-)

Hi Simon,

Looks good to me.

Do we want to do something similar for i32 -> PS conversions using CVTPI2PS?

That would require using MMX, so it may not be desirable, though.

Just something to experiment with ;).

Cheers,
-Quentin

This revision is now accepted and ready to land. · Jun 15 2015, 11:46 AM

PS: Shouldn’t we update the vectorizer cost model as well?

Thanks guys for the reviews.

Andrea - I did investigate TableGen / ISel pattern approaches but couldn't make anything work: TableGen doesn't like the fact that v2i32 isn't a legal type. I believe that's why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name, 'SINT_TO_FPEXT', but since cvtdq2pd appears to be the only user of this pattern it didn't seem necessary.
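Because v2i32 is not a legal type, the new LowerSINT_TO_FP path in this diff widens the source with CONCAT_VECTORS (Src, UNDEF) to v4i32 before emitting X86ISD::CVTDQ2PD. The following C++ model (hypothetical names, plain arrays instead of SDValues) illustrates why the UNDEF upper half is harmless: the instruction reads only lanes 0 and 1, so whatever ends up in lanes 2 and 3 cannot affect the result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2I32 = std::array<std::int32_t, 2>;
using V4I32 = std::array<std::int32_t, 4>;
using V2F64 = std::array<double, 2>;

// CONCAT_VECTORS Src, Filler -> v4i32. In the DAG the filler is UNDEF,
// so the lowering makes no promise about what the upper lanes hold.
V4I32 concatVectors(const V2I32& src, const V2I32& filler) {
  return {src[0], src[1], filler[0], filler[1]};
}

// X86ISD::CVTDQ2PD (v4i32 -> v2f64), modelled: only lanes 0 and 1 are read.
V2F64 cvtdq2pd(const V4I32& v) {
  return {static_cast<double>(v[0]), static_cast<double>(v[1])};
}

// Same shape as the new v2i32 -> v2f64 path in LowerSINT_TO_FP:
// widen with an arbitrary upper half, then convert.
V2F64 lowerSintToFpV2I32(const V2I32& src, const V2I32& filler) {
  return cvtdq2pd(concatVectors(src, filler));
}
```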

Quentin - yes, the MMX instructions (CVTPI2PD and CVTPI2PS) could work in a similar fashion. Are we actively adding MMX/3DNow! lowering? I always thought they were only reachable through their builtin intrinsics.

I'll add the vectorization costs (+ a test) as part of the submission (or possibly as a follow-up).

test/CodeGen/X86/vec_int_to_fp.ll
Line 11 (on Diff #27645): This has come up before - I was sure we made a bugzilla for this but can't find it (Sanjay, can you remember?).
Line 26 (on Diff #27645): I'll see if I can find out what's causing it - odd that xmm0 had just been used in exactly the same type of instruction without clearing the upper bits.

In D10433#188307, @RKSimon wrote:

> Andrea - I did investigate TableGen / ISel pattern approaches but couldn't manage to make anything work - it doesn't like the fact that v2i32 isn't valid. I believe it's why the X86ISD VFPEXT and VFPROUND node types were added in a similar manner. I considered giving the node a more general name 'SINT_TO_FPEXT' but given that cvtdq2pd appears to be the only user of this pattern it didn't seem necessary.

Thanks for pointing it out.
I see how FP_EXTEND and FP_ROUND are marked as 'Custom' on v2f32.

The patch looks good to me too.


> Are we actively adding MMX/3DNow! lowering?

No, not particularly, that was just a thought :).

Closed by commit rL239855: [X86][SSE] Vectorize v2i32 to v2f64 conversions (authored by RKSimon). · Jun 16 2015, 2:44 PM

This revision was automatically updated to reflect the committed changes.

Thanks guys - I'll add vectorization costs/tests in the next day or so.

RKSimon mentioned this in rL239966: [X86][SSE] Improved support for vector i16 to float conversions. · Jun 17 2015, 3:48 PM

RKSimon mentioned this in rL241394: [X86][SSE] Improved i8/i16 to f64 uint2fp vector conversions. · Jul 4 2015, 8:33 AM

Revision Contents

llvm/trunk/lib/Target/X86/X86ISelLowering.h: 3 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp: 15 lines
llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td: 3 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td: 15 lines
llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll: 71 lines
llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll: 8 lines

Diff 27782

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ 287 lines hidden @@ enum NodeType : unsigned {
   VTRUNCM,

   // Vector FP extend.
   VFPEXT,

   // Vector FP round.
   VFPROUND,

+  // Vector signed integer to double.
+  CVTDQ2PD,
+
   // 128-bit vector logical left / right shift
   VSHLDQ, VSRLDQ,

   // Vector shift elements
   VSHL, VSRL, VSRA,

   // Vector shift elements by immediate
   VSHLI, VSRLI, VSRAI,
@@ 818 lines hidden @@

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


@@ 909 lines hidden @@ if (!Subtarget->useSoftFloat() && Subtarget->hasSSE2()) {
   setOperationAction(ISD::LOAD, MVT::v2f64, Legal);
   setOperationAction(ISD::LOAD, MVT::v2i64, Legal);
   setOperationAction(ISD::SELECT, MVT::v2f64, Custom);
   setOperationAction(ISD::SELECT, MVT::v2i64, Custom);

   setOperationAction(ISD::FP_TO_SINT, MVT::v4i32, Legal);
   setOperationAction(ISD::SINT_TO_FP, MVT::v4i32, Legal);

+  setOperationAction(ISD::SINT_TO_FP, MVT::v2i32, Custom);
+
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i8, Custom);
   setOperationAction(ISD::UINT_TO_FP, MVT::v4i16, Custom);
   // As there is no 64-bit GPR available, we need build a special custom
   // sequence to convert from v2i32 to v2f32.
   if (!Subtarget->is64Bit())
     setOperationAction(ISD::UINT_TO_FP, MVT::v2f32, Custom);

   setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom);
@@ 10,717 lines hidden @@ static SDValue LowerShiftParts(SDValue Op, SelectionDAG &DAG) {
   }

   SDValue Ops[2] = { Lo, Hi };
   return DAG.getMergeValues(Ops, dl);
 }

 SDValue X86TargetLowering::LowerSINT_TO_FP(SDValue Op,
                                            SelectionDAG &DAG) const {
-  MVT SrcVT = Op.getOperand(0).getSimpleValueType();
+  SDValue Src = Op.getOperand(0);
+  MVT SrcVT = Src.getSimpleValueType();
+  MVT VT = Op.getSimpleValueType();
   SDLoc dl(Op);

   if (SrcVT.isVector()) {
+    if (SrcVT == MVT::v2i32 && VT == MVT::v2f64) {
+      return DAG.getNode(X86ISD::CVTDQ2PD, dl, VT,
+                         DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v4i32, Src,
+                                     DAG.getUNDEF(SrcVT)));
+    }
     if (SrcVT.getVectorElementType() == MVT::i1) {
       MVT IntegerVT = MVT::getVectorVT(MVT::i32, SrcVT.getVectorNumElements());
       return DAG.getNode(ISD::SINT_TO_FP, dl, Op.getValueType(),
-                         DAG.getNode(ISD::SIGN_EXTEND, dl, IntegerVT,
-                                     Op.getOperand(0)));
+                         DAG.getNode(ISD::SIGN_EXTEND, dl, IntegerVT, Src));
     }
     return SDValue();
   }

   assert(SrcVT <= MVT::i64 && SrcVT >= MVT::i16 &&
          "Unknown SINT_TO_FP to lower!");

   // These are really Legal; return the operand so the caller accepts it as
@@ 6,825 lines hidden @@ const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
   case X86ISD::VZEXT_LOAD: return "X86ISD::VZEXT_LOAD";
   case X86ISD::VZEXT: return "X86ISD::VZEXT";
   case X86ISD::VSEXT: return "X86ISD::VSEXT";
   case X86ISD::VTRUNC: return "X86ISD::VTRUNC";
   case X86ISD::VTRUNCM: return "X86ISD::VTRUNCM";
   case X86ISD::VINSERT: return "X86ISD::VINSERT";
   case X86ISD::VFPEXT: return "X86ISD::VFPEXT";
   case X86ISD::VFPROUND: return "X86ISD::VFPROUND";
+  case X86ISD::CVTDQ2PD: return "X86ISD::CVTDQ2PD";
   case X86ISD::VSHLDQ: return "X86ISD::VSHLDQ";
   case X86ISD::VSRLDQ: return "X86ISD::VSRLDQ";
   case X86ISD::VSHL: return "X86ISD::VSHL";
   case X86ISD::VSRL: return "X86ISD::VSRL";
   case X86ISD::VSRA: return "X86ISD::VSRA";
   case X86ISD::VSHLI: return "X86ISD::VSHLI";
   case X86ISD::VSRLI: return "X86ISD::VSRLI";
   case X86ISD::VSRAI: return "X86ISD::VSRAI";
@@ 7,295 lines hidden @@

llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td

@@ 66 lines hidden @@
 def X86fhadd : SDNode<"X86ISD::FHADD", SDTFPBinOp>;
 def X86fhsub : SDNode<"X86ISD::FHSUB", SDTFPBinOp>;
 def X86hadd : SDNode<"X86ISD::HADD", SDTIntBinOp>;
 def X86hsub : SDNode<"X86ISD::HSUB", SDTIntBinOp>;
 def X86comi : SDNode<"X86ISD::COMI", SDTX86CmpTest>;
 def X86ucomi : SDNode<"X86ISD::UCOMI", SDTX86CmpTest>;
 def X86cmps : SDNode<"X86ISD::FSETCC", SDTX86Cmps>;
 //def X86cmpsd : SDNode<"X86ISD::FSETCCsd", SDTX86Cmpsd>;
+def X86cvtdq2pd: SDNode<"X86ISD::CVTDQ2PD",
+                  SDTypeProfile<1, 1, [SDTCisVT<0, v2f64>,
+                                       SDTCisVT<1, v4i32>]>>;
 def X86pshufb : SDNode<"X86ISD::PSHUFB",
                  SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
                                       SDTCisSameAs<0,2>]>>;
 def X86psadbw : SDNode<"X86ISD::PSADBW",
                  SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
                                       SDTCisSameAs<0,2>]>>;
 def X86andnp : SDNode<"X86ISD::ANDNP",
                  SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
@@ 678 lines hidden @@

llvm/trunk/lib/Target/X86/X86InstrSSE.td


@@ 2,228 lines hidden @@
 def CVTDQ2PDrm : S2SI<0xE6, MRMSrcMem, (outs VR128:$dst), (ins i64mem:$src),
                       "cvtdq2pd\t{$src, $dst|$dst, $src}", [],
                       IIC_SSE_CVT_PD_RR>, Sched<[WriteCvtI2FLd]>;
 def CVTDQ2PDrr : S2SI<0xE6, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
                       "cvtdq2pd\t{$src, $dst|$dst, $src}",
                       [(set VR128:$dst, (int_x86_sse2_cvtdq2pd VR128:$src))],
                       IIC_SSE_CVT_PD_RM>, Sched<[WriteCvtI2F]>;

-// AVX 256-bit register conversion intrinsics
+// AVX register conversion intrinsics
 let Predicates = [HasAVX] in {
+  def : Pat<(v2f64 (X86cvtdq2pd (v4i32 VR128:$src))),
+            (VCVTDQ2PDrr VR128:$src)>;
+  def : Pat<(v2f64 (X86cvtdq2pd (bc_v4i32 (loadv2i64 addr:$src)))),
+            (VCVTDQ2PDrm addr:$src)>;
+
   def : Pat<(v4f64 (sint_to_fp (v4i32 VR128:$src))),
             (VCVTDQ2PDYrr VR128:$src)>;
   def : Pat<(v4f64 (sint_to_fp (bc_v4i32 (loadv2i64 addr:$src)))),
             (VCVTDQ2PDYrm addr:$src)>;
 } // Predicates = [HasAVX]

+// SSE2 register conversion intrinsics
+let Predicates = [HasSSE2] in {
+  def : Pat<(v2f64 (X86cvtdq2pd (v4i32 VR128:$src))),
+            (CVTDQ2PDrr VR128:$src)>;
+  def : Pat<(v2f64 (X86cvtdq2pd (bc_v4i32 (loadv2i64 addr:$src)))),
+            (CVTDQ2PDrm addr:$src)>;
+} // Predicates = [HasSSE2]
+
 // Convert packed double to packed single
 // The assembler can recognize rr 256-bit instructions by seeing a ymm
 // register, but the same isn't true when using memory operands instead.
 // Provide other assembly rr and rm forms to address this explicitly.
 def VCVTPD2PSrr : VPDI<0x5A, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
                        "cvtpd2ps\t{$src, $dst|$dst, $src}",
                        [(set VR128:$dst, (int_x86_sse2_cvtpd2ps VR128:$src))],
                        IIC_SSE_CVT_PD_RR>, VEX, Sched<[WriteCvtF2F]>;
@@ 6,636 lines hidden @@

llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll

@@ 28 lines hidden @@
 ; AVX-NEXT: retq
   %cvt = sitofp <2 x i64> %a to <2 x double>
   ret <2 x double> %cvt
 }

 define <2 x double> @sitofp_2vf64_i32(<4 x i32> %a) {
 ; SSE2-LABEL: sitofp_2vf64_i32:
 ; SSE2: # BB#0:
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; SSE2-NEXT: movd %xmm1, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: movd %xmm0, %rcx
-; SSE2-NEXT: movslq %ecx, %rcx
-; SSE2-NEXT: xorps %xmm0, %xmm0
-; SSE2-NEXT: cvtsi2sdq %rcx, %xmm0
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
-; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
+; SSE2-NEXT: cvtdq2pd %xmm0, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: sitofp_2vf64_i32:
 ; AVX: # BB#0:
-; AVX-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
-; AVX-NEXT: vmovq %xmm0, %rax
-; AVX-NEXT: cltq
-; AVX-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX-NEXT: movslq %ecx, %rcx
-; AVX-NEXT: vxorps %xmm0, %xmm0, %xmm0
-; AVX-NEXT: vcvtsi2sdq %rcx, %xmm0, %xmm0
-; AVX-NEXT: vcvtsi2sdq %rax, %xmm0, %xmm1
-; AVX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0]
+; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0
 ; AVX-NEXT: retq
   %shuf = shufflevector <4 x i32> %a, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
   %cvt = sitofp <2 x i32> %shuf to <2 x double>
   ret <2 x double> %cvt
 }

 define <2 x double> @sitofp_2vf64_i16(<8 x i16> %a) {
 ; SSE2-LABEL: sitofp_2vf64_i16:
@@ 103 lines hidden @@
 ; AVX-NEXT: retq
   %cvt = sitofp <4 x i64> %a to <4 x double>
   ret <4 x double> %cvt
 }

 define <4 x double> @sitofp_4vf64_i32(<4 x i32> %a) {
 ; SSE2-LABEL: sitofp_4vf64_i32:
 ; SSE2: # BB#0:
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,1,3]
-; SSE2-NEXT: movd %xmm1, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm2
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
-; SSE2-NEXT: movd %xmm1, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
-; SSE2-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
-; SSE2-NEXT: movd %xmm0, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
+; SSE2-NEXT: cvtdq2pd %xmm0, %xmm2
 ; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
-; SSE2-NEXT: movd %xmm0, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm0, %xmm0
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
-; SSE2-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm0[0]
-; SSE2-NEXT: movapd %xmm2, %xmm0
+; SSE2-NEXT: cvtdq2pd %xmm0, %xmm1
+; SSE2-NEXT: movaps %xmm2, %xmm0
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: sitofp_4vf64_i32:
 ; AVX: # BB#0:
 ; AVX-NEXT: vcvtdq2pd %xmm0, %ymm0
 ; AVX-NEXT: retq
   %cvt = sitofp <4 x i32> %a to <4 x double>
   ret <4 x double> %cvt
@@ 42 lines hidden @@
 }

 define <4 x double> @sitofp_4vf64_i8(<16 x i8> %a) {
 ; SSE2-LABEL: sitofp_4vf64_i8:
 ; SSE2: # BB#0:
 ; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
 ; SSE2-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1],xmm1[2],xmm0[2],xmm1[3],xmm0[3]
 ; SSE2-NEXT: psrad $24, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[0,1,1,3]
-; SSE2-NEXT: movd %xmm2, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm0, %xmm0
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
-; SSE2-NEXT: movd %xmm2, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm2, %xmm2
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm2
-; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm2[0]
-; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,2,3,3]
-; SSE2-NEXT: movd %xmm2, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
-; SSE2-NEXT: movd %xmm2, %rax
-; SSE2-NEXT: cltq
-; SSE2-NEXT: xorps %xmm2, %xmm2
-; SSE2-NEXT: cvtsi2sdq %rax, %xmm2
-; SSE2-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm2[0]
+; SSE2-NEXT: cvtdq2pd %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
+; SSE2-NEXT: cvtdq2pd %xmm1, %xmm1
 ; SSE2-NEXT: retq
 ;
 ; AVX-LABEL: sitofp_4vf64_i8:
 ; AVX: # BB#0:
 ; AVX-NEXT: vpmovsxbd %xmm0, %xmm0
 ; AVX-NEXT: vcvtdq2pd %xmm0, %ymm0
 ; AVX-NEXT: retq
   %shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ 932 lines hidden @@

llvm/trunk/test/CodeGen/X86/x86-setcc-int-to-fp-combine.ll

@@ 21 lines hidden @@
 ; the folded nodes.
 define void @foo1(<4 x float> %val, <4 x float> %test, <4 x double>* %p) nounwind {
 ; CHECK-LABEL: LCPI1_0:
 ; CHECK-NEXT: .long 1 ## 0x1
 ; CHECK-NEXT: .long 1 ## 0x1
 ; CHECK-NEXT: .long 1 ## 0x1
 ; CHECK-NEXT: .long 1 ## 0x1
 ; CHECK-LABEL: foo1:
-; FIXME: The operation gets scalarized. If/when the compiler learns to better
-; use [V]CVTDQ2PD, this will need updated.
-; CHECK: cvtsi2sdq
-; CHECK: cvtsi2sdq
-; CHECK: cvtsi2sdq
-; CHECK: cvtsi2sdq
+; CHECK: cvtdq2pd
+; CHECK: cvtdq2pd
   %cmp = fcmp oeq <4 x float> %val, %test
   %ext = zext <4 x i1> %cmp to <4 x i32>
   %result = sitofp <4 x i32> %ext to <4 x double>
   store <4 x double> %result, <4 x double>* %p
   ret void
 }

 ; Also test the general purpose constant folding of int->fp.
@@ 46 lines hidden @@