This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] sink target-supported cast op after concat vectors
Closed · Public

Authored by spatel on May 4 2020, 12:46 PM.


Details

Reviewers

craig.topper
RKSimon
lebedev.ri

Commits

rG2f1fe1864d25: [DAGCombiner] sink target-supported FP<->int cast op after concat vectors

Summary

Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)
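Every cast opcode the patch accepts operates lane by lane, so casting before or after the concatenation produces the same lanes. The combine only changes how many cast nodes remain and how wide they are. Below is a minimal C++ model of that equivalence for sint_to_fp. It uses std::vector in place of SelectionDAG vectors, and castSIToFP, concat, narrowForm and wideForm are hypothetical helpers, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of ISD::SINT_TO_FP applied lane by lane (not an LLVM API).
std::vector<float> castSIToFP(const std::vector<int32_t> &V) {
  std::vector<float> R;
  R.reserve(V.size());
  for (int32_t X : V)
    R.push_back(static_cast<float>(X));
  return R;
}

// Hypothetical model of ISD::CONCAT_VECTORS: operands are appended in order.
template <typename T>
std::vector<T> concat(const std::vector<std::vector<T>> &Ops) {
  std::vector<T> R;
  for (const auto &Op : Ops)
    R.insert(R.end(), Op.begin(), Op.end());
  return R;
}

// concat (cast X), (cast Y) ...
std::vector<float> narrowForm(const std::vector<std::vector<int32_t>> &Srcs) {
  std::vector<std::vector<float>> Casts;
  for (const auto &S : Srcs)
    Casts.push_back(castSIToFP(S));
  return concat(Casts);
}

// cast (concat X, Y ...)
std::vector<float> wideForm(const std::vector<std::vector<int32_t>> &Srcs) {
  return castSIToFP(concat(Srcs));
}
```

For example, narrowForm({{1, -2}, {3, 4}}) and wideForm({{1, -2}, {3, 4}}) both yield {1.0f, -2.0f, 3.0f, 4.0f}. The wide form needs only one cast, which is what the combine exploits when the target supports it.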

This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794

As noted in the code comment, this is uglier than I was hoping because the opcode determines whether we pass the source or destination type to isOperationLegalOrCustom(). Also, IIUC, that query has no way to validate the other (dest or src) type. Without an extra legality check on that other type, the ARM regression test test/CodeGen/ARM/isel-v8i32-crash.ll crashes while trying to lower an unsupported v8f32 to v8i16 conversion.
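The legality switch in the patch handles the opcodes differently. For SINT_TO_FP/UINT_TO_FP it keys the isOperationLegalOrCustom() query on ConcatSrcVT. For FP_TO_SINT/FP_TO_UINT it keys the query on the result type VT. In both cases it then checks the remaining type with isTypeLegal(). Below is a hypothetical C++ sketch of that selection with types reduced to strings. CastOp, CastTypes, operationQueryType and otherType are illustrative names, not LLVM APIs:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the four cast opcodes the patch accepts.
enum class CastOp { SIntToFP, UIntToFP, FPToSInt, FPToUInt };

// Hypothetical stand-in for the (source, result) EVT pair of a cast.
struct CastTypes {
  std::string Src; // e.g. "v8f32"
  std::string Dst; // e.g. "v8i16"
};

// Type the operation-action query is keyed on: the integer source for
// int->fp casts, the integer result for fp->int casts (mirrors the switch).
std::string operationQueryType(CastOp Op, const CastTypes &T) {
  switch (Op) {
  case CastOp::SIntToFP:
  case CastOp::UIntToFP:
    return T.Src;
  case CastOp::FPToSInt:
  case CastOp::FPToUInt:
    return T.Dst;
  }
  return T.Src; // not reached for valid enumerators
}

// The remaining type, which needs its own isTypeLegal()-style check.
std::string otherType(CastOp Op, const CastTypes &T) {
  bool KeyedOnSrc = Op == CastOp::SIntToFP || Op == CastOp::UIntToFP;
  return KeyedOnSrc ? T.Dst : T.Src;
}
```

Take the ARM case, a v8f32 to v8i16 fp-to-int cast. Here the operation query only sees v8i16, so v8f32 has to be rejected by the separate type check.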

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 4 2020, 12:46 PM

Herald added a project: Restricted Project. May 4 2020, 12:46 PM

Herald added subscribers: hiraditya, kristof.beyls, mcrosier.

craig.topper added inline comments. May 4 2020, 1:16 PM

llvm/test/CodeGen/X86/avx-shift.ll
177	Do we do the best thing if the shl is used by another operation that needs to be split? Do we keep the vcvttps2dq split?

spatel mentioned this in rG58c1770b8fb1: [x86] add test for shift+op+concat; NFC. May 4 2020, 2:33 PM

Patch updated:
Added another AVX split/concat test - no diff here.

spatel marked an inline comment as done. May 4 2020, 2:36 PM

spatel added inline comments.

llvm/test/CodeGen/X86/avx-shift.ll
177	Does the next test (vshift08_add) cover the scenario you're thinking of? There's no difference on that one because the concat isn't directly after the cast.

craig.topper added inline comments. May 4 2020, 3:53 PM

llvm/test/CodeGen/X86/avx-shift.ll
177	I think it does. Let me see if this works the way I think it does. LegalizeVectorOps legalizes operands before their users, so the shift gets lowered first. When the shift is lowered, it should split and produce a concat. Then each half of the split should get legalized. Then the add gets legalized, which produces another split. Will getNode for the extracts of that split look through the concat produced by the shift, leaving that concat dead? And will a new concat then be produced for the add split?
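Craig's hypothesis relies on a fold in SelectionDAG::getNode. When the EXTRACT_SUBVECTOR operand is a CONCAT_VECTORS whose operands already have the extracted type, getNode returns the matching concat operand directly. Below is a toy C++ model of that look-through under simplifying assumptions. Nodes are plain integer ids, the extract index is an element offset, and ConcatNode and foldExtractOfConcat are hypothetical names, not LLVM APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy node: a concat of equally sized operands, by node id.
struct ConcatNode {
  std::vector<int> Parts;   // ids of the concat operands
  std::size_t PartNumElts;  // element count of each operand
};

// Returns the id of the concat operand that an extract of ExtractNumElts
// elements starting at element Idx maps onto, or nullopt if the extract
// does not line up with exactly one operand.
std::optional<int> foldExtractOfConcat(const ConcatNode &C, std::size_t Idx,
                                       std::size_t ExtractNumElts) {
  if (ExtractNumElts != C.PartNumElts || Idx % C.PartNumElts != 0)
    return std::nullopt;
  std::size_t Part = Idx / C.PartNumElts;
  if (Part >= C.Parts.size())
    return std::nullopt;
  return C.Parts[Part];
}
```

With the node ids from the debug output that follows, extracting 4 elements at offset 0 or 4 from "t20 = concat_vectors t18, t19" resolves to t18 or t19. That leaves t20 dead once nothing else uses it.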

spatel marked an inline comment as done. May 5 2020, 5:23 AM

spatel added inline comments.

llvm/test/CodeGen/X86/avx-shift.ll

177

I couldn't visualize it without looking at debug output, but that looks about right to me:

Legalizing vector op: t7: v8i32 = shl t6, t2
-->
...
Creating new node: t18: v4i32 = shl t13, t15
Creating new node: t19: v4i32 = shl t13, t17
Creating new node: t20: v8i32 = concat_vectors t18, t19

Legalizing vector op: t18: v4i32 = shl t13, t15
--> 
...
Creating new node: t28: v4i32 = fp_to_sint t27
Creating new node: t29: v4i32 = mul t13, t28

Legalizing vector op: t8: v8i32 = add t20, t4
-->
...
Creating new node: t40: v4i32 = add t29, t38
Creating new node: t41: v4i32 = add t36, t39
Creating new node: t42: v8i32 = concat_vectors t40, t41

So the add is already directly using the 128-bit "t29" mul node. And we only show the final concat here - "t20" is gone:

Vector-legalized selection DAG: %bb.0 'vshift08:'
SelectionDAG has 33 nodes:
  t0: ch = EntryToken
  t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
  t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
                  t15: v4i32 = extract_subvector t2, Constant:i64<0>
                t31: v4i32 = X86ISD::VSHLI t15, TargetConstant:i8<23>
              t26: v4i32 = add t31, t25
            t27: v4f32 = bitcast t26
          t28: v4i32 = fp_to_sint t27
        t29: v4i32 = mul t13, t28
        t38: v4i32 = extract_subvector t4, Constant:i64<0>
      t40: v4i32 = add t29, t38
                  t17: v4i32 = extract_subvector t2, Constant:i64<4>
                t37: v4i32 = X86ISD::VSHLI t17, TargetConstant:i8<23>
              t33: v4i32 = add t37, t25
            t34: v4f32 = bitcast t33
          t35: v4i32 = fp_to_sint t34
        t36: v4i32 = mul t13, t35
        t39: v4i32 = extract_subvector t4, Constant:i64<4>
      t41: v4i32 = add t36, t39
    t42: v8i32 = concat_vectors t40, t41
  t11: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t42
  t13: v4i32 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
  t25: v4i32 = BUILD_VECTOR Constant:i32<1065353216>, Constant:i32<1065353216>, Constant:i32<1065353216>, Constant:i32<1065353216>
  t12: ch = X86ISD::RET_FLAG t11, TargetConstant:i32<0>, Register:v8i32 $ymm0, t11:1
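In the dump, each 128-bit half of "shl <1,1,1,1>, %a" is computed without a variable shift, in four steps:

- X86ISD::VSHLI by 23 moves each shift amount into the float exponent field.
- Adding t25 (1065353216, i.e. 0x3F800000, the bit pattern of 1.0f) applies the exponent bias.
- The bitcast reinterprets the lane as the float 2^a.
- fp_to_sint (vcvttps2dq) turns it back into 1 << a, followed by the mul with the splat of 1 in t13.

That fp_to_sint is the cast this patch can now sink past the concat. Below is a scalar C++ sketch of one lane. It assumes IEEE-754 binary32 floats, and pow2ViaFloat and shlOneBy are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One lane of: VSHLI 23 -> add 0x3F800000 -> bitcast to f32 -> fp_to_sint.
// (Amt + 127) << 23 is the binary32 encoding of 2^Amt.
uint32_t pow2ViaFloat(uint32_t Amt) {
  uint32_t Bits = (Amt << 23) + 0x3F800000u; // 0x3F800000 == 1065353216 == 1.0f
  float F;
  std::memcpy(&F, &Bits, sizeof(F));         // models the bitcast node
  // Go through int64_t so Amt == 31 (2^31) is well-defined in C++; the
  // hardware cvttps2dq yields the same 0x80000000 bit pattern for it.
  return static_cast<uint32_t>(static_cast<int64_t>(F));
}

// shl 1, Amt becomes mul 1, pow2(Amt), as in "t29 = mul t13, t28".
uint32_t shlOneBy(uint32_t Amt) { return 1u * pow2ViaFloat(Amt); }
```

The sketch only covers amounts 0 through 31; larger amounts make the original shl poison anyway.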

Thanks for checking. LGTM

This revision is now accepted and ready to land. May 5 2020, 1:36 PM

Closed by commit rG2f1fe1864d25: [DAGCombiner] sink target-supported FP<->int cast op after concat vectors (authored by spatel). May 6 2020, 7:30 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	66 lines
llvm/test/CodeGen/X86/avx-shift.ll	9 lines
llvm/test/CodeGen/X86/concat-cast.ll	145 lines

Diff 262376

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[18,540 unchanged lines not shown]	for (SDValue Op : N->ops()) {
}		}
}		}

const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
return TLI.buildLegalVectorShuffle(VT, SDLoc(N), DAG.getBitcast(VT, SV0),		return TLI.buildLegalVectorShuffle(VT, SDLoc(N), DAG.getBitcast(VT, SV0),
DAG.getBitcast(VT, SV1), Mask, DAG);		DAG.getBitcast(VT, SV1), Mask, DAG);
}		}

		static SDValue combineConcatVectorOfCasts(SDNode *N, SelectionDAG &DAG) {
		  unsigned CastOpcode = N->getOperand(0).getOpcode();
		  switch (CastOpcode) {
		  case ISD::SINT_TO_FP:
		  case ISD::UINT_TO_FP:
		  case ISD::FP_TO_SINT:
		  case ISD::FP_TO_UINT:
		    // TODO: Allow more opcodes?
		    //  case ISD::BITCAST:
		    //  case ISD::TRUNCATE:
		    //  case ISD::ZERO_EXTEND:
		    //  case ISD::SIGN_EXTEND:
		    //  case ISD::FP_EXTEND:
		    break;
		  default:
		    return SDValue();
		  }

		  EVT SrcVT = N->getOperand(0).getOperand(0).getValueType();
		  if (!SrcVT.isVector())
		    return SDValue();

		  // All operands of the concat must be the same kind of cast from the same
		  // source type.
		  SmallVector<SDValue, 4> SrcOps;
		  for (SDValue Op : N->ops()) {
		    if (Op.getOpcode() != CastOpcode || !Op.hasOneUse() ||
		        Op.getOperand(0).getValueType() != SrcVT)
		      return SDValue();
		    SrcOps.push_back(Op.getOperand(0));
		  }

		  // The wider cast must be supported by the target. This is unusual because
		  // the operation support type parameter depends on the opcode. In addition,
		  // check the other type in the cast to make sure this is really legal.
		  EVT VT = N->getValueType(0);
		  EVT SrcEltVT = SrcVT.getVectorElementType();
		  unsigned NumElts = SrcVT.getVectorElementCount().Min * N->getNumOperands();
		  EVT ConcatSrcVT = EVT::getVectorVT(*DAG.getContext(), SrcEltVT, NumElts);
		  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		  switch (CastOpcode) {
		  case ISD::SINT_TO_FP:
		  case ISD::UINT_TO_FP:
		    if (!TLI.isOperationLegalOrCustom(CastOpcode, ConcatSrcVT) ||
		        !TLI.isTypeLegal(VT))
		      return SDValue();
		    break;
		  case ISD::FP_TO_SINT:
		  case ISD::FP_TO_UINT:
		    if (!TLI.isOperationLegalOrCustom(CastOpcode, VT) ||
		        !TLI.isTypeLegal(ConcatSrcVT))
		      return SDValue();
		    break;
		  default:
		    llvm_unreachable("Unexpected cast opcode");
		  }

		  // concat (cast X), (cast Y)... -> cast (concat X, Y...)
		  SDLoc DL(N);
		  SDValue NewConcat = DAG.getNode(ISD::CONCAT_VECTORS, DL, ConcatSrcVT, SrcOps);
		  return DAG.getNode(CastOpcode, DL, VT, NewConcat);
		}

SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {		SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {
// If we only have one input vector, we don't need to do any concatenation.		// If we only have one input vector, we don't need to do any concatenation.
if (N->getNumOperands() == 1)		if (N->getNumOperands() == 1)
return N->getOperand(0);		return N->getOperand(0);

// Check if all of the operands are undefs.		// Check if all of the operands are undefs.
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
if (ISD::allOperandsUndef(N))		if (ISD::allOperandsUndef(N))
[111 unchanged lines not shown]	SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {
if (SDValue V = combineConcatVectorOfScalars(N, DAG))		if (SDValue V = combineConcatVectorOfScalars(N, DAG))
return V;		return V;

// Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE.		// Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE.
if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT))		if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT))
if (SDValue V = combineConcatVectorOfExtracts(N, DAG))		if (SDValue V = combineConcatVectorOfExtracts(N, DAG))
return V;		return V;

		if (SDValue V = combineConcatVectorOfCasts(N, DAG))
		return V;

// Type legalization of vectors and DAG canonicalization of SHUFFLE_VECTOR		// Type legalization of vectors and DAG canonicalization of SHUFFLE_VECTOR
// nodes often generate nop CONCAT_VECTOR nodes.		// nodes often generate nop CONCAT_VECTOR nodes.
// Scan the CONCAT_VECTOR operands and look for a CONCAT operations that		// Scan the CONCAT_VECTOR operands and look for a CONCAT operations that
// place the incoming vectors at the exact same location.		// place the incoming vectors at the exact same location.
SDValue SingleSource = SDValue();		SDValue SingleSource = SDValue();
unsigned PartNumElem = N->getOperand(0).getValueType().getVectorNumElements();		unsigned PartNumElem = N->getOperand(0).getValueType().getVectorNumElements();

for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {		for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {
[3,074 unchanged lines not shown]
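The operand checks in combineConcatVectorOfCasts can be modelled outside LLVM to show how ConcatSrcVT is derived. The sketch below is simplified and hypothetical. VecType, CastOperand and matchConcatOfCasts are illustrative stand-ins rather than LLVM types. The sketch also omits the opcode whitelist, the SrcVT.isVector() check and the opcode-dependent legality queries:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a vector EVT.
struct VecType {
  std::string Elt;       // element type, e.g. "i32"
  unsigned NumElts = 0;  // (minimum) element count
  bool operator==(const VecType &O) const {
    return Elt == O.Elt && NumElts == O.NumElts;
  }
};

// Hypothetical stand-in for one concat operand that may be a cast node.
struct CastOperand {
  std::string Opcode;    // e.g. "sint_to_fp"
  bool HasOneUse = true;
  VecType Src;           // type of the cast's input
};

// Returns the wide cast's source type (ConcatSrcVT) if every operand is a
// single-use cast with the same opcode and the same source type.
std::optional<VecType> matchConcatOfCasts(const std::vector<CastOperand> &Ops) {
  if (Ops.empty())
    return std::nullopt;
  const CastOperand &First = Ops.front();
  for (const CastOperand &Op : Ops)
    if (Op.Opcode != First.Opcode || !Op.HasOneUse || !(Op.Src == First.Src))
      return std::nullopt;
  // Same element type; element count scaled by the number of concat operands.
  return VecType{First.Src.Elt,
                 First.Src.NumElts * static_cast<unsigned>(Ops.size())};
}
```

For the concat-cast.ll tests, two <2 x i32> sources produce a v4i32 ConcatSrcVT. The real code then asks the target whether the cast is legal or custom on v4i32 or on the v4f32 result, depending on the opcode.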

llvm/test/CodeGen/X86/avx-shift.ll

[161 unchanged lines not shown]	; CHECK-NEXT: retq
%s = shl <32 x i8> %a, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>		%s = shl <32 x i8> %a, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
ret <32 x i8> %s		ret <32 x i8> %s
}		}

;;; Support variable shifts		;;; Support variable shifts
define <8 x i32> @vshift08(<8 x i32> %a) {		define <8 x i32> @vshift08(<8 x i32> %a) {
; CHECK-LABEL: vshift08:		; CHECK-LABEL: vshift08:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: vpslld $23, %xmm0, %xmm1		; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm1
		; CHECK-NEXT: vpslld $23, %xmm1, %xmm1
; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [1065353216,1065353216,1065353216,1065353216]		; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [1065353216,1065353216,1065353216,1065353216]
; CHECK-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; CHECK-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; CHECK-NEXT: vcvttps2dq %xmm1, %xmm1
; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm0
; CHECK-NEXT: vpslld $23, %xmm0, %xmm0		; CHECK-NEXT: vpslld $23, %xmm0, %xmm0
; CHECK-NEXT: vpaddd %xmm2, %xmm0, %xmm0		; CHECK-NEXT: vpaddd %xmm2, %xmm0, %xmm0
; CHECK-NEXT: vcvttps2dq %xmm0, %xmm0		; CHECK-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0		; CHECK-NEXT: vcvttps2dq %ymm0, %ymm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%bitop = shl <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, %a		%bitop = shl <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, %a
ret <8 x i32> %bitop		ret <8 x i32> %bitop
}		}

define <8 x i32> @vshift08_add(<8 x i32> %a, <8 x i32> %y) {		define <8 x i32> @vshift08_add(<8 x i32> %a, <8 x i32> %y) {
; CHECK-LABEL: vshift08_add:		; CHECK-LABEL: vshift08_add:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
[58 unchanged lines not shown]

llvm/test/CodeGen/X86/concat-cast.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2			; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE,SSE4			; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse4.1 | FileCheck %s --check-prefixes=CHECK,SSE,SSE4
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1			; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2			; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX,AVX512			; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX,AVX512

	define <4 x float> @sitofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {			define <4 x float> @sitofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {
	; SSE-LABEL: sitofp_v4i32_v4f32:			; SSE-LABEL: sitofp_v4i32_v4f32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE-NEXT: cvtdq2ps %xmm1, %xmm1
	; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: sitofp_v4i32_v4f32:			; AVX-LABEL: sitofp_v4i32_v4f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvtdq2ps %xmm0, %xmm0
	; AVX-NEXT: vcvtdq2ps %xmm1, %xmm1
	; AVX-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX-NEXT: vcvtdq2ps %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = sitofp <2 x i32> %x to <2 x float>			%s0 = sitofp <2 x i32> %x to <2 x float>
	%s1 = sitofp <2 x i32> %y to <2 x float>			%s1 = sitofp <2 x i32> %y to <2 x float>
	%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @uitofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {			define <4 x float> @uitofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {
	; SSE2-LABEL: uitofp_v4i32_v4f32:			; SSE2-LABEL: uitofp_v4i32_v4f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: xorpd %xmm2, %xmm2			; SSE2-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [65535,65535,65535,65535]
	; SSE2-NEXT: movapd {{.*#+}} xmm3 = [4.503599627370496E+15,4.503599627370496E+15]			; SSE2-NEXT: pand %xmm0, %xmm1
	; SSE2-NEXT: orpd %xmm3, %xmm0			; SSE2-NEXT: por {{.*}}(%rip), %xmm1
	; SSE2-NEXT: subpd %xmm3, %xmm0			; SSE2-NEXT: psrld $16, %xmm0
	; SSE2-NEXT: cvtpd2ps %xmm0, %xmm0			; SSE2-NEXT: por {{.*}}(%rip), %xmm0
	; SSE2-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]			; SSE2-NEXT: subps {{.*}}(%rip), %xmm0
	; SSE2-NEXT: orpd %xmm3, %xmm1			; SSE2-NEXT: addps %xmm1, %xmm0
	; SSE2-NEXT: subpd %xmm3, %xmm1
	; SSE2-NEXT: cvtpd2ps %xmm1, %xmm1
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE4-LABEL: uitofp_v4i32_v4f32:			; SSE4-LABEL: uitofp_v4i32_v4f32:
	; SSE4: # %bb.0:			; SSE4: # %bb.0:
	; SSE4-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; SSE4-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE4-NEXT: movdqa {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]			; SSE4-NEXT: movdqa {{.*#+}} xmm1 = [1258291200,1258291200,1258291200,1258291200]
	; SSE4-NEXT: por %xmm2, %xmm0			; SSE4-NEXT: pblendw {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3],xmm0[4],xmm1[5],xmm0[6],xmm1[7]
	; SSE4-NEXT: subpd %xmm2, %xmm0			; SSE4-NEXT: psrld $16, %xmm0
	; SSE4-NEXT: cvtpd2ps %xmm0, %xmm0			; SSE4-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
	; SSE4-NEXT: pmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; SSE4-NEXT: subps {{.*}}(%rip), %xmm0
	; SSE4-NEXT: por %xmm2, %xmm1			; SSE4-NEXT: addps %xmm1, %xmm0
	; SSE4-NEXT: subpd %xmm2, %xmm1
	; SSE4-NEXT: cvtpd2ps %xmm1, %xmm1
	; SSE4-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE4-NEXT: retq			; SSE4-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_v4i32_v4f32:			; AVX1-LABEL: uitofp_v4i32_v4f32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]			; AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
	; AVX1-NEXT: vpor %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX1-NEXT: vsubpd %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],mem[1],xmm0[2],mem[3],xmm0[4],mem[5],xmm0[6],mem[7]
	; AVX1-NEXT: vcvtpd2ps %xmm0, %xmm0			; AVX1-NEXT: vsubps {{.*}}(%rip), %xmm0, %xmm0
	; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; AVX1-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vsubpd %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vcvtpd2ps %xmm1, %xmm1
	; AVX1-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_v4i32_v4f32:			; AVX2-LABEL: uitofp_v4i32_v4f32:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]			; AVX2-NEXT: vpbroadcastd {{.*#+}} xmm1 = [1258291200,1258291200,1258291200,1258291200]
	; AVX2-NEXT: vpor %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpblendw {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3],xmm0[4],xmm1[5],xmm0[6],xmm1[7]
	; AVX2-NEXT: vsubpd %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX2-NEXT: vcvtpd2ps %xmm0, %xmm0			; AVX2-NEXT: vpbroadcastd {{.*#+}} xmm2 = [1392508928,1392508928,1392508928,1392508928]
	; AVX2-NEXT: vpmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; AVX2-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm2[1],xmm0[2],xmm2[3],xmm0[4],xmm2[5],xmm0[6],xmm2[7]
	; AVX2-NEXT: vpor %xmm2, %xmm1, %xmm1			; AVX2-NEXT: vbroadcastss {{.*#+}} xmm2 = [5.49764202E+11,5.49764202E+11,5.49764202E+11,5.49764202E+11]
	; AVX2-NEXT: vsubpd %xmm2, %xmm1, %xmm1			; AVX2-NEXT: vsubps %xmm2, %xmm0, %xmm0
	; AVX2-NEXT: vcvtpd2ps %xmm1, %xmm1			; AVX2-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: uitofp_v4i32_v4f32:			; AVX512-LABEL: uitofp_v4i32_v4f32:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512-NEXT: vcvtudq2ps %zmm0, %zmm0
	; AVX512-NEXT: vcvtudq2ps %zmm1, %zmm1
	; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX512-NEXT: vcvtudq2ps %zmm0, %zmm0
				; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%s0 = uitofp <2 x i32> %x to <2 x float>			%s0 = uitofp <2 x i32> %x to <2 x float>
	%s1 = uitofp <2 x i32> %y to <2 x float>			%s1 = uitofp <2 x i32> %y to <2 x float>
	%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x i32> @fptosi_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {			define <4 x i32> @fptosi_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {
	; SSE-LABEL: fptosi_v4f32_v4i32:			; SSE-LABEL: fptosi_v4f32_v4i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttps2dq %xmm0, %xmm0
	; SSE-NEXT: cvttps2dq %xmm1, %xmm1
	; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; SSE-NEXT: cvttps2dq %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fptosi_v4f32_v4i32:			; AVX-LABEL: fptosi_v4f32_v4i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttps2dq %xmm0, %xmm0
	; AVX-NEXT: vcvttps2dq %xmm1, %xmm1
	; AVX-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX-NEXT: vcvttps2dq %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = fptosi <2 x float> %x to <2 x i32>			%s0 = fptosi <2 x float> %x to <2 x i32>
	%s1 = fptosi <2 x float> %y to <2 x i32>			%s1 = fptosi <2 x float> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @fptoui_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {			define <4 x i32> @fptoui_v4f32_v4i32(<2 x float> %x, <2 x float> %y) {
	[81 unchanged lines not shown]
	; AVX2-NEXT: vxorps %xmm5, %xmm2, %xmm2			; AVX2-NEXT: vxorps %xmm5, %xmm2, %xmm2
	; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1			; AVX2-NEXT: vcvttps2dq %xmm1, %xmm1
	; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1			; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: fptoui_v4f32_v4i32:			; AVX512-LABEL: fptoui_v4f32_v4i32:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512-NEXT: vcvttps2udq %zmm0, %zmm0
	; AVX512-NEXT: vcvttps2udq %zmm1, %zmm1
	; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
				; AVX512-NEXT: vcvttps2udq %zmm0, %zmm0
				; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%s0 = fptoui <2 x float> %x to <2 x i32>			%s0 = fptoui <2 x float> %x to <2 x i32>
	%s1 = fptoui <2 x float> %y to <2 x i32>			%s1 = fptoui <2 x float> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x double> @sitofp_v4i32_v4f64(<2 x i32> %x, <2 x i32> %y) {			define <4 x double> @sitofp_v4i32_v4f64(<2 x i32> %x, <2 x i32> %y) {
	; SSE-LABEL: sitofp_v4i32_v4f64:			; SSE-LABEL: sitofp_v4i32_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvtdq2pd %xmm0, %xmm0			; SSE-NEXT: cvtdq2pd %xmm0, %xmm0
	; SSE-NEXT: cvtdq2pd %xmm1, %xmm1			; SSE-NEXT: cvtdq2pd %xmm1, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: sitofp_v4i32_v4f64:			; AVX-LABEL: sitofp_v4i32_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvtdq2pd %xmm0, %xmm0			; AVX-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX-NEXT: vcvtdq2pd %xmm1, %xmm1			; AVX-NEXT: vcvtdq2pd %xmm0, %ymm0
	; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = sitofp <2 x i32> %x to <2 x double>			%s0 = sitofp <2 x i32> %x to <2 x double>
	%s1 = sitofp <2 x i32> %y to <2 x double>			%s1 = sitofp <2 x i32> %y to <2 x double>
	%r = shufflevector <2 x double> %s0, <2 x double> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x double> %s0, <2 x double> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x double> %r			ret <4 x double> %r
	}			}

	define <4 x double> @uitofp_v4i32_v4f64(<2 x i32> %x, <2 x i32> %y) {			define <4 x double> @uitofp_v4i32_v4f64(<2 x i32> %x, <2 x i32> %y) {
	[18 unchanged lines not shown]
	; SSE4-NEXT: pmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; SSE4-NEXT: pmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero
	; SSE4-NEXT: por %xmm2, %xmm1			; SSE4-NEXT: por %xmm2, %xmm1
	; SSE4-NEXT: subpd %xmm2, %xmm1			; SSE4-NEXT: subpd %xmm2, %xmm1
	; SSE4-NEXT: retq			; SSE4-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_v4i32_v4f64:			; AVX1-LABEL: uitofp_v4i32_v4f64:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]
	; AVX1-NEXT: vpor %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vsubpd %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; AVX1-NEXT: vpmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero
	; AVX1-NEXT: vpor %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vsubpd %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
				; AVX1-NEXT: vbroadcastsd {{.*#+}} ymm1 = [4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15]
				; AVX1-NEXT: vorpd %ymm1, %ymm0, %ymm0
				; AVX1-NEXT: vsubpd %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_v4i32_v4f64:			; AVX2-LABEL: uitofp_v4i32_v4f64:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]			; AVX2-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; AVX2-NEXT: vpor %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpbroadcastq {{.*#+}} ymm1 = [4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15,4.503599627370496E+15]
	; AVX2-NEXT: vsubpd %xmm2, %xmm0, %xmm0			; AVX2-NEXT: vpor %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vpmovzxdq {{.*#+}} xmm1 = xmm1[0],zero,xmm1[1],zero			; AVX2-NEXT: vsubpd %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vpor %xmm2, %xmm1, %xmm1
	; AVX2-NEXT: vsubpd %xmm2, %xmm1, %xmm1
	; AVX2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: uitofp_v4i32_v4f64:			; AVX512-LABEL: uitofp_v4i32_v4f64:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1			; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX512-NEXT: vcvtudq2pd %ymm0, %zmm0			; AVX512-NEXT: vcvtudq2pd %ymm0, %zmm0
	; AVX512-NEXT: vcvtudq2pd %ymm1, %zmm1			; AVX512-NEXT: # kill: def $ymm0 killed $ymm0 killed $zmm0
	; AVX512-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%s0 = uitofp <2 x i32> %x to <2 x double>			%s0 = uitofp <2 x i32> %x to <2 x double>
	%s1 = uitofp <2 x i32> %y to <2 x double>			%s1 = uitofp <2 x i32> %y to <2 x double>
	%r = shufflevector <2 x double> %s0, <2 x double> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x double> %s0, <2 x double> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x double> %r			ret <4 x double> %r
	}			}

	define <4 x i32> @fptosi_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {			define <4 x i32> @fptosi_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {
	; SSE-LABEL: fptosi_v4f64_v4i32:			; SSE-LABEL: fptosi_v4f64_v4i32:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvttpd2dq %xmm0, %xmm0			; SSE-NEXT: cvttpd2dq %xmm0, %xmm0
	; SSE-NEXT: cvttpd2dq %xmm1, %xmm1			; SSE-NEXT: cvttpd2dq %xmm1, %xmm1
	; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fptosi_v4f64_v4i32:			; AVX-LABEL: fptosi_v4f64_v4i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vcvttpd2dq %xmm0, %xmm0			; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vcvttpd2dq %xmm1, %xmm1			; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX-NEXT: vcvttpd2dq %ymm0, %xmm0
				; AVX-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%s0 = fptosi <2 x double> %x to <2 x i32>			%s0 = fptosi <2 x double> %x to <2 x i32>
	%s1 = fptosi <2 x double> %y to <2 x i32>			%s1 = fptosi <2 x double> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @fptoui_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {			define <4 x i32> @fptoui_v4f64_v4i32(<2 x double> %x, <2 x double> %y) {
	[73 unchanged lines not shown]
	; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1			; AVX2-NEXT: vcvttpd2dq %ymm1, %xmm1
	; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1			; AVX2-NEXT: vblendvps %xmm3, %xmm1, %xmm2, %xmm1
	; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: fptoui_v4f64_v4i32:			; AVX512-LABEL: fptoui_v4f64_v4i32:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX512-NEXT: vcvttpd2udq %zmm0, %ymm0			; AVX512-NEXT: vcvttpd2udq %zmm0, %ymm0
	; AVX512-NEXT: vcvttpd2udq %zmm1, %ymm1			; AVX512-NEXT: # kill: def $xmm0 killed $xmm0 killed $ymm0
	; AVX512-NEXT: vmovlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%s0 = fptoui <2 x double> %x to <2 x i32>			%s0 = fptoui <2 x double> %x to <2 x i32>
	%s1 = fptoui <2 x double> %y to <2 x i32>			%s1 = fptoui <2 x double> %y to <2 x i32>
	%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x i32> %s0, <2 x i32> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

				; Negative test

	define <4 x float> @mismatch_tofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {			define <4 x float> @mismatch_tofp_v4i32_v4f32(<2 x i32> %x, <2 x i32> %y) {
	; SSE2-LABEL: mismatch_tofp_v4i32_v4f32:			; SSE2-LABEL: mismatch_tofp_v4i32_v4f32:
	; SSE2: # %bb.0:			; SSE2: # %bb.0:
	; SSE2-NEXT: xorpd %xmm2, %xmm2			; SSE2-NEXT: xorpd %xmm2, %xmm2
	; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]			; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
	; SSE2-NEXT: movapd {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]			; SSE2-NEXT: movapd {{.*#+}} xmm2 = [4.503599627370496E+15,4.503599627370496E+15]
	; SSE2-NEXT: orpd %xmm2, %xmm0			; SSE2-NEXT: orpd %xmm2, %xmm0
	; SSE2-NEXT: subpd %xmm2, %xmm0			; SSE2-NEXT: subpd %xmm2, %xmm0
	[44 unchanged lines not shown]
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%s0 = uitofp <2 x i32> %x to <2 x float>			%s0 = uitofp <2 x i32> %x to <2 x float>
	%s1 = sitofp <2 x i32> %y to <2 x float>			%s1 = sitofp <2 x i32> %y to <2 x float>
	%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%r = shufflevector <2 x float> %s0, <2 x float> %s1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x float> %r			ret <4 x float> %r
	}			}

				; Negative test

	define <4 x float> @sitofp_v4i32_v4f32_extra_use(<2 x i32> %x, <2 x i32> %y, <2 x float>* %p) {			define <4 x float> @sitofp_v4i32_v4f32_extra_use(<2 x i32> %x, <2 x i32> %y, <2 x float>* %p) {
	; SSE-LABEL: sitofp_v4i32_v4f32_extra_use:			; SSE-LABEL: sitofp_v4i32_v4f32_extra_use:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm0			; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE-NEXT: cvtdq2ps %xmm1, %xmm1			; SSE-NEXT: cvtdq2ps %xmm1, %xmm1
	; SSE-NEXT: movlps %xmm1, (%rdi)			; SSE-NEXT: movlps %xmm1, (%rdi)
	; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; SSE-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	[14 unchanged lines not shown]
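
The positive tests above (`uitofp_v4i32_v4f64`, `fptosi_v4f64_v4i32`, `fptoui_v4f64_v4i32`) now lower to a single wide conversion after the concat where the target supports it (e.g. the AVX512 `vcvtudq2pd` and `vcvttpd2udq` lines), while the two negative tests keep the narrow casts. Below is a minimal C++ sketch of the preconditions these tests imply. It assumes a hypothetical toy node model: `CastOp`, `VecTy`, `CastNode`, `WideCast`, `foldConcatOfCasts`, and the `isLegal` callback are illustrative names, not LLVM APIs, and this is not the patch's actual DAGCombiner code.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy model for illustration; not LLVM's SDNode/EVT/TargetLowering API.
enum class CastOp { SIToFP, UIToFP, FPToSI, FPToUI };

struct VecTy {
  std::string elt;  // element type name, e.g. "i32" or "f64"
  unsigned lanes;   // number of vector elements
  bool operator==(const VecTy& o) const { return elt == o.elt && lanes == o.lanes; }
};

struct CastNode {
  CastOp op;         // which FP<->int cast this concat operand is
  VecTy src;         // type of X in (cast X)
  VecTy dst;         // result type of (cast X)
  unsigned numUses;  // number of users of the cast result
};

struct WideCast {
  CastOp op;
  VecTy src;  // type of concat(X, Y, ...)
  VecTy dst;  // result type of the single wide cast
};

// Caller-supplied stand-in for the target's legality query; in this sketch it
// is given both the wide source type and the wide destination type.
using IsLegalFn = std::function<bool(CastOp, const VecTy&, const VecTy&)>;

// concat (cast X), (cast Y), ... -> cast (concat X, Y, ...)
// Returns std::nullopt when the fold should not fire.
std::optional<WideCast> foldConcatOfCasts(const std::vector<CastNode>& ops,
                                          const IsLegalFn& isLegal) {
  if (ops.size() < 2) return std::nullopt;
  const CastNode& first = ops.front();
  for (const CastNode& node : ops) {
    // Mixed opcodes (e.g. uitofp + sitofp) cannot become one cast.
    if (node.op != first.op) return std::nullopt;
    // A narrow cast with another user must stay anyway, so do not fold.
    if (node.numUses != 1) return std::nullopt;
    // Every operand must cast between the same pair of types.
    if (!(node.src == first.src) || !(node.dst == first.dst)) return std::nullopt;
  }
  const unsigned parts = static_cast<unsigned>(ops.size());
  const VecTy wideSrc{first.src.elt, first.src.lanes * parts};
  const VecTy wideDst{first.dst.elt, first.dst.lanes * parts};
  // Only fold if the target accepts the wide cast for both types.
  if (!isLegal(first.op, wideSrc, wideDst)) return std::nullopt;
  return WideCast{first.op, wideSrc, wideDst};
}
```

Called with two `uitofp <2 x i32> -> <2 x double>` operands and a permissive callback, the sketch yields one 4-lane cast; with the `uitofp`/`sitofp` pair from `mismatch_tofp_v4i32_v4f32`, or when one narrow cast has an extra use as in `sitofp_v4i32_v4f32_extra_use`, it declines. Passing both wide types to the callback loosely echoes the summary's point that a check on only one side of the cast was not enough.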

Revision Contents

Diff 262376

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/X86/avx-shift.ll

llvm/test/CodeGen/X86/concat-cast.ll
