This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657)
ClosedPublic

Authored by spatel on Apr 27 2017, 3:53 PM.

Details

Reviewers

efriedma
nadav
RKSimon

Commits

rGad13826aea22: [DAGCombiner] shrink/widen a vselect to match its condition operand size…
rL301781: [DAGCombiner] shrink/widen a vselect to match its condition operand size…

Summary

We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check legality of the select that we want to produce.

A few things to note:

We can't wait until after legalization and do this generically because (at least in the x86 tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but I think that requires a closer look to make sure we don't end up worse.
There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. I thought I'd better just post this as-is to make sure I'm not off the rails, but I could fix that first.
I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple legal vector sizes, but I can certainly add tests for other targets to make sure this isn't doing harm.
There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests): despite IR that is terrible for the target, this patch allows us to generate the optimal loop code because something post-ISEL is hoisting the splat extends above the vector loops.
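
To make the intended fold concrete, here is a minimal scalar sketch of why it should be value-preserving: a cast applied to the result of a lane-wise select gives the same lanes as a select over the already-cast operands. This is only a model, assuming an 8-lane select of i16 values that is sign-extended to i32. The names `Mask`, `V8i16`, `V8i32`, `castOfSelect`, and `selectOfCasts` are hypothetical and do not come from LLVM.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the fold; none of these names exist in LLVM.
constexpr std::size_t kLanes = 8;
using Mask = std::array<bool, kLanes>;
using V8i16 = std::array<std::int16_t, kLanes>;
using V8i32 = std::array<std::int32_t, kLanes>;

// Before the fold: select on the narrow type, then sign-extend the result.
V8i32 castOfSelect(const Mask &cond, const V8i16 &c, const V8i16 &d) {
  V8i32 out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    std::int16_t sel = cond[i] ? c[i] : d[i];
    out[i] = static_cast<std::int32_t>(sel);
  }
  return out;
}

// After the fold: sign-extend both operands, then select on the wide type.
V8i32 selectOfCasts(const Mask &cond, const V8i16 &c, const V8i16 &d) {
  V8i32 out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    std::int32_t wideC = static_cast<std::int32_t>(c[i]);
    std::int32_t wideD = static_cast<std::int32_t>(d[i]);
    out[i] = cond[i] ? wideC : wideD;
  }
  return out;
}
```

The second form costs one extra cast, but the select then operates on lanes of the same width as the compare that produced the condition, which is the property the patch is after.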

Diff Detail

Event Timeline

spatel created this revision. Apr 27 2017, 3:53 PM

Herald added subscribers: mcrosier, rengolin, aemerson. Apr 27 2017, 3:53 PM

nadav added inline comments. Apr 27 2017, 4:05 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6935	I think that it can be a good idea to check which DAGCombine phase we are in and make sure that we are in the pre-legalization phase.
6945	I think that we should check that a and b have one use.

Patch updated:

Bail out if we require LegalOperations (unlikely that this could have benefits if we are past legalization).
That change tilted the scale to make this a proper DAGCombiner member rather than a static, so some cosmetic diffs from that.
Improved code comments, structure, variable names, and assert message (no functional change from any of these either).
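
The bail-out conditions discussed here can be modeled roughly as follows. This is a sketch under stated assumptions, not the code in the patch: `ToyVT`, `ToyTarget`, and `shouldMatchVSelectToSetCC` are hypothetical stand-ins for the real value-type and target-lowering queries. It only mirrors the checks named in this review: no fold once legal operations are required, the vselect must have one use, the scalar sizes of the setcc result and the cast result must match, and the new vselect must be legal or custom for the target.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a vector value type (hypothetical, not llvm::EVT).
struct ToyVT {
  unsigned lanes;
  unsigned scalarBits;
  bool operator==(const ToyVT &other) const {
    return lanes == other.lanes && scalarBits == other.scalarBits;
  }
};

// Toy stand-in for target legality information (hypothetical).
struct ToyTarget {
  std::vector<ToyVT> legalOrCustomVSelect;
  bool isVSelectLegalOrCustom(ToyVT vt) const {
    return std::find(legalOrCustomVSelect.begin(), legalOrCustomVSelect.end(),
                     vt) != legalOrCustomVSelect.end();
  }
};

// Returns true when the modeled fold would be attempted.
bool shouldMatchVSelectToSetCC(bool legalOperations, bool selectHasOneUse,
                               ToyVT setccResultVT, ToyVT castResultVT,
                               const ToyTarget &target) {
  if (legalOperations)
    return false; // Past legalization: the pattern is unlikely to survive.
  if (!selectHasOneUse)
    return false; // The original vselect would have to stay alive anyway.
  if (setccResultVT.scalarBits != castResultVT.scalarBits)
    return false; // The cast would not make the sizes match.
  // Do not create a select the target cannot handle.
  return target.isVSelectLegalOrCustom(castResultVT);
}
```

Note that the model checks the use count of the select only; it deliberately has no use-count check on the compare operands or on the select operands.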

spatel added inline comments. Apr 28 2017, 9:38 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

6945

I could not find a reason for this. Apologies for moving the goal posts, but I renamed the variables in the updated version of the patch in an attempt to make the code clearer.

Your reference to 'a and b' was to the setcc operands. These are not in play with this transform. I.e., the setcc remains as-is regardless of what happens to the vselect.

'c and d' were the vselect operands (these are now 'A and B'), but I don't see a benefit to checking the number of uses on those either.

Here are a couple of hacked up tests to show these cases. I can add these to the patch if that would help.

define <4 x double> @fpext2(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x double> %x, %y   ; <--- extra uses of fcmp operands
%div = fdiv <4 x double> %ext, %add
ret <4 x double> %div
}

With this patch (no one-use check on the compare operands):

vcmpltpd   %ymm1, %ymm0, %ymm4
vcvtps2pd  %xmm2, %ymm2
vcvtps2pd  %xmm3, %ymm3
vblendvpd  %ymm4, %ymm2, %ymm3, %ymm2
vaddpd     %ymm1, %ymm0, %ymm0
vdivpd     %ymm0, %ymm2, %ymm0

If we bail out on hasOneUse():

vcmpltpd      %ymm1, %ymm0, %ymm4
vextractf128  $1, %ymm4, %xmm5
vpacksswb     %xmm5, %xmm4, %xmm4
vblendvps     %xmm4, %xmm2, %xmm3, %xmm2
vcvtps2pd     %xmm2, %ymm2
vaddpd        %ymm1, %ymm0, %ymm0
vdivpd        %ymm0, %ymm2, %ymm0

And the case where the select ops have >1 use:

define <4 x double> @fpext3(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x float> %a, %b   ; <--- extra uses of select ops
%ext2 = fpext <4 x float> %add to <4 x double>
%div = fdiv <4 x double> %ext, %ext2
ret <4 x double> %div
}

Transformed:

vcmpltpd   %ymm1, %ymm0, %ymm0
vcvtps2pd  %xmm2, %ymm1
vcvtps2pd  %xmm3, %ymm4
vblendvpd  %ymm0, %ymm1, %ymm4, %ymm0
vaddps     %xmm3, %xmm2, %xmm1
vcvtps2pd  %xmm1, %ymm1
vdivpd     %ymm1, %ymm0, %ymm0

Or bail out:

vcmpltpd      %ymm1, %ymm0, %ymm0
vextractf128  $1, %ymm0, %xmm1
vpacksswb     %xmm1, %xmm0, %xmm0
vblendvps     %xmm0, %xmm2, %xmm3, %xmm0
vcvtps2pd     %xmm0, %ymm0
vaddps        %xmm3, %xmm2, %xmm1
vcvtps2pd     %xmm1, %ymm1
vdivpd        %ymm1, %ymm0, %ymm0

LGTM!

This revision is now accepted and ready to land. Apr 29 2017, 1:08 AM

Closed by commit rL301781: [DAGCombiner] shrink/widen a vselect to match its condition operand size… (authored by spatel). Apr 30 2017, 3:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	61 lines
test/CodeGen/X86/cast-vsel.ll	238 lines

Diff 97005

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[... 6,900 lines not shown ...] SDValue DAGCombiner::CombineExtLoad(SDNode *N) {
SDValue Trunc =		SDValue Trunc =
DAG.getNode(ISD::TRUNCATE, SDLoc(N0), N0.getValueType(), NewValue);		DAG.getNode(ISD::TRUNCATE, SDLoc(N0), N0.getValueType(), NewValue);
CombineTo(N0.getNode(), Trunc, NewChain);		CombineTo(N0.getNode(), Trunc, NewChain);
ExtendSetCCUses(SetCCs, Trunc, NewValue, DL,		ExtendSetCCUses(SetCCs, Trunc, NewValue, DL,
(ISD::NodeType)N->getOpcode());		(ISD::NodeType)N->getOpcode());
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}

		/// If we're narrowing or widening the result of a vector select and the final
		/// size is the same size as a setcc (compare) feeding the select, then try to
		/// apply the cast operation to the select's operands because matching vector
		/// sizes for a select condition and other operands should be more efficient.
		static SDValue matchVSelectOpSizesWithSetCC(SDNode *N, SelectionDAG &DAG) {
		  unsigned CastOpcode = N->getOpcode();
		  assert((CastOpcode == ISD::SIGN_EXTEND || CastOpcode == ISD::ZERO_EXTEND ||
		          CastOpcode == ISD::TRUNCATE || CastOpcode == ISD::FP_EXTEND ||
		          CastOpcode == ISD::FP_ROUND) &&
		         "Unexpected cast for vsel fold");

		  SDValue N0 = N->getOperand(0);
		  EVT VT = N->getValueType(0);
		  SDLoc DL(N);
		  if (N0.getOpcode() != ISD::VSELECT || !N0.hasOneUse() ||
		      N0.getOperand(0).getOpcode() != ISD::SETCC)
		    return SDValue();

		  SDValue SetCC = N0.getOperand(0);
		  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		  EVT SetCCVT = TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(),
		                                       SetCC.getOperand(0).getValueType());

		  // We want to do this transform before legalization because the pattern may be
		  // obfuscated by target-specific operations after legalization. However, we do
		  // not want to create an illegal select.
		  if (SetCCVT.getScalarSizeInBits() != VT.getScalarSizeInBits() ||
		      !TLI.isOperationLegalOrCustom(ISD::VSELECT, VT))
		    return SDValue();

		  SDValue SelOp1 = N0.getOperand(1);
		  SDValue SelOp2 = N0.getOperand(2);

		  // cast (vsel (cmp a, b), c, d) --> vsel (cmp a, b), (cast c), (cast d)
		  if (CastOpcode == ISD::FP_ROUND) {
		    // FP_ROUND (fptrunc) has an extra operand.
		    SDValue CastOp1 = DAG.getNode(CastOpcode, DL, VT, SelOp1, N->getOperand(1));
		    SDValue CastOp2 = DAG.getNode(CastOpcode, DL, VT, SelOp2, N->getOperand(1));
		    return DAG.getNode(ISD::VSELECT, DL, VT, SetCC, CastOp1, CastOp2);
		  }

		  SDValue CastOp1 = DAG.getNode(CastOpcode, DL, VT, SelOp1);
		  SDValue CastOp2 = DAG.getNode(CastOpcode, DL, VT, SelOp2);
		  return DAG.getNode(ISD::VSELECT, DL, VT, SetCC, CastOp1, CastOp2);
		}

SDValue DAGCombiner::visitSIGN_EXTEND(SDNode *N) {		SDValue DAGCombiner::visitSIGN_EXTEND(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);

if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
LegalOperations))		LegalOperations))
return SDValue(Res, 0);		return SDValue(Res, 0);
[... 207 lines not shown ...] if (N0.getOpcode() == ISD::SETCC) {
}		}
}		}

// fold (sext x) -> (zext x) if the sign bit is known zero.		// fold (sext x) -> (zext x) if the sign bit is known zero.
if ((!LegalOperations || TLI.isOperationLegal(ISD::ZERO_EXTEND, VT)) &&		if ((!LegalOperations || TLI.isOperationLegal(ISD::ZERO_EXTEND, VT)) &&
DAG.SignBitIsZero(N0))		DAG.SignBitIsZero(N0))
return DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0);		return DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0);

		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N, DAG))
		return NewVSel;

return SDValue();		return SDValue();
}		}

// isTruncateOf - If N is a truncate of some other value, return true, record		// isTruncateOf - If N is a truncate of some other value, return true, record
// the value being truncated in Op and which of Op's bits are zero in KnownZero.		// the value being truncated in Op and which of Op's bits are zero in KnownZero.
// This function computes KnownZero to avoid a duplicated call to		// This function computes KnownZero to avoid a duplicated call to
// computeKnownBits in the caller.		// computeKnownBits in the caller.
static bool isTruncateOf(SelectionDAG &DAG, SDValue N, SDValue &Op,		static bool isTruncateOf(SelectionDAG &DAG, SDValue N, SDValue &Op,
[... 318 lines not shown ...] if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
if (VT.getSizeInBits() >= 256)		if (VT.getSizeInBits() >= 256)
ShAmt = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, ShAmt);		ShAmt = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, ShAmt);

return DAG.getNode(N0.getOpcode(), DL, VT,		return DAG.getNode(N0.getOpcode(), DL, VT,
DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0)),		DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0)),
ShAmt);		ShAmt);
}		}

		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N, DAG))
		return NewVSel;

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {		SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
[... 777 lines not shown ...] if (N0.getOpcode() == ISD::ADDE && N0.hasOneUse() &&
(!LegalOperations || TLI.isOperationLegal(ISD::ADDE, VT))) {		(!LegalOperations || TLI.isOperationLegal(ISD::ADDE, VT))) {
SDLoc SL(N);		SDLoc SL(N);
auto X = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(0));		auto X = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(0));
auto Y = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));		auto Y = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));
return DAG.getNode(ISD::ADDE, SL, DAG.getVTList(VT, MVT::Glue),		return DAG.getNode(ISD::ADDE, SL, DAG.getVTList(VT, MVT::Glue),
X, Y, N0.getOperand(2));		X, Y, N0.getOperand(2));
}		}

		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N, DAG))
		return NewVSel;

return SDValue();		return SDValue();
}		}

static SDNode *getBuildPairElt(SDNode *N, unsigned i) {		static SDNode *getBuildPairElt(SDNode *N, unsigned i) {
SDValue Elt = N->getOperand(i);		SDValue Elt = N->getOperand(i);
if (Elt.getOpcode() != ISD::MERGE_VALUES)		if (Elt.getOpcode() != ISD::MERGE_VALUES)
return Elt.getNode();		return Elt.getNode();
return Elt.getOperand(Elt.getResNo()).getNode();		return Elt.getOperand(Elt.getResNo()).getNode();
[... 1,935 lines not shown ...] SDValue DAGCombiner::visitFP_ROUND(SDNode *N) {
if (N0.getOpcode() == ISD::FCOPYSIGN && N0.getNode()->hasOneUse()) {		if (N0.getOpcode() == ISD::FCOPYSIGN && N0.getNode()->hasOneUse()) {
SDValue Tmp = DAG.getNode(ISD::FP_ROUND, SDLoc(N0), VT,		SDValue Tmp = DAG.getNode(ISD::FP_ROUND, SDLoc(N0), VT,
N0.getOperand(0), N1);		N0.getOperand(0), N1);
AddToWorklist(Tmp.getNode());		AddToWorklist(Tmp.getNode());
return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT,		return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT,
Tmp, N0.getOperand(1));		Tmp, N0.getOperand(1));
}		}

		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N, DAG))
		return NewVSel;

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFP_ROUND_INREG(SDNode *N) {		SDValue DAGCombiner::visitFP_ROUND_INREG(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT EVT = cast<VTSDNode>(N->getOperand(1))->getVT();		EVT EVT = cast<VTSDNode>(N->getOperand(1))->getVT();
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);		ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
[... 50 lines not shown ...] if (ISD::isNormalLoad(N0.getNode()) && N0.hasOneUse() &&
CombineTo(N0.getNode(),		CombineTo(N0.getNode(),
DAG.getNode(ISD::FP_ROUND, SDLoc(N0),		DAG.getNode(ISD::FP_ROUND, SDLoc(N0),
N0.getValueType(), ExtLoad,		N0.getValueType(), ExtLoad,
DAG.getIntPtrConstant(1, SDLoc(N0))),		DAG.getIntPtrConstant(1, SDLoc(N0))),
ExtLoad.getValue(1));		ExtLoad.getValue(1));
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}

		if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N, DAG))
		return NewVSel;

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFCEIL(SDNode *N) {		SDValue DAGCombiner::visitFCEIL(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// fold (fceil c1) -> fceil(c1)		// fold (fceil c1) -> fceil(c1)
[... 6,130 lines not shown ...]

test/CodeGen/X86/cast-vsel.ll

	[... 43 lines not shown ...]
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; SSE41-NEXT: pmovsxwd %xmm0, %xmm1			; SSE41-NEXT: pmovsxwd %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: sext:			; AVX1-LABEL: sext:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpmovsxwd %xmm2, %xmm1
	; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX1-NEXT: vpmovsxwd %xmm2, %xmm2
	; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpmovsxwd %xmm3, %xmm2
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm1			; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm3[2,3,0,1]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vpmovsxwd %xmm3, %xmm3
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: sext:			; AVX2-LABEL: sext:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX2-NEXT: vpmovsxwd %xmm2, %ymm1
	; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpmovsxwd %xmm3, %ymm2
	; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX2-NEXT: vpand %xmm0, %xmm2, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = fcmp olt <8 x float> %a, %b			%cmp = fcmp olt <8 x float> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d			%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d
	%ext = sext <8 x i16> %sel to <8 x i32>			%ext = sext <8 x i16> %sel to <8 x i32>
	ret <8 x i32> %ext			ret <8 x i32> %ext
	}			}

	define <8 x i32> @zext(<8 x float> %a, <8 x float> %b, <8 x i16> %c, <8 x i16> %d) {			define <8 x i32> @zext(<8 x float> %a, <8 x float> %b, <8 x i16> %c, <8 x i16> %d) {
	[... 32 lines not shown ...]
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext:			; AVX1-LABEL: zext:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
	; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm2 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
	; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm2 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm3[2,3,0,1]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm3 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext:			; AVX2-LABEL: zext:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero,xmm2[4],zero,xmm2[5],zero,xmm2[6],zero,xmm2[7],zero
	; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm2 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero,xmm3[4],zero,xmm3[5],zero,xmm3[6],zero,xmm3[7],zero
	; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX2-NEXT: vpand %xmm0, %xmm2, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = fcmp olt <8 x float> %a, %b			%cmp = fcmp olt <8 x float> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d			%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d
	%ext = zext <8 x i16> %sel to <8 x i32>			%ext = zext <8 x i16> %sel to <8 x i32>
	ret <8 x i32> %ext			ret <8 x i32> %ext
	}			}

	define <4 x double> @fpext(<4 x double> %a, <4 x double> %b, <4 x float> %c, <4 x float> %d) {			define <4 x double> @fpext(<4 x double> %a, <4 x double> %b, <4 x float> %c, <4 x float> %d) {
	[... 20 lines not shown ...]
	; SSE41-NEXT: cvtps2pd %xmm5, %xmm0			; SSE41-NEXT: cvtps2pd %xmm5, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm5 = xmm5[1,1]			; SSE41-NEXT: movhlps {{.*#+}} xmm5 = xmm5[1,1]
	; SSE41-NEXT: cvtps2pd %xmm5, %xmm1			; SSE41-NEXT: cvtps2pd %xmm5, %xmm1
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: fpext:			; AVX-LABEL: fpext:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vcmpltpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vcmpltpd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX-NEXT: vcvtps2pd %xmm2, %ymm1
	; AVX-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX-NEXT: vcvtps2pd %xmm3, %ymm2
	; AVX-NEXT: vblendvps %xmm0, %xmm2, %xmm3, %xmm0			; AVX-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, %ymm0
	; AVX-NEXT: vcvtps2pd %xmm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%cmp = fcmp olt <4 x double> %a, %b			%cmp = fcmp olt <4 x double> %a, %b
	%sel = select <4 x i1> %cmp, <4 x float> %c, <4 x float> %d			%sel = select <4 x i1> %cmp, <4 x float> %c, <4 x float> %d
	%ext = fpext <4 x float> %sel to <4 x double>			%ext = fpext <4 x float> %sel to <4 x double>
	ret <4 x double> %ext			ret <4 x double> %ext
	}			}

	define <8 x i16> @trunc(<8 x i16> %a, <8 x i16> %b, <8 x i32> %c, <8 x i32> %d) {			define <8 x i16> @trunc(<8 x i16> %a, <8 x i16> %b, <8 x i32> %c, <8 x i32> %d) {
	; SSE2-LABEL: trunc:			; SSE2-LABEL: trunc:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pcmpeqw %xmm1, %xmm0			; SSE2-NEXT: pcmpeqw %xmm1, %xmm0
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pslld $16, %xmm5
	; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; SSE2-NEXT: psrad $16, %xmm5
	; SSE2-NEXT: psrad $16, %xmm1			; SSE2-NEXT: pslld $16, %xmm4
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSE2-NEXT: psrad $16, %xmm4
	; SSE2-NEXT: psrad $16, %xmm0			; SSE2-NEXT: packssdw %xmm5, %xmm4
				; SSE2-NEXT: pslld $16, %xmm3
				; SSE2-NEXT: psrad $16, %xmm3
				; SSE2-NEXT: pslld $16, %xmm2
				; SSE2-NEXT: psrad $16, %xmm2
				; SSE2-NEXT: packssdw %xmm3, %xmm2
	; SSE2-NEXT: pand %xmm0, %xmm2			; SSE2-NEXT: pand %xmm0, %xmm2
	; SSE2-NEXT: pandn %xmm4, %xmm0			; SSE2-NEXT: pandn %xmm4, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0			; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: pand %xmm1, %xmm3
	; SSE2-NEXT: pandn %xmm5, %xmm1
	; SSE2-NEXT: por %xmm3, %xmm1
	; SSE2-NEXT: pslld $16, %xmm1
	; SSE2-NEXT: psrad $16, %xmm1
	; SSE2-NEXT: pslld $16, %xmm0
	; SSE2-NEXT: psrad $16, %xmm0
	; SSE2-NEXT: packssdw %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc:			; SSE41-LABEL: trunc:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pcmpeqw %xmm1, %xmm0			; SSE41-NEXT: pcmpeqw %xmm1, %xmm0
	; SSE41-NEXT: pxor %xmm1, %xmm1			; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; SSE41-NEXT: pshufb %xmm1, %xmm5
	; SSE41-NEXT: pmovsxwd %xmm0, %xmm0			; SSE41-NEXT: pshufb %xmm1, %xmm4
	; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: blendvps %xmm0, %xmm3, %xmm5
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; SSE41-NEXT: pshufb %xmm0, %xmm5
	; SSE41-NEXT: pshufb %xmm0, %xmm4
	; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]			; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]
	; SSE41-NEXT: movdqa %xmm4, %xmm0			; SSE41-NEXT: pshufb %xmm1, %xmm3
				; SSE41-NEXT: pshufb %xmm1, %xmm2
				; SSE41-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; SSE41-NEXT: pand %xmm0, %xmm2
				; SSE41-NEXT: pandn %xmm4, %xmm0
				; SSE41-NEXT: por %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc:			; AVX1-LABEL: trunc:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm1			; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm1
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm4, %xmm1, %xmm1
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vblendvps %ymm0, %ymm2, %ymm3, %ymm0			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm3[0],xmm1[0]
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpandn %xmm1, %xmm0, %xmm1
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]			; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3
	; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1			; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2
	; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0
				; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: trunc:			; AVX2-LABEL: trunc:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0			; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]
	; AVX2-NEXT: vblendvps %ymm0, %ymm2, %ymm3, %ymm0			; AVX2-NEXT: vpshufb %ymm1, %ymm3, %ymm3
	; AVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]			; AVX2-NEXT: vpermq {{.*#+}} ymm3 = ymm3[0,2,2,3]
	; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,2,3]			; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm3
	; AVX2-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>			; AVX2-NEXT: vpshufb %ymm1, %ymm2, %ymm1
				; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,2,2,3]
				; AVX2-NEXT: vpand %xmm0, %xmm1, %xmm0
				; AVX2-NEXT: vpor %xmm3, %xmm0, %xmm0
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = icmp eq <8 x i16> %a, %b			%cmp = icmp eq <8 x i16> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d			%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d
	%tr = trunc <8 x i32> %sel to <8 x i16>			%tr = trunc <8 x i32> %sel to <8 x i16>
	ret <8 x i16> %tr			ret <8 x i16> %tr
	}			}

	define <4 x float> @fptrunc(<4 x float> %a, <4 x float> %b, <4 x double> %c, <4 x double> %d) {			define <4 x float> @fptrunc(<4 x float> %a, <4 x float> %b, <4 x double> %c, <4 x double> %d) {
	; SSE2-LABEL: fptrunc:			; SSE2-LABEL: fptrunc:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: cmpltps %xmm1, %xmm0			; SSE2-NEXT: cmpltps %xmm1, %xmm0
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: cvtpd2ps %xmm5, %xmm1
	; SSE2-NEXT: psrad $31, %xmm1			; SSE2-NEXT: cvtpd2ps %xmm4, %xmm4
	; SSE2-NEXT: xorps %xmm6, %xmm6			; SSE2-NEXT: unpcklpd {{.*#+}} xmm4 = xmm4[0],xmm1[0]
	; SSE2-NEXT: unpckhps {{.*#+}} xmm6 = xmm6[2],xmm0[2],xmm6[3],xmm0[3]			; SSE2-NEXT: cvtpd2ps %xmm3, %xmm1
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE2-NEXT: cvtpd2ps %xmm2, %xmm2
	; SSE2-NEXT: movaps %xmm6, %xmm1			; SSE2-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
	; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]			; SSE2-NEXT: andpd %xmm0, %xmm2
	; SSE2-NEXT: psrad $31, %xmm6			; SSE2-NEXT: andnpd %xmm4, %xmm0
	; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm6[1,3,2,3]			; SSE2-NEXT: orpd %xmm2, %xmm0
	; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm6[0],xmm1[1],xmm6[1]
	; SSE2-NEXT: pand %xmm1, %xmm3
	; SSE2-NEXT: pandn %xmm5, %xmm1
	; SSE2-NEXT: por %xmm3, %xmm1
	; SSE2-NEXT: pand %xmm0, %xmm2
	; SSE2-NEXT: pandn %xmm4, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: cvtpd2ps %xmm0, %xmm0
	; SSE2-NEXT: cvtpd2ps %xmm1, %xmm1
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: fptrunc:			; SSE41-LABEL: fptrunc:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: cmpltps %xmm1, %xmm0			; SSE41-NEXT: cmpltps %xmm1, %xmm0
	; SSE41-NEXT: xorps %xmm1, %xmm1			; SSE41-NEXT: cvtpd2ps %xmm3, %xmm1
	; SSE41-NEXT: unpckhps {{.*#+}} xmm1 = xmm1[2],xmm0[2],xmm1[3],xmm0[3]			; SSE41-NEXT: cvtpd2ps %xmm2, %xmm2
	; SSE41-NEXT: pmovsxdq %xmm0, %xmm0			; SSE41-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
	; SSE41-NEXT: blendvpd %xmm0, %xmm2, %xmm4			; SSE41-NEXT: cvtpd2ps %xmm5, %xmm3
				; SSE41-NEXT: cvtpd2ps %xmm4, %xmm1
				; SSE41-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm3[0]
				; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movaps %xmm1, %xmm0			; SSE41-NEXT: movaps %xmm1, %xmm0
	; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm5
	; SSE41-NEXT: cvtpd2ps %xmm5, %xmm1
	; SSE41-NEXT: cvtpd2ps %xmm4, %xmm0
	; SSE41-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: fptrunc:			; AVX-LABEL: fptrunc:
	; AVX1: # BB#0:			; AVX: # BB#0:
	; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm0			; AVX-NEXT: vcmpltps %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vpmovsxdq %xmm0, %xmm1			; AVX-NEXT: vcvtpd2ps %ymm2, %xmm1
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]			; AVX-NEXT: vcvtpd2ps %ymm3, %xmm2
	; AVX1-NEXT: vpmovsxdq %xmm0, %xmm0			; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX-NEXT: vzeroupper
	; AVX1-NEXT: vblendvpd %ymm0, %ymm2, %ymm3, %ymm0			; AVX-NEXT: retq
	; AVX1-NEXT: vcvtpd2ps %ymm0, %xmm0
	; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: fptrunc:
	; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxdq %xmm0, %ymm0
	; AVX2-NEXT: vblendvpd %ymm0, %ymm2, %ymm3, %ymm0
	; AVX2-NEXT: vcvtpd2ps %ymm0, %xmm0
	; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq
	%cmp = fcmp olt <4 x float> %a, %b			%cmp = fcmp olt <4 x float> %a, %b
	%sel = select <4 x i1> %cmp, <4 x double> %c, <4 x double> %d			%sel = select <4 x i1> %cmp, <4 x double> %c, <4 x double> %d
	%tr = fptrunc <4 x double> %sel to <4 x float>			%tr = fptrunc <4 x double> %sel to <4 x float>
	ret <4 x float> %tr			ret <4 x float> %tr
	}			}

	; PR14657 - avoid truncation/extension of comparison results			; PR14657 - avoid truncation/extension of comparison results
	; These tests demonstrate the same issue as the simpler cases above,			; These tests demonstrate the same issue as the simpler cases above,
	[238 lines not shown]
	; AVX1: # BB#0: # %vector.ph			; AVX1: # BB#0: # %vector.ph
	; AVX1-NEXT: vmovd %edi, %xmm0			; AVX1-NEXT: vmovd %edi, %xmm0
	; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]			; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
	; AVX1-NEXT: vmovd %esi, %xmm1			; AVX1-NEXT: vmovd %esi, %xmm1
	; AVX1-NEXT: vpshuflw {{.*#+}} xmm1 = xmm1[0,0,0,0,4,5,6,7]			; AVX1-NEXT: vpshuflw {{.*#+}} xmm1 = xmm1[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]			; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]
	; AVX1-NEXT: movq $-4096, %rax # imm = 0xF000			; AVX1-NEXT: movq $-4096, %rax # imm = 0xF000
				; AVX1-NEXT: vpmovsxwd %xmm0, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
				; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0
				; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
				; AVX1-NEXT: vpmovsxwd %xmm1, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
				; AVX1-NEXT: vpmovsxwd %xmm1, %xmm1
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
	; AVX1-NEXT: .p2align 4, 0x90			; AVX1-NEXT: .p2align 4, 0x90
	; AVX1-NEXT: .LBB6_1: # %vector.body			; AVX1-NEXT: .LBB6_1: # %vector.body
	; AVX1-NEXT: # =>This Inner Loop Header: Depth=1			; AVX1-NEXT: # =>This Inner Loop Header: Depth=1
	; AVX1-NEXT: vmovups da+4096(%rax), %ymm2			; AVX1-NEXT: vmovups da+4096(%rax), %ymm2
	; AVX1-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2			; AVX1-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2
	; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3			; AVX1-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm2
	; AVX1-NEXT: vpacksswb %xmm3, %xmm2, %xmm2
	; AVX1-NEXT: vpandn %xmm1, %xmm2, %xmm3
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm2
	; AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
	; AVX1-NEXT: vpmovsxwd %xmm2, %xmm3
	; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpmovsxwd %xmm2, %xmm2
	; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm3, %ymm2
	; AVX1-NEXT: vmovups %ymm2, dj+4096(%rax)			; AVX1-NEXT: vmovups %ymm2, dj+4096(%rax)
	; AVX1-NEXT: addq $32, %rax			; AVX1-NEXT: addq $32, %rax
	; AVX1-NEXT: jne .LBB6_1			; AVX1-NEXT: jne .LBB6_1
	; AVX1-NEXT: # BB#2: # %for.end			; AVX1-NEXT: # BB#2: # %for.end
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: example24:			; AVX2-LABEL: example24:
	; AVX2: # BB#0: # %vector.ph			; AVX2: # BB#0: # %vector.ph
	; AVX2-NEXT: vmovd %edi, %xmm0			; AVX2-NEXT: vmovd %edi, %xmm0
	; AVX2-NEXT: vpbroadcastw %xmm0, %xmm0			; AVX2-NEXT: vpbroadcastw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %esi, %xmm1			; AVX2-NEXT: vmovd %esi, %xmm1
	; AVX2-NEXT: vpbroadcastw %xmm1, %xmm1			; AVX2-NEXT: vpbroadcastw %xmm1, %xmm1
	; AVX2-NEXT: movq $-4096, %rax # imm = 0xF000			; AVX2-NEXT: movq $-4096, %rax # imm = 0xF000
				; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0
				; AVX2-NEXT: vpmovsxwd %xmm1, %ymm1
	; AVX2-NEXT: .p2align 4, 0x90			; AVX2-NEXT: .p2align 4, 0x90
	; AVX2-NEXT: .LBB6_1: # %vector.body			; AVX2-NEXT: .LBB6_1: # %vector.body
	; AVX2-NEXT: # =>This Inner Loop Header: Depth=1			; AVX2-NEXT: # =>This Inner Loop Header: Depth=1
	; AVX2-NEXT: vmovups da+4096(%rax), %ymm2			; AVX2-NEXT: vmovups da+4096(%rax), %ymm2
	; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2			; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2
	; AVX2-NEXT: vextractf128 $1, %ymm2, %xmm3			; AVX2-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm2
	; AVX2-NEXT: vpacksswb %xmm3, %xmm2, %xmm2			; AVX2-NEXT: vmovups %ymm2, dj+4096(%rax)
	; AVX2-NEXT: vpandn %xmm1, %xmm2, %xmm3
	; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm2
	; AVX2-NEXT: vpor %xmm3, %xmm2, %xmm2
	; AVX2-NEXT: vpmovsxwd %xmm2, %ymm2
	; AVX2-NEXT: vmovdqu %ymm2, dj+4096(%rax)
	; AVX2-NEXT: addq $32, %rax			; AVX2-NEXT: addq $32, %rax
	; AVX2-NEXT: jne .LBB6_1			; AVX2-NEXT: jne .LBB6_1
	; AVX2-NEXT: # BB#2: # %for.end			; AVX2-NEXT: # BB#2: # %for.end
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	vector.ph:			vector.ph:
	%0 = insertelement <8 x i16> undef, i16 %x, i32 0			%0 = insertelement <8 x i16> undef, i16 %x, i32 0
	%broadcast11 = shufflevector <8 x i16> %0, <8 x i16> undef, <8 x i32> zeroinitializer			%broadcast11 = shufflevector <8 x i16> %0, <8 x i16> undef, <8 x i32> zeroinitializer
	[26 lines not shown]

Revision Contents

Diff 97005

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/X86/cast-vsel.ll
