This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657)
Closed · Public

Authored by spatel on Apr 27 2017, 3:53 PM.


Details

Reviewers

efriedma
nadav
RKSimon

Commits

rGad13826aea22: [DAGCombiner] shrink/widen a vselect to match its condition operand size…
rL301781: [DAGCombiner] shrink/widen a vselect to match its condition operand size…

Summary

We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check legality of the select that we want to produce.

A few things to note:

We can't wait until after legalization and do this generically because, at least in the x86 tests from PR14657, we'll have PACKSS and bitcasts in the pattern.
This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but I think that requires a closer look to make sure we don't end up worse.
There's a 'vblendv' opportunity that we're missing, which results in andn/and/or sequences in some cases. I thought I'd better post this as-is to make sure I'm not off the rails, but I could fix that first.
I'm assuming that AVX1 offers the worst of all worlds with respect to uneven ISA support across multiple legal vector sizes, but I can certainly add tests for other targets to make sure this isn't doing harm.
There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests): despite IR that is terrible for the target, this patch allows us to generate the optimal loop code because something post-ISEL is hoisting the splat extends above the vector loops.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Apr 27 2017, 3:53 PM

Herald added subscribers: mcrosier, rengolin, aemerson. Apr 27 2017, 3:53 PM

nadav added inline comments. Apr 27 2017, 4:05 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 6935 (on Diff #97005): I think that it can be a good idea to check which DAGCombine phase we are in and make sure that we are in the pre-legalization phase.
Line 6945 (on Diff #97005): I think that we should check that a and b have one use.

Patch updated:

Bail out if we require LegalOperations (it is unlikely that this transform would be beneficial after legalization).
That change tilted the scale toward making this a proper DAGCombiner member rather than a static function, which accounts for some cosmetic diffs.
Improved code comments, structure, variable names, and assert message (no functional change from any of these either).

spatel added inline comments. Apr 28 2017, 9:38 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Line 6945 (on Diff #97005):

I could not find a reason for this. Apologies for moving the goal posts, but I renamed the variables in the updated version of the patch in an attempt to make the code clearer.

Your reference to 'a and b' was to the setcc operands. These are not in play with this transform: the setcc remains unchanged regardless of what happens to the vselect.

'c and d' were the vselect operands (these are now 'A and B'), but I don't see a benefit to checking the number of uses on those either.

Here are a couple of hacked-up tests to show these cases. I can add these to the patch if that would help.

define <4 x double> @fpext2(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x double> %x, %y   ; <--- extra uses of fcmp operands
%div = fdiv <4 x double> %ext, %add
ret <4 x double> %div
}

With this patch (no one-use check on the compare operands):

vcmpltpd   %ymm1, %ymm0, %ymm4
vcvtps2pd  %xmm2, %ymm2
vcvtps2pd  %xmm3, %ymm3
vblendvpd  %ymm4, %ymm2, %ymm3, %ymm2
vaddpd     %ymm1, %ymm0, %ymm0
vdivpd     %ymm0, %ymm2, %ymm0

If we bail out on hasOneUse():

vcmpltpd      %ymm1, %ymm0, %ymm4
vextractf128  $1, %ymm4, %xmm5
vpacksswb     %xmm5, %xmm4, %xmm4
vblendvps     %xmm4, %xmm2, %xmm3, %xmm2
vcvtps2pd     %xmm2, %ymm2
vaddpd        %ymm1, %ymm0, %ymm0
vdivpd        %ymm0, %ymm2, %ymm0

And the case where the select operands have more than one use:

define <4 x double> @fpext3(<4 x double> %x, <4 x double> %y, <4 x float> %a, <4 x float> %b) {
%cmp = fcmp olt <4 x double> %x, %y
%sel = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%ext = fpext <4 x float> %sel to <4 x double>
%add = fadd <4 x float> %a, %b   ; <--- extra uses of select ops
%ext2 = fpext <4 x float> %add to <4 x double>
%div = fdiv <4 x double> %ext, %ext2
ret <4 x double> %div
}

Transformed:

vcmpltpd   %ymm1, %ymm0, %ymm0
vcvtps2pd  %xmm2, %ymm1
vcvtps2pd  %xmm3, %ymm4
vblendvpd  %ymm0, %ymm1, %ymm4, %ymm0
vaddps     %xmm3, %xmm2, %xmm1
vcvtps2pd  %xmm1, %ymm1
vdivpd     %ymm1, %ymm0, %ymm0

Or bail out:

vcmpltpd      %ymm1, %ymm0, %ymm0
vextractf128  $1, %ymm0, %xmm1
vpacksswb     %xmm1, %xmm0, %xmm0
vblendvps     %xmm0, %xmm2, %xmm3, %xmm0
vcvtps2pd     %xmm0, %ymm0
vaddps        %xmm3, %xmm2, %xmm1
vcvtps2pd     %xmm1, %ymm1
vdivpd        %ymm1, %ymm0, %ymm0

LGTM!

This revision is now accepted and ready to land. Apr 29 2017, 1:08 AM

Closed by commit rL301781: [DAGCombiner] shrink/widen a vselect to match its condition operand size… (authored by spatel). Apr 30 2017, 3:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    61 lines
llvm/trunk/test/CodeGen/X86/cast-vsel.ll               238 lines

Diff 97249

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ 393 lines not shown @@ private:
     SDValue splitMergedValStore(StoreSDNode *ST);
     SDValue TransformFPLoadStorePair(SDNode *N);
     SDValue reduceBuildVecExtToExtBuildVec(SDNode *N);
     SDValue reduceBuildVecConvertToConvertBuildVec(SDNode *N);
     SDValue reduceBuildVecToShuffle(SDNode *N);
     SDValue createBuildVecShuffle(const SDLoc &DL, SDNode *N,
                                   ArrayRef<int> VectorMask, SDValue VecIn1,
                                   SDValue VecIn2, unsigned LeftIdx);
+    SDValue matchVSelectOpSizesWithSetCC(SDNode *N);
 
     SDValue GetDemandedBits(SDValue V, const APInt &Mask);
 
     /// Walk up chain skipping non-aliasing memory nodes,
     /// looking for aliasing nodes and adding them to the Aliases vector.
     void GatherAllAliases(SDNode *N, SDValue OriginalChain,
                           SmallVectorImpl<SDValue> &Aliases);
 
@@ 6,527 lines not shown @@ SDValue DAGCombiner::CombineExtLoad(SDNode *N) {
   SDValue Trunc =
       DAG.getNode(ISD::TRUNCATE, SDLoc(N0), N0.getValueType(), NewValue);
   CombineTo(N0.getNode(), Trunc, NewChain);
   ExtendSetCCUses(SetCCs, Trunc, NewValue, DL,
                   (ISD::NodeType)N->getOpcode());
   return SDValue(N, 0); // Return N so it doesn't get rechecked!
 }
 
+/// If we're narrowing or widening the result of a vector select and the final
+/// size is the same size as a setcc (compare) feeding the select, then try to
+/// apply the cast operation to the select's operands because matching vector
+/// sizes for a select condition and other operands should be more efficient.
+SDValue DAGCombiner::matchVSelectOpSizesWithSetCC(SDNode *Cast) {
+  unsigned CastOpcode = Cast->getOpcode();
+  assert((CastOpcode == ISD::SIGN_EXTEND || CastOpcode == ISD::ZERO_EXTEND ||
+          CastOpcode == ISD::TRUNCATE || CastOpcode == ISD::FP_EXTEND ||
+          CastOpcode == ISD::FP_ROUND) &&
+         "Unexpected opcode for vector select narrowing/widening");
+
+  // We only do this transform before legal ops because the pattern may be
+  // obfuscated by target-specific operations after legalization. Do not create
+  // an illegal select op, however, because that may be difficult to lower.
+  EVT VT = Cast->getValueType(0);
+  if (LegalOperations || !TLI.isOperationLegalOrCustom(ISD::VSELECT, VT))
+    return SDValue();
+
+  SDValue VSel = Cast->getOperand(0);
+  if (VSel.getOpcode() != ISD::VSELECT || !VSel.hasOneUse() ||
+      VSel.getOperand(0).getOpcode() != ISD::SETCC)
+    return SDValue();
+
+  // Does the setcc have the same vector size as the casted select?
+  SDValue SetCC = VSel.getOperand(0);
+  EVT SetCCVT = getSetCCResultType(SetCC.getOperand(0).getValueType());
+  if (SetCCVT.getSizeInBits() != VT.getSizeInBits())
+    return SDValue();
+
+  // cast (vsel (setcc X), A, B) --> vsel (setcc X), (cast A), (cast B)
+  SDValue A = VSel.getOperand(1);
+  SDValue B = VSel.getOperand(2);
+  SDValue CastA, CastB;
+  SDLoc DL(Cast);
+  if (CastOpcode == ISD::FP_ROUND) {
+    // FP_ROUND (fptrunc) has an extra flag operand to pass along.
+    CastA = DAG.getNode(CastOpcode, DL, VT, A, Cast->getOperand(1));
+    CastB = DAG.getNode(CastOpcode, DL, VT, B, Cast->getOperand(1));
+  } else {
+    CastA = DAG.getNode(CastOpcode, DL, VT, A);
+    CastB = DAG.getNode(CastOpcode, DL, VT, B);
+  }
+  return DAG.getNode(ISD::VSELECT, DL, VT, SetCC, CastA, CastB);
+}
+
 SDValue DAGCombiner::visitSIGN_EXTEND(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
   SDLoc DL(N);
 
   if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
                                               LegalOperations))
     return SDValue(Res, 0);
@@ 207 lines not shown @@ if (N0.getOpcode() == ISD::SETCC) {
     }
   }
 
   // fold (sext x) -> (zext x) if the sign bit is known zero.
   if ((!LegalOperations || TLI.isOperationLegal(ISD::ZERO_EXTEND, VT)) &&
       DAG.SignBitIsZero(N0))
     return DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0);
 
+  if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
+    return NewVSel;
+
   return SDValue();
 }
 
 // isTruncateOf - If N is a truncate of some other value, return true, record
 // the value being truncated in Op and which of Op's bits are zero/one in Known.
 // This function computes KnownBits to avoid a duplicated call to
 // computeKnownBits in the caller.
 static bool isTruncateOf(SelectionDAG &DAG, SDValue N, SDValue &Op,
@@ 317 lines not shown @@ if ((N0.getOpcode() == ISD::SHL || N0.getOpcode() == ISD::SRL) &&
     if (VT.getSizeInBits() >= 256)
       ShAmt = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, ShAmt);
 
     return DAG.getNode(N0.getOpcode(), DL, VT,
                        DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0)),
                        ShAmt);
   }
 
+  if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
+    return NewVSel;
+
   return SDValue();
 }
 
 SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
 
   if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
@@ 778 lines not shown @@ if ((N0.getOpcode() == ISD::ADDE || N0.getOpcode() == ISD::ADDCARRY) &&
       (!LegalOperations || TLI.isOperationLegal(N0.getOpcode(), VT))) {
     SDLoc SL(N);
     auto X = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(0));
     auto Y = DAG.getNode(ISD::TRUNCATE, SL, VT, N0.getOperand(1));
     auto VTs = DAG.getVTList(VT, N0->getValueType(1));
     return DAG.getNode(N0.getOpcode(), SL, VTs, X, Y, N0.getOperand(2));
   }
 
+  if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
+    return NewVSel;
+
   return SDValue();
 }
 
 static SDNode *getBuildPairElt(SDNode *N, unsigned i) {
   SDValue Elt = N->getOperand(i);
   if (Elt.getOpcode() != ISD::MERGE_VALUES)
     return Elt.getNode();
   return Elt.getOperand(Elt.getResNo()).getNode();
@@ 1,935 lines not shown @@ SDValue DAGCombiner::visitFP_ROUND(SDNode *N) {
   if (N0.getOpcode() == ISD::FCOPYSIGN && N0.getNode()->hasOneUse()) {
     SDValue Tmp = DAG.getNode(ISD::FP_ROUND, SDLoc(N0), VT,
                               N0.getOperand(0), N1);
     AddToWorklist(Tmp.getNode());
     return DAG.getNode(ISD::FCOPYSIGN, SDLoc(N), VT,
                        Tmp, N0.getOperand(1));
   }
 
+  if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
+    return NewVSel;
+
   return SDValue();
 }
 
 SDValue DAGCombiner::visitFP_ROUND_INREG(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
   EVT EVT = cast<VTSDNode>(N->getOperand(1))->getVT();
   ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
@@ 50 lines not shown @@ if (ISD::isNormalLoad(N0.getNode()) && N0.hasOneUse() &&
     CombineTo(N0.getNode(),
               DAG.getNode(ISD::FP_ROUND, SDLoc(N0),
                           N0.getValueType(), ExtLoad,
                           DAG.getIntPtrConstant(1, SDLoc(N0))),
               ExtLoad.getValue(1));
     return SDValue(N, 0); // Return N so it doesn't get rechecked!
   }
 
+  if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
+    return NewVSel;
+
   return SDValue();
 }
 
 SDValue DAGCombiner::visitFCEIL(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
 
   // fold (fceil c1) -> fceil(c1)
@@ 6,130 lines not shown @@

llvm/trunk/test/CodeGen/X86/cast-vsel.ll

	Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; SSE41-NEXT: pmovsxwd %xmm0, %xmm1			; SSE41-NEXT: pmovsxwd %xmm0, %xmm1
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: sext:			; AVX1-LABEL: sext:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpmovsxwd %xmm2, %xmm1
	; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX1-NEXT: vpmovsxwd %xmm2, %xmm2
	; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpmovsxwd %xmm3, %xmm2
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm1			; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm3[2,3,0,1]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vpmovsxwd %xmm3, %xmm3
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: sext:			; AVX2-LABEL: sext:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX2-NEXT: vpmovsxwd %xmm2, %ymm1
	; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpmovsxwd %xmm3, %ymm2
	; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX2-NEXT: vpand %xmm0, %xmm2, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = fcmp olt <8 x float> %a, %b			%cmp = fcmp olt <8 x float> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d			%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d
	%ext = sext <8 x i16> %sel to <8 x i32>			%ext = sext <8 x i16> %sel to <8 x i32>
	ret <8 x i32> %ext			ret <8 x i32> %ext
	}			}

	define <8 x i32> @zext(<8 x float> %a, <8 x float> %b, <8 x i16> %c, <8 x i16> %d) {			define <8 x i32> @zext(<8 x float> %a, <8 x float> %b, <8 x i16> %c, <8 x i16> %d) {
	Show All 32 Lines
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; SSE41-NEXT: pmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext:			; AVX1-LABEL: zext:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX1-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
	; AVX1-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm2 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero
	; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0			; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
	; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm2 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm3[2,3,0,1]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm3 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext:			; AVX2-LABEL: zext:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vcmpltps %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm1 = xmm2[0],zero,xmm2[1],zero,xmm2[2],zero,xmm2[3],zero,xmm2[4],zero,xmm2[5],zero,xmm2[6],zero,xmm2[7],zero
	; AVX2-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm2 = xmm3[0],zero,xmm3[1],zero,xmm3[2],zero,xmm3[3],zero,xmm3[4],zero,xmm3[5],zero,xmm3[6],zero,xmm3[7],zero
	; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm1			; AVX2-NEXT: vblendvps %ymm0, %ymm1, %ymm2, %ymm0
	; AVX2-NEXT: vpand %xmm0, %xmm2, %xmm0
	; AVX2-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = fcmp olt <8 x float> %a, %b			%cmp = fcmp olt <8 x float> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d			%sel = select <8 x i1> %cmp, <8 x i16> %c, <8 x i16> %d
	%ext = zext <8 x i16> %sel to <8 x i32>			%ext = zext <8 x i16> %sel to <8 x i32>
	ret <8 x i32> %ext			ret <8 x i32> %ext
	}			}

	define <4 x double> @fpext(<4 x double> %a, <4 x double> %b, <4 x float> %c, <4 x float> %d) {			define <4 x double> @fpext(<4 x double> %a, <4 x double> %b, <4 x float> %c, <4 x float> %d) {
	Show All 20 Lines
	; SSE41-NEXT: cvtps2pd %xmm5, %xmm0			; SSE41-NEXT: cvtps2pd %xmm5, %xmm0
	; SSE41-NEXT: movhlps {{.*#+}} xmm5 = xmm5[1,1]			; SSE41-NEXT: movhlps {{.*#+}} xmm5 = xmm5[1,1]
	; SSE41-NEXT: cvtps2pd %xmm5, %xmm1			; SSE41-NEXT: cvtps2pd %xmm5, %xmm1
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: fpext:			; AVX-LABEL: fpext:
	; AVX: # BB#0:			; AVX: # BB#0:
	; AVX-NEXT: vcmpltpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vcmpltpd %ymm1, %ymm0, %ymm0
	; AVX-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX-NEXT: vcvtps2pd %xmm2, %ymm1
	; AVX-NEXT: vpacksswb %xmm1, %xmm0, %xmm0			; AVX-NEXT: vcvtps2pd %xmm3, %ymm2
	; AVX-NEXT: vblendvps %xmm0, %xmm2, %xmm3, %xmm0			; AVX-NEXT: vblendvpd %ymm0, %ymm1, %ymm2, %ymm0
	; AVX-NEXT: vcvtps2pd %xmm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%cmp = fcmp olt <4 x double> %a, %b			%cmp = fcmp olt <4 x double> %a, %b
	%sel = select <4 x i1> %cmp, <4 x float> %c, <4 x float> %d			%sel = select <4 x i1> %cmp, <4 x float> %c, <4 x float> %d
	%ext = fpext <4 x float> %sel to <4 x double>			%ext = fpext <4 x float> %sel to <4 x double>
	ret <4 x double> %ext			ret <4 x double> %ext
	}			}

	define <8 x i16> @trunc(<8 x i16> %a, <8 x i16> %b, <8 x i32> %c, <8 x i32> %d) {			define <8 x i16> @trunc(<8 x i16> %a, <8 x i16> %b, <8 x i32> %c, <8 x i32> %d) {
	; SSE2-LABEL: trunc:			; SSE2-LABEL: trunc:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pcmpeqw %xmm1, %xmm0			; SSE2-NEXT: pcmpeqw %xmm1, %xmm0
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pslld $16, %xmm5
	; SSE2-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; SSE2-NEXT: psrad $16, %xmm5
	; SSE2-NEXT: psrad $16, %xmm1			; SSE2-NEXT: pslld $16, %xmm4
	; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3]			; SSE2-NEXT: psrad $16, %xmm4
	; SSE2-NEXT: psrad $16, %xmm0			; SSE2-NEXT: packssdw %xmm5, %xmm4
				; SSE2-NEXT: pslld $16, %xmm3
				; SSE2-NEXT: psrad $16, %xmm3
				; SSE2-NEXT: pslld $16, %xmm2
				; SSE2-NEXT: psrad $16, %xmm2
				; SSE2-NEXT: packssdw %xmm3, %xmm2
	; SSE2-NEXT: pand %xmm0, %xmm2			; SSE2-NEXT: pand %xmm0, %xmm2
	; SSE2-NEXT: pandn %xmm4, %xmm0			; SSE2-NEXT: pandn %xmm4, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0			; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: pand %xmm1, %xmm3
	; SSE2-NEXT: pandn %xmm5, %xmm1
	; SSE2-NEXT: por %xmm3, %xmm1
	; SSE2-NEXT: pslld $16, %xmm1
	; SSE2-NEXT: psrad $16, %xmm1
	; SSE2-NEXT: pslld $16, %xmm0
	; SSE2-NEXT: psrad $16, %xmm0
	; SSE2-NEXT: packssdw %xmm1, %xmm0
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: trunc:			; SSE41-LABEL: trunc:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: pcmpeqw %xmm1, %xmm0			; SSE41-NEXT: pcmpeqw %xmm1, %xmm0
	; SSE41-NEXT: pxor %xmm1, %xmm1			; SSE41-NEXT: movdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; SSE41-NEXT: punpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; SSE41-NEXT: pshufb %xmm1, %xmm5
	; SSE41-NEXT: pmovsxwd %xmm0, %xmm0			; SSE41-NEXT: pshufb %xmm1, %xmm4
	; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm4
	; SSE41-NEXT: movdqa %xmm1, %xmm0
	; SSE41-NEXT: blendvps %xmm0, %xmm3, %xmm5
	; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; SSE41-NEXT: pshufb %xmm0, %xmm5
	; SSE41-NEXT: pshufb %xmm0, %xmm4
	; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]			; SSE41-NEXT: punpcklqdq {{.*#+}} xmm4 = xmm4[0],xmm5[0]
	; SSE41-NEXT: movdqa %xmm4, %xmm0			; SSE41-NEXT: pshufb %xmm1, %xmm3
				; SSE41-NEXT: pshufb %xmm1, %xmm2
				; SSE41-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; SSE41-NEXT: pand %xmm0, %xmm2
				; SSE41-NEXT: pandn %xmm4, %xmm0
				; SSE41-NEXT: por %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: trunc:			; AVX1-LABEL: trunc:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm1			; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm1
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
	; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm4, %xmm1, %xmm1
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vblendvps %ymm0, %ymm2, %ymm3, %ymm0			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm3[0],xmm1[0]
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vpandn %xmm1, %xmm0, %xmm1
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]			; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3
	; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1			; AVX1-NEXT: vpshufb %xmm4, %xmm3, %xmm3
	; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpshufb %xmm4, %xmm2, %xmm2
	; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
				; AVX1-NEXT: vpand %xmm0, %xmm2, %xmm0
				; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: trunc:			; AVX2-LABEL: trunc:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0			; AVX2-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0			; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]
	; AVX2-NEXT: vblendvps %ymm0, %ymm2, %ymm3, %ymm0			; AVX2-NEXT: vpshufb %ymm1, %ymm3, %ymm3
	; AVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]			; AVX2-NEXT: vpermq {{.*#+}} ymm3 = ymm3[0,2,2,3]
	; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,2,2,3]			; AVX2-NEXT: vpandn %xmm3, %xmm0, %xmm3
	; AVX2-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<kill>			; AVX2-NEXT: vpshufb %ymm1, %ymm2, %ymm1
				; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm1[0,2,2,3]
				; AVX2-NEXT: vpand %xmm0, %xmm1, %xmm0
				; AVX2-NEXT: vpor %xmm3, %xmm0, %xmm0
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cmp = icmp eq <8 x i16> %a, %b			%cmp = icmp eq <8 x i16> %a, %b
	%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d			%sel = select <8 x i1> %cmp, <8 x i32> %c, <8 x i32> %d
	%tr = trunc <8 x i32> %sel to <8 x i16>			%tr = trunc <8 x i32> %sel to <8 x i16>
	ret <8 x i16> %tr			ret <8 x i16> %tr
	}			}

	define <4 x float> @fptrunc(<4 x float> %a, <4 x float> %b, <4 x double> %c, <4 x double> %d) {			define <4 x float> @fptrunc(<4 x float> %a, <4 x float> %b, <4 x double> %c, <4 x double> %d) {
	; SSE2-LABEL: fptrunc:			; SSE2-LABEL: fptrunc:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: cmpltps %xmm1, %xmm0			; SSE2-NEXT: cmpltps %xmm1, %xmm0
	; SSE2-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: cvtpd2ps %xmm5, %xmm1
	; SSE2-NEXT: psrad $31, %xmm1			; SSE2-NEXT: cvtpd2ps %xmm4, %xmm4
	; SSE2-NEXT: xorps %xmm6, %xmm6			; SSE2-NEXT: unpcklpd {{.*#+}} xmm4 = xmm4[0],xmm1[0]
	; SSE2-NEXT: unpckhps {{.*#+}} xmm6 = xmm6[2],xmm0[2],xmm6[3],xmm0[3]			; SSE2-NEXT: cvtpd2ps %xmm3, %xmm1
	; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE2-NEXT: cvtpd2ps %xmm2, %xmm2
	; SSE2-NEXT: movaps %xmm6, %xmm1			; SSE2-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
	; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]			; SSE2-NEXT: andpd %xmm0, %xmm2
	; SSE2-NEXT: psrad $31, %xmm6			; SSE2-NEXT: andnpd %xmm4, %xmm0
	; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm6[1,3,2,3]			; SSE2-NEXT: orpd %xmm2, %xmm0
	; SSE2-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm6[0],xmm1[1],xmm6[1]
	; SSE2-NEXT: pand %xmm1, %xmm3
	; SSE2-NEXT: pandn %xmm5, %xmm1
	; SSE2-NEXT: por %xmm3, %xmm1
	; SSE2-NEXT: pand %xmm0, %xmm2
	; SSE2-NEXT: pandn %xmm4, %xmm0
	; SSE2-NEXT: por %xmm2, %xmm0
	; SSE2-NEXT: cvtpd2ps %xmm0, %xmm0
	; SSE2-NEXT: cvtpd2ps %xmm1, %xmm1
	; SSE2-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; SSE41-LABEL: fptrunc:			; SSE41-LABEL: fptrunc:
	; SSE41: # BB#0:			; SSE41: # BB#0:
	; SSE41-NEXT: cmpltps %xmm1, %xmm0			; SSE41-NEXT: cmpltps %xmm1, %xmm0
	; SSE41-NEXT: xorps %xmm1, %xmm1			; SSE41-NEXT: cvtpd2ps %xmm3, %xmm1
	; SSE41-NEXT: unpckhps {{.*#+}} xmm1 = xmm1[2],xmm0[2],xmm1[3],xmm0[3]			; SSE41-NEXT: cvtpd2ps %xmm2, %xmm2
	; SSE41-NEXT: pmovsxdq %xmm0, %xmm0			; SSE41-NEXT: unpcklpd {{.*#+}} xmm2 = xmm2[0],xmm1[0]
	; SSE41-NEXT: blendvpd %xmm0, %xmm2, %xmm4			; SSE41-NEXT: cvtpd2ps %xmm5, %xmm3
				; SSE41-NEXT: cvtpd2ps %xmm4, %xmm1
				; SSE41-NEXT: unpcklpd {{.*#+}} xmm1 = xmm1[0],xmm3[0]
				; SSE41-NEXT: blendvps %xmm0, %xmm2, %xmm1
	; SSE41-NEXT: movaps %xmm1, %xmm0			; SSE41-NEXT: movaps %xmm1, %xmm0
	; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm5
	; SSE41-NEXT: cvtpd2ps %xmm5, %xmm1
	; SSE41-NEXT: cvtpd2ps %xmm4, %xmm0
	; SSE41-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: fptrunc:			; AVX-LABEL: fptrunc:
	; AVX1: # BB#0:			; AVX: # BB#0:
	; AVX1-NEXT: vcmpltps %xmm1, %xmm0, %xmm0			; AVX-NEXT: vcmpltps %xmm1, %xmm0, %xmm0
	; AVX1-NEXT: vpmovsxdq %xmm0, %xmm1			; AVX-NEXT: vcvtpd2ps %ymm2, %xmm1
	; AVX1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]			; AVX-NEXT: vcvtpd2ps %ymm3, %xmm2
	; AVX1-NEXT: vpmovsxdq %xmm0, %xmm0			; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; AVX-NEXT: vzeroupper
	; AVX1-NEXT: vblendvpd %ymm0, %ymm2, %ymm3, %ymm0			; AVX-NEXT: retq
	; AVX1-NEXT: vcvtpd2ps %ymm0, %xmm0
	; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq
	;
	; AVX2-LABEL: fptrunc:
	; AVX2: # BB#0:
	; AVX2-NEXT: vcmpltps %xmm1, %xmm0, %xmm0
	; AVX2-NEXT: vpmovsxdq %xmm0, %ymm0
	; AVX2-NEXT: vblendvpd %ymm0, %ymm2, %ymm3, %ymm0
	; AVX2-NEXT: vcvtpd2ps %ymm0, %xmm0
	; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq
	%cmp = fcmp olt <4 x float> %a, %b			%cmp = fcmp olt <4 x float> %a, %b
	%sel = select <4 x i1> %cmp, <4 x double> %c, <4 x double> %d			%sel = select <4 x i1> %cmp, <4 x double> %c, <4 x double> %d
	%tr = fptrunc <4 x double> %sel to <4 x float>			%tr = fptrunc <4 x double> %sel to <4 x float>
	ret <4 x float> %tr			ret <4 x float> %tr
	}			}

	; PR14657 - avoid truncation/extension of comparison results			; PR14657 - avoid truncation/extension of comparison results
	; These tests demonstrate the same issue as the simpler cases above,			; These tests demonstrate the same issue as the simpler cases above,
	; AVX1: # BB#0: # %vector.ph			; AVX1: # BB#0: # %vector.ph
	; AVX1-NEXT: vmovd %edi, %xmm0			; AVX1-NEXT: vmovd %edi, %xmm0
	; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]			; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
	; AVX1-NEXT: vmovd %esi, %xmm1			; AVX1-NEXT: vmovd %esi, %xmm1
	; AVX1-NEXT: vpshuflw {{.*#+}} xmm1 = xmm1[0,0,0,0,4,5,6,7]			; AVX1-NEXT: vpshuflw {{.*#+}} xmm1 = xmm1[0,0,0,0,4,5,6,7]
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]			; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]
	; AVX1-NEXT: movq $-4096, %rax # imm = 0xF000			; AVX1-NEXT: movq $-4096, %rax # imm = 0xF000
				; AVX1-NEXT: vpmovsxwd %xmm0, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
				; AVX1-NEXT: vpmovsxwd %xmm0, %xmm0
				; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
				; AVX1-NEXT: vpmovsxwd %xmm1, %xmm2
				; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
				; AVX1-NEXT: vpmovsxwd %xmm1, %xmm1
				; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
	; AVX1-NEXT: .p2align 4, 0x90			; AVX1-NEXT: .p2align 4, 0x90
	; AVX1-NEXT: .LBB6_1: # %vector.body			; AVX1-NEXT: .LBB6_1: # %vector.body
	; AVX1-NEXT: # =>This Inner Loop Header: Depth=1			; AVX1-NEXT: # =>This Inner Loop Header: Depth=1
	; AVX1-NEXT: vmovups da+4096(%rax), %ymm2			; AVX1-NEXT: vmovups da+4096(%rax), %ymm2
	; AVX1-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2			; AVX1-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2
	; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3			; AVX1-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm2
	; AVX1-NEXT: vpacksswb %xmm3, %xmm2, %xmm2
	; AVX1-NEXT: vpandn %xmm1, %xmm2, %xmm3
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm2
	; AVX1-NEXT: vpor %xmm3, %xmm2, %xmm2
	; AVX1-NEXT: vpmovsxwd %xmm2, %xmm3
	; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm2[2,3,0,1]
	; AVX1-NEXT: vpmovsxwd %xmm2, %xmm2
	; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm3, %ymm2
	; AVX1-NEXT: vmovups %ymm2, dj+4096(%rax)			; AVX1-NEXT: vmovups %ymm2, dj+4096(%rax)
	; AVX1-NEXT: addq $32, %rax			; AVX1-NEXT: addq $32, %rax
	; AVX1-NEXT: jne .LBB6_1			; AVX1-NEXT: jne .LBB6_1
	; AVX1-NEXT: # BB#2: # %for.end			; AVX1-NEXT: # BB#2: # %for.end
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: example24:			; AVX2-LABEL: example24:
	; AVX2: # BB#0: # %vector.ph			; AVX2: # BB#0: # %vector.ph
	; AVX2-NEXT: vmovd %edi, %xmm0			; AVX2-NEXT: vmovd %edi, %xmm0
	; AVX2-NEXT: vpbroadcastw %xmm0, %xmm0			; AVX2-NEXT: vpbroadcastw %xmm0, %xmm0
	; AVX2-NEXT: vmovd %esi, %xmm1			; AVX2-NEXT: vmovd %esi, %xmm1
	; AVX2-NEXT: vpbroadcastw %xmm1, %xmm1			; AVX2-NEXT: vpbroadcastw %xmm1, %xmm1
	; AVX2-NEXT: movq $-4096, %rax # imm = 0xF000			; AVX2-NEXT: movq $-4096, %rax # imm = 0xF000
				; AVX2-NEXT: vpmovsxwd %xmm0, %ymm0
				; AVX2-NEXT: vpmovsxwd %xmm1, %ymm1
	; AVX2-NEXT: .p2align 4, 0x90			; AVX2-NEXT: .p2align 4, 0x90
	; AVX2-NEXT: .LBB6_1: # %vector.body			; AVX2-NEXT: .LBB6_1: # %vector.body
	; AVX2-NEXT: # =>This Inner Loop Header: Depth=1			; AVX2-NEXT: # =>This Inner Loop Header: Depth=1
	; AVX2-NEXT: vmovups da+4096(%rax), %ymm2			; AVX2-NEXT: vmovups da+4096(%rax), %ymm2
	; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2			; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm2, %ymm2
	; AVX2-NEXT: vextractf128 $1, %ymm2, %xmm3			; AVX2-NEXT: vblendvps %ymm2, %ymm0, %ymm1, %ymm2
	; AVX2-NEXT: vpacksswb %xmm3, %xmm2, %xmm2			; AVX2-NEXT: vmovups %ymm2, dj+4096(%rax)
	; AVX2-NEXT: vpandn %xmm1, %xmm2, %xmm3
	; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm2
	; AVX2-NEXT: vpor %xmm3, %xmm2, %xmm2
	; AVX2-NEXT: vpmovsxwd %xmm2, %ymm2
	; AVX2-NEXT: vmovdqu %ymm2, dj+4096(%rax)
	; AVX2-NEXT: addq $32, %rax			; AVX2-NEXT: addq $32, %rax
	; AVX2-NEXT: jne .LBB6_1			; AVX2-NEXT: jne .LBB6_1
	; AVX2-NEXT: # BB#2: # %for.end			; AVX2-NEXT: # BB#2: # %for.end
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	vector.ph:			vector.ph:
	%0 = insertelement <8 x i16> undef, i16 %x, i32 0			%0 = insertelement <8 x i16> undef, i16 %x, i32 0
	%broadcast11 = shufflevector <8 x i16> %0, <8 x i16> undef, <8 x i32> zeroinitializer			%broadcast11 = shufflevector <8 x i16> %0, <8 x i16> undef, <8 x i32> zeroinitializer