This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Closed Public

Authored by RKSimon on Feb 28 2016, 4:17 AM.

Details

Reviewers

spatel
delena
congh
igorb

Commits

rG61eb49e437d7: [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to…
rG91dd0a796cb3: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
rL263159: [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to…
rL262599: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG

Summary

Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
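
To make the target node concrete: ZERO_EXTEND_VECTOR_INREG takes an input vector with the same total size as the result and zero-extends only its lowest elements. The following is a minimal scalar model of that behaviour for a 128-bit v16i8 to v8i16 case. The function name `zeroExtendVectorInReg` and the use of `std::array` are illustrative assumptions, not LLVM API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model (not LLVM code): input and result occupy the same number of
// bits (16 x i8 and 8 x i16 are both 128 bits), so only the lowest 8 input
// elements contribute to the result; the upper 8 are ignored.
std::array<std::uint16_t, 8>
zeroExtendVectorInReg(const std::array<std::uint8_t, 16> &Src) {
  std::array<std::uint16_t, 8> Result{};
  for (std::size_t I = 0; I != Result.size(); ++I)
    Result[I] = static_cast<std::uint16_t>(Src[I]); // high bits become zero
  return Result;
}
```

A plain ZERO_EXTEND of a narrower input first has to be widened (or split) to this same-size form, which is the step the combine in this patch performs.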

This won't solve the issues with PR25718 (load+zext 8xi8 to 8xi32) but is one of several things that need to be done at the same time.

Igor/Elena - can you advise why the masks aren't being folded into the VPMOVZX instructions on Skylake targets any more please?

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 49314. Feb 28 2016, 4:17 AM

RKSimon retitled this revision to [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG.

RKSimon updated this object.

RKSimon added reviewers: igorb, delena, congh, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

delena added inline comments. Feb 28 2016, 6:10 AM

test/CodeGen/X86/avx512-ext.ll
116	Hi Simon, why do we need an additional instruction here? vpmovzxbw %xmm0, %ymm0 {%k1} {z} does the work.
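
The point about the redundant instruction can be modelled in scalar code: zero-masking (`{%k1} {z}`) keeps a lane where the mask bit is set and writes zero elsewhere, so applying it inside the extend or as a separate move afterwards yields the same value, and only the instruction count differs. A minimal sketch, in which `zextMaskedFused`, `zextThenMaskedMove` and the `std::bitset` mask are hypothetical names chosen for illustration:

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<std::uint8_t, 16>;
using V16i16 = std::array<std::uint16_t, 16>;

// Models a single zero-masked extend: vpmovzxbw %xmm0, %ymm0 {%k1} {z}.
V16i16 zextMaskedFused(const V16i8 &Src, const std::bitset<16> &K) {
  V16i16 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = K[I] ? static_cast<std::uint16_t>(Src[I]) : std::uint16_t{0};
  return R;
}

// Models the two-instruction form: an unmasked vpmovzxbw followed by a
// separate zero-masked move of the widened vector.
V16i16 zextThenMaskedMove(const V16i8 &Src, const std::bitset<16> &K) {
  V16i16 Ext{};
  for (std::size_t I = 0; I != Ext.size(); ++I)
    Ext[I] = Src[I]; // unmasked zero extension
  V16i16 R{};
  for (std::size_t I = 0; I != R.size(); ++I)
    R[I] = K[I] ? Ext[I] : std::uint16_t{0}; // separate zero-masking step
  return R;
}
```

Both functions return identical vectors for any input and mask, which is why the extra masked move in the updated SKX checks is a missed fold rather than a correctness problem.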

RKSimon added inline comments. Feb 28 2016, 6:32 AM

test/CodeGen/X86/avx512-ext.ll
116	Hi - that was what I was asking you and Igor. I don't know much about how the masking lowering works in AVX512, but for some reason these VZEXT nodes (which in this case have come via VECTOR_SHUFFLE lowering) don't correctly combine with the masks. Now, I'm tempted to avoid this issue by just not combining cases where ZERO_EXTEND is legal and extends the whole register, but it looks like it's just hiding a bigger problem. I'll see if I can create a simplified repro if you wish?

I looked at the code. We combine "zext" with "select" in the td file and get:
vpmovzxbw %xmm0, %ymm0 {%k1} {z}

Then I looked at the lowering of ZERO_EXTEND_VECTOR_INREG: VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG()

It goes there, right?
At the end I see:

return DAG.getNode(ISD::BITCAST, DL, VT,
                   DAG.getVectorShuffle(SrcVT, DL, Zero, Src, ShuffleMask));

I assume that the BITCAST does not allow combining "zext" with "select".
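
For readers unfamiliar with this expansion: the generic legalizer turns the in-register zero extend into a shuffle that interleaves source elements with elements of a zero vector, then bitcasts the narrow-element result to the wide type. A minimal scalar model follows, assuming a little-endian element layout as on x86. The mask construction is simplified for illustration and is not a copy of the LLVM routine, and `shuffleWithZero` and `bitcastToI16` are hypothetical helper names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<std::uint8_t, 16>;
using V8i16 = std::array<std::uint16_t, 8>;

// Step 1: shuffle of a zero vector with Src. Even byte positions take the
// low source bytes and odd byte positions stay zero: Src[0],0,Src[1],0,...
V16i8 shuffleWithZero(const V16i8 &Src) {
  V16i8 Out{}; // value-initialised, so every byte starts as zero
  for (std::size_t I = 0; I != 8; ++I)
    Out[2 * I] = Src[I];
  return Out;
}

// Step 2: BITCAST v16i8 -> v8i16, modelled explicitly as little-endian so
// the sketch does not depend on the host byte order.
V8i16 bitcastToI16(const V16i8 &Bytes) {
  V8i16 Out{};
  for (std::size_t I = 0; I != Out.size(); ++I)
    Out[I] = static_cast<std::uint16_t>(Bytes[2 * I] |
                                        (Bytes[2 * I + 1] << 8));
  return Out;
}
```

Composing the two steps gives the same lanes as a direct zero extension of the low eight bytes, but the value now comes out of a BITCAST node rather than an extend node, which is consistent with the suspicion that the masked-select pattern no longer matches.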

But the SEXT code works perfectly on SKX, right?

define <16 x i16> @zext_16x8_to_16x16_mask(<16 x i8> %a ,<16 x i1> %mask) nounwind readnone {
  %x   = sext <16 x i8> %a to <16 x i16>
  %ret = select <16 x i1> %mask, <16 x i16> %x, <16 x i16> zeroinitializer
  ret <16 x i16> %ret
}

In D17691#363782, @delena wrote:

I assume that the BITCAST does not allow combining "zext" with "select".

Yes, I've just done a repro using vector shuffle - the bitcast seems to be the problem (and is why we don't see it with the VSEXT case). I can raise a bugzilla describing this if you want me to.

It looks like the best option for now is to only use this combine for certain non-legal ZERO_EXTEND cases - I'll update it shortly.

I wonder whether X86ISD::VZEXT / X86ISD::VSEXT are that useful or whether we should try to implement the PMOVZX/PMOVSX instructions with a mixture of ZERO/SIGN_EXTEND and ZERO/SIGN_EXTEND_VECTOR_INREG - I've looked at this in the past but haven't pursued it recently, IIRC there were problems with memory folding.

Tightened the restrictions on the *_EXTEND to *_EXTEND_VECTOR_INREG combine so that already legal transforms (AVX2+) don't get unnecessarily combined.

Created PR26762 to investigate the issue with bitcasts preventing masked instruction combining.

Looks better. I suggest running llc with --debug for zext and sext on SKX and comparing the two.
But in general, the patch looks good.

In D17691#363842, @delena wrote:

Looks better. I suggest running llc with --debug for zext and sext on SKX and comparing the two.
But in general, the patch looks good.

No problem - I'll put my findings on PR26762.

OK to commit?

LGTM

This revision is now accepted and ready to land. Mar 2 2016, 10:42 AM

Closed by commit rL262599: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG (authored by RKSimon). Mar 3 2016, 1:48 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	19 lines
lib/Target/X86/X86ISelLowering.cpp	27 lines
test/CodeGen/X86/avx512-ext.ll	10 lines
test/CodeGen/X86/vec_int_to_fp.ll	34 lines
test/CodeGen/X86/vector-zext.ll	29 lines

Diff 49314

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[267 lines not shown]	private:
SDValue visitSELECT_CC(SDNode *N);		SDValue visitSELECT_CC(SDNode *N);
SDValue visitSETCC(SDNode *N);		SDValue visitSETCC(SDNode *N);
SDValue visitSETCCE(SDNode *N);		SDValue visitSETCCE(SDNode *N);
SDValue visitSIGN_EXTEND(SDNode *N);		SDValue visitSIGN_EXTEND(SDNode *N);
SDValue visitZERO_EXTEND(SDNode *N);		SDValue visitZERO_EXTEND(SDNode *N);
SDValue visitANY_EXTEND(SDNode *N);		SDValue visitANY_EXTEND(SDNode *N);
SDValue visitSIGN_EXTEND_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_INREG(SDNode *N);
SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);
		SDValue visitZERO_EXTEND_VECTOR_INREG(SDNode *N);
SDValue visitTRUNCATE(SDNode *N);		SDValue visitTRUNCATE(SDNode *N);
SDValue visitBITCAST(SDNode *N);		SDValue visitBITCAST(SDNode *N);
SDValue visitBUILD_PAIR(SDNode *N);		SDValue visitBUILD_PAIR(SDNode *N);
SDValue visitFADD(SDNode *N);		SDValue visitFADD(SDNode *N);
SDValue visitFSUB(SDNode *N);		SDValue visitFSUB(SDNode *N);
SDValue visitFMUL(SDNode *N);		SDValue visitFMUL(SDNode *N);
SDValue visitFMA(SDNode *N);		SDValue visitFMA(SDNode *N);
SDValue visitFDIV(SDNode *N);		SDValue visitFDIV(SDNode *N);
[1,107 lines not shown]	SDValue DAGCombiner::visit(SDNode *N) {
case ISD::SELECT_CC: return visitSELECT_CC(N);		case ISD::SELECT_CC: return visitSELECT_CC(N);
case ISD::SETCC: return visitSETCC(N);		case ISD::SETCC: return visitSETCC(N);
case ISD::SETCCE: return visitSETCCE(N);		case ISD::SETCCE: return visitSETCCE(N);
case ISD::SIGN_EXTEND: return visitSIGN_EXTEND(N);		case ISD::SIGN_EXTEND: return visitSIGN_EXTEND(N);
case ISD::ZERO_EXTEND: return visitZERO_EXTEND(N);		case ISD::ZERO_EXTEND: return visitZERO_EXTEND(N);
case ISD::ANY_EXTEND: return visitANY_EXTEND(N);		case ISD::ANY_EXTEND: return visitANY_EXTEND(N);
case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);		case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);
case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);		case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);
		case ISD::ZERO_EXTEND_VECTOR_INREG: return visitZERO_EXTEND_VECTOR_INREG(N);
case ISD::TRUNCATE: return visitTRUNCATE(N);		case ISD::TRUNCATE: return visitTRUNCATE(N);
case ISD::BITCAST: return visitBITCAST(N);		case ISD::BITCAST: return visitBITCAST(N);
case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);		case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
case ISD::FADD: return visitFADD(N);		case ISD::FADD: return visitFADD(N);
case ISD::FSUB: return visitFSUB(N);		case ISD::FSUB: return visitFSUB(N);
case ISD::FMUL: return visitFMUL(N);		case ISD::FMUL: return visitFMUL(N);
case ISD::FMA: return visitFMA(N);		case ISD::FMA: return visitFMA(N);
case ISD::FDIV: return visitFDIV(N);		case ISD::FDIV: return visitFDIV(N);
[4,303 lines not shown]
static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,		static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,
SelectionDAG &DAG, bool LegalTypes,		SelectionDAG &DAG, bool LegalTypes,
bool LegalOperations) {		bool LegalOperations) {
unsigned Opcode = N->getOpcode();		unsigned Opcode = N->getOpcode();
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

assert((Opcode == ISD::SIGN_EXTEND || Opcode == ISD::ZERO_EXTEND ||		assert((Opcode == ISD::SIGN_EXTEND || Opcode == ISD::ZERO_EXTEND ||
Opcode == ISD::ANY_EXTEND || Opcode == ISD::SIGN_EXTEND_VECTOR_INREG)		Opcode == ISD::ANY_EXTEND || Opcode == ISD::SIGN_EXTEND_VECTOR_INREG ||
		Opcode == ISD::ZERO_EXTEND_VECTOR_INREG)
&& "Expected EXTEND dag node in input!");		&& "Expected EXTEND dag node in input!");

// fold (sext c1) -> c1		// fold (sext c1) -> c1
// fold (zext c1) -> c1		// fold (zext c1) -> c1
// fold (aext c1) -> c1		// fold (aext c1) -> c1
if (isa<ConstantSDNode>(N0))		if (isa<ConstantSDNode>(N0))
return DAG.getNode(Opcode, SDLoc(N), VT, N0).getNode();		return DAG.getNode(Opcode, SDLoc(N), VT, N0).getNode();

[1,261 lines not shown]	SDValue DAGCombiner::visitSIGN_EXTEND_VECTOR_INREG(SDNode *N) {

if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
LegalOperations))		LegalOperations))
return SDValue(Res, 0);		return SDValue(Res, 0);

return SDValue();		return SDValue();
}		}

		SDValue DAGCombiner::visitZERO_EXTEND_VECTOR_INREG(SDNode *N) {
		SDValue N0 = N->getOperand(0);
		EVT VT = N->getValueType(0);

		if (N0.getOpcode() == ISD::UNDEF)
		return DAG.getUNDEF(VT);

		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
		LegalOperations))
		return SDValue(Res, 0);

		return SDValue();
		}

SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {		SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
bool isLE = DAG.getDataLayout().isLittleEndian();		bool isLE = DAG.getDataLayout().isLittleEndian();

// noop truncate		// noop truncate
if (N0.getValueType() == N->getValueType(0))		if (N0.getValueType() == N->getValueType(0))
return N0;		return N0;
[7,794 lines not shown]

lib/Target/X86/X86ISelLowering.cpp

[28,206 lines not shown]	static SDValue getDivRem8(SDNode *N, SelectionDAG &DAG) {
auto DivRemOpcode = OpcodeN0 == ISD::SDIVREM ? X86ISD::SDIVREM8_SEXT_HREG		auto DivRemOpcode = OpcodeN0 == ISD::SDIVREM ? X86ISD::SDIVREM8_SEXT_HREG
: X86ISD::UDIVREM8_ZEXT_HREG;		: X86ISD::UDIVREM8_ZEXT_HREG;
SDValue R = DAG.getNode(DivRemOpcode, SDLoc(N), NodeTys, N0.getOperand(0),		SDValue R = DAG.getNode(DivRemOpcode, SDLoc(N), NodeTys, N0.getOperand(0),
N0.getOperand(1));		N0.getOperand(1));
DAG.ReplaceAllUsesOfValueWith(N0.getValue(0), R.getValue(0));		DAG.ReplaceAllUsesOfValueWith(N0.getValue(0), R.getValue(0));
return R.getValue(1);		return R.getValue(1);
}		}

/// Convert a SEXT of a vector to a SIGN_EXTEND_VECTOR_INREG, this requires		/// Convert a SEXT or ZEXT of a vector to a SIGN_EXTEND_VECTOR_INREG or
/// the splitting (or concatenating with UNDEFs) of the input to vectors of the		/// ZERO_EXTEND_VECTOR_INREG, this requires the splitting (or concatenating
/// same size as the target type which then extends the lowest elements.		/// with UNDEFs) of the input to vectors of the same size as the target type
		/// which then extends the lowest elements.
static SDValue combineToExtendVectorInReg(SDNode *N, SelectionDAG &DAG,		static SDValue combineToExtendVectorInReg(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
if (N->getOpcode() != ISD::SIGN_EXTEND)		unsigned Opcode = N->getOpcode();
		if (Opcode != ISD::SIGN_EXTEND && Opcode != ISD::ZERO_EXTEND)
return SDValue();		return SDValue();
if (!DCI.isBeforeLegalizeOps())		if (!DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();
if (!Subtarget.hasSSE2())		if (!Subtarget.hasSSE2())
return SDValue();		return SDValue();

SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
[23 lines not shown]	static SDValue combineToExtendVectorInReg(SDNode *N, SelectionDAG &DAG,

// If target-size is less than 128-bits, extend to a type that would extend		// If target-size is less than 128-bits, extend to a type that would extend
// to 128 bits, extend that and extract the original target vector.		// to 128 bits, extend that and extract the original target vector.
if (VT.getSizeInBits() < 128 && !(128 % VT.getSizeInBits())) {		if (VT.getSizeInBits() < 128 && !(128 % VT.getSizeInBits())) {
unsigned Scale = 128 / VT.getSizeInBits();		unsigned Scale = 128 / VT.getSizeInBits();
EVT ExVT =		EVT ExVT =
EVT::getVectorVT(*DAG.getContext(), SVT, 128 / SVT.getSizeInBits());		EVT::getVectorVT(*DAG.getContext(), SVT, 128 / SVT.getSizeInBits());
SDValue Ex = ExtendVecSize(DL, N0, Scale * InVT.getSizeInBits());		SDValue Ex = ExtendVecSize(DL, N0, Scale * InVT.getSizeInBits());
SDValue SExt = DAG.getNode(ISD::SIGN_EXTEND, DL, ExVT, Ex);		SDValue SExt = DAG.getNode(Opcode, DL, ExVT, Ex);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SExt,		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SExt,
DAG.getIntPtrConstant(0, DL));		DAG.getIntPtrConstant(0, DL));
}		}

// If target-size is 128-bits (or 256-bits on AVX2 target), then convert to		// If target-size is 128-bits (or 256-bits on AVX2 target), then convert to
// ISD::SIGN_EXTEND_VECTOR_INREG which ensures lowering to X86ISD::VSEXT.		// ISD::*_EXTEND_VECTOR_INREG which ensures lowering to X86ISD::V*EXT.
if (VT.is128BitVector() || (VT.is256BitVector() && Subtarget.hasInt256())) {		if (VT.is128BitVector() || (VT.is256BitVector() && Subtarget.hasInt256())) {
SDValue ExOp = ExtendVecSize(DL, N0, VT.getSizeInBits());		SDValue ExOp = ExtendVecSize(DL, N0, VT.getSizeInBits());
return DAG.getSignExtendVectorInReg(ExOp, DL, VT);		return Opcode == ISD::SIGN_EXTEND
		? DAG.getSignExtendVectorInReg(ExOp, DL, VT)
		: DAG.getZeroExtendVectorInReg(ExOp, DL, VT);
}		}

// On pre-AVX2 targets, split into 128-bit nodes of		// On pre-AVX2 targets, split into 128-bit nodes of
// ISD::SIGN_EXTEND_VECTOR_INREG.		// ISD::*_EXTEND_VECTOR_INREG.
if (!Subtarget.hasInt256() && !(VT.getSizeInBits() % 128)) {		if (!Subtarget.hasInt256() && !(VT.getSizeInBits() % 128)) {
unsigned NumVecs = VT.getSizeInBits() / 128;		unsigned NumVecs = VT.getSizeInBits() / 128;
unsigned NumSubElts = 128 / SVT.getSizeInBits();		unsigned NumSubElts = 128 / SVT.getSizeInBits();
EVT SubVT = EVT::getVectorVT(*DAG.getContext(), SVT, NumSubElts);		EVT SubVT = EVT::getVectorVT(*DAG.getContext(), SVT, NumSubElts);
EVT InSubVT = EVT::getVectorVT(*DAG.getContext(), InSVT, NumSubElts);		EVT InSubVT = EVT::getVectorVT(*DAG.getContext(), InSVT, NumSubElts);

SmallVector<SDValue, 8> Opnds;		SmallVector<SDValue, 8> Opnds;
for (unsigned i = 0, Offset = 0; i != NumVecs; ++i, Offset += NumSubElts) {		for (unsigned i = 0, Offset = 0; i != NumVecs; ++i, Offset += NumSubElts) {
SDValue SrcVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InSubVT, N0,		SDValue SrcVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InSubVT, N0,
DAG.getIntPtrConstant(Offset, DL));		DAG.getIntPtrConstant(Offset, DL));
SrcVec = ExtendVecSize(DL, SrcVec, 128);		SrcVec = ExtendVecSize(DL, SrcVec, 128);
SrcVec = DAG.getSignExtendVectorInReg(SrcVec, DL, SubVT);		SrcVec = Opcode == ISD::SIGN_EXTEND
		? DAG.getSignExtendVectorInReg(SrcVec, DL, SubVT)
		: DAG.getZeroExtendVectorInReg(SrcVec, DL, SubVT);
Opnds.push_back(SrcVec);		Opnds.push_back(SrcVec);
}		}
return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);
}		}

return SDValue();		return SDValue();
}		}

[102 lines not shown]	if (N0.getOpcode() == ISD::TRUNCATE &&
if (N00.getOpcode() == X86ISD::SETCC_CARRY) {		if (N00.getOpcode() == X86ISD::SETCC_CARRY) {
return DAG.getNode(ISD::AND, dl, VT,		return DAG.getNode(ISD::AND, dl, VT,
DAG.getNode(X86ISD::SETCC_CARRY, dl, VT,		DAG.getNode(X86ISD::SETCC_CARRY, dl, VT,
N00.getOperand(0), N00.getOperand(1)),		N00.getOperand(0), N00.getOperand(1)),
DAG.getConstant(1, dl, VT));		DAG.getConstant(1, dl, VT));
}		}
}		}

		if (SDValue V = combineToExtendVectorInReg(N, DAG, DCI, Subtarget))
		return V;

if (VT.is256BitVector())		if (VT.is256BitVector())
if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))		if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))
return R;		return R;

if (SDValue DivRem8 = getDivRem8(N, DAG))		if (SDValue DivRem8 = getDivRem8(N, DAG))
return DivRem8;		return DivRem8;

return SDValue();		return SDValue();
[1,387 lines not shown]

test/CodeGen/X86/avx512-ext.ll

[107 lines not shown]
; KNL-NEXT: vpand %ymm0, %ymm1, %ymm0		; KNL-NEXT: vpand %ymm0, %ymm1, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: zext_16x8_to_16x16_mask:		; SKX-LABEL: zext_16x8_to_16x16_mask:
; SKX: ## BB#0:		; SKX: ## BB#0:
; SKX-NEXT: vpsllw $7, %xmm1, %xmm1		; SKX-NEXT: vpsllw $7, %xmm1, %xmm1
; SKX-NEXT: vpmovb2m %xmm1, %k1		; SKX-NEXT: vpmovb2m %xmm1, %k1
; SKX-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero		; SKX-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
		; SKX-NEXT: vmovdqu16 %ymm0, %ymm0 {%k1} {z}
; SKX-NEXT: retq		; SKX-NEXT: retq
%x = zext <16 x i8> %a to <16 x i16>		%x = zext <16 x i8> %a to <16 x i16>
%ret = select <16 x i1> %mask, <16 x i16> %x, <16 x i16> zeroinitializer		%ret = select <16 x i1> %mask, <16 x i16> %x, <16 x i16> zeroinitializer
ret <16 x i16> %ret		ret <16 x i16> %ret
}		}

define <16 x i16> @sext_16x8_to_16x16(<16 x i8> %a ) nounwind readnone {		define <16 x i16> @sext_16x8_to_16x16(<16 x i8> %a ) nounwind readnone {
; ALL-LABEL: sext_16x8_to_16x16:		; ALL-LABEL: sext_16x8_to_16x16:
[81 lines not shown]	; SKX-NEXT: retq
%x = sext <32 x i8> %a to <32 x i16>		%x = sext <32 x i8> %a to <32 x i16>
%ret = select <32 x i1> %mask, <32 x i16> %x, <32 x i16> zeroinitializer		%ret = select <32 x i1> %mask, <32 x i16> %x, <32 x i16> zeroinitializer
ret <32 x i16> %ret		ret <32 x i16> %ret
}		}

define <32 x i16> @zext_32x8_to_32x16(<32 x i8> %a ) nounwind readnone {		define <32 x i16> @zext_32x8_to_32x16(<32 x i8> %a ) nounwind readnone {
; KNL-LABEL: zext_32x8_to_32x16:		; KNL-LABEL: zext_32x8_to_32x16:
; KNL: ## BB#0:		; KNL: ## BB#0:
; KNL-NEXT: vpmovzxbw {{.*#+}} ymm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero		; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm0		; KNL-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm1[0],zero,xmm1[1],zero,xmm1[2],zero,xmm1[3],zero,xmm1[4],zero,xmm1[5],zero,xmm1[6],zero,xmm1[7],zero,xmm1[8],zero,xmm1[9],zero,xmm1[10],zero,xmm1[11],zero,xmm1[12],zero,xmm1[13],zero,xmm1[14],zero,xmm1[15],zero
; KNL-NEXT: vpmovzxbw {{.*#+}} ymm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero		; KNL-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
; KNL-NEXT: vmovaps %zmm2, %zmm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: zext_32x8_to_32x16:		; SKX-LABEL: zext_32x8_to_32x16:
; SKX: ## BB#0:		; SKX: ## BB#0:
; SKX-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero		; SKX-NEXT: vpmovzxbw {{.*#+}} zmm0 = ymm0[0],zero,ymm0[1],zero,ymm0[2],zero,ymm0[3],zero,ymm0[4],zero,ymm0[5],zero,ymm0[6],zero,ymm0[7],zero,ymm0[8],zero,ymm0[9],zero,ymm0[10],zero,ymm0[11],zero,ymm0[12],zero,ymm0[13],zero,ymm0[14],zero,ymm0[15],zero,ymm0[16],zero,ymm0[17],zero,ymm0[18],zero,ymm0[19],zero,ymm0[20],zero,ymm0[21],zero,ymm0[22],zero,ymm0[23],zero,ymm0[24],zero,ymm0[25],zero,ymm0[26],zero,ymm0[27],zero,ymm0[28],zero,ymm0[29],zero,ymm0[30],zero,ymm0[31],zero
; SKX-NEXT: retq		; SKX-NEXT: retq
%x = zext <32 x i8> %a to <32 x i16>		%x = zext <32 x i8> %a to <32 x i16>
ret <32 x i16> %x		ret <32 x i16> %x
[538 lines not shown]
; KNL-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}		; KNL-NEXT: vpblendmd %zmm0, %zmm1, %zmm0 {%k1}
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: zext_8x16_to_8x32mask:		; SKX-LABEL: zext_8x16_to_8x32mask:
; SKX: ## BB#0:		; SKX: ## BB#0:
; SKX-NEXT: vpsllw $15, %xmm1, %xmm1		; SKX-NEXT: vpsllw $15, %xmm1, %xmm1
; SKX-NEXT: vpmovw2m %xmm1, %k1		; SKX-NEXT: vpmovw2m %xmm1, %k1
; SKX-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero		; SKX-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
		; SKX-NEXT: vmovdqa32 %ymm0, %ymm0 {%k1} {z}
; SKX-NEXT: retq		; SKX-NEXT: retq
%x = zext <8 x i16> %a to <8 x i32>		%x = zext <8 x i16> %a to <8 x i32>
%ret = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer		%ret = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
ret <8 x i32> %ret		ret <8 x i32> %ret
}		}

define <8 x i32> @zext_8x16_to_8x32(<8 x i16> %a ) nounwind readnone {		define <8 x i32> @zext_8x16_to_8x32(<8 x i16> %a ) nounwind readnone {
; ALL-LABEL: zext_8x16_to_8x32:		; ALL-LABEL: zext_8x16_to_8x32:
[400 lines not shown]
; KNL-NEXT: vpand %ymm0, %ymm1, %ymm0		; KNL-NEXT: vpand %ymm0, %ymm1, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: zext_4x32_to_4x64mask:		; SKX-LABEL: zext_4x32_to_4x64mask:
; SKX: ## BB#0:		; SKX: ## BB#0:
; SKX-NEXT: vpslld $31, %xmm1, %xmm1		; SKX-NEXT: vpslld $31, %xmm1, %xmm1
; SKX-NEXT: vptestmd %xmm1, %xmm1, %k1		; SKX-NEXT: vptestmd %xmm1, %xmm1, %k1
; SKX-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero		; SKX-NEXT: vpmovzxdq {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
		; SKX-NEXT: vmovdqa64 %ymm0, %ymm0 {%k1} {z}
; SKX-NEXT: retq		; SKX-NEXT: retq
%x = zext <4 x i32> %a to <4 x i64>		%x = zext <4 x i32> %a to <4 x i64>
%ret = select <4 x i1> %mask, <4 x i64> %x, <4 x i64> zeroinitializer		%ret = select <4 x i1> %mask, <4 x i64> %x, <4 x i64> zeroinitializer
ret <4 x i64> %ret		ret <4 x i64> %ret
}		}

define <8 x i64> @zext_8x32mem_to_8x64(<8 x i32> *%i , <8 x i1> %mask) nounwind readnone {		define <8 x i64> @zext_8x32mem_to_8x64(<8 x i32> *%i , <8 x i1> %mask) nounwind readnone {
; KNL-LABEL: zext_8x32mem_to_8x64:		; KNL-LABEL: zext_8x32mem_to_8x64:
[687 lines not shown]

test/CodeGen/X86/vec_int_to_fp.ll

	[527 lines not shown]
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vcvtdq2pd %xmm0, %ymm0			; AVX1-NEXT: vcvtdq2pd %xmm0, %ymm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_2f64:			; AVX2-LABEL: uitofp_16i8_to_2f64:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: vcvtdq2pd %xmm0, %ymm0			; AVX2-NEXT: vcvtdq2pd %xmm0, %ymm0
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cvt = uitofp <16 x i8> %a to <16 x double>			%cvt = uitofp <16 x i8> %a to <16 x double>
	%shuf = shufflevector <16 x double> %cvt, <16 x double> undef, <2 x i32> <i32 0, i32 1>			%shuf = shufflevector <16 x double> %cvt, <16 x double> undef, <2 x i32> <i32 0, i32 1>
	ret <2 x double> %shuf			ret <2 x double> %shuf
	}			}

	[216 lines not shown]
	; AVX1-LABEL: uitofp_16i8_to_4f64:			; AVX1-LABEL: uitofp_16i8_to_4f64:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vcvtdq2pd %xmm0, %ymm0			; AVX1-NEXT: vcvtdq2pd %xmm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_4f64:			; AVX2-LABEL: uitofp_16i8_to_4f64:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: vcvtdq2pd %xmm0, %ymm0			; AVX2-NEXT: vcvtdq2pd %xmm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cvt = uitofp <16 x i8> %a to <16 x double>			%cvt = uitofp <16 x i8> %a to <16 x double>
	%shuf = shufflevector <16 x double> %cvt, <16 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%shuf = shufflevector <16 x double> %cvt, <16 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x double> %shuf			ret <4 x double> %shuf
	}			}

	;			;
	[643 lines not shown]
	; SSE-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pxor %xmm1, %xmm1
	; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm0			; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_16i8_to_4f32:			; AVX1-LABEL: uitofp_16i8_to_4f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_4f32:			; AVX2-LABEL: uitofp_16i8_to_4f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cvt = uitofp <16 x i8> %a to <16 x float>			%cvt = uitofp <16 x i8> %a to <16 x float>
	%shuf = shufflevector <16 x float> %cvt, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%shuf = shufflevector <16 x float> %cvt, <16 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	ret <4 x float> %shuf			ret <4 x float> %shuf
	}			}

	[295 lines not shown]
	; SSE-NEXT: cvtdq2ps %xmm2, %xmm2			; SSE-NEXT: cvtdq2ps %xmm2, %xmm2
	; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm1			; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_8i8_to_8f32:			; AVX1-LABEL: uitofp_8i8_to_8f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_8i8_to_8f32:			; AVX2-LABEL: uitofp_8i8_to_8f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%cvt = uitofp <8 x i8> %shuf to <8 x float>			%cvt = uitofp <8 x i8> %shuf to <8 x float>
	ret <8 x float> %cvt			ret <8 x float> %cvt
	}			}

	define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {			define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {
	; SSE-LABEL: uitofp_16i8_to_8f32:			; SSE-LABEL: uitofp_16i8_to_8f32:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pxor %xmm1, %xmm1
	; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: movdqa %xmm0, %xmm2			; SSE-NEXT: movdqa %xmm0, %xmm2
	; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]			; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
	; SSE-NEXT: cvtdq2ps %xmm2, %xmm2			; SSE-NEXT: cvtdq2ps %xmm2, %xmm2
	; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm1			; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_16i8_to_8f32:			; AVX1-LABEL: uitofp_16i8_to_8f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_8f32:			; AVX2-LABEL: uitofp_16i8_to_8f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%cvt = uitofp <16 x i8> %a to <16 x float>			%cvt = uitofp <16 x i8> %a to <16 x float>
	%shuf = shufflevector <16 x float> %cvt, <16 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%shuf = shufflevector <16 x float> %cvt, <16 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	ret <8 x float> %shuf			ret <8 x float> %shuf
	}			}

	;			;
	[47 lines not shown]

test/CodeGen/X86/vector-zext.ll

	[137 lines not shown]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_8i32:			; AVX1-LABEL: zext_16i8_to_8i32:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_8i32:			; AVX2-LABEL: zext_16i8_to_8i32:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_16i8_to_8i32:			; AVX512-LABEL: zext_16i8_to_8i32:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX512-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX512-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%C = zext <8 x i8> %B to <8 x i32>			%C = zext <8 x i8> %B to <8 x i32>
	ret <8 x i32> %C			ret <8 x i32> %C
	}			}

	define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {			define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {
	[49 lines not shown]
	; SSE41-NEXT: pmovzxbq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: psrld $16, %xmm0			; SSE41-NEXT: psrld $16, %xmm0
	; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_4i64:			; AVX1-LABEL: zext_16i8_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
				; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_4i64:			; AVX2-LABEL: zext_16i8_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero			; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_16i8_to_4i64:			; AVX512-LABEL: zext_16i8_to_4i64:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero			; AVX512-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
	; AVX512-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%C = zext <4 x i8> %B to <4 x i64>			%C = zext <4 x i8> %B to <4 x i64>
	ret <4 x i64> %C			ret <4 x i64> %C
	}			}

	define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {			define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {
	[127 lines not shown]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_8i16_to_4i64:			; AVX1-LABEL: zext_8i16_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
				; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_8i16_to_4i64:			; AVX2-LABEL: zext_8i16_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX2-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_8i16_to_4i64:			; AVX512-LABEL: zext_8i16_to_4i64:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX512-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX512-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX512-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%C = zext <4 x i16> %B to <4 x i64>			%C = zext <4 x i16> %B to <4 x i64>
	ret <4 x i64> %C			ret <4 x i64> %C
	}			}

	define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {			define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {
	[1,201 lines not shown]