This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
ClosedPublic

Authored by RKSimon on Feb 28 2016, 4:17 AM.


Details

Reviewers

spatel
delena
congh
igorb

Commits

rG61eb49e437d7: [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to…
rG91dd0a796cb3: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
rL263159: [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to…
rL262599: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG

Summary

Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

This won't solve the issues with PR25718 (load+zext 8xi8 to 8xi32) but is one of several things that needs to be done at the same time.

Igor/Elena - can you advise why the masks aren't being folded into the VPMOVZX instructions on skylake targets any more please?
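For readers less familiar with the nodes named in the summary, the following is a minimal standalone C++ sketch (not LLVM code) of why a single zero-extend-in-register node can stand in for the ANY_EXTEND + mask pattern. The type aliases, the function names and the `junk` parameter are illustrative assumptions; `junk` only stands in for the unspecified high bits an any-extend may leave behind.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<std::uint8_t, 16>;
using V8i16 = std::array<std::uint16_t, 8>;

// Model of ZERO_EXTEND_VECTOR_INREG v16i8 -> v8i16: only the low 8 input
// lanes are used, and the new high bits of every output lane are zero.
V8i16 zeroExtendVectorInReg(const V16i8 &src) {
  V8i16 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = src[i];
  return out;
}

// Model of ANY_EXTEND followed by a mask: the high byte after the any-extend
// is unspecified (filled from `junk` here), so an extra AND is needed to
// clear it.
V8i16 anyExtendThenMask(const V16i8 &src, std::uint8_t junk) {
  V8i16 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const auto anyExt = static_cast<std::uint16_t>((junk << 8) | src[i]);
    out[i] = static_cast<std::uint16_t>(anyExt & 0x00FF);
  }
  return out;
}

// Small self-check: both forms agree whatever the junk byte is.
void checkExtendForms() {
  V16i8 v{};
  v[0] = 0x80;
  v[7] = 0xFF;
  assert(zeroExtendVectorInReg(v) == anyExtendThenMask(v, 0xAB));
  assert(zeroExtendVectorInReg(v)[7] == 0x00FF);
}
```

Calling `checkExtendForms()` exercises both paths; the second form needs one more operation per vector, which is the redundancy the summary refers to.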

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 49314. Feb 28 2016, 4:17 AM

RKSimon retitled this revision to [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG.

RKSimon updated this object.

RKSimon added reviewers: igorb, delena, congh, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

delena added inline comments. Feb 28 2016, 6:10 AM

test/CodeGen/X86/avx512-ext.ll
116 ↗	(On Diff #49314)	Hi Simon, why do we need an additional instruction here? vpmovzxbw %xmm0, %ymm0 {%k1} {z} does the work.

RKSimon added inline comments. Feb 28 2016, 6:32 AM

test/CodeGen/X86/avx512-ext.ll
116 ↗	(On Diff #49314)	Hi - that was what I was asking you and Igor. I don't know much about how the masking lowering works in AVX512, but for some reason these VZEXT nodes (which in this case have come via VECTOR_SHUFFLE lowering) don't correctly combine with the masks. I'm tempted to avoid this issue by just not combining cases where ZERO_EXTEND is legal and extends the whole register, but it looks like that would just be hiding a bigger problem. I'll see if I can create a simplified repro if you wish.

I looked at the code. We combine "zext" with "select" in the .td file and get:
vpmovzxbw %xmm0, %ymm0 {%k1} {z}

Then I looked at the lowering of ZERO_EXTEND_VECTOR_INREG in VectorLegalizer::ExpandZERO_EXTEND_VECTOR_INREG().

It goes there, right?
At the end I see:

return DAG.getNode(ISD::BITCAST, DL, VT,
                   DAG.getVectorShuffle(SrcVT, DL, Zero, Src, ShuffleMask));

I assume that BITCAST does not allow combining "zext" with "select".
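To make the quoted return statement concrete, here is a hedged standalone C++ sketch (not the LLVM implementation) of a zero extension expressed as a two-input byte shuffle against a zero vector followed by a bitcast. The helper names are hypothetical, the mask layout assumes a little-endian target, and the `memcpy` bitcast gives the expected lanes only on a little-endian host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using V16i8 = std::array<std::uint8_t, 16>;
using V8i16 = std::array<std::uint16_t, 8>;

// Two-input shuffle: mask value m picks zero[m] when m < 16 and src[m - 16]
// otherwise, mirroring the operand order (Zero, Src) in the quoted call.
V16i8 shuffle(const V16i8 &zero, const V16i8 &src,
              const std::array<int, 16> &mask) {
  V16i8 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    const int m = mask[i];
    assert(m >= 0 && m < 32 && "mask index must address one of two inputs");
    out[i] = m < 16 ? zero[m] : src[m - 16];
  }
  return out;
}

// Bitcast: reinterpret the same 128 bits as eight 16-bit lanes.
V8i16 bitcast(const V16i8 &bytes) {
  static_assert(sizeof(V8i16) == sizeof(V16i8), "bitcast needs equal sizes");
  V8i16 out{};
  std::memcpy(out.data(), bytes.data(), sizeof(out));
  return out;
}

// Zero extension of the low 8 bytes as shuffle + bitcast (little-endian).
V8i16 expandZeroExtendInReg(const V16i8 &src) {
  const V16i8 zero{};
  std::array<int, 16> mask{};
  for (int i = 0; i < 16; ++i)
    mask[i] = i;               // default: take a byte from the zero vector
  for (int i = 0; i < 8; ++i)
    mask[2 * i] = 16 + i;      // low byte of lane i comes from src[i]
  return bitcast(shuffle(zero, src, mask));
}
```

The result is the same value a direct zero extension would produce, but it reaches the consumer as a bitcast of a shuffle rather than as an extend node, which is the shape the comment above suspects of blocking the "zext" + "select" pattern.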

But the SEXT code works perfectly on SKX, right?

define <16 x i16> @zext_16x8_to_16x16_mask(<16 x i8> %a, <16 x i1> %mask) nounwind readnone {
  %x   = sext <16 x i8> %a to <16 x i16>
  %ret = select <16 x i1> %mask, <16 x i16> %x, <16 x i16> zeroinitializer
  ret <16 x i16> %ret
}
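As a reading aid, the extend + select pair in the IR above can be modelled in plain C++ roughly as follows. This is only a sketch of the lane-wise semantics (extend, then keep or zero each lane according to the mask), which is what a zero-masked instruction such as the vpmovzxbw {%k1} {z} form mentioned earlier folds into one operation; the enum, the function names and the bitmask encoding of the `<16 x i1>` operand are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<std::uint8_t, 16>;
using V16i16 = std::array<std::uint16_t, 16>;

enum class Ext { Zero, Sign };

// Widen one byte to 16 bits, zero- or sign-filling the high byte.
std::uint16_t extendLane(std::uint8_t b, Ext kind) {
  if (kind == Ext::Sign && (b & 0x80))
    return static_cast<std::uint16_t>(0xFF00u | b);
  return b;
}

// select(mask, ext(a), zeroinitializer): lane i keeps the extended value
// only if bit i of the mask is set, otherwise it stays zero.
V16i16 maskedExtend(const V16i8 &a, std::uint16_t mask, Ext kind) {
  V16i16 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    if ((mask >> i) & 1u)
      out[i] = extendLane(a[i], kind);
  return out;
}

// Small self-check covering both extension kinds.
void checkMaskedExtend() {
  V16i8 a{};
  a[0] = 0x80;
  a[1] = 0x7F;
  const V16i16 s = maskedExtend(a, 0x0001, Ext::Sign);
  assert(s[0] == 0xFF80);
  assert(s[1] == 0x0000);  // masked off
  const V16i16 z = maskedExtend(a, 0x0003, Ext::Zero);
  assert(z[0] == 0x0080 && z[1] == 0x007F);
}
```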

In D17691#363782, @delena wrote:

I assume that BITCAST does not allow combining "zext" with "select".

Yes, I've just done a repro using vector shuffle - the bitcast seems to be the problem (and is why we don't see it with the VSEXT case). I can raise a bugzilla describing this if you want me to.

It looks like the best option for now is to only use this combine for certain non-legal ZERO_EXTEND cases - I'll update it shortly.

I wonder whether X86ISD::VZEXT / X86ISD::VSEXT are that useful or whether we should try to implement the PMOVZX/PMOVSX instructions with a mixture of ZERO/SIGN_EXTEND and ZERO/SIGN_EXTEND_VECTOR_INREG - I've looked at this in the past but haven't pursued it recently, IIRC there were problems with memory folding.

Tightened the restrictions on the *_EXTEND to *_EXTEND_VECTOR_INREG combine so that already legal transforms (AVX2+) don't get unnecessarily combined.
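The resulting set of cases can be summarised with a small standalone C++ sketch. It mirrors the order of the checks in combineToExtendVectorInReg as shown in the diff further down, but it is a simplified model: the `ExtendQuery` struct, the `Strategy` enum and `classify` are hypothetical names, and the type-legality query is reduced to a single boolean.

```cpp
#include <cassert>

enum class Strategy {
  None,                  // leave the SIGN_EXTEND/ZERO_EXTEND node alone
  WidenThenExtract,      // result < 128 bits: extend a wider vector, extract
  ExtendVectorInReg,     // single *_EXTEND_VECTOR_INREG node
  SplitInto128BitPieces  // pre-AVX2: one in-reg extend per 128-bit chunk
};

struct ExtendQuery {
  unsigned vtBits;         // total size in bits of the result vector type
  unsigned outScalarBits;  // element size of the result
  unsigned inScalarBits;   // element size of the input
  bool hasSSE2;
  bool hasInt256;          // AVX2 or later
  bool bothTypesLegal;     // input and result vector types are both legal
};

Strategy classify(const ExtendQuery &q) {
  assert(q.vtBits != 0 && "vector types have a non-zero size");
  if (!q.hasSSE2)
    return Strategy::None;
  const bool outOk = q.outScalarBits == 64 || q.outScalarBits == 32 ||
                     q.outScalarBits == 16;
  const bool inOk = q.inScalarBits == 32 || q.inScalarBits == 16 ||
                    q.inScalarBits == 8;
  if (!outOk || !inOk)
    return Strategy::None;
  // The tightened restriction: on AVX2+ a fully legal extend is left as is.
  if (q.hasInt256 && q.bothTypesLegal)
    return Strategy::None;
  if (q.vtBits < 128 && 128 % q.vtBits == 0)
    return Strategy::WidenThenExtract;
  if (q.vtBits == 128 || (q.vtBits == 256 && q.hasInt256))
    return Strategy::ExtendVectorInReg;
  if (!q.hasInt256 && q.vtBits % 128 == 0)
    return Strategy::SplitInto128BitPieces;
  return Strategy::None;
}
```

For example, under this model a v16i8 to v16i16 extend on an AVX2 target with both types legal is left alone, while a v8i8 to v8i32 extend on an AVX1 target is split into two 128-bit pieces.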

Created PR26762 to investigate the issue with bitcasts preventing masked instruction combining

Looks better. I suggest running llc with --debug for zext and sext on SKX and comparing the two.
But in general, the patch looks good.

In D17691#363842, @delena wrote:

Looks better. I suggest running llc with --debug for zext and sext on SKX and comparing the two.
But in general, the patch looks good.

No problem - I'll put my findings on PR26762.

OK to commit?

LGTM

This revision is now accepted and ready to land. Mar 2 2016, 10:42 AM

Closed by commit rL262599: [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG (authored by RKSimon). Mar 3 2016, 1:48 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    19 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp          33 lines
llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll           22 lines
llvm/trunk/test/CodeGen/X86/vector-zext.ll             29 lines

Diff 49722

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[...] private:
SDValue visitSELECT_CC(SDNode *N);		SDValue visitSELECT_CC(SDNode *N);
SDValue visitSETCC(SDNode *N);		SDValue visitSETCC(SDNode *N);
SDValue visitSETCCE(SDNode *N);		SDValue visitSETCCE(SDNode *N);
SDValue visitSIGN_EXTEND(SDNode *N);		SDValue visitSIGN_EXTEND(SDNode *N);
SDValue visitZERO_EXTEND(SDNode *N);		SDValue visitZERO_EXTEND(SDNode *N);
SDValue visitANY_EXTEND(SDNode *N);		SDValue visitANY_EXTEND(SDNode *N);
SDValue visitSIGN_EXTEND_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_INREG(SDNode *N);
SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);
		SDValue visitZERO_EXTEND_VECTOR_INREG(SDNode *N);
SDValue visitTRUNCATE(SDNode *N);		SDValue visitTRUNCATE(SDNode *N);
SDValue visitBITCAST(SDNode *N);		SDValue visitBITCAST(SDNode *N);
SDValue visitBUILD_PAIR(SDNode *N);		SDValue visitBUILD_PAIR(SDNode *N);
SDValue visitFADD(SDNode *N);		SDValue visitFADD(SDNode *N);
SDValue visitFSUB(SDNode *N);		SDValue visitFSUB(SDNode *N);
SDValue visitFMUL(SDNode *N);		SDValue visitFMUL(SDNode *N);
SDValue visitFMA(SDNode *N);		SDValue visitFMA(SDNode *N);
SDValue visitFDIV(SDNode *N);		SDValue visitFDIV(SDNode *N);
[...] SDValue DAGCombiner::visit(SDNode *N) {
case ISD::SELECT_CC: return visitSELECT_CC(N);		case ISD::SELECT_CC: return visitSELECT_CC(N);
case ISD::SETCC: return visitSETCC(N);		case ISD::SETCC: return visitSETCC(N);
case ISD::SETCCE: return visitSETCCE(N);		case ISD::SETCCE: return visitSETCCE(N);
case ISD::SIGN_EXTEND: return visitSIGN_EXTEND(N);		case ISD::SIGN_EXTEND: return visitSIGN_EXTEND(N);
case ISD::ZERO_EXTEND: return visitZERO_EXTEND(N);		case ISD::ZERO_EXTEND: return visitZERO_EXTEND(N);
case ISD::ANY_EXTEND: return visitANY_EXTEND(N);		case ISD::ANY_EXTEND: return visitANY_EXTEND(N);
case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);		case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);
case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);		case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);
		case ISD::ZERO_EXTEND_VECTOR_INREG: return visitZERO_EXTEND_VECTOR_INREG(N);
case ISD::TRUNCATE: return visitTRUNCATE(N);		case ISD::TRUNCATE: return visitTRUNCATE(N);
case ISD::BITCAST: return visitBITCAST(N);		case ISD::BITCAST: return visitBITCAST(N);
case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);		case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
case ISD::FADD: return visitFADD(N);		case ISD::FADD: return visitFADD(N);
case ISD::FSUB: return visitFSUB(N);		case ISD::FSUB: return visitFSUB(N);
case ISD::FMUL: return visitFMUL(N);		case ISD::FMUL: return visitFMUL(N);
case ISD::FMA: return visitFMA(N);		case ISD::FMA: return visitFMA(N);
case ISD::FDIV: return visitFDIV(N);		case ISD::FDIV: return visitFDIV(N);
[...]
static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,		static SDNode *tryToFoldExtendOfConstant(SDNode *N, const TargetLowering &TLI,
SelectionDAG &DAG, bool LegalTypes,		SelectionDAG &DAG, bool LegalTypes,
bool LegalOperations) {		bool LegalOperations) {
unsigned Opcode = N->getOpcode();		unsigned Opcode = N->getOpcode();
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

assert((Opcode == ISD::SIGN_EXTEND || Opcode == ISD::ZERO_EXTEND ||		assert((Opcode == ISD::SIGN_EXTEND || Opcode == ISD::ZERO_EXTEND ||
Opcode == ISD::ANY_EXTEND || Opcode == ISD::SIGN_EXTEND_VECTOR_INREG)		Opcode == ISD::ANY_EXTEND || Opcode == ISD::SIGN_EXTEND_VECTOR_INREG ||
		Opcode == ISD::ZERO_EXTEND_VECTOR_INREG)
&& "Expected EXTEND dag node in input!");		&& "Expected EXTEND dag node in input!");

// fold (sext c1) -> c1		// fold (sext c1) -> c1
// fold (zext c1) -> c1		// fold (zext c1) -> c1
// fold (aext c1) -> c1		// fold (aext c1) -> c1
if (isa<ConstantSDNode>(N0))		if (isa<ConstantSDNode>(N0))
return DAG.getNode(Opcode, SDLoc(N), VT, N0).getNode();		return DAG.getNode(Opcode, SDLoc(N), VT, N0).getNode();

[...] SDValue DAGCombiner::visitSIGN_EXTEND_VECTOR_INREG(SDNode *N) {

if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
LegalOperations))		LegalOperations))
return SDValue(Res, 0);		return SDValue(Res, 0);

return SDValue();		return SDValue();
}		}

		SDValue DAGCombiner::visitZERO_EXTEND_VECTOR_INREG(SDNode *N) {
		SDValue N0 = N->getOperand(0);
		EVT VT = N->getValueType(0);

		if (N0.getOpcode() == ISD::UNDEF)
		return DAG.getUNDEF(VT);

		if (SDNode *Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes,
		LegalOperations))
		return SDValue(Res, 0);

		return SDValue();
		}

SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {		SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
bool isLE = DAG.getDataLayout().isLittleEndian();		bool isLE = DAG.getDataLayout().isLittleEndian();

// noop truncate		// noop truncate
if (N0.getValueType() == N->getValueType(0))		if (N0.getValueType() == N->getValueType(0))
return N0;		return N0;
[...]

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp


[...] static SDValue getDivRem8(SDNode *N, SelectionDAG &DAG) {
auto DivRemOpcode = OpcodeN0 == ISD::SDIVREM ? X86ISD::SDIVREM8_SEXT_HREG		auto DivRemOpcode = OpcodeN0 == ISD::SDIVREM ? X86ISD::SDIVREM8_SEXT_HREG
: X86ISD::UDIVREM8_ZEXT_HREG;		: X86ISD::UDIVREM8_ZEXT_HREG;
SDValue R = DAG.getNode(DivRemOpcode, SDLoc(N), NodeTys, N0.getOperand(0),		SDValue R = DAG.getNode(DivRemOpcode, SDLoc(N), NodeTys, N0.getOperand(0),
N0.getOperand(1));		N0.getOperand(1));
DAG.ReplaceAllUsesOfValueWith(N0.getValue(0), R.getValue(0));		DAG.ReplaceAllUsesOfValueWith(N0.getValue(0), R.getValue(0));
return R.getValue(1);		return R.getValue(1);
}		}

/// Convert a SEXT of a vector to a SIGN_EXTEND_VECTOR_INREG, this requires		/// Convert a SEXT or ZEXT of a vector to a SIGN_EXTEND_VECTOR_INREG or
/// the splitting (or concatenating with UNDEFs) of the input to vectors of the		/// ZERO_EXTEND_VECTOR_INREG, this requires the splitting (or concatenating
/// same size as the target type which then extends the lowest elements.		/// with UNDEFs) of the input to vectors of the same size as the target type
		/// which then extends the lowest elements.
static SDValue combineToExtendVectorInReg(SDNode *N, SelectionDAG &DAG,		static SDValue combineToExtendVectorInReg(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
if (N->getOpcode() != ISD::SIGN_EXTEND)		unsigned Opcode = N->getOpcode();
		if (Opcode != ISD::SIGN_EXTEND && Opcode != ISD::ZERO_EXTEND)
return SDValue();		return SDValue();
if (!DCI.isBeforeLegalizeOps())		if (!DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();
if (!Subtarget.hasSSE2())		if (!Subtarget.hasSSE2())
return SDValue();		return SDValue();

SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT SVT = VT.getScalarType();		EVT SVT = VT.getScalarType();
EVT InVT = N0.getValueType();		EVT InVT = N0.getValueType();
EVT InSVT = InVT.getScalarType();		EVT InSVT = InVT.getScalarType();

// Input type must be a vector and we must be extending legal integer types.		// Input type must be a vector and we must be extending legal integer types.
if (!VT.isVector())		if (!VT.isVector())
return SDValue();		return SDValue();
if (SVT != MVT::i64 && SVT != MVT::i32 && SVT != MVT::i16)		if (SVT != MVT::i64 && SVT != MVT::i32 && SVT != MVT::i16)
return SDValue();		return SDValue();
if (InSVT != MVT::i32 && InSVT != MVT::i16 && InSVT != MVT::i8)		if (InSVT != MVT::i32 && InSVT != MVT::i16 && InSVT != MVT::i8)
return SDValue();		return SDValue();

		// On AVX2+ targets, if the input/output types are both legal then we will be
		// able to use SIGN_EXTEND/ZERO_EXTEND directly.
		if (Subtarget.hasInt256() && DAG.getTargetLoweringInfo().isTypeLegal(VT) &&
		DAG.getTargetLoweringInfo().isTypeLegal(InVT))
		return SDValue();

SDLoc DL(N);		SDLoc DL(N);

auto ExtendVecSize = [&DAG](SDLoc DL, SDValue N, unsigned Size) {		auto ExtendVecSize = [&DAG](SDLoc DL, SDValue N, unsigned Size) {
EVT InVT = N.getValueType();		EVT InVT = N.getValueType();
EVT OutVT = EVT::getVectorVT(*DAG.getContext(), InVT.getScalarType(),		EVT OutVT = EVT::getVectorVT(*DAG.getContext(), InVT.getScalarType(),
Size / InVT.getScalarSizeInBits());		Size / InVT.getScalarSizeInBits());
SmallVector<SDValue, 8> Opnds(Size / InVT.getSizeInBits(),		SmallVector<SDValue, 8> Opnds(Size / InVT.getSizeInBits(),
DAG.getUNDEF(InVT));		DAG.getUNDEF(InVT));
Opnds[0] = N;		Opnds[0] = N;
return DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Opnds);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, OutVT, Opnds);
};		};

// If target-size is less than 128-bits, extend to a type that would extend		// If target-size is less than 128-bits, extend to a type that would extend
// to 128 bits, extend that and extract the original target vector.		// to 128 bits, extend that and extract the original target vector.
if (VT.getSizeInBits() < 128 && !(128 % VT.getSizeInBits())) {		if (VT.getSizeInBits() < 128 && !(128 % VT.getSizeInBits())) {
unsigned Scale = 128 / VT.getSizeInBits();		unsigned Scale = 128 / VT.getSizeInBits();
EVT ExVT =		EVT ExVT =
EVT::getVectorVT(*DAG.getContext(), SVT, 128 / SVT.getSizeInBits());		EVT::getVectorVT(*DAG.getContext(), SVT, 128 / SVT.getSizeInBits());
SDValue Ex = ExtendVecSize(DL, N0, Scale * InVT.getSizeInBits());		SDValue Ex = ExtendVecSize(DL, N0, Scale * InVT.getSizeInBits());
SDValue SExt = DAG.getNode(ISD::SIGN_EXTEND, DL, ExVT, Ex);		SDValue SExt = DAG.getNode(Opcode, DL, ExVT, Ex);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SExt,		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, SExt,
DAG.getIntPtrConstant(0, DL));		DAG.getIntPtrConstant(0, DL));
}		}

// If target-size is 128-bits (or 256-bits on AVX2 target), then convert to		// If target-size is 128-bits (or 256-bits on AVX2 target), then convert to
// ISD::SIGN_EXTEND_VECTOR_INREG which ensures lowering to X86ISD::VSEXT.		// ISD::*_EXTEND_VECTOR_INREG which ensures lowering to X86ISD::V*EXT.
if (VT.is128BitVector() || (VT.is256BitVector() && Subtarget.hasInt256())) {		if (VT.is128BitVector() || (VT.is256BitVector() && Subtarget.hasInt256())) {
SDValue ExOp = ExtendVecSize(DL, N0, VT.getSizeInBits());		SDValue ExOp = ExtendVecSize(DL, N0, VT.getSizeInBits());
return DAG.getSignExtendVectorInReg(ExOp, DL, VT);		return Opcode == ISD::SIGN_EXTEND
		? DAG.getSignExtendVectorInReg(ExOp, DL, VT)
		: DAG.getZeroExtendVectorInReg(ExOp, DL, VT);
}		}

// On pre-AVX2 targets, split into 128-bit nodes of		// On pre-AVX2 targets, split into 128-bit nodes of
// ISD::SIGN_EXTEND_VECTOR_INREG.		// ISD::*_EXTEND_VECTOR_INREG.
if (!Subtarget.hasInt256() && !(VT.getSizeInBits() % 128)) {		if (!Subtarget.hasInt256() && !(VT.getSizeInBits() % 128)) {
unsigned NumVecs = VT.getSizeInBits() / 128;		unsigned NumVecs = VT.getSizeInBits() / 128;
unsigned NumSubElts = 128 / SVT.getSizeInBits();		unsigned NumSubElts = 128 / SVT.getSizeInBits();
EVT SubVT = EVT::getVectorVT(*DAG.getContext(), SVT, NumSubElts);		EVT SubVT = EVT::getVectorVT(*DAG.getContext(), SVT, NumSubElts);
EVT InSubVT = EVT::getVectorVT(*DAG.getContext(), InSVT, NumSubElts);		EVT InSubVT = EVT::getVectorVT(*DAG.getContext(), InSVT, NumSubElts);

SmallVector<SDValue, 8> Opnds;		SmallVector<SDValue, 8> Opnds;
for (unsigned i = 0, Offset = 0; i != NumVecs; ++i, Offset += NumSubElts) {		for (unsigned i = 0, Offset = 0; i != NumVecs; ++i, Offset += NumSubElts) {
SDValue SrcVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InSubVT, N0,		SDValue SrcVec = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InSubVT, N0,
DAG.getIntPtrConstant(Offset, DL));		DAG.getIntPtrConstant(Offset, DL));
SrcVec = ExtendVecSize(DL, SrcVec, 128);		SrcVec = ExtendVecSize(DL, SrcVec, 128);
SrcVec = DAG.getSignExtendVectorInReg(SrcVec, DL, SubVT);		SrcVec = Opcode == ISD::SIGN_EXTEND
		? DAG.getSignExtendVectorInReg(SrcVec, DL, SubVT)
		: DAG.getZeroExtendVectorInReg(SrcVec, DL, SubVT);
Opnds.push_back(SrcVec);		Opnds.push_back(SrcVec);
}		}
return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);		return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Opnds);
}		}

return SDValue();		return SDValue();
}		}
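The pre-AVX2 splitting loop at the end of the function above can be pictured with the following standalone C++ sketch. It models one concrete case only (i8 elements zero-extended to i32, so each 128-bit piece holds four results) and is not LLVM code; `zeroExtendLow4`, `splitZeroExtend` and the 0xCC filler standing in for UNDEF lanes are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using V16i8 = std::array<std::uint8_t, 16>;
using V4i32 = std::array<std::uint32_t, 4>;

// Model of ZERO_EXTEND_VECTOR_INREG v16i8 -> v4i32: only the low four bytes
// of the 128-bit input are read; the remaining lanes are ignored.
V4i32 zeroExtendLow4(const V16i8 &reg) {
  V4i32 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = reg[i];
  return out;
}

// Split a wide i8 -> i32 zero extension into 128-bit pieces: extract
// NumSubElts inputs at each offset, pad them to a full 128-bit register,
// extend in-register, then concatenate the pieces.
std::vector<std::uint32_t> splitZeroExtend(const std::vector<std::uint8_t> &in) {
  constexpr std::size_t NumSubElts = 4;  // 128 bits / 32-bit result elements
  assert(in.size() % NumSubElts == 0 && "result must be a multiple of 128 bits");
  std::vector<std::uint32_t> out;
  out.reserve(in.size());
  for (std::size_t offset = 0; offset != in.size(); offset += NumSubElts) {
    V16i8 reg;
    reg.fill(0xCC);  // arbitrary filler standing in for the UNDEF padding
    for (std::size_t j = 0; j < NumSubElts; ++j)
      reg[j] = in[offset + j];
    const V4i32 piece = zeroExtendLow4(reg);
    out.insert(out.end(), piece.begin(), piece.end());
  }
  return out;
}
```

With eight input bytes this produces two 128-bit pieces, which corresponds to the two vpmovzxbd instructions joined by vinsertf128 in the AVX1 test output below.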

[...] if (N0.getOpcode() == ISD::TRUNCATE &&
if (N00.getOpcode() == X86ISD::SETCC_CARRY) {		if (N00.getOpcode() == X86ISD::SETCC_CARRY) {
return DAG.getNode(ISD::AND, dl, VT,		return DAG.getNode(ISD::AND, dl, VT,
DAG.getNode(X86ISD::SETCC_CARRY, dl, VT,		DAG.getNode(X86ISD::SETCC_CARRY, dl, VT,
N00.getOperand(0), N00.getOperand(1)),		N00.getOperand(0), N00.getOperand(1)),
DAG.getConstant(1, dl, VT));		DAG.getConstant(1, dl, VT));
}		}
}		}

		if (SDValue V = combineToExtendVectorInReg(N, DAG, DCI, Subtarget))
		return V;

if (VT.is256BitVector())		if (VT.is256BitVector())
if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))		if (SDValue R = WidenMaskArithmetic(N, DAG, DCI, Subtarget))
return R;		return R;

if (SDValue DivRem8 = getDivRem8(N, DAG))		if (SDValue DivRem8 = getDivRem8(N, DAG))
return DivRem8;		return DivRem8;

return SDValue();		return SDValue();
[...]

llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll

	[...]
	; SSE-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pxor %xmm1, %xmm1
	; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]			; SSE-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm0			; SSE-NEXT: cvtdq2ps %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_16i8_to_4f32:			; AVX1-LABEL: uitofp_16i8_to_4f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_4f32:			; AVX2-LABEL: uitofp_16i8_to_4f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	[...]
	; SSE-NEXT: cvtdq2ps %xmm2, %xmm2			; SSE-NEXT: cvtdq2ps %xmm2, %xmm2
	; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm1			; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_8i8_to_8f32:			; AVX1-LABEL: uitofp_8i8_to_8f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_8i8_to_8f32:			; AVX2-LABEL: uitofp_8i8_to_8f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%shuf = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%cvt = uitofp <8 x i8> %shuf to <8 x float>			%cvt = uitofp <8 x i8> %shuf to <8 x float>
	ret <8 x float> %cvt			ret <8 x float> %cvt
	}			}

	define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {			define <8 x float> @uitofp_16i8_to_8f32(<16 x i8> %a) {
	; SSE-LABEL: uitofp_16i8_to_8f32:			; SSE-LABEL: uitofp_16i8_to_8f32:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: pxor %xmm1, %xmm1			; SSE-NEXT: pxor %xmm1, %xmm1
	; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: movdqa %xmm0, %xmm2			; SSE-NEXT: movdqa %xmm0, %xmm2
	; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]			; SSE-NEXT: punpcklwd {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1],xmm2[2],xmm1[2],xmm2[3],xmm1[3]
	; SSE-NEXT: cvtdq2ps %xmm2, %xmm2			; SSE-NEXT: cvtdq2ps %xmm2, %xmm2
	; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; SSE-NEXT: punpckhwd {{.*#+}} xmm0 = xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
	; SSE-NEXT: cvtdq2ps %xmm0, %xmm1			; SSE-NEXT: cvtdq2ps %xmm0, %xmm1
	; SSE-NEXT: movaps %xmm2, %xmm0			; SSE-NEXT: movaps %xmm2, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1-LABEL: uitofp_16i8_to_8f32:			; AVX1-LABEL: uitofp_16i8_to_8f32:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX1-NEXT: vcvtdq2ps %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: uitofp_16i8_to_8f32:			; AVX2-LABEL: uitofp_16i8_to_8f32:
	; AVX2: # BB#0:			; AVX2: # BB#0:
	; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero			; AVX2-NEXT: vpmovzxbw {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero,xmm0[8],zero,xmm0[9],zero,xmm0[10],zero,xmm0[11],zero,xmm0[12],zero,xmm0[13],zero,xmm0[14],zero,xmm0[15],zero
	; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX2-NEXT: vpmovzxwd {{.*#+}} ymm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
	; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0			; AVX2-NEXT: vcvtdq2ps %ymm0, %ymm0
	[...]

llvm/trunk/test/CodeGen/X86/vector-zext.ll

	[...]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; SSE41-NEXT: pmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_8i32:			; AVX1-LABEL: zext_16i8_to_8i32:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4],xmm0[4],xmm1[5],xmm0[5],xmm1[6],xmm0[6],xmm1[7],xmm0[7]			; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_8i32:			; AVX2-LABEL: zext_16i8_to_8i32:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX2-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_16i8_to_8i32:			; AVX512-LABEL: zext_16i8_to_8i32:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero			; AVX512-NEXT: vpmovzxbd {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero,xmm0[4],zero,zero,zero,xmm0[5],zero,zero,zero,xmm0[6],zero,zero,zero,xmm0[7],zero,zero,zero
	; AVX512-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%B = shufflevector <16 x i8> %A, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	%C = zext <8 x i8> %B to <8 x i32>			%C = zext <8 x i8> %B to <8 x i32>
	ret <8 x i32> %C			ret <8 x i32> %C
	}			}

	define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {			define <2 x i64> @zext_16i8_to_2i64(<16 x i8> %A) nounwind uwtable readnone ssp {
	[...]
	; SSE41-NEXT: pmovzxbq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: psrld $16, %xmm0			; SSE41-NEXT: psrld $16, %xmm0
	; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; SSE41-NEXT: pmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_16i8_to_4i64:			; AVX1-LABEL: zext_16i8_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
				; AVX1-NEXT: vpsrld $16, %xmm0, %xmm0
	; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero			; AVX1-NEXT: vpmovzxbq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_16i8_to_4i64:			; AVX2-LABEL: zext_16i8_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero			; AVX2-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_16i8_to_4i64:			; AVX512-LABEL: zext_16i8_to_4i64:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero			; AVX512-NEXT: vpmovzxbq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,zero,zero,zero,zero,xmm0[1],zero,zero,zero,zero,zero,zero,zero,xmm0[2],zero,zero,zero,zero,zero,zero,zero,xmm0[3],zero,zero,zero,zero,zero,zero,zero
	; AVX512-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%B = shufflevector <16 x i8> %A, <16 x i8> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%C = zext <4 x i8> %B to <4 x i64>			%C = zext <4 x i8> %B to <4 x i64>
	ret <4 x i64> %C			ret <4 x i64> %C
	}			}

	define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {			define <4 x i32> @zext_8i16_to_4i32(<8 x i16> %A) nounwind uwtable readnone ssp {
	[127 lines not shown]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm2 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]			; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; SSE41-NEXT: pmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; SSE41-NEXT: movdqa %xmm2, %xmm0			; SSE41-NEXT: movdqa %xmm2, %xmm0
	; SSE41-NEXT: retq			; SSE41-NEXT: retq
	;			;
	; AVX1-LABEL: zext_8i16_to_4i64:			; AVX1-LABEL: zext_8i16_to_4i64:
	; AVX1: # BB#0: # %entry			; AVX1: # BB#0: # %entry
	; AVX1-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero			; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm1 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
				; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
	; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX1-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,2,3,3]			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: zext_8i16_to_4i64:			; AVX2-LABEL: zext_8i16_to_4i64:
	; AVX2: # BB#0: # %entry			; AVX2: # BB#0: # %entry
	; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX2-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX2-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX2-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: zext_8i16_to_4i64:			; AVX512-LABEL: zext_8i16_to_4i64:
	; AVX512: # BB#0: # %entry			; AVX512: # BB#0: # %entry
	; AVX512-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero			; AVX512-NEXT: vpmovzxwq {{.*#+}} ymm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
	; AVX512-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX512-NEXT: vpblendw {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3],ymm0[4],ymm1[5,6,7],ymm0[8],ymm1[9,10,11],ymm0[12],ymm1[13,14,15]
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	entry:			entry:
	%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%B = shufflevector <8 x i16> %A, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%C = zext <4 x i16> %B to <4 x i64>			%C = zext <4 x i16> %B to <4 x i64>
	ret <4 x i64> %C			ret <4 x i64> %C
	}			}

	define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {			define <2 x i64> @zext_4i32_to_2i64(<4 x i32> %A) nounwind uwtable readnone ssp {
	[1,201 lines not shown]