This is an archive of the discontinued LLVM Phabricator instance.

Differential D34009

[Power9] Exploit vector integer extend instructions when indices aren't correct
Closed · Public

Authored by syzaara on Jun 7 2017, 12:49 PM.


Details

Reviewers

kbarton
nemanjai
sfertile
lei
jtony
inouehrs
stefanp
echristo
hfinkel

Commits

rG9a91a1811001: [Power9] Exploit vector integer extend instructions when indices aren't correct.
rL307169: [Power9] Exploit vector integer extend instructions when indices aren't correct.

Summary

This patch builds on the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes whose inputs are sign-extended vector extract elements, even when the indices used by the vector extracts are not the ones the instructions expect. We can still use the new hardware instructions by adding a shuffle that moves the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine.
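The "correct" indices are the ones each P9 sign-extend instruction reads: the least significant narrow element inside each wide result element. A minimal C++ sketch of that rule, assuming LLVM's element numbering (result element i overlaps source elements i*Ratio through i*Ratio + Ratio - 1, and the least significant of them is the first on little-endian and the last on big-endian), reproduces the index sets the patch hard-codes. `requiredIndices` is a hypothetical helper, not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: the source element indices (LLVM numbering) that a
// P9 vector sign-extend instruction (vextsb2w, vextsb2d, vextsh2w, vextsh2d,
// vextsw2d) reads from a 128-bit vector.
//   InBits  - scalar width of the source elements (8, 16 or 32)
//   OutBits - scalar width of the result elements (32 or 64)
std::vector<unsigned> requiredIndices(unsigned InBits, unsigned OutBits,
                                      bool IsLittleEndian) {
  assert(InBits < OutBits && OutBits % InBits == 0 && "must be a widening");
  unsigned Ratio = OutBits / InBits; // source elements per result element
  unsigned NumOut = 128 / OutBits;   // result elements per vector
  std::vector<unsigned> Indices;
  for (unsigned i = 0; i < NumOut; ++i) {
    // Result element i overlaps source elements [i*Ratio, i*Ratio + Ratio).
    // The instruction extends the least significant of them: the first one
    // on little-endian, the last one on big-endian.
    Indices.push_back(IsLittleEndian ? i * Ratio : i * Ratio + Ratio - 1);
  }
  return Indices;
}
```

For example, `requiredIndices(8, 32, /*IsLittleEndian=*/false)` yields 3, 7, 11, 15 and the little-endian call yields 0, 4, 8, 12, matching the byte-to-word entry of the patch's `TargetElems` table.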

Diff Detail

Repository: rL LLVM

Event Timeline

syzaara created this revision. Jun 7 2017, 12:49 PM

syzaara added reviewers: echristo, hfinkel. Jun 8 2017, 6:16 AM

syzaara removed a subscriber: echristo.

nemanjai added inline comments. Jun 12 2017, 3:50 AM

lib/Target/PowerPC/PPCISelLowering.cpp
Line 11226 (On Diff #101797): I'm afraid that having such a generically named function in such a large file may get confusing down the road. Perhaps it would be clearer to fold this logic into `combineBVOfVecExtend()` and just use `getScalarSizeInBits()`. Perhaps something like: `if (InputSize + OutputSize == 5) TgtElemArrayIdx = 0; // ...`. In any case, you don't want to use a valid array index for an invalid combination as you do here. For example, I don't really see anything in this patch that will prevent you from doing something weird for the `v16i8 -> v8i16` case (which there isn't an instruction for as far as I can tell). Also, please add that as a test case. I realize there isn't a pattern for this in the .td file, but I imagine you'll end up with some weird result in this case.
Line 11255 (On Diff #101797): I think it would be much cleaner to make this array local to `combineBVOfVecExtend` (you can then pass the target encoding to `addShuffleForVecExtend` rather than indexing into it again).
Line 11271 (On Diff #101797): Just initialize this at construction time and remove the loop.
Line 11288 (On Diff #101797): Do we really need this? I thought there were overloads of `SelectionDAG::getNode()` that take two `SDValue`s.
Line 11313 (On Diff #101797): I don't really see a reason for this to return an `int` rather than a `bool`. The reader might assume that the lambda returns values from a larger space if the return type is `int`.
Line 11335 (On Diff #101797): Just a nit: it'd probably be more concise and readable to just use the ternary operator here.
Line 11337 (On Diff #101797): I don't actually understand how this works. Won't the nibbles that correspond to the target elements for the opposite endianness always just be zero? And if that's the case, won't the comparison below always fail? (see below) Perhaps the unnecessary shuffle produced will just be optimized out, but it's probably better not to emit it to begin with.
Line 11352 (On Diff #101797): Shouldn't both operands of this comparison just have the nibbles corresponding to the opposite endianness masked out? i.e. both should be AND-ed with `0x0F0F0F0F0F0F0F0F` or `0xF0F0F0F0F0F0F0F0`.
Line 11355 (On Diff #101797): Just a comment along the lines of `// Regular lowering will catch cases where a shuffle is not needed.`
lib/Target/PowerPC/PPCISelLowering.h
Line 70 (On Diff #101797): Nit: line length.
lib/Target/PowerPC/PPCInstrVSX.td
Line 2721 (On Diff #101797): Weren't these added with the previous patch? Or maybe it was only the LE ones? If so, can you please rebase this patch so that it would apply cleanly?
Line 3043 (On Diff #101797): The versions with the new node are not endianness-specific, are they? If not, please move them out of the blocks and have a single copy of each.
test/CodeGen/PowerPC/vec_int_ext.ll
Line 4 (On Diff #101797): Please add a test that checks that there are no shuffles when the input elements are correct.
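The first comment on `PPCISelLowering.cpp` (line 11226) suggests choosing the `TargetElems` slot from the scalar sizes alone and never mapping an unsupported pair to a valid index. A minimal sketch of that idea, written outside LLVM with plain integers standing in for `EVT::getScalarSizeInBits()`, is below; `targetElemsIndex` is a hypothetical name, and the slot order simply follows the table in the committed diff.

```cpp
#include <cassert>

// Hypothetical sketch of the reviewer's suggestion: pick the slot of a
// TargetElems-style table from the scalar widths (in bits) alone.
// Slot order: b->w, b->d, h->w, h->d, w->d.
// Returns -1 when there is no P9 sign-extend instruction for the pair.
int targetElemsIndex(unsigned InputSize, unsigned OutputSize) {
  assert(InputSize > 0 && OutputSize > 0 && "widths must be positive");
  // The sum alone is ambiguous (8 + 32 == 32 + 8), so require a widening.
  // In the patch, matching ISD::SIGN_EXTEND already rules out narrowing.
  if (OutputSize <= InputSize)
    return -1;
  switch (InputSize + OutputSize) {
  case 40: return 0;  // byte -> word
  case 72: return 1;  // byte -> doubleword
  case 48: return 2;  // halfword -> word
  case 80: return 3;  // halfword -> doubleword
  case 96: return 4;  // word -> doubleword
  default: return -1; // e.g. v16i8 -> v8i16: 8 + 16 == 24
  }
}
```

Among widening pairs drawn from 8, 16, 32 and 64 bits, every sum is distinct, which is why a single switch on the sum is enough once narrowing is excluded.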

syzaara updated this revision to Diff 102253. Jun 12 2017, 3:41 PM

syzaara added inline comments. Jun 12 2017, 3:44 PM

test/CodeGen/PowerPC/vec_int_ext.ll
Line 4 (On Diff #101797): Each of these tests is already checking both correct and incorrect elements. The LE tests use correct elements for LE, so the BE pattern match should have the shuffle. The BE tests use correct elements for BE, so the LE pattern match should have the shuffle.

nemanjai added inline comments. Jun 12 2017, 9:51 PM

test/CodeGen/PowerPC/vec_int_ext.ll
Line 4 (On Diff #101797): Ah, I see. Not sure how I missed that. Sorry.

stefanp added inline comments. Jun 13 2017, 6:10 AM

lib/Target/PowerPC/PPCISelLowering.cpp
Line 11332 (On Diff #102253): I realize that you are trying to check for the byte -> word case with this if statement. However, is it possible to also catch word -> byte here? You may have thought of this already and filtered out unwanted cases earlier... I don't know. I just wanted to bring this to your attention in case it might be a problem.

syzaara added inline comments. Jun 13 2017, 12:49 PM

lib/Target/PowerPC/PPCISelLowering.cpp
Line 11332 (On Diff #102253): Actually, I think it's okay, since things like word -> byte would fail the `isSExtOfVecExtract` pattern matching, so we wouldn't reach this far.

ping

LGTM. My comments are only about, well, comments. So feel free to fix them on the commit.

lib/Target/PowerPC/PPCISelLowering.cpp
Line 11235 (On Diff #102253): Maybe just a quick note about the algorithm to make it easier to understand at first glance. Something like:
    // Knowing the element indices being extracted from the original
    // vector and the order in which they're being inserted, just put
    // them at element indices required for the instruction.
Line 11255 (On Diff #102253): Small nit: get rid of the use of `new` in the comment because these things won't be new for very long :).
Line 11258 (On Diff #102253): All of this can just go away. You're repeating it in a clean and concise way at the definition of the `TargetElems` array.
Line 11268 (On Diff #102253): It's not just any extend, right? Probably `combineBVOfVecSExt()` then.
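The algorithm described in the line 11235 comment (put each extracted element at the index the instruction reads) can be sketched on its own. This is a simplified model rather than the patch's code: it works on plain index arrays instead of the packed nibble encoding, and `buildFixupMask` is a hypothetical name. Lanes nothing maps to stay at -1, which an LLVM shuffle mask treats as undef.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: build a single-input shuffle mask that moves each
// extracted element to the lane the sign-extend instruction reads.
//   Extracted[i] - index the i-th BUILD_VECTOR operand was extracted from
//   Required[i]  - index the instruction reads for result element i
//   NumElems     - element count of the input vector
// Lanes that nothing maps to stay -1 (undef in an LLVM shuffle mask).
std::vector<int> buildFixupMask(const std::vector<unsigned> &Extracted,
                                const std::vector<unsigned> &Required,
                                unsigned NumElems) {
  assert(Extracted.size() == Required.size() && "one index per operand");
  std::vector<int> Mask(NumElems, -1);
  for (std::size_t i = 0; i < Required.size(); ++i) {
    assert(Required[i] < NumElems && Extracted[i] < NumElems);
    // Result lane Required[i] takes input element Extracted[i].
    Mask[Required[i]] = static_cast<int>(Extracted[i]);
  }
  return Mask;
}
```

With the operands of the `vextsb2wLE` test (extracts from 0, 4, 8, 12) compiled for big-endian, where `vextsb2w` reads 3, 7, 11, 15, the mask is `[-1, -1, -1, 0, -1, -1, -1, 4, -1, -1, -1, 8, -1, -1, -1, 12]`; the CHECK-BE lines of that test expect the resulting shuffle to appear as a `vperm` before `vextsb2w`.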

This revision is now accepted and ready to land. Jun 28 2017, 2:48 PM

Closed by commit rL307169: [Power9] Exploit vector integer extend instructions when indices aren't correct. (authored by jtony). Jul 5 2017, 9:01 AM

This revision was automatically updated to reflect the committed changes.

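Revision Contents

- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h (4 lines)
- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (136 lines)
- llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td (4 lines)
- llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td (98 lines)
- llvm/trunk/test/CodeGen/PowerPC/vec_int_ext.ll (251 lines)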

Diff 105278

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

[… 61 lines omitted …] enum NodeType : unsigned {
/// Newer FCTI[D,W]UZ floating-point-to-integer conversion instructions for		/// Newer FCTI[D,W]UZ floating-point-to-integer conversion instructions for
/// unsigned integers with round toward zero.		/// unsigned integers with round toward zero.
FCTIDUZ, FCTIWUZ,		FCTIDUZ, FCTIWUZ,

/// VEXTS, ByteWidth - takes an input in VSFRC and produces an output in		/// VEXTS, ByteWidth - takes an input in VSFRC and produces an output in
/// VSFRC that is sign-extended from ByteWidth to a 64-byte integer.		/// VSFRC that is sign-extended from ByteWidth to a 64-byte integer.
VEXTS,		VEXTS,

		/// SExtVElems, takes an input vector of a smaller type and sign
		/// extends to an output vector of a larger type.
		SExtVElems,

/// Reciprocal estimate instructions (unary FP ops).		/// Reciprocal estimate instructions (unary FP ops).
FRE, FRSQRTE,		FRE, FRSQRTE,

// VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking		// VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking
// three v4f32 operands and producing a v4f32 result.		// three v4f32 operands and producing a v4f32 result.
VMADDFP, VNMSUBFP,		VMADDFP, VNMSUBFP,

/// VPERM - The PPC VPERM Instruction.		/// VPERM - The PPC VPERM Instruction.
[… 1,020 lines omitted …]

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp


[… 1,162 lines omitted …] const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::VCMPo: return "PPCISD::VCMPo";		case PPCISD::VCMPo: return "PPCISD::VCMPo";
case PPCISD::LBRX: return "PPCISD::LBRX";		case PPCISD::LBRX: return "PPCISD::LBRX";
case PPCISD::STBRX: return "PPCISD::STBRX";		case PPCISD::STBRX: return "PPCISD::STBRX";
case PPCISD::LFIWAX: return "PPCISD::LFIWAX";		case PPCISD::LFIWAX: return "PPCISD::LFIWAX";
case PPCISD::LFIWZX: return "PPCISD::LFIWZX";		case PPCISD::LFIWZX: return "PPCISD::LFIWZX";
case PPCISD::LXSIZX: return "PPCISD::LXSIZX";		case PPCISD::LXSIZX: return "PPCISD::LXSIZX";
case PPCISD::STXSIX: return "PPCISD::STXSIX";		case PPCISD::STXSIX: return "PPCISD::STXSIX";
case PPCISD::VEXTS: return "PPCISD::VEXTS";		case PPCISD::VEXTS: return "PPCISD::VEXTS";
		case PPCISD::SExtVElems: return "PPCISD::SExtVElems";
case PPCISD::LXVD2X: return "PPCISD::LXVD2X";		case PPCISD::LXVD2X: return "PPCISD::LXVD2X";
case PPCISD::STXVD2X: return "PPCISD::STXVD2X";		case PPCISD::STXVD2X: return "PPCISD::STXVD2X";
case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";		case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";
case PPCISD::BDNZ: return "PPCISD::BDNZ";		case PPCISD::BDNZ: return "PPCISD::BDNZ";
case PPCISD::BDZ: return "PPCISD::BDZ";		case PPCISD::BDZ: return "PPCISD::BDZ";
case PPCISD::MFFS: return "PPCISD::MFFS";		case PPCISD::MFFS: return "PPCISD::MFFS";
case PPCISD::FADDRTZ: return "PPCISD::FADDRTZ";		case PPCISD::FADDRTZ: return "PPCISD::FADDRTZ";
case PPCISD::TC_RETURN: return "PPCISD::TC_RETURN";		case PPCISD::TC_RETURN: return "PPCISD::TC_RETURN";
[… 10,127 lines omitted …] for (int i = N->getNumOperands() - 1; i >= 0; i--)
Ops.push_back(i);		Ops.push_back(i);

return DAG.getVectorShuffle(N->getValueType(0), dl, Load,		return DAG.getVectorShuffle(N->getValueType(0), dl, Load,
DAG.getUNDEF(N->getValueType(0)), Ops);		DAG.getUNDEF(N->getValueType(0)), Ops);
}		}
return SDValue();		return SDValue();
}		}

		// This function adds the required vector_shuffle needed to get
		// the elements of the vector extract in the correct position
		// as specified by the CorrectElems encoding.
		static SDValue addShuffleForVecExtend(SDNode *N, SelectionDAG &DAG,
		SDValue Input, uint64_t Elems,
		uint64_t CorrectElems) {
		SDLoc dl(N);

		unsigned NumElems = Input.getValueType().getVectorNumElements();
		SmallVector<int, 16> ShuffleMask(NumElems, -1);

		// Knowing the element indices being extracted from the original
		// vector and the order in which they're being inserted, just put
		// them at element indices required for the instruction.
		for (unsigned i = 0; i < N->getNumOperands(); i++) {
		if (DAG.getDataLayout().isLittleEndian())
		ShuffleMask[CorrectElems & 0xF] = Elems & 0xF;
		else
		ShuffleMask[(CorrectElems & 0xF0) >> 4] = (Elems & 0xF0) >> 4;
		CorrectElems = CorrectElems >> 8;
		Elems = Elems >> 8;
		}

		SDValue Shuffle =
		DAG.getVectorShuffle(Input.getValueType(), dl, Input,
		DAG.getUNDEF(Input.getValueType()), ShuffleMask);

		EVT Ty = N->getValueType(0);
		SDValue BV = DAG.getNode(PPCISD::SExtVElems, dl, Ty, Shuffle);
		return BV;
		}

		// Look for build vector patterns where input operands come from sign
		// extended vector_extract elements of specific indices. If the correct indices
		// aren't used, add a vector shuffle to fix up the indices and create a new
		// PPCISD:SExtVElems node which selects the vector sign extend instructions
		// during instruction selection.
		static SDValue combineBVOfVecSExt(SDNode *N, SelectionDAG &DAG) {
		// This array encodes the indices that the vector sign extend instructions
		// extract from when extending from one type to another for both BE and LE.
		// The right nibble of each byte corresponds to the LE incides.
		// and the left nibble of each byte corresponds to the BE incides.
		// For example: 0x3074B8FC byte->word
		// For LE: the allowed indices are: 0x0,0x4,0x8,0xC
		// For BE: the allowed indices are: 0x3,0x7,0xB,0xF
		// For example: 0x000070F8 byte->double word
		// For LE: the allowed indices are: 0x0,0x8
		// For BE: the allowed indices are: 0x7,0xF
		uint64_t TargetElems[] = {
		0x3074B8FC, // b->w
		0x000070F8, // b->d
		0x10325476, // h->w
		0x00003074, // h->d
		0x00001032, // w->d
		};

		uint64_t Elems = 0;
		int Index;
		SDValue Input;

		auto isSExtOfVecExtract = [&](SDValue Op) -> bool {
		if (!Op)
		return false;
		if (Op.getOpcode() != ISD::SIGN_EXTEND)
		return false;

		SDValue Extract = Op.getOperand(0);
		if (Extract.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
		return false;

		ConstantSDNode *ExtOp = dyn_cast<ConstantSDNode>(Extract.getOperand(1));
		if (!ExtOp)
		return false;

		Index = ExtOp->getZExtValue();
		if (Input && Input != Extract.getOperand(0))
		return false;

		if (!Input)
		Input = Extract.getOperand(0);

		Elems = Elems << 8;
		Index = DAG.getDataLayout().isLittleEndian() ? Index : Index << 4;
		Elems |= Index;

		return true;
		};

		// If the build vector operands aren't sign extended vector extracts,
		// of the same input vector, then return.
		for (unsigned i = 0; i < N->getNumOperands(); i++) {
		if (!isSExtOfVecExtract(N->getOperand(i))) {
		return SDValue();
		}
		}

		// If the vector extract indicies are not correct, add the appropriate
		// vector_shuffle.
		int TgtElemArrayIdx;
		int InputSize = Input.getValueType().getScalarSizeInBits();
		int OutputSize = N->getValueType(0).getScalarSizeInBits();
		if (InputSize + OutputSize == 40)
		TgtElemArrayIdx = 0;
		else if (InputSize + OutputSize == 72)
		TgtElemArrayIdx = 1;
		else if (InputSize + OutputSize == 48)
		TgtElemArrayIdx = 2;
		else if (InputSize + OutputSize == 80)
		TgtElemArrayIdx = 3;
		else if (InputSize + OutputSize == 96)
		TgtElemArrayIdx = 4;
		else
		return SDValue();

		uint64_t CorrectElems = TargetElems[TgtElemArrayIdx];
		CorrectElems = DAG.getDataLayout().isLittleEndian()
		? CorrectElems & 0x0F0F0F0F0F0F0F0F
		: CorrectElems & 0xF0F0F0F0F0F0F0F0;
		if (Elems != CorrectElems) {
		return addShuffleForVecExtend(N, DAG, Input, Elems, CorrectElems);
		}

		// Regular lowering will catch cases where a shuffle is not needed.
		return SDValue();
		}

SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,		SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
assert(N->getOpcode() == ISD::BUILD_VECTOR &&		assert(N->getOpcode() == ISD::BUILD_VECTOR &&
"Should be called with a BUILD_VECTOR node");		"Should be called with a BUILD_VECTOR node");

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDLoc dl(N);		SDLoc dl(N);

[… 11 lines omitted …] SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
}		}

// If we're building a vector out of consecutive loads, just load that		// If we're building a vector out of consecutive loads, just load that
// vector type.		// vector type.
SDValue Reduced = combineBVOfConsecutiveLoads(N, DAG);		SDValue Reduced = combineBVOfConsecutiveLoads(N, DAG);
if (Reduced)		if (Reduced)
return Reduced;		return Reduced;

		// If we're building a vector out of extended elements from another vector
		// we have P9 vector integer extend instructions.
		if (Subtarget.hasP9Altivec()) {
		Reduced = combineBVOfVecSExt(N, DAG);
		if (Reduced)
		return Reduced;
		}


if (N->getValueType(0) != MVT::v2f64)		if (N->getValueType(0) != MVT::v2f64)
return SDValue();		return SDValue();

// Looking for:		// Looking for:
// (build_vector ([su]int_to_fp (extractelt 0)), [su]int_to_fp (extractelt 1))		// (build_vector ([su]int_to_fp (extractelt 0)), [su]int_to_fp (extractelt 1))
if (FirstInput.getOpcode() != ISD::SINT_TO_FP &&		if (FirstInput.getOpcode() != ISD::SINT_TO_FP &&
FirstInput.getOpcode() != ISD::UINT_TO_FP)		FirstInput.getOpcode() != ISD::UINT_TO_FP)
return SDValue();		return SDValue();
[… 1,972 lines omitted …]
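`combineBVOfVecSExt` packs one byte per BUILD_VECTOR operand into `Elems` (the extract index in the low nibble on LE, in the high nibble on BE) and compares it with the matching `TargetElems` entry after masking off the other endianness's nibbles. A standalone sketch of that encode-and-compare step follows; `packIndices` and `needsShuffle` are hypothetical names, and endianness is passed as a flag instead of being queried from `DataLayout`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the encoding used by combineBVOfVecSExt: each
// operand contributes one byte, the first operand ending up most significant.
// The extract index goes in the low nibble on LE and the high nibble on BE.
uint64_t packIndices(const std::vector<unsigned> &Indices, bool IsLE) {
  uint64_t Elems = 0;
  for (unsigned Index : Indices) {
    assert(Index < 16 && "index must fit in a nibble");
    Elems = (Elems << 8) | (IsLE ? Index : Index << 4);
  }
  return Elems;
}

// Returns true when the packed indices differ from the instruction's
// required indices, i.e. when a fix-up shuffle is needed.
bool needsShuffle(uint64_t Elems, uint64_t TargetEncoding, bool IsLE) {
  // Keep only the nibbles that belong to the current endianness.
  uint64_t Correct = IsLE ? TargetEncoding & 0x0F0F0F0F0F0F0F0FULL
                          : TargetEncoding & 0xF0F0F0F0F0F0F0F0ULL;
  return Elems != Correct;
}
```

For byte-to-word (`0x3074B8FC`), LE extracts 0, 4, 8, 12 pack to `0x0004080C`, which equals the LE-masked entry, so no shuffle is needed; the same indices packed for BE (`0x004080C0`) differ from the BE-masked `0x3070B0F0`, so a shuffle is added. Masking only the table entry is enough here because the packing already leaves the other endianness's nibbles zero.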

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td

[… 26 lines omitted …] def SDT_PPCLxsizx : SDTypeProfile<1, 2, [
SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
]>;		]>;
def SDT_PPCstxsix : SDTypeProfile<0, 3, [		def SDT_PPCstxsix : SDTypeProfile<0, 3, [
SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
]>;		]>;
def SDT_PPCVexts : SDTypeProfile<1, 2, [		def SDT_PPCVexts : SDTypeProfile<1, 2, [
SDTCisVT<0, f64>, SDTCisVT<1, f64>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisVT<1, f64>, SDTCisPtrTy<2>
]>;		]>;
		def SDT_PPCSExtVElems : SDTypeProfile<1, 1, [
		SDTCisVec<0>, SDTCisVec<1>
		]>;

def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32>,		def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32>,
SDTCisVT<1, i32> ]>;		SDTCisVT<1, i32> ]>;
def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,		def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,
SDTCisVT<1, i32> ]>;		SDTCisVT<1, i32> ]>;
def SDT_PPCvperm : SDTypeProfile<1, 3, [		def SDT_PPCvperm : SDTypeProfile<1, 3, [
SDTCisVT<3, v16i8>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>		SDTCisVT<3, v16i8>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>
]>;		]>;
[… 83 lines omitted …] def PPClfiwax : SDNode<"PPCISD::LFIWAX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPClfiwzx : SDNode<"PPCISD::LFIWZX", SDT_PPClfiwx,		def PPClfiwzx : SDNode<"PPCISD::LFIWZX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPClxsizx : SDNode<"PPCISD::LXSIZX", SDT_PPCLxsizx,		def PPClxsizx : SDNode<"PPCISD::LXSIZX", SDT_PPCLxsizx,
[SDNPHasChain, SDNPMayLoad]>;		[SDNPHasChain, SDNPMayLoad]>;
def PPCstxsix : SDNode<"PPCISD::STXSIX", SDT_PPCstxsix,		def PPCstxsix : SDNode<"PPCISD::STXSIX", SDT_PPCstxsix,
[SDNPHasChain, SDNPMayStore]>;		[SDNPHasChain, SDNPMayStore]>;
def PPCVexts : SDNode<"PPCISD::VEXTS", SDT_PPCVexts, []>;		def PPCVexts : SDNode<"PPCISD::VEXTS", SDT_PPCVexts, []>;
		def PPCSExtVElems : SDNode<"PPCISD::SExtVElems", SDT_PPCSExtVElems, []>;

// Extract FPSCR (not modeled at the DAG level).		// Extract FPSCR (not modeled at the DAG level).
def PPCmffs : SDNode<"PPCISD::MFFS",		def PPCmffs : SDNode<"PPCISD::MFFS",
SDTypeProfile<1, 0, [SDTCisVT<0, f64>]>, []>;		SDTypeProfile<1, 0, [SDTCisVT<0, f64>]>, []>;

// Perform FADD in round-to-zero mode.		// Perform FADD in round-to-zero mode.
def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;		def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;

[… 4,311 lines omitted …]

llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td

[… 2,723 lines omitted …]
def DblToFlt {		def DblToFlt {
dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));		dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));
dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));		dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));
dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));		dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));
dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));		dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));
}		}

def ByteToWord {		def ByteToWord {
dag A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 0)), i8));		dag LE_A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 0)), i8));
dag A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 4)), i8));		dag LE_A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 4)), i8));
dag A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 8)), i8));		dag LE_A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 8)), i8));
dag A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 12)), i8));		dag LE_A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 12)), i8));
		dag BE_A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 3)), i8));
		dag BE_A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 7)), i8));
		dag BE_A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 11)), i8));
		dag BE_A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 15)), i8));
}		}

def ByteToDWord {		def ByteToDWord {
dag A0 = (i64 (sext_inreg		dag LE_A0 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v16i8:$A, 0)))), i8));		(i64 (anyext (i32 (vector_extract v16i8:$A, 0)))), i8));
dag A1 = (i64 (sext_inreg		dag LE_A1 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v16i8:$A, 8)))), i8));		(i64 (anyext (i32 (vector_extract v16i8:$A, 8)))), i8));
		dag BE_A0 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v16i8:$A, 7)))), i8));
		dag BE_A1 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v16i8:$A, 15)))), i8));
}		}

def HWordToWord {		def HWordToWord {
dag A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 0)), i16));		dag LE_A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 0)), i16));
dag A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 2)), i16));		dag LE_A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 2)), i16));
dag A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 4)), i16));		dag LE_A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 4)), i16));
dag A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 6)), i16));		dag LE_A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 6)), i16));
		dag BE_A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 1)), i16));
		dag BE_A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 3)), i16));
		dag BE_A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 5)), i16));
		dag BE_A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 7)), i16));
}		}

def HWordToDWord {		def HWordToDWord {
dag A0 = (i64 (sext_inreg		dag LE_A0 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v8i16:$A, 0)))), i16));		(i64 (anyext (i32 (vector_extract v8i16:$A, 0)))), i16));
dag A1 = (i64 (sext_inreg		dag LE_A1 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v8i16:$A, 4)))), i16));		(i64 (anyext (i32 (vector_extract v8i16:$A, 4)))), i16));
		dag BE_A0 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v8i16:$A, 3)))), i16));
		dag BE_A1 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v8i16:$A, 7)))), i16));
}		}

def WordToDWord {		def WordToDWord {
dag A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 0))));		dag LE_A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 0))));
dag A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 2))));		dag LE_A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 2))));
		dag BE_A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 1))));
		dag BE_A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 3))));
}		}

def FltToIntLoad {		def FltToIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));
}		}
def FltToUIntLoad {		def FltToUIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));
}		}
[… 241 lines omitted …] def : Pat<(v2i64 (build_vector i64:$rA, i64:$rB)),
(v2i64 (MTVSRDD $rB, $rA))>;		(v2i64 (MTVSRDD $rB, $rA))>;
def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),		(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),
(COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;		(COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;
}		}
// P9 Altivec instructions that can be used to build vectors.		// P9 Altivec instructions that can be used to build vectors.
// Adding them to PPCInstrVSX.td rather than PPCAltivecVSX.td to compete		// Adding them to PPCInstrVSX.td rather than PPCAltivecVSX.td to compete
// with complexities of existing build vector patterns in this file.		// with complexities of existing build vector patterns in this file.
let Predicates = [HasP9Altivec] in {		let Predicates = [HasP9Altivec, IsLittleEndian] in {
def : Pat<(v2i64 (build_vector WordToDWord.A0, WordToDWord.A1)),		def : Pat<(v2i64 (build_vector WordToDWord.LE_A0, WordToDWord.LE_A1)),
		(v2i64 (VEXTSW2D $A))>;
		def : Pat<(v2i64 (build_vector HWordToDWord.LE_A0, HWordToDWord.LE_A1)),
		(v2i64 (VEXTSH2D $A))>;
		def : Pat<(v4i32 (build_vector HWordToWord.LE_A0, HWordToWord.LE_A1,
		HWordToWord.LE_A2, HWordToWord.LE_A3)),
		(v4i32 (VEXTSH2W $A))>;
		def : Pat<(v4i32 (build_vector ByteToWord.LE_A0, ByteToWord.LE_A1,
		ByteToWord.LE_A2, ByteToWord.LE_A3)),
		(v4i32 (VEXTSB2W $A))>;
		def : Pat<(v2i64 (build_vector ByteToDWord.LE_A0, ByteToDWord.LE_A1)),
		(v2i64 (VEXTSB2D $A))>;
		}

		let Predicates = [HasP9Altivec, IsBigEndian] in {
		def : Pat<(v2i64 (build_vector WordToDWord.BE_A0, WordToDWord.BE_A1)),
(v2i64 (VEXTSW2D $A))>;		(v2i64 (VEXTSW2D $A))>;
def : Pat<(v2i64 (build_vector HWordToDWord.A0, HWordToDWord.A1)),		def : Pat<(v2i64 (build_vector HWordToDWord.BE_A0, HWordToDWord.BE_A1)),
(v2i64 (VEXTSH2D $A))>;		(v2i64 (VEXTSH2D $A))>;
def : Pat<(v4i32 (build_vector HWordToWord.A0, HWordToWord.A1,		def : Pat<(v4i32 (build_vector HWordToWord.BE_A0, HWordToWord.BE_A1,
HWordToWord.A2, HWordToWord.A3)),		HWordToWord.BE_A2, HWordToWord.BE_A3)),
(v4i32 (VEXTSH2W $A))>;		(v4i32 (VEXTSH2W $A))>;
def : Pat<(v4i32 (build_vector ByteToWord.A0, ByteToWord.A1,		def : Pat<(v4i32 (build_vector ByteToWord.BE_A0, ByteToWord.BE_A1,
ByteToWord.A2, ByteToWord.A3)),		ByteToWord.BE_A2, ByteToWord.BE_A3)),
(v4i32 (VEXTSB2W $A))>;		(v4i32 (VEXTSB2W $A))>;
def : Pat<(v2i64 (build_vector ByteToDWord.A0, ByteToDWord.A1)),		def : Pat<(v2i64 (build_vector ByteToDWord.BE_A0, ByteToDWord.BE_A1)),
(v2i64 (VEXTSB2D $A))>;		(v2i64 (VEXTSB2D $A))>;
}		}

		let Predicates = [HasP9Altivec] in {
		def: Pat<(v2i64 (PPCSExtVElems v16i8:$A)),
		(v2i64 (VEXTSB2D $A))>;
		def: Pat<(v2i64 (PPCSExtVElems v8i16:$A)),
		(v2i64 (VEXTSH2D $A))>;
		def: Pat<(v2i64 (PPCSExtVElems v4i32:$A)),
		(v2i64 (VEXTSW2D $A))>;
		def: Pat<(v4i32 (PPCSExtVElems v16i8:$A)),
		(v4i32 (VEXTSB2W $A))>;
		def: Pat<(v4i32 (PPCSExtVElems v8i16:$A)),
		(v4i32 (VEXTSH2W $A))>;
		}
}		}
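Why the LE and BE pattern sets above use different indices follows from what the instructions compute. A minimal model of `vextsb2w` is below, assuming the Power ISA 3.0 description (byte element 3 of each word element is sign-extended into that word, with bytes numbered in big-endian register order) and assuming that on little-endian targets LLVM element J of a `v16i8` occupies register byte 15 - J. `vextsb2wModel` and `leElementToRegByte` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of vextsb2w. Reg holds the 16 register bytes in
// big-endian (ISA) order; result word k is the sign extension of byte 3 of
// source word k, i.e. register byte 4*k + 3 (bytes 3, 7, 11, 15).
std::array<int32_t, 4> vextsb2wModel(const std::array<uint8_t, 16> &Reg) {
  std::array<int32_t, 4> Out{};
  for (unsigned k = 0; k < 4; ++k) {
    // Reinterpret the byte as signed; widening int8_t to int32_t sign-extends.
    Out[k] = static_cast<int8_t>(Reg[4 * k + 3]);
  }
  return Out;
}

// Assumed LE layout: LLVM element J of a v16i8 lives in register byte 15 - J.
// Under that mapping, LE indices 0, 4, 8, 12 land on register bytes
// 15, 11, 7, 3: exactly the bytes vextsb2w reads.
unsigned leElementToRegByte(unsigned J) {
  assert(J < 16 && "v16i8 has 16 elements");
  return 15 - J;
}
```

On big-endian targets LLVM element numbering matches register byte order, so the same bytes are indices 3, 7, 11, 15, which is what the `BE_A*` dags encode.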

llvm/trunk/test/CodeGen/PowerPC/vec_int_ext.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs -mcpu=pwr9 < %s | FileCheck %s -check-prefix=PWR9			; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-gnu-linux -mcpu=pwr9 < %s | FileCheck %s -check-prefix=CHECK-LE
	target triple = "powerpc64le-unknown-linux-gnu"			; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-gnu-linux -mcpu=pwr9 < %s | FileCheck %s -check-prefix=CHECK-BE

				define <4 x i32> @vextsb2wLE(<16 x i8> %a) {
				; CHECK-LE-LABEL: vextsb2wLE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vextsb2w 2, 2
				; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsb2wLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsb2w 2, 2
				; CHECK-BE-NEXT: blr

	define <4 x i32> @vextsb2w(<16 x i8> %a) {
	; PWR9-LABEL: vextsb2w:
	; PWR9: # BB#0: # %entry
	; PWR9-NEXT: vextsb2w 2, 2
	; PWR9-NEXT: blr
	entry:			entry:
	%vecext = extractelement <16 x i8> %a, i32 0			%vecext = extractelement <16 x i8> %a, i32 0
	%conv = sext i8 %vecext to i32			%conv = sext i8 %vecext to i32
	%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0			%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
	%vecext1 = extractelement <16 x i8> %a, i32 4			%vecext1 = extractelement <16 x i8> %a, i32 4
	%conv2 = sext i8 %vecext1 to i32			%conv2 = sext i8 %vecext1 to i32
	%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1			%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
	%vecext4 = extractelement <16 x i8> %a, i32 8			%vecext4 = extractelement <16 x i8> %a, i32 8
	%conv5 = sext i8 %vecext4 to i32			%conv5 = sext i8 %vecext4 to i32
	%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2			%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
	%vecext7 = extractelement <16 x i8> %a, i32 12			%vecext7 = extractelement <16 x i8> %a, i32 12
	%conv8 = sext i8 %vecext7 to i32			%conv8 = sext i8 %vecext7 to i32
	%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3			%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
	ret <4 x i32> %vecinit9			ret <4 x i32> %vecinit9
	}			}

	define <2 x i64> @vextsb2d(<16 x i8> %a) {			define <2 x i64> @vextsb2dLE(<16 x i8> %a) {
	; PWR9-LABEL: vextsb2d:			; CHECK-LE-LABEL: vextsb2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsb2d 2, 2			; CHECK-LE-NEXT: vextsb2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsb2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsb2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <16 x i8> %a, i32 0			%vecext = extractelement <16 x i8> %a, i32 0
	%conv = sext i8 %vecext to i64			%conv = sext i8 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <16 x i8> %a, i32 8			%vecext1 = extractelement <16 x i8> %a, i32 8
	%conv2 = sext i8 %vecext1 to i64			%conv2 = sext i8 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

	define <4 x i32> @vextsh2w(<8 x i16> %a) {			define <4 x i32> @vextsh2wLE(<8 x i16> %a) {
	; PWR9-LABEL: vextsh2w:			; CHECK-LE-LABEL: vextsh2wLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsh2w 2, 2			; CHECK-LE-NEXT: vextsh2w 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsh2wLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsh2w 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <8 x i16> %a, i32 0			%vecext = extractelement <8 x i16> %a, i32 0
	%conv = sext i16 %vecext to i32			%conv = sext i16 %vecext to i32
	%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0			%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
	%vecext1 = extractelement <8 x i16> %a, i32 2			%vecext1 = extractelement <8 x i16> %a, i32 2
	%conv2 = sext i16 %vecext1 to i32			%conv2 = sext i16 %vecext1 to i32
	%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1			%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
	%vecext4 = extractelement <8 x i16> %a, i32 4			%vecext4 = extractelement <8 x i16> %a, i32 4
	%conv5 = sext i16 %vecext4 to i32			%conv5 = sext i16 %vecext4 to i32
	%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2			%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
	%vecext7 = extractelement <8 x i16> %a, i32 6			%vecext7 = extractelement <8 x i16> %a, i32 6
	%conv8 = sext i16 %vecext7 to i32			%conv8 = sext i16 %vecext7 to i32
	%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3			%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
	ret <4 x i32> %vecinit9			ret <4 x i32> %vecinit9
	}			}

	define <2 x i64> @vextsh2d(<8 x i16> %a) {			define <2 x i64> @vextsh2dLE(<8 x i16> %a) {
	; PWR9-LABEL: vextsh2d:			; CHECK-LE-LABEL: vextsh2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsh2d 2, 2			; CHECK-LE-NEXT: vextsh2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsh2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsh2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <8 x i16> %a, i32 0			%vecext = extractelement <8 x i16> %a, i32 0
	%conv = sext i16 %vecext to i64			%conv = sext i16 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <8 x i16> %a, i32 4			%vecext1 = extractelement <8 x i16> %a, i32 4
	%conv2 = sext i16 %vecext1 to i64			%conv2 = sext i16 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

	define <2 x i64> @vextsw2d(<4 x i32> %a) {			define <2 x i64> @vextsw2dLE(<4 x i32> %a) {
	; PWR9-LABEL: vextsw2d:			; CHECK-LE-LABEL: vextsw2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsw2d 2, 2			; CHECK-LE-NEXT: vextsw2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsw2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vmrgew
				; CHECK-BE-NEXT: vextsw2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <4 x i32> %a, i32 0			%vecext = extractelement <4 x i32> %a, i32 0
	%conv = sext i32 %vecext to i64			%conv = sext i32 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <4 x i32> %a, i32 2			%vecext1 = extractelement <4 x i32> %a, i32 2
	%conv2 = sext i32 %vecext1 to i64			%conv2 = sext i32 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

				define <4 x i32> @vextsb2wBE(<16 x i8> %a) {
				; CHECK-BE-LABEL: vextsb2wBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsb2w 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsb2wBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 13
				; CHECK-LE-NEXT: vextsb2w 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <16 x i8> %a, i32 3
				%conv = sext i8 %vecext to i32
				%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 7
				%conv2 = sext i8 %vecext1 to i32
				%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
				%vecext4 = extractelement <16 x i8> %a, i32 11
				%conv5 = sext i8 %vecext4 to i32
				%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
				%vecext7 = extractelement <16 x i8> %a, i32 15
				%conv8 = sext i8 %vecext7 to i32
				%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
				ret <4 x i32> %vecinit9
				}

				define <2 x i64> @vextsb2dBE(<16 x i8> %a) {
				; CHECK-BE-LABEL: vextsb2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsb2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsb2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 9
				; CHECK-LE-NEXT: vextsb2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <16 x i8> %a, i32 7
				%conv = sext i8 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 15
				%conv2 = sext i8 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <4 x i32> @vextsh2wBE(<8 x i16> %a) {
				; CHECK-BE-LABEL: vextsh2wBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsh2w 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsh2wBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 14
				; CHECK-LE-NEXT: vextsh2w 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <8 x i16> %a, i32 1
				%conv = sext i16 %vecext to i32
				%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
				%vecext1 = extractelement <8 x i16> %a, i32 3
				%conv2 = sext i16 %vecext1 to i32
				%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
				%vecext4 = extractelement <8 x i16> %a, i32 5
				%conv5 = sext i16 %vecext4 to i32
				%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
				%vecext7 = extractelement <8 x i16> %a, i32 7
				%conv8 = sext i16 %vecext7 to i32
				%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
				ret <4 x i32> %vecinit9
				}

				define <2 x i64> @vextsh2dBE(<8 x i16> %a) {
				; CHECK-BE-LABEL: vextsh2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsh2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsh2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 10
				; CHECK-LE-NEXT: vextsh2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <8 x i16> %a, i32 3
				%conv = sext i16 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <8 x i16> %a, i32 7
				%conv2 = sext i16 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <2 x i64> @vextsw2dBE(<4 x i32> %a) {
				; CHECK-BE-LABEL: vextsw2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsw2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsw2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 12
				; CHECK-LE-NEXT: vextsw2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <4 x i32> %a, i32 1
				%conv = sext i32 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <4 x i32> %a, i32 3
				%conv2 = sext i32 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <2 x i64> @vextDiffVectors(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LE-LABEL: vextDiffVectors:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NOT: vextsw2d

				; CHECK-BE-LABEL: vextDiffVectors:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NOT: vextsw2d
				entry:
				%vecext = extractelement <4 x i32> %a, i32 0
				%conv = sext i32 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <4 x i32> %b, i32 2
				%conv2 = sext i32 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <8 x i16> @testInvalidExtend(<16 x i8> %a) {
				entry:
				; CHECK-LE-LABEL: testInvalidExtend:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NOT: vexts

				; CHECK-BE-LABEL: testInvalidExtend:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NOT: vexts

				%vecext = extractelement <16 x i8> %a, i32 0
				%conv = sext i8 %vecext to i16
				%vecinit = insertelement <8 x i16> undef, i16 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 2
				%conv2 = sext i8 %vecext1 to i16
				%vecinit3 = insertelement <8 x i16> %vecinit, i16 %conv2, i32 1
				%vecext4 = extractelement <16 x i8> %a, i32 4
				%conv5 = sext i8 %vecext4 to i16
				%vecinit6 = insertelement <8 x i16> %vecinit3, i16 %conv5, i32 2
				%vecext7 = extractelement <16 x i8> %a, i32 6
				%conv8 = sext i8 %vecext7 to i16
				%vecinit9 = insertelement <8 x i16> %vecinit6, i16 %conv8, i32 3
				%vecext10 = extractelement <16 x i8> %a, i32 8
				%conv11 = sext i8 %vecext10 to i16
				%vecinit12 = insertelement <8 x i16> %vecinit9, i16 %conv11, i32 4
				%vecext13 = extractelement <16 x i8> %a, i32 10
				%conv14 = sext i8 %vecext13 to i16
				%vecinit15 = insertelement <8 x i16> %vecinit12, i16 %conv14, i32 5
				%vecext16 = extractelement <16 x i8> %a, i32 12
				%conv17 = sext i8 %vecext16 to i16
				%vecinit18 = insertelement <8 x i16> %vecinit15, i16 %conv17, i32 6
				%vecext19 = extractelement <16 x i8> %a, i32 14
				%conv20 = sext i8 %vecext19 to i16
				%vecinit21 = insertelement <8 x i16> %vecinit18, i16 %conv20, i32 7
				ret <8 x i16> %vecinit21
				}
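The test cases above follow a regular pattern. They cover which `extractelement` indices map directly onto a Power9 `vexts*2*` instruction, and which `vsldoi` immediate the little-endian CHECK lines expect when the IR uses big-endian-style indices. The sketch below is a small model of that pattern. It is inferred only from the CHECK lines and the ISA behaviour of these instructions (each one sign-extends the least-significant input element of every output lane). It is not the code in `PPCISelLowering.cpp`. The names `VALID_EXTENDS`, `expected_indices` and `le_vsldoi_shift` are hypothetical.

```python
# Hypothetical model of the index patterns exercised by vec_int_ext.ll.
# Inferred from the test's CHECK lines; not the patch's implementation.

# (input bits, output bits) pairs that have a Power9 vexts*2* instruction:
# vextsb2w, vextsb2d, vextsh2w, vextsh2d, vextsw2d.
VALID_EXTENDS = {(8, 32), (8, 64), (16, 32), (16, 64), (32, 64)}


def expected_indices(in_bits, out_bits, big_endian):
    """Return the extractelement indices that need no shuffle, or None.

    None means there is no instruction for the combination (e.g. i8 -> i16,
    the testInvalidExtend case).
    """
    if (in_bits, out_bits) not in VALID_EXTENDS:
        return None
    ratio = out_bits // in_bits   # input elements per output lane
    lanes = 128 // out_bits       # output lanes in a 128-bit vector
    # The instruction reads the least-significant input element of each
    # lane: the last of `ratio` elements in BE numbering, the first in LE.
    offset = ratio - 1 if big_endian else 0
    return [i * ratio + offset for i in range(lanes)]


def le_vsldoi_shift(in_bits, out_bits):
    """vsldoi immediate that rotates BE-style indices into place on LE.

    With both sources equal, vsldoi rotates the register left by the
    immediate (in bytes). Rotating by 16 - k moves the element sitting
    k bytes up (in LE numbering) down to the position the instruction reads.
    """
    if (in_bits, out_bits) not in VALID_EXTENDS:
        return None
    ratio = out_bits // in_bits
    byte_offset = (ratio - 1) * (in_bits // 8)
    return 16 - byte_offset
```

For example, `expected_indices(8, 32, True)` gives the indices 3, 7, 11, 15 used by `vextsb2wBE`. `le_vsldoi_shift(8, 32)` gives 13, which matches `vsldoi 2, 2, 2, 13` in that function's CHECK-LE lines. The `None` branch corresponds to `testInvalidExtend`. That test checks for the absence of any `vexts` instruction because `v16i8 -> v8i16` has no matching instruction.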