This is an archive of the discontinued LLVM Phabricator instance.


Differential D34009

[Power9] Exploit vector integer extend instructions when indices aren't correct
ClosedPublic

Authored by syzaara on Jun 7 2017, 12:49 PM.


Details

Reviewers

kbarton
nemanjai
sfertile
lei
jtony
inouehrs
stefanp
echristo
hfinkel

Commits

rG9a91a1811001: [Power9] Exploit vector integer extend instructions when indices aren't correct.
rL307169: [Power9] Exploit vector integer extend instructions when indices aren't correct.

Summary

This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine.
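The "correct" indices referred to here are the ones listed in the patch's own comment table (for example, byte to word: LE 0,4,8,12 and BE 3,7,11,15). As a minimal standalone sketch, assuming a 128-bit vector and a hypothetical helper name `expectedExtractIndices` that is not part of the patch, that table can be reproduced from the element widths alone:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the patch): returns the source element
// indices that a sign extend from InBits-wide to OutBits-wide elements is
// expected to read from a 128-bit vector, for the given endianness.
std::vector<unsigned> expectedExtractIndices(unsigned InBits, unsigned OutBits,
                                             bool IsLittleEndian) {
  assert(InBits < OutBits && OutBits % InBits == 0 && 128 % OutBits == 0);
  const unsigned Ratio = OutBits / InBits; // input elements per output element
  const unsigned NumOut = 128 / OutBits;   // output elements in the vector
  std::vector<unsigned> Indices;
  for (unsigned K = 0; K < NumOut; ++K) {
    // Matching the patch's table: LE uses the first input element of each
    // group, BE uses the last one.
    Indices.push_back(K * Ratio + (IsLittleEndian ? 0 : Ratio - 1));
  }
  return Indices;
}
```

Any build vector whose extract indices differ from this list for the current endianness is the case this patch handles by inserting a shuffle first.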

Diff Detail

Event Timeline

syzaara created this revision. Jun 7 2017, 12:49 PM

syzaara added reviewers: echristo, hfinkel. Jun 8 2017, 6:16 AM

syzaara removed a subscriber: echristo.

nemanjai added inline comments. Jun 12 2017, 3:50 AM

lib/Target/PowerPC/PPCISelLowering.cpp
11226	I'm afraid that having such a generically named function in such a large file may get confusing down the road. Perhaps it would be clearer to fold this logic into `combineBVOfVecExtend()` and just use `getScalarSizeInBits()`. Perhaps something like:
	    if (InputSize + OutputSize == 5)
	      TgtElemArrayIdx = 0;
	    // ...
	In any case, you don't want to use a valid array index for an invalid combination as you do here. For example, I don't really see anything in this patch that will prevent you from doing something weird for the `v16i8 -> v8i16` case (which there isn't an instruction for as far as I can tell). Also, please add that as a test case. I realize there isn't a pattern for this in the .td file, but I imagine you'll end up with some weird result in this case.
11255	I think it would be much cleaner to make this array local to `combineBVOfVecExtend` (you can then pass the target encoding to `addShuffleForVecExtend` rather than indexing into it again).
11271	Just initialize this at construction time and remove the loop.
11288	Do we really need this? I thought there were overloads of `SelectionDAG::getNode()` that take two `SDValue`'s.
11313	I don't really see a reason for this to return an `int` rather than a `bool`. The reader might assume that the lambda returns values from a larger space if the return type is `int`.
11335	Just a nit: it'd probably be more concise and readable to just use the ternary operator here.
11337	I don't actually understand how this works. Won't the nibbles that correspond to the target elements for the opposite endianness always just be zero? And if that's the case, won't the comparison below always fail? (see below) Perhaps the unnecessary shuffle produced will just be optimized out, but it's probably better not to emit it to begin with.
11352	Shouldn't both operands of this comparison just have the nibbles corresponding to the opposite endianness masked out? i.e. both should be and-ed with `0x0F0F0F0F0F0F0F0F` or `0xF0F0F0F0F0F0F0F0`
11355	Just a comment along the lines of `// Regular lowering will catch cases where a shuffle is not needed.`
lib/Target/PowerPC/PPCISelLowering.h
70	Nit: line length.
lib/Target/PowerPC/PPCInstrVSX.td
2721	Weren't these added with the previous patch? Or maybe it was only the LE ones? If so, can you please rebase this patch so that it would apply cleanly?
3051	The versions with the new node are not endianness specific are they? If not, please move them out of the blocks and have a single copy of each.
test/CodeGen/PowerPC/vec_int_ext.ll
4	Please add a test that checks that there are no shuffles when the input elements are correct.
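The nibble-masking concern raised in the comments on lines 11337 and 11352 of PPCISelLowering.cpp can be illustrated outside of LLVM. The sketch below is not the patch's code: the helper names are hypothetical, and it only mimics the encoding described in the patch (one byte per operand, LE index in the low nibble, BE index in the high nibble), using the byte-to-word table entry `0x3074B8FC` from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack the extract indices one byte per build-vector operand, first operand
// in the most significant used byte. LE indices land in the low nibble and
// BE indices in the high nibble. (Hypothetical helper, mirrors the lambda.)
uint64_t packIndices(const std::vector<unsigned> &Indices,
                     bool IsLittleEndian) {
  uint64_t Elems = 0;
  for (unsigned Index : Indices) {
    assert(Index < 16 && "index must fit in a nibble");
    uint64_t Encoded = IsLittleEndian ? Index : (uint64_t)Index << 4;
    Elems = (Elems << 8) | Encoded;
  }
  return Elems;
}

// Keep only the nibbles that belong to the current endianness.
uint64_t maskForEndianness(uint64_t Encoded, bool IsLittleEndian) {
  return Encoded & (IsLittleEndian ? 0x0F0F0F0F0F0F0F0FULL
                                   : 0xF0F0F0F0F0F0F0F0ULL);
}

// A shuffle is needed only if the packed indices differ from the table
// entry after the opposite endianness has been masked out of the entry.
bool needsShuffle(const std::vector<unsigned> &Indices, uint64_t TargetEntry,
                  bool IsLittleEndian) {
  return packIndices(Indices, IsLittleEndian) !=
         maskForEndianness(TargetEntry, IsLittleEndian);
}
```

Without the mask, the packed value for one endianness could never equal the full table entry, so every input would look like it needed a shuffle; with it, indices 0,4,8,12 compare equal on LE and unequal on BE.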

syzaara updated this revision to Diff 102253. Jun 12 2017, 3:41 PM

syzaara added inline comments. Jun 12 2017, 3:44 PM

test/CodeGen/PowerPC/vec_int_ext.ll
4	Each of these tests is already checking both correct and incorrect elements. The LE tests are using correct elements for LE and so the BE pattern match should have the shuffle. The BE tests are using correct elements for BE and so the LE pattern match should have the shuffle.

nemanjai added inline comments. Jun 12 2017, 9:51 PM

test/CodeGen/PowerPC/vec_int_ext.ll
4	Ah I see. Not sure how I missed that. Sorry.

stefanp added inline comments. Jun 13 2017, 6:10 AM

lib/Target/PowerPC/PPCISelLowering.cpp
11332	I realize that you are trying to check for the byte -> word case with this if statement. However, is it possible to also catch word -> byte too? You may have thought of this already and filtered out unwanted cases earlier up... I don't know. I just wanted to bring this to your attention in case it might be a problem.

syzaara added inline comments. Jun 13 2017, 12:49 PM

lib/Target/PowerPC/PPCISelLowering.cpp
11332	Actually, I think it's okay since things like word -> byte would fail the isSExtOfVecExtract pattern matching so we wouldn't reach this far.
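To make the size check in this exchange concrete, here is a small standalone sketch of the table-index selection. It is an illustration only: `targetElemIndex` is a hypothetical name, and unlike the patch (which relies on the earlier pattern match to reject narrowing) this version rejects narrowing explicitly and returns -1 for any unsupported pair such as `v16i8 -> v8i16`.

```cpp
#include <cassert>

// Hypothetical helper: map (input, output) scalar widths in bits to an index
// into the five-entry table from the patch, or -1 if no vector sign extend
// instruction covers the pair. The sums are unique across the five pairs.
int targetElemIndex(unsigned InBits, unsigned OutBits) {
  if (InBits >= OutBits)
    return -1; // narrowing or same width is never a sign extend
  switch (InBits + OutBits) {
  case 40: return 0; // byte -> word        (8 + 32)
  case 72: return 1; // byte -> doubleword  (8 + 64)
  case 48: return 2; // halfword -> word    (16 + 32)
  case 80: return 3; // halfword -> dword   (16 + 64)
  case 96: return 4; // word -> doubleword  (32 + 64)
  default: return -1; // e.g. byte -> halfword (8 + 16)
  }
}
```

Because the sum alone is symmetric (32 + 8 is also 40), the explicit `InBits >= OutBits` guard is what distinguishes word -> byte from byte -> word in this sketch.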

ping

LGTM. My comments are only about, well, comments. So feel free to fix them on the commit.

lib/Target/PowerPC/PPCISelLowering.cpp
11235	Maybe just a quick note about the algorithm to make it easier to understand at first glance. Something like:
	    // Knowing the element indices being extracted from the original
	    // vector and the order in which they're being inserted, just put
	    // them at element indices required for the instruction.
11255	Small nit: get rid of the use of `new` in the comment because these things won't be new for very long :).
11258	All of this can just go away. You're repeating it in a clean and concise way at the definition of the `TargetElems` array.
11268	It's not just any extend, right? Probably `combineBVOfVecSExt()` then.
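The algorithm note requested for line 11235 can be sketched as a standalone function. This is a simplified model of the mask-building loop in the diff, not the committed code: `buildShuffleMask` is a hypothetical name, plain `std::vector<int>` stands in for `SmallVector`, and -1 marks a lane the shuffle does not care about.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Elems and CorrectElems hold one byte per build-vector operand, with the
// index in the low nibble (LE) or high nibble (BE). For each operand, the
// element that was actually extracted (Src) is placed at the position the
// instruction reads from (Dest).
std::vector<int> buildShuffleMask(uint64_t Elems, uint64_t CorrectElems,
                                  unsigned NumOperands, unsigned NumInputElems,
                                  bool IsLittleEndian) {
  std::vector<int> Mask(NumInputElems, -1); // -1 means "don't care"
  const unsigned Shift = IsLittleEndian ? 0 : 4;
  for (unsigned I = 0; I < NumOperands; ++I) {
    unsigned Dest = (CorrectElems >> Shift) & 0xF;
    unsigned Src = (Elems >> Shift) & 0xF;
    assert(Dest < NumInputElems && "target index out of range");
    Mask[Dest] = static_cast<int>(Src);
    CorrectElems >>= 8; // move on to the next operand's byte
    Elems >>= 8;
  }
  return Mask;
}
```

For example, a little-endian byte-to-word build vector that extracts elements 1, 5, 9, 13 packs to `0x0105090D`; against the masked target `0x0004080C` the resulting mask moves those elements to positions 0, 4, 8, 12 and leaves every other lane undefined.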

This revision is now accepted and ready to land. Jun 28 2017, 2:48 PM

Closed by commit rL307169: [Power9] Exploit vector integer extend instructions when indices aren't correct. (authored by jtony). Jul 5 2017, 9:01 AM

This revision was automatically updated to reflect the committed changes.

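Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelLowering.h	4 lines
lib/Target/PowerPC/PPCISelLowering.cpp	143 lines
lib/Target/PowerPC/PPCInstrInfo.td	4 lines
lib/Target/PowerPC/PPCInstrVSX.td	98 lines
test/CodeGen/PowerPC/vec_int_ext.ll	253 lines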

Diff 102253

lib/Target/PowerPC/PPCISelLowering.h

...
enum NodeType : unsigned {
/// Newer FCTI[D,W]UZ floating-point-to-integer conversion instructions for		/// Newer FCTI[D,W]UZ floating-point-to-integer conversion instructions for
/// unsigned integers with round toward zero.		/// unsigned integers with round toward zero.
FCTIDUZ, FCTIWUZ,		FCTIDUZ, FCTIWUZ,

/// VEXTS, ByteWidth - takes an input in VSFRC and produces an output in		/// VEXTS, ByteWidth - takes an input in VSFRC and produces an output in
/// VSFRC that is sign-extended from ByteWidth to a 64-byte integer.		/// VSFRC that is sign-extended from ByteWidth to a 64-byte integer.
VEXTS,		VEXTS,

		/// SExtVElems, takes an input vector of a smaller type and sign
		/// extends to an output vector of a larger type.
		SExtVElems,

/// Reciprocal estimate instructions (unary FP ops).		/// Reciprocal estimate instructions (unary FP ops).
FRE, FRSQRTE,		FRE, FRSQRTE,

// VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking		// VMADDFP, VNMSUBFP - The VMADDFP and VNMSUBFP instructions, taking
// three v4f32 operands and producing a v4f32 result.		// three v4f32 operands and producing a v4f32 result.
VMADDFP, VNMSUBFP,		VMADDFP, VNMSUBFP,

/// VPERM - The PPC VPERM Instruction.		/// VPERM - The PPC VPERM Instruction.

lib/Target/PowerPC/PPCISelLowering.cpp


...
const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::VCMPo: return "PPCISD::VCMPo";		case PPCISD::VCMPo: return "PPCISD::VCMPo";
case PPCISD::LBRX: return "PPCISD::LBRX";		case PPCISD::LBRX: return "PPCISD::LBRX";
case PPCISD::STBRX: return "PPCISD::STBRX";		case PPCISD::STBRX: return "PPCISD::STBRX";
case PPCISD::LFIWAX: return "PPCISD::LFIWAX";		case PPCISD::LFIWAX: return "PPCISD::LFIWAX";
case PPCISD::LFIWZX: return "PPCISD::LFIWZX";		case PPCISD::LFIWZX: return "PPCISD::LFIWZX";
case PPCISD::LXSIZX: return "PPCISD::LXSIZX";		case PPCISD::LXSIZX: return "PPCISD::LXSIZX";
case PPCISD::STXSIX: return "PPCISD::STXSIX";		case PPCISD::STXSIX: return "PPCISD::STXSIX";
case PPCISD::VEXTS: return "PPCISD::VEXTS";		case PPCISD::VEXTS: return "PPCISD::VEXTS";
		case PPCISD::SExtVElems: return "PPCISD::SExtVElems";
case PPCISD::LXVD2X: return "PPCISD::LXVD2X";		case PPCISD::LXVD2X: return "PPCISD::LXVD2X";
case PPCISD::STXVD2X: return "PPCISD::STXVD2X";		case PPCISD::STXVD2X: return "PPCISD::STXVD2X";
case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";		case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";
case PPCISD::BDNZ: return "PPCISD::BDNZ";		case PPCISD::BDNZ: return "PPCISD::BDNZ";
case PPCISD::BDZ: return "PPCISD::BDZ";		case PPCISD::BDZ: return "PPCISD::BDZ";
case PPCISD::MFFS: return "PPCISD::MFFS";		case PPCISD::MFFS: return "PPCISD::MFFS";
case PPCISD::FADDRTZ: return "PPCISD::FADDRTZ";		case PPCISD::FADDRTZ: return "PPCISD::FADDRTZ";
case PPCISD::TC_RETURN: return "PPCISD::TC_RETURN";		case PPCISD::TC_RETURN: return "PPCISD::TC_RETURN";
...
for (int i = N->getNumOperands() - 1; i >= 0; i--)
Ops.push_back(i);		Ops.push_back(i);

return DAG.getVectorShuffle(N->getValueType(0), dl, Load,		return DAG.getVectorShuffle(N->getValueType(0), dl, Load,
DAG.getUNDEF(N->getValueType(0)), Ops);		DAG.getUNDEF(N->getValueType(0)), Ops);
}		}
return SDValue();		return SDValue();
}		}

		// This function adds the required vector_shuffle needed to get
		// the elements of the vector extract in the correct position
		// as specified by the CorrectElems encoding.
		static SDValue addShuffleForVecExtend(SDNode *N, SelectionDAG &DAG,
		SDValue Input, uint64_t Elems,
		uint64_t CorrectElems) {
		SDLoc dl(N);

		unsigned NumElems = Input.getValueType().getVectorNumElements();
		SmallVector<int, 16> ShuffleMask(NumElems, -1);

		for (unsigned i = 0; i < N->getNumOperands(); i++) {
		if (DAG.getDataLayout().isLittleEndian())
		ShuffleMask[CorrectElems & 0xF] = Elems & 0xF;
		else
		ShuffleMask[(CorrectElems & 0xF0) >> 4] = (Elems & 0xF0) >> 4;
		CorrectElems = CorrectElems >> 8;
		Elems = Elems >> 8;
		}

		SDValue Shuffle =
		DAG.getVectorShuffle(Input.getValueType(), dl, Input,
		DAG.getUNDEF(Input.getValueType()), ShuffleMask);

		EVT Ty = N->getValueType(0);
		SDValue BV = DAG.getNode(PPCISD::SExtVElems, dl, Ty, Shuffle);
		return BV;
		}

		// Look for build vector patterns where input operands come from sign
		// extended vector_extract elements of specific indices. If the correct indices
		// aren't used, add a vector shuffle to fix up the indices and create a new
		// PPCISD::SExtVElems node which selects the new vector sign extend instructions
		// during instruction selection.
		// Extending byte to word:
		// LE indices: 0,4,8,12. BE indices: 3,7,11,15
		// Extending byte to double word:
		// LE indices: 0,8. BE indices: 7, 15
		// Extending half word to word:
		// LE indices: 0,2,4,6. BE indices: 1,3,5,7
		// Extending half word to double word:
		// LE indices: 0,4. BE indices: 3,7
		// Extending word to double word:
		// LE indices: 0,2. BE indices: 1,3
		static SDValue combineBVOfVecExtend(SDNode *N, SelectionDAG &DAG) {
		// This array encodes the indices that the vector sign extend instructions
		// extract from when extending from one type to another for both BE and LE.
		// The right nibble of each byte corresponds to the LE indices,
		// and the left nibble of each byte corresponds to the BE indices.
		// For example: 0x3074B8FC byte->word
		// For LE: the allowed indices are: 0x0,0x4,0x8,0xC
		// For BE: the allowed indices are: 0x3,0x7,0xB,0xF
		// For example: 0x000070F8 byte->double word
		// For LE: the allowed indices are: 0x0,0x8
		// For BE: the allowed indices are: 0x7,0xF
		uint64_t TargetElems[] = {
		0x3074B8FC, // b->w
		0x000070F8, // b->d
		0x10325476, // h->w
		0x00003074, // h->d
		0x00001032, // w->d
		};

		uint64_t Elems = 0;
		int Index;
		SDValue Input;

		auto isSExtOfVecExtract = [&](SDValue Op) -> bool {
		if (!Op)
		return false;
		if (Op.getOpcode() != ISD::SIGN_EXTEND)
		return false;

		SDValue Extract = Op.getOperand(0);
		if (Extract.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
		return false;

		ConstantSDNode *ExtOp = dyn_cast<ConstantSDNode>(Extract.getOperand(1));
		if (!ExtOp)
		return false;

		Index = ExtOp->getZExtValue();
		if (Input && Input != Extract.getOperand(0))
		return false;

		if (!Input)
		Input = Extract.getOperand(0);

		Elems = Elems << 8;
		Index = DAG.getDataLayout().isLittleEndian() ? Index : Index << 4;
		Elems |= Index;

		return true;
		};

		// If the build vector operands aren't sign extended vector extracts,
		// of the same input vector, then return.
		for (unsigned i = 0; i < N->getNumOperands(); i++) {
		if (!isSExtOfVecExtract(N->getOperand(i))) {
		return SDValue();
		}
		}

		// If the vector extract indices are not correct, add the appropriate
		// vector_shuffle.
		int TgtElemArrayIdx;
		int InputSize = Input.getValueType().getScalarSizeInBits();
		int OutputSize = N->getValueType(0).getScalarSizeInBits();
		if (InputSize + OutputSize == 40)
		TgtElemArrayIdx = 0;
		else if (InputSize + OutputSize == 72)
		TgtElemArrayIdx = 1;
		else if (InputSize + OutputSize == 48)
		TgtElemArrayIdx = 2;
		else if (InputSize + OutputSize == 80)
		TgtElemArrayIdx = 3;
		else if (InputSize + OutputSize == 96)
		TgtElemArrayIdx = 4;
		else
		return SDValue();

		uint64_t CorrectElems = TargetElems[TgtElemArrayIdx];
		CorrectElems = DAG.getDataLayout().isLittleEndian()
		? CorrectElems & 0x0F0F0F0F0F0F0F0F
		: CorrectElems & 0xF0F0F0F0F0F0F0F0;
		if (Elems != CorrectElems) {
		return addShuffleForVecExtend(N, DAG, Input, Elems, CorrectElems);
		}

		// Regular lowering will catch cases where a shuffle is not needed.
		return SDValue();
		}

SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,		SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
assert(N->getOpcode() == ISD::BUILD_VECTOR &&		assert(N->getOpcode() == ISD::BUILD_VECTOR &&
"Should be called with a BUILD_VECTOR node");		"Should be called with a BUILD_VECTOR node");

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDLoc dl(N);		SDLoc dl(N);

...
}		}

// If we're building a vector out of consecutive loads, just load that		// If we're building a vector out of consecutive loads, just load that
// vector type.		// vector type.
SDValue Reduced = combineBVOfConsecutiveLoads(N, DAG);		SDValue Reduced = combineBVOfConsecutiveLoads(N, DAG);
if (Reduced)		if (Reduced)
return Reduced;		return Reduced;

		// If we're building a vector out of extended elements from another vector
		// we have P9 vector integer extend instructions.
		if (Subtarget.hasP9Altivec()) {
		Reduced = combineBVOfVecExtend(N, DAG);
		if (Reduced)
		return Reduced;
		}


if (N->getValueType(0) != MVT::v2f64)		if (N->getValueType(0) != MVT::v2f64)
return SDValue();		return SDValue();

// Looking for:		// Looking for:
// (build_vector ([su]int_to_fp (extractelt 0)), [su]int_to_fp (extractelt 1))		// (build_vector ([su]int_to_fp (extractelt 0)), [su]int_to_fp (extractelt 1))
if (FirstInput.getOpcode() != ISD::SINT_TO_FP &&		if (FirstInput.getOpcode() != ISD::SINT_TO_FP &&
FirstInput.getOpcode() != ISD::UINT_TO_FP)		FirstInput.getOpcode() != ISD::UINT_TO_FP)
return SDValue();		return SDValue();

lib/Target/PowerPC/PPCInstrInfo.td

...
def SDT_PPCLxsizx : SDTypeProfile<1, 2, [
SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
]>;		]>;
def SDT_PPCstxsix : SDTypeProfile<0, 3, [		def SDT_PPCstxsix : SDTypeProfile<0, 3, [
SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
]>;		]>;
def SDT_PPCVexts : SDTypeProfile<1, 2, [		def SDT_PPCVexts : SDTypeProfile<1, 2, [
SDTCisVT<0, f64>, SDTCisVT<1, f64>, SDTCisPtrTy<2>		SDTCisVT<0, f64>, SDTCisVT<1, f64>, SDTCisPtrTy<2>
]>;		]>;
		def SDT_PPCSExtVElems : SDTypeProfile<1, 1, [
		SDTCisVec<0>, SDTCisVec<1>
		]>;

def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32>,		def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32>,
SDTCisVT<1, i32> ]>;		SDTCisVT<1, i32> ]>;
def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,		def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,
SDTCisVT<1, i32> ]>;		SDTCisVT<1, i32> ]>;
def SDT_PPCvperm : SDTypeProfile<1, 3, [		def SDT_PPCvperm : SDTypeProfile<1, 3, [
SDTCisVT<3, v16i8>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>		SDTCisVT<3, v16i8>, SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>
]>;		]>;
...
def PPClfiwax : SDNode<"PPCISD::LFIWAX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPClfiwzx : SDNode<"PPCISD::LFIWZX", SDT_PPClfiwx,		def PPClfiwzx : SDNode<"PPCISD::LFIWZX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPClxsizx : SDNode<"PPCISD::LXSIZX", SDT_PPCLxsizx,		def PPClxsizx : SDNode<"PPCISD::LXSIZX", SDT_PPCLxsizx,
[SDNPHasChain, SDNPMayLoad]>;		[SDNPHasChain, SDNPMayLoad]>;
def PPCstxsix : SDNode<"PPCISD::STXSIX", SDT_PPCstxsix,		def PPCstxsix : SDNode<"PPCISD::STXSIX", SDT_PPCstxsix,
[SDNPHasChain, SDNPMayStore]>;		[SDNPHasChain, SDNPMayStore]>;
def PPCVexts : SDNode<"PPCISD::VEXTS", SDT_PPCVexts, []>;		def PPCVexts : SDNode<"PPCISD::VEXTS", SDT_PPCVexts, []>;
		def PPCSExtVElems : SDNode<"PPCISD::SExtVElems", SDT_PPCSExtVElems, []>;

// Extract FPSCR (not modeled at the DAG level).		// Extract FPSCR (not modeled at the DAG level).
def PPCmffs : SDNode<"PPCISD::MFFS",		def PPCmffs : SDNode<"PPCISD::MFFS",
SDTypeProfile<1, 0, [SDTCisVT<0, f64>]>, []>;		SDTypeProfile<1, 0, [SDTCisVT<0, f64>]>, []>;

// Perform FADD in round-to-zero mode.		// Perform FADD in round-to-zero mode.
def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;		def PPCfaddrtz: SDNode<"PPCISD::FADDRTZ", SDTFPBinOp, []>;


lib/Target/PowerPC/PPCInstrVSX.td


def DblToFlt {		def DblToFlt {
dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));		dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));
dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));		dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));
dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));		dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));
dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));		dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));
}		}

def ByteToWord {		def ByteToWord {
dag A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 0)), i8));		dag LE_A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 0)), i8));
dag A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 4)), i8));		dag LE_A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 4)), i8));
dag A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 8)), i8));		dag LE_A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 8)), i8));
dag A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 12)), i8));		dag LE_A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 12)), i8));
		dag BE_A0 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 3)), i8));
		dag BE_A1 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 7)), i8));
		dag BE_A2 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 11)), i8));
		dag BE_A3 = (i32 (sext_inreg (i32 (vector_extract v16i8:$A, 15)), i8));
}		}

def ByteToDWord {		def ByteToDWord {
dag A0 = (i64 (sext_inreg		dag LE_A0 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v16i8:$A, 0)))), i8));		(i64 (anyext (i32 (vector_extract v16i8:$A, 0)))), i8));
dag A1 = (i64 (sext_inreg		dag LE_A1 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v16i8:$A, 8)))), i8));		(i64 (anyext (i32 (vector_extract v16i8:$A, 8)))), i8));
		dag BE_A0 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v16i8:$A, 7)))), i8));
		dag BE_A1 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v16i8:$A, 15)))), i8));
}		}

def HWordToWord {		def HWordToWord {
dag A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 0)), i16));		dag LE_A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 0)), i16));
dag A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 2)), i16));		dag LE_A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 2)), i16));
dag A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 4)), i16));		dag LE_A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 4)), i16));
dag A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 6)), i16));		dag LE_A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 6)), i16));
		dag BE_A0 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 1)), i16));
		dag BE_A1 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 3)), i16));
		dag BE_A2 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 5)), i16));
		dag BE_A3 = (i32 (sext_inreg (i32 (vector_extract v8i16:$A, 7)), i16));
}		}

def HWordToDWord {		def HWordToDWord {
dag A0 = (i64 (sext_inreg		dag LE_A0 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v8i16:$A, 0)))), i16));		(i64 (anyext (i32 (vector_extract v8i16:$A, 0)))), i16));
dag A1 = (i64 (sext_inreg		dag LE_A1 = (i64 (sext_inreg
(i64 (anyext (i32 (vector_extract v8i16:$A, 4)))), i16));		(i64 (anyext (i32 (vector_extract v8i16:$A, 4)))), i16));
		dag BE_A0 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v8i16:$A, 3)))), i16));
		dag BE_A1 = (i64 (sext_inreg
		(i64 (anyext (i32 (vector_extract v8i16:$A, 7)))), i16));
}		}

def WordToDWord {		def WordToDWord {
dag A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 0))));		dag LE_A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 0))));
dag A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 2))));		dag LE_A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 2))));
		dag BE_A0 = (i64 (sext (i32 (vector_extract v4i32:$A, 1))));
		dag BE_A1 = (i64 (sext (i32 (vector_extract v4i32:$A, 3))));
}		}

def FltToIntLoad {		def FltToIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));
}		}
def FltToUIntLoad {		def FltToUIntLoad {
dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));
}		}
...
def : Pat<(v2i64 (build_vector i64:$rA, i64:$rB)),
(v2i64 (MTVSRDD $rB, $rA))>;		(v2i64 (MTVSRDD $rB, $rA))>;
def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),		(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),
(COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;		(COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;
}		}
// P9 Altivec instructions that can be used to build vectors.		// P9 Altivec instructions that can be used to build vectors.
// Adding them to PPCInstrVSX.td rather than PPCAltivecVSX.td to compete		// Adding them to PPCInstrVSX.td rather than PPCAltivecVSX.td to compete
// with complexities of existing build vector patterns in this file.		// with complexities of existing build vector patterns in this file.
let Predicates = [HasP9Altivec] in {		let Predicates = [HasP9Altivec, IsLittleEndian] in {
def : Pat<(v2i64 (build_vector WordToDWord.A0, WordToDWord.A1)),		def : Pat<(v2i64 (build_vector WordToDWord.LE_A0, WordToDWord.LE_A1)),
		(v2i64 (VEXTSW2D $A))>;
		def : Pat<(v2i64 (build_vector HWordToDWord.LE_A0, HWordToDWord.LE_A1)),
		(v2i64 (VEXTSH2D $A))>;
		def : Pat<(v4i32 (build_vector HWordToWord.LE_A0, HWordToWord.LE_A1,
		HWordToWord.LE_A2, HWordToWord.LE_A3)),
		(v4i32 (VEXTSH2W $A))>;
		def : Pat<(v4i32 (build_vector ByteToWord.LE_A0, ByteToWord.LE_A1,
		ByteToWord.LE_A2, ByteToWord.LE_A3)),
		(v4i32 (VEXTSB2W $A))>;
		def : Pat<(v2i64 (build_vector ByteToDWord.LE_A0, ByteToDWord.LE_A1)),
		(v2i64 (VEXTSB2D $A))>;
		}

		let Predicates = [HasP9Altivec, IsBigEndian] in {
		def : Pat<(v2i64 (build_vector WordToDWord.BE_A0, WordToDWord.BE_A1)),
(v2i64 (VEXTSW2D $A))>;		(v2i64 (VEXTSW2D $A))>;
def : Pat<(v2i64 (build_vector HWordToDWord.A0, HWordToDWord.A1)),		def : Pat<(v2i64 (build_vector HWordToDWord.BE_A0, HWordToDWord.BE_A1)),
(v2i64 (VEXTSH2D $A))>;		(v2i64 (VEXTSH2D $A))>;
def : Pat<(v4i32 (build_vector HWordToWord.A0, HWordToWord.A1,		def : Pat<(v4i32 (build_vector HWordToWord.BE_A0, HWordToWord.BE_A1,
HWordToWord.A2, HWordToWord.A3)),		HWordToWord.BE_A2, HWordToWord.BE_A3)),
(v4i32 (VEXTSH2W $A))>;		(v4i32 (VEXTSH2W $A))>;
def : Pat<(v4i32 (build_vector ByteToWord.A0, ByteToWord.A1,		def : Pat<(v4i32 (build_vector ByteToWord.BE_A0, ByteToWord.BE_A1,
ByteToWord.A2, ByteToWord.A3)),		ByteToWord.BE_A2, ByteToWord.BE_A3)),
(v4i32 (VEXTSB2W $A))>;		(v4i32 (VEXTSB2W $A))>;
def : Pat<(v2i64 (build_vector ByteToDWord.A0, ByteToDWord.A1)),		def : Pat<(v2i64 (build_vector ByteToDWord.BE_A0, ByteToDWord.BE_A1)),
(v2i64 (VEXTSB2D $A))>;		(v2i64 (VEXTSB2D $A))>;
}		}

		let Predicates = [HasP9Altivec] in {
		def: Pat<(v2i64 (PPCSExtVElems v16i8:$A)),
		(v2i64 (VEXTSB2D $A))>;
		def: Pat<(v2i64 (PPCSExtVElems v8i16:$A)),
		(v2i64 (VEXTSH2D $A))>;
		def: Pat<(v2i64 (PPCSExtVElems v4i32:$A)),
		(v2i64 (VEXTSW2D $A))>;
		def: Pat<(v4i32 (PPCSExtVElems v16i8:$A)),
		(v4i32 (VEXTSB2W $A))>;
		def: Pat<(v4i32 (PPCSExtVElems v8i16:$A)),
		(v4i32 (VEXTSH2W $A))>;
		}
}		}
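The patterns above pair each supported (input element width, output element width) combination with one extend instruction, and the `LE_A*` / `BE_A*` operands differ only in which input elements they extract. A minimal Python sketch of that mapping follows. The names `EXTEND_INSTR`, `select_extend` and `correct_indices` are hypothetical and do not appear in the patch, and the index formula is inferred from the extract indices used in vec_int_ext.ll rather than taken from the .td definitions.

```python
# Illustrative model only; none of these names exist in the LLVM sources.
# Keys are (input element bits, output element bits), mirroring the
# PPCSExtVElems patterns in the diff.
EXTEND_INSTR = {
    (8, 32): "VEXTSB2W",
    (8, 64): "VEXTSB2D",
    (16, 32): "VEXTSH2W",
    (16, 64): "VEXTSH2D",
    (32, 64): "VEXTSW2D",
}


def select_extend(in_bits, out_bits):
    """Return the instruction name, or None when no pattern exists (e.g. 8 -> 16)."""
    return EXTEND_INSTR.get((in_bits, out_bits))


def correct_indices(in_bits, out_bits, little_endian):
    """Input element indices that need no shuffle, assuming 128-bit vectors.

    Assumption drawn from the tests: little endian uses the first input
    element of each output-sized slot, big endian uses the last one.
    """
    ratio = out_bits // in_bits
    num_out = 128 // out_bits
    offset = 0 if little_endian else ratio - 1
    return [j * ratio + offset for j in range(num_out)]
```

Under this model, `correct_indices(8, 32, True)` gives the indices used by `vextsb2wLE` and `correct_indices(8, 32, False)` gives those used by `vextsb2wBE`; any other index set would need the shuffle that the patch introduces.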

test/CodeGen/PowerPC/vec_int_ext.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs -mcpu=pwr9 < %s | FileCheck %s -check-prefix=PWR9			; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-gnu-linux -mcpu=pwr9 < %s | FileCheck %s -check-prefix=CHECK-LE
	target triple = "powerpc64le-unknown-linux-gnu"			; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-gnu-linux -mcpu=pwr9 < %s | FileCheck %s -check-prefix=CHECK-BE

				nemanjai: Please add a test that checks that there are no shuffles when the input elements are correct.
				syzaara (author): Each of these tests is already checking both correct and incorrect elements. The LE tests are using correct elements for LE and so the BE pattern match should have the shuffle. The BE tests are using correct elements for BE and so the LE pattern match should have the shuffle.
				nemanjai: Ah I see. Not sure how I missed that. Sorry.
				define <4 x i32> @vextsb2wLE(<16 x i8> %a) {
				; CHECK-LE-LABEL: vextsb2wLE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vextsb2w 2, 2
				; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsb2wLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsb2w 2, 2
				; CHECK-BE-NEXT: blr

	define <4 x i32> @vextsb2w(<16 x i8> %a) {
	; PWR9-LABEL: vextsb2w:
	; PWR9: # BB#0: # %entry
	; PWR9-NEXT: vextsb2w 2, 2
	; PWR9-NEXT: blr
	entry:			entry:
	%vecext = extractelement <16 x i8> %a, i32 0			%vecext = extractelement <16 x i8> %a, i32 0
	%conv = sext i8 %vecext to i32			%conv = sext i8 %vecext to i32
	%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0			%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
	%vecext1 = extractelement <16 x i8> %a, i32 4			%vecext1 = extractelement <16 x i8> %a, i32 4
	%conv2 = sext i8 %vecext1 to i32			%conv2 = sext i8 %vecext1 to i32
	%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1			%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
	%vecext4 = extractelement <16 x i8> %a, i32 8			%vecext4 = extractelement <16 x i8> %a, i32 8
	%conv5 = sext i8 %vecext4 to i32			%conv5 = sext i8 %vecext4 to i32
	%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2			%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
	%vecext7 = extractelement <16 x i8> %a, i32 12			%vecext7 = extractelement <16 x i8> %a, i32 12
	%conv8 = sext i8 %vecext7 to i32			%conv8 = sext i8 %vecext7 to i32
	%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3			%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
	ret <4 x i32> %vecinit9			ret <4 x i32> %vecinit9
	}			}

	define <2 x i64> @vextsb2d(<16 x i8> %a) {			define <2 x i64> @vextsb2dLE(<16 x i8> %a) {
	; PWR9-LABEL: vextsb2d:			; CHECK-LE-LABEL: vextsb2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsb2d 2, 2			; CHECK-LE-NEXT: vextsb2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsb2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsb2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <16 x i8> %a, i32 0			%vecext = extractelement <16 x i8> %a, i32 0
	%conv = sext i8 %vecext to i64			%conv = sext i8 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <16 x i8> %a, i32 8			%vecext1 = extractelement <16 x i8> %a, i32 8
	%conv2 = sext i8 %vecext1 to i64			%conv2 = sext i8 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

	define <4 x i32> @vextsh2w(<8 x i16> %a) {			define <4 x i32> @vextsh2wLE(<8 x i16> %a) {
	; PWR9-LABEL: vextsh2w:			; CHECK-LE-LABEL: vextsh2wLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsh2w 2, 2			; CHECK-LE-NEXT: vextsh2w 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsh2wLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsh2w 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <8 x i16> %a, i32 0			%vecext = extractelement <8 x i16> %a, i32 0
	%conv = sext i16 %vecext to i32			%conv = sext i16 %vecext to i32
	%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0			%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
	%vecext1 = extractelement <8 x i16> %a, i32 2			%vecext1 = extractelement <8 x i16> %a, i32 2
	%conv2 = sext i16 %vecext1 to i32			%conv2 = sext i16 %vecext1 to i32
	%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1			%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
	%vecext4 = extractelement <8 x i16> %a, i32 4			%vecext4 = extractelement <8 x i16> %a, i32 4
	%conv5 = sext i16 %vecext4 to i32			%conv5 = sext i16 %vecext4 to i32
	%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2			%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
	%vecext7 = extractelement <8 x i16> %a, i32 6			%vecext7 = extractelement <8 x i16> %a, i32 6
	%conv8 = sext i16 %vecext7 to i32			%conv8 = sext i16 %vecext7 to i32
	%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3			%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
	ret <4 x i32> %vecinit9			ret <4 x i32> %vecinit9
	}			}

	define <2 x i64> @vextsh2d(<8 x i16> %a) {			define <2 x i64> @vextsh2dLE(<8 x i16> %a) {
	; PWR9-LABEL: vextsh2d:			; CHECK-LE-LABEL: vextsh2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsh2d 2, 2			; CHECK-LE-NEXT: vextsh2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsh2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vperm 2, 2, 2, 3
				; CHECK-BE-NEXT: vextsh2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <8 x i16> %a, i32 0			%vecext = extractelement <8 x i16> %a, i32 0
	%conv = sext i16 %vecext to i64			%conv = sext i16 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <8 x i16> %a, i32 4			%vecext1 = extractelement <8 x i16> %a, i32 4
	%conv2 = sext i16 %vecext1 to i64			%conv2 = sext i16 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

	define <2 x i64> @vextsw2d(<4 x i32> %a) {			define <2 x i64> @vextsw2dLE(<4 x i32> %a) {
	; PWR9-LABEL: vextsw2d:			; CHECK-LE-LABEL: vextsw2dLE:
	; PWR9: # BB#0: # %entry			; CHECK-LE: # BB#0: # %entry
	; PWR9-NEXT: vextsw2d 2, 2			; CHECK-LE-NEXT: vextsw2d 2, 2
	; PWR9-NEXT: blr			; CHECK-LE-NEXT: blr
				; CHECK-BE-LABEL: vextsw2dLE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE: vmrgew
				; CHECK-BE-NEXT: vextsw2d 2, 2
				; CHECK-BE-NEXT: blr

	entry:			entry:
	%vecext = extractelement <4 x i32> %a, i32 0			%vecext = extractelement <4 x i32> %a, i32 0
	%conv = sext i32 %vecext to i64			%conv = sext i32 %vecext to i64
	%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0			%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
	%vecext1 = extractelement <4 x i32> %a, i32 2			%vecext1 = extractelement <4 x i32> %a, i32 2
	%conv2 = sext i32 %vecext1 to i64			%conv2 = sext i32 %vecext1 to i64
	%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1			%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
	ret <2 x i64> %vecinit3			ret <2 x i64> %vecinit3
	}			}

				define <4 x i32> @vextsb2wBE(<16 x i8> %a) {
				; CHECK-BE-LABEL: vextsb2wBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsb2w 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsb2wBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 13
				; CHECK-LE-NEXT: vextsb2w 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <16 x i8> %a, i32 3
				%conv = sext i8 %vecext to i32
				%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 7
				%conv2 = sext i8 %vecext1 to i32
				%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
				%vecext4 = extractelement <16 x i8> %a, i32 11
				%conv5 = sext i8 %vecext4 to i32
				%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
				%vecext7 = extractelement <16 x i8> %a, i32 15
				%conv8 = sext i8 %vecext7 to i32
				%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
				ret <4 x i32> %vecinit9
				}

				define <2 x i64> @vextsb2dBE(<16 x i8> %a) {
				; CHECK-BE-LABEL: vextsb2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsb2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsb2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 9
				; CHECK-LE-NEXT: vextsb2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <16 x i8> %a, i32 7
				%conv = sext i8 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 15
				%conv2 = sext i8 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <4 x i32> @vextsh2wBE(<8 x i16> %a) {
				; CHECK-BE-LABEL: vextsh2wBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsh2w 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsh2wBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 14
				; CHECK-LE-NEXT: vextsh2w 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <8 x i16> %a, i32 1
				%conv = sext i16 %vecext to i32
				%vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
				%vecext1 = extractelement <8 x i16> %a, i32 3
				%conv2 = sext i16 %vecext1 to i32
				%vecinit3 = insertelement <4 x i32> %vecinit, i32 %conv2, i32 1
				%vecext4 = extractelement <8 x i16> %a, i32 5
				%conv5 = sext i16 %vecext4 to i32
				%vecinit6 = insertelement <4 x i32> %vecinit3, i32 %conv5, i32 2
				%vecext7 = extractelement <8 x i16> %a, i32 7
				%conv8 = sext i16 %vecext7 to i32
				%vecinit9 = insertelement <4 x i32> %vecinit6, i32 %conv8, i32 3
				ret <4 x i32> %vecinit9
				}

				define <2 x i64> @vextsh2dBE(<8 x i16> %a) {
				; CHECK-BE-LABEL: vextsh2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsh2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsh2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 10
				; CHECK-LE-NEXT: vextsh2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <8 x i16> %a, i32 3
				%conv = sext i16 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <8 x i16> %a, i32 7
				%conv2 = sext i16 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <2 x i64> @vextsw2dBE(<4 x i32> %a) {
				; CHECK-BE-LABEL: vextsw2dBE:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NEXT: vextsw2d 2, 2
				; CHECK-BE-NEXT: blr
				; CHECK-LE-LABEL: vextsw2dBE:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NEXT: vsldoi 2, 2, 2, 12
				; CHECK-LE-NEXT: vextsw2d 2, 2
				; CHECK-LE-NEXT: blr
				entry:
				%vecext = extractelement <4 x i32> %a, i32 1
				%conv = sext i32 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <4 x i32> %a, i32 3
				%conv2 = sext i32 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <2 x i64> @vextDiffVectors(<4 x i32> %a, <4 x i32> %b) {
				; CHECK-LE-LABEL: vextDiffVectors:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NOT: vextsw2d

				; CHECK-BE-LABEL: vextDiffVectors:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NOT: vextsw2d
				entry:
				%vecext = extractelement <4 x i32> %a, i32 0
				%conv = sext i32 %vecext to i64
				%vecinit = insertelement <2 x i64> undef, i64 %conv, i32 0
				%vecext1 = extractelement <4 x i32> %b, i32 2
				%conv2 = sext i32 %vecext1 to i64
				%vecinit3 = insertelement <2 x i64> %vecinit, i64 %conv2, i32 1
				ret <2 x i64> %vecinit3
				}

				define <8 x i16> @testInvalidExtend(<16 x i8> %a) {
				entry:
				; CHECK-LE-LABEL: testInvalidExtend:
				; CHECK-LE: # BB#0: # %entry
				; CHECK-LE-NOT: vexts

				; CHECK-BE-LABEL: testInvalidExtend:
				; CHECK-BE: # BB#0: # %entry
				; CHECK-BE-NOT: vexts

				%vecext = extractelement <16 x i8> %a, i32 0
				%conv = sext i8 %vecext to i16
				%vecinit = insertelement <8 x i16> undef, i16 %conv, i32 0
				%vecext1 = extractelement <16 x i8> %a, i32 2
				%conv2 = sext i8 %vecext1 to i16
				%vecinit3 = insertelement <8 x i16> %vecinit, i16 %conv2, i32 1
				%vecext4 = extractelement <16 x i8> %a, i32 4
				%conv5 = sext i8 %vecext4 to i16
				%vecinit6 = insertelement <8 x i16> %vecinit3, i16 %conv5, i32 2
				%vecext7 = extractelement <16 x i8> %a, i32 6
				%conv8 = sext i8 %vecext7 to i16
				%vecinit9 = insertelement <8 x i16> %vecinit6, i16 %conv8, i32 3
				%vecext10 = extractelement <16 x i8> %a, i32 8
				%conv11 = sext i8 %vecext10 to i16
				%vecinit12 = insertelement <8 x i16> %vecinit9, i16 %conv11, i32 4
				%vecext13 = extractelement <16 x i8> %a, i32 10
				%conv14 = sext i8 %vecext13 to i16
				%vecinit15 = insertelement <8 x i16> %vecinit12, i16 %conv14, i32 5
				%vecext16 = extractelement <16 x i8> %a, i32 12
				%conv17 = sext i8 %vecext16 to i16
				%vecinit18 = insertelement <8 x i16> %vecinit15, i16 %conv17, i32 6
				%vecext19 = extractelement <16 x i8> %a, i32 14
				%conv20 = sext i8 %vecext19 to i16
				%vecinit21 = insertelement <8 x i16> %vecinit18, i16 %conv20, i32 7
				ret <8 x i16> %vecinit21
				}
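The `vsldoi` amounts in the CHECK-LE lines of the `*BE` tests (13, 9, 14, 10, 12) can be sanity-checked with a small model. The sketch below is not part of the patch. It assumes that `vsldoi` with both inputs equal acts as a left rotate by the given number of bytes in big-endian byte numbering, and that on little endian, byte element `i` of a vector sits at big-endian register byte `15 - i`. The helper name `source_element` and the `CASES` table are hypothetical.

```python
def source_element(j, in_bytes, out_bytes, shift):
    """Little-endian-numbered input element feeding output element j after the rotate.

    The extend instruction reads the low-order in_bytes of each out_bytes slot.
    """
    le_byte = j * out_bytes          # lowest LE byte read for output element j
    be_byte = 15 - le_byte           # same byte in big-endian register numbering
    src_be = (be_byte + shift) % 16  # vsldoi x, x, x, shift is a byte rotate
    src_le = 15 - src_be             # back to LE byte numbering
    return src_le // in_bytes


# test name -> (input bytes, output bytes, vsldoi amount, extract indices in the IR)
CASES = {
    "vextsb2wBE": (1, 4, 13, [3, 7, 11, 15]),
    "vextsb2dBE": (1, 8, 9, [7, 15]),
    "vextsh2wBE": (2, 4, 14, [1, 3, 5, 7]),
    "vextsh2dBE": (2, 8, 10, [3, 7]),
    "vextsw2dBE": (4, 8, 12, [1, 3]),
}


def rotate_matches(name):
    """True when the rotate brings the extracted elements to the positions the extend reads."""
    in_bytes, out_bytes, shift, expected = CASES[name]
    count = 16 // out_bytes
    got = [source_element(j, in_bytes, out_bytes, shift) for j in range(count)]
    return got == expected
```

In this model every listed amount equals `16 - (out_bytes - in_bytes)`, which is consistent with the expected assembly in the tests, and a shift of 0 reproduces the indices used by the `*LE` tests.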