This is an archive of the discontinued LLVM Phabricator instance.

Differential D33404

[PowerPC] Fix a performance bug for PPC::XXPERMDI.
ClosedPublic

Authored by jtony on May 22 2017, 4:34 AM.

Details

Reviewers

kbarton
nemanjai
sfertile
lei
syzaara
echristo
hfinkel
inouehrs

Commits

rG60c247de18c5: [PowerPC] Fix a performance bug for PPC::XXPERMDI.
rL304298: [PowerPC] Fix a performance bug for PPC::XXPERMDI.

Summary

Some VectorShuffle nodes in the SelectionDAG can be selected to the XXPERMDI instruction. This patch recognizes them and performs that selection to improve PPC performance.

Event Timeline

jtony created this revision. May 22 2017, 4:34 AM

nemanjai added inline comments. May 23 2017, 1:43 PM

lib/Target/PowerPC/PPCISelLowering.cpp
1703	Seems like a bit of code duplication with `isWordShuffleMask()`. Perhaps it would be more consistent to just implement something like `isNByteElemShuffleMask(unsigned Width)` which can be called for width 1, 2, 4, 8, 16 as needed.
1704	Nit: formatting. This is indented more than the rest of the function for some reason.
1705	I'm not a fan of loops with 2 iterations. It's probably clearer if this is just straight-line code.
1765	Some people really don't like boolean parameters :). I really don't think this function is needed - just do the computation inline. Especially when I look at the call sites - it looks like there are 3 and you've already checked the endianness on 2 of them.
1770	Nit: indentation.
1771	Is there a reason for this check (i.e. vs. an assert)? It seems we should be safe to assert this is the case.
1780	I think maybe an assert is in order here... Perhaps: `assert((M0 | M1 < 4) && "A mask element out of bounds?")`
1785	Isn't this just `(M0 | M1) < 2`? Namely, neither is greater than 1.
1793	This entire if statement makes my head hurt. At the very least, it should be documented. But I'm sure when you document it, it'll be obvious how it can be rewritten to be much simpler. Comment it with something along the lines of:
	// If element 0 of the result comes from the first input (LE) or second input (BE)
	// the inputs need to be swapped and elements adjusted accordingly.
	I've suggested simplifications for the LE conditions, but you should be able to simplify the BE conditions accordingly.
1794	If we ignore the fact that the second half of this can never be satisfied (M1 can't be 0 and 1), I assume this should be just: `if (M0 > 1 && M1 < 2)`
1796	Similarly, this one seems like it should just be: `if (M0 < 2 && M1 > 1)`
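
The simplified conditions suggested in these comments can be modeled outside of LLVM. The following is a minimal sketch, not the patch's code: `matchXXPermDI` and its `SecondIsUndef` parameter are hypothetical names, and `M0`/`M1` are assumed to be the doubleword indices (0-3) already extracted from the byte mask. The DM formulas are the ones used in the diff.

```cpp
#include <cassert>

// Hypothetical standalone model; M0 and M1 are the doubleword indices (0-3)
// that the shuffle selects for result elements 0 and 1.
bool matchXXPermDI(unsigned M0, unsigned M1, bool SecondIsUndef, bool IsLE,
                   unsigned &DM, bool &Swap) {
  assert((M0 | M1) < 4 && "A mask element out of bounds?");

  bool NeedSwap = false;
  if (SecondIsUndef) {
    // Only the first input is live, so both indices must be 0 or 1.
    if ((M0 | M1) >= 2)
      return false;
  } else {
    bool M0FromFirst = M0 < 2;
    bool M1FromFirst = M1 < 2;
    // XXPERMDI takes exactly one doubleword from each input.
    if (M0FromFirst == M1FromFirst)
      return false;
    // Swap if element 0 comes from the first input (LE) or second input (BE).
    NeedSwap = IsLE ? M0FromFirst : !M0FromFirst;
    if (NeedSwap) {
      M0 = (M0 + 2) % 4;
      M1 = (M1 + 2) % 4;
    }
  }

  // Outputs are written only on success.
  DM = IsLE ? (((~M1) & 1) << 1) + ((~M0) & 1) : (M0 << 1) + (M1 & 1);
  Swap = NeedSwap;
  return true;
}
```

For example, on little-endian a mask selecting doublewords (3, 1) needs no swap and yields DM = 0, while (1, 3) yields DM = 0 with the inputs swapped.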

kbarton added inline comments. May 25 2017, 11:23 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1705	I agree - this should be just straight line code, with a short comment explaining the logic.
1769	Please add a comment explaining the semantics, and under what conditions the parameters will be modified.
1804	This is a bit hard to follow, but I don't think Swap is modified on this path. Is that intentional? If so, it needs to be documented clearly in the comments above.
1816	Same comment for Swap.
lib/Target/PowerPC/PPCISelLowering.h
461	Please add a comment above to be consistent with all of the other declarations here.

jtony marked 16 inline comments as done. May 25 2017, 1:08 PM

jtony added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1703	Good suggestion. I have added the isNByteElemShuffleMask function you mentioned to replace the isWordShuffleMask and isDoubleWordShuffleMask functions, but only for widths of 2, 4, 8, and 16 bytes: there is no need to check for a 1-byte element shuffle mask, since that check is always true.
1705	This function has been refactored into isNByteElemShuffleMask, as mentioned by Nemanja above.
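
The width-generic check discussed here can be sketched without any LLVM types. This is an illustration under stated assumptions, not the patch's implementation: `isNByteElemMask` is a hypothetical name, and a `std::array<int, 16>` stands in for the values that `getMaskElt` would return on a v16i8 shuffle. Negative (undef) entries are simply rejected in this sketch.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the byte mask of a 16-byte vector shuffle.
using ByteMask = std::array<int, 16>;

// Returns true if the mask moves whole Width-byte elements: every element
// starts at a Width-aligned byte index and its bytes are consecutive.
bool isNByteElemMask(const ByteMask &Mask, unsigned Width) {
  assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
         "Unexpected element width.");
  for (unsigned i = 0; i < 16; i += Width) {
    int First = Mask[i];
    // The first byte of each element must be aligned to the element width.
    if (First < 0 || First % static_cast<int>(Width) != 0)
      return false;
    // The remaining bytes must follow the first one consecutively.
    for (unsigned j = 1; j < Width; ++j)
      if (Mask[i + j] != First + static_cast<int>(j))
        return false;
  }
  return true;
}
```

Because only the first byte of each element is remembered, this version needs no scratch buffer at all. A mask that exchanges the two doublewords passes for widths 2, 4, and 8 but fails for width 16.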

Address comments from Nemanja and Kit.

echristo added inline comments. May 25 2017, 5:57 PM

lib/Target/PowerPC/PPCISelLowering.cpp
1772	Since this is basically two functions concatenated on each other with a boolean parameter it seems reasonable to just remove the parameter and split it into two functions and dispatch in the caller. Or do it as a followup with the rest of them, but either way it'd be good to get the endianness stuff cleaned up.

nemanjai added inline comments. May 27 2017, 10:27 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1606	Is there a compelling reason to manually allocate memory for this over using efficient containers (something like `SmallVector` or similar)? I'd much rather avoid manual memory management of this sort if we can. And I don't see anything in this function to indicate we can't.

hfinkel added inline comments. May 27 2017, 12:23 PM

lib/Target/PowerPC/PPCISelLowering.cpp
1606	We should definitely be using a SmallVector here, if we need to keep track of the size easily. Otherwise, given that Width is never greater than 16, we can just use: unsigned MaskVal[16];

Address comments from Hal and Nemanja.

jtony marked 2 inline comments as done. May 28 2017, 12:56 PM

jtony added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1772	This can be done together with the same issue created for the xxsldwi patch.

jtony marked an inline comment as done. May 28 2017, 12:57 PM

In addition to the inline nit, I think it'd be more natural to use the v2i64 type for this PPCISD node (and the instruction) since the instruction inherently operates on doublewords. You should be able to just change the type you bitcast to in the C++ code as well as the type you match in the TblGen code.

Otherwise, LGTM.

lib/Target/PowerPC/PPCISelLowering.cpp
1764	A couple of nits that can just be addressed on the commit (no need for another revision):
	This appears to use doxygen-friendly comments with parameter tags, etc. I think doxygen will only use comments that start with three slashes (`///`).
	I find it confusing for the comment to jump into the implementation like this. A short sentence explaining what the function does, then some details as needed. For example:
	/// Can node \p N be lowered to an XXPERMDI instruction? If so, set \p Swap
	/// if the inputs to the instruction should be swapped and set \p DM to the
	/// value for the immediate.

This revision is now accepted and ready to land. May 28 2017, 9:13 PM

jtony marked an inline comment as done. May 29 2017, 7:21 PM

Closed by commit rL304298: [PowerPC] Fix a performance bug for PPC::XXPERMDI. (authored by jtony). May 31 2017, 6:10 AM

This revision was automatically updated to reflect the committed changes.

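Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelLowering.h	8 lines
lib/Target/PowerPC/PPCISelLowering.cpp	108 lines
lib/Target/PowerPC/PPCInstrInfo.td	5 lines
lib/Target/PowerPC/PPCInstrVSX.td	4 lines
test/CodeGen/PowerPC/vec_xxpermdi.ll	307 lines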

Diff 100300

lib/Target/PowerPC/PPCISelLowering.h

[84 lines not shown]	enum NodeType : unsigned {
/// XXINSERT - The PPC VSX insert instruction		/// XXINSERT - The PPC VSX insert instruction
///		///
XXINSERT,		XXINSERT,

/// VECSHL - The PPC VSX shift left instruction		/// VECSHL - The PPC VSX shift left instruction
///		///
VECSHL,		VECSHL,

		/// XXPERMDI - The PPC XXPERMDI instruction
		///
		XXPERMDI,

/// The CMPB instruction (takes two operands of i32 or i64).		/// The CMPB instruction (takes two operands of i32 or i64).
CMPB,		CMPB,

/// Hi/Lo - These represent the high and low 16-bit parts of a global		/// Hi/Lo - These represent the high and low 16-bit parts of a global
/// address respectively. These nodes have two operands, the first of		/// address respectively. These nodes have two operands, the first of
/// which must be a TargetGlobalAddress, and the second of which must be a		/// which must be a TargetGlobalAddress, and the second of which must be a
/// Constant. Selected naively, these turn into 'lis G+C' and 'li G+C',		/// Constant. Selected naively, these turn into 'lis G+C' and 'li G+C',
/// though these are usually folded into other nodes.		/// though these are usually folded into other nodes.
[348 lines not shown]	namespace PPC {
/// isVMRGEOShuffleMask - Return true if this is a shuffle mask suitable for		/// isVMRGEOShuffleMask - Return true if this is a shuffle mask suitable for
/// a VMRGEW or VMRGOW instruction		/// a VMRGEW or VMRGOW instruction
bool isVMRGEOShuffleMask(ShuffleVectorSDNode *N, bool CheckEven,		bool isVMRGEOShuffleMask(ShuffleVectorSDNode *N, bool CheckEven,
unsigned ShuffleKind, SelectionDAG &DAG);		unsigned ShuffleKind, SelectionDAG &DAG);
/// isXXSLDWIShuffleMask - Return true if this is a shuffle mask suitable		/// isXXSLDWIShuffleMask - Return true if this is a shuffle mask suitable
/// for a XXSLDWI instruction.		/// for a XXSLDWI instruction.
bool isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,		bool isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
bool &Swap, bool IsLE);		bool &Swap, bool IsLE);
		/// isXXPERMDIShuffleMask - Return true if this is a shuffle mask suitable
		/// for a XXPERMDI instruction.
		bool isXXPERMDIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
		bool &Swap, bool IsLE);

/// isVSLDOIShuffleMask - If this is a vsldoi shuffle mask, return the		/// isVSLDOIShuffleMask - If this is a vsldoi shuffle mask, return the
/// shift amount, otherwise return -1.		/// shift amount, otherwise return -1.
int isVSLDOIShuffleMask(SDNode *N, unsigned ShuffleKind,		int isVSLDOIShuffleMask(SDNode *N, unsigned ShuffleKind,
SelectionDAG &DAG);		SelectionDAG &DAG);

/// isSplatShuffleMask - Return true if the specified VECTOR_SHUFFLE operand		/// isSplatShuffleMask - Return true if the specified VECTOR_SHUFFLE operand
/// specifies a splat of a single element that is suitable for input to		/// specifies a splat of a single element that is suitable for input to
[603 lines not shown]

lib/Target/PowerPC/PPCISelLowering.cpp

[1,109 lines not shown]	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::FRE: return "PPCISD::FRE";		case PPCISD::FRE: return "PPCISD::FRE";
case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";		case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";
case PPCISD::STFIWX: return "PPCISD::STFIWX";		case PPCISD::STFIWX: return "PPCISD::STFIWX";
case PPCISD::VMADDFP: return "PPCISD::VMADDFP";		case PPCISD::VMADDFP: return "PPCISD::VMADDFP";
case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";		case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";
case PPCISD::VPERM: return "PPCISD::VPERM";		case PPCISD::VPERM: return "PPCISD::VPERM";
case PPCISD::XXSPLT: return "PPCISD::XXSPLT";		case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
case PPCISD::XXINSERT: return "PPCISD::XXINSERT";		case PPCISD::XXINSERT: return "PPCISD::XXINSERT";
		case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI";
case PPCISD::VECSHL: return "PPCISD::VECSHL";		case PPCISD::VECSHL: return "PPCISD::VECSHL";
case PPCISD::CMPB: return "PPCISD::CMPB";		case PPCISD::CMPB: return "PPCISD::CMPB";
case PPCISD::Hi: return "PPCISD::Hi";		case PPCISD::Hi: return "PPCISD::Hi";
case PPCISD::Lo: return "PPCISD::Lo";		case PPCISD::Lo: return "PPCISD::Lo";
case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";		case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";
case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";		case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";
case PPCISD::DYNAREAOFFSET: return "PPCISD::DYNAREAOFFSET";		case PPCISD::DYNAREAOFFSET: return "PPCISD::DYNAREAOFFSET";
case PPCISD::GlobalBaseReg: return "PPCISD::GlobalBaseReg";		case PPCISD::GlobalBaseReg: return "PPCISD::GlobalBaseReg";
[465 lines not shown]	for (unsigned i = EltSize, e = 16; i != e; i += EltSize) {
if (N->getMaskElt(i) < 0) continue;		if (N->getMaskElt(i) < 0) continue;
for (unsigned j = 0; j != EltSize; ++j)		for (unsigned j = 0; j != EltSize; ++j)
if (N->getMaskElt(i+j) != N->getMaskElt(j))		if (N->getMaskElt(i+j) != N->getMaskElt(j))
return false;		return false;
}		}
return true;		return true;
}		}

// Check that the mask is shuffling words		// Check that the mask is shuffling N byte elements.
static bool isWordShuffleMask(ShuffleVectorSDNode *N) {		static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width) {
for (unsigned i = 0; i < 4; ++i) {		assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
unsigned B0 = N->getMaskElt(i*4);		"Unexpected element width.");
unsigned B1 = N->getMaskElt(i*4+1);
unsigned B2 = N->getMaskElt(i*4+2);		unsigned NumOfElem = 16 / Width;
unsigned B3 = N->getMaskElt(i*4+3);		unsigned *MaskVal = new unsigned[Width];
if (B0 % 4)		for (unsigned i = 0; i < NumOfElem; ++i) {
		MaskVal[0] = N->getMaskElt(i * Width);
		if (MaskVal[0] % Width) {
		delete [] MaskVal;
return false;		return false;
if (B1 != B0+1 || B2 != B1+1 || B3 != B2+1)		}

		for (unsigned int j = 1; j < Width; ++j) {
		MaskVal[j] = N->getMaskElt(i * Width + j);
		if (MaskVal[j] != MaskVal[j-1] + 1) {
		delete [] MaskVal;
return false;		return false;
}		}
		}
		}

		delete [] MaskVal;
return true;		return true;
}		}

bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,		bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
unsigned &InsertAtByte, bool &Swap, bool IsLE) {		unsigned &InsertAtByte, bool &Swap, bool IsLE) {
if (!isWordShuffleMask(N))		if (!isNByteElemShuffleMask(N, 4))
return false;		return false;

// Now we look at mask elements 0,4,8,12		// Now we look at mask elements 0,4,8,12
unsigned M0 = N->getMaskElt(0) / 4;		unsigned M0 = N->getMaskElt(0) / 4;
unsigned M1 = N->getMaskElt(4) / 4;		unsigned M1 = N->getMaskElt(4) / 4;
unsigned M2 = N->getMaskElt(8) / 4;		unsigned M2 = N->getMaskElt(8) / 4;
unsigned M3 = N->getMaskElt(12) / 4;		unsigned M3 = N->getMaskElt(12) / 4;
unsigned LittleEndianShifts[] = { 2, 1, 0, 3 };		unsigned LittleEndianShifts[] = { 2, 1, 0, 3 };
[57 lines not shown]	if (M0 == 0 && M1 == 1 && M2 == 2 && M3 == XXINSERTWSrcElem) {
return true;		return true;
}		}
}		}

return false;		return false;
}		}

bool PPC::isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,		bool PPC::isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
bool &Swap, bool IsLE) {		bool &Swap, bool IsLE) {
assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");		assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");
// Ensure each byte index of the word is consecutive.		// Ensure each byte index of the word is consecutive.
if (!isWordShuffleMask(N))		if (!isNByteElemShuffleMask(N, 4))
return false;		return false;

// Now we look at mask elements 0,4,8,12, which are the beginning of words.		// Now we look at mask elements 0,4,8,12, which are the beginning of words.
unsigned M0 = N->getMaskElt(0) / 4;		unsigned M0 = N->getMaskElt(0) / 4;
unsigned M1 = N->getMaskElt(4) / 4;		unsigned M1 = N->getMaskElt(4) / 4;
unsigned M2 = N->getMaskElt(8) / 4;		unsigned M2 = N->getMaskElt(8) / 4;
unsigned M3 = N->getMaskElt(12) / 4;		unsigned M3 = N->getMaskElt(12) / 4;

[41 lines not shown]	if (M0 == 0 || M0 == 1 || M0 == 2 || M0 == 3) {
Swap = true;		Swap = true;
ShiftElts = M0 - 4;		ShiftElts = M0 - 4;
}		}

return true;		return true;
}		}
}		}

		// Set \p Swap to true only if the function return true AND element 0 of the
		// result comes from the first input (LE) or second input (BE), i.e., the inputs
		// need to be swapped and elements adjusted accordingly so that we can still
		// generate XXPERMDI instruction.
		// Set \p DM to the calculated result (0-3) only if the function return true.
		// \return true iff the given mask of shuffle node \p N is a XXPERMDI shuffle
		// mask.
		bool PPC::isXXPERMDIShuffleMask(ShuffleVectorSDNode *N, unsigned &DM,
		                                bool &Swap, bool IsLE) {
		  assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");

		  // Ensure each byte index of the double word is consecutive.
		  if (!isNByteElemShuffleMask(N, 8))
		    return false;

		  unsigned M0 = N->getMaskElt(0) / 8;
		  unsigned M1 = N->getMaskElt(8) / 8;
		  assert(((M0 | M1) < 4) && "A mask element out of bounds?");

		  // If both vector operands for the shuffle are the same vector, the mask will
		  // contain only elements from the first one and the second one will be undef.
		  if (N->getOperand(1).isUndef()) {
		    if ((M0 | M1) < 2) {
		      DM = IsLE ? (((~M1) & 1) << 1) + ((~M0) & 1) : (M0 << 1) + (M1 & 1);
		      Swap = false;
		      return true;
		    } else
		      return false;
		  }

		  if (IsLE) {
		    if (M0 > 1 && M1 < 2) {
		      Swap = false;
		    } else if (M0 < 2 && M1 > 1) {
		      M0 = (M0 + 2) % 4;
		      M1 = (M1 + 2) % 4;
		      Swap = true;
		    } else
		      return false;

		    // Note: if control flow comes here that means Swap is already set above
		    DM = (((~M1) & 1) << 1) + ((~M0) & 1);
		    return true;
		  } else { // BE
		    if (M0 < 2 && M1 > 1) {
		      Swap = false;
		    } else if (M0 > 1 && M1 < 2) {
		      M0 = (M0 + 2) % 4;
		      M1 = (M1 + 2) % 4;
		      Swap = true;
		    } else
		      return false;

		    // Note: if control flow comes here that means Swap is already set above
		    DM = (M0 << 1) + (M1 & 1);
		    return true;
		  }
		}


/// getVSPLTImmediate - Return the appropriate VSPLT* immediate to splat the		/// getVSPLTImmediate - Return the appropriate VSPLT* immediate to splat the
/// specified isSplatShuffleMask VECTOR_SHUFFLE mask.		/// specified isSplatShuffleMask VECTOR_SHUFFLE mask.
unsigned PPC::getVSPLTImmediate(SDNode *N, unsigned EltSize,		unsigned PPC::getVSPLTImmediate(SDNode *N, unsigned EltSize,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(N);		ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(N);
assert(isSplatShuffleMask(SVOp, EltSize));		assert(isSplatShuffleMask(SVOp, EltSize));
if (DAG.getDataLayout().isLittleEndian())		if (DAG.getDataLayout().isLittleEndian())
[5,998 lines not shown]	if (Subtarget.hasVSX() &&
SDValue Conv2 =		SDValue Conv2 =
DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2.isUndef() ? V1 : V2);		DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2.isUndef() ? V1 : V2);

SDValue Shl = DAG.getNode(PPCISD::VECSHL, dl, MVT::v4i32, Conv1, Conv2,		SDValue Shl = DAG.getNode(PPCISD::VECSHL, dl, MVT::v4i32, Conv1, Conv2,
DAG.getConstant(ShiftElts, dl, MVT::i32));		DAG.getConstant(ShiftElts, dl, MVT::i32));
return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Shl);		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Shl);
}		}

		if (Subtarget.hasVSX() &&
		PPC::isXXPERMDIShuffleMask(SVOp, ShiftElts, Swap, isLittleEndian)) {
		if (Swap)
		std::swap(V1, V2);
		SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
		SDValue Conv2 =
		DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2.isUndef() ? V1 : V2);

		SDValue PermDI = DAG.getNode(PPCISD::XXPERMDI, dl, MVT::v4i32, Conv1, Conv2,
		DAG.getConstant(ShiftElts, dl, MVT::i32));
		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, PermDI);
		}

if (Subtarget.hasVSX()) {		if (Subtarget.hasVSX()) {
if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {		if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);		int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);

// If the source for the shuffle is a scalar_to_vector that came from a		// If the source for the shuffle is a scalar_to_vector that came from a
// 32-bit load, it will have used LXVWSX so we don't need to splat again.		// 32-bit load, it will have used LXVWSX so we don't need to splat again.
if (Subtarget.hasP9Vector() &&		if (Subtarget.hasP9Vector() &&
((isLittleEndian && SplatIdx == 3) ||		((isLittleEndian && SplatIdx == 3) ||
[5,371 lines not shown]

lib/Target/PowerPC/PPCInstrInfo.td

[47 lines not shown]
def SDT_PPCVecShift : SDTypeProfile<1, 3, [ SDTCisVec<0>,		def SDT_PPCVecShift : SDTypeProfile<1, 3, [ SDTCisVec<0>,
SDTCisVec<1>, SDTCisVec<2>, SDTCisPtrTy<3>		SDTCisVec<1>, SDTCisVec<2>, SDTCisPtrTy<3>
]>;		]>;

def SDT_PPCVecInsert : SDTypeProfile<1, 3, [ SDTCisVec<0>,		def SDT_PPCVecInsert : SDTypeProfile<1, 3, [ SDTCisVec<0>,
SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>		SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
]>;		]>;

		def SDT_PPCxxpermdi: SDTypeProfile<1, 3, [ SDTCisVec<0>,
		SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
		]>;

def SDT_PPCvcmp : SDTypeProfile<1, 3, [		def SDT_PPCvcmp : SDTypeProfile<1, 3, [
SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisVT<3, i32>		SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisVT<3, i32>
]>;		]>;

def SDT_PPCcondbr : SDTypeProfile<0, 3, [		def SDT_PPCcondbr : SDTypeProfile<0, 3, [
SDTCisVT<0, i32>, SDTCisVT<2, OtherVT>		SDTCisVT<0, i32>, SDTCisVT<2, OtherVT>
]>;		]>;

[101 lines not shown]	def PPCaddiTlsldLAddr : SDNode<"PPCISD::ADDI_TLSLD_L_ADDR",
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,
SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;		SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;
def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;		def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;
def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;		def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;

def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;		def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;
def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;		def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;
def PPCxxinsert : SDNode<"PPCISD::XXINSERT", SDT_PPCVecInsert, []>;		def PPCxxinsert : SDNode<"PPCISD::XXINSERT", SDT_PPCVecInsert, []>;
		def PPCxxpermdi : SDNode<"PPCISD::XXPERMDI", SDT_PPCxxpermdi, []>;
def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;		def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;

def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;		def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;
def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;		def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;
def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;		def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;
def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;		def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;

def PPCqbflt : SDNode<"PPCISD::QBFLT", SDT_PPCqbflt, []>;		def PPCqbflt : SDNode<"PPCISD::QBFLT", SDT_PPCqbflt, []>;
[4,254 lines not shown]

lib/Target/PowerPC/PPCInstrVSX.td

[837 lines not shown]	def XXMRGHW : XX3Form<60, 18,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxmrghw $XT, $XA, $XB", IIC_VecPerm, []>;		"xxmrghw $XT, $XA, $XB", IIC_VecPerm, []>;
def XXMRGLW : XX3Form<60, 50,		def XXMRGLW : XX3Form<60, 50,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxmrglw $XT, $XA, $XB", IIC_VecPerm, []>;		"xxmrglw $XT, $XA, $XB", IIC_VecPerm, []>;

def XXPERMDI : XX3Form_2<60, 10,		def XXPERMDI : XX3Form_2<60, 10,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, u2imm:$DM),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, u2imm:$DM),
"xxpermdi $XT, $XA, $XB, $DM", IIC_VecPerm, []>;		"xxpermdi $XT, $XA, $XB, $DM", IIC_VecPerm,
		[(set v4i32:$XT, (PPCxxpermdi v4i32:$XA, v4i32:$XB,
		imm32SExt16:$DM))]>;
let isCodeGenOnly = 1 in		let isCodeGenOnly = 1 in
def XXPERMDIs : XX3Form_2s<60, 10, (outs vsrc:$XT), (ins vsfrc:$XA, u2imm:$DM),		def XXPERMDIs : XX3Form_2s<60, 10, (outs vsrc:$XT), (ins vsfrc:$XA, u2imm:$DM),
"xxpermdi $XT, $XA, $XA, $DM", IIC_VecPerm, []>;		"xxpermdi $XT, $XA, $XA, $DM", IIC_VecPerm, []>;
def XXSEL : XX4Form<60, 3,		def XXSEL : XX4Form<60, 3,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, vsrc:$XC),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, vsrc:$XC),
"xxsel $XT, $XA, $XB, $XC", IIC_VecPerm, []>;		"xxsel $XT, $XA, $XB, $XC", IIC_VecPerm, []>;

def XXSLDWI : XX3Form_2<60, 2,		def XXSLDWI : XX3Form_2<60, 2,
[2,104 lines not shown]

test/CodeGen/PowerPC/vec_xxpermdi.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 < %s | \
				; RUN: FileCheck %s -check-prefix=CHECK-LE
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 < %s | \
				; RUN: FileCheck %s -check-prefix=CHECK-BE

				; Possible LE ShuffleVector masks (Case 1):
				; ShuffleVector((vector double)a, (vector double)b, 3, 1)
				; ShuffleVector((vector double)a, (vector double)b, 2, 1)
				; ShuffleVector((vector double)a, (vector double)b, 3, 0)
				; ShuffleVector((vector double)a, (vector double)b, 2, 0)
				; which map, in the same order, to:
				; xxpermdi a, b, 0
				; xxpermdi a, b, 1
				; xxpermdi a, b, 2
				; xxpermdi a, b, 3
				; Possible LE Swap ShuffleVector masks (Case 2):
				; ShuffleVector((vector double)a, (vector double)b, 1, 3)
				; ShuffleVector((vector double)a, (vector double)b, 0, 3)
				; ShuffleVector((vector double)a, (vector double)b, 1, 2)
				; ShuffleVector((vector double)a, (vector double)b, 0, 2)
				; which map, in the same order, to:
				; xxpermdi b, a, 0
				; xxpermdi b, a, 1
				; xxpermdi b, a, 2
				; xxpermdi b, a, 3
				; Possible LE ShuffleVector masks when b is undef, i.e. a == b (Case 3):
				; ShuffleVector((vector double)a, (vector double)a, 1, 1)
				; ShuffleVector((vector double)a, (vector double)a, 0, 1)
				; ShuffleVector((vector double)a, (vector double)a, 1, 0)
				; ShuffleVector((vector double)a, (vector double)a, 0, 0)
				; which map, in the same order, to:
				; xxpermdi a, a, 0
				; xxpermdi a, a, 1
				; xxpermdi a, a, 2
				; xxpermdi a, a, 3

				; Possible BE ShuffleVector masks (Case 4):
				; ShuffleVector((vector double)a, (vector double)b, 0, 2)
				; ShuffleVector((vector double)a, (vector double)b, 0, 3)
				; ShuffleVector((vector double)a, (vector double)b, 1, 2)
				; ShuffleVector((vector double)a, (vector double)b, 1, 3)
				; which map, in the same order, to:
				; xxpermdi a, b, 0
				; xxpermdi a, b, 1
				; xxpermdi a, b, 2
				; xxpermdi a, b, 3
				; Possible BE Swap ShuffleVector masks (Case 5):
				; ShuffleVector((vector double)a, (vector double)b, 2, 0)
				; ShuffleVector((vector double)a, (vector double)b, 2, 1)
				; ShuffleVector((vector double)a, (vector double)b, 3, 0)
				; ShuffleVector((vector double)a, (vector double)b, 3, 1)
				; which map, in the same order, to:
				; xxpermdi b, a, 0
				; xxpermdi b, a, 1
				; xxpermdi b, a, 2
				; xxpermdi b, a, 3
				; Possible BE ShuffleVector masks when b is undef, i.e. a == b (Case 6):
				; ShuffleVector((vector double)a, (vector double)a, 0, 0)
				; ShuffleVector((vector double)a, (vector double)a, 0, 1)
				; ShuffleVector((vector double)a, (vector double)a, 1, 0)
				; ShuffleVector((vector double)a, (vector double)a, 1, 1)
				; which map, in the same order, to:
				; xxpermdi a, a, 0
				; xxpermdi a, a, 1
				; xxpermdi a, a, 2
				; xxpermdi a, a, 3

				define <2 x double> @test_le_vec_xxpermdi_v2f64_v2f64_0(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 3, i32 1>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_v2f64_0
				; CHECK-LE: xxmrghd 34, 34, 35
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_v2f64_1(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 2, i32 1>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_v2f64_1
				; CHECK-LE: xxpermdi 34, 34, 35, 1
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_v2f64_2(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 3, i32 0>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_v2f64_2
				; CHECK-LE: xxpermdi 34, 34, 35, 2
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_v2f64_3(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 2, i32 0>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_v2f64_3
				; CHECK-LE: xxmrgld 34, 34, 35
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_swap_vec_xxpermdi_v2f64_v2f64_0(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 1, i32 3>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v2f64_v2f64_0
				; CHECK-LE: xxmrghd 34, 35, 34
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_swap_vec_xxpermdi_v2f64_v2f64_1(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 0, i32 3>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v2f64_v2f64_1
				; CHECK-LE: xxpermdi 34, 35, 34, 1
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_swap_vec_xxpermdi_v2f64_v2f64_2(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 1, i32 2>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v2f64_v2f64_2
				; CHECK-LE: xxpermdi 34, 35, 34, 2
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_swap_vec_xxpermdi_v2f64_v2f64_3(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 0, i32 2>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v2f64_v2f64_3
				; CHECK-LE: xxmrgld 34, 35, 34
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_undef_0(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_undef_0
				; CHECK-LE: xxspltd 34, 34, 0
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_undef_1(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 0, i32 1>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_undef_1
				; CHECK-LE: blr
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_undef_2(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 1, i32 0>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_undef_2
				; CHECK-LE: xxswapd 34, 34
				}

				define <2 x double> @test_le_vec_xxpermdi_v2f64_undef_3(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 0, i32 0>
				ret <2 x double> %0
				; CHECK-LE-LABEL: @test_le_vec_xxpermdi_v2f64_undef_3
				; CHECK-LE: xxspltd 34, 34, 1
				; CHECK-LE: blr
				}

				; Start testing BE
				define <2 x double> @test_be_vec_xxpermdi_v2f64_v2f64_0(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 0, i32 2>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_v2f64_0
				; CHECK-BE: xxmrghd 34, 34, 35
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_v2f64_1(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 0, i32 3>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_v2f64_1
				; CHECK-BE: xxpermdi 34, 34, 35, 1
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_v2f64_2(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 1, i32 2>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_v2f64_2
				; CHECK-BE: xxpermdi 34, 34, 35, 2
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_v2f64_3(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 1, i32 3>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_v2f64_3
				; CHECK-BE: xxmrgld 34, 34, 35
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_swap_vec_xxpermdi_v2f64_v2f64_0(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 2, i32 0>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_swap_vec_xxpermdi_v2f64_v2f64_0
				; CHECK-BE: xxmrghd 34, 35, 34
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_swap_vec_xxpermdi_v2f64_v2f64_1(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 2, i32 1>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_swap_vec_xxpermdi_v2f64_v2f64_1
				; CHECK-BE: xxpermdi 34, 35, 34, 1
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_swap_vec_xxpermdi_v2f64_v2f64_2(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 3, i32 0>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_swap_vec_xxpermdi_v2f64_v2f64_2
				; CHECK-BE: xxpermdi 34, 35, 34, 2
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_swap_vec_xxpermdi_v2f64_v2f64_3(<2 x double> %VA, <2 x double> %VB) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> %VB,<2 x i32> <i32 3, i32 1>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_swap_vec_xxpermdi_v2f64_v2f64_3
				; CHECK-BE: xxmrgld 34, 35, 34
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_undef_0(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 0, i32 0>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_undef_0
				; CHECK-BE: xxspltd 34, 34, 0
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_undef_1(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 0, i32 1>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_undef_1
				; CHECK-BE: blr
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_undef_2(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 1, i32 0>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_undef_2
				; CHECK-BE: xxswapd 34, 34
				}

				define <2 x double> @test_be_vec_xxpermdi_v2f64_undef_3(<2 x double> %VA) {
				entry:
				%0 = shufflevector <2 x double> %VA, <2 x double> undef, <2 x i32> <i32 1, i32 1>
				ret <2 x double> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v2f64_undef_3
				; CHECK-BE: xxspltd 34, 34, 1
				; CHECK-BE: blr
				}

				; More test cases to test different types of vector inputs
				define <16 x i8> @test_be_vec_xxpermdi_v16i8_v16i8(<16 x i8> %VA, <16 x i8> %VB) {
				entry:
				%0 = shufflevector <16 x i8> %VA, <16 x i8> %VB,<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
				ret <16 x i8> %0
				; CHECK-BE-LABEL: @test_be_vec_xxpermdi_v16i8_v16i8
				; CHECK-BE: xxpermdi 34, 34, 35, 1
				; CHECK-BE: blr
				}

				define <8 x i16> @test_le_swap_vec_xxpermdi_v8i16_v8i16(<8 x i16> %VA, <8 x i16> %VB) {
				entry:
				%0 = shufflevector <8 x i16> %VA, <8 x i16> %VB,<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
				ret <8 x i16> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v8i16_v8i16
				; CHECK-LE: xxpermdi 34, 35, 34, 1
				; CHECK-LE: blr
				}

				define <4 x i32> @test_le_swap_vec_xxpermdi_v4i32_v4i32(<4 x i32> %VA, <4 x i32> %VB) {
				entry:
				%0 = shufflevector <4 x i32> %VA, <4 x i32> %VB,<4 x i32> <i32 0, i32 1, i32 6, i32 7>
				ret <4 x i32> %0
				; CHECK-LE-LABEL: @test_le_swap_vec_xxpermdi_v4i32_v4i32
				; CHECK-LE: xxpermdi 34, 35, 34, 1
				; CHECK-LE: blr
				}
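
The last three tests use `<16 x i8>`, `<8 x i16>`, and `<4 x i32>` shuffles, which can only become a single `xxpermdi` when each half of the mask selects one whole, aligned doubleword. The sketch below shows one way to reduce an element-level mask to a two-entry doubleword mask. It is an illustration in the spirit of the `isNByteElemShuffleMask(unsigned Width)` suggestion from the review, not the patch's code; the function name and signature are assumptions, and undef mask elements are simply rejected.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reduce a shuffle mask over N-element vectors to a
// two-entry doubleword mask. Returns false unless each half of the mask
// selects one whole, aligned doubleword of concat(a, b).
bool reduceToDoublewordMask(const std::vector<int> &Mask, unsigned DW[2]) {
  const std::size_t N = Mask.size();
  if (N < 2 || N % 2 != 0)
    return false;
  const std::size_t PerDW = N / 2; // elements per doubleword
  for (std::size_t Half = 0; Half < 2; ++Half) {
    const int First = Mask[Half * PerDW];
    // The group must start on a doubleword boundary inside concat(a, b).
    if (First < 0 || First % static_cast<int>(PerDW) != 0 ||
        First >= static_cast<int>(2 * N))
      return false;
    // The remaining elements of the group must be consecutive.
    for (std::size_t I = 1; I < PerDW; ++I)
      if (Mask[Half * PerDW + I] != First + static_cast<int>(I))
        return false;
    DW[Half] = static_cast<unsigned>(First) / static_cast<unsigned>(PerDW);
  }
  return true;
}
```

For the `<4 x i32>` mask `<0, 1, 6, 7>` this yields the doubleword mask `<0, 3>`, which is the second entry of the LE swap case in the header comment and corresponds to the checked `xxpermdi 34, 35, 34, 1`.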