This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Match vec_revb builtins to P9 instructions.
Closed, Public

Authored by jtony on May 30 2017, 1:27 PM.

Details

Reviewers

nemanjai
kbarton
sfertile
syzaara
inouehrs
stefanp
lei
hfinkel
echristo

Commits

rG1a8eec141ac2: [PowerPC] Match vec_revb builtins to P9 instructions.
rL305214: [PowerPC] Match vec_revb builtins to P9 instructions.

Summary

Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.

This patch has been functionally tested on a Power9 machine.
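
To make the shuffle masks in question concrete, here is a minimal standalone sketch of how the 16-entry byte mask for each element width could be generated. The helper name `byteReverseMask` is hypothetical and not part of the patch; it only assumes the usual convention that result byte `i` is taken from source byte `Mask[i]`.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not part of the patch): build the 16-entry byte
// shuffle mask that reverses the bytes inside every Width-byte element.
std::array<int, 16> byteReverseMask(unsigned Width) {
  assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
         "Unexpected element width.");
  std::array<int, 16> Mask{};
  for (unsigned i = 0; i < 16; ++i) {
    unsigned ElemStart = (i / Width) * Width; // first byte of this element
    unsigned Offset = i % Width;              // position inside the element
    Mask[i] = static_cast<int>(ElemStart + (Width - 1 - Offset));
  }
  return Mask;
}
```

With `Width == 4` this produces 3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12, which is the mask used by the `testXXBRW` case in the test file of this revision.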

Diff Detail

Repository: rL LLVM

Event Timeline

jtony created this revision. May 30 2017, 1:27 PM

I am debating internally about suggesting a potential solution for this, but this implementation essentially misses an entire set of complementary shuffle masks. We have a number of instructions that do element-wise reordering and we now have these that do per-element byte reversal. Combining the capabilities covers a lot more shuffles - I am just not positive that these occur enough to warrant the effort.

Here's what I mean:

We have an instruction that will do a "rotate-left-by-word" operation on a vector (and a way to emit that instruction)
We now have a "reverse-bytes-within-word-elements" operation
We don't have a "reverse-bytes-within-each-word-and-rotate-left-by-word", which we can simply do with a 2 instruction sequence now

And of course, the same goes for all other masks we lower to a single instruction. It might be useful for each of them to detect byte-reversal as well. It is non-trivial work, but doesn't sound fundamentally all that hard. Perhaps we should re-design our handling of shuffles at some point and have a robust way to determine what we can lower to a one or two instruction sequence on any Subtarget.
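
As a rough illustration of the combined masks described above, the sketch below composes a "reverse bytes within each word" mask with a "rotate left by N words" mask and checks whether a given mask is such a combination. All names here are hypothetical; the code works on plain arrays rather than `ShuffleVectorSDNode`, assumes a single-input shuffle with result byte `i` taken from source byte `Mask[i]`, and ignores endianness.

```cpp
#include <array>
#include <cassert>

using Mask16 = std::array<int, 16>;

// Reverse the bytes within each 4-byte word.
Mask16 reverseBytesInWords() {
  Mask16 M{};
  for (int i = 0; i < 16; ++i)
    M[i] = (i / 4) * 4 + (3 - i % 4);
  return M;
}

// Rotate the vector left by Words 4-byte words (single-input form).
Mask16 rotateLeftByWords(int Words) {
  Mask16 M{};
  for (int i = 0; i < 16; ++i)
    M[i] = (i + 4 * Words) % 16;
  return M;
}

// Mask equivalent to applying First and then Second:
// Tmp[i] = In[First[i]], Out[i] = Tmp[Second[i]] = In[First[Second[i]]].
Mask16 compose(const Mask16 &First, const Mask16 &Second) {
  Mask16 M{};
  for (int i = 0; i < 16; ++i)
    M[i] = First[Second[i]];
  return M;
}

// Return the rotation amount (0..3) if Mask equals "reverse bytes in each
// word, then rotate left by that many words"; return -1 otherwise.
int matchReverseThenRotate(const Mask16 &Mask) {
  for (int Words = 0; Words < 4; ++Words)
    if (compose(reverseBytesInWords(), rotateLeftByWords(Words)) == Mask)
      return Words;
  return -1;
}
```

A matcher of this shape would let a lowering routine emit the two-instruction sequence when no single instruction covers the mask; whether such masks occur often enough to justify it is the open question raised in the comment.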

lib/Target/PowerPC/PPCISelLowering.cpp
1598 ↗	(On Diff #100756)	This comment is no longer adequate. There is now another parameter which appears to be the "direction and stride". Please elaborate in a comment what this function does and how to use it. Also, if values other than 1/-1 don't make sense for the new parameter, you can add an assert.
1768 ↗	(On Diff #100756)	Doesn't it just suffice at this point to ensure that each element of the shuffle mask has a value less than 16?
test/CodeGen/PowerPC/vec_revb.ll
1 ↗	(On Diff #100756)	If you're breaking up the RUN lines, keep them within 80 columns.

I agree with Nemanja's comment. Since we have so many special permutation patterns (e.g. byte reverse, rotate, shift, merge, splat, pack, unpack, etc.), we need a robust framework to optimize them. Maybe we can create a table that maps a shuffle pattern to a specific instruction.

In complex cases that map onto a couple of instructions, there is a trade-off between consuming permutation pipeline resources and consuming a vector register; e.g. "reverse-bytes-within-each-word-and-rotate-left-by-word" can be executed by one permute instruction that uses one additional vector register for the shuffle pattern. In hot loops, the one-permute approach may be a better choice than the two-instruction approach, since we may be able to prepare the shuffle pattern outside the loop (if we have enough vector registers).
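
A table-driven lookup of the kind suggested here might look roughly like the following. This is only a sketch under stated assumptions: `ShufflePattern`, `byteReverseMask` and `lookupSingleInstruction` are hypothetical names, the table holds only the four byte-reverse mnemonics from this patch, and the pipeline-versus-register trade-off mentioned above is not modeled.

```cpp
#include <array>
#include <cassert>
#include <string>

using Mask16 = std::array<int, 16>;

// Hypothetical helper: mask that reverses bytes within each Width-byte element.
Mask16 byteReverseMask(int Width) {
  assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
         "Unexpected element width.");
  Mask16 M{};
  for (int i = 0; i < 16; ++i)
    M[i] = (i / Width) * Width + (Width - 1 - i % Width);
  return M;
}

// One table row: an exact byte mask and the instruction that implements it.
struct ShufflePattern {
  Mask16 Mask;
  std::string Mnemonic;
};

// Return the mnemonic for an exactly matching mask, or an empty string.
std::string lookupSingleInstruction(const Mask16 &Mask) {
  static const ShufflePattern Table[] = {
      {byteReverseMask(2), "xxbrh"},
      {byteReverseMask(4), "xxbrw"},
      {byteReverseMask(8), "xxbrd"},
      {byteReverseMask(16), "xxbrq"},
  };
  for (const ShufflePattern &P : Table)
    if (P.Mask == Mask)
      return P.Mnemonic;
  return std::string();
}
```

A real table would also need to handle undef mask entries, two-input shuffles and endianness, none of which this sketch attempts.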

In D33690#768675, @inouehrs wrote:

I agree with Nemanja's comment. Since we have so many special permutation patterns (e.g. byte reverse, rotate, shift, merge, splat, pack, unpack, etc.), we need a robust framework to optimize them. Maybe we can create a table that maps a shuffle pattern to a specific instruction.

In complex cases that map onto a couple of instructions, there is a trade-off between consuming permutation pipeline resources and consuming a vector register; e.g. "reverse-bytes-within-each-word-and-rotate-left-by-word" can be executed by one permute instruction that uses one additional vector register for the shuffle pattern. In hot loops, the one-permute approach may be a better choice than the two-instruction approach, since we may be able to prepare the shuffle pattern outside the loop (if we have enough vector registers).

Of course, we have the "Perfect Shuffle" solution that works on BE and for v4i32 only. We could extend it to work on LE and to support the new instructions. However, I'm not convinced that's the best (or fastest) way forward.

jtony marked 3 inline comments as done. May 31 2017, 12:53 PM

jtony added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1768 ↗	(On Diff #100756)	Just checking that each value is less than 16 is not sufficient here. We need to make sure the starting index of each N-byte element is i + Width - 1. For example, for XXBRW, I expect a mask like this: shuffleVector(a, undef, 3,2,1,0,7,6,5,4,11,10,9,8,15,14,13,12). We check getElt(0)=3, getElt(4)=7, getElt(8)=11, getElt(12)=15, i.e. the mask is rejected if N->getMaskElt(i) != i + Width - 1, where Width == 4.
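
The point about the starting index can be seen in a standalone approximation of the check. The function below is a sketch, not the patch's code: `isByteReverseMask` is a hypothetical name, it operates on a plain array instead of a `ShuffleVectorSDNode`, and it folds the "consecutive decreasing" test and the "starts at i + Width - 1" test into one loop.

```cpp
#include <array>
#include <cassert>

using Mask16 = std::array<int, 16>;

// Standalone approximation of the byte-reverse mask check.
bool isByteReverseMask(const Mask16 &Mask, int Width) {
  assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
         "Unexpected element width.");
  for (int i = 0; i < 16; i += Width) {
    // Each element must start at the index of its own last byte...
    if (Mask[i] != i + Width - 1)
      return false;
    // ...and then count down by one per byte.
    for (int j = 1; j < Width; ++j)
      if (Mask[i + j] != Mask[i + j - 1] - 1)
        return false;
  }
  return true;
}
```

For example, the mask 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8 has every entry below 16 and every 4-byte group consecutive and decreasing, yet it is a double-word reversal rather than a word reversal; only the starting-index test tells the two apart.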

Address comments from Nemanja.

In D33690#768479, @nemanjai wrote:

I am debating internally about suggesting a potential solution for this, but this implementation essentially misses an entire set of complementary shuffle masks. We have a number of instructions that do element-wise reordering and we now have these that do per-element byte reversal. Combining the capabilities covers a lot more shuffles - I am just not positive that these occur enough to warrant the effort.

Here's what I mean:

We have an instruction that will do a "rotate-left-by-word" operation on a vector (and a way to emit that instruction)

We now have a "reverse-bytes-within-word-elements" operation

We don't have a "reverse-bytes-within-each-word-and-rotate-left-by-word", which we can simply do with a 2 instruction sequence now

And of course, the same goes for all other masks we lower to a single instruction. It might be useful for each of them to detect byte-reversal as well. It is non-trivial work, but doesn't sound fundamentally all that hard. Perhaps we should re-design our handling of shuffles at some point and have a robust way to determine what we can lower to a one or two instruction sequence on any Subtarget.

I had a talk with Nemanja; this work should be done as a separate issue.

jtony edited the summary of this revision. Jun 2 2017, 7:36 AM

kbarton added inline comments. Jun 5 2017, 10:54 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1602 ↗	(On Diff #100958)	Please remove \brief. We no longer need to use \brief now that autobrief has been enabled.
1604 ↗	(On Diff #100958)	incremental/decremental -> increasing/decreasing
1609 ↗	(On Diff #100958)	incremental/decremental -> increasing/decreasing
1781 ↗	(On Diff #100958)	No braces required here.
7909 ↗	(On Diff #100958)	I'm probably missing something basic here, but why are we always converting the return to v16i8?
test/CodeGen/PowerPC/vec_revb.ll
2 ↗	(On Diff #100958)	The patterns are the same for both BE and LE, therefore you don't need separate CHECK-BE and CHECK-LE labels.

Address Kit's comments.

jtony added inline comments. Jun 8 2017, 4:41 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7909 ↗	(On Diff #100958)	Based on my understanding, after legalization the only legal vector type for vector_shuffle is v16i8, and the return value of vector_shuffle is also v16i8. If we want to replace vector_shuffle with a PPCISD::XXREVERSE node, we need to keep the original return type; otherwise it will cause an "LLVM ERROR: Cannot select" error. The debug output makes this clearer. If I remove the bitcast to v16i8, for XXBRQ we would do the following (note that the return type changes after DAG combine):

Legalizing: t5: v16i8 = vector_shuffle<15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0> t3, undef:v16i8
...
replacing: t5: v16i8 = vector_shuffle<15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0> t3, undef:v16i8
with: t11: v1i128 = PPCISD::XXREVERSE t2

This will eventually cause:

LLVM ERROR: Cannot select: t6: v1i128 = bitcast t11
t11: v1i128 = PPCISD::XXREVERSE t2
t2: v1i128,ch = CopyFromReg t0, Register:v1i128 %vreg0
t1: v1i128 = Register %vreg0
In function: testXXBRQ

Therefore, we always need to cast the return value to v16i8. Please correct me if I have misunderstood anything.

LGTM

This revision is now accepted and ready to land. Jun 12 2017, 11:04 AM

Closed by commit rL305214: [PowerPC] Match vec_revb builtins to P9 instructions. (authored by jtony). Jun 12 2017, 11:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/PowerPC/	21 lines, 76 lines, 5 lines, 10 lines
llvm/trunk/test/CodeGen/PowerPC/vec_revb.ll	54 lines

Diff 102208

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

@@ 80 lines not shown @@ enum NodeType : unsigned {
 /// XXSPLT - The PPC VSX splat instructions
 ///
 XXSPLT,

 /// XXINSERT - The PPC VSX insert instruction
 ///
 XXINSERT,

+/// XXREVERSE - The PPC VSX reverse instruction
+///
+XXREVERSE,
+
 /// VECSHL - The PPC VSX shift left instruction
 ///
 VECSHL,

 /// XXPERMDI - The PPC XXPERMDI instruction
 ///
 XXPERMDI,

@@ 356 lines not shown @@ namespace PPC {
 /// isVMRGEOShuffleMask - Return true if this is a shuffle mask suitable for
 /// a VMRGEW or VMRGOW instruction
 bool isVMRGEOShuffleMask(ShuffleVectorSDNode *N, bool CheckEven,
                          unsigned ShuffleKind, SelectionDAG &DAG);
 /// isXXSLDWIShuffleMask - Return true if this is a shuffle mask suitable
 /// for a XXSLDWI instruction.
 bool isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                           bool &Swap, bool IsLE);

+/// isXXBRHShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRH instruction.
+bool isXXBRHShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRWShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRW instruction.
+bool isXXBRWShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRDShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRD instruction.
+bool isXXBRDShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRQShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRQ instruction.
+bool isXXBRQShuffleMask(ShuffleVectorSDNode *N);
+
 /// isXXPERMDIShuffleMask - Return true if this is a shuffle mask suitable
 /// for a XXPERMDI instruction.
 bool isXXPERMDIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                            bool &Swap, bool IsLE);

 /// isVSLDOIShuffleMask - If this is a vsldoi shuffle mask, return the
 /// shift amount, otherwise return -1.
 int isVSLDOIShuffleMask(SDNode *N, unsigned ShuffleKind,
@@ 608 lines not shown @@

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp

@@ 1,122 lines not shown @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
 case PPCISD::FRE: return "PPCISD::FRE";
 case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";
 case PPCISD::STFIWX: return "PPCISD::STFIWX";
 case PPCISD::VMADDFP: return "PPCISD::VMADDFP";
 case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";
 case PPCISD::VPERM: return "PPCISD::VPERM";
 case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
 case PPCISD::XXINSERT: return "PPCISD::XXINSERT";
+case PPCISD::XXREVERSE: return "PPCISD::XXREVERSE";
 case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI";
 case PPCISD::VECSHL: return "PPCISD::VECSHL";
 case PPCISD::CMPB: return "PPCISD::CMPB";
 case PPCISD::Hi: return "PPCISD::Hi";
 case PPCISD::Lo: return "PPCISD::Lo";
 case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";
 case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";
 case PPCISD::DYNAREAOFFSET: return "PPCISD::DYNAREAOFFSET";
@@ 466 lines not shown @@ for (unsigned i = EltSize, e = 16; i != e; i += EltSize) {
     if (N->getMaskElt(i) < 0) continue;
     for (unsigned j = 0; j != EltSize; ++j)
       if (N->getMaskElt(i+j) != N->getMaskElt(j))
         return false;
   }
   return true;
 }

-// Check that the mask is shuffling N byte elements.
-static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width) {
+/// Check that the mask is shuffling N byte elements. Within each N byte
+/// element of the mask, the indices could be either in increasing or
+/// decreasing order as long as they are consecutive.
+/// \param[in] N: the shuffle vector SD Node to analyze
+/// \param[in] Width: the element width in bytes, could be 2/4/8/16 (HalfWord/
+/// Word/DoubleWord/QuadWord).
+/// \param[in] StepLen: the delta indices number among the N byte element, if
+/// the mask is in increasing/decreasing order then it is 1/-1.
+/// \return true iff the mask is shuffling N byte elements.
+static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width,
+                                   int StepLen) {
   assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
          "Unexpected element width.");
+  assert((StepLen == 1 || StepLen == -1) && "Unexpected element width.");

   unsigned NumOfElem = 16 / Width;
   unsigned MaskVal[16]; // Width is never greater than 16
   for (unsigned i = 0; i < NumOfElem; ++i) {
     MaskVal[0] = N->getMaskElt(i * Width);
-    if (MaskVal[0] % Width) {
+    if ((StepLen == 1) && (MaskVal[0] % Width)) {
+      return false;
+    } else if ((StepLen == -1) && ((MaskVal[0] + 1) % Width)) {
       return false;
     }

     for (unsigned int j = 1; j < Width; ++j) {
       MaskVal[j] = N->getMaskElt(i * Width + j);
-      if (MaskVal[j] != MaskVal[j-1] + 1) {
+      if (MaskVal[j] != MaskVal[j-1] + StepLen) {
         return false;
       }
     }
   }

   return true;
 }

 bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                           unsigned &InsertAtByte, bool &Swap, bool IsLE) {
-  if (!isNByteElemShuffleMask(N, 4))
+  if (!isNByteElemShuffleMask(N, 4, 1))
     return false;

   // Now we look at mask elements 0,4,8,12
   unsigned M0 = N->getMaskElt(0) / 4;
   unsigned M1 = N->getMaskElt(4) / 4;
   unsigned M2 = N->getMaskElt(8) / 4;
   unsigned M3 = N->getMaskElt(12) / 4;
   unsigned LittleEndianShifts[] = { 2, 1, 0, 3 };
@@ 60 lines not shown @@ bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,

   return false;
 }

 bool PPC::isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                                bool &Swap, bool IsLE) {
   assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");
   // Ensure each byte index of the word is consecutive.
-  if (!isNByteElemShuffleMask(N, 4))
+  if (!isNByteElemShuffleMask(N, 4, 1))
     return false;

   // Now we look at mask elements 0,4,8,12, which are the beginning of words.
   unsigned M0 = N->getMaskElt(0) / 4;
   unsigned M1 = N->getMaskElt(4) / 4;
   unsigned M2 = N->getMaskElt(8) / 4;
   unsigned M3 = N->getMaskElt(12) / 4;

@@ 41 lines not shown @@ if (M0 == 0 || M0 == 1 || M0 == 2 || M0 == 3) {
       Swap = true;
       ShiftElts = M0 - 4;
     }

     return true;
   }
 }

+bool static isXXBRShuffleMaskHelper(ShuffleVectorSDNode *N, int Width) {
+  assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");
+
+  if (!isNByteElemShuffleMask(N, Width, -1))
+    return false;
+
+  for (int i = 0; i < 16; i += Width)
+    if (N->getMaskElt(i) != i + Width - 1)
+      return false;
+
+  return true;
+}
+
+bool PPC::isXXBRHShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 2);
+}
+
+bool PPC::isXXBRWShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 4);
+}
+
+bool PPC::isXXBRDShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 8);
+}
+
+bool PPC::isXXBRQShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 16);
+}
+
 /// Can node \p N be lowered to an XXPERMDI instruction? If so, set \p Swap
 /// if the inputs to the instruction should be swapped and set \p DM to the
 /// value for the immediate.
 /// Specifically, set \p Swap to true only if \p N can be lowered to XXPERMDI
 /// AND element 0 of the result comes from the first input (LE) or second input
 /// (BE). Set \p DM to the calculated result (0-3) only if \p N can be lowered.
 /// \return true iff the given mask of shuffle node \p N is a XXPERMDI shuffle
 /// mask.
 bool PPC::isXXPERMDIShuffleMask(ShuffleVectorSDNode *N, unsigned &DM,
                                 bool &Swap, bool IsLE) {
   assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");

   // Ensure each byte index of the double word is consecutive.
-  if (!isNByteElemShuffleMask(N, 8))
+  if (!isNByteElemShuffleMask(N, 8, 1))
     return false;

   unsigned M0 = N->getMaskElt(0) / 8;
   unsigned M1 = N->getMaskElt(8) / 8;
   assert(((M0 | M1) < 4) && "A mask element out of bounds?");

   // If both vector operands for the shuffle are the same vector, the mask will
   // contain only elements from the first one and the second one will be undef.
@@ 6,058 lines not shown @@ if (Subtarget.hasVSX() &&
     SDValue Conv2 =
         DAG.getNode(ISD::BITCAST, dl, MVT::v2i64, V2.isUndef() ? V1 : V2);

     SDValue PermDI = DAG.getNode(PPCISD::XXPERMDI, dl, MVT::v2i64, Conv1, Conv2,
                                  DAG.getConstant(ShiftElts, dl, MVT::i32));
     return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, PermDI);
   }

+  if (Subtarget.hasP9Vector()) {
+    if (PPC::isXXBRHShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v8i16, V1);
+      SDValue ReveHWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v8i16, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveHWord);
+    } else if (PPC::isXXBRWShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
+      SDValue ReveWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v4i32, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveWord);
+    } else if (PPC::isXXBRDShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v2i64, V1);
+      SDValue ReveDWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v2i64, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveDWord);
+    } else if (PPC::isXXBRQShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v1i128, V1);
+      SDValue ReveQWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v1i128, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveQWord);
+    }
+  }
+
   if (Subtarget.hasVSX()) {
     if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
       int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);

       // If the source for the shuffle is a scalar_to_vector that came from a
       // 32-bit load, it will have used LXVWSX so we don't need to splat again.
       if (Subtarget.hasP9Vector() &&
           ((isLittleEndian && SplatIdx == 3) ||
@@ 5,388 lines not shown @@

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td

@@ 47 lines not shown @@
 def SDT_PPCVecShift : SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisPtrTy<3>
 ]>;

 def SDT_PPCVecInsert : SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
 ]>;

+def SDT_PPCVecReverse: SDTypeProfile<1, 1, [ SDTCisVec<0>,
+  SDTCisVec<1>
+]>;
+
 def SDT_PPCxxpermdi: SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
 ]>;

 def SDT_PPCvcmp : SDTypeProfile<1, 3, [
   SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisVT<3, i32>
 ]>;

@@ 105 lines not shown @@ def PPCaddiTlsldLAddr : SDNode<"PPCISD::ADDI_TLSLD_L_ADDR",
   SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,
   SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;
 def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;
 def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;

 def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;
 def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;
 def PPCxxinsert : SDNode<"PPCISD::XXINSERT", SDT_PPCVecInsert, []>;
+def PPCxxreverse : SDNode<"PPCISD::XXREVERSE", SDT_PPCVecReverse, []>;
 def PPCxxpermdi : SDNode<"PPCISD::XXPERMDI", SDT_PPCxxpermdi, []>;
 def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;

 def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;
 def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;
 def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;
 def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;

@@ 4,263 lines not shown @@

llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td

@@ 2,334 lines not shown @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
   //===--------------------------------------------------------------------===//

   // Vector Byte-Reverse H/W/D/Q Word
   def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;
   def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;
   def XXBRD : XX2_XT6_XO5_XB6<60, 23, 475, "xxbrd", vsrc, []>;
   def XXBRQ : XX2_XT6_XO5_XB6<60, 31, 475, "xxbrq", vsrc, []>;

+  // Vector Reverse
+  def : Pat<(v8i16 (PPCxxreverse v8i16 :$A)),
+            (v8i16 (COPY_TO_REGCLASS (XXBRH (COPY_TO_REGCLASS $A, VSRC)), VRRC))>;
+  def : Pat<(v4i32 (PPCxxreverse v4i32 :$A)),
+            (v4i32 (XXBRW $A))>;
+  def : Pat<(v2i64 (PPCxxreverse v2i64 :$A)),
+            (v2i64 (XXBRD $A))>;
+  def : Pat<(v1i128 (PPCxxreverse v1i128 :$A)),
+            (v1i128 (COPY_TO_REGCLASS (XXBRQ (COPY_TO_REGCLASS $A, VSRC)), VRRC))>;
+
   // Vector Permute
   def XXPERM : XX3_XT5_XA5_XB5<60, 26, "xxperm" , vsrc, vsrc, vsrc,
                                IIC_VecPerm, []>;
   def XXPERMR : XX3_XT5_XA5_XB5<60, 58, "xxpermr", vsrc, vsrc, vsrc,
                                 IIC_VecPerm, []>;

   // Vector Splat Immediate Byte
   def XXSPLTIB : X_RD6_IMM8<60, 360, (outs vsrc:$XT), (ins u8imm:$IMM8),
@@ 673 lines not shown @@

llvm/trunk/test/CodeGen/PowerPC/vec_revb.ll

+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr9 < %s | FileCheck %s
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 < %s | FileCheck %s
+
+define <8 x i16> @testXXBRH(<8 x i16> %a) {
+; CHECK-LABEL: testXXBRH:
+; CHECK: # BB#0: # %entry
+; CHECK-NEXT: xxbrh 34, 34
+; CHECK-NEXT: blr
+
+entry:
+  %0 = bitcast <8 x i16> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
+  %2 = bitcast <16 x i8> %1 to <8 x i16>
+  ret <8 x i16> %2
+}
+
+define <4 x i32> @testXXBRW(<4 x i32> %a) {
+; CHECK-LABEL: testXXBRW:
+; CHECK: # BB#0: # %entry
+; CHECK-NEXT: xxbrw 34, 34
+; CHECK-NEXT: blr
+
+entry:
+  %0 = bitcast <4 x i32> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
+  %2 = bitcast <16 x i8> %1 to <4 x i32>
+  ret <4 x i32> %2
+}
+
+define <2 x double> @testXXBRD(<2 x double> %a) {
+; CHECK-LABEL: testXXBRD:
+; CHECK: # BB#0: # %entry
+; CHECK-NEXT: xxbrd 34, 34
+; CHECK-NEXT: blr
+
+entry:
+  %0 = bitcast <2 x double> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8>
+  %2 = bitcast <16 x i8> %1 to <2 x double>
+  ret <2 x double> %2
+}
+
+define <1 x i128> @testXXBRQ(<1 x i128> %a) {
+; CHECK-LABEL: testXXBRQ:
+; CHECK: # BB#0: # %entry
+; CHECK-NEXT: xxbrq 34, 34
+; CHECK-NEXT: blr
+
+entry:
+  %0 = bitcast <1 x i128> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+  %2 = bitcast <16 x i8> %1 to <1 x i128>
+  ret <1 x i128> %2
+}