This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Match vec_revb builtins to P9 instructions.
Closed, Public

Authored by jtony on May 30 2017, 1:27 PM.

Details

Reviewers

nemanjai
kbarton
sfertile
syzaara
inouehrs
stefanp
lei
hfinkel
echristo

Commits

rG1a8eec141ac2: [PowerPC] Match vec_revb builtins to P9 instructions.
rL305214: [PowerPC] Match vec_revb builtins to P9 instructions.

Summary

Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means. 
 
This patch has been tested and is functionally clean on a Power9 machine.
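To make the summary concrete, here is a minimal sketch (not code from the patch) of why a per-element byte reversal can be expressed as a single-input byte shuffle. The names `reverseBytesPerElement`, `byteReverseMask` and `applyShuffle` are hypothetical helpers introduced only for illustration, and lanes are numbered simply as array indices 0 to 15.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Reverse the bytes inside each Width-byte element of a 16-byte vector.
// Width is expected to be 2, 4, 8 or 16 (half-word up to quad-word).
Bytes reverseBytesPerElement(const Bytes &In, std::size_t Width) {
  Bytes Out{};
  for (std::size_t Base = 0; Base < In.size(); Base += Width)
    for (std::size_t J = 0; J < Width; ++J)
      Out[Base + J] = In[Base + Width - 1 - J];
  return Out;
}

// Build the 16-entry byte shuffle mask that performs the same reversal.
std::array<int, 16> byteReverseMask(std::size_t Width) {
  std::array<int, 16> Mask{};
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    std::size_t Base = I - (I % Width); // first byte of the element holding I
    Mask[I] = static_cast<int>(Base + Width - 1 - (I % Width));
  }
  return Mask;
}

// Apply a single-input byte shuffle: Out[i] = In[Mask[i]].
Bytes applyShuffle(const Bytes &In, const std::array<int, 16> &Mask) {
  Bytes Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = In[static_cast<std::size_t>(Mask[I])];
  return Out;
}
```

For a width of 4, `byteReverseMask` yields the same 3,2,1,0,7,6,5,4,... pattern that the review later quotes for XXBRW, and shuffling with that mask gives the same bytes as `reverseBytesPerElement`.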

Diff Detail

Event Timeline

jtony created this revision. May 30 2017, 1:27 PM

I am debating internally about suggesting a potential solution for this, but this implementation essentially misses an entire set of complementary shuffle masks. We have a number of instructions that do element-wise reordering and we now have these that do per-element byte reversal. Combining the capabilities covers a lot more shuffles - I am just not positive that these occur enough to warrant the effort.

Here's what I mean:

We have an instruction that will do a "rotate-left-by-word" operation on a vector (and a way to emit that instruction)
We now have a "reverse-bytes-within-word-elements" operation
We don't have a "reverse-bytes-within-each-word-and-rotate-left-by-word", which we can simply do with a 2 instruction sequence now

And of course, the same goes for all other masks we lower to a single instruction. It might be useful for each of them to detect byte-reversal as well. It is non-trivial work, but doesn't sound fundamentally all that hard. Perhaps we should re-design our handling of shuffles at some point and have a robust way to determine what we can lower to a one or two instruction sequence on any Subtarget.
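The complementary masks described above can be sketched by composing two shuffles. This is an illustration under stated assumptions, not part of the patch: `rotateLeftByWordMask`, `reverseBytesInWordMask`, `compose` and `matchReverseThenRotate` are hypothetical names, and lanes are numbered as plain array indices without modelling endianness.

```cpp
#include <array>
#include <optional>

using Mask = std::array<int, 16>;

// Mask for rotating a 16-byte vector left by ShiftWords 4-byte words:
// output byte i reads input byte (i + 4 * ShiftWords) mod 16.
Mask rotateLeftByWordMask(unsigned ShiftWords) {
  Mask M{};
  for (int I = 0; I < 16; ++I)
    M[I] = (I + 4 * static_cast<int>(ShiftWords)) % 16;
  return M;
}

// Mask for reversing the bytes within each 4-byte word.
Mask reverseBytesInWordMask() {
  Mask M{};
  for (int I = 0; I < 16; ++I)
    M[I] = (I / 4) * 4 + (3 - I % 4);
  return M;
}

// Compose two single-input shuffles: apply First, then Second.
// Second reads lane Second[i] of the intermediate value, which in turn
// came from lane First[Second[i]] of the original input.
Mask compose(const Mask &First, const Mask &Second) {
  Mask M{};
  for (int I = 0; I < 16; ++I)
    M[I] = First[Second[I]];
  return M;
}

// Return the word shift if M equals "reverse bytes within each word, then
// rotate left by that many words"; otherwise return std::nullopt.
std::optional<unsigned> matchReverseThenRotate(const Mask &M) {
  for (unsigned Shift = 0; Shift < 4; ++Shift)
    if (M == compose(reverseBytesInWordMask(), rotateLeftByWordMask(Shift)))
      return Shift;
  return std::nullopt;
}
```

A matcher of this shape would recognise a mask such as 7,6,5,4,11,10,9,8,15,14,13,12,3,2,1,0 as a two-instruction candidate (byte reversal followed by a one-word rotate), which is the kind of mask the single-instruction checks in this patch do not cover.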

lib/Target/PowerPC/PPCISelLowering.cpp
1598	This comment is no longer adequate. There is now another parameter which appears to be the "direction and stride". Please elaborate in a comment what this function does and how to use it. Also, if values other than 1/-1 don't make sense for the new parameter, you can add an assert.
1768	Doesn't it just suffice at this point to ensure that each element of the shuffle mask has a value less than 16?
test/CodeGen/PowerPC/vec_revb.ll
1	If you're breaking up the RUN lines, keep them within 80 columns.

I agree with Nemanja's comment. Since we have so many special permutation patterns (e.g. byte reverse, rotate, shift, merge, splat, pack, unpack, etc.), we need a robust framework to optimize these patterns. Maybe we can create a table that maps a shuffle pattern to a specific instruction.

In a complex case that maps onto a couple of instructions, there is a trade-off between consuming permutation pipeline resources and consuming a vector register; e.g. "reverse-bytes-within-each-word-and-rotate-left-by-word" can be executed by one permute instruction using one additional vector register for the shuffle pattern. In hot loops, the one-permute approach may be a better choice than the two-instruction approach, since we may be able to prepare the shuffle pattern outside the loop (if we have enough vector registers).
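The table idea could look roughly like the following sketch. It is only an illustration: the `PatternEntry` struct, `patternTable` and `lookupInstruction` are hypothetical, and the table holds just the four byte-reverse mnemonics discussed in this review. An empty result stands for "no single-instruction match", where a caller would fall back to a generic permute.

```cpp
#include <array>
#include <string>
#include <vector>

using Mask = std::array<int, 16>;

// One table row: the mnemonic to emit when a shuffle mask equals Pattern.
struct PatternEntry {
  std::string Instruction;
  Mask Pattern;
};

// Mask that reverses the bytes within each Width-byte element.
Mask byteReverseMask(int Width) {
  Mask M{};
  for (int I = 0; I < 16; ++I)
    M[I] = (I / Width) * Width + (Width - 1 - I % Width);
  return M;
}

// Hypothetical table; a real one would also cover rotates, merges, splats...
const std::vector<PatternEntry> &patternTable() {
  static const std::vector<PatternEntry> Table = {
      {"xxbrh", byteReverseMask(2)},
      {"xxbrw", byteReverseMask(4)},
      {"xxbrd", byteReverseMask(8)},
      {"xxbrq", byteReverseMask(16)},
  };
  return Table;
}

// Return the mnemonic for an exact mask match, or an empty string if the
// mask is not in the table.
std::string lookupInstruction(const Mask &M) {
  for (const PatternEntry &E : patternTable())
    if (E.Pattern == M)
      return E.Instruction;
  return "";
}
```

A linear scan is enough for a handful of fixed patterns; parameterised patterns (such as rotates with a shift amount) would need a matcher function per row rather than a literal mask.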

In D33690#768675, @inouehrs wrote:

I agree with Nemanja's comment. Since we have so many special permutation patterns (e.g. byte reverse, rotate, shift, merge, splat, pack, unpack, etc.), we need a robust framework to optimize these patterns. Maybe we can create a table that maps a shuffle pattern to a specific instruction.

In a complex case that maps onto a couple of instructions, there is a trade-off between consuming permutation pipeline resources and consuming a vector register; e.g. "reverse-bytes-within-each-word-and-rotate-left-by-word" can be executed by one permute instruction using one additional vector register for the shuffle pattern. In hot loops, the one-permute approach may be a better choice than the two-instruction approach, since we may be able to prepare the shuffle pattern outside the loop (if we have enough vector registers).

Of course, we have the "Perfect Shuffle" solution that works on BE and for v4i32 only. We could extend it to work on LE and to support the new instructions. However, I'm not convinced that's the best (or fastest) way forward.

jtony marked 3 inline comments as done. May 31 2017, 12:53 PM

jtony added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
1768	Just checking that each value is less than 16 is not sufficient here. We need to make sure the starting element index of each N-byte element is i + Width - 1. For example, for XXBRW, I expect a mask like this: shuffleVector(a, undef, 3,2,1,0,7,6,5,4, 11,10,9,8,15,14,13,12). We check getElt(0)=3, getElt(4)=7, getElt(8)=11, getElt(12)=15, i.e. we reject the mask if N->getMaskElt(i) != i + Width - 1, where Width == 4.
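As a stand-alone illustration of this point (a sketch over plain arrays, not the LLVM code itself), the two checks can be written as below. `isNByteElemMask` and `isByteReverseMask` are hypothetical analogues of the helpers in the patch, and undefined mask elements are not modelled.

```cpp
#include <array>

using Mask = std::array<int, 16>;

// Analogue of the stride check: every Width-byte group must be a run that
// steps by StepLen (+1 or -1) and is aligned to an element boundary.
bool isNByteElemMask(const Mask &M, int Width, int StepLen) {
  for (int Base = 0; Base < 16; Base += Width) {
    int First = M[Base];
    if (StepLen == 1 && First % Width != 0)
      return false;
    if (StepLen == -1 && (First + 1) % Width != 0)
      return false;
    for (int J = 1; J < Width; ++J)
      if (M[Base + J] != M[Base + J - 1] + StepLen)
        return false;
  }
  return true;
}

// Analogue of the byte-reverse check: in addition to the decreasing stride,
// each group must start at the last byte of its own element, i + Width - 1.
bool isByteReverseMask(const Mask &M, int Width) {
  if (!isNByteElemMask(M, Width, -1))
    return false;
  for (int I = 0; I < 16; I += Width)
    if (M[I] != I + Width - 1)
      return false;
  return true;
}
```

The mask 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8 shows why the extra check matters: all values are below 16 and every group of four is a decreasing, aligned run, so the stride check with Width 4 accepts it, yet it is a double-word reversal rather than a word reversal. Only the start-index comparison rejects it for Width 4.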

Address comments from Nemanja.

In D33690#768479, @nemanjai wrote:

I am debating internally about suggesting a potential solution for this, but this implementation essentially misses an entire set of complementary shuffle masks. We have a number of instructions that do element-wise reordering and we now have these that do per-element byte reversal. Combining the capabilities covers a lot more shuffles - I am just not positive that these occur enough to warrant the effort.

Here's what I mean:

We have an instruction that will do a "rotate-left-by-word" operation on a vector (and a way to emit that instruction)

We now have a "reverse-bytes-within-word-elements" operation

We don't have a "reverse-bytes-within-each-word-and-rotate-left-by-word", which we can simply do with a 2 instruction sequence now

And of course, the same goes for all other masks we lower to a single instruction. It might be useful for each of them to detect byte-reversal as well. It is non-trivial work, but doesn't sound fundamentally all that hard. Perhaps we should re-design our handling of shuffles at some point and have a robust way to determine what we can lower to a one or two instruction sequence on any Subtarget.

I had a talk with Nemanja; this work should be done as a separate issue.

jtony edited the summary of this revision. Jun 2 2017, 7:36 AM

kbarton added inline comments. Jun 5 2017, 10:54 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1602	Please remove \brief. We no longer need to use \brief now that autobrief has been enabled.
1604	incremental/decremental -> increasing/decreasing
1609	incremental/decremental -> increasing/decreasing
1781	No braces required here.
7909	I'm probably missing something basic here, but why are we always converting the return to v16i8?
test/CodeGen/PowerPC/vec_revb.ll
3	The patterns are the same for both BE and LE, therefore you don't need separate CHECK-BE and CHECK-LE labels.

Address Kit's comments.

jtony added inline comments. Jun 8 2017, 4:41 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7909	Based on my understanding, after legalization the only legal vector type for vector_shuffle is v16i8, and the return value of vector_shuffle is also v16i8. If we want to replace vector_shuffle with a PPCISD::XXREVERSE node, we need to keep the original return type; otherwise it will cause an "LLVM ERROR: Cannot select" error. The debug output makes this clearer. If I remove the bitcast to v16i8, for XXBRQ we would do the following (note that we changed the return type after DAG combine):

    Legalizing: t5: v16i8 = vector_shuffle<15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0> t3, undef:v16i8
    ...
    replacing: t5: v16i8 = vector_shuffle<15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0> t3, undef:v16i8
    with: t11: v1i128 = PPCISD::XXREVERSE t2

This will eventually cause:

    LLVM ERROR: Cannot select: t6: v1i128 = bitcast t11
      t11: v1i128 = PPCISD::XXREVERSE t2
        t2: v1i128,ch = CopyFromReg t0, Register:v1i128 %vreg0
          t1: v1i128 = Register %vreg0
    In function: testXXBRQ

Therefore, we always need to cast the return value to v16i8. Please correct me if I have misunderstood anything.

LGTM

This revision is now accepted and ready to land. Jun 12 2017, 11:04 AM

Closed by commit rL305214: [PowerPC] Match vec_revb builtins to P9 instructions. (authored by jtony). Jun 12 2017, 11:25 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/PowerPC/	four files: 21, 66, 5, and 10 lines
test/CodeGen/PowerPC/vec_revb.ll	72 lines

Diff 100756

lib/Target/PowerPC/PPCISelLowering.h

Context not available.
 ///
 XXINSERT,

+/// XXREVERSE - The PPC VSX reverse instruction
+///
+XXREVERSE,
+
 /// VECSHL - The PPC VSX shift left instruction
 ///
 VECSHL,
Context not available.
 /// for a XXSLDWI instruction.
 bool isXXSLDWIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                           bool &Swap, bool IsLE);

+/// isXXBRHShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRH instruction.
+bool isXXBRHShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRWShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRW instruction.
+bool isXXBRWShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRDShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRD instruction.
+bool isXXBRDShuffleMask(ShuffleVectorSDNode *N);
+
+/// isXXBRQShuffleMask - Return true if this is a shuffle mask suitable
+/// for a XXBRQ instruction.
+bool isXXBRQShuffleMask(ShuffleVectorSDNode *N);
+
 /// isXXPERMDIShuffleMask - Return true if this is a shuffle mask suitable
 /// for a XXPERMDI instruction.
 bool isXXPERMDIShuffleMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
Context not available.

lib/Target/PowerPC/PPCISelLowering.cpp

Context not available.
 case PPCISD::VPERM: return "PPCISD::VPERM";
 case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
 case PPCISD::XXINSERT: return "PPCISD::XXINSERT";
+case PPCISD::XXREVERSE: return "PPCISD::XXREVERSE";
 case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI";
 case PPCISD::VECSHL: return "PPCISD::VECSHL";
 case PPCISD::CMPB: return "PPCISD::CMPB";
Context not available.
 }

 // Check that the mask is shuffling N byte elements.
-static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width) {
+static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width,
+                                   int StepLen) {
   assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
          "Unexpected element width.");

Context not available.
   unsigned MaskVal[16]; // Width is never greater than 16
   for (unsigned i = 0; i < NumOfElem; ++i) {
     MaskVal[0] = N->getMaskElt(i * Width);
-    if (MaskVal[0] % Width) {
+    if ((StepLen == 1) && (MaskVal[0] % Width)) {
+      return false;
+    } else if ((StepLen == -1) && ((MaskVal[0] + 1) % Width)) {
       return false;
     }

     for (unsigned int j = 1; j < Width; ++j) {
       MaskVal[j] = N->getMaskElt(i * Width + j);
-      if (MaskVal[j] != MaskVal[j-1] + 1) {
+      if (MaskVal[j] != MaskVal[j-1] + StepLen) {
         return false;
       }
     }
Context not available.

 bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
                           unsigned &InsertAtByte, bool &Swap, bool IsLE) {
-  if (!isNByteElemShuffleMask(N, 4))
+  if (!isNByteElemShuffleMask(N, 4, 1))
     return false;

   // Now we look at mask elements 0,4,8,12
Context not available.
                           bool &Swap, bool IsLE) {
   assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");
   // Ensure each byte index of the word is consecutive.
-  if (!isNByteElemShuffleMask(N, 4))
+  if (!isNByteElemShuffleMask(N, 4, 1))
     return false;

   // Now we look at mask elements 0,4,8,12, which are the beginning of words.
Context not available.
   }
 }

+bool static isXXBRShuffleMaskHelper(ShuffleVectorSDNode *N, int Width) {
+  assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");
+
+  if (!isNByteElemShuffleMask(N, Width, -1))
+    return false;
+
+  for (int i = 0; i < 16; i += Width) {
+    if (N->getMaskElt(i) != i + Width - 1)
+      return false;
+  }
+
+  return true;
+}
+
+bool PPC::isXXBRHShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 2);
+}
+
+bool PPC::isXXBRWShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 4);
+}
+
+bool PPC::isXXBRDShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 8);
+}
+
+bool PPC::isXXBRQShuffleMask(ShuffleVectorSDNode *N) {
+  return isXXBRShuffleMaskHelper(N, 16);
+}
+
 /// Can node \p N be lowered to an XXPERMDI instruction? If so, set \p Swap
 /// if the inputs to the instruction should be swapped and set \p DM to the
 /// value for the immediate.
Context not available.
   assert(N->getValueType(0) == MVT::v16i8 && "Shuffle vector expects v16i8");

   // Ensure each byte index of the double word is consecutive.
-  if (!isNByteElemShuffleMask(N, 8))
+  if (!isNByteElemShuffleMask(N, 8, 1))
     return false;

   unsigned M0 = N->getMaskElt(0) / 8;
Context not available.
     return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, PermDI);
   }

+  if (Subtarget.hasP9Vector()) {
+    if (PPC::isXXBRHShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v8i16, V1);
+      SDValue ReveHWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v8i16, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveHWord);
+    } else if (PPC::isXXBRWShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
+      SDValue ReveWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v4i32, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveWord);
+    } else if (PPC::isXXBRDShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v2i64, V1);
+      SDValue ReveDWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v2i64, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveDWord);
+    } else if (PPC::isXXBRQShuffleMask(SVOp)) {
+      SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v1i128, V1);
+      SDValue ReveQWord = DAG.getNode(PPCISD::XXREVERSE, dl, MVT::v1i128, Conv);
+      return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, ReveQWord);
+    }
+  }
+
   if (Subtarget.hasVSX()) {
     if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
       int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);
Context not available.

lib/Target/PowerPC/PPCInstrInfo.td

Context not available.
   SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
 ]>;

+def SDT_PPCVecReverse: SDTypeProfile<1, 1, [ SDTCisVec<0>,
+  SDTCisVec<1>
+]>;
+
 def SDT_PPCxxpermdi: SDTypeProfile<1, 3, [ SDTCisVec<0>,
   SDTCisVec<1>, SDTCisVec<2>, SDTCisInt<3>
 ]>;
Context not available.
 def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;
 def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;
 def PPCxxinsert : SDNode<"PPCISD::XXINSERT", SDT_PPCVecInsert, []>;
+def PPCxxreverse : SDNode<"PPCISD::XXREVERSE", SDT_PPCVecReverse, []>;
 def PPCxxpermdi : SDNode<"PPCISD::XXPERMDI", SDT_PPCxxpermdi, []>;
 def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;

Context not available.

lib/Target/PowerPC/PPCInstrVSX.td

Context not available.
 def XXBRD : XX2_XT6_XO5_XB6<60, 23, 475, "xxbrd", vsrc, []>;
 def XXBRQ : XX2_XT6_XO5_XB6<60, 31, 475, "xxbrq", vsrc, []>;

+// Vector Reverse
+def : Pat<(v8i16 (PPCxxreverse v8i16 :$A)),
+          (v8i16 (COPY_TO_REGCLASS (XXBRH (COPY_TO_REGCLASS $A, VSRC)), VRRC))>;
+def : Pat<(v4i32 (PPCxxreverse v4i32 :$A)),
+          (v4i32 (XXBRW $A))>;
+def : Pat<(v2i64 (PPCxxreverse v2i64 :$A)),
+          (v2i64 (XXBRD $A))>;
+def : Pat<(v1i128 (PPCxxreverse v1i128 :$A)),
+          (v1i128 (COPY_TO_REGCLASS (XXBRQ (COPY_TO_REGCLASS $A, VSRC)), VRRC))>;
+
 // Vector Permute
 def XXPERM : XX3_XT5_XA5_XB5<60, 26, "xxperm" , vsrc, vsrc, vsrc,
                              IIC_VecPerm, []>;
Context not available.

test/CodeGen/PowerPC/vec_revb.ll

This file was added.

+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr9 < %s | \
+; RUN: FileCheck %s -check-prefix=CHECK-BE
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 < %s | \
+; RUN: FileCheck %s -check-prefix=CHECK-LE
+
+define <8 x i16> @testXXBRH(<8 x i16> %a) {
+; CHECK-LE-LABEL: testXXBRH:
+; CHECK-LE: # BB#0: # %entry
+; CHECK-LE-NEXT: xxbrh 34, 34
+; CHECK-LE-NEXT: blr
+
+; CHECK-BE-LABEL: testXXBRH:
+; CHECK-BE: # BB#0: # %entry
+; CHECK-BE-NEXT: xxbrh 34, 34
+; CHECK-BE-NEXT: blr
+entry:
+  %0 = bitcast <8 x i16> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6, i32 9, i32 8, i32 11, i32 10, i32 13, i32 12, i32 15, i32 14>
+  %2 = bitcast <16 x i8> %1 to <8 x i16>
+  ret <8 x i16> %2
+}
+
+define <4 x i32> @testXXBRW(<4 x i32> %a) {
+; CHECK-LE-LABEL: testXXBRW:
+; CHECK-LE: # BB#0: # %entry
+; CHECK-LE-NEXT: xxbrw 34, 34
+; CHECK-LE-NEXT: blr
+
+; CHECK-BE-LABEL: testXXBRW:
+; CHECK-BE: # BB#0: # %entry
+; CHECK-BE-NEXT: xxbrw 34, 34
+; CHECK-BE-NEXT: blr
+entry:
+  %0 = bitcast <4 x i32> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4, i32 11, i32 10, i32 9, i32 8, i32 15, i32 14, i32 13, i32 12>
+  %2 = bitcast <16 x i8> %1 to <4 x i32>
+  ret <4 x i32> %2
+}
+
+define <2 x double> @testXXBRD(<2 x double> %a) {
+; CHECK-LE-LABEL: testXXBRD:
+; CHECK-LE: # BB#0: # %entry
+; CHECK-LE-NEXT: xxbrd 34, 34
+; CHECK-LE-NEXT: blr
+
+; CHECK-BE-LABEL: testXXBRD:
+; CHECK-BE: # BB#0: # %entry
+; CHECK-BE-NEXT: xxbrd 34, 34
+; CHECK-BE-NEXT: blr
+entry:
+  %0 = bitcast <2 x double> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0, i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8>
+  %2 = bitcast <16 x i8> %1 to <2 x double>
+  ret <2 x double> %2
+}
+
+define <1 x i128> @testXXBRQ(<1 x i128> %a) {
+; CHECK-LE-LABEL: testXXBRQ:
+; CHECK-LE: # BB#0: # %entry
+; CHECK-LE-NEXT: xxbrq 34, 34
+; CHECK-LE-NEXT: blr
+
+; CHECK-BE-LABEL: testXXBRQ:
+; CHECK-BE: # BB#0: # %entry
+; CHECK-BE-NEXT: xxbrq 34, 34
+; CHECK-BE-NEXT: blr
+entry:
+  %0 = bitcast <1 x i128> %a to <16 x i8>
+  %1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+  %2 = bitcast <16 x i8> %1 to <1 x i128>
+  ret <1 x i128> %2
+}