This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

lib/Target/PowerPC/
PPCISelLowering.h
PPCISelLowering.cpp
PPCInstrAltivec.td
PPCInstrInfo.td
PPCInstrVSX.td
test/CodeGen/PowerPC/
p9-vinsert-vextract.ll

Differential D34160

[Power9] Exploit vinserth instruction
ClosedPublic

Authored by gyiu on Jun 13 2017, 12:51 PM.

Details

Reviewers

kbarton
nemanjai
inouehrs
sfertile
jtony
lei
hfinkel
stefanp
syzaara

Commits

rG671526148c54: Adds code to PPC ISEL lowering to recognize half-word inserts from…
rL317111: Adds code to PPC ISEL lowering to recognize half-word inserts from…

Summary

This patch adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm.
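
As a rough illustration of the pattern the summary describes, here is a minimal, standalone sketch of the two-operand check: a shuffle qualifies when exactly one result half-word comes from one operand and every other half-word comes, in its original position, from the other operand. It works on half-word indices (0..7 for the first operand, 8..15 for the second) rather than on the byte-level mask the patch inspects, and `HalfwordInsert` and `matchHalfwordInsert` are hypothetical names introduced here, not part of the patch.

```cpp
#include <array>
#include <cassert>

// Hypothetical result type for this sketch; the patch keeps these values in
// local variables inside lowerToVINSERTH.
struct HalfwordInsert {
  unsigned DestElt; // Position in the result that receives the element.
  unsigned SrcElt;  // Index of the element within its source vector, 0..7.
  bool FromFirst;   // True if the inserted element comes from operand one.
};

// Mask holds one entry per result half-word: 0..7 selects from the first
// operand, 8..15 from the second. Returns true and fills Out on a match.
bool matchHalfwordInsert(const std::array<unsigned, 8> &Mask,
                         HalfwordInsert &Out) {
  for (unsigned I = 0; I < 8; ++I) {
    assert(Mask[I] < 16 && "half-word index out of range");
    bool FromFirst = Mask[I] < 8;
    // Every other lane must come, in order, from the opposite operand.
    unsigned Base = FromFirst ? 8 : 0;
    bool OthersInOrder = true;
    for (unsigned J = 0; J < 8; ++J)
      if (J != I && Mask[J] != Base + J)
        OthersInOrder = false;
    if (OthersInOrder) {
      Out = HalfwordInsert{I, Mask[I] & 0x7, FromFirst};
      return true; // At most one lane can satisfy the check.
    }
  }
  return false;
}
```

For the mask {0, 1, 2, 11, 4, 5, 6, 7} this reports an insert of element 3 of the second operand into position 3 of the first; an identity mask is rejected. The patch handles the single-operand (undef) case separately, which this sketch does not model.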

Diff Detail

Repository: rL LLVM

Event Timeline

gyiu created this revision. Jun 13 2017, 12:51 PM

This patch (potentially) increases the number of vector instructions (permutation -> shift + insert). Is my understanding correct?

lei added inline comments. Jun 14 2017, 12:03 PM

lib/Target/PowerPC/PPCISelLowering.cpp
1652	nit: don't forget the "." :)
test/CodeGen/PowerPC/p9-vinsert-vextract.ll
6	Can you add a short description of what each of these functions is testing?

In D34160#779972, @inouehrs wrote:

This patch (potentially) increases the number of vector instructions (permutation -> shift + insert). Is my understanding correct?

Yep. Though I think with a vperm you still need to load the mask into a vector register first, whereas with vshift + vinsert we're saving on the load.

In D34160#781261, @gyiu wrote:

In D34160#779972, @inouehrs wrote:

This patch (potentially) increases the number of vector instructions (permutation -> shift + insert). Is my understanding correct?

Yep. Though I think with a vperm you still need to load the mask into a vector register first, whereas with vshift + vinsert we're saving on the load.

I feel we should not increase the number of vector instructions within a loop (i.e. a common case for vector code) if we can load the mask into a vector register before the loop.
When no additional shift is needed, it is nice to do the optimization in a loop because it frees up one vector register.

jtony added inline comments. Jun 15 2017, 1:55 PM

lib/Target/PowerPC/PPCISelLowering.h
515	The community doesn't like the bool parameters here. But I am not sure whether we must remove them or it is just nice to do.
test/CodeGen/PowerPC/p9-vinsert-vextract.ll
9	I would prefer to use non-mangled function names to make it more readable. I think you can just regenerate the IR from c source file instead of cpp file.

Added comments and demangled function names for LIT tests
Added period to comment
Fixed issue when one operand of the shufflevector is 'undef', in which case the PPCISDs we generate will use only the defined one.
Initialize 'Swap' boolean

Go ahead and submit the rename separately (if you don't have commit access send me a patch and I'll do it). I would prefer to try to minimize boolean arguments - in this case you're adding a few more including a true/false and a return value one. Feel free to refactor the code around it to require fewer booleans or have multiple functions with helper functions that end up being called.

Thanks!

lib/Target/PowerPC/PPCISelLowering.h
515	Please do.

jtony added inline comments. Jun 16 2017, 7:55 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1629	Is there any reason why we use uint32_t? If not, I would use unsigned instead to make it consistent. We are using both unsigned and uint32_t in this file.

gyiu marked 2 inline comments as done. Jun 16 2017, 10:13 AM

gyiu added inline comments. Jun 19 2017, 8:05 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1629	I think using uint32_t is appropriate here because I want to illustrate that I'm using 32-bits exactly. Using 'unsigned int' would be fine as well, but I don't think it's as clear that I'm looking for 32-bits exactly.
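
To make the 32-bit point concrete: eight half-word indices of four bits each fill exactly 32 bits. Below is a minimal sketch of the packing, modeled on the loop in the patch. `packHalfwordMask` is a hypothetical helper name, and it takes half-word indices directly instead of dividing byte-level mask entries by two as the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Packs eight half-word indices (each 0..15) into one 32-bit word, most
// significant nibble first.
uint32_t packHalfwordMask(const unsigned (&Elts)[8]) {
  uint32_t Packed = 0;
  for (unsigned I = 0; I < 8; ++I) {
    assert(Elts[I] < 16 && "each index must fit in one nibble");
    unsigned Shift = (8 - 1 - I) * 4; // Element 0 lands in the top nibble.
    Packed |= static_cast<uint32_t>(Elts[I]) << Shift;
  }
  return Packed;
}
```

With this layout the in-order masks for the two operands pack to 0x01234567 and 0x89ABCDEF, the values the patch names `OriginalOrderLow` and `OriginalOrderHigh`.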

Added -O0 to LIT tests to test corner case of undef 2nd operand of vector shuffle.
Refactored VINSERTH code to avoid boolean parameters and return value.
Merged loops for 2nd operand undefined case and both operands defined.

echristo added inline comments. Jun 19 2017, 11:19 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7916	Nit: (Here and other places) Comments are complete sentences including punctuation.

In D34160#781301, @inouehrs wrote:

In D34160#781261, @gyiu wrote:

In D34160#779972, @inouehrs wrote:

This patch (potentially) increases the number of vector instructions (permutation -> shift + insert). Is my understanding correct?

Yep. Though I think with a vperm you still need to load the mask into a vector register first, whereas with vshift + vinsert we're saving on the load.

I feel we should not increase the number of vector instructions within a loop (i.e. a common case for vector code) if we can load the mask into a vector register before the loop.
When no additional shift is needed, it is nice to do the optimization in a loop because it frees up one vector register.

Although I definitely agree that we should take steps to ensure we don't introduce further instructions in loops, I'm not sure that avoiding a 2-instruction sequence for a shuffle is necessarily the right thing to do. This statement is predicated on the fact that we can hoist the constant pool load out of a loop. If register pressure prevents this, we will have a load in the loop. Furthermore, if the loop is large enough and has other memory operations, it is conceivable that the constant pool load could be a cache miss on every iteration. And it is conceivable that such large loops will be the ones for which register pressure prevents the hoisting of the load. Furthermore, if the GPR register pressure is also high, we might not even be able to hoist the address calculations outside the loop, which would make the vperm sequence 3-4 instructions.
I think that at ISEL time, we should favour shorter instruction sequences that don't involve loads. And perhaps if we can show that multi-instruction permute sequences in loops appear enough in real code, we might want to have a loop pass that simplifies them into a load outside the loop with a vperm in the loop in general.

lib/Target/PowerPC/PPCISelLowering.h
1066	This is probably a remnant of a previous implementation. Please rewrite the comment.

nemanjai added inline comments. Jun 21 2017, 8:35 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7910	So we don't want to shift if we're within the same register? Is there a specific reason for this?
7921	Isn't this already guaranteed to only have the low order 3 bits set?
7926	Why would we continue to search if we've already confirmed that (1) we have an element from vector A and (2) all other elements are from vector B in the correct order?

Addressed comments about my comments (grammar, periods, etc).
Removed irrelevant comments in PPCISelLowering.h

gyiu marked 2 inline comments as done. Jun 21 2017, 9:58 PM

gyiu added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
7910	I believe we have to add an xxlor to another VR if we want to shift the vector since we can't shift if both operands of the vector shuffle are the same vector. Adding another two cycles to VECSHL+VECINSERT seems to diminish its value versus load+vperm.
7921	MaskOneElt could actually be >= 8, since the mask is in range [0, 15].
7926	Yep, you're correct. Need a break here since we can't find more than one candidate.
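
The exchange about line 7921 above can be illustrated with a small sketch: because a mask entry may name an element of either operand, it ranges over [0, 15], and only its low three bits give the position within the source vector. The two tables are copied from the patch; `shiftForElement` is a hypothetical wrapper added for illustration.

```cpp
#include <cassert>

// Shift tables copied from the patch: "shifts required to get the half-word
// we want at element 3".
static const unsigned LittleEndianShifts[8] = {4, 3, 2, 1, 0, 7, 6, 5};
static const unsigned BigEndianShifts[8] = {5, 6, 7, 0, 1, 2, 3, 4};

// MaskElt is a half-word index in [0, 15]; values 8..15 name elements of the
// second shuffle operand, so only the low three bits index the table.
unsigned shiftForElement(unsigned MaskElt, bool IsLE) {
  assert(MaskElt < 16 && "half-word index out of range");
  unsigned EltInVector = MaskElt & 0x7;
  return IsLE ? LittleEndianShifts[EltInVector] : BigEndianShifts[EltInVector];
}
```

Without the `& 0x7`, an entry such as 10 (element 2 of the second operand) would index past the end of an eight-entry table.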

nemanjai added inline comments. Jun 23 2017, 6:17 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7910	I don't really see why. Assume that you have something like this: `vector unsigned short test(vector unsigned short a) { a[5] = a[2]; }` I don't see why we can't codegen something like this for it: `vsldoi 3, 2, 2, 4`; `vinserth 2, 3, 4`. Forgive me if I didn't work out the immediates exactly correctly, but the point is the [lack of] need for the XXLOR. Of course, this does use an extra register, but so does the alternative (vperm).
7921	Ah, right. I didn't think of that. Sorry about that.

gyiu added inline comments. Jun 23 2017, 11:49 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7910	Hmmm... Yep, you're right. I guess I can simplify my code even further now. I think this also means I have to fix up the code for the original xxinsertw lowering in a separate patch.

nemanjai added inline comments. Jun 23 2017, 1:08 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7910	Yes, as @echristo mentioned, you should do all the renaming of things in a separate patch that doesn't really require a review. You're just renaming stuff.

I'll open a separate item to address Nemanja's comments as I will not get a chance to do another enhancement.

I don't really see why. Assume that you have something like this:

vector unsigned short test(vector unsigned short a) {
a[5] = a[2];
}

I don't see why we can't codegen something like this for it:

vsldoi 3, 2, 2, 4
vinserth 2, 3, 4

Forgive me if I didn't work out the immediates exactly correctly, but the point is the [lack of] need for the XXLOR. Of course, this does use an extra register, but so does the alternative (vperm).
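
To check the idea in the quoted comment, the sequence can be modeled on plain byte arrays. This is a sketch under stated assumptions: bytes and half-words are numbered in big-endian order, `vsldoi` is modeled as concatenate-and-shift-left by bytes, and `vinserth` as copying half-word element 3 of its source into bytes UIM and UIM + 1 of its target, which is how the ISA 3.0 descriptions are read here. The immediates 14 and 10 are worked out for this model and are not the ones in the quoted comment (which were explicitly approximate); `insertElt2IntoElt5` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec = std::array<uint8_t, 16>; // Bytes in big-endian element order.

// Model of vsldoi: concatenate A and B, shift left by SH bytes, and keep the
// high-order 16 bytes.
Vec vsldoi(const Vec &A, const Vec &B, unsigned SH) {
  assert(SH < 16 && "shift amount is a 4-bit immediate");
  Vec R{};
  for (unsigned I = 0; I < 16; ++I) {
    unsigned Src = I + SH;
    R[I] = Src < 16 ? A[Src] : B[Src - 16];
  }
  return R;
}

// Model of vinserth (assumed semantics): copy half-word element 3 of B
// (bytes 6 and 7) into bytes UIM and UIM + 1 of T.
Vec vinserth(Vec T, const Vec &B, unsigned UIM) {
  assert(UIM <= 14 && "insert position must leave room for two bytes");
  T[UIM] = B[6];
  T[UIM + 1] = B[7];
  return T;
}

// a[5] = a[2], using big-endian half-word numbering.
Vec insertElt2IntoElt5(const Vec &A) {
  Vec Rotated = vsldoi(A, A, 14);  // Half-word 2 moves to half-word 3.
  return vinserth(A, Rotated, 10); // Byte 10 is the start of half-word 5.
}
```

The shift of 14 bytes matches the patch's `BigEndianShifts[2]` (7 half-words, doubled for a byte shift), and no copy of the input (XXLOR) is needed in this model because both `vsldoi` operands can name the same vector.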

lib/Target/PowerPC/PPCISelLowering.cpp
7910	Actually, I'm not quite sure what you mean here. The original code for xxinsertw has the limitation of only being able to insert element 3 if both input vectors to the vector_shuffle are the same. I'll need to change that in a separate patch. I'm not sure where the 'renaming of things' comes into play?

nemanjai added inline comments. Jun 26 2017, 9:14 AM

lib/Target/PowerPC/PPCISelLowering.cpp
1132	By "renaming stuff", I mean things like this. Kind of orthogonal to the patch and should go in as a separate NFC change.

Added breaks to stop searching for the pattern once I've found a candidate.

gyiu marked an inline comment as done. Jun 26 2017, 9:15 AM

nemanjai added inline comments. Jun 29 2017, 1:45 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7903	You should be able to get rid of this condition here: move the assignment `if (V.isUndef()) V2 = V1;` above here, and use `OriginalOrderLow` if the two vectors are the same. The rest should fall out naturally and we'll do the shift for the single-input case as well. And the code will also be simpler.

gyiu added inline comments. Aug 23 2017, 10:35 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7903	@nemanjai I created Issue #410 on github to address the issue when using vector shifts in the case when both inputs are the same vector. There's further investigation that's required as it's not clear which input/output registers the (vector shift + vector extract) sequence uses in this case. I would rather do this change as part of that work item instead.

Refactored NFCs to another patch to be committed.
Made changes to remove restriction on only recognizing shuffles of halfword element 3 (4 in LE mode) when both input vectors are the same vector. That is, we can now recognize all single element shuffles in this situation.

Note that I was able to re-implement Nemanja's suggestion of generalizing the case when both inputs are the same vector because the registers used in code-gen are now consistent. Not sure if it was a real problem that I saw previously, or a transient issue that was fixed with newer levels of LLVM.

Changed my mind, removed changes related to this comment:

"Made changes to remove restriction on only recognizing shuffles of halfword element 3 (4 in LE mode) when both input vectors are the same vector. That is, we can now recognize all single element shuffles in this situation."

Will use a different patch to remove the restriction instead, as the contents of this patch are still functionally correct.

kbarton added inline comments. Oct 23 2017, 8:03 PM

lib/Target/PowerPC/PPCISelLowering.cpp
117	Is this still necessary? I don't see any calls to it - only to the 3-parameter version of it.

gyiu marked an inline comment as done. Oct 24 2017, 8:29 AM

gyiu added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
117	Yeah, this patch is old so the declaration is based on the version that had two parameters. I'll update it when I merge with the latest code.

LGTM

This revision is now accepted and ready to land. Oct 24 2017, 3:27 PM

Closed by commit rL317111: Adds code to PPC ISEL lowering to recognize half-word inserts from… (authored by gyiu). Nov 1 2017, 11:07 AM

This revision was automatically updated to reflect the committed changes.

gyiu marked an inline comment as done.

Revision Contents

Path	Size
lib/Target/PowerPC/	18 lines, 125 lines, 16 lines, 2 lines, 2 lines
test/CodeGen/PowerPC/p9-vinsert-vextract.ll	300 lines

Diff 103079

lib/Target/PowerPC/PPCISelLowering.h

[76 lines not shown]
enum NodeType : unsigned {
/// VPERM - The PPC VPERM Instruction.		/// VPERM - The PPC VPERM Instruction.
///		///
VPERM,		VPERM,

/// XXSPLT - The PPC VSX splat instructions		/// XXSPLT - The PPC VSX splat instructions
///		///
XXSPLT,		XXSPLT,

/// XXINSERT - The PPC VSX insert instruction		/// VECINSERT - The PPC vector insert instruction
///		///
XXINSERT,		VECINSERT,

/// XXREVERSE - The PPC VSX reverse instruction		/// XXREVERSE - The PPC VSX reverse instruction
///		///
XXREVERSE,		XXREVERSE,

/// VECSHL - The PPC VSX shift left instruction		/// VECSHL - The PPC vector shift left instruction
///		///
VECSHL,		VECSHL,

/// XXPERMDI - The PPC XXPERMDI instruction		/// XXPERMDI - The PPC XXPERMDI instruction
///		///
XXPERMDI,		XXPERMDI,

/// The CMPB instruction (takes two operands of i32 or i64).		/// The CMPB instruction (takes two operands of i32 or i64).
[405 lines not shown]
namespace PPC {
/// getVSPLTImmediate - Return the appropriate VSPLT* immediate to splat the		/// getVSPLTImmediate - Return the appropriate VSPLT* immediate to splat the
/// specified isSplatShuffleMask VECTOR_SHUFFLE mask.		/// specified isSplatShuffleMask VECTOR_SHUFFLE mask.
unsigned getVSPLTImmediate(SDNode *N, unsigned EltSize, SelectionDAG &DAG);		unsigned getVSPLTImmediate(SDNode *N, unsigned EltSize, SelectionDAG &DAG);

/// get_VSPLTI_elt - If this is a build_vector of constants which can be		/// get_VSPLTI_elt - If this is a build_vector of constants which can be
/// formed by using a vspltis[bhw] instruction of the specified element		/// formed by using a vspltis[bhw] instruction of the specified element
/// size, return the constant being splatted. The ByteSize field indicates		/// size, return the constant being splatted. The ByteSize field indicates
/// the number of bytes of each element [124] -> [bhw].		/// the number of bytes of each element [124] -> [bhw].
SDValue get_VSPLTI_elt(SDNode *N, unsigned ByteSize, SelectionDAG &DAG);		SDValue get_VSPLTI_elt(SDNode *N, unsigned ByteSize, SelectionDAG &DAG);
		jtony (Done): The community doesn't like the bool parameters here. But I am not sure whether we must remove them or it is just nice to do.
		echristo (Done): Please do.

/// If this is a qvaligni shuffle mask, return the shift		/// If this is a qvaligni shuffle mask, return the shift
/// amount, otherwise return -1.		/// amount, otherwise return -1.
int isQVALIGNIShuffleMask(SDNode *N);		int isQVALIGNIShuffleMask(SDNode *N);

} // end namespace PPC		} // end namespace PPC

class PPCTargetLowering : public TargetLowering {		class PPCTargetLowering : public TargetLowering {
[530 lines not shown]
SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &RefinementSteps) const override;		int &RefinementSteps) const override;
unsigned combineRepeatedFPDivisors() const override;		unsigned combineRepeatedFPDivisors() const override;

CCAssignFn *useFastISelCCs(unsigned Flag) const;		CCAssignFn *useFastISelCCs(unsigned Flag) const;

SDValue		SDValue
combineElementTruncationToVectorTruncation(SDNode *N,		combineElementTruncationToVectorTruncation(SDNode *N,
DAGCombinerInfo &DCI) const;		DAGCombinerInfo &DCI) const;
};
		/// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be
		/// handled by the VINSERTH instruction introduced in ISA 3.0. This is
		/// essentially any shuffle of v8i16 vectors that just inserts one element
		/// from one vector into the other. This function will also set a couple of
		nemanjai (Done): This is probably a remnant of a previous implementation. Please rewrite the comment.
		/// output parameters for how much the source vector needs to be shifted and
		/// what byte number needs to be specified for the instruction to put the
		/// element in the desired location of the target vector.
		SDValue lowerToVINSERTH(ShuffleVectorSDNode *N, SelectionDAG &DAG) const;

		}; // end class PPCTargetLowering

namespace PPC {		namespace PPC {

FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,		FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,
const TargetLibraryInfo *LibInfo);		const TargetLibraryInfo *LibInfo);

} // end namespace PPC		} // end namespace PPC

[27 lines not shown]

lib/Target/PowerPC/PPCISelLowering.cpp

[108 lines not shown]
cl::desc("disable unaligned load/store generation on PPC"), cl::Hidden);		cl::desc("disable unaligned load/store generation on PPC"), cl::Hidden);

static cl::opt<bool> DisableSCO("disable-ppc-sco",		static cl::opt<bool> DisableSCO("disable-ppc-sco",
cl::desc("disable sibling call optimization on ppc"), cl::Hidden);		cl::desc("disable sibling call optimization on ppc"), cl::Hidden);

STATISTIC(NumTailCalls, "Number of tail calls");		STATISTIC(NumTailCalls, "Number of tail calls");
STATISTIC(NumSiblingCalls, "Number of sibling calls");		STATISTIC(NumSiblingCalls, "Number of sibling calls");

		static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int);
		kbarton (Done): Is this still necessary? I don't see any calls to it - only to the 3-parameter version of it.
		gyiu (author, Not Done): Yeah, this patch is old so the declaration is based on the version that had two parameters. I'll update it when I merge with the latest code.

// FIXME: Remove this once the bug has been fixed!		// FIXME: Remove this once the bug has been fixed!
extern cl::opt<bool> ANDIGlueBug;		extern cl::opt<bool> ANDIGlueBug;

PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,		PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
const PPCSubtarget &STI)		const PPCSubtarget &STI)
: TargetLowering(TM), Subtarget(STI) {		: TargetLowering(TM), Subtarget(STI) {
// Use _setjmp/_longjmp instead of setjmp/longjmp.		// Use _setjmp/_longjmp instead of setjmp/longjmp.
setUseUnderscoreSetJmp(true);		setUseUnderscoreSetJmp(true);
[997 lines not shown]
const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";		case PPCISD::FCTIWUZ: return "PPCISD::FCTIWUZ";
case PPCISD::FRE: return "PPCISD::FRE";		case PPCISD::FRE: return "PPCISD::FRE";
case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";		case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";
case PPCISD::STFIWX: return "PPCISD::STFIWX";		case PPCISD::STFIWX: return "PPCISD::STFIWX";
case PPCISD::VMADDFP: return "PPCISD::VMADDFP";		case PPCISD::VMADDFP: return "PPCISD::VMADDFP";
case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";		case PPCISD::VNMSUBFP: return "PPCISD::VNMSUBFP";
case PPCISD::VPERM: return "PPCISD::VPERM";		case PPCISD::VPERM: return "PPCISD::VPERM";
case PPCISD::XXSPLT: return "PPCISD::XXSPLT";		case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
case PPCISD::XXINSERT: return "PPCISD::XXINSERT";		case PPCISD::VECINSERT: return "PPCISD::VECINSERT";
		nemanjai (Not Done): By "renaming stuff", I mean things like this. Kind of orthogonal to the patch and should go in as a separate NFC change.
case PPCISD::XXREVERSE: return "PPCISD::XXREVERSE";		case PPCISD::XXREVERSE: return "PPCISD::XXREVERSE";
case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI";		case PPCISD::XXPERMDI: return "PPCISD::XXPERMDI";
case PPCISD::VECSHL: return "PPCISD::VECSHL";		case PPCISD::VECSHL: return "PPCISD::VECSHL";
case PPCISD::CMPB: return "PPCISD::CMPB";		case PPCISD::CMPB: return "PPCISD::CMPB";
case PPCISD::Hi: return "PPCISD::Hi";		case PPCISD::Hi: return "PPCISD::Hi";
case PPCISD::Lo: return "PPCISD::Lo";		case PPCISD::Lo: return "PPCISD::Lo";
case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";		case PPCISD::TOC_ENTRY: return "PPCISD::TOC_ENTRY";
case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";		case PPCISD::DYNALLOC: return "PPCISD::DYNALLOC";
[480 lines not shown]
/// Word/DoubleWord/QuadWord).		/// Word/DoubleWord/QuadWord).
/// \param[in] StepLen the delta indices number among the N byte element, if		/// \param[in] StepLen the delta indices number among the N byte element, if
/// the mask is in increasing/decreasing order then it is 1/-1.		/// the mask is in increasing/decreasing order then it is 1/-1.
/// \return true iff the mask is shuffling N byte elements.		/// \return true iff the mask is shuffling N byte elements.
static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width,		static bool isNByteElemShuffleMask(ShuffleVectorSDNode *N, unsigned Width,
int StepLen) {		int StepLen) {
assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&		assert((Width == 2 || Width == 4 || Width == 8 || Width == 16) &&
"Unexpected element width.");		"Unexpected element width.");
assert((StepLen == 1 || StepLen == -1) && "Unexpected element width.");		assert((StepLen == 1 || StepLen == -1) && "Unexpected element width.");
		jtony (Not Done): Is there any reason why we use uint32_t? If not, I would use unsigned instead to make it consistent. We are using both unsigned and uint32_t in this file.
		gyiu (author, Not Done): I think using uint32_t is appropriate here because I want to illustrate that I'm using 32-bits exactly. Using 'unsigned int' would be fine as well, but I don't think it's as clear that I'm looking for 32-bits exactly.

unsigned NumOfElem = 16 / Width;		unsigned NumOfElem = 16 / Width;
unsigned MaskVal[16]; // Width is never greater than 16		unsigned MaskVal[16]; // Width is never greater than 16
for (unsigned i = 0; i < NumOfElem; ++i) {		for (unsigned i = 0; i < NumOfElem; ++i) {
MaskVal[0] = N->getMaskElt(i * Width);		MaskVal[0] = N->getMaskElt(i * Width);
if ((StepLen == 1) && (MaskVal[0] % Width)) {		if ((StepLen == 1) && (MaskVal[0] % Width)) {
return false;		return false;
} else if ((StepLen == -1) && ((MaskVal[0] + 1) % Width)) {		} else if ((StepLen == -1) && ((MaskVal[0] + 1) % Width)) {
return false;		return false;
}		}

for (unsigned int j = 1; j < Width; ++j) {		for (unsigned int j = 1; j < Width; ++j) {
MaskVal[j] = N->getMaskElt(i * Width + j);		MaskVal[j] = N->getMaskElt(i * Width + j);
if (MaskVal[j] != MaskVal[j-1] + StepLen) {		if (MaskVal[j] != MaskVal[j-1] + StepLen) {
return false;		return false;
}		}
}		}
}		}

return true;		return true;
}		}

bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,		bool PPC::isXXINSERTWMask(ShuffleVectorSDNode *N, unsigned &ShiftElts,
		lei (Done): nit: don't forget the "." :)
unsigned &InsertAtByte, bool &Swap, bool IsLE) {		unsigned &InsertAtByte, bool &Swap, bool IsLE) {
if (!isNByteElemShuffleMask(N, 4, 1))		if (!isNByteElemShuffleMask(N, 4, 1))
return false;		return false;

// Now we look at mask elements 0,4,8,12		// Now we look at mask elements 0,4,8,12
unsigned M0 = N->getMaskElt(0) / 4;		unsigned M0 = N->getMaskElt(0) / 4;
unsigned M1 = N->getMaskElt(4) / 4;		unsigned M1 = N->getMaskElt(4) / 4;
unsigned M2 = N->getMaskElt(8) / 4;		unsigned M2 = N->getMaskElt(8) / 4;
[6,177 lines not shown]
static SDValue GeneratePerfectShuffle(unsigned PFEntry, SDValue LHS,
}		}
EVT VT = OpLHS.getValueType();		EVT VT = OpLHS.getValueType();
OpLHS = DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, OpLHS);		OpLHS = DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, OpLHS);
OpRHS = DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, OpRHS);		OpRHS = DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, OpRHS);
SDValue T = DAG.getVectorShuffle(MVT::v16i8, dl, OpLHS, OpRHS, ShufIdxs);		SDValue T = DAG.getVectorShuffle(MVT::v16i8, dl, OpLHS, OpRHS, ShufIdxs);
return DAG.getNode(ISD::BITCAST, dl, VT, T);		return DAG.getNode(ISD::BITCAST, dl, VT, T);
}		}

		/// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be handled
		/// by the VINSERTH instruction introduced in ISA 3.0, else just return default
		/// SDValue
		SDValue PPCTargetLowering::lowerToVINSERTH(ShuffleVectorSDNode *N,
		SelectionDAG &DAG) const {
		const unsigned NumHalfWords = 8;
		const unsigned BytesInVector = NumHalfWords * 2;
		// check that the shuffle is on half-words
		if (!isNByteElemShuffleMask(N, 2, 1))
		return SDValue();

		bool IsLE = Subtarget.isLittleEndian();
		SDLoc dl(N);
		SDValue V1 = N->getOperand(0);
		SDValue V2 = N->getOperand(1);
		unsigned ShiftElts = 0, InsertAtByte = 0;
		bool Swap = false;

		// shifts required to get the half-word we want at element 3
		unsigned LittleEndianShifts[] = {4, 3, 2, 1, 0, 7, 6, 5};
		unsigned BigEndianShifts[] = {5, 6, 7, 0, 1, 2, 3, 4};

		uint32_t Mask = 0;
		uint32_t OriginalOrderLow = 0x1234567;
		uint32_t OriginalOrderHigh = 0x89ABCDEF;
		// Now we look at mask elements 0,2,4,6,8,10,12,14
		// Pack the mask into a 32-bit space, only need 4-bit nibbles per element
		for (unsigned i = 0; i < NumHalfWords; ++i) {
		unsigned MaskShift = (NumHalfWords - 1 - i) * 4;
		Mask |= ((uint32_t)(N->getMaskElt(i * 2) / 2) << MaskShift);
		}

		// For each mask element, find out if we're just inserting something
		// from V2 into V1 or vice versa.
		// Possible permutations inserting an element from V2 into V1:
		// X, 1, 2, 3, 4, 5, 6, 7
		// 0, X, 2, 3, 4, 5, 6, 7
		// 0, 1, X, 3, 4, 5, 6, 7
		// 0, 1, 2, X, 4, 5, 6, 7
		// 0, 1, 2, 3, X, 5, 6, 7
		// 0, 1, 2, 3, 4, X, 6, 7
		// 0, 1, 2, 3, 4, 5, X, 7
		// 0, 1, 2, 3, 4, 5, 6, X
		// Inserting from V1 into V2 will be similar, except mask range will be [8,15]

		bool FoundCandidate = false;
		// Go through the mask of half-words to find an element that's being moved
		// from one vector to the other.
		for (unsigned i = 0; i < NumHalfWords; ++i) {
		unsigned MaskShift = (NumHalfWords - 1 - i) * 4;
		uint32_t MaskOneElt = (Mask >> MaskShift) & 0xF;
		uint32_t MaskOtherElts = ~(0xF << MaskShift);
		uint32_t TargetOrder = 0x0;

		// If both vector operands for the shuffle are the same vector, the mask
		// will contain only elements from the first one and the second one will be
		// undef.
		if (V2.isUndef()) {
		nemanjai (Not Done): You should be able to get rid of this condition here: move the assignment `if (V.isUndef()) V2 = V1;` above here, and use `OriginalOrderLow` if the two vectors are the same. The rest should fall out naturally and we'll do the shift for the single-input case as well. And the code will also be simpler.
		gyiu (author, Not Done): @nemanjai I created Issue #410 on github to address the issue when using vector shifts in the case when both inputs are the same vector. There's further investigation that's required as it's not clear which input/output registers the (vector shift + vector extract) sequence uses in this case. I would rather do this change as part of that work item instead.
		ShiftElts = 0;
		unsigned VINSERTHSrcElem = IsLE ? 4 : 3;
		TargetOrder = OriginalOrderLow;
		Swap = false;
		// skip if not the correct element or mask of other elements don't equal
		// to our expected order
		if (MaskOneElt == VINSERTHSrcElem &&
		nemanjai (Not Done): So we don't want to shift if we're within the same register? Is there a specific reason for this?
		gyiu (author, Not Done): I believe we have to add an xxlor to another VR if we want to shift the vector since we can't shift if both operands of the vector shuffle are the same vector. Adding another two cycles to VECSHL+VECINSERT seems to diminish its value versus load+vperm.
		nemanjai (Not Done): I don't really see why. Assume that you have something like this: `vector unsigned short test(vector unsigned short a) { a[5] = a[2]; }` I don't see why we can't codegen something like this for it: `vsldoi 3, 2, 2, 4`; `vinserth 2, 3, 4`. Forgive me if I didn't work out the immediates exactly correctly, but the point is the [lack of] need for the XXLOR. Of course, this does use an extra register, but so does the alternative (vperm).
		gyiu (author, Not Done): Hmmm... Yep, you're right. I guess I can simplify my code even further now. I think this also means I have to fix up the code for the original xxinsertw lowering in a separate patch.
		nemanjai (Not Done): Yes, as @echristo mentioned, you should do all the renaming of things in a separate patch that doesn't really require a review. You're just renaming stuff.
		gyiu (author, Not Done): Actually, I'm not quite sure what you mean here. The original code for xxinsertw has the limitation of only being able to insert element 3 if both input vectors to the vector_shuffle are the same. I'll need to change that in a separate patch. I'm not sure where the 'renaming of things' comes into play?
		(Mask & MaskOtherElts) == (TargetOrder & MaskOtherElts)) {
		InsertAtByte = IsLE ? BytesInVector - (i + 1) * 2 : i * 2;
		FoundCandidate = true;
		}
		} else { // both operands are defined
		// target order is [8,15] if the current mask is between [0,7]
		echristo (Done): Nit: (Here and other places) Comments are complete sentences including punctuation.
		TargetOrder =
		(MaskOneElt < NumHalfWords) ? OriginalOrderHigh : OriginalOrderLow;
		// skip if mask of other elements don't equal our expected order
		if ((Mask & MaskOtherElts) == (TargetOrder & MaskOtherElts)) {
		// we only need the last 3 bits for the number of shifts
		nemanjai (Not Done): Isn't this already guaranteed to only have the low order 3 bits set?
		gyiu (Author, Not Done): MaskOneElt could actually be >= 8, since the mask is in range [0, 15].
		nemanjai (Not Done): Ah, right. I didn't think of that. Sorry about that.
		ShiftElts = IsLE ? LittleEndianShifts[MaskOneElt & 0x7]
		: BigEndianShifts[MaskOneElt & 0x7];
		InsertAtByte = IsLE ? BytesInVector - (i + 1) * 2 : i * 2;
		Swap = MaskOneElt < NumHalfWords;
		FoundCandidate = true;
		nemanjai (Done): Why would we continue to search if we've already confirmed that: 1. We have an element from vector A. 2. All other elements are from vector B in the correct order.
		gyiu (Author, Not Done): Yep, you're correct. Need a break here since we can't find more than one candidate.
		}
		}
		}

		if (!FoundCandidate)
		return SDValue();

		// Candidate found, construct the proper SDAG sequence with VINSERTH,
		// optionally with VECSHL if shift is required
		if (Swap)
		std::swap(V1, V2);
		if (V2.isUndef())
		V2 = V1;
		SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v8i16, V1);
		if (ShiftElts) {
		// Double ShiftElts because we're left shifting on v16i8 type
		SDValue Shl = DAG.getNode(PPCISD::VECSHL, dl, MVT::v16i8, V2, V2,
		DAG.getConstant(2 * ShiftElts, dl, MVT::i32));
		SDValue Conv2 = DAG.getNode(ISD::BITCAST, dl, MVT::v8i16, Shl);
		SDValue Ins = DAG.getNode(PPCISD::VECINSERT, dl, MVT::v8i16, Conv1, Conv2,
		DAG.getConstant(InsertAtByte, dl, MVT::i32));
		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);
		}
		SDValue Conv2 = DAG.getNode(ISD::BITCAST, dl, MVT::v8i16, V2);
		SDValue Ins = DAG.getNode(PPCISD::VECINSERT, dl, MVT::v8i16, Conv1, Conv2,
		DAG.getConstant(InsertAtByte, dl, MVT::i32));
		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);
		}
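
The shift-then-insert sequence built above can be checked with a small standalone model. The sketch below is not part of the patch: `Vec`, `makeHalfwords`, and `halfword` are hypothetical helpers, and the `vsldoi` and `vinserth` functions reflect one reading of the instruction semantics (big-endian byte numbering only, with `vinserth` reading halfword element 3 of its source). Little-endian lowering uses different immediates and is not modeled here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit vector register as 16 bytes in big-endian byte numbering
// (byte 0 is the leftmost byte). Hypothetical model type.
using Vec = std::array<std::uint8_t, 16>;

// Model of "vsldoi vD, vA, vB, SH": concatenate vA and vB, shift the 32-byte
// value left by SH bytes (SH in [0, 15]), and keep the leftmost 16 bytes.
Vec vsldoi(const Vec &A, const Vec &B, unsigned SH) {
  Vec D{};
  for (unsigned I = 0; I < 16; ++I) {
    unsigned Src = I + SH;
    D[I] = Src < 16 ? A[Src] : B[Src - 16];
  }
  return D;
}

// Model of "vinserth vD, vB, UIM": copy halfword element 3 of vB (bytes 6
// and 7) into bytes UIM and UIM+1 of vD; the other bytes of vD are unchanged.
Vec vinserth(Vec D, const Vec &B, unsigned UIM) {
  D[UIM] = B[6];
  D[UIM + 1] = B[7];
  return D;
}

// Hypothetical helper: halfword element E holds Base + E, stored big-endian.
Vec makeHalfwords(std::uint16_t Base) {
  Vec V{};
  for (unsigned E = 0; E < 8; ++E) {
    std::uint16_t H = static_cast<std::uint16_t>(Base + E);
    V[2 * E] = static_cast<std::uint8_t>(H >> 8);
    V[2 * E + 1] = static_cast<std::uint8_t>(H & 0xFF);
  }
  return V;
}

// Hypothetical helper: read halfword element E of V (big-endian numbering).
std::uint16_t halfword(const Vec &V, unsigned E) {
  return static_cast<std::uint16_t>((V[2 * E] << 8) | V[2 * E + 1]);
}
```

Under this model, `vinserth(A, vsldoi(B, B, 10), 0)` places element 0 of `B` into element 0 of `A`, which corresponds to the big-endian `vsldoi 3, 3, 3, 10` / `vinserth 2, 3, 0` pair expected for the shuffle mask `<8, 1, 2, 3, 4, 5, 6, 7>` in the test file.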

/// LowerVECTOR_SHUFFLE - Return the code we lower for VECTOR_SHUFFLE. If this		/// LowerVECTOR_SHUFFLE - Return the code we lower for VECTOR_SHUFFLE. If this
/// is a shuffle we can handle in a single instruction, return it. Otherwise,		/// is a shuffle we can handle in a single instruction, return it. Otherwise,
/// return the code it can be lowered into. Worst case, it can always be		/// return the code it can be lowered into. Worst case, it can always be
/// lowered into a vperm.		/// lowered into a vperm.
SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,		SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc dl(Op);		SDLoc dl(Op);
SDValue V1 = Op.getOperand(0);		SDValue V1 = Op.getOperand(0);
SDValue V2 = Op.getOperand(1);		SDValue V2 = Op.getOperand(1);
ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);		ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(Op);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
bool isLittleEndian = Subtarget.isLittleEndian();		bool isLittleEndian = Subtarget.isLittleEndian();

unsigned ShiftElts, InsertAtByte;		unsigned ShiftElts, InsertAtByte;
bool Swap;		bool Swap = false;
if (Subtarget.hasP9Vector() &&		if (Subtarget.hasP9Vector() &&
PPC::isXXINSERTWMask(SVOp, ShiftElts, InsertAtByte, Swap,		PPC::isXXINSERTWMask(SVOp, ShiftElts, InsertAtByte, Swap,
isLittleEndian)) {		isLittleEndian)) {
if (Swap)		if (Swap)
std::swap(V1, V2);		std::swap(V1, V2);
SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);		SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
SDValue Conv2 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2);		SDValue Conv2 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2);
if (ShiftElts) {		if (ShiftElts) {
SDValue Shl = DAG.getNode(PPCISD::VECSHL, dl, MVT::v4i32, Conv2, Conv2,		SDValue Shl = DAG.getNode(PPCISD::VECSHL, dl, MVT::v4i32, Conv2, Conv2,
DAG.getConstant(ShiftElts, dl, MVT::i32));		DAG.getConstant(ShiftElts, dl, MVT::i32));
SDValue Ins = DAG.getNode(PPCISD::XXINSERT, dl, MVT::v4i32, Conv1, Shl,		SDValue Ins = DAG.getNode(PPCISD::VECINSERT, dl, MVT::v4i32, Conv1, Shl,
DAG.getConstant(InsertAtByte, dl, MVT::i32));		DAG.getConstant(InsertAtByte, dl, MVT::i32));
return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);
}		}
SDValue Ins = DAG.getNode(PPCISD::XXINSERT, dl, MVT::v4i32, Conv1, Conv2,		SDValue Ins = DAG.getNode(PPCISD::VECINSERT, dl, MVT::v4i32, Conv1, Conv2,
DAG.getConstant(InsertAtByte, dl, MVT::i32));		DAG.getConstant(InsertAtByte, dl, MVT::i32));
return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);		return DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, Ins);
}		}

		if (Subtarget.hasP9Altivec()) {
		SDValue NewISDNode = lowerToVINSERTH(SVOp, DAG);
		if (NewISDNode)
		return NewISDNode;
		}

if (Subtarget.hasVSX() &&		if (Subtarget.hasVSX() &&
PPC::isXXSLDWIShuffleMask(SVOp, ShiftElts, Swap, isLittleEndian)) {		PPC::isXXSLDWIShuffleMask(SVOp, ShiftElts, Swap, isLittleEndian)) {
if (Swap)		if (Swap)
std::swap(V1, V2);		std::swap(V1, V2);
SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);		SDValue Conv1 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
SDValue Conv2 =		SDValue Conv2 =
DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2.isUndef() ? V1 : V2);		DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V2.isUndef() ? V1 : V2);

lib/Target/PowerPC/PPCInstrAltivec.td

	def VMLADDUHM : VA1a_Int_Ty<34, "vmladduhm", int_ppc_altivec_vmladduhm, v8i16>;			def VMLADDUHM : VA1a_Int_Ty<34, "vmladduhm", int_ppc_altivec_vmladduhm, v8i16>;
	} // isCommutable			} // isCommutable

	def VPERM : VA1a_Int_Ty3<43, "vperm", int_ppc_altivec_vperm,			def VPERM : VA1a_Int_Ty3<43, "vperm", int_ppc_altivec_vperm,
	v4i32, v4i32, v16i8>;			v4i32, v4i32, v16i8>;
	def VSEL : VA1a_Int_Ty<42, "vsel", int_ppc_altivec_vsel, v4i32>;			def VSEL : VA1a_Int_Ty<42, "vsel", int_ppc_altivec_vsel, v4i32>;

	// Shuffles.			// Shuffles.
	def VSLDOI : VAForm_2<44, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB, u5imm:$SH),			def VSLDOI : VAForm_2<44, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB, u4imm:$SH),
	"vsldoi $vD, $vA, $vB, $SH", IIC_VecFP,			"vsldoi $vD, $vA, $vB, $SH", IIC_VecFP,
	[(set v16i8:$vD,			[(set v16i8:$vD,
	(vsldoi_shuffle:$SH v16i8:$vA, v16i8:$vB))]>;			(PPCvecshl v16i8:$vA, v16i8:$vB, imm32SExt16:$SH))]>;

	// VX-Form instructions. AltiVec arithmetic ops.			// VX-Form instructions. AltiVec arithmetic ops.
	let isCommutable = 1 in {			let isCommutable = 1 in {
	def VADDFP : VXForm_1<10, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),			def VADDFP : VXForm_1<10, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
	"vaddfp $vD, $vA, $vB", IIC_VecFP,			"vaddfp $vD, $vA, $vB", IIC_VecFP,
	[(set v4f32:$vD, (fadd v4f32:$vA, v4f32:$vB))]>;			[(set v4f32:$vD, (fadd v4f32:$vA, v4f32:$vB))]>;

	def VADDUBM : VXForm_1<0, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),			def VADDUBM : VXForm_1<0, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
	...

	// Match vsldoi(x,x), vpkuwum(x,x), vpkuhum(x,x)			// Match vsldoi(x,x), vpkuwum(x,x), vpkuhum(x,x)
	def:Pat<(vsldoi_unary_shuffle:$in v16i8:$vA, undef),			def:Pat<(vsldoi_unary_shuffle:$in v16i8:$vA, undef),
	(VSLDOI $vA, $vA, (VSLDOI_unary_get_imm $in))>;			(VSLDOI $vA, $vA, (VSLDOI_unary_get_imm $in))>;
	def:Pat<(vpkuwum_unary_shuffle v16i8:$vA, undef),			def:Pat<(vpkuwum_unary_shuffle v16i8:$vA, undef),
	(VPKUWUM $vA, $vA)>;			(VPKUWUM $vA, $vA)>;
	def:Pat<(vpkuhum_unary_shuffle v16i8:$vA, undef),			def:Pat<(vpkuhum_unary_shuffle v16i8:$vA, undef),
	(VPKUHUM $vA, $vA)>;			(VPKUHUM $vA, $vA)>;
				def:Pat<(vsldoi_shuffle:$SH v16i8:$vA, v16i8:$vB),
				(VSLDOI v16i8:$vA, v16i8:$vB, (VSLDOI_get_imm $SH))>;


	// Match vsldoi(y,x), vpkuwum(y,x), vpkuhum(y,x), i.e., swapped operands.			// Match vsldoi(y,x), vpkuwum(y,x), vpkuhum(y,x), i.e., swapped operands.
	// These fragments are matched for little-endian, where the inputs must			// These fragments are matched for little-endian, where the inputs must
	// be swapped for correct semantics.			// be swapped for correct semantics.
	def:Pat<(vsldoi_swapped_shuffle:$in v16i8:$vA, v16i8:$vB),			def:Pat<(vsldoi_swapped_shuffle:$in v16i8:$vA, v16i8:$vB),
	(VSLDOI $vB, $vA, (VSLDOI_swapped_get_imm $in))>;			(VSLDOI $vB, $vA, (VSLDOI_swapped_get_imm $in))>;
	def:Pat<(vpkuwum_swapped_shuffle v16i8:$vA, v16i8:$vB),			def:Pat<(vpkuwum_swapped_shuffle v16i8:$vA, v16i8:$vB),
	(VPKUWUM $vB, $vA)>;			(VPKUWUM $vB, $vA)>;
	...
	def VEXTUBRX : VX1_RT5_RA5_VB5<1805, "vextubrx", []>;			def VEXTUBRX : VX1_RT5_RA5_VB5<1805, "vextubrx", []>;
	def VEXTUHLX : VX1_RT5_RA5_VB5<1613, "vextuhlx", []>;			def VEXTUHLX : VX1_RT5_RA5_VB5<1613, "vextuhlx", []>;
	def VEXTUHRX : VX1_RT5_RA5_VB5<1869, "vextuhrx", []>;			def VEXTUHRX : VX1_RT5_RA5_VB5<1869, "vextuhrx", []>;
	def VEXTUWLX : VX1_RT5_RA5_VB5<1677, "vextuwlx", []>;			def VEXTUWLX : VX1_RT5_RA5_VB5<1677, "vextuwlx", []>;
	def VEXTUWRX : VX1_RT5_RA5_VB5<1933, "vextuwrx", []>;			def VEXTUWRX : VX1_RT5_RA5_VB5<1933, "vextuwrx", []>;

	// Vector Insert Element Instructions			// Vector Insert Element Instructions
	def VINSERTB : VX1_VT5_UIM5_VB5<781, "vinsertb", []>;			def VINSERTB : VX1_VT5_UIM5_VB5<781, "vinsertb", []>;
	def VINSERTH : VX1_VT5_UIM5_VB5<845, "vinserth", []>;			def VINSERTH : VXForm_1<845, (outs vrrc:$vD),
				(ins vrrc:$vDi, u4imm:$UIM, vrrc:$vB),
				"vinserth $vD, $vB, $UIM", IIC_VecGeneral,
				[(set v8i16:$vD, (PPCvecinsert v8i16:$vDi, v8i16:$vB,
				imm32SExt16:$UIM))]>,
				RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">;
	def VINSERTW : VX1_VT5_UIM5_VB5<909, "vinsertw", []>;			def VINSERTW : VX1_VT5_UIM5_VB5<909, "vinsertw", []>;
	def VINSERTD : VX1_VT5_UIM5_VB5<973, "vinsertd", []>;			def VINSERTD : VX1_VT5_UIM5_VB5<973, "vinsertd", []>;

	class VX_VT5_EO5_VB5<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>			class VX_VT5_EO5_VB5<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>
	: VXForm_RD5_XO5_RS5<xo, eo, (outs vrrc:$vD), (ins vrrc:$vB),			: VXForm_RD5_XO5_RS5<xo, eo, (outs vrrc:$vD), (ins vrrc:$vB),
	!strconcat(opc, " $vD, $vB"), IIC_VecGeneral, pattern>;			!strconcat(opc, " $vD, $vB"), IIC_VecGeneral, pattern>;
	class VX_VT5_EO5_VB5s<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>			class VX_VT5_EO5_VB5s<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>
	: VXForm_RD5_XO5_RS5<xo, eo, (outs vfrc:$vD), (ins vfrc:$vB),			: VXForm_RD5_XO5_RS5<xo, eo, (outs vfrc:$vD), (ins vfrc:$vB),

lib/Target/PowerPC/PPCInstrInfo.td

	def PPCaddiTlsldLAddr : SDNode<"PPCISD::ADDI_TLSLD_L_ADDR",
SDTypeProfile<1, 3, [		SDTypeProfile<1, 3, [
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>,
SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;		SDTCisSameAs<0, 3>, SDTCisInt<0> ]>>;
def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;		def PPCaddisDtprelHA : SDNode<"PPCISD::ADDIS_DTPREL_HA", SDTIntBinOp>;
def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;		def PPCaddiDtprelL : SDNode<"PPCISD::ADDI_DTPREL_L", SDTIntBinOp>;

def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;		def PPCvperm : SDNode<"PPCISD::VPERM", SDT_PPCvperm, []>;
def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;		def PPCxxsplt : SDNode<"PPCISD::XXSPLT", SDT_PPCVecSplat, []>;
def PPCxxinsert : SDNode<"PPCISD::XXINSERT", SDT_PPCVecInsert, []>;		def PPCvecinsert : SDNode<"PPCISD::VECINSERT", SDT_PPCVecInsert, []>;
def PPCxxreverse : SDNode<"PPCISD::XXREVERSE", SDT_PPCVecReverse, []>;		def PPCxxreverse : SDNode<"PPCISD::XXREVERSE", SDT_PPCVecReverse, []>;
def PPCxxpermdi : SDNode<"PPCISD::XXPERMDI", SDT_PPCxxpermdi, []>;		def PPCxxpermdi : SDNode<"PPCISD::XXPERMDI", SDT_PPCxxpermdi, []>;
def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;		def PPCvecshl : SDNode<"PPCISD::VECSHL", SDT_PPCVecShift, []>;

def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;		def PPCqvfperm : SDNode<"PPCISD::QVFPERM", SDT_PPCqvfperm, []>;
def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;		def PPCqvgpci : SDNode<"PPCISD::QVGPCI", SDT_PPCqvgpci, []>;
def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;		def PPCqvaligni : SDNode<"PPCISD::QVALIGNI", SDT_PPCqvaligni, []>;
def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;		def PPCqvesplati : SDNode<"PPCISD::QVESPLATI", SDT_PPCqvesplati, []>;

lib/Target/PowerPC/PPCInstrVSX.td

	let AddedComplexity = 400, Predicates = [HasP9Vector] in {

// Vector Insert Word		// Vector Insert Word
let UseVSXReg = 1 in {		let UseVSXReg = 1 in {
// XB NOTE: Only XB.dword[1] is used, but we use vsrc on XB.		// XB NOTE: Only XB.dword[1] is used, but we use vsrc on XB.
def XXINSERTW :		def XXINSERTW :
XX2_RD6_UIM5_RS6<60, 181, (outs vsrc:$XT),		XX2_RD6_UIM5_RS6<60, 181, (outs vsrc:$XT),
(ins vsrc:$XTi, vsrc:$XB, u4imm:$UIM),		(ins vsrc:$XTi, vsrc:$XB, u4imm:$UIM),
"xxinsertw $XT, $XB, $UIM", IIC_VecFP,		"xxinsertw $XT, $XB, $UIM", IIC_VecFP,
[(set v4i32:$XT, (PPCxxinsert v4i32:$XTi, v4i32:$XB,		[(set v4i32:$XT, (PPCvecinsert v4i32:$XTi, v4i32:$XB,
imm32SExt16:$UIM))]>,		imm32SExt16:$UIM))]>,
RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">;		RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">;

// Vector Extract Unsigned Word		// Vector Extract Unsigned Word
def XXEXTRACTUW : XX2_RD6_UIM5_RS6<60, 165,		def XXEXTRACTUW : XX2_RD6_UIM5_RS6<60, 165,
(outs vsfrc:$XT), (ins vsrc:$XB, u4imm:$UIMM),		(outs vsfrc:$XT), (ins vsrc:$XB, u4imm:$UIMM),
"xxextractuw $XT, $XB, $UIMM", IIC_VecFP, []>;		"xxextractuw $XT, $XB, $UIMM", IIC_VecFP, []>;
} // UseVSXReg = 1		} // UseVSXReg = 1

test/CodeGen/PowerPC/p9-vinsert-vextract.ll

This file was added.

				; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s
				; RUN: llc -O0 -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s
				; RUN: llc -mcpu=pwr9 -mtriple=powerpc64-unknown-linux-gnu \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s --check-prefix=CHECK-BE
				lei (Done): Can you add a short description of what each of these functions are testing?
				; RUN: llc -O0 -mcpu=pwr9 -mtriple=powerpc64-unknown-linux-gnu \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s --check-prefix=CHECK-BE

				jtony (Done): I would prefer to use non-mangled function names to make it more readable. I think you can just regenerate the IR from c source file instead of cpp file.
				; The following testcases take one halfword element from the second vector and
				; inserts it at various locations in the first vector
				define <8 x i16> @shuffle_vector_halfword_0_8(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_0_8
				; CHECK: vsldoi 3, 3, 3, 8
				; CHECK: vinserth 2, 3, 14
				; CHECK-BE-LABEL: shuffle_vector_halfword_0_8
				; CHECK-BE: vsldoi 3, 3, 3, 10
				; CHECK-BE: vinserth 2, 3, 0
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_1_15(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_1_15
				; CHECK: vsldoi 3, 3, 3, 10
				; CHECK: vinserth 2, 3, 12
				; CHECK-BE-LABEL: shuffle_vector_halfword_1_15
				; CHECK-BE: vsldoi 3, 3, 3, 8
				; CHECK-BE: vinserth 2, 3, 2
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 15, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_2_9(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_2_9
				; CHECK: vsldoi 3, 3, 3, 6
				; CHECK: vinserth 2, 3, 10
				; CHECK-BE-LABEL: shuffle_vector_halfword_2_9
				; CHECK-BE: vsldoi 3, 3, 3, 12
				; CHECK-BE: vinserth 2, 3, 4
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_3_13(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_3_13
				; CHECK: vsldoi 3, 3, 3, 14
				; CHECK: vinserth 2, 3, 8
				; CHECK-BE-LABEL: shuffle_vector_halfword_3_13
				; CHECK-BE: vsldoi 3, 3, 3, 4
				; CHECK-BE: vinserth 2, 3, 6
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 13, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_4_10(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_4_10
				; CHECK: vsldoi 3, 3, 3, 4
				; CHECK: vinserth 2, 3, 6
				; CHECK-BE-LABEL: shuffle_vector_halfword_4_10
				; CHECK-BE: vsldoi 3, 3, 3, 14
				; CHECK-BE: vinserth 2, 3, 8
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 10, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_5_14(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_5_14
				; CHECK: vsldoi 3, 3, 3, 12
				; CHECK: vinserth 2, 3, 4
				; CHECK-BE-LABEL: shuffle_vector_halfword_5_14
				; CHECK-BE: vsldoi 3, 3, 3, 6
				; CHECK-BE: vinserth 2, 3, 10
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 14, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_6_11(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_6_11
				; CHECK: vsldoi 3, 3, 3, 2
				; CHECK: vinserth 2, 3, 2
				; CHECK-BE-LABEL: shuffle_vector_halfword_6_11
				; CHECK-BE: vinserth 2, 3, 12
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 11, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_7_12(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_7_12
				; CHECK: vinserth 2, 3, 0
				; CHECK-BE-LABEL: shuffle_vector_halfword_7_12
				; CHECK-BE: vsldoi 3, 3, 3, 2
				; CHECK-BE: vinserth 2, 3, 14
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 12>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_8_1(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_8_1
				; CHECK: vsldoi 2, 2, 2, 6
				; CHECK: vinserth 3, 2, 14
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_8_1
				; CHECK-BE: vsldoi 2, 2, 2, 12
				; CHECK-BE: vinserth 3, 2, 0
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 1, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				; The following testcases take one halfword element from the first vector and
				; inserts it at various locations in the second vector
				define <8 x i16> @shuffle_vector_halfword_9_7(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_9_7
				; CHECK: vsldoi 2, 2, 2, 10
				; CHECK: vinserth 3, 2, 12
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_9_7
				; CHECK-BE: vsldoi 2, 2, 2, 8
				; CHECK-BE: vinserth 3, 2, 2
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 7, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_10_4(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_10_4
				; CHECK: vinserth 3, 2, 10
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_10_4
				; CHECK-BE: vsldoi 2, 2, 2, 2
				; CHECK-BE: vinserth 3, 2, 4
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 4, i32 11, i32 12, i32 13, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_11_2(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_11_2
				; CHECK: vsldoi 2, 2, 2, 4
				; CHECK: vinserth 3, 2, 8
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_11_2
				; CHECK-BE: vsldoi 2, 2, 2, 14
				; CHECK-BE: vinserth 3, 2, 6
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 10, i32 2, i32 12, i32 13, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_12_6(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_12_6
				; CHECK: vsldoi 2, 2, 2, 12
				; CHECK: vinserth 3, 2, 6
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_12_6
				; CHECK-BE: vsldoi 2, 2, 2, 6
				; CHECK-BE: vinserth 3, 2, 8
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 6, i32 13, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_13_3(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_13_3
				; CHECK: vsldoi 2, 2, 2, 2
				; CHECK: vinserth 3, 2, 4
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_13_3
				; CHECK-BE: vinserth 3, 2, 10
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 3, i32 14, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_14_5(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_14_5
				; CHECK: vsldoi 2, 2, 2, 14
				; CHECK: vinserth 3, 2, 2
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_14_5
				; CHECK-BE: vsldoi 2, 2, 2, 4
				; CHECK-BE: vinserth 3, 2, 12
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 5, i32 15>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_15_0(<8 x i16> %a, <8 x i16> %b) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_15_0
				; CHECK: vsldoi 2, 2, 2, 8
				; CHECK: vinserth 3, 2, 0
				; CHECK: vmr 2, 3
				; CHECK-BE-LABEL: shuffle_vector_halfword_15_0
				; CHECK-BE: vsldoi 2, 2, 2, 10
				; CHECK-BE: vinserth 3, 2, 14
				; CHECK-BE: vmr 2, 3
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 0>
				ret <8 x i16> %vecins
				}

				; The following testcases use the same vector in both arguments of the
				; shufflevector. If halfword element 3 in BE mode(or 4 in LE mode) is the one
				; we're attempting to insert, then we can use the vector insert instruction
				define <8 x i16> @shuffle_vector_halfword_0_4(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_0_4
				; CHECK: vinserth 2, 2, 14
				; CHECK-BE-LABEL: shuffle_vector_halfword_0_4
				; CHECK-BE-NOT: vinserth
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 4, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_1_3(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_1_3
				; CHECK-NOT: vinserth
				; CHECK-BE-LABEL: shuffle_vector_halfword_1_3
				; CHECK-BE: vinserth 2, 2, 2
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 3, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_2_3(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_2_3
				; CHECK-NOT: vinserth
				; CHECK-BE-LABEL: shuffle_vector_halfword_2_3
				; CHECK-BE: vinserth 2, 2, 4
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 3, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_3_4(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_3_4
				; CHECK: vinserth 2, 2, 8
				; CHECK-BE-LABEL: shuffle_vector_halfword_3_4
				; CHECK-BE-NOT: vinserth
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 4, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_4_3(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_4_3
				; CHECK-NOT: vinserth
				; CHECK-BE-LABEL: shuffle_vector_halfword_4_3
				; CHECK-BE: vinserth 2, 2, 8
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 5, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_5_3(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_5_3
				; CHECK-NOT: vinserth
				; CHECK-BE-LABEL: shuffle_vector_halfword_5_3
				; CHECK-BE: vinserth 2, 2, 10
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 3, i32 6, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_6_4(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_6_4
				; CHECK: vinserth 2, 2, 2
				; CHECK-BE-LABEL: shuffle_vector_halfword_6_4
				; CHECK-BE-NOT: vinserth
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 4, i32 7>
				ret <8 x i16> %vecins
				}

				define <8 x i16> @shuffle_vector_halfword_7_4(<8 x i16> %a) {
				entry:
				; CHECK-LABEL: shuffle_vector_halfword_7_4
				; CHECK: vinserth 2, 2, 0
				; CHECK-BE-LABEL: shuffle_vector_halfword_7_4
				; CHECK-BE-NOT: vinserth
				%vecins = shufflevector <8 x i16> %a, <8 x i16> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 4>
				ret <8 x i16> %vecins
				}
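
The `vinserth` immediates in the CHECK lines above follow the `InsertAtByte` expression from the lowering code. The sketch below restates that expression as a standalone function, together with the single-input rule described in the test comments for this revision (only halfword element 3 in BE mode, or 4 in LE mode, can be inserted without a shift). The function names `insertAtByte` and `isUnaryCandidate` are hypothetical and do not appear in the patch.

```cpp
#include <cassert>

constexpr unsigned BytesInVector = 16;

// Byte offset passed to vinserth when the inserted halfword lands in
// destination element I. Mirrors the patch's expression:
//   InsertAtByte = IsLE ? BytesInVector - (i + 1) * 2 : i * 2;
constexpr unsigned insertAtByte(unsigned I, bool IsLE) {
  return IsLE ? BytesInVector - (I + 1) * 2 : I * 2;
}

// Single-input case as described in the test comments: no shift is emitted,
// so only the source element that vinserth already reads qualifies
// (element 3 in BE mode, element 4 in LE mode). Assumption: this reflects
// the revision shown here, not necessarily the committed version.
constexpr bool isUnaryCandidate(unsigned SrcElt, bool IsLE) {
  return SrcElt == (IsLE ? 4u : 3u);
}
```

For example, `insertAtByte(0, true)` is 14 and `insertAtByte(1, false)` is 2, matching `vinserth 2, 2, 14` in `shuffle_vector_halfword_0_4` (LE) and `vinserth 2, 2, 2` in `shuffle_vector_halfword_1_3` (BE).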