This is an archive of the discontinued LLVM Phabricator instance.

Paths

lib/Target/PowerPC/
  PPCISelLowering.cpp (27 inline comments)
  PPCInstrInfo.td
  PPCInstrVSX.td (6 inline comments)
test/CodeGen/PowerPC/
  p8-scalar_vector_conversions.ll
  power9-moves-and-splats.ll
  tail-dup-analyzable-fallthrough.ll
  vsx.ll

Differential D25912

[PowerPC] Improvements for BUILD_VECTOR Vol. 1
ClosedPublic

Authored by nemanjai on Oct 24 2016, 9:59 AM.

Details

Reviewers

andreadb
echristo
kbarton
amehsan
jtony
hfinkel

Summary

This patch is the first in a series that derives from https://reviews.llvm.org/D25580 which will be abandoned because it is excessively complex.
This particular patch provides SDAG patterns for the nodes and adds the logic to decide when to expand the node vs. treating it as legal and using the SDAG patterns.
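The expand-or-keep decision described in the summary can be modelled outside LLVM. The sketch below is a toy model only: it assumes a simplified `VT` enum and plain boolean subtarget flags in place of the real `MVT` and `PPCSubtarget` queries, and the names `VT`, `Lowering`, `isRightType` and `chooseLowering` are hypothetical. It follows the control flow of the first diff: a node that is not a constant splat of at most 32 bits is kept only when VSX is available, its type has patterns, and an efficient pattern exists; otherwise it is expanded, while constant splats take the separate constant-splat path.

```cpp
#include <cassert>

// Hypothetical stand-ins for the vector types and the possible outcomes.
enum class VT { v2f64, v4f32, v2i64, v4i32, v8i16, v16i8 };
enum class Lowering { KeepAsLegal, Expand, ConstantSplatPath };

// Types with BUILD_VECTOR patterns in this model: FP vectors under VSX, and
// 32/64-bit integer vectors only when direct moves are available.
bool isRightType(VT Ty, bool HasDirectMove) {
  switch (Ty) {
  case VT::v2f64:
  case VT::v4f32:
    return true;
  case VT::v2i64:
  case VT::v4i32:
    return HasDirectMove;
  default:
    return false;
  }
}

// Toy model of the top-level choice in LowerBUILD_VECTOR (not LLVM code).
Lowering chooseLowering(bool IsConstantSplat, unsigned SplatBitSize, VT Ty,
                        bool HasVSX, bool HasDirectMove,
                        bool HaveEfficientPattern) {
  if (!IsConstantSplat || SplatBitSize > 32) {
    if (!HasVSX)
      return Lowering::Expand; // nothing better than expanding the node
    if (isRightType(Ty, HasDirectMove) && HaveEfficientPattern)
      return Lowering::KeepAsLegal; // left for the SDAG patterns to match
    return Lowering::Expand;
  }
  return Lowering::ConstantSplatPath; // all zeros, XXSPLTIB, VSPLTI[bhw], ...
}
```

For example, a non-constant v4i32 node is expanded on a VSX target without direct moves, but kept once direct moves are available.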

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 75600. Oct 24 2016, 9:59 AM

nemanjai retitled this revision from to [PowerPC] Improvements for BUILD_VECTOR Vol. 1.

nemanjai updated this object.

nemanjai added reviewers: echristo, hfinkel, andreadb, kbarton, amehsan, jtony.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Oct 24 2016, 9:59 AM

amehsan added inline comments. Oct 26 2016, 7:23 AM

lib/Target/PowerPC/PPCISelLowering.cpp
674–676	I have added a similar logic in one of the patches that I committed recently. Please make sure that we do not do this twice when you commit your code. (mine is a little bit earlier in the file IIRC)
7214–7217	Do we need a lambda that is called only once? Since this function is very long already, I think creating a separate function for this is reasonable to avoid making this function too long.

nemanjai added inline comments. Oct 26 2016, 7:48 AM

lib/Target/PowerPC/PPCISelLowering.cpp
674–676	OK, thanks for mentioning that. I'll double check.
7214–7217	I only added a lambda for two reasons:
1. The logic is very close to where it's used
2. The logic isn't needed anywhere else
But I really don't feel strongly about that reasoning and would be happy to change it to a function if that's what is preferred. What do the others think?

This patch also fixes https://llvm.org/bugs/show_bug.cgi?id=30823.

nemanjai added a subscriber: wschmidt. Oct 28 2016, 2:29 PM

echristo added inline comments. Oct 31 2016, 2:38 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7214–7217	No strong opinion here. I'm slightly more likely to want to outline code because it is a very long function, but no strong opinion.

Can you please add some more comments in the summary about the intention of this group of patches?

lib/Target/PowerPC/PPCISelLowering.cpp
7203	Would it be better to add this check in the condition above? Then at least the remaining logic might be able to lower this, instead of just punting to the default case. If not, then I think a comment here is justified.
7267	I don't follow this logic. Doesn't the isBuildVectorAllOnes already account for any Undefs in the vector?
lib/Target/PowerPC/PPCInstrVSX.td
1383	Is there a way we can do this without the use of AddedComplexity?
2752	Please convert to C++ comments

nemanjai added inline comments. Nov 1 2016, 2:38 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7203	Do you mean add this condition to the outer if? I don't think that would have the correct semantics. We don't want to bail on constant splats regardless of whether we have VSX or not. And if this is not a constant splat, we may be able to do something better depending on a number of conditions - however, without VSX, we can't do anything very productive so we bail. How about the following comment:
// BUILD_VECTOR nodes that are not constant splats of up to 32-bits can be
// lowered to VSX instructions under certain conditions.
// Without VSX however, there is no pattern more efficient than expanding the node.
7214–7217	Two reviewers constitute a consensus in my opinion. I'll convert this to a static function.
7267	Neither condition is strictly a strengthened version of the other. Think about a build vector node such as this: (v16i8 (build_vector i8:3, i8:3, i8:3, i8:undef, i8:undef, i8:3, ...)) That one has undefs and the splat size is 1 byte but it is not a BUILD_VECTOR of all ones. Although it is certainly possible for a node to have undefs and be a BUILD_VECTOR of all ones.
lib/Target/PowerPC/PPCInstrVSX.td
1383	Are you suggesting that we do away with AddedComplexity from VSX instruction definitions altogether? This was just added for consistency as it was initially missing. Ultimately, having it there doesn't change anything (at this time) because these instructions match PPCISD nodes that are not matched by anything else. However, in the highly unlikely situation where some VMX instructions are added in the future that match these nodes, it is good to have the AddedComplexity there as it appears in the remainder of the VSX target definition file. If you want, I can certainly remove this instance of it without affecting anything this patch does.
2752	OK, will do.
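kbarton asked whether the `!Subtarget.hasVSX()` check could move into the outer condition. The sketch below compares, in a boolean toy model, the control flow as written with two hypothetical ways of folding that check in; `Route`, `asWritten`, `orIntoOuter` and `andIntoOuter` are illustrative names only, and `IsConstSplat` stands for "a constant splat of at most 32 bits". Folding with `||` makes constant splats bail on non-VSX targets (nemanjai's first objection); folding with `&&` lets non-splat nodes reach the constant-splat code, where the splat value is presumably not meaningful.

```cpp
#include <cassert>

enum class Route { Expand, TryVSXPatterns, ConstantSplatPath };

// As written in the patch: the VSX check sits inside the non-splat block.
Route asWritten(bool IsConstSplat, bool HasVSX) {
  if (!IsConstSplat)
    return HasVSX ? Route::TryVSXPatterns : Route::Expand;
  return Route::ConstantSplatPath; // e.g. VSPLTI[bhw], which needs no VSX
}

// Hypothetical variant 1: "|| !HasVSX" added to the outer condition.
Route orIntoOuter(bool IsConstSplat, bool HasVSX) {
  if (!IsConstSplat || !HasVSX)
    return HasVSX ? Route::TryVSXPatterns : Route::Expand;
  return Route::ConstantSplatPath;
}

// Hypothetical variant 2: "&& HasVSX" added to the outer condition.
Route andIntoOuter(bool IsConstSplat, bool HasVSX) {
  if (!IsConstSplat && HasVSX)
    return Route::TryVSXPatterns;
  return Route::ConstantSplatPath; // also reached by non-splats without VSX
}
```

With VSX available all three agree; they differ only on non-VSX targets.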

In D25912#584201, @kbarton wrote:

Can you please add some more comments in the summary about the intention of this group of patches?

Perhaps it was an error to put the full description of the reasoning behind all of the patches into the very last patch rather than the first...
Does the description in https://reviews.llvm.org/D26066 suffice for what you're after or would you like further commentary in the patch or code?

Removed an unnecessary lambda and converted it to a static function.
Simplified the logic.
Converted comments as requested.

kbarton added inline comments. Nov 9 2016, 11:44 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7078–7079	The current comment says that this can be done for non constant splats, but this seems to indicate it cannot.
7082	I don't think I've ever seen \param used like this. Typically you have a more detailed description following the \brief description, followed by a list of the parameters and the return value. Something like:
\param V The BuildVectorSDNode to analyze
\param HasDirectMove - indicates whether the DirectMove instruction is available
\returns true if there is a pattern in a .td file for this node, false otherwise.
Also, as a minor nit, I find the description of the logic here a bit convoluted. I generally prefer more high-level English descriptions for the logic:
There are several efficient patterns for BUILD_VECTORs. If it is a constant splat, it is handled <blah>. If it is a non-constant splat, the following conditions must be true:
- It is not a load-and-splat
- It is either a floating-point vector or an integer vector on a target with direct moves.
This needs to be refined somewhat, because I don't think it actually matches the logic in the function, but hopefully you get the idea.
7094	This also looks a bit suspicious, but that's mostly because I'm not clear on the exact conditions under which we have efficient patterns.
7203	Yup, this is good. Thanks.
7267	I still don't follow the logic. In the example above, do we want to match it or not? Based on the comments above, we want to (all elements are either 3, or undef). How do we guarantee that all the elements are the same constant? I don't see that check anywhere.
lib/Target/PowerPC/PPCInstrVSX.td
1383	My (slight) preference is to not use it unless it is absolutely necessary. Ideally we'll remove the need for it at some point, but that is a big piece of work. For now, I don't think we should be using it unless it's absolutely necessary.
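nemanjai's claim in the 7267 thread, that neither `HasAnyUndefs` nor `ISD::isBuildVectorAllOnes` implies the other, can be checked with a toy model of a v16i8 node in which `std::nullopt` stands for an undef element. `ByteVec`, `hasAnyUndefs` and `isAllOnes` are hypothetical; `isAllOnes` is an assumed, simplified reimplementation (defined elements must be all-ones, undefs are tolerated, an all-undef vector is rejected), not the LLVM function itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Toy v16i8 BUILD_VECTOR: std::nullopt models an undef element.
using ByteVec = std::array<std::optional<uint8_t>, 16>;

bool hasAnyUndefs(const ByteVec &V) {
  for (const auto &E : V)
    if (!E)
      return true;
  return false;
}

// Every defined element is 0xFF; undefs are skipped; all-undef is rejected.
bool isAllOnes(const ByteVec &V) {
  bool SawDefined = false;
  for (const auto &E : V) {
    if (!E)
      continue;
    if (*E != 0xFF)
      return false;
    SawDefined = true;
  }
  return SawDefined;
}
```

nemanjai's example (3s with two undef lanes) has undefs but is not all ones, while a vector of sixteen 0xFF bytes is all ones without undefs, so the `HasAnyUndefs || isBuildVectorAllOnes` test is not redundant.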

nemanjai added inline comments. Nov 10 2016, 7:49 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7078–7079	This function is only called if the node was already confirmed not to be a constant splat (see below). So if the node is constant, it's building a vector out of different constants and it is beneficial to expand the BUILD_VECTOR so we can just get a LOAD node (from the constant pool).
7082	To be perfectly honest with you, I'm not very familiar with Doxygen and this was based on a very quick reading about how these are to be written in LLVM code (so is probably wrong). I'll just follow the pattern you suggested. How about this for the description:
// There are some patterns where it is beneficial to keep a BUILD_VECTOR
// node as a BUILD_VECTOR node rather than expanding it. The patterns where
// the opposite is true (expansion is beneficial) are:
// The node builds a vector out of integers that are not 32 or 64-bits
// The node builds a vector out of constants
// The node is a "load-and-splat"
// In all other cases, we will choose to keep the BUILD_VECTOR.
7094	I think hopefully the updated comment will clarify this.
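The expansion cases in the proposed description, written as a static helper as agreed in the 7214–7217 thread, can be sketched with a simplified operand model. Everything here is hypothetical (`Opc`, `Operand`, `isLoadLike`, and this version of `haveEfficientPattern`): an `Id` stands in for SDValue identity and `OnlyUserIsBV` for `isOnlyUserOf`. The loop mirrors the lambda in Diff 75600, including the detail that once a load-like operand is seen, later operands skip the single-user check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a BUILD_VECTOR operand.
enum class Opc { Constant, Undef, Load, FPRound, FPToSInt, FPToUInt, Other };

struct Operand {
  int Id;            // equal Id means "the same SDValue"
  Opc Opcode;
  Opc SrcOpcode;     // opcode of operand 0 (only used for the conversions)
  bool OnlyUserIsBV; // true if the BUILD_VECTOR is this value's only user
};

// A load, or an FP_ROUND / FP_TO_SINT / FP_TO_UINT of a load.
static bool isLoadLike(const Operand &O) {
  if (O.Opcode == Opc::Load)
    return true;
  bool IsConv = O.Opcode == Opc::FPRound || O.Opcode == Opc::FPToSInt ||
                O.Opcode == Opc::FPToUInt;
  return IsConv && O.SrcOpcode == Opc::Load;
}

// Toy static-function version of the haveEfficientPattern lambda.
static bool haveEfficientPattern(const std::vector<Operand> &Ops) {
  bool AllConstant = true; // like isConstant(): only constants and undefs
  for (const Operand &O : Ops)
    if (O.Opcode != Opc::Constant && O.Opcode != Opc::Undef)
      AllConstant = false;
  if (AllConstant)
    return false; // expand: a constant-pool load is preferable

  bool IsSplat = true, IsLoad = false;
  for (const Operand &O : Ops) {
    if (O.Opcode == Opc::Undef)
      return false;
    if (isLoadLike(O))
      IsLoad = true;
    if (O.Id != Ops[0].Id || (!IsLoad && !O.OnlyUserIsBV))
      IsSplat = false;
  }
  return !(IsSplat && IsLoad); // a load-and-splat is expanded
}
```

The "integers that are not 32 or 64 bits" case is handled separately by the `RightType` check, so it does not appear in this helper.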

kbarton added inline comments. Nov 10 2016, 11:54 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7078–7079	That's probably worth an inline comment ;)
7082	This looks fine. The only minor comment would be to indent and bullet/number the 3 lines that make up the list. That makes them stand out visually. You also need to use /// (3 slashes) to indicate it's something Doxygen can format. And make sure to keep the \brief at the beginning - that is important (and quite useful).
7094	Yes, this is good now.

mehdi_amini added inline comments. Nov 10 2016, 12:06 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7082	Side note: the use of `\brief` is deprecated in llvm, we turned on autobrief some time ago. (Ref: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments )
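Combining kbarton's formatting advice with mehdi_amini's note, a comment in this style might look like the sketch below: `///` comments, no `\brief` (under autobrief the first sentence serves as the brief), a numbered list, and `\param`/`\returns`. The function `keepBuildVector` and its parameters are hypothetical; its body only restates the three expansion cases from nemanjai's proposed wording.

```cpp
#include <cassert>

/// Decide whether a BUILD_VECTOR node should be kept for the SDAG patterns.
///
/// It is beneficial to expand the node instead when:
///   1. It builds a vector out of integers that are not 32 or 64 bits.
///   2. It builds a vector out of constants.
///   3. It is a "load-and-splat".
/// In all other cases the BUILD_VECTOR is kept.
///
/// \param IntEltNot32Or64 true if the elements are integers other than i32/i64.
/// \param IsConstant true if every operand is a constant (or undef).
/// \param IsLoadAndSplat true if the node splats a single loaded value.
/// \returns true if the node should be kept as a BUILD_VECTOR.
static bool keepBuildVector(bool IntEltNot32Or64, bool IsConstant,
                            bool IsLoadAndSplat) {
  return !IntEltNot32Or64 && !IsConstant && !IsLoadAndSplat;
}
```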

nemanjai added inline comments. Nov 10 2016, 12:36 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7078–7079	OK, I'll add that.
// This function is called in a block that confirms the node is not a constant splat.
// So a constant BUILD_VECTOR here means the vector is built out of different constants.
7082	OK, thanks. I'll format it accordingly.
7267	So at this point, we have determined that this is a splat of a constant and that we can build this vector by splatting a 1-byte value. We also have the value of that constant (`SplatBits`). However, the actual node may have some undefs. What this code will do is change any inputs into the value that needs to be splat to build this vector. In the example above, we'll just replace all inputs with constant 3 and leave everything the same so the BUILD_VECTOR node will be matched in the .td file.
lib/Target/PowerPC/PPCInstrVSX.td
1383	Yeah, for sure. I'll remove this. Direct moves do not have a corresponding VMX alternative so this accomplishes nothing.
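nemanjai's answer to the 7267 question can be illustrated in two steps: the splat check has already found that the whole vector repeats a single byte, and the rewrite then fills every lane, undef or not, with that byte. In the sketch below, `splatBitSize` is a hypothetical, simplified version of the halving that `BuildVectorSDNode::isConstantSplat` performs (it ignores undef bits and endianness), and `rebuildByteSplat` is a hypothetical stand-in for building the 16-element `Ops` vector from `SplatBits`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Smallest width (>= 8 bits) at which the 32-bit pattern V repeats.
unsigned splatBitSize(uint32_t V) {
  unsigned Size = 32;
  while (Size > 8) {
    unsigned Half = Size / 2;
    uint32_t Mask = (1u << Half) - 1;
    if (((V >> Half) & Mask) != (V & Mask))
      break; // the two halves differ, so stop narrowing
    Size = Half;
    V &= Mask;
  }
  return Size;
}

// Rebuild a 1-byte splat as 16 copies of SplatByte, filling undef lanes too.
std::array<uint8_t, 16>
rebuildByteSplat(const std::array<std::optional<uint8_t>, 16> &Elts,
                 uint8_t SplatByte) {
  std::array<uint8_t, 16> Out{};
  for (std::size_t I = 0; I < Elts.size(); ++I) {
    assert(!Elts[I] || *Elts[I] == SplatByte); // defined lanes already agree
    Out[I] = SplatByte;
  }
  return Out;
}
```

For instance, a v4i32 splat of 0x03030303 narrows to 8 bits, so `SplatSize == 1` and the XXSPLTIB path applies even though the node is not v16i8; the rebuilt v16i8 is then bitcast back.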

LGTM

lib/Target/PowerPC/PPCISelLowering.cpp
7082	@mehdi_amini Thanks for the pointer - I didn't realize this! Although, it's not immediately obvious (to me at least) from that text that \brief has been deprecated in favour of autobrief. How does one go about updating that document?

This revision is now accepted and ready to land. Nov 10 2016, 1:01 PM

mehdi_amini added inline comments. Nov 10 2016, 1:02 PM

lib/Target/PowerPC/PPCISelLowering.cpp
7082	Send a patch for llvm/docs/CodingStandards.rst

Committed revision 288152.

Revision Contents

Path                                                      Size
lib/Target/PowerPC/PPCISelLowering.cpp                    82 lines
lib/Target/PowerPC/PPCInstrInfo.td                        1 line
lib/Target/PowerPC/PPCInstrVSX.td                         326 lines
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll      8 lines
test/CodeGen/PowerPC/power9-moves-and-splats.ll           18 lines
test/CodeGen/PowerPC/tail-dup-analyzable-fallthrough.ll   2 lines
test/CodeGen/PowerPC/vsx.ll                               8 lines

Diff 75600

lib/Target/PowerPC/PPCISelLowering.cpp

@@ 665 unchanged lines hidden @@ if (Subtarget.hasVSX()) {
       setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::v2i16, Custom);
       setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::v2i8, Custom);
 
       setOperationAction(ISD::FNEG, MVT::v4f32, Legal);
       setOperationAction(ISD::FNEG, MVT::v2f64, Legal);
       setOperationAction(ISD::FABS, MVT::v4f32, Legal);
       setOperationAction(ISD::FABS, MVT::v2f64, Legal);
 
+      if (Subtarget.hasDirectMove())
+        setOperationAction(ISD::BUILD_VECTOR, MVT::v2i64, Custom);
+      setOperationAction(ISD::BUILD_VECTOR, MVT::v2f64, Custom);
 
       addRegisterClass(MVT::v2i64, &PPC::VSRCRegClass);
     }
 
     if (Subtarget.hasP8Altivec()) {
       addRegisterClass(MVT::v2i64, &PPC::VRRCRegClass);
       addRegisterClass(MVT::v1i128, &PPC::VRRCRegClass);
     }
 
     if (Subtarget.hasP9Vector()) {
       setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v4i32, Custom);
       setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v4f32, Custom);
     }
 
-    if (Subtarget.isISA3_0() && Subtarget.hasDirectMove())
-      setOperationAction(ISD::BUILD_VECTOR, MVT::v2i64, Custom);
   }
 
   if (Subtarget.hasQPX()) {
     setOperationAction(ISD::FADD, MVT::v4f64, Legal);
     setOperationAction(ISD::FSUB, MVT::v4f64, Legal);
     setOperationAction(ISD::FMUL, MVT::v4f64, Legal);
     setOperationAction(ISD::FREM, MVT::v4f64, Expand);
 
@@ 6,372 unchanged lines hidden @@ static SDValue BuildVSLDOI(SDValue LHS, SDValue RHS, unsigned Amt, EVT VT,
   RHS = DAG.getNode(ISD::BITCAST, dl, MVT::v16i8, RHS);
 
   int Ops[16];
   for (unsigned i = 0; i != 16; ++i)
     Ops[i] = i + Amt;
   SDValue T = DAG.getVectorShuffle(MVT::v16i8, dl, LHS, RHS, Ops);
   return DAG.getNode(ISD::BITCAST, dl, VT, T);
 }
 
-static bool isNonConstSplatBV(BuildVectorSDNode *BVN, EVT Type) {
-  if (BVN->isConstant() || BVN->getValueType(0) != Type)
-    return false;
-  auto OpZero = BVN->getOperand(0);
-  for (int i = 1, e = BVN->getNumOperands(); i < e; i++)
-    if (BVN->getOperand(i) != OpZero)
-      return false;
-  return true;
-}
 
 // If this is a case we can't handle, return null and let the default
 // expansion code take care of it. If we CAN select this case, and if it
 // selects to a single instruction, return Op. Otherwise, if we can codegen
 // this case more efficiently than a constant pool load, lower it to the
 // sequence of ops that should be used.
 SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
                                              SelectionDAG &DAG) const {
   SDLoc dl(Op);
   BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(Op.getNode());
   assert(BVN && "Expected a BuildVectorSDNode in LowerBUILD_VECTOR");
 
   if (Subtarget.hasQPX() && Op.getValueType() == MVT::v4i1) {
     // We first build an i32 vector, load it into a QPX register,
     // then convert it to a floating-point vector and compare it
     // to a zero vector to get the boolean result.
     MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
     int FrameIdx = MFI.CreateStackObject(16, 16, false);
     MachinePointerInfo PtrInfo =
         MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
     EVT PtrVT = getPointerTy(DAG.getDataLayout());
     SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);
 
     assert(BVN->getNumOperands() == 4 &&
            "BUILD_VECTOR for v4i1 does not have 4 operands");
▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,

// Check if this is a splat of a constant value.		// Check if this is a splat of a constant value.
APInt APSplatBits, APSplatUndef;		APInt APSplatBits, APSplatUndef;
unsigned SplatBitSize;		unsigned SplatBitSize;
bool HasAnyUndefs;		bool HasAnyUndefs;
if (! BVN->isConstantSplat(APSplatBits, APSplatUndef, SplatBitSize,		if (! BVN->isConstantSplat(APSplatBits, APSplatUndef, SplatBitSize,
HasAnyUndefs, 0, !Subtarget.isLittleEndian()) \|\|		HasAnyUndefs, 0, !Subtarget.isLittleEndian()) \|\|
SplatBitSize > 32) {		SplatBitSize > 32) {
// We can splat a non-const value on CPU's that implement ISA 3.0		if (!Subtarget.hasVSX())
// in two ways: LXVWSX (load and splat) and MTVSRWS(move and splat).		return SDValue();
auto OpZero = BVN->getOperand(0);
		kbartonUnsubmitted Not Done Reply Inline Actions Would it be better to add this check in the condition above? Then at least the remaining logic might be able to lower this, instead of just punting to the default case. If not, then I think a comment here is justified. kbarton: Would it be better to add this check in the condition above? Then at least the remaining logic…
		nemanjaiAuthorUnsubmitted Not Done Reply Inline Actions Do you mean add this condition to the outer if? I don't think that would have the correct semantics. We don't want to bail on constant splats regardless of whether we have VSX or not. And if this is not a constant splat, we may be able to do something better depending on a number of conditions - however, without VSX, we can't do anything very productive so we bail. How about the following comment: // BUILD_VECTOR nodes that are not constant splats of up to 32-bits can be // lowered to VSX instructions under certain conditions. // Without VSX however, there is no pattern more efficient than expanding the node. nemanjai: Do you mean add this condition to the outer if? I don't think that would have the correct…
		kbartonUnsubmitted Not Done Reply Inline Actions Yup, this is good. Thanks. kbarton: Yup, this is good. Thanks.
bool CanLoadAndSplat = OpZero.getOpcode() == ISD::LOAD &&		// SDAG patterns are provided for building vectors out of values that are
BVN->isOnlyUserOf(OpZero.getNode());		// in registers.
if (Subtarget.isISA3_0() && !CanLoadAndSplat &&		bool RightType = Op.getValueType() == MVT::v2f64 \|\|
(isNonConstSplatBV(BVN, MVT::v4i32) \|\|		Op.getValueType() == MVT::v4f32 \|\|
isNonConstSplatBV(BVN, MVT::v2i64)))		(Op.getValueType() == MVT::v2i64 && Subtarget.hasDirectMove()) \|\|
		(Op.getValueType() == MVT::v4i32 && Subtarget.hasDirectMove());

		// We have efficient patterns for BUILD_VECTOR nodes whose inputs
		// are non-constant and non-undef. Also, if this is a load-and-splat,
		// it is better handled through (splat (scalar_to_vector)).
		auto haveEfficientPattern = [&](BuildVectorSDNode *V) -> bool {
		bool IsSplat = true;
		bool IsLoad = false;
		SDValue Op0 = V->getOperand(0);
		amehsanUnsubmitted Not Done Reply Inline Actions Do we need a lambda that is called only once? Since this function is very long already, I think creating a separate function for this is reasonable to avoid making this function too long. amehsan: Do we need a lambda that is called only once? Since this function is very long already, I think…
		nemanjaiAuthorUnsubmitted Not Done Reply Inline Actions I only added a lambda for two reasons: The logic is very close to where it's used The logic isn't needed anywhere else But I really don't feel strongly about that reasoning and would be happy to change it to a function if that's what is preferred. What do the others think? nemanjai: I only added a lambda for two reasons: 1. The logic is very close to where it's used 2. The…
		echristoUnsubmitted Not Done Reply Inline Actions No strong opinion here. I'm slightly more likely to want to outline code because it is a very long function, but no strong opinion. echristo: No strong opinion here. I'm slightly more likely to want to outline code because it is a very…
		nemanjaiAuthorUnsubmitted Not Done Reply Inline Actions Two reviewers constitute a consensus in my opinion. I'll convert this to a static function. nemanjai: Two reviewers constitute a consensus in my opinion. I'll convert this to a static function.
		if (V->isConstant())
		return false;
		for (int i = 0, e = V->getNumOperands(); i < e; i++) {
		if (V->getOperand(i).isUndef())
		return false;
		// We want to expand nodes that represent load-and-splat even if the
		// loaded value is a floating point truncation or conversion to int.
		if (V->getOperand(i).getOpcode() == ISD::LOAD \|\|
		(V->getOperand(i).getOpcode() == ISD::FP_ROUND &&
		V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD) \|\|
		(V->getOperand(i).getOpcode() == ISD::FP_TO_SINT &&
		V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD) \|\|
		(V->getOperand(i).getOpcode() == ISD::FP_TO_UINT &&
		V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD))
		IsLoad = true;
		// If the operands are different or the input is not a load and has more
		// uses than just this BV node, then it isn't a splat.
		if (V->getOperand(i) != Op0 \|\|
		(!IsLoad && !V->isOnlyUserOf(V->getOperand(i).getNode())))
		IsSplat = false;
		}
		return !(IsSplat && IsLoad);
		};
		if (RightType && haveEfficientPattern(BVN))
return Op;		return Op;
return SDValue();		return SDValue();
}		}

unsigned SplatBits = APSplatBits.getZExtValue();		unsigned SplatBits = APSplatBits.getZExtValue();
unsigned SplatUndef = APSplatUndef.getZExtValue();		unsigned SplatUndef = APSplatUndef.getZExtValue();
unsigned SplatSize = SplatBitSize / 8;		unsigned SplatSize = SplatBitSize / 8;

// First, handle single instruction cases.		// First, handle single instruction cases.

// All zeros?		// All zeros?
if (SplatBits == 0) {		if (SplatBits == 0) {
// Canonicalize all zero vectors to be v4i32.		// Canonicalize all zero vectors to be v4i32.
if (Op.getValueType() != MVT::v4i32 \|\| HasAnyUndefs) {		if (Op.getValueType() != MVT::v4i32 \|\| HasAnyUndefs) {
SDValue Z = DAG.getConstant(0, dl, MVT::v4i32);		SDValue Z = DAG.getConstant(0, dl, MVT::v4i32);
Op = DAG.getNode(ISD::BITCAST, dl, Op.getValueType(), Z);		Op = DAG.getNode(ISD::BITCAST, dl, Op.getValueType(), Z);
}		}
return Op;		return Op;
}		}

// We have XXSPLTIB for constant splats one byte wide		// We have XXSPLTIB for constant splats one byte wide
if (Subtarget.isISA3_0() && Op.getValueType() == MVT::v16i8)		if (Subtarget.hasP9Vector() && SplatSize == 1) {
		// This is a splat of 1-byte elements with some elements potentially undef.
		// Rather than trying to match undef in the SDAG patterns, ensure that all
		// elements are the same constant.
		if (HasAnyUndefs \|\| ISD::isBuildVectorAllOnes(BVN)) {
		kbartonUnsubmitted Not Done Reply Inline Actions I don't follow this logic. Doesn't the isBuildVectorAllOnes already account for any Undefs in the vector? kbarton: I don't follow this logic. Doesn't the isBuildVectorAllOnes already account for any Undefs in…
		nemanjaiAuthorUnsubmitted Not Done Reply Inline Actions Neither condition is strictly a strengthened version of the other. Think about a build vector node such as this: (v16i8 (build_vector i8:3, i8:3, i8:3, i8:undef, i8:undef, i8:3, ...)) That one has undefs and the splat size is 1 byte but it is not a BUILD_VECTOR of all ones. Although it is certainly possible for a node to have undefs and be a BUILD_VECTOR of all ones. nemanjai: Neither condition is strictly a strengthened version of the other. Think about a build vector…
		kbartonUnsubmitted Not Done Reply Inline Actions I still don't follow the logic. In the example above, do we want to match it or not? Based on the comments above, we want to (all elements are either 3, or undef). How do we guarantee that all the all elements are the same constant - I don't see that check anywhere. kbarton: I still don't follow the logic. In the example above, do we want to match it or not? Based on…
		nemanjaiAuthorUnsubmitted Not Done Reply Inline Actions So at this point, we have determined that this is a splat of a constant and that we can build this vector by splatting a 1-byte value. We also have the value of that constant (`SplatBits`). However, the actual node may have some undefs. What this code will do is change any inputs into the value that needs to be splat to build this vector. In the example above, we'll just replace all inputs with constant 3 and leave everything the same so the BUILD_VECTOR node will be matched in the .td file. nemanjai: So at this point, we have determined that this is a splat of a constant and that we can build…
		SmallVector<SDValue, 16> Ops(16, DAG.getConstant(SplatBits,
		dl, MVT::i32));
		SDValue NewBV = DAG.getBuildVector(MVT::v16i8, dl, Ops);
		if (Op.getValueType() != MVT::v16i8)
		return DAG.getBitcast(Op.getValueType(), NewBV);
		return NewBV;
		}
return Op;		return Op;
		}

// If the sign extended value is in the range [-16,15], use VSPLTI[bhw].		// If the sign extended value is in the range [-16,15], use VSPLTI[bhw].
int32_t SextVal= (int32_t(SplatBits << (32-SplatBitSize)) >>		int32_t SextVal= (int32_t(SplatBits << (32-SplatBitSize)) >>
(32-SplatBitSize));		(32-SplatBitSize));
if (SextVal >= -16 && SextVal <= 15)		if (SextVal >= -16 && SextVal <= 15)
return BuildSplatI(SextVal, SplatSize, Op.getValueType(), DAG, dl);		return BuildSplatI(SextVal, SplatSize, Op.getValueType(), DAG, dl);
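The `SextVal` computation shifts the splat value up until its top bit sits in bit 31, then shifts it back arithmetically to sign-extend it, and tests the result against the 5-bit signed range that VSPLTISB, VSPLTISH and VSPLTISW can encode. A small sketch of the same arithmetic, assuming a 32-bit unsigned `SplatBits` and an arithmetic right shift for negative values (what GCC and Clang do, and what C++20 guarantees); the function names are hypothetical:

```cpp
#include <cstdint>

// Sign-extends the low SplatBitSize bits of SplatBits to 32 bits with the
// shift-left / arithmetic-shift-right idiom used above.
// Precondition: 1 <= SplatBitSize <= 32.
constexpr std::int32_t signExtendSplat(std::uint32_t SplatBits,
                                       unsigned SplatBitSize) {
  return std::int32_t(SplatBits << (32 - SplatBitSize)) >> (32 - SplatBitSize);
}

// VSPLTI[BHW] take a 5-bit signed immediate, i.e. [-16, 15].
constexpr bool fitsVSPLTI(std::int32_t SextVal) {
  return SextVal >= -16 && SextVal <= 15;
}
```

For an 8-bit splat of 0xF0 this yields -16 (in range), while 0x10 yields 16 and falls through to the two-instruction sequences.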

// Two instruction sequences.		// Two instruction sequences.
… 231 lines not shown …	SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
}		}

if (Subtarget.hasVSX()) {		if (Subtarget.hasVSX()) {
if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {		if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);		int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);

// If the source for the shuffle is a scalar_to_vector that came from a		// If the source for the shuffle is a scalar_to_vector that came from a
// 32-bit load, it will have used LXVWSX so we don't need to splat again.		// 32-bit load, it will have used LXVWSX so we don't need to splat again.
if (Subtarget.isISA3_0() &&		if (Subtarget.hasP9Vector() &&
((isLittleEndian && SplatIdx == 3) ||		((isLittleEndian && SplatIdx == 3) ||
(!isLittleEndian && SplatIdx == 0))) {		(!isLittleEndian && SplatIdx == 0))) {
SDValue Src = V1.getOperand(0);		SDValue Src = V1.getOperand(0);
if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&		if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&
Src.getOperand(0).getOpcode() == ISD::LOAD &&		Src.getOperand(0).getOpcode() == ISD::LOAD &&
Src.getOperand(0).hasOneUse())		Src.getOperand(0).hasOneUse())
return V1;		return V1;
}		}
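Both arms of the endianness test above are meant to select the same lane: element 0, the only lane SCALAR_TO_VECTOR defines. A minimal constexpr sketch, assuming (as `PPC::getVSPLTImmediate` appears to do) that the splat immediate uses big-endian word numbering, so little-endian element i becomes word 3 - i; both helper names are hypothetical:

```cpp
// Hypothetical helper: map the word element selected by the shuffle mask
// (element 0 = first element in IR order) to a big-endian word number.
constexpr unsigned splatImmForWord(unsigned maskWord, bool isLittleEndian) {
  return isLittleEndian ? 3 - maskWord : maskWord;
}

// The condition in the lowering code: the splat reads element 0, the lane
// that SCALAR_TO_VECTOR defines. LXVWSX has already replicated the loaded
// word into every lane, so a second splat would be redundant.
constexpr bool splatsScalarToVectorLane(unsigned splatIdx, bool isLittleEndian) {
  return isLittleEndian ? splatIdx == 3 : splatIdx == 0;
}
```

The other lanes of a SCALAR_TO_VECTOR are undefined, so element 0 is the only lane whose splat is guaranteed to reproduce the loaded value.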
… 4,873 lines not shown …

lib/Target/PowerPC/PPCInstrInfo.td

… 321 lines not shown …	def imm64SExt16 : Operand<i64>, ImmLeaf<i64, [{
return (int64_t)Imm == (short)Imm;		return (int64_t)Imm == (short)Imm;
}]>;		}]>;
def immZExt16 : PatLeaf<(imm), [{		def immZExt16 : PatLeaf<(imm), [{
// immZExt16 predicate - True if the immediate fits in a 16-bit zero extended		// immZExt16 predicate - True if the immediate fits in a 16-bit zero extended
// field. Used by instructions like 'ori'.		// field. Used by instructions like 'ori'.
return (uint64_t)N->getZExtValue() == (unsigned short)N->getZExtValue();		return (uint64_t)N->getZExtValue() == (unsigned short)N->getZExtValue();
}], LO16>;		}], LO16>;
def immSExt8 : ImmLeaf<i32, [{ return isInt<8>(Imm); }]>;		def immSExt8 : ImmLeaf<i32, [{ return isInt<8>(Imm); }]>;
		def immSExt5NonZero : ImmLeaf<i32, [{ return Imm && isInt<5>(Imm); }]>;
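`immSExt5NonZero` accepts exactly the nonzero values in [-16, 15], the range of the 5-bit signed immediate that VSPLTISW takes, which is how the P8 constant-splat pattern in PPCInstrVSX.td uses it. Zero is excluded, presumably because all-zero vectors are matched by other patterns. The C++ below restates both leaves; `fitsSignedBits` is a hypothetical stand-in written in the spirit of `llvm::isInt<N>` from `llvm/Support/MathExtras.h`:

```cpp
#include <cstdint>

// True if x is representable as an N-bit two's complement value.
template <unsigned N>
constexpr bool fitsSignedBits(std::int64_t x) {
  static_assert(N > 0 && N < 64, "N must be in (0, 64)");
  return x >= -(std::int64_t{1} << (N - 1)) && x < (std::int64_t{1} << (N - 1));
}

// Mirrors immSExt8: any value representable as a signed byte.
constexpr bool matchesImmSExt8(std::int32_t imm) { return fitsSignedBits<8>(imm); }

// Mirrors immSExt5NonZero: nonzero and representable in 5 signed bits.
constexpr bool matchesImmSExt5NonZero(std::int32_t imm) {
  return imm != 0 && fitsSignedBits<5>(imm);
}
```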

// imm16Shifted* - These match immediates where the low 16-bits are zero. There		// imm16Shifted* - These match immediates where the low 16-bits are zero. There
// are two forms: imm16ShiftedSExt and imm16ShiftedZExt. These two forms are		// are two forms: imm16ShiftedSExt and imm16ShiftedZExt. These two forms are
// identical in 32-bit mode, but in 64-bit mode, they return true if the		// identical in 32-bit mode, but in 64-bit mode, they return true if the
// immediate fits into a sign/zero extended 32-bit immediate (with the low bits		// immediate fits into a sign/zero extended 32-bit immediate (with the low bits
// clear).		// clear).
def imm16ShiftedZExt : PatLeaf<(imm), [{		def imm16ShiftedZExt : PatLeaf<(imm), [{
// imm16ShiftedZExt predicate - True if only bits in the top 16-bits of the		// imm16ShiftedZExt predicate - True if only bits in the top 16-bits of the
… 4,062 lines not shown …

lib/Target/PowerPC/PPCInstrVSX.td

… 564 lines not shown …	let Uses = [RM] in {
// Conversion Instructions		// Conversion Instructions
def XSCVDPSP : XX2Form<60, 265,		def XSCVDPSP : XX2Form<60, 265,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpsp $XT, $XB", IIC_VecFP, []>;		"xscvdpsp $XT, $XB", IIC_VecFP, []>;
def XSCVDPSXDS : XX2Form<60, 344,		def XSCVDPSXDS : XX2Form<60, 344,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpsxds $XT, $XB", IIC_VecFP,		"xscvdpsxds $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctidz f64:$XB))]>;		[(set f64:$XT, (PPCfctidz f64:$XB))]>;
		let isCodeGenOnly = 1 in
		def XSCVDPSXDSs : XX2Form<60, 344,
		(outs vssrc:$XT), (ins vssrc:$XB),
		"xscvdpsxds $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfctidz f32:$XB))]>;
def XSCVDPSXWS : XX2Form<60, 88,		def XSCVDPSXWS : XX2Form<60, 88,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpsxws $XT, $XB", IIC_VecFP,		"xscvdpsxws $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiwz f64:$XB))]>;		[(set f64:$XT, (PPCfctiwz f64:$XB))]>;
		let isCodeGenOnly = 1 in
		def XSCVDPSXWSs : XX2Form<60, 88,
		(outs vssrc:$XT), (ins vssrc:$XB),
		"xscvdpsxws $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfctiwz f32:$XB))]>;
def XSCVDPUXDS : XX2Form<60, 328,		def XSCVDPUXDS : XX2Form<60, 328,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpuxds $XT, $XB", IIC_VecFP,		"xscvdpuxds $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiduz f64:$XB))]>;		[(set f64:$XT, (PPCfctiduz f64:$XB))]>;
		let isCodeGenOnly = 1 in
		def XSCVDPUXDSs : XX2Form<60, 328,
		(outs vssrc:$XT), (ins vssrc:$XB),
		"xscvdpuxds $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfctiduz f32:$XB))]>;
def XSCVDPUXWS : XX2Form<60, 72,		def XSCVDPUXWS : XX2Form<60, 72,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpuxws $XT, $XB", IIC_VecFP,		"xscvdpuxws $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiwuz f64:$XB))]>;		[(set f64:$XT, (PPCfctiwuz f64:$XB))]>;
		let isCodeGenOnly = 1 in
		def XSCVDPUXWSs : XX2Form<60, 72,
		(outs vssrc:$XT), (ins vssrc:$XB),
		"xscvdpuxws $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfctiwuz f32:$XB))]>;
def XSCVSPDP : XX2Form<60, 329,		def XSCVSPDP : XX2Form<60, 329,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvspdp $XT, $XB", IIC_VecFP, []>;		"xscvspdp $XT, $XB", IIC_VecFP, []>;
def XSCVSXDDP : XX2Form<60, 376,		def XSCVSXDDP : XX2Form<60, 376,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvsxddp $XT, $XB", IIC_VecFP,		"xscvsxddp $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfcfid f64:$XB))]>;		[(set f64:$XT, (PPCfcfid f64:$XB))]>;
def XSCVUXDDP : XX2Form<60, 360,		def XSCVUXDDP : XX2Form<60, 360,
… 22 lines not shown …	let Uses = [RM] in {
def XVCVSPDP : XX2Form<60, 457,		def XVCVSPDP : XX2Form<60, 457,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspdp $XT, $XB", IIC_VecFP, []>;		"xvcvspdp $XT, $XB", IIC_VecFP, []>;
def XVCVSPSXDS : XX2Form<60, 408,		def XVCVSPSXDS : XX2Form<60, 408,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspsxds $XT, $XB", IIC_VecFP, []>;		"xvcvspsxds $XT, $XB", IIC_VecFP, []>;
def XVCVSPSXWS : XX2Form<60, 152,		def XVCVSPSXWS : XX2Form<60, 152,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspsxws $XT, $XB", IIC_VecFP, []>;		"xvcvspsxws $XT, $XB", IIC_VecFP,
		[(set v4i32:$XT, (fp_to_sint v4f32:$XB))]>;
def XVCVSPUXDS : XX2Form<60, 392,		def XVCVSPUXDS : XX2Form<60, 392,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspuxds $XT, $XB", IIC_VecFP, []>;		"xvcvspuxds $XT, $XB", IIC_VecFP, []>;
def XVCVSPUXWS : XX2Form<60, 136,		def XVCVSPUXWS : XX2Form<60, 136,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspuxws $XT, $XB", IIC_VecFP, []>;		"xvcvspuxws $XT, $XB", IIC_VecFP,
		[(set v4i32:$XT, (fp_to_uint v4f32:$XB))]>;
def XVCVSXDDP : XX2Form<60, 504,		def XVCVSXDDP : XX2Form<60, 504,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxddp $XT, $XB", IIC_VecFP,		"xvcvsxddp $XT, $XB", IIC_VecFP,
[(set v2f64:$XT, (sint_to_fp v2i64:$XB))]>;		[(set v2f64:$XT, (sint_to_fp v2i64:$XB))]>;
def XVCVSXDSP : XX2Form<60, 440,		def XVCVSXDSP : XX2Form<60, 440,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxdsp $XT, $XB", IIC_VecFP, []>;		"xvcvsxdsp $XT, $XB", IIC_VecFP, []>;
def XVCVSXWDP : XX2Form<60, 248,		def XVCVSXWDP : XX2Form<60, 248,
… 10 lines not shown …	let Uses = [RM] in {
def XVCVUXDSP : XX2Form<60, 424,		def XVCVUXDSP : XX2Form<60, 424,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvuxdsp $XT, $XB", IIC_VecFP, []>;		"xvcvuxdsp $XT, $XB", IIC_VecFP, []>;
def XVCVUXWDP : XX2Form<60, 232,		def XVCVUXWDP : XX2Form<60, 232,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvuxwdp $XT, $XB", IIC_VecFP, []>;		"xvcvuxwdp $XT, $XB", IIC_VecFP, []>;
def XVCVUXWSP : XX2Form<60, 168,		def XVCVUXWSP : XX2Form<60, 168,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvuxwsp $XT, $XB", IIC_VecFP, []>;		"xvcvuxwsp $XT, $XB", IIC_VecFP,
		[(set v4f32:$XT, (uint_to_fp v4i32:$XB))]>;

// Rounding Instructions		// Rounding Instructions
def XSRDPI : XX2Form<60, 73,		def XSRDPI : XX2Form<60, 73,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
"xsrdpi $XT, $XB", IIC_VecFP,		"xsrdpi $XT, $XB", IIC_VecFP,
[(set f64:$XT, (fround f64:$XB))]>;		[(set f64:$XT, (fround f64:$XB))]>;
def XSRDPIC : XX2Form<60, 107,		def XSRDPIC : XX2Form<60, 107,
(outs vsfrc:$XT), (ins vsfrc:$XB),		(outs vsfrc:$XT), (ins vsfrc:$XB),
… 508 lines not shown …	let mayStore = 1 in {
def STXSIWX : XX1Form<31, 140, (outs), (ins vsfrc:$XT, memrr:$dst),		def STXSIWX : XX1Form<31, 140, (outs), (ins vsfrc:$XT, memrr:$dst),
"stxsiwx $XT, $dst", IIC_LdStSTFD,		"stxsiwx $XT, $dst", IIC_LdStSTFD,
[(PPCstfiwx f64:$XT, xoaddr:$dst)]>;		[(PPCstfiwx f64:$XT, xoaddr:$dst)]>;
} // mayStore		} // mayStore
} // UseVSXReg = 1		} // UseVSXReg = 1

def : Pat<(f64 (extloadf32 xoaddr:$src)),		def : Pat<(f64 (extloadf32 xoaddr:$src)),
(COPY_TO_REGCLASS (LXSSPX xoaddr:$src), VSFRC)>;		(COPY_TO_REGCLASS (LXSSPX xoaddr:$src), VSFRC)>;
		def : Pat<(f32 (fpround (extloadf32 xoaddr:$src))),
		(f32 (LXSSPX xoaddr:$src))>;
def : Pat<(f64 (fpextend f32:$src)),		def : Pat<(f64 (fpextend f32:$src)),
(COPY_TO_REGCLASS $src, VSFRC)>;		(COPY_TO_REGCLASS $src, VSFRC)>;

def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETLT)),		def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETLT)),
(SELECT_VSSRC (CRANDC $lhs, $rhs), $tval, $fval)>;		(SELECT_VSSRC (CRANDC $lhs, $rhs), $tval, $fval)>;
def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETULT)),		def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETULT)),
(SELECT_VSSRC (CRANDC $rhs, $lhs), $tval, $fval)>;		(SELECT_VSSRC (CRANDC $rhs, $lhs), $tval, $fval)>;
def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETLE)),		def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETLE)),
… 161 lines not shown …	let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
def : Pat<(f32 (PPCfcfidus (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),		def : Pat<(f32 (PPCfcfidus (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
(f32 (XSCVUXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;		(f32 (XSCVUXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
}		}
def : Pat<(v4i32 (scalar_to_vector ScalarLoads.Li32)),		def : Pat<(v4i32 (scalar_to_vector ScalarLoads.Li32)),
(v4i32 (XXSPLTWs (LXSIWAX xoaddr:$src), 1))>;		(v4i32 (XXSPLTWs (LXSIWAX xoaddr:$src), 1))>;
} // AddedComplexity = 400		} // AddedComplexity = 400
} // HasP8Vector		} // HasP8Vector

let UseVSXReg = 1 in {		let UseVSXReg = 1, AddedComplexity = 400 in {
		kbarton: Is there a way we can do this without the use of AddedComplexity?
		nemanjai (Author): Are you suggesting that we do away with AddedComplexity in the VSX instruction definitions altogether? This was just added for consistency, as it was initially missing. Ultimately, having it there doesn't change anything (at this time), because these instructions match PPCISD nodes that are not matched by anything else. However, in the highly unlikely event that VMX instructions matching these nodes are added in the future, it is good to have the AddedComplexity there, as it appears throughout the rest of the VSX target definition file. If you want, I can certainly remove this instance without affecting anything this patch does.
		kbarton: My (slight) preference is not to use it unless it is absolutely necessary. Ideally we'll remove the need for it at some point, but that is a big piece of work. For now, I don't think we should use it unless it's absolutely necessary.
		nemanjai (Author): Yeah, for sure. I'll remove this. Direct moves do not have a corresponding VMX alternative, so this accomplishes nothing.
let Predicates = [HasDirectMove] in {		let Predicates = [HasDirectMove] in {
// VSX direct move instructions		// VSX direct move instructions
def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),		def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),
"mfvsrd $rA, $XT", IIC_VecGeneral,		"mfvsrd $rA, $XT", IIC_VecGeneral,
[(set i64:$rA, (PPCmfvsr f64:$XT))]>,		[(set i64:$rA, (PPCmfvsr f64:$XT))]>,
Requires<[In64BitMode]>;		Requires<[In64BitMode]>;
def MFVSRWZ : XX1_RS6_RD5_XO<31, 115, (outs gprc:$rA), (ins vsfrc:$XT),		def MFVSRWZ : XX1_RS6_RD5_XO<31, 115, (outs gprc:$rA), (ins vsfrc:$XT),
"mfvsrwz $rA, $XT", IIC_VecGeneral,		"mfvsrwz $rA, $XT", IIC_VecGeneral,
… 329 lines not shown …	/* BE variable double
Same as the BE doubleword except there is no move.		Same as the BE doubleword except there is no move.
*/		*/
dag BE_VDOUBLE_PERMUTE = (VPERM (COPY_TO_REGCLASS $S, VRRC),		dag BE_VDOUBLE_PERMUTE = (VPERM (COPY_TO_REGCLASS $S, VRRC),
(COPY_TO_REGCLASS $S, VRRC),		(COPY_TO_REGCLASS $S, VRRC),
BE_VDWORD_PERM_VEC);		BE_VDWORD_PERM_VEC);
dag BE_VARIABLE_DOUBLE = (COPY_TO_REGCLASS BE_VDOUBLE_PERMUTE, VSRC);		dag BE_VARIABLE_DOUBLE = (COPY_TO_REGCLASS BE_VDOUBLE_PERMUTE, VSRC);
}		}

		let AddedComplexity = 400 in {
// v4f32 scalar <-> vector conversions (BE)		// v4f32 scalar <-> vector conversions (BE)
let Predicates = [IsBigEndian, HasP8Vector] in {		let Predicates = [IsBigEndian, HasP8Vector] in {
def : Pat<(v4f32 (scalar_to_vector f32:$A)),		def : Pat<(v4f32 (scalar_to_vector f32:$A)),
(v4f32 (XSCVDPSPN $A))>;		(v4f32 (XSCVDPSPN $A))>;
def : Pat<(f32 (vector_extract v4f32:$S, 0)),		def : Pat<(f32 (vector_extract v4f32:$S, 0)),
(f32 (XSCVSPDPN $S))>;		(f32 (XSCVSPDPN $S))>;
def : Pat<(f32 (vector_extract v4f32:$S, 1)),		def : Pat<(f32 (vector_extract v4f32:$S, 1)),
(f32 (XSCVSPDPN (XXSLDWI $S, $S, 1)))>;		(f32 (XSCVSPDPN (XXSLDWI $S, $S, 1)))>;
… 222 lines not shown …	def : Pat<(i64 (bitconvert f64:$S)),
(i64 (MFVSRD $S))>;		(i64 (MFVSRD $S))>;

// bitconvert i64 -> f64		// bitconvert i64 -> f64
// (move to FPR, nothing else needed)		// (move to FPR, nothing else needed)
def : Pat<(f64 (bitconvert i64:$S)),		def : Pat<(f64 (bitconvert i64:$S)),
(f64 (MTVSRD $S))>;		(f64 (MTVSRD $S))>;
}		}

		// Materialize a zero-vector of long long
		def : Pat<(v2i64 immAllZerosV),
		(v2i64 (XXLXORz))>;
		}

def AlignValues {		def AlignValues {
dag F32_TO_BE_WORD1 = (v4f32 (XXSLDWI (XSCVDPSPN $B), (XSCVDPSPN $B), 3));		dag F32_TO_BE_WORD1 = (v4f32 (XXSLDWI (XSCVDPSPN $B), (XSCVDPSPN $B), 3));
dag I32_TO_BE_WORD1 = (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC);		dag I32_TO_BE_WORD1 = (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC);
}		}

// Materialize a zero-vector of long long
def : Pat<(v2i64 immAllZerosV),
(v2i64 (XXLXORz))>;

// The following VSX instructions were introduced in Power ISA 3.0		// The following VSX instructions were introduced in Power ISA 3.0
def HasP9Vector : Predicate<"PPCSubTarget->hasP9Vector()">;		def HasP9Vector : Predicate<"PPCSubTarget->hasP9Vector()">;
let AddedComplexity = 400, Predicates = [HasP9Vector] in {		let AddedComplexity = 400, Predicates = [HasP9Vector] in {

// [PO VRT XO VRB XO /]		// [PO VRT XO VRB XO /]
class X_VT5_XO5_VB5<bits<6> opcode, bits<5> xo2, bits<10> xo, string opc,		class X_VT5_XO5_VB5<bits<6> opcode, bits<5> xo2, bits<10> xo, string opc,
list<dag> pattern>		list<dag> pattern>
: X_RD5_XO5_RS5<opcode, xo2, xo, (outs vrrc:$vT), (ins vrrc:$vB),		: X_RD5_XO5_RS5<opcode, xo2, xo, (outs vrrc:$vT), (ins vrrc:$vB),
… 443 lines not shown …	def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
(STXVX $rS, xoaddr:$dst)>;		(STXVX $rS, xoaddr:$dst)>;
def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),		def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
(STXVX $rS, xoaddr:$dst)>;		(STXVX $rS, xoaddr:$dst)>;

def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),		def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
(v4i32 (LXVWSX xoaddr:$src))>;		(v4i32 (LXVWSX xoaddr:$src))>;
def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),		def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
(v4f32 (LXVWSX xoaddr:$src))>;		(v4f32 (LXVWSX xoaddr:$src))>;
def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),		def : Pat<(v4f32 (scalar_to_vector (f32 (fpround (extloadf32 xoaddr:$src))))),
(v4i32 (MTVSRWS $A))>;		(v4f32 (LXVWSX xoaddr:$src))>;
def : Pat<(v16i8 (build_vector immSExt8:$A, immSExt8:$A, immSExt8:$A,
immSExt8:$A, immSExt8:$A, immSExt8:$A,
immSExt8:$A, immSExt8:$A, immSExt8:$A,
immSExt8:$A, immSExt8:$A, immSExt8:$A,
immSExt8:$A, immSExt8:$A, immSExt8:$A,
immSExt8:$A)),
(v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
def : Pat<(v16i8 immAllOnesV),
(v16i8 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
def : Pat<(v8i16 immAllOnesV),
(v8i16 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
def : Pat<(v4i32 immAllOnesV),
(v4i32 (XXSPLTIB 255))>;
def : Pat<(v2i64 immAllOnesV),
(v2i64 (XXSPLTIB 255))>;

// Build vectors from i8 loads		// Build vectors from i8 loads
def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),		def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),
(v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;		(v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;
def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),		def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),
(v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;		(v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;
def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),		def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),
(v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;		(v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;
… 124 lines not shown …	def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst),
(STXSIHXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;		(STXSIHXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;
} // IsLittleEndian, HasP9Vector		} // IsLittleEndian, HasP9Vector

// Vector sign extensions		// Vector sign extensions
def : Pat<(f64 (PPCVexts f64:$A, 1)),		def : Pat<(f64 (PPCVexts f64:$A, 1)),
(f64 (COPY_TO_REGCLASS (VEXTSB2Ds $A), VSFRC))>;		(f64 (COPY_TO_REGCLASS (VEXTSB2Ds $A), VSFRC))>;
def : Pat<(f64 (PPCVexts f64:$A, 2)),		def : Pat<(f64 (PPCVexts f64:$A, 2)),
(f64 (COPY_TO_REGCLASS (VEXTSH2Ds $A), VSFRC))>;		(f64 (COPY_TO_REGCLASS (VEXTSH2Ds $A), VSFRC))>;

let isPseudo = 1 in {		let isPseudo = 1 in {
def DFLOADf32 : Pseudo<(outs vssrc:$XT), (ins memrix:$src),		def DFLOADf32 : Pseudo<(outs vssrc:$XT), (ins memrix:$src),
"#DFLOADf32",		"#DFLOADf32",
[(set f32:$XT, (load iaddr:$src))]>;		[(set f32:$XT, (load iaddr:$src))]>;
def DFLOADf64 : Pseudo<(outs vsfrc:$XT), (ins memrix:$src),		def DFLOADf64 : Pseudo<(outs vsfrc:$XT), (ins memrix:$src),
"#DFLOADf64",		"#DFLOADf64",
[(set f64:$XT, (load iaddr:$src))]>;		[(set f64:$XT, (load iaddr:$src))]>;
def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),		def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),
"#DFSTOREf32",		"#DFSTOREf32",
[(store f32:$XT, iaddr:$dst)]>;		[(store f32:$XT, iaddr:$dst)]>;
def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),		def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
"#DFSTOREf64",		"#DFSTOREf64",
[(store f64:$XT, iaddr:$dst)]>;		[(store f64:$XT, iaddr:$dst)]>;
}		}
def : Pat<(f64 (extloadf32 iaddr:$src)),		def : Pat<(f64 (extloadf32 iaddr:$src)),
(COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;		(COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;
		def : Pat<(f32 (fpround (extloadf32 iaddr:$src))),
		(f32 (DFLOADf32 iaddr:$src))>;
} // end HasP9Vector, AddedComplexity		} // end HasP9Vector, AddedComplexity

let Predicates = [IsISA3_0, HasDirectMove, IsLittleEndian] in {		// Integer extend helper dags 32 -> 64
def : Pat<(v2i64 (build_vector i64:$rA, i64:$rB)),		def AnyExts {
(v2i64 (MTVSRDD $rB, $rA))>;		dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);
def : Pat<(i64 (extractelt v2i64:$A, 0)),		dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);
(i64 (MFVSRLD $A))>;		dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);
		dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);
		}

		def DblToFlt {
		dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));
		dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));
		dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));
		dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));
		}
		def FltToIntLoad {
		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));
		}
		def FltToUIntLoad {
		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));
		}
		def FltToLongLoad {
		dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 xoaddr:$A)))));
		}
		def FltToULongLoad {
		dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 xoaddr:$A)))));
		}
		def FltToLong {
		dag A = (i64 (PPCmfvsr (PPCfctidz (fpextend f32:$A))));
		}
		def FltToULong {
		dag A = (i64 (PPCmfvsr (PPCfctiduz (fpextend f32:$A))));
		}
		def DblToInt {
		dag A = (i32 (PPCmfvsr (f64 (PPCfctiwz f64:$A))));
		}
		def DblToUInt {
		dag A = (i32 (PPCmfvsr (f64 (PPCfctiwuz f64:$A))));
		}
		def DblToLong {
		dag A = (i64 (PPCmfvsr (f64 (PPCfctidz f64:$A))));
		}
		def DblToULong {
		dag A = (i64 (PPCmfvsr (f64 (PPCfctiduz f64:$A))));
		}
		def DblToIntLoad {
		dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load xoaddr:$A)))));
		}
		def DblToUIntLoad {
		dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load xoaddr:$A)))));
		}
		def DblToLongLoad {
		dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (load xoaddr:$A)))));
		}
		def DblToULongLoad {
		dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (load xoaddr:$A)))));
		}

		// FP merge dags (for f32 -> v4f32)
		def MrgFP {
		dag AC = (XVCVDPSP (XXPERMDI (COPY_TO_REGCLASS $A, VSRC),
		(COPY_TO_REGCLASS $C, VSRC), 0));
		dag BD = (XVCVDPSP (XXPERMDI (COPY_TO_REGCLASS $B, VSRC),
		(COPY_TO_REGCLASS $D, VSRC), 0));
		dag ABhToFlt = (XVCVDPSP (XXPERMDI $A, $B, 0));
		dag ABlToFlt = (XVCVDPSP (XXPERMDI $A, $B, 3));
		dag BAhToFlt = (XVCVDPSP (XXPERMDI $B, $A, 0));
		dag BAlToFlt = (XVCVDPSP (XXPERMDI $B, $A, 3));
		}

		// Patterns for BUILD_VECTOR nodes.
		def NoP9Vector : Predicate<"!PPCSubTarget->hasP9Vector()">;
		let AddedComplexity = 400 in {

		let Predicates = [HasVSX] in {
		// Build vectors of floating point converted to i32.
		def : Pat<(v4i32 (build_vector DblToInt.A, DblToInt.A,
		DblToInt.A, DblToInt.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS $A), VSRC), 1))>;
		def : Pat<(v4i32 (build_vector DblToUInt.A, DblToUInt.A,
		DblToUInt.A, DblToUInt.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS $A), VSRC), 1))>;
		def : Pat<(v2i64 (build_vector DblToLong.A, DblToLong.A)),
		(v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPSXDS $A), VSRC),
		(COPY_TO_REGCLASS (XSCVDPSXDS $A), VSRC), 0))>;
		def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)),
		(v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
		(COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), 0))>;
		def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPSXWSs (LXSSPX xoaddr:$A)), VSRC), 1))>;
		def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPUXWSs (LXSSPX xoaddr:$A)), VSRC), 1))>;
		def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
		(v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;

		// Build vectors of floating point converted to i64.
		def : Pat<(v2i64 (build_vector FltToLong.A, FltToLong.A)),
		(v2i64 (XXPERMDIs (COPY_TO_REGCLASS (XSCVDPSXDSs $A), VSFRC), 0))>;
		def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
		(v2i64 (XXPERMDIs (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
		def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)),
		(v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>;
		def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)),
		(v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>;
		}

		let Predicates = [HasVSX, NoP9Vector] in {
		// Load-and-splat with fp-to-int conversion (using X-Form VSX loads).
		def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPSXWS (LXSDX xoaddr:$A)), VSRC), 1))>;
		def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPUXWS (LXSDX xoaddr:$A)), VSRC), 1))>;
		def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
		(v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
		(LXSSPX xoaddr:$A), VSFRC)), 0))>;
		def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
		(v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
		(LXSSPX xoaddr:$A), VSFRC)), 0))>;
		}

		// Big endian, available on all targets with VSX
		let Predicates = [IsBigEndian, HasVSX] in {
		def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
		(v2f64 (XXPERMDI
		(COPY_TO_REGCLASS $A, VSRC),
		(COPY_TO_REGCLASS $B, VSRC), 0))>;

		def : Pat<(v4f32 (build_vector f32:$A, f32:$B, f32:$C, f32:$D)),
		(VMRGEW MrgFP.AC, MrgFP.BD)>;
		def : Pat<(v4f32 (build_vector DblToFlt.A0, DblToFlt.A1,
		DblToFlt.B0, DblToFlt.B1)),
		(v4f32 (VMRGEW MrgFP.ABhToFlt, MrgFP.ABlToFlt))>;
		}
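The big-endian v4f32 patterns above rely on XVCVDPSP leaving its two single-precision results in the even words, which VMRGEW then interleaves. The constexpr C++ model below traces that lane movement with symbolic tags instead of real register contents. `Vsr`, the tags and every helper are hypothetical; lanes use big-endian word numbering, a scalar in a VSR is assumed to occupy doubleword 0, and XVCVDPSP's odd words are simply treated as undefined.

```cpp
// Simplified model of a 128-bit VSX register: four word slots in big-endian
// order, each holding a tag naming its value; kUndef marks undefined slots.
constexpr int kUndef = -1;
struct Vsr { int w[4]; };

// A scalar in a VSR occupies doubleword 0; doubleword 1 is undefined.
constexpr Vsr scalarInVsr(int tag) { return Vsr{{tag, tag, kUndef, kUndef}}; }

// A v2f64 register: doubleword 0 holds d0, doubleword 1 holds d1.
constexpr Vsr v2f64(int d0, int d1) { return Vsr{{d0, d0, d1, d1}}; }

// XXPERMDI XA, XB, DM: the high DM bit picks XA's doubleword for result
// doubleword 0, the low DM bit picks XB's doubleword for result doubleword 1.
constexpr Vsr xxpermdi(Vsr a, Vsr b, unsigned dm) {
  unsigned da = (dm >> 1) & 1u;
  unsigned db = dm & 1u;
  return Vsr{{a.w[2 * da], a.w[2 * da + 1], b.w[2 * db], b.w[2 * db + 1]}};
}

// XVCVDPSP: each double becomes a single in the even word of its doubleword.
constexpr Vsr xvcvdpsp(Vsr v) { return Vsr{{v.w[0], kUndef, v.w[2], kUndef}}; }

// VMRGEW: interleave the even words of the two sources.
constexpr Vsr vmrgew(Vsr a, Vsr b) { return Vsr{{a.w[0], b.w[0], a.w[2], b.w[2]}}; }

// (build_vector A, B, C, D) -> VMRGEW MrgFP.AC, MrgFP.BD
constexpr Vsr buildV4F32BE(int A, int B, int C, int D) {
  return vmrgew(xvcvdpsp(xxpermdi(scalarInVsr(A), scalarInVsr(C), 0)),
                xvcvdpsp(xxpermdi(scalarInVsr(B), scalarInVsr(D), 0)));
}

// (build_vector DblToFlt.A0, A1, B0, B1) -> VMRGEW ABhToFlt, ABlToFlt
constexpr Vsr buildFromDblToFltBE(Vsr A, Vsr B) {
  return vmrgew(xvcvdpsp(xxpermdi(A, B, 0)), xvcvdpsp(xxpermdi(A, B, 3)));
}
```

Because VMRGEW reads only words 0 and 2, the undefined odd words never reach the result.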

		let Predicates = [IsLittleEndian, HasVSX] in {
		// Little endian, available on all targets with VSX
		def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
		(v2f64 (XXPERMDI
		(COPY_TO_REGCLASS $B, VSRC),
		(COPY_TO_REGCLASS $A, VSRC), 0))>;

		def : Pat<(v4f32 (build_vector f32:$D, f32:$C, f32:$B, f32:$A)),
		(VMRGEW MrgFP.AC, MrgFP.BD)>;
		def : Pat<(v4f32 (build_vector DblToFlt.A0, DblToFlt.A1,
		DblToFlt.B0, DblToFlt.B1)),
		(v4f32 (VMRGEW MrgFP.BAhToFlt, MrgFP.BAlToFlt))>;
		}

		let Predicates = [HasDirectMove] in {
		/* Endianness-neutral constant splat on P8 and newer targets. The reason
		kbarton: Please convert to C++ comments.
		nemanjai (Author): OK, will do.
		for this pattern is that on targets with direct moves, we don't expand
		BUILD_VECTOR nodes for v4i32.
		*/
		def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
		immSExt5NonZero:$A, immSExt5NonZero:$A)),
		(v4i32 (VSPLTISW imm:$A))>;
		}

		let Predicates = [IsBigEndian, HasDirectMove, NoP9Vector] in {
		// Big endian integer vectors using direct moves.
		def : Pat<(v2i64 (build_vector i64:$A, i64:$B)),
		(v2i64 (XXPERMDI
		(COPY_TO_REGCLASS (MTVSRD $A), VSRC),
		(COPY_TO_REGCLASS (MTVSRD $B), VSRC), 0))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
		(VMRGOW (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC),
		(COPY_TO_REGCLASS (MTVSRWZ $C), VSRC), 0),
		(XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC),
		(COPY_TO_REGCLASS (MTVSRWZ $D), VSRC), 0))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
		(XXSPLTW (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 1)>;
		}

		let Predicates = [IsLittleEndian, HasDirectMove, NoP9Vector] in {
		// Little endian integer vectors using direct moves.
		def : Pat<(v2i64 (build_vector i64:$A, i64:$B)),
		(v2i64 (XXPERMDI
		(COPY_TO_REGCLASS (MTVSRD $B), VSRC),
		(COPY_TO_REGCLASS (MTVSRD $A), VSRC), 0))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
		(VMRGOW (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $D), VSRC),
		(COPY_TO_REGCLASS (MTVSRWZ $B), VSRC), 0),
		(XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $C), VSRC),
		(COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 0))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
		(XXSPLTW (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 1)>;
		}
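The direct-move v4i32 patterns in the two blocks above move each i32 into word 1 of its own VSR, pair registers with XXPERMDI, and keep the odd words with VMRGOW; the splat patterns replicate word 1. A sketch of the lane movement with symbolic tags, assuming MTVSRWZ zeroes word 0 and leaves doubleword 1 undefined. `Vsr`, the tag constants and the helpers are hypothetical; lanes use big-endian word numbering, so the little-endian build is checked via LE element i = BE word 3 - i.

```cpp
constexpr int kUndef = -1;  // undefined slot
constexpr int kZero = -2;   // slot known to be zero
struct Vsr { int w[4]; };   // big-endian word order, symbolic tags

// MTVSRWZ: the i32 goes to word 1, word 0 is zeroed, doubleword 1 undefined.
constexpr Vsr mtvsrwz(int tag) { return Vsr{{kZero, tag, kUndef, kUndef}}; }

// XXPERMDI XA, XB, DM: high DM bit selects XA's doubleword, low bit XB's.
constexpr Vsr xxpermdi(Vsr a, Vsr b, unsigned dm) {
  unsigned da = (dm >> 1) & 1u;
  unsigned db = dm & 1u;
  return Vsr{{a.w[2 * da], a.w[2 * da + 1], b.w[2 * db], b.w[2 * db + 1]}};
}

// VMRGOW: interleave the odd words of the two sources.
constexpr Vsr vmrgow(Vsr a, Vsr b) { return Vsr{{a.w[1], b.w[1], a.w[3], b.w[3]}}; }

// XXSPLTW XT, XB, UIM: replicate word UIM into all four words.
constexpr Vsr xxspltw(Vsr v, unsigned uim) {
  return Vsr{{v.w[uim], v.w[uim], v.w[uim], v.w[uim]}};
}

constexpr Vsr buildV4I32BE(int A, int B, int C, int D) {
  return vmrgow(xxpermdi(mtvsrwz(A), mtvsrwz(C), 0),
                xxpermdi(mtvsrwz(B), mtvsrwz(D), 0));
}

constexpr Vsr buildV4I32LE(int A, int B, int C, int D) {
  return vmrgow(xxpermdi(mtvsrwz(D), mtvsrwz(B), 0),
                xxpermdi(mtvsrwz(C), mtvsrwz(A), 0));
}

constexpr Vsr splatV4I32(int A) { return xxspltw(mtvsrwz(A), 1); }

// Little-endian element i lives in big-endian word 3 - i.
constexpr int leElt(Vsr v, unsigned i) { return v.w[3 - i]; }
```

Since VMRGOW reads only words 1 and 3, the zeroed and undefined words left behind by MTVSRWZ never reach the result.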

		let Predicates = [HasP9Vector] in {
		// Endianness-neutral patterns for const splats with ISA 3.0 instructions.
		def : Pat<(v4i32 (scalar_to_vector i32:$A)),
		(v4i32 (MTVSRWS $A))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
		(v4i32 (MTVSRWS $A))>;
		def : Pat<(v16i8 (build_vector immSExt8:$A, immSExt8:$A, immSExt8:$A,
		immSExt8:$A, immSExt8:$A, immSExt8:$A,
		immSExt8:$A, immSExt8:$A, immSExt8:$A,
		immSExt8:$A, immSExt8:$A, immSExt8:$A,
		immSExt8:$A, immSExt8:$A, immSExt8:$A,
		immSExt8:$A)),
		(v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
		def : Pat<(v16i8 immAllOnesV),
		(v16i8 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
		def : Pat<(v8i16 immAllOnesV),
		(v8i16 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
		def : Pat<(v4i32 immAllOnesV),
		(v4i32 (XXSPLTIB 255))>;
		def : Pat<(v2i64 immAllOnesV),
		(v2i64 (XXSPLTIB 255))>;
		def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
		(v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
		def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
		(v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
		def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPSXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;
		def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
		(v4i32 (XXSPLTW (COPY_TO_REGCLASS
		(XSCVDPUXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;
		def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
		(v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
		(DFLOADf32 iaddr:$A),
		VSFRC)), 0))>;
		def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
		(v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
		(DFLOADf32 iaddr:$A),
		VSFRC)), 0))>;
}		}

let Predicates = [IsISA3_0, HasDirectMove, IsBigEndian] in {		let Predicates = [IsISA3_0, HasDirectMove, IsBigEndian] in {
		def : Pat<(i64 (extractelt v2i64:$A, 1)),
		(i64 (MFVSRLD $A))>;
		// Better way to build integer vectors if we have MTVSRDD. Big endian.
def : Pat<(v2i64 (build_vector i64:$rB, i64:$rA)),		def : Pat<(v2i64 (build_vector i64:$rB, i64:$rA)),
(v2i64 (MTVSRDD $rB, $rA))>;		(v2i64 (MTVSRDD $rB, $rA))>;
def : Pat<(i64 (extractelt v2i64:$A, 1)),		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
		(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.A, AnyExts.C), VSRC),
		(COPY_TO_REGCLASS (MTVSRDD AnyExts.B, AnyExts.D), VSRC))>;
		}

		let Predicates = [IsISA3_0, HasDirectMove, IsLittleEndian] in {
		def : Pat<(i64 (extractelt v2i64:$A, 0)),
(i64 (MFVSRLD $A))>;		(i64 (MFVSRLD $A))>;
		// Better way to build integer vectors if we have MTVSRDD. Little endian.
		def : Pat<(v2i64 (build_vector i64:$rA, i64:$rB)),
		(v2i64 (MTVSRDD $rB, $rA))>;
		def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
		(VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),
		(COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;
		}
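With MTVSRDD, each i32 only needs to be any-extended (the `AnyExts` dags), because VMRGOW reads just the odd words, i.e. the low half of each doubleword. A sketch under the same simplified lane model: `Vsr`, `Gpr64` and the helpers are hypothetical, lanes use big-endian word numbering, and MTVSRDD is modelled as RA filling doubleword 0 and RB filling doubleword 1.

```cpp
constexpr int kUndef = -1;  // undefined slot
struct Vsr { int w[4]; };   // big-endian word order, symbolic tags

// Hypothetical model of a 64-bit GPR as two 32-bit halves.
struct Gpr64 { int hi, lo; };

// AnyExts: INSERT_SUBREG of an i32 into an IMPLICIT_DEF i64 defines only
// the low half; the high half is left undefined.
constexpr Gpr64 anyExt(int tag) { return Gpr64{kUndef, tag}; }

// MTVSRDD XT, RA, RB: RA fills doubleword 0, RB fills doubleword 1.
constexpr Vsr mtvsrdd(Gpr64 ra, Gpr64 rb) {
  return Vsr{{ra.hi, ra.lo, rb.hi, rb.lo}};
}

// VMRGOW: interleave the odd words, so the undefined high halves are dropped.
constexpr Vsr vmrgow(Vsr a, Vsr b) { return Vsr{{a.w[1], b.w[1], a.w[3], b.w[3]}}; }

constexpr Vsr buildV4I32BE(int A, int B, int C, int D) {
  return vmrgow(mtvsrdd(anyExt(A), anyExt(C)), mtvsrdd(anyExt(B), anyExt(D)));
}

constexpr Vsr buildV4I32LE(int A, int B, int C, int D) {
  return vmrgow(mtvsrdd(anyExt(D), anyExt(B)), mtvsrdd(anyExt(C), anyExt(A)));
}
```

The little-endian result holds big-endian words {D, C, B, A}, which is the element order {A, B, C, D} once element i is read from word 3 - i.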
}		}

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll

	… 40 lines not shown …
	define <4 x i32> @buildi(i32 zeroext %a) {			define <4 x i32> @buildi(i32 zeroext %a) {
	entry:			entry:
	%a.addr = alloca i32, align 4			%a.addr = alloca i32, align 4
	store i32 %a, i32* %a.addr, align 4			store i32 %a, i32* %a.addr, align 4
	%0 = load i32, i32* %a.addr, align 4			%0 = load i32, i32* %a.addr, align 4
	%splat.splatinsert = insertelement <4 x i32> undef, i32 %0, i32 0			%splat.splatinsert = insertelement <4 x i32> undef, i32 %0, i32 0
	%splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer			%splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
	ret <4 x i32> %splat.splat			ret <4 x i32> %splat.splat
	; CHECK: sldi [[REG1:[0-9]+]], 3, 32			; CHECK: mtvsrwz [[REG1:[0-9]+]], 3
	; CHECK: mtvsrd {{[0-9]+}}, [[REG1]]			; CHECK: xxspltw 34, [[REG1]]
	; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3			; CHECK-LE: mtvsrwz [[REG1:[0-9]+]], 3
	; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]			; CHECK-LE: xxspltw 34, [[REG1]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define <2 x i64> @buildl(i64 %a) {			define <2 x i64> @buildl(i64 %a) {
	entry:			entry:
	%a.addr = alloca i64, align 8			%a.addr = alloca i64, align 8
	store i64 %a, i64* %a.addr, align 8			store i64 %a, i64* %a.addr, align 8
	%0 = load i64, i64* %a.addr, align 8			%0 = load i64, i64* %a.addr, align 8
	...

test/CodeGen/PowerPC/power9-moves-and-splats.ll

	; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s			; RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck %s
	; RUN: llc -mcpu=pwr9 -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s \			; RUN: llc -mcpu=pwr9 -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck %s \
	; RUN: --check-prefix=CHECK-BE			; RUN: --check-prefix=CHECK-BE

	@Globi = external global i32, align 4			@Globi = external global i32, align 4
	@Globf = external global float, align 4			@Globf = external global float, align 4

	define <2 x i64> @test1(i64 %a, i64 %b) {			define <2 x i64> @test1(i64 %a, i64 %b) {
	entry:			entry:
	; The FIXME below is due to the lowering for BUILD_VECTOR needing a re-vamp			; The FIXME below is due to the lowering for BUILD_VECTOR needing a re-vamp
	; which will happen in a subsequent patch.			; which will happen in a subsequent patch.
	; CHECK-LABEL: test1			; CHECK-LABEL: test1
	; FIXME: mtvsrdd 34, 4, 3			; CHECK: mtvsrdd 34, 4, 3
	; CHECK: mtvsrd {{[0-9]+}}, 3
	; CHECK: mtvsrd {{[0-9]+}}, 4
	; CHECK: xxmrgld
	; CHECK-BE-LABEL: test1			; CHECK-BE-LABEL: test1
	; FIXME-BE: mtvsrdd 34, 3, 4			; CHECK-BE: mtvsrdd 34, 3, 4
	; CHECK-BE: mtvsrd {{[0-9]+}}, 4
	; CHECK-BE: mtvsrd {{[0-9]+}}, 3
	; CHECK-BE: xxmrghd
	%vecins = insertelement <2 x i64> undef, i64 %a, i32 0			%vecins = insertelement <2 x i64> undef, i64 %a, i32 0
	%vecins1 = insertelement <2 x i64> %vecins, i64 %b, i32 1			%vecins1 = insertelement <2 x i64> %vecins, i64 %b, i32 1
	ret <2 x i64> %vecins1			ret <2 x i64> %vecins1
	}			}
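The v2i64 MTVSRDD patterns and the new test1 CHECK lines can be cross-checked the same way. In test1, %a becomes element 0 and %b becomes element 1. Under the 64-bit PowerPC ELF ABIs, those arguments arrive in r3 and r4.

The sketch below uses hypothetical helper names and models doublewords in ISA order. It is meant to show two things:

- BE keeps the operands in element order (`mtvsrdd 34, 3, 4`), while LE swaps them (`mtvsrdd 34, 4, 3`).
- MFVSRLD serves `extractelt 1` on BE but `extractelt 0` on LE.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: a 128-bit VSX register as two 64-bit doublewords in
// ISA order; dword[0] is the most significant (left) doubleword.
using VReg2 = std::array<uint64_t, 2>;

// mtvsrdd XT, RA, RB: dword[0] <- RA, dword[1] <- RB.
VReg2 mtvsrdd(uint64_t ra, uint64_t rb) { return {ra, rb}; }

// mfvsrld RA, XS: RA <- dword[1] of XS.
uint64_t mfvsrld(const VReg2 &v) { return v[1]; }

// Element i of a v2i64: ISA dword i on BE, ISA dword 1 - i on LE.
uint64_t element(const VReg2 &v, int i, bool littleEndian) {
  return littleEndian ? v[1 - i] : v[i];
}

// BE: (build_vector rB, rA) -> MTVSRDD rB, rA, i.e. operands in element order.
VReg2 buildV2i64BE(uint64_t e0, uint64_t e1) { return mtvsrdd(e0, e1); }

// LE: (build_vector rA, rB) -> MTVSRDD rB, rA, i.e. operands swapped.
VReg2 buildV2i64LE(uint64_t e0, uint64_t e1) { return mtvsrdd(e1, e0); }
```

For example, `buildV2i64LE(a, b)` calls `mtvsrdd(b, a)`. This is the r4-before-r3 operand order expected by the LE CHECK line.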

	define i64 @test2(<2 x i64> %a) {			define i64 @test2(<2 x i64> %a) {
	entry:			entry:
	; CHECK-LABEL: test2			; CHECK-LABEL: test2
	...
	; CHECK-BE: xxspltib 34, 129			; CHECK-BE: xxspltib 34, 129
	ret <16 x i8> <i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127>			ret <16 x i8> <i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127, i8 -127>
	}			}

	define <4 x i32> @test14(<4 x i32> %a, i32* nocapture readonly %b) {			define <4 x i32> @test14(<4 x i32> %a, i32* nocapture readonly %b) {
	entry:			entry:
	; CHECK-LABEL: test14			; CHECK-LABEL: test14
	; CHECK: lwz [[LD:[0-9]+]],			; CHECK: lwz [[LD:[0-9]+]],
	; CHECK: mtvsrws 34, [[LD]]			; FIXME: mtvsrws 34, [[LD]]
				; CHECK: mtvsrws [[SPLT:[0-9]+]], [[LD]]
				; CHECK: xxspltw 34, [[SPLT]], 3
	; CHECK-BE-LABEL: test14			; CHECK-BE-LABEL: test14
	; CHECK-BE: lwz [[LD:[0-9]+]],			; CHECK-BE: lwz [[LD:[0-9]+]],
	; CHECK-BE: mtvsrws 34, [[LD]]			; FIXME: mtvsrws 34, [[LD]]
				; CHECK-BE: mtvsrws [[SPLT:[0-9]+]], [[LD]]
				; CHECK-BE: xxspltw 34, [[SPLT]], 0
	%0 = load i32, i32* %b, align 4			%0 = load i32, i32* %b, align 4
	%splat.splatinsert = insertelement <4 x i32> undef, i32 %0, i32 0			%splat.splatinsert = insertelement <4 x i32> undef, i32 %0, i32 0
	%splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer			%splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
	%1 = add i32 %0, 5			%1 = add i32 %0, 5
	store i32 %1, i32* %b, align 4			store i32 %1, i32* %b, align 4
	ret <4 x i32> %splat.splat			ret <4 x i32> %splat.splat
	}			}
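In test14, the new CHECK lines accept an extra xxspltw after mtvsrws, hence the FIXME. A minimal model suggests why the splat is redundant and why its immediate differs between BE and LE. It uses hypothetical helpers and keeps words in ISA order.

- mtvsrws already replicates the loaded word into all four slots.
- xxspltw numbers words in big-endian order, so vector element 0 is word 3 on LE and word 0 on BE.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: four 32-bit words in ISA (big-endian) order.
using VReg = std::array<uint32_t, 4>;

// mtvsrws XT, RA: replicate the low word of RA into all four words.
VReg mtvsrws(uint64_t ra) {
  uint32_t w = static_cast<uint32_t>(ra);
  return {w, w, w, w};
}

// xxspltw XT, XB, UIM: replicate ISA word UIM of XB into all four words.
VReg xxspltw(const VReg &xb, unsigned uim) {
  uint32_t w = xb[uim & 3];
  return {w, w, w, w};
}

// ISA word index that holds vector element i.
unsigned isaWordForElement(unsigned i, bool littleEndian) {
  return littleEndian ? 3 - i : i;
}
```

Applying `xxspltw` with index 3 (LE) or index 0 (BE) to the output of `mtvsrws` returns the same register. In both cases the index selects element 0, which is why a lone `mtvsrws 34, [[LD]]` should suffice once the FIXME is addressed.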

test/CodeGen/PowerPC/tail-dup-analyzable-fallthrough.ll

	; RUN: llc -O2 < %s | FileCheck %s			; RUN: llc -O2 < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-n32:64"			target datalayout = "e-m:e-i64:64-n32:64"
	target triple = "powerpc64le-unknown-linux-gnu"			target triple = "powerpc64le-unknown-linux-gnu"

	; Check that the conditional return block of fmax_double3.exit was not			; Check that the conditional return block of fmax_double3.exit was not
	; duplicated into the if.then.i block			; duplicated into the if.then.i block
	; CHECK: # %if.then.i			; CHECK: # %if.then.i
	; CHECK: lxvd2x			; CHECK: xxlxor
	; CHECK: stxvd2x			; CHECK: stxvd2x
	; CHECK-NOT: bclr			; CHECK-NOT: bclr
	; CHECK: {{^}}.LBB{{[0-9_]+}}:			; CHECK: {{^}}.LBB{{[0-9_]+}}:
	; CHECK-SAME: # %fmax_double3.exit			; CHECK-SAME: # %fmax_double3.exit
	; CHECK: bclr			; CHECK: bclr
	; CHECK: # %if.then			; CHECK: # %if.then
	; Function Attrs: nounwind			; Function Attrs: nounwind
	define void @__fmax_double3_3D_exec(<2 x double>* %input6, i1 %bool1, i1 %bool2) #0 {			define void @__fmax_double3_3D_exec(<2 x double>* %input6, i1 %bool1, i1 %bool2) #0 {
	...

test/CodeGen/PowerPC/vsx.ll

	...
	; CHECK: blr			; CHECK: blr

	; CHECK-LE-LABEL: @test69			; CHECK-LE-LABEL: @test69
	; CHECK-LE: mfvsrd			; CHECK-LE: mfvsrd
	; CHECK-LE: mtvsrwa			; CHECK-LE: mtvsrwa
	; CHECK-LE: mtvsrwa			; CHECK-LE: mtvsrwa
	; CHECK-LE: xscvsxddp			; CHECK-LE: xscvsxddp
	; CHECK-LE: xscvsxddp			; CHECK-LE: xscvsxddp
	; CHECK-LE: xxspltd			; CHECK-LE: xxmrghd
	; CHECK-LE: xxspltd
	; CHECK-LE: xxmrgld
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

	; This gets scalarized so the code isn't great			; This gets scalarized so the code isn't great
	define <2 x double> @test70(<2 x i8> %a) {			define <2 x double> @test70(<2 x i8> %a) {
	%w = sitofp <2 x i8> %a to <2 x double>			%w = sitofp <2 x i8> %a to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w

	; CHECK-LABEL: @test70			; CHECK-LABEL: @test70
	; CHECK-DAG: lfiwax			; CHECK-DAG: lfiwax
	; CHECK-DAG: lfiwax			; CHECK-DAG: lfiwax
	; CHECK-DAG: xscvsxddp			; CHECK-DAG: xscvsxddp
	; CHECK-DAG: xscvsxddp			; CHECK-DAG: xscvsxddp
	; CHECK: xxmrghd			; CHECK: xxmrghd
	; CHECK: blr			; CHECK: blr

	; CHECK-LE-LABEL: @test70			; CHECK-LE-LABEL: @test70
	; CHECK-LE: mfvsrd			; CHECK-LE: mfvsrd
	; CHECK-LE: mtvsrwa			; CHECK-LE: mtvsrwa
	; CHECK-LE: mtvsrwa			; CHECK-LE: mtvsrwa
	; CHECK-LE: xscvsxddp			; CHECK-LE: xscvsxddp
	; CHECK-LE: xscvsxddp			; CHECK-LE: xscvsxddp
	; CHECK-LE: xxspltd			; CHECK-LE: xxmrghd
	; CHECK-LE: xxspltd
	; CHECK-LE: xxmrgld
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}
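In test69 and test70, the LE sequence xxspltd, xxspltd, xxmrgld becomes a single xxmrghd. xscvsxddp writes its result to doubleword 0 of the target VSR and leaves doubleword 1 undefined. Under that condition, the two sequences yield the same vector. This sketch assumes the merge operands appear in the same order in both versions, because the CHECK lines do not show operands. The helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: dword[0] is the ISA (leftmost) doubleword of a VSR.
using VReg2 = std::array<uint64_t, 2>;

// xxspltd XT, XA, 0 (xxpermdi XT, XA, XA, 0): splat doubleword 0.
VReg2 xxspltd0(const VReg2 &xa) { return {xa[0], xa[0]}; }

// xxmrgld XT, XA, XB: {XA.dword[1], XB.dword[1]}
VReg2 xxmrgld(const VReg2 &xa, const VReg2 &xb) { return {xa[1], xb[1]}; }

// xxmrghd XT, XA, XB: {XA.dword[0], XB.dword[0]}
VReg2 xxmrghd(const VReg2 &xa, const VReg2 &xb) { return {xa[0], xb[0]}; }

// Old LE code: splat each scalar result, then merge the low doublewords.
VReg2 oldSequence(const VReg2 &a, const VReg2 &b) {
  return xxmrgld(xxspltd0(a), xxspltd0(b));
}

// New LE code: merge the high doublewords directly.
VReg2 newSequence(const VReg2 &a, const VReg2 &b) { return xxmrghd(a, b); }
```

In the model, dword[1] of each input stands in for the undefined half. Neither sequence reads it, so the saving is two xxspltd instructions per conversion pair.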

	; This gets scalarized so the code isn't great			; This gets scalarized so the code isn't great
	define <2 x i32> @test80(i32 %v) {			define <2 x i32> @test80(i32 %v) {
	%b1 = insertelement <2 x i32> undef, i32 %v, i32 0			%b1 = insertelement <2 x i32> undef, i32 %v, i32 0
	%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer			%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
	%i = add <2 x i32> %b2, <i32 2, i32 3>			%i = add <2 x i32> %b2, <i32 2, i32 3>
	...