This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9065	Why take `Op` as a non-const reference? I don't see it being modified so it is not an output operand AFAICT.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
2844	As far as I can tell, this is in a block that only sets `let Predicates = [VSX]`. It is not safe to use direct moves on subtargets that don't have direct moves.
llvm/test/CodeGen/PowerPC/load-and-splat.ll
223	Definitely bad. P7 doesn't have direct moves.

1: Lint warning fix
2: LD_SPLAT for v8i16/v16i8 on pwr7

shchenz marked 2 inline comments as done. Jul 27 2021, 5:11 AM

shchenz added inline comments.

llvm/test/CodeGen/PowerPC/load-and-splat.ll
223	In this case we now use one more instruction than the code on the left. I think it is still a win: we no longer use the stack, which helps optimizations such as those for leaf functions, and we use fewer memory operations (2 vs. 3).

Harbormaster completed remote builds in B116398: Diff 361984. Jul 27 2021, 6:03 AM

ping

gentle ping

@nemanjai Hi Nemanja, could you please help to have another look at this issue? Thanks.

ping

gentle ping

gentle ping...

jsji added inline comments. Oct 26 2021, 8:01 AM

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
5840	VSPLTB?
5862	clang-tidy has warned: don't use 'else' after 'return'.
5870	The code patterns are very similar for these two cases, can we common them? eg: something like:
    SDNode Mask = CurDAG->getMachineNode(
        Subtarget->isLittleEndian() ? PPC::LVSR : PPC::LVSL, dl,
        N->getValueType(0) , CurDAG->getRegister(ZeroReg, MVT::i64),
        N->getOperand(1));
    SDNode Load = CurDAG->getMachineNode(PPC::LVX, dl, N->getValueType(0),
        MVT::Other,
        {CurDAG->getRegister(ZeroReg, MVT::i64), N->getOperand(1),
         N->getOperand(0)});
    opcode = ...
    splatconst=...
    if (N->getValueType(0) == MVT::v8i16){
      SDNode *LoadHigh = CurDAG->getMachineNode(
          PPC::LVX, dl, MVT::v16i8, MVT::Other,
          {SDValue(CurDAG->getMachineNode(
               LIOpcode, dl, MVT::i32,
               CurDAG->getTargetConstant(1, dl, MVT::i8)), 0),
           N->getOperand(1), SDValue(Load, 1)});
      Load = LoadHigh;
      opcode=...
      spltconst =...
    }
    CurDAG->ReplaceAllUsesOfValueWith(SDValue(N, 1), SDValue(Load, 1));
    transferMemOperands(N, Load);
    CurDAG->SelectNodeTo(
        N, opcode, N->getValueType(0), spliconst, SDValue(Perm, 0));
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9200	Can we split this into multiple early returns, with comments to corresponding ones.
    //case 1 ...
    if(...)
      return SDValue();
    //case 2 ...
    if(...)
      return SDValue();
    ...
llvm/lib/Target/PowerPC/PPCInstrVSX.td
2839	Do we need some conditional checking here? As `LFIWZX` is changing the type to int now, is there any precision change?
3560	Do we need to predicate 64 bit before using `MTVSRD`?
4113	Do we need alignment check here for `LXSIHZX`?
llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
646	`f0` here looks weird regarding register classes, `xxspltd` does need `vs0` to get the `element 0` of doubleword. Maybe we should follow up to see why we are printing `f0` here in ASMPrinter.
llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
18	Unrelated changes?

address @jsji comments

llvm/lib/Target/PowerPC/PPCInstrVSX.td
2839	I think we don't need a conditional check. `LFIWZX` will not change the raw_data in the registers. For example, if the data stored in the memory is float type 1.5(0x000000003fc00000), after `LFIWZX`, the raw data stored in result FPR is still `0x000000003fc00000`, not integer 1?
3560	Thanks, this is a very good question. Yes, normally we need to predicate on 64-bit if `MTVSRD` puts any value in the high 32 bits of the first doubleword. In this case, however, we don't seem to need the predicate: the useful bits are the lowest 16/8 bits, so even on a 32-bit arch we can still ensure that the lowest 16/8 bits have the right value. I added a comment to explain why we don't need to predicate on 64-bit.
4113	I think an unaligned load may introduce a performance issue, but it should not cause a functional issue? Loads like `LDX` also do not check the alignment. Do you see any issues here?
llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
646	Sure, I will check this later. `vs0` seems a more reasonable input.
llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
18	It is also for the weird change: `xxspltd v2, f0`
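
The reply on line 2839 above argues that an integer word load leaves the bit pattern of a float untouched. A minimal C++ sketch of that argument follows. The helper names `loadWordZeroExtended` and `lowWordAsFloat` are hypothetical, and the code only models the raw-bits behaviour described in the comment (assuming IEEE-754 single precision); it is not the instruction's definition.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of an integer word load with zero extension: the 32 bits
// in memory are copied unchanged into the low word of a 64-bit value.
inline std::uint64_t loadWordZeroExtended(const void *Mem) {
  std::uint32_t Word;
  std::memcpy(&Word, Mem, sizeof(Word));
  return static_cast<std::uint64_t>(Word);
}

// Reinterpret the low 32 bits as a float. No numeric conversion takes place.
inline float lowWordAsFloat(std::uint64_t Raw) {
  std::uint32_t Word = static_cast<std::uint32_t>(Raw);
  float F;
  std::memcpy(&F, &Word, sizeof(F));
  return F;
}
```

Loading the float `1.5f` through `loadWordZeroExtended` gives the raw value `0x000000003fc00000` quoted in the reply, and `lowWordAsFloat` recovers `1.5f` from it, so a later splat of that word still carries the original float.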

Harbormaster completed remote builds in B131162: Diff 382973. Oct 28 2021, 3:49 AM

jsji mentioned this in rGbd932f7499ff: [NFC][PowerPC] Update testcases using script. Nov 1 2021, 8:37 AM

LGTM. Thanks.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
4113	Is it possible that we call `PPCldsplat` with an unaligned address? If so, then we may load from the wrong address.
llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll
18	I meant the `;`, anyway, I have committed the changes in bd932f7499ff7ab958f5bc2f55dcf4b06cd87950, you should be able to rebase.

This revision is now accepted and ready to land. Nov 1 2021, 8:45 AM

rebase

This revision was landed with ongoing or failed builds. Nov 2 2021, 10:33 PM

Closed by commit rG5a8b19634002: [PowerPC] handle more splat loads without stack operation (authored by shchenz).

This revision was automatically updated to reflect the committed changes.

shchenz added a commit: rG5a8b19634002: [PowerPC] handle more splat loads without stack operation.

Harbormaster completed remote builds in B132143: Diff 384338. Nov 2 2021, 11:18 PM

shchenz mentioned this in D113178: [PowerPC] use right register class for input operand of XXPERMDIs. Nov 4 2021, 3:35 AM

nemanjai added inline comments. Nov 4 2021, 5:08 AM

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
5831	Sorry, I had this comment but didn't submit because I hadn't finished the review. Please flip the condition and turn it into an early exit.
5837	Seems like all the code is also inside this block. Please flip the condition and early exit. You should really look for opportunities to do this and reduce the level of nesting. I think it would also be useful to provide a much simpler code path for loads that are aligned at 16 bytes. In that case, all you need is `LVX + SPLAT` (where the splat index depends on endianness). Example:
    typedef signed short __attribute__((aligned(16))) AlignedShort;
    vector signed short test(AlignedShort Ptr) {
      return (vector signed short)(Ptr + 3);
    }
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9080	This is of course a minor nit, but please be mindful of the convention to make comments proper sentences with capitalization and punctuation.
9165	This is not trivially obvious and therefore requires a comment. Something like:
    // If the input load is an extending load, it will be an i32 -> i64
    // extending load and isValidSplatLoad() will update NewOpcode.
Generally, I think this pattern is somewhat dangerous. It makes an assumption that a different function has specific behaviour that isn't clearly documented. There is little assurance that someone won't update `isValidSplatLoad()` to accept extending loads from `i16` to `i64`, etc. So I think you should add an assert here to ensure the types are what you expect them to be.
9176	s/Execlude/Exclude
9200	I agree with separating out the `LFIWAX/LFIWZX` case because the condition is very different. But for the `LXVRHX/LXVRBX`, the conditions are essentially the same. Those should be combined in an obvious way:
    // case 2 - lxvr[bh]x
    // 2.1: load result is at most 16 bits;
    // 2.2: build a vector with above loaded value;
    // 2.3: the vector has only one value at index 0, others are all undef;
    // 2.4: on LE target, so that lxvr[bh]x does not need any permute.
    if (NumUsesOfInputLD == 1 && Subtarget.isLittleEndian() &&
        Subtarget.isISA3_1() && Op->getValueType(0).getSizeInBits() <= 16)
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3560	I understand that we don't really care about the undefined bits because they'll be overwritten when we splat, but why not just use `MTVSRWZ` and not even need the comment?
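
To make the difference between the unaligned sequence in this patch and the aligned `LVX + SPLAT` shortcut suggested on line 5837 concrete, here is a small byte-level model in C++. It is only a sketch under stated assumptions: it models the big-endian variant (`LVSL`, splat index 0), it covers the halfword case only, memory is a plain byte vector, and the function names (`lvx`, `lvsl`, `vperm`, `vsplth`, `splatLoadHalfUnaligned`, `splatLoadHalfAligned16`) are illustrative helpers rather than LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::array<std::uint8_t, 16>;

// Model of lvx: the low four address bits are ignored, so the load always
// reads the 16-byte aligned block that contains Addr.
inline Vec lvx(const std::vector<std::uint8_t> &Mem, std::size_t Addr) {
  std::size_t Base = Addr & ~static_cast<std::size_t>(15);
  assert(Base + 16 <= Mem.size() && "model memory too small");
  Vec V{};
  for (std::size_t I = 0; I < 16; ++I)
    V[I] = Mem[Base + I];
  return V;
}

// Model of lvsl: permute control whose byte I is (Addr & 15) + I.
inline Vec lvsl(std::size_t Addr) {
  Vec M{};
  for (std::size_t I = 0; I < 16; ++I)
    M[I] = static_cast<std::uint8_t>((Addr & 15) + I);
  return M;
}

// Model of vperm: each result byte selects one byte of the 32-byte
// concatenation A:B using the low five bits of the control byte.
inline Vec vperm(const Vec &A, const Vec &B, const Vec &Mask) {
  Vec R{};
  for (std::size_t I = 0; I < 16; ++I) {
    unsigned Sel = Mask[I] & 31u;
    R[I] = Sel < 16 ? A[Sel] : B[Sel - 16];
  }
  return R;
}

// Model of vsplth: replicate halfword Idx (big-endian numbering) to all lanes.
inline Vec vsplth(const Vec &V, unsigned Idx) {
  Vec R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[2 * I] = V[2 * Idx];
    R[2 * I + 1] = V[2 * Idx + 1];
  }
  return R;
}

// Unaligned v8i16 splat load: two lvx loads cover a halfword that may
// straddle a 16-byte boundary, then vperm moves it to element 0.
inline Vec splatLoadHalfUnaligned(const std::vector<std::uint8_t> &Mem,
                                  std::size_t Addr) {
  Vec Mask = lvsl(Addr);
  Vec LoadLow = lvx(Mem, Addr);
  Vec LoadHigh = lvx(Mem, Addr + 1);
  return vsplth(vperm(LoadLow, LoadHigh, Mask), 0);
}

// Shortcut that is only valid when Addr is a multiple of 16.
inline Vec splatLoadHalfAligned16(const std::vector<std::uint8_t> &Mem,
                                  std::size_t Addr) {
  assert((Addr & 15) == 0 && "shortcut requires 16-byte alignment");
  return vsplth(lvx(Mem, Addr), 0);
}
```

With memory filled so that byte `i` holds the value `i`, a splat load at address 15 takes its first byte from one aligned block and its second from the next, which is why the halfword case needs the second `lvx`. At a 16-byte aligned address both functions return the same vector, so the mask and permute can be dropped there.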

shchenz mentioned this in D113236: [PowerPC] use correct selection for v16i8/v8i16 splat load. Nov 4 2021, 8:30 PM

shchenz mentioned this in rG96950270669a: [PowerPC] address post-commit comments for D106555; NFC. Nov 4 2021, 10:31 PM

Thanks for your comments @nemanjai

I addressed most of the comments in commit 96950270669acd3c342a266562ff3a41464cc0a0

One comment related to the code gen improvement for the aligned load will be addressed in the follow-up Phabricator patch.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
5837	I'd like to follow up on the 16-bytes aligned splat load in another patch.
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3560	Thanks for the good suggestion. This also seems to resolve a crash related to `INSERT_SUBREG(x, x, sub_32)` on 32-bit AIX. I addressed this comment in D113236.
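
The exchange on line 3560 rests on the observation that a byte or halfword splat reads only the low bits of the moved value, so whatever sits in the upper bits cannot affect the result. A minimal sketch of that reasoning in C++, with hypothetical helper names (`moveWordZeroExtended` stands in for a word move that clears the upper bits, `splatLowByte` for the splat that follows):

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a word move with zero extension: only the low 32 bits
// of the source survive, the upper bits become zero.
inline std::uint64_t moveWordZeroExtended(std::uint64_t Gpr) {
  return Gpr & 0xffffffffULL;
}

// Hypothetical model of a byte splat: only the low 8 bits are read.
inline std::array<std::uint8_t, 16> splatLowByte(std::uint64_t Reg) {
  std::array<std::uint8_t, 16> V;
  V.fill(static_cast<std::uint8_t>(Reg & 0xff));
  return V;
}
```

Splatting a value with arbitrary upper bits and splatting its zero-extended low word give the same vector, which is consistent with both the original claim that the upper bits do not matter and the suggestion to use the word move.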

shchenz mentioned this in D114062: [PowerPC] use lvx + splat directly for aligned splat load. Nov 16 2021, 11:41 PM

amyk mentioned this in D117803: [PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads. Jan 20 2022, 9:45 AM

amyk mentioned this in rG9cc5b064f185: [PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non…. Jan 28 2022, 6:23 AM

Revision Contents

Path	Size
llvm/lib/Target/PowerPC/	62, 8, 86, 26, and 16 lines (five files)
llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll	36 lines
llvm/test/CodeGen/PowerPC/load-and-splat.ll	59 lines
llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll	16 lines

Diff 384341

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp


Show First 20 Lines • Show All 5,819 Lines • ▼ Show 20 Lines	if ((Elt & 1) == 0) {
SDNode *Tmp1 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);		SDNode *Tmp1 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);
EltVal = getI32Imm(-16, dl);		EltVal = getI32Imm(-16, dl);
SDNode *Tmp2 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);		SDNode *Tmp2 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);
ReplaceNode(N, CurDAG->getMachineNode(Opc2, dl, VT, SDValue(Tmp1, 0),		ReplaceNode(N, CurDAG->getMachineNode(Opc2, dl, VT, SDValue(Tmp1, 0),
SDValue(Tmp2, 0)));		SDValue(Tmp2, 0)));
return;		return;
}		}
}		}
		case PPCISD::LD_SPLAT: {
		// For v16i8 and v8i16, if target has no direct move, we can still handle
		// this without using stack.
		if (Subtarget->hasAltivec() && !Subtarget->hasDirectMove()) {
		SDValue ZeroReg =
		CurDAG->getRegister(Subtarget->isPPC64() ? PPC::ZERO8 : PPC::ZERO,
		Subtarget->isPPC64() ? MVT::i64 : MVT::i32);
		unsigned LIOpcode = Subtarget->isPPC64() ? PPC::LI8 : PPC::LI;
		EVT Type = N->getValueType(0);
		if (Type == MVT::v16i8 || Type == MVT::v8i16) {
		// v16i8 LD_SPLAT addr
		// ======>
		// Mask = LVSR/LVSL 0, addr
		// LoadLow = LXV 0, addr
		// Perm = VPERM LoadLow, LoadLow, Mask
		// Splat = VSPLTB 15/0, Perm
		//
		// v8i16 LD_SPLAT addr
		// ======>
		// Mask = LVSR/LVSL 0, addr
		// LoadLow = LXV 0, addr
		// LoadHigh = LXV (LI, 1), addr
		// Perm = VPERM LoadLow, LoadHigh, Mask
		// Splat = VSPLTH 7/0, Perm
		unsigned SplatOp = (Type == MVT::v16i8) ? PPC::VSPLTB : PPC::VSPLTH;
		unsigned SplatElemIndex =
		Subtarget->isLittleEndian() ? ((Type == MVT::v16i8) ? 15 : 7) : 0;

		SDNode *Mask = CurDAG->getMachineNode(
		Subtarget->isLittleEndian() ? PPC::LVSR : PPC::LVSL, dl, Type,
		ZeroReg, N->getOperand(1));

		SDNode *LoadLow = CurDAG->getMachineNode(
		PPC::LVX, dl, MVT::v16i8, MVT::Other,
		{ZeroReg, N->getOperand(1), N->getOperand(0)});

		SDNode *LoadHigh = LoadLow;
		if (Type == MVT::v8i16) {
		LoadHigh = CurDAG->getMachineNode(
		PPC::LVX, dl, MVT::v16i8, MVT::Other,
		{SDValue(CurDAG->getMachineNode(
		LIOpcode, dl, MVT::i32,
		CurDAG->getTargetConstant(1, dl, MVT::i8)),
		0),
		N->getOperand(1), SDValue(LoadLow, 1)});
		}

		CurDAG->ReplaceAllUsesOfValueWith(SDValue(N, 1), SDValue(LoadHigh, 1));
		transferMemOperands(N, LoadHigh);

		SDNode *Perm =
		CurDAG->getMachineNode(PPC::VPERM, dl, Type, SDValue(LoadLow, 0),
		SDValue(LoadHigh, 0), SDValue(Mask, 0));
		CurDAG->SelectNodeTo(
		N, SplatOp, Type,
		CurDAG->getTargetConstant(SplatElemIndex, dl, MVT::i8),
		SDValue(Perm, 0));
		return;
		}
		}
		break;
		}
}		}

SelectCode(N);		SelectCode(N);
}		}

// If the target supports the cmpb instruction, do the idiom recognition here.		// If the target supports the cmpb instruction, do the idiom recognition here.
// We don't do this as a DAG combine because we don't want to do it as nodes		// We don't do this as a DAG combine because we don't want to do it as nodes
// are being combined (because we might miss part of the eventual idiom). We		// are being combined (because we might miss part of the eventual idiom). We
▲ Show 20 Lines • Show All 1,452 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCISelLowering.h

Show First 20 Lines • Show All 553 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
/// VSRC, CHAIN = LD_VSX_LH CHAIN, Ptr - This is a floating-point load of a		/// VSRC, CHAIN = LD_VSX_LH CHAIN, Ptr - This is a floating-point load of a
/// v2f32 value into the lower half of a VSR register.		/// v2f32 value into the lower half of a VSR register.
LD_VSX_LH,		LD_VSX_LH,

/// VSRC, CHAIN = LD_SPLAT, CHAIN, Ptr - a splatting load memory		/// VSRC, CHAIN = LD_SPLAT, CHAIN, Ptr - a splatting load memory
/// instructions such as LXVDSX, LXVWSX.		/// instructions such as LXVDSX, LXVWSX.
LD_SPLAT,		LD_SPLAT,

		/// VSRC, CHAIN = ZEXT_LD_SPLAT, CHAIN, Ptr - a splatting load memory
		/// that zero-extends.
		ZEXT_LD_SPLAT,

		/// VSRC, CHAIN = SEXT_LD_SPLAT, CHAIN, Ptr - a splatting load memory
		/// that sign-extends.
		SEXT_LD_SPLAT,

/// CHAIN = STXVD2X CHAIN, VSRC, Ptr - Occurs only for little endian.		/// CHAIN = STXVD2X CHAIN, VSRC, Ptr - Occurs only for little endian.
/// Maps directly to an stxvd2x instruction that will be preceded by		/// Maps directly to an stxvd2x instruction that will be preceded by
/// an xxswapd.		/// an xxswapd.
STXVD2X,		STXVD2X,

/// CHAIN = STORE_VEC_BE CHAIN, VSRC, Ptr - Occurs only for little endian.		/// CHAIN = STORE_VEC_BE CHAIN, VSRC, Ptr - Occurs only for little endian.
/// Maps directly to one of stxvd2x/stxvw4x/stxvh8x/stxvb16x depending on		/// Maps directly to one of stxvd2x/stxvw4x/stxvh8x/stxvb16x depending on
/// the vector type to store vector in big-endian element order.		/// the vector type to store vector in big-endian element order.
▲ Show 20 Lines • Show All 870 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCISelLowering.cpp


Show First 20 Lines • Show All 1,706 Lines • ▼ Show 20 Lines	case PPCISD::TLS_DYNAMIC_MAT_PCREL_ADDR:
return "PPCISD::TLS_DYNAMIC_MAT_PCREL_ADDR";		return "PPCISD::TLS_DYNAMIC_MAT_PCREL_ADDR";
case PPCISD::TLS_LOCAL_EXEC_MAT_ADDR:		case PPCISD::TLS_LOCAL_EXEC_MAT_ADDR:
return "PPCISD::TLS_LOCAL_EXEC_MAT_ADDR";		return "PPCISD::TLS_LOCAL_EXEC_MAT_ADDR";
case PPCISD::ACC_BUILD: return "PPCISD::ACC_BUILD";		case PPCISD::ACC_BUILD: return "PPCISD::ACC_BUILD";
case PPCISD::PAIR_BUILD: return "PPCISD::PAIR_BUILD";		case PPCISD::PAIR_BUILD: return "PPCISD::PAIR_BUILD";
case PPCISD::EXTRACT_VSX_REG: return "PPCISD::EXTRACT_VSX_REG";		case PPCISD::EXTRACT_VSX_REG: return "PPCISD::EXTRACT_VSX_REG";
case PPCISD::XXMFACC: return "PPCISD::XXMFACC";		case PPCISD::XXMFACC: return "PPCISD::XXMFACC";
case PPCISD::LD_SPLAT: return "PPCISD::LD_SPLAT";		case PPCISD::LD_SPLAT: return "PPCISD::LD_SPLAT";
		case PPCISD::ZEXT_LD_SPLAT: return "PPCISD::ZEXT_LD_SPLAT";
		case PPCISD::SEXT_LD_SPLAT: return "PPCISD::SEXT_LD_SPLAT";
case PPCISD::FNMSUB: return "PPCISD::FNMSUB";		case PPCISD::FNMSUB: return "PPCISD::FNMSUB";
case PPCISD::STRICT_FADDRTZ:		case PPCISD::STRICT_FADDRTZ:
return "PPCISD::STRICT_FADDRTZ";		return "PPCISD::STRICT_FADDRTZ";
case PPCISD::STRICT_FCTIDZ:		case PPCISD::STRICT_FCTIDZ:
return "PPCISD::STRICT_FCTIDZ";		return "PPCISD::STRICT_FCTIDZ";
case PPCISD::STRICT_FCTIWZ:		case PPCISD::STRICT_FCTIWZ:
return "PPCISD::STRICT_FCTIWZ";		return "PPCISD::STRICT_FCTIWZ";
case PPCISD::STRICT_FCTIDUZ:		case PPCISD::STRICT_FCTIDUZ:
▲ Show 20 Lines • Show All 7,332 Lines • ▼ Show 20 Lines	bool llvm::checkConvertToNonDenormSingle(APFloat &ArgAPFloat) {
APFloat APFloatToConvert = ArgAPFloat;		APFloat APFloatToConvert = ArgAPFloat;
bool LosesInfo = true;		bool LosesInfo = true;
APFloatToConvert.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven,		APFloatToConvert.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven,
&LosesInfo);		&LosesInfo);

return (!LosesInfo && !APFloatToConvert.isDenormal());		return (!LosesInfo && !APFloatToConvert.isDenormal());
}		}

		static bool isValidSplatLoad(const PPCSubtarget &Subtarget, const SDValue &Op,
		unsigned &Opcode) {
		const SDNode *InputNode = Op.getOperand(0).getNode();
		if (!InputNode || !ISD::isUNINDEXEDLoad(InputNode))
		return false;

		if (!Subtarget.hasVSX())
		return false;

		EVT Ty = Op->getValueType(0);
		if (Ty == MVT::v2f64 || Ty == MVT::v4f32 || Ty == MVT::v4i32 ||
		Ty == MVT::v8i16 \|\| Ty == MVT::v16i8)
		return true;

		if (Ty == MVT::v2i64) {
		// check the extend type if the input is i32 while the output vector type is
		// v2i64.
		if (cast<LoadSDNode>(Op.getOperand(0))->getMemoryVT() == MVT::i32) {
		if (ISD::isZEXTLoad(InputNode))
		Opcode = PPCISD::ZEXT_LD_SPLAT;
		if (ISD::isSEXTLoad(InputNode))
		Opcode = PPCISD::SEXT_LD_SPLAT;
		}
		return true;
		}
		return false;
		}

// If this is a case we can't handle, return null and let the default		// If this is a case we can't handle, return null and let the default
// expansion code take care of it. If we CAN select this case, and if it		// expansion code take care of it. If we CAN select this case, and if it
// selects to a single instruction, return Op. Otherwise, if we can codegen		// selects to a single instruction, return Op. Otherwise, if we can codegen
// this case more efficiently than a constant pool load, lower it to the		// this case more efficiently than a constant pool load, lower it to the
// sequence of ops that should be used.		// sequence of ops that should be used.
SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,		SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc dl(Op);		SDLoc dl(Op);
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	if ((Op->getValueType(0) == MVT::v2f64) &&
DAG.getTargetConstant(1, dl, MVT::i32),		DAG.getTargetConstant(1, dl, MVT::i32),
DAG.getTargetConstant(Lo, dl, MVT::i32));		DAG.getTargetConstant(Lo, dl, MVT::i32));

return DAG.getBitcast(Op.getValueType(), SplatNode);		return DAG.getBitcast(Op.getValueType(), SplatNode);
}		}
}		}

if (!BVNIsConstantSplat || SplatBitSize > 32) {		if (!BVNIsConstantSplat || SplatBitSize > 32) {
		unsigned NewOpcode = PPCISD::LD_SPLAT;

bool IsPermutedLoad = false;
const SDValue *InputLoad =
getNormalLoadInput(Op.getOperand(0), IsPermutedLoad);
// Handle load-and-splat patterns as we have instructions that will do this		// Handle load-and-splat patterns as we have instructions that will do this
// in one go.		// in one go.
if (InputLoad && DAG.isSplatValue(Op, true)) {		if (DAG.isSplatValue(Op, true) &&
		isValidSplatLoad(Subtarget, Op, NewOpcode)) {
		const SDValue *InputLoad = &Op.getOperand(0);
LoadSDNode *LD = cast<LoadSDNode>(*InputLoad);		LoadSDNode *LD = cast<LoadSDNode>(*InputLoad);

// We have handling for 4 and 8 byte elements.		unsigned ElementSize = LD->getMemoryVT().getScalarSizeInBits() *
unsigned ElementSize = LD->getMemoryVT().getScalarSizeInBits();		((NewOpcode == PPCISD::LD_SPLAT) ? 1 : 2);

// Checking for a single use of this load, we have to check for vector		// Checking for a single use of this load, we have to check for vector
// width (128 bits) / ElementSize uses (since each operand of the		// width (128 bits) / ElementSize uses (since each operand of the
// BUILD_VECTOR is a separate use of the value.		// BUILD_VECTOR is a separate use of the value.
unsigned NumUsesOfInputLD = 128 / ElementSize;		unsigned NumUsesOfInputLD = 128 / ElementSize;
for (SDValue BVInOp : Op->ops())		for (SDValue BVInOp : Op->ops())
if (BVInOp.isUndef())		if (BVInOp.isUndef())
NumUsesOfInputLD--;		NumUsesOfInputLD--;

		// Execlude somes case where LD_SPLAT is worse than scalar_to_vector:
		// Below cases should also happen for "lfiwzx/lfiwax + LE target + index
		// 1" and "lxvrhx + BE target + index 7" and "lxvrbx + BE target + index
		// 15", but funciton IsValidSplatLoad() now will only return true when
		// the data at index 0 is not nullptr. So we will not get into trouble for
		// these cases.
		//
		// case 1 - lfiwzx/lfiwax
		// 1.1: load result is i32 and is sign/zero extend to i64;
		// 1.2: build a v2i64 vector type with above loaded value;
		// 1.3: the vector has only one value at index 0, others are all undef;
		// 1.4: on BE target, so that lfiwzx/lfiwax does not need any permute.
		if (NumUsesOfInputLD == 1 &&
		(Op->getValueType(0) == MVT::v2i64 && NewOpcode != PPCISD::LD_SPLAT &&
		!Subtarget.isLittleEndian() && Subtarget.hasVSX() &&
		Subtarget.hasLFIWAX()))
		return SDValue();

		// case 2 - lxvrhx
		// 2.1: load result is i16;
		// 2.2: build a v8i16 vector with above loaded value;
		// 2.3: the vector has only one value at index 0, others are all undef;
		// 2.4: on LE target, so that lxvrhx does not need any permute.
		if (NumUsesOfInputLD == 1 && Subtarget.isLittleEndian() &&
		Subtarget.isISA3_1() && Op->getValueType(0) == MVT::v16i8)
		return SDValue();

		// case 3 - lxvrbx
		// 3.1: load result is i8;
		// 3.2: build a v16i8 vector with above loaded value;
		// 3.3: the vector has only one value at index 0, others are all undef;
		// 3.4: on LE target, so that lxvrbx does not need any permute.
		if (NumUsesOfInputLD == 1 && Subtarget.isLittleEndian() &&
		Subtarget.isISA3_1() && Op->getValueType(0) == MVT::v8i16)
		return SDValue();

assert(NumUsesOfInputLD > 0 && "No uses of input LD of a build_vector?");		assert(NumUsesOfInputLD > 0 && "No uses of input LD of a build_vector?");
if (InputLoad->getNode()->hasNUsesOfValue(NumUsesOfInputLD, 0) &&		if (InputLoad->getNode()->hasNUsesOfValue(NumUsesOfInputLD, 0) &&
((Subtarget.hasVSX() && ElementSize == 64) ||		Subtarget.hasVSX()) {
(Subtarget.hasP9Vector() && ElementSize == 32))) {
SDValue Ops[] = {		SDValue Ops[] = {
LD->getChain(), // Chain		LD->getChain(), // Chain
LD->getBasePtr(), // Ptr		LD->getBasePtr(), // Ptr
DAG.getValueType(Op.getValueType()) // VT		DAG.getValueType(Op.getValueType()) // VT
};		};
SDValue LdSplt = DAG.getMemIntrinsicNode(		SDValue LdSplt = DAG.getMemIntrinsicNode(
PPCISD::LD_SPLAT, dl, DAG.getVTList(Op.getValueType(), MVT::Other),		NewOpcode, dl, DAG.getVTList(Op.getValueType(), MVT::Other), Ops,
Ops, LD->getMemoryVT(), LD->getMemOperand());		LD->getMemoryVT(), LD->getMemOperand());
// Replace all uses of the output chain of the original load with the		// Replace all uses of the output chain of the original load with the
// output chain of the new load.		// output chain of the new load.
DAG.ReplaceAllUsesOfValueWith(InputLoad->getValue(1),		DAG.ReplaceAllUsesOfValueWith(InputLoad->getValue(1),
LdSplt.getValue(1));		LdSplt.getValue(1));
return LdSplt;		return LdSplt;
}		}
}		}

▲ Show 20 Lines • Show All 8,648 Lines • Show Last 20 Lines
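
The single-use check in `LowerBUILD_VECTOR` above depends on how `ElementSize` is scaled for the extending splat opcodes. The arithmetic can be sketched in isolation as follows. This is a standalone illustration, not code from the patch: `expectedLoadUses` and its parameters are hypothetical, and the assertions reflect the review comment asking for explicit checks on the expected types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the computation of NumUsesOfInputLD.
// MemoryBits       - size in bits of the value the load reads from memory.
// IsExtendingSplat - true when the opcode is one of the extending variants, in
//                    which case the vector element is twice as wide as the
//                    loaded value.
// OperandIsUndef   - one flag per BUILD_VECTOR operand.
inline unsigned expectedLoadUses(unsigned MemoryBits, bool IsExtendingSplat,
                                 const std::vector<bool> &OperandIsUndef) {
  unsigned ElementSize = MemoryBits * (IsExtendingSplat ? 2u : 1u);
  assert(ElementSize != 0 && 128 % ElementSize == 0 &&
         "element size must divide the 128-bit vector width");
  unsigned NumUses = 128 / ElementSize;
  assert(OperandIsUndef.size() == static_cast<std::size_t>(NumUses) &&
         "one flag per vector element expected");
  // Undef operands are not uses of the load.
  for (bool IsUndef : OperandIsUndef)
    if (IsUndef)
      --NumUses;
  return NumUses;
}
```

For an `i32` load extended into a `v2i64` build vector whose second operand is undef, this yields one use, which matches the situation in which the patch gives up on the splat and returns `SDValue()` for case 1.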

llvm/lib/Target/PowerPC/PPCInstrVSX.td

Show First 20 Lines • Show All 132 Lines • ▼ Show 20 Lines
def PPCswapNoChain : SDNode<"PPCISD::SWAP_NO_CHAIN", SDT_PPCxxswapd>;		def PPCswapNoChain : SDNode<"PPCISD::SWAP_NO_CHAIN", SDT_PPCxxswapd>;
def PPCvabsd : SDNode<"PPCISD::VABSD", SDTVabsd, []>;		def PPCvabsd : SDNode<"PPCISD::VABSD", SDTVabsd, []>;

def PPCfpexth : SDNode<"PPCISD::FP_EXTEND_HALF", SDT_PPCfpexth, []>;		def PPCfpexth : SDNode<"PPCISD::FP_EXTEND_HALF", SDT_PPCfpexth, []>;
def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", SDT_PPCldvsxlh,		def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", SDT_PPCldvsxlh,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat,		def PPCldsplat : SDNode<"PPCISD::LD_SPLAT", SDT_PPCldsplat,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
		def PPCzextldsplat : SDNode<"PPCISD::ZEXT_LD_SPLAT", SDT_PPCldsplat,
		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
		def PPCsextldsplat : SDNode<"PPCISD::SEXT_LD_SPLAT", SDT_PPCldsplat,
		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED",		def PPCSToV : SDNode<"PPCISD::SCALAR_TO_VECTOR_PERMUTED",
SDTypeProfile<1, 1, []>, []>;		SDTypeProfile<1, 1, []>, []>;

//-------------------------- Predicate definitions ---------------------------//		//-------------------------- Predicate definitions ---------------------------//
def HasVSX : Predicate<"Subtarget->hasVSX()">;		def HasVSX : Predicate<"Subtarget->hasVSX()">;
def IsLittleEndian : Predicate<"Subtarget->isLittleEndian()">;		def IsLittleEndian : Predicate<"Subtarget->isLittleEndian()">;
def IsBigEndian : Predicate<"!Subtarget->isLittleEndian()">;		def IsBigEndian : Predicate<"!Subtarget->isLittleEndian()">;
def IsPPC64 : Predicate<"Subtarget->isPPC64()">;		def IsPPC64 : Predicate<"Subtarget->isPPC64()">;
▲ Show 20 Lines • Show All 2,673 Lines • ▼ Show 20 Lines	defm : ScalToVecWPermute<
(XXSPLTW (SUBREG_TO_REG (i64 1), (XSCVDPUXWSs (XFLOADf32 ForceXForm:$A)), sub_64), 1),		(XXSPLTW (SUBREG_TO_REG (i64 1), (XSCVDPUXWSs (XFLOADf32 ForceXForm:$A)), sub_64), 1),
(SUBREG_TO_REG (i64 1), (XSCVDPUXWSs (XFLOADf32 ForceXForm:$A)), sub_64)>;		(SUBREG_TO_REG (i64 1), (XSCVDPUXWSs (XFLOADf32 ForceXForm:$A)), sub_64)>;
def : Pat<(v4f32 (build_vector (f32 (fpround f64:$A)), (f32 (fpround f64:$A)),		def : Pat<(v4f32 (build_vector (f32 (fpround f64:$A)), (f32 (fpround f64:$A)),
(f32 (fpround f64:$A)), (f32 (fpround f64:$A)))),		(f32 (fpround f64:$A)), (f32 (fpround f64:$A)))),
(v4f32 (XXSPLTW (SUBREG_TO_REG (i64 1), (XSCVDPSP f64:$A), sub_64), 0))>;		(v4f32 (XXSPLTW (SUBREG_TO_REG (i64 1), (XSCVDPSP f64:$A), sub_64), 0))>;

def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),		def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
(v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;		(v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;

		// Splat loads.
def : Pat<(v2f64 (PPCldsplat ForceXForm:$A)),		def : Pat<(v2f64 (PPCldsplat ForceXForm:$A)),
(v2f64 (LXVDSX ForceXForm:$A))>;		(v2f64 (LXVDSX ForceXForm:$A))>;
		def : Pat<(v4f32 (PPCldsplat ForceXForm:$A)),
		(v4f32 (XXSPLTW (SUBREG_TO_REG (i64 1), (LFIWZX ForceXForm:$A), sub_64), 1))>;
def : Pat<(v2i64 (PPCldsplat ForceXForm:$A)),		def : Pat<(v2i64 (PPCldsplat ForceXForm:$A)),
(v2i64 (LXVDSX ForceXForm:$A))>;		(v2i64 (LXVDSX ForceXForm:$A))>;
		def : Pat<(v4i32 (PPCldsplat ForceXForm:$A)),
		(v4i32 (XXSPLTW (SUBREG_TO_REG (i64 1), (LFIWZX ForceXForm:$A), sub_64), 1))>;
		def : Pat<(v2i64 (PPCzextldsplat ForceXForm:$A)),
nemanjai (Done): As far as I can tell, this is in a block that only sets `let Predicates = [VSX]`. It is not safe to use direct moves on subtargets that don't have direct moves.
		(v2i64 (XXPERMDIs (LFIWZX ForceXForm:$A), 0))>;
		def : Pat<(v2i64 (PPCsextldsplat ForceXForm:$A)),
		(v2i64 (XXPERMDIs (LFIWAX ForceXForm:$A), 0))>;
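
The point made in the reply about `LFIWZX` (a word load that leaves the raw bits untouched) can be illustrated with a small host-side model. This is only a sketch: `loadWordZeroExtended` and `lowWordAsFloat` are hypothetical helper names, not LLVM or PowerPC APIs, and the model assumes a 32-bit IEEE-754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical model of a word load that zero-extends into a 64-bit
// register image without interpreting the bits.
std::uint64_t loadWordZeroExtended(const void *addr) {
  std::uint32_t word;
  std::memcpy(&word, addr, sizeof(word)); // copy the raw 32 bits unchanged
  return static_cast<std::uint64_t>(word); // upper 32 bits become zero
}

// Reinterpret the low 32 bits of the register image as a float again.
float lowWordAsFloat(std::uint64_t reg) {
  std::uint32_t word = static_cast<std::uint32_t>(reg);
  float f;
  std::memcpy(&f, &word, sizeof(f));
  return f;
}
```

Under these assumptions, loading the float 1.5 gives the register image `0x000000003fc00000`, and reading the low word back as a float gives 1.5 again, so no precision is lost by the load itself.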

// Build vectors of floating point converted to i64.		// Build vectors of floating point converted to i64.
def : Pat<(v2i64 (build_vector FltToLong.A, FltToLong.A)),		def : Pat<(v2i64 (build_vector FltToLong.A, FltToLong.A)),
(v2i64 (XXPERMDIs		(v2i64 (XXPERMDIs
(COPY_TO_REGCLASS (XSCVDPSXDSs $A), VSFRC), 0))>;		(COPY_TO_REGCLASS (XSCVDPSXDSs $A), VSFRC), 0))>;
def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),		def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
(v2i64 (XXPERMDIs		(v2i64 (XXPERMDIs
(COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;		(COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
▲ Show 20 Lines • Show All 693 Lines • ▼ Show 20 Lines	def : Pat<(v16i8 (PPCmtvsrz i32:$A)),
(v16i8 (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), sub_64))>;		(v16i8 (SUBREG_TO_REG (i64 1), (MTVSRWZ $A), sub_64))>;

// Endianness-neutral constant splat on P8 and newer targets. The reason		// Endianness-neutral constant splat on P8 and newer targets. The reason
// for this pattern is that on targets with direct moves, we don't expand		// for this pattern is that on targets with direct moves, we don't expand
// BUILD_VECTOR nodes for v4i32.		// BUILD_VECTOR nodes for v4i32.
def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,		def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
immSExt5NonZero:$A, immSExt5NonZero:$A)),		immSExt5NonZero:$A, immSExt5NonZero:$A)),
(v4i32 (VSPLTISW imm:$A))>;		(v4i32 (VSPLTISW imm:$A))>;

		// Splat loads.
		// Note that, we use MTVSRD without checking PPC64 because we only care the
		// lowest 16/8 bits.
jsji (Done): Do we need to predicate on 64-bit before using `MTVSRD`?
shchenz (author, Done): Thanks, this is a very good question. Normally, yes: we need the 64-bit predicate if `MTVSRD` puts any meaningful value in the high 32 bits of the first doubleword. In this case the predicate seems unnecessary, because only the lowest 16/8 bits are used, so even on a 32-bit arch we can still ensure those bits hold the right value. I added a comment explaining why we don't need the 64-bit predicate.
nemanjai (Not Done): I understand that we don't really care about the undefined bits because they'll be overwritten when we splat, but why not just use `MTVSRWZ` and not even need the comment?
shchenz (author, Done): Thanks for the good suggestion. This also seems to resolve a crash related to `INSERT_SUBREG(x, x, sub_32)` on 32-bit AIX. I addressed this comment in D113236.
		def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
		(v8i16 (VSPLTHs 3, (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (LHZX ForceXForm:$A), sub_32))))>;
		def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
		(v16i8 (VSPLTBs 7, (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (LBZX ForceXForm:$A), sub_32))))>;
} // HasVSX, HasDirectMove		} // HasVSX, HasDirectMove
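
The argument in the `MTVSRD` thread above (only the lowest 16/8 bits of the moved value reach the splat) can be sketched with a small model. It assumes the moved GPR value lands in doubleword 0 of the vector register and that elements are numbered big-endian, so halfword 3 and byte 7 are the low 16 and low 8 bits of that doubleword. `splatHalfword3` and `splatByte7` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>

// Model of "VSPLTH 3" applied to doubleword 0: halfword 3 (big-endian
// numbering) is the low 16 bits of the doubleword.
std::array<std::uint16_t, 8> splatHalfword3(std::uint64_t dw0) {
  std::array<std::uint16_t, 8> v;
  v.fill(static_cast<std::uint16_t>(dw0 & 0xFFFF));
  return v;
}

// Model of "VSPLTB 7" applied to doubleword 0: byte 7 (big-endian
// numbering) is the low 8 bits of the doubleword.
std::array<std::uint8_t, 16> splatByte7(std::uint64_t dw0) {
  std::array<std::uint8_t, 16> v;
  v.fill(static_cast<std::uint8_t>(dw0 & 0xFF));
  return v;
}
```

In this model, two inputs that differ only in their upper bits produce the same splat, which is why the undefined high bits do not affect the result.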

// Big endian VSX subtarget with direct moves.		// Big endian VSX subtarget with direct moves.
let Predicates = [HasVSX, HasDirectMove, IsBigEndian] in {		let Predicates = [HasVSX, HasDirectMove, IsBigEndian] in {
// v16i8 scalar <-> vector conversions (BE)		// v16i8 scalar <-> vector conversions (BE)
defm : ScalToVecWPermute<		defm : ScalToVecWPermute<
v16i8, (i32 i32:$A),		v16i8, (i32 i32:$A),
(SUBREG_TO_REG (i64 1), MovesToVSR.BE_BYTE_0, sub_64),		(SUBREG_TO_REG (i64 1), MovesToVSR.BE_BYTE_0, sub_64),
▲ Show 20 Lines • Show All 531 Lines • ▼ Show 20 Lines	defm : ScalToVecWPermute<
(XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 DSForm:$A), VSFRC)), 0),		(XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 DSForm:$A), VSFRC)), 0),
(SUBREG_TO_REG		(SUBREG_TO_REG
(i64 1),		(i64 1),
(XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 DSForm:$A), VSFRC)), sub_64)>;		(XSCVDPUXDS (COPY_TO_REGCLASS (DFLOADf32 DSForm:$A), VSFRC)), sub_64)>;
def : Pat<(v4f32 (PPCldsplat ForceXForm:$A)),		def : Pat<(v4f32 (PPCldsplat ForceXForm:$A)),
(v4f32 (LXVWSX ForceXForm:$A))>;		(v4f32 (LXVWSX ForceXForm:$A))>;
def : Pat<(v4i32 (PPCldsplat ForceXForm:$A)),		def : Pat<(v4i32 (PPCldsplat ForceXForm:$A)),
(v4i32 (LXVWSX ForceXForm:$A))>;		(v4i32 (LXVWSX ForceXForm:$A))>;
		def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
		(v8i16 (VSPLTHs 3, (LXSIHZX ForceXForm:$A)))>;
jsji (Done): Do we need an alignment check here for `LXSIHZX`?
shchenz (author, Done): I think an unaligned load may cause a performance issue, but not a functional one. Loads like `LDX` don't seem to check alignment either. Do you see any issue here?
jsji (Not Done): Is it possible that `PPCldsplat` is called with an unaligned address? If so, we may load from the wrong address.
		def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
		(v16i8 (VSPLTBs 7, (LXSIBZX ForceXForm:$A)))>;
} // HasVSX, HasP9Vector		} // HasVSX, HasP9Vector

// Any Power9 VSX subtarget with equivalent length but better Power10 VSX		// Any Power9 VSX subtarget with equivalent length but better Power10 VSX
// patterns.		// patterns.
// Two identical blocks are required due to the slightly different predicates:		// Two identical blocks are required due to the slightly different predicates:
// One without P10 instructions, the other is BigEndian only with P10 instructions.		// One without P10 instructions, the other is BigEndian only with P10 instructions.
let Predicates = [HasVSX, HasP9Vector, NoP10Vector] in {		let Predicates = [HasVSX, HasP9Vector, NoP10Vector] in {
// Little endian Power10 subtargets produce a shorter pattern but require a		// Little endian Power10 subtargets produce a shorter pattern but require a

llvm/lib/Target/PowerPC/PPCMIPeephole.cpp

Show First 20 Lines • Show All 597 Lines • ▼ Show 20 Lines	for (MachineInstr &MI : MBB) {
LLVM_DEBUG(dbgs() << "Optimizing swap/swap => copy: ");		LLVM_DEBUG(dbgs() << "Optimizing swap/swap => copy: ");
LLVM_DEBUG(MI.dump());		LLVM_DEBUG(MI.dump());
BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(PPC::COPY),		BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(PPC::COPY),
MI.getOperand(0).getReg())		MI.getOperand(0).getReg())
.add(DefMI->getOperand(1));		.add(DefMI->getOperand(1));
ToErase = &MI;		ToErase = &MI;
Simplified = true;		Simplified = true;
}		}
} else if ((Immed == 0 || Immed == 3) && DefOpc == PPC::XXPERMDIs &&		} else if ((Immed == 0 || Immed == 3 || Immed == 2) &&
		DefOpc == PPC::XXPERMDIs &&
(DefMI->getOperand(2).getImm() == 0 ||		(DefMI->getOperand(2).getImm() == 0 ||
DefMI->getOperand(2).getImm() == 3)) {		DefMI->getOperand(2).getImm() == 3)) {
		ToErase = &MI;
		Simplified = true;
		// Swap of a splat, convert to copy.
		if (Immed == 2) {
		LLVM_DEBUG(dbgs() << "Optimizing swap(splat) => copy(splat): ");
		LLVM_DEBUG(MI.dump());
		BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(PPC::COPY),
		MI.getOperand(0).getReg())
		.add(MI.getOperand(1));
		break;
		}
// Splat fed by another splat - switch the output of the first		// Splat fed by another splat - switch the output of the first
// and remove the second.		// and remove the second.
DefMI->getOperand(0).setReg(MI.getOperand(0).getReg());		DefMI->getOperand(0).setReg(MI.getOperand(0).getReg());
ToErase = &MI;
Simplified = true;
LLVM_DEBUG(dbgs() << "Removing redundant splat: ");		LLVM_DEBUG(dbgs() << "Removing redundant splat: ");
LLVM_DEBUG(MI.dump());		LLVM_DEBUG(MI.dump());
}		}
break;		break;
}		}
case PPC::VSPLTB:		case PPC::VSPLTB:
case PPC::VSPLTH:		case PPC::VSPLTH:
case PPC::XXSPLTW: {		case PPC::XXSPLTW: {
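
The new `Immed == 2` case in the peephole above relies on a swap of a splat being the same splat. A minimal model of the doubleword select, assuming the usual reading of the `xxpermdi` immediate (high bit picks the doubleword of the first source, low bit picks the doubleword of the second source), is sketched below. `Vec` and `xxpermdi` here are illustrative stand-ins, not LLVM code.

```cpp
#include <array>
#include <cstdint>

// A 128-bit register modelled as {doubleword 0, doubleword 1}.
using Vec = std::array<std::uint64_t, 2>;

// Assumed semantics of the 2-bit immediate: bit 1 selects the doubleword
// taken from `a`, bit 0 selects the doubleword taken from `b`.
Vec xxpermdi(const Vec &a, const Vec &b, unsigned dm) {
  return Vec{a[(dm >> 1) & 1], b[dm & 1]};
}
```

With both sources equal to a register `s`, immediate 0 yields `{s[0], s[0]}` (a splat), and applying immediate 0, 2 (swap), or 3 to that splat returns it unchanged, which is consistent with the peephole replacing the second instruction with a copy.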

llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll

entry:		entry:
%vecins1 = shufflevector <4 x i32> %a, <4 x i32> <i32 undef, i32 566, i32 undef, i32 566>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>		%vecins1 = shufflevector <4 x i32> %a, <4 x i32> <i32 undef, i32 566, i32 undef, i32 566>, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
ret <4 x i32> %vecins1		ret <4 x i32> %vecins1
}		}

define dso_local <16 x i8> @no_RAUW_in_combine_during_legalize(i32* nocapture readonly %ptr, i32 signext %offset) local_unnamed_addr #0 {		define dso_local <16 x i8> @no_RAUW_in_combine_during_legalize(i32* nocapture readonly %ptr, i32 signext %offset) local_unnamed_addr #0 {
; CHECK-P8-LABEL: no_RAUW_in_combine_during_legalize:		; CHECK-P8-LABEL: no_RAUW_in_combine_during_legalize:
; CHECK-P8: # %bb.0: # %entry		; CHECK-P8: # %bb.0: # %entry
; CHECK-P8-NEXT: addis r5, r2, .LCPI16_0@toc@ha
; CHECK-P8-NEXT: sldi r4, r4, 2		; CHECK-P8-NEXT: sldi r4, r4, 2
; CHECK-P8-NEXT: xxlxor v4, v4, v4		; CHECK-P8-NEXT: xxlxor v3, v3, v3
; CHECK-P8-NEXT: addi r5, r5, .LCPI16_0@toc@l		; CHECK-P8-NEXT: lfiwzx f0, r3, r4
; CHECK-P8-NEXT: lxsiwzx v2, r3, r4		; CHECK-P8-NEXT: xxspltd v2, f0, 0
jsji (Not Done): `f0` here looks weird with respect to register classes; `xxspltd` does need `vs0` to get `element 0` of the doubleword. Maybe we should follow up to see why the ASMPrinter prints `f0` here.
shchenz (author, Done): Sure, I will check this later. `vs0` seems a more reasonable input.
; CHECK-P8-NEXT: lvx v3, 0, r5		; CHECK-P8-NEXT: vmrglb v2, v3, v2
; CHECK-P8-NEXT: vperm v2, v4, v2, v3
; CHECK-P8-NEXT: blr		; CHECK-P8-NEXT: blr
;		;
; CHECK-P9-LABEL: no_RAUW_in_combine_during_legalize:		; CHECK-P9-LABEL: no_RAUW_in_combine_during_legalize:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: sldi r4, r4, 2		; CHECK-P9-NEXT: sldi r4, r4, 2
; CHECK-P9-NEXT: xxlxor v4, v4, v4		; CHECK-P9-NEXT: xxlxor v3, v3, v3
; CHECK-P9-NEXT: lxsiwzx v2, r3, r4		; CHECK-P9-NEXT: lfiwzx f0, r3, r4
; CHECK-P9-NEXT: addis r3, r2, .LCPI16_0@toc@ha		; CHECK-P9-NEXT: xxspltd v2, f0, 0
; CHECK-P9-NEXT: addi r3, r3, .LCPI16_0@toc@l		; CHECK-P9-NEXT: vmrglb v2, v3, v2
; CHECK-P9-NEXT: lxv v3, 0(r3)
; CHECK-P9-NEXT: vperm v2, v4, v2, v3
; CHECK-P9-NEXT: blr		; CHECK-P9-NEXT: blr
;		;
; CHECK-P9-BE-LABEL: no_RAUW_in_combine_during_legalize:		; CHECK-P9-BE-LABEL: no_RAUW_in_combine_during_legalize:
; CHECK-P9-BE: # %bb.0: # %entry		; CHECK-P9-BE: # %bb.0: # %entry
; CHECK-P9-BE-NEXT: sldi r4, r4, 2		; CHECK-P9-BE-NEXT: sldi r4, r4, 2
; CHECK-P9-BE-NEXT: xxlxor v3, v3, v3		; CHECK-P9-BE-NEXT: xxlxor v3, v3, v3
; CHECK-P9-BE-NEXT: lxsiwzx v2, r3, r4		; CHECK-P9-BE-NEXT: lxsiwzx v2, r3, r4
; CHECK-P9-BE-NEXT: vmrghb v2, v2, v3		; CHECK-P9-BE-NEXT: vmrghb v2, v2, v3
; CHECK-P9-BE-NEXT: blr		; CHECK-P9-BE-NEXT: blr
;		;
; CHECK-NOVSX-LABEL: no_RAUW_in_combine_during_legalize:		; CHECK-NOVSX-LABEL: no_RAUW_in_combine_during_legalize:
; CHECK-NOVSX: # %bb.0: # %entry		; CHECK-NOVSX: # %bb.0: # %entry
; CHECK-NOVSX-NEXT: sldi r4, r4, 2		; CHECK-NOVSX-NEXT: sldi r4, r4, 2
; CHECK-NOVSX-NEXT: vxor v2, v2, v2		; CHECK-NOVSX-NEXT: vxor v2, v2, v2
; CHECK-NOVSX-NEXT: lwzx r3, r3, r4		; CHECK-NOVSX-NEXT: lwzx r3, r3, r4
; CHECK-NOVSX-NEXT: std r3, -16(r1)		; CHECK-NOVSX-NEXT: std r3, -16(r1)
; CHECK-NOVSX-NEXT: addi r3, r1, -16		; CHECK-NOVSX-NEXT: addi r3, r1, -16
; CHECK-NOVSX-NEXT: lvx v3, 0, r3		; CHECK-NOVSX-NEXT: lvx v3, 0, r3
; CHECK-NOVSX-NEXT: vmrglb v2, v2, v3		; CHECK-NOVSX-NEXT: vmrglb v2, v2, v3
; CHECK-NOVSX-NEXT: blr		; CHECK-NOVSX-NEXT: blr
;		;
; CHECK-P7-LABEL: no_RAUW_in_combine_during_legalize:		; CHECK-P7-LABEL: no_RAUW_in_combine_during_legalize:
; CHECK-P7: # %bb.0: # %entry		; CHECK-P7: # %bb.0: # %entry
; CHECK-P7-NEXT: sldi r4, r4, 2		; CHECK-P7-NEXT: sldi r4, r4, 2
; CHECK-P7-NEXT: addi r5, r1, -16
; CHECK-P7-NEXT: xxlxor v3, v3, v3		; CHECK-P7-NEXT: xxlxor v3, v3, v3
; CHECK-P7-NEXT: lwzx r3, r3, r4		; CHECK-P7-NEXT: lfiwzx f0, r3, r4
; CHECK-P7-NEXT: std r3, -16(r1)		; CHECK-P7-NEXT: xxspltd v2, f0, 0
; CHECK-P7-NEXT: lxvd2x vs0, 0, r5
; CHECK-P7-NEXT: xxswapd v2, vs0
; CHECK-P7-NEXT: vmrglb v2, v3, v2		; CHECK-P7-NEXT: vmrglb v2, v3, v2
; CHECK-P7-NEXT: blr		; CHECK-P7-NEXT: blr
entry:		entry:
%idx.ext = sext i32 %offset to i64		%idx.ext = sext i32 %offset to i64
%add.ptr = getelementptr inbounds i32, i32* %ptr, i64 %idx.ext		%add.ptr = getelementptr inbounds i32, i32* %ptr, i64 %idx.ext
%0 = load i32, i32* %add.ptr, align 4		%0 = load i32, i32* %add.ptr, align 4
%conv = zext i32 %0 to i64		%conv = zext i32 %0 to i64
%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0		%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	entry:
%vecinit30 = shufflevector <8 x i8> %0, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%vecinit30 = shufflevector <8 x i8> %0, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%1 = bitcast <16 x i8> %vecinit30 to <2 x i64>		%1 = bitcast <16 x i8> %vecinit30 to <2 x i64>
ret <2 x i64> %1		ret <2 x i64> %1
}		}

define dso_local void @testByteSplat() #0 {		define dso_local void @testByteSplat() #0 {
; CHECK-P8-LABEL: testByteSplat:		; CHECK-P8-LABEL: testByteSplat:
; CHECK-P8: # %bb.0: # %entry		; CHECK-P8: # %bb.0: # %entry
; CHECK-P8-NEXT: lbz r3, 0(r3)		; CHECK-P8-NEXT: lbzx r3, 0, r3
; CHECK-P8-NEXT: mtvsrd v2, r3		; CHECK-P8-NEXT: mtvsrd v2, r3
; CHECK-P8-NEXT: vspltb v2, v2, 7		; CHECK-P8-NEXT: vspltb v2, v2, 7
; CHECK-P8-NEXT: stvx v2, 0, r3		; CHECK-P8-NEXT: stvx v2, 0, r3
; CHECK-P8-NEXT: blr		; CHECK-P8-NEXT: blr
;		;
; CHECK-P9-LABEL: testByteSplat:		; CHECK-P9-LABEL: testByteSplat:
; CHECK-P9: # %bb.0: # %entry		; CHECK-P9: # %bb.0: # %entry
; CHECK-P9-NEXT: lxsibzx v2, 0, r3		; CHECK-P9-NEXT: lxsibzx v2, 0, r3
Show All 15 Lines
; CHECK-NOVSX-NEXT: addi r3, r1, -16		; CHECK-NOVSX-NEXT: addi r3, r1, -16
; CHECK-NOVSX-NEXT: lvx v2, 0, r3		; CHECK-NOVSX-NEXT: lvx v2, 0, r3
; CHECK-NOVSX-NEXT: vspltb v2, v2, 15		; CHECK-NOVSX-NEXT: vspltb v2, v2, 15
; CHECK-NOVSX-NEXT: stvx v2, 0, r3		; CHECK-NOVSX-NEXT: stvx v2, 0, r3
; CHECK-NOVSX-NEXT: blr		; CHECK-NOVSX-NEXT: blr
;		;
; CHECK-P7-LABEL: testByteSplat:		; CHECK-P7-LABEL: testByteSplat:
; CHECK-P7: # %bb.0: # %entry		; CHECK-P7: # %bb.0: # %entry
; CHECK-P7-NEXT: lbz r3, 0(r3)		; CHECK-P7-NEXT: lvsr v2, 0, r3
; CHECK-P7-NEXT: stb r3, -16(r1)		; CHECK-P7-NEXT: lvx v3, 0, r3
; CHECK-P7-NEXT: addi r3, r1, -16		; CHECK-P7-NEXT: vperm v2, v3, v3, v2
; CHECK-P7-NEXT: lvx v2, 0, r3
; CHECK-P7-NEXT: vspltb v2, v2, 15		; CHECK-P7-NEXT: vspltb v2, v2, 15
; CHECK-P7-NEXT: stvx v2, 0, r3		; CHECK-P7-NEXT: stvx v2, 0, r3
; CHECK-P7-NEXT: blr		; CHECK-P7-NEXT: blr
entry:		entry:
%0 = load i8, i8* undef, align 1		%0 = load i8, i8* undef, align 1
%splat.splatinsert.i = insertelement <16 x i8> poison, i8 %0, i32 0		%splat.splatinsert.i = insertelement <16 x i8> poison, i8 %0, i32 0
%splat.splat.i = shufflevector <16 x i8> %splat.splatinsert.i, <16 x i8> poison, <16 x i32> zeroinitializer		%splat.splat.i = shufflevector <16 x i8> %splat.splatinsert.i, <16 x i8> poison, <16 x i32> zeroinitializer
store <16 x i8> %splat.splat.i, <16 x i8>* undef, align 16		store <16 x i8> %splat.splat.i, <16 x i8>* undef, align 16
ret void		ret void
}		}

declare double @dummy() local_unnamed_addr		declare double @dummy() local_unnamed_addr
attributes #0 = { nounwind }		attributes #0 = { nounwind }

llvm/test/CodeGen/PowerPC/load-and-splat.ll

; P8-NEXT: addi r4, r4, 12		; P8-NEXT: addi r4, r4, 12
; P8-NEXT: lfiwzx f0, 0, r4		; P8-NEXT: lfiwzx f0, 0, r4
; P8-NEXT: xxspltw v2, vs0, 1		; P8-NEXT: xxspltw v2, vs0, 1
; P8-NEXT: stvx v2, 0, r3		; P8-NEXT: stvx v2, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test2:		; P7-LABEL: test2:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lwz r4, 12(r4)		; P7-NEXT: addi r4, r4, 12
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: lfiwzx f0, 0, r4
; P7-NEXT: stw r4, -16(r1)		; P7-NEXT: xxspltw vs0, vs0, 1
; P7-NEXT: lxvw4x vs0, 0, r5
; P7-NEXT: xxspltw vs0, vs0, 0
; P7-NEXT: stxvw4x vs0, 0, r3		; P7-NEXT: stxvw4x vs0, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%arrayidx = getelementptr inbounds float, float* %a, i64 3		%arrayidx = getelementptr inbounds float, float* %a, i64 3
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
%splat.splatinsert.i = insertelement <4 x float> undef, float %0, i32 0		%splat.splatinsert.i = insertelement <4 x float> undef, float %0, i32 0
%splat.splat.i = shufflevector <4 x float> %splat.splatinsert.i, <4 x float> undef, <4 x i32> zeroinitializer		%splat.splat.i = shufflevector <4 x float> %splat.splatinsert.i, <4 x float> undef, <4 x i32> zeroinitializer
store <4 x float> %splat.splat.i, <4 x float>* %c, align 16		store <4 x float> %splat.splat.i, <4 x float>* %c, align 16
Show All 14 Lines
; P8-NEXT: addi r4, r4, 12		; P8-NEXT: addi r4, r4, 12
; P8-NEXT: lfiwzx f0, 0, r4		; P8-NEXT: lfiwzx f0, 0, r4
; P8-NEXT: xxspltw v2, vs0, 1		; P8-NEXT: xxspltw v2, vs0, 1
; P8-NEXT: stvx v2, 0, r3		; P8-NEXT: stvx v2, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test3:		; P7-LABEL: test3:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lwz r4, 12(r4)		; P7-NEXT: addi r4, r4, 12
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: lfiwzx f0, 0, r4
; P7-NEXT: stw r4, -16(r1)		; P7-NEXT: xxspltw vs0, vs0, 1
; P7-NEXT: lxvw4x vs0, 0, r5
; P7-NEXT: xxspltw vs0, vs0, 0
; P7-NEXT: stxvw4x vs0, 0, r3		; P7-NEXT: stxvw4x vs0, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%arrayidx = getelementptr inbounds i32, i32* %a, i64 3		%arrayidx = getelementptr inbounds i32, i32* %a, i64 3
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%splat.splatinsert.i = insertelement <4 x i32> undef, i32 %0, i32 0		%splat.splatinsert.i = insertelement <4 x i32> undef, i32 %0, i32 0
%splat.splat.i = shufflevector <4 x i32> %splat.splatinsert.i, <4 x i32> undef, <4 x i32> zeroinitializer		%splat.splat.i = shufflevector <4 x i32> %splat.splatinsert.i, <4 x i32> undef, <4 x i32> zeroinitializer
store <4 x i32> %splat.splat.i, <4 x i32>* %c, align 16		store <4 x i32> %splat.splat.i, <4 x i32>* %c, align 16
ret void		ret void
}		}


; v2i64		; v2i64
define dso_local void @test4(<2 x i64>* nocapture %c, i64* nocapture readonly %a) local_unnamed_addr {		define dso_local void @test4(<2 x i64>* nocapture %c, i64* nocapture readonly %a) local_unnamed_addr {
; P9-LABEL: test4:		; P9-LABEL: test4:
; P9: # %bb.0: # %entry		; P9: # %bb.0: # %entry
; P9-NEXT: addi r4, r4, 24		; P9-NEXT: addi r4, r4, 24
; P9-NEXT: lxvdsx vs0, 0, r4		; P9-NEXT: lxvdsx vs0, 0, r4
; P9-NEXT: stxv vs0, 0(r3)		; P9-NEXT: stxv vs0, 0(r3)
; P9-NEXT: blr		; P9-NEXT: blr
Show All 20 Lines	entry:
ret void		ret void
}		}

; sext v2i64		; sext v2i64
define void @test5(<2 x i64>* %a, i32* %in) {		define void @test5(<2 x i64>* %a, i32* %in) {
; P9-LABEL: test5:		; P9-LABEL: test5:
; P9: # %bb.0: # %entry		; P9: # %bb.0: # %entry
; P9-NEXT: lfiwax f0, 0, r4		; P9-NEXT: lfiwax f0, 0, r4
; P9-NEXT: xxspltd vs0, vs0, 0		; P9-NEXT: xxspltd vs0, f0, 0
; P9-NEXT: stxv vs0, 0(r3)		; P9-NEXT: stxv vs0, 0(r3)
; P9-NEXT: blr		; P9-NEXT: blr
;		;
; P8-LABEL: test5:		; P8-LABEL: test5:
; P8: # %bb.0: # %entry		; P8: # %bb.0: # %entry
; P8-NEXT: lfiwax f0, 0, r4		; P8-NEXT: lfiwax f0, 0, r4
; P8-NEXT: xxspltd vs0, vs0, 0		; P8-NEXT: xxspltd vs0, f0, 0
; P8-NEXT: stxvd2x vs0, 0, r3		; P8-NEXT: stxvd2x vs0, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test5:		; P7-LABEL: test5:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lwa r4, 0(r4)		; P7-NEXT: lfiwax f0, 0, r4
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: xxspltd vs0, f0, 0
; P7-NEXT: std r4, -8(r1)
; P7-NEXT: std r4, -16(r1)
; P7-NEXT: lxvd2x vs0, 0, r5
; P7-NEXT: stxvd2x vs0, 0, r3		; P7-NEXT: stxvd2x vs0, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%0 = load i32, i32* %in, align 4		%0 = load i32, i32* %in, align 4
%conv = sext i32 %0 to i64		%conv = sext i32 %0 to i64
%splat.splatinsert.i = insertelement <2 x i64> poison, i64 %conv, i32 0		%splat.splatinsert.i = insertelement <2 x i64> poison, i64 %conv, i32 0
%splat.splat.i = shufflevector <2 x i64> %splat.splatinsert.i, <2 x i64> poison, <2 x i32> zeroinitializer		%splat.splat.i = shufflevector <2 x i64> %splat.splatinsert.i, <2 x i64> poison, <2 x i32> zeroinitializer
store <2 x i64> %splat.splat.i, <2 x i64>* %a, align 16		store <2 x i64> %splat.splat.i, <2 x i64>* %a, align 16
ret void		ret void
}		}

; zext v2i64		; zext v2i64
define void @test6(<2 x i64>* %a, i32* %in) {		define void @test6(<2 x i64>* %a, i32* %in) {
; P9-LABEL: test6:		; P9-LABEL: test6:
; P9: # %bb.0: # %entry		; P9: # %bb.0: # %entry
; P9-NEXT: lfiwzx f0, 0, r4		; P9-NEXT: lfiwzx f0, 0, r4
; P9-NEXT: xxspltd vs0, vs0, 0		; P9-NEXT: xxspltd vs0, f0, 0
; P9-NEXT: stxv vs0, 0(r3)		; P9-NEXT: stxv vs0, 0(r3)
; P9-NEXT: blr		; P9-NEXT: blr
;		;
; P8-LABEL: test6:		; P8-LABEL: test6:
; P8: # %bb.0: # %entry		; P8: # %bb.0: # %entry
; P8-NEXT: lfiwzx f0, 0, r4		; P8-NEXT: lfiwzx f0, 0, r4
; P8-NEXT: xxspltd vs0, vs0, 0		; P8-NEXT: xxspltd vs0, f0, 0
; P8-NEXT: stxvd2x vs0, 0, r3		; P8-NEXT: stxvd2x vs0, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test6:		; P7-LABEL: test6:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lwz r4, 0(r4)		; P7-NEXT: lfiwzx f0, 0, r4
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: xxspltd vs0, f0, 0
; P7-NEXT: std r4, -8(r1)
; P7-NEXT: std r4, -16(r1)
; P7-NEXT: lxvd2x vs0, 0, r5
; P7-NEXT: stxvd2x vs0, 0, r3		; P7-NEXT: stxvd2x vs0, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%0 = load i32, i32* %in, align 4		%0 = load i32, i32* %in, align 4
%conv = zext i32 %0 to i64		%conv = zext i32 %0 to i64
%splat.splatinsert.i = insertelement <2 x i64> poison, i64 %conv, i32 0		%splat.splatinsert.i = insertelement <2 x i64> poison, i64 %conv, i32 0
%splat.splat.i = shufflevector <2 x i64> %splat.splatinsert.i, <2 x i64> poison, <2 x i32> zeroinitializer		%splat.splat.i = shufflevector <2 x i64> %splat.splatinsert.i, <2 x i64> poison, <2 x i32> zeroinitializer
store <2 x i64> %splat.splat.i, <2 x i64>* %a, align 16		store <2 x i64> %splat.splat.i, <2 x i64>* %a, align 16
ret void		ret void
}		}
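
For readers comparing `test5` and `test6` above, the only semantic difference between the two splat loads is how the 32-bit value is widened before it is duplicated. A small sketch of that difference follows; `loadSplatSext` and `loadSplatZext` are hypothetical helpers that model the IR, not the generated code.

```cpp
#include <array>
#include <cstdint>

// Model of test5: load i32, sign-extend to i64, splat into <2 x i64>.
std::array<std::int64_t, 2> loadSplatSext(const std::int32_t *p) {
  std::int64_t v = *p; // sign extension
  return {v, v};
}

// Model of test6: load i32, zero-extend to i64, splat into <2 x i64>.
std::array<std::uint64_t, 2> loadSplatZext(const std::int32_t *p) {
  std::uint64_t v = static_cast<std::uint32_t>(*p); // zero extension
  return {v, v};
}
```

For an input of -1, the first helper yields two lanes of -1 and the second yields two lanes of `0x00000000FFFFFFFF`, which matches the `lfiwax` and `lfiwzx` choice in the check lines.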

; v8i16		; v8i16
define void @test7(<8 x i16>* %a, i16* %in) {		define void @test7(<8 x i16>* %a, i16* %in) {
; P9-LABEL: test7:		; P9-LABEL: test7:
; P9: # %bb.0: # %entry		; P9: # %bb.0: # %entry
; P9-NEXT: lxsihzx v2, 0, r4		; P9-NEXT: lxsihzx v2, 0, r4
; P9-NEXT: vsplth v2, v2, 3		; P9-NEXT: vsplth v2, v2, 3
; P9-NEXT: stxv v2, 0(r3)		; P9-NEXT: stxv v2, 0(r3)
; P9-NEXT: blr		; P9-NEXT: blr
;		;
; P8-LABEL: test7:		; P8-LABEL: test7:
; P8: # %bb.0: # %entry		; P8: # %bb.0: # %entry
; P8-NEXT: lhz r4, 0(r4)		; P8-NEXT: lhzx r4, 0, r4
; P8-NEXT: mtvsrd v2, r4		; P8-NEXT: mtvsrd v2, r4
; P8-NEXT: vsplth v2, v2, 3		; P8-NEXT: vsplth v2, v2, 3
; P8-NEXT: stvx v2, 0, r3		; P8-NEXT: stvx v2, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test7:		; P7-LABEL: test7:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lhz r4, 0(r4)		; P7-NEXT: li r5, 1
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: lvx v2, 0, r4
nemanjai (Done): Definitely bad. P7 doesn't have direct moves.
shchenz (author, Done): In this case we now use one more instruction than the code on the left. I still think it is a win, because we no longer use the stack, which helps some optimizations such as those for leaf calls, and it uses fewer memory operations (2 vs 3).
; P7-NEXT: sth r4, -16(r1)		; P7-NEXT: lvsl v4, 0, r4
; P7-NEXT: lxvw4x v2, 0, r5		; P7-NEXT: lvx v3, r5, r4
		; P7-NEXT: vperm v2, v2, v3, v4
; P7-NEXT: vsplth v2, v2, 0		; P7-NEXT: vsplth v2, v2, 0
; P7-NEXT: stxvw4x v2, 0, r3		; P7-NEXT: stxvw4x v2, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%0 = load i16, i16* %in, align 2		%0 = load i16, i16* %in, align 2
%splat.splatinsert.i = insertelement <8 x i16> poison, i16 %0, i32 0		%splat.splatinsert.i = insertelement <8 x i16> poison, i16 %0, i32 0
%splat.splat.i = shufflevector <8 x i16> %splat.splatinsert.i, <8 x i16> poison, <8 x i32> zeroinitializer		%splat.splat.i = shufflevector <8 x i16> %splat.splatinsert.i, <8 x i16> poison, <8 x i32> zeroinitializer
store <8 x i16> %splat.splat.i, <8 x i16>* %a, align 16		store <8 x i16> %splat.splat.i, <8 x i16>* %a, align 16
ret void		ret void
}		}
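
The new P7 sequence for `test7` above (`lvx`, `lvsl`, a second `lvx` at offset 1, then `vperm`) can be modelled on the host to see why it fetches the right halfword without going through the stack. This is a sketch under stated assumptions: big-endian element numbering, `lvx` ignoring the low four address bits, `lvsl` producing the control bytes `sh, sh+1, ..., sh+15`, and `vperm` indexing into the 32-byte concatenation of its two sources. All function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::array<std::uint8_t, 16>;

// Model of lvx: load the 16-byte block containing addr (low 4 bits ignored).
Block loadAligned(const std::vector<std::uint8_t> &mem, std::size_t addr) {
  std::size_t base = addr & ~static_cast<std::size_t>(15);
  assert(base + 16 <= mem.size());
  Block b;
  for (std::size_t i = 0; i < 16; ++i)
    b[i] = mem[base + i];
  return b;
}

// Model of lvsl: control vector {sh, sh+1, ..., sh+15} with sh = addr & 15.
Block shiftLeftControl(std::size_t addr) {
  Block c;
  std::size_t sh = addr & 15;
  for (std::size_t i = 0; i < 16; ++i)
    c[i] = static_cast<std::uint8_t>(sh + i);
  return c;
}

// Model of vperm: result byte i is byte (ctl[i] & 31) of lo followed by hi.
Block permute(const Block &lo, const Block &hi, const Block &ctl) {
  Block r;
  for (std::size_t i = 0; i < 16; ++i) {
    std::size_t idx = ctl[i] & 31;
    r[i] = idx < 16 ? lo[idx] : hi[idx - 16];
  }
  return r;
}

// Halfword at addr, ending up in element 0 of the permuted vector.
std::uint16_t loadHalfwordBE(const std::vector<std::uint8_t> &mem,
                             std::size_t addr) {
  Block lo = loadAligned(mem, addr);
  Block hi = loadAligned(mem, addr + 1); // differs from lo only at a boundary
  Block r = permute(lo, hi, shiftLeftControl(addr));
  return static_cast<std::uint16_t>((r[0] << 8) | r[1]);
}
```

The second aligned load only matters when the halfword straddles a 16-byte boundary (`addr & 15 == 15`); otherwise both loads return the same block. In either case element 0 of the permuted vector holds the requested halfword, which the final `vsplth v2, v2, 0` then replicates.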

; v16i8		; v16i8
define void @test8(<16 x i8>* %a, i8* %in) {		define void @test8(<16 x i8>* %a, i8* %in) {
; P9-LABEL: test8:		; P9-LABEL: test8:
; P9: # %bb.0: # %entry		; P9: # %bb.0: # %entry
; P9-NEXT: lxsibzx v2, 0, r4		; P9-NEXT: lxsibzx v2, 0, r4
; P9-NEXT: vspltb v2, v2, 7		; P9-NEXT: vspltb v2, v2, 7
; P9-NEXT: stxv v2, 0(r3)		; P9-NEXT: stxv v2, 0(r3)
; P9-NEXT: blr		; P9-NEXT: blr
;		;
; P8-LABEL: test8:		; P8-LABEL: test8:
; P8: # %bb.0: # %entry		; P8: # %bb.0: # %entry
; P8-NEXT: lbz r4, 0(r4)		; P8-NEXT: lbzx r4, 0, r4
; P8-NEXT: mtvsrd v2, r4		; P8-NEXT: mtvsrd v2, r4
; P8-NEXT: vspltb v2, v2, 7		; P8-NEXT: vspltb v2, v2, 7
; P8-NEXT: stvx v2, 0, r3		; P8-NEXT: stvx v2, 0, r3
; P8-NEXT: blr		; P8-NEXT: blr
;		;
; P7-LABEL: test8:		; P7-LABEL: test8:
; P7: # %bb.0: # %entry		; P7: # %bb.0: # %entry
; P7-NEXT: lbz r4, 0(r4)		; P7-NEXT: lvsl v2, 0, r4
; P7-NEXT: addi r5, r1, -16		; P7-NEXT: lvx v3, 0, r4
; P7-NEXT: stb r4, -16(r1)		; P7-NEXT: vperm v2, v3, v3, v2
; P7-NEXT: lxvw4x v2, 0, r5
; P7-NEXT: vspltb v2, v2, 0		; P7-NEXT: vspltb v2, v2, 0
; P7-NEXT: stxvw4x v2, 0, r3		; P7-NEXT: stxvw4x v2, 0, r3
; P7-NEXT: blr		; P7-NEXT: blr
entry:		entry:
%0 = load i8, i8* %in, align 1		%0 = load i8, i8* %in, align 1
%splat.splatinsert.i = insertelement <16 x i8> poison, i8 %0, i32 0		%splat.splatinsert.i = insertelement <16 x i8> poison, i8 %0, i32 0
%splat.splat.i = shufflevector <16 x i8> %splat.splatinsert.i, <16 x i8> poison, <16 x i32> zeroinitializer		%splat.splat.i = shufflevector <16 x i8> %splat.splatinsert.i, <16 x i8> poison, <16 x i32> zeroinitializer
store <16 x i8> %splat.splat.i, <16 x i8>* %a, align 16		store <16 x i8> %splat.splat.i, <16 x i8>* %a, align 16

llvm/test/CodeGen/PowerPC/scalar_vector_test_3.ll


; Function Attrs: norecurse nounwind readonly		; Function Attrs: norecurse nounwind readonly
define <2 x i64> @s2v_test1(i32* nocapture readonly %int32, <2 x i64> %vec) {		define <2 x i64> @s2v_test1(i32* nocapture readonly %int32, <2 x i64> %vec) {
; P9LE-LABEL: s2v_test1:		; P9LE-LABEL: s2v_test1:
; P9LE: # %bb.0: # %entry		; P9LE: # %bb.0: # %entry
; P9LE-NEXT: lfiwax f0, 0, r3		; P9LE-NEXT: lfiwax f0, 0, r3
; P9LE-NEXT: xxmrghd v2, v2, vs0		; P9LE-NEXT: xxmrghd v2, v2, vs0
; P9LE-NEXT: blr		; P9LE-NEXT: blr
;		;
jsji (Done): Unrelated changes?
shchenz (author, Done): It is also for the weird change: `xxspltd v2, f0`
jsji (Not Done): I meant the `;`. Anyway, I have committed the changes in bd932f7499ff7ab958f5bc2f55dcf4b06cd87950, so you should be able to rebase.
; P9BE-LABEL: s2v_test1:		; P9BE-LABEL: s2v_test1:
; P9BE: # %bb.0: # %entry		; P9BE: # %bb.0: # %entry
; P9BE-NEXT: lfiwax f0, 0, r3		; P9BE-NEXT: lfiwax f0, 0, r3
; P9BE-NEXT: xxpermdi v2, vs0, v2, 1		; P9BE-NEXT: xxpermdi v2, vs0, v2, 1
; P9BE-NEXT: blr		; P9BE-NEXT: blr
;		;
; P8LE-LABEL: s2v_test1:		; P8LE-LABEL: s2v_test1:
; P8LE: # %bb.0: # %entry		; P8LE: # %bb.0: # %entry
▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines	entry:
ret <2 x i64> %vecins		ret <2 x i64> %vecins
}		}

; Function Attrs: norecurse nounwind readonly		; Function Attrs: norecurse nounwind readonly
define <2 x i64> @s2v_test6(i32* nocapture readonly %ptr) {		define <2 x i64> @s2v_test6(i32* nocapture readonly %ptr) {
; P9LE-LABEL: s2v_test6:		; P9LE-LABEL: s2v_test6:
; P9LE: # %bb.0: # %entry		; P9LE: # %bb.0: # %entry
; P9LE-NEXT: lfiwax f0, 0, r3		; P9LE-NEXT: lfiwax f0, 0, r3
; P9LE-NEXT: xxspltd v2, vs0, 0		; P9LE-NEXT: xxspltd v2, f0, 0
; P9LE-NEXT: blr		; P9LE-NEXT: blr
;		;
; P9BE-LABEL: s2v_test6:		; P9BE-LABEL: s2v_test6:
; P9BE: # %bb.0: # %entry		; P9BE: # %bb.0: # %entry
; P9BE-NEXT: lfiwax f0, 0, r3		; P9BE-NEXT: lfiwax f0, 0, r3
; P9BE-NEXT: xxspltd v2, vs0, 0		; P9BE-NEXT: xxspltd v2, f0, 0
; P9BE-NEXT: blr		; P9BE-NEXT: blr
;		;
; P8LE-LABEL: s2v_test6:		; P8LE-LABEL: s2v_test6:
; P8LE: # %bb.0: # %entry		; P8LE: # %bb.0: # %entry
; P8LE-NEXT: lfiwax f0, 0, r3		; P8LE-NEXT: lfiwax f0, 0, r3
; P8LE-NEXT: xxspltd v2, vs0, 0		; P8LE-NEXT: xxspltd v2, f0, 0
; P8LE-NEXT: blr		; P8LE-NEXT: blr
;		;
; P8BE-LABEL: s2v_test6:		; P8BE-LABEL: s2v_test6:
; P8BE: # %bb.0: # %entry		; P8BE: # %bb.0: # %entry
; P8BE-NEXT: lfiwax f0, 0, r3		; P8BE-NEXT: lfiwax f0, 0, r3
; P8BE-NEXT: xxspltd v2, vs0, 0		; P8BE-NEXT: xxspltd v2, f0, 0
; P8BE-NEXT: blr		; P8BE-NEXT: blr



entry:		entry:
%0 = load i32, i32* %ptr, align 4		%0 = load i32, i32* %ptr, align 4
%conv = sext i32 %0 to i64		%conv = sext i32 %0 to i64
%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0		%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0
%splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer		%splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
ret <2 x i64> %splat.splat		ret <2 x i64> %splat.splat
}		}

; Function Attrs: norecurse nounwind readonly		; Function Attrs: norecurse nounwind readonly
define <2 x i64> @s2v_test7(i32* nocapture readonly %ptr) {		define <2 x i64> @s2v_test7(i32* nocapture readonly %ptr) {
; P9LE-LABEL: s2v_test7:		; P9LE-LABEL: s2v_test7:
; P9LE: # %bb.0: # %entry		; P9LE: # %bb.0: # %entry
; P9LE-NEXT: lfiwax f0, 0, r3		; P9LE-NEXT: lfiwax f0, 0, r3
; P9LE-NEXT: xxspltd v2, vs0, 0		; P9LE-NEXT: xxspltd v2, f0, 0
; P9LE-NEXT: blr		; P9LE-NEXT: blr
;		;
; P9BE-LABEL: s2v_test7:		; P9BE-LABEL: s2v_test7:
; P9BE: # %bb.0: # %entry		; P9BE: # %bb.0: # %entry
; P9BE-NEXT: lfiwax f0, 0, r3		; P9BE-NEXT: lfiwax f0, 0, r3
; P9BE-NEXT: xxspltd v2, vs0, 0		; P9BE-NEXT: xxspltd v2, f0, 0
; P9BE-NEXT: blr		; P9BE-NEXT: blr
;		;
; P8LE-LABEL: s2v_test7:		; P8LE-LABEL: s2v_test7:
; P8LE: # %bb.0: # %entry		; P8LE: # %bb.0: # %entry
; P8LE-NEXT: lfiwax f0, 0, r3		; P8LE-NEXT: lfiwax f0, 0, r3
; P8LE-NEXT: xxspltd v2, vs0, 0		; P8LE-NEXT: xxspltd v2, f0, 0
; P8LE-NEXT: blr		; P8LE-NEXT: blr
;		;
; P8BE-LABEL: s2v_test7:		; P8BE-LABEL: s2v_test7:
; P8BE: # %bb.0: # %entry		; P8BE: # %bb.0: # %entry
; P8BE-NEXT: lfiwax f0, 0, r3		; P8BE-NEXT: lfiwax f0, 0, r3
; P8BE-NEXT: xxspltd v2, vs0, 0		; P8BE-NEXT: xxspltd v2, f0, 0
; P8BE-NEXT: blr		; P8BE-NEXT: blr



entry:		entry:
%0 = load i32, i32* %ptr, align 4		%0 = load i32, i32* %ptr, align 4
%conv = sext i32 %0 to i64		%conv = sext i32 %0 to i64
%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0		%splat.splatinsert = insertelement <2 x i64> undef, i64 %conv, i32 0
%splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer		%splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
ret <2 x i64> %splat.splat		ret <2 x i64> %splat.splat
}		}


[PowerPC] handle more splat loads (Closed, Public)
