This is an archive of the discontinued LLVM Phabricator instance.

Differential D57857

[PowerPC] custom lower `v2f64 fpext v2f32`
Closed, Public

Authored by lei on Feb 6 2019, 3:46 PM.

Details

Reviewers

power-llvm-team
hfinkel
echristo
saghir
nemanjai

Commits

rZORGacad7dc83e51: [PowerPC] custom lower `v2f64 fpext v2f32`
rZORG228ae25f5f60: [PowerPC] custom lower `v2f64 fpext v2f32`
rGacad7dc83e51: [PowerPC] custom lower `v2f64 fpext v2f32`
rG228ae25f5f60: [PowerPC] custom lower `v2f64 fpext v2f32`
rG1ac6e9636c9e: [PowerPC] custom lower `v2f64 fpext v2f32`
rL360429: [PowerPC] custom lower `v2f64 fpext v2f32`

Summary

Reduces scalarization overhead via custom lowering of `v2f64 fpext v2f32`.

E.g., for the following IR:

%0 = load <2 x float>, <2 x float>* %Ptr, align 8
%1 = fpext <2 x float> %0 to <2 x double>
ret <2 x double> %1

Before custom lowering:

ld r3, 0(r3)
mtvsrd f0, r3
xxswapd vs34, vs0
xscvspdpn f0, vs0
xxsldwi vs1, vs34, vs34, 3
xscvspdpn f1, vs1
xxmrghd vs34, vs0, vs1

After custom lowering:

lfd f0, 0(r3)
xxmrghw vs0, vs0, vs0
xvcvspdp vs34, vs0
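
To see why the shorter sequence still converts both lanes, here is a minimal sketch that models the two vector instructions on plain arrays. It is only an informal model, not LLVM code: the `Words` and `Doubles` types and all function names are invented for illustration, elements are numbered in ISA (big-endian) order, and the 8-byte load is assumed to have placed the two floats in words 0 and 1.

```cpp
#include <array>
#include <cassert>

// Informal model of a 128-bit VSX register as four single-precision words,
// indexed in ISA (big-endian) element order. Hypothetical types.
using Words = std::array<float, 4>;
using Doubles = std::array<double, 2>;

// xxmrghw XT, XA, XB: interleave words 0 and 1 of XA and XB.
Words xxmrghw(const Words &a, const Words &b) {
  return {a[0], b[0], a[1], b[1]};
}

// xvcvspdp XT, XB: convert words 0 and 2 of XB to double precision.
Doubles xvcvspdp(const Words &b) {
  return {static_cast<double>(b[0]), static_cast<double>(b[2])};
}

// Model of the whole sequence. Assumption: the load filled words 0 and 1;
// words 2 and 3 are don't-care and are set to zero here.
Doubles extendLoadedPair(float w0, float w1) {
  Words loaded = {w0, w1, 0.0f, 0.0f};
  Words merged = xxmrghw(loaded, loaded); // {w0, w0, w1, w1}
  return xvcvspdp(merged);                // {(double)w0, (double)w1}
}

// Usage sketch: both loaded values survive the merge and are widened exactly.
void example() {
  Doubles d = extendLoadedPair(1.5f, -2.0f);
  assert(d[0] == 1.5 && d[1] == -2.0);
}
```

The merge duplicates each loaded word so that both source values land in the even word positions that `xvcvspdp` reads.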

SPEC 2017 improvements:

parest by 1.16%
blender by 1.24%

SPEC 2006 improvements:

mcf by 2%
xalancbmk by 1.29%

Diff Detail

Repository: rL LLVM

Event Timeline

lei created this revision. Feb 6 2019, 3:46 PM

Herald added a project: Restricted Project. Feb 6 2019, 3:46 PM

Herald added subscribers: jsji, kbarton, hiraditya, nemanjai.

saghir requested changes to this revision. Feb 7 2019, 2:30 PM

saghir added a subscriber: saghir.

saghir added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9526 ↗	(On Diff #185655)	Formatting
9544 ↗	(On Diff #185655)	Minor: `DAG` instead of `dag`.
llvm/lib/Target/PowerPC/PPCISelLowering.h
407 ↗	(On Diff #185655)	`expand` maybe?
llvm/lib/Target/PowerPC/PPCInstrVSX.td
1089 ↗	(On Diff #185655)	You may get rid of this new line.
llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
7 ↗	(On Diff #185655)	Do we need `RUN` on all these lines?

This revision now requires changes to proceed. Feb 7 2019, 2:30 PM

saghir added inline comments. Feb 7 2019, 5:51 PM

llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
7 ↗	(On Diff #185655)	Never mind - please ignore this comment.

lei requested review of this revision. Feb 13 2019, 5:21 AM

amyk added a subscriber: amyk. Feb 13 2019, 9:45 AM

amyk added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9553 ↗	(On Diff #185655)	Maybe change the spacing of `newOp=DAG...` to `newOp = DAG...`?
llvm/lib/Target/PowerPC/PPCISelLowering.h
407 ↗	(On Diff #185655)	I think maybe the correct word is `extend` since the line below is `FP_EXTEND_LHW`?

Address review comments.

LGTM

This revision is now accepted and ready to land. Feb 13 2019, 12:27 PM

I think approaching the problem this way is unnecessarily limited - and I say that knowing that I may have suggested a similar approach.
However, what we are actually looking to do is to convert a DAG such as:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
# arbitrary operations on v2f32
tN: v2f64 = fp_extend...

into:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
t3: v2f64 = fp_extend t1
t4: v2f64 = fp_extend t2
# widen all operations to v2f64

And then all we need is a custom load of two float values into a vector of two double values. I think the best way to do that would be to combine any occurrences of
(v2f64 fp_extend (v2f32 op (v2f32 op_input1), (v2f32 op_input2)...)) (i.e. as long as all its inputs are of type v2f32)
into
(v2f64 op (fp_extend op_input1) [, (fp_extend op_input2)...])
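
The combine described above can be illustrated on a toy expression tree. This is a minimal sketch under stated assumptions: `Node`, `make`, `pushExtendToInputs`, and `print` are hypothetical helpers and have nothing to do with `llvm::SDNode` or the real DAG combiner. It only rewrites the tree shape and says nothing about whether the widened arithmetic rounds identically.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical, not llvm::SDNode).
struct Node {
  std::string op;                          // "load", "fadd", "fp_extend", ...
  std::string type;                        // "v2f32" or "v2f64"
  std::vector<std::shared_ptr<Node>> args; // operands, empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(std::string op, std::string type, std::vector<NodePtr> args = {}) {
  return std::make_shared<Node>(
      Node{std::move(op), std::move(type), std::move(args)});
}

// Rewrites (v2f64 fp_extend (v2f32 op a, b, ...)) into
// (v2f64 op (fp_extend a), (fp_extend b), ...) when every input is v2f32.
// Loads are left alone, so the extend ends up sitting directly on them.
NodePtr pushExtendToInputs(const NodePtr &n) {
  if (n->op != "fp_extend" || n->type != "v2f64")
    return n;
  assert(n->args.size() == 1 && "fp_extend takes exactly one operand");
  const NodePtr &inner = n->args[0];
  if (inner->op == "load" || inner->args.empty())
    return n;
  for (const NodePtr &a : inner->args)
    if (a->type != "v2f32")
      return n;
  std::vector<NodePtr> widened;
  for (const NodePtr &a : inner->args)
    widened.push_back(pushExtendToInputs(make("fp_extend", "v2f64", {a})));
  return make(inner->op, "v2f64", widened);
}

// Prints a node as (type op operands...).
std::string print(const NodePtr &n) {
  std::string s = "(" + n->type + " " + n->op;
  for (const NodePtr &a : n->args)
    s += " " + print(a);
  return s + ")";
}
```

Applied to an `fp_extend` of an `fadd` of two loads, the sketch yields an `fadd` on `v2f64` whose operands are each an `fp_extend` of a load, which is the shape a custom extending load would then have to match.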

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9524 ↗	(On Diff #186710)	`// We only want to custom lower an extend from v2f32 to v2f64.` Talking about return values or parameters on SD Nodes seems quite unnatural.
9544 ↗	(On Diff #186710)	You are not generating a new DAG, just a new node.
9568 ↗	(On Diff #186710)	It is obvious that an `llvm_unreachable` should not be reached. A more descriptive comment is desired. This is equivalent to emitting a message when the compiler crashes to say "Compiler crashed".
llvm/lib/Target/PowerPC/PPCISelLowering.h
408 ↗	(On Diff #186710)	I don't think you should use the abbreviation `LHW` in these as that seems to suggest a "half-word" and that is not the case. This is the "low half" of a VSR (which is itself a doubleword). I believe `FP_EXTEND_LH` and `LD_VSX_LH` should be adequate (the name of the node as you have it makes it seem as if it mimics an ISA mnemonic, which it does not).

Updated comments and renamed the new PPC ISD nodes.

I tried the suggested approach of fp-extending v2f32 loads to v2f64 and then adding a custom load of two float values into a vector of two double values. However, this generated extra instructions for each load, and none of the performance gains seen with the original approach remained. With the new approach, performance changes are negligible for all SPEC 2017 benchmarks except omnetpp, which showed a 1.7% degradation.

lei requested review of this revision. Apr 8 2019, 11:56 AM

nemanjai added a reviewer: nemanjai. May 2 2019, 8:52 AM

Other than a few minor nits that can be addressed on the commit, LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9599 ↗	(On Diff #194183)	In a subsequent patch, we should probably add FP unary operations here as well.
9619 ↗	(On Diff #194183)	nit: naming convention (`NewOp`).
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3307 ↗	(On Diff #194183)	These should copy into `VSRC` rather than `VRRC` so as to avoid unnecessary copies. It is very important to get these into the right register class as the extra copies will certainly reduce the performance impact of this transformation.

This revision is now accepted and ready to land. May 3 2019, 5:51 AM

Closed by commit rL360429: [PowerPC] custom lower `v2f64 fpext v2f32` (authored by lei). May 10 2019, 7:02 AM

This revision was automatically updated to reflect the committed changes.

lei marked 2 inline comments as done.

Revision Contents

Path	Size
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h	8 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp	57 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td	19 lines
llvm/trunk/test/CodeGen/PowerPC/reduce_scalarization.ll	77 lines

Diff 199013

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

...	enum NodeType : unsigned {

/// QVESPLATI = This corresponds to the QPX qvesplati instruction.
QVESPLATI,

/// QBFLT = Access the underlying QPX floating-point boolean
/// representation.
QBFLT,

		/// Custom extend v4f32 to v2f64.
		FP_EXTEND_LH,

/// CHAIN = STBRX CHAIN, GPRC, Ptr, Type - This is a
/// byte-swapping store instruction. It byte-swaps the low "Type" bits of
/// the GPRC input, then stores it through Ptr. Type can be either i16 or
/// i32.
STBRX = ISD::FIRST_TARGET_MEMORY_OPCODE,

/// GPRC, CHAIN = LBRX CHAIN, Ptr, Type - This is a
/// byte-swapping load instruction. It loads "Type" bits, byte swaps it,
...	enum NodeType : unsigned {
/// followed by a byte-width for the store.
STXSIX,

/// VSRC, CHAIN = LXVD2X_LE CHAIN, Ptr - Occurs only for little endian.
/// Maps directly to an lxvd2x instruction that will be followed by
/// an xxswapd.
LXVD2X,

		/// VSRC, CHAIN = LD_VSX_LH CHAIN, Ptr - This is a floating-point load of a
		/// v2f32 value into the lower half of a VSR register.
		LD_VSX_LH,

/// CHAIN = STXVD2X CHAIN, VSRC, Ptr - Occurs only for little endian.
/// Maps directly to an stxvd2x instruction that will be preceded by
/// an xxswapd.
STXVD2X,

/// Store scalar integers from VSR.
ST_VSR_SCAL_INT,

...	private:
SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBSWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerVectorLoad(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorStore(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
const SDLoc &dl, SelectionDAG &DAG,
...

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp

...	if (Subtarget.hasP9Vector()) {
setOperationAction(ISD::BITCAST, MVT::i128, Custom);
// No implementation for these ops for PowerPC.
setOperationAction(ISD::FSIN , MVT::f128, Expand);
setOperationAction(ISD::FCOS , MVT::f128, Expand);
setOperationAction(ISD::FPOW, MVT::f128, Expand);
setOperationAction(ISD::FPOWI, MVT::f128, Expand);
setOperationAction(ISD::FREM, MVT::f128, Expand);
}
		setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom);

}

if (Subtarget.hasP9Altivec()) {
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v8i16, Custom);
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v16i8, Custom);
}
}
...	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::QVFPERM: return "PPCISD::QVFPERM";
case PPCISD::QVGPCI: return "PPCISD::QVGPCI";
case PPCISD::QVALIGNI: return "PPCISD::QVALIGNI";
case PPCISD::QVESPLATI: return "PPCISD::QVESPLATI";
case PPCISD::QBFLT: return "PPCISD::QBFLT";
case PPCISD::QVLFSb: return "PPCISD::QVLFSb";
case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";
case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";
		case PPCISD::LD_VSX_LH: return "PPCISD::LD_VSX_LH";
		case PPCISD::FP_EXTEND_LH: return "PPCISD::FP_EXTEND_LH";
}
return nullptr;
}

EVT PPCTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &C,
EVT VT) const {
if (!VT.isVector())
return Subtarget.useCRBits() ? MVT::i1 : MVT::i32;
...	SDValue PPCTargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const {
else if (VT == MVT::v8i16)
BifID = Intrinsic::ppc_altivec_vmaxsh;
else if (VT == MVT::v16i8)
BifID = Intrinsic::ppc_altivec_vmaxsb;

return BuildIntrinsicOp(BifID, X, Y, DAG, dl, VT);
}

		// Custom lowering for fpext v2f32 to v2f64.
		SDValue PPCTargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {

		  assert(Op.getOpcode() == ISD::FP_EXTEND &&
		         "Should only be called for ISD::FP_EXTEND");

		  // We only want to custom lower an extend from v2f32 to v2f64.
		  if (Op.getValueType() != MVT::v2f64 ||
		      Op.getOperand(0).getValueType() != MVT::v2f32)
		    return SDValue();

		  SDLoc dl(Op);
		  SDValue Op0 = Op.getOperand(0);

		  switch (Op0.getOpcode()) {
		  default:
		    return SDValue();
		  case ISD::FADD:
		  case ISD::FMUL:
		  case ISD::FSUB: {
		    SDValue NewLoad[2];
		    for (unsigned i = 0, ie = Op0.getNumOperands(); i != ie; ++i) {
		      // Ensure both inputs are loads.
		      SDValue LdOp = Op0.getOperand(i);
		      if (LdOp.getOpcode() != ISD::LOAD)
		        return SDValue();
		      // Generate new load node.
		      LoadSDNode *LD = cast<LoadSDNode>(LdOp);
		      SDValue LoadOps[] = { LD->getChain(), LD->getBasePtr() };
		      NewLoad[i] =
		        DAG.getMemIntrinsicNode(PPCISD::LD_VSX_LH, dl,
		                                DAG.getVTList(MVT::v4f32, MVT::Other),
		                                LoadOps, LD->getMemoryVT(),
		                                LD->getMemOperand());
		    }
		    SDValue NewOp = DAG.getNode(Op0.getOpcode(), SDLoc(Op0), MVT::v4f32,
		                                NewLoad[0], NewLoad[1],
		                                Op0.getNode()->getFlags());
		    return DAG.getNode(PPCISD::FP_EXTEND_LH, dl, MVT::v2f64, NewOp);
		  }
		  case ISD::LOAD: {
		    LoadSDNode *LD = cast<LoadSDNode>(Op0);
		    SDValue LoadOps[] = { LD->getChain(), LD->getBasePtr() };
		    SDValue NewLd =
		      DAG.getMemIntrinsicNode(PPCISD::LD_VSX_LH, dl,
		                              DAG.getVTList(MVT::v4f32, MVT::Other),
		                              LoadOps, LD->getMemoryVT(), LD->getMemOperand());
		    return DAG.getNode(PPCISD::FP_EXTEND_LH, dl, MVT::v2f64, NewLd);
		  }
		  }
		  llvm_unreachable("ERROR:Should return for all cases within switch.");
		}

/// LowerOperation - Provide custom lowering hooks for some operations.
///
SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
switch (Op.getOpcode()) {
default: llvm_unreachable("Wasn't expecting to be able to lower this!");
case ISD::ConstantPool: return LowerConstantPool(Op, DAG);
case ISD::BlockAddress: return LowerBlockAddress(Op, DAG);
case ISD::GlobalAddress: return LowerGlobalAddress(Op, DAG);
...	SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::BUILD_VECTOR: return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
case ISD::SCALAR_TO_VECTOR: return LowerSCALAR_TO_VECTOR(Op, DAG);
case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG);
case ISD::INSERT_VECTOR_ELT: return LowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::MUL: return LowerMUL(Op, DAG);
case ISD::ABS: return LowerABS(Op, DAG);
		case ISD::FP_EXTEND: return LowerFP_EXTEND(Op, DAG);

// For counter-based loop handling.
case ISD::INTRINSIC_W_CHAIN: return SDValue();

case ISD::BITCAST: return LowerBITCAST(Op, DAG);

// Frame & Return address.
case ISD::RETURNADDR: return LowerRETURNADDR(Op, DAG);
...
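
The control flow of `LowerFP_EXTEND` above boils down to a shape check on the operand of the extend. A minimal standalone sketch of that check follows. The `Opcode` and `Value` types and the function names are hypothetical stand-ins (this is not LLVM's API), and the v2f32/v2f64 type checks are omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ISD opcodes and SDValue.
enum class Opcode { Load, FAdd, FSub, FMul, FDiv, Other };

struct Value {
  Opcode opcode;
  std::vector<Value> operands;
};

// Returns true when the operand of the fp_extend has one of the shapes the
// patch handles: a plain load, or an fadd/fsub/fmul whose operands are all
// loads. Anything else falls back to the default legalization.
bool isCustomLowerable(const Value &op0) {
  switch (op0.opcode) {
  case Opcode::Load:
    return true;
  case Opcode::FAdd:
  case Opcode::FSub:
  case Opcode::FMul:
    for (const Value &operand : op0.operands)
      if (operand.opcode != Opcode::Load)
        return false;
    return true;
  default:
    return false;
  }
}

// Usage sketch: an fmul of two loads qualifies, an fdiv of two loads does not.
void example() {
  Value load{Opcode::Load, {}};
  assert(isCustomLowerable(Value{Opcode::FMul, {load, load}}));
  assert(!isCustomLowerable(Value{Opcode::FDiv, {load, load}}));
}
```

Nested arithmetic such as an `fmul` whose operand is itself an `fadd` is rejected by this check, which matches the note in the review that further operations could be added in a subsequent patch.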

llvm/trunk/lib/Target/PowerPC/PPCInstrVSX.td

...

def PPCRegSPILLTOVSRRCAsmOperand : AsmOperandClass {
let Name = "RegSPILLTOVSRRC"; let PredicateMethod = "isVSRegNumber";
}

def spilltovsrrc : RegisterOperand<SPILLTOVSRRC> {
let ParserMatchClass = PPCRegSPILLTOVSRRCAsmOperand;
}

		def SDT_PPCldvsxlh : SDTypeProfile<1, 1, [
		SDTCisVT<0, v4f32>, SDTCisPtrTy<1>
		]>;

		def SDT_PPCfpextlh : SDTypeProfile<1, 1, [
		SDTCisVT<0, v2f64>, SDTCisVT<1, v4f32>
		]>;

// Little-endian-specific nodes.
def SDT_PPClxvd2x : SDTypeProfile<1, 1, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;
def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;
def SDT_PPCxxswapd : SDTypeProfile<1, 1, [
...
def PPCmfvsr : SDNode<"PPCISD::MFVSR", SDTUnaryOp, []>;
def PPCmtvsra : SDNode<"PPCISD::MTVSRA", SDTUnaryOp, []>;
def PPCmtvsrz : SDNode<"PPCISD::MTVSRZ", SDTUnaryOp, []>;
def PPCsvec2fp : SDNode<"PPCISD::SINT_VEC_TO_FP", SDTVecConv, []>;
def PPCuvec2fp: SDNode<"PPCISD::UINT_VEC_TO_FP", SDTVecConv, []>;
def PPCswapNoChain : SDNode<"PPCISD::SWAP_NO_CHAIN", SDT_PPCxxswapd>;
def PPCvabsd : SDNode<"PPCISD::VABSD", SDTVabsd, []>;

		def PPCfpextlh : SDNode<"PPCISD::FP_EXTEND_LH", SDT_PPCfpextlh, []>;
		def PPCldvsxlh : SDNode<"PPCISD::LD_VSX_LH", SDT_PPCldvsxlh,
		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;

multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, string asmbase,
string asmstr, InstrItinClass itin, Intrinsic Int,
ValueType OutTy, ValueType InTy> {
let BaseName = asmbase in {
def NAME : XX3Form_Rc<opcode, xo, (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
!strconcat(asmbase, !strconcat(" ", asmstr)), itin,
[(set OutTy:$XT, (Int InTy:$XA, InTy:$XB))]>;
let Defs = [CR6] in
...
def : Pat<(v2f64 (PPCsvec2fp v4i32:$C, 1)),
(v2f64 (XVCVSXWDP (v2i64 (XXMRGLW $C, $C))))>;

def : Pat<(v2f64 (PPCuvec2fp v4i32:$C, 0)),
(v2f64 (XVCVUXWDP (v2i64 (XXMRGHW $C, $C))))>;
def : Pat<(v2f64 (PPCuvec2fp v4i32:$C, 1)),
(v2f64 (XVCVUXWDP (v2i64 (XXMRGLW $C, $C))))>;

		def : Pat<(v2f64 (PPCfpextlh v4f32:$C)), (XVCVSPDP (XXMRGHW $C, $C))>;

// Loads.
let Predicates = [HasVSX, HasOnlySwappingMemOps] in {
def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;

// Stores.
def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
(STXVD2X $rS, xoaddr:$dst)>;
def : Pat<(PPCstxvd2x v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
...	def DFSTOREf64 : PPCPostRAExpPseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
"#DFSTOREf64",
[(store f64:$XT, ixaddr:$dst)]>;

def : Pat<(f64 (extloadf32 ixaddr:$src)),
(COPY_TO_REGCLASS (DFLOADf32 ixaddr:$src), VSFRC)>;
def : Pat<(f32 (fpround (f64 (extloadf32 ixaddr:$src)))),
(f32 (DFLOADf32 ixaddr:$src))>;

		def : Pat<(v4f32 (PPCldvsxlh xaddr:$src)),
		(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VSRC)>;
		def : Pat<(v4f32 (PPCldvsxlh ixaddr:$src)),
		(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC)>;

let AddedComplexity = 400 in {
// The following pseudoinstructions are used to ensure the utilization
// of all 64 VSX registers.
let Predicates = [IsLittleEndian, HasP9Vector] in {
def : Pat<(v2i64 (scalar_to_vector (i64 (load ixaddr:$src)))),
(v2i64 (XXPERMDIs
(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC), 2))>;
...

llvm/trunk/test/CodeGen/PowerPC/reduce_scalarization.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-unknown \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names \
				; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-unknown \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names \
				; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s

				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test1(<2 x float>* nocapture readonly %Ptr) {
				; CHECK-LABEL: test1:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r3)
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %Ptr, align 8
				%1 = fpext <2 x float> %0 to <2 x double>
				ret <2 x double> %1
				}

				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test2(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test2:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: lfd f1, 0(r3)
				; CHECK-NEXT: xvsubsp vs0, vs1, vs0
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fsub <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}

				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test3(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test3:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: lfd f1, 0(r3)
				; CHECK-NEXT: xvaddsp vs0, vs1, vs0
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fadd <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}

				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test4(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test4:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: lfd f1, 0(r3)
				; CHECK-NEXT: xvmulsp vs0, vs1, vs0
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fmul <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}
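
As a reading aid for the CHECK lines, here is a scalar reference for what `test2` computes, assuming ordinary IEEE single and double precision: the subtraction happens in single precision and only the result is widened. The function names are invented for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference for the shape exercised by test2: fsub on <2 x float>,
// then fpext of the result to <2 x double>.
std::array<double, 2> subThenExtend(const std::array<float, 2> &a,
                                    const std::array<float, 2> &b) {
  std::array<double, 2> out{};
  for (std::size_t i = 0; i < 2; ++i) {
    float diff = a[i] - b[i];           // single-precision subtract
    out[i] = static_cast<double>(diff); // widen afterwards
  }
  return out;
}

// Usage sketch with values that are exactly representable in both formats.
void example() {
  std::array<double, 2> r = subThenExtend({3.0f, 1.5f}, {1.0f, 0.25f});
  assert(r[0] == 2.0 && r[1] == 1.25);
}
```

This ordering matches the generated code in the tests, where the single-precision vector operation (`xvsubsp`, `xvaddsp`, or `xvmulsp`) comes before `xxmrghw` and `xvcvspdp`.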