This is an archive of the discontinued LLVM Phabricator instance.

Differential D57857

[PowerPC] custom lower `v2f64 fpext v2f32`
ClosedPublic

Authored by lei on Feb 6 2019, 3:46 PM.

Details

Reviewers

power-llvm-team
hfinkel
echristo
saghir
nemanjai

Commits

rZORGacad7dc83e51: [PowerPC] custom lower `v2f64 fpext v2f32`
rZORG228ae25f5f60: [PowerPC] custom lower `v2f64 fpext v2f32`
rGacad7dc83e51: [PowerPC] custom lower `v2f64 fpext v2f32`
rG228ae25f5f60: [PowerPC] custom lower `v2f64 fpext v2f32`
rG1ac6e9636c9e: [PowerPC] custom lower `v2f64 fpext v2f32`
rL360429: [PowerPC] custom lower `v2f64 fpext v2f32`

Summary

Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.

E.g., for the following IR:

%0 = load <2 x float>, <2 x float>* %Ptr, align 8
%1 = fpext <2 x float> %0 to <2 x double>
ret <2 x double> %1

Before custom lowering:

ld r3, 0(r3)
mtvsrd f0, r3
xxswapd vs34, vs0
xscvspdpn f0, vs0
xxsldwi vs1, vs34, vs34, 3
xscvspdpn f1, vs1
xxmrghd vs34, vs0, vs1

After custom lowering:

lfd f0, 0(r3)
xxmrghw vs0, vs0, vs0
xvcvspdp vs34, vs0
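
As an illustrative aside (not part of the patch), the lane movement of the three-instruction sequence can be modeled in plain C++. This is a minimal sketch under stated assumptions: words in a 128-bit VSR are numbered 0..3 from the most significant end, `lfd` leaves the two loaded floats in words 0 and 1, `xxmrghw` interleaves words 0 and 1 of its two sources, and `xvcvspdp` widens words 0 and 2. Endianness is ignored. The type `Vsr` and all function names are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of a 128-bit VSR viewed as four single-precision words,
// numbered 0..3 from the most significant end (an assumption of this sketch).
using Vsr = std::array<float, 4>;

// Models `lfd`: the two floats loaded as one doubleword land in words 0 and 1.
// The other two words are not used by the sequence; the model zeroes them.
Vsr loadDoubleword(const float *ptr) {
  assert(ptr != nullptr && "expected a pointer to two floats");
  return Vsr{ptr[0], ptr[1], 0.0f, 0.0f};
}

// Models `xxmrghw XT, XA, XB`: interleave words 0 and 1 of XA and XB.
Vsr mergeHighWords(const Vsr &a, const Vsr &b) {
  return Vsr{a[0], b[0], a[1], b[1]};
}

// Models `xvcvspdp`: widen words 0 and 2 into the two doubleword results.
std::array<double, 2> convertSpToDp(const Vsr &v) {
  return {static_cast<double>(v[0]), static_cast<double>(v[2])};
}

// The full sequence: lfd; xxmrghw vs0, vs0, vs0; xvcvspdp.
std::array<double, 2> extendV2F32(const float *ptr) {
  Vsr loaded = loadDoubleword(ptr);
  Vsr merged = mergeHighWords(loaded, loaded);
  return convertSpToDp(merged);
}
```

Merging the register with itself duplicates each float into an adjacent pair of words, which puts the two source values exactly where the vector conversion reads them, so no per-element scalar conversion is needed.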

spec2017 improvements:

parest by 1.16%
blender by 1.24%.

spec2006 improvements:

mcf by 2%
xalancbmk by 1.29%

Event Timeline

lei created this revision. Feb 6 2019, 3:46 PM

Herald added a project: Restricted Project. Feb 6 2019, 3:46 PM

Herald added subscribers: jsji, kbarton, hiraditya, nemanjai.

saghir requested changes to this revision. Feb 7 2019, 2:30 PM

saghir added a subscriber: saghir.

saghir added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9526	Formatting
9544	Minor: `DAG` instead of `dag`.
llvm/lib/Target/PowerPC/PPCISelLowering.h
407	`expand` maybe?
llvm/lib/Target/PowerPC/PPCInstrVSX.td
1089	You may get rid of this new line.
llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
8	Do we need `RUN` on all these lines?

This revision now requires changes to proceed. Feb 7 2019, 2:30 PM

saghir added inline comments. Feb 7 2019, 5:51 PM

llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
8	Never mind - please ignore this comment.

lei requested review of this revision. Feb 13 2019, 5:21 AM

amyk added a subscriber: amyk. Feb 13 2019, 9:45 AM

amyk added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9553	Spacing between `newOp=DAG...` into `newOp = DAG...` maybe?
llvm/lib/Target/PowerPC/PPCISelLowering.h
407	I think maybe the correct word is `extend` since the line below is `FP_EXTEND_LHW`?

Address review comments.

LGTM

This revision is now accepted and ready to land. Feb 13 2019, 12:27 PM

I think approaching the problem this way is unnecessarily limited - and I say that knowing that I may have suggested a similar approach.
However, what we are actually looking to do is to convert a DAG such as:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
# arbitrary operations on v2f32
tN: v2f64 = fp_extend...

into:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
t3: v2f64 = fp_extend t1
t4: v2f64 = fp_extend t2
# widen all operations to v2f64

And then all we need is a custom load of two float values into a vector of two double values. I think the best way to do that would be to combine any occurrences of
(v2f64 fp_extend (v2f32 op (v2f32 op_input1), (v2f32 op_input2)...)) (i.e. as long as all its inputs are of type v2f32)
into
(v2f64 op (fp_extend op_input1) [, (fp_extend op_input2)...])
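
To make the proposed combine concrete, here is a minimal sketch on a toy expression tree. It is not LLVM code: `Node`, `makeNode`, `pushExtendTowardLoads`, and `toString` are hypothetical names invented for this illustration, types are plain strings, and a leaf with no operands stands in for a load. The sketch only shows the shape of the rewrite; whether widening an operation preserves floating-point results is a separate question it does not address.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression node; this is not LLVM's SDNode.
struct Node {
  std::string op;    // e.g. "load", "fadd", "fp_extend"
  std::string type;  // "v2f32" or "v2f64"
  std::vector<std::shared_ptr<Node>> operands;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeNode(std::string op, std::string type,
                 std::vector<NodePtr> operands = {}) {
  return std::make_shared<Node>(
      Node{std::move(op), std::move(type), std::move(operands)});
}

// Rewrites (v2f64 fp_extend (v2f32 op a, b, ...)) into
// (v2f64 op (fp_extend a), (fp_extend b), ...), recursing until each
// extend sits directly on a leaf. Non-matching nodes are returned unchanged.
NodePtr pushExtendTowardLoads(const NodePtr &n) {
  if (n->op != "fp_extend" || n->type != "v2f64")
    return n;
  assert(n->operands.size() == 1 && "fp_extend takes one operand");
  const NodePtr &inner = n->operands[0];
  // Stop at leaves and at operations whose inputs are not all v2f32.
  if (inner->operands.empty())
    return n;
  for (const NodePtr &in : inner->operands)
    if (in->type != "v2f32")
      return n;
  std::vector<NodePtr> widened;
  for (const NodePtr &in : inner->operands)
    widened.push_back(
        pushExtendTowardLoads(makeNode("fp_extend", "v2f64", {in})));
  return makeNode(inner->op, "v2f64", std::move(widened));
}

// Renders a node as an S-expression for inspection.
std::string toString(const NodePtr &n) {
  std::string s = "(" + n->type + " " + n->op;
  for (const NodePtr &in : n->operands)
    s += " " + toString(in);
  return s + ")";
}
```

After the rewrite the only remaining `fp_extend` nodes have loads as their operands, which is the single pattern a custom extending load would then need to match.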

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9524	`// We only want to custom lower an extend from v2f32 to v2f64.` Talking about return values or parameters on SD Nodes seems quite unnatural.
9544	You are not generating a new DAG, just a new node.
9568	It is obvious that an `llvm_unreachable` should not be reached. A more descriptive comment is desired. This is equivalent to emitting a message when the compiler crashes to say "Compiler crashed".
llvm/lib/Target/PowerPC/PPCISelLowering.h
408	I don't think you should use the abbreviation `LHW` in these as that seems to suggest a "half-word" and that is not the case. This is the "low half" of a VSR (which is itself a doubleword). I believe `FP_EXTEND_LH` and `LD_VSX_LH` should be adequate (the name of the node as you have it makes it seem as if it mimics an ISA mnemonic, which it does not).

Updated comments and renamed the new PPC ISD nodes.

I tried the suggested approach of applying fp_extend to the v2f32 loads to produce v2f64 and then adding a custom load of two float values into a vector of two double values. However, this generated extra instructions for each load, and we no longer see any of the performance gains observed with the original approach. With the new approach we see negligible performance changes for all SPEC 2017 benchmarks, with the exception of omnetpp, which showed a 1.7% performance degradation.

lei requested review of this revision. Apr 8 2019, 11:56 AM

nemanjai added a reviewer: nemanjai. May 2 2019, 8:52 AM

Other than a few minor nits that can be addressed on the commit, LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9533	In a subsequent patch, we should probably add FP unary operations here as well.
9553	nit: naming convention (`NewOp`).
llvm/lib/Target/PowerPC/PPCInstrVSX.td
3307	These should copy into `VSRC` rather than `VRRC` so as to avoid unnecessary copies. It is very important to get these into the right register class as the extra copies will certainly reduce the performance impact of this transformation.

This revision is now accepted and ready to land. May 3 2019, 5:51 AM

Closed by commit rL360429: [PowerPC] custom lower `v2f64 fpext v2f32` (authored by lei). May 10 2019, 7:02 AM

This revision was automatically updated to reflect the committed changes.

lei marked 2 inline comments as done.

Revision Contents

Path	Size
llvm/lib/Target/PowerPC/PPCISelLowering.h	8 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	57 lines
llvm/lib/Target/PowerPC/PPCInstrVSX.td	19 lines
llvm/test/CodeGen/PowerPC/reduce_scalarization.ll	80 lines

Diff 186710

llvm/lib/Target/PowerPC/PPCISelLowering.h

[398 lines hidden] enum NodeType : unsigned {

/// QVESPLATI = This corresponds to the QPX qvesplati instruction.
QVESPLATI,

/// QBFLT = Access the underlying QPX floating-point boolean
/// representation.
QBFLT,

		/// Custom extend v4f32 to v2f64.
		saghir: `expand` maybe?
		amyk: I think maybe the correct word is `extend` since the line below is `FP_EXTEND_LHW`?
		FP_EXTEND_LHW,
		nemanjai: I don't think you should use the abbreviation `LHW` in these as that seems to suggest a "half-word" and that is not the case. This is the "low half" of a VSR (which is itself a doubleword). I believe `FP_EXTEND_LH` and `LD_VSX_LH` should be adequate (the name of the node as you have it makes it seem as if it mimics an ISA mnemonic, which it does not).

/// CHAIN = STBRX CHAIN, GPRC, Ptr, Type - This is a
/// byte-swapping store instruction. It byte-swaps the low "Type" bits of
/// the GPRC input, then stores it through Ptr. Type can be either i16 or
/// i32.
STBRX = ISD::FIRST_TARGET_MEMORY_OPCODE,

/// GPRC, CHAIN = LBRX CHAIN, Ptr, Type - This is a
/// byte-swapping load instruction. It loads "Type" bits, byte swaps it,
[25 lines hidden] enum NodeType : unsigned {
/// followed by a byte-width for the store.
STXSIX,

/// VSRC, CHAIN = LXVD2X_LE CHAIN, Ptr - Occurs only for little endian.
/// Maps directly to an lxvd2x instruction that will be followed by
/// an xxswapd.
LXVD2X,

		/// VSRC, CHAIN = LXVLHW CHAIN, Ptr - This is a floating-point load of a
		/// v2f32 value into the lower half of a VSR register.
		LXVLHW,

/// CHAIN = STXVD2X CHAIN, VSRC, Ptr - Occurs only for little endian.
/// Maps directly to an stxvd2x instruction that will be preceded by
/// an xxswapd.
STXVD2X,

/// Store scalar integers from VSR.
ST_VSR_SCAL_INT,

[557 lines hidden] private:
SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBSWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerVectorLoad(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVectorStore(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
const SDLoc &dl, SelectionDAG &DAG,
[146 lines hidden]

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

[859 lines hidden] if (Subtarget.hasP9Vector()) {
setOperationAction(ISD::BITCAST, MVT::i128, Custom);
// No implementation for these ops for PowerPC.
setOperationAction(ISD::FSIN , MVT::f128, Expand);
setOperationAction(ISD::FCOS , MVT::f128, Expand);
setOperationAction(ISD::FPOW, MVT::f128, Expand);
setOperationAction(ISD::FPOWI, MVT::f128, Expand);
setOperationAction(ISD::FREM, MVT::f128, Expand);
}
		setOperationAction(ISD::FP_EXTEND, MVT::v2f32, Custom);

}

if (Subtarget.hasP9Altivec()) {
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v8i16, Custom);
setOperationAction(ISD::INSERT_VECTOR_ELT, MVT::v16i8, Custom);
}
}
[484 lines hidden] const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::QVFPERM: return "PPCISD::QVFPERM";
case PPCISD::QVGPCI: return "PPCISD::QVGPCI";
case PPCISD::QVALIGNI: return "PPCISD::QVALIGNI";
case PPCISD::QVESPLATI: return "PPCISD::QVESPLATI";
case PPCISD::QBFLT: return "PPCISD::QBFLT";
case PPCISD::QVLFSb: return "PPCISD::QVLFSb";
case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";
case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";
		case PPCISD::LXVLHW: return "PPCISD::LXVLHW";
		case PPCISD::FP_EXTEND_LHW: return "PPCISD::FP_EXTEND_LHW";
}
return nullptr;
}

EVT PPCTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &C,
EVT VT) const {
if (!VT.isVector())
return Subtarget.useCRBits() ? MVT::i1 : MVT::i32;
[8,131 lines hidden] SDValue PPCTargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const {
else if (VT == MVT::v8i16)
BifID = Intrinsic::ppc_altivec_vmaxsh;
else if (VT == MVT::v16i8)
BifID = Intrinsic::ppc_altivec_vmaxsb;

return BuildIntrinsicOp(BifID, X, Y, DAG, dl, VT);
}

		// Custom lowering for fpext v2f32 to v2f64
		SDValue PPCTargetLowering::LowerFP_EXTEND(SDValue Op, SelectionDAG &DAG) const {

		assert(Op.getOpcode() == ISD::FP_EXTEND &&
		"Should only be called for ISD::FP_EXTEND");

		// return value is not MVT::v2f64 or param is not v2f32
		nemanjai: `// We only want to custom lower an extend from v2f32 to v2f64.` Talking about return values or parameters on SD Nodes seems quite unnatural.
		if (Op.getValueType() != MVT::v2f64 ||
		Op.getOperand(0).getValueType() != MVT::v2f32)
		saghir: Formatting
		return SDValue();

		SDLoc dl(Op);
		SDValue Op0 = Op.getOperand(0);

		switch (Op0.getOpcode()) {
		default:
		nemanjai: In a subsequent patch, we should probably add FP unary operations here as well.
		return SDValue();
		case ISD::FADD:
		case ISD::FMUL:
		case ISD::FSUB: {
		SDValue NewLoad[2];
		for (unsigned i = 0, ie = Op0.getNumOperands(); i != ie; ++i) {
		// Ensure both input are loads.
		SDValue LdOp = Op0.getOperand(i);
		if (LdOp.getOpcode() != ISD::LOAD)
		return SDValue();
		// Generate new load DAG.
		saghir: Minor: `DAG` instead of `dag`.
		nemanjai: You are not generating a new DAG, just a new node.
		LoadSDNode *LD = cast<LoadSDNode>(LdOp);
		SDValue LoadOps[] = { LD->getChain(), LD->getBasePtr() };
		NewLoad[i] =
		DAG.getMemIntrinsicNode(PPCISD::LXVLHW, dl,
		DAG.getVTList(MVT::v4f32, MVT::Other),
		LoadOps, LD->getMemoryVT(),
		LD->getMemOperand());
		}
		SDValue newOp = DAG.getNode(Op0.getOpcode(), SDLoc(Op0), MVT::v4f32,
		amyk: Spacing between `newOp=DAG...` into `newOp = DAG...` maybe?
		nemanjai: nit: naming convention (`NewOp`).
		NewLoad[0], NewLoad[1],
		Op0.getNode()->getFlags());
		return DAG.getNode(PPCISD::FP_EXTEND_LHW, dl, MVT::v2f64, newOp);
		}
		case ISD::LOAD: {
		LoadSDNode *LD = cast<LoadSDNode>(Op0);
		SDValue LoadOps[] = { LD->getChain(), LD->getBasePtr() };
		SDValue NewLd =
		DAG.getMemIntrinsicNode(PPCISD::LXVLHW, dl,
		DAG.getVTList(MVT::v4f32, MVT::Other),
		LoadOps, LD->getMemoryVT(), LD->getMemOperand());
		return DAG.getNode(PPCISD::FP_EXTEND_LHW, dl, MVT::v2f64, NewLd);
		}
		}
		llvm_unreachable("Should never reach here!");
		nemanjai: It is obvious that an `llvm_unreachable` should not be reached. A more descriptive comment is desired. This is equivalent to emitting a message when the compiler crashes to say "Compiler crashed".
		}

/// LowerOperation - Provide custom lowering hooks for some operations.
///
SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
switch (Op.getOpcode()) {
default: llvm_unreachable("Wasn't expecting to be able to lower this!");
case ISD::ConstantPool: return LowerConstantPool(Op, DAG);
case ISD::BlockAddress: return LowerBlockAddress(Op, DAG);
case ISD::GlobalAddress: return LowerGlobalAddress(Op, DAG);
[37 lines hidden] SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::BUILD_VECTOR: return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
case ISD::SCALAR_TO_VECTOR: return LowerSCALAR_TO_VECTOR(Op, DAG);
case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG);
case ISD::INSERT_VECTOR_ELT: return LowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::MUL: return LowerMUL(Op, DAG);
case ISD::ABS: return LowerABS(Op, DAG);
		case ISD::FP_EXTEND: return LowerFP_EXTEND(Op, DAG);

// For counter-based loop handling.
case ISD::INTRINSIC_W_CHAIN: return SDValue();

case ISD::BITCAST: return LowerBITCAST(Op, DAG);

// Frame & Return address.
case ISD::RETURNADDR: return LowerRETURNADDR(Op, DAG);
[5,082 lines hidden]

llvm/lib/Target/PowerPC/PPCInstrVSX.td

[47 lines hidden]

def PPCRegSPILLTOVSRRCAsmOperand : AsmOperandClass {
let Name = "RegSPILLTOVSRRC"; let PredicateMethod = "isVSRegNumber";
}

def spilltovsrrc : RegisterOperand<SPILLTOVSRRC> {
let ParserMatchClass = PPCRegSPILLTOVSRRCAsmOperand;
}

		def SDT_PPClxvlhw : SDTypeProfile<1, 1, [
		SDTCisVT<0, v4f32>, SDTCisPtrTy<1>
		]>;

		def SDT_PPCfpextlhw : SDTypeProfile<1, 1, [
		SDTCisVT<0, v2f64>, SDTCisVT<1, v4f32>
		]>;

// Little-endian-specific nodes.
def SDT_PPClxvd2x : SDTypeProfile<1, 1, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;
def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;
def SDT_PPCxxswapd : SDTypeProfile<1, 1, [
[15 lines hidden]
def PPCmfvsr : SDNode<"PPCISD::MFVSR", SDTUnaryOp, []>;
def PPCmtvsra : SDNode<"PPCISD::MTVSRA", SDTUnaryOp, []>;
def PPCmtvsrz : SDNode<"PPCISD::MTVSRZ", SDTUnaryOp, []>;
def PPCsvec2fp : SDNode<"PPCISD::SINT_VEC_TO_FP", SDTVecConv, []>;
def PPCuvec2fp: SDNode<"PPCISD::UINT_VEC_TO_FP", SDTVecConv, []>;
def PPCswapNoChain : SDNode<"PPCISD::SWAP_NO_CHAIN", SDT_PPCxxswapd>;
def PPCvabsd : SDNode<"PPCISD::VABSD", SDTVabsd, []>;

		def PPCfpextlhw : SDNode<"PPCISD::FP_EXTEND_LHW", SDT_PPCfpextlhw, []>;
		def PPClxvlhw : SDNode<"PPCISD::LXVLHW", SDT_PPClxvlhw,
		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;

multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, string asmbase,
string asmstr, InstrItinClass itin, Intrinsic Int,
ValueType OutTy, ValueType InTy> {
let BaseName = asmbase in {
def NAME : XX3Form_Rc<opcode, xo, (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
!strconcat(asmbase, !strconcat(" ", asmstr)), itin,
[(set OutTy:$XT, (Int InTy:$XA, InTy:$XB))]>;
let Defs = [CR6] in
[962 lines hidden]
def : Pat<(v2f64 (PPCsvec2fp v4i32:$C, 1)),
(v2f64 (XVCVSXWDP (v2i64 (XXMRGLW $C, $C))))>;

def : Pat<(v2f64 (PPCuvec2fp v4i32:$C, 0)),
(v2f64 (XVCVUXWDP (v2i64 (XXMRGHW $C, $C))))>;
def : Pat<(v2f64 (PPCuvec2fp v4i32:$C, 1)),
(v2f64 (XVCVUXWDP (v2i64 (XXMRGLW $C, $C))))>;

		def : Pat<(v2f64 (PPCfpextlhw v4f32:$C)), (XVCVSPDP (XXMRGHW $C, $C))>;

// Loads.
let Predicates = [HasVSX, HasOnlySwappingMemOps] in {
def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;

// Stores.
def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
(STXVD2X $rS, xoaddr:$dst)>;
def : Pat<(PPCstxvd2x v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
}
let Predicates = [IsBigEndian, HasVSX, HasOnlySwappingMemOps] in {
		saghir: You may get rid of this new line.
def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;
def : Pat<(v4i32 (int_ppc_vsx_lxvw4x xoaddr:$src)), (LXVW4X xoaddr:$src)>;
def : Pat<(store v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
def : Pat<(store v2i64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
def : Pat<(store v4i32:$XT, xoaddr:$dst), (STXVW4X $XT, xoaddr:$dst)>;
def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
[2,200 lines hidden] def DFSTOREf64 : PPCPostRAExpPseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
"#DFSTOREf64",
[(store f64:$XT, ixaddr:$dst)]>;

def : Pat<(f64 (extloadf32 ixaddr:$src)),
(COPY_TO_REGCLASS (DFLOADf32 ixaddr:$src), VSFRC)>;
def : Pat<(f32 (fpround (f64 (extloadf32 ixaddr:$src)))),
(f32 (DFLOADf32 ixaddr:$src))>;

		def : Pat<(v4f32 (PPClxvlhw xaddr:$src)),
		(COPY_TO_REGCLASS (XFLOADf64 xaddr:$src), VRRC)>;
		nemanjai: These should copy into `VSRC` rather than `VRRC` so as to avoid unnecessary copies. It is very important to get these into the right register class as the extra copies will certainly reduce the performance impact of this transformation.
		def : Pat<(v4f32 (PPClxvlhw ixaddr:$src)),
		(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VRRC)>;

let AddedComplexity = 400 in {
// The following pseudoinstructions are used to ensure the utilization
// of all 64 VSX registers.
let Predicates = [IsLittleEndian, HasP9Vector] in {
def : Pat<(v2i64 (scalar_to_vector (i64 (load ixaddr:$src)))),
(v2i64 (XXPERMDIs
(COPY_TO_REGCLASS (DFLOADf64 ixaddr:$src), VSRC), 2))>;
[874 lines hidden]

llvm/test/CodeGen/PowerPC/reduce_scalarization.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-unknown \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names \
				; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-unknown \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names \
				; RUN: -ppc-vsr-nums-as-vr < %s | FileCheck %s

				saghir: Do we need `RUN` on all these lines?
				saghir: Never mind - please ignore this comment.
				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test1(<2 x float>* nocapture readonly %Ptr) {
				; CHECK-LABEL: test1:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r3)
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %Ptr, align 8
				%1 = fpext <2 x float> %0 to <2 x double>
				ret <2 x double> %1
				}

				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test2(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test2:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: xxlor v2, vs0, vs0
				; CHECK-NEXT: lfd f0, 0(r3)
				; CHECK-NEXT: xvsubsp vs0, vs0, v2
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fsub <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}

				; Function Attrs: norecurse nounwind readonly
				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test3(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test3:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: xxlor v2, vs0, vs0
				; CHECK-NEXT: lfd f0, 0(r3)
				; CHECK-NEXT: xvaddsp vs0, vs0, v2
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fadd <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}

				; Function Attrs: norecurse nounwind readonly
				; Function Attrs: norecurse nounwind readonly
				define dso_local <2 x double> @test4(<2 x float>* nocapture readonly %a, <2 x float>* nocapture readonly %b) {
				; CHECK-LABEL: test4:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lfd f0, 0(r4)
				; CHECK-NEXT: xxlor v2, vs0, vs0
				; CHECK-NEXT: lfd f0, 0(r3)
				; CHECK-NEXT: xvmulsp vs0, vs0, v2
				; CHECK-NEXT: xxmrghw vs0, vs0, vs0
				; CHECK-NEXT: xvcvspdp v2, vs0
				; CHECK-NEXT: blr
				entry:
				%0 = load <2 x float>, <2 x float>* %a, align 8
				%1 = load <2 x float>, <2 x float>* %b, align 8
				%sub = fmul <2 x float> %0, %1
				%2 = fpext <2 x float> %sub to <2 x double>
				ret <2 x double> %2
				}