This is an archive of the discontinued LLVM Phabricator instance.

Add direct moves to/from VSR and exploit them for FP/INT conversions
Closed, Public

Authored by nemanjai on Apr 9 2015, 10:39 AM.

Details

Reviewers

wschmidt
echristo
kbarton
seurer
hfinkel

Summary

This patch adds support for the direct move instructions to and from VSRs (vector-scalar registers). These instructions allow conversions between floating-point and fixed-point values without the slow store/load round trip through the same memory location.
For now, exploitation is limited to explicit conversions between floating-point and fixed-point values, and only for byte, halfword, word, and doubleword sizes.
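
As a rough illustration of what the direct moves do, the sketch below models them in portable C++. The helper names mirror the instruction mnemonics that appear in this patch (mtvsrwa, mtvsrwz, mtvsrd, mfvsrd) and the FCFID/FCFIDU conversions, but the helpers themselves are hypothetical stand-ins rather than LLVM APIs, and a scalar VSR is simplified to a single 64-bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model: a scalar VSX register is reduced to the 64-bit pattern
// held in its doubleword 0. None of these helpers are LLVM APIs.
using RegBits = std::uint64_t;

// mtvsrwa ("algebraic"): GPR word -> VSR, sign-extended to 64 bits.
inline RegBits mtvsrwa(std::int32_t gpr) {
  return static_cast<RegBits>(static_cast<std::int64_t>(gpr));
}

// mtvsrwz: GPR word -> VSR, zero-extended to 64 bits.
inline RegBits mtvsrwz(std::uint32_t gpr) { return static_cast<RegBits>(gpr); }

// mtvsrd / mfvsrd: copy all 64 bits unchanged between GPR and VSR.
inline RegBits mtvsrd(std::uint64_t gpr) { return gpr; }
inline std::uint64_t mfvsrd(RegBits reg) { return reg; }

// Reinterpret a register pattern as the double it encodes, and back.
inline double asDouble(RegBits reg) {
  static_assert(sizeof(double) == sizeof(RegBits), "expects a 64-bit double");
  double d;
  std::memcpy(&d, &reg, sizeof d);
  return d;
}
inline RegBits fromDouble(double d) {
  RegBits reg;
  std::memcpy(&reg, &d, sizeof reg);
  return reg;
}

// fcfid: treat the register as a signed 64-bit integer and convert to double.
inline RegBits fcfid(RegBits reg) {
  std::int64_t s;
  std::memcpy(&s, &reg, sizeof s);
  return fromDouble(static_cast<double>(s));
}

// fcfidu: treat the register as an unsigned 64-bit integer and convert.
inline RegBits fcfidu(RegBits reg) {
  return fromDouble(static_cast<double>(reg));
}

// i32 -> f64: a register-to-register move followed by a convert, with no
// store/load pair in between.
inline double sint32ToDouble(std::int32_t x) {
  return asDouble(fcfid(mtvsrwa(x)));
}
inline double uint32ToDouble(std::uint32_t x) {
  return asDouble(fcfidu(mtvsrwz(x)));
}

// Small usage example.
inline void demoDirectMoveModel() {
  assert(sint32ToDouble(-1) == -1.0);
  assert(uint32ToDouble(0xFFFFFFFFu) == 4294967295.0);
}
```

The same 32-bit pattern produces different doubles depending on whether the algebraic or the zero-extending move is used, which is why the lowering picks the move from the signedness of the conversion.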

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 23501. Apr 9 2015, 10:39 AM

nemanjai retitled this revision to Add direct moves to/from VSR and exploit them for FP/INT conversions.

nemanjai updated this object.

nemanjai edited the test plan for this revision.

nemanjai added reviewers: wschmidt, hfinkel, kbarton, seurer, echristo.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: Unknown Object (MLST).

nemanjai added inline comments. Apr 9 2015, 10:41 AM

lib/Target/PowerPC/PPCInstrFormats.td
770	This brace will be moved up to where it should be in the code (already done in my source tree).
lib/Target/PowerPC/PPCInstrVSX.td
992	In the actual commit, this will have both: HasDirectMove, HasVSX to indicate what the brace terminates.

nemanjai mentioned this in D8930: Add Clang support for -mdirect-move on PPC. Apr 9 2015, 10:46 AM

Some style issues, and some fragile test-case concerns.

lib/Target/PowerPC/PPCFastISel.cpp
962	Obligatory "add a period" comment.
1072	Likewise.
lib/Target/PowerPC/PPCISelLowering.cpp
5954	Missing space before left parenthesis.
6105	Space before parenthesis.
lib/Target/PowerPC/PPCInstrVSX.td
992	In all of the above, please indent the overflow lines so that the first character is underneath the character following the "<" character (as you did for MFVSRWZ, but for none of the others).
test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll
16	Specifying the intermediate register number (0) makes the test case more likely to break in the future. Please use {{[0-9]+}} so the test isn't reliant on specific register allocation. The 1 and 3 are ok because they are ABI registers we expect to use. This applies to all the tests here.
28	Here the result register of the xscvuxddp is not an ABI reg so should also use a regexp, not a specific reg. (Also, why didn't it end up in float reg 1?)
80	Same concerns with result reg.
132	And again.
184	And again.
236	And again.
288	And again.
427	Please remove the attributes and metadata. Delete these trailing lines, and remove the #0 from the function definitions.

Couple of inline questions, please do fix up all of the comments to be complete sentences. Bill looks like he has the rest of the correctness issues.

Thanks!

-eric

lib/Target/PowerPC/PPCISelLowering.cpp
5917	Block comment before the function please.
test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll
2–3	Can probably just use -unknown-unknown here as the OS part of the triple?
test/CodeGen/PowerPC/stfiwx.ll
1–2 ↗	(On Diff #23501)	How is direct-move getting turned on? I thought it was power8 only?

nemanjai added inline comments. Apr 9 2015, 8:15 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6069	This is going away in the patch that I'm about to upload. Thanks Bill for catching the register number issue that led to the realization that we have this unnecessary rounding.
lib/Target/PowerPC/PPCInstrVSX.td
992	Ugh, I don't know how I miss these things - it is so obviously misaligned. Thanks for pointing it out. Fixed and will be part of the next revision.
test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll
2–3	Yup, I don't see why not. Will do. Thanks for the comment.
16	I will turn these into regular expressions. Thanks for the tip.
28	I'll turn these into regular expressions if you would like. To answer the question in parentheses, it does not end up in float reg 1 because it is followed by an frsp since we need to round it to single precision. I will change the custom lowering to use PPCISD::FCFID[U]S when converting from any integral value to a single-precision float. I replicated the existing logic which in retrospect does not seem sound (actually I just realized that existing logic only rounds if there is no FPCVT). It uses the SDAG nodes for rounding directly to single-precision only when we are converting i64 to f32 but not for other integral types. Since we are now assuming hasFPCVT for the entire function, there is no harm in refactoring this to skip the need for the extra rounding instruction. Wow, I am really glad you pointed this out since I didn't really think about why FPR 1 was not used. Thanks.
test/CodeGen/PowerPC/stfiwx.ll
1–2 ↗	(On Diff #23501)	I was running on a Power 8 system and this failed. However, I believe the assumption of a default CPU may have been removed in a newer revision than the one I was modifying for this patch. I will investigate if this can safely be removed. I just ran this without the -mattr and it is OK now with a newer revision. Removed the change to this test case.

Addressed the comments from Bill and Eric. Notable changes:

we use conversions directly to single-precision when converting to f32 (rather than converting to f64 and subsequently rounding)
test case now does not specify a specific register except where required by the ABI (instead a regex is used)
formatting issues fixed
removed the unnecessary change to stfiwx.ll test case (since no longer required)
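
To see why converting directly to single precision can matter, here is a minimal C++ sketch of double rounding. It assumes IEEE 754 float and double with round-to-nearest-even and no excess intermediate precision; the function and constant names are hypothetical and do not come from the patch. For i32 sources the intermediate f64 is always exact, so the extra rounding step is merely redundant, but for i64 sources the two paths can disagree.

```cpp
#include <cassert>
#include <cstdint>

// 2^60 + 2^36 + 1: just above the midpoint of two adjacent floats
// (2^60 and 2^60 + 2^37), but the trailing +1 is lost when rounding to double.
constexpr std::int64_t kInput = (INT64_C(1) << 60) + (INT64_C(1) << 36) + 1;

// Two roundings: i64 -> f64, then f64 -> f32 (convert, then round).
inline float convertViaDouble(std::int64_t x) {
  double wide = static_cast<double>(x);
  return static_cast<float>(wide);
}

// One rounding: i64 -> f32 directly.
inline float convertDirect(std::int64_t x) { return static_cast<float>(x); }

// Small usage example.
inline void demoDoubleRounding() {
  const float lower = static_cast<float>(INT64_C(1) << 60);
  const float upper =
      static_cast<float>((INT64_C(1) << 60) + (INT64_C(1) << 37));
  // The direct path sees that the input is above the midpoint and rounds up.
  assert(convertDirect(kInput) == upper);
  // The two-step path first lands exactly on the midpoint, then ties to even.
  assert(convertViaDouble(kInput) == lower);
}
```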

nemanjai added inline comments. Apr 9 2015, 8:48 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6060	This comment is now redundant. I forgot to remove it prior to uploading the patch. It is removed from my source tree so it won't show up in the final commit (or the next review if one is necessary).

nemanjai added inline comments. Apr 10 2015, 7:34 AM

test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll
13	The regular expression will be changed from {{[0-9]+}} to [[CONV-REG:[0-9]+]] for the first occurrence of the register and to [[CONV-REG]] for the second occurrence. This is to ensure that the destination register for what we convert is the same as the source register for the move instruction. This applies throughout. If this turns out to be the only comment, no further review will be posted but the change will be made in the committed version.
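
The difference between matching any register twice and tying the two occurrences together can be sketched in C++ with std::regex, using a capture group and a backreference as a stand-in for a FileCheck variable definition and use. This only models the idea and is not how FileCheck is implemented; the sample assembly text and the function names are hypothetical and are not taken from the test file.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Loose check: any register number is accepted in both places, analogous to
// writing {{[0-9]+}} twice.
inline bool matchesLoose(const std::string &asmText) {
  static const std::regex pattern(
      R"(xscvdpsxws [0-9]+, 1\n\s*mfvsrwz 3, [0-9]+\b)");
  return std::regex_search(asmText, pattern);
}

// Tied check: the register number written by the conversion is captured and
// must reappear as the source of the move, analogous to defining a variable
// at the first occurrence and reusing it at the second.
inline bool matchesTied(const std::string &asmText) {
  static const std::regex pattern(
      R"(xscvdpsxws ([0-9]+), 1\n\s*mfvsrwz 3, \1\b)");
  return std::regex_search(asmText, pattern);
}

// Small usage example with made-up assembly output.
inline void demoTiedRegisterCheck() {
  const std::string consistent = "\txscvdpsxws 0, 1\n\tmfvsrwz 3, 0\n";
  const std::string mismatched = "\txscvdpsxws 0, 1\n\tmfvsrwz 3, 2\n";
  assert(matchesLoose(consistent) && matchesLoose(mismatched));
  assert(matchesTied(consistent) && !matchesTied(mismatched));
}
```

The loose form would accept output in which the move reads a register other than the one the conversion wrote, which is the kind of mismatch the tied form is meant to catch.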

With the issues you've already promised to fix, and with one nit I noticed, LGTM.

One question: I was surprised to see that XSCVUXDSP is unimplemented. Do we have a work item open to address that?

Thanks,
Bill

lib/Target/PowerPC/PPCISelLowering.cpp
6104	Oh look, another missing period! 3.7 demerits.

This revision is now accepted and ready to land. Apr 10 2015, 10:00 AM

Committed revision 234682.

Revision Contents

Path	Size
lib/Target/PowerPC/ (8 files)	6 lines, 4 lines, 13 lines, 80 lines, 6 lines, 25 lines, 2 lines, 1 line
test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll	426 lines
test/MC/Disassembler/PowerPC/vsx.txt	14 lines
test/MC/PowerPC/vsx.s	17 lines

Diff 23554

lib/Target/PowerPC/PPC.td

[116 lines not shown]	def FeatureP8Altivec : SubtargetFeature<"power8-altivec", "HasP8Altivec", "true",
"Enable POWER8 Altivec instructions",		"Enable POWER8 Altivec instructions",
[FeatureAltivec]>;		[FeatureAltivec]>;
def FeatureP8Crypto : SubtargetFeature<"crypto", "HasP8Crypto", "true",		def FeatureP8Crypto : SubtargetFeature<"crypto", "HasP8Crypto", "true",
"Enable POWER8 Crypto instructions",		"Enable POWER8 Crypto instructions",
[FeatureP8Altivec]>;		[FeatureP8Altivec]>;
def FeatureP8Vector : SubtargetFeature<"power8-vector", "HasP8Vector", "true",		def FeatureP8Vector : SubtargetFeature<"power8-vector", "HasP8Vector", "true",
"Enable POWER8 vector instructions",		"Enable POWER8 vector instructions",
[FeatureVSX, FeatureP8Altivec]>;		[FeatureVSX, FeatureP8Altivec]>;
		def FeatureDirectMove :
		SubtargetFeature<"direct-move", "HasDirectMove", "true",
		"Enable Power8 direct move instructions",
		[FeatureVSX]>;
def FeaturePartwordAtomic : SubtargetFeature<"partword-atomics",		def FeaturePartwordAtomic : SubtargetFeature<"partword-atomics",
"HasPartwordAtomics", "true",		"HasPartwordAtomics", "true",
"Enable l[bh]arx and st[bh]cx.">;		"Enable l[bh]arx and st[bh]cx.">;
def FeatureInvariantFunctionDescriptors :		def FeatureInvariantFunctionDescriptors :
SubtargetFeature<"invariant-function-descriptors",		SubtargetFeature<"invariant-function-descriptors",
"HasInvariantFunctionDescriptors", "true",		"HasInvariantFunctionDescriptors", "true",
"Assume function descriptors are invariant">;		"Assume function descriptors are invariant">;
def FeatureHTM : SubtargetFeature<"htm", "HasHTM", "true",		def FeatureHTM : SubtargetFeature<"htm", "HasHTM", "true",
[26 lines not shown]	list<SubtargetFeature> Power7FeatureList =
FeatureRecipPrec, FeatureSTFIWX, FeatureLFIWAX,		FeatureRecipPrec, FeatureSTFIWX, FeatureLFIWAX,
FeatureFPRND, FeatureFPCVT, FeatureISEL,		FeatureFPRND, FeatureFPCVT, FeatureISEL,
FeaturePOPCNTD, FeatureCMPB, FeatureLDBRX,		FeaturePOPCNTD, FeatureCMPB, FeatureLDBRX,
Feature64Bit /*, Feature64BitRegs */, FeaturePartwordAtomic,		Feature64Bit /*, Feature64BitRegs */, FeaturePartwordAtomic,
FeatureBPERMD, FeatureExtDiv,		FeatureBPERMD, FeatureExtDiv,
DeprecatedMFTB, DeprecatedDST];		DeprecatedMFTB, DeprecatedDST];
list<SubtargetFeature> Power8SpecificFeatures =		list<SubtargetFeature> Power8SpecificFeatures =
[DirectivePwr8, FeatureP8Altivec, FeatureP8Vector, FeatureP8Crypto,		[DirectivePwr8, FeatureP8Altivec, FeatureP8Vector, FeatureP8Crypto,
FeatureHTM, FeatureICBT];		FeatureHTM, FeatureDirectMove, FeatureICBT];
list<SubtargetFeature> Power8FeatureList =		list<SubtargetFeature> Power8FeatureList =
!listconcat(Power7FeatureList, Power8SpecificFeatures);		!listconcat(Power7FeatureList, Power8SpecificFeatures);
}		}

// Note: Future features to add when support is extended to more		// Note: Future features to add when support is extended to more
// recent ISA levels:		// recent ISA levels:
//		//
// DFP p6, p6x, p7 decimal floating-point instructions		// DFP p6, p6x, p7 decimal floating-point instructions
[220 lines not shown]

lib/Target/PowerPC/PPCFastISel.cpp

[952 lines not shown]	unsigned PPCFastISel::PPCMoveToFPReg(MVT SrcVT, unsigned SrcReg,
unsigned ResultReg = 0;		unsigned ResultReg = 0;
if (!PPCEmitLoad(MVT::f64, ResultReg, Addr, RC, !IsSigned, LoadOpc))		if (!PPCEmitLoad(MVT::f64, ResultReg, Addr, RC, !IsSigned, LoadOpc))
return 0;		return 0;

return ResultReg;		return ResultReg;
}		}

// Attempt to fast-select an integer-to-floating-point conversion.		// Attempt to fast-select an integer-to-floating-point conversion.
		// FIXME: Once fast-isel has better support for VSX, conversions using
		// direct moves should be implemented.
		wschmidt: Obligatory "add a period" comment.
bool PPCFastISel::SelectIToFP(const Instruction *I, bool IsSigned) {		bool PPCFastISel::SelectIToFP(const Instruction *I, bool IsSigned) {
MVT DstVT;		MVT DstVT;
Type *DstTy = I->getType();		Type *DstTy = I->getType();
if (!isTypeLegal(DstTy, DstVT))		if (!isTypeLegal(DstTy, DstVT))
return false;		return false;

if (DstVT != MVT::f32 && DstVT != MVT::f64)		if (DstVT != MVT::f32 && DstVT != MVT::f64)
return false;		return false;
[91 lines not shown]	unsigned PPCFastISel::PPCMoveToIntReg(const Instruction *I, MVT VT,
unsigned ResultReg = 0;		unsigned ResultReg = 0;
if (!PPCEmitLoad(VT, ResultReg, Addr, RC, !IsSigned))		if (!PPCEmitLoad(VT, ResultReg, Addr, RC, !IsSigned))
return 0;		return 0;

return ResultReg;		return ResultReg;
}		}

// Attempt to fast-select a floating-point-to-integer conversion.		// Attempt to fast-select a floating-point-to-integer conversion.
		// FIXME: Once fast-isel has better support for VSX, conversions using
		// direct moves should be implemented.
bool PPCFastISel::SelectFPToI(const Instruction *I, bool IsSigned) {		bool PPCFastISel::SelectFPToI(const Instruction *I, bool IsSigned) {
		wschmidt: Likewise.
MVT DstVT, SrcVT;		MVT DstVT, SrcVT;
Type *DstTy = I->getType();		Type *DstTy = I->getType();
if (!isTypeLegal(DstTy, DstVT))		if (!isTypeLegal(DstTy, DstVT))
return false;		return false;

if (DstVT != MVT::i32 && DstVT != MVT::i64)		if (DstVT != MVT::i32 && DstVT != MVT::i64)
return false;		return false;

[1,248 lines not shown]

lib/Target/PowerPC/PPCISelLowering.h

[113 lines not shown]	enum NodeType {
/// Return with a flag operand, matched by 'blr'		/// Return with a flag operand, matched by 'blr'
RET_FLAG,		RET_FLAG,

/// R32 = MFOCRF(CRREG, INFLAG) - Represents the MFOCRF instruction.		/// R32 = MFOCRF(CRREG, INFLAG) - Represents the MFOCRF instruction.
/// This copies the bits corresponding to the specified CRREG into the		/// This copies the bits corresponding to the specified CRREG into the
/// resultant GPR. Bits corresponding to other CR regs are undefined.		/// resultant GPR. Bits corresponding to other CR regs are undefined.
MFOCRF,		MFOCRF,

		/// Direct move from a VSX register to a GPR
		MFVSR,

		/// Direct move from a GPR to a VSX register (algebraic)
		MTVSRA,

		/// Direct move from a GPR to a VSX register (zero)
		MTVSRZ,

// FIXME: Remove these once the ANDI glue bug is fixed:		// FIXME: Remove these once the ANDI glue bug is fixed:
/// i1 = ANDIo_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result of the		/// i1 = ANDIo_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result of the
/// eq or gt bit of CR0 after executing andi. x, 1. This is used to		/// eq or gt bit of CR0 after executing andi. x, 1. This is used to
/// implement truncation of i32 or i64 to i1.		/// implement truncation of i32 or i64 to i1.
ANDIo_1_EQ_BIT, ANDIo_1_GT_BIT,		ANDIo_1_EQ_BIT, ANDIo_1_GT_BIT,

// READ_TIME_BASE - A read of the 64-bit time-base register on a 32-bit		// READ_TIME_BASE - A read of the 64-bit time-base register on a 32-bit
// target (returns (Lo, Hi)). It takes a chain operand.		// target (returns (Lo, Hi)). It takes a chain operand.
[510 lines not shown]	private:
bool canReuseLoadAddress(SDValue Op, EVT MemVT, ReuseLoadInfo &RLI,		bool canReuseLoadAddress(SDValue Op, EVT MemVT, ReuseLoadInfo &RLI,
SelectionDAG &DAG,		SelectionDAG &DAG,
ISD::LoadExtType ET = ISD::NON_EXTLOAD) const;		ISD::LoadExtType ET = ISD::NON_EXTLOAD) const;
void spliceIntoChain(SDValue ResChain, SDValue NewResChain,		void spliceIntoChain(SDValue ResChain, SDValue NewResChain,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;

void LowerFP_TO_INTForReuse(SDValue Op, ReuseLoadInfo &RLI,		void LowerFP_TO_INTForReuse(SDValue Op, ReuseLoadInfo &RLI,
SelectionDAG &DAG, SDLoc dl) const;		SelectionDAG &DAG, SDLoc dl) const;
		SDValue LowerFP_TO_INTDirectMove(SDValue Op, SelectionDAG &DAG,
		SDLoc dl) const;
		SDValue LowerINT_TO_FPDirectMove(SDValue Op, SelectionDAG &DAG,
		SDLoc dl) const;

SDValue getFramePointerFrameIndex(SelectionDAG & DAG) const;		SDValue getFramePointerFrameIndex(SelectionDAG & DAG) const;
SDValue getReturnAddrFrameIndex(SelectionDAG & DAG) const;		SDValue getReturnAddrFrameIndex(SelectionDAG & DAG) const;

bool		bool
IsEligibleForTailCallOptimization(SDValue Callee,		IsEligibleForTailCallOptimization(SDValue Callee,
CallingConv::ID CalleeCC,		CallingConv::ID CalleeCC,
bool isVarArg,		bool isVarArg,
[192 lines not shown]

lib/Target/PowerPC/PPCISelLowering.cpp

[990 lines not shown]	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::MTCTR: return "PPCISD::MTCTR";		case PPCISD::MTCTR: return "PPCISD::MTCTR";
case PPCISD::BCTRL: return "PPCISD::BCTRL";		case PPCISD::BCTRL: return "PPCISD::BCTRL";
case PPCISD::BCTRL_LOAD_TOC: return "PPCISD::BCTRL_LOAD_TOC";		case PPCISD::BCTRL_LOAD_TOC: return "PPCISD::BCTRL_LOAD_TOC";
case PPCISD::RET_FLAG: return "PPCISD::RET_FLAG";		case PPCISD::RET_FLAG: return "PPCISD::RET_FLAG";
case PPCISD::READ_TIME_BASE: return "PPCISD::READ_TIME_BASE";		case PPCISD::READ_TIME_BASE: return "PPCISD::READ_TIME_BASE";
case PPCISD::EH_SJLJ_SETJMP: return "PPCISD::EH_SJLJ_SETJMP";		case PPCISD::EH_SJLJ_SETJMP: return "PPCISD::EH_SJLJ_SETJMP";
case PPCISD::EH_SJLJ_LONGJMP: return "PPCISD::EH_SJLJ_LONGJMP";		case PPCISD::EH_SJLJ_LONGJMP: return "PPCISD::EH_SJLJ_LONGJMP";
case PPCISD::MFOCRF: return "PPCISD::MFOCRF";		case PPCISD::MFOCRF: return "PPCISD::MFOCRF";
		case PPCISD::MFVSR: return "PPCISD::MFVSR";
		case PPCISD::MTVSRA: return "PPCISD::MTVSRA";
		case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ";
case PPCISD::VCMP: return "PPCISD::VCMP";		case PPCISD::VCMP: return "PPCISD::VCMP";
case PPCISD::VCMPo: return "PPCISD::VCMPo";		case PPCISD::VCMPo: return "PPCISD::VCMPo";
case PPCISD::LBRX: return "PPCISD::LBRX";		case PPCISD::LBRX: return "PPCISD::LBRX";
case PPCISD::STBRX: return "PPCISD::STBRX";		case PPCISD::STBRX: return "PPCISD::STBRX";
case PPCISD::LFIWAX: return "PPCISD::LFIWAX";		case PPCISD::LFIWAX: return "PPCISD::LFIWAX";
case PPCISD::LFIWZX: return "PPCISD::LFIWZX";		case PPCISD::LFIWZX: return "PPCISD::LFIWZX";
case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";		case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";
case PPCISD::BDNZ: return "PPCISD::BDNZ";		case PPCISD::BDNZ: return "PPCISD::BDNZ";
[4,899 lines not shown]	if (Op.getValueType() == MVT::i32 && !i32Stack) {
MPI = MPI.getWithOffset(4);		MPI = MPI.getWithOffset(4);
}		}

RLI.Chain = Chain;		RLI.Chain = Chain;
RLI.Ptr = FIPtr;		RLI.Ptr = FIPtr;
RLI.MPI = MPI;		RLI.MPI = MPI;
}		}

		/// \brief Custom lowers floating point to integer conversions to use
		echristo: Block comment before the function please.
		/// the direct move instructions available in ISA 2.07 to avoid the
		/// need for load/store combinations.
		SDValue PPCTargetLowering::LowerFP_TO_INTDirectMove(SDValue Op,
		SelectionDAG &DAG,
		SDLoc dl) const {
		assert(Op.getOperand(0).getValueType().isFloatingPoint());
		SDValue Src = Op.getOperand(0);

		if (Src.getValueType() == MVT::f32)
		Src = DAG.getNode(ISD::FP_EXTEND, dl, MVT::f64, Src);

		SDValue Tmp;
		switch (Op.getSimpleValueType().SimpleTy) {
		default: llvm_unreachable("Unhandled FP_TO_INT type in custom expander!");
		case MVT::i32:
		Tmp = DAG.getNode(
		Op.getOpcode() == ISD::FP_TO_SINT
		? PPCISD::FCTIWZ
		: (Subtarget.hasFPCVT() ? PPCISD::FCTIWUZ : PPCISD::FCTIDZ),
		dl, MVT::f64, Src);
		Tmp = DAG.getNode(PPCISD::MFVSR, dl, MVT::i32, Tmp);
		break;
		case MVT::i64:
		assert((Op.getOpcode() == ISD::FP_TO_SINT || Subtarget.hasFPCVT()) &&
		"i64 FP_TO_UINT is supported only with FPCVT");
		Tmp = DAG.getNode(Op.getOpcode()==ISD::FP_TO_SINT ? PPCISD::FCTIDZ :
		PPCISD::FCTIDUZ,
		dl, MVT::f64, Src);
		Tmp = DAG.getNode(PPCISD::MFVSR, dl, MVT::i64, Tmp);
		break;
		}
		return Tmp;
		}

SDValue PPCTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG,		SDValue PPCTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG,
SDLoc dl) const {		SDLoc dl) const {
		if (Subtarget.hasDirectMove() && Subtarget.isPPC64())
		wschmidt: Missing space before left parenthesis.
		return LowerFP_TO_INTDirectMove(Op, DAG, dl);

ReuseLoadInfo RLI;		ReuseLoadInfo RLI;
LowerFP_TO_INTForReuse(Op, RLI, DAG, dl);		LowerFP_TO_INTForReuse(Op, RLI, DAG, dl);

return DAG.getLoad(Op.getValueType(), dl, RLI.Chain, RLI.Ptr, RLI.MPI, false,		return DAG.getLoad(Op.getValueType(), dl, RLI.Chain, RLI.Ptr, RLI.MPI, false,
false, RLI.IsInvariant, RLI.Alignment, RLI.AAInfo,		false, RLI.IsInvariant, RLI.Alignment, RLI.AAInfo,
RLI.Ranges);		RLI.Ranges);
}		}

[61 lines not shown]	SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
NewResChain, DAG.getUNDEF(MVT::Other));		NewResChain, DAG.getUNDEF(MVT::Other));
assert(TF.getNode() != NewResChain.getNode() &&		assert(TF.getNode() != NewResChain.getNode() &&
"A new TF really is required here");		"A new TF really is required here");

DAG.ReplaceAllUsesOfValueWith(ResChain, TF);		DAG.ReplaceAllUsesOfValueWith(ResChain, TF);
DAG.UpdateNodeOperands(TF.getNode(), ResChain, NewResChain);		DAG.UpdateNodeOperands(TF.getNode(), ResChain, NewResChain);
}		}

		/// \brief Custom lowers integer to floating point conversions to use
		/// the direct move instructions available in ISA 2.07 to avoid the
		/// need for load/store combinations.
		SDValue PPCTargetLowering::LowerINT_TO_FPDirectMove(SDValue Op,
		SelectionDAG &DAG,
		SDLoc dl) const {
		assert((Op.getValueType() == MVT::f32 ||
		Op.getValueType() == MVT::f64) &&
		"Invalid floating point type as target of conversion");
		assert(Subtarget.hasFPCVT() &&
		"Int to FP conversions with direct moves require FPCVT");
		SDValue FP;
		SDValue Src = Op.getOperand(0);
		bool SinglePrec = Op.getValueType() == MVT::f32;
		bool WordInt = Src.getSimpleValueType().SimpleTy == MVT::i32;
		bool Signed = Op.getOpcode() == ISD::SINT_TO_FP;
		unsigned ConvOp = Signed ? (SinglePrec ? PPCISD::FCFIDS : PPCISD::FCFID) :
		(SinglePrec ? PPCISD::FCFIDUS : PPCISD::FCFIDU);

		if (WordInt) {
		FP = DAG.getNode(Signed ? PPCISD::MTVSRA : PPCISD::MTVSRZ,
		dl, MVT::f64, Src);
		FP = DAG.getNode(ConvOp, dl, SinglePrec ? MVT::f32 : MVT::f64, FP);
		}
		else {
		FP = DAG.getNode(PPCISD::MTVSRA, dl, MVT::f64, Src);
		// To prevent unnecessary double rounding, convert directly to single
		nemanjai: This comment is now redundant. I forgot to remove it prior to uploading the patch. It is removed from my source tree so it won't show up in the final commit (or the next review if one is necessary).
		// precision if the target value is single precision
		FP = DAG.getNode(ConvOp, dl, SinglePrec ? MVT::f32 : MVT::f64, FP);
		}

		return FP;
		}

SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,		SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
		nemanjai: This is going away in the patch that I'm about to upload. Thanks Bill for catching the register number issue that led to the realization that we have this unnecessary rounding.
SDLoc dl(Op);		SDLoc dl(Op);

if (Subtarget.hasQPX() && Op.getOperand(0).getValueType() == MVT::v4i1) {		if (Subtarget.hasQPX() && Op.getOperand(0).getValueType() == MVT::v4i1) {
if (Op.getValueType() != MVT::v4f32 && Op.getValueType() != MVT::v4f64)		if (Op.getValueType() != MVT::v4f32 && Op.getValueType() != MVT::v4f64)
return SDValue();		return SDValue();

SDValue Value = Op.getOperand(0);		SDValue Value = Op.getOperand(0);
// The values are now known to be -1 (false) or 1 (true). To convert this		// The values are now known to be -1 (false) or 1 (true). To convert this
[17 lines not shown]	SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
if (Op.getValueType() != MVT::f32 && Op.getValueType() != MVT::f64)		if (Op.getValueType() != MVT::f32 && Op.getValueType() != MVT::f64)
return SDValue();		return SDValue();

if (Op.getOperand(0).getValueType() == MVT::i1)		if (Op.getOperand(0).getValueType() == MVT::i1)
return DAG.getNode(ISD::SELECT, dl, Op.getValueType(), Op.getOperand(0),		return DAG.getNode(ISD::SELECT, dl, Op.getValueType(), Op.getOperand(0),
DAG.getConstantFP(1.0, Op.getValueType()),		DAG.getConstantFP(1.0, Op.getValueType()),
DAG.getConstantFP(0.0, Op.getValueType()));		DAG.getConstantFP(0.0, Op.getValueType()));

		// If we have direct moves, we can do all the conversion, skip the store/load
		// However, without FPCVT we can't do most conversions
		wschmidt: Oh look, another missing period! 3.7 demerits.
		if (Subtarget.hasDirectMove() && Subtarget.isPPC64() && Subtarget.hasFPCVT())
		wschmidt: Space before parenthesis.
		return LowerINT_TO_FPDirectMove(Op, DAG, dl);

assert((Op.getOpcode() == ISD::SINT_TO_FP || Subtarget.hasFPCVT()) &&		assert((Op.getOpcode() == ISD::SINT_TO_FP || Subtarget.hasFPCVT()) &&
"UINT_TO_FP is supported only with FPCVT");		"UINT_TO_FP is supported only with FPCVT");

// If we have FCFIDS, then use it when converting to single-precision.		// If we have FCFIDS, then use it when converting to single-precision.
// Otherwise, convert to double-precision and then round.		// Otherwise, convert to double-precision and then round.
unsigned FCFOp = (Subtarget.hasFPCVT() && Op.getValueType() == MVT::f32)		unsigned FCFOp = (Subtarget.hasFPCVT() && Op.getValueType() == MVT::f32)
? (Op.getOpcode() == ISD::UINT_TO_FP ? PPCISD::FCFIDUS		? (Op.getOpcode() == ISD::UINT_TO_FP ? PPCISD::FCFIDUS
: PPCISD::FCFIDS)		: PPCISD::FCFIDS)
[5,137 lines not shown]

lib/Target/PowerPC/PPCInstrFormats.td

[758 lines not shown]	class XX1Form<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,

let Inst{6-10} = XT{4-0};		let Inst{6-10} = XT{4-0};
let Inst{11-15} = A;		let Inst{11-15} = A;
let Inst{16-20} = B;		let Inst{16-20} = B;
let Inst{21-30} = xo;		let Inst{21-30} = xo;
let Inst{31} = XT{5};		let Inst{31} = XT{5};
}		}

		class XX1_RS6_RD5_XO<bits<6> opcode, bits<10> xo, dag OOL, dag IOL,
		string asmstr, InstrItinClass itin, list<dag> pattern>
		: XX1Form<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
		let B = 0;
		nemanjai: This brace will be moved up to where it should be in the code (already done in my source tree).
		}

class XX2Form<bits<6> opcode, bits<9> xo, dag OOL, dag IOL, string asmstr,		class XX2Form<bits<6> opcode, bits<9> xo, dag OOL, dag IOL, string asmstr,
InstrItinClass itin, list<dag> pattern>		InstrItinClass itin, list<dag> pattern>
: I<opcode, OOL, IOL, asmstr, itin> {		: I<opcode, OOL, IOL, asmstr, itin> {
bits<6> XT;		bits<6> XT;
bits<6> XB;		bits<6> XB;

let Pattern = pattern;		let Pattern = pattern;

[854 lines not shown]

lib/Target/PowerPC/PPCInstrVSX.td

[35 lines not shown]	def SDT_PPCxxswapd : SDTypeProfile<1, 1, [
SDTCisSameAs<0, 1>		SDTCisSameAs<0, 1>
]>;		]>;

def PPClxvd2x : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,		def PPClxvd2x : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
[SDNPHasChain, SDNPMayLoad]>;		[SDNPHasChain, SDNPMayLoad]>;
def PPCstxvd2x : SDNode<"PPCISD::STXVD2X", SDT_PPCstxvd2x,		def PPCstxvd2x : SDNode<"PPCISD::STXVD2X", SDT_PPCstxvd2x,
[SDNPHasChain, SDNPMayStore]>;		[SDNPHasChain, SDNPMayStore]>;
def PPCxxswapd : SDNode<"PPCISD::XXSWAPD", SDT_PPCxxswapd, [SDNPHasChain]>;		def PPCxxswapd : SDNode<"PPCISD::XXSWAPD", SDT_PPCxxswapd, [SDNPHasChain]>;
		def PPCmfvsr : SDNode<"PPCISD::MFVSR", SDTUnaryOp, []>;
		def PPCmtvsra : SDNode<"PPCISD::MTVSRA", SDTUnaryOp, []>;
		def PPCmtvsrz : SDNode<"PPCISD::MTVSRZ", SDTUnaryOp, []>;

multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, dag OOL, dag IOL,		multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, dag OOL, dag IOL,
string asmbase, string asmstr, InstrItinClass itin,		string asmbase, string asmstr, InstrItinClass itin,
list<dag> pattern> {		list<dag> pattern> {
let BaseName = asmbase in {		let BaseName = asmbase in {
def NAME : XX3Form_Rc<opcode, xo, OOL, IOL,		def NAME : XX3Form_Rc<opcode, xo, OOL, IOL,
!strconcat(asmbase, !strconcat(" ", asmstr)), itin,		!strconcat(asmbase, !strconcat(" ", asmstr)), itin,
pattern>;		pattern>;
[889 lines not shown]
} // HasVSX		} // HasVSX

// The following VSX instructions were introduced in Power ISA 2.07		// The following VSX instructions were introduced in Power ISA 2.07
/* FIXME: if the operands are v2i64, these patterns will not match.		/* FIXME: if the operands are v2i64, these patterns will not match.
we should define new patterns or otherwise match the same patterns		we should define new patterns or otherwise match the same patterns
when the elements are larger than i32.		when the elements are larger than i32.
*/		*/
def HasP8Vector : Predicate<"PPCSubTarget->hasP8Vector()">;		def HasP8Vector : Predicate<"PPCSubTarget->hasP8Vector()">;
		def HasDirectMove : Predicate<"PPCSubTarget->hasDirectMove()">;
let Predicates = [HasP8Vector] in {		let Predicates = [HasP8Vector] in {
let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.		let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
let isCommutable = 1 in {		let isCommutable = 1 in {
def XXLEQV : XX3Form<60, 186,		def XXLEQV : XX3Form<60, 186,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxleqv $XT, $XA, $XB", IIC_VecGeneral,		"xxleqv $XT, $XA, $XB", IIC_VecGeneral,
[(set v4i32:$XT, (vnot_ppc (xor v4i32:$XA, v4i32:$XB)))]>;		[(set v4i32:$XT, (vnot_ppc (xor v4i32:$XA, v4i32:$XB)))]>;
def XXLNAND : XX3Form<60, 178,		def XXLNAND : XX3Form<60, 178,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxlnand $XT, $XA, $XB", IIC_VecGeneral,		"xxlnand $XT, $XA, $XB", IIC_VecGeneral,
[(set v4i32:$XT, (vnot_ppc (and v4i32:$XA,		[(set v4i32:$XT, (vnot_ppc (and v4i32:$XA,
v4i32:$XB)))]>;		v4i32:$XB)))]>;
} // isCommutable		} // isCommutable
def XXLORC : XX3Form<60, 170,		def XXLORC : XX3Form<60, 170,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxlorc $XT, $XA, $XB", IIC_VecGeneral,		"xxlorc $XT, $XA, $XB", IIC_VecGeneral,
[(set v4i32:$XT, (or v4i32:$XA, (vnot_ppc v4i32:$XB)))]>;		[(set v4i32:$XT, (or v4i32:$XA, (vnot_ppc v4i32:$XB)))]>;
} // AddedComplexity = 500		} // AddedComplexity = 500
} // HasP8Vector		} // HasP8Vector

		let Predicates = [HasDirectMove, HasVSX] in {
		// VSX direct move instructions
		def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),
		"mfvsrd $rA, $XT", IIC_VecGeneral,
		[(set i64:$rA, (PPCmfvsr f64:$XT))]>,
		Requires<[In64BitMode]>;
		def MFVSRWZ : XX1_RS6_RD5_XO<31, 115, (outs gprc:$rA), (ins vsfrc:$XT),
		"mfvsrwz $rA, $XT", IIC_VecGeneral,
		[(set i32:$rA, (PPCmfvsr f64:$XT))]>;
		def MTVSRD : XX1_RS6_RD5_XO<31, 179, (outs vsfrc:$XT), (ins g8rc:$rA),
		"mtvsrd $XT, $rA", IIC_VecGeneral,
		[(set f64:$XT, (PPCmtvsra i64:$rA))]>,
		Requires<[In64BitMode]>;
		def MTVSRWA : XX1_RS6_RD5_XO<31, 211, (outs vsfrc:$XT), (ins gprc:$rA),
		"mtvsrwa $XT, $rA", IIC_VecGeneral,
		[(set f64:$XT, (PPCmtvsra i32:$rA))]>;
		def MTVSRWZ : XX1_RS6_RD5_XO<31, 243, (outs vsfrc:$XT), (ins gprc:$rA),
		"mtvsrwz $XT, $rA", IIC_VecGeneral,
		[(set f64:$XT, (PPCmtvsrz i32:$rA))]>;
		} // HasDirectMove, HasVSX
		nemanjai: In the actual commit, this will have both: HasDirectMove, HasVSX to indicate what the brace terminates.
		wschmidt: In all of the above, please indent the overflow lines so that the first character is underneath the character following the "<" character (as you did for MFVSRWZ, but for none of the others).
		nemanjai: Ugh, I don't know how I miss these things - it is so obviously misaligned. Thanks for pointing it out. Fixed and will be part of the next revision.
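
For readers less familiar with these five instructions, the data movement they perform can be modelled in portable C++. This is only a sketch of the scalar semantics as described in the Power ISA (a bit-for-bit copy between a GPR and doubleword 0 of a VSR, with the word forms sign- or zero-extending). The function names are invented for illustration and are not LLVM APIs, and the model assumes IEEE 754 doubles.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers; the names are illustrative and do not exist in LLVM.

// mtvsrd: copy the 64-bit GPR contents, bit for bit, into the VSR doubleword.
inline double model_mtvsrd(uint64_t gpr) {
  double vsr;
  std::memcpy(&vsr, &gpr, sizeof vsr);
  return vsr;
}

// mfvsrd: copy the VSR doubleword, bit for bit, into a 64-bit GPR.
inline uint64_t model_mfvsrd(double vsr) {
  uint64_t gpr;
  std::memcpy(&gpr, &vsr, sizeof gpr);
  return gpr;
}

// mtvsrwa: the low word of the GPR is sign-extended to 64 bits before the move.
inline uint64_t model_mtvsrwa_bits(uint32_t word) {
  return (word & 0x80000000u) ? (0xFFFFFFFF00000000ull | word)
                              : static_cast<uint64_t>(word);
}

// mtvsrwz: the low word of the GPR is zero-extended to 64 bits before the move.
inline uint64_t model_mtvsrwz_bits(uint32_t word) {
  return static_cast<uint64_t>(word);
}

// mfvsrwz: the low word of the VSR doubleword lands in the GPR, zero-extended.
inline uint64_t model_mfvsrwz(uint64_t vsrBits) {
  return vsrBits & 0xFFFFFFFFull;
}
```

No value conversion happens in any of these; the conversion itself is done by a separate instruction (for example xscvdpsxws or xscvsxddp) operating on the VSR.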

lib/Target/PowerPC/PPCSubtarget.h

protected:
bool IsPPC6xx;		bool IsPPC6xx;
bool DeprecatedMFTB;		bool DeprecatedMFTB;
bool DeprecatedDST;		bool DeprecatedDST;
bool HasLazyResolverStubs;		bool HasLazyResolverStubs;
bool IsLittleEndian;		bool IsLittleEndian;
bool HasICBT;		bool HasICBT;
bool HasInvariantFunctionDescriptors;		bool HasInvariantFunctionDescriptors;
bool HasPartwordAtomics;		bool HasPartwordAtomics;
		bool HasDirectMove;
bool HasHTM;		bool HasHTM;

/// When targeting QPX running a stock PPC64 Linux kernel where the stack		/// When targeting QPX running a stock PPC64 Linux kernel where the stack
/// alignment has not been changed, we need to keep the 16-byte alignment		/// alignment has not been changed, we need to keep the 16-byte alignment
/// of the stack.		/// of the stack.
bool IsQPXStackUnaligned;		bool IsQPXStackUnaligned;

const PPCTargetMachine &TM;		const PPCTargetMachine &TM;
public:
bool isE500() const { return IsE500; }		bool isE500() const { return IsE500; }
bool isDeprecatedMFTB() const { return DeprecatedMFTB; }		bool isDeprecatedMFTB() const { return DeprecatedMFTB; }
bool isDeprecatedDST() const { return DeprecatedDST; }		bool isDeprecatedDST() const { return DeprecatedDST; }
bool hasICBT() const { return HasICBT; }		bool hasICBT() const { return HasICBT; }
bool hasInvariantFunctionDescriptors() const {		bool hasInvariantFunctionDescriptors() const {
return HasInvariantFunctionDescriptors;		return HasInvariantFunctionDescriptors;
}		}
bool hasPartwordAtomics() const { return HasPartwordAtomics; }		bool hasPartwordAtomics() const { return HasPartwordAtomics; }
		bool hasDirectMove() const { return HasDirectMove; }

bool isQPXStackUnaligned() const { return IsQPXStackUnaligned; }		bool isQPXStackUnaligned() const { return IsQPXStackUnaligned; }
unsigned getPlatformStackAlignment() const {		unsigned getPlatformStackAlignment() const {
if ((hasQPX() || isBGQ()) && !isQPXStackUnaligned())		if ((hasQPX() || isBGQ()) && !isQPXStackUnaligned())
return 32;		return 32;

return 16;		return 16;
}		}

lib/Target/PowerPC/PPCSubtarget.cpp

void PPCSubtarget::initializeEnvironment() {
IsPPC6xx = false;		IsPPC6xx = false;
IsE500 = false;		IsE500 = false;
DeprecatedMFTB = false;		DeprecatedMFTB = false;
DeprecatedDST = false;		DeprecatedDST = false;
HasLazyResolverStubs = false;		HasLazyResolverStubs = false;
HasICBT = false;		HasICBT = false;
HasInvariantFunctionDescriptors = false;		HasInvariantFunctionDescriptors = false;
HasPartwordAtomics = false;		HasPartwordAtomics = false;
		HasDirectMove = false;
IsQPXStackUnaligned = false;		IsQPXStackUnaligned = false;
HasHTM = false;		HasHTM = false;
}		}

void PPCSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {		void PPCSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
// Determine default and user specified characteristics		// Determine default and user specified characteristics
std::string CPUName = CPU;		std::string CPUName = CPU;
if (CPUName.empty()) {		if (CPUName.empty()) {
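
The subtarget changes follow the existing pattern for feature bits: a protected flag, a default of false in initializeEnvironment(), and a const accessor that lowering code can query. A heavily reduced sketch of that pattern is below. MiniSubtarget, enableDirectMove() and chooseFPToIntMove() are hypothetical stand-ins written for this illustration; they are not the real PPCSubtarget interface, and the real flag is set by feature-string parsing that is not shown in this diff.

```cpp
// Hypothetical, reduced stand-in for PPCSubtarget; not the real class.
class MiniSubtarget {
protected:
  bool HasPartwordAtomics;
  bool HasDirectMove;

public:
  MiniSubtarget() { initializeEnvironment(); }

  // Mirrors the patch: every feature starts out disabled.
  void initializeEnvironment() {
    HasPartwordAtomics = false;
    HasDirectMove = false;
  }

  // Stand-in for the feature-string parsing that would set the flag.
  void enableDirectMove() { HasDirectMove = true; }

  bool hasPartwordAtomics() const { return HasPartwordAtomics; }
  bool hasDirectMove() const { return HasDirectMove; }
};

// How a lowering decision might be gated on the accessor.
enum class MoveStrategy { ThroughMemory, DirectMove };

inline MoveStrategy chooseFPToIntMove(const MiniSubtarget &ST) {
  return ST.hasDirectMove() ? MoveStrategy::DirectMove
                            : MoveStrategy::ThroughMemory;
}
```

Defaulting the flag to false keeps the store/load sequence in place for every CPU that does not explicitly advertise the feature.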

test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll

				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unknown-unknown < %s | FileCheck %s
				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s

				echristo: Can probably just use -unknown-unknown here as the OS part of the triple?
				nemanjai: Yup, I don't see why not. Will do. Thanks for the comment.
				; Function Attrs: nounwind
				define zeroext i8 @_Z6testcff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptoui float %0 to i8
				ret i8 %conv
				; CHECK-LABEL: @_Z6testcff
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				nemanjai: The regular expression will be changed from {{[0-9]+}} to [[CONV-REG:[0-9]+]] for the first occurrence of the register and to [[CONV-REG]] for the second occurrence. This is to ensure that the destination register for what we convert is the same as the source register for the move instruction. This applies throughout. If this turns out to be the only comment, no further review will be posted but the change will be made in the committed version.
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				wschmidt: Specifying the intermediate register number (0) makes the test case more likely to break in the future. Please use {{[0-9]+}} so the test isn't reliant on specific register allocation. The 1 and 3 are ok because they are ABI registers we expect to use. This applies to all the tests here.
				nemanjai: I will turn these into regular expressions. Thanks for the tip.
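
The two comments above combine into one rule: pin only the ABI-mandated registers (1 and 3), accept any intermediate register, and require that the intermediate register written by the conversion is the one read by the move. FileCheck expresses this with the pattern-variable form nemanjai describes; the same idea can be sketched with a capture and a comparison in standard C++. sameIntermediateReg is a hypothetical helper, and the sample assembly strings in the usage note are invented for illustration rather than copied from llc output.

```cpp
#include <regex>
#include <string>

// Returns true when the register written by the conversion is the register
// read by the direct move. The register number itself may be anything.
inline bool sameIntermediateReg(const std::string &convLine,
                                const std::string &moveLine) {
  static const std::regex convRe(R"(xscvdpsxws ([0-9]+), 1)");
  static const std::regex moveRe(R"(mfvsrwz 3, ([0-9]+))");
  std::smatch conv, move;
  if (!std::regex_search(convLine, conv, convRe))
    return false;
  if (!std::regex_search(moveLine, move, moveRe))
    return false;
  // Compare the captured register numbers, not fixed values.
  return conv[1].str() == move[1].str();
}
```

With this check, "xscvdpsxws 0, 1" followed by "mfvsrwz 3, 0" passes, as does the same pair using register 12, while a pair whose register numbers differ is rejected. A bare {{[0-9]+}} on each line would accept all three.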
				; Function Attrs: nounwind
				define float @_Z6testfcc(i8 zeroext %arg) {
				entry:
				%arg.addr = alloca i8, align 1
				store i8 %arg, i8* %arg.addr, align 1
				%0 = load i8, i8* %arg.addr, align 1
				%conv = uitofp i8 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z6testfcc
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; FIXME: Once we have XSCVUXDSP implemented, this will change
				; CHECK: fcfidus 1, {{[0-9]+}}
				wschmidt: Here the result register of the xscvuxddp is not an ABI reg so should also use a regexp, not a specific reg. (Also, why didn't it end up in float reg 1?)
				nemanjai: I'll turn these into regular expressions if you would like. To answer the question in parentheses, it does not end up in float reg 1 because it is followed by an frsp since we need to round it to single precision. I will change the custom lowering to use PPCISD::FCFID[U]S when converting from any integral value to a single-precision float. I replicated the existing logic which in retrospect does not seem sound (actually I just realized that existing logic only rounds if there is no FPCVT). It uses the SDAG nodes for rounding directly to single-precision only when we are converting i64 to f32 but not for other integral types. Since we are now assuming hasFPCVT for the entire function, there is no harm in refactoring this to skip the need for the extra rounding instruction. Wow, I am really glad you pointed this out since I didn't really think about why FPR 1 was not used. Thanks.
				}
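
The extra frsp discussed in this thread costs an instruction but does not change the result for integer sources of 32 bits or fewer. A sketch in plain C++ of why the two sequences agree (no PowerPC intrinsics are used; viaDouble and direct are illustrative names): every 32-bit integer is exactly representable as a double, so "convert to double, then round to single" performs a single rounding, just like a direct integer-to-single conversion.

```cpp
#include <cstdint>

// Models "convert to double precision, then frsp".
inline float viaDouble(uint32_t x) {
  double d = static_cast<double>(x); // exact: a double holds any 32-bit integer
  return static_cast<float>(d);      // the only rounding step
}

// Models a direct integer to single-precision conversion (fcfidus-like).
inline float direct(uint32_t x) {
  return static_cast<float>(x);
}
```

The same argument does not carry over to i64 sources, where the intermediate double can itself be inexact and a second rounding may change the result.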

				; Function Attrs: nounwind
				define zeroext i8 @_Z6testcdd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptoui double %0 to i8
				ret i8 %conv
				; CHECK-LABEL: @_Z6testcdd
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z6testdcc(i8 zeroext %arg) {
				entry:
				%arg.addr = alloca i8, align 1
				store i8 %arg, i8* %arg.addr, align 1
				%0 = load i8, i8* %arg.addr, align 1
				%conv = uitofp i8 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z6testdcc
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; CHECK: xscvuxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i8 @_Z7testucff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptoui float %0 to i8
				ret i8 %conv
				; CHECK-LABEL: @_Z7testucff
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z7testfuch(i8 zeroext %arg) {
				entry:
				%arg.addr = alloca i8, align 1
				store i8 %arg, i8* %arg.addr, align 1
				%0 = load i8, i8* %arg.addr, align 1
				%conv = uitofp i8 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z7testfuch
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; FIXME: Once we have XSCVUXDSP implemented, this will change
				wschmidt: Same concerns with result reg.
				; CHECK: fcfidus 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i8 @_Z7testucdd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptoui double %0 to i8
				ret i8 %conv
				; CHECK-LABEL: @_Z7testucdd
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z7testduch(i8 zeroext %arg) {
				entry:
				%arg.addr = alloca i8, align 1
				store i8 %arg, i8* %arg.addr, align 1
				%0 = load i8, i8* %arg.addr, align 1
				%conv = uitofp i8 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z7testduch
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; CHECK: xscvuxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define signext i16 @_Z6testsff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptosi float %0 to i16
				ret i16 %conv
				; CHECK-LABEL: @_Z6testsff
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z6testfss(i16 signext %arg) {
				entry:
				%arg.addr = alloca i16, align 2
				store i16 %arg, i16* %arg.addr, align 2
				%0 = load i16, i16* %arg.addr, align 2
				%conv = sitofp i16 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z6testfss
				; CHECK: mtvsrwa {{[0-9]+}}, 3
				wschmidt: And again.
				; FIXME: Once we have XSCVSXDSP implemented, this will change
				; CHECK: fcfids 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define signext i16 @_Z6testsdd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptosi double %0 to i16
				ret i16 %conv
				; CHECK-LABEL: @_Z6testsdd
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z6testdss(i16 signext %arg) {
				entry:
				%arg.addr = alloca i16, align 2
				store i16 %arg, i16* %arg.addr, align 2
				%0 = load i16, i16* %arg.addr, align 2
				%conv = sitofp i16 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z6testdss
				; CHECK: mtvsrwa {{[0-9]+}}, 3
				; CHECK: xscvsxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i16 @_Z7testusff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptoui float %0 to i16
				ret i16 %conv
				; CHECK-LABEL: @_Z7testusff
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z7testfust(i16 zeroext %arg) {
				entry:
				%arg.addr = alloca i16, align 2
				store i16 %arg, i16* %arg.addr, align 2
				%0 = load i16, i16* %arg.addr, align 2
				%conv = uitofp i16 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z7testfust
				wschmidt: And again.
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; FIXME: Once we have XSCVUXDSP implemented, this will change
				; CHECK: fcfidus 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i16 @_Z7testusdd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptoui double %0 to i16
				ret i16 %conv
				; CHECK-LABEL: @_Z7testusdd
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z7testdust(i16 zeroext %arg) {
				entry:
				%arg.addr = alloca i16, align 2
				store i16 %arg, i16* %arg.addr, align 2
				%0 = load i16, i16* %arg.addr, align 2
				%conv = uitofp i16 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z7testdust
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; CHECK: xscvuxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define signext i32 @_Z6testiff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptosi float %0 to i32
				ret i32 %conv
				; CHECK-LABEL: @_Z6testiff
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z6testfii(i32 signext %arg) {
				entry:
				%arg.addr = alloca i32, align 4
				store i32 %arg, i32* %arg.addr, align 4
				%0 = load i32, i32* %arg.addr, align 4
				%conv = sitofp i32 %0 to float
				ret float %conv
				wschmidt: And again.
				; CHECK-LABEL: @_Z6testfii
				; CHECK: mtvsrwa {{[0-9]+}}, 3
				; FIXME: Once we have XSCVSXDSP implemented, this will change
				; CHECK: fcfids 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define signext i32 @_Z6testidd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptosi double %0 to i32
				ret i32 %conv
				; CHECK-LABEL: @_Z6testidd
				; CHECK: xscvdpsxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z6testdii(i32 signext %arg) {
				entry:
				%arg.addr = alloca i32, align 4
				store i32 %arg, i32* %arg.addr, align 4
				%0 = load i32, i32* %arg.addr, align 4
				%conv = sitofp i32 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z6testdii
				; CHECK: mtvsrwa {{[0-9]+}}, 3
				; CHECK: xscvsxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i32 @_Z7testuiff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptoui float %0 to i32
				ret i32 %conv
				; CHECK-LABEL: @_Z7testuiff
				; CHECK: xscvdpuxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z7testfuij(i32 zeroext %arg) {
				entry:
				%arg.addr = alloca i32, align 4
				store i32 %arg, i32* %arg.addr, align 4
				%0 = load i32, i32* %arg.addr, align 4
				%conv = uitofp i32 %0 to float
				wschmidt: And again.
				ret float %conv
				; CHECK-LABEL: @_Z7testfuij
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; FIXME: Once we have XSCVUXDSP implemented, this will change
				; CHECK: fcfidus 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define zeroext i32 @_Z7testuidd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptoui double %0 to i32
				ret i32 %conv
				; CHECK-LABEL: @_Z7testuidd
				; CHECK: xscvdpuxws {{[0-9]+}}, 1
				; CHECK: mfvsrwz 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z7testduij(i32 zeroext %arg) {
				entry:
				%arg.addr = alloca i32, align 4
				store i32 %arg, i32* %arg.addr, align 4
				%0 = load i32, i32* %arg.addr, align 4
				%conv = uitofp i32 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z7testduij
				; CHECK: mtvsrwz {{[0-9]+}}, 3
				; CHECK: xscvuxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define i64 @_Z7testllff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptosi float %0 to i64
				ret i64 %conv
				; CHECK-LABEL: @_Z7testllff
				; CHECK: xscvdpsxds {{[0-9]+}}, 1
				; CHECK: mfvsrd 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z7testfllx(i64 %arg) {
				entry:
				%arg.addr = alloca i64, align 8
				store i64 %arg, i64* %arg.addr, align 8
				%0 = load i64, i64* %arg.addr, align 8
				%conv = sitofp i64 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z7testfllx
				; CHECK: mtvsrd {{[0-9]+}}, 3
				; FIXME: Once we have XSCVSXDSP implemented, this will change
				; CHECK: fcfids 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define i64 @_Z7testlldd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptosi double %0 to i64
				ret i64 %conv
				; CHECK-LABEL: @_Z7testlldd
				; CHECK: xscvdpsxds {{[0-9]+}}, 1
				; CHECK: mfvsrd 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z7testdllx(i64 %arg) {
				entry:
				%arg.addr = alloca i64, align 8
				store i64 %arg, i64* %arg.addr, align 8
				%0 = load i64, i64* %arg.addr, align 8
				%conv = sitofp i64 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z7testdllx
				; CHECK: mtvsrd {{[0-9]+}}, 3
				; CHECK: xscvsxddp 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define i64 @_Z8testullff(float %arg) {
				entry:
				%arg.addr = alloca float, align 4
				store float %arg, float* %arg.addr, align 4
				%0 = load float, float* %arg.addr, align 4
				%conv = fptoui float %0 to i64
				ret i64 %conv
				; CHECK-LABEL: @_Z8testullff
				; CHECK: xscvdpuxds {{[0-9]+}}, 1
				; CHECK: mfvsrd 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define float @_Z8testfully(i64 %arg) {
				entry:
				%arg.addr = alloca i64, align 8
				store i64 %arg, i64* %arg.addr, align 8
				%0 = load i64, i64* %arg.addr, align 8
				%conv = uitofp i64 %0 to float
				ret float %conv
				; CHECK-LABEL: @_Z8testfully
				; CHECK: mtvsrd {{[0-9]+}}, 3
				; FIXME: Once we have XSCVUXDSP implemented, this will change
				; CHECK: fcfidus 1, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define i64 @_Z8testulldd(double %arg) {
				entry:
				%arg.addr = alloca double, align 8
				store double %arg, double* %arg.addr, align 8
				%0 = load double, double* %arg.addr, align 8
				%conv = fptoui double %0 to i64
				ret i64 %conv
				; CHECK-LABEL: @_Z8testulldd
				; CHECK: xscvdpuxds {{[0-9]+}}, 1
				; CHECK: mfvsrd 3, {{[0-9]+}}
				}

				; Function Attrs: nounwind
				define double @_Z8testdully(i64 %arg) {
				entry:
				%arg.addr = alloca i64, align 8
				store i64 %arg, i64* %arg.addr, align 8
				%0 = load i64, i64* %arg.addr, align 8
				%conv = uitofp i64 %0 to double
				ret double %conv
				; CHECK-LABEL: @_Z8testdully
				; CHECK: mtvsrd {{[0-9]+}}, 3
				; CHECK: xscvuxddp 1, {{[0-9]+}}
				}
				wschmidt: Please remove the attributes and metadata. Delete these trailing lines, and remove the #0 from the function definitions.

test/MC/Disassembler/PowerPC/vsx.txt


	# CHECK: xxspltw 7, 27, 3			# CHECK: xxspltw 7, 27, 3
	0xf0 0xe3 0xda 0x90			0xf0 0xe3 0xda 0x90

	# FIXME: decode as xxswapd 7, 63			# FIXME: decode as xxswapd 7, 63
	# CHECK: xxpermdi 7, 63, 63, 2			# CHECK: xxpermdi 7, 63, 63, 2
	0xf0 0xff 0xfa 0x56			0xf0 0xff 0xfa 0x56

				# CHECK: mfvsrd 3, 0
				0x7c 0x03 0x00 0x66

				# CHECK: mfvsrwz 5, 0
				0x7c 0x05 0x00 0xe6

				# CHECK: mtvsrd 0, 3
				0x7c 0x03 0x01 0x66

				# CHECK: mtvsrwa 0, 3
				0x7c 0x03 0x01 0xa6

				# CHECK: mtvsrwz 0, 3
				0x7c 0x03 0x01 0xe6
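
The byte sequences in these disassembler tests can be cross-checked against the opcodes in the PPCInstrVSX.td definitions (primary opcode 31; extended opcodes 51, 115, 179, 211 and 243). The sketch below pulls those fields out of a 32-bit instruction word, assuming the XX1-form layout as I understand it from the Power ISA: primary opcode in bits 0-5, low five bits of the VSR number in bits 6-10, the GPR in bits 11-15, the extended opcode in bits 21-30, and the high bit of the VSR number in bit 31 (IBM numbering, bit 0 most significant). XX1Fields and decodeXX1 are hypothetical names, not part of the LLVM disassembler.

```cpp
#include <cstdint>

// Fields of an XX1-form instruction word (layout assumed as described above).
struct XX1Fields {
  unsigned PrimaryOpcode; // bits 0-5
  unsigned VSRLow5;       // bits 6-10
  unsigned GPR;           // bits 11-15
  unsigned ExtOpcode;     // bits 21-30
  unsigned VSRHighBit;    // bit 31
};

// The test file lists each instruction's bytes in big-endian order.
inline XX1Fields decodeXX1(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
  uint32_t w = (uint32_t(b0) << 24) | (uint32_t(b1) << 16) |
               (uint32_t(b2) << 8) | uint32_t(b3);
  XX1Fields f;
  f.PrimaryOpcode = w >> 26;
  f.VSRLow5 = (w >> 21) & 0x1F;
  f.GPR = (w >> 16) & 0x1F;
  f.ExtOpcode = (w >> 1) & 0x3FF;
  f.VSRHighBit = w & 1;
  return f;
}
```

Decoding 0x7c 0x03 0x00 0x66 this way gives primary opcode 31, extended opcode 51, GPR 3 and VSR 0, which matches "mfvsrd 3, 0".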

test/MC/PowerPC/vsx.s

	# CHECK-LE: xxpermdi 7, 63, 63, 3 # encoding: [0x56,0xfb,0xff,0xf0]			# CHECK-LE: xxpermdi 7, 63, 63, 3 # encoding: [0x56,0xfb,0xff,0xf0]
	xxspltd 7, 63, 1			xxspltd 7, 63, 1
	# CHECK-BE: xxspltw 7, 27, 3 # encoding: [0xf0,0xe3,0xda,0x90]			# CHECK-BE: xxspltw 7, 27, 3 # encoding: [0xf0,0xe3,0xda,0x90]
	# CHECK-LE: xxspltw 7, 27, 3 # encoding: [0x90,0xda,0xe3,0xf0]			# CHECK-LE: xxspltw 7, 27, 3 # encoding: [0x90,0xda,0xe3,0xf0]
	xxspltw 7, 27, 3			xxspltw 7, 27, 3
	# CHECK-BE: xxpermdi 7, 63, 63, 2 # encoding: [0xf0,0xff,0xfa,0x56]			# CHECK-BE: xxpermdi 7, 63, 63, 2 # encoding: [0xf0,0xff,0xfa,0x56]
	# CHECK-LE: xxpermdi 7, 63, 63, 2 # encoding: [0x56,0xfa,0xff,0xf0]			# CHECK-LE: xxpermdi 7, 63, 63, 2 # encoding: [0x56,0xfa,0xff,0xf0]
	xxswapd 7, 63			xxswapd 7, 63

				# Move to/from VSR
				# CHECK-BE: mfvsrd 3, 0 # encoding: [0x7c,0x03,0x00,0x66]
				# CHECK-LE: mfvsrd 3, 0 # encoding: [0x66,0x00,0x03,0x7c]
				mfvsrd 3, 0
				# CHECK-BE: mfvsrwz 5, 0 # encoding: [0x7c,0x05,0x00,0xe6]
				# CHECK-LE: mfvsrwz 5, 0 # encoding: [0xe6,0x00,0x05,0x7c]
				mfvsrwz 5, 0
				# CHECK-BE: mtvsrd 0, 3 # encoding: [0x7c,0x03,0x01,0x66]
				# CHECK-LE: mtvsrd 0, 3 # encoding: [0x66,0x01,0x03,0x7c]
				mtvsrd 0, 3
				# CHECK-BE: mtvsrwa 0, 3 # encoding: [0x7c,0x03,0x01,0xa6]
				# CHECK-LE: mtvsrwa 0, 3 # encoding: [0xa6,0x01,0x03,0x7c]
				mtvsrwa 0, 3
				# CHECK-BE: mtvsrwz 0, 3 # encoding: [0x7c,0x03,0x01,0xe6]
				# CHECK-LE: mtvsrwz 0, 3 # encoding: [0xe6,0x01,0x03,0x7c]
				mtvsrwz 0, 3
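
In each CHECK-BE/CHECK-LE pair above, the little-endian encoding is the big-endian encoding with its four bytes reversed; the instruction word itself is identical. A small sketch of that relationship, with Encoding and toLittleEndian as illustrative names only:

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Encoding = std::array<uint8_t, 4>;

// Reverse the byte order of a 4-byte big-endian instruction encoding.
inline Encoding toLittleEndian(Encoding be) {
  std::reverse(be.begin(), be.end());
  return be;
}
```

For example, the mfvsrd bytes 0x7c,0x03,0x00,0x66 become 0x66,0x00,0x03,0x7c, as the CHECK-LE line expects.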

Revision Contents

Diff 23554

lib/Target/PowerPC/PPC.td

lib/Target/PowerPC/PPCFastISel.cpp

lib/Target/PowerPC/PPCISelLowering.h

lib/Target/PowerPC/PPCISelLowering.cpp

lib/Target/PowerPC/PPCInstrFormats.td

lib/Target/PowerPC/PPCInstrVSX.td

lib/Target/PowerPC/PPCSubtarget.h

lib/Target/PowerPC/PPCSubtarget.cpp

test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll

test/MC/Disassembler/PowerPC/vsx.txt

test/MC/PowerPC/vsx.s
