This is an archive of the discontinued LLVM Phabricator instance.

Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
ClosedPublic

Authored by spatel on Sep 24 2014, 12:47 PM.


Details

Reviewers

chandlerc
hfinkel

Commits

rGbdf1e38856a9: Refactor reciprocal and reciprocal square root estimate into target-independent…
rL218553: Refactor reciprocal and reciprocal square root estimate into target-independent…

Summary

This is purely refactoring. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
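As a rough numeric illustration of the `y / x` to `y * rcpe(x)` rewrite, the sketch below refines a deliberately crude reciprocal with the Newton-Raphson step quoted in this patch's comments (`Est = Est + Est * (1 - Arg * Est)`). The names `crudeRecip`, `refineRecip` and `fastDivide` are hypothetical, and `crudeRecip` only imitates a low-precision hardware estimate by rounding the significand to about 5 bits; it is not how any real instruction is specified.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal estimate: computes 1/a and
// keeps only about 5 bits of the significand (relative error up to ~2^-5).
inline float crudeRecip(float a) {
  int exp = 0;
  float m = std::frexp(1.0f / a, &exp); // significand in [0.5, 1)
  m = std::round(m * 32.0f) / 32.0f;    // round to a multiple of 1/32
  return std::ldexp(m, exp);
}

// Newton-Raphson refinement: Est = Est + Est * (1 - a * Est).
inline float refineRecip(float a, float est, unsigned steps) {
  for (unsigned i = 0; i < steps; ++i)
    est = est + est * (1.0f - a * est);
  return est;
}

// y / x rewritten as y * (refined estimate of 1/x).
inline float fastDivide(float y, float x, unsigned steps = 3) {
  return y * refineRecip(x, crudeRecip(x), steps);
}
```

Calling `fastDivide(10.0f, 3.0f)` should land within a few float ulps of `10.0f / 3.0f`, while the unrefined `crudeRecip(3.0f)` is off in the second decimal place. The result is an approximation, not a correctly rounded quotient.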

In part 1 ( http://reviews.llvm.org/D5425 ) of this refactoring, I moved just the wrapper portion of the square root estimate out of the PPC backend and into DAGCombiner. In this patch, I've moved everything that I can out of PPCISelLowering and into DAGCombiner.

It turns out that we might as well grab the reciprocal estimate code too because I think that any hardware that provides a rsqrt estimate is also going to provide a recip estimate. And PPC even uses rcpe to generate sqrt from rsqrte! I added a visitFSQRT() to DAGCombiner to keep that functionality.

There are small hooks in TargetLowering to get the target-specific opcode for each estimate instruction and a function to tell DAGCombiner how many times it needs to run the Newton-Raphson refinement loop.

This will allow any target to generate the estimate code by implementing these methods:

virtual SDValue getRecipEst(SDValue Op, DAGCombinerInfo &DCI) const;
virtual SDValue getRSqrtEst(SDValue Op, DAGCombinerInfo &DCI) const;
virtual unsigned getNRSteps(EVT VT) const;
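To show the division of labor these three hooks imply, here is a simplified standalone analogue that works on plain `double` values instead of `SDValue`/`EVT`. Everything in it (`EstimateHooks`, `ToyTarget`, `roundToBits`, `divideBySqrt`, `fastDivide`) is hypothetical and is not LLVM code: the "target" supplies only crude estimates and a step count, and the target-independent driver runs the refinement loop.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical analogue of the three hooks, on doubles rather than DAG nodes.
struct EstimateHooks {
  virtual ~EstimateHooks() = default;
  virtual double getRecipEst(double x) const = 0;
  virtual double getRSqrtEst(double x) const = 0;
  virtual unsigned getNRSteps() const = 0;
};

// Keep only `bits` bits of the significand to mimic a low-precision estimate.
inline double roundToBits(double v, int bits) {
  int exp = 0;
  double m = std::frexp(v, &exp);
  double scale = std::ldexp(1.0, bits);
  return std::ldexp(std::round(m * scale) / scale, exp);
}

// A made-up target whose estimates are good to roughly 5 bits.
struct ToyTarget final : EstimateHooks {
  double getRecipEst(double x) const override { return roundToBits(1.0 / x, 5); }
  double getRSqrtEst(double x) const override {
    return roundToBits(1.0 / std::sqrt(x), 5);
  }
  unsigned getNRSteps() const override { return 4; }
};

// Target-independent driver: y / sqrt(x) becomes y * refined rsqrt estimate,
// using Est = Est * (1.5 - (x/2) * Est * Est).
inline double divideBySqrt(const EstimateHooks &T, double y, double x) {
  double est = T.getRSqrtEst(x);
  const double halfX = 0.5 * x;
  for (unsigned i = 0, e = T.getNRSteps(); i < e; ++i)
    est = est * (1.5 - halfX * est * est);
  return y * est;
}

// Target-independent driver: y / x becomes y * refined reciprocal estimate.
inline double fastDivide(const EstimateHooks &T, double y, double x) {
  double est = T.getRecipEst(x);
  for (unsigned i = 0, e = T.getNRSteps(); i < e; ++i)
    est = est + est * (1.0 - x * est);
  return y * est;
}
```

With this split, supporting another target would only mean writing another small subclass; the refinement loops stay shared.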

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 14047. Sep 24 2014, 12:47 PM

spatel retitled this revision to "Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2)."

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: chandlerc, hfinkel.

spatel added subscribers: Unknown Object (MLST), tycho.

Herald added a subscriber: aemerson. Sep 24 2014, 12:47 PM

LGTM, thanks!

include/llvm/Target/TargetLowering.h
2632 ↗	(On Diff #14047)	The number of iterations necessary for the reciprocal estimate and for the reciprocal sqrt estimate might be different. Please provide a way to differentiate (and I'd want to make really sure the target actually overrides this). Maybe: virtual unsigned getNRSteps(EVT VT, bool SqrtEst) const { llvm_unreachable("Target must provide the number of iterations"); }
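For context on where iteration counts come from: the PPC code in this revision uses 1 or 3 steps (plus one more for f64), and its comment justifies that by quadratic convergence from a 2^-14 or 2^-5 estimate toward 23 or 52 significand bits. The helper below is a hypothetical back-of-the-envelope model of that reasoning, assuming each step exactly doubles the number of correct bits. Real sequences can converge slightly more slowly, which is one reason the reciprocal and reciprocal sqrt counts might differ.

```cpp
#include <cassert>

// Steps needed if every Newton-Raphson step doubles the number of correct
// bits. This is an idealized model, not a measured property of any target.
inline unsigned stepsNeeded(unsigned estimateBits, unsigned targetBits) {
  assert(estimateBits > 0 && "estimate must carry at least one correct bit");
  unsigned steps = 0;
  for (unsigned bits = estimateBits; bits < targetBits; bits *= 2)
    ++steps;
  return steps;
}
```

Under this model, `stepsNeeded(5, 23)` and `stepsNeeded(14, 23)` reproduce the 3 and 1 used for float, and raising the target to 52 bits adds one step in both cases, matching the `++Iterations` for f64.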

This revision is now accepted and ready to land. Sep 25 2014, 1:21 PM

spatel added inline comments. Sep 26 2014, 9:02 AM

include/llvm/Target/TargetLowering.h
2632 ↗	(On Diff #14047)	Sure - I'll make unique functions to return iteration counts for sqrte and rcpe. We may need one more refinement here regarding the rcpe(rsqrt(x)) transformation of a regular sqrt(x)...my guess is that's not a win on any recent X86 (and probably not PPC either?). But that change can come later if needed.
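The rcpe(rsqrte(x)) form of sqrt(x) mentioned here has a corner case that the patch's `visitFSQRT` handles with a select: for an input of exactly 0 the refined sequence yields NaN. The standalone sketch below reproduces that behavior on doubles. All function names are hypothetical, and the "estimates" are simply exact IEEE values (an assumption made to keep the example short), so only the zero handling is being illustrated, not estimate accuracy.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-ins for the estimate instructions; exact values are used here.
inline double rsqrtEstimate(double x) {
  return x == 0.0 ? std::numeric_limits<double>::infinity()
                  : 1.0 / std::sqrt(x);
}
inline double recipEstimate(double x) {
  return std::isinf(x) ? 0.0 : 1.0 / x;
}

// Est = Est * (1.5 - (a/2) * Est * Est)
inline double refineRsqrt(double a, double est, unsigned steps) {
  const double halfA = 0.5 * a;
  for (unsigned i = 0; i < steps; ++i)
    est = est * (1.5 - halfA * est * est);
  return est;
}

// Est = Est + Est * (1 - a * Est)
inline double refineRecip(double a, double est, unsigned steps) {
  for (unsigned i = 0; i < steps; ++i)
    est = est + est * (1.0 - a * est);
  return est;
}

// sqrt(x) computed as 1 / (1 / sqrt(x)), with no special-casing.
inline double sqrtViaEstimatesUnguarded(double x, unsigned steps) {
  double r = refineRsqrt(x, rsqrtEstimate(x), steps);
  return refineRecip(r, recipEstimate(r), steps);
}

// Same sequence, followed by the equivalent of select(x == 0, 0, result).
inline double sqrtViaEstimates(double x, unsigned steps) {
  double v = sqrtViaEstimatesUnguarded(x, steps);
  return x == 0.0 ? 0.0 : v;
}
```

With at least one refinement step, the unguarded version turns an input of 0 into `0 * inf` inside the rsqrt refinement, so the NaN propagates to the result; the guarded version returns 0 instead.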

hfinkel added inline comments. Sep 26 2014, 9:39 AM

include/llvm/Target/TargetLowering.h
2632 ↗	(On Diff #14047)	Regarding PPC, you might be right about some of them -- it is certainly a win on the embedded cores where the sqrt instruction is not fully pipelined. We'll need to do some measurements.

spatel added inline comments. Sep 26 2014, 1:37 PM

include/llvm/Target/TargetLowering.h
2632 ↗	(On Diff #14047)	It's coming back to me now (used to be at IBM and Apple)... I think the deciding factor is not whether the sqrt instruction is pipelined, but whether it exists at all. E.g., 7400/7450 had fre/frsqrte, but lacked fsqrt. In that case, the decision is between doing a long sequence of dependent ops using the estimates vs. making a call to libm sqrt(). If fsqrt exists, it should probably be used unless there's some truly horrible HW implementation out there. Certainly, this should be measured on as many targets as possible to see if it's true.

Instead of making different functions for each estimate possibility, I think it's better to make the getEstimate() interface in TargetLowering as generic as possible and let the targets do whatever they want under that. E.g., Altivec provides estimates for log() and exp(). These could conceivably be used to replace libm calls. GPUs might have instructions for those too.

As part of the minimization, I rolled the refinement steps parameter into the single API. Please let me know if you see a better or more elegant way.
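The shape of the single-hook design can be sketched outside LLVM as follows. This is a hypothetical toy on doubles: `Op`, `Estimate`, `ToyLowering`, `ToyPPC`, `roundToBits` and `buildReciprocalEstimate` are illustration-only names, and `Estimate` merely imitates the "empty SDValue means no estimate" convention. The point is that one virtual function takes an opcode, returns either an estimate or nothing, and reports the refinement step count through an output parameter.

```cpp
#include <cassert>
#include <cmath>

enum class Op { FDiv, FSqrt, FLog }; // hypothetical opcode tags

// Imitates an SDValue that may be empty.
struct Estimate {
  bool Valid = false;
  double Value = 0.0;
  Estimate() = default;
  explicit Estimate(double V) : Valid(true), Value(V) {}
  explicit operator bool() const { return Valid; }
};

// Keep only `bits` bits of the significand to mimic a low-precision estimate.
inline double roundToBits(double v, int bits) {
  int exp = 0;
  double m = std::frexp(v, &exp);
  double scale = std::ldexp(1.0, bits);
  return std::ldexp(std::round(m * scale) / scale, exp);
}

// Default behavior: no estimate available for anything.
struct ToyLowering {
  virtual ~ToyLowering() = default;
  virtual Estimate getEstimate(Op, double, unsigned &) const {
    return Estimate();
  }
};

// A made-up target that supports reciprocal and reciprocal sqrt estimates.
struct ToyPPC final : ToyLowering {
  Estimate getEstimate(Op Opcode, double Operand,
                       unsigned &RefinementSteps) const override {
    RefinementSteps = 4;
    switch (Opcode) {
    case Op::FDiv:
      return Estimate(roundToBits(1.0 / Operand, 5));
    case Op::FSqrt:
      return Estimate(roundToBits(1.0 / std::sqrt(Operand), 5));
    default:
      return Estimate(); // e.g. FLog: not provided by this toy target
    }
  }
};

// Target-independent refinement of 1/A; empty if the target has no estimate.
inline Estimate buildReciprocalEstimate(const ToyLowering &TLI, double A) {
  unsigned Steps = 0;
  Estimate Est = TLI.getEstimate(Op::FDiv, A, Steps);
  if (!Est)
    return Est;
  for (unsigned i = 0; i < Steps; ++i)
    Est.Value = Est.Value + Est.Value * (1.0 - A * Est.Value);
  return Est;
}
```

A target that later gained, say, a log estimate would only need another `case`; callers that ask for an unsupported opcode just get an empty result and fall back to the regular operation.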

Closed by commit rL218553 (authored by @spatel).

hfinkel added inline comments. Sep 27 2014, 6:22 PM

include/llvm/Target/TargetLowering.h
2632 ↗	(On Diff #14047)	Obviously whether it exists at all matters, but the pipelining definitely also matters -- at least on some cores. On the A2 (an embedded core), for example, a full sqrt blocks the issuing thread from issuing any additional floating-point instructions for 69 cycles. There the pipelining definitely matters, but on other cores I'm less certain (which is why I said that we'd need to measure it).

Revision Contents

Path                                                   Size
llvm/trunk/include/llvm/Target/TargetLowering.h        15 lines
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    170 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h        6 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp      201 lines
llvm/trunk/test/CodeGen/PowerPC/recipest.ll            62 lines

Diff 14140

llvm/trunk/include/llvm/Target/TargetLowering.h

Show First 20 Lines • Show All 2,618 Lines • ▼ Show 20 Lines	SDValue BuildUDIV(SDNode *N, const APInt &Divisor, SelectionDAG &DAG,
bool IsAfterLegalization,		bool IsAfterLegalization,
std::vector<SDNode *> *Created) const;		std::vector<SDNode *> *Created) const;
virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,		virtual SDValue BuildSDIVPow2(SDNode *N, const APInt &Divisor,
SelectionDAG &DAG,		SelectionDAG &DAG,
std::vector<SDNode *> *Created) const {		std::vector<SDNode *> *Created) const {
return SDValue();		return SDValue();
}		}

virtual SDValue BuildRSQRTE(SDValue Op, DAGCombinerInfo &DCI) const {		/// Hooks for building estimates in place of, for example, slower divisions
		/// and square roots. These are not builder functions themselves, just the
		/// target-specific variables needed for building the estimate algorithm.

		/// Return an estimate value for the input opcode and input operand.
		/// The RefinementSteps output is the number of refinement iterations
		/// required to generate a sufficient (though not necessarily IEEE-754
		/// compliant) estimate for the value type.
		/// An empty SDValue return means no estimate sequence can be created.
		virtual SDValue getEstimate(unsigned Opcode, SDValue Operand,
		DAGCombinerInfo &DCI,
		unsigned &RefinementSteps) const {
return SDValue();		return SDValue();
}		}

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Legalization utility functions		// Legalization utility functions
//		//

/// Expand a MUL into two nodes. One that computes the high bits of		/// Expand a MUL into two nodes. One that computes the high bits of
/// the result and one that computes the low bits.		/// the result and one that computes the low bits.
/// \param HiLoVT The value type to use for the Lo and Hi nodes.		/// \param HiLoVT The value type to use for the Lo and Hi nodes.
/// \param LL Low bits of the LHS of the MUL. You can use this parameter		/// \param LL Low bits of the LHS of the MUL. You can use this parameter

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


Show First 20 Lines • Show All 270 Lines • ▼ Show 20 Lines	private:
SDValue visitBITCAST(SDNode *N);		SDValue visitBITCAST(SDNode *N);
SDValue visitBUILD_PAIR(SDNode *N);		SDValue visitBUILD_PAIR(SDNode *N);
SDValue visitFADD(SDNode *N);		SDValue visitFADD(SDNode *N);
SDValue visitFSUB(SDNode *N);		SDValue visitFSUB(SDNode *N);
SDValue visitFMUL(SDNode *N);		SDValue visitFMUL(SDNode *N);
SDValue visitFMA(SDNode *N);		SDValue visitFMA(SDNode *N);
SDValue visitFDIV(SDNode *N);		SDValue visitFDIV(SDNode *N);
SDValue visitFREM(SDNode *N);		SDValue visitFREM(SDNode *N);
		SDValue visitFSQRT(SDNode *N);
SDValue visitFCOPYSIGN(SDNode *N);		SDValue visitFCOPYSIGN(SDNode *N);
SDValue visitSINT_TO_FP(SDNode *N);		SDValue visitSINT_TO_FP(SDNode *N);
SDValue visitUINT_TO_FP(SDNode *N);		SDValue visitUINT_TO_FP(SDNode *N);
SDValue visitFP_TO_SINT(SDNode *N);		SDValue visitFP_TO_SINT(SDNode *N);
SDValue visitFP_TO_UINT(SDNode *N);		SDValue visitFP_TO_UINT(SDNode *N);
SDValue visitFP_ROUND(SDNode *N);		SDValue visitFP_ROUND(SDNode *N);
SDValue visitFP_ROUND_INREG(SDNode *N);		SDValue visitFP_ROUND_INREG(SDNode *N);
SDValue visitFP_EXTEND(SDNode *N);		SDValue visitFP_EXTEND(SDNode *N);
Show All 34 Lines	private:

SDValue SimplifyNodeWithTwoResults(SDNode *N, unsigned LoOp,		SDValue SimplifyNodeWithTwoResults(SDNode *N, unsigned LoOp,
unsigned HiOp);		unsigned HiOp);
SDValue CombineConsecutiveLoads(SDNode *N, EVT VT);		SDValue CombineConsecutiveLoads(SDNode *N, EVT VT);
SDValue ConstantFoldBITCASTofBUILD_VECTOR(SDNode *, EVT);		SDValue ConstantFoldBITCASTofBUILD_VECTOR(SDNode *, EVT);
SDValue BuildSDIV(SDNode *N);		SDValue BuildSDIV(SDNode *N);
SDValue BuildSDIVPow2(SDNode *N);		SDValue BuildSDIVPow2(SDNode *N);
SDValue BuildUDIV(SDNode *N);		SDValue BuildUDIV(SDNode *N);
SDValue BuildRSQRTE(SDNode *N);		SDValue BuildReciprocalEstimate(SDValue Op);
		SDValue BuildRsqrtEstimate(SDValue Op);
SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,		SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,
bool DemandHighBits = true);		bool DemandHighBits = true);
SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);		SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);
SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,		SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
SDValue InnerPos, SDValue InnerNeg,		SDValue InnerPos, SDValue InnerNeg,
unsigned PosOpcode, unsigned NegOpcode,		unsigned PosOpcode, unsigned NegOpcode,
SDLoc DL);		SDLoc DL);
SDNode *MatchRotate(SDValue LHS, SDValue RHS, SDLoc DL);		SDNode *MatchRotate(SDValue LHS, SDValue RHS, SDLoc DL);
▲ Show 20 Lines • Show All 964 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visit(SDNode *N) {
case ISD::BITCAST: return visitBITCAST(N);		case ISD::BITCAST: return visitBITCAST(N);
case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);		case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
case ISD::FADD: return visitFADD(N);		case ISD::FADD: return visitFADD(N);
case ISD::FSUB: return visitFSUB(N);		case ISD::FSUB: return visitFSUB(N);
case ISD::FMUL: return visitFMUL(N);		case ISD::FMUL: return visitFMUL(N);
case ISD::FMA: return visitFMA(N);		case ISD::FMA: return visitFMA(N);
case ISD::FDIV: return visitFDIV(N);		case ISD::FDIV: return visitFDIV(N);
case ISD::FREM: return visitFREM(N);		case ISD::FREM: return visitFREM(N);
		case ISD::FSQRT: return visitFSQRT(N);
case ISD::FCOPYSIGN: return visitFCOPYSIGN(N);		case ISD::FCOPYSIGN: return visitFCOPYSIGN(N);
case ISD::SINT_TO_FP: return visitSINT_TO_FP(N);		case ISD::SINT_TO_FP: return visitSINT_TO_FP(N);
case ISD::UINT_TO_FP: return visitUINT_TO_FP(N);		case ISD::UINT_TO_FP: return visitUINT_TO_FP(N);
case ISD::FP_TO_SINT: return visitFP_TO_SINT(N);		case ISD::FP_TO_SINT: return visitFP_TO_SINT(N);
case ISD::FP_TO_UINT: return visitFP_TO_UINT(N);		case ISD::FP_TO_UINT: return visitFP_TO_UINT(N);
case ISD::FP_ROUND: return visitFP_ROUND(N);		case ISD::FP_ROUND: return visitFP_ROUND(N);
case ISD::FP_ROUND_INREG: return visitFP_ROUND_INREG(N);		case ISD::FP_ROUND_INREG: return visitFP_ROUND_INREG(N);
case ISD::FP_EXTEND: return visitFP_EXTEND(N);		case ISD::FP_EXTEND: return visitFP_EXTEND(N);
▲ Show 20 Lines • Show All 5,653 Lines • ▼ Show 20 Lines
}		}

SDValue DAGCombiner::visitFDIV(SDNode *N) {		SDValue DAGCombiner::visitFDIV(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);		ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);		ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
		SDLoc DL(N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;

// fold vector ops		// fold vector ops
if (VT.isVector()) {		if (VT.isVector()) {
SDValue FoldedVOp = SimplifyVBinOp(N);		SDValue FoldedVOp = SimplifyVBinOp(N);
if (FoldedVOp.getNode()) return FoldedVOp;		if (FoldedVOp.getNode()) return FoldedVOp;
}		}

Show All 15 Lines	if (N1CFP) {
// FIXME: custom lowering of ConstantFP might fail (see e.g. ARM		// FIXME: custom lowering of ConstantFP might fail (see e.g. ARM
// backend)... we should handle this gracefully after Legalize.		// backend)... we should handle this gracefully after Legalize.
// TLI.isOperationLegalOrCustom(llvm::ISD::ConstantFP, VT) ||		// TLI.isOperationLegalOrCustom(llvm::ISD::ConstantFP, VT) ||
TLI.isOperationLegal(llvm::ISD::ConstantFP, VT) ||		TLI.isOperationLegal(llvm::ISD::ConstantFP, VT) ||
TLI.isFPImmLegal(Recip, VT)))		TLI.isFPImmLegal(Recip, VT)))
return DAG.getNode(ISD::FMUL, SDLoc(N), VT, N0,		return DAG.getNode(ISD::FMUL, SDLoc(N), VT, N0,
DAG.getConstantFP(Recip, VT));		DAG.getConstantFP(Recip, VT));
}		}

// If this FDIV is part of a reciprocal square root, it may be folded		// If this FDIV is part of a reciprocal square root, it may be folded
// into a target-specific square root estimate instruction.		// into a target-specific square root estimate instruction.
if (SDValue SqrtOp = BuildRSQRTE(N))		if (N1.getOpcode() == ISD::FSQRT) {
return SqrtOp;		if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0))) {
		AddToWorklist(RV.getNode());
		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
		}
		} else if (N1.getOpcode() == ISD::FP_EXTEND &&
		N1.getOperand(0).getOpcode() == ISD::FSQRT) {
		if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0).getOperand(0))) {
		AddToWorklist(RV.getNode());
		RV = DAG.getNode(ISD::FP_EXTEND, SDLoc(N1), VT, RV);
		AddToWorklist(RV.getNode());
		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
		}
		} else if (N1.getOpcode() == ISD::FP_ROUND &&
		N1.getOperand(0).getOpcode() == ISD::FSQRT) {
		if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0).getOperand(0))) {
		AddToWorklist(RV.getNode());
		RV = DAG.getNode(ISD::FP_ROUND, SDLoc(N1), VT, RV, N1.getOperand(1));
		AddToWorklist(RV.getNode());
		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
		}
		}

		// Fold into a reciprocal estimate and multiply instead of a real divide.
		if (SDValue RV = BuildReciprocalEstimate(N1)) {
		AddToWorklist(RV.getNode());
		return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
		}
}		}

// (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y)		// (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y)
if (char LHSNeg = isNegatibleForFree(N0, LegalOperations, TLI, &Options)) {		if (char LHSNeg = isNegatibleForFree(N0, LegalOperations, TLI, &Options)) {
if (char RHSNeg = isNegatibleForFree(N1, LegalOperations, TLI, &Options)) {		if (char RHSNeg = isNegatibleForFree(N1, LegalOperations, TLI, &Options)) {
// Both can be negated for free, check to see if at least one is cheaper		// Both can be negated for free, check to see if at least one is cheaper
// negated.		// negated.
if (LHSNeg == 2 || RHSNeg == 2)		if (LHSNeg == 2 || RHSNeg == 2)
Show All 15 Lines	SDValue DAGCombiner::visitFREM(SDNode *N) {

// fold (frem c1, c2) -> fmod(c1,c2)		// fold (frem c1, c2) -> fmod(c1,c2)
if (N0CFP && N1CFP)		if (N0CFP && N1CFP)
return DAG.getNode(ISD::FREM, SDLoc(N), VT, N0, N1);		return DAG.getNode(ISD::FREM, SDLoc(N), VT, N0, N1);

return SDValue();		return SDValue();
}		}

		SDValue DAGCombiner::visitFSQRT(SDNode *N) {
		if (DAG.getTarget().Options.UnsafeFPMath) {
		// Compute this as 1/(1/sqrt(X)): the reciprocal of the reciprocal sqrt.
		if (SDValue RV = BuildRsqrtEstimate(N->getOperand(0))) {
		AddToWorklist(RV.getNode());
		RV = BuildReciprocalEstimate(RV);
		if (RV.getNode()) {
		// Unfortunately, RV is now NaN if the input was exactly 0.
		// Select out this case and force the answer to 0.
		EVT VT = RV.getValueType();

		SDValue Zero = DAG.getConstantFP(0.0, VT);
		SDValue ZeroCmp =
		DAG.getSetCC(SDLoc(N), TLI.getSetCCResultType(*DAG.getContext(), VT),
		N->getOperand(0), Zero, ISD::SETEQ);
		AddToWorklist(ZeroCmp.getNode());
		AddToWorklist(RV.getNode());

		RV = DAG.getNode(VT.isVector() ? ISD::VSELECT : ISD::SELECT,
		SDLoc(N), VT, ZeroCmp, Zero, RV);
		return RV;
		}
		}
		}
		return SDValue();
		}

SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {		SDValue DAGCombiner::visitFCOPYSIGN(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);		ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);		ConstantFPSDNode *N1CFP = dyn_cast<ConstantFPSDNode>(N1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (N0CFP && N1CFP) // Constant fold		if (N0CFP && N1CFP) // Constant fold
▲ Show 20 Lines • Show All 4,644 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::BuildUDIV(SDNode *N) {
SDValue S =		SDValue S =
TLI.BuildUDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);		TLI.BuildUDIV(N, C->getAPIntValue(), DAG, LegalOperations, &Built);

for (SDNode *N : Built)		for (SDNode *N : Built)
AddToWorklist(N);		AddToWorklist(N);
return S;		return S;
}		}

/// Given an ISD::FDIV node with either a direct or indirect ISD::FSQRT operand,		SDValue DAGCombiner::BuildReciprocalEstimate(SDValue Op) {
/// generate a DAG expression using a reciprocal square root estimate op.		if (Level >= AfterLegalizeDAG)
SDValue DAGCombiner::BuildRSQRTE(SDNode *N) {		return SDValue();

// Expose the DAG combiner to the target combiner implementations.		// Expose the DAG combiner to the target combiner implementations.
TargetLowering::DAGCombinerInfo DCI(DAG, Level, false, this);		TargetLowering::DAGCombinerInfo DCI(DAG, Level, false, this);
SDLoc DL(N);
EVT VT = N->getValueType(0);
SDValue N1 = N->getOperand(1);

if (N1.getOpcode() == ISD::FSQRT) {		unsigned Iterations;
if (SDValue RV = TLI.BuildRSQRTE(N1.getOperand(0), DCI)) {		if (SDValue Est = TLI.getEstimate(ISD::FDIV, Op, DCI, Iterations)) {
AddToWorklist(RV.getNode());		// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)
return DAG.getNode(ISD::FMUL, DL, VT, N->getOperand(0), RV);		// For the reciprocal, we need to find the zero of the function:
		// F(X) = A X - 1 [which has a zero at X = 1/A]
		// =>
		// X_{i+1} = X_i (2 - A X_i) = X_i + X_i (1 - A X_i) [this second form
		// does not require additional intermediate precision]
		EVT VT = Op.getValueType();
		SDLoc DL(Op);
		SDValue FPOne = DAG.getConstantFP(1.0, VT);

		AddToWorklist(Est.getNode());

		// Newton iterations: Est = Est + Est (1 - Arg * Est)
		for (unsigned i = 0; i < Iterations; ++i) {
		SDValue NewEst = DAG.getNode(ISD::FMUL, DL, VT, Op, Est);
		AddToWorklist(NewEst.getNode());

		NewEst = DAG.getNode(ISD::FSUB, DL, VT, FPOne, NewEst);
		AddToWorklist(NewEst.getNode());

		NewEst = DAG.getNode(ISD::FMUL, DL, VT, Est, NewEst);
		AddToWorklist(NewEst.getNode());

		Est = DAG.getNode(ISD::FADD, DL, VT, Est, NewEst);
		AddToWorklist(Est.getNode());
}		}
} else if (N1.getOpcode() == ISD::FP_EXTEND &&
N1.getOperand(0).getOpcode() == ISD::FSQRT) {		return Est;
if (SDValue RV = TLI.BuildRSQRTE(N1.getOperand(0).getOperand(0), DCI)) {
DCI.AddToWorklist(RV.getNode());
RV = DAG.getNode(ISD::FP_EXTEND, SDLoc(N1), VT, RV);
AddToWorklist(RV.getNode());
return DAG.getNode(ISD::FMUL, DL, VT, N->getOperand(0), RV);
}		}
} else if (N1.getOpcode() == ISD::FP_ROUND &&
N1.getOperand(0).getOpcode() == ISD::FSQRT) {		return SDValue();
if (SDValue RV = TLI.BuildRSQRTE(N1.getOperand(0).getOperand(0), DCI)) {		}
DCI.AddToWorklist(RV.getNode());
RV = DAG.getNode(ISD::FP_ROUND, SDLoc(N1), VT, RV, N1.getOperand(1));		SDValue DAGCombiner::BuildRsqrtEstimate(SDValue Op) {
AddToWorklist(RV.getNode());		if (Level >= AfterLegalizeDAG)
return DAG.getNode(ISD::FMUL, DL, VT, N->getOperand(0), RV);		return SDValue();

		// Expose the DAG combiner to the target combiner implementations.
		TargetLowering::DAGCombinerInfo DCI(DAG, Level, false, this);
		unsigned Iterations;
		if (SDValue Est = TLI.getEstimate(ISD::FSQRT, Op, DCI, Iterations)) {
		// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)
		// For the reciprocal sqrt, we need to find the zero of the function:
		// F(X) = 1/X^2 - A [which has a zero at X = 1/sqrt(A)]
		// =>
		// X_{i+1} = X_i (1.5 - A X_i^2 / 2)
		// As a result, we precompute A/2 prior to the iteration loop.
		EVT VT = Op.getValueType();
		SDLoc DL(Op);
		SDValue FPThreeHalves = DAG.getConstantFP(1.5, VT);

		AddToWorklist(Est.getNode());

		// We now need 0.5 * Arg which we can write as (1.5 * Arg - Arg) so that
		// this entire sequence requires only one FP constant.
		SDValue HalfArg = DAG.getNode(ISD::FMUL, DL, VT, FPThreeHalves, Op);
		AddToWorklist(HalfArg.getNode());

		HalfArg = DAG.getNode(ISD::FSUB, DL, VT, HalfArg, Op);
		AddToWorklist(HalfArg.getNode());

		// Newton iterations: Est = Est * (1.5 - HalfArg * Est * Est)
		for (unsigned i = 0; i < Iterations; ++i) {
		SDValue NewEst = DAG.getNode(ISD::FMUL, DL, VT, Est, Est);
		AddToWorklist(NewEst.getNode());

		NewEst = DAG.getNode(ISD::FMUL, DL, VT, HalfArg, NewEst);
		AddToWorklist(NewEst.getNode());

		NewEst = DAG.getNode(ISD::FSUB, DL, VT, FPThreeHalves, NewEst);
		AddToWorklist(NewEst.getNode());

		Est = DAG.getNode(ISD::FMUL, DL, VT, Est, NewEst);
		AddToWorklist(Est.getNode());
}		}

		return Est;
}		}

return SDValue();		return SDValue();
}		}

/// Return true if base is a frame index, which is known not to alias with		/// Return true if base is a frame index, which is known not to alias with
/// anything but itself. Provides base object and offset as results.		/// anything but itself. Provides base object and offset as results.
static bool FindBaseOffset(SDValue Ptr, SDValue &Base, int64_t &Offset,		static bool FindBaseOffset(SDValue Ptr, SDValue &Base, int64_t &Offset,

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

Show First 20 Lines • Show All 694 Lines • ▼ Show 20 Lines	LowerCall_32SVR4(SDValue Chain, SDValue Callee, CallingConv::ID CallConv,
SDLoc dl, SelectionDAG &DAG,		SDLoc dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;		SmallVectorImpl<SDValue> &InVals) const;

SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;

SDValue DAGCombineExtBoolTrunc(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue DAGCombineExtBoolTrunc(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue DAGCombineTruncBoolExt(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue DAGCombineTruncBoolExt(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue DAGCombineFastRecip(SDValue Op, DAGCombinerInfo &DCI) const;
SDValue BuildRSQRTE(SDValue Op, DAGCombinerInfo &DCI) const;		SDValue getEstimate(unsigned Opcode, SDValue Operand,
		DAGCombinerInfo &DCI,
		unsigned &RefinementSteps) const override;

CCAssignFn *useFastISelCCs(unsigned Flag) const;		CCAssignFn *useFastISelCCs(unsigned Flag) const;
};		};

namespace PPC {		namespace PPC {
FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,		FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,
const TargetLibraryInfo *LibInfo);		const TargetLibraryInfo *LibInfo);
}		}

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp


Show First 20 Lines • Show All 7,452 Lines • ▼ Show 20 Lines	PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
MI->eraseFromParent(); // The pseudo instruction is gone now.		MI->eraseFromParent(); // The pseudo instruction is gone now.
return BB;		return BB;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Target Optimization Hooks		// Target Optimization Hooks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

SDValue PPCTargetLowering::DAGCombineFastRecip(SDValue Op,		SDValue PPCTargetLowering::getEstimate(unsigned Opcode, SDValue Operand,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI,
if (DCI.isAfterLegalizeVectorOps())		unsigned &RefinementSteps) const {
return SDValue();		EVT VT = Operand.getValueType();
		SDValue RV;
EVT VT = Op.getValueType();		if (Opcode == ISD::FSQRT) {
		if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
		(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
		(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
		(VT == MVT::v2f64 && Subtarget.hasVSX()))
		RV = DCI.DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);
		} else if (Opcode == ISD::FDIV) {
if ((VT == MVT::f32 && Subtarget.hasFRES()) ||		if ((VT == MVT::f32 && Subtarget.hasFRES()) ||
(VT == MVT::f64 && Subtarget.hasFRE()) ||		(VT == MVT::f64 && Subtarget.hasFRE()) ||
(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||		(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
(VT == MVT::v2f64 && Subtarget.hasVSX())) {		(VT == MVT::v2f64 && Subtarget.hasVSX()))
		RV = DCI.DAG.getNode(PPCISD::FRE, SDLoc(Operand), VT, Operand);
// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)
// For the reciprocal, we need to find the zero of the function:
// F(X) = A X - 1 [which has a zero at X = 1/A]
// =>
// X_{i+1} = X_i (2 - A X_i) = X_i + X_i (1 - A X_i) [this second form
// does not require additional intermediate precision]

// Convergence is quadratic, so we essentially double the number of digits
// correct after every iteration. The minimum architected relative
// accuracy is 2^-5. When hasRecipPrec(), this is 2^-14. IEEE float has
// 23 digits and double has 52 digits.
int Iterations = Subtarget.hasRecipPrec() ? 1 : 3;
if (VT.getScalarType() == MVT::f64)
++Iterations;

SelectionDAG &DAG = DCI.DAG;
SDLoc dl(Op);

SDValue FPOne =
DAG.getConstantFP(1.0, VT.getScalarType());
if (VT.isVector()) {
assert(VT.getVectorNumElements() == 4 &&
"Unknown vector type");
FPOne = DAG.getNode(ISD::BUILD_VECTOR, dl, VT,
FPOne, FPOne, FPOne, FPOne);
}

SDValue Est = DAG.getNode(PPCISD::FRE, dl, VT, Op);
DCI.AddToWorklist(Est.getNode());

// Newton iterations: Est = Est + Est (1 - Arg * Est)
for (int i = 0; i < Iterations; ++i) {
SDValue NewEst = DAG.getNode(ISD::FMUL, dl, VT, Op, Est);
DCI.AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FSUB, dl, VT, FPOne, NewEst);
DCI.AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FMUL, dl, VT, Est, NewEst);
DCI.AddToWorklist(NewEst.getNode());

Est = DAG.getNode(ISD::FADD, dl, VT, Est, NewEst);
DCI.AddToWorklist(Est.getNode());
}

return Est;
}		}
		if (RV.getNode()) {
return SDValue();
}

SDValue PPCTargetLowering::BuildRSQRTE(SDValue Op, DAGCombinerInfo &DCI) const {
if (DCI.isAfterLegalizeVectorOps())
return SDValue();

EVT VT = Op.getValueType();

if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||
(VT == MVT::v2f64 && Subtarget.hasVSX())) {

// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)
// For the reciprocal sqrt, we need to find the zero of the function:
// F(X) = 1/X^2 - A [which has a zero at X = 1/sqrt(A)]
// =>
// X_{i+1} = X_i (1.5 - A X_i^2 / 2)
// As a result, we precompute A/2 prior to the iteration loop.

// Convergence is quadratic, so we essentially double the number of digits		// Convergence is quadratic, so we essentially double the number of digits
// correct after every iteration. The minimum architected relative		// correct after every iteration. For both FRE and FRSQRTE, the minimum
// accuracy is 2^-5. When hasRecipPrec(), this is 2^-14. IEEE float has		// architected relative accuracy is 2^-5. When hasRecipPrec(), this is
// 23 digits and double has 52 digits.		// 2^-14. IEEE float has 23 digits and double has 52 digits.
int Iterations = Subtarget.hasRecipPrec() ? 1 : 3;		RefinementSteps = Subtarget.hasRecipPrec() ? 1 : 3;
if (VT.getScalarType() == MVT::f64)		if (VT.getScalarType() == MVT::f64)
++Iterations;		++RefinementSteps;

SelectionDAG &DAG = DCI.DAG;
SDLoc dl(Op);

SDValue FPThreeHalves =
DAG.getConstantFP(1.5, VT.getScalarType());
if (VT.isVector()) {
assert(VT.getVectorNumElements() == 4 &&
"Unknown vector type");
FPThreeHalves = DAG.getNode(ISD::BUILD_VECTOR, dl, VT,
FPThreeHalves, FPThreeHalves,
FPThreeHalves, FPThreeHalves);
}		}
		return RV;
SDValue Est = DAG.getNode(PPCISD::FRSQRTE, dl, VT, Op);
DCI.AddToWorklist(Est.getNode());

// We now need 0.5 * Arg which we can write as (1.5 * Arg - Arg) so that
// this entire sequence requires only one FP constant.
SDValue HalfArg = DAG.getNode(ISD::FMUL, dl, VT, FPThreeHalves, Op);
DCI.AddToWorklist(HalfArg.getNode());

HalfArg = DAG.getNode(ISD::FSUB, dl, VT, HalfArg, Op);
DCI.AddToWorklist(HalfArg.getNode());

// Newton iterations: Est = Est * (1.5 - HalfArg * Est * Est)
for (int i = 0; i < Iterations; ++i) {
SDValue NewEst = DAG.getNode(ISD::FMUL, dl, VT, Est, Est);
DCI.AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FMUL, dl, VT, HalfArg, NewEst);
DCI.AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FSUB, dl, VT, FPThreeHalves, NewEst);
DCI.AddToWorklist(NewEst.getNode());

Est = DAG.getNode(ISD::FMUL, dl, VT, Est, NewEst);
DCI.AddToWorklist(Est.getNode());
}

return Est;
}

return SDValue();
}		}

static bool isConsecutiveLSLoc(SDValue Loc, EVT VT, LSBaseSDNode *Base,		static bool isConsecutiveLSLoc(SDValue Loc, EVT VT, LSBaseSDNode *Base,
unsigned Bytes, int Dist,		unsigned Bytes, int Dist,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
if (VT.getSizeInBits() / 8 != Bytes)		if (VT.getSizeInBits() / 8 != Bytes)
return false;		return false;

▲ Show 20 Lines • Show All 710 Lines • ▼ Show 20 Lines	SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
return DAGCombineExtBoolTrunc(N, DCI);		return DAGCombineExtBoolTrunc(N, DCI);
case ISD::TRUNCATE:		case ISD::TRUNCATE:
case ISD::SETCC:		case ISD::SETCC:
case ISD::SELECT_CC:		case ISD::SELECT_CC:
return DAGCombineTruncBoolExt(N, DCI);		return DAGCombineTruncBoolExt(N, DCI);
case ISD::FDIV: {
assert(TM.Options.UnsafeFPMath &&
"Reciprocal estimates require UnsafeFPMath");

SDValue RV = DAGCombineFastRecip(N->getOperand(1), DCI);
if (RV.getNode()) {
DCI.AddToWorklist(RV.getNode());
return DAG.getNode(ISD::FMUL, dl, N->getValueType(0),
N->getOperand(0), RV);
}

}
break;
case ISD::FSQRT: {
assert(TM.Options.UnsafeFPMath &&
"Reciprocal estimates require UnsafeFPMath");

// Compute this as 1/(1/sqrt(X)), which is the reciprocal of the
// reciprocal sqrt.
SDValue RV = BuildRSQRTE(N->getOperand(0), DCI);
if (RV.getNode()) {
DCI.AddToWorklist(RV.getNode());
RV = DAGCombineFastRecip(RV, DCI);
if (RV.getNode()) {
// Unfortunately, RV is now NaN if the input was exactly 0. Select out
// this case and force the answer to 0.

EVT VT = RV.getValueType();

SDValue Zero = DAG.getConstantFP(0.0, VT.getScalarType());
if (VT.isVector()) {
assert(VT.getVectorNumElements() == 4 && "Unknown vector type");
Zero = DAG.getNode(ISD::BUILD_VECTOR, dl, VT, Zero, Zero, Zero, Zero);
}

SDValue ZeroCmp =
DAG.getSetCC(dl, getSetCCResultType(*DAG.getContext(), VT),
N->getOperand(0), Zero, ISD::SETEQ);
DCI.AddToWorklist(ZeroCmp.getNode());
DCI.AddToWorklist(RV.getNode());

RV = DAG.getNode(VT.isVector() ? ISD::VSELECT : ISD::SELECT, dl, VT,
ZeroCmp, Zero, RV);
return RV;
}
}

}
break;
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
if (TM.getSubtarget<PPCSubtarget>().has64BitSupport()) {		if (TM.getSubtarget<PPCSubtarget>().has64BitSupport()) {
if (N->getOperand(0).getOpcode() == ISD::FP_TO_SINT) {		if (N->getOperand(0).getOpcode() == ISD::FP_TO_SINT) {
// Turn (sint_to_fp (fp_to_sint X)) -> fctidz/fcfid without load/stores.		// Turn (sint_to_fp (fp_to_sint X)) -> fctidz/fcfid without load/stores.
// We allow the src/dst to be either f32/f64, but the intermediate		// We allow the src/dst to be either f32/f64, but the intermediate
// type must be i64.		// type must be i64.
if (N->getOperand(0).getValueType() == MVT::i64 &&		if (N->getOperand(0).getValueType() == MVT::i64 &&
N->getOperand(0).getOperand(0).getValueType() != MVT::ppcf128) {		N->getOperand(0).getOperand(0).getValueType() != MVT::ppcf128) {
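
The zero-input special case in the removed FSQRT code can be reproduced in scalar form. The sketch below is an illustration only: the function names are hypothetical, the exact division `1.0 / R` stands in for the refined reciprocal estimate that `DAGCombineFastRecip` would build, and the caller supplies the initial reciprocal square root estimate (assumed here to be +infinity for an input of 0).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: sqrt(X) computed as 1/(1/sqrt(X)) with no zero check.
// RSqrtEst is the caller-supplied initial estimate of 1/sqrt(X).
double sqrtViaRSqrtUnguarded(double X, double RSqrtEst, int Steps) {
  double R = RSqrtEst;
  for (int i = 0; i < Steps; ++i)
    R = R * (1.5 - 0.5 * X * R * R); // same update as the rsqrt refinement
  return 1.0 / R; // stand-in for the refined reciprocal estimate
}

// Hypothetical helper: same computation, but an input of exactly 0 is
// selected out and forced to 0, like the SETEQ + SELECT in the removed code.
double sqrtViaRSqrt(double X, double RSqrtEst, int Steps) {
  double RV = sqrtViaRSqrtUnguarded(X, RSqrtEst, Steps);
  return X == 0.0 ? 0.0 : RV;
}
```

With `X == 0` and an infinite initial estimate, the refinement multiplies 0 by infinity, so the unguarded result is NaN. The guarded version returns 0, which matches the comment in the removed code.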

llvm/trunk/test/CodeGen/PowerPC/recipest.ll

define double @foo(double %a, double %b) nounwind {
%x = call double @llvm.sqrt.f64(double %b)		%x = call double @llvm.sqrt.f64(double %b)
%r = fdiv double %a, %x		%r = fdiv double %a, %x
ret double %r		ret double %r

; CHECK: @foo		; CHECK: @foo
; CHECK-DAG: frsqrte		; CHECK-DAG: frsqrte
; CHECK-DAG: fnmsub		; CHECK-DAG: fnmsub
; CHECK: fmul		; CHECK: fmul
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: blr		; CHECK: blr

; CHECK-SAFE: @foo		; CHECK-SAFE: @foo
; CHECK-SAFE: fsqrt		; CHECK-SAFE: fsqrt
; CHECK-SAFE: fdiv		; CHECK-SAFE: fdiv
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define float @goo(float %a, float %b) nounwind {
%x = call float @llvm.sqrt.f32(float %b)		%x = call float @llvm.sqrt.f32(float %b)
%r = fdiv float %a, %x		%r = fdiv float %a, %x
ret float %r		ret float %r

; CHECK: @goo		; CHECK: @goo
; CHECK-DAG: frsqrtes		; CHECK-DAG: frsqrtes
; CHECK-DAG: fnmsubs		; CHECK-DAG: fnmsubs
; CHECK: fmuls		; CHECK: fmuls
; CHECK: fmadds		; CHECK-NEXT: fmadds
; CHECK: fmuls		; CHECK-NEXT: fmuls
; CHECK: fmuls		; CHECK-NEXT: fmuls
; CHECK: blr		; CHECK-NEXT: blr

; CHECK-SAFE: @goo		; CHECK-SAFE: @goo
; CHECK-SAFE: fsqrts		; CHECK-SAFE: fsqrts
; CHECK-SAFE: fdivs		; CHECK-SAFE: fdivs
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define <4 x float> @hoo(<4 x float> %a, <4 x float> %b) nounwind {		define <4 x float> @hoo(<4 x float> %a, <4 x float> %b) nounwind {
define double @foo2(double %a, double %b) nounwind {		define double @foo2(double %a, double %b) nounwind {
%r = fdiv double %a, %b		%r = fdiv double %a, %b
ret double %r		ret double %r

; CHECK: @foo2		; CHECK: @foo2
; CHECK-DAG: fre		; CHECK-DAG: fre
; CHECK-DAG: fnmsub		; CHECK-DAG: fnmsub
; CHECK: fmadd		; CHECK: fmadd
; CHECK: fnmsub		; CHECK-NEXT: fnmsub
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: blr		; CHECK-NEXT: blr

; CHECK-SAFE: @foo2		; CHECK-SAFE: @foo2
; CHECK-SAFE: fdiv		; CHECK-SAFE: fdiv
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define float @goo2(float %a, float %b) nounwind {		define float @goo2(float %a, float %b) nounwind {
%r = fdiv float %a, %b		%r = fdiv float %a, %b
ret float %r		ret float %r

; CHECK: @goo2		; CHECK: @goo2
; CHECK-DAG: fres		; CHECK-DAG: fres
; CHECK-DAG: fnmsubs		; CHECK-DAG: fnmsubs
; CHECK: fmadds		; CHECK: fmadds
; CHECK: fmuls		; CHECK-NEXT: fmuls
; CHECK: blr		; CHECK-NEXT: blr

; CHECK-SAFE: @goo2		; CHECK-SAFE: @goo2
; CHECK-SAFE: fdivs		; CHECK-SAFE: fdivs
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define <4 x float> @hoo2(<4 x float> %a, <4 x float> %b) nounwind {		define <4 x float> @hoo2(<4 x float> %a, <4 x float> %b) nounwind {
%r = fdiv <4 x float> %a, %b		%r = fdiv <4 x float> %a, %b
define double @foo3(double %a) nounwind {
%r = call double @llvm.sqrt.f64(double %a)		%r = call double @llvm.sqrt.f64(double %a)
ret double %r		ret double %r

; CHECK: @foo3		; CHECK: @foo3
; CHECK: fcmpu		; CHECK: fcmpu
; CHECK-DAG: frsqrte		; CHECK-DAG: frsqrte
; CHECK-DAG: fnmsub		; CHECK-DAG: fnmsub
; CHECK: fmul		; CHECK: fmul
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fmul		; CHECK-NEXT: fmul
; CHECK: fre		; CHECK-NEXT: fre
; CHECK: fnmsub		; CHECK-NEXT: fnmsub
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: fnmsub		; CHECK-NEXT: fnmsub
; CHECK: fmadd		; CHECK-NEXT: fmadd
; CHECK: blr		; CHECK: blr

; CHECK-SAFE: @foo3		; CHECK-SAFE: @foo3
; CHECK-SAFE: fsqrt		; CHECK-SAFE: fsqrt
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define float @goo3(float %a) nounwind {		define float @goo3(float %a) nounwind {
%r = call float @llvm.sqrt.f32(float %a)		%r = call float @llvm.sqrt.f32(float %a)
ret float %r		ret float %r

; CHECK: @goo3		; CHECK: @goo3
; CHECK: fcmpu		; CHECK: fcmpu
; CHECK-DAG: frsqrtes		; CHECK-DAG: frsqrtes
; CHECK-DAG: fnmsubs		; CHECK-DAG: fnmsubs
; CHECK: fmuls		; CHECK: fmuls
; CHECK: fmadds		; CHECK-NEXT: fmadds
; CHECK: fmuls		; CHECK-NEXT: fmuls
; CHECK: fres		; CHECK-NEXT: fres
; CHECK: fnmsubs		; CHECK-NEXT: fnmsubs
; CHECK: fmadds		; CHECK-NEXT: fmadds
; CHECK: blr		; CHECK: blr

; CHECK-SAFE: @goo3		; CHECK-SAFE: @goo3
; CHECK-SAFE: fsqrts		; CHECK-SAFE: fsqrts
; CHECK-SAFE: blr		; CHECK-SAFE: blr
}		}

define <4 x float> @hoo3(<4 x float> %a) nounwind {		define <4 x float> @hoo3(<4 x float> %a) nounwind {