The generic infrastructure that computes the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly handle that condition when a target requested it. This patch addresses the issue so that a target can compute the series on its own.
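For context, the series being computed is the standard Newton-Raphson refinement. Below is a minimal, self-contained sketch (my own illustration, not code from the patch) of one refinement step for each operation:

```cpp
#include <cstdio>

// One Newton-Raphson step toward 1/A, given the current estimate X:
//   X' = X * (2 - A*X)
static float refineRecip(float A, float X) {
  return X * (2.0f - A * X);
}

// One Newton-Raphson step toward 1/sqrt(A), given the current estimate X:
//   X' = X * (1.5 - 0.5*A*X*X)
static float refineRsqrt(float A, float X) {
  return X * (1.5f - 0.5f * A * X * X);
}

int main() {
  float A = 3.0f;
  float X = 0.3f; // crude initial estimate of 1/3
  for (int I = 0; I < 3; ++I)
    X = refineRecip(A, X);
  std::printf("1/3       ~= %f\n", X); // converges quadratically to 0.333333

  float Y = 0.5f; // crude initial estimate of 1/sqrt(3)
  for (int I = 0; I < 3; ++I)
    Y = refineRsqrt(A, Y);
  std::printf("1/sqrt(3) ~= %f\n", Y); // converges to 0.577350
  return 0;
}
```

Each step roughly doubles the number of correct bits, which is why a fixed, small iteration count suffices for a given estimate precision.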
llvm/include/llvm/CodeGen/ISDOpcodes.h | |
---|---|---
540–542 ↗ | (On Diff #66169) | You should probably mention the argument order and precision requirements here (specifically that multiple rounding steps may occur).
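As an illustration of the kind of documentation being asked for, here is a hypothetical shape it could take (the wording, and the exact placement, are my assumptions, not from the patch):

```cpp
// Hypothetical documentation sketch for the new node (illustrative only).
enum NodeType {
  // ...
  /// FRECPI(Arg, Est) - One Newton-Raphson refinement step for 1/Arg using
  /// the current estimate Est. Note the argument order, and that the result
  /// may be rounded more than once: each intermediate operation in the step
  /// rounds separately, so this is not a single correctly-rounded operation.
  FRECPI,
  // ...
};
```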
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14522 | | Isn't this more a question of legality?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp | |
---|---|---
203–204 ↗ | (On Diff #66169) | Probably should be all lower-case, looking at the surrounding entries.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
---|---|---
7832–7833 | | Isn't this really just a pattern? It looks like we could mark FRECPI as legal (which is more what the DAGCombiner should be asking anyway) and write: `def : Pat<(f32 (frecpi f32:$arg, f32:$est)), (FMULSrr (FRECPS32 $arg, $est), $est)>;` Similarly for the sqrt case.
llvm/test/CodeGen/AArch64/recp-fastmath.ll | |
---|---|---
16 | | We should be checking data-flow too in these tests. It's not enough to know that LLVM managed to cobble together an frecps instruction.
llvm/include/llvm/CodeGen/ISDOpcodes.h | |
---|---|---
546 ↗ | (On Diff #66169) | I don't quite understand why you need to introduce new ARM-specific nodes into the ISD namespace. As an alternative, you could ask the target to directly build ARMISD::FRSQRTI instructions from the buildSqrtNRNative function.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14695 | | A more consistent way to choose a refinement method would be to replace the boolean UseOneConstNR value with an enum: let the target set this enum, and choose the refinement method based on its value. Yet another option would be to return an already refined value from ARMTargetLowering::getRsqrtEstimate and set Iterations to 0; in that case we wouldn't need the buildSqrtNRNative function.
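A hypothetical sketch of the suggested enum (the type and enumerator names are illustrative, not from the patch or from LLVM):

```cpp
// Hypothetical replacement for the UseOneConstNR boolean (names are
// illustrative): the target reports which refinement method to use, or
// that it already produced a fully refined value itself.
enum class NRRefinement {
  OneConst, // one-constant Newton-Raphson series
  TwoConst, // two-constant Newton-Raphson series
  Native    // target built the whole sequence; no generic iterations needed
};
```

The target would set this value instead of the boolean, and the combiner would dispatch on it, with `Native` meaning the generic code should use the returned value as-is.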
Thank you.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14695 | | I like this suggestion. I'll explore it and get back to y'all.
Refactored the previous patch extensively to implement the series in the AArch64 backend.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14500 | | All of the changes in this file appear to be refactoring, is that right? Mostly they look OK (though they should be committed separately). But I'm not convinced moving the sqrt(0) handling into BuildReciprocalEstimate is an improvement.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
---|---|---
4598 | | A for loop would probably be better. People expect while loops to do crazy things.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14500 | | It's actually necessary to inform getRsqrtEstimate() whether the estimate is for sqrt(): if the target prefers to generate the whole sequence itself, it needs to know whether the result is for sqrt() or its reciprocal, since the final multiplication for sqrt() is currently done inside buildSqrtNR{One,Two}Const().
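For concreteness, a rough sketch of what threading that knowledge through might look like; the `Reciprocal` parameter and the struct name are assumptions for illustration, not quoted from the patch:

```cpp
namespace llvm {
// Forward declarations standing in for the real LLVM types.
class SelectionDAG;
class SDValue;

// Sketch of an interface carrying the sqrt-vs-reciprocal distinction.
struct TargetLoweringSketch {
  /// Reciprocal == true means the caller ultimately wants 1/sqrt(Operand);
  /// false means it wants sqrt(Operand), so a target that emits the whole
  /// sequence knows whether to add the final multiply by the operand
  /// (sqrt(x) == x * rsqrt(x)).
  virtual SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG,
                                  int Enabled, int &RefinementSteps,
                                  bool &UseOneConstNR,
                                  bool Reciprocal) const = 0;
};
} // namespace llvm
```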
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14500 | | OK, I see that change now. But what about the block of 0-handling code moved from buildSqrtEstimate to buildSqrtEstimateImpl? I still don't see how that's an improvement.
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66719) | vblendvps is a regression vs. the vandnps that we had. Do you know what caused that? Also, why did the "-NEXT" part of the assertions disappear? For any x86 codegen test changes, please use the script that is noted on the first line of the test file to avoid that problem.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14500 | | For now it does improve grouping, IMO. But I'm mulling over whether the handling code could be profitably moved inside getSqrtEstimate.
14669 | | @spatel I changed Zero to Op here, since Op is then known to be 0. The motivation was that AArch64 has an immediate form that implements SETEQ, and materializing a 0 can be expensive. This probably caused the side effect of changing andnps to vblendvps on X86.
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66719) | Sorry about that.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14669 | | Ah, I see the diff now. But this is a target-independent transform, so isn't using 'Zero' in the select the more specific, and therefore the better, construct? This suggests that AArch64 is missing a fold that checks if an operand of a select is a zero; x86 must have this somewhere to allow the transform from blendv to andn?
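To spell out the equivalence such a fold would rely on, here is a small self-contained model (my own illustration; it assumes the lane mask is all-ones or all-zeros, which is what a vector setcc produces):

```cpp
#include <cassert>
#include <cstdint>

// select(Mask, 0, Y) with an all-ones/all-zeros mask is exactly andn:
// (~Mask & Y). This is the blendv -> andn rewrite x86 apparently has and
// AArch64 may be missing.
static uint32_t selectZeroFirst(uint32_t Mask, uint32_t Y) {
  return Mask ? 0u : Y; // lane-wise select with a zero "true" operand
}

static uint32_t andNot(uint32_t Mask, uint32_t Y) {
  return ~Mask & Y;
}

int main() {
  const uint32_t AllOnes = 0xFFFFFFFFu, AllZeros = 0u, Y = 0xDEADBEEFu;
  assert(selectZeroFirst(AllOnes, Y) == andNot(AllOnes, Y));
  assert(selectZeroFirst(AllZeros, Y) == andNot(AllZeros, Y));
  return 0;
}
```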
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66840) | No worries. Note that I've used a modified version of that script to generate checks for targets besides x86, in case anyone would like to enhance the script and make test generation easier for AArch64. :)
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | |
---|---|---
14669 | | Using Op also gets the correct value for -0. I'm not entirely sure how much that matters though, given that we're in fast-math anyway.
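A small demonstration of the -0 point (my own example, not from the review): IEEE-754 defines sqrt(-0.0) as -0.0, so selecting Op preserves the sign bit where a materialized +0.0 would not:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  float NegZero = -0.0f;
  float R = std::sqrt(NegZero); // IEEE-754: sqrt(-0.0) == -0.0
  std::printf("sqrt(-0.0) = %f, signbit = %d\n", R, std::signbit(R));
  // Selecting Op (== -0.0) matches this result exactly; selecting a
  // materialized constant +0.0 would produce the wrong sign bit.
  return 0;
}
```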
Please can you recreate the diff with context? `svn diff --diff-cmd=diff -x -U999999`
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66840) | As Sanjay said, the use of vblendvps over vandnps is a regression that could affect throughput quite badly.
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66840) | @t.p.northover, is Sanjay onto something, in that AArch64 could use a fold instead? Otherwise, I could move the check for 0.0 inside getSqrtEstimate().
llvm/test/CodeGen/X86/sqrt-fastmath.ll | |
---|---|---
42–45 ↗ | (On Diff #66840) | We seem to catch the simple cases, based on the IR below (I haven't checked, but I suspect it's actually the generic DAG combiner that's doing it). I'm not sure why we don't get this one, but fixing it would improve performance, probably beyond using Op.

```llvm
define <4 x float> @foo(<4 x float> %lhs, <4 x float> %rhs, <4 x float> %val) {
  %tst = fcmp oeq <4 x float> %lhs, %rhs
  %res = select <4 x i1> %tst, <4 x float> %val, <4 x float> zeroinitializer
  ret <4 x float> %res
}
```
Side note regarding select folding with -1/0: inspired by this patch, I filed PR28895 (https://llvm.org/bugs/show_bug.cgi?id=28895).
There are a few different paths and optimizations for this in x86. Some of it (e.g., D23337) could be lifted into the generic DAG combiner, I think.
When I looked at how AArch64 handled the case in PR28895, I noticed that the select always gets cracked into and/andn/or and then re-matched into a vbsl. That seems like a better general policy than what x86 is doing (matching to ISD::VSELECT early).
Regardless of all that, we really do want to avoid vblendv on x86 in this patch. As Simon hinted, some cores suffer greatly because vblendv is cracked into the base logic ops (and/andn/or) by the hardware, and so that instruction has three times worse latency/throughput than a simple op.
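As a simplified per-bit model of that cost difference (my own illustration; the real vblendvps selects per lane based on the mask's sign bit):

```cpp
#include <cstdint>

// On cores that crack vblendvps, a blend costs roughly these three ops...
static uint32_t blend(uint32_t Mask, uint32_t A, uint32_t B) {
  return (Mask & A) | (~Mask & B); // and + andn + or
}

// ...while a select against zero needs only the single andn:
static uint32_t blendZero(uint32_t Mask, uint32_t B) {
  return ~Mask & B; // one op instead of three
}

int main() {
  // blend(Mask, 0, B) and blendZero(Mask, B) are equivalent.
  return blend(0xFFFF0000u, 0u, 0x12345678u) ==
                 blendZero(0xFFFF0000u, 0x12345678u)
             ? 0
             : 1;
}
```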
I folded the check for a zero argument into getSqrtEstimate(). I think that makes sense when it returns no iterations, since it allows the target to handle everything in whatever way is optimal for it.
It's better to use VSELECT for scalar types as well. How do I go about stuffing the scalars into vectors, and back out again, around the VSELECT? (A rough sketch of one possibility follows below.)
Thank you.
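Not an answer given in the thread, but for concreteness, here is one way the scalar-through-vector dance around VSELECT might look at the DAG level. This is a sketch that only compiles against the LLVM tree; the opcode choices and the helper name are my assumptions:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: wrap the scalar mask and operands into vectors, perform the
// VSELECT there, then extract the scalar result back out of lane 0.
static SDValue selectScalarViaVector(SelectionDAG &DAG, const SDLoc &DL,
                                     EVT VecVT, EVT MaskVecVT, SDValue Mask,
                                     SDValue TVal, SDValue FVal) {
  SDValue VMask = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MaskVecVT, Mask);
  SDValue VTrue = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, TVal);
  SDValue VFalse = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, FVal);
  SDValue Sel = DAG.getNode(ISD::VSELECT, DL, VecVT, VMask, VTrue, VFalse);
  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL,
                     VecVT.getVectorElementType(), Sel,
                     DAG.getIntPtrConstant(0, DL));
}
```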
The generic combiner parts look alright to me, and since there are no test changes for PPC or x86, I assume that those changes only enable the AArch64 diffs. Someone with AArch64 experience should confirm that part of the patch looks good.
It makes sense to split the fix for this issue from the AArch64 native implementation that uses it.
This part retains the target-independent changes as before. The AArch64-specific changes will follow in another patch.