This is an archive of the discontinued LLVM Phabricator instance.

Differential D80974

[DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
ClosedPublic

Authored by steven.zhang on Jun 1 2020, 9:06 PM.

Details

Reviewers

spatel
RKSimon
jsji
hfinkel
shchenz

Group Reviewers

Restricted Project

Commits

rG4d83aba4228e: [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is…

Summary

Currently, the result is hardcoded to 0.0 if the input is denormal or zero, which hurts precision. This patch adds a hook so that a target can provide a different result for such inputs; on PowerPC it returns a hardware fsqrt. Because the added fsqrt sits on the cold path of the cmp+branch, it does not affect performance for normal inputs on PowerPC. It also removes the xxlxor from the hot path.

Clang without this patch:

sqrt(2.2250738585072014e-308) = 1.4916681462400413e-154
sqrt(2.2250738585072009e-308) = 0
sqrt(4.9406564584124654e-324) = 0

With this patch:

sqrt(2.2250738585072014e-308) = 1.4916681462400413e-154
sqrt(2.2250738585072009e-308) = 1.4916681462400412e-154
sqrt(4.9406564584124654e-324) = 2.2227587494850775e-162
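
The before/after numbers can be reproduced in spirit with a small host-side model. This is only a sketch under stated assumptions, not the DAGCombiner code: `estimateSqrt`, `needsFallback`, `sqrtOldFallback` and `sqrtNewFallback` are hypothetical names, the hardware reciprocal square root estimate is faked by perturbing `1.0 / std::sqrt(X)`, and the input test is the generic `fabs(X) < SmallestNormal` check rather than the PowerPC `ftsqrt` test.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for a hardware reciprocal-sqrt estimate followed by
// two Newton-Raphson refinement steps. Only meant for normal, positive inputs.
double estimateSqrt(double X) {
  assert(X >= std::numeric_limits<double>::min() && "normal inputs only");
  double Est = (1.0 / std::sqrt(X)) * (1.0 + 1e-3); // deliberately imprecise
  for (int I = 0; I < 2; ++I)
    Est = Est * (1.5 - 0.5 * X * Est * Est); // refine the 1/sqrt(X) estimate
  return X * Est;                             // sqrt(X) = X * (1/sqrt(X))
}

// Generic input test: zero or denormal inputs cannot use the estimate.
bool needsFallback(double X) {
  return std::fabs(X) < std::numeric_limits<double>::min();
}

// Behaviour before the patch: force the answer to 0.0 on the fallback path.
double sqrtOldFallback(double X) {
  return needsFallback(X) ? 0.0 : estimateSqrt(X);
}

// Behaviour with the patch (as modelled here): use an accurate square root
// on the fallback path instead of 0.0.
double sqrtNewFallback(double X) {
  return needsFallback(X) ? std::sqrt(X) : estimateSqrt(X);
}
```

Calling `sqrtOldFallback` on the smallest denormal gives 0, while `sqrtNewFallback` gives a nonzero result, which is the difference the summary reports.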

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steven.zhang created this revision. Jun 1 2020, 9:06 PM

Herald added a project: Restricted Project. Jun 1 2020, 9:06 PM

Herald added subscribers: ecnelises, wuzish, kbarton and 2 others.

steven.zhang added a parent revision: D80706: [DAGCombine] Add hook to allow target specific test for sqrt input. Jun 1 2020, 9:07 PM

Harbormaster completed remote builds in B58712: Diff 267782. Jun 1 2020, 10:11 PM

cameron.mcinally added a subscriber: cameron.mcinally. Jun 2 2020, 7:21 AM

Not sure about other targets, but x86 is mostly expected to flush denorms (FTZ/DAZ) when using fast-math, so this should not matter even if we decide to override the default setting.

llvm/include/llvm/CodeGen/TargetLowering.h
4293	Can we have this default to: return DAG.getConstantFP(0.0, SDLoc(Operand), Operand.getValueType()); and save some code in the calling function?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22079	"value that provided" -> "value provided"
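
The design point in the first inline comment (let the hook itself return the 0.0 default, so the caller needs no extra check) can be illustrated outside LLVM. In this sketch `TargetHooks`, `AccurateHooks` and `selectSqrt` are hypothetical plain C++ names, not LLVM classes, and `double` values stand in for `SDValue` nodes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical miniature of a lowering interface; not the LLVM class.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: force the result to 0.0, matching the pre-existing behaviour.
  virtual double sqrtResultForDenormInput(double /*Operand*/) const {
    return 0.0;
  }
};

// A target that prefers an accurate (slower) square root on the fallback path.
struct AccurateHooks : TargetHooks {
  double sqrtResultForDenormInput(double Operand) const override {
    assert(Operand >= 0.0 && "sketch covers non-negative inputs only");
    return std::sqrt(Operand);
  }
};

// The caller needs no "did the target provide a value?" check, because the
// base class always returns something usable.
double selectSqrt(const TargetHooks &TLI, bool InputNeedsFallback,
                  double Operand, double Estimate) {
  return InputNeedsFallback ? TLI.sqrtResultForDenormInput(Operand) : Estimate;
}
```

With an empty default, the caller would have to test the returned value and substitute 0.0 itself; moving the default into the base class keeps that logic in one place.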

steven.zhang planned changes to this revision. Sep 7 2020, 6:56 PM

steven.zhang mentioned this in D80706: [DAGCombine] Add hook to allow target specific test for sqrt input. Nov 3 2020, 12:08 AM

Address comments.

Harbormaster completed remote builds in B77371: Diff 302499. Nov 3 2020, 1:31 AM

As with the other patch, this LGTM in general, so if someone can verify that the PPC changes are as expected, we should be good.

shchenz added a subscriber: shchenz. Nov 4 2020, 7:36 PM

shchenz added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771	Target-independent code supports vector types, and we also have vector sqrt instructions on the PowerPC target. Can we use them for vector types like v4f32 or v2f64?

qiucf added a subscriber: qiucf. Nov 4 2020, 7:42 PM

qiucf added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771	Yes. Supported types:
- VSX off: `f32`, `f64`
- VSX on: `f32`, `f64`, `f128`, `v4f32`, `v2f64`

Address comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771	Yes, I will post another patch for the vector types, as I mentioned in the comments of D80706.
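
As a reading aid for this thread, the type list from the comment above can be written down as a small lookup. This is a sketch only: `FPType`, `hasHardwareSqrt` and `patchUsesHardwareSqrt` are hypothetical names, and the table simply mirrors what the reviewers stated rather than querying any real subtarget.

```cpp
#include <cassert>

enum class FPType { F32, F64, F128, V4F32, V2F64 };

// Hypothetical helper mirroring the list given in the review comment: which
// types have a hardware square root, with and without VSX.
bool hasHardwareSqrt(FPType Ty, bool HasVSX) {
  switch (Ty) {
  case FPType::F32:
  case FPType::F64:
    return true;
  case FPType::F128:
  case FPType::V4F32:
  case FPType::V2F64:
    return HasVSX;
  }
  assert(false && "unhandled FPType");
  return false;
}

// The patch as reviewed only uses the hardware sqrt for f64; other types keep
// the default 0.0 result until a follow-up patch.
bool patchUsesHardwareSqrt(FPType Ty) { return Ty == FPType::F64; }
```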

Harbormaster completed remote builds in B77832: Diff 303353. Nov 6 2020, 1:34 AM

shchenz accepted this revision. Nov 10 2020, 8:16 PM

shchenz added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22084	On the PowerPC target, we have used 0.0 as the sqrt result for denormal floating-point inputs for a long time. Changing the result to use the hardware sqrt instruction will certainly improve precision, but it also degrades runtime performance. Would it be possible to choose between them: use 0.0 when we care about performance, and the hardware sqrt instruction when we care about precision? Maybe `-Ofast` could be the indicator for this?

This revision is now accepted and ready to land. Nov 10 2020, 8:16 PM

Oops, accepted by mistake...

shchenz requested changes to this revision. Nov 10 2020, 8:17 PM

This revision now requires changes to proceed. Nov 10 2020, 8:17 PM

steven.zhang requested review of this revision. Nov 10 2020, 9:05 PM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22084	It won't degrade runtime performance when the input is not denormal, which is the usual case, because the select is later expanded into cmp + branch and the hardware will predict the normal code path. It does slow things down when the input is denormal, but that is expected since the precision is improved. All of this optimization is done under -Ofast. Considering that we only affect the denormal-input code path, which usually cares about precision rather than performance, I tend to keep it this way. Does that make sense?
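
The argument that only the denormal path pays for the accurate sqrt can be made concrete with a host-side model that counts which path runs. This is a sketch under assumptions: `SqrtStats`, `branchySqrt` and `countPaths` are hypothetical names, and the fast path uses a plain expression as a stand-in for the estimate and refinement sequence.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct SqrtStats {
  int FastPath = 0; // estimate sequence taken
  int SlowPath = 0; // accurate square root taken
};

// Hypothetical model: the accurate sqrt sits behind a branch, so it is only
// executed for inputs that fail the input test.
double branchySqrt(double X, SqrtStats &Stats) {
  assert(X >= 0.0 && "sketch covers non-negative inputs only");
  if (X < std::numeric_limits<double>::min()) {
    ++Stats.SlowPath;
    return std::sqrt(X); // cold path: zero or denormal input
  }
  ++Stats.FastPath;
  return X * (1.0 / std::sqrt(X)); // stand-in for estimate + refinement
}

// Runs every input through branchySqrt and reports how often each path ran.
SqrtStats countPaths(const std::vector<double> &Inputs) {
  SqrtStats Stats;
  for (double X : Inputs)
    branchySqrt(X, Stats);
  return Stats;
}
```

For a workload made up only of normal inputs, `SlowPath` stays at zero, which is the case the author expects to dominate.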

LGTM

This revision is now accepted and ready to land. Nov 10 2020, 9:37 PM

This revision was landed with ongoing or failed builds. Nov 26 2020, 6:13 PM

Closed by commit rG4d83aba4228e: [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is… (authored by steven.zhang).

This revision was automatically updated to reflect the committed changes.

steven.zhang added a commit: rG4d83aba4228e: [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is….

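Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	7 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	13 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h	5 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	13 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.td	3 lines
llvm/lib/Target/PowerPC/PPCInstrVSX.td	2 lines
llvm/test/CodeGen/PowerPC/fma-mutate.ll	6 lines
llvm/test/CodeGen/PowerPC/recipest.ll	65 lines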

Diff 307944

llvm/include/llvm/CodeGen/TargetLowering.h

	/// suitable for use with a square root estimate calculation. For example, the			/// suitable for use with a square root estimate calculation. For example, the
	/// comparison may check if the operand is NAN, INF, zero, normal, etc. The			/// comparison may check if the operand is NAN, INF, zero, normal, etc. The
	/// result should be used as the condition operand for a select or branch.			/// result should be used as the condition operand for a select or branch.
	virtual SDValue getSqrtInputTest(SDValue Operand, SelectionDAG &DAG,			virtual SDValue getSqrtInputTest(SDValue Operand, SelectionDAG &DAG,
	const DenormalMode &Mode) const {			const DenormalMode &Mode) const {
	return SDValue();			return SDValue();
	}			}

				/// Return a target-dependent result if the input operand is not suitable for
				/// use with a square root estimate calculation.
				virtual SDValue getSqrtResultForDenormInput(SDValue Operand,
				SelectionDAG &DAG) const {
				return DAG.getConstantFP(0.0, SDLoc(Operand), Operand.getValueType());
				}
				spatel (inline comment): Can we have this default to: return DAG.getConstantFP(0.0, SDLoc(Operand), Operand.getValueType()); and save some code in the calling function?

	//===--------------------------------------------------------------------===//			//===--------------------------------------------------------------------===//
	// Legalization utility functions			// Legalization utility functions
	//			//

	/// Expand a MUL or [US]MUL_LOHI of n-bit values into two or four nodes,			/// Expand a MUL or [US]MUL_LOHI of n-bit values into two or four nodes,
	/// respectively, each computing an n/2-bit part of the result.			/// respectively, each computing an n/2-bit part of the result.
	/// \param Result A vector that will be filled with the parts of the result			/// \param Result A vector that will be filled with the parts of the result
	/// in little-endian order.			/// in little-endian order.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[...]	if (SDValue Est =
AddToWorklist(Est.getNode());		AddToWorklist(Est.getNode());

if (Iterations) {		if (Iterations) {
Est = UseOneConstNR		Est = UseOneConstNR
? buildSqrtNROneConst(Op, Est, Iterations, Flags, Reciprocal)		? buildSqrtNROneConst(Op, Est, Iterations, Flags, Reciprocal)
: buildSqrtNRTwoConst(Op, Est, Iterations, Flags, Reciprocal);		: buildSqrtNRTwoConst(Op, Est, Iterations, Flags, Reciprocal);

if (!Reciprocal) {		if (!Reciprocal) {
// The estimate is now completely wrong if the input was exactly 0.0 or
// possibly a denormal. Force the answer to 0.0 for those cases.
SDLoc DL(Op);		SDLoc DL(Op);
EVT CCVT = getSetCCResultType(VT);		EVT CCVT = getSetCCResultType(VT);
SDValue FPZero = DAG.getConstantFP(0.0, DL, VT);		SDValue FPZero = DAG.getConstantFP(0.0, DL, VT);
DenormalMode DenormMode = DAG.getDenormalMode(VT);		DenormalMode DenormMode = DAG.getDenormalMode(VT);
// Try the target specific test first.		// Try the target specific test first.
SDValue Test = TLI.getSqrtInputTest(Op, DAG, DenormMode);		SDValue Test = TLI.getSqrtInputTest(Op, DAG, DenormMode);
if (!Test) {		if (!Test) {
// If no test provided by target, testing it with denormal inputs to		// If no test provided by target, testing it with denormal inputs to
// avoid wrong estimate.		// avoid wrong estimate.
if (DenormMode.Input == DenormalMode::IEEE) {		if (DenormMode.Input == DenormalMode::IEEE) {
// This is specifically a check for the handling of denormal inputs,		// This is specifically a check for the handling of denormal inputs,
// not the result.		// not the result.

// Test = fabs(X) < SmallestNormal		// Test = fabs(X) < SmallestNormal
const fltSemantics &FltSem = DAG.EVTToAPFloatSemantics(VT);		const fltSemantics &FltSem = DAG.EVTToAPFloatSemantics(VT);
APFloat SmallestNorm = APFloat::getSmallestNormalized(FltSem);		APFloat SmallestNorm = APFloat::getSmallestNormalized(FltSem);
SDValue NormC = DAG.getConstantFP(SmallestNorm, DL, VT);		SDValue NormC = DAG.getConstantFP(SmallestNorm, DL, VT);
SDValue Fabs = DAG.getNode(ISD::FABS, DL, VT, Op);		SDValue Fabs = DAG.getNode(ISD::FABS, DL, VT, Op);
Test = DAG.getSetCC(DL, CCVT, Fabs, NormC, ISD::SETLT);		Test = DAG.getSetCC(DL, CCVT, Fabs, NormC, ISD::SETLT);
} else		} else
// Test = X == 0.0		// Test = X == 0.0
Test = DAG.getSetCC(DL, CCVT, Op, FPZero, ISD::SETEQ);		Test = DAG.getSetCC(DL, CCVT, Op, FPZero, ISD::SETEQ);
}		}
// Test ? 0.0 : Est
Est = DAG.getNode(Test.getValueType().isVector() ? ISD::VSELECT		// The estimate is now completely wrong if the input was exactly 0.0 or
: ISD::SELECT,		// possibly a denormal. Force the answer to 0.0 or value provided by
DL, VT, Test, FPZero, Est);		// target for those cases.
		Est = DAG.getNode(
		Test.getValueType().isVector() ? ISD::VSELECT : ISD::SELECT, DL, VT,
		Test, TLI.getSqrtResultForDenormInput(Op, DAG), Est);
		spatel (inline comment): "value that provided" -> "value provided"
		shchenz (inline comment): On the PowerPC target, we have used 0.0 as the sqrt result for denormal floating-point inputs for a long time. Changing the result to use the hardware sqrt instruction will certainly improve precision, but it also degrades runtime performance. Would it be possible to choose between them: use 0.0 when we care about performance, and the hardware sqrt instruction when we care about precision? Maybe `-Ofast` could be the indicator for this?
		steven.zhang (author, inline comment): It won't degrade runtime performance when the input is not denormal, which is the usual case, because the select is later expanded into cmp + branch and the hardware will predict the normal code path. It does slow things down when the input is denormal, but that is expected since the precision is improved. All of this optimization is done under -Ofast. Considering that we only affect the denormal-input code path, which usually cares about precision rather than performance, I tend to keep it this way. Does that make sense?
}		}
}		}
return Est;		return Est;
}		}

return SDValue();		return SDValue();
}		}
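
The input test chosen in the hunk above depends on the denormal mode. A minimal host-side sketch of that choice follows, assuming hypothetical names (`DenormInput`, `sqrtInputNeedsFallback`) and collapsing every non-IEEE mode into a single `FlushedToZero` value, which is a simplification of the real `DenormalMode`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical two-state stand-in for the denormal input mode.
enum class DenormInput { IEEE, FlushedToZero };

// Returns true when the estimate cannot be trusted and the fallback value
// should be selected instead.
bool sqrtInputNeedsFallback(double X, DenormInput Mode) {
  switch (Mode) {
  case DenormInput::IEEE:
    // Test = fabs(X) < SmallestNormal
    return std::fabs(X) < std::numeric_limits<double>::min();
  case DenormInput::FlushedToZero:
    // Test = X == 0.0
    return X == 0.0;
  }
  assert(false && "unhandled mode");
  return false;
}
```

When denormal inputs are honoured, the comparison against the smallest normal value catches both zero and denormals; when they are flushed, an equality test against zero is enough.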


llvm/lib/Target/PowerPC/PPCISelLowering.h

[...]	enum NodeType : unsigned {

/// Reciprocal estimate instructions (unary FP ops).		/// Reciprocal estimate instructions (unary FP ops).
FRE,		FRE,
FRSQRTE,		FRSQRTE,

/// Test instruction for software square root.		/// Test instruction for software square root.
FTSQRT,		FTSQRT,

		/// Square root instruction.
		FSQRT,

/// VPERM - The PPC VPERM Instruction.		/// VPERM - The PPC VPERM Instruction.
///		///
VPERM,		VPERM,

/// XXSPLT - The PPC VSX splat instructions		/// XXSPLT - The PPC VSX splat instructions
///		///
XXSPLT,		XXSPLT,

[...]	private:

SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,		SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &RefinementSteps, bool &UseOneConstNR,		int &RefinementSteps, bool &UseOneConstNR,
bool Reciprocal) const override;		bool Reciprocal) const override;
SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,		SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
int &RefinementSteps) const override;		int &RefinementSteps) const override;
SDValue getSqrtInputTest(SDValue Operand, SelectionDAG &DAG,		SDValue getSqrtInputTest(SDValue Operand, SelectionDAG &DAG,
const DenormalMode &Mode) const override;		const DenormalMode &Mode) const override;
		SDValue getSqrtResultForDenormInput(SDValue Operand,
		SelectionDAG &DAG) const override;
unsigned combineRepeatedFPDivisors() const override;		unsigned combineRepeatedFPDivisors() const override;

SDValue		SDValue
combineElementTruncationToVectorTruncation(SDNode *N,		combineElementTruncationToVectorTruncation(SDNode *N,
DAGCombinerInfo &DCI) const;		DAGCombinerInfo &DCI) const;

/// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be		/// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be
/// handled by the VINSERTH instruction introduced in ISA 3.0. This is		/// handled by the VINSERTH instruction introduced in ISA 3.0. This is

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

[...]	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::FP_TO_UINT_IN_VSR:		case PPCISD::FP_TO_UINT_IN_VSR:
return "PPCISD::FP_TO_UINT_IN_VSR,";		return "PPCISD::FP_TO_UINT_IN_VSR,";
case PPCISD::FP_TO_SINT_IN_VSR:		case PPCISD::FP_TO_SINT_IN_VSR:
return "PPCISD::FP_TO_SINT_IN_VSR";		return "PPCISD::FP_TO_SINT_IN_VSR";
case PPCISD::FRE: return "PPCISD::FRE";		case PPCISD::FRE: return "PPCISD::FRE";
case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";		case PPCISD::FRSQRTE: return "PPCISD::FRSQRTE";
case PPCISD::FTSQRT:		case PPCISD::FTSQRT:
return "PPCISD::FTSQRT";		return "PPCISD::FTSQRT";
		case PPCISD::FSQRT:
		return "PPCISD::FSQRT";
case PPCISD::STFIWX: return "PPCISD::STFIWX";		case PPCISD::STFIWX: return "PPCISD::STFIWX";
case PPCISD::VPERM: return "PPCISD::VPERM";		case PPCISD::VPERM: return "PPCISD::VPERM";
case PPCISD::XXSPLT: return "PPCISD::XXSPLT";		case PPCISD::XXSPLT: return "PPCISD::XXSPLT";
case PPCISD::XXSPLTI_SP_TO_DP:		case PPCISD::XXSPLTI_SP_TO_DP:
return "PPCISD::XXSPLTI_SP_TO_DP";		return "PPCISD::XXSPLTI_SP_TO_DP";
case PPCISD::XXSPLTI32DX:		case PPCISD::XXSPLTI32DX:
return "PPCISD::XXSPLTI32DX";		return "PPCISD::XXSPLTI32DX";
case PPCISD::VECINSERT: return "PPCISD::VECINSERT";		case PPCISD::VECINSERT: return "PPCISD::VECINSERT";
[...]	SDValue PPCTargetLowering::getSqrtInputTest(SDValue Op, SelectionDAG &DAG,
// not eligible for iteration. (zero/negative/infinity/nan or unbiased		// not eligible for iteration. (zero/negative/infinity/nan or unbiased
// exponent is less than -970)		// exponent is less than -970)
SDValue SRIdxVal = DAG.getTargetConstant(PPC::sub_eq, DL, MVT::i32);		SDValue SRIdxVal = DAG.getTargetConstant(PPC::sub_eq, DL, MVT::i32);
return SDValue(DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i1,		return SDValue(DAG.getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i1,
FTSQRT, SRIdxVal),		FTSQRT, SRIdxVal),
0);		0);
}		}

		SDValue
		PPCTargetLowering::getSqrtResultForDenormInput(SDValue Op,
		SelectionDAG &DAG) const {
		// TODO - add support for v2f64/v4f32
		EVT VT = Op.getValueType();
		if (VT != MVT::f64)
		return TargetLowering::getSqrtResultForDenormInput(Op, DAG);
		shchenz (inline comment on the f64 check): Target-independent code supports vector types, and we also have vector sqrt instructions on the PowerPC target. Can we use them for vector types like v4f32 or v2f64?
		qiucf (inline comment): Yes. Supported types: VSX off: `f32`, `f64`; VSX on: `f32`, `f64`, `f128`, `v4f32`, `v2f64`
		steven.zhang (author, inline comment): Yes, I will post another patch for the vector types, as I mentioned in the comments of D80706.

		return DAG.getNode(PPCISD::FSQRT, SDLoc(Op), VT, Op);
		}

SDValue PPCTargetLowering::getSqrtEstimate(SDValue Operand, SelectionDAG &DAG,		SDValue PPCTargetLowering::getSqrtEstimate(SDValue Operand, SelectionDAG &DAG,
int Enabled, int &RefinementSteps,		int Enabled, int &RefinementSteps,
bool &UseOneConstNR,		bool &UseOneConstNR,
bool Reciprocal) const {		bool Reciprocal) const {
EVT VT = Operand.getValueType();		EVT VT = Operand.getValueType();
if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||		if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||		(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||		(VT == MVT::v4f32 && Subtarget.hasAltivec()) ||

llvm/lib/Target/PowerPC/PPCInstrInfo.td

]>;		]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// PowerPC specific DAG Nodes.		// PowerPC specific DAG Nodes.
//		//

def PPCfre : SDNode<"PPCISD::FRE", SDTFPUnaryOp, []>;		def PPCfre : SDNode<"PPCISD::FRE", SDTFPUnaryOp, []>;
def PPCfrsqrte: SDNode<"PPCISD::FRSQRTE", SDTFPUnaryOp, []>;		def PPCfrsqrte: SDNode<"PPCISD::FRSQRTE", SDTFPUnaryOp, []>;
		def PPCfsqrt : SDNode<"PPCISD::FSQRT", SDTFPUnaryOp, []>;
def PPCftsqrt : SDNode<"PPCISD::FTSQRT", SDT_PPCFtsqrt,[]>;		def PPCftsqrt : SDNode<"PPCISD::FTSQRT", SDT_PPCFtsqrt,[]>;

def PPCfcfid : SDNode<"PPCISD::FCFID", SDTFPUnaryOp, []>;		def PPCfcfid : SDNode<"PPCISD::FCFID", SDTFPUnaryOp, []>;
def PPCfcfidu : SDNode<"PPCISD::FCFIDU", SDTFPUnaryOp, []>;		def PPCfcfidu : SDNode<"PPCISD::FCFIDU", SDTFPUnaryOp, []>;
def PPCfcfids : SDNode<"PPCISD::FCFIDS", SDTFPRoundOp, []>;		def PPCfcfids : SDNode<"PPCISD::FCFIDS", SDTFPRoundOp, []>;
def PPCfcfidus: SDNode<"PPCISD::FCFIDUS", SDTFPRoundOp, []>;		def PPCfcfidus: SDNode<"PPCISD::FCFIDUS", SDTFPRoundOp, []>;
def PPCfctidz : SDNode<"PPCISD::FCTIDZ", SDTFPUnaryOp, []>;		def PPCfctidz : SDNode<"PPCISD::FCTIDZ", SDTFPUnaryOp, []>;
def PPCfctiwz : SDNode<"PPCISD::FCTIWZ", SDTFPUnaryOp, []>;		def PPCfctiwz : SDNode<"PPCISD::FCTIWZ", SDTFPUnaryOp, []>;
[...]	defm FSQRT : XForm_26r<63, 22, (outs f8rc:$frD), (ins f8rc:$frB),
"fsqrt", "$frD, $frB", IIC_FPSqrtD,		"fsqrt", "$frD, $frB", IIC_FPSqrtD,
[(set f64:$frD, (any_fsqrt f64:$frB))]>;		[(set f64:$frD, (any_fsqrt f64:$frB))]>;
defm FSQRTS : XForm_26r<59, 22, (outs f4rc:$frD), (ins f4rc:$frB),		defm FSQRTS : XForm_26r<59, 22, (outs f4rc:$frD), (ins f4rc:$frB),
"fsqrts", "$frD, $frB", IIC_FPSqrtS,		"fsqrts", "$frD, $frB", IIC_FPSqrtS,
[(set f32:$frD, (any_fsqrt f32:$frB))]>;		[(set f32:$frD, (any_fsqrt f32:$frB))]>;
}		}
}		}

		def : Pat<(PPCfsqrt f64:$frA), (FSQRT $frA)>;

/// Note that FMR is defined as pseudo-ops on the PPC970 because they are		/// Note that FMR is defined as pseudo-ops on the PPC970 because they are
/// often coalesced away and we don't want the dispatch group builder to think		/// often coalesced away and we don't want the dispatch group builder to think
/// that they will fill slots (which could cause the load of a LSU reject to		/// that they will fill slots (which could cause the load of a LSU reject to
/// sneak into a d-group with a store).		/// sneak into a d-group with a store).
let hasSideEffects = 0, Predicates = [HasFPU] in		let hasSideEffects = 0, Predicates = [HasFPU] in
defm FMR : XForm_26r<63, 72, (outs f4rc:$frD), (ins f4rc:$frB),		defm FMR : XForm_26r<63, 72, (outs f4rc:$frD), (ins f4rc:$frB),
"fmr", "$frD, $frB", IIC_FPGeneral,		"fmr", "$frD, $frB", IIC_FPGeneral,
[]>, // (set f32:$frD, f32:$frB)		[]>, // (set f32:$frD, f32:$frB)

llvm/lib/Target/PowerPC/PPCInstrVSX.td


	def : Pat<(PPCfnmsub v4f32:$A, v4f32:$B, v4f32:$C),			def : Pat<(PPCfnmsub v4f32:$A, v4f32:$B, v4f32:$C),
	(XVNMSUBASP $C, $A, $B)>;			(XVNMSUBASP $C, $A, $B)>;
	def : Pat<(fneg (PPCfnmsub v4f32:$A, v4f32:$B, v4f32:$C)),			def : Pat<(fneg (PPCfnmsub v4f32:$A, v4f32:$B, v4f32:$C)),
	(XVMSUBASP $C, $A, $B)>;			(XVMSUBASP $C, $A, $B)>;
	def : Pat<(PPCfnmsub v4f32:$A, v4f32:$B, (fneg v4f32:$C)),			def : Pat<(PPCfnmsub v4f32:$A, v4f32:$B, (fneg v4f32:$C)),
	(XVNMADDASP $C, $A, $B)>;			(XVNMADDASP $C, $A, $B)>;

				def : Pat<(PPCfsqrt f64:$frA), (XSSQRTDP $frA)>;

	def : Pat<(v2f64 (bitconvert v4f32:$A)),			def : Pat<(v2f64 (bitconvert v4f32:$A)),
	(COPY_TO_REGCLASS $A, VSRC)>;			(COPY_TO_REGCLASS $A, VSRC)>;
	def : Pat<(v2f64 (bitconvert v4i32:$A)),			def : Pat<(v2f64 (bitconvert v4i32:$A)),
	(COPY_TO_REGCLASS $A, VSRC)>;			(COPY_TO_REGCLASS $A, VSRC)>;
	def : Pat<(v2f64 (bitconvert v8i16:$A)),			def : Pat<(v2f64 (bitconvert v8i16:$A)),
	(COPY_TO_REGCLASS $A, VSRC)>;			(COPY_TO_REGCLASS $A, VSRC)>;
	def : Pat<(v2f64 (bitconvert v16i8:$A)),			def : Pat<(v2f64 (bitconvert v16i8:$A)),
	(COPY_TO_REGCLASS $A, VSRC)>;			(COPY_TO_REGCLASS $A, VSRC)>;

llvm/test/CodeGen/PowerPC/fma-mutate.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr7 -mattr=+vsx -disable-ppc-vsx-fma-mutation=false | FileCheck %s

	declare double @llvm.sqrt.f64(double)			declare double @llvm.sqrt.f64(double)

	; Test several VSX FMA mutation opportunities.			; Test several VSX FMA mutation opportunities.

	; This is reasonable transformation since it eliminates extra register copy.			; This is reasonable transformation since it eliminates extra register copy.
	define double @foo3_fmf(double %a) nounwind {			define double @foo3_fmf(double %a) nounwind {
	; CHECK-LABEL: foo3_fmf:			; CHECK-LABEL: foo3_fmf:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xstsqrtdp 0, 1			; CHECK-NEXT: xstsqrtdp 0, 1
	; CHECK-NEXT: xxlxor 0, 0, 0
	; CHECK-NEXT: bc 12, 2, .LBB0_2			; CHECK-NEXT: bc 12, 2, .LBB0_2
	; CHECK-NEXT: # %bb.1:			; CHECK-NEXT: # %bb.1:
	; CHECK-NEXT: xsrsqrtedp 0, 1			; CHECK-NEXT: xsrsqrtedp 0, 1
	; CHECK-NEXT: addis 3, 2, .LCPI0_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI0_0@toc@ha
	; CHECK-NEXT: lfs 3, .LCPI0_0@toc@l(3)			; CHECK-NEXT: lfs 3, .LCPI0_0@toc@l(3)
	; CHECK-NEXT: addis 3, 2, .LCPI0_1@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI0_1@toc@ha
	; CHECK-NEXT: lfs 4, .LCPI0_1@toc@l(3)			; CHECK-NEXT: lfs 4, .LCPI0_1@toc@l(3)
	; CHECK-NEXT: xsmuldp 2, 1, 0			; CHECK-NEXT: xsmuldp 2, 1, 0
	; CHECK-NEXT: xsmaddmdp 2, 0, 3			; CHECK-NEXT: xsmaddmdp 2, 0, 3
	; CHECK-NEXT: xsmuldp 0, 0, 4			; CHECK-NEXT: xsmuldp 0, 0, 4
	; CHECK-NEXT: xsmuldp 0, 0, 2			; CHECK-NEXT: xsmuldp 0, 0, 2
	; CHECK-NEXT: xsmuldp 1, 1, 0			; CHECK-NEXT: xsmuldp 1, 1, 0
	; CHECK-NEXT: xsmaddadp 3, 1, 0			; CHECK-NEXT: xsmaddadp 3, 1, 0
	; CHECK-NEXT: xsmuldp 0, 1, 4			; CHECK-NEXT: xsmuldp 0, 1, 4
	; CHECK-NEXT: xsmuldp 0, 0, 3			; CHECK-NEXT: xsmuldp 1, 0, 3
				; CHECK-NEXT: blr
	; CHECK-NEXT: .LBB0_2:			; CHECK-NEXT: .LBB0_2:
	; CHECK-NEXT: fmr 1, 0			; CHECK-NEXT: xssqrtdp 1, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%r = call reassoc afn ninf double @llvm.sqrt.f64(double %a)			%r = call reassoc afn ninf double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r
	}			}

	define double @foo3_safe(double %a) nounwind {			define double @foo3_safe(double %a) nounwind {
	; CHECK-LABEL: foo3_safe:			; CHECK-LABEL: foo3_safe:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: xssqrtdp 1, 1			; CHECK-NEXT: xssqrtdp 1, 1
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%r = call double @llvm.sqrt.f64(double %a)			%r = call double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r
	}			}

llvm/test/CodeGen/PowerPC/recipest.ll

	; CHECK-P7-NEXT: fmul 0, 0, 4			; CHECK-P7-NEXT: fmul 0, 0, 4
	; CHECK-P7-NEXT: fmul 0, 0, 2			; CHECK-P7-NEXT: fmul 0, 0, 2
	; CHECK-P7-NEXT: fmul 1, 1, 0			; CHECK-P7-NEXT: fmul 1, 1, 0
	; CHECK-P7-NEXT: fmadd 0, 1, 0, 3			; CHECK-P7-NEXT: fmadd 0, 1, 0, 3
	; CHECK-P7-NEXT: fmul 1, 1, 4			; CHECK-P7-NEXT: fmul 1, 1, 4
	; CHECK-P7-NEXT: fmul 1, 1, 0			; CHECK-P7-NEXT: fmul 1, 1, 0
	; CHECK-P7-NEXT: blr			; CHECK-P7-NEXT: blr
	; CHECK-P7-NEXT: .LBB20_2:			; CHECK-P7-NEXT: .LBB20_2:
	; CHECK-P7-NEXT: addis 3, 2, .LCPI20_2@toc@ha			; CHECK-P7-NEXT: fsqrt 1, 1
	; CHECK-P7-NEXT: lfs 1, .LCPI20_2@toc@l(3)
	; CHECK-P7-NEXT: blr			; CHECK-P7-NEXT: blr
	;			;
	; CHECK-P8-LABEL: foo3_fmf:			; CHECK-P8-LABEL: foo3_fmf:
	; CHECK-P8: # %bb.0:			; CHECK-P8: # %bb.0:
	; CHECK-P8-NEXT: xstsqrtdp 0, 1			; CHECK-P8-NEXT: xstsqrtdp 0, 1
	; CHECK-P8-NEXT: xxlxor 0, 0, 0
	; CHECK-P8-NEXT: bc 12, 2, .LBB20_2			; CHECK-P8-NEXT: bc 12, 2, .LBB20_2
	; CHECK-P8-NEXT: # %bb.1:			; CHECK-P8-NEXT: # %bb.1:
	; CHECK-P8-NEXT: xsrsqrtedp 0, 1			; CHECK-P8-NEXT: xsrsqrtedp 0, 1
	; CHECK-P8-NEXT: addis 3, 2, .LCPI20_0@toc@ha			; CHECK-P8-NEXT: addis 3, 2, .LCPI20_0@toc@ha
	; CHECK-P8-NEXT: lfs 3, .LCPI20_0@toc@l(3)			; CHECK-P8-NEXT: lfs 3, .LCPI20_0@toc@l(3)
	; CHECK-P8-NEXT: addis 3, 2, .LCPI20_1@toc@ha			; CHECK-P8-NEXT: addis 3, 2, .LCPI20_1@toc@ha
	; CHECK-P8-NEXT: lfs 4, .LCPI20_1@toc@l(3)			; CHECK-P8-NEXT: lfs 4, .LCPI20_1@toc@l(3)
	; CHECK-P8-NEXT: fmr 5, 3			; CHECK-P8-NEXT: fmr 5, 3
	; CHECK-P8-NEXT: xsmuldp 2, 1, 0			; CHECK-P8-NEXT: xsmuldp 2, 1, 0
	; CHECK-P8-NEXT: xsmaddadp 5, 2, 0			; CHECK-P8-NEXT: xsmaddadp 5, 2, 0
	; CHECK-P8-NEXT: xsmuldp 0, 0, 4			; CHECK-P8-NEXT: xsmuldp 0, 0, 4
	; CHECK-P8-NEXT: xsmuldp 0, 0, 5			; CHECK-P8-NEXT: xsmuldp 0, 0, 5
	; CHECK-P8-NEXT: xsmuldp 1, 1, 0			; CHECK-P8-NEXT: xsmuldp 1, 1, 0
	; CHECK-P8-NEXT: xsmaddadp 3, 1, 0			; CHECK-P8-NEXT: xsmaddadp 3, 1, 0
	; CHECK-P8-NEXT: xsmuldp 0, 1, 4			; CHECK-P8-NEXT: xsmuldp 0, 1, 4
	; CHECK-P8-NEXT: xsmuldp 0, 0, 3			; CHECK-P8-NEXT: xsmuldp 1, 0, 3
				; CHECK-P8-NEXT: blr
	; CHECK-P8-NEXT: .LBB20_2:			; CHECK-P8-NEXT: .LBB20_2:
	; CHECK-P8-NEXT: fmr 1, 0			; CHECK-P8-NEXT: xssqrtdp 1, 1
	; CHECK-P8-NEXT: blr			; CHECK-P8-NEXT: blr
	;			;
	; CHECK-P9-LABEL: foo3_fmf:			; CHECK-P9-LABEL: foo3_fmf:
	; CHECK-P9: # %bb.0:			; CHECK-P9: # %bb.0:
	; CHECK-P9-NEXT: xstsqrtdp 0, 1			; CHECK-P9-NEXT: xstsqrtdp 0, 1
	; CHECK-P9-NEXT: xxlxor 0, 0, 0
	; CHECK-P9-NEXT: bc 12, 2, .LBB20_2			; CHECK-P9-NEXT: bc 12, 2, .LBB20_2
	; CHECK-P9-NEXT: # %bb.1:			; CHECK-P9-NEXT: # %bb.1:
	; CHECK-P9-NEXT: xsrsqrtedp 0, 1			; CHECK-P9-NEXT: xsrsqrtedp 0, 1
	; CHECK-P9-NEXT: addis 3, 2, .LCPI20_0@toc@ha			; CHECK-P9-NEXT: addis 3, 2, .LCPI20_0@toc@ha
	; CHECK-P9-NEXT: lfs 3, .LCPI20_0@toc@l(3)			; CHECK-P9-NEXT: lfs 3, .LCPI20_0@toc@l(3)
	; CHECK-P9-NEXT: addis 3, 2, .LCPI20_1@toc@ha			; CHECK-P9-NEXT: addis 3, 2, .LCPI20_1@toc@ha
	; CHECK-P9-NEXT: xsmuldp 2, 1, 0			; CHECK-P9-NEXT: xsmuldp 2, 1, 0
	; CHECK-P9-NEXT: fmr 4, 3			; CHECK-P9-NEXT: fmr 4, 3
	; CHECK-P9-NEXT: xsmaddadp 4, 2, 0			; CHECK-P9-NEXT: xsmaddadp 4, 2, 0
	; CHECK-P9-NEXT: lfs 2, .LCPI20_1@toc@l(3)			; CHECK-P9-NEXT: lfs 2, .LCPI20_1@toc@l(3)
	; CHECK-P9-NEXT: xsmuldp 0, 0, 2			; CHECK-P9-NEXT: xsmuldp 0, 0, 2
	; CHECK-P9-NEXT: xsmuldp 0, 0, 4			; CHECK-P9-NEXT: xsmuldp 0, 0, 4
	; CHECK-P9-NEXT: xsmuldp 1, 1, 0			; CHECK-P9-NEXT: xsmuldp 1, 1, 0
	; CHECK-P9-NEXT: xsmaddadp 3, 1, 0			; CHECK-P9-NEXT: xsmaddadp 3, 1, 0
	; CHECK-P9-NEXT: xsmuldp 0, 1, 2			; CHECK-P9-NEXT: xsmuldp 0, 1, 2
	; CHECK-P9-NEXT: xsmuldp 0, 0, 3			; CHECK-P9-NEXT: xsmuldp 1, 0, 3
				; CHECK-P9-NEXT: blr
	; CHECK-P9-NEXT: .LBB20_2:			; CHECK-P9-NEXT: .LBB20_2:
	; CHECK-P9-NEXT: fmr 1, 0			; CHECK-P9-NEXT: xssqrtdp 1, 1
	; CHECK-P9-NEXT: blr			; CHECK-P9-NEXT: blr
	%r = call reassoc ninf afn double @llvm.sqrt.f64(double %a)			%r = call reassoc ninf afn double @llvm.sqrt.f64(double %a)
	ret double %r			ret double %r
	}			}

	define double @foo3_safe(double %a) nounwind {			define double @foo3_safe(double %a) nounwind {
	; CHECK-P7-LABEL: foo3_safe:			; CHECK-P7-LABEL: foo3_safe:
	; CHECK-P7: # %bb.0:			; CHECK-P7: # %bb.0:
	[...]
	; CHECK-P9-NEXT: blr			; CHECK-P9-NEXT: blr
	%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)			%r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @hoo4_fmf(<2 x double> %a) #1 {			define <2 x double> @hoo4_fmf(<2 x double> %a) #1 {
	; CHECK-P7-LABEL: hoo4_fmf:			; CHECK-P7-LABEL: hoo4_fmf:
	; CHECK-P7: # %bb.0:			; CHECK-P7: # %bb.0:
	; CHECK-P7-NEXT: addis 3, 2, .LCPI26_2@toc@ha
	; CHECK-P7-NEXT: ftsqrt 0, 1			; CHECK-P7-NEXT: ftsqrt 0, 1
	; CHECK-P7-NEXT: fmr 3, 1			; CHECK-P7-NEXT: addis 3, 2, .LCPI26_0@toc@ha
	; CHECK-P7-NEXT: addis 4, 2, .LCPI26_0@toc@ha			; CHECK-P7-NEXT: addis 4, 2, .LCPI26_1@toc@ha
	; CHECK-P7-NEXT: lfs 0, .LCPI26_2@toc@l(3)			; CHECK-P7-NEXT: lfs 3, .LCPI26_0@toc@l(3)
	; CHECK-P7-NEXT: addis 3, 2, .LCPI26_1@toc@ha			; CHECK-P7-NEXT: lfs 0, .LCPI26_1@toc@l(4)
	; CHECK-P7-NEXT: lfs 5, .LCPI26_0@toc@l(4)			; CHECK-P7-NEXT: bc 12, 2, .LBB26_3
	; CHECK-P7-NEXT: lfs 4, .LCPI26_1@toc@l(3)
	; CHECK-P7-NEXT: fmr 1, 0
	; CHECK-P7-NEXT: bc 4, 2, .LBB26_3
	; CHECK-P7-NEXT: # %bb.1:			; CHECK-P7-NEXT: # %bb.1:
				; CHECK-P7-NEXT: frsqrte 4, 1
				; CHECK-P7-NEXT: fmul 5, 1, 4
				; CHECK-P7-NEXT: fmadd 5, 5, 4, 3
				; CHECK-P7-NEXT: fmul 4, 4, 0
				; CHECK-P7-NEXT: fmul 4, 4, 5
				; CHECK-P7-NEXT: fmul 1, 1, 4
				; CHECK-P7-NEXT: fmadd 4, 1, 4, 3
				; CHECK-P7-NEXT: fmul 1, 1, 0
				; CHECK-P7-NEXT: fmul 1, 1, 4
	; CHECK-P7-NEXT: ftsqrt 0, 2			; CHECK-P7-NEXT: ftsqrt 0, 2
	; CHECK-P7-NEXT: bc 4, 2, .LBB26_4			; CHECK-P7-NEXT: bc 4, 2, .LBB26_4
	; CHECK-P7-NEXT: .LBB26_2:			; CHECK-P7-NEXT: .LBB26_2:
	; CHECK-P7-NEXT: fmr 2, 0			; CHECK-P7-NEXT: fsqrt 2, 2
	; CHECK-P7-NEXT: blr			; CHECK-P7-NEXT: blr
	; CHECK-P7-NEXT: .LBB26_3:			; CHECK-P7-NEXT: .LBB26_3:
	; CHECK-P7-NEXT: frsqrte 1, 3			; CHECK-P7-NEXT: fsqrt 1, 1
	; CHECK-P7-NEXT: fmul 6, 3, 1
	; CHECK-P7-NEXT: fmadd 6, 6, 1, 5
	; CHECK-P7-NEXT: fmul 1, 1, 4
	; CHECK-P7-NEXT: fmul 1, 1, 6
	; CHECK-P7-NEXT: fmul 3, 3, 1
	; CHECK-P7-NEXT: fmadd 1, 3, 1, 5
	; CHECK-P7-NEXT: fmul 3, 3, 4
	; CHECK-P7-NEXT: fmul 1, 3, 1
	; CHECK-P7-NEXT: ftsqrt 0, 2			; CHECK-P7-NEXT: ftsqrt 0, 2
	; CHECK-P7-NEXT: bc 12, 2, .LBB26_2			; CHECK-P7-NEXT: bc 12, 2, .LBB26_2
	; CHECK-P7-NEXT: .LBB26_4:			; CHECK-P7-NEXT: .LBB26_4:
	; CHECK-P7-NEXT: frsqrte 0, 2			; CHECK-P7-NEXT: frsqrte 4, 2
	; CHECK-P7-NEXT: fmul 3, 2, 0			; CHECK-P7-NEXT: fmul 5, 2, 4
	; CHECK-P7-NEXT: fmadd 3, 3, 0, 5			; CHECK-P7-NEXT: fmadd 5, 5, 4, 3
	; CHECK-P7-NEXT: fmul 0, 0, 4			; CHECK-P7-NEXT: fmul 4, 4, 0
	; CHECK-P7-NEXT: fmul 0, 0, 3			; CHECK-P7-NEXT: fmul 4, 4, 5
	; CHECK-P7-NEXT: fmul 2, 2, 0
	; CHECK-P7-NEXT: fmadd 0, 2, 0, 5
	; CHECK-P7-NEXT: fmul 2, 2, 4			; CHECK-P7-NEXT: fmul 2, 2, 4
				; CHECK-P7-NEXT: fmadd 3, 2, 4, 3
	; CHECK-P7-NEXT: fmul 0, 2, 0			; CHECK-P7-NEXT: fmul 0, 2, 0
	; CHECK-P7-NEXT: fmr 2, 0			; CHECK-P7-NEXT: fmul 2, 0, 3
	; CHECK-P7-NEXT: blr			; CHECK-P7-NEXT: blr
	;			;
	; CHECK-P8-LABEL: hoo4_fmf:			; CHECK-P8-LABEL: hoo4_fmf:
	; CHECK-P8: # %bb.0:			; CHECK-P8: # %bb.0:
	; CHECK-P8-NEXT: xvrsqrtedp 0, 34			; CHECK-P8-NEXT: xvrsqrtedp 0, 34
	; CHECK-P8-NEXT: addis 3, 2, .LCPI26_0@toc@ha			; CHECK-P8-NEXT: addis 3, 2, .LCPI26_0@toc@ha
	; CHECK-P8-NEXT: addi 3, 3, .LCPI26_0@toc@l			; CHECK-P8-NEXT: addi 3, 3, .LCPI26_0@toc@l
	; CHECK-P8-NEXT: lxvd2x 1, 0, 3			; CHECK-P8-NEXT: lxvd2x 1, 0, 3
