This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Custom lowering constrained fps.
AbandonedPublic

Authored by arsenm on Oct 6 2017, 9:38 AM.


Details

Reviewers

andrew.w.kaylor
b-sumner
rampitec
kzhuravl
wdng

Summary

This patch only demonstrates one way to custom-lower the constrained fma operation.

What this patch does:

Expands the SDNodeFlags APIs so that SDNodeFlags are set during the initial DAG build phase, when the constrained FP metadata is read.
The AMDGPU backend sets up register modes based on the retrieved SDNodeFlags.

Diff Detail

Event Timeline

wdng created this revision.Oct 6 2017, 9:38 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others.Oct 6 2017, 9:38 AM

Fixed format issue.

andrew.w.kaylor requested changes to this revision.Oct 6 2017, 12:09 PM

andrew.w.kaylor added inline comments.

include/llvm/CodeGen/SelectionDAGNodes.h
368	I don't really like the fact that these are separate flags, given that they're mutually exclusive. Also, I think we're eventually going to need to be able to distinguish between assumed rounding modes (where the instruction encoding isn't expected to include the rounding mode) and forced rounding modes (where the rounding mode will be encoded in the instruction). I don't have a specific vision for how that will need to work, but I know there are instructions that work this way and we'll need to handle at least intrinsics that use them. As I recall someone at AMD mentioned wanting behavior like that for flush-to-zero also. The currently documented behavior of the constrained FP intrinsics is that the rounding mode tells the optimizer what it may assume about the rounding mode at the intrinsic location. Something else must have been done to set the rounding mode. If you are lowering to instructions that include a rounding mode, how do you handle the RoundDynamic case?
384–386	I don't think all of the rounding modes can default to false. Maybe you need a RoundDefault option. For instance, if SDNodeFlags::isDefined() returns true, but the node doesn't have a constrained rounding mode I'd need to check four different flags and then make an assumption to see that.
510	I think you need some logic here to set the rounding mode to dynamic if the flags being intersected conflict.
515	There needs to be a hierarchy here. For instance, if you merge ExceptIgnore with ExceptStrict it should result in ExceptStrict.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
943	This change will fail on all platforms except AMDGPU as you have this patch written. If you need target-specific behavior here, we'll need a target hook of some kind.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5471	All of the intrinsics above from fadd to fma fall through into this code. That's obviously not what you intended.
5473	What about the other rounding modes?
lib/Target/AMDGPU/SIISelLowering.cpp
3177	If you're going to make the changes in this patch, you need at least reasonable default behavior for all other platforms.

This revision now requires changes to proceed.Oct 6 2017, 12:09 PM
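
The SelectionDAGNodes.h comments above (mutually exclusive flags, a missing default, conflict handling on intersection, and an exception hierarchy) can be illustrated with a minimal sketch. This is not code from the patch: RoundingMode, ExceptionBehavior, ConstrainedFPState, and intersect are hypothetical names, and treating any mismatch (including Default against a concrete mode) as a conflict is an assumption made only for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not LLVM types.
enum class RoundingMode : std::uint8_t {
  Default,    // node carries no constrained rounding mode
  Dynamic,    // rounding mode is unknown at this point
  ToNearest,
  Downward,
  Upward,
  TowardZero
};

// Ordered from least to most restrictive so a merge can take the maximum.
enum class ExceptionBehavior : std::uint8_t { Ignore = 0, MayTrap = 1, Strict = 2 };

struct ConstrainedFPState {
  RoundingMode Rounding = RoundingMode::Default;
  ExceptionBehavior Except = ExceptionBehavior::Ignore;
};

// Merge two states: conflicting rounding modes degrade to Dynamic, and the
// stricter exception behavior wins (Ignore merged with Strict gives Strict).
ConstrainedFPState intersect(ConstrainedFPState A, ConstrainedFPState B) {
  ConstrainedFPState R;
  R.Rounding = (A.Rounding == B.Rounding) ? A.Rounding : RoundingMode::Dynamic;
  R.Except = std::max(A.Except, B.Except);
  return R;
}
```

A single enum-valued field makes an invalid state (two rounding modes set at once) unrepresentable, which a bitwise &= over independent one-bit flags cannot guarantee.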

wdng added a reviewer: rampitec.Oct 9 2017, 2:58 PM

rampitec added inline comments.Oct 9 2017, 3:17 PM

include/llvm/CodeGen/SelectionDAGNodes.h
368	As these flags only apply to the ISD::STRICT_* opcodes you probably do not need to add fields to a generic SDNodeFlags. STRICT_* opcodes can get an extra operand for rounding mode.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
943	Looks like a good place to check for TLI.getOperationAction on the incoming Opcode and keep it as is if it is custom.
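
The LegalizeDAG.cpp suggestion above could look roughly like the following standalone model. It does not use the LLVM API: Opcode, Action, TargetModel, and equivalentOpcode are hypothetical stand-ins, and the real getStrictFPOpcodeAction also takes a value type. The point is only the order of the checks: query the incoming strict opcode first and keep Custom if the target asked for it, otherwise fall back to the non-strict equivalent.

```cpp
#include <map>

// Hypothetical model of the legalizer query; not the LLVM API.
enum class Opcode { FMA, FSQRT, STRICT_FMA, STRICT_FSQRT };
enum class Action { Legal, Promote, Expand, Custom };

struct TargetModel {
  std::map<Opcode, Action> Actions;  // opcodes not listed are treated as Legal
  Action getOperationAction(Opcode Op) const {
    auto It = Actions.find(Op);
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// Maps a strict pseudo-opcode to its ordinary equivalent.
Opcode equivalentOpcode(Opcode Strict) {
  switch (Strict) {
  case Opcode::STRICT_FMA:   return Opcode::FMA;
  case Opcode::STRICT_FSQRT: return Opcode::FSQRT;
  default:                   return Strict;
  }
}

Action getStrictFPOpcodeAction(const TargetModel &TLI, Opcode Strict) {
  // A target that registered Custom for the strict opcode itself keeps it.
  if (TLI.getOperationAction(Strict) == Action::Custom)
    return Action::Custom;
  // Every other target falls back to the equivalent opcode; anything that
  // is not Legal there is expanded, as in the existing code.
  Action A = TLI.getOperationAction(equivalentOpcode(Strict));
  return A == Action::Legal ? Action::Legal : Action::Expand;
}
```

With this shape, targets that never mention the strict opcodes see the same behavior as before, which addresses the concern about breaking non-AMDGPU platforms.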

Instead of using SDNodeFlags to store the metadata information, this patch now appends an extra rounding-mode operand during the DAG build phase, following Stas's suggestion. It currently implements strict FP lowering for fadd, fsub, fmul, fma, and fsqrt. Many thanks to @andrew.w.kaylor and @rampitec for their comments!

Known issues:

FDIV is a special case; a separate patch will be created for it.
The "round.dynamic" rounding mode is not handled yet.
A separate patch will be created for the f16 data type.
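
As a rough illustration of the extra-operand approach described above, the sketch below turns the rounding-mode metadata string of a constrained intrinsic into a small integer that could be appended as an operand. The metadata strings are the ones the constrained intrinsics use; parseRoundingMode, roundingOperand, and the numeric encoding are hypothetical and not taken from the patch. Returning no operand for "round.dynamic" mirrors the known issue listed above.

```cpp
#include <optional>
#include <string_view>

enum class RoundingMode { Dynamic, ToNearest, Downward, Upward, TowardZero };

// Translates the metadata string into an enum; unknown strings yield nullopt.
std::optional<RoundingMode> parseRoundingMode(std::string_view MD) {
  if (MD == "round.dynamic")    return RoundingMode::Dynamic;
  if (MD == "round.tonearest")  return RoundingMode::ToNearest;
  if (MD == "round.downward")   return RoundingMode::Downward;
  if (MD == "round.upward")     return RoundingMode::Upward;
  if (MD == "round.towardzero") return RoundingMode::TowardZero;
  return std::nullopt;
}

// Hypothetical operand encoding: the enum value itself. Dynamic is rejected
// because this sketch, like the patch, does not handle it.
std::optional<unsigned> roundingOperand(std::string_view MD) {
  std::optional<RoundingMode> RM = parseRoundingMode(MD);
  if (!RM || *RM == RoundingMode::Dynamic)
    return std::nullopt;
  return static_cast<unsigned>(*RM);
}
```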

wdng marked 9 inline comments as done.Oct 16 2017, 11:36 AM

wdng retitled this revision from AMDGPU : Expand SDNodeFlags APIs & custom lowering constrained fps. to AMDGPU : Custom lowering constrained fps..

rampitec added inline comments.Oct 16 2017, 2:35 PM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
206	You have removed it but declaration remains.
783–788	What about f16?
804	Use of "if(cond) ... else .. cond ? :" is weird.
845	f16?
lib/Target/AMDGPU/AMDGPUISelLowering.h
338 ↗	(On Diff #119184)	You do not handle all of these.
lib/Target/AMDGPU/SIISelLowering.cpp
319	These commented-out lines are not needed.
4918	llvm_unreachable
4948	You are already doing translation, you can remove all the switches translating EqOpc into the same with chain. Just use it here.
4995	OK, you have chained all the nodes that need to sit between the two s_setreg statements. How do you prevent other regular FP operations without a chain from being scheduled in between them?

kzhuravl requested changes to this revision.Oct 17 2017, 2:09 PM

kzhuravl added inline comments.

lib/Target/AMDGPU/SIDefines.h
461–462 ↗	(On Diff #119184)	These should go to relevant enums (like Id, Offset, WidthMinusOne, etc.).
lib/Target/AMDGPU/SIISelLowering.cpp
4912	Rename WidthBit to "Offset".
4917	Remove. This can go to default?
4928	New line.
4931	Missing ID_SHIFT_.
4933	What is 1? Do not use bare numbers.

This revision now requires changes to proceed.Oct 17 2017, 2:09 PM
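
The SIDefines.h and SIISelLowering.cpp comments above ask for named enum constants instead of bare numbers when building the s_setreg operand. A minimal sketch of that style follows. The field layout (6-bit register id at bit 0, 5-bit offset at bit 6, 5-bit width-minus-one at bit 11) and the MODE register id of 1 are stated here as assumptions about the hwreg immediate; verify them against SIDefines.h and the ISA documentation before relying on them. The hwreg_sketch namespace and encode function are hypothetical.

```cpp
#include <cstdint>

namespace hwreg_sketch {

// Assumed layout of the 16-bit hwreg immediate; all values are assumptions.
enum Id : unsigned { ID_MODE = 1, ID_SHIFT_ = 0, ID_WIDTH_ = 6 };
enum Offset : unsigned { OFFSET_SHIFT_ = 6, OFFSET_WIDTH_ = 5 };
enum WidthMinusOne : unsigned { WIDTH_M1_SHIFT_ = 11, WIDTH_M1_WIDTH_ = 5 };

// Packs a register id, a bit offset, and a bit width into one immediate.
// The width is stored as width - 1, so BitWidth must be at least 1.
constexpr std::uint16_t encode(unsigned RegId, unsigned BitOffset,
                               unsigned BitWidth) {
  return static_cast<std::uint16_t>((RegId << ID_SHIFT_) |
                                    (BitOffset << OFFSET_SHIFT_) |
                                    ((BitWidth - 1) << WIDTH_M1_SHIFT_));
}

} // namespace hwreg_sketch
```

For example, a two-bit field starting at bit 0 of the MODE register would be written as hwreg_sketch::encode(hwreg_sketch::ID_MODE, 0, 2), with no unexplained literals at the call site beyond the offset and width of the field itself.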

It isn't clear to me how your custom lowering interacts, if at all, with existing table-driven selection patterns. One of the goals in the implementation up to this point has been to have the instruction selection fall back on existing pattern matching as much as possible so that we don't need to duplicate all of the cases that are currently handled. Can you explain to me how this applies in the AMDGPU case?

lib/Target/AMDGPU/SIISelLowering.cpp
4925	You have upward and downward reversed.
4941	I think you're interpreting the rounding mode argument differently than I have, and therefore differently than the documentation in the LLVM Language Reference ("therefore" because I wrote the documentation). My intention was that the rounding mode argument was provided as information to the optimizer. It tells the optimizer what it can assume about rounding mode at the point of the operation. It was not intended to actually set the rounding mode. I'm approaching this from the perspective of the STDC pragmas related to the FP environment. My understanding of these is that if FENV_ACCESS on is declared, we must assume dynamic (i.e. unknown) rounding mode in those scopes unless we can prove otherwise, but if the user wants to change the rounding mode a specific function call (such as fesetround) will be used. I'm not sure what sort of front end you are assuming here, so that may explain the difference in your approach. There are some x86 instructions that can incorporate a rounding mode operand, and it is my understanding that the AMDGPU architecture has similar needs. However, I believe we will need to extend the constrained FP intrinsics (or possibly introduce new intrinsics) to handle cases like that.
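
The source-level model described in the comment above, where the rounding mode is changed by an explicit call such as fesetround and the operation itself only runs under whatever mode is current, can be sketched in standard C++ with <cfenv>. divideWithRounding is a hypothetical helper. The volatile qualifiers are a workaround chosen for this sketch, since standard C++ offers no portable equivalent of the FENV_ACCESS pragma, and the example assumes a platform that defines FE_UPWARD and FE_DOWNWARD.

```cpp
#include <cfenv>

// Divides A by B under the requested rounding mode, then restores the
// previously active mode. The volatile accesses keep the compiler from
// constant-folding the division or moving it across the fesetround calls.
double divideWithRounding(double A, double B, int Mode) {
  const int Saved = std::fegetround();
  std::fesetround(Mode);       // the mode is set by an explicit call
  volatile double X = A;
  volatile double Y = B;
  volatile double Q = X / Y;   // executes under the mode set above
  std::fesetround(Saved);
  return Q;
}
```

Calling divideWithRounding(1.0, 3.0, FE_UPWARD) and divideWithRounding(1.0, 3.0, FE_DOWNWARD) is expected to give results that differ in the last bit on hardware that honors the dynamic rounding mode. Under this model a constrained intrinsic carrying "round.upward" would only tell the optimizer that the first call's division runs in upward mode; it would not perform the fesetround itself.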

arsenm commandeered this revision.Apr 5 2020, 8:48 AM

arsenm edited reviewers, added: wdng; removed: arsenm.

Herald added subscribers: kerbowa, jvesely.Apr 5 2020, 8:48 AM

Needs to be redone

Revision Contents

Path	Size
include/llvm/CodeGen/SelectionDAGNodes.h	70 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	5 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	17 lines
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp	8 lines
lib/Target/AMDGPU/SIISelLowering.h	1 line
lib/Target/AMDGPU/SIISelLowering.cpp	60 lines
test/CodeGen/AMDGPU/constrained_fp.ll	19 lines

Diff 118020

include/llvm/CodeGen/SelectionDAGNodes.h

private:
bool UnsafeAlgebra : 1;		bool UnsafeAlgebra : 1;
bool NoNaNs : 1;		bool NoNaNs : 1;
bool NoInfs : 1;		bool NoInfs : 1;
bool NoSignedZeros : 1;		bool NoSignedZeros : 1;
bool AllowReciprocal : 1;		bool AllowReciprocal : 1;
bool VectorReduction : 1;		bool VectorReduction : 1;
bool AllowContract : 1;		bool AllowContract : 1;

		bool RoundDynamic : 1;
		andrew.w.kaylor: I don't really like the fact that these are separate flags, given that they're mutually exclusive. Also, I think we're eventually going to need to be able to distinguish between assumed rounding modes (where the instruction encoding isn't expected to include the rounding mode) and forced rounding modes (where the rounding mode will be encoded in the instruction). I don't have a specific vision for how that will need to work, but I know there are instructions that work this way and we'll need to handle at least intrinsics that use them. As I recall someone at AMD mentioned wanting behavior like that for flush-to-zero also. The currently documented behavior of the constrained FP intrinsics is that the rounding mode tells the optimizer what it may assume about the rounding mode at the intrinsic location. Something else must have been done to set the rounding mode. If you are lowering to instructions that include a rounding mode, how do you handle the RoundDynamic case?
		rampitec: As these flags only apply to the ISD::STRICT_* opcodes you probably do not need to add fields to a generic SDNodeFlags. STRICT_* opcodes can get an extra operand for rounding mode.
		bool RoundTonearest : 1;
		bool RoundDownward : 1;
		bool RoundUpward : 1;
		bool RoundTowardZero : 1;

		bool ExceptIgnore : 1;
		bool ExceptMayTrap : 1;
		bool ExceptStrict : 1;

public:		public:
/// Default constructor turns off all optimization flags.		/// Default constructor turns off all optimization flags.
SDNodeFlags()		SDNodeFlags()
: AnyDefined(false), NoUnsignedWrap(false), NoSignedWrap(false),		: AnyDefined(false), NoUnsignedWrap(false), NoSignedWrap(false),
Exact(false), UnsafeAlgebra(false), NoNaNs(false), NoInfs(false),		Exact(false), UnsafeAlgebra(false), NoNaNs(false), NoInfs(false),
NoSignedZeros(false), AllowReciprocal(false), VectorReduction(false),		NoSignedZeros(false), AllowReciprocal(false), VectorReduction(false),
AllowContract(false) {}		AllowContract(false), RoundDynamic(false), RoundTonearest(false),
		RoundDownward(false), RoundUpward(false), RoundTowardZero(false),
		ExceptIgnore(false), ExceptMayTrap(false), ExceptStrict(false) {}
		andrew.w.kaylor: I don't think all of the rounding modes can default to false. Maybe you need a RoundDefault option. For instance, if SDNodeFlags::isDefined() returns true, but the node doesn't have a constrained rounding mode I'd need to check four different flags and then make an assumption to see that.

/// Sets the state of the flags to the defined state.		/// Sets the state of the flags to the defined state.
void setDefined() { AnyDefined = true; }		void setDefined() { AnyDefined = true; }
/// Returns true if the flags are in a defined state.		/// Returns true if the flags are in a defined state.
bool isDefined() const { return AnyDefined; }		bool isDefined() const { return AnyDefined; }

// These are mutators for each flag.		// These are mutators for each flag.
void setNoUnsignedWrap(bool b) {		void setNoUnsignedWrap(bool b) {
...
void setVectorReduction(bool b) {
setDefined();		setDefined();
VectorReduction = b;		VectorReduction = b;
}		}
void setAllowContract(bool b) {		void setAllowContract(bool b) {
setDefined();		setDefined();
AllowContract = b;		AllowContract = b;
}		}

		void setRoundDynamic(bool b) {
		setDefined();
		RoundDynamic = b;
		}

		void setRoundTonearest(bool b) {
		setDefined();
		RoundTonearest = b;
		}

		void setRoundDownward(bool b) {
		setDefined();
		RoundDownward = b;
		}

		void setRoundUpward(bool b) {
		setDefined();
		RoundUpward = b;
		}

		void setRoundTowardZero(bool b) {
		setDefined();
		RoundTowardZero = b;
		}

		void setExceptIgnore(bool b) {
		setDefined();
		ExceptIgnore = b;
		}

		void setExceptMayTrap(bool b) {
		setDefined();
		ExceptMayTrap = b;
		}

		void setExceptStrict(bool b) {
		setDefined();
		ExceptStrict = b;
		}

// These are accessors for each flag.		// These are accessors for each flag.
bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }		bool hasNoUnsignedWrap() const { return NoUnsignedWrap; }
bool hasNoSignedWrap() const { return NoSignedWrap; }		bool hasNoSignedWrap() const { return NoSignedWrap; }
bool hasExact() const { return Exact; }		bool hasExact() const { return Exact; }
bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }		bool hasUnsafeAlgebra() const { return UnsafeAlgebra; }
bool hasNoNaNs() const { return NoNaNs; }		bool hasNoNaNs() const { return NoNaNs; }
bool hasNoInfs() const { return NoInfs; }		bool hasNoInfs() const { return NoInfs; }
bool hasNoSignedZeros() const { return NoSignedZeros; }		bool hasNoSignedZeros() const { return NoSignedZeros; }
bool hasAllowReciprocal() const { return AllowReciprocal; }		bool hasAllowReciprocal() const { return AllowReciprocal; }
bool hasVectorReduction() const { return VectorReduction; }		bool hasVectorReduction() const { return VectorReduction; }
bool hasAllowContract() const { return AllowContract; }		bool hasAllowContract() const { return AllowContract; }
		bool hasRoundDynamic() const { return RoundDynamic; }
		bool hasRoundTonearest() const { return RoundTonearest; }
		bool hasRoundDownward() const { return RoundDownward; }
		bool hasRoundUpward() const { return RoundUpward; }
		bool hasRoundTowardZero() const { return RoundTowardZero; }
		bool hasExceptIgnore() const { return ExceptIgnore; }
		bool hasExceptMayTrap() const { return ExceptMayTrap; }
		bool hasExceptStrict() const { return ExceptStrict; }

/// Clear any flags in this flag set that aren't also set in Flags.		/// Clear any flags in this flag set that aren't also set in Flags.
/// If the given Flags are undefined then don't do anything.		/// If the given Flags are undefined then don't do anything.
void intersectWith(const SDNodeFlags Flags) {		void intersectWith(const SDNodeFlags Flags) {
if (!Flags.isDefined())		if (!Flags.isDefined())
return;		return;
NoUnsignedWrap &= Flags.NoUnsignedWrap;		NoUnsignedWrap &= Flags.NoUnsignedWrap;
NoSignedWrap &= Flags.NoSignedWrap;		NoSignedWrap &= Flags.NoSignedWrap;
Exact &= Flags.Exact;		Exact &= Flags.Exact;
UnsafeAlgebra &= Flags.UnsafeAlgebra;		UnsafeAlgebra &= Flags.UnsafeAlgebra;
NoNaNs &= Flags.NoNaNs;		NoNaNs &= Flags.NoNaNs;
NoInfs &= Flags.NoInfs;		NoInfs &= Flags.NoInfs;
NoSignedZeros &= Flags.NoSignedZeros;		NoSignedZeros &= Flags.NoSignedZeros;
AllowReciprocal &= Flags.AllowReciprocal;		AllowReciprocal &= Flags.AllowReciprocal;
VectorReduction &= Flags.VectorReduction;		VectorReduction &= Flags.VectorReduction;
AllowContract &= Flags.AllowContract;		AllowContract &= Flags.AllowContract;
		RoundDynamic &= Flags.RoundDynamic;
		andrew.w.kaylor: I think you need some logic here to set the rounding mode to dynamic if the flags being intersected conflict.
		RoundTonearest &= Flags.RoundTonearest;
		RoundDownward &= Flags.RoundDownward;
		RoundUpward &= Flags.RoundUpward;
		RoundTowardZero &= Flags.RoundTowardZero;
		ExceptIgnore &= Flags.ExceptIgnore;
		andrew.w.kaylor: There needs to be a hierarchy here. For instance, if you merge ExceptIgnore with ExceptStrict it should result in ExceptStrict.
		ExceptMayTrap &= Flags.ExceptMayTrap;
		ExceptStrict &= Flags.ExceptStrict;
}		}
};		};

/// Represents one node in the SelectionDAG.		/// Represents one node in the SelectionDAG.
///		///
class SDNode : public FoldingSetNode, public ilist_node<SDNode> {		class SDNode : public FoldingSetNode, public ilist_node<SDNode> {
private:		private:
/// The operation that this node performs.		/// The operation that this node performs.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

	static TargetLowering::LegalizeAction			static TargetLowering::LegalizeAction
	getStrictFPOpcodeAction(const TargetLowering &TLI, unsigned Opcode, EVT VT) {			getStrictFPOpcodeAction(const TargetLowering &TLI, unsigned Opcode, EVT VT) {
	unsigned EqOpc;			unsigned EqOpc;
	switch (Opcode) {			switch (Opcode) {
	default: llvm_unreachable("Unexpected FP pseudo-opcode");			default: llvm_unreachable("Unexpected FP pseudo-opcode");
	case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;			case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;
	case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;			case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;
	case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;			case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;
	case ISD::STRICT_FMA: EqOpc = ISD::FMA; break;			case ISD::STRICT_FMA: EqOpc = ISD::STRICT_FMA; break;
				andrew.w.kaylor: This change will fail on all platforms except AMDGPU as you have this patch written. If you need target-specific behavior here, we'll need a target hook of some kind.
				rampitec: Looks like a good place to check for TLI.getOperationAction on the incoming Opcode and keep it as is if it is custom.
	case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;			case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;
	case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;			case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;
	case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;			case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;
	case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;			case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;
	case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;			case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;
	case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;			case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;
	case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;			case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;
	case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;			case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
	case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;			case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
	}			}

	auto Action = TLI.getOperationAction(EqOpc, VT);			auto Action = TLI.getOperationAction(EqOpc, VT);

				if (Action == TargetLowering::Custom)
				return Action;

	// We don't currently handle Custom or Promote for strict FP pseudo-ops.			// We don't currently handle Custom or Promote for strict FP pseudo-ops.
	// For now, we just expand for those cases.			// For now, we just expand for those cases.
	if (Action != TargetLowering::Legal)			if (Action != TargetLowering::Legal)
	Action = TargetLowering::Expand;			Action = TargetLowering::Expand;

	return Action;			return Action;
	}			}


lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp


setValue(&I, DAG.getNode(ISD::FMA, sdl,
getValue(I.getArgOperand(1)),		getValue(I.getArgOperand(1)),
getValue(I.getArgOperand(2))));		getValue(I.getArgOperand(2))));
return nullptr;		return nullptr;
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma: {
		SDNodeFlags SDFlags;
		andrew.w.kaylor: All of the intrinsics above from fadd to fma fall through into this code. That's obviously not what you intended.
		const ConstrainedFPIntrinsic &FPI = cast<ConstrainedFPIntrinsic>(I);
		if (FPI.getRoundingMode() == llvm::ConstrainedFPIntrinsic::rmDynamic)
		andrew.w.kaylor: What about the other rounding modes?
		SDFlags.setRoundDynamic(true);

		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
		SDValue Res = DAG.getNode(ISD::STRICT_FMA, sdl, VT,
		getValue(I.getArgOperand(0)),
		getValue(I.getArgOperand(1)),
		getValue(I.getArgOperand(2)));

		Res.getNode()->setFlags(SDFlags);
		setValue(&I, Res);
		return nullptr;
		}
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

bool SelectVOP3OpSelMods0(SDValue In, SDValue &Src, SDValue &SrcMods,
SDValue &Clamp) const;		SDValue &Clamp) const;
bool SelectVOP3PMadMixModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const;		bool SelectVOP3PMadMixModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const;
bool SelectVOP3PMadMixMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;		bool SelectVOP3PMadMixMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;

void SelectADD_SUB_I64(SDNode *N);		void SelectADD_SUB_I64(SDNode *N);
void SelectUADDO_USUBO(SDNode *N);		void SelectUADDO_USUBO(SDNode *N);
void SelectDIV_SCALE(SDNode *N);		void SelectDIV_SCALE(SDNode *N);
void SelectFMA_W_CHAIN(SDNode *N);		void SelectFMA_W_CHAIN(SDNode *N);
void SelectFMUL_W_CHAIN(SDNode *N);		void SelectFMUL_W_CHAIN(SDNode *N);
		rampitec: You have removed it but declaration remains.

SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val,		SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val,
uint32_t Offset, uint32_t Width);		uint32_t Offset, uint32_t Width);
void SelectS_BFEFromShifts(SDNode *N);		void SelectS_BFEFromShifts(SDNode *N);
void SelectS_BFE(SDNode *N);		void SelectS_BFE(SDNode *N);
bool isCBranchSCC(const SDNode *N) const;		bool isCBranchSCC(const SDNode *N) const;
void SelectBRCOND(SDNode *N);		void SelectBRCOND(SDNode *N);
void SelectFMAD(SDNode *N);		void SelectFMAD(SDNode *N);
...
void AMDGPUDAGToDAGISel::SelectFMA_W_CHAIN(SDNode *N) {
// src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2, clamp, omod		// src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2, clamp, omod
SDValue Ops[10];		SDValue Ops[10];

SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[6], Ops[7]);		SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[6], Ops[7]);
SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);		SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);
SelectVOP3Mods(N->getOperand(3), Ops[5], Ops[4]);		SelectVOP3Mods(N->getOperand(3), Ops[5], Ops[4]);
Ops[8] = N->getOperand(0);		Ops[8] = N->getOperand(0);
Ops[9] = N->getOperand(4);		Ops[9] = N->getOperand(4);
		assert((N->getValueType(0) == MVT::f32 || N->getValueType(0) == MVT::f64) &&
CurDAG->SelectNodeTo(N, AMDGPU::V_FMA_F32, N->getVTList(), Ops);		"Incorrent Value Type!");
		unsigned TargetOpc = N->getValueType(0) == MVT::f32 ?
		AMDGPU::V_FMA_F32 :
		AMDGPU::V_FMA_F64;
		CurDAG->SelectNodeTo(N, TargetOpc, N->getVTList(), Ops);
		rampitec: What about f16?
}		}

void AMDGPUDAGToDAGISel::SelectFMUL_W_CHAIN(SDNode *N) {		void AMDGPUDAGToDAGISel::SelectFMUL_W_CHAIN(SDNode *N) {
SDLoc SL(N);		SDLoc SL(N);
// src0_modifiers, src0, src1_modifiers, src1, clamp, omod		// src0_modifiers, src0, src1_modifiers, src1, clamp, omod
SDValue Ops[8];		SDValue Ops[8];

SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[4], Ops[5]);		SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[4], Ops[5]);
SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);		SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);
Ops[6] = N->getOperand(0);		Ops[6] = N->getOperand(0);
Ops[7] = N->getOperand(3);		Ops[7] = N->getOperand(3);

CurDAG->SelectNodeTo(N, AMDGPU::V_MUL_F32_e64, N->getVTList(), Ops);		CurDAG->SelectNodeTo(N, AMDGPU::V_MUL_F32_e64, N->getVTList(), Ops);
}		}

// We need to handle this here because tablegen doesn't support matching		// We need to handle this here because tablegen doesn't support matching
		rampitec: Use of "if(cond) ... else .. cond ? :" is weird.
// instructions with multiple outputs.		// instructions with multiple outputs.
void AMDGPUDAGToDAGISel::SelectDIV_SCALE(SDNode *N) {		void AMDGPUDAGToDAGISel::SelectDIV_SCALE(SDNode *N) {
SDLoc SL(N);		SDLoc SL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

assert(VT == MVT::f32 || VT == MVT::f64);		assert(VT == MVT::f32 || VT == MVT::f64);

unsigned Opc		unsigned Opc
...
bool AMDGPUDAGToDAGISel::SelectDS1Addr1Offset(SDValue Addr, SDValue &Base,
if (CurDAG->isBaseWithConstantOffset(Addr)) {		if (CurDAG->isBaseWithConstantOffset(Addr)) {
SDValue N0 = Addr.getOperand(0);		SDValue N0 = Addr.getOperand(0);
SDValue N1 = Addr.getOperand(1);		SDValue N1 = Addr.getOperand(1);
ConstantSDNode *C1 = cast<ConstantSDNode>(N1);		ConstantSDNode *C1 = cast<ConstantSDNode>(N1);
if (isDSOffsetLegal(N0, C1->getSExtValue(), 16)) {		if (isDSOffsetLegal(N0, C1->getSExtValue(), 16)) {
// (add n0, c0)		// (add n0, c0)
Base = N0;		Base = N0;
Offset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i16);		Offset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i16);
return true;		return true;
		rampitec: f16?
}		}
} else if (Addr.getOpcode() == ISD::SUB) {		} else if (Addr.getOpcode() == ISD::SUB) {
// sub C, x -> add (sub 0, x), C		// sub C, x -> add (sub 0, x), C
if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr.getOperand(0))) {		if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(Addr.getOperand(0))) {
int64_t ByteOffset = C->getSExtValue();		int64_t ByteOffset = C->getSExtValue();
if (isUInt<16>(ByteOffset)) {		if (isUInt<16>(ByteOffset)) {
SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32);		SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32);


lib/Target/AMDGPU/SIISelLowering.h

class SITargetLowering final : public AMDGPUTargetLowering {
SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerFastUnsafeFDIV(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerFastUnsafeFDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerFDIV_FAST(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerFDIV_FAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFDIV16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV32(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFDIV32(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFDIV(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerConstrainedFMA(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTrig(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerTrig(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;

/// \brief Converts \p Op, which must be of floating point type, to the		/// \brief Converts \p Op, which must be of floating point type, to the
/// floating point type \p VT, by either extending or truncating it.		/// floating point type \p VT, by either extending or truncating it.

lib/Target/AMDGPU/SIISelLowering.cpp

SITargetLowering::SITargetLowering(const TargetMachine &TM,

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::f64, Legal);

setOperationAction(ISD::FSIN, MVT::f32, Custom);		setOperationAction(ISD::FSIN, MVT::f32, Custom);
setOperationAction(ISD::FCOS, MVT::f32, Custom);		setOperationAction(ISD::FCOS, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f32, Custom);		setOperationAction(ISD::FDIV, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f64, Custom);		setOperationAction(ISD::FDIV, MVT::f64, Custom);

		//setOperationAction(ISD::FMA, MVT::f32, Custom);
		rampitec: These commented lines not needed.
		//setOperationAction(ISD::FMA, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FMA, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FMA, MVT::f64, Custom);

if (Subtarget->has16BitInsts()) {		if (Subtarget->has16BitInsts()) {
setOperationAction(ISD::Constant, MVT::i16, Legal);		setOperationAction(ISD::Constant, MVT::i16, Legal);

setOperationAction(ISD::SMIN, MVT::i16, Legal);		setOperationAction(ISD::SMIN, MVT::i16, Legal);
setOperationAction(ISD::SMAX, MVT::i16, Legal);		setOperationAction(ISD::SMAX, MVT::i16, Legal);

setOperationAction(ISD::UMIN, MVT::i16, Legal);		setOperationAction(ISD::UMIN, MVT::i16, Legal);
setOperationAction(ISD::UMAX, MVT::i16, Legal);		setOperationAction(ISD::UMAX, MVT::i16, Legal);
...
case ISD::LOAD: {
return Result;		return Result;
}		}

case ISD::FSIN:		case ISD::FSIN:
case ISD::FCOS:		case ISD::FCOS:
return LowerTrig(Op, DAG);		return LowerTrig(Op, DAG);
case ISD::SELECT: return LowerSELECT(Op, DAG);		case ISD::SELECT: return LowerSELECT(Op, DAG);
case ISD::FDIV: return LowerFDIV(Op, DAG);		case ISD::FDIV: return LowerFDIV(Op, DAG);
		case ISD::STRICT_FMA: return LowerConstrainedFMA(Op, DAG);
		andrew.w.kaylor: If you're going to make the changes in this patch, you need at least reasonable default behavior for all other platforms.
case ISD::ATOMIC_CMP_SWAP: return LowerATOMIC_CMP_SWAP(Op, DAG);		case ISD::ATOMIC_CMP_SWAP: return LowerATOMIC_CMP_SWAP(Op, DAG);
case ISD::STORE: return LowerSTORE(Op, DAG);		case ISD::STORE: return LowerSTORE(Op, DAG);
case ISD::GlobalAddress: {		case ISD::GlobalAddress: {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
return LowerGlobalAddress(MFI, Op, DAG);		return LowerGlobalAddress(MFI, Op, DAG);
}		}
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);		case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
(1,473 lines not shown)

static SDValue getFPTernOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,		static SDValue getFPTernOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,
EVT VT, SDValue A, SDValue B, SDValue C,		EVT VT, SDValue A, SDValue B, SDValue C,
SDValue GlueChain) {		SDValue GlueChain) {
if (GlueChain->getNumValues() <= 1) {		if (GlueChain->getNumValues() <= 1) {
return DAG.getNode(Opcode, SL, VT, A, B, C);		return DAG.getNode(Opcode, SL, VT, A, B, C);
}		}

assert(GlueChain->getNumValues() == 3);		assert(GlueChain->getNumValues() == 3 || GlueChain->getNumValues() == 2);

SDVTList VTList = DAG.getVTList(VT, MVT::Other, MVT::Glue);		SDVTList VTList = DAG.getVTList(VT, MVT::Other, MVT::Glue);
switch (Opcode) {		switch (Opcode) {
default: llvm_unreachable("no chain equivalent for opcode");		default: llvm_unreachable("no chain equivalent for opcode");
case ISD::FMA:		case ISD::FMA:
Opcode = AMDGPUISD::FMA_W_CHAIN;		Opcode = AMDGPUISD::FMA_W_CHAIN;
break;		break;
}		}

		if (GlueChain->getNumValues() == 3)
return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(1), A, B, C,		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(1), A, B, C,
GlueChain.getValue(2));		GlueChain.getValue(2));
		else if (GlueChain->getNumValues() == 2)
		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(0), A, B, C,
		GlueChain.getValue(1));
}		}

SDValue SITargetLowering::LowerFDIV16(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerFDIV16(SDValue Op, SelectionDAG &DAG) const {
if (SDValue FastLowered = lowerFastUnsafeFDIV(Op, DAG))		if (SDValue FastLowered = lowerFastUnsafeFDIV(Op, DAG))
return FastLowered;		return FastLowered;

SDLoc SL(Op);		SDLoc SL(Op);
SDValue Src0 = Op.getOperand(0);		SDValue Src0 = Op.getOperand(0);
(205 lines not shown)	if (VT == MVT::f64)
return LowerFDIV64(Op, DAG);		return LowerFDIV64(Op, DAG);

if (VT == MVT::f16)		if (VT == MVT::f16)
return LowerFDIV16(Op, DAG);		return LowerFDIV16(Op, DAG);

llvm_unreachable("Unexpected type for fdiv");		llvm_unreachable("Unexpected type for fdiv");
}		}

		SDValue SITargetLowering::LowerConstrainedFMA(SDValue Op, SelectionDAG &DAG) const {
		SDLoc SL(Op);

		// Retrieve FP Rounding Mode.
		bool RoundMode = Op->getFlags().hasRoundDynamic();
		// TODO: Based on retrieved FP RoundMode to set up register modes.
		const unsigned Denorm32Reg = AMDGPU::Hwreg::ID_MODE |
		(2 << AMDGPU::Hwreg::OFFSET_SHIFT_) |
		(1 << AMDGPU::Hwreg::WIDTH_M1_SHIFT_);
		kzhuravl: Rename WidthBit to "Offset".

		const SDValue BitField = DAG.getTargetConstant(Denorm32Reg, SL, MVT::i16);

		SDVTList BindParamVTs = DAG.getVTList(MVT::Other, MVT::Glue);
		const SDValue EnableDenormValue = DAG.getConstant(FP_DENORM_FLUSH_NONE,
		kzhuravl: Remove. This can go to default?
		SL, MVT::i32);
		rampitec: llvm_unreachable

		SDValue EnableDenorm = DAG.getNode(AMDGPUISD::SETREG, SL, BindParamVTs,
		DAG.getEntryNode(),
		EnableDenormValue, BitField);

		SDValue FMA = getFPTernOp(DAG, ISD::FMA, SL, MVT::f64, Op.getOperand(0),
		Op.getOperand(1),
		andrew.w.kaylor: You have upward and downward reversed.
		Op.getOperand(2),
		EnableDenorm);

		kzhuravl: New line.
		const SDValue DisableDenormValue = DAG.getConstant(FP_DENORM_FLUSH_NONE,
		SL, MVT::i32);

		kzhuravl: Missing ID_SHIFT_.
		SDValue DisableDenorm = DAG.getNode(AMDGPUISD::SETREG, SL, BindParamVTs,
		FMA.getValue(1),
		kzhuravl: What is 1? Do not use bare numbers.
		DisableDenormValue,
		BitField,
		FMA.getValue(2));

		SDValue OutputChain = DAG.getNode(ISD::TokenFactor, SL, MVT::Other,
		DisableDenorm, DAG.getRoot());
		DAG.setRoot(OutputChain);

		andrew.w.kaylor: I think you're interpreting the rounding mode argument differently than I have, and therefore differently than the documentation in the LLVM Language Reference ("therefore" because I wrote the documentation). My intention was that the rounding mode argument was provided as information to the optimizer. It tells the optimizer what it can assume about the rounding mode at the point of the operation. It was not intended to actually set the rounding mode. I'm approaching this from the perspective of the STDC pragmas related to the FP environment. My understanding of these is that if FENV_ACCESS ON is declared, we must assume a dynamic (i.e. unknown) rounding mode in those scopes unless we can prove otherwise, but if the user wants to change the rounding mode, a specific function call (such as fesetround) will be used. I'm not sure what sort of front end you are assuming here, so that may explain the difference in your approach. There are some x86 instructions that can incorporate a rounding mode operand, and it is my understanding that the AMDGPU architecture has similar needs. However, I believe we will need to extend the constrained FP intrinsics (or possibly introduce new intrinsics) to handle cases like that.
		return FMA;

		llvm_unreachable("Unexpected type for fma");
		}

SDValue SITargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {		SDValue SITargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
SDLoc DL(Op);		SDLoc DL(Op);
		rampitec: You are already doing translation, you can remove all the switches translating EqOpc into the same with chain. Just use it here.
StoreSDNode *Store = cast<StoreSDNode>(Op);		StoreSDNode *Store = cast<StoreSDNode>(Op);
EVT VT = Store->getMemoryVT();		EVT VT = Store->getMemoryVT();

if (VT == MVT::i1) {		if (VT == MVT::i1) {
return DAG.getTruncStore(Store->getChain(), DL,		return DAG.getTruncStore(Store->getChain(), DL,
DAG.getSExtOrTrunc(Store->getValue(), DL, MVT::i32),		DAG.getSExtOrTrunc(Store->getValue(), DL, MVT::i32),
Store->getBasePtr(), MVT::i1, Store->getMemOperand());		Store->getBasePtr(), MVT::i1, Store->getMemOperand());
}		}
(30 lines not shown)	case 8:
return SplitVectorStore(Op, DAG);		return SplitVectorStore(Op, DAG);
return SDValue();		return SDValue();
case 16:		case 16:
if (NumElements > 4)		if (NumElements > 4)
return SplitVectorStore(Op, DAG);		return SplitVectorStore(Op, DAG);
return SDValue();		return SDValue();
default:		default:
llvm_unreachable("unsupported private_element_size");		llvm_unreachable("unsupported private_element_size");
}		}
		rampitec: OK, you have chained all the nodes that are required to reside between the two s_setreg statements. How do you prevent any other regular FP operations without a chain from being scheduled in between them?
} else if (AS == AMDGPUASI.LOCAL_ADDRESS) {		} else if (AS == AMDGPUASI.LOCAL_ADDRESS) {
if (NumElements > 2)		if (NumElements > 2)
return SplitVectorStore(Op, DAG);		return SplitVectorStore(Op, DAG);

if (NumElements == 2)		if (NumElements == 2)
return Op;		return Op;

// If properly aligned, if we split we might be able to use ds_write_b64.		// If properly aligned, if we split we might be able to use ds_write_b64.
(1,880 lines not shown)
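
The review comments on the register operand ("Missing ID_SHIFT_", "Do not use bare numbers") could be addressed along the following lines. This is only a minimal standalone sketch, not the patch's code: the namespace `hwreg_sketch`, the helper `encodeHwreg`, and the names `kFieldOffset` and `kFieldWidth` are hypothetical, and the bit positions (id at bit 0, offset at bit 6, width-minus-one at bit 11) and the value of `ID_MODE` are assumptions about the operand layout rather than values copied from the AMDGPU headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the AMDGPU::Hwreg constants. The values are
// assumptions made for illustration and should be checked against the
// real definitions before use.
namespace hwreg_sketch {
constexpr unsigned ID_SHIFT = 0;
constexpr unsigned ID_WIDTH = 6;
constexpr unsigned OFFSET_SHIFT = 6;
constexpr unsigned OFFSET_WIDTH = 5;
constexpr unsigned WIDTH_M1_SHIFT = 11;
constexpr unsigned WIDTH_M1_WIDTH = 5;
constexpr unsigned ID_MODE = 1;

// Named replacements for the bare literals in the patch: the literal 2 is
// the bit offset of the field, and the literal 1 is its width minus one.
constexpr unsigned kFieldOffset = 2;
constexpr unsigned kFieldWidth = 2;

// Packs a register id, a bit offset, and a field width (in bits) into a
// single 16-bit operand, applying every shift explicitly.
inline uint16_t encodeHwreg(unsigned Id, unsigned Offset, unsigned Width) {
  assert(Id < (1u << ID_WIDTH) && "register id out of range");
  assert(Offset < (1u << OFFSET_WIDTH) && "offset out of range");
  assert(Width >= 1 && Width <= (1u << WIDTH_M1_WIDTH) && "bad width");
  return static_cast<uint16_t>((Id << ID_SHIFT) |
                               (Offset << OFFSET_SHIFT) |
                               ((Width - 1) << WIDTH_M1_SHIFT));
}
} // namespace hwreg_sketch
```

Under these assumed values, `encodeHwreg(ID_MODE, kFieldOffset, kFieldWidth)` yields the same bits as the expression in the patch, while making the meaning of each literal visible at the call site.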

test/CodeGen/AMDGPU/constrained_fp.ll

This file was added.

				; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

				declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata) nounwind readnone

				; FUNC-LABEL: {{^}}fma_f64:
				; FUNC: s_setreg_b32
				; FUNC: v_fma_f64
				; FUNC: s_setreg_b32
				define amdgpu_kernel void @fma_f64(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2, double addrspace(1)* %in3) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r2 = load double, double addrspace(1)* %in3
				%r3 = tail call double @llvm.experimental.constrained.fma.f64(double %r0, double %r1, double %r2, metadata !"round.dynamic", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}
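
To illustrate the source-level model that andrew.w.kaylor describes, where the rounding mode is changed by an explicit call such as fesetround and the operation itself only runs under whatever mode is current, here is a minimal C++ sketch. The helper name `fmaWithRounding` is hypothetical, and the sketch assumes a host whose `std::fma` honors the dynamic rounding mode. The volatile temporaries are only there to discourage the compiler from folding or moving the operation across the mode changes, since not all compilers honor `#pragma STDC FENV_ACCESS ON`.

```cpp
#include <cfenv>
#include <cmath>

// Computes fma(a, b, c) under the requested rounding mode and then
// restores the mode that was active on entry. Returns NaN if the
// requested mode cannot be installed.
double fmaWithRounding(double a, double b, double c, int mode) {
  const int saved = std::fegetround();
  if (std::fesetround(mode) != 0)
    return std::nan("");

  // Volatile accesses keep the operation between the two fesetround calls.
  volatile double va = a, vb = b, vc = c;
  volatile double result = std::fma(va, vb, vc);

  std::fesetround(saved);
  return result;
}
```

For example, calling the helper with `1.0`, `1.0`, and a very small positive addend such as `std::ldexp(1.0, -80)` is expected to give a larger result under `FE_UPWARD` than under `FE_DOWNWARD` on hosts that support both modes. In this model the "round.dynamic" metadata in the test above would correspond to code that makes no assumption about which mode is active.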


Revision Contents

Diff 118020

include/llvm/CodeGen/SelectionDAGNodes.h

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

lib/Target/AMDGPU/SIISelLowering.h

lib/Target/AMDGPU/SIISelLowering.cpp

test/CodeGen/AMDGPU/constrained_fp.ll
