This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Custom lowering constrained fps.
Abandoned · Public

Authored by arsenm on Oct 6 2017, 9:38 AM.


Details

Reviewers

andrew.w.kaylor
b-sumner
rampitec
kzhuravl
wdng

Summary

This patch only demonstrates one way to custom-lower the constrained fma operation.

What this patch does:

Extends the SDNodeFlags API so that SDNodeFlags are set during the initial DAG build phase, when the constrained FP metadata is read.
The AMDGPU backend sets up register modes based on the retrieved SDNodeFlags.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Oct 6 2017, 9:38 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Oct 6 2017, 9:38 AM

Fixed format issue.

andrew.w.kaylor requested changes to this revision. Oct 6 2017, 12:09 PM

andrew.w.kaylor added inline comments.

include/llvm/CodeGen/SelectionDAGNodes.h
368 ↗	(On Diff #118028)	I don't really like the fact that these are separate flags, given that they're mutually exclusive. Also, I think we're eventually going to need to be able to distinguish between assumed rounding modes (where the instruction encoding isn't expected to include the rounding mode) and forced rounding modes (where the rounding mode will be encoded in the instruction). I don't have a specific vision for how that will need to work, but I know there are instructions that work this way and we'll need to handle at least intrinsics that use them. As I recall, someone at AMD mentioned wanting behavior like that for flush-to-zero also. The currently documented behavior of the constrained FP intrinsics is that the rounding mode tells the optimizer what it may assume about the rounding mode at the intrinsic location. Something else must have been done to set the rounding mode. If you are lowering to instructions that include a rounding mode, how do you handle the RoundDynamic case?
384 ↗	(On Diff #118028)	I don't think all of the rounding modes can default to false. Maybe you need a RoundDefault option. For instance, if SDNodeFlags::isDefined() returns true, but the node doesn't have a constrained rounding mode, I'd need to check four different flags and then make an assumption to see that.
510 ↗	(On Diff #118028)	I think you need some logic here to set the rounding mode to dynamic if the flags being intersected conflict.
515 ↗	(On Diff #118028)	There needs to be a hierarchy here. For instance, if you merge ExceptIgnore with ExceptStrict it should result in ExceptStrict.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
952	This change will fail on all platforms except AMDGPU as you have this patch written. If you need target-specific behavior here, we'll need a target hook of some kind.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5471	All of the intrinsics above from fadd to fma fall through into this code. That's obviously not what you intended.
5473	What about the other rounding modes?
lib/Target/AMDGPU/SIISelLowering.cpp
3228	If you're going to make the changes in this patch, you need at least reasonable default behavior for all other platforms.
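To make the suggestions at 368, 384, 510 and 515 concrete (one mutually exclusive rounding value with an explicit default, conflicting rounding assumptions merging to "dynamic", and exception behavior merging to the stricter setting), here is a minimal standalone sketch. The names `RoundingConstraint`, `ExceptionConstraint`, `FPConstraints` and `mergeConstraints` are hypothetical and are not part of LLVM's SDNodeFlags API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical: a single enum instead of several mutually exclusive
// boolean flags, with an explicit "no constraint" value.
enum class RoundingConstraint : std::uint8_t {
  Default,    // node carries no constrained rounding mode
  ToNearest,
  Downward,
  Upward,
  TowardZero,
  Dynamic,    // rounding mode is unknown at this point
};

// Hypothetical: ordered from least to most strict, so merging takes the max.
enum class ExceptionConstraint : std::uint8_t { Ignore, MayTrap, Strict };

struct FPConstraints {
  RoundingConstraint Rounding = RoundingConstraint::Default;
  ExceptionConstraint Except = ExceptionConstraint::Ignore;
};

// Intersect the constraints of two nodes that are being combined.
FPConstraints mergeConstraints(const FPConstraints &A, const FPConstraints &B) {
  FPConstraints R;
  // Matching assumptions survive; any disagreement means nothing can be
  // assumed about the rounding mode, so fall back to Dynamic.
  R.Rounding = (A.Rounding == B.Rounding) ? A.Rounding
                                          : RoundingConstraint::Dynamic;
  // Ignore merged with Strict must yield Strict.
  R.Except = std::max(A.Except, B.Except);
  assert(R.Except >= A.Except && R.Except >= B.Except);
  return R;
}
```

Ordering the exception enum by strictness is what makes the hierarchy requested at 515 a one-line `std::max`.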

This revision now requires changes to proceed. Oct 6 2017, 12:09 PM

wdng added a reviewer: rampitec. Oct 9 2017, 2:58 PM

rampitec added inline comments. Oct 9 2017, 3:17 PM

include/llvm/CodeGen/SelectionDAGNodes.h
368 ↗	(On Diff #118028)	As these flags only apply to the ISD::STRICT_* opcodes you probably do not need to add fields to a generic SDNodeFlags. STRICT_* opcodes can get an extra operand for rounding mode.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
952	Looks like a good place to check for TLI.getOperationAction on the incoming Opcode and keep it as is if it is custom.
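The check suggested at 952 can be sketched outside LLVM as follows. This is a toy model under stated assumptions: `LegalizeAction`, `Opcode`, `ActionTable` and the helper functions are hypothetical stand-ins for `TargetLowering::LegalizeAction`, the ISD opcodes and `TLI.getOperationAction` at a fixed value type, and unmentioned opcodes are treated as Legal.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for LLVM's legalization types (fixed value type).
enum class LegalizeAction { Legal, Promote, Expand, Custom };
enum class Opcode { FADD, FSQRT, STRICT_FADD, STRICT_FSQRT };
using ActionTable = std::map<Opcode, LegalizeAction>;

// Toy getOperationAction: opcodes the target never mentions are Legal.
LegalizeAction getAction(const ActionTable &T, Opcode Op) {
  auto It = T.find(Op);
  return It == T.end() ? LegalizeAction::Legal : It->second;
}

Opcode nonStrictEquivalent(Opcode Op) {
  switch (Op) {
  case Opcode::STRICT_FADD:  return Opcode::FADD;
  case Opcode::STRICT_FSQRT: return Opcode::FSQRT;
  default:
    assert(false && "not a strict FP pseudo-opcode");
    return Op;
  }
}

LegalizeAction getStrictFPOpcodeAction(const ActionTable &T, Opcode Op) {
  // A target that marked the strict opcode Custom keeps the node as is and
  // lowers it itself; targets that did not are unaffected by this branch.
  LegalizeAction Action = getAction(T, Op);
  if (Action == LegalizeAction::Custom)
    return Action;
  // Otherwise keep the pre-existing behavior: use the action of the
  // non-strict equivalent and expand anything that is not Legal.
  Action = getAction(T, nonStrictEquivalent(Op));
  if (Action != LegalizeAction::Legal)
    Action = LegalizeAction::Expand;
  return Action;
}
```

Because the early return only fires when a target explicitly registered Custom for the strict opcode, this shape also addresses the concern at 952 that the change would otherwise break every other platform.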

Instead of using SDNodeFlags to store the metadata information, this patch directly appends an extra operand for the rounding mode during the DAG build phase, based on Stas's suggestion. The patch currently implements strict FP operations for fadd, fsub, fmul, fma, and fsqrt. Thanks a lot to @andrew.w.kaylor and @rampitec for their comments!

Known issues:

FDIV is a special case; a separate patch will be created for it.
The "round.dynamic" rounding mode is not handled yet.
A separate patch will be created for the f16 data type.
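As a sketch of what the extra rounding operand has to carry on AMDGPU, the snippet below maps the constrained-FP rounding metadata strings from the LLVM Language Reference to the 2-bit MODE-register encodings. The `k*` constants mirror the `FP_ROUND_*` macros in SIDefines.h as I understand them (values should be checked against the header), and the function names are hypothetical. "round.dynamic" deliberately has no encoding, matching the known issue above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Assumed to mirror FP_ROUND_* in SIDefines.h; verify against the header.
constexpr unsigned kRoundToNearest = 0; // FP_ROUND_ROUND_TO_NEAREST
constexpr unsigned kRoundToInf = 1;     // FP_ROUND_ROUND_TO_INF    (upward)
constexpr unsigned kRoundToNegInf = 2;  // FP_ROUND_ROUND_TO_NEGINF (downward)
constexpr unsigned kRoundToZero = 3;    // FP_ROUND_ROUND_TO_ZERO

// Hypothetical helper: metadata string -> hardware rounding encoding.
// Returns std::nullopt for "round.dynamic" (no fixed mode) or unknown text.
std::optional<unsigned> hwRoundingMode(std::string_view MD) {
  if (MD == "round.tonearest")  return kRoundToNearest;
  if (MD == "round.upward")     return kRoundToInf;
  if (MD == "round.downward")   return kRoundToNegInf;
  if (MD == "round.towardzero") return kRoundToZero;
  return std::nullopt;
}

// Hypothetical helper: place one mode in both the single-precision
// (bits 1:0) and double-precision (bits 3:2) fields, in the same way as
// FP_ROUND_MODE_SP(x) | FP_ROUND_MODE_DP(x).
unsigned bothPrecisions(unsigned Mode) {
  assert(Mode <= 3 && "rounding mode is a 2-bit field");
  return (Mode & 0x3) | ((Mode & 0x3) << 2);
}
```

Keeping "upward" mapped to round-toward-+infinity and "downward" to round-toward-−infinity in one table also makes a swap like the one reported at 5027 easy to spot.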

wdng marked 9 inline comments as done. Oct 16 2017, 11:36 AM

wdng retitled this revision from "AMDGPU : Expand SDNodeFlags APIs & custom lowering constrained fps." to "AMDGPU : Custom lowering constrained fps.".

rampitec added inline comments. Oct 16 2017, 2:35 PM

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
206	You have removed it, but the declaration remains.
790	What about f16?
811	Use of "if(cond) ... else .. cond ? :" is weird.
852	f16?
lib/Target/AMDGPU/AMDGPUISelLowering.h
338	You do not handle all of these.
lib/Target/AMDGPU/SIISelLowering.cpp
319	These commented-out lines are not needed.
5020	llvm_unreachable
5050	You are already doing translation, you can remove all the switches translating EqOpc into the same with chain. Just use it here.
5097	OK, you have chained all the nodes that must reside between the two s_setreg statements. How do you prevent other regular FP operations without a chain from being scheduled between them?
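Regarding 811 (mixing `if (cond) ... else ... cond ? :`): one way to structure the opcode choice is a switch per node and a case per type, with an explicit "no instruction" result. This is a sketch under stated assumptions: `ChainOp`, `FPType`, `MachineOp` and `selectStrictBinOp` are hypothetical stand-ins that only reuse the names from the diff, and the missing f64 subtract reflects the patch's FSUB path accepting only f16 and f32, not a claim about the full ISA.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the *_W_CHAIN nodes, MVTs and machine opcodes.
enum class ChainOp { FADD, FSUB, FMUL };
enum class FPType { f16, f32, f64 };
enum class MachineOp {
  V_ADD_F16_e64, V_ADD_F32_e64, V_ADD_F64,
  V_SUB_F16_e64, V_SUB_F32_e64,
  V_MUL_F16_e64, V_MUL_F32_e64, V_MUL_F64,
};

// Every (opcode, type) pair is listed once; unsupported pairs return
// std::nullopt so the caller must handle them instead of silently
// falling into the last arm of a ?: chain.
std::optional<MachineOp> selectStrictBinOp(ChainOp Op, FPType Ty) {
  switch (Op) {
  case ChainOp::FADD:
    switch (Ty) {
    case FPType::f16: return MachineOp::V_ADD_F16_e64;
    case FPType::f32: return MachineOp::V_ADD_F32_e64;
    case FPType::f64: return MachineOp::V_ADD_F64;
    }
    break;
  case ChainOp::FSUB:
    switch (Ty) {
    case FPType::f16: return MachineOp::V_SUB_F16_e64;
    case FPType::f32: return MachineOp::V_SUB_F32_e64;
    case FPType::f64: return std::nullopt; // not handled by the patch
    }
    break;
  case ChainOp::FMUL:
    switch (Ty) {
    case FPType::f16: return MachineOp::V_MUL_F16_e64;
    case FPType::f32: return MachineOp::V_MUL_F32_e64;
    case FPType::f64: return MachineOp::V_MUL_F64;
    }
    break;
  }
  return std::nullopt;
}
```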

kzhuravl requested changes to this revision. Oct 17 2017, 2:09 PM

kzhuravl added inline comments.

lib/Target/AMDGPU/SIDefines.h
461–462	These should go to relevant enums (like Id, Offset, WidthMinusOne, etc.).
lib/Target/AMDGPU/SIISelLowering.cpp
5014	Rename WidthBit to "Offset".
5019	Remove. This can go to default?
5030	New line.
5033	Missing ID_SHIFT_.
5035	What is 1? Do not use bare numbers.
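For 461–462 and 5014–5035, a sketch of building the `s_setreg` immediate from named fields instead of bare numbers. The constants are assumed to match the `Hwreg` enums in SIDefines.h (`ID_MODE`, `ID_SHIFT_`, `OFFSET_SHIFT_`, `WIDTH_M1_SHIFT_`) and should be checked against the header; `encodeHwreg` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror namespace Hwreg in SIDefines.h; verify before use.
namespace hwreg {
constexpr unsigned ID_MODE = 1;          // hardware register id of MODE
constexpr unsigned ID_SHIFT_ = 0;        // bits 5:0   register id
constexpr unsigned OFFSET_SHIFT_ = 6;    // bits 10:6  first bit to write
constexpr unsigned WIDTH_M1_SHIFT_ = 11; // bits 15:11 field width minus one
} // namespace hwreg

// Hypothetical helper: SIMM16 that selects Width bits starting at Offset
// of hardware register Id.
std::uint16_t encodeHwreg(unsigned Id, unsigned Offset, unsigned Width) {
  assert(Id < 64 && Offset < 32 && Width >= 1 && Width <= 32);
  return static_cast<std::uint16_t>((Id << hwreg::ID_SHIFT_) |
                                    (Offset << hwreg::OFFSET_SHIFT_) |
                                    ((Width - 1) << hwreg::WIDTH_M1_SHIFT_));
}
```

For example, the single-precision rounding field (offset 0, width 2) encodes as 1 | (0 << 6) | (1 << 11) = 2049, and the double-precision field at offset 2 as 1 | (2 << 6) | (1 << 11) = 2177.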

This revision now requires changes to proceed. Oct 17 2017, 2:09 PM

It isn't clear to me how your custom lowering interacts, if at all, with existing table-driven selection patterns. One of the goals in the implementation up to this point has been to have the instruction selection fall back on existing pattern matching as much as possible so that we don't need to duplicate all of the cases that are currently handled. Can you explain to me how this applies in the AMDGPU case?

lib/Target/AMDGPU/SIISelLowering.cpp
5027	You have upward and downward reversed.
5043	I think you're interpreting the rounding mode argument differently than I have, and therefore differently than the documentation in the LLVM Language Reference ("therefore" because I wrote the documentation). My intention was that the rounding mode argument was provided as information to the optimizer. It tells the optimizer what it can assume about the rounding mode at the point of the operation. It was not intended to actually set the rounding mode. I'm approaching this from the perspective of the STDC pragmas related to the FP environment. My understanding of these is that if FENV_ACCESS ON is declared, we must assume a dynamic (i.e. unknown) rounding mode in those scopes unless we can prove otherwise, but if the user wants to change the rounding mode, a specific function call (such as fesetround) will be used. I'm not sure what sort of front end you are assuming here, so that may explain the difference in your approach. There are some x86 instructions that can incorporate a rounding mode operand, and it is my understanding that the AMDGPU architecture has similar needs. However, I believe we will need to extend the constrained FP intrinsics (or possibly introduce new intrinsics) to handle cases like that.
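A small C++ illustration of the model described at 5043: the rounding mode is changed by explicit `std::fesetround` calls, and the arithmetic itself carries no rounding operand. This assumes a platform where `FE_UPWARD` and `FE_DOWNWARD` are supported; `volatile` is used as a portable stand-in for `#pragma STDC FENV_ACCESS ON` (which not every compiler honors) to keep the divisions from being folded at compile time. `divideUnderModes` is a hypothetical name.

```cpp
#include <cassert>
#include <cfenv>
#include <utility>

// Compute Num / Den once with the mode set upward and once downward,
// then restore the caller's rounding mode.
std::pair<double, double> divideUnderModes(double Num, double Den) {
  volatile double N = Num; // volatile: force the division to run at runtime
  volatile double D = Den;
  const int Saved = std::fegetround();

  int Err = std::fesetround(FE_UPWARD);
  assert(Err == 0 && "FE_UPWARD not supported");
  volatile double Up = N / D;

  Err = std::fesetround(FE_DOWNWARD);
  assert(Err == 0 && "FE_DOWNWARD not supported");
  volatile double Down = N / D;

  std::fesetround(Saved);
  (void)Err; // only inspected by the asserts
  double UpCopy = Up;
  double DownCopy = Down;
  return {UpCopy, DownCopy};
}
```

Under that reading, a constrained `fdiv` emitted between the two `fesetround` calls would only carry "round.upward" as an assumption the optimizer may rely on; the call itself is what changes the hardware state.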

arsenm commandeered this revision. Apr 5 2020, 8:48 AM

arsenm edited reviewers, added: wdng; removed: arsenm.

Herald added subscribers: kerbowa, jvesely. Apr 5 2020, 8:48 AM

Needs to be redone

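Revision Contents

Path                                                Size
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp            11 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    17 lines
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp            77 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h              16 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp            16 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.td                50 lines
lib/Target/AMDGPU/SIDefines.h                       3 lines
lib/Target/AMDGPU/SIISelLowering.h                  1 line
lib/Target/AMDGPU/SIISelLowering.cpp                221 lines
test/CodeGen/AMDGPU/constrained_fp.ll               129 lines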

Diff 119184

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

@@ ... @@ if (UpdatedNodes) {
       UpdatedNodes->insert(Chain.getNode());
     }
     ReplacedNode(Node);
   }
 }

 static TargetLowering::LegalizeAction
 getStrictFPOpcodeAction(const TargetLowering &TLI, unsigned Opcode, EVT VT) {
+  auto Action = TLI.getOperationAction(Opcode, VT);
+  if (Action == TargetLowering::Custom)
+    return Action;
+
   unsigned EqOpc;
   switch (Opcode) {
   default: llvm_unreachable("Unexpected FP pseudo-opcode");
+  case ISD::STRICT_FADD: EqOpc = ISD::FADD; break;
+  case ISD::STRICT_FSUB: EqOpc = ISD::FSUB; break;
+  case ISD::STRICT_FMUL: EqOpc = ISD::FMUL; break;
+  case ISD::STRICT_FDIV: EqOpc = ISD::FDIV; break;
+  case ISD::STRICT_FREM: EqOpc = ISD::FREM; break;
   case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;
   case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;
   case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;
   case ISD::STRICT_FMA: EqOpc = ISD::FMA; break;
   case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;
   case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;
   case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;
   case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;
   case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;
   case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;
   case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;
   case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
   case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
   }

-  auto Action = TLI.getOperationAction(EqOpc, VT);
+  Action = TLI.getOperationAction(EqOpc, VT);

   // We don't currently handle Custom or Promote for strict FP pseudo-ops.
   // For now, we just expand for those cases.
   if (Action != TargetLowering::Legal)
     Action = TargetLowering::Expand;

   return Action;
 }

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ ... @@ case Intrinsic::copysign:
     return nullptr;
   case Intrinsic::fma:
     setValue(&I, DAG.getNode(ISD::FMA, sdl,
                              getValue(I.getArgOperand(0)).getValueType(),
                              getValue(I.getArgOperand(0)),
                              getValue(I.getArgOperand(1)),
                              getValue(I.getArgOperand(2))));
     return nullptr;
+  case Intrinsic::experimental_constrained_fma:
   case Intrinsic::experimental_constrained_fadd:
   case Intrinsic::experimental_constrained_fsub:
   case Intrinsic::experimental_constrained_fmul:
   case Intrinsic::experimental_constrained_fdiv:
   case Intrinsic::experimental_constrained_frem:
-  case Intrinsic::experimental_constrained_fma:
   case Intrinsic::experimental_constrained_sqrt:
   case Intrinsic::experimental_constrained_pow:
   case Intrinsic::experimental_constrained_powi:
   case Intrinsic::experimental_constrained_sin:
   case Intrinsic::experimental_constrained_cos:
   case Intrinsic::experimental_constrained_exp:
   case Intrinsic::experimental_constrained_exp2:
   case Intrinsic::experimental_constrained_log:
   case Intrinsic::experimental_constrained_log10:
   case Intrinsic::experimental_constrained_log2:
   case Intrinsic::experimental_constrained_rint:

@@ ... @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   SDValue Chain = getRoot();
   SmallVector<EVT, 4> ValueVTs;
   ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);
   ValueVTs.push_back(MVT::Other); // Out chain

   SDVTList VTs = DAG.getVTList(ValueVTs);
   SDValue Result;
+  const SDValue FPMode = DAG.getConstant(FPI.getRoundingMode(), sdl, MVT::i32);

   if (FPI.isUnaryOp())
     Result = DAG.getNode(Opcode, sdl, VTs,
-                         { Chain, getValue(FPI.getArgOperand(0)) });
+                         {Chain, getValue(FPI.getArgOperand(0)), FPMode});
   else if (FPI.isTernaryOp())
     Result = DAG.getNode(Opcode, sdl, VTs,
-                         { Chain, getValue(FPI.getArgOperand(0)),
+                         {Chain, getValue(FPI.getArgOperand(0)),
                            getValue(FPI.getArgOperand(1)),
-                           getValue(FPI.getArgOperand(2)) });
+                           getValue(FPI.getArgOperand(2)),
+                           FPMode});
   else
     Result = DAG.getNode(Opcode, sdl, VTs,
-                         { Chain, getValue(FPI.getArgOperand(0)),
-                           getValue(FPI.getArgOperand(1)) });
+                         {Chain, getValue(FPI.getArgOperand(0)),
+                          getValue(FPI.getArgOperand(1)), FPMode});

   assert(Result.getNode()->getNumValues() == 2);
   SDValue OutChain = Result.getValue(1);
   DAG.setRoot(OutChain);
   SDValue FPResult = Result.getValue(0);
   setValue(&FPI, FPResult);
 }

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

@@ ... @@ bool SelectVOP3OpSelMods0(SDValue In, SDValue &Src, SDValue &SrcMods,
                             SDValue &Clamp) const;
   bool SelectVOP3PMadMixModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const;
   bool SelectVOP3PMadMixMods(SDValue In, SDValue &Src, SDValue &SrcMods) const;

   void SelectADD_SUB_I64(SDNode *N);
   void SelectUADDO_USUBO(SDNode *N);
   void SelectDIV_SCALE(SDNode *N);
   void SelectFMA_W_CHAIN(SDNode *N);
   void SelectFMUL_W_CHAIN(SDNode *N);
+  void SelectStrictBinOp_W_CHAIN(SDNode *N);
+  void SelectStrictUnaryOp_W_CHAIN(SDNode *N);

   SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val,
                    uint32_t Offset, uint32_t Width);
   void SelectS_BFEFromShifts(SDNode *N);
   void SelectS_BFE(SDNode *N);
   bool isCBranchSCC(const SDNode *N) const;
   void SelectBRCOND(SDNode *N);
   void SelectFMAD(SDNode *N);

@@ ... @@ case ISD::SUBE: {
     SelectADD_SUB_I64(N);
     return;
   }
   case ISD::UADDO:
   case ISD::USUBO: {
     SelectUADDO_USUBO(N);
     return;
   }
+  case AMDGPUISD::FSQRT_W_CHAIN: {
+    SelectStrictUnaryOp_W_CHAIN(N);
+    return;
+  }
+  case AMDGPUISD::FADD_W_CHAIN:
+  case AMDGPUISD::FSUB_W_CHAIN:
   case AMDGPUISD::FMUL_W_CHAIN: {
-    SelectFMUL_W_CHAIN(N);
+    SelectStrictBinOp_W_CHAIN(N);
     return;
   }
   case AMDGPUISD::FMA_W_CHAIN: {
     SelectFMA_W_CHAIN(N);
     return;
   }

   case ISD::SCALAR_TO_VECTOR:
   case ISD::BUILD_VECTOR: {
     EVT VT = N->getValueType(0);
     unsigned NumVectorElts = VT.getVectorNumElements();

     if (VT == MVT::v2i16 || VT == MVT::v2f16) {
       if (Opc == ISD::BUILD_VECTOR) {
         uint32_t LHSVal, RHSVal;

@@ ... @@ void AMDGPUDAGToDAGISel::SelectFMA_W_CHAIN(SDNode *N) {
   // src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2, clamp, omod
   SDValue Ops[10];

   SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[6], Ops[7]);
   SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);
   SelectVOP3Mods(N->getOperand(3), Ops[5], Ops[4]);
   Ops[8] = N->getOperand(0);
   Ops[9] = N->getOperand(4);
+  assert((N->getValueType(0) == MVT::f32 || N->getValueType(0) == MVT::f64) &&
+         "Incorrent Value Type!");
-  CurDAG->SelectNodeTo(N, AMDGPU::V_FMA_F32, N->getVTList(), Ops);
+  unsigned TargetOpc = N->getValueType(0) == MVT::f32 ?
+                       AMDGPU::V_FMA_F32 :
+                       AMDGPU::V_FMA_F64;
+  CurDAG->SelectNodeTo(N, TargetOpc, N->getVTList(), Ops);
 }

-void AMDGPUDAGToDAGISel::SelectFMUL_W_CHAIN(SDNode *N) {
+void AMDGPUDAGToDAGISel::SelectStrictBinOp_W_CHAIN(SDNode *N) {
   SDLoc SL(N);
   // src0_modifiers, src0, src1_modifiers, src1, clamp, omod
   SDValue Ops[8];

   SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[4], Ops[5]);
   SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]);
   Ops[6] = N->getOperand(0);
   Ops[7] = N->getOperand(3);
+  unsigned TargetOpc;
+  switch (N->getOpcode()) {
+  default: llvm_unreachable("Unpected Opcode encountered!");
+  case AMDGPUISD::FADD_W_CHAIN:
+    if (N->getValueType(0) == MVT::f16)
+      TargetOpc = AMDGPU::V_ADD_F16_e64;
+    else
+      TargetOpc = N->getValueType(0) == MVT::f32 ?
+                  AMDGPU::V_ADD_F32_e64 :
+                  AMDGPU::V_ADD_F64;
+    break;
+  case AMDGPUISD::FSUB_W_CHAIN:
+    assert(N->getValueType(0) == MVT::f16 || N->getValueType(0) == MVT::f32 &&
+           "Expected Type Encountered!");
+    TargetOpc = N->getValueType(0) == MVT::f16 ?
+                AMDGPU::V_SUB_F16_e64 :
+                AMDGPU::V_SUB_F32_e64;
+    break;
+  case AMDGPUISD::FMUL_W_CHAIN:
+    if (N->getValueType(0) == MVT::f16)
+      TargetOpc = AMDGPU::V_MUL_F16_e64;
+    else
+      TargetOpc = N->getValueType(0) == MVT::f32 ?
+                  AMDGPU::V_MUL_F32_e64 :
+                  AMDGPU::V_MUL_F64;
+    break;
+  }
+
+  CurDAG->SelectNodeTo(N, TargetOpc, N->getVTList(), Ops);
+}
+
+void AMDGPUDAGToDAGISel::SelectStrictUnaryOp_W_CHAIN(SDNode *N) {
+  SDLoc SL(N);
+
+  // src0_modifiers, src0, src1_modifiers, src1, clamp, omod
+  SDValue Ops[6];
+
+  SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[2], Ops[3]);
+  Ops[4] = N->getOperand(0);
+  Ops[5] = N->getOperand(2);
+  unsigned TargetOpc;
+  switch (N->getOpcode()) {
+  default: llvm_unreachable("Unexpected Opcode encountered!");
+  case AMDGPUISD::FSQRT_W_CHAIN:
+    TargetOpc = N->getValueType(0) == MVT::f32 ?
+                AMDGPU::V_SQRT_F32_e64 :
+                AMDGPU::V_SQRT_F64_e64;
+    break;
+  case AMDGPUISD::FSIN_W_CHAIN:
+    TargetOpc = N->getValueType(0) == MVT::f32 ?
+                AMDGPU::V_SIN_F32_e64 :
+                AMDGPU::V_SIN_F16_e64;
+    break;
+  }
+
-  CurDAG->SelectNodeTo(N, AMDGPU::V_MUL_F32_e64, N->getVTList(), Ops);
+  CurDAG->SelectNodeTo(N, TargetOpc, N->getVTList(), Ops);
 }

 // We need to handle this here because tablegen doesn't support matching
 // instructions with multiple outputs.
 void AMDGPUDAGToDAGISel::SelectDIV_SCALE(SDNode *N) {
   SDLoc SL(N);
   EVT VT = N->getValueType(0);

lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ ... @@ enum NodeType : unsigned {

   // This is SETCC with the full mask result which is used for a compare with a
   // result bit per item in the wavefront.
   SETCC,
   SETREG,
   // FP ops with input and output chain.
   FMA_W_CHAIN,
   FMUL_W_CHAIN,
+  FADD_W_CHAIN,
+  FSUB_W_CHAIN,
+  FDIV_W_CHAIN,
+  FREM_W_CHAIN,
+  FSQRT_W_CHAIN,
+  FPOW_W_CHAIN,
+  FPOWI_W_CHAIN,
+  FSIN_W_CHAIN,
+  FCOS_W_CHAIN,
+  FEXP_W_CHAIN,
+  FEXP2_W_CHAIN,
+  FLOG_W_CHAIN,
+  FLOG10_W_CHAIN,
+  FLOG2_W_CHAIN,
+  FRINT_W_CHAIN,
+  FNEARBYINT_W_CHAIN,

   // SIN_HW, COS_HW - f32 for SI, 1 ULP max error, valid from -100 pi to 100 pi.
   // Denormals handled on some parts.
   COS_HW,
   SIN_HW,
   FMAX_LEGACY,
   FMIN_LEGACY,
   FMAX3,

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

@@ ... @@ const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
   NODE_NAME_CASE(RETURN_TO_EPILOG)
   NODE_NAME_CASE(ENDPGM)
   NODE_NAME_CASE(DWORDADDR)
   NODE_NAME_CASE(FRACT)
   NODE_NAME_CASE(SETCC)
   NODE_NAME_CASE(SETREG)
   NODE_NAME_CASE(FMA_W_CHAIN)
   NODE_NAME_CASE(FMUL_W_CHAIN)
+  NODE_NAME_CASE(FADD_W_CHAIN)
+  NODE_NAME_CASE(FSUB_W_CHAIN)
+  NODE_NAME_CASE(FDIV_W_CHAIN)
+  NODE_NAME_CASE(FREM_W_CHAIN)
+  NODE_NAME_CASE(FSQRT_W_CHAIN)
+  NODE_NAME_CASE(FPOW_W_CHAIN)
+  NODE_NAME_CASE(FPOWI_W_CHAIN)
+  NODE_NAME_CASE(FSIN_W_CHAIN)
+  NODE_NAME_CASE(FCOS_W_CHAIN)
+  NODE_NAME_CASE(FEXP_W_CHAIN)
+  NODE_NAME_CASE(FEXP2_W_CHAIN)
+  NODE_NAME_CASE(FLOG_W_CHAIN)
+  NODE_NAME_CASE(FLOG10_W_CHAIN)
+  NODE_NAME_CASE(FLOG2_W_CHAIN)
+  NODE_NAME_CASE(FRINT_W_CHAIN)
+  NODE_NAME_CASE(FNEARBYINT_W_CHAIN)
   NODE_NAME_CASE(CLAMP)
   NODE_NAME_CASE(COS_HW)
   NODE_NAME_CASE(SIN_HW)
   NODE_NAME_CASE(FMAX_LEGACY)
   NODE_NAME_CASE(FMIN_LEGACY)
   NODE_NAME_CASE(FMAX3)
   NODE_NAME_CASE(SMAX3)
   NODE_NAME_CASE(UMAX3)

lib/Target/AMDGPU/AMDGPUInstrInfo.td

@@ ... @@

 def AMDGPUSetRegOp : SDTypeProfile<0, 2, [
   SDTCisInt<0>, SDTCisInt<1>
 ]>;

 def AMDGPUsetreg : SDNode<"AMDGPUISD::SETREG", AMDGPUSetRegOp, [
   SDNPHasChain, SDNPSideEffect, SDNPOptInGlue, SDNPOutGlue]>;

+def AMDGPUfadd : SDNode<"AMDGPUISD::FADD_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUfsub : SDNode<"AMDGPUISD::FSUB_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUfmul : SDNode<"AMDGPUISD::FMUL_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUfdiv : SDNode<"AMDGPUISD::FDIV_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUfrem : SDNode<"AMDGPUISD::FREM_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
 def AMDGPUfma : SDNode<"AMDGPUISD::FMA_W_CHAIN", SDTFPTernaryOp, [
   SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;

-def AMDGPUmul : SDNode<"AMDGPUISD::FMUL_W_CHAIN", SDTFPBinOp, [
+def AMDGPUsqrt : SDNode<"AMDGPUISD::SQRT_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUpow : SDNode<"AMDGPUISD::FPOW_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUpowi : SDNode<"AMDGPUISD::FPOWI_W_CHAIN", SDTFPBinOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUsin_chain : SDNode<"AMDGPUISD::FSIN_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUcos_chain : SDNode<"AMDGPUISD::FCOS_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUexp : SDNode<"AMDGPUISD::FEXP_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUexp2 : SDNode<"AMDGPUISD::FEXP2_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUlog : SDNode<"AMDGPUISD::FLOG_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUlog10 : SDNode<"AMDGPUISD::FLOG10_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUlog2 : SDNode<"AMDGPUISD::FLOG2_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUrint : SDNode<"AMDGPUISD::FRINT_W_CHAIN", SDTFPUnaryOp, [
+  SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
+
+def AMDGPUnearbyint : SDNode<"AMDGPUISD::FNEARBYINT_W_CHAIN", SDTFPUnaryOp, [
   SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;

 def AMDGPUcvt_f32_ubyte0 : SDNode<"AMDGPUISD::CVT_F32_UBYTE0",
   SDTIntToFPOp, []>;
 def AMDGPUcvt_f32_ubyte1 : SDNode<"AMDGPUISD::CVT_F32_UBYTE1",
   SDTIntToFPOp, []>;
 def AMDGPUcvt_f32_ubyte2 : SDNode<"AMDGPUISD::CVT_F32_UBYTE2",
   SDTIntToFPOp, []>;

lib/Target/AMDGPU/SIDefines.h

@@ ... @@
 #define FP_ROUND_ROUND_TO_NEGINF 2
 #define FP_ROUND_ROUND_TO_ZERO 3

 // Bits 3:0 control rounding mode. 1:0 control single precision, 3:2 double
 // precision.
 #define FP_ROUND_MODE_SP(x) ((x) & 0x3)
 #define FP_ROUND_MODE_DP(x) (((x) & 0x3) << 2)

+#define OFFSET_SINGLE_FP_ROUND 0
+#define OFFSET_DOUBLE_FP_ROUND 2
+
 #define FP_DENORM_FLUSH_IN_FLUSH_OUT 0
 #define FP_DENORM_FLUSH_OUT 1
 #define FP_DENORM_FLUSH_IN 2
 #define FP_DENORM_FLUSH_NONE 3


 // Bits 7:4 control denormal handling. 5:4 control single precision, 6:7 double
 // precision.

lib/Target/AMDGPU/SIISelLowering.h

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	class SITargetLowering final : public AMDGPUTargetLowering {
SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerFastUnsafeFDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerFDIV_FAST(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV32(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFDIV(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerConstrainedFPs(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerTrig(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;

/// \brief Converts \p Op, which must be of floating point type, to the
/// floating point type \p VT, by either extending or truncating it.
...

lib/Target/AMDGPU/SIISelLowering.cpp

...
SITargetLowering::SITargetLowering(const TargetMachine &TM,
...

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);

setOperationAction(ISD::FSIN, MVT::f32, Custom);
setOperationAction(ISD::FCOS, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f64, Custom);

		//setOperationAction(ISD::FMA, MVT::f32, Custom);
rampitec: These commented-out lines are not needed.
		//setOperationAction(ISD::FMA, MVT::f64, Custom);
		setOperationAction(ISD::STRICT_FADD, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FADD, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FSUB, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FSUB, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FMUL, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FMUL, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FDIV, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FDIV, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FREM, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FREM, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FMA, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FMA, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FSQRT, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FSQRT, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FPOW, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FPOW, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FPOWI, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FPOWI, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FSIN, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FSIN, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FCOS, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FCOS, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FEXP, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FEXP, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FEXP2, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FEXP2, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FLOG, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FLOG, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FLOG10, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FLOG10, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FLOG2, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FLOG2, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FRINT, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FRINT, MVT::f64, Custom);

		setOperationAction(ISD::STRICT_FNEARBYINT, MVT::f32, Custom);
		setOperationAction(ISD::STRICT_FNEARBYINT, MVT::f64, Custom);
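The STRICT_* registrations above repeat one call per opcode and type. A sketch of the same registration driven by two nested loops over braced lists, a pattern used elsewhere in LLVM target constructors; Opcode, ValueType, Action and ActionTable are local stand-ins, not the real ISD, MVT and TargetLoweringBase types. The opcode list is deliberately limited to the six opcodes that getFPUnaryOp, getFPBinOp and getFPTernOp can translate to a *_W_CHAIN node; the other opcodes registered in this hunk (STRICT_FREM, STRICT_FPOW, STRICT_FSIN and so on) would reach the llvm_unreachable in those helpers.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

// Stand-ins for ISD opcodes, MVT value types and legalize actions
// (hypothetical, for illustration only).
enum Opcode { STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FMA, STRICT_FSQRT };
enum ValueType { f32, f64 };
enum Action { Legal, Custom, Expand };

// Hypothetical stand-in for the per-(opcode, type) legalization table.
struct ActionTable {
  std::map<std::pair<Opcode, ValueType>, Action> Actions;

  void setOperationAction(Opcode Op, ValueType VT, Action A) {
    Actions[{Op, VT}] = A;
  }

  // Unregistered pairs default to Legal, as in the real table.
  Action getOperationAction(Opcode Op, ValueType VT) const {
    auto It = Actions.find({Op, VT});
    return It == Actions.end() ? Legal : It->second;
  }
};

// Register every strict opcode that has a chained equivalent as Custom
// for both f32 and f64.
void registerStrictFPActions(ActionTable &T) {
  for (Opcode Op : {STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV,
                    STRICT_FMA, STRICT_FSQRT})
    for (ValueType VT : {f32, f64})
      T.setOperationAction(Op, VT, Custom);
}
```

In the target itself the loop body would be the existing setOperationAction(Op, VT, Custom) call with ISD:: and MVT:: values, which keeps the opcode list in one place.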


if (Subtarget->has16BitInsts()) {
setOperationAction(ISD::Constant, MVT::i16, Legal);

setOperationAction(ISD::SMIN, MVT::i16, Legal);
setOperationAction(ISD::SMAX, MVT::i16, Legal);

setOperationAction(ISD::UMIN, MVT::i16, Legal);
setOperationAction(ISD::UMAX, MVT::i16, Legal);
...
case ISD::LOAD: {
...
return Result;
}

case ISD::FSIN:
case ISD::FCOS:
return LowerTrig(Op, DAG);
case ISD::SELECT: return LowerSELECT(Op, DAG);
case ISD::FDIV: return LowerFDIV(Op, DAG);
case ISD::ATOMIC_CMP_SWAP: return LowerATOMIC_CMP_SWAP(Op, DAG);
andrew.w.kaylor: If you're going to make the changes in this patch, you need at least reasonable default behavior for all other platforms.
case ISD::STORE: return LowerSTORE(Op, DAG);
case ISD::GlobalAddress: {
MachineFunction &MF = DAG.getMachineFunction();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
return LowerGlobalAddress(MFI, Op, DAG);
}
		case ISD::STRICT_FADD:
		case ISD::STRICT_FSUB:
		case ISD::STRICT_FMUL:
		case ISD::STRICT_FDIV:
		case ISD::STRICT_FREM:
		case ISD::STRICT_FMA:
		case ISD::STRICT_FSQRT:
		case ISD::STRICT_FPOW:
		case ISD::STRICT_FPOWI:
		case ISD::STRICT_FSIN:
		case ISD::STRICT_FCOS:
		case ISD::STRICT_FEXP:
		case ISD::STRICT_FEXP2:
		case ISD::STRICT_FLOG:
		case ISD::STRICT_FLOG10:
		case ISD::STRICT_FLOG2:
		case ISD::STRICT_FRINT:
		case ISD::STRICT_FNEARBYINT:
		return LowerConstrainedFPs(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
case ISD::INTRINSIC_W_CHAIN: return LowerINTRINSIC_W_CHAIN(Op, DAG);
case ISD::INTRINSIC_VOID: return LowerINTRINSIC_VOID(Op, DAG);
case ISD::ADDRSPACECAST: return lowerADDRSPACECAST(Op, DAG);
case ISD::INSERT_VECTOR_ELT:
return lowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::EXTRACT_VECTOR_ELT:
return lowerEXTRACT_VECTOR_ELT(Op, DAG);
...
if (Unsafe) {
// x / y -> x * (1.0 / y)
SDValue Recip = DAG.getNode(AMDGPUISD::RCP, SL, VT, RHS);
return DAG.getNode(ISD::FMUL, SL, VT, LHS, Recip, Flags);
}

return SDValue();
}

		static SDValue getFPUnaryOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,
		EVT VT, SDValue A, SDValue GlueChain) {
		if (GlueChain->getNumValues() <= 1) {
		return DAG.getNode(Opcode, SL, VT, A);
		}

		assert(GlueChain->getNumValues() == 2 || GlueChain->getNumValues() == 3);

		SDVTList VTList = DAG.getVTList(VT, MVT::Other, MVT::Glue);
		switch (Opcode) {
		default: llvm_unreachable("no chain equivalent for opcode");
		case ISD::FSQRT:
		Opcode = AMDGPUISD::FSQRT_W_CHAIN; break;
		}

		if (GlueChain->getNumValues() == 2)
		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(0), A,
		GlueChain.getValue(1));

		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(1), A,
		GlueChain.getValue(2));
		}
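getFPUnaryOp (like getFPBinOp and getFPTernOp) reads the incoming chain and glue from GlueChain at positions that depend on how many results that node has. The SETREG node built with the (Other, Glue) VT list has two results, so the chain is value 0 and the glue is value 1; a *_W_CHAIN node has (VT, Other, Glue), so they are values 1 and 2. That is why the patch widens the asserts from `== 3` to `== 2 || == 3`. A standalone sketch of the selection rule; chainGlueIndices is a hypothetical name:

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Returns {chain index, glue index} for a producer node with NumValues
// results, or std::nullopt when the node carries no chain/glue (the
// "<= 1" case, where the helpers build a plain unchained node).
std::optional<std::pair<unsigned, unsigned>>
chainGlueIndices(unsigned NumValues) {
  switch (NumValues) {
  case 2: // (Other, Glue), e.g. the SETREG node from LowerConstrainedFPs
    return std::make_pair(0u, 1u);
  case 3: // (VT, Other, Glue), e.g. a *_W_CHAIN node
    return std::make_pair(1u, 2u);
  default:
    // Any other count above 1 is rejected, mirroring the helpers' assert.
    assert(NumValues <= 1 && "unexpected number of values");
    return std::nullopt;
  }
}
```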


static SDValue getFPBinOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,
EVT VT, SDValue A, SDValue B, SDValue GlueChain) {
if (GlueChain->getNumValues() <= 1) {
return DAG.getNode(Opcode, SL, VT, A, B);
}

assert(GlueChain->getNumValues() == 3);		assert(GlueChain->getNumValues() == 2 || GlueChain->getNumValues() == 3);

SDVTList VTList = DAG.getVTList(VT, MVT::Other, MVT::Glue);
switch (Opcode) {
default: llvm_unreachable("no chain equivalent for opcode");
		case ISD::FADD:
		Opcode = AMDGPUISD::FADD_W_CHAIN; break;
		case ISD::FSUB:
		Opcode = AMDGPUISD::FSUB_W_CHAIN; break;
case ISD::FMUL:
Opcode = AMDGPUISD::FMUL_W_CHAIN;		Opcode = AMDGPUISD::FMUL_W_CHAIN; break;
break;		case ISD::FDIV:
		Opcode = AMDGPUISD::FDIV_W_CHAIN; break;
}

		if (GlueChain->getNumValues() == 2)
		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(0), A, B,
		GlueChain.getValue(1));

return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(1), A, B,
GlueChain.getValue(2));
}

static SDValue getFPTernOp(SelectionDAG &DAG, unsigned Opcode, const SDLoc &SL,
EVT VT, SDValue A, SDValue B, SDValue C,
SDValue GlueChain) {
if (GlueChain->getNumValues() <= 1) {
return DAG.getNode(Opcode, SL, VT, A, B, C);
}

assert(GlueChain->getNumValues() == 3);		assert(GlueChain->getNumValues() == 3 || GlueChain->getNumValues() == 2);

SDVTList VTList = DAG.getVTList(VT, MVT::Other, MVT::Glue);
switch (Opcode) {
default: llvm_unreachable("no chain equivalent for opcode");
case ISD::FMA:
Opcode = AMDGPUISD::FMA_W_CHAIN;
break;
}

		if (GlueChain->getNumValues() == 2)
		return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(0), A, B, C,
		GlueChain.getValue(1));

return DAG.getNode(Opcode, SL, VTList, GlueChain.getValue(1), A, B, C,
GlueChain.getValue(2));
}

SDValue SITargetLowering::LowerFDIV16(SDValue Op, SelectionDAG &DAG) const {
if (SDValue FastLowered = lowerFastUnsafeFDIV(Op, DAG))
return FastLowered;

...
if (VT == MVT::f64)
return LowerFDIV64(Op, DAG);

if (VT == MVT::f16)
return LowerFDIV16(Op, DAG);

llvm_unreachable("Unexpected type for fdiv");
}

		SDValue SITargetLowering::LowerConstrainedFPs(SDValue Op, SelectionDAG &DAG) const {
		SDLoc SL(Op);

		// Retrieve FP Rounding Mode.
		SDValue RoundModeSD = Op.getOperand(Op.getNumOperands()-1);
		unsigned RoundModeValue = cast<ConstantSDNode>(RoundModeSD.getNode())->getZExtValue();
		unsigned WidthBit = 0, RoundingMode = 0;
		if (Op.getValueType() == MVT::f16 || Op.getValueType() == MVT::f32)
		WidthBit = OFFSET_SINGLE_FP_ROUND;
kzhuravl: Rename WidthBit to "Offset".
		else
		WidthBit = OFFSET_DOUBLE_FP_ROUND;

		switch (RoundModeValue) {
		case llvm::ConstrainedFPIntrinsic::rmDynamic:
kzhuravl: Remove. This can go to default?
		//assert(false && "We don't support dynamic rounding mode currently!");
rampitec: llvm_unreachable
		break;
		case llvm::ConstrainedFPIntrinsic::rmToNearest:
		RoundingMode = FP_ROUND_ROUND_TO_NEAREST; break;
		case llvm::ConstrainedFPIntrinsic::rmDownward:
		RoundingMode = FP_ROUND_ROUND_TO_INF; break;
		case llvm::ConstrainedFPIntrinsic::rmUpward:
		RoundingMode = FP_ROUND_ROUND_TO_NEGINF; break;
andrew.w.kaylor: You have upward and downward reversed.
		case llvm::ConstrainedFPIntrinsic::rmTowardZero:
		RoundingMode = FP_ROUND_ROUND_TO_ZERO; break;
		default: llvm_unreachable("Unknown fp mode code!");
kzhuravl: New line.
		}
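andrew.w.kaylor points out that upward and downward are reversed in the switch above, and the rmDynamic case is questioned by both kzhuravl and rampitec (it currently breaks out with RoundingMode left at 0, i.e. round-to-nearest). A standalone sketch of the mapping with the swap fixed and rmDynamic reported explicitly; the RoundingMode enum is a local stand-in for the llvm::ConstrainedFPIntrinsic enumerators used above, hwRoundingFor is a hypothetical helper, and the 0 to 3 encodings are inferred from this switch together with the FileCheck immediates in the test file, not quoted from SIDefines.h:

```cpp
#include <cassert>
#include <optional>

// Local stand-in for llvm::ConstrainedFPIntrinsic's rounding-mode enumerators.
enum class RoundingMode { rmDynamic, rmToNearest, rmDownward, rmUpward, rmTowardZero };

// Assumed MODE-register rounding encodings.
constexpr unsigned FP_ROUND_ROUND_TO_NEAREST = 0;
constexpr unsigned FP_ROUND_ROUND_TO_INF = 1;    // toward +infinity
constexpr unsigned FP_ROUND_ROUND_TO_NEGINF = 2; // toward -infinity
constexpr unsigned FP_ROUND_ROUND_TO_ZERO = 3;

// Returns the 2-bit hardware encoding, or std::nullopt for rmDynamic, where
// the current mode is unknown and there is no fixed value to program.
std::optional<unsigned> hwRoundingFor(RoundingMode RM) {
  switch (RM) {
  case RoundingMode::rmToNearest:  return FP_ROUND_ROUND_TO_NEAREST;
  case RoundingMode::rmDownward:   return FP_ROUND_ROUND_TO_NEGINF; // -inf
  case RoundingMode::rmUpward:     return FP_ROUND_ROUND_TO_INF;    // +inf
  case RoundingMode::rmTowardZero: return FP_ROUND_ROUND_TO_ZERO;
  case RoundingMode::rmDynamic:    return std::nullopt;
  }
  return std::nullopt; // not reached for valid enumerators
}
```

Whether rmDynamic should then skip the s_setreg pair entirely or be rejected with llvm_unreachable, as rampitec suggests, depends on which reading of the rounding argument the patch adopts.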

		const unsigned RoudingMode32Reg = AMDGPU::Hwreg::ID_MODE |
kzhuravl: Missing ID_SHIFT_.
		(WidthBit << AMDGPU::Hwreg::OFFSET_SHIFT_) |
		(1 << AMDGPU::Hwreg::WIDTH_M1_SHIFT_);
kzhuravl: What is 1? Do not use bare numbers.

		const SDValue BitField = DAG.getTargetConstant(RoudingMode32Reg, SL, MVT::i16);
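kzhuravl's comments ask for the s_setreg operand to be built from named fields instead of `ID_MODE | (WidthBit << OFFSET_SHIFT_) | (1 << WIDTH_M1_SHIFT_)`: the id needs its own shift, and the bare 1 is the field width minus one for a 2-bit rounding field. A sketch with local copies of the field layout; the values used here (id in bits 5:0, offset in bits 10:6, width minus one in bits 15:11, ID_MODE = 1) are assumptions that match the hwreg(HW_REG_MODE, offset, width) syntax in the tests, not a quotation of AMDGPU::Hwreg. The RoundFieldOffset enum follows kzhuravl's suggestion to move OFFSET_SINGLE_FP_ROUND and OFFSET_DOUBLE_FP_ROUND into an enum, and encodeHwreg is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the AMDGPU::Hwreg field layout (assumed values).
namespace Hwreg {
enum Id : unsigned { ID_MODE = 1, ID_SHIFT_ = 0 };
enum Offset : unsigned { OFFSET_SHIFT_ = 6 };
enum WidthMinusOne : unsigned { WIDTH_M1_SHIFT_ = 11 };
} // namespace Hwreg

// Bit offsets of the rounding fields inside MODE, as an enum instead of macros.
enum RoundFieldOffset : unsigned { OFFSET_SINGLE_FP_ROUND = 0, OFFSET_DOUBLE_FP_ROUND = 2 };
constexpr unsigned FP_ROUND_FIELD_WIDTH = 2; // each rounding field is 2 bits wide

// Encode the 16-bit immediate written as hwreg(Id, Offset, Width) in assembly.
constexpr std::uint16_t encodeHwreg(unsigned Id, unsigned Offset, unsigned Width) {
  return static_cast<std::uint16_t>((Id << Hwreg::ID_SHIFT_) |
                                    (Offset << Hwreg::OFFSET_SHIFT_) |
                                    ((Width - 1) << Hwreg::WIDTH_M1_SHIFT_));
}

// hwreg(HW_REG_MODE, 2, 2): the f64 rounding field used by the f64 tests.
static_assert(encodeHwreg(Hwreg::ID_MODE, OFFSET_DOUBLE_FP_ROUND,
                          FP_ROUND_FIELD_WIDTH) == 0x881,
              "unexpected f64 rounding-field encoding");
```

With ID_SHIFT_ equal to 0 the patch's expression already produces the same bits; the named form only documents where each field lives.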

		SDVTList BindParamVTs = DAG.getVTList(MVT::Other, MVT::Glue);
		const SDValue EnableDenormValue = DAG.getConstant(RoundingMode,
		SL, MVT::i32);

		SDValue EnableDenorm = DAG.getNode(AMDGPUISD::SETREG, SL, BindParamVTs,
andrew.w.kaylor: I think you're interpreting the rounding mode argument differently than I have, and therefore differently than the documentation in the LLVM Language Reference ("therefore" because I wrote the documentation). My intention was that the rounding mode argument was provided as information to the optimizer. It tells the optimizer what it can assume about rounding mode at the point of the operation. It was not intended to actually set the rounding mode. I'm approaching this from the perspective of the STDC pragmas related to the FP environment. My understanding of these is that if FENV_ACCESS on is declared, we must assume dynamic (i.e. unknown) rounding mode in those scopes unless we can prove otherwise, but if the user wants to change the rounding mode, a specific function call (such as fesetround) will be used. I'm not sure what sort of front end you are assuming here, so that may explain the difference in your approach. There are some x86 instructions that can incorporate a rounding mode operand, and it is my understanding that the AMDGPU architecture has similar needs. However, I believe we will need to extend the constrained FP intrinsics (or possibly introduce new intrinsics) to handle cases like that.
		DAG.getEntryNode(),
		EnableDenormValue, BitField);
		// Map the STRICT_* opcode to its unconstrained ISD equivalent.
		unsigned EqOpc;
		switch (Op.getOpcode()) {
		default: llvm_unreachable("no chain equivalent for opcode");
		case ISD::STRICT_FADD : EqOpc = ISD::FADD; break;
rampitec: You are already doing a translation here, so you can remove all the switches that translate EqOpc into the same opcode with a chain. Just use it here.
		case ISD::STRICT_FSUB : EqOpc = ISD::FSUB; break;
		case ISD::STRICT_FMUL : EqOpc = ISD::FMUL; break;
		case ISD::STRICT_FDIV: EqOpc = ISD::FDIV; break;
		case ISD::STRICT_FMA : EqOpc = ISD::FMA; break;
		case ISD::STRICT_FREM: EqOpc = ISD::FREM; break;
		case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;
		case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;
		case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;
		case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;
		case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;
		case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;
		case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;
		case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;
		case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;
		case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;
		case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
		case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
		}
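rampitec suggests translating each STRICT_* opcode straight to its chained AMDGPUISD node here, instead of mapping to the plain ISD opcode and then switching again inside getFPUnaryOp, getFPBinOp and getFPTernOp. Those helpers only have chained equivalents for FADD, FSUB, FMUL, FDIV, FMA and FSQRT, so the other opcodes registered as Custom (FREM, FPOW, FSIN and so on) would reach llvm_unreachable. A self-contained sketch of a single translation step with stand-in enums (the names mirror the patch, but the types are local and hypothetical):

```cpp
#include <cassert>
#include <optional>

// Local stand-ins for the relevant strict ISD opcodes and chained AMDGPUISD nodes.
enum class StrictOp { FADD, FSUB, FMUL, FDIV, FMA, FSQRT, FREM, FSIN };
enum class ChainedOp { FADD_W_CHAIN, FSUB_W_CHAIN, FMUL_W_CHAIN,
                       FDIV_W_CHAIN, FMA_W_CHAIN, FSQRT_W_CHAIN };

// One translation step: strict opcode -> chained target opcode, or
// std::nullopt when no chained equivalent exists, so the caller can handle
// the node some other way instead of reaching llvm_unreachable.
std::optional<ChainedOp> chainedEquivalent(StrictOp Op) {
  switch (Op) {
  case StrictOp::FADD:  return ChainedOp::FADD_W_CHAIN;
  case StrictOp::FSUB:  return ChainedOp::FSUB_W_CHAIN;
  case StrictOp::FMUL:  return ChainedOp::FMUL_W_CHAIN;
  case StrictOp::FDIV:  return ChainedOp::FDIV_W_CHAIN;
  case StrictOp::FMA:   return ChainedOp::FMA_W_CHAIN;
  case StrictOp::FSQRT: return ChainedOp::FSQRT_W_CHAIN;
  default:              return std::nullopt;
  }
}
```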

		SDValue Res;
		if (Op.getNumOperands() == 3) {
		Res = getFPUnaryOp(DAG, EqOpc, SL, Op.getValueType(),
		Op.getOperand(1),
		EnableDenorm);
		}
		else if (Op.getNumOperands() == 4) {
		Res = getFPBinOp(DAG, EqOpc, SL, Op.getValueType(),
		Op.getOperand(1),
		Op.getOperand(2),
		EnableDenorm);
		}
		else if (Op.getNumOperands() == 5) {
		Res = getFPTernOp(DAG, EqOpc, SL, Op.getValueType(), Op.getOperand(1),
		Op.getOperand(2),
		Op.getOperand(3),
		EnableDenorm);
		}
		const SDValue DefaultRoundingModeValue = DAG.getConstant(FP_ROUND_ROUND_TO_NEAREST,
		SL, MVT::i32);

		SDValue DisableDenorm = DAG.getNode(AMDGPUISD::SETREG, SL, MVT::Other,
		Res.getValue(1),
		DefaultRoundingModeValue,
		BitField,
		Res.getValue(2));

		SDValue OutputChain = DAG.getNode(ISD::TokenFactor, SL, MVT::Other,
rampitec: OK, you have chained all the nodes that are required to reside between the two s_setreg statements. How do you prevent other regular FP operations, which have no chain, from being scheduled in between them?
		DisableDenorm, DAG.getRoot());
		DAG.setRoot(OutputChain);

		return Res;

		llvm_unreachable("Unexpected type for fma");
		}
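The epilogue of LowerConstrainedFPs writes FP_ROUND_ROUND_TO_NEAREST back unconditionally, which is only correct if round-to-nearest was the mode on entry. On the host, andrew.w.kaylor's model (mode changes happen through explicit calls such as fesetround) pairs naturally with a scoped guard that saves the current mode and restores it instead of a fixed default. The sketch below uses only the standard <cfenv> functions; it illustrates the save/set/restore pattern and is not AMDGPU lowering code. ScopedRoundingMode is a hypothetical name.

```cpp
#include <cassert>
#include <cfenv>

// Saves the current rounding mode, switches to NewMode, and restores the
// saved mode (not a fixed default) when the scope ends.
class ScopedRoundingMode {
public:
  explicit ScopedRoundingMode(int NewMode) : Saved(std::fegetround()) {
    Ok = std::fesetround(NewMode) == 0; // fesetround returns 0 on success
  }
  ~ScopedRoundingMode() { std::fesetround(Saved); }

  ScopedRoundingMode(const ScopedRoundingMode &) = delete;
  ScopedRoundingMode &operator=(const ScopedRoundingMode &) = delete;

  bool succeeded() const { return Ok; }

private:
  int Saved;
  bool Ok = false;
};
```

In C, code that changes the floating-point environment is expected to run under `#pragma STDC FENV_ACCESS ON`, the scope andrew.w.kaylor's comment refers to; C++ leaves support for that pragma implementation-defined.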

SDValue SITargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
SDLoc DL(Op);
StoreSDNode *Store = cast<StoreSDNode>(Op);
EVT VT = Store->getMemoryVT();

if (VT == MVT::i1) {
return DAG.getTruncStore(Store->getChain(), DL,
DAG.getSExtOrTrunc(Store->getValue(), DL, MVT::i32),
...

test/CodeGen/AMDGPU/constrained_fp.ll

	; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

				declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata) nounwind readnone
declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata) nounwind readnone
				declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata) nounwind readnone
				declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata) nounwind readnone
				declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata) nounwind readnone
				declare float @llvm.experimental.constrained.fsub.f32(float, float, metadata, metadata) nounwind readnone

	; FUNC-LABEL: {{^}}fma_f64:			; GCN-LABEL: {{^}}fadd_f64_round_tonearest
	; FUNC: s_setreg_b32			; GCN: s_mov_b32
	; FUNC: v_fma_f64			; GCN: s_mov_b32 [[MODE:s[0-9]+]], 0
	; FUNC: s_setreg_b32			; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2), [[MODE]]
	define amdgpu_kernel void @fma_f64(double addrspace(1)* %out, double addrspace(1)* %in1,			; GCN: v_add_f64
	double addrspace(1)* %in2, double addrspace(1)* %in3) {			; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2)
				define amdgpu_kernel void @fadd_f64_round_tonearest(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r3 = tail call double @llvm.experimental.constrained.fadd.f64(double %r0, double %r1, metadata !"round.tonearest", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fadd_f64_round_downward
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 1
				; GCN: v_add_f64
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				define amdgpu_kernel void @fadd_f64_round_downward(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r3 = tail call double @llvm.experimental.constrained.fadd.f64(double %r0, double %r1, metadata !"round.downward", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fadd_f64_round_upward
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 2
				; GCN: v_add_f64
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				define amdgpu_kernel void @fadd_f64_round_upward(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r3 = tail call double @llvm.experimental.constrained.fadd.f64(double %r0, double %r1, metadata !"round.upward", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fadd_f64_round_towardzero
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 3
				; GCN: v_add_f64
				; GCN: s_setreg_imm32_b32 hwreg(HW_REG_MODE, 2, 2), 0
				define amdgpu_kernel void @fadd_f64_round_towardzero(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r3 = tail call double @llvm.experimental.constrained.fadd.f64(double %r0, double %r1, metadata !"round.towardzero", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fmul_f64_tonearest
				; GCN: s_mov_b32
				; GCN: s_mov_b32 [[MODE:s[0-9]+]], 0
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2), [[MODE]]
				; GCN: v_mul_f64
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2)
				define amdgpu_kernel void @fmul_f64_tonearest(double addrspace(1)* %out, double addrspace(1)* %in1,
				double addrspace(1)* %in2) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = load double, double addrspace(1)* %in2
				%r3 = tail call double @llvm.experimental.constrained.fmul.f64(double %r0, double %r1, metadata !"round.tonearest", metadata !"fpexcept.strict")
				store double %r3, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fsqrt_f64_tonearest
				; GCN: s_mov_b32
				; GCN: s_mov_b32 [[MODE:s[0-9]+]], 0
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2), [[MODE]]
				; GCN: v_sqrt_f64_e32
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 2, 2)
				define amdgpu_kernel void @fsqrt_f64_tonearest(double addrspace(1)* %out, double addrspace(1)* %in1) {
				%r0 = load double, double addrspace(1)* %in1
				%r1 = tail call double @llvm.experimental.constrained.sqrt.f64(double %r0, metadata !"round.tonearest", metadata !"fpexcept.strict")
				store double %r1, double addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fsub_f32_tonearest
				; GCN: s_mov_b32
				; GCN: s_mov_b32 [[MODE:s[0-9]+]], 0
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 0, 2), [[MODE]]
				; GCN: v_sub_f32_e32
				; GCN: s_setreg_b32 hwreg(HW_REG_MODE, 0, 2)
				define amdgpu_kernel void @fsub_f32_tonearest(float addrspace(1)* %out, float addrspace(1)* %in1, float addrspace(1)* %in2) {
				%r0 = load float, float addrspace(1)* %in1
				%r1 = load float, float addrspace(1)* %in2
				%r2 = tail call float @llvm.experimental.constrained.fsub.f32(float %r0, float %r1, metadata !"round.tonearest", metadata !"fpexcept.strict")
				store float %r2, float addrspace(1)* %out
				ret void
				}

				; GCN-LABEL: {{^}}fma_f64:
				; GCN: s_setreg_b32
				; GCN: v_fma_f64
				; GCN: s_setreg_b32
				define amdgpu_kernel void @fma_f64(double addrspace(1)* %out, double addrspace(1)* %in1, double addrspace(1)* %in2, double addrspace(1)* %in3) {
%r0 = load double, double addrspace(1)* %in1
%r1 = load double, double addrspace(1)* %in2
%r2 = load double, double addrspace(1)* %in3
	%r3 = tail call double @llvm.experimental.constrained.fma.f64(double %r0, double %r1, double %r2, metadata !"round.dynamic", metadata !"fpexcept.strict")			%r3 = tail call double @llvm.experimental.constrained.fma.f64(double %r0, double %r1, double %r2, metadata !"round.tonearest", metadata !"fpexcept.strict")
store double %r3, double addrspace(1)* %out
ret void
}

				; GCN-LABEL: {{^}}fma_f32:
				; GCN: s_setreg_b32
				; GCN: v_fma_f32
				; GCN: s_setreg_b32
				define amdgpu_kernel void @fma_f32(float addrspace(1)* %out, float addrspace(1)* %in1, float addrspace(1)* %in2, float addrspace(1)* %in3) {
				%r0 = load float, float addrspace(1)* %in1
				%r1 = load float, float addrspace(1)* %in2
				%r2 = load float, float addrspace(1)* %in3
				%r3 = tail call float @llvm.experimental.constrained.fma.f32(float %r0, float %r1, float %r2, metadata !"round.tonearest", metadata !"fpexcept.strict")
				store float %r3, float addrspace(1)* %out
				ret void
				}