Details

Reviewers

b-sumner
kzhuravl
arsenm

Commits

rG4d3d4ca1b37a: AMDGPU : Replace FMAD with FMA when denormals are enabled.
rL296186: AMDGPU : Replace FMAD with FMA when denormals are enabled.

Summary

AMDGPU : Replace FMAD with FMA when denormals are enabled.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Feb 14 2017, 12:15 PM

Herald added subscribers: tpr, tony-tye, yaxunl, nhaehnle. Feb 14 2017, 12:15 PM

You should not be attempting to lower fmad. It is not the same as FMA. You should be creating a new FMAD_FTZ node for use in the one specific case you are trying to fix, which will only be used if denormals are disabled.
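The distinction the reviewer draws can be reproduced on a host CPU. The following is a minimal sketch, not code from the patch: `mad_unfused` and `mad_fused` are hypothetical helper names, and the sketch assumes that ISD::FMAD is meant to match separately rounded multiply and add operations, while FMA rounds only once.

```cpp
#include <cmath>

// Unfused multiply-add: the product is rounded to float before the add.
// The volatile store keeps the compiler from contracting this into an fma.
float mad_unfused(float a, float b, float c) {
  volatile float product = a * b;  // first rounding
  return product + c;              // second rounding
}

// Fused multiply-add: a * b + c is evaluated exactly and rounded once.
float mad_fused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a * a is 1 + 2^-11 + 2^-24. The unfused version rounds away the 2^-24 term and returns 0, while the fused version keeps it and returns 2^-24, so the two operations are observably different.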

This revision now requires changes to proceed. Feb 14 2017, 12:17 PM

Address code reviews.

b-sumner edited edge metadata. Feb 15 2017, 10:34 AM

This comment was removed by b-sumner.

arsenm added inline comments. Feb 15 2017, 11:43 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1296–1298 ↗	(On Diff #88572)	You only need to select between the opcodes. You don't need 2 getNode calls
lib/Target/AMDGPU/AMDGPUInstrInfo.td
198 ↗	(On Diff #88572)	Extra newline
lib/Target/AMDGPU/SIISelLowering.cpp
365 ↗	(On Diff #88572)	Not sure why this appears in the diff
lib/Target/AMDGPU/SIInstructions.td
500–505 ↗	(On Diff #88572)	You should be using the exact fmad pattern above. You should refactor the class to take the input node
test/CodeGen/AMDGPU/udiv.ll
4–5 ↗	(On Diff #88572)	You should only need one new run line, since the default is already tested. You should change the existing VI run line to explicitly turn denormals off. The r600 run line should also be last
192–194 ↗	(On Diff #88572)	You don't need both functions since you are using the global -mattr to do this. These also should have the same result with and without denormals
194 ↗	(On Diff #88572)	You should minimize this list

Using a complex pattern prevents the pattern from being selected, so the FMAD_FTZ pattern definition does not use one. The generated ISA also differs because fneg is being folded, so we get v_mac_f32 when denormals are enabled and v_mad_f32 when they are disabled.

v_mac_f32 always flushes subnormal inputs just like v_mad_f32
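A rough host-side model of what "flushes subnormal inputs" means is sketched below. It is a simplification under stated assumptions, not a description of the hardware: `flushDenormal` and `madFlushInputs` are hypothetical names, only input flushing is modelled, and the multiply and add are rounded separately.

```cpp
#include <cmath>

// Replace a subnormal value with a zero of the same sign.
float flushDenormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Simplified model of a flushing multiply-add: every input is flushed
// first, then the multiply and the add are rounded separately.
float madFlushInputs(float a, float b, float c) {
  volatile float product = flushDenormal(a) * flushDenormal(b);
  return product + flushDenormal(c);
}
```

In this model a subnormal operand contributes nothing: `madFlushInputs(tiny, 1.0f, 0.0f)` yields zero for any subnormal `tiny`, whereas a denormal-preserving multiply-add would return `tiny`.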

Change V_MAC_F32 to V_MAD_F32 to emit the same result with and without denormals.

arsenm added inline comments. Feb 17 2017, 4:42 PM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1297 ↗	(On Diff #88908)	The casts are unnecessary
lib/Target/AMDGPU/SIInstructions.td
505 ↗	(On Diff #88908)	This won't fold the neg source modifier, so will still make code worse. There should be a 3-operand class with source modifiers you should use rather than calling it FMAC. This also can be a class, a multi class isn't necessary
test/CodeGen/AMDGPU/udiv.ll
166 ↗	(On Diff #88908)	Should check the neg source modifier is used

wdng marked an inline comment as done. Feb 21 2017, 3:57 PM

wdng added inline comments.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1297 ↗	(On Diff #88908)	If the cast is removed, there will be a "warning: enumeral mismatch in conditional expression: ‘llvm::AMDGPUISD::NodeType’ vs ‘llvm::ISD::NodeType’". Should we still remove it?
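The warning comes from a conditional expression whose two arms have different enumeration types. A minimal sketch of the situation and of the cast-based workaround follows. The two enums are hypothetical stand-ins for llvm::ISD::NodeType and llvm::AMDGPUISD::NodeType with made-up values, and `selectMadOpcode` is an illustrative name, not a function from the patch.

```cpp
// Hypothetical stand-ins for the two LLVM opcode enums (values are made up).
namespace ISD {
enum NodeType { FMA = 10, FMAD = 11, BUILTIN_OP_END = 100 };
}
namespace AMDGPUISD {
enum NodeType : unsigned { FIRST_NUMBER = ISD::BUILTIN_OP_END, FMAD_FTZ };
}

// Pick the opcode once; a single node-creation call can then consume it.
// Casting both arms to unsigned gives the conditional expression one
// common type, which avoids the enumeral mismatch warning.
unsigned selectMadOpcode(bool hasFP32Denormals) {
  return hasFP32Denormals ? static_cast<unsigned>(AMDGPUISD::FMAD_FTZ)
                          : static_cast<unsigned>(ISD::FMAD);
}
```

Writing the arms as bare enumerators (`cond ? AMDGPUISD::FMAD_FTZ : ISD::FMAD`) still compiles, but GCC reports the mismatch quoted above, which is why the committed code keeps the casts.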

Address code reviews.

Herald added a subscriber: dstuttard. Feb 22 2017, 12:12 PM

arsenm added inline comments. Feb 22 2017, 1:43 PM

lib/Target/AMDGPU/SIInstructions.td
511 ↗	(On Diff #89399)	For now you might want to just stick to emitting v_mad_f32 to not deviate from the denormal disabled output. Selecting v_mac_ with src0/src1 modifiers should be a separate patch

Address code review feedback.

arsenm added inline comments. Feb 23 2017, 12:54 PM

lib/Target/AMDGPU/AMDGPUISelLowering.h
269 ↗	(On Diff #89526)	Needs a comment explaining what it is for
lib/Target/AMDGPU/SIInstructions.td
505 ↗	(On Diff #89526)	Extra space after VOP3Mods

Address code review feedback.

arsenm added inline comments. Feb 23 2017, 2:53 PM

lib/Target/AMDGPU/AMDGPUISelLowering.h
269 ↗	(On Diff #89526)	This is the opposite. It is for emitting mad when f32 denormals are enabled because mac/mad always flush

Update comments.

ping.

arsenm added inline comments. Feb 24 2017, 10:38 AM

lib/Target/AMDGPU/AMDGPUISelLowering.h
269 ↗	(On Diff #89682)	should refer to it as ISD::FMAD here
270 ↗	(On Diff #89682)	It's an illegal operation, not an illegal type

Updated, thanks!

ping.

LGTM

This revision is now accepted and ready to land. Feb 24 2017, 1:24 PM

Closed by commit rL296186: AMDGPU : Replace FMAD with FMA when denormals are enabled. (authored by wdng). Feb 24 2017, 3:12 PM

This revision was automatically updated to reflect the committed changes.

Diff 89731

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h

(265 lines not shown; context: enum NodeType : unsigned {)
   UMIN3,
   FMED3,
   SMED3,
   UMED3,
   URECIP,
   DIV_SCALE,
   DIV_FMAS,
   DIV_FIXUP,
+  // For emitting ISD::FMAD when f32 denormals are enabled because mac/mad is
+  // treated as an illegal operation.
+  FMAD_FTZ,
   TRIG_PREOP, // 1 ULP max error for f64

   // RCP, RSQ - For f32, 1 ULP max error, no denormal handling.
   //            For f64, max error 2^29 ULP, handles denormals.
   RCP,
   RSQ,
   RCP_LEGACY,
   RSQ_LEGACY,
(81 lines not shown)

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

(1,284 lines not shown; context: SDValue AMDGPUTargetLowering::LowerDIVREM24(SDValue Op, SelectionDAG &DAG,)

   // fq = trunc(fq);
   fq = DAG.getNode(ISD::FTRUNC, DL, FltVT, fq);

   // float fqneg = -fq;
   SDValue fqneg = DAG.getNode(ISD::FNEG, DL, FltVT, fq);

   // float fr = mad(fqneg, fb, fa);
-  SDValue fr = DAG.getNode(ISD::FMAD, DL, FltVT, fqneg, fb, fa);
+  unsigned OpCode = Subtarget->hasFP32Denormals() ?
+                    (unsigned)AMDGPUISD::FMAD_FTZ :
+                    (unsigned)ISD::FMAD;
+  SDValue fr = DAG.getNode(OpCode, DL, FltVT, fqneg, fb, fa);

   // int iq = (int)fq;
   SDValue iq = DAG.getNode(ToInt, DL, IntVT, fq);

   // fr = fabs(fr);
   fr = DAG.getNode(ISD::FABS, DL, FltVT, fr);

   // fb = fabs(fb);
(2,109 lines not shown; context: const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {)
   NODE_NAME_CASE(UMIN3)
   NODE_NAME_CASE(FMED3)
   NODE_NAME_CASE(SMED3)
   NODE_NAME_CASE(UMED3)
   NODE_NAME_CASE(URECIP)
   NODE_NAME_CASE(DIV_SCALE)
   NODE_NAME_CASE(DIV_FMAS)
   NODE_NAME_CASE(DIV_FIXUP)
+  NODE_NAME_CASE(FMAD_FTZ)
   NODE_NAME_CASE(TRIG_PREOP)
   NODE_NAME_CASE(RCP)
   NODE_NAME_CASE(RSQ)
   NODE_NAME_CASE(RCP_LEGACY)
   NODE_NAME_CASE(RSQ_LEGACY)
   NODE_NAME_CASE(FMUL_LEGACY)
   NODE_NAME_CASE(RSQ_CLAMP)
   NODE_NAME_CASE(LDEXP)
(169 lines not shown)

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td

(184 lines not shown)
 // src1 = Denominator, src2 = Numerator).
 def AMDGPUdiv_fmas : SDNode<"AMDGPUISD::DIV_FMAS", AMDGPUFmasOp>;

 // Single or double precision division fixup.
 // Special case divide fixup and flags(src0 = Quotient, src1 =
 // Denominator, src2 = Numerator).
 def AMDGPUdiv_fixup : SDNode<"AMDGPUISD::DIV_FIXUP", SDTFPTernaryOp>;

+def AMDGPUfmad_ftz : SDNode<"AMDGPUISD::FMAD_FTZ", SDTFPTernaryOp>;
+
 // Look Up 2.0 / pi src0 with segment select src1[4:0]
 def AMDGPUtrig_preop : SDNode<"AMDGPUISD::TRIG_PREOP", AMDGPUTrigPreOp>;

 def AMDGPUregister_load : SDNode<"AMDGPUISD::REGISTER_LOAD",
                           SDTypeProfile<1, 2, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
                           [SDNPHasChain, SDNPMayLoad]>;

 def AMDGPUregister_store : SDNode<"AMDGPUISD::REGISTER_STORE",
(134 lines not shown)

llvm/trunk/lib/Target/AMDGPU/SIInstructions.td

(502 lines not shown; context: def : Pat <)
     (inst $src0_modifiers, $src0, $src1_modifiers, $src1,
           $src2_modifiers, $src2, $clamp, $omod)
   >;
 }

 defm : FMADPat <f16, V_MAC_F16_e64>;
 defm : FMADPat <f32, V_MAC_F32_e64>;

+class FMADModsPat<Instruction inst, SDPatternOperator mad_opr> : Pat<
+  (f32 (mad_opr (VOP3Mods f32:$src0, i32:$src0_mod),
+                (VOP3Mods f32:$src1, i32:$src1_mod),
+                (VOP3Mods f32:$src2, i32:$src2_mod))),
+  (inst $src0_mod, $src0, $src1_mod, $src1,
+        $src2_mod, $src2, DSTCLAMP.NONE, DSTOMOD.NONE)
+>;
+
+def : FMADModsPat<V_MAD_F32, AMDGPUfmad_ftz>;
+
 multiclass SelectPat <ValueType vt, Instruction inst> {
   def : Pat <
     (vt (select i1:$src0, vt:$src1, vt:$src2)),
     (inst $src2, $src1, $src0)
   >;
 }

 defm : SelectPat <i16, V_CNDMASK_B32_e64>;
(658 lines not shown)

llvm/trunk/test/CodeGen/AMDGPU/udiv.ll

 ; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs -mattr=-fp32-denormals < %s | FileCheck -check-prefix=SI -check-prefix=FUNC -check-prefix=VI %s
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+fp32-denormals < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s
 ; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

 ; FUNC-LABEL: {{^}}udiv_i32:
 ; EG-NOT: SETGE_INT
 ; EG: CF_END

 ; SI: v_rcp_iflag_f32_e32
 define void @udiv_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
(164 lines not shown)
 ; SI: v_mov_b32_e32 v{{[0-9]+}}, 0xaaaaaaab
 ; SI: v_mul_hi_u32 v0, {{v[0-9]+}}, {{s[0-9]+}}
 ; SI-NEXT: v_lshrrev_b32_e32 v0, 1, v0
 define void @test_udiv_3_mulhu(i32 %p) {
    %i = udiv i32 %p, 3
    store volatile i32 %i, i32 addrspace(1)* undef
    ret void
 }
+
+; GCN-LABEL: {{^}}fdiv_test_denormals
+; VI: v_mad_f32 v{{[0-9]+}}, -v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
+define amdgpu_kernel void @fdiv_test_denormals(i8 addrspace(1)* nocapture readonly %arg) {
+bb:
+  %tmp = load i8, i8 addrspace(1)* null, align 1
+  %tmp1 = sext i8 %tmp to i32
+  %tmp2 = getelementptr inbounds i8, i8 addrspace(1)* %arg, i64 undef
+  %tmp3 = load i8, i8 addrspace(1)* %tmp2, align 1
+  %tmp4 = sext i8 %tmp3 to i32
+  %tmp5 = sdiv i32 %tmp1, %tmp4
+  %tmp6 = trunc i32 %tmp5 to i8
+  store i8 %tmp6, i8 addrspace(1)* null, align 1
+  ret void
+}

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Replace FMAD with FMA when denormals are enabled.
ClosedPublic

