This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Use V_MAC_F32 for fmad.ftz
ClosedPublic

Authored by arsenm on Mar 9 2020, 7:38 PM.

Details

Reviewers

rampitec
nhaehnle
foad
b-sumner

Summary

This avoids regressions in a future patch. I'm confused by the gfx9 use of
legacy_mad. Was this a pointless instruction rename, or does it use
fmul_legacy handling? Why is regular mac available in that case?

Diff Detail

Event Timeline

arsenm created this revision. Mar 9 2020, 7:38 PM

Herald added a project: Restricted Project. Mar 9 2020, 7:38 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others.

foad added inline comments. Mar 10 2020, 2:31 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 852: It's a shame that FMADPat and FMADModsPat take the same arguments but in a different order.
Line 865: Doesn't this comment belong four lines earlier?

Fix parameter order.

Looks reasonable to me but I'd prefer if it was reviewed by someone who understands the difference between mad / mad_legacy / mac better than I do.

It was used in the div expansion for some non-obvious reason. As far as I know it means 0.0 * x = 0.0, but I am not sure it is needed in this case.
Please run PSDB on this change; if it passes, then LGTM.

In D75889#1915361, @rampitec wrote:

It was used in the div expansion for some non-obvious reason. As far as I know it means 0.0 * x = 0.0, but I am not sure it is needed in this case.
Please run PSDB on this change; if it passes, then LGTM.

It’s used in the div expansion because we want it regardless of the denormal mode. My confusion is for the f16 case, where the instruction was renamed or changed for some reason
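As a rough illustration of "regardless of the denormal mode", here is a minimal Python sketch of an unfused f32 multiply-add that always flushes denormals. It is a model under stated assumptions, not the hardware definition: the names `to_f32`, `flush_denormal` and `fmad_ftz_f32` are hypothetical, and flushing the intermediate product is an assumption that the thread does not confirm. The final add is rounded from a binary64 sum, so it may differ from real hardware in rare rounding cases.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal binary32 value


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite value outside the binary32 range: model it as infinity.
        return math.copysign(math.inf, x)


def flush_denormal(x: float) -> float:
    """Replace a binary32 denormal with a zero of the same sign."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def fmad_ftz_f32(a: float, b: float, c: float) -> float:
    """Hypothetical model: unfused a * b + c with denormals always flushed."""
    a, b, c = (flush_denormal(to_f32(v)) for v in (a, b, c))
    # Assumption: the intermediate product is rounded and flushed as well.
    product = flush_denormal(to_f32(a * b))
    return flush_denormal(to_f32(product + c))
```

With this model, `fmad_ftz_f32(2.0, 3.0, 1.0)` gives `7.0`, while `fmad_ftz_f32(1e-40, 1.0, 0.0)` gives `0.0` because the first operand is a denormal in f32 and is flushed before the multiply.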

In D75889#1915375, @arsenm wrote:

It’s used in the div expansion because we want it regardless of the denormal mode. My confusion is for the f16 case, where the instruction was renamed or changed for some reason

In GFX8, the opcodes below would write their result as: { 16’h0, result[15:0] }

MAD_F16
MAD_U16
MAD_I16
FMA_F16
DIV_FIXUP_F16
INTERP_P2_F16

GFX9 adds the “OPSEL” field to VOP3 (which was all zeros in GFX8), and the behavior of those instructions changes to: { previous_gpr_value[31:16], result[15:0] } when used with VOP3. VOP1/VOP2 will write zero to unused bits unless SDWA specifies otherwise, and VOP1/VOP2 ops encoded as VOP3 will write zero.
In order to support apps written for GFX8, the opcodes above will be renamed with the word “legacy” added:

MAD_LEGACY_F16
MAD_LEGACY_U16
MAD_LEGACY_I16
FMA_LEGACY_F16
DIV_FIXUP_LEGACY_F16
INTERP_P2_LEGACY_F16

New opcodes with the names of the original opcodes have been created which support this new “preserve” behavior. Also, all new 16-bit opcodes for GFX9 support the new “preserve” behavior.
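The two destination-write behaviours described above can be sketched in a few lines of Python. This only models the bit packing quoted in the comment (`{ 16'h0, result[15:0] }` versus `{ previous_gpr_value[31:16], result[15:0] }`); the function names are hypothetical and the OPSEL field itself is not modelled.

```python
MASK16 = 0xFFFF


def write_f16_zero_upper(previous_vgpr: int, result: int) -> int:
    """GFX8 / "legacy" behaviour: { 16'h0, result[15:0] }."""
    # The previous register contents are ignored; the upper half is zeroed.
    return result & MASK16


def write_f16_preserve_upper(previous_vgpr: int, result: int) -> int:
    """GFX9 VOP3 "preserve" behaviour: { previous[31:16], result[15:0] }."""
    return (previous_vgpr & 0xFFFF0000) | (result & MASK16)
```

For a destination register that previously held `0xDEADBEEF` and an f16 result of `0x3C00` (1.0), the first function yields `0x00003C00` and the second yields `0xDEAD3C00`.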

Please run PSDB here anyway; if there are problems, conformance should catch them.

In D75889#1915394, @rampitec wrote:

Please run PSDB here anyway; if there are problems, conformance should catch them.

It passes, but there's no way we have anything testing the f16 path for this

In D75889#1915688, @arsenm wrote:

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, so I suppose mad should be covered. LGTM.

This revision is now accepted and ready to land. Mar 10 2020, 2:15 PM

In D75889#1915703, @rampitec wrote:

There are half tests there, so I suppose mad should be covered. LGTM.

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

200b20639ac2804318dee9fddfe79c7fea7bd623

In D75889#1915712, @arsenm wrote:

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

f16 denormals are always enabled.

In D75889#1915780, @rampitec wrote:

f16 denormals are always enabled.

I meant disabled. They're always enabled, so we have no useful tests for the case where they're forced off

Revision Contents

Path                                                                   Size
llvm/lib/Target/AMDGPU/SIInstructions.td                               32 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.fmad.ftz.mir    36 lines
llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll                  4 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmad.ftz.f16.ll                   7 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmad.ftz.ll                       9 lines

Diff 249406

llvm/lib/Target/AMDGPU/SIInstructions.td

[842 lines not shown] def : GCNPat <
   (f16 (uint_to_fp i32:$src)),
   (V_CVT_F16_F32_e32 (V_CVT_F32_U32_e32 VSrc_b32:$src))
 >;

 //===----------------------------------------------------------------------===//
 // VOP2 Patterns
 //===----------------------------------------------------------------------===//

-multiclass FMADPat <ValueType vt, Instruction inst> {
-  def : GCNPat <
-  (vt (fmad (VOP3NoMods vt:$src0),
-  (VOP3NoMods vt:$src1),
-  (VOP3NoMods vt:$src2))),
+// TODO: Check only no src2 mods?
+class FMADPat <ValueType vt, Instruction inst, SDPatternOperator node>
+  : GCNPat <(vt (node (vt (VOP3NoMods vt:$src0)),
+  (vt (VOP3NoMods vt:$src1)),
+  (vt (VOP3NoMods vt:$src2)))),
   (inst SRCMODS.NONE, $src0, SRCMODS.NONE, $src1,
   SRCMODS.NONE, $src2, DSTCLAMP.NONE, DSTOMOD.NONE)
 >;

+
+// Prefer mac form when there are no modifiers.
+let AddedComplexity = 9 in {
+def : FMADPat <f32, V_MAC_F32_e64, fmad>;
+def : FMADPat <f32, V_MAC_F32_e64, AMDGPUfmad_ftz>;
+
+let SubtargetPredicate = Has16BitInsts in {
+def : FMADPat <f16, V_MAC_F16_e64, fmad>;
+def : FMADPat <f16, V_MAC_F16_e64, AMDGPUfmad_ftz>;
 }

-defm : FMADPat <f16, V_MAC_F16_e64>;
-defm : FMADPat <f32, V_MAC_F32_e64>;
+}

-class FMADModsPat<Instruction inst, SDPatternOperator mad_opr, ValueType Ty>
+class FMADModsPat<ValueType Ty, Instruction inst, SDPatternOperator mad_opr>
   : GCNPat<
   (Ty (mad_opr (Ty (VOP3Mods Ty:$src0, i32:$src0_mod)),
   (Ty (VOP3Mods Ty:$src1, i32:$src1_mod)),
   (Ty (VOP3Mods Ty:$src2, i32:$src2_mod)))),
   (inst $src0_mod, $src0, $src1_mod, $src1,
   $src2_mod, $src2, DSTCLAMP.NONE, DSTOMOD.NONE)
 >;

-// FIXME: This should select to V_MAC_F32
-def : FMADModsPat<V_MAD_F32, AMDGPUfmad_ftz, f32>;
-def : FMADModsPat<V_MAD_F16, AMDGPUfmad_ftz, f16> {
+def : FMADModsPat<f32, V_MAD_F32, AMDGPUfmad_ftz>;
+def : FMADModsPat<f16, V_MAD_F16, AMDGPUfmad_ftz> {
   let SubtargetPredicate = Has16BitInsts;
 }

 class VOPSelectModsPat <ValueType vt> : GCNPat <
   (vt (select i1:$src0, (VOP3Mods vt:$src1, i32:$src1_mods),
   (VOP3Mods vt:$src2, i32:$src2_mods))),
   (V_CNDMASK_B32_e64 FP32InputMods:$src2_mods, VSrc_b32:$src2,
   FP32InputMods:$src1_mods, VSrc_b32:$src1, SSrc_i1:$src0)
[1,464 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.fmad.ftz.mir

[13 lines not shown] body: |
   bb.0:
     liveins: $vgpr0, $vgpr1, $vgpr2

     ; GCN-LABEL: name: fmad_ftz_s32_vvvv
     ; GCN: liveins: $vgpr0, $vgpr1, $vgpr2
     ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
     ; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr1
     %2:vgpr(s32) = COPY $vgpr2
     %3:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %1, %2
     S_ENDPGM 0, implicit %3
 ...

 ---
 name: fmad_ftz_s32_vsvv
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0, $vgpr1

     ; GCN-LABEL: name: fmad_ftz_s32_vsvv
     ; GCN: liveins: $sgpr0, $vgpr0, $vgpr1
     ; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:sgpr(s32) = COPY $sgpr0
     %1:vgpr(s32) = COPY $vgpr0
     %2:vgpr(s32) = COPY $vgpr1
     %3:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %1, %2
     S_ENDPGM 0, implicit %3
 ...

 ---
 name: fmad_ftz_s32_vvsv
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0, $vgpr1

     ; GCN-LABEL: name: fmad_ftz_s32_vvsv
     ; GCN: liveins: $sgpr0, $vgpr0, $vgpr1
     ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GCN: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr0
     ; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:sgpr(s32) = COPY $sgpr0
     %2:vgpr(s32) = COPY $vgpr1
     %3:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %1, %2
     S_ENDPGM 0, implicit %3
 ...

 ---
 name: fmad_ftz_s32_vvvs
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0, $vgpr1

     ; GCN-LABEL: name: fmad_ftz_s32_vvvs
     ; GCN: liveins: $sgpr0, $vgpr0, $vgpr1
     ; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
     ; GCN: [[COPY2:%[0-9]+]]:sreg_32 = COPY $sgpr0
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY2]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY1]], 0, [[COPY3]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:vgpr(s32) = COPY $vgpr0
     %1:vgpr(s32) = COPY $vgpr0
     %2:sgpr(s32) = COPY $sgpr0
     %3:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %1, %2
     S_ENDPGM 0, implicit %3
 ...


 # Same SGPR used, so doesn't violate the constant bus restriction.
 ---
 name: fmad_ftz_s32_vssv
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0

     ; GCN-LABEL: name: fmad_ftz_s32_vssv
     ; GCN: liveins: $sgpr0, $vgpr0
     ; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:sgpr(s32) = COPY $sgpr0
     %1:vgpr(s32) = COPY $vgpr0
     %2:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %0, %1
     S_ENDPGM 0, implicit %2
 ...

 ---
 name: fmad_ftz_s32_vsvs
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0

     ; GCN-LABEL: name: fmad_ftz_s32_vsvs
     ; GCN: liveins: $sgpr0, $vgpr0
     ; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY1]], 0, [[COPY]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY1]], 0, [[COPY2]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:sgpr(s32) = COPY $sgpr0
     %1:vgpr(s32) = COPY $vgpr0
     %2:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %1, %0
     S_ENDPGM 0, implicit %2
 ...

 ---
 name: fmad_ftz_s32_vvss
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0

     ; GCN-LABEL: name: fmad_ftz_s32_vvss
     ; GCN: liveins: $sgpr0, $vgpr0
     ; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
     ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr0
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY1]], 0, [[COPY]], 0, [[COPY]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY1]], 0, [[COPY]], 0, [[COPY2]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:sgpr(s32) = COPY $sgpr0
     %1:vgpr(s32) = COPY $vgpr0
     %2:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %1, %0, %0
     S_ENDPGM 0, implicit %2
 ...

 ---
 name: fmad_ftz_s32_vsss
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true

 body: |
   bb.0:
     liveins: $sgpr0, $vgpr0

     ; GCN-LABEL: name: fmad_ftz_s32_vsss
     ; GCN: liveins: $sgpr0, $vgpr0
     ; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
-    ; GCN: [[V_MAD_F32_:%[0-9]+]]:vgpr_32 = V_MAD_F32 0, [[COPY]], 0, [[COPY]], 0, [[COPY]], 0, 0, implicit $exec
-    ; GCN: S_ENDPGM 0, implicit [[V_MAD_F32_]]
+    ; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[COPY]]
+    ; GCN: [[V_MAC_F32_e64_:%[0-9]+]]:vgpr_32 = V_MAC_F32_e64 0, [[COPY]], 0, [[COPY]], 0, [[COPY1]], 0, 0, implicit $exec
+    ; GCN: S_ENDPGM 0, implicit [[V_MAC_F32_e64_]]
     %0:sgpr(s32) = COPY $sgpr0
     %1:vgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.fmad.ftz), %0, %0, %0
     S_ENDPGM 0, implicit %1
 ...


 # FIXME: This should probably have been fixed by RegBankSelect, but we should fail to select it.
 # ---
[40 lines not shown]

llvm/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll

[131 lines not shown] define amdgpu_kernel void @test_fold_canonicalize_fma_value_f32(float addrspace(1)* %arg) {
   %load = load float, float addrspace(1)* %gep, align 4
   %v = call float @llvm.fma.f32(float %load, float 15.0, float 15.0)
   %canonicalized = tail call float @llvm.canonicalize.f32(float %v)
   store float %canonicalized, float addrspace(1)* %gep, align 4
   ret void
 }

 ; GCN-LABEL: test_fold_canonicalize_fmad_ftz_value_f32:
-; GCN: s_mov_b32 [[SGPR:s[0-9]+]], 0x41700000
-; GCN: v_mad_f32 [[V:v[0-9]+]], v{{[0-9]+}}, [[SGPR]], [[SGPR]]
+; GCN: v_mov_b32_e32 [[V:v[0-9]+]], 0x41700000
+; GCN: v_mac_f32_e32 [[V]], v{{[0-9]+}}, v{{[0-9]+$}}
 ; GCN-NOT: v_mul
 ; GCN-NOT: v_max
 ; GCN: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[V]]
 define amdgpu_kernel void @test_fold_canonicalize_fmad_ftz_value_f32(float addrspace(1)* %arg) {
   %id = tail call i32 @llvm.amdgcn.workitem.id.x()
   %gep = getelementptr inbounds float, float addrspace(1)* %arg, i32 %id
   %load = load float, float addrspace(1)* %gep, align 4
   %v = call float @llvm.amdgcn.fmad.ftz.f32(float %load, float 15.0, float 15.0)
[761 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmad.ftz.f16.ll

 ; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX8 %s
 ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=+fp32-denormals -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX8 %s
 ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+fp32-denormals -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

 declare half @llvm.amdgcn.fmad.ftz.f16(half %a, half %b, half %c)

 ; GCN-LABEL: {{^}}mad_f16:
-; GFX8: v_ma{{[dc]}}_f16
-; GFX9: v_mad_legacy_f16
+; GCN: v_mac_f16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+$}}
 define amdgpu_kernel void @mad_f16(
   half addrspace(1)* %r,
   half addrspace(1)* %a,
   half addrspace(1)* %b,
   half addrspace(1)* %c) {
   %a.val = load half, half addrspace(1)* %a
   %b.val = load half, half addrspace(1)* %b
   %c.val = load half, half addrspace(1)* %c
[11 lines not shown] define amdgpu_kernel void @mad_f16_imm_a(
   %b.val = load half, half addrspace(1)* %b
   %c.val = load half, half addrspace(1)* %c
   %r.val = call half @llvm.amdgcn.fmad.ftz.f16(half 8.0, half %b.val, half %c.val)
   store half %r.val, half addrspace(1)* %r
   ret void
 }

 ; GCN-LABEL: {{^}}mad_f16_imm_b:
-; GCN: s_movk_i32 [[KB:s[0-9]+]], 0x4800
-; GFX8: v_mad_f16 {{v[0-9]+}}, {{v[0-9]+}}, [[KB]],
-; GFX9: v_mad_legacy_f16 {{v[0-9]+}}, {{v[0-9]+}}, [[KB]],
+; GCN: v_mac_f16_e32 {{v[0-9]+}}, 0x4800, {{v[0-9]+$}}
 define amdgpu_kernel void @mad_f16_imm_b(
   half addrspace(1)* %r,
   half addrspace(1)* %a,
   half addrspace(1)* %c) {
   %a.val = load half, half addrspace(1)* %a
   %c.val = load half, half addrspace(1)* %c
   %r.val = call half @llvm.amdgcn.fmad.ftz.f16(half %a.val, half 8.0, half %c.val)
   store half %r.val, half addrspace(1)* %r
[69 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fmad.ftz.ll

[29 lines not shown] define amdgpu_kernel void @mad_f32_imm_a(
   %c.val = load float, float addrspace(1)* %c
   %r.val = call float @llvm.amdgcn.fmad.ftz.f32(float 8.0, float %b.val, float %c.val)
   store float %r.val, float addrspace(1)* %r
   ret void
 }

 ; GCN-LABEL: {{^}}mad_f32_imm_b:
 ; GCN: v_mov_b32_e32 [[KB:v[0-9]+]], 0x41000000
-; GCN: v_ma{{[dc]}}_f32 {{v[0-9]+}}, {{[vs][0-9]+}}, [[KB]],
+; GCN: v_mac_f32_e32 {{v[0-9]+}}, {{[s][0-9]+}}, [[KB]]
 define amdgpu_kernel void @mad_f32_imm_b(
   float addrspace(1)* %r,
   float addrspace(1)* %a,
   float addrspace(1)* %c) {
   %a.val = load float, float addrspace(1)* %a
   %c.val = load float, float addrspace(1)* %c
   %r.val = call float @llvm.amdgcn.fmad.ftz.f32(float %a.val, float 8.0, float %c.val)
   store float %r.val, float addrspace(1)* %r
   ret void
 }

 ; GCN-LABEL: {{^}}mad_f32_imm_c:
-; GCN: v_mov_b32_e32 [[KC:v[0-9]+]], 0x41000000
-; GCN: v_ma{{[dc]}}_f32 {{v[0-9]+}}, {{[vs][0-9]+}}, {{v[0-9]+}}, [[KC]]{{$}}
+; GCN: v_mov_b32_e32 [[C:v[0-9]+]], 0x41000000
+; GCN: s_load_dword [[A:s[0-9]+]]
+; GCN: s_load_dword [[B:s[0-9]+]]
+; GCN: v_mov_b32_e32 [[VB:v[0-9]+]], [[B]]
+; GCN: v_mac_f32_e32 [[C]], {{s[0-9]+}}, [[VB]]{{$}}
 define amdgpu_kernel void @mad_f32_imm_c(
   float addrspace(1)* %r,
   float addrspace(1)* %a,
   float addrspace(1)* %b) {
   %a.val = load float, float addrspace(1)* %a
   %b.val = load float, float addrspace(1)* %b
   %r.val = call float @llvm.amdgcn.fmad.ftz.f32(float %a.val, float %b.val, float 8.0)
   store float %r.val, float addrspace(1)* %r
[53 lines not shown]