This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Use V_MAC_F32 for fmad.ftz
ClosedPublic

Authored by arsenm on Mar 9 2020, 7:38 PM.

Details

Summary

This avoids regressions in a future patch. I'm confused by the use of
the gfx9 usage legacy_mad. Was this a pointless instruction rename, or
uses fmul_legacy handling? Why is regular mac avilable in that case?

Diff Detail

Event Timeline

arsenm created this revision.Mar 9 2020, 7:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2020, 7:38 PM
foad added inline comments.Mar 10 2020, 2:31 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
852

It's a shame that FMADPat and FMADModsPat take the same arguments but in a different order.

865

Doesn't this comment belong four lines earlier?

arsenm updated this revision to Diff 249406.Mar 10 2020, 9:10 AM

Fix parameter order.

foad added a comment.Mar 10 2020, 9:16 AM

Looks reasonable to me but I'd prefer if it was reviewed by someone who understands the difference between mad / mad_legacy / mac better than I do.

It was used in the div expansion for some non obvious reason. As far as I know it means 0.0 * x = 0.0, but I am not sure it is needed in this case.
Please run PSDB on this change, if it passes then LGTM.

It was used in the div expansion for some non obvious reason. As far as I know it means 0.0 * x = 0.0, but I am not sure it is needed in this case.
Please run PSDB on this change, if it passes then LGTM.

It’s used in the div expansion because we want it regardless of the denormal mode. My confusion is for the f16 case, where the instruction was renamed or changed for some reason

It was used in the div expansion for some non obvious reason. As far as I know it means 0.0 * x = 0.0, but I am not sure it is needed in this case.
Please run PSDB on this change, if it passes then LGTM.

It’s used in the div expansion because we want it regardless of the denormal mode. My confusion is for the f16 case, where the instruction was renamed or changed for some reason

In GFX8, the opcodes below would write their result as: { 16’h0, result[15:0] }

MAD_F16
MAD_U16
MAD_I16
FMA_F16
DIV_FIXUP_F16
INTERP_P2_F16

GFX9 adds the “OPSEL” field to VOP3 (which was all zeros in GFX8), and the behavior of those instructions changes to: { previous_gpr_value[31:16], result[15:0] } when used with VOP3. VOP1/VOP2 will write zero to unused bits unless SDWA specifies otherwise, and VOP1/VOP2 ops encoded as VOP3 will write zero.
In order to support apps written for GFX8, the opcodes above will be renamed with the word “legacy” added:

MAD_LEGACY_F16
MAD_LEGACY_U16
MAD_LEGACY_I16
FMA_LEGACY_F16
DIV_FIXUP_LEGACY_F16
INTERP_P2_LEGACY_F16

New opcodes with the names of the original opcodes have been created which support this new “preserve” behavior. Also, all new 16-bit opcodes for GFX9 support the new “preserve” behavior.

Please run PSDB here anyway as if there are problems conformance should catch it.

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

rampitec accepted this revision.Mar 10 2020, 2:15 PM

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, I suppose mad should be covered. LGTM.

This revision is now accepted and ready to land.Mar 10 2020, 2:15 PM

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, I suppose mad should be covered. LGTM.

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, I suppose mad should be covered. LGTM.

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

f16 denormals are always enabled.

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, I suppose mad should be covered. LGTM.

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

f16 denormals are always enabled.

I

Please run PSDB here anyway as if there are problems conformance should catch it.

It passes, but there's no way we have anything testing the f16 path for this

There are half tests there, I suppose mad should be covered. LGTM.

We don't have tests that run with f16 denormals enabled. We also don't use the f16 intrinsic for div expansion (although I suppose we could in some cases?)

f16 denormals are always enabled.

I meant disabled. They're always enabled, so we have no useful tests for the case where they're forced off