This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable v4f16 and above for v_pk_fma instructions
ClosedPublic

Authored by dstuttard on Jul 26 2019, 3:37 AM.

Details

Summary

If isel is presented with <2 x half> vectors, it correctly selects
v_pk_fma style instructions.
If isel is presented with wider vectors, e.g. <4 x half>, it scalarizes the
fma instead, unlike other instruction types (such as fadd, fmul, etc.).

This patch adds the extra support needed to select packed fma instructions for
v4f16 and above. One of the existing tests is updated to cover this case and is
also extended to GFX9.
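
To illustrate the difference between scalarizing and splitting, here is a small conceptual model in C++. It is only a sketch and is not code from the patch or from LLVM: `packed_fma`, `fma_scalarized`, and `fma_split` are hypothetical names, and `float` stands in for `half` to keep the example portable. It shows that a 4-wide fma can be computed either as four single-lane operations or as two 2-wide packed operations, with identical results.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Assumption: float replaces half so the sketch builds with any C++11 compiler.
using Vec2 = std::array<float, 2>;
using Vec4 = std::array<float, 4>;

// Hypothetical stand-in for one packed instruction (in the spirit of
// v_pk_fma_f16): a fused multiply-add applied to two lanes at once.
Vec2 packed_fma(const Vec2 &a, const Vec2 &b, const Vec2 &c) {
  return {std::fma(a[0], b[0], c[0]), std::fma(a[1], b[1], c[1])};
}

// Scalarized form: four independent single-lane fused multiply-adds.
Vec4 fma_scalarized(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], c[i]);
  return r;
}

// Split form: the low and high halves each go through one packed operation,
// and the two 2-wide results are concatenated back into a 4-wide vector.
Vec4 fma_split(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec2 lo = packed_fma({a[0], a[1]}, {b[0], b[1]}, {c[0], c[1]});
  Vec2 hi = packed_fma({a[2], a[3]}, {b[2], b[3]}, {c[2], c[3]});
  return {lo[0], lo[1], hi[0], hi[1]};
}
```

In this model the split form needs two packed operations where the scalarized form needs four single-lane ones, which is the kind of saving the summary is after for <4 x half>.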

Event Timeline

dstuttard created this revision. Jul 26 2019, 3:37 AM
Herald added a project: Restricted Project. Jul 26 2019, 3:37 AM

+Stas to comment on the v_fmac_f16 test change.
Is it acceptable to change the result to look for v_pk_fma_f16 rather than 2 v_fmac_f16 instructions? If not, any suggestions on how to get the compiler to generate 2 x fmac instead?

arsenm added inline comments. Jul 26 2019, 5:59 AM
test/CodeGen/AMDGPU/llvm.fma.f16.ll
358

Should test the intrinsic rather than the contraction

dstuttard updated this revision to Diff 211938. Jul 26 2019, 7:18 AM

Changed test to use fma intrinsic

dstuttard marked an inline comment as done. Jul 26 2019, 7:18 AM
foad added a subscriber: foad. Jul 26 2019, 7:25 AM
dstuttard updated this revision to Diff 211957. Jul 26 2019, 9:40 AM

Managed to get the fmac test to keep using fmac
Also updated the test to use non-anonymous values

arsenm accepted this revision. Jul 26 2019, 9:44 AM
This revision is now accepted and ready to land. Jul 26 2019, 9:44 AM
This revision was automatically updated to reflect the committed changes.