This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Make f16 a legal type for VI subtargets
ClosedPublic

Authored by kzhuravl on Oct 26 2016, 1:31 AM.

Diff Detail

Event Timeline

kzhuravl updated this revision to Diff 75834.Oct 26 2016, 1:31 AM
kzhuravl retitled this revision from to AMDGPU/SI: Make f16 a legal type for VI subtargets.
kzhuravl updated this object.
kzhuravl added reviewers: tstellarAMD, arsenm.
kzhuravl added a subscriber: llvm-commits.

All lit tests are passing, did not get a chance to verify on hardware yet. half.ll was replaced by amdgcn-vop*f16*.ll tests. I am in the process of f16 test restructuring, and did not have time to include following tests:

  • vector vt
  • vop2: v_mac_f16, v_madak_f16
  • vop3: v_mad_f16
  • vopc: all

I will upload those tests a bit later today.

include/llvm/IR/IntrinsicsAMDGPU.td
172–182 ↗(On Diff #75834)

Can these changes be done in a different patch? They change the intrinsic signature, and will require updating any users of this intrinsic in external projects.

lib/Target/AMDGPU/SIISelLowering.cpp
83–84

There is a 1-to-1 mapping between types and register classes, so you can drop the first addRegisterClass(MVT::f16, ... ) call

arsenm added inline comments.Oct 26 2016, 9:52 AM
lib/Target/AMDGPU/SIISelLowering.cpp
80–81

The TODO was taken care of already

83–84

I think this should only be added to SReg_32. We already have other random inconsistencies from having f32 in a VGPR class and i32 in SGPR which I've been trying to fix

3623–3624

Does this need an f16 is legal check? If not this can probably just be an !isVector check, any of the other FP types are unusable

lib/Target/AMDGPU/SIInstrInfo.cpp
1448–1452

I would prefer checking the more common f32 cases first

1539–1540

Ditto

lib/Target/AMDGPU/SIInstructions.td
452–454

The f32 patterns on the sources look wrong. Not sure how this compiles

kzhuravl updated this revision to Diff 77231.Nov 8 2016, 12:03 PM
kzhuravl edited edge metadata.
kzhuravl marked 7 inline comments as done.

Review feedback + tests

kzhuravl added inline comments.Nov 8 2016, 12:05 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3623–3624

This is needed for v_cmp_class_f16. I have forgotten to add Subtarget->has16BitInsts(), which I have added in the revised patch.

kzhuravl added inline comments.Nov 8 2016, 12:14 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3703–3704

Changed it by accident, I will put it back.

kzhuravl updated this revision to Diff 77247.Nov 8 2016, 1:16 PM
kzhuravl edited edge metadata.

Put back code that was removed accidently

kzhuravl marked an inline comment as done.Nov 8 2016, 1:16 PM
arsenm added inline comments.Nov 8 2016, 2:20 PM
lib/Target/AMDGPU/SIISelLowering.cpp
2041–2044

Braces

3768

Should this also handle f16? Could be separate optimization patch

lib/Target/AMDGPU/SIISelLowering.h
51

should start with lower case

lib/Target/AMDGPU/SIInstructions.td
422–423

Is this (and fpextend) correct? A correct lowering for these was just added I thought

lib/Target/AMDGPU/SIRegisterInfo.td
254

Missing i16/f16 but it probably doesn't matter

lib/Target/AMDGPU/SISchedule.td
29

I don't think we need this since they run at the same rat as 32-bit

lib/Target/AMDGPU/VOP2Instructions.td
326

Why is this necessary? It can already mangle the FP type. This should work if you change it to VOP_F16_F16_I32, the int type doesn't matter

kzhuravl updated this revision to Diff 77344.Nov 9 2016, 6:54 AM
kzhuravl edited edge metadata.
kzhuravl marked 4 inline comments as done.

Address review feedback

kzhuravl added inline comments.Nov 9 2016, 6:54 AM
lib/Target/AMDGPU/SIISelLowering.cpp
3768

Yes, I was initially planning to do it in this patch, but then decided that separate patch would be better.

lib/Target/AMDGPU/SIInstructions.td
422–423

I think so. Without these changes a few existing tests (i.e. fptrunc.ll) were failing with the "cannot select error".

lib/Target/AMDGPU/VOP2Instructions.td
326

Would this be acceptable to use VOP_F16_F16_I32? Spec says:

D.f16 = S0.f16 * (2 ** S1.i16)

So it should be i16. I have a follow up patch that changes ldexp intrinsic.

kzhuravl updated this revision to Diff 77376.Nov 9 2016, 10:26 AM
kzhuravl edited edge metadata.
kzhuravl marked 2 inline comments as done.

Rebased + added v_ldexp_f16

kzhuravl added inline comments.Nov 9 2016, 10:27 AM
lib/Target/AMDGPU/VOP2Instructions.td
326

Changed to VOP_F16_F16_I32.

arsenm added inline comments.Nov 11 2016, 10:41 AM
lib/Target/AMDGPU/VOP2Instructions.td
348

This needs new to use an f16 variant of VOP_MAC. I just ran into some problems from this

test/MC/Disassembler/AMDGPU/sdwa_vi.txt
303–305

This looks accidental

kzhuravl updated this revision to Diff 77653.Nov 11 2016, 12:34 PM
kzhuravl edited edge metadata.
kzhuravl marked an inline comment as done.

Address review feedback + include D26475 (without changing the return type)

kzhuravl marked an inline comment as done.Nov 11 2016, 12:35 PM
tstellarAMD accepted this revision.Nov 11 2016, 1:39 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Nov 11 2016, 1:39 PM
arsenm edited edge metadata.Nov 11 2016, 1:39 PM

LGTM except for some nits

lib/Target/AMDGPU/SIISelLowering.cpp
2033–2035

Braces

lib/Target/AMDGPU/SIShrinkInstructions.cpp
93–94

I would sort f16 after

lib/Target/AMDGPU/VOP2Instructions.td
148–149

I think it should be easy to make VOP_MAC be the class with the type operand to avoid copy pasting the entire thing just to change the type

test/CodeGen/AMDGPU/fadd.f16.ll
2

s/SI/CI

test/CodeGen/AMDGPU/fcmp.f16.ll
2

Ditto

test/CodeGen/AMDGPU/llvm.exp2.f16.ll
42

The naming convention should not name this vector and end in the actual vector type, v2f16 so this can expand to other vector widths that we might need to test later

kzhuravl marked 6 inline comments as done.Nov 12 2016, 9:42 PM
This revision was automatically updated to reflect the committed changes.