Page MenuHomePhabricator

[AMDGPU] global-isel support for RT
ClosedPublic

Authored by rampitec on Sep 17 2020, 11:49 AM.

Diff Detail

Event Timeline

rampitec created this revision.Sep 17 2020, 11:49 AM
rampitec requested review of this revision.Sep 17 2020, 11:49 AM
rampitec added a reviewer: arsenm.
rampitec added a subscriber: Restricted Project.
arsenm added inline comments.Sep 17 2020, 3:49 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3042

Braces here

3057–3073

Can you do this during custom lowering rather than adding bit operations here late? I'm also surprised a V_PACK_B32_F16 is involved here

arsenm added inline comments.Sep 17 2020, 3:52 PM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4402

Missing waterfall loop for SGPR operand

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll
68

Should include a case where this needs a waterfall loop

rampitec added inline comments.Sep 17 2020, 3:56 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

What kind of operations you'd like to see in the custom lowering? v_pack_b32_f16 should be fine, this is packed half type in this case.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4402

That's a descriptor, I'd rather refuse to select.

arsenm added inline comments.Sep 17 2020, 4:28 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

But v_pack_b32_f16 isn't semantically the same as the bit packing, so I would be surprised to insert this for the argument handling.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
4402

You can never guarantee the input is uniform or in a VGPR. We can do the right thing now easily (and every other intrinsic with a descriptor does it)

rampitec added inline comments.Sep 17 2020, 4:32 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

That's the best instruction for the job IMO. What we are doing is repacking vector of halfs.

arsenm added inline comments.Sep 17 2020, 4:33 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

But it does change the input values. I believe this is a canonicalizing operation, so may flush denorms and quiet snans

rampitec added inline comments.Sep 17 2020, 4:36 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

It should behave the same as bhv itself, the mode is common right? So if flushing on the value will be flushed anyway. Everything else results in a longer code. It can use v_lshl_or_b32, but it will also need an extra v_and_b32 to clear high half.

rampitec updated this revision to Diff 292662.Sep 17 2020, 4:50 PM
rampitec marked 4 inline comments as done.
rampitec updated this revision to Diff 292675.Sep 17 2020, 5:20 PM
rampitec marked 2 inline comments as done.

Replaced v_pack with two instructions.

rampitec updated this revision to Diff 292679.Sep 17 2020, 5:43 PM

Actually it can be done with a singe v_perm_b32.

rampitec added inline comments.Sep 17 2020, 6:00 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

Doing a custom lowering would need 4 different custom nodes and then selection. It will be much more overhead.

arsenm added inline comments.Sep 18 2020, 4:09 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3057–3073

You could use one wrapper instruction like the image intrinsics. We should expose the bit packing to the post-legalize combiner

rampitec updated this revision to Diff 293267.Sep 21 2020, 3:08 PM

Added legalization. Code is expectedly worse though.

rampitec marked 2 inline comments as done.Sep 21 2020, 3:09 PM
arsenm accepted this revision.Thu, Sep 24, 10:12 AM

LGTM. Code should get better when we start trying to optimize bit ops

This revision is now accepted and ready to land.Thu, Sep 24, 10:12 AM
This revision was automatically updated to reflect the committed changes.