Page MenuHomePhabricator

[AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
ClosedPublic

Authored by mbrkusanin on Jul 9 2021, 9:22 AM.

Details

Summary

While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Diff Detail

Event Timeline

mbrkusanin created this revision.Jul 9 2021, 9:22 AM
mbrkusanin requested review of this revision.Jul 9 2021, 9:22 AM
arsenm added inline comments.Jul 9 2021, 2:12 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2517

This isn't the precise condition. It should also cover class intrinsics and maybe a few other cases. I thought we had an is known vector bool helper somewhere?

foad added inline comments.Jul 12 2021, 7:42 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2517

For SelectionDAG we have isBoolSGPR. I don't know of anything similar for GlobalISel.

arsenm added inline comments.Jul 12 2021, 3:26 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2517

We should have the equivalent

Alternatively we could always insert and with exec and try to remove it in SIOptimizeExecMaskingPreRA (something similar to optimizeVcndVcmpPair).

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2517

I could not find a global-isel equivalent.

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir
231–233

Is this correct? As far as I can tell this is the only way to copy sgpr to vcc.

foad added inline comments.Jul 28 2021, 7:52 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2485–2486

I don't think you should handle amdgcn_icmp and amdgcn_fcmp here. They are strange beasts that return the full 32/64-bit results of executing a divergent comparison in all lanes, and they should be deprecated in favour of amdgcn_ballot. Remove the corresponding test case as well.

  • Removed amdgcn_icmp and amdgcn_fcmp
  • Covered case with amdgcn_ballot.
foad accepted this revision.Jul 28 2021, 9:02 AM

Looks OK to me, just one suggestion inline.

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
2518–2519

Use TRI.getBoolRC?

This revision is now accepted and ready to land.Jul 28 2021, 9:02 AM
foad added a comment.Jul 28 2021, 9:04 AM
  • Covered case with amdgcn_ballot.

No, ballot is a weird thing that returns the full 32/64-bit result of evaluating an expression in all lanes. You should not handle it here. I meant to approve the previous version of the patch.

  • Remove amdgcn_ballot case
  • Use TRI.getBoolRC()
foad accepted this revision.Jul 28 2021, 9:20 AM