While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.
This fixes a Vulkan CTS test that would hang otherwise.
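
The mask behaviour described in the summary can be sketched with a toy C++ model of a 64-lane wave. This is only an illustration, not code from the patch: the names `LaneMask`, `vectorCompare`, `scalarXor`, `branchTaken` and `maskWithExec` are hypothetical, and using an XOR with all ones as the logical operation is an assumption chosen for the example.

```cpp
#include <cstdint>

// Toy model: bit i of a mask belongs to lane i of a 64-lane wave.
// All names below are hypothetical and exist only for this sketch.
using LaneMask = std::uint64_t;

// Models v_cmp: result bits of inactive lanes (exec bit = 0) are forced to 0.
constexpr LaneMask vectorCompare(LaneMask perLaneResult, LaneMask exec) {
  return perLaneResult & exec;
}

// Models a scalar logical operation on lane masks. It never looks at exec,
// so bits belonging to inactive lanes can end up set.
constexpr LaneMask scalarXor(LaneMask a, LaneMask b) { return a ^ b; }

// Models s_cbranch_vccnz: the branch is taken if any bit of vcc is set.
constexpr bool branchTaken(LaneMask vcc) { return vcc != 0; }

// Models the extra "and with exec" placed before the branch.
constexpr LaneMask maskWithExec(LaneMask cond, LaneMask exec) {
  return cond & exec;
}

// Lanes 0-3 are active and the comparison is true in every one of them.
constexpr LaneMask kExec = 0x0F;
constexpr LaneMask kCmp = vectorCompare(~LaneMask{0}, kExec);

// Negating the comparison leaves no active lane that wants to branch,
// but the bits of the inactive lanes 4-63 are now set.
constexpr LaneMask kNotCmp = scalarXor(kCmp, ~LaneMask{0});

static_assert(kCmp == 0x0F, "v_cmp result is already limited to active lanes");
static_assert(branchTaken(kNotCmp), "stale inactive-lane bits take the branch");
static_assert(!branchTaken(maskWithExec(kNotCmp, kExec)),
              "after the and with exec the branch is not taken");
```

In this model the unmasked condition keeps taking the branch even though no active lane asks for it, which is the kind of behaviour that could show up as a hang in a loop.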
Differential D105709
[AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
Closed, Public. Authored by mbrkusanin on Jul 9 2021, 9:22 AM.
Alternatively, we could always insert an and with exec and then try to remove it in SIOptimizeExecMaskingPreRA (something similar to optimizeVcndVcmpPair).
Looks OK to me, just one suggestion inline.
This revision is now accepted and ready to land. (Jul 28 2021, 9:02 AM)
No, ballot is a weird thing that returns the full 32/64-bit result of evaluating an expression in all lanes. You should not handle it here. I meant to approve the previous version of the patch.

Closed by commit rG971f4173f82d: [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary (authored by mbrkusanin). (Jul 29 2021, 2:23 AM)

This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 362411
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-brcond.mir
I don't think you should handle amdgcn_icmp and amdgcn_fcmp here. They are strange beasts that return the full 32/64-bit results of executing a divergent comparison in all lanes, and they should be deprecated in favour of amdgcn_ballot. Remove the corresponding test case as well.