While v_cmp clears the result bits of inactive lanes (effectively ANDing
them with 0), logical operations do not.
This fixes a Vulkan CTS test that would hang otherwise.
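
The difference described above can be illustrated with a small lane-mask model. This is only a sketch under stated assumptions, not the patch itself: the helper names (`v_cmp_lt`, `s_not`, `branch_taken`), the 64-lane wave size, and the choice of a NOT as the logical operation are all made up for illustration. It shows how a logical operation can leave bits set for inactive lanes, so that a branch on "vcc is non-zero" is taken unless the mask is first ANDed with exec.

```python
WAVE_SIZE = 64  # assumed wave64 for this illustration
FULL_MASK = (1 << WAVE_SIZE) - 1


def v_cmp_lt(values, threshold, exec_mask):
    """Model of a vector compare: a bit is set only for ACTIVE lanes
    whose value is below the threshold; inactive lanes always yield 0."""
    mask = 0
    for lane, value in enumerate(values):
        if (exec_mask >> lane) & 1 and value < threshold:
            mask |= 1 << lane
    return mask


def s_not(mask):
    """Model of a scalar logical NOT on a lane mask: it knows nothing
    about exec, so bits of inactive lanes get flipped to 1 as well."""
    return ~mask & FULL_MASK


def branch_taken(vcc):
    """Model of a 'branch if vcc is non-zero' instruction."""
    return vcc != 0


exec_mask = 0b1111                      # only lanes 0..3 are active
values = [0, 1, 2, 3] + [0] * (WAVE_SIZE - 4)

cond = v_cmp_lt(values, 10, exec_mask)  # true in every active lane
negated = s_not(cond)                   # false in every active lane

unsafe = branch_taken(negated)              # sees stale inactive-lane bits
safe = branch_taken(negated & exec_mask)    # masked with exec first
```

In this model `unsafe` is true even though no active lane satisfies the negated condition, while `safe` is false. If such a branch guards a loop back-edge, the unmasked version would never exit, which is consistent with the hang mentioned above.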
Differential D105709
[AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
Authored by mbrkusanin on Jul 9 2021, 9:22 AM.
Event Timeline
Alternatively, we could always insert an and with exec and try to remove it in SIOptimizeExecMaskingPreRA (something similar to optimizeVcndVcmpPair).
Looks OK to me, just one suggestion inline.
No, ballot is a weird thing that returns the full 32/64-bit result of evaluating an expression in all lanes. You should not handle it here. I meant to approve the previous version of the patch.
I don't think you should handle amdgcn_icmp and amdgcn_fcmp here. They are strange beasts that return the full 32/64-bit results of executing a divergent comparison in all lanes, and they should be deprecated in favour of amdgcn_ballot. Remove the corresponding test case as well.