Mark the ctpop as convergent so that it does not get moved into the
single-lane basic block. This currently saves one instruction.
Another way to save this instruction is to reuse the exec mask saved
by the inserted control flow (the output of s_saveexec). This is
currently hard to do, though it might become feasible once GlobalISel is used.
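The alternative of reusing the saved exec register relies on a simple equivalence. Where the ballot is taken, ballot(true) equals the current exec mask, so its population count equals the population count of the mask that the saveexec instruction stores before narrowing exec to a single lane. The C++ below is a hypothetical simulation of a 64-lane wave under that assumption. The names `LaneMask`, `andSaveExec`, `firstLane` and `simulate` are made up for illustration, and choosing the lowest active lane is also an assumption. It is not LLVM code and does not show how the pass would implement the change.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-lane wavefront: bit i set means lane i is active.
// This is a simulation, not LLVM or hardware code.
using LaneMask = std::uint64_t;

// ballot(true): every active lane contributes its bit, so the result is
// simply the current exec mask.
LaneMask ballotTrue(LaneMask exec) { return exec; }

// Model of s_and_saveexec_b64: return the old exec, then narrow exec to
// exec & cond.
LaneMask andSaveExec(LaneMask &exec, LaneMask cond) {
  LaneMask saved = exec;
  exec &= cond;
  return saved;
}

// Number of set bits, i.e. what a ctpop of the ballot result computes.
unsigned popcount(LaneMask m) {
  return static_cast<unsigned>(std::bitset<64>(m).count());
}

// Mask containing only the lowest active lane (assumed to be the lane that
// executes the single-lane block). Requires exec != 0.
LaneMask firstLane(LaneMask exec) { return exec & (~exec + 1); }

struct Result {
  unsigned fromBallot;    // ctpop(ballot(true)), computed before the branch
  unsigned fromSavedExec; // popcount of the mask saved by the saveexec model
};

Result simulate(LaneMask exec) {
  assert(exec != 0 && "at least one lane must be active");
  unsigned fromBallot = popcount(ballotTrue(exec));
  // Enter the single-lane block; the old exec mask is kept in `saved`.
  LaneMask saved = andSaveExec(exec, firstLane(exec));
  assert(popcount(exec) == 1 && "only one lane active inside the block");
  return {fromBallot, popcount(saved)};
}
```

For any non-empty exec mask, `fromBallot` and `fromSavedExec` agree, which is why the saved mask could in principle replace the separate ctpop computation.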
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Unfortunately this does not work anymore with the updated ballot intrinsic. I’ll leave this for later, see also D65088.
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp
Line 530: This won't be preserved in any meaningful way to the backend; this should be removed.