Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.
Use the new intrinsic in the atomic optimizer pass.
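For reference, here is a minimal host-side sketch of the semantics described above. It is only a model, not code from this patch: the name ballotModel and the explicit exec parameter are hypothetical, and a 64-lane wavefront is assumed.

```cpp
#include <cstdint>

// Hypothetical model of ballot on a 64-lane wavefront (not LLVM code).
// Bit i of `pred` is the boolean argument as evaluated in lane i.
// Bit i of `exec` is set if lane i is active.
constexpr std::uint64_t ballotModel(std::uint64_t pred, std::uint64_t exec) {
  // Active lanes contribute their predicate bit; inactive lanes contribute 0.
  return pred & exec;
}

// Lanes 1 and 2 are active; the predicate is true in lanes 1 and 3.
// Only lane 1 is both active and true.
static_assert(ballotModel(0xA, 0x6) == 0x2, "only lane 1 is set");
```

Under this model, a ballot of a constant true argument yields the set of active lanes.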
I'm not going to commit this as-is because tests are failing due to
poor code generation, e.g. test2 in ballot.ll generates:
  v_cmp_eq_u32_e32 vcc, v0, v1
  v_cndmask_b32_e64 v0, 0, 1, vcc
  v_cmp_ne_u32_e64 s[4:5], 0, v0
  v_mov_b32_e32 v0, s4
  v_mov_b32_e32 v1, s5
instead of:
  v_cmp_eq_u32_e64 s[4:5], v0, v1
  v_mov_b32_e32 v0, s4
  v_mov_b32_e32 v1, s5
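To spell out why the shorter sequence is equivalent, here is a small host-side sketch that models both sequences lane by lane. It is an illustration only: the function names are hypothetical, all 64 lanes are assumed to be active, and the instruction names in the comments only indicate which step is being modeled.

```cpp
#include <cstdint>

constexpr int kLanes = 64; // assumed wavefront size

// Models the longer sequence: compare, materialize 0/1, compare against 0.
inline std::uint64_t maskViaSelect(const std::uint32_t *a,
                                   const std::uint32_t *b) {
  std::uint64_t mask = 0;
  for (int lane = 0; lane < kLanes; ++lane) {
    bool vcc = a[lane] == b[lane];   // v_cmp_eq_u32 vcc, v0, v1
    std::uint32_t v = vcc ? 1u : 0u; // v_cndmask_b32 v0, 0, 1, vcc
    if (v != 0)                      // v_cmp_ne_u32 s[4:5], 0, v0
      mask |= std::uint64_t{1} << lane;
  }
  return mask;
}

// Models the shorter sequence: the compare writes the lane mask directly.
inline std::uint64_t maskDirect(const std::uint32_t *a,
                                const std::uint32_t *b) {
  std::uint64_t mask = 0;
  for (int lane = 0; lane < kLanes; ++lane)
    if (a[lane] == b[lane])          // v_cmp_eq_u32 s[4:5], v0, v1
      mask |= std::uint64_t{1} << lane;
  return mask;
}
```

Because the materialized value is 1 exactly when the first compare is true, comparing it against 0 reproduces the original condition, so both functions return the same mask for any inputs.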
I'd appreciate feedback on (a) the idea, (b) the implementation and
(c) how best to improve the code generation.
Can you add a test for this in test/Analysis/DivergenceAnalysis/AMDGPU?