This takes a uniform 64-bit bitmask and returns a boolean that is true
in each thread whose corresponding bit in the mask is 1. radv (and
presumably AMDVLK) currently implements subgroupInverseBallot() by
shifting the mask by the thread-id, but this is a complicated way of
doing what's basically just a no-op thanks to how booleans are stored in
SGPRs (one bit per thread).
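
As a rough illustration of why the shift is redundant, here is a small
host-side C model, not driver code. It assumes wave64 and that a boolean
is kept as a 64-bit lane mask with one bit per thread; the function names
are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAVE_SIZE 64 /* assumption: wave64 */

/* Per-thread view: the result of shifting the mask by the thread id. */
static bool inverse_ballot_lane(uint64_t mask, unsigned lane)
{
    return (mask >> lane) & 1u;
}

/* Pack the per-thread booleans back into a lane mask, one bit per
 * thread, which is the assumed SGPR representation of a boolean. */
static uint64_t pack_lane_mask(uint64_t mask)
{
    uint64_t packed = 0;
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        if (inverse_ballot_lane(mask, lane))
            packed |= UINT64_C(1) << lane;
    }
    return packed;
}

/* The packed result is the input mask itself, so the whole operation
 * amounts to a copy. */
static void check_inverse_ballot_is_noop(uint64_t mask)
{
    assert(pack_lane_mask(mask) == mask);
}
```

In this model the shift-and-test result, once packed across all lanes,
is bit-for-bit the original mask.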
Although the user guarantees that the value is uniform, it may still
wind up in VGPRs thanks to deficiencies in the backend, in which case
we have to emit two readfirstlanes, one for each 32-bit half.
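
The VGPR fallback can be sketched the same way. This is a simplified
host-side C model that assumes a 32-bit readfirstlane returning the
value from the lowest active lane; the helper names and the array
representation of a VGPR are hypothetical.

```c
#include <stdint.h>

#define WAVE_SIZE 64 /* assumption: wave64 */

/* Model of a 32-bit readfirstlane: return the value held by the lowest
 * lane that is active in exec. */
static uint32_t readfirstlane32(const uint32_t vgpr[WAVE_SIZE], uint64_t exec)
{
    for (unsigned lane = 0; lane < WAVE_SIZE; lane++) {
        if ((exec >> lane) & 1u)
            return vgpr[lane];
    }
    return vgpr[0]; /* fallback chosen for this model when exec is empty */
}

/* A 64-bit mask held in a VGPR pair needs one read per 32-bit half
 * before it can be used as a scalar lane mask. */
static uint64_t mask_to_scalar(const uint32_t lo[WAVE_SIZE],
                               const uint32_t hi[WAVE_SIZE],
                               uint64_t exec)
{
    uint32_t s_lo = readfirstlane32(lo, exec);
    uint32_t s_hi = readfirstlane32(hi, exec);
    return ((uint64_t)s_hi << 32) | s_lo;
}
```

Because the value is uniform, every active lane holds the same pair of
halves, so it does not matter which active lane is read.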
extra whitespace change