The 32-bit floating-point atomic add instructions on AMDGPUs do not support the
"flat" (generic) address space. So, if the address space cannot be determined
statically, the AMDGPU backend falls back to a CAS loop (which does support
flat addressing). Instead, this patch emits runtime address-space checks so that
native FP atomic add instructions can be used for global and LDS memory (and
non-atomic FP add instructions for private/scratch memory).
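A minimal control-flow model of that runtime dispatch may help. It uses no LLVM APIs: AddrSpace, FlatPtr, isShared, isPrivate and flatAtomicFAdd are hypothetical names, the pointer's address space is simply stored next to it (on hardware it would be queried at run time), and the update is single-threaded. Only the branch structure is being illustrated: an LDS path, a scratch path, a global path, and the old value returned as atomicrmw would return it.

```cpp
#include <cassert>

// Hypothetical: the address space a flat (generic) pointer resolves to at run time.
enum class AddrSpace { Global, Local, Private };

// Which branch the dispatch selected, recorded for illustration only.
enum class Path { LdsAtomic, ScratchAdd, GlobalAtomic };

struct FlatPtr {
    float *addr;
    AddrSpace space;  // assumption: stored here; on a GPU it is queried at run time
};

// Hypothetical predicates standing in for the emitted runtime checks.
static bool isShared(const FlatPtr &p) { return p.space == AddrSpace::Local; }
static bool isPrivate(const FlatPtr &p) { return p.space == AddrSpace::Private; }

// Models the expanded control flow. The arithmetic below is a plain,
// single-threaded read-modify-write; the comments say what each branch
// would use in the real lowering.
float flatAtomicFAdd(FlatPtr p, float v, Path &taken) {
    float old = *p.addr;
    if (isShared(p)) {
        taken = Path::LdsAtomic;     // would use the native LDS FP atomic add
    } else if (isPrivate(p)) {
        taken = Path::ScratchAdd;    // scratch is per-thread: plain load/add/store
    } else {
        taken = Path::GlobalAtomic;  // would use the native global FP atomic add
    }
    *p.addr = old + v;
    return old;  // value before the update, as atomicrmw returns
}
```

The scratch branch can skip atomicity because private memory is visible only to the thread that owns it.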
To do that, this patch introduces a new interface function,
emitExpandAtomicRMW. It is meant to be called when the common atomic expansion
does not work for a specific target, as in the case described above.
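The hook pattern can be sketched as a toy model: a generic expansion pass handles the common cases itself and hands the instruction back to the target when the target asks for custom handling. Only the name emitExpandAtomicRMW comes from the patch; ExpansionKind, shouldExpand, AtomicRMW, TargetHooks, ToyGPU and expandAtomic are hypothetical, and the signatures are not LLVM's.

```cpp
#include <cassert>
#include <string>

// Hypothetical set of answers a target can give the generic pass.
enum class ExpansionKind { None, CmpXChg, Custom };

// Toy stand-in for an atomic RMW instruction.
struct AtomicRMW {
    std::string op;         // e.g. "fadd"
    bool flatPointer;       // address space unknown at compile time
    std::string loweredAs;  // filled in by the pass, for illustration
};

struct TargetHooks {
    virtual ~TargetHooks() = default;
    virtual ExpansionKind shouldExpand(const AtomicRMW &) const {
        return ExpansionKind::None;
    }
    // Modeled on emitExpandAtomicRMW: only reached if a target asked for
    // Custom, so a target that does must override it.
    virtual void emitExpandAtomicRMW(AtomicRMW &) const {
        assert(false && "target requested custom expansion but did not override");
    }
};

// Hypothetical target that wants custom handling for flat FP adds only.
struct ToyGPU : TargetHooks {
    ExpansionKind shouldExpand(const AtomicRMW &ai) const override {
        return (ai.op == "fadd" && ai.flatPointer) ? ExpansionKind::Custom
                                                   : ExpansionKind::None;
    }
    void emitExpandAtomicRMW(AtomicRMW &ai) const override {
        ai.loweredAs = "runtime address-space checks";
    }
};

// The generic pass: common expansions here, everything else delegated.
void expandAtomic(const TargetHooks &t, AtomicRMW &ai) {
    switch (t.shouldExpand(ai)) {
    case ExpansionKind::None:    ai.loweredAs = "native"; break;
    case ExpansionKind::CmpXChg: ai.loweredAs = "CAS loop"; break;
    case ExpansionKind::Custom:  t.emitExpandAtomicRMW(ai); break;
    }
}
```

Keeping the hook virtual with a guarded default means targets that never request custom expansion need no changes.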
If this atomic has system scope, it has to be expanded into a CAS loop. This code breaks that logic.
The check below was placed after the address-space check so that the cheaper check ran first, since the outcome was the same either way. That is no longer true.
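The ordering problem raised in the review can be reduced to a small decision function. This is a sketch under the review's stated rule that a system-scope atomic must always become a CAS loop; Scope, AS, Lowering and both functions are hypothetical and ignore every other legality condition the real backend checks. Before the patch, flat pointers and system scope both led to a CAS loop, so testing the address space first was a harmless shortcut; once flat pointers get runtime checks instead, that shortcut changes the result.

```cpp
#include <cassert>

enum class Scope { Workgroup, Agent, System };
enum class AS { Global, Local, Flat };
enum class Lowering { Native, RuntimeChecks, CmpXChgLoop };

// Ordering the review objects to: the address-space test runs first and
// now routes system-scope flat atomics to the runtime-check expansion.
Lowering chooseFAddLoweringBuggy(AS as, Scope scope) {
    if (as == AS::Flat)
        return Lowering::RuntimeChecks;
    if (scope == Scope::System)
        return Lowering::CmpXChgLoop;
    return Lowering::Native;
}

// Corrected ordering: the mandatory system-scope CAS expansion is decided
// before any address-space-based choice.
Lowering chooseFAddLowering(AS as, Scope scope) {
    if (scope == Scope::System)
        return Lowering::CmpXChgLoop;
    if (as == AS::Flat)
        return Lowering::RuntimeChecks;
    return Lowering::Native;
}
```

The two functions agree on every input except a flat pointer at system scope, which is exactly the case the review flags.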