These functions use __builtin_amdgcn_atomic_inc32():
uint32_t atomicInc(uint32_t *address);
uint32_t atomicInc(uint32_t *address, uint32_t max);
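For reviewers unfamiliar with the wrapping behaviour, here is a minimal host-side sketch of what the two-argument form is expected to do. It assumes the builtin follows the CUDA-style atomicInc contract (store 0 when the old value is >= max, otherwise old + 1, and return the old value). The name atomicIncModel is hypothetical and the code uses std::atomic rather than the device builtin, so it only models the semantics and is not the patch's implementation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical host-side model of a wrapping atomic increment.
// Assumption: the device builtin stores (old >= max ? 0 : old + 1)
// and returns the previous value, as CUDA's atomicInc does.
inline uint32_t atomicIncModel(std::atomic<uint32_t> &value, uint32_t max) {
  uint32_t old = value.load(std::memory_order_relaxed);
  uint32_t next;
  do {
    // Wrap to zero once the counter has reached max.
    next = (old >= max) ? 0u : old + 1u;
    // On failure, compare_exchange_weak reloads `old`, so the loop retries
    // with the freshly observed value.
  } while (!value.compare_exchange_weak(old, next,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed));
  return old;
}
```

With max = 2, successive calls on a counter starting at 0 return 0, 1, 2 and leave the counter back at 0 after the third call.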
These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()
They will replace the current mechanism of directly calling IR functions.
Please delete the unused one. Also, it should be marked INLINE instead of DEVICE to avoid duplicate symbol errors. I'm surprised this linked.