These functions use __builtin_amdgcn_atomic_inc32():
uint32_t atomicInc(uint32_t *address);
uint32_t atomicInc(uint32_t *address, uint32_t max);
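For reference, here is a minimal host-side sketch of the wrap-around behaviour the two-argument form is assumed to have (CUDA-style: return the old value, store 0 once the old value reaches max, otherwise old + 1). The name atomic_inc_model is hypothetical and this is only a model for reasoning about the semantics, not the device implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model, not the device code.
// Returns the old value and stores (old >= max) ? 0 : old + 1.
inline uint32_t atomic_inc_model(std::atomic<uint32_t> &cell, uint32_t max) {
  uint32_t old = cell.load();
  // On failure compare_exchange_weak reloads `old`, so the desired value is
  // recomputed from the fresh contents on the next iteration.
  while (!cell.compare_exchange_weak(old, old >= max ? 0u : old + 1u)) {
  }
  return old;
}

// Small usage example: a counter that cycles through 0, 1, 2.
inline void demo_wraparound() {
  std::atomic<uint32_t> counter{0};
  assert(atomic_inc_model(counter, 2) == 0); // 0 -> 1
  assert(atomic_inc_model(counter, 2) == 1); // 1 -> 2
  assert(atomic_inc_model(counter, 2) == 2); // 2 -> 0 (wraps)
  assert(counter.load() == 0);
}
```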
These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()
They will replace the current mechanism of calling IR functions directly.
Since these are one-line wrappers around intrinsics, it's probably better to implement them as INLINE-annotated functions in target_impl.h. That means less noise in the filesystem, and the compiler can do some optimisation at the single-TU level before opt is called.
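A rough sketch of what such header-only wrappers might look like follows. The INLINE definition, the impl_* names, the seq_cst ordering and the scope strings ("agent", "workgroup", "" for system) are all assumptions made for illustration, not what the patch necessarily uses. The non-AMDGCN branch is only a host fallback so the sketch compiles off-device.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed definition; the runtime's real INLINE macro may differ.
#define INLINE inline __attribute__((always_inline))

#if defined(__AMDGCN__)
// Device path: one-line wrappers over the builtins (scopes are assumptions).
INLINE uint32_t impl_atomic_inc(uint32_t *address, uint32_t max) {
  return __builtin_amdgcn_atomic_inc32(address, max, __ATOMIC_SEQ_CST, "");
}
INLINE void impl_threadfence() {
  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "agent");
}
INLINE void impl_threadfence_block() {
  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
}
INLINE void impl_threadfence_system() {
  __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "");
}
#else
// Host fallback (illustration only): same interface, portable primitives.
INLINE uint32_t impl_atomic_inc(uint32_t *address, uint32_t max) {
  uint32_t old = __atomic_load_n(address, __ATOMIC_SEQ_CST);
  // `old` is refreshed on failure, so the desired value is recomputed.
  while (!__atomic_compare_exchange_n(address, &old,
                                      old >= max ? 0u : old + 1u,
                                      /*weak=*/true, __ATOMIC_SEQ_CST,
                                      __ATOMIC_SEQ_CST)) {
  }
  return old;
}
INLINE void impl_threadfence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
INLINE void impl_threadfence_block() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
INLINE void impl_threadfence_system() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
#endif

// Small usage example: bump a counter, then publish it with a fence.
INLINE void demo_wrappers() {
  uint32_t counter = 0;
  uint32_t previous = impl_atomic_inc(&counter, 7);
  impl_threadfence();
  assert(previous == 0 && counter == 1);
}
```

Because each wrapper is a single call marked always-inline, the front end can fold it into its callers within the translation unit, which is the single-TU benefit mentioned above.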