Implemented __nvvm_atom_*_gen_* builtins.
Integer variants are implemented as atomicrmw or cmpxchg instructions.
Atomic add for floating point (__nvvm_atom_add_gen_f()) is implemented as a call to an overloaded @llvm.nvvm.atomic.load.add.f32.xxx LLVM intrinsic.
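
For readers unfamiliar with the two instruction kinds named above, here is a rough host-side analogy in standard C++. It is not the device code from this change: it assumes std::atomic<int> as a stand-in, and the helper names atomic_add, atomic_cas and example are hypothetical. A read-modify-write such as fetch_add is the kind of operation LLVM IR expresses as atomicrmw, while compare-and-swap is expressed as cmpxchg.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a read-modify-write add, the kind of operation
// that LLVM IR expresses as a single `atomicrmw add`.
// Returns the value held before the add.
int atomic_add(std::atomic<int> &target, int value) {
    return target.fetch_add(value);
}

// Hypothetical helper: compare-and-swap, the kind of operation that
// LLVM IR expresses as `cmpxchg`. Returns the value observed in `target`
// before the operation, whether or not the swap happened.
int atomic_cas(std::atomic<int> &target, int expected, int desired) {
    // On failure, compare_exchange_strong writes the observed value into
    // `expected`; on success, `expected` already equals the old value.
    target.compare_exchange_strong(expected, desired);
    return expected;
}

// Small self-check showing the semantics of both helpers.
void example() {
    std::atomic<int> counter{5};
    assert(atomic_add(counter, 3) == 5);      // old value returned
    assert(counter.load() == 8);              // 5 + 3
    assert(atomic_cas(counter, 8, 42) == 8);  // match: swap succeeds
    assert(counter.load() == 42);
    assert(atomic_cas(counter, 0, 7) == 42);  // mismatch: no swap
    assert(counter.load() == 42);
}
```

The sketch only illustrates the integer semantics; the floating-point add described above goes through an intrinsic call instead and is not modeled here.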