These intrinsics should work with at least the standard integer and floating-point
sizes, pointers, and vectors of those.
This fixes selection for non-s32 types when readfirstlane is inserted
for SGPR return values.
Move the atomic optimizer pass within the pass pipeline so that it can be
simplified and rely on the more general lane intrinsic support.
API users should move to these new intrinsics so that we can remove the