This pass transforms
  gep (addrspacecast p)
into
  addrspacecast (gep p)
Reordering the two instructions this way (which is valid in NVPTX but
not on all targets) lets LLVM optimize better, because LLVM treats
addrspacecast as a black-box function.
This is particularly important for the idiom clang will use for lowering
the __ldg builtin.
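
As a rough illustration of the rewrite described above, here is a minimal
before/after sketch in LLVM IR. The function names, the choice of
addrspace(3) as the source address space, and the trailing load are
assumptions made for the example, not taken from the pass itself.

```llvm
; Before: gep (addrspacecast p)
; The pointer is cast to the generic address space first, so the gep and
; the load only ever see a generic pointer.
define float @before(ptr addrspace(3) %p, i64 %i) {
  %generic = addrspacecast ptr addrspace(3) %p to ptr
  %addr = getelementptr float, ptr %generic, i64 %i
  %v = load float, ptr %addr
  ret float %v
}

; After: addrspacecast (gep p)
; The gep is computed on the original addrspace(3) pointer and only its
; result is cast to the generic address space.
define float @after(ptr addrspace(3) %p, i64 %i) {
  %addr = getelementptr float, ptr addrspace(3) %p, i64 %i
  %generic = addrspacecast ptr addrspace(3) %addr to ptr
  %v = load float, ptr %generic
  ret float %v
}
```

In the second form the address arithmetic is an ordinary gep on %p, which
LLVM can reason about, and the opaque addrspacecast is applied only to
the result of that arithmetic.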