The runtime functions that memset and memcpy are lowered to are declared with pointers in the default address space (0), while the ops themselves accept memrefs in any address space.
Such cases currently cause no issues in MLIR's LLVM dialect because the bitcast verifier is too lenient, but actual LLVM IR does not allow a bitcast to change the address space: https://godbolt.org/z/3a1z97rc9
This patch fixes the issue by inserting an address space cast before the bitcast, so that the pointer is first moved into the correct address space and only then bitcast to the expected pointer type.
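As a rough illustration of the intended lowering, here is a minimal LLVM IR sketch. It assumes typed-pointer syntax, a source memref in address space 1, and a hypothetical runtime function `@runtime_memset`; the actual runtime function names and signatures are not shown here and may differ.

```llvm
; Hypothetical runtime function taking a pointer in the default address space (0).
declare void @runtime_memset(i8*, i32, i64)

define void @example(float addrspace(1)* %dst, i32 %value, i64 %count) {
  ; Rejected by LLVM IR: a bitcast cannot change the address space.
  ;   %bad = bitcast float addrspace(1)* %dst to i8*

  ; Step 1: cast the pointer into address space 0, keeping the element type.
  %as0 = addrspacecast float addrspace(1)* %dst to float*
  ; Step 2: bitcast within address space 0 to the type the callee expects.
  %raw = bitcast float* %as0 to i8*
  call void @runtime_memset(i8* %raw, i32 %value, i64 %count)
  ret void
}
```

When the source pointer is already in address space 0, only the bitcast is needed.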
While you're here, would it be possible to update this to use the "global" address space from the GPU module instead of the hard-coded integer?
(On the other hand, that would make gpu-to-llvm platform dependent, something we've wanted to avoid...)