This is an archive of the discontinued LLVM Phabricator instance.

[mlir][GPU] add required address space cast when lowering to LLVM
ClosedPublic

Authored by zero9178 on Feb 12 2023, 3:02 PM.

Details

Summary

The runtime functions memset and memcpy are lowered are declared with pointers to the default address space (0) while their ops however are compatible with memrefs taking any address space.
Such cases do not cause any issues with MLIRs LLVM Dialect due to bitcasts verifier being too lenient at the moment, but actual LLVM IR does not allow casting between address spaces using bitcast: https://godbolt.org/z/3a1z97rc9

This patch fixes the issue by inserting an address space cast before the bitcast, to first cast the pointer into the correct address space before doing the bitcast.

Diff Detail

Event Timeline

zero9178 created this revision.Feb 12 2023, 3:02 PM
zero9178 requested review of this revision.Feb 12 2023, 3:02 PM
krzysz00 accepted this revision.Feb 13 2023, 7:41 AM

I see so reason not to make this change.

mlir/test/Conversion/GPUCommon/lower-memset-to-gpu-runtime-calls.mlir
14

While you're here, would it be possible to update this to use the "global" address space from the GPU module instead of the hard-coded integer?

(On the other hand, that would make gpu-to-llvm platform dependent, something we've wanted to avoid...)

This revision is now accepted and ready to land.Feb 13 2023, 7:41 AM
zero9178 added inline comments.Feb 13 2023, 8:20 AM
mlir/test/Conversion/GPUCommon/lower-memset-to-gpu-runtime-calls.mlir
14

I am not 100% familiar with the infrastructure, but wouldn't this require an additional lowering pass to either rocdl or nvvm so that the named address space would be correctly mapped for the given GPU target? I'd rather keep the test as --gpu-to-llvm if possible. It doesn't change anything significant about the test either I think.

Yeah, I agree with keeping the test as is. We probably should, going forward, tweak gpu-to-llvm to allow for passing in a target-specific map like this, or extract it out of DTLI, or _something_, but that's not this revision.