- Enable it before infer-address-space pass.
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 40359 Build 40466: arc lint + arc unit
Event Timeline
Clang should just be directly emitting the arguments with the global address space in the first place. It already has support for coercing argument types per calling convention and this is no different
In CUDA/HIP language all pointer type kernel args are in default address space, so it is reasonable to emit them as pointers in default address space in IR. Translating them to global address space in clang codegen is not necessary and is better done in backend, as what is done in NVPTX
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp#L30
Add @tra @rjmccall for comments about where should this be implemented.
The language address space doesn't need to match the calling convention argument type. You can coerce them to global for codegen purposes
@arsenm has a point. We can do it in clang, and it seems to be a better long-term solution compared to patching-up the inputs' AS that we've done for NVPTX and, now, AMDGPU.
On the other hand I wonder whether the benefit is worth the effort. It moves responsibility of coercing pointers' AS from LLVM to the end-users with not much to show for it. I think the only real issue I see with the status quo and this patch is that this is the second instance of a trivial pass that does this kind of job. Perhaps we can make it into a generic IR pass which would be able to coerce the pointers to the right address space and use it for both AMDGPU and NVPTX.
If we do want to change the IR-level calling convention for the kernels, clang would not be the only place that would need to adapt to that change. We will need to think about transitioning existing external users, too (e.g. XLA in TensorFlow & JAX, julia). We may need to keep the promote-pointers-to-global-AS pass around for a while until the LLVM users have a chance to change their code to pass pointers using correct address space.
Another issue I'm concerned about is the interaction with the (still out of tree), hacky pass to handle kernels calling other kernels as a normal function for OpenCL. This is another area that clang needs codegen work to really emit the right IR/ABI for kernels. If we blindly rewrite arguments with this hack in place, we risk incorrectly rewriting the callable function form
I think OpenCL still disallows generic pointer arguments to kernels, although I wouldn't like to rely on this fact for correctness
AMDGPU backend is able to handle pointer type kernel arg in default address space. This pass is more for performance.
Yes, but it’s ultimately a workaround for not producing the correct pointer type in the first place. It adds more passes and intermediate instructions that could be avoided