This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add amdgpu-promote-pointer-kernargs pass
AbandonedPublic

Authored by hliao on Oct 31 2019, 1:25 PM.

Details

Summary
  • Enable it before infer-address-space pass.

Event Timeline

hliao created this revision.Oct 31 2019, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2019, 1:25 PM
arsenm requested changes to this revision.Oct 31 2019, 1:28 PM

Clang should just be directly emitting the arguments with the global address space in the first place. It already has support for coercing argument types per calling convention and this is no different

This revision now requires changes to proceed.Oct 31 2019, 1:28 PM

Clang should just be directly emitting the arguments with the global address space in the first place. It already has support for coercing argument types per calling convention and this is no different

In CUDA/HIP language all pointer type kernel args are in default address space, so it is reasonable to emit them as pointers in default address space in IR. Translating them to global address space in clang codegen is not necessary and is better done in backend, as what is done in NVPTX

https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp#L30

Add @tra @rjmccall for comments about where should this be implemented.

Clang should just be directly emitting the arguments with the global address space in the first place. It already has support for coercing argument types per calling convention and this is no different

In CUDA/HIP language all pointer type kernel args are in default address space, so it is reasonable to emit them as pointers in default address space in IR. Translating them to global address space in clang codegen is not necessary and is better done in backend, as what is done in NVPTX

https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/NVPTX/NVPTXLowerArgs.cpp#L30

Add @tra @rjmccall for comments about where should this be implemented.

The language address space doesn't need to match the calling convention argument type. You can coerce them to global for codegen purposes

tra added a comment.Oct 31 2019, 3:06 PM

@arsenm has a point. We can do it in clang, and it seems to be a better long-term solution compared to patching-up the inputs' AS that we've done for NVPTX and, now, AMDGPU.

On the other hand I wonder whether the benefit is worth the effort. It moves responsibility of coercing pointers' AS from LLVM to the end-users with not much to show for it. I think the only real issue I see with the status quo and this patch is that this is the second instance of a trivial pass that does this kind of job. Perhaps we can make it into a generic IR pass which would be able to coerce the pointers to the right address space and use it for both AMDGPU and NVPTX.

If we do want to change the IR-level calling convention for the kernels, clang would not be the only place that would need to adapt to that change. We will need to think about transitioning existing external users, too (e.g. XLA in TensorFlow & JAX, julia). We may need to keep the promote-pointers-to-global-AS pass around for a while until the LLVM users have a chance to change their code to pass pointers using correct address space.

Another issue I'm concerned about is the interaction with the (still out of tree), hacky pass to handle kernels calling other kernels as a normal function for OpenCL. This is another area that clang needs codegen work to really emit the right IR/ABI for kernels. If we blindly rewrite arguments with this hack in place, we risk incorrectly rewriting the callable function form

Another issue I'm concerned about is the interaction with the (still out of tree), hacky pass to handle kernels calling other kernels as a normal function for OpenCL. This is another area that clang needs codegen work to really emit the right IR/ABI for kernels. If we blindly rewrite arguments with this hack in place, we risk incorrectly rewriting the callable function form

I think OpenCL still disallows generic pointer arguments to kernels, although I wouldn't like to rely on this fact for correctness

In D69679#1729466, @tra wrote:

@arsenm has a point. We can do it in clang, and it seems to be a better long-term solution compared to patching-up the inputs' AS that we've done for NVPTX and, now, AMDGPU.

If we do want to change the IR-level calling convention for the kernels, clang would not be the only place that would need to adapt to that change. We will need to think about transitioning existing external users, too (e.g. XLA in TensorFlow & JAX, julia). We may need to keep the promote-pointers-to-global-AS pass around for a while until the LLVM users have a chance to change their code to pass pointers using correct address space.

AMDGPU backend is able to handle pointer type kernel arg in default address space. This pass is more for performance.

arsenm added a comment.Nov 1 2019, 1:18 PM
In D69679#1729466, @tra wrote:

@arsenm has a point. We can do it in clang, and it seems to be a better long-term solution compared to patching-up the inputs' AS that we've done for NVPTX and, now, AMDGPU.

If we do want to change the IR-level calling convention for the kernels, clang would not be the only place that would need to adapt to that change. We will need to think about transitioning existing external users, too (e.g. XLA in TensorFlow & JAX, julia). We may need to keep the promote-pointers-to-global-AS pass around for a while until the LLVM users have a chance to change their code to pass pointers using correct address space.

AMDGPU backend is able to handle pointer type kernel arg in default address space. This pass is more for performance.

Yes, but it’s ultimately a workaround for not producing the correct pointer type in the first place. It adds more passes and intermediate instructions that could be avoided

hliao abandoned this revision.Nov 5 2019, 10:10 AM

A different approach (D69826) is taken to address the same issue.