This is an archive of the discontinued LLVM Phabricator instance.

[cuda][hip] Add CUDA builtin surface/texture reference support.
AbandonedPublic

Authored by hliao on Mar 27 2020, 12:59 PM.

Details

Summary
  • Re-commit after fix Sema checks on partial template specialization.

Diff Detail

Event Timeline

hliao created this revision.Mar 27 2020, 12:59 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: cfe-commits. · View Herald Transcript
hliao added a comment.Mar 27 2020, 1:05 PM

This's revised change from https://reviews.llvm.org/D76365 after fixing Sema checks on the template partial specialization. With this change, I could compile the following sample code using surface reference.

kernel.cu

#include <cuda.h>

surface<void, cudaSurfaceType2D> surf;

#if defined(__clang__)
__device__ int
suld_2d_trap(surface<void, cudaSurfaceType2D>, int, int) asm("llvm.nvvm.suld.2d.i32.trap");

template <typename T>
static inline __device__ T
surf2Dread(surface<void, cudaSurfaceType2D> s, int x, int y) {
  // By default, `surf2Dread` uses trap mode.
  return suld_2d_trap(s, x, y);
}
#endif

__device__ int foo(int x, int y) { return surf2Dread<int>(surf, x, y); }

With NVCC, it generates

kernel.ptx after nvcc --ptx -rdc=true kernel.cu

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-27506705
// Cuda compilation tools, release 10.2, V10.2.89
// Based on LLVM 3.4svn
//

.version 6.5
.target sm_30
.address_size 64

        // .globl       _Z3fooii
.visible .global .surfref surf;

.visible .func  (.param .b32 func_retval0) _Z3fooii(
        .param .b32 _Z3fooii_param_0,
        .param .b32 _Z3fooii_param_1
)
{
        .reg .b32       %r<4>;
        .reg .b64       %rd<2>;


        ld.param.u32    %r1, [_Z3fooii_param_0];
        ld.param.u32    %r2, [_Z3fooii_param_1];
        suld.b.2d.b32.trap {%r3}, [surf, {%r1, %r2}];
        st.param.b32    [func_retval0+0], %r3;
        ret;
}

With Clang, it generates

kernel-cuda-nvptx64-nvidia-cuda-sm_30.s after clang --cuda-device-only --cuda-gpu-arch=sm_30 -O2 -S kernel.cu

//
// Generated by LLVM NVPTX Back-End
//

.version 6.4
.target sm_30
.address_size 64

        // .globl       _Z3fooii
.visible .global .surfref surf;

.visible .func  (.param .b32 func_retval0) _Z3fooii(
        .param .b32 _Z3fooii_param_0,
        .param .b32 _Z3fooii_param_1
)
{
        .reg .b32       %r<4>;
        .reg .b64       %rd<2>;

        ld.param.u32    %r1, [_Z3fooii_param_0];
        ld.param.u32    %r2, [_Z3fooii_param_1];
        mov.u64         %rd1, surf;
        suld.b.2d.b32.trap {%r3}, [%rd1, {%r1, %r2}];
        st.param.b32    [func_retval0+0], %r3;
        ret;

}
tra added a comment.Mar 27 2020, 1:36 PM

Would it be possible to update the old review with the new diff? It would make it easier to see the incremental changes you've made. If the old review can be reopened that would be great as it would keep all relevant info in one place, but I'm fine doing the review here, too, if phabricator does not let you do it.

clang/test/SemaCUDA/bad-attributes.cu
74–75

Please add few test cases replicating use of these attributes in CUDA headers.

hliao added a comment.Mar 27 2020, 1:39 PM
In D76948#1946861, @tra wrote:

Would it be possible to update the old review with the new diff? It would make it easier to see the incremental changes you've made. If the old review can be reopened that would be great as it would keep all relevant info in one place, but I'm fine doing the review here, too, if phabricator does not let you do it.

I tried that before submitting this one. But, as it's in the closed state, I cannot submit that anymore. I will attach the difference against the previous change somewhere.

hliao added a comment.Mar 27 2020, 1:43 PM
In D76948#1946861, @tra wrote:

Would it be possible to update the old review with the new diff? It would make it easier to see the incremental changes you've made. If the old review can be reopened that would be great as it would keep all relevant info in one place, but I'm fine doing the review here, too, if phabricator does not let you do it.

Check this for the new change.

https://gist.github.com/darkbuck/836dbb3112ca2e5fab769cf3cdaecd09

hliao marked an inline comment as done.Mar 27 2020, 1:45 PM
hliao added inline comments.
clang/test/SemaCUDA/bad-attributes.cu
74–75

the replication from CUDA headers is added on those codegen tests. These tests are illegal ones which sema checks should identify.

tra added a comment.Mar 27 2020, 1:51 PM

I tried that before submitting this one. But, as it's in the closed state, I cannot submit that anymore. I will attach the difference against the previous change somewhere.

I've reopened it. Let's move the patch and discussion there.

hliao abandoned this revision.Mar 27 2020, 1:55 PM