This is an archive of the discontinued LLVM Phabricator instance.

[cuda][hip] Add CUDA builtin surface/texture reference support.
ClosedPublic

Authored by hliao on Mar 18 2020, 7:54 AM.

Details

Summary
  • Even though the bindless surface/texture interfaces are promoted, there are still code using surface/texture references. For example, PR#26400 reports the compilation issue for code using tex2D with texture references. For better compatibility, this patch proposes the support of surface/texture references.
  • Due to the absent documentation and magic headers, it's believed that nvcc does use builtins for texture support. From the limited NVVM documentation[^nvvm] and NVPTX backend texture/surface related tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with device_builtin_surface_type/device_builtin_texture_type, with the corresponding handle-like object types, cudaSurfaceObject_t or cudaTextureObject_t, in the device-side compilation. On the host side, that global handle variables are registered and will be established and updated later when corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are most like device global variables but represented in different types on the host and device sides.
  • In this patch, the following changes are proposed to support that behavior: + Refine device_builtin_surface_type and device_builtin_texture_type attributes to be applied on Type decl only to check whether a variable is of the surface/texture reference type. + Add hooks in code generation to replace that reference types with the correponding object types as well as all accesses to them. In particular, nvvm.texsurf.handle.internal should be used to load object handles from global reference variables[^texsurf] as well as metadata annotations. + Generate host-side registration with proper template argument parsing.

[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 `Texture reference API in CUDA C Programming Guide.
[^texsurf]: According to NVVM IR, nvvm.texsurf.handle should be used. But, the current backend doesn't have that supported. We may revise that later.

Diff Detail

Event Timeline

hliao created this revision.Mar 18 2020, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2020, 7:54 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
hliao updated this revision to Diff 251101.Mar 18 2020, 8:47 AM

Reformatting with clang-format.

hliao updated this revision to Diff 251125.Mar 18 2020, 10:07 AM

Fix warnings from clang-tidy.

hliao updated this revision to Diff 251241.Mar 18 2020, 7:04 PM

Revise one part of the logic to reduce condition evaluation overhead.

hliao updated this revision to Diff 251488.Mar 19 2020, 3:17 PM

More refinement to compile sample code with CUDA headers.

hliao added a comment.Mar 19 2020, 3:24 PM

With this revision, the following sample could be compiled with CUDA SDK and almost the same PTX code is generated.

#include <cuda.h>

texture<float, cudaTextureType2D, cudaReadModeElementType> tex;

#if defined(__clang__)
struct v4f {
  float x, y, z, w;
};
__device__ v4f
tex_2d_ld(texture<float, cudaTextureType2D, cudaReadModeElementType>,
          float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

template <typename T>
static inline __device__ T
tex2D(texture<T, cudaTextureType2D, cudaReadModeElementType> t,
      float x, float y) {
  return tex_2d_ld(t, x, y).x;
}
#endif

__device__ float foo(float x, float y) { return tex2D(tex, x, y); }

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(__clang__).

Here's the PTX code generated from NVCC. ``

kernel.ptx
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-27506705
// Cuda compilation tools, release 10.2, V10.2.89
// Based on LLVM 3.4svn
//

.version 6.5
.target sm_30
.address_size 64

        // .globl       _Z3fooff
.visible .global .texref tex;

.visible .func  (.param .b32 func_retval0) _Z3fooff(
        .param .b32 _Z3fooff_param_0,
        .param .b32 _Z3fooff_param_1
)
{
        .reg .f32       %f<7>;
        .reg .b64       %rd<2>;


        ld.param.f32    %f1, [_Z3fooff_param_0];
        ld.param.f32    %f2, [_Z3fooff_param_1];
        tex.2d.v4.f32.f32       {%f3, %f4, %f5, %f6}, [tex, {%f1, %f2}];
        st.param.f32    [func_retval0+0], %f3;
        ret;
}

Here's the PTX code generated from Clang and LLVM backend. clang --cuda-device-only --cuda-gpu-arch=sm_30 -O2 -S kernel.cu

kernel-cuda-nvptx64-nvidia-cuda-sm_30.s
//
// Generated by LLVM NVPTX Back-End
//

.version 6.4
.target sm_30
.address_size 64

        // .globl       _Z3fooff
.visible .global .texref tex;

.visible .func  (.param .b32 func_retval0) _Z3fooff(
        .param .b32 _Z3fooff_param_0,
        .param .b32 _Z3fooff_param_1
)
{
        .reg .f32       %f<7>;
        .reg .b64       %rd<2>;

        ld.param.f32    %f1, [_Z3fooff_param_0];
        ld.param.f32    %f2, [_Z3fooff_param_1];
        mov.u64         %rd1, tex;
        tex.2d.v4.f32.f32       {%f3, %f4, %f5, %f6}, [%rd1, {%f1, %f2}];
        st.param.f32    [func_retval0+0], %f3;
        ret;

}
tra added a comment.Mar 19 2020, 4:06 PM

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(clang).

What prevents clang to compile the texture functions in the CUDA headers? It looks like we'll need to implement the __nv_tex_surf_handler() builtin, but other than that it should work.

hliao added a comment.Mar 19 2020, 4:44 PM
In D76365#1932392, @tra wrote:

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(clang).

What prevents clang to compile the texture functions in the CUDA headers? It looks like we'll need to implement the __nv_tex_surf_handler() builtin, but other than that it should work.

That's a magic. I could not figure out how it works. From its use, e.g. tex2D on texture<T, cudaTextureType2D, cudaReadModeElementType>,

__nv_tex_surf_handler("__tex2D_v2", (typename __nv_tex_rmet_cast<T>::type) &temp, t, x, y);

__tex2D_v2 is a string literal. However, it's more likely a underly function name for the real implementation. Hardly imagine that that string literal is checked directly instead used for constructing the real function name. If that's the case, we also need to find that where that underlying functions are defined as the device bitcode library has no such definition.

hliao added a comment.Mar 19 2020, 4:48 PM
In D76365#1932398, @tra wrote:

This one only adds the definition but NVPTX backend doesn't handle it.

We also appear to have some plumbing for it in clang: https://github.com/llvm/llvm-project/blob/31262d6722c7ae6a9966a76064af43e5b3a8df71/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp#L724

Yeah, that's so far the internal version is used. The original one with one metadata as parameter that's only used to prevent CSE as the handle loading should be not optimized away or difficult for the backend to handle it. We should be able to add that intrinsic support easily. I could add that later. That should not be a big issue.

tra added a comment.Mar 19 2020, 5:12 PM

That's a magic. I could not figure out how it works. From its use, e.g. tex2D on texture<T, cudaTextureType2D, cudaReadModeElementType>,

__nv_tex_surf_handler("__tex2D_v2", (typename __nv_tex_rmet_cast<T>::type) &temp, t, x, y);

__tex2D_v2 is a string literal. However, it's more likely a underly function name for the real implementation. Hardly imagine that that string literal is checked directly instead used for constructing the real function name. If that's the case, we also need to find that where that underlying functions are defined as the device bitcode library has no such definition.

Most likely it's a compiler built-in with no implementation we could reuse and we'll need to implement our own. It should be fairly straightforward to figure out what it does by compiling all variants used by CUDA headers and observing generated PTX. The first 'meta' argument may be tricky, but we should be able to retrieve the constant string value in the front-end and map it to appropriate intrinsic or generate necessary glue.

hliao added a comment.Mar 20 2020, 7:07 AM
In D76365#1932517, @tra wrote:

That's a magic. I could not figure out how it works. From its use, e.g. tex2D on texture<T, cudaTextureType2D, cudaReadModeElementType>,

__nv_tex_surf_handler("__tex2D_v2", (typename __nv_tex_rmet_cast<T>::type) &temp, t, x, y);

__tex2D_v2 is a string literal. However, it's more likely a underly function name for the real implementation. Hardly imagine that that string literal is checked directly instead used for constructing the real function name. If that's the case, we also need to find that where that underlying functions are defined as the device bitcode library has no such definition.

Most likely it's a compiler built-in with no implementation we could reuse and we'll need to implement our own. It should be fairly straightforward to figure out what it does by compiling all variants used by CUDA headers and observing generated PTX. The first 'meta' argument may be tricky, but we should be able to retrieve the constant string value in the front-end and map it to appropriate intrinsic or generate necessary glue.

I could add that support gradually in my spare time. The goal of this patch not only addresses the texture/surface reference support for CUDA but also for HIP to keep the maximum compatibility. Once this is landed, we will follow the similar approach in HIP.

tra added inline comments.Mar 20 2020, 12:04 PM
clang/lib/CodeGen/CGCUDARuntime.h
51

This should be DeviceVarKind

53

Why does it need 2 bits?

In general, I think there's no point squeezing things into bitfields here as this struct is not going to be used all that often. I'd just use enum and bool.

clang/lib/CodeGen/CodeGenModule.cpp
701–713

Would isCUDADeviceBuiltinTextureType() be sufficient criteria for skipping TBAA regeneration?
Or does it need to be 'it is the texture type and it will be replaced with something else'? What is 'something else' is the same type?

4096–4122

This is the part I'm not comfortable with.
It's possible for the user to use the attribute on other types that do not match the expectations encoded here.
We should not be failing with an assert here because that's *user* error, not a compiler bug.

Expectations we have for the types should be enforced by Sema and compiler should produce proper diagnostics.

4102

Nit: 'Unexp*e*cted'

clang/lib/CodeGen/TargetInfo.cpp
6471–6472

What's the expectation here? Do we care which address spaces we're casting to/from?

6561

This part could use some additional comments. Why do we return an int64? Is that the size of the handle object? Is it guaranteed to always be a 64-bit int, or does it depend on particualr PTX version?

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

Please add comments on why CUDACC is needed for driver_types.h here? AFAICT, driver_types.h does not have any conditionals that depend on CUDACC. What happens if it's not defined.

clang/lib/Sema/SemaDeclAttr.cpp
6931–6932

Nit: Formatting is a bit odd here. Why is AL on a separate line?

clang/test/CodeGenCUDA/surface.cu
13–15

Please add a test for applying the attribute to a wrong type. I.e. a non-template or a template with different number or kinds of parameters. We should have a proper syntax error and not a compiler crash or silent failure.

hliao marked 5 inline comments as done.Mar 20 2020, 2:47 PM
hliao added inline comments.
clang/lib/CodeGen/CodeGenModule.cpp
701–713

The replacement only happens in the device compilation. On the host-side, the original type is still used.

4096–4122

device_builtin_surface_type and device_builtin_texture_type should only be used internally. Regular users of either CUDA or HIP must not use them as they need special internal handling and coordination beyond the compiler itself.

clang/lib/CodeGen/TargetInfo.cpp
6471–6472

We need to check whether we copy from that global variable directly. As all pointers are generic ones, the code here is to look through the addrspacecast constant expression for the original global variable.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

driver_types.h includes host_defines.h, where macros __device_builtin_surface_type__ and __device_builtin_texture_type__ are conditional defined if __CUDACC__.

The following is extracted from cuda/crt/host_defines.h

#if !defined(__CUDACC__)
#define __device_builtin__
#define __device_builtin_texture_type__
#define __device_builtin_surface_type__
#define __cudart_builtin__
#else /* defined(__CUDACC__) */
#define __device_builtin__ \
        __location__(device_builtin)
#define __device_builtin_texture_type__ \
        __location__(device_builtin_texture_type)
#define __device_builtin_surface_type__ \
        __location__(device_builtin_surface_type)
#define __cudart_builtin__ \
        __location__(cudart_builtin)
#endif /* !defined(__CUDACC__) */
clang/lib/Sema/SemaDeclAttr.cpp
6931–6932

it's formatted by clang-format, which is run in pre-merge checks

hliao updated this revision to Diff 251777.Mar 20 2020, 2:50 PM

Minor revising following reviewer's comment. Work on Sema checks and upload another review.

tra added inline comments.Mar 20 2020, 5:12 PM
clang/lib/CodeGen/CodeGenModule.cpp
701–713

But you've already checked CUDAIsDevice so you already know that you want to replace the type.
if (getTargetCodeGenInfo().getCUDADeviceBuiltinTextureDeviceType() != nullptr) appears to be redundant and can probably be dropped.

4096–4122

I agree that it's probably not something that should be used by users.
Still, such use should be reported as an error and should *not* crash the compiler. Asserts are for clang/llvm developers to catch the bugs in the compiler itself, not for the end users misusing something they should not.

clang/lib/CodeGen/TargetInfo.cpp
6471–6472

I'm still not sure what exactly you want to do here.
If the assumption is that all addrspacecast ops you may see are from global to generic AS, this assumption is not always valid. I can annotate any pointer with an arbitrary address space which may then be cast to generic. Or something else.

If you accept Src as is, without special-casing addrspacecast, what's going to happen?
AFAICT nvvm_texsurf_handle_internal does not really care about specific AS.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

My concern is -- what else is going to get defined? There are ~60 references to CUDACC in CUDA-10.1 headers. The wrappers are fragile enough that there's a good chance something may break. It does not help that my CUDA build bot decided to die just after we switched to work-from-home, so there will be no early warning if something goes wrong.

If all we need are the macros above, we may just define them.

clang/lib/Sema/SemaDeclAttr.cpp
6931–6932

Sorry. It was an artifact of messed up fonts in my browser. Apparently I've ended up using proportional font.
<rant> Why, oh why almost all fonts listed as 'fixed-width' on the chromebook are actually *not* ?! Even the ones that are fixed-width are prone to use ligatures and mess formatting. 'ffff' is still longer than 'fifi' for me.</rant>

This code looks much better with fixed-width font.

hliao updated this revision to Diff 252468.Mar 24 2020, 6:24 PM
hliao marked an inline comment as done.

Add Sema checks on CUDA device builtin surface/texture attributes.

hliao marked 6 inline comments as done.Mar 24 2020, 7:06 PM
hliao added inline comments.
clang/lib/CodeGen/CodeGenModule.cpp
701–713

That check is a target-specific one, which may choose very different implementation on how to handle these builtin surface/texture types. If they don't want to change those types on the device side and, instead, use very different different textureReference. Their getCUDADeviceBuiltinTextureDeviceType() may return nullptr to keep use the same reference type on both host- and device-side compilation.

4096–4122

addressed in the latest revision

clang/lib/CodeGen/TargetInfo.cpp
6471–6472

the backend needs a GlobalVariable as the argument for that intrinsic. The lookup through addrspacecast to check a global variable, which is created in the global address space and casted into a generic pointer.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

Let me check all CUDA SDK through their dockers. Redefining sounds good me as wll.

clang/test/CodeGenCUDA/surface.cu
13–15

addressed in refined tests in the latest revision

hliao marked 3 inline comments as done.Mar 25 2020, 8:01 AM
hliao added inline comments.
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

I checked headers from 7.0 to 10.0, __device_builtin_texture_type__ and __builtin_builtin_surface_type__ are only defined with that attributes if __CUDACC__ is defined. As we only pre-define __CUDA_ARCH__ in clang but flip __CUDACC__ on and off in the wrapper headers to selectively reuse CUDA's headers. I would hear your suggestion on that.
BTW, macros like __device__ are defined regardless of __CUDACC__ from 7.0 to 10.0 as __location(device). __location__ is defined if __CUDACC__ is present. But, different from __device__, __device_builtin_texture_type__ is defined only __CUDACC__ is defined.

tra added inline comments.Mar 25 2020, 10:04 AM
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

__device_builtin_texture_type__ is defined in host_defines.h, which does not seem to include any other files or does anything suspicious with __CUDACC__

It may be OK to move inclusion of host_defines.h to the point before driver_types.h, which happens to include the host_defines.h first, and define CUDACC only around host_defines.h.

An alternative is to add the macros just after inclusion of host_defines.h

In either case please verify that these attributes are the only things that's changed by diffing output of clang++ -x cuda /dev/null --cuda-host-only -dD -E -o - before and after the change.

hliao updated this revision to Diff 252758.Mar 25 2020, 11:59 PM

Fix windows build and revise header change.

  • When including drivers_types.h or host_defines.h with __CUDACC__, the only difference is the additional attributes added. No additional change.
  • After including host_defines.h firstly with __CUDACC__, there is no significant change from the one including drivers_types.h.
hliao marked an inline comment as done.Mar 26 2020, 12:03 AM
hliao added inline comments.
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

With __CUDACC__, the only difference is the additional attributes added, such as device_builtin_texture_type. Attributes like cudart_builtin are also defined correctly. That should be used to start the support CUDART features.
I revised the change to include host_defines.h first and found there's no changes from the one using driver_types.h. We should be OK for that change.

hliao updated this revision to Diff 252828.Mar 26 2020, 6:50 AM

Rebase to the master code

tra accepted this revision.Mar 26 2020, 9:58 AM

LGTM. Next step is to figure out what various __nv_tex_surf_handler(<string>...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able to do it in the wrapper file.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗(On Diff #251488)

SGTM. Thank you for verifying this.

This revision is now accepted and ready to land.Mar 26 2020, 9:58 AM
In D76365#1944272, @tra wrote:

LGTM. Next step is to figure out what various __nv_tex_surf_handler(<string>...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able to do it in the wrapper file.

Besides the texture/surface functions for their reference types, we also need to add corresponding ones for surface/texture object types as well. Even there are many but most of them are straight-forward, I will do that in my spare time. Thanks for review.

This revision was automatically updated to reflect the committed changes.
tra added a comment.Mar 27 2020, 9:38 AM

Looks like the change breaks compilation for us:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:
In file included from cuda/include/cuda_runtime.h:116: cuda/include/cuda_surface_types.h:91:42: error: illegal device builtin surface reference type 'surface<void, dim>' declared here
struct  __device_builtin_surface_type__  surface<void, dim> : public surfaceReference
                                         ^
cuda/include/cuda_surface_types.h:91:42: note: 'surface<void, dim>' needs to be instantiated from a class template with the 2nd template argument as an integral value
1 error generated when compiling for sm_60.

I'm investigating, but we may need to roll back this patch. Stay tuned.

tra added a comment.Mar 27 2020, 9:58 AM
In D76365#1946345, @tra wrote:

Looks like the change breaks compilation for us:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:
In file included from cuda/include/cuda_runtime.h:116: cuda/include/cuda_surface_types.h:91:42: error: illegal device builtin surface reference type 'surface<void, dim>' declared here
struct  __device_builtin_surface_type__  surface<void, dim> : public surfaceReference
                                         ^
cuda/include/cuda_surface_types.h:91:42: note: 'surface<void, dim>' needs to be instantiated from a class template with the 2nd template argument as an integral value
1 error generated when compiling for sm_60.

I'm investigating, but we may need to roll back this patch. Stay tuned.

It appears that the assumptions of what types the attributes can apply to are not valid. In CUDA headers they are also used on non-templated classes/structs. E.g in cuda/include/cuda_surface_types.h:74

struct __attribute__((device_builtin_surface_type)) surface : public surfaceReference
{
...
};

I'll undo this patch until we can make it work.

hliao added a comment.Mar 27 2020, 9:58 AM
In D76365#1946345, @tra wrote:

Looks like the change breaks compilation for us:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:
In file included from cuda/include/cuda_runtime.h:116: cuda/include/cuda_surface_types.h:91:42: error: illegal device builtin surface reference type 'surface<void, dim>' declared here
struct  __device_builtin_surface_type__  surface<void, dim> : public surfaceReference
                                         ^
cuda/include/cuda_surface_types.h:91:42: note: 'surface<void, dim>' needs to be instantiated from a class template with the 2nd template argument as an integral value
1 error generated when compiling for sm_60.

I'm investigating, but we may need to roll back this patch. Stay tuned.

I am looking into it as well. Thanks.

In D76365#1946407, @tra wrote:
In D76365#1946345, @tra wrote:

Looks like the change breaks compilation for us:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:
In file included from cuda/include/cuda_runtime.h:116: cuda/include/cuda_surface_types.h:91:42: error: illegal device builtin surface reference type 'surface<void, dim>' declared here
struct  __device_builtin_surface_type__  surface<void, dim> : public surfaceReference
                                         ^
cuda/include/cuda_surface_types.h:91:42: note: 'surface<void, dim>' needs to be instantiated from a class template with the 2nd template argument as an integral value
1 error generated when compiling for sm_60.

I'm investigating, but we may need to roll back this patch. Stay tuned.

It appears that the assumptions of what types the attributes can apply to are not valid. In CUDA headers they are also used on non-templated classes/structs. E.g in cuda/include/cuda_surface_types.h:74

struct __attribute__((device_builtin_surface_type)) surface : public surfaceReference
{
...
};

I'll undo this patch until we can make it work.

That's a partial template specialization needs handling. I am revising that patch. Please revert it first. Thanks.

tra added a comment.Mar 27 2020, 10:05 AM

That's a partial template specialization needs handling. I am revising that patch. Please revert it first. Thanks.

Reverted in fe8063e1a0e983f1b4d

tra reopened this revision.Mar 27 2020, 1:50 PM

Reopened for further work

This revision is now accepted and ready to land.Mar 27 2020, 1:50 PM
hliao updated this revision to Diff 253214.Mar 27 2020, 1:54 PM

Fix Sema checks on partial template specialization.

  • Revise Sema checks on the template class.
hliao added a comment.Mar 27 2020, 2:00 PM

Fix Sema checks on partial template specialization.

  • Revise Sema checks on the template class.

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

In D76365#1946908, @tra wrote:

Reopened for further work

This revision was automatically updated to reflect the committed changes.
tra added a comment.Mar 27 2020, 3:33 PM

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:

__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?
hliao added a comment.Mar 27 2020, 4:40 PM
In D76365#1947103, @tra wrote:

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:

__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?

I remembered that triggers NVCC internal errors or errors. I will check that this night.

hliao added a comment.EditedMar 27 2020, 8:35 PM
In D76365#1947103, @tra wrote:

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:

__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?

For such case, NVCC reports the following error:

kernel.cu(3): error: attribute "device_builtin_surface_type" does not apply here

1 error detected in the compilation of "kernel.cpp1.ii"

That error is generated after nvcc --keep -g -c kernel.cu from this sample code (kernel.cu)

#include <cuda.h>

__attribute__((device_builtin_surface_type)) int foo;

int f() {
  return foo;
}

I changed that sample code a little bit to this one

#include <cuda.h>

#if 1
typedef __attribute__((device_builtin_surface_type)) int dev_texsurf_int_t;
dev_texsurf_int_t foo;
#else
__attribute__((device_builtin_surface_type)) int foo;
#endif

int f() {
  return foo;
}

It triggers a crash in NVCC with the same compilation command line.

We may enhance clang to report an error instead of a warning only so far.

hliao added a comment.Mar 27 2020, 9:24 PM
In D76365#1947103, @tra wrote:

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:

__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?

For such case, NVCC reports the following error:

kernel.cu(3): error: attribute "device_builtin_surface_type" does not apply here

1 error detected in the compilation of "kernel.cpp1.ii"

That error is generated after nvcc --keep -g -c kernel.cu from this sample code (kernel.cu)

#include <cuda.h>

__attribute__((device_builtin_surface_type)) int foo;

int f() {
  return foo;
}

I changed that sample code a little bit to this one

#include <cuda.h>

#if 1
typedef __attribute__((device_builtin_surface_type)) int dev_texsurf_int_t;
dev_texsurf_int_t foo;
#else
__attribute__((device_builtin_surface_type)) int foo;
#endif

int f() {
  return foo;
}

It triggers a crash in NVCC with the same compilation command line.

We may enhance clang to report an error instead of a warning only so far.

I tried one more sample, it triggers NVCC crash as well.

struct __attribute__((device_builtin_surface_type)) ref {
  int x;
} R;

int f() { return R.x; }

For this case, clang reports error due to the same checks added in this patch.

tra added a comment.Mar 30 2020, 9:57 AM

Nice! I'll file a bug with NVIDIA.

tra added a comment.Apr 10 2020, 9:56 PM

It appears I can crash clang with some texture code: https://godbolt.org/z/5vdEwC

hliao added a comment.Apr 11 2020, 9:19 AM
In D76365#1975784, @tra wrote:

It appears I can crash clang with some texture code: https://godbolt.org/z/5vdEwC

llvm.nvvm.tex.unified.2d.v4f32.f32 has a vector output, the alias

__attribute__((device)) float tex2d_ld(tex_t, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

needs replacing with

__attribute__((device)) v4f tex2d_ld(tex_t, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

see this revised sample code https://godbolt.org/z/B7rtxR