This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/include/clang/AST/Type.h
clang/include/clang/Basic/Attr.td
clang/include/clang/Basic/AttrDocs.td
clang/lib/AST/Type.cpp
clang/lib/CodeGen/CGCUDANV.cpp
clang/lib/CodeGen/CGCUDARuntime.h
clang/lib/CodeGen/CGExprAgg.cpp
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/CodeGen/CodeGenTypes.cpp
clang/lib/CodeGen/TargetInfo.h
clang/lib/CodeGen/TargetInfo.cpp
clang/lib/Sema/SemaDeclAttr.cpp
clang/test/CodeGenCUDA/surface.cu
clang/test/CodeGenCUDA/texture.cu
clang/test/SemaCUDA/attr-declspec.cu
clang/test/SemaCUDA/attributes-on-non-cuda.cu
llvm/include/llvm/IR/Operator.h
Differential D76365

[cuda][hip] Add CUDA builtin surface/texture reference support.
ClosedPublic

Authored by hliao on Mar 18 2020, 7:54 AM.

Details

Reviewers

tra
rjmccall
yaxunl
a.sidorin
aaron.ballman

Commits

rG5be9b8cbe2b2: [cuda][hip] Add CUDA builtin surface/texture reference support.
rGfe8063e1a0e9: Revert "[cuda][hip] Add CUDA builtin surface/texture reference support."
rG6a9ad5f3f4ac: [cuda][hip] Add CUDA builtin surface/texture reference support.

Summary

Even though the bindless surface/texture interfaces are promoted, there is still code using surface/texture references. For example, PR#26400 reports a compilation issue for code using tex2D with texture references. For better compatibility, this patch proposes support for surface/texture references.
Due to the absent documentation and magic headers, it's believed that nvcc uses builtins for texture support. From the limited NVVM documentation[^nvvm] and the NVPTX backend's texture/surface tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with device_builtin_surface_type/device_builtin_texture_type, with the corresponding handle-like object types, cudaSurfaceObject_t or cudaTextureObject_t, in the device-side compilation. On the host side, those global handle variables are registered, and are established and updated later when the corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are much like device global variables, but are represented with different types on the host and device sides.
In this patch, the following changes are proposed to support that behavior:
+ Refine the device_builtin_surface_type and device_builtin_texture_type attributes to apply to type declarations only, so they can be used to check whether a variable is of the surface/texture reference type.
+ Add hooks in code generation to replace those reference types with the corresponding object types, as well as all accesses to them. In particular, nvvm.texsurf.handle.internal should be used to load object handles from global reference variables[^texsurf], as well as metadata annotations.
+ Generate host-side registration with proper template argument parsing.

[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 "Texture Reference API" in the CUDA C Programming Guide.
[^texsurf]: According to NVVM IR, nvvm.texsurf.handle should be used. But, the current backend doesn't have that supported. We may revise that later.
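The type replacement described in the summary can be pictured with a small host-only C++ sketch. This is only an illustration under stated assumptions: `TextureRefDemo`, `TextureHandleDemo`, `IsBuiltinTextureDemo`, and `LoweredTypeDemo` are hypothetical names, the 64-bit handle width is assumed, and clang keys off the attribute during code generation rather than off a type trait.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins: the real CUDA headers define texture<...> and
// cudaTextureObject_t; the 64-bit handle width here is an assumption.
struct TextureRefDemo {
  int channelDesc[5];
  bool normalized;
};
using TextureHandleDemo = std::uint64_t;

// Marks the types that play the role of "builtin texture reference" types.
// In clang this information comes from the attribute, not from a trait.
template <typename T> struct IsBuiltinTextureDemo : std::false_type {};
template <> struct IsBuiltinTextureDemo<TextureRefDemo> : std::true_type {};

// Device-side compilation swaps the reference type for the handle type;
// host-side compilation keeps the original type.
template <typename T, bool IsDevice>
using LoweredTypeDemo =
    typename std::conditional<IsDevice && IsBuiltinTextureDemo<T>::value,
                              TextureHandleDemo, T>::type;

static_assert(std::is_same<LoweredTypeDemo<TextureRefDemo, true>,
                           TextureHandleDemo>::value,
              "device side sees the handle type");
static_assert(std::is_same<LoweredTypeDemo<TextureRefDemo, false>,
                           TextureRefDemo>::value,
              "host side keeps the reference type");
```

The same global variable therefore has two representations, which is why the host side has to register it so the runtime can fill in the device-side handle later.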

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hliao created this revision. · Mar 18 2020, 7:54 AM

Herald added a reviewer: a.sidorin. · Mar 18 2020, 7:54 AM

Herald added a project: Restricted Project. · Mar 18 2020, 7:54 AM

Herald added a subscriber: cfe-commits.

Harbormaster failed remote builds in B49599: Diff 251084! · Mar 18 2020, 8:42 AM

Reformatting with clang-format.

Harbormaster failed remote builds in B49608: Diff 251101! · Mar 18 2020, 9:47 AM

Fix warnings from clang-tidy.

Harbormaster completed remote builds in B49618: Diff 251125. · Mar 18 2020, 11:57 AM

Revise one part of the logic to reduce condition evaluation overhead.

Harbormaster completed remote builds in B49674: Diff 251241. · Mar 18 2020, 9:09 PM

More refinement to compile sample code with CUDA headers.

With this revision, the following sample could be compiled with CUDA SDK and almost the same PTX code is generated.

#include <cuda.h>

texture<float, cudaTextureType2D, cudaReadModeElementType> tex;

#if defined(__clang__)
struct v4f {
  float x, y, z, w;
};
__device__ v4f
tex_2d_ld(texture<float, cudaTextureType2D, cudaReadModeElementType>,
          float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

template <typename T>
static inline __device__ T
tex2D(texture<T, cudaTextureType2D, cudaReadModeElementType> t,
      float x, float y) {
  return tex_2d_ld(t, x, y).x;
}
#endif

__device__ float foo(float x, float y) { return tex2D(tex, x, y); }

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(__clang__).

Here's the PTX code generated by NVCC:

kernel.ptx
//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-27506705
// Cuda compilation tools, release 10.2, V10.2.89
// Based on LLVM 3.4svn
//

.version 6.5
.target sm_30
.address_size 64

        // .globl       _Z3fooff
.visible .global .texref tex;

.visible .func  (.param .b32 func_retval0) _Z3fooff(
        .param .b32 _Z3fooff_param_0,
        .param .b32 _Z3fooff_param_1
)
{
        .reg .f32       %f<7>;
        .reg .b64       %rd<2>;


        ld.param.f32    %f1, [_Z3fooff_param_0];
        ld.param.f32    %f2, [_Z3fooff_param_1];
        tex.2d.v4.f32.f32       {%f3, %f4, %f5, %f6}, [tex, {%f1, %f2}];
        st.param.f32    [func_retval0+0], %f3;
        ret;
}

Here's the PTX code generated by Clang and the LLVM backend, using `clang --cuda-device-only --cuda-gpu-arch=sm_30 -O2 -S kernel.cu`:

kernel-cuda-nvptx64-nvidia-cuda-sm_30.s
//
// Generated by LLVM NVPTX Back-End
//

.version 6.4
.target sm_30
.address_size 64

        // .globl       _Z3fooff
.visible .global .texref tex;

.visible .func  (.param .b32 func_retval0) _Z3fooff(
        .param .b32 _Z3fooff_param_0,
        .param .b32 _Z3fooff_param_1
)
{
        .reg .f32       %f<7>;
        .reg .b64       %rd<2>;

        ld.param.f32    %f1, [_Z3fooff_param_0];
        ld.param.f32    %f2, [_Z3fooff_param_1];
        mov.u64         %rd1, tex;
        tex.2d.v4.f32.f32       {%f3, %f4, %f5, %f6}, [%rd1, {%f1, %f2}];
        st.param.f32    [func_retval0+0], %f3;
        ret;

}

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(__clang__).

What prevents clang from compiling the texture functions in the CUDA headers? It looks like we'll need to implement the __nv_tex_surf_handler() builtin, but other than that it should work.

I believe LLVM does have nvvm.texsurf.handle implemented: https://github.com/llvm/llvm-project/blob/d9972f848294b06807c8764615852ba2bc1e8a74/llvm/include/llvm/IR/IntrinsicsNVVM.td#L1150

We also appear to have some plumbing for it in clang: https://github.com/llvm/llvm-project/blob/31262d6722c7ae6a9966a76064af43e5b3a8df71/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp#L724

In D76365#1932392, @tra wrote:

Note that, clang-based one needs defining texture fetch functions as they could not be reused from CUDA SDK. That part is enclosed with #if defined(__clang__).

What prevents clang from compiling the texture functions in the CUDA headers? It looks like we'll need to implement the __nv_tex_surf_handler() builtin, but other than that it should work.

That's magic. I could not figure out how it works. From its use, e.g. tex2D on texture<T, cudaTextureType2D, cudaReadModeElementType>,

__nv_tex_surf_handler("__tex2D_v2", (typename __nv_tex_rmet_cast<T>::type) &temp, t, x, y);

__tex2D_v2 is a string literal. However, it's more likely an underlying function name for the real implementation. It's hard to imagine that the string literal is checked directly rather than used to construct the real function name. If that's the case, we also need to find where those underlying functions are defined, as the device bitcode library has no such definition.

In D76365#1932398, @tra wrote:

I believe LLVM does have nvvm.texsurf.handle implemented: https://github.com/llvm/llvm-project/blob/d9972f848294b06807c8764615852ba2bc1e8a74/llvm/include/llvm/IR/IntrinsicsNVVM.td#L1150

This one only adds the definition but NVPTX backend doesn't handle it.

We also appear to have some plumbing for it in clang: https://github.com/llvm/llvm-project/blob/31262d6722c7ae6a9966a76064af43e5b3a8df71/llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp#L724

Yeah, so far only the internal version is used. The original one takes a metadata parameter that is only used to prevent CSE, since the handle loading should not be optimized away or made difficult for the backend to handle. We should be able to add support for that intrinsic easily. I could add that later. That should not be a big issue.

Harbormaster completed remote builds in B49805: Diff 251488. · Mar 19 2020, 5:00 PM

In D76365#1932439, @hliao wrote:

That's magic. I could not figure out how it works. From its use, e.g. tex2D on texture<T, cudaTextureType2D, cudaReadModeElementType>,
__nv_tex_surf_handler("__tex2D_v2", (typename __nv_tex_rmet_cast<T>::type) &temp, t, x, y);
__tex2D_v2 is a string literal. However, it's more likely an underlying function name for the real implementation. It's hard to imagine that the string literal is checked directly rather than used to construct the real function name. If that's the case, we also need to find where those underlying functions are defined, as the device bitcode library has no such definition.

Most likely it's a compiler built-in with no implementation we could reuse and we'll need to implement our own. It should be fairly straightforward to figure out what it does by compiling all variants used by CUDA headers and observing generated PTX. The first 'meta' argument may be tricky, but we should be able to retrieve the constant string value in the front-end and map it to appropriate intrinsic or generate necessary glue.
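A minimal sketch of the mapping idea from the comment above, assuming a plain lookup table from handler strings to intrinsic names. `HandlerMappingDemo` and `lookupTexSurfIntrinsicDemo` are hypothetical names. The two strings come from this review, but pairing `__tex2D_v2` with that particular intrinsic is an assumption that would only hold for a `float` element type, and the real mapping would have to be derived from generated PTX.

```cpp
#include <cstring>

// Hypothetical table entry: handler string -> LLVM intrinsic name.
struct HandlerMappingDemo {
  const char *handler;
  const char *intrinsic;
};

// Returns the intrinsic name for a handler string, or nullptr if the
// string is not in the table. The single entry is an assumed pairing
// for illustration, not a verified mapping.
inline const char *lookupTexSurfIntrinsicDemo(const char *handler) {
  static const HandlerMappingDemo kTable[] = {
      {"__tex2D_v2", "llvm.nvvm.tex.unified.2d.v4f32.f32"},
  };
  for (const HandlerMappingDemo &entry : kTable)
    if (std::strcmp(entry.handler, handler) == 0)
      return entry.intrinsic;
  return nullptr;
}
```

Returning `nullptr` for unknown strings leaves room for a front-end to emit a diagnostic instead of silently generating a call to a nonexistent function.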

In D76365#1932517, @tra wrote:
Most likely it's a compiler built-in with no implementation we could reuse and we'll need to implement our own. It should be fairly straightforward to figure out what it does by compiling all variants used by CUDA headers and observing generated PTX. The first 'meta' argument may be tricky, but we should be able to retrieve the constant string value in the front-end and map it to appropriate intrinsic or generate necessary glue.

I could add that support gradually in my spare time. The goal of this patch is to address texture/surface reference support not only for CUDA but also for HIP, to keep maximum compatibility. Once this lands, we will follow a similar approach in HIP.

tra added inline comments. · Mar 20 2020, 12:04 PM

clang/lib/CodeGen/CGCUDARuntime.h
51	This should be `DeviceVarKind`
53	Why does it need 2 bits? In general, I think there's no point squeezing things into bitfields here as this struct is not going to be used all that often. I'd just use enum and bool.
clang/lib/CodeGen/CodeGenModule.cpp
701–713	Would `isCUDADeviceBuiltinTextureType()` be sufficient criteria for skipping TBAA regeneration? Or does it need to be 'it is the texture type and it will be replaced with something else'? What if 'something else' is the same type?
4096–4122	This is the part I'm not comfortable with. It's possible for the user to use the attribute on other types that do not match the expectations encoded here. We should not be failing with an assert here because that's user error, not a compiler bug. Expectations we have for the types should be enforced by Sema and compiler should produce proper diagnostics.
4102	Nit: 'Unexpected'
clang/lib/CodeGen/TargetInfo.cpp
6471–6472	What's the expectation here? Do we care which address spaces we're casting to/from?
6561	This part could use some additional comments. Why do we return an int64? Is that the size of the handle object? Is it guaranteed to always be a 64-bit int, or does it depend on a particular PTX version?
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	Please add comments on why `__CUDACC__` is needed for driver_types.h here. AFAICT, driver_types.h does not have any conditionals that depend on `__CUDACC__`. What happens if it's not defined?
clang/lib/Sema/SemaDeclAttr.cpp
6931–6932	Nit: Formatting is a bit odd here. Why is AL on a separate line?
clang/test/CodeGenCUDA/surface.cu
13–15	Please add a test for applying the attribute to a wrong type. I.e. a non-template or a template with different number or kinds of parameters. We should have a proper syntax error and not a compiler crash or silent failure.
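The point raised in the inline comments above (report misuse of the attribute as a diagnostic rather than hitting an assert) can be sketched as follows. This is a simplified model under stated assumptions: `ArgKindDemo`, `AttributedTypeDemo`, and `checkSurfaceTypeDemo` are hypothetical names, the "two template arguments, second one integral" rule is inferred from the `surface<void, dim>` diagnostic quoted later in this review, and partial specializations are not modeled.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified description of a type carrying the attribute.
enum class ArgKindDemo { Type, Integral };
struct AttributedTypeDemo {
  bool isClassTemplateSpecialization;
  std::vector<ArgKindDemo> templateArgs;
};

// Returns an empty string when the type is acceptable, otherwise a
// diagnostic message. It never asserts on user input.
inline std::string checkSurfaceTypeDemo(const AttributedTypeDemo &type) {
  if (!type.isClassTemplateSpecialization)
    return "attribute requires a class template specialization";
  if (type.templateArgs.size() != 2)
    return "surface reference type needs exactly 2 template arguments";
  if (type.templateArgs[0] != ArgKindDemo::Type)
    return "1st template argument must be a type";
  if (type.templateArgs[1] != ArgKindDemo::Integral)
    return "2nd template argument must be an integral value";
  return "";
}
```

Code generation can then assume the shape was already validated, which keeps asserts reserved for genuine compiler bugs.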

hliao marked 5 inline comments as done. · Mar 20 2020, 2:47 PM

hliao added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
701–713	The replacement only happens in the device compilation. On the host-side, the original type is still used.
4096–4122	`device_builtin_surface_type` and `device_builtin_texture_type` should only be used internally. Regular users of either CUDA or HIP must not use them as they need special internal handling and coordination beyond the compiler itself.
clang/lib/CodeGen/TargetInfo.cpp
6471–6472	We need to check whether we copy from that global variable directly. As all pointers are generic ones, the code here is to look through the `addrspacecast` constant expression for the original global variable.
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	`driver_types.h` includes `host_defines.h`, where the macros `__device_builtin_surface_type__` and `__device_builtin_texture_type__` are conditionally defined depending on `__CUDACC__`. The following is extracted from `cuda/crt/host_defines.h`:

#if !defined(__CUDACC__)
#define __device_builtin__
#define __device_builtin_texture_type__
#define __device_builtin_surface_type__
#define __cudart_builtin__
#else /* defined(__CUDACC__) */
#define __device_builtin__ \
        __location__(device_builtin)
#define __device_builtin_texture_type__ \
        __location__(device_builtin_texture_type)
#define __device_builtin_surface_type__ \
        __location__(device_builtin_surface_type)
#define __cudart_builtin__ \
        __location__(cudart_builtin)
#endif /* !defined(__CUDACC__) */
clang/lib/Sema/SemaDeclAttr.cpp
6931–6932	it's formatted by `clang-format`, which is run in pre-merge checks

Minor revising following reviewer's comment. Work on Sema checks and upload another review.

tra added inline comments. · Mar 20 2020, 5:12 PM

clang/lib/CodeGen/CodeGenModule.cpp
701–713	But you've already checked CUDAIsDevice so you already know that you want to replace the type. `if (getTargetCodeGenInfo().getCUDADeviceBuiltinTextureDeviceType() != nullptr)` appears to be redundant and can probably be dropped.
4096–4122	I agree that it's probably not something that should be used by users. Still, such use should be reported as an error and should not crash the compiler. Asserts are for clang/llvm developers to catch the bugs in the compiler itself, not for the end users misusing something they should not.
clang/lib/CodeGen/TargetInfo.cpp
6471–6472	I'm still not sure what exactly you want to do here. If the assumption is that all `addrspacecast` ops you may see are from global to generic AS, this assumption is not always valid. I can annotate any pointer with an arbitrary address space which may then be cast to generic. Or something else. If you accept Src as is, without special-casing addrspacecast, what's going to happen? AFAICT `nvvm_texsurf_handle_internal` does not really care about specific AS.
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	My concern is -- what else is going to get defined? There are ~60 references to `__CUDACC__` in CUDA-10.1 headers. The wrappers are fragile enough that there's a good chance something may break. It does not help that my CUDA build bot decided to die just after we switched to work-from-home, so there will be no early warning if something goes wrong. If all we need are the macros above, we may just define them.
clang/lib/Sema/SemaDeclAttr.cpp
6931–6932	Sorry. It was an artifact of messed up fonts in my browser. Apparently I've ended up using proportional font. <rant> Why, oh why almost all fonts listed as 'fixed-width' on the chromebook are actually not ?! Even the ones that are fixed-width are prone to use ligatures and mess formatting. 'ffff' is still longer than 'fifi' for me.</rant> This code looks much better with fixed-width font.

Harbormaster failed remote builds in B49968: Diff 251777! · Mar 20 2020, 6:27 PM

Add Sema checks on CUDA device builtin surface/texture attributes.

hliao marked 6 inline comments as done. · Mar 24 2020, 7:06 PM

hliao added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
701–713	That check is target-specific; a target may choose a very different implementation for handling these builtin surface/texture types. If a target doesn't want to change those types on the device side and instead uses a very different `textureReference`, its `getCUDADeviceBuiltinTextureDeviceType()` may return `nullptr` to keep using the same reference type in both host- and device-side compilation.
4096–4122	addressed in the latest revision
clang/lib/CodeGen/TargetInfo.cpp
6471–6472	The backend needs a GlobalVariable as the argument for that intrinsic. The code looks through the `addrspacecast` to find the global variable, which is created in the global address space and cast to a generic pointer.
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	Let me check all CUDA SDKs through their dockers. Redefining sounds good to me as well.
clang/test/CodeGenCUDA/surface.cu
13–15	addressed in refined tests in the latest revision
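The "look through `addrspacecast` to find the global variable" step discussed for TargetInfo.cpp can be illustrated with a toy model. The types below are hypothetical and are not LLVM's classes; the sketch only shows the shape of the lookup, and it peels any number of casts, whereas the patch is described as looking through a single `addrspacecast` constant expression.

```cpp
// Toy IR model (hypothetical; not LLVM's Value/GlobalVariable classes).
enum class NodeKindDemo { GlobalVariable, AddrSpaceCast, Other };
struct NodeDemo {
  NodeKindDemo kind;
  const NodeDemo *operand; // source of a cast; nullptr otherwise
};

// Peels address-space casts and returns the underlying global variable,
// or nullptr if the chain ends in anything else.
inline const NodeDemo *findUnderlyingGlobalDemo(const NodeDemo *value) {
  while (value && value->kind == NodeKindDemo::AddrSpaceCast)
    value = value->operand;
  return (value && value->kind == NodeKindDemo::GlobalVariable) ? value
                                                                 : nullptr;
}
```

Returning `nullptr` for a non-global source corresponds to the case where the copy is not directly from the reference variable, so no handle intrinsic would be emitted.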

Harbormaster failed remote builds in B50336: Diff 252468! · Mar 24 2020, 7:09 PM

hliao marked 3 inline comments as done. · Mar 25 2020, 8:01 AM

hliao added inline comments.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	I checked headers from 7.0 to 10.0: `__device_builtin_texture_type__` and `__device_builtin_surface_type__` are only defined with those attributes if `__CUDACC__` is defined. We only pre-define `__CUDA_ARCH__` in clang, but flip `__CUDACC__` on and off in the wrapper headers to selectively reuse CUDA's headers, so I would like to hear your suggestion on that. BTW, macros like `__device__` are defined as `__location__(device)` regardless of `__CUDACC__` from 7.0 to 10.0, and `__location__` is defined if `__CUDACC__` is present. But, unlike `__device__`, `__device_builtin_texture_type__` is defined only if `__CUDACC__` is defined.

tra added inline comments. · Mar 25 2020, 10:04 AM

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	`__device_builtin_texture_type__` is defined in `host_defines.h`, which does not seem to include any other files or do anything suspicious with `__CUDACC__`. It may be OK to move the inclusion of `host_defines.h` to the point before `driver_types.h`, which happens to include `host_defines.h` first, and define `__CUDACC__` only around `host_defines.h`. An alternative is to add the macros just after the inclusion of `host_defines.h`. In either case, please verify that these attributes are the only things changed by diffing the output of `clang++ -x cuda /dev/null --cuda-host-only -dD -E -o -` before and after the change.
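The suggestion above to define `__CUDACC__` only around one header can be sketched with the `push_macro`/`pop_macro` pragmas, which GCC, Clang, and MSVC support. This is an illustration only: `DEMO_CUDACC` is a hypothetical stand-in for `__CUDACC__`, the `#if` block stands in for the conditional section of `host_defines.h`, and the actual wrapper change in the patch may look different.

```cpp
// Hypothetical stand-in for __CUDACC__; a real wrapper would toggle
// __CUDACC__ itself around the inclusion of host_defines.h.
#pragma push_macro("DEMO_CUDACC")
#define DEMO_CUDACC 1

// Stand-in for the conditional block inside host_defines.h.
#if !defined(DEMO_CUDACC)
#define DEMO_BUILTIN_TEXTURE_ATTR_ACTIVE 0
#else
#define DEMO_BUILTIN_TEXTURE_ATTR_ACTIVE 1
#endif

// Restore the previous state so the toggle does not leak into later headers.
#pragma pop_macro("DEMO_CUDACC")

#if defined(DEMO_CUDACC)
#define DEMO_TOGGLE_LEAKED 1
#else
#define DEMO_TOGGLE_LEAKED 0
#endif

// The attribute macros saw the toggle, but code after the region does not.
constexpr bool kAttrActiveDemo = DEMO_BUILTIN_TEXTURE_ATTR_ACTIVE != 0;
constexpr bool kToggleLeakedDemo = DEMO_TOGGLE_LEAKED != 0;
```

Scoping the toggle this way limits how many of the other `__CUDACC__`-dependent conditionals in the CUDA headers are affected.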

Fix windows build and revise header change.

When including driver_types.h or host_defines.h with __CUDACC__ defined, the only difference is the additional attributes added. No additional change.
After including host_defines.h first with __CUDACC__ defined, there is no significant change from the version including driver_types.h.

hliao marked an inline comment as done. · Mar 26 2020, 12:03 AM

hliao added inline comments.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	With `__CUDACC__`, the only difference is the additional attributes added, such as `device_builtin_texture_type`. Attributes like `cudart_builtin` are also defined correctly. That could be used to start supporting CUDART features. I revised the change to include `host_defines.h` first and found there are no changes from the one using `driver_types.h`. We should be OK with that change.

Harbormaster failed remote builds in B50499: Diff 252758! · Mar 26 2020, 1:35 AM

Rebase to the master code

Harbormaster failed remote builds in B50531: Diff 252828! · Mar 26 2020, 8:06 AM

LGTM. Next step is to figure out what various __nv_tex_surf_handler(<string>...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able to do it in the wrapper file.

clang/lib/Headers/__clang_cuda_runtime_wrapper.h
82–94 ↗	(On Diff #251488)	SGTM. Thank you for verifying this.

This revision is now accepted and ready to land. · Mar 26 2020, 9:58 AM

In D76365#1944272, @tra wrote:

LGTM. Next step is to figure out what various __nv_tex_surf_handler(<string>...) maps to for various strings (there are ~110 of them in CUDA-10.2) and implement its replacement. I think we should be able to do it in the wrapper file.

Besides the texture/surface functions for the reference types, we also need to add the corresponding ones for the surface/texture object types. There are many of them, but most are straightforward; I will do that in my spare time. Thanks for the review.

Closed by commit rG6a9ad5f3f4ac: [cuda][hip] Add CUDA builtin surface/texture reference support. (authored by hliao). · Mar 26 2020, 11:55 AM

This revision was automatically updated to reflect the committed changes.

Looks like the change breaks compilation for us:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:104:
In file included from cuda/include/cuda_runtime.h:116:
cuda/include/cuda_surface_types.h:91:42: error: illegal device builtin surface reference type 'surface<void, dim>' declared here
struct  __device_builtin_surface_type__  surface<void, dim> : public surfaceReference
                                         ^
cuda/include/cuda_surface_types.h:91:42: note: 'surface<void, dim>' needs to be instantiated from a class template with the 2nd template argument as an integral value
1 error generated when compiling for sm_60.

I'm investigating, but we may need to roll back this patch. Stay tuned.

It appears that the assumptions of what types the attributes can apply to are not valid. In CUDA headers they are also used on non-templated classes/structs. E.g in cuda/include/cuda_surface_types.h:74

struct __attribute__((device_builtin_surface_type)) surface : public surfaceReference
{
...
};

I'll undo this patch until we can make it work.

I am looking into it as well. Thanks.

In D76365#1946407, @tra wrote:
It appears that the assumptions of what types the attributes can apply to are not valid. In CUDA headers they are also used on non-templated classes/structs. E.g in cuda/include/cuda_surface_types.h:74
struct __attribute__((device_builtin_surface_type)) surface : public surfaceReference
{
...
};
I'll undo this patch until we can make it work.

That's a partial template specialization that needs handling. I am revising that patch. Please revert it first. Thanks.

In D76365#1946415, @hliao wrote:

That's a partial template specialization that needs handling. I am revising that patch. Please revert it first. Thanks.

Reverted in fe8063e1a0e983f1b4d

Reopened for further work

This revision is now accepted and ready to land. · Mar 27 2020, 1:50 PM

Fix Sema checks on partial template specialization.

Revise Sema checks on the template class.

In D76365#1946925, @hliao wrote:

Fix Sema checks on partial template specialization.

Revise Sema checks on the template class.

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

In D76365#1946908, @tra wrote:

Reopened for further work

Closed by commit rG5be9b8cbe2b2: [cuda][hip] Add CUDA builtin surface/texture reference support. (authored by hliao). · Mar 27 2020, 2:20 PM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B50733: Diff 253214! · Mar 27 2020, 2:53 PM

In D76365#1946938, @hliao wrote:

The new revision is accepted, right? Just want to confirm as it seems you accept it before I posted the new change.

The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:

__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?

In D76365#1947103, @tra wrote:
The approval was for the old version. I didn't undo it when I reopened the review. The diff looks OK, though the last variant still leaves open the question of what's the meaning of these attributes and what are the restrictions on their use.

So what's the reasonable thing to do if I write something like this:
__attribute__((device_builtin_surface_type)) int foo; // Ignore? Warn? Error? Do something sensible?

As I recall, that triggers either NVCC internal errors or regular errors. I will check tonight.

For such case, NVCC reports the following error:

kernel.cu(3): error: attribute "device_builtin_surface_type" does not apply here

1 error detected in the compilation of "kernel.cpp1.ii"

That error is generated after nvcc --keep -g -c kernel.cu from this sample code (kernel.cu)

#include <cuda.h>

__attribute__((device_builtin_surface_type)) int foo;

int f() {
  return foo;
}

I changed that sample code a little bit to this one

#include <cuda.h>

#if 1
typedef __attribute__((device_builtin_surface_type)) int dev_texsurf_int_t;
dev_texsurf_int_t foo;
#else
__attribute__((device_builtin_surface_type)) int foo;
#endif

int f() {
  return foo;
}

It triggers a crash in NVCC with the same compilation command line.

We may enhance clang to report an error here; so far it only reports a warning.

I tried one more sample; it triggers an NVCC crash as well.

struct __attribute__((device_builtin_surface_type)) ref {
  int x;
} R;

int f() { return R.x; }

For this case, clang reports an error due to the same checks added in this patch.

In D76365#1947479, @hliao wrote:

Nice! I'll file a bug with NVIDIA.

It appears I can crash clang with some texture code: https://godbolt.org/z/5vdEwC

In D76365#1975784, @tra wrote:

It appears I can crash clang with some texture code: https://godbolt.org/z/5vdEwC

llvm.nvvm.tex.unified.2d.v4f32.f32 has a vector output, the alias

__attribute__((device)) float tex2d_ld(tex_t, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

needs to be replaced with

__attribute__((device)) v4f tex2d_ld(tex_t, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");

See this revised sample code: https://godbolt.org/z/B7rtxR
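The return-type mismatch can be illustrated on the host with a stand-in. This is only a minimal sketch: `fake_tex2d` and `fake_tex2d_x` are hypothetical names, not part of the patch or of the godbolt samples, and `v4f` is assumed to be declared with the GCC/Clang `vector_size` extension. It shows that a four-lane result has to be received as a vector, with the caller picking the lane it wants.

```cpp
// v4f mirrors a four-float result (GCC/Clang vector extension; an assumption
// here, the original sample may declare it differently).
typedef float v4f __attribute__((vector_size(16)));

// Hypothetical stand-in for the texture fetch: returns all four lanes.
inline v4f fake_tex2d(float x, float y) {
  v4f texel = {x, y, 0.0f, 1.0f};
  return texel;
}

// A scalar convenience wrapper extracts one lane explicitly instead of
// declaring the fetch itself as returning float.
inline float fake_tex2d_x(float x, float y) {
  v4f texel = fake_tex2d(x, y);
  return texel[0];
}
```

Declaring the fetch as returning `float` would make the declared signature disagree with the four-element result, which is the mismatch described above.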

kalvdans added a subscriber: kalvdans. Sep 17 2021, 1:06 AM

Herald added a reviewer: aaron.ballman. Sep 17 2021, 1:06 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, dexonsmith, jdoerfert.

Revision Contents

Path                                              Size
clang/include/clang/AST/Type.h                    5 lines
clang/include/clang/Basic/Attr.td                 8 lines
clang/include/clang/Basic/AttrDocs.td             22 lines
clang/lib/AST/Type.cpp                            14 lines
clang/lib/CodeGen/                                80 lines
                                                  20 lines
                                                  13 lines
                                                  67 lines
                                                  14 lines
                                                  26 lines
                                                  76 lines
clang/lib/Sema/SemaDeclAttr.cpp                   10 lines
clang/test/CodeGenCUDA/surface.cu                 37 lines
clang/test/CodeGenCUDA/texture.cu                 55 lines
clang/test/SemaCUDA/attr-declspec.cu              19 lines
clang/test/SemaCUDA/attributes-on-non-cuda.cu     19 lines
llvm/include/llvm/IR/Operator.h                   19 lines

Diff 251101

clang/include/clang/AST/Type.h

Show First 20 Lines • Show All 2,147 Lines • ▼ Show 20 Lines	#include "clang/Basic/OpenCLExtensionTypes.def"
bool isPipeType() const; // OpenCL pipe type		bool isPipeType() const; // OpenCL pipe type
bool isOpenCLSpecificType() const; // Any OpenCL specific type		bool isOpenCLSpecificType() const; // Any OpenCL specific type

/// Determines if this type, which must satisfy		/// Determines if this type, which must satisfy
/// isObjCLifetimeType(), is implicitly __unsafe_unretained rather		/// isObjCLifetimeType(), is implicitly __unsafe_unretained rather
/// than implicitly __strong.		/// than implicitly __strong.
bool isObjCARCImplicitlyUnretainedType() const;		bool isObjCARCImplicitlyUnretainedType() const;

		/// Check if the type is the CUDA device builtin surface type.
		bool isCUDADeviceBuiltinSurfaceType() const;
		/// Check if the type is the CUDA device builtin texture type.
		bool isCUDADeviceBuiltinTextureType() const;

/// Return the implicit lifetime for this type, which must not be dependent.		/// Return the implicit lifetime for this type, which must not be dependent.
Qualifiers::ObjCLifetime getObjCARCImplicitLifetime() const;		Qualifiers::ObjCLifetime getObjCARCImplicitLifetime() const;

enum ScalarTypeKind {		enum ScalarTypeKind {
STK_CPointer,		STK_CPointer,
STK_BlockPointer,		STK_BlockPointer,
STK_ObjCObjectPointer,		STK_ObjCObjectPointer,
STK_MemberPointer,		STK_MemberPointer,
▲ Show 20 Lines • Show All 4,979 Lines • Show Last 20 Lines

clang/include/clang/Basic/Attr.td

Show First 20 Lines • Show All 1,036 Lines • ▼ Show 20 Lines	def HIPPinnedShadow : InheritableAttr {
let Documentation = [HIPPinnedShadowDocs];		let Documentation = [HIPPinnedShadowDocs];
}		}

def CUDADeviceBuiltin : IgnoredAttr {		def CUDADeviceBuiltin : IgnoredAttr {
let Spellings = [GNU<"device_builtin">, Declspec<"__device_builtin__">];		let Spellings = [GNU<"device_builtin">, Declspec<"__device_builtin__">];
let LangOpts = [CUDA];		let LangOpts = [CUDA];
}		}

def CUDADeviceBuiltinSurfaceType : IgnoredAttr {		def CUDADeviceBuiltinSurfaceType : InheritableAttr {
let Spellings = [GNU<"device_builtin_surface_type">,		let Spellings = [GNU<"device_builtin_surface_type">,
Declspec<"__device_builtin_surface_type__">];		Declspec<"__device_builtin_surface_type__">];
let LangOpts = [CUDA];		let LangOpts = [CUDA];
		let Subjects = SubjectList<[Type]>;
		let Documentation = [CUDADeviceBuiltinSurfaceTypeDocs];
}		}

def CUDADeviceBuiltinTextureType : IgnoredAttr {		def CUDADeviceBuiltinTextureType : InheritableAttr {
let Spellings = [GNU<"device_builtin_texture_type">,		let Spellings = [GNU<"device_builtin_texture_type">,
Declspec<"__device_builtin_texture_type__">];		Declspec<"__device_builtin_texture_type__">];
let LangOpts = [CUDA];		let LangOpts = [CUDA];
		let Subjects = SubjectList<[Type]>;
		let Documentation = [CUDADeviceBuiltinTextureTypeDocs];
}		}

def CUDAGlobal : InheritableAttr {		def CUDAGlobal : InheritableAttr {
let Spellings = [GNU<"global">, Declspec<"__global__">];		let Spellings = [GNU<"global">, Declspec<"__global__">];
let Subjects = SubjectList<[Function]>;		let Subjects = SubjectList<[Function]>;
let LangOpts = [CUDA];		let LangOpts = [CUDA];
let Documentation = [Undocumented];		let Documentation = [Undocumented];
}		}
▲ Show 20 Lines • Show All 2,296 Lines • Show Last 20 Lines

clang/include/clang/Basic/AttrDocs.td

	Show First 20 Lines • Show All 4,618 Lines • ▼ Show 20 Lines
	__declspec(hip_pinned_shadow) can be added to the definition of a global variable			__declspec(hip_pinned_shadow) can be added to the definition of a global variable
	to indicate it is a HIP pinned shadow variable. A HIP pinned shadow variable can			to indicate it is a HIP pinned shadow variable. A HIP pinned shadow variable can
	be accessed on both device side and host side. It has external linkage and is			be accessed on both device side and host side. It has external linkage and is
	not initialized on device side. It has internal linkage and is initialized by			not initialized on device side. It has internal linkage and is initialized by
	the initializer on host side.			the initializer on host side.
	}];			}];
	}			}

				def CUDADeviceBuiltinSurfaceTypeDocs : Documentation {
				let Category = DocCatType;
				let Content = [{
				The ``device_builtin_surface_type`` attribute can be applied to a class
				template when declaring the surface reference. A surface reference variable
				could be accessed on the host side and, on the device side, might be translated
				into an internal surface object, which is established through surface bind and
				unbind runtime APIs.
				}];
				}

				def CUDADeviceBuiltinTextureTypeDocs : Documentation {
				let Category = DocCatType;
				let Content = [{
				The ``device_builtin_texture_type`` attribute can be applied to a class
				template when declaring the texture reference. A texture reference variable
				could be accessed on the host side and, on the device side, might be translated
				into an internal texture object, which is established through texture bind and
				unbind runtime APIs.
				}];
				}

	def LifetimeOwnerDocs : Documentation {			def LifetimeOwnerDocs : Documentation {
	let Category = DocCatDecl;			let Category = DocCatDecl;
	let Content = [{			let Content = [{
	.. Note:: This attribute is experimental and its effect on analysis is subject to change in			.. Note:: This attribute is experimental and its effect on analysis is subject to change in
	a future version of clang.			a future version of clang.

	The attribute ``[[gsl::Owner(T)]]`` applies to structs and classes that own an			The attribute ``[[gsl::Owner(T)]]`` applies to structs and classes that own an
	object of type ``T``:			object of type ``T``:
	▲ Show 20 Lines • Show All 221 Lines • Show Last 20 Lines

clang/lib/AST/Type.cpp

Show First 20 Lines • Show All 4,110 Lines • ▼ Show 20 Lines	bool Type::isCARCBridgableType() const {
const auto *Pointer = getAs<PointerType>();		const auto *Pointer = getAs<PointerType>();
if (!Pointer)		if (!Pointer)
return false;		return false;

QualType Pointee = Pointer->getPointeeType();		QualType Pointee = Pointer->getPointeeType();
return Pointee->isVoidType() || Pointee->isRecordType();		return Pointee->isVoidType() || Pointee->isRecordType();
}		}

		/// Check if the specified type is the CUDA device builtin surface type.
		bool Type::isCUDADeviceBuiltinSurfaceType() const {
		if (const auto *RT = getAs<RecordType>())
		return RT->getDecl()->hasAttr<CUDADeviceBuiltinSurfaceTypeAttr>();
		return false;
		}

		/// Check if the specified type is the CUDA device builtin texture type.
		bool Type::isCUDADeviceBuiltinTextureType() const {
		if (const auto *RT = getAs<RecordType>())
		return RT->getDecl()->hasAttr<CUDADeviceBuiltinTextureTypeAttr>();
		return false;
		}

bool Type::hasSizedVLAType() const {		bool Type::hasSizedVLAType() const {
if (!isVariablyModifiedType()) return false;		if (!isVariablyModifiedType()) return false;

if (const auto *ptr = getAs<PointerType>())		if (const auto *ptr = getAs<PointerType>())
return ptr->getPointeeType()->hasSizedVLAType();		return ptr->getPointeeType()->hasSizedVLAType();
if (const auto *ref = getAs<ReferenceType>())		if (const auto *ref = getAs<ReferenceType>())
return ref->getPointeeType()->hasSizedVLAType();		return ref->getPointeeType()->hasSizedVLAType();
if (const ArrayType *arr = getAsArrayTypeUnsafe()) {		if (const ArrayType *arr = getAsArrayTypeUnsafe()) {
▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCUDANV.cpp

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	private:
struct KernelInfo {		struct KernelInfo {
llvm::Function *Kernel;		llvm::Function *Kernel;
const Decl *D;		const Decl *D;
};		};
llvm::SmallVector<KernelInfo, 16> EmittedKernels;		llvm::SmallVector<KernelInfo, 16> EmittedKernels;
struct VarInfo {		struct VarInfo {
llvm::GlobalVariable *Var;		llvm::GlobalVariable *Var;
const VarDecl *D;		const VarDecl *D;
unsigned Flag;		DeviceVarFlags Flags;
};		};
llvm::SmallVector<VarInfo, 16> DeviceVars;		llvm::SmallVector<VarInfo, 16> DeviceVars;
/// Keeps track of variable containing handle of GPU binary. Populated by		/// Keeps track of variable containing handle of GPU binary. Populated by
/// ModuleCtorFunction() and used to create corresponding cleanup calls in		/// ModuleCtorFunction() and used to create corresponding cleanup calls in
/// ModuleDtorFunction()		/// ModuleDtorFunction()
llvm::GlobalVariable *GpuBinaryHandle = nullptr;		llvm::GlobalVariable *GpuBinaryHandle = nullptr;
/// Whether we generate relocatable device code.		/// Whether we generate relocatable device code.
bool RelocatableDeviceCode;		bool RelocatableDeviceCode;
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	private:
void emitDeviceStubBodyNew(CodeGenFunction &CGF, FunctionArgList &Args);		void emitDeviceStubBodyNew(CodeGenFunction &CGF, FunctionArgList &Args);
std::string getDeviceSideName(const NamedDecl *ND) override;		std::string getDeviceSideName(const NamedDecl *ND) override;

public:		public:
CGNVCUDARuntime(CodeGenModule &CGM);		CGNVCUDARuntime(CodeGenModule &CGM);

void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;		void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;
void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,		void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,
unsigned Flags) override {		bool Extern, bool Constant) override {
DeviceVars.push_back({&Var, VD, Flags});		DeviceVars.push_back(
		{&Var, VD, {DeviceVarFlags::Variable, Extern, Constant}});
		}
		void registerDeviceSurf(const VarDecl *VD, llvm::GlobalVariable &Var,
		bool Extern, int Type) override {
		DeviceVars.push_back({&Var,
		VD,
		{DeviceVarFlags::Surface, Extern, /*Constant=*/false,
		/*Normalized=*/false, Type}});
		}
		void registerDeviceTex(const VarDecl *VD, llvm::GlobalVariable &Var,
		bool Extern, int Type, bool Normalized) override {
		DeviceVars.push_back({&Var,
		VD,
		{DeviceVarFlags::Texture, Extern, /*Constant=*/false,
		Normalized, Type}});
}		}

/// Creates module constructor function		/// Creates module constructor function
llvm::Function *makeModuleCtorFunction() override;		llvm::Function *makeModuleCtorFunction() override;
/// Creates module destructor function		/// Creates module destructor function
llvm::Function *makeModuleDtorFunction() override;		llvm::Function *makeModuleDtorFunction() override;
};		};

▲ Show 20 Lines • Show All 289 Lines • ▼ Show 20 Lines	llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {
// void __cudaRegisterVar(void **, char *, char *, const char *,		// void __cudaRegisterVar(void **, char *, char *, const char *,
// int, int, int, int)		// int, int, int, int)
llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,		llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,
CharPtrTy, IntTy, IntTy,		CharPtrTy, IntTy, IntTy,
IntTy, IntTy};		IntTy, IntTy};
llvm::FunctionCallee RegisterVar = CGM.CreateRuntimeFunction(		llvm::FunctionCallee RegisterVar = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterVarParams, false),		llvm::FunctionType::get(IntTy, RegisterVarParams, false),
addUnderscoredPrefixToName("RegisterVar"));		addUnderscoredPrefixToName("RegisterVar"));
		// void __cudaRegisterSurface(void **, const struct surfaceReference *,
		// const void **, const char *, int, int);
		llvm::FunctionCallee RegisterSurf = CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(
		VoidTy, {VoidPtrPtrTy, VoidPtrTy, CharPtrTy, CharPtrTy, IntTy, IntTy},
		false),
		addUnderscoredPrefixToName("RegisterSurface"));
		// void __cudaRegisterTexture(void **, const struct textureReference *,
		// const void **, const char *, int, int, int)
		llvm::FunctionCallee RegisterTex = CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(
		VoidTy,
		{VoidPtrPtrTy, VoidPtrTy, CharPtrTy, CharPtrTy, IntTy, IntTy, IntTy},
		false),
		addUnderscoredPrefixToName("RegisterTexture"));
for (auto &&Info : DeviceVars) {		for (auto &&Info : DeviceVars) {
llvm::GlobalVariable *Var = Info.Var;		llvm::GlobalVariable *Var = Info.Var;
unsigned Flags = Info.Flag;
llvm::Constant *VarName = makeConstantString(getDeviceSideName(Info.D));		llvm::Constant *VarName = makeConstantString(getDeviceSideName(Info.D));
		switch (Info.Flags.Kind) {
		case DeviceVarFlags::Variable: {
uint64_t VarSize =		uint64_t VarSize =
CGM.getDataLayout().getTypeAllocSize(Var->getValueType());		CGM.getDataLayout().getTypeAllocSize(Var->getValueType());
llvm::Value *Args[] = {		llvm::Value *Args[] = {&GpuBinaryHandlePtr,
&GpuBinaryHandlePtr,
Builder.CreateBitCast(Var, VoidPtrTy),		Builder.CreateBitCast(Var, VoidPtrTy),
VarName,		VarName,
VarName,		VarName,
llvm::ConstantInt::get(IntTy, (Flags & ExternDeviceVar) ? 1 : 0),		llvm::ConstantInt::get(IntTy, Info.Flags.Extern),
llvm::ConstantInt::get(IntTy, VarSize),		llvm::ConstantInt::get(IntTy, VarSize),
llvm::ConstantInt::get(IntTy, (Flags & ConstantDeviceVar) ? 1 : 0),		llvm::ConstantInt::get(IntTy, Info.Flags.Constant),
llvm::ConstantInt::get(IntTy, 0)};		llvm::ConstantInt::get(IntTy, 0)};
Builder.CreateCall(RegisterVar, Args);		Builder.CreateCall(RegisterVar, Args);
		break;
		}
		case DeviceVarFlags::Surface:
		Builder.CreateCall(
		RegisterSurf,
		{&GpuBinaryHandlePtr, Builder.CreateBitCast(Var, VoidPtrTy), VarName,
		VarName, llvm::ConstantInt::get(IntTy, Info.Flags.SurfTexType),
		llvm::ConstantInt::get(IntTy, Info.Flags.Extern)});
		break;
		case DeviceVarFlags::Texture:
		Builder.CreateCall(
		RegisterTex,
		{&GpuBinaryHandlePtr, Builder.CreateBitCast(Var, VoidPtrTy), VarName,
		VarName, llvm::ConstantInt::get(IntTy, Info.Flags.SurfTexType),
		llvm::ConstantInt::get(IntTy, Info.Flags.Normalized),
		llvm::ConstantInt::get(IntTy, Info.Flags.Extern)});
		break;
		}
}		}

Builder.CreateRetVoid();		Builder.CreateRetVoid();
return RegisterKernelsFunc;		return RegisterKernelsFunc;
}		}

/// Creates a global constructor function for the module:		/// Creates a global constructor function for the module:
///		///
▲ Show 20 Lines • Show All 334 Lines • Show Last 20 Lines
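The registration loop above dispatches on the kind recorded for each device-side global and passes a different trailing argument list to each runtime entry point. A minimal sketch of that dispatch, assuming simplified stand-in types (`RegistrationCall` and `buildRegistration` are hypothetical names, and plain `bool` fields replace the bitfields) and showing only the integer arguments in the order the patch emits them:

```cpp
#include <string>
#include <vector>

// Simplified model of the per-variable flags recorded by the CUDA runtime.
struct DeviceVarFlags {
  enum DeviceVarKind { Variable, Surface, Texture };
  DeviceVarKind Kind;
  bool Extern;
  bool Constant;
  bool Normalized;
  int SurfTexType;
};

// Hypothetical description of one emitted registration call: the callee name
// and the trailing integer arguments that follow the handle/pointer/name
// arguments.
struct RegistrationCall {
  std::string Callee;
  std::vector<int> IntArgs;
};

inline RegistrationCall buildRegistration(const DeviceVarFlags &F,
                                          int VarSize) {
  switch (F.Kind) {
  case DeviceVarFlags::Variable:
    // extern, size, constant, trailing 0
    return {"__cudaRegisterVar", {F.Extern, VarSize, F.Constant, 0}};
  case DeviceVarFlags::Surface:
    // surface type, extern
    return {"__cudaRegisterSurface", {F.SurfTexType, F.Extern}};
  case DeviceVarFlags::Texture:
    // texture type, normalized, extern
    return {"__cudaRegisterTexture",
            {F.SurfTexType, F.Normalized, F.Extern}};
  }
  return {"", {}}; // Unreachable for valid kinds.
}
```

Keeping the kind in a single tagged record lets one list of device variables drive all three registration paths, which is why `VarInfo` stores `DeviceVarFlags` rather than a raw bitmask.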

clang/lib/CodeGen/CGCUDARuntime.h

	Show All 36 Lines
	class RValue;			class RValue;

	class CGCUDARuntime {			class CGCUDARuntime {
	protected:			protected:
	CodeGenModule &CGM;			CodeGenModule &CGM;

	public:			public:
	// Global variable properties that must be passed to CUDA runtime.			// Global variable properties that must be passed to CUDA runtime.
	enum DeviceVarFlags {			struct DeviceVarFlags {
	ExternDeviceVar = 0x01, // extern			enum DeviceVarKind {
	ConstantDeviceVar = 0x02, // __constant__			Variable, // Variable
				Surface, // Builtin surface
				Texture, // Builtin texture
				};
				unsigned Kind : 2;
				tra: This should be `DeviceVarKind`
				unsigned Extern : 1;
				unsigned Constant : 2; // Constant variable.
				tra: Why does it need 2 bits? In general, I think there's no point squeezing things into bitfields here as this struct is not going to be used all that often. I'd just use enum and bool.
				unsigned Normalized : 1; // Normalized texture.
				int SurfTexType; // Type of surface/texutre.
	};			};

	CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}			CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
	virtual ~CGCUDARuntime();			virtual ~CGCUDARuntime();

	virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,			virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
	const CUDAKernelCallExpr *E,			const CUDAKernelCallExpr *E,
	ReturnValueSlot ReturnValue);			ReturnValueSlot ReturnValue);

	/// Emits a kernel launch stub.			/// Emits a kernel launch stub.
	virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;			virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;
	virtual void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,			virtual void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,
	unsigned Flags) = 0;			bool Extern, bool Constant) = 0;
				virtual void registerDeviceSurf(const VarDecl *VD, llvm::GlobalVariable &Var,
				bool Extern, int Type) = 0;
				virtual void registerDeviceTex(const VarDecl *VD, llvm::GlobalVariable &Var,
				bool Extern, int Type, bool Normalized) = 0;

	/// Constructs and returns a module initialization function or nullptr if it's			/// Constructs and returns a module initialization function or nullptr if it's
	/// not needed. Must be called after all kernels have been emitted.			/// not needed. Must be called after all kernels have been emitted.
	virtual llvm::Function *makeModuleCtorFunction() = 0;			virtual llvm::Function *makeModuleCtorFunction() = 0;

	/// Returns a module cleanup function or nullptr if it's not needed.			/// Returns a module cleanup function or nullptr if it's not needed.
	/// Must be called after ModuleCtorFunction			/// Must be called after ModuleCtorFunction
	virtual llvm::Function *makeModuleDtorFunction() = 0;			virtual llvm::Function *makeModuleDtorFunction() = 0;
	Show All 13 Lines

clang/lib/CodeGen/CGExprAgg.cpp

Show All 9 Lines
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGCXXABI.h"		#include "CGCXXABI.h"
#include "CGObjCRuntime.h"		#include "CGObjCRuntime.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "CodeGenModule.h"		#include "CodeGenModule.h"
#include "ConstantEmitter.h"		#include "ConstantEmitter.h"
		#include "TargetInfo.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclTemplate.h"		#include "clang/AST/DeclTemplate.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
▲ Show 20 Lines • Show All 1,906 Lines • ▼ Show 20 Lines	if (const RecordType *RT = Ty->getAs<RecordType>()) {
"Trying to aggregate-copy a type without a trivial copy/move "		"Trying to aggregate-copy a type without a trivial copy/move "
"constructor or assignment operator");		"constructor or assignment operator");
// Ignore empty classes in C++.		// Ignore empty classes in C++.
if (Record->isEmpty())		if (Record->isEmpty())
return;		return;
}		}
}		}

		if (getLangOpts().CUDAIsDevice) {
		if (Ty->isCUDADeviceBuiltinSurfaceType()) {
		if (getTargetHooks().emitCUDADeviceBuiltinSurfaceDeviceCopy(*this, Dest,
		Src))
		return;
		} else if (Ty->isCUDADeviceBuiltinTextureType()) {
		if (getTargetHooks().emitCUDADeviceBuiltinTextureDeviceCopy(*this, Dest,
		Src))
		return;
		}
		}

// Aggregate assignment turns into llvm.memcpy. This is almost valid per		// Aggregate assignment turns into llvm.memcpy. This is almost valid per
// C99 6.5.16.1p3, which states "If the value being stored in an object is		// C99 6.5.16.1p3, which states "If the value being stored in an object is
// read from another object that overlaps in anyway the storage of the first		// read from another object that overlaps in anyway the storage of the first
// object, then the overlap shall be exact and the two objects shall have		// object, then the overlap shall be exact and the two objects shall have
// qualified or unqualified versions of a compatible type."		// qualified or unqualified versions of a compatible type."
//		//
// memcpy is not defined if the source and destination pointers are exactly		// memcpy is not defined if the source and destination pointers are exactly
// equal, but other compilers do this optimization, and almost every memcpy		// equal, but other compilers do this optimization, and almost every memcpy
▲ Show 20 Lines • Show All 81 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 692 Lines • ▼ Show 20 Lines	llvm::MDNode *CodeGenModule::getTBAATypeInfo(QualType QTy) {
if (!TBAA)		if (!TBAA)
return nullptr;		return nullptr;
return TBAA->getTypeInfo(QTy);		return TBAA->getTypeInfo(QTy);
}		}

TBAAAccessInfo CodeGenModule::getTBAAAccessInfo(QualType AccessType) {		TBAAAccessInfo CodeGenModule::getTBAAAccessInfo(QualType AccessType) {
if (!TBAA)		if (!TBAA)
return TBAAAccessInfo();		return TBAAAccessInfo();
		if (getLangOpts().CUDAIsDevice) {
		// As CUDA builtin surface/texture types are replaced, skip generating TBAA
		// access info.
		if (AccessType->isCUDADeviceBuiltinSurfaceType() &&
		getTargetCodeGenInfo().getCUDADeviceBuiltinSurfaceDeviceType() !=
		nullptr)
		return TBAAAccessInfo();
		if (AccessType->isCUDADeviceBuiltinTextureType() &&
		getTargetCodeGenInfo().getCUDADeviceBuiltinTextureDeviceType() !=
		nullptr)
		return TBAAAccessInfo();
		}
return TBAA->getAccessInfo(AccessType);		return TBAA->getAccessInfo(AccessType);
		tra: Would `isCUDADeviceBuiltinTextureType()` be a sufficient criterion for skipping TBAA regeneration? Or does it need to be 'it is the texture type and it will be replaced with something else'? What if 'something else' is the same type?
		hliao: The replacement only happens in the device compilation. On the host side, the original type is still used.
		tra: But you've already checked CUDAIsDevice so you already know that you want to replace the type. `if (getTargetCodeGenInfo().getCUDADeviceBuiltinTextureDeviceType() != nullptr)` appears to be redundant and can probably be dropped.
		hliao: That check is target-specific; a target may choose a very different implementation for handling these builtin surface/texture types. If it does not want to change those types on the device side and instead uses a very different `textureReference`, its `getCUDADeviceBuiltinTextureDeviceType()` may return `nullptr` to keep using the same reference type in both host- and device-side compilation.
}		}

TBAAAccessInfo		TBAAAccessInfo
CodeGenModule::getTBAAVTablePtrAccessInfo(llvm::Type *VTablePtrType) {		CodeGenModule::getTBAAVTablePtrAccessInfo(llvm::Type *VTablePtrType) {
if (!TBAA)		if (!TBAA)
return TBAAAccessInfo();		return TBAAAccessInfo();
return TBAA->getVTablePtrAccessInfo(VTablePtrType);		return TBAA->getVTablePtrAccessInfo(VTablePtrType);
}		}
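The inline discussion above turns on a target hook that may return `nullptr` to mean "keep the original reference type". A minimal sketch of that pattern, assuming hypothetical stand-ins (`IRType`, `TargetHooks`, `HandleTarget`, and `skipTBAA` are illustrative names, not clang's classes, and the `"i64"` handle is only an example value):

```cpp
// Hypothetical stand-in for an IR type; only its identity matters here.
struct IRType {
  const char *Name;
};

// Base hook: by default a target does not replace the reference type.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual const IRType *getTextureDeviceType() const { return nullptr; }
};

// A target that swaps the texture reference for a handle-like type.
struct HandleTarget : TargetHooks {
  const IRType *getTextureDeviceType() const override {
    static const IRType Handle{"i64"};
    return &Handle;
  }
};

// TBAA info is skipped only when all three conditions hold: device-side
// compilation, a builtin texture type, and a target that replaces it.
inline bool skipTBAA(bool IsDevice, bool IsTextureType,
                     const TargetHooks &Target) {
  return IsDevice && IsTextureType &&
         Target.getTextureDeviceType() != nullptr;
}
```

Under this reading the `!= nullptr` test is not redundant with the `CUDAIsDevice` check: a device-side target that keeps the original type still gets ordinary TBAA access info.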
▲ Show 20 Lines • Show All 1,777 Lines • ▼ Show 20 Lines	void CodeGenModule::EmitGlobal(GlobalDecl GD) {

// If this is CUDA, be selective about which declarations we emit.		// If this is CUDA, be selective about which declarations we emit.
if (LangOpts.CUDA) {		if (LangOpts.CUDA) {
if (LangOpts.CUDAIsDevice) {		if (LangOpts.CUDAIsDevice) {
if (!Global->hasAttr<CUDADeviceAttr>() &&		if (!Global->hasAttr<CUDADeviceAttr>() &&
!Global->hasAttr<CUDAGlobalAttr>() &&		!Global->hasAttr<CUDAGlobalAttr>() &&
!Global->hasAttr<CUDAConstantAttr>() &&		!Global->hasAttr<CUDAConstantAttr>() &&
!Global->hasAttr<CUDASharedAttr>() &&		!Global->hasAttr<CUDASharedAttr>() &&
!(LangOpts.HIP && Global->hasAttr<HIPPinnedShadowAttr>()))		!(LangOpts.HIP && Global->hasAttr<HIPPinnedShadowAttr>()) &&
		!Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
		!Global->getType()->isCUDADeviceBuiltinTextureType())
return;		return;
} else {		} else {
// We need to emit host-side 'shadows' for all global		// We need to emit host-side 'shadows' for all global
// device-side variables because the CUDA runtime needs their		// device-side variables because the CUDA runtime needs their
// size and host-side address in order to provide access to		// size and host-side address in order to provide access to
// their device-side incarnations.		// their device-side incarnations.

// So device-only functions are the only things we skip.		// So device-only functions are the only things we skip.
▲ Show 20 Lines • Show All 1,552 Lines • ▼ Show 20 Lines	if (GV && LangOpts.CUDA) {
} else {		} else {
// Host-side shadows of external declarations of device-side		// Host-side shadows of external declarations of device-side
// global variables become internal definitions. These have to		// global variables become internal definitions. These have to
// be internal in order to prevent name conflicts with global		// be internal in order to prevent name conflicts with global
// host variables with the same name in a different TUs.		// host variables with the same name in a different TUs.
if (D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>() ||		if (D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>() ||
D->hasAttr<HIPPinnedShadowAttr>()) {		D->hasAttr<HIPPinnedShadowAttr>()) {
Linkage = llvm::GlobalValue::InternalLinkage;		Linkage = llvm::GlobalValue::InternalLinkage;
		// Shadow variables and their properties must be registered with CUDA
// Shadow variables and their properties must be registered		// runtime. Skip Extern global variables, which will be registered in
// with CUDA runtime.		// the TU where they are defined.
unsigned Flags = 0;
if (!D->hasDefinition())
Flags |= CGCUDARuntime::ExternDeviceVar;
if (D->hasAttr<CUDAConstantAttr>())
Flags |= CGCUDARuntime::ConstantDeviceVar;
// Extern global variables will be registered in the TU where they are
// defined.
if (!D->hasExternalStorage())		if (!D->hasExternalStorage())
getCUDARuntime().registerDeviceVar(D, *GV, Flags);		getCUDARuntime().registerDeviceVar(D, *GV, !D->hasDefinition(),
} else if (D->hasAttr<CUDASharedAttr>())		D->hasAttr<CUDAConstantAttr>());
		} else if (D->hasAttr<CUDASharedAttr>()) {
// __shared__ variables are odd. Shadows do get created, but		// __shared__ variables are odd. Shadows do get created, but
// they are not registered with the CUDA runtime, so they		// they are not registered with the CUDA runtime, so they
// can't really be used to access their device-side		// can't really be used to access their device-side
// counterparts. It's not clear yet whether it's nvcc's bug or		// counterparts. It's not clear yet whether it's nvcc's bug or
// a feature, but we've got to do the same for compatibility.		// a feature, but we've got to do the same for compatibility.
Linkage = llvm::GlobalValue::InternalLinkage;		Linkage = llvm::GlobalValue::InternalLinkage;
		} else if (D->getType()->isCUDADeviceBuiltinSurfaceType() ||
		D->getType()->isCUDADeviceBuiltinTextureType()) {
		const RecordDecl *RD = D->getType()->getAs<RecordType>()->getDecl();
		// Builtin surfaces and textures and their template arguments are
		// also registered with CUDA runtime.
		if (const ClassTemplateSpecializationDecl *TD =
		dyn_cast<ClassTemplateSpecializationDecl>(RD)) {
		Linkage = llvm::GlobalValue::InternalLinkage;
		const TemplateArgumentList &Args = TD->getTemplateInstantiationArgs();
		if (RD->hasAttr<CUDADeviceBuiltinSurfaceTypeAttr>()) {
		assert(Args.size() == 2 &&
		"Unexpcted number of template arguments of CUDA device "
		tra: Nit: 'Unexpected'
		"builtin surface type.");
		auto Type = Args[1].getAsIntegral();
		if (!D->hasExternalStorage())
		getCUDARuntime().registerDeviceSurf(D, *GV, !D->hasDefinition(),
		Type.getSExtValue());
		} else {
		assert(Args.size() == 3 &&
		"Unexpected number of template arguments of CUDA device "
		"builtin texture type.");
		auto Type = Args[1].getAsIntegral();
		auto Normalized = Args[2].getAsIntegral();
		assert(Normalized >= 0 && Normalized <= 1 &&
		"Unexpected normalized argument of CUDA device builtin "
		"texture type.");
		if (!D->hasExternalStorage())
		getCUDARuntime().registerDeviceTex(D, *GV, !D->hasDefinition(),
		Type.getSExtValue(),
		Normalized.getZExtValue());
		}
		}
		tra: This is the part I'm not comfortable with. It's possible for the user to use the attribute on other types that do not match the expectations encoded here. We should not be failing with an assert here because that's user error, not a compiler bug. Expectations we have for the types should be enforced by Sema and compiler should produce proper diagnostics.
		hliao: `device_builtin_surface_type` and `device_builtin_texture_type` should only be used internally. Regular users of either CUDA or HIP must not use them as they need special internal handling and coordination beyond the compiler itself.
		tra: I agree that it's probably not something that should be used by users. Still, such use should be reported as an error and should not crash the compiler. Asserts are for clang/llvm developers to catch the bugs in the compiler itself, not for the end users misusing something they should not.
		hliao: addressed in the latest revision
		}
}		}
}		}

// HIPPinnedShadowVar should remain in the final code object irrespective of		// HIPPinnedShadowVar should remain in the final code object irrespective of
// whether it is used or not within the code. Add it to used list, so that		// whether it is used or not within the code. Add it to used list, so that
// it will not get eliminated when it is unused. Also, it is an extern var		// it will not get eliminated when it is unused. Also, it is an extern var
// within device code, and it should not get initialized within device code.		// within device code, and it should not get initialized within device code.
if (IsHIPPinnedShadowVar)		if (IsHIPPinnedShadowVar)
▲ Show 20 Lines • Show All 1,876 Lines • Show Last 20 Lines
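The reviewer's point above is that a malformed use of the attribute is a user error and should produce a diagnostic rather than trip an assert in CodeGen. A minimal sketch of that kind of up-front validation, assuming a simplified model of the template arguments (`TemplateArg`, `TextureParams`, `checkTextureArgs`, and the diagnostic wording are hypothetical; they are not the checks or messages the patch adds to Sema):

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified template argument: integral or not.
struct TemplateArg {
  bool IsIntegral;
  long Value;
};

struct TextureParams {
  long Type = 0;
  bool Normalized = false;
};

// Returns an empty string and fills Out when the argument list has the shape
// expected of a texture reference type (three arguments, the second and third
// integral, the third either 0 or 1). Otherwise returns a diagnostic message
// and leaves Out untouched.
inline std::string checkTextureArgs(const std::vector<TemplateArg> &Args,
                                    TextureParams &Out) {
  if (Args.size() != 3)
    return "texture type must have exactly three template arguments";
  if (!Args[1].IsIntegral)
    return "second template argument must be an integer";
  if (!Args[2].IsIntegral || (Args[2].Value != 0 && Args[2].Value != 1))
    return "third template argument must be 0 or 1";
  Out.Type = Args[1].Value;
  Out.Normalized = Args[2].Value == 1;
  return "";
}
```

With checks of this shape performed before code generation, the later code can rely on the argument layout without user input ever being able to reach an assertion.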

clang/lib/CodeGen/CodeGenTypes.cpp

	Show First 20 Lines • Show All 377 Lines • ▼ Show 20 Lines
	}			}

	/// ConvertType - Convert the specified type to its LLVM form.			/// ConvertType - Convert the specified type to its LLVM form.
	llvm::Type *CodeGenTypes::ConvertType(QualType T) {			llvm::Type *CodeGenTypes::ConvertType(QualType T) {
	T = Context.getCanonicalType(T);			T = Context.getCanonicalType(T);

	const Type *Ty = T.getTypePtr();			const Type *Ty = T.getTypePtr();

				// For the device-side compilation, CUDA device builtin surface/texture types
				// may be represented in different types.
				if (Context.getLangOpts().CUDAIsDevice) {
				if (T->isCUDADeviceBuiltinSurfaceType()) {
				if (auto Ty = CGM.getTargetCodeGenInfo()
				.getCUDADeviceBuiltinSurfaceDeviceType())
				return Ty;
				} else if (T->isCUDADeviceBuiltinTextureType()) {
				if (auto Ty = CGM.getTargetCodeGenInfo()
				.getCUDADeviceBuiltinTextureDeviceType())
				return Ty;
				}
				}

	// RecordTypes are cached and processed specially.			// RecordTypes are cached and processed specially.
	if (const RecordType *RT = dyn_cast<RecordType>(Ty))			if (const RecordType *RT = dyn_cast<RecordType>(Ty))
	return ConvertRecordDeclType(RT->getDecl());			return ConvertRecordDeclType(RT->getDecl());

	// See if type is already cached.			// See if type is already cached.
	llvm::DenseMap<const Type *, llvm::Type *>::iterator TCI = TypeCache.find(Ty);			llvm::DenseMap<const Type *, llvm::Type *>::iterator TCI = TypeCache.find(Ty);
	// If type is found in map then use it. Otherwise, convert type T.			// If type is found in map then use it. Otherwise, convert type T.
	if (TCI != TypeCache.end())			if (TCI != TypeCache.end())
	(455 lines not shown)

clang/lib/CodeGen/TargetInfo.h

(309 lines not shown)
createEnqueuedBlockKernel(CodeGenFunction &CGF,
llvm::Value *BlockLiteral) const;		llvm::Value *BlockLiteral) const;

/// \return true if the target supports alias from the unmangled name to the		/// \return true if the target supports alias from the unmangled name to the
/// mangled name of functions declared within an extern "C" region and marked		/// mangled name of functions declared within an extern "C" region and marked
/// as 'used', and having internal linkage.		/// as 'used', and having internal linkage.
virtual bool shouldEmitStaticExternCAliases() const { return true; }		virtual bool shouldEmitStaticExternCAliases() const { return true; }

virtual void setCUDAKernelCallingConvention(const FunctionType *&FT) const {}		virtual void setCUDAKernelCallingConvention(const FunctionType *&FT) const {}

		/// Return the device-side type for the CUDA device builtin surface type.
		virtual llvm::Type *getCUDADeviceBuiltinSurfaceDeviceType() const {
		// By default, no change from the original one.
		return nullptr;
		}
		/// Return the device-side type for the CUDA device builtin texture type.
		virtual llvm::Type *getCUDADeviceBuiltinTextureDeviceType() const {
		// By default, no change from the original one.
		return nullptr;
		}

		/// Emit the device-side copy of the builtin surface type.
		virtual bool emitCUDADeviceBuiltinSurfaceDeviceCopy(CodeGenFunction &CGF,
		LValue Dst,
		LValue Src) const {
		// DO NOTHING by default.
		return false;
		}
		/// Emit the device-side copy of the builtin texture type.
		virtual bool emitCUDADeviceBuiltinTextureDeviceCopy(CodeGenFunction &CGF,
		LValue Dst,
		LValue Src) const {
		// DO NOTHING by default.
		return false;
		}
};		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H		#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H
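
The hooks added to `TargetCodeGenInfo` return `nullptr` by default, and `ConvertType` only substitutes a device-side type when a target returns something else. The standalone sketch below models that contract with toy classes. `ToyType`, `ToyTargetHooks`, `ToyNVPTXHooks`, and `convertSurfaceType` are hypothetical names for illustration and are not part of clang or LLVM.

```cpp
#include <string>

// Toy stand-ins for illustration; these are NOT the clang/LLVM classes.
struct ToyType {
  std::string Name;
};

struct ToyTargetHooks {
  virtual ~ToyTargetHooks() = default;
  // nullptr means "no change from the original type", mirroring the default
  // behaviour of the hooks in the patch.
  virtual const ToyType *surfaceDeviceType() const { return nullptr; }
};

struct ToyNVPTXHooks : ToyTargetHooks {
  // An NVPTX-like target replaces the surface type with a 64-bit handle.
  const ToyType *surfaceDeviceType() const override {
    static const ToyType I64{"i64"};
    return &I64;
  }
};

// Use the target's replacement type if it provides one, else keep the original.
const ToyType *convertSurfaceType(const ToyTargetHooks &Hooks,
                                  const ToyType *Original) {
  if (const ToyType *Replaced = Hooks.surfaceDeviceType())
    return Replaced;
  return Original;
}
```

The null default keeps targets that do not know about surface or texture references on the ordinary record-type path without any extra code.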

clang/lib/CodeGen/TargetInfo.cpp

(22 lines not shown)
#include "clang/CodeGen/CGFunctionInfo.h"		#include "clang/CodeGen/CGFunctionInfo.h"
#include "clang/CodeGen/SwiftCallingConv.h"		#include "clang/CodeGen/SwiftCallingConv.h"
#include "llvm/ADT/SmallBitVector.h"		#include "llvm/ADT/SmallBitVector.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
		#include "llvm/IR/IntrinsicsNVPTX.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm> // std::sort		#include <algorithm> // std::sort

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

// Helper for coercing an aggregate argument or return value into an integer		// Helper for coercing an aggregate argument or return value into an integer
(6,391 lines not shown)
public:		public:
NVPTXTargetCodeGenInfo(CodeGenTypes &CGT)		NVPTXTargetCodeGenInfo(CodeGenTypes &CGT)
: TargetCodeGenInfo(new NVPTXABIInfo(CGT)) {}		: TargetCodeGenInfo(new NVPTXABIInfo(CGT)) {}

void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,		void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
CodeGen::CodeGenModule &M) const override;		CodeGen::CodeGenModule &M) const override;
bool shouldEmitStaticExternCAliases() const override;		bool shouldEmitStaticExternCAliases() const override;

		llvm::Type *getCUDADeviceBuiltinSurfaceDeviceType() const override {
		return llvm::Type::getInt64Ty(getABIInfo().getVMContext());
		}

		llvm::Type *getCUDADeviceBuiltinTextureDeviceType() const override {
		return llvm::Type::getInt64Ty(getABIInfo().getVMContext());
		}

		bool emitCUDADeviceBuiltinSurfaceDeviceCopy(CodeGenFunction &CGF, LValue Dst,
		LValue Src) const override {
		emitBuiltinSurfTexDeviceCopy(CGF, Dst, Src);
		return true;
		}

		bool emitCUDADeviceBuiltinTextureDeviceCopy(CodeGenFunction &CGF, LValue Dst,
		LValue Src) const override {
		emitBuiltinSurfTexDeviceCopy(CGF, Dst, Src);
		return true;
		}

private:		private:
// Adds a NamedMDNode with F, Name, and Operand as operands, and adds the		// Adds a NamedMDNode with GV, Name, and Operand as operands, and adds the
// resulting MDNode to the nvvm.annotations MDNode.		// resulting MDNode to the nvvm.annotations MDNode.
static void addNVVMMetadata(llvm::Function *F, StringRef Name, int Operand);		static void addNVVMMetadata(llvm::GlobalValue *GV, StringRef Name,
		int Operand);

		static void emitBuiltinSurfTexDeviceCopy(CodeGenFunction &CGF, LValue Dst,
		LValue Src) {
		llvm::Value *Handle = nullptr;
		llvm::Constant *C =
		llvm::dyn_cast<llvm::Constant>(Src.getAddress(CGF).getPointer());
		// Lookup `addrspacecast` through the constant pointer if any.
		if (auto ASC = llvm::dyn_cast_or_null<llvm::AddrSpaceCastOperator>(C))
		C = llvm::cast<llvm::Constant>(ASC->getPointerOperand());
		tra (Not Done): What's the expectation here? Do we care which address spaces we're casting to/from?
		hliao (Author, Done): We need to check whether we copy from that global variable directly. As all pointers are generic ones, the code here is to look through the `addrspacecast` constant expression for the original global variable.
		tra (Not Done): I'm still not sure what exactly you want to do here. If the assumption is that all `addrspacecast` ops you may see are from global to generic AS, this assumption is not always valid. I can annotate any pointer with an arbitrary address space which may then be cast to generic. Or something else. If you accept Src as is, without special-casing addrspacecast, what's going to happen? AFAICT `nvvm_texsurf_handle_internal` does not really care about specific AS.
		hliao (Author, Done): the backend needs a GlobalVariable as the argument for that intrinsic. The lookup through `addrspacecast` to check a global variable, which is created in the global address space and casted into a generic pointer.
		if (auto GV = llvm::dyn_cast_or_null<llvm::GlobalVariable>(C)) {
		// Load the handle from the specific global variable using
		// `nvvm.texsurf.handle.internal` intrinsic.
		Handle = CGF.EmitRuntimeCall(
		CGF.CGM.getIntrinsic(llvm::Intrinsic::nvvm_texsurf_handle_internal,
		{GV->getType()}),
		{GV}, "texsurf_handle");
		} else
		Handle = CGF.EmitLoadOfScalar(Src, SourceLocation());
		CGF.EmitStoreOfScalar(Handle, Dst);
		}
};		};

/// Checks if the type is unsupported directly by the current target.		/// Checks if the type is unsupported directly by the current target.
static bool isUnsupportedType(ASTContext &Context, QualType T) {		static bool isUnsupportedType(ASTContext &Context, QualType T) {
if (!Context.getTargetInfo().hasFloat16Type() && T->isFloat16Type())		if (!Context.getTargetInfo().hasFloat16Type() && T->isFloat16Type())
return true;		return true;
if (!Context.getTargetInfo().hasFloat128Type() &&		if (!Context.getTargetInfo().hasFloat128Type() &&
(T->isFloat128Type() ||		(T->isFloat128Type() ||
(56 lines not shown)
}		}

ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) const {		ABIArgInfo NVPTXABIInfo::classifyArgumentType(QualType Ty) const {
// Treat an enum type as its underlying type.		// Treat an enum type as its underlying type.
if (const EnumType *EnumTy = Ty->getAs<EnumType>())		if (const EnumType *EnumTy = Ty->getAs<EnumType>())
Ty = EnumTy->getDecl()->getIntegerType();		Ty = EnumTy->getDecl()->getIntegerType();

// Return aggregates type as indirect by value		// Return aggregates type as indirect by value
if (isAggregateTypeForABI(Ty))		if (isAggregateTypeForABI(Ty)) {
		// Under CUDA device compilation, tex/surf builtin types are replaced with
		// object types and passed directly.
		if (getContext().getLangOpts().CUDAIsDevice) {
		if (Ty->isCUDADeviceBuiltinSurfaceType())
		return ABIArgInfo::getDirect(llvm::Type::getInt64Ty(getVMContext()));
		tra (Not Done): This part could use some additional comments. Why do we return an int64? Is that the size of the handle object? Is it guaranteed to always be a 64-bit int, or does it depend on the particular PTX version?
		if (Ty->isCUDADeviceBuiltinTextureType())
		return ABIArgInfo::getDirect(llvm::Type::getInt64Ty(getVMContext()));
		}
return getNaturalAlignIndirect(Ty, /* byval */ true);		return getNaturalAlignIndirect(Ty, /* byval */ true);
		}

return (Ty->isPromotableIntegerType() ? ABIArgInfo::getExtend(Ty)		return (Ty->isPromotableIntegerType() ? ABIArgInfo::getExtend(Ty)
: ABIArgInfo::getDirect());		: ABIArgInfo::getDirect());
}		}

void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {		void NVPTXABIInfo::computeInfo(CGFunctionInfo &FI) const {
if (!getCXXABI().classifyReturnType(FI))		if (!getCXXABI().classifyReturnType(FI))
FI.getReturnInfo() = classifyReturnType(FI.getReturnType());		FI.getReturnInfo() = classifyReturnType(FI.getReturnType());
(11 lines not shown)
Address NVPTXABIInfo::EmitVAArg(CodeGenFunction &CGF,
QualType Ty) const {		QualType Ty) const {
llvm_unreachable("NVPTX does not support varargs");		llvm_unreachable("NVPTX does not support varargs");
}		}

void NVPTXTargetCodeGenInfo::setTargetAttributes(		void NVPTXTargetCodeGenInfo::setTargetAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {		const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
if (GV->isDeclaration())		if (GV->isDeclaration())
return;		return;
		const VarDecl *VD = dyn_cast_or_null<VarDecl>(D);
		if (VD) {
		if (M.getLangOpts().CUDA) {
		if (VD->getType()->isCUDADeviceBuiltinSurfaceType())
		addNVVMMetadata(GV, "surface", 1);
		else if (VD->getType()->isCUDADeviceBuiltinTextureType())
		addNVVMMetadata(GV, "texture", 1);
		return;
		}
		}

const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);		const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
if (!FD) return;		if (!FD) return;

llvm::Function *F = cast<llvm::Function>(GV);		llvm::Function *F = cast<llvm::Function>(GV);

// Perform special handling in OpenCL mode		// Perform special handling in OpenCL mode
if (M.getLangOpts().OpenCL) {		if (M.getLangOpts().OpenCL) {
// Use OpenCL function attributes to check for kernel functions		// Use OpenCL function attributes to check for kernel functions
(32 lines not shown)
if (CUDALaunchBoundsAttr *Attr = FD->getAttr<CUDALaunchBoundsAttr>()) {
if (MinBlocks > 0)		if (MinBlocks > 0)
// Create !{<func-ref>, metadata !"minctasm", i32 <val>} node		// Create !{<func-ref>, metadata !"minctasm", i32 <val>} node
addNVVMMetadata(F, "minctasm", MinBlocks.getExtValue());		addNVVMMetadata(F, "minctasm", MinBlocks.getExtValue());
}		}
}		}
}		}
}		}

void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::Function *F, StringRef Name,		void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV,
int Operand) {		StringRef Name, int Operand) {
llvm::Module *M = F->getParent();		llvm::Module *M = GV->getParent();
llvm::LLVMContext &Ctx = M->getContext();		llvm::LLVMContext &Ctx = M->getContext();

// Get "nvvm.annotations" metadata node		// Get "nvvm.annotations" metadata node
llvm::NamedMDNode *MD = M->getOrInsertNamedMetadata("nvvm.annotations");		llvm::NamedMDNode *MD = M->getOrInsertNamedMetadata("nvvm.annotations");

llvm::Metadata *MDVals[] = {		llvm::Metadata *MDVals[] = {
llvm::ConstantAsMetadata::get(F), llvm::MDString::get(Ctx, Name),		llvm::ConstantAsMetadata::get(GV), llvm::MDString::get(Ctx, Name),
llvm::ConstantAsMetadata::get(		llvm::ConstantAsMetadata::get(
llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), Operand))};		llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), Operand))};
// Append metadata to nvvm.annotations		// Append metadata to nvvm.annotations
MD->addOperand(llvm::MDNode::get(Ctx, MDVals));		MD->addOperand(llvm::MDNode::get(Ctx, MDVals));
}		}

bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {		bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
return false;		return false;
(3,525 lines not shown)
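
The inline discussion on `emitBuiltinSurfTexDeviceCopy` comes down to one decision: if the copy source is a global variable, possibly seen through a single `addrspacecast`, use the handle intrinsic, otherwise fall back to a plain load. The standalone sketch below models only that decision with a toy constant graph. `ToyConstant`, `ToyKind`, `findGlobalBehindCast`, and `chooseCopyStrategy` are hypothetical names and do not correspond to LLVM classes or functions.

```cpp
#include <string>

// Toy model of a constant expression; not the LLVM IR classes.
enum class ToyKind { GlobalVariable, AddrSpaceCast, Other };

struct ToyConstant {
  ToyKind Kind;
  const ToyConstant *Operand; // only meaningful for AddrSpaceCast
  std::string Name;
};

// Peel at most one address-space cast, as the patch does, and report the
// global variable underneath, or nullptr if there is none.
const ToyConstant *findGlobalBehindCast(const ToyConstant *C) {
  if (C && C->Kind == ToyKind::AddrSpaceCast)
    C = C->Operand;
  if (C && C->Kind == ToyKind::GlobalVariable)
    return C;
  return nullptr;
}

enum class CopyStrategy { HandleIntrinsic, PlainLoad };

// A global source gets its handle through the intrinsic; anything else is
// loaded as an ordinary scalar.
CopyStrategy chooseCopyStrategy(const ToyConstant *Src) {
  return findGlobalBehindCast(Src) ? CopyStrategy::HandleIntrinsic
                                   : CopyStrategy::PlainLoad;
}
```

The sketch does not check which address spaces the cast goes between, which is the point tra raises in the thread.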

clang/lib/Sema/SemaDeclAttr.cpp

(6,920 lines not shown)
case ParsedAttr::AT_CUDADevice:
break;		break;
case ParsedAttr::AT_CUDAHost:		case ParsedAttr::AT_CUDAHost:
handleSimpleAttributeWithExclusions<CUDAHostAttr, CUDAGlobalAttr>(S, D, AL);		handleSimpleAttributeWithExclusions<CUDAHostAttr, CUDAGlobalAttr>(S, D, AL);
break;		break;
case ParsedAttr::AT_HIPPinnedShadow:		case ParsedAttr::AT_HIPPinnedShadow:
handleSimpleAttributeWithExclusions<HIPPinnedShadowAttr, CUDADeviceAttr,		handleSimpleAttributeWithExclusions<HIPPinnedShadowAttr, CUDADeviceAttr,
CUDAConstantAttr>(S, D, AL);		CUDAConstantAttr>(S, D, AL);
break;		break;
		case ParsedAttr::AT_CUDADeviceBuiltinSurfaceType:
		handleSimpleAttributeWithExclusions<CUDADeviceBuiltinSurfaceTypeAttr,
		CUDADeviceBuiltinTextureTypeAttr>(S, D,
		AL);
		tra (Not Done): Nit: Formatting is a bit odd here. Why is AL on a separate line?
		hliao (Author, Done): it's formatted by `clang-format`, which is run in pre-merge checks
		tra (Not Done): Sorry. It was an artifact of messed up fonts in my browser. Apparently I've ended up using proportional font. <rant> Why, oh why almost all fonts listed as 'fixed-width' on the chromebook are actually not ?! Even the ones that are fixed-width are prone to use ligatures and mess formatting. 'ffff' is still longer than 'fifi' for me.</rant> This code looks much better with fixed-width font.
		break;
		case ParsedAttr::AT_CUDADeviceBuiltinTextureType:
		handleSimpleAttributeWithExclusions<CUDADeviceBuiltinTextureTypeAttr,
		CUDADeviceBuiltinSurfaceTypeAttr>(S, D,
		AL);
		break;
case ParsedAttr::AT_GNUInline:		case ParsedAttr::AT_GNUInline:
handleGNUInlineAttr(S, D, AL);		handleGNUInlineAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_CUDALaunchBounds:		case ParsedAttr::AT_CUDALaunchBounds:
handleLaunchBoundsAttr(S, D, AL);		handleLaunchBoundsAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_Restrict:		case ParsedAttr::AT_Restrict:
handleRestrictAttr(S, D, AL);		handleRestrictAttr(S, D, AL);
(949 lines not shown)

clang/test/CodeGenCUDA/surface.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: nvptx-registered-target

				// RUN: %clang_cc1 -std=c++11 -fcuda-is-device -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck --check-prefix=DEVICE %s
				// RUN: echo "GPU binary would be here" > %t
				// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -target-sdk-version=8.0 -fcuda-include-gpubinary %t -emit-llvm -o - %s | FileCheck --check-prefix=HOST %s

				struct surfaceReference {
				int desc;
				};

				template<typename T, int type = 1>
				struct __attribute__((device_builtin_surface_type)) surface : public surfaceReference {
				};

				tra (Done): Please add a test for applying the attribute to a wrong type. I.e. a non-template or a template with different number or kinds of parameters. We should have a proper syntax error and not a compiler crash or silent failure.
				hliao (Author, Done): addressed in refined tests in the latest revision
				// On the device side, surface references are represented as `i64` handles.
				// DEVICE: @surf = addrspace(1) global i64 0, align 4
				// On the host side, they remain in the original type.
				// HOST: @surf = internal global %struct.surface
				// HOST: @0 = private unnamed_addr constant [5 x i8] c"surf\00"
				surface<void, 2> surf;

				__attribute__((device)) int suld_2d_zero(surface<void, 2>, int, int) asm("llvm.nvvm.suld.2d.i32.zero");

				// DEVICE-LABEL: i32 @_Z3fooii(i32 %x, i32 %y)
				// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @surf)
				// DEVICE: call i32 @llvm.nvvm.suld.2d.i32.zero(i64 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
				__attribute__((device)) int foo(int x, int y) {
				return suld_2d_zero(surf, x, y);
				}

				// HOST: define internal void @[[PREFIX:__cuda]]_register_globals
				// Texture references need registering with correct arguments.
				// HOST: call void @[[PREFIX]]RegisterSurface(i8** %0, i8*{{.*}}({{.*}}@surf{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i32 2, i32 0)

				// They also need annotating in metadata.
				// DEVICE: !0 = !{i64 addrspace(1)* @surf, !"surface", i32 1}

clang/test/CodeGenCUDA/texture.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: nvptx-registered-target

				// RUN: %clang_cc1 -std=c++11 -fcuda-is-device -triple nvptx64-nvidia-cuda -emit-llvm -o - %s | FileCheck --check-prefix=DEVICE %s
				// RUN: echo "GPU binary would be here" > %t
				// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -target-sdk-version=8.0 -fcuda-include-gpubinary %t -emit-llvm -o - %s | FileCheck --check-prefix=HOST %s

				struct textureReference {
				int desc;
				};

				enum ReadMode {
				ElementType = 0,
				NormalizedFloat = 1
				};

				template<typename T, int dim = 1, enum ReadMode mode = ElementType>
				struct __attribute__((device_builtin_texture_type)) texture : public textureReference {
				};

				// On the device side, texture references are represented as `i64` handles.
				// DEVICE: @tex = addrspace(1) global i64 0, align 4
				// DEVICE: @norm = addrspace(1) global i64 0, align 4
				// On the host side, they remain in the original type.
				// HOST: @tex = internal global %struct.texture
				// HOST: @norm = internal global %struct.texture
				// HOST: @0 = private unnamed_addr constant [4 x i8] c"tex\00"
				// HOST: @1 = private unnamed_addr constant [5 x i8] c"norm\00"
				texture<float, 2, ElementType> tex;
				texture<float, 2, NormalizedFloat> norm;

				struct v4f {
				float x, y, z, w;
				};

				__attribute__((device)) v4f tex2d_ld(texture<float, 2, ElementType>, float, float) asm("llvm.nvvm.tex.unified.2d.v4f32.f32");
				__attribute__((device)) v4f tex2d_ld(texture<float, 2, NormalizedFloat>, int, int) asm("llvm.nvvm.tex.unified.2d.v4f32.s32");

				// DEVICE-LABEL: float @_Z3fooff(float %x, float %y)
				// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex)
				// DEVICE: call %struct.v4f @llvm.nvvm.tex.unified.2d.v4f32.f32(i64 %{{.*}}, float %{{.*}}, float %{{.*}})
				// DEVICE: call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @norm)
				// DEVICE: call %struct.v4f @llvm.nvvm.tex.unified.2d.v4f32.s32(i64 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
				__attribute__((device)) float foo(float x, float y) {
				return tex2d_ld(tex, x, y).x + tex2d_ld(norm, int(x), int(y)).x;
				}

				// HOST: define internal void @[[PREFIX:__cuda]]_register_globals
				// Texture references need registering with correct arguments.
				// HOST: call void @[[PREFIX]]RegisterTexture(i8** %0, i8*{{.*}}({{.*}}@tex{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i8*{{.*}}({{.*}}@0{{.*}}), i32 2, i32 0, i32 0)
				// HOST: call void @[[PREFIX]]RegisterTexture(i8** %0, i8*{{.*}}({{.*}}@norm{{.*}}), i8*{{.*}}({{.*}}@1{{.*}}), i8*{{.*}}({{.*}}@1{{.*}}), i32 2, i32 1, i32 0)

				// They also need annotating in metadata.
				// DEVICE: !0 = !{i64 addrspace(1)* @tex, !"texture", i32 1}
				// DEVICE: !1 = !{i64 addrspace(1)* @norm, !"texture", i32 1}
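
Reading the HOST check lines in this test, the trailing integer arguments of the registration calls appear to follow the template arguments of each reference: `tex` (dimension 2, `ElementType`) registers with `i32 2, i32 0, i32 0`, and `norm` (dimension 2, `NormalizedFloat`) with `i32 2, i32 1, i32 0`. The standalone sketch below shows how such values can be read off a template specialization. `toy_texture`, `ToyReadMode`, `ToyTexRegistration`, and `describeTexture` are hypothetical names, and treating the last value as an "extern" flag is an assumption, not something this test states.

```cpp
// Toy mirror of the test's texture template; not the CUDA header types.
enum ToyReadMode { ToyElementType = 0, ToyNormalizedFloat = 1 };

template <typename T, int dim = 1, ToyReadMode mode = ToyElementType>
struct toy_texture {};

// The integers a registration call would carry in this sketch.
struct ToyTexRegistration {
  int Dim;      // second template argument
  int ReadMode; // third template argument
  int Extern;   // assumed meaning of the final argument
};

// Deduce the non-type template arguments from the variable's type.
template <typename T, int dim, ToyReadMode mode>
ToyTexRegistration describeTexture(const toy_texture<T, dim, mode> &,
                                   bool IsExtern) {
  return {dim, static_cast<int>(mode), IsExtern ? 1 : 0};
}
```

For a variable declared as `toy_texture<float, 2, ToyNormalizedFloat>`, `describeTexture(v, false)` yields the triple 2, 1, 0, matching the shape of the second `RegisterTexture` check line.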

clang/test/SemaCUDA/attr-declspec.cu

	// Test the __declspec spellings of CUDA attributes.			// Test the __declspec spellings of CUDA attributes.
	//			//
	// RUN: %clang_cc1 -fsyntax-only -fms-extensions -verify %s			// RUN: %clang_cc1 -fsyntax-only -fms-extensions -verify %s
	// RUN: %clang_cc1 -fsyntax-only -fms-extensions -fcuda-is-device -verify %s			// RUN: %clang_cc1 -fsyntax-only -fms-extensions -fcuda-is-device -verify %s
	// Now pretend that we're compiling a C file. There should be warnings.			// Now pretend that we're compiling a C file. There should be warnings.
	// RUN: %clang_cc1 -DEXPECT_WARNINGS -fms-extensions -fsyntax-only -verify -x c %s			// RUN: %clang_cc1 -DEXPECT_WARNINGS -fms-extensions -fsyntax-only -verify -x c %s

	#if defined(EXPECT_WARNINGS)			#if defined(EXPECT_WARNINGS)
	// expected-warning@+12 {{'__device__' attribute ignored}}			// expected-warning@+17 {{'__device__' attribute ignored}}
	// expected-warning@+12 {{'__global__' attribute ignored}}			// expected-warning@+17 {{'__global__' attribute ignored}}
	// expected-warning@+12 {{'__constant__' attribute ignored}}			// expected-warning@+17 {{'__constant__' attribute ignored}}
	// expected-warning@+12 {{'__shared__' attribute ignored}}			// expected-warning@+17 {{'__shared__' attribute ignored}}
	// expected-warning@+12 {{'__host__' attribute ignored}}			// expected-warning@+17 {{'__host__' attribute ignored}}
				// expected-warning@+22 {{'__device_builtin_surface_type__' attribute ignored}}
				// expected-warning@+22 {{'__device_builtin_texture_type__' attribute ignored}}
				// expected-warning@+22 {{'__device_builtin_surface_type__' attribute ignored}}
				// expected-warning@+22 {{'__device_builtin_texture_type__' attribute ignored}}
	//			//
	// (Currently we don't for the other attributes. They are implemented with			// (Currently we don't for the other attributes. They are implemented with
	// IgnoredAttr, which is ignored irrespective of any LangOpts.)			// IgnoredAttr, which is ignored irrespective of any LangOpts.)
	#else			#else
	// expected-no-diagnostics			// expected-warning@+14 {{'__device_builtin_surface_type__' attribute only applies to types}}
				// expected-warning@+14 {{'__device_builtin_texture_type__' attribute only applies to types}}
	#endif			#endif

	__declspec(__device__) void f_device();			__declspec(__device__) void f_device();
	__declspec(__global__) void f_global();			__declspec(__global__) void f_global();
	__declspec(__constant__) int* g_constant;			__declspec(__constant__) int* g_constant;
	__declspec(__shared__) float *g_shared;			__declspec(__shared__) float *g_shared;
	__declspec(__host__) void f_host();			__declspec(__host__) void f_host();
	__declspec(__device_builtin__) void f_device_builtin();			__declspec(__device_builtin__) void f_device_builtin();
	typedef __declspec(__device_builtin__) const void *t_device_builtin;			typedef __declspec(__device_builtin__) const void *t_device_builtin;
	enum __declspec(__device_builtin__) e_device_builtin {E};			enum __declspec(__device_builtin__) e_device_builtin {E};
	__declspec(__device_builtin__) int v_device_builtin;			__declspec(__device_builtin__) int v_device_builtin;
	__declspec(__cudart_builtin__) void f_cudart_builtin();			__declspec(__cudart_builtin__) void f_cudart_builtin();
	__declspec(__device_builtin_surface_type__) unsigned long long surface_var;			__declspec(__device_builtin_surface_type__) unsigned long long surface_var;
	__declspec(__device_builtin_texture_type__) unsigned long long texture_var;			__declspec(__device_builtin_texture_type__) unsigned long long texture_var;
				struct __declspec(__device_builtin_surface_type__) surf_ref {};
				struct __declspec(__device_builtin_texture_type__) tex_ref {};

	// Note that there's no __declspec spelling of nv_weak.			// Note that there's no __declspec spelling of nv_weak.

clang/test/SemaCUDA/attributes-on-non-cuda.cu

	// Tests that CUDA attributes are warnings when compiling C files, but not when			// Tests that CUDA attributes are warnings when compiling C files, but not when
	// compiling CUDA files.			// compiling CUDA files.
	//			//
	// RUN: %clang_cc1 -fsyntax-only -verify %s			// RUN: %clang_cc1 -fsyntax-only -verify %s
	// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s			// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
	// Now pretend that we're compiling a C file. There should be warnings.			// Now pretend that we're compiling a C file. There should be warnings.
	// RUN: %clang_cc1 -DEXPECT_WARNINGS -fsyntax-only -verify -x c %s			// RUN: %clang_cc1 -DEXPECT_WARNINGS -fsyntax-only -verify -x c %s

	#if defined(EXPECT_WARNINGS)			#if defined(EXPECT_WARNINGS)
	// expected-warning@+12 {{'device' attribute ignored}}			// expected-warning@+17 {{'device' attribute ignored}}
	// expected-warning@+12 {{'global' attribute ignored}}			// expected-warning@+17 {{'global' attribute ignored}}
	// expected-warning@+12 {{'constant' attribute ignored}}			// expected-warning@+17 {{'constant' attribute ignored}}
	// expected-warning@+12 {{'shared' attribute ignored}}			// expected-warning@+17 {{'shared' attribute ignored}}
	// expected-warning@+12 {{'host' attribute ignored}}			// expected-warning@+17 {{'host' attribute ignored}}
				// expected-warning@+23 {{'device_builtin_surface_type' attribute ignored}}
				// expected-warning@+23 {{'device_builtin_texture_type' attribute ignored}}
				// expected-warning@+23 {{'device_builtin_surface_type' attribute ignored}}
				// expected-warning@+23 {{'device_builtin_texture_type' attribute ignored}}
	//			//
	// NOTE: IgnoredAttr in clang which is used for the rest of			// NOTE: IgnoredAttr in clang which is used for the rest of
	// attributes ignores LangOpts, so there are no warnings.			// attributes ignores LangOpts, so there are no warnings.
	#else			#else
	// expected-no-diagnostics			// expected-warning@+15 {{'device_builtin_surface_type' attribute only applies to types}}
				// expected-warning@+15 {{'device_builtin_texture_type' attribute only applies to types}}
	#endif			#endif

	__attribute__((device)) void f_device();			__attribute__((device)) void f_device();
	__attribute__((global)) void f_global();			__attribute__((global)) void f_global();
	__attribute__((constant)) int* g_constant;			__attribute__((constant)) int* g_constant;
	__attribute__((shared)) float *g_shared;			__attribute__((shared)) float *g_shared;
	__attribute__((host)) void f_host();			__attribute__((host)) void f_host();
	__attribute__((device_builtin)) void f_device_builtin();			__attribute__((device_builtin)) void f_device_builtin();
	typedef __attribute__((device_builtin)) const void *t_device_builtin;			typedef __attribute__((device_builtin)) const void *t_device_builtin;
	enum __attribute__((device_builtin)) e_device_builtin {E};			enum __attribute__((device_builtin)) e_device_builtin {E};
	__attribute__((device_builtin)) int v_device_builtin;			__attribute__((device_builtin)) int v_device_builtin;
	__attribute__((cudart_builtin)) void f_cudart_builtin();			__attribute__((cudart_builtin)) void f_cudart_builtin();
	__attribute__((nv_weak)) void f_nv_weak();			__attribute__((nv_weak)) void f_nv_weak();
	__attribute__((device_builtin_surface_type)) unsigned long long surface_var;			__attribute__((device_builtin_surface_type)) unsigned long long surface_var;
	__attribute__((device_builtin_texture_type)) unsigned long long texture_var;			__attribute__((device_builtin_texture_type)) unsigned long long texture_var;
				struct __attribute__((device_builtin_surface_type)) surf_ref {};
				struct __attribute__((device_builtin_texture_type)) tex_ref {};

llvm/include/llvm/IR/Operator.h

(593 lines not shown)
Type *getSrcTy() const {
return getOperand(0)->getType();		return getOperand(0)->getType();
}		}

Type *getDestTy() const {		Type *getDestTy() const {
return getType();		return getType();
}		}
};		};

		class AddrSpaceCastOperator
		: public ConcreteOperator<Operator, Instruction::AddrSpaceCast> {
		friend class AddrSpaceCastInst;
		friend class ConstantExpr;

		public:
		Value *getPointerOperand() { return getOperand(0); }

		const Value *getPointerOperand() const { return getOperand(0); }

		unsigned getSrcAddressSpace() const {
		return getPointerOperand()->getType()->getPointerAddressSpace();
		}

		unsigned getDestAddressSpace() const {
		return getType()->getPointerAddressSpace();
		}
		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_OPERATOR_H		#endif // LLVM_IR_OPERATOR_H

Revision Contents

Diff 251101

clang/include/clang/AST/Type.h

clang/include/clang/Basic/Attr.td

clang/include/clang/Basic/AttrDocs.td

clang/lib/AST/Type.cpp

clang/lib/CodeGen/CGCUDANV.cpp

clang/lib/CodeGen/CGCUDARuntime.h

clang/lib/CodeGen/CGExprAgg.cpp

clang/lib/CodeGen/CodeGenModule.cpp

clang/lib/CodeGen/CodeGenTypes.cpp

clang/lib/CodeGen/TargetInfo.h

clang/lib/CodeGen/TargetInfo.cpp

clang/lib/Sema/SemaDeclAttr.cpp

clang/test/CodeGenCUDA/surface.cu

clang/test/CodeGenCUDA/texture.cu

clang/test/SemaCUDA/attr-declspec.cu

clang/test/SemaCUDA/attributes-on-non-cuda.cu

llvm/include/llvm/IR/Operator.h
