This is an archive of the discontinued LLVM Phabricator instance.

Differential D68578

[HIP] Fix device stub name
ClosedPublic

Authored by yaxunl on Oct 7 2019, 7:45 AM.

Details

Reviewers

tra
rjmccall
hliao
rsmith

Commits

rG22c457a869d5: [HIP] Fix device stub name

Summary

HIP emits a device stub function for each kernel in host code.

The HIP debugger requires the device stub function to have a different unmangled name from the kernel.

Currently, the name of the device stub function is the mangled kernel name with the suffix .stub. However,
this does not work with the HIP debugger, since the unmangled name is the same as the kernel's.

This patch adds the prefix __device__stub__ to the unmangled name of the device stub before mangling,
so the device stub function has a valid mangled name that is different from the device kernel
name. The device-side kernel name is kept unchanged. Kernels with extern "C" also get the prefix added
to the corresponding device stub function.
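
As a rough illustration of the difference between the two naming schemes, the sketch below builds Itanium-style names for a kernel declared as `void kernel(int *)` in the global namespace. The helpers `mangleSimple`, `oldStubName`, and `newStubName` are hypothetical and are not part of Clang. The toy encoding handles only an unqualified function name plus a pre-encoded parameter string, and the prefix spelling is copied from the summary text above.

```cpp
#include <string>

// Toy Itanium-style mangler (NOT Clang's mangler): handles only an
// unqualified function name in the global namespace. `params` is the
// already-encoded parameter list, e.g. "Pi" for (int *).
std::string mangleSimple(const std::string &name, const std::string &params) {
  return "_Z" + std::to_string(name.size()) + name + params;
}

// Old scheme: mangle the kernel name, then append a suffix to the result.
std::string oldStubName(const std::string &kernel, const std::string &params) {
  return mangleSimple(kernel, params) + ".stub";
}

// Scheme described in the summary: prefix the unmangled name, then mangle.
std::string newStubName(const std::string &kernel, const std::string &params) {
  return mangleSimple("__device__stub__" + kernel, params);
}
```

For `kernel` with the parameter encoding `Pi`, the old scheme yields `_Z6kernelPi.stub`, whose encoded name component is still `kernel`, while the new scheme yields `_Z22__device__stub__kernelPi`, a well-formed mangled name for a differently named function.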

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Oct 7 2019, 7:45 AM

Herald added a subscriber: erik.pilkington. Oct 7 2019, 7:45 AM

tra added a reviewer: rsmith. Oct 7 2019, 8:56 AM

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

The HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set a breakpoint on a kernel, they intend to set a breakpoint on the real kernel, not the device stub function. The device stub function is only a compiler-generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

hliao added inline comments. Oct 7 2019, 9:48 AM

lib/CodeGen/CGCUDANV.cpp
235 ↗ (On Diff #223598): Keeping the original assertion in HIP is still valuable to capture naming mismatch issues for unnamed types.

In D68578#1697851, @yaxunl wrote:

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

The HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set a breakpoint on a kernel, they intend to set a breakpoint on the real kernel, not the device stub function. The device stub function is only a compiler-generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

I would agree that having distinct names for the device-side kernel and its host-side stub would probably make things easier for the debugger.
However, debugger does have access to mangled names and does see the '.stub' suffix in the mangled name. I don't understand why it can't be considered to disambiguate between the kernel and the stub?
I'm clearly missing something here. Is there a chance to get someone from the debugger team to chime in on this review directly?

Also, I would not agree that "they intend to set a breakpoint on the real kernel" is the only scenario. E.g. quite often when I debug CUDA stuff, I only care about host-side things and I do want to set a breakpoint on the stub, so I can check kernel call parameters as they are passed to the kernel. It would be great if there were a way to explicitly tell the debugger whether we want the host-side stub or the kernel without the user having to know how a particular compiler transforms the name. For the user both entities have the same name, but distinct locations, and there should be a way to express that in the debugger.

In D68578#1697898, @tra wrote:

In D68578#1697851, @yaxunl wrote:

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

The HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set a breakpoint on a kernel, they intend to set a breakpoint on the real kernel, not the device stub function. The device stub function is only a compiler-generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

I would agree that having distinct names for the device-side kernel and its host-side stub would probably make things easier for the debugger.
However, debugger does have access to mangled names and does see the '.stub' suffix in the mangled name. I don't understand why it can't be considered to disambiguate between the kernel and the stub?
I'm clearly missing something here. Is there a chance to get someone from the debugger team to chime in on this review directly?

Also, I would not agree that "they intend to set a breakpoint on the real kernel" is the only scenario. E.g. quite often when I debug CUDA stuff, I only care about host-side things and I do want to set a breakpoint on the stub, so I can check kernel call parameters as they are passed to the kernel. It would be great if there were a way to explicitly tell the debugger whether we want the host-side stub or the kernel without the user having to know how a particular compiler transforms the name. For the user both entities have the same name, but distinct locations, and there should be a way to express that in the debugger.

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function. If the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set it in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breakpoints in all the inlined places so the user can still think of the program debugging in terms of the source language semantics.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name. The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it. My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

In D68578#1698864, @t-tye wrote:

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it. My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

Yes this patch does this.

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

If the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set it in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breakpoints in all the inlined places so the user can still think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that the stub is the host-side part of the device-side kernel? To me it looks like an argument for them to have the same name so it's clear that they are both part of the same function as written in the source.

They don't have to be different. CUDA (and HIP) does not allow overloading of kernels, so the stub and the kernel can have identical names as in the example of __host__ and __device__ overloads you've described above, only now it's __host__ stub + __global__ kernel itself, instead of two user-implemented functions. The debugger, of course, will need to know about that to pick the stub or kernel as the breakpoint location, but that appears doable.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub a distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a workaround for the fact that the debugger may not have a convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to provide this ability as it sounds quite useful. I.e. I may want to set a breakpoint on all inlined host/device functions, but only on the device side. That would be handy.

What happens if the stub and the kernel do have identical names?
My understanding, based on your comments above is that debugger does know about host and device 'spaces' and that it can find pointers to both host and device functions and set appropriate breakpoints for both. In this case it would normally attempt to set breakpoint on both the stub and the kernel as it would in case of __host__/__device__ overloads you've described above. In case of stub/kernel, we would want the breakpoint set only on the kernel itself. Given that debugger does have ability to tell host and device functions/symbols apart, the difficulty is really in being able to tell a real host function from the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

Does debugger know that device-side function is a kernel? In case of CUDA, the kernels are distinct from regular device-side functions. I don't know whether that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be used to infer that the matching host-side symbol is a stub and skip setting a breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols for the potential breakpoint locations. The stub currently has distinct .stub suffix. This can also be used to tell it apart from a regular __host__ function.

I do not see how changing the source-level name for the stub is going to change things in principle. It's just yet another way to disambiguate a real __host__ function from a host stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name different from the kernel?

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it.

Currently the mangled name has a .stub suffix which is discarded during unmangling, so the unmangled names for the stub and the kernel end up being identical. I'm trying to figure out why it is a problem to be fixed in the compiler.
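
A toy model of that collision, assuming a debugger that resolves breakpoints purely by user-visible name, might look like the sketch below. Everything in it (`Symbol`, `toyUserName`, `makeSymbol`, `resolveBreakpoint`) is hypothetical and far simpler than gdb's real symbol handling. Unmangled, extern "C"-style names are used so that the only "unmangling" step is dropping a trailing `.stub`, as described above.

```cpp
#include <string>
#include <vector>

// Hypothetical symbol-table entry; real debuggers keep far more information.
struct Symbol {
  std::string linkageName; // name as seen by the linker
  std::string userName;    // name presented to (and typed by) the user
};

// Toy "unmangling": only models a trailing ".stub" being dropped.
std::string toyUserName(const std::string &linkageName) {
  const std::string suffix = ".stub";
  if (linkageName.size() >= suffix.size() &&
      linkageName.compare(linkageName.size() - suffix.size(), suffix.size(),
                          suffix) == 0)
    return linkageName.substr(0, linkageName.size() - suffix.size());
  return linkageName;
}

Symbol makeSymbol(const std::string &linkageName) {
  return Symbol{linkageName, toyUserName(linkageName)};
}

// Return every symbol whose user-visible name matches the breakpoint request.
std::vector<Symbol> resolveBreakpoint(const std::vector<Symbol> &table,
                                      const std::string &requested) {
  std::vector<Symbol> hits;
  for (const Symbol &s : table)
    if (s.userName == requested)
      hits.push_back(s);
  return hits;
}
```

With a table holding `kernel` and `kernel.stub`, a request for `kernel` matches both entries. With `kernel` and `__device__stub__kernel`, it matches only the kernel, and no stub-specific filtering is needed.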

My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

This patch proposes changing the source-level name for the stub. Unfortunately, the way it attempts to implement it is by doing the renaming during the mangling phase itself. This appears to be the wrong place to change the source-level name.
Before figuring out what would be the right thing to do here, I want to understand why we're doing it. I appreciate your description of what drives this requirement. I think I have a better idea of it now, but I still have some questions. Please bear with me.

In D68578#1700652, @tra wrote:

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

We want to support a heterogeneous gdb debugger for a single source programming language. We would like to follow the same conventions used by compilers that implement other languages supported by gdb. The debugger can use symbols to find functions. It supports unmangling them and using the unmangled name to indicate the source language function it corresponds to. We would like this to remain true. The stub is not the kernel function, it is a helper function that will launch the kernel. In many ways it is acting like other trampolines. Therefore, it should be named as an internal helper function. The debugger can choose what it wants to do with it, but it does not want to be confused into thinking it actually IS the kernel function. If the user sets a breakpoint in the code of the kernel function then that breakpoint should be hit by every instance of the kernel that is created by the dispatch. It should not be hit by the code that is initiating the dispatch. If that is what the user wanted they would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is a separate concept. If it is present in both, then both would have the same symbol as they are both implementing the kernel. The debugger would set a breakpoint in both because, from a language execution model point of view, whichever piece of code executes corresponds to the same source language kernel.

If the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set it in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breakpoints in all the inlined places so the user can still think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

Right. Well, it's not really an overload, but a request to have instances of the kernel available for either the CPU or GPU to execute. They are the same function, not different overloads.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that the stub is the host-side part of the device-side kernel? To me it looks like an argument for them to have the same name so it's clear that they are both part of the same function as written in the source.

They don't have to be different. CUDA (and HIP) does not allow overloading of kernels, so the stub and the kernel can have identical names as in the example of __host__ and __device__ overloads you've described above, only now it's __host__ stub + __global__ kernel itself, instead of two user-implemented functions. The debugger, of course, will need to know about that to pick the stub or kernel as the breakpoint location, but that appears doable.

As mentioned, the stub is not the host side part of the device side kernel. The stub is a means to launch the kernel. That launching could happen on the device (device enqueue), or on the host. The kernel itself could execute on the device or the host. There is the act of launching the kernel (the function call statement if you will), and the kernel instances that come into existence (the threads created to execute the body of the kernel according to the launch bounds presented at the launch statement).

The user may want to set a breakpoint at the launch statement, or in the body of the kernel. The language execution model treats those separately. The standard debugger expects the symbols to reflect the language constructs. Hence wanting the launch stub (which is compiler generated and not user written) to be distinct from the kernel body. If the compiler decided not to use a launch stub function (perhaps it is launching a kernel that will execute on the CPU and so does not need a helper function) then that is its choice. It is desirable that the debugger does not have to know about the choices made by the compiler. It simply wants to know that a symbol that appears to be a user source language construct is in fact exactly that. It does not have to do any compiler specific filtering.

When the debugger has DWARF available, it can use that to get a more accurate picture, but gdb does have the ability to fall back on just symbols, and it is that functionality we would like to preserve in the same manner as is done for other languages.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub a distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a workaround for the fact that the debugger may not have a convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to provide this ability as it sounds quite useful. I.e. I may want to set a breakpoint on all inlined host/device functions, but only on the device side. That would be handy.

It is not really a work around. It is making the symbols reflect the reality of the source language program. The debugger can then simply trust that information and use it as gdb does for other languages.

Features such as breaking only on a particular inlining of a function may be useful, but gdb does not currently support that. Similarly, it does not support breakpoints based on the architecture. That could be simulated by having a conditional breakpoint that continues if the current thread architecture does not equal the chosen architecture. If such a feature were widely used it could be accelerated by adding architecture conditional breakpoints. I can add that to our list of suggestions.
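
The simulation described here could be sketched roughly as follows. The types and the function `onBreakpointHit` are hypothetical and do not correspond to any gdb interface, and the architecture strings used with them are only illustrative values.

```cpp
#include <string>

// Hypothetical description of a thread that has just hit a breakpoint.
struct StoppedThread {
  std::string architecture; // illustrative values: "x86_64", "amdgcn"
};

enum class Action { Stop, Continue };

// Simulates an architecture-conditional breakpoint with a plain conditional
// one: keep running unless the hitting thread is on the chosen architecture.
Action onBreakpointHit(const StoppedThread &thread,
                       const std::string &chosenArchitecture) {
  return thread.architecture == chosenArchitecture ? Action::Stop
                                                   : Action::Continue;
}
```

In this model every hit still interrupts the thread long enough to evaluate the condition, which is the overhead a native architecture-conditional breakpoint would avoid.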

What happens if the stub and the kernel do have identical names?

The stub is compiler generated so should never have a name that can collide with a user name.

My understanding, based on your comments above is that debugger does know about host and device 'spaces' and that it can find pointers to both host and device functions and set appropriate breakpoints for both. In this case it would normally attempt to set breakpoint on both the stub and the kernel as it would in case of __host__/__device__ overloads you've described above. In case of stub/kernel, we would want the breakpoint set only on the kernel itself. Given that debugger does have ability to tell host and device functions/symbols apart, the difficulty is really in being able to tell a real host function from the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

As mentioned above, the desire is to have the compiler generate information in a standard way so the debugger can consume it in a standard way. If the user wants to set a breakpoint in the kernel, the compiler information should only lead the debugger to setting a breakpoint in the kernel, not in some other function that is used in the implementation of launching kernels. The user expects a kernel breakpoint to only be hit by the threads that execute the kernel instances created by the launch as that is how the language is defined. The debugger simply keeps track of what code objects are loaded, and what symbols they contain. It does not need to know the distinction between host and device code to implement the basic debugger functionality.

Does debugger know that device-side function is a kernel? In case of CUDA, the kernels are distinct from regular device-side functions. I don't know whether that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be used to infer that the matching host-side symbol is a stub and skip setting a breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols for the potential breakpoint locations. The stub currently has distinct .stub suffix. This can also be used to tell it apart from a regular __host__ function.

The debugger does not have to care if the symbol is for a kernel or a function, it will simply plonk a breakpoint in the code that corresponds to the symbol and report when it is hit. If the symbols follow the conventions used by other languages then the debugger does not have to do anything special to support a source language that happens to be executing on multiple devices.

I do not see how changing the source-level name for the stub is going to change things in principle. It's just yet another way to disambiguate a real __host__ function from a host stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name different from the kernel?

The stub is not a source level function, it is a compiler generated function, and the desire is that it be named so as not to conflict with actual source level constructs as is done for other compiler generated entries.

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it.

Currently the mangled name has a .stub suffix which is discarded during unmangling, so the unmangled names for the stub and the kernel end up being identical. I'm trying to figure out why it is a problem to be fixed in the compiler.

My understanding was that an earlier review rejected adding the suffix to the mangled name as it broke unmangling. It also does not seem the right thing to do as it does not follow the convention for other compiler-generated symbols.

My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

This patch proposes changing the source-level name for the stub. Unfortunately, the way it attempts to implement it is by doing the renaming during the mangling phase itself. This appears to be the wrong place to change the source-level name.

What do you think is the right place to do it?

Before figuring out what would be the right thing to do here, I want to understand why we're doing it. I appreciate your description of what drives this requirement. I think I have a better idea of it now, but I still have some questions. Please bear with me.

That makes complete sense. The desire is to have the debugger treat heterogeneous single source debugging in the same way as traditional CPU debugging, so that the user experience is basically the same. Following the same conventions on the GPU as on the CPU, and implementing similar runtime controls, allows a common debugger code base to support both with minimal change and ensures a consistent user experience. We would like to avoid adding special treatment to support the GPU in the debugger if following the existing conventions/standards will allow the existing code to simply work.

Hopefully the above responses help describe the motivation for this. If not let me know and thanks for taking the time to review.

In D68578#1700652, @tra wrote:

This patch proposes changing the source-level name for the stub. Unfortunately, the way it attempts to implement it is by doing the renaming during the mangling phase itself. This appears to be the wrong place to change the source-level name.

A specific difficulty here is that we need to get not only the mangled kernel stub name, but also the mangled kernel name. However, there is only one FuncDecl for the kernel. If we change the name of the FuncDecl to the stub name to be different from the kernel name, then we cannot get the mangled name for the kernel. That's why this patch does not change the FuncDecl but lets the mangler mangle it in two different ways. An alternative approach would be to create two FuncDecls, one for the stub and one for the kernel, and keep a map from the stub to the kernel. In this way we do not need to change the mangler.
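
The two options can be contrasted with a small model. This is only a sketch: `ToyDecl`, `RefKind`, `mangleName`, `mangleOneDecl`, and `makeStubDecl` are hypothetical stand-ins, not Clang's classes or the code in this patch, and the toy mangler handles only an unqualified name with a pre-encoded parameter list.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function declaration; not Clang's FunctionDecl.
struct ToyDecl {
  std::string name;
  std::string params; // pre-encoded parameter list, e.g. "Pi" for (int *)
};

// Which of the two entities a reference to the single declaration denotes.
enum class RefKind { Kernel, Stub };

// Toy Itanium-style mangling for an unqualified global function name.
std::string mangleName(const std::string &name, const std::string &params) {
  return "_Z" + std::to_string(name.size()) + name + params;
}

// Option 1: keep one declaration and let the mangler produce two names,
// depending on which entity is being referenced.
std::string mangleOneDecl(const ToyDecl &decl, RefKind kind) {
  const std::string prefix = kind == RefKind::Stub ? "__device__stub__" : "";
  return mangleName(prefix + decl.name, decl.params);
}

// Option 2: create a separate stub declaration and record which kernel it
// launches, so the mangler itself needs no special case.
ToyDecl makeStubDecl(const ToyDecl &kernel,
                     std::map<std::string, std::string> &stubToKernel) {
  ToyDecl stub{"__device__stub__" + kernel.name, kernel.params};
  stubToKernel[stub.name] = kernel.name;
  return stub;
}
```

Both options yield the same pair of symbol names in this model. They differ in where the stub/kernel distinction lives: in the argument passed to the mangler, or in an extra declaration plus a side table.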

Apologies for the delay with my response.

In D68578#1700819, @t-tye wrote:

In D68578#1700652, @tra wrote:

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

We want to support a heterogeneous gdb debugger for a single source programming language. We would like to follow the same conventions used by compilers that implement other languages supported by gdb. The debugger can use symbols to find functions. It supports unmangling them and using the unmangled name to indicate the source language function it corresponds to. We would like this to remain true. The stub is not the kernel function, it is a helper function that will launch the kernel. In many ways it is acting like other trampolines. Therefore, it should be named as an internal helper function. The debugger can choose what it wants to do with it, but it does not want to be confused into thinking it actually IS the kernel function. If the user sets a breakpoint in the code of the kernel function then that breakpoint should be hit by every instance of the kernel that is created by the dispatch. It should not be hit by the code that is initiating the dispatch. If that is what the user wanted they would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is a separate concept. If it is present in both, then both would have the same symbol as they are both implementing the kernel. The debugger would set a breakpoint in both because, from a language execution model point of view, whichever piece of code executes corresponds to the same source language kernel.

Thank you for the details.

If the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set it in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breakpoints in all the inlined places so the user can still think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

Right. Well, it's not really an overload, but a request to have instances of the kernel available for either the CPU or GPU to execute. They are the same function, not different overloads.

In case of cuda they may be overloads -- there may be two functions with identical signatures (modulo __host__/__device__ attributes) or multiple functions with the same names but different signatures with different __host__/__device__ attributes. It does not change things in principle. I'm just pointing out that CUDA (and thus HIP) as implemented in clang uses target attributes as another dimension in the space of functions with the same name.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub a distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e., it's essentially a workaround for the fact that the debugger may not have a convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to provide this ability as it sounds quite useful. I.e., I may want to set a breakpoint on all inlined host/device functions, but only on the device side. That would be handy.

It is not really a work around. It is making the symbols reflect the reality of the source language program. The debugger can then simply trust that information and use it as gdb does for other languages.

It still seems to boil down to "the stub should not get in the way of debugger accessing the function itself", but I see your point and agree that it would be useful if the stub could be an entity separate from the function itself.

Now we need to figure out what would be the best way to implement it.

Clang uses the real function in AST to generate the IR for the stub and, because of that, the stub ends up using function's name.
Actually, the situation is a bit worse than that. Clang implicitly relies on __host__ and __device__ entities not being codegen'ed at the same time, so we don't have to care about name conflicts.
Your description above indicates that the assumption is somewhat optimistic and that's what really causes the issue here.

I think @yaxunl's suggestion that we may need different FuncDecl's would be a good way forward.
I suspect we may already have places where clang deals with compiler-generated functions, so we should have existing examples of how it could be done.

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

In D68578#1737351, @rjmccall wrote:

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

It seems GlobalDecl is just a wrapper for concrete Decl's

https://github.com/llvm/llvm-project/blob/31817731167135870259ef1e7387746345b96a2f/clang/include/clang/AST/GlobalDecl.h#L40

Here we need to get the mangled name of a kernel and the mangled name of the same kernel but with a prefix before mangling.

Can I use GlobalDecl with the same FunctionDecl* but different multi-version index to indicate it is a kernel or a stub, then let the mangler mangle them differently?
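A minimal sketch of the difference between appending `.stub` to a mangled name and prefixing the unmangled name before mangling. It assumes the `__device__stub__` prefix named in the summary and handles only the simplest Itanium case (a global function with no parameters); `mangleNullaryFunction` and `mangleStub` are hypothetical helpers, not clang APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: Itanium-style mangling for a function in the global
// namespace that takes no parameters: "_Z" <name length> <name> "v".
// Real mangling (namespaces, templates, parameter types) is far more involved.
std::string mangleNullaryFunction(const std::string &Name) {
  return "_Z" + std::to_string(Name.size()) + Name + "v";
}

// Hypothetical helper: prefix the unmangled name first, then mangle, so the
// result is still a well-formed mangled name that demangles to a different
// source-level name than the kernel itself.
std::string mangleStub(const std::string &KernelName) {
  return mangleNullaryFunction("__device__stub__" + KernelName);
}
```

For `void kernel()` this gives `_Z6kernelv` for the kernel and `_Z22__device__stub__kernelv` for the stub, whereas the older `_Z6kernelv.stub` still carries the kernel's own unmangled name.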

In D68578#1737415, @yaxunl wrote:

It seems GlobalDecl is just a wrapper for concrete Decl's

It's a Decl plus a discriminator which is required for certain kinds of declaration. See e.g. GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type). GlobalDecl asserts if you try to construct it using GlobalDecl(FunctionDecl*) with a constructor/destructor declaration; we could similarly make that forbid construction with a kernel and then require code to use a GlobalDecl constructor that passes down whether it's the kernel or the stub that's being requested.

John.
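The "Decl plus a discriminator" idea can be sketched roughly as follows. This is a simplified stand-in and not clang's `GlobalDecl`: the `Decl`, `RefKind`, and `GlobalDeclSketch` names are hypothetical, and the real class packs the discriminator into a `PointerIntPair`.

```cpp
#include <cassert>

// Hypothetical stand-ins; not clang's real classes.
struct Decl {
  bool IsKernel = false;
};
enum class RefKind : unsigned { Kernel = 0, Stub = 1 };

class GlobalDeclSketch {
  const Decl *D = nullptr;
  unsigned Kind = 0;

public:
  // Plain constructor: rejects kernels, which must say which symbol is meant.
  explicit GlobalDeclSketch(const Decl *D) : D(D) {
    assert(!D->IsKernel && "use the kind-taking constructor for kernels");
  }
  // Kernel constructor: the caller states whether it wants kernel or stub.
  GlobalDeclSketch(const Decl *D, RefKind K) : D(D), Kind(unsigned(K)) {
    assert(D->IsKernel && "decl is not a kernel");
  }
  const Decl *getDecl() const { return D; }
  RefKind getKind() const { return RefKind(Kind); }
  // Two references to the same declaration differ if their kinds differ.
  friend bool operator==(const GlobalDeclSketch &L, const GlobalDeclSketch &R) {
    return L.D == R.D && L.Kind == R.Kind;
  }
};
```

With this shape, the kernel and its stub share one declaration but compare unequal, so code that keys on the pair can treat them as two distinct symbols.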

In D68578#1737419, @rjmccall wrote:

It's a Decl plus a discriminator which is required for certain kinds of declaration. See e.g. GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type). GlobalDecl asserts if you try to construct it using GlobalDecl(FunctionDecl*) with a constructor/destructor declaration; we could similarly make that forbid construction with a kernel and then require code to use a GlobalDecl constructor that passes down whether it's the kernel or the stub that's being requested.

John.

In host compilation, we do not need to differentiate between the device function and the stub function except in the mangler. Currently the mangler does not know about GlobalDecl. If we let the mangler mangle a function based on whether it is given a GlobalDecl or a FunctionDecl, we still need to modify the mangler, and the change will be similar to the current approach.

There are a number of places in IRGen that pass around GlobalDecls with the expectation that that's sufficient to uniquely identify a symbol. The fact that IRGen breaks down the GD at the last second before passing it to the mangler, rather than passing it to the mangler and letting the mangler decide what to do with it, doesn't really change anything and is arguably poor code design anyway. Inventing a second declaration, or trying to propagate a flag outside of GD, is just fighting the architecture for no good reason.

Attempt to prefix the kernel stub name on the fly.

If we do not want to create two Decl's during parsing, and do not want to change the mangler, it seems the least invasive way to get the prefixed kernel name is to change it on the fly then change it back.
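The "change it on the fly, then change it back" pattern can be illustrated with a small scope guard. This is only a sketch of the pattern under discussion, not the code in the patch: `NamedThing`, `ScopedRename`, and `nameWithStubPrefix` are hypothetical, and returning the renamed string stands in for calling the mangler on the temporarily renamed declaration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a named declaration.
struct NamedThing {
  std::string Name;
};

// Renames the target for the lifetime of the guard and restores the
// original name on destruction, including on early return.
class ScopedRename {
  NamedThing &Target;
  std::string Saved;

public:
  ScopedRename(NamedThing &T, std::string NewName) : Target(T), Saved(T.Name) {
    Target.Name = std::move(NewName);
  }
  ~ScopedRename() { Target.Name = std::move(Saved); }
  ScopedRename(const ScopedRename &) = delete;
  ScopedRename &operator=(const ScopedRename &) = delete;
};

std::string nameWithStubPrefix(NamedThing &T) {
  ScopedRename Guard(T, "__device__stub__" + T.Name);
  return T.Name; // stands in for mangling the renamed declaration
}
```

The guard keeps the mutation local, but the approach still mutates shared state behind the mangler's back, which is the fragility raised in the inline comments that follow.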

tra added inline comments.Nov 13 2019, 11:53 AM

clang/lib/CodeGen/CodeGenModule.cpp
1128–1137	On one hand I like this patch variant much better than the one that changed the mangling itself. On the other hand this code appears to rely on implementation details. I.e. we're setting the new name on `FD`, which may or may not be the same as `ND`, but we're always passing `ND` to `getMangledNameImpl()`. Perhaps we could implement name-tweaking as another `MultiVersionKind` which we already plumb into getMangledNameImpl() and which allows changing the name for target attributes & features.

yaxunl marked an inline comment as done.Nov 14 2019, 9:02 AM

yaxunl added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
1128–1137	The mangled name of an instantiated template function does not depend on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. MultiVersion does not help us here since it only appends .postfix to the mangled name. The obstacle we are facing is how to change the unmangled name.

tra added inline comments.Nov 14 2019, 9:36 AM

clang/lib/CodeGen/CodeGenModule.cpp
1128–1137	> The mangled name of an instantiated template function does not depend on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone.

That would be putting more effort into working around the fact that `getMangledNameImpl()` does not provide a good API to change the name the way you need to. That's what needs to be addressed, IMO.

> MultiVersion does not help us here since it only appends .postfix to the mangled name. The obstacle we are facing is how to change the unmangled name.

Some existing implementations append to the mangled name, but we can do other manipulations there, too. The string with the mangled name originates in `getMangledNameImpl` and we could do more than just append to it. We do not have to use the `MultiVersion` for that, either. E.g. we prepend `__regcall3__` to the names of functions with `CC_X86RegCall` calling convention. We could do something similar for the kernel stub.

I wonder if we could just generate a unique name and be done with that? Hmm. A unique name probably would not do if, let's say, a kernel is defined in one TU, but we need to call it from another TU. So, whichever way we change the name of the stub, it will need to be the same everywhere. You may want to add a test verifying that launching of declaration-only kernels uses the right name.

Consistency of name mangling means that we do need to include regular C++-mangled information, which means we need to do the name tweaking deeper down.

How about using calling conventions? It's been suggested in the past that a lot of shenanigans around kernel launches could/should be done as a different calling convention. One of the things affected by the calling convention is mangling and we can add the prefix there: https://github.com/llvm/llvm-project/blob/master/clang/lib/AST/Mangle.cpp#L164

We could tag the host-side kernel with a 'kernel call' calling convention on the host side and then plumb prefixing to be done similar to `__regcall3__`. If that works, that may be a useful improvement overall. For instance, we may no longer need to stash an `it's a kernel` flag among attributes and it would probably be useful for other things (e.g. enforcing address space requirements for kernel pointer arguments).
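The calling-convention idea can be sketched as a name adjustment applied before mangling. The `__regcall3__` prefix for `CC_X86RegCall` is taken from the comment above; the `CallConv` enum, the `KernelStub` value, and `nameForMangling` are hypothetical and only illustrate where such a hook could sit.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced set of calling conventions. KernelStub is the
// proposed convention from the discussion, not an existing one.
enum class CallConv { Default, X86RegCall, KernelStub };

// Returns the name that would be handed to the C++ mangler, so the usual
// mangled information (length, parameters, ...) still applies afterwards.
std::string nameForMangling(const std::string &Name, CallConv CC) {
  switch (CC) {
  case CallConv::X86RegCall:
    return "__regcall3__" + Name;
  case CallConv::KernelStub:
    return "__device__stub__" + Name;
  case CallConv::Default:
    break;
  }
  return Name;
}
```

Because the adjustment depends only on the declaration's name and convention, every translation unit would compute the same stub name, which is the consistency requirement raised above.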

yaxunl added inline comments.Nov 14 2019, 2:55 PM

clang/lib/CodeGen/CodeGenModule.cpp
1128–1137	Will add a test for a declaration-only kernel. At least for the current implementation I see it works. A declaration of the stub function with the expected name is emitted and can be called. About the calling convention: I've tried implementing `__global__` as a calling convention before. The issue is that it is part of the type system and clang enforces type checking for that, e.g. you cannot assign it to an ordinary function pointer unless that function pointer is also declared with the same calling convention. This will cause lots of type mismatch issues. In the CUDA language, `__global__` is not part of the type system since it is just an attribute. We could introduce a calling convention for the stub, but probably we can only use it when we mangle the stub function.

tra added inline comments.Nov 14 2019, 3:21 PM

clang/lib/CodeGen/CodeGenModule.cpp
1128–1137	OK. I'm fresh out of ideas. We should add some sort of assert to make sure that the mangled name does have the prefix we intended to add. Also a TODO to figure out a better way to add a name prefix before mangling. If anyone else has other suggestions, please chime in.
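The assertion suggested here could look roughly like the following. It is a sketch that assumes the `__device__stub__` prefix from the summary; `hasStubPrefix` and `checkedStubName` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Prefix taken from the revision summary; helper names are hypothetical.
static const char StubPrefix[] = "__device__stub__";

// True if the mangled name carries the stub prefix anywhere in it. In an
// Itanium-mangled name the prefix follows "_Z<length>", so a substring
// search is used rather than a starts-with check.
bool hasStubPrefix(const std::string &MangledName) {
  return MangledName.find(StubPrefix) != std::string::npos;
}

// Guards against silently emitting a stub under the kernel's own name.
std::string checkedStubName(std::string MangledName) {
  assert(hasStubPrefix(MangledName) && "stub name lost its prefix");
  return MangledName;
}
```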

use calling convention to mangle the stub differently.

clean up and fix assertions.

Herald added subscribers: nhaehnle, jvesely. · View Herald TranscriptDec 3 2019, 12:52 PM

ping

I mean, I made a recommendation and you dismissed it.

Revised by John's comments. Introduced HIPKernelType for GlobalDecl so that we can use GlobalDecl to
represent stub and kernel during host compilation. Revised mangler so that GlobalDecl carrying
pertinent information can be passed through.

Herald added a subscriber: kerbowa. · View Herald TranscriptFeb 17 2020, 9:00 AM

Thanks, I think this approach is really improving the existing code.

clang/include/clang/AST/GlobalDecl.h
40	The attribute here is `CUDAGlobalAttr`; should this be named in terms of CUDA, or is the CUDA model sufficiently different from HIP that the same implementation concept doesn't apply?
61	This function exists primarily to be used as a common initializer for all the constructors that don't require any of the extra fields. (This file predates LLVM's adoption of C++14, which allows constructors to delegate to other constructors.) That's why it asserts that it's not used with constructors or destructors. So, two questions: Is there a reason this function now needs to tolerate null declarations? There's some subtle implicit behavior here, in that references to kernels default to `HIPKernelKind::Kernel`. Is that reasonable, or should there be a third assertion that this function isn't used with kernel declarations?
140	`explicit`, please.
177	The indentation here seems odd.
clang/lib/AST/ItaniumMangle.cpp
397	Does passing down a `GlobalDecl` everywhere allow us to remove these constructors, i.e. to eliminate the `Structor` and `StructorType` fields?
641	This can just be `cast`, except actually I don't think you need a cast here at all given the code below.
814	There's a default constructor.
1615	The relevant wording from the Itanium spec here is: For entities in constructors and destructors, the mangling of the complete object constructor or destructor is used as the base function name, i.e. the C1 or D1 version. But you might consider pulling this out as a helper function, something like: static GlobalDecl getParentOfLocalEntity(const DeclContext *DC);
1617	This can still be `cast`.
1907	You can just pass `GlobalDecl()` here.
4992–4993	The second assertion can just be removed now, since the GD should be carrying the right information.
clang/lib/CodeGen/CodeGenModule.cpp
1025–1026	Let's see if we can make this breakdown no longer necessary, since `MangleContext::mangleName` should be capable of doing the right thing starting straight from a GD. In fact, maybe we can remove most of the specialized mangling methods (like `mangleCXXCtor` and `mangleCXXDtor`) from `MangleContext` completely? Unrelatedly: there's an `Out` declared in the outermost scope, but a bunch of these scopes declare their own `Out`; could you just fix that while you're editing this function?
1039	Is this the best way of handling this, or should `shouldMangleDeclName` return true for kernels (or at least stubs?) even in C? Honest question.
3340	Should this be handled in the caller, or would that make things unreasonably difficult?

Nice! Thank you for making these changes.

clang/include/clang/AST/GlobalDecl.h
40	I believe the attribute serves the same purpose in both CUDA and HIP and could be renamed appropriately in a separate patch. While the changes in this patch are not required for CUDA, CUDA would benefit from them. We could use a generic GPU prefix and migrate CUDA to the same model later. A TODO comment about that would be nice.
clang/lib/CodeGen/CodeGenModule.cpp
1025–1026	Perhaps it would make sense to split this patch into two -- one that changes mangler input to GlobalDecl and the other one dealing with HIP stubs.

rjmccall added inline comments.Feb 18 2020, 9:58 AM

clang/include/clang/AST/GlobalDecl.h
40	I'd just like consistency. If they're serving the same purpose, then as someone with no dog in this fight, I would give precedence to CUDA over HIP in names since it's both the older language and was implemented first in Clang (even if only partially, IIUC). I don't think a generic name works unless we can meaningfully generalize it to all languages with a similar feature, e.g. OpenCL and so on.
clang/lib/CodeGen/CodeGenModule.cpp
1025–1026	Yes, that's a good idea.

tra added inline comments.Feb 18 2020, 10:18 AM

clang/include/clang/AST/GlobalDecl.h
40	Naming, the hardest problem in computer science. :-) I personally would prefer generalization-with-exclusions over a specific name which inconsistently commingles things that are really specific to that name and things that are more widely used. Alas, right now CUDA is the example of the latter case -- some parts are CUDA-specific and a lot are shared with HIP. For the new features we've been sort of sticking with using CUDA/HIP for specific parts and GPU for shared functionality, but as things are, a lot of shared bits are still 'CUDA' and it's hard to tell them apart. As you point out, renaming the incumbent names would be a pain, so here we are. I think using `GPUKernelKind` with a comment that it reflects HIP & CUDA kernels would be a somewhat better choice than `CUDAKernelKind`, which would be somewhat confusing at this point given that CUDA actually does not use it yet. I'm also fine with keeping it `HIPKernelKind` and postponing the naming decision until CUDA-related parts are actually implemented.

rjmccall added inline comments.Feb 19 2020, 7:24 AM

clang/include/clang/AST/GlobalDecl.h
40	Maybe `KernelReferenceKind`? It's probably a common concept across all heterogeneous-computing models.

tra added inline comments.Feb 19 2020, 9:18 AM

clang/include/clang/AST/GlobalDecl.h
40	SGTM.

yaxunl marked 38 inline comments as done.Mar 5 2020, 10:30 AM

yaxunl added inline comments.

clang/include/clang/AST/GlobalDecl.h
40	changed.
61	By using the default constructor of GlobalDecl, tolerating a null declaration is no longer necessary. I have reverted the change to the assert. I think it is a good idea to force GlobalDecl of kernels to be instantiated with the specific ctor. I have added an assert to Init to prevent kernels from being instantiated through it.
140	fixed
177	fixed
clang/lib/AST/ItaniumMangle.cpp
397	Eliminating the Structor and StructorType fields will incur significantly more changes. Can it be done later? Thanks.
641	removed cast
814	fixed
1615	extracted to getParentOfLocalEntity
1617	fixed
1907	fixed
4992–4993	removed
clang/lib/CodeGen/CodeGenModule.cpp
1025–1026	Fixed the redundant Out var. However, removing mangleCXXCtor/Dtor will incur significantly more changes. Can it be done later? Thanks.
1025–1026	separated to https://reviews.llvm.org/D75700
1039	This is for extern "C" kernels, which are either not mangled or with simple prefix. I tried returning true for them in shouldMangleDeclName, and they got mangled as Itanium mangling, which seems not right.
3340	fixed
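The `extern "C"` case mentioned for line 1039 can be sketched as follows: per the summary, such kernels also get the prefix on their stub, but without Itanium mangling. `stubSymbol` is a hypothetical helper, and the C++ branch only covers a global function with no parameters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper computing the host-side stub symbol for a kernel.
std::string stubSymbol(const std::string &Name, bool IsExternC) {
  const std::string Prefixed = "__device__stub__" + Name;
  if (IsExternC)
    return Prefixed; // extern "C": simple prefix, no Itanium mangling
  // C++ linkage, simplest case only: "_Z" <length> <name> "v".
  return "_Z" + std::to_string(Prefixed.size()) + Prefixed + "v";
}
```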

Revised by John's and Artem's comments.

yaxunl added a parent revision: D75700: [NFC] Let mangler accept GlobalDecl.Mar 5 2020, 10:37 AM

update patch

Few nits. LGTM otherwise.

clang/include/clang/AST/GlobalDecl.h
61	Wording inconsistency -- we're checking for `CUDAGlobalAttr` but complaining about 'HIP kernels'. Just drop 'HIP' or replace with 'GPU'?
85	Ditto.
129	Same wording nit.
188	Ditto.
clang/lib/CodeGen/CGCUDARuntime.h
70–72	Adding a descriptive comment would be great. Otherwise anyone looking at the function decl without the context of this patch will be puzzled about its meaning and purpose. Also, perhaps the argument type should be a `NamedDecl` -- the function is not used on or useful for a regular `Decl`. It will save us a few casts in other places, too.

This revision is now accepted and ready to land.Mar 9 2020, 9:22 AM

yaxunl marked 9 inline comments as done.Mar 9 2020, 1:42 PM

yaxunl added inline comments.

clang/include/clang/AST/GlobalDecl.h
61	will use GPU kernel
clang/lib/CodeGen/CGCUDARuntime.h
70–72	done

Closed by commit rG22c457a869d5: [HIP] Fix device stub name (authored by yaxunl). · Explain WhyMar 9 2020, 2:03 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked 2 inline comments as done.

Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2020, 2:03 PM

yaxunl mentioned this in D113491: [HIP] Fix device stub name for Windows.Nov 9 2021, 8:27 AM

yaxunl mentioned this in rG38211bbab1d9: [HIP] Fix device stub name for Windows.Nov 23 2021, 9:04 AM

Revision Contents

Path	Size
clang/include/clang/AST/GlobalDecl.h	38 lines
clang/lib/AST/Expr.cpp	2 lines
clang/lib/AST/ItaniumMangle.cpp	26 lines
clang/lib/AST/Mangle.cpp	2 lines
clang/lib/CodeGen/ (file names missing in this archive)	39, 6, 2, 4, 3, 36 lines
clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu	12 lines
clang/test/CodeGenCUDA/kernel-stub-name.cu	45 lines
clang/test/CodeGenCUDA/unnamed-types.cu	2 lines

Diff 249212

clang/include/clang/AST/GlobalDecl.h

//===- GlobalDecl.h - Global declaration holder ------------------ C++ --===//		//===- GlobalDecl.h - Global declaration holder ------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// A GlobalDecl can hold either a regular variable/function or a C++ ctor/dtor		// A GlobalDecl can hold either a regular variable/function or a C++ ctor/dtor
// together with its type.		// together with its type.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_AST_GLOBALDECL_H		#ifndef LLVM_CLANG_AST_GLOBALDECL_H
#define LLVM_CLANG_AST_GLOBALDECL_H		#define LLVM_CLANG_AST_GLOBALDECL_H

		#include "clang/AST/Attr.h"
#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclObjC.h"		#include "clang/AST/DeclObjC.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/Basic/ABI.h"		#include "clang/Basic/ABI.h"
#include "clang/Basic/LLVM.h"		#include "clang/Basic/LLVM.h"
#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/PointerIntPair.h"		#include "llvm/ADT/PointerIntPair.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/type_traits.h"		#include "llvm/Support/type_traits.h"
#include <cassert>		#include <cassert>

namespace clang {		namespace clang {

enum class DynamicInitKind : unsigned {		enum class DynamicInitKind : unsigned {
NoStub = 0,		NoStub = 0,
Initializer,		Initializer,
AtExit,		AtExit,
};		};

		enum class KernelReferenceKind : unsigned {
		Kernel = 0,
		Stub = 1,
		};

/// GlobalDecl - represents a global declaration. This can either be a		/// GlobalDecl - represents a global declaration. This can either be a
/// CXXConstructorDecl and the constructor type (Base, Complete).		/// CXXConstructorDecl and the constructor type (Base, Complete).
/// a CXXDestructorDecl and the destructor type (Base, Complete),		/// a CXXDestructorDecl and the destructor type (Base, Complete),
/// a FunctionDecl and the kernel reference type (Kernel, Stub), or		/// a FunctionDecl and the kernel reference type (Kernel, Stub), or
/// a VarDecl, a FunctionDecl or a BlockDecl.		/// a VarDecl, a FunctionDecl or a BlockDecl.
///		///
/// When a new type of GlobalDecl is added, the following places should		/// When a new type of GlobalDecl is added, the following places should
/// be updated to convert a Decl* to a GlobalDecl:		/// be updated to convert a Decl* to a GlobalDecl:
/// PredefinedExpr::ComputeName() in lib/AST/Expr.cpp.		/// PredefinedExpr::ComputeName() in lib/AST/Expr.cpp.
/// getParentOfLocalEntity() in lib/AST/ItaniumMangle.cpp		/// getParentOfLocalEntity() in lib/AST/ItaniumMangle.cpp
/// ASTNameGenerator::Implementation::writeFuncOrVarName in lib/AST/Mangle.cpp		/// ASTNameGenerator::Implementation::writeFuncOrVarName in lib/AST/Mangle.cpp
///		///
class GlobalDecl {		class GlobalDecl {
llvm::PointerIntPair<const Decl *, 3> Value;		llvm::PointerIntPair<const Decl *, 3> Value;
unsigned MultiVersionIndex = 0;		unsigned MultiVersionIndex = 0;

void Init(const Decl *D) {		void Init(const Decl *D) {
assert(!isa<CXXConstructorDecl>(D) && "Use other ctor with ctor decls!");		assert(!isa<CXXConstructorDecl>(D) && "Use other ctor with ctor decls!");
assert(!isa<CXXDestructorDecl>(D) && "Use other ctor with dtor decls!");		assert(!isa<CXXDestructorDecl>(D) && "Use other ctor with dtor decls!");
		assert(!D->hasAttr<CUDAGlobalAttr>() && "Use other ctor with GPU kernels!");

Value.setPointer(D);		Value.setPointer(D);
}		}

public:		public:
GlobalDecl() = default;		GlobalDecl() = default;
GlobalDecl(const VarDecl *D) { Init(D);}		GlobalDecl(const VarDecl *D) { Init(D);}
GlobalDecl(const FunctionDecl *D, unsigned MVIndex = 0)		GlobalDecl(const FunctionDecl *D, unsigned MVIndex = 0)
: MultiVersionIndex(MVIndex) {		: MultiVersionIndex(MVIndex) {
Init(D);		Init(D);
}		}
GlobalDecl(const NamedDecl *D) { Init(D); }		GlobalDecl(const NamedDecl *D) { Init(D); }
GlobalDecl(const BlockDecl *D) { Init(D); }		GlobalDecl(const BlockDecl *D) { Init(D); }
GlobalDecl(const CapturedDecl *D) { Init(D); }		GlobalDecl(const CapturedDecl *D) { Init(D); }
GlobalDecl(const ObjCMethodDecl *D) { Init(D); }		GlobalDecl(const ObjCMethodDecl *D) { Init(D); }
GlobalDecl(const OMPDeclareReductionDecl *D) { Init(D); }		GlobalDecl(const OMPDeclareReductionDecl *D) { Init(D); }
GlobalDecl(const OMPDeclareMapperDecl *D) { Init(D); }		GlobalDecl(const OMPDeclareMapperDecl *D) { Init(D); }
GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type) : Value(D, Type) {}		GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type) : Value(D, Type) {}
GlobalDecl(const CXXDestructorDecl *D, CXXDtorType Type) : Value(D, Type) {}		GlobalDecl(const CXXDestructorDecl *D, CXXDtorType Type) : Value(D, Type) {}
GlobalDecl(const VarDecl *D, DynamicInitKind StubKind)		GlobalDecl(const VarDecl *D, DynamicInitKind StubKind)
: Value(D, unsigned(StubKind)) {}		: Value(D, unsigned(StubKind)) {}
		GlobalDecl(const FunctionDecl *D, KernelReferenceKind Kind)
		: Value(D, unsigned(Kind)) {
		assert(D->hasAttr<CUDAGlobalAttr>() && "Decl is not a GPU kernel!");
		}

GlobalDecl getCanonicalDecl() const {		GlobalDecl getCanonicalDecl() const {
GlobalDecl CanonGD;		GlobalDecl CanonGD;
CanonGD.Value.setPointer(Value.getPointer()->getCanonicalDecl());		CanonGD.Value.setPointer(Value.getPointer()->getCanonicalDecl());
CanonGD.Value.setInt(Value.getInt());		CanonGD.Value.setInt(Value.getInt());
CanonGD.MultiVersionIndex = MultiVersionIndex;		CanonGD.MultiVersionIndex = MultiVersionIndex;

return CanonGD;		return CanonGD;
Show All 14 Lines	public:
DynamicInitKind getDynamicInitKind() const {		DynamicInitKind getDynamicInitKind() const {
assert(isa<VarDecl>(getDecl()) &&		assert(isa<VarDecl>(getDecl()) &&
cast<VarDecl>(getDecl())->hasGlobalStorage() &&		cast<VarDecl>(getDecl())->hasGlobalStorage() &&
"Decl is not a global variable!");		"Decl is not a global variable!");
return static_cast<DynamicInitKind>(Value.getInt());		return static_cast<DynamicInitKind>(Value.getInt());
}		}

unsigned getMultiVersionIndex() const {		unsigned getMultiVersionIndex() const {
assert(isa<FunctionDecl>(getDecl()) &&		assert(isa<FunctionDecl>(
		getDecl()) &&
		!cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
!isa<CXXConstructorDecl>(getDecl()) &&		!isa<CXXConstructorDecl>(getDecl()) &&
!isa<CXXDestructorDecl>(getDecl()) &&		!isa<CXXDestructorDecl>(getDecl()) &&
"Decl is not a plain FunctionDecl!");		"Decl is not a plain FunctionDecl!");
return MultiVersionIndex;		return MultiVersionIndex;
}		}

		KernelReferenceKind getKernelReferenceKind() const {
		assert(isa<FunctionDecl>(getDecl()) &&
		cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
		"Decl is not a GPU kernel!");
		tra: Same wording nit.
		return static_cast<KernelReferenceKind>(Value.getInt());
		}

friend bool operator==(const GlobalDecl &LHS, const GlobalDecl &RHS) {		friend bool operator==(const GlobalDecl &LHS, const GlobalDecl &RHS) {
return LHS.Value == RHS.Value &&		return LHS.Value == RHS.Value &&
LHS.MultiVersionIndex == RHS.MultiVersionIndex;		LHS.MultiVersionIndex == RHS.MultiVersionIndex;
}		}

void *getAsOpaquePtr() const { return Value.getOpaqueValue(); }		void *getAsOpaquePtr() const { return Value.getOpaqueValue(); }

explicit operator bool() const { return getAsOpaquePtr(); }		explicit operator bool() const { return getAsOpaquePtr(); }
		rjmccall: `explicit`, please.
		yaxunl (author): fixed

static GlobalDecl getFromOpaquePtr(void *P) {		static GlobalDecl getFromOpaquePtr(void *P) {
GlobalDecl GD;		GlobalDecl GD;
GD.Value.setFromOpaqueValue(P);		GD.Value.setFromOpaqueValue(P);
return GD;		return GD;
}		}

		static GlobalDecl getDefaultKernelReference(const FunctionDecl *D) {
		return GlobalDecl(D, D->getASTContext().getLangOpts().CUDAIsDevice
		? KernelReferenceKind::Kernel
		: KernelReferenceKind::Stub);
		}
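
The default chosen by `getDefaultKernelReference` can be shown in isolation. This is a minimal sketch, not the Clang implementation: `LangOptsSketch` and `defaultKernelReference` are hypothetical stand-ins, and the enumerator values of `KernelReferenceKind` are assumed rather than taken from the patch.

```cpp
#include <cassert>

// Stand-in for clang::KernelReferenceKind; the numeric values are assumed.
enum class KernelReferenceKind : unsigned { Kernel = 0, Stub = 1 };

// Hypothetical stand-in for the single LangOptions flag the helper reads.
struct LangOptsSketch {
  bool CUDAIsDevice = false;
};

// Device compilation refers to the kernel itself;
// host compilation refers to the device stub.
KernelReferenceKind defaultKernelReference(const LangOptsSketch &LO) {
  return LO.CUDAIsDevice ? KernelReferenceKind::Kernel
                         : KernelReferenceKind::Stub;
}
```

The point of the default is that callers which only have a `FunctionDecl` (such as the `Expr.cpp` and `Mangle.cpp` call sites in this diff) get the reference kind that matches the current compilation side without naming it explicitly.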

GlobalDecl getWithDecl(const Decl *D) {		GlobalDecl getWithDecl(const Decl *D) {
GlobalDecl Result(*this);		GlobalDecl Result(*this);
Result.Value.setPointer(D);		Result.Value.setPointer(D);
return Result;		return Result;
}		}

GlobalDecl getWithCtorType(CXXCtorType Type) {		GlobalDecl getWithCtorType(CXXCtorType Type) {
assert(isa<CXXConstructorDecl>(getDecl()));		assert(isa<CXXConstructorDecl>(getDecl()));
GlobalDecl Result(*this);		GlobalDecl Result(*this);
Result.Value.setInt(Type);		Result.Value.setInt(Type);
return Result;		return Result;
}		}

GlobalDecl getWithDtorType(CXXDtorType Type) {		GlobalDecl getWithDtorType(CXXDtorType Type) {
assert(isa<CXXDestructorDecl>(getDecl()));		assert(isa<CXXDestructorDecl>(getDecl()));
GlobalDecl Result(*this);		GlobalDecl Result(*this);
Result.Value.setInt(Type);		Result.Value.setInt(Type);
return Result;		return Result;
}		}

GlobalDecl getWithMultiVersionIndex(unsigned Index) {		GlobalDecl getWithMultiVersionIndex(unsigned Index) {
assert(isa<FunctionDecl>(getDecl()) &&		assert(isa<FunctionDecl>(getDecl()) &&
		!cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
!isa<CXXConstructorDecl>(getDecl()) &&		!isa<CXXConstructorDecl>(getDecl()) &&
		rjmccall: The indentation here seems odd.
		yaxunl (author): fixed
!isa<CXXDestructorDecl>(getDecl()) &&		!isa<CXXDestructorDecl>(getDecl()) &&
"Decl is not a plain FunctionDecl!");		"Decl is not a plain FunctionDecl!");
GlobalDecl Result(*this);		GlobalDecl Result(*this);
Result.MultiVersionIndex = Index;		Result.MultiVersionIndex = Index;
return Result;		return Result;
}		}

		GlobalDecl getWithKernelReferenceKind(KernelReferenceKind Kind) {
		assert(isa<FunctionDecl>(getDecl()) &&
		cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
		"Decl is not a GPU kernel!");
		tra: Ditto.
		GlobalDecl Result(*this);
		Result.Value.setInt(unsigned(Kind));
		return Result;
		}
};		};

} // namespace clang		} // namespace clang

namespace llvm {		namespace llvm {

template<> struct DenseMapInfo<clang::GlobalDecl> {		template<> struct DenseMapInfo<clang::GlobalDecl> {
static inline clang::GlobalDecl getEmptyKey() {		static inline clang::GlobalDecl getEmptyKey() {
Show All 21 Lines

clang/lib/AST/Expr.cpp

Show First 20 Lines • Show All 649 Lines • ▼ Show 20 Lines	if (const NamedDecl *ND = dyn_cast<NamedDecl>(CurrentDecl)) {
if (MC->shouldMangleDeclName(ND)) {		if (MC->shouldMangleDeclName(ND)) {
SmallString<256> Buffer;		SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
GlobalDecl GD;		GlobalDecl GD;
if (const CXXConstructorDecl *CD = dyn_cast<CXXConstructorDecl>(ND))		if (const CXXConstructorDecl *CD = dyn_cast<CXXConstructorDecl>(ND))
GD = GlobalDecl(CD, Ctor_Base);		GD = GlobalDecl(CD, Ctor_Base);
else if (const CXXDestructorDecl *DD = dyn_cast<CXXDestructorDecl>(ND))		else if (const CXXDestructorDecl *DD = dyn_cast<CXXDestructorDecl>(ND))
GD = GlobalDecl(DD, Dtor_Base);		GD = GlobalDecl(DD, Dtor_Base);
		else if (ND->hasAttr<CUDAGlobalAttr>())
		GD = GlobalDecl::getDefaultKernelReference(cast<FunctionDecl>(ND));
else		else
GD = GlobalDecl(ND);		GD = GlobalDecl(ND);
MC->mangleName(GD, Out);		MC->mangleName(GD, Out);

if (!Buffer.empty() && Buffer.front() == '\01')		if (!Buffer.empty() && Buffer.front() == '\01')
return std::string(Buffer.substr(1));		return std::string(Buffer.substr(1));
return std::string(Buffer.str());		return std::string(Buffer.str());
} else		} else
▲ Show 20 Lines • Show All 3,978 Lines • Show Last 20 Lines

clang/lib/AST/ItaniumMangle.cpp

Show First 20 Lines • Show All 388 Lines • ▼ Show 20 Lines	public:
}		}
CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,		CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,
const CXXConstructorDecl *D, CXXCtorType Type)		const CXXConstructorDecl *D, CXXCtorType Type)
: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),		: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),
SeqID(0), AbiTagsRoot(AbiTags) { }		SeqID(0), AbiTagsRoot(AbiTags) { }
CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,		CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,
const CXXDestructorDecl *D, CXXDtorType Type)		const CXXDestructorDecl *D, CXXDtorType Type)
: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),		: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),
SeqID(0), AbiTagsRoot(AbiTags) { }		SeqID(0), AbiTagsRoot(AbiTags) { }
		rjmccall: Does passing down a `GlobalDecl` everywhere allow us to remove these constructors, i.e. to eliminate the `Structor` and `StructorType` fields?
		yaxunl (author): Eliminating the `Structor` and `StructorType` fields will incur significantly more changes. Can it be done later? Thanks.

CXXNameMangler(CXXNameMangler &Outer, raw_ostream &Out_)		CXXNameMangler(CXXNameMangler &Outer, raw_ostream &Out_)
: Context(Outer.Context), Out(Out_), NullOut(false),		: Context(Outer.Context), Out(Out_), NullOut(false),
Structor(Outer.Structor), StructorType(Outer.StructorType),		Structor(Outer.Structor), StructorType(Outer.StructorType),
SeqID(Outer.SeqID), FunctionTypeDepth(Outer.FunctionTypeDepth),		SeqID(Outer.SeqID), FunctionTypeDepth(Outer.FunctionTypeDepth),
AbiTagsRoot(AbiTags), Substitutions(Outer.Substitutions) {}		AbiTagsRoot(AbiTags), Substitutions(Outer.Substitutions) {}

CXXNameMangler(CXXNameMangler &Outer, llvm::raw_null_ostream &Out_)		CXXNameMangler(CXXNameMangler &Outer, llvm::raw_null_ostream &Out_)
▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	private:
void mangleUnscopedName(GlobalDecl GD,		void mangleUnscopedName(GlobalDecl GD,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleUnscopedTemplateName(GlobalDecl GD,		void mangleUnscopedTemplateName(GlobalDecl GD,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleUnscopedTemplateName(TemplateName,		void mangleUnscopedTemplateName(TemplateName,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleSourceName(const IdentifierInfo *II);		void mangleSourceName(const IdentifierInfo *II);
void mangleRegCallName(const IdentifierInfo *II);		void mangleRegCallName(const IdentifierInfo *II);
		void mangleDeviceStubName(const IdentifierInfo *II);
void mangleSourceNameWithAbiTags(		void mangleSourceNameWithAbiTags(
const NamedDecl *ND, const AbiTagList *AdditionalAbiTags = nullptr);		const NamedDecl *ND, const AbiTagList *AdditionalAbiTags = nullptr);
void mangleLocalName(GlobalDecl GD,		void mangleLocalName(GlobalDecl GD,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleBlockForPrefix(const BlockDecl *Block);		void mangleBlockForPrefix(const BlockDecl *Block);
void mangleUnqualifiedBlock(const BlockDecl *Block);		void mangleUnqualifiedBlock(const BlockDecl *Block);
void mangleTemplateParamDecl(const NamedDecl *Decl);		void mangleTemplateParamDecl(const NamedDecl *Decl);
void mangleLambda(const CXXRecordDecl *Lambda);		void mangleLambda(const CXXRecordDecl *Lambda);
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines

void CXXNameMangler::mangleSourceNameWithAbiTags(		void CXXNameMangler::mangleSourceNameWithAbiTags(
const NamedDecl *ND, const AbiTagList *AdditionalAbiTags) {		const NamedDecl *ND, const AbiTagList *AdditionalAbiTags) {
mangleSourceName(ND->getIdentifier());		mangleSourceName(ND->getIdentifier());
writeAbiTags(ND, AdditionalAbiTags);		writeAbiTags(ND, AdditionalAbiTags);
}		}

void CXXNameMangler::mangle(GlobalDecl GD) {		void CXXNameMangler::mangle(GlobalDecl GD) {
// <mangled-name> ::= _Z <encoding>		// <mangled-name> ::= _Z <encoding>
		rjmccall: This can just be `cast`, except actually I don't think you need a cast here at all given the code below.
		yaxunl (author): removed cast
// ::= <data name>		// ::= <data name>
// ::= <special-name>		// ::= <special-name>
Out << "_Z";		Out << "_Z";
if (isa<FunctionDecl>(GD.getDecl()))		if (isa<FunctionDecl>(GD.getDecl()))
mangleFunctionEncoding(GD);		mangleFunctionEncoding(GD);
else if (const VarDecl *VD = dyn_cast<VarDecl>(GD.getDecl()))		else if (const VarDecl *VD = dyn_cast<VarDecl>(GD.getDecl()))
mangleName(VD);		mangleName(VD);
else if (const IndirectFieldDecl *IFD =		else if (const IndirectFieldDecl *IFD =
▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	isTemplate(GlobalDecl GD, const TemplateArgumentList *&TemplateArgs) {

// Check if we have a variable template.		// Check if we have a variable template.
if (const VarTemplateSpecializationDecl *Spec =		if (const VarTemplateSpecializationDecl *Spec =
dyn_cast<VarTemplateSpecializationDecl>(ND)) {		dyn_cast<VarTemplateSpecializationDecl>(ND)) {
TemplateArgs = &Spec->getTemplateArgs();		TemplateArgs = &Spec->getTemplateArgs();
return GD.getWithDecl(Spec->getSpecializedTemplate());		return GD.getWithDecl(Spec->getSpecializedTemplate());
}		}

return GlobalDecl();		return GlobalDecl();
		rjmccall: There's a default constructor.
		yaxunl (author): fixed
}		}

void CXXNameMangler::mangleName(GlobalDecl GD) {		void CXXNameMangler::mangleName(GlobalDecl GD) {
const NamedDecl *ND = cast<NamedDecl>(GD.getDecl());		const NamedDecl *ND = cast<NamedDecl>(GD.getDecl());
if (const VarDecl *VD = dyn_cast<VarDecl>(ND)) {		if (const VarDecl *VD = dyn_cast<VarDecl>(ND)) {
// Variables should have implicit tags from its type.		// Variables should have implicit tags from its type.
AbiTagList VariableTypeAbiTags = makeVariableTypeTags(VD);		AbiTagList VariableTypeAbiTags = makeVariableTypeTags(VD);
if (VariableTypeAbiTags.empty()) {		if (VariableTypeAbiTags.empty()) {
▲ Show 20 Lines • Show All 480 Lines • ▼ Show 20 Lines	if (II) {
getEffectiveDeclContext(ND)->isFileContext() &&		getEffectiveDeclContext(ND)->isFileContext() &&
!ND->isInAnonymousNamespace())		!ND->isInAnonymousNamespace())
Out << 'L';		Out << 'L';

auto *FD = dyn_cast<FunctionDecl>(ND);		auto *FD = dyn_cast<FunctionDecl>(ND);
bool IsRegCall = FD &&		bool IsRegCall = FD &&
FD->getType()->castAs<FunctionType>()->getCallConv() ==		FD->getType()->castAs<FunctionType>()->getCallConv() ==
clang::CC_X86RegCall;		clang::CC_X86RegCall;
if (IsRegCall)		bool IsDeviceStub =
		FD && FD->hasAttr<CUDAGlobalAttr>() &&
		GD.getKernelReferenceKind() == KernelReferenceKind::Stub;
		if (IsDeviceStub)
		mangleDeviceStubName(II);
		else if (IsRegCall)
mangleRegCallName(II);		mangleRegCallName(II);
else		else
mangleSourceName(II);		mangleSourceName(II);

writeAbiTags(ND, AdditionalAbiTags);		writeAbiTags(ND, AdditionalAbiTags);
break;		break;
}		}

▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines
void CXXNameMangler::mangleRegCallName(const IdentifierInfo *II) {		void CXXNameMangler::mangleRegCallName(const IdentifierInfo *II) {
// <source-name> ::= <positive length number> __regcall3__ <identifier>		// <source-name> ::= <positive length number> __regcall3__ <identifier>
// <number> ::= [n] <non-negative decimal integer>		// <number> ::= [n] <non-negative decimal integer>
// <identifier> ::= <unqualified source code identifier>		// <identifier> ::= <unqualified source code identifier>
Out << II->getLength() + sizeof("__regcall3__") - 1 << "__regcall3__"		Out << II->getLength() + sizeof("__regcall3__") - 1 << "__regcall3__"
<< II->getName();		<< II->getName();
}		}

		void CXXNameMangler::mangleDeviceStubName(const IdentifierInfo *II) {
		// <source-name> ::= <positive length number> __device_stub__ <identifier>
		// <number> ::= [n] <non-negative decimal integer>
		// <identifier> ::= <unqualified source code identifier>
		Out << II->getLength() + sizeof("__device_stub__") - 1 << "__device_stub__"
		<< II->getName();
		}
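
The length arithmetic in `mangleDeviceStubName` can be checked with a small standalone sketch. It is a toy model rather than Clang's mangler: `sourceNameWithPrefix` and `mangleVoidFunction` are hypothetical helpers, and only the simplest case (a namespace-scope function taking no arguments) is covered.

```cpp
#include <cassert>
#include <string>

// Builds an Itanium-style <source-name>: a decimal length followed by the text.
// The length counts the prefix as well as the identifier.
std::string sourceNameWithPrefix(const std::string &Prefix,
                                 const std::string &Identifier) {
  assert(!Identifier.empty() && "identifier must not be empty");
  return std::to_string(Prefix.size() + Identifier.size()) + Prefix +
         Identifier;
}

// Toy mangling of `void Identifier()` at namespace scope:
//   _Z <source-name> v
std::string mangleVoidFunction(const std::string &Identifier,
                               bool IsDeviceStub) {
  return "_Z" +
         sourceNameWithPrefix(IsDeviceStub ? "__device_stub__" : "",
                              Identifier) +
         "v";
}
```

Because the prefix is folded into the `<source-name>` before the length is written, the stub symbol remains a well-formed mangled name whose unmangled name differs from the kernel's, which the earlier `.stub` suffix did not achieve.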

void CXXNameMangler::mangleSourceName(const IdentifierInfo *II) {		void CXXNameMangler::mangleSourceName(const IdentifierInfo *II) {
// <source-name> ::= <positive length number> <identifier>		// <source-name> ::= <positive length number> <identifier>
// <number> ::= [n] <non-negative decimal integer>		// <number> ::= [n] <non-negative decimal integer>
// <identifier> ::= <unqualified source code identifier>		// <identifier> ::= <unqualified source code identifier>
Out << II->getLength() << II->getName();		Out << II->getLength() << II->getName();
}		}

void CXXNameMangler::mangleNestedName(GlobalDecl GD,		void CXXNameMangler::mangleNestedName(GlobalDecl GD,
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	static GlobalDecl getParentOfLocalEntity(const DeclContext *DC) {
// The Itanium spec says:		// The Itanium spec says:
// For entities in constructors and destructors, the mangling of the		// For entities in constructors and destructors, the mangling of the
// complete object constructor or destructor is used as the base function		// complete object constructor or destructor is used as the base function
// name, i.e. the C1 or D1 version.		// name, i.e. the C1 or D1 version.
if (auto *CD = dyn_cast<CXXConstructorDecl>(DC))		if (auto *CD = dyn_cast<CXXConstructorDecl>(DC))
GD = GlobalDecl(CD, Ctor_Complete);		GD = GlobalDecl(CD, Ctor_Complete);
else if (auto *DD = dyn_cast<CXXDestructorDecl>(DC))		else if (auto *DD = dyn_cast<CXXDestructorDecl>(DC))
GD = GlobalDecl(DD, Dtor_Complete);		GD = GlobalDecl(DD, Dtor_Complete);
		else {
		auto *FD = cast<FunctionDecl>(DC);
		// Local variables can only exist in real kernels.
		if (FD->hasAttr<CUDAGlobalAttr>())
		GD = GlobalDecl(FD, KernelReferenceKind::Kernel);
else		else
GD = GlobalDecl(cast<FunctionDecl>(DC));		GD = GlobalDecl(FD);
		}
return GD;		return GD;
}		}

void CXXNameMangler::mangleLocalName(GlobalDecl GD,		void CXXNameMangler::mangleLocalName(GlobalDecl GD,
const AbiTagList *AdditionalAbiTags) {		const AbiTagList *AdditionalAbiTags) {
const Decl *D = GD.getDecl();		const Decl *D = GD.getDecl();
// <local-name> := Z <function encoding> E <entity name> [<discriminator>]		// <local-name> := Z <function encoding> E <entity name> [<discriminator>]
// := Z <function encoding> E s [<discriminator>]		// := Z <function encoding> E s [<discriminator>]
Show All 15 Lines	else if (const BlockDecl *BD = dyn_cast<BlockDecl>(DC))
mangleBlockForPrefix(BD);		mangleBlockForPrefix(BD);
else		else
mangleFunctionEncoding(getParentOfLocalEntity(DC));		mangleFunctionEncoding(getParentOfLocalEntity(DC));

// Implicit ABI tags (from namespace) are not available in the following		// Implicit ABI tags (from namespace) are not available in the following
// entity; reset to actually emitted tags, which are available.		// entity; reset to actually emitted tags, which are available.
LocalAbiTags.setUsedAbiTags(LocalAbiTags.getEmittedAbiTags());		LocalAbiTags.setUsedAbiTags(LocalAbiTags.getEmittedAbiTags());
}		}

		rjmccall: The relevant wording from the Itanium spec here is: "For entities in constructors and destructors, the mangling of the complete object constructor or destructor is used as the base function name, i.e. the C1 or D1 version." But you might consider pulling this out as a helper function, something like: `static GlobalDecl getParentOfLocalEntity(const DeclContext *DC);`
		yaxunl (author): extracted to getParentOfLocalEntity
Out << 'E';		Out << 'E';

		rjmccall: This can still be `cast`.
		yaxunl (author): fixed
// GCC 5.3.0 doesn't emit derived ABI tags for local names but that seems to		// GCC 5.3.0 doesn't emit derived ABI tags for local names but that seems to
// be a bug that is fixed in trunk.		// be a bug that is fixed in trunk.

if (RD) {		if (RD) {
// The parameter number is omitted for the last parameter, 0 for the		// The parameter number is omitted for the last parameter, 0 for the
// second-to-last parameter, 1 for the third-to-last parameter, etc. The		// second-to-last parameter, 1 for the third-to-last parameter, etc. The
// <entity name> will of course contain a <closure-type-name>: Its		// <entity name> will of course contain a <closure-type-name>: Its
// numbering will be local to the particular argument in which it appears		// numbering will be local to the particular argument in which it appears
▲ Show 20 Lines • Show All 273 Lines • ▼ Show 20 Lines	void CXXNameMangler::mangleTemplatePrefix(TemplateName Template) {
if (TemplateDecl *TD = Template.getAsTemplateDecl())		if (TemplateDecl *TD = Template.getAsTemplateDecl())
return mangleTemplatePrefix(TD);		return mangleTemplatePrefix(TD);

if (QualifiedTemplateName *Qualified = Template.getAsQualifiedTemplateName())		if (QualifiedTemplateName *Qualified = Template.getAsQualifiedTemplateName())
manglePrefix(Qualified->getQualifier());		manglePrefix(Qualified->getQualifier());

if (OverloadedTemplateStorage *Overloaded		if (OverloadedTemplateStorage *Overloaded
= Template.getAsOverloadedTemplate()) {		= Template.getAsOverloadedTemplate()) {
mangleUnqualifiedName(GlobalDecl(), (*Overloaded->begin())->getDeclName(),		mangleUnqualifiedName(GlobalDecl(), (*Overloaded->begin())->getDeclName(),
		rjmccall: You can just pass `GlobalDecl()` here.
		yaxunl (author): fixed
UnknownArity, nullptr);		UnknownArity, nullptr);
return;		return;
}		}

DependentTemplateName *Dependent = Template.getAsDependentTemplateName();		DependentTemplateName *Dependent = Template.getAsDependentTemplateName();
assert(Dependent && "Unknown template name kind?");		assert(Dependent && "Unknown template name kind?");
if (NestedNameSpecifier *Qualifier = Dependent->getQualifier())		if (NestedNameSpecifier *Qualifier = Dependent->getQualifier())
manglePrefix(Qualifier);		manglePrefix(Qualifier);
▲ Show 20 Lines • Show All 3,068 Lines • ▼ Show 20 Lines
/// mangled name to \p os and return true. Otherwise, \p os will be unchanged		/// mangled name to \p os and return true. Otherwise, \p os will be unchanged
/// and this routine will return false. In this case, the caller should just		/// and this routine will return false. In this case, the caller should just
/// emit the identifier of the declaration (\c D->getIdentifier()) as its		/// emit the identifier of the declaration (\c D->getIdentifier()) as its
/// name.		/// name.
void ItaniumMangleContextImpl::mangleCXXName(GlobalDecl GD,		void ItaniumMangleContextImpl::mangleCXXName(GlobalDecl GD,
raw_ostream &Out) {		raw_ostream &Out) {
const NamedDecl *D = cast<NamedDecl>(GD.getDecl());		const NamedDecl *D = cast<NamedDecl>(GD.getDecl());
assert((isa<FunctionDecl>(D) || isa<VarDecl>(D)) &&		assert((isa<FunctionDecl>(D) || isa<VarDecl>(D)) &&
"Invalid mangleName() call, argument is not a variable or function!");		"Invalid mangleName() call, argument is not a variable or function!");

		rjmccall: The second assertion can just be removed now, since the GD should be carrying the right information.
		yaxunl (author): removed
PrettyStackTraceDecl CrashInfo(D, SourceLocation(),		PrettyStackTraceDecl CrashInfo(D, SourceLocation(),
getASTContext().getSourceManager(),		getASTContext().getSourceManager(),
"Mangling declaration");		"Mangling declaration");

if (auto *CD = dyn_cast<CXXConstructorDecl>(D)) {		if (auto *CD = dyn_cast<CXXConstructorDecl>(D)) {
auto Type = GD.getCtorType();		auto Type = GD.getCtorType();
CXXNameMangler Mangler(*this, Out, CD, Type);		CXXNameMangler Mangler(*this, Out, CD, Type);
return Mangler.mangle(GlobalDecl(CD, Type));		return Mangler.mangle(GlobalDecl(CD, Type));
▲ Show 20 Lines • Show All 211 Lines • Show Last 20 Lines

clang/lib/AST/Mangle.cpp

	Show First 20 Lines • Show All 437 Lines • ▼ Show 20 Lines
	private:			private:
	bool writeFuncOrVarName(const NamedDecl *D, raw_ostream &OS) {			bool writeFuncOrVarName(const NamedDecl *D, raw_ostream &OS) {
	if (MC->shouldMangleDeclName(D)) {			if (MC->shouldMangleDeclName(D)) {
	GlobalDecl GD;			GlobalDecl GD;
	if (const auto *CtorD = dyn_cast<CXXConstructorDecl>(D))			if (const auto *CtorD = dyn_cast<CXXConstructorDecl>(D))
	GD = GlobalDecl(CtorD, Ctor_Complete);			GD = GlobalDecl(CtorD, Ctor_Complete);
	else if (const auto *DtorD = dyn_cast<CXXDestructorDecl>(D))			else if (const auto *DtorD = dyn_cast<CXXDestructorDecl>(D))
	GD = GlobalDecl(DtorD, Dtor_Complete);			GD = GlobalDecl(DtorD, Dtor_Complete);
				else if (D->hasAttr<CUDAGlobalAttr>())
				GD = GlobalDecl::getDefaultKernelReference(cast<FunctionDecl>(D));
	else			else
	GD = GlobalDecl(D);			GD = GlobalDecl(D);
	MC->mangleName(GD, OS);			MC->mangleName(GD, OS);
	return false;			return false;
	} else {			} else {
	IdentifierInfo *II = D->getIdentifier();			IdentifierInfo *II = D->getIdentifier();
	if (!II)			if (!II)
	return true;			return true;
	▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCUDANV.cpp

Show First 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	llvm::Function *makeDummyFunction(llvm::FunctionType *FnTy) {
FuncBuilder.SetInsertPoint(DummyBlock);		FuncBuilder.SetInsertPoint(DummyBlock);
FuncBuilder.CreateRetVoid();		FuncBuilder.CreateRetVoid();

return DummyFunc;		return DummyFunc;
}		}

void emitDeviceStubBodyLegacy(CodeGenFunction &CGF, FunctionArgList &Args);		void emitDeviceStubBodyLegacy(CodeGenFunction &CGF, FunctionArgList &Args);
void emitDeviceStubBodyNew(CodeGenFunction &CGF, FunctionArgList &Args);		void emitDeviceStubBodyNew(CodeGenFunction &CGF, FunctionArgList &Args);
std::string getDeviceSideName(const Decl *ND);		std::string getDeviceSideName(const NamedDecl *ND) override;

public:		public:
CGNVCUDARuntime(CodeGenModule &CGM);		CGNVCUDARuntime(CodeGenModule &CGM);

void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;		void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;
void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,		void registerDeviceVar(const VarDecl *VD, llvm::GlobalVariable &Var,
unsigned Flags) override {		unsigned Flags) override {
DeviceVars.push_back({&Var, VD, Flags});		DeviceVars.push_back({&Var, VD, Flags});
}		}

/// Creates module constructor function		/// Creates module constructor function
llvm::Function *makeModuleCtorFunction() override;		llvm::Function *makeModuleCtorFunction() override;
/// Creates module destructor function		/// Creates module destructor function
llvm::Function *makeModuleDtorFunction() override;		llvm::Function *makeModuleDtorFunction() override;
/// Construct and return the stub name of a kernel.
std::string getDeviceStubName(llvm::StringRef Name) const override;
};		};

}		}

std::string CGNVCUDARuntime::addPrefixToName(StringRef FuncName) const {		std::string CGNVCUDARuntime::addPrefixToName(StringRef FuncName) const {
if (CGM.getLangOpts().HIP)		if (CGM.getLangOpts().HIP)
return ((Twine("hip") + Twine(FuncName)).str());		return ((Twine("hip") + Twine(FuncName)).str());
return ((Twine("cuda") + Twine(FuncName)).str());		return ((Twine("cuda") + Twine(FuncName)).str());
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
llvm::FunctionType *CGNVCUDARuntime::getRegisterLinkedBinaryFnTy() const {		llvm::FunctionType *CGNVCUDARuntime::getRegisterLinkedBinaryFnTy() const {
auto CallbackFnTy = getCallbackFnTy();		auto CallbackFnTy = getCallbackFnTy();
auto RegisterGlobalsFnTy = getRegisterGlobalsFnTy();		auto RegisterGlobalsFnTy = getRegisterGlobalsFnTy();
llvm::Type *Params[] = {RegisterGlobalsFnTy->getPointerTo(), VoidPtrTy,		llvm::Type *Params[] = {RegisterGlobalsFnTy->getPointerTo(), VoidPtrTy,
VoidPtrTy, CallbackFnTy->getPointerTo()};		VoidPtrTy, CallbackFnTy->getPointerTo()};
return llvm::FunctionType::get(VoidTy, Params, false);		return llvm::FunctionType::get(VoidTy, Params, false);
}		}

std::string CGNVCUDARuntime::getDeviceSideName(const Decl *D) {		std::string CGNVCUDARuntime::getDeviceSideName(const NamedDecl *ND) {
auto *ND = cast<const NamedDecl>(D);		GlobalDecl GD;
		// D could be either a kernel or a variable.
		if (auto *FD = dyn_cast<FunctionDecl>(ND))
		GD = GlobalDecl(FD, KernelReferenceKind::Kernel);
		else
		GD = GlobalDecl(ND);
std::string DeviceSideName;		std::string DeviceSideName;
if (DeviceMC->shouldMangleDeclName(ND)) {		if (DeviceMC->shouldMangleDeclName(ND)) {
SmallString<256> Buffer;		SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
DeviceMC->mangleName(ND, Out);		DeviceMC->mangleName(GD, Out);
DeviceSideName = std::string(Out.str());		DeviceSideName = std::string(Out.str());
} else		} else
DeviceSideName = std::string(ND->getIdentifier()->getName());		DeviceSideName = std::string(ND->getIdentifier()->getName());
return DeviceSideName;		return DeviceSideName;
}		}
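
The two branches of `getDeviceSideName` can be sketched without Clang. This is an illustrative model under stated assumptions: `deviceSideName` is a hypothetical helper, the `ShouldMangle` flag stands in for `DeviceMC->shouldMangleDeclName(ND)`, and the mangling handles only a namespace-scope `void` function with no parameters.

```cpp
#include <cassert>
#include <string>

// Toy model of the device-side name of a kernel as seen from either
// compilation side. The kernel reference kind is always "Kernel" here,
// so no stub prefix is ever added.
std::string deviceSideName(const std::string &Identifier, bool ShouldMangle) {
  assert(!Identifier.empty() && "identifier must not be empty");
  if (!ShouldMangle)
    return Identifier; // e.g. an extern "C" kernel keeps its plain identifier
  // _Z <length> <identifier> v  (void function taking no arguments)
  return "_Z" + std::to_string(Identifier.size()) + Identifier + "v";
}
```

The host side registers each kernel under this name, so it has to match the symbol the device compilation emits. That is why the patch builds the `GlobalDecl` with `KernelReferenceKind::Kernel` here instead of using the host-side default.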

void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,		void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
// Ensure either we have different ABIs between host and device compilations,
// says host compilation following MSVC ABI but device compilation follows
// Itanium C++ ABI or, if they follow the same ABI, kernel names after
// mangling should be the same after name stubbing. The later checking is
// very important as the device kernel name being mangled in host-compilation
// is used to resolve the device binaries to be executed. Inconsistent naming
// result in undefined behavior. Even though we cannot check that naming
// directly between host- and device-compilations, the host- and
// device-mangling in host compilation could help catching certain ones.
assert((CGF.CGM.getContext().getAuxTargetInfo() &&
(CGF.CGM.getContext().getAuxTargetInfo()->getCXXABI() !=
CGF.CGM.getContext().getTargetInfo().getCXXABI())) ||
getDeviceStubName(getDeviceSideName(CGF.CurFuncDecl)) ==
CGF.CurFn->getName());

EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});		EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),		if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
CudaFeature::CUDA_USES_NEW_LAUNCH) ||		CudaFeature::CUDA_USES_NEW_LAUNCH) ||
CGF.getLangOpts().HIPUseNewLaunchAPI)		CGF.getLangOpts().HIPUseNewLaunchAPI)
emitDeviceStubBodyNew(CGF, Args);		emitDeviceStubBodyNew(CGF, Args);
else		else
emitDeviceStubBodyLegacy(CGF, Args);		emitDeviceStubBodyLegacy(CGF, Args);
}		}
▲ Show 20 Lines • Show All 168 Lines • ▼ Show 20 Lines	llvm::FunctionCallee RegisterFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterFuncParams, false),		llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
addUnderscoredPrefixToName("RegisterFunction"));		addUnderscoredPrefixToName("RegisterFunction"));

// Extract GpuBinaryHandle passed as the first argument passed to		// Extract GpuBinaryHandle passed as the first argument passed to
// __cuda_register_globals() and generate __cudaRegisterFunction() call for		// __cuda_register_globals() and generate __cudaRegisterFunction() call for
// each emitted kernel.		// each emitted kernel.
llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();		llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
for (auto &&I : EmittedKernels) {		for (auto &&I : EmittedKernels) {
llvm::Constant *KernelName = makeConstantString(getDeviceSideName(I.D));		llvm::Constant *KernelName =
		makeConstantString(getDeviceSideName(cast<NamedDecl>(I.D)));
llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);		llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
llvm::Value *Args[] = {		llvm::Value *Args[] = {
&GpuBinaryHandlePtr,		&GpuBinaryHandlePtr,
Builder.CreateBitCast(I.Kernel, VoidPtrTy),		Builder.CreateBitCast(I.Kernel, VoidPtrTy),
KernelName,		KernelName,
KernelName,		KernelName,
llvm::ConstantInt::get(IntTy, -1),		llvm::ConstantInt::get(IntTy, -1),
NullPtr,		NullPtr,
▲ Show 20 Lines • Show All 362 Lines • ▼ Show 20 Lines	if (CGM.getLangOpts().HIP) {
DtorBuilder.SetInsertPoint(ExitBlock);		DtorBuilder.SetInsertPoint(ExitBlock);
} else {		} else {
DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);		DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);
}		}
DtorBuilder.CreateRetVoid();		DtorBuilder.CreateRetVoid();
return ModuleDtorFunc;		return ModuleDtorFunc;
}		}

std::string CGNVCUDARuntime::getDeviceStubName(llvm::StringRef Name) const {
if (!CGM.getLangOpts().HIP)
return std::string(Name);
return (Name + ".stub").str();
}

CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {		CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
return new CGNVCUDARuntime(CGM);		return new CGNVCUDARuntime(CGM);
}		}

clang/lib/CodeGen/CGCUDARuntime.h

Show All 19 Lines
namespace llvm {		namespace llvm {
class Function;		class Function;
class GlobalVariable;		class GlobalVariable;
}		}

namespace clang {		namespace clang {

class CUDAKernelCallExpr;		class CUDAKernelCallExpr;
		class NamedDecl;
class VarDecl;		class VarDecl;

namespace CodeGen {		namespace CodeGen {

class CodeGenFunction;		class CodeGenFunction;
class CodeGenModule;		class CodeGenModule;
class FunctionArgList;		class FunctionArgList;
class ReturnValueSlot;		class ReturnValueSlot;
Show All 25 Lines	public:
/// Constructs and returns a module initialization function or nullptr if it's		/// Constructs and returns a module initialization function or nullptr if it's
/// not needed. Must be called after all kernels have been emitted.		/// not needed. Must be called after all kernels have been emitted.
virtual llvm::Function *makeModuleCtorFunction() = 0;		virtual llvm::Function *makeModuleCtorFunction() = 0;

/// Returns a module cleanup function or nullptr if it's not needed.		/// Returns a module cleanup function or nullptr if it's not needed.
/// Must be called after ModuleCtorFunction		/// Must be called after ModuleCtorFunction
virtual llvm::Function *makeModuleDtorFunction() = 0;		virtual llvm::Function *makeModuleDtorFunction() = 0;

/// Construct and return the stub name of a kernel.		/// Returns function or variable name on device side even if the current
virtual std::string getDeviceStubName(llvm::StringRef Name) const = 0;		/// compilation is for host.
		virtual std::string getDeviceSideName(const NamedDecl *ND) = 0;
		tra: Adding a descriptive comment would be great. Otherwise anyone looking at the function decl without the context of this patch will be puzzled about its meaning and purpose. Also, perhaps the argument type should be a `NamedDecl` -- the function is not used on or useful for regular `Decl`. It will save us a few casts in other places, too.
		yaxunl (Author): done
};		};

/// Creates an instance of a CUDA runtime class.		/// Creates an instance of a CUDA runtime class.
CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);		CGCUDARuntime *CreateNVCUDARuntime(CodeGenModule &CGM);

}		}
}		}

#endif		#endif

clang/lib/CodeGen/CGDecl.cpp

Show First 20 Lines • Show All 291 Lines • ▼ Show 20 Lines	llvm::Constant *CodeGenModule::getOrCreateStaticVarDecl(
}		}

GlobalDecl GD;		GlobalDecl GD;
if (const auto *CD = dyn_cast<CXXConstructorDecl>(DC))		if (const auto *CD = dyn_cast<CXXConstructorDecl>(DC))
GD = GlobalDecl(CD, Ctor_Base);		GD = GlobalDecl(CD, Ctor_Base);
else if (const auto *DD = dyn_cast<CXXDestructorDecl>(DC))		else if (const auto *DD = dyn_cast<CXXDestructorDecl>(DC))
GD = GlobalDecl(DD, Dtor_Base);		GD = GlobalDecl(DD, Dtor_Base);
else if (const auto *FD = dyn_cast<FunctionDecl>(DC))		else if (const auto *FD = dyn_cast<FunctionDecl>(DC))
GD = GlobalDecl(FD);		GD = getGlobalDecl(FD);
else {		else {
// Don't do anything for Obj-C method decls or global closures. We should		// Don't do anything for Obj-C method decls or global closures. We should
// never defer them.		// never defer them.
assert(isa<ObjCMethodDecl>(DC) && "unexpected parent code decl");		assert(isa<ObjCMethodDecl>(DC) && "unexpected parent code decl");
}		}
if (GD.getDecl()) {		if (GD.getDecl()) {
// Disable emission of the parent function for the OpenMP device codegen.		// Disable emission of the parent function for the OpenMP device codegen.
CGOpenMPRuntime::DisableAutoDeclareTargetRAII NoDeclTarget(*this);		CGOpenMPRuntime::DisableAutoDeclareTargetRAII NoDeclTarget(*this);
▲ Show 20 Lines • Show All 2,236 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGExpr.cpp

Show First 20 Lines • Show All 4,664 Lines • ▼ Show 20 Lines	if (auto ICE = dyn_cast<ImplicitCastExpr>(E)) {
if (ICE->getCastKind() == CK_FunctionToPointerDecay ||		if (ICE->getCastKind() == CK_FunctionToPointerDecay ||
ICE->getCastKind() == CK_BuiltinFnToFnPtr) {		ICE->getCastKind() == CK_BuiltinFnToFnPtr) {
return EmitCallee(ICE->getSubExpr());		return EmitCallee(ICE->getSubExpr());
}		}

// Resolve direct calls.		// Resolve direct calls.
} else if (auto DRE = dyn_cast<DeclRefExpr>(E)) {		} else if (auto DRE = dyn_cast<DeclRefExpr>(E)) {
if (auto FD = dyn_cast<FunctionDecl>(DRE->getDecl())) {		if (auto FD = dyn_cast<FunctionDecl>(DRE->getDecl())) {
return EmitDirectCallee(*this, FD);		return EmitDirectCallee(*this, CGM.getGlobalDecl(FD));
}		}
} else if (auto ME = dyn_cast<MemberExpr>(E)) {		} else if (auto ME = dyn_cast<MemberExpr>(E)) {
if (auto FD = dyn_cast<FunctionDecl>(ME->getMemberDecl())) {		if (auto FD = dyn_cast<FunctionDecl>(ME->getMemberDecl())) {
EmitIgnoredExpr(ME->getBase());		EmitIgnoredExpr(ME->getBase());
return EmitDirectCallee(*this, FD);		return EmitDirectCallee(*this, CGM.getGlobalDecl(FD));
}		}

// Look through template substitutions.		// Look through template substitutions.
} else if (auto NTTP = dyn_cast<SubstNonTypeTemplateParmExpr>(E)) {		} else if (auto NTTP = dyn_cast<SubstNonTypeTemplateParmExpr>(E)) {
return EmitCallee(NTTP->getReplacement());		return EmitCallee(NTTP->getReplacement());

// Treat pseudo-destructor calls differently.		// Treat pseudo-destructor calls differently.
} else if (auto PDE = dyn_cast<CXXPseudoDestructorExpr>(E)) {		} else if (auto PDE = dyn_cast<CXXPseudoDestructorExpr>(E)) {
▲ Show 20 Lines • Show All 514 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.h

Show First 20 Lines • Show All 705 Lines • ▼ Show 20 Lines	public:

MicrosoftVTableContext &getMicrosoftVTableContext() {		MicrosoftVTableContext &getMicrosoftVTableContext() {
return VTables.getMicrosoftVTableContext();		return VTables.getMicrosoftVTableContext();
}		}

CtorList &getGlobalCtors() { return GlobalCtors; }		CtorList &getGlobalCtors() { return GlobalCtors; }
CtorList &getGlobalDtors() { return GlobalDtors; }		CtorList &getGlobalDtors() { return GlobalDtors; }

		/// get GlobalDecl for non-ctor/dtor functions.
		GlobalDecl getGlobalDecl(const FunctionDecl *FD);

/// getTBAATypeInfo - Get metadata used to describe accesses to objects of		/// getTBAATypeInfo - Get metadata used to describe accesses to objects of
/// the given type.		/// the given type.
llvm::MDNode *getTBAATypeInfo(QualType QTy);		llvm::MDNode *getTBAATypeInfo(QualType QTy);

/// getTBAAAccessInfo - Get TBAA information that describes an access to		/// getTBAAAccessInfo - Get TBAA information that describes an access to
/// an object of the given type.		/// an object of the given type.
TBAAAccessInfo getTBAAAccessInfo(QualType AccessType);		TBAAAccessInfo getTBAAAccessInfo(QualType AccessType);

▲ Show 20 Lines • Show All 831 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 1,016 Lines • ▼ Show 20 Lines	static void AppendTargetMangling(const CodeGenModule &CGM,
}		}
}		}

static std::string getMangledNameImpl(const CodeGenModule &CGM, GlobalDecl GD,		static std::string getMangledNameImpl(const CodeGenModule &CGM, GlobalDecl GD,
const NamedDecl *ND,		const NamedDecl *ND,
bool OmitMultiVersionMangling = false) {		bool OmitMultiVersionMangling = false) {
SmallString<256> Buffer;		SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
MangleContext &MC = CGM.getCXXABI().getMangleContext();		MangleContext &MC = CGM.getCXXABI().getMangleContext();
if (MC.shouldMangleDeclName(ND))		if (MC.shouldMangleDeclName(ND))
		rjmccall: Let's see if we can make this breakdown no longer necessary, since `MangleContext::mangleName` should be capable of doing the right thing starting straight from a GD. In fact, maybe we can remove most of the specialized mangling methods (like `mangleCXXCtor` and `mangleCXXDtor`) from `MangleContext` completely? Unrelatedly: there's an `Out` declared in the outermost scope, but a bunch of these scopes declare their own `Out`; could you just fix that while you're editing this function?
		tra: Perhaps it would make sense to split this patch into two -- one that changes mangler input to GlobalDecl and the other one dealing with HIP stubs.
		rjmccall: Yes, that's a good idea.
		yaxunl (Author): separated to https://reviews.llvm.org/D75700
		yaxunl (Author): Fixed the redundant Out var. However, removing mangleCXXCtor/Dtor will incur significantly more changes. Can it be done later? Thanks.
MC.mangleName(GD.getWithDecl(ND), Out);		MC.mangleName(GD.getWithDecl(ND), Out);
else {		else {
IdentifierInfo *II = ND->getIdentifier();		IdentifierInfo *II = ND->getIdentifier();
assert(II && "Attempt to mangle unnamed decl.");		assert(II && "Attempt to mangle unnamed decl.");
const auto *FD = dyn_cast<FunctionDecl>(ND);		const auto *FD = dyn_cast<FunctionDecl>(ND);

if (FD &&		if (FD &&
FD->getType()->castAs<FunctionType>()->getCallConv() == CC_X86RegCall) {		FD->getType()->castAs<FunctionType>()->getCallConv() == CC_X86RegCall) {
Out << "__regcall3__" << II->getName();		Out << "__regcall3__" << II->getName();
		} else if (FD && FD->hasAttr<CUDAGlobalAttr>() &&
		GD.getKernelReferenceKind() == KernelReferenceKind::Stub) {
		Out << "__device_stub__" << II->getName();
} else {		} else {
		rjmccall: Is this the best way of handling this, or should `shouldMangleDeclName` return true for kernels (or at least stubs?) even in C? Honest question.
		yaxunl (Author): This is for extern "C" kernels, which are either not mangled or with simple prefix. I tried returning true for them in shouldMangleDeclName, and they got mangled as Itanium mangling, which seems not right.
Out << II->getName();		Out << II->getName();
}		}
}		}

if (const auto *FD = dyn_cast<FunctionDecl>(ND))		if (const auto *FD = dyn_cast<FunctionDecl>(ND))
if (FD->isMultiVersion() && !OmitMultiVersionMangling) {		if (FD->isMultiVersion() && !OmitMultiVersionMangling) {
switch (FD->getMultiVersionKind()) {		switch (FD->getMultiVersionKind()) {
case MultiVersionKind::CPUDispatch:		case MultiVersionKind::CPUDispatch:
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	StringRef CodeGenModule::getMangledName(GlobalDecl GD) {
auto FoundName = MangledDeclNames.find(CanonicalGD);		auto FoundName = MangledDeclNames.find(CanonicalGD);
if (FoundName != MangledDeclNames.end())		if (FoundName != MangledDeclNames.end())
return FoundName->second;		return FoundName->second;

// Keep the first result in the case of a mangling collision.		// Keep the first result in the case of a mangling collision.
const auto *ND = cast<NamedDecl>(GD.getDecl());		const auto *ND = cast<NamedDecl>(GD.getDecl());
std::string MangledName = getMangledNameImpl(*this, GD, ND);		std::string MangledName = getMangledNameImpl(*this, GD, ND);

// Adjust kernel stub mangling as we may need to be able to differentiate		// Ensure either we have different ABIs between host and device compilations,
// them from the kernel itself (e.g., for HIP).		// says host compilation following MSVC ABI but device compilation follows
if (auto *FD = dyn_cast<FunctionDecl>(GD.getDecl()))		// Itanium C++ ABI or, if they follow the same ABI, kernel names after
if (!getLangOpts().CUDAIsDevice && FD->hasAttr<CUDAGlobalAttr>())		// mangling should be the same after name stubbing. The later checking is
MangledName = getCUDARuntime().getDeviceStubName(MangledName);		// very important as the device kernel name being mangled in host-compilation
		// is used to resolve the device binaries to be executed. Inconsistent naming
		// result in undefined behavior. Even though we cannot check that naming
		// directly between host- and device-compilations, the host- and
		// device-mangling in host compilation could help catching certain ones.
		assert(!isa<FunctionDecl>(ND) || !ND->hasAttr<CUDAGlobalAttr>() ||
		getLangOpts().CUDAIsDevice ||
		(getContext().getAuxTargetInfo() &&
		tra: On one hand I like this patch variant much better than the one that changed the mangling itself. On the other hand this code appears to rely on implementation details. I.e. we're setting new name on `FD` which may or may not be the same as `ND`, but we're always passing `ND` to `getMangledNameImpl()`. Perhaps we could implement name-tweaking as another `MultiVersionKind` which we already plumb into getMangledNameImpl() and which allows changing the name for target attributes & features.

		yaxunl (Author): The mangled name of an instantiated template function does not depend on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name.

		tra:
		> The mangled name of an instantiated template function does not depend on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone.

		That would be putting more effort into working around the fact that `getMangledNameImpl()` does not provide a good API to change the name the way you need to. That's what needs to be addressed, IMO.

		> MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name.

		Some existing implementations append to the mangled name, but we can do other manipulations there, too. The string with the mangled name originates in `getMangledNameImpl` and we could do more than just append to it. We do not have to use the `MultiVersion` for that, either. E.g. we prepend `__regcall3__` to the names of functions with `CC_X86RegCall` calling convention. We could do something similar for the kernel stub.

		I wonder if we could just generate a unique name and be done with that? Hmm. Unique name probably would not do if, let's say, a kernel is defined in one TU, but we need to call it from another TU. So, whichever way we change the name of the stub, it will need to be the same everywhere. You may want to add a test verifying that launching of declaration-only kernels uses the right name.

		Consistency of name mangling means that we do need to include regular C++-mangled information. Which means we need to do the name tweaking deeper down. How about using calling conventions? It's been suggested in the past that a lot of shenanigans around kernel launches could/should be done as a different calling convention. One of the things affected by the calling convention is mangling and we can add prefix there: https://github.com/llvm/llvm-project/blob/master/clang/lib/AST/Mangle.cpp#L164

		We could tag host-side kernel with 'kernel call' calling convention on the host side and then plumb prefixing to be done similar to `__regcall3__`. If that works that may be a useful improvement overall. For instance, we may no longer need to stash a `it's a kernel` flag among attributes and it would probably be useful for other things (e.g. enforcing address space requirements for kernel pointer arguments).

		yaxunl (Author): Will add a test for decl-only kernel. At least for the current implementation I see it works. A decl of stub function with expected name is emitted and can be called.

		About calling conv: I've tried implementing `__global__` as a calling conv before. The issue is that it is part of the type system and clang enforces type checking for that, e.g. you cannot assign it to an ordinary function pointer unless that function pointer is also declared with the same calling convention. This will cause lots of type mismatching issues. In CUDA language, `__global__` is not part of the type system since it is just an attribute. We could introduce a calling conv for the stub, but probably we can only use it when we mangle the stub function.

		tra: OK. I'm fresh out of ideas. We should add some sort of assert to make sure that the mangled name does have the prefix we intended to add. Also a TODO to figure out a better way to add a name prefix before mangling. If anyone else has other suggestions, please chime in.
		(getContext().getAuxTargetInfo()->getCXXABI() !=
		getContext().getTargetInfo().getCXXABI())) ||
		getCUDARuntime().getDeviceSideName(ND) ==
		getMangledNameImpl(
		*this,
		GD.getWithKernelReferenceKind(KernelReferenceKind::Kernel),
		ND));

auto Result = Manglings.insert(std::make_pair(MangledName, GD));		auto Result = Manglings.insert(std::make_pair(MangledName, GD));
return MangledDeclNames[CanonicalGD] = Result.first->first();		return MangledDeclNames[CanonicalGD] = Result.first->first();
}		}
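
The assertion added to `getMangledName` above chains several escape conditions before the actual name comparison. As a reading aid, the following is a minimal standalone sketch of that boolean structure. It is not clang code: `StubNameCheckInput` and `kernelNameCheckHolds` are hypothetical names, and the plain fields stand in for the clang queries used in the diff (`hasAttr<CUDAGlobalAttr>()`, `CUDAIsDevice`, `getAuxTargetInfo()`, and the two `getCXXABI()` calls).

```cpp
#include <string>

// Hypothetical flattened inputs; in the diff these come from clang queries.
struct StubNameCheckInput {
  bool IsKernelFunction;           // FunctionDecl with CUDAGlobalAttr
  bool IsDeviceCompilation;        // getLangOpts().CUDAIsDevice
  bool HasAuxTarget;               // getAuxTargetInfo() != nullptr
  bool AuxAbiDiffersFromHostAbi;   // aux C++ ABI != primary target C++ ABI
  std::string DeviceSideName;      // getCUDARuntime().getDeviceSideName(ND)
  std::string HostMangledKernelName; // host mangling with the Kernel reference kind
};

// Returns true when the assertion in the diff would pass.
bool kernelNameCheckHolds(const StubNameCheckInput &In) {
  if (!In.IsKernelFunction)
    return true; // only kernels are checked
  if (In.IsDeviceCompilation)
    return true; // the check only applies to host compilation
  if (In.HasAuxTarget && In.AuxAbiDiffersFromHostAbi)
    return true; // different ABIs: the names cannot be compared
  // Same ABI on both sides: the two manglings of the kernel must agree.
  return In.DeviceSideName == In.HostMangledKernelName;
}
```

Under this reading, the comparison only fires for a kernel in host compilation when host and device share a C++ ABI.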

StringRef CodeGenModule::getBlockMangledName(GlobalDecl GD,		StringRef CodeGenModule::getBlockMangledName(GlobalDecl GD,
const BlockDecl *BD) {		const BlockDecl *BD) {
MangleContext &MangleCtx = getCXXABI().getMangleContext();		MangleContext &MangleCtx = getCXXABI().getMangleContext();
▲ Show 20 Lines • Show All 2,179 Lines • ▼ Show 20 Lines	llvm::Constant *CodeGenModule::GetAddrOfFunction(GlobalDecl GD,
// getAddrOfCXXStructor. Make sure we use the MS ABI base destructor instead		// getAddrOfCXXStructor. Make sure we use the MS ABI base destructor instead
// of the complete destructor when necessary.		// of the complete destructor when necessary.
if (const auto *DD = dyn_cast<CXXDestructorDecl>(GD.getDecl())) {		if (const auto *DD = dyn_cast<CXXDestructorDecl>(GD.getDecl())) {
if (getTarget().getCXXABI().isMicrosoft() &&		if (getTarget().getCXXABI().isMicrosoft() &&
GD.getDtorType() == Dtor_Complete &&		GD.getDtorType() == Dtor_Complete &&
DD->getParent()->getNumVBases() == 0)		DD->getParent()->getNumVBases() == 0)
GD = GlobalDecl(DD, Dtor_Base);		GD = GlobalDecl(DD, Dtor_Base);
}		}

		rjmccall: Should this be handled in the caller, or would that make things unreasonably difficult?
		yaxunl (Author): fixed
StringRef MangledName = getMangledName(GD);		StringRef MangledName = getMangledName(GD);
return GetOrCreateLLVMFunction(MangledName, Ty, GD, ForVTable, DontDefer,		return GetOrCreateLLVMFunction(MangledName, Ty, GD, ForVTable, DontDefer,
/*IsThunk=*/false, llvm::AttributeList(),		/*IsThunk=*/false, llvm::AttributeList(),
IsForDefinition);		IsForDefinition);
}		}

static const FunctionDecl *		static const FunctionDecl *
GetRuntimeFunctionDecl(ASTContext &C, StringRef Name) {		GetRuntimeFunctionDecl(ASTContext &C, StringRef Name) {
▲ Show 20 Lines • Show All 1,945 Lines • ▼ Show 20 Lines	void CodeGenModule::EmitTopLevelDecl(Decl *D) {
// Ignore dependent declarations.		// Ignore dependent declarations.
if (D->isTemplated())		if (D->isTemplated())
return;		return;

switch (D->getKind()) {		switch (D->getKind()) {
case Decl::CXXConversion:		case Decl::CXXConversion:
case Decl::CXXMethod:		case Decl::CXXMethod:
case Decl::Function:		case Decl::Function:
EmitGlobal(cast<FunctionDecl>(D));		EmitGlobal(getGlobalDecl(cast<FunctionDecl>(D)));
// Always provide some coverage mapping		// Always provide some coverage mapping
// even for the functions that aren't emitted.		// even for the functions that aren't emitted.
AddDeferredUnusedCoverageMapping(D);		AddDeferredUnusedCoverageMapping(D);
break;		break;

case Decl::CXXDeductionGuide:		case Decl::CXXDeductionGuide:
// Function-like, but does not result in code emission.		// Function-like, but does not result in code emission.
break;		break;
▲ Show 20 Lines • Show All 645 Lines • ▼ Show 20 Lines	CodeGenModule::createOpenCLIntToSamplerConversion(const Expr *E,
CodeGenFunction &CGF) {		CodeGenFunction &CGF) {
llvm::Constant *C = ConstantEmitter(CGF).emitAbstract(E, E->getType());		llvm::Constant *C = ConstantEmitter(CGF).emitAbstract(E, E->getType());
auto SamplerT = getOpenCLRuntime().getSamplerType(E->getType().getTypePtr());		auto SamplerT = getOpenCLRuntime().getSamplerType(E->getType().getTypePtr());
auto FTy = llvm::FunctionType::get(SamplerT, {C->getType()}, false);		auto FTy = llvm::FunctionType::get(SamplerT, {C->getType()}, false);
return CGF.Builder.CreateCall(CreateRuntimeFunction(FTy,		return CGF.Builder.CreateCall(CreateRuntimeFunction(FTy,
"__translate_sampler_initializer"),		"__translate_sampler_initializer"),
{C});		{C});
}		}

		GlobalDecl CodeGenModule::getGlobalDecl(const FunctionDecl *FD) {
		if (FD->hasAttr<CUDAGlobalAttr>())
		return GlobalDecl::getDefaultKernelReference(FD);
		else
		return GlobalDecl(FD);
		}
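
The interaction between `getGlobalDecl` and the unmangled-name branch of `getMangledNameImpl` (the one that handles `extern "C"` kernels) can be modelled outside of clang. The following is a minimal sketch, not clang code: `ToyDecl`, `ToyGlobalDecl`, `toyGlobalDecl`, and `unmangledName` are hypothetical stand-ins. It also assumes that the default kernel reference is the stub in host compilation and the kernel itself in device compilation, which is inferred from the test expectations in this revision rather than shown in this part of the diff.

```cpp
#include <string>

// Hypothetical stand-in for clang's KernelReferenceKind.
enum class KernelReferenceKind { Kernel, Stub };

struct ToyDecl {
  std::string Name; // unmangled identifier, e.g. "ckernel"
  bool IsKernel;    // models FD->hasAttr<CUDAGlobalAttr>()
};

// Hypothetical stand-in for GlobalDecl: a declaration plus a reference kind.
struct ToyGlobalDecl {
  const ToyDecl *D;
  KernelReferenceKind Kind;
};

// Models getGlobalDecl(): kernels get a default reference kind that depends
// on the compilation side (assumption: Stub on host, Kernel on device).
ToyGlobalDecl toyGlobalDecl(const ToyDecl &D, bool IsDeviceCompilation) {
  KernelReferenceKind Kind = KernelReferenceKind::Kernel;
  if (D.IsKernel && !IsDeviceCompilation)
    Kind = KernelReferenceKind::Stub;
  return {&D, Kind};
}

// Models the branch for names that are not C++-mangled: a stub reference to
// a kernel gets the "__device_stub__" prefix, everything else keeps its name.
std::string unmangledName(const ToyGlobalDecl &GD) {
  if (GD.D->IsKernel && GD.Kind == KernelReferenceKind::Stub)
    return "__device_stub__" + GD.D->Name;
  return GD.D->Name;
}
```

With this model, a kernel named `ckernel` yields `__device_stub__ckernel` on the host and `ckernel` on the device, matching the `CSTUB` and `CKERN` patterns in kernel-stub-name.cu.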

clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu

	// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -x hip %s -o - | FileCheck %s			// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -emit-llvm -x hip %s -o - | FileCheck %s
	// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - | FileCheck -check-prefix=HOST %s			// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -x hip %s -o - | FileCheck -check-prefix=HOST %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// Coerced struct from `struct S` without all generic pointers lowered into			// Coerced struct from `struct S` without all generic pointers lowered into
	// global ones.			// global ones.
	// CHECK: %struct.S.coerce = type { i32 addrspace(1)*, float addrspace(1)* }			// CHECK: %struct.S.coerce = type { i32 addrspace(1)*, float addrspace(1)* }
	// CHECK: %struct.T.coerce = type { [2 x float addrspace(1)*] }			// CHECK: %struct.T.coerce = type { [2 x float addrspace(1)*] }

	// On the host-side compilation, generic pointer won't be coerced.			// On the host-side compilation, generic pointer won't be coerced.
	// HOST-NOT: %struct.S.coerce			// HOST-NOT: %struct.S.coerce
	// HOST-NOT: %struct.T.coerce			// HOST-NOT: %struct.T.coerce

	// CHECK: define amdgpu_kernel void @_Z7kernel1Pi(i32 addrspace(1)* %x.coerce)			// CHECK: define amdgpu_kernel void @_Z7kernel1Pi(i32 addrspace(1)* %x.coerce)
	// HOST: define void @_Z7kernel1Pi.stub(i32* %x)			// HOST: define void @_Z22__device_stub__kernel1Pi(i32* %x)
	__global__ void kernel1(int *x) {			__global__ void kernel1(int *x) {
	x[0]++;			x[0]++;
	}			}

	// CHECK: define amdgpu_kernel void @_Z7kernel2Ri(i32 addrspace(1)* dereferenceable(4) %x.coerce)			// CHECK: define amdgpu_kernel void @_Z7kernel2Ri(i32 addrspace(1)* dereferenceable(4) %x.coerce)
	// HOST: define void @_Z7kernel2Ri.stub(i32* dereferenceable(4) %x)			// HOST: define void @_Z22__device_stub__kernel2Ri(i32* dereferenceable(4) %x)
	__global__ void kernel2(int &x) {			__global__ void kernel2(int &x) {
	x++;			x++;
	}			}

	// CHECK: define amdgpu_kernel void @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)* %x, i32 addrspace(1)* %y)			// CHECK: define amdgpu_kernel void @_Z7kernel3PU3AS2iPU3AS1i(i32 addrspace(2)* %x, i32 addrspace(1)* %y)
	// HOST: define void @_Z7kernel3PU3AS2iPU3AS1i.stub(i32 addrspace(2)* %x, i32 addrspace(1)* %y)			// HOST: define void @_Z22__device_stub__kernel3PU3AS2iPU3AS1i(i32 addrspace(2)* %x, i32 addrspace(1)* %y)
	__global__ void kernel3(__attribute__((address_space(2))) int *x,			__global__ void kernel3(__attribute__((address_space(2))) int *x,
	__attribute__((address_space(1))) int *y) {			__attribute__((address_space(1))) int *y) {
	y[0] = x[0];			y[0] = x[0];
	}			}

	// CHECK: define void @_Z4funcPi(i32* %x)			// CHECK: define void @_Z4funcPi(i32* %x)
	__device__ void func(int *x) {			__device__ void func(int *x) {
	x[0]++;			x[0]++;
	}			}

	struct S {			struct S {
	int *x;			int *x;
	float *y;			float *y;
	};			};
	// `by-val` struct will be coerced into a similar struct with all generic			// `by-val` struct will be coerced into a similar struct with all generic
	// pointers lowerd into global ones.			// pointers lowerd into global ones.
	// CHECK: define amdgpu_kernel void @_Z7kernel41S(%struct.S.coerce %s.coerce)			// CHECK: define amdgpu_kernel void @_Z7kernel41S(%struct.S.coerce %s.coerce)
	// HOST: define void @_Z7kernel41S.stub(i32* %s.coerce0, float* %s.coerce1)			// HOST: define void @_Z22__device_stub__kernel41S(i32* %s.coerce0, float* %s.coerce1)
	__global__ void kernel4(struct S s) {			__global__ void kernel4(struct S s) {
	s.x[0]++;			s.x[0]++;
	s.y[0] += 1.f;			s.y[0] += 1.f;
	}			}

	// If a pointer to struct is passed, only the pointer itself is coerced into the global one.			// If a pointer to struct is passed, only the pointer itself is coerced into the global one.
	// CHECK: define amdgpu_kernel void @_Z7kernel5P1S(%struct.S addrspace(1)* %s.coerce)			// CHECK: define amdgpu_kernel void @_Z7kernel5P1S(%struct.S addrspace(1)* %s.coerce)
	// HOST: define void @_Z7kernel5P1S.stub(%struct.S* %s)			// HOST: define void @_Z22__device_stub__kernel5P1S(%struct.S* %s)
	__global__ void kernel5(struct S *s) {			__global__ void kernel5(struct S *s) {
	s->x[0]++;			s->x[0]++;
	s->y[0] += 1.f;			s->y[0] += 1.f;
	}			}

	struct T {			struct T {
	float *x[2];			float *x[2];
	};			};
	// `by-val` array is also coerced.			// `by-val` array is also coerced.
	// CHECK: define amdgpu_kernel void @_Z7kernel61T(%struct.T.coerce %t.coerce)			// CHECK: define amdgpu_kernel void @_Z7kernel61T(%struct.T.coerce %t.coerce)
	// HOST: define void @_Z7kernel61T.stub(float* %t.coerce0, float* %t.coerce1)			// HOST: define void @_Z22__device_stub__kernel61T(float* %t.coerce0, float* %t.coerce1)
	__global__ void kernel6(struct T t) {			__global__ void kernel6(struct T t) {
	t.x[0][0] += 1.f;			t.x[0][0] += 1.f;
	t.x[1][0] += 2.f;			t.x[1][0] += 2.f;
	}			}
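
The changed HOST check lines in this test replace names such as `_Z7kernel1Pi.stub` with `_Z22__device_stub__kernel1Pi`. The number changes from 7 to 22 because the Itanium C++ ABI encodes an identifier as its decimal length followed by the identifier, and the prefix `__device_stub__` adds 15 characters. The sketch below illustrates only that rule for non-template functions at global scope; `sourceName`, `stubId`, and `mangleGlobalFunction` are hypothetical helpers, not clang APIs, and the parameter encoding (for example `Pi` for `int *`) is passed in by the caller rather than computed.

```cpp
#include <string>

// Itanium <source-name>: decimal length followed by the identifier.
std::string sourceName(const std::string &Id) {
  return std::to_string(Id.size()) + Id;
}

// Prefix applied to the unmangled kernel name to form the stub name.
std::string stubId(const std::string &KernelId) {
  return "__device_stub__" + KernelId;
}

// Toy mangler for a non-template function at global scope. ParamEncoding is
// supplied by the caller, e.g. "Pi" for (int *) or "v" for no parameters.
std::string mangleGlobalFunction(const std::string &Id,
                                 const std::string &ParamEncoding) {
  return "_Z" + sourceName(Id) + ParamEncoding;
}
```

For example, `mangleGlobalFunction("kernel1", "Pi")` gives the device-side name `_Z7kernel1Pi`, while `mangleGlobalFunction(stubId("kernel1"), "Pi")` gives the host stub name `_Z22__device_stub__kernel1Pi`.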

clang/test/CodeGenCUDA/kernel-stub-name.cu

	// RUN: echo "GPU binary would be here" > %t			// RUN: echo "GPU binary would be here" > %t

	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -x hip\			// RUN: -fcuda-include-gpubinary %t -o - -x hip\
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=CHECK			// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=CHECK

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

				extern "C" __global__ void ckernel() {}

				namespace ns {
				__global__ void nskernel() {}
				} // namespace ns

	template<class T>			template<class T>
	__global__ void kernelfunc() {}			__global__ void kernelfunc() {}

				__global__ void kernel_decl();

				// Device side kernel names

				// CHECK: @[[CKERN:[0-9]*]] = {{.*}} c"ckernel\00"
				// CHECK: @[[NSKERN:[0-9]*]] = {{.*}} c"_ZN2ns8nskernelEv\00"
				// CHECK: @[[TKERN:[0-9]*]] = {{.*}} c"_Z10kernelfuncIiEvv\00"

				// Non-template kernel stub functions

				// CHECK: define{{.*}}@[[CSTUB:__device_stub__ckernel]]
				// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[CSTUB]]
				// CHECK: define{{.*}}@[[NSSTUB:_ZN2ns23__device_stub__nskernelEv]]
				// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[NSSTUB]]

	// CHECK-LABEL: define{{.*}}@_Z8hostfuncv()			// CHECK-LABEL: define{{.*}}@_Z8hostfuncv()
	// CHECK: call void @[[STUB:_Z10kernelfuncIiEvv.stub]]()			// CHECK: call void @[[CSTUB]]()
	void hostfunc(void) { kernelfunc<int><<<1, 1>>>(); }			// CHECK: call void @[[NSSTUB]]()
				// CHECK: call void @[[TSTUB:_Z25__device_stub__kernelfuncIiEvv]]()
				// CHECK: call void @[[DSTUB:_Z26__device_stub__kernel_declv]]()
				void hostfunc(void) {
				ckernel<<<1, 1>>>();
				ns::nskernel<<<1, 1>>>();
				kernelfunc<int><<<1, 1>>>();
				kernel_decl<<<1, 1>>>();
				}

				// Template kernel stub functions

				// CHECK: define{{.*}}@[[TSTUB]]
				// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[TSTUB]]

	// CHECK: define{{.*}}@[[STUB]]			// CHECK: declare{{.*}}@[[DSTUB]]
	// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[STUB]]

	// CHECK-LABEL: define{{.*}}@__hip_register_globals			// CHECK-LABEL: define{{.*}}@__hip_register_globals
	// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[STUB]]			// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[CSTUB]]{{.*}}@[[CKERN]]
				// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[NSSTUB]]{{.*}}@[[NSKERN]]
				// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[TSTUB]]{{.*}}@[[TKERN]]

clang/test/CodeGenCUDA/unnamed-types.cu

	Show All 30 Lines
	// HOST: define internal void @_ZZ2f1PfENKUlS_E_clES_(			// HOST: define internal void @_ZZ2f1PfENKUlS_E_clES_(
	// DEVICE: define internal float @_ZZZ2f1PfENKUlS_E_clES_ENKUlfE_clEf(			// DEVICE: define internal float @_ZZZ2f1PfENKUlS_E_clES_ENKUlfE_clEf(
	void f1(float *p) {			void f1(float *p) {
	[](float *p) {			[](float *p) {
	k0<<<1,1>>>(p, [] __device__ (float x) { return x + 1.f; });			k0<<<1,1>>>(p, [] __device__ (float x) { return x + 1.f; });
	}(p);			}(p);
	}			}
	// HOST: @__hip_register_globals			// HOST: @__hip_register_globals
	// HOST: __hipRegisterFunction{{.*}}@_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_{{.*}}@0			// HOST: __hipRegisterFunction{{.*}}@_Z17__device_stub__k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_{{.*}}@0

Revision Contents

Diff 249212
