This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/Basic/
-
clang/
-
Basic/
-
Specifiers.h
-
lib/
-
AST/
14/14
ItaniumMangle.cpp
-
Type.cpp
-
TypePrinter.cpp
-
Basic/Targets/
-
Targets/
-
X86.h
-
CodeGen/
-
CGCUDANV.cpp
-
CGDebugInfo.cpp
14/14
CodeGenModule.cpp
-
test/CodeGenCUDA/
-
CodeGenCUDA/
-
kernel-stub-name.cu

Differential D68578

[HIP] Fix device stub name
ClosedPublic

Authored by yaxunl on Oct 7 2019, 7:45 AM.

Download Raw Diff

Details

Reviewers

tra
rjmccall
hliao
rsmith

Commits

rG22c457a869d5: [HIP] Fix device stub name

Summary

HIP emits a device stub function for each kernel in host code.

The HIP debugger requires device stub function to have a different unmangled name as the kernel.

Currently the name of the device stub function is the mangled name with a postfix .stub. However,
this does not work with the HIP debugger since the unmangled name is the same as the kernel.

This patch adds prefix __device__stub__ to the unmangled name of the device stub before mangling,
therefore the device stub function has a valid mangled name which is different than the device kernel
name. The device side kernel name is kept unchanged. kernels with extern "C" also gets the prefix added
to the corresponding device stub function.

Diff Detail

Event Timeline

yaxunl created this revision.Oct 7 2019, 7:45 AM

Herald added a subscriber: erik.pilkington. · View Herald TranscriptOct 7 2019, 7:45 AM

tra added a reviewer: rsmith.Oct 7 2019, 8:56 AM

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set break point on a kernel, they intend to set a break point on the real kernel, not the device stub function. The device stub function is only a compiler generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

hliao added inline comments.Oct 7 2019, 9:48 AM

lib/CodeGen/CGCUDANV.cpp
235 ↗	(On Diff #223598)	keeping the original assertion in HIP is still valuable to capture naming mismatch issue for unnamed types

In D68578#1697851, @yaxunl wrote:

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set break point on a kernel, they intend to set a break point on the real kernel, not the device stub function. The device stub function is only a compiler generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

I would agree that having distinct names for the device-side kernel and it's host-side stub would probably make things easier for debugger.
However, debugger does have access to mangled names and does see the '.stub' suffix in the mangled name. I don't understand why it can't be considered to disambiguate between the kernel and the stub?
I'm clearly missing something here. Is there a chance to get someone from the debugger team to chime in on this review directly?

Also, I would not agree that they intend to set a break point on the real kernel is the only scenario. E.g. quite often when I debug CUDA stuff, I do only care about host-side things and I do want to set breakpoint on the stub, so I can check kernel call parameters as they are passed to the kernel. It would be great if there were a way to explicitly tell debugger whether we want host-side stub or the kernel without having user to know how particular compiler transforms the name. For the user both entities have the same name, but distinct location and there should be a way to express that in the debugger.

In D68578#1697898, @tra wrote:

In D68578#1697851, @yaxunl wrote:

In D68578#1697822, @tra wrote:

Could you elaborate on how exactly current implementation does not work?

I would expect the kernel and the stub to be two distinct entities, as far as debugger is concerned. It does have enough information to track each independently (different address, .stub suffix, perhaps knowledge whether it's device or host code). Without the details, it looks to me that this is something that can and should be dealt with in the debugger. I've asked the same question in D63335 but I don't think I've got a good answer.

HIP debugger is a branch of gdb and the changes to support HIP will be upstreamed. When users set break point on a kernel, they intend to set a break point on the real kernel, not the device stub function. The device stub function is only a compiler generated helper function to help launch the kernel. Therefore it should have a different name so that it does not interfere with the symbol resolution of the real kernel.

I would agree that having distinct names for the device-side kernel and it's host-side stub would probably make things easier for debugger.
However, debugger does have access to mangled names and does see the '.stub' suffix in the mangled name. I don't understand why it can't be considered to disambiguate between the kernel and the stub?
I'm clearly missing something here. Is there a chance to get someone from the debugger team to chime in on this review directly?

Also, I would not agree that they intend to set a break point on the real kernel is the only scenario. E.g. quite often when I debug CUDA stuff, I do only care about host-side things and I do want to set breakpoint on the stub, so I can check kernel call parameters as they are passed to the kernel. It would be great if there were a way to explicitly tell debugger whether we want host-side stub or the kernel without having user to know how particular compiler transforms the name. For the user both entities have the same name, but distinct location and there should be a way to express that in the debugger.

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function. It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name. The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it. My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

In D68578#1698864, @t-tye wrote:

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it. My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

Yes this patch does this.

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that the stub is the host-side part of the device-side kernel? To me it looks like an argument for them to have the same name so it's clear that they are both part of the same function as written in the source.

The don't have to be different. CUDA (and HIP) does not allow overloading of kernels, so the stub and the kernel can have identical names as in the example of __host__ and __device__ overloads you've described above, only now it's __host__ stub + __global__ kernel itself, instead of two user-implemented functions. Debugger, of course, will need to know about that to pick the stub or kernel as the breakpoint location, but that appears doable.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a work around the fact that debugger may not have convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to prove this ability as it sounds quite useful. I.e I may want to set breakpoint on all inlined host/device functions, but only on device side. That would be handy.

What happens if the stub and the kernel do have identical names?
My understanding, based on your comments above is that debugger does know about host and device 'spaces' and that it can find pointers to both host and device functions and set appropriate breakpoints for both. In this case it would normally attempt to set breakpoint on both the stub and the kernel as it would in case of __host__/__device__ overloads you've described above. In case of stub/kernel, we would want the breakpoint set only on the kernel itself. Given that debugger does have ability to tell host and device functions/symbols apart, the difficulty is really in being able to tell a real host function from the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

Does debugger know that device-side function is a kernel? In case of CUDA, the kernels are distinct from regular device-side functions. I don't know whether that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be used to infer that the matching host-side symbol is a stub and skip setting a breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols for the potential breakpoint locations. The stub currently has distinct .stub suffix. This can also be used to tell it apart from a regular __host__ function.

I do not see how changing the source-level name for the stub is going to change things in principle. It's just yet another way to disambiguate a real __host__ function from a host stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name different from the kernel?

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it.

Currently the mangled name has .stub suffix which is discarded during unmangling, so unmangled names for the stub and the kernel end up being identical. I'm trying to figure out why is it a problem to be fixed in the compiler.

My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

This patch proposes changing the source-level name for the stub. Unfortunately the way it attempt to implement it is by doing the renaming during mangling phase itself. This appears to be the wrong place to change source-level name.
Before figuring out what would be the right thing to do here, I want to understand why we're doing it. I appreciate your description of what drives this requirement. I think I have petter idea of it now, but I still have some questions. Please bear with me.

In D68578#1700652, @tra wrote:

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

We want to support a heterogeneous gdb debugger for a single source programming language. We would like to follow the same conventions used by compilers that implement other languages supported by gdb. The debugger can use symbols to find functions. It supports unmangling them and using the unmangled name to indicate the source language function it corresponds to. We would like this to remain true. The stub is not the kernel function, it is a helper function that will launch the kernel. In many ways it is acting like other trampolines. Therefore, it should be named as a internal helper function. The debugger can chose what it wants to do with it, but it does not want to be confused into thinking it actually IS the kernel function. If the user sets a breakpoint in the code of the kernel function then that breakpoint should be hit by every instance of the kernel that is created by the dispatch. It should not be hit by the code that is initiatig the dispatch. If that is what the user wanted they would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is s separate concept. If it is present in both, then both would have the same symbol as they are both implementing the kernel. The debugger would set a breakpoint in both as from a language execution model poit of view if either piece of code executes it corresponds to the same source language kernel.

It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

Right. Well its not really and overload, not a request to have instances of the kernel avaiable for either the CPU or GPU to execute. They are the same function, not different overloads.

In contrast, the stub is effectively part of the implementation of actually launching the device function. It should have a distinct name.

I'm not sure how the requirement of distinct name follows from the fact that the stub is the host-side part of the device-side kernel? To me it looks like an argument for them to have the same name so it's clear that they are both part of the same function as written in the source.

The don't have to be different. CUDA (and HIP) does not allow overloading of kernels, so the stub and the kernel can have identical names as in the example of __host__ and __device__ overloads you've described above, only now it's __host__ stub + __global__ kernel itself, instead of two user-implemented functions. Debugger, of course, will need to know about that to pick the stub or kernel as the breakpoint location, but that appears doable.

As mentioned, the stub is not the host side part of the device side kernel. The stub is a means to launch the kernel. That launching could happen on the device (device enqueue), or on the host. The kernel itself could execute on the device or the host. There is the act of launching the kernel (the function call statement if you will), and the kernel instances that come into existence (the threads created to execute the body of the kernel according to the launch bounds presented at the launch statement).

The user may want to set a breakpoint at the launch statement, or in the body of the kernel. The language execution model treats those separately. The standard debugger expects the symbols to reflect the language constructs. Hence wanting the launch stub (which is compiler generated and not user written) to be distinct from the kernel body. If the compiler decided not to use a launch stub function (perhaps it is launching a kernel that will execute on the CPU and so does not need a helper function) then that is its choice. It is desirable that the debugger does not have to know about the choices made by the compiler. It simply wants to know that a symbol that appears to be a user source language construct is in fact exactly that. It does not have to do any compiler specific filtering.

When the debugger has DWARF available, it can use that to get a more accurate picture, but gdb does have the ability to fall back on just symbols, and it is that functionality we would like to preserve in the same manner as is done for other languages.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a work around the fact that debugger may not have convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to prove this ability as it sounds quite useful. I.e I may want to set breakpoint on all inlined host/device functions, but only on device side. That would be handy.

It is not really a work around. It is making the symbols reflect the reality of the source language program. The debugger can then simply trust that information and use it as gdb does for other languages.

Features such only break on this inlining of a function may be useful, but gdb does not currenty support that. Similarly, it does not support breakpoints based on the architecture. That could be simulated by having a conditional breakpoint that continues if the current thread architecture does not equal the chosen architecture. If such a feature were widely used it could be accelerated by adding architecture conditional breakpoints. I can add that to our list of suggestions.

What happens if the stub and the kernel do have identical names?

The stub is compiler generated so should never have a name that can collide with a user name.

My understanding, based on your comments above is that debugger does know about host and device 'spaces' and that it can find pointers to both host and device functions and set appropriate breakpoints for both. In this case it would normally attempt to set breakpoint on both the stub and the kernel as it would in case of __host__/__device__ overloads you've described above. In case of stub/kernel, we would want the breakpoint set only on the kernel itself. Given that debugger does have ability to tell host and device functions/symbols apart, the difficulty is really in being able to tell a real host function from the stub, so we can skip it.

Is that indeed what we want/need? Is there something else?

As mentioned above. The desire is to have the compiler generate information in a standard way so the debugger can consume it in a standard way. If the user wants to set a breakpoint in the kernel, the compiler information should only lead the debugger to setting a breakpoint in the kernel, not in some other function that is used in the implementation of launching kernels. The user expects a kernel breakpoint to only be hit by the threads that execute the kernel instances created by the launch as that is how the language is defined. The debugger simply keeps track of what code objects are loaded, and what symbols they contain. It does not need to know the distinction between host and device code to implement the basic debugger functionality.

Does debugger know that device-side function is a kernel? In case of CUDA, the kernels are distinct from regular device-side functions. I don't know whether that's the case for AMDGPU.
If debugger can tell that particular device function is a kernel, that can be used to infer that the matching host-side symbol is a stub and skip setting a breakpoint on it.
If that does not work, debugger presumably has access to the mangled symbols for the potential breakpoint locations. The stub currently has distinct .stub suffix. This can also be used to tell it apart from a regular __host__ function.

The debugger does not have to care if the symbol is for a kernel or a function, it will simply plonk a breakpoint in the code that corresponds to the symbol and report when it is hit. If the symbols follow the conventions used by other languages then the debugger does not have to do anything special to support a source language that happens to be executing on multiple devices.

I do not see how changing the source-level name for the stub is going to change things in principle. It's just yet another way to disambiguate a real __host__ function from a host stub we generate for the kernel.
Is there anything else about the stubs that requires them to have a name different from the kernel?

The stub is not a source level function, it is a compiler generated function, and the desire is that it be named so as not to conflict with actual source level constructs as is done for other compiler generated entries.

I am a little unclear what this patch is doing as it is mentioned that the mangled name has a _stub in it.

Currently the mangled name has .stub suffix which is discarded during unmangling, so unmangled names for the stub and the kernel end up being identical. I'm trying to figure out why is it a problem to be fixed in the compiler.

My understanding was that an earlier review rejected adding the suffix to the mangled name as it broke unmangling. It also does not seem the right thing to do as it does not follow the convention for oter compiler generate symbols.

My understanding is that the intention was to create a distinct unmangled name for the stub, and then mangle it so that the resulting symbol was a legal mangled name. It sounded like this was the preferred approach, and makes sense to me based on my current understanding. Am I understanding this correctly?

This patch proposes changing the source-level name for the stub. Unfortunately the way it attempt to implement it is by doing the renaming during mangling phase itself. This appears to be the wrong place to change source-level name.

What do you think is the right place to do it?

Before figuring out what would be the right thing to do here, I want to understand why we're doing it. I appreciate your description of what drives this requirement. I think I have petter idea of it now, but I still have some questions. Please bear with me.

That makes complete sense. The desire is to have the debugger treat heterogeneous single source debugging in the same way as traditional CPU debugging. That the user experience is basically the same. By following the same conventions used for the CPU in the GPU, and implementing similar runtime controls, it allows a common debugger code base to support both with minimal change and ensure a consistent user experience. We would like to avoid adding special treatment to support the GPU in the debugger if following the existing conventions/standards will allow the existing code to simply work.

Hopefully the above responses help describe the motivation for this. If not let me know and thanks for taking the time to review.

In D68578#1700652, @tra wrote:

This patch proposes changing the source-level name for the stub. Unfortunately the way it attempt to implement it is by doing the renaming during mangling phase itself. This appears to be the wrong place to change source-level name.

A specific difficulty here is that we need not only get the mangled kernel stub name, but also get the mangled kernel name. However, there is only one FuncDecl for the kernel. If we change the name of the FuncDecl to the stub name to be different from the kernel name, then we cannot get the mangled name for the kernel. That's why this patch does not change FuncDecl but let the mangler mangle it in two different ways. An alternative approach would be to create two FuncDecls, one for stub, one for kernel, and keep a map from the stub to the kernel. In this way we do not need to change the mangler.

Apologies for the delay with my response.

In D68578#1700819, @t-tye wrote:

In D68578#1700652, @tra wrote:

In D68578#1698864, @t-tye wrote:

From a source language point of view, the device function comprises the code that is launched as a grid. We need this fact to be present in the symbols used. Only the device function should have a symbol name matching the mangled name of the device function.

What do you have in mind when you use 'symbol name' here? Is that a symbol as seen by linker? If that's the case, do host and device share this name space on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and each GPU have their own spaces), so they never clash, but the kernel names must have identical mangled name on host and all devices, so the host can refer to the device-side kernel when it needs to launch it.

We want to support a heterogeneous gdb debugger for a single source programming language. We would like to follow the same conventions used by compilers that implement other languages supported by gdb. The debugger can use symbols to find functions. It supports unmangling them and using the unmangled name to indicate the source language function it corresponds to. We would like this to remain true. The stub is not the kernel function, it is a helper function that will launch the kernel. In many ways it is acting like other trampolines. Therefore, it should be named as a internal helper function. The debugger can chose what it wants to do with it, but it does not want to be confused into thinking it actually IS the kernel function. If the user sets a breakpoint in the code of the kernel function then that breakpoint should be hit by every instance of the kernel that is created by the dispatch. It should not be hit by the code that is initiatig the dispatch. If that is what the user wanted they would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is s separate concept. If it is present in both, then both would have the same symbol as they are both implementing the kernel. The debugger would set a breakpoint in both as from a language execution model poit of view if either piece of code executes it corresponds to the same source language kernel.

Thank you for the details.

It the device function has both a host and device implementation then both can have the source language function name for the symbol since both actually implement the device function. If the user asks to set a breakpoint in the device function then the debugger would set in both implementations so the user is notified when the source program executes the device function, regardless of which implementation is invoked. This is similar to the debugger setting a breakpoint in a function that is inlined into multiple places: the debugger sets breeakpoints in all the inlined places so the user can tstill think of the program debugging in terms of the source language semantics.

OK. This sounds like __host__/__device__ function overloads and what you're saying does make sense for that.

Right. Well its not really and overload, not a request to have instances of the kernel avaiable for either the CPU or GPU to execute. They are the same function, not different overloads.

In case of cuda they may be overloads -- there may be two functions with identical signatures (modulo __host__/__device__ attributes) or multiple functions with the same names but different signatures with different __host__/__device__ attributes. It does not change things in principle. I'm just pointing out that CUDA (and thus HIP) as implemented in clang uses target attributes as another dimension in the space of functions with the same name.

The debugger can still be used to set a breakpoint in it, or to step into it. But that should be done in terms of the stub name. If the debugger wants to support source language specific intelligence it can provide a helper library that understands the stub names. This helper library (similar to the thread helper library) can be used by the debugger to present a cleaner language view to the user. In fact OpenMP has also done this and provides a helper library called OMPD that can be used by tools such as a debugger to hide OpenMP trampoline functions etc.

Do I understand it correctly that giving the stub distinct name would effectively get it out of the way when a breakpoint is set on the kernel? I.e. it's essentially a work around the fact that debugger may not have convenient way to specify "set breakpoint on this name in device code only". Perhaps it would make sense to prove this ability as it sounds quite useful. I.e I may want to set breakpoint on all inlined host/device functions, but only on device side. That would be handy.

It is not really a work around. It is making the symbols reflect the reality of the source language program. The debugger can then simply trust that information and use it as gdb does for other languages.

It still seems to boil down to "the stub should not get in the way of debugger accessing the function itself", but I see your point and agree that it would be useful if the stub could be an entity separate from the function itself.

Now we need to figure out what would be the best way to implement it.

Clang uses the real function in AST to generate the IR for the stub and, because of that, the stub ends up using function's name.
Actually, the situation is a bit worse than that. Clang implicitly relies on __host__ and __device__ entities not being codegen'ed at the same time, so we don't have to care about name conflicts.
Your description above indicates that the assumption is somewhat optimistic and that's what really causes the issue here.

I think @yaxunl's suggestion that we may need different FuncDecl's would be a good way forward.
I suspect we may already have places where clang deals with compiler-generated functions, so we should have existing examples of how it could be done.

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

In D68578#1737351, @rjmccall wrote:

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

It seems GlobalDecl is just a wrapper for concrete Decl's

https://github.com/llvm/llvm-project/blob/31817731167135870259ef1e7387746345b96a2f/clang/include/clang/AST/GlobalDecl.h#L40

Here we need to get the mangled name of a kernel and the mangled name of the same kernel but with a prefix before mangling.

Can I use GlobalDecl with the same FunctionDecl* but different multi-version index to indicate it is a kernel or a stub, then let the mangler mangle them differently?

In D68578#1737415, @yaxunl wrote:

In D68578#1737351, @rjmccall wrote:

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

It seems GlobalDecl is just a wrapper for concrete Decl's

It's a Decl plus a discriminator which is required for certain kinds of declaration. See e.g. GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type). GlobalDecl asserts if you try to construct it using GlobalDecl(FunctionDecl*) with a constructor/destructor declaration; we could similarly make that forbid construction with a kernel and then require code to use a GlobalDecl constructor that passes down whether it's the kernel or the stub that's being requested.

John.

In D68578#1737419, @rjmccall wrote:

In D68578#1737415, @yaxunl wrote:

In D68578#1737351, @rjmccall wrote:

Distinguishing between multiple symbols associated with the same source-level declaration is the purpose of the GlobalDecl abstraction.

It seems GlobalDecl is just a wrapper for concrete Decl's

It's a Decl plus a discriminator which is required for certain kinds of declaration. See e.g. GlobalDecl(const CXXConstructorDecl *D, CXXCtorType Type). GlobalDecl asserts if you try to construct it using GlobalDecl(FunctionDecl*) with a constructor/destructor declaration; we could similarly make that forbid construction with a kernel and then require code to use a GlobalDecl constructor that passes down whether it's the kernel or the stub that's being requested.

John.

In host compilation, we do not need to differentiate device function or stub function except for the mangler. Currently the mangler does not know about GlobalDecl. If we let the mangler to mangle a function based on whether it is a GlobalDecl or FunctionDecl, we still need to modify the mangler, and the change will be similar to the current approach.

There are a number of places in IRGen that pass around GlobalDecls with the expectation that that's sufficient to uniquely identify a symbol. The fact that IRGen breaks down the GD at the last second before passing it to the mangler, rather than passing it to the mangler and letting the mangler decide what to do with it, doesn't really change anything and is arguably poor code design anyway. Inventing a second declaration, or trying to propagate a flag outside of GD, is just fighting the architecture for no good reason.

Attempt to prefix the kernel stub name on the fly.

If we do not want to create two Decl's during parsing, and do not want to change the mangler, it seems the least invasive way to get the prefixed kernel name is to change it on the fly then change it back.

tra added inline comments.Nov 13 2019, 11:53 AM

clang/lib/CodeGen/CodeGenModule.cpp
1108–1117	On one hand I like this patch variant much better than the one that changed the mangling itself. On the other hand this code appears to reply on implementation details. I.e. we're setting new name on `FD` which may or may not be the same as `ND`, but we're always passing `ND` to `getMangledNameImpl()`. Perhaps we could implement name-tweaking as another `MultiVersionKind` which we already plumb into getMangledNameImpl() and which allows changing the name for target attributes & features.

yaxunl marked an inline comment as done.Nov 14 2019, 9:02 AM

yaxunl added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
1108–1117	The mangled name of an instantiated template function does not depends on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name.

tra added inline comments.Nov 14 2019, 9:36 AM

clang/lib/CodeGen/CodeGenModule.cpp
1108–1117	The mangled name of an instantiated template function does not depends on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. That would be putting more effort into working around the fact that `getMangledNameImpl()` doesa not provide a good API to change the name the way you need to. That's what needs to be addressed, IMO. MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name. Some existing implementations append to the mangled name, but we can do other manipulations there, too. The string with the mangled name originates in `getMangledNameImpl` and we could do more than just append to it. We do not have to use the `MultiVersion` for that, either. E.g. we prepend `__regcall3__` to the names of functions with `CC_X86RegCall` calling convention. We could do something similar for the kernel stub, I wonder if we could just generate a unique name and be done with that? Hmm. Unique name probably would not do if, let's say, a kernel is defined in one TU, but we need to call it from another TU. So, whichever way we change the name of the stub, it will need to be the same everywhere. You may want to add a test verifying that launching of declaration-only kernels uses the right name. Consistency of name mangling means that we do need to include regular C++-mangled information. Which means we need to do the name tweaking deeper down. How about using calling conventions? It's been suggested in the past that a lot of shenanigans around kernel launches could/should be done as a different calling convention. One of the things affected by the calling convention is mangling and we can add prefix there: https://github.com/llvm/llvm-project/blob/master/clang/lib/AST/Mangle.cpp#L164 We could tag host-side kernel with 'kernel call' calling convention on the host side and then plumb prefixing to be done similar to `__regcall3__`. If that works that may be a useful improvement overall. For instance, we may no longer need to stash a `it's a kernel` flag among attributes and it would probably be useful for other things (e.g enforcing address space requirements for kernel pointer arguments).

yaxunl added inline comments.Nov 14 2019, 2:55 PM

clang/lib/CodeGen/CodeGenModule.cpp
1108–1117	will add a test for decl only kernel. At least for the current implementation I see it works. A decl of stub function with expected name is emitted and can be called. About calling conv. I've tried implementing `__global__` as a calling conv before. The issue is that it is part of type system and clang enforces type checking for that. e.g. you cannot assign it to an ordinary function pointer unless that function pointer is also declared with the same calling convention. This will cause lots of type mismatching issues. In CUDA language, `__global__` is not part of type system since it is just an attribute. We could introduce a calling conv for stub, but probably we can only use use it when we mangle the stub function.

tra added inline comments.Nov 14 2019, 3:21 PM

clang/lib/CodeGen/CodeGenModule.cpp
1108–1117	OK. I'm fresh out of ideas. We should add some sort of assert to make sure that the mangled name does have the prefix we intended to add. Also a TODO to figure out a better way to add a name prefix before mangling. If anyone else has other suggestions, please chime in.

use calling convention to mangle the stub differently.

clean up and fix assertions.

Herald added subscribers: nhaehnle, jvesely. · View Herald TranscriptDec 3 2019, 12:52 PM

ping

I mean, I made a recommendation and you dismissed it.

Revised by John's comments. Introduced HIPKernelType for GlobalDecl so that we can use GlobalDecl to
represent stub and kernel during host compilation. Revised mangler so that GlobalDecl carrying
pertinent information can be passed through.

Herald added a subscriber: kerbowa. · View Herald TranscriptFeb 17 2020, 9:00 AM

Thanks, I think this approach is really improving the existing code.

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	The attribute here is `CUDAGlobalAttr`; should this be named in terms of CUDA, or is the CUDA model sufficiently different from HIP that the same implementation concept doesn't apply?
52 ↗	(On Diff #244988)	This function exists primarily to be used as a common initializer for all the constructors that don't require any of the extra fields. (This file predates LLVM's adoption of C++14, which allows constructors to delegate to other constructors.) That's why it asserts that it's not used with constructors or destructors. So, two questions: Is there a reason this function now needs to tolerate null declarations? There's some subtle implicit behavior here, in that references to kernels default to `HIPKernelKind::Kernel`. Is that reasonable, or should there be a third assertion that this function isn't used with kernel declarations?
131 ↗	(On Diff #244988)	`explicit`, please.
162 ↗	(On Diff #244988)	The indentation here seems odd.
clang/lib/AST/ItaniumMangle.cpp
400	Does passing down a `GlobalDecl` everywhere allow us to remove these constructors, i.e. to eliminate the `Structor` and `StructorType` fields?
644	This can just be `cast`, except actually I don't think you need a cast here at all given the code below.
814	There's a default constructor.
1587	The relevant wording from the Itanium spec here is: For entities in constructors and destructors, the mangling of the complete object constructor or destructor is used as the base function name, i.e. the C1 or D1 version. But you might consider pulling this out as a helper function, something like: static GlobalDecl getParentOfLocalEntity(const DeclContext *DC);
1589	This can still be `cast`.
1879	You can just pass `GlobalDecl()` here.
4954	The second assertion can just be removed now, since the GD should be carrying the right information.
clang/lib/CodeGen/CodeGenModule.cpp
997	Let's see if we can make this breakdown no longer necessary, since `MangleContext::mangleName` should be capable of doing the right thing starting straight from a GD. In fact, maybe we can remove most of the specialized mangling methods (like `mangleCXXCtor` and `mangleCXXDtor`) from `MangleContext` completely? Unrelatedly: there's an `Out` declared in the outermost scope, but a bunch of these scopes declare their own `Out`; could you just fix that while you're editing this function?
1010	Is this the best way of handling this, or should `shouldMangleDeclName` return true for kernels (or at least stubs?) even in C? Honest question.
3300	Should this be handled in the caller, or would that make things unreasonably difficult?

Nice! Thank you for making these changes.

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	I believe the attribute serves the same purpose in both CUDA and HIP and could be renamed appropriately in a separate patch. While the changes in this patch are not required for CUDA, CUDA would benefit from them. We could use a generic GPU prefix and migrate CUDA to the same model later. A TODO comment about that would be nice.
clang/lib/CodeGen/CodeGenModule.cpp
997	Perhaps it would make sense to split this patch into two -- one that changes mangler input to GlobalDecl and the other one dealing with HIP stubs.

rjmccall added inline comments.Feb 18 2020, 9:58 AM

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	I'd just like consistency. If they're serving the same purpose, then as someone with no dog in this fight, I would give precedence to CUDA over HIP in names since it's both the older language and was implemented first in Clang (even if only partially, IIUC). I don't think a generic name works unless we can meaningfully generalize it to all languages with a similar feature, e.g. OpenCL and so on.
clang/lib/CodeGen/CodeGenModule.cpp
997	Yes, that's a good idea.

tra added inline comments.Feb 18 2020, 10:18 AM

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	Naming, the hardest problem in computer science. :-) I personally would prefer generalization-with-exclusions over specific name which is inconsistently commingles things that's really specific to that name and things that are more widely used. Alas, right now CUDA is the example of the latter case -- some parts are CUDA-specific and a lot are shared with HIP. For the new features we've been sort of sticking with using CUDA/HIP for specific parts and GPU for shared functionality, but as things are a lot of shared bits are still 'CUDA' and it's hard to tell them apart. As you point it out, renaming the incumbent names would be a pain, so here we are. I think using `GPUKernelKind` with a comment that it reflects HIP & CUDA kernels would be somewhat better choice than `CUDAKernelKind` which would be somewhat confusing at this point given that CUDA actually does not use it yet. I'm also fine with keeping it `HIPKernelKind` and postpone the naming decision until CUDA-related parts are actually implemented.

rjmccall added inline comments.Feb 19 2020, 7:24 AM

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	Maybe `KernelReferenceKind`? It's probably a common concept across all heterogenous-computing models.

tra added inline comments.Feb 19 2020, 9:18 AM

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	SGTM.

yaxunl marked 38 inline comments as done.Mar 5 2020, 10:30 AM

yaxunl added inline comments.

clang/include/clang/AST/GlobalDecl.h
40 ↗	(On Diff #244988)	changed.
52 ↗	(On Diff #244988)	By using default constructor of GlobalDecl, null declaration is no longer necessary. I have reverted change to assert. I think it is a good idea to force GlobalDecl of kernels to be instantiated with the specific ctor. I have added assert to Init to prevent kernels to be instantiated through it.
131 ↗	(On Diff #244988)	fixed
162 ↗	(On Diff #244988)	fixed
clang/lib/AST/ItaniumMangle.cpp
400	Removing eliminate the Structor and StructorType will incur significantly more changes. Can it be done later? Thanks.
644	removed cast
814	fixed
1587	extracted to getParentOfLocalEntity
1589	fixed
1879	fixed
4954	removed
clang/lib/CodeGen/CodeGenModule.cpp
997	Fixed the redundant Out var. However, removing mangleCXXCtor/Dtor will incur significantly more changes. Can it be done later? Thanks.
997	separated to https://reviews.llvm.org/D75700
1010	This is for extern "C" kernels, which are either not mangled or with simple prefix. I tried returning true for them in shouldMangleDeclName, and they got mangled as Itanium mangling, which seems not right.
3300	fixed

Revised by John's and Artem's comments.

yaxunl added a parent revision: D75700: [NFC] Let mangler accept GlobalDecl.Mar 5 2020, 10:37 AM

update patch

Few nits. LGTM otherwise.

clang/include/clang/AST/GlobalDecl.h
61 ↗	(On Diff #248927)	Wording inconsitency -- we're checking for `CUDAGlobalAttr` but complaining about 'HIP kernels'. Just drop 'HIP' or replace with 'GPU'?
85 ↗	(On Diff #248927)	Ditto.
129 ↗	(On Diff #248927)	Same wording nit.
188 ↗	(On Diff #248927)	Ditto.
clang/lib/CodeGen/CGCUDARuntime.h
69 ↗	(On Diff #248927)	Adding a descriptive comment would be great. Otherwise anyone looking at the function decl without the context of this patch will be puzzled about its meaning and purpose. Also, perhaps the argument type should be a `NamedDecl` -- the function is not used on or useful for regular `Decl`. It will save us few casts in other places, too.

This revision is now accepted and ready to land.Mar 9 2020, 9:22 AM

yaxunl marked 9 inline comments as done.Mar 9 2020, 1:42 PM

yaxunl added inline comments.

clang/include/clang/AST/GlobalDecl.h
61 ↗	(On Diff #248927)	will use GPU kernel
clang/lib/CodeGen/CGCUDARuntime.h
69 ↗	(On Diff #248927)	done

Closed by commit rG22c457a869d5: [HIP] Fix device stub name (authored by yaxunl). · Explain WhyMar 9 2020, 2:03 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked 2 inline comments as done.

Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2020, 2:03 PM

yaxunl mentioned this in D113491: [HIP] Fix device stub name for Windows.Nov 9 2021, 8:27 AM

yaxunl mentioned this in rG38211bbab1d9: [HIP] Fix device stub name for Windows.Nov 23 2021, 9:04 AM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

Specifiers.h

36 lines

lib/

AST/

ItaniumMangle.cpp

17 lines

Type.cpp

2 lines

TypePrinter.cpp

1 line

Basic/

Targets/

X86.h

3 lines

CodeGen/

CGCUDANV.cpp

5 lines

CGDebugInfo.cpp

1 line

CodeGenModule.cpp

12 lines

test/

CodeGenCUDA/

kernel-stub-name.cu

45 lines

Diff 231929

clang/include/clang/Basic/Specifiers.h

Show First 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	#include "clang/Basic/OpenCLImageTypes.def"
enum InClassInitStyle {		enum InClassInitStyle {
ICIS_NoInit, ///< No in-class initializer.		ICIS_NoInit, ///< No in-class initializer.
ICIS_CopyInit, ///< Copy initialization.		ICIS_CopyInit, ///< Copy initialization.
ICIS_ListInit ///< Direct list-initialization.		ICIS_ListInit ///< Direct list-initialization.
};		};

/// CallingConv - Specifies the calling convention that a function uses.		/// CallingConv - Specifies the calling convention that a function uses.
enum CallingConv {		enum CallingConv {
CC_C, // __attribute__((cdecl))		CC_C, // __attribute__((cdecl))
CC_X86StdCall, // __attribute__((stdcall))		CC_X86StdCall, // __attribute__((stdcall))
CC_X86FastCall, // __attribute__((fastcall))		CC_X86FastCall, // __attribute__((fastcall))
CC_X86ThisCall, // __attribute__((thiscall))		CC_X86ThisCall, // __attribute__((thiscall))
CC_X86VectorCall, // __attribute__((vectorcall))		CC_X86VectorCall, // __attribute__((vectorcall))
CC_X86Pascal, // __attribute__((pascal))		CC_X86Pascal, // __attribute__((pascal))
CC_Win64, // __attribute__((ms_abi))		CC_Win64, // __attribute__((ms_abi))
CC_X86_64SysV, // __attribute__((sysv_abi))		CC_X86_64SysV, // __attribute__((sysv_abi))
CC_X86RegCall, // __attribute__((regcall))		CC_X86RegCall, // __attribute__((regcall))
CC_AAPCS, // __attribute__((pcs("aapcs")))		CC_AAPCS, // __attribute__((pcs("aapcs")))
CC_AAPCS_VFP, // __attribute__((pcs("aapcs-vfp")))		CC_AAPCS_VFP, // __attribute__((pcs("aapcs-vfp")))
CC_IntelOclBicc, // __attribute__((intel_ocl_bicc))		CC_IntelOclBicc, // __attribute__((intel_ocl_bicc))
CC_SpirFunction, // default for OpenCL functions on SPIR target		CC_SpirFunction, // default for OpenCL functions on SPIR target
CC_OpenCLKernel, // inferred for OpenCL kernels		CC_OpenCLKernel, // inferred for OpenCL kernels
CC_Swift, // __attribute__((swiftcall))		CC_Swift, // __attribute__((swiftcall))
CC_PreserveMost, // __attribute__((preserve_most))		CC_PreserveMost, // __attribute__((preserve_most))
CC_PreserveAll, // __attribute__((preserve_all))		CC_PreserveAll, // __attribute__((preserve_all))
CC_AArch64VectorCall, // __attribute__((aarch64_vector_pcs))		CC_AArch64VectorCall, // __attribute__((aarch64_vector_pcs))
		CC_DeviceStub, // inferred for HIP device stub
};		};

/// Checks whether the given calling convention supports variadic		/// Checks whether the given calling convention supports variadic
/// calls. Unprototyped calls also use the variadic call rules.		/// calls. Unprototyped calls also use the variadic call rules.
inline bool supportsVariadicCall(CallingConv CC) {		inline bool supportsVariadicCall(CallingConv CC) {
switch (CC) {		switch (CC) {
case CC_X86StdCall:		case CC_X86StdCall:
case CC_X86FastCall:		case CC_X86FastCall:
case CC_X86ThisCall:		case CC_X86ThisCall:
case CC_X86RegCall:		case CC_X86RegCall:
case CC_X86Pascal:		case CC_X86Pascal:
case CC_X86VectorCall:		case CC_X86VectorCall:
case CC_SpirFunction:		case CC_SpirFunction:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
case CC_Swift:		case CC_Swift:
		case CC_DeviceStub:
return false;		return false;
default:		default:
return true;		return true;
}		}
}		}

/// The storage duration for an object (per C++ [basic.stc]).		/// The storage duration for an object (per C++ [basic.stc]).
enum StorageDuration {		enum StorageDuration {
▲ Show 20 Lines • Show All 63 Lines • Show Last 20 Lines

clang/lib/AST/ItaniumMangle.cpp

Show First 20 Lines • Show All 391 Lines • ▼ Show 20 Lines	public:
}		}
CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,		CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,
const CXXConstructorDecl *D, CXXCtorType Type)		const CXXConstructorDecl *D, CXXCtorType Type)
: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),		: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),
SeqID(0), AbiTagsRoot(AbiTags) { }		SeqID(0), AbiTagsRoot(AbiTags) { }
CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,		CXXNameMangler(ItaniumMangleContextImpl &C, raw_ostream &Out_,
const CXXDestructorDecl *D, CXXDtorType Type)		const CXXDestructorDecl *D, CXXDtorType Type)
: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),		: Context(C), Out(Out_), Structor(getStructor(D)), StructorType(Type),
SeqID(0), AbiTagsRoot(AbiTags) { }		SeqID(0), AbiTagsRoot(AbiTags) { }
		rjmccallUnsubmitted Done Reply Inline Actions Does passing down a `GlobalDecl` everywhere allow us to remove these constructors, i.e. to eliminate the `Structor` and `StructorType` fields? rjmccall: Does passing down a `GlobalDecl` everywhere allow us to remove these constructors, i.e. to…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions Removing eliminate the Structor and StructorType will incur significantly more changes. Can it be done later? Thanks. yaxunl: Removing eliminate the Structor and StructorType will incur significantly more changes. Can it…

CXXNameMangler(CXXNameMangler &Outer, raw_ostream &Out_)		CXXNameMangler(CXXNameMangler &Outer, raw_ostream &Out_)
: Context(Outer.Context), Out(Out_), NullOut(false),		: Context(Outer.Context), Out(Out_), NullOut(false),
Structor(Outer.Structor), StructorType(Outer.StructorType),		Structor(Outer.Structor), StructorType(Outer.StructorType),
SeqID(Outer.SeqID), FunctionTypeDepth(Outer.FunctionTypeDepth),		SeqID(Outer.SeqID), FunctionTypeDepth(Outer.FunctionTypeDepth),
AbiTagsRoot(AbiTags), Substitutions(Outer.Substitutions) {}		AbiTagsRoot(AbiTags), Substitutions(Outer.Substitutions) {}

CXXNameMangler(CXXNameMangler &Outer, llvm::raw_null_ostream &Out_)		CXXNameMangler(CXXNameMangler &Outer, llvm::raw_null_ostream &Out_)
▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	private:
void mangleUnscopedName(const NamedDecl *ND,		void mangleUnscopedName(const NamedDecl *ND,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleUnscopedTemplateName(const TemplateDecl *ND,		void mangleUnscopedTemplateName(const TemplateDecl *ND,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleUnscopedTemplateName(TemplateName,		void mangleUnscopedTemplateName(TemplateName,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleSourceName(const IdentifierInfo *II);		void mangleSourceName(const IdentifierInfo *II);
void mangleRegCallName(const IdentifierInfo *II);		void mangleRegCallName(const IdentifierInfo *II);
		void mangleDeviceStubName(const IdentifierInfo *II);
void mangleSourceNameWithAbiTags(		void mangleSourceNameWithAbiTags(
const NamedDecl ND, const AbiTagList AdditionalAbiTags = nullptr);		const NamedDecl ND, const AbiTagList AdditionalAbiTags = nullptr);
void mangleLocalName(const Decl *D,		void mangleLocalName(const Decl *D,
const AbiTagList *AdditionalAbiTags);		const AbiTagList *AdditionalAbiTags);
void mangleBlockForPrefix(const BlockDecl *Block);		void mangleBlockForPrefix(const BlockDecl *Block);
void mangleUnqualifiedBlock(const BlockDecl *Block);		void mangleUnqualifiedBlock(const BlockDecl *Block);
void mangleTemplateParamDecl(const NamedDecl *Decl);		void mangleTemplateParamDecl(const NamedDecl *Decl);
void mangleLambda(const CXXRecordDecl *Lambda);		void mangleLambda(const CXXRecordDecl *Lambda);
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines

void CXXNameMangler::mangleSourceNameWithAbiTags(		void CXXNameMangler::mangleSourceNameWithAbiTags(
const NamedDecl ND, const AbiTagList AdditionalAbiTags) {		const NamedDecl ND, const AbiTagList AdditionalAbiTags) {
mangleSourceName(ND->getIdentifier());		mangleSourceName(ND->getIdentifier());
writeAbiTags(ND, AdditionalAbiTags);		writeAbiTags(ND, AdditionalAbiTags);
}		}

void CXXNameMangler::mangle(const NamedDecl *D) {		void CXXNameMangler::mangle(const NamedDecl *D) {
// <mangled-name> ::= _Z <encoding>		// <mangled-name> ::= _Z <encoding>
		rjmccallUnsubmitted Done Reply Inline Actions This can just be `cast`, except actually I don't think you need a cast here at all given the code below. rjmccall: This can just be `cast`, except actually I don't think you need a cast here at all given the…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions removed cast yaxunl: removed cast
// ::= <data name>		// ::= <data name>
// ::= <special-name>		// ::= <special-name>
Out << "_Z";		Out << "_Z";
if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(D))		if (const FunctionDecl *FD = dyn_cast<FunctionDecl>(D))
mangleFunctionEncoding(FD);		mangleFunctionEncoding(FD);
else if (const VarDecl *VD = dyn_cast<VarDecl>(D))		else if (const VarDecl *VD = dyn_cast<VarDecl>(D))
mangleName(VD);		mangleName(VD);
else if (const IndirectFieldDecl *IFD = dyn_cast<IndirectFieldDecl>(D))		else if (const IndirectFieldDecl *IFD = dyn_cast<IndirectFieldDecl>(D))
▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	isTemplate(const NamedDecl ND, const TemplateArgumentList &TemplateArgs) {

// Check if we have a variable template.		// Check if we have a variable template.
if (const VarTemplateSpecializationDecl *Spec =		if (const VarTemplateSpecializationDecl *Spec =
dyn_cast<VarTemplateSpecializationDecl>(ND)) {		dyn_cast<VarTemplateSpecializationDecl>(ND)) {
TemplateArgs = &Spec->getTemplateArgs();		TemplateArgs = &Spec->getTemplateArgs();
return Spec->getSpecializedTemplate();		return Spec->getSpecializedTemplate();
}		}

return nullptr;		return nullptr;
		rjmccallUnsubmitted Done Reply Inline Actions There's a default constructor. rjmccall: There's a default constructor.
		yaxunlAuthorUnsubmitted Done Reply Inline Actions fixed yaxunl: fixed
}		}

void CXXNameMangler::mangleName(const NamedDecl *ND) {		void CXXNameMangler::mangleName(const NamedDecl *ND) {
if (const VarDecl *VD = dyn_cast<VarDecl>(ND)) {		if (const VarDecl *VD = dyn_cast<VarDecl>(ND)) {
// Variables should have implicit tags from its type.		// Variables should have implicit tags from its type.
AbiTagList VariableTypeAbiTags = makeVariableTypeTags(VD);		AbiTagList VariableTypeAbiTags = makeVariableTypeTags(VD);
if (VariableTypeAbiTags.empty()) {		if (VariableTypeAbiTags.empty()) {
// Simple case no variable type tags.		// Simple case no variable type tags.
▲ Show 20 Lines • Show All 475 Lines • ▼ Show 20 Lines	if (II) {
getEffectiveDeclContext(ND)->isFileContext() &&		getEffectiveDeclContext(ND)->isFileContext() &&
!ND->isInAnonymousNamespace())		!ND->isInAnonymousNamespace())
Out << 'L';		Out << 'L';

auto *FD = dyn_cast<FunctionDecl>(ND);		auto *FD = dyn_cast<FunctionDecl>(ND);
bool IsRegCall = FD &&		bool IsRegCall = FD &&
FD->getType()->castAs<FunctionType>()->getCallConv() ==		FD->getType()->castAs<FunctionType>()->getCallConv() ==
clang::CC_X86RegCall;		clang::CC_X86RegCall;
if (IsRegCall)		bool IsDeviceStub =
		FD && FD->getType()->castAs<FunctionType>()->getCallConv() ==
		clang::CC_DeviceStub;
		if (IsDeviceStub)
		mangleDeviceStubName(II);
		else if (IsRegCall)
mangleRegCallName(II);		mangleRegCallName(II);
else		else
mangleSourceName(II);		mangleSourceName(II);

writeAbiTags(ND, AdditionalAbiTags);		writeAbiTags(ND, AdditionalAbiTags);
break;		break;
}		}

▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines
void CXXNameMangler::mangleRegCallName(const IdentifierInfo *II) {		void CXXNameMangler::mangleRegCallName(const IdentifierInfo *II) {
// <source-name> ::= <positive length number> __regcall3__ <identifier>		// <source-name> ::= <positive length number> __regcall3__ <identifier>
// <number> ::= [n] <non-negative decimal integer>		// <number> ::= [n] <non-negative decimal integer>
// <identifier> ::= <unqualified source code identifier>		// <identifier> ::= <unqualified source code identifier>
Out << II->getLength() + sizeof("__regcall3__") - 1 << "__regcall3__"		Out << II->getLength() + sizeof("__regcall3__") - 1 << "__regcall3__"
<< II->getName();		<< II->getName();
}		}

		void CXXNameMangler::mangleDeviceStubName(const IdentifierInfo *II) {
		// <source-name> ::= <positive length number> __device_stub__ <identifier>
		// <number> ::= [n] <non-negative decimal integer>
		// <identifier> ::= <unqualified source code identifier>
		Out << II->getLength() + sizeof("__device_stub__") - 1 << "__device_stub__"
		<< II->getName();
		}

void CXXNameMangler::mangleSourceName(const IdentifierInfo *II) {		void CXXNameMangler::mangleSourceName(const IdentifierInfo *II) {
// <source-name> ::= <positive length number> <identifier>		// <source-name> ::= <positive length number> <identifier>
// <number> ::= [n] <non-negative decimal integer>		// <number> ::= [n] <non-negative decimal integer>
// <identifier> ::= <unqualified source code identifier>		// <identifier> ::= <unqualified source code identifier>
Out << II->getLength() << II->getName();		Out << II->getLength() << II->getName();
}		}

void CXXNameMangler::mangleNestedName(const NamedDecl *ND,		void CXXNameMangler::mangleNestedName(const NamedDecl *ND,
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	else if (const BlockDecl *BD = dyn_cast<BlockDecl>(DC))
mangleBlockForPrefix(BD);		mangleBlockForPrefix(BD);
else		else
mangleFunctionEncoding(cast<FunctionDecl>(DC));		mangleFunctionEncoding(cast<FunctionDecl>(DC));

// Implicit ABI tags (from namespace) are not available in the following		// Implicit ABI tags (from namespace) are not available in the following
// entity; reset to actually emitted tags, which are available.		// entity; reset to actually emitted tags, which are available.
LocalAbiTags.setUsedAbiTags(LocalAbiTags.getEmittedAbiTags());		LocalAbiTags.setUsedAbiTags(LocalAbiTags.getEmittedAbiTags());
}		}

		rjmccallUnsubmitted Done Reply Inline Actions The relevant wording from the Itanium spec here is: For entities in constructors and destructors, the mangling of the complete object constructor or destructor is used as the base function name, i.e. the C1 or D1 version. But you might consider pulling this out as a helper function, something like: static GlobalDecl getParentOfLocalEntity(const DeclContext DC); rjmccall:* The relevant wording from the Itanium spec here is: > For entities in constructors and…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions extracted to getParentOfLocalEntity yaxunl: extracted to getParentOfLocalEntity
Out << 'E';		Out << 'E';

		rjmccallUnsubmitted Done Reply Inline Actions This can still be `cast`. rjmccall: This can still be `cast`.
		yaxunlAuthorUnsubmitted Done Reply Inline Actions fixed yaxunl: fixed
// GCC 5.3.0 doesn't emit derived ABI tags for local names but that seems to		// GCC 5.3.0 doesn't emit derived ABI tags for local names but that seems to
// be a bug that is fixed in trunk.		// be a bug that is fixed in trunk.

if (RD) {		if (RD) {
// The parameter number is omitted for the last parameter, 0 for the		// The parameter number is omitted for the last parameter, 0 for the
// second-to-last parameter, 1 for the third-to-last parameter, etc. The		// second-to-last parameter, 1 for the third-to-last parameter, etc. The
// <entity name> will of course contain a <closure-type-name>: Its		// <entity name> will of course contain a <closure-type-name>: Its
// numbering will be local to the particular argument in which it appears		// numbering will be local to the particular argument in which it appears
▲ Show 20 Lines • Show All 273 Lines • ▼ Show 20 Lines	void CXXNameMangler::mangleTemplatePrefix(TemplateName Template) {
if (TemplateDecl *TD = Template.getAsTemplateDecl())		if (TemplateDecl *TD = Template.getAsTemplateDecl())
return mangleTemplatePrefix(TD);		return mangleTemplatePrefix(TD);

if (QualifiedTemplateName *Qualified = Template.getAsQualifiedTemplateName())		if (QualifiedTemplateName *Qualified = Template.getAsQualifiedTemplateName())
manglePrefix(Qualified->getQualifier());		manglePrefix(Qualified->getQualifier());

if (OverloadedTemplateStorage *Overloaded		if (OverloadedTemplateStorage *Overloaded
= Template.getAsOverloadedTemplate()) {		= Template.getAsOverloadedTemplate()) {
mangleUnqualifiedName(nullptr, (*Overloaded->begin())->getDeclName(),		mangleUnqualifiedName(nullptr, (*Overloaded->begin())->getDeclName(),
		rjmccallUnsubmitted Done Reply Inline Actions You can just pass `GlobalDecl()` here. rjmccall: You can just pass `GlobalDecl()` here.
		yaxunlAuthorUnsubmitted Done Reply Inline Actions fixed yaxunl: fixed
UnknownArity, nullptr);		UnknownArity, nullptr);
return;		return;
}		}

DependentTemplateName *Dependent = Template.getAsDependentTemplateName();		DependentTemplateName *Dependent = Template.getAsDependentTemplateName();
assert(Dependent && "Unknown template name kind?");		assert(Dependent && "Unknown template name kind?");
if (NestedNameSpecifier *Qualifier = Dependent->getQualifier())		if (NestedNameSpecifier *Qualifier = Dependent->getQualifier())
manglePrefix(Qualifier);		manglePrefix(Qualifier);
▲ Show 20 Lines • Show All 855 Lines • ▼ Show 20 Lines	StringRef CXXNameMangler::getCallingConvQualifierName(CallingConv CC) {
case CC_AAPCS:		case CC_AAPCS:
case CC_AAPCS_VFP:		case CC_AAPCS_VFP:
case CC_AArch64VectorCall:		case CC_AArch64VectorCall:
case CC_IntelOclBicc:		case CC_IntelOclBicc:
case CC_SpirFunction:		case CC_SpirFunction:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
case CC_PreserveMost:		case CC_PreserveMost:
case CC_PreserveAll:		case CC_PreserveAll:
		case CC_DeviceStub:
// FIXME: we should be mangling all of the above.		// FIXME: we should be mangling all of the above.
return "";		return "";

case CC_X86ThisCall:		case CC_X86ThisCall:
// FIXME: To match mingw GCC, thiscall should only be mangled in when it is		// FIXME: To match mingw GCC, thiscall should only be mangled in when it is
// used explicitly. At this point, we don't have that much information in		// used explicitly. At this point, we don't have that much information in
// the AST, since clang tends to bake the convention into the canonical		// the AST, since clang tends to bake the convention into the canonical
// function type. thiscall only rarely used explicitly, so don't mangle it		// function type. thiscall only rarely used explicitly, so don't mangle it
▲ Show 20 Lines • Show All 2,186 Lines • ▼ Show 20 Lines
/// and this routine will return false. In this case, the caller should just		/// and this routine will return false. In this case, the caller should just
/// emit the identifier of the declaration (\c D->getIdentifier()) as its		/// emit the identifier of the declaration (\c D->getIdentifier()) as its
/// name.		/// name.
void ItaniumMangleContextImpl::mangleCXXName(const NamedDecl *D,		void ItaniumMangleContextImpl::mangleCXXName(const NamedDecl *D,
raw_ostream &Out) {		raw_ostream &Out) {
assert((isa<FunctionDecl>(D) \|\| isa<VarDecl>(D)) &&		assert((isa<FunctionDecl>(D) \|\| isa<VarDecl>(D)) &&
"Invalid mangleName() call, argument is not a variable or function!");		"Invalid mangleName() call, argument is not a variable or function!");
assert(!isa<CXXConstructorDecl>(D) && !isa<CXXDestructorDecl>(D) &&		assert(!isa<CXXConstructorDecl>(D) && !isa<CXXDestructorDecl>(D) &&
"Invalid mangleName() call on 'structor decl!");		"Invalid mangleName() call on 'structor decl!");
		rjmccallUnsubmitted Done Reply Inline Actions The second assertion can just be removed now, since the GD should be carrying the right information. rjmccall: The second assertion can just be removed now, since the GD should be carrying the right…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions removed yaxunl: removed

PrettyStackTraceDecl CrashInfo(D, SourceLocation(),		PrettyStackTraceDecl CrashInfo(D, SourceLocation(),
getASTContext().getSourceManager(),		getASTContext().getSourceManager(),
"Mangling declaration");		"Mangling declaration");

CXXNameMangler Mangler(*this, Out, D);		CXXNameMangler Mangler(*this, Out, D);
Mangler.mangle(D);		Mangler.mangle(D);
}		}
▲ Show 20 Lines • Show All 214 Lines • Show Last 20 Lines

clang/lib/AST/Type.cpp

Show First 20 Lines • Show All 2,941 Lines • ▼ Show 20 Lines	StringRef FunctionType::getNameForCallConv(CallingConv CC) {
case CC_AAPCS_VFP: return "aapcs-vfp";		case CC_AAPCS_VFP: return "aapcs-vfp";
case CC_AArch64VectorCall: return "aarch64_vector_pcs";		case CC_AArch64VectorCall: return "aarch64_vector_pcs";
case CC_IntelOclBicc: return "intel_ocl_bicc";		case CC_IntelOclBicc: return "intel_ocl_bicc";
case CC_SpirFunction: return "spir_function";		case CC_SpirFunction: return "spir_function";
case CC_OpenCLKernel: return "opencl_kernel";		case CC_OpenCLKernel: return "opencl_kernel";
case CC_Swift: return "swiftcall";		case CC_Swift: return "swiftcall";
case CC_PreserveMost: return "preserve_most";		case CC_PreserveMost: return "preserve_most";
case CC_PreserveAll: return "preserve_all";		case CC_PreserveAll: return "preserve_all";
		case CC_DeviceStub:
		return "device_stub";
}		}

llvm_unreachable("Invalid calling convention.");		llvm_unreachable("Invalid calling convention.");
}		}

FunctionProtoType::FunctionProtoType(QualType result, ArrayRef<QualType> params,		FunctionProtoType::FunctionProtoType(QualType result, ArrayRef<QualType> params,
QualType canonical,		QualType canonical,
const ExtProtoInfo &epi)		const ExtProtoInfo &epi)
▲ Show 20 Lines • Show All 1,198 Lines • Show Last 20 Lines

clang/lib/AST/TypePrinter.cpp

Show First 20 Lines • Show All 887 Lines • ▼ Show 20 Lines	if (!InsideCCAttribute) {
case CC_X86_64SysV:		case CC_X86_64SysV:
OS << " __attribute__((sysv_abi))";		OS << " __attribute__((sysv_abi))";
break;		break;
case CC_X86RegCall:		case CC_X86RegCall:
OS << " __attribute__((regcall))";		OS << " __attribute__((regcall))";
break;		break;
case CC_SpirFunction:		case CC_SpirFunction:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
		case CC_DeviceStub:
// Do nothing. These CCs are not available as attributes.		// Do nothing. These CCs are not available as attributes.
break;		break;
case CC_Swift:		case CC_Swift:
OS << " __attribute__((swiftcall))";		OS << " __attribute__((swiftcall))";
break;		break;
case CC_PreserveMost:		case CC_PreserveMost:
OS << " __attribute__((preserve_most))";		OS << " __attribute__((preserve_most))";
break;		break;
▲ Show 20 Lines • Show All 1,006 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/X86.h

Show First 20 Lines • Show All 307 Lines • ▼ Show 20 Lines	CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {
case CC_X86VectorCall:		case CC_X86VectorCall:
case CC_X86RegCall:		case CC_X86RegCall:
case CC_C:		case CC_C:
case CC_PreserveMost:		case CC_PreserveMost:
case CC_Swift:		case CC_Swift:
case CC_X86Pascal:		case CC_X86Pascal:
case CC_IntelOclBicc:		case CC_IntelOclBicc:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
		case CC_DeviceStub:
return CCCR_OK;		return CCCR_OK;
default:		default:
return CCCR_Warning;		return CCCR_Warning;
}		}
}		}

CallingConv getDefaultCallingConv() const override {		CallingConv getDefaultCallingConv() const override {
return CC_C;		return CC_C;
▲ Show 20 Lines • Show All 330 Lines • ▼ Show 20 Lines	CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {
case CC_Swift:		case CC_Swift:
case CC_X86VectorCall:		case CC_X86VectorCall:
case CC_IntelOclBicc:		case CC_IntelOclBicc:
case CC_Win64:		case CC_Win64:
case CC_PreserveMost:		case CC_PreserveMost:
case CC_PreserveAll:		case CC_PreserveAll:
case CC_X86RegCall:		case CC_X86RegCall:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
		case CC_DeviceStub:
return CCCR_OK;		return CCCR_OK;
default:		default:
return CCCR_Warning;		return CCCR_Warning;
}		}
}		}

CallingConv getDefaultCallingConv() const override {		CallingConv getDefaultCallingConv() const override {
return CC_C;		return CC_C;
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {
case CC_X86VectorCall:		case CC_X86VectorCall:
case CC_IntelOclBicc:		case CC_IntelOclBicc:
case CC_PreserveMost:		case CC_PreserveMost:
case CC_PreserveAll:		case CC_PreserveAll:
case CC_X86_64SysV:		case CC_X86_64SysV:
case CC_Swift:		case CC_Swift:
case CC_X86RegCall:		case CC_X86RegCall:
case CC_OpenCLKernel:		case CC_OpenCLKernel:
		case CC_DeviceStub:
return CCCR_OK;		return CCCR_OK;
default:		default:
return CCCR_Warning;		return CCCR_Warning;
}		}
}		}
};		};

// x86-64 Windows Visual Studio target		// x86-64 Windows Visual Studio target
▲ Show 20 Lines • Show All 119 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCUDANV.cpp

Show First 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
// very important as the device kernel name being mangled in host-compilation		// very important as the device kernel name being mangled in host-compilation
// is used to resolve the device binaries to be executed. Inconsistent naming		// is used to resolve the device binaries to be executed. Inconsistent naming
// result in undefined behavior. Even though we cannot check that naming		// result in undefined behavior. Even though we cannot check that naming
// directly between host- and device-compilations, the host- and		// directly between host- and device-compilations, the host- and
// device-mangling in host compilation could help catching certain ones.		// device-mangling in host compilation could help catching certain ones.
assert((CGF.CGM.getContext().getAuxTargetInfo() &&		assert((CGF.CGM.getContext().getAuxTargetInfo() &&
(CGF.CGM.getContext().getAuxTargetInfo()->getCXXABI() !=		(CGF.CGM.getContext().getAuxTargetInfo()->getCXXABI() !=
CGF.CGM.getContext().getTargetInfo().getCXXABI())) \|\|		CGF.CGM.getContext().getTargetInfo().getCXXABI())) \|\|
		CGF.getLangOpts().HIP \|\|
getDeviceStubName(getDeviceSideName(CGF.CurFuncDecl)) ==		getDeviceStubName(getDeviceSideName(CGF.CurFuncDecl)) ==
CGF.CurFn->getName());		CGF.CurFn->getName());

EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});		EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),		if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
CudaFeature::CUDA_USES_NEW_LAUNCH) \|\|		CudaFeature::CUDA_USES_NEW_LAUNCH) \|\|
CGF.getLangOpts().HIPUseNewLaunchAPI)		CGF.getLangOpts().HIPUseNewLaunchAPI)
emitDeviceStubBodyNew(CGF, Args);		emitDeviceStubBodyNew(CGF, Args);
▲ Show 20 Lines • Show All 551 Lines • ▼ Show 20 Lines	llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {
} else {		} else {
DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);		DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);
}		}
DtorBuilder.CreateRetVoid();		DtorBuilder.CreateRetVoid();
return ModuleDtorFunc;		return ModuleDtorFunc;
}		}

std::string CGNVCUDARuntime::getDeviceStubName(llvm::StringRef Name) const {		std::string CGNVCUDARuntime::getDeviceStubName(llvm::StringRef Name) const {
if (!CGM.getLangOpts().HIP)		if (!CGM.getLangOpts().HIP \|\| Name.startswith("_Z"))
return Name;		return Name;
return (Name + ".stub").str();		return ("__device_stub__" + Name).str();
}		}

CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {		CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
return new CGNVCUDARuntime(CGM);		return new CGNVCUDARuntime(CGM);
}		}

clang/lib/CodeGen/CGDebugInfo.cpp

Show First 20 Lines • Show All 1,143 Lines • ▼ Show 20 Lines	llvm::DIType CGDebugInfo::CreateType(const TypedefType Ty,
// Typedefs are derived from some other type.		// Typedefs are derived from some other type.
return DBuilder.createTypedef(Underlying, Ty->getDecl()->getName(),		return DBuilder.createTypedef(Underlying, Ty->getDecl()->getName(),
getOrCreateFile(Loc), getLineNumber(Loc),		getOrCreateFile(Loc), getLineNumber(Loc),
getDeclContextDescriptor(Ty->getDecl()));		getDeclContextDescriptor(Ty->getDecl()));
}		}

static unsigned getDwarfCC(CallingConv CC) {		static unsigned getDwarfCC(CallingConv CC) {
switch (CC) {		switch (CC) {
		case CC_DeviceStub:
case CC_C:		case CC_C:
// Avoid emitting DW_AT_calling_convention if the C convention was used.		// Avoid emitting DW_AT_calling_convention if the C convention was used.
return 0;		return 0;

case CC_X86StdCall:		case CC_X86StdCall:
return llvm::dwarf::DW_CC_BORLAND_stdcall;		return llvm::dwarf::DW_CC_BORLAND_stdcall;
case CC_X86FastCall:		case CC_X86FastCall:
return llvm::dwarf::DW_CC_BORLAND_msfastcall;		return llvm::dwarf::DW_CC_BORLAND_msfastcall;
▲ Show 20 Lines • Show All 3,658 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 988 Lines • ▼ Show 20 Lines	static std::string getMangledNameImpl(const CodeGenModule &CGM, GlobalDecl GD,
MangleContext &MC = CGM.getCXXABI().getMangleContext();		MangleContext &MC = CGM.getCXXABI().getMangleContext();
if (MC.shouldMangleDeclName(ND)) {		if (MC.shouldMangleDeclName(ND)) {
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
if (const auto *D = dyn_cast<CXXConstructorDecl>(ND))		if (const auto *D = dyn_cast<CXXConstructorDecl>(ND))
MC.mangleCXXCtor(D, GD.getCtorType(), Out);		MC.mangleCXXCtor(D, GD.getCtorType(), Out);
else if (const auto *D = dyn_cast<CXXDestructorDecl>(ND))		else if (const auto *D = dyn_cast<CXXDestructorDecl>(ND))
MC.mangleCXXDtor(D, GD.getDtorType(), Out);		MC.mangleCXXDtor(D, GD.getDtorType(), Out);
else		else
MC.mangleName(ND, Out);		MC.mangleName(ND, Out);
		rjmccallUnsubmitted Done Reply Inline Actions Let's see if we can make this breakdown no longer necessary, since `MangleContext::mangleName` should be capable of doing the right thing starting straight from a GD. In fact, maybe we can remove most of the specialized mangling methods (like `mangleCXXCtor` and `mangleCXXDtor`) from `MangleContext` completely? Unrelatedly: there's an `Out` declared in the outermost scope, but a bunch of these scopes declare their own `Out`; could you just fix that while you're editing this function? rjmccall: Let's see if we can make this breakdown no longer necessary, since `MangleContext::mangleName`…
		traUnsubmitted Done Reply Inline Actions Perhaps it would make sense to split this patch into two -- one that changes mangler input to GlobalDecl and the other one dealing with HIP stubs. tra: Perhaps it would make sense to split this patch into two -- one that changes mangler input to…
		rjmccallUnsubmitted Done Reply Inline Actions Yes, that's a good idea. rjmccall: Yes, that's a good idea.
		yaxunlAuthorUnsubmitted Done Reply Inline Actions separated to https://reviews.llvm.org/D75700 yaxunl: separated to https://reviews.llvm.org/D75700
		yaxunlAuthorUnsubmitted Done Reply Inline Actions Fixed the redundant Out var. However, removing mangleCXXCtor/Dtor will incur significantly more changes. Can it be done later? Thanks. yaxunl: Fixed the redundant Out var. However, removing mangleCXXCtor/Dtor will incur significantly…
} else {		} else {
IdentifierInfo *II = ND->getIdentifier();		IdentifierInfo *II = ND->getIdentifier();
assert(II && "Attempt to mangle unnamed decl.");		assert(II && "Attempt to mangle unnamed decl.");
const auto *FD = dyn_cast<FunctionDecl>(ND);		const auto *FD = dyn_cast<FunctionDecl>(ND);

if (FD &&		if (FD &&
FD->getType()->castAs<FunctionType>()->getCallConv() == CC_X86RegCall) {		FD->getType()->castAs<FunctionType>()->getCallConv() == CC_X86RegCall) {
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
Out << "__regcall3__" << II->getName();		Out << "__regcall3__" << II->getName();
} else {		} else {
Out << II->getName();		Out << II->getName();
}		}
}		}
		rjmccallUnsubmitted Done Reply Inline Actions Is this the best way of handling this, or should `shouldMangleDeclName` return true for kernels (or at least stubs?) even in C? Honest question. rjmccall: Is this the best way of handling this, or should `shouldMangleDeclName` return true for kernels…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions This is for extern "C" kernels, which are either not mangled or with simple prefix. I tried returning true for them in shouldMangleDeclName, and they got mangled as Itanium mangling, which seems not right. yaxunl: This is for extern "C" kernels, which are either not mangled or with simple prefix. I tried…

if (const auto *FD = dyn_cast<FunctionDecl>(ND))		if (const auto *FD = dyn_cast<FunctionDecl>(ND))
if (FD->isMultiVersion() && !OmitMultiVersionMangling) {		if (FD->isMultiVersion() && !OmitMultiVersionMangling) {
switch (FD->getMultiVersionKind()) {		switch (FD->getMultiVersionKind()) {
case MultiVersionKind::CPUDispatch:		case MultiVersionKind::CPUDispatch:
case MultiVersionKind::CPUSpecific:		case MultiVersionKind::CPUSpecific:
AppendCPUSpecificCPUDispatchMangling(CGM,		AppendCPUSpecificCPUDispatchMangling(CGM,
FD->getAttr<CPUSpecificAttr>(),		FD->getAttr<CPUSpecificAttr>(),
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	StringRef CodeGenModule::getMangledName(GlobalDecl GD) {

// Keep the first result in the case of a mangling collision.		// Keep the first result in the case of a mangling collision.
const auto *ND = cast<NamedDecl>(GD.getDecl());		const auto *ND = cast<NamedDecl>(GD.getDecl());
std::string MangledName = getMangledNameImpl(*this, GD, ND);		std::string MangledName = getMangledNameImpl(*this, GD, ND);

// Adjust kernel stub mangling as we may need to be able to differentiate		// Adjust kernel stub mangling as we may need to be able to differentiate
// them from the kernel itself (e.g., for HIP).		// them from the kernel itself (e.g., for HIP).
if (auto *FD = dyn_cast<FunctionDecl>(GD.getDecl()))		if (auto *FD = dyn_cast<FunctionDecl>(GD.getDecl()))
if (!getLangOpts().CUDAIsDevice && FD->hasAttr<CUDAGlobalAttr>())		if (!getLangOpts().CUDAIsDevice && FD->hasAttr<CUDAGlobalAttr>()) {
		if (auto *TD = cast<FunctionDecl>(FD)->getPrimaryTemplate())
		FD = TD->getTemplatedDecl();
		auto OldQT = FD->getType();
		auto *OldFT = OldQT->getAs<FunctionType>();
		auto *NewFT = getContext().adjustFunctionType(
		OldFT, OldFT->getExtInfo().withCallingConv(CC_DeviceStub));
		const_cast<FunctionDecl *>(FD)->setType(QualType(NewFT, 0));
		MangledName = getMangledNameImpl(*this, GD, ND);
MangledName = getCUDARuntime().getDeviceStubName(MangledName);		MangledName = getCUDARuntime().getDeviceStubName(MangledName);
		const_cast<FunctionDecl *>(FD)->setType(OldQT);
		}

auto Result = Manglings.insert(std::make_pair(MangledName, GD));		auto Result = Manglings.insert(std::make_pair(MangledName, GD));
return MangledDeclNames[CanonicalGD] = Result.first->first();		return MangledDeclNames[CanonicalGD] = Result.first->first();
}		}

StringRef CodeGenModule::getBlockMangledName(GlobalDecl GD,		StringRef CodeGenModule::getBlockMangledName(GlobalDecl GD,
const BlockDecl *BD) {		const BlockDecl *BD) {
MangleContext &MangleCtx = getCXXABI().getMangleContext();		MangleContext &MangleCtx = getCXXABI().getMangleContext();
const Decl *D = GD.getDecl();		const Decl *D = GD.getDecl();
		traUnsubmitted Done Reply Inline Actions On one hand I like this patch variant much better than the one that changed the mangling itself. On the other hand this code appears to reply on implementation details. I.e. we're setting new name on `FD` which may or may not be the same as `ND`, but we're always passing `ND` to `getMangledNameImpl()`. Perhaps we could implement name-tweaking as another `MultiVersionKind` which we already plumb into getMangledNameImpl() and which allows changing the name for target attributes & features. tra: On one hand I like this patch variant much better than the one that changed the mangling itself.
		yaxunlAuthorUnsubmitted Done Reply Inline Actions The mangled name of an instantiated template function does not depends on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name. yaxunl: The mangled name of an instantiated template function does not depends on its own name, but on…
		traUnsubmitted Done Reply Inline Actions The mangled name of an instantiated template function does not depends on its own name, but on the template. If we do not want to depend on this implementation detail, it seems I have to clone the template and instantiate from the clone. That would be putting more effort into working around the fact that `getMangledNameImpl()` doesa not provide a good API to change the name the way you need to. That's what needs to be addressed, IMO. MultiVersion does not help us here since it only appends .postfix to mangled name. The obstacle we are facing is how to change the unmangled name. Some existing implementations append to the mangled name, but we can do other manipulations there, too. The string with the mangled name originates in `getMangledNameImpl` and we could do more than just append to it. We do not have to use the `MultiVersion` for that, either. E.g. we prepend `__regcall3__` to the names of functions with `CC_X86RegCall` calling convention. We could do something similar for the kernel stub, I wonder if we could just generate a unique name and be done with that? Hmm. Unique name probably would not do if, let's say, a kernel is defined in one TU, but we need to call it from another TU. So, whichever way we change the name of the stub, it will need to be the same everywhere. You may want to add a test verifying that launching of declaration-only kernels uses the right name. Consistency of name mangling means that we do need to include regular C++-mangled information. Which means we need to do the name tweaking deeper down. How about using calling conventions? It's been suggested in the past that a lot of shenanigans around kernel launches could/should be done as a different calling convention. One of the things affected by the calling convention is mangling and we can add prefix there: https://github.com/llvm/llvm-project/blob/master/clang/lib/AST/Mangle.cpp#L164 We could tag host-side kernel with 'kernel call' calling convention on the host side and then plumb prefixing to be done similar to `__regcall3__`. If that works that may be a useful improvement overall. For instance, we may no longer need to stash a `it's a kernel` flag among attributes and it would probably be useful for other things (e.g enforcing address space requirements for kernel pointer arguments). tra: > The mangled name of an instantiated template function does not depends on its own name, but…
		yaxunlAuthorUnsubmitted Done Reply Inline Actions will add a test for decl only kernel. At least for the current implementation I see it works. A decl of stub function with expected name is emitted and can be called. About calling conv. I've tried implementing `__global__` as a calling conv before. The issue is that it is part of type system and clang enforces type checking for that. e.g. you cannot assign it to an ordinary function pointer unless that function pointer is also declared with the same calling convention. This will cause lots of type mismatching issues. In CUDA language, `__global__` is not part of type system since it is just an attribute. We could introduce a calling conv for stub, but probably we can only use use it when we mangle the stub function. yaxunl: will add a test for decl only kernel. At least for the current implementation I see it works. A…
		traUnsubmitted Done Reply Inline Actions OK. I'm fresh out of ideas. We should add some sort of assert to make sure that the mangled name does have the prefix we intended to add. Also a TODO to figure out a better way to add a name prefix before mangling. If anyone else has other suggestions, please chime in. tra: OK. I'm fresh out of ideas. We should add some sort of assert to make sure that the mangled…

SmallString<256> Buffer;		SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
if (!D)		if (!D)
MangleCtx.mangleGlobalBlock(BD,		MangleCtx.mangleGlobalBlock(BD,
dyn_cast_or_null<VarDecl>(initializedGlobalDecl.getDecl()), Out);		dyn_cast_or_null<VarDecl>(initializedGlobalDecl.getDecl()), Out);
else if (const auto *CD = dyn_cast<CXXConstructorDecl>(D))		else if (const auto *CD = dyn_cast<CXXConstructorDecl>(D))
MangleCtx.mangleCtorBlock(CD, GD.getCtorType(), BD, Out);		MangleCtx.mangleCtorBlock(CD, GD.getCtorType(), BD, Out);
▲ Show 20 Lines • Show All 2,166 Lines • ▼ Show 20 Lines	llvm::Constant *CodeGenModule::GetAddrOfFunction(GlobalDecl GD,
// getAddrOfCXXStructor. Make sure we use the MS ABI base destructor instead		// getAddrOfCXXStructor. Make sure we use the MS ABI base destructor instead
// of the complete destructor when necessary.		// of the complete destructor when necessary.
if (const auto *DD = dyn_cast<CXXDestructorDecl>(GD.getDecl())) {		if (const auto *DD = dyn_cast<CXXDestructorDecl>(GD.getDecl())) {
if (getTarget().getCXXABI().isMicrosoft() &&		if (getTarget().getCXXABI().isMicrosoft() &&
GD.getDtorType() == Dtor_Complete &&		GD.getDtorType() == Dtor_Complete &&
DD->getParent()->getNumVBases() == 0)		DD->getParent()->getNumVBases() == 0)
GD = GlobalDecl(DD, Dtor_Base);		GD = GlobalDecl(DD, Dtor_Base);
}		}

		rjmccallUnsubmitted Done Reply Inline Actions Should this be handled in the caller, or would that make things unreasonably difficult? rjmccall: Should this be handled in the caller, or would that make things unreasonably difficult?
		yaxunlAuthorUnsubmitted Done Reply Inline Actions fixed yaxunl: fixed
StringRef MangledName = getMangledName(GD);		StringRef MangledName = getMangledName(GD);
return GetOrCreateLLVMFunction(MangledName, Ty, GD, ForVTable, DontDefer,		return GetOrCreateLLVMFunction(MangledName, Ty, GD, ForVTable, DontDefer,
/IsThunk=/false, llvm::AttributeList(),		/IsThunk=/false, llvm::AttributeList(),
IsForDefinition);		IsForDefinition);
}		}

static const FunctionDecl *		static const FunctionDecl *
GetRuntimeFunctionDecl(ASTContext &C, StringRef Name) {		GetRuntimeFunctionDecl(ASTContext &C, StringRef Name) {
▲ Show 20 Lines • Show All 2,629 Lines • Show Last 20 Lines

clang/test/CodeGenCUDA/kernel-stub-name.cu

	// RUN: echo "GPU binary would be here" > %t			// RUN: echo "GPU binary would be here" > %t

	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -x hip\			// RUN: -fcuda-include-gpubinary %t -o - -x hip\
	// RUN: \| FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=CHECK			// RUN: \| FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=CHECK

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

				extern "C" __global__ void ckernel() {}

				namespace ns {
				__global__ void nskernel() {}
				} // namespace ns

	template<class T>			template<class T>
	__global__ void kernelfunc() {}			__global__ void kernelfunc() {}

				__global__ void kernel_decl();

				// Device side kernel names

				// CHECK: @[[CKERN:[0-9]]] = {{.}} c"ckernel\00"
				// CHECK: @[[NSKERN:[0-9]]] = {{.}} c"_ZN2ns8nskernelEv\00"
				// CHECK: @[[TKERN:[0-9]]] = {{.}} c"_Z10kernelfuncIiEvv\00"

				// Non-template kernel stub functions

				// CHECK: define{{.*}}@[[CSTUB:__device_stub__ckernel]]
				// CHECK: call{{.}}@hipLaunchByPtr{{.}}@[[CSTUB]]
				// CHECK: define{{.*}}@[[NSSTUB:_ZN2ns23__device_stub__nskernelEv]]
				// CHECK: call{{.}}@hipLaunchByPtr{{.}}@[[NSSTUB]]

	// CHECK-LABEL: define{{.*}}@_Z8hostfuncv()			// CHECK-LABEL: define{{.*}}@_Z8hostfuncv()
	// CHECK: call void @[[STUB:_Z10kernelfuncIiEvv.stub]]()			// CHECK: call void @[[CSTUB]]()
	void hostfunc(void) { kernelfunc<int><<<1, 1>>>(); }			// CHECK: call void @[[NSSTUB]]()
				// CHECK: call void @[[TSTUB:_Z25__device_stub__kernelfuncIiEvv]]()
				// CHECK: call void @[[DSTUB:_Z26__device_stub__kernel_declv]]()
				void hostfunc(void) {
				ckernel<<<1, 1>>>();
				ns::nskernel<<<1, 1>>>();
				kernelfunc<int><<<1, 1>>>();
				kernel_decl<<<1, 1>>>();
				}

				// Template kernel stub functions

				// CHECK: define{{.*}}@[[TSTUB]]
				// CHECK: call{{.}}@hipLaunchByPtr{{.}}@[[TSTUB]]

	// CHECK: define{{.*}}@[[STUB]]			// CHECK: declare{{.*}}@[[DSTUB]]
	// CHECK: call{{.}}@hipLaunchByPtr{{.}}@[[STUB]]

	// CHECK-LABEL: define{{.*}}@__hip_register_globals			// CHECK-LABEL: define{{.*}}@__hip_register_globals
	// CHECK: call{{.}}@__hipRegisterFunction{{.}}@[[STUB]]			// CHECK: call{{.}}@__hipRegisterFunction{{.}}@[[CSTUB]]{{.*}}@[[CKERN]]
				// CHECK: call{{.}}@__hipRegisterFunction{{.}}@[[NSSTUB]]{{.*}}@[[NSKERN]]
				// CHECK: call{{.}}@__hipRegisterFunction{{.}}@[[TSTUB]]{{.*}}@[[TKERN]]

This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Fix device stub nameClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 231929

clang/include/clang/Basic/Specifiers.h

clang/lib/AST/ItaniumMangle.cpp

clang/lib/AST/Type.cpp

clang/lib/AST/TypePrinter.cpp

clang/lib/Basic/Targets/X86.h

clang/lib/CodeGen/CGCUDANV.cpp

clang/lib/CodeGen/CGDebugInfo.cpp

clang/lib/CodeGen/CodeGenModule.cpp

clang/test/CodeGenCUDA/kernel-stub-name.cu

[HIP] Fix device stub name
ClosedPublic