This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Add the interface deriving the stub name of device kernels.
ClosedPublic

Authored by hliao on Jun 14 2019, 8:24 AM.

Event Timeline

hliao created this revision.Jun 14 2019, 8:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2019, 8:24 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tra added a comment.Jun 14 2019, 9:54 AM

Is there particular reason you need to switch to this naming scheme?

One issue with this patch is that demanglers will no longer be able to deal with the name. While they do know to ignore .stub suffix, they can't deal with __device_stub_ prefix.
E.g:

% c++filt __device_stub___Z10kernelfuncIiEvv
__device_stub___Z10kernelfuncIiEvv
% c++filt _Z10kernelfuncIiEvv.stub
void kernelfunc<int>() [clone .stub]
clang/lib/CodeGen/CGCUDANV.cpp
222–226

I'm not sure I understand what exactly this assertion checks.
The condition appears to be true is host/device ABIs are different OR the name of the current function is the same as the (possibly mangled) device-side name + __device_stub_ prefix.

While the first part makes sense, I'm not sure I understand the name comparison part.
Could you tell me more and, maybe, add a comment explaining what's going on here.

789

I suspect return "__device_stub__" + Name; would do. StringRef will convert to std::string and copy elision should avoid unnecessary copy.

hliao marked an inline comment as done.Jun 14 2019, 10:02 AM

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

clang/lib/CodeGen/CGCUDANV.cpp
222–226

The second is to ensure, if, under the same ABI, kernel stub name derived from device-side name mangling should be the same the sub name generated from host-side, CGF.CurFn->getName() is the mangled named from host compilation

hliao marked an inline comment as done.Jun 14 2019, 10:08 AM
hliao added inline comments.
clang/lib/CodeGen/CGCUDANV.cpp
222–226

previous assertion expression gets the same goal, if ABI is different, the stub name from device-side should match the stub name from the host-side compilation. As we add a dedicated interface to the derive stub name, we could simplify the comparison to a single one.
Also, we put the simple condition checking ahead (a common practice) to reduce the overhead of string comparison

tra added a comment.Jun 14 2019, 10:11 AM

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

clang/lib/CodeGen/CGCUDANV.cpp
222–226

This definitely needs a comment.

hliao marked an inline comment as done.Jun 14 2019, 11:28 AM
hliao added inline comments.
clang/lib/CodeGen/CGCUDANV.cpp
789

"devicestub__" + Name results in Twine, where not copy is generated. Only the final str() converts Twine into std::string involving copies. Otherwise, there's one copy from Name to std::string and another copy by std::string operator+, right?

In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

it's based on debugger people told me, with ".stub", the debugger still could find it match the original device kernel even though it could find both of them. But, they want to match the original one only and leave the stub one intentionally unmatched.

hliao added a comment.EditedJun 14 2019, 11:37 AM
In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

Yeah, I understand that un-demangleable name causes lots of frustration. But, based on what I learned, CUDA generated the similar thing, e.g. __device_stub__Z15transformKernelPfiif is the stub function from cuda 10.1

hliao added a comment.EditedJun 14 2019, 11:42 AM
In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

Is it OK for us to mangle __device_stub __ as the nested name into the original one, says, we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and

$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)
tra added a comment.Jun 14 2019, 2:14 PM
In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

it's based on debugger people told me, with ".stub", the debugger still could find it match the original device kernel even though it could find both of them. But, they want to match the original one only and leave the stub one intentionally unmatched.

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

tra added a comment.Jun 14 2019, 2:19 PM

Is it OK for us to mangle __device_stub __ as the nested name into the original one, says, we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv

and

$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv
__device_stub__(kernelfunc<int>, void, void)

I don't think it's a good idea. While it demangles to something, it's not what the demangled name should be. Stub's signature should match that of the kernel.

Yeah, I understand that un-demangleable name causes lots of frustration. But, based on what I learned, CUDA generated the similar thing, e.g. __device_stub__Z15transformKernelPfiif is the stub function from cuda 10.1

NVCC does a lot of things differently. It does not mean it's a good reason for us to copy *all* of their choices.
Let's figure out the underlying reasons for this change and then we can figure out about what's the right thing to do here.

hliao added a comment.Jun 14 2019, 2:19 PM
In D63335#1544311, @tra wrote:
In D63335#1543854, @tra wrote:

it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.

I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.

it's based on debugger people told me, with ".stub", the debugger still could find it match the original device kernel even though it could find both of them. But, they want to match the original one only and leave the stub one intentionally unmatched.

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But, after unmangling, debugger still could match both as they are almost identical excep the final variants, like clone. The debugger will set all locations matching that specified kernel name.

tra requested changes to this revision.Jun 14 2019, 2:28 PM

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But, after unmangling, debugger still could match both as they are almost identical excep the final variants, like clone. The debugger will set all locations matching that specified kernel name.

OK, so the real issue is that demangled name looks identical to debugger.
One way to deal with that is to , essentially, break mangling in compiler.
Another would be to teach debugger how to distinguish the stub from the kernel using additional information likely available to debugger (i.e. mangled name or the location of the symbol -- is it in the host binary or in the GPU binary).

I would argue that breaking mangling is not the best choice here.
I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.

This revision now requires changes to proceed.Jun 14 2019, 2:28 PM
hliao added a comment.Jun 14 2019, 2:33 PM
In D63335#1544320, @tra wrote:

Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.

Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.

But, after unmangling, debugger still could match both as they are almost identical excep the final variants, like clone. The debugger will set all locations matching that specified kernel name.

OK, so the real issue is that demangled name looks identical to debugger.
One way to deal with that is to , essentially, break mangling in compiler.
Another would be to teach debugger how to distinguish the stub from the kernel using additional information likely available to debugger (i.e. mangled name or the location of the symbol -- is it in the host binary or in the GPU binary).

I would argue that breaking mangling is not the best choice here.
I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.

em, I did push the later as well, :(. OK, I will simplify the patch to change any functionality but move the calculation of device name into a common interface. So that, vendor could adjust that internally with minimal change. OK?

hliao updated this revision to Diff 204856.Jun 14 2019, 2:51 PM

Just revise the interface for device kernel stubbing.

hliao retitled this revision from [HIP] Change kernel stub name again to [HIP] Add the interface deriving the stub name of device kernels..Jun 14 2019, 2:53 PM
hliao edited the summary of this revision. (Show Details)
tra added a comment.Jun 14 2019, 3:05 PM

I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.

em, I did push the later as well, :(. OK, I will simplify the patch to change any functionality but move the calculation of device name into a common interface. So that, vendor could adjust that internally with minimal change. OK?

:-( Sorry about that. I realize how frustrating that can be.

Perhaps it's worth trying once more. You can argue that this change will have trouble being upstreamed without a good technical explanation why it must be done in the compiler. Perhaps they do have compelling reasons why it's hard to do in the debugger, but without specific details from their end it appears indistinguishable from a (possibly misguided) quick fix. It may help if you could get the debugger folks to chime in directly on the review.

hliao added a comment.Jun 14 2019, 3:12 PM
In D63335#1544428, @tra wrote:

I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.

em, I did push the later as well, :(. OK, I will simplify the patch to change any functionality but move the calculation of device name into a common interface. So that, vendor could adjust that internally with minimal change. OK?

:-( Sorry about that. I realize how frustrating that can be.

Perhaps it's worth trying once more. You can argue that this change will have trouble being upstreamed without a good technical explanation why it must be done in the compiler. Perhaps they do have compelling reasons why it's hard to do in the debugger, but without specific details from their end it appears indistinguishable from a (possibly misguided) quick fix. It may help if you could get the debugger folks to chime in directly on the review.

shall we review code refactoring first, so that that change could be just a single line change. Yes, I could post that later and drag in necessary stake holders.

tra accepted this revision.Jun 14 2019, 3:18 PM

LGTM. This is a cleaner way to provide stub name tweaks.

clang/lib/CodeGen/CGCUDANV.cpp
223

says .... -> (e.g. ....)

225

... should be the same ...
or, perhaps, identical

clang/lib/CodeGen/CodeGenModule.cpp
1091

Adjust kernel stub mangling as we may need to be able to differentiate them from the kernel itself (e.g. for HIP).

This revision is now accepted and ready to land.Jun 14 2019, 3:18 PM
yaxunl accepted this revision.Jun 17 2019, 4:24 AM

LGTM. Thanks.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2019, 5:48 AM