- Revise the interface to derive the stub name and simplify the assertion of it.
Is there particular reason you need to switch to this naming scheme?
One issue with this patch is that demanglers will no longer be able to deal with the name. While they do know to ignore .stub suffix, they can't deal with __device_stub_ prefix.
% c++filt __device_stub___Z10kernelfuncIiEvv __device_stub___Z10kernelfuncIiEvv % c++filt _Z10kernelfuncIiEvv.stub void kernelfunc<int>() [clone .stub]
I'm not sure I understand what exactly this assertion checks.
While the first part makes sense, I'm not sure I understand the name comparison part.
I suspect return "__device_stub__" + Name; would do. StringRef will convert to std::string and copy elision should avoid unnecessary copy.
it's requested from debugger people. they don't want to the host-side stub could match the device-side kernel function name. the previous scheme cannot prevent that.
The second is to ensure, if, under the same ABI, kernel stub name derived from device-side name mangling should be the same the sub name generated from host-side, CGF.CurFn->getName() is the mangled named from host compilation
previous assertion expression gets the same goal, if ABI is different, the stub name from device-side should match the stub name from the host-side compilation. As we add a dedicated interface to the derive stub name, we could simplify the comparison to a single one.
I understand that you want a different name for the stub. My question is why the ".stub" suffix was not sufficient and how does having a prefix instead helps? Making the name un-demangleable is undesirable, IMO. There should be a good reason to justify it.
This definitely needs a comment.
"devicestub__" + Name results in Twine, where not copy is generated. Only the final str() converts Twine into std::string involving copies. Otherwise, there's one copy from Name to std::string and another copy by std::string operator+, right?
it's based on debugger people told me, with ".stub", the debugger still could find it match the original device kernel even though it could find both of them. But, they want to match the original one only and leave the stub one intentionally unmatched.
Yeah, I understand that un-demangleable name causes lots of frustration. But, based on what I learned, CUDA generated the similar thing, e.g. __device_stub__Z15transformKernelPfiif is the stub function from cuda 10.1
Is it OK for us to mangle __device_stub __ as the nested name into the original one, says, we prepend _ZN15__device_stub__E, so that we have _ZN15__device_stub__E10kernelfuncIiEvv
$ c++filt _ZN15__device_stub__E10kernelfuncIiEvv __device_stub__(kernelfunc<int>, void, void)
Sorry, I still don't think I understand the reasons for this change. The stub and the kernel do have a different name now. I don't quite get it why the debugger can differentiate the names when they differ by prefix, but can't when they differ by suffix. It sounds like an attempt to work around a problem somewhere else.
Could you talk to the folks requesting the change and get more details on what exactly we need to do here and, more importantly, why.
I don't think it's a good idea. While it demangles to something, it's not what the demangled name should be. Stub's signature should match that of the kernel.
NVCC does a lot of things differently. It does not mean it's a good reason for us to copy *all* of their choices.
Let's figure out the underlying reasons for this change and then we can figure out about what's the right thing to do here.
But, after unmangling, debugger still could match both as they are almost identical excep the final variants, like clone. The debugger will set all locations matching that specified kernel name.
OK, so the real issue is that demangled name looks identical to debugger.
One way to deal with that is to , essentially, break mangling in compiler.
Another would be to teach debugger how to distinguish the stub from the kernel using additional information likely available to debugger (i.e. mangled name or the location of the symbol -- is it in the host binary or in the GPU binary).
I would argue that breaking mangling is not the best choice here.
I think debugger does have sufficient information to deal with this and that would be the right place to deal with the issue.
em, I did push the later as well, :(. OK, I will simplify the patch to change any functionality but move the calculation of device name into a common interface. So that, vendor could adjust that internally with minimal change. OK?
:-( Sorry about that. I realize how frustrating that can be.
Perhaps it's worth trying once more. You can argue that this change will have trouble being upstreamed without a good technical explanation why it must be done in the compiler. Perhaps they do have compelling reasons why it's hard to do in the debugger, but without specific details from their end it appears indistinguishable from a (possibly misguided) quick fix. It may help if you could get the debugger folks to chime in directly on the review.
LGTM. This is a cleaner way to provide stub name tweaks.
says .... -> (e.g. ....)
... should be the same ...
Adjust kernel stub mangling as we may need to be able to differentiate them from the kernel itself (e.g. for HIP).