This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Emit symbols with kernel name in host binary
Needs RevisionPublic

Authored by yaxunl on Apr 8 2020, 9:53 AM.

Details

Reviewers
tra
rjmccall
hliao
Summary

HIP provide host API to allow C/C++ programs to
launch kernel. A C/C++ program can declare a HIP
kernel as an external function and pass it to
the kernel launching API. When linked with object
files built from HIP programs. These external functions
will resolve to symbols with the same name in HIP
programs so that kernels with the same name can be
found and launched.

This requires clang to emit symbols with the same
name as kernels in object files and use them to
identify kernels, instead of using device stub
functions to identify kernels, since device stub
function has different names than kernels.

This patch lets clang emits a void* type global
variable for each kernel in host IR, which is
called kernel handle. The kernel handle has the
same mangled name as kernel by host ABI. It is
passed to __hipRegisterFunction and kernel launching
functions for identifying kernels.

Diff Detail

Event Timeline

yaxunl created this revision.Apr 8 2020, 9:53 AM
tra added a comment.Apr 8 2020, 11:17 AM

Would not this scheme create a conflict between the device-side mangled kernel name and the handle which we emit with the same name? I recall that the distinct stub name was introduced specifically to avoid confusion between device-side kernel and the host-side stub that were visible at the same time (to debugger only?). Now we seen to re-introduce the same name only for the host-side handle instead of the host-side stub.

In D77743#1970035, @tra wrote:

Would not this scheme create a conflict between the device-side mangled kernel name and the handle which we emit with the same name? I recall that the distinct stub name was introduced specifically to avoid confusion between device-side kernel and the host-side stub that were visible at the same time (to debugger only?). Now we seen to re-introduce the same name only for the host-side handle instead of the host-side stub.

we need the stub name to be different than the kernel name because otherwise the debugger will break on the stub function when the users put a break point on the kernel.

The kernel handle is a variable. Even if it has the same name as kernel, it is OK for the debugger since the debugger does not put break point on a variable.

tra accepted this revision.Apr 8 2020, 1:54 PM

The kernel handle is a variable. Even if it has the same name as kernel, it is OK for the debugger since the debugger does not put break point on a variable.

The patch appears to apply only to generated kernels. What happens when we take address of the kernel directly?

a.hip: 
__global__ void kernel() {}

auto kernel_ref() {
  return kernel;
}

b.hip:
extern __global__ void kernel(); // access the handle var
something kernel_ref(); // returns the stub pointer?

void f() {
  auto x = kernel_ref();
  auto y = kernel(); 
  hipLaunchKernel(x,...); // x is the stub pointer. 
  hipLaunchKernel(y,...);
}

Will x and y contain the same value? For CUDA the answer would be yes as they both would contain the address of the host-side stub with the kernel's name.
In this case external reference will point to the handle variable, but I'm not sure what would kernel_ref() return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.

This revision is now accepted and ready to land.Apr 8 2020, 1:54 PM
hliao requested changes to this revision.EditedApr 9 2020, 9:22 AM
hliao added a subscriber: hliao.
In D77743#1970304, @tra wrote:

The kernel handle is a variable. Even if it has the same name as kernel, it is OK for the debugger since the debugger does not put break point on a variable.

The patch appears to apply only to generated kernels. What happens when we take address of the kernel directly?

a.hip: 
__global__ void kernel() {}

auto kernel_ref() {
  return kernel;
}

b.hip:
extern __global__ void kernel(); // access the handle var
something kernel_ref(); // returns the stub pointer?

void f() {
  auto x = kernel_ref();
  auto y = kernel(); 
  hipLaunchKernel(x,...); // x is the stub pointer. 
  hipLaunchKernel(y,...);
}

Will x and y contain the same value? For CUDA the answer would be yes as they both would contain the address of the host-side stub with the kernel's name.
In this case external reference will point to the handle variable, but I'm not sure what would kernel_ref() return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.

That's a good question. That introduces the ambiguity on the values of the same symbol (from the programmer point of view). To ensure we won't have ambiguity, we should always use that *alias* global variable for __global__ function on the host side as it will be used in the runtime API to query the device-side function.

This revision now requires changes to proceed.Apr 9 2020, 9:22 AM
yaxunl added a comment.Apr 9 2020, 9:34 AM
In D77743#1970304, @tra wrote:

The kernel handle is a variable. Even if it has the same name as kernel, it is OK for the debugger since the debugger does not put break point on a variable.

The patch appears to apply only to generated kernels. What happens when we take address of the kernel directly?

a.hip: 
__global__ void kernel() {}

auto kernel_ref() {
  return kernel;
}

b.hip:
extern __global__ void kernel(); // access the handle var
something kernel_ref(); // returns the stub pointer?

void f() {
  auto x = kernel_ref();
  auto y = kernel(); 
  hipLaunchKernel(x,...); // x is the stub pointer. 
  hipLaunchKernel(y,...);
}

Will x and y contain the same value? For CUDA the answer would be yes as they both would contain the address of the host-side stub with the kernel's name.
In this case external reference will point to the handle variable, but I'm not sure what would kernel_ref() return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.

That's a good question. That introduces the ambiguity on the values of the same symbol (from the programmer point of view). To ensure we won't have ambiguity, we should always use that *alias* global variable for __global__ function on the host side as it will be used in the runtime API to query the device-side function.

I think I need to initialize the kernel handle with the address of the stub function. Any reference to the kernel in host code will use the kernel handle instead of stub function. When the stub function is called, if it is known at compile time, it will be called directly. If it is indirectly called, I will load the stub function from the kernel handle and call it.

hliao added a comment.Apr 9 2020, 9:34 AM

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) to call into that stub function to prepare the arguments.

yaxunl added a comment.Apr 9 2020, 9:47 AM

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) to call into that stub function to prepare the arguments.

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore user/runtime does not need to know about stub function to launch a kernel.

hliao added a comment.Apr 9 2020, 9:51 AM

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) to call into that stub function to prepare the arguments.

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore user/runtime does not need to know about stub function to launch a kernel.

Since the code using hipKernelLuanch may be compiled by other compilers, we cannot force reinterpreting the use of that symbol by loading value from the symbol. For code like this

__global__ void foo();

hipKernelLaunch(foo, ...)

If it's compiled by other compiler, foo refers to the value of that symbol, i.e. a constant, instead of the value loading from that symbol. They are different.

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) to call into that stub function to prepare the arguments.

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore user/runtime does not need to know about stub function to launch a kernel.

Since the code using hipKernelLuanch may be compiled by other compilers, we cannot force reinterpreting the use of that symbol by loading value from the symbol. For code like this

__global__ void foo();

hipKernelLaunch(foo, ...)

If it's compiled by other compiler, foo refers to the value of that symbol, i.e. a constant, instead of the value loading from that symbol. They are different.

Right. This will work. We don't need user to load from foo, because foo will resolve to kernel handle instead of stub function.

hliao added a comment.Apr 9 2020, 10:36 AM

The ambiguity issue is still there. That __global__ function generates different code if it's compiled as HIP by clang or non-HIP code by clang or other compilers. That will break the resolving from the symbol value to its device kernel name.

Is the renaming just being done to avoid breakpoints from triggering in the stub? Can you not disable debugging the stub using whatever mechanism __attribute__((nodebug)) uses?

Is the renaming just being done to avoid breakpoints from triggering in the stub? Can you not disable debugging the stub using whatever mechanism __attribute__((nodebug)) uses?

I tried it. The source info and line number is gone, but gdb will still break on the function since symbol is still there.

Ah, too bad. Is there any way to suppress that in debug info? I'm not sure there's any other way to satisfy the competing requirements here, and if it's not going to be consistent, it'd be better to avoid the complexity of mangling the thunk differently.

t-tye added a subscriber: t-tye.Apr 22 2020, 5:36 PM