This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Do not set offload kind in a freestanding build
Abandoned · Public

Authored by jhuber6 on Nov 28 2022, 1:50 PM.

Details

Summary

The "new" offloading driver embeds metadata alongside the compiled device
image for use by the linking phase. One piece of metadata is the "kind"
field, which determines which runtime to generate registration code for
once the image is linked. For example, if it was compiled with OpenMP, it
should be registered with the OpenMP runtime.

However, a freestanding build implies that there may not be a standard
runtime, and instead we should expect it to be handled externally. This
patch makes the -ffreestanding option set the kind to None, which results
in no registration code being generated.

This is useful for generating pure GPU libraries using the offloading
tools. For example, we can build a libc with OpenMP tooling and link it with
CUDA without requiring the OpenMP runtime to be linked.
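
To make the mechanism concrete, here is a minimal Python model of the two halves described in the summary, assuming a simplified view of the toolchain: the driver records a kind for each device image, and the linker wrapper emits registration code only for kinds other than None. `OffloadKind`, `DeviceImage`, `select_offload_kind`, and `runtimes_to_register` are hypothetical names for illustration, not LLVM's API, and the enum values are not the real encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class OffloadKind(Enum):
    # Hypothetical mirror of the embedded "kind" field; the names and
    # values of LLVM's real enumeration may differ.
    NONE = "none"
    OPENMP = "openmp"
    CUDA = "cuda"
    HIP = "hip"


@dataclass(frozen=True)
class DeviceImage:
    # Hypothetical stand-in for one embedded device image and its metadata.
    name: str
    kind: OffloadKind


def select_offload_kind(language: str, freestanding: bool) -> OffloadKind:
    """Driver side: choose the kind recorded for a device compilation.

    As proposed in this patch, -ffreestanding records NONE instead of
    the offloading language.
    """
    if freestanding:
        return OffloadKind.NONE
    return OffloadKind(language)  # e.g. "openmp" -> OffloadKind.OPENMP


def runtimes_to_register(images: Iterable[DeviceImage]) -> Set[OffloadKind]:
    """Link side: the runtimes that need generated registration code.

    Images whose kind is NONE request no runtime, so they add nothing here.
    """
    return {img.kind for img in images if img.kind is not OffloadKind.NONE}


# The libc scenario from the summary: a CUDA application linked against
# a GPU libc (libcgpu.a) that was built with OpenMP tooling.
app = DeviceImage("app.o", select_offload_kind("cuda", freestanding=False))
libc_openmp = DeviceImage("libcgpu.a", select_offload_kind("openmp", freestanding=False))
libc_none = DeviceImage("libcgpu.a", select_offload_kind("openmp", freestanding=True))

print(sorted(k.value for k in runtimes_to_register([app, libc_openmp])))  # ['cuda', 'openmp']
print(sorted(k.value for k in runtimes_to_register([app, libc_none])))    # ['cuda']
```

In this model the only difference between the two links is the recorded kind of the library image, which is the whole effect the patch aims for.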

Event Timeline

jhuber6 created this revision. Nov 28 2022, 1:50 PM
Herald added a project: Restricted Project. Nov 28 2022, 1:50 PM
Herald added a subscriber: StephenFan.
jhuber6 requested review of this revision. Nov 28 2022, 1:50 PM
Herald added a project: Restricted Project. Nov 28 2022, 1:50 PM
tra added a comment. Nov 28 2022, 3:06 PM

> we should expect it to be handled externally.

Can you give more details on how it will work? Something/somewhere would need to generate the glue code to register kernels w/ CUDA runtime.
Are you saying that we'll have some way to tell the final linking stage that all offloaded functions with unspecified offload kind must be treated as if they need runtime X, and the appropriate glue will be generated for them?

If you mean that it would need to be handled by the end user, it may be problematic as they would need to have access to the GPU binaries, the kernel names they contain and would need to further generate host-side kernel launch stubs matching the kernel names.

So this patch was made because I'm trying to create a generic libc runtime for the GPU. The easiest way to build this was to use OpenMP. However, this creates a fatbinary with the kind set to OpenMP, which means that if someone linked the built libcgpu.a into a CUDA application, the linker wrapper would think it had to create registration code for both CUDA and OpenMP. So my goal was to allow a way for us to add "Nothing" as the offloading kind. In the previous example, this would result in registering only with the CUDA runtime. I figured -ffreestanding was the best option to overload for this behaviour because it more or less implies that there is no standard runtime, in this case OpenMP.

Does this change the annotation on kernels compiled from C or C++ with -ffreestanding? If so, we probably want a test showing the change. I have a vague idea that they currently pick up a default of 'opencl', where 'none' is probably better. Or perhaps this is only a linker wrapper thing and doesn't make it as far as the HSA metadata.

It's just a linker wrapper thing, saying "This was compiled with offloading language X; please use its runtime when you link."

tra added a comment. Nov 28 2022, 4:01 PM

> So this patch was made because I'm trying to create a generic libc runtime for the GPU.

OK. That simplifies things. GPU-side non-kernel functions do not need anything from the CUDA host runtime.

> The easiest way to build this was to use OpenMP. However, this creates a fatbinary with the kind set to OpenMP, which means that if someone linked the built libcgpu.a into a CUDA application, the linker wrapper would think it had to create registration code for both CUDA and OpenMP. So my goal was to allow a way for us to add "Nothing" as the offloading kind. In the previous example, this would result in registering only with the CUDA runtime. I figured -ffreestanding was the best option to overload for this behaviour because it more or less implies that there is no standard runtime, in this case OpenMP.

OK. This makes sense in general.

However, we probably need some sort of safeguard warning users if they compile CUDA *kernels* or code with global variables in freestanding mode.

Also, considering that -ffreestanding mode will not be very useful if the user does compile something that needs a runtime, perhaps we can turn things around and always mark the GPU object as generic unless we know that it requires runtime support?
This way you don't need -ffreestanding at all. Whether the resulting GPU object is generic would be determined by what we're compiling.

Later, if/when we actually want users to provide their own runtime (as opposed to not tying the generated code to a specific runtime when we don't need to), we can implement a fully functional freestanding mode. I suspect we may need it for a while yet.
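
The alternative suggested in this comment, inferring the kind from what was actually compiled, together with the safeguard warning, can be sketched as follows. This is a hypothetical model: `DeviceModuleSummary`, `infer_offload_kind`, and `freestanding_warnings` are illustrative names, and the rule assumes, following the comments above, that only kernels and device global variables need the host runtime. Note that it requires the compiled module's contents, which is the information the driver lacks, as the reply below points out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeviceModuleSummary:
    # Hypothetical summary of what a compiled device module contains.
    kernels: List[str] = field(default_factory=list)
    global_vars: List[str] = field(default_factory=list)


def infer_offload_kind(language: str, module: DeviceModuleSummary) -> str:
    """Mark the object generic ("none") unless it needs runtime support.

    Assumption: kernels and device-side global variables must be
    registered with the host runtime; plain device functions need nothing.
    """
    if module.kernels or module.global_vars:
        return language
    return "none"


def freestanding_warnings(module: DeviceModuleSummary) -> List[str]:
    """Safeguard: list the symbols that a "none" kind would leave unregistered."""
    warnings = [f"kernel '{k}' will not be registered" for k in module.kernels]
    warnings += [f"global '{g}' will not be registered" for g in module.global_vars]
    return warnings


# A module with only device functions stays generic; one with a kernel does not.
print(infer_offload_kind("openmp", DeviceModuleSummary()))                 # none
print(infer_offload_kind("cuda", DeviceModuleSummary(kernels=["saxpy"])))  # cuda
```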

> However, we probably need some sort of safeguard warning users if they compile CUDA *kernels* or code with global variables in freestanding mode.

The behavior of -ffreestanding isn't well documented in the first place, so there could be some downsides to doing something like this internally, although it is externally visible if you use llvm-objdump --offloading on the resulting object.

> Also, considering that -ffreestanding mode will not be very useful if the user does compile something that needs a runtime, perhaps we can turn things around and always mark the GPU object as generic unless we know that it requires runtime support?
> This way you don't need -ffreestanding at all. Whether the resulting GPU object is generic would be determined by what we're compiling.

The difficulty here is that this is done at the driver level, which can't see the source code, so we wouldn't have a good way to signal back that we actually needed the runtime after compiling it.

> Later, if/when we actually want users to provide their own runtime (as opposed to not tying the generated code to a specific runtime when we don't need to), we can implement a fully functional freestanding mode. I suspect we may need it for a while yet.

We could potentially make the None kind still embed the linked image so users could write custom runtimes if they so desired.

tra added a comment. Nov 28 2022, 4:37 PM

OK. So, we do need a flag, at least for now. I think we'll need a new one, because -ffreestanding will likely do a lot of things we don't actually want on the host side.
How about --offload-generic or --[no-]offload-runtime?
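
The `--[no-]offload-runtime` notation names a positive and a negative spelling of one switch. A minimal sketch of how such a pair could be resolved, assuming last-one-wins semantics like other paired positive/negative driver flags; the spellings are only a proposal in this review, and `offload_runtime_enabled` is a hypothetical helper.

```python
from typing import Sequence


def offload_runtime_enabled(argv: Sequence[str], default: bool = True) -> bool:
    """Resolve the proposed --offload-runtime / --no-offload-runtime pair.

    Hypothetical: these flag names are a proposal from this review.
    The last occurrence on the command line wins.
    """
    enabled = default
    for arg in argv:
        if arg == "--offload-runtime":
            enabled = True
        elif arg == "--no-offload-runtime":
            enabled = False
    return enabled


print(offload_runtime_enabled(["-fopenmp", "--no-offload-runtime"]))           # False
print(offload_runtime_enabled(["--no-offload-runtime", "--offload-runtime"]))  # True
```

A dedicated pair like this would affect only the recorded kind, avoiding the host-side side effects of -ffreestanding raised above.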

I figured this was fine since we can specify it only for the device if we use -Xopenmp-target=. But I suppose we are overloading some behavior of freestanding here; I'm not exactly sure what -ffreestanding implies.
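
A simplified model of the device-only scoping mentioned here: in Clang, `-Xopenmp-target=<triple> <arg>` passes `<arg>` to the OpenMP device toolchain for `<triple>`, which is how -ffreestanding could be kept off the host compile. The parser below is a hypothetical sketch of that forwarding rule, not Clang's implementation; `split_device_args` and the input file name `libc.c` are illustrative, and it treats every other argument as shared by all compilations.

```python
from typing import Dict, List, Sequence, Tuple

PREFIX = "-Xopenmp-target="


def split_device_args(argv: Sequence[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate shared arguments from ones forwarded to one device toolchain.

    Simplified model: "-Xopenmp-target=<triple> <arg>" forwards <arg> only
    to the device compilation for <triple>; all other arguments are shared.
    """
    shared: List[str] = []
    device_only: Dict[str, List[str]] = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith(PREFIX):
            if i + 1 == len(argv):
                raise ValueError(f"{arg} expects a following argument")
            triple = arg[len(PREFIX):]
            device_only.setdefault(triple, []).append(argv[i + 1])
            i += 2
        else:
            shared.append(arg)
            i += 1
    return shared, device_only


shared, device_only = split_device_args([
    "-fopenmp", "--offload-arch=sm_70",
    "-Xopenmp-target=nvptx64-nvidia-cuda", "-ffreestanding",
    "-c", "libc.c",
])
print(shared)       # ['-fopenmp', '--offload-arch=sm_70', '-c', 'libc.c']
print(device_only)  # {'nvptx64-nvidia-cuda': ['-ffreestanding']}
```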

jhuber6 abandoned this revision. Feb 3 2023, 7:40 AM