This is an archive of the discontinued LLVM Phabricator instance.

[libc] Change GPU startup and loader to use multiple kernels
ClosedPublic

Authored by jhuber6 on May 1 2023, 6:09 AM.

Details

Summary

The GPU has a different execution model to standard _start
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly intitialize and call the constructors we
want single threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it should be easier to
simply put the portions of the code that must be single threaded in
separate kernels and then call those with only one thread. Generally,
mixing global state between kernel launches makes optimizations more
difficult, similarly to calling a function outside of the TU, but for
testing it is better to be correct.

Depends on D149527 D148943

Diff Detail

Event Timeline

jhuber6 created this revision.May 1 2023, 6:09 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 1 2023, 6:09 AM
jhuber6 requested review of this revision.May 1 2023, 6:09 AM
jhuber6 updated this revision to Diff 518428.May 1 2023, 6:19 AM

Leftover debug

Splitting into three kernels seems better. These implementations are very heavily copy&pasted between amdgpu and nvptx though. A bunch of stuff is inherently target specific, notably the kernel launch machinery, but things like the kernel argument structs are very easily factored into a header, and I think start.cpp are ~ a hundred lines of fairly subtle code that is identical on nvptx&amdgpu except for the spelling of kernel calling convention. I think that would be worth cleaning up - it's likely to be quicker to deduplicate than to review the duplication

libc/startup/gpu/nvptx/start.cpp
71

Is there a missing call to libc::finalize() here?

Splitting into three kernels seems better. These implementations are very heavily copy&pasted between amdgpu and nvptx though. A bunch of stuff is inherently target specific, notably the kernel launch machinery, but things like the kernel argument structs are very easily factored into a header, and I think start.cpp are ~ a hundred lines of fairly subtle code that is identical on nvptx&amdgpu except for the spelling of kernel calling convention. I think that would be worth cleaning up - it's likely to be quicker to deduplicate than to review the duplication

The structs should definitely be common, you're right. The startup code itself I think should remain separate in different directories, it's how the other libc targets do it just for clarity of implementation.

libc/startup/gpu/nvptx/start.cpp
71

That was only necessary for the weird "global barrier" hack I implemented. exit should be sufficient.

jhuber6 updated this revision to Diff 519327.May 3 2023, 6:48 PM

Merging structs

JonChesterfield accepted this revision.May 4 2023, 7:20 AM

Thanks for moving the structs. I think the copy&paste startup code is likely to be bugprone but as that's relatively likely to be your maintenance burden I'll accept.

This revision is now accepted and ready to land.May 4 2023, 7:20 AM
This revision was landed with ongoing or failed builds.May 4 2023, 5:32 PM
This revision was automatically updated to reflect the committed changes.