This is an archive of the discontinued LLVM Phabricator instance.

[libc] Support global constructors and destructors on NVPTX
ClosedPublic

Authored by jhuber6 on Apr 29 2023, 12:38 PM.

Details

Summary

This patch adds the necessary hacks to support global constructors and
destructors. The process is incredibly hacky, primarily because Nvidia
provides no binary utilities and very little linker support. We first had to
emit references to these functions and their priorities in D149451. Here we
dig them out of the module once it is loaded and manually build the list that
the linker should have created for us. The patch also contains a few
Nvidia-specific hacks, but it passes the test, albeit with a stack-size
warning from ptxas for the callback; that should be fine given the resource
usage of a typical test.
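
As a rough host-side sketch of that idea (not the actual implementation; the
"__init_array_" prefix, the priority encoding, and the helper name below are
hypothetical placeholders for whatever scheme the patch really uses), the
loader could use LLVM's object reader to collect the constructor symbols that
D149451 makes the compiler emit and sort them by priority before invoking them:

  // Hypothetical sketch: parse the image, gather constructor symbols by a
  // made-up "__init_array_" naming scheme, and order them by priority. A real
  // loader would then resolve each name in the loaded CUmodule (e.g. with
  // cuModuleGetGlobal) and run the constructors in this order.
  #include "llvm/Object/ObjectFile.h"
  #include "llvm/Support/Error.h"
  #include "llvm/Support/MemoryBuffer.h"

  #include <algorithm>
  #include <string>
  #include <utility>
  #include <vector>

  using namespace llvm;

  static std::vector<std::string> collect_ctor_symbols(StringRef image_path) {
    std::vector<std::pair<int, std::string>> ctors;

    auto buffer = MemoryBuffer::getFile(image_path);
    if (!buffer)
      return {};
    auto obj = object::ObjectFile::createObjectFile((*buffer)->getMemBufferRef());
    if (!obj) {
      consumeError(obj.takeError());
      return {};
    }

    for (const object::SymbolRef &sym : (*obj)->symbols()) {
      Expected<StringRef> name = sym.getName();
      if (!name) {
        consumeError(name.takeError());
        continue;
      }
      if (!name->starts_with("__init_array_"))
        continue;
      // Assume the priority is encoded after the final '_' in the name.
      int priority = 0;
      name->rsplit('_').second.getAsInteger(10, priority);
      ctors.emplace_back(priority, name->str());
    }

    // Lower priorities run first, mirroring a linker-built .init_array.
    std::sort(ctors.begin(), ctors.end());

    std::vector<std::string> ordered;
    for (auto &entry : ctors)
      ordered.push_back(entry.second);
    return ordered;
  }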

This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.

Depends on D149451

Diff Detail

Event Timeline

jhuber6 created this revision. · Apr 29 2023, 12:38 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript · Apr 29 2023, 12:38 PM
jhuber6 requested review of this revision. · Apr 29 2023, 12:38 PM
tra added a subscriber: MaskRay. · May 1 2023, 11:11 AM
tra added inline comments.
libc/startup/gpu/nvptx/start.cpp
23

The comment is somewhat puzzling.

The sections themselves would be created by whatever generates the object files, before the linker gets involved.
IIRC from our exchange on Discourse, the actual problem was that nvlink discards sections it's not familiar with; that's why we can't just put the initializers into the known init/fini sections and instead have to place them among regular data and use explicit symbols to find them.
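
For illustration only (the actual emission happens in the compiler, per D149451, and the symbol name here is made up), the workaround described above amounts to hand-writing the entry the linker would otherwise synthesize, as ordinary data with a well-known name:

  // Illustrative sketch: instead of a .init_array entry, which nvlink would
  // discard, the constructor's address lives in an ordinary global whose
  // (hypothetical) name the loader can resolve by symbol, e.g. with
  // cuModuleGetGlobal, and then call through.
  extern "C" void some_ctor();

  extern "C" [[gnu::used]] void (*__nvptx_ctor_some_ctor)() = &some_ctor;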

libc/test/IntegrationTest/test.cpp
79

What exactly is the 'toolchain' in this context?

79

What exactly needs this symbol?

I'm surprised we need to care about DSOs on NVPTX as we do not have any there.

Googling around (https://stackoverflow.com/questions/34308720/where-is-dso-handle-defined) suggests that we may avoid the issue by compiling with -fno-use-cxa-atexit.

@MaskRay -- any suggestions on what's the right way to deal with this?
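
For context, this is generic Itanium C++ ABI behavior rather than anything specific to this patch: with the default -fuse-cxa-atexit the compiler registers a global's destructor through __cxa_atexit, which is what pulls in __dso_handle, while -fno-use-cxa-atexit falls back to plain atexit. A rough sketch of the two lowerings for a global with a nontrivial destructor:

  #include <cstdlib>

  // Declarations normally provided by the C++ runtime / startup code.
  extern "C" int __cxa_atexit(void (*fn)(void *), void *arg, void *dso_handle);
  extern "C" void *__dso_handle;

  struct Foo {
    ~Foo() {}
  };
  Foo global_foo;

  static void destroy_global_foo(void *obj) { static_cast<Foo *>(obj)->~Foo(); }
  static void destroy_global_foo_noarg() { global_foo.~Foo(); }

  // Default (-fuse-cxa-atexit): the compiler emits roughly this registration,
  // tying the destructor to the DSO and thus referencing __dso_handle.
  static void register_with_cxa_atexit() {
    __cxa_atexit(destroy_global_foo, &global_foo, &__dso_handle);
  }

  // With -fno-use-cxa-atexit: a plain atexit callback, no __dso_handle needed.
  static void register_with_atexit() { std::atexit(destroy_global_foo_noarg); }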

libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

Can you check how long the clean build of the tool with -j 6 would take now? If it's in the ballpark of a minute or so, we can probably live with that. Otherwise we should build the tool along with clang/LLVM, similar to how we deal with libc-hdrgen.

jhuber6 added inline comments. · May 1 2023, 11:23 AM
libc/startup/gpu/nvptx/start.cpp
23

Sorry, I meant to say "symbols" here. Normally, when the linker finds a .fini_array or .init_array section it provides these symbol names so you can traverse the section. This is the behavior of ld.lld, which is what AMD uses. Nvidia neither provides a way to put these in the .init_array section, nor does nvlink create these symbols if you force them to exist. The latter could potentially be solved by reimplementing nvlink in lld; the former is a more difficult problem. Maybe there's a way to hack around this in the PTX Compiler API.
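
For reference, a minimal sketch of the conventional pattern being described here, assuming the usual GNU/LLVM bound-symbol names that ld.lld defines when it sees a .init_array section:

  #include <stdint.h>

  // Defined by a linker like ld.lld to bound the .init_array section; nvlink
  // neither builds the section nor defines these symbols.
  extern "C" uintptr_t __init_array_start[];
  extern "C" uintptr_t __init_array_end[];

  // Startup code then walks the array and calls each constructor in order.
  static void call_init_array_callbacks() {
    for (uintptr_t *ptr = __init_array_start; ptr != __init_array_end; ++ptr)
      reinterpret_cast<void (*)()>(*ptr)();
  }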

libc/test/IntegrationTest/test.cpp
79

I mean this in the sense of a compilation targeting nvptx64 directly, e.g. clang++ test.cpp --target=nvptx64-nvidia-cuda -march=sm_70 -c

79

That might work for these hermetic tests, since we should provide the base atexit. @sivachandra, do you think we could use this? It seems to be supported by both Clang and GCC.

libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

Does this assume we need to rebuild the libraries? I figured these would be copied somewhere into the build environment. On my machine, rebuilding the loader takes about five seconds.

sivachandra added inline comments. · May 1 2023, 11:37 AM
libc/test/IntegrationTest/test.cpp
79

You can choose to build for the GPUs with -fno-use-cxa-atexit.

tra added inline comments. · May 1 2023, 11:43 AM
libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

We only copy the clang installation directory to the GPU machines, so when you build the tests, the build directory itself will be empty and the libraries will have to be rebuilt.

On my machine rebuilding the loader takes about five seconds.

Is that a clean build? Can you check what ninja clean; ninja -nv nvptx_loader | wc -l shows?

jhuber6 added inline comments. · May 1 2023, 11:55 AM
libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

When run in the runtimes/runtimes-bins directory:

[1/1] Cleaning all built files...
Cleaning... 509 files.
3
jhuber6 updated this revision to Diff 518521. · May 1 2023, 12:07 PM

Changing NVPTX to use -fno-use-cxa-atexit.

tra added inline comments. · May 1 2023, 12:59 PM
libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

Rebuilding it on a cloud machine with 6 cores may take too long.
A clean, un-ccached rebuild of LLVMSupport and LLVMObject on 6 real cores took ~2.5 minutes. On the build bots it will probably take about twice that (the cloud counts hyperthreads as cores, IIUIC, and the GPU bots have 6 of them, so only 3 physical cores), which would almost double the wall time of each test run, currently between 5 and 8 minutes.

We could live with it short-term, but I think we do need to move nvptx_loader into the main clang build and allow libc tests to be configured to use it as an externally provided tool.

jhuber6 added inline comments. · May 1 2023, 1:02 PM
libc/utils/gpu/loader/nvptx/CMakeLists.txt
12–13

Yeah, I can definitely do that in a follow-up patch. I think I would have needed to do this anyway, because in the future we're going to want to export an RPC server library that OpenMP can use to implement things like printf in libomptarget. To do that, we'd need to build this first, during the projects build.

Glad you could at least hack around this. If NVIDIA ever fixes their tools, we can get rid of this stuff.

@tra, do you think this is good to go, with a follow-up to avoid the compile-time increase and provide more flexibility?

tra accepted this revision. · May 1 2023, 1:29 PM

I'm OK with fixing the launcher build in the follow-up.

This revision is now accepted and ready to land. · May 1 2023, 1:29 PM
This revision was landed with ongoing or failed builds. · May 4 2023, 5:13 AM
This revision was automatically updated to reflect the committed changes.