This patch adds the hacks needed to support global constructors and
destructors. The process is incredibly hacky, primarily because Nvidia
provides no binary tools and very little linker support. We first had to
emit references to these functions and their priorities in D149451. Then
we dig them out of the module once it is loaded and manually create the
list that the linker should have made for us. This patch also contains a
few Nvidia-specific hacks, but it passes the test, albeit with a stack
size warning from ptxas for the callback. That should be fine given the
resource usage of a typical test.
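As a rough illustration of the "dig them out and rebuild the list" step, here is a minimal host-side sketch. It is not the code in this patch: the symbol naming scheme (a fixed prefix, a name, then a trailing `_<priority>`), the `InitEntry` struct, and the `parseEntry` and `collectEntries` helpers are all hypothetical, and the real loader would obtain the symbol names from the loaded module instead of a plain vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// One constructor or destructor reference discovered in the module.
struct InitEntry {
  std::string symbol; // Name of the symbol that refers to the function.
  uint16_t priority;  // Priority parsed from the end of the symbol name.
};

// Assumed (hypothetical) naming scheme: "<prefix><name>_<priority>".
// Returns the parsed entry, or nothing if the symbol does not match.
std::optional<InitEntry> parseEntry(std::string_view symbol,
                                    std::string_view prefix) {
  if (symbol.substr(0, prefix.size()) != prefix)
    return std::nullopt;

  // The priority is everything after the last underscore.
  std::size_t pos = symbol.rfind('_');
  if (pos == std::string_view::npos || pos < prefix.size() ||
      pos + 1 == symbol.size())
    return std::nullopt;

  unsigned value = 0;
  for (char c : symbol.substr(pos + 1)) {
    if (c < '0' || c > '9')
      return std::nullopt;
    value = value * 10 + static_cast<unsigned>(c - '0');
    if (value > 65535) // Priorities are assumed to fit in 16 bits.
      return std::nullopt;
  }
  return InitEntry{std::string(symbol), static_cast<uint16_t>(value)};
}

// Collects every symbol with the given prefix and orders the result by
// ascending priority, keeping the original order for equal priorities.
std::vector<InitEntry> collectEntries(const std::vector<std::string> &symbols,
                                      std::string_view prefix) {
  std::vector<InitEntry> entries;
  for (const std::string &symbol : symbols)
    if (std::optional<InitEntry> entry = parseEntry(symbol, prefix))
      entries.push_back(*entry);

  std::stable_sort(entries.begin(), entries.end(),
                   [](const InitEntry &a, const InitEntry &b) {
                     return a.priority < b.priority;
                   });
  return entries;
}
```

With a list like this, constructors would be called front to back and destructors walked in the opposite order, which mirrors the usual init/fini array convention. Resolving each symbol to a device address and writing the final array into device memory is left out here.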
This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.
Depends on D149451
The comment is somewhat puzzling.
The sections themselves would be created by whatever generates the object files, before the linker gets involved.
IIRC from our exchange on Discourse, the actual problem was that nvlink discards the sections it's not familiar with. That's why we can't just put the initializers into known init/fini sections, and instead have to place them among regular data and use explicit symbols to find them.