This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Add NVPTXCtorDtorLoweringPass to handle global ctors / dtors
ClosedPublic

Authored by jhuber6 on Apr 28 2023, 8:13 AM.

Details

Summary

This patch mostly adapts the existing AMDGPUCtorDtorLoweringPass for use
by the Nvidia backend. This pass transforms the ctor / dtor list into a
kernel call that can be used to invoke those functinos. Furthermore, we
emit globals such that the names and addresses of these constructor
functions can be found by the driver. Unfortunately, since NVPTX has no
way to emit variables at a named section, nor a functioning linker to
provide the begin / end symbols, we need to mangle these names and have
an external application find them.

This work is related to the work in D149398 and D149340.

Diff Detail

Event Timeline

jhuber6 created this revision.Apr 28 2023, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2023, 8:13 AM
jhuber6 requested review of this revision.Apr 28 2023, 8:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2023, 8:13 AM

I'm not sure if we should provide this as the default approach. The use-case I have planned is very niche. An alternative is to also do what AMDGPU does and emit a kernel that can be called. I eschewed that mainly because we would get naming conflicts, AMDGPU tends to ignore those because of the persistent LTO mode assumptions.

jhuber6 updated this revision to Diff 517930.Apr 28 2023, 8:14 AM

Clean up leftover function.

tra added a comment.Apr 28 2023, 11:08 AM

The standard CUDA runtime has no mechanisms for dealing with dynamic initializers, so enabling this pass in NVPTX back-end by default is not the right thing to do.

We may need to keep this pass behind some sort of flag, or add the it for stand-alone compilation mode only.

An alternative is to also do what AMDGPU does and emit a kernel that can be called. I eschewed that mainly because we would get naming conflicts,

CUDA front-end uses --cuid for somewhat similar purpose of disambiguating TU-specific objects. We could pass it as an option to the pass to give each TU a unique ID and avoid naming conflicts.

The standard CUDA runtime has no mechanisms for dealing with dynamic initializers, so enabling this pass in NVPTX back-end by default is not the right thing to do.

We may need to keep this pass behind some sort of flag, or add the it for stand-alone compilation mode only.

Yeah that's what I was thinking. We could make clang emit that flag by default if there's no host ToolChain. We could attempt to make this more of a runtime specific thing. E.g. we could emit a kernel and then have some code that tries to register it in CUDA CodeGen. Probably not worth it though. This is primarily just my desire to have my weird standalone implementation functional.

An alternative is to also do what AMDGPU does and emit a kernel that can be called. I eschewed that mainly because we would get naming conflicts,

CUDA front-end uses --cuid for somewhat similar purpose of disambiguating TU-specific objects. We could pass it as an option to the pass to give each TU a unique ID and avoid naming conflicts.

It would be simpler in this case since we'd be looking things up at a prefix if we wished to fish them out. I'm sure we could use some psuedo-hash to make this work fine.

I'm still very unhappy that you can't emit sections in ptxas. maybe we need to make a ptxas wrapper that compiles it in debug mode and regular mode then objcopies the section from the debug mode one into the regular one. I'm sure nothing could go wrong there :).

tra added a comment.Apr 28 2023, 11:32 AM

I'm still very unhappy that you can't emit sections in ptxas. maybe we need to make a ptxas wrapper that compiles it in debug mode and regular mode then objcopies the section from the debug mode one into the regular one. I'm sure nothing could go wrong there :).

Considering the already-known quirkiness of ELF handling by other CUDA tools, I would not hold my breath for objcopy keeping CUDA tools/runtime/driver happy, either.

That's another point for growing our own ability to deal with NVIDIA GPU binaries in LLD and other LLVM's binary utilities. Everything else will require playing this endless game of whack-a-mole chasing version-dependent quirks in NVIDIA's tools.

I'm still very unhappy that you can't emit sections in ptxas. maybe we need to make a ptxas wrapper that compiles it in debug mode and regular mode then objcopies the section from the debug mode one into the regular one. I'm sure nothing could go wrong there :).

Considering the already-known quirkiness of ELF handling by other CUDA tools, I would not hold my breath for objcopy keeping CUDA tools/runtime/driver happy, either.

That's another point for growing our own ability to deal with NVIDIA GPU binaries in LLD and other LLVM's binary utilities. Everything else will require playing this endless game of whack-a-mole chasing version-dependent quirks in NVIDIA's tools.

Yes, I remember when putting a SHT_NOTE section in caused it to no longer loader the module. I have no clue why they seemed to have implemented the ELF target so poorly. But what can you do.

jhuber6 updated this revision to Diff 518262.Apr 29 2023, 6:01 PM

Update to only enable this when in "freestanding" mode. Also add a hash based on the module name to the global.

Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2023, 6:01 PM
tra accepted this revision.May 1 2023, 1:27 PM

LGTM overall.

llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
59

Source file name may be a little bit better, though it's still easy to clash if someone does cd A; clang ./foo.c; cd ../B; clang ./foo.c and the file name uses relative paths.

I think we'll need a way to override this unique suffix explicitly as an escape hatch for cases where someone runs into a clash.

This revision is now accepted and ready to land.May 1 2023, 1:27 PM
jhuber6 added inline comments.May 1 2023, 1:30 PM
llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
59

I figured it'd be good enough since this is admittedly *very* niche. So someone would need to have a file called foo.c that also had a constructor called foo in it. For it to clash. Isn't it too late to grab the source filename while we're in the backend lowering stage?

tra added inline comments.May 1 2023, 1:39 PM
llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
59

someone would need to have a file called foo.c that also had a constructor called foo in it

Unlikely != impossible. It's a trade-off between the hassle of implementing the plan B and the hassle of debugging and working around the clash for someone who runs into this.
On one hand that's indeed unlikely to happen, but, given enough exposure, someone/somewhere will run into it and they will likely be ill-equipped to even tell what's going on.
In general, compiler options are logistically much easier to deal with compared to having to change the source code.

Isn't it too late to grab the source filename while we're in the backend lowering stage?

The module already has the file name recorded and available via llvm::Module::getSourceFileName(), so it's as easy to get as the module name.

jhuber6 updated this revision to Diff 518553.May 1 2023, 1:42 PM

Changing to use source filename.

tra added inline comments.May 1 2023, 4:01 PM
llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
59

On the second thought, do you think we'll ever end up running this pass with a module created purely in memory w/o having a source file name. Or, perhaps even without the module name either?

Even the hash of the IR itself will not be sufficient. Users are allowed to compile and link completely identical TUs as long as they don't have conflicting names. I can imagine some sort of "plugin" module with only private symbols, but which has initializers to register stuff on startup. Two identical instances of such a module should be able to work, but they would end up with identical hash in this scheme. I do not see any way to automatically disambiguate them, short of using random numbers, but that would make compilation results unstable.

I still think we need to be able to provide the uniquifier manually via an option.

jhuber6 added inline comments.May 1 2023, 4:02 PM
llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
59

Yeah, I'm assuming they would just get a name conflict in that case. We can definitely add a special option that just adds some noise.

jhuber6 updated this revision to Diff 518604.May 1 2023, 4:58 PM

Add option to allow overriding the global hash.

tra accepted this revision.May 2 2023, 11:46 AM

LGTM.

llvm/lib/Target/NVPTX/NVPTXCtorDtorLowering.cpp
32

We're not overriding the name, but rather the unique suffix for the names we generate.
Perhaps rephrase along the lines of "Override unique ID for ctor/dtor globals" ?