This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Allow the new driver to compile CUDA in non-RDC mode
ClosedPublic

Authored by jhuber6 on Jul 13 2022, 8:59 AM.

Details

Summary

The new driver primarily allows us to support RDC-mode compilations with
proper linking. This is not needed for non-RDC mode compilation, but we
still would like the new driver to be able to handle this mode so we can
transition away from the old driver in the future. This patch adds the
necessary code to support creating a fatbinary for CUDA code generation
as well as removing old assumptions and errors about RDC-mode with the
new driver.

Diff Detail

Event Timeline

jhuber6 created this revision.Jul 13 2022, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 8:59 AM
jhuber6 requested review of this revision.Jul 13 2022, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 8:59 AM
jhuber6 added inline comments.Jul 13 2022, 12:13 PM
clang/lib/Driver/ToolChains/Clang.cpp
6993–6994

Why do we add -fgpu-rdc here when it's already added on line 6322 above?

tra added a comment.Jul 13 2022, 12:36 PM

Nice.

clang/lib/Driver/ToolChains/Clang.cpp
6994

This should not be necessary -- we've already added it for all CUDA/HIP compilations above.

6998

I'm not sure I understand when we'd end up in this "else" branch.
Perhaps the whole sequence could be restructured this way:

if ((IsCuda || IsHIP) && CudaDeviceInput) { // old-style embedding
    CmdArgs.push_back("-fcuda-include-gpubinary");
    CmdArgs.push_back(CudaDeviceInput->getFilename());
} else if (!HostOffloadingInputs.empty()) { // new-driver offloading mode
  if (Cuda && !IsRDCMode) {
    assert(HostOffloadingInputs.size() == 1); // We should have only one GPU fatbinary to embed. Uses old-style embedding.
    CmdArgs.push_back("-fcuda-include-gpubinary");
    CmdArgs.push_back(HostOffloadingInputs.front().getFilename());
  } else { // new driver embedding of GPU object files.
     for (const InputInfo Input : HostOffloadingInputs)
        CmdArgs.push_back(Args.MakeArgString("-fembed-offload-object=" +
                                             TC.getInputFilename(Input)));
  }
}
jhuber6 marked an inline comment as done.Jul 13 2022, 12:53 PM

It's also worth noting that this doesn't include the PTX output for JIT in the fatbinary, it would be relatively easy to include that but I wanted to ask how we should handle that.

clang/lib/Driver/ToolChains/Clang.cpp
6994

What I figured, I think I'm just going to prune it in a separate commit to clean this up.

6998

This is clearer, I'll change it.

jhuber6 updated this revision to Diff 444398.Jul 13 2022, 1:02 PM
jhuber6 marked an inline comment as done.

Updating and making suggested changes. I removed the old fgpu-rdc in rG6abaa8e2103760025cee76528f555de7cf6698e6 since I considered it to be reviewed already from the discussion here.

tra accepted this revision.Jul 13 2022, 1:07 PM
tra added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
6982

I think more details/history would be helpful here. Unlike you and me, the future readers will have little to no context to understand why we do things three different ways here.

This revision is now accepted and ready to land.Jul 13 2022, 1:07 PM
jhuber6 marked an inline comment as done.Jul 13 2022, 1:10 PM
jhuber6 added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
6982

I'd like a future with only one way of doing it, but I'll add a comment mentioning the new and old drivers to make it clearer.

This revision was automatically updated to reflect the committed changes.
jhuber6 marked an inline comment as done.