This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][Docs] Document the new driver for CUDA compilation
Needs Review · Public

Authored by jhuber6 on Jul 20 2022, 11:53 AM.

Reviewers: jdoerfert, tra

Summary

This patch adds some brief documentation for RDC (relocatable device code)
support in Clang, using both the current default driver and the new driver.
Previously there was no information on whether or not this was supported
in Clang, so this should help with that.

Event Timeline

jhuber6 created this revision. Jul 20 2022, 11:53 AM
Herald added a project: Restricted Project. Jul 20 2022, 11:53 AM
Herald added subscribers: mattd, yaxunl.
jhuber6 requested review of this revision. Jul 20 2022, 11:53 AM
tra added a comment. Jul 20 2022, 2:39 PM

I think, as phrased, the section mixes two somewhat independent things -- the new driver, and RDC compilation as one of the features the new driver makes possible. I'd suggest restructuring the text along the lines of:

  • Using the new driver for CUDA/HIP/... compilation. Describe the general changes and the improvements to the way offload binaries are compiled and embedded, including downsides and caveats.
  • Using the new driver for RDC compilation. Explain what RDC is (essentially, each TU compiles to a .o, same as on the host). Describe the default (each TU produces a fully linked executable) and the current limitation (clang can compile to .o, but final linking and glue generation must be handled by external build tools). Maybe describe how to do intermediate linking to a full GPU executable when one builds a library, as we've discussed via email.
  • Describe other use cases, like LTO.

So I guess I should split this between several subsections, or maybe make a new section for it.

tra added a comment. Jul 20 2022, 2:53 PM

Considering the ton of new functionality that your changes made possible, it would be warranted, IMO.

Speaking of that, it may be worth documenting the new-driver changes in clang, where they would be visible to a more relevant audience. After all, the driver is part of clang's machinery and has little to do with LLVM. The current CompileCudaWithLLVM.rst is a bit of a historic artifact that evolved from the times when we only had the NVPTX back-end in LLVM and very little to no support for GPU offloading in clang.

I previously wrote https://clang.llvm.org/docs/OffloadingDesign.html when I first made the version for OpenMP. It's a little outdated and only deals with the internals. It may be beneficial to have some documentation geared more towards users. Any suggestions on where to put that?

tra added a comment. Jul 20 2022, 3:20 PM

My guess is that llvm/docs/CompileCudaWithLLVM.rst should be transformed into proper documentation and placed in clang/docs/CudaAndHIPSupport.rst. That may be a bit more work than just adding the bits relevant to your changes, so I'm OK with keeping them in LLVM for now.