diff --git a/llvm/docs/CompileCudaWithLLVM.rst b/llvm/docs/CompileCudaWithLLVM.rst
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -134,6 +134,32 @@
 
 This is implied by ``-ffast-math``.
 
+RDC-mode compilation
+--------------------
+
+Relocatable device code (RDC) allows CUDA device symbols to be linked across
+multiple CUDA files. By default, clang does not support RDC-mode compilation,
+instead relying on `external tools
+<https://cmake.org/cmake/help/latest/prop_tgt/CUDA_SEPARABLE_COMPILATION.html>`_
+to perform the steps needed to link device code. However, experimental driver
+support has been available in clang since LLVM 15.0.
+
+The experimental driver can be invoked using the ``--offload-new-driver`` flag.
+This enables support for RDC-mode compilation via ``-fgpu-rdc``, device LTO via
+``-foffload-lto``, and static libraries containing device code. Compiling CUDA
+code using the experimental driver should cause RDC-mode compilation to behave
+like compilation on the host. An example is shown below.
+
+.. code-block:: console
+
+  $ clang++ foo.cu bar.cu --offload-new-driver -fgpu-rdc --offload-arch= -c
+  $ clang++ foo.o bar.o --offload-link -L/ \
+      -lcudart_static -ldl -lrt -pthread
+
+Note that this new driver is not ABI-compatible with binaries created by nvcc
+and is only supported on Unix machines. Additionally, the experimental driver
+does not emit PTX, so incompatible device images cannot be JIT-compiled.
+
 Standard library support
 ========================