When performing LTO, we consider ourselves to have whole-program
visibility if every input file contains LLVM bitcode. With whole-program
visibility we can create a single image and use CUDA's non-RDC mode by
not passing -c to ptxas and skipping the nvlink job. This should be
faster in some situations and also saves the time spent executing
nvlink.
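
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the linker wrapper's actual code: `InputFile`, `hasWholeProgramVisibility`, `ptxasArgs`, and `needsNvlink` are hypothetical names introduced here, and the argument list is reduced to the pieces the summary mentions.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one input; not the wrapper's real type.
struct InputFile {
  std::string Name;
  bool IsBitcode; // true if the file contains LLVM bitcode
};

// Whole-program visibility holds only if every input is LLVM bitcode.
// Note: an empty input list vacuously counts as "all bitcode" in this sketch.
bool hasWholeProgramVisibility(const std::vector<InputFile> &Inputs) {
  for (const InputFile &F : Inputs)
    if (!F.IsBitcode)
      return false;
  return true;
}

// Build a simplified ptxas command line. In RDC mode, -c makes ptxas emit a
// relocatable object that still has to be linked by nvlink. With
// whole-program visibility we omit -c so ptxas emits the final image.
std::vector<std::string> ptxasArgs(bool WholeProgram, const std::string &Ptx,
                                   const std::string &Out) {
  std::vector<std::string> Args = {"ptxas", "-o", Out};
  if (!WholeProgram)
    Args.push_back("-c");
  Args.push_back(Ptx);
  return Args;
}

// The nvlink job is only needed when ptxas produced a relocatable object.
bool needsNvlink(bool WholeProgram) { return !WholeProgram; }
```

With all-bitcode inputs, `ptxasArgs` leaves out `-c` and `needsNvlink` returns false. A single non-bitcode input falls back to the RDC path with `-c` and an nvlink step.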
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
LGTM with a minor test nit.
clang/test/Driver/linker-wrapper.c | ||
---|---|---|
41–42 | // LTO-NOT: nvlink |
// LTO-NOT: nvlink