nvlink does not support linking of cubin files archived in an archive.
This tool extracts all the cubin files in the given device specific archive
and pass them to nvlink. It is required for linking static device libraries.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
33,160 ms | x64 debian > libFuzzer.libFuzzer::fork.test |
Event Timeline
Tests. The usual bug in this area is that an archive can contain multiple files with the same name which clobber each other if extracted to the same directory. It looks to me like this implementation will fail that test.
Also, motivation? Seems this can be worked around by not putting cubin files in an archive.
Edit: this will fail to implement the archive semantics of only pulling in used files, i.e. it is implicitly -whole-archive. Suggest error unless whole-archive is requested, as then it does the right thing today while leaving us the option to implement the correct linker semantics later.
Side point: really not a fan of incorrectly implementing bits of a linker as standalone tools called from clang. Can we raise a bug report against nvlink instead of doing this?
While extracting cubin files from the archive each output file gets a new name using llvm::sys::fs::createTemporaryFile, so there won't be any clobbering.
Also, motivation? Seems this can be worked around by not putting cubin files in an archive.
It is required for linking static device libraries on nvptx (D105191). Given a fat heterogenous archive, clang-offload-bundler will create a device specific archive which will be passed to llvm-link for amdgpu and nvlink-wrapper for nvptx.
Edit: this will fail to implement the archive semantics of only pulling in used files, i.e. it is implicitly -whole-archive. Suggest error unless whole-archive is requested, as then it does the right thing today while leaving us the option to implement the correct linker semantics later.
llvm-link and nvlink-wrapper both work on whole archive semantics..
Side point: really not a fan of incorrectly implementing bits of a linker as standalone tools called from clang. Can we raise a bug report against nvlink instead of doing this?
I also don't like adding a new tool just for one additional wrapping around. But, couldn't find a feasible solution on my own. Another alternative is to extend llvm-ar to support "output directory" option. But, the suggestion in last week's multi-company meeting was to go with nvlink-wrapper instead of extending llvm-ar. Let's file a bug report for nvlink for supporting archive files and as soon as it is available we can drop this tool. Requires only one line change in Cuda.cpp.
this is the working steps in the linking script.
$clang++ -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 -O3 -DNDEBUG CMakeFiles/cxx.complex_reduction.x.dir/complex_reduction.cpp.o -o ../bin/cxx.complex_reduction.x -v clang-offload-bundler (host,device) in: complex_reduction.cpp.o out: complex_reduction-494ba8.o, complex_reduction-5aba63.cubin nvlink (device) in: complex_reduction-5aba63.cubin out: complex_reduction-b1898c.out clang-offload-wrapper (device) in: complex_reduction-b1898c.out out: cxx-a8318a.bc clang (device) in: cxx-a8318a.bc cxx-e54e6f.o ld (host, device) in: complex_reduction-494ba8.o, cxx-e54e6f.o out: executable
I'm not quite understand what this wrapper replaces and why.
"It is required for linking static device libraries on nvptx" is not explaining what is not working with existing steps and what the clang-nvlink-wrapper changes to make it work. Need elaboration.
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp | ||
---|---|---|
42 | arcive -> archive |
Discussed at the multicompany meeting today. Consensus reached is that the whole-archive/no-whole-archive distinction is unimportant - we can ship a toolchain that has whole-archive semantics and later change the default when doing more work in the linker. It's not my idea of backwards compatibility but I'm sympathetic to the argument that there's enough churn in gpu offloading that this particular point is lost in the noise.
The code itself looks basically OK to me, it extract files & passes them unchanged into nvlink along with other files. Error handling is incomplete - we detect some things going wrong, but end up returning 0 anyway. Bunch of comments inline.
It would be nicer from a unix perspective if the tool read the archives and returned a list of files to pass to nvlink, in pipeline fashion, but that'll make temporary file cleanup hazardous. Forking nvlink seems better for that reason.
Might be a nice usability feature for --help to print some stuff then invoke nvlink with --help itself. If this is ~ a perfect filter, aside from expanding archives, it could conceivably be named nvlink and used wherever nvlink is.
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp | ||
---|---|---|
28 | what's unistd used for here? can we drop this? | |
124 | should lose the commented out code | |
128 | this seems a fairly likely failure mode - perhaps we should find nvlink first, and only start writing archives members into temporary files etc after establishing that it exists | |
137 | This unconditionally writes an nvlink invocation to stderr. Not good post debugging the first implementation, programs should execute silently when successful. We could look for a debugging flag / environment variable if necessary for debugging this in the field | |
138 | there's a risk that the large number of arguments that results here exceeds the platform limitations. I don't know offhand if executeandwait handles that (e.g. by creating response files), or if nvlink understands response files | |
157 | there's a using namespace llvm above, can remove a lot of llvm:: prefixes | |
197 | does this name the file it couldn't delete? |
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp | ||
---|---|---|
19 | Right now clang-offload-bundler is only used to create an object file for the host and a cubin file for the device. clang-offload-bundler creates a device specific archive of cubin files. Such an archive is then passed to this tool to extract cubin files before passing to nvlink. Is this caused by changes in https://reviews.llvm.org/D105191? |
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp | ||
---|---|---|
19 | Yes, this patch is required for D105191 to work correctly on nvptx. Greg Rodgers presented about static device libraries in last year's LLVM-CTH Workshop. In summary, following commands are generated by clang driver to deal with heterogenous device libraries:
|
Ping. Any more suggestions/questions?
I will update requested changes/documentation for D105191 once this gets through.
clang/tools/clang-nvlink-wrapper/ClangNvlinkWrapper.cpp | ||
---|---|---|
63 | The -1 is odd here -- this argument is a Boolean for whether the buffer is text or not, and this introduced a new MSVC warning ('argument': truncation from 'int' to 'bool'). |
@sylvestre.ledru and @aaron.ballman please have a look at D109225. I have made the suggested changes.
I would like to include this wrapper in llvm-13 also because it is actually a fix required for D105191, which is a part of a feature being worked on for over a year (D81109 and D93525).
This patch has landed as 83f3782c6129e7a5df3faaf0ae576611d16a8d49 but not reflected on Phabricator
Right now clang-offload-bundler is only used to create an object file for the host and a cubin file for the device.
Then cubin files are passed to the nvlink.
This is different from what you described
Is this caused by changes in https://reviews.llvm.org/D105191?
Do you have any reading materials which documents the whole linking flow of D105191?