In order to do offloading compilation we need to embed files into the
host and create fatbainaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the -fembed-offload-binary
option. However this is not very extensibile since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
clang-offload-packager that behaves similarly to CUDA's fatbinary.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/docs/ClangOffloadBinary.rst | ||
---|---|---|
8 ↗ | (On Diff #427858) | Naming nit: binary may not be the best term for what we're trying to do here. Perhaps something like package, container or collection would work better. |
42 ↗ | (On Diff #427858) | This appears to be a one-way process. How one would examine what's in the binary and unpack/extract specific component from it? |
clang/test/Frontend/embed-object.ll | ||
7 | What will happen if an openMP file compiled this way is linked with the older version of OpenMP runtime which presumably expected to see extra data in .llvm.offloading? | |
clang/tools/clang-offload-binary/ClangOffloadBinary.cpp | ||
70 ↗ | (On Diff #427858) | It would be useful to add a comment describing the 'special' keys file and kind. |
75 ↗ | (On Diff #427858) | Should kind also be required? If not, what's the default kind? |
99 ↗ | (On Diff #427858) | Nit: write is a rather misleading function name here. AFAICT, we're not actually writing anything, but rather packing the ImageBinary into a memory buffer, which we then return. |
Thanks for the comments.
clang/docs/ClangOffloadBinary.rst | ||
---|---|---|
8 ↗ | (On Diff #427858) | Yeah, I would've used bundler but that name is taken. I think I can go with clang-offload-container |
42 ↗ | (On Diff #427858) | This is done by the linker wrapper, but I think it would be good to teach llvm-objdump how to handle these. Then we could basically just treat it the same way as cuobjdump. |
clang/test/Frontend/embed-object.ll | ||
7 | I didn't change the actual data being embedded, only the method to do it. previously this command line did the work of the offload binary tool. Now it just embeds the file that the tool spits out. This test just makes sure that we run it and get the contents in the IR. | |
clang/tools/clang-offload-binary/ClangOffloadBinary.cpp | ||
70 ↗ | (On Diff #427858) | I think I'll add that to the help message. |
75 ↗ | (On Diff #427858) | The default kind is filled when we default construct the OffloadingImage below, which gives OFK_None. This has the effect of being used for linking jobs, but not emitting any registration code. |
99 ↗ | (On Diff #427858) | I read it as "write to buffer", but I can see your point. It's not related to this patch but I could see changing it. |
Changing name from clang-offload-binary to clang-offload-packager and updating some help mesages.
clang/docs/ClangOffloadBinary.rst | ||
---|---|---|
15 ↗ | (On Diff #428379) | It would help if more details are given, e.g, offset and size of members of the header and layout of the string map. |
clang/test/Frontend/embed-object.c | ||
3 | Is this due to the embedded object being empty? So now the bitcode for different targets are bundled by clang-offload-packager then embedded as one file in the relocatable object file? In the old scheme the bitcode for different targets are bundled by clang-offload-bundler then embedded in the relocatable object file, right? What's the advantage of clang-offload-packager compared with clang-offload-bundler? |
clang/docs/ClangOffloadBinary.rst | ||
---|---|---|
15 ↗ | (On Diff #428379) | I can probably add some more documentation on that, would definitely help people inspecting these. Later I intend to let llvm-objdump extract these as well. |
clang/test/Frontend/embed-object.c | ||
3 |
Yes, we used to do the binary format in Clang itself so we got the binary stuff along with the empty file. Now this flag simply embeds a file at a section, the file is empty so we get a zeroinitializer. What's important in this test is just that the option puts the contents in the IR.
Yes, this is basically like what fatbinary does for CUDA. We take all the files and put it into a single binary. The binary then contains metadata which lets us find these files later at link time.
The old clang offload bundler did some similar stuff, namely embedding multiple files into the host. It was similarly an ELF section if the target is an object file. Conceptually this only creates the actual binary that's being embedded and puts it in one big blob, this then just gets embedded directly in the IR. The benefit to this approach in my mind is that the host and device phases are more distinct, we don't need to call the clang-offload-bundler on the host files as well. I could've worked around the current clang offload bundler to make it do something similar, but I didn't see the utility when I'm doing different stuff using a different binary format. |
LGTM in principle.
Given that we're introducing a new tool dependency we may want to get a stamp from someone dealing with build and release.
@tstellar -- do we need to change anything else for the new binary to ship with clang releases?
clang/docs/ClangOffloadBinary.rst | ||
---|---|---|
42 ↗ | (On Diff #427858) | SGTM. That may be a good motivation for someone to write a proper disassembler for NVIDIA's GPU binaries. Or we could teach it to invoke nvdisasm or cuobjdump if it sees an NVIDIA ELF file. |
We did break ABI with LLVM 14 seeing as we supported -fopenmp-new-driver in the release. This used a different method of encoding which isn't compatible with this one. But since that functionality was hidden behind an opt-in experimental flag I would think it's okay to change it.
If the tools is built by default, then the test-release.sh build script will include it in the release binaries.
We now have clang-offload-bundler, clang-offload-packager, clang-offload-wrapper. Do we really need that many distinct binaries for offloading? Any chance we could combine some of those?
clang/test/Driver/linker-wrapper.c | ||
---|---|---|
5 | Since you're calling this from a test, you have to edit clang/test/CMakeLists.txt and add a dep on the new tool. |
clang/test/Driver/linker-wrapper.c | ||
---|---|---|
5 | Sorry, I missed the add_dependencies(clang clang-offload-packager) line in the new cmakelists.txt file. All good. |
Yeah, it's not ideal. I would've liked to use clang-offload-bundler for this tool but we're keeping the old ones around for backwards compatibility. The clang-offload-bundler and clang-offload-wrapper have already been merged into clang and the clang-linker-wrapper respectively, they're kept around for backwards compatibility currently. I think HIP still uses the clang-offload-bundler but I'm in the process of changing that. I'm not sure if it's possible to delete these tools once they're introduced since someone might rely on the binary.
We could add a "clang-offload-bundler and clang-offload-wrapper are deprecated, replace them with $whatever" in the release notes and then remove them a release later, assuming the replacement is straightforward.
I think it is still too early to say clang-offload-bundler is deprecated. It is used by HIP toolchain and has functionality currently not available in clang-offload-packager.
If I read the above right, jhuber says it's been merged into clang itself, not that it's being replaced by clang-offload-packager (?)
clang-offload-bundler is not merged into clang itself (https://github.com/llvm/llvm-project/blob/main/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp)
Currently, its functionality is not replaced by clang-offload-packager. I am not sure about future.
I'll clarify, the functionality of the clang-offload-bundler is to embed device files into the host. I now do this directly in clang by creating a global string in the LLVM-IR of the host rather than calling a tool. The HIP toolchain still uses the clang-offload-bundler, but I'm planning on putting patches up to move away from that. The current clang-offload-bundler and this new tool have different purposes, this one simply create a binary that can then be embedded into the host. There is still functionality that the clang-offload-bundler provides that I don't intend to replace, namely the bundling and un-bundling of text files. I don't think we want to stick with the clang-offload-bundler approach, because the files that the --clang-offload-bundler spat out weren't valid input to the rest of LLVM, e.g. clang -S -emit-llvm --offload-arch=gfx908 foo.hip -o - | opt would break.
The clang-offload-bundler textual format can be consumed by clang since clang can automatically unbundle them.
The textural format allows clang to emit one output for -E, which can be used by ccache for calculating hashes.
Another usage of clang-offload-bundler textual format is bundled assembly code. Users can modify them and use clang to assemble them.
For embedding bitcode in relocatable object files, clang-offload-packager can be a replacement for clang-offload-bundler, since this is consumed by compiler.
For HIP toolchain, clang-offload-bundler is also used to generate fatbinary files which can be loaded dynamically at run time through module API's. So far I don't think this can be replaced by clang-offload-packager in a short time, since it needs HIP runtime change.
clang/docs/ClangOffloadPackager.rst | ||
---|---|---|
31–32 | This makes the file format depend on LLVM version and potentially standard C++ library version. If it is consumed by the same version of LLVM, it may be fine. However, if it is intended for a generic file format to be consumed by generic offloading language runtimes, it is better to use C-like simple data layout which does not depend on LLVM or standard C++ library. |
clang/docs/ClangOffloadPackager.rst | ||
---|---|---|
31–32 | That format just stores the data before it's serialized to a binary format. The binary format is basically just a few headers and a string table. See https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Object/OffloadBinary.h#L91 for the real format. I didn't want to explain it all in detail here. |
clang/docs/ClangOffloadPackager.rst | ||
---|---|---|
31–32 | If the file format is intended to be consumed by generic offloading language runtimes or development tools, better to describe its layout like https://clang.llvm.org/docs/ClangOffloadBundler.html Since language runtimes or development tools may not use LLVM to load the file. The documentation serves as a spec for this binary format. Especially, it is not clear where to find the target triple and GPU arch for each image in this documentation. |
clang/docs/ClangOffloadPackager.rst | ||
---|---|---|
31–32 | Sure, I'll add some more comprehensive documentation. |
Weird, I'll try a fresh build myself and see if I can figure it out. I'm not sure if it's the problem, but I used add_clang_executable instead of add_clang_tool in the CMake, that's the only thing that stands out I can see.
I couldn't reproduce it with a clean build, but I went ahead and pushed a patch that builds it using the clang tool preset instead. if the problem persists I can look into the specific build configuration.
Add openmp to LLVM_ENABLE_PROJECTS to trigger the issue:
cmake -GNinja -Sllvm -B/tmp/out/play -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang;openmp' -DCMAKE_CXX_COMPILER=~/Stable/bin/clang++ -DCMAKE_C_COMPILER=~/Stable/bin/clang -DCMAKE_INSTALL_PREFIX=/tmp/install/play ninja -C /tmp/out/play install
clang: error: unable to execute command: Executable "clang-offload-packager" doesn't exist! clang: error: clang-offload-packager command failed with exit code 1 (use -v to see invocation) [527/5061] Building CXX object tools/clang/utils/TableGen/CMakeFiles/clang-tblgen.dir/ClangAttrEmitter.cpp.o ninja: build stopped: subcommand failed.
[Clang] Change the offload packager build to be a clang tool does not fix the issue.
This worked fine for me using a fresh build
$ cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang;openmp' -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_INSTALL_PREFIX=$HOME/clang && ninja install $ ls ~/clang/bin/clang-offload-packager /home2/3n4/clang/bin/clang-offload-packager
The above doesn't work for you right?
Sorry, it's not this patch's fault.
This is a problem of https://raw.githubusercontent.com/chromium/chromium/main/tools/clang/scripts/update.py . Its clang has picked up this change which tries to spawn clang-offload-packager but the chromium prebuilt tools don't provide clang-offload-packager.
Some openmp directories require the host tool clang-offload-packager. (@thakis)
I can use -DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=off -DLIBOMPTARGET_BUILD_AMDGPU_PLUGIN=off -DLIBOMPTARGET_BUILD_CUDA_PLUGIN=off to disable these directories.
This makes the file format depend on LLVM version and potentially standard C++ library version.
If it is consumed by the same version of LLVM, it may be fine.
However, if it is intended for a generic file format to be consumed by generic offloading language runtimes, it is better to use C-like simple data layout which does not depend on LLVM or standard C++ library.