This patch adds support for unbundling an archive file. It takes an
archive file and a set of offload targets as input, and produces
a device-specific archive for each given offload target.
The input archive contains code objects bundled using
clang-offload-bundler. Each generated device-specific archive contains
a set of device code object files named
<Parent Bundle Name>-<CodeObject-GPUArch>.
Entries in the input archive can be of any binary type
supported by clang-offload-bundler, such as *.bc. Output archives
contain files of the same type.
Example Usage:
clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a -targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908 -outputs=devicelib-gfx906.a,deviceLib-gfx908.a
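The flow described in the summary can be modelled in a few lines. This is only an illustrative sketch, not the tool's implementation: the archive is represented as a plain dictionary, `unbundle_archive` is a hypothetical name, and both stripping the member's extension and taking the text after the last `-` as the GPU arch are assumptions made here for illustration.

```python
def unbundle_archive(archive, targets):
    """Model of archive unbundling.

    archive: dict mapping archive member name -> bundle, where a bundle is a
             dict mapping bundle entry ID -> code object bytes.
    targets: list of bundle entry IDs to extract.
    Returns a dict mapping each target -> its device-specific archive
    (a dict of output member name -> code object bytes).
    """
    device_archives = {target: {} for target in targets}
    for member_name, bundle in archive.items():
        # Assumption: the "parent bundle name" is the member name without extension.
        parent = member_name.rsplit(".", 1)[0]
        for entry_id, code_object in bundle.items():
            if entry_id in device_archives:
                # Assumption: the GPU arch is the text after the last '-'.
                gpu_arch = entry_id.rsplit("-", 1)[-1]
                device_archives[entry_id][f"{parent}-{gpu_arch}"] = code_object
    return device_archives


GFX906 = "openmp-amdgcn-amd-amdhsa--gfx906"
GFX908 = "openmp-amdgcn-amd-amdhsa--gfx908"
archive = {
    "a.o": {"host-x86_64-unknown-linux-gnu": b"h", GFX906: b"x", GFX908: b"y"},
    "b.o": {"host-x86_64-unknown-linux-gnu": b"h", GFX906: b"z"},
}
result = unbundle_archive(archive, [GFX906, GFX908])
```

In this model every requested target gets its own output archive, even if only some input members carry a code object for it.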
- Repository
- rG LLVM Github Monorepo
Event Timeline
Modified to handle multiple targets/outputs in one run of the tool for archive unbundling. Other minor changes as requested in the review.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp | ||
---|---|---|
1160–1161 | That wasn't possible with the current code flow; there is work to be done in the failure case as well. |
Can you document this in ClangOffloadBundler.rst? I think we need a clear description of how clang-offload-bundler knows which file in the .a file belongs to which target.
How does the .a relate to bundled code objects? Does the .a have a number of bundled code objects? If so wouldn't the identity of code objects be defined by the existing bundled code object ABI already documented? If the .a is a set of non-bundled code objects then defining how they are identified is not part of the clang-offload-bundler documentation as there are no bundled code objects involved. It would seem that the documentation belongs with the OpenMP runtime/compiler that is choosing to use .a files in this manner.
Bundles (created using clang-offload-bundler) are passed to llvm-ar to create an archive of bundled objects (*.a file). An archive can have bundles for multiple device types. So, yes, the identity of code objects is defined by the existing bundled code object ABI.
This patch reads such an archive and produces a device-specific archive for each of the target devices given as input. Each device-specific archive contains all the code objects corresponding to that particular device and is written in the LLVM archive format.
Here is a snippet of relevant lit run lines:
// RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o
// RUN: echo 'Content of device file 1' > %t.tgt1
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
// RUN: echo 'Content of device file 2' > %t.tgt2
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt2 -outputs=%t.abundle2.o
// RUN: llvm-ar cr %t.lib.a %t.abundle1.o %t.abundle2.o
This patch ==>
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.lib.a -outputs=%t.devicelib.a
%t.devicelib.a will contain all device objects corresponding to gfx900.
Though my interest originates from the OpenMP side, device-specific archive libraries created like this can be used by other offloading languages such as HIP, CUDA, and OpenCL. Please refer to D81109 for an earlier patch in the series of patches that will enable this.
The naming of code objects in a bundled code object includes the processor name and the settings for target features (see https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id and https://llvm.org/docs/AMDGPUUsage.html#target-id). The compatibility of code objects considers both target processor matching and target feature compatibility. Target features can have three settings: on, off and any. The compatibility is that each feature that is on/off must exactly match, but any will match either on or off.
So when unbundling an archive, how is the desired code object being requested? How is it handling the target features? For example, if code objects that are compatible with a feature being on are required, then the matching code objects in the archive would be those that have that feature set to either on or any.
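The rule stated above (on/off settings must match exactly, any matches either) can be written as a small predicate. This is a sketch of one reading of that rule, not code from the patch or from any runtime: `parse_target_id` and `is_compatible` are hypothetical names, and a feature that is absent from a target ID is assumed to mean "any".

```python
def parse_target_id(target_id):
    """Split 'gfx908:sramecc+:xnack-' into ('gfx908', {'sramecc': True, 'xnack': False})."""
    processor, *features = target_id.split(":")
    settings = {}
    for feature in features:
        if len(feature) < 2 or feature[-1] not in ("+", "-"):
            raise ValueError(f"malformed feature setting: {feature!r}")
        settings[feature[:-1]] = feature[-1] == "+"
    return processor, settings


def is_compatible(requested, available):
    """True if the processors match and no feature is 'on' on one side and 'off' on the other.

    Assumption: a feature missing from either target ID is treated as 'any'.
    """
    req_proc, req_features = parse_target_id(requested)
    avail_proc, avail_features = parse_target_id(available)
    if req_proc != avail_proc:
        return False
    shared = req_features.keys() & avail_features.keys()
    return all(req_features[name] == avail_features[name] for name in shared)
```

With this reading, a request for `gfx908:sramecc+` accepts code objects built for `gfx908:sramecc+` or plain `gfx908`, and rejects `gfx908:sramecc-`.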
At the moment this patch defines compatibility as exact string match of bundler entry ID. So, it doesn't fully support the target ID concept. However, the following example works.
Supporting target ID requires a little more work and discussion.
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx908 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+,openmp-amdgcn-amd-amdhsa--gfx908:sramecc-:xnack+ -inputs=%t.o,%t.tgt1,%t.tgt2 -outputs=%t.targetIDbundle.o
// RUN: llvm-ar cr %t.targetIDlib.a %t.abundle1.o %t.targetIDbundle.o
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+ -inputs=%t.targetIDlib.a -outputs=%t.devicelibt-sramecc+.a
// RUN: llvm-ar t %t.devicelibt-sramecc+.a | FileCheck %s -check-prefix=SRAMECCplus
// SRAMECCplus: targetIDbundle.bc
// SRAMECCplus-NOT: abundle1.bc
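To see why the FileCheck lines above expect only `targetIDbundle`, the exact-string-match rule can be replayed on the same entry IDs. This is a minimal model only: `members_with_entry` is a hypothetical helper, and `host-x86_64-unknown-linux-gnu` is an assumed stand-in for `host-%itanium_abi_triple`.

```python
HOST = "host-x86_64-unknown-linux-gnu"  # assumed stand-in for host-%itanium_abi_triple

# Bundle entry IDs carried by each archive member in the RUN lines above.
archive = {
    "abundle1.o": [HOST, "openmp-amdgcn-amd-amdhsa--gfx908"],
    "targetIDbundle.o": [
        HOST,
        "openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+",
        "openmp-amdgcn-amd-amdhsa--gfx908:sramecc-:xnack+",
    ],
}


def members_with_entry(archive, target):
    """Names of archive members that carry an entry ID identical to `target`."""
    return [name for name, entry_ids in archive.items() if target in entry_ids]


selected = members_with_entry(archive, "openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+")
```

Under exact matching, the plain `gfx908` code object in `abundle1.o` is not selected for the `sramecc+:xnack+` request, even though target ID rules would treat its unspecified features as any.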
At the moment this patch defines compatibility as exact string match of bundler entry ID.
[...]
Supporting target ID requires little more work and discussion.
Let's get this in first, then revisit target ID support as we need it.
I do not think this patch should ignore target ID as that is now upstreamed and documented. What is involved in correcting the compatibility test to be correct by the target ID rules? There are examples of doing this in all the runtimes and I can help if that is useful.
First, there is no reason not to have multiple patches as long as they are self contained and testable. Arguably, smaller patches are better.
That said, target ID is a new feature and, as discussed in the OpenMP call today, there is a chance we have to revisit this to support more involved information. As this discussion is open ended (and hasn't started yet), it seems absolutely sensible to continue with a tested and working patch that provides features we need for sure instead of forcing some support of a feature we don't use right now anyway.
- Removed TargetID support, to be reviewed in a followup patch.
- Added OffloadTargetInfo class to encapsulate handling of bundle entry ID components: OffloadKind, Triple, GPUArch.
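As a rough illustration of what such a class encapsulates, the sketch below splits a bundle entry ID into the three components named above. Only the class name and the component names come from the patch notes; the field names, the `parse` method, and the fixed layout of one kind field followed by four triple fields and an optional GPU arch are assumptions of this sketch, not the C++ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffloadTargetInfo:
    offload_kind: str
    triple: str
    gpu_arch: str

    @classmethod
    def parse(cls, entry_id):
        # Assumed layout: <kind>-<arch>-<vendor>-<os>-<env>[-<gpu arch>]
        parts = entry_id.split("-", 5)
        offload_kind = parts[0]
        triple = "-".join(parts[1:5])
        gpu_arch = parts[5] if len(parts) > 5 else ""
        return cls(offload_kind, triple, gpu_arch)


device = OffloadTargetInfo.parse("openmp-amdgcn-amd-amdhsa--gfx908")
host = OffloadTargetInfo.parse("host-x86_64-unknown-linux-gnu")
```

A positional split like this only works when all four triple fields are present: in this sketch, `openmp-amdgcn-amd-amdhsa-gfx900` would be read with `gfx900` as the environment field and an empty GPU arch.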
Removed Triple format example from documentation and simplified conditional calling of bundling/unbundling functions.
Does this pass internal CI (ePSDB) ? I am concerned about the enforcement of the canonical format of target triple since this may break backward compatibility.
Updated clang and HIP tests to ensure that all 4 components of the triple are always present in the bundle entry ID.
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
7632–7641 | This is not HIP specific. Other toolchains could use a non-canonical triple too. Also, more components of the triple may be missing. A generic fix would be to use Triple::normalize for all toolchains. The same applies below. |
Generalized padding of Triple fields of Bundle Entry ID while generating command for clang-offload-bundler.
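The effect of padding can be sketched as follows. This models only the padding of missing trailing fields with empty strings, which is what the `amdgcn-amd-amdhsa--gfx908` entry IDs in this thread look like; `Triple::normalize` itself does more than this, and `pad_triple` and `bundle_entry_id` are hypothetical names.

```python
def pad_triple(triple, fields=4):
    """Pad a '-'-separated triple with empty trailing fields up to `fields` components."""
    parts = triple.split("-")
    if len(parts) > fields:
        raise ValueError(f"too many triple components: {triple!r}")
    return "-".join(parts + [""] * (fields - len(parts)))


def bundle_entry_id(offload_kind, triple, gpu_arch=""):
    """Build '<kind>-<padded triple>[-<gpu arch>]'."""
    entry_id = f"{offload_kind}-{pad_triple(triple)}"
    if gpu_arch:
        entry_id += f"-{gpu_arch}"
    return entry_id


device_id = bundle_entry_id("openmp", "amdgcn-amd-amdhsa", "gfx908")
host_id = bundle_entry_id("host", "x86_64-unknown-linux-gnu")
```

Padding keeps the GPU arch in a fixed position, so a three-component triple such as `amdgcn-amd-amdhsa` cannot be confused with a four-component triple that has no GPU arch.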
@yaxunl
this patch on its own is failing in our internal CI. I have an internal patch (542569) to integrate it cleanly there.
clang/docs/ClangOffloadBundler.rst | ||
---|---|---|
128 | I think llvm::Triple is named Triple for historical reasons. Otherwise, it already has these components (including the environment). | |
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp | ||
147 | Not necessarily. It is possible that target has less than 6 elements. For example all bundling/unbundling cases which do not require GPUArch field. |
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp | ||
---|---|---|
147 | OK, thanks! |
clang/test/Driver/clang-offload-bundler.c | ||
---|---|---|
390 | This test does not depend on llvm-ar, and this change causes check-clang to fail in the case where llvm-ar has not previously been built. Please can you fix? (Might need some changes to the build rules to add a dependency on llvm-ar, if you can't avoid depending on it for this test.) |
Tony's earlier objection/ask was to include TargetID support along with archive unbundling support in this. I had verbal consent from him to split the patch and propose TargetID support for OpenMP in a separate patch. The same was agreed upon in the multi-company meeting as well.
Also, D106870 puts in place the infrastructure needed to support TargetID in a follow-up patch. Once it lands, the TargetID patch is fairly straightforward.
Whole archive semantics didn't introduce any bug anywhere, though performance is a separate issue. D108291 is required because nvlink silently fails to link cubin files inside an archive.
Any suggestions on how I can improve testing for archives?
A bit of wordplay, but it's weird that a *triple* now has 4 elements...