This is an archive of the discontinued LLVM Phabricator instance.

[clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives
ClosedPublic

Authored by saiislam on Dec 18 2020, 1:51 AM.

Details

Summary

This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
<Parent Bundle Name>-<CodeObject-GPUArch>.
Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.
Example Usuage:
clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a -targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908 -outputs=devicelib-gfx906.a,deviceLib-gfx908.a

Diff Detail

Event Timeline

saiislam created this revision.Dec 18 2020, 1:51 AM
saiislam requested review of this revision.Dec 18 2020, 1:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 18 2020, 1:51 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
saiislam edited the summary of this revision. (Show Details)Dec 18 2020, 1:54 AM
ABataev added inline comments.Dec 18 2020, 8:49 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
186–188

No need else here

1065–1069

No need for else here

1072–1074

I think llvm Support lib has all required functions for this.

1112

Do not use auto where the type is not obvious.

1160–1161

Just continue and make else if just if

saiislam updated this revision to Diff 314633.Jan 5 2021, 8:42 AM

Modified to handle multiple targets/outputs in one run of the tool for archive unbundling. Other minor changes as requested in the review.

saiislam marked 3 inline comments as done.Jan 5 2021, 8:47 AM
saiislam added inline comments.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
1160–1161

wasn't possible with the code flow. there is stuff to be processed in case of failure as well.

saiislam edited the summary of this revision. (Show Details)Jan 5 2021, 8:49 AM
saiislam added reviewers: yaxunl, t-tye.

can you document this in ClangOffloadBundler.rst ? I think we need a clear description about how clang-offload-bundler knows which file in the .a file belongs to which target.

can you document this in ClangOffloadBundler.rst ? I think we need a clear description about how clang-offload-bundler knows which file in the .a file belongs to which target.

How does the .a relate to bundled code objects? Does the .a have a number of bundled code objects? If so wouldn't the identity of code objects be defined by the existing bundled code object ABI already documented? If the .a is a set of non-bundled code objects then defining how they are identified is not part of the clang-offload-bundler documentation as there are no bundled code objects involved. It would seem that the documentation belongs with the OpenMP runtime/compiler that is choosing to use .a files in this manner.

can you document this in ClangOffloadBundler.rst ? I think we need a clear description about how clang-offload-bundler knows which file in the .a file belongs to which target.

How does the .a relate to bundled code objects? Does the .a have a number of bundled code objects? If so wouldn't the identity of code objects be defined by the existing bundled code object ABI already documented? If the .a is a set of non-bundled code objects then defining how they are identified is not part of the clang-offload-bundler documentation as there are no bundled code objects involved. It would seem that the documentation belongs with the OpenMP runtime/compiler that is choosing to use .a files in this manner.

Bundles (created using clang-offload-bundler) are passed to llvm-ar to create an archive of bundled objects (*.a file). An archive can have bundles for multiple device types. So, yes, the identity of code objects is defined by the existing bundled code object ABI.
This patch reads such an archive and produces a device-specific archive for each of the target devices given as input. Each device-specific archive contains all the code objects corresponding to that particular device and are written as per llvm archive format.

Here is a snippet of relevant lit run lines:

// RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o

// RUN: echo 'Content of device file 1' > %t.tgt1
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
 
// RUN: echo 'Content of device file 2' > %t.tgt2
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt2 -outputs=%t.abundle2.o
 
// RUN: llvm-ar cr %t.lib.a %t.abundle1.o %t.abundle2.o

This patch ==>
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.lib.a -outputs=%t.devicelib.a

%t.devicelib.a will contain all devices objects corresponding to gfx900

Though my interest originates from OpenMP side, Device-specific Archive Libraries created like this can be used by other offloading languages like HIP, CUDA, and OpenCL. Pelase refer D81109 for the an earlier patch in the series of patches which will enable this.

t-tye added a comment.Jan 13 2021, 8:07 AM

can you document this in ClangOffloadBundler.rst ? I think we need a clear description about how clang-offload-bundler knows which file in the .a file belongs to which target.

How does the .a relate to bundled code objects? Does the .a have a number of bundled code objects? If so wouldn't the identity of code objects be defined by the existing bundled code object ABI already documented? If the .a is a set of non-bundled code objects then defining how they are identified is not part of the clang-offload-bundler documentation as there are no bundled code objects involved. It would seem that the documentation belongs with the OpenMP runtime/compiler that is choosing to use .a files in this manner.

Bundles (created using clang-offload-bundler) are passed to llvm-ar to create an archive of bundled objects (*.a file). An archive can have bundles for multiple device types. So, yes, the identity of code objects is defined by the existing bundled code object ABI.
This patch reads such an archive and produces a device-specific archive for each of the target devices given as input. Each device-specific archive contains all the code objects corresponding to that particular device and are written as per llvm archive format.

Here is a snippet of relevant lit run lines:

// RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o

// RUN: echo 'Content of device file 1' > %t.tgt1
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
 
// RUN: echo 'Content of device file 2' > %t.tgt2
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt2 -outputs=%t.abundle2.o
 
// RUN: llvm-ar cr %t.lib.a %t.abundle1.o %t.abundle2.o

This patch ==>
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.lib.a -outputs=%t.devicelib.a

%t.devicelib.a will contain all devices objects corresponding to gfx900

Though my interest originates from OpenMP side, Device-specific Archive Libraries created like this can be used by other offloading languages like HIP, CUDA, and OpenCL. Pelase refer D81109 for the an earlier patch in the series of patches which will enable this.

The naming of code objects in a bundled code object includes the processor name and the settings for target features (see https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id and https://llvm.org/docs/AMDGPUUsage.html#target-id). The compatibility of code objects considers both target processor matching and target feature compatibility. Target features can have three settings: on, off and any. The compatibility is that each feature that is on/off must exactly match, but any will match either on or off.

So when unbundling an archive how is the desired code object being requested? How is it handling the target features? For example, if code objects that will be compatible with a feature being on is required, then matching code objects in the archive would be those that have that feature either on or any.

can you document this in ClangOffloadBundler.rst ? I think we need a clear description about how clang-offload-bundler knows which file in the .a file belongs to which target.

How does the .a relate to bundled code objects? Does the .a have a number of bundled code objects? If so wouldn't the identity of code objects be defined by the existing bundled code object ABI already documented? If the .a is a set of non-bundled code objects then defining how they are identified is not part of the clang-offload-bundler documentation as there are no bundled code objects involved. It would seem that the documentation belongs with the OpenMP runtime/compiler that is choosing to use .a files in this manner.

Bundles (created using clang-offload-bundler) are passed to llvm-ar to create an archive of bundled objects (*.a file). An archive can have bundles for multiple device types. So, yes, the identity of code objects is defined by the existing bundled code object ABI.
This patch reads such an archive and produces a device-specific archive for each of the target devices given as input. Each device-specific archive contains all the code objects corresponding to that particular device and are written as per llvm archive format.

Here is a snippet of relevant lit run lines:

// RUN: %clang -O0 -target %itanium_abi_triple %s -c -o %t.o

// RUN: echo 'Content of device file 1' > %t.tgt1
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
 
// RUN: echo 'Content of device file 2' > %t.tgt2
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.o,%t.tgt2 -outputs=%t.abundle2.o
 
// RUN: llvm-ar cr %t.lib.a %t.abundle1.o %t.abundle2.o

This patch ==>
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa-gfx900 -inputs=%t.lib.a -outputs=%t.devicelib.a

%t.devicelib.a will contain all devices objects corresponding to gfx900

Though my interest originates from OpenMP side, Device-specific Archive Libraries created like this can be used by other offloading languages like HIP, CUDA, and OpenCL. Pelase refer D81109 for the an earlier patch in the series of patches which will enable this.

The naming of code objects in a bundled code object includes the processor name and the settings for target features (see https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id and https://llvm.org/docs/AMDGPUUsage.html#target-id). The compatibility of code objects considers both target processor matching and target feature compatibility. Target features can have three settings: on, off and any. The compatibility is that each feature that is on/off must exactly match, but any will match either on or off.

So when unbundling an archive how is the desired code object being requested? How is it handling the target features? For example, if code objects that will be compatible with a feature being on is required, then matching code objects in the archive would be those that have that feature either on or any.

At the moment this patch defines compatibility as exact string match of bundler entry ID. So, it doesn't support target ID concept fully. But, following example work.
Supporting target ID requires little more work and discussion.

// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx908 -inputs=%t.o,%t.tgt1 -outputs=%t.abundle1.o
// RUN: clang-offload-bundler -type=o -targets=host-%itanium_abi_triple,openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+,openmp-amdgcn-amd-amdhsa--gfx908:sramecc-:xnack+ -inputs=%t.o,%t.tgt1,%t.tgt2 -outputs=%t.targetIDbundle.o
// RUN: llvm-ar cr %t.targetIDlib.a %t.abundle1.o %t.targetIDbundle.o
// RUN: clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa--gfx908:sramecc+:xnack+ -inputs=%t.targetIDlib.a -outputs=%t.devicelibt-sramecc+.a
// RUN: llvm-ar t %t.devicelibt-sramecc+.a | FileCheck %s -check-prefix=SRAMECCplus
// SRAMECCplus: targetIDbundle.bc
// SRAMECCplus-NOT: abundle1.bc

At the moment this patch defines compatibility as exact string match of bundler entry ID.
[...]
Supporting target ID requires little more work and discussion.

Let's get this in first, then revisit target ID support as we need it.

t-tye requested changes to this revision.Jan 20 2021, 8:16 AM

At the moment this patch defines compatibility as exact string match of bundler entry ID.
[...]
Supporting target ID requires little more work and discussion.

Let's get this in first, then revisit target ID support as we need it.

I do not think this patch should ignore target ID as that is now upstreamed and documented. What is involved in correcting the compatibility test to be correct by the target ID rules? There are examples of doing this in all the runtimes and I can help if that is useful.

This revision now requires changes to proceed.Jan 20 2021, 8:16 AM

At the moment this patch defines compatibility as exact string match of bundler entry ID.
[...]
Supporting target ID requires little more work and discussion.

Let's get this in first, then revisit target ID support as we need it.

I do not think this patch should ignore target ID as that is now upstreamed and documented. What is involved in correcting the compatibility test to be correct by the target ID rules? There are examples of doing this in all the runtimes and I can help if that is useful.

First, there is no reason not to have multiple patches as long as they are self contained and testable. Arguably, smaller patches are better.

That said, target ID is a new feature and, as discussed in the OpenMP call today, there is a chance we have to revisit this to support more involved information. As this discussion is open ended (and hasn't started yet), it seems absolutely sensible to continue with a tested and working patch that provides features we need for sure instead of forcing some support of a feature we don't use right now anyway.

saiislam updated this revision to Diff 322724.Feb 10 2021, 9:58 AM

Added support for optional TargetID during unbundling of archives.

saiislam retitled this revision from [OpenMP] Add unbundling of archives containing bundled object files into device specific archives to [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives.Feb 10 2021, 10:00 AM
saiislam edited the summary of this revision. (Show Details)
ashi1 added a subscriber: ashi1.Apr 5 2021, 8:24 AM

Can we split this patch now and make progress?

cchen added a subscriber: cchen.May 19 2021, 8:30 AM
saiislam updated this revision to Diff 350883.Jun 9 2021, 6:39 AM
  1. Removed TargetID support, to be reviewed in a followup patch.
  2. Added OffloadTargetInfo class to encapsulate handling of bundle entry ID components: OffloadKind, Triple, GPUArch.
saiislam updated this revision to Diff 350884.Jun 9 2021, 6:45 AM

Removed unused header.

saiislam edited the summary of this revision. (Show Details)Jun 9 2021, 6:50 AM

Thanks for splitting this. I quickly went over it only.

clang/docs/ClangOffloadBundler.rst
137

not needed here.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
1446

By now, a proper conditional seems appropriate.

saiislam updated this revision to Diff 350897.Jun 9 2021, 7:44 AM

Removed Triple format example from documentation and simplified conditional calling of bundling/unbundling functions.

Does this pass internal CI (ePSDB) ? I am concerned about the enforcement of the canonical format of target triple since this may break backward compatibility.

saiislam updated this revision to Diff 353922.Jun 23 2021, 4:10 AM

Updated clang and hip tests to ensure that all 4 components of triple are mandataroly available in the bundle entry ID.

yaxunl added inline comments.Jun 23 2021, 7:04 AM
clang/lib/Driver/ToolChains/Clang.cpp
7632–7641

This is not HIP specific. Other toolchain could use a non-canonical triple too. Also there may be more components of triple missing. A generic fix would be use Triple::normalize for all toolchain. same as below.

saiislam updated this revision to Diff 354197.Jun 24 2021, 3:48 AM

Generalized padding of Triple fields of Bundle Entry ID while generating command for clang-offload-bundler.

yaxunl accepted this revision.Jun 24 2021, 7:07 AM

LGTM. Thanks! Pls make sure it passes internal CI (ePSDB) before committing.

LGTM. Thanks! Pls make sure it passes internal CI (ePSDB) before committing.

Sure, I will take care of it. Thanks!

Any comments @jdoerfert ?

jdoerfert accepted this revision.Jun 24 2021, 8:35 AM

Looks reasonable to me. We can always refine it as we go.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
137
1263

Add a message to asserts. also other places.

This revision was not accepted when it landed; it landed in state Needs Review.Jun 30 2021, 5:26 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

@yaxunl
this patch on its own is failing in our internal CI. I have an internal patch (542569) to integrate it cleanly there.

grokos added inline comments.Jun 30 2021, 5:34 AM
clang/docs/ClangOffloadBundler.rst
128

A bit of wordplay, but it's weird that a *triple* now has 4 elements...

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
147

Leftover? Components is already 6 elements long.

1102

CodeObject --> CodeObjectInfo

@yaxunl
this patch on its own is failing in our internal CI. I have an internal patch (542569) to integrate it cleanly there.

Fine. Thanks.

saiislam added inline comments.Jun 30 2021, 7:25 AM
clang/docs/ClangOffloadBundler.rst
128

I think llvm::Triple it is named Triple because of historical reasons. Otherwise, it already has these components (including the environment).
As llvm::Triple API's don't force presence of all components it is not an issue in most cases, but in our case of Bundle Entry ID we need a way to differentiate between 4th component of Triple and a GPU arch, hence this rule.

clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
147

Not necessarily. It is possible that target has less than 6 elements. For example all bundling/unbundling cases which do not require GPUArch field.
E.g. "openmp-powerpc64le-ibm-linux-gnu"

grokos added inline comments.Jun 30 2021, 7:45 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
147

OK, thanks!

rsmith added a subscriber: rsmith.Jun 30 2021, 2:27 PM
rsmith added inline comments.
clang/test/Driver/clang-offload-bundler.c
390

This test does not depend on llvm-ar, and this change causes check-clang to fail in the case where llvm-ar has not previously been built. Please can you fix? (Might need some changes to the build rules to add a dependency on llvm-ar, if you can't avoid depending on it for this test.)

@rsmith,
Thanks for pointing out this issue. I have proposed a fix in D105285. Please review.

JonChesterfield added a comment.EditedAug 18 2021, 5:17 AM

This is mentioned as broken in the referenced patch and landed with @t-tye still marked as requires changes. Revert warranted?

Testing looks very sparse for something handling archives. Presumably it has the same set of bugs as D108291 e.g. implicit whole archive semantics

This is mentioned as broken in the referenced patch and landed with @t-tye still marked as requires changes. Revert warranted?

Testing looks very sparse for something handling archives. Presumably it has the same set of bugs as D108291 e.g. implicit whole archive semantics

Tony's earlier objection/ask was to include TargetID support along with archive unbundling support in this. I had verbal consent from him to split the patch and propose TargetID support for OpenMP in a separate patch. The same was agreed upon in the multi-company meeting as well.
Also, D106870 places necessary infrastructure to support TargetID in a follow up patch. Once it lands, TargetID patch is fairly straightforward.
Whole archive semantics didn't introduce any bug anywhere, though performance is a separate issue. D108291 is required because nvlink silently fails to link cubin files inside an archive.

Any suggestions on how can I improve testing for archives?

lamb-j added a subscriber: lamb-j.Jun 7 2022, 11:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 11:23 PM