This is an archive of the discontinued LLVM Phabricator instance.

[clang-offload-bundler] Make Bundle Entry ID backward compatible
ClosedPublic

Authored by saiislam on Jul 26 2021, 10:09 AM.

Details

Summary

Earlier BundleEntryID used to be <OffloadKind>-<Triple>-<GPUArch>.
This used to work because the clang-offload-bundler didn't need
GPUArch explicitly for any bundling/unbundling action. With
unbundleArchive it needs GPUArch to ensure compatibility between
device specific code objects. D93525 enforced triples to have
separators for all 4 components irrespective of number of
components, like "amdgcn-amd-amdhsa--". It was required to
to correctly parse a possible 4th environment component or a GPU.
But, this condition is breaking backward compatibility with
archive libraries compiled with compilers older than D93525.

This patch allows triples to have any number of components with
and without extra separator for empty environment field. Thus,
both the following bundle entry IDs are same:
openmp-amdgcn-amd-amdhsa--gfx906
openmp-amdgcn-amd-amdhsa-gfx906

Diff Detail

Event Timeline

saiislam requested review of this revision.Jul 26 2021, 10:09 AM
saiislam created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2021, 10:09 AM
grokos added a comment.Sep 1 2021, 3:47 PM

LG. One possible suggestion is that you leave the double dash (--) variant in some tests so that we can make sure both variants (e.g. both openmp-amdgcn-amd-amdhsa--gfx906 and openmp-amdgcn-amd-amdhsa-gfx906) are correctly parsed.

saiislam updated this revision to Diff 370219.Sep 2 2021, 3:28 AM

Added test cases for compatibility old and new formats of bundle entry id.

yaxunl added inline comments.Sep 7 2021, 7:23 AM
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
148

This is rather fragile to determine whether a bundle ID contains GPU arch. Is there better way?

At least for HIP offload kind, GPU arch should always be there.

1298

need a test

saiislam updated this revision to Diff 371133.Sep 7 2021, 11:24 AM
saiislam marked 2 inline comments as done.
  1. Added tests for empty and incompatible archives.
  2. Improved identification of GPUArch in Bundle Entry ID using CudaArch.
yaxunl accepted this revision.Sep 7 2021, 1:00 PM

Ideally, we probably need something like isValidOffloadArch(Triple, OffloadArch) or getTargetTriple(OffloadArch). However, I think the current patch is good enough for its own purpose.

This revision is now accepted and ready to land.Sep 7 2021, 1:00 PM
grokos accepted this revision.Sep 7 2021, 1:02 PM

LGTM as well.

RKSimon added a subscriber: RKSimon.Sep 9 2021, 7:22 AM
RKSimon added inline comments.
clang/test/Driver/clang-offload-bundler.c
405

@saiislam This is causing an issue on a thinlto buildbot: https://lab.llvm.org/buildbot/#/builders/67/builds/4148

kwk added a subscriber: kwk.Sep 10 2021, 12:51 AM

@saiislam please have a look into why this happens. Is the -debug-only=CodeObjectCompatibility maybe a left-over of some sort?

clang/test/Driver/clang-offload-bundler.c
405

@saiislam We're seeing the same issue in one of our downstream builders:

: 'RUN: at line 405';   clang-offload-bundler -unbundle -type=a -targets=openmp-amdgcn-amd-amdhsa--gfx906,openmp-amdgcn-amd-amdhsa-gfx908 -inputs=/builddir/build/BUILD/clang-14.0.0.src/redhat-linux-build/test/Driver/Output/clang-offload-bundler.c.tmp.input-archive.a -outputs=/builddir/build/BUILD/clang-14.0.0.src/redhat-linux-build/test/Driver/Output/clang-offload-bundler.c.tmp-archive-gfx906-simple.a,/builddir/build/BUILD/clang-14.0.0.src/redhat-linux-build/test/Driver/Output/clang-offload-bundler.c.tmp-archive-gfx908-simple.a -debug-only=CodeObjectCompatibility 2>&1 | /usr/bin/FileCheck /builddir/build/BUILD/clang-14.0.0.src/test/Driver/clang-offload-bundler.c -check-prefix=BUNDLECOMPATIBILITY
--
Exit Code: 1

Command Output (stderr):
--
/builddir/build/BUILD/clang-14.0.0.src/test/Driver/clang-offload-bundler.c:406:25: error: BUNDLECOMPATIBILITY: expected string not found in input
// BUNDLECOMPATIBILITY: Compatible: Exact match: [CodeObject: openmp-amdgcn-amd-amdhsa-gfx906] : [Target: openmp-amdgcn-amd-amdhsa--gfx906]
                        ^
<stdin>:1:1: note: scanning from here
clang-offload-bundler: Unknown command line argument '-debug-only=CodeObjectCompatibility'. Try: 'clang-offload-bundler --help'
^

Input file: <stdin>
Check file: /builddir/build/BUILD/clang-14.0.0.src/test/Driver/clang-offload-bundler.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: clang-offload-bundler: Unknown command line argument '-debug-only=CodeObjectCompatibility'. Try: 'clang-offload-bundler --help' 
check:406     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
           2: clang-offload-bundler: Did you mean '--unbundle=CodeObjectCompatibility'? 
check:406     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

--

@saiislam please have a look into why this happens. Is the -debug-only=CodeObjectCompatibility maybe a left-over of some sort?

No, it is not a left-over. I was using this debug output to show that new and old bundle entry ID formats are compatible with each other.

I am taking a look on how to fix this. Thanks for pointing it out @RKSimon and @kwk .

dyung added a subscriber: dyung.Sep 10 2021, 3:49 AM

@saiislam please have a look into why this happens. Is the -debug-only=CodeObjectCompatibility maybe a left-over of some sort?

No, it is not a left-over. I was using this debug output to show that new and old bundle entry ID formats are compatible with each other.

I am taking a look on how to fix this. Thanks for pointing it out @RKSimon and @kwk .

We are also seeing this issue on our internal build bot that builds a "release" configuration.

If you must use "-debug-only" in your test, a fix could be to add "REQUIRES: asserts" to the test, but that would mean the test would then be completely skipped in release builds. A better solution might be to break out the run line with "-debug-only" to a separate test file which does have a "REQUIRES: asserts" line so that we still run everything else that we can.

saiislam added a comment.EditedSep 10 2021, 4:13 AM

@saiislam please have a look into why this happens. Is the -debug-only=CodeObjectCompatibility maybe a left-over of some sort?

No, it is not a left-over. I was using this debug output to show that new and old bundle entry ID formats are compatible with each other.

I am taking a look on how to fix this. Thanks for pointing it out @RKSimon and @kwk .

We are also seeing this issue on our internal build bot that builds a "release" configuration.

If you must use "-debug-only" in your test, a fix could be to add "REQUIRES: asserts" to the test, but that would mean the test would then be completely skipped in release builds. A better solution might be to break out the run line with "-debug-only" to a separate test file which does have a "REQUIRES: asserts" line so that we still run everything else that we can.

Thanks for the suggestion. I am going to remove the -debug-only test because the same can be tested by changing bundler entry ID format in one of the tests above.

Please have a look at D109592.