This is an archive of the discontinued LLVM Phabricator instance.

[LinkerWrapper] Support device binaries in multiple link jobs
Needs ReviewPublic

Authored by jhuber6 on Jun 13 2023, 7:19 PM.

Details

Summary

Currently the linker wrapper strictly assigns a single input binary to a
single link job based off of its input architecture. This is not
sufficient to implement the AMDGPU target ID correctly as this could
have many compatible architectures participating in multiple links.

This patch introduces the ability to have a single binary input be
linked multiple times. For example, given the following, we will now
link in the static library where previously we would not.

clang foo.c -fopenmp --offload-arch=gfx90a
llvm-ar rcs libfoo.a foo.o
clang foo.c -fopenmp --offload-arch=gfx90a:xnack+ libfoo.a

This also means that given the following we will link the basic input
twice, but that's on the user for providing two versions.

clang foo.c -fopenmp --offload-arch=gfx90a,gfx90a:xnack+

This should allow us to also support a "generic" target in the future
for IR without a specific architecture.

Diff Detail

Event Timeline

jhuber6 created this revision.Jun 13 2023, 7:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 7:19 PM
Herald added a subscriber: tpr. · View Herald Transcript
jhuber6 requested review of this revision.Jun 13 2023, 7:19 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript

The design of target ID put constraints on target ID's that can be embedded into one executable https://clang.llvm.org/docs/ClangOffloadBundler.html#bundle-entry-id . For example, gfx90a and gfx90a:xnack+ cannot be embedded into one executable since this will cause difficulty for runtime to choose device binary to run, especially when there are multiple target ID features. clang does not allow --offload-arch=gfx90a and --offload-arch=gfx90a:xnack+ to be used together to compile HIP programs. It would be preferred for offloack-packager to enforce this constraint too.

However, bitcode of target ID gfx90a:xnack+ is allowed to link in bitcode of target ID gfx90a as long as they are from different containers. So there are two rules about target ID: 1. compatibility rules for objects/bitcode in the same container 2. compatibility rules for linking bitcode of different target ID's.

we need tests for both rules.

The design of target ID put constraints on target ID's that can be embedded into one executable https://clang.llvm.org/docs/ClangOffloadBundler.html#bundle-entry-id . For example, gfx90a and gfx90a:xnack+ cannot be embedded into one executable since this will cause difficulty for runtime to choose device binary to run, especially when there are multiple target ID features. clang does not allow --offload-arch=gfx90a and --offload-arch=gfx90a:xnack+ to be used together to compile HIP programs. It would be preferred for offloack-packager to enforce this constraint too.

However, bitcode of target ID gfx90a:xnack+ is allowed to link in bitcode of target ID gfx90a as long as they are from different containers. So there are two rules about target ID: 1. compatibility rules for objects/bitcode in the same container 2. compatibility rules for linking bitcode of different target ID's.

we need tests for both rules.

Should that be a follow-up patch? Or one included here. It definitely influences the test so I can change that.

jhuber6 updated this revision to Diff 531520.Jun 14 2023, 2:34 PM

Adjusting test.

However, bitcode of target ID gfx90a:xnack+ is allowed to link in bitcode of target ID gfx90a as long as they are from different containers. So there are two rules about target ID: 1. compatibility rules for objects/bitcode in the same container 2. compatibility rules for linking bitcode of different target ID's.

we need tests for both rules.

So I'm wondering why I'm allowed to do --offload-arch=gfx90a,gfx90a:xnack+. Shouldn't that be caught by getConflictTargetIDCombination? That seems like the proper place to diagnose this.

However, bitcode of target ID gfx90a:xnack+ is allowed to link in bitcode of target ID gfx90a as long as they are from different containers. So there are two rules about target ID: 1. compatibility rules for objects/bitcode in the same container 2. compatibility rules for linking bitcode of different target ID's.

we need tests for both rules.

So I'm wondering why I'm allowed to do --offload-arch=gfx90a,gfx90a:xnack+. Shouldn't that be caught by getConflictTargetIDCombination? That seems like the proper place to diagnose this.

clang --offload-arch=gfx90a,gfx90a:xnack+ -c a.hip
clang: error: invalid offload arch combinations: 'gfx90a' and 'gfx90a:xnack+' (for a specific processor, a feature should either exist in all offload archs, or not exist in any offload archs)

At least it is caught for HIP. OpenMP may not check that.

However, bitcode of target ID gfx90a:xnack+ is allowed to link in bitcode of target ID gfx90a as long as they are from different containers. So there are two rules about target ID: 1. compatibility rules for objects/bitcode in the same container 2. compatibility rules for linking bitcode of different target ID's.

we need tests for both rules.

So I'm wondering why I'm allowed to do --offload-arch=gfx90a,gfx90a:xnack+. Shouldn't that be caught by getConflictTargetIDCombination? That seems like the proper place to diagnose this.

clang --offload-arch=gfx90a,gfx90a:xnack+ -c a.hip
clang: error: invalid offload arch combinations: 'gfx90a' and 'gfx90a:xnack+' (for a specific processor, a feature should either exist in all offload archs, or not exist in any offload archs)

At least it is caught for HIP. OpenMP may not check that.

if (Kind != Action::OFK_HIP)
  return std::nullopt;

Yes, this would be the culprit. Guessing we shouldn't do that?

jhuber6 updated this revision to Diff 531756.Jun 15 2023, 7:48 AM

Hopefully fixing test on Windows. I think it's fine to let the packager bundle
mutliple of these now since it's caught in clang. So if the user really wants
to force it we should allow them to since the bundler format is just a simple
fatbinary.

yaxunl added inline comments.Jun 15 2023, 9:00 AM
clang/test/Driver/linker-wrapper.c
51

can we put some variables in the input bitcode so that we can check the linked bitcode?

I would expect there will be only one linked bitcode for gfx90a:xnack+ and it contains both variables.

I don't think it is a good idea to let the final object embed bitcode for both gfx90a:xnack+ and gfx90a since that will result in an invalid container. Therefore I think we should only do linking with target ID's from the first container.

jhuber6 added inline comments.Jun 15 2023, 9:01 AM
clang/test/Driver/linker-wrapper.c
51

I can make a static library that's gfx90a, that covers the main case where we still link in the OpenMP runtime library that's compiled with 90a if the user uses 90a:xnack+. I'd need to place a random external variable to force it to extract however.

jhuber6 updated this revision to Diff 532748.Jun 19 2023, 1:36 PM

I'm not sure why this keeps failing on Windows and have no clue how to tell what's going wrong. The builder simply says

c:\ws\w4\llvm-project\premerge-checks\build\bin\clang-linker-wrapper.exe: error: invalid argument

But I'm unsure what could be causing that since it works on Linux and we have plenty of other tests that Windows passes.