This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix using the target ID when using the new driver
ClosedPublic

Authored by jhuber6 on May 19 2023, 1:54 PM.

Details

Summary

AMDGPU sometimes uses a novel formatting for their offloading
architecture called the target id. This merges the attributes and the
architecture name into a single string. Previously, we were passing this
as the canonical architecture name. This caused the linker wrapper to
fail to find relevant libraries and then pass an incalid CPU name. This
patch changes the handling in the offload packager to handle the
canonical architecture and then extract the features.

Diff Detail

Event Timeline

jhuber6 created this revision.May 19 2023, 1:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2023, 1:54 PM
jhuber6 requested review of this revision.May 19 2023, 1:54 PM
saiislam added inline comments.May 22 2023, 7:29 AM
clang/lib/Driver/ToolChains/Clang.cpp
8419–8424

May be use parseTargetIDWithFormatCheckingOnly()?

8431

Shouldn't Arch (targetID here) should be passed along instead of just the processor?

For example, gfx90a:xnack+ and gfx90a:xnack- should be treated differently.

jhuber6 added inline comments.May 22 2023, 7:33 AM
clang/lib/Driver/ToolChains/Clang.cpp
8419–8424

I tried that but it didn't return the strings in the format required by llc for the -mattrs list.

8431

So the problem there is that this will cause us to no longer link in something like the OpenMP runtime library since gfx90a != gfx90a:xnack+. Right now the behavior is that we will link them both together since the architecture matches but then the attributes will get resolved the same way we handle -mattr=+x,-x. I'm not sure what the expected behaviour is here.

yaxunl added inline comments.May 23 2023, 7:35 AM
clang/lib/Driver/ToolChains/Clang.cpp
8431

targetID is part of ROCm ABI as it is returned as part of Isa::GetIsaName (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/rocm-5.5.x/src/core/runtime/isa.cpp#L98) .

the compatibility rule for targetID is specified by https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id . For example, bundle entry with gfx90a can be consumed by device with GetIsaName gfx90a:xnack+ or gfx90a:xnack- . but bundle entry with gfx90a:xnack+ can only be consumed by device with GetIsaName gfx90a:xnack+.

Language runtime is supposed to do a compatibility check for bundle entry with the device GetIsaName. Isa::IsCompatible (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/3b939c398bdac0c2b9a860ff9a0ed0be0c80f911/src/core/runtime/isa.cpp#L73) can be used to do that. For convenience, language runtime is expected to use targetID for identifying bundle entries instead of re-construct targetID from features when needed.

targetID is also used for compatibility checks when linking bitcode.

jhuber6 added inline comments.May 23 2023, 7:42 AM
clang/lib/Driver/ToolChains/Clang.cpp
8431

So what we need is some more sophisticated logic in the linker wrapper to merge the binaries according to these rules. However the handling will definitely require pulling this apart when we send it to LTO.

saiislam added inline comments.May 23 2023, 7:54 AM
clang/lib/Driver/ToolChains/Clang.cpp
8431

Some logic is given in ClangOffloadBundler and in AMDGPU plugin

yaxunl added inline comments.May 23 2023, 7:56 AM
clang/lib/Driver/ToolChains/Clang.cpp
8431

targetID is only used to identify bundle entry. It is not directly represented in LLVM IR. In LLVM IR, the target cpu does not contain the target ID feature string. Therefore lld does not know about it. The linker wrapper is responsible to do the compatibility check based on bundle ID.

Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is as it supports target ID compilation in the general term. Fixing the merging rules will be a rather large overhaul so I'd like this to work in the meantime.

This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix if the user links in a library that has --offload-arch=gfx90a:xnack- as well.

yaxunl added a comment.Jun 7 2023, 8:33 AM

Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is as it supports target ID compilation in the general term. Fixing the merging rules will be a rather large overhaul so I'd like this to work in the meantime.

This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix if the user links in a library that has --offload-arch=gfx90a:xnack- as well.

can we add a test to make sure --offload-arch=gfx90a:xnack+ and --offload-arch=gfx90a:xnack- work together? It is a very common use case for HIP.

Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is as it supports target ID compilation in the general term. Fixing the merging rules will be a rather large overhaul so I'd like this to work in the meantime.

This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix if the user links in a library that has --offload-arch=gfx90a:xnack- as well.

can we add a test to make sure --offload-arch=gfx90a:xnack+ and --offload-arch=gfx90a:xnack- work together? It is a very common use case for HIP.

With the current patch, they would both be linked together and it would probably set the xnack value to the last one that showed up in the link list. E.g.

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

Would result in a.o and b.o getting linked together with xnack- set as the backend attribute.

yaxunl added a comment.Jun 7 2023, 8:43 AM

Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is as it supports target ID compilation in the general term. Fixing the merging rules will be a rather large overhaul so I'd like this to work in the meantime.

This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix if the user links in a library that has --offload-arch=gfx90a:xnack- as well.

can we add a test to make sure --offload-arch=gfx90a:xnack+ and --offload-arch=gfx90a:xnack- work together? It is a very common use case for HIP.

With the current patch, they would both be linked together and it would probably set the xnack value to the last one that showed up in the link list. E.g.

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

Would result in a.o and b.o getting linked together with xnack- set as the backend attribute.

what happens to

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

Basically gfx90a:xnack+ and gfx90a:xnack- need to be treated as distinct GPU arch's and the fat binary should contain different code objects for them.

what happens to

clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang -xhip b.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc
clang --offload-link a.o b.o

Basically gfx90a:xnack+ and gfx90a:xnack- need to be treated as distinct GPU arch's and the fat binary should contain different code objects for them.

In this logic they would both map to gfx90a but be given distinct images and then be linked together into a single one with gfx90a metadata. This is the part that I'm saying should probably be a follow-up fix to handle this correctly in the linker wrapper.

yaxunl accepted this revision.Jun 7 2023, 10:20 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Jun 7 2023, 10:20 AM
This revision was automatically updated to reflect the committed changes.