AMDGPU sometimes uses a novel formatting for their offloading
architecture called the target id. This merges the attributes and the
architecture name into a single string. Previously, we were passing this
as the canonical architecture name. This caused the linker wrapper to
fail to find relevant libraries and then pass an incalid CPU name. This
patch changes the handling in the offload packager to handle the
canonical architecture and then extract the features.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
8419–8424 | I tried that but it didn't return the strings in the format required by llc for the -mattrs list. | |
8431 | So the problem there is that this will cause us to no longer link in something like the OpenMP runtime library since gfx90a != gfx90a:xnack+. Right now the behavior is that we will link them both together since the architecture matches but then the attributes will get resolved the same way we handle -mattr=+x,-x. I'm not sure what the expected behaviour is here. |
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
8431 | targetID is part of ROCm ABI as it is returned as part of Isa::GetIsaName (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/rocm-5.5.x/src/core/runtime/isa.cpp#L98) . the compatibility rule for targetID is specified by https://clang.llvm.org/docs/ClangOffloadBundler.html#target-id . For example, bundle entry with gfx90a can be consumed by device with GetIsaName gfx90a:xnack+ or gfx90a:xnack- . but bundle entry with gfx90a:xnack+ can only be consumed by device with GetIsaName gfx90a:xnack+. Language runtime is supposed to do a compatibility check for bundle entry with the device GetIsaName. Isa::IsCompatible (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/3b939c398bdac0c2b9a860ff9a0ed0be0c80f911/src/core/runtime/isa.cpp#L73) can be used to do that. For convenience, language runtime is expected to use targetID for identifying bundle entries instead of re-construct targetID from features when needed. targetID is also used for compatibility checks when linking bitcode. |
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
8431 | So what we need is some more sophisticated logic in the linker wrapper to merge the binaries according to these rules. However the handling will definitely require pulling this apart when we send it to LTO. |
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
8431 | Some logic is given in ClangOffloadBundler and in AMDGPU plugin |
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
8431 | targetID is only used to identify bundle entry. It is not directly represented in LLVM IR. In LLVM IR, the target cpu does not contain the target ID feature string. Therefore lld does not know about it. The linker wrapper is responsible to do the compatibility check based on bundle ID. |
Can we use this approach for now and land this? It makes the "new driver" less broken than it currently is as it supports target ID compilation in the general term. Fixing the merging rules will be a rather large overhaul so I'd like this to work in the meantime.
This patch allows --offload-arch=gfx90a:xnack+ to work. It does not fix if the user links in a library that has --offload-arch=gfx90a:xnack- as well.
can we add a test to make sure --offload-arch=gfx90a:xnack+ and --offload-arch=gfx90a:xnack- work together? It is a very common use case for HIP.
With the current patch, they would both be linked together and it would probably set the xnack value to the last one that showed up in the link list. E.g.
clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-new-driver -fgpu-rdc clang -xhip b.c --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc clang --offload-link a.o b.o
Would result in a.o and b.o getting linked together with xnack- set as the backend attribute.
what happens to
clang -xhip a.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc clang -xhip b.c --offload-arch=gfx90a:xnack+ --offload-arch=gfx90a:xnack- --offload-new-driver -fgpu-rdc clang --offload-link a.o b.o
Basically gfx90a:xnack+ and gfx90a:xnack- need to be treated as distinct GPU arch's and the fat binary should contain different code objects for them.
In this logic they would both map to gfx90a but be given distinct images and then be linked together into a single one with gfx90a metadata. This is the part that I'm saying should probably be a follow-up fix to handle this correctly in the linker wrapper.
May be use parseTargetIDWithFormatCheckingOnly()?