Page MenuHomePhabricator

[clang][amdgpu] Use implicit code object default
AbandonedPublic

Authored by JonChesterfield on Mar 16 2021, 3:45 PM.

Details

Summary

[clang][amdgpu] Use implicit code object default

At present, clang always passes amdhsa-code-object-version on to -cc1. That is
great for certainty over what object version is being used when debugging.

Unfortunately, the command line argument is in AMDGPUBaseInfo.cpp in the amdgpu
target. If clang is used with an llvm compiled with DLLVM_TARGETS_TO_BUILD
that excludes amdgpu, this will be diagnosed (as discovered via D98658):

  • Unknown command line argument '--amdhsa-code-object-version=3'

This means that clang, built only for X86, can be used to compile the nvptx
devicertl for openmp but not the amdgpu one. That would shortly spawn fragile
logic in the devicertl cmake to try to guess whether the clang used will work.

This change omits the amdhsa-code-object-version parameter when it matches the
default that AMDGPUBaseInfo.cpp specifies, with a comment to indicate why. As
this is the only part of clang's codegen for amdgpu that depends on the target
in the back end it suffices to build the openmp runtime on most (all?) systems.

It is a non-functional change, though observable in the updated tests and when
compiling with -###. It may cause minor disruption to the amd-stg-open branch.

Diff Detail

Unit TestsFailed

TimeTest
140 msx64 debian > Clang.Driver::hip-autolink.hip
Script: -- : 'RUN: at line 5'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang --target=i386-pc-windows-msvc --cuda-gpu-arch=gfx906 -nogpulib --cuda-device-only /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-autolink.hip -### 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck --check-prefix=DEV /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-autolink.hip
130 msx64 debian > Clang.Driver::hip-code-object-version.hip
Script: -- : 'RUN: at line 5'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -target x86_64-linux-gnu -mcode-object-v3 --offload-arch=gfx906 -nogpulib /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-code-object-version.hip 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=V3,V3-WARN /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-code-object-version.hip
240 msx64 debian > Clang.Driver::hip-device-compile.hip
Script: -- : 'RUN: at line 8'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -c -emit-llvm --cuda-device-only -### -target x86_64-linux-gnu -o a.bc -x hip --cuda-gpu-arch=gfx900 --hip-device-lib=lib1.bc --hip-device-lib-path=/mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/lib1 /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=CHECK,BC /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-device-compile.hip
250 msx64 debian > Clang.Driver::hip-host-cpu-features.hip
Script: -- : 'RUN: at line 5'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -c -target x86_64-linux-gnu -march=znver2 --cuda-gpu-arch=gfx803 -nogpulib /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-host-cpu-features.hip 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-host-cpu-features.hip -check-prefix=HOSTCPU
340 msx64 debian > Clang.Driver::hip-rdc-device-only.hip
Script: -- : 'RUN: at line 5'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -### -target x86_64-linux-gnu -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 -c -nogpuinc -nogpulib --cuda-device-only -fgpu-rdc /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/a.cu /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/Inputs/hip_multiple_inputs/b.hip 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=COMMON,EMITBC /mnt/disks/ssd0/agent/llvm-project/clang/test/Driver/hip-rdc-device-only.hip
View Full Test Results (25 Failed)

Event Timeline

JonChesterfield requested review of this revision.Mar 16 2021, 3:45 PM
Herald added a project: Restricted Project. · View Herald Transcript
jdoerfert accepted this revision.Mar 16 2021, 3:59 PM

LG, it unbreaks the build and is not totally bananas.

This revision is now accepted and ready to land.Mar 16 2021, 3:59 PM

Thanks. I'm going to wait for some of the rocm people to pass judgement too as this code path is shared with hip / opencl etc.

t-tye added inline comments.Mar 16 2021, 4:55 PM
clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

This seem rather fragile. If the backend default changes and this code is not updated it will likely result in creating the wrong code object version.

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

That is true, though numbers 2 through 4 are used in getOrCheckAMDGPUCodeObjectVersion with 3 as the default there too.

Equally, today if we change the default in clang and forget to update the backend, we won't notice because we always pass this argument.

What we have at present is the back end has the state and the front end chooses the default. Needing to write '3' in both places is a symptom of that. Perhaps the back end should not have a default for it, requiring this to always be passed, at which point we are going to have a bad time building amdgpu openmp by default.

This change decouples the two for the (likely common) case where the user has not chosen to override the value. It's not ideal, but equally we don't change the version that often and there are lit tests that notice if we get it wrong. Plus stuff seems to break if the code object version is not as expected.

There may be a better solution. Target specific variable set by clang feels like something that will occur elsewhere.

jdoerfert added inline comments.Mar 16 2021, 5:14 PM
clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

Arguably, the backend should have a default, the option should be used by users that don't like the default.

t-tye added a comment.Mar 16 2021, 5:14 PM

I have no opinion, just making an observation and defer to @kzhuravl .

yaxunl added inline comments.Mar 16 2021, 6:04 PM
clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

I think a reasonable behavior of clang is to add -mllvm --amdhsa-code-object-version=x only if users use -mcode-object-version=x or equivalent explicitly. Otherwise clang should not add -mllvm --amdhsa-code-object-version=x and let the backend decides the default code object version.

@t-tye @kzhuravl what do you think? thanks.

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

That sounds right to me. It's slightly more complicated to implement than this, will need to rework getOrCheckAMDGPUCodeObjectVersion to distinguish between the get and check parts.

t-tye added a comment.Mar 16 2021, 7:42 PM

I vaguely remember that clang needed to know what code object it was going to request as it used that to either validate other options, or change the format of other passed cc1 options. If that is true, then I am not sure the defaulting approach works as clang will not know what the backend will be defaulting to. @yaxunl can you remember this?

yaxunl added inline comments.Mar 18 2021, 1:15 PM
clang/lib/Driver/ToolChains/Clang.cpp
1115–1124

well as Tony pointed out, clang itself needs to know the default code object version, since it need it to emit the correct bundle entry ID.

I think the issue is that clang is used to compile HIP to bitcode for amdgpu when amdgpu is not a registered target. Although there is a slight concern that whether this will always result in valid bitcode for amdgpu, this may be OK since the backend should be able to consume bitcode generated with -O0.

If we are confident the LLVM option --amdhsa-code-object-version=x is only needed for emitting ISA and can be omitted for emitting bitcode, a fix may be to check output type and skip emitting this LLVM option if the output is bitcode.

Great context, thanks guys. I had missed that part of the compiler.

We presently have a dependency edge from clang to the amdgcn target in llvm. The drawback I noticed here is it stops clang unconditionally building a runtime library for amdgcn. This patch, or a derivative of that, would hide that by making the dependency implicit. Unblocks openmp, which is important, but somewhat fragile.

Aside from that workaround, there are two subproblems here:

  • Reporting deprecated argument and propagating the clang argument through to the backend, so that it doesn't have to be prefixed -mllvm by the user
  • Clang emitting different IR based on different values for the code object format

The first part can be neatly solved as @yaxunl suggested above. Rewrite deprecated to new format, only pass either along if one was specified. That'll unblock openmp and break nothing. I'll put that change together.

The second is more troublesome. If clang emits different IR for different code object formats, then a device library built at clang build time (e.g. the openmp runtime) will need to be compiled N times for N different code object formats. Unfortunately, that library is already compiled M times for different architectures, so we're looking at N*M copies of roughly the same library, and logic in the clang driver to pick out the right one. Plus N extra paths to test/debug etc.

It would be preferable from an ease of use and distribution perspective if clang emits target id independent code and we specialise it later, either in the backend (at which point the backend really is the only thing that specifies a default), or as a library that is chosen by the driver during 'linking', like the device libs oclc_isa_version_906.bc pattern.

That is, we should make the target id implementation in clang more complicated by emitting some hook which is filled in later instead of hardcoding the code object version choice upfront, as that makes other stuff in the toolchain simpler. It would also turn the current work in progress from fragile into correct.

Posted D101077 which is a refactor. Expect to shortly post a replacement for this which implements @yaxunl's suggestion above, i.e. pass the argument to llc whenever the user specified one to clang

Revision at D101095 implements the suggestion above. The argument is passed to llc whenever an argument was passed to clang. Front end tests / device runtime libraries that are happy with the default can then build without the AMDGPU target enabled.

JonChesterfield abandoned this revision.Apr 23 2021, 12:45 PM

Abandoning in favour of the more conservative D101095