This is an archive of the discontinued LLVM Phabricator instance.

[clang][amdgpu] Use implicit code object default
AbandonedPublic

Authored by JonChesterfield on Mar 16 2021, 3:45 PM.

Download Raw Diff

Details

Reviewers

yaxunl
t-tye
kzhuravl
ronlieb
b-sumner
jdoerfert

Summary

[clang][amdgpu] Use implicit code object default

At present, clang always passes amdhsa-code-object-version on to -cc1. That is
great for certainty over what object version is being used when debugging.

Unfortunately, the command line argument is in AMDGPUBaseInfo.cpp in the amdgpu
target. If clang is used with an llvm compiled with DLLVM_TARGETS_TO_BUILD
that excludes amdgpu, this will be diagnosed (as discovered via D98658):

Unknown command line argument '--amdhsa-code-object-version=3'

This means that clang, built only for X86, can be used to compile the nvptx
devicertl for openmp but not the amdgpu one. That would shortly spawn fragile
logic in the devicertl cmake to try to guess whether the clang used will work.

This change omits the amdhsa-code-object-version parameter when it matches the
default that AMDGPUBaseInfo.cpp specifies, with a comment to indicate why. As
this is the only part of clang's codegen for amdgpu that depends on the target
in the back end it suffices to build the openmp runtime on most (all?) systems.

It is a non-functional change, though observable in the updated tests and when
compiling with -###. It may cause minor disruption to the amd-stg-open branch.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	140 ms	x64 debian > Clang.Driver::hip-autolink.hip
	130 ms	x64 debian > Clang.Driver::hip-code-object-version.hip
	240 ms	x64 debian > Clang.Driver::hip-device-compile.hip
	250 ms	x64 debian > Clang.Driver::hip-host-cpu-features.hip
	340 ms	x64 debian > Clang.Driver::hip-rdc-device-only.hip
		View Full Test Results (25 Failed)

Event Timeline

JonChesterfield created this revision.Mar 16 2021, 3:45 PM

Herald added subscribers: kerbowa, pengfei, tpr and 3 others. · View Herald TranscriptMar 16 2021, 3:45 PM

JonChesterfield requested review of this revision.Mar 16 2021, 3:45 PM

Herald added a reviewer: jdoerfert. · View Herald TranscriptMar 16 2021, 3:45 PM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: cfe-commits, sstefan1, wdng. · View Herald Transcript

LG, it unbreaks the build and is not totally bananas.

This revision is now accepted and ready to land.Mar 16 2021, 3:59 PM

Thanks. I'm going to wait for some of the rocm people to pass judgement too as this code path is shared with hip / opencl etc.

Harbormaster completed remote builds in B94136: Diff 331119.Mar 16 2021, 4:23 PM

t-tye added inline comments.Mar 16 2021, 4:55 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	This seem rather fragile. If the backend default changes and this code is not updated it will likely result in creating the wrong code object version.

JonChesterfield added inline comments.Mar 16 2021, 5:09 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	That is true, though numbers 2 through 4 are used in getOrCheckAMDGPUCodeObjectVersion with 3 as the default there too. Equally, today if we change the default in clang and forget to update the backend, we won't notice because we always pass this argument. What we have at present is the back end has the state and the front end chooses the default. Needing to write '3' in both places is a symptom of that. Perhaps the back end should not have a default for it, requiring this to always be passed, at which point we are going to have a bad time building amdgpu openmp by default. This change decouples the two for the (likely common) case where the user has not chosen to override the value. It's not ideal, but equally we don't change the version that often and there are lit tests that notice if we get it wrong. Plus stuff seems to break if the code object version is not as expected. There may be a better solution. Target specific variable set by clang feels like something that will occur elsewhere.

jdoerfert added inline comments.Mar 16 2021, 5:14 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	Arguably, the backend should have a default, the option should be used by users that don't like the default.

I have no opinion, just making an observation and defer to @kzhuravl .

yaxunl added inline comments.Mar 16 2021, 6:04 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	I think a reasonable behavior of clang is to add -mllvm --amdhsa-code-object-version=x only if users use -mcode-object-version=x or equivalent explicitly. Otherwise clang should not add -mllvm --amdhsa-code-object-version=x and let the backend decides the default code object version. @t-tye @kzhuravl what do you think? thanks.

JonChesterfield added inline comments.Mar 16 2021, 7:21 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	That sounds right to me. It's slightly more complicated to implement than this, will need to rework getOrCheckAMDGPUCodeObjectVersion to distinguish between the get and check parts.

I vaguely remember that clang needed to know what code object it was going to request as it used that to either validate other options, or change the format of other passed cc1 options. If that is true, then I am not sure the defaulting approach works as clang will not know what the backend will be defaulting to. @yaxunl can you remember this?

JonChesterfield mentioned this in rGdbf8f2b089e8: Revert "[libomptarget] Build amdgcn devicertl by default".Mar 17 2021, 4:35 AM

JonChesterfield mentioned this in D98658: [libomptarget] Build amdgcn devicertl by default.Mar 17 2021, 4:36 AM

yaxunl added inline comments.Mar 18 2021, 1:15 PM

clang/lib/Driver/ToolChains/Clang.cpp
1115–1124	well as Tony pointed out, clang itself needs to know the default code object version, since it need it to emit the correct bundle entry ID. I think the issue is that clang is used to compile HIP to bitcode for amdgpu when amdgpu is not a registered target. Although there is a slight concern that whether this will always result in valid bitcode for amdgpu, this may be OK since the backend should be able to consume bitcode generated with -O0. If we are confident the LLVM option --amdhsa-code-object-version=x is only needed for emitting ISA and can be omitted for emitting bitcode, a fix may be to check output type and skip emitting this LLVM option if the output is bitcode.

Great context, thanks guys. I had missed that part of the compiler.

We presently have a dependency edge from clang to the amdgcn target in llvm. The drawback I noticed here is it stops clang unconditionally building a runtime library for amdgcn. This patch, or a derivative of that, would hide that by making the dependency implicit. Unblocks openmp, which is important, but somewhat fragile.

Aside from that workaround, there are two subproblems here:

Reporting deprecated argument and propagating the clang argument through to the backend, so that it doesn't have to be prefixed -mllvm by the user
Clang emitting different IR based on different values for the code object format

The first part can be neatly solved as @yaxunl suggested above. Rewrite deprecated to new format, only pass either along if one was specified. That'll unblock openmp and break nothing. I'll put that change together.

The second is more troublesome. If clang emits different IR for different code object formats, then a device library built at clang build time (e.g. the openmp runtime) will need to be compiled N times for N different code object formats. Unfortunately, that library is already compiled M times for different architectures, so we're looking at N*M copies of roughly the same library, and logic in the clang driver to pick out the right one. Plus N extra paths to test/debug etc.

It would be preferable from an ease of use and distribution perspective if clang emits target id independent code and we specialise it later, either in the backend (at which point the backend really is the only thing that specifies a default), or as a library that is chosen by the driver during 'linking', like the device libs oclc_isa_version_906.bc pattern.

That is, we should make the target id implementation in clang more complicated by emitting some hook which is filled in later instead of hardcoding the code object version choice upfront, as that makes other stuff in the toolchain simpler. It would also turn the current work in progress from fragile into correct.

JonChesterfield mentioned this in D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.Mar 31 2021, 7:08 AM

JonChesterfield mentioned this in D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed.Apr 16 2021, 4:02 AM

JonChesterfield mentioned this in D101077: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.Apr 22 2021, 9:00 AM

Posted D101077 which is a refactor. Expect to shortly post a replacement for this which implements @yaxunl's suggestion above, i.e. pass the argument to llc whenever the user specified one to clang

JonChesterfield mentioned this in D101095: [clang][amdgpu] Use implicit code object version.Apr 22 2021, 11:23 AM

Revision at D101095 implements the suggestion above. The argument is passed to llc whenever an argument was passed to clang. Front end tests / device runtime libraries that are happy with the default can then build without the AMDGPU target enabled.

JonChesterfield mentioned this in rG2cdb9873b210: [clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion.Apr 22 2021, 4:25 PM

Abandoning in favour of the more conservative D101095

JonChesterfield mentioned this in rGfc88d927e30d: [clang][amdgpu] Use implicit code object version.Apr 23 2021, 3:53 PM

Revision Contents

Path

Size

clang/

lib/

Driver/

ToolChains/

Clang.cpp

13 lines

test/

Driver/

amdgpu-features-as.s

2 lines

amdgpu-features.c