[HIP] Change to code object v4
ClosedPublic

Authored by yaxunl on Mar 23 2021, 9:01 PM.

Details

Summary

Change to code object v4 by default to match ROCm 4.1.

This upstreams the code object v4 support from the amd-stg-open branch.

Event Timeline

yaxunl requested review of this revision. Mar 23 2021, 9:01 PM
yaxunl created this revision.
tra added inline comments. Mar 24 2021, 9:30 AM
clang/lib/Driver/ToolChains/HIP.cpp
116

We do not do it for v2/v3. Could you elaborate on what makes v4 special that it needs its own offload kind?

Will you need to target different object versions simultaneously?
If yes, how? AFAICT, the version specified is currently global and applies to all sub-compilations.
If not, then do we really need to encode the version in the offload target name?

yaxunl marked an inline comment as done. Mar 24 2021, 10:34 AM
yaxunl added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
116

hipv4 is introduced to differentiate it from code object versions 2 and 3, which are used by HIP applications compiled by older versions of clang. The ROCm platform is required to keep binary backward compatibility, i.e., old HIP applications built by ROCm 4.0 should run on ROCm 4.1. The bundle ID is interpreted differently depending on whether it is version 2/3 or version 4, e.g. 'gfx906' implies xnack and sramecc off with code object v2/3 but implies xnack and sramecc ANY with v4. Since code object version 2/3 uses 'hip', code object version 4 needs a different name, therefore it uses 'hipv4'.
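
A minimal sketch of the distinction described in this comment, assuming a hypothetical helper `default_features` (it is not part of clang or the ROCm runtime): the same bare processor name in a bundle ID maps to different default feature settings depending on the offload kind.

```python
# Hypothetical illustration only; these names are not clang or ROCm APIs.

def default_features(offload_kind: str) -> dict:
    """Return how a bare processor name such as 'gfx906' is interpreted
    for the given offload kind, following the comment above."""
    if offload_kind == "hip":
        # Code object v2/v3: features are implied to be off.
        return {"xnack": "off", "sramecc": "off"}
    if offload_kind == "hipv4":
        # Code object v4: features are implied to be "any".
        return {"xnack": "any", "sramecc": "any"}
    raise ValueError(f"unknown offload kind: {offload_kind!r}")
```

Because the interpretation is keyed on the offload kind, a runtime that sees 'hip' can keep applying the v2/v3 reading to binaries built by older compilers.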

yaxunl marked an inline comment as done. Apr 6 2021, 2:24 PM

ping.

tra accepted this revision. Apr 6 2021, 3:11 PM
tra added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
115

Should it be an error if we pass -mcode-object-version=99?

This revision is now accepted and ready to land. Apr 6 2021, 3:11 PM
yaxunl marked an inline comment as done. Apr 6 2021, 4:22 PM
yaxunl added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
115

Yes, we diagnose that.
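
As a rough sketch of the behavior discussed here, assuming the only accepted values are the versions mentioned in this review (2, 3 and 4), a validation step could look like the following. The function names are hypothetical and this is not the clang driver's actual code.

```python
# Hypothetical sketch; not the clang driver implementation.
# Assumption: only the versions discussed in this review are accepted.
SUPPORTED_VERSIONS = (2, 3, 4)

def parse_code_object_version(value: str) -> int:
    """Parse the value of -mcode-object-version= and reject bad input."""
    try:
        version = int(value)
    except ValueError:
        raise ValueError(f"invalid code object version: {value!r}") from None
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported code object version: {version}")
    return version

def offload_kind(version: int) -> str:
    """v2/v3 keep the 'hip' kind; v4 uses 'hipv4', as explained earlier."""
    return "hipv4" if version >= 4 else "hip"
```

With this sketch, `parse_code_object_version("99")` raises an error instead of silently selecting an offload kind.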

tra added a comment. Apr 6 2021, 4:41 PM

Still LGTM.

clang/test/Driver/hip-code-object-version.hip
24–39

Nit: it would be nice to move V2 tests above the V3, so the tests are in order.

This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.
Herald added a project: Restricted Project. Apr 6 2021, 5:23 PM
yaxunl marked an inline comment as done. Apr 6 2021, 5:28 PM
yaxunl added inline comments.
clang/test/Driver/hip-code-object-version.hip
24–39

Sorry, I missed it. Will do.

gregrodgers added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
116

We need to start thinking in terms of offload requirements of a compiled image vs the capabilities of a particular active runtime on a particular GPU. This concept can eliminate the need for a new offload kind. For AMD, we would add the requirement of code object v4 (cov4) if built for code object v4 or greater. This means it can only run on a system with that capability. This concept works well with requirements xnack+, xnack-, sramecc+ and sramecc-. The bundle entry id is the offload-kind, the triple, and the list of image requirements. The gpu type (offload-arch) is really an image requirement.

In this model, there is no requirement for xnack-any. The lack of the xnack+ or xnack- requirement implies "any" which means it can run on any capable machine.

This is a general model that is extensible. To make this work, a runtime must be able to detect the capabilities for any requirement that could be tagged on an image. In fact, every requirement of an embedded image must have its capability detected by the runtime for that offload image to be usable. However, a system's runtime could have more capabilities than the requirements of an image. So in the case of xnack, the lack of xnack- or xnack+ will be acceptable no matter what the xnack capability of the runtime is. If the compiler driver puts the requirement cov4 in the requirements field of the bundle entry id, the runtime will not run that image unless the GPU loader supports v4 or greater.

The clang driver can create the requirement xnack- for code object < 4 on those GPUs that support either xnack mode. This will ensure the image will gracefully fail or use an alternative image if the runtime capability is xnack+.

But the cov4 requirement is mostly unrelated to xnack. It is about the capability of the GPU loader. If the code object version >= 4, then it will be tagged with the cov4 requirement. This would prevent an old system that does not have a newer software stack from running an image with a cov4 requirement.

This general notion of image requirements and runtime capabilities is extensible to other offload architectures. Suppose a CUDA version 12 compilation requires a CUDA version 12 runtime. Old runtimes would never advertise the cuv12 capability and would fail to run any image created with the requirement cuv12.
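
The requirements-versus-capabilities model proposed in this comment can be sketched as a subset check. This is only an illustration under stated assumptions: the function names, the image names and the representation of requirements as plain strings are hypothetical, not an existing runtime interface.

```python
# Hypothetical sketch of the proposed model; not an existing runtime API.

def is_runnable(image_requirements, runtime_capabilities) -> bool:
    """An image is usable only if every one of its requirements is a
    capability detected by the runtime. Extra capabilities are fine."""
    return set(image_requirements) <= set(runtime_capabilities)

def select_image(images, runtime_capabilities):
    """Return the name of the first usable image, or None if none fits.
    `images` is a list of (name, requirements) pairs."""
    for name, requirements in images:
        if is_runnable(requirements, runtime_capabilities):
            return name
    return None

# Two images bundled for the same GPU (names are made up).
images = [
    ("v3-image", {"gfx906", "xnack-"}),  # older code object, xnack must be off
    ("v4-image", {"gfx906", "cov4"}),    # no xnack requirement means "any"
]

new_runtime = {"gfx906", "xnack+", "cov4"}  # newer stack, xnack enabled
old_runtime = {"gfx906", "xnack-"}          # loader without cov4 support
```

Here `select_image(images, new_runtime)` picks the v4 image because the xnack- requirement of the v3 image is not met, while `select_image(images, old_runtime)` falls back to the v3 image because the old loader lacks the cov4 capability.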