Page MenuHomePhabricator

[HIPSPV][3/4] Enable SPIR-V emission for HIP
Needs ReviewPublic

Authored by linjamaki on Sep 28 2021, 5:39 AM.

Details

Summary

This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

  • ‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
  • getOffloadingDeviceToolChain() function (based on the design in the SYCL repository) is added to select HIPSPVToolChain when HIP offload target is ‘spirv64’.
  • The HIPActionBuilder is modified to produce LLVM IR at the backend phase. HIPSPV tool chain expects to receive HIP device code as LLVM IR so it can run external LLVM passes over them. HIPSPV TC is also responsible for emitting the SPIR-V binary.
  • A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.
  • Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Diff Detail

Event Timeline

linjamaki created this revision.Sep 28 2021, 5:39 AM

Style fixes.

Remove noisy change.

linjamaki published this revision for review.Sep 29 2021, 1:15 AM
linjamaki added reviewers: yaxunl, bader.
Herald added a project: Restricted Project. · View Herald Transcript
tra added a subscriber: tra.Sep 29 2021, 9:53 AM

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it`spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

tra added a comment.Sep 29 2021, 10:42 AM

--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.

clang/include/clang/Basic/Cuda.h
108–109

Does this need to be adjusted to exclude SPIR-V? If so, you may want to add another enum range for SPIR-V.

clang/include/clang/Driver/Options.td
1139

comma-separated list of offloading targets. is, unfortunately, somewhat ambiguous.
Does it mean "how the offload will be done". I.e. HSA, OpenMP, SPIRV, CUDA?
Or does it mean specific hardware we need to generate the code for?
The code suggests it's a variant of the former, but the option description does not.

E.g. offload_arch_EQ also uses the term "offloading target" but with a different meaning.

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it`spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

In this patch the ‘generic’ is meant to be a processor model defined in the SPIR-V backend. Now to come to think of it a bit more, I think it should not be specific to the SPIR-V target but the target at hand if its backend defines one. What I’m seeing is that each entry in the CudaArch has a processor by the same name in the NVPTX and AMGPU backends.

If I need to set different processor other from the SPIR-V backend than what is set as the default in HIP compilation, I thought from the [1] it could be carried out with something like:

--offload=spirv64 -Xoffload=spirv64 -march=other-spirv-cpu

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

I think that the --offload-arch interaction question is for @yaxunl. What is being contributed here is a partial implementation for the unified offloading options. The --offload option in this patch is used to supply the offload device target triple (in HIP compilation mode) for retargeting the device code emission to SPIR-V instead of emitting HSA.

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.

The use of getLastArg() is an oversight. I’ll fix it with getAllArgValues().

clang/include/clang/Basic/Cuda.h
108–109

Didn't notice this. I'll fix this.

clang/include/clang/Driver/Options.td
1139

I’m not sure how to rephrase the option description to be more clear. In the [1] the --offload option is envisioned to be quite flexible/expressive - it can take in target triples, offload kinds, processors, aliases for processor sets, etc.

FYI, I have imagined that the --offload option would take in explicit offload kind and target triple combinations as the basis. For example, something like this:

--offload=hip-amdgcn-amd-amdhsa,openmp-x86_64-pc-linux-gnu

And top of that, there would be predefined strings/shortcuts/aliases that expand to the basic form. For example,
--offload=sm_70,openmp-host could expand to something like:

--offload=cuda-nvptx64-nvidia-cuda,openmp-x86_64-pc-linux-gnu -Xoffload=cuda-nvptx64-nvidia-cuda -march=sm_70 ...

Then there is a feature as discussed in [1] that the offload kind can be dropped if it can be inferred by other means (e.g. from -x hip option).

linjamaki updated this revision to Diff 376829.Oct 4 2021, 2:32 AM

Repurpose 'Generic' CudaArch, Use getAllArgValues() for reading
--offload values and fix a enum range.

linjamaki updated this revision to Diff 382244.Oct 26 2021, 3:02 AM

Improve --offload option description.

yaxunl added inline comments.Wed, Nov 17, 1:18 PM
clang/include/clang/Basic/Cuda.h
109

can we use A < CudaArch::Generic instead? to avoid updating this line each time we add a new gfx arch.

clang/include/clang/Driver/Options.td
1139

The description better matches the current implementation.

By this patch, --offload= only supports specifying target triple for HIP and assumes default processor. The description should reflect that.

In the future, as --offload= supports more values, the description may be updated.

Also, --offload= is designed to be mutually exclusive with --offload-arch=. Probably we should check and diagnose that.

linjamaki updated this revision to Diff 388146.Thu, Nov 18, 3:09 AM
  • Adjust --offload description: reflect the current state.
  • Adjust enum range check in IsAMDGpuArch().
  • Make --offload and --offload-arch options mutually exclusive.
linjamaki marked 2 inline comments as done.Thu, Nov 18, 3:12 AM
linjamaki added inline comments.
clang/include/clang/Basic/Cuda.h
109

Changed as suggested.

clang/include/clang/Driver/Options.td
1139

Thanks for the feedback. The option description has been changed to reflect the current state.

LGTM. I will defer to @tra

linjamaki updated this revision to Diff 389114.Tue, Nov 23, 1:17 AM
linjamaki marked 2 inline comments as done.

Update a driver test case.