This is an archive of the discontinued LLVM Phabricator instance.

[HIPSPV][3/4] Enable SPIR-V emission for HIP
ClosedPublic

Authored by linjamaki on Sep 28 2021, 5:39 AM.

Details

Summary

This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

  • ‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
  • getOffloadingDeviceToolChain() function (based on the design in the SYCL repository) is added to select HIPSPVToolChain when HIP offload target is ‘spirv64’.
  • The HIPActionBuilder is modified to produce LLVM IR at the backend phase. HIPSPV tool chain expects to receive HIP device code as LLVM IR so it can run external LLVM passes over them. HIPSPV TC is also responsible for emitting the SPIR-V binary.
  • A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.
  • Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Diff Detail

Event Timeline

linjamaki created this revision.Sep 28 2021, 5:39 AM

Style fixes.

Remove noisy change.

linjamaki published this revision for review.Sep 29 2021, 1:15 AM
linjamaki added reviewers: yaxunl, bader.
Herald added a project: Restricted Project. · View Herald Transcript
tra added a subscriber: tra.Sep 29 2021, 9:53 AM

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it`spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

tra added a comment.Sep 29 2021, 10:42 AM

--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.

clang/include/clang/Basic/Cuda.h
108–109

Does this need to be adjusted to exclude SPIR-V? If so, you may want to add another enum range for SPIR-V.

clang/include/clang/Driver/Options.td
1139

comma-separated list of offloading targets. is, unfortunately, somewhat ambiguous.
Does it mean "how the offload will be done". I.e. HSA, OpenMP, SPIRV, CUDA?
Or does it mean specific hardware we need to generate the code for?
The code suggests it's a variant of the former, but the option description does not.

E.g. offload_arch_EQ also uses the term "offloading target" but with a different meaning.

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it`spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

In this patch the ‘generic’ is meant to be a processor model defined in the SPIR-V backend. Now to come to think of it a bit more, I think it should not be specific to the SPIR-V target but the target at hand if its backend defines one. What I’m seeing is that each entry in the CudaArch has a processor by the same name in the NVPTX and AMGPU backends.

If I need to set different processor other from the SPIR-V backend than what is set as the default in HIP compilation, I thought from the [1] it could be carried out with something like:

--offload=spirv64 -Xoffload=spirv64 -march=other-spirv-cpu

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

I think that the --offload-arch interaction question is for @yaxunl. What is being contributed here is a partial implementation for the unified offloading options. The --offload option in this patch is used to supply the offload device target triple (in HIP compilation mode) for retargeting the device code emission to SPIR-V instead of emitting HSA.

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.

The use of getLastArg() is an oversight. I’ll fix it with getAllArgValues().

clang/include/clang/Basic/Cuda.h
108–109

Didn't notice this. I'll fix this.

clang/include/clang/Driver/Options.td
1139

I’m not sure how to rephrase the option description to be more clear. In the [1] the --offload option is envisioned to be quite flexible/expressive - it can take in target triples, offload kinds, processors, aliases for processor sets, etc.

FYI, I have imagined that the --offload option would take in explicit offload kind and target triple combinations as the basis. For example, something like this:

--offload=hip-amdgcn-amd-amdhsa,openmp-x86_64-pc-linux-gnu

And top of that, there would be predefined strings/shortcuts/aliases that expand to the basic form. For example,
--offload=sm_70,openmp-host could expand to something like:

--offload=cuda-nvptx64-nvidia-cuda,openmp-x86_64-pc-linux-gnu -Xoffload=cuda-nvptx64-nvidia-cuda -march=sm_70 ...

Then there is a feature as discussed in [1] that the offload kind can be dropped if it can be inferred by other means (e.g. from -x hip option).

linjamaki updated this revision to Diff 376829.Oct 4 2021, 2:32 AM

Repurpose 'Generic' CudaArch, Use getAllArgValues() for reading
--offload values and fix a enum range.

linjamaki updated this revision to Diff 382244.Oct 26 2021, 3:02 AM

Improve --offload option description.

yaxunl added inline comments.Nov 17 2021, 1:18 PM
clang/include/clang/Basic/Cuda.h
109

can we use A < CudaArch::Generic instead? to avoid updating this line each time we add a new gfx arch.

clang/include/clang/Driver/Options.td
1139

The description better matches the current implementation.

By this patch, --offload= only supports specifying target triple for HIP and assumes default processor. The description should reflect that.

In the future, as --offload= supports more values, the description may be updated.

Also, --offload= is designed to be mutually exclusive with --offload-arch=. Probably we should check and diagnose that.

linjamaki updated this revision to Diff 388146.Nov 18 2021, 3:09 AM
  • Adjust --offload description: reflect the current state.
  • Adjust enum range check in IsAMDGpuArch().
  • Make --offload and --offload-arch options mutually exclusive.
linjamaki marked 2 inline comments as done.Nov 18 2021, 3:12 AM
linjamaki added inline comments.
clang/include/clang/Basic/Cuda.h
109

Changed as suggested.

clang/include/clang/Driver/Options.td
1139

Thanks for the feedback. The option description has been changed to reflect the current state.

LGTM. I will defer to @tra

linjamaki updated this revision to Diff 389114.Nov 23 2021, 1:17 AM
linjamaki marked 2 inline comments as done.

Update a driver test case.

tra accepted this revision.Dec 6 2021, 10:17 AM

Note to self: don't forget to hit "submit". The comments below have been left unsubmitted for two weeks. Sorry about that.

The patch looks OK for the time being. That said, I do have concerns that we may be organically growing something that will be troublesome to deal with long-term.

TBH, I still can't quite make sense of where/how SPIR-V fits in the offloading nomenclature.

Right now we have multiple levels of offloading-related control points.

  • offload targets, specified by --offload-arch. Determines the ISA of the GPU binary we produce.
  • offload mechanism: OpenMP, CUDA runtime, HSA. Determines how we compile/pack/launch the GPU binaries.
  • front-end: CUDA/HIP/ C/C++ w/ OpenMP.
  • Driver: Determines compilation pipeline to glue everything together,

SPIR-V in these patches appears to be wearing multiple hats.
It changes compilation pipeline, it changes offload mechanism and it changes offload targets. To further complicate things, it appears to be a derivative of the HIP compilation. I can't tell if it's an implementation detail at the moment, or whether it will become a more generic offload mechanism that would be expected to be used by other front- and back-ends. E.g. can we potentially compile CUDA code to target SPIR-V? Can OpenMP offload to SPIR-V?

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

This revision is now accepted and ready to land.Dec 6 2021, 10:17 AM
bader added a comment.Dec 7 2021, 3:31 AM

The patch looks OK for the time being. That said, I do have concerns that we may be organically growing something that will be troublesome to deal with long-term.

TBH, I still can't quite make sense of where/how SPIR-V fits in the offloading nomenclature.

Right now we have multiple levels of offloading-related control points.

  • offload targets, specified by --offload-arch. Determines the ISA of the GPU binary we produce.
  • offload mechanism: OpenMP, CUDA runtime, HSA. Determines how we compile/pack/launch the GPU binaries.
  • front-end: CUDA/HIP/ C/C++ w/ OpenMP.
  • Driver: Determines compilation pipeline to glue everything together,

SPIR-V in these patches appears to be wearing multiple hats.
It changes compilation pipeline, it changes offload mechanism and it changes offload targets.

From my POV, SPIR-V is "the ISA of GPU binary we produce" and we might need some changes at offloading-related control points:

  • offload mechanism: none of listed "offload mechanisms" (i.e. OpenMP, CUDA runtime, HSA) is able to handle SPIR-V natively. On the other hand, I'm not sure if there is a need in additional changes for all "offloading mechanisms". E.g. Intel's compiler implements OpenMP-offload to SPIR-V target using OpenMP runtime plug-in lowering OpenMP runtime calls to OpenCL/Level Zero. OpenCL and Level Zero runtimes are able to compile and launch SPIR-V binaries.
  • front-end: if we compare SPIR to other ISAs, they change compilation pipeline as well (e.g. add new built-ins to expose ISA, add CodeGen library changes to emit ISA specific metadata, etc.). AMDGPU ISA or NVIDIA GPU ISA changed front-end/compilation pipeline as well. Do you refer to some other non-ISA specific changes? BTW, shameless plug, the patch adding built-in functions and types for SPIR-V ISA is under review here - https://reviews.llvm.org/D108034.
  • Driver: front-end compiler doesn't support SPIR-V format yet (i.e. SPIR-V requires special encoding different from LLVM bitcode) , so Driver hooks up LLVM->SPIR-V translator tool to produce SPIR-V binary.

To further complicate things, it appears to be a derivative of the HIP compilation. I can't tell if it's an implementation detail at the moment, or whether it will become a more generic offload mechanism that would be expected to be used by other front- and back-ends. E.g. can we potentially compile CUDA code to target SPIR-V? Can OpenMP offload to SPIR-V?

Intel's compiler compiles OpenMP offload and SYCL to SPIR-V, so we definitely would like to target SPIR-V using other front-ends.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

yaxunl added a comment.Dec 7 2021, 9:31 AM

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.

tra added a comment.Dec 7 2021, 10:53 AM

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.

So, one would expect that we should be able to specify it more than once to target multiple GPU variants, if we were to use it as a more flexible --offload-arch.
If I read the tests correctly, using --offload= limits us to exactly one variant now. Perhaps it should eventually be relaxed to only enforce single --offload= variant if we're offloading to SPIR-V. It's not a showstopper for this patch. We can relax it later.

yaxunl added a comment.Dec 7 2021, 1:19 PM

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.

So, one would expect that we should be able to specify it more than once to target multiple GPU variants, if we were to use it as a more flexible --offload-arch.
If I read the tests correctly, using --offload= limits us to exactly one variant now. Perhaps it should eventually be relaxed to only enforce single --offload= variant if we're offloading to SPIR-V. It's not a showstopper for this patch. We can relax it later.

I don't think --offload= is restricted to be specified only once. The test checks --offload-arch= and --offload= are mutually exclusive.

tra added a comment.Dec 7 2021, 1:31 PM

I don't think --offload= is restricted to be specified only once. The test checks --offload-arch= and --offload= are mutually exclusive.

It effectively is. See my inline comment.

// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload=foo,bar \
// RUN:   --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
// RUN: 2>&1 | FileCheck --check-prefix=TOO-MANY-TARGETS %s

// TOO-MANY-TARGETS: error: Only one offload target is supported in HIP.
clang/lib/Driver/Driver.cpp
106–110

^^^

This will cause issues in practice, as we're only allowed to specify --offload once.

I.e. we're neither allowed to override it (this goes contrary to how clang options are handled conventionally), nor can we extend or modify the list of offload variants as we can with --offload-arch.

This code initially used getLastArg, but that does not work for an option that controls a list of values. Perhaps we should just make --offload= a scalar value so it, at least, behaves consistently with other clang options.

yaxunl added inline comments.Dec 7 2021, 2:08 PM
clang/lib/Driver/Driver.cpp
106–110

You are right. I overlooked this part.

If multiple --offload= options are specified, they are supposed to be unioned. Since currently --offload= only accepts amdgcn-amd-amdhsa and spirv64, and they are mutually exclusive. I think it is OK.

In the future, --offload= will accept GPU archs, then this part needs to allow multiple --offload= options and more sophisticated compatibility check between different options.

I agree letting --offload= accept scalar value seems a good choice.

linjamaki updated this revision to Diff 392675.Dec 8 2021, 1:09 AM
  • Add comments to clarify the limitation of the --offload option to one target.
  • Add test for multiple --offload option instances.

Thanks for the feedback. The --offload is meant to support multiple targets but right now it is restricted to one at most. The limitation comes from the HIPActionBuilder and CudaActionBuilderBase which currently expects a single target triple and toolchain for all offload devices. To relax the --offload target count cap we would need to adjust the action builders to support multiple target triples and create a separate toolchain for each (unique) target triple.

Details for the --offload option for specifying multiple targets are left open in this patch. What this patch needs is at least an ability to specify a single target (e.g. --offload=spirv64).

Assuming that this patch is ready to land. @tra or @yaxunl, could you please commit this patch to the LLVM for us? Thanks.

Assuming that this patch is ready to land. @tra or @yaxunl, could you please commit this patch to the LLVM for us? Thanks.

I can help commit this patch.

This revision was landed with ongoing or failed builds.Dec 20 2021, 8:01 AM
This revision was automatically updated to reflect the committed changes.