This is an archive of the discontinued LLVM Phabricator instance.


Differential D110622

[HIPSPV][3/4] Enable SPIR-V emission for HIP
Closed · Public

Authored by linjamaki on Sep 28 2021, 5:39 AM.


Details

Reviewers

yaxunl
bader
jdoerfert
tra
rsmith

Commits

rGa6786cdd5757: [HIPSPV][3/4] Enable SPIR-V emission for HIP

Summary

This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

The ‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. It is used to override the default device target (amdgcn-amd-amdhsa) in HIP compilation so that device code is emitted as a SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

The getOffloadingDeviceToolChain() function (based on the design in the SYCL repository) is added to select HIPSPVToolChain when the HIP offload target is ‘spirv64’.
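The selection described above can be pictured with a small standalone sketch. It is not the driver code: `ToolChain`, `HIPSPVToolChainStub`, `DefaultHIPToolChainStub`, and `ToolChainCache` are hypothetical stand-ins, and the only behavior taken from this revision is that ‘spirv64’ selects the HIPSPV tool chain and that tool chains are created on demand and cached.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the driver's tool chain classes.
struct ToolChain {
  virtual ~ToolChain() = default;
  virtual std::string name() const = 0;
};
struct HIPSPVToolChainStub : ToolChain {
  std::string name() const override { return "hipspv"; }
};
struct DefaultHIPToolChainStub : ToolChain {
  std::string name() const override { return "default-hip"; }
};

// Creates tool chains on demand and caches them per target architecture.
class ToolChainCache {
  std::map<std::string, std::unique_ptr<ToolChain>> Cache;

public:
  const ToolChain &get(const std::string &Arch) {
    std::unique_ptr<ToolChain> &TC = Cache[Arch];
    if (!TC) {
      // Assumption for this sketch: only 'spirv64' needs special handling.
      if (Arch == "spirv64")
        TC = std::make_unique<HIPSPVToolChainStub>();
      else
        TC = std::make_unique<DefaultHIPToolChainStub>();
    }
    return *TC;
  }
};
```

Calling `get("spirv64")` twice on the same cache returns the same object, which mirrors the "cache for the life of the driver object" wording in the Driver.h doc comment.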

The HIPActionBuilder is modified to produce LLVM IR at the backend phase. The HIPSPV tool chain expects to receive HIP device code as LLVM IR so it can run external LLVM passes over it. The HIPSPV tool chain is also responsible for emitting the SPIR-V binary.

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted into the bundle entry ID as the target ID. The target ID is expected to always be present so that a component of the target triple is not mistaken for the target ID.
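A rough illustration of why the target ID should always be present: if a reader takes the last dash-separated component of an entry ID as the target ID, an entry without one yields a piece of the triple instead. The helpers `makeBundleEntryID` and `targetIDOf` below are hypothetical and simplified; they do not reproduce the actual offload bundler format or its triple normalization.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins offload kind, target triple, and target ID into
// an entry ID of the form <kind>-<triple>-<target-id>.
std::string makeBundleEntryID(const std::string &Kind,
                              const std::string &Triple,
                              const std::string &TargetID) {
  assert(!TargetID.empty() && "target ID is expected to always be present");
  return Kind + "-" + Triple + "-" + TargetID;
}

// Hypothetical reader: treats everything after the last '-' as the target ID.
std::string targetIDOf(const std::string &EntryID) {
  return EntryID.substr(EntryID.rfind('-') + 1);
}
```

With ‘generic’ appended, `targetIDOf` recovers `generic`; for an ID that stops at the triple, such as `hip-spirv64-unknown-unknown`, it would return the triple component `unknown`.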

Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

linjamaki created this revision. · Sep 28 2021, 5:39 AM

Herald added subscribers: Naghasan, dexonsmith, dang and 3 others. · Sep 28 2021, 5:39 AM

Harbormaster completed remote builds in B126082: Diff 375540. · Sep 28 2021, 5:40 AM

linjamaki added a parent revision: D110618: [HIPSPV][2/4] Add HIPSPV tool chain. · Sep 28 2021, 5:44 AM

Style fixes.

Harbormaster completed remote builds in B126249: Diff 375783. · Sep 28 2021, 10:49 PM

Remove noisy change.

Harbormaster completed remote builds in B126252: Diff 375786. · Sep 28 2021, 11:21 PM

linjamaki published this revision for review. · Sep 29 2021, 1:15 AM

linjamaki added reviewers: yaxunl, bader.

Herald added a reviewer: jdoerfert. · Sep 29 2021, 1:15 AM

Herald added a project: Restricted Project.

Herald added subscribers: cfe-commits, sstefan1.

linjamaki added a child revision: D110685: [HIPSPV][4/4] Add option to use llc to emit SPIR-V. · Sep 29 2021, 1:24 AM

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it `spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.
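The difference between the two behaviors discussed above can be sketched without any driver code. The functions below are hypothetical; they only model "last occurrence wins" versus "every occurrence contributes" for a comma-joined option that may appear several times on a command line.

```cpp
#include <sstream>
#include <string>
#include <vector>

using Values = std::vector<std::string>;

// Splits one comma-joined option value such as "gfx900,gfx908".
Values splitCommaList(const std::string &Joined) {
  Values Out;
  std::stringstream SS(Joined);
  std::string Item;
  while (std::getline(SS, Item, ','))
    if (!Item.empty())
      Out.push_back(Item);
  return Out;
}

// Last-occurrence-wins: only the final occurrence of the option is kept, so
// a user must repeat everything the build system already specified.
Values lastOccurrenceWins(const std::vector<std::string> &Occurrences) {
  return Occurrences.empty() ? Values{} : splitCommaList(Occurrences.back());
}

// Accumulating: every occurrence contributes, so a user can append a target.
Values accumulateAll(const std::vector<std::string> &Occurrences) {
  Values Out;
  for (const std::string &Occ : Occurrences)
    for (const std::string &V : splitCommaList(Occ))
      Out.push_back(V);
  return Out;
}
```

For the occurrences `gfx900,gfx908` (build default) followed by `gfx1030` (user addition), the first function keeps only `gfx1030`, while the second yields all three architectures.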

clang/include/clang/Basic/Cuda.h
108–109	Does this need to be adjusted to exclude SPIR-V? If so, you may want to add another enum range for SPIR-V.
clang/include/clang/Driver/Options.td
1139	`comma-separated list of offloading targets.` is, unfortunately, somewhat ambiguous. Does it mean "how the offload will be done". I.e. HSA, OpenMP, SPIRV, CUDA? Or does it mean specific hardware we need to generate the code for? The code suggests it's a variant of the former, but the option description does not. E.g. `offload_arch_EQ` also uses the term "offloading target" but with a different meaning.

In D110622#3030792, @tra wrote:

A Cuda GPU architecture ‘generic’ is added. The name is picked from the LLVM SPIR-V Backend. In the HIPSPV code path the architecture name is inserted to the bundle entry ID as target ID. Target ID is expected to be always present so a component in the target triple is not mistaken as target ID.

How generic is 'generic'? If I understand the statement above correctly, it should probably reflect that it's specific to spir-v.
If it's the only possible spir-v variant, then calling it `spir-v` might be more meaningful.
If we expect to see other spir-v variants in the future it would allow us to clearly differentiate between them later.
E.g. --offload=spirv-a,spirv-b. It would be rather odd if we had to use --offload=generic, spirv-b.

In this patch, ‘generic’ is meant to be a processor model defined in the SPIR-V backend. Now that I think about it a bit more, it should not be specific to the SPIR-V target; it should refer to the target at hand, provided its backend defines one. What I’m seeing is that each entry in CudaArch has a processor by the same name in the NVPTX and AMDGPU backends.

If I need to select a processor from the SPIR-V backend other than the one set as the default in HIP compilation, my understanding from [1] is that it could be done with something like:

--offload=spirv64 -Xoffload=spirv64 -march=other-spirv-cpu

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

In D110622#3031010, @tra wrote:

‘--offload’ option, which is envisioned in [1], is added for specifying offload targets. This option is used to override default device target (amdgcn-amd-amdhsa) for HIP compilation for emitting device code as SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

Can you elaborate on what exactly this option does and how it's intended to interact with the existing --offload-arch?

I think that the --offload-arch interaction question is for @yaxunl. What is being contributed here is a partial implementation for the unified offloading options. The --offload option in this patch is used to supply the offload device target triple (in HIP compilation mode) for retargeting the device code emission to SPIR-V instead of emitting HSA.

In general a list of values, combined with the getLastArg will potentially be an issue if/when more than one list value will be supported.
In a large build it's fairly common for the build infrastructure to set the default options and allowing users to extend/override them with *additional* options. getLastArg works great for scalar options, not so much for the lists.
If an option is a list, modifying it requires prior knowledge of preceding values and that may not always be easy.
E.g. a build configuration may be set to target gfx900 and gfx908. If I want to *add* an option to target gfx1030, I would need to dig out the options for the currently-enabled architectures and specify all of them again. It's doable once, manually, but it does not scale if this option is expected to be regularly tweaked by the end user, as is the case with --offload-arch. If --offload is expected to have similar use patterns, you may need to consider allowing it to be adjusted per-list-element.

The use of getLastArg() is an oversight. I’ll fix it with getAllArgValues().

clang/include/clang/Basic/Cuda.h
108–109	Didn't notice this. I'll fix this.
clang/include/clang/Driver/Options.td
1139	I’m not sure how to rephrase the option description to make it clearer. In [1], the `--offload` option is envisioned to be quite flexible and expressive: it can take target triples, offload kinds, processors, aliases for processor sets, etc. FYI, I have imagined that the `--offload` option would take explicit offload kind and target triple combinations as its basic form. For example, something like this:
--offload=hip-amdgcn-amd-amdhsa,openmp-x86_64-pc-linux-gnu
On top of that, there would be predefined strings/shortcuts/aliases that expand to the basic form. For example, `--offload=sm_70,openmp-host` could expand to something like:
--offload=cuda-nvptx64-nvidia-cuda,openmp-x86_64-pc-linux-gnu -Xoffload=cuda-nvptx64-nvidia-cuda -march=sm_70 ...
Then there is a feature, discussed in [1], that the offload kind can be dropped if it can be inferred by other means (e.g. from the `-x hip` option).

Repurpose 'Generic' CudaArch, use getAllArgValues() for reading
--offload values and fix an enum range.

Harbormaster completed remote builds in B126800: Diff 376829. · Oct 4 2021, 2:32 AM

linjamaki mentioned this in D109144: [SPIR-V] Add SPIR-V triple architecture and clang target info. · Oct 8 2021, 6:16 AM

Rebase.

Harbormaster completed remote builds in B130367: Diff 381855. · Oct 25 2021, 12:56 AM

Improve --offload option description.

linjamaki added reviewers: tra, rsmith. · Oct 26 2021, 3:07 AM

Harbormaster completed remote builds in B130649: Diff 382244. · Oct 26 2021, 3:57 AM

Rebase.

Herald added a subscriber: asavonic. · Nov 16 2021, 12:07 AM

Harbormaster completed remote builds in B134438: Diff 387509. · Nov 16 2021, 12:34 AM

Gentle ping.

yaxunl added inline comments. · Nov 17 2021, 1:18 PM

clang/include/clang/Basic/Cuda.h
109	Can we use `A < CudaArch::Generic` instead, to avoid updating this line each time we add a new gfx arch?
clang/include/clang/Driver/Options.td
1139	The description should match the current implementation. As of this patch, `--offload=` only supports specifying a target triple for HIP and assumes the default processor. The description should reflect that. In the future, as `--offload=` supports more values, the description may be updated. Also, `--offload=` is designed to be mutually exclusive with `--offload-arch=`. We should probably check for that and diagnose it.

Adjust --offload description: reflect the current state.
Adjust enum range check in IsAMDGpuArch().
Make --offload and --offload-arch options mutually exclusive.

linjamaki marked 2 inline comments as done. · Nov 18 2021, 3:12 AM

linjamaki added inline comments.

clang/include/clang/Basic/Cuda.h
109	Changed as suggested.
clang/include/clang/Driver/Options.td
1139	Thanks for the feedback. The option description has been changed to reflect the current state.
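The range check settled on for IsAMDGpuArch() relies on enumerator order: the AMD entries sit between GFX600 and the first non-AMD entry, so bounding the range by `Generic` keeps working when new gfx architectures are added before it. A reduced sketch, using a hypothetical `Arch` enum with only a few entries rather than the real CudaArch:

```cpp
// Hypothetical, heavily shortened stand-in for CudaArch. Order matters:
// NVIDIA entries first, then AMD entries, then Generic, then the sentinel.
enum class Arch { SM_20, SM_80, GFX600, GFX1035, Generic, LAST };

constexpr bool isNVIDIAArch(Arch A) {
  return A >= Arch::SM_20 && A < Arch::GFX600;
}

// Bounded by Generic rather than by the last gfx entry, so adding a new gfx
// enumerator before Generic does not require touching this function.
constexpr bool isAMDArch(Arch A) {
  return A >= Arch::GFX600 && A < Arch::Generic;
}

static_assert(isAMDArch(Arch::GFX1035), "gfx entries are AMD");
static_assert(!isAMDArch(Arch::Generic), "Generic is excluded from AMD");
```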

Harbormaster completed remote builds in B134862: Diff 388146. · Nov 18 2021, 3:44 AM

LGTM. I will defer to @tra

Update a driver test case.

Harbormaster completed remote builds in B135565: Diff 389114. · Nov 23 2021, 2:33 AM

@tra, gentle ping.

Note to self: don't forget to hit "submit". The comments below have been left unsubmitted for two weeks. Sorry about that.

The patch looks OK for the time being. That said, I do have concerns that we may be organically growing something that will be troublesome to deal with long-term.

TBH, I still can't quite make sense of where/how SPIR-V fits in the offloading nomenclature.

Right now we have multiple levels of offloading-related control points.

offload targets, specified by --offload-arch. Determines the ISA of the GPU binary we produce.
offload mechanism: OpenMP, CUDA runtime, HSA. Determines how we compile/pack/launch the GPU binaries.
front-end: CUDA/HIP/ C/C++ w/ OpenMP.
Driver: Determines compilation pipeline to glue everything together.

SPIR-V in these patches appears to be wearing multiple hats.
It changes compilation pipeline, it changes offload mechanism and it changes offload targets. To further complicate things, it appears to be a derivative of the HIP compilation. I can't tell if it's an implementation detail at the moment, or whether it will become a more generic offload mechanism that would be expected to be used by other front- and back-ends. E.g. can we potentially compile CUDA code to target SPIR-V? Can OpenMP offload to SPIR-V?

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

This revision is now accepted and ready to land. · Dec 6 2021, 10:17 AM

In D110622#3174113, @tra wrote:

The patch looks OK for the time being. That said, I do have concerns that we may be organically growing something that will be troublesome to deal with long-term.

TBH, I still can't quite make sense of where/how SPIR-V fits in the offloading nomenclature.

Right now we have multiple levels of offloading-related control points.

offload targets, specified by --offload-arch. Determines the ISA of the GPU binary we produce.

offload mechanism: OpenMP, CUDA runtime, HSA. Determines how we compile/pack/launch the GPU binaries.

front-end: CUDA/HIP/ C/C++ w/ OpenMP.

Driver: Determines compilation pipeline to glue everything together.

SPIR-V in these patches appears to be wearing multiple hats.
It changes compilation pipeline, it changes offload mechanism and it changes offload targets.

From my POV, SPIR-V is "the ISA of GPU binary we produce" and we might need some changes at offloading-related control points:

offload mechanism: none of the listed "offload mechanisms" (i.e. OpenMP, CUDA runtime, HSA) is able to handle SPIR-V natively. On the other hand, I'm not sure whether additional changes are needed for all "offloading mechanisms". E.g. Intel's compiler implements OpenMP offload to the SPIR-V target using an OpenMP runtime plug-in that lowers OpenMP runtime calls to OpenCL/Level Zero. OpenCL and Level Zero runtimes are able to compile and launch SPIR-V binaries.
front-end: if we compare SPIR to other ISAs, they change the compilation pipeline as well (e.g. add new built-ins to expose the ISA, add CodeGen library changes to emit ISA-specific metadata, etc.). AMDGPU ISA or NVIDIA GPU ISA changed the front-end/compilation pipeline as well. Are you referring to some other, non-ISA-specific changes? BTW, shameless plug, the patch adding built-in functions and types for SPIR-V ISA is under review here - https://reviews.llvm.org/D108034.
Driver: the front-end compiler doesn't support the SPIR-V format yet (i.e. SPIR-V requires a special encoding different from LLVM bitcode), so the Driver hooks up the LLVM->SPIR-V translator tool to produce the SPIR-V binary.

To further complicate things, it appears to be a derivative of the HIP compilation. I can't tell if it's an implementation detail at the moment, or whether it will become a more generic offload mechanism that would be expected to be used by other front- and back-ends. E.g. can we potentially compile CUDA code to target SPIR-V? Can OpenMP offload to SPIR-V?

Intel's compiler compiles OpenMP offload and SYCL to SPIR-V, so we definitely would like to target SPIR-V using other front-ends.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.

In D110622#3176804, @yaxunl wrote:

So, the question is -- what's the right way to specify something like this in a consistent manner?
--offload option proposed here does not seem to be a good fit. It was intended as a more flexible way to create a single -cc1 sub-compilation and we're doing quite a bit more here.

Does --offload-arch=spirv* fit better here? If I understand the goal of this patch correctly, it tries to provide controls for changing offload target for HIP application from default (AMDGCN) to SPIR-V.

--offload-arch= only accepts GPU arch which is translated to processor option (-mcpu= or -march=) in clang -cc1. spirv is a target triple which is not suitable for --offload-arch=.

--offload= is supposed to cover both target triple and processor with some flexibility. If only target triple is specified, it assumes default processor. If only processor is specified, it deduces target triple. It also allows both triple and processor. In this case, --offload=spirv translates to -triple spirv -mcpu=generic.

So, one would expect that we should be able to specify it more than once to target multiple GPU variants, if we were to use it as a more flexible --offload-arch.
If I read the tests correctly, using --offload= limits us to exactly one variant now. Perhaps it should eventually be relaxed to only enforce single --offload= variant if we're offloading to SPIR-V. It's not a showstopper for this patch. We can relax it later.

In D110622#3177111, @tra wrote:

So, one would expect that we should be able to specify it more than once to target multiple GPU variants, if we were to use it as a more flexible --offload-arch.
If I read the tests correctly, using --offload= limits us to exactly one variant now. Perhaps it should eventually be relaxed to only enforce single --offload= variant if we're offloading to SPIR-V. It's not a showstopper for this patch. We can relax it later.

I don't think --offload= is restricted to be specified only once. The test checks --offload-arch= and --offload= are mutually exclusive.

In D110622#3177490, @yaxunl wrote:

I don't think --offload= is restricted to be specified only once. The test checks --offload-arch= and --offload= are mutually exclusive.

It effectively is. See my inline comment.

// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload=foo,bar \
// RUN:   --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
// RUN: 2>&1 | FileCheck --check-prefix=TOO-MANY-TARGETS %s

// TOO-MANY-TARGETS: error: Only one offload target is supported in HIP.

clang/lib/Driver/Driver.cpp
106–110	^^^ This will cause issues in practice, as we're only allowed to specify --offload once. I.e. we're neither allowed to override it (this goes contrary to how clang options are handled conventionally), nor can we extend or modify the list of offload variants as we can with --offload-arch. This code initially used `getLastArg`, but that does not work for an option that controls a list of values. Perhaps we should just make `--offload=` a scalar value so it, at least, behaves consistently with other clang options.

yaxunl added inline comments. · Dec 7 2021, 2:08 PM

clang/lib/Driver/Driver.cpp
106–110	You are right. I overlooked this part. If multiple `--offload=` options are specified, they are supposed to be unioned. Since `--offload=` currently only accepts `amdgcn-amd-amdhsa` and `spirv64`, and they are mutually exclusive, I think it is OK. In the future, `--offload=` will accept GPU archs; then this part will need to allow multiple `--offload=` options and a more sophisticated compatibility check between different options. I agree that letting `--offload=` accept a scalar value seems a good choice.

Add comments to clarify the limitation of the --offload option to one target.

Add test for multiple --offload option instances.

Thanks for the feedback. The --offload option is meant to support multiple targets, but right now it is restricted to at most one. The limitation comes from the HIPActionBuilder and CudaActionBuilderBase, which currently expect a single target triple and toolchain for all offload devices. To relax the --offload target count cap we would need to adjust the action builders to support multiple target triples and create a separate toolchain for each (unique) target triple.

Details for the --offload option for specifying multiple targets are left open in this patch. What this patch needs is at least an ability to specify a single target (e.g. --offload=spirv64).
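The single-target restriction can be summarized in a standalone sketch. `resolveHIPOffloadTarget` is a hypothetical function, not the driver's getHIPOffloadTargetTriple(); the default triple, the two accepted values, and the diagnostic wording are taken from this review, while everything else is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

struct OffloadTargetResult {
  bool Ok;
  std::string Text; // target triple on success, diagnostic text on failure
};

// OffloadValues holds all values collected from every --offload= occurrence.
OffloadTargetResult
resolveHIPOffloadTarget(const std::vector<std::string> &OffloadValues) {
  // No --offload given: fall back to the default HIP device target.
  if (OffloadValues.empty())
    return {true, "amdgcn-amd-amdhsa"};
  // The action builders currently expect a single target triple.
  if (OffloadValues.size() > 1)
    return {false, "Only one offload target is supported in HIP."};
  const std::string &Target = OffloadValues.front();
  if (Target == "amdgcn-amd-amdhsa" || Target == "spirv64")
    return {true, Target};
  return {false, "Invalid or unsupported offload target: '" + Target + "'."};
}
```

Because the values from all occurrences are collected before the count check, `--offload=foo,bar` and two separate `--offload=` options are rejected the same way in this sketch.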

Harbormaster completed remote builds in B138099: Diff 392675. · Dec 8 2021, 1:45 AM

Rebase.

Harbormaster completed remote builds in B139609: Diff 394789. · Dec 16 2021, 1:53 AM

Assuming that this patch is ready to land, @tra or @yaxunl, could you please commit it to LLVM for us? Thanks.

In D110622#3199233, @linjamaki wrote:

Assuming that this patch is ready to land. @tra or @yaxunl, could you please commit this patch to the LLVM for us? Thanks.

I can help commit this patch.

This revision was landed with ongoing or failed builds. · Dec 20 2021, 8:01 AM

Closed by commit rGa6786cdd5757: [HIPSPV][3/4] Enable SPIR-V emission for HIP (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGa6786cdd5757: [HIPSPV][3/4] Enable SPIR-V emission for HIP.

Thanks, @yaxunl.

dcastagna mentioned this in D117137: [Driver] Add CUDA support for --offload param. · Jan 19 2022, 5:41 PM

jlebar mentioned this in rG6eb826567af0: [Driver] Add CUDA support for --offload param. · Jan 28 2022, 2:51 PM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

Cuda.h

5 lines

DiagnosticDriverKinds.td

5 lines

Driver/

Driver.h

15 lines

Options.td

7 lines

lib/

Basic/

Cuda.cpp

1 line

Targets/

NVPTX.h

2 lines

NVPTX.cpp

1 line

CodeGen/

CGOpenMPRuntimeGPU.cpp

1 line

Driver/

Driver.cpp

124 lines

test/

Driver/

Inputs/

hipspv-dev-lib/

a.bc

b.bc

hipspv-spirv64.bc

hipspv/

bin/

.hipVersion

2 lines

lib/

hip-device-lib/

hipspv-spirv64.bc

libLLVMHipSpvPasses.so

pass-plugin.so

hipspv-device-libs.hip

28 lines

hipspv-pass-plugin.hip

27 lines

hipspv-toolchain-rdc.hip

63 lines

hipspv-toolchain.hip

37 lines

invalid-offload-options.cpp

31 lines

Diff 395453

clang/include/clang/Basic/Cuda.h

[89 lines elided] enum class CudaArch {
   GFX1012,
   GFX1013,
   GFX1030,
   GFX1031,
   GFX1032,
   GFX1033,
   GFX1034,
   GFX1035,
+  Generic, // A processor model named 'generic' if the target backend defines a
+           // public one.
   LAST,
 };

 static inline bool IsNVIDIAGpuArch(CudaArch A) {
   return A >= CudaArch::SM_20 && A < CudaArch::GFX600;
 }

 static inline bool IsAMDGpuArch(CudaArch A) {
-  return A >= CudaArch::GFX600 && A < CudaArch::LAST;
+  // Generic processor model is for testing only.
+  return A >= CudaArch::GFX600 && A < CudaArch::Generic;
 }

 const char *CudaArchToString(CudaArch A);
 const char *CudaArchToVirtualArchString(CudaArch A);

 // The input should have the form "sm_20".
 CudaArch StringToCudaArch(llvm::StringRef S);

[21 lines elided]

clang/include/clang/Basic/DiagnosticDriverKinds.td

[615 lines elided] def err_cc1_round_trip_ok_then_fail : Error<
   "generated arguments parse failed in round-trip">;
 def err_cc1_round_trip_mismatch : Error<
   "generated arguments do not match in round-trip">;
 def err_cc1_unbounded_vscale_min : Error<
   "minimum vscale must be an unsigned integer greater than 0">;

 def err_drv_ssp_missing_offset_argument : Error<
   "'%0' is used without '-mstack-protector-guard-offset', and there is no default">;
+
+def err_drv_only_one_offload_target_supported_in : Error<
+  "Only one offload target is supported in %0.">;
+def err_drv_invalid_or_unsupported_offload_target : Error<
+  "Invalid or unsupported offload target: '%0'.">;
 }

clang/include/clang/Driver/Driver.h

[589 lines elided] private:
   ///
   /// Will cache ToolChains for the life of the driver object, and create them
   /// on-demand.
   const ToolChain &getToolChain(const llvm::opt::ArgList &Args,
                                 const llvm::Triple &Target) const;

   /// @}

+  /// Retrieves a ToolChain for a particular device \p Target triple
+  ///
+  /// \param[in] HostTC is the host ToolChain paired with the device
+  ///
+  /// \param[in] Action (e.g. OFK_Cuda/OFK_OpenMP/OFK_SYCL) is an Offloading
+  /// action that is optionally passed to a ToolChain (used by CUDA, to specify
+  /// if it's used in conjunction with OpenMP)
+  ///
+  /// Will cache ToolChains for the life of the driver object, and create them
+  /// on-demand.
+  const ToolChain &getOffloadingDeviceToolChain(
+      const llvm::opt::ArgList &Args, const llvm::Triple &Target,
+      const ToolChain &HostTC,
+      const Action::OffloadKind &TargetDeviceOffloadKind) const;
+
   /// Get bitmasks for which option flags to include and exclude based on
   /// the driver mode.
   std::pair<unsigned, unsigned> getIncludeExcludeOptionFlagMasks(bool IsClCompatMode) const;

   /// Helper used in BuildJobsForAction. Doesn't use the cache when building
   /// jobs specifically for the given action, but will use the cache when
   /// building jobs for the Action's inputs.
   InputInfo BuildJobsForActionNoCache(
[51 lines elided]

clang/include/clang/Driver/Options.td


Show First 20 Lines • Show All 1,129 Lines • ▼ Show 20 Lines	defm double_square_bracket_attributes : BoolFOption<"double-square-bracket-attributes",
PosFlag<SetTrue, [], "Enable">, NegFlag<SetFalse, [], "Disable">,		PosFlag<SetTrue, [], "Enable">, NegFlag<SetFalse, [], "Disable">,
BothFlags<[NoXarchOption, CC1Option], " '[[]]' attributes in all C and C++ language modes">>;		BothFlags<[NoXarchOption, CC1Option], " '[[]]' attributes in all C and C++ language modes">>;

defm autolink : BoolFOption<"autolink",		defm autolink : BoolFOption<"autolink",
CodeGenOpts<"Autolink">, DefaultTrue,		CodeGenOpts<"Autolink">, DefaultTrue,
NegFlag<SetFalse, [CC1Option], "Disable generation of linker directives for automatic library linking">,		NegFlag<SetFalse, [CC1Option], "Disable generation of linker directives for automatic library linking">,
PosFlag<SetTrue>>;		PosFlag<SetTrue>>;

		// In the future this option will be supported by other offloading
		// languages and accept other values such as CPU/GPU architectures,
		tra (Not Done): `comma-separated list of offloading targets.` is, unfortunately, somewhat ambiguous. Does it mean "how the offload will be done", i.e. HSA, OpenMP, SPIRV, CUDA? Or does it mean the specific hardware we need to generate the code for? The code suggests it's a variant of the former, but the option description does not. E.g. `offload_arch_EQ` also uses the term "offloading target" but with a different meaning.
		linjamaki (Author, Done): I'm not sure how to rephrase the option description to make it clearer. In [1], the `--offload` option is envisioned to be quite flexible/expressive: it can take in target triples, offload kinds, processors, aliases for processor sets, etc. FYI, I have imagined that the `--offload` option would take explicit offload kind and target triple combinations as the basis. For example, something like this: `--offload=hip-amdgcn-amd-amdhsa,openmp-x86_64-pc-linux-gnu`. On top of that, there would be predefined strings/shortcuts/aliases that expand to the basic form. For example, `--offload=sm_70,openmp-host` could expand to something like: `--offload=cuda-nvptx64-nvidia-cuda,openmp-x86_64-pc-linux-gnu -Xoffload=cuda-nvptx64-nvidia-cuda -march=sm_70 ...`. Then there is a feature, as discussed in [1], that the offload kind can be dropped if it can be inferred by other means (e.g. from the `-x hip` option).
		yaxunl (Done): The description should better match the current implementation. With this patch, `--offload=` only supports specifying a target triple for HIP and assumes the default processor. The description should reflect that. In the future, as `--offload=` supports more values, the description may be updated. Also, `--offload=` is designed to be mutually exclusive with `--offload-arch=`. We should probably check for and diagnose that.
		linjamaki (Author, Done): Thanks for the feedback. The option description has been changed to reflect the current state.
		// offload kinds and target aliases.
		def offload_EQ : CommaJoined<["--"], "offload=">, Flags<[NoXarchOption]>,
		HelpText<"Specify comma-separated list of offloading target triples"
		" (HIP only)">;

// C++ Coroutines TS		// C++ Coroutines TS
defm coroutines_ts : BoolFOption<"coroutines-ts",		defm coroutines_ts : BoolFOption<"coroutines-ts",
LangOpts<"Coroutines">, Default<cpp20.KeyPath>,		LangOpts<"Coroutines">, Default<cpp20.KeyPath>,
PosFlag<SetTrue, [CC1Option], "Enable support for the C++ Coroutines TS">,		PosFlag<SetTrue, [CC1Option], "Enable support for the C++ Coroutines TS">,
NegFlag<SetFalse>>;		NegFlag<SetFalse>>;

def fembed_bitcode_EQ : Joined<["-"], "fembed-bitcode=">,		def fembed_bitcode_EQ : Joined<["-"], "fembed-bitcode=">,
Group<f_Group>, Flags<[NoXarchOption, CC1Option, CC1AsOption]>, MetaVarName<"<option>">,		Group<f_Group>, Flags<[NoXarchOption, CC1Option, CC1AsOption]>, MetaVarName<"<option>">,
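The inline discussion above imagines a future `--offload=` value made of `<offload kind>-<target triple>` entries separated by commas. A minimal sketch of that parsing is shown here, assuming that form. `OffloadSpec`, `splitCommaList` and `parseOffloadSpec` are hypothetical names for illustration and are not part of clang; this patch itself accepts only a bare target triple.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical representation of one "--offload=" entry in the envisioned
// "<kind>-<triple>" form. None of these names exist in clang.
struct OffloadSpec {
  std::string Kind;   // e.g. "hip", "openmp", "cuda"
  std::string Triple; // e.g. "amdgcn-amd-amdhsa"
};

// Split a comma-separated option value into its non-empty entries.
std::vector<std::string> splitCommaList(const std::string &Value) {
  std::vector<std::string> Parts;
  std::string::size_type Begin = 0;
  while (Begin <= Value.size()) {
    auto End = Value.find(',', Begin);
    if (End == std::string::npos)
      End = Value.size();
    if (End > Begin)
      Parts.push_back(Value.substr(Begin, End - Begin));
    Begin = End + 1;
  }
  return Parts;
}

// Parse "<kind>-<triple>". The set of kinds is an assumption taken from the
// examples in the review comment.
std::optional<OffloadSpec> parseOffloadSpec(const std::string &Entry) {
  auto Dash = Entry.find('-');
  if (Dash == std::string::npos || Dash + 1 == Entry.size())
    return std::nullopt;
  std::string Kind = Entry.substr(0, Dash);
  if (Kind != "hip" && Kind != "cuda" && Kind != "openmp")
    return std::nullopt;
  return OffloadSpec{Kind, Entry.substr(Dash + 1)};
}
```

With this sketch, `hip-amdgcn-amd-amdhsa,openmp-x86_64-pc-linux-gnu` splits into two entries with kinds `hip` and `openmp`, while a bare `spirv64` is rejected because it carries no kind prefix. Inferring the kind from `-x hip` and expanding aliases, both mentioned in the comment, are left out.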

clang/lib/Basic/Cuda.cpp

Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	static const CudaArchToStringMap arch_names[] = {
GFX(1012), // gfx1012		GFX(1012), // gfx1012
GFX(1013), // gfx1013		GFX(1013), // gfx1013
GFX(1030), // gfx1030		GFX(1030), // gfx1030
GFX(1031), // gfx1031		GFX(1031), // gfx1031
GFX(1032), // gfx1032		GFX(1032), // gfx1032
GFX(1033), // gfx1033		GFX(1033), // gfx1033
GFX(1034), // gfx1034		GFX(1034), // gfx1034
GFX(1035), // gfx1035		GFX(1035), // gfx1035
		{CudaArch::Generic, "generic", ""},
// clang-format on		// clang-format on
};		};
#undef SM		#undef SM
#undef SM2		#undef SM2
#undef GFX		#undef GFX

const char *CudaArchToString(CudaArch A) {		const char *CudaArchToString(CudaArch A) {
auto result = std::find_if(		auto result = std::find_if(

clang/lib/Basic/Targets/NVPTX.h

Show First 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	public:
}		}

bool isValidCPUName(StringRef Name) const override {		bool isValidCPUName(StringRef Name) const override {
return StringToCudaArch(Name) != CudaArch::UNKNOWN;		return StringToCudaArch(Name) != CudaArch::UNKNOWN;
}		}

void fillValidCPUList(SmallVectorImpl<StringRef> &Values) const override {		void fillValidCPUList(SmallVectorImpl<StringRef> &Values) const override {
for (int i = static_cast<int>(CudaArch::SM_20);		for (int i = static_cast<int>(CudaArch::SM_20);
i < static_cast<int>(CudaArch::LAST); ++i)		i < static_cast<int>(CudaArch::Generic); ++i)
Values.emplace_back(CudaArchToString(static_cast<CudaArch>(i)));		Values.emplace_back(CudaArchToString(static_cast<CudaArch>(i)));
}		}

bool setCPU(const std::string &Name) override {		bool setCPU(const std::string &Name) override {
GPU = StringToCudaArch(Name);		GPU = StringToCudaArch(Name);
return GPU != CudaArch::UNKNOWN;		return GPU != CudaArch::UNKNOWN;
}		}


clang/lib/Basic/Targets/NVPTX.cpp

Show First 20 Lines • Show All 209 Lines • ▼ Show 20 Lines	std::string CUDAArchCode = [this] {
case CudaArch::GFX1012:		case CudaArch::GFX1012:
case CudaArch::GFX1013:		case CudaArch::GFX1013:
case CudaArch::GFX1030:		case CudaArch::GFX1030:
case CudaArch::GFX1031:		case CudaArch::GFX1031:
case CudaArch::GFX1032:		case CudaArch::GFX1032:
case CudaArch::GFX1033:		case CudaArch::GFX1033:
case CudaArch::GFX1034:		case CudaArch::GFX1034:
case CudaArch::GFX1035:		case CudaArch::GFX1035:
		case CudaArch::Generic:
case CudaArch::LAST:		case CudaArch::LAST:
break;		break;
case CudaArch::UNUSED:		case CudaArch::UNUSED:
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
assert(false && "No GPU arch when compiling CUDA device code.");		assert(false && "No GPU arch when compiling CUDA device code.");
return "";		return "";
case CudaArch::SM_20:		case CudaArch::SM_20:
return "200";		return "200";

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

Show First 20 Lines • Show All 3,897 Lines • ▼ Show 20 Lines	if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
case CudaArch::GFX1012:		case CudaArch::GFX1012:
case CudaArch::GFX1013:		case CudaArch::GFX1013:
case CudaArch::GFX1030:		case CudaArch::GFX1030:
case CudaArch::GFX1031:		case CudaArch::GFX1031:
case CudaArch::GFX1032:		case CudaArch::GFX1032:
case CudaArch::GFX1033:		case CudaArch::GFX1033:
case CudaArch::GFX1034:		case CudaArch::GFX1034:
case CudaArch::GFX1035:		case CudaArch::GFX1035:
		case CudaArch::Generic:
case CudaArch::UNUSED:		case CudaArch::UNUSED:
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
break;		break;
case CudaArch::LAST:		case CudaArch::LAST:
llvm_unreachable("Unexpected Cuda arch.");		llvm_unreachable("Unexpected Cuda arch.");
}		}
}		}
}		}

clang/lib/Driver/Driver.cpp

Show All 18 Lines
#include "ToolChains/CrossWindows.h"		#include "ToolChains/CrossWindows.h"
#include "ToolChains/Cuda.h"		#include "ToolChains/Cuda.h"
#include "ToolChains/Darwin.h"		#include "ToolChains/Darwin.h"
#include "ToolChains/DragonFly.h"		#include "ToolChains/DragonFly.h"
#include "ToolChains/FreeBSD.h"		#include "ToolChains/FreeBSD.h"
#include "ToolChains/Fuchsia.h"		#include "ToolChains/Fuchsia.h"
#include "ToolChains/Gnu.h"		#include "ToolChains/Gnu.h"
#include "ToolChains/HIPAMD.h"		#include "ToolChains/HIPAMD.h"
		#include "ToolChains/HIPSPV.h"
#include "ToolChains/Haiku.h"		#include "ToolChains/Haiku.h"
#include "ToolChains/Hexagon.h"		#include "ToolChains/Hexagon.h"
#include "ToolChains/Hurd.h"		#include "ToolChains/Hurd.h"
#include "ToolChains/Lanai.h"		#include "ToolChains/Lanai.h"
#include "ToolChains/Linux.h"		#include "ToolChains/Linux.h"
#include "ToolChains/MSP430.h"		#include "ToolChains/MSP430.h"
#include "ToolChains/MSVC.h"		#include "ToolChains/MSVC.h"
#include "ToolChains/MinGW.h"		#include "ToolChains/MinGW.h"
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
#if LLVM_ON_UNIX		#if LLVM_ON_UNIX
#include <unistd.h> // getpid		#include <unistd.h> // getpid
#endif		#endif

using namespace clang::driver;		using namespace clang::driver;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

static llvm::Triple getHIPOffloadTargetTriple() {		static llvm::Optional<llvm::Triple>
static const llvm::Triple T("amdgcn-amd-amdhsa");		getHIPOffloadTargetTriple(const Driver &D, const ArgList &Args) {
		if (Args.hasArg(options::OPT_offload_EQ)) {
		auto HIPOffloadTargets = Args.getAllArgValues(options::OPT_offload_EQ);

		// HIP compilation flow does not support multiple targets for now. We need
		// the HIPActionBuilder (and possibly the CudaActionBuilder{,Base} too) to
		// support multiple tool chains first.
		tra (Not Done): ^^^ This will cause issues in practice, as we're only allowed to specify --offload once. I.e. we're neither allowed to override it (this goes contrary to how clang options are handled conventionally), nor can we extend or modify the list of offload variants as we can with --offload-arch. This code initially used `getLastArg`, but that does not work for an option that controls a list of values. Perhaps we should just make `--offload=` a scalar value so it, at least, behaves consistently with other clang options.
		yaxunl (Not Done): You are right. I overlooked this part. If multiple `--offload=` options are specified, they are supposed to be unioned. Since `--offload=` currently only accepts `amdgcn-amd-amdhsa` and `spirv64`, and they are mutually exclusive, I think it is OK. In the future, `--offload=` will accept GPU archs; then this part will need to allow multiple `--offload=` options and a more sophisticated compatibility check between different options. I agree that letting `--offload=` accept a scalar value seems a good choice.
		switch (HIPOffloadTargets.size()) {
		default:
		D.Diag(diag::err_drv_only_one_offload_target_supported_in) << "HIP";
		return llvm::None;
		case 0:
		D.Diag(diag::err_drv_invalid_or_unsupported_offload_target) << "";
		return llvm::None;
		case 1:
		break;
		}
		llvm::Triple TT(HIPOffloadTargets[0]);
		if (TT.getArch() == llvm::Triple::amdgcn &&
		TT.getVendor() == llvm::Triple::AMD &&
		TT.getOS() == llvm::Triple::AMDHSA)
		return TT;
		if (TT.getArch() == llvm::Triple::spirv64 &&
		TT.getVendor() == llvm::Triple::UnknownVendor &&
		TT.getOS() == llvm::Triple::UnknownOS)
		return TT;
		D.Diag(diag::err_drv_invalid_or_unsupported_offload_target)
		<< HIPOffloadTargets[0];
		return llvm::None;
		}

		static const llvm::Triple T("amdgcn-amd-amdhsa"); // Default HIP triple.
return T;		return T;
}		}

// static		// static
std::string Driver::GetResourcesPath(StringRef BinaryPath,		std::string Driver::GetResourcesPath(StringRef BinaryPath,
StringRef CustomResourceDir) {		StringRef CustomResourceDir) {
// Since the resource directory is embedded in the module hash, it's important		// Since the resource directory is embedded in the module hash, it's important
// that all places that need it call this function, so that they get the		// that all places that need it call this function, so that they get the
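The checks in the new `getHIPOffloadTargetTriple()` can be sketched outside of clang as follows. This is a simplified illustration, not the patch's code: `SimpleTriple`, `parseSimpleTriple`, `OffloadError` and `selectHIPOffloadTriple` are hypothetical names, and the triple parser is a stand-in for `llvm::Triple` that leaves missing components empty instead of mapping them to "unknown" enumerators.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for llvm::Triple: only arch, vendor and OS are split
// out, and missing components stay empty.
struct SimpleTriple {
  std::string Arch, Vendor, OS;
};

SimpleTriple parseSimpleTriple(const std::string &Str) {
  SimpleTriple T;
  std::string *Fields[] = {&T.Arch, &T.Vendor, &T.OS};
  std::string::size_type Begin = 0;
  for (std::string *Field : Fields) {
    if (Begin > Str.size())
      break;
    auto End = Str.find('-', Begin);
    if (End == std::string::npos)
      End = Str.size();
    *Field = Str.substr(Begin, End - Begin);
    Begin = End + 1;
  }
  return T;
}

// Mirrors the two diagnostics the patch emits (names are illustrative).
enum class OffloadError { None, TooManyTargets, InvalidTarget };

struct OffloadResult {
  std::optional<std::string> Triple;
  OffloadError Error = OffloadError::None;
};

// OffloadValues is std::nullopt when --offload= was not given at all;
// otherwise it holds every value collected from all --offload= options.
OffloadResult selectHIPOffloadTriple(
    const std::optional<std::vector<std::string>> &OffloadValues) {
  if (!OffloadValues)
    return {std::string("amdgcn-amd-amdhsa"), OffloadError::None}; // default
  if (OffloadValues->size() > 1)
    return {std::nullopt, OffloadError::TooManyTargets};
  if (OffloadValues->empty())
    return {std::nullopt, OffloadError::InvalidTarget};

  const std::string &Value = OffloadValues->front();
  SimpleTriple T = parseSimpleTriple(Value);
  bool IsAMD = T.Arch == "amdgcn" && T.Vendor == "amd" && T.OS == "amdhsa";
  bool IsSPIRV = T.Arch == "spirv64" && T.Vendor.empty() && T.OS.empty();
  if (IsAMD || IsSPIRV)
    return {Value, OffloadError::None};
  return {std::nullopt, OffloadError::InvalidTarget};
}
```

The three outcomes correspond to the cases exercised by `invalid-offload-options.cpp`: an empty or unrecognized value is invalid, two values (whether from one option or two) are too many, and omitting the option falls back to the default HIP triple.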
▲ Show 20 Lines • Show All 577 Lines • ▼ Show 20 Lines	void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
} else if (IsHIP) {		} else if (IsHIP) {
if (auto *OMPTargetArg =		if (auto *OMPTargetArg =
C.getInputArgs().getLastArg(options::OPT_fopenmp_targets_EQ)) {		C.getInputArgs().getLastArg(options::OPT_fopenmp_targets_EQ)) {
Diag(clang::diag::err_drv_unsupported_opt_for_language_mode)		Diag(clang::diag::err_drv_unsupported_opt_for_language_mode)
<< OMPTargetArg->getSpelling() << "HIP";		<< OMPTargetArg->getSpelling() << "HIP";
return;		return;
}		}
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();		const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
const llvm::Triple &HostTriple = HostTC->getTriple();
auto OFK = Action::OFK_HIP;		auto OFK = Action::OFK_HIP;
llvm::Triple HIPTriple = getHIPOffloadTargetTriple();		auto HIPTriple = getHIPOffloadTargetTriple(*this, C.getInputArgs());
// Use the HIP and host triples as the key into the ToolChains map,		if (!HIPTriple)
// because the device toolchain we create depends on both.		return;
auto &HIPTC = ToolChains[HIPTriple.str() + "/" + HostTriple.str()];		auto HIPTC = &getOffloadingDeviceToolChain(C.getInputArgs(), HIPTriple,
if (!HIPTC) {		*HostTC, OFK);
HIPTC = std::make_unique<toolchains::HIPAMDToolChain>(		assert(HIPTC && "Could not create offloading device tool chain.");
this, HIPTriple, HostTC, C.getInputArgs());		C.addOffloadDeviceToolChain(HIPTC, OFK);
}
C.addOffloadDeviceToolChain(HIPTC.get(), OFK);
}		}

//		//
// OpenMP		// OpenMP
//		//
// We need to generate an OpenMP toolchain if the user specified targets with		// We need to generate an OpenMP toolchain if the user specified targets with
// the -fopenmp-targets option.		// the -fopenmp-targets option.
if (Arg *OpenMPTargets =		if (Arg *OpenMPTargets =
▲ Show 20 Lines • Show All 2,008 Lines • ▼ Show 20 Lines	bool initialize() override {
if (UseCUID == CUID_Invalid) {		if (UseCUID == CUID_Invalid) {
C.getDriver().Diag(diag::err_drv_invalid_value)		C.getDriver().Diag(diag::err_drv_invalid_value)
<< A->getAsString(Args) << UseCUIDStr;		<< A->getAsString(Args) << UseCUIDStr;
C.setContainsError();		C.setContainsError();
return true;		return true;
}		}
}		}

		// --offload and --offload-arch options are mutually exclusive.
		if (Args.hasArgNoClaim(options::OPT_offload_EQ) &&
		Args.hasArgNoClaim(options::OPT_offload_arch_EQ,
		options::OPT_no_offload_arch_EQ)) {
		C.getDriver().Diag(diag::err_opt_not_valid_with_opt) << "--offload-arch"
		<< "--offload";
		}

// Collect all cuda_gpu_arch parameters, removing duplicates.		// Collect all cuda_gpu_arch parameters, removing duplicates.
std::set<StringRef> GpuArchs;		std::set<StringRef> GpuArchs;
bool Error = false;		bool Error = false;
for (Arg *A : Args) {		for (Arg *A : Args) {
if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||		if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
A->getOption().matches(options::OPT_no_offload_arch_EQ)))		A->getOption().matches(options::OPT_no_offload_arch_EQ)))
continue;		continue;
A->claim();		A->claim();
Show All 26 Lines	bool initialize() override {

// Collect list of GPUs remaining in the set.		// Collect list of GPUs remaining in the set.
for (auto Arch : GpuArchs)		for (auto Arch : GpuArchs)
GpuArchList.push_back(Arch.data());		GpuArchList.push_back(Arch.data());

// Default to sm_20 which is the lowest common denominator for		// Default to sm_20 which is the lowest common denominator for
// supported GPUs. sm_20 code should work correctly, if		// supported GPUs. sm_20 code should work correctly, if
// suboptimally, on all newer GPUs.		// suboptimally, on all newer GPUs.
if (GpuArchList.empty())		if (GpuArchList.empty()) {
		if (ToolChains.front()->getTriple().isSPIRV())
		GpuArchList.push_back(CudaArch::Generic);
		else
GpuArchList.push_back(DefaultCudaArch);		GpuArchList.push_back(DefaultCudaArch);
		}

return Error;		return Error;
}		}
};		};

/// \brief CUDA action builder. It injects device code in the host backend		/// \brief CUDA action builder. It injects device code in the host backend
/// action.		/// action.
class CudaActionBuilder final : public CudaActionBuilderBase {		class CudaActionBuilder final : public CudaActionBuilderBase {
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	HIPActionBuilder(Compilation &C, DerivedArgList &Args,
BundleOutput = Args.hasFlag(options::OPT_gpu_bundle_output,		BundleOutput = Args.hasFlag(options::OPT_gpu_bundle_output,
options::OPT_no_gpu_bundle_output);		options::OPT_no_gpu_bundle_output);
}		}

bool canUseBundlerUnbundler() const override { return true; }		bool canUseBundlerUnbundler() const override { return true; }

StringRef getCanonicalOffloadArch(StringRef IdStr) override {		StringRef getCanonicalOffloadArch(StringRef IdStr) override {
llvm::StringMap<bool> Features;		llvm::StringMap<bool> Features;
auto ArchStr =		// getHIPOffloadTargetTriple() is known to return valid value as it has
parseTargetID(getHIPOffloadTargetTriple(), IdStr, &Features);		// been called successfully in the CreateOffloadingDeviceToolChains().
		auto ArchStr = parseTargetID(
		*getHIPOffloadTargetTriple(C.getDriver(), C.getInputArgs()), IdStr,
		&Features);
if (!ArchStr) {		if (!ArchStr) {
C.getDriver().Diag(clang::diag::err_drv_bad_target_id) << IdStr;		C.getDriver().Diag(clang::diag::err_drv_bad_target_id) << IdStr;
C.setContainsError();		C.setContainsError();
return StringRef();		return StringRef();
}		}
auto CanId = getCanonicalTargetID(ArchStr.getValue(), Features);		auto CanId = getCanonicalTargetID(ArchStr.getValue(), Features);
return Args.MakeArgStringRef(CanId);		return Args.MakeArgStringRef(CanId);
};		};
Show All 37 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
// Create a link action to link device IR with device library		// Create a link action to link device IR with device library
// and generate ISA.		// and generate ISA.
CudaDeviceActions[I] =		CudaDeviceActions[I] =
C.MakeAction<LinkJobAction>(AL, types::TY_Image);		C.MakeAction<LinkJobAction>(AL, types::TY_Image);
} else {		} else {
// When LTO is not enabled, we follow the conventional		// When LTO is not enabled, we follow the conventional
// compiler phases, including backend and assemble phases.		// compiler phases, including backend and assemble phases.
ActionList AL;		ActionList AL;
auto BackendAction = C.getDriver().ConstructPhaseAction(		Action *BackendAction = nullptr;
		if (ToolChains.front()->getTriple().isSPIRV()) {
		// Emit LLVM bitcode for SPIR-V targets. SPIR-V device tool chain
		// (HIPSPVToolChain) runs post-link LLVM IR passes.
		types::ID Output = Args.hasArg(options::OPT_S)
		? types::TY_LLVM_IR
		: types::TY_LLVM_BC;
		BackendAction =
		C.MakeAction<BackendJobAction>(CudaDeviceActions[I], Output);
		} else
		BackendAction = C.getDriver().ConstructPhaseAction(
C, Args, phases::Backend, CudaDeviceActions[I],		C, Args, phases::Backend, CudaDeviceActions[I],
AssociatedOffloadKind);		AssociatedOffloadKind);
auto AssembleAction = C.getDriver().ConstructPhaseAction(		auto AssembleAction = C.getDriver().ConstructPhaseAction(
C, Args, phases::Assemble, BackendAction,		C, Args, phases::Assemble, BackendAction,
AssociatedOffloadKind);		AssociatedOffloadKind);
AL.push_back(AssembleAction);		AL.push_back(AssembleAction);
// Create a link action to link device IR with device library		// Create a link action to link device IR with device library
// and generate ISA.		// and generate ISA.
CudaDeviceActions[I] =		CudaDeviceActions[I] =
C.MakeAction<LinkJobAction>(AL, types::TY_Image);		C.MakeAction<LinkJobAction>(AL, types::TY_Image);
▲ Show 20 Lines • Show All 2,442 Lines • ▼ Show 20 Lines	const ToolChain &Driver::getToolChain(const ArgList &Args,
// Intentionally omitted from the switch above: llvm::Triple::CUDA. CUDA		// Intentionally omitted from the switch above: llvm::Triple::CUDA. CUDA
// compiles always need two toolchains, the CUDA toolchain and the host		// compiles always need two toolchains, the CUDA toolchain and the host
// toolchain. So the only valid way to create a CUDA toolchain is via		// toolchain. So the only valid way to create a CUDA toolchain is via
// CreateOffloadingDeviceToolChains.		// CreateOffloadingDeviceToolChains.

return *TC;		return *TC;
}		}

		const ToolChain &Driver::getOffloadingDeviceToolChain(
		const ArgList &Args, const llvm::Triple &Target, const ToolChain &HostTC,
		const Action::OffloadKind &TargetDeviceOffloadKind) const {
		// Use device / host triples as the key into the ToolChains map because the
		// device ToolChain we create depends on both.
		auto &TC = ToolChains[Target.str() + "/" + HostTC.getTriple().str()];
		if (!TC) {
		// Categorized by offload kind > arch rather than OS > arch like
		// the normal getToolChain call, as it seems a reasonable way to categorize
		// things.
		switch (TargetDeviceOffloadKind) {
		case Action::OFK_HIP: {
		if (Target.getArch() == llvm::Triple::amdgcn &&
		Target.getVendor() == llvm::Triple::AMD &&
		Target.getOS() == llvm::Triple::AMDHSA)
		TC = std::make_unique<toolchains::HIPAMDToolChain>(*this, Target,
		HostTC, Args);
		else if (Target.getArch() == llvm::Triple::spirv64 &&
		Target.getVendor() == llvm::Triple::UnknownVendor &&
		Target.getOS() == llvm::Triple::UnknownOS)
		TC = std::make_unique<toolchains::HIPSPVToolChain>(*this, Target,
		HostTC, Args);
		break;
		}
		default:
		break;
		}
		}

		return *TC;
		}

bool Driver::ShouldUseClangCompiler(const JobAction &JA) const {		bool Driver::ShouldUseClangCompiler(const JobAction &JA) const {
// Say "no" if there is not exactly one input of a type clang understands.		// Say "no" if there is not exactly one input of a type clang understands.
if (JA.size() != 1 ||		if (JA.size() != 1 ||
!types::isAcceptedByClang((*JA.input_begin())->getType()))		!types::isAcceptedByClang((*JA.input_begin())->getType()))
return false;		return false;

// And say "no" if this is not a kind of action clang understands.		// And say "no" if this is not a kind of action clang understands.
if (!isa<PreprocessJobAction>(JA) && !isa<PrecompileJobAction>(JA) &&		if (!isa<PreprocessJobAction>(JA) && !isa<PrecompileJobAction>(JA) &&
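The caching scheme used by the new `Driver::getOffloadingDeviceToolChain()` (a map keyed by "device triple/host triple", filled lazily) can be illustrated with a small standalone sketch. All type names below are hypothetical stand-ins for the clang tool chain classes. Unlike the patch, which returns a reference, this sketch returns a pointer so that an unsupported device triple yields `nullptr`.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for clang's ToolChain hierarchy.
struct DeviceToolChain {
  virtual ~DeviceToolChain() = default;
  virtual std::string name() const = 0;
};
struct AMDToolChain : DeviceToolChain {
  std::string name() const override { return "HIPAMD"; }
};
struct SPVToolChain : DeviceToolChain {
  std::string name() const override { return "HIPSPV"; }
};

class ToolChainCache {
  // Device and host triples together form the key, because the device tool
  // chain depends on both.
  std::map<std::string, std::unique_ptr<DeviceToolChain>> ToolChains;

public:
  // Returns the cached tool chain for the pair, creating it on first use.
  // Returns nullptr when the device triple is not supported.
  DeviceToolChain *get(const std::string &DeviceTriple,
                       const std::string &HostTriple) {
    auto &TC = ToolChains[DeviceTriple + "/" + HostTriple];
    if (!TC) {
      if (DeviceTriple == "amdgcn-amd-amdhsa")
        TC = std::make_unique<AMDToolChain>();
      else if (DeviceTriple == "spirv64")
        TC = std::make_unique<SPVToolChain>();
    }
    return TC.get();
  }
};
```

Requesting the same device/host pair twice returns the same object, which is the property the driver relies on when several actions refer to one offload tool chain.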

clang/test/Driver/Inputs/hipspv-dev-lib/a/a.bc

This file was added.

This is an empty file.

clang/test/Driver/Inputs/hipspv-dev-lib/b/b.bc

This file was added.

This is an empty file.

clang/test/Driver/Inputs/hipspv-dev-lib/hipspv-spirv64.bc

This file was added.

This is an empty file.

clang/test/Driver/Inputs/hipspv/bin/.hipVersion

This file was added.

				HIP_VERSION_MAJOR=3
				HIP_VERSION_MINOR=6

clang/test/Driver/Inputs/hipspv/lib/hip-device-lib/hipspv-spirv64.bc

This file was added.

This is an empty file.

clang/test/Driver/Inputs/hipspv/lib/libLLVMHipSpvPasses.so

This file was added.

This is an empty file.

clang/test/Driver/Inputs/pass-plugin.so

This file was added.

This is an empty file.

clang/test/Driver/hipspv-device-libs.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// UNSUPPORTED: system-windows

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: --hip-path=%S/Inputs/hipspv %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=ALL,HIP-PATH %s

				// Test --hip-device-lib-path
				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: --hip-path=%S/Inputs/hipspv \
				// RUN: --hip-device-lib-path=%S/Inputs/hipspv-dev-lib %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=ALL,HIP-DEV-LIB-PATH %s

				// Test --hip-device-lib w/ --hip-device-lib-path and HIP_DEVICE_LIB_PATH.
				// RUN: env HIP_DEVICE_LIB_PATH=%S/Inputs/hipspv-dev-lib/a \
				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: --hip-path=%S/Inputs/hipspv \
				// RUN: --hip-device-lib-path=%S/Inputs/hipspv-dev-lib/b \
				// RUN: --hip-device-lib=a.bc --hip-device-lib=b.bc %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=ALL,HIP-DEV-LIB %s

				// ALL: {{"[^"]*clang[^"]*"}}
				// HIP-PATH: "-mlink-builtin-bitcode" {{".*/hipspv/lib/hip-device-lib/hipspv-spirv64.bc"}}
				// HIP-DEV-LIB-PATH-NOT: "-mlink-builtin-bitcode" {{".*/hipspv/lib/hip-device-lib/hipspv-spirv64.bc"}}
				// HIP-DEV-LIB-PATH: "-mlink-builtin-bitcode" {{".*/hipspv-dev-lib/hipspv-spirv64.bc"}}
				// HIP-DEV-LIB: "-mlink-builtin-bitcode" {{".*/hipspv-dev-lib/a/a.bc"}}
				// HIP-DEV-LIB-SAME: "-mlink-builtin-bitcode" {{".*/hipspv-dev-lib/b/b.bc"}}

clang/test/Driver/hipspv-pass-plugin.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// UNSUPPORTED: system-windows

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: --hip-path=%S/Inputs/hipspv -nogpuinc %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=FROM-HIP-PATH %s

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: -nogpuinc -nogpulib --hipspv-pass-plugin=%S/Inputs/pass-plugin.so %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=FROM-OPTION %s

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: -nogpuinc -nogpulib --hipspv-pass-plugin=foo.so %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=FROM-OPTION-INVALID %s

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: -nogpuinc -nogpulib %s \
				// RUN: 2>&1 | FileCheck --check-prefixes=NO-PLUGIN %s

				// FROM-HIP-PATH: {{".*opt"}} {{".*.bc"}} "-load-pass-plugin"
				// FROM-HIP-PATH-SAME: {{".*/Inputs/hipspv/lib/libLLVMHipSpvPasses.so"}}
				// FROM-OPTION: {{".*opt"}} {{".*.bc"}} "-load-pass-plugin"
				// FROM-OPTION-SAME: {{".*/Inputs/pass-plugin.so"}}
				// FROM-OPTION-INVALID: error: no such file or directory: 'foo.so'
				// NO-PLUGIN-NOT: {{".*opt"}} {{".*.bc"}} "-load-pass-plugin"
				// NO-PLUGIN-NOT: {{".*/Inputs/hipspv/lib/libLLVMHipSpvPasses.so"}}

clang/test/Driver/hipspv-toolchain-rdc.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// UNSUPPORTED: system-windows

				// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: -fgpu-rdc --hip-path=%S/Inputs/hipspv -nohipwrapperinc \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck %s

				// Emit objects for host side path
				// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-aux-triple" "spirv64"
				// CHECK-SAME: "-emit-obj"
				// CHECK-SAME: "-fgpu-rdc"
				// CHECK-SAME: {{.*}} "-o" [[A_OBJ_HOST:".*o"]] "-x" "hip"
				// CHECK-SAME: {{.*}} [[A_SRC:".*a.cu"]]

				// CHECK: [[CLANG]] "-cc1" "-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-aux-triple" "spirv64"
				// CHECK-SAME: "-emit-obj"
				// CHECK-SAME: "-fgpu-rdc"
				// CHECK-SAME: {{.*}} "-o" [[B_OBJ_HOST:".*o"]] "-x" "hip"
				// CHECK-SAME: {{.*}} [[B_SRC:".*b.hip"]]

				// Emit code (LLVM BC) for device side path.
				// CHECK: [[CLANG]] "-cc1" "-triple" "spirv64"
				// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-emit-llvm-bc"
				// CHECK-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions"
				// CHECK-SAME: "-fvisibility" "hidden" "-fapply-global-visibility-to-externs"
				// CHECK-SAME: "-fgpu-rdc"
				// CHECK-SAME: {{.*}} "-o" [[A_BC1:".*bc"]] "-x" "hip"
				// CHECK-SAME: {{.*}} [[A_SRC]]

				// CHECK: [[CLANG]] "-cc1" "-triple" "spirv64"
				// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-emit-llvm-bc"
				// CHECK-SAME: "-fcuda-is-device" "-fcuda-allow-variadic-functions"
				// CHECK-SAME: "-fvisibility" "hidden" "-fapply-global-visibility-to-externs"
				// CHECK-SAME: "-fgpu-rdc"
				// CHECK-SAME: {{.*}} "-o" [[B_BC1:".*bc"]] "-x" "hip"
				// CHECK-SAME: {{.*}} [[B_SRC]]

				// Link device code, lower it with HIPSPV passes and emit SPIR-V binary.
				// CHECK: {{".*llvm-link.*"}} [[A_BC1]] [[B_BC1]] "-o" [[AB_LINK:".*bc"]]
				// CHECK: {{".*opt.*"}} [[AB_LINK]] "-load-pass-plugin"
				// CHECK-SAME: "{{.*}}/Inputs/hipspv/lib/libLLVMHipSpvPasses.so"
				// CHECK-SAME: "-o" [[AB_LOWER:".*bc"]]
				// CHECK: {{".*llvm-spirv"}} "--spirv-max-version=1.1" "--spirv-ext=+all"
				// CHECK-SAME: [[AB_LOWER]] "-o" "[[AB_SPIRV:.*out]]"

				// Construct fat binary object.
				// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o" "-bundle-align=4096"
				// CHECK-SAME: "-targets={{.*}},hip-spirv64----generic"
				// CHECK-SAME: "-inputs=/dev/null,[[AB_SPIRV]]"
				// CHECK-SAME: "-outputs=[[AB_FATBIN:.*hipfb]]"
				// CHECK: {{".*llvm-mc.*"}} "-o" [[OBJBUNDLE:".*o"]] "{{.*}}.mcin"
				// CHECK-SAME: "--filetype=obj"

				// Output the executable
				// CHECK: {{".*ld.*"}} {{.*}}"-o" "a.out" {{.*}} [[A_OBJ_HOST]] [[B_OBJ_HOST]]
				// CHECK-SAME: [[OBJBUNDLE]]

clang/test/Driver/hipspv-toolchain.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// UNSUPPORTED: system-windows

				// RUN: %clang -### -target x86_64-linux-gnu --offload=spirv64 \
				// RUN: --hip-path=%S/Inputs/hipspv -nohipwrapperinc %s \
				// RUN: 2>&1 | FileCheck %s

				// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "spirv64"
				// CHECK-SAME: "-aux-triple" "{{.*}}" "-emit-llvm-bc"
				// CHECK-SAME: "-fcuda-is-device"
				// CHECK-SAME: "-fcuda-allow-variadic-functions"
				// CHECK-SAME: "-mlink-builtin-bitcode" {{".*/hipspv/lib/hip-device-lib/hipspv-spirv64.bc"}}
				// CHECK-SAME: "-isystem" {{".*/hipspv/include"}}
				// CHECK-SAME: "-fhip-new-launch-api"
				// CHECK-SAME: "-o" [[DEV_BC:".*bc"]]
				// CHECK-SAME: "-x" "hip"

				// CHECK: {{".*llvm-link"}} [[DEV_BC]] "-o" [[LINK_BC:".*bc"]]

				// CHECK: {{".*opt"}} [[LINK_BC]] "-load-pass-plugin"
				// CHECK-SAME: {{".*/hipspv/lib/libLLVMHipSpvPasses.so"}}
				// CHECK-SAME: "-passes=hip-post-link-passes" "-o" [[LOWER_BC:".*bc"]]

				// CHECK: {{".*llvm-spirv"}} "--spirv-max-version=1.1" "--spirv-ext=+all"
				// CHECK-SAME: [[LOWER_BC]] "-o" "[[SPIRV_OUT:.*out]]"

				// CHECK: {{".*clang-offload-bundler"}} "-type=o" "-bundle-align=4096"
				// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-spirv64----generic"
				// CHECK-SAME: "-inputs={{.*}},[[SPIRV_OUT]]" "-outputs=[[BUNDLE:.*hipfb]]"

				// CHECK: [[CLANG]] "-cc1" "-triple" {{".*"}} "-aux-triple" "spirv64"
				// CHECK-SAME: "-emit-obj"
				// CHECK-SAME: "-fcuda-include-gpubinary" "[[BUNDLE]]"
				// CHECK-SAME: "-o" [[OBJ_HOST:".*o"]] "-x" "hip"

				// CHECK: {{".*ld.*"}} {{.*}}[[OBJ_HOST]]

clang/test/Driver/invalid-offload-options.cpp

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// UNSUPPORTED: system-windows

				// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload= \
				// RUN: --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
				// RUN: 2>&1 | FileCheck --check-prefix=INVALID-TARGET %s
				// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload=foo \
				// RUN: --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
				// RUN: 2>&1 | FileCheck --check-prefix=INVALID-TARGET %s

				// INVALID-TARGET: error: Invalid or unsupported offload target: '{{.*}}'

				// In the future we should be able to specify multiple targets for HIP
				// compilation but currently it is not supported.
				//
				// RUN: %clang -### -x hip -target x86_64-linux-gnu --offload=foo,bar \
				// RUN: --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
				// RUN: 2>&1 | FileCheck --check-prefix=TOO-MANY-TARGETS %s
				// RUN: %clang -### -x hip -target x86_64-linux-gnu \
				// RUN: --offload=foo --offload=bar \
				// RUN: --hip-path=%S/Inputs/hipspv -nogpuinc -nogpulib %s \
				// RUN: 2>&1 | FileCheck --check-prefix=TOO-MANY-TARGETS %s

				// TOO-MANY-TARGETS: error: Only one offload target is supported in HIP.

				// RUN: %clang -### -x hip -target x86_64-linux-gnu -nogpuinc -nogpulib \
				// RUN: --offload=amdgcn-amd-amdhsa --offload-arch=gfx900 %s \
				// RUN: 2>&1 | FileCheck --check-prefix=OFFLOAD-ARCH-MIX %s

				// OFFLOAD-ARCH-MIX: error: option '--offload-arch' cannot be specified with '--offload'

Revision Contents

Diff 395453
