This is an archive of the discontinued LLVM Phabricator instance.

Differential D120272

[CUDA] Add driver support for compiling CUDA with the new driver
ClosedPublic

Authored by jhuber6 on Feb 21 2022, 11:46 AM.

Details

Reviewers

jdoerfert
JonChesterfield
tra
yaxunl

Commits

rGc5e5b54350fe: [CUDA] Add driver support for compiling CUDA with the new driver

Summary

This patch adds the basic support for the clang driver to compile and link CUDA
using the new offloading driver. This requires handling the CUDA offloading kind
and embedding the generated files into the host. This will allow us to link
OpenMP code with CUDA code in the linker wrapper. More support will be required
to create functional CUDA / HIP binaries using this method.

Depends on D120270 D120271 D120934

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Feb 21 2022, 11:46 AM

Herald added a subscriber: carlosgalvezp. Feb 21 2022, 11:46 AM

jhuber6 requested review of this revision. Feb 21 2022, 11:46 AM

Herald added a project: Restricted Project. Feb 21 2022, 11:46 AM

Herald added subscribers: cfe-commits, sstefan1.

jhuber6 added a child revision: D120273: [OpenMP] Allow CUDA to be linked with OpenMP using the new driver. Feb 21 2022, 11:47 AM

Harbormaster completed remote builds in B150737: Diff 410354. Feb 21 2022, 12:35 PM

Tests?

clang/lib/Driver/Driver.cpp
3956	Everything till here can be a NFC commit, right? Let's split it off
4108	Can we have a doxygen comment explaining what these helpers do?
4153	With the NFC commit we can probably also include some of this but restricted to OFK_OpenMP. Try to minimize the functional change that one has to think about.

Updating, embed fatbinaries now and small changes.

Herald added a project: Restricted Project. Mar 3 2022, 12:29 PM

Herald added a subscriber: dang.

jhuber6 added a parent revision: D120934: [OpenMP][NFC] Refactor new driver to be more general. Mar 3 2022, 12:30 PM

Harbormaster completed remote builds in B152446: Diff 412812. Mar 3 2022, 12:31 PM

jhuber6 edited the summary of this revision. Mar 3 2022, 12:38 PM

Adding comments

Harbormaster completed remote builds in B152449: Diff 412816. Mar 3 2022, 12:46 PM

Adding test

Harbormaster completed remote builds in B152452: Diff 412821. Mar 3 2022, 1:22 PM

yaxunl added inline comments. Mar 3 2022, 1:57 PM

clang/lib/Driver/Driver.cpp
4099–4102	The final set depends on the order of the `--offload-arch` and `--no-offload-arch` options, e.g. `--offload-arch=gfx906 --no-offload-arch=gfx906` and `--no-offload-arch=gfx906 --offload-arch=gfx906` are different. Also there is `--no-offload-arch=all`, which removes all preceding `--offload-arch=` options.
4107	should this be HIP?

jhuber6 added inline comments. Mar 3 2022, 1:59 PM

clang/lib/Driver/Driver.cpp
4099–4102	I see, so we need to iterate the arguments in-order and insert or erase them as they are entered, I'll fix it.
4107	Whoops, haven't tested it with HIP yet.

Correctly handle offloading architecture options.
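
The ordering rule discussed above can be sketched in standalone C++. This is only an illustration under stated assumptions: `ArchOpt`, `ArchArg`, and `resolveOffloadArchs` are hypothetical names, the arguments are modelled as plain pairs rather than clang's `llvm::opt::Arg`, and no canonicalization or diagnostics are attempted.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed driver argument: which option it was,
// plus the value after the '='.
enum class ArchOpt { Offload, NoOffload };
using ArchArg = std::pair<ArchOpt, std::string>;

// Walk the arguments in command-line order so that later options override
// earlier ones. "--no-offload-arch=all" clears everything seen so far.
std::set<std::string> resolveOffloadArchs(const std::vector<ArchArg> &Args) {
  std::set<std::string> Archs;
  for (const ArchArg &Arg : Args) {
    if (Arg.first == ArchOpt::Offload)
      Archs.insert(Arg.second);
    else if (Arg.second == "all")
      Archs.clear();
    else
      Archs.erase(Arg.second);
  }
  return Archs;
}
```

With this model, `--offload-arch=gfx906 --no-offload-arch=gfx906` yields an empty set, while the reversed order leaves `gfx906` selected.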

Harbormaster completed remote builds in B152517: Diff 412908. Mar 3 2022, 7:49 PM

jhuber6 updated this revision to Diff 412996. Mar 4 2022, 6:57 AM

Fix

Harbormaster completed remote builds in B152583: Diff 412996. Mar 4 2022, 6:58 AM

tra added inline comments. Mar 4 2022, 10:41 AM

clang/lib/Driver/Driver.cpp
4107	Nit: It would be nice to report the specific option which triggered the diag. Reporting `--offload-arch` when it's triggered by `--no-offload-arch` would be somewhat confusing. `Args.hasArgNoClaim(options::OPT_offload_arch_EQ) ? "--offload-arch" : "--no-offload-arch"` ?
4132–4134	If we do not have constants for the default CUDA/HIP arch yet, we should probably add them and use them here.
4171–4180	If we do not allow non-relocatable compilation with the new driver, perhaps we should make `-fgpu-rdc` enabled by default with the new driver and error out if someone attempts to use `-fno-gpu-rdc`. Otherwise we're virtually guaranteed that everyone attempting to use `-foffload-new-driver` will hit this error.
4172	Do we want to return early here?
4221–4222	Nit: We do have `llvm::zip` for iterating over multiple containers. However, unpacking the loop variable may be more trouble than it's worth in such a small loop. Up to you.

JonChesterfield added inline comments. Mar 4 2022, 11:16 AM

clang/lib/Driver/Driver.cpp
4132–4134	Defaulting hip to gfx803 is unlikely to be helpful. It won't run on other architectures, i.e. there's no conservative default that will run on most things. I guess that's an existing quirk of the hip toolchain?

tra added inline comments. Mar 4 2022, 11:33 AM

clang/lib/Driver/Driver.cpp
4132–4134	I agree that there's no "safe" choice of GPU target. It applies to CUDA, as well, at least for GPU binaries. That said, we still want `clang -c foo.cu` to work. For CUDA we use the oldest variant still supported by the vendor. It produces PTX assembly which we embed along with the GPU binary. That PTX is valid for newer GPU architectures and the CUDA runtime will be able to compile it for the GPU the app happens to run on. It's not ideal, but it works. While for AMDGPU we do not have such forward compatibility mode as we have with CUDA, being able to compile for something by default is still useful, IMO.

Addressing nits.

Herald added a subscriber: dexonsmith. Mar 4 2022, 11:47 AM

Harbormaster completed remote builds in B152642: Diff 413087. Mar 4 2022, 11:48 AM

jhuber6 added inline comments. Mar 4 2022, 11:48 AM

clang/lib/Driver/Driver.cpp
4132–4134	I just copied GFX803 because that's what the previous driver used. I agree we should just default to something, maybe in the AMD case we can issue a warning telling them to use the flag to properly specify it.

jhuber6 marked 8 inline comments as done. Mar 4 2022, 12:03 PM

Fix architecture parsing and still include the GPU binary so cuobjcopy can use them.

Herald added subscribers: abrachet, phosek. Mar 10 2022, 7:03 AM

Harbormaster completed remote builds in B153559: Diff 414371. Mar 10 2022, 7:04 AM

We shouldn't need to restrict this to RDC only if implemented properly.

Harbormaster completed remote builds in B154337: Diff 415439. Mar 15 2022, 8:48 AM

Accidentally clang formatted an entire file.

Harbormaster completed remote builds in B154348: Diff 415460. Mar 15 2022, 9:37 AM

Fix wrong condition for picking up input.

Harbormaster completed remote builds in B154452: Diff 415622. Mar 15 2022, 4:15 PM

tra added inline comments. Mar 17 2022, 11:25 AM

clang/include/clang/Basic/Cuda.h
105–107 ↗	(On Diff #415460)	Nit: those could be just enum values. `... LAST, DefaultCudaArch = SM_35, DefaultHIPArch = GFX803, };`
clang/lib/Driver/Driver.cpp
4181	This loop can be folded into the loop above. Alternatively, for simple loops like this one could use `llvm::for_each`.
4206	Could you elaborate why TCAndArch is incremented only here? Don't other cases need to advance it, too? At the very least we need some comments here and for the loop in general, describing what's going on.
clang/lib/Driver/ToolChains/Clang.cpp
6985	Extracting arch name from the file name looks... questionable. Where does that file name come from? Can't we get this information directly?

jhuber6 added inline comments. Mar 17 2022, 12:29 PM

clang/include/clang/Basic/Cuda.h
105–107 ↗	(On Diff #415460)	Will do.
clang/lib/Driver/Driver.cpp
4181	It could, I think the intention is clearer (i.e. make an input for every toolchain and architecture) having them separate. But I can merge them if that's better.
4206	It shouldn't be, I forgot to move this out of the conditional once I added more things, I'll explain the usage as well.
clang/lib/Driver/ToolChains/Clang.cpp
6985	Yeah, it's not ideal but it was the easiest way to do it. The alternative is to find a way to traverse the job action list and find the nodes with a bound architecture set and hope they're in the same order. I can try to do that.
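
The Cuda.h nit above (making the defaults plain enum values) can be illustrated with a reduced enum. This is a sketch, not the actual header: the enumerator list is shortened and hypothetical apart from the names quoted in the comment, and `defaultArchFor` is an invented helper.

```cpp
// Reduced, hypothetical enumerator list; the real CudaArch enum in
// clang/include/clang/Basic/Cuda.h has many more entries.
enum class CudaArch {
  UNKNOWN,
  SM_35,
  SM_80,
  GFX803,
  GFX906,
  LAST,
  // Aliases declared after LAST, so LAST still counts only the real
  // architectures listed before it.
  DefaultCudaArch = SM_35,
  DefaultHIPArch = GFX803,
};

// The aliases compare equal to the enumerators they name.
static_assert(CudaArch::DefaultCudaArch == CudaArch::SM_35, "alias of SM_35");
static_assert(CudaArch::DefaultHIPArch == CudaArch::GFX803, "alias of GFX803");

// Hypothetical helper: pick the fallback when no architecture was requested.
constexpr CudaArch defaultArchFor(bool IsHIP) {
  return IsHIP ? CudaArch::DefaultHIPArch : CudaArch::DefaultCudaArch;
}
```

Keeping the defaults inside the enum means callers such as the empty-`Archs` fallback in `getOffloadArchs` could name the default instead of hard-coding `SM_35` or `GFX803`.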

Addressing comments

Harbormaster completed remote builds in B154904: Diff 416292. Mar 17 2022, 1:31 PM

jhuber6 added a parent revision: D123313: [OpenMP] Make clang argument handling for the new driver more generic. Apr 7 2022, 8:41 AM

Herald added a subscriber: MaskRay. Apr 7 2022, 8:41 AM

Update with the more generic clang argument handling.

MaskRay added inline comments. Apr 7 2022, 10:07 AM

clang/test/Driver/cuda-openmp-driver.cu
11	Better to add -NEXT whenever applicable

MaskRay added inline comments. Apr 7 2022, 10:08 AM

clang/test/Driver/cuda-openmp-driver.cu
2	clang-driver is unneeded.

jhuber6 added a parent revision: D123325: [Clang] Make enabling the new driver more generic. Apr 7 2022, 10:25 AM

Make -foffload-new-driver imply GPU-RDC mode, it won't work otherwise. Also adjust tests.

Harbormaster completed remote builds in B158524: Diff 421270. Apr 7 2022, 12:53 PM

jhuber6 added a child revision: D123471: [CUDA] Create offloading entries when using the new driver. Apr 10 2022, 1:15 PM

Adding new test for CUDA phases.

Herald added a subscriber: mattd. Apr 19 2022, 8:57 AM

Harbormaster completed remote builds in B160250: Diff 423645. Apr 19 2022, 8:58 AM

jhuber6 mentioned this in D123325: [Clang] Make enabling the new driver more generic. Apr 19 2022, 9:00 AM

Thank you for adding the compilation pipeline tests.

LGTM overall.

clang/lib/Driver/ToolChains/Clang.cpp
6217–6219	Combine both ifs, so we don't add `-fgpu-rdc` twice?
6909–6910	Ditto.
clang/test/Driver/cuda-openmp-driver.cu
19	You probably want to check for `clang -cc1 ... -fgpu-rdc`, too.

Making suggested changes.

Harbormaster completed remote builds in B160285: Diff 423685. Apr 19 2022, 11:21 AM

Ping

tra added inline comments. Apr 22 2022, 10:21 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	If user specifies both `-fno-gpu-rdc` and `-foffload-new-driver` we would still enable RDC compilation. We may want to at least issue a warning. Considering that we have multiple places where we may check for `-f[no]gpu-rdc` we should make sure we don't get different ideas whether RDC has been enabled. I think it may make sense to provide a common way to figure it out. Either via a helper function that would process CLI arguments or calculate it once and save it somewhere.

jhuber6 added inline comments. Apr 22 2022, 10:27 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	I haven't quite finalized how to handle this. The new driver should be compatible with a non-RDC build since we simply wouldn't embed the device image or create offloading entries. It's a little bit more difficult here since the new method is opt-in so it requires a flag. We should definitely emit a warning if both are enabled (I'm assuming there's one for passing both `-fgpu-rdc` and `-fno-gpu-rdc`). I'll add one in. Also we could consider the new driver the RDC in the future which would be the easiest. The problem is if we want to support CUDA's method of RDC considering how other build systems seem to expect it. I could see us embedding the fatbinary in the object file, even if unused, just so that cuobjdump works. However we couldn't support the generation of `__cudaRegisterFatBinary_nv....` functions because then those would cause linker errors. WDYT?

tra added inline comments. Apr 22 2022, 10:36 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	"I'm assuming there's one for passing both `-fgpu-rdc` and `-fno-gpu-rdc`" This is not a valid assumption. The whole idea behind `-fno-something` is that the options can be overridden. E.g. if the build specifies a standard set of compiler options, but we need to override some of them when building a particular file. We can only do so by appending to the standard options. Potentially we may end up having those options overridden again. While it's not very common, it's certainly possible. It's also possible to start with `-fno-gpu-rdc` and then override it with `-fgpu-rdc`. In this case, we care about the final state of RDC specified by -f*gpu-rdc options, not by the fact that `-fno-gpu-rdc` is present. `Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)` does exactly that -- gives you the final state. If it returns false, but we have `-foffload-new-driver`, then we need a warning as these options are contradictory.
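
The "final state" behaviour described for `Args.hasFlag` can be modelled in a few lines. This is a simplified sketch, not clang's implementation: `lastFlagWins` is a hypothetical stand-in that works on plain strings instead of `llvm::opt::ArgList`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::opt::ArgList::hasFlag: scan the command
// line in order and let the last occurrence of either spelling decide.
// If neither spelling appears, the default is returned.
bool lastFlagWins(const std::vector<std::string> &Args, const std::string &Pos,
                  const std::string &Neg, bool Default) {
  bool State = Default;
  for (const std::string &Arg : Args) {
    if (Arg == Pos)
      State = true;
    else if (Arg == Neg)
      State = false;
  }
  return State;
}
```

Under this model a build system can append `-fgpu-rdc` after an earlier `-fno-gpu-rdc` (or the reverse) and only the later one takes effect, which is why the mere presence of `-fno-gpu-rdc` is not enough to conclude that RDC is off.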

tra added inline comments. Apr 22 2022, 10:44 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	"The new driver should be compatible with a non-RDC build" In that case, we don't need a warning, but we do need a test verifying this behavior. "Also we could consider the new driver the RDC in the future which would be the easiest." SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it. Eventually we would deprecate RDC options, but we still need to work sensibly when user specifies a mix of these options.

Adding warning for using both -fno-gpu-rdc and -foffload-new-driver. I think this is a good warning to have for now while this is being worked in as opt-in. Once this has matured I plan on adding the necessary logic to handle RDC and non-RDC builds correctly with this. But for the purposes of this patch just warning is fine.

Harbormaster completed remote builds in B160903: Diff 424537. Apr 22 2022, 10:53 AM

jhuber6 added inline comments. Apr 22 2022, 10:56 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	"In that case, we don't need a warning, but we do need a test verifying this behavior." It's possible but I don't have the logic here to do it, figured we can cross that bridge later. "SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it." So the only downsides I know of are that we don't currently replicate CUDA's magic to JIT RDC code (we can do this with LTO anyway), and that registering offload entries relies on the linker defining `__start / __stop` variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that.

tra added a subscriber: rnk. Apr 22 2022, 11:11 AM

tra added inline comments.

clang/lib/Driver/ToolChains/Clang.cpp
6217	This has to be `Args.hasArg(options::OPT_fno_gpu_rdc) && Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) == false` E.g. we don't want a warning if we have `-foffload-new-driver -fno-gpu-rdc -fgpu-rdc`.
6217–6218	"relies on the linker defining `__start / __stop` variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that." @MaskRay, @rnk - would you happen to know the answer?
6220	The warning does not take any parameters and this one looks wrong anyways.
6908	I'm not sure why we're no longer checking for `OPT_foffload_new_driver` here. Don't we want to have the same RDC mode on the host and device sides? I think we do as that affects the way we mangle some symbols and it has to be consistent on both sides.

jhuber6 added inline comments. Apr 22 2022, 11:13 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217	K, will do
6220	Whoops, deleted that previously but had a little SNAFU with my commits and forgot to do it again.
6908	This is only checked with `CudaDeviceInput` which we don't use with the new driver.
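
The warning condition requested for line 6217 (an explicit `-fno-gpu-rdc` is present and the final RDC state is off) can be sketched as a predicate over a plain argument list. `warnNoRDCWithNewDriver` is a hypothetical name, and the string-based scan only approximates what `Args.hasArg` and `Args.hasFlag` do in the driver.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: should the driver warn that "-foffload-new-driver"
// was combined with a command line whose final RDC state is disabled?
bool warnNoRDCWithNewDriver(const std::vector<std::string> &Args) {
  auto Has = [&](const char *Flag) {
    return std::find(Args.begin(), Args.end(), Flag) != Args.end();
  };
  // No warning unless the user opted in and explicitly wrote -fno-gpu-rdc.
  if (!Has("-foffload-new-driver") || !Has("-fno-gpu-rdc"))
    return false;
  // Final state: the last -fgpu-rdc / -fno-gpu-rdc on the line wins.
  // The search cannot fail here because -fno-gpu-rdc is known to be present.
  auto LastRDC =
      std::find_if(Args.rbegin(), Args.rend(), [](const std::string &A) {
        return A == "-fgpu-rdc" || A == "-fno-gpu-rdc";
      });
  return *LastRDC == "-fno-gpu-rdc";
}
```

With this predicate, `-foffload-new-driver -fno-gpu-rdc -fgpu-rdc` stays silent, matching the example given in the review comment.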

Making suggested changes.

Harbormaster completed remote builds in B160908: Diff 424543. Apr 22 2022, 11:16 AM

Changing this to simply require that the user manually passes -fgpu-rdc in order to use the new driver. I think this makes more sense in the short term and we can move to make the new driver the default rdc approach later. I tested this and the following should work:

clang foo.cu -fgpu-rdc -foffload-new-driver -c
clang bar.cu -c
clang foo.o bar.o -fgpu-rdc -foffload-new-driver

Harbormaster completed remote builds in B161432: Diff 425261. Apr 26 2022, 10:31 AM

In D120272#3475155, @jhuber6 wrote:

Changing this to simply require that the user manually passes -fgpu-rdc in order to use the new driver. I think this makes more sense in the short term and we can move to make the new driver the default rdc approach later. I tested this and the following should work,

SGTM.

This revision is now accepted and ready to land. Apr 26 2022, 11:41 AM

rnk added inline comments. Apr 26 2022, 11:50 AM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the start / stop symbols are defined on Windows using various pragmas and `__declspec(allocate)` to set up sections and sort them accordingly. I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere.

jhuber6 added inline comments. Apr 26 2022, 12:09 PM

clang/lib/Driver/ToolChains/Clang.cpp
6217–6218	"I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the start / stop symbols are defined on Windows using various pragmas and `__declspec(allocate)` to set up sections and sort them accordingly. I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere." Thanks for the information, I was having a hard time figuring out if it was possible to implement this on other platforms. Some documentation for handling this on each platform would definitely be useful as I am hoping this can become the standard way to compile / register offloading languages in LLVM. Let me know if I can do anything to help on that front.
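
For readers unfamiliar with the `__start / __stop` mechanism mentioned in this thread, here is a minimal ELF-only sketch. It assumes GCC or Clang with an ELF linker such as GNU ld or LLD; the section name `demo_entries`, the `DemoEntry` struct, and `countDemoEntries` are hypothetical and do not reflect the actual offloading entry layout. The PE/COFF and MachO equivalents that rnk refers to are not shown.

```cpp
#include <cstddef>

// Hypothetical entry record; the real offloading entry layout is not part of
// this review.
struct DemoEntry {
  const char *Name;
  long Value;
};

// Each definition is placed in the "demo_entries" section. Because the section
// name is a valid C identifier, ELF linkers define __start_demo_entries and
// __stop_demo_entries at the beginning and end of the section's contents.
__attribute__((used, section("demo_entries")))
static const DemoEntry KernelA = {"kernel_a", 1};
__attribute__((used, section("demo_entries")))
static const DemoEntry KernelB = {"kernel_b", 2};

// Linker-defined bounds of the section, viewed as an array of entries.
extern "C" const DemoEntry __start_demo_entries[];
extern "C" const DemoEntry __stop_demo_entries[];

// Walk every entry that any translation unit contributed to the section.
std::size_t countDemoEntries() {
  std::size_t N = 0;
  for (const DemoEntry *E = __start_demo_entries; E != __stop_demo_entries; ++E)
    ++N;
  return N;
}
```

The appeal of this scheme is that registration code does not need a central list: every object file that adds an entry to the section is picked up at link time.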

This revision was landed with ongoing or failed builds. Apr 29 2022, 6:15 AM

Closed by commit rGc5e5b54350fe: [CUDA] Add driver support for compiling CUDA with the new driver (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGc5e5b54350fe: [CUDA] Add driver support for compiling CUDA with the new driver.

Revision Contents

Path	Size
clang/include/clang/Basic/DiagnosticDriverKinds.td	2 lines
clang/include/clang/Driver/Options.td	1 line
clang/lib/Driver/Driver.cpp	136 lines
clang/lib/Driver/ToolChains/Clang.cpp	40 lines
clang/test/Driver/cuda-openmp-driver.cu	16 lines

Diff 412908

clang/include/clang/Basic/DiagnosticDriverKinds.td

@@ ... @@
 def err_drv_unsupported_embed_bitcode
     : Error<"%0 is not supported with -fembed-bitcode">;
 def err_drv_bitcode_unsupported_on_toolchain : Error<
   "-fembed-bitcode is not supported on versions of iOS prior to 6.0">;
 def err_drv_negative_columns : Error<
   "invalid value '%1' in '%0', value must be 'none' or a positive integer">;
 def err_drv_small_columns : Error<
   "invalid value '%1' in '%0', value must be '%2' or greater">;
+def err_drv_non_relocatable : Error<
+  "the new driver requires relocatable code, compile with '-fgpu-rdc' enabled">;

 def err_drv_invalid_malign_branch_EQ : Error<
   "invalid argument '%0' to -malign-branch=; each element must be one of: %1">;

 def warn_O4_is_O3 : Warning<"-O4 is equivalent to -O3">, InGroup<Deprecated>;
 def warn_drv_optimization_value : Warning<"optimization level '%0' is not supported; using '%1%2' instead">,
   InGroup<InvalidCommandLineArgument>;
 def warn_ignored_gcc_optimization : Warning<"optimization flag '%0' is not supported">,
@@ ... @@

clang/include/clang/Driver/Options.td

@@ ... @@ defm openmp_target_new_runtime: BoolFOption<"openmp-target-new-runtime",
   NegFlag<SetFalse>>;
 defm openmp_optimistic_collapse : BoolFOption<"openmp-optimistic-collapse",
   LangOpts<"OpenMPOptimisticCollapse">, DefaultFalse,
   PosFlag<SetTrue, [CC1Option]>, NegFlag<SetFalse>, BothFlags<[NoArgumentUnused, HelpHidden]>>;
 def static_openmp: Flag<["-"], "static-openmp">,
   HelpText<"Use the static host OpenMP runtime while linking.">;
 def fopenmp_new_driver : Flag<["-"], "fopenmp-new-driver">, Flags<[CC1Option]>, Group<Action_Group>,
   HelpText<"Use the new driver for OpenMP offloading.">;
+def : Flag<["-"], "foffload-new-driver">, Alias<fopenmp_new_driver>;
 def fno_optimize_sibling_calls : Flag<["-"], "fno-optimize-sibling-calls">, Group<f_Group>;
 def foptimize_sibling_calls : Flag<["-"], "foptimize-sibling-calls">, Group<f_Group>;
 defm escaping_block_tail_calls : BoolFOption<"escaping-block-tail-calls",
   CodeGenOpts<"NoEscapingBlockTailCalls">, DefaultFalse,
   NegFlag<SetTrue, [CC1Option]>, PosFlag<SetFalse>>;
 def force__cpusubtype__ALL : Flag<["-"], "force_cpusubtype_ALL">;
 def force__flat__namespace : Flag<["-"], "force_flat_namespace">;
 def force__load : Separate<["-"], "force_load">;
@@ ... @@

clang/lib/Driver/Driver.cpp

@@ ... @@ if (!Args.hasArg(options::OPT_fopenmp_new_driver))
         LinkerInputs.push_back(Wrapper);
     Action *LA;
     // Check if this Linker Job should emit a static library.
     if (ShouldEmitStaticLibrary(Args)) {
       LA = C.MakeAction<StaticLibJobAction>(LinkerInputs, types::TY_Image);
     } else if (Args.hasArg(options::OPT_fopenmp_new_driver) &&
                C.getActiveOffloadKinds() != Action::OFK_None) {
       LA = C.MakeAction<LinkerWrapperJobAction>(LinkerInputs, types::TY_Image);
       LA->propagateHostOffloadInfo(C.getActiveOffloadKinds(),
                                    /*BoundArch=*/nullptr);
     } else {
       LA = C.MakeAction<LinkJobAction>(LinkerInputs, types::TY_Image);
     }
     if (!Args.hasArg(options::OPT_fopenmp_new_driver))
       LA = OffloadBuilder.processHostLinkAction(LA);
     Actions.push_back(LA);
   }
@@ ... @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
   Args.ClaimAllArgs(options::OPT_cl_ignored_Group);

   // Claim --cuda-host-only and --cuda-compile-host-device, which may be passed
   // to non-CUDA compilations and should not trigger warnings there.
   Args.ClaimAllArgs(options::OPT_cuda_host_only);
   Args.ClaimAllArgs(options::OPT_cuda_compile_host_device);
 }

+/// Returns the canonical name for the offloading architecture when using HIP or
+/// CUDA.
+static StringRef getCanonicalArchString(Compilation &C,
+                                        llvm::opt::DerivedArgList &Args,
+                                        StringRef ArchStr,
+                                        Action::OffloadKind Kind) {
+  if (Kind == Action::OFK_Cuda) {
+    CudaArch Arch = StringToCudaArch(ArchStr);
+    if (Arch == CudaArch::UNKNOWN || !IsNVIDIAGpuArch(Arch)) {
+      C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch) << ArchStr;
+      return StringRef();
+    }
+    return Args.MakeArgStringRef(CudaArchToString(Arch));
+  } else if (Kind == Action::OFK_HIP) {
+    llvm::StringMap<bool> Features;
+    // getHIPOffloadTargetTriple() is known to return valid value as it has
+    // been called successfully in the CreateOffloadingDeviceToolChains().
+    auto Arch = parseTargetID(
+        *getHIPOffloadTargetTriple(C.getDriver(), C.getInputArgs()), ArchStr,
+        &Features);
+    if (!Arch) {
+      C.getDriver().Diag(clang::diag::err_drv_bad_target_id) << ArchStr;
+      C.setContainsError();
+      return StringRef();
+    }
+    return Args.MakeArgStringRef(
+        getCanonicalTargetID(Arch.getValue(), Features));
+  }
+  return StringRef();
+}
+
+/// Checks if the set of offloading architectures does not conflict. Returns the
+/// incompatible pair if a conflict occurs.
+static llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
+getConflictOffloadArchCombination(const llvm::DenseSet<StringRef> &Archs,
+                                  Action::OffloadKind Kind) {
+  if (Kind != Action::OFK_HIP)
+    return None;
+
+  std::set<StringRef> ArchSet;
+  // std::set has no push_back, so use std::inserter rather than
+  // std::back_inserter.
+  llvm::copy(Archs, std::inserter(ArchSet, ArchSet.begin()));
+  return getConflictTargetIDCombination(ArchSet);
+}
+
+/// Returns the set of bound architectures active for this compilation kind.
+/// If there are no bound architectures we return a set containing only the
+/// empty string.
+static llvm::DenseSet<StringRef>
+getOffloadArchs(Compilation &C, llvm::opt::DerivedArgList &Args,
+                Action::OffloadKind Kind) {
+
+  // If this is OpenMP offloading we don't use a bound architecture.
+  if (Kind == Action::OFK_OpenMP)
+    return llvm::DenseSet<StringRef>{StringRef()};
+
+  // --offload and --offload-arch options are mutually exclusive.
+  if (Args.hasArgNoClaim(options::OPT_offload_EQ) &&
+      Args.hasArgNoClaim(options::OPT_offload_arch_EQ,
+                         options::OPT_no_offload_arch_EQ)) {
+    C.getDriver().Diag(diag::err_opt_not_valid_with_opt) << "--offload-arch"
+                                                         << "--offload";
+  }
+
+  llvm::DenseSet<StringRef> Archs;
+  for (auto &Arg : Args) {
+    if (Arg->getOption().matches(options::OPT_offload_arch_EQ)) {
+      Archs.insert(getCanonicalArchString(C, Args, Arg->getValue(), Kind));
+    } else if (Arg->getOption().matches(options::OPT_no_offload_arch_EQ)) {
+      // Compare as StringRef; comparing the raw const char * against a
+      // literal would compare pointers.
+      if (StringRef(Arg->getValue()) == "all")
+        Archs.clear();
+      else
+        Archs.erase(getCanonicalArchString(C, Args, Arg->getValue(), Kind));
+    }
+  }
+
+  if (auto ConflictingArchs = getConflictOffloadArchCombination(Archs, Kind)) {
+    C.getDriver().Diag(clang::diag::err_drv_bad_offload_arch_combo)
+        << ConflictingArchs.getValue().first
+        << ConflictingArchs.getValue().second;
+    C.setContainsError();
+  }
+
+  if (Archs.empty()) {
+    if (Kind == Action::OFK_Cuda)
+      Archs.insert(CudaArchToString(CudaArch::SM_35));
+    else if (Kind == Action::OFK_HIP)
+      Archs.insert(CudaArchToString(CudaArch::GFX803));
+  }
+
+  return Archs;
+}
+

Action *Driver::BuildOffloadingActions(Compilation &C,		Action *Driver::BuildOffloadingActions(Compilation &C,
llvm::opt::DerivedArgList &Args,		llvm::opt::DerivedArgList &Args,
const InputTy &Input,		const InputTy &Input,
Action *HostAction) const {		Action *HostAction) const {
if (!isa<CompileJobAction>(HostAction))		if (!isa<CompileJobAction>(HostAction))
return HostAction;		return HostAction;

OffloadAction::DeviceDependences DDeps;		OffloadAction::DeviceDependences DDeps;

types::ID InputType = Input.first;		types::ID InputType = Input.first;
const Arg *InputArg = Input.second;		const Arg *InputArg = Input.second;

const Action::OffloadKind OffloadKinds[] = {Action::OFK_OpenMP};		const Action::OffloadKind OffloadKinds[] = {
		Action::OFK_OpenMP, Action::OFK_Cuda, Action::OFK_HIP};
		jdoerfert [Not Done]: With the NFC commit we can probably also include some of this but restricted to OFK_OpenMP. Try to minimize the functional change that one has to think about.

for (Action::OffloadKind Kind : OffloadKinds) {		for (Action::OffloadKind Kind : OffloadKinds) {
SmallVector<const ToolChain *, 2> ToolChains;		SmallVector<const ToolChain *, 2> ToolChains;
ActionList DeviceActions;		ActionList DeviceActions;

		const bool Relocatable =
		Kind == Action::OFK_OpenMP ||
		Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
		/*Default=*/false);

auto TCRange = C.getOffloadToolChains(Kind);		auto TCRange = C.getOffloadToolChains(Kind);
for (auto TI = TCRange.first, TE = TCRange.second; TI != TE; ++TI)		for (auto TI = TCRange.first, TE = TCRange.second; TI != TE; ++TI)
ToolChains.push_back(TI->second);		ToolChains.push_back(TI->second);

if (ToolChains.empty())		if (ToolChains.empty())
continue;		continue;

for (unsigned I = 0; I < ToolChains.size(); ++I)		if (!Relocatable)
		Diags.Report(diag::err_drv_non_relocatable);
		tra [Done]: Do we want to return early here?

		// Get the product of all bound architectures and toolchains.
		SmallVector<std::pair<const ToolChain *, StringRef>> TCAndArchs;
		for (const ToolChain *TC : ToolChains)
		for (StringRef Arch : getOffloadArchs(C, Args, Kind))
		TCAndArchs.push_back(std::make_pair(TC, Arch));

		for (unsigned I = 0, E = TCAndArchs.size(); I != E; ++I)
		tra [Done]: If we do not allow non-relocatable compilation with the new driver, perhaps we should make `-fgpu-rdc` enabled by default with the new driver and error out if someone attempts to use `-fno-gpu-rdc`. Otherwise we're virtually guaranteed that everyone attempting to use `-foffload-new-driver` will hit this error.
DeviceActions.push_back(C.MakeAction<InputAction>(*InputArg, InputType));		DeviceActions.push_back(C.MakeAction<InputAction>(*InputArg, InputType));
		tra [Not Done]: This loop can be folded into the loop above. Alternatively, for simple loops like this one could use `llvm::for_each`.
		jhuber6 (author) [Done]: It could, I think the intention is clearer (i.e. make an input for every toolchain and architecture) having them separate. But I can merge them if that's better.
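The "product of all bound architectures and toolchains" built above can be illustrated with plain standard-library types. This is a simplified sketch under assumptions: strings stand in for `const ToolChain *` and `StringRef`, and `crossProduct` is a hypothetical helper, not a function in the driver.

```cpp
#include <string>
#include <utility>
#include <vector>

using TCAndArch = std::pair<std::string, std::string>;

// Builds one (toolchain, architecture) pair for every combination, in
// toolchain-major order, mirroring the nested loops in the diff.
std::vector<TCAndArch> crossProduct(const std::vector<std::string> &ToolChains,
                                    const std::vector<std::string> &Archs) {
  std::vector<TCAndArch> Pairs;
  Pairs.reserve(ToolChains.size() * Archs.size());
  for (const std::string &TC : ToolChains)
    for (const std::string &Arch : Archs)
      Pairs.emplace_back(TC, Arch);
  return Pairs;
}
```

With one toolchain and the two architectures used in the test file (`sm_35`, `sm_70`), this yields two pairs, so two device input actions are created.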

if (DeviceActions.empty())		if (DeviceActions.empty())
return HostAction;		return HostAction;

auto PL = types::getCompilationPhases(*this, Args, InputType);		auto PL = types::getCompilationPhases(*this, Args, InputType);

for (phases::ID Phase : PL) {		for (phases::ID Phase : PL) {
if (Phase == phases::Link) {		if (Phase == phases::Link) {
assert(Phase == PL.back() && "linking must be final compilation step.");		assert(Phase == PL.back() && "linking must be final compilation step.");
break;		break;
}		}

auto TC = ToolChains.begin();		auto TCAndArch = TCAndArchs.begin();
for (Action *&A : DeviceActions) {		for (Action *&A : DeviceActions) {
A = ConstructPhaseAction(C, Args, Phase, A, Kind);		A = ConstructPhaseAction(C, Args, Phase, A, Kind);

if (isa<CompileJobAction>(A) && Kind == Action::OFK_OpenMP) {		if (isa<CompileJobAction>(A) && Kind == Action::OFK_OpenMP) {
HostAction->setCannotBeCollapsedWithNextDependentAction();		HostAction->setCannotBeCollapsedWithNextDependentAction();
OffloadAction::HostDependence HDep(		OffloadAction::HostDependence HDep(
HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),		HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),
/*BoundArch=*/nullptr, Action::OFK_OpenMP);		/*BoundArch=*/nullptr, Kind);
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(A, TC, /*BoundArch=*/nullptr, Kind);		DDep.add(A, TCAndArch->first, /*BoundArch=*/nullptr, Kind);
A = C.MakeAction<OffloadAction>(HDep, DDep);		A = C.MakeAction<OffloadAction>(HDep, DDep);
		++TCAndArch;
		tra [Not Done]: Could you elaborate why TCAndArch is incremented only here? Don't other cases need to advance it, too? At the very least we need some comments here and for the loop in general, describing what's going on.
		jhuber6 (author) [Done]: It shouldn't be, I forgot to move this out of the conditional once I added more things, I'll explain the usage as well.
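The problem raised here is a lockstep-iteration bug: the pair iterator must advance once per device action, whichever branch handles that action. A minimal sketch of the corrected shape, assuming hypothetical stand-in types (`DeviceAction` and `bindArchs` are illustrative names only):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device-side action in the driver.
struct DeviceAction {
  std::string Phase;
  std::string BoundArch;
};

// Walks actions and (toolchain, arch) pairs in lockstep. Only "assemble"
// actions record the arch, but the iterator moves for every action.
void bindArchs(std::vector<DeviceAction> &Actions,
               const std::vector<std::pair<std::string, std::string>> &Pairs) {
  assert(Actions.size() == Pairs.size() && "expected one pair per action");
  auto Pair = Pairs.begin();
  for (DeviceAction &A : Actions) {
    if (A.Phase == "assemble")
      A.BoundArch = Pair->second;
    ++Pair; // Advance unconditionally, outside the branch.
  }
}
```

If the increment lived inside the `if`, the second action in a list such as {compile, assemble} would be bound to the first pair's architecture rather than its own.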
		} else if (isa<AssembleJobAction>(A) && Kind == Action::OFK_Cuda) {
		ActionList FatbinActions;
		for (Action *A : {A, A->getInputs()[0]}) {
		OffloadAction::DeviceDependences DDep;
		DDep.add(A, TCAndArch->first, TCAndArch->second.data(), Kind);
		FatbinActions.emplace_back(
		C.MakeAction<OffloadAction>(DDep, A->getType()));
		}
		A = C.MakeAction<LinkJobAction>(FatbinActions, types::TY_CUDA_FATBIN);
}		}
++TC;
}		}
}		}

auto TC = ToolChains.begin();		auto TCAndArch = TCAndArchs.begin();
for (Action *A : DeviceActions) {		for (Action *A : DeviceActions) {
DDeps.add(A, TC, /*BoundArch=*/nullptr, Kind);		DDeps.add(A, TCAndArch->first, TCAndArch->second.data(), Kind);
		tra [Not Done]: Nit: We do have `llvm::zip` for iterating over multiple containers. However, unpacking the loop variable may be more trouble than it's worth in such a small loop. Up to you.
TC++;		++TCAndArch;
}		}
}		}

OffloadAction::HostDependence HDep(		OffloadAction::HostDependence HDep(
HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),		HostAction, C.getSingleOffloadToolChain<Action::OFK_Host>(),
/*BoundArch=*/nullptr, DDeps);		/*BoundArch=*/nullptr, DDeps);
return C.MakeAction<OffloadAction>(HDep, DDeps);		return C.MakeAction<OffloadAction>(HDep, DDeps);
}		}
(85 lines not shown)	Action *Driver::ConstructPhaseAction(
}		}
case phases::Backend: {		case phases::Backend: {
if (isUsingLTO() && TargetDeviceOffloadKind == Action::OFK_None) {		if (isUsingLTO() && TargetDeviceOffloadKind == Action::OFK_None) {
types::ID Output =		types::ID Output =
Args.hasArg(options::OPT_S) ? types::TY_LTO_IR : types::TY_LTO_BC;		Args.hasArg(options::OPT_S) ? types::TY_LTO_IR : types::TY_LTO_BC;
return C.MakeAction<BackendJobAction>(Input, Output);		return C.MakeAction<BackendJobAction>(Input, Output);
}		}
if (isUsingLTO(/* IsOffload */ true) &&		if (isUsingLTO(/* IsOffload */ true) &&
TargetDeviceOffloadKind == Action::OFK_OpenMP) {		TargetDeviceOffloadKind != Action::OFK_None) {
types::ID Output =		types::ID Output =
Args.hasArg(options::OPT_S) ? types::TY_LTO_IR : types::TY_LTO_BC;		Args.hasArg(options::OPT_S) ? types::TY_LTO_IR : types::TY_LTO_BC;
return C.MakeAction<BackendJobAction>(Input, Output);		return C.MakeAction<BackendJobAction>(Input, Output);
}		}
if (Args.hasArg(options::OPT_emit_llvm) ||		if (Args.hasArg(options::OPT_emit_llvm) ||
(TargetDeviceOffloadKind == Action::OFK_HIP &&		(TargetDeviceOffloadKind == Action::OFK_HIP &&
Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,		Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc,
false))) {		false))) {
(1,663 lines not shown)

clang/lib/Driver/ToolChains/Clang.cpp


(4,385 lines not shown)	void Clang::ConstructJob(Compilation &C, const JobAction &JA,
assert(Inputs.size() >= 1 && "Must have at least one input.");		assert(Inputs.size() >= 1 && "Must have at least one input.");
// CUDA/HIP compilation may have multiple inputs (source file + results of		// CUDA/HIP compilation may have multiple inputs (source file + results of
// device-side compilations). OpenMP device jobs also take the host IR as a		// device-side compilations). OpenMP device jobs also take the host IR as a
// second input. Module precompilation accepts a list of header files to		// second input. Module precompilation accepts a list of header files to
// include as part of the module. All other jobs are expected to have exactly		// include as part of the module. All other jobs are expected to have exactly
// one input.		// one input.
bool IsCuda = JA.isOffloading(Action::OFK_Cuda);		bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
bool IsCudaDevice = JA.isDeviceOffloading(Action::OFK_Cuda);		bool IsCudaDevice = JA.isDeviceOffloading(Action::OFK_Cuda);
		bool IsCudaHost = JA.isHostOffloading(Action::OFK_Cuda);
bool IsHIP = JA.isOffloading(Action::OFK_HIP);		bool IsHIP = JA.isOffloading(Action::OFK_HIP);
bool IsHIPDevice = JA.isDeviceOffloading(Action::OFK_HIP);		bool IsHIPDevice = JA.isDeviceOffloading(Action::OFK_HIP);
bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);		bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);
bool IsOpenMPHost = JA.isHostOffloading(Action::OFK_OpenMP);		bool IsOpenMPHost = JA.isHostOffloading(Action::OFK_OpenMP);
bool IsHeaderModulePrecompile = isa<HeaderModulePrecompileJobAction>(JA);		bool IsHeaderModulePrecompile = isa<HeaderModulePrecompileJobAction>(JA);
bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||		bool IsDeviceOffloadAction = !(JA.isDeviceOffloading(Action::OFK_None) ||
JA.isDeviceOffloading(Action::OFK_Host));		JA.isDeviceOffloading(Action::OFK_Host));
bool IsUsingLTO = D.isUsingLTO(IsDeviceOffloadAction);		bool IsUsingLTO = D.isUsingLTO(IsDeviceOffloadAction);
auto LTOMode = D.getLTOMode(IsDeviceOffloadAction);		auto LTOMode = D.getLTOMode(IsDeviceOffloadAction);

// A header module compilation doesn't have a main input file, so invent a		// A header module compilation doesn't have a main input file, so invent a
// fake one as a placeholder.		// fake one as a placeholder.
const char *ModuleName = [&]{		const char *ModuleName = [&]{
auto *ModuleNameArg = Args.getLastArg(options::OPT_fmodule_name_EQ);		auto *ModuleNameArg = Args.getLastArg(options::OPT_fmodule_name_EQ);
return ModuleNameArg ? ModuleNameArg->getValue() : "";		return ModuleNameArg ? ModuleNameArg->getValue() : "";
}();		}();
InputInfo HeaderModuleInput(Inputs[0].getType(), ModuleName, ModuleName);		InputInfo HeaderModuleInput(Inputs[0].getType(), ModuleName, ModuleName);

const InputInfo &Input =		const InputInfo &Input =
IsHeaderModulePrecompile ? HeaderModuleInput : Inputs[0];		IsHeaderModulePrecompile ? HeaderModuleInput : Inputs[0];

InputInfoList ModuleHeaderInputs;		InputInfoList ModuleHeaderInputs;
InputInfoList OpenMPHostInputs;		InputInfoList OpenMPHostInputs;
		InputInfoList CudaHostInputs;
const InputInfo *CudaDeviceInput = nullptr;		const InputInfo *CudaDeviceInput = nullptr;
const InputInfo *OpenMPDeviceInput = nullptr;		const InputInfo *OpenMPDeviceInput = nullptr;
for (const InputInfo &I : Inputs) {		for (const InputInfo &I : Inputs) {
if (&I == &Input) {		if (&I == &Input) {
// This is the primary input.		// This is the primary input.
} else if (IsHeaderModulePrecompile &&		} else if (IsHeaderModulePrecompile &&
types::getPrecompiledType(I.getType()) == types::TY_PCH) {		types::getPrecompiledType(I.getType()) == types::TY_PCH) {
types::ID Expected = HeaderModuleInput.getType();		types::ID Expected = HeaderModuleInput.getType();
if (I.getType() != Expected) {		if (I.getType() != Expected) {
D.Diag(diag::err_drv_module_header_wrong_kind)		D.Diag(diag::err_drv_module_header_wrong_kind)
<< I.getFilename() << types::getTypeName(I.getType())		<< I.getFilename() << types::getTypeName(I.getType())
<< types::getTypeName(Expected);		<< types::getTypeName(Expected);
}		}
ModuleHeaderInputs.push_back(I);		ModuleHeaderInputs.push_back(I);
		} else if (IsCudaHost && Args.hasArg(options::OPT_fopenmp_new_driver)) {
		CudaHostInputs.push_back(I);
} else if ((IsCuda || IsHIP) && !CudaDeviceInput) {		} else if ((IsCuda || IsHIP) && !CudaDeviceInput) {
CudaDeviceInput = &I;		CudaDeviceInput = &I;
} else if (IsOpenMPDevice && !OpenMPDeviceInput) {		} else if (IsOpenMPDevice && !OpenMPDeviceInput) {
OpenMPDeviceInput = &I;		OpenMPDeviceInput = &I;
} else if (IsOpenMPHost) {		} else if (IsOpenMPHost) {
OpenMPHostInputs.push_back(I);		OpenMPHostInputs.push_back(I);
} else {		} else {
llvm_unreachable("unexpectedly given multiple inputs");		llvm_unreachable("unexpectedly given multiple inputs");
(1,766 lines not shown)	if (Args.hasFlag(options::OPT_fhip_new_launch_api,
options::OPT_fno_hip_new_launch_api, true))		options::OPT_fno_hip_new_launch_api, true))
CmdArgs.push_back("-fhip-new-launch-api");		CmdArgs.push_back("-fhip-new-launch-api");
if (Args.hasFlag(options::OPT_fgpu_allow_device_init,		if (Args.hasFlag(options::OPT_fgpu_allow_device_init,
options::OPT_fno_gpu_allow_device_init, false))		options::OPT_fno_gpu_allow_device_init, false))
CmdArgs.push_back("-fgpu-allow-device-init");		CmdArgs.push_back("-fgpu-allow-device-init");
}		}

if (IsCuda || IsHIP) {		if (IsCuda || IsHIP) {
if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))		if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
		tra [Not Done]: This has to be `Args.hasArg(options::OPT_fno_gpu_rdc) && Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) == false`. E.g. we don't want a warning if we have `-foffload-new-driver -fno-gpu-rdc -fgpu-rdc`.
		jhuber6 (author) [Done]: K, will do
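The condition proposed above relies on the "last occurrence wins" behaviour of paired `-f`/`-fno-` flags. The following standalone model illustrates that logic with plain strings; `lastFlagWins` and `rdcDisabledByUser` are hypothetical helpers written for this sketch and only approximate what `Args.hasArg` and `Args.hasFlag` do on a real argument list.

```cpp
#include <string>
#include <vector>

using ArgVec = std::vector<std::string>;

// Models a paired positive/negative flag: scan from the end, and the last
// occurrence of either spelling decides; otherwise fall back to Default.
bool lastFlagWins(const ArgVec &Args, const std::string &Pos,
                  const std::string &Neg, bool Default) {
  for (auto It = Args.rbegin(); It != Args.rend(); ++It) {
    if (*It == Pos)
      return true;
    if (*It == Neg)
      return false;
  }
  return Default;
}

// True only when -fno-gpu-rdc is present AND it is the final word on RDC,
// which is the case that would deserve a diagnostic.
bool rdcDisabledByUser(const ArgVec &Args) {
  bool SawNegative = false;
  for (const std::string &A : Args)
    SawNegative |= (A == "-fno-gpu-rdc");
  return SawNegative &&
         !lastFlagWins(Args, "-fgpu-rdc", "-fno-gpu-rdc", /*Default=*/false);
}
```

For `-foffload-new-driver -fno-gpu-rdc -fgpu-rdc` the helper returns false (no warning wanted); swapping the last two arguments makes it return true.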
CmdArgs.push_back("-fgpu-rdc");		CmdArgs.push_back("-fgpu-rdc");
		tra [Not Done]: If the user specifies both `-fno-gpu-rdc` and `-foffload-new-driver` we would still enable RDC compilation. We may want to at least issue a warning. Considering that we have multiple places where we may check for `-f[no]gpu-rdc` we should make sure we don't get different ideas about whether RDC has been enabled. I think it may make sense to provide a common way to figure it out, either via a helper function that would process CLI arguments or by calculating it once and saving it somewhere.
		jhuber6 (author) [Done]: I haven't quite finalized how to handle this. The new driver should be compatible with a non-RDC build since we simply wouldn't embed the device image or create offloading entries. It's a little bit more difficult here since the new method is opt-in so it requires a flag. We should definitely emit a warning if both are enabled (I'm assuming there's one for passing both `fgpu-rdc` and `fno-gpu-rdc`). I'll add one in. Also we could consider the new driver the RDC in the future which would be the easiest. The problem is if we want to support CUDA's method of RDC considering how other build systems seem to expect it. I could see us embedding the fatbinary in the object file, even if unused, just so that cuobjdump works. However we couldn't support the generation of `__cudaRegisterFatBinary_nv....` functions because then those would cause linker errors. WDYT?
		tra [Not Done]:
		> I'm assuming there's one for passing both fgpu-rdc and fno-gpu-rdc
		This is not a valid assumption. The whole idea behind `-fno-something` is that the options can be overridden. E.g. if the build specifies a standard set of compiler options, but we need to override some of them when building a particular file. We can only do so by appending to the standard options. Potentially we may end up having those options overridden again. While it's not very common, it's certainly possible. It's also possible to start with `-fno-gpu-rdc` and then override it with `-fgpu-rdc`. In this case, we care about the final state of RDC specified by the -fgpu-rdc options, not about the fact that `-fno-gpu-rdc` is present. `Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)` does exactly that: it gives you the final state. If it returns false, but we have `-foffload-new-driver`, then we need a warning as these options are contradictory.
		tra [Not Done]:
		> The new driver should be compatible with a non-RDC build
		In that case, we don't need a warning, but we do need a test verifying this behavior.
		> Also we could consider the new driver the RDC in the future which would be the easiest.
		SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it. Eventually we would deprecate RDC options, but we still need to work sensibly when the user specifies a mix of these options.
		jhuber6 (author) [Done]:
		> In that case, we don't need a warning, but we do need a test verifying this behavior.
		It's possible but I don't have the logic here to do it, figured we can cross that bridge later.
		> SGTM. I do not know how it all will work out in the end. Your proposed model makes a lot of sense, and I'm guardedly optimistic about it.
		The only downsides I know of are that we don't currently replicate CUDA's magic to JIT RDC code (we can do this with LTO anyway), and that registering offload entries relies on the linker defining `__start / __stop` variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that.
		tra [Not Done]:
		> relies on the linker defining __start / __stop variables, which I don't know if linkers on Windows / MacOS provide. I'd be really interested if someone on the LLD team knew the answer to that.
		@MaskRay, @rnk - would you happen to know the answer?
		rnk [Not Done]: I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the __start / __stop symbols are defined on Windows using various pragmas and `__declspec(allocate)` to set up sections and sort them accordingly. I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere.
		jhuber6 (author) [Done]:
		> I believe MachO has an equivalent mechanism, but I'm not familiar with it. For PE/COFF, you can search the ASan code to see how the __start / __stop symbols are defined on Windows using various pragmas and __declspec(allocate) to set up sections and sort them accordingly. I would love to have a doc that writes up how to implement this array registration mechanism portably for all major platforms, given that we believe it is possible everywhere.
		Thanks for the information, I was having a hard time figuring out if it was possible to implement this on other platforms. Some documentation for handling this on each platform would definitely be useful as I am hoping this can become the standard way to compile / register offloading languages in LLVM. Let me know if I can do anything to help on that front.
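For readers unfamiliar with the `__start / __stop` mechanism mentioned in this thread, here is a minimal sketch of the ELF flavour only. It assumes GCC or Clang targeting ELF with GNU ld or LLD, where the linker defines `__start_<name>` and `__stop_<name>` for any section whose name is a valid C identifier. The struct, the section name `demo_entries`, and the helper functions are hypothetical; this is not the layout the offloading runtime actually uses, and it does not cover the PE/COFF or MachO equivalents discussed above.

```cpp
#include <cstddef>

// Hypothetical entry type, for illustration only.
struct DemoEntry {
  int Id;
  int Size;
};

// Each definition is placed in the same named section. `used` keeps the
// compiler from discarding objects that nothing references directly.
__attribute__((section("demo_entries"), used))
static const DemoEntry EntryA = {1, 16};
__attribute__((section("demo_entries"), used))
static const DemoEntry EntryB = {2, 32};

// Provided by the linker on ELF targets: the bounds of the section.
extern "C" const DemoEntry __start_demo_entries[];
extern "C" const DemoEntry __stop_demo_entries[];

// Number of entries the linker gathered into the section.
std::size_t countEntries() {
  return static_cast<std::size_t>(__stop_demo_entries - __start_demo_entries);
}

// Sum of all entry ids, visiting the section as one contiguous array.
int sumIds() {
  int Sum = 0;
  for (const DemoEntry *E = __start_demo_entries; E != __stop_demo_entries; ++E)
    Sum += E->Id;
  return Sum;
}
```

The appeal of the scheme is that entries contributed by many object files end up in one array at link time without any central registration list.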
if (Args.hasFlag(options::OPT_fgpu_defer_diag,		if (Args.hasFlag(options::OPT_fgpu_defer_diag,
		tra [Not Done]: Combine both ifs, so we don't add `-fgpu-rdc` twice?
options::OPT_fno_gpu_defer_diag, false))		options::OPT_fno_gpu_defer_diag, false))
		tra [Not Done]: The warning does not take any parameters and this one looks wrong anyways.
		jhuber6 (author) [Done]: Whoops, deleted that previously but had a little SNAFU with my commits and forgot to do it again.
CmdArgs.push_back("-fgpu-defer-diag");		CmdArgs.push_back("-fgpu-defer-diag");
if (Args.hasFlag(options::OPT_fgpu_exclude_wrong_side_overloads,		if (Args.hasFlag(options::OPT_fgpu_exclude_wrong_side_overloads,
options::OPT_fno_gpu_exclude_wrong_side_overloads,		options::OPT_fno_gpu_exclude_wrong_side_overloads,
false)) {		false)) {
CmdArgs.push_back("-fgpu-exclude-wrong-side-overloads");		CmdArgs.push_back("-fgpu-exclude-wrong-side-overloads");
CmdArgs.push_back("-fgpu-defer-diag");		CmdArgs.push_back("-fgpu-defer-diag");
}		}
}		}
(671 lines not shown)
}		}

// Host-side cuda compilation receives all device-side outputs in a single		// Host-side cuda compilation receives all device-side outputs in a single
// fatbin as Inputs[1]. Include the binary with -fcuda-include-gpubinary.		// fatbin as Inputs[1]. Include the binary with -fcuda-include-gpubinary.
if ((IsCuda || IsHIP) && CudaDeviceInput) {		if ((IsCuda || IsHIP) && CudaDeviceInput) {
CmdArgs.push_back("-fcuda-include-gpubinary");		CmdArgs.push_back("-fcuda-include-gpubinary");
CmdArgs.push_back(CudaDeviceInput->getFilename());		CmdArgs.push_back(CudaDeviceInput->getFilename());
if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))		if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false))
CmdArgs.push_back("-fgpu-rdc");		CmdArgs.push_back("-fgpu-rdc");
		tra [Not Done]: I'm not sure why we're no longer checking for `OPT_foffload_new_driver` here. Don't we want to have the same RDC mode on the host and device sides? I think we do as that affects the way we mangle some symbols and it has to be consistent on both sides.
		jhuber6 (author) [Done]: This is only checked with `CudaDeviceInput` which we don't use with the new driver.
}		}

		tra [Not Done]: Ditto.
if (IsCuda) {		if (IsCuda) {
if (Args.hasFlag(options::OPT_fcuda_short_ptr,		if (Args.hasFlag(options::OPT_fcuda_short_ptr,
options::OPT_fno_cuda_short_ptr, false))		options::OPT_fno_cuda_short_ptr, false))
CmdArgs.push_back("-fcuda-short-ptr");		CmdArgs.push_back("-fcuda-short-ptr");
}		}

if (IsCuda || IsHIP) {		if (IsCuda || IsHIP) {
// Determine the original source input.		// Determine the original source input.
(37 lines not shown)

// Host-side OpenMP offloading recieves the device object files and embeds it		// Host-side OpenMP offloading recieves the device object files and embeds it
// in a named section including the associated target triple and architecture.		// in a named section including the associated target triple and architecture.
if (IsOpenMPHost && !OpenMPHostInputs.empty()) {		if (IsOpenMPHost && !OpenMPHostInputs.empty()) {
auto InputFile = OpenMPHostInputs.begin();		auto InputFile = OpenMPHostInputs.begin();
auto OpenMPTCs = C.getOffloadToolChains<Action::OFK_OpenMP>();		auto OpenMPTCs = C.getOffloadToolChains<Action::OFK_OpenMP>();
for (auto TI = OpenMPTCs.first, TE = OpenMPTCs.second; TI != TE;		for (auto TI = OpenMPTCs.first, TE = OpenMPTCs.second; TI != TE;
++TI, ++InputFile) {		++TI, ++InputFile) {
		assert(InputFile->isFilename() && "Offloading requires a filename");
const ToolChain *TC = TI->second;		const ToolChain *TC = TI->second;
const ArgList &TCArgs = C.getArgsForToolChain(TC, "", Action::OFK_OpenMP);		const ArgList &TCArgs = C.getArgsForToolChain(TC, "", Action::OFK_OpenMP);
StringRef File =		StringRef File =
C.getArgs().MakeArgString(TC->getInputFilename(*InputFile));		C.getArgs().MakeArgString(TC->getInputFilename(*InputFile));
StringRef InputName = Clang::getBaseInputStem(Args, Inputs);		StringRef InputName = Clang::getBaseInputStem(Args, Inputs);

CmdArgs.push_back(Args.MakeArgString(		CmdArgs.push_back(Args.MakeArgString(
"-fembed-offload-object=" + File + "," +		"-fembed-offload-object=" + File + "," +
Action::GetOffloadKindName(Action::OFK_OpenMP) + "." +		Action::GetOffloadKindName(Action::OFK_OpenMP) + "." +
TC->getTripleString() + "." +		TC->getTripleString() + "." +
TCArgs.getLastArgValue(options::OPT_march_EQ) + "." + InputName));		TCArgs.getLastArgValue(options::OPT_march_EQ) + "." + InputName));
}		}
		} else if (IsCudaHost && !CudaHostInputs.empty()) {
		const ToolChain *TC = C.getSingleOffloadToolChain<Action::OFK_Cuda>();
		for (const auto &InputFile : CudaHostInputs) {
		assert(InputFile.isFilename() && "Offloading requires a filename");
		StringRef File =
		C.getArgs().MakeArgString(TC->getInputFilename(InputFile));
		StringRef InputName = Clang::getBaseInputStem(Args, Inputs);
		// The CUDA toolchain should have a bound arch appended to the filename.
		StringRef Arch = File.split(".").first.rsplit('-').second;
		tra [Not Done]: Extracting arch name from the file name looks... questionable. Where does that file name come from? Can't we get this information directly?
		jhuber6 (author) [Done]: Yeah, it's not ideal but it was the easiest way to do it. The alternative is to find a way to traverse the job action list and find the nodes with a bound architecture set and hope they're in the same order. I can try to do that.
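To make the concern concrete, the parsing step can be modelled outside the driver. The sketch below mimics `File.split(".").first.rsplit('-').second` with `std::string_view`; `archFromFilename` is a hypothetical helper and the file names in the usage note are invented for illustration, not names the driver is known to produce.

```cpp
#include <cstddef>
#include <string_view>

// Takes everything before the FIRST '.', then everything after the LAST '-'
// in that prefix. Returns an empty view when the prefix contains no '-'.
std::string_view archFromFilename(std::string_view File) {
  std::string_view Stem = File.substr(0, File.find('.'));
  std::size_t Dash = Stem.rfind('-');
  if (Dash == std::string_view::npos)
    return {};
  return Stem.substr(Dash + 1);
}
```

For a name such as `foo-sm_35.cubin` this yields `sm_35`, but a path like `/tmp/build.dir/foo-sm_35.cubin` is cut at the first dot and yields an empty string, which is one reason recovering the architecture from the action graph is more robust than parsing it back out of a file name.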
		CmdArgs.push_back(Args.MakeArgString(
		"-fembed-offload-object=" + File + "," +
		Action::GetOffloadKindName(Action::OFK_Cuda) + "." +
		TC->getTripleString() + "." +
		Arch + "." + InputName));
		}
}		}

if (Triple.isAMDGPU()) {		if (Triple.isAMDGPU()) {
handleAMDGPUCodeObjectVersionOptions(D, Args, CmdArgs);		handleAMDGPUCodeObjectVersionOptions(D, Args, CmdArgs);

if (Args.hasFlag(options::OPT_munsafe_fp_atomics,		if (Args.hasFlag(options::OPT_munsafe_fp_atomics,
options::OPT_mno_unsafe_fp_atomics, /*Default=*/false))		options::OPT_mno_unsafe_fp_atomics, /*Default=*/false))
CmdArgs.push_back("-munsafe-fp-atomics");		CmdArgs.push_back("-munsafe-fp-atomics");
(1,232 lines not shown)
void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,		void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
const Driver &D = getToolChain().getDriver();		const Driver &D = getToolChain().getDriver();
const llvm::Triple TheTriple = getToolChain().getTriple();		const llvm::Triple TheTriple = getToolChain().getTriple();
auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();		auto OpenMPTCRange = C.getOffloadToolChains<Action::OFK_OpenMP>();
		auto CudaTCRange = C.getOffloadToolChains<Action::OFK_Cuda>();
ArgStringList CmdArgs;		ArgStringList CmdArgs;

// Pass the CUDA path to the linker wrapper tool.		// Pass the CUDA path to the linker wrapper tool.
for (auto &I : llvm::make_range(OpenMPTCRange.first, OpenMPTCRange.second)) {		for (Action::OffloadKind Kind : {Action::OFK_Cuda, Action::OFK_OpenMP}) {
		auto TCRange = C.getOffloadToolChains(Kind);
		for (auto &I : llvm::make_range(TCRange.first, TCRange.second)) {
const ToolChain *TC = I.second;		const ToolChain *TC = I.second;
if (TC->getTriple().isNVPTX()) {		if (TC->getTriple().isNVPTX()) {
CudaInstallationDetector CudaInstallation(D, TheTriple, Args);		CudaInstallationDetector CudaInstallation(D, TheTriple, Args);
if (CudaInstallation.isValid())		if (CudaInstallation.isValid())
CmdArgs.push_back(Args.MakeArgString(		CmdArgs.push_back(Args.MakeArgString(
"--cuda-path=" + CudaInstallation.getInstallPath()));		"--cuda-path=" + CudaInstallation.getInstallPath()));
break;		break;
}		}
}		}
		}

// Get the AMDGPU math libraries.		// Get the AMDGPU math libraries.
// FIXME: This method is bad, remove once AMDGPU has a proper math library		// FIXME: This method is bad, remove once AMDGPU has a proper math library
// (see AMDGCN::OpenMPLinker::constructLLVMLinkCommand).		// (see AMDGCN::OpenMPLinker::constructLLVMLinkCommand).
for (auto &I : llvm::make_range(OpenMPTCRange.first, OpenMPTCRange.second)) {		for (auto &I : llvm::make_range(OpenMPTCRange.first, OpenMPTCRange.second)) {
const ToolChain *TC = I.second;		const ToolChain *TC = I.second;

if (!TC->getTriple().isAMDGPU() || Args.hasArg(options::OPT_nogpulib))		if (!TC->getTriple().isAMDGPU() || Args.hasArg(options::OPT_nogpulib))
(123 lines not shown)

clang/test/Driver/cuda-openmp-driver.cu

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				MaskRay [Not Done]: clang-driver is unneeded.
				// REQUIRES: nvptx-registered-target

				// RUN: %clang -### -target x86_64-linux-gnu -nocudalib -ccc-print-bindings -fgpu-rdc \
				// RUN: -foffload-new-driver --offload-arch=sm_35 --offload-arch=sm_70 %s 2>&1 \
				// RUN: | FileCheck -check-prefix CHECK %s

				// CHECK: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[PTX_SM_35:.+]]"
				// CHECK: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_SM_35]]"], output: "[[CUBIN_SM_35:.+]]"
				// CHECK: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN_SM_35]]", "[[PTX_SM_35]]"], output: "[[FATBIN_SM_35:.+]]"
				MaskRay [Not Done]: Better to add -NEXT whenever applicable
				// CHECK: "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]"], output: "[[PTX_SM_70:.+]]"
				// CHECK: "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[PTX_SM_70:.+]]"], output: "[[CUBIN_SM_70:.+]]"
				// CHECK: "nvptx64-nvidia-cuda" - "NVPTX::Linker", inputs: ["[[CUBIN_SM_70]]", "[[PTX_SM_70:.+]]"], output: "[[FATBIN_SM_70:.+]]"
				// CHECK: "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT]]", "[[FATBIN_SM_35]]", "[[FATBIN_SM_70]]"], output: "[[HOST_OBJ:.+]]"
				// CHECK: "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
				tra [Not Done]: You probably want to check for `clang -cc1 ... -fgpu-rdc`, too.

Revision Contents

Diff 412908
