This is an archive of the discontinued LLVM Phabricator instance.


Differential D45212

Add HIP toolchain
ClosedPublic

Authored by yaxunl on Apr 3 2018, 9:27 AM.

Details

Reviewers

rjmccall
tra
hfinkel
Hahnfeld
t-tye

Commits

rGf614422da9bd: Add HIP toolchain
rL333484: Add HIP toolchain
rC333484: Add HIP toolchain

Summary

This patch adds HIP toolchain to support HIP language mode. It includes:

Create specific compiler jobs for HIP.

Choose specific libraries for HIP.

With contribution from Greg Rodgers.

Event Timeline

yaxunl created this revision. Apr 3 2018, 9:27 AM

Herald added subscribers: tpr, nhaehnle, jholewinski. Apr 3 2018, 9:27 AM

yaxunl edited the summary of this revision. Apr 3 2018, 9:31 AM

Fixed typo which causes lit test failures. Now all lit tests pass.

The other changes I see here seem reasonable, but please do split the patch.

include/clang/Basic/Cuda.h
61 ↗	(On Diff #140827)	Does this actually have anything to do with HIP? You have a lot of changes in this patch which seem to just be about supporting more GPU revisions.
include/clang/Driver/Options.td
557	Do we absolutely need the non-CUDA-related aliases here? We generally try to be good about namespacing extension-specific language options. I understand that you're probably trying to maintain command-line compatibility with some existing toolchain, but if it's possible to avoid this, I would be much happier.
include/clang/Driver/ToolChain.h
124	"Backend" is a really generic name for this thing that is probably hyper-specific to the CUDA translation model.

yaxunl retitled this revision from [HIP] Let CUDA toolchain support HIP language mode to [HIP] Let CUDA toolchain support HIP language mode and amdgpu. Apr 4 2018, 7:29 AM

Herald added subscribers: dstuttard, wdng, kzhuravl. Apr 4 2018, 7:29 AM

yaxunl marked an inline comment as done. Apr 4 2018, 7:50 AM

yaxunl added inline comments.

include/clang/Basic/Cuda.h
61 ↗	(On Diff #140827)	This patch not only adds support of HIP language mode but also adds support of amdgpu to CUDA toolchain. Currently HIP extension is only supported by amdgpu although in the future it may be supported by other targets.
include/clang/Driver/Options.td
557	There were discussions about a uniform clang option for offloading sub-archs http://lists.llvm.org/pipermail/llvm-dev/2017-February/109896.html the consensus seems to be -offload-arch or -offload-archs. This patch attempts to make that transition to use this new option. We can separate this change to a different patch though.
include/clang/Driver/ToolChain.h
124	Agree. This tool actually links a bunch of bitcode libraries (so-called device libraries). For non-GPU targets, this is usually unnecessary since they support ISA-level linking. However most GPU targets do not support that, therefore they need this stage. How about renaming it as BitcodeLink?

rjmccall added inline comments. Apr 4 2018, 9:10 AM

include/clang/Basic/Cuda.h
61 ↗	(On Diff #140827)	I understand that, but I think you can separate those two patches without too much difficulty.
include/clang/Driver/Options.td
557	I don't mind the `-offload-*` options; I'm more concerned about `--host-only` and so on.
include/clang/Driver/ToolChain.h
124	DeviceLibraryLink, maybe? I wouldn't want someone to think this was related to ordinary LTO.

Hahnfeld added a subscriber: Hahnfeld. Apr 4 2018, 10:04 AM

Hahnfeld added inline comments.

include/clang/Basic/Cuda.h
61 ↗	(On Diff #140827)	There already is D42800 with lots of unanswered comments, so those need to be addressed first.

yaxunl marked 4 inline comments as done. Apr 4 2018, 10:26 AM

yaxunl added inline comments.

include/clang/Basic/Cuda.h
61 ↗	(On Diff #140827)	Will split the changes for supporting amdgcn to another review.
61 ↗	(On Diff #140827)	Will address those comments in the split review.
include/clang/Driver/Options.td
557	Will drop those options for now. Since HIP is just a language mode of CUDA, it is OK to use CUDA options e.g. 'cuda-device-only' for HIP.
include/clang/Driver/ToolChain.h
124	That sounds good. Will change to that.

Revised by reviewers' comments, including comments from previous review.

This looks okay to me, but I think it would be better if someone with more expertise in the design of the driver and frontend code could review this.

lib/Driver/Driver.cpp
2284	The extra parens are weird here.

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

lib/Driver/ToolChains/Cuda.cpp
234	Will this override the user's value, e.g. `-std=c++14`?

yaxunl marked 2 inline comments as done. Apr 9 2018, 9:42 AM

yaxunl added inline comments.

lib/Driver/Driver.cpp
2284	will remove it.
lib/Driver/ToolChains/Cuda.cpp
234	No. We will add diagnostics in a separate patch.

gregrodgers added a subscriber: gregrodgers. Apr 9 2018, 10:25 AM

Separate file type changes to another patch.

In D45212#1060628, @Hahnfeld wrote:

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

I separated file type changes to https://reviews.llvm.org/D45489 and deferred some other changes for future.

It is kind of difficult to split this patch further since they depend on each other.

In D45212#1063186, @yaxunl wrote:

In D45212#1060628, @Hahnfeld wrote:

Can this revision be split further? The summary mentions many things that might make up multiple independent changes...

I separated file type changes to https://reviews.llvm.org/D45489 and deferred some other changes for future.

It is kind of difficult to split this patch further since they depend on each other.

You can add Related Revisions to make this easy to see.

yaxunl added parent revisions: D44984: [HIP] Add hip input kind and codegen for kernel launching, D45489: [HIP] Add input type for HIP. Apr 10 2018, 9:47 AM

I'd still prefer if someone with more driver-design expertise weighed in, but we might not have any specialists there.

LGTM, although you might consider changing your tests a bit: FileCheck recently added support for a -D argument where you can predefine variables in the command line. But that's just a suggestion, not something I'm asking you to do as part of review.

This revision is now accepted and ready to land. Apr 13 2018, 12:16 AM

In D45212#1066842, @rjmccall wrote:

I'd still prefer if someone with more driver-design expertise weighed in, but we might not have any specialists there.

I think you should at least give @tra the possibility to take a look. Last time we essentially ended up reverting a patch.

Sorry about the delay. I've been out for a few days.

include/clang/Driver/Options.td
556–557	The discussion you've mentioned appears to be unfinished. @jlebar has raised a valid point regarding -no-offload-arch[s] and his email went unanswered AFAICT. If you propose to generalize a way to specify offload targets, it should probably be done in a separate patch. I'd just stick with `--cuda-gpu-arch` for the time being until we figure out how we're going to handle target specification in general. It works well enough for the moment and the new options are not required for the functionality implemented by this patch.
lib/Driver/Driver.cpp
2354–2362	You really want to either have your own error message or change the existing one to something more generic. `unsupported use of NVPTX for host compilation.` will sound very confusing to someone who tries to compile for an AMD GPU.
3364	Would it make sense for HIP to have its own offloading kind? You seem to be adding similar checks in few other places.
3525–3527	`llvm::sys::path::extension` ?
3962–3969	I'm not sure why we do this here. Nor does it seem relevant to HIP.
4034–4035	Same here -- I'm not sure why we need this nor how it's related to HIP.
lib/Driver/ToolChains/Cuda.cpp
313	`LLC_OUTPUT` ?
317	Please use routines in `llvm::sys::path` here and in other places where you manipulate paths.
321–324	You could use .append({...}) CmdArgs2.append({"foo","bar","buz"});
397–400	This is rather awkward -- you're looking for /libdevice under all paths specified by -L, but there's no way to explicitly point to the directory with the bitcode library. If device library may be in a non-canonical location, I'd rather there was an explicit option to specify it. Furthermore, my understanding is that you will need to find these bitcode libraries during `-c` compilation. Using `-L` to derive bitcode search path during compilation looks wrong to me.
406	This (and other places where you're calculating libdevice path relative to driver dir) should probably be derived from the resource dir. Driver's path may not be the 'root' the compiler has been told to use.
408	LIBRARY_PATH sounds rather generic. Perhaps it should have HIP somewhere in its name.
423	Why do we need to silence the warnings?
459–462	This does not look like the right place to disable particular passes. Shouldn't it be done somewhere in LLVM?

This revision now requires changes to proceed. Apr 17 2018, 4:30 PM

yaxunl removed a parent revision: D45489: [HIP] Add input type for HIP. Apr 18 2018, 11:07 AM

yaxunl edited parent revisions, added: D45441: [HIP] Add predefined macros __HIPCC__ and __HIP_DEVICE_COMPILE__; removed: D44984: [HIP] Add hip input kind and codegen for kernel launching.

yaxunl marked 21 inline comments as done. Apr 19 2018, 7:25 PM

yaxunl added inline comments.

include/clang/Driver/Options.td
556–557	Will use `--cuda-gpu-arch` for this patch.
lib/Driver/Driver.cpp
2354–2362	Will generalise this error message.
3364	Yes it makes sense to let HIP have its own offloading kind. Will do.
3525–3527	will do
3962–3969	will remove it
4034–4035	will remove
lib/Driver/ToolChains/Cuda.cpp
313	will change to LLC_OUTPUT
317	will do
321–324	will do
397–400	Will use `--hip-device-lib-path=` and drop /libdevice.
406	will remove this
408	will use HIP_DEVICE_LIB_PATH
423	will remove this

yaxunl added inline comments. Apr 19 2018, 7:25 PM

lib/Driver/ToolChains/Cuda.cpp
459–462	These are not disabling the passes but adding these passes. They are some optimizations which are usually improving performance for amdgcn.

Revised by Artem's comments.

sync to ToT.

minor bug fix: do not add CUDA specific link options for HIP.

yaxunl marked an inline comment as done. Apr 20 2018, 8:26 AM

yaxunl added inline comments.

lib/Driver/ToolChains/Cuda.cpp
459–462	Since opt is now able to adjust passes based on -mtriple and -mcpu, will remove these manually added passes and add -mtriple and -mcpu instead.

Remove manually added passes from opt and add -mtriple.

ping

clean up code and separate action builder to another review.

yaxunl added a parent revision: D46476: [HIP] Add action builder for HIP. May 4 2018, 8:28 PM

yaxunl removed a parent revision: D45441: [HIP] Add predefined macros __HIPCC__ and __HIP_DEVICE_COMPILE__. May 4 2018, 8:42 PM

yaxunl added a child revision: D46489: [HIP] Let assembler output bitcode for amdgcn. May 4 2018, 9:06 PM

yaxunl added reviewers: hfinkel, Hahnfeld. May 15 2018, 10:00 AM

yaxunl removed a child revision: D46489: [HIP] Let assembler output bitcode for amdgcn. May 16 2018, 12:40 PM

ping

Hi Artem,

I've addressed your comments. Are any further changes needed? Thanks.

yaxunl added a reviewer: t-tye. May 18 2018, 1:42 PM

Hi,

Sorry about the long silence. I'm back to continue the reviews. I'll handle what I can today and will continue with the rest on Tuesday.

It looks like patch description needs to be updated:

Use clang-offload-bundler to create binary for device ISA.

I don't see anything related to offload-bundler in this patch any more.

include/clang/Driver/Options.td
579	I'm not sure about `i_Group`? This will cause this option to be passed to all preprocessor jobs. It will also be passed to host and device side compilations, while you probably only want/need it on device side only.
lib/Driver/ToolChains/Cuda.cpp
320	FullName is already result of Args.MakeArgString. You only need to do it once.
326	This function is too large to easily see that we're actually constructing sequence of commands. I'd probably split construction of individual tool's command line into its own function.
333	No need for the leading space in the message.
341–342	`for (const InputInfo &it : Inputs)` ?
347	All-caps name looks like a macro. Rename to `GfxName` ?
351–356	for (path : Args.getAllArgValues(...)) { LibraryPaths.push_back(Args.MakeArgString(path)); }
372–375	This is somewhat unreadable. Perhaps you could construct the name in a temp variable.
381	You don't need to use c_str() for MakeArgString. It will happily accept std::string.
391	`BitcodeOutputFile`?
414	I think you can get rid of the temp var here without hurting readability.
417	I wonder if we could derive temp file name from the input's name. This may make it easier to find relevant temp files if/when we need to debug the compilation process.
419	No need for c_str() here.
436	c_str(), again.
803	I'd just add something like this and leave the existing `if` unchanged: `// There's no need for CUDA-specific bitcode linking with HIP.` `if (DeviceOffloadingKind == Action::OFK_HIP) return;`
965–977	All these should be in the derived toolchain.
lib/Driver/ToolChains/Cuda.h
137–138	Where does amdgcn-link come from? Does it accept --options-file ?
181–182	You may want to derive your own HIPToolChain as you're unlikely to reuse NVPTX-specific tools.
202	In HIPToolchain this one would become an inline function returning true.
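
The suggestions above to split command construction per tool and to derive temp-file names from the input can be sketched roughly as follows. This is only an illustration in plain C++, not the patch's code: `Command`, the `add*Command` helpers, `buildDevicePipeline`, the file-name suffixes and the `gfx803` value used in the usage note are all hypothetical, and only the tool names and the flags that appear elsewhere in this thread (`-mcpu`, and the lld flags `-flavor gnu --no-undefined -shared -o`) are taken from the review.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver job: program name plus its arguments.
struct Command {
  std::string program;
  std::vector<std::string> args;
};

// Each helper appends exactly one command and returns the file it produces,
// so the caller can see that separate commands are being chained.
std::string addOptCommand(std::vector<Command> &cmds, const std::string &input,
                          const std::string &stem, const std::string &gpu) {
  std::string output = stem + "-optimized.bc";  // name derived from the input
  cmds.push_back({"opt", {input, "-mcpu=" + gpu, "-o", output}});
  return output;
}

std::string addLlcCommand(std::vector<Command> &cmds, const std::string &input,
                          const std::string &stem, const std::string &gpu) {
  std::string output = stem + "-isa.o";
  cmds.push_back({"llc", {input, "-mcpu=" + gpu, "-filetype=obj", "-o", output}});
  return output;
}

void addLldCommand(std::vector<Command> &cmds, const std::string &input,
                   const std::string &output) {
  cmds.push_back({"lld", {"-flavor", "gnu", "--no-undefined", "-shared", "-o",
                          output, input}});
}

// Chains opt -> llc -> lld; every stage gets its own named output file.
std::vector<Command> buildDevicePipeline(const std::string &linkedBitcode,
                                         const std::string &stem,
                                         const std::string &gpu) {
  assert(!stem.empty() && "temp-file names are derived from the input stem");
  std::vector<Command> cmds;
  const std::string optOutput = addOptCommand(cmds, linkedBitcode, stem, gpu);
  const std::string llcOutput = addLlcCommand(cmds, optOutput, stem, gpu);
  addLldCommand(cmds, llcOutput, stem + ".out");
  return cmds;
}
```

Calling `buildDevicePipeline("a-linked.bc", "a", "gfx803")` yields three commands whose inputs and outputs line up stage by stage, which is the property the reviewer wanted to be visible in the code.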

One more thing -- it would be really good to add some tests to make sure your commands are constructed the way you want.

In D45212#1105177, @tra wrote:

Hi,

Sorry about the long silence. I'm back to continue the reviews. I'll handle what I can today and will continue with the rest on Tuesday.

It looks like patch description needs to be updated:

Use clang-offload-bundler to create binary for device ISA.

I don't see anything related to offload-bundler in this patch any more.

You are right. Using clang-offload-bundler to create binary for device ISA has been moved to another patch. Will update the description of this patch.

In D45212#1105282, @tra wrote:

One more thing -- it would be really good to add some tests to make sure your commands are constructed the way you want.

will do

include/clang/Driver/Options.td
579	will change to Link_Group
lib/Driver/ToolChains/Cuda.cpp
320	will fix
326	will do
333	will fix.
341–342	will fix
347	will fix
351–356	will fix
372–375	will do
381	will fix
391	will change
414	will do
417	will do
419	will do
436	will fix
803	will change
965–977	will do
lib/Driver/ToolChains/Cuda.h
137–138	amdgcn-link is the short name of the amdgcn bitcode linker. It is not a real program and does not support response file. Will remove the arguments about response file.
181–182	Will do
202	will do

Revised by Artem's comments.

Herald added a subscriber: mgorny. May 23 2018, 8:05 AM

tra added inline comments. May 23 2018, 12:13 PM

lib/Driver/ToolChains/HIP.cpp
29–47 ↗	(On Diff #148216)	FullName may remain uninitialized if LibraryPaths are empty which will probably crash compiler when you attempt to pass it to MakeArgString. If empty LibraryPaths is not expected there should be an assert. If the library is not found, we issue an error, but we still proceed to append the FullName to the CmdArgs. I don't think we should do that. FullName will be either NULL or pointing to the last directory in the LibraryPaths. You seem to be relying on diagnostics to deal with errors and are not using return value of the function. You may as well make it void. I'd move `CmdArgs.push_back(...)` under `if(::exists(FullName))` and change `break` to `return`; Then you can get rid of FoundLibDevice and just issue the error if we ever reach the end of the function.
79–81 ↗	(On Diff #148216)	No need for intermediate values here -- just '+' all parts together.
133 ↗	(On Diff #148216)	Nit: I think explicit llvm::Twine is unnecessary here.
155–160 ↗	(On Diff #148216)	Nit: This could be collapsed into `ArgStringList LlcArgs({...});`
179–181 ↗	(On Diff #148216)	Same here: `ArgStringList LldArgs({"-flavor", "gnu", "--no-undefined", "-shared", "-o"});`
212–215 ↗	(On Diff #148216)	Right now the code is structured as if you're appending to the same TempFile string which is not the case here. I'd give intermediate variables their own names -- `OptCommand`,`LlcCommand`. This would make it easier to see that you are chaining separate commands, each producing its own temp output file.
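
The restructuring requested for lines 29–47 (append the path only when the file exists, return from inside the loop, report the failure once at the end) can be sketched as below. This is a standalone illustration, not the code from HIP.cpp: `ArgList`, `addDeviceLib` and the injected `exists` predicate are hypothetical, and the function returns `bool` where the real driver would emit a diagnostic and return `void`, purely so the behaviour can be checked.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's argument list.
using ArgList = std::vector<std::string>;

// Appends the first existing "<dir>/<libName>" to cmdArgs and returns true.
// Returns false if no directory contains the library; cmdArgs is left
// untouched in that case, so no bogus path is ever passed on.
// `exists` is injected so the sketch does not touch the real file system.
bool addDeviceLib(const std::vector<std::string> &libraryPaths,
                  const std::string &libName,
                  const std::function<bool(const std::string &)> &exists,
                  ArgList &cmdArgs) {
  assert(!libName.empty() && "library name must not be empty");
  for (const std::string &dir : libraryPaths) {
    const std::string fullName = dir + "/" + libName;
    if (exists(fullName)) {
      cmdArgs.push_back(fullName);
      return true;  // found: no separate "found" flag is needed
    }
  }
  // Reached only when nothing was found, including the empty-list case.
  return false;
}
```

Because the full name is built inside the loop, it can never be used uninitialized, and an empty search list simply falls through to the failure path.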

yaxunl marked 6 inline comments as done. May 23 2018, 1:31 PM

yaxunl added inline comments.

lib/Driver/ToolChains/HIP.cpp
29–47 ↗	(On Diff #148216)	Will move CmdArgs.push_back(...) under if(::exists(FullName)), change break to return, and change the return type to void.
79–81 ↗	(On Diff #148216)	will do
133 ↗	(On Diff #148216)	will remove
155–160 ↗	(On Diff #148216)	will do
179–181 ↗	(On Diff #148216)	will do
212–215 ↗	(On Diff #148216)	will do

Revised by Artem's comments.

One small nit. LGTM otherwise.

lib/Driver/ToolChains/HIP.cpp
44 ↗	(On Diff #148277)	You don't need FoundLibDevice any more as you will always return from inside the loop if it is ever true.

This revision is now accepted and ready to land. May 23 2018, 1:42 PM

yaxunl marked an inline comment as done. May 23 2018, 1:47 PM

yaxunl added inline comments.

lib/Driver/ToolChains/HIP.cpp
44 ↗	(On Diff #148277)	will remove when committing. Thanks!

Closed by commit rC333484: Add HIP toolchain (authored by yaxunl). May 29 2018, 5:57 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked an inline comment as done.

Revision Contents

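Path	Size
include/clang/Basic/DiagnosticDriverKinds.td	2 lines
include/clang/Driver/Action.h	1 line
include/clang/Driver/Options.td	4 lines
include/clang/Driver/ToolChain.h	3 lines
lib/Driver/Action.cpp	9 lines
lib/Driver/Compilation.cpp	8 lines
lib/Driver/Driver.cpp	137 lines
lib/Driver/ToolChain.cpp	24 lines
lib/Driver/ToolChains/Clang.cpp	42 lines
lib/Driver/ToolChains/Cuda.h	48 lines
lib/Driver/ToolChains/Cuda.cpp	215 lines
test/Driver/cuda-bad-arch.cu	8 lines
test/Driver/cuda-phases.cu	290 lines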

Diff 143320

include/clang/Basic/DiagnosticDriverKinds.td

 ...
 def err_drv_no_cuda_libdevice : Error<
   "cannot find libdevice for %0. Provide path to different CUDA installation "
   "via --cuda-path, or pass -nocudalib to build without linking with libdevice.">;
 def err_drv_cuda_version_unsupported : Error<
   "GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "
   "but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
   "install, pass a different GPU arch with --cuda-gpu-arch, or pass "
   "--no-cuda-version-check.">;
-def err_drv_cuda_nvptx_host : Error<"unsupported use of NVPTX for host compilation.">;
+def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
 def err_drv_invalid_thread_model_for_target : Error<
   "invalid thread model '%0' in '%1' for this target">;
 def err_drv_invalid_linker_name : Error<
   "invalid linker name in argument '%0'">;
 def err_drv_invalid_pgo_instrumentor : Error<
   "invalid PGO instrumentor in argument '%0'">;
 def err_drv_invalid_rtlib_name : Error<
   "invalid runtime library name in argument '%0'">;
 ...

include/clang/Driver/Action.h

 ...
 enum OffloadKind {
   OFK_None = 0x00,

   // The host offloading tool chain.
   OFK_Host = 0x01,

   // The device offloading tool chains - one bit for each programming model.
   OFK_Cuda = 0x02,
   OFK_OpenMP = 0x04,
+  OFK_HIP = 0x08,
 };

 static const char *getClassName(ActionClass AC);

 private:
   ActionClass Kind;

   /// The output type of this action.
 ...
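
Since the `OffloadKind` hunk above gives each programming model its own bit, several kinds can be combined into one mask and tested individually. The sketch below shows that idea in isolation. The enumerator values are copied from the hunk, while the fixed underlying type and the helpers `hasOffloadKind` and `hasDeviceOffload` are hypothetical and not part of clang's `Action` class.

```cpp
#include <cassert>

// Enumerator values as in the Action.h hunk; the underlying type is an
// assumption made for this sketch.
enum OffloadKind : unsigned {
  OFK_None = 0x00,
  OFK_Host = 0x01,
  OFK_Cuda = 0x02,
  OFK_OpenMP = 0x04,
  OFK_HIP = 0x08,
};

// Hypothetical helper: true if `mask` contains the bit for `kind`.
constexpr bool hasOffloadKind(unsigned mask, OffloadKind kind) {
  return (mask & kind) != 0;
}

// Hypothetical helper: true if any device-side programming model is active.
constexpr bool hasDeviceOffload(unsigned mask) {
  return (mask & (OFK_Cuda | OFK_OpenMP | OFK_HIP)) != 0;
}
```

With a dedicated `OFK_HIP` bit, a check such as `hasOffloadKind(mask, OFK_HIP)` can distinguish HIP from CUDA jobs directly, which is what the request for HIP to have its own offloading kind was aiming at.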

include/clang/Driver/Options.td

 ...
 def cuda_compile_host_device : Flag<["--"], "cuda-compile-host-device">,
   HelpText<"Compile CUDA code for both host and device (default). Has no "
            "effect on non-CUDA compilations.">;
 def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[DriverOption]>,
   HelpText<"Include PTX for the follwing GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
 def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[DriverOption]>,
   HelpText<"Do not include PTX for the follwing GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
 def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, Flags<[DriverOption]>,
-  HelpText<"CUDA GPU architecture (e.g. sm_35). May be specified more than once.">;
+  HelpText<"CUDA/HIP GPU architecture (e.g. sm_35). May be specified more than once.">;
 def no_cuda_gpu_arch_EQ : Joined<["--"], "no-cuda-gpu-arch=">, Flags<[DriverOption]>,
   HelpText<"Remove GPU architecture (e.g. sm_35) from the list of GPUs to compile for. "
            "'all' resets the list to its default value.">;
 def cuda_noopt_device_debug : Flag<["--"], "cuda-noopt-device-debug">,
   HelpText<"Enable device-side debug info generation. Disables ptxas optimizations.">;
 def no_cuda_version_check : Flag<["--"], "no-cuda-version-check">,
   HelpText<"Don't error out if the detected version of the CUDA install is "
            "too low for the requested CUDA gpu architecture.">;
 def no_cuda_noopt_device_debug : Flag<["--"], "no-cuda-noopt-device-debug">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,
   HelpText<"CUDA installation path">;
 def cuda_path_ignore_env : Flag<["--"], "cuda-path-ignore-env">, Group<i_Group>,
   HelpText<"Ignore environment variables to detect CUDA installation">;
 def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Group<i_Group>,
   HelpText<"Path to ptxas (used for compiling CUDA code)">;
 def fcuda_flush_denormals_to_zero : Flag<["-"], "fcuda-flush-denormals-to-zero">,
   Flags<[CC1Option]>, HelpText<"Flush denormal floating point values to zero in CUDA device mode.">;
 def fno_cuda_flush_denormals_to_zero : Flag<["-"], "fno-cuda-flush-denormals-to-zero">;
 def fcuda_approx_transcendentals : Flag<["-"], "fcuda-approx-transcendentals">,
   Flags<[CC1Option]>, HelpText<"Use approximate transcendental functions">;
 def fno_cuda_approx_transcendentals : Flag<["-"], "fno-cuda-approx-transcendentals">;
 def fcuda_rdc : Flag<["-"], "fcuda-rdc">, Flags<[CC1Option, HelpHidden]>,
   HelpText<"Generate relocatable device code, also known as separate compilation mode.">;
 def fno_cuda_rdc : Flag<["-"], "fno-cuda-rdc">;
+def hip_device_lib_path_EQ : Joined<["--"], "hip-device-lib-path=">, Group<i_Group>,
+  HelpText<"HIP device library path">;
 def dA : Flag<["-"], "dA">, Group<d_Group>;
 def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,
   HelpText<"Print macro definitions in -E mode in addition to normal output">;
 def dI : Flag<["-"], "dI">, Group<d_Group>, Flags<[CC1Option]>,
   HelpText<"Print include directives in -E mode in addition to normal output">;
 def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,
   HelpText<"Print macro definitions in -E mode instead of normal output">;
 def dead__strip : Flag<["-"], "dead_strip">;
 ...

include/clang/Driver/ToolChain.h

Show First 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	private:

/// The list of toolchain specific path prefixes to search for files.		/// The list of toolchain specific path prefixes to search for files.
path_list FilePaths;		path_list FilePaths;

/// The list of toolchain specific path prefixes to search for programs.		/// The list of toolchain specific path prefixes to search for programs.
path_list ProgramPaths;		path_list ProgramPaths;

mutable std::unique_ptr<Tool> Clang;		mutable std::unique_ptr<Tool> Clang;
		mutable std::unique_ptr<Tool> DeviceLibraryLink;
		rjmccall: "Backend" is a really generic name for this thing that is probably hyper-specific to the CUDA translation model.
		yaxunl (author): Agree. This tool actually links a bunch of bitcode libraries (so-called device libraries). For non-GPU targets this is usually unnecessary, since they support ISA-level linking. Most GPU targets do not support that, however, so they need this stage. How about renaming it to BitcodeLink?
		rjmccall: DeviceLibraryLink, maybe? I wouldn't want someone to think this was related to ordinary LTO.
		yaxunl (author): That sounds good. Will change to that.
mutable std::unique_ptr<Tool> Assemble;		mutable std::unique_ptr<Tool> Assemble;
mutable std::unique_ptr<Tool> Link;		mutable std::unique_ptr<Tool> Link;
mutable std::unique_ptr<Tool> OffloadBundler;		mutable std::unique_ptr<Tool> OffloadBundler;

Tool *getClang() const;		Tool *getClang() const;
		Tool *getDeviceLibraryLink() const;
Tool *getAssemble() const;		Tool *getAssemble() const;
Tool *getLink() const;		Tool *getLink() const;
Tool *getClangAs() const;		Tool *getClangAs() const;
Tool *getOffloadBundler() const;		Tool *getOffloadBundler() const;

mutable std::unique_ptr<SanitizerArgs> SanitizerArguments;		mutable std::unique_ptr<SanitizerArgs> SanitizerArguments;
mutable std::unique_ptr<XRayArgs> XRayArguments;		mutable std::unique_ptr<XRayArgs> XRayArguments;

Show All 9 Lines	protected:
MultilibSet Multilibs;		MultilibSet Multilibs;

ToolChain(const Driver &D, const llvm::Triple &T,		ToolChain(const Driver &D, const llvm::Triple &T,
const llvm::opt::ArgList &Args);		const llvm::opt::ArgList &Args);

void setTripleEnvironment(llvm::Triple::EnvironmentType Env);		void setTripleEnvironment(llvm::Triple::EnvironmentType Env);

virtual Tool *buildAssembler() const;		virtual Tool *buildAssembler() const;
		virtual Tool *buildDeviceLibraryLinker() const;
virtual Tool *buildLinker() const;		virtual Tool *buildLinker() const;
virtual Tool *getTool(Action::ActionClass AC) const;		virtual Tool *getTool(Action::ActionClass AC) const;

/// \name Utilities for implementing subclasses.		/// \name Utilities for implementing subclasses.
///@{		///@{
static void addSystemInclude(const llvm::opt::ArgList &DriverArgs,		static void addSystemInclude(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
const Twine &Path);		const Twine &Path);

lib/Driver/Action.cpp

Show First 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	case OFK_None:
break;		break;
case OFK_Host:		case OFK_Host:
llvm_unreachable("Host kind is not an offloading device kind.");		llvm_unreachable("Host kind is not an offloading device kind.");
break;		break;
case OFK_Cuda:		case OFK_Cuda:
return "device-cuda";		return "device-cuda";
case OFK_OpenMP:		case OFK_OpenMP:
return "device-openmp";		return "device-openmp";
		case OFK_HIP:
		return "device-hip";

// TODO: Add other programming models here.		// TODO: Add other programming models here.
}		}

if (!ActiveOffloadKindMask)		if (!ActiveOffloadKindMask)
return {};		return {};

std::string Res("host");		std::string Res("host");
		assert(!((ActiveOffloadKindMask & OFK_Cuda) &&
		(ActiveOffloadKindMask & OFK_HIP)) &&
		"Cannot offload CUDA and HIP at the same time");
if (ActiveOffloadKindMask & OFK_Cuda)		if (ActiveOffloadKindMask & OFK_Cuda)
Res += "-cuda";		Res += "-cuda";
		if (ActiveOffloadKindMask & OFK_HIP)
		Res += "-hip";
if (ActiveOffloadKindMask & OFK_OpenMP)		if (ActiveOffloadKindMask & OFK_OpenMP)
Res += "-openmp";		Res += "-openmp";

// TODO: Add other programming models here.		// TODO: Add other programming models here.

return Res;		return Res;
}		}

Show All 20 Lines	StringRef Action::GetOffloadKindName(OffloadKind Kind) {
switch (Kind) {		switch (Kind) {
case OFK_None:		case OFK_None:
case OFK_Host:		case OFK_Host:
return "host";		return "host";
case OFK_Cuda:		case OFK_Cuda:
return "cuda";		return "cuda";
case OFK_OpenMP:		case OFK_OpenMP:
return "openmp";		return "openmp";
		case OFK_HIP:
		return "hip";

// TODO: Add other programming models here.		// TODO: Add other programming models here.
}		}

llvm_unreachable("invalid offload kind");		llvm_unreachable("invalid offload kind");
}		}

void InputAction::anchor() {}		void InputAction::anchor() {}
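
Note on the Action.cpp change above: the host-side prefix is built from a bitmask of active device offload kinds, and the new assertion rejects a mask with both CUDA and HIP set. The following is a minimal standalone sketch of that logic, not the driver code itself. The `OffloadKind` enum and its bit values are stand-ins chosen for illustration, and `hostOffloadPrefix` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the driver's offload kind enum; the bit values
// are illustrative assumptions, not taken from the patch.
enum OffloadKind : unsigned {
  OFK_None = 0x00,
  OFK_Host = 0x01,
  OFK_Cuda = 0x02,
  OFK_OpenMP = 0x04,
  OFK_HIP = 0x08,
};

// Builds the host-side prefix from a mask of active device offload kinds.
// An empty mask yields an empty string, mirroring the early return in the diff.
std::string hostOffloadPrefix(unsigned ActiveOffloadKindMask) {
  if (!ActiveOffloadKindMask)
    return {};

  // CUDA and HIP are mutually exclusive, as asserted in the patch.
  assert(!((ActiveOffloadKindMask & OFK_Cuda) &&
           (ActiveOffloadKindMask & OFK_HIP)) &&
         "Cannot offload CUDA and HIP at the same time");

  std::string Res("host");
  if (ActiveOffloadKindMask & OFK_Cuda)
    Res += "-cuda";
  if (ActiveOffloadKindMask & OFK_HIP)
    Res += "-hip";
  if (ActiveOffloadKindMask & OFK_OpenMP)
    Res += "-openmp";
  return Res;
}
```

Under these assumptions, a mask of `OFK_HIP | OFK_OpenMP` gives `host-hip-openmp`, because the suffixes are appended in the fixed order CUDA, HIP, OpenMP.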

lib/Driver/Compilation.cpp


	using FailingCommandList = SmallVectorImpl<std::pair<int, const Command *>>;			using FailingCommandList = SmallVectorImpl<std::pair<int, const Command *>>;

	static bool ActionFailed(const Action *A,			static bool ActionFailed(const Action *A,
	const FailingCommandList &FailingCommands) {			const FailingCommandList &FailingCommands) {
	if (FailingCommands.empty())			if (FailingCommands.empty())
	return false;			return false;

	// CUDA can have the same input source code compiled multiple times so do not			// CUDA/HIP can have the same input source code compiled multiple times so do
	// compiled again if there are already failures. It is OK to abort the CUDA			// not compiled again if there are already failures. It is OK to abort the
	// pipeline on errors.			// CUDA pipeline on errors.
	if (A->isOffloading(Action::OFK_Cuda))			if (A->isOffloading(Action::OFK_Cuda) || A->isOffloading(Action::OFK_HIP))
	return true;			return true;

	for (const auto &CI : FailingCommands)			for (const auto &CI : FailingCommands)
	if (A == &(CI.second->getSource()))			if (A == &(CI.second->getSource()))
	return true;			return true;

	for (const auto *AI : A->inputs())			for (const auto *AI : A->inputs())
	if (ActionFailed(AI, FailingCommands))			if (ActionFailed(AI, FailingCommands))

lib/Driver/Driver.cpp

Show First 20 Lines • Show All 529 Lines • ▼ Show 20 Lines	Driver::OpenMPRuntimeKind Driver::getOpenMPRuntime(const ArgList &Args) const {

return RT;		return RT;
}		}

void Driver::CreateOffloadingDeviceToolChains(Compilation &C,		void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
InputList &Inputs) {		InputList &Inputs) {

//		//
// CUDA		// CUDA/HIP
//		//
// We need to generate a CUDA toolchain if any of the inputs has a CUDA type.		// We need to generate a CUDA toolchain if any of the inputs has a CUDA type.
if (llvm::any_of(Inputs, [](std::pair<types::ID, const llvm::opt::Arg *> &I) {		// ToDo: Handle mixed CUDA/HIP input files and -x hip option. Diagnose
		// CUDA on amdgcn and HIP on nvptx.
		bool IsHIP =
		C.getInputArgs().hasArg(options::OPT_x) &&
		StringRef(C.getInputArgs().getLastArg(options::OPT_x)->getValue()) ==
		"hip";
		if (llvm::any_of(Inputs,
		[](std::pair<types::ID, const llvm::opt::Arg *> &I) {
return types::isCuda(I.first);		return types::isCuda(I.first);
})) {		}) ||
		IsHIP) {
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();		const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
const llvm::Triple &HostTriple = HostTC->getTriple();		const llvm::Triple &HostTriple = HostTC->getTriple();
llvm::Triple CudaTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"		StringRef DeviceTripleStr;
: "nvptx-nvidia-cuda");		auto OFK = IsHIP ? Action::OFK_HIP : Action::OFK_Cuda;
// Use the CUDA and host triples as the key into the ToolChains map, because		if (IsHIP) {
// the device toolchain we create depends on both.		// HIP is only supported on amdgcn.
		DeviceTripleStr = "amdgcn-amd-amdhsa";
		} else {
		// CUDA is only supported on nvptx.
		DeviceTripleStr = HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
		: "nvptx-nvidia-cuda";
		}
		llvm::Triple CudaTriple(DeviceTripleStr);
		// Use the CUDA/HIP and host triples as the key into the ToolChains map,
		// because the device toolchain we create depends on both.
auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];		auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];
if (!CudaTC) {		if (!CudaTC) {
CudaTC = llvm::make_unique<toolchains::CudaToolChain>(		CudaTC = llvm::make_unique<toolchains::CudaToolChain>(
this, CudaTriple, HostTC, C.getInputArgs(), Action::OFK_Cuda);		this, CudaTriple, HostTC, C.getInputArgs(), OFK);
}		}
C.addOffloadDeviceToolChain(CudaTC.get(), Action::OFK_Cuda);		C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
}		}
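
Note on the hunk above: the new side picks the device triple from the language mode (amdgcn for HIP, nvptx or nvptx64 for CUDA depending on the host width) and keys the toolchain cache on the device and host triples together. A minimal standalone sketch of that selection follows. `deviceTripleFor` and `toolChainKey` are hypothetical helper names used only for illustration and do not appear in the patch.

```cpp
#include <string>

// Hypothetical helper: picks the device triple string the way the CUDA/HIP
// branch in the diff does.
std::string deviceTripleFor(bool IsHIP, bool HostIs64Bit) {
  if (IsHIP)
    return "amdgcn-amd-amdhsa"; // HIP is only supported on amdgcn.
  // CUDA is only supported on nvptx; the width follows the host.
  return HostIs64Bit ? "nvptx64-nvidia-cuda" : "nvptx-nvidia-cuda";
}

// Hypothetical helper: the cache key is the device triple, a '/', and the
// host triple, because the device toolchain depends on both.
std::string toolChainKey(const std::string &DeviceTriple,
                         const std::string &HostTriple) {
  return DeviceTriple + "/" + HostTriple;
}
```

Because the host triple is part of the key, the same device triple paired with two different hosts would map to two separate cache entries.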

//		//
// OpenMP		// OpenMP
//		//
// We need to generate an OpenMP toolchain if the user specified targets with		// We need to generate an OpenMP toolchain if the user specified targets with
// the -fopenmp-targets option.		// the -fopenmp-targets option.
if (Arg *OpenMPTargets =		if (Arg *OpenMPTargets =
▲ Show 20 Lines • Show All 1,548 Lines • ▼ Show 20 Lines	public:
bool isValid() { return !ToolChains.empty(); }		bool isValid() { return !ToolChains.empty(); }

/// Return the associated offload kind.		/// Return the associated offload kind.
Action::OffloadKind getAssociatedOffloadKind() {		Action::OffloadKind getAssociatedOffloadKind() {
return AssociatedOffloadKind;		return AssociatedOffloadKind;
}		}
};		};

/// \brief CUDA action builder. It injects device code in the host backend		/// \brief base class for CUDA/HIP action builder. It injects device code in
/// action.		/// the host backend action.
class CudaActionBuilder final : public DeviceActionBuilder {		class CudaActionBuilderBase : public DeviceActionBuilder {
/// Flags to signal if the user requested host-only or device-only		/// Flags to signal if the user requested host-only or device-only
/// compilation.		/// compilation.
bool CompileHostOnly = false;		bool CompileHostOnly = false;
bool CompileDeviceOnly = false;		bool CompileDeviceOnly = false;

/// List of GPU architectures to use in this compilation.		/// List of GPU architectures to use in this compilation.
SmallVector<CudaArch, 4> GpuArchList;		SmallVector<CudaArch, 4> GpuArchList;

/// The CUDA actions for the current input.		/// The CUDA actions for the current input.
ActionList CudaDeviceActions;		ActionList CudaDeviceActions;

/// The CUDA fat binary if it was generated for the current input.		/// The CUDA fat binary if it was generated for the current input.
Action *CudaFatBinary = nullptr;		Action *CudaFatBinary = nullptr;

/// Flag that is set to true if this builder acted on the current input.		/// Flag that is set to true if this builder acted on the current input.
bool IsActive = false;		bool IsActive = false;

		/// The offload kind for CUDA/HIP.
		Action::OffloadKind OFK;

public:		public:
CudaActionBuilder(Compilation &C, DerivedArgList &Args,		CudaActionBuilderBase(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs,
: DeviceActionBuilder(C, Args, Inputs, Action::OFK_Cuda) {}		Action::OffloadKind OFKind)
		: DeviceActionBuilder(C, Args, Inputs, OFKind), OFK(OFKind) {}

ActionBuilderReturnCode		ActionBuilderReturnCode
getDeviceDependences(OffloadAction::DeviceDependences &DA,		getDeviceDependences(OffloadAction::DeviceDependences &DA,
phases::ID CurPhase, phases::ID FinalPhase,		phases::ID CurPhase, phases::ID FinalPhase,
PhasesTy &Phases) override {		PhasesTy &Phases) override {
if (!IsActive)		if (!IsActive)
return ABRT_Inactive;		return ABRT_Inactive;

Show All 22 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
// Skip the phases that were already dealt with.		// Skip the phases that were already dealt with.
if (Ph < CurPhase)		if (Ph < CurPhase)
continue;		continue;
// We have to be consistent with the host final phase.		// We have to be consistent with the host final phase.
if (Ph > FinalPhase)		if (Ph > FinalPhase)
break;		break;

CudaDeviceActions[I] = C.getDriver().ConstructPhaseAction(		CudaDeviceActions[I] = C.getDriver().ConstructPhaseAction(
C, Args, Ph, CudaDeviceActions[I], Action::OFK_Cuda);		C, Args, Ph, CudaDeviceActions[I], OFK);

if (Ph == phases::Assemble)		if (Ph == phases::Assemble)
break;		break;
}		}

// If we didn't reach the assemble phase, we can't generate the fat		// If we didn't reach the assemble phase, we can't generate the fat
// binary. We don't need to generate the fat binary if we are not in		// binary. We don't need to generate the fat binary if we are not in
// device-only mode.		// device-only mode.
if (!isa<AssembleJobAction>(CudaDeviceActions[I]) ||		if (!isa<AssembleJobAction>(CudaDeviceActions[I]) ||
CompileDeviceOnly)		CompileDeviceOnly)
continue;		continue;

Action *AssembleAction = CudaDeviceActions[I];		Action *AssembleAction = CudaDeviceActions[I];
assert(AssembleAction->getType() == types::TY_Object);		assert(AssembleAction->getType() == types::TY_Object);
assert(AssembleAction->getInputs().size() == 1);		assert(AssembleAction->getInputs().size() == 1);

Action *BackendAction = AssembleAction->getInputs()[0];		Action *BackendAction = AssembleAction->getInputs()[0];
assert(BackendAction->getType() == types::TY_PP_Asm);		assert(BackendAction->getType() == types::TY_PP_Asm);

for (auto &A : {AssembleAction, BackendAction}) {		for (auto &A : {AssembleAction, BackendAction}) {
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(A, ToolChains.front(), CudaArchToString(GpuArchList[I]),		DDep.add(A, ToolChains.front(), CudaArchToString(GpuArchList[I]),
Action::OFK_Cuda);		OFK);
DeviceActions.push_back(		DeviceActions.push_back(
C.MakeAction<OffloadAction>(DDep, A->getType()));		C.MakeAction<OffloadAction>(DDep, A->getType()));
}		}
}		}

// We generate the fat binary if we have device input actions.		// We generate the fat binary if we have device input actions.
if (!DeviceActions.empty()) {		if (!DeviceActions.empty()) {
CudaFatBinary =		CudaFatBinary =
C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN);		C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN);

if (!CompileDeviceOnly) {		if (!CompileDeviceOnly) {
DA.add(CudaFatBinary, ToolChains.front(), /*BoundArch=*/nullptr,		DA.add(CudaFatBinary, ToolChains.front(), /*BoundArch=*/nullptr,
Action::OFK_Cuda);		OFK);
// Clear the fat binary, it is already a dependence to an host		// Clear the fat binary, it is already a dependence to an host
// action.		// action.
CudaFatBinary = nullptr;		CudaFatBinary = nullptr;
}		}

// Remove the CUDA actions as they are already connected to an host		// Remove the CUDA actions as they are already connected to an host
// action or fat binary.		// action or fat binary.
CudaDeviceActions.clear();		CudaDeviceActions.clear();
Show All 28 Lines	ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
// the host uses the CUDA offload kind.		// the host uses the CUDA offload kind.
if (auto *IA = dyn_cast<InputAction>(HostAction)) {		if (auto *IA = dyn_cast<InputAction>(HostAction)) {
assert(!GpuArchList.empty() &&		assert(!GpuArchList.empty() &&
"We should have at least one GPU architecture.");		"We should have at least one GPU architecture.");

// If the host input is not CUDA or HIP, we don't need to bother about		// If the host input is not CUDA or HIP, we don't need to bother about
// this input.		// this input.
if (IA->getType() != types::TY_CUDA &&		if (IA->getType() != types::TY_CUDA &&
IA->getType() != types::TY_HIP) {		IA->getType() != types::TY_HIP) {
		rjmccall: The extra parens are weird here.
		yaxunl (author): will remove it.
// The builder will ignore this input.		// The builder will ignore this input.
IsActive = false;		IsActive = false;
return ABRT_Inactive;		return ABRT_Inactive;
}		}

// Set the flag to true, so that the builder acts on the current input.		// Set the flag to true, so that the builder acts on the current input.
IsActive = true;		IsActive = true;

Show All 13 Lines	ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {

return IsActive ? ABRT_Success : ABRT_Inactive;		return IsActive ? ABRT_Success : ABRT_Inactive;
}		}

void appendTopLevelActions(ActionList &AL) override {		void appendTopLevelActions(ActionList &AL) override {
// Utility to append actions to the top level list.		// Utility to append actions to the top level list.
auto AddTopLevel = [&](Action *A, CudaArch BoundArch) {		auto AddTopLevel = [&](Action *A, CudaArch BoundArch) {
OffloadAction::DeviceDependences Dep;		OffloadAction::DeviceDependences Dep;
Dep.add(A, ToolChains.front(), CudaArchToString(BoundArch),		Dep.add(A, ToolChains.front(), CudaArchToString(BoundArch), OFK);
Action::OFK_Cuda);
AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));		AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));
};		};

// If we have a fat binary, add it to the list.		// If we have a fat binary, add it to the list.
if (CudaFatBinary) {		if (CudaFatBinary) {
AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);		AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);
CudaDeviceActions.clear();		CudaDeviceActions.clear();
CudaFatBinary = nullptr;		CudaFatBinary = nullptr;
Show All 12 Lines	void appendTopLevelActions(ActionList &AL) override {
"Expecting to have a sing CUDA toolchain.");		"Expecting to have a sing CUDA toolchain.");
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)		for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
AddTopLevel(CudaDeviceActions[I], GpuArchList[I]);		AddTopLevel(CudaDeviceActions[I], GpuArchList[I]);

CudaDeviceActions.clear();		CudaDeviceActions.clear();
}		}

bool initialize() override {		bool initialize() override {
		assert(OFK == Action::OFK_Cuda || OFK == Action::OFK_HIP);

// We don't need to support CUDA.		// We don't need to support CUDA.
if (!C.hasOffloadToolChain<Action::OFK_Cuda>())		if (OFK == Action::OFK_Cuda && !C.hasOffloadToolChain<Action::OFK_Cuda>())
		return false;

		// We don't need to support HIP.
		if (OFK == Action::OFK_HIP && !C.hasOffloadToolChain<Action::OFK_HIP>())
return false;		return false;

const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();		const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
assert(HostTC && "No toolchain for host compilation.");		assert(HostTC && "No toolchain for host compilation.");
if (HostTC->getTriple().isNVPTX()) {		if (HostTC->getTriple().isNVPTX() ||
// We do not support targeting NVPTX for host compilation. Throw		HostTC->getTriple().getArch() == llvm::Triple::amdgcn) {
		// We do not support targeting NVPTX/AMDGCN for host compilation. Throw
// an error and abort pipeline construction early so we don't trip		// an error and abort pipeline construction early so we don't trip
// asserts that assume device-side compilation.		// asserts that assume device-side compilation.
C.getDriver().Diag(diag::err_drv_cuda_nvptx_host);		C.getDriver().Diag(diag::err_drv_cuda_host_arch)
		<< HostTC->getTriple().getArchName();
return true;		return true;
		tra: You really want to either have your own error message or change the existing one to something more generic. `unsupported use of NVPTX for host compilation.` will sound very confusing to someone who tries to compile for AMD GPU.
		yaxunl (author): Will generalise this error message.
}		}

ToolChains.push_back(C.getSingleOffloadToolChain<Action::OFK_Cuda>());		ToolChains.push_back(
		OFK == Action::OFK_Cuda
		? C.getSingleOffloadToolChain<Action::OFK_Cuda>()
		: C.getSingleOffloadToolChain<Action::OFK_HIP>());

Arg *PartialCompilationArg = Args.getLastArg(		Arg *PartialCompilationArg = Args.getLastArg(
options::OPT_cuda_host_only, options::OPT_cuda_device_only,		options::OPT_cuda_host_only, options::OPT_cuda_device_only,
options::OPT_cuda_compile_host_device);		options::OPT_cuda_compile_host_device);
CompileHostOnly = PartialCompilationArg &&		CompileHostOnly = PartialCompilationArg &&
PartialCompilationArg->getOption().matches(		PartialCompilationArg->getOption().matches(
options::OPT_cuda_host_only);		options::OPT_cuda_host_only);
CompileDeviceOnly = PartialCompilationArg &&		CompileDeviceOnly = PartialCompilationArg &&
Show All 36 Lines	bool initialize() override {
// suboptimally, on all newer GPUs.		// suboptimally, on all newer GPUs.
if (GpuArchList.empty())		if (GpuArchList.empty())
GpuArchList.push_back(CudaArch::SM_20);		GpuArchList.push_back(CudaArch::SM_20);

return Error;		return Error;
}		}
};		};

		/// \brief CUDA action builder. It injects device code in the host backend
		/// action.
		class CudaActionBuilder final : public CudaActionBuilderBase {
		public:
		CudaActionBuilder(Compilation &C, DerivedArgList &Args,
		const Driver::InputList &Inputs)
		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) {}
		};

		/// \brief HIP action builder. It injects device code in the host backend
		/// action.
		class HIPActionBuilder final : public CudaActionBuilderBase {
		public:
		HIPActionBuilder(Compilation &C, DerivedArgList &Args,
		const Driver::InputList &Inputs)
		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {}
		};

/// OpenMP action builder. The host bitcode is passed to the device frontend		/// OpenMP action builder. The host bitcode is passed to the device frontend
/// and all the device linked images are passed to the host link phase.		/// and all the device linked images are passed to the host link phase.
class OpenMPActionBuilder final : public DeviceActionBuilder {		class OpenMPActionBuilder final : public DeviceActionBuilder {
/// The OpenMP actions for the current input.		/// The OpenMP actions for the current input.
ActionList OpenMPDeviceActions;		ActionList OpenMPDeviceActions;

/// The linker inputs obtained for each toolchain.		/// The linker inputs obtained for each toolchain.
SmallVector<ActionList, 8> DeviceLinkerInputs;		SmallVector<ActionList, 8> DeviceLinkerInputs;
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	OffloadingActionBuilder(Compilation &C, DerivedArgList &Args,
: C(C) {		: C(C) {
// Create a specialized builder for each device toolchain.		// Create a specialized builder for each device toolchain.

IsValid = true;		IsValid = true;

// Create a specialized builder for CUDA.		// Create a specialized builder for CUDA.
SpecializedBuilders.push_back(new CudaActionBuilder(C, Args, Inputs));		SpecializedBuilders.push_back(new CudaActionBuilder(C, Args, Inputs));

		// Create a specialized builder for HIP.
		SpecializedBuilders.push_back(new HIPActionBuilder(C, Args, Inputs));

// Create a specialized builder for OpenMP.		// Create a specialized builder for OpenMP.
SpecializedBuilders.push_back(new OpenMPActionBuilder(C, Args, Inputs));		SpecializedBuilders.push_back(new OpenMPActionBuilder(C, Args, Inputs));

//		//
// TODO: Build other specialized builders here.		// TODO: Build other specialized builders here.
//		//

// Initialize all the builders, keeping track of errors. If all valid		// Initialize all the builders, keeping track of errors. If all valid
▲ Show 20 Lines • Show All 655 Lines • ▼ Show 20 Lines	class ToolSelector final {

/// Set to true if the current toolchain refers to host actions.		/// Set to true if the current toolchain refers to host actions.
bool IsHostSelector;		bool IsHostSelector;

/// Set to true if save-temps and embed-bitcode functionalities are active.		/// Set to true if save-temps and embed-bitcode functionalities are active.
bool SaveTemps;		bool SaveTemps;
bool EmbedBitcode;		bool EmbedBitcode;

		/// Has LLVM IR input files (either bitcode or LLVM assembly).
		bool HasIRInputs;

/// Get previous dependent action or null if that does not exist. If		/// Get previous dependent action or null if that does not exist. If
/// \a CanBeCollapsed is false, that action must be legal to collapse or		/// \a CanBeCollapsed is false, that action must be legal to collapse or
/// null will be returned.		/// null will be returned.
const JobAction *getPrevDependentAction(const ActionList &Inputs,		const JobAction *getPrevDependentAction(const ActionList &Inputs,
ActionList &SavedOffloadAction,		ActionList &SavedOffloadAction,
bool CanBeCollapsed = true) {		bool CanBeCollapsed = true) {
// An option can be collapsed only if it has a single input.		// An option can be collapsed only if it has a single input.
if (Inputs.size() != 1)		if (Inputs.size() != 1)
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	return TC.useIntegratedAs() && !SaveTemps &&
!C.getArgs().hasArg(options::OPT__SLASH_FA) &&		!C.getArgs().hasArg(options::OPT__SLASH_FA) &&
!C.getArgs().hasArg(options::OPT__SLASH_Fa);		!C.getArgs().hasArg(options::OPT__SLASH_Fa);
}		}

/// Return true if a preprocessor action can be collapsed.		/// Return true if a preprocessor action can be collapsed.
bool canCollapsePreprocessorAction() const {		bool canCollapsePreprocessorAction() const {
return !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&		return !C.getArgs().hasArg(options::OPT_no_integrated_cpp) &&
!C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps &&		!C.getArgs().hasArg(options::OPT_traditional_cpp) && !SaveTemps &&
!C.getArgs().hasArg(options::OPT_rewrite_objc);		!HasIRInputs && !C.getArgs().hasArg(options::OPT_rewrite_objc);
}		}

/// Struct that relates an action with the offload actions that would be		/// Struct that relates an action with the offload actions that would be
/// collapsed with it.		/// collapsed with it.
struct JobActionInfo final {		struct JobActionInfo final {
/// The action this info refers to.		/// The action this info refers to.
const JobAction *JA = nullptr;		const JobAction *JA = nullptr;
/// The offload actions we need to take care off if this action is		/// The offload actions we need to take care off if this action is
/// collapsed.		/// collapsed.
ActionList SavedOffloadAction;		ActionList SavedOffloadAction;
};		};

/// Append collapsed offload actions from the give nnumber of elements in the		/// Append collapsed offload actions from the give nnumber of elements in the
/// action info array.		/// action info array.
static void AppendCollapsedOffloadAction(ActionList &CollapsedOffloadAction,		static void AppendCollapsedOffloadAction(ActionList &CollapsedOffloadAction,
ArrayRef<JobActionInfo> &ActionInfo,		ArrayRef<JobActionInfo> &ActionInfo,
unsigned ElementNum) {		unsigned ElementNum) {
assert(ElementNum <= ActionInfo.size() && "Invalid number of elements.");		assert(ElementNum <= ActionInfo.size() && "Invalid number of elements.");
for (unsigned I = 0; I < ElementNum; ++I)		for (unsigned I = 0; I < ElementNum; ++I)
CollapsedOffloadAction.append(ActionInfo[I].SavedOffloadAction.begin(),		CollapsedOffloadAction.append(ActionInfo[I].SavedOffloadAction.begin(),
ActionInfo[I].SavedOffloadAction.end());		ActionInfo[I].SavedOffloadAction.end());
}		}

/// Functions that attempt to perform the combining. They detect if that is		/// Functions that attempt to perform the combining. They detect if that is
/// legal, and if so they update the inputs \a Inputs and the offload action		/// legal, and if so they update the inputs \a Inputs and the offload action
		tra: Would it make sense for HIP to have its own offloading kind? You seem to be adding similar checks in a few other places.
		yaxunl (author): Yes, it makes sense to let HIP have its own offloading kind. Will do.
/// that were collapsed in \a CollapsedOffloadAction. A tool that deals with		/// that were collapsed in \a CollapsedOffloadAction. A tool that deals with
/// the combined action is returned. If the combining is not legal or if the		/// the combined action is returned. If the combining is not legal or if the
/// tool does not exist, null is returned.		/// tool does not exist, null is returned.
/// Currently three kinds of collapsing are supported:		/// Currently three kinds of collapsing are supported:
/// - Assemble + Backend + Compile;		/// - Assemble + Backend + Compile;
/// - Assemble + Backend ;		/// - Assemble + Backend ;
/// - Backend + Compile.		/// - Backend + Compile.
const Tool *		const Tool *
combineAssembleBackendCompile(ArrayRef<JobActionInfo> ActionInfo,		combineAssembleBackendCompile(ArrayRef<JobActionInfo> ActionInfo,
const ActionList *&Inputs,		const ActionList *&Inputs,
ActionList &CollapsedOffloadAction) {		ActionList &CollapsedOffloadAction) {
if (ActionInfo.size() < 3 || !canCollapseAssembleAction())		if (ActionInfo.size() < 3 || !canCollapseAssembleAction())
return nullptr;		return nullptr;
auto *AJ = dyn_cast<AssembleJobAction>(ActionInfo[0].JA);		auto *AJ = dyn_cast<AssembleJobAction>(ActionInfo[0].JA);
auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[1].JA);		auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[1].JA);
auto *CJ = dyn_cast<CompileJobAction>(ActionInfo[2].JA);		auto *CJ = dyn_cast<CompileJobAction>(ActionInfo[2].JA);
if (!AJ || !BJ || !CJ)		if (!AJ || !BJ || !CJ)
return nullptr;		return nullptr;

		if (AJ->isOffloading(Action::OFK_HIP))
		return nullptr;

// Get compiler tool.		// Get compiler tool.
const Tool *T = TC.SelectTool(*CJ);		const Tool *T = TC.SelectTool(*CJ);
if (!T)		if (!T)
return nullptr;		return nullptr;

// When using -fembed-bitcode, it is required to have the same tool (clang)		// When using -fembed-bitcode, it is required to have the same tool (clang)
// for both CompilerJA and BackendJA. Otherwise, combine two stages.		// for both CompilerJA and BackendJA. Otherwise, combine two stages.
if (EmbedBitcode) {		if (EmbedBitcode) {
Show All 15 Lines	const Tool *combineAssembleBackend(ArrayRef<JobActionInfo> ActionInfo,
ActionList &CollapsedOffloadAction) {		ActionList &CollapsedOffloadAction) {
if (ActionInfo.size() < 2 || !canCollapseAssembleAction())		if (ActionInfo.size() < 2 || !canCollapseAssembleAction())
return nullptr;		return nullptr;
auto *AJ = dyn_cast<AssembleJobAction>(ActionInfo[0].JA);		auto *AJ = dyn_cast<AssembleJobAction>(ActionInfo[0].JA);
auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[1].JA);		auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[1].JA);
if (!AJ || !BJ)		if (!AJ || !BJ)
return nullptr;		return nullptr;

		if (AJ->isOffloading(Action::OFK_HIP))
		return nullptr;

// Retrieve the compile job, backend action must always be preceded by one.		// Retrieve the compile job, backend action must always be preceded by one.
ActionList CompileJobOffloadActions;		ActionList CompileJobOffloadActions;
auto *CJ = getPrevDependentAction(BJ->getInputs(), CompileJobOffloadActions,		auto *CJ = getPrevDependentAction(BJ->getInputs(), CompileJobOffloadActions,
/*CanBeCollapsed=*/false);		/*CanBeCollapsed=*/false);
if (!AJ || !BJ || !CJ)		if (!AJ || !BJ || !CJ)
return nullptr;		return nullptr;

assert(isa<CompileJobAction>(CJ) &&		assert(isa<CompileJobAction>(CJ) &&
Show All 17 Lines	const Tool *combineBackendCompile(ArrayRef<JobActionInfo> ActionInfo,
ActionList &CollapsedOffloadAction) {		ActionList &CollapsedOffloadAction) {
if (ActionInfo.size() < 2)		if (ActionInfo.size() < 2)
return nullptr;		return nullptr;
auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[0].JA);		auto *BJ = dyn_cast<BackendJobAction>(ActionInfo[0].JA);
auto *CJ = dyn_cast<CompileJobAction>(ActionInfo[1].JA);		auto *CJ = dyn_cast<CompileJobAction>(ActionInfo[1].JA);
if (!BJ || !CJ)		if (!BJ || !CJ)
return nullptr;		return nullptr;

		// Cannot combine compilation with backend for HIP. However
		// it is necessary to combine when generating IR for compile-only with
		// flags "-c -S -emit-llvm". If only flags "-c -S" DeviceLibraryLink is
		// needed to generate linked and opt IR for assembler, so do not combine.
		if (BJ->isOffloading(Action::OFK_HIP) &&
		!(C.getArgs().hasArg(options::OPT_c) &&
		C.getArgs().hasArg(options::OPT_S) &&
		C.getArgs().hasArg(options::OPT_emit_llvm)))
		return nullptr;

// Check if the initial input (to the compile job or its predessor if one		// Check if the initial input (to the compile job or its predessor if one
// exists) is LLVM bitcode. In that case, no preprocessor step is required		// exists) is LLVM bitcode. In that case, no preprocessor step is required
// and we can still collapse the compile and backend jobs when we have		// and we can still collapse the compile and backend jobs when we have
// -save-temps. I.e. there is no need for a separate compile job just to		// -save-temps. I.e. there is no need for a separate compile job just to
// emit unoptimized bitcode.		// emit unoptimized bitcode.
bool InputIsBitcode = true;		bool InputIsBitcode = true;
for (size_t i = 1; i < ActionInfo.size(); i++)		for (size_t i = 1; i < ActionInfo.size(); i++)
if (ActionInfo[i].JA->getType() != types::TY_LLVM_BC &&		if (ActionInfo[i].JA->getType() != types::TY_LLVM_BC &&
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines

public:		public:
ToolSelector(const JobAction *BaseAction, const ToolChain &TC,		ToolSelector(const JobAction *BaseAction, const ToolChain &TC,
const Compilation &C, bool SaveTemps, bool EmbedBitcode)		const Compilation &C, bool SaveTemps, bool EmbedBitcode)
: TC(TC), C(C), BaseAction(BaseAction), SaveTemps(SaveTemps),		: TC(TC), C(C), BaseAction(BaseAction), SaveTemps(SaveTemps),
EmbedBitcode(EmbedBitcode) {		EmbedBitcode(EmbedBitcode) {
assert(BaseAction && "Invalid base action.");		assert(BaseAction && "Invalid base action.");
IsHostSelector = BaseAction->getOffloadingDeviceKind() == Action::OFK_None;		IsHostSelector = BaseAction->getOffloadingDeviceKind() == Action::OFK_None;
		// Check whether there are LLVM IR input files.
		HasIRInputs = false;
		for (Arg *A : C.getInputArgs()) {
		if (A->getOption().getKind() == Option::InputClass) {
		auto Ext = llvm::sys::path::extension(A->getValue());
		if (!Ext.empty()) {
		tra: `llvm::sys::path::extension` ?
		yaxunl (author): will do
		auto InputType = TC.LookupTypeForExtension(Ext.drop_front());
		HasIRInputs =
		InputType == types::TY_LLVM_IR || InputType == types::TY_LLVM_BC;
		}
		}
		}
}		}
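
The extension check in the constructor above can be sketched outside the driver. This is a minimal standalone illustration, not Clang code: `extensionOf`, `isLLVMIRExtension` and `hasIRInputs` are hypothetical names, and it assumes `.ll` and `.bc` are the extensions that map to `TY_LLVM_IR` and `TY_LLVM_BC`. Note that the loop as posted assigns `HasIRInputs` for every input that has an extension, so the last such input decides the result; the sketch assumes the intent is "any input is LLVM IR".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the extension including the dot, or "" if
// the file name has none. Assumes '/' is the path separator.
std::string extensionOf(const std::string &Path) {
  std::size_t Slash = Path.find_last_of('/');
  std::size_t Start = (Slash == std::string::npos) ? 0 : Slash + 1;
  std::size_t Dot = Path.find_last_of('.');
  if (Dot == std::string::npos || Dot < Start)
    return "";
  return Path.substr(Dot);
}

// Assumption: only textual IR (.ll) and bitcode (.bc) count as IR inputs.
bool isLLVMIRExtension(const std::string &Ext) {
  return Ext == ".ll" || Ext == ".bc";
}

// True if any input looks like LLVM IR, regardless of its position.
bool hasIRInputs(const std::vector<std::string> &Inputs) {
  return std::any_of(Inputs.begin(), Inputs.end(), [](const std::string &In) {
    return isLLVMIRExtension(extensionOf(In));
  });
}

// Usage: an IR input is detected even when it is not the last one.
void checkHasIRInputs() {
  std::vector<std::string> IRFirst = {"a.ll", "b.cu"};
  std::vector<std::string> SourceOnly = {"a.cu"};
  assert(hasIRInputs(IRFirst));
  assert(!hasIRInputs(SourceOnly));
}
```

Using `std::any_of` makes the result independent of input order, which is the main difference from the assignment inside the loop.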

/// Check if a chain of actions can be combined and return the tool that can		/// Check if a chain of actions can be combined and return the tool that can
/// handle the combination of actions. The pointer to the current inputs \a		/// handle the combination of actions. The pointer to the current inputs \a
/// Inputs and the list of offload actions \a CollapsedOffloadActions		/// Inputs and the list of offload actions \a CollapsedOffloadActions
/// connected to collapsed actions are updated accordingly. The latter enables		/// connected to collapsed actions are updated accordingly. The latter enables
/// the caller of the selector to process them afterwards instead of just		/// the caller of the selector to process them afterwards instead of just
/// dropping them. If no suitable tool is found, null will be returned.		/// dropping them. If no suitable tool is found, null will be returned.
▲ Show 20 Lines • Show All 412 Lines • ▼ Show 20 Lines	const char *Driver::GetNamedOutputPath(Compilation &C, const JobAction &JA,
}		}

// Output to a temporary file?		// Output to a temporary file?
if ((!AtTopLevel && !isSaveTempsEnabled() &&		if ((!AtTopLevel && !isSaveTempsEnabled() &&
!C.getArgs().hasArg(options::OPT__SLASH_Fo)) ||		!C.getArgs().hasArg(options::OPT__SLASH_Fo)) ||
CCGenDiagnostics) {		CCGenDiagnostics) {
StringRef Name = llvm::sys::path::filename(BaseInput);		StringRef Name = llvm::sys::path::filename(BaseInput);
std::pair<StringRef, StringRef> Split = Name.split('.');		std::pair<StringRef, StringRef> Split = Name.split('.');
std::string TmpName = GetTemporaryPath(		std::string TmpName = GetTemporaryPath(
Split.first, types::getTypeTempSuffix(JA.getType(), IsCLMode()));		Split.first, types::getTypeTempSuffix(JA.getType(), IsCLMode()));
return C.addTempFile(C.getArgs().MakeArgString(TmpName));		return C.addTempFile(C.getArgs().MakeArgString(TmpName));
}		}

SmallString<128> BasePath(BaseInput);		SmallString<128> BasePath(BaseInput);
StringRef BaseName;		StringRef BaseName;

		tra: I'm not sure why we do this here. Nor does it seem relevant to HIP.
		yaxunl (author): will remove it
// Dsymutil actions should use the full path.		// Dsymutil actions should use the full path.
if (isa<DsymutilJobAction>(JA) || isa<VerifyJobAction>(JA))		if (isa<DsymutilJobAction>(JA) || isa<VerifyJobAction>(JA))
BaseName = BasePath;		BaseName = BasePath;
else		else
BaseName = llvm::sys::path::filename(BasePath);		BaseName = llvm::sys::path::filename(BasePath);

// Determine what the derived output name should be.		// Determine what the derived output name should be.
const char *NamedOutput;		const char *NamedOutput;
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	if ((JA.getType() == types::TY_Object || JA.getType() == types::TY_LTO_BC) &&
}		}
// When using both -save-temps and -emit-llvm, use a ".tmp.bc" suffix for		// When using both -save-temps and -emit-llvm, use a ".tmp.bc" suffix for
// the unoptimized bitcode so that it does not get overwritten by the ".bc"		// the unoptimized bitcode so that it does not get overwritten by the ".bc"
// optimized bitcode output.		// optimized bitcode output.
if (!AtTopLevel && C.getArgs().hasArg(options::OPT_emit_llvm) &&		if (!AtTopLevel && C.getArgs().hasArg(options::OPT_emit_llvm) &&
JA.getType() == types::TY_LLVM_BC)		JA.getType() == types::TY_LLVM_BC)
Suffixed += ".tmp";		Suffixed += ".tmp";
Suffixed += '.';		Suffixed += '.';
Suffixed += Suffix;		Suffixed += Suffix;
NamedOutput = C.getArgs().MakeArgString(Suffixed.c_str());		NamedOutput = C.getArgs().MakeArgString(Suffixed.c_str());
		tra: Same here -- I'm not sure why we need this nor how it's related to HIP.
		yaxunl (author): will remove
}		}

// Prepend object file path if -save-temps=obj		// Prepend object file path if -save-temps=obj
if (!AtTopLevel && isSaveTempsObj() && C.getArgs().hasArg(options::OPT_o) &&		if (!AtTopLevel && isSaveTempsObj() && C.getArgs().hasArg(options::OPT_o) &&
JA.getType() != types::TY_PCH) {		JA.getType() != types::TY_PCH) {
Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o);		Arg *FinalOutput = C.getArgs().getLastArg(options::OPT_o);
SmallString<128> TempPath(FinalOutput->getValue());		SmallString<128> TempPath(FinalOutput->getValue());
llvm::sys::path::remove_filename(TempPath);		llvm::sys::path::remove_filename(TempPath);
▲ Show 20 Lines • Show All 403 Lines • Show Last 20 Lines
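
The HIP guard added to `combineBackendCompile` in this file reduces to a small predicate: a HIP backend job may be combined with its compile job only when `-c`, `-S` and `-emit-llvm` are all present. A minimal standalone sketch follows; `DriverFlags` and `hipGuardAllowsCombine` are hypothetical names for illustration and are not part of Clang.

```cpp
#include <cassert>

// Hypothetical stand-in for the three driver flags the guard inspects.
struct DriverFlags {
  bool CompileOnly = false; // -c
  bool EmitAsm = false;     // -S
  bool EmitLLVM = false;    // -emit-llvm
};

// Mirrors the condition in the patch: jobs that are not HIP offloading are
// not affected by the guard; HIP jobs pass it only with all three flags.
bool hipGuardAllowsCombine(bool IsHIPOffload, const DriverFlags &F) {
  if (!IsHIPOffload)
    return true;
  return F.CompileOnly && F.EmitAsm && F.EmitLLVM;
}

// Usage: "-c -S" alone keeps the jobs separate, "-c -S -emit-llvm" does not.
void checkHipGuard() {
  DriverFlags F;
  F.CompileOnly = true;
  F.EmitAsm = true;
  assert(!hipGuardAllowsCombine(/*IsHIPOffload=*/true, F));
  F.EmitLLVM = true;
  assert(hipGuardAllowsCombine(/*IsHIPOffload=*/true, F));
}
```

Passing the guard only means the function goes on to its remaining checks (for example the `-save-temps` handling); it does not by itself force the jobs to be combined.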

lib/Driver/ToolChain.cpp

Show First 20 Lines • Show All 247 Lines • ▼ Show 20 Lines
}		}

Tool *ToolChain::getClang() const {		Tool *ToolChain::getClang() const {
if (!Clang)		if (!Clang)
Clang.reset(new tools::Clang(*this));		Clang.reset(new tools::Clang(*this));
return Clang.get();		return Clang.get();
}		}

		Tool *ToolChain::buildDeviceLibraryLinker() const {
		return new tools::Clang(*this);
		}

Tool *ToolChain::buildAssembler() const {		Tool *ToolChain::buildAssembler() const {
return new tools::ClangAs(*this);		return new tools::ClangAs(*this);
}		}

Tool *ToolChain::buildLinker() const {		Tool *ToolChain::buildLinker() const {
llvm_unreachable("Linking is not supported by this toolchain");		llvm_unreachable("Linking is not supported by this toolchain");
}		}

Tool *ToolChain::getAssemble() const {		Tool *ToolChain::getAssemble() const {
if (!Assemble)		if (!Assemble)
Assemble.reset(buildAssembler());		Assemble.reset(buildAssembler());
return Assemble.get();		return Assemble.get();
}		}

		Tool *ToolChain::getDeviceLibraryLink() const {
		if (!DeviceLibraryLink)
		DeviceLibraryLink.reset(buildDeviceLibraryLinker());
		return DeviceLibraryLink.get();
		}

Tool *ToolChain::getClangAs() const {		Tool *ToolChain::getClangAs() const {
if (!Assemble)		if (!Assemble)
Assemble.reset(new tools::ClangAs(*this));		Assemble.reset(new tools::ClangAs(*this));
return Assemble.get();		return Assemble.get();
}		}

Tool *ToolChain::getLink() const {		Tool *ToolChain::getLink() const {
if (!Link)		if (!Link)
Show All 24 Lines	case Action::VerifyDebugInfoJobClass:
llvm_unreachable("Invalid tool kind.");		llvm_unreachable("Invalid tool kind.");

case Action::CompileJobClass:		case Action::CompileJobClass:
case Action::PrecompileJobClass:		case Action::PrecompileJobClass:
case Action::PreprocessJobClass:		case Action::PreprocessJobClass:
case Action::AnalyzeJobClass:		case Action::AnalyzeJobClass:
case Action::MigrateJobClass:		case Action::MigrateJobClass:
case Action::VerifyPCHJobClass:		case Action::VerifyPCHJobClass:
case Action::BackendJobClass:
return getClang();		return getClang();
		case Action::BackendJobClass:
		return getDeviceLibraryLink();

case Action::OffloadBundlingJobClass:		case Action::OffloadBundlingJobClass:
case Action::OffloadUnbundlingJobClass:		case Action::OffloadUnbundlingJobClass:
return getOffloadBundler();		return getOffloadBundler();
}		}

llvm_unreachable("Invalid tool kind.");		llvm_unreachable("Invalid tool kind.");
}		}
▲ Show 20 Lines • Show All 81 Lines • ▼ Show 20 Lines	if (Args.hasFlag(options::OPT_fprofile_arcs, options::OPT_fno_profile_arcs,
Args.hasArg(options::OPT_fcreate_profile) ||		Args.hasArg(options::OPT_fcreate_profile) ||
Args.hasArg(options::OPT_coverage))		Args.hasArg(options::OPT_coverage))
return true;		return true;

return false;		return false;
}		}

Tool *ToolChain::SelectTool(const JobAction &JA) const {		Tool *ToolChain::SelectTool(const JobAction &JA) const {
if (getDriver().ShouldUseClangCompiler(JA)) return getClang();
Action::ActionClass AC = JA.getKind();		Action::ActionClass AC = JA.getKind();
		if (JA.isOffloading(Action::OFK_HIP) &&
		(AC == Action::BackendJobClass)) {
		if ((Args.hasArg(options::OPT_emit_llvm)) ||
		(Args.hasArg(options::OPT_emit_llvm_bc)))
		return getClang();
		else
		return getTool(AC);
		};
		if (getDriver().ShouldUseClangCompiler(JA))
		return getClang();
if (AC == Action::AssembleJobClass && useIntegratedAs())		if (AC == Action::AssembleJobClass && useIntegratedAs())
return getClangAs();		return getClangAs();
return getTool(AC);		return getTool(AC);
}		}

std::string ToolChain::GetFilePath(const char *Name) const {		std::string ToolChain::GetFilePath(const char *Name) const {
return D.GetFilePath(Name, *this);		return D.GetFilePath(Name, *this);
}		}
▲ Show 20 Lines • Show All 530 Lines • Show Last 20 Lines
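
`getDeviceLibraryLink()` follows the same lazy-creation pattern as `getAssemble()`: the tool is built on first use through a virtual `build...` hook and then cached. A minimal standalone sketch of that pattern is below; `ToolSketch`, `ToolChainSketch` and `DerivedToolChainSketch` are hypothetical stand-ins, not the Clang classes, and the tool names are plain strings chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for clang::driver::Tool.
struct ToolSketch {
  std::string Name;
  explicit ToolSketch(std::string N) : Name(std::move(N)) {}
};

class ToolChainSketch {
public:
  virtual ~ToolChainSketch() = default;

  // Build the tool on first request, then hand out the cached instance.
  ToolSketch *getDeviceLibraryLink() const {
    if (!DeviceLibraryLink)
      DeviceLibraryLink.reset(buildDeviceLibraryLinker());
    return DeviceLibraryLink.get();
  }

protected:
  // Default hook; in the patch the base class returns a Clang tool here.
  virtual ToolSketch *buildDeviceLibraryLinker() const {
    return new ToolSketch("clang");
  }

private:
  // mutable so the const getter can populate the cache.
  mutable std::unique_ptr<ToolSketch> DeviceLibraryLink;
};

// A derived toolchain overrides only the hook, not the caching getter.
class DerivedToolChainSketch : public ToolChainSketch {
protected:
  ToolSketch *buildDeviceLibraryLinker() const override {
    return new ToolSketch("device-library-linker");
  }
};

// Usage: repeated calls return the same cached object.
void checkLazyTool() {
  DerivedToolChainSketch TC;
  ToolSketch *First = TC.getDeviceLibraryLink();
  assert(First == TC.getDeviceLibraryLink());
}
```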

lib/Driver/ToolChains/Clang.cpp

	Show First 20 Lines • Show All 126 Lines • ▼ Show 20 Lines

	// Apply Work on all the offloading tool chains associated with the current			// Apply Work on all the offloading tool chains associated with the current
	// action.			// action.
	if (JA.isHostOffloading(Action::OFK_Cuda))			if (JA.isHostOffloading(Action::OFK_Cuda))
	Work(*C.getSingleOffloadToolChain<Action::OFK_Cuda>());			Work(*C.getSingleOffloadToolChain<Action::OFK_Cuda>());
	else if (JA.isDeviceOffloading(Action::OFK_Cuda))			else if (JA.isDeviceOffloading(Action::OFK_Cuda))
	Work(*C.getSingleOffloadToolChain<Action::OFK_Host>());			Work(*C.getSingleOffloadToolChain<Action::OFK_Host>());

				if (JA.isHostOffloading(Action::OFK_HIP))
				Work(*C.getSingleOffloadToolChain<Action::OFK_HIP>());
				else if (JA.isDeviceOffloading(Action::OFK_HIP))
				Work(*C.getSingleOffloadToolChain<Action::OFK_Host>());

	if (JA.isHostOffloading(Action::OFK_OpenMP)) {			if (JA.isHostOffloading(Action::OFK_OpenMP)) {
	auto TCs = C.getOffloadToolChains<Action::OFK_OpenMP>();			auto TCs = C.getOffloadToolChains<Action::OFK_OpenMP>();
	for (auto II = TCs.first, IE = TCs.second; II != IE; ++II)			for (auto II = TCs.first, IE = TCs.second; II != IE; ++II)
	Work(*II->second);			Work(*II->second);
	} else if (JA.isDeviceOffloading(Action::OFK_OpenMP))			} else if (JA.isDeviceOffloading(Action::OFK_OpenMP))
	Work(*C.getSingleOffloadToolChain<Action::OFK_Host>());			Work(*C.getSingleOffloadToolChain<Action::OFK_Host>());

	//			//
	▲ Show 20 Lines • Show All 1,984 Lines • ▼ Show 20 Lines
	bool KernelOrKext =			bool KernelOrKext =
	Args.hasArg(options::OPT_mkernel, options::OPT_fapple_kext);			Args.hasArg(options::OPT_mkernel, options::OPT_fapple_kext);
	const Driver &D = getToolChain().getDriver();			const Driver &D = getToolChain().getDriver();
	ArgStringList CmdArgs;			ArgStringList CmdArgs;

	// Check number of inputs for sanity. We need at least one input.			// Check number of inputs for sanity. We need at least one input.
	assert(Inputs.size() >= 1 && "Must have at least one input.");			assert(Inputs.size() >= 1 && "Must have at least one input.");
	const InputInfo &Input = Inputs[0];			const InputInfo &Input = Inputs[0];
	// CUDA compilation may have multiple inputs (source file + results of			// CUDA/HIP compilation may have multiple inputs (source file + results of
	// device-side compilations). OpenMP device jobs also take the host IR as a			// device-side compilations). OpenMP device jobs also take the host IR as a
	// second input. All other jobs are expected to have exactly one			// second input. All other jobs are expected to have exactly one
	// input.			// input.
	bool IsCuda = JA.isOffloading(Action::OFK_Cuda);			bool IsCuda = JA.isOffloading(Action::OFK_Cuda);
				bool IsHIP = JA.isOffloading(Action::OFK_HIP);
	bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);			bool IsOpenMPDevice = JA.isDeviceOffloading(Action::OFK_OpenMP);
	assert((IsCuda || (IsOpenMPDevice && Inputs.size() == 2) ||			assert((IsCuda || IsHIP || (IsOpenMPDevice && Inputs.size() == 2) ||
	Inputs.size() == 1) &&			Inputs.size() == 1) &&
	"Unable to handle multiple inputs.");			"Unable to handle multiple inputs.");

	const llvm::Triple *AuxTriple =			const llvm::Triple *AuxTriple =
	IsCuda ? getToolChain().getAuxTriple() : nullptr;			IsCuda ? getToolChain().getAuxTriple() : nullptr;

	bool IsWindowsGNU = RawTriple.isWindowsGNUEnvironment();			bool IsWindowsGNU = RawTriple.isWindowsGNUEnvironment();
	bool IsWindowsCygnus = RawTriple.isWindowsCygwinEnvironment();			bool IsWindowsCygnus = RawTriple.isWindowsCygwinEnvironment();
	bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment();			bool IsWindowsMSVC = RawTriple.isWindowsMSVCEnvironment();
	bool IsIAMCU = RawTriple.isOSIAMCU();			bool IsIAMCU = RawTriple.isOSIAMCU();

	// Adjust IsWindowsXYZ for CUDA compilations. Even when compiling in device			// Adjust IsWindowsXYZ for CUDA/HIP compilations. Even when compiling in
	// mode (i.e., getToolchain().getTriple() is NVPTX, not Windows), we need to			// device mode (i.e., getToolchain().getTriple() is NVPTX/AMDGCN, not
	// pass Windows-specific flags to cc1.			// Windows), we need to pass Windows-specific flags to cc1.
	if (IsCuda) {			if (IsCuda || IsHIP) {
	IsWindowsMSVC |= AuxTriple && AuxTriple->isWindowsMSVCEnvironment();			IsWindowsMSVC |= AuxTriple && AuxTriple->isWindowsMSVCEnvironment();
	IsWindowsGNU |= AuxTriple && AuxTriple->isWindowsGNUEnvironment();			IsWindowsGNU |= AuxTriple && AuxTriple->isWindowsGNUEnvironment();
	IsWindowsCygnus |= AuxTriple && AuxTriple->isWindowsCygwinEnvironment();			IsWindowsCygnus |= AuxTriple && AuxTriple->isWindowsCygwinEnvironment();
	}			}

	// C++ is not supported for IAMCU.			// C++ is not supported for IAMCU.
	if (IsIAMCU && types::isCXX(Input.getType()))			if (IsIAMCU && types::isCXX(Input.getType()))
	D.Diag(diag::err_drv_clang_unsupported) << "C++ for IAMCU";			D.Diag(diag::err_drv_clang_unsupported) << "C++ for IAMCU";

	// Invoke ourselves in -cc1 mode.			// Invoke ourselves in -cc1 mode.
	//			//
	// FIXME: Implement custom jobs for internal actions.			// FIXME: Implement custom jobs for internal actions.
	CmdArgs.push_back("-cc1");			CmdArgs.push_back("-cc1");

	// Add the "effective" target triple.			// Add the "effective" target triple.
	CmdArgs.push_back("-triple");			CmdArgs.push_back("-triple");
	CmdArgs.push_back(Args.MakeArgString(TripleStr));			CmdArgs.push_back(Args.MakeArgString(TripleStr));

	if (const Arg *MJ = Args.getLastArg(options::OPT_MJ)) {			if (const Arg *MJ = Args.getLastArg(options::OPT_MJ)) {
	DumpCompilationDatabase(C, MJ->getValue(), TripleStr, Output, Input, Args);			DumpCompilationDatabase(C, MJ->getValue(), TripleStr, Output, Input, Args);
	Args.ClaimAllArgs(options::OPT_MJ);			Args.ClaimAllArgs(options::OPT_MJ);
	}			}

	if (IsCuda) {			if (IsCuda || IsHIP) {
	// We have to pass the triple of the host if compiling for a CUDA device and			// We have to pass the triple of the host if compiling for a CUDA/HIP device
	// vice-versa.			// and vice-versa.
	std::string NormalizedTriple;			std::string NormalizedTriple;
	if (JA.isDeviceOffloading(Action::OFK_Cuda))			if (JA.isDeviceOffloading(Action::OFK_Cuda) ||
				JA.isDeviceOffloading(Action::OFK_HIP))
	NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Host>()			NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Host>()
	->getTriple()			->getTriple()
	.normalize();			.normalize();
	else			else
	NormalizedTriple = C.getSingleOffloadToolChain<Action::OFK_Cuda>()			NormalizedTriple =
	->getTriple()			(IsCuda ? C.getSingleOffloadToolChain<Action::OFK_Cuda>()
	.normalize();			: C.getSingleOffloadToolChain<Action::OFK_HIP>())
				->getTriple()
				.normalize();

	CmdArgs.push_back("-aux-triple");			CmdArgs.push_back("-aux-triple");
	CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));			CmdArgs.push_back(Args.MakeArgString(NormalizedTriple));
	}			}

	if (IsOpenMPDevice) {			if (IsOpenMPDevice) {
	// We have to pass the triple of the host if compiling for an OpenMP device.			// We have to pass the triple of the host if compiling for an OpenMP device.
	std::string NormalizedTriple =			std::string NormalizedTriple =
	▲ Show 20 Lines • Show All 1,529 Lines • ▼ Show 20 Lines
	EscapeSpacesAndBackslashes(OriginalArg, EscapedArg);			EscapeSpacesAndBackslashes(OriginalArg, EscapedArg);
	Flags += " ";			Flags += " ";
	Flags += EscapedArg;			Flags += EscapedArg;
	}			}
	CmdArgs.push_back("-dwarf-debug-flags");			CmdArgs.push_back("-dwarf-debug-flags");
	CmdArgs.push_back(Args.MakeArgString(Flags));			CmdArgs.push_back(Args.MakeArgString(Flags));
	}			}

	if (IsCuda) {			if (IsCuda || IsHIP) {
	// Host-side cuda compilation receives all device-side outputs in a single			// Host-side cuda/HIP compilation receives all device-side outputs in a
	// fatbin as Inputs[1]. Include the binary with -fcuda-include-gpubinary.			// single fatbin as Inputs[1]. Include the binary with
				// -fcuda-include-gpubinary.
	if (Inputs.size() > 1) {			if (Inputs.size() > 1) {
	assert(Inputs.size() == 2 && "More than one GPU binary!");			assert(Inputs.size() == 2 && "More than one GPU binary!");
	CmdArgs.push_back("-fcuda-include-gpubinary");			CmdArgs.push_back("-fcuda-include-gpubinary");
	CmdArgs.push_back(Inputs[1].getFilename());			CmdArgs.push_back(Inputs[1].getFilename());
	}			}

	if (Args.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc, false))			if (Args.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc, false))
	CmdArgs.push_back("-fcuda-rdc");			CmdArgs.push_back("-fcuda-rdc");
	▲ Show 20 Lines • Show All 924 Lines • Show Last 20 Lines
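
The `-aux-triple` selection in this file can be summarized as: a device-side job receives the host triple, and a host-side job receives the triple of whichever offload toolchain (CUDA or HIP) is active. A minimal standalone sketch, assuming hypothetical names (`OffloadKindSketch`, `TripleSet`, `pickAuxTriple`) and plain strings in place of `llvm::Triple`:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the relevant Action::OffloadKind values.
enum class OffloadKindSketch { Cuda, HIP };

// Hypothetical container for the normalized triples of each toolchain.
struct TripleSet {
  std::string Host;
  std::string Cuda;
  std::string HIP;
};

// Device jobs get the host triple; host jobs get the device triple of the
// active offload kind, matching the two branches in the patch.
std::string pickAuxTriple(bool IsDeviceJob, OffloadKindSketch Kind,
                          const TripleSet &T) {
  if (IsDeviceJob)
    return T.Host;
  return Kind == OffloadKindSketch::Cuda ? T.Cuda : T.HIP;
}

// Usage with example triples chosen for illustration.
void checkAuxTriple() {
  TripleSet T{"x86_64-unknown-linux-gnu", "nvptx64-nvidia-cuda",
              "amdgcn-amd-amdhsa"};
  assert(pickAuxTriple(true, OffloadKindSketch::HIP, T) == T.Host);
  assert(pickAuxTriple(false, OffloadKindSketch::HIP, T) == T.HIP);
}
```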

lib/Driver/ToolChains/Cuda.h

Show First 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	public:

void ConstructJob(Compilation &C, const JobAction &JA,		void ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output, const InputInfoList &Inputs,		const InputInfo &Output, const InputInfoList &Inputs,
const llvm::opt::ArgList &TCArgs,		const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const override;		const char *LinkingOutput) const override;
};		};

} // end namespace NVPTX		} // end namespace NVPTX

		namespace AMDGCN {
		// Run llc, the AMDGPU assembler.
		class LLVM_LIBRARY_VISIBILITY Assembler : public Tool {
		public:
		Assembler(const ToolChain &TC)
		: Tool("AMDGCN::Assembler", "llc", TC, RF_Full, llvm::sys::WEM_UTF8,
		"--options-file") {}

		tra: Where does amdgcn-link come from? Does it accept --options-file ?
		yaxunl (author): amdgcn-link is the short name of the amdgcn bitcode linker. It is not a real program and does not support response file. Will remove the arguments about response file.
		bool hasIntegratedCPP() const override { return false; }

		void ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output, const InputInfoList &Inputs,
		const llvm::opt::ArgList &TCArgs,
		const char *LinkingOutput) const override;
		};

		// Runs clang-offload-bundler, which combines AMDGCN object files into a single
		// output file.
		class LLVM_LIBRARY_VISIBILITY Linker : public Tool {
		public:
		Linker(const ToolChain &TC)
		: Tool("AMDGCN::Linker", "clang-offload-bundler", TC, RF_Full,
		llvm::sys::WEM_UTF8, "--options-file") {}

		bool hasIntegratedCPP() const override { return false; }

		void ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output, const InputInfoList &Inputs,
		const llvm::opt::ArgList &TCArgs,
		const char *LinkingOutput) const override;
		};

		// For amdgcn the device library linker is llvm-link + opt.
		class LLVM_LIBRARY_VISIBILITY DeviceLibraryLinker : public Tool {
		public:
		DeviceLibraryLinker(const ToolChain &TC)
		: Tool("AMDGCN::DeviceLibraryLinker", "device-library-linker", TC,
		RF_Full, llvm::sys::WEM_UTF8, "--options-file") {}
		virtual bool hasIntegratedCPP() const override { return false; }
		virtual void ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output,
		const InputInfoList &Inputs,
		const llvm::opt::ArgList &TCArgs,
		const char *LinkingOutput) const override;
		};
		} // end namespace AMDGCN
} // end namespace tools		} // end namespace tools

namespace toolchains {		namespace toolchains {

class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {		class LLVM_LIBRARY_VISIBILITY CudaToolChain : public ToolChain {
public:		public:
		tra: You may want to derive your own HIPToolChain as you're unlikely to reuse NVPTX-specific tools.
		yaxunl (author): Will do
CudaToolChain(const Driver &D, const llvm::Triple &Triple,		CudaToolChain(const Driver &D, const llvm::Triple &Triple,
const ToolChain &HostTC, const llvm::opt::ArgList &Args,		const ToolChain &HostTC, const llvm::opt::ArgList &Args,
const Action::OffloadKind OK);		const Action::OffloadKind OK);

const llvm::Triple *getAuxTriple() const override {		const llvm::Triple *getAuxTriple() const override {
return &HostTC.getTriple();		return &HostTC.getTriple();
}		}

std::string getInputFilename(const InputInfo &Input) const override;		std::string getInputFilename(const InputInfo &Input) const override;

llvm::opt::DerivedArgList *		llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,		TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;		Action::OffloadKind DeviceOffloadKind) const override;

// Never try to use the integrated assembler with CUDA; always fork out to		// Never try to use the integrated assembler with CUDA; always fork out to
// ptxas.		// ptxas.
bool useIntegratedAs() const override { return false; }		bool useIntegratedAs() const override { return false; }
		tra: In HIPToolchain this one would become an inline function returning true.
		yaxunl (author): will do
bool isCrossCompiling() const override { return true; }		bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }		bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }		bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }		bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }		bool SupportsProfiling() const override { return false; }
bool IsMathErrnoDefault() const override { return false; }		bool IsMathErrnoDefault() const override { return false; }

void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,		void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
Show All 17 Lines	computeMSVCVersion(const Driver *D,
const llvm::opt::ArgList &Args) const override;		const llvm::opt::ArgList &Args) const override;

unsigned GetDefaultDwarfVersion() const override { return 2; }		unsigned GetDefaultDwarfVersion() const override { return 2; }

const ToolChain &HostTC;		const ToolChain &HostTC;
CudaInstallationDetector CudaInstallation;		CudaInstallationDetector CudaInstallation;

protected:		protected:
		Tool *buildDeviceLibraryLinker() const override; // for amdgcn, link and opt
Tool *buildAssembler() const override; // ptxas		Tool *buildAssembler() const override; // ptxas
Tool *buildLinker() const override; // fatbinary (ok, not really a linker)		Tool *buildLinker() const override; // fatbinary (ok, not really a linker)

private:		private:
const Action::OffloadKind OK;		const Action::OffloadKind OK;
};		};

} // end namespace toolchains		} // end namespace toolchains
} // end namespace driver		} // end namespace driver
} // end namespace clang		} // end namespace clang

#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_CUDA_H		#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_CUDA_H
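
The review comments on this header suggest deriving a separate `HIPToolChain` instead of extending `CudaToolChain`, with `useIntegratedAs()` returning `true` for HIP. A minimal standalone sketch of that shape is below. The class names ending in `Model` are hypothetical and only model the one virtual method under discussion; the real class would derive from `ToolChain` and override many more hooks.

```cpp
#include <cassert>

// Hypothetical reduced model of the ToolChain interface.
class ToolChainModel {
public:
  virtual ~ToolChainModel() = default;
  virtual bool useIntegratedAs() const = 0;
};

class CudaToolChainModel : public ToolChainModel {
public:
  // CUDA never uses the integrated assembler; it forks out to ptxas.
  bool useIntegratedAs() const override { return false; }
};

class HIPToolChainModel : public ToolChainModel {
public:
  // Assumption taken from the review suggestion: HIP would return true.
  bool useIntegratedAs() const override { return true; }
};

// Usage: callers see the difference through the base interface.
void checkToolChainModels() {
  CudaToolChainModel Cuda;
  HIPToolChainModel HIP;
  const ToolChainModel &A = Cuda;
  const ToolChainModel &B = HIP;
  assert(!A.useIntegratedAs());
  assert(B.useIntegratedAs());
}
```

Keeping the two toolchains as siblings avoids the `OK` member that the patch adds to `CudaToolChain` to tell the two modes apart.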

lib/Driver/ToolChains/Cuda.cpp

Show First 20 Lines • Show All 225 Lines • ▼ Show 20 Lines

void CudaInstallationDetector::AddCudaIncludeArgs(		void CudaInstallationDetector::AddCudaIncludeArgs(
const ArgList &DriverArgs, ArgStringList &CC1Args) const {		const ArgList &DriverArgs, ArgStringList &CC1Args) const {
if (!DriverArgs.hasArg(options::OPT_nobuiltininc)) {		if (!DriverArgs.hasArg(options::OPT_nobuiltininc)) {
// Add cuda_wrappers/* to our system include path. This lets us wrap		// Add cuda_wrappers/* to our system include path. This lets us wrap
// standard library headers.		// standard library headers.
SmallString<128> P(D.ResourceDir);		SmallString<128> P(D.ResourceDir);
llvm::sys::path::append(P, "include");		llvm::sys::path::append(P, "include");
llvm::sys::path::append(P, "cuda_wrappers");		llvm::sys::path::append(P, "cuda_wrappers");
		Hahnfeld: Will this override the user's value, e.g. `-std=c++14`?
		yaxunl (author): No. We will add diagnostics in a separate patch.
CC1Args.push_back("-internal-isystem");		CC1Args.push_back("-internal-isystem");
CC1Args.push_back(DriverArgs.MakeArgString(P));		CC1Args.push_back(DriverArgs.MakeArgString(P));
}		}

if (DriverArgs.hasArg(options::OPT_nocudainc))		if (DriverArgs.hasArg(options::OPT_nocudainc))
return;		return;

if (!isValid()) {		if (!isValid()) {
Show All 24 Lines	void CudaInstallationDetector::CheckCudaVersionSupportsArch(
}		}
}		}

void CudaInstallationDetector::print(raw_ostream &OS) const {		void CudaInstallationDetector::print(raw_ostream &OS) const {
if (isValid())		if (isValid())
OS << "Found CUDA installation: " << InstallPath << ", version "		OS << "Found CUDA installation: " << InstallPath << ", version "
<< CudaVersionToString(Version) << "\n";		<< CudaVersionToString(Version) << "\n";
}		}

namespace {		namespace {
/// Debug info kind.		/// Debug info kind.
enum DebugInfoKind {		enum DebugInfoKind {
NoDebug, /// No debug info.		NoDebug, /// No debug info.
LineTableOnly, /// Line tables only.		LineTableOnly, /// Line tables only.
FullDebug /// Full debug info.		FullDebug /// Full debug info.
};		};
} // anonymous namespace		} // anonymous namespace
Show All 13 Lines	if (const Arg *A = Args.getLastArg(options::OPT_g_Group)) {
return LineTableOnly;		return LineTableOnly;
}		}
return FullDebug;		return FullDebug;
}		}
}		}
return NoDebug;		return NoDebug;
}		}

		static bool addBCLib(Compilation &C, const ArgList &Args,
		ArgStringList &CmdArgs, ArgStringList LibraryPaths,
		StringRef BCName) {
		std::string FullName;
		bool FoundLibDevice = false;
		for (std::string LibraryPath : LibraryPaths) {
		SmallString<128> Path(LibraryPath);
		llvm::sys::path::append(Path, BCName);
		FullName = Args.MakeArgString(Path);
		if (llvm::sys::fs::exists(FullName.c_str())) {
		tra: `LLC_OUTPUT` ?
		yaxunl (author): will change to LLC_OUTPUT
		FoundLibDevice = true;
		break;
		}
		}
		tra: Please use routines in `llvm::sys::path` here and in other places where you manipulate paths.
		yaxunl (author): will do
		if (!FoundLibDevice)
		C.getDriver().Diag(diag::err_drv_no_such_file) << BCName;
		CmdArgs.push_back(Args.MakeArgString(FullName));
		tra: FullName is already result of Args.MakeArgString. You only need to do it once.
		yaxunl (author): will fix
		return FoundLibDevice;
		}
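
The search loop in `addBCLib` can be sketched without the driver types. The version below is a hedged illustration, not the patch's code: `findBCLib` is a hypothetical name, the existence check is injected as a callback so the example runs without touching the file system, and `'/'` is assumed as the path separator. It takes the path list by const reference and iterates by const reference, which avoids the copies made by the by-value `ArgStringList` parameter and the `std::string LibraryPath` loop variable in the patch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Searches LibraryPaths in order for BCName. On success stores the full
// path in FullName and returns true; otherwise leaves FullName untouched.
bool findBCLib(const std::vector<std::string> &LibraryPaths,
               const std::string &BCName,
               const std::function<bool(const std::string &)> &Exists,
               std::string &FullName) {
  for (const std::string &Dir : LibraryPaths) {
    std::string Candidate = Dir;
    if (!Candidate.empty() && Candidate.back() != '/')
      Candidate += '/';
    Candidate += BCName;
    if (Exists(Candidate)) {
      FullName = Candidate;
      return true;
    }
  }
  return false;
}

// Usage: the first directory that contains the file wins.
void checkFindBCLib() {
  auto Exists = [](const std::string &P) { return P == "/opt/lib/dev.bc"; };
  std::vector<std::string> Paths = {"/usr/lib", "/opt/lib"};
  std::string FullName;
  assert(findBCLib(Paths, "dev.bc", Exists, FullName));
  assert(FullName == "/opt/lib/dev.bc");
}
```

A caller would emit the "no such file" diagnostic only when the function returns `false`, and would avoid pushing an unresolved path onto the command line in that case.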

		void AMDGCN::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
		tra: You could use `.append({...})`: `CmdArgs2.append({"foo","bar","buz"});`
		yaxunl (author): will do
		const InputInfo &Output,
		const InputInfoList &Inputs,
		tra: This function is too large to easily see that we're actually constructing sequence of commands. I'd probably split construction of individual tool's command line into its own function.
		yaxunl (author): will do
		const ArgList &Args,
		const char *LinkingOutput) const {
		const auto &TC =
		static_cast<const toolchains::CudaToolChain &>(getToolChain());
		assert(TC.getTriple().getArch() == llvm::Triple::amdgcn && "Wrong platform");

		ArgStringList CmdArgs;
		tra: No need for the leading space in the message.
		yaxunl (author): will fix.
		for (InputInfoList::const_iterator it = Inputs.begin(), ie = Inputs.end();
		it != ie; ++it) {
		const InputInfo &II = *it;
		CmdArgs.push_back(II.getFilename());
		}
		CmdArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
		CmdArgs.push_back("-filetype=obj");
		std::string GFXNAME = JA.getOffloadingArch();
		CmdArgs.push_back(Args.MakeArgString("-mcpu=" + GFXNAME));
		tra: `for (const InputInfo &it : Inputs)` ?
		yaxunl (author): will fix
		CmdArgs.push_back("-o");
		std::string TmpName = C.getDriver().GetTemporaryPath("LLC_OUTPUT", "o");
		const char *llcOutputFile =
		C.addTempFile(C.getArgs().MakeArgString(TmpName.c_str()));
		CmdArgs.push_back(llcOutputFile);
		tra: All-caps name looks like a macro. Rename to `GfxName` ?
		yaxunl: will fix
		SmallString<128> llcPath(C.getDriver().Dir);
		llvm::sys::path::append(llcPath, "llc");
		const char *Exec = Args.MakeArgString(llcPath);
		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));

		ArgStringList CmdArgs2;
		// The output from ld.lld is an HSA code object file.
		CmdArgs2.append({"-flavor", "gnu", "--no-undefined", "-shared", "-o"});
		CmdArgs2.push_back(Output.getFilename());
		tra: `for (path : Args.getAllArgValues(...)) { LibraryPaths.push_back(Args.MakeArgString(path)); }`
		yaxunl: will fix
		CmdArgs2.push_back(llcOutputFile);
		SmallString<128> lldPath(C.getDriver().Dir);
		llvm::sys::path::append(lldPath, "lld");
		const char *lld = Args.MakeArgString(lldPath);
		C.addCommand(llvm::make_unique<Command>(JA, *this, lld, CmdArgs2, Inputs));
		return;
		}
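
The review comments on this function suggest a range-based `for` loop and a single initializer-list append in place of iterator loops and repeated `push_back` calls. A minimal standalone sketch of the same llc argument construction follows. It assumes plain `std::vector<std::string>` as a stand-in for clang's `ArgStringList`. `buildLlcArgs` and its parameter names are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: assembles the llc argument list in the order the patch
// uses, with std::vector<std::string> standing in for clang's ArgStringList.
std::vector<std::string> buildLlcArgs(const std::vector<std::string> &Inputs,
                                      const std::string &GfxName,
                                      const std::string &OutputFile) {
  assert(!GfxName.empty() && "expected an explicit GPU arch");
  std::vector<std::string> CmdArgs;
  // Range-based for instead of explicit const_iterator bookkeeping.
  for (const std::string &Input : Inputs)
    CmdArgs.push_back(Input);
  // One initializer-list insert instead of several push_back calls.
  CmdArgs.insert(CmdArgs.end(),
                 {"-mtriple=amdgcn-amd-amdhsa", "-filetype=obj",
                  "-mcpu=" + GfxName, "-o", OutputFile});
  return CmdArgs;
}
```

Keeping the argument construction in a small function of its own is also one way to act on the comment about splitting each tool's command line out of `ConstructJob`.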

		void AMDGCN::Linker::ConstructJob(Compilation &C, const JobAction &JA,
		const InputInfo &Output,
		const InputInfoList &Inputs,
		const ArgList &Args,
		const char *LinkingOutput) const {
		const auto &TC =
		static_cast<const toolchains::CudaToolChain &>(getToolChain());
		assert(TC.getTriple().getArch() == llvm::Triple::amdgcn && "Wrong platform");

		ArgStringList CmdArgs;
		CmdArgs.push_back(Args.MakeArgString("-type=o"));
		tra: This is somewhat unreadable. Perhaps you could construct the name in a temp variable.
		yaxunl: will do

		// ToDo: Remove the dummy host binary entry which is required by
		// clang-offload-bundler.
		std::string targets = "-targets=host-x86_64-uknown-linux";
		std::string inputs = "-inputs=/dev/null";
		for (const auto &II : Inputs) {
		tra: You don't need to use c_str() for MakeArgString. It will happily accept std::string.
		yaxunl: will fix
		if (II.getType() != types::TY_PP_Asm) {
		// ToDo: Teach clang-offload-bundler to recognize hip.
		targets = targets + ",openmp-amdgcn--amdhsa-" +
		StringRef(II.getAction()->getOffloadingArch()).str();
		inputs = inputs + "," + II.getFilename();
		}
		}
		CmdArgs.push_back(Args.MakeArgString(targets));
		CmdArgs.push_back(Args.MakeArgString(inputs));

		tra: `BitcodeOutputFile`?
		yaxunl: will change
		auto outputArgString =
		Args.MakeArgString(std::string("-outputs=").append(Output.getFilename()));
		CmdArgs.push_back(outputArgString);

		SmallString<128> ExecPath(C.getDriver().Dir);
		llvm::sys::path::append(ExecPath, "clang-offload-bundler");
		const char *Exec = Args.MakeArgString(ExecPath);
		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));
		return;
		tra: This is rather awkward -- you're looking for /libdevice under all paths specified by -L, but there's no way to explicitly point to the directory with the bitcode library. If the device library may be in a non-canonical location, I'd rather there was an explicit option to specify it. Furthermore, my understanding is that you will need to find these bitcode libraries during `-c` compilation. Using `-L` to derive the bitcode search path during compilation looks wrong to me.
		yaxunl: Will use `--hip-device-lib-path=` and drop /libdevice.
		}
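
The loop in `AMDGCN::Linker::ConstructJob` builds two parallel comma-separated lists for clang-offload-bundler: one of targets and one of input files, skipping assembly inputs. A minimal standalone sketch of that string construction follows. The `DeviceInput` struct and `buildBundlerLists` function are hypothetical names, and the host target is passed in as a parameter rather than hard-coded.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the InputInfo fields that the loop reads.
struct DeviceInput {
  std::string Filename;
  std::string Arch; // e.g. "gfx803"
  bool IsAssembly;  // assembly inputs are skipped, as in the patch
};

// Returns the "-targets=" and "-inputs=" strings. The first entry of each
// list is the dummy host entry (host target paired with /dev/null).
std::pair<std::string, std::string>
buildBundlerLists(const std::string &HostTarget,
                  const std::vector<DeviceInput> &Inputs) {
  std::string Targets = "-targets=" + HostTarget;
  std::string InputsArg = "-inputs=/dev/null";
  for (const DeviceInput &In : Inputs) {
    if (In.IsAssembly)
      continue;
    // The patch reuses the "openmp" offload kind until the bundler knows hip.
    Targets += ",openmp-amdgcn--amdhsa-" + In.Arch;
    InputsArg += "," + In.Filename;
  }
  return {Targets, InputsArg};
}
```

Because both strings are extended in the same loop iteration, the Nth target always lines up with the Nth input.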

		void AMDGCN::DeviceLibraryLinker::ConstructJob(
		Compilation &C, const JobAction &JA, const InputInfo &Output,
		const InputInfoList &Inputs, const ArgList &Args,
		const char *LinkingOutput) const {
		tra: This (and other places where you're calculating libdevice path relative to driver dir) should probably be derived from the resource dir. Driver's path may not be the 'root' the compiler has been told to use.
		yaxunl: will remove this

		assert(StringRef(JA.getOffloadingArch()).startswith("gfx") &&
		tra: LIBRARY_PATH sounds rather generic. Perhaps it should have HIP somewhere in its name.
		yaxunl: will use HIP_DEVICE_LIB_PATH
		" unless gfx processor, backend should be clang");

		// For amdgcn the DeviceLibraryLinker Job will call llvm-link & opt steps.
		ArgStringList CmdArgs;
		// Add the input bc's created by compile step.
		for (InputInfoList::const_iterator it = Inputs.begin(), ie = Inputs.end();
		tra: I think you can get rid of the temp var here without hurting readability.
		yaxunl: will do
		it != ie; ++it) {
		const InputInfo &II = *it;
		CmdArgs.push_back(II.getFilename());
		tra: I wonder if we could derive temp file name from the input's name. This may make it easier to find relevant temp files if/when we need to debug the compilation process.
		yaxunl: will do
		}

		tra: No need for c_str() here.
		yaxunl: will do
		std::string GFXNAME = JA.getOffloadingArch();

		ArgStringList LibraryPaths;

		tra: Why do we need to silence the warnings?
		yaxunl: will remove this
		// Find in --hip-device-lib-path and HIP_LIBRARY_PATH.
		for (auto Arg : Args) {
		if (Arg->getSpelling() == "--hip-device-lib-path=") {
		LibraryPaths.push_back(Args.MakeArgString(Arg->getValue()));
		}
		}

		addDirectoryList(Args, LibraryPaths, "-L", "HIP_DEVICE_LIB_PATH");

		addBCLib(C, Args, CmdArgs, LibraryPaths, "libhiprt.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "opencl.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "ockl.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "irif.amdgcn.bc");
		tra: c_str(), again.
		yaxunl: will fix
		addBCLib(C, Args, CmdArgs, LibraryPaths, "ocml.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "oclc_finite_only_off.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "oclc_daz_opt_off.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths,
		"oclc_correctly_rounded_sqrt_on.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "oclc_unsafe_math_off.amdgcn.bc");
		addBCLib(C, Args, CmdArgs, LibraryPaths, "hc.amdgcn.bc");
		// Drop gfx in GFXNAME.
		addBCLib(C, Args, CmdArgs, LibraryPaths,
		(Twine("oclc_isa_version_") + StringRef(GFXNAME).drop_front(3) +
		".amdgcn.bc")
		.str());

		// Add an intermediate output file which is input to opt
		CmdArgs.push_back("-o");
		std::string TmpName = C.getDriver().GetTemporaryPath("OPT_INPUT", "bc");
		const char *ResultingBitcodeF =
		C.addTempFile(C.getArgs().MakeArgString(TmpName.c_str()));
		CmdArgs.push_back(ResultingBitcodeF);
		SmallString<128> ExecPath(C.getDriver().Dir);
		llvm::sys::path::append(ExecPath, "llvm-link");
		const char *Exec = Args.MakeArgString(ExecPath);
		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));

		ArgStringList OptArgs;
		// The input to opt is the output from llvm-link.
		tra: This does not look like the right place to disable particular passes. Shouldn't it be done somewhere in LLVM?
		yaxunl: These do not disable passes; they add them. They are optimizations that usually improve performance for amdgcn.
		yaxunl: Since opt is now able to adjust passes based on -mtriple and -mcpu, will remove these manually added passes and add -mtriple and -mcpu instead.
		OptArgs.push_back(ResultingBitcodeF);
		// Pass optimization arg to opt.
		if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
		StringRef OOpt = "3";
		if (A->getOption().matches(options::OPT_O4) ||
		A->getOption().matches(options::OPT_Ofast))
		OOpt = "3";
		else if (A->getOption().matches(options::OPT_O0))
		OOpt = "0";
		else if (A->getOption().matches(options::OPT_O)) {
		// -Os, -Oz, and -O(anything else) map to -O2
		OOpt = llvm::StringSwitch<const char *>(A->getValue())
		.Case("1", "1")
		.Case("2", "2")
		.Case("3", "3")
		.Case("s", "2")
		.Case("z", "2")
		.Default("2");
		}
		OptArgs.push_back(Args.MakeArgString(llvm::Twine("-O") + OOpt));
		}
		OptArgs.push_back("-S");
		OptArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
		const char *mcpustr = Args.MakeArgString("-mcpu=" + GFXNAME);
		OptArgs.push_back(mcpustr);
		OptArgs.push_back("-o");
		OptArgs.push_back(Output.getFilename());
		SmallString<128> OptPath(C.getDriver().Dir);
		llvm::sys::path::append(OptPath, "opt");
		const char *OptExec = Args.MakeArgString(OptPath);
		C.addCommand(llvm::make_unique<Command>(JA, *this, OptExec, OptArgs, Inputs));
		}
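
Two small string computations in `DeviceLibraryLinker::ConstructJob` are easy to check in isolation: the mapping from the driver's `-O` option to the level passed to `opt`, and the ISA-version library name derived from the GPU name. A minimal standalone sketch of both follows. `mapOptLevel` and `isaVersionLibName` are hypothetical names. The first is a simplification that takes the text after `-O` as a single string, whereas the driver treats `-O0`, `-O4` and `-Ofast` as separate options and passes no level at all when no `-O` option is given.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the option matching in the patch: takes the text
// following "-O" and returns the digit that would be appended to opt's "-O".
std::string mapOptLevel(const std::string &Level) {
  if (Level == "4" || Level == "fast")
    return "3";
  if (Level == "0" || Level == "1" || Level == "2" || Level == "3")
    return Level;
  return "2"; // -Os, -Oz, and anything else map to -O2
}

// Drops the leading "gfx" from the processor name, as the patch does with
// drop_front(3), and builds the oclc_isa_version bitcode library name.
std::string isaVersionLibName(const std::string &GfxName) {
  assert(GfxName.compare(0, 3, "gfx") == 0 && "expected a gfx processor name");
  return "oclc_isa_version_" + GfxName.substr(3) + ".amdgcn.bc";
}
```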

void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,		void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
const auto &TC =		const auto &TC =
static_cast<const toolchains::CudaToolChain &>(getToolChain());		static_cast<const toolchains::CudaToolChain &>(getToolChain());
assert(TC.getTriple().isNVPTX() && "Wrong platform");		assert(TC.getTriple().isNVPTX() && "Wrong platform");
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	for (const auto& A : Args.getAllArgValues(options::OPT_Xcuda_ptxas))
CmdArgs.push_back(Args.MakeArgString(A));		CmdArgs.push_back(Args.MakeArgString(A));

bool Relocatable = false;		bool Relocatable = false;
if (JA.isOffloading(Action::OFK_OpenMP))		if (JA.isOffloading(Action::OFK_OpenMP))
// In OpenMP we need to generate relocatable code.		// In OpenMP we need to generate relocatable code.
Relocatable = Args.hasFlag(options::OPT_fopenmp_relocatable_target,		Relocatable = Args.hasFlag(options::OPT_fopenmp_relocatable_target,
options::OPT_fnoopenmp_relocatable_target,		options::OPT_fnoopenmp_relocatable_target,
/*Default=*/true);		/*Default=*/true);
else if (JA.isOffloading(Action::OFK_Cuda))		else if (JA.isOffloading(Action::OFK_Cuda) ||
		JA.isOffloading(Action::OFK_HIP))
Relocatable = Args.hasFlag(options::OPT_fcuda_rdc,		Relocatable = Args.hasFlag(options::OPT_fcuda_rdc,
options::OPT_fno_cuda_rdc, /*Default=*/false);		options::OPT_fno_cuda_rdc, /*Default=*/false);

if (Relocatable)		if (Relocatable)
CmdArgs.push_back("-c");		CmdArgs.push_back("-c");

const char *Exec;		const char *Exec;
if (Arg *A = Args.getLastArg(options::OPT_ptxas_path_EQ))		if (Arg *A = Args.getLastArg(options::OPT_ptxas_path_EQ))
▲ Show 20 Lines • Show All 176 Lines • ▼ Show 20 Lines	void CudaToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert((DeviceOffloadingKind == Action::OFK_OpenMP ||		assert((DeviceOffloadingKind == Action::OFK_OpenMP ||
DeviceOffloadingKind == Action::OFK_Cuda) &&		DeviceOffloadingKind == Action::OFK_Cuda ||
"Only OpenMP or CUDA offloading kinds are supported for NVIDIA GPUs.");		DeviceOffloadingKind == Action::OFK_HIP) &&
		"Only OpenMP, CUDA, or HIP offloading kinds are supported for GPUs.");

if (DeviceOffloadingKind == Action::OFK_Cuda) {		if (DeviceOffloadingKind == Action::OFK_Cuda ||
		DeviceOffloadingKind == Action::OFK_HIP) {
CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,		if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero, false))		options::OPT_fno_cuda_flush_denormals_to_zero, false))
CC1Args.push_back("-fcuda-flush-denormals-to-zero");		CC1Args.push_back("-fcuda-flush-denormals-to-zero");

if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,		if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))		options::OPT_fno_cuda_approx_transcendentals, false))
CC1Args.push_back("-fcuda-approx-transcendentals");		CC1Args.push_back("-fcuda-approx-transcendentals");

if (DriverArgs.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc,		if (DriverArgs.hasFlag(options::OPT_fcuda_rdc, options::OPT_fno_cuda_rdc,
false))		false))
CC1Args.push_back("-fcuda-rdc");		CC1Args.push_back("-fcuda-rdc");
}		}

if (DriverArgs.hasArg(options::OPT_nocudalib))		if (DriverArgs.hasArg(options::OPT_nocudalib) ||
		DeviceOffloadingKind == Action::OFK_HIP)
		tra: I'd just add something like this and leave the existing `if` unchanged: `// There's no need for CUDA-specific bitcode linking with HIP.` `if (DeviceOffloadingKind == Action::OFK_HIP) return;`
		yaxunl: will change
return;		return;

std::string LibDeviceFile = CudaInstallation.getLibDeviceFile(GpuArch);		std::string LibDeviceFile = CudaInstallation.getLibDeviceFile(GpuArch);

if (LibDeviceFile.empty()) {		if (LibDeviceFile.empty()) {
if (DeviceOffloadingKind == Action::OFK_OpenMP &&		if (DeviceOffloadingKind == Action::OFK_OpenMP &&
DriverArgs.hasArg(options::OPT_S))		DriverArgs.hasArg(options::OPT_S))
return;		return;
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	CudaToolChain::TranslateArgs(const llvm::opt::DerivedArgList &Args,

if (!BoundArch.empty()) {		if (!BoundArch.empty()) {
DAL->eraseArg(options::OPT_march_EQ);		DAL->eraseArg(options::OPT_march_EQ);
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
}		}
return DAL;		return DAL;
}		}

		Tool *CudaToolChain::buildDeviceLibraryLinker() const {
		return new tools::AMDGCN::DeviceLibraryLinker(*this);
		}

Tool *CudaToolChain::buildAssembler() const {		Tool *CudaToolChain::buildAssembler() const {
		if (getTriple().getArch() == llvm::Triple::amdgcn)
		return new tools::AMDGCN::Assembler(*this);
return new tools::NVPTX::Assembler(*this);		return new tools::NVPTX::Assembler(*this);
}		}

Tool *CudaToolChain::buildLinker() const {		Tool *CudaToolChain::buildLinker() const {
if (OK == Action::OFK_OpenMP)		if (OK == Action::OFK_OpenMP)
return new tools::NVPTX::OpenMPLinker(*this);		return new tools::NVPTX::OpenMPLinker(*this);
		if (getTriple().getArch() == llvm::Triple::amdgcn)
		return new tools::AMDGCN::Linker(*this);
return new tools::NVPTX::Linker(*this);		return new tools::NVPTX::Linker(*this);
}		}
		tra: All these should be in the derived toolchain.
		yaxunl: will do

void CudaToolChain::addClangWarningOptions(ArgStringList &CC1Args) const {		void CudaToolChain::addClangWarningOptions(ArgStringList &CC1Args) const {
HostTC.addClangWarningOptions(CC1Args);		HostTC.addClangWarningOptions(CC1Args);
}		}

ToolChain::CXXStdlibType		ToolChain::CXXStdlibType
CudaToolChain::GetCXXStdlibType(const ArgList &Args) const {		CudaToolChain::GetCXXStdlibType(const ArgList &Args) const {
return HostTC.GetCXXStdlibType(Args);		return HostTC.GetCXXStdlibType(Args);
Show All 34 Lines

test/Driver/cuda-bad-arch.cu

	// Checks errors generated by passing a bad value for --cuda-gpu-arch.			// Checks errors generated by passing a bad value for --cuda-gpu-arch.
	// REQUIRES: clang-driver			// REQUIRES: clang-driver
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
				// REQUIRES: amdgpu-registered-target

	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=compute_20 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=compute_20 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix BAD %s			// RUN: | FileCheck -check-prefix BAD %s
	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm20 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm20 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix BAD %s			// RUN: | FileCheck -check-prefix BAD %s
	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_19 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_19 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix BAD %s			// RUN: | FileCheck -check-prefix BAD %s

	// BAD: error: Unsupported CUDA gpu architecture			// BAD: error: Unsupported CUDA gpu architecture

	// RUN: %clang -### -v --target=x86_64-linux-gnu --cuda-gpu-arch=sm_21 \			// RUN: %clang -### -v --target=x86_64-linux-gnu --cuda-gpu-arch=sm_21 \
	// RUN: --cuda-path=%S/Inputs/CUDA_90/usr/local/cuda %s 2>&1 \			// RUN: --cuda-path=%S/Inputs/CUDA_90/usr/local/cuda %s 2>&1 \
	// RUN: | FileCheck -check-prefix BAD_CUDA9 %s			// RUN: | FileCheck -check-prefix BAD_CUDA9 %s

	// BAD_CUDA9: GPU arch sm_21 is supported by CUDA versions between 7.0 and 8.0			// BAD_CUDA9: GPU arch sm_21 is supported by CUDA versions between 7.0 and 8.0

	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_20 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_20 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix OK %s			// RUN: | FileCheck -check-prefix OK %s
	// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_52 -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu --cuda-gpu-arch=sm_52 -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix OK %s			// RUN: | FileCheck -check-prefix OK %s
	// RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \			// RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix OK %s			// RUN: | FileCheck -check-prefix OK %s

	// We don't allow using NVPTX for host compilation.			// We don't allow using NVPTX/AMDGCN for host compilation.
	// RUN: %clang -### --cuda-host-only -target nvptx-nvidia-cuda -c %s 2>&1 \			// RUN: %clang -### --cuda-host-only -target nvptx-nvidia-cuda -c %s 2>&1 \
	// RUN: | FileCheck -check-prefix HOST_NVPTX %s			// RUN: | FileCheck -check-prefix HOST_NVPTX %s
				// RUN: %clang -### --cuda-host-only -target amdgcn-amd-amdhsa -c %s 2>&1 \
				// RUN: | FileCheck -check-prefix HOST_AMDGCN %s

	// OK-NOT: error: Unsupported CUDA gpu architecture			// OK-NOT: error: Unsupported CUDA gpu architecture
	// HOST_NVPTX: error: unsupported use of NVPTX for host compilation.			// HOST_NVPTX: error: unsupported architecture 'nvptx' for host compilation.
				// HOST_AMDGCN: error: unsupported architecture 'amdgcn' for host compilation.

test/Driver/cuda-phases.cu

	// Tests the phases generated for a CUDA offloading target for different			// Tests the phases generated for a CUDA offloading target for different
	// combinations of:			// combinations of:
	// - Number of gpu architectures;			// - Number of gpu architectures;
	// - Host/device-only compilation;			// - Host/device-only compilation;
	// - User-requested final phase - binary or assembly.			// - User-requested final phase - binary or assembly.

	// REQUIRES: clang-driver			// REQUIRES: clang-driver
	// REQUIRES: powerpc-registered-target			// REQUIRES: powerpc-registered-target
	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
				// REQUIRES: amdgpu-registered-target
	//			//
	// Test single gpu architecture with complete compilation.			// Test single gpu architecture with complete compilation.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s 2>&1 \
	// RUN: | FileCheck -check-prefix=BIN %s			// RUN: | FileCheck -check-prefixes=BIN,BIN_NV %s
	// BIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s 2>&1 \
	// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=BIN,BIN_AMD %s
	// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// BIN_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// BIN-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// BIN_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
	// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, cuda-cpp-output, (device-cuda, sm_30)			// BIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
	// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-cuda, sm_30)			// BIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
	// BIN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-cuda, sm_30)			// BIN_NV-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:sm_30]])
	// BIN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-cuda, sm_30)			// BIN_AMD-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH:gfx803]])
	// BIN-DAG: [[P8:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P7]]}, object			// BIN-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
	// BIN-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P6]]}, assembler			// BIN-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH]])
	// BIN-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-cuda)			// BIN-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH]])
	// BIN-DAG: [[P11:[0-9]+]]: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-cuda (nvptx64-nvidia-cuda)" {[[P10]]}, ir			// BIN-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH]])
	// BIN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-cuda)			// BIN_NV-DAG: [[P8:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda]]:[[ARCH]])" {[[P7]]}, object
	// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-cuda)			// BIN_AMD-DAG: [[P8:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P7]]}, object
	// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-cuda)			// BIN-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH]])" {[[P6]]}, assembler
				// BIN-DAG: [[P10:[0-9]+]]: linker, {[[P8]], [[P9]]}, cuda-fatbin, (device-[[T]])
				// BIN-DAG: [[P11:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-[[T]] ([[TRIPLE]])" {[[P10]]}, ir
				// BIN-DAG: [[P12:[0-9]+]]: backend, {[[P11]]}, assembler, (host-[[T]])
				// BIN-DAG: [[P13:[0-9]+]]: assembler, {[[P12]]}, object, (host-[[T]])
				// BIN-DAG: [[P14:[0-9]+]]: linker, {[[P13]]}, image, (host-[[T]])

	//			//
	// Test single gpu architecture up to the assemble phase.			// Test single gpu architecture up to the assemble phase.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s -S 2>&1 \
	// RUN: | FileCheck -check-prefix=ASM %s			// RUN: | FileCheck -check-prefixes=ASM,ASM_NV %s
	// ASM-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s -S 2>&1 \
	// ASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=ASM,ASM_AMD %s
	// ASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// ASM_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH:sm_30]])
	// ASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// ASM_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
	// ASM-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler			// ASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
	// ASM-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// ASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
	// ASM-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, cuda-cpp-output, (host-cuda)			// ASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
	// ASM-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (host-cuda)			// ASM-DAG: [[P4:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P3]]}, assembler
	// ASM-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (host-cuda)			// ASM-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (host-[[T]])
				// ASM-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, [[T]]-cpp-output, (host-[[T]])
				// ASM-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (host-[[T]])
				// ASM-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (host-[[T]])

	//			//
	// Test two gpu architectures with complete compilation.			// Test two gpu architectures with complete compilation.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s 2>&1 \
	// RUN: | FileCheck -check-prefix=BIN2 %s			// RUN: | FileCheck -check-prefixes=BIN2,BIN2_NV %s
	// BIN2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s 2>&1 \
	// BIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=BIN2,BIN2_AMD %s
	// BIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// BIN2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// BIN2-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// BIN2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
	// BIN2-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, cuda-cpp-output, (device-cuda, sm_30)			// BIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
	// BIN2-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-cuda, sm_30)			// BIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
	// BIN2-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-cuda, sm_30)			// BIN2-DAG: [[P3:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH1:sm_30|gfx803]])
	// BIN2-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-cuda, sm_30)			// BIN2-DAG: [[P4:[0-9]+]]: preprocessor, {[[P3]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH1]])
	// BIN2-DAG: [[P8:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P7]]}, object			// BIN2-DAG: [[P5:[0-9]+]]: compiler, {[[P4]]}, ir, (device-[[T]], [[ARCH1]])
	// BIN2-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P6]]}, assembler			// BIN2-DAG: [[P6:[0-9]+]]: backend, {[[P5]]}, assembler, (device-[[T]], [[ARCH1]])
	// BIN2-DAG: [[P10:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_35)			// BIN2-DAG: [[P7:[0-9]+]]: assembler, {[[P6]]}, object, (device-[[T]], [[ARCH1]])
	// BIN2-DAG: [[P11:[0-9]+]]: preprocessor, {[[P10]]}, cuda-cpp-output, (device-cuda, sm_35)			// BIN2-DAG: [[P8:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH1]])" {[[P7]]}, object
	// BIN2-DAG: [[P12:[0-9]+]]: compiler, {[[P11]]}, ir, (device-cuda, sm_35)			// BIN2-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH1]])" {[[P6]]}, assembler
	// BIN2-DAG: [[P13:[0-9]+]]: backend, {[[P12]]}, assembler, (device-cuda, sm_35)			// BIN2-DAG: [[P10:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH2:sm_35|gfx900]])
	// BIN2-DAG: [[P14:[0-9]+]]: assembler, {[[P13]]}, object, (device-cuda, sm_35)			// BIN2-DAG: [[P11:[0-9]+]]: preprocessor, {[[P10]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH2]])
	// BIN2-DAG: [[P15:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {[[P14]]}, object			// BIN2-DAG: [[P12:[0-9]+]]: compiler, {[[P11]]}, ir, (device-[[T]], [[ARCH2]])
	// BIN2-DAG: [[P16:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {[[P13]]}, assembler			// BIN2-DAG: [[P13:[0-9]+]]: backend, {[[P12]]}, assembler, (device-[[T]], [[ARCH2]])
	// BIN2-DAG: [[P17:[0-9]+]]: linker, {[[P8]], [[P9]], [[P15]], [[P16]]}, cuda-fatbin, (device-cuda)			// BIN2-DAG: [[P14:[0-9]+]]: assembler, {[[P13]]}, object, (device-[[T]], [[ARCH2]])
	// BIN2-DAG: [[P18:[0-9]+]]: offload, "host-cuda (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-cuda (nvptx64-nvidia-cuda)" {[[P17]]}, ir			// BIN2-DAG: [[P15:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P14]]}, object
	// BIN2-DAG: [[P19:[0-9]+]]: backend, {[[P18]]}, assembler, (host-cuda)			// BIN2-DAG: [[P16:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P13]]}, assembler
	// BIN2-DAG: [[P20:[0-9]+]]: assembler, {[[P19]]}, object, (host-cuda)			// BIN2-DAG: [[P17:[0-9]+]]: linker, {[[P8]], [[P9]], [[P15]], [[P16]]}, cuda-fatbin, (device-[[T]])
	// BIN2-DAG: [[P21:[0-9]+]]: linker, {[[P20]]}, image, (host-cuda)			// BIN2-DAG: [[P18:[0-9]+]]: offload, "host-[[T]] (powerpc64le-ibm-linux-gnu)" {[[P2]]}, "device-[[T]] ([[TRIPLE]])" {[[P17]]}, ir
				// BIN2-DAG: [[P19:[0-9]+]]: backend, {[[P18]]}, assembler, (host-[[T]])
				// BIN2-DAG: [[P20:[0-9]+]]: assembler, {[[P19]]}, object, (host-[[T]])
				// BIN2-DAG: [[P21:[0-9]+]]: linker, {[[P20]]}, image, (host-[[T]])

	//			//
	// Test two gpu architectures up to the assemble phase.			// Test two gpu architectures up to the assemble phase.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s -S 2>&1 \
	// RUN: | FileCheck -check-prefix=ASM2 %s			// RUN: | FileCheck -check-prefixes=ASM2,ASM2_NV %s
	// ASM2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s -S 2>&1 \
	// ASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=ASM2,ASM2_AMD %s
	// ASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// ASM2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH1:sm_30]])
	// ASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// ASM2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH1:gfx803]])
	// ASM2-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler			// ASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH1]])
	// ASM2-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_35)			// ASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH1]])
	// ASM2-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, cuda-cpp-output, (device-cuda, sm_35)			// ASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH1]])
	// ASM2-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (device-cuda, sm_35)			// ASM2-DAG: [[P4:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH1]])" {[[P3]]}, assembler
	// ASM2-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (device-cuda, sm_35)			// ASM2-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH2:sm_35|gfx900]])
	// ASM2-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {[[P8]]}, assembler			// ASM2-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH2]])
	// ASM2-DAG: [[P10:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// ASM2-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (device-[[T]], [[ARCH2]])
	// ASM2-DAG: [[P11:[0-9]+]]: preprocessor, {[[P10]]}, cuda-cpp-output, (host-cuda)			// ASM2-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (device-[[T]], [[ARCH2]])
	// ASM2-DAG: [[P12:[0-9]+]]: compiler, {[[P11]]}, ir, (host-cuda)			// ASM2-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P8]]}, assembler
	// ASM2-DAG: [[P13:[0-9]+]]: backend, {[[P12]]}, assembler, (host-cuda)			// ASM2-DAG: [[P10:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (host-[[T]])
				// ASM2-DAG: [[P11:[0-9]+]]: preprocessor, {[[P10]]}, [[T]]-cpp-output, (host-[[T]])
				// ASM2-DAG: [[P12:[0-9]+]]: compiler, {[[P11]]}, ir, (host-[[T]])
				// ASM2-DAG: [[P13:[0-9]+]]: backend, {[[P12]]}, assembler, (host-[[T]])

	//			//
	// Test single gpu architecture with complete compilation in host-only			// Test single gpu architecture with complete compilation in host-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only 2>&1 \
	// RUN: | FileCheck -check-prefix=HBIN %s			// RUN: | FileCheck -check-prefixes=HBIN,HBIN_NV %s
	// HBIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s --cuda-host-only 2>&1 \
	// HBIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=HBIN,HBIN_AMD %s
	// HBIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// HBIN_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// HBIN-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-cuda)			// HBIN_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
	// HBIN-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (host-cuda)			// HBIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
	// HBIN-DAG: [[P5:[0-9]+]]: linker, {[[P4]]}, image, (host-cuda)			// HBIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
				// HBIN-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
				// HBIN-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (host-[[T]])
				// HBIN-DAG: [[P5:[0-9]+]]: linker, {[[P4]]}, image, (host-[[T]])
	//			//
	// Test single gpu architecture up to the assemble phase in host-only			// Test single gpu architecture up to the assemble phase in host-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-host-only -S 2>&1 \
	// RUN: | FileCheck -check-prefix=HASM %s			// RUN: | FileCheck -check-prefixes=HASM,HASM_NV %s
	// HASM-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s --cuda-host-only -S 2>&1 \
	// HASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=HASM,HASM_AMD %s
	// HASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// HASM_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// HASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-cuda)			// HASM_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
				// HASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
				// HASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
				// HASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])

	//			//
	// Test two gpu architectures with complete compilation in host-only			// Test two gpu architectures with complete compilation in host-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only 2>&1 \
	// RUN: | FileCheck -check-prefix=HBIN2 %s			// RUN: | FileCheck -check-prefixes=HBIN2,HBIN2_NV %s
	// HBIN2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-host-only 2>&1 \
	// HBIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=HBIN2,HBIN2_AMD %s
	// HBIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// HBIN2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// HBIN2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-cuda)			// HBIN2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
	// HBIN2-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (host-cuda)			// HBIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
	// HBIN2-DAG: [[P5:[0-9]+]]: linker, {[[P4]]}, image, (host-cuda)			// HBIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
				// HBIN2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])
				// HBIN2-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (host-[[T]])
				// HBIN2-DAG: [[P5:[0-9]+]]: linker, {[[P4]]}, image, (host-[[T]])

	//			//
	// Test two gpu architectures up to the assemble phase in host-only			// Test two gpu architectures up to the assemble phase in host-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-host-only -S 2>&1 \
	// RUN: | FileCheck -check-prefix=HASM2 %s			// RUN: | FileCheck -check-prefixes=HASM2,HASM2_NV %s
	// HASM2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (host-cuda)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-host-only -S 2>&1 \
	// HASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (host-cuda)			// RUN: | FileCheck -check-prefixes=HASM2,HASM2_AMD %s
	// HASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-cuda)			// HASM2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])
	// HASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-cuda)			// HASM2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])
				// HASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])
				// HASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])
				// HASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (host-[[T]])

	//			//
	// Test single gpu architecture with complete compilation in device-only			// Test single gpu architecture with complete compilation in device-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only 2>&1 \
	// RUN: | FileCheck -check-prefix=DBIN %s			// RUN: | FileCheck -check-prefixes=DBIN,DBIN_NV %s
	// DBIN-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s --cuda-device-only 2>&1 \
	// DBIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=DBIN,DBIN_AMD %s
	// DBIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// DBIN_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH:sm_30]])
	// DBIN-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// DBIN_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
	// DBIN-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-cuda, sm_30)			// DBIN-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
	// DBIN-DAG: [[P5:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P4]]}, object			// DBIN-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
				// DBIN-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
				// DBIN-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-[[T]], [[ARCH]])
				// DBIN-DAG: [[P5:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P4]]}, object

	//			//
	// Test single gpu architecture up to the assemble phase in device-only			// Test single gpu architecture up to the assemble phase in device-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 %s --cuda-device-only -S 2>&1 \
	// RUN: | FileCheck -check-prefix=DASM %s			// RUN: | FileCheck -check-prefixes=DASM,DASM_NV %s
	// DASM-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 %s --cuda-device-only -S 2>&1 \
	// DASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=DASM,DASM_AMD %s
	// DASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// DASM_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH:sm_30]])
	// DASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// DASM_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
	// DASM-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler			// DASM-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
				// DASM-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
				// DASM-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
				// DASM-DAG: [[P4:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P3]]}, assembler

	//			//
	// Test two gpu architectures with complete compilation in device-only			// Test two gpu architectures with complete compilation in device-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only 2>&1 \
	// RUN: | FileCheck -check-prefix=DBIN2 %s			// RUN: | FileCheck -check-prefixes=DBIN2,DBIN2_NV %s
	// DBIN2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-device-only 2>&1 \
	// DBIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=DBIN2,DBIN2_AMD %s
	// DBIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// DBIN2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH:sm_30]])
	// DBIN2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// DBIN2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
	// DBIN2-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-cuda, sm_30)			// DBIN2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
	// DBIN2-DAG: [[P5:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P4]]}, object			// DBIN2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
	// DBIN2-DAG: [[P6:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_35)			// DBIN2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
	// DBIN2-DAG: [[P7:[0-9]+]]: preprocessor, {[[P6]]}, cuda-cpp-output, (device-cuda, sm_35)			// DBIN2-DAG: [[P4:[0-9]+]]: assembler, {[[P3]]}, object, (device-[[T]], [[ARCH]])
	// DBIN2-DAG: [[P8:[0-9]+]]: compiler, {[[P7]]}, ir, (device-cuda, sm_35)			// DBIN2-DAG: [[P5:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P4]]}, object
	// DBIN2-DAG: [[P9:[0-9]+]]: backend, {[[P8]]}, assembler, (device-cuda, sm_35)			// DBIN2-DAG: [[P6:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH2:sm_35|gfx900]])
	// DBIN2-DAG: [[P10:[0-9]+]]: assembler, {[[P9]]}, object, (device-cuda, sm_35)			// DBIN2-DAG: [[P7:[0-9]+]]: preprocessor, {[[P6]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH2]])
	// DBIN2-DAG: [[P11:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {[[P10]]}, object			// DBIN2-DAG: [[P8:[0-9]+]]: compiler, {[[P7]]}, ir, (device-[[T]], [[ARCH2]])
				// DBIN2-DAG: [[P9:[0-9]+]]: backend, {[[P8]]}, assembler, (device-[[T]], [[ARCH2]])
				// DBIN2-DAG: [[P10:[0-9]+]]: assembler, {[[P9]]}, object, (device-[[T]], [[ARCH2]])
				// DBIN2-DAG: [[P11:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P10]]}, object

	//			//
	// Test two gpu architectures up to the assemble phase in device-only			// Test two gpu architectures up to the assemble phase in device-only
	// compilation mode.			// compilation mode.
	//			//
	// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \			// RUN: %clang -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_35 %s --cuda-device-only -S 2>&1 \
	// RUN: | FileCheck -check-prefix=DASM2 %s			// RUN: | FileCheck -check-prefixes=DASM2,DASM2_NV %s
	// DASM2-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_30)			// RUN: %clang -x hip -target powerpc64le-ibm-linux-gnu -ccc-print-phases --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s --cuda-device-only -S 2>&1 \
	// DASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, cuda-cpp-output, (device-cuda, sm_30)			// RUN: | FileCheck -check-prefixes=DASM2,DASM2_AMD %s
	// DASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-cuda, sm_30)			// DASM2_NV-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (device-[[T]], [[ARCH:sm_30]])
	// DASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-cuda, sm_30)			// DASM2_AMD-DAG: [[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (device-[[T]], [[ARCH:gfx803]])
	// DASM2-DAG: [[P4:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {[[P3]]}, assembler			// DASM2-DAG: [[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH]])
	// DASM2-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", cuda, (device-cuda, sm_35)			// DASM2-DAG: [[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (device-[[T]], [[ARCH]])
	// DASM2-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, cuda-cpp-output, (device-cuda, sm_35)			// DASM2-DAG: [[P3:[0-9]+]]: backend, {[[P2]]}, assembler, (device-[[T]], [[ARCH]])
	// DASM2-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (device-cuda, sm_35)			// DASM2-DAG: [[P4:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE:nvptx64-nvidia-cuda|amdgcn-amd-amdhsa]]:[[ARCH]])" {[[P3]]}, assembler
	// DASM2-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (device-cuda, sm_35)			// DASM2-DAG: [[P5:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T]], (device-[[T]], [[ARCH2:sm_35|gfx900]])
	// DASM2-DAG: [[P9:[0-9]+]]: offload, "device-cuda (nvptx64-nvidia-cuda:sm_35)" {[[P8]]}, assembler			// DASM2-DAG: [[P6:[0-9]+]]: preprocessor, {[[P5]]}, [[T]]-cpp-output, (device-[[T]], [[ARCH2]])
				// DASM2-DAG: [[P7:[0-9]+]]: compiler, {[[P6]]}, ir, (device-[[T]], [[ARCH2]])
				// DASM2-DAG: [[P8:[0-9]+]]: backend, {[[P7]]}, assembler, (device-[[T]], [[ARCH2]])
				// DASM2-DAG: [[P9:[0-9]+]]: offload, "device-[[T]] ([[TRIPLE]]:[[ARCH2]])" {[[P8]]}, assembler
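
The new checks rely on one pattern throughout: a prefix-specific first line (`*_NV` or `*_AMD`) binds `[[T]]` to `cuda` or `hip`, and the shared-prefix lines then reuse `[[T]]`, so the same expectations cover both toolchains. The sketch below is only a rough Python emulation of that variable binding, not FileCheck itself. The helper names `to_regex` and `run_checks` and the sample `output` lines are hypothetical, and `-DAG` ordering and overlap rules are not modelled.

```python
import re

# Matches [[NAME:regex]] (definition), [[NAME]] (use) and {{regex}} (inline regex).
TOKEN = re.compile(r"\[\[(\w+)(?::(.*?))?\]\]|\{\{(.*?)\}\}")

def to_regex(pattern, bindings):
    """Translate one check pattern into a Python regex, given earlier bindings."""
    out, pos, local = [], 0, set()
    for m in TOKEN.finditer(pattern):
        out.append(re.escape(pattern[pos:m.start()]))  # literal text between tokens
        name, definition, inline = m.group(1), m.group(2), m.group(3)
        if inline is not None:
            out.append(f"(?:{inline})")
        elif definition is not None:
            out.append(f"(?P<{name}>{definition})")    # capture a new variable
            local.add(name)
        elif name in local:
            out.append(f"(?P={name})")                 # defined earlier on this line
        else:
            out.append(re.escape(bindings[name]))      # KeyError if never defined
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return "".join(out)

def run_checks(checks, output_lines):
    """Return the variable bindings if every check matches some line, else None."""
    bindings = {}
    for pattern in checks:
        rx = re.compile(to_regex(pattern, bindings))
        for line in output_lines:
            m = rx.search(line)
            if m:
                bindings.update(m.groupdict())
                break
        else:
            return None
    return bindings

# Shared-prefix lines, modelled on the HASM checks in the diff.
shared = [
    '[[P1:[0-9]+]]: preprocessor, {[[P0]]}, [[T]]-cpp-output, (host-[[T]])',
    '[[P2:[0-9]+]]: compiler, {[[P1]]}, ir, (host-[[T]])',
]
# Prefix-specific first lines that bind T.
nv_first = '[[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:cuda]], (host-[[T]])'
amd_first = '[[P0:[0-9]+]]: input, "{{.*}}cuda-phases.cu", [[T:hip]], (host-[[T]])'

# Hypothetical -ccc-print-phases output for a HIP host-only compile.
output = [
    '0: input, "cuda-phases.cu", hip, (host-hip)',
    '1: preprocessor, {0}, hip-cpp-output, (host-hip)',
    '2: compiler, {1}, ir, (host-hip)',
]

print(run_checks([amd_first] + shared, output))  # bindings include T = 'hip'
print(run_checks([nv_first] + shared, output))   # None: T cannot bind to 'cuda'
```

The point of the pattern is that only the first check of each block has to be duplicated per toolchain; every later line is written once against `[[T]]`, `[[ARCH]]` and `[[TRIPLE]]`.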

Revision Contents

Diff 143320
