This is an archive of the discontinued LLVM Phabricator instance.

Let CUDA toolchain support amdgpu target
Abandoned · Public

Authored by gregrodgers on Feb 1 2018, 8:20 AM.

Details

Summary

Currently the CUDA toolchain only supports nvptx.

This patch lets the CUDA toolchain support the amdgpu target. It can also serve as an example for supporting CUDA on other targets.

Patch by Greg Rodgers.
Lit test added by Yaxun Liu.

Diff Detail

Event Timeline

yaxunl created this revision. Feb 1 2018, 8:20 AM

Only commenting on parts that I'm a bit familiar with. In general, does it make sense to split this patch? Are there different "stages" of support, like 1) being able to compile an empty file, 2) generating optimized code, 3) allowing the use of math functions?

lib/Driver/ToolChains/Cuda.cpp
403–415

This never gets cleaned up!

531–534

That is already done in TC.getInputFilename(Output) (since rC318763), the same function call that you are removing here...

639–640

clang-fixup-fatbin is not upstreamed and won't work. Sounds like a horrible name btw...

788–793

You should use GpuArch, which comes from DriverArgs.getLastArgValue: the last -march overrides previous arguments.
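
A minimal sketch of the suggested pattern, assuming the usual clang driver context (DriverArgs is the llvm::opt::ArgList handed to the toolchain; the default value here is only illustrative):

    // Take the value of the last -march on the command line so that later
    // flags override earlier ones; fall back to an (illustrative) default.
    StringRef GpuArch =
        DriverArgs.getLastArgValue(options::OPT_march_EQ, /*Default=*/"gfx803");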

yaxunl added a comment. Feb 1 2018, 8:56 AM

Only commenting on parts that I'm a bit familiar with. In general, does it make sense to split this patch? Are there different "stages" of support, like 1) being able to compile an empty file, 2) generating optimized code, 3) allowing the use of math functions?

Good suggestion. This patch is mainly about letting the toolchain recognise the amdgpu implementation of CUDA and create the proper compilation stages. I can try to create a test for compiling an empty file.

tra added a reviewer: tra. Feb 1 2018, 9:44 AM
arsenm added inline comments. Feb 1 2018, 9:53 AM
lib/Basic/Targets/AMDGPU.cpp
437

Use llvm_unreachable here.
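
That is, mark the impossible case explicitly with LLVM's standard facility (a one-line sketch; the message text is illustrative):

    #include "llvm/Support/ErrorHandling.h"

    // Aborts with the message when reached, and documents that this
    // switch case is impossible by construction.
    llvm_unreachable("Unhandled GPU kind in AMDGPU target code");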

lib/Driver/ToolChains/Cuda.cpp
359–361

Why is this done under an NVPTX:: class?

390

Why is this hardcoded?

tra added a comment. Feb 1 2018, 10:15 AM

I don't have enough knowledge about compute on AMD's GPUs and would appreciate it if you could share your thoughts on how you think CUDA on AMD should work. Is there a good document describing how compute currently works on AMD GPUs (e.g. how do I launch a kernel using a rough equivalent of NVidia's driver API)?

  • Headers. clang pre-includes *a lot* of headers from NVidia's CUDA SDK. Some of them may work for AMD, but some certainly will not -- there are a lot of headers with nvidia-specific inline assembly or things that rely on nvidia-specific functionality. In the end, I think, we'll need some sort of CUDA SDK for AMD which would implement (possibly with asserts for unsupported functions) existing CUDA APIs. Or, perhaps the plan is to just use CUDA syntax only without providing complete API compatibility with nvidia.
  • How will GPU-side object file be incorporated into the final executable? I believe OpenMP has a fairly generic way to deal with it in clang. I'm not sure if that would be suitable for use with AMD's runtime (whatever we need to use to launch the kernels).
  • Launching kernels. Will it be similar to the way kernel launches are configured on NVidia? I.e. a grid of blocks of threads with per-block shared memory (see the sketch after this list).
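
For concreteness, the launch model the last bullet refers to is the CUDA driver-API equivalent of kernel<<<grid, block, sharedMem>>>(...). A minimal sketch against the real driver API (error handling omitted; the grid/block sizes are arbitrary):

    #include <cuda.h>

    // Launch CUfunction `f` over a 1-D grid of blocks of threads with
    // per-block shared memory, i.e. the model the question asks AMD to map onto.
    void launch(CUfunction f, CUstream stream, void **kernelArgs) {
      cuLaunchKernel(f,
                     /*gridDimX=*/128, /*gridDimY=*/1, /*gridDimZ=*/1,
                     /*blockDimX=*/256, /*blockDimY=*/1, /*blockDimZ=*/1,
                     /*sharedMemBytes=*/4096,
                     stream, kernelArgs, /*extra=*/nullptr);
    }
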
gregrodgers requested changes to this revision. Feb 1 2018, 6:40 PM

Thanks to everyone for the reviews. I hope I replied to all inline comments. Since I sent this to Sam to post, we discovered a major shortcoming. As tra points out, there are a lot of CUDA headers in the CUDA SDK that get processed. We are able to override asm() expansions with #undef and redefine them as an equivalent amdgpu component, so the compiler never sees the asm(). I am sure we will need to add more redefines as we broaden our testing. But that is not the big problem.

We would like to be able to run cudaclang for AMD GPUs without an install of CUDA. Of course you must always install CUDA if any of your targeted GPUs are NVidia GPUs. To run cudaclang without CUDA when only non-NVidia GPUs are specified, we need an open set of headers and we must replace the fatbin tools used in the toolchain. The latter can be addressed by using the libomptarget methods for embedding multiple target GPU objects. The former is going to take a lot of work. I am going to send an updated patch that has the stubs for the open headers noted in __clang_cuda_runtime_wrapper.h. They will be included with the CC1 flag -D__USE_OPEN_HEADERS__. This will be generated by the cuda driver when it finds no CUDA installation and all target GPUs are non-NVidia.
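
To illustrate the #undef-and-redefine technique described above (a hypothetical example: __some_asm_helper is a stand-in name, not an actual CUDA SDK macro):

    // Hypothetical: __some_asm_helper is a CUDA SDK macro whose expansion
    // contains NVPTX inline asm().
    #undef __some_asm_helper
    // Redefine it in terms of a portable builtin that the AMDGPU backend
    // can lower, so the compiler never sees the NVPTX asm().
    #define __some_asm_helper(x) __builtin_ffs(x)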

This revision now requires changes to proceed. Feb 1 2018, 6:40 PM

Sorry, all my great inline comments got lost somehow. I am a newbie to Phabricator. I will try to reconstruct my comments.

t-tye added inline comments. Feb 5 2018, 10:35 AM
include/clang/Basic/Cuda.h
49–57

Should the complete list of processors for the amdgcn architecture be included? See https://llvm.org/docs/AMDGPUUsage.html#processors.
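
For context, a sketch of the shape such an extension of clang's CudaArch enumeration might take (the gfx entries shown are an illustrative subset; see the AMDGPUUsage processor list for the full set):

    // include/clang/Basic/Cuda.h (sketch, not the literal patch)
    enum class CudaArch {
      UNKNOWN,
      SM_20,   // ... existing NVPTX entries elided ...
      SM_70,
      GFX700,  // amdgcn processors, per
      GFX803,  // https://llvm.org/docs/AMDGPUUsage.html#processors
      GFX900,
      LAST,
    };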

79

Suggest using amdgcn, which matches the architecture name in https://llvm.org/docs/AMDGPUUsage.html#processors.

lib/Basic/Targets/AMDGPU.h
85

We still want to use the amdhsa OS for amdgpu, which currently supports the different environments. So can cuda simply support the same environments? Is the plan to eliminate the environments and simply always use the default address space for generic, so that this code is no longer needed?

yaxunl added inline comments. Feb 5 2018, 10:40 AM
lib/Basic/Targets/AMDGPU.h
85

Currently we already use amdgiz by default. This is no longer needed.

bader added a subscriber: bader. Feb 5 2018, 10:40 AM

Here are my replies to the inline comments. Everything should be fixed in the next revision.

include/clang/Basic/Cuda.h
49–57

Yes, I will add them in the update.

79

Done in next update

lib/Basic/Targets/AMDGPU.cpp
437

Fixed in next update

lib/Basic/Targets/AMDGPU.h
85

Removed in next update

lib/Driver/ToolChains/Cuda.cpp
359–361

Because we are not creating a new toolchain for AMDGCN. We modify the logic in the tool constructor as needed for AMDGCN, keeping the ability to provide a set of mixed targets.
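
A sketch of what that branching looks like in practice (assuming the standard ToolChain triple query; this is not the literal patch):

    // Inside the shared CUDA tool (sketch): branch on the effective triple
    // instead of introducing a separate AMDGCN toolchain class.
    bool IsAMDGCN = TC.getTriple().getArch() == llvm::Triple::amdgcn;
    if (IsAMDGCN) {
      // Select AMDGCN device-side tools (e.g. lld) here.
    } else {
      // Existing NVPTX path: ptxas and fatbinary.
    }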

403–415

OK, deleted in the next revision.

531–534

Fixed in next update

639–640

Major cleanup here in the next update. The name will make more sense once you see the update and the comments in it.

788–793

Nice catch. I will fix this in the next update.

jprice added a subscriber: jprice. Feb 12 2018, 10:25 AM
yaxunl updated this revision to Diff 134107. Feb 13 2018, 1:30 PM

Update with Greg's change.

Let's start with the obvious points:

  1. The latest patch clearly wasn't run through clang-format.
  2. It only has some 90 lines of context, which makes it look like you deleted quite a lot of code when browsing through the history.
  3. This patch is still large; did you at least consider splitting it as I proposed some weeks ago?

Additionally, I don't see responses to all the points that @tra raised. The answers will probably explain the approach you chose, so I think they should also be added to the summary.

lib/Basic/Cuda.cpp
254–266

This means you can't compile when the Clang driver detects an installation of CUDA 9?

lib/Basic/Targets/AMDGPU.h
319

How can LangAS::opencl_local ever be used in CUDA? I think this check is redundant.

lib/Driver/Driver.cpp
3262–3263

I'm not sure I understand this change to the generic driver code: How can LLVM IR / BC ever need a preprocessor?

3954–3957

If this is really needed, it deserves some justification.

lib/Driver/ToolChains/Cuda.cpp
293

I don't see this file in the upstream repository?

359–361

That sounds more like a hack; the CUDA classes should be separated from NVPTX and AMDGCN.

413

I don't think you should invent new environment variables; Clang normally uses the -X family of options (e.g. -Xcuda-ptxas, -Xcuda-fatbinary) to pass arguments to specific parts of the toolchain.

639–640

I wasn't only criticizing the name but also the fact that this code won't work with only upstream components!

mkuron added a subscriber: mkuron. Mar 31 2018, 2:29 AM
arsenm resigned from this revision. Feb 21 2019, 6:58 PM
gregrodgers commandeered this revision. May 14 2020, 4:10 PM
gregrodgers edited reviewers, added: yaxunl; removed: gregrodgers.
gregrodgers abandoned this revision. May 14 2020, 4:10 PM