This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- clang/include/clang/Basic/Cuda.h
- clang/include/clang/Basic/DiagnosticDriverKinds.td
- clang/include/clang/Driver/Options.td
- clang/lib/Basic/Cuda.cpp
- clang/lib/Driver/Driver.cpp
- clang/lib/Driver/ToolChains/Cuda.h
- clang/lib/Driver/ToolChains/Cuda.cpp

Differential D75811

[CUDA] Choose default architecture based on CUDA installation
Abandoned · Public

Authored by tambre on Mar 7 2020, 9:53 AM.


Details

Reviewers

tra
jlebar

Summary

Clang currently always defaults to sm_20.
However, CUDA >=9.0 doesn't support the sm_20 architecture.
Choose the minimum architecture the CUDA installation supports as the default.
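A minimal standalone sketch of the proposed policy, modeled on the `MinArchForCudaVersion` helper this diff adds to `clang/lib/Basic/Cuda.cpp`. The two enums are hypothetical, trimmed-down stand-ins for `clang::CudaVersion` and `clang::CudaArch`; only the rule itself (CUDA 9.0 and later cannot target sm_20, so fall back to sm_30) comes from the summary and the diff.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-ins for clang::CudaVersion and
// clang::CudaArch; the real enums have many more entries.
enum class CudaVersion { UNKNOWN, CUDA_70, CUDA_75, CUDA_80, CUDA_90, CUDA_100, CUDA_101 };
enum class CudaArch { SM_20, SM_30 };

// Return the oldest GPU architecture the detected SDK can still target.
// CUDA 9.0 dropped sm_20, so 9.0 and later fall back to sm_30. In this
// sketch UNKNOWN sorts first and therefore keeps the old sm_20 default.
CudaArch MinArchForCudaVersion(CudaVersion V) {
  return V >= CudaVersion::CUDA_90 ? CudaArch::SM_30 : CudaArch::SM_20;
}
```

Scoped enums compare with `>=` by declaration order, which is why the version values must be listed oldest first.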

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tambre created this revision. Mar 7 2020, 9:53 AM

Herald added a project: Restricted Project. Mar 7 2020, 9:53 AM

Herald added a subscriber: cfe-commits.

tambre added inline comments. Mar 7 2020, 9:59 AM

clang/lib/Driver/Driver.cpp
642	This isn't very pretty. Any better ideas for how to pass the current CUDA version or default arch to `CudaActionBuilder`?

Harbormaster failed remote builds in B48468: Diff 248944! Mar 7 2020, 10:50 AM

Add missing error message

Harbormaster completed remote builds in B48471: Diff 248949. Mar 7 2020, 12:28 PM

tambre added inline comments. Mar 7 2020, 11:04 PM

clang/lib/Driver/Driver.cpp
2579	This error is hit when simply running `clang++ -v`, because a `CudaToolchain` isn't created. The `CudaInstallation` is instead created in `Generic_GCC`. My current approach of propagating the current CUDA version to here seems even worse now. Ignoring an unknown version doesn't seem like a good idea either. Ideas?

I'm not sure that's a problem worth solving.

Magically changing compiler target based on something external to compiler is a bad idea IMO. I would expect a compilation with exactly the same compiler options to do exactly the same thing. If we magically change default target, that will not be the case.

Also, there's no good default for a GPU. I can't think of anything that would work out of the box for most users.
In practice, compilation for the GPU requires specifying a set of GPU targets that's specific to a particular user.

It may make more sense to just bump the default to a sensible value. E.g. sm_60, warn users ahead of time and flip the default at some point later. This will shift the default towards the target that's useful for most of the GPUs that are currently out there (though there are still a lot of sm_35 GPUs in the clouds, so it may be a reasonable default, too).

Magically changing compiler target based on something external to compiler is a bad idea IMO. I would expect a compilation with exactly the same compiler options to do exactly the same thing. If we magically change default target, that will not be the case.

It'd be the same behaviour as NVCC, which compiles for the lowest architecture it supports.

I'm currently implementing Clang CUDA support for CMake and lack of this behaviour compared to other languages and compilers complicates matters.
During compiler detection CMake compiles a simple program, which includes preprocessor stuff for embedding compiler info in the output. Then it parses that and determines the compiler vendor, version, etc.

The general assumption is that a compiler can compile a simple program for its language without us having to pass compiler-specific options, flags, etc. If the compiler fails on this simple program, it's considered broken.
A limited list of flags is usually cycled through to support exotic compilers and I could do the same here, but it'd require us to invoke the compiler multiple times, and increasingly more often as old architectures are deprecated.
We could detect the CUDA installation ourselves and specify a list of arches for each. This seems quite unnecessary when Clang already knows the version and could select a default that at least compiles.
Note that this detection happens before any user CMake files are run, so we can't pass the user's preferred arch (which could also differ per file).

In D75811#1913148, @tambre wrote:

Magically changing compiler target based on something external to compiler is a bad idea IMO. I would expect a compilation with exactly the same compiler options to do exactly the same thing. If we magically change default target, that will not be the case.

It'd be the same behaviour as NVCC, which compiles for the lowest architecture it supports.

The difference is NVCC is closely tied to the CUDA SDK itself while clang is expected to work with all of the CUDA versions since 7.x.
There's no way to match the behavior of all NVCC versions at once. Bumping up the current default is fine. Matching a particular NVCC version based on the CUDA SDK we happen to find is, IMO, somewhat similar to -march=native. We could implement it via --cuda-gpu-arch=auto or something like that, but I do not want it to be the default.
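The distinction drawn here (a fixed default, with SDK-dependent selection only as an explicit opt-in) can be sketched as a small resolution function. This is an illustration only: `--cuda-gpu-arch=auto` is not an existing clang option, and `ResolveGpuArch` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical arch resolution if an opt-in "auto" value existed for
// --cuda-gpu-arch (it does not; it is only floated above). Without the
// opt-in the default is fixed, so identical command lines give identical
// output no matter which CUDA SDK happens to be installed.
std::string ResolveGpuArch(const std::string &UserArch,
                           const std::string &FixedDefault,
                           const std::string &DetectedSdkMinArch) {
  if (UserArch.empty())
    return FixedDefault;        // nothing requested: deterministic default
  if (UserArch == "auto")
    return DetectedSdkMinArch;  // explicit opt-in, akin to -march=native
  return UserArch;              // an explicit architecture always wins
}
```

Only the `"auto"` branch depends on the installed SDK, which keeps the reproducibility property argued for in the review.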

I'm currently implementing Clang CUDA support for CMake and lack of this behaviour compared to other languages and compilers complicates matters.
During compiler detection CMake compiles a simple program, which includes preprocessor stuff for embedding compiler info in the output. Then it parses that and determines the compiler vendor, version, etc.

The general assumption is that a compiler can compile a simple program for its language without us having to pass compiler-specific options, flags, etc.

Bumping up the default to sm_35 would satisfy this criterion.

If the compiler fails on this simple program, it's considered broken.

I'm not sure how applicable this criterion is to cross-compilation, which is effectively what clang does when we compile CUDA sources.
You are expected to provide the correct path to the CUDA installation and the correct set of target GPUs to compile for. Only the end user can know these. While we do hardcode a few default CUDA locations and deal with quirks of some Linux distributions, that does not change the fact that, in general, cross-compilation needs the end user to supply additional inputs.

A limited list of flags is usually cycled through to support exotic compilers and I could do the same here, but it'd require us to invoke the compiler multiple times, and increasingly more often as old architectures are deprecated.

You can use --cuda-gpu-arch=sm_30 and that should cover all CUDA versions currently supported by clang. Maybe even sm_50 -- I can no longer find any docs for CUDA-7.0, so I can't say whether it already supported Maxwell.

We could detect the CUDA installation ourselves and specify a list of arches for each. This seems quite unnecessary when Clang already knows the version and could select a default that at least compiles.
Note that this detection happens before any user CMake files are run, so we can't pass the user's preferred arch (which could also differ per file).

See above. Repeated iteration is indeed unnecessary and bumped up default target should do the job.

In general, though, relying on this check without taking into account the information supplied by the user will be rather fragile.
The CUDA version clang finds by default may not be correct or working, and clang *relies* on it in order to do anything useful with CUDA. E.g. if I have an ARM version of CUDA installed under /usr/local/cuda, where clang looks for CUDA by default, clang will happily find it but will not be able to compile anything with it. It may work fine if it's pointed to the correct CUDA location via user-specified options.

Can you elaborate on what exactly cmake attempts to establish with the test?
If it looks for a working end-to-end CUDA compilation, then it will need to rely on user input to make sure that correct CUDA location is used.
If it wants to check if clang is capable of CUDA compilation, then it should be told *not* to look for CUDA (though you will need to provide a bit of glue similar to what we use for tests https://github.com/llvm/llvm-project/blob/master/clang/test/Driver/cuda-simple.cu). Would something like that be sufficient?

The farthest you can push clang w/o relying on the CUDA SDK is by using --cuda-gpu-arch=sm_30 --cuda-device-only -S -- it will verify that clang does have NVPTX back-end compiled in and can generate PTX which will then be passed to CUDA's ptxas. If this part works, then clang is likely to work with any supported CUDA version.
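A build tool that wants to run the probe described above would assemble an argument list along these lines. This is a minimal sketch: `DeviceOnlyProbeArgs` and the `probe.cu` input name are hypothetical, and only the three flags are taken from the comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the argument list for the device-only probe described above:
// target sm_30, compile device code only, and stop after generating
// assembly (-S), i.e. PTX. `Input` is a hypothetical probe source file.
std::vector<std::string> DeviceOnlyProbeArgs(const std::string &Clang,
                                             const std::string &Input) {
  return {Clang, "--cuda-gpu-arch=sm_30", "--cuda-device-only", "-S", Input};
}
```

Keeping the arguments as a vector (rather than a single shell string) avoids quoting issues when the caller spawns the compiler directly.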

Thank you for the long and detailed explanation. It's been of great help!

I've gone with the approach of trying the architectures in the most recent non-deprecated order – sm_52, sm_30.
A problem with bumping the default architecture would have been that there are already released Clang versions that support CUDA 10 but still use sm_20 by default. CMake probably wants to support the widest range possible.
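The try-in-order approach can be sketched as a first-success search over candidate architectures. `FirstWorkingArch` and the `TryCompile` callback are hypothetical; in CMake the callback would correspond to running the compiler-identification compile with `--cuda-gpu-arch=<arch>`, and the candidate order follows the comment (sm_52, then sm_30).

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Try candidate architectures in order and return the first one for which
// the test compile succeeds. `TryCompile` is a hypothetical callback that
// would invoke the compiler with --cuda-gpu-arch=<arch> on the probe source.
std::optional<std::string>
FirstWorkingArch(const std::vector<std::string> &Candidates,
                 const std::function<bool(const std::string &)> &TryCompile) {
  for (const std::string &Arch : Candidates)
    if (TryCompile(Arch))
      return Arch;
  return std::nullopt; // no candidate worked: treat the compiler as broken
}
```

The search stops at the first success, so in the common case the compiler is invoked only once; the fallback candidate is tried only when the newer one fails.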

Can you elaborate on what exactly cmake attempts to establish with the test?
If it looks for a working end-to-end CUDA compilation, then it will need to rely on user input to make sure that correct CUDA location is used.
If it wants to check if clang is capable of CUDA compilation, then it should be told *not* to look for CUDA (though you will need to provide a bit of glue similar to what we use for tests https://github.com/llvm/llvm-project/blob/master/clang/test/Driver/cuda-simple.cu). Would something like that be sufficient?

The aim is to check for working end-to-end CUDA compilation.

You're right that CMake ought to rely on the user to provide many of the variables.
I'll be adding a CUDA_ROOT option to CMake that will be passed to clang as --cuda-path.
CMake also currently lacks options to pass an architecture to the CUDA compiler, though this feature has been requested multiple times. So far, users have had to do this themselves by passing raw compiler flags. I'm also working on support for this. The first architecture found to work during compiler identification will be used as the default.

After some work on my CMake changes, Clang detection as a CUDA compiler works and I can compile CUDA code.
However code using separable compilation doesn't compile. What is the Clang equivalent of NVCC's -dc (--device-c) option for this case?

The CMake code review for CUDA Clang support is here.

In D75811#1919368, @tambre wrote:

After some work on my CMake changes, Clang detection as a CUDA compiler works and I can compile CUDA code.

\o/ Nice! Having cmake supporting clang as a cuda compiler out of the box would be really nice.

However code using separable compilation doesn't compile. What is the Clang equivalent of NVCC's -dc (--device-c) option for this case?

Ah, -rdc compilation is somewhat tricky. NVCC does quite a bit of extra stuff under the hood that would be rather hard to implement in clang's driver, so it falls on the build system.
Clang will generate relocatable GPU code if you pass -fcuda-rdc, but that's only part of the story. Someone somewhere will need to perform the final linking step. There's also additional initialization glue to be handled.
Here's how it's implemented in bazel in Tensorflow: https://github.com/tensorflow/tensorflow/blob/ed371aa5d266222c799a7192e438cdd8c00464fe/third_party/nccl/build_defs.bzl.tpl
The file has fairly detailed description of what needs to be done.

The CMake code review for CUDA Clang support is here.

I'll take a look.

I'll be adding a CUDA_ROOT option to CMake that will be passed to clang as --cuda-path.

I'm not familiar with CMake and whether that option is picked up from an environment variable, but on Windows the environment variable that the CUDA installer sets is CUDA_PATH:
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#build-customizations-for-existing-projects

On Linux you are expected to add the <cuda root>/bin directory to the PATH environment variable.

I've gone with the approach of trying the architectures in the most recent non-deprecated order – sm_52, sm_30.

I'm curious why you added sm_52 (I'm currently writing bazel rules for better CUDA support, and I'm using just sm_30 because that's been nvcc's default for a while now).
Do you consider sm_52 GPUs to be particularly common or does sm_52 introduce a commonly used feature?
(fp16 requires sm_53, but I don't think that needs to be included in the out of the box experience)

Your help here and over on CMake's side has been very helpful. Thank you!
I'll @ you on CMake's side if I need any help while working on CUDA support. Hopefully you won't mind. :)

I'm progressing on this and hope to have initial support in a mergeable state within two weeks.
I've also now got CUDA cross-compilation working for ARM64.

tambre abandoned this revision. Mar 14 2020, 11:26 PM

In D75811#1923278, @csigg wrote:

I'll be adding a CUDA_ROOT option to CMake that will be passed to clang as --cuda-path.

I'm not familiar with CMake and whether that option is picked up from an environment variable, but on Windows the environment variable that the CUDA installer sets is CUDA_PATH:
https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#build-customizations-for-existing-projects

CMake's FindCUDAToolkit module indeed already uses CUDA_PATH on Windows.

On Linux you are expected to add the <cuda root>/bin directory to the PATH environment variable.

The CMake way is to usually provide an environment variable alongside a CMake variable (e.g. CUDACXX and CMAKE_CUDA_COMPILER). The environment variable will be respected above, then the CMake variable if set (e.g. in a toolchain file) and finally CMake tries common paths, executable names, etc to find what it needs.

In D75811#1923280, @csigg wrote:

I've gone with the approach of trying the architectures in the most recent non-deprecated order – sm_52, sm_30.

I'm curious why you added sm_52 (I'm currently writing bazel rules for better CUDA support, and I'm using just sm_30 because that's been nvcc's default for a while now).
Do you consider sm_52 GPUs to be particularly common or does sm_52 introduce a commonly used feature?
(fp16 requires sm_53, but I don't think that needs to be included in the out of the box experience)

I added sm_52 as the first one to try because support for sm_35, sm_37 and sm_50 is deprecated in CUDA 10.2.
CUDA 11 will probably remove them, so this ensures we're compatible with it ahead of time.

In D75811#1923281, @tambre wrote:

Your help here and over on CMake's side has been very helpful. Thank you!
I'll @ you on CMake's side if I need any help while working on CUDA support. Hopefully you won't mind. :)

No problem. I'll be happy to help.

Revision Contents

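- clang/include/clang/Basic/Cuda.h (3 lines)
- clang/include/clang/Basic/DiagnosticDriverKinds.td (1 line)
- clang/include/clang/Driver/Options.td (3 lines)
- clang/lib/Basic/Cuda.cpp (8 lines)
- clang/lib/Driver/Driver.cpp (26 lines)
- clang/lib/Driver/ToolChains/Cuda.h (1 line)
- clang/lib/Driver/ToolChains/Cuda.cpp (4 lines)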

Diff 248949

clang/include/clang/Basic/Cuda.h

	CudaVirtualArch VirtualArchForCudaArch(CudaArch A);			CudaVirtualArch VirtualArchForCudaArch(CudaArch A);

	/// Get the earliest CudaVersion that supports the given CudaArch.			/// Get the earliest CudaVersion that supports the given CudaArch.
	CudaVersion MinVersionForCudaArch(CudaArch A);			CudaVersion MinVersionForCudaArch(CudaArch A);

	/// Get the latest CudaVersion that supports the given CudaArch.			/// Get the latest CudaVersion that supports the given CudaArch.
	CudaVersion MaxVersionForCudaArch(CudaArch A);			CudaVersion MaxVersionForCudaArch(CudaArch A);

				/// Get the minimum CudaArch supported by the given CudaVersion.
				CudaArch MinArchForCudaVersion(CudaVersion V);

	// Various SDK-dependent features that affect CUDA compilation			// Various SDK-dependent features that affect CUDA compilation
	enum class CudaFeature {			enum class CudaFeature {
	// CUDA-9.2+ uses a new API for launching kernels.			// CUDA-9.2+ uses a new API for launching kernels.
	CUDA_USES_NEW_LAUNCH,			CUDA_USES_NEW_LAUNCH,
	// CUDA-10.1+ needs explicit end of GPU binary registration.			// CUDA-10.1+ needs explicit end of GPU binary registration.
	CUDA_USES_FATBIN_REGISTER_END,			CUDA_USES_FATBIN_REGISTER_END,
	};			};

	bool CudaFeatureEnabled(llvm::VersionTuple, CudaFeature);			bool CudaFeatureEnabled(llvm::VersionTuple, CudaFeature);
	bool CudaFeatureEnabled(CudaVersion, CudaFeature);			bool CudaFeatureEnabled(CudaVersion, CudaFeature);

	} // namespace clang			} // namespace clang

	#endif			#endif

clang/include/clang/Basic/DiagnosticDriverKinds.td

def err_drv_cuda_version_unsupported : Error<
"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "		"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "
"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "		"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
"install, pass a different GPU arch with --cuda-gpu-arch, or pass "		"install, pass a different GPU arch with --cuda-gpu-arch, or pass "
"--no-cuda-version-check.">;		"--no-cuda-version-check.">;
def warn_drv_unknown_cuda_version: Warning<		def warn_drv_unknown_cuda_version: Warning<
"Unknown CUDA version %0. Assuming the latest supported version %1">,		"Unknown CUDA version %0. Assuming the latest supported version %1">,
InGroup<CudaUnknownVersion>;		InGroup<CudaUnknownVersion>;
def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;		def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
		def err_drv_cuda_unknown_version : Error<"unknown CUDA version '%0'">;
def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;		def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
def err_drv_invalid_thread_model_for_target : Error<		def err_drv_invalid_thread_model_for_target : Error<
"invalid thread model '%0' in '%1' for this target">;		"invalid thread model '%0' in '%1' for this target">;
def err_drv_invalid_linker_name : Error<		def err_drv_invalid_linker_name : Error<
"invalid linker name in argument '%0'">;		"invalid linker name in argument '%0'">;
def err_drv_invalid_pgo_instrumentor : Error<		def err_drv_invalid_pgo_instrumentor : Error<
"invalid PGO instrumentor in argument '%0'">;		"invalid PGO instrumentor in argument '%0'">;
def err_drv_invalid_rtlib_name : Error<		def err_drv_invalid_rtlib_name : Error<

clang/include/clang/Driver/Options.td

	def cuda_device_only : Flag<["--"], "cuda-device-only">,			def cuda_device_only : Flag<["--"], "cuda-device-only">,
	HelpText<"Compile CUDA code for device only">;			HelpText<"Compile CUDA code for device only">;
	def cuda_host_only : Flag<["--"], "cuda-host-only">,			def cuda_host_only : Flag<["--"], "cuda-host-only">,
	HelpText<"Compile CUDA code for host only. Has no effect on non-CUDA "			HelpText<"Compile CUDA code for host only. Has no effect on non-CUDA "
	"compilations.">;			"compilations.">;
	def cuda_compile_host_device : Flag<["--"], "cuda-compile-host-device">,			def cuda_compile_host_device : Flag<["--"], "cuda-compile-host-device">,
	HelpText<"Compile CUDA code for both host and device (default). Has no "			HelpText<"Compile CUDA code for both host and device (default). Has no "
	"effect on non-CUDA compilations.">;			"effect on non-CUDA compilations.">;
				def cuda_version_EQ : Joined<["--"], "cuda-version=">, Flags<[DriverOption]>,
				MetaVarName<"<version>">, Values<"<major>.<minor>">,
				HelpText<"Used to choose default architecture if no other options are given.">;
	def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[DriverOption]>,			def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[DriverOption]>,
	HelpText<"Include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;			HelpText<"Include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
	def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[DriverOption]>,			def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[DriverOption]>,
	HelpText<"Do not include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;			HelpText<"Do not include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
	def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, Flags<[DriverOption]>,			def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, Flags<[DriverOption]>,
	HelpText<"CUDA GPU architecture (e.g. sm_35). May be specified more than once.">;			HelpText<"CUDA GPU architecture (e.g. sm_35). May be specified more than once.">;
	def hip_link : Flag<["--"], "hip-link">,			def hip_link : Flag<["--"], "hip-link">,
	HelpText<"Link clang-offload-bundler bundles for HIP">;			HelpText<"Link clang-offload-bundler bundles for HIP">;

clang/lib/Basic/Cuda.cpp

CudaVersion MaxVersionForCudaArch(CudaArch A) {
case CudaArch::GFX1011:		case CudaArch::GFX1011:
case CudaArch::GFX1012:		case CudaArch::GFX1012:
return CudaVersion::CUDA_80;		return CudaVersion::CUDA_80;
default:		default:
return CudaVersion::LATEST;		return CudaVersion::LATEST;
}		}
}		}

		CudaArch MinArchForCudaVersion(CudaVersion V) {
		if (V >= CudaVersion::CUDA_90) {
		return CudaArch::SM_30;
		} else {
		return CudaArch::SM_20;
		}
		}

static CudaVersion ToCudaVersion(llvm::VersionTuple Version) {		static CudaVersion ToCudaVersion(llvm::VersionTuple Version) {
int IVer =		int IVer =
Version.getMajor() * 10 + Version.getMinor().getValueOr(0);		Version.getMajor() * 10 + Version.getMinor().getValueOr(0);
switch(IVer) {		switch(IVer) {
case 70:		case 70:
return CudaVersion::CUDA_70;		return CudaVersion::CUDA_70;
case 75:		case 75:
return CudaVersion::CUDA_75;		return CudaVersion::CUDA_75;

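In this revision the driver passes the detected CUDA version around as a string through the new `--cuda-version=` option and reads it back in `CudaActionBuilder` with `CudaStringToVersion`. The sketch below shows that string-to-enum step in isolation. It is not clang's `CudaStringToVersion`; it is a hypothetical standalone function with a trimmed-down enum, modeled on the `major * 10 + minor` packing used by `ToCudaVersion` above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down stand-in for clang::CudaVersion.
enum class CudaVersion {
  UNKNOWN, CUDA_70, CUDA_75, CUDA_80, CUDA_90, CUDA_91, CUDA_92, CUDA_100, CUDA_101
};

// Parse a short, non-empty run of ASCII digits; return -1 on anything else.
static int ParseDigits(const std::string &S) {
  if (S.empty() || S.size() > 3)
    return -1;
  int Value = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return -1;
    Value = Value * 10 + (C - '0');
  }
  return Value;
}

// Map a "<major>.<minor>" string (the form --cuda-version= takes in this
// diff) to the enum. Like ToCudaVersion, it packs the version as
// major * 10 + minor, so "10.1" becomes 101. Two-digit minors are rejected
// because the packed value would no longer be unambiguous.
CudaVersion ParseCudaVersion(const std::string &S) {
  const auto Dot = S.find('.');
  if (Dot == std::string::npos)
    return CudaVersion::UNKNOWN;
  const int Major = ParseDigits(S.substr(0, Dot));
  const int Minor = ParseDigits(S.substr(Dot + 1));
  if (Major < 0 || Minor < 0 || Minor > 9)
    return CudaVersion::UNKNOWN;
  switch (Major * 10 + Minor) {
  case 70:  return CudaVersion::CUDA_70;
  case 75:  return CudaVersion::CUDA_75;
  case 80:  return CudaVersion::CUDA_80;
  case 90:  return CudaVersion::CUDA_90;
  case 91:  return CudaVersion::CUDA_91;
  case 92:  return CudaVersion::CUDA_92;
  case 100: return CudaVersion::CUDA_100;
  case 101: return CudaVersion::CUDA_101;
  default:  return CudaVersion::UNKNOWN;
  }
}
```

An unrecognized string yields `UNKNOWN`, which is the case the diff turns into `err_drv_cuda_unknown_version`.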
clang/lib/Driver/Driver.cpp

if (IsCuda) {
const llvm::Triple &HostTriple = HostTC->getTriple();		const llvm::Triple &HostTriple = HostTC->getTriple();
StringRef DeviceTripleStr;		StringRef DeviceTripleStr;
auto OFK = Action::OFK_Cuda;		auto OFK = Action::OFK_Cuda;
DeviceTripleStr =		DeviceTripleStr =
HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda" : "nvptx-nvidia-cuda";		HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda" : "nvptx-nvidia-cuda";
llvm::Triple CudaTriple(DeviceTripleStr);		llvm::Triple CudaTriple(DeviceTripleStr);
// Use the CUDA and host triples as the key into the ToolChains map,		// Use the CUDA and host triples as the key into the ToolChains map,
// because the device toolchain we create depends on both.		// because the device toolchain we create depends on both.
auto &CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];		auto &Toolchain = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];
if (!CudaTC) {		if (!Toolchain) {
CudaTC = std::make_unique<toolchains::CudaToolChain>(		std::unique_ptr<toolchains::CudaToolChain> CudaTC =
		std::make_unique<toolchains::CudaToolChain>(
this, CudaTriple, HostTC, C.getInputArgs(), OFK);		this, CudaTriple, HostTC, C.getInputArgs(), OFK);
		C.getArgs().AddJoinedArg(
		nullptr, getOpts().getOption(options::OPT_cuda_version_EQ),
		CudaVersionToString(CudaTC->getCudaVersion()));
		Toolchain = std::move(CudaTC);
}		}
C.addOffloadDeviceToolChain(CudaTC.get(), OFK);		C.addOffloadDeviceToolChain(Toolchain.get(), OFK);
} else if (IsHIP) {		} else if (IsHIP) {
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();		const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
const llvm::Triple &HostTriple = HostTC->getTriple();		const llvm::Triple &HostTriple = HostTC->getTriple();
StringRef DeviceTripleStr;		StringRef DeviceTripleStr;
auto OFK = Action::OFK_HIP;		auto OFK = Action::OFK_HIP;
DeviceTripleStr = "amdgcn-amd-amdhsa";		DeviceTripleStr = "amdgcn-amd-amdhsa";
llvm::Triple HIPTriple(DeviceTripleStr);		llvm::Triple HIPTriple(DeviceTripleStr);
// Use the HIP and host triples as the key into the ToolChains map,		// Use the HIP and host triples as the key into the ToolChains map,
class OffloadingActionBuilder final {

/// \brief CUDA action builder. It injects device code in the host backend		/// \brief CUDA action builder. It injects device code in the host backend
/// action.		/// action.
class CudaActionBuilder final : public CudaActionBuilderBase {		class CudaActionBuilder final : public CudaActionBuilderBase {
public:		public:
CudaActionBuilder(Compilation &C, DerivedArgList &Args,		CudaActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs)
: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) {		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) {
DefaultCudaArch = CudaArch::SM_20;		StringRef VersionString =
		Args.getLastArgValue(options::OPT_cuda_version_EQ);
		CudaVersion Version = CudaStringToVersion(VersionString);

		if (Version == CudaVersion::UNKNOWN) {
		C.getDriver().Diag(clang::diag::err_drv_cuda_unknown_version)
		<< VersionString;
		}

		DefaultCudaArch = MinArchForCudaVersion(Version);
}		}

ActionBuilderReturnCode		ActionBuilderReturnCode
getDeviceDependences(OffloadAction::DeviceDependences &DA,		getDeviceDependences(OffloadAction::DeviceDependences &DA,
phases::ID CurPhase, phases::ID FinalPhase,		phases::ID CurPhase, phases::ID FinalPhase,
PhasesTy &Phases) override {		PhasesTy &Phases) override {
if (!IsActive)		if (!IsActive)
return ABRT_Inactive;		return ABRT_Inactive;

clang/lib/Driver/ToolChains/Cuda.h

public:

SanitizerMask getSupportedSanitizers() const override;		SanitizerMask getSupportedSanitizers() const override;

VersionTuple		VersionTuple
computeMSVCVersion(const Driver *D,		computeMSVCVersion(const Driver *D,
const llvm::opt::ArgList &Args) const override;		const llvm::opt::ArgList &Args) const override;

unsigned GetDefaultDwarfVersion() const override { return 2; }		unsigned GetDefaultDwarfVersion() const override { return 2; }
		CudaVersion getCudaVersion() const;

const ToolChain &HostTC;		const ToolChain &HostTC;
CudaInstallationDetector CudaInstallation;		CudaInstallationDetector CudaInstallation;

protected:		protected:
Tool *buildAssembler() const override; // ptxas		Tool *buildAssembler() const override; // ptxas
Tool *buildLinker() const override; // fatbinary (ok, not really a linker)		Tool *buildLinker() const override; // fatbinary (ok, not really a linker)


clang/lib/Driver/ToolChains/Cuda.cpp

SanitizerMask CudaToolChain::getSupportedSanitizers() const {
// tolerate flags meant only for the host toolchain.		// tolerate flags meant only for the host toolchain.
return HostTC.getSupportedSanitizers();		return HostTC.getSupportedSanitizers();
}		}

VersionTuple CudaToolChain::computeMSVCVersion(const Driver *D,		VersionTuple CudaToolChain::computeMSVCVersion(const Driver *D,
const ArgList &Args) const {		const ArgList &Args) const {
return HostTC.computeMSVCVersion(D, Args);		return HostTC.computeMSVCVersion(D, Args);
}		}

		CudaVersion CudaToolChain::getCudaVersion() const {
		return CudaInstallation.version();
		}