This is an archive of the discontinued LLVM Phabricator instance.

Differential D77670

[CUDA] Add partial support for recent CUDA versions.
ClosedPublic

Authored by tra on Apr 7 2020, 11:58 AM.

Details

Reviewers

yaxunl

Commits

rGa9627b7ea7e2: [CUDA] Add partial support for recent CUDA versions.

Summary

Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tra created this revision. · Apr 7 2020, 11:58 AM

Herald added a project: Restricted Project. · Apr 7 2020, 11:58 AM

Herald added subscribers: sanjoy.google, bixia, hiraditya, jholewinski.

tra added a parent revision: D77665: [CUDA] Simplify GPU variant handling. NFC. · Apr 7 2020, 12:04 PM

Harbormaster completed remote builds in B52198: Diff 255759. · Apr 7 2020, 12:32 PM

LGTM. Thanks!

This revision is now accepted and ready to land. · Apr 7 2020, 2:37 PM

Closed by commit rGa9627b7ea7e2: [CUDA] Add partial support for recent CUDA versions. (authored by tra). · Apr 8 2020, 11:24 AM

This revision was automatically updated to reflect the committed changes.

@tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:

clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]
clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'

Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)
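
A minimal, self-contained sketch of how three different numbers can end up in one set of diagnostics. It is not clang's code: the enum is trimmed, `LATEST = CUDA_112` is taken from the description in the comment above, and `Detection`, `detect`, `toString` and `fromString` are hypothetical names used only for illustration.

```cpp
#include <string>

// Trimmed stand-in for clang's CudaVersion enum (assumption: only the values
// needed here, with LATEST at 11.2 as described in the comment above).
enum class CudaVersion {
  UNKNOWN,
  CUDA_101,
  CUDA_102,
  CUDA_110,
  CUDA_111,
  CUDA_112,
  LATEST = CUDA_112,
  LATEST_SUPPORTED = CUDA_101,
};

const char *toString(CudaVersion V) {
  switch (V) {
  case CudaVersion::CUDA_101: return "10.1";
  case CudaVersion::CUDA_102: return "10.2";
  case CudaVersion::CUDA_110: return "11.0";
  case CudaVersion::CUDA_111: return "11.1";
  case CudaVersion::CUDA_112: return "11.2";
  default: return "unknown";
  }
}

CudaVersion fromString(const std::string &S) {
  if (S == "10.1") return CudaVersion::CUDA_101;
  if (S == "10.2") return CudaVersion::CUDA_102;
  if (S == "11.0") return CudaVersion::CUDA_110;
  if (S == "11.1") return CudaVersion::CUDA_111;
  if (S == "11.2") return CudaVersion::CUDA_112;
  return CudaVersion::UNKNOWN;
}

// Hypothetical result type: the three versions a user may see.
struct Detection {
  std::string Installed; // what is on disk, e.g. "11.4"
  CudaVersion Used;      // what later version checks compare against
  std::string WarnedAs;  // version named in the warning, empty if no warning
};

Detection detect(const std::string &Installed) {
  Detection D{Installed, fromString(Installed), ""};
  if (D.Used == CudaVersion::UNKNOWN) {
    // Not in the enum: later checks fall back to LATEST ...
    D.Used = CudaVersion::LATEST;
    // ... while the warning text names LATEST_SUPPORTED.
    D.WarnedAs = toString(CudaVersion::LATEST_SUPPORTED);
  } else if (D.Used > CudaVersion::LATEST_SUPPORTED) {
    D.WarnedAs = toString(CudaVersion::LATEST_SUPPORTED);
  }
  return D;
}
```

With this sketch, `detect("11.4")` carries 11.4 as the installed version, 11.2 as the version used for later checks, and 10.1 as the version named in the warning, which matches the three numbers in the messages quoted above.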

Herald added a project: Restricted Project. · Aug 13 2021, 7:26 AM

Herald added a subscriber: dexonsmith.

In D77670#2943753, @Hahnfeld wrote:

> @tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:

Agreed, it's far from ideal. There's also more than one issue involved.

> clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]

The good news is that we've grown support for enough clang builtins and PTX instructions to bump the "latest supported" to ~CUDA-11.3 or, maybe, even 11.4. At least, clang should be able to compile all CUDA headers in those versions.
This should reduce the noise.

> clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice

It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35 as that's probably the oldest GPU platform that's still widely available (e.g. there are tons of them on Google cloud and AWS) and is still supported by all CUDA versions clang accepts.

> clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'

Perhaps it's time to start considering decommissioning sm_20 support in clang and NVPTX. nvcc did that long ago and is already on the way to dropping sm_3x, too. sm_30 is no longer supported and sm_35 has been deprecated and is expected to be gone in the next CUDA release.

> Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)

I'm open to suggestions. This was the least bad compromise I managed to come up with.

We could report the actually detected version, instead of the 'latest' version clang knows about. Or not report it at all as it's not particularly helpful for the end user. That would mitigate one source of confusion.

As for the latest supported, I think we may still want to have it in some form. Clang has to deal with version-specific CUDA quirks, so a CUDA version outside of the range that clang is known to work with puts the user in uncharted waters. E.g. until recently clang worked well enough with CUDA-11.3, but only if you were compiling for the older GPUs. Attempts to compile some headers for sm_80 would fail and that *was* confusing to users who ran into that when the warning was disabled.

In D77670#2944192, @tra wrote:

> In D77670#2943753, @Hahnfeld wrote:
>
>> @tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:
>
> Agreed, it's far from ideal. There's also more than one issue involved.

Unfortunately, yes...

>> clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]
>
> The good news is that we've grown support for enough clang builtins and PTX instructions to bump the "latest supported" to ~CUDA-11.3 or, maybe, even 11.4. At least, clang should be able to compile all CUDA headers in those versions.
> This should reduce the noise.

Great!

>> clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
>
> It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35 as that's probably the oldest GPU platform that's still widely available (e.g. there are tons of them on Google cloud and AWS) and is still supported by all CUDA versions clang accepts.

+1 for at least sm_35 - that would match recent nvccs, right?

>> clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'
>
> Perhaps it's time to start considering decommissioning sm_20 support in clang and NVPTX. nvcc did that long ago and is already on the way to dropping sm_3x, too. sm_30 is no longer supported and sm_35 has been deprecated and is expected to be gone in the next CUDA release.

+1 - given that Clang 13.x just branched, now may be an ideal moment to make this cut.

>> Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)
>
> I'm open to suggestions. This was the least bad compromise I managed to come up with.
>
> We could report the actually detected version, instead of the 'latest' version clang knows about. Or not report it at all as it's not particularly helpful for the end user. That would mitigate one source of confusion.
>
> As for the latest supported, I think we may still want to have it in some form. Clang has to deal with version-specific CUDA quirks, so a CUDA version outside of the range that clang is known to work with puts the user in uncharted waters. E.g. until recently clang worked well enough with CUDA-11.3, but only if you were compiling for the older GPUs. Attempts to compile some headers for sm_80 would fail and that *was* confusing to users who ran into that when the warning was disabled.

Yeah, the problem was that I didn't have better suggestions either when I wrote the first comment. But maybe now: How about having a "past-the-latest" value in the enum that Clang remembers if it detects a version more recent than it knows about? Then we could have two warnings:

If we have a "past-the-latest" version, tell the user that Clang has no clue about this version and we assume the LATEST version; things might work, but no guarantees.
If we have a version that is greater than the latest supported version, emit the current warning and say that support is "best-effort" (or something along that line). In that case, both the detected version and the "assumed" supported version should make sense to the user.
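
The two-warning proposal above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the enum is trimmed to the values added in this revision, `NEW` stands in for the proposed "past-the-latest" value (it does not exist in this diff), and `VersionDiag`, `classify` and `assumedVersion` are hypothetical names.

```cpp
// Trimmed stand-in for clang's CudaVersion enum as of this revision.
enum class CudaVersion {
  UNKNOWN,
  CUDA_101,
  CUDA_102,
  CUDA_110,
  NEW, // hypothetical "past-the-latest": newer than anything listed above
  LATEST = CUDA_110,
  LATEST_SUPPORTED = CUDA_101,
};

// Hypothetical classification of which warning, if any, to emit.
enum class VersionDiag {
  None,               // fully supported, no warning
  PartiallySupported, // known to clang, but support is best-effort
  NewerThanKnown,     // clang has never heard of this version
};

VersionDiag classify(CudaVersion V) {
  // Check NEW first: it also compares greater than LATEST_SUPPORTED.
  if (V == CudaVersion::NEW)
    return VersionDiag::NewerThanKnown;
  if (V > CudaVersion::LATEST_SUPPORTED)
    return VersionDiag::PartiallySupported;
  return VersionDiag::None;
}

// Version the rest of the driver would act on: an unrecognized newer
// installation is treated as LATEST, everything else is used as detected.
CudaVersion assumedVersion(CudaVersion V) {
  return V == CudaVersion::NEW ? CudaVersion::LATEST : V;
}
```

Keeping the "which warning" decision separate from the "which version do we assume" decision means each message can name a version that matches what the user actually has installed.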

In D77670#2944442, @Hahnfeld wrote:

>> It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35 as that's probably the oldest GPU platform that's still widely available (e.g. there are tons of them on Google cloud and AWS) and is still supported by all CUDA versions clang accepts.
>
> +1 for at least sm_35 - that would match recent nvccs, right?

NVCC in 11.4.1 defaults to sm_52 as the oldest non-deprecated GPU. I don't think it's time for clang to go that far as, unlike NVCC, we have to deal with older CUDA versions, too. For us the lowest common denominator for supported CUDA versions and GPU hardware availability is sm_35.

I could also argue the other way around -- it may make sense to set the default GPU to the most recent one supported by all CUDA versions. That will allow clang to compile a larger subset of existing CUDA code (new GPUs support more builtins/features the code may rely on).

We could set the target GPU to the most recent GPU variant supported by the CUDA version we've found. This, however, will mean that the target will change from one system to another, depending on which CUDA version happens to be installed. I think that would be pushing it too far.

In any case, there's no universally good choice as we don't know which GPU the user needs the code for. If our choice is wrong, the app will not run. In practice users do need to specify both the CUDA SDK path and the list of GPUs they want to compile for. The defaults, especially the default GPU target, are likely to be wrong more often than not.
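
To make the objection to a version-dependent default concrete, here is a small sketch of "pick the newest GPU the detected CUDA version supports". The minimum versions are copied from the `MinVersionForCudaArch` hunk in this revision's diff; the table is trimmed to those five architectures, upper bounds (`MaxVersionForCudaArch`) are ignored, and `ArchMinVersion`, `ArchTable` and `newestArchFor` are hypothetical names.

```cpp
#include <string>

// Trimmed stand-in for clang's CudaVersion enum.
enum class CudaVersion {
  CUDA_80,
  CUDA_90,
  CUDA_91,
  CUDA_92,
  CUDA_100,
  CUDA_101,
  CUDA_102,
  CUDA_110,
};

struct ArchMinVersion {
  const char *Arch;
  CudaVersion MinVersion; // first CUDA release that supports Arch
};

// Ordered oldest to newest.
const ArchMinVersion ArchTable[] = {
    {"sm_62", CudaVersion::CUDA_80},
    {"sm_70", CudaVersion::CUDA_90},
    {"sm_72", CudaVersion::CUDA_91},
    {"sm_75", CudaVersion::CUDA_100},
    {"sm_80", CudaVersion::CUDA_110},
};

// Returns the newest architecture whose minimum CUDA version is not newer
// than the installed one, or an empty string if none qualifies.
std::string newestArchFor(CudaVersion Installed) {
  std::string Best;
  for (const ArchMinVersion &E : ArchTable)
    if (E.MinVersion <= Installed)
      Best = E.Arch;
  return Best;
}
```

The same source file would then default to sm_70 on a machine with CUDA 9.0, sm_75 with CUDA 10.1 and sm_80 with CUDA 11.0, which is the system-to-system drift described above.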

> Yeah, the problem was that I didn't have better suggestions either when I wrote the first comment. But maybe now: How about having a "past-the-latest" value in the enum that Clang remembers if it detects a version more recent than it knows about? Then we could have two warnings:
>
> If we have a "past-the-latest" version, tell the user that Clang has no clue about this version and we assume the LATEST version; things might work, but no guarantees.
>
> If we have a version that is greater than the latest supported version, emit the current warning and say that support is "best-effort" (or something along that line). In that case, both the detected version and the "assumed" supported version should make sense to the user.

SGTM. I'll send the patch next week and we can discuss the details there.

tra mentioned this in D108235: [CUDA] Bump default GPU architecture to sm_35. · Aug 17 2021, 12:57 PM

See the patch stack at https://reviews.llvm.org/D108248

Revision Contents

Path                                          Size
clang/include/clang/Basic/Cuda.h              6 lines
clang/lib/Basic/Cuda.cpp                      9 lines
clang/lib/Basic/Targets/NVPTX.cpp             4 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp    2 lines
clang/lib/Driver/ToolChains/Cuda.cpp          53 lines
llvm/lib/Target/NVPTX/NVPTX.td                7 lines

Diff 255759

clang/include/clang/Basic/Cuda.h

[21 lines collapsed] enum class CudaVersion {
   CUDA_70,
   CUDA_75,
   CUDA_80,
   CUDA_90,
   CUDA_91,
   CUDA_92,
   CUDA_100,
   CUDA_101,
-  LATEST = CUDA_101,
+  CUDA_102,
+  CUDA_110,
+  LATEST = CUDA_110,
+  LATEST_SUPPORTED = CUDA_101,
 };
 const char *CudaVersionToString(CudaVersion V);
 // Input is "Major.Minor"
 CudaVersion CudaStringToVersion(const llvm::Twine &S);

 enum class CudaArch {
   UNKNOWN,
   SM_20,
   SM_21,
   SM_30,
   SM_32,
   SM_35,
   SM_37,
   SM_50,
   SM_52,
   SM_53,
   SM_60,
   SM_61,
   SM_62,
   SM_70,
   SM_72,
   SM_75,
+  SM_80,
   GFX600,
   GFX601,
   GFX700,
   GFX701,
   GFX702,
   GFX703,
   GFX704,
   GFX801,
[50 lines collapsed]

clang/lib/Basic/Cuda.cpp

[22 lines collapsed] const char *CudaVersionToString(CudaVersion V) {
   case CudaVersion::CUDA_91:
     return "9.1";
   case CudaVersion::CUDA_92:
     return "9.2";
   case CudaVersion::CUDA_100:
     return "10.0";
   case CudaVersion::CUDA_101:
     return "10.1";
+  case CudaVersion::CUDA_102:
+    return "10.2";
+  case CudaVersion::CUDA_110:
+    return "11.0";
   }
   llvm_unreachable("invalid enum");
 }

 CudaVersion CudaStringToVersion(const llvm::Twine &S) {
   return llvm::StringSwitch<CudaVersion>(S.str())
       .Case("7.0", CudaVersion::CUDA_70)
       .Case("7.5", CudaVersion::CUDA_75)
       .Case("8.0", CudaVersion::CUDA_80)
       .Case("9.0", CudaVersion::CUDA_90)
       .Case("9.1", CudaVersion::CUDA_91)
       .Case("9.2", CudaVersion::CUDA_92)
       .Case("10.0", CudaVersion::CUDA_100)
       .Case("10.1", CudaVersion::CUDA_101)
+      .Case("10.2", CudaVersion::CUDA_102)
+      .Case("11.0", CudaVersion::CUDA_110)
       .Default(CudaVersion::UNKNOWN);
 }

 struct CudaArchToStringMap {
   CudaArch arch;
   const char *arch_name;
   const char *virtual_arch_name;
 };

 #define SM2(sm, ca) \
   { CudaArch::SM_##sm, "sm_" #sm, ca }
 #define SM(sm) SM2(sm, "compute_" #sm)
 #define GFX(gpu) \
   { CudaArch::GFX##gpu, "gfx" #gpu, "compute_amdgcn" }
 CudaArchToStringMap arch_names[] = {
     // clang-format off
     SM2(20, "compute_20"), SM2(21, "compute_20"), // Fermi
     SM(30), SM(32), SM(35), SM(37), // Kepler
     SM(50), SM(52), SM(53), // Maxwell
     SM(60), SM(61), SM(62), // Pascal
     SM(70), SM(72), // Volta
     SM(75), // Turing
+    SM(80), // Ampere
     GFX(600), // tahiti
     GFX(601), // pitcairn, verde, oland,hainan
     GFX(700), // kaveri
     GFX(701), // hawaii
     GFX(702), // 290,290x,R390,R390x
     GFX(703), // kabini mullins
     GFX(704), // bonaire
     GFX(801), // carrizo
[60 lines collapsed] CudaVersion MinVersionForCudaArch(CudaArch A) {
   case CudaArch::SM_62:
     return CudaVersion::CUDA_80;
   case CudaArch::SM_70:
     return CudaVersion::CUDA_90;
   case CudaArch::SM_72:
     return CudaVersion::CUDA_91;
   case CudaArch::SM_75:
     return CudaVersion::CUDA_100;
+  case CudaArch::SM_80:
+    return CudaVersion::CUDA_110;
   default:
     llvm_unreachable("invalid enum");
   }
 }

 CudaVersion MaxVersionForCudaArch(CudaArch A) {
   // AMD GPUs do not depend on CUDA versions.
   if (IsAMDGpuArch(A))
[52 lines collapsed]

clang/lib/Basic/Targets/NVPTX.cpp

[38 lines collapsed] NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
   assert((TargetPointerWidth == 32 || TargetPointerWidth == 64) &&
          "NVPTX only supports 32- and 64-bit modes.");

   PTXVersion = 32;
   for (const StringRef Feature : Opts.FeaturesAsWritten) {
     if (!Feature.startswith("+ptx"))
       continue;
     PTXVersion = llvm::StringSwitch<unsigned>(Feature)
+                     .Case("+ptx70", 70)
+                     .Case("+ptx65", 65)
                      .Case("+ptx64", 64)
                      .Case("+ptx63", 63)
                      .Case("+ptx61", 61)
                      .Case("+ptx60", 60)
                      .Case("+ptx50", 50)
                      .Case("+ptx43", 43)
                      .Case("+ptx42", 42)
                      .Case("+ptx41", 41)
[171 lines collapsed] std::string CUDAArchCode = [this] {
       case CudaArch::SM_62:
         return "620";
       case CudaArch::SM_70:
         return "700";
       case CudaArch::SM_72:
         return "720";
       case CudaArch::SM_75:
         return "750";
+      case CudaArch::SM_80:
+        return "800";
       }
       llvm_unreachable("unhandled CudaArch");
     }();
     Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode);
   }
 }

 ArrayRef<Builtin::Info> NVPTXTargetInfo::getTargetBuiltins() const {
   return llvm::makeArrayRef(BuiltinInfo, clang::NVPTX::LastTSBuiltin -
                                              Builtin::FirstTSBuiltin);
 }

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

[4,986 lines collapsed] if (Clause->getClauseKind() == OMPC_unified_shared_memory) {
       Out << "Target architecture " << CudaArchToString(Arch)
           << " does not support unified addressing";
       CGM.Error(Clause->getBeginLoc(), Out.str());
       return;
     }
   case CudaArch::SM_70:
   case CudaArch::SM_72:
   case CudaArch::SM_75:
+  case CudaArch::SM_80:
   case CudaArch::GFX600:
   case CudaArch::GFX601:
   case CudaArch::GFX700:
   case CudaArch::GFX701:
   case CudaArch::GFX702:
   case CudaArch::GFX703:
   case CudaArch::GFX704:
   case CudaArch::GFX801:
[41 lines collapsed] case CudaArch::SM_53:
     return {16, 16};
   case CudaArch::SM_60:
   case CudaArch::SM_61:
   case CudaArch::SM_62:
     return {56, 32};
   case CudaArch::SM_70:
   case CudaArch::SM_72:
   case CudaArch::SM_75:
+  case CudaArch::SM_80:
     return {84, 32};
   case CudaArch::GFX600:
   case CudaArch::GFX601:
   case CudaArch::GFX700:
   case CudaArch::GFX701:
   case CudaArch::GFX702:
   case CudaArch::GFX703:
   case CudaArch::GFX704:
[164 lines collapsed]

clang/lib/Driver/ToolChains/Cuda.cpp

[39 lines collapsed] if (!V.startswith("CUDA Version "))
     return;
   V = V.substr(strlen("CUDA Version "));
   SmallVector<StringRef,4> VersionParts;
   V.split(VersionParts, '.');
   if (VersionParts.size() < 2)
     return;
   DetectedVersion = join_items(".", VersionParts[0], VersionParts[1]);
   Version = CudaStringToVersion(DetectedVersion);
-  if (Version != CudaVersion::UNKNOWN)
+  if (Version != CudaVersion::UNKNOWN) {
+    // TODO(tra): remove the warning once we have all features of 10.2 and 11.0
+    // implemented.
+    DetectedVersionIsNotSupported = Version > CudaVersion::LATEST_SUPPORTED;
     return;
+  }

-  Version = CudaVersion::LATEST;
+  Version = CudaVersion::LATEST_SUPPORTED;
   DetectedVersionIsNotSupported = true;
 }

 void CudaInstallationDetector::WarnIfUnsupportedVersion() {
   if (DetectedVersionIsNotSupported)
     D.Diag(diag::warn_drv_unknown_cuda_version)
-        << DetectedVersion << CudaVersionToString(Version);
+        << DetectedVersion
+        << CudaVersionToString(CudaVersion::LATEST_SUPPORTED);
 }

 CudaInstallationDetector::CudaInstallationDetector(
     const Driver &D, const llvm::Triple &HostTriple,
     const llvm::opt::ArgList &Args)
     : D(D) {
   struct Candidate {
     std::string Path;
[567 lines collapsed] void CudaToolChain::addClangTargetOptions(

   CC1Args.push_back("-mlink-builtin-bitcode");
   CC1Args.push_back(DriverArgs.MakeArgString(LibDeviceFile));

   // New CUDA versions often introduce new instructions that are only supported
   // by new PTX version, so we need to raise PTX level to enable them in NVPTX
   // back-end.
   const char *PtxFeature = nullptr;
-  switch(CudaInstallation.version()) {
+  switch (CudaInstallation.version()) {
+  case CudaVersion::CUDA_110:
+    PtxFeature = "+ptx70";
+    break;
+  case CudaVersion::CUDA_102:
+    PtxFeature = "+ptx65";
+    break;
   case CudaVersion::CUDA_101:
     PtxFeature = "+ptx64";
     break;
   case CudaVersion::CUDA_100:
     PtxFeature = "+ptx63";
     break;
   case CudaVersion::CUDA_92:
     PtxFeature = "+ptx61";
     break;
   case CudaVersion::CUDA_91:
     PtxFeature = "+ptx61";
     break;
   case CudaVersion::CUDA_90:
     PtxFeature = "+ptx60";
     break;
   default:
     PtxFeature = "+ptx42";
   }
   CC1Args.append({"-target-feature", PtxFeature});
   if (DriverArgs.hasFlag(options::OPT_fcuda_short_ptr,
                          options::OPT_fno_cuda_short_ptr, false))
     CC1Args.append({"-mllvm", "--nvptx-short-ptr"});

   if (CudaInstallation.version() >= CudaVersion::UNKNOWN)
     CC1Args.push_back(DriverArgs.MakeArgString(
[195 lines collapsed]
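
The two driver hunks above (version detection and PTX feature selection) can be exercised outside of clang with a standalone sketch like the one below. It uses `std::string` instead of LLVM's `StringRef`, and `parseMajorMinor`, `stringToVersion`, `versionFromLine` and `ptxFeatureFor` are hypothetical names; only the version strings, the fallback to `LATEST_SUPPORTED` and the PTX feature table are taken from the diff. Returning `UNKNOWN` for a malformed line is an assumption about the value the real code leaves in place on its early return.

```cpp
#include <string>

// Trimmed copy of the CudaVersion enum as it looks after this revision.
enum class CudaVersion {
  UNKNOWN,
  CUDA_70, CUDA_75, CUDA_80, CUDA_90, CUDA_91, CUDA_92,
  CUDA_100, CUDA_101, CUDA_102, CUDA_110,
  LATEST = CUDA_110,
  LATEST_SUPPORTED = CUDA_101,
};

// "CUDA Version 10.2.89" -> "10.2"; returns "" if the line has another shape.
std::string parseMajorMinor(const std::string &Line) {
  const std::string Prefix = "CUDA Version ";
  if (Line.compare(0, Prefix.size(), Prefix) != 0)
    return "";
  const std::string V = Line.substr(Prefix.size());
  const std::string::size_type FirstDot = V.find('.');
  if (FirstDot == std::string::npos)
    return "";
  // Keep everything up to (not including) the second dot, if there is one.
  return V.substr(0, V.find('.', FirstDot + 1));
}

CudaVersion stringToVersion(const std::string &S) {
  static const struct { const char *Name; CudaVersion V; } Known[] = {
      {"7.0", CudaVersion::CUDA_70},   {"7.5", CudaVersion::CUDA_75},
      {"8.0", CudaVersion::CUDA_80},   {"9.0", CudaVersion::CUDA_90},
      {"9.1", CudaVersion::CUDA_91},   {"9.2", CudaVersion::CUDA_92},
      {"10.0", CudaVersion::CUDA_100}, {"10.1", CudaVersion::CUDA_101},
      {"10.2", CudaVersion::CUDA_102}, {"11.0", CudaVersion::CUDA_110},
  };
  for (const auto &K : Known)
    if (S == K.Name)
      return K.V;
  return CudaVersion::UNKNOWN;
}

// Mirrors the updated detection logic: known-but-newer versions are kept and
// flagged, unrecognized versions fall back to LATEST_SUPPORTED and are flagged.
CudaVersion versionFromLine(const std::string &Line, bool &NotSupported) {
  NotSupported = false;
  const std::string Detected = parseMajorMinor(Line);
  if (Detected.empty())
    return CudaVersion::UNKNOWN; // assumption: nothing usable was detected
  const CudaVersion V = stringToVersion(Detected);
  if (V != CudaVersion::UNKNOWN) {
    NotSupported = V > CudaVersion::LATEST_SUPPORTED;
    return V;
  }
  NotSupported = true;
  return CudaVersion::LATEST_SUPPORTED;
}

// PTX feature per CUDA version, as in the switch in addClangTargetOptions.
const char *ptxFeatureFor(CudaVersion V) {
  switch (V) {
  case CudaVersion::CUDA_110: return "+ptx70";
  case CudaVersion::CUDA_102: return "+ptx65";
  case CudaVersion::CUDA_101: return "+ptx64";
  case CudaVersion::CUDA_100: return "+ptx63";
  case CudaVersion::CUDA_92:
  case CudaVersion::CUDA_91:  return "+ptx61";
  case CudaVersion::CUDA_90:  return "+ptx60";
  default:                    return "+ptx42";
  }
}
```

Under these assumptions a `version.txt` line for CUDA 10.2 yields `CUDA_102` with the warning flag set and `+ptx65`, while a line for a version the enum does not list is treated as 10.1 and therefore gets `+ptx64`.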

llvm/lib/Target/NVPTX/NVPTX.td

[49 lines collapsed]
 def SM62 : SubtargetFeature<"sm_62", "SmVersion", "62",
                             "Target SM 6.2">;
 def SM70 : SubtargetFeature<"sm_70", "SmVersion", "70",
                             "Target SM 7.0">;
 def SM72 : SubtargetFeature<"sm_72", "SmVersion", "72",
                             "Target SM 7.2">;
 def SM75 : SubtargetFeature<"sm_75", "SmVersion", "75",
                             "Target SM 7.5">;
+def SM80 : SubtargetFeature<"sm_80", "SmVersion", "80",
+                            "Target SM 8.0">;

 // PTX Versions
 def PTX32 : SubtargetFeature<"ptx32", "PTXVersion", "32",
                              "Use PTX version 3.2">;
 def PTX40 : SubtargetFeature<"ptx40", "PTXVersion", "40",
                              "Use PTX version 4.0">;
 def PTX41 : SubtargetFeature<"ptx41", "PTXVersion", "41",
                              "Use PTX version 4.1">;
 def PTX42 : SubtargetFeature<"ptx42", "PTXVersion", "42",
                              "Use PTX version 4.2">;
 def PTX43 : SubtargetFeature<"ptx43", "PTXVersion", "43",
                              "Use PTX version 4.3">;
 def PTX50 : SubtargetFeature<"ptx50", "PTXVersion", "50",
                              "Use PTX version 5.0">;
 def PTX60 : SubtargetFeature<"ptx60", "PTXVersion", "60",
                              "Use PTX version 6.0">;
 def PTX61 : SubtargetFeature<"ptx61", "PTXVersion", "61",
                              "Use PTX version 6.1">;
 def PTX63 : SubtargetFeature<"ptx63", "PTXVersion", "63",
                              "Use PTX version 6.3">;
 def PTX64 : SubtargetFeature<"ptx64", "PTXVersion", "64",
                              "Use PTX version 6.4">;
+def PTX65 : SubtargetFeature<"ptx65", "PTXVersion", "65",
+                             "Use PTX version 6.5">;
+def PTX70 : SubtargetFeature<"ptx70", "PTXVersion", "70",
+                             "Use PTX version 7.0">;

 //===----------------------------------------------------------------------===//
 // NVPTX supported processors.
 //===----------------------------------------------------------------------===//

 class Proc<string Name, list<SubtargetFeature> Features>
   : Processor<Name, NoItineraries, Features>;

 def : Proc<"sm_20", [SM20]>;
 def : Proc<"sm_21", [SM21]>;
 def : Proc<"sm_30", [SM30]>;
 def : Proc<"sm_32", [SM32, PTX40]>;
 def : Proc<"sm_35", [SM35]>;
 def : Proc<"sm_37", [SM37, PTX41]>;
 def : Proc<"sm_50", [SM50, PTX40]>;
 def : Proc<"sm_52", [SM52, PTX41]>;
 def : Proc<"sm_53", [SM53, PTX42]>;
 def : Proc<"sm_60", [SM60, PTX50]>;
 def : Proc<"sm_61", [SM61, PTX50]>;
 def : Proc<"sm_62", [SM62, PTX50]>;
 def : Proc<"sm_70", [SM70, PTX60]>;
 def : Proc<"sm_72", [SM72, PTX61]>;
 def : Proc<"sm_75", [SM75, PTX63]>;
+def : Proc<"sm_80", [SM80, PTX70]>;

 def NVPTXInstrInfo : InstrInfo {
 }

 def NVPTX : Target {
   let InstructionSet = NVPTXInstrInfo;
 }