Following the pattern used for 11.4:
https://github.com/llvm/llvm-project/commit/49d982d8cbbbb6e01b6f8e4f173ed6325beab08b
- Reviewers: tra, yaxunl, Hahnfeld
- Commits: rG7ecec3f0f521: [CUDA] Bump supported CUDA version to 11.5
- Repository: rG LLVM Github Monorepo
Event Timeline
I'm not sure if it's actually correct to advertise full support for CUDA 11.5, but I didn't look into the exact changes since 11.4.
Good point. I was confused by the fact that 11.4 is both FULLY_SUPPORTED and PARTIALLY_SUPPORTED, so I decided to just follow the existing pattern. I didn't find any extra tests added for the 11.2 -> 11.4 bump. Do we have infrastructure in place to test this, or how does it work?
I think we're missing a few more changes here (a rough sketch of both places follows below):
- The driver needs to enable ptx75 when it constructs the cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp
- We also need to handle PTX75 in clang/include/clang/Basic/BuiltinsNVPTX.def
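For reference, a minimal sketch of what those two changes usually look like. This is written from memory rather than copied from the tree, so the exact macro/variable names, version list, and surrounding code may differ:

```
// clang/lib/Driver/ToolChains/Cuda.cpp (sketch): map the detected CUDA
// installation version to the PTX target feature that gets passed to cc1.
const char *PtxFeature = nullptr;
switch (CudaInstallationVersion) {
#define CASE_CUDA_VERSION(CUDA_VER, PTX_VER)                                   \
  case CudaVersion::CUDA_##CUDA_VER:                                           \
    PtxFeature = "+ptx" #PTX_VER;                                              \
    break;
  CASE_CUDA_VERSION(115, 75);  // new entry: CUDA 11.5 -> PTX ISA 7.5
  CASE_CUDA_VERSION(114, 74);
  CASE_CUDA_VERSION(113, 73);
  // ... entries for older versions ...
#undef CASE_CUDA_VERSION
default:
  PtxFeature = "+ptx42";
}
CC1Args.append({"-target-feature", PtxFeature});

// clang/include/clang/Basic/BuiltinsNVPTX.def (sketch): the PTXnn macros form
// a chain so that a builtin guarded by an older PTX version is also available
// under every newer one; the bump adds a new head to that chain.
#define PTX75 "ptx75"
#define PTX74 "ptx74|" PTX75
#define PTX73 "ptx73|" PTX74
// ... and so on down the chain ...
```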
Technically we never support all the features supported by NVCC, so for clang "supported" essentially means "works well enough", i.e. no known regressions vs. previous clang and CUDA versions. Usually it boils down to being able to compile the CUDA headers.
"Partially supported" happens when we can compile code compileable with the older CUDA versions, but are missing something critical introduced by the new CUDA version. E.g. a new GPU variant. Or new compiler builtins/functions that a user may expect from the new CUDA version. Or some CUDA headers may use new instructions in inline asm that would not compile with ptxas unless we generate PTX output using the new PTX version.
AFAICT from the CUDA-11.5 release notes, it didn't introduce anything particularly interesting. We've been using clang with CUDA-11.5 for a few weeks w/o any issues, so I think it's fine to stamp it as supported, once the missing bits are in place.
Experimental support for __int128 is new in CUDA 11.5; I'm not sure if Clang enables this for CUDA. The release notes also specify:
__builtin_assume can now be used to specify address space to allow for efficient loads and stores.
The docs are very scarce on this; I could only find void __builtin_assume(bool exp), which I think is not what they are talking about...
I think we've added support for i128 a while back: https://godbolt.org/z/18bEbhMYb
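For anyone who wants to double-check locally, a tiny device-code example along the lines of what the godbolt link demonstrates (my own reconstruction, not the exact snippet behind the link):

```
// Minimal check that clang accepts __int128 in CUDA device code.
// Compile with e.g.: clang++ -x cuda --cuda-gpu-arch=sm_70 -c int128.cu
__global__ void mul128(const __int128 *a, const __int128 *b, __int128 *out) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  out[i] = a[i] * b[i];  // lowered to i128 arithmetic in the generated PTX
}
```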
> The release notes also specify:
> __builtin_assume can now be used to specify address space to allow for efficient loads and stores.
> The docs are very scarce on this; I could only find void __builtin_assume(bool exp), which I think is not what they are talking about...
AMD folks have D112041 under review, which will have __builtin_assume help address space (AS) inference. In any case, we've already been doing it reasonably well automatically.
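My reading (an assumption on my part, not confirmed by the release notes) is that the entry refers to combining __builtin_assume with the CUDA address-space predicates such as __isShared()/__isGlobal(), roughly like this:

```
// Hypothetical illustration: assert that a generic pointer actually points
// into shared memory, so the compiler can emit address-space-specific
// (ld.shared/st.shared) accesses instead of generic ones. __isShared() is a
// CUDA device function that tests whether a pointer is in shared memory.
__device__ float sum2(const float *p) {
  __builtin_assume(__isShared(p));
  return p[0] + p[1];
}
```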
> The driver needs to enable ptx75 when it constructs the cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp
@tra Haven't I already done it in line 712? Or where should I enable it?
@Hahnfeld Are you satisfied with the replies to your questions? If so, I can go ahead and merge.