This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/Basic/
-
clang/
-
Basic/
-
Cuda.h
-
lib/
-
Basic/
-
Cuda.cpp
-
Driver/ToolChains/
-
ToolChains/
-
Cuda.cpp

Differential D74571

[OpenMP][CUDA] Add CUDA 10.2 support
AbandonedPublic

Authored by kkwli0 on Feb 13 2020, 11:03 AM.

Download Raw Diff

Details

Reviewers

jdoerfert
ABataev
JonChesterfield
tra
ye-luo

Summary

This patch is to add CUDA 10.2 support. It essentially removes the warning message regarding the CUDA 10.2 toolkit not being supported. This warning causes libomp not being built as the LIBOMP_HAVE_VERSION_SCRIPT_FLAG test returns false.

Question to reviewers and others:
Do we also want to put this patch to 10.0.0 RC2 in order to avoid bug 44587 which requires >9.0.1 to be the build compiler?

Diff Detail

Event Timeline

kkwli0 created this revision.Feb 13 2020, 11:03 AM

Herald added a subscriber: guansong. · View Herald TranscriptFeb 13 2020, 11:03 AM

kkwli0 edited the summary of this revision. (Show Details)Feb 13 2020, 11:06 AM

Do the in tree tests all pass with the 10.2 toolchain? That's not exactly the same as whether it works but is the closest approximation we have available.

Assuming yes, this patch seems uncontroversial.

Do we also want to put this patch to 10.0.0 RC2

Yes, if there is nothing else needed to support 10.2 we should for sure do that.

in order to avoid bug 44587 which requires >9.0.1 to be the build compiler?

Do we already require CUDA 10 as part of the libomptarget cmake (with a nice message) to avoid the errors you've seen? We for sure need to.

It's a bit premature to call CUDA-10.2 supported. We can compile using it, but clang/llvm has no support for the new things introduced by CUDA-10.2.
E.g. CUDA-10.2 introduces new PTX version with new instructions (and matching clang builtins)
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-isa-version-6-5

Any code which checks cuda version (and gets 10.2) and tries to use those new features will fail.
IMO the warning does serve a valid purpose here indicating that we only intend to provide features available in 10.1.

Given that clang provides an option to disable the warning, I don't think this change should be applied.

This revision now requires changes to proceed.Feb 13 2020, 11:36 AM

In D74571#1874947, @JonChesterfield wrote:

Do the in tree tests all pass with the 10.2 toolchain? That's not exactly the same as whether it works but is the closest approximation we have available.

Assuming yes, this patch seems uncontroversial.

Yes, in tree tests pass with 10.2.

In D74571#1874950, @jdoerfert wrote:

in order to avoid bug 44587 which requires >9.0.1 to be the build compiler?

Do we already require CUDA 10 as part of the libomptarget cmake (with a nice message) to avoid the errors you've seen? We for sure need to.

This is a good idea but probably done in a separate patch.

In D74571#1875002, @kkwli0 wrote:

In D74571#1874947, @JonChesterfield wrote:

Do the in tree tests all pass with the 10.2 toolchain? That's not exactly the same as whether it works but is the closest approximation we have available.

Assuming yes, this patch seems uncontroversial.

Yes, in tree tests pass with 10.2.

I do not think it is relevant as those tests don't use CUDA-10.2 specific features yet.

Removing the warning here is similar to claiming that C++ compiler supports C++17 based on it passing C++ tests for c++11.

If OpenMP needs the warning gone, I'm OK with disabling it for openmp compilation, if that's what openmp owners want/need.
The warning should remain enabled otherwise.

In D74571#1875022, @kkwli0 wrote:

In D74571#1874950, @jdoerfert wrote:

in order to avoid bug 44587 which requires >9.0.1 to be the build compiler?

Do we already require CUDA 10 as part of the libomptarget cmake (with a nice message) to avoid the errors you've seen? We for sure need to.

This is a good idea but probably done in a separate patch.

Let's do this to "fix" PR44587.

Interesting distinction.

Should compiling without warning indicate comprehensive support, or merely that we ran the tests and they passed?

I assumed the latter on the basis that we probably don't have comprehensive support for cuda 10.1 either. No preference either way though.

In D74571#1875296, @JonChesterfield wrote:

Interesting distinction.

Should compiling without warning indicate comprehensive support, or merely that we ran the tests and they passed?

The test in LLVM do have very limited testing to CUDA-versioned things and at the moment they have no tests for anything specific to 10.2.

I assumed the latter on the basis that we probably don't have comprehensive support for cuda 10.1 either. No preference either way though.

On LLVM side I believe we have almost everything that came with PTX6.4 which arrived in CUDA-10.1.
Clang also has the necessary builtins plumbed through. That's not the case for PTX6.5 & CUDA-10.2.
Most of the CUDA testing is done in the test-suite and updating it for CUDA-10.2 is another TODO item.
At the moment all we can infer from the passing tests is that we didn't break CUDA-10.1 functionality.
If we could force users to never use any new features from 10.2, then we could have declared mission accomplished.
As things stand right now, it would be a matter of time when the actual lack of support would start manifesting itself.

In the past I did now and then do something similar to this patch, allowing compilation
with the new CUDA version without adding the features that come with that version.
Arguably it was better than not being able to compile at all, but I did end up with a fair share
of users trying to compile some code that relied on the latest and greatest CUDA features
only to see things fail with odd error messages -- PTX reporting syntax errors,
clang complaining about unknown functions.

The "warn and assume the last known version" was the best compromise I managed to come up with.
You can *try* using it with a newer CUDA version before someone gets to implement the missing bits,
but whether it will work is very much a YMMV situation. For those who find it acceptable,
there's a big red "Trust me, I know what I'm doing" button in form of -Wno-cuda-unknown-version.
Everybody else gets a somewhat informative warning explaining what's going on.

@tra that's great context, thank you very much for writing it up.

Thanks for all the comments. It makes sense to keep the warning there before any ptx65 features are added. The warning should also apply to both the CUDA and OpenMP compile paths. As a result, I will pursue to ignore the warning in the build of libomp (i.e. libomp_check_linker_flag function in LibompCheckLinkerFlag.cmake). I think it is less pervasive and also can avoid similar occurrence when a new version of CUDA toolkit is available.

If users are annoyed by many many warning messages in the build, as @tra suggested, users can just use -Wno-unknown-cuda-version to suppress it.

Comments?

In D74571#1876617, @kkwli0 wrote:

Thanks for all the comments. It makes sense to keep the warning there before any ptx65 features are added. The warning should also apply to both the CUDA and OpenMP compile paths. As a result, I will pursue to ignore the warning in the build of libomp (i.e. libomp_check_linker_flag function in LibompCheckLinkerFlag.cmake). I think it is less pervasive and also can avoid similar occurrence when a new version of CUDA toolkit is available.

If users are annoyed by many many warning messages in the build, as @tra suggested, users can just use -Wno-unknown-cuda-version to suppress it.

Comments?

That sounds like the right approach for OpenMP. We require a minimal CUDA version, based on what we use internally, but no maximal version if possible since we don't use new features anyway.

In D74571#1876647, @jdoerfert wrote:

That sounds like the right approach for OpenMP. We require a minimal CUDA version, based on what we use internally, but no maximal version if possible since we don't use new features anyway.

SGTM, with a caveat.

Word of caution -- it's not just the features. New CUDA versions also come with sometimes substantially different header files. Clang pre-includes a lot of them, so if the new headers have something funky, clang may need extra changes before it can compile anything, even if you don't use the new features.

If you silence compiler with the option set during build config, it may be prudent to tell the user that CUDA compilation *may* fail if the CUDA version if not one of the known "not quite supported, but usable on par with the oldest supported version."

It turns out that having the warning message also affects the C_SUPPORTS_FPIC test in cmake/modules/HandleLLVMOptions.cmake. As a result, cmake thinks that -fPIC is not supported. Eventually, it leads to error in libclang-cpp.so.

../../lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o: In function `clang::CodeGen::CodeGenModule::~CodeGenModule()':
CodeGenModule.cpp:(.text+0x1134): call to `std::_Rb_tree<int, std::pair<int const, llvm::TinyPtrVector<llvm::Function*> >, std::_Select1st<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > >, std::less<int>, std::allocator<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > > >::_M_erase(std::_Rb_tree_node<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > >*)' lacks nop, can't restore toc; recompile with -fPIC
...

I don't think it is a good idea to modify this test which explicitly specifies -Werror.

Any other ideas are definitely welcome!

In D74571#1876952, @kkwli0 wrote:
It turns out that having the warning message also affects the C_SUPPORTS_FPIC test in cmake/modules/HandleLLVMOptions.cmake. As a result, cmake thinks that -fPIC is not supported. Eventually, it leads to error in libclang-cpp.so.
../../lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o: In function `clang::CodeGen::CodeGenModule::~CodeGenModule()':
CodeGenModule.cpp:(.text+0x1134): call to `std::_Rb_tree<int, std::pair<int const, llvm::TinyPtrVector<llvm::Function*> >, std::_Select1st<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > >, std::less<int>, std::allocator<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > > >::_M_erase(std::_Rb_tree_node<std::pair<int const, llvm::TinyPtrVector<llvm::Function*> > >*)' lacks nop, can't restore toc; recompile with -fPIC
...
I don't think it is a good idea to modify this test which explicitly specifies -Werror.

I am not sure I understand. Do we need to modify stuff outside of /openmp? I was hoping it is our CMake that can be adjusted to make this work as described earlier. TBH, he warning is even not my biggest problem. As long as we get a libomptarget.bc we should be fine.

I am not sure I understand. Do we need to modify stuff outside of /openmp? I was hoping it is our CMake that can be adjusted to make this work as described earlier. TBH, he warning is even not my biggest problem. As long as we get a libomptarget.bc we should be fine.

@jdoerfert Yes. The warning message affects the outcome of another test in cmake. With the approach of ignoring the warning message, we need to modify stuff outside of /openmp. For me, I don't mind to have the warning messages all over the places. The problem we have here is unable to build libomp.so.

In D74571#1877000, @kkwli0 wrote:

I am not sure I understand. Do we need to modify stuff outside of /openmp? I was hoping it is our CMake that can be adjusted to make this work as described earlier. TBH, he warning is even not my biggest problem. As long as we get a libomptarget.bc we should be fine.

@jdoerfert Yes. The warning message affects the outcome of another test in cmake. With the approach of ignoring the warning message, we need to modify stuff outside of /openmp. For me, I don't mind to have the warning messages all over the places. The problem we have here is unable to build libomp.so.

I was naively hoping we can say -Wno-unknown-cuda-version somewhere "early" in openmp/CMake... to ensure we can build the runtime.

Summary

In order to avoid the bug in PR44587, one needs to build with >9.0.
With CUDA toolkit 10.2, building the trunk will fail because 10.0.0 does not support it yet.

So an alternative is to:

patch openmp/runtime/cmake/LibompCheckLinkerFlag.cmake to make the libomp_check_linker_flag function to ignore the "Unknown CUDA version" warning, AND
ask users to build with -DCMAKE_CXX_FLAGS=-Wno-unknown-cuda-version -DCMAKE_C_FLAGS=-Wno-unknown-cuda-version to get around the C_SUPPORTS_FPIC test in cmake/modules/HandleLLVMOptions.cmake if the system has CUDA10.2 installed.

kkwli0 abandoned this revision.Feb 25 2020, 3:10 PM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

Cuda.h

3 lines

lib/

Basic/

Cuda.cpp

5 lines

Driver/

ToolChains/

Cuda.cpp

1 line

Diff 244473

clang/include/clang/Basic/Cuda.h

Show All 21 Lines	enum class CudaVersion {
CUDA_70,		CUDA_70,
CUDA_75,		CUDA_75,
CUDA_80,		CUDA_80,
CUDA_90,		CUDA_90,
CUDA_91,		CUDA_91,
CUDA_92,		CUDA_92,
CUDA_100,		CUDA_100,
CUDA_101,		CUDA_101,
LATEST = CUDA_101,		CUDA_102,
		LATEST = CUDA_102,
};		};
const char *CudaVersionToString(CudaVersion V);		const char *CudaVersionToString(CudaVersion V);
// Input is "Major.Minor"		// Input is "Major.Minor"
CudaVersion CudaStringToVersion(const llvm::Twine &S);		CudaVersion CudaStringToVersion(const llvm::Twine &S);

enum class CudaArch {		enum class CudaArch {
UNKNOWN,		UNKNOWN,
SM_20,		SM_20,
▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

clang/lib/Basic/Cuda.cpp

Show All 22 Lines	const char *CudaVersionToString(CudaVersion V) {
case CudaVersion::CUDA_91:		case CudaVersion::CUDA_91:
return "9.1";		return "9.1";
case CudaVersion::CUDA_92:		case CudaVersion::CUDA_92:
return "9.2";		return "9.2";
case CudaVersion::CUDA_100:		case CudaVersion::CUDA_100:
return "10.0";		return "10.0";
case CudaVersion::CUDA_101:		case CudaVersion::CUDA_101:
return "10.1";		return "10.1";
		case CudaVersion::CUDA_102:
		return "10.2";
}		}
llvm_unreachable("invalid enum");		llvm_unreachable("invalid enum");
}		}

CudaVersion CudaStringToVersion(const llvm::Twine &S) {		CudaVersion CudaStringToVersion(const llvm::Twine &S) {
return llvm::StringSwitch<CudaVersion>(S.str())		return llvm::StringSwitch<CudaVersion>(S.str())
.Case("7.0", CudaVersion::CUDA_70)		.Case("7.0", CudaVersion::CUDA_70)
.Case("7.5", CudaVersion::CUDA_75)		.Case("7.5", CudaVersion::CUDA_75)
.Case("8.0", CudaVersion::CUDA_80)		.Case("8.0", CudaVersion::CUDA_80)
.Case("9.0", CudaVersion::CUDA_90)		.Case("9.0", CudaVersion::CUDA_90)
.Case("9.1", CudaVersion::CUDA_91)		.Case("9.1", CudaVersion::CUDA_91)
.Case("9.2", CudaVersion::CUDA_92)		.Case("9.2", CudaVersion::CUDA_92)
.Case("10.0", CudaVersion::CUDA_100)		.Case("10.0", CudaVersion::CUDA_100)
.Case("10.1", CudaVersion::CUDA_101)		.Case("10.1", CudaVersion::CUDA_101)
		.Case("10.2", CudaVersion::CUDA_102)
.Default(CudaVersion::UNKNOWN);		.Default(CudaVersion::UNKNOWN);
}		}

const char *CudaArchToString(CudaArch A) {		const char *CudaArchToString(CudaArch A) {
switch (A) {		switch (A) {
case CudaArch::LAST:		case CudaArch::LAST:
break;		break;
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
▲ Show 20 Lines • Show All 324 Lines • ▼ Show 20 Lines	static CudaVersion ToCudaVersion(llvm::VersionTuple Version) {
case 91:		case 91:
return CudaVersion::CUDA_91;		return CudaVersion::CUDA_91;
case 92:		case 92:
return CudaVersion::CUDA_92;		return CudaVersion::CUDA_92;
case 100:		case 100:
return CudaVersion::CUDA_100;		return CudaVersion::CUDA_100;
case 101:		case 101:
return CudaVersion::CUDA_101;		return CudaVersion::CUDA_101;
		case 102:
		return CudaVersion::CUDA_102;
default:		default:
return CudaVersion::UNKNOWN;		return CudaVersion::UNKNOWN;
}		}
}		}

bool CudaFeatureEnabled(llvm::VersionTuple Version, CudaFeature Feature) {		bool CudaFeatureEnabled(llvm::VersionTuple Version, CudaFeature Feature) {
return CudaFeatureEnabled(ToCudaVersion(Version), Feature);		return CudaFeatureEnabled(ToCudaVersion(Version), Feature);
}		}
Show All 11 Lines

clang/lib/Driver/ToolChains/Cuda.cpp

Show First 20 Lines • Show All 628 Lines • ▼ Show 20 Lines	void CudaToolChain::addClangTargetOptions(
CC1Args.push_back("-mlink-builtin-bitcode");		CC1Args.push_back("-mlink-builtin-bitcode");
CC1Args.push_back(DriverArgs.MakeArgString(LibDeviceFile));		CC1Args.push_back(DriverArgs.MakeArgString(LibDeviceFile));

// New CUDA versions often introduce new instructions that are only supported		// New CUDA versions often introduce new instructions that are only supported
// by new PTX version, so we need to raise PTX level to enable them in NVPTX		// by new PTX version, so we need to raise PTX level to enable them in NVPTX
// back-end.		// back-end.
const char *PtxFeature = nullptr;		const char *PtxFeature = nullptr;
switch(CudaInstallation.version()) {		switch(CudaInstallation.version()) {
		case CudaVersion::CUDA_102:
case CudaVersion::CUDA_101:		case CudaVersion::CUDA_101:
PtxFeature = "+ptx64";		PtxFeature = "+ptx64";
break;		break;
case CudaVersion::CUDA_100:		case CudaVersion::CUDA_100:
PtxFeature = "+ptx63";		PtxFeature = "+ptx63";
break;		break;
case CudaVersion::CUDA_92:		case CudaVersion::CUDA_92:
PtxFeature = "+ptx61";		PtxFeature = "+ptx61";
▲ Show 20 Lines • Show All 242 Lines • Show Last 20 Lines