This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Add partial support for recent CUDA versions.
ClosedPublic

Authored by tra on Apr 7 2020, 11:58 AM.

Details

Summary

Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.
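As a rough illustration of what "newer versions of PTX" means here, each CUDA release accepts a newer PTX ISA, and sm_80 first appears in PTX ISA 7.0, which shipped with CUDA 11.0. Below is a minimal C++ sketch. It assumes versions are encoded as major * 10 + minor. `ptxIsaForCuda` and `canTargetSm80` are hypothetical names, not Clang's code, and the table only covers the releases around this change.

```cpp
// Hypothetical helpers, not Clang's implementation. Versions are encoded as
// major * 10 + minor (CUDA 10.2 -> 102, PTX ISA 6.5 -> 65).

// PTX ISA version a given CUDA release can accept, for the releases around
// this change only.
int ptxIsaForCuda(int Cuda) {
  if (Cuda >= 110)
    return 70;  // CUDA 11.0 introduced PTX ISA 7.0; newer ISAs are not modelled
  if (Cuda >= 102)
    return 65;  // CUDA 10.2 -> PTX ISA 6.5
  if (Cuda >= 101)
    return 64;  // CUDA 10.1 -> PTX ISA 6.4
  return 0;     // older releases are not modelled in this sketch
}

// sm_80 first appears in PTX ISA 7.0, so it needs CUDA 11.0 or newer.
bool canTargetSm80(int Cuda) { return ptxIsaForCuda(Cuda) >= 70; }
```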


Event Timeline

tra created this revision. Apr 7 2020, 11:58 AM
yaxunl accepted this revision. Apr 7 2020, 2:37 PM

LGTM. Thanks!

This revision is now accepted and ready to land. Apr 7 2020, 2:37 PM
This revision was automatically updated to reflect the committed changes.

@tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:

clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]
clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice
clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'

Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)
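The three numbers can be reproduced with a simplified model. cuda.h encodes the release in CUDA_VERSION as major * 1000 + minor * 10, so 11040 is 11.4. A release newer than the newest known enumerator is clamped to that enumerator (11.2 here), while the warning quotes the newest fully supported release (10.1). In the sketch below, the two constants stand in for LATEST and LATEST_SUPPORTED. All names are hypothetical, not Clang's declarations.

```cpp
#include <string>

// Simplified, hypothetical model; versions are encoded as major * 10 + minor.
constexpr int kLatest = 112;           // stands in for LATEST (11.2)
constexpr int kLatestSupported = 101;  // stands in for LATEST_SUPPORTED (10.1)

// cuda.h's CUDA_VERSION is major * 1000 + minor * 10, e.g. 11040 -> 11.4.
int decodeCudaVersionMacro(int Macro) {
  return (Macro / 1000) * 10 + (Macro % 1000) / 10;
}

std::string toString(int MajorMinor) {
  return std::to_string(MajorMinor / 10) + "." +
         std::to_string(MajorMinor % 10);
}

// A release newer than every known enumerator is clamped to the newest one;
// that clamped value is what later gets reported as the installation's version.
int clampToKnown(int MajorMinor) {
  return MajorMinor > kLatest ? kLatest : MajorMinor;
}
```

For CUDA_VERSION=11040 this yields 11.4 for the installed release, 11.2 after clamping, and 10.1 as the latest supported version. These are exactly the three numbers in the messages above.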

Herald added a project: Restricted Project. Aug 13 2021, 7:26 AM
Herald added a subscriber: dexonsmith.
tra added a comment. Aug 13 2021, 11:39 AM

@tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:

Agreed, it's far from ideal. There's also more than one issue involved.

clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]

The good news is that we've grown support for enough clang builtins and PTX instructions to bump the "latest supported" to ~CUDA-11.3 or, maybe, even 11.4. At least, clang should be able to compile all CUDA headers in those versions.
This should reduce the noise.

clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice

It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35, as it's likely the oldest GPU platform that's still widely available (e.g. there are tons of them on Google Cloud and AWS) and is still supported by all CUDA versions clang accepts.

clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'

Perhaps it's time to start considering decommissioning sm_20 support in clang and NVPTX. nvcc did that long ago and is already on the way to dropping sm_3x, too. sm_30 is no longer supported, and sm_35 has been deprecated and is expected to be gone in the next CUDA release.
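The last error comes from an inclusive range check per GPU arch. Below is a minimal sketch of such a check. It assumes versions are encoded as major * 10 + minor. `ArchRange`, `isArchSupported` and `usableArchs` are hypothetical names. Apart from the sm_20 range quoted in the error, any table contents used with it are assumptions.

```cpp
#include <string>
#include <vector>

// Inclusive range of CUDA versions (major * 10 + minor) that can compile for
// a GPU architecture. Hypothetical type, not Clang's actual table.
struct ArchRange {
  std::string Arch;
  int MinCuda;
  int MaxCuda;
};

bool isArchSupported(const ArchRange &R, int InstalledCuda) {
  return R.MinCuda <= InstalledCuda && InstalledCuda <= R.MaxCuda;
}

// Return the architectures from Table that the installed CUDA can compile for.
std::vector<std::string> usableArchs(const std::vector<ArchRange> &Table,
                                     int InstalledCuda) {
  std::vector<std::string> Result;
  for (const ArchRange &R : Table)
    if (isArchSupported(R, InstalledCuda))
      Result.push_back(R.Arch);
  return Result;
}
```

Consider a table with `{"sm_20", 70, 80}` and `{"sm_35", 70, 112}`, where the sm_35 range is assumed. For it, `usableArchs(Table, 112)` keeps only sm_35, which is why an sm_20 default fails with any recent CUDA install.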

Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)

I'm open to suggestions. This was the least bad compromise I managed to come up with.

We could report the actually detected version, instead of the 'latest' version clang knows about. Or not report it at all as it's not particularly helpful for the end user. That would mitigate one source of confusion.
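Reporting the detected version could be as simple as building the message from the version found on disk, rather than from the enumerator it was clamped to. Below is a sketch with a hypothetical `archMismatchMessage` helper. Its text mirrors the error quoted above, minus the trailing suggestions.

```cpp
#include <string>

// Hypothetical helpers; versions are encoded as major * 10 + minor.
std::string toString(int MajorMinor) {
  return std::to_string(MajorMinor / 10) + "." +
         std::to_string(MajorMinor % 10);
}

// Build the arch/version mismatch message from the version actually detected
// in the installation, not from the enumerator it was clamped to.
std::string archMismatchMessage(const std::string &Arch, int MinCuda,
                                int MaxCuda, const std::string &InstallPath,
                                int DetectedCuda) {
  return "GPU arch " + Arch + " is supported by CUDA versions between " +
         toString(MinCuda) + " and " + toString(MaxCuda) +
         " (inclusive), but installation at " + InstallPath + " is " +
         toString(DetectedCuda);
}
```

With `archMismatchMessage("sm_20", 70, 80, "/usr/local/cuda-11.4", 114)`, the message ends in "is 11.4", matching the installation path instead of contradicting it.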

As for the latest supported, I think we may still want to have it in some form. Clang has to deal with version-specific CUDA quirks, so a CUDA version outside of the range that clang is known to work with puts the user in uncharted waters. E.g. until recently clang worked well enough with CUDA-11.3, but only if you were compiling for the older GPUs. Attempts to compile some headers for sm_80 would fail and that *was* confusing to users who ran into that when the warning was disabled.

In D77670#2944192, @tra wrote:

@tra The split between LATEST and LATEST_SUPPORTED leads to very weird warning and error messages:

Agreed, it's far from ideal. There's also more than one issue involved.

Unfortunately, yes...

clang-14: warning: unknown CUDA version: cuda.h: CUDA_VERSION=11040.; assuming the latest supported version 10.1 [-Wunknown-cuda-version]

The good news is that we've grown support for enough clang builtins and PTX instructions to bump the "latest supported" to ~CUDA-11.3 or, maybe, even 11.4. At least, clang should be able to compile all CUDA headers in those versions.
This should reduce the noise.

Great!

clang-14: error: cannot find libdevice for sm_20; provide path to different CUDA installation via '--cuda-path', or pass '-nocudalib' to build without linking with libdevice

It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35, as it's likely the oldest GPU platform that's still widely available (e.g. there are tons of them on Google Cloud and AWS) and is still supported by all CUDA versions clang accepts.

+1 for at least sm_35 - that would match recent nvccs, right?

clang-14: error: GPU arch sm_20 is supported by CUDA versions between 7.0 and 8.0 (inclusive), but installation at /usr/local/cuda-11.4 is 11.2; use '--cuda-path' to specify a different CUDA install, pass a different GPU arch with '--cuda-gpu-arch', or pass '--no-cuda-version-check'

Perhaps it's time to start considering decommissioning sm_20 support in clang and NVPTX. nvcc did that long ago and is already on the way to dropping sm_3x, too. sm_30 is no longer supported, and sm_35 has been deprecated and is expected to be gone in the next CUDA release.

+1 - given that Clang 13.x just branched, now may be an ideal moment to make this cut.

Clang is mentioning three different CUDA versions here: 11.4 is what I really have installed, 11.2 is LATEST and therefore the one returned by getCudaVersion or as the "last resort" in CudaInstallationDetector, and the first warning says that Clang assumes the latest supported version 10.1. As a developer looking into the code, I get that the first warning is about saying that 10.1 is the latest fully supported version in terms of features, but I think this is really confusing to users. Do you see a chance to improve this? (other than adding just 11.3 and 11.4 to the enumerations where we'll always run behind)

I'm open to suggestions. This was the least bad compromise I managed to come up with.

We could report the actually detected version, instead of the 'latest' version clang knows about. Or not report it at all as it's not particularly helpful for the end user. That would mitigate one source of confusion.

As for the latest supported, I think we may still want to have it in some form. Clang has to deal with version-specific CUDA quirks, so a CUDA version outside of the range that clang is known to work with puts the user in uncharted waters. E.g. until recently clang worked well enough with CUDA-11.3, but only if you were compiling for the older GPUs. Attempts to compile some headers for sm_80 would fail and that *was* confusing to users who ran into that when the warning was disabled.

Yeah, the problem was that I didn't have better suggestions either when I wrote the first comment. But maybe now: How about having a "past-the-latest" value in the enum that Clang remembers if it detects a version more recent than it knows about? Then we could have two warnings:

  • If we have a "past-the-latest" version, tell the user that Clang has no clue about this version and we assume the LATEST version; things might work, but no guarantees.
  • If we have a version that is greater than the latest supported version, emit the current warning and say that support is "best-effort" (or something along that line). In that case, both the detected version and the "assumed" supported version should make sense to the user.
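A minimal sketch of the two-warning proposal follows. It uses a "past-the-latest" enumerator (called `NEW` here) so that a release newer than anything known is remembered, not silently clamped. Everything in it is hypothetical: the enum values, the constants standing in for LATEST and LATEST_SUPPORTED, and the warning wording. It is not the eventual patch.

```cpp
#include <string>

// Hypothetical enum; versions are encoded as major * 10 + minor and NEW is the
// "past-the-latest" marker for releases newer than anything known.
enum class CudaVersion : int { CUDA_101 = 101, CUDA_112 = 112, NEW = 10000 };

constexpr int kLatest = 112;           // stands in for LATEST
constexpr int kLatestSupported = 101;  // stands in for LATEST_SUPPORTED

std::string toString(int MajorMinor) {
  return std::to_string(MajorMinor / 10) + "." +
         std::to_string(MajorMinor % 10);
}

// Remember that the detected release is newer than anything known instead of
// clamping it to kLatest. (Casting a non-enumerator value is legal here
// because the enum has a fixed underlying type.)
CudaVersion toCudaVersion(int Detected) {
  return Detected > kLatest ? CudaVersion::NEW
                            : static_cast<CudaVersion>(Detected);
}

// Pick one of the two proposed warnings; the wording is illustrative only.
std::string versionWarning(int Detected) {
  if (toCudaVersion(Detected) == CudaVersion::NEW)
    return "CUDA " + toString(Detected) +
           " is newer than any version Clang knows; assuming " +
           toString(kLatest) + ", things might work, but no guarantees";
  if (Detected > kLatestSupported)
    return "CUDA " + toString(Detected) +
           " is supported on a best-effort basis; newest fully supported "
           "version is " + toString(kLatestSupported);
  return "";  // fully supported: no warning
}
```

Each message now names the detected version next to the one Clang assumes, so the two numbers no longer look contradictory.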
tra added a comment. Aug 13 2021, 2:11 PM

It's also time to bump the default GPU target to something that's supported by the CUDA versions we reasonably expect to see. That should probably be sm_35, as it's likely the oldest GPU platform that's still widely available (e.g. there are tons of them on Google Cloud and AWS) and is still supported by all CUDA versions clang accepts.

+1 for at least sm_35 - that would match recent nvccs, right?

NVCC in 11.4.1 defaults to sm_52, the oldest non-deprecated GPU. I don't think it's time for clang to go that far as, unlike NVCC, we have to deal with older CUDA versions, too. For us, the lowest common denominator of supported CUDA versions and GPU hardware availability is sm_35.

I could also argue the other way around: it may make sense to set the default GPU to the most recent one supported by all CUDA versions. That would allow clang to compile a larger subset of existing CUDA code (newer GPUs support more builtins/features the code may rely on).

We could set the target GPU to the most recent GPU variant supported by the CUDA version we've found. This, however, would mean that the target changes from one system to another, depending on which CUDA version happens to be installed. I think that would be pushing it too far.
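The two alternatives above differ only in which CUDA versions the chosen arch must cover. Below is a sketch of both, with hypothetical names. It is meant to be used with an illustrative table, not Clang's.

```cpp
#include <string>
#include <vector>

// Hypothetical table row: inclusive CUDA range (major * 10 + minor) per arch.
struct ArchRange {
  std::string Arch;
  int MinCuda;
  int MaxCuda;
};

// Option 1: newest arch that every accepted CUDA version can target.
// The result is the same on every machine. Assumes Table is sorted with the
// oldest arch first.
std::string newestSupportedByAll(const std::vector<ArchRange> &Table,
                                 int OldestCuda, int NewestCuda) {
  std::string Best;
  for (const ArchRange &R : Table)
    if (R.MinCuda <= OldestCuda && NewestCuda <= R.MaxCuda)
      Best = R.Arch;
  return Best;
}

// Option 2: newest arch the detected CUDA version can target. The result
// changes with whichever CUDA version happens to be installed.
std::string newestSupportedBy(const std::vector<ArchRange> &Table,
                              int DetectedCuda) {
  std::string Best;
  for (const ArchRange &R : Table)
    if (R.MinCuda <= DetectedCuda && DetectedCuda <= R.MaxCuda)
      Best = R.Arch;
  return Best;
}
```

The illustrative table makes sm_35 and sm_52 usable from CUDA 7.0 through 11.2 and sm_80 only from 11.0; these ranges are assumptions. With it, option 1 returns sm_52 everywhere. Option 2 returns sm_52 on a CUDA-10.1 machine and sm_80 on a CUDA-11.2 one, which is the system-to-system drift described above.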

In any case, there's no universally good choice, as we don't know which GPU the user needs the code for. If our choice is wrong, the app will not run. In practice, users do need to specify both the CUDA SDK path and the list of GPUs they want to compile for. The defaults, especially the default GPU target, are likely to be wrong more often than not.

Yeah, the problem was that I didn't have better suggestions either when I wrote the first comment. But maybe now: How about having a "past-the-latest" value in the enum that Clang remembers if it detects a version more recent than it knows about? Then we could have two warnings:

  • If we have a "past-the-latest" version, tell the user that Clang has no clue about this version and we assume the LATEST version; things might work, but no guarantees.
  • If we have a version that is greater than the latest supported version, emit the current warning and say that support is "best-effort" (or something along that line). In that case, both the detected version and the "assumed" supported version should make sense to the user.

SGTM. I'll send the patch next week and we can discuss the details there.