This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Improve CUDA version detection and diagnostics.
ClosedPublic

Authored by tra on Aug 17 2021, 2:55 PM.

Details

Summary

Always use cuda.h to detect the CUDA version. It's a more universal approach
than version.txt, which is no longer present in recent CUDA versions.

Split the 'unknown CUDA version' warning into two:

  • when the detected CUDA version is partially supported by clang. It is expected to work in general, at feature parity with the latest supported CUDA version, but may be missing support for new features/instructions/GPU variants. Clang will issue a warning.

  • when the detected version is new. Recent CUDA versions have been working with clang reasonably well and will likely work similarly to the partially supported ones above, or they may not work at all. Clang will issue a warning and proceed as if the latest known CUDA version was detected.
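The scheme described above can be sketched roughly as follows. This is not the actual Cuda.cpp implementation: parseCudaVersion and classify are hypothetical names, and the version cutoffs are illustrative, not clang's real ones.

```cpp
// Minimal sketch: detect the CUDA version by scanning cuda.h for its
// CUDA_VERSION macro, then classify the result as fully supported,
// partially supported (warn), or unknown/new (warn, treat as latest known).
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// CUDA_VERSION encodes major * 1000 + minor * 10, e.g. 11040 for CUDA 11.4.
std::optional<int> parseCudaVersion(const std::string &CudaHText) {
  std::istringstream In(CudaHText);
  std::string Line;
  const std::string Prefix = "#define CUDA_VERSION ";
  while (std::getline(In, Line))
    if (Line.rfind(Prefix, 0) == 0)
      return std::stoi(Line.substr(Prefix.size()));
  return std::nullopt; // version.txt is gone and cuda.h had no macro
}

enum class Support { Full, Partial, Unknown };

// Hypothetical cutoffs for illustration only; the real values live in
// clang's CudaVersion handling.
Support classify(int V) {
  if (V <= 11000) return Support::Full;    // fully supported
  if (V <= 11040) return Support::Partial; // warn, expected to mostly work
  return Support::Unknown;                 // warn, proceed as latest known
}
```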

Diff Detail

Event Timeline

tra created this revision.Aug 17 2021, 2:55 PM
tra requested review of this revision.Aug 17 2021, 2:55 PM
Herald added a project: Restricted Project.Aug 17 2021, 2:55 PM
Hahnfeld added inline comments.Aug 18 2021, 2:35 AM
clang/test/Driver/cuda-version-check.cu
75

Can we keep the [unknown] out of the warning message? I understand that we cannot nicely print a version identifier that we don't know about, so maybe only print if the version is known and newer than PARTIALLY_SUPPORTED?

The amdgpu change LGTM.

tra updated this revision to Diff 367246.Aug 18 2021, 10:24 AM
tra edited the summary of this revision. (Show Details)

Do not report the version if we don't know it.

tra added inline comments.Aug 18 2021, 10:26 AM
clang/test/Driver/cuda-version-check.cu
75

Done.

Hahnfeld accepted this revision.Aug 18 2021, 10:55 AM

Otherwise LGTM

clang/lib/Driver/ToolChains/Cuda.cpp
103–104

I think you have to prepend the space in order to match CUDA version%0 is newer than, or move the %0. But I didn't try locally, might be missing something.
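The issue being pointed out can be illustrated with a toy substitution function. formatDiag is a hypothetical stand-in, not clang's diagnostic engine; it only shows why the space has to live either in the format string or in the %0 argument.

```cpp
// Toy %0 substitution: if the version argument carries no leading space
// and the format string has none, the rendered text runs together.
#include <cassert>
#include <string>

std::string formatDiag(std::string Fmt, const std::string &Arg0) {
  auto Pos = Fmt.find("%0");
  if (Pos != std::string::npos)
    Fmt.replace(Pos, 2, Arg0);
  return Fmt;
}
// formatDiag("CUDA version%0 is newer", "11.4")  -> "CUDA version11.4 is newer"
// formatDiag("CUDA version%0 is newer", " 11.4") -> "CUDA version 11.4 is newer"
```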

This revision is now accepted and ready to land.Aug 18 2021, 10:55 AM
tra updated this revision to Diff 367261.Aug 18 2021, 11:07 AM

Prepend space to the version string.

tra added inline comments.Aug 18 2021, 11:08 AM
clang/lib/Driver/ToolChains/Cuda.cpp
103–104

Good catch. Fixed.

Hahnfeld added inline comments.Aug 19 2021, 8:39 AM
clang/lib/Driver/ToolChains/Cuda.cpp
209–211

The compiler is now warning here because of the assignment to VERSION in the ternary operator

tra added inline comments.Aug 19 2021, 9:48 AM
clang/lib/Driver/ToolChains/Cuda.cpp
209–211

I'll fix it shortly. I also need to figure out why my build does not produce the warning.

tra updated this revision to Diff 367532.Aug 19 2021, 9:51 AM

Fixed an error spotted by reviewer.

Hahnfeld added inline comments.Aug 19 2021, 10:46 AM
clang/lib/Driver/ToolChains/Cuda.cpp
209–211

The exact warning is

LLVM/src/clang/lib/Driver/ToolChains/Cuda.cpp: In constructor ‘clang::driver::CudaInstallationDetector::CudaInstallationDetector(const clang::driver::Driver&, const llvm::Triple&, const llvm::opt::ArgList&)’:              
LLVM/src/clang/lib/Driver/ToolChains/Cuda.cpp:207:15: warning: operation on ‘((clang::driver::CudaInstallationDetector*)this)->clang::driver::CudaInstallationDetector::Version’ may be undefined [-Wsequence-point]          
       Version = FS.exists(LibDevicePath + "/libdevice.10.bc")
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     ? Version = CudaVersion::NEW
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     : Version = CudaVersion::CUDA_70;
                     ~~~~~~~~~~~~~~~~~~~~~~~

Re-reading the code, I actually think this is a false positive of GCC 8.3.1 (on CentOS 8) and Version is never undefined. But it was redundant, so it's good to remove it anyway.
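The pattern in question can be reproduced in miniature. This is a simplified stand-alone sketch, not the actual CudaInstallationDetector code; detect is a hypothetical helper.

```cpp
// Minimal reproduction of the flagged pattern. The inner assignments in
// the ternary were redundant (the result is assigned to Version anyway)
// and triggered -Wsequence-point on GCC 8; the fix drops them.
#include <cassert>

enum class CudaVersion { CUDA_70, NEW };

CudaVersion detect(bool HasLibDevice) {
  CudaVersion Version;
  // Before (simplified):
  //   Version = HasLibDevice ? Version = CudaVersion::NEW
  //                          : Version = CudaVersion::CUDA_70;
  // After: same behavior, no redundant inner writes.
  Version = HasLibDevice ? CudaVersion::NEW : CudaVersion::CUDA_70;
  return Version;
}
```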

This revision was automatically updated to reflect the committed changes.
tambre added a subscriber: tambre.EditedSep 26 2021, 9:33 AM

This unfortunately breaks using Debian distribution CUDA.
The Debian/Ubuntu special case is now useless.

On Debian:

  • cuda.h is in /usr/include, as it must be per policy.
  • libdevice.10.bc is in /usr/lib/nvidia-cuda-toolkit/libdevice and /usr/lib/cuda/nvvm/libdevice is symlinked to that.

Passing --cuda-path=/usr/lib/cuda previously worked, but now results in trying to parse from /usr/lib/cuda/include/cuda.h (doesn't exist) and Clang defaulting to the newest available version, which isn't supported by the ptxas currently in Debian.
NVCC has no issues with this layout.

Symlinking /usr/lib/cuda/include to /usr/include on Debian's side seems like the obvious short-term solution for trunk.
Could probably be backported to ensure LTSes will work with Clang 14.
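The short-term workaround amounts to one symlink; sketched below under a scratch root instead of the real /usr tree so it is safe to run. applyIncludeSymlink is a hypothetical helper, not part of any packaging tooling.

```cpp
// Sandboxed demo of the /usr/lib/cuda/include -> /usr/include symlink.
#include <cassert>
#include <filesystem>
namespace fs = std::filesystem;

bool applyIncludeSymlink(const fs::path &Root) {
  fs::remove_all(Root); // start from a clean scratch tree
  fs::create_directories(Root / "usr/include");  // headers actually live here
  fs::create_directories(Root / "usr/lib/cuda"); // what --cuda-path points at
  // The workaround itself, equivalent to:
  //   ln -s /usr/include /usr/lib/cuda/include
  fs::create_directory_symlink(Root / "usr/include",
                               Root / "usr/lib/cuda/include");
  // Clang would now find <cuda-path>/include/cuda.h via the symlink.
  return fs::is_directory(Root / "usr/lib/cuda/include");
}
```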

If you have any thoughts on an alternative solution, let me know.
I have filed Debian bug #995122.

tra added a comment.Sep 27 2021, 10:26 AM

So, what's the current state of affairs regarding CUDA SDK layout in Debian?
Clang does rely on very particular ordering of includes, so having CUDA headers in a location clang does not expect will lead to issues sooner or later.
If the headers are not available where --cuda-path points clang to, I'm not sure what clang is supposed to do. Second-guessing explicit user input is not a good idea in general.

CUDA SDK no longer ships version.txt, so cuda.h is the only reliable mechanism for detecting CUDA version and clang does need to know it.

In case the version has not been found, we also cannot choose an arbitrary old version as the default.

So, one workaround would be to install CUDA-11.4. AFAICT, we do find the headers and ptxas, but misdetect CUDA version.
Another workaround would be to place a fake /usr/lib/cuda/include/cuda.h with something like this:

#pragma push_macro("CUDA_VERSION")
#define CUDA_VERSION 11030
#pragma pop_macro("CUDA_VERSION")
#include_next <cuda.h>
tambre added a comment.EditedSep 27 2021, 11:26 AM

So, what's the current state of affairs regarding CUDA SDK layout in Debian?

/usr/bin: most executables, nvcc and nvprof shims
/usr/lib/nvidia-cuda-toolkit/bin: cicc, crt, nvcc, nvprof, g++ and gcc wrappers for supporting different compilers
/usr/lib/nvidia-cuda-toolkit/libdevice: libdevice.10.bc
/usr/lib/cuda/{bin,include,lib64}: empty directories
/usr/lib/cuda/nvvm/libdevice: symlinked to /usr/lib/nvidia-cuda-toolkit/libdevice
/usr/include: headers

CUDA SDK no longer ships version.txt, so cuda.h is the only reliable mechanism for detecting CUDA version and clang does need to know it.
In case the version has not been found we also can not choose an arbitrary old version as the default.

Agreed, sensible.

So, one workaround would be to install CUDA-11.4. AFAICT, we do find the headers and ptxas, but misdetect CUDA version.

I don't think that's going to be acceptable for most distributions, which might ship a Clang supporting a newer CUDA than the one they ship.

Another workaround would be to place a fake /usr/lib/cuda/include/cuda.h with something like this:

My CMake CI bot encountered cmath template errors after I manually symlinked /usr/lib/cuda/include to /usr/include, though that solved the version detection.
Adding a shim like the one you show fixes everything and compiling works.

Clang does rely on very particular ordering of includes, so having CUDA headers in a location clang does not expect will lead to issues sooner or later.

Having the headers in the system-wide /usr/include shouldn't be a problem though, right?
I'm not quite sure why simply having a /usr/lib/cuda/include -> /usr/include symlink breaks things, as seen on my bot.
Prior to this an empty /usr/lib/cuda/include and Clang finding them through the regular include search path worked.

If the headers are not available where --cuda-path points clang to, I'm not sure what clang is supposed to do. Second-guessing explicit user input is not a good idea in general.

I agree that second-guessing is bad, but expecting it to be available in the include search path otherwise seems reasonable.

tra added a comment.Sep 27 2021, 11:49 AM

Another workaround would be to place a fake /usr/lib/cuda/include/cuda.h with something like this:

My CMake CI bot encountered cmath template errors (https://open.cdash.org/test/498767666) after I manually symlinked /usr/lib/cuda/include to /usr/include, though that solved the version detection.
Adding a shim like the one you show fixes everything and compiling works.

Clang does rely on very particular ordering of includes, so having CUDA headers in a location clang does not expect will lead to issues sooner or later.

Having the headers in the system-wide /usr/include shouldn't be a problem though, right?
I'm not quite sure why simply having a /usr/lib/cuda/include -> /usr/include symlink breaks things, as seen on my bot.

If you look at the logs of the broken build you will see that /usr/lib/cuda/include/ appears fairly early in the search path, before the standard library C++ headers.
Symlinking it to /usr/include pulls a lot of other headers into that early position, and that breaks the include order for the C++ standard library, which has its own assumptions and requirements.

Prior to this an empty /usr/lib/cuda/include and Clang finding them through the regular include search path worked.

It's possible that the CUDA headers themselves do not need to be placed in the include path that early. We do need cuda_wrappers to be there, but at the moment I do not recall why we ended up placing cuda/include there as well. The fact that it worked for you with the empty directory suggests that it may not be necessary.

If we move cuda includes down to where /usr/include is in the search list, it should allow symlinking it to /usr/include.
I'll check whether that's feasible.

If the headers are not available where --cuda-path points clang to, I'm not sure what clang is supposed to do. Second-guessing explicit user input is not a good idea in general.

I agree that second-guessing is bad, but expecting it to be available in the include search path otherwise seems reasonable.

The issue with the include search path is that it is not readily available at the clang driver level at the point where we need to detect the CUDA version.
The driver does construct the include arguments for the cc1 invocations, but all of that happens quite a bit later, after CUDA detection is done.

clang/test/Driver/Inputs/CUDA-new/usr/local/cuda/nvvm/libdevice/libdevice.10.bc