Page MenuHomePhabricator

Update the list of CUDA versions up to 11.5
Needs RevisionPublic

Authored by mojca on Sat, Nov 20, 1:55 PM.

Details

Reviewers
tra
Hahnfeld
Summary

I've been trying to address the following issue in clangd for Visual Studio Code trying to access CUDA:
https://github.com/clangd/clangd/issues/793

This patch alone is not yet sufficient for a fully functional clangd, but it gets rid of the error message saying

Cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.

in case clangd and the code is located on the C: drive.

In my case I had both clangd (which I needed for debugging purposes) and the source code on D:. At this point:

Candidates.emplace_back(D.SysRoot + "/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v" + Ver);

the D.SysRoot seems to be empty. Then, during the call to

if (InstallPath.empty() || !FS.exists(InstallPath))

after sys::fs::make_absolute(WD->Resolved, Storage); (in VirtualFileSystem.cpp) the Storage expands to D:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4 which isn't found either.

Diff Detail

Event Timeline

mojca created this revision.Sat, Nov 20, 1:55 PM
mojca requested review of this revision.Sat, Nov 20, 1:55 PM
mojca edited the summary of this revision. (Show Details)Sat, Nov 20, 1:57 PM
carlosgalvezp accepted this revision.Sun, Nov 21, 4:07 AM

Thanks for the fix! I'm surprised nobody complained about this until now, CUDA 8 is really old.

This revision is now accepted and ready to land.Sun, Nov 21, 4:07 AM

Well, even after this patch neither clang nor clangd work correctly for me (I need some patches in llvm/clang sources, there are some issues with Microsoft's libraries; I wasn't able to make the linker work even after that), and CMake doesn't fully support CUDA + Clang on Windows either.
This is just the first baby step, there's probably still a relatively long way to go before anyone can actually use this chunk of the code out of the box.

And thanks to Carlos for accepting the patch.

(In case it's not a super demanding task, I would be willing to invest a bit of time towards making CUDA + Clang on Windows work better, but it would help to have "a supervisor" I could turn to when I get stuck or when I have questions.)

And thanks to Carlos for accepting the patch.

(In case it's not a super demanding task, I would be willing to invest a bit of time towards making CUDA + Clang on Windows work better, but it would help to have "a supervisor" I could turn to when I get stuck or when I have questions.)

You are welcome! I think it's great if you want to contribute to the CUDA support. As for myself, I am very new here as well and only pushed one very obvious patch to CUDA, so I don't feel confident reviewing more complex work. I believe the best people to supervise/review that are the recently added reviewers :)

kadircet added inline comments.Mon, Nov 22, 2:36 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

looks like the list is getting big and hard to maintain. considering that this is done only once per compiler invocation (and we check for existence of directories down in the loop anyway). what about throwing in an extra directory listing to base-directories mentioned down below and populate Candidates while preserving the newest-version-first order?

mojca added inline comments.Mon, Nov 22, 9:27 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

I totally agree with the sentiment, and that was my initial thought as well, but with zero experience I was too scared to make any more significant changes.

I can try to come up with a new patch (that doesn't need further maintenance whenever a new CUDA version gets released) if that's what you are suggesting. I would nevertheless merge this one, and prepare a new more advanced patch separately, but that's finally your call.

What's your suggestion about D.SysRoot + "Program Files/..."? At the time when this function gets called it looks like D.SysRoot is empty (or at least my debugger says so) and in my case it resolves to D: while the CUDA support is installed under C:.

Is there any special LLVM-specific/preferrable way to iterate through directories?

(What I also miss a bit in the whole process in an option to simply say "I want CUDA 11.1" without the need to explicitly spell out the full path.)

If you provide me give some general guidelines, I'll prepare another, hopefully more future-proof patch.

(Side note: I'm not sure if I'm calling clang-format correctly, but if I call it, it keeps reformatting the rest of this file.)

tra added inline comments.Mon, Nov 22, 12:20 PM
clang/lib/Driver/ToolChains/Cuda.cpp
131

This whole list may no longer be particularly useful. The most common use case on Linux, AFAICT, is to install only one CUDA version using system-provided package manager.
E.g. https://packages.ubuntu.com/focal/amd64/nvidia-cuda-toolkit/filelist

TBH, I'm tempted to limit autodetection to only that one system-default version and require user to use --cuda-path if they need something else.

132

I think these can be dropped altogether. Maybe 9,x too.

mojca added inline comments.Wed, Nov 24, 2:14 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

On Windows this is certainly not the case. Unless the installation is changed manually, one always gets the new version installed into a new directory.

I really do need multiple versions on Windows (and the ability to pick an older one) if I want to compile a binary that works on someone else's computer (if I compile against the latest CUDA, users need "the latest" drivers that may sometimes not even be available for their machine).

(That said, at least when using CMake, the selection needs to be done by CMake anyway, and maybe CMake could be forced to specify the correct flag automatically.)

So even if the functionality gets removed from non-Windows platforms, it would be really nice to keep it for Windows.

Now, there are three "conflicting" feedbacks/suggestions above. I can try to improve support, but it would be really helpful to reach the consensus of what needs to be done before coding it.

carlosgalvezp added inline comments.Wed, Nov 24, 2:37 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

one always gets the new version installed into a new directory.

A similar thing happens on Linux.

users need "the latest" drivers

Since CUDA 10.2, there's some "compatibility mode" that allows to run newer CUDA on older drivers. As long as you are not using the latest features, of course.

I'm personally all up for removing redundancy and duplication.

mojca added inline comments.Wed, Nov 24, 5:18 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and the NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.

That sounds significant enough to me to argue against the removal of versioned folders from search.

tra added inline comments.Wed, Nov 24, 10:36 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

I think on windows (I mean the windows environment itself, not WSL), CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a windows-specific way to find it.

131

I'm following https://docs.nvidia.com/cuda/wsl-user-guide/index.html right now and the NVIDIA's "official packages" for Ubuntu get installed under /usr/local/cuda-11.x.

OK, that's something that will be used often enough, so still need to deal with versioned directories. :-(

That sounds significant enough to me to argue against the removal of versioned folders from search.

I'm not against probing for multiple versions in principle.

What I'm saying is that:

  • blindly increasing the number of probed directories comes with a price and should be done cautiously.
  • it may be a good time to revisit how we detect CUDA installations and make sure it makes sense now.

There are two considerations.

First is the overhead it adds to compiler driver. While for most users running locally it's negligible, it would be noticed for the users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.

The other issue is that we should differentiate between finding the canonical location of CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously. I'd argue that in case of having multiple versions compiler has no business picking one version over another (though we could make an attempt to use the most recent one as we do now). It's up to the user to explicitly specify the one they want. When we allow to pick one version out of many, it makes it way too easy for a user to end up with a mix of the default version and the version they want, all they need to do is to forget --cuda-path somewhere in their build.

For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.

131

Since CUDA 10.2, there's some "compatibility mode" that allows to run newer CUDA on older drivers.

This only works with very specific versions of the drivers and those are not very common on the end-user machines. It's mostly for the datacenter use.

mojca added inline comments.Wed, Nov 24, 11:47 AM
clang/lib/Driver/ToolChains/Cuda.cpp
131

I think on windows (I mean the windows environment itself, not WSL), CUDA installer sets an environment variable which could be used to detect the default CUDA version, so it may warrant a windows-specific way to find it.

I see that I have

CUDA_PATH       = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
CUDA_PATH_V11_4 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4
CUDA_PATH_V11_3 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3

etc. Using those directly might in fact be a good idea.

The other issue is that we should differentiate between finding the canonical location of CUDA SDK, vs picking one out of many CUDA SDK versions that may be installed simultaneously.

Agreed. (I also wish clang would accept something like --cuda-version=11.4 that would automatically work on Windows and Linux whenever CUDA resides under /usr/local/cuda-x.y)

I wanted to add another strange observation. Today I installed the latest CUDA (11.5) and clang ("14") under Ubuntu 20.04 inside WSL 2 on Windows 11 (equivalent to a regular Ubuntu, I would say). I had to explicitly provide the -L flag despite using --cuda-path (else the linker fails) which sounds relatively strange to me:

clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart -L /usr/local/cuda-11.5/lib64 hello.c -o hello

First is the overhead it adds to compiler driver. While for most users running locally it's negligible, it would be noticed for the users who may have /usr/local mounted over NFS, which is not that unusual in institutional environments. Another thing to consider that it will be noticed by *all* such users, even if they don't use CUDA. E.g. all C++ compilations will be probing for those directories.

Are you saying that clang executes this portion of the code even when it knows that it's not compiling *.cu files? Why does it need to know anything at all about directories related to CUDA? (I'm curious, but there's no real need to answer, it's outside the scope of this ticket ;)

For this patch, I propose to drop the 9.x and 8.x so we keep the number of probed paths under control.

Do you want me to upload another patch that only keeps versions 10.0 up to 11.5 on the list, and then open a new ticket in the tracker describing what the rework should be, and potentially start working towards that goal as a separate patch?

carlosgalvezp added inline comments.Wed, Nov 24, 1:51 PM
clang/lib/Driver/ToolChains/Cuda.cpp
131

It's mostly for the datacenter use.

I'm not sure about that. We have recently switched to CUDA 11.4 with old 460 drivers on regular desktop machines (CUDA 11.4 comes with 470 drivers) and everything works just fine (not using CUDA 11.4 features, naturally)

The docs specify the minimum version required:
CUDA 11.x >= 450.80.02*

I think the "Enterprise" thing you are referring to only applies to the "3. Forward compatibility" section, not the "2. Minor version compatibility" section.

I had to explicitly provide the -L flag

Yes, this is documented

tra added inline comments.Wed, Nov 24, 5:29 PM
clang/lib/Driver/ToolChains/Cuda.cpp
131

I'm not saying that cuda_compat is exclusively for the datacenter use, just that the datacenter is the primary audience.

Regular users are not as constrained in updating the drivers to a more recent version and are much more likely to use the drivers that are *not* supported by cuda_compat.
The end result is that in practice "upgrade your drivers" is much more likely to work for most of the end users than "use cuda_compat" -- less moving parts this way and the result is more predictable.

The docs specify the minimum version required: CUDA 11.x >= 450.80.02*

It's complicated. See compatibility between CUDA versions and various driver versions:
https://docs.nvidia.com/deploy/cuda-compatibility/#use-the-right-compat-package

It also has the downside of lack of interoperability with OpenGL, which may be needed by some users.

Anyways, we've got way off topic here.

mojca updated this revision to Diff 389678.Thu, Nov 25, 12:37 AM
carlosgalvezp accepted this revision.Thu, Nov 25, 1:13 AM

LGTM but I think @tra should have the final word.

clang/lib/Driver/ToolChains/Cuda.cpp
131

we've got way off topic here.

Agreed, apologies for that and thanks for the discussion!

mojca added a comment.Thu, Nov 25, 8:15 AM

I opened https://reviews.llvm.org/D114601. I wasn't sure if that's something that should have been combined with this ticket or not because it can be merged or rejected independently.

I also opened a ticket for Microsoft STL on https://github.com/microsoft/STL/issues/2359.

mojca added a comment.EditedThu, Nov 25, 9:05 AM

Somewhat off-topic from a discussion earlier in the thread.
What's the purpose of the following code then if users are supposed to explicitly specify the -L flag anyway?

if (HostTriple.isArch64Bit() && FS.exists(InstallPath + "/lib64"))
  LibPath = InstallPath + "/lib64";
else if (FS.exists(InstallPath + "/lib"))
  LibPath = InstallPath + "/lib";
else
  continue;

The code above may be improved for Windows (it needs a different path there), for example like this:

if (HostTriple.isOSWindows() && (HostTriple.getArch() == llvm::Triple::ArchType::x86_64) && FS.exists(InstallPath + "/lib/x64"))
  LibPath = InstallPath + "/lib/x64";
else if (HostTriple.isArch64Bit() && FS.exists(InstallPath + "/lib64"))
  LibPath = InstallPath + "/lib64";
else if (FS.exists(InstallPath + "/lib"))
  LibPath = InstallPath + "/lib";
else
  continue;

but I fail to figure out how to make use of this variable.

CMake might be able to add the right flags automatically (it doesn't currently support Clang with CUDA on Windows), but writing the following down "manually" is relatively annoying:

clang++.exe hello.cu -o hello --cuda-path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4" -l cudart -L "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4/lib/x64"

Also, it's probably destined to lead into issues if the user needs to manually specify the library path, while the cuda path is determined automatically.

carlosgalvezp resigned from this revision.Fri, Nov 26, 1:16 AM

I just learned that by approving this patch I make it not visible for other reviewers that they should still review, so I'll remove my vote. LGTM though.

This revision now requires review to proceed.Fri, Nov 26, 1:16 AM
mojca added a comment.Mon, Nov 29, 2:16 AM

@tra: what should be done about the unit test that checks whether CUDA SDK 8.0 specifically has been found?
Apparently you are taking care of the buildbot configuration performing those tests.

And how should we proceed in general, what can I do next? Thanks.

tra added a comment.Mon, Nov 29, 10:40 AM

Somewhat off-topic from a discussion earlier in the thread.
What's the purpose of the following code then if users are supposed to explicitly specify the -L flag anyway?

Good point, it is indeed unused.

clang++.exe hello.cu -o hello --cuda-path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4" -l cudart -L "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.4/lib/x64"

Also, it's probably destined to lead into issues if the user needs to manually specify the library path, while the cuda path is determined automatically.

Yup. That's why I'm a strong proponent of not autosearching beyond the canonical location and ask users to explicitly specify the CUDA SDK they want to use.
In case of the canonical location on windows, users can then use -L "$CUDA_PATH%/lib/x64" everywhere.

As for automatically adding linker flage, it's been considered before and the conclusion was that it's not very useful.
Unlike nvcc, clang's linking phase has no idea about CUDA and we can not assume that any of the objects or libraries we're linking with have anything to do with CUDA.
It is useful in a niche case when one compiles source-to-executable in one clang invocation and where we do know that we're dealing with CUDA, but this is not how clang is used most of the time, when compilation and linking are done separately.

So, yes, you could automatically add linking flags in your example, but it's not feasible to do so for the linking phase in general. It would also introduce inconsistency in clang's linking phase behavior and would lead to questions -- why things work with clang hello.co, but not with clang -c hello.cu; clang hello.o ?

tra accepted this revision.Mon, Nov 29, 10:40 AM
This revision is now accepted and ready to land.Mon, Nov 29, 10:40 AM

@tra: this is not yet 100% ready since the unit tests are now failing (expecting to find CUDA 8.0).
I can fix the unit test, but I suppose that someone needs to install additional SDK somewhere into the infrastructure as well?

So, yes, you could automatically add linking flags in your example, but it's not feasible to do so for the linking phase in general. It would also introduce inconsistency in clang's linking phase behavior and would lead to questions -- why things work with clang hello.co, but not with clang -c hello.cu; clang hello.o ?

Please note that I'm a complete newbie to compilers.
Te flag --cuda-path= slightly reminds me of the --framework keyword on macOS. I'm not behind a mac right now, so please forgive me any typo or wrong paths (it's also not exact since the frameworks are searched for in different folders etc.), below is just a general idea.

# this one effectively adds -I/System/Library/Frameworks/OpenGL.framework/Headers
clang++ -c hello.cpp --framework OpenGL

# this one effectively adds -L/System/Library/Frameworks/OpenGL.framework/lib as well as the library itself (-l<name>)
clang++ hello.o --framework OpenGL

# this one adds both include (-I) and linker directory (-L) flags
clang++ hello.cpp --framework OpenGL

What would be cool to me was if --cuda-path was also implicitly adding the -L flag, so that the following would work:

clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart hello.cu
clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart hello.o

I don't know whether it would be acceptable to add (the default) cuda library path to the linker though.

tra added a comment.Mon, Nov 29, 1:14 PM

@tra: this is not yet 100% ready since the unit tests are now failing (expecting to find CUDA 8.0).
I can fix the unit test, but I suppose that someone needs to install additional SDK somewhere into the infrastructure as well?

Clang tests run w/o CUDA SDK itself. We have some mock installations under Inputs in the test directory. You may need to adjust both the inputs and the tests to work with your changes.

So, yes, you could automatically add linking flags in your example, but it's not feasible to do so for the linking phase in general. It would also introduce inconsistency in clang's linking phase behavior and would lead to questions -- why things work with clang hello.co, but not with clang -c hello.cu; clang hello.o ?

Please note that I'm a complete newbie to compilers.
The flag --cuda-path= slightly reminds me of the --framework keyword on macOS. I'm not behind a mac right now, so please forgive me any typo or wrong paths (it's also not exact since the frameworks are searched for in different folders etc.), below is just a general idea.

Mac is... special in more than one way. "We do it on Mac" alone is not a very strong argument for clang's behavior everywhere else.

My general rule of thumb is that a flag should serve specific purpose and should not duplicate the work of existing flags.
The way I see it is that CUDA libraries are not any different than any other libraries an app must be linked with and we already have -L to tell the linker where to find them.

What would be cool to me was if --cuda-path was also implicitly adding the -L flag, so that the following would work:

clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart hello.cu
clang++ --cuda-path=/usr/local/cuda-11.5 -l cudart hello.o

I don't know whether it would be acceptable to add (the default) cuda library path to the linker though.

TBH, I don't see much of a benefit over clang++ -L/usr/local/cuda-11.5/lib64 -l cudart hello.o, and I do see a few downsides in form of implicit -L interfering with explicit -L. E.g. will a user need to consider where on the command line they place --cuda-path relative to -L/-l now? Where (or whether?) will the linker path be added if clang is run w/o --cuda-path? How many users will be surprised by magically materialized linking paths (that they were not aware of) interfering with explicitly specified -L? E.g. user runs clang -L/non/default/cuda/lib64 and the implicitly added -L picks the library from the auto-found CUDA location, instead of the one specified by the user. Etc.

tra requested changes to this revision.Mon, Nov 29, 1:21 PM

With D114601, this patch would no longer be needed.

This revision now requires changes to proceed.Mon, Nov 29, 1:21 PM