This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Detect installation in PATH
ClosedPublic

Authored by Hahnfeld on Jan 29 2018, 6:15 AM.

Details

Summary

If the CUDA toolkit is not installed to its default locations
in /usr/local/cuda, the user is forced to specify --cuda-path.
This is tedious and the driver can be smarter if well-known tools
(like ptxas) can already be found in the PATH environment variable.

Add option --cuda-path-ignore-env if the user wants to ignore
set environment variables. Also use it in the tests to make sure
the driver always finds the same CUDA installation, regardless
of the user's environment.

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Jan 29 2018, 6:15 AM
tra accepted this revision.Jan 29 2018, 12:00 PM

Some linux distributions integrate CUDA into the standard directory structure. I.e. binaries go into /usr/bin, headers into /usr/include, bitcode goes somewhere else, etc. ptxas will be found, but we would still fail to detect CUDA. I'd add one more test case to make sure that's still the case.

Otherwise - LGTM.

This revision is now accepted and ready to land.Jan 29 2018, 12:00 PM

Can we document this behavior in https://llvm.org/docs/CompileCudaWithLLVM.html (in the LLVM repo)? Totally fine if you want to do this in a separate patch.

In D42642#990976, @tra wrote:

Some linux distributions integrate CUDA into the standard directory structure. I.e. binaries go into /usr/bin, headers into /usr/include, bitcode goes somewhere else, etc. ptxas will be found, but we would still fail to detect CUDA. I'd add one more test case to make sure that's still the case.

I'm not sure how this can work, we only require bin/ and include/ to exist, and nvvm/libdevice/ if -nocudalib isn't specified. I agree this can be a problem because the defaults might detect an invalid installation...

Can we document this behavior in https://llvm.org/docs/CompileCudaWithLLVM.html (in the LLVM repo)? Totally fine if you want to do this in a separate patch.

Sure, will do!

tra added a comment.Jan 29 2018, 1:44 PM
In D42642#990976, @tra wrote:

Some linux distributions integrate CUDA into the standard directory structure. I.e. binaries go into /usr/bin, headers into /usr/include, bitcode goes somewhere else, etc. ptxas will be found, but we would still fail to detect CUDA. I'd add one more test case to make sure that's still the case.

I'm not sure how this can work, we only require bin/ and include/ to exist, and nvvm/libdevice/ if -nocudalib isn't specified. I agree this can be a problem because the defaults might detect an invalid installation...

Good point. Perhaps we want to be more strict about CUDA installation checking if we've found it indirectly via PATH (as opposed to explicit --cuda-path or a known common install path).
Should we always check for nvvm/libdevice directory? It's unlikely to be under /usr or /usr/local and it will be always present in a CUDA installation of all currently supported CUDA versions.

Hahnfeld updated this revision to Diff 131963.Jan 30 2018, 6:46 AM

Check for libdevice in candidates from PATH.

In D42642#991154, @tra wrote:
In D42642#990976, @tra wrote:

Some linux distributions integrate CUDA into the standard directory structure. I.e. binaries go into /usr/bin, headers into /usr/include, bitcode goes somewhere else, etc. ptxas will be found, but we would still fail to detect CUDA. I'd add one more test case to make sure that's still the case.

I'm not sure how this can work, we only require bin/ and include/ to exist, and nvvm/libdevice/ if -nocudalib isn't specified. I agree this can be a problem because the defaults might detect an invalid installation...

Good point. Perhaps we want to be more strict about CUDA installation checking if we've found it indirectly via PATH (as opposed to explicit --cuda-path or a known common install path).
Should we always check for nvvm/libdevice directory? It's unlikely to be under /usr or /usr/local and it will be always present in a CUDA installation of all currently supported CUDA versions.

That sounds like a sane check. I updated the implementation and added corresponding checks.

tra added inline comments.Jan 30 2018, 10:42 AM
lib/Driver/ToolChains/Cuda.cpp
206 ↗(On Diff #131963)

I think this may be too strict.

Checking directory structure for the purposes of detecting CUDA SDK should work well enough to weed out false detection for 'split' CUDA installation and we've verified libdevice directory presence above.

Checking for libdevice bitcode is somewhat orthogonal to this. IMO, regardless of how we've found the installation directory, whether we have suitable libdevice version there should not matter if user explicitly passed -nocudalib. Insisting on libdevice presence in this situation would be somewhat surprising.

Hahnfeld added inline comments.Jan 30 2018, 10:48 AM
lib/Driver/ToolChains/Cuda.cpp
206 ↗(On Diff #131963)

So you are suggesting to revert the change to this line, right?

tra added a comment.Jan 30 2018, 11:06 AM

I've thought a bit more about this and there's another quirk -- symlinks.

What if we've found /usr/bin/ptxas and is a symlink pointing to the real ptxas in the CUDA installation? If we add /usr to the list of candidates it will not help us at all. We should probably find the real path and add another candidate path derived from it.

lib/Driver/ToolChains/Cuda.cpp
206 ↗(On Diff #131963)

Yes, if you agree with my reasoning.

In D42642#992137, @tra wrote:

I've thought a bit more about this and there's another quirk -- symlinks.

What if we've found /usr/bin/ptxas and is a symlink pointing to the real ptxas in the CUDA installation? If we add /usr to the list of candidates it will not help us at all. We should probably find the real path and add another candidate path derived from it.

Yes, this might be a valid use case, for example linking /usr/bin/ptxas to /opt/cuda/bin/ptxas or something. I think there should be an LLVM function to get the realpath, let me check that.
(In fact I have symlinks in my environment but it doesn't matter in this case because we are just linking the complete CUDA installation to another path...)

lib/Driver/ToolChains/Cuda.cpp
206 ↗(On Diff #131963)

I generally agree that the working bitcode files are orthogonal to the directory structure. However I would argue that there won't be a real world scenario of an empty directory called nvvm/libdevice without any libdevice files in there. That said it probably doesn't make a difference, so I'll follow the separation of concerns.

Hahnfeld updated this revision to Diff 132029.Jan 30 2018, 12:25 PM

Follow symlinked ptxas executables.

tra added inline comments.Jan 30 2018, 12:56 PM
lib/Driver/ToolChains/Cuda.cpp
96–105 ↗(On Diff #132029)

Another corner case:
Debian scatters CUDA install all over the filesystem. To make it work with clang it has a 'shim' package which re-creates complete CUDA install using symlinks to its scattered bits. https://bugs.llvm.org/show_bug.cgi?id=35249. If PATH includes such a shim with a symlink pointing to location somewhere else in the filesystem, this variant of the patch will not work.

I'd add another candidate derived from the path returned by find. This should cover all reasonable scenarios I can think of.

Caveat: clang on Debian already has a special case to add this shim to the list of candidates ( D40453 ), so this patch should not affect it. Still, it's possible for the similar case to happen somewhere else where we do not have any explicit workarounds in clang.

BTW, should this heuristic apply on Windows, too? IIRC cuda installer does add CUDA's bin dir to PATH.

Hahnfeld marked 5 inline comments as done.Jan 30 2018, 1:26 PM
Hahnfeld added a subscriber: sylvestre.ledru.
Hahnfeld added inline comments.
lib/Driver/ToolChains/Cuda.cpp
96–105 ↗(On Diff #132029)

I'd rather not complicate this further. I asked on D40453 and the reply was

Debian packages don't update the PATH and we are aiming at providing packages working out of the box.

So IMO if distributions opt to fully diverge from CUDA's standard directory layout they should add special cases here to make it work.

I'm not really sure regarding Windows:

  1. From what I recall the PATH variable was somewhat dubious under Windows...
  2. Does the installer actually allow to customize the installation path? If not, how likely is it that users move the directory manually?

I can't test on Windows, so I'd generally prefer if you adjust the code and add tests in a separate patch...

tra accepted this revision.Jan 30 2018, 1:44 PM

LGTM.

lib/Driver/ToolChains/Cuda.cpp
96–105 ↗(On Diff #132029)

I don't have windows either.
OK, we can handle oddball CUDA installations and exotic platforms until someone wants it.

Hahnfeld updated this revision to Diff 132103.Jan 31 2018, 12:27 AM
Hahnfeld marked 3 inline comments as done.

Add some sysroot arguments to new test to make sure it doesn't accidentally find CUDA installations in the system.

This revision was automatically updated to reflect the committed changes.
aganea added a subscriber: aganea.Apr 25 2019, 10:22 AM

Just a quick heads-up - the cuda-detect-path.cu test fails on WSL/Ubuntu 18.04:

aganea@MTL-BJ842:/mnt/f/svn/buildWSL$ /usr/bin/python3.6 bin/llvm-lit -vv ../clang/test/Driver/cuda-detect-path.cu
llvm-lit: /mnt/f/svn/llvm/utils/lit/lit/llvm/config.py:341: note: using clang: /mnt/f/svn/buildWSL/bin/clang
-- Testing: 1 tests, single process --
FAIL: Clang :: Driver/cuda-detect-path.cu (1 of 1)
******************** TEST 'Clang :: Driver/cuda-detect-path.cu' FAILED ********************

(output chopped)

+ env PATH=/mnt/f/svn/clang/test/Driver/Inputs/CUDA-symlinks/usr/bin /mnt/f/svn/buildWSL/bin/clang -v --target=i386-unknown-linux --sysroot=/mnt/f/svn/clang/test/Driver/no-cuda-there
+ /mnt/f/svn/buildWSL/bin/FileCheck /mnt/f/svn/clang/test/Driver/cuda-detect-path.cu --check-prefix SYMLINKS
/mnt/f/svn/clang/test/Driver/cuda-detect-path.cu:82:14: error: SYMLINKS: expected string not found in input
// SYMLINKS: Found CUDA installation: {{.*}}/Inputs/CUDA-symlinks/opt/cuda
             ^
<stdin>:1:1: note: scanning from here
clang version 9.0.0 (trunk)
^
<stdin>:4:1: note: possible intended match here
InstalledDir: /mnt/f/svn/buildWSL/bin
^

aganea@MTL-BJ842:/mnt/f/svn/buildWSL$ env PATH=/mnt/f/svn/clang/test/Driver/Inputs/CUDA-symlinks/usr/bin /mnt/f/svn/buildWSL/bin/clang -v --target=i386-unknown-linux --sysroot=/mnt/f/svn/clang/test/Driver/no-cuda-there
clang version 9.0.0 (trunk)
Target: i386-unknown-linux
Thread model: posix
InstalledDir: /mnt/f/svn/buildWSL/bin
aganea@MTL-BJ842:/mnt/f/svn/buildWSL$ ls -l /mnt/f/svn/clang/test/Driver/Inputs/CUDA-symlinks/usr/bin
total 0
-rwxrwxrwx 1 aganea aganea 29 Feb  1  2018 ptxas
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2019, 10:22 AM
Hahnfeld marked an inline comment as done.Apr 26 2019, 6:52 AM

Just a quick heads-up - the cuda-detect-path.cu test fails on WSL/Ubuntu 18.04:

Hmm, maybe that's because of symlinks. By chance, can you debug print the values of real_path and parent_path? They are supposed to resolve the links which is required for the candidate search to work and the test to pass.

cfe/trunk/lib/Driver/ToolChains/Cuda.cpp
98–101

These calls.

ormris removed a subscriber: ormris.Apr 26 2019, 9:34 AM
aganea added a subscriber: rnk.EditedMay 1 2019, 1:57 PM

So it turns out this is a symlink issue. The file clang/trunk/test/Driver/Inputs/CUDA-symlinks/usr/bin/ptxas has been synchronized on my Windows 10 PC as a regular text file, not a symlink. It looks like TortoiseSVN doesn't implement symlinks. As WSL inherits of my file system, it will not find the symbolic link. I suppose REQUIRES: !system-windows isn't enough for cuda-detect-path.cu, and it would need something like REQUIRES: symlinks along with support in lit. @rnk