Page MenuHomePhabricator

[CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu
ClosedPublic

Authored by jdenny on Tue, Dec 4, 7:15 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

jdenny created this revision.Tue, Dec 4, 7:15 AM

Ah, this packaging style is a never ending source of fun... Adding isUbuntu() seems fine to solve the first issue.

For the second point I'd instead propose to fix FindCuda.cmake: It's wrong to report /usr if the shim tree is in /usr/lib/cuda. @sylvestre.ledru what's your take on this, looking from a Debian perspective?

tra requested changes to this revision.Tue, Dec 4, 10:06 AM

I'm not sure that's something that needs to be fixed in clang.

IIUIC, Debian has added a shim that pretends to be a monolithic CUDA install:
https://bugs.launchpad.net/ubuntu/+source/clang/+bug/1706326
That change seems to be in Ubuntu bionic (18.04) https://packages.ubuntu.com/en/bionic/nvidia-cuda-toolkit
With that fix in place --cuda-path=/usr/lib/cuda should work.

--cuda-path=/usr was never supposed to work -- /usr is *not* the root of the CUDA SDK.

I guess that just adding the check for isUbuntu() should make clang work on Ubuntu 18.04+.

This revision now requires changes to proceed.Tue, Dec 4, 10:06 AM

@tra, @Hahnfeld: Thanks for your replies.

In D55269#1318901, @tra wrote:

I'm not sure that's something that needs to be fixed in clang.

IIUIC, Debian has added a shim that pretends to be a monolithic CUDA install:
https://bugs.launchpad.net/ubuntu/+source/clang/+bug/1706326
That change seems to be in Ubuntu bionic (18.04) https://packages.ubuntu.com/en/bionic/nvidia-cuda-toolkit

apt confirms that's what I have: nvidia-cuda-toolkit 9.1.85-3ubuntu1

With that fix in place --cuda-path=/usr/lib/cuda should work.

Seems to. To be clear, I'm trying to address the use case where cmake/clang finds the cuda installation automatically.

--cuda-path=/usr was never supposed to work -- /usr is *not* the root of the CUDA SDK.

/usr/lib/cuda/bin/nvcc doesn't exist, so that's probably why FindCUDA.cmake finds /usr/bin/nvcc (also installed by nvidia-cuda-toolkit). Is it fair then to say that /usr/lib/cuda isn't the root either?

I guess that just adding the check for isUbuntu() should make clang work on Ubuntu 18.04+.

It fixes the first issue I reported. It does not fix the second.

It seems that nvidia-cuda-toolkit still isn't installing a complete CUDA install in one location. Even if it installed it all to /usr/lib/cuda, FindCUDA.cmake would probably still see /usr/bin/nvcc and assume /usr is the CUDA install root.

What's the path forward?

By the way, nvidia-cuda-toolkit does install the following, but there's no nvvm directory as clang currently expects:

/usr/lib/nvidia-cuda-toolkit
/usr/lib/nvidia-cuda-toolkit/bin
/usr/lib/nvidia-cuda-toolkit/bin/cicc
/usr/lib/nvidia-cuda-toolkit/bin/crt
/usr/lib/nvidia-cuda-toolkit/bin/crt/link.stub
/usr/lib/nvidia-cuda-toolkit/bin/crt/prelink.stub
/usr/lib/nvidia-cuda-toolkit/bin/g++
/usr/lib/nvidia-cuda-toolkit/bin/gcc
/usr/lib/nvidia-cuda-toolkit/bin/nvcc
/usr/lib/nvidia-cuda-toolkit/bin/nvcc.profile
/usr/lib/nvidia-cuda-toolkit/libdevice
/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc
tra added a comment.Tue, Dec 4, 2:15 PM

It appears that what you're trying to do is to add "/usr/lib/cuda" on Ubuntu and Debian when --cuda-path=/usr is specified.
This is a rather odd thing to do. In the end only one of those paths will be in effect and that's the path that should be specified via --cuda-path. The fact that you want to add /usr/cuda/lib in this case suggests that /usr is the wrong path and /usr/lib/cuda is the correct one. It sounds like you need to change your build system and tell clang the correct path.

I do not think that changing clang to work around an issue in cmake files of one project is something we want to do.

There is also an easy workaround. Just download CUDA SDK, install it into a local directory and point your build system there. This will work on any linux distribution.

jdenny added a comment.Tue, Dec 4, 2:47 PM
In D55269#1319252, @tra wrote:

It appears that what you're trying to do is to add "/usr/lib/cuda" on Ubuntu and Debian when --cuda-path=/usr is specified.
This is a rather odd thing to do.

My real goal is to get clang and openmp working out of the box on Ubuntu. Treating --cuda-path=/usr as a special case was just a way to get there. It's odd apparently because nvidia-cuda-toolkit is odd.

In the end only one of those paths will be in effect and that's the path that should be specified via --cuda-path. The fact that you want to add /usr/cuda/lib in this case suggests that /usr is the wrong path and /usr/lib/cuda is the correct one. It sounds like you need to change your build system and tell clang the correct path.

I do not think that changing clang to work around an issue in cmake files of one project is something we want to do.

I don't have a separate project using cmake. The cmake files I'm referring to are clang's. I'm just trying to make it easy to build clang and openmp and call clang on the command line under Ubuntu.

There is also an easy workaround. Just download CUDA SDK, install it into a local directory and point your build system there. This will work on any linux distribution.

An easier workaround is to specify CUDA_TOOLKIT_ROOT_DIR when building llvm, but my goal is make building on Ubuntu work without special configuration. D40453 was my hint that people already agreed that's a worthwhile pursuit.

I'm not adamant that handling --cuda-path=/usr is the right solution. But just adding IsUbuntu() is not a full solution, so I'm looking for advice on how to proceed.

Thanks for your responses.

jdenny added a comment.Tue, Dec 4, 2:52 PM

I do not think that changing clang to work around an issue in cmake files of one project is something we want to do.

I don't have a separate project using cmake. The cmake files I'm referring to are clang's.

... and opemp's. Is that the one project you meant?

tra added a comment.EditedTue, Dec 4, 3:27 PM

My real goal is to get clang and openmp working out of the box on Ubuntu. Treating --cuda-path=/usr as a special case was just a way to get there. It's odd apparently because nvidia-cuda-toolkit is odd.

It does not address the issue in principle -- if we add tweaks do deal with Ubuntu oddities, why shouldn't we also add tweaks for any other linux distro?
If you want to solve the issue in general you need to be able to tell where to find each component of CUDA SDK needed by clang. Once you have that flexibility, it will open a lot of possibilities to specify one of the options incorrectly and fail in some new and exciting way that nobody has seen before. I don't think the added complexity is worth the effort.

I personally prefer to keep things as simple as possible. Clang expects to see a monolithic CUDA SDK installation, specified, if necessary, with one option.
Dealing with distro-specific way SUDA SDK may be split there is up to the distro maintainers. If they want Clang compilation work with their variant of CUDA installation, they can always create a shim the way Debian did.

As for cmake, it currently does not support clang as the CUDA compiler. Whatever detects CUDA install path apparently does not do it correctly on Ubuntu (it's the same problem of having CUDA SDK bits all over the place) and that's something that needs to be fixed there.

I don't have a separate project using cmake. The cmake files I'm referring to are clang's. I'm just trying to make it easy to build clang and openmp and call clang on the command line under Ubuntu.
An easier workaround is to specify CUDA_TOOLKIT_ROOT_DIR when building llvm,

That would work, too.

but my goal is make building on Ubuntu work without special configuration. D40453 was my hint that people already agreed that's a worthwhile pursuit.

And I believe that we did convince Debian that it's up to them to arrange their packages in a way that works with clang.
Granted, closer examination of the shim they've added shows that the shim is incomplete, but it's the right way to solve this problem, IMO.

I'm not adamant that handling --cuda-path=/usr is the right solution. But just adding IsUbuntu() is not a full solution, so I'm looking for advice on how to proceed.

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

jdenny added a comment.Tue, Dec 4, 4:03 PM
In D55269#1319382, @tra wrote:

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

I'm fairly certain I would see no remaining failure then, but it's not clear to me how we should convince OpenMP's cmake files to stop doing that.

openmp/libomptarget/cmake/Modules/LibomptargetNVPTXBitcodeLibrary.cmake is the file, and there's one occurrence of --cuda-path.

If CUDA_TOOLKIT_ROOT_DIR is specified explicitly to cmake, that --cuda-path is necessary, so we cannot just remove it.

If CUDA_TOOLKIT_ROOT_DIR is not specified explicitly, it is computed by cmake. So I believe this boils down to getting cmake to find the right CUDA root on Debian/Ubuntu.

Do you agree so far?

Would you recommend submitting a patch to cmake's CUDA support? Or would you recommend replicating clang's logic from D40453 into openmp's cmake files, overriding cmake's own selection of CUDA_TOOLKIT_ROOT_DIR?

tra added a comment.Tue, Dec 4, 4:21 PM
In D55269#1319382, @tra wrote:

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

I'm fairly certain I would see no remaining failure then, but it's not clear to me how we should convince OpenMP's cmake files to stop doing that.

openmp/libomptarget/cmake/Modules/LibomptargetNVPTXBitcodeLibrary.cmake is the file, and there's one occurrence of --cuda-path.

If CUDA_TOOLKIT_ROOT_DIR is specified explicitly to cmake, that --cuda-path is necessary, so we cannot just remove it.

If CUDA_TOOLKIT_ROOT_DIR is not specified explicitly, it is computed by cmake. So I believe this boils down to getting cmake to find the right CUDA root on Debian/Ubuntu.

Do you agree so far?

I guess we're dealing with even more issues here.
cmake currently only knows about CUDA compilation with nvcc. The autodetection of CUDA_TOOLKIT_ROOT_DIR is intended to work for nvcc only and it does serve that purpose. Debian/Ubuntu packaging was also initially done with only nvcc in mind and it does work for nvcc. OpenMP's cmake files rely on CUDA_TOOLKIT_ROOT_DIR to find CUDA SDK and that is what fails for clang in case of split CUDA SDK installation.

One way to deal with that would be to change OpenMP's cmake files to check whether we're using Clang to compile CUDA and, if that's the case override CUDA root detection. Maybe ensure that CUDA install is monolithic and require user to specify a different location. Or check if we've detected a split path in /usr and override it with /usr/lib/cuda.

Would you recommend submitting a patch to cmake's CUDA support? Or would you recommend replicating clang's logic from D40453 into openmp's cmake files, overriding cmake's own selection of CUDA_TOOLKIT_ROOT_DIR?

That may be a bigger can of works than a single patch can solve. The good news is that the maintainer of CUDA support in cmake is working on adding clang support. So, things may improve soon.

jdenny added a comment.Tue, Dec 4, 4:56 PM
In D55269#1319452, @tra wrote:

That may be a bigger can of works than a single patch can solve. The good news is that the maintainer of CUDA support in cmake is working on adding clang support. So, things may improve soon.

To summarize all this as I understand it: It sounds like anything we do now for the cmake issue would be a short-term solution because cmake might later present a better solution. Originally, I figured the easiest place to work around this issue was clang as that would help anyone who detects a CUDA installation using current cmake support. You prefer to do it in OpenMP's cmake files, which I suppose would help avoid any unexpected consequences from my change to clang. A third alternative is to just wait for cmake to improve.

For now, I'll trim this patch to just the IsUbuntu() part (maybe tomorrow), and we can think more about the cmake side of things.

I think there are some misunderstandings here, or at least I understand things differently than @tra is describing them: AFAICS this change is NOT about replacing nvcc by clang in any CMake project (be that the OpenMP runtime or any other project outside of LLVM). The scope of the second issue is that there is a CMake project (namely openmp) which uses FindCuda.cmake to detect the CUDA installation and passes the resulting path to --cuda-path. I wrote that detection and I still think it's correct because that is a way to force Clang to use a given CUDA install location. If FindCuda.cmake returns a wrong path (/usr instead of /usr/lib/cuda) or the shim package behind /usr/lib/cuda is incomplete, then this needs to be fixed and not workarounded in Clang.

That said, I think it's the right way to add isUbuntu() to the candidate detection that we already have for Debian. Simply because the compiler should just get most of the simple cases right automatically.

[...]

In D55269#1318901, @tra wrote:

--cuda-path=/usr was never supposed to work -- /usr is *not* the root of the CUDA SDK.

/usr/lib/cuda/bin/nvcc doesn't exist, so that's probably why FindCUDA.cmake finds /usr/bin/nvcc (also installed by nvidia-cuda-toolkit). Is it fair then to say that /usr/lib/cuda isn't the root either?

[...]

It seems that nvidia-cuda-toolkit still isn't installing a complete CUDA install in one location. Even if it installed it all to /usr/lib/cuda, FindCUDA.cmake would probably still see /usr/bin/nvcc and assume /usr is the CUDA install root.

I think this needs to be fixed then: The shim package should provide links to all necessary things and CMake must be prepared to deal with it. IMO we shouldn't workaround broken build system detection in the compiler.

By the way, nvidia-cuda-toolkit does install the following, but there's no nvvm directory as clang currently expects:

Then again the distro's packaging is broken and needs to be adjusted.

In D55269#1319382, @tra wrote:

And I believe that we did convince Debian that it's up to them to arrange their packages in a way that works with clang.
Granted, closer examination of the shim they've added shows that the shim is incomplete, but it's the right way to solve this problem, IMO.

I fully agree here: Adding a single candidate path for a specific distro seems worth it, and we are already doing it.

In D55269#1319382, @tra wrote:

I'm not adamant that handling --cuda-path=/usr is the right solution. But just adding IsUbuntu() is not a full solution, so I'm looking for advice on how to proceed.

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

I disagree here: It's not OpenMP but CMake that's detecting the wrong path here. This is the place to fix it once and for all (and possibly get the shim package right in that process).

tra added a comment.Wed, Dec 5, 9:44 AM

I think there are some misunderstandings here, or at least I understand things differently than @tra is describing them: AFAICS this change is NOT about replacing nvcc by clang in any CMake project (be that the OpenMP runtime or any other project outside of LLVM).

I was not talking about replacing nvcc with clang, but rather just pointed that CUDA support in cmake is geared towards nvcc only and may not always give the right answer for clang.

The scope of the second issue is that there is a CMake project (namely openmp) which uses FindCuda.cmake to detect the CUDA installation and passes the resulting path to --cuda-path. I wrote that detection and I still think it's correct because that is a way to force Clang to use a given CUDA install location. If FindCuda.cmake returns a wrong path (/usr instead of /usr/lib/cuda) or the shim package behind /usr/lib/cuda is incomplete, then this needs to be fixed and not workarounded in Clang.

We are in agreement here. I see no problem with explicitly specifying CUDA path with --cuda-path and indeed, do think that CUDA detection in Cmake currently provides the path that does not work for clang. It's not exactly invalid as it is a valid path if one were to use nvcc, but that relies on some additional settings debian/ubuntu sets in the environment that only work for nvcc.

That said, I think it's the right way to add isUbuntu() to the candidate detection that we already have for Debian. Simply because the compiler should just get most of the simple cases right automatically.

That part I'm also fine with as Ubuntu uses exactly the same shim Debian has implemented and it makes sense for Clang to benefit from it on Ubuntu.

[...]

In D55269#1318901, @tra wrote:

--cuda-path=/usr was never supposed to work -- /usr is *not* the root of the CUDA SDK.

/usr/lib/cuda/bin/nvcc doesn't exist, so that's probably why FindCUDA.cmake finds /usr/bin/nvcc (also installed by nvidia-cuda-toolkit). Is it fair then to say that /usr/lib/cuda isn't the root either?

[...]

It seems that nvidia-cuda-toolkit still isn't installing a complete CUDA install in one location. Even if it installed it all to /usr/lib/cuda, FindCUDA.cmake would probably still see /usr/bin/nvcc and assume /usr is the CUDA install root.

I think this needs to be fixed then: The shim package should provide links to all necessary things and CMake must be prepared to deal with it. IMO we shouldn't workaround broken build system detection in the compiler.

That's exactly what I proposed to Debian folks https://bugs.llvm.org/show_bug.cgi?id=26966#c6 and I was under impression that that's what they did. It appears that they only created an empty directory structure with version.txt in it. At least that's what I see when I unpack nvidia-cuda-toolkit_9.1.85-3ubuntu1_amd64.deb. Perhaps they do something extra in the install script, but I didn't find anything obvious in the deb itself.

By the way, nvidia-cuda-toolkit does install the following, but there's no nvvm directory as clang currently expects:

Then again the distro's packaging is broken and needs to be adjusted.

Perhaps we can build a shim package ourselves and distribute it along with the clang. This would decouple usability of clang from the Ubuntu/Debian release process and would make it possible to make clang work with CUDA on older debian/Ubuntu versions.

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

I disagree here: It's not OpenMP but CMake that's detecting the wrong path here. This is the place to fix it once and for all (and possibly get the shim package right in that process).

It's somewhat orthogonal, IMO. I agree that it's not OpenMP's problem. Cmake will fix it at some point in future, but, presumably, we want OpenMP buildable *now*. Adding a temporary workaround for something that Cmake can't handle now would make building OpenMP with clang somewhat easier which was the end goal of this patch. In the end it's up to OpenMP maintainers what they would want to do.

jdenny updated this revision to Diff 176916.Wed, Dec 5, 7:11 PM
jdenny retitled this revision from [CUDA][OpenMP] Fix nvidia-cuda-toolkit detection on Debian/Ubuntu to [CUDA] Fix nvidia-cuda-toolkit detection on Ubuntu.
jdenny edited the summary of this revision. (Show Details)

Apply reviewer suggestion.

tra accepted this revision.Thu, Dec 6, 9:33 AM

LGTM.

This revision is now accepted and ready to land.Thu, Dec 6, 9:33 AM
This revision was automatically updated to reflect the committed changes.
In D55269#1320382, @tra wrote:

Let's start with fixing OpenMP's cmake files. Once it no longer insists on specifying --cuda-path=/usr, and isUbuntu is in place, what is the remaining failure that you see?

I disagree here: It's not OpenMP but CMake that's detecting the wrong path here. This is the place to fix it once and for all (and possibly get the shim package right in that process).

It's somewhat orthogonal, IMO. I agree that it's not OpenMP's problem. Cmake will fix it at some point in future, but, presumably, we want OpenMP buildable *now*. Adding a temporary workaround for something that Cmake can't handle now would make building OpenMP with clang somewhat easier which was the end goal of this patch. In the end it's up to OpenMP maintainers what they would want to do.

I don't know when I'll explore the cmake problem further. If someone else decides to, please cc/subscribe me.

Thanks for the reviews.