
[CMake][OpenMP] Customize default offloading arch
Closed, Public

Authored by Hahnfeld on Oct 13 2017, 8:07 AM.

Details

Summary

For the shuffle instructions in reductions we need at least sm_30
but the user may want to customize the default architecture.
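As a sketch of the intended usage (the exact CMake variable name below is an assumption based on the patch title, not confirmed by this summary), the default offloading architecture could be overridden at configure time, or per invocation on the clang command line:

```shell
# Hypothetical configure-time override of the default OpenMP offloading
# architecture (variable name assumed for illustration):
cmake -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 ../llvm

# Per-invocation equivalent on the clang command line:
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -Xopenmp-target -march=sm_60 test.c
```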

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Oct 13 2017, 8:07 AM
tra added a subscriber: tra.Oct 13 2017, 9:29 AM
tra added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

I'd keep this code. It appears to serve a useful purpose: it requires the CUDA installation to have at least some libdevice library in it. This gives us a chance to find a valid installation, instead of failing some time later when we ask for a libdevice file and there are none.

556 ↗(On Diff #118912)

This sets the default GPU arch for *everyone* based on OpenMP requirements. Perhaps this should be predicated on the compilation being an OpenMP compilation.

IMO, to avoid unnecessary surprises, the default for CUDA compilation should follow the defaults of nvcc; sm_30 becomes the default only in CUDA 9.

Hahnfeld marked an inline comment as done.Oct 13 2017, 10:15 AM
Hahnfeld added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

We had some internal discussions about this after I submitted the patch here.

The main question is: Do we want to support CUDA installations without libdevice, and are there use cases for that? I'd say the user should be able to use a toolchain without libdevice together with -nocudalib.
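A sketch of that use case (the installation path here is hypothetical): point clang at a CUDA toolchain that lacks libdevice and explicitly opt out of linking it.

```shell
# --cuda-path selects the CUDA installation; -nocudalib tells clang not to
# look for (or link against) the libdevice bitcode library.
clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      --cuda-path=/opt/cuda-minimal -nocudalib test.c
```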

540 ↗(On Diff #118912)

This check guards the whole block.

556 ↗(On Diff #118912)

This branch is only executed for OpenMP; see above.

tra added inline comments.Oct 13 2017, 10:52 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Sounds reasonable. How about keeping the code but putting it under if (!hasArg(nocudalib))?

556 ↗(On Diff #118912)

OK. I've missed that 'if'.

Hahnfeld marked 4 inline comments as done.Oct 13 2017, 10:58 AM
Hahnfeld added inline comments.
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Ok, I'll do that in a separate patch and keep the code here for now.

gtbercea added inline comments.Oct 13 2017, 11:02 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

The problem with -nocudalib is this: suppose you write a test that verifies some device-facing feature requiring libdevice to be found (so you don't want to use -nocudalib). It will probably work on your machine, which has a correct CUDA setup, but fail on another machine that does not (which is where you would want to use -nocudalib). You can see the contradiction there.

gtbercea added inline comments.Oct 13 2017, 11:04 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Just to be clear I am arguing for keeping this code :)

gtbercea added inline comments.Oct 13 2017, 11:05 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

I would also like to keep the spirit of this code if not in this exact form at least something that performs the same functionality.

tra added inline comments.Oct 13 2017, 11:11 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

@gtbercea: I'm not sure I follow your example. If you're talking about clang tests, we have a fake CUDA installation set up under test/Driver/Inputs, which removes the dependency on whatever CUDA you may or may not have installed on your machine. I also don't see a contradiction: if you do need libdevice, there is no point in picking a broken CUDA installation that does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that makes a CUDA installation without libdevice acceptable. And with --cuda-path you do have a way to tell clang which installation to use. What am I missing?

tra added inline comments.Oct 13 2017, 11:13 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Ah, you were arguing against @Hahnfeld's -nocudalib example. Then I guess we're in violent agreement.

gtbercea added inline comments.Oct 13 2017, 11:16 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

I fully agree with this: "if you do need libdevice, there is no point in picking a broken CUDA installation that does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that makes a CUDA installation without libdevice acceptable."

I was trying to give an example of a situation where your code compiles on one machine without -nocudalib, and then the same code errors on another machine that requires the -nocudalib flag to make up for the absence of libdevice.

gtbercea added inline comments.Oct 13 2017, 11:17 AM
lib/Driver/ToolChains/Cuda.cpp
170–182 ↗(On Diff #118912)

Yes, it was a counter-argument to that! :)

tra added a reviewer: tra.Oct 13 2017, 11:19 AM
tra removed a subscriber: tra.
gtbercea added inline comments.Oct 13 2017, 11:19 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

@tra, what's your opinion on this code? Should it stay, stay but be modified to be more robust, or be taken out completely?

tra added inline comments.Oct 13 2017, 11:25 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

There are currently no users of this. In general, I would rather not have a magically-changing default GPU arch based on how broken your CUDA installation is. IMO it would be better to keep the defaults static and fail if the prerequisites are not met.

gtbercea added inline comments.Oct 13 2017, 11:39 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

I would have thought that it is up to the compiler to select, as the default, the lowest viable compute capability. That is what this code aims to do (whether it actually does is a separate issue :) ).

gtbercea added inline comments.Oct 13 2017, 11:41 AM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

The reason I added this code in the first place was to overcome the fact that a default such as sm_30 may work on the K40, but once you move to newer Pascal or Volta GPUs, you need a higher minimum supported compute capability.

Hahnfeld updated this revision to Diff 118961.Oct 13 2017, 12:55 PM
Hahnfeld marked an inline comment as done.
Hahnfeld edited the summary of this revision. (Show Details)

Check that the user didn't specify a value lower than sm_30, and re-add some code as discussed.

tra added inline comments.Oct 13 2017, 1:13 PM
lib/Driver/ToolChains/Cuda.h
90 ↗(On Diff #118912)

Should this stay, stay but modified to be more robust or taken out completely?

I'd take it out, at least for now, as you've removed the only user of that function.

In general, though, compilers tend to use conservative defaults, and for CUDA that would be the lowest GPU variant supported by the compiler. In CUDA's case that is determined by the CUDA SDK version. Figuring out the lowest supported version via the libdevice mapping we've created is wrong: e.g. with this patch and -nocudalib, you may end up using CUDA 9 without any libdevice and would return sm_20.

If/when we need to figure out minimum supported version, it should be based directly on the value returned by version().

tra accepted this revision.Oct 16 2017, 1:48 PM
This revision is now accepted and ready to land.Oct 16 2017, 1:48 PM
This revision was automatically updated to reflect the committed changes.