The shuffle instructions used in OpenMP reductions require at least sm_30,
but the user may want to customize the default architecture.
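For context, a minimal sketch of the kind of warp-shuffle reduction the summary refers to (illustrative only, not the actual OpenMP runtime code):

```cpp
// Illustrative CUDA device code: a warp-level tree reduction built on
// shuffle instructions, which only exist on sm_30 and newer GPUs.
__device__ int warpReduceSum(int val) {
  for (int offset = warpSize / 2; offset > 0; offset /= 2)
    val += __shfl_down_sync(0xffffffffu, val, offset); // CUDA-9 spelling of __shfl_down
  return val;
}
```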
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | I'd keep this code. It appears to serve a useful purpose, as it requires the CUDA installation to have at least some libdevice library in it. It gives us a chance to find a valid installation, instead of failing some time later when we ask for a libdevice file and there is none. |
556 ↗ | (On Diff #118912) | This sets the default GPU arch for *everyone* based on OpenMP requirements. Perhaps this should be predicated on this being an OpenMP compilation (see the sketch below). IMO, to avoid unnecessary surprises, the default for CUDA compilation should follow nvcc's defaults; sm_30 becomes the default only in CUDA-9. |
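A hedged sketch of the predication suggested above; the function name and structure are illustrative, not the actual driver code:

```cpp
#include <string>

// Illustrative sketch only: predicate the sm_30 default on OpenMP
// offloading instead of changing it for every CUDA compilation.
std::string defaultGpuArch(bool IsOpenMPOffload, const std::string &UserArch) {
  if (!UserArch.empty())
    return UserArch;  // an explicit arch from the user always wins
  if (IsOpenMPOffload)
    return "sm_30";   // shuffle-based reductions require at least sm_30
  return "sm_20";     // keep nvcc's pre-CUDA-9 default for plain CUDA
}
```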
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | We had some internal discussions about this after I submitted the patch here. The main question is: do we want to support CUDA installations without libdevice, and are there use cases for that? I'd say that the user should be able to use a toolchain without libdevice together with -nocudalib (see the example after this table). |
540 ↗ | (On Diff #118912) | This check guards the whole block. |
556 ↗ | (On Diff #118912) | This branch is only executed for OpenMP, see above. |
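For illustration, such a toolchain would be driven with something like `clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nocudalib foo.cpp`, where `-nocudalib` suppresses the linking of libdevice. The flag spellings are real clang options; the file name is hypothetical.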
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | Ok, I'll do that in a separate patch and keep the code here for now. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | The problem with -nocudalib is this: if, for example, you write a test that verifies some device-facing feature requiring libdevice to be found (so you don't want to use -nocudalib), it will probably work on your machine, which has the correct CUDA setup, but fail on another machine that does not (which is where you would want to use -nocudalib). You can see the contradiction there. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | Just to be clear I am arguing for keeping this code :) |
lib/Driver/ToolChains/Cuda.h | ||
---|---|---|
90 ↗ | (On Diff #118912) | I would also like to keep the spirit of this code: if not in this exact form, then at least something that provides the same functionality. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | @gtbercea: I'm not sure I follow your example. If you're talking about clang tests, we do have a fake CUDA installation set up under test/Driver/Inputs, which removes the dependency on whatever CUDA you may or may not have installed on your machine (see the example below). I also don't see a contradiction: if you do need libdevice, there is no point picking a broken CUDA installation which does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that would make CUDA w/o libdevice acceptable. With --cuda-path you do have a way to tell clang which installation you want it to use. What am I missing? |
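For reference, a hypothetical lit test in the style of the existing driver tests, using the fake installation under test/Driver/Inputs (the exact CHECK pattern here is an assumption, not copied from the tree):

```cpp
// Hypothetical RUN lines: the fake CUDA tree under test/Driver/Inputs
// removes any dependency on the CUDA installed on the build machine.
// RUN: %clang -### -v --cuda-path=%S/Inputs/CUDA/usr/local/cuda %s 2>&1 \
// RUN:   | FileCheck %s
// CHECK: Found CUDA installation: {{.*}}/Inputs/CUDA/usr/local/cuda
```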
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | Ah, you were arguing with Hahnfeld@'s -nocudalib example. Then I guess we're in violent agreement. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | I fully agree with this: "if you do need libdevice, there is no point picking a broken CUDA installation which does not have any libdevice files. If you explicitly tell the compiler that you don't need libdevice, that would make CUDA w/o libdevice acceptable." I was trying to show an example of a situation where code is compiled with -nocudalib on one machine, and the same code then errors on another machine that requires the -nocudalib flag to make up for the absence of libdevice. |
lib/Driver/ToolChains/Cuda.cpp | ||
---|---|---|
170–182 ↗ | (On Diff #118912) | Yes, it was a counter-argument to that! :) |
lib/Driver/ToolChains/Cuda.h | ||
---|---|---|
90 ↗ | (On Diff #118912) | There are currently no users for this. In general, I would rather not have a magically-changing default GPU arch based on how broken your CUDA installation is. IMO it would be better to keep defaults static and fail if prerequisites are not met. |
lib/Driver/ToolChains/Cuda.h | ||
---|---|---|
90 ↗ | (On Diff #118912) | I would have thought that it is up to the compiler to select, as the default, the lowest viable compute capability. This is what this code aims to do (whether it actually does is a separate issue :) ); a rough sketch follows below. |
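A rough sketch of what "lowest viable compute capability" selection could look like over the detected libdevice variants (purely illustrative; this is the behavior under debate, not the actual code):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative only: pick the lowest arch among the libdevice variants the
// installation ships, falling back to a static default when none are found.
// Lexicographic comparison is adequate while arch names are all two digits.
std::string lowestViableArch(const std::vector<std::string> &LibDeviceArchs) {
  if (LibDeviceArchs.empty())
    return "sm_30"; // static fallback when no libdevice is present
  return *std::min_element(LibDeviceArchs.begin(), LibDeviceArchs.end());
}
```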
lib/Driver/ToolChains/Cuda.h | ||
---|---|---|
90 ↗ | (On Diff #118912) | The reason I added this code in the first place was that a default like sm_30 may work on a K40, but once you move to newer Pascal or Volta GPUs, you need a higher minimum supported compute capability. |
Check that the user didn't specify a value lower than sm_30 and re-add some code as discussed.
lib/Driver/ToolChains/Cuda.h | ||
---|---|---|
90 ↗ | (On Diff #118912) | I'd take it out, at least for now, as you've removed the only user of that function. In general, though, compilers tend to use conservative defaults, and for CUDA that would be the lowest GPU variant supported by the compiler. In the case of CUDA, that is determined by the CUDA SDK version. Figuring out the lowest supported version via the libdevice mapping we've created is wrong. E.g., with this patch and -nocudalib you may end up using CUDA-9 without any libdevice and would return sm_20. If/when we need to figure out the minimum supported version, it should be based directly on the value returned by version(). |
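A sketch of the alternative proposed here: derive the floor directly from the detected SDK version rather than from which libdevice files happen to exist (the enum is illustrative; the real driver has its own CudaVersion type):

```cpp
#include <string>

// Illustrative only: base the minimum supported arch on the detected CUDA
// SDK version. CUDA-9 dropped sm_2x (Fermi) support, so the conservative
// floor moves from sm_20 to sm_30 there.
enum class CudaVersion { CUDA_70, CUDA_75, CUDA_80, CUDA_90 };

std::string minSupportedArch(CudaVersion V) {
  return V >= CudaVersion::CUDA_90 ? "sm_30" : "sm_20";
}
```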