This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Print an error if you try to compile with < sm_30 on CUDA 9.
ClosedPublic

Authored by jlebar on Oct 19 2017, 3:34 PM.

Details

Summary

CUDA 9's minimum supported GPU arch is sm_30.

Ideally we should also make sm_30 the default when compiling with CUDA
9, but that seems harder than it should be.
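Here is a rough C++ sketch of the check described in the summary. It assumes simplified, hypothetical enums (`CudaVersion`, `GpuArch`) and a hypothetical helper, `minArchForCudaVersion`. These are not clang's actual declarations in the driver.

```cpp
// Hypothetical, simplified encodings; clang's real enums and helpers differ.
enum class CudaVersion { CUDA_70, CUDA_75, CUDA_80, CUDA_90 };
enum class GpuArch { SM_20, SM_30, SM_35, SM_50, SM_60, SM_70 };

// Oldest GPU arch a given CUDA version can target: CUDA 9 dropped sm_2x.
constexpr GpuArch minArchForCudaVersion(CudaVersion v) {
  return v >= CudaVersion::CUDA_90 ? GpuArch::SM_30 : GpuArch::SM_20;
}

// True when the driver should reject `arch` for this CUDA install.
constexpr bool archTooLowForCuda(CudaVersion v, GpuArch arch) {
  return arch < minArchForCudaVersion(v);
}
```

With this sketch, sm_20 is rejected only when the install is CUDA 9. Older installs still accept it.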

Diff Detail

Repository
rL LLVM

Event Timeline

jlebar created this revision (Oct 19 2017, 3:34 PM).
jlebar updated this revision to Diff 119625 (Oct 19 2017, 3:38 PM).

Fix libdevice files.

jlebar added a reviewer: tra (Oct 19 2017, 4:03 PM).
tra added inline comments (Oct 23 2017, 1:38 PM).
clang/lib/Driver/ToolChains/Cuda.cpp
Lines 211–228 (on Diff #119625)

Generally speaking, if a user attempts to compile for sm_20 with cuda-9, we can't really tell whether he mistyped --cuda-path argument and pointed us to cuda-9 instead of cuda-8 or mistyped --cuda-gpu-arch and asked for sm_20 instead of sm_30. We'll report _arch_too_low, while in reality it could be the case of "CUDA version too high".

Perhaps it would be better to collapse MinVersionForCudaArch(Arch) and MinArchForCudaVersion(Version) into isSupportedCudaArch(CUDA Version, GPU arch) and just report that a particular GPU arch is not supported by this CUDA version. That should be sufficient for the user to figure out which of the two parameters was wrong.
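Here is a minimal sketch of the consolidation suggested above. It relies on several assumptions:

- The enums and the helpers `minVersionForArch` and `maxVersionForArch` are hypothetical simplifications.
- Only `isSupportedCudaArch` is named in the comment.
- The ranges cover only the archs modeled here, with CUDA 9.0 treated as the newest release.

```cpp
// Hypothetical, simplified encodings (not clang's actual declarations).
enum class CudaVersion { CUDA_70, CUDA_75, CUDA_80, CUDA_90 };
enum class GpuArch { SM_20, SM_30, SM_35, SM_50, SM_60, SM_70 };

// Oldest CUDA release that can compile for `arch`:
// sm_70 needs CUDA 9.0, sm_60 needs CUDA 8.0, older archs work from 7.0.
constexpr CudaVersion minVersionForArch(GpuArch arch) {
  return arch >= GpuArch::SM_70   ? CudaVersion::CUDA_90
         : arch >= GpuArch::SM_60 ? CudaVersion::CUDA_80
                                  : CudaVersion::CUDA_70;
}

// Newest CUDA release (of those modeled) that can still compile for `arch`:
// CUDA 9 dropped sm_2x, so sm_20 tops out at CUDA 8.0.
constexpr CudaVersion maxVersionForArch(GpuArch arch) {
  return arch < GpuArch::SM_30 ? CudaVersion::CUDA_80 : CudaVersion::CUDA_90;
}

// One predicate covering both "arch too low" and "CUDA version too old".
constexpr bool isSupportedCudaArch(CudaVersion v, GpuArch arch) {
  return minVersionForArch(arch) <= v && v <= maxVersionForArch(arch);
}
```

One predicate rejects both sm_20 with CUDA 9 and sm_70 with CUDA 8. The driver would therefore need only one diagnostic for an unsupported pair.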


Agreed; the messages in DiagnosticDriverKinds.td may not have the best names (though I'm struggling to improve them).

But do you think the error message we'll print is bad? For example:

CUDA version 9.0 does not support compiling for GPU archs earlier than sm_30, so cannot compile for sm_20. Use --cuda-gpu-arch to specify a different GPU arch, use --cuda-path to specify a different CUDA install, or pass --no-cuda-version-check.

The other error message you might get is something like:

GPU arch sm_70 requires CUDA version at least 9.0, but installation at /usr/local/cuda is 8.0. Use --cuda-path to specify a different CUDA install, or pass --no-cuda-version-check.

For this one we could add a clause mentioning --cuda-gpu-arch, but I'm not sure that's necessary, since if your CUDA arch is too *high*, you're surely specifying --cuda-gpu-arch on the command line.

It seems to me that these two are reasonable error messages, and I'm not sure it makes sense to combine them...

tra accepted this revision (Oct 23 2017, 7:11 PM).

The point was that we have two error messages for one problem -- this CUDA version does not support this GPU. The new message you've added (CUDA9, sm20) has to be rather verbose in order to be correct, as it must deal with the possibility that either of the relevant arguments is the source of the error. The other end of the problem (CUDA<9, sm_70) should ideally be phrased similarly. But why do we need both? IMO both cases could be reported more consistently with a single message similar to the one you've added -- "CUDA version X does not support compiling for GPU arch Y. Use --cuda-gpu-arch to specify a different GPU arch, use --cuda-path to specify a different CUDA install, or pass --no-cuda-version-check."

Either way, it's a minor nit. If you believe there's some utility in having two different error messages, I'm OK with it.
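Here is a sketch of how the single message proposed above could be assembled. `unsupportedArchMessage` is a hypothetical helper. The real driver defines its diagnostics in DiagnosticDriverKinds.td rather than building strings by hand.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a clang API): formats the single message suggested
// in review for any unsupported (CUDA version, GPU arch) pair.
std::string unsupportedArchMessage(const std::string& cudaVersion,
                                   const std::string& gpuArch) {
  // Callers are expected to pass non-empty version and arch names.
  assert(!cudaVersion.empty() && !gpuArch.empty());
  return "CUDA version " + cudaVersion +
         " does not support compiling for GPU arch " + gpuArch +
         ". Use --cuda-gpu-arch to specify a different GPU arch, use "
         "--cuda-path to specify a different CUDA install, or pass "
         "--no-cuda-version-check.";
}
```

The same text serves both the (9.0, sm_20) and (8.0, sm_70) cases. It names both flags, so the user can tell which argument to fix.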

This revision is now accepted and ready to land (Oct 23 2017, 7:11 PM).

I see what you're saying -- makes sense. Let me try to revise the patch.

jlebar updated this revision to Diff 119998 (Oct 23 2017, 10:59 PM).

Use just one error message for a bad CUDA arch instead of two.

tra accepted this revision (Oct 25 2017, 1:33 PM).

LGTM.

This revision was automatically updated to reflect the committed changes.