This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs
ClosedPublic

Authored by jhuber6 on Dec 20 2022, 2:22 PM.

Details

Summary

We already have a tool called amdgpu-arch which returns the GPUs on
the system. This is used to determine the default architecture when
doing offloading. This patch introduces a similar tool, nvptx-arch.
Right now we use the GPU detected at compile time, which is unhelpful
when building on a login node and moving execution to a compute node,
for example. This will allow us to better choose a default architecture
when targeting NVPTX. We can probably also use this with CMake's native
support for CUDA now.

Since 11.6, CUDA provides __nvcc_device_query, which serves a similar
function, but it is probably better to define this locally if we want to
depend on it in clang.

Diff Detail

Event Timeline

jhuber6 created this revision.Dec 20 2022, 2:22 PM
Herald added a project: Restricted Project.Dec 20 2022, 2:22 PM
jhuber6 requested review of this revision.Dec 20 2022, 2:22 PM
tianshilei1992 added inline comments.Dec 21 2022, 7:46 AM
clang/tools/nvptx-arch/NVPTXArch.cpp
64

Do we want to include device number here?

jhuber6 added inline comments.Dec 21 2022, 7:49 AM
clang/tools/nvptx-arch/NVPTXArch.cpp
64

For amdgpu-arch and here the device number is implicit in the order, so the n-th line is the n-th device, i.e.

sm_70 // device 0
sm_80 // device 1
sm_70 // device 2
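
For illustration only, a minimal sketch of the kind of per-device query that produces this ordering, using the CUDA Driver API (cuInit, cuDeviceGetCount, cuDeviceGet, cuDeviceGetAttribute); this is a sketch, not the actual NVPTXArch.cpp:

#include <cuda.h>
#include <cstdio>

int main() {
  // Initialize the driver API; a failure here means no usable driver.
  if (cuInit(0) != CUDA_SUCCESS)
    return 1;

  int Count = 0;
  if (cuDeviceGetCount(&Count) != CUDA_SUCCESS)
    return 1;

  // Print one line per device in enumeration order: the n-th line is device n.
  for (int I = 0; I < Count; ++I) {
    CUdevice Device;
    int Major = 0, Minor = 0;
    if (cuDeviceGet(&Device, I) != CUDA_SUCCESS ||
        cuDeviceGetAttribute(
            &Major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, Device) !=
            CUDA_SUCCESS ||
        cuDeviceGetAttribute(
            &Minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, Device) !=
            CUDA_SUCCESS)
      return 1;
    std::printf("sm_%d%d\n", Major, Minor);
  }
  return 0;
}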
jhuber6 updated this revision to Diff 484594.Dec 21 2022, 8:33 AM

Change the header I copied from the AMD implementation.

arsenm added a subscriber: arsenm.Dec 21 2022, 10:54 AM
arsenm added inline comments.
clang/tools/nvptx-arch/NVPTXArch.cpp
38

stderr?

jhuber6 updated this revision to Diff 484637.Dec 21 2022, 11:35 AM

Print to stderr and only return 1 if there was an actual error. A lack of devices is considered a success and we print nothing.
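
A hedged sketch of that exit-code policy (names and structure are illustrative, not the actual patch): no devices means success with no output, and only a genuine driver error prints to stderr and returns 1.

#include <cuda.h>
#include <cstdio>

int main() {
  CUresult Err = cuInit(0);
  if (Err == CUDA_ERROR_NO_DEVICE)
    return 0; // A lack of devices is a success; print nothing.
  if (Err != CUDA_SUCCESS) {
    const char *ErrStr = nullptr;
    cuGetErrorString(Err, &ErrStr); // Driver API helper for a readable message.
    std::fprintf(stderr, "CUDA error: %s\n", ErrStr ? ErrStr : "unknown");
    return 1; // Only an actual error produces a nonzero exit code.
  }
  // ... enumerate and print devices here ...
  return 0;
}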

This revision is now accepted and ready to land.Dec 21 2022, 5:31 PM
Hahnfeld added inline comments.
clang/tools/nvptx-arch/CMakeLists.txt
28

This broke my build with CLANG_LINK_CLANG_DYLIB; we must use the standard CMake target_link_libraries for the CUDA libraries. I fixed this in commit rGf3c9342a3d56e1782e3b6db081401af334648492.

tra added inline comments.Jan 3 2023, 3:46 PM
clang/tools/nvptx-arch/CMakeLists.txt
19

Nit: libcuda.so is part of the NVIDIA driver, which provides the NVIDIA driver API; it has nothing to do with the CUDA runtime.
Here, it's actually not even libcuda.so itself that's not found, but its stub.
I think a sensible error here should say "Failed to find stubs/libcuda.so in CUDA_LIBDIR".

25

Does this mean that the executable will have an RPATH pointing to CUDA_LIBDIR/stubs?

This should not be necessary. The stub shipped with CUDA comes as "libcuda.so" only. Its SONAME is libcuda.so.1, but there's no symlink with that name in stubs, so an RPATH pointing there will do nothing. At runtime, the dynamic linker will attempt to open libcuda.so.1, and it will only be found among the actual libraries installed by the NVIDIA drivers.

clang/tools/nvptx-arch/NVPTXArch.cpp
26

How do we distinguish "we didn't have CUDA at build time" reported here from "some driver API call failed with CUDA_ERROR_INVALID_VALUE=1"?

34

One problem with this approach is that nvptx-arch will fail to run on a machine without NVIDIA drivers installed, because the dynamic linker will not find libcuda.so.1.

Ideally we want it to run on any machine and fail the way we want.

A typical way to achieve that is to dlopen("libcuda.so.1") and obtain pointers to the functions we need via dlsym().
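
A sketch of that dlopen/dlsym approach, assuming only cuInit and cuDeviceGetCount are needed (the real tool would resolve more entry points):

#include <dlfcn.h>
#include <cstdio>

int main() {
  // Load the driver at run time so the tool still starts on machines
  // without NVIDIA drivers installed.
  void *Handle = dlopen("libcuda.so.1", RTLD_LAZY);
  if (!Handle) {
    std::fprintf(stderr, "Failed to load libcuda.so.1: %s\n", dlerror());
    return 1;
  }

  // Resolve the driver API entry points we need via dlsym().
  using InitTy = int (*)(unsigned);
  using CountTy = int (*)(int *);
  auto Init = reinterpret_cast<InitTy>(dlsym(Handle, "cuInit"));
  auto GetCount = reinterpret_cast<CountTy>(dlsym(Handle, "cuDeviceGetCount"));
  if (!Init || !GetCount || Init(0) != 0 /* CUDA_SUCCESS */)
    return 1;

  int Count = 0;
  if (GetCount(&Count) != 0 /* CUDA_SUCCESS */)
    return 1;
  std::printf("Found %d device(s)\n", Count);
  dlclose(Handle);
  return 0;
}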

64

NVIDIA GPU enumeration order is more or less arbitrary. By default it's arranged roughly "fastest GPU first", but it can be rearranged in order of PCI(e) bus IDs, or in an arbitrary user-specified order using CUDA_VISIBLE_DEVICES. Printing the compute capability in enumeration order is pretty much all the user needs. If we want to print something uniquely identifying the device, we would need to print the device UUID, similarly to what nvidia-smi -L does, or the PCIe bus IDs. In other words, we can uniquely identify devices, but there's no such thing as an inherent canonical order among them.
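
For reference, a sketch of printing a unique identifier per device via the driver API's cuDeviceGetUuid, roughly in the spirit of nvidia-smi -L (assumes cuInit and cuDeviceGet have already succeeded; illustrative only):

#include <cuda.h>
#include <cstdio>

void printUuid(CUdevice Device) {
  CUuuid Uuid;
  if (cuDeviceGetUuid(&Uuid, Device) != CUDA_SUCCESS)
    return;
  // Print the 16 UUID bytes as hex in the conventional 8-4-4-4-12 grouping.
  std::printf("GPU-");
  for (int I = 0; I < 16; ++I) {
    std::printf("%02x", static_cast<unsigned char>(Uuid.bytes[I]));
    if (I == 3 || I == 5 || I == 7 || I == 9)
      std::printf("-");
  }
  std::printf("\n");
}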

jhuber6 added inline comments.Jan 3 2023, 4:35 PM
clang/tools/nvptx-arch/CMakeLists.txt
19

Good point. I never thought about the difference because they're both called CUDA somewhere.

25

Interesting, I can probably delete it. Another thing I mostly just copied from the existing tool.

clang/tools/nvptx-arch/NVPTXArch.cpp
26

I guess the latter would print an error message. We do the same thing with amdgpu-arch, so I just copied it.

34

We do this in the OpenMP runtime. I mostly copied this approach from the existing amdgpu-arch, but we could change both to use this method.

64

I think it's mostly just important that it prints a valid GPU. Most of the uses for this tool will just be "Give me a valid GPU I can run on this machine".

tra added inline comments.Jan 3 2023, 4:55 PM
clang/tools/nvptx-arch/NVPTXArch.cpp
34

An alternative would be to enumerate GPUs using the CUDA runtime API and link statically with libcudart_static.a.

The CUDA runtime will take care of finding libcuda.so and will return an error if it fails, so you do not need to mess with dlopen, etc.

E.g. this could be used as a base:
https://github.com/NVIDIA/cuda-samples/blob/master/Samples/1_Utilities/deviceQuery/deviceQuery.cpp
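
A rough sketch of that runtime-API alternative (illustrative only, assumes linking against libcudart_static.a): the runtime locates the driver itself and reports an error if it is missing.

#include <cuda_runtime.h>
#include <cstdio>

int main() {
  int Count = 0;
  // The runtime handles locating the driver; an error here covers both
  // "no driver installed" and "query failed".
  if (cudaGetDeviceCount(&Count) != cudaSuccess)
    return 1;

  for (int I = 0; I < Count; ++I) {
    cudaDeviceProp Prop;
    if (cudaGetDeviceProperties(&Prop, I) != cudaSuccess)
      return 1;
    std::printf("sm_%d%d\n", Prop.major, Prop.minor);
  }
  return 0;
}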