This is an archive of the discontinued LLVM Phabricator instance.


Differential D140433

[Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs
ClosedPublic

Authored by jhuber6 on Dec 20 2022, 2:22 PM.


Details

Reviewers

JonChesterfield
tra
yaxunl
jdoerfert
tianshilei1992
MaskRay

Commits

rGd5a5ee856e7c: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs

Summary

We already have a tool called `amdgpu-arch` which returns the GPUs on
the system. This is used to determine the default architecture when
doing offloading. This patch introduces a similar tool, `nvptx-arch`.
Right now we use the GPU detected at compile time. This is unhelpful
when, for example, building on a login node and moving execution to a
compute node. This will allow us to better choose a default architecture
when targeting NVPTX. We can probably also use this with CMake's `native`
setting for CUDA now.

Since CUDA 11.6, the toolkit provides `__nvcc_device_query`, which serves a
similar function, but it is probably better to define this locally if we
want to depend on it in clang.
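As a rough illustration of the consumer side, the sketch below takes the captured stdout of `nvptx-arch` (one `sm_XY` line per device) and picks the first entry as the default architecture, falling back to a caller-supplied value when nothing was printed. The helper name `pickDefaultArch` and the first-device policy are assumptions for illustration, not a description of how the Clang driver makes this choice.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: return the first "sm_XY" line of nvptx-arch output,
// or `Fallback` if the tool printed nothing usable (e.g. no GPUs present).
std::string pickDefaultArch(const std::string &ToolOutput,
                            const std::string &Fallback) {
  std::istringstream Lines(ToolOutput);
  std::string Line;
  while (std::getline(Lines, Line)) {
    // Drop a trailing '\r' in case the output passed through a CRLF pipe.
    if (!Line.empty() && Line.back() == '\r')
      Line.pop_back();
    // rfind(..., 0) only matches at position 0, i.e. "starts with".
    if (Line.rfind("sm_", 0) == 0)
      return Line;
  }
  return Fallback;
}
```

For output `sm_70\nsm_80\n` this returns `sm_70`; for empty output (no devices, which the tool treats as success) it returns whatever fallback the caller chose.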

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. · Dec 20 2022, 2:22 PM

Herald added a project: Restricted Project. · Dec 20 2022, 2:22 PM

Herald added subscribers: kosarev, mattd, gchakrabarti and 3 others.

jhuber6 requested review of this revision. · Dec 20 2022, 2:22 PM

Herald added a project: Restricted Project. · Dec 20 2022, 2:22 PM

Herald added subscribers: cfe-commits, sstefan1, jholewinski.

Harbormaster completed remote builds in B204241: Diff 484381. · Dec 20 2022, 4:04 PM

tianshilei1992 added inline comments. · Dec 21 2022, 7:46 AM

clang/tools/nvptx-arch/NVPTXArch.cpp
Line 63: Do we want to include device number here?

jhuber6 added inline comments. · Dec 21 2022, 7:49 AM

clang/tools/nvptx-arch/NVPTXArch.cpp
Line 63: For `amdgpu-arch` and here we just have it implicitly in the order, so the n-th line is the n-th device, i.e.
    sm_70 // device 0
    sm_80 // device 1
    sm_70 // device 2
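Under that convention a consumer can recover device numbers from line order alone. A minimal sketch, assuming the tool's stdout has already been captured into a string; `parseArchList` is a hypothetical name, and as tra points out further down, the resulting ordinal is only the driver's enumeration order, not a canonical one.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split nvptx-arch output so that element N holds the
// architecture of device N in the driver's enumeration order.
std::vector<std::string> parseArchList(const std::string &ToolOutput) {
  std::vector<std::string> Archs;
  std::istringstream Lines(ToolOutput);
  std::string Line;
  while (std::getline(Lines, Line))
    if (!Line.empty()) // the tool prints no blank lines; skip stray ones
      Archs.push_back(Line);
  return Archs;
}
```

With the three-line output above, `parseArchList(...)[1]` is `sm_80`, i.e. device 1.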

Change header I copied from the AMD implementation.

Harbormaster completed remote builds in B204397: Diff 484594. · Dec 21 2022, 10:03 AM

arsenm added a subscriber: arsenm. · Dec 21 2022, 10:54 AM

arsenm added inline comments.

clang/tools/nvptx-arch/NVPTXArch.cpp
Line 38: stderr?

Print to stderr and only return 1 if there was an actual error. A lack of devices is considered a success and we print nothing.
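The policy in this update (architectures on stdout, errors on stderr, a non-zero status only for a genuine failure, and an empty device list treated as success) can be sketched without linking against CUDA by hiding the driver calls behind a function pointer. `QueryFn` and `runArchQuery` are hypothetical names; the real tool calls the CUDA driver API directly.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver calls: returns false on failure (with
// a message in Err) and otherwise fills Archs with one "sm_XY" per device.
using QueryFn = bool (*)(std::vector<std::string> &Archs, std::string &Err);

// Architectures go to Out, errors go to ErrOut, and only a real failure
// produces a non-zero exit status.
int runArchQuery(QueryFn Query, std::FILE *Out, std::FILE *ErrOut) {
  std::vector<std::string> Archs;
  std::string Err;
  if (!Query(Archs, Err)) {
    std::fprintf(ErrOut, "CUDA error: %s\n", Err.c_str());
    return 1;
  }
  // No devices is not an error: print nothing and report success.
  for (const std::string &Arch : Archs)
    std::fprintf(Out, "%s\n", Arch.c_str());
  return 0;
}
```

Injecting the query keeps the exit-code logic testable on machines with no GPU at all.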

Harbormaster completed remote builds in B204427: Diff 484637. · Dec 21 2022, 12:12 PM

LGTM

This revision is now accepted and ready to land. · Dec 21 2022, 5:31 PM

Closed by commit rGd5a5ee856e7c: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs (authored by jhuber6). · Dec 25 2022, 7:24 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rGd5a5ee856e7c: [Clang] Add `nvptx-arch` tool to query installed NVIDIA GPUs.

Hahnfeld added a subscriber: Hahnfeld. · Dec 29 2022, 12:49 AM

Hahnfeld added inline comments.

clang/tools/nvptx-arch/CMakeLists.txt
Line 29: This broke my build with `CLANG_LINK_CLANG_DYLIB`; we must use the standard CMake `target_link_libraries` for the CUDA libraries. I fixed this in commit rGf3c9342a3d56e1782e3b6db081401af334648492.

tra added inline comments. · Jan 3 2023, 3:46 PM

clang/tools/nvptx-arch/CMakeLists.txt
Line 20: Nit: libcuda.so is part of the NVIDIA driver, which provides the NVIDIA driver API. It has nothing to do with the CUDA runtime. Here, it's actually not even libcuda.so itself that's not found, but its stub. I think a sensible error here should say "Failed to find stubs/libcuda.so in CUDA_LIBDIR".
Line 26: Does it mean that the executable will have RPATH pointing to CUDA_LIBDIR/stubs? This should not be necessary. The stub shipped with CUDA comes as "libcuda.so" only. Its SONAME is libcuda.so.1, but there's no symlink with that name in stubs, so RPATH pointing there will do nothing. At runtime, the dynamic linker will attempt to open libcuda.so.1, and it will only be found among the actual libraries installed by the NVIDIA drivers.
clang/tools/nvptx-arch/NVPTXArch.cpp
Line 27: How do we distinguish "we didn't have CUDA at build time" reported here from "some driver API failed with CUDA_ERROR_INVALID_VALUE=1"?
Line 35: One problem with this approach is that `nvptx-arch` will fail to run on a machine without NVIDIA drivers installed, because the dynamic linker will not find `libcuda.so.1`. Ideally we want it to run on any machine and fail the way we want. A typical way to achieve that is to dlopen("libcuda.so.1") and obtain the pointers to the functions we need via `dlsym()`.
Line 63: NVIDIA GPU enumeration order is more or less arbitrary. By default it's arranged by "sort of fastest GPU first", but it can be rearranged in order of PCI(e) bus IDs or in an arbitrary user-specified order using `CUDA_VISIBLE_DEVICES`. Printing compute capability in the enumeration order is pretty much all the user needs. If we want to print something uniquely identifying the device, we would need to print the device UUID, similarly to what `nvidia-smi -L` does, or PCIe bus IDs. In other words, we can uniquely identify devices, but there's no such thing as an inherent canonical order among the devices.
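If the tool ever printed a unique identifier as suggested here, a UUID could be rendered in the dashed `GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` style seen in `nvidia-smi -L` output. The sketch below only does the formatting; it assumes the 16 raw bytes were already obtained elsewhere (for example from the driver API's `cuDeviceGetUuid`), the helper name `formatGpuUuid` is hypothetical, and whether `nvidia-smi` derives its string from exactly these bytes is also an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format 16 raw UUID bytes as "GPU-" followed by the
// 8-4-4-4-12 lowercase hex layout.
std::string formatGpuUuid(const std::array<unsigned char, 16> &Bytes) {
  std::string Out = "GPU-";
  for (std::size_t I = 0; I < Bytes.size(); ++I) {
    // Dashes go before bytes 4, 6, 8 and 10 (after 8, 4, 4, 4 hex digits).
    if (I == 4 || I == 6 || I == 8 || I == 10)
      Out += '-';
    char Hex[3];
    std::snprintf(Hex, sizeof(Hex), "%02x", Bytes[I]);
    Out += Hex;
  }
  return Out;
}
```

Unlike the enumeration index, such an identifier stays the same under `CUDA_VISIBLE_DEVICES` reordering.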

jhuber6 added inline comments. · Jan 3 2023, 4:35 PM

clang/tools/nvptx-arch/CMakeLists.txt
Line 20: Good point. Never thought about the difference because they're both called `cuda` somewhere.
Line 26: Interesting, I can probably delete it. Another thing I mostly just copied from the existing tool.
clang/tools/nvptx-arch/NVPTXArch.cpp
Line 27: I guess the latter would print an error message. We do the same thing with `amdgpu-arch` so I just copied it.
Line 35: We do this in the OpenMP runtime. I mostly copied this approach from the existing `amdgpu-arch` but we could change both to use this method.
Line 63: I think it's mostly just important that it prints a valid GPU. Most of the uses for this tool will just be "Give me a valid GPU I can run on this machine".
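tra's suggestion of loading the driver at run time with `dlopen("libcuda.so.1")` and `dlsym()` might look roughly like the POSIX-only sketch below. `cuInit` and `cuDeviceGetCount` are real driver API entry points, but declaring their result type as a plain `int` (instead of including `cuda.h` for `CUresult`) and the helper name `countDevicesViaDlopen` are assumptions of this sketch. On glibc older than 2.34 it has to be linked with `-ldl`.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Local declarations so the sketch does not need cuda.h. The driver API
// returns CUresult (an enum where 0 == CUDA_SUCCESS); int is assumed here.
using CUresultInt = int;
using cuInit_t = CUresultInt (*)(unsigned int Flags);
using cuDeviceGetCount_t = CUresultInt (*)(int *Count);

// Hypothetical helper: load the driver library at run time and return the
// device count, or -1 if the library or a symbol is missing or a call fails.
int countDevicesViaDlopen(const char *LibName) {
  void *Handle = dlopen(LibName, RTLD_NOW);
  if (!Handle) {
    std::fprintf(stderr, "could not load %s: %s\n", LibName, dlerror());
    return -1;
  }
  auto Init = reinterpret_cast<cuInit_t>(dlsym(Handle, "cuInit"));
  auto GetCount =
      reinterpret_cast<cuDeviceGetCount_t>(dlsym(Handle, "cuDeviceGetCount"));
  int Count = -1;
  if (Init && GetCount && Init(0) == 0) {
    int N = 0;
    if (GetCount(&N) == 0)
      Count = N;
  }
  dlclose(Handle);
  return Count;
}
```

Because nothing links against libcuda at build time, a binary built this way still starts on a machine without NVIDIA drivers and just reports the failure; a fuller version would look up `cuDeviceGet` and `cuDeviceGetAttribute` the same way.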

tra added inline comments. · Jan 3 2023, 4:55 PM

clang/tools/nvptx-arch/NVPTXArch.cpp
Line 35: An alternative would be to enumerate GPUs using the CUDA runtime API and link statically with libcudart_static.a. The CUDA runtime will take care of finding libcuda.so and will return an error if it fails, so you do not need to mess with dlopen, etc. E.g. this could be used as a base: https://github.com/NVIDIA/cuda-samples/blob/master/Samples/1_Utilities/deviceQuery/deviceQuery.cpp

Revision Contents

Path                                      Size
clang/tools/CMakeLists.txt                1 line
clang/tools/nvptx-arch/CMakeLists.txt     28 lines
clang/tools/nvptx-arch/NVPTXArch.cpp      67 lines

Diff 484381

clang/tools/CMakeLists.txt

# to keep the primary Clang repository small and focused.
# It also may be included by LLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR.
add_llvm_external_project(clang-tools-extra extra)

# libclang may require clang-tidy in clang-tools-extra.
add_clang_subdirectory(libclang)

add_clang_subdirectory(amdgpu-arch)
+add_clang_subdirectory(nvptx-arch)

clang/tools/nvptx-arch/CMakeLists.txt

This file was added.

# //===--------------------------------------------------------------------===//
# //
# // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
# // See https://llvm.org/LICENSE.txt for details.
# // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
# //
# //===--------------------------------------------------------------------===//


# TODO: This is deprecated. Since CMake 3.17 we can use FindCUDAToolkit instead.
find_package(CUDA QUIET)
find_library(cuda-library NAMES cuda PATHS /lib64)
if (NOT cuda-library AND CUDA_FOUND)
  get_filename_component(CUDA_LIBDIR "${CUDA_cudart_static_LIBRARY}" DIRECTORY)
  find_library(cuda-library NAMES cuda HINTS "${CUDA_LIBDIR}/stubs")
endif()

if (NOT CUDA_FOUND OR NOT cuda-library)
  message(STATUS "Not building nvptx-arch: cuda runtime not found")
  return()
endif()

add_clang_tool(nvptx-arch NVPTXArch.cpp)

set_target_properties(nvptx-arch PROPERTIES INSTALL_RPATH_USE_LINK_PATH ON)
target_include_directories(nvptx-arch PRIVATE ${CUDA_INCLUDE_DIRS})

clang_target_link_libraries(nvptx-arch PRIVATE ${cuda-library})

clang/tools/nvptx-arch/NVPTXArch.cpp

This file was added.

//===- NVPTXArch.cpp - list installed NVPTX devices ------ C++ ----------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements a tool for detecting name of AMDGPU installed in system
// using HSA. This tool is used by AMDGPU OpenMP driver.
//
//===----------------------------------------------------------------------===//

#if defined(__has_include)
#if __has_include("cuda.h")
#include "cuda.h"
#define CUDA_HEADER_FOUND 1
#else
#define CUDA_HEADER_FOUND 0
#endif
#else
#define CUDA_HEADER_FOUND 0
#endif

#if !CUDA_HEADER_FOUND
int main() { return 1; }
#else

#include <cstdint>
#include <cstdio>

static int handleError(CUresult Err) {
  const char *ErrStr = nullptr;
  CUresult Result = cuGetErrorString(Err, &ErrStr);
  if (Result != CUDA_SUCCESS)
    return 1;
  printf("CUDA error: %s\n", ErrStr);
  return 1;
}

int main() {
  if (CUresult Err = cuInit(0))
    return 1;

  int Count = 0;
  if (cuDeviceGetCount(&Count))
    return 1;
  if (Count == 0)
    return 0;
  for (int DeviceId = 0; DeviceId < Count; ++DeviceId) {
    CUdevice Device;
    if (CUresult Err = cuDeviceGet(&Device, DeviceId))
      return handleError(Err);

    int32_t Major, Minor;
    if (CUresult Err = cuDeviceGetAttribute(
            &Major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, Device))
      return handleError(Err);
    if (CUresult Err = cuDeviceGetAttribute(
            &Minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, Device))
      return handleError(Err);

    printf("sm_%d%d\n", Major, Minor);
  }
  return 0;
}
#endif
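One way to answer tra's question about telling "no CUDA at build time" apart from a driver failure would be to give the two cases distinct exit statuses, instead of returning 1 from both the `!CUDA_HEADER_FOUND` stub and the error paths. The enum values and the helper `exitStatusFor` below are hypothetical and not part of the patch.

```cpp
#include <cassert>

// Hypothetical exit statuses; the values are illustrative. The point is only
// that the two failure kinds no longer share the status 1.
enum ArchToolExitStatus : int {
  ExitSuccess = 0,          // query ran; zero or more devices were printed
  ExitDriverError = 1,      // a CUDA driver API call failed at run time
  ExitBuiltWithoutCuda = 2, // cuda.h was not available when building the tool
};

// Hypothetical helper choosing the status a stub or real main() would return.
int exitStatusFor(bool BuiltWithCuda, bool QuerySucceeded) {
  if (!BuiltWithCuda)
    return ExitBuiltWithoutCuda;
  return QuerySucceeded ? ExitSuccess : ExitDriverError;
}
```

The stub `main()` would then return `ExitBuiltWithoutCuda`, and a caller could report "tool built without CUDA support" separately from "driver query failed".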