We already have a tool called amdgpu-arch which returns the GPUs on
the system. This is used to determine the default architecture when
doing offloading. This patch introduces a similar tool nvptx-arch.
Right now we use the GPU detected at compile time. This is unhelpful,
for example, when building on a login node and moving execution to a
compute node. The new tool will allow us to better choose a default
architecture when targeting NVPTX. We can probably also use this with
CMake's native setting for CUDA now.
Since version 11.6, CUDA provides __nvcc_device_query, which serves a
similar purpose, but it is probably better to define this locally if we
want to depend on it in clang.
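
For readers unfamiliar with how such a query tool can work, here is a minimal sketch of detecting the NVPTX architectures through the CUDA driver API. It is not the code in this patch: the names formatArch and detectNvptxArchs are hypothetical, and the sketch assumes a Linux host and loads libcuda.so.1 with dlopen so that it builds without the CUDA SDK. The driver entry points (cuInit, cuDeviceGetCount, cuDeviceGet, cuDeviceGetAttribute) and the two attribute values are taken from cuda.h.

```cpp
#include <dlfcn.h>
#include <string>
#include <vector>

// Attribute IDs copied from cuda.h so the sketch needs no CUDA headers.
constexpr int kComputeCapabilityMajor = 75; // CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR
constexpr int kComputeCapabilityMinor = 76; // CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR

// Driver API signatures; CUresult and CUdevice are both plain ints.
using CuInitFn = int (*)(unsigned);
using CuDeviceGetCountFn = int (*)(int *);
using CuDeviceGetFn = int (*)(int *, int);
using CuDeviceGetAttributeFn = int (*)(int *, int, int);

// Hypothetical helper: turn a compute capability such as 8.6 into "sm_86".
std::string formatArch(int major, int minor) {
  return "sm_" + std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: returns one "sm_XY" string per visible GPU, or an
// empty list if the driver library or a usable device is not available.
std::vector<std::string> detectNvptxArchs() {
  std::vector<std::string> archs;
  void *lib = dlopen("libcuda.so.1", RTLD_NOW);
  if (!lib)
    return archs; // No NVIDIA driver installed on this machine.

  auto init = reinterpret_cast<CuInitFn>(dlsym(lib, "cuInit"));
  auto getCount =
      reinterpret_cast<CuDeviceGetCountFn>(dlsym(lib, "cuDeviceGetCount"));
  auto getDevice = reinterpret_cast<CuDeviceGetFn>(dlsym(lib, "cuDeviceGet"));
  auto getAttr = reinterpret_cast<CuDeviceGetAttributeFn>(
      dlsym(lib, "cuDeviceGetAttribute"));

  int count = 0;
  // A return value of 0 is CUDA_SUCCESS.
  if (init && getCount && getDevice && getAttr && init(0) == 0 &&
      getCount(&count) == 0) {
    for (int i = 0; i < count; ++i) {
      int dev = 0, major = 0, minor = 0;
      if (getDevice(&dev, i) != 0)
        continue;
      if (getAttr(&major, kComputeCapabilityMajor, dev) != 0 ||
          getAttr(&minor, kComputeCapabilityMinor, dev) != 0)
        continue;
      archs.push_back(formatArch(major, minor));
    }
  }
  dlclose(lib);
  return archs;
}
```

A small driver program could print each entry of detectNvptxArchs() on its own line, mirroring how amdgpu-arch reports the GPUs on the system. On older glibc versions the sketch needs to be linked with -ldl.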
Nit: libcuda.so is part of the NVIDIA driver, which provides the NVIDIA driver API. It has nothing to do with the CUDA runtime.
Here, it's actually not even libcuda.so itself that's not found, but its stub.
I think a sensible error here should say "Failed to find stubs/libcuda.so in CUDA_LIBDIR"