Much like the changes in D141859, this patch allows the nvptx-arch
tool to be built and provided with every distribution of LLVM / Clang.
This will make it more reliable for our toolchains to depend on. The
changes here configure a version that dynamically loads CUDA if it was
not found at build time.
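For reference, here is a minimal sketch of what a dlopen-based fallback looks like. This is an illustration rather than the patch's actual code; it assumes the usual Linux driver soname libcuda.so.1 and uses only the documented CUDA driver entry points cuInit and cuDeviceGetCount:

```cpp
#include <cstdio>
#include <dlfcn.h>

// CUresult is an enum in cuda.h; plain int keeps this sketch self-contained.
typedef int (*cuInit_t)(unsigned int);
typedef int (*cuDeviceGetCount_t)(int *);

int main() {
  // libcuda.so.1 is the soname installed by the NVIDIA driver on Linux.
  void *Handle = dlopen("libcuda.so.1", RTLD_NOW);
  if (!Handle) {
    std::fprintf(stderr, "failed to load libcuda.so.1: %s\n", dlerror());
    return 1; // No driver present: a clean run-time failure, not a link error.
  }

  auto Init = reinterpret_cast<cuInit_t>(dlsym(Handle, "cuInit"));
  auto GetCount =
      reinterpret_cast<cuDeviceGetCount_t>(dlsym(Handle, "cuDeviceGetCount"));
  if (!Init || !GetCount)
    return 1;

  int Count = 0;
  // 0 is CUDA_SUCCESS for the driver API.
  if (Init(0) != 0 || GetCount(&Count) != 0)
    return 1;
  std::printf("found %d CUDA device(s)\n", Count);
  dlclose(Handle);
  return 0;
}
```

Because the tool then links only against libdl, the build never requires a CUDA toolkit, and a missing driver becomes a run-time condition that can be reported cleanly.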
Repository: rG LLVM Github Monorepo
Event Timeline
Yeah, otherwise I suppose there would be errors when compiling OpenMP programs on a system with no CUDA installed.
Inline comment on clang/tools/nvptx-arch/NVPTXArch.cpp, lines 89–95:
unrelated: I always prefer: int main(int argc, char *argv[]) { return 0; }
This change appears to have broken the build when cross-compiling to x86-32 on a Linux x86-64 system; on the Halide buildbots, we now fail at link time with:
FAILED: bin/nvptx-arch
: && /usr/bin/g++-7 -m32 -Wno-psabi -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG -Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib -Wl,--gc-sections tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o bin/nvptx-arch -Wl,-rpath,"\$ORIGIN/../lib" lib/libLLVMSupport.a -lpthread -lrt -ldl -lpthread -lm lib/libLLVMDemangle.a && :
/usr/bin/ld: tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in function `handleError(cudaError_enum)':
NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined reference to `cuGetErrorString'
/usr/bin/ld: tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in function `main':
NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference to `cuDeviceGetCount'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference to `cuDeviceGet'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference to `cuDeviceGetAttribute'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference to `cuDeviceGetAttribute'
collect2: error: ld returned 1 exit status
I'm guessing that the problem here is that the build machine has CUDA installed (so the headers are found), but no 32-bit version of CUDA (so linking fails).
Probably easy to fix, but as of right now, our 32-bit testing is dead in the water; could someone please revert this pending a proper fix?
I'm guessing here it's finding the cuda.h header but not the libraries? Maybe we should determine this via CMake and add a definition instead of using __has_include.
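For illustration, the source-side change could look something like this (a sketch only; DYNAMIC_CUDA is the definition the CMake hunk later in this thread sets):

```cpp
// Current approach (sketch): the source probes for the header itself, which
// can succeed even when no linkable 32-bit library exists.
#if __has_include("cuda.h")
#include "cuda.h"
#else
#define DYNAMIC_CUDA 1 // fall back to dlopen-based loading
#endif

// Suggested approach (sketch): CMake decides at configure time and passes
// -DDYNAMIC_CUDA when the driver library cannot be linked directly, so the
// source only checks one unambiguous definition.
#ifdef DYNAMIC_CUDA
// declare the driver entry points and resolve them via dlopen at run time
#else
#include "cuda.h"
#endif
```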
Can you let me know if rG4ce454c654bd solves it? I'm guessing the problem arises when we find the libraries at configure time but not at build time, so we might need another check as well.
It looks like this change (but not rG4ce454c654bd) is in the 17 branch, as the latter is now failing in the same way for cross-compiles.
It looks like it's there for me, see https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20. What is the issue? I made a slight tweak a few days ago on the 17 branch that updated how we find the CUDA driver.
We just started testing with the 17 branch this morning; can you point me at your tweak?
https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515. Does it work if you revert this? I wouldn't have thought it would affect anything. That's the only change that happened after the 16 release as far as I'm aware.
Reverting this (well, actually just monkey-patching in the old code) does indeed make things build correctly again -- a bit surprising to me too, but that's CMake for you. Reverting the change seems appropriate pending a tested fix.
It's finding a 64-bit CUDAToolkit, which it can't link against because the rest of the build is 32-bit.
Wondering why it didn't find it before then. But that's definitely a weird configuration. Not sure what a good generic solution is. We could always make it dlopen all the time.
Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly weird at all (especially on Windows), but apparently it is quite weird for LLVM at this point in time as we keep getting a lot of different things broken there :-)
I'm not very familiar with this type of build. Are there any variables we could pick up to just disable this if it's not building for the host system? Something like CMAKE_CROSSCOMPILING?
I'm not an expert on the LLVM build system, so I'm not entirely sure, but I'd start by examining the CMake setting LLVM_BUILD_32_BITS (which we set to ON in this case)
Can you let me know if adding this fixes it:
diff --git a/clang/tools/nvptx-arch/CMakeLists.txt b/clang/tools/nvptx-arch/CMakeLists.txt
index 95c25dc75847..ccdba5ed69a7 100644
--- a/clang/tools/nvptx-arch/CMakeLists.txt
+++ b/clang/tools/nvptx-arch/CMakeLists.txt
@@ -12,7 +12,7 @@ add_clang_tool(nvptx-arch NVPTXArch.cpp)
 find_package(CUDAToolkit QUIET)
 
 # If we found the CUDA library directly we just dynamically link against it.
-if (CUDAToolkit_FOUND)
+if (CUDAToolkit_FOUND AND NOT CMAKE_CROSSCOMPILING)
   target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
 else()
   target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
Unfortunately, no. (That is: It does not fix it. CMAKE_CROSSCOMPILING is unfortunately a subtle and flaky beast, see e.g. https://gitlab.kitware.com/cmake/cmake/-/issues/21744)
Interesting. It's definitely a bad problem that we find the 64-bit version and try to link with it for a cross-compile. I don't want to revert the code because the old FindCUDA module is officially deprecated. I figured there was surely a way to check if LLVM was cross-compiling.
For what it's worth, NVIDIA started deprecating 32-bit binaries long ago (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356), and the process has finally come to an end with the release of CUDA 12:
32-bit compilation native and cross-compilation is removed from CUDA 12.0 and later Toolkit.
Use the CUDA Toolkit from earlier releases for 32-bit compilation. CUDA Driver will continue
to support running existing 32-bit applications on existing GPUs except Hopper.
Hopper does not support 32-bit applications. Ada will be the last architecture with driver support for 32-bit applications.
Hmm... maybe the right answer then is to just always use the dynamic-loading path when doing any kind of 32-bit build.
That's probably the best option. I don't think we have much pretense of supporting 32-bit offloading right now. Would this just require checking LLVM_BUILD_32_BITS? Should be an easy change.
I think so. (It might be tempting to check if (CMAKE_SIZEOF_VOID_P EQUAL 8) but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)
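Concretely, that would be a one-line variant of the hunk quoted earlier (a sketch, assuming LLVM_BUILD_32_BITS is already set when this CMakeLists.txt is processed):

```cmake
find_package(CUDAToolkit QUIET)

# Link the driver directly only for a 64-bit build; every 32-bit build takes
# the dlopen path, since CUDA 12 no longer ships 32-bit driver libraries.
if (CUDAToolkit_FOUND AND NOT LLVM_BUILD_32_BITS)
  target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
else()
  target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
endif()
```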