This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This will simplify the experience of building LLVM with OpenMP Offloading support by removing the need to manually specify the most optimal architecture for each system. Libomptarget will also build support for sm_35 as a fallback. This uses the find_cuda methods from CMake to detect the architecture which is deprecated in Cmake 3.18.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
FindCUDA has been deprecated.
Please explore the following feature without directly calling FindCUDA.
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856
Finding architectures using the CUDA language support requires CMake 3.18 as far as I know. LLVM's minimum CMake requirement is 3.13.4 so this method is probably the best I could see. Libomptarget already uses FindCUDA inside openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake so I figured it was okay. Alternatively, we could do the detection manually. All this amount to is compiling and running some CUDA code that prints the major and minor version. We could do that manually and try to use the original language support if you think that's a better option, I'm not a CMake guru or anything.
The link I posted indicated that independent feature is merged since 3.12. Better to avoid deprecated stuff when introducing new cmake lines even though some existing lines may still rely on deprecated cmake.
3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 supports detection? If we know a new way works since 3.18, I think putting both with if-else makes sense.
Regarding doing it manually. I think it is more risky than using existing schemes shipped with cmake.
This requires adding CUDA as a language, is that alright inside of Clang? Nothing is using CUDA, we're just checking the architecture.
I'm not sure how that would affect the other Libomptarget stuff that uses FindCUDA since language enabling is done at the highest level, does that conflict?
I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, Could you explain why the detection is not put together with other cuda stuff in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
If we're sticking with using FindCUDA it's definitely redundant here since it was already called by the time we get here. The support for CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA)
Probably not messing with enable_language(CUDA) at the moment, just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?
That only controls loading the library, since this is where we set all the CUDA options I think it's fine to call it here.
Yes. I'm reducing my review to libomptarget only. I could not comment on clang/CMakeLists.txt. Regarding libomptarget, keep all the CUDA detection inside LibomptargetGetDependencies.cmake
An alternative approach is to build the deviceRTL for multiple cuda versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine to the one that intends to use it.
Cmake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is a sm_50. The downstream amd toolchain builds lots of this library, my install dir has fifteen of them (including sm_50).
You can build multiple deviceRTL today with LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=50,61,70. This patch tries to add the high arch automatically.
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt | ||
---|---|---|
79 |
after find_package(CUDA QUIET). |
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt | ||
---|---|---|
79 |
|
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt | ||
---|---|---|
87 | my point 2 refers to here CUDA_ARCH which gets into the compile line, your point 1 issue. rename your output variable ot CUDA_ARCH_MATCH_OUTPUT should solve the problem. I still think it is better to move default_capabilities. Very natural to have cuda_select_nvcc_arch_flags next to find_package(cuda) in one place. |
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt | ||
---|---|---|
79 |
libomptarget part should be fulling working. |
Implementing Ye's code. I changed it to putput the architecture as a dependency and then set the value where we define the default architecture. I did a full build from scratch using this and got the sm_70 libraries for my machine without needing to specify it in my build script.
Screwed up and put the wrong revision in the commit message. Closed in rGd564409946a5a13cb6391fc0fec54dcbd6f6d249
after find_package(CUDA QUIET).