Page MenuHomePhabricator

[OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default
ClosedPublic

Authored by jhuber6 on Oct 6 2020, 2:36 PM.

Details

Reviewers
jdoerfert
ye-luo
Summary

This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This will simplify the experience of building LLVM with OpenMP Offloading support by removing the need to manually specify the most optimal architecture for each system. Libomptarget will also build support for sm_35 as a fallback. This uses the find_cuda methods from CMake to detect the architecture which is deprecated in Cmake 3.18.

Diff Detail

Unit TestsFailed

TimeTest
2,410 mswindows > Profile-x86_64.Profile-x86_64::gcc-flag-compatibility.test
Script: -- : 'RUN: at line 1'; rm -rf C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Output\gcc-flag-compatibility.test.tmp.d
600,110 mswindows > Profile-x86_64.Profile-x86_64::instrprof-basic.c
Script: -- : 'RUN: at line 1'; C:/ws/w64/llvm-project/premerge-checks/build/./bin/clang.exe -Wl,-incremental:no -fprofile-instr-generate -o C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-basic.c.tmp -O3 C:\ws\w64\llvm-project\premerge-checks\compiler-rt\test\profile\instrprof-basic.c
600,080 mswindows > Profile-x86_64.Profile-x86_64::instrprof-dump.c
Script: -- : 'RUN: at line 2'; rm -fr C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-dump.c.tmp.profdir
2,940 mswindows > Profile-x86_64.Profile-x86_64::instrprof-override-filename.c
Script: -- : 'RUN: at line 1'; rm -rf C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-override-filename.c.tmp.dir && mkdir -p C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-override-filename.c.tmp.dir
600,090 mswindows > Profile-x86_64.Windows::instrprof-multiprocess.test
Script: -- : 'RUN: at line 1'; C:/ws/w64/llvm-project/premerge-checks/build/./bin/clang.exe -Wl,-incremental:no -fprofile-instr-generate C:\ws\w64\llvm-project\premerge-checks\compiler-rt\test\profile\Windows/Inputs/instrprof-multiprocess.c -o C:\ws\w64\llvm-project\premerge-checks\build\projects\compiler-rt\test\profile\Profile-x86_64\Windows\Output\instrprof-multiprocess.test.tmp
View Full Test Results (28 Failed)

Event Timeline

jhuber6 created this revision.Oct 6 2020, 2:36 PM
jhuber6 requested review of this revision.Oct 6 2020, 2:36 PM
ye-luo added a subscriber: ye-luo.Oct 6 2020, 2:50 PM

FindCUDA has been deprecated.
Please explore the following feature without directly calling FindCUDA.
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856

FindCUDA has been deprecated.
Please explore the following feature without directly calling FindCUDA.
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856

Finding architectures using the CUDA language support requires CMake 3.18 as far as I know. LLVM's minimum CMake requirement is 3.13.4 so this method is probably the best I could see. Libomptarget already uses FindCUDA inside openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake so I figured it was okay. Alternatively, we could do the detection manually. All this amount to is compiling and running some CUDA code that prints the major and minor version. We could do that manually and try to use the original language support if you think that's a better option, I'm not a CMake guru or anything.

ye-luo added a comment.EditedOct 6 2020, 3:04 PM

The link I posted indicated that independent feature is merged since 3.12. Better to avoid deprecated stuff when introducing new cmake lines even though some existing lines may still rely on deprecated cmake.

ye-luo added a comment.EditedOct 6 2020, 3:13 PM

3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 supports detection? If we know a new way works since 3.18, I think putting both with if-else makes sense.

Regarding doing it manually. I think it is more risky than using existing schemes shipped with cmake.

The link I posted indicated that independent feature is merged since 3.12. Better to avoid deprecated stuff when introducing new cmake lines even though some existing lines may still rely on deprecated cmake.

This requires adding CUDA as a language, is that alright inside of Clang? Nothing is using CUDA, we're just checking the architecture.

3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 supports detection? If we know a new way works since 3.18, I think putting both with if-else makes sense.

Regarding doing it manually. I think it is more risky than using existing schemes shipped with cmake.

I'm not sure how that would affect the other Libomptarget stuff that uses FindCUDA since language enabling is done at the highest level, does that conflict?

ye-luo added a comment.Oct 6 2020, 3:36 PM

I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, Could you explain why the detection is not put together with other cuda stuff in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake

I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, Could you explain why the detection is not put together with other cuda stuff in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake

If we're sticking with using FindCUDA it's definitely redundant here since it was already called by the time we get here. The support for CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA)

ye-luo added a comment.Oct 6 2020, 3:54 PM

I just realized that this patch affects clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, Could you explain why the detection is not put together with other cuda stuff in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake

If we're sticking with using FindCUDA it's definitely redundant here since it was already called by the time we get here. The support for CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA)

Probably not messing with enable_language(CUDA) at the moment, just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

Probably not messing with enable_language(CUDA) at the moment, just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

That only controls loading the library, since this is where we set all the CUDA options I think it's fine to call it here.

ye-luo added a comment.Oct 6 2020, 4:09 PM

Probably not messing with enable_language(CUDA) at the moment, just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

That only controls loading the library, since this is where we set all the CUDA options I think it's fine to call it here.

Yes. I'm reducing my review to libomptarget only. I could not comment on clang/CMakeLists.txt. Regarding libomptarget, keep all the CUDA detection inside LibomptargetGetDependencies.cmake

An alternative approach is to build the deviceRTL for multiple cuda versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine to the one that intends to use it.

Cmake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is a sm_50. The downstream amd toolchain builds lots of this library, my install dir has fifteen of them (including sm_50).

ye-luo added a comment.Oct 6 2020, 5:07 PM

An alternative approach is to build the deviceRTL for multiple cuda versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine to the one that intends to use it.

Cmake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is a sm_50. The downstream amd toolchain builds lots of this library, my install dir has fifteen of them (including sm_50).

You can build multiple deviceRTL today with LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=50,61,70. This patch tries to add the high arch automatically.

jhuber6 updated this revision to Diff 296577.Oct 6 2020, 6:59 PM

Removing redundant call to find_package.

ye-luo requested changes to this revision.Oct 6 2020, 8:01 PM
ye-luo added inline comments.
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
79
  1. Doesn't work right now. Missing comma ",${CMAKE_MATCH_1}"
  2. using CUDA_ARCH as "string(REGEX MATCH" output causes problems.
  3. "append" needs to protect redundant 35.
  4. I think it is better to move this part of logic to openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake

after find_package(CUDA QUIET).

This revision now requires changes to proceed.Oct 6 2020, 8:01 PM
jhuber6 added inline comments.Oct 6 2020, 8:12 PM
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
79
  1. I noticed the comma problem, it's because the compute capabilities uses commas instead of semicolons like the rest of CMake to delimit the values. There's another weird error I'm getting while trying to build with this where it will add sm_70 as an argument causing an nvcc error. like in nvcc -o out file.cpp sm_70.
  2. What's the problem here? Pretty much the output gives some string that you would pass to nvcc like --arch=sm_70 and I'm regexing out the sm_70 if there was no match or the architecture is too small it doesn't add anything.
  3. I'm thinking it would just be easiest to change it do something like set(default_capabilities "35,${CMAKE_MATCH_1}")
  4. My feeling is that there's not enough complexity here to justify moving it since I'd need to add just as much code to generate the output and then even more to use it here. Since the LibomptargetGetDependencies.cmake doesn't bother checking whether or not the find_package(CUDA) was successful I feel like there's no need to add logic that requires checking if it was there when we already have it here.
ye-luo added inline comments.Oct 6 2020, 9:40 PM
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
87

my point 2 refers to here CUDA_ARCH which gets into the compile line, your point 1 issue. rename your output variable ot CUDA_ARCH_MATCH_OUTPUT should solve the problem.

I still think it is better to move default_capabilities. Very natural to have cuda_select_nvcc_arch_flags next to find_package(cuda) in one place.

ye-luo added inline comments.Oct 6 2020, 10:13 PM
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
79
  1. better than append but still need a check to avoid "35,35"
  2. https://github.com/ye-luo/llvm-project/commit/ac5f20f9770e894ff48467a9317ec0649f5c7562

libomptarget part should be fulling working.

jhuber6 updated this revision to Diff 296792.Oct 7 2020, 2:27 PM

Implementing Ye's code. I changed it to putput the architecture as a dependency and then set the value where we define the default architecture. I did a full build from scratch using this and got the sm_70 libraries for my machine without needing to specify it in my build script.

ye-luo accepted this revision.Oct 7 2020, 7:55 PM

LGTM

This revision is now accepted and ready to land.Oct 7 2020, 7:55 PM

Nice patch. Thanks!

jhuber6 closed this revision.Oct 8 2020, 9:12 AM

Screwed up and put the wrong revision in the commit message. Closed in rGd564409946a5a13cb6391fc0fec54dcbd6f6d249