This is an archive of the discontinued LLVM Phabricator instance.


Differential D88929

[OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default
Closed · Public

Authored by jhuber6 on Oct 6 2020, 2:36 PM.


Details

Reviewers

jdoerfert
ye-luo

Summary

This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This simplifies building LLVM with OpenMP offloading support by removing the need to manually specify the optimal architecture for each system. Libomptarget also builds support for sm_35 as a fallback. The architecture is detected with the FindCUDA functions from CMake, which CMake has deprecated in favor of its first-class CUDA language support.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. Oct 6 2020, 2:36 PM

Herald added subscribers: openmp-commits, cfe-commits, guansong and 2 others. Oct 6 2020, 2:36 PM

jhuber6 requested review of this revision. Oct 6 2020, 2:36 PM

Herald added a subscriber: sstefan1. Oct 6 2020, 2:36 PM

Harbormaster completed remote builds in B74190: Diff 296548. Oct 6 2020, 2:48 PM

FindCUDA has been deprecated.
Please explore the following feature without directly calling FindCUDA.
https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856

In D88929#2315378, @ye-luo wrote:

> FindCUDA has been deprecated.
> Please explore the following feature without directly calling FindCUDA.
> https://gitlab.kitware.com/cmake/cmake/-/merge_requests/1856

Finding architectures using the CUDA language support requires CMake 3.18 as far as I know. LLVM's minimum CMake requirement is 3.13.4, so this method is probably the best option I could find. Libomptarget already uses FindCUDA inside openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake, so I figured it was okay. Alternatively, we could do the detection manually. All this amounts to is compiling and running some CUDA code that prints the major and minor version. We could do that manually and try to use the original language support if you think that's a better option; I'm not a CMake guru or anything.

The link I posted indicates that the independent feature has been available since 3.12. It is better to avoid deprecated features when introducing new CMake lines, even though some existing lines may still rely on deprecated CMake.

3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 support detection? If we know a new way works since 3.18, I think putting both in an if-else makes sense.

Regarding doing it manually, I think it is riskier than using the existing schemes shipped with CMake.

In D88929#2315402, @ye-luo wrote:

> The link I posted indicates that the independent feature has been available since 3.12. It is better to avoid deprecated features when introducing new CMake lines, even though some existing lines may still rely on deprecated CMake.

This requires adding CUDA as a language; is that alright inside of Clang? Nothing is using CUDA, we're just checking the architecture.

In D88929#2315404, @ye-luo wrote:

> 3.18 introduces CMAKE_CUDA_ARCHITECTURES. Does 3.18 support detection? If we know a new way works since 3.18, I think putting both in an if-else makes sense.
>
> Regarding doing it manually, I think it is riskier than using the existing schemes shipped with CMake.

I'm not sure how that would affect the other Libomptarget code that uses FindCUDA, since language enabling is done at the highest level. Does that conflict?

I just realized that this patch affects both clang and libomptarget.
I cannot comment on clang. Regarding libomptarget, could you explain why the detection is not put together with the other CUDA code in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

In D88929#2315451, @ye-luo wrote:

> I just realized that this patch affects both clang and libomptarget.
> I cannot comment on clang. Regarding libomptarget, could you explain why the detection is not put together with the other CUDA code in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?

If we're sticking with FindCUDA, the find_package(CUDA) call is definitely redundant here, since it was already called by the time we get here. Support for the CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA).

In D88929#2315513, @jhuber6 wrote:

> In D88929#2315451, @ye-luo wrote:
>
>> I just realized that this patch affects both clang and libomptarget.
>> I cannot comment on clang. Regarding libomptarget, could you explain why the detection is not put together with the other CUDA code in openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake?
>
> If we're sticking with FindCUDA, the find_package(CUDA) call is definitely redundant here, since it was already called by the time we get here. Support for the CUDA language would use the same method but have enable_language(CUDA) somewhere instead of find_package(CUDA).

Probably best not to mess with enable_language(CUDA) at the moment. Why not just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake`?

In D88929#2315519, @ye-luo wrote:

> Probably best not to mess with enable_language(CUDA) at the moment. Why not just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake`?

That only controls loading the library; since this is where we set all the CUDA options, I think it's fine to call it here.

In D88929#2315538, @jhuber6 wrote:

> In D88929#2315519, @ye-luo wrote:
>
>> Probably best not to mess with enable_language(CUDA) at the moment. Why not just add cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS) to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake`?
>
> That only controls loading the library; since this is where we set all the CUDA options, I think it's fine to call it here.

Yes. I'm reducing my review to libomptarget only; I cannot comment on clang/CMakeLists.txt. Regarding libomptarget, keep all the CUDA detection inside LibomptargetGetDependencies.cmake.

JonChesterfield added a subscriber: JonChesterfield. Oct 6 2020, 4:35 PM

An alternative approach is to build the deviceRTL for multiple CUDA versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine from the one that will use them.

CMake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is an sm_50. The downstream AMD toolchain builds lots of variants of this library; my install dir has fifteen of them (including sm_50).

In D88929#2315640, @JonChesterfield wrote:

> An alternative approach is to build the deviceRTL for multiple CUDA versions and then pick whichever one is the best fit when compiling application code. That has advantages when building the deviceRTL libraries on a different machine from the one that will use them.
>
> CMake isn't my thing, but I see that my trunk build only has libomptarget-nvptx-sm_35.bc when the local card is an sm_50. The downstream AMD toolchain builds lots of variants of this library; my install dir has fifteen of them (including sm_50).

You can build multiple deviceRTLs today with LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=50,61,70. This patch tries to add the highest supported arch automatically.
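As a rough illustration of what that setting does, the NVPTX CMakeLists.txt in this diff converts the comma-separated value into nvcc `-gencode` pairs with `string(REPLACE)` and a `foreach` loop. The standalone script below (run with `cmake -P`) mirrors that conversion for the 50,61,70 example; the output variable `NVPTX_GENCODE_FLAGS` is a hypothetical name used only in this sketch, whereas the real file accumulates the flags into `CUDA_ARCH`.

```cmake
# Standalone sketch; run with: cmake -P gencode_sketch.cmake
cmake_minimum_required(VERSION 3.13)

# The user passes a comma-separated string, e.g.
#   -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=50,61,70
set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES "50,61,70")

# CMake lists are semicolon-separated, so replace the commas first.
string(REPLACE "," ";" nvptx_sm_list "${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES}")

# Build one "-gencode arch=compute_XX,code=sm_XX" pair per capability.
# NVPTX_GENCODE_FLAGS is a hypothetical name used only for this sketch.
set(NVPTX_GENCODE_FLAGS "")
foreach(sm IN LISTS nvptx_sm_list)
  list(APPEND NVPTX_GENCODE_FLAGS -gencode arch=compute_${sm},code=sm_${sm})
endforeach()

message(STATUS "nvcc flags: ${NVPTX_GENCODE_FLAGS}")
```

Each capability contributes two list elements, so three capabilities yield six nvcc arguments.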

Removing redundant call to find_package.

Harbormaster completed remote builds in B74211: Diff 296577. Oct 6 2020, 7:11 PM

ye-luo requested changes to this revision. Oct 6 2020, 8:01 PM

ye-luo added inline comments.

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
Line 80:
1. Doesn't work right now. Missing comma: ",${CMAKE_MATCH_1}".
2. Using CUDA_ARCH as the "string(REGEX MATCH" output causes problems. "append" needs to protect against a redundant 35. I think it is better to move this part of the logic to `openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake` after `find_package(CUDA QUIET)`.

This revision now requires changes to proceed. Oct 6 2020, 8:01 PM

jhuber6 added inline comments. Oct 6 2020, 8:12 PM

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
Line 80: 1. I noticed the comma problem; it's because the compute capabilities are delimited with commas instead of the semicolons the rest of CMake uses. There's another weird error I'm getting while trying to build with this, where it adds `sm_70` as an argument and causes an nvcc error, as in `nvcc -o out file.cpp sm_70`. What's the problem here? The output is pretty much a string that you would pass to nvcc, like `--arch=sm_70`, and I'm regexing out the `sm_70`; if there was no match or the architecture is too small, it doesn't add anything. I'm thinking it would be easiest to change it to something like `set(default_capabilities "35,${CMAKE_MATCH_1}")`. My feeling is that there's not enough complexity here to justify moving it, since I'd need to add just as much code to generate the output and then even more to use it here. Since LibomptargetGetDependencies.cmake doesn't bother checking whether `find_package(CUDA)` was successful, I feel there's no need to add logic that requires checking whether it was found when we already have that check here.

ye-luo added inline comments. Oct 6 2020, 9:40 PM

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
Line 95: My point 2 refers to CUDA_ARCH here, which gets into the compile line; that is your point 1 issue. Renaming your output variable to CUDA_ARCH_MATCH_OUTPUT should solve the problem. I still think it is better to move default_capabilities; it is very natural to have cuda_select_nvcc_arch_flags next to find_package(CUDA) in one place.
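To make the variable collision concrete, here is a standalone sketch runnable with `cmake -P`. It assumes `cuda_select_nvcc_arch_flags` returned `-gencode;arch=compute_70,code=sm_70` on an sm_70 host (a hypothetical value; the real output depends on the detected GPU). Reusing `CUDA_ARCH` as the regex output leaves `sm_70` at the front of the flag list that the `foreach` later extends, which matches the stray `sm_70` argument reported above; storing the match in `CUDA_ARCH_MATCH_OUTPUT`, as suggested, keeps the two apart.

```cmake
# Standalone sketch; run with: cmake -P collision_sketch.cmake
cmake_minimum_required(VERSION 3.13)

# Hypothetical detection result for an sm_70 host (assumption).
set(CUDA_ARCH_FLAGS "-gencode;arch=compute_70,code=sm_70")
set(nvptx_sm_list 35 70)

# Buggy pattern: the regex result is stored in CUDA_ARCH ...
string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH "${CUDA_ARCH_FLAGS}")
# ... and CUDA_ARCH is then reused as the nvcc flag accumulator,
# so the leftover "sm_70" becomes a stray nvcc argument.
foreach(sm IN LISTS nvptx_sm_list)
  set(CUDA_ARCH ${CUDA_ARCH} -gencode arch=compute_${sm},code=sm_${sm})
endforeach()
set(buggy_flags ${CUDA_ARCH})

# Fixed pattern: the regex result gets its own variable.
unset(CUDA_ARCH)
string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH_MATCH_OUTPUT "${CUDA_ARCH_FLAGS}")
foreach(sm IN LISTS nvptx_sm_list)
  set(CUDA_ARCH ${CUDA_ARCH} -gencode arch=compute_${sm},code=sm_${sm})
endforeach()
set(fixed_flags ${CUDA_ARCH})

message(STATUS "buggy: ${buggy_flags}")
message(STATUS "fixed: ${fixed_flags}")
```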

ye-luo added inline comments. Oct 6 2020, 10:13 PM

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt
Line 80:
3. Better than append, but it still needs a check to avoid "35,35".
4. https://github.com/ye-luo/llvm-project/commit/ac5f20f9770e894ff48467a9317ec0649f5c7562 The libomptarget part should be fully working.
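One way to avoid "35,35", sketched as a standalone `cmake -P` script: keep the fallback and the detected capability in an ordinary semicolon-separated CMake list, remove duplicates, and only then join with commas to match the `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES` format. The function `make_default_capabilities` is hypothetical and is not the code from the linked commit; `list(JOIN)` requires CMake 3.12, which LLVM's 3.13.4 minimum covers.

```cmake
# Standalone sketch; run with: cmake -P dedupe_sketch.cmake
cmake_minimum_required(VERSION 3.13)

# Hypothetical helper: build the comma-separated default capability string.
function(make_default_capabilities detected_capability out_var)
  # Always keep sm_35 as the fallback architecture.
  set(caps 35)
  # Only add the detected capability if it is a number >= 35.
  if(detected_capability MATCHES "^[0-9]+$" AND NOT detected_capability LESS 35)
    list(APPEND caps ${detected_capability})
  endif()
  # Avoid "35,35" when the host itself is sm_35.
  list(REMOVE_DUPLICATES caps)
  # The cache variable is comma-separated, so join only at the end.
  list(JOIN caps "," joined)
  set(${out_var} "${joined}" PARENT_SCOPE)
endfunction()

make_default_capabilities(70 caps_sm70)
make_default_capabilities(35 caps_sm35)
make_default_capabilities(30 caps_sm30)
message(STATUS "sm_70 host: ${caps_sm70}")
message(STATUS "sm_35 host: ${caps_sm35}")
message(STATUS "sm_30 host: ${caps_sm30}")
```

Keeping the value as a real list until the last step also sidesteps the missing-comma problem, since `list(APPEND)` never has to know about the comma delimiter.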

Implementing Ye's code. I changed it to output the architecture as a dependency and then set the value where we define the default architecture. I did a full build from scratch using this and got the sm_70 libraries for my machine without needing to specify them in my build script.

Harbormaster completed remote builds in B74352: Diff 296792. Oct 7 2020, 3:20 PM

@ye-luo good?

LGTM

This revision is now accepted and ready to land. Oct 7 2020, 7:55 PM

Nice patch. Thanks!

Screwed up and put the wrong revision in the commit message. Closed in rGd564409946a5a13cb6391fc0fec54dcbd6f6d249

Meinersbur mentioned this in D89974: [driver][CUDA] Use CMake's FindCUDA as default --cuda-path. Oct 22 2020, 11:07 AM

Revision Contents

Path                                                  Size
clang/CMakeLists.txt                                  23 lines
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt   19 lines

Diff 296548

clang/CMakeLists.txt

 set(CLANG_DEFAULT_OBJCOPY "objcopy" CACHE STRING
   "Default objcopy executable to use.")

 set(CLANG_DEFAULT_OPENMP_RUNTIME "libomp" CACHE STRING
   "Default OpenMP runtime used by -fopenmp.")

 # OpenMP offloading requires at least sm_35 because we use shuffle instructions
 # to generate efficient code for reductions and the atomicMax instruction on
 # 64-bit integers in the implementation of conditional lastprivate.
-set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
-  "Default architecture for OpenMP offloading to Nvidia GPUs.")
-string(REGEX MATCH "^sm_([0-9]+)$" MATCHED_ARCH "${CLANG_OPENMP_NVPTX_DEFAULT_ARCH}")
-if (NOT DEFINED MATCHED_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
-  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_35")
+set(CUDA_ARCH_FLAGS "sm_35")
+
+# Try to find the highest architecture the host supports
+if (NOT DEFINED CLANG_OPENMP_NVPTX_DEFAULT_ARCH)
+  find_package(CUDA QUIET)
+  if (CUDA_FOUND)
+    cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS)
+  endif()
+else()
+  set(CUDA_ARCH_FLAGS ${CLANG_OPENMP_NVPTX_DEFAULT_ARCH})
+endif()
+
+string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH ${CUDA_ARCH_FLAGS})
+if (NOT DEFINED CUDA_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
   set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH "sm_35" CACHE STRING
     "Default architecture for OpenMP offloading to Nvidia GPUs." FORCE)
+  message(WARNING "Resetting default architecture for OpenMP offloading to Nvidia GPUs to sm_35")
+else()
+  set(CLANG_OPENMP_NVPTX_DEFAULT_ARCH ${CUDA_ARCH} CACHE STRING
+    "Default architecture for OpenMP offloading to Nvidia GPUs.")
 endif()

 set(CLANG_SYSTEMZ_DEFAULT_ARCH "z10" CACHE STRING "SystemZ Default Arch")

 set(CLANG_VENDOR ${PACKAGE_VENDOR} CACHE STRING
   "Vendor-specific text for showing with version information.")

 if( CLANG_VENDOR )
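A CMake detail that matters for the `if (NOT DEFINED CUDA_ARCH ...)` checks in this diff: `string(REGEX MATCH)` always defines its output variable and sets it to an empty string when nothing matches, so a `NOT DEFINED` test on it cannot detect a failed match. The standalone `cmake -P` sketch below shows that behavior and one alternative check that tests whether the match is empty; the helper name `pick_openmp_nvptx_arch` is hypothetical and is not part of the patch.

```cmake
# Standalone sketch; run with: cmake -P arch_check_sketch.cmake
cmake_minimum_required(VERSION 3.13)

# Hypothetical helper (not part of the patch): choose the default OpenMP
# NVPTX arch from a string such as "sm_70" or nvcc-style -gencode flags.
function(pick_openmp_nvptx_arch arch_flags out_var)
  string(REGEX MATCH "sm_([0-9]+)" arch_match "${arch_flags}")
  # arch_match is defined even without a match (it is then empty),
  # so test its value instead of using DEFINED.
  if(NOT arch_match OR CMAKE_MATCH_1 LESS 35)
    set(${out_var} "sm_35" PARENT_SCOPE)
  else()
    set(${out_var} "${arch_match}" PARENT_SCOPE)
  endif()
endfunction()

pick_openmp_nvptx_arch("-gencode;arch=compute_70,code=sm_70" arch_detected)
pick_openmp_nvptx_arch("sm_30" arch_too_old)
pick_openmp_nvptx_arch("70" arch_no_prefix)
message(STATUS "${arch_detected} ${arch_too_old} ${arch_no_prefix}")

# The output variable is defined (as an empty string) even without a match.
string(REGEX MATCH "sm_([0-9]+)" no_match "compute_70")
```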

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

 set(cuda_src_files
   ...
   ${devicertl_common_directory}/src/sync.cu
   ${devicertl_common_directory}/src/task.cu
   src/target_impl.cu
 )

 set(omp_data_objects ${devicertl_common_directory}/src/omp_data.cu)

 # Get the compute capability the user requested or use SM_35 by default.
-# SM_35 is what clang uses by default.
-set(default_capabilities 35)
+set(compute_capabilities 35)
+if (NOT DEFINED LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES)
+  find_package(CUDA QUIET)
+  if (CUDA_FOUND)
+    cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS)
+    string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH ${CUDA_ARCH_FLAGS})
+    if (NOT DEFINED CUDA_ARCH OR "${CMAKE_MATCH_1}" LESS 35)
+      message(WARNING "Setting default architecture for OpenMP target library to sm_35")
+    else()
+      list(APPEND compute_capabilities ${CMAKE_MATCH_1})
+    endif()
+  endif()
+endif()
+

 if (DEFINED LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
   set(default_capabilities ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY})
   libomptarget_warning_say("LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY is deprecated, please use LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES")
 endif()
-set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES ${default_capabilities} CACHE STRING
+set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES ${compute_capabilities} CACHE STRING
   "List of CUDA Compute Capabilities to be used to compile the NVPTX device RTL.")
 string(REPLACE "," ";" nvptx_sm_list ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES})

 foreach(sm ${nvptx_sm_list})
   set(CUDA_ARCH ${CUDA_ARCH} -gencode arch=compute_${sm},code=sm_${sm})
 endforeach()

 # Override default MAX_SM in src/target_impl.h if requested
 if (DEFINED LIBOMPTARGET_NVPTX_MAX_SM)
   set(MAX_SM_DEFINITION "-DMAX_SM=${LIBOMPTARGET_NVPTX_MAX_SM}")
 endif()

 # Activate RTL message dumps if requested by the user.