This is an archive of the discontinued LLVM Phabricator instance.

Differential D95466

[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs`
ClosedPublic

Authored by tianshilei1992 on Jan 26 2021, 11:26 AM.

Details

Reviewers

jdoerfert
JonChesterfield

Commits

rGe7535f8fedb5: [OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs`

Summary

With D94745, we no longer use the CUDA SDK to compile deviceRTLs, so much of
the CMake code in the project is now unnecessary. This patch cleans up that code
and also drops the CUDA requirement for building the NVPTX deviceRTLs. CUDA
detection is still used, however, to determine whether the tests should be
included. Auto detection of compute capability is enabled by default and can be
disabled by setting the CMake variable
LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF. If auto detection is
enabled and CUDA is found, only the bitcode library for the detected version is
built; otherwise, all supported variants are generated. One drawback of this
patch is that we now generate 96 variants of the bitcode library, for a total of
1485 files to build in a clean build on a non-CUDA system.
LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="" can be used to disable building the
NVPTX deviceRTLs.
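
As a rough illustration of the options described in the summary, the settings could be collected in a CMake initial-cache file and passed with `cmake -C`. The file name and the chosen capabilities (70 and 80) are assumptions made up for this example; only the two variable names come from the patch.

```cmake
# nvptx-cache.cmake (hypothetical file name)
# Usage sketch: cmake -C nvptx-cache.cmake <path to source tree>

# Skip auto detection so the list below is not tied to the local GPU.
set(LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY OFF CACHE BOOL "")

# Build bitcode libraries only for sm_70 and sm_80 (example values).
# Note the semicolon separator: after this patch the value is a CMake list.
set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES "70;80" CACHE STRING "")

# Alternative: an empty list disables the NVPTX deviceRTL build entirely.
# set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES "" CACHE STRING "")
```

As the CMakeLists.txt in this diff is written, each requested value is checked against the default list, so an explicit list seems most likely to be accepted when auto detection is off.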

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
270 ms	x64 windows > Clang.CodeGen::profile-filter.c

Event Timeline

tianshilei1992 created this revision. (Jan 26 2021, 11:26 AM)

Herald added subscribers: guansong, yaxunl, mgorny, jholewinski. (Jan 26 2021, 11:26 AM)

tianshilei1992 requested review of this revision. (Jan 26 2021, 11:26 AM)

Herald added a project: Restricted Project. (Jan 26 2021, 11:26 AM)

Herald added subscribers: openmp-commits, sstefan1.

tianshilei1992 added inline comments. (Jan 26 2021, 11:27 AM)

openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake
Line 15: My editor can avoid all trailing spaces.

Cool, thank you. If I'm reading this right, we use the same CMake variables to pick a compiler as before, and compile every file 96 times to create 96 libraries.

That seems OK, if a bit inefficient. I believe it's only target_impl.cpp that cares about ptx version, so we could reduce the build time by compiling everything else once per SM and using llvm-link to create each output library from the common base plus the ptx-specific target_impl
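
For a sense of scale, the counts in this comment can be reproduced with a small CMake script. The three lists are copied from the CMakeLists.txt in this diff. The assumption that every source file is compiled once per variant follows the reading above, and the "split" figure is only an estimate of the suggested scheme, not something measured.

```cmake
# count_variants.cmake (hypothetical file name); run with: cmake -P count_variants.cmake
set(sm_list 35 37 50 52 53 60 61 62 70 72 75 80)
set(cuda_version_list 110 102 101 100 92 91 90 80)
# Common sources; target_impl.cu is the one extra PTX-sensitive file.
set(common_sources cancel critical data_sharing libcall loop omp_data
    omptarget parallel reduction support sync task)

list(LENGTH sm_list num_sm)               # 12
list(LENGTH cuda_version_list num_cuda)   # 8
list(LENGTH common_sources num_common)    # 12

# One bitcode library per (SM, CUDA version) pair: 12 * 8 = 96.
math(EXPR num_variants "${num_sm} * ${num_cuda}")

# Current scheme: all 13 sources compiled for every variant: 96 * 13 = 1248.
math(EXPR compiles_now "${num_variants} * (${num_common} + 1)")

# Suggested scheme: common sources once per SM, target_impl once per variant:
# 12 * 12 + 96 = 240.
math(EXPR compiles_split "${num_sm} * ${num_common} + ${num_variants}")

message(STATUS "bitcode libraries:      ${num_variants}")
message(STATUS "compile steps (now):    ${compiles_now}")
message(STATUS "compile steps (split):  ${compiles_split}")
```

Both schemes would still need one llvm-link step per library, so the saving is in compile steps only.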

Harbormaster completed remote builds in B86750: Diff 319367. (Jan 26 2021, 12:45 PM)

In D95466#2523434, @JonChesterfield wrote:

That seems OK, if a bit inefficient. I believe it's only target_impl.cpp that cares about ptx version, so we could reduce the build time by compiling everything else once per SM and using llvm-link to create each output library from the common base plus the ptx-specific target_impl

Only target_impl.cu cares about the macro, but every time we invoke the compiler, we need to pass -target-cpu sm_xx. I'm not sure it's safe to assume that the rest of the code can be compiled with an arbitrary SM number.

Spent some time trying to build this without system headers and failed to get the printf->vprintf transform to fire. Installing gcc-multilib and using the glibc headers worked out of the box.

This revision is now accepted and ready to land. (Jan 26 2021, 4:19 PM)

This revision was landed with ongoing or failed builds. (Jan 26 2021, 5:21 PM)

Closed by commit rGe7535f8fedb5: [OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs` (authored by tianshilei1992).

This revision was automatically updated to reflect the committed changes.

tianshilei1992 added a commit: rGe7535f8fedb5: [OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs`.

I don't have CUDA on my system, and now the build is broken:

[ 38%] Building LLVM bitcode target_impl.cu-cuda_80-sm_80.bc
In file included from /w/src/llvm.org/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:14:
In file included from /w/src/llvm.org/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:15:
In file included from /usr/include/assert.h:35:
/usr/include/features.h:424:12: fatal error: 'sys/cdefs.h' file not found
#  include <sys/cdefs.h>
           ^~~~~~~~~~~~~
1 error generated.

In D95466#2525767, @kparzysz wrote:

I don't have CUDA on my system, and now the build is broken:

[ 38%] Building LLVM bitcode target_impl.cu-cuda_80-sm_80.bc
In file included from /w/src/llvm.org/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:14:
In file included from /w/src/llvm.org/openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:15:
In file included from /usr/include/assert.h:35:
/usr/include/features.h:424:12: fatal error: 'sys/cdefs.h' file not found
#  include <sys/cdefs.h>
           ^~~~~~~~~~~~~
1 error generated.

gcc-multilib is needed.

I think this is because nvptx is a 32 bit platform and you're compiling on a 64 bit platform. gcc-multilib will fix.

In general I don't like the dependence on host libc when compiling for a gpu (as they don't have very much in common), but need to debug through printf handling to break that link.

I don't have CUDA, why is this being compiled on my system to begin with?

In D95466#2525874, @kparzysz wrote:

I don't have CUDA, why is this being compiled on my system to begin with?

Because they were not compiled before.

In D95466#2525874, @kparzysz wrote:

I don't have CUDA, why is this being compiled on my system to begin with?

The idea is to compile llvm on systems that don't have cuda installed, such that the toolchain can later compile openmp code that runs on systems that do have cuda + nvptx hardware.

In particular, so that the llvm compiled for linux distributions can be installed from a package manager onto a system that does have a gpu.

However, it's starting to look like some people who are building libomptarget don't have gcc-multilib installed and also don't care about nvptx offloading.

I think we therefore need to do one of the following:

turn this build off by default if cuda is missing (what we used to have)
turn this build off if compilation fails, e.g. by trying to detect multilibs, or otherwise make compilation failures non-fatal
drop the dependency on the host, which is straightforward if we disable printf and otherwise awkward
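
A rough sketch of the second option, modelled on the try_compile_bitcode helper that this patch deletes. The function name, the probe file name, and the choice of `<assert.h>` as the test header are assumptions for illustration; `cuda_compiler` is the variable set in the new CMakeLists.txt. Whether this probe reproduces the exact failure reported above has not been verified.

```cmake
# Hypothetical probe: can the chosen compiler see the host C headers when
# targeting nvptx64? Sets ${output} to TRUE or FALSE in the caller's scope.
function(nvptx_probe_host_headers output compiler)
  set(src "${CMAKE_CURRENT_BINARY_DIR}/nvptx_header_probe.cpp")
  file(WRITE "${src}" "#include <assert.h>\nint probe() { return 0; }\n")
  execute_process(
    COMMAND "${compiler}" -target nvptx64 -x c++ -fsyntax-only "${src}"
    RESULT_VARIABLE result
    OUTPUT_QUIET ERROR_QUIET)
  if (result EQUAL 0)
    set(${output} TRUE PARENT_SCOPE)
  else()
    set(${output} FALSE PARENT_SCOPE)
  endif()
endfunction()

# Intended use, after cuda_compiler has been selected:
nvptx_probe_host_headers(nvptx_host_headers_ok "${cuda_compiler}")
if (NOT nvptx_host_headers_ok)
  message(STATUS "Not building NVPTX deviceRTL: host C headers not usable")
  return()
endif()
```

This turns a missing-header situation into a skipped build rather than a fatal error, at the cost of one extra compiler invocation at configure time.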

turn this build off by default if cuda is missing (what we used to have)

Yes, let's do this and ask the packagers for releases to enable it.

Aside from potentially disabling this build, I actually have multilib installed:

ii  gcc-7-multilib                   7.5.0-3ubuntu1~18.04                amd64        GNU C compiler (multilib support)
ii  gcc-multilib                     4:7.4.0-1ubuntu2.3                  amd64        GNU C compiler (multilib files)

Is there something missing in the cmake files that should make use of it?

In D95466#2526059, @kparzysz wrote:
Aside from potentially disabling this build, I actually have multilib installed:
ii  gcc-7-multilib                   7.5.0-3ubuntu1~18.04                amd64        GNU C compiler (multilib support)
ii  gcc-multilib                     4:7.4.0-1ubuntu2.3                  amd64        GNU C compiler (multilib files)
Is there something missing in the cmake files that should make use of it?

Hmm, I installed both gcc-multilib and g++-multilib, but here we actually only include C headers.

tianshilei1992 mentioned this in D95556: [OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system. (Jan 27 2021, 12:27 PM)

tianshilei1992 mentioned this in rGfb12df4a8e33: [OpenMP][NVPTX] Disable building NVPTX deviceRTL by default on a non-CUDA system. (Jan 27 2021, 2:06 PM)

In D95466#2526059, @kparzysz wrote:
Aside from potentially disabling this build, I actually have multilib installed:
ii  gcc-7-multilib                   7.5.0-3ubuntu1~18.04                amd64        GNU C compiler (multilib support)
ii  gcc-multilib                     4:7.4.0-1ubuntu2.3                  amd64        GNU C compiler (multilib files)
Is there something missing in the cmake files that should make use of it?

The package should be libc-dev on Ubuntu.

Meinersbur mentioned this in D101265: [OpenMP][CMake] Use in-project clang as CUDA->IR compiler. (Apr 25 2021, 5:53 PM)

Revision Contents

Path	Size
openmp/README.rst	2 lines
openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake	22 lines
openmp/libomptarget/cmake/Modules/LibomptargetNVPTXBitcodeLibrary.cmake
openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt	308 lines

Diff 319367

openmp/README.rst

[275 lines not shown]
LIBOMPTARGET_NVPTX_ALTERNATE_HOST_COMPILER = ``""``
Host compiler to use with NVCC. This compiler is not going to be used to		Host compiler to use with NVCC. This compiler is not going to be used to
produce any binary. Instead, this is used to overcome the input compiler		produce any binary. Instead, this is used to overcome the input compiler
checks done by NVCC. E.g. if using a default host compiler that is not		checks done by NVCC. E.g. if using a default host compiler that is not
compatible with NVCC, this option can be use to pass to NVCC a valid compiler		compatible with NVCC, this option can be use to pass to NVCC a valid compiler
to avoid the error.		to avoid the error.

LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES = ``35``		LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES = ``35``
List of CUDA compute capabilities that should be supported by the NVPTX		List of CUDA compute capabilities that should be supported by the NVPTX
device RTL. E.g. for compute capabilities 6.0 and 7.0, the option "60,70"		device RTL. E.g. for compute capabilities 6.0 and 7.0, the option "60;70"
should be used. Compute capability 3.5 is the minimum required.		should be used. Compute capability 3.5 is the minimum required.

LIBOMPTARGET_NVPTX_DEBUG = ``OFF\|ON``		LIBOMPTARGET_NVPTX_DEBUG = ``OFF\|ON``
Enable printing of debug messages from the NVPTX device RTL.		Enable printing of debug messages from the NVPTX device RTL.

Example Usages of CMake		Example Usages of CMake
=======================		=======================

[48 lines not shown]

openmp/libomptarget/cmake/Modules/LibomptargetGetDependencies.cmake

#		#
#//===----------------------------------------------------------------------===//		#//===----------------------------------------------------------------------===//
#//		#//
#// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		#// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
#// See https://llvm.org/LICENSE.txt for license information.		#// See https://llvm.org/LICENSE.txt for license information.
#// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		#// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#//		#//
#//===----------------------------------------------------------------------===//		#//===----------------------------------------------------------------------===//
#		#

# Try to detect in the system several dependencies required by the different		# Try to detect in the system several dependencies required by the different
# components of libomptarget. These are the dependencies we have:		# components of libomptarget. These are the dependencies we have:
#		#
# libelf : required by some targets to handle the ELF files at runtime.		# libelf : required by some targets to handle the ELF files at runtime.
# libffi : required to launch target kernels given function and argument		# libffi : required to launch target kernels given function and argument
# pointers.		# pointers.
# CUDA : required to control offloading to NVIDIA GPUs.		# CUDA : required to control offloading to NVIDIA GPUs.
# VEOS : required to control offloading to NEC Aurora.		# VEOS : required to control offloading to NEC Aurora.

include (FindPackageHandleStandardArgs)		include (FindPackageHandleStandardArgs)

################################################################################		################################################################################
# Looking for libelf...		# Looking for libelf...
[18 lines not shown]
NAMES
elf		elf
PATHS		PATHS
/usr/lib		/usr/lib
/usr/local/lib		/usr/local/lib
/opt/local/lib		/opt/local/lib
/sw/lib		/sw/lib
ENV LIBRARY_PATH		ENV LIBRARY_PATH
ENV LD_LIBRARY_PATH)		ENV LD_LIBRARY_PATH)

set(LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS ${LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIR})		set(LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS ${LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIR})
find_package_handle_standard_args(		find_package_handle_standard_args(
LIBOMPTARGET_DEP_LIBELF		LIBOMPTARGET_DEP_LIBELF
DEFAULT_MSG		DEFAULT_MSG
LIBOMPTARGET_DEP_LIBELF_LIBRARIES		LIBOMPTARGET_DEP_LIBELF_LIBRARIES
LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS)		LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS)

mark_as_advanced(		mark_as_advanced(
LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS		LIBOMPTARGET_DEP_LIBELF_INCLUDE_DIRS
LIBOMPTARGET_DEP_LIBELF_LIBRARIES)		LIBOMPTARGET_DEP_LIBELF_LIBRARIES)

################################################################################		################################################################################
# Looking for libffi...		# Looking for libffi...
################################################################################		################################################################################
find_package(PkgConfig)		find_package(PkgConfig)

pkg_check_modules(LIBOMPTARGET_SEARCH_LIBFFI QUIET libffi)		pkg_check_modules(LIBOMPTARGET_SEARCH_LIBFFI QUIET libffi)

find_path (		find_path (
[25 lines not shown]
PATHS
/opt/local/lib		/opt/local/lib
/sw/lib		/sw/lib
ENV LIBRARY_PATH		ENV LIBRARY_PATH
ENV LD_LIBRARY_PATH)		ENV LD_LIBRARY_PATH)
endif()		endif()

set(LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS ${LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIR})		set(LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS ${LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIR})
find_package_handle_standard_args(		find_package_handle_standard_args(
LIBOMPTARGET_DEP_LIBFFI		LIBOMPTARGET_DEP_LIBFFI
DEFAULT_MSG		DEFAULT_MSG
LIBOMPTARGET_DEP_LIBFFI_LIBRARIES		LIBOMPTARGET_DEP_LIBFFI_LIBRARIES
LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS)		LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS)

mark_as_advanced(		mark_as_advanced(
LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS		LIBOMPTARGET_DEP_LIBFFI_INCLUDE_DIRS
LIBOMPTARGET_DEP_LIBFFI_LIBRARIES)		LIBOMPTARGET_DEP_LIBFFI_LIBRARIES)

################################################################################		################################################################################
# Looking for CUDA...		# Looking for CUDA...
################################################################################		################################################################################
if (CUDA_TOOLKIT_ROOT_DIR)		if (CUDA_TOOLKIT_ROOT_DIR)
set(LIBOMPTARGET_CUDA_TOOLKIT_ROOT_DIR_PRESET TRUE)		set(LIBOMPTARGET_CUDA_TOOLKIT_ROOT_DIR_PRESET TRUE)
endif()		endif()
find_package(CUDA QUIET)		find_package(CUDA QUIET)

# Try to get the highest Nvidia GPU architecture the system supports		# Try to get the highest Nvidia GPU architecture the system supports
if (CUDA_FOUND)		set(LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY TRUE CACHE BOOL
		"Auto detect CUDA Compute Capability if CUDA is detected.")
		if (CUDA_FOUND AND LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY)
cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS)		cuda_select_nvcc_arch_flags(CUDA_ARCH_FLAGS)
string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH_MATCH_OUTPUT ${CUDA_ARCH_FLAGS})		string(REGEX MATCH "sm_([0-9]+)" CUDA_ARCH_MATCH_OUTPUT ${CUDA_ARCH_FLAGS})
if (NOT DEFINED CUDA_ARCH_MATCH_OUTPUT OR "${CMAKE_MATCH_1}" LESS 35)		if (NOT DEFINED CUDA_ARCH_MATCH_OUTPUT OR "${CMAKE_MATCH_1}" LESS 35)
libomptarget_warning_say("Setting Nvidia GPU architecture support for OpenMP target runtime library to sm_35 by default")		libomptarget_warning_say("Setting Nvidia GPU architecture support for OpenMP target runtime library to sm_35 by default")
set(LIBOMPTARGET_DEP_CUDA_ARCH "35")		set(LIBOMPTARGET_DEP_CUDA_ARCH "35")
else()		else()
set(LIBOMPTARGET_DEP_CUDA_ARCH "${CMAKE_MATCH_1}")		set(LIBOMPTARGET_DEP_CUDA_ARCH "${CMAKE_MATCH_1}")
endif()		endif()
endif()		endif()

set(LIBOMPTARGET_DEP_CUDA_FOUND ${CUDA_FOUND})		set(LIBOMPTARGET_DEP_CUDA_FOUND ${CUDA_FOUND})
set(LIBOMPTARGET_DEP_CUDA_INCLUDE_DIRS ${CUDA_INCLUDE_DIRS})		set(LIBOMPTARGET_DEP_CUDA_INCLUDE_DIRS ${CUDA_INCLUDE_DIRS})

mark_as_advanced(		mark_as_advanced(
LIBOMPTARGET_DEP_CUDA_FOUND		LIBOMPTARGET_DEP_CUDA_FOUND
LIBOMPTARGET_DEP_CUDA_INCLUDE_DIRS)		LIBOMPTARGET_DEP_CUDA_INCLUDE_DIRS)

################################################################################		################################################################################
# Looking for CUDA Driver API... (needed for CUDA plugin)		# Looking for CUDA Driver API... (needed for CUDA plugin)
################################################################################		################################################################################

find_library (		find_library (
LIBOMPTARGET_DEP_CUDA_DRIVER_LIBRARIES		LIBOMPTARGET_DEP_CUDA_DRIVER_LIBRARIES
[107 lines not shown]
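
The compute-capability detection in this file relies on a regex capture plus a fallback to sm_35. The following standalone script is a minimal sketch of that logic. The function name and the sample flag strings are invented for the example. It also tests the match result for emptiness rather than with `NOT DEFINED`, on the understanding that `string(REGEX MATCH)` sets its output variable to an empty string when nothing matches.

```cmake
# detect_sm.cmake (hypothetical file name); run with: cmake -P detect_sm.cmake
function(pick_sm_arch output arch_flags)
  # CMAKE_MATCH_1 holds the digits captured after "sm_" on a successful match.
  string(REGEX MATCH "sm_([0-9]+)" matched "${arch_flags}")
  if (NOT matched OR "${CMAKE_MATCH_1}" LESS 35)
    # No architecture found, or one below the supported minimum.
    set(${output} "35" PARENT_SCOPE)
  else()
    set(${output} "${CMAKE_MATCH_1}" PARENT_SCOPE)
  endif()
endfunction()

# Sample inputs (made up): flags naming sm_70, and an empty flag string.
pick_sm_arch(arch "-gencode;arch=compute_70,code=sm_70")
message(STATUS "with sm_70 flags: ${arch}")
pick_sm_arch(arch "")
message(STATUS "with no flags:    ${arch}")
```

With these inputs the script should report 70 for the first call and fall back to 35 for the second.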

openmp/libomptarget/cmake/Modules/LibomptargetNVPTXBitcodeLibrary.cmake

This file was deleted.

	#
	#//===----------------------------------------------------------------------===//
	#//
	#// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	#// See https://llvm.org/LICENSE.txt for license information.
	#// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	#//
	#//===----------------------------------------------------------------------===//
	#

	# We use the compiler and linker provided by the user, attempt to use the one
	# used to build libomptarget or just fail.
	set(LIBOMPTARGET_NVPTX_BCLIB_SUPPORTED FALSE)

	if (NOT LIBOMPTARGET_NVPTX_CUDA_COMPILER STREQUAL "")
	set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER ${LIBOMPTARGET_NVPTX_CUDA_COMPILER})
	elseif(${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
	set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER ${CMAKE_C_COMPILER})
	else()
	return()
	endif()

	# Get compiler directory to try to locate a suitable linker.
	get_filename_component(compiler_dir ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER} DIRECTORY)
	set(llvm_link "${compiler_dir}/llvm-link")

	if (NOT LIBOMPTARGET_NVPTX_BC_LINKER STREQUAL "")
	set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER ${LIBOMPTARGET_NVPTX_BC_LINKER})
	elseif (EXISTS "${llvm_link}")
	# Use llvm-link from the compiler directory.
	set(LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER "${llvm_link}")
	else()
	return()
	endif()

	function(try_compile_bitcode output source)
	set(srcfile ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/src.cu)
	file(WRITE ${srcfile} "${source}\n")
	set(bcfile ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/out.bc)

	# The remaining arguments are the flags to be tested.
	# FIXME: Don't hardcode GPU version. This is currently required because
	# Clang refuses to compile its default of sm_20 with CUDA 9.
	execute_process(
	COMMAND ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER} ${ARGN}
	--cuda-gpu-arch=sm_35 -c ${srcfile} -o ${bcfile}
	RESULT_VARIABLE result
	OUTPUT_QUIET ERROR_QUIET)
	if (result EQUAL 0)
	set(${output} TRUE PARENT_SCOPE)
	else()
	set(${output} FALSE PARENT_SCOPE)
	endif()
	endfunction()

	# Save for which compiler we are going to do the following checks so that we
	# can discard cached values if the user specifies a different value.
	set(discard_cached FALSE)
	if (DEFINED LIBOMPTARGET_NVPTX_CHECKED_CUDA_COMPILER AND
	NOT("${LIBOMPTARGET_NVPTX_CHECKED_CUDA_COMPILER}" STREQUAL "${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER}"))
	set(discard_cached TRUE)
	endif()
	set(LIBOMPTARGET_NVPTX_CHECKED_CUDA_COMPILER "${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER}" CACHE INTERNAL "" FORCE)

	function(check_bitcode_compilation output source)
	if (${discard_cached} OR NOT DEFINED ${output})
	message(STATUS "Performing Test ${output}")
	# Forward additional arguments which contain the flags.
	try_compile_bitcode(result "${source}" ${ARGN})
	set(${output} ${result} CACHE INTERNAL "" FORCE)
	if(${result})
	message(STATUS "Performing Test ${output} - Success")
	else()
	message(STATUS "Performing Test ${output} - Failed")
	endif()
	endif()
	endfunction()

	# These flags are required to emit LLVM Bitcode. We check them together because
	# if any of them are not supported, there is no point in finding out which are.
	set(compiler_flags_required -emit-llvm -O1 --cuda-device-only -std=c++14 --cuda-path=${CUDA_TOOLKIT_ROOT_DIR})
	set(compiler_flags_required_src "extern \"C\" __device__ int thread() { return threadIdx.x; }")
	check_bitcode_compilation(LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_FLAGS_REQUIRED "${compiler_flags_required_src}" ${compiler_flags_required})

	# It makes no sense to continue given that the compiler doesn't support
	# emitting basic LLVM Bitcode
	if (NOT LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_FLAGS_REQUIRED)
	return()
	endif()

	set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FLAGS ${compiler_flags_required})

	# Declaring external shared device variables might need an additional flag
	# since Clang 7.0 and was entirely unsupported since version 4.0.
	set(extern_device_shared_src "extern __device__ __shared__ int test;")

	check_bitcode_compilation(LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_EXTERN_SHARED "${extern_device_shared_src}" ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FLAGS})
	if (NOT LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_EXTERN_SHARED)
	set(compiler_flag_fcuda_rdc -fcuda-rdc)
	set(compiler_flag_fcuda_rdc_full ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FLAGS} ${compiler_flag_fcuda_rdc})
	check_bitcode_compilation(LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_FCUDA_RDC "${extern_device_shared_src}" ${compiler_flag_fcuda_rdc_full})

	if (NOT LIBOMPTARGET_NVPTX_CUDA_COMPILER_SUPPORTS_FCUDA_RDC)
	return()
	endif()

	set(LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER_FLAGS "${compiler_flag_fcuda_rdc_full}")
	endif()

	# We can compile LLVM Bitcode from CUDA source code!
	set(LIBOMPTARGET_NVPTX_BCLIB_SUPPORTED TRUE)

openmp/libomptarget/deviceRTLs/nvptx/CMakeLists.txt

	##===----------------------------------------------------------------------===##			##===----------------------------------------------------------------------===##
	#			#
	# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	# See https://llvm.org/LICENSE.txt for license information.			# See https://llvm.org/LICENSE.txt for license information.
	# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	#			#
	##===----------------------------------------------------------------------===##			##===----------------------------------------------------------------------===##
	#			#
	# Build the NVPTX (CUDA) Device RTL if the CUDA tools are available			# Build the NVPTX (CUDA) Device RTL if the CUDA tools are available
	#			#
	##===----------------------------------------------------------------------===##			##===----------------------------------------------------------------------===##

				# Check if we can create an LLVM bitcode implementation of the runtime library
				# that could be inlined in the user application. For that we need to find
				# a Clang compiler capable of compiling our CUDA files to LLVM bitcode and
				# an LLVM linker.
				set(LIBOMPTARGET_NVPTX_CUDA_COMPILER "" CACHE STRING
				"Location of a CUDA compiler capable of emitting LLVM bitcode.")
				set(LIBOMPTARGET_NVPTX_BC_LINKER "" CACHE STRING
				"Location of a linker capable of linking LLVM bitcode objects.")

				if (NOT LIBOMPTARGET_NVPTX_CUDA_COMPILER STREQUAL "")
				set(cuda_compiler ${LIBOMPTARGET_NVPTX_CUDA_COMPILER})
				elseif(${CMAKE_C_COMPILER_ID} STREQUAL "Clang")
				set(cuda_compiler ${CMAKE_C_COMPILER})
				else()
				libomptarget_say("Not building NVPTX deviceRTL: clang not found")
				return()
				endif()

				# Get compiler directory to try to locate a suitable linker.
				get_filename_component(compiler_dir ${cuda_compiler} DIRECTORY)
				set(llvm_link "${compiler_dir}/llvm-link")

				if (NOT LIBOMPTARGET_NVPTX_BC_LINKER STREQUAL "")
				set(bc_linker ${LIBOMPTARGET_NVPTX_BC_LINKER})
				elseif (EXISTS ${llvm_link})
				set(bc_linker ${llvm_link})
				else()
				libomptarget_say("Not building NVPTX deviceRTL: llvm-link not found")
				return()
				endif()

	# TODO: This part needs to be refined when libomptarget is going to support			# TODO: This part needs to be refined when libomptarget is going to support
	# Windows!			# Windows!
	# TODO: This part can also be removed if we can change the clang driver to make			# TODO: This part can also be removed if we can change the clang driver to make
	# it support device only compilation.			# it support device only compilation.
	if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64")			if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64")
	set(aux_triple x86_64-unknown-linux-gnu)			set(aux_triple x86_64-unknown-linux-gnu)
	elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "ppc64le")			elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "ppc64le")
	set(aux_triple powerpc64le-unknown-linux-gnu)			set(aux_triple powerpc64le-unknown-linux-gnu)
	elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "aarch64")			elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "aarch64")
	set(aux_triple aarch64-unknown-linux-gnu)			set(aux_triple aarch64-unknown-linux-gnu)
	else()			else()
	libomptarget_say("Not building CUDA offloading device RTL: unknown host arch: ${CMAKE_HOST_SYSTEM_PROCESSOR}")			libomptarget_say("Not building CUDA offloading device RTL: unknown host arch: ${CMAKE_HOST_SYSTEM_PROCESSOR}")
	return()			return()
	endif()			endif()

	get_filename_component(devicertl_base_directory			get_filename_component(devicertl_base_directory
	${CMAKE_CURRENT_SOURCE_DIR}			${CMAKE_CURRENT_SOURCE_DIR}
	DIRECTORY)			DIRECTORY)
	set(devicertl_common_directory			set(devicertl_common_directory
	${devicertl_base_directory}/common)			${devicertl_base_directory}/common)
	set(devicertl_nvptx_directory			set(devicertl_nvptx_directory
	${devicertl_base_directory}/nvptx)			${devicertl_base_directory}/nvptx)

	if(LIBOMPTARGET_DEP_CUDA_FOUND)			if (DEFINED LIBOMPTARGET_DEP_CUDA_ARCH)
	# Build library support for the highest compute capability the system supports			set(default_capabilities ${LIBOMPTARGET_DEP_CUDA_ARCH})
	# and always build support for sm_35 by default
	if (${LIBOMPTARGET_DEP_CUDA_ARCH} EQUAL 35)
	set(default_capabilities 35)
	else()			else()
	set(default_capabilities "35,${LIBOMPTARGET_DEP_CUDA_ARCH}")			set(default_capabilities 35 37 50 52 53 60 61 62 70 72 75 80)
	endif()			endif()

	if (DEFINED LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY)
	set(default_capabilities ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY})
	libomptarget_warning_say("LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITY is deprecated, please use LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES")
	endif()
	set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES ${default_capabilities} CACHE STRING			set(LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES ${default_capabilities} CACHE STRING
	"List of CUDA Compute Capabilities to be used to compile the NVPTX device RTL.")			"List of CUDA Compute Capabilities to be used to compile the NVPTX device RTL.")
	string(REPLACE "," ";" nvptx_sm_list ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES})

				set(nvptx_sm_list ${LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES})

				# If user set LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES to empty, we disable the
				# build.
				if (NOT nvptx_sm_list)
				libomptarget_say("Not building CUDA offloading device RTL: empty compute capability list")
				return()
				endif()

				# Check all SM values
	foreach(sm ${nvptx_sm_list})			foreach(sm ${nvptx_sm_list})
	set(CUDA_ARCH ${CUDA_ARCH} -gencode arch=compute_${sm},code=sm_${sm})			if (NOT ${sm} IN_LIST default_capabilities)
				message(FATAL_ERROR "LIBOMPTARGET-NVPTX: compute capability ${sm} is not supported. Supported values: ${default_capabilities}")
				endif()
	endforeach()			endforeach()

	# Override default MAX_SM in src/target_impl.h if requested			# Override default MAX_SM in src/target_impl.h if requested
	if (DEFINED LIBOMPTARGET_NVPTX_MAX_SM)			if (DEFINED LIBOMPTARGET_NVPTX_MAX_SM)
	set(MAX_SM_DEFINITION "-DMAX_SM=${LIBOMPTARGET_NVPTX_MAX_SM}")			set(MAX_SM_DEFINITION "-DMAX_SM=${LIBOMPTARGET_NVPTX_MAX_SM}")
	endif()			endif()

	# Activate RTL message dumps if requested by the user.			# Activate RTL message dumps if requested by the user.
	set(LIBOMPTARGET_NVPTX_DEBUG FALSE CACHE BOOL			set(LIBOMPTARGET_NVPTX_DEBUG FALSE CACHE BOOL
	"Activate NVPTX device RTL debug messages.")			"Activate NVPTX device RTL debug messages.")

	# Check if we can create an LLVM bitcode implementation of the runtime library
	# that could be inlined in the user application. For that we need to find
	# a Clang compiler capable of compiling our CUDA files to LLVM bitcode and
	# an LLVM linker.
	set(LIBOMPTARGET_NVPTX_CUDA_COMPILER "" CACHE STRING
	"Location of a CUDA compiler capable of emitting LLVM bitcode.")
	set(LIBOMPTARGET_NVPTX_BC_LINKER "" CACHE STRING
	"Location of a linker capable of linking LLVM bitcode objects.")

	include(LibomptargetNVPTXBitcodeLibrary)

	if (LIBOMPTARGET_NVPTX_BCLIB_SUPPORTED)
	libomptarget_say("Building CUDA LLVM bitcode offloading device RTL.")			libomptarget_say("Building CUDA LLVM bitcode offloading device RTL.")

	set(cuda_src_files			set(cuda_src_files
	${devicertl_common_directory}/src/cancel.cu			${devicertl_common_directory}/src/cancel.cu
	${devicertl_common_directory}/src/critical.cu			${devicertl_common_directory}/src/critical.cu
	${devicertl_common_directory}/src/data_sharing.cu			${devicertl_common_directory}/src/data_sharing.cu
	${devicertl_common_directory}/src/libcall.cu			${devicertl_common_directory}/src/libcall.cu
	${devicertl_common_directory}/src/loop.cu			${devicertl_common_directory}/src/loop.cu
	${devicertl_common_directory}/src/omp_data.cu			${devicertl_common_directory}/src/omp_data.cu
	${devicertl_common_directory}/src/omptarget.cu			${devicertl_common_directory}/src/omptarget.cu
	${devicertl_common_directory}/src/parallel.cu			${devicertl_common_directory}/src/parallel.cu
	${devicertl_common_directory}/src/reduction.cu			${devicertl_common_directory}/src/reduction.cu
	${devicertl_common_directory}/src/support.cu			${devicertl_common_directory}/src/support.cu
	${devicertl_common_directory}/src/sync.cu			${devicertl_common_directory}/src/sync.cu
	${devicertl_common_directory}/src/task.cu			${devicertl_common_directory}/src/task.cu
	src/target_impl.cu			src/target_impl.cu
	)			)

	# Set flags for LLVM Bitcode compilation.			# Set flags for LLVM Bitcode compilation.
	set(bc_flags -S -x c++			set(bc_flags -S -x c++
	-target nvptx64			-target nvptx64
	-Xclang -emit-llvm-bc			-Xclang -emit-llvm-bc
	-Xclang -aux-triple -Xclang ${aux_triple}			-Xclang -aux-triple -Xclang ${aux_triple}
	-fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device			-fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device
	-D__CUDACC__			-D__CUDACC__
	-I${devicertl_base_directory}			-I${devicertl_base_directory}
	-I${devicertl_nvptx_directory}/src)			-I${devicertl_nvptx_directory}/src)

	if(${LIBOMPTARGET_NVPTX_DEBUG})			if(${LIBOMPTARGET_NVPTX_DEBUG})
	list(APPEND bc_flags -DOMPTARGET_NVPTX_DEBUG=-1)			list(APPEND bc_flags -DOMPTARGET_NVPTX_DEBUG=-1)
	else()			else()
	list(APPEND bc_flags -DOMPTARGET_NVPTX_DEBUG=0)			list(APPEND bc_flags -DOMPTARGET_NVPTX_DEBUG=0)
	endif()			endif()

	# Create target to build all Bitcode libraries.			# Create target to build all Bitcode libraries.
	add_custom_target(omptarget-nvptx-bc)			add_custom_target(omptarget-nvptx-bc)

	# This map is from clang/lib/Driver/ToolChains/Cuda.cpp.			# This map is from clang/lib/Driver/ToolChains/Cuda.cpp.
	# The last element is the default case.			# The last element is the default case.
	set(cuda_version_list 110 102 101 100 92 91 90 80)			set(cuda_version_list 110 102 101 100 92 91 90 80)
	set(ptx_feature_list 70 65 64 63 61 61 60 42)			set(ptx_feature_list 70 65 64 63 61 61 60 42)
	# The following two lines of ugly code is not needed when the minimal CMake			# The following two lines of ugly code is not needed when the minimal CMake
	# version requirement is 3.17+.			# version requirement is 3.17+.
	list(LENGTH cuda_version_list num_version_supported)			list(LENGTH cuda_version_list num_version_supported)
	math(EXPR loop_range "${num_version_supported} - 1")			math(EXPR loop_range "${num_version_supported} - 1")

	# Generate a Bitcode library for all the compute capabilities the user			# Generate a Bitcode library for all the compute capabilities the user
	# requested and all PTX version we know for now.			# requested and all PTX version we know for now.
	foreach(sm ${nvptx_sm_list})			foreach(sm ${nvptx_sm_list})
	set(sm_flags -Xclang -target-cpu -Xclang sm_${sm} "-D__CUDA_ARCH__=${sm}0")			set(sm_flags -Xclang -target-cpu -Xclang sm_${sm} "-D__CUDA_ARCH__=${sm}0")

	# Uncomment the following code and remove those ugly part if the feature			# Uncomment the following code and remove those ugly part if the feature
	# is available.			# is available.
	# foreach(cuda_version ptx_num IN ZIP_LISTS cuda_version_list ptx_feature_list)			# foreach(cuda_version ptx_num IN ZIP_LISTS cuda_version_list ptx_feature_list)
	foreach(itr RANGE ${loop_range})			foreach(itr RANGE ${loop_range})
	list(GET cuda_version_list ${itr} cuda_version)			list(GET cuda_version_list ${itr} cuda_version)
	list(GET ptx_feature_list ${itr} ptx_num)			list(GET ptx_feature_list ${itr} ptx_num)
	set(cuda_flags ${sm_flags})			set(cuda_flags ${sm_flags})
	list(APPEND cuda_flags -Xclang -target-feature -Xclang +ptx${ptx_num})			list(APPEND cuda_flags -Xclang -target-feature -Xclang +ptx${ptx_num})
	list(APPEND cuda_flags "-DCUDA_VERSION=${cuda_version}00")			list(APPEND cuda_flags "-DCUDA_VERSION=${cuda_version}00")

	set(bc_files "")			set(bc_files "")
	foreach(src ${cuda_src_files})			foreach(src ${cuda_src_files})
	get_filename_component(infile ${src} ABSOLUTE)			get_filename_component(infile ${src} ABSOLUTE)
	get_filename_component(outfile ${src} NAME)			get_filename_component(outfile ${src} NAME)
	set(outfile "${outfile}-cuda_${cuda_version}-sm_${sm}.bc")			set(outfile "${outfile}-cuda_${cuda_version}-sm_${sm}.bc")

	add_custom_command(OUTPUT ${outfile}			add_custom_command(OUTPUT ${outfile}
	COMMAND ${LIBOMPTARGET_NVPTX_SELECTED_CUDA_COMPILER} ${bc_flags}			COMMAND ${cuda_compiler} ${bc_flags}
	${cuda_flags} ${MAX_SM_DEFINITION} ${infile} -o ${outfile}			${cuda_flags} ${MAX_SM_DEFINITION} ${infile} -o ${outfile}
	DEPENDS ${infile}			DEPENDS ${infile}
	IMPLICIT_DEPENDS CXX ${infile}			IMPLICIT_DEPENDS CXX ${infile}
	COMMENT "Building LLVM bitcode ${outfile}"			COMMENT "Building LLVM bitcode ${outfile}"
	VERBATIM			VERBATIM
	)			)
	set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})			set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})

	list(APPEND bc_files ${outfile})			list(APPEND bc_files ${outfile})
	endforeach()			endforeach()

	set(bclib_name "libomptarget-nvptx-cuda_${cuda_version}-sm_${sm}.bc")			set(bclib_name "libomptarget-nvptx-cuda_${cuda_version}-sm_${sm}.bc")

	# Link to a bitcode library.			# Link to a bitcode library.
	add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}			add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
	COMMAND ${LIBOMPTARGET_NVPTX_SELECTED_BC_LINKER}			COMMAND ${bc_linker}
	-o ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} ${bc_files}			-o ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} ${bc_files}
	DEPENDS ${bc_files}			DEPENDS ${bc_files}
	COMMENT "Linking LLVM bitcode ${bclib_name}"			COMMENT "Linking LLVM bitcode ${bclib_name}"
	)			)
	set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${bclib_name})			set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${bclib_name})

	set(bclib_target_name "omptarget-nvptx-cuda_${cuda_version}-sm_${sm}-bc")			set(bclib_target_name "omptarget-nvptx-cuda_${cuda_version}-sm_${sm}-bc")

	add_custom_target(${bclib_target_name} ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name})			add_custom_target(${bclib_target_name} ALL DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name})
	add_dependencies(omptarget-nvptx-bc ${bclib_target_name})			add_dependencies(omptarget-nvptx-bc ${bclib_target_name})

	# Copy library to destination.			# Copy library to destination.
	add_custom_command(TARGET ${bclib_target_name} POST_BUILD			add_custom_command(TARGET ${bclib_target_name} POST_BUILD
	COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}			COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
	${LIBOMPTARGET_LIBRARY_DIR})			${LIBOMPTARGET_LIBRARY_DIR})

	# Install bitcode library under the lib destination folder.			# Install bitcode library under the lib destination folder.
	install(FILES ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} DESTINATION "${OPENMP_INSTALL_LIBDIR}")			install(FILES ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name} DESTINATION "${OPENMP_INSTALL_LIBDIR}")
	endforeach()			endforeach()
	endforeach()			endforeach()
	endif()

				# Test will be enabled if the building machine supports CUDA
				if (LIBOMPTARGET_DEP_CUDA_FOUND)
	add_subdirectory(test)			add_subdirectory(test)
	else()
	libomptarget_say("Not building CUDA offloading device RTL: tools to build bc lib not found in the system.")
	endif()			endif()
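
Note on the commented-out `foreach(... IN ZIP_LISTS ...)` line in the hunk above: the diff walks `cuda_version_list` and `ptx_feature_list` in lockstep using `list(LENGTH)`, `math(EXPR)` and an index loop, because `ZIP_LISTS` is only available from CMake 3.17. The following is a minimal standalone sketch of what the 3.17+ form could look like, not part of the patch. The two lists are copied from the diff; the `message()` call is only there for illustration, and running it with `cmake -P` is an assumption about how one might try it out.

```cmake
# Standalone sketch; run with: cmake -P zip_lists_sketch.cmake
# (the file name is hypothetical). Requires CMake 3.17 or newer.
cmake_minimum_required(VERSION 3.17)

# Lists copied from the diff: CUDA versions and their matching PTX features.
set(cuda_version_list 110 102 101 100 92 91 90 80)
set(ptx_feature_list 70 65 64 63 61 61 60 42)

# ZIP_LISTS iterates both lists in lockstep, so no list(LENGTH),
# math(EXPR) or list(GET) bookkeeping is needed.
foreach(cuda_version ptx_num IN ZIP_LISTS cuda_version_list ptx_feature_list)
  # Mirrors the flags built in the diff for each (version, feature) pair.
  message(STATUS "-DCUDA_VERSION=${cuda_version}00 with -target-feature +ptx${ptx_num}")
endforeach()
```

With this form, the `num_version_supported` and `loop_range` helper variables and the two `list(GET ...)` calls inside the inner loop would no longer be needed.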