This is an archive of the discontinued LLVM Phabricator instance.

Differential D101213

[libomptarget] Enable AMDGPU devicertl
ClosedPublic

Authored by JonChesterfield on Apr 23 2021, 5:18 PM.

Details

Reviewers

jdoerfert
tianshilei1992
grokos
ronlieb
ye-luo

Commits

rG58f125493d3c: [libomptarget] Enable AMDGPU devicertl

Summary

[libomptarget] Enable AMDGPU devicertl

The amdgpu devicertl is written in freestanding openmp and compiles to a
bitcode library (per listed gfx arch) with no unresolved symbols. It requires
a recent clang, preferably the one from the same monorepo checkout.

This is D98658, with printf explicitly stubbed out, after patching clang to no
longer require an llvm with the amdgpu target enabled.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision. Apr 23 2021, 5:18 PM

Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Apr 23 2021, 5:18 PM

JonChesterfield requested review of this revision. Apr 23 2021, 5:18 PM

Herald added a project: Restricted Project. Apr 23 2021, 5:18 PM

Herald added subscribers: openmp-commits, sstefan1, wdng.

LG. The only thing missing here is a check in CMake: if the compiler doesn't support a target (whether NVPTX or AMDGCN), we need to disable the corresponding device RTL. I'll add it once I'm done with my current work.

This revision is now accepted and ready to land. Apr 23 2021, 5:23 PM

There are some interesting configuration choices.

In particular, we can build the nvptx or amdgcn device bitcode with a (recent/same-commit) clang, even if the llvm doesn't have either of those targets enabled. If that same clang is later used to build an application, it (hopefully!) errors about the missing target at that point.

We currently use the existence of CUDA to control whether to run tests on nvptx. That's not strictly the right test: a machine may have CUDA installed but no nvptx hardware.

I'm trying to get a simplified cmake (based on llvm-config) to run, will hold off on committing this until I can tell if that's going to work out.

Edit: that didn't work. llvm-config was not found during a runtimes build for some reason. Will land this and iterate in tree.

detect aux triple, following nvptx

This revision was landed with ongoing or failed builds. Apr 23 2021, 6:25 PM

Closed by commit rG58f125493d3c: [libomptarget] Enable AMDGPU devicertl (authored by JonChesterfield).

This revision was automatically updated to reflect the committed changes.

JonChesterfield added a commit: rG58f125493d3c: [libomptarget] Enable AMDGPU devicertl.

Harbormaster completed remote builds in B100709: Diff 340208. Apr 23 2021, 6:56 PM

Harbormaster completed remote builds in B100718: Diff 340218. Apr 23 2021, 7:33 PM

I think we are in dire need of a mechanism that tests whether the compiler can actually compile the source code, without making any assumptions. Each directory should be included ONLY if the compiler qualifies; otherwise CMake should exit that directory early. I have now found multiple issues:

- libomptarget depends on LLVM components, but when I try to build OpenMP standalone with GCC, it causes failure of CMake configuration.
- The AMDGCN device runtime compilation fails because my Clang doesn't support AMDGCN backend. This failure can block the whole compilation of OpenMP.

I'll make a patch that performs the corresponding detection before including any directory in libomptarget. By default everything will be disabled.

In D101213#2762233, @tianshilei1992 wrote:

> The AMDGCN device runtime compilation fails because my Clang doesn't support AMDGCN backend. This failure can block the whole compilation of OpenMP.

For what it's worth, that's a bug in clang that has been fixed. I guess it was introduced before the last release and fixed after, so there's a window of vulnerability there.

In D101213#2762248, @JonChesterfield wrote:

> In D101213#2762233, @tianshilei1992 wrote:
>
> > The AMDGCN device runtime compilation fails because my Clang doesn't support AMDGCN backend. This failure can block the whole compilation of OpenMP.
>
> For what it's worth, that's a bug in clang that has been fixed. I guess it was introduced before the last release and fixed after, so there's a window of vulnerability there.

I'm using the latest trunk, and I can still observe:

[31/174] Generating sync.gfx900.bc
FAILED: libomptarget/deviceRTLs/amdgcn/sync.gfx900.bc
cd /nvm/0/shiltian/build/openmp/debug/libomptarget/deviceRTLs/amdgcn && /home/shiltian/.local/llvm-12.0.0/bin/clang -xc++ -c -std=c++14 -ffreestanding -target amdgcn-amd-amdhsa -emit-llvm -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__AMDGCN__ -Xclang -target-cpu -Xclang gfx900 -fvisibility=default -Wno-unused-value -nogpulib -O2 -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/common/include -I/home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs /home/shiltian/Documents/vscode/llvm-project/openmp/libomptarget/deviceRTLs/common/src/sync.cu -o sync.gfx900.bc
clang (LLVM option parsing): Unknown command line argument '--amdhsa-code-object-version=3'.  Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--pgo-memop-max-version=3'?

Actually, interestingly, although I already set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER in CMake configuration, it still uses the clang in my $PATH.

That's annoying. D101095 fixed that at the time, and hasn't been reverted, though I don't have a CI system to catch regressions.

Revision Contents

Path                                                     Size
openmp/libomptarget/deviceRTLs/CMakeLists.txt            1 line
openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt     16 lines
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h  6 lines

Diff 340220

openmp/libomptarget/deviceRTLs/CMakeLists.txt

 ##===----------------------------------------------------------------------===##
 #
 # Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 # See https://llvm.org/LICENSE.txt for license information.
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 #
 # ##===----------------------------------------------------------------------===##
 #
 # Build a device RTL for each available machine.
 #
 ##===----------------------------------------------------------------------===##
 
+add_subdirectory(amdgcn)
 add_subdirectory(nvptx)

openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt

[30 unchanged lines not shown]
 set(AOMP_INSTALL_PREFIX ${LLVM_INSTALL_PREFIX})
 
 if (AOMP_INSTALL_PREFIX)
   set(AOMP_BINDIR ${AOMP_INSTALL_PREFIX}/bin)
 else()
   set(AOMP_BINDIR ${LLVM_BUILD_BINARY_DIR}/bin)
 endif()
 
+# Copied from nvptx CMakeLists
+if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "x86_64")
+  set(aux_triple x86_64-unknown-linux-gnu)
+elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "ppc64le")
+  set(aux_triple powerpc64le-unknown-linux-gnu)
+elseif(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "aarch64")
+  set(aux_triple aarch64-unknown-linux-gnu)
+else()
+  libomptarget_say("Not building AMDGCN device RTL: unknown host arch: ${CMAKE_HOST_SYSTEM_PROCESSOR}")
+  return()
+endif()
+
 libomptarget_say("Building AMDGCN device RTL. LLVM_COMPILER_PATH=${AOMP_BINDIR}")
 
 project(omptarget-amdgcn)
 
 add_custom_target(omptarget-amdgcn ALL)
 
 #optimization level
 set(optimization_level 2)
[52 unchanged lines not shown]
 macro(add_cuda_bc_library)
   set(cu_cmd ${AOMP_BINDIR}/clang++
     -xc++
     -c
     -std=c++14
     -ffreestanding
     -target amdgcn-amd-amdhsa
     -emit-llvm
-    -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu # see nvptx
+    -Xclang -aux-triple -Xclang ${aux_triple}
     -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device
     -D__AMDGCN__
     -Xclang -target-cpu -Xclang ${mcpu}
     -fvisibility=default
     -Wno-unused-value
     -nogpulib
     -O${optimization_level}
     ${CUDA_DEBUG}
[33 unchanged lines not shown; enclosing context: foreach(mcpu ${mcpus})]
   add_custom_command(
     OUTPUT ${bc_libname}
     COMMAND ${AOMP_BINDIR}/llvm-link ${bc_files} \| ${AOMP_BINDIR}/opt --always-inline -o ${OUTPUTDIR}/${bc_libname}
     DEPENDS ${bc_files})
 
   add_custom_target(lib${libname}-${mcpu} ALL DEPENDS ${bc_libname})
 
   install(FILES ${OUTPUTDIR}/${bc_libname}
-    DESTINATION "${OPENMP_INSTALL_LIBDIR}/libdevice"
+    DESTINATION "${OPENMP_INSTALL_LIBDIR}"
   )
 endforeach()

openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h

[61 unchanged lines not shown; enclosing context: enum DATA_SHARING_SIZES {]
   // The maximum number of warps in use
   DS_Max_Warp_Number = 16,
 };
 
 enum : __kmpc_impl_lanemask_t {
   __kmpc_impl_all_lanes = ~(__kmpc_impl_lanemask_t)0
 };
 
-EXTERN int printf(const char *, ...);
+// The return code of printf is not checked in the call sites in this library.
+// A call to a function named printf currently hits some special case handling
+// for opencl, which translates to calls that do not presently exist for openmp
+// Therefore, for now, stub out printf while building this library.
+#define printf(...)
 
 #endif