Details

Reviewers

jdoerfert
tianshilei1992
jhuber6
ronlieb

Commits

rG4d50803ce49c: [libomptarget] Build DeviceRTL for amdgpu
rG33427fdb7b52: [libomptarget] Build DeviceRTL for amdgpu

Summary

Passes same tests as the current deviceRTL. Includes cmake change from D111987.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision. Oct 21 2021, 8:28 AM

Herald added subscribers: kerbowa, t-tye, tpr and 6 others. Oct 21 2021, 8:28 AM

JonChesterfield requested review of this revision. Oct 21 2021, 8:28 AM

Herald added a reviewer: jdoerfert. Oct 21 2021, 8:28 AM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: openmp-commits, cfe-commits, sstefan1, wdng.

JonChesterfield edited the summary of this revision. Oct 21 2021, 8:28 AM

JonChesterfield added inline comments.

openmp/libomptarget/DeviceRTL/src/Configuration.cpp
23	Otherwise the missing symbol prevents linking; it is not clear why it works on nvptx64.
openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
71–87	This is not good. We need to revise sema checking on these intrinsics and add some lowering in clang/llvm that builds the switch. Written longhand here to get things running.

jdoerfert added inline comments. Oct 21 2021, 8:56 AM

openmp/libomptarget/DeviceRTL/src/Configuration.cpp
23	linking what? Clang emits the symbol, maybe just not for amdgpu.

JonChesterfield added inline comments. Oct 21 2021, 9:02 AM

openmp/libomptarget/DeviceRTL/src/Configuration.cpp
23	Where? The only reference I can find to it is here, and it's marked extern.

Subscribed some AMD people to this. I wanted to apply this patch as-is to amd-stg-open to feed it to the internal testing, but it doesn't apply because Driver/ToolChains/AMDGPUOpenMP.cpp in rocm is significantly different from trunk (in particular, the call to addOpenMPDeviceRTL is commented out).

Harbormaster completed remote builds in B129956: Diff 381277. Oct 21 2021, 9:16 AM

rebase

Harbormaster completed remote builds in B130690: Diff 382306. Oct 26 2021, 7:51 AM

Enable tests on amdgpu, with same ones marked xfail/unsupported as on the old runtime

JonChesterfield edited the summary of this revision. Oct 26 2021, 12:37 PM

Harbormaster completed remote builds in B130783: Diff 382421. Oct 26 2021, 12:38 PM

drop outdated comment

delete the extern debug_kind variable

I think this is good enough for now. It drops the not yet used debug variable and writes out the lowering for runtime values of memory ordering manually. The latter will be simplified once clang learns to emit the switch instead of error. Omp lock is a problem I don't have a good solution to.
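The manual lowering described above can be illustrated on the host. The following is a minimal sketch, not the DeviceRTL code: it assumes a callee that only accepts a compile-time ordering (modelled here by a template parameter in `fenceConstant`, a hypothetical name) and a wrapper `fenceRuntime` (also hypothetical) that maps a runtime value onto one call per constant. Unlike the patch, which uses `__builtin_unreachable()` in the default case, this sketch returns -1 so that the unhandled case is observable.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for a builtin that insists on a compile-time ordering
// (an assumption made for illustration; the real builtins are AMDGPU-only).
template <std::memory_order Order> int fenceConstant() {
  std::atomic_thread_fence(Order);
  return static_cast<int>(Order);
}

// Hypothetical wrapper: expands a runtime ordering into one call per constant.
int fenceRuntime(std::memory_order Ordering) {
  switch (Ordering) {
  case std::memory_order_acquire:
    return fenceConstant<std::memory_order_acquire>();
  case std::memory_order_release:
    return fenceConstant<std::memory_order_release>();
  case std::memory_order_acq_rel:
    return fenceConstant<std::memory_order_acq_rel>();
  case std::memory_order_seq_cst:
    return fenceConstant<std::memory_order_seq_cst>();
  default:
    // Orderings not listed above are not dispatched; report that to the caller.
    return -1;
  }
}
```

Calling `fenceRuntime(std::memory_order_release)` issues a release fence through the constant-ordering path. If clang later expands runtime orderings itself, a wrapper of this shape would no longer be needed.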

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
255	'amdgcn' appears to be a subset of 'amdgpu', so this seems a reasonable point to rename it.
openmp/libomptarget/DeviceRTL/CMakeLists.txt
180	rearranging the naming here - the llvm-link file is now prefixed linked_, with the optimised library left without a prefix. Updated depends / output clauses to match.
openmp/libomptarget/DeviceRTL/src/Configuration.cpp
23	I think the cuda toolchain treats unresolved references as 'just use zero', in which case deleting this is a no-op on nvptx. Maybe it's intended to be patched by cuda rtl.cpp in the future? If so can reintroduce it then
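For the naming comments above, the relationship between the driver's suffix and the file the CMake build produces can be sketched as follows. This is an illustrative sketch, not the driver code: `bitcodeSuffix` and `bitcodeLibraryName` are hypothetical helpers, `gfx906` is only an example architecture, and the `libomptarget-<suffix>.bc` pattern is an assumption inferred from the `bclib_name` setting in CMakeLists.txt.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the suffix selection in AMDGPUOpenMP.cpp:
// the new runtime uses "new-amdgpu-", the old runtime keeps "amdgcn-".
std::string bitcodeSuffix(bool NewRuntime, const std::string &GPUArch) {
  return (NewRuntime ? "new-amdgpu-" : "amdgcn-") + GPUArch;
}

// Assumption: the library file is named "libomptarget-<suffix>.bc".
std::string bitcodeLibraryName(bool NewRuntime, const std::string &GPUArch) {
  return "libomptarget-" + bitcodeSuffix(NewRuntime, GPUArch) + ".bc";
}
```

Under these assumptions, `bitcodeLibraryName(true, "gfx906")` matches what `libomptarget-new-${target_name}-${target_cpu}.bc` would expand to when the target name is `amdgpu` and the target CPU is `gfx906`.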

rebase

Harbormaster completed remote builds in B130797: Diff 382441. Oct 26 2021, 1:11 PM

rebase

rebase, having landed D111987 this time

The problem with the missing symbol for __omp_rtl_debug_kind was a local error. I did the initial testing of this with a jury-rigged clang that linked the new bitcode and ignored the old. The generation of this integer is guarded by which runtime clang thinks it is compiling for. Thus, my local clang compiled for the old runtime and linked with the new, which it turns out does not work.

I was under the impression that pointing clang at the new runtime with libomptarget-amdgcn-bc-path was sufficient, but evidently not. Doesn't matter for this patch now that the variable is reinstated.
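For context, the symbol in question feeds the mask in `config::getDebugKind()`, which ANDs the value clang emits at compile time with the one held in the device environment. A minimal host-side sketch of that combination, with `DeviceEnvironment` and the free function `getDebugKind` as hypothetical stand-ins for the real types, and with arbitrarily chosen bit values:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the DebugKind field of the device environment.
struct DeviceEnvironment {
  uint32_t DebugKind;
};

// A debug feature is active only if it is set both in the compile-time value
// (the role __omp_rtl_debug_kind plays) and in the runtime environment.
uint32_t getDebugKind(uint32_t CompiledDebugKind, const DeviceEnvironment &Env) {
  return CompiledDebugKind & Env.DebugKind;
}
```

With a compiled value of `0x3` and an environment value of `0x6`, only the shared bit `0x2` remains set. If the compile-time value is zero, nothing is enabled regardless of the environment.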

JonChesterfield added reviewers: tianshilei1992, jhuber6. Oct 26 2021, 1:39 PM

Harbormaster completed remote builds in B130801: Diff 382446. Oct 26 2021, 1:48 PM

Requires either D112544 or disabling bug51982 on newRTL before landing

LG, I'll land my fix and with this we can switch over to the new RT.

This revision is now accepted and ready to land. Oct 26 2021, 3:54 PM

JonChesterfield mentioned this in D112544: [Attributor][FIX] Use right address space to avoid assertion. Oct 27 2021, 3:14 AM

Closed by commit rG33427fdb7b52: [libomptarget] Build DeviceRTL for amdgpu (authored by JonChesterfield). Oct 27 2021, 4:41 PM

This revision was automatically updated to reflect the committed changes.

JonChesterfield added a commit: rG33427fdb7b52: [libomptarget] Build DeviceRTL for amdgpu.

JonChesterfield added a reverting change: rG6c7b203d1d70: Revert "[libomptarget] Build DeviceRTL for amdgpu". Oct 27 2021, 5:02 PM

Didn't fare very well under CI, investigating. Ten failures at https://lab.llvm.org/buildbot/#/builders/193/builds/915, but they all pass locally.

This revision is now accepted and ready to land. Oct 27 2021, 5:03 PM

JonChesterfield added inline comments. Oct 27 2021, 5:37 PM

openmp/libomptarget/DeviceRTL/src/Synchronization.cpp
197–202	Error here - syncThreadsAligned is deleted but should not be

Reintroduce syncThreadsAligned, dropped in git merge conflict

fix new+old test enabling

JonChesterfield mentioned this in D112680: [OpenMP] Lower printf to __llvm_omp_vprintf. Oct 27 2021, 6:01 PM

Tagging Ron as this is currently stuck on the mystery of passing locally and failing CI.

Harbormaster completed remote builds in B131087: Diff 382866. Oct 27 2021, 6:14 PM

Closed by commit rG4d50803ce49c: [libomptarget] Build DeviceRTL for amdgpu (authored by JonChesterfield). Oct 28 2021, 4:34 AM

This revision was automatically updated to reflect the committed changes.

JonChesterfield added a commit: rG4d50803ce49c: [libomptarget] Build DeviceRTL for amdgpu.

Landed a slightly modified version of this - code and test changes are included, but the tests are not run by default. I'm hopeful this will help the process of working out why ~10 are failing on CI and passing locally.

Diff 382987

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

[246 lines not shown] void AMDGPUOpenMPToolChain::addClangTargetOptions(
    CC1Args.push_back("-fcuda-is-device");

    if (DriverArgs.hasArg(options::OPT_nogpulib))
      return;

    std::string BitcodeSuffix;
    if (DriverArgs.hasFlag(options::OPT_fopenmp_target_new_runtime,
                           options::OPT_fno_openmp_target_new_runtime, false))
-     BitcodeSuffix = "new-amdgcn-" + GPUArch;
+     BitcodeSuffix = "new-amdgpu-" + GPUArch;
    else
      BitcodeSuffix = "amdgcn-" + GPUArch;

    addOpenMPDeviceRTL(getDriver(), DriverArgs, CC1Args, BitcodeSuffix,
                       getTriple());
  }

  llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
[67 lines not shown]

openmp/libomptarget/DeviceRTL/CMakeLists.txt

[171 lines not shown] foreach(src ${src_files})
    endif()
    set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})

    list(APPEND bc_files ${outfile})
  endforeach()

  set(bclib_name "libomptarget-new-${target_name}-${target_cpu}.bc")

  # Link to a bitcode library.
  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name}
    COMMAND ${LINK_TOOL}
    -o ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} ${bc_files}
    DEPENDS ${bc_files}
    COMMENT "Linking LLVM bitcode ${bclib_name}"
  )

  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
[32 lines not shown]
  endfunction()

  # Generate a Bitcode library for all the compute capabilities the user requested
  foreach(sm ${nvptx_sm_list})
    compileDeviceRTLLibrary(sm_${sm} nvptx -target nvptx64 -Xclang -target-feature -Xclang +ptx61 "-D__CUDA_ARCH__=${sm}0")
  endforeach()

  foreach(mcpu ${amdgpu_mcpus})
-   # require D112227 or similar to enable the compilation for amdgpu
-   # compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib)
+   compileDeviceRTLLibrary(${mcpu} amdgpu -target amdgcn-amd-amdhsa -D__AMDGCN__ -fvisibility=default -nogpulib)
  endforeach()

openmp/libomptarget/DeviceRTL/src/Configuration.cpp

[14 lines not shown]
  #include "DeviceEnvironment.h"
  #include "State.h"
  #include "Types.h"

  using namespace _OMP;

  #pragma omp declare target

- extern uint32_t __omp_rtl_debug_kind;
+ extern uint32_t __omp_rtl_debug_kind; // defined by CGOpenMPRuntimeGPU

- // TOOD: We want to change the name as soon as the old runtime is gone.
+ // TODO: We want to change the name as soon as the old runtime is gone.
  DeviceEnvironmentTy CONSTANT(omptarget_device_environment)
      __attribute__((used));

  uint32_t config::getDebugKind() {
    return __omp_rtl_debug_kind & omptarget_device_environment.DebugKind;
  }

  uint32_t config::getNumDevices() {
[16 lines not shown]

openmp/libomptarget/DeviceRTL/src/Synchronization.cpp

[62 lines not shown]
  }
  ///}

  /// AMDGCN Implementation
  ///
  ///{
  #pragma omp begin declare variant match(device = {arch(amdgcn)})

- uint32_t atomicInc(uint32_t *Address, uint32_t Val, int Ordering) {
-   return __builtin_amdgcn_atomic_inc32(Address, Val, Ordering, "");
+ uint32_t atomicInc(uint32_t *A, uint32_t V, int Ordering) {
+   // builtin_amdgcn_atomic_inc32 should expand to this switch when
+   // passed a runtime value, but does not do so yet. Workaround here.
+   switch (Ordering) {
+   default:
+     __builtin_unreachable();
+   case __ATOMIC_RELAXED:
+     return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELAXED, "");
+   case __ATOMIC_ACQUIRE:
+     return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQUIRE, "");
+   case __ATOMIC_RELEASE:
+     return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_RELEASE, "");
+   case __ATOMIC_ACQ_REL:
+     return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_ACQ_REL, "");
+   case __ATOMIC_SEQ_CST:
+     return __builtin_amdgcn_atomic_inc32(A, V, __ATOMIC_SEQ_CST, "");
+   }
  }

  uint32_t SHARED(namedBarrierTracker);

  void namedBarrierInit() {
    // Don't have global ctors, and shared memory is not zero init
    atomic::store(&namedBarrierTracker, 0u, __ATOMIC_RELEASE);
  }
[40 lines not shown] if ((load & 0x0000ffffu) == (NumWaves - 1)) {
        __builtin_amdgcn_s_sleep(0);
        load = atomic::load(&namedBarrierTracker, __ATOMIC_RELAXED);
        } while ((load & 0xffff0000u) == generation);
      }
    }
    fence::team(__ATOMIC_RELEASE);
  }

+ // sema checking of amdgcn_fence is aggressive. Intention is to patch clang
+ // so that it is usable within a template environment and so that a runtime
+ // value of the memory order is expanded to this switch within clang/llvm.
+ void fenceTeam(int Ordering) {
+   switch (Ordering) {
+   default:
+     __builtin_unreachable();
+   case __ATOMIC_ACQUIRE:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
+   case __ATOMIC_RELEASE:
+     return __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
+   case __ATOMIC_ACQ_REL:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQ_REL, "workgroup");
+   case __ATOMIC_SEQ_CST:
+     return __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
+   }
+ }
+ void fenceKernel(int Ordering) {
+   switch (Ordering) {
+   default:
+     __builtin_unreachable();
+   case __ATOMIC_ACQUIRE:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");
+   case __ATOMIC_RELEASE:
+     return __builtin_amdgcn_fence(__ATOMIC_RELEASE, "agent");
+   case __ATOMIC_ACQ_REL:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQ_REL, "agent");
+   case __ATOMIC_SEQ_CST:
+     return __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "agent");
+   }
+ }
+ void fenceSystem(int Ordering) {
+   switch (Ordering) {
+   default:
+     __builtin_unreachable();
+   case __ATOMIC_ACQUIRE:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "");
+   case __ATOMIC_RELEASE:
+     return __builtin_amdgcn_fence(__ATOMIC_RELEASE, "");
+   case __ATOMIC_ACQ_REL:
+     return __builtin_amdgcn_fence(__ATOMIC_ACQ_REL, "");
+   case __ATOMIC_SEQ_CST:
+     return __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "");
+   }
+ }

  void syncWarp(__kmpc_impl_lanemask_t) {
    // AMDGCN doesn't need to sync threads in a warp
  }

  void syncThreads() { __builtin_amdgcn_s_barrier(); }
  void syncThreadsAligned() { syncThreads(); }

- void fenceTeam(int Ordering) { __builtin_amdgcn_fence(Ordering, "workgroup"); }
-
- void fenceKernel(int Ordering) { __builtin_amdgcn_fence(Ordering, "agent"); }
-
- void fenceSystem(int Ordering) { __builtin_amdgcn_fence(Ordering, ""); }
+ // TODO: Don't have wavefront lane locks. Possibly can't have them.
+ void unsetLock(omp_lock_t *) { __builtin_trap(); }
+ int testLock(omp_lock_t *) { __builtin_trap(); }
+ void initLock(omp_lock_t *) { __builtin_trap(); }
+ void destroyLock(omp_lock_t *) { __builtin_trap(); }
+ void setLock(omp_lock_t *) { __builtin_trap(); }

  #pragma omp end declare variant
  ///}

  /// NVPTX Implementation
  ///
  ///{
  #pragma omp begin declare variant match( \
[177 lines not shown]

openmp/libomptarget/test/mapping/data_member_ref.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>

  struct View {
    int Data;
  };

  struct ViewPtr {
[56 lines not shown]

openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <cstdio>
  #include <cstdlib>

  typedef struct {
    int a;
    double *b;
  } C1;
[50 lines not shown]

openmp/libomptarget/test/mapping/declare_mapper_nested_mappers.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <cstdio>
  #include <cstdlib>

  typedef struct {
    int a;
    double *b;
  } C;
[53 lines not shown]

openmp/libomptarget/test/mapping/delete_inf_refcount.c

  // RUN: %libomptarget-compile-run-and-check-generic

  // fails with error message 'Unable to generate target entries' on amdgcn
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>
  #include <omp.h>

  #pragma omp declare target
  int isHost;
  #pragma omp end declare target

[20 lines not shown]

openmp/libomptarget/test/mapping/lambda_by_value.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>
  #include <stdint.h>

  // CHECK: before: [[V1:111]] [[V2:222]] [[PX:0x[^ ]+]] [[PY:0x[^ ]+]]
  // CHECK: lambda: [[V1]] [[V2]] [[PX_TGT:0x[^ ]+]] 0x{{.*}}
  // CHECK: tgt : [[V2]] [[PX_TGT]] 1
  // CHECK: out : [[V2]] [[V2]] [[PX]] [[PY]]
[25 lines not shown]

openmp/libomptarget/test/mapping/ompx_hold/struct.c

  // RUN: %libomptarget-compile-generic -fopenmp-extensions
  // RUN: %libomptarget-run-generic | %fcheck-generic -strict-whitespace

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <omp.h>
  #include <stdio.h>

  #define CHECK_PRESENCE(Var1, Var2, Var3) \
    printf(" presence of %s, %s, %s: %d, %d, %d\n", \
    #Var1, #Var2, #Var3, \
    omp_target_is_present(&(Var1), omp_get_default_device()), \
[192 lines not shown]

openmp/libomptarget/test/mapping/ptr_and_obj_motion.c

  // RUN: %libomptarget-compile-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>

  typedef struct {
    double *dataptr;
    int dummy1;
    int dummy2;
  } DV;
[35 lines not shown]

openmp/libomptarget/test/mapping/reduction_implicit_map.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // amdgcn does not have printf definition
  // UNSUPPORTED: amdgcn-amd-amdhsa
+ // UNSUPPORTED: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>

  void sum(int* input, int size, int* output)
  {
  #pragma omp target teams distribute parallel for reduction(+:output[0]) \
    map(to:input[0:size])
    for (int i = 0; i < size; i++)
[16 lines not shown]

openmp/libomptarget/test/offloading/bug49021.cpp

  // RUN: %libomptarget-compilexx-generic -O3 && %libomptarget-run-generic

  // Wrong results on amdgcn
  // UNSUPPORTED: amdgcn-amd-amdhsa
+ // UNSUPPORTED: amdgcn-amd-amdhsa-newRTL

  #include <iostream>

  template <typename T> int test_map() {
    std::cout << "map(complex<>)" << std::endl;
    T a(0.2), a_check;
  #pragma omp target map(from : a_check)
    { a_check = a; }
[72 lines not shown]

openmp/libomptarget/test/offloading/bug49334.cpp

  // RUN: %libomptarget-compilexx-run-and-check-generic

  // Currently hangs on amdgpu
  // UNSUPPORTED: amdgcn-amd-amdhsa
+ // UNSUPPORTED: amdgcn-amd-amdhsa-newRTL
  // UNSUPPORTED: x86_64-pc-linux-gnu

  #include <cassert>
  #include <iostream>
  #include <memory>
  #include <vector>

  class BlockMatrix {
[136 lines not shown]

openmp/libomptarget/test/offloading/bug50022.cpp

  // RUN: %libomptarget-compilexx-and-run-generic

  // UNSUPPORTED: amdgcn-amd-amdhsa
+ // UNSUPPORTED: amdgcn-amd-amdhsa-newRTL

  #include <cassert>
  #include <iostream>
  #include <stdexcept>

  int main(int argc, char *argv[]) {
    int a = 0;
    std::cout << "outside a = " << a << " addr " << &a << std::endl;
[30 lines not shown]

openmp/libomptarget/test/offloading/global_constructor.cpp

  // RUN: %libomptarget-compilexx-generic && %libomptarget-run-generic | %fcheck-generic

  // Fails in DAGToDAG on an address space problem
  // UNSUPPORTED: amdgcn-amd-amdhsa
+ // UNSUPPORTED: amdgcn-amd-amdhsa-newRTL

  #include <cmath>
  #include <cstdio>

  const double Host = log(2.0) / log(2.0);
  #pragma omp declare target
  const double Device = log(2.0) / log(2.0);
  #pragma omp end declare target
[10 lines not shown]

openmp/libomptarget/test/offloading/host_as_target.c

  // Check that specifying device as omp_get_initial_device():
  // - Doesn't cause the runtime to fail.
  // - Offloads code to the host.
  // - Doesn't transfer data. In this case, just check that neither host data nor
  // default device data are affected by the specified transfers.
  // - Works whether it's specified directly or as the default device.

  // RUN: %libomptarget-compile-run-and-check-generic

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>
  #include <omp.h>

  static void check(char *X, int Dev) {
    printf(" host X = %c\n", *X);
    #pragma omp target device(Dev)
    printf("device X = %c\n", *X);
[133 lines not shown]

openmp/libomptarget/test/unified_shared_memory/api.c

  // RUN: %libomptarget-compile-run-and-check-generic
  // XFAIL: nvptx64-nvidia-cuda
  // XFAIL: nvptx64-nvidia-cuda-newRTL

  // Fails on amdgcn with error: GPU Memory Error
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>
  #include <omp.h>

  // ---------------------------------------------------------------------------
  // Various definitions copied from OpenMP RTL

  extern void __tgt_register_requires(int64_t);
[154 lines not shown]

openmp/libomptarget/test/unified_shared_memory/close_enter_exit.c

  // RUN: %libomptarget-compile-run-and-check-generic

  // REQUIRES: unified_shared_memory
  // UNSUPPORTED: clang-6, clang-7, clang-8, clang-9

  // Fails on amdgcn with error: GPU Memory Error
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <omp.h>
  #include <stdio.h>

  #pragma omp requires unified_shared_memory

  #define N 1024

[93 lines not shown]

openmp/libomptarget/test/unified_shared_memory/close_modifier.c

  // RUN: %libomptarget-compile-run-and-check-generic

  // REQUIRES: unified_shared_memory
  // UNSUPPORTED: clang-6, clang-7, clang-8, clang-9

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <omp.h>
  #include <stdio.h>

  #pragma omp requires unified_shared_memory

  #define N 1024

[121 lines not shown]

openmp/libomptarget/test/unified_shared_memory/shared_update.c

  // RUN: %libomptarget-compile-run-and-check-generic

  // REQUIRES: unified_shared_memory

  // amdgcn does not have printf definition
  // XFAIL: amdgcn-amd-amdhsa
+ // XFAIL: amdgcn-amd-amdhsa-newRTL

  #include <stdio.h>
  #include <omp.h>

  // ---------------------------------------------------------------------------
  // Various definitions copied from OpenMP RTL

  extern void __tgt_register_requires(int64_t);
[102 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Build DeviceRTL for amdgpu
ClosedPublic
