This is an archive of the discontinued LLVM Phabricator instance.


Differential D110957

[Libomptarget] Add an external interface to dynamic shared memory
Closed, Public

Authored by jhuber6 on Oct 1 2021, 12:11 PM.


Details

Reviewers

jdoerfert
JonChesterfield

Commits

rG208f9005277a: [Libomptarget] Add an external interface to dynamic shared memory

Summary

This patch adds an external interface to access the dynamic shared
memory buffer in the device runtime. The function introduced is
`llvm_omp_get_dynamic_shared`. This includes a host-side
definition that only returns a null pointer so that it can be used when
host-fallback is enabled without crashing. Support for dynamic shared
memory was also ported to the old device runtime.
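
As a rough illustration of why the host-side definition matters, the sketch below shows a caller that checks for the null pointer before touching the buffer. The names `host_get_dynamic_shared` and `scratch_or_fallback` are hypothetical and exist only for this sketch. The first stands in for the host definition of `llvm_omp_get_dynamic_shared` so that the snippet compiles without the offloading runtime.

```c
#include <stddef.h>

/* Hypothetical stand-in for the host-side definition of
 * llvm_omp_get_dynamic_shared, which only returns a null pointer.
 * It is defined locally so this sketch builds without libomptarget. */
static void *host_get_dynamic_shared(void) { return NULL; }

/* Hypothetical helper: use the dynamic shared buffer when one is
 * available, otherwise fall back to storage supplied by the caller. */
static void *scratch_or_fallback(void *dynamic, void *fallback) {
  return dynamic != NULL ? dynamic : fallback;
}
```

On the device the first argument would come from `llvm_omp_get_dynamic_shared()`. Under host fallback the same call yields a null pointer, so the helper hands back the caller's storage instead of dereferencing null.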

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 requested review of this revision. (Oct 1 2021, 12:11 PM)

jhuber6 created this revision.

Herald added subscribers: openmp-commits, sstefan1. (Oct 1 2021, 12:11 PM)

Exciting! Will take a close look early next week. Surprised there's no change to the GPU plugins needed

In D110957#3038169, @JonChesterfield wrote:

Exciting! Will take a close look early next week. Surprised there's no change to the GPU plugins needed

That was introduced in D110006. For CUDA it's easy, since it's just an argument to the kernel launch function. I haven't implemented it for AMD yet.

JonChesterfield added inline comments. (Oct 4 2021, 8:38 AM)

openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu, line 24: ^ static

JonChesterfield mentioned this in D111069: [libomptarget] Move device environment to shared header, remove divergence. (Oct 4 2021, 9:09 AM)

JonChesterfield mentioned this in rG0c554a4769f2: [libomptarget] Move device environment to shared header, remove divergence. (Oct 7 2021, 4:04 AM)

Fix and ping.

Harbormaster completed remote builds in B127746: Diff 378193. (Oct 8 2021, 6:22 AM)

The plumbing here is all uncontroversial; it's just a wrapper over the OpenMP pragma.

This won't work on amdgpu as-is. We will need to pass the environment variable through to the HSA packet and see what code clang emits for the allocator construct; if that doesn't match what HIP uses, lowering will have to be added in the back end. There's nothing there that can't be done; we just need to find the time.

This revision is now accepted and ready to land. (Oct 8 2021, 7:21 AM)

In D110957#3051055, @JonChesterfield wrote:

The plumbing here is all uncontroversial; it's just a wrapper over the OpenMP pragma.

This won't work on amdgpu as-is. We will need to pass the environment variable through to the HSA packet and see what code clang emits for the allocator construct; if that doesn't match what HIP uses, lowering will have to be added in the back end. There's nothing there that can't be done; we just need to find the time.

NVPTX just sees anything with the extern shared x[] pattern in the PTX and hooks up the pointer to dynamic shared memory. I'm not sure if AMD uses a similar method, but if they do I think all that would need to be done is to add the argument to the config struct used in the AMD plugin.

Closed by commit rG208f9005277a: [Libomptarget] Add an external interface to dynamic shared memory (authored by jhuber6). (Oct 8 2021, 12:37 PM)

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG208f9005277a: [Libomptarget] Add an external interface to dynamic shared memory.

Revision Contents

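Path                                                         Size
openmp/libomptarget/DeviceRTL/include/Interface.h            2 lines
openmp/libomptarget/DeviceRTL/src/State.cpp                  6 lines
openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu    12 lines
openmp/libomptarget/deviceRTLs/interface.h                   5 lines
openmp/libomptarget/include/omptarget.h                      3 lines
openmp/libomptarget/src/api.cpp                              2 lines
openmp/libomptarget/src/exports                              1 line
openmp/libomptarget/test/api/omp_dynamic_shared_memory.c     12 lines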

Diff 378193

openmp/libomptarget/DeviceRTL/include/Interface.h

 int omp_get_num_devices(void);

 int omp_get_num_teams(void);

 int omp_get_team_num();

 int omp_get_initial_device(void);

+void *llvm_omp_get_dynamic_shared();
+
 /// Synchronization
 ///
 ///{
 void omp_init_lock(omp_lock_t *Lock);

 void omp_destroy_lock(omp_lock_t *Lock);

 void omp_set_lock(omp_lock_t *Lock);

openmp/libomptarget/DeviceRTL/src/State.cpp

 __attribute__((noinline)) void *__kmpc_alloc_shared(uint64_t Bytes) {
   return memory::allocShared(Bytes, "Frontend alloc shared");
 }

 __attribute__((noinline)) void __kmpc_free_shared(void *Ptr, uint64_t Bytes) {
   memory::freeShared(Ptr, Bytes, "Frontend free shared");
 }

-__attribute__((noinline)) void *__kmpc_get_dynamic_shared() {
-  return memory::getDynamicBuffer();
-}
+void *__kmpc_get_dynamic_shared() { return memory::getDynamicBuffer(); }
+
+void *llvm_omp_get_dynamic_shared() { return __kmpc_get_dynamic_shared(); }

 /// Allocate storage in shared memory to communicate arguments from the main
 /// thread to the workers in generic mode. If we exceed
 /// NUM_SHARED_VARIABLES_IN_SHARED_MEM we will malloc space for communication.
 constexpr uint64_t NUM_SHARED_VARIABLES_IN_SHARED_MEM = 64;

 [[clang::loader_uninitialized]] static void
     *SharedMemVariableSharingSpace[NUM_SHARED_VARIABLES_IN_SHARED_MEM];

openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu

 #include "target_impl.h"

 ////////////////////////////////////////////////////////////////////////////////
 // Runtime functions for trunk data sharing scheme.
 ////////////////////////////////////////////////////////////////////////////////

 static constexpr unsigned MinBytes = 8;

+static constexpr unsigned Alignment = 8;
+
+/// External symbol to access dynamic shared memory.
+extern unsigned char DynamicSharedBuffer[] __attribute__((aligned(Alignment)));
+#pragma omp allocate(DynamicSharedBuffer) allocator(omp_pteam_mem_alloc)
+
+EXTERN void *__kmpc_get_dynamic_shared() { return DynamicSharedBuffer; }
+
+EXTERN void *llvm_omp_get_dynamic_shared() {
+  return __kmpc_get_dynamic_shared();
+}
+
 template <unsigned BPerThread, unsigned NThreads = MAX_THREADS_PER_TEAM>
 struct alignas(32) ThreadStackTy {
   static constexpr unsigned BytesPerThread = BPerThread;
   static constexpr unsigned NumThreads = NThreads;
   static constexpr unsigned NumWarps = (NThreads + WARPSIZE - 1) / WARPSIZE;

   unsigned char Data[NumThreads][BytesPerThread];
   unsigned char Usage[NumThreads];

Inline comment on the Alignment declaration (JonChesterfield): ^ static

openmp/libomptarget/deviceRTLs/interface.h

 EXTERN void omp_set_default_device(int deviceId);
 EXTERN int omp_get_default_device(void);
 EXTERN int omp_get_num_devices(void);
 EXTERN int omp_get_num_teams(void);
 EXTERN int omp_get_team_num(void);
 EXTERN int omp_get_initial_device(void);
 EXTERN int omp_get_max_task_priority(void);

+EXTERN void *llvm_omp_get_dynamic_shared();
+
 ////////////////////////////////////////////////////////////////////////////////
 // file below is swiped from kmpc host interface
 ////////////////////////////////////////////////////////////////////////////////

 ////////////////////////////////////////////////////////////////////////////////
 // kmp specific types
 ////////////////////////////////////////////////////////////////////////////////

 ...

 EXTERN void *__kmpc_alloc_shared(uint64_t Bytes);

 /// Deallocate \p Ptr. Needs to be called balanced with __kmpc_alloc_shared like
 /// a stack (push/pop). Can be called by any thread. \p Ptr must be allocated by
 /// __kmpc_alloc_shared by the same thread. \p Bytes contains the size of the
 /// paired allocation to make memory management easier.
 EXTERN void __kmpc_free_shared(void *Ptr, size_t Bytes);

+/// Get a pointer to the dynamic shared memory buffer in the device.
+EXTERN void *__kmpc_get_dynamic_shared();
+
 #endif

openmp/libomptarget/include/omptarget.h

 int omp_target_disassociate_ptr(const void *host_ptr, int device_num);

 /// Explicit target memory allocators
 /// Using the llvm_ prefix until they become part of the OpenMP standard.
 void *llvm_omp_target_alloc_device(size_t size, int device_num);
 void *llvm_omp_target_alloc_host(size_t size, int device_num);
 void *llvm_omp_target_alloc_shared(size_t size, int device_num);

+/// Dummy target so we have a symbol for generating host fallback.
+void *llvm_omp_get_dynamic_shared();
+
 /// add the clauses of the requires directives in a given file
 void __tgt_register_requires(int64_t flags);

 /// adds a target shared library to the target execution image
 void __tgt_register_lib(__tgt_bin_desc *desc);

 /// Initialize all RTLs at once
 void __tgt_init_all_rtls();

openmp/libomptarget/src/api.cpp

 EXTERN void *llvm_omp_target_alloc_host(size_t size, int device_num) {
   return targetAllocExplicit(size, device_num, TARGET_ALLOC_HOST, __func__);
 }

 EXTERN void *llvm_omp_target_alloc_shared(size_t size, int device_num) {
   return targetAllocExplicit(size, device_num, TARGET_ALLOC_SHARED, __func__);
 }

+EXTERN void *llvm_omp_get_dynamic_shared() { return nullptr; }
+
 EXTERN void omp_target_free(void *device_ptr, int device_num) {
   TIMESCOPE();
   DP("Call to omp_target_free for device %d and address " DPxMOD "\n",
      device_num, DPxPTR(device_ptr));

   if (!device_ptr) {
     DP("Call to omp_target_free with NULL ptr\n");
     return;

openmp/libomptarget/src/exports

   global:
 ...
     omp_target_is_present;
     omp_target_memcpy;
     omp_target_memcpy_rect;
     omp_target_associate_ptr;
     omp_target_disassociate_ptr;
     llvm_omp_target_alloc_host;
     llvm_omp_target_alloc_shared;
     llvm_omp_target_alloc_device;
+    llvm_omp_get_dynamic_shared;
     __tgt_set_info_flag;
     __tgt_print_device_info;
   local:
     *;
 };

openmp/libomptarget/test/api/omp_dynamic_shared_memory.c

 // RUN: %libomptarget-compile-nvptx64-nvidia-cuda -fopenmp-target-new-runtime
-// RUN: env LIBOMPTARGET_SHARED_MEMORY_SIZE=4 \
+// RUN: env LIBOMPTARGET_SHARED_MEMORY_SIZE=256 \
 // RUN:   %libomptarget-run-nvptx64-nvidia-cuda | %fcheck-nvptx64-nvidia-cuda
 // REQUIRES: nvptx64-nvidia-cuda

 #include <omp.h>
 #include <stdio.h>

-void *get_dynamic_shared() { return NULL; }
-#pragma omp begin declare variant match(device = {arch(nvptx64)})
-extern void *__kmpc_get_dynamic_shared();
-void *get_dynamic_shared() { return __kmpc_get_dynamic_shared(); }
-#pragma omp end declare variant
+void *llvm_omp_get_dynamic_shared();

 int main() {
   int x;
 #pragma omp target parallel map(from : x)
   {
-    int *buf = get_dynamic_shared();
+    int *buf = llvm_omp_get_dynamic_shared() + 252;
 #pragma omp barrier
     if (omp_get_thread_num() == 0)
       *buf = 1;
 #pragma omp barrier
     if (omp_get_thread_num() == 1)
       x = *buf;
   }

   // CHECK: PASS
-  if (x == 1)
+  if (x == 1 && llvm_omp_get_dynamic_shared() == NULL)
     printf("PASS\n");
 }
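
The updated test requests 256 bytes through `LIBOMPTARGET_SHARED_MEMORY_SIZE` and writes at byte offset 252, which is the last `int`-sized slot of the buffer when `int` is 4 bytes wide. It also relies on arithmetic on `void *`, which is a GNU extension rather than standard C. A minimal sketch of the same offset computation in portable C follows; `last_int_slot` is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>

/* Hypothetical helper: return the address of the last int-sized slot in
 * a buffer of `size` bytes, or NULL if the buffer is missing or too small.
 * The cast to unsigned char * avoids arithmetic on void *. */
static int *last_int_slot(void *base, size_t size) {
  if (base == NULL || size < sizeof(int))
    return NULL;
  return (int *)((unsigned char *)base + (size - sizeof(int)));
}
```

With a 256-byte buffer and a 4-byte `int`, the computed offset is 256 - 4 = 252, matching the constant in the test. The null check also covers the host-fallback case, where the runtime call returns a null pointer.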
