
[OpenMP] Introduce low level dependency process to target offloading

Authored by tianshilei1992 on Jun 16 2020, 8:26 PM.



Asynchronous offloading will be wrapped into a target task, and the
corresponding dependencies will go to the task. Only after all dependencies are
fulfilled will the task be enqueued and dispatched. However, almost all device
runtime libraries provide mechanisms to handle dependencies such that we don't
need to go back to the host side to resolve them. For example, we could wait
for a CUDA event before we push some operations into a stream. The wait is
non-blocking, so all following enqueue operations can proceed. However, they
will not be executed until the awaited event is fulfilled.

This patch lowers the dependency processing of a target task to the device
side. It supports depending on both host tasks and target tasks. For
dependencies on target tasks, the processing moves to the device side. For
dependencies on host tasks, the current mechanism is still used, with a tiny
modification.

The following are design details:

When a target construct is encountered, Clang wraps it into a task and emits a
function call to __kmpc_omp_target_task_alloc. We mark all tasks allocated by
__kmpc_omp_target_task_alloc as _target tasks_. The transformation looks like:

#pragma omp target depend(D) nowait
{ /* target region */ }
// The above one will be transformed to the following one
#pragma omp task depend(D) shared(...) target_task
#pragma omp target
{ /* target region */ }

where target_task is just a flag and not really a part of the construct.

After the target task, let's call it _A_, is created and assuming it has
dependencies, __kmpc_omp_task_with_deps is called to resolve and process them.
The only change here is: when A depends on another target task, let's call it
_B_, B is added to A's predecessors, a linked list storing all of A's *target*
predecessors. Here we do NOT increase A's npredecessors counter. If B is a
host/regular task, the existing scheme is used: A is added to B's successors,
and A's npredecessors is increased. As a result, a target task's npredecessors
only represents the number of *host/regular* tasks the target task depends on.
Finally, the target task is enqueued no matter whether its dependencies are
fulfilled.

Now let's switch to libomptarget and take target nowait as an example; the
target data related constructs work the same way. It first creates a new
__tgt_async_info, which contains three fields: DeviceID, the index into
Devices; Queue, a queue-like data structure into which device operations are
pushed; and Event, a device-dependent event. The function target_nowait first
checks whether asynchronous APIs are supported. If not, it waits for all
dependencies to be fulfilled by checking the counter, yielding the current task
while it still has unfinished dependencies. Once all dependencies are done, it
calls the synchronous version of target, passing nullptr as the
__tgt_async_info to tell the device RTL to use synchronous APIs.

Let's get back to the asynchronous version. It first calls waitForDeps to
process dependencies. It checks whether npredecessors is zero. If not, there
are still depending host tasks that have not finished, so it yields the current
task. Once npredecessors is zero, we know the task has no unfinished
host/regular dependencies, and it starts to check all depending target tasks.
__kmpc_get_target_task_waiting_list is called to fetch the __tgt_async_info
pointers of all its depending target tasks; we'll cover how it is implemented
later. For each async info, if the depending task and the current task are on
the same *type* of target device, which means we can ask the device API to take
care of the dependency, it calls the device RTL function wait_event, mapped to
the plugin interface __tgt_rtl_wait_event, to insert the event before doing the
real offloading work. I'll cover the mapping of plugin interfaces and their
functionality later. wait_event is expected to be asynchronous; its effect is
to tell the device RTL that all later enqueued operations can only start once
the inserted events are fulfilled. This mechanism does not work if the current
task depends on a target task on a different type of target device. In that
case, we perform queryAndWait to check whether the corresponding event is
fulfilled, i.e. whether the corresponding target task has finished. If not, we
yield the current task. It's worth noting that we do not yield the current task
indefinitely: a counter tracks how many times we have yielded, and once it
reaches a certain threshold, we do not yield again. Instead, we _synchronize_
on the event, which is a blocking wait. This is an optimization to avoid
looping for a long time when there is no task in the queue. Two target tasks
are of the same type if they use the same device RTL.

Once we finish inserting all waiting events, we can start the offloading work
of the current target task. Again, the device RTL makes sure that the following
offloading operations will not start until all waiting events are fulfilled.
The offloading work of the current target task is done by target. It basically
transfers data to the device, launches the kernel, and then transfers data back
to the host. Note that all these operations are asynchronous. After that, we
need an event that can only be fulfilled once all previously enqueued
operations are done. The event is obtained by calling recordEvent, which is
mapped to __tgt_rtl_record_event. In fact, this step may not be necessary for
some target devices where an event is generated by each enqueue; in that case,
just leave __tgt_rtl_record_event empty and return OFFLOAD_SUCCESS. Now that
we have the event, we attach it to the current task by calling
__kmpc_set_async_info so that all tasks depending on it can fetch and use it.
As an optimization, if no task depends on this one, we could skip this step.
However, due to an issue in the current CodeGen that prevents passing the right
number into those functions, we cannot rely on that yet, so for now we always
set it. After this point, all its _dependent_ tasks can get the async info and
start their own waits by inserting the event. The current task then performs
queryAndWait, which is exactly the mechanism mentioned above for dependencies
across two different types of target devices, and finally finishes.

So there are four new plugin interfaces:
__tgt_rtl_release_async_info: Releases the asynchronous information, basically
returning the Queue and destroying the event.
__tgt_rtl_wait_event: Non-blocking wait for an event. It simply inserts the
event into the queue, and all following enqueued operations will not start
until the event is fulfilled. Since it is non-blocking, we can still enqueue
operations even if the event is not fulfilled; they just cannot start. This
can improve concurrency.
__tgt_rtl_record_event: Generates an event that can only be fulfilled when all
previously enqueued operations have finished. The _record_ here is CUDA
terminology; feel free to comment if you have a better name.
__tgt_rtl_check_event: Checks whether the event is fulfilled. If not, it
returns OFFLOAD_NOT_DONE; if yes, OFFLOAD_SUCCESS. It returns OFFLOAD_FAIL if
anything goes wrong in the RTL.

The last part is about some functions implemented in libomp. We add two data
members to the depnode data structure because it is a per-task data structure
managed by reference counting. One is a linked list, successors, and the other
is a void * pointer holding the async info of the current target task.

__kmpc_get_target_task_waiting_list basically goes through all nodes in
successors and checks whether the corresponding async info pointer is nullptr.
If it is, the target task has not set its async info yet, and we yield the
current task. If not, the pointer is pushed onto a list that will be used by
the current task.

As before, once the reference count of a depnode drops to zero, the node is
freed. This calls __kmpc_free_async_info to release the corresponding
information and free the memory, and decrements the reference count of all
nodes in its successors.

Event Timeline


Two high level comments below. We need to split this patch.

Can you explain the approach in a bit more detail in the commit message? (Also a typo in there).


I guess we can make async info a pointer argument in a separate (NFC) patch to reduce this one, WDYT?


Style: Everywhere I have seen this we do /* name */ value. I know this was different here but I'd like us to align with LLVM & Clang on this one.
Feel free to commit the comments for all but the new argument as NFC without further review.


Typo in comment

Is there some design documentation on this? It's tricky to distinguish intent from quirks of cuda.

Amdgcn is built on the 'heterogeneous system architecture' model which has a fair amount of support for managing graphs of tasks but also has challenging forward progress properties. I'm not immediately sure it would share much code with the nvptx implementation.

I'll add some documentation.

The high level idea is:

  1. Add events to a queue. This operation is not blocking.
  2. Add the following operations into the queue.
  3. Save the event from the second step.

Does AMD GCN support this pattern? The record-event step can be optional, because I know some device runtimes generate the event when pushing an operation into a queue, like OpenCL.


Fixed some issues and code style

tianshilei1992 planned changes to this revision.Jan 16 2021, 2:55 PM
tianshilei1992 abandoned this revision.Feb 23 2021, 3:46 PM