This is an archive of the discontinued LLVM Phabricator instance.

Differential D81989

[OpenMP] Introduce low level dependency process to target offloading
Abandoned, Public

Authored by tianshilei1992 on Jun 16 2020, 8:26 PM.

Details

Reviewers

jdoerfert
grokos
AndreyChurbanov

Summary

Asynchronous offloading is wrapped into a target task, and the corresponding
dependencies are attached to that task. Only when all dependencies are
fulfilled is the task enqueued and dispatched. However, almost all device
runtime libraries provide their own dependency mechanisms, so we do not need
to go back to the host side to resolve the dependencies. For example, we could
wait for a CUDA event before pushing operations into a stream. The wait is
non-blocking, so all subsequent enqueues still proceed. The enqueued operations
simply do not execute until the awaited event is fulfilled.

This patch lowers the dependency processing of target tasks to the device side.
It supports dependencies on both host tasks and target tasks. Dependencies on
target tasks are handled on the device side. For dependencies on host tasks,
the current mechanism is still used with a small modification.

The following are design details:

When a target construct is encountered, Clang wraps it into a task and emits a
call to __kmpc_omp_target_task_alloc. We mark all tasks allocated by
__kmpc_omp_target_task_alloc as _target tasks_. The transformation looks like this:

#pragma omp target depend(D) nowait
{ /* target region */ }
// The above one will be transformed to the following one
#pragma omp task depend(D) shared(...) target_task
#pragma omp target
{ /* target region */ }

where target_task is just a flag that is not really a part of the construct.

After the target task is created (call it _A_), and assuming it has
dependencies, __kmpc_omp_task_with_deps is called to resolve and process them.
The only change here is that when A depends on another target task (call it
_B_), B is added to A's predecessors, a linked list storing all of A's
*target* predecessors. In this case we do NOT increase A's npredecessors
counter. If B is a host/regular task, the existing scheme is used: A is added
to B's successors and A's npredecessors is increased. Consequently, a target
task's npredecessors only counts the *host/regular* tasks that the target task
depends on. Finally, the target task is enqueued regardless of whether its
dependencies are fulfilled.
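
The registration rule above can be sketched roughly as follows. This is a minimal host-only illustration, not the patch's code: `TaskSketch` and `addDependency` are hypothetical names standing in for libomp's task and depnode structures, and the handling of a host task depending on a target task (falling back to the existing scheme) is an assumption the summary does not spell out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a libomp task (not kmp_taskdata_t).
struct TaskSketch {
  bool IsTarget = false;
  // Counts only unfinished host/regular predecessors.
  int NPredecessors = 0;
  // Tasks to release when this task finishes (existing scheme).
  std::vector<TaskSketch *> Successors;
  // Target tasks this task depends on (the new list).
  std::vector<TaskSketch *> TargetPredecessors;
};

// Register "A depends on B" following the scheme described in the summary.
inline void addDependency(TaskSketch &A, TaskSketch &B) {
  assert(&A != &B && "a task cannot depend on itself");
  if (A.IsTarget && B.IsTarget) {
    // Target-on-target: remember B, but leave A's counter untouched so the
    // dependency can later be resolved on the device side.
    A.TargetPredecessors.push_back(&B);
    return;
  }
  // Existing scheme (assumed for every other combination): B releases A
  // when it finishes, and A counts one more unfinished predecessor.
  B.Successors.push_back(&A);
  ++A.NPredecessors;
}
```

With one target predecessor and one host predecessor, A ends up with `NPredecessors == 1` and a single entry in `TargetPredecessors`.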

Now let's switch to libomptarget, taking target nowait as an example. The
target data related entry points work the same way. It first creates a new
__tgt_async_info, which contains three fields: DeviceID, the index into
Devices; Queue, a queue-like data structure into which device operations are
pushed; and Event, a device-dependent event. The function target_nowait first
checks whether asynchronous APIs are supported. If not, it waits for all
dependencies to be fulfilled by checking the counter, yielding the current
task while it still has unfinished dependencies. Once all dependencies are
done, it calls the synchronous version of target, passing nullptr as the
__tgt_async_info to tell the device RTL to use synchronous APIs.

Let's get back to the asynchronous version. It first calls waitForDeps to process dependencies.
waitForDeps checks whether npredecessors is zero. If not, some host tasks it depends on have not
finished, so it yields the current task. Once the counter reaches zero, no host/regular dependencies
remain, and it starts to check the target tasks it depends on. __kmpc_get_target_task_waiting_list
is called to fetch the __tgt_async_info pointers of all those target tasks. We'll talk about how
__kmpc_get_target_task_waiting_list is implemented later. For each async info, if the depended-on
task and the current task are on the same *type* of target device, meaning we can ask the device
API to take care of the dependency, it calls the device RTL function wait_event, which is mapped to
the plugin interface __tgt_rtl_wait_event, to insert the event before doing the real offloading work.
I'll cover the mapping of plugin interfaces and their functionality later. wait_event is expected to
be asynchronous. Its effect is to tell the device RTL that all operations enqueued afterwards can only
start once the inserted events are fulfilled. This mechanism does not work if the current task depends
on a target task that is on another type of target device. In that case, we perform queryAndWait
to check whether the corresponding event is fulfilled, i.e., whether the corresponding target task has
finished. If not, the current task yields. It's worth noting that the current task does not yield
indefinitely. A counter tracks how many times we have yielded. Once it reaches a certain threshold, the
task stops yielding and instead _synchronizes_ on the event, which is a blocking wait. This is an
optimization to avoid long-running loops when there is no task in the queue. Two target tasks are of the
same type if their device RTLs are the same.
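
A rough sketch of the bounded-yield behaviour of queryAndWait described above, under stated assumptions: the three callbacks in `WaitHooks` and the `MaxYields` parameter are hypothetical stand-ins for the RTL's check/synchronize entry points, the runtime's task yield, and the (unspecified) yield threshold. The return-code values match the ones this patch defines in omptarget.h.

```cpp
#include <cassert>
#include <functional>

// Return codes with the values this patch defines in omptarget.h.
constexpr int OFFLOAD_SUCCESS = 0;
constexpr int OFFLOAD_FAIL = ~0;
constexpr int OFFLOAD_NOT_DONE = 1;

// Hypothetical hooks; the real code would call into the device RTL and libomp.
struct WaitHooks {
  std::function<int()> CheckEvent;  // non-blocking query of the event
  std::function<int()> Synchronize; // blocking wait on the event
  std::function<void()> Yield;      // let other tasks run
};

// Poll the event, yielding between polls. After MaxYields yields, stop
// yielding and fall back to a blocking synchronize.
inline int queryAndWait(const WaitHooks &Hooks, int MaxYields) {
  assert(MaxYields >= 0 && "yield budget must be non-negative");
  for (int Yields = 0;; ++Yields) {
    int Status = Hooks.CheckEvent();
    if (Status != OFFLOAD_NOT_DONE)
      return Status; // fulfilled (OFFLOAD_SUCCESS) or failed (OFFLOAD_FAIL)
    if (Yields == MaxYields)
      break; // budget exhausted
    Hooks.Yield();
  }
  return Hooks.Synchronize();
}
```

The blocking fallback is what keeps a thread from spinning through check/yield cycles forever when there is nothing else in the task queue to run.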

Once we finish inserting all waiting events, we can start the offloading work of the current target task.
Again, the device RTL makes sure that the following offloading operations do not start until all
waiting events are fulfilled. The offloading work of the current target task is done by target. It
basically transfers data to the device, launches the kernel, and then transfers data back to the host.
Note that all these operations are asynchronous. After that, we need an event that can only be fulfilled
once all previously enqueued operations are done. The event is obtained by calling recordEvent, which is
mapped to __tgt_rtl_record_event. In fact, this step may not be necessary for some target devices if an
event is generated by each enqueue. In that case, just leave __tgt_rtl_record_event empty and return
OFFLOAD_SUCCESS. Now that we have the event, we attach it to the current task by calling
__kmpc_set_async_info so that all tasks depending on it can fetch and use it. As an optimization, this
step is unnecessary if this task has no dependencies. However, because the current CG cannot pass the
right number into those functions, we cannot rely on that for now. As a consequence, we always set it.
After this point, all of its _dependent_ tasks can get the async info and start their own wait by
inserting the event. The current task then performs queryAndWait, which is exactly the procedure
described above for dependencies between two different types of target devices, and finally finishes
the current target task.

So there are four new plugin interfaces:
__tgt_rtl_release_async_info: Releases the asynchronous information, basically returning the Queue
and destroying the event.
__tgt_rtl_wait_event: Non-blocking wait for events. It works like inserting the event into the queue:
operations enqueued afterwards will not start until the event is fulfilled. Since it is non-blocking, we
can still enqueue operations even if the event is not yet fulfilled. They just cannot start. This can
improve concurrency.
__tgt_rtl_record_event: Generates an event which can only be fulfilled when all previously enqueued
operations are finished. _Record_ here is CUDA terminology. Feel free to comment if you have a better
name.
__tgt_rtl_check_event: Checks whether the event is fulfilled. If not, returns OFFLOAD_NOT_DONE.
If yes, returns OFFLOAD_SUCCESS. Returns OFFLOAD_FAIL if anything goes wrong in the RTL.
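
To make the intended semantics of wait/record/check concrete, here is a small host-only model. It is only a sketch: `MockQueue`, `MockEvent`, and `progress` are hypothetical and do not correspond to any plugin code. A real plugin would wrap device objects such as a CUDA stream and event instead.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Return codes with the values this patch defines in omptarget.h.
constexpr int OFFLOAD_SUCCESS = 0;
constexpr int OFFLOAD_NOT_DONE = 1;

// Hypothetical stand-in for a device event.
struct MockEvent {
  bool Fulfilled = false;
};

// Hypothetical in-order queue: operations run strictly in enqueue order.
class MockQueue {
  struct Op {
    std::function<void()> Run;       // work to perform, may be empty
    std::shared_ptr<MockEvent> Gate; // must be fulfilled before Run starts
  };
  std::deque<Op> Ops;

public:
  // Models wait_event: returns immediately, but gates all later operations.
  void waitEvent(std::shared_ptr<MockEvent> Dep) {
    assert(Dep && "dependency event is nullptr");
    Ops.push_back({nullptr, std::move(Dep)});
  }

  // Models an asynchronous data transfer or kernel launch.
  void enqueue(std::function<void()> Work) {
    Ops.push_back({std::move(Work), nullptr});
  }

  // Models record_event: the event is fulfilled once everything enqueued
  // before it has run.
  std::shared_ptr<MockEvent> recordEvent() {
    auto Event = std::make_shared<MockEvent>();
    Ops.push_back({[Event] { Event->Fulfilled = true; }, nullptr});
    return Event;
  }

  // Stand-in for the device making progress: run operations in order and
  // stop at the first one whose gate is not yet fulfilled.
  void progress() {
    while (!Ops.empty()) {
      Op &Front = Ops.front();
      if (Front.Gate && !Front.Gate->Fulfilled)
        return;
      std::function<void()> Run = std::move(Front.Run);
      Ops.pop_front();
      if (Run)
        Run();
    }
  }
};

// Models check_event: a non-blocking query.
inline int checkEvent(const MockEvent &Event) {
  return Event.Fulfilled ? OFFLOAD_SUCCESS : OFFLOAD_NOT_DONE;
}
```

In this model a dependent task can call `waitEvent`, enqueue all of its work, and record its own event without ever blocking the host. Its operations simply stay queued until the predecessor's event is fulfilled.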

The last part is about some functions implemented in libomp. We add two data members to the depnode
data structure, because it is a per-task, reference-counted data structure. One is a linked list,
successors, and the other is a void * pointer to the async info of the current target task.

__kmpc_get_target_task_waiting_list basically goes through all nodes in successors and checks
whether the corresponding async information pointer is nullptr. If it is, the target task has
not set its async info yet, and we yield the current task. If not, the pointer is pushed onto a list
which will be used by the current task.
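
A minimal sketch of that traversal, assuming a plain vector instead of the linked list and a caller-supplied yield callback. `TargetDepNode` and `collectWaitingList` are hypothetical names, and a real implementation would need an atomic or otherwise synchronized read of the pointer.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the per-task depnode of a target task.
struct TargetDepNode {
  void *AsyncInfo = nullptr; // set by the task once its event is recorded
};

// Collect the async info pointer of every listed node, yielding while a
// node has not published its async info yet.
inline std::vector<void *>
collectWaitingList(const std::vector<TargetDepNode *> &Nodes,
                   const std::function<void()> &Yield) {
  std::vector<void *> WaitingList;
  WaitingList.reserve(Nodes.size());
  for (TargetDepNode *Node : Nodes) {
    assert(Node && "dependency node is nullptr");
    while (Node->AsyncInfo == nullptr)
      Yield(); // the other task has not set its async info yet
    WaitingList.push_back(Node->AsyncInfo);
  }
  return WaitingList;
}
```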

As before, once the reference count of a depnode reaches zero, the node is freed. It calls the function
__kmpc_free_async_info to release the corresponding information and free the memory, and dereferences
all nodes in its successors.
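
The release path can be sketched like this. It is an illustration under assumptions, not libomp's code: `RefCountedNode` and `derefNode` are hypothetical, nodes are assumed to be allocated with `new`, and the `Release` callback stands in for __kmpc_free_async_info.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a reference-counted depnode.
struct RefCountedNode {
  int RefCount = 1;                         // one reference held by the owning task
  void *AsyncInfo = nullptr;                // async info of the target task, if any
  std::vector<RefCountedNode *> Successors; // nodes this node holds references to
};

// Drop one reference. When the count reaches zero, release the async info,
// drop the references held on the listed nodes, and free the node itself.
// Nodes must have been allocated with new.
inline void derefNode(RefCountedNode *Node,
                      const std::function<void(void *)> &Release) {
  assert(Node && Node->RefCount > 0 && "invalid node or reference count");
  if (--Node->RefCount > 0)
    return;
  if (Node->AsyncInfo)
    Release(Node->AsyncInfo);
  for (RefCountedNode *Succ : Node->Successors)
    derefNode(Succ, Release);
  delete Node;
}
```

Keeping the async info alive until the last reference is dropped is what lets a dependent task still read the event of a predecessor that has already finished.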

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tianshilei1992 created this revision. Jun 16 2020, 8:26 PM

Herald added a reviewer: jdoerfert. Jun 16 2020, 8:26 PM

Herald added a project: Restricted Project.

Herald added subscribers: openmp-commits, sstefan1, jfb and 2 others.

Two high level comments below. We need to split this patch.

Can you explain the approach in a bit more detail in the commit message? (Also a typo in there).

openmp/libomptarget/src/omptarget.cpp
596	I guess we can make async info a pointer argument in a separate (NFC) patch to reduce this one, WDYT?
openmp/libomptarget/src/rtl.cpp
416	Style: Everywhere I have seen this we do `/* name */ value`. I know this was different here but I'd like us to align with LLVM & Clang on this one. Feel free to commit the comments for all but the new argument as NFC without further review.
openmp/libomptarget/src/rtl.h
111	Typo in comment

Harbormaster failed remote builds in B60582: Diff 271269! Jun 16 2020, 9:19 PM

Is there some design documentation on this? It's tricky to distinguish intent from quirks of cuda.

Amdgcn is built on the 'heterogeneous system architecture' model which has a fair amount of support for managing graphs of tasks but also has challenging forward progress properties. I'm not immediately sure it would share much code with the nvptx implementation.

In D81989#2098113, @JonChesterfield wrote:

Is there some design documentation on this? It's tricky to distinguish intent from quirks of cuda.

Amdgcn is built on the 'heterogeneous system architecture' model which has a fair amount of support for managing graphs of tasks but also has challenging forward progress properties. I'm not immediately sure it would share much code with the nvptx implementation.

I'll add some documentation.

The high level idea is:

1. Add events to a queue. This operation is not blocking.
2. Add the subsequent operations to the queue.
3. Save the event from the second step.

Does AMD GCN support this pattern? The record event step can be optional because I know some device runtimes, like OpenCL, generate the event when an operation is pushed into a queue.

ye-luo added a subscriber: ye-luo. Jun 17 2020, 2:21 PM

Fixed some issues and code style

Harbormaster failed remote builds in B60827: Diff 271715! Jun 18 2020, 8:40 AM

tianshilei1992 edited the summary of this revision. Jun 18 2020, 9:49 AM

tianshilei1992 updated this revision to Diff 272503. Jun 22 2020, 11:24 AM

tianshilei1992 edited the summary of this revision. Jun 24 2020, 1:23 PM

tianshilei1992 updated this revision to Diff 274215. Jun 29 2020, 1:08 PM

tianshilei1992 planned changes to this revision. Jan 16 2021, 2:55 PM

tianshilei1992 abandoned this revision. Feb 23 2021, 3:46 PM

RaviNarayanaswamy added a subscriber: RaviNarayanaswamy. Mar 29 2022, 5:23 PM

Herald added a project: Restricted Project. Mar 29 2022, 5:24 PM

tianshilei1992 mentioned this in D132005: [OpenMP] Add non-blocking support for target nowait regions. Oct 22 2022, 4:29 PM

Revision Contents

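Path	Size
openmp/libomptarget/include/omptarget.h	10 lines
openmp/libomptarget/include/omptargetplugin.h	25 lines
openmp/libomptarget/plugins/cuda/src/rtl.cpp	141 lines
openmp/libomptarget/plugins/exports	4 lines
openmp/libomptarget/src/device.cpp	10 lines
openmp/libomptarget/src/exports	2 lines
openmp/libomptarget/src/interface.cpp	304 lines
openmp/libomptarget/src/omptarget.cpp	285 lines
openmp/libomptarget/src/private.h	25 lines
openmp/libomptarget/src/rtl.h	44 lines
openmp/libomptarget/src/rtl.cpp	29 lines
openmp/runtime/src/kmp.h	13 lines
openmp/runtime/src/kmp_taskdeps.h	24 lines
openmp/runtime/src/kmp_taskdeps.cpp	63 lines
openmp/runtime/src/kmp_tasking.cpp	69 lines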

Diff 271715

openmp/libomptarget/include/omptarget.h

[13 lines not shown]
#ifndef _OMPTARGET_H_		#ifndef _OMPTARGET_H_
#define _OMPTARGET_H_		#define _OMPTARGET_H_

#include <stdint.h>		#include <stdint.h>
#include <stddef.h>		#include <stddef.h>

#define OFFLOAD_SUCCESS (0)		#define OFFLOAD_SUCCESS (0)
#define OFFLOAD_FAIL (~0)		#define OFFLOAD_FAIL (~0)
		#define OFFLOAD_NOT_DONE (1)

#define OFFLOAD_DEVICE_DEFAULT -1		#define OFFLOAD_DEVICE_DEFAULT -1
#define HOST_DEVICE -10		#define HOST_DEVICE -10

/// Data attributes for each data reference used in an OpenMP target region.		/// Data attributes for each data reference used in an OpenMP target region.
enum tgt_map_type {		enum tgt_map_type {
// No flags		// No flags
OMP_TGT_MAPTYPE_NONE = 0x000,		OMP_TGT_MAPTYPE_NONE = 0x000,
[79 lines not shown]	struct __tgt_target_table {
__tgt_offload_entry *EntriesBegin; // Begin of the table with all the entries		__tgt_offload_entry *EntriesBegin; // Begin of the table with all the entries
__tgt_offload_entry		__tgt_offload_entry
*EntriesEnd; // End of the table with all the entries (non inclusive)		*EntriesEnd; // End of the table with all the entries (non inclusive)
};		};

/// This struct contains information exchanged between different asynchronous		/// This struct contains information exchanged between different asynchronous
/// operations for device-dependent optimization and potential synchronization		/// operations for device-dependent optimization and potential synchronization
struct __tgt_async_info {		struct __tgt_async_info {
		// Device ID. Note that it is NOT the RTLDeviceID. We don't need to store the
		// RTLDeviceID explicitly as we can always get it via DeviceID.
		int DeviceID = -1;
// A pointer to a queue-like structure where offloading operations are issued.		// A pointer to a queue-like structure where offloading operations are issued.
// We assume to use this structure to do synchronization. In CUDA backend, it		// We assume to use this structure to do synchronization.
// is CUstream.
void *Queue = nullptr;		void *Queue = nullptr;
		// A pointer to a device-dependent event used for synchronization as well.
		void *Event = nullptr;
};		};

#ifdef __cplusplus		#ifdef __cplusplus
extern "C" {		extern "C" {
#endif		#endif

int omp_get_num_devices(void);		int omp_get_num_devices(void);
int omp_get_initial_device(void);		int omp_get_initial_device(void);
[74 lines not shown]	int __tgt_target_teams(int64_t device_id, void *host_ptr, int32_t arg_num,
int32_t thread_limit);		int32_t thread_limit);
int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,		int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args,		int32_t arg_num, void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types,		int64_t *arg_sizes, int64_t *arg_types,
int32_t num_teams, int32_t thread_limit,		int32_t num_teams, int32_t thread_limit,
int32_t depNum, void *depList,		int32_t depNum, void *depList,
int32_t noAliasDepNum, void *noAliasDepList);		int32_t noAliasDepNum, void *noAliasDepList);
void __kmpc_push_target_tripcount(int64_t device_id, uint64_t loop_tripcount);		void __kmpc_push_target_tripcount(int64_t device_id, uint64_t loop_tripcount);
		void __kmpc_free_async_info(void *Ptr);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__kmpc_free_async_info' [readability-identifier-naming]

#ifdef __cplusplus		#ifdef __cplusplus
}		}
#endif		#endif

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
#include <stdio.h>		#include <stdio.h>
#define DEBUGP(prefix, ...) \		#define DEBUGP(prefix, ...) \
[43 lines not shown]

openmp/libomptarget/include/omptargetplugin.h

[123 lines not shown]	int32_t __tgt_rtl_run_target_team_region(int32_t ID, void *Entry, void **Args,
uint64_t loop_tripcount);		uint64_t loop_tripcount);

// Asynchronous version of __tgt_rtl_run_target_team_region		// Asynchronous version of __tgt_rtl_run_target_team_region
int32_t __tgt_rtl_run_target_team_region_async(		int32_t __tgt_rtl_run_target_team_region_async(
int32_t ID, void *Entry, void **Args, ptrdiff_t *Offsets, int32_t NumArgs,		int32_t ID, void *Entry, void **Args, ptrdiff_t *Offsets, int32_t NumArgs,
int32_t NumTeams, int32_t ThreadLimit, uint64_t loop_tripcount,		int32_t NumTeams, int32_t ThreadLimit, uint64_t loop_tripcount,
__tgt_async_info *AsyncInfoPtr);		__tgt_async_info *AsyncInfoPtr);

// Device synchronization. In case of success, return zero. Otherwise, return an		// Release all resources in __tgt_async_info
// error code.		int32_t __tgt_rtl_release_async_info(int32_t ID, __tgt_async_info *AsyncInfo);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_release_async_info' [readability-identifier-naming]
int32_t __tgt_rtl_synchronize(int32_t ID, __tgt_async_info *AsyncInfoPtr);
		// Wait an event. This is different from synchronizing an event. Waiting an
		// event is a non-blocking operation. Basically, all operations enqueued after
		// this waiting should be blocked until this event is full-filled.
		int32_t __tgt_rtl_wait_event(int32_t ID, __tgt_async_info *AsyncInfo,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_wait_event' [readability-identifier-naming]
		__tgt_async_info *DepAsyncInfo);

		// Record an event such that the event can be later used for waiting or
		// synchronization. Note that once the event is recorded, all following use of
		// async_info should not use the queue again. In the implementation, the queue
		// should be released somehow.
		int32_t __tgt_rtl_record_event(int32_t ID, __tgt_async_info *AsyncInfo);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_record_event' [readability-identifier-naming]

		// Synchronize an event. This is a blocking operation.
		int32_t __tgt_rtl_synchronize(int32_t ID, __tgt_async_info *AsyncInfo);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_synchronize' [readability-identifier-naming]

		// Query an event whether it has been full-filled. If return OFFLOAD_SUCCESS,
		// the event has been full-filled. If OFFLOAD_NOT_DONE, it has not been finished
		// yet. If OFFLOAD_FAIL, something wrong.
		int32_t __tgt_rtl_check_event(int32_t ID, __tgt_async_info *AsyncInfo);
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_check_event' [readability-identifier-naming]

#ifdef __cplusplus		#ifdef __cplusplus
}		}
#endif		#endif

#endif // _OMPTARGETPLUGIN_H_		#endif // _OMPTARGETPLUGIN_H_

openmp/libomptarget/plugins/cuda/src/rtl.cpp

//===----RTLs/cuda/src/rtl.cpp - Target RTLs Implementation ------- C++ -*-===//		//===----RTLs/cuda/src/rtl.cpp - Target RTLs Implementation ------- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// RTL for CUDA machine		// RTL for CUDA machine
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cuda.h>		#include <cuda.h>
		Lint: Pre-merge checks: clang-tidy: error: 'cuda.h' file not found [clang-diagnostic-error]
#include <list>		#include <list>
#include <memory>		#include <memory>
#include <mutex>		#include <mutex>
#include <string>		#include <string>
#include <vector>		#include <vector>

#include "omptargetplugin.h"		#include "omptargetplugin.h"

[896 lines not shown]	if (!checkResult(Err, "Error returned from cuLaunchKernel\n"))
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;

DP("Launch of entry point at " DPxMOD " successful!\n",		DP("Launch of entry point at " DPxMOD " successful!\n",
DPxPTR(TgtEntryPtr));		DPxPTR(TgtEntryPtr));

return OFFLOAD_SUCCESS;		return OFFLOAD_SUCCESS;
}		}

		// Since we have two items that can be synchronized, we will always first
		// try to synchronize the event. If success, return directly. Otherwise,
		// synchronize the stream.
int synchronize(const int DeviceId, __tgt_async_info *AsyncInfoPtr) const {		int synchronize(const int DeviceId, __tgt_async_info *AsyncInfoPtr) const {
		CUresult Err;

		if (AsyncInfoPtr->Event) {
		CUevent Event = reinterpret_cast<CUevent>(AsyncInfoPtr->Event);
		Err = cuEventSynchronize(Event);
		if (!checkResult(Err, "error returned from cuEventSynchronize"))
		return OFFLOAD_FAIL;

		return OFFLOAD_SUCCESS;
		}

		assert(AsyncInfoPtr->Queue && "AsyncInfoPtr->Queue is nullptr");

CUstream Stream = reinterpret_cast<CUstream>(AsyncInfoPtr->Queue);		CUstream Stream = reinterpret_cast<CUstream>(AsyncInfoPtr->Queue);
CUresult Err = cuStreamSynchronize(Stream);		Err = cuStreamSynchronize(Stream);
if (Err != CUDA_SUCCESS) {		if (!checkResult(Err, "error returned from cuStreamSynchronize"))
DP("Error when synchronizing stream. stream = " DPxMOD
", async info ptr = " DPxMOD "\n",
DPxPTR(Stream), DPxPTR(AsyncInfoPtr));
CUDA_ERR_STRING(Err);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;

		StreamManager->returnStream(DeviceId, Stream);
		AsyncInfoPtr->Queue = nullptr;

		return OFFLOAD_SUCCESS;
}		}

// Once the stream is synchronized, return it to stream pool and reset		int releaseAsyncInfo(int DeviceId, __tgt_async_info *AsyncInfoPtr) const {
// async_info. This is to make sure the synchronization only works for its		if (AsyncInfoPtr->Queue) {
// own tasks.
StreamManager->returnStream(		StreamManager->returnStream(
DeviceId, reinterpret_cast<CUstream>(AsyncInfoPtr->Queue));		DeviceId, reinterpret_cast<CUstream>(AsyncInfoPtr->Queue));
AsyncInfoPtr->Queue = nullptr;		AsyncInfoPtr->Queue = nullptr;
		}

		if (AsyncInfoPtr->Event) {
		CUresult Err =
		cuEventDestroy(reinterpret_cast<CUevent>(AsyncInfoPtr->Event));
		if (!checkResult(Err, "error returned from cuEventDestroy"))
		return OFFLOAD_FAIL;

		AsyncInfoPtr->Event = nullptr;
		}

		return OFFLOAD_SUCCESS;
		}

		int waitEvent(int DeviceID, __tgt_async_info *AsyncInfo,
		__tgt_async_info *DepAsyncInfo) const {
		CUstream Stream = getStream(DeviceID, AsyncInfo);
		CUevent Event = reinterpret_cast<CUevent>(DepAsyncInfo->Event);

		CUresult Err = cuStreamWaitEvent(Stream, Event, 0);
		if (!checkResult(Err, "error returned from cuStreamWaitEvent"))
		return OFFLOAD_FAIL;

return OFFLOAD_SUCCESS;		return OFFLOAD_SUCCESS;
}		}

		int recordEvent(int DeviceId, __tgt_async_info *AsyncInfoPtr) const {
		CUstream Stream = reinterpret_cast<CUstream>(AsyncInfoPtr->Queue);
		CUevent Event;
		CUresult Err;

		if (AsyncInfoPtr->Event == nullptr) {
		Err = cuEventCreate(&Event, CU_EVENT_DISABLE_TIMING);
		if (!checkResult(Err, "error returned from cuEventCreate"))
		return OFFLOAD_FAIL;
		AsyncInfoPtr->Event = Event;
		} else {
		Event = reinterpret_cast<CUevent>(AsyncInfoPtr->Event);
		}

		Err = cuEventRecord(Event, Stream);
		if (!checkResult(Err, "error returned from cuEventRecord"))
		return OFFLOAD_FAIL;

		// Return the stream back to pool
		StreamManager->returnStream(
		DeviceId, reinterpret_cast<CUstream>(AsyncInfoPtr->Queue));
		AsyncInfoPtr->Queue = nullptr;

		return OFFLOAD_SUCCESS;
		}

		int checkEvent(int DeviceId, __tgt_async_info *AsyncInfoPtr) const {
		CUevent Event = reinterpret_cast<CUevent>(AsyncInfoPtr->Event);
		CUresult Err = cuEventQuery(Event);
		// Event has been full-filled
		if (Err == CUDA_SUCCESS)
		return OFFLOAD_SUCCESS;
		// Event has not been full-filled
		if (Err == CUDA_ERROR_NOT_READY)
		return OFFLOAD_NOT_DONE;
		// Other errors
		checkResult(Err, "error returned from cuEventQuery");
		return OFFLOAD_FAIL;
		}
};		};

DeviceRTLTy DeviceRTL;		DeviceRTLTy DeviceRTL;
} // namespace		} // namespace

// Exposed library API function		// Exposed library API function
#ifdef __cplusplus		#ifdef __cplusplus
extern "C" {		extern "C" {
[169 lines not shown]	int32_t __tgt_rtl_run_target_region_async(int32_t device_id,
assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");

return __tgt_rtl_run_target_team_region_async(		return __tgt_rtl_run_target_team_region_async(
device_id, tgt_entry_ptr, tgt_args, tgt_offsets, arg_num,		device_id, tgt_entry_ptr, tgt_args, tgt_offsets, arg_num,
/* team num*/ 1, /* thread_limit */ 1, /* loop_tripcount */ 0,		/* team num*/ 1, /* thread_limit */ 1, /* loop_tripcount */ 0,
async_info_ptr);		async_info_ptr);
}		}

int32_t __tgt_rtl_synchronize(int32_t device_id,		int32_t __tgt_rtl_release_async_info(int32_t device_id,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_release_async_info' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'device_id' [readability-identifier-naming]
__tgt_async_info *async_info_ptr) {		__tgt_async_info *async_info) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'async_info' [readability-identifier-naming]
assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
assert(async_info_ptr && "async_info_ptr is nullptr");		assert(async_info && "async_info is nullptr");
assert(async_info_ptr->Queue && "async_info_ptr->Queue is nullptr");
		return DeviceRTL.releaseAsyncInfo(device_id, async_info);
		}

		int32_t __tgt_rtl_wait_event(int32_t device_id, __tgt_async_info *async_info,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_wait_event' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'device_id' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'async_info' [readability-identifier-naming]
		__tgt_async_info *dep_async_info) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'dep_async_info' [readability-identifier-naming]
		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
		assert(async_info && "async_info is nullptr");
		assert(async_info->Queue && "async_info->Queue is nullptr");
		assert(dep_async_info->Event && "dep_async_info->Event is nullptr");

		return DeviceRTL.waitEvent(device_id, async_info, dep_async_info);
		}

		int32_t __tgt_rtl_record_event(int32_t device_id,
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_record_event' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'device_id' [readability-identifier-naming]
		__tgt_async_info *async_info) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'async_info' [readability-identifier-naming]
		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
		assert(async_info && "async_info is nullptr");
		assert(async_info->Queue && "async_info->Queue is nullptr");

		return DeviceRTL.recordEvent(device_id, async_info);
		}

		int32_t __tgt_rtl_synchronize(int32_t device_id, __tgt_async_info *async_info) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_synchronize' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'device_id' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'async_info' [readability-identifier-naming]
		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
		assert(async_info && "async_info is nullptr");
		assert((async_info->Event || async_info->Queue) &&
		"Both async_info->Event and async_info->Queue are nullptr");

		return DeviceRTL.synchronize(device_id, async_info);
		}

		int32_t __tgt_rtl_check_event(int32_t device_id, __tgt_async_info *async_info) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__tgt_rtl_check_event' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'device_id' [readability-identifier-naming]; clang-tidy: warning: invalid case style for parameter 'async_info' [readability-identifier-naming]
		assert(DeviceRTL.isValidDeviceId(device_id) && "device_id is invalid");
		assert(async_info && "async_info is nullptr");
		assert(async_info->Event && "async_info->Event is nullptr");

return DeviceRTL.synchronize(device_id, async_info_ptr);		return DeviceRTL.checkEvent(device_id, async_info);
}		}

#ifdef __cplusplus		#ifdef __cplusplus
}		}
#endif		#endif

openmp/libomptarget/plugins/exports

[12 lines not shown]	global:
__tgt_rtl_data_retrieve_async;		__tgt_rtl_data_retrieve_async;
__tgt_rtl_data_exchange;		__tgt_rtl_data_exchange;
__tgt_rtl_data_exchange_async;		__tgt_rtl_data_exchange_async;
__tgt_rtl_data_delete;		__tgt_rtl_data_delete;
__tgt_rtl_run_target_team_region;		__tgt_rtl_run_target_team_region;
__tgt_rtl_run_target_team_region_async;		__tgt_rtl_run_target_team_region_async;
__tgt_rtl_run_target_region;		__tgt_rtl_run_target_region;
__tgt_rtl_run_target_region_async;		__tgt_rtl_run_target_region_async;
		__tgt_rtl_release_async_info;
		__tgt_rtl_wait_event;
		__tgt_rtl_record_event;
__tgt_rtl_synchronize;		__tgt_rtl_synchronize;
		__tgt_rtl_check_event;
local:		local:
*;		*;
};		};

openmp/libomptarget/src/device.cpp

Show First 20 Lines • Show All 329 Lines • ▼ Show 20 Lines	__tgt_target_table *DeviceTy::load_binary(void *Img) {
__tgt_target_table *rc = RTL->load_binary(RTLDeviceID, Img);		__tgt_target_table *rc = RTL->load_binary(RTLDeviceID, Img);
RTL->Mtx.unlock();		RTL->Mtx.unlock();
return rc;		return rc;
}		}

// Submit data to device		// Submit data to device
int32_t DeviceTy::data_submit(void *TgtPtrBegin, void *HstPtrBegin,		int32_t DeviceTy::data_submit(void *TgtPtrBegin, void *HstPtrBegin,
int64_t Size, __tgt_async_info *AsyncInfoPtr) {		int64_t Size, __tgt_async_info *AsyncInfoPtr) {
if (!AsyncInfoPtr || !RTL->data_submit_async || !RTL->synchronize)		if (!AsyncInfoPtr || !RTL->data_submit_async)
return RTL->data_submit(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size);		return RTL->data_submit(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size);
else		else
return RTL->data_submit_async(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size,		return RTL->data_submit_async(RTLDeviceID, TgtPtrBegin, HstPtrBegin, Size,
AsyncInfoPtr);		AsyncInfoPtr);
}		}

// Retrieve data from device		// Retrieve data from device
int32_t DeviceTy::data_retrieve(void *HstPtrBegin, void *TgtPtrBegin,		int32_t DeviceTy::data_retrieve(void *HstPtrBegin, void *TgtPtrBegin,
int64_t Size, __tgt_async_info *AsyncInfoPtr) {		int64_t Size, __tgt_async_info *AsyncInfoPtr) {
if (!AsyncInfoPtr || !RTL->data_retrieve_async || !RTL->synchronize)		if (!AsyncInfoPtr || !RTL->data_retrieve_async)
return RTL->data_retrieve(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size);		return RTL->data_retrieve(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size);
else		else
return RTL->data_retrieve_async(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size,		return RTL->data_retrieve_async(RTLDeviceID, HstPtrBegin, TgtPtrBegin, Size,
AsyncInfoPtr);		AsyncInfoPtr);
}		}

// Copy data from current device to destination device directly		// Copy data from current device to destination device directly
int32_t DeviceTy::data_exchange(void *SrcPtr, DeviceTy DstDev, void *DstPtr,		int32_t DeviceTy::data_exchange(void *SrcPtr, DeviceTy DstDev, void *DstPtr,
int64_t Size, __tgt_async_info *AsyncInfoPtr) {		int64_t Size, __tgt_async_info *AsyncInfoPtr) {
if (!AsyncInfoPtr || !RTL->data_exchange_async || !RTL->synchronize) {		if (!AsyncInfoPtr || !RTL->data_exchange_async) {
assert(RTL->data_exchange && "RTL->data_exchange is nullptr");		assert(RTL->data_exchange && "RTL->data_exchange is nullptr");
return RTL->data_exchange(RTLDeviceID, SrcPtr, DstDev.RTLDeviceID, DstPtr,		return RTL->data_exchange(RTLDeviceID, SrcPtr, DstDev.RTLDeviceID, DstPtr,
Size);		Size);
} else		} else
return RTL->data_exchange_async(RTLDeviceID, SrcPtr, DstDev.RTLDeviceID,		return RTL->data_exchange_async(RTLDeviceID, SrcPtr, DstDev.RTLDeviceID,
DstPtr, Size, AsyncInfoPtr);		DstPtr, Size, AsyncInfoPtr);
}		}

// Run region on device		// Run region on device
int32_t DeviceTy::run_region(void *TgtEntryPtr, void **TgtVarsPtr,		int32_t DeviceTy::run_region(void *TgtEntryPtr, void **TgtVarsPtr,
ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,		ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,
__tgt_async_info *AsyncInfoPtr) {		__tgt_async_info *AsyncInfoPtr) {
if (!AsyncInfoPtr || !RTL->run_region || !RTL->synchronize)		if (!AsyncInfoPtr || !RTL->run_region)
return RTL->run_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr, TgtOffsets,		return RTL->run_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr, TgtOffsets,
TgtVarsSize);		TgtVarsSize);
else		else
return RTL->run_region_async(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,		return RTL->run_region_async(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,
TgtOffsets, TgtVarsSize, AsyncInfoPtr);		TgtOffsets, TgtVarsSize, AsyncInfoPtr);
}		}

// Run team region on device.		// Run team region on device.
int32_t DeviceTy::run_team_region(void *TgtEntryPtr, void **TgtVarsPtr,		int32_t DeviceTy::run_team_region(void *TgtEntryPtr, void **TgtVarsPtr,
ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,		ptrdiff_t *TgtOffsets, int32_t TgtVarsSize,
int32_t NumTeams, int32_t ThreadLimit,		int32_t NumTeams, int32_t ThreadLimit,
uint64_t LoopTripCount,		uint64_t LoopTripCount,
__tgt_async_info *AsyncInfoPtr) {		__tgt_async_info *AsyncInfoPtr) {
if (!AsyncInfoPtr || !RTL->run_team_region_async || !RTL->synchronize)		if (!AsyncInfoPtr || !RTL->run_team_region_async)
return RTL->run_team_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,		return RTL->run_team_region(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,
TgtOffsets, TgtVarsSize, NumTeams, ThreadLimit,		TgtOffsets, TgtVarsSize, NumTeams, ThreadLimit,
LoopTripCount);		LoopTripCount);
else		else
return RTL->run_team_region_async(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,		return RTL->run_team_region_async(RTLDeviceID, TgtEntryPtr, TgtVarsPtr,
TgtOffsets, TgtVarsSize, NumTeams,		TgtOffsets, TgtVarsSize, NumTeams,
ThreadLimit, LoopTripCount, AsyncInfoPtr);		ThreadLimit, LoopTripCount, AsyncInfoPtr);
}		}
▲ Show 20 Lines • Show All 43 Lines • Show Last 20 Lines

openmp/libomptarget/src/exports

Show All 19 Lines	global:
omp_target_alloc;		omp_target_alloc;
omp_target_free;		omp_target_free;
omp_target_is_present;		omp_target_is_present;
omp_target_memcpy;		omp_target_memcpy;
omp_target_memcpy_rect;		omp_target_memcpy_rect;
omp_target_associate_ptr;		omp_target_associate_ptr;
omp_target_disassociate_ptr;		omp_target_disassociate_ptr;
__kmpc_push_target_tripcount;		__kmpc_push_target_tripcount;
		__kmpc_free_async_info;
local:		local:
*;		*;
};		};

openmp/libomptarget/src/interface.cpp

Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	case tgt_disabled:
}		}
break;		break;
case tgt_default:		case tgt_default:
FATAL_MESSAGE0(1, "default offloading policy must be switched to "		FATAL_MESSAGE0(1, "default offloading policy must be switched to "
"mandatory or disabled");		"mandatory or disabled");
break;		break;
case tgt_mandatory:		case tgt_mandatory:
if (!success) {		if (!success) {
FATAL_MESSAGE0(1, "failure of target construct while offloading is mandatory");		FATAL_MESSAGE0(
		1, "failure of target construct while offloading is mandatory");
}		}
break;		break;
}		}
}		}

		template <bool Begin> static bool checkAndInitDevice(int64_t &DeviceId) {
		if (IsOffloadDisabled())
		return false;

		// No devices available?
		if (DeviceId == OFFLOAD_DEVICE_DEFAULT) {
		DeviceId = omp_get_default_device();
		DP("Use default device id %" PRId64 "\n", DeviceId);
		}

		// Invalid device id as we always expect a non-negative device id and it must
		// be less than the size of all device RTLs
		if (DeviceId < 0 || static_cast<uint64_t>(DeviceId) >= Devices.size()) {
		DP("Invalid device %" PRId64 "\n", DeviceId);
		return false;
		}

		if (!Begin)
		return true;

		if (CheckDeviceAndCtors(DeviceId) != OFFLOAD_SUCCESS) {
		DP("Failed to get device %" PRId64 " ready\n", DeviceId);
		HandleTargetOutcome(false);
		return false;
		} else {
		return true;
		}
		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// adds requires flags		/// adds requires flags
EXTERN void __tgt_register_requires(int64_t flags) {		EXTERN void __tgt_register_requires(int64_t flags) {
RTLs->RegisterRequires(flags);		RTLs->RegisterRequires(flags);
}		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// adds a target shared library to the target execution image		/// adds a target shared library to the target execution image
EXTERN void __tgt_register_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_register_lib(__tgt_bin_desc *desc) {
RTLs->RegisterLib(desc);		RTLs->RegisterLib(desc);
}		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// unloads a target shared library		/// unloads a target shared library
EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {
RTLs->UnregisterLib(desc);		RTLs->UnregisterLib(desc);
}		}

/// creates host-to-target data mapping, stores it in the		/// creates host-to-target data mapping, stores it in the
/// libomptarget.so internal structure (an entry in a stack of data maps)		/// libomptarget.so internal structure (an entry in a stack of data maps)
/// and passes the data to the device.		/// and passes the data to the device.
EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
if (IsOffloadDisabled()) return;		// device_id will be corrected if it is default value
		if (!checkAndInitDevice<true>(device_id))
		return;

DP("Entering data begin region for device %" PRId64 " with %d mappings\n",		DP("Entering data begin region for device %" PRId64 " with %d mappings\n",
device_id, arg_num);		device_id, arg_num);

// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();
DP("Use default device id %" PRId64 "\n", device_id);
}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);
HandleTargetOutcome(false);
return;
}

DeviceTy &Device = Devices[device_id];		DeviceTy &Device = Devices[device_id];

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i = 0; i < arg_num; ++i) {		for (int i = 0; i < arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n",		", Type=0x%" PRIx64 "\n",
i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);		i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

int rc = target_data_begin(Device, arg_num, args_base, args, arg_sizes,		int rc = target_data_begin(Device, arg_num, args_base, args, arg_sizes,
arg_types, nullptr);		arg_types, nullptr);
HandleTargetOutcome(rc == OFFLOAD_SUCCESS);		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		// device_id will be corrected if it is default value
__kmpc_omp_taskwait(NULL, __kmpc_global_thread_num(NULL));		if (!checkAndInitDevice<true>(device_id))
		return;

		DP("Entering data begin region for device %" PRId64 " with %d mappings\n",
		device_id, arg_num);

		DeviceTy &Device = Devices[device_id];

__tgt_target_data_begin(device_id, arg_num, args_base, args, arg_sizes,		__tgt_async_info *async_info = new __tgt_async_info;
arg_types);		async_info->DeviceID = device_id;

		const int rc = target_data_nowait(
		Device, arg_num, args_base, args, arg_sizes, arg_types, async_info,
		depNum, depList, noAliasDepNum, noAliasDepList, target_data_begin);
		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// delete async_info;
}		}

/// passes data from the target, releases target memory and destroys		/// passes data from the target, releases target memory and destroys
/// the host-target mapping (top entry from the stack of data maps)		/// the host-target mapping (top entry from the stack of data maps)
/// created by the last __tgt_target_data_begin.		/// created by the last __tgt_target_data_begin.
EXTERN void __tgt_target_data_end(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_end(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
if (IsOffloadDisabled()) return;		// device_id will be corrected if it is default value
DP("Entering data end region with %d mappings\n", arg_num);		if (!checkAndInitDevice<false>(device_id))
		return;

// No devices available?		DP("Entering data end region for device %" PRId64 " with %d mappings\n",
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		device_id, arg_num);
device_id = omp_get_default_device();
}

RTLsMtx->lock();		RTLsMtx->lock();
size_t Devices_size = Devices.size();		size_t Devices_size = Devices.size();
RTLsMtx->unlock();		RTLsMtx->unlock();
if (Devices_size <= (size_t)device_id) {		if (Devices_size <= (size_t)device_id) {
DP("Device ID %" PRId64 " does not have a matching RTL.\n", device_id);		DP("Device ID %" PRId64 " does not have a matching RTL.\n", device_id);
HandleTargetOutcome(false);		HandleTargetOutcome(false);
return;		return;
Show All 18 Lines	int rc = target_data_end(Device, arg_num, args_base, args, arg_sizes,
arg_types, nullptr);		arg_types, nullptr);
HandleTargetOutcome(rc == OFFLOAD_SUCCESS);		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		// device_id will be corrected if it is default value
__kmpc_omp_taskwait(NULL, __kmpc_global_thread_num(NULL));		if (!checkAndInitDevice<true>(device_id))
		return;

__tgt_target_data_end(device_id, arg_num, args_base, args, arg_sizes,		DP("Entering data end region for device %" PRId64 " with %d mappings\n",
arg_types);		device_id, arg_num);
}

EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,		DeviceTy &Device = Devices[device_id];
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
if (IsOffloadDisabled()) return;
DP("Entering data update with %d mappings\n", arg_num);

// No devices available?		__tgt_async_info *async_info = new __tgt_async_info;
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		async_info->DeviceID = device_id;
device_id = omp_get_default_device();
		const int rc = target_data_nowait(
		Device, arg_num, args_base, args, arg_sizes, arg_types, async_info,
		depNum, depList, noAliasDepNum, noAliasDepList, target_data_end);
		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
		// If there is no dependency, this async info will not be attached to the task
		// therefore it can be free.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// delete async_info;
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,
DP("Failed to get device %" PRId64 " ready\n", device_id);		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
HandleTargetOutcome(false);		// device_id will be corrected if it is default value
		if (!checkAndInitDevice<true>(device_id))
return;		return;
}
		DP("Entering data update region for device %" PRId64 " with %d mappings\n",
		device_id, arg_num);

DeviceTy& Device = Devices[device_id];		DeviceTy &Device = Devices[device_id];
int rc = target_data_update(Device, arg_num, args_base,		const int rc = target_data_update(Device, arg_num, args_base, args, arg_sizes,
args, arg_sizes, arg_types);		arg_types, /* AsyncInfo */ nullptr);
HandleTargetOutcome(rc == OFFLOAD_SUCCESS);		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
}		}

EXTERN void __tgt_target_data_update_nowait(		EXTERN void __tgt_target_data_update_nowait(
int64_t device_id, int32_t arg_num, void **args_base, void **args,		int64_t device_id, int32_t arg_num, void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,		int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,
int32_t noAliasDepNum, void *noAliasDepList) {		int32_t noAliasDepNum, void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		// device_id will be corrected if it is default value
__kmpc_omp_taskwait(NULL, __kmpc_global_thread_num(NULL));		if (!checkAndInitDevice<true>(device_id))
		return;

__tgt_target_data_update(device_id, arg_num, args_base, args, arg_sizes,		DP("Entering data update region for device %" PRId64 " with %d mappings\n",
arg_types);		device_id, arg_num);
}

EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,		DeviceTy &Device = Devices[device_id];
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
if (IsOffloadDisabled()) return OFFLOAD_FAIL;
DP("Entering target region with entry point " DPxMOD " and device Id %"
PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {		// TODO: this part should be refined maybe in case of memory error
device_id = omp_get_default_device();		__tgt_async_info *async_info = new __tgt_async_info;
		async_info->DeviceID = device_id;

		const int rc =
		target_data_nowait(Device, arg_num, args_base, args, arg_sizes, arg_types,
		async_info /* AsyncInfo */, depNum, depList,
		noAliasDepNum, noAliasDepList, target_data_update);
		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
		// If there is no dependency, this async info will not be attached to the task
		// therefore it can be free.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// delete async_info;
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,
DP("Failed to get device %" PRId64 " ready\n", device_id);		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
HandleTargetOutcome(false);		// device_id will be corrected if it is default value
		if (!checkAndInitDevice<true>(device_id))
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}
		DP("Entering target region with entry point " DPxMOD " and device Id %" PRId64
		"\n",
		DPxPTR(host_ptr), device_id);

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i = 0; i < arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n",
arg_sizes[i], arg_types[i]);		i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		const int rc =
arg_types, 0, 0, false /*team*/);		target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
		arg_types, /* team_num */ 0, /* thread_limit */ 0,
		/* IsTeamConstruct */ false, /* AsyncInfo */ nullptr);
HandleTargetOutcome(rc == OFFLOAD_SUCCESS);		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
return rc;		return rc;
}		}

EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,		int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		// device_id will be corrected if it is default value
__kmpc_omp_taskwait(NULL, __kmpc_global_thread_num(NULL));		if (!checkAndInitDevice<true>(device_id))
		return OFFLOAD_FAIL;

		DP("Entering target region with entry point " DPxMOD " and device Id %" PRId64
		"\n",
		DPxPTR(host_ptr), device_id);

return __tgt_target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		#ifdef OMPTARGET_DEBUG
arg_types);		for (int i = 0; i < arg_num; ++i) {
		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
		", Type=0x%" PRIx64 "\n",
		i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);
		}
		#endif

		// TODO: this part should be refined maybe in case of memory error
		__tgt_async_info *async_info = new __tgt_async_info;
		async_info->DeviceID = device_id;

		const int rc = target_nowait(
		device_id, host_ptr, arg_num, args_base, args, arg_sizes, arg_types,
		/* team_num */ 0, /* thread_limit */ 0, /* IsTeamConstruct */ false,
		async_info, depNum, depList, noAliasDepNum, noAliasDepList);
		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
		// If there is no dependency, this async info will not be attached to the task
		// therefore it can be free.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// delete async_info;
		return rc;
}		}

EXTERN int __tgt_target_teams(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_teams(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t team_num, int32_t thread_limit) {		int64_t *arg_types, int32_t team_num, int32_t thread_limit) {
if (IsOffloadDisabled()) return OFFLOAD_FAIL;		// device_id will be corrected if it is default value
DP("Entering target region with entry point " DPxMOD " and device Id %"		if (!checkAndInitDevice<true>(device_id))
PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();
}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);
HandleTargetOutcome(false);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}
		DP("Entering target region with entry point " DPxMOD " and device Id %" PRId64
		"\n",
		DPxPTR(host_ptr), device_id);

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
for (int i=0; i<arg_num; ++i) {		for (int i = 0; i < arg_num; ++i) {
DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),		", Type=0x%" PRIx64 "\n",
arg_sizes[i], arg_types[i]);		i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);
}		}
#endif		#endif

int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		const int rc = target(device_id, host_ptr, arg_num, args_base, args,
arg_types, team_num, thread_limit, true /*team*/);		arg_sizes, arg_types, team_num, thread_limit,
		/* IsTeamConstruct */ true, /* AsyncInfo */ nullptr);
HandleTargetOutcome(rc == OFFLOAD_SUCCESS);		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);

return rc;		return rc;
}		}

EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,		int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,
void *depList, int32_t noAliasDepNum, void *noAliasDepList) {		void *depList, int32_t noAliasDepNum, void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		// device_id will be corrected if it is default value
__kmpc_omp_taskwait(NULL, __kmpc_global_thread_num(NULL));		if (!checkAndInitDevice<true>(device_id))
		return OFFLOAD_FAIL;

return __tgt_target_teams(device_id, host_ptr, arg_num, args_base, args,		DP("Entering target region with entry point " DPxMOD " and device Id %" PRId64
arg_sizes, arg_types, team_num, thread_limit);		"\n",
		DPxPTR(host_ptr), device_id);

		#ifdef OMPTARGET_DEBUG
		for (int i = 0; i < arg_num; ++i) {
		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
		", Type=0x%" PRIx64 "\n",
		i, DPxPTR(args_base[i]), DPxPTR(args[i]), arg_sizes[i], arg_types[i]);
		}
		#endif

		__tgt_async_info *async_info = new __tgt_async_info;
		async_info->DeviceID = device_id;

		const int rc = target_nowait(device_id, host_ptr, arg_num, args_base, args,
		arg_sizes, arg_types, team_num, thread_limit,
		/* IsTeamConstruct */ true, async_info, depNum,
		depList, noAliasDepNum, noAliasDepList);
		// If there is no dependency, this async info will not be attached to the task
		// therefore it can be free.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// delete async_info;
		HandleTargetOutcome(rc == OFFLOAD_SUCCESS);
		return rc;
}		}

// Get the current number of components for a user-defined mapper.		// Get the current number of components for a user-defined mapper.
EXTERN int64_t __tgt_mapper_num_components(void *rt_mapper_handle) {		EXTERN int64_t __tgt_mapper_num_components(void *rt_mapper_handle) {
auto MapperComponentsPtr = (struct MapperComponentsTy *)rt_mapper_handle;		auto MapperComponentsPtr = (struct MapperComponentsTy *)rt_mapper_handle;
int64_t size = MapperComponentsPtr->Components.size();		int64_t size = MapperComponentsPtr->Components.size();
DP("__tgt_mapper_num_components(Handle=" DPxMOD ") returns %" PRId64 "\n",		DP("__tgt_mapper_num_components(Handle=" DPxMOD ") returns %" PRId64 "\n",
DPxPTR(rt_mapper_handle), size);		DPxPTR(rt_mapper_handle), size);
Show All 24 Lines	EXTERN void __kmpc_push_target_tripcount(int64_t device_id,

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %" PRId64 " ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
HandleTargetOutcome(false);		HandleTargetOutcome(false);
return;		return;
}		}

DP("__kmpc_push_target_tripcount(%" PRId64 ", %" PRIu64 ")\n", device_id,		DP("__kmpc_push_target_tripcount(%" PRId64 ", %" PRIu64 ")\n", device_id,
loop_tripcount);		loop_tripcount);
TblMapMtx->lock();		TblMapMtx->lock();
Devices[device_id].LoopTripCnt.emplace(__kmpc_global_thread_num(NULL),		Devices[device_id].LoopTripCnt.emplace(__kmpc_global_thread_num(NULL),
loop_tripcount);		loop_tripcount);
TblMapMtx->unlock();		TblMapMtx->unlock();
}		}

		EXTERN void __kmpc_free_async_info(void *Ptr) {
		if (!Ptr)
		return;
		__tgt_async_info *AsyncInfo = reinterpret_cast<__tgt_async_info *>(Ptr);
		int DeviceId = AsyncInfo->DeviceID;

		assert(DeviceId >= 0 && "Invalid DeviceId");

		DeviceTy &Device = Devices[DeviceId];

		Device.RTL->release_async_info(Device.RTLDeviceID, AsyncInfo);

		delete AsyncInfo;
		}
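
The ownership pattern used by __kmpc_free_async_info above (a handle allocated in a nowait entry point, handed around as void *, then released through its device and deleted) can be sketched in isolation. This is only an illustration under stated assumptions: AsyncInfoSketch, DeviceSketch, DevicesSketch, makeAsyncInfoSketch and freeAsyncInfoSketch are hypothetical names, not part of libomptarget, and the upper-bound check on the device id is an addition that the patch itself does not make.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for __tgt_async_info. Only the DeviceID field is
// taken from the diff; everything else here is illustrative.
struct AsyncInfoSketch {
  int64_t DeviceID = -1;
};

// Hypothetical stand-in for a device: it only counts release calls.
struct DeviceSketch {
  int ReleaseCalls = 0;
  void releaseAsyncInfo(AsyncInfoSketch *) { ++ReleaseCalls; }
};

// Hypothetical global device table with two devices.
static std::vector<DeviceSketch> DevicesSketch(2);

// Producer side: the nowait entry point heap-allocates the handle and tags
// it with the device it belongs to.
AsyncInfoSketch *makeAsyncInfoSketch(int64_t DeviceId) {
  auto *Info = new AsyncInfoSketch;
  Info->DeviceID = DeviceId;
  return Info;
}

// Consumer side: the handle comes back as void *. A null pointer is a no-op;
// otherwise the owning device releases its resources before the handle is
// deleted. Returns true if a handle was actually freed.
bool freeAsyncInfoSketch(void *Ptr) {
  if (!Ptr)
    return false;
  auto *Info = static_cast<AsyncInfoSketch *>(Ptr);
  // Assumption: unlike the patch, also reject ids past the end of the table.
  assert(Info->DeviceID >= 0 &&
         static_cast<uint64_t>(Info->DeviceID) < DevicesSketch.size() &&
         "Invalid DeviceID");
  DevicesSketch[Info->DeviceID].releaseAsyncInfo(Info);
  delete Info;
  return true;
}
```

Storing the device id inside the handle is what lets the free function find the right device without any extra argument, which matters when the caller only holds an opaque pointer.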

openmp/libomptarget/src/omptarget.cpp

Show First 20 Lines • Show All 160 Lines • ▼ Show 20 Lines	static int InitLibrary(DeviceTy& Device) {
*/		*/
if (!Device.PendingCtorsDtors.empty()) {		if (!Device.PendingCtorsDtors.empty()) {
// Call all ctors for all libraries registered so far		// Call all ctors for all libraries registered so far
for (auto &lib : Device.PendingCtorsDtors) {		for (auto &lib : Device.PendingCtorsDtors) {
if (!lib.second.PendingCtors.empty()) {		if (!lib.second.PendingCtors.empty()) {
DP("Has pending ctors... call now\n");		DP("Has pending ctors... call now\n");
for (auto &entry : lib.second.PendingCtors) {		for (auto &entry : lib.second.PendingCtors) {
void *ctor = entry;		void *ctor = entry;
int rc = target(device_id, ctor, 0, NULL, NULL, NULL,		int rc = target(device_id, ctor, 0, NULL, NULL, NULL, NULL, 1, 1,
NULL, 1, 1, true /team/);		true /team/, nullptr);
if (rc != OFFLOAD_SUCCESS) {		if (rc != OFFLOAD_SUCCESS) {
DP("Running ctor " DPxMOD " failed.\n", DPxPTR(ctor));		DP("Running ctor " DPxMOD " failed.\n", DPxPTR(ctor));
Device.PendingGlobalsMtx.unlock();		Device.PendingGlobalsMtx.unlock();
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}
// Clear the list to indicate that this device has been used		// Clear the list to indicate that this device has been used
lib.second.PendingCtors.clear();		lib.second.PendingCtors.clear();
▲ Show 20 Lines • Show All 302 Lines • ▼ Show 20 Lines	if ((arg_types[i] & OMP_TGT_MAPTYPE_FROM) || DelEntry) {
}		}
}		}
}		}

return OFFLOAD_SUCCESS;		return OFFLOAD_SUCCESS;
}		}

/// Internal function to pass data to/from the target.		/// Internal function to pass data to/from the target.
int target_data_update(DeviceTy &Device, int32_t arg_num,		int target_data_update(DeviceTy &Device, int32_t arg_num, void **args_base,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args, int64_t *arg_sizes, int64_t *arg_types,
		__tgt_async_info *AsyncInfo) {
// process each input.		// process each input.
for (int32_t i = 0; i < arg_num; ++i) {		for (int32_t i = 0; i < arg_num; ++i) {
if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||		if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||
(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))		(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))
continue;		continue;

void *HstPtrBegin = args[i];		void *HstPtrBegin = args[i];
int64_t MapSize = arg_sizes[i];		int64_t MapSize = arg_sizes[i];
bool IsLast, IsHostPtr;		bool IsLast, IsHostPtr;
void *TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBegin, MapSize, IsLast,		void *TgtPtrBegin =
false, IsHostPtr);		Device.getTgtPtrBegin(HstPtrBegin, MapSize, IsLast, false, IsHostPtr);
if (!TgtPtrBegin) {		if (!TgtPtrBegin) {
DP("hst data:" DPxMOD " not found, becomes a noop\n", DPxPTR(HstPtrBegin));		DP("hst data:" DPxMOD " not found, becomes a noop\n",
		DPxPTR(HstPtrBegin));
continue;		continue;
}		}

if (RTLs->RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&		if (RTLs->RequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
TgtPtrBegin == HstPtrBegin) {		TgtPtrBegin == HstPtrBegin) {
DP("hst data:" DPxMOD " unified and shared, becomes a noop\n",		DP("hst data:" DPxMOD " unified and shared, becomes a noop\n",
DPxPTR(HstPtrBegin));		DPxPTR(HstPtrBegin));
continue;		continue;
}		}

if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {		if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {
DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));		arg_sizes[i], DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));
int rt = Device.data_retrieve(HstPtrBegin, TgtPtrBegin, MapSize, nullptr);		int rt =
		Device.data_retrieve(HstPtrBegin, TgtPtrBegin, MapSize, AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data from device failed.\n");		DP("Copying data from device failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

uintptr_t lb = (uintptr_t) HstPtrBegin;		uintptr_t lb = (uintptr_t)HstPtrBegin;
uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;		uintptr_t ub = (uintptr_t)HstPtrBegin + MapSize;
Device.ShadowMtx.lock();		Device.ShadowMtx.lock();
for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();		for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
it != Device.ShadowPtrMap.end(); ++it) {		it != Device.ShadowPtrMap.end(); ++it) {
void **ShadowHstPtrAddr = (void **) it->first;		void **ShadowHstPtrAddr = (void **)it->first;
if ((uintptr_t) ShadowHstPtrAddr < lb)		if ((uintptr_t)ShadowHstPtrAddr < lb)
continue;		continue;
if ((uintptr_t) ShadowHstPtrAddr >= ub)		if ((uintptr_t)ShadowHstPtrAddr >= ub)
break;		break;
DP("Restoring original host pointer value " DPxMOD " for host pointer "		DP("Restoring original host pointer value " DPxMOD
DPxMOD "\n", DPxPTR(it->second.HstPtrVal),		" for host pointer " DPxMOD "\n",
DPxPTR(ShadowHstPtrAddr));		DPxPTR(it->second.HstPtrVal), DPxPTR(ShadowHstPtrAddr));
*ShadowHstPtrAddr = it->second.HstPtrVal;		*ShadowHstPtrAddr = it->second.HstPtrVal;
}		}
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();
}		}

if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {		if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {
DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));		arg_sizes[i], DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));
int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, MapSize, nullptr);		int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, MapSize, AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data to device failed.\n");		DP("Copying data to device failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

uintptr_t lb = (uintptr_t) HstPtrBegin;		uintptr_t lb = (uintptr_t) HstPtrBegin;
uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;		uintptr_t ub = (uintptr_t) HstPtrBegin + MapSize;
Device.ShadowMtx.lock();		Device.ShadowMtx.lock();
for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();		for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
it != Device.ShadowPtrMap.end(); ++it) {		it != Device.ShadowPtrMap.end(); ++it) {
void **ShadowHstPtrAddr = (void **) it->first;		void **ShadowHstPtrAddr = (void **)it->first;
if ((uintptr_t) ShadowHstPtrAddr < lb)		if ((uintptr_t)ShadowHstPtrAddr < lb)
continue;		continue;
if ((uintptr_t) ShadowHstPtrAddr >= ub)		if ((uintptr_t)ShadowHstPtrAddr >= ub)
break;		break;
DP("Restoring original target pointer value " DPxMOD " for target "		DP("Restoring original target pointer value " DPxMOD " for target "
"pointer " DPxMOD "\n", DPxPTR(it->second.TgtPtrVal),		"pointer " DPxMOD "\n",
DPxPTR(it->second.TgtPtrAddr));		DPxPTR(it->second.TgtPtrVal), DPxPTR(it->second.TgtPtrAddr));
rt = Device.data_submit(it->second.TgtPtrAddr,		rt = Device.data_submit(it->second.TgtPtrAddr, &it->second.TgtPtrVal,
&it->second.TgtPtrVal, sizeof(void *), nullptr);		sizeof(void *), AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data to device failed.\n");		DP("Copying data to device failed.\n");
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();
}		}
Show All 9 Lines
}		}

/// performs the same actions as data_begin in case arg_num is		/// performs the same actions as data_begin in case arg_num is
/// non-zero and initiates run of the offloaded region on the target platform;		/// non-zero and initiates run of the offloaded region on the target platform;
/// if arg_num is non-zero after the region execution is done it also		/// if arg_num is non-zero after the region execution is done it also
/// performs the same action as data_update and data_end above. This function		/// performs the same action as data_update and data_end above. This function
/// returns 0 if it was able to transfer the execution to a target and an		/// returns 0 if it was able to transfer the execution to a target and an
/// integer different from zero otherwise.		/// integer different from zero otherwise.
int target(int64_t device_id, void *host_ptr, int32_t arg_num,		int target(int64_t device_id, void *host_ptr, int32_t arg_num, void **args_base,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t team_num, int32_t thread_limit, int IsTeamConstruct) {		int32_t team_num, int32_t thread_limit, int IsTeamConstruct,
		__tgt_async_info *AsyncInfo) {
		jdoerfert: I guess we can make async info a pointer argument in a separate (NFC) patch to reduce this one, WDYT?
DeviceTy &Device = Devices[device_id];		DeviceTy &Device = Devices[device_id];

// Find the table information in the map or look it up in the translation		// Find the table information in the map or look it up in the translation
// tables.		// tables.
TableMap *TM = 0;		TableMap *TM = 0;
TblMapMtx->lock();		TblMapMtx->lock();
HostPtrToTableMapTy::iterator TableMapIt = HostPtrToTableMap->find(host_ptr);		HostPtrToTableMapTy::iterator TableMapIt = HostPtrToTableMap->find(host_ptr);
if (TableMapIt == HostPtrToTableMap->end()) {		if (TableMapIt == HostPtrToTableMap->end()) {
Show All 37 Lines	int target(int64_t device_id, void *host_ptr, int32_t arg_num, void **args_base,
// get target table.		// get target table.
TrlTblMtx->lock();		TrlTblMtx->lock();
assert(TM->Table->TargetsTable.size() > (size_t)device_id &&		assert(TM->Table->TargetsTable.size() > (size_t)device_id &&
"Not expecting a device ID outside the table's bounds!");		"Not expecting a device ID outside the table's bounds!");
__tgt_target_table *TargetTable = TM->Table->TargetsTable[device_id];		__tgt_target_table *TargetTable = TM->Table->TargetsTable[device_id];
TrlTblMtx->unlock();		TrlTblMtx->unlock();
assert(TargetTable && "Global data has not been mapped\n");		assert(TargetTable && "Global data has not been mapped\n");

__tgt_async_info AsyncInfo;

// Move data to device.		// Move data to device.
int rc = target_data_begin(Device, arg_num, args_base, args, arg_sizes,		int rc = target_data_begin(Device, arg_num, args_base, args, arg_sizes,
arg_types, &AsyncInfo);		arg_types, AsyncInfo);
if (rc != OFFLOAD_SUCCESS) {		if (rc != OFFLOAD_SUCCESS) {
DP("Call to target_data_begin failed, abort target.\n");		DP("Call to target_data_begin failed, abort target.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

std::vector<void *> tgt_args;		std::vector<void *> tgt_args;
std::vector<ptrdiff_t> tgt_offsets;		std::vector<ptrdiff_t> tgt_offsets;

Show All 34 Lines	if (!(arg_types[i] & OMP_TGT_MAPTYPE_TARGET_PARAM)) {
TgtPtrBegin == HstPtrBegin) {		TgtPtrBegin == HstPtrBegin) {
DP("Unified memory is active, no need to map lambda captured"		DP("Unified memory is active, no need to map lambda captured"
"variable (" DPxMOD ")\n", DPxPTR(HstPtrVal));		"variable (" DPxMOD ")\n", DPxPTR(HstPtrVal));
continue;		continue;
}		}
DP("Update lambda reference (" DPxMOD ") -> [" DPxMOD "]\n",		DP("Update lambda reference (" DPxMOD ") -> [" DPxMOD "]\n",
DPxPTR(Pointer_TgtPtrBegin), DPxPTR(TgtPtrBegin));		DPxPTR(Pointer_TgtPtrBegin), DPxPTR(TgtPtrBegin));
int rt = Device.data_submit(TgtPtrBegin, &Pointer_TgtPtrBegin,		int rt = Device.data_submit(TgtPtrBegin, &Pointer_TgtPtrBegin,
sizeof(void *), &AsyncInfo);		sizeof(void *), AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data to device failed.\n");		DP("Copying data to device failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}
continue;		continue;
}		}
void *HstPtrBegin = args[i];		void *HstPtrBegin = args[i];
Show All 25 Lines	#ifdef OMPTARGET_DEBUG
"%sprivate array " DPxMOD " - pushing target argument " DPxMOD "\n",		"%sprivate array " DPxMOD " - pushing target argument " DPxMOD "\n",
arg_sizes[i], DPxPTR(TgtPtrBegin),		arg_sizes[i], DPxPTR(TgtPtrBegin),
(arg_types[i] & OMP_TGT_MAPTYPE_TO ? "first-" : ""),		(arg_types[i] & OMP_TGT_MAPTYPE_TO ? "first-" : ""),
DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBase));		DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBase));
#endif		#endif
// If first-private, copy data from host		// If first-private, copy data from host
if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {		if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {
int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, arg_sizes[i],		int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, arg_sizes[i],
&AsyncInfo);		AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data to device failed, failed.\n");		DP("Copying data to device failed, failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}
} else if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {		} else if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {
TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBase, sizeof(void *), IsLast,		TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBase, sizeof(void *), IsLast,
false, IsHostPtr);		false, IsHostPtr);
Show All 32 Lines	#endif

// Launch device execution.		// Launch device execution.
DP("Launching target execution %s with pointer " DPxMOD " (index=%d).\n",		DP("Launching target execution %s with pointer " DPxMOD " (index=%d).\n",
TargetTable->EntriesBegin[TM->Index].name,		TargetTable->EntriesBegin[TM->Index].name,
DPxPTR(TargetTable->EntriesBegin[TM->Index].addr), TM->Index);		DPxPTR(TargetTable->EntriesBegin[TM->Index].addr), TM->Index);
if (IsTeamConstruct) {		if (IsTeamConstruct) {
rc = Device.run_team_region(TargetTable->EntriesBegin[TM->Index].addr,		rc = Device.run_team_region(TargetTable->EntriesBegin[TM->Index].addr,
&tgt_args[0], &tgt_offsets[0], tgt_args.size(),		&tgt_args[0], &tgt_offsets[0], tgt_args.size(),
team_num, thread_limit, ltc, &AsyncInfo);		team_num, thread_limit, ltc, AsyncInfo);
} else {		} else {
rc = Device.run_region(TargetTable->EntriesBegin[TM->Index].addr,		rc = Device.run_region(TargetTable->EntriesBegin[TM->Index].addr,
&tgt_args[0], &tgt_offsets[0], tgt_args.size(),		&tgt_args[0], &tgt_offsets[0], tgt_args.size(),
&AsyncInfo);		AsyncInfo);
}		}
if (rc != OFFLOAD_SUCCESS) {		if (rc != OFFLOAD_SUCCESS) {
DP ("Executing target region abort target.\n");		DP ("Executing target region abort target.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

// Deallocate (first-)private arrays		// Deallocate (first-)private arrays
for (auto it : fpArrays) {		for (auto it : fpArrays) {
int rt = Device.RTL->data_delete(Device.RTLDeviceID, it);		int rt = Device.RTL->data_delete(Device.RTLDeviceID, it);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Deallocation of (first-)private arrays failed.\n");		DP("Deallocation of (first-)private arrays failed.\n");
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
}		}

// Move data from device.		// Move data from device.
int rt = target_data_end(Device, arg_num, args_base, args, arg_sizes,		int rt = target_data_end(Device, arg_num, args_base, args, arg_sizes,
arg_types, &AsyncInfo);		arg_types, AsyncInfo);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Call to target_data_end failed, abort targe.\n");		DP("Call to target_data_end failed, abort target.\n");
		return OFFLOAD_FAIL;
		}

		return OFFLOAD_SUCCESS;
		}

		// Runtime functions from libomp
		extern "C" {
		void __kmpc_get_target_task_waiting_list(void **list, int *num);
		void __kmpc_set_async_info(void *async_info);
		void __kmpc_target_task_yield();
		int __kmpc_get_target_task_npredecessors();
		}

		namespace {
		int queryAndWait(DeviceTy &Device, __tgt_async_info *AsyncInfo) {
		// TODO: Do we need to make it configurable?
		constexpr const int MAX_TASK_YIELD_COUNT = 16;
		int TaskYieldCount = 0;
		while (1) {
		int Ret = Device.RTL->check_event(Device.RTLDeviceID, AsyncInfo);
		if (Ret == OFFLOAD_SUCCESS)
		return OFFLOAD_SUCCESS;
		// Something wrong
		if (Ret == OFFLOAD_FAIL)
		return OFFLOAD_FAIL;
		// We have yielded enough time. Now do blocking waiting here.
		if (TaskYieldCount > MAX_TASK_YIELD_COUNT)
		return Device.RTL->synchronize(Device.RTLDeviceID, AsyncInfo);
		// Still not finished yet, do task yield
		__kmpc_target_task_yield();
		++TaskYieldCount;
		}

		assert(false && "It should never reach this point!");
		// It should never reach this point
		return OFFLOAD_FAIL;
		}

		int recordEvent(DeviceTy &Device, __tgt_async_info *AsyncInfo) {
		int Ret = Device.RTL->record_event(Device.RTLDeviceID, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		assert(AsyncInfo->Event && "AsyncInfo->Event is nullptr");

		return OFFLOAD_SUCCESS;
		}

		int waitForDeps(DeviceTy &Device, __tgt_async_info *AsyncInfo) {
		// Wait until current task has no depending task because during the task
		// creation as we enqueue the task even if it has depending host tasks.
		while (__kmpc_get_target_task_npredecessors() != 0)
		__kmpc_target_task_yield();

		int Num;

		// Get the number of events that this task depends on
		__kmpc_get_target_task_waiting_list(nullptr, &Num);

		// We have a number of depending tasks so we need to insert the event wait
		// before pushing operations of current task into the queue
		if (Num > 0) {
		// Get a list of events that this task depends on
		std::vector<void *> WaitingList(Num, nullptr);
		__kmpc_get_target_task_waiting_list(WaitingList.data(), &Num);

		for (void *Ptr : WaitingList) {
		__tgt_async_info *WaitingAsyncInfo =
		reinterpret_cast<__tgt_async_info *>(Ptr);

		assert(WaitingAsyncInfo && "WaitingAsyncInfo is nullptr");
		assert(WaitingAsyncInfo->Event && "WaitingAsyncInfo->Event is nullptr");
		assert(WaitingAsyncInfo->DeviceID != -1 &&
		"Invalid WaitingAsyncInfo->DeviceID");

		int Ret;

		// Depend on a target task of different type. We do query and wait here.
		if (Device.RTL != Devices[WaitingAsyncInfo->DeviceID].RTL) {
		Ret =
		queryAndWait(Devices[WaitingAsyncInfo->DeviceID], WaitingAsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		continue;
		}

		Ret = Device.RTL->wait_event(Device.RTLDeviceID, AsyncInfo,
		WaitingAsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}
		}

		return OFFLOAD_SUCCESS;
		}
		} // namespace

		int target_nowait(int64_t DeviceID, void *HostPtr, int32_t ArgNum,
		void **ArgsBase, void **Args, int64_t *ArgSizes,
		int64_t *ArgTypes, int32_t TeamNum, int32_t ThreadLimit,
		int IsTeamConstruct, __tgt_async_info *AsyncInfo,
		int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
		void *NoAliasDepList) {
		DeviceTy &Device = Devices[DeviceID];

		// Fall back to synchronous version if necessary interfaces are not supported
		if (!Device.RTL->AsyncSupported) {
		// Wait until current task has no depending task because during the task
		// creation, we enqueue the task even if it has depending target tasks but
		// here we don't have enough API to do asynchronous offloading, therefore we
		// need to make sure that all depending tasks are finished.
		while (__kmpc_get_target_task_npredecessors() != 0)
		__kmpc_target_task_yield();

		return target(DeviceID, HostPtr, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes,
		TeamNum, ThreadLimit, IsTeamConstruct,
		/* AsyncInfo */ nullptr);
		}

		int Ret = waitForDeps(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		Ret = target(DeviceID, HostPtr, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes,
		TeamNum, ThreadLimit, IsTeamConstruct, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		Ret = recordEvent(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		// Attach the async info to current task such that all dependent tasks can
		// start wait for the event if there is any dependency
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum > 0)
		// __kmpc_set_async_info(AsyncInfo);
		__kmpc_set_async_info(AsyncInfo);

		Ret = queryAndWait(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

if (Device.RTL->synchronize)		// If there is no dependency, we should release all async info. Otherwise, the
return Device.RTL->synchronize(device_id, &AsyncInfo);		// info will be released when the task is finished and its depnode is freed.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// return Device.RTL->release_async_info(Device.RTLDeviceID, AsyncInfo);
		// else
		// return OFFLOAD_SUCCESS;
		return OFFLOAD_SUCCESS;
		}

		int target_data_nowait(DeviceTy &Device, int32_t ArgNum, void **ArgsBase,
		void **Args, int64_t *ArgSizes, int64_t *ArgTypes,
		__tgt_async_info *AsyncInfo, int32_t DepNum,
		void *DepList, int32_t NoAliasDepNum,
		void *NoAliasDepList, TargetDataFuncTy F) {
		// Fall back to synchronous version if necessary interfaces are not supported
		if (!Device.RTL->AsyncSupported) {
		// Wait until current task has no depending task because during the task
		// creation, we enqueue the task even if it has depending target tasks but
		// here we don't have enough API to do asynchronous offloading, therefore we
		// need to make sure that all depending tasks are finished.
		while (__kmpc_get_target_task_npredecessors() != 0)
		__kmpc_target_task_yield();

		// TODO: Need to wait for all dependencies in successors as well

		return F(Device, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes,
		nullptr /* AsyncInfo */);
		}

		int Ret = waitForDeps(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		Ret = F(Device, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		Ret = recordEvent(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		// Attach the async info to current task such that all dependent tasks can
		// start wait for the event if there is any dependency
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers
		// if (depNum + noAliasDepNum > 0)
		// __kmpc_set_async_info(AsyncInfo);
		__kmpc_set_async_info(AsyncInfo);

		Ret = queryAndWait(Device, AsyncInfo);
		if (Ret != OFFLOAD_SUCCESS)
		return OFFLOAD_FAIL;

		// If there is no dependency, we should release all async info. Otherwise, the
		// info will be released when the task is finished and its depnode is freed.
		// TODO: There is an issue in CG such that we cannot depend on these two
		// numbers. Need to update this part if the issue is fixed.
		// if (depNum + noAliasDepNum == 0)
		// return Device.RTL->release_async_info(Device.RTLDeviceID, AsyncInfo);
		// else
		// return OFFLOAD_SUCCESS;
return OFFLOAD_SUCCESS;		return OFFLOAD_SUCCESS;
}		}
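
The waiting strategy of queryAndWait in the diff above (query the event, yield the task a bounded number of times, then fall back to a blocking synchronize) can be looked at in isolation. The following is a minimal sketch under stated assumptions, not code from this patch: Status, pollThenBlock, and the three callbacks are hypothetical stand-ins for the OFFLOAD_* return codes, check_event, __kmpc_target_task_yield, and synchronize.

```cpp
#include <cassert>
#include <functional>

// Hypothetical status codes. kNotReady stands in for whatever value the
// plugin's check_event returns while the event is still pending.
enum Status { kSuccess = 0, kFail = 1, kNotReady = 2 };

// Poll `query` until it reports success or failure. After more than
// `maxYields` yields, stop polling and return the result of `blockingWait`.
inline Status pollThenBlock(const std::function<Status()> &query,
                            const std::function<void()> &yield,
                            const std::function<Status()> &blockingWait,
                            int maxYields = 16) {
  int yields = 0;
  while (true) {
    Status s = query();
    if (s == kSuccess || s == kFail)
      return s; // Finished, or something went wrong.
    if (yields > maxYields)
      return blockingWait(); // Yielded enough; block until completion.
    yield(); // Let the runtime schedule other tasks in the meantime.
    ++yields;
  }
}
```

With the same `>` comparison as in the patch, the loop yields maxYields + 1 times before it blocks. Passing the yield and wait operations as callbacks keeps the sketch independent of any particular runtime.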

openmp/libomptarget/src/private.h

Show All 11 Lines

#ifndef _OMPTARGET_PRIVATE_H		#ifndef _OMPTARGET_PRIVATE_H
#define _OMPTARGET_PRIVATE_H		#define _OMPTARGET_PRIVATE_H

#include <omptarget.h>		#include <omptarget.h>

#include <cstdint>		#include <cstdint>

extern int target_data_begin(DeviceTy &Device, int32_t arg_num,		extern int target_data_begin(DeviceTy &Device, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes,		void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types,		int64_t *arg_types,
__tgt_async_info *async_info_ptr);		__tgt_async_info *async_info_ptr);

extern int target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base,		extern int target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base,
void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args, int64_t *arg_sizes, int64_t *arg_types,
__tgt_async_info *async_info_ptr);		__tgt_async_info *async_info_ptr);

extern int target_data_update(DeviceTy &Device, int32_t arg_num,		extern int target_data_update(DeviceTy &Device, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types);		void **args_base, void **args, int64_t *arg_sizes,
		int64_t *arg_types, __tgt_async_info *AsyncInfo);

extern int target(int64_t device_id, void *host_ptr, int32_t arg_num,		extern int target(int64_t device_id, void *host_ptr, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes,
int32_t team_num, int32_t thread_limit, int IsTeamConstruct);		int64_t *arg_types, int32_t team_num, int32_t thread_limit,
		int IsTeamConstruct, __tgt_async_info *AsyncInfo);

		extern int target_nowait(int64_t DeviceID, void *HostPtr, int32_t ArgNum,
		void **ArgsBase, void **Args, int64_t *ArgSizes,
		int64_t *ArgTypes, int32_t TeamNum,
		int32_t ThreadLimit, int IsTeamConstruct,
		__tgt_async_info *AsyncInfo, int32_t DepNum,
		void *DepList, int32_t NoAliasDepNum,
		void *NoAliasDepList);

		using TargetDataFuncTy = int (*)(DeviceTy &, int32_t, void **, void **,
		int64_t *, int64_t *, __tgt_async_info *);

		int target_data_nowait(DeviceTy &Device, int32_t ArgNum, void **ArgsBase,
		void **Args, int64_t *ArgSizes, int64_t *ArgTypes,
		__tgt_async_info *AsyncInfo, int32_t DepNum,
		void *DepList, int32_t NoAliasDepNum,
		void *NoAliasDepList, TargetDataFuncTy F);

extern int CheckDeviceAndCtors(int64_t device_id);		extern int CheckDeviceAndCtors(int64_t device_id);

// enum for OMP_TARGET_OFFLOAD; keep in sync with kmp.h definition		// enum for OMP_TARGET_OFFLOAD; keep in sync with kmp.h definition
enum kmp_target_offload_kind {		enum kmp_target_offload_kind {
tgt_disabled = 0,		tgt_disabled = 0,
tgt_default = 1,		tgt_default = 1,
tgt_mandatory = 2		tgt_mandatory = 2
Show All 11 Lines	struct MapComponentInfoTy {
MapComponentInfoTy(void *Base, void *Begin, int64_t Size, int64_t Type)		MapComponentInfoTy(void *Base, void *Begin, int64_t Size, int64_t Type)
: Base(Base), Begin(Begin), Size(Size), Type(Type) {}		: Base(Base), Begin(Begin), Size(Size), Type(Type) {}
};		};

// This structure stores all components of a user-defined mapper. The number of		// This structure stores all components of a user-defined mapper. The number of
// components are dynamically decided, so we utilize C++ STL vector		// components are dynamically decided, so we utilize C++ STL vector
// implementation here.		// implementation here.
struct MapperComponentsTy {		struct MapperComponentsTy {
std::vector<MapComponentInfoTy> Components;		std::vector<MapComponentInfoTy> Components;
};		};

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// implementation for fatal messages		// implementation for fatal messages
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////

#define FATAL_MESSAGE0(_num, _str) \		#define FATAL_MESSAGE0(_num, _str) \
do { \		do { \
Show All 38 Lines

openmp/libomptarget/src/rtl.h

Show All 20 Lines
#include <vector>		#include <vector>

// Forward declarations.		// Forward declarations.
struct DeviceTy;		struct DeviceTy;
struct __tgt_bin_desc;		struct __tgt_bin_desc;

struct RTLInfoTy {		struct RTLInfoTy {
typedef int32_t(is_valid_binary_ty)(void *);		typedef int32_t(is_valid_binary_ty)(void *);
typedef int32_t(is_data_exchangable_ty)(int32_t, int32_t);
typedef int32_t(number_of_devices_ty)();		typedef int32_t(number_of_devices_ty)();
typedef int32_t(init_device_ty)(int32_t);		typedef int32_t(init_device_ty)(int32_t);
typedef __tgt_target_table *(load_binary_ty)(int32_t, void *);		typedef __tgt_target_table *(load_binary_ty)(int32_t, void *);
typedef void *(data_alloc_ty)(int32_t, int64_t, void *);		typedef void *(data_alloc_ty)(int32_t, int64_t, void *);
typedef int32_t(data_submit_ty)(int32_t, void *, void *, int64_t);		typedef int32_t(data_submit_ty)(int32_t, void *, void *, int64_t);
typedef int32_t(data_submit_async_ty)(int32_t, void *, void *, int64_t,
__tgt_async_info *);
typedef int32_t(data_retrieve_ty)(int32_t, void *, void *, int64_t);		typedef int32_t(data_retrieve_ty)(int32_t, void *, void *, int64_t);
typedef int32_t(data_retrieve_async_ty)(int32_t, void *, void *, int64_t,
__tgt_async_info *);
typedef int32_t(data_exchange_ty)(int32_t, void *, int32_t, void *, int64_t);
typedef int32_t(data_exchange_async_ty)(int32_t, void *, int32_t, void *,
int64_t, __tgt_async_info *);
typedef int32_t(data_delete_ty)(int32_t, void *);		typedef int32_t(data_delete_ty)(int32_t, void *);
typedef int32_t(run_region_ty)(int32_t, void *, void **, ptrdiff_t *,		typedef int32_t(run_region_ty)(int32_t, void *, void **, ptrdiff_t *,
int32_t);		int32_t);
typedef int32_t(run_region_async_ty)(int32_t, void *, void **, ptrdiff_t *,
int32_t, __tgt_async_info *);
typedef int32_t(run_team_region_ty)(int32_t, void *, void **, ptrdiff_t *,		typedef int32_t(run_team_region_ty)(int32_t, void *, void **, ptrdiff_t *,
int32_t, int32_t, int32_t, uint64_t);		int32_t, int32_t, int32_t, uint64_t);
		typedef int64_t(init_requires_ty)(int64_t);

		// Device to device memory copy interfaces
		typedef int32_t(is_data_exchangable_ty)(int32_t, int32_t);
		typedef int32_t(data_exchange_ty)(int32_t, void *, int32_t, void *, int64_t);
		typedef int32_t(data_exchange_async_ty)(int32_t, void *, int32_t, void *,
		int64_t, __tgt_async_info *);

		// The following interfaces are all about asynchronous operations
		typedef int32_t(data_submit_async_ty)(int32_t, void *, void *, int64_t,
		__tgt_async_info *);
		typedef int32_t(data_retrieve_async_ty)(int32_t, void *, void *, int64_t,
		__tgt_async_info *);
		typedef int32_t(run_region_async_ty)(int32_t, void *, void **, ptrdiff_t *,
		int32_t, __tgt_async_info *);
typedef int32_t(run_team_region_async_ty)(int32_t, void *, void **,		typedef int32_t(run_team_region_async_ty)(int32_t, void *, void **,
ptrdiff_t *, int32_t, int32_t,		ptrdiff_t *, int32_t, int32_t,
int32_t, uint64_t,		int32_t, uint64_t,
__tgt_async_info *);		__tgt_async_info *);
typedef int64_t(init_requires_ty)(int64_t);		typedef int32_t(release_async_info_ty)(int32_t, __tgt_async_info *);
typedef int64_t(synchronize_ty)(int64_t, __tgt_async_info *);		typedef int32_t(wait_event_ty)(int32_t, __tgt_async_info *, __tgt_async_info *);
		typedef int32_t(record_event_ty)(int32_t, __tgt_async_info *);
		typedef int32_t(synchronize_ty)(int32_t, __tgt_async_info *);
		typedef int32_t(check_event_ty)(int32_t, __tgt_async_info *);

int32_t Idx = -1; // RTL index, index is the number of devices		int32_t Idx = -1; // RTL index, index is the number of devices
// of other RTLs that were registered before,		// of other RTLs that were registered before,
// i.e. the OpenMP index of the first device		// i.e. the OpenMP index of the first device
// to be registered with this RTL.		// to be registered with this RTL.
int32_t NumberOfDevices = -1; // Number of devices this RTL deals with.		int32_t NumberOfDevices = -1; // Number of devices this RTL deals with.

void *LibraryHandler = nullptr;		void *LibraryHandler = nullptr;
Show All 16 Lines	#endif
data_exchange_ty *data_exchange = nullptr;		data_exchange_ty *data_exchange = nullptr;
data_exchange_async_ty *data_exchange_async = nullptr;		data_exchange_async_ty *data_exchange_async = nullptr;
data_delete_ty *data_delete = nullptr;		data_delete_ty *data_delete = nullptr;
run_region_ty *run_region = nullptr;		run_region_ty *run_region = nullptr;
run_region_async_ty *run_region_async = nullptr;		run_region_async_ty *run_region_async = nullptr;
run_team_region_ty *run_team_region = nullptr;		run_team_region_ty *run_team_region = nullptr;
run_team_region_async_ty *run_team_region_async = nullptr;		run_team_region_async_ty *run_team_region_async = nullptr;
init_requires_ty *init_requires = nullptr;		init_requires_ty *init_requires = nullptr;
		release_async_info_ty *release_async_info = nullptr;
		wait_event_ty *wait_event = nullptr;
		record_event_ty *record_event = nullptr;
synchronize_ty *synchronize = nullptr;		synchronize_ty *synchronize = nullptr;
		check_event_ty *check_event = nullptr;

// Are there images associated with this RTL.		// Are there images associated with this RTL.
bool isUsed = false;		bool isUsed = false;

// Mutex for thread-safety when calling RTL interface functions.		// Mutex for thread-safety when calling RTL interface functions.
// It is easier to enforce thread-safety at the libomptarget level,		// It is easier to enforce thread-safety at the libomptarget level,
// so that developers of new RTLs do not have to worry about it.		// so that developers of new RTLs do not have to worry about it.
std::mutex Mtx;		std::mutex Mtx;

		// Whether it supports asynchronous operation
		bool AsyncSupported = true;
		jdoerfert: Typo in comment

// The existence of the mutex above makes RTLInfoTy non-copyable.		// The existence of the mutex above makes RTLInfoTy non-copyable.
// We need to provide a copy constructor explicitly.		// We need to provide a copy constructor explicitly.
RTLInfoTy() = default;		RTLInfoTy() = default;

RTLInfoTy(const RTLInfoTy &r) {		RTLInfoTy(const RTLInfoTy &r) {
Idx = r.Idx;		Idx = r.Idx;
NumberOfDevices = r.NumberOfDevices;		NumberOfDevices = r.NumberOfDevices;
LibraryHandler = r.LibraryHandler;		LibraryHandler = r.LibraryHandler;
Show All 14 Lines	#endif
data_exchange_async = r.data_exchange_async;		data_exchange_async = r.data_exchange_async;
data_delete = r.data_delete;		data_delete = r.data_delete;
run_region = r.run_region;		run_region = r.run_region;
run_region_async = r.run_region_async;		run_region_async = r.run_region_async;
run_team_region = r.run_team_region;		run_team_region = r.run_team_region;
run_team_region_async = r.run_team_region_async;		run_team_region_async = r.run_team_region_async;
init_requires = r.init_requires;		init_requires = r.init_requires;
isUsed = r.isUsed;		isUsed = r.isUsed;
		AsyncSupported = r.AsyncSupported;
		release_async_info = r.release_async_info;
		wait_event = r.wait_event;
		record_event = r.record_event;
synchronize = r.synchronize;		synchronize = r.synchronize;
		check_event = r.check_event;
}		}
};		};

/// RTLs identified in the system.		/// RTLs identified in the system.
class RTLsTy {		class RTLsTy {
private:		private:
// Mutex-like object to guarantee thread-safety and unique initialization		// Mutex-like object to guarantee thread-safety and unique initialization
// (i.e. the library attempts to load the RTLs (plugins) only once).		// (i.e. the library attempts to load the RTLs (plugins) only once).
▲ Show 20 Lines • Show All 59 Lines • Show Last 20 Lines
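
The copy constructor in `RTLInfoTy` above has to list every member by hand because the `std::mutex` member deletes the implicit one, which is why the patch adds one assignment per new entry point. A minimal sketch of that pattern follows; the type `Table` and its two fields are hypothetical stand-ins, not the real `RTLInfoTy`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, reduced stand-in for RTLInfoTy: two plain fields and a mutex.
struct Table {
  int Idx = -1;
  bool AsyncSupported = true;
  std::mutex Mtx; // a std::mutex member deletes the implicit copy constructor

  Table() = default;

  // Copy every field except the mutex; the copy gets a fresh, unlocked mutex.
  // Any member added later must be added here too, or it keeps its default.
  Table(const Table &R) : Idx(R.Idx), AsyncSupported(R.AsyncSupported) {}
};
```

A field that is forgotten in the copy constructor silently falls back to its default initializer, so in this layout a missed `AsyncSupported` would read as `true` in the copy.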

openmp/libomptarget/src/rtl.cpp

Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines	*((void **)&R.run_team_region_async) =
dlsym(dynlib_handle, "__tgt_rtl_run_target_team_region_async");		dlsym(dynlib_handle, "__tgt_rtl_run_target_team_region_async");
*((void **)&R.synchronize) = dlsym(dynlib_handle, "__tgt_rtl_synchronize");		*((void **)&R.synchronize) = dlsym(dynlib_handle, "__tgt_rtl_synchronize");
*((void **)&R.data_exchange) =		*((void **)&R.data_exchange) =
dlsym(dynlib_handle, "__tgt_rtl_data_exchange");		dlsym(dynlib_handle, "__tgt_rtl_data_exchange");
*((void **)&R.data_exchange_async) =		*((void **)&R.data_exchange_async) =
dlsym(dynlib_handle, "__tgt_rtl_data_exchange_async");		dlsym(dynlib_handle, "__tgt_rtl_data_exchange_async");
*((void **)&R.is_data_exchangable) =		*((void **)&R.is_data_exchangable) =
dlsym(dynlib_handle, "__tgt_rtl_is_data_exchangable");		dlsym(dynlib_handle, "__tgt_rtl_is_data_exchangable");
		*((void **)&R.release_async_info) =
		dlsym(dynlib_handle, "__tgt_rtl_release_async_info");
		*((void **)&R.wait_event) = dlsym(dynlib_handle, "__tgt_rtl_wait_event");
		*((void **)&R.record_event) =
		dlsym(dynlib_handle, "__tgt_rtl_record_event");
		*((void **)&R.check_event) = dlsym(dynlib_handle, "__tgt_rtl_check_event");

		if (!R.synchronize || !R.check_event) {
		DP("Asynchronous offloading not supported\n");
		R.AsyncSupported = false;
		R.data_exchange_async = nullptr;
		R.data_retrieve_async = nullptr;
		R.data_submit_async = nullptr;

		R.run_region_async = nullptr;
		R.run_team_region_async = nullptr;

		R.release_async_info = nullptr;
		R.record_event = nullptr;
		R.wait_event = nullptr;
		R.check_event = nullptr;
		}

// No devices are supported by this RTL?		// No devices are supported by this RTL?
if (!(R.NumberOfDevices = R.number_of_devices())) {		if (!(R.NumberOfDevices = R.number_of_devices())) {
DP("No devices supported in this RTL\n");		DP("No devices supported in this RTL\n");
continue;		continue;
}		}

DP("Registering RTL %s supporting %d devices!\n", R.RTLName.c_str(),		DP("Registering RTL %s supporting %d devices!\n", R.RTLName.c_str(),
▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	for (auto *R : UsedRTLs) {

// Execute dtors for static objects if the device has been used, i.e.		// Execute dtors for static objects if the device has been used, i.e.
// if its PendingCtors list has been emptied.		// if its PendingCtors list has been emptied.
for (int32_t i = 0; i < FoundRTL->NumberOfDevices; ++i) {		for (int32_t i = 0; i < FoundRTL->NumberOfDevices; ++i) {
DeviceTy &Device = Devices[FoundRTL->Idx + i];		DeviceTy &Device = Devices[FoundRTL->Idx + i];
Device.PendingGlobalsMtx.lock();		Device.PendingGlobalsMtx.lock();
if (Device.PendingCtorsDtors[desc].PendingCtors.empty()) {		if (Device.PendingCtorsDtors[desc].PendingCtors.empty()) {
for (auto &dtor : Device.PendingCtorsDtors[desc].PendingDtors) {		for (auto &dtor : Device.PendingCtorsDtors[desc].PendingDtors) {
int rc = target(Device.DeviceID, dtor, 0, NULL, NULL, NULL, NULL, 1,		int rc = target(Device.DeviceID, dtor, 0 /* arg_num */,
1, true /*team*/);		NULL /* arg_base */, NULL /* args */,
		NULL /* arg_sizes */, NULL /* arg_types */,
		1 /* team_num */, 1 /* thread_limit */,
		true /* IsTeamConstruct */, nullptr);
		jdoerfert: Style: Everywhere I have seen this we do `/* name */ value`. I know this was different here but I'd like us to align with LLVM & Clang on this one. Feel free to commit the comments for all but the new argument as NFC without further review.
if (rc != OFFLOAD_SUCCESS) {		if (rc != OFFLOAD_SUCCESS) {
DP("Running destructor " DPxMOD " failed.\n", DPxPTR(dtor));		DP("Running destructor " DPxMOD " failed.\n", DPxPTR(dtor));
}		}
}		}
// Remove this library's entry from PendingCtorsDtors		// Remove this library's entry from PendingCtorsDtors
Device.PendingCtorsDtors.erase(desc);		Device.PendingCtorsDtors.erase(desc);
}		}
Device.PendingGlobalsMtx.unlock();		Device.PendingGlobalsMtx.unlock();
▲ Show 20 Lines • Show All 42 Lines • Show Last 20 Lines
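
The plugin-loading hunk above treats the asynchronous entry points as optional: if `synchronize` or `check_event` could not be resolved, the RTL is marked as not supporting asynchronous offloading and the remaining async pointers are cleared. The sketch below models only that gating step. It assumes a reduced, hypothetical `PluginTable` with a single function-pointer type, and it replaces the `dlsym` lookups with plain assignments so that it stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

struct AsyncInfo; // opaque; hypothetical stand-in for __tgt_async_info

// Hypothetical, reduced table of optional plugin entry points.
struct PluginTable {
  using OpTy = int32_t(int32_t, AsyncInfo *);
  OpTy *synchronize = nullptr;
  OpTy *check_event = nullptr;
  OpTy *record_event = nullptr;
  OpTy *run_region_async = nullptr;
  bool AsyncSupported = true;
};

// If either mandatory async symbol is missing, disable async support and
// clear the other async entry points so that callers cannot use half of an
// interface. Like the hunk above, this leaves `synchronize` untouched.
void finalizeAsyncSupport(PluginTable &R) {
  if (R.synchronize && R.check_event)
    return;
  R.AsyncSupported = false;
  R.run_region_async = nullptr;
  R.record_event = nullptr;
  R.check_event = nullptr;
}

// Placeholder entry point used to populate a table in examples.
int32_t dummyOp(int32_t, AsyncInfo *) { return 0; }
```

Clearing the pointers, rather than only setting the flag, means a caller that tests an individual entry point for null reaches the same conclusion as one that tests `AsyncSupported`.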

openmp/runtime/src/kmp.h

Show First 20 Lines • Show All 2,150 Lines • ▼ Show 20 Lines	typedef struct kmp_base_depnode {
kmp_lock_t mtx_locks[MAX_MTX_DEPS]; /* lock mutexinoutset dependent tasks */		kmp_lock_t mtx_locks[MAX_MTX_DEPS]; /* lock mutexinoutset dependent tasks */
kmp_int32 mtx_num_locks; /* number of locks in mtx_locks array */		kmp_int32 mtx_num_locks; /* number of locks in mtx_locks array */
kmp_lock_t lock; /* guards shared fields: task, successors */		kmp_lock_t lock; /* guards shared fields: task, successors */
#if KMP_SUPPORT_GRAPH_OUTPUT		#if KMP_SUPPORT_GRAPH_OUTPUT
kmp_uint32 id;		kmp_uint32 id;
#endif		#endif
std::atomic<kmp_int32> npredecessors;		std::atomic<kmp_int32> npredecessors;
std::atomic<kmp_int32> nrefs;		std::atomic<kmp_int32> nrefs;
		// All its depending target tasks
		kmp_depnode_list_t *predecessors;
		// A pointer to __tgt_async_info
		std::atomic<uintptr_t> async_info;
} kmp_base_depnode_t;		} kmp_base_depnode_t;

union KMP_ALIGN_CACHE kmp_depnode {		union KMP_ALIGN_CACHE kmp_depnode {
double dn_align; /* use worst case alignment */		double dn_align; /* use worst case alignment */
char dn_pad[KMP_PAD(kmp_base_depnode_t, CACHE_LINE)];		char dn_pad[KMP_PAD(kmp_base_depnode_t, CACHE_LINE)];
kmp_base_depnode_t dn;		kmp_base_depnode_t dn;
};		};

▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	unsigned merged_if0 : 1; /* no __kmpc_task_{begin/complete}_if0 calls in if0
code path */		code path */
unsigned destructors_thunk : 1; /* set if the compiler creates a thunk to		unsigned destructors_thunk : 1; /* set if the compiler creates a thunk to
invoke destructors from the runtime */		invoke destructors from the runtime */
unsigned proxy : 1; /* task is a proxy task (it will be executed outside the		unsigned proxy : 1; /* task is a proxy task (it will be executed outside the
context of the RTL) */		context of the RTL) */
unsigned priority_specified : 1; /* set if the compiler provides priority		unsigned priority_specified : 1; /* set if the compiler provides priority
setting for the task */		setting for the task */
unsigned detachable : 1; /* 1 == can detach */		unsigned detachable : 1; /* 1 == can detach */
unsigned reserved : 9; /* reserved for compiler use */		unsigned target : 1; /* 1 == target task */
		unsigned reserved : 8; /* reserved for compiler use */

/* Library flags */ /* Total library flags must be 16 bits */		/* Library flags */ /* Total library flags must be 16 bits */
unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */		unsigned tasktype : 1; /* task is either explicit(1) or implicit (0) */
unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)		unsigned task_serial : 1; // task is executed immediately (1) or deferred (0)
unsigned tasking_ser : 1; // all tasks in team are either executed immediately		unsigned tasking_ser : 1; // all tasks in team are either executed immediately
// (1) or may be deferred (0)		// (1) or may be deferred (0)
unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel		unsigned team_serial : 1; // entire team is serial (1) [1 thread] or parallel
// (0) [>= 2 threads]		// (0) [>= 2 threads]
▲ Show 20 Lines • Show All 1,654 Lines • ▼ Show 20 Lines
static inline void __kmp_resume_if_hard_paused() {		static inline void __kmp_resume_if_hard_paused() {
if (__kmp_pause_status == kmp_hard_paused) {		if (__kmp_pause_status == kmp_hard_paused) {
__kmp_pause_status = kmp_not_paused;		__kmp_pause_status = kmp_not_paused;
}		}
}		}

extern void __kmp_omp_display_env(int verbose);		extern void __kmp_omp_display_env(int verbose);

		// For interaction with libomptarget
		extern void __kmpc_set_async_info(void *async_info);
		extern void __kmpc_get_target_task_waiting_list(void *list, int num);
		extern void __kmpc_target_task_yield();
		extern int __kmpc_get_target_task_npredecessors();

#ifdef __cplusplus		#ifdef __cplusplus
}		}
#endif		#endif

#endif /* KMP_H */		#endif /* KMP_H */

openmp/runtime/src/kmp_taskdeps.h

/*		/*
* kmp_taskdeps.h		* kmp_taskdeps.h
*/		*/


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//


#ifndef KMP_TASKDEPS_H		#ifndef KMP_TASKDEPS_H
#define KMP_TASKDEPS_H		#define KMP_TASKDEPS_H

#include "kmp.h"		#include "kmp.h"

#define KMP_ACQUIRE_DEPNODE(gtid, n) __kmp_acquire_lock(&(n)->dn.lock, (gtid))		#define KMP_ACQUIRE_DEPNODE(gtid, n) __kmp_acquire_lock(&(n)->dn.lock, (gtid))
#define KMP_RELEASE_DEPNODE(gtid, n) __kmp_release_lock(&(n)->dn.lock, (gtid))		#define KMP_RELEASE_DEPNODE(gtid, n) __kmp_release_lock(&(n)->dn.lock, (gtid))

		extern "C" {
		void __kmpc_free_async_info(void *);
		}

static inline void __kmp_node_deref(kmp_info_t *thread, kmp_depnode_t *node) {		static inline void __kmp_node_deref(kmp_info_t *thread, kmp_depnode_t *node) {
if (!node)		if (!node)
return;		return;

kmp_int32 n = KMP_ATOMIC_DEC(&node->dn.nrefs) - 1;		kmp_int32 n = KMP_ATOMIC_DEC(&node->dn.nrefs) - 1;
if (n == 0) {		if (n == 0) {
KMP_ASSERT(node->dn.nrefs == 0);		KMP_ASSERT(node->dn.nrefs == 0);
		// Free async info
		if (node->dn.async_info)
		__kmpc_free_async_info(
		reinterpret_cast<void *>(KMP_ATOMIC_LD_ACQ(&node->dn.async_info)));
		// Free the predecessor list
		kmp_depnode_list_t *next;
		for (kmp_depnode_list_t *p = node->dn.predecessors; p; p = next) {
		__kmp_node_deref(thread, p->node);
		next = p->next;
		#if USE_FAST_MEMORY
		__kmp_fast_free(thread, p);
		#else
		__kmp_thread_free(thread, p);
		#endif
		}
#if USE_FAST_MEMORY		#if USE_FAST_MEMORY
__kmp_fast_free(thread, node);		__kmp_fast_free(thread, node);
#else		#else
__kmp_thread_free(thread, node);		__kmp_thread_free(thread, node);
#endif		#endif
}		}
}		}

▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
KMP_ACQUIRE_DEPNODE(gtid, node);		KMP_ACQUIRE_DEPNODE(gtid, node);
node->dn.task =		node->dn.task =
NULL; // mark this task as finished, so no new dependencies are generated		NULL; // mark this task as finished, so no new dependencies are generated
KMP_RELEASE_DEPNODE(gtid, node);		KMP_RELEASE_DEPNODE(gtid, node);

kmp_depnode_list_t *next;		kmp_depnode_list_t *next;
for (kmp_depnode_list_t *p = node->dn.successors; p; p = next) {		for (kmp_depnode_list_t *p = node->dn.successors; p; p = next) {
kmp_depnode_t *successor = p->node;		kmp_depnode_t *successor = p->node;
		kmp_taskdata_t *successor_task_data = KMP_TASK_TO_TASKDATA(successor);
kmp_int32 npredecessors = KMP_ATOMIC_DEC(&successor->dn.npredecessors) - 1;		kmp_int32 npredecessors = KMP_ATOMIC_DEC(&successor->dn.npredecessors) - 1;

// successor task can be NULL for wait_depends or because deps are still		// successor task can be NULL for wait_depends or because deps are still
// being processed		// being processed
if (npredecessors == 0) {		if (npredecessors == 0) {
KMP_MB();		KMP_MB();
if (successor->dn.task) {		// All target tasks have been enqueued before
		if (successor->dn.task && !successor_task_data->td_flags.target) {
KA_TRACE(20, ("__kmp_release_deps: T#%d successor %p of %p scheduled "		KA_TRACE(20, ("__kmp_release_deps: T#%d successor %p of %p scheduled "
"for execution.\n",		"for execution.\n",
gtid, successor->dn.task, task));		gtid, successor->dn.task, task));
__kmp_omp_task(gtid, successor->dn.task, false);		__kmp_omp_task(gtid, successor->dn.task, false);
}		}
}		}

next = p->next;		next = p->next;
Show All 17 Lines
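
The change to `__kmp_node_deref` above makes the last reference to a dependency node also release its async info and drop one reference on each node in its `predecessors` list. The sketch below illustrates that ownership rule with a reduced, hypothetical `DepNode`. It assumes that every entry in `predecessors` holds one counted reference, it uses `new`/`delete` in place of the runtime allocators, and the two counters exist only to make the behaviour observable.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical, reduced stand-in for kmp_depnode_t.
struct DepNode {
  std::atomic<int> NRefs{1};           // creation holds the first reference
  std::vector<DepNode *> Predecessors; // each entry holds one reference
  void *AsyncInfo = nullptr;           // opaque handle owned by the node
};

int FreedNodes = 0; // illustration only: number of nodes destroyed
int FreedAsync = 0; // illustration only: number of async handles released

// Stand-in for __kmpc_free_async_info; here it only counts the call.
void freeAsyncInfo(void *) { ++FreedAsync; }

// Drop one reference. The caller that drops the last one releases the async
// handle, drops the references held on all predecessors, then frees the node.
void nodeDeref(DepNode *Node) {
  if (!Node)
    return;
  if (Node->NRefs.fetch_sub(1) - 1 != 0)
    return;
  if (Node->AsyncInfo)
    freeAsyncInfo(Node->AsyncInfo);
  for (DepNode *P : Node->Predecessors)
    nodeDeref(P);
  ++FreedNodes;
  delete Node;
}
```

Under this rule a predecessor stays alive for as long as any dependent target task may still need to read its async info, even after the predecessor's own task has finished.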

openmp/runtime/src/kmp_taskdeps.cpp

Show All 39 Lines	static void __kmp_init_node(kmp_depnode_t *node) {
for (int i = 0; i < MAX_MTX_DEPS; ++i)		for (int i = 0; i < MAX_MTX_DEPS; ++i)
node->dn.mtx_locks[i] = NULL;		node->dn.mtx_locks[i] = NULL;
node->dn.mtx_num_locks = 0;		node->dn.mtx_num_locks = 0;
__kmp_init_lock(&node->dn.lock);		__kmp_init_lock(&node->dn.lock);
KMP_ATOMIC_ST_RLX(&node->dn.nrefs, 1); // init creates the first reference		KMP_ATOMIC_ST_RLX(&node->dn.nrefs, 1); // init creates the first reference
#ifdef KMP_SUPPORT_GRAPH_OUTPUT		#ifdef KMP_SUPPORT_GRAPH_OUTPUT
node->dn.id = KMP_ATOMIC_INC(&kmp_node_id_seed);		node->dn.id = KMP_ATOMIC_INC(&kmp_node_id_seed);
#endif		#endif
		node->dn.predecessors = NULL;
		KMP_ATOMIC_ST_REL(&node->dn.async_info, 0);
}		}

static inline kmp_depnode_t *__kmp_node_ref(kmp_depnode_t *node) {		static inline kmp_depnode_t *__kmp_node_ref(kmp_depnode_t *node) {
KMP_ATOMIC_INC(&node->dn.nrefs);		KMP_ATOMIC_INC(&node->dn.nrefs);
return node;		return node;
}		}

enum { KMP_DEPHASH_OTHER_SIZE = 97, KMP_DEPHASH_MASTER_SIZE = 997 };		enum { KMP_DEPHASH_OTHER_SIZE = 97, KMP_DEPHASH_MASTER_SIZE = 997 };
▲ Show 20 Lines • Show All 179 Lines • ▼ Show 20 Lines

static inline kmp_int32		static inline kmp_int32
__kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,		__kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,
kmp_task_t *task, kmp_depnode_t *node,		kmp_task_t *task, kmp_depnode_t *node,
kmp_depnode_list_t *plist) {		kmp_depnode_list_t *plist) {
if (!plist)		if (!plist)
return 0;		return 0;
kmp_int32 npredecessors = 0;		kmp_int32 npredecessors = 0;
// link node as successor of list elements		kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(node->dn.task);
		// If a task is a target task, we will check all its depending tasks. If a
		// depending task is a target task, we put it into the predecessors list of
		// current task. If a depending task is a host task, we will put current task
		// into the successors list of the depending task and increase the count.
		// Later on, when we process all dependencies of current task, if the count is
		// not zero, meaning current target task depends on at least one host task, we
		// will still push the task into the queue and let the RTL dispatch it.
		// However, before starting the offloading, it will check whether the count is
		// zero and will not proceed if not.
for (kmp_depnode_list_t *p = plist; p; p = p->next) {		for (kmp_depnode_list_t *p = plist; p; p = p->next) {
kmp_depnode_t *dep = p->node;		kmp_depnode_t *dep = p->node;
if (dep->dn.task) {		if (dep->dn.task) {
KMP_ACQUIRE_DEPNODE(gtid, dep);		KMP_ACQUIRE_DEPNODE(gtid, dep);
if (dep->dn.task) {		if (dep->dn.task) {
__kmp_track_dependence(dep, node, task);		__kmp_track_dependence(dep, node, task);
		kmp_taskdata_t *dep_taskdata = KMP_TASK_TO_TASKDATA(dep->dn.task);
		if (taskdata->td_flags.target && dep_taskdata->td_flags.target) {
		node->dn.predecessors =
		__kmp_add_node(thread, node->dn.predecessors, dep);
		KA_TRACE(40, ("__kmp_process_deps: T#%d target task %p depends on "
		"target task %p\n",
		gtid, taskdata, dep_taskdata));
		} else {
dep->dn.successors = __kmp_add_node(thread, dep->dn.successors, node);		dep->dn.successors = __kmp_add_node(thread, dep->dn.successors, node);
		++npredecessors;
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "		KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",		"%p\n",
gtid, KMP_TASK_TO_TASKDATA(dep->dn.task),		gtid, KMP_TASK_TO_TASKDATA(dep->dn.task),
KMP_TASK_TO_TASKDATA(task)));		KMP_TASK_TO_TASKDATA(task)));
npredecessors++;		}
}		}
KMP_RELEASE_DEPNODE(gtid, dep);		KMP_RELEASE_DEPNODE(gtid, dep);
}		}
}		}
return npredecessors;		return npredecessors;
}		}
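
The classification rule implemented by the new column above can be illustrated in isolation. The following is only a minimal sketch under simplifying assumptions: `SketchTask` and `link_dependencies` are hypothetical names that do not exist in the runtime, and the locking (`KMP_ACQUIRE_DEPNODE`), tracing, and the check for already finished predecessors are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for kmp_taskdata_t plus its depnode.
struct SketchTask {
  bool is_target = false;                 // mirrors td_flags.target
  std::vector<SketchTask *> predecessors; // target tasks to wait for on the device side
  std::vector<SketchTask *> successors;   // tasks the host runtime releases on completion
};

// Links `task` to each task in `deps` and returns how many of those
// dependencies still have to be resolved by the host runtime.
inline int link_dependencies(SketchTask &task,
                             const std::vector<SketchTask *> &deps) {
  int npredecessors = 0;
  for (SketchTask *dep : deps) {
    if (task.is_target && dep->is_target) {
      // Target task depending on a target task: record it for the device side.
      task.predecessors.push_back(dep);
    } else {
      // Any edge involving a host task keeps the existing host mechanism.
      dep->successors.push_back(&task);
      ++npredecessors;
    }
  }
  return npredecessors;
}
```

In this sketch a target task that depends on one host task and one target task gets a return value of 1, and only the target dependency ends up in its `predecessors` list.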

static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,		static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
kmp_info_t *thread,		kmp_info_t *thread,
kmp_task_t *task,		kmp_task_t *task,
kmp_depnode_t *source,		kmp_depnode_t *source,
kmp_depnode_t *sink) {		kmp_depnode_t *sink) {
if (!sink)		if (!sink)
return 0;		return 0;
		kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
kmp_int32 npredecessors = 0;		kmp_int32 npredecessors = 0;
if (sink->dn.task) {		if (sink->dn.task) {
// synchronously add source to sink's list of successors		// synchronously add source to sink's list of successors
KMP_ACQUIRE_DEPNODE(gtid, sink);		KMP_ACQUIRE_DEPNODE(gtid, sink);
if (sink->dn.task) {		if (sink->dn.task) {
__kmp_track_dependence(sink, source, task);		__kmp_track_dependence(sink, source, task);
sink->dn.successors = __kmp_add_node(thread, sink->dn.successors, source);		kmp_taskdata_t *sink_taskdata = KMP_TASK_TO_TASKDATA(sink->dn.task);
		if (taskdata->td_flags.target && sink_taskdata->td_flags.target) {
		source->dn.predecessors =
		__kmp_add_node(thread, source->dn.predecessors, sink);
		KA_TRACE(40, ("__kmp_process_deps: T#%d target task %p depends on "
		"target task %p\n",
		gtid, taskdata, sink_taskdata));
		} else {
		sink->dn.successors =
		__kmp_add_node(thread, sink->dn.successors, source);
		npredecessors++;
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "		KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",		"%p\n",
gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),		gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),
KMP_TASK_TO_TASKDATA(task)));		KMP_TASK_TO_TASKDATA(task)));
npredecessors++;		}
}		}
KMP_RELEASE_DEPNODE(gtid, sink);		KMP_RELEASE_DEPNODE(gtid, sink);
}		}
return npredecessors;		return npredecessors;
}		}

template <bool filter>		template <bool filter>
static inline kmp_int32		static inline kmp_int32
[324 lines not shown]	if (__kmp_check_deps(gtid, node, new_task, &current_task->td_dephash,
"dependencies: "		"dependencies: "
"loc=%p task=%p, return: TASK_CURRENT_NOT_QUEUED\n",		"loc=%p task=%p, return: TASK_CURRENT_NOT_QUEUED\n",
gtid, loc_ref, new_taskdata));		gtid, loc_ref, new_taskdata));
#if OMPT_SUPPORT		#if OMPT_SUPPORT
if (ompt_enabled.enabled) {		if (ompt_enabled.enabled) {
current_task->ompt_task_info.frame.enter_frame = ompt_data_none;		current_task->ompt_task_info.frame.enter_frame = ompt_data_none;
}		}
#endif		#endif
		// All target tasks are enqueued directly, whether or not their
		// dependencies have been fulfilled. They are checked again in
		// libomptarget.
		if (!new_taskdata->td_flags.target)
return TASK_CURRENT_NOT_QUEUED;		return TASK_CURRENT_NOT_QUEUED;
}		}
} else {		} else {
KA_TRACE(10, ("__kmpc_omp_task_with_deps(exit): T#%d ignored dependencies "		KA_TRACE(10, ("__kmpc_omp_task_with_deps(exit): T#%d ignored dependencies "
"for task (serialized)"		"for task (serialized)"
"loc=%p task=%p\n",		"loc=%p task=%p\n",
gtid, loc_ref, new_taskdata));		gtid, loc_ref, new_taskdata));
}		}

[80 lines not shown]

openmp/runtime/src/kmp_tasking.cpp

[926 lines not shown]

#if OMPT_SUPPORT		#if OMPT_SUPPORT
// This is not a detached task, we are done here		// This is not a detached task, we are done here
if (ompt)		if (ompt)
__ompt_task_finish(task, resumed_task, ompt_task_complete);		__ompt_task_finish(task, resumed_task, ompt_task_complete);
#endif		#endif

// Only need to keep track of count if team parallel and tasking not		// Only need to keep track of count if team parallel and tasking not
// serialized, or task is detachable and event has already been fulfilled		// serialized, or task is detachable and event has already been fulfilled
if (!(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser) ||		if (!(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser) ||
taskdata->td_flags.detachable == TASK_DETACHABLE) {		taskdata->td_flags.detachable == TASK_DETACHABLE) {
// Predecrement simulated by "- 1" calculation		// Predecrement simulated by "- 1" calculation
children =		children =
KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks) - 1;		KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks) - 1;
KMP_DEBUG_ASSERT(children >= 0);		KMP_DEBUG_ASSERT(children >= 0);
if (taskdata->td_taskgroup)		if (taskdata->td_taskgroup)
KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);		KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
[363 lines not shown]	if (flags->proxy == TASK_FULL)
copy_icvs(&taskdata->td_icvs, &taskdata->td_parent->td_icvs);		copy_icvs(&taskdata->td_icvs, &taskdata->td_parent->td_icvs);

taskdata->td_flags.tiedness = flags->tiedness;		taskdata->td_flags.tiedness = flags->tiedness;
taskdata->td_flags.final = flags->final;		taskdata->td_flags.final = flags->final;
taskdata->td_flags.merged_if0 = flags->merged_if0;		taskdata->td_flags.merged_if0 = flags->merged_if0;
taskdata->td_flags.destructors_thunk = flags->destructors_thunk;		taskdata->td_flags.destructors_thunk = flags->destructors_thunk;
taskdata->td_flags.proxy = flags->proxy;		taskdata->td_flags.proxy = flags->proxy;
taskdata->td_flags.detachable = flags->detachable;		taskdata->td_flags.detachable = flags->detachable;
		taskdata->td_flags.target = flags->target;
taskdata->td_task_team = thread->th.th_task_team;		taskdata->td_task_team = thread->th.th_task_team;
taskdata->td_size_alloc = shareds_offset + sizeof_shareds;		taskdata->td_size_alloc = shareds_offset + sizeof_shareds;
taskdata->td_flags.tasktype = TASK_EXPLICIT;		taskdata->td_flags.tasktype = TASK_EXPLICIT;

// GEH - TODO: fix this to copy parent task's value of tasking_ser flag		// GEH - TODO: fix this to copy parent task's value of tasking_ser flag
taskdata->td_flags.tasking_ser = (__kmp_tasking_mode == tskm_immediate_exec);		taskdata->td_flags.tasking_ser = (__kmp_tasking_mode == tskm_immediate_exec);

// GEH - TODO: fix this to copy parent task's value of team_serial flag		// GEH - TODO: fix this to copy parent task's value of team_serial flag
[79 lines not shown]
}		}

kmp_task_t *__kmpc_omp_target_task_alloc(ident_t *loc_ref, kmp_int32 gtid,		kmp_task_t *__kmpc_omp_target_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 flags,		kmp_int32 flags,
size_t sizeof_kmp_task_t,		size_t sizeof_kmp_task_t,
size_t sizeof_shareds,		size_t sizeof_shareds,
kmp_routine_entry_t task_entry,		kmp_routine_entry_t task_entry,
kmp_int64 device_id) {		kmp_int64 device_id) {
		// All tasks created via this interface are target tasks
		kmp_tasking_flags_t *input_flags = (kmp_tasking_flags_t *)&flags;
		input_flags->target = TRUE;
return __kmpc_omp_task_alloc(loc_ref, gtid, flags, sizeof_kmp_task_t,		return __kmpc_omp_task_alloc(loc_ref, gtid, flags, sizeof_kmp_task_t,
sizeof_shareds, task_entry);		sizeof_shareds, task_entry);
}		}
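
The new column above marks a task as a target task by viewing the 32-bit `flags` word as a bit-field struct and setting one bit. A minimal sketch of that idea follows. It assumes a hypothetical `SketchFlags` layout (the real `kmp_tasking_flags_t` has more fields and a different bit order) and uses `std::memcpy` for the overlay rather than the pointer cast used in the patch, purely to keep the sketch free of aliasing concerns.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical flag layout; field names and positions are assumptions.
struct SketchFlags {
  unsigned tiedness : 1;
  unsigned final_flag : 1;
  unsigned target : 1; // set for tasks created through the target-task entry point
  unsigned reserved : 29;
};
static_assert(sizeof(SketchFlags) == sizeof(std::int32_t),
              "the bit-field struct must overlay the 32-bit flags word");

// Returns a copy of `flags` with the target bit set, leaving other bits alone.
inline std::int32_t mark_as_target(std::int32_t flags) {
  SketchFlags f;
  std::memcpy(&f, &flags, sizeof f);
  f.target = 1;
  std::memcpy(&flags, &f, sizeof flags);
  return flags;
}

// Reads the target bit back out of a flags word.
inline bool is_target(std::int32_t flags) {
  SketchFlags f;
  std::memcpy(&f, &flags, sizeof f);
  return f.target != 0;
}
```

Setting the bit at allocation time means every later stage only has to test one flag, which is how the dependency code in kmp_taskdeps.cpp distinguishes target tasks from host tasks.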

/*!		/*!
@ingroup TASKING		@ingroup TASKING
@param loc_ref location of the original task directive		@param loc_ref location of the original task directive
@param gtid Global Thread ID of encountering thread		@param gtid Global Thread ID of encountering thread
[3,149 lines not shown]	#endif
if (nogroup == 0) {		if (nogroup == 0) {
#if OMPT_SUPPORT && OMPT_OPTIONAL		#if OMPT_SUPPORT && OMPT_OPTIONAL
OMPT_STORE_RETURN_ADDRESS(gtid);		OMPT_STORE_RETURN_ADDRESS(gtid);
#endif		#endif
__kmpc_end_taskgroup(loc, gtid);		__kmpc_end_taskgroup(loc, gtid);
}		}
KA_TRACE(20, ("__kmpc_taskloop(exit): T#%d\n", gtid));		KA_TRACE(20, ("__kmpc_taskloop(exit): T#%d\n", gtid));
}		}

		// Bind the async info to current task
		void __kmpc_set_async_info(void *async_info) {
		int gtid = __kmp_get_gtid();
		kmp_info_t *thread = __kmp_threads[gtid];
		kmp_depnode_t *dep = thread->th.th_current_task->td_depnode;
		KMP_DEBUG_ASSERT(dep);
		KMP_ATOMIC_ST_REL(&dep->dn.async_info,
		reinterpret_cast<uintptr_t>(async_info));
		}

		// Get the list of waiting async info. If list is NULL, only the number of
		// predecessors of the currently executing task is returned through num.
		// Otherwise, list is also filled with the async info of every predecessor.
		void __kmpc_get_target_task_waiting_list(void **list, int *num) {
		KMP_DEBUG_ASSERT(num != NULL);

		int gtid = __kmp_get_gtid();
		kmp_info_t *thread = __kmp_threads[gtid];
		kmp_taskdata_t *taskdata = thread->th.th_current_task;

		int n = 0;
		for (kmp_depnode_list_t *p = taskdata->td_depnode->dn.predecessors; p;
		p = p->next) {
		kmp_depnode_t *dep = p->node;
		if (dep->dn.task) {
		kmp_taskdata_t *pred_task = KMP_TASK_TO_TASKDATA(dep->dn.task);
		KMP_ASSERT(pred_task->td_flags.target);
		if (list) {
		while (KMP_ATOMIC_LD_ACQ(&dep->dn.async_info) == 0)
		__kmpc_omp_taskyield(nullptr, gtid, 0);
		list[n] =
		reinterpret_cast<void *>(KMP_ATOMIC_LD_ACQ(&dep->dn.async_info));
		}
		++n;
		}
		}

		*num = n;
		}
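
The two functions above form a small publish and query protocol: the predecessor stores its async info with release semantics, and the dependent task first asks for the count (with a null list) and then asks again with a buffer, spinning until each entry has been published. The following is a minimal sketch of that handshake using standard C++ atomics. `SketchNode`, `publish_async_info`, and `get_waiting_list` are hypothetical names, and `std::this_thread::yield` stands in for `__kmpc_omp_taskyield`; the check for predecessors whose task has already been released is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the async_info field added to kmp_depnode_t.
struct SketchNode {
  std::atomic<std::uintptr_t> async_info{0}; // 0 means "not published yet"
};

// Producer side: bind an opaque handle (for example an event) to the node.
inline void publish_async_info(SketchNode &node, void *info) {
  node.async_info.store(reinterpret_cast<std::uintptr_t>(info),
                        std::memory_order_release);
}

// Consumer side: with list == nullptr only the count is written to *num;
// otherwise list[i] receives the handle of the i-th predecessor.
inline void get_waiting_list(const std::vector<SketchNode *> &preds,
                             void **list, int *num) {
  assert(num != nullptr);
  int n = 0;
  for (SketchNode *pred : preds) {
    if (list) {
      std::uintptr_t value;
      // Wait until the predecessor has published its handle.
      while ((value = pred->async_info.load(std::memory_order_acquire)) == 0)
        std::this_thread::yield();
      list[n] = reinterpret_cast<void *>(value);
    }
    ++n;
  }
  *num = n;
}
```

A caller would typically invoke `get_waiting_list` once with a null list to size a buffer and a second time with that buffer, which is the same two-step pattern the comment on `__kmpc_get_target_task_waiting_list` describes.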

		void __kmpc_target_task_yield() {
		int gtid = __kmp_get_gtid();
		__kmpc_omp_taskyield(nullptr, gtid, 0);
		}

		int __kmpc_get_target_task_npredecessors() {
		int gtid = __kmp_get_gtid();
		kmp_info_t *thread = __kmp_threads[gtid];
		kmp_taskdata_t *taskdata = thread->th.th_current_task;
		kmp_depnode_t *dep = taskdata->td_depnode;

		if (dep == nullptr)
		return 0;

		int n;

		KMP_ACQUIRE_DEPNODE(gtid, dep);
		n = dep->dn.npredecessors;
		KMP_RELEASE_DEPNODE(gtid, dep);

		return n;
		}


Revision Contents

Diff 271715

openmp/libomptarget/include/omptarget.h

openmp/libomptarget/include/omptargetplugin.h

openmp/libomptarget/plugins/cuda/src/rtl.cpp

openmp/libomptarget/plugins/exports

openmp/libomptarget/src/device.cpp

openmp/libomptarget/src/exports

openmp/libomptarget/src/interface.cpp

openmp/libomptarget/src/omptarget.cpp

openmp/libomptarget/src/private.h

openmp/libomptarget/src/rtl.h

openmp/libomptarget/src/rtl.cpp

openmp/runtime/src/kmp.h

openmp/runtime/src/kmp_taskdeps.h

openmp/runtime/src/kmp_taskdeps.cpp

openmp/runtime/src/kmp_tasking.cpp
