This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Avoid deleting the same entry multiple times
Needs Review · Public

Authored by ye-luo on Aug 23 2022, 10:44 PM.

Details

Reviewers
jdoerfert
Summary

Once the refcount drops to 0, remove the entry from the HDTTMap but delay the deletion of the actual entry.
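
A minimal sketch of the idea, assuming simplified stand-ins for the real libomptarget types (EntryTy, releaseEntry, and DeferredDeletions are illustrative names, not the actual implementation):

```cpp
#include <cstdint>
#include <list>
#include <map>
#include <memory>
#include <mutex>

// Illustrative stand-in for the real HostDataToTargetTy entry.
struct EntryTy {
  const void *HstPtrBegin = nullptr;
  int64_t RefCount = 1;
  // ... target pointer, size, mapping flags, etc.
};

struct DeviceTy {
  std::mutex Mtx;
  // Host-to-device lookup table (HDTTMap in libomptarget).
  std::map<const void *, std::shared_ptr<EntryTy>> HDTTMap;
  // Entries already unlinked from the map whose copy back is still in
  // flight; they are destroyed only after the D2H transfer completes.
  std::list<std::shared_ptr<EntryTy>> DeferredDeletions;

  // When the refcount drops to 0, remove the entry from HDTTMap so that
  // no new lookup can find it, but delay the deletion of the entry.
  void releaseEntry(const void *HstPtrBegin) {
    std::lock_guard<std::mutex> Lock(Mtx);
    auto It = HDTTMap.find(HstPtrBegin);
    if (It == HDTTMap.end())
      return;
    if (--It->second->RefCount == 0) {
      DeferredDeletions.push_back(It->second); // keep alive for copy back
      HDTTMap.erase(It);                       // invisible to new lookups
    }
  }
};
```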

Diff Detail

Event Timeline

ye-luo created this revision. · Aug 23 2022, 10:44 PM
Herald added a project: Restricted Project. · Aug 23 2022, 10:44 PM
ye-luo requested review of this revision. · Aug 23 2022, 10:44 PM

jdoerfert added a comment.

I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

gValarini added a comment. · Edited · Aug 24 2022, 7:39 AM

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

I believe what @jdoerfert is trying to say is that the race will happen between a thread deleting the device data and another one making the data transfer. Imagine the following case:

  • Thread A: is not the "owner" (IsOwned == false) of the entry but will do a data transfer.
  • Thread B: is the "owner" (IsOwned == true) of the entry and will delete it.

If A is switched out of execution right after getTgtPtrBegin returns, B might remove the data before A starts any data transfer, leading to a segfault on the device side. Although this looks like a problem with the user program, I think it should lead to data corruption, not a segfault. I also believe this may happen in correct programs, but I cannot think of a simple example right now.
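
To make the interleaving concrete, here is one possible timeline (pseudo-code; deleteEntry, freeDevice, and submitData are hypothetical placeholders, not real libomptarget calls):

```cpp
// Hypothetical interleaving leading to a device-side use-after-free.
//
//   Thread A (IsOwned == false)      Thread B (IsOwned == true)
//   ---------------------------      --------------------------
//   TgtPtr = getTgtPtrBegin(...);
//   <preempted>
//                                    deleteEntry(Entry);
//                                    freeDevice(TgtPtr);
//   submitData(TgtPtr, ...);         // TgtPtr now dangles: corruption
//                                    // or a device-side fault
```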

I was planning to send a patch with this same fix this week. It was quite similar to this one, but it also had a new reference counter designed to track the threads using the entry in the data end function. This way, multiple threads may get the entry, but only the last one would delete the buffer and the entry data.
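
A minimal sketch of what such a tracking counter could look like, assuming atomic counters on the entry (the names are illustrative; the actual patch may structure this differently):

```cpp
#include <atomic>
#include <cstdint>

// Illustrative entry with a second counter for threads that are still
// inside the data end path for this entry.
struct EntryTy {
  std::atomic<int64_t> RefCount{1}; // OpenMP map reference count
  std::atomic<int> DataEndUsers{0}; // threads currently using the entry
};

// Each thread entering targetDataEnd for this entry registers itself.
void beginDataEnd(EntryTy &E) { E.DataEndUsers.fetch_add(1); }

// After its copy back has finished, each thread deregisters. Only the
// last thread out of a dead entry (refcount already 0) returns true and
// is then responsible for freeing the device buffer and the entry data.
bool endDataEnd(EntryTy &E) {
  bool LastUser = E.DataEndUsers.fetch_sub(1) == 1;
  return LastUser && E.RefCount.load() == 0;
}
```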

jdoerfert added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

I still believe there is a problem.

T1: copy back old device memory into HostMem after entry has been deleted
T2: copy from HostMem into some new device region as the entry is new
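
Spelled out as a timeline (pseudo-code; submitD2H and submitH2D are hypothetical placeholders):

```cpp
// Hypothetical interleaving on the same HostMem buffer.
//
//   T1 (refcount hit 0, unmapping)   T2 (fresh map of the same host ptr)
//   ------------------------------   -----------------------------------
//   erase entry from HDTTMap;
//   submitD2H(HostMem, OldTgtPtr);
//   <async copy back in flight>      lookup misses, create a new entry;
//                                    submitH2D(NewTgtPtr, HostMem);
//
// The two copies belong to different entries, so no per-entry event can
// order them: T2 may read HostMem before T1's copy back has landed.
```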

We cannot use events or anything else to prevent this race on the host memory, and since maps should be atomic, we break the program.

Where am I wrong?

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

> I believe what @jdoerfert is trying to say is that the race will happen between a thread deleting the device data and another one making the data transfer. Imagine the following case:
>
>   • Thread A: is not the "owner" (IsOwned == false) of the entry but will do a data transfer.
>   • Thread B: is the "owner" (IsOwned == true) of the entry and will delete it.
>
> If A is switched out of execution right after getTgtPtrBegin returns, B might remove the data before A starts any data transfer, leading to a segfault on the device side. Although this looks like a problem with the user program, I think it should lead to data corruption, not a segfault. I also believe this may happen in correct programs, but I cannot think of a simple example right now.
>
> I was planning to send a patch with this same fix this week. It was quite similar to this one, but it also had a new reference counter designed to track the threads using the entry in the data end function. This way, multiple threads may get the entry, but only the last one would delete the buffer and the entry data.

Your example is valid. I think we probably need to distinguish transfers not caused by the refcount dropping to 0 from transfers that are caused by the refcount dropping to 0.
A transfer not caused by the refcount dropping to 0 needs to be issued before actually adjusting the refcount. We probably need to write down a few rules before devising a scheme.
Another, more conservative route I thought about is to only schedule ahead transfers when refcount=INF, as sketched below.
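
A minimal sketch of that conservative rule, assuming an illustrative sentinel for the infinite refcount (not the actual libomptarget representation):

```cpp
#include <cstdint>
#include <limits>

// Illustrative sentinel for an "infinite" reference count.
constexpr int64_t RefCountInf = std::numeric_limits<int64_t>::max();

// A schedule-ahead transfer is only allowed when the entry can never be
// deleted: an infinite refcount never reaches 0, so the deletion race
// discussed in this thread cannot occur for such entries.
bool mayScheduleAheadTransfer(int64_t RefCount) {
  return RefCount == RefCountInf;
}
```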

Regarding tracking threads: my quick take is that it will be another can of worms. I need to think about it when your patch shows up.

ye-luo added a comment.

> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and the copy to, since the entry is gone while the copy back runs.

> I thought about the race. If two threads race to write the same data, there is no harm. If they race to write different data, the source of the race is the user program, which needs to protect against it anyway.

> I still believe there is a problem.
>
> T1: copy back old device memory into HostMem after the entry has been deleted
> T2: copy from HostMem into some new device region as the entry is new
>
> We cannot use events or anything else to prevent this race on the host memory, and since maps should be atomic, we break the program.
>
> Where am I wrong?

You are right. There is potential data corruption. The H2D transfer to a new device memory location may start before the D2H transfer completes.

gValarini added a comment.

> Regarding tracking threads: my quick take is that it will be another can of worms. I need to think about it when your patch shows up.

I have just submitted patch D132676 with the implementation that I described before. I believe that by tracking the threads in a scoped manner, we may reduce the work of maintaining the solution. What do you think?