This is an archive of the discontinued LLVM Phabricator instance.


Differential D84381

[OpenMP] Wait for kernel prior to memory deallocation
Abandoned · Public

Authored by tianshilei1992 on Jul 22 2020, 7:27 PM.


Details

Reviewers

jdoerfert
ye-luo

Commits

rG9b2832c0897c: [OpenMP] Wait for kernel prior to memory deallocation

Summary

In the function target, memory deallocation and target_data_end are called
immediately after returning from the kernel launch. This can cause a race
condition in which the kernel is still using the corresponding memory, or in
which the required data has already been deallocated by the time the kernel
starts to execute, especially when multiple kernels run concurrently. Since
the thread issuing the target offload is blocked at the end of target anyway,
we add a synchronization before memory deallocation to ensure correctness.
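
The ordering problem described above can be illustrated with a small, single-threaded sketch. It does not use libomptarget: MockQueue, runTarget, and the weak_ptr "device view" are hypothetical stand-ins, and the queue simply defers work until synchronize() is called to mimic an asynchronous launch.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous device queue: work is only
// recorded at enqueue time and runs when synchronize() is called.
struct MockQueue {
  std::vector<std::function<void()>> pending;
  void enqueue(std::function<void()> op) { pending.push_back(std::move(op)); }
  void synchronize() {
    for (auto &op : pending)
      op();
    pending.clear();
  }
};

// Launches a "kernel" that sums a buffer, then frees the buffer.
// Returns the sum, or -1 if the kernel found its buffer already freed.
long runTarget(bool syncBeforeFree) {
  MockQueue queue;
  auto buffer = std::make_shared<std::vector<int>>(4, 10);
  std::weak_ptr<std::vector<int>> deviceView = buffer;
  long result = 0;

  // Kernel launch: returns immediately, the work is still pending.
  queue.enqueue([&result, deviceView] {
    auto data = deviceView.lock();
    if (!data) {
      result = -1; // data was deallocated before the kernel ran
      return;
    }
    for (int v : *data)
      result += v;
  });

  if (syncBeforeFree)
    queue.synchronize(); // wait for the kernel before freeing
  buffer.reset();        // memory deallocation
  queue.synchronize();   // synchronization at the end of the target
  return result;
}
```

With syncBeforeFree set, the kernel reads valid data; without it, the mock kernel observes that its buffer is gone, which is the situation the patch is meant to rule out.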

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tianshilei1992 created this revision. · Jul 22 2020, 7:27 PM

Herald added a reviewer: jdoerfert. · Jul 22 2020, 7:27 PM

Herald added a project: Restricted Project.

Herald added subscribers: openmp-commits, sstefan1, guansong, yaxunl.

LGTM. Typo in the commit message. Also shorten the commit title, e.g., wait for kernel prior to memory deallocation

This revision is now accepted and ready to land. · Jul 22 2020, 7:33 PM

Updated based on comments

tianshilei1992 retitled this revision from "[OpenMP] Fixed an issue that memory free might be called inappropriately when the kernel is still running" to "[OpenMP] Wait for kernel prior to memory deallocation". · Jul 22 2020, 7:40 PM

tianshilei1992 edited the summary of this revision.

Harbormaster failed remote builds in B65319: Diff 280007! · Jul 22 2020, 7:52 PM

Closed by commit rG9b2832c0897c: [OpenMP] Wait for kernel prior to memory deallocation (authored by tianshilei1992). · Jul 22 2020, 7:55 PM

This revision was automatically updated to reflect the committed changes.

Does it mean the D2H will always run synchronously after this change?
Does it also mean that target_data_end should be split into data transfer and data free parts?

In D84381#2168483, @ye-luo wrote:

Does it mean the D2H will always run synchronously after this change?
Does it also mean that target_data_end should be split into data transfer and data free parts?

That is a good point. There is a critical issue in this patch: D2H is still async, but its synchronization is lost.

This revision is now accepted and ready to land. · Jul 22 2020, 8:04 PM

Indeed, target_data_begin should be split as well. cudaMalloc blocks the whole device, so alternating cudaMalloc and transfers only slows the whole process down further. It is better to do all the allocations first and then start queuing the transfers.
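
A rough sketch of the "allocate everything first, then queue the transfers" ordering suggested here. MockDevice, alloc, submitAsync, and mapBatched are hypothetical names used only for illustration; the mock just records the order of operations and does not model how a real cudaMalloc behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical device that only logs the order of the requested operations.
struct MockDevice {
  std::vector<std::string> log;
  int nextHandle = 0;

  // Stands in for a blocking allocation call.
  int alloc(std::size_t /*bytes*/) {
    log.push_back("alloc");
    return nextHandle++;
  }

  // Stands in for queuing an asynchronous host-to-device copy.
  void submitAsync(int /*handle*/) { log.push_back("h2d"); }
};

// Two-phase mapping: all allocations first, then all transfers.
std::vector<std::string> mapBatched(const std::vector<std::size_t> &sizes) {
  MockDevice dev;
  std::vector<int> handles;
  for (std::size_t bytes : sizes)
    handles.push_back(dev.alloc(bytes)); // phase 1: allocate
  for (int handle : handles)
    dev.submitAsync(handle);             // phase 2: queue transfers
  return dev.log;
}
```

In this ordering no allocation is issued once the first transfer has been queued, so queued copies are never interrupted by a blocking allocation.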

Harbormaster failed remote builds in B65320: Diff 280008! · Jul 22 2020, 8:16 PM

Fixed an issue where target may return while D2H is still in progress

tianshilei1992 requested review of this revision. · Jul 22 2020, 9:17 PM

tianshilei1992 edited the summary of this revision.

OK. It is less broken now.
target_data_end still does Device.deallocTgtPtr and needs a sync before it.
To fully fix this issue, target_data_end must be split.
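
One possible shape of such a split, as a sketch only: the data-end step is divided into a transfer phase and a free phase with a single synchronization in between. MockQueue, Mapping, queueTransfers, freeDeviceMemory, and dataEndSplit are all hypothetical and are not the libomptarget functions.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical asynchronous queue: operations run only on synchronize().
struct MockQueue {
  std::vector<std::function<void()>> pending;
  int syncCount = 0;
  void enqueue(std::function<void()> op) { pending.push_back(std::move(op)); }
  void synchronize() {
    ++syncCount;
    for (auto &op : pending)
      op();
    pending.clear();
  }
};

// Hypothetical mapping between a device-side copy and its host destination.
struct Mapping {
  std::unique_ptr<std::vector<int>> device;
  std::vector<int> host;
};

// Phase 1: queue every device-to-host copy without freeing anything.
void queueTransfers(MockQueue &queue, std::vector<Mapping> &maps) {
  for (Mapping &m : maps)
    queue.enqueue([&m] { m.host = *m.device; });
}

// Phase 2: release device memory; call only after the queue has drained.
void freeDeviceMemory(std::vector<Mapping> &maps) {
  for (Mapping &m : maps)
    m.device.reset();
}

// Runs both phases and returns how many synchronizations were needed.
int dataEndSplit(std::vector<Mapping> &maps) {
  MockQueue queue;
  queueTransfers(queue, maps);
  queue.synchronize(); // one sync covers all pending transfers
  freeDeviceMemory(maps);
  return queue.syncCount;
}
```

Because nothing is freed until every queued copy has completed, one synchronization is enough in this mock, regardless of how many mappings are processed.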

ye-luo accepted this revision. · Jul 22 2020, 9:48 PM

This revision is now accepted and ready to land. · Jul 22 2020, 9:48 PM

Harbormaster completed remote builds in B65330: Diff 280025. · Jul 22 2020, 9:52 PM

tianshilei1992 added a comment. · Jul 22 2020, 9:52 PM

This comment was removed by tianshilei1992.

Will get it fixed in another patch.

Revision Contents

Path: openmp/libomptarget/src/omptarget.cpp
Size: 15 lines

Diff 280025

openmp/libomptarget/src/omptarget.cpp

     rc = Device.run_region(TargetTable->EntriesBegin[TM->Index].addr,
                            &tgt_args[0], &tgt_offsets[0], tgt_args.size(),
                            &AsyncInfo);
   }
   if (rc != OFFLOAD_SUCCESS) {
     DP ("Executing target region abort target.\n");
     return OFFLOAD_FAIL;
   }

+  // This synchronization makes sure that before we deallocate any memory, the
+  // kernel has already finished. Otherwise, data will be freed when they're
+  // still being used.
+  // TODO: If we can separate the memory free and data transfer, we can avoid
+  // this synchronization.
+  if (Device.RTL->synchronize) {
+    rc = Device.RTL->synchronize(device_id, &AsyncInfo);
+    if (rc != OFFLOAD_SUCCESS) {
+      DP("Failed to synchronize.\n");
+      return OFFLOAD_FAIL;
+    }
+  }

   // Deallocate (first-)private arrays
   for (auto it : fpArrays) {
     int rt = Device.RTL->data_delete(Device.RTLDeviceID, it);
     if (rt != OFFLOAD_SUCCESS) {
       DP("Deallocation of (first-)private arrays failed.\n");
       return OFFLOAD_FAIL;
     }
   }

   // Move data from device.
   int rt = target_data_end(Device, arg_num, args_base, args, arg_sizes,
                            arg_types, arg_mappers, &AsyncInfo);
   if (rt != OFFLOAD_SUCCESS) {
     DP("Call to target_data_end failed, abort targe.\n");
     return OFFLOAD_FAIL;
   }

+  // This synchronization makes sure that all data transfer have been done
+  // before returning from this function.
   if (Device.RTL->synchronize)
     return Device.RTL->synchronize(device_id, &AsyncInfo);

   return OFFLOAD_SUCCESS;
 }