This is an archive of the discontinued LLVM Phabricator instance.

Differential D44186

[OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members
ClosedPublic

Authored by grokos on Mar 6 2018, 7:26 PM.

Details

Reviewers

Hahnfeld
jlpeyton
jhen

Commits

rGa0da24683b2b: [OpenMP][libomptarget] New map interface: remove translation code and ensure…
rOMP337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure…
rL337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure…

Summary

This patch removes the translation code, since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensuring proper alignment of struct members when a struct is only partially mapped. Mapping a struct from a higher address (i.e. not from its beginning) can distort the alignment of some of its member fields. Padding restores the original (proper) alignment.

Diff Detail

Repository: rL LLVM

Event Timeline

grokos created this revision. Mar 6 2018, 7:26 PM

Herald added a subscriber: guansong. Mar 6 2018, 7:26 PM

grokos retitled this revision from [Clang][OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP] New clang/libomptarget map interface: remove translation code. Mar 6 2018, 7:26 PM

These are two changes and need to be two patches.

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	I thought this is now done in the compiler?

This revision now requires changes to proceed. Mar 7 2018, 1:45 AM

OK, I left the change to the internal device ID representation out of this patch.

grokos added inline comments. Mar 7 2018, 8:23 AM

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	No, it's not. The compiler could take care of this issue, but it's not its job. The compiler should just inform the runtime that we requested a mapping starting from some address. If `CUDA memcpy`, for instance, has some requirements regarding the alignment of addresses, that's not the compiler's business. The compiler doesn't and shouldn't care about what happens at the plugin level of libomptarget.

grokos edited the summary of this revision. Mar 7 2018, 8:24 AM

Hahnfeld added inline comments. Mar 7 2018, 9:02 AM

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	If that's specific to CUDA, why does it happen in the plugin agnostic part of libomptarget?

grokos added inline comments. Mar 7 2018, 9:16 AM

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	That's a good point. A more elegant solution would be to extend the plugin interface (`__tgt_rtl_*` functions) with a new function which the agnostic library can query in order to get any alignment requirements. I'm in favour of this approach, but I need to ask other people what they think. In any case, implementing this potential change is not part of this patch. Thoughts?

RaviNarayanaswamy added a subscriber: RaviNarayanaswamy. Mar 7 2018, 1:24 PM

RaviNarayanaswamy added inline comments.

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	I am not sure what you are trying to do here. For structure members, the compiler should generate the begin address, offset and size. The code generated for the target uses the beginning of the struct to access the field, so you cannot just pad the field member.

Hahnfeld added inline comments. Mar 9 2018, 1:27 AM

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	Good point, this needs clarification. Maybe @grokos could share a code example where this padding is needed and point to documentation where it says that `cuMemcpy` can only handle aligned pointers? My guess which might be completely wrong: Maybe the begin address is just for transfer and the target code will use the subsequent entries which point to the member directly?

grokos added inline comments. Mar 9 2018, 2:48 PM

libomptarget/src/omptarget.cpp
197–199 ↗	(On Diff #137318)	@RaviNarayanaswamy: What you refer to is the `Base` address, which is the starting address of the struct. And you are right, the target code uses this address to access members of the struct and we cannot modify it. What I am padding is the `Begin` address, which is the address of the first mapped member. I was misled by the debug output: padding is not needed for memory transfers, but for the kernel execution itself. It ensures that the alignment of each mapped field remains what it should be. E.g.

    struct S {
      int a;   // 4-aligned
      int b;   // 4-aligned
      int *p;  // 8-aligned
    } s1;
    ...
    #pragma omp target map(tofrom: s1.b, s1.p[0:N])
    {
      s1.b = 5;
      for (int i...) s1.p[i] = ...;
    }

In this example we are mapping `s1` starting from member `b`. So, `BaseAddress=&s1=&s1.a` and `BeginAddress=&s1.b`. Let's assume that the struct begins at address `0x100`. Then `&s1.a=0x100`, `&s1.b=0x104`, `&s1.p=0x108`. Each member obeys the alignment requirements for its type.

Now, when we allocate memory on the device, in CUDA's case `cuMemAlloc()` returns an address which is at least 256-aligned. This means that the chunk of the struct on the device will start at a 256-aligned address, let's say `0x200`. Then the address of `b` will be `0x200` and the address of `p` will be a misaligned `0x204` (on the host there was no need to add padding between `b` and `p`, so `p` comes exactly 4 bytes after `b`). If the device kernel tries to access `s1.p`, a `misaligned address` error occurs (as reported by the CUDA plugin).

By padding the begin address down to a multiple of 8 and extending the size of the allocated chunk accordingly, the chunk on the device will start at `0x200` with the padding (4 bytes), then `&s1.b=0x204` and `&s1.p=0x208`, as they should be to satisfy the alignment requirements.

Was the new map interface added to Clang and I just missed it?

In D44186#1105894, @Hahnfeld wrote:

Was the new map interface added to Clang and I just missed it?

No, the clang patch needs to be reimplemented, the current version is not going to be accepted. I'm looking into it.

The libomptarget-side of things does not need to change, however, so this patch will remain as-is, waiting to be upstreamed once the clang patch is ready (unless there are comments of course).

Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.

In D44186#1166386, @ABataev wrote:

Hi George, when do you plan to commit this patch? I cleaned the patch for the clang and it is ready to be committed.

Let's coordinate in order to commit both patches at the same time. The patch is currently blocked by Jonas.

@Hahnfeld: Is there any other issue that needs to be addressed? If not, is this good to go?

In D44186#1166793, @grokos wrote:

Let's coordinate in order to commit both patches at the same time. The patch is currently blocked by Jonas.

@Hahnfeld: Is there any other issue that needs to be addressed? If not, is this good to go?

I think we were mainly waiting for the Clang patch, removing the translation code itself is highly desirable (and is blocking some outstanding fixes).

A second point was the alignment problem which I think I now understood. Please update the patch description accordingly so that others can get the rationale from looking at the commit log.
If the code you posted inline triggers the problem (if not, I'd need a minimal example that would break), I'm happy for now and will test after the changes have landed.

One minor question inline, but I think we can proceed for now as long as we don't regress.

libomptarget/src/omptarget.cpp
202–204 ↗	(On Diff #137396)	Again looking at this code, what if the `member_of` comes later and is not directly adjacent?

This revision is now accepted and ready to land. Jul 18 2018, 11:34 PM

grokos retitled this revision from [OpenMP] New clang/libomptarget map interface: remove translation code to [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members. Jul 19 2018, 5:48 AM

grokos edited the summary of this revision.

Added example code to demonstrate the need for padding in partially mapped structs.

grokos added inline comments. Jul 19 2018, 6:31 AM

libomptarget/src/omptarget.cpp
202–204 ↗	(On Diff #137396)	This is a concern, but at least for now clang produces the combined entry followed by the individual fields. I think this behavior should be enforced in the compiler. Otherwise, we would have to scan all other entries trying to find a `member_of`. This would be a process of quadratic complexity, which is what we wanted to eliminate via the new map interface.

@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.

In D44186#1168001, @grokos wrote:

@ABataev: Can you put a link to the clang-side patch in the description so that we link the two patches together? Also, please let me know when you commit the clang patch so that I commit this one as well.

The patch will be committed as is, without the review. I planned to commit it as soon as you commit the libomptarget patch.

Closed by commit rL337455: [OpenMP][libomptarget] New map interface: remove translation code and ensure… (authored by grokos). Jul 19 2018, 6:46 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jul 19 2018, 6:46 AM

@ABataev did you commit the Clang patch and I just missed it?

Yes, it was committed

Best regards,
Alexey Bataev

On July 23, 2018, at 5:18, Jonas Hahnfeld via Phabricator <reviews@reviews.llvm.org> wrote:

Hahnfeld added a comment.

@ABataev did you commit the Clang patch and I just missed it?

Repository:
rL LLVM

https://reviews.llvm.org/D44186

asavonic mentioned this in D135462: [SelectionDAG] Do not second-guess alignment for alloca. Dec 20 2022, 12:26 PM

Revision Contents

Path                                            Size
openmp/trunk/libomptarget/include/omptarget.h   2 lines
openmp/trunk/libomptarget/src/interface.cpp     357 lines
openmp/trunk/libomptarget/src/omptarget.cpp     118 lines

Diff 156260

openmp/trunk/libomptarget/include/omptarget.h

[42 lines not shown] enum tgt_map_type {
// return base device address of mapped data		// return base device address of mapped data
OMP_TGT_MAPTYPE_RETURN_PARAM = 0x040,		OMP_TGT_MAPTYPE_RETURN_PARAM = 0x040,
// private variable - not mapped		// private variable - not mapped
OMP_TGT_MAPTYPE_PRIVATE = 0x080,		OMP_TGT_MAPTYPE_PRIVATE = 0x080,
// copy by value - not mapped		// copy by value - not mapped
OMP_TGT_MAPTYPE_LITERAL = 0x100,		OMP_TGT_MAPTYPE_LITERAL = 0x100,
// mapping is implicit		// mapping is implicit
OMP_TGT_MAPTYPE_IMPLICIT = 0x200,		OMP_TGT_MAPTYPE_IMPLICIT = 0x200,
// member of struct, member given by 16 MSBs - 1		// member of struct, member given by [16 MSBs] - 1
OMP_TGT_MAPTYPE_MEMBER_OF = 0xffff000000000000		OMP_TGT_MAPTYPE_MEMBER_OF = 0xffff000000000000
};		};

enum OpenMPOffloadingDeclareTargetFlags {		enum OpenMPOffloadingDeclareTargetFlags {
/// Mark the entry as having a 'link' attribute.		/// Mark the entry as having a 'link' attribute.
OMP_DECLARE_TARGET_LINK = 0x01,		OMP_DECLARE_TARGET_LINK = 0x01,
/// Mark the entry as being a global constructor.		/// Mark the entry as being a global constructor.
OMP_DECLARE_TARGET_CTOR = 0x02,		OMP_DECLARE_TARGET_CTOR = 0x02,
[174 lines not shown]

openmp/trunk/libomptarget/src/interface.cpp

[27 lines not shown]
}		}

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
/// unloads a target shared library		/// unloads a target shared library
EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {		EXTERN void __tgt_unregister_lib(__tgt_bin_desc *desc) {
RTLs.UnregisterLib(desc);		RTLs.UnregisterLib(desc);
}		}

// Following datatypes and functions (tgt_oldmap_type, combined_entry_t,
// translate_map, cleanup_map) will be removed once the compiler starts using
// the new map types.

// Old map types
enum tgt_oldmap_type {
OMP_TGT_OLDMAPTYPE_TO = 0x001, // copy data from host to device
OMP_TGT_OLDMAPTYPE_FROM = 0x002, // copy data from device to host
OMP_TGT_OLDMAPTYPE_ALWAYS = 0x004, // copy regardless of the ref. count
OMP_TGT_OLDMAPTYPE_DELETE = 0x008, // force unmapping of data
OMP_TGT_OLDMAPTYPE_MAP_PTR = 0x010, // map pointer as well as pointee
OMP_TGT_OLDMAPTYPE_FIRST_MAP = 0x020, // first occurrence of mapped variable
OMP_TGT_OLDMAPTYPE_RETURN_PTR = 0x040, // return TgtBase addr of mapped data
OMP_TGT_OLDMAPTYPE_PRIVATE_PTR = 0x080, // private variable - not mapped
OMP_TGT_OLDMAPTYPE_PRIVATE_VAL = 0x100 // copy by value - not mapped
};

// Temporary functions for map translation and cleanup
struct combined_entry_t {
int num_members; // number of members in combined entry
void *base_addr; // base address of combined entry
void *begin_addr; // begin address of combined entry
void *end_addr; // size of combined entry
};

static void translate_map(int32_t arg_num, void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types, int32_t &new_arg_num,
void **&new_args_base, void **&new_args, int64_t *&new_arg_sizes,
int64_t *&new_arg_types, bool is_target_construct) {
if (arg_num <= 0) {
DP("Nothing to translate\n");
new_arg_num = 0;
return;
}

// array of combined entries
combined_entry_t *cmb_entries =
(combined_entry_t *) alloca(arg_num * sizeof(combined_entry_t));
// number of combined entries
long num_combined = 0;
// old entry is MAP_PTR?
bool *is_ptr_old = (bool *) alloca(arg_num * sizeof(bool));
// old entry is member of member_of[old] cmb_entry
int *member_of = (int *) alloca(arg_num * sizeof(int));
// temporary storage for modifications of the original arg_types
int64_t *mod_arg_types = (int64_t *) alloca(arg_num *sizeof(int64_t));

DP("Translating %d map entries\n", arg_num);
for (int i = 0; i < arg_num; ++i) {
member_of[i] = -1;
is_ptr_old[i] = false;
mod_arg_types[i] = arg_types[i];
// Scan previous entries to see whether this entry shares the same base
for (int j = 0; j < i; ++j) {
void *new_begin_addr = NULL;
void *new_end_addr = NULL;

if (mod_arg_types[i] & OMP_TGT_OLDMAPTYPE_MAP_PTR) {
if (args_base[i] == args[j]) {
if (!(mod_arg_types[j] & OMP_TGT_OLDMAPTYPE_MAP_PTR)) {
DP("Entry %d has the same base as entry %d's begin address\n", i,
j);
new_begin_addr = args_base[i];
new_end_addr = (char *)args_base[i] + sizeof(void *);
assert(arg_sizes[j] == sizeof(void *));
is_ptr_old[j] = true;
} else {
DP("Entry %d has the same base as entry %d's begin address, but "
"%d's base was a MAP_PTR too\n", i, j, j);
int32_t to_from_always_delete =
OMP_TGT_OLDMAPTYPE_TO | OMP_TGT_OLDMAPTYPE_FROM |
OMP_TGT_OLDMAPTYPE_ALWAYS | OMP_TGT_OLDMAPTYPE_DELETE;
if (mod_arg_types[j] & to_from_always_delete) {
DP("Resetting to/from/always/delete flags for entry %d because "
"it is only a pointer to pointer\n", j);
mod_arg_types[j] &= ~to_from_always_delete;
}
}
}
} else {
if (!(mod_arg_types[i] & OMP_TGT_OLDMAPTYPE_FIRST_MAP) &&
args_base[i] == args_base[j]) {
DP("Entry %d has the same base address as entry %d\n", i, j);
new_begin_addr = args[i];
new_end_addr = (char *)args[i] + arg_sizes[i];
}
}

// If we have combined the entry with a previous one
if (new_begin_addr) {
int id;
if(member_of[j] == -1) {
// We have a new entry
id = num_combined++;
DP("Creating new combined entry %d for old entry %d\n", id, j);
// Initialize new entry
cmb_entries[id].num_members = 1;
cmb_entries[id].base_addr = args_base[j];
if (mod_arg_types[j] & OMP_TGT_OLDMAPTYPE_MAP_PTR) {
cmb_entries[id].begin_addr = args_base[j];
cmb_entries[id].end_addr = (char *)args_base[j] + arg_sizes[j];
} else {
cmb_entries[id].begin_addr = args[j];
cmb_entries[id].end_addr = (char *)args[j] + arg_sizes[j];
}
member_of[j] = id;
} else {
// Reuse existing combined entry
DP("Reusing existing combined entry %d\n", member_of[j]);
id = member_of[j];
}

// Update combined entry
DP("Adding entry %d to combined entry %d\n", i, id);
cmb_entries[id].num_members++;
// base_addr stays the same
cmb_entries[id].begin_addr =
std::min(cmb_entries[id].begin_addr, new_begin_addr);
cmb_entries[id].end_addr =
std::max(cmb_entries[id].end_addr, new_end_addr);
member_of[i] = id;
break;
}
}
}

DP("New entries: %ld combined + %d original\n", num_combined, arg_num);
new_arg_num = arg_num + num_combined;
new_args_base = (void **) malloc(new_arg_num * sizeof(void *));
new_args = (void **) malloc(new_arg_num * sizeof(void *));
new_arg_sizes = (int64_t *) malloc(new_arg_num * sizeof(int64_t));
new_arg_types = (int64_t *) malloc(new_arg_num * sizeof(int64_t));

const int64_t alignment = 8;

int next_id = 0; // next ID
int next_cid = 0; // next combined ID
int *combined_to_new_id = (int *) alloca(num_combined * sizeof(int));
for (int i = 0; i < arg_num; ++i) {
// It is member_of
if (member_of[i] == next_cid) {
int cid = next_cid++; // ID of this combined entry
int nid = next_id++; // ID of the new (global) entry
combined_to_new_id[cid] = nid;
DP("Combined entry %3d will become new entry %3d\n", cid, nid);

int64_t padding = (int64_t)cmb_entries[cid].begin_addr % alignment;
if (padding) {
DP("Using a padding of %" PRId64 " for begin address " DPxMOD "\n",
padding, DPxPTR(cmb_entries[cid].begin_addr));
cmb_entries[cid].begin_addr =
(char *)cmb_entries[cid].begin_addr - padding;
}

new_args_base[nid] = cmb_entries[cid].base_addr;
new_args[nid] = cmb_entries[cid].begin_addr;
new_arg_sizes[nid] = (int64_t) ((char *)cmb_entries[cid].end_addr -
(char *)cmb_entries[cid].begin_addr);
new_arg_types[nid] = OMP_TGT_MAPTYPE_TARGET_PARAM;
DP("Entry %3d: base_addr " DPxMOD ", begin_addr " DPxMOD ", "
"size %" PRId64 ", type 0x%" PRIx64 "\n", nid,
DPxPTR(new_args_base[nid]), DPxPTR(new_args[nid]), new_arg_sizes[nid],
new_arg_types[nid]);
} else if (member_of[i] != -1) {
DP("Combined entry %3d has been encountered before, do nothing\n",
member_of[i]);
}

// Now that the combined entry (the one the old entry was a member of) has
// been inserted into the new arguments list, proceed with the old entry.
int nid = next_id++;
DP("Old entry %3d will become new entry %3d\n", i, nid);

new_args_base[nid] = args_base[i];
new_args[nid] = args[i];
new_arg_sizes[nid] = arg_sizes[i];
int64_t old_type = mod_arg_types[i];

if (is_ptr_old[i]) {
// Reset TO and FROM flags
old_type &= ~(OMP_TGT_OLDMAPTYPE_TO | OMP_TGT_OLDMAPTYPE_FROM);
}

if (member_of[i] == -1) {
if (!is_target_construct)
old_type &= ~OMP_TGT_MAPTYPE_TARGET_PARAM;
new_arg_types[nid] = old_type;
DP("Entry %3d: base_addr " DPxMOD ", begin_addr " DPxMOD ", size %" PRId64
", type 0x%" PRIx64 " (old entry %d not MEMBER_OF)\n", nid,
DPxPTR(new_args_base[nid]), DPxPTR(new_args[nid]), new_arg_sizes[nid],
new_arg_types[nid], i);
} else {
// Old entry is not FIRST_MAP
old_type &= ~OMP_TGT_OLDMAPTYPE_FIRST_MAP;
// Add MEMBER_OF
int new_member_of = combined_to_new_id[member_of[i]];
old_type |= ((int64_t)new_member_of + 1) << 48;
new_arg_types[nid] = old_type;
DP("Entry %3d: base_addr " DPxMOD ", begin_addr " DPxMOD ", size %" PRId64
", type 0x%" PRIx64 " (old entry %d MEMBER_OF %d)\n", nid,
DPxPTR(new_args_base[nid]), DPxPTR(new_args[nid]), new_arg_sizes[nid],
new_arg_types[nid], i, new_member_of);
}
}
}

static void cleanup_map(int32_t new_arg_num, void **new_args_base,
void **new_args, int64_t *new_arg_sizes, int64_t *new_arg_types,
int32_t arg_num, void **args_base) {
if (new_arg_num > 0) {
int offset = new_arg_num - arg_num;
for (int32_t i = 0; i < arg_num; ++i) {
// Restore old base address
args_base[i] = new_args_base[i+offset];
}
free(new_args_base);
free(new_args);
free(new_arg_sizes);
free(new_arg_types);
}
}

/// creates host-to-target data mapping, stores it in the		/// creates host-to-target data mapping, stores it in the
/// libomptarget.so internal structure (an entry in a stack of data maps)		/// libomptarget.so internal structure (an entry in a stack of data maps)
/// and passes the data to the device.		/// and passes the data to the device.
EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering data begin region for device %ld with %d mappings\n", device_id,		DP("Entering data begin region for device %" PRId64 " with %d mappings\n",
arg_num);		device_id, arg_num);

// No devices available?		// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
DP("Use default device id %ld\n", device_id);		DP("Use default device id %" PRId64 "\n", device_id);
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %ld ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
return;		return;
}		}

DeviceTy& Device = Devices[device_id];		DeviceTy& Device = Devices[device_id];

// Translate maps		#ifdef OMPTARGET_DEBUG
int32_t new_arg_num;		for (int i=0; i<arg_num; ++i) {
void **new_args_base;		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
void **new_args;		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
int64_t *new_arg_sizes;		arg_sizes[i], arg_types[i]);
int64_t *new_arg_types;		}
translate_map(arg_num, args_base, args, arg_sizes, arg_types, new_arg_num,		#endif
new_args_base, new_args, new_arg_sizes, new_arg_types, false);
		target_data_begin(Device, arg_num, args_base, args, arg_sizes, arg_types);
//target_data_begin(Device, arg_num, args_base, args, arg_sizes, arg_types);
target_data_begin(Device, new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types);

// Cleanup translation memory
cleanup_map(new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types, arg_num, args_base);
}		}

EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_begin_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);
[13 lines not shown] EXTERN void __tgt_target_data_end(int64_t device_id, int32_t arg_num,
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

RTLsMtx.lock();		RTLsMtx.lock();
size_t Devices_size = Devices.size();		size_t Devices_size = Devices.size();
RTLsMtx.unlock();		RTLsMtx.unlock();
if (Devices_size <= (size_t)device_id) {		if (Devices_size <= (size_t)device_id) {
DP("Device ID %ld does not have a matching RTL.\n", device_id);		DP("Device ID %" PRId64 " does not have a matching RTL.\n", device_id);
return;		return;
}		}

DeviceTy &Device = Devices[device_id];		DeviceTy &Device = Devices[device_id];
if (!Device.IsInit) {		if (!Device.IsInit) {
DP("uninit device: ignore");		DP("Uninit device: ignore");
return;		return;
}		}

// Translate maps		#ifdef OMPTARGET_DEBUG
int32_t new_arg_num;		for (int i=0; i<arg_num; ++i) {
void **new_args_base;		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
void **new_args;		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
int64_t *new_arg_sizes;		arg_sizes[i], arg_types[i]);
int64_t *new_arg_types;		}
translate_map(arg_num, args_base, args, arg_sizes, arg_types, new_arg_num,		#endif
new_args_base, new_args, new_arg_sizes, new_arg_types, false);
		target_data_end(Device, arg_num, args_base, args, arg_sizes, arg_types);
//target_data_end(Device, arg_num, args_base, args, arg_sizes, arg_types);
target_data_end(Device, new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types);

// Cleanup translation memory
cleanup_map(new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types, arg_num, args_base);
}		}

EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_end_nowait(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
int32_t depNum, void *depList, int32_t noAliasDepNum,		int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

__tgt_target_data_end(device_id, arg_num, args_base, args, arg_sizes,		__tgt_target_data_end(device_id, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,		EXTERN void __tgt_target_data_update(int64_t device_id, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering data update with %d mappings\n", arg_num);		DP("Entering data update with %d mappings\n", arg_num);

// No devices available?		// No devices available?
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %ld ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
return;		return;
}		}

DeviceTy& Device = Devices[device_id];		DeviceTy& Device = Devices[device_id];
target_data_update(Device, arg_num, args_base, args, arg_sizes, arg_types);		target_data_update(Device, arg_num, args_base, args, arg_sizes, arg_types);
}		}

EXTERN void __tgt_target_data_update_nowait(		EXTERN void __tgt_target_data_update_nowait(
int64_t device_id, int32_t arg_num, void **args_base, void **args,		int64_t device_id, int32_t arg_num, void **args_base, void **args,
int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,		int64_t *arg_sizes, int64_t *arg_types, int32_t depNum, void *depList,
int32_t noAliasDepNum, void *noAliasDepList) {		int32_t noAliasDepNum, void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

__tgt_target_data_update(device_id, arg_num, args_base, args, arg_sizes,		__tgt_target_data_update(device_id, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,		EXTERN int __tgt_target(int64_t device_id, void *host_ptr, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
DP("Entering target region with entry point " DPxMOD " and device Id %ld\n",		DP("Entering target region with entry point " DPxMOD " and device Id %"
DPxPTR(host_ptr), device_id);		PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %ld ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

// Translate maps		#ifdef OMPTARGET_DEBUG
int32_t new_arg_num;		for (int i=0; i<arg_num; ++i) {
void **new_args_base;		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
void **new_args;		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
int64_t *new_arg_sizes;		arg_sizes[i], arg_types[i]);
int64_t *new_arg_types;		}
translate_map(arg_num, args_base, args, arg_sizes, arg_types, new_arg_num,		#endif
new_args_base, new_args, new_arg_sizes, new_arg_types, true);
		int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
//return target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		arg_types, 0, 0, false /*team*/);
// arg_types, 0, 0, false /*team*/, false /*recursive*/);
int rc = target(device_id, host_ptr, new_arg_num, new_args_base, new_args,
new_arg_sizes, new_arg_types, 0, 0, false /*team*/);

// Cleanup translation memory
cleanup_map(new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types, arg_num, args_base);

return rc;		return rc;
}		}

EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,		int64_t *arg_types, int32_t depNum, void *depList, int32_t noAliasDepNum,
void *noAliasDepList) {		void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

return __tgt_target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		return __tgt_target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
arg_types);		arg_types);
}		}

EXTERN int __tgt_target_teams(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_teams(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t team_num, int32_t thread_limit) {		int64_t *arg_types, int32_t team_num, int32_t thread_limit) {
DP("Entering target region with entry point " DPxMOD " and device Id %ld\n",		DP("Entering target region with entry point " DPxMOD " and device Id %"
DPxPTR(host_ptr), device_id);		PRId64 "\n", DPxPTR(host_ptr), device_id);

if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %ld ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

// Translate maps		#ifdef OMPTARGET_DEBUG
int32_t new_arg_num;		for (int i=0; i<arg_num; ++i) {
void **new_args_base;		DP("Entry %2d: Base=" DPxMOD ", Begin=" DPxMOD ", Size=%" PRId64
void **new_args;		", Type=0x%" PRIx64 "\n", i, DPxPTR(args_base[i]), DPxPTR(args[i]),
int64_t *new_arg_sizes;		arg_sizes[i], arg_types[i]);
int64_t *new_arg_types;		}
translate_map(arg_num, args_base, args, arg_sizes, arg_types, new_arg_num,		#endif
new_args_base, new_args, new_arg_sizes, new_arg_types, true);
		int rc = target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,
//return target(device_id, host_ptr, arg_num, args_base, args, arg_sizes,		arg_types, team_num, thread_limit, true /*team*/);
// arg_types, team_num, thread_limit, true /*team*/,
// false /*recursive*/);
int rc = target(device_id, host_ptr, new_arg_num, new_args_base, new_args,
new_arg_sizes, new_arg_types, team_num, thread_limit, true /*team*/);

// Cleanup translation memory
cleanup_map(new_arg_num, new_args_base, new_args, new_arg_sizes,
new_arg_types, arg_num, args_base);

return rc;		return rc;
}		}

EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,		EXTERN int __tgt_target_teams_nowait(int64_t device_id, void *host_ptr,
int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,		int32_t arg_num, void **args_base, void **args, int64_t *arg_sizes,
int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,		int64_t *arg_types, int32_t team_num, int32_t thread_limit, int32_t depNum,
void *depList, int32_t noAliasDepNum, void *noAliasDepList) {		void *depList, int32_t noAliasDepNum, void *noAliasDepList) {
if (depNum + noAliasDepNum > 0)		if (depNum + noAliasDepNum > 0)
__kmpc_omp_taskwait(NULL, 0);		__kmpc_omp_taskwait(NULL, 0);

return __tgt_target_teams(device_id, host_ptr, arg_num, args_base, args,		return __tgt_target_teams(device_id, host_ptr, arg_num, args_base, args,
arg_sizes, arg_types, team_num, thread_limit);		arg_sizes, arg_types, team_num, thread_limit);
}		}


// The trip count mechanism will be revised - this scheme is not thread-safe.		// The trip count mechanism will be revised - this scheme is not thread-safe.
EXTERN void __kmpc_push_target_tripcount(int64_t device_id,		EXTERN void __kmpc_push_target_tripcount(int64_t device_id,
uint64_t loop_tripcount) {		uint64_t loop_tripcount) {
if (device_id == OFFLOAD_DEVICE_DEFAULT) {		if (device_id == OFFLOAD_DEVICE_DEFAULT) {
device_id = omp_get_default_device();		device_id = omp_get_default_device();
}		}

if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {		if (CheckDeviceAndCtors(device_id) != OFFLOAD_SUCCESS) {
DP("Failed to get device %ld ready\n", device_id);		DP("Failed to get device %" PRId64 " ready\n", device_id);
return;		return;
}		}

DP("__kmpc_push_target_tripcount(%ld, %" PRIu64 ")\n", device_id,		DP("__kmpc_push_target_tripcount(%" PRId64 ", %" PRIu64 ")\n", device_id,
loop_tripcount);		loop_tripcount);
Devices[device_id].loopTripCnt = loop_tripcount;		Devices[device_id].loopTripCnt = loop_tripcount;
}		}
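
The `%ld` to `PRId64` changes in the debug messages above matter because `device_id` is an `int64_t`, and `long` is not 64 bits wide on every platform. The following is a minimal sketch of the same formatting pattern outside of the `DP` macro; `format_device_id` is a hypothetical helper and not part of libomptarget.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not in libomptarget): formats an int64_t device ID
// with the PRId64 macro, which expands to the correct conversion specifier
// for int64_t on the current platform.
static std::string format_device_id(int64_t device_id) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Failed to get device %" PRId64 " ready",
                device_id);
  return std::string(buf);
}
```

The format string is built by adjacent string literal concatenation, which is why the patch splits each message around the macro.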

openmp/trunk/libomptarget/src/omptarget.cpp

... (19 lines not shown) ...

#include <cassert>		#include <cassert>
#include <vector>		#include <vector>

#ifdef OMPTARGET_DEBUG		#ifdef OMPTARGET_DEBUG
int DebugLevel = 0;		int DebugLevel = 0;
#endif // OMPTARGET_DEBUG		#endif // OMPTARGET_DEBUG

		/* All begin addresses for partially mapped structs must be 8-aligned in order
		* to ensure proper alignment of members. E.g.
		*
		* struct S {
		* int a; // 4-aligned
		* int b; // 4-aligned
		* int *p; // 8-aligned
		* } s1;
		* ...
		* #pragma omp target map(tofrom: s1.b, s1.p[0:N])
		* {
		* s1.b = 5;
		* for (int i...) s1.p[i] = ...;
		* }
		*
		* Here we are mapping s1 starting from member b, so BaseAddress=&s1=&s1.a and
		* BeginAddress=&s1.b. Let's assume that the struct begins at address 0x100,
		* then &s1.a=0x100, &s1.b=0x104, &s1.p=0x108. Each member obeys the alignment
		* requirements for its type. Now, when we allocate memory on the device, in
		* CUDA's case cuMemAlloc() returns an address which is at least 256-aligned.
		* This means that the chunk of the struct on the device will start at a
		* 256-aligned address, let's say 0x200. Then the address of b will be 0x200 and
		* address of p will be a misaligned 0x204 (on the host there was no need to add
		* padding between b and p, so p comes exactly 4 bytes after b). If the device
		* kernel tries to access s1.p, a misaligned address error occurs (as reported
		* by the CUDA plugin). By padding the begin address down to a multiple of 8 and
		* extending the size of the allocated chunk accordingly, the chunk on the
		* device will start at 0x200 with the padding (4 bytes), then &s1.b=0x204 and
		* &s1.p=0x208, as they should be to satisfy the alignment requirements.
		*/
		static const int64_t alignment = 8;

/// Map global data and execute pending ctors		/// Map global data and execute pending ctors
static int InitLibrary(DeviceTy& Device) {		static int InitLibrary(DeviceTy& Device) {
/*		/*
* Map global data		* Map global data
*/		*/
int32_t device_id = Device.DeviceID;		int32_t device_id = Device.DeviceID;
int rc = OFFLOAD_SUCCESS;		int rc = OFFLOAD_SUCCESS;

... (131 lines not shown) ...	int CheckDeviceAndCtors(int64_t device_id) {
if (hasPendingGlobals && InitLibrary(Device) != OFFLOAD_SUCCESS) {		if (hasPendingGlobals && InitLibrary(Device) != OFFLOAD_SUCCESS) {
DP("Failed to init globals on device %" PRId64 "\n", device_id);		DP("Failed to init globals on device %" PRId64 "\n", device_id);
return OFFLOAD_FAIL;		return OFFLOAD_FAIL;
}		}

return OFFLOAD_SUCCESS;		return OFFLOAD_SUCCESS;
}		}

static short member_of(int64_t type) {		static int32_t member_of(int64_t type) {
return ((type & OMP_TGT_MAPTYPE_MEMBER_OF) >> 48) - 1;		return ((type & OMP_TGT_MAPTYPE_MEMBER_OF) >> 48) - 1;
}		}

/// Internal function to do the mapping and transfer the data to the device		/// Internal function to do the mapping and transfer the data to the device
int target_data_begin(DeviceTy &Device, int32_t arg_num,		int target_data_begin(DeviceTy &Device, int32_t arg_num,
void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {		void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types) {
// process each input.		// process each input.
int rc = OFFLOAD_SUCCESS;		int rc = OFFLOAD_SUCCESS;
for (int32_t i = 0; i < arg_num; ++i) {		for (int32_t i = 0; i < arg_num; ++i) {
// Ignore private variables and arrays - there is no mapping for them.		// Ignore private variables and arrays - there is no mapping for them.
if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||		if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||
(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))		(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))
continue;		continue;

void *HstPtrBegin = args[i];		void *HstPtrBegin = args[i];
void *HstPtrBase = args_base[i];		void *HstPtrBase = args_base[i];
		int64_t data_size = arg_sizes[i];

		// Adjust for proper alignment if this is a combined entry (for structs).
		// Look at the next argument - if that is MEMBER_OF this one, then this one
		// is a combined entry.
		int64_t padding = 0;
		const int next_i = i+1;
		if (member_of(arg_types[i]) < 0 && next_i < arg_num &&
		member_of(arg_types[next_i]) == i) {
		padding = (int64_t)HstPtrBegin % alignment;
		if (padding) {
		DP("Using a padding of %" PRId64 " bytes for begin address " DPxMOD
		"\n", padding, DPxPTR(HstPtrBegin));
		HstPtrBegin = (char *) HstPtrBegin - padding;
		data_size += padding;
		}
		}

// Address of pointer on the host and device, respectively.		// Address of pointer on the host and device, respectively.
void *Pointer_HstPtrBegin, *Pointer_TgtPtrBegin;		void *Pointer_HstPtrBegin, *Pointer_TgtPtrBegin;
bool IsNew, Pointer_IsNew;		bool IsNew, Pointer_IsNew;
bool IsImplicit = arg_types[i] & OMP_TGT_MAPTYPE_IMPLICIT;		bool IsImplicit = arg_types[i] & OMP_TGT_MAPTYPE_IMPLICIT;
		// UpdateRef is based on MEMBER_OF instead of TARGET_PARAM because if we
		// have reached this point via __tgt_target_data_begin and not __tgt_target
		// then no argument is marked as TARGET_PARAM ("omp target data map" is not
		// associated with a target region, so there are no target parameters). This
		// may be considered a hack, we could revise the scheme in the future.
bool UpdateRef = !(arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF);		bool UpdateRef = !(arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF);
if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {		if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {
DP("Has a pointer entry: \n");		DP("Has a pointer entry: \n");
// base is address of pointer.		// base is address of pointer.
Pointer_TgtPtrBegin = Device.getOrAllocTgtPtr(HstPtrBase, HstPtrBase,		Pointer_TgtPtrBegin = Device.getOrAllocTgtPtr(HstPtrBase, HstPtrBase,
sizeof(void *), Pointer_IsNew, IsImplicit, UpdateRef);		sizeof(void *), Pointer_IsNew, IsImplicit, UpdateRef);
if (!Pointer_TgtPtrBegin) {		if (!Pointer_TgtPtrBegin) {
DP("Call to getOrAllocTgtPtr returned null pointer (device failure or "		DP("Call to getOrAllocTgtPtr returned null pointer (device failure or "
"illegal mapping).\n");		"illegal mapping).\n");
}		}
DP("There are %zu bytes allocated at target address " DPxMOD " - is%s new"		DP("There are %zu bytes allocated at target address " DPxMOD " - is%s new"
"\n", sizeof(void *), DPxPTR(Pointer_TgtPtrBegin),		"\n", sizeof(void *), DPxPTR(Pointer_TgtPtrBegin),
(Pointer_IsNew ? "" : " not"));		(Pointer_IsNew ? "" : " not"));
Pointer_HstPtrBegin = HstPtrBase;		Pointer_HstPtrBegin = HstPtrBase;
// modify current entry.		// modify current entry.
HstPtrBase = *(void **)HstPtrBase;		HstPtrBase = *(void **)HstPtrBase;
UpdateRef = true; // subsequently update ref count of pointee		UpdateRef = true; // subsequently update ref count of pointee
}		}

void *TgtPtrBegin = Device.getOrAllocTgtPtr(HstPtrBegin, HstPtrBase,		void *TgtPtrBegin = Device.getOrAllocTgtPtr(HstPtrBegin, HstPtrBase,
arg_sizes[i], IsNew, IsImplicit, UpdateRef);		data_size, IsNew, IsImplicit, UpdateRef);
if (!TgtPtrBegin && arg_sizes[i]) {		if (!TgtPtrBegin && data_size) {
// If arg_sizes[i]==0, then the argument is a pointer to NULL, so		// If data_size==0, then the argument could be a zero-length pointer to
// getOrAlloc() returning NULL is not an error.		// NULL, so getOrAlloc() returning NULL is not an error.
DP("Call to getOrAllocTgtPtr returned null pointer (device failure or "		DP("Call to getOrAllocTgtPtr returned null pointer (device failure or "
"illegal mapping).\n");		"illegal mapping).\n");
}		}
DP("There are %" PRId64 " bytes allocated at target address " DPxMOD		DP("There are %" PRId64 " bytes allocated at target address " DPxMOD
" - is%s new\n", arg_sizes[i], DPxPTR(TgtPtrBegin),		" - is%s new\n", data_size, DPxPTR(TgtPtrBegin),
(IsNew ? "" : " not"));		(IsNew ? "" : " not"));

if (arg_types[i] & OMP_TGT_MAPTYPE_RETURN_PARAM) {		if (arg_types[i] & OMP_TGT_MAPTYPE_RETURN_PARAM) {
void *ret_ptr;		uintptr_t Delta = (uintptr_t)HstPtrBegin - (uintptr_t)HstPtrBase;
if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ)		void *TgtPtrBase = (void *)((uintptr_t)TgtPtrBegin - Delta);
ret_ptr = Pointer_TgtPtrBegin;		DP("Returning device pointer " DPxMOD "\n", DPxPTR(TgtPtrBase));
else {		args_base[i] = TgtPtrBase;
bool IsLast; // not used
ret_ptr = Device.getTgtPtrBegin(HstPtrBegin, 0, IsLast, false);
}

DP("Returning device pointer " DPxMOD "\n", DPxPTR(ret_ptr));
args_base[i] = ret_ptr;
}		}

if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {		if (arg_types[i] & OMP_TGT_MAPTYPE_TO) {
bool copy = false;		bool copy = false;
if (IsNew || (arg_types[i] & OMP_TGT_MAPTYPE_ALWAYS)) {		if (IsNew || (arg_types[i] & OMP_TGT_MAPTYPE_ALWAYS)) {
copy = true;		copy = true;
} else if (arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) {		} else if (arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) {
// Copy data only if the "parent" struct has RefCount==1.		// Copy data only if the "parent" struct has RefCount==1.
short parent_idx = member_of(arg_types[i]);		int32_t parent_idx = member_of(arg_types[i]);
long parent_rc = Device.getMapEntryRefCnt(args[parent_idx]);		long parent_rc = Device.getMapEntryRefCnt(args[parent_idx]);
assert(parent_rc > 0 && "parent struct not found");		assert(parent_rc > 0 && "parent struct not found");
if (parent_rc == 1) {		if (parent_rc == 1) {
copy = true;		copy = true;
}		}
}		}

if (copy) {		if (copy) {
DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (hst:" DPxMOD ") -> (tgt:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));		data_size, DPxPTR(HstPtrBegin), DPxPTR(TgtPtrBegin));
int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, arg_sizes[i]);		int rt = Device.data_submit(TgtPtrBegin, HstPtrBegin, data_size);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data to device failed.\n");		DP("Copying data to device failed.\n");
rc = OFFLOAD_FAIL;		rc = OFFLOAD_FAIL;
}		}
}		}
}		}

if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {		if (arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ) {
... (26 lines not shown) ...	int target_data_end(DeviceTy &Device, int32_t arg_num, void **args_base,
for (int32_t i = arg_num - 1; i >= 0; --i) {		for (int32_t i = arg_num - 1; i >= 0; --i) {
// Ignore private variables and arrays - there is no mapping for them.		// Ignore private variables and arrays - there is no mapping for them.
// Also, ignore the use_device_ptr directive, it has no effect here.		// Also, ignore the use_device_ptr directive, it has no effect here.
if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||		if ((arg_types[i] & OMP_TGT_MAPTYPE_LITERAL) ||
(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))		(arg_types[i] & OMP_TGT_MAPTYPE_PRIVATE))
continue;		continue;

void *HstPtrBegin = args[i];		void *HstPtrBegin = args[i];
		int64_t data_size = arg_sizes[i];
		// Adjust for proper alignment if this is a combined entry (for structs).
		// Look at the next argument - if that is MEMBER_OF this one, then this one
		// is a combined entry.
		int64_t padding = 0;
		const int next_i = i+1;
		if (member_of(arg_types[i]) < 0 && next_i < arg_num &&
		member_of(arg_types[next_i]) == i) {
		padding = (int64_t)HstPtrBegin % alignment;
		if (padding) {
		DP("Using a padding of %" PRId64 " bytes for begin address " DPxMOD
		"\n", padding, DPxPTR(HstPtrBegin));
		HstPtrBegin = (char *) HstPtrBegin - padding;
		data_size += padding;
		}
		}

bool IsLast;		bool IsLast;
bool UpdateRef = !(arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) ||		bool UpdateRef = !(arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) ||
(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ);		(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ);
bool ForceDelete = arg_types[i] & OMP_TGT_MAPTYPE_DELETE;		bool ForceDelete = arg_types[i] & OMP_TGT_MAPTYPE_DELETE;

// If PTR_AND_OBJ, HstPtrBegin is address of pointee		// If PTR_AND_OBJ, HstPtrBegin is address of pointee
void *TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBegin, arg_sizes[i], IsLast,		void *TgtPtrBegin = Device.getTgtPtrBegin(HstPtrBegin, data_size, IsLast,
UpdateRef);		UpdateRef);
DP("There are %" PRId64 " bytes allocated at target address " DPxMOD		DP("There are %" PRId64 " bytes allocated at target address " DPxMOD
" - is%s last\n", arg_sizes[i], DPxPTR(TgtPtrBegin),		" - is%s last\n", data_size, DPxPTR(TgtPtrBegin),
(IsLast ? "" : " not"));		(IsLast ? "" : " not"));

bool DelEntry = IsLast || ForceDelete;		bool DelEntry = IsLast || ForceDelete;

if ((arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) &&		if ((arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) &&
!(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ)) {		!(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ)) {
DelEntry = false; // protect parent struct from being deallocated		DelEntry = false; // protect parent struct from being deallocated
}		}

if ((arg_types[i] & OMP_TGT_MAPTYPE_FROM) || DelEntry) {		if ((arg_types[i] & OMP_TGT_MAPTYPE_FROM) || DelEntry) {
// Move data back to the host		// Move data back to the host
if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {		if (arg_types[i] & OMP_TGT_MAPTYPE_FROM) {
bool Always = arg_types[i] & OMP_TGT_MAPTYPE_ALWAYS;		bool Always = arg_types[i] & OMP_TGT_MAPTYPE_ALWAYS;
bool CopyMember = false;		bool CopyMember = false;
if ((arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) &&		if ((arg_types[i] & OMP_TGT_MAPTYPE_MEMBER_OF) &&
!(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ)) {		!(arg_types[i] & OMP_TGT_MAPTYPE_PTR_AND_OBJ)) {
// Copy data only if the "parent" struct has RefCount==1.		// Copy data only if the "parent" struct has RefCount==1.
short parent_idx = member_of(arg_types[i]);		int32_t parent_idx = member_of(arg_types[i]);
long parent_rc = Device.getMapEntryRefCnt(args[parent_idx]);		long parent_rc = Device.getMapEntryRefCnt(args[parent_idx]);
assert(parent_rc > 0 && "parent struct not found");		assert(parent_rc > 0 && "parent struct not found");
if (parent_rc == 1) {		if (parent_rc == 1) {
CopyMember = true;		CopyMember = true;
}		}
}		}

if (DelEntry || Always || CopyMember) {		if (DelEntry || Always || CopyMember) {
DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",		DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",
arg_sizes[i], DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));		data_size, DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));
int rt = Device.data_retrieve(HstPtrBegin, TgtPtrBegin, arg_sizes[i]);		int rt = Device.data_retrieve(HstPtrBegin, TgtPtrBegin, data_size);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Copying data from device failed.\n");		DP("Copying data from device failed.\n");
rc = OFFLOAD_FAIL;		rc = OFFLOAD_FAIL;
}		}
}		}
}		}

// If we copied back to the host a struct/array containing pointers, we		// If we copied back to the host a struct/array containing pointers, we
// need to restore the original host pointer values from their shadow		// need to restore the original host pointer values from their shadow
// copies. If the struct is going to be deallocated, remove any remaining		// copies. If the struct is going to be deallocated, remove any remaining
// shadow pointer entries for this struct.		// shadow pointer entries for this struct.
uintptr_t lb = (uintptr_t) HstPtrBegin;		uintptr_t lb = (uintptr_t) HstPtrBegin;
uintptr_t ub = (uintptr_t) HstPtrBegin + arg_sizes[i];		uintptr_t ub = (uintptr_t) HstPtrBegin + data_size;
Device.ShadowMtx.lock();		Device.ShadowMtx.lock();
for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();		for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
it != Device.ShadowPtrMap.end(); ++it) {		it != Device.ShadowPtrMap.end(); ++it) {
void **ShadowHstPtrAddr = (void**) it->first;		void **ShadowHstPtrAddr = (void**) it->first;

// An STL map is sorted on its keys; use this property		// An STL map is sorted on its keys; use this property
// to quickly determine when to break out of the loop.		// to quickly determine when to break out of the loop.
if ((uintptr_t) ShadowHstPtrAddr < lb)		if ((uintptr_t) ShadowHstPtrAddr < lb)
... (13 lines not shown) ...	if ((arg_types[i] & OMP_TGT_MAPTYPE_FROM) || DelEntry) {
DP("Removing shadow pointer " DPxMOD "\n", DPxPTR(ShadowHstPtrAddr));		DP("Removing shadow pointer " DPxMOD "\n", DPxPTR(ShadowHstPtrAddr));
Device.ShadowPtrMap.erase(it);		Device.ShadowPtrMap.erase(it);
}		}
}		}
Device.ShadowMtx.unlock();		Device.ShadowMtx.unlock();

// Deallocate map		// Deallocate map
if (DelEntry) {		if (DelEntry) {
int rt = Device.deallocTgtPtr(HstPtrBegin, arg_sizes[i], ForceDelete);		int rt = Device.deallocTgtPtr(HstPtrBegin, data_size, ForceDelete);
if (rt != OFFLOAD_SUCCESS) {		if (rt != OFFLOAD_SUCCESS) {
DP("Deallocating data from device failed.\n");		DP("Deallocating data from device failed.\n");
rc = OFFLOAD_FAIL;		rc = OFFLOAD_FAIL;
}		}
}		}
}		}
}		}
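
The padding arithmetic that this patch adds to `target_data_begin` and `target_data_end` can be illustrated in isolation. The following is a minimal sketch rather than the libomptarget code itself: `PaddedRange`, `pad_begin` and `device_address` are hypothetical names introduced here, and the addresses are the ones used in the comment above the `alignment` constant (struct at 0x100, `s1.b` at 0x104, `s1.p` at 0x108, device chunk at 0x200).

```cpp
#include <cstdint>

// Same value as the `alignment` constant introduced by the patch.
static const int64_t alignment = 8;

// Hypothetical type: a host range after its begin address has been padded.
struct PaddedRange {
  uintptr_t begin;  // begin address rounded down to a multiple of `alignment`
  int64_t size;     // original size plus the padding
  int64_t padding;  // number of bytes the begin address was moved down
};

// Hypothetical helper mirroring the patch: move the begin address down to
// the previous multiple of `alignment` and grow the size by the same amount.
static PaddedRange pad_begin(uintptr_t begin, int64_t size) {
  int64_t padding = (int64_t)(begin % (uintptr_t)alignment);
  return PaddedRange{begin - (uintptr_t)padding, size + padding, padding};
}

// Hypothetical helper: where a host address inside the padded range lands on
// the device, given the device address at which the padded chunk starts.
static uintptr_t device_address(uintptr_t tgt_begin, const PaddedRange &r,
                                uintptr_t host_addr) {
  return tgt_begin + (host_addr - r.begin);
}
```

With the example addresses, `pad_begin(0x104, 12)` covers `s1.b` and `s1.p` (4 + 8 bytes) and yields a 4-byte padding, so the chunk spans 16 bytes from 0x100. If the device chunk starts at 0x200, `s1.b` lands at 0x204 and `s1.p` at the 8-aligned 0x208, as the comment describes.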

... (259 further lines not shown) ...