This implements OpenMP runtime support for the OpenMP TR8 present
map type modifier. The previous patch in this series implements Clang
front end support. See that patch summary for behaviors that are not
yet supported.
Details
Diff Detail
Event Timeline
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | I think this else-if should be moved right after the else-if that checks for explicit extension of mapping (i.e. after line 194) and outside the else if (Size) branch. If we have the present modifier then the data must be mapped already no matter whether a size is specified. | |
openmp/libomptarget/src/omptarget.cpp | ||
266–281 | Correct. The only case we can have a PTR_AND_OBJ entry is if we have a pointer which is member of a struct or element of an array. This means that whenever we have something like: #pragma omp target map(s.p[0:N]) clang will first map the struct/array and then we'll visit the PTR_AND_OBJ entry, so the space for the pointer itself has been allocated already. I think it's safe to skip this check here (or you can leave it as a form of defensive programming). |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
418 | Can we print the error in non debug mode as well? Is there a way we can provide more information on the location and variable? |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | In the case of Size == 0, as far as I can tell, nothing is ever placed in HostDataToTargetMap, with or without the patch under review. Instead, getOrAllocTgtPtr always handles this case as already mapped and returns NULL for it. I believe your suggested change would not affect this behavior. However, when HasPresentModifier, your suggested change would cause the Mapping required but does not exist debug message to be printed for this case, incorrectly implying that this case is never already mapped and that a runtime error should follow. A runtime error won't follow because Size == 0. In the case of Size != 0, your suggested change would affect unified shared memory. As far as I can tell, unified shared memory is handled similarly to the case of Size == 0: it's never placed in HostDataToTargetMap, and getOrAllocTgtPtr always handles it as already mapped. However, when HasPresentModifier, your suggested change would cause getOrAllocTgtPtr to always handle unified shared memory as unmapped, and a runtime error will follow, so unified shared memory is then unusable with the present modifier. In both cases, your suggested change seems to produce the wrong behavior. Do you agree? I need to add tests to cover these cases. | |
openmp/libomptarget/src/omptarget.cpp | ||
266–281 | Thanks for confirming. I'll leave it but adjust the comment not to be a todo. | |
409–422 | This should be TgtPtrBegin || !data_size, mirroring similar code in target_data_begin. | |
418 | I would prefer that too. When I investigated this originally, I didn't find a good precedent in this runtime implementation, so I decided to be conservative. Is there an existing diagnostic I should model this after, or is this a unique case? |
Added tests for cases of size-zero mappings and unified shared memory.
Fixed handling of size-zero mapping in target_data_end.
Converted TODO into a normal comment.
openmp/libomptarget/src/device.h | ||
---|---|---|
183 | I see no remaining getOrAllocTgtPtr calls that make use of these default arguments. Any objection to removing them in this patch? |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 |
This is not always true. We can have zero size and still get a valid device address. Consider the example below: int *p = malloc(N); #pragma omp target enter data map(to: p[0:N]) ... #pragma omp target { p[0] = 1; } Here p[0:N] is mapped via enter data and address &p[0] is inserted into HostDataToTargetMap. When we encounter the target region, the compiler captures pointer p implicitly as it appears inside the target region, however the compiler doesn't know its size (it's a pointer, not a named array), therefore it will be mapped as p[0:0], i.e. as a zero-size pointer. getOrAllocTgtPtr will find address &p[0] in HostDataToTargetMap (lr.isContained) and return the corresponding device address. If we have demanded that p[0:0] is present on the device, then libomptarget must check for its presence even if size is 0. So if you invoke the target region like this: #pragma omp target map(present, alloc : p[0:0]) then libomptarget must either:
The latter case is not covered by this version of the patch. Regarding unified shared memory you're right. So my proposed change should be: if (contained) { ... } else if (explicit extension) { ... } else if (HasPresentModifier && (!USM || (USM && Close) ) { error out } else if (Size) { ... } | |
openmp/libomptarget/src/omptarget.cpp | ||
418 | Unfortunately, libomptarget does not have access to source code location. __tgt_ API functions do not have a related argument, like the ident_t *loc argument of __kmpc_ API functions in libomp. Of course, if there is demand for such a feature we can implement it, although we would need to change *EVERY* API function in libomptarget. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | Thanks for explaining. What is the correct behavior in the following case? #pragma omp target data map(alloc : p[0:0]) #pragma omp target data map(present, alloc : p[0:0]) |
Changed present modifier to fail for zero-length array section if the base address is not already mapped, and updated tests.
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 |
My latest update assumes the second directive should produce a runtime error because the first directive doesn't actually map anything. Please let me know if that's not right.
I organized this a little differently to avoid repeating the USM conditions, but I intend it to be equivalent. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 |
For the first one, if the object p is pointing to has been mapped before then there is no noticeable effect, the only thing that happens is that getOrAllocTgtPtr increments the RefCount of the mapping and returns the corresponding device address. If the object hasn't been mapped before and because size is zero, getOrAllocTgtPtr will return NULL (there is nothing to be allocated). With the present modifier, if the object is already mapped, getOrAllocTgtPtr increments the RefCount and returns the corresponding device address (just like in the former case). If the object is not mapped, the present modifier will trigger an error - in other words NULL is not an option if we have requested something to be present. | |
openmp/libomptarget/src/device.h | ||
183 | Sure, they can go. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | Sorry, I meant that these directives are nested and the array was not previously mapped. It sounds like the outer directive does not map it in that case, and so the inner directive produces a runtime error. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | Correct. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
209–212 | Thanks for all the careful explanations! I believe the logic is now correct for zero-length array sections. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
418 |
That would be nice, but in a later patch. For this patch, perhaps just a simple message clarifying that the error is due to the present modifier would be sufficient. Should we define another macro next to FATAL_MESSAGE to print messages to stderr without exiting? The exit happens later when FATAL_MESSAGE is actually called. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
418 | That would be good, such a message is missing from other places in libomptarget as well (e.g. when we explicitly try to extend an existing mapping). Currently such messages are only printed in debug output. It's important for the user to know why offload failed even if libomptarget is running in release mode. |
I added stderr messages for present violations, as discussed. I've named the new macro MESSAGE to contrast with FATAL_MESSAGE, but I'm open to suggestions.
I adjusted the new tests to check for these new messages instead of the LIBOMPTARGET_DEBUG=1 output.
Thanks for the review!
The Clang part of this (D83061) must land first, so I'll wait for that review.
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
416 | This is right for omp target exit data, as in openmp/libomptarget/test/mapping/present/target_exit_data.c. I think it's wrong for omp target data. My read of TR8 is that the present assertion does not happen on exit from a region. For example: #pragma omp target data map(alloc:x) { #pragma omp target data map(present,alloc:x) { #pragma omp target exit data map(delete:x) } // fails here but shouldn't } |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | This comment is incorrect. We can also have PTR_AND_OBJ in the case of omp declare target link as discussed at http://lists.llvm.org/pipermail/openmp-dev/2020-July/003586.html. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | Right, the comment is not relevant anymore. But I think the conveyed message is still correct. Apart from structs/arrays, if we have a PTR_AND_OBJ for a global pointer then by definition the pointer itself has been mapped (since it's a global). Maybe update the comment? The logic is still correct I think. |
openmp/libomptarget/src/device.cpp | ||
---|---|---|
215 | The \n in each of my MESSAGE calls should be dropped. MESSAGE already adds that. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | Yes, I believe the code is right. I'll try to generalize the comment. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | I was just reminded that globals marked with declare target link are also PTR_AND_OBJ entries. declare target link already works in the exact same way as the proposed change we discussed at the July 15 telecon. |
Rebased onto latest D83061. Updated tests to use -fopenmp-version=51.
Removed extra newlines from MESSAGE calls.
Tried to fix PTR_AND_OBJ comment.
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | If we're so sure it's impossible to have !Pointer_TgtPtrBegin here, should we have an assert? That way, if the situation evolves, testing will hopefully reveal when our assumptions here are wrong. Based on my understanding now, it would probably indicate a bug somewhere. But I guess it might just mean the assert then needs to be removed and the comment needs to be updated. Either way, it will be good to know about the change. My understanding is that assert would only be active for debug builds. For release builds, we could still have the if and return OFFLOAD_FAIL for the sake of defensive programming. | |
416 | In order to fix this, it looks like I could modify Clang to pass a new argument to __tgt_target_data_end to indicate whether it's just the end of a region (as in omp target exit data) or both the start and end of a region (as in omp target data). Before I work on that, can somewhere confirm my understanding of TR8 behavior, described in the previous comment? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | The assert would currently fail for the following: int *x; #pragma omp declare target link(x) int main() { #pragma omp target map(present, tofrom: x[0:3]) ; return 0; } |
Tried to fix comments on PTR_AND_OBJ.
Added fixme for case of present at exit from omp target data.
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
268 | I didn't add the assert. I updated the comments to describe the situation as I understand it. Please advise if it's not right. | |
416 |
I reversed "omp target exit data" and "omp target data" there. Otherwise, I believe that comment is correct. For now, I've decided to add a fixme and handle this in a later patch after pushing this series. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
417 | It looks like DPxMOD and DPxPTR are not defined for release builds, and so bots are failing. I had only tested debug builds. Here's an excerpt of the relevant code from omptarget.h: #ifdef OMPTARGET_DEBUG #include <stdio.h> #define DEBUGP(prefix, ...) \ { \ fprintf(stderr, "%s --> ", prefix); \ fprintf(stderr, __VA_ARGS__); \ } #ifndef __STDC_FORMAT_MACROS #define __STDC_FORMAT_MACROS #endif #include <inttypes.h> #define DPxMOD "0x%0*" PRIxPTR #define DPxPTR(ptr) ((int)(2*sizeof(uintptr_t))), ((uintptr_t) (ptr)) Any objection to moving relevant parts before the #ifdef OMPTARGET_DEBUG? Or is there a better solution? |
I see no remaining getOrAllocTgtPtr calls that make use of these default arguments. Any objection to removing them in this patch?