OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
- Generalized interface for all mapping operations.
- Fixed compatibility of declare mapper and other mapping operations with OpenMP 5.0 examples.
- Simplified declare mapper codegen.
Rebase.
This patch is an important part of 'default' mappers lookup for data members.
Thanks for working on this. Sorry to take so long to review. Before I try to digest the code, I have a few high-level questions.
Based on the test suite changes, TARGET_PARAM is disappearing from many cases. Can you explain a bit how that supports overlapping and reordering?
Have you considered issue 2337 for the OpenMP spec and how your implementation handles the ambiguous cases cited there?
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
7283 | Unrelated change? |
TARGET_PARAM is used only for marking the data that should be passed to the kernel function as an argument. We just generate it in many cases but the runtime actually does not use them. Thу patch relies on this thing, otherwise, we may pass an incorrect number of arguments to the kernel functions.
Have you considered issue 2337 for the OpenMP spec and how your implementation handles the ambiguous cases cited there?
Can you provide the details about this issue?
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
7283 | Well, in general, yes, I just tried to simplify the code, can commit it as a separate NFC. |
Is it reasonable to extract that change into a parent patch? That would at least make the test suite changes easier to follow.
Have you considered issue 2337 for the OpenMP spec and how your implementation handles the ambiguous cases cited there?
Can you provide the details about this issue?
It discusses ambiguities in the way the spec describes map clause ordering. Here's one example relevant to present:
#pragma omp target exit data map(from: x) map(present, release: x)
One statement in the spec says a map clause with from is effectively ordered before one with release. Another statement says a map clause with present is effectively ordered before one without. Which applies in the above example? In some discussions, people agreed the statement about present was intended to have precedence, but the spec doesn't say that. That resolution probably makes sense at entry to a region, but does it make sense at exit? Would it suppress from in this example? (Actually, should there be two reference counter decrements in this example or just one?)
These ambiguities are the reason I punted on this issue when implementing present. If we can come up with a reasonable implementation for all cases, perhaps we can propose a new wording for the spec.
I'll see what I can do about it.
Have you considered issue 2337 for the OpenMP spec and how your implementation handles the ambiguous cases cited there?
Can you provide the details about this issue?
It discusses ambiguities in the way the spec describes map clause ordering. Here's one example relevant to present:
#pragma omp target exit data map(from: x) map(present, release: x)One statement in the spec says a map clause with from is effectively ordered before one with release. Another statement says a map clause with present is effectively ordered before one without. Which applies in the above example? In some discussions, people agreed the statement about present was intended to have precedence, but the spec doesn't say that. That resolution probably makes sense at entry to a region, but does it make sense at exit? Would it suppress from in this example? (Actually, should there be two reference counter decrements in this example or just one?)
These ambiguities are the reason I punted on this issue when implementing present. If we can come up with a reasonable implementation for all cases, perhaps we can propose a new wording for the spec.
In tgis implementation, the mapping withthe present modifier will be ordered to be the first.
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | What is the current status of the order of the arguments clang emits? Is it still necessary to traverse arguments in reverse order here? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Yes, still required |
If data mappings can overlap, then it follows that copies to/from the target must be done sequentially by the runtime, unless additional information on their independence exists. Alias analysis style.
I see the programmer convenience angle, but it is a shame to no longer be able to perform the to: mappings simultaneously.
- Could you separate the reordering related changes to separate patch?
- Could you mention which line in spec 4.5 was the restriction? Even 5.0/5.1 has some restrictions. Need to be clear which one you refer to.
It is impossible. There is a separate patch for removing extra TGT_TARGET_PARAM but it cannot be separated from this patch.
- Could you mention which line in spec 4.5 was the restriction? Even 5.0/5.1 has some restrictions. Need to be clear which one you refer to.
Yes, you right, I'm trying to support something like map(alloc: S) map(tofrom: S.f) or map(S, S.ptr[0]) which was not previously allowed and is a feature of OpenMP 5.0.
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Based on the conversation in |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | This change is part of this patch and cannot be committed separately. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | I mean could you fix that issue as a parent of this patch? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 |
If fixing the reordering is part of this patch, I should have seen "target_data_function == targetDataEnd ?" branches disappear. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Nope, just with this patch. It reorders the maps and need to change the cleanup order too. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | It works just like constructors/destructors: allocate in direct order, deallocate in reversed to correctly handle map order. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | The description says that "present and alloc mappings are processed first and then all others." |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Because this is for mappers. Mapper maps are ordered by the compiler in the direct order (alloc, maps, delete) but when we need to do exit, we need to release the data in reversed order (deletes, maps, allocs). |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataBegin. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 |
typo correction |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | My separate question specifically for targetDataEnd is the following. In target(), we call targetDataBegin(args) { // forward order for (int32_t i = 0; i < arg_num; ++i) { ... } } launch_kernels targetDataEnd(args) { // reverse order for (int32_t I = ArgNum - 1; I >= 0; --I) { } } At a mapper, targetDataMapper { // generate args_reverse in reverse order for targetDataEnd targetDataEnd(args_reverse) } Are we actually getting the original forward order due to one reverse in targetDataMapper and second reverse in targetDataEnd? Is this the desired behavior? This part confused me. Do I miss something? Could you explain a bit? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Yes, something like this. targetDataEnd reverses the order of mapping arrays. But mapper generator always generates mapping arrays in the direct order (it fills mapping arrays that later processed by the targetDataEnd function). We could fix this by passing extra Boolean flag to the generator function but it means the redesign of the mappers. That's why we have to reverse it in the libomptarget. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | You can check it yourself. Apply the patch, restore the original behavior in libomptarget and run libomptarget tests. Mapper related tests will crash. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Stick with mapper generator always generating mapping arrays in the direct order. The targetDataMapper reverse the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, mapping | |
233 |
Stick with mapper generator always generating mapping arrays in the direct order. In the targetDataBegin case, targetDataMapper keep direct order args and calls targetDataBegin(args) and targetDataBegin process args in direct order. In the targetDataEnd case, targetDataMapper reverses the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, args_reverse are processed in reverse order. So targetDataEnd is actually processing the args in original direct order. This seems contradictory to the constructor/deconstructor like behavior that all the mappings must be processed in the actual reverse order in targetDataEnd. This is my understanding. The current code should be wrong but obviously the current code is working. So why the current code is working? what is inconsistent in my analysis. Could you point out the missing piece. | |
233 |
For sure without this line, tests would crash and that is why you included this line of change in the patch. Since you made the change, you could explain why, right? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | I changed and simplified codegen for the mapper generator without changing its interface. I could do this because of the new ordering, before we could not rely on it. But it also requires a change in the runtime. Before this patch mapper generator tried to do some ordering but it was not always correct. It was not expecting something like map(alloc:s) map(s.a) because it was not allowed by the compiler. That's why it worked before and won't work with this patch. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | I played a bit with your patch. #pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c) I put NUM=1024 and NU2M = 2048. Libomptarget --> Entry 0: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x0, Name=(null) Libomptarget --> Entry 1: Base=0x00007fff064080a8, Begin=0x0000000000f9cbd0, Size=4096, Type=0x1000000000012, Name=(null) Libomptarget --> Entry 2: Base=0x00007fff064080b0, Begin=0x0000000000f86e10, Size=8192, Type=0x1000000000012, Name=(null) Libomptarget --> Entry 3: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x1000000000008, Name=(null) Since targetDataEnd internally reverse the processing order, could you confirm that the frontend was emitting entries 3,2,1,0? |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | No, the frontend emits in the order 0, 1, 2, 3. targetDataEnd process in reversed order 3, 2, 1, 0, but the mapper does not know about it and still emits the data in the order 0, 1, 2, 3. map (a) map(mapper(id), tofrom: c) where mapper for с does something like you wrote. In this case the order would be 0, 1, 2, 3, 4, where 0 is mapping of a and 1-4 is mapping of c. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Let us leave the mapper case aside which has extra mess. #pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c) |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | Entry 0 is the address that should be passed to the kernel (for the captured variable, that's why it is marked as TGT_TARGET_PARAM - target kernel argument), entries 1-3 are the actual mappings. Yes, I assume the frontend can fuse these entries in many cases, but it is different problem that should be addressed in a separate patch. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | There is no kernel associated enter/exit data... Thanks for answering all my puzzles. If you think some of my questions can be better answered by some documentation, please point me if there are any. I think both the frontend and runtime needs to be further consolidated to reduce side effects in future patches. The current patch in the runtime library part looks good to me for the current need. Please ping appropriate reviewers for the frontend change, so we can keep this patch moving. |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | No problem! |
openmp/libomptarget/src/omptarget.cpp | ||
---|---|---|
233 | For the documentations, there is design description for the mappers https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx. For the mapping rules, see the comments in generateInfoForComponentList function |
Just tried this patch and found that below test abort inside generateInfoCapture.
#include <stdio.h> #define LEN 100 int buf[LEN]; int main() { int i; int *p = buf; for (i = 0; i < LEN; i++) { p[i] = 0; } #pragma omp target map(p) map(p[:100]) { *(p+5) = 1; } }
Below test asserts inside generateInfoForComponentList:
Assertion failed: (!IsPointer && "Unexpected base element with the pointer type."), function generateInfoForComponentList, file /Users/cchen/workspace/llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp, line 7754. PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
#define LEN 100 struct Type { int *p; }; int buf[LEN]; int main() { int i; struct Type s; s.p = buf; for (i = 0; i < LEN; i++) { s.p[i] = 0; } #pragma omp target map(s.p) map(s.p[:100]) { *(s.p+5) = 1; } }
Thanks for the changes, Alexey! I tried the patch locally, and it looks stable. It handled several tests I tried, including the following case involving array section on a pointer to pointer base, and nested mappers with PTR_AND_OBJ maps successfully:
#include <stdio.h> typedef struct { int a; double *b; } C; #pragma omp declare mapper(id1: C s) map(to:s.a) map(from:s.b[0:2]) typedef struct { int e; C f; int h; short *g; } D; #pragma omp declare mapper(default: D r) map(from:r.e) map(mapper(id1), tofrom:r.f) map(tofrom: r.g[0:r.h]) int main() { constexpr int N = 10; D s; s.e = 111; s.f.a = 222; double x[2]; x[1] = 20; short y[N]; y[1] = 30; s.f.b = &x[0]; s.g = &y[0]; s.h = N; D* sp = &s; D** spp = &sp; printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g); // Expected: 111 222 20.0 <host_addr1> 30 <host_addr2> #pragma omp target map(tofrom:spp[0][0]) { printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g); // Expected: <not 111> 222 <not 20.0> <dev_addr1> 30 <dev_addr2> spp[0][0].e = 333; spp[0][0].f.a = 444; spp[0][0].f.b[1] = 40; spp[0][0].g[1] = 50; } printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g); // Expected: 333 222 40.0 <host_addr1> 50 <host_addr2> }
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
8194–8195 | Tabs should probably be spaces. Same for a few other places in the changeset. | |
9084–9085 | IsDelete | |
9239 | Commented-out code intentionally left in? |
@ABataev This is a nice complex example, I think it's worth including it in the runtime tests (under libomptarget).
@abhinavgaba Thanks for providing it!
Unrelated change?