Page MenuHomePhabricator

[OPENMP50]Allow overlapping mapping in target constrcuts.
ClosedPublic

Authored by ABataev on Aug 17 2020, 4:04 PM.

Details

Summary

OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
grokos added a subscriber: grokos.Nov 17 2020, 1:13 PM
grokos added inline comments.
openmp/libomptarget/src/omptarget.cpp
261

What is the current status of the order of the arguments clang emits? Is it still necessary to traverse arguments in reverse order here?

ABataev added inline comments.Nov 17 2020, 1:14 PM
openmp/libomptarget/src/omptarget.cpp
261

Yes, still required

If data mappings can overlap, then it follows that copies to/from the target must be done sequentially by the runtime, unless additional information on their independence exists. Alias analysis style.

I see the programmer convenience angle, but it is a shame to no longer be able to perform the to: mappings simultaneously.

ye-luo added a subscriber: ye-luo.Nov 18 2020, 6:34 AM
  1. Could you separate the reordering related changes to separate patch?
  2. Could you mention which line in spec 4.5 was the restriction? Even 5.0/5.1 has some restrictions. Need to be clear which one you refer to.
  1. Could you separate the reordering related changes to separate patch?

It is impossible. There is a separate patch for removing extra TGT_TARGET_PARAM but it cannot be separated from this patch.

  1. Could you mention which line in spec 4.5 was the restriction? Even 5.0/5.1 has some restrictions. Need to be clear which one you refer to.

Yes, you right, I'm trying to support something like map(alloc: S) map(tofrom: S.f) or map(S, S.ptr[0]) which was not previously allowed and is a feature of OpenMP 5.0.

ye-luo added inline comments.Dec 4 2020, 2:38 PM
openmp/libomptarget/src/omptarget.cpp
261

Based on the conversation in
https://reviews.llvm.org/D85216
This line of code neither before nor after the change plays well.
Shall we fix the order in targetDataEnd first?

ABataev added inline comments.Dec 4 2020, 2:40 PM
openmp/libomptarget/src/omptarget.cpp
261

This change is part of this patch and cannot be committed separately.

ye-luo added inline comments.Dec 4 2020, 2:46 PM
openmp/libomptarget/src/omptarget.cpp
261

I mean could you fix that issue as a parent of this patch?

ye-luo added inline comments.Dec 4 2020, 2:59 PM
openmp/libomptarget/src/omptarget.cpp
261

This change is part of this patch and cannot be committed separately.

If fixing the reordering is part of this patch, I should have seen "target_data_function == targetDataEnd ?" branches disappear.

ABataev added inline comments.Dec 4 2020, 3:01 PM
openmp/libomptarget/src/omptarget.cpp
261

Nope, just with this patch. It reorders the maps and need to change the cleanup order too.

ABataev added inline comments.Dec 4 2020, 3:03 PM
openmp/libomptarget/src/omptarget.cpp
261

It works just like constructors/destructors: allocate in direct order, deallocate in reversed to correctly handle map order.

ye-luo added inline comments.Dec 4 2020, 3:09 PM
openmp/libomptarget/src/omptarget.cpp
261

The description says that "present and alloc mappings are processed first and then all others."
Why the order of arguments in targetDataBegin, targetDataEnd and targetDataUpdate all get reversed.

ABataev added inline comments.Dec 4 2020, 3:13 PM
openmp/libomptarget/src/omptarget.cpp
261

Because this is for mappers. Mapper maps are ordered by the compiler in the direct order (alloc, maps, delete) but when we need to do exit, we need to release the data in reversed order (deletes, maps, allocs).

ye-luo added inline comments.Dec 4 2020, 3:17 PM
openmp/libomptarget/src/omptarget.cpp
261

I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataBegin. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.

ye-luo added inline comments.Dec 4 2020, 3:18 PM
openmp/libomptarget/src/omptarget.cpp
261

I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataBegin. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.

typo correction
I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataEnd. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.

ye-luo added inline comments.Dec 4 2020, 3:34 PM
openmp/libomptarget/src/omptarget.cpp
261

My separate question specifically for targetDataEnd is the following.

In target(), we call

targetDataBegin(args)
{  // forward order
  for (int32_t i = 0; i < arg_num; ++i) { ... }
}
launch_kernels
targetDataEnd(args)
{  // reverse order
  for (int32_t I = ArgNum - 1; I >= 0; --I) { }
}

At a mapper,

targetDataMapper
{
  // generate args_reverse in reverse order for targetDataEnd
  targetDataEnd(args_reverse)
}

Are we actually getting the original forward order due to one reverse in targetDataMapper and second reverse in targetDataEnd? Is this the desired behavior? This part confused me. Do I miss something? Could you explain a bit?

ABataev added inline comments.Dec 4 2020, 4:02 PM
openmp/libomptarget/src/omptarget.cpp
261

Yes, something like this. targetDataEnd reverses the order of mapping arrays. But mapper generator always generates mapping arrays in the direct order (it fills mapping arrays that later processed by the targetDataEnd function). We could fix this by passing extra Boolean flag to the generator function but it means the redesign of the mappers. That's why we have to reverse it in the libomptarget.

ABataev added inline comments.Dec 4 2020, 4:05 PM
openmp/libomptarget/src/omptarget.cpp
261

You can check it yourself. Apply the patch, restore the original behavior in libomptarget and run libomptarget tests. Mapper related tests will crash.

ye-luo added inline comments.Dec 4 2020, 4:26 PM
openmp/libomptarget/src/omptarget.cpp
261

Stick with mapper generator always generating mapping arrays in the direct order. The targetDataMapper reverse the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, mapping

261

Yes, something like this. targetDataEnd reverses the order of mapping arrays. But mapper generator always generates mapping arrays in the direct order (it fills mapping arrays that later processed by the targetDataEnd function). We could fix this by passing extra Boolean flag to the generator function but it means the redesign of the mappers. That's why we have to reverse it in the libomptarget.

Stick with mapper generator always generating mapping arrays in the direct order.

In the targetDataBegin case, targetDataMapper keep direct order args and calls targetDataBegin(args) and targetDataBegin process args in direct order.

In the targetDataEnd case, targetDataMapper reverses the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, args_reverse are processed in reverse order. So targetDataEnd is actually processing the args in original direct order. This seems contradictory to the constructor/deconstructor like behavior that all the mappings must be processed in the actual reverse order in targetDataEnd.

This is my understanding. The current code should be wrong but obviously the current code is working. So why the current code is working? what is inconsistent in my analysis. Could you point out the missing piece.

261

You can check it yourself. Apply the patch, restore the original behavior in libomptarget and run libomptarget tests. Mapper related tests will crash.

For sure without this line, tests would crash and that is why you included this line of change in the patch. Since you made the change, you could explain why, right?

ABataev added inline comments.Dec 4 2020, 4:53 PM
openmp/libomptarget/src/omptarget.cpp
261

I changed and simplified codegen for the mapper generator without changing its interface. I could do this because of the new ordering, before we could not rely on it. But it also requires a change in the runtime.
targetDataEnd calls targetDataMapper and targetDataMapper fills the array in the direct order, but targetDataEnd processes them in the reverse order, but mapper generator does not know about it. It also has to generate the data in the reverse order, just like targetDataEnd does.

Before this patch mapper generator tried to do some ordering but it was not always correct. It was not expecting something like map(alloc:s) map(s.a) because it was not allowed by the compiler. That's why it worked before and won't work with this patch.
PS. The change in the mapper generator is also required and cannot be separated. Without this mappers tests won't work.

ye-luo added inline comments.Dec 7 2020, 9:03 AM
openmp/libomptarget/src/omptarget.cpp
261

I played a bit with your patch.

#pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c)

I put NUM=1024 and NU2M = 2048.
LIBOMPTARGET_DEBUG reports

Libomptarget --> Entry  0: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x0, Name=(null)
Libomptarget --> Entry  1: Base=0x00007fff064080a8, Begin=0x0000000000f9cbd0, Size=4096, Type=0x1000000000012, Name=(null)
Libomptarget --> Entry  2: Base=0x00007fff064080b0, Begin=0x0000000000f86e10, Size=8192, Type=0x1000000000012, Name=(null)
Libomptarget --> Entry  3: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x1000000000008, Name=(null)

Since targetDataEnd internally reverse the processing order, could you confirm that the frontend was emitting entries 3,2,1,0?
I'm wondering if the frontend could emit 3, 0, 1, 2 so the processing order is 2,1,0,3? The spec requires struct element processed before the struct in "target exit data"

ABataev added inline comments.Dec 7 2020, 9:26 AM
openmp/libomptarget/src/omptarget.cpp
261

No, the frontend emits in the order 0, 1, 2, 3. targetDataEnd process in reversed order 3, 2, 1, 0, but the mapper does not know about it and still emits the data in the order 0, 1, 2, 3.
And it is only for mappers!
So, say you have an extra map something like map(a).

map (a) map(mapper(id), tofrom: c)

where mapper for с does something like you wrote.

In this case the order would be 0, 1, 2, 3, 4, where 0 is mapping of a and 1-4 is mapping of c.
When we need to delete the data, the mapper still would generate 1,2,3,4 + 0 for mapping of a, but targetDataEnd expects 4,3,2,1,0. That's why we have to reverse the mapping data, produced by the mapper generator for targetDataEnd.

cchen added a subscriber: cchen.Dec 7 2020, 9:55 AM
ye-luo added inline comments.Dec 7 2020, 10:29 AM
openmp/libomptarget/src/omptarget.cpp
261

Let us leave the mapper case aside which has extra mess.
Double checked that "Libomptarget --> Entry 0" is printed at the __tgt_target_data_end_mapper. So the order is as you said 0, 1, 2, 3 from the frontend. What is the "Entry 0"? more specifically is the difference between entry 0 and 3? Entry 0 seems to be an implicit map while 3 is explicit.

#pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c)
the "map(delete: c)" has some state machine to protect the delete due to ordering.
So I'm wondering why the frontend must issue both 0 and 3. Can the front end fuse 0 and 3?
I mean the frontend generates 3, 1, 2 and the runtime processing 2,1,3 without the deleting issue?

ABataev added a comment.EditedDec 7 2020, 10:37 AM
This comment has been deleted.
openmp/libomptarget/src/omptarget.cpp
261

Entry 0 is the address that should be passed to the kernel (for the captured variable, that's why it is marked as TGT_TARGET_PARAM - target kernel argument), entries 1-3 are the actual mappings. Yes, I assume the frontend can fuse these entries in many cases, but it is different problem that should be addressed in a separate patch.

ye-luo added inline comments.Dec 7 2020, 10:59 AM
openmp/libomptarget/src/omptarget.cpp
261

There is no kernel associated enter/exit data...

Thanks for answering all my puzzles. If you think some of my questions can be better answered by some documentation, please point me if there are any.

I think both the frontend and runtime needs to be further consolidated to reduce side effects in future patches. The current patch in the runtime library part looks good to me for the current need.

Please ping appropriate reviewers for the frontend change, so we can keep this patch moving.

ABataev added inline comments.Dec 7 2020, 11:04 AM
openmp/libomptarget/src/omptarget.cpp
261

No problem!
Yes, there is no kernel, missed it, but still it is the reason. The codegen you see is not directly related to this patch, this is just how the mapping currently works, it is just not quite optimal. Sure, it can be improved in many cases but just like I said it is different problem that should be addressed in separate patch(es). Also, I believe some of the optimizations can be implemented in OpenMPOpt pass.

ABataev added inline comments.Dec 7 2020, 11:13 AM
openmp/libomptarget/src/omptarget.cpp
261

For the documentations, there is design description for the mappers https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx. For the mapping rules, see the comments in generateInfoForComponentList function

cchen added a comment.Dec 7 2020, 11:16 AM

Just tried this patch and found that below test abort inside generateInfoCapture.

#include <stdio.h>
#define LEN 100
int buf[LEN];

int main()
{
    int i;
    int *p = buf;

    for (i = 0; i < LEN; i++) {
        p[i] = 0;
    }

#pragma omp target map(p) map(p[:100])
    {
        *(p+5) = 1;
    }

}

Just tried this patch and found that below test abort inside generateInfoCapture.

#include <stdio.h>
#define LEN 100
int buf[LEN];

int main()
{
    int i;
    int *p = buf;

    for (i = 0; i < LEN; i++) {
        p[i] = 0;
    }

#pragma omp target map(p) map(p[:100])
    {
        *(p+5) = 1;
    }

}

I'll check this, thanks!

ABataev updated this revision to Diff 309992.Dec 7 2020, 12:27 PM

Fix analysis for overlapping pointers.

cchen added a comment.Dec 7 2020, 12:59 PM

Below test asserts inside generateInfoForComponentList:

Assertion failed: (!IsPointer && "Unexpected base element with the pointer type."), function generateInfoForComponentList, file /Users/cchen/workspace/llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp, line 7754.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
#define LEN 100

struct Type {
    int *p;
};

int buf[LEN];

int main()
{
    int i;
    struct Type s;
    s.p = buf;

    for (i = 0; i < LEN; i++) {
        s.p[i] = 0;
    }

#pragma omp target map(s.p) map(s.p[:100])
    {
        *(s.p+5) = 1;
    }
}
ABataev updated this revision to Diff 310024.Dec 7 2020, 2:23 PM

General approach to fix the analysis of pointers.

ABataev updated this revision to Diff 323387.Feb 12 2021, 10:29 AM

Update+rebase, need to update the tests.

ABataev updated this revision to Diff 323415.Feb 12 2021, 11:20 AM

Fixed condition logic.

ABataev updated this revision to Diff 323477.Feb 12 2021, 2:16 PM

Updated tests

Thanks for the changes, Alexey! I tried the patch locally, and it looks stable. It handled several tests I tried, including the following case involving array section on a pointer to pointer base, and nested mappers with PTR_AND_OBJ maps successfully:

#include <stdio.h>

typedef struct { int a; double *b; } C;
#pragma omp declare mapper(id1: C s) map(to:s.a) map(from:s.b[0:2])

typedef struct { int e; C f; int h; short *g; } D;
#pragma omp declare mapper(default: D r) map(from:r.e) map(mapper(id1), tofrom:r.f) map(tofrom: r.g[0:r.h])

int main() {
  constexpr int N = 10;
  D s;
  s.e = 111;
  s.f.a = 222;
  double x[2]; x[1] = 20;
  short y[N]; y[1] = 30;
  s.f.b = &x[0];
  s.g = &y[0];
  s.h = N;

  D* sp = &s;
  D** spp = &sp;

  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 111 222 20.0 <host_addr1> 30 <host_addr2>

  #pragma omp target map(tofrom:spp[0][0])
  {
    printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
    // Expected: <not 111> 222 <not 20.0> <dev_addr1> 30 <dev_addr2>
    spp[0][0].e = 333;
    spp[0][0].f.a = 444;
    spp[0][0].f.b[1] = 40;
    spp[0][0].g[1] = 50;
  }
  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 333 222 40.0 <host_addr1> 50 <host_addr2>
}
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8485–8486

Tabs should probably be spaces. Same for a few other places in the changeset.

9571

IsDelete

9728

Commented-out code intentionally left in?

Thanks for the changes, Alexey! I tried the patch locally, and it looks stable. It handled several tests I tried, including the following case involving array section on a pointer to pointer base, and nested mappers with PTR_AND_OBJ maps successfully:

#include <stdio.h>

typedef struct { int a; double *b; } C;
#pragma omp declare mapper(id1: C s) map(to:s.a) map(from:s.b[0:2])

typedef struct { int e; C f; int h; short *g; } D;
#pragma omp declare mapper(default: D r) map(from:r.e) map(mapper(id1), tofrom:r.f) map(tofrom: r.g[0:r.h])

int main() {
  constexpr int N = 10;
  D s;
  s.e = 111;
  s.f.a = 222;
  double x[2]; x[1] = 20;
  short y[N]; y[1] = 30;
  s.f.b = &x[0];
  s.g = &y[0];
  s.h = N;

  D* sp = &s;
  D** spp = &sp;

  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 111 222 20.0 <host_addr1> 30 <host_addr2>

  #pragma omp target map(tofrom:spp[0][0])
  {
    printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
    // Expected: <not 111> 222 <not 20.0> <dev_addr1> 30 <dev_addr2>
    spp[0][0].e = 333;
    spp[0][0].f.a = 444;
    spp[0][0].f.b[1] = 40;
    spp[0][0].g[1] = 50;
  }
  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 333 222 40.0 <host_addr1> 50 <host_addr2>
}

@ABataev This is a nice complex example, I think it's worth including it in the runtime tests (under libomptarget).

@abhinavgaba Thanks for providing it!

ABataev marked 2 inline comments as done.Feb 16 2021, 5:37 AM

Thanks for the changes, Alexey! I tried the patch locally, and it looks stable. It handled several tests I tried, including the following case involving array section on a pointer to pointer base, and nested mappers with PTR_AND_OBJ maps successfully:

#include <stdio.h>

typedef struct { int a; double *b; } C;
#pragma omp declare mapper(id1: C s) map(to:s.a) map(from:s.b[0:2])

typedef struct { int e; C f; int h; short *g; } D;
#pragma omp declare mapper(default: D r) map(from:r.e) map(mapper(id1), tofrom:r.f) map(tofrom: r.g[0:r.h])

int main() {
  constexpr int N = 10;
  D s;
  s.e = 111;
  s.f.a = 222;
  double x[2]; x[1] = 20;
  short y[N]; y[1] = 30;
  s.f.b = &x[0];
  s.g = &y[0];
  s.h = N;

  D* sp = &s;
  D** spp = &sp;

  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 111 222 20.0 <host_addr1> 30 <host_addr2>

  #pragma omp target map(tofrom:spp[0][0])
  {
    printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
    // Expected: <not 111> 222 <not 20.0> <dev_addr1> 30 <dev_addr2>
    spp[0][0].e = 333;
    spp[0][0].f.a = 444;
    spp[0][0].f.b[1] = 40;
    spp[0][0].g[1] = 50;
  }
  printf("%d %d %lf %p %d %p\n", spp[0][0].e, spp[0][0].f.a, spp[0][0].f.b[1], spp[0][0].f.b, spp[0][0].g[1], spp[0][0].g);
  // Expected: 333 222 40.0 <host_addr1> 50 <host_addr2>
}

@ABataev This is a nice complex example, I think it's worth including it in the runtime tests (under libomptarget).

@abhinavgaba Thanks for providing it!

Ok, will add it as a part of the patch

clang/lib/CodeGen/CGOpenMPRuntime.cpp
8485–8486

These are not tabs. Looks like this is how Phabricators shows some format changes.

9728

Yeah, forgot to remove it, thanks.

ABataev updated this revision to Diff 323998.Feb 16 2021, 6:56 AM
ABataev marked an inline comment as done.

Fixes, added a test

abhinavgaba accepted this revision.Feb 16 2021, 11:09 AM

Thanks for the changes, Alexey.

This revision is now accepted and ready to land.Feb 16 2021, 11:09 AM
This revision was automatically updated to reflect the committed changes.