This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Avoid checking parent reference count in targetDataBegin
ClosedPublic

Authored by jdenny on Jun 29 2021, 7:48 AM.

Details

Summary

This patch is an attempt to do for targetDataBegin what D104924 does
for targetDataEnd:

  • Eliminates a lock/unlock of the data mapping table.
  • Clarifies the logic that determines whether a struct member's host-to-device transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at https://reviews.llvm.org/D104924#2846972).

Additionally, it eliminates the DeviceTy::getMapEntryRefCnt, which
is no longer used after this patch.

While D104924 does not change the computation of IsLast, I found I
needed to change the computation of IsNew for this patch. As far as
I can tell, the change is correct, and this patch does not cause any
additional openmp tests to fail. However, I'm not sure I've thought
of all use cases. Please advise.

Diff Detail

Event Timeline

jdenny created this revision.Jun 29 2021, 7:48 AM
jdenny requested review of this revision.Jun 29 2021, 7:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2021, 7:48 AM
Herald added a subscriber: sstefan1. · View Herald Transcript

Can you upload this with context? We should also run it through all the offload apps we have in FAROS.

jdenny updated this revision to Diff 355248.Jun 29 2021, 8:03 AM

Add context to diff. Sorry!

We should also run it through all the offload apps we have in FAROS.

I've not tried FAROS before. If you're requesting that I do this, please give me a pointer, and I'll be glad to investigate.

We should also run it through all the offload apps we have in FAROS.

I've not tried FAROS before. If you're requesting that I do this, please give me a pointer, and I'll be glad to investigate.

FAROS is just a suggestion as it is a reasonable way to run some of the offload apps we have:
https://github.com/LLNL/FAROS/tree/features/openmp-offload

It basically should allow you to build multiple proxy apps with ease. They are also run but you might need to do something like make test for miniQMC manually.

We should also run it through all the offload apps we have in FAROS.

I've not tried FAROS before. If you're requesting that I do this, please give me a pointer, and I'll be glad to investigate.

You can also build the tests manually if you find that easier. Right now FAROS only tests XSBench, RSBench, miniFE, and miniQMC for its offloading checks.

I think, we need more tests challenging the reference counting for mapping rules to make sure that the code change is valid.

Also, is the expectation that the implementation respects the 5.2 clarifications/changes regarding reference counting of struct members (probably issue #1909)?

openmp/libomptarget/src/omptarget.cpp
541–542

What is the meaning of these flags?

In code like the following, the target region should not copy any data, because S is already mapped in the enter data

struct{int a; int b;}s_s S;
#pragma omp target enter data map(to:S)
#pragma omp target map(tofrom:S.b)
{...}
#pragma omp target exit data map(from:S)

@jdoerfert, @jhuber6: Thanks for the pointers on FAROS. I'll look into it.

I think, we need more tests challenging the reference counting for mapping rules to make sure that the code change is valid.

I'm happy to add tests. My problem is figuring out the relevant use cases. Normally I'd add/update tests related to the behavior I'm changing, but I'm not expecting behavior to change (except possibly performance due to the elimination of a lock). Perhaps FAROS will uncover something.

Also, is the expectation that the implementation respects the 5.2 clarifications/changes regarding reference counting of struct members (probably issue #1909)?

I didn't intend this patch to be related to 5.2. As far as I can tell, openmp spec issue 1909 and its pr 1990 are about attempts to map a range that includes multiple previous mappings from other directives. I'm not aware of any way this patch affects that issue.

openmp/libomptarget/src/omptarget.cpp
541–542

I've verified that, with this patch, your example here works as expected.

I could have left in the else if here and simply replaced its body with copy = IsNew. That would make this code look more like the corresponding code in targetDataEnd in D104924. However, the else if would have had no effect then: If IsNew==true, then the if is taken either way. If IsNew==false, then copy = IsNew = false, but that doesn't change the value of copy.

The general meaning of the flags is documented in openmp/libomptarget/include/omptarget.h. I assume it's not necessary to check OMP_TGT_MAPTYPE_MEMBER_OF anymore because the point of this patch is that we no longer check the parent: IsNew seems sufficient whether or not there's a parent. I assume it's not necessary to check OMP_TGT_MAPTYPE_PTR_AND_OBJ anymore because 5adb3a6d86ee says the issue there is the object has its own reference count. Again, that's the point of this patch: we check the the object's reference count not the parent's.

But I really need someone who understands the history of this code to verify my reasoning.

We should also run it through all the offload apps we have in FAROS.

I've not tried FAROS before. If you're requesting that I do this, please give me a pointer, and I'll be glad to investigate.

You can also build the tests manually if you find that easier. Right now FAROS only tests XSBench, RSBench, miniFE, and miniQMC for its offloading checks.

I ran all four of those using FAROS, and I ran make test within miniQMC. I saw no failures, either before or after applying this patch plus D104924.

However, only XSBench and RSBench report on result verification, as far as I can tell. If there's a simple way to verify miniFE and miniQMC results (other than miniQMC's make test), let me know.

tianshilei1992 added inline comments.
openmp/libomptarget/src/omptarget.cpp
541–542

What is the meaning of these flags?

There is a "full" (probably not) list of the cases for data mapping in the function generateInfoForComponentList in clang/lib/CodeGen/CGOpenMPRuntime.cpp.

jdenny added inline comments.Jul 1 2021, 3:08 PM
openmp/libomptarget/src/omptarget.cpp
541–542

Thanks for mentioning that. At least some of that documentation appears to be out of date. For example, I just tried the map(to: s.ps->ps) case, and clang generated different map flags. But it's still a good list of cases.

jdenny added a comment.Jul 9 2021, 7:09 AM

Gentle ping. I'm not sure how to proceed on this and D104924. My understanding is that Intel (@grokos I think) may be testing internally. Just let me know if there's something I need to do in the meantime to move this forward. Thanks.

grokos added a comment.Jul 9 2021, 8:39 AM

Just like in D104924, the code after this patch is equivalent to the original one with one exception. The question was raised at our telecon a couple of weeks ago but no one was sure what the right answer is:

What happens if a struct has a pointer member, we map the whole struct and later on we use the pointer member to map some attached object?

struct S {
  int i;
  double d;
  float *p;
} s1;
...
s1.i = 1;
s1.d = 2.0;
s1.p = new float[10];
#pragma omp enter data map(to:s1) // whole struct is mapped but the object pointed to by s1.p is not
...
#pragma omp target map(s1.p[0:10]) // the attached object here has RefCount=1, but parent struct will have RefCount=2
{...}

How should we handle this case? Does anyone know for sure? Do we copy s1.p to the device as its RefCount=1 or are we ignoring the motion clause because the parent struct's RefCount != 1?

If the answer is "we copy it", then we can commit both this patch and D104924. If the answer is "we need to look at the parent struct's RefCount" then both patches will have to be abandoned.

grokos accepted this revision.Jul 9 2021, 8:54 AM

Thinking about it a bit more, no matter what the answer to my last question is, even the existing version of the library (without D105121 and D104924) will copy the attached object to the device. So these patches result in logically equivalent behavior. @jdenny I think you can proceed with committing them as patches that simplify the code. If we need to take extra care of the corner case in the above example, then we can prepare a new patch that fixes the bug.

This revision is now accepted and ready to land.Jul 9 2021, 8:54 AM
jdenny added a comment.Jul 9 2021, 9:15 AM

Thinking about it a bit more, no matter what the answer to my last question is, even the existing version of the library (without D105121 and D104924) will copy the attached object to the device. So these patches result in logically equivalent behavior.

That's my understanding as well: because IsNew=true for s1.p[0:10] in your example even without D105121 and D104924, the parent reference count wouldn't be checked anyway.

@RaviNarayanaswamy expressed some concern about a similar case with Fortran allocatables, but I don't know what map flags would produce the desired behavior then. I believe the current feeling is that case can be checked after my patches land.

@jdenny I think you can proceed with committing them as patches that simplify the code. If we need to take extra care of the corner case in the above example, then we can prepare a new patch that fixes the bug.

Thanks for the review. I'll try to land them today.

#pragma omp target map(s1.p[0:10])

Memory and data copy will happened for p[0:10] and the device copy of s1 will be updated with the device pointer to p

jdenny added a comment.Jul 9 2021, 1:16 PM

#pragma omp target map(s1.p[0:10])

Memory and data copy will happened for p[0:10] and the device copy of s1 will be updated with the device pointer to p

Are you concerned about a misbehavior here?

Do you want me to wait to push until some of the subscribers you've added have a chance to review?

jdenny updated this revision to Diff 357620.Jul 9 2021, 1:37 PM

Rebased.

Are you concerned about a misbehavior here?

I am just answering the question George asked about the expected behavior.

grokos added a comment.Jul 9 2021, 3:40 PM

So the expected behavior is what libomptarget currently does. These 2 patches do not change it, so they are good to go.

If the current behavior is only the pointer field of the struct and not other members of the struct are updated after the array is allocated and copied.

grokos added a comment.Jul 9 2021, 3:57 PM

Right, only the pointer member is updated, no other fields are modified.