This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix `present` for exit from `omp target data`
ClosedPublic

Authored by jdenny on Jul 23 2020, 9:06 AM.

Details

Summary

Without this patch, the following example fails but shouldn't
according to OpenMP TR8:

#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
  #pragma omp target exit data map(delete:i)
} // fails presence check here

OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:

If the map clause appears on a target, target data, target enter
data or target exit data construct with a present map-type-modifier
then on entry to the region if the corresponding list item does not
appear in the device data environment an error occurs and the
program terminates.

There is no corresponding statement about the exit from a region.
Thus, the present modifier should:

  1. Check for presence upon entry into any region, including a target exit data region. This behavior is already implemented correctly.
  1. Should not check for presence upon exit from any region, including a target or target data region. Without this patch, this behavior is not implemented correctly, breaking the above example.

In the case of target data, this patch fixes the latter behavior by
removing the present modifier from the map types Clang generates for
the runtime call at the end of the region.

In the case of target, we have not found a valid OpenMP program for
which such a fix would matter. It appears that, if a program can
guarantee that data is present at the beginning of a target region
so that there's no error there, that data is also guaranteed to be
present at the end.

Diff Detail

Event Timeline

jdenny created this revision.Jul 23 2020, 9:06 AM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptJul 23 2020, 9:06 AM

What confuses me about this interpretation of the standard is the inconsistency at data exit. So if we have an explicit omp target exit data map(present...) then we should respect the "present" semantics, whereas when we have a scoped data exit:

#pragma omp target data map(present,...)
{
  ...
} // implicit "exit data" here

then "present" should be ignored.

I agree that the paragraph from the standard leaves little room for other interpretations, I'd just like to point out that it looks inconsistent - at least to me.

I don't know if the OpenMP committee has any documented rationale for this behavior. I can say that the OpenACC committee is considering the same semantics. However, the issues to consider are not identical. For example, OpenACC has a separate structured reference counter, meaning it should be impossible for such data not to be present at the exit of a data construct unless you've shut down the runtime.

What confuses me about this interpretation of the standard is the inconsistency at data exit. So if we have an explicit omp target exit data map(present...) then we should respect the "present" semantics, whereas when we have a scoped data exit:

#pragma omp target data map(present,...)
{
  ...
} // implicit "exit data" here

then "present" should be ignored.

I agree that the paragraph from the standard leaves little room for other interpretations, I'd just like to point out that it looks inconsistent - at least to me.

When you use present on a variable on a scoped target data region, you cannot delete that object in the scope. I would say this is a test case error. It should still be present on exit, checking for that is maybe redundant

grokos added a comment.EditedJul 23 2020, 1:49 PM

So is the test case that motivated this patch illegal OpenMP code?

#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
  #pragma omp target exit data map(delete:i) // you cannot delete that object in the scope, illegal code?
} // fails presence check here
RaviNarayanaswamy added a comment.EditedJul 23 2020, 2:07 PM

So is the test case that motivated this patch illegal OpenMP code?

#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
  #pragma omp target exit data map(delete:i) // you cannot delete that object in the scope, illegal code?
} // fails presence check here

According to spec the test should work. ie should not check for presence on exit from a blocked openmp pragma scope.

So is the test case that motivated this patch illegal OpenMP code?

#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
  #pragma omp target exit data map(delete:i) // you cannot delete that object in the scope, illegal code?
} // fails presence check here

According to spec the test should work. ie should not check for presence on exit from a blocked openmp pragma scope.

It sounds like this patch's motivation is correct then. Has anyone clarified the motivation for this behavior?

So let's proceed with the patch.

Instead of introducing new API functions and making all these changes in all these files, wouldn't it be easier if we just unset the PRESENT flag from arg_types in clang when we generate the call to __tgt_target_data_end_* if we are exiting from a scoped environment?

Has anyone clarified the motivation for this behavior?

I meant, is there any insight into why the spec specifies this behavior?

Instead of introducing new API functions and making all these changes in all these files, wouldn't it be easier if we just unset the PRESENT flag from arg_types in clang when we generate the call to __tgt_target_data_end_* if we are exiting from a scoped environment?

Ah, that does sound simpler. Thanks. I'll look into it.

Suppressing the presence check on exit from omp target would require a runtime change in addition to the Clang change you suggest for omp target data. However, I've so far failed to formulate a reasonable test case. Specifically, I don't yet see a way to guarantee that the data will definitely be present at the start of omp target but might not be present by the end. Is it possible? If not, then maybe we should leave the check in place for omp target.

Has anyone clarified the motivation for this behavior?

I meant, is there any insight into why the spec specifies this behavior?

Instead of introducing new API functions and making all these changes in all these files, wouldn't it be easier if we just unset the PRESENT flag from arg_types in clang when we generate the call to __tgt_target_data_end_* if we are exiting from a scoped environment?

Ah, that does sound simpler. Thanks. I'll look into it.

Suppressing the presence check on exit from omp target would require a runtime change in addition to the Clang change you suggest for omp target data. However, I've so far failed to formulate a reasonable test case. Specifically, I don't yet see a way to guarantee that the data will definitely be present at the start of omp target but might not be present by the end. Is it possible? If not, then maybe we should leave the check in place for omp target.

I would rather not have a check if not required by the spec as it would an unnecessary overhead to performance.

jdenny marked an inline comment as done.Jul 24 2020, 3:34 PM

Has anyone clarified the motivation for this behavior?

I meant, is there any insight into why the spec specifies this behavior?

Instead of introducing new API functions and making all these changes in all these files, wouldn't it be easier if we just unset the PRESENT flag from arg_types in clang when we generate the call to __tgt_target_data_end_* if we are exiting from a scoped environment?

Ah, that does sound simpler. Thanks. I'll look into it.

Suppressing the presence check on exit from omp target would require a runtime change in addition to the Clang change you suggest for omp target data. However, I've so far failed to formulate a reasonable test case. Specifically, I don't yet see a way to guarantee that the data will definitely be present at the start of omp target but might not be present by the end. Is it possible? If not, then maybe we should leave the check in place for omp target.

I would rather not have a check if not required by the spec as it would an unnecessary overhead to performance.

I've added a comment to the runtime code that performs the check. As you can see, the check is performed regardless. It's just a question of whether the runtime treats it as an error. I don't think performance is an issue.

My concern here is that it will be hard to justify changes to the runtime if I cannot formulate a use case.

openmp/libomptarget/src/omptarget.cpp
489

This is where the runtime performs the check.

I've added a comment to the runtime code that performs the check. As you can see, the check is performed regardless. It's just a question of whether the runtime treats it as an error. I don't think performance is an issue.

My concern here is that it will be hard to justify changes to the runtime if I cannot formulate a use case.

Thinking about it, I don't think there can be a case where something is present upon entering a target region and not be present when we're exiting. Whatever code comprises the target region is code executed on the device - it cannot modify the state of host objects (i.e. libomptarget) in any possible way. E.g. the kernel cannot invoke libomptarget functions, allocate memory, map/unmap data etc.

The only case where something like this would be possible is if we have multiple host threads executing async offloading. In such a case, one thread may launch a target region at a moment when the requested mapping is present on the device and while the kernel is executing some other thread performs a target data exit on the desired mapping. Upon exiting the kernel, the mapping will no longer be present but this is clearly a race condition (user's fault), so I don't think we should pay attention to such a scenario.

I've added a comment to the runtime code that performs the check. As you can see, the check is performed regardless. It's just a question of whether the runtime treats it as an error. I don't think performance is an issue.

My concern here is that it will be hard to justify changes to the runtime if I cannot formulate a use case.

Thinking about it, I don't think there can be a case where something is present upon entering a target region and not be present when we're exiting. Whatever code comprises the target region is code executed on the device - it cannot modify the state of host objects (i.e. libomptarget) in any possible way. E.g. the kernel cannot invoke libomptarget functions, allocate memory, map/unmap data etc.

The only case where something like this would be possible is if we have multiple host threads executing async offloading. In such a case, one thread may launch a target region at a moment when the requested mapping is present on the device and while the kernel is executing some other thread performs a target data exit on the desired mapping. Upon exiting the kernel, the mapping will no longer be present but this is clearly a race condition (user's fault), so I don't think we should pay attention to such a scenario.

Exactly. As far as I can tell, the runtime simply needs a comment that explains this issue in the case of omp target.

jdenny updated this revision to Diff 281067.Jul 27 2020, 3:39 PM
jdenny edited the summary of this revision. (Show Details)

Rewrite patch as discussed: instead of generating different runtime calls for the end of an omp target data vs. the beginning of an omp target exit data so that the runtime can determine when to ignore present, change Clang to filter present from the map types at the end of an omp target data.

jdenny edited the summary of this revision. (Show Details)Jul 27 2020, 3:40 PM

This looks much better now. I don't have any other comments. Since this patch is now essentially a clang-only patch, I'll let @ABataev accept it or post comments.

ABataev added inline comments.Jul 28 2020, 11:20 AM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8686

Can this new flag be encapsulated in Info?

jdenny updated this revision to Diff 281330.Jul 28 2020, 12:41 PM

Replaced SeparateBeginEnd parameter with new TargetDataInfo field as requested. Rebased.

jdenny marked an inline comment as done.Jul 28 2020, 12:42 PM
ABataev added inline comments.Jul 28 2020, 1:43 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8844

Do not append param here, use the one from Info

jdenny marked an inline comment as done.Jul 28 2020, 1:54 PM
jdenny added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
8844

Info.SeparateBeginEndCalls and ForEndCall do not represent the same thing. If Info.SeparateBeginEndCalls=true, as in emitTargetDataCalls below, then emitOffloadingArraysArgument is called twice with the same Info, once with ForEndCall=false and once with ForEndCall=true.

This revision is now accepted and ready to land.Jul 30 2020, 8:34 AM
jdenny marked an inline comment as done.Jul 30 2020, 8:56 AM

Thanks for the review.

As discussed during the 7/29 call, I'll wait to push until we're sure about what the OpenMP committee intended here. I'm pursuing this and will report back when I have more information.

This revision was automatically updated to reflect the committed changes.