This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Permit map with DSA on combined directive
ClosedPublic

Authored by jdenny on Aug 6 2019, 4:43 PM.

Details

Summary

For map, the following restriction changed in OpenMP 5.0:

  • OpenMP 4.5 [2.15.5.1, Restrictions]: "A list item cannot appear in both a map clause and a data-sharing attribute clause on the same construct.
  • OpenMP 5.0 [2.19.7.1, Restrictions]: "A list item cannot appear in both a map clause and a data-sharing attribute clause on the same construct unless the construct is a combined construct."

This patch removes this restriction in the case of combined constructs
and OpenMP 5.0, and it updates Sema not to capture a scalar by copy in
the target region when firstprivate and map appear for that scalar
on a combined target construct.

This patch also adds a fixme to a test that now reveals that a
diagnostic about loop iteration variables is dropped in the case of
OpenMP 5.0. That bug exists regardless of this patch's changes.

Diff Detail

Event Timeline

jdenny created this revision.Aug 6 2019, 4:43 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: guansong. · View Herald Transcript

is_device_ptr can be considered as a kind of mapping clause (see 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives), so, I assume, clang correct here in terms of OpenMP 4.5.
Thus, I would not call this a "fix", this is just a new feature from OpenMP 5.0.
Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

clang/lib/Sema/SemaOpenMP.cpp
10895

I would suggest to guard this change and limit this new functionality only for OpenMP 5.0.

jdoerfert added inline comments.Aug 7 2019, 11:38 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

I think this should cause an error or at least a warning. Telling the compiler ps is a device pointer only to create a local uninitialized shadowing variable seems like an error to me.

jdenny marked an inline comment as done.Aug 7 2019, 11:44 AM

is_device_ptr can be considered as a kind of mapping clause (see 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives), so, I assume, clang correct here in terms of OpenMP 4.5.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

Another question is whether the restriction would make sense. For example, does it ever make sense to specify both is_device_ptr and firstprivate for the same variable on a target construct? I think that would mean that modifications that are made to that device pointer within the target region should not be seen after the target region. I think that's reasonable, but that combination is not possible with the restriction. As Johannes points out, private plus is_device_ptr probably doesn't make sense.

Thus, I would not call this a "fix", this is just a new feature from OpenMP 5.0.

Understood.

I should have reported that the current implementation isn't complete for OpenMP 4.5. For example, on target teams, reduction(+:x) map(x) is an error but not map(x) reduction(+:x). So there are bugs to fix, and maybe this will evolve into multiple patches, but I want to be sure I'm on the right path first.

Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

I'll look into it.

Thanks for the quick review.

clang/lib/Sema/SemaOpenMP.cpp
10895

Do you agree that this is strictly an extension to 4.5 that won't alter the behavior of 4.5-conforming applications?

Do we generally want to complain about the use of extensions, or is there another reason for the guard you suggest?

ABataev added inline comments.Aug 7 2019, 11:44 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

It is allowed according to OpenMP 5.0. Private copy must be created in the context of the parallel region, not the target region. So, for OpenMP 5.0 we should not emit error here.

jdoerfert added inline comments.Aug 7 2019, 11:48 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

What does that mean and how does that affect my reasoning?

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

Another question is whether the restriction would make sense. For example, does it ever make sense to specify both is_device_ptr and firstprivate for the same variable on a target construct?

On a target construct - no. On a target parallel - yes. This may be important especially in unified shared memory mode.

clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

It means, that for OpenMP 5.0 we should emit a warning/error here. It is allowed according to the standard, we should allow it too.

I should have reported that the current implementation isn't complete for OpenMP 4.5. For example, on target teams, reduction(+:x) map(x) is an error but not map(x) reduction(+:x). So there are bugs to fix, and maybe this will evolve into multiple patches, but I want to be sure I'm on the right path first.

It is just a bug, not a missing feature. Just file a bug report for it.

clang/lib/Sema/SemaOpenMP.cpp
10895

No. It is incorrect according to OpenMP 4.5 and we shall emit diagnostics here.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

Another question is whether the restriction would make sense. For example, does it ever make sense to specify both is_device_ptr and firstprivate for the same variable on a target construct?

On a target construct - no.

Why not?

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

See 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives

Another question is whether the restriction would make sense. For example, does it ever make sense to specify both is_device_ptr and firstprivate for the same variable on a target construct?

On a target construct - no.

Why not?

It is meaningless. That's why it is prohibited in OpenMP 5.0 and allowed only for the combined constructs. These private clauses are applied to inner subconstructs.
For example, target parallel map(p) private(p). In the context of target region the variable is mapped while in the parallel context it is private.
For target map(p) private(p) it is absolutely meaningless.

jdoerfert added inline comments.Aug 7 2019, 1:46 PM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

So, for OpenMP 5.0 we should not emit error here.
that for OpenMP 5.0 we should emit a warning/error here.

The last answer contradicts what you said earlier. I expect there is a *not* missing, correct?

Assuming you do not want an error, which is fine, I still think a warning is appropriate as it seems to me there is never a reason to have a is_device_ptr clause for a private value. I mean, it is either a bug by the programmer, e.g., 5 letters of firstprivate went missing, or simply nonsensical code for which we warn in other situations as well.

ABataev added inline comments.Aug 7 2019, 1:56 PM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Missed not.
These kind of construct are explicitly allowed in OpenMP. And we should obey the standard unconditionally.
Plus, there might be situations where user may require it explicitly. For example, the device pointer is dereferenced in one of the clauses for the subregions but in the deeper subregion it might be used as a private pointer. Why we should emit a warning here?

jdoerfert added inline comments.Aug 7 2019, 2:26 PM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

If you have a different situation, e.g., the one you describe, you should not have a warning. Though, that is not the point. If you have the situation above (single directive), as per my reasoning, there doesn't seem to be a sensible use case. If you have one and we should create an explicit test for it.

These kind of construct are explicitly allowed in OpenMP.

explicitly allowed != not forbidded (yet)

And we should obey the standard unconditionally.

Nobody says we should not. We warn people all the time even if it is valid code.

ABataev added inline comments.Aug 7 2019, 2:35 PM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Warnings may prevent successful compilation in some cases, e.g. when warnings are treated as errors. Why we should emit a warning if the construct is allowed by the standard? Ask to change the standard if you did not agree with it.

hfinkel added a subscriber: hfinkel.Aug 7 2019, 4:18 PM
hfinkel added inline comments.
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Warnings are specifically for constructs which are legal, but likely wrong (i.e., will not do what the user likely intended). Treating warnings as errors is not a conforming compilation mode - by design (specifically because that will reject conforming programs). Thus...

Why we should emit a warning if the construct is allowed by the standard? Ask to change the standard if you did not agree with it.

This is the wrong way to approach this. Warnings are specifically for legal code. They help users prevent errors, however, in cases where that legal code is likely problematic or won't do what the user intends.

ABataev added inline comments.Aug 8 2019, 2:37 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Ok, we could emit wqrnings in some cases. But better to do it in the separate patches. Each particular case requires additional analysis.

This is the wrong way to approach this.

I don't think so. If some cases are really meaningless, better to ask to prohibit them in the standard. It is always a good idea to change the requirements first, if you think that some scenarios are not described correctly rather than do the changes in the code. It leads to different behavior of different compilers in the same situation and it is not good for the users.

hfinkel added inline comments.Aug 8 2019, 4:51 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

I don't think so. If some cases are really meaningless, better to ask to prohibit them in the standard. It is always a good idea to change the requirements first, if you think that some scenarios are not described correctly rather than do the changes in the code. It leads to different behavior of different compilers in the same situation and it is not good for the users.

There are at least two relevant factors:

  1. Language standards often express general concepts that can be combined in some regular set of ways. Some of these combinations are likely unintentional (e.g., user error), but standards don't explicitly prohibit them because: a) standards committees have limited bandwidth and need to concentrate on the highest-priority items and new features b) filling standards with a large number of special cases, even in the name of preventing user error, itself has a cost (in terms of maintenance of the standard, constraining conforming implementation techniques, and so on).
  2. Even if a standards committee were to take up restricting some set of special cases, implementation experience with a warning is often very helpful. Saying, "we added a warning, and no one complained about it being a false positive" is good evidence in support of making that warning a mandated error.

In the end, standards committees depend on implementers to add value on top of the standard itself in creating an high-QoI products. This has always been a focus area of Clang, and Clang is well known for its high diagnostic quality - not just in error messages, but in warnings too.

I have plenty of users who specifically compile with multiple compilers specifically to get the warnings for each compiler. Is it sometimes true that some compilers generating some warnings ends up being problematic? yes. I think that we all have observed that. But warnings are very helpful in catching likely bugs, and implementations have more freedom with warnings than with errors, so many users depend on high-quality warnings to help quickly find bugs and, thus, increase their productivity.

ABataev added inline comments.Aug 8 2019, 5:19 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Just like I said, if you think there are some incorrect combinations we could generate a warning. But better to implement it in a different patch. There are many possible combinations and each one may have different preconditions.

hfinkel added inline comments.Aug 8 2019, 6:14 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

I have no objection to adding warnings in separate patch. I simply wanted to provide some feedback on the general conditions under which we should consider adding warnings. Thanks, Alexey.

jdenny marked 2 inline comments as done.Aug 8 2019, 8:04 AM

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

See 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives

I looked again. I'm still not finding any text in that section that implies is_device_ptr follows the same restrictions as map. Can you please cite specific lines of text instead of an entire section? Thanks for your help.

clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

Thanks for the discussion. It sounds like people are fine if (1) the diagnostic proposed here would be a warning not an error and (2) that warning would not be implemented by this patch.

Maybe, but I haven't found any statement in either version that states that map restrictions apply to is_device_ptr.

is_device_ptr is a kind of mapping clause. I assume we can extend the restrictions for map clause to this clause too.

I'd like to understand this better. Is there something from the spec we can quote in the code?

See 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives

I looked again. I'm still not finding any text in that section that implies is_device_ptr follows the same restrictions as map. Can you please cite specific lines of text instead of an entire section? Thanks for your help.

Ah, it is only in OpenMP 5.0 anв restrictions for the map clause are for map clause only. Then we should allow is_device_ptr with the private clauses for OpenMP 4.5.
Better to do this in a separate patch only for is_device_ptr.

jdoerfert added inline comments.Aug 8 2019, 9:02 AM
clang/test/OpenMP/target_parallel_for_is_device_ptr_messages.cpp
93 ↗(On Diff #213749)

yes

jdenny added a comment.Aug 8 2019, 3:40 PM

See 2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives

I looked again. I'm still not finding any text in that section that implies is_device_ptr follows the same restrictions as map. Can you please cite specific lines of text instead of an entire section? Thanks for your help.

Ah, it is only in OpenMP 5.0 anв restrictions for the map clause are for map clause only. Then we should allow is_device_ptr with the private clauses for OpenMP 4.5.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Better to do this in a separate patch only for is_device_ptr.

I'll remove it from this one. Thanks.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

kkwli0 added a subscriber: kkwli0.Aug 9 2019, 2:55 PM

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

jdenny added a comment.Aug 9 2019, 3:43 PM

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Clang currently doesn't permit is_device_ptr with firstprivate either, but I'm not aware of any reason to diagnose that. Is there? And then there are privatization clauses like linear and reduction....

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Clang currently doesn't permit is_device_ptr with firstprivate either, but I'm not aware of any reason to diagnose that. Is there? And then there are privatization clauses like linear and reduction....

I assume, we should allow all allowed datasharing clauses along with is_device_ptr clause, including shared, linear, etc.

jdenny added a comment.Aug 9 2019, 3:53 PM

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Clang currently doesn't permit is_device_ptr with firstprivate either, but I'm not aware of any reason to diagnose that. Is there? And then there are privatization clauses like linear and reduction....

I assume, we should allow all allowed datasharing clauses along with is_device_ptr clause, including shared, linear, etc.

Agreed. I was just wondering if anyone saw an obvious issue that might deserve a warning (as in the private plus is_device_ptr case). I'm not planning to implement warnings right now. I'm just curious for later.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

kkwli0 added a comment.Aug 9 2019, 4:09 PM

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

That is what I thought. Specifying these two clauses on the target construct creates ambiguity on which p it referred to inside the construct. The private p or the pointer p that stores the device address? I will work with the committee to fix the spec.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

I think, this combination is still allowed. I assume that

#Pragma omp target parallel is_device_ptr(a) <dsa_clause>(a)

is the same as

#pragma omp target is_device_ptr(a)
#pragma omp parallel <dsa_clause>(a)

i.e. datasharing clauses are applied to inner sub-regions, not the target region itself.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

I think, this combination is still allowed. I assume that

#Pragma omp target parallel is_device_ptr(a) <dsa_clause>(a)

is the same as

#pragma omp target is_device_ptr(a)
#pragma omp parallel <dsa_clause>(a)

i.e. datasharing clauses are applied to inner sub-regions, not the target region itself.

With the parallel, that makes sense to me. In that case, however, I'd imagine that the privitization works as normal and the code wouldn't crash. Right?

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

I think, this combination is still allowed. I assume that

#Pragma omp target parallel is_device_ptr(a) <dsa_clause>(a)

is the same as

#pragma omp target is_device_ptr(a)
#pragma omp parallel <dsa_clause>(a)

i.e. datasharing clauses are applied to inner sub-regions, not the target region itself.

With the parallel, that makes sense to me. In that case, however, I'd imagine that the privitization works as normal and the code wouldn't crash. Right?

It leads to the use of uninitialized memory. We create private non-initialized copy of the pointer, not the copy of the data it addresses. It shall work with firstprivate clause.

jdenny updated this revision to Diff 214542.Aug 10 2019, 3:07 PM
jdenny retitled this revision from [OpenMP] Fix map/is_device_ptr with DSA on combined directive to [OpenMP] Permit map with DSA on combined directive.
jdenny edited the summary of this revision. (Show Details)

I made the following changes, as suggested:

  • Add back restriction for OpenMP < 5.0.
  • Remove is_device_ptr changes.

Alexey, you said:

Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

I haven't spent enough time exploring codegen here in the past, so I'm starting out by investigating the codegen output for various examples. I'm looking for differences between a combined target teams and an equivalent separate target and teams. So far, I see nothing but superficial differences. Do you have an example where there's an undesirable difference?

Thanks.

jdenny edited the summary of this revision. (Show Details)Aug 10 2019, 3:15 PM

I made the following changes, as suggested:

  • Add back restriction for OpenMP < 5.0.
  • Remove is_device_ptr changes.

Alexey, you said:

Plus, these changes are not enough to support this new feature from OpenMP 5.0. There definitely must be some changes in the codegen. If the variable is mapped in the target construct, we should not generate a code for the private clause of this variable on the target construct, since, in this case, private clauses are applied for the inner subdirectives, which are the part of the combined directive, but not the target part of the directive.

I haven't spent enough time exploring codegen here in the past, so I'm starting out by investigating the codegen output for various examples. I'm looking for differences between a combined target teams and an equivalent separate target and teams. So far, I see nothing but superficial differences. Do you have an example where there's an undesirable difference?

Thanks.

Try something like target parallel firstprivate (a) map(a). Currently it will create a firstprivate copy of the variable a in target context thought it is not required at all. It may lead to increased register pressure and performance degradation.

jdenny updated this revision to Diff 214559.Aug 11 2019, 8:44 AM
jdenny marked 3 inline comments as done.
jdenny edited the summary of this revision. (Show Details)
jdenny set the repository for this revision to rG LLVM Github Monorepo.

Try something like target parallel firstprivate (a) map(a). Currently it will create a firstprivate copy of the variable a in target context thought it is not required at all. It may lead to increased register pressure and performance degradation.

Thanks. The only combination that appears to be affected is firstprivate and map for scalar types. I had only tried arrays and structs earlier, but they're not affected. If I had looked a little more closely at the test case this patch already introduced, I would have noticed that int is affected. The problematic analysis is in sema, where there was an apparent assumption that firstprivate wouldn't appear with map due to the previous restriction. This update fixes that.

For my previous update, I meant to point out that I introduced a fixme into the test suite. See the phabricator summary for details.

Try something like target parallel firstprivate (a) map(a). Currently it will create a firstprivate copy of the variable a in target context thought it is not required at all. It may lead to increased register pressure and performance degradation.

Thanks. The only combination that appears to be affected is firstprivate and map for scalar types. I had only tried arrays and structs earlier, but they're not affected. If I had looked a little more closely at the test case this patch already introduced, I would have noticed that int is affected. The problematic analysis is in sema, where there was an apparent assumption that firstprivate wouldn't appear with map due to the previous restriction. This update fixes that.

For my previous update, I meant to point out that I introduced a fixme into the test suite. See the phabricator summary for details.

This is wrong. It affects all possible combinations and not only fof scalar types, all of them are affected.

This is wrong. It affects all possible combinations and not only fof scalar types, all of them are affected.

Are you saying the patch isn't sufficient because other types need to be fixed too? Can you give an example?

This is wrong. It affects all possible combinations and not only fof scalar types, all of them are affected.

Are you saying the patch isn't sufficient because other types need to be fixed too? Can you give an example?

Target teams private map will produce extra private in target context, other constructs either.

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Did you check the mapping flags, generated during host codegen? They must be the same. With private clause it may generate just map(alloc) instead of map(tofrom).

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Did you check the mapping flags, generated during host codegen? They must be the same. With private clause it may generate just map(alloc) instead of map(tofrom).

I diffed the .ll files for combined vs. separate constructs. The only difference is the file ID. @.offload_maptypes isn't generated in either (but it is if I replace private with firstprivate).

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Did you check the mapping flags, generated during host codegen? They must be the same. With private clause it may generate just map(alloc) instead of map(tofrom).

I diffed the .ll files for combined vs. separate constructs. The only difference is the file ID. @.offload_maptypes isn't generated in either (but it is if I replace private with firstprivate).

Maptypes array must be generated in all cases, check the host codegen.
Also, test codegen with the different kinds of maptypes, not only to from, but also alloc, to, from, etc. Yoh will see the difference in many cases and 2ith many kinds of types, not only scalars.

Target teams private map will produce extra private in target context, other constructs either.

Here's what I tried:

int a;
#pragma omp target teams private(a) map(a)
  ;

The same code is generated as for:

int a;
#pragma omp target map(a)
#pragma omp teams private(a)
  ;

I haven't debugged it yet, but it seems orthogonal to whether you have a combined directive, which is what this patch is about.

Did you check the mapping flags, generated during host codegen? They must be the same. With private clause it may generate just map(alloc) instead of map(tofrom).

I diffed the .ll files for combined vs. separate constructs. The only difference is the file ID. @.offload_maptypes isn't generated in either (but it is if I replace private with firstprivate).

Maptypes array must be generated in all cases, check the host codegen.
Also, test codegen with the different kinds of maptypes, not only to from, but also alloc, to, from, etc. Yoh will see the difference in many cases and 2ith many kinds of types, not only scalars.

I'll work on another patch for the bugs where directives are separate, and I'll return to this patch if more fixes are then needed to support combined directives consistently. Thanks.

jdenny updated this revision to Diff 215298.Aug 14 2019, 5:58 PM

Rebase. Add more tests, as requested at D66247#1630441.

Why there are the changes from the another patch?

I think you're referring to stuff like OpenMPCaptureLevel in ScopeInfo.h. I wrote those changes for this patch first, as can be seen in phabricator history. I needed them for the other patch too, and I thought we were going to end up applying that patch first, so I was going to strip them out here. Of course, as I understand it, the plan now is to apply only this patch.

I think you're referring to stuff like OpenMPCaptureLevel in ScopeInfo.h. I wrote those changes for this patch first, as can be seen in phabricator history. I needed them for the other patch too, and I thought we were going to end up applying that patch first, so I was going to strip them out here. Of course, as I understand it, the plan now is to apply only this patch.

These changes required for this functionality, please remove.

jdenny updated this revision to Diff 216420.Aug 21 2019, 9:19 AM
jdenny edited the summary of this revision. (Show Details)
jdenny set the repository for this revision to rG LLVM Github Monorepo.

As requested, backed out the changes related to firstprivate scalars, and updated tests.

This revision is now accepted and ready to land.Aug 21 2019, 9:38 AM

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

No, this is what I warned about. We shall have the same codegen just like in the first case, the value must be passed by reference and mapped as tofrom.

ABataev requested changes to this revision.Aug 21 2019, 9:56 AM

Need to fix the mapping

This revision now requires changes to proceed.Aug 21 2019, 9:56 AM

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

No, this is what I warned about. We shall have the same codegen just like in the first case, the value must be passed by reference and mapped as tofrom.

If I add back those changes I just backed out, we get the same codegen. Is that what you want?

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

No, this is what I warned about. We shall have the same codegen just like in the first case, the value must be passed by reference and mapped as tofrom.

If I add back those changes I just backed out, we get the same codegen. Is that what you want?

Those 2 cases must result in the same codegen. But I rather doubt we need your previous changes. Check Sema::isOpenMPCapturedByRef instead, required functionality must be handled in this function.

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

No, this is what I warned about. We shall have the same codegen just like in the first case, the value must be passed by reference and mapped as tofrom.

If I add back those changes I just backed out, we get the same codegen. Is that what you want?

Those 2 cases must result in the same codegen. But I rather doubt we need your previous changes. Check Sema::isOpenMPCapturedByRef instead, required functionality must be handled in this function.

That's the focus of my previous changes. The rest just supports the changes there.

I want to be sure we're on the same page. Due to the changes I just backed out, the following two examples now generate different code:

int a = 0;
#pragma omp target map(a)
#pragma omp teams firstprivate(a)
  ;
int a = 0;
#pragma omp target teams firstprivate(a) map(a)
  ;

The difference is whether a is passed by reference (the first case) or value (the second case) to the offloading function.

Is that fine for you?

No, this is what I warned about. We shall have the same codegen just like in the first case, the value must be passed by reference and mapped as tofrom.

If I add back those changes I just backed out, we get the same codegen. Is that what you want?

Those 2 cases must result in the same codegen. But I rather doubt we need your previous changes. Check Sema::isOpenMPCapturedByRef instead, required functionality must be handled in this function.

That's the focus of my previous changes. The rest just supports the changes there.

We don't need this new level counter to correctly handle this situation. Just check for the combined target directive in Sema::isOpenMPCapturedByRef and return true if it is mapped as to from or just from. The change must be very simple.

We don't need this new level counter to correctly handle this situation. Just check for the combined target directive in Sema::isOpenMPCapturedByRef and return true if it is mapped as to from or just from. The change must be very simple.

I don't see a way to do that without also affecting capturing on the teams region in the case of a combined target teams construct.

If it's simple, it might save time for both of us if you post the change you're thinking of here.

We don't need this new level counter to correctly handle this situation. Just check for the combined target directive in Sema::isOpenMPCapturedByRef and return true if it is mapped as to from or just from. The change must be very simple.

I don't see a way to do that without also affecting capturing on the teams region in the case of a combined target teams construct.

If it's simple, it might save time for both of us if you post the change you're thinking of here.

Chwck 2 ifs: if (IsVariableUsedInMapClause) and the second if (IsByRef && Ty.getNonReferenceType()->isScalarType()). If the variable is mapped, IsByRef is set to true, but later it may be changed in the second if, when we check for the use in the firstprivate clause. We need to add a check, that the IsByRef must be set to false onlly if IsVariableUsedInMapClause is set to false too for combined target constructs.

Chwck 2 ifs: if (IsVariableUsedInMapClause) and the second if (IsByRef && Ty.getNonReferenceType()->isScalarType()). If the variable is mapped, IsByRef is set to true, but later it may be changed in the second if, when we check for the use in the firstprivate clause. We need to add a check, that the IsByRef must be set to false onlly if IsVariableUsedInMapClause is set to false too for combined target constructs.

I've tried that. It results in capturing by reference on both the target region and the teams region in the case of a combined target teams construct. Thus, we end up with different codegen than when the directives are separate, where capturing by reference is set only on the target region.

The changes I just backed out handle this correctly because they distinguish between the target and teams regions.

Chwck 2 ifs: if (IsVariableUsedInMapClause) and the second if (IsByRef && Ty.getNonReferenceType()->isScalarType()). If the variable is mapped, IsByRef is set to true, but later it may be changed in the second if, when we check for the use in the firstprivate clause. We need to add a check, that the IsByRef must be set to false onlly if IsVariableUsedInMapClause is set to false too for combined target constructs.

I've tried that. It results in capturing by reference on both the target region and the teams region in the case of a combined target teams construct. Thus, we end up with different codegen than when the directives are separate, where capturing by reference is set only on the target region.

The changes I just backed out handle this correctly because they distinguish between the target and teams regions.

Ahh, I see, in this case, the codegen for the inner subregion is changed. Ok, I see now. Then yes, we need this change with subregion counter.
In this case, I would recommend trying to fix https://bugs.llvm.org/show_bug.cgi?id=40253 at first, as it relates to the same problem, I think. Better to fix the problem separately and then extend it for new feature rather than add 2 new features in one patch.

jdenny updated this revision to Diff 216444.Aug 21 2019, 11:55 AM
jdenny edited the summary of this revision. (Show Details)

Restore previous version of patch, and rebase.

I tried, and this patch is not sufficient to fix PR40253. If there are indeed common changes, it seems it's just as well to put them here.

ABataev added inline comments.Aug 21 2019, 12:39 PM
clang/include/clang/Sema/ScopeInfo.h
763–764

I would add a parameter for OpenMPCaptureLevel rather than use default value.

clang/include/clang/Sema/Sema.h
9027

Do not use default value here, just set it to 0 in the call in SemaOpenMP.cpp. We use it there tat target level always and thus CaptureLevel is always 0 there.

clang/lib/Sema/Sema.cpp
2107–2108

I would suggest adding a new parameter for CapturedLevel and the same parameter to ActOnCapturedRegionStart function. And pass the value for CapturedLevel explicitly at ActOnCapturedRegionStart function call in SemaOpenMP.cpp rather than rely on this solution.
Also, for each separate OpenMP directive, this counter must start from 0. Currently, I think, it may lead to crash if we have several directives. Something like:

#pragma omp parallel
#pragma omp target teams
...

I think it will work incorrectly for this example.

2108–2110

CapturedRegionScopeInfo *->auto *

jdenny marked an inline comment as done.Aug 21 2019, 1:08 PM
jdenny added inline comments.
clang/lib/Sema/Sema.cpp
2107–2108

I think the check that OpenMPLevel is the same prevents the problem you mention.

ABataev added inline comments.Aug 21 2019, 1:11 PM
clang/lib/Sema/Sema.cpp
2107–2108

Ah, yes, missed this. Anyway, better to specify this explicitly as function parameters, I think, rather than trying to evaluate it here this way. We eventually may change the handling of captured regions in OpenMP, better to have everything in SemaOpenMP.cpp

ABataev added inline comments.Aug 21 2019, 1:13 PM
clang/lib/Sema/Sema.cpp
2107–2108

BTW, here you can use the parameter with the default value 0.

jdenny marked an inline comment as done.Aug 21 2019, 1:17 PM
jdenny added inline comments.
clang/lib/Sema/Sema.cpp
2107–2108

OK.

CaptureLevel or CapturedLevel? Do you care?

ABataev added inline comments.Aug 21 2019, 1:23 PM
clang/lib/Sema/Sema.cpp
2107–2108

Up to you

jdenny updated this revision to Diff 216469.Aug 21 2019, 1:50 PM
jdenny marked 7 inline comments as done.
jdenny set the repository for this revision to rG LLVM Github Monorepo.

Make suggested changes for passing around the capture level.

ABataev added inline comments.Aug 21 2019, 2:29 PM
clang/include/clang/Sema/Sema.h
1419

Better to use OpenMPCaptureLevel since this param is OpenMP specific.

1419

Add default value for the parameter here too.

3977

Same, add prefix OpenMP

9027

Same, OpenMPCaptureLevel here and in other places.

clang/lib/Sema/SemaExpr.cpp
17748

Restore original code here

clang/lib/Sema/SemaOpenMP.cpp
1350–1351

Use /*param=*/ format, please.

2975

Same for param comment here and other places.

clang/lib/Sema/SemaStmt.cpp
4324

Use default value of the parameter, no need to pass 0 here

jdenny updated this revision to Diff 216502.Aug 21 2019, 3:24 PM
jdenny marked 8 inline comments as done.

Make suggested changes to default arguments, comments on literals, and parameter names.

jdenny added inline comments.Aug 21 2019, 3:27 PM
clang/lib/Sema/SemaExpr.cpp
17748

OK, I did. What's the reason for no newline at the end of the file?

ABataev added inline comments.Aug 21 2019, 3:34 PM
clang/lib/Sema/SemaExpr.cpp
17748

Changes not related to the patch, like formatting etc., better to commit in separate patches.

ABataev accepted this revision.Aug 21 2019, 3:38 PM

LG

clang/lib/Sema/SemaExpr.cpp
17749

Still marked as changed code, better to restore it completely.

This revision is now accepted and ready to land.Aug 21 2019, 3:38 PM
jdenny marked 4 inline comments as done.Aug 21 2019, 3:41 PM

Thanks.

clang/lib/Sema/SemaExpr.cpp
17748

Ah.

17749

There's no change here. I checked the diff I uploaded to be sure.

This revision was automatically updated to reflect the committed changes.
jdenny marked 2 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2019, 8:33 PM