This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP][NVPTX]Relax flush directive.
ClosedPublic

Authored by ABataev on May 24 2019, 8:19 AM.

Details

Summary

According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.

According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.

__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision.May 24 2019, 8:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2019, 8:19 AM
grokos added inline comments.May 31 2019, 3:10 PM
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
18 ↗(On Diff #201257)

S-21? Typo?

grokos added inline comments.May 31 2019, 3:21 PM
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
28 ↗(On Diff #201257)

"Values" --> "Value of data". Also, why is the value of "flag" undefined at this point?

ABataev marked 2 inline comments as done.May 31 2019, 4:48 PM
ABataev added inline comments.
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
18 ↗(On Diff #201257)

Yes, this is slightly reworked code from OpenMP Examples document and I forgot to remove this when copied the example.

28 ↗(On Diff #201257)

as I understand, it happens because the compiler is free to reorder read/write ops in any order without explicitly provided ordering. That's why flag can be undefined here.

jdoerfert added inline comments.May 31 2019, 5:04 PM
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

There is a race in this program.

After the flush in thread 0 happened, and after the flush prior to the while happened, both thread 0 and 32 will access flag without proper synchronization.

29 ↗(On Diff #201257)

Couldn't we copy data to a new variable and check it on the host. Relying on printf here seems brittle.

grokos added inline comments.Jun 3 2019, 1:45 PM
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

I thinks that's the purpose of this test: to make sure that, despite the race on flag, memory fencing is enforced for data.

ABataev marked 2 inline comments as done.Jun 19 2019, 8:08 AM
ABataev added inline comments.
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

Exactly, just like I said before this is slightly reworked example for flush directive from OpenMP examples.

29 ↗(On Diff #201257)

Ok

ABataev updated this revision to Diff 205599.Jun 19 2019, 8:14 AM

Address comments.

jdoerfert requested changes to this revision.Jun 21 2019, 8:50 AM
jdoerfert added inline comments.
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

I thinks that's the purpose of this test: to make sure that, despite the race on flag, memory fencing is enforced for data.

Having a race on flag, especially since it is used in a control condition for the flush, makes the program as a whole UB. We cannot reliably test based on UB.

28 ↗(On Diff #201257)

compiler is free to reorder read/write ops in any order without explicitly provided ordering.

Forgetting about the UB we have in this program for a moment, it is not true that a compiler is free to reorder arbitrary read/writes. I'm unsure what accesses you refer to here and how do they need to be reordered to lead to undefined values? (I could imagine this to be a side effect of the UB above which would make my point that this is not a reliable method to test things.)

This revision now requires changes to proceed.Jun 21 2019, 8:50 AM
ABataev marked 2 inline comments as done.Jun 21 2019, 9:04 AM
ABataev added inline comments.
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

This is slightly modified test from the official OpenMP document (see https://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf, 8.1 The OpenMP Memory Model). If you think the test is not correct, you need to discuss it with OpenMP committee and create a ticket for this problem to fix the example.
The value of flag is undefined because of the data race and I think this was intended.

28 ↗(On Diff #201257)

This assumption was wrong, actually, the value is undefined because of the race condition.

ABataev updated this revision to Diff 206027.Jun 21 2019, 9:49 AM

Reworked test.

jdoerfert added inline comments.Jun 24 2019, 7:13 AM
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

I will talk to the relevant people in the OpenMP ARB but that should not concern us here.
Whatever someone else thought would make a good test case should not impact us here.

UB is nothing to rely on and I fail to see why the race that causes it is an important part of the test case we need to keep.

ABataev marked an inline comment as done.Jun 24 2019, 8:04 AM
ABataev added inline comments.
libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
23 ↗(On Diff #201257)

I agree. That's why I reworked the test and usу atomics for flag.

jdoerfert accepted this revision.Jun 27 2019, 8:03 AM

Potential improvement inlined but this LGTM. Thanks for fixing the race problem!

libomptarget/deviceRTLs/nvptx/test/parallel/flush.c
34 ↗(On Diff #206027)

Should we return !(out == 42)?

This revision is now accepted and ready to land.Jun 27 2019, 8:03 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 11:33 AM