This is an archive of the discontinued LLVM Phabricator instance.

[flang][OpenMP] Handle the data race for firstprivate and lastprivate
ClosedPublic

Authored by peixin on Aug 13 2022, 2:15 AM.

Details

Summary

Remove barriers for firstprivate except if the variable is also
lastprivatized. Re-arrange the code and put all last-privates inside a
single scf.if.

As OpenMP Spec 5.0, to avoid the data races, concurrent updates of the
original list item must be synchronized with the read of the original
list item that occurs as a result of the firstprivate clause. Adding
barrier(s) before and/or after the worksharing region would remove the
data races, and it is the application(user)'s job. However, when
one list item is in both firstprivate and lastprivate clauses, the
standard (https://www.openmp.org/spec-html/5.0/openmpsu105.html) states
the following:

If a list item appears in both firstprivate and lastprivate clauses, the 
update required for the lastprivate clause occurs after all 
initializations for the firstprivate clause.

So, the synchronization should be ensured by compiler such as emiting
one barrier since the lastprivate clause follows the reads of the
original list item performed for the initialization of each of the
firstprivate list item.

Add FIXME for two special cases, sections construct and linear clause.

The data race problem for single construct will be handled later.

This implementation is based on the discussion with OpenMP committee and
clang code (clang/lib/CodeGen/CGStmtOpenMP.cpp).

Diff Detail

Event Timeline

peixin created this revision.Aug 13 2022, 2:15 AM
peixin requested review of this revision.Aug 13 2022, 2:15 AM

Thank you for the patch. I have a couple of questions.

flang/test/Lower/OpenMP/default-clause.f90
18

I do not understand this. Does firstprivate no longer require omp.barrier in absence of a lastprivate?

flang/test/Lower/OpenMP/omp-parallel-lastprivate-clause-scalar.f90
94

So if I understand correctly, lastprivate will capture the last value of the loop and store it back into the variable. Is it necessary to wrap these in an atomic region? I am wondering if parallel execution of threads could result in the last iteration's values not being captured in these variables.

Thanks, Deepak.

That's the case I am referring to. Thanks for pointing out the source of Spec and confirmation of inserting a barrier at the beginning of the region.

All my problems are solved. Your help is really appreciated.

All the best,
Peixin


Qiao Peixin
Mobile: +86-50000037343(For Welink,eSpace Calls)
Email: qiaopeixin@huawei.com
From:Eachempati, Deepak <deepak.eachempati@hpe.com>
To:qiaopeixin <qiaopeixin@huawei.com>;Protze, Joachim <protze@itc.rwth-aachen.de>;omp-lang <omp-lang@mailman.openmp.org>;omp-lang <omp-lang@openmp.org>
Cc:abidmuslim <abidmuslim@gmail.com>;Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>;kirankumartp <kirankumartp@gmail.com>;kiranchandramohan <kiranchandramohan@gmail.com>;jdoerfert <jdoerfert@anl.gov>
Sent:2022-07-07 23:11:37
Subject:RE: 【Technical】 Questions about the data race problem for firstprivate clause

Are you referring to this case?

x = 10;
#pragma omp parallel copyin(x)
{

const int tid = omp_get_thread_num();
x += tid; // race?
printf(“%d: x = %d\n”, tid, x);

}

This is what the spec says:

“The copy is performed after the team is formed and prior to the execution of the associated structured block.”

An implementation inserting a barrier at the beginning of the region would satisfy that requirement.

  • Deepak

From: qiaopeixin <qiaopeixin@huawei.com>
Sent: Wednesday, July 6, 2022 10:49 AM
To: Eachempati, Deepak <deepak.eachempati@hpe.com>; Protze, Joachim <protze@itc.rwth-aachen.de>; omp-lang <omp-lang@mailman.openmp.org>; omp-lang <omp-lang@openmp.org>
Cc: abidmuslim <abidmuslim@gmail.com>; Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>; kirankumartp <kirankumartp@gmail.com>; kiranchandramohan <kiranchandramohan@gmail.com>; jdoerfert <jdoerfert@anl.gov>
Subject: RE: 【Technical】 Questions about the data race problem for firstprivate clause

Thanks, Deepak and Johannes.

Another interesting case is firstprivate(x) and lastprivate(x) on the same construct. In what case, I believe an implementation should ensure that the update to the original list iterm due to the lastprivate clause follows the reads of the original list item performed for the initialization of each of the firstprivate list items.

Right. I do see one barrier after copies operation of firstprivate clause in gcc, icc, and clang when firstprivate(x) and lastprivate(x) are on the same construct. There is no barrier for one single firstprivate(x) without lastprivate(x) for those three compilers.

The behaviors for copyin clause seems to be different from firstprivate clause. The OpenMP Spec does not mention the data race things. For copyin clause, the implementation should ensure one synchronization (one barrier is one option) no matter whether there is one copyprivate clause. This is what clang does. Do I understand it correctly?

All the best,
Peixin


Qiao Peixin
Mobile: +86-50000037343(For Welink,eSpace Calls)
Email: qiaopeixin@huawei.com
From:Eachempati, Deepak <deepak.eachempati@hpe.com>
To:qiaopeixin <qiaopeixin@huawei.com>;Protze, Joachim <protze@itc.rwth-aachen.de>;omp-lang <omp-lang@mailman.openmp.org>;omp-lang <omp-lang@openmp.org>
Cc:abidmuslim <abidmuslim@gmail.com>;Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>;kirankumartp <kirankumartp@gmail.com>;kiranchandramohan <kiranchandramohan@gmail.com>
Sent:2022-07-06 22:28:05
Subject:RE: 【Technical】 Questions about the data race problem for firstprivate clause

Yes. That is what this sentence is saying:

“To avoid data races, concurrent updates of the original list item must be synchronized with the read
of the original list item that occurs as a result of the firstprivate clause.”

Another interesting case is firstprivate(x) lastprivate(x) on the same construct. In that case, I believe an implementation should ensure that the update to the original list item due to the lastprivate clause follows the reads of the original list item performed for the initialization of each of the firstprivate list items.

  • Deepak

From: qiaopeixin <qiaopeixin@huawei.com>
Sent: Wednesday, July 6, 2022 9:03 AM
To: Eachempati, Deepak <deepak.eachempati@hpe.com>; Protze, Joachim <protze@itc.rwth-aachen.de>; omp-lang@mailman.openmp.org; omp-lang@openmp.org
Cc: abidmuslim@gmail.com; Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>; kirankumartp@gmail.com; kiranchandramohan@gmail.com
Subject: RE: 【Technical】 Questions about the data race problem for firstprivate clause

Thanks a lot for the reply, Deepak and Jochim.

Adding barriers before and after the loop construct would remove the data races.

I tested your provided test cases using gcc, icc and clang. All of them have the data race problems. @Deepak So, do you mean that it is application’s job to add the barriers to avoid the data race instead of compiler’s job.

All the best,
Peixin

From: Eachempati, Deepak [mailto:deepak.eachempati@hpe.com]
Sent: Wednesday, July 6, 2022 9:19 PM
To: omp-lang@mailman.openmp.org; omp-lang@openmp.org
Cc: qiaopeixin <qiaopeixin@huawei.com>; abidmuslim@gmail.com; Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>; kirankumartp@gmail.com; kiranchandramohan@gmail.com
Subject: RE: 【Technical】 Questions about the data race problem for firstprivate clause

Here’s an example:

int x = 0;
#pragma omp parallel num_threads(4)
{

#pragma omp atomic
x++;
#pragma omp for nowait schedule(static,1) firstprivate(x)
for (int i = 0; i < 4; i++) {
   printf(“%d: x = %d\n”, i, x);
}
#pragma omp atomic
x--;

}

There is a data race due to the atomic increment before the loop construct and the initialization of the firstprivate x inside the loop region. Also, due to the nowait clause, there is a race between the that firstprivate initialization and the subsequent atomic decrement after the loop construct.

Adding barriers before and after the loop construct would remove the data races.

For your second question, a race is still possible with a single construct or sections construct. Replace the above loop construct with single or sections, and it is still a race. A firstprivate on a task construct can also result in a race. Example:

int x = 0;
#pragma omp parallel num_threads(4) shared(x)
{

#pragma omp atomic
x++;
#pragma omp masked
   #pragma omp task firstprivate(x)
   {
     printf(“x = %d\n”, x);
   }
#pragma omp atomic
x--;

}

As before, the atomic updates concurrent with the read of x for the firstprivate initialization results in a data race.

  • Deepak

From: Omp-lang <omp-lang-bounces@mailman.openmp.org> On Behalf Of qiaopeixin via Omp-lang
Sent: Wednesday, July 6, 2022 7:23 AM
To: omp-lang@openmp.org
Cc: qiaopeixin <qiaopeixin@huawei.com>; abidmuslim@gmail.com; Vaishay, Shraiysh <Shraiysh.Vaishay@amd.com>; kirankumartp@gmail.com; kiranchandramohan@gmail.com
Subject: [Omp-lang] 【Technical】 Questions about the data race problem for firstprivate clause

Hi OpenMP committee,

I am Peixin from Huawei. I am working on Flang OpenMP development in LLVM. We have some questions about the data race problem for firstprivate clause.

As OpenMP 5.0 Spec 2.19.4.4, to avoid data races, concurrent updates of the original list item must be synchronized with the read of the original list item that occurs as a result of the firstprivate clause.

We have some discussions about this in LLVM: https://discourse.llvm.org/t/issues-with-the-current-implementation-of-privatization-in-openmp-with-fortran/62335.

Question 1: Does the OpenMP Spec mean that there is one implicit barrier at the beginning of the firstprivate region? Or similar thing which can synchronize the data?

Question 2: The firstprivate clause can be in parallel, teams, sections, single, worksharing-loop, distribute, task, taskloop, and target constructs and combined/composite contructs containing those contructs. The synchronization is not needed for single, sections, task constructs since there is only one thread in the region of these three constructs (section region for sections construct), right?

All the best,
Peixin Qiao

flang/test/Lower/OpenMP/default-clause.f90
18

Yes. The previous understanding was wrong. I attached the discussion with OpenMP committee, which may help you and other reviewers understand this fix.

flang/test/Lower/OpenMP/omp-parallel-lastprivate-clause-scalar.f90
94

This patch is not to change anything about lastprivate lowering. But to my understanding, there is no need to do it in an atomic region since it checks the IV_CMP1, which is the loop index check. If you emit the LLVM IR, you will see the loop index is unique for each thread and it is managed by the runtime function.

NimishMishra accepted this revision.Aug 18 2022, 6:16 PM

LGTM.

flang/lib/Lower/OpenMP.cpp
220

This FIXME on sections is probably something I can pick up. If you have not started working on it, let me know.

flang/test/Lower/OpenMP/default-clause.f90
18

Thank you. The discussion is helpful. I guess it sounds ok to offload the job of inserting barriers to the application.

This revision is now accepted and ready to land.Aug 18 2022, 6:16 PM
peixin added inline comments.Aug 18 2022, 6:21 PM
flang/lib/Lower/OpenMP.cpp
220

Sure. You can take it up. Actually you have taken up this :) https://flang-compiler.slack.com/archives/C01PY03PP9P/p1657855039246809.

BTW, please double check the behavior of clang when you work on this. I didn't dig too much about this issue in sections directive.

The other implementations such as the lowering of firstprivate in single construct depends on this implementation. Will land this after waiting for two more days if no further review.

NimishMishra added inline comments.Aug 18 2022, 6:41 PM
flang/lib/Lower/OpenMP.cpp
220

Right. Thanks. I will fix this in https://reviews.llvm.org/D131463. Will check clang's behaviour too.

Have a few Nits.
In the summary please call out what is implemented in the patch (Remove barriers for firstprivate except if the variable is also lastprivatized. Re-arrange code and put all last-privates inside a single scf.if) and then provide reasons.

For the lastprivate and firstprivate case, the standard explicitly calls this out. So it will be better to put a reference to the standard.
In https://www.openmp.org/spec-html/5.0/openmpsu105.html the following statement is there.

If a list item appears in both firstprivate and lastprivate clauses, the update required for the lastprivate clause occurs after all initializations for the firstprivate clause.
flang/lib/Lower/OpenMP.cpp
225

Nit: Add a line below.

peixin updated this revision to Diff 454151.Aug 19 2022, 6:26 PM
peixin edited the summary of this revision. (Show Details)

Address the comments from @kiranchandramohan .