This is an archive of the discontinued LLVM Phabricator instance.

Differential D128596

[flang][OpenMP] Support privatization for single construct
ClosedPublic

Authored by peixin on Jun 25 2022, 11:18 PM.

Details

Reviewers

shraiysh
kiranchandramohan
clementval
kiranktp
Leporacanthicus
NimishMishra
arnamoy10
sscalpone
jdoerfert
nicolasvasilache

Commits

rGf4accbf55f4d: [flang][OpenMP] Support privatization for single construct

Summary

This supports the lowering of private and firstprivate clauses in single
construct. The alloca ops are emitted in the entry block according to
https://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas, and
the load/store ops are emitted in the single region. The data race
problem is handled in OMPIRBuilder. That is, the barrier is emitted in
OMPIRBuilder.

Co-authored-by: Nimish Mishra <neelam.nimish@gmail.com>

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

peixin created this revision. Jun 25 2022, 11:18 PM

Herald added a reviewer: sscalpone. Jun 25 2022, 11:18 PM

Herald added a project: Restricted Project.

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 22 others.

peixin requested review of this revision. Jun 25 2022, 11:18 PM

Herald added a reviewer: jdoerfert. Jun 25 2022, 11:18 PM

Herald added a project: Restricted Project.

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache.

peixin edited the summary of this revision. Jun 25 2022, 11:19 PM

peixin added a parent revision: D128595: [flang][OpenMP] Fix the data race problem for firstprivate clause.

Harbormaster completed remote builds in B172055: Diff 440029. Jun 25 2022, 11:19 PM

I am preparing a patch to fix the same data-race problem in classic-flang. It seems the barrier is not necessary for the task and single constructs. BTW, our test team found that the data race is more obvious when both firstprivate and lastprivate are present, and not only for pointer variables. Once we agree on a final decision about the data-race problem, I will submit a similar patch to classic-flang, too.

peixin planned changes to this revision. Jul 7 2022, 6:08 PM

@peixin What is the status of this patch? Are you planning to modify it, or is it ready for review?

@domada I will update this patch later this week.

Rebase and address the data race problem for single construct.

Herald added a reviewer: nicolasvasilache. Sep 16 2022, 8:41 PM

Harbormaster completed remote builds in B187298: Diff 460962. Sep 16 2022, 8:42 PM

peixin removed a parent revision: D128595: [flang][OpenMP] Fix the data race problem for firstprivate clause. Sep 16 2022, 8:42 PM

domada added inline comments. Sep 19 2022, 1:37 AM

flang/test/Lower/OpenMP/single.f90
2	Why do you use bbc instead of flang-new?

peixin added inline comments. Sep 19 2022, 1:49 AM

flang/test/Lower/OpenMP/single.f90
2	I use both `bbc` and `flang-new` for the lowering test. I also removed the `fir-opt` check since it should not be here. @kiranchandramohan planned to change the lowering tests (removing the lowering-to-LLVM-IR tests under flang/test/Lower/OpenMP) after upstreaming is done. @kiranchandramohan When do you plan to change them?

The patch looks OK to me, but I think it would be better if somebody else also reviewed it.

Thanks @peixin for this patch.

I have a couple of questions/concerns. See comments inline.

flang/lib/Lower/OpenMP.cpp
524	The barrier is created by the OpenMPIRBuilder. `copyprivate`'s barrier can be handled separately I guess. https://github.com/llvm/llvm-project/blob/29b37f319ac310e9c023d7c707ecfe1f709807ae/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp#L3352
flang/test/Lower/OpenMP/single.f90
2	Sorry about the delay here. It is on my list. I will get to it soon.
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
241 ↗	(On Diff #460962)	The single is inlined usually, is the OpenMPIRBuilder behaviour different? We might want to change the name of the Interface if we really want the alloca to be inside the single region. Have you checked the Clang behaviour?

This revision now requires changes to proceed. Sep 19 2022, 2:52 PM

peixin updated this revision to Diff 462876. Sep 26 2022, 5:49 AM

peixin edited the summary of this revision.

Herald added a subscriber: zero9178. Sep 26 2022, 5:49 AM

Thanks @kiranchandramohan for pointing out the problem.

You are right. The barrier is emitted in OMPIRBuilder, and the copyprivate should be handled in OMPIRBuilder, too. Clang emits the private/firstprivate in CodeGen (https://github.com/llvm/llvm-project/blob/2188cf9fa4d012b3ce484f9e97b66581569c1157/clang/lib/CodeGen/CGStmtOpenMP.cpp#L4196-L4197) before and after single runtime calls. Also, check the following case:

void sub(float a, double b);
float x = 1.0;
double y = 2.0;
void test() {
  #pragma omp single private(x) firstprivate(y)
  {
    sub(x, y);
  }
}

The generated IR is as follows:

@x = dso_local global float 1.000000e+00, align 4
@y = dso_local global double 2.000000e+00, align 8
@0 = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@1 = private unnamed_addr constant %struct.ident_t { i32 0, i32 2, i32 0, i32 22, ptr @0 }, align 8
@2 = private unnamed_addr constant %struct.ident_t { i32 0, i32 322, i32 0, i32 22, ptr @0 }, align 8

; Function Attrs: noinline nounwind optnone uwtable
define dso_local void @test() #0 {
entry:
  %y = alloca double, align 8
  %x = alloca float, align 4
  %0 = call i32 @__kmpc_global_thread_num(ptr @1)
  %1 = call i32 @__kmpc_single(ptr @1, i32 %0)
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %omp_if.then, label %omp_if.end

omp_if.then:                                      ; preds = %entry
  %3 = load double, ptr @y, align 8
  store double %3, ptr %y, align 8
  %4 = load float, ptr %x, align 4
  %5 = load double, ptr %y, align 8
  call void @sub(float noundef %4, double noundef %5)
  call void @__kmpc_end_single(ptr @1, i32 %0)
  br label %omp_if.end

omp_if.end:                                       ; preds = %omp_if.then, %entry
  call void @__kmpc_barrier(ptr @2, i32 %0)
  ret void
}

The private and firstprivate operations are both in the single region, so there is no need for any additional work in flang lowering.

BTW, I filed a ticket for a missing sema check for the firstprivate clause: https://github.com/llvm/llvm-project/issues/57983

Harbormaster completed remote builds in B188683: Diff 462876. Sep 26 2022, 6:26 AM

In D128596#3814860, @peixin wrote:

Clang emits the private/firstprivate in CodeGen (https://github.com/llvm/llvm-project/blob/2188cf9fa4d012b3ce484f9e97b66581569c1157/clang/lib/CodeGen/CGStmtOpenMP.cpp#L4196-L4197) before and after single runtime calls.

Is it OK not to add the interface to the single operation in this case, keep the interface only for operations that are outlined, and reinstate its original name, OutlineableOpenMPOpInterface?

Of course. The interface is mainly used to get the alloca block currently. The following would work for the single construct.

diff --git a/flang/lib/Lower/OpenMP.cpp b/flang/lib/Lower/OpenMP.cpp
index e1cba68e1686..4f2824f2c591 100644
--- a/flang/lib/Lower/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP.cpp
@@ -203,6 +203,8 @@ static bool privatizeVars(Op &op, Fortran::lower::AbstractConverter &converter,
   bool needBarrier = false;
   if (mlir::isa<mlir::omp::SectionOp>(op))
     firOpBuilder.setInsertionPointToStart(&op.getRegion().back());
+  else if (mlir::isa<mlir::omp::SingleOp>(op))
+    firOpBuilder.setInsertionPointToStart(&op.getRegion().front());
   else
     firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());
   for (auto sym : privatizedSymbols) {

Do you prefer this solution?

Why wouldn't getAllocaBlock work by itself? From our understanding of Clang it is not required for the private allocas to be inside the runtime calls, so getAllocaBlock will hoist it to the nearest region that will be outlined. Isn't that sufficient?

firOpBuilder.getAllocaBlock() calls getEntryBlock() if the op is not an OutlineableOpenMPOpInterface, and getEntryBlock() in turn calls getFunction().front(). For the following case, it will hoist the private/firstprivate operations outside the parallel region, which is wrong.

subroutine single_private(x, y)
  real :: x
  real(8) :: y

  !$omp parallel
  !$omp single private(x) firstprivate(y)
  call bar(x, y)
  !$omp end single
  !$omp end parallel
end subroutine single_private

The code (as given below) tries to get a parent of type OutlineableOpenMPOpInterface which the parallel operation has. If this is not happening then there is a bug.

/// Get the block for adding Allocas.
mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
  auto iface =
      getRegion().getParentOfType<mlir::omp::OutlineableOpenMPOpInterface>();
  return iface ? iface.getAllocaBlock() : getEntryBlock();
}
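
The lookup in the snippet above can be pictured with a small stand-alone model. This is only a sketch: ToyOp and findAllocaOwner are hypothetical names, not MLIR or flang APIs, and an operation is conflated with the region it owns to keep the example short.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an operation that owns a region. This is NOT the MLIR API;
// every name here is hypothetical.
struct ToyOp {
  std::string name;
  const ToyOp *parent; // enclosing operation; nullptr for the function
  bool outlineable;    // models implementing OutlineableOpenMPOpInterface
};

// Walk outwards from `op` (inclusive) to the nearest outlineable operation.
// If there is none, fall back to the outermost operation (the function),
// which plays the role of getEntryBlock() in the snippet above.
const ToyOp *findAllocaOwner(const ToyOp *op) {
  assert(op && "expected a valid operation");
  const ToyOp *outermost = op;
  for (const ToyOp *cur = op; cur; cur = cur->parent) {
    if (cur->outlineable)
      return cur;
    outermost = cur;
  }
  return outermost;
}
```

Under these assumptions, a single nested in a parallel resolves to the parallel (the allocas stay inside the outlined region), while a single placed directly in a subroutine resolves to the function itself.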

Right, that case works OK. For the following case, the private clause works OK with firOpBuilder.getAllocaBlock(), but the implementation of the firstprivate clause is not reasonable. The alloca ops can be hoisted outside the single region, but the store op for the firstprivate clause should be in the single region. Clang puts all load and store ops in the single region. There is no execution error if all alloca, load, and store ops are placed outside the single region, but performance is worse. In fact, moving the alloca inside the single region uses less memory, since only one copy of the variable is allocated for the single thread instead of one copy per thread.

subroutine single_private(x, y)
  real :: x
  real(8) :: y

  !$omp single private(x) firstprivate(y)
  call bar(x, y)
  !$omp end single
end subroutine single_private

The generated IR for the wsloop does not seem to be OK either.

subroutine firstprivate_complex2(arg1, arg2)
        complex(4) :: arg1
        complex(8) :: arg2

!$OMP DO FIRSTPRIVATE(arg1, arg2)
do i = 1,10
        call foo(arg1, arg2)
enddo
!$OMP end DO

end subroutine

func.func @_QPfirstprivate_complex2(%arg0: !fir.ref<!fir.complex<4>> {fir.bindc_name = "arg1"}, %arg1: !fir.ref<!fir.complex<8>> {fir.bindc_name = "arg2"}) {
  %0 = fir.alloca !fir.complex<4> {bindc_name = "arg1", pinned, uniq_name = "_QFfirstprivate_complex2Earg1"}
  %1 = fir.load %arg0 : !fir.ref<!fir.complex<4>>
  fir.store %1 to %0 : !fir.ref<!fir.complex<4>>
  %2 = fir.alloca !fir.complex<8> {bindc_name = "arg2", pinned, uniq_name = "_QFfirstprivate_complex2Earg2"}
  %3 = fir.load %arg1 : !fir.ref<!fir.complex<8>>
  fir.store %3 to %2 : !fir.ref<!fir.complex<8>>
  %4 = fir.alloca i32 {adapt.valuebyref}
  %5 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFfirstprivate_complex2Ei"}
  %c1_i32 = arith.constant 1 : i32
  %c10_i32 = arith.constant 10 : i32
  %c1_i32_0 = arith.constant 1 : i32
  omp.wsloop   for  (%arg2) : i32 = (%c1_i32) to (%c10_i32) inclusive step (%c1_i32_0) {
    fir.store %arg2 to %4 : !fir.ref<i32>
    fir.call @_QPfoo(%0, %2) : (!fir.ref<!fir.complex<4>>, !fir.ref<!fir.complex<8>>) -> ()
    omp.yield
  }
  return
}

For some loop indexes, there is no need to allocate the memory and perform the firstprivate operations. For example, running do i = 1, 10 with 8 threads needs only 2 threads in the second round. Why do we perform the privatization (private/firstprivate) for every thread? With a static schedule, the privatization should happen after __kmpc_for_static_init and before __kmpc_for_static_fini.
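
The arithmetic behind the "second round" remark can be sketched as follows. The thread does not state the schedule, so this assumes a round-robin static schedule with chunk size 1; the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Iterations executed by each thread under a round-robin static schedule
// with chunk size 1 (an assumption; the comment above does not name one).
std::vector<int> iterationsPerThread(int numIterations, int numThreads) {
  assert(numIterations >= 0 && numThreads > 0);
  std::vector<int> counts(numThreads, 0);
  for (int i = 0; i < numIterations; ++i)
    ++counts[i % numThreads];
  return counts;
}

// Number of threads that still have an iteration in the given 0-based round.
int activeThreadsInRound(int numIterations, int numThreads, int round) {
  assert(numIterations >= 0 && numThreads > 0 && round >= 0);
  int remaining = numIterations - round * numThreads;
  if (remaining <= 0)
    return 0;
  return remaining < numThreads ? remaining : numThreads;
}
```

With 10 iterations and 8 threads, activeThreadsInRound(10, 8, 0) is 8 and activeThreadsInRound(10, 8, 1) is 2, which matches the figure quoted in the comment.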

I think, in general, LLVM recommends for performance reasons to put allocas in the entry block. See https://llvm.org/docs/Frontend/PerformanceTips.html#use-of-allocas.

Yes, there will be some over-allocation. But also note that if we did not hoist and there was a loop outside the single construct then this will lead to run-away allocations without a stacksave and stackrestore.

The case of a loop is a little more complicated, since we need to place the allocas for loops in the header of the loop, which is not available in the loop form; it is only available when we break the loop into control-flow form. So it is not possible to do it in the high-level MLIR.

These issues will probably be resolved when we move privatisation clauses to the OpenMP dialect and handle it in the OpenMPIRBuilder.

The alloca ops can be hoisted outside the single region, but the store op for firstprivate clause should be in single region.

Yes, this is important. Otherwise, any updates to the value will not be available. I don't remember whether we fixed the issue or whether it is still present. It was captured in https://github.com/flang-compiler/f18-llvm-project/issues/1171#issuecomment-1119997442
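
The run-away allocation concern raised in this comment can be illustrated with a toy stack-frame model. This is a sketch only: ToyFrame and the three helper functions are hypothetical and merely mimic how an alloca grows a frame until the stack pointer is restored.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a stack frame: an alloca only grows the frame unless the
// stack pointer is explicitly saved and restored. Not an LLVM API.
struct ToyFrame {
  std::size_t used = 0;
  std::size_t peak = 0;
  void allocaBytes(std::size_t n) {
    used += n;
    if (used > peak)
      peak = used;
  }
  std::size_t save() const { return used; }       // models stacksave
  void restore(std::size_t mark) { used = mark; } // models stackrestore
};

// alloca emitted inside a loop body, with no save/restore around it.
std::size_t peakWithAllocaInLoop(int trips, std::size_t bytes) {
  ToyFrame frame;
  for (int i = 0; i < trips; ++i)
    frame.allocaBytes(bytes);
  return frame.peak;
}

// alloca hoisted to the entry block: one slot reused by every iteration.
std::size_t peakWithHoistedAlloca(int trips, std::size_t bytes) {
  ToyFrame frame;
  frame.allocaBytes(bytes);
  for (int i = 0; i < trips; ++i) {
    // The loop body would reuse the single hoisted slot here.
  }
  return frame.peak;
}

// alloca kept in the loop but bracketed by save/restore.
std::size_t peakWithSaveRestore(int trips, std::size_t bytes) {
  ToyFrame frame;
  for (int i = 0; i < trips; ++i) {
    std::size_t mark = frame.save();
    frame.allocaBytes(bytes);
    frame.restore(mark);
  }
  return frame.peak;
}
```

In this model the un-hoisted variant grows linearly with the trip count, while the hoisted and the save/restore variants stay at a single slot.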

peixin updated this revision to Diff 463567. Sep 28 2022, 8:01 AM

peixin edited the summary of this revision.

I see. Thanks for the explanations. I borrowed some code from D133686 (and added @NimishMishra as co-author) to change the insertion point. The alloca ops are now emitted at the beginning of the entry block, and the load/store ops in the single region.

The issue in wsloop hasn't been fixed. It may have to wait until the privatization is done in MLIR.
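
The placement described in this comment (alloca in the entry block, load/store at the start of the single region, builder position restored afterwards) can be modelled with a toy builder. This is a sketch under stated assumptions: ToyBlock, ToyInsertPoint, ToyBuilder and privatizeFirstprivate are hypothetical and only imitate the save/restore insertion-point pattern; they are not the MLIR or flang classes.

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy stand-ins; none of these names are MLIR or flang APIs.
using ToyBlock = std::list<std::string>; // a block is a list of op names

struct ToyInsertPoint {
  ToyBlock *block;
  ToyBlock::iterator pos; // new ops are inserted before this position
  ToyInsertPoint() : block(nullptr), pos() {}
  ToyInsertPoint(ToyBlock *b, ToyBlock::iterator p) : block(b), pos(p) {}
  bool isSet() const { return block != nullptr; }
};

class ToyBuilder {
public:
  ToyInsertPoint saveInsertionPoint() const { return ip; }
  void restoreInsertionPoint(const ToyInsertPoint &p) { ip = p; }
  void setInsertionPointToStart(ToyBlock *b) {
    ip = ToyInsertPoint(b, b->begin());
  }
  void setInsertionPointToEnd(ToyBlock *b) {
    ip = ToyInsertPoint(b, b->end());
  }
  void create(const std::string &op) {
    assert(ip.isSet() && "insertion point must be set");
    ip.block->insert(ip.pos, op); // std::list keeps ip.pos valid
  }

private:
  ToyInsertPoint ip;
};

// Emit the alloca in the entry block and the firstprivate copy at the start
// of the single region, then put the builder back where it was.
void privatizeFirstprivate(ToyBuilder &builder, ToyBlock &entry,
                           ToyBlock &singleRegion, const std::string &var) {
  ToyInsertPoint saved = builder.saveInsertionPoint();
  builder.setInsertionPointToStart(&entry);
  builder.create("alloca " + var);
  builder.setInsertionPointToStart(&singleRegion);
  builder.create("load " + var);
  builder.create("store " + var);
  builder.restoreInsertionPoint(saved);
}
```

A linked list is used so that a saved position stays valid while ops are inserted elsewhere in the same block, which is the property the save/restore pattern relies on.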

Harbormaster completed remote builds in B189183: Diff 463567. Sep 28 2022, 8:35 AM

LGTM.

This revision is now accepted and ready to land. Oct 5 2022, 3:31 AM

Closed by commit rGf4accbf55f4d: [flang][OpenMP] Support privatization for single construct (authored by peixin). Oct 5 2022, 5:23 AM

This revision was automatically updated to reflect the committed changes.

peixin added a commit: rGf4accbf55f4d: [flang][OpenMP] Support privatization for single construct.

Revision Contents

Path                                             Size
flang/include/flang/Lower/AbstractConverter.h    6 lines
flang/lib/Lower/Bridge.cpp                       17 lines
flang/lib/Lower/OpenMP.cpp                       39 lines
flang/test/Lower/OpenMP/single.f90               110 lines

Diff 465347

flang/include/flang/Lower/AbstractConverter.h

[... 12 unchanged lines not shown ...]
 #ifndef FORTRAN_LOWER_ABSTRACTCONVERTER_H
 #define FORTRAN_LOWER_ABSTRACTCONVERTER_H

 #include "flang/Common/Fortran.h"
 #include "flang/Lower/LoweringOptions.h"
 #include "flang/Lower/PFTDefs.h"
 #include "flang/Optimizer/Builder/BoxValue.h"
 #include "flang/Semantics/symbol.h"
+#include "mlir/IR/Builders.h"
 #include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/Operation.h"
 #include "llvm/ADT/ArrayRef.h"

 namespace fir {
 class KindMapping;
 class FirOpBuilder;
 } // namespace fir
[... 69 unchanged lines not shown ...] public:
   /// Get the code defined by a label
   virtual pft::Evaluation *lookupLabel(pft::Label label) = 0;

   /// For a given symbol which is host-associated, create a clone using
   /// parameters from the host-associated symbol.
   virtual bool
   createHostAssociateVarClone(const Fortran::semantics::Symbol &sym) = 0;

-  virtual void copyHostAssociateVar(const Fortran::semantics::Symbol &sym,
-                                    mlir::Block *lastPrivBlock = nullptr) = 0;
+  virtual void copyHostAssociateVar(
+      const Fortran::semantics::Symbol &sym,
+      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) = 0;

   /// Collect the set of symbols with \p flag in \p eval
   /// region if \p collectSymbols is true. Likewise, collect the
   /// set of the host symbols with \p flag of the associated symbols in \p eval
   /// region if collectHostAssociatedSymbols is true.
   virtual void collectSymbolSet(
       pft::Evaluation &eval,
       llvm::SetVector<const Fortran::semantics::Symbol *> &symbolSet,
[... 137 unchanged lines not shown ...]

flang/lib/Lower/Bridge.cpp

[... 510 unchanged lines not shown ...] for (auto &oper : curRegion.getOps()) {
         if (oper.getOperand(ii) == oldVal) {
           oper.setOperand(ii, cloneVal);
         }
       }
     }
     return bindIfNewSymbol(sym, exv);
   }

-  // FIXME: Generalize this function, so that lastPrivBlock can be removed
-  void
-  copyHostAssociateVar(const Fortran::semantics::Symbol &sym,
-                       mlir::Block *lastPrivBlock = nullptr) override final {
+  void copyHostAssociateVar(
+      const Fortran::semantics::Symbol &sym,
+      mlir::OpBuilder::InsertPoint *copyAssignIP = nullptr) override final {
     // 1) Fetch the original copy of the variable.
     assert(sym.has<Fortran::semantics::HostAssocDetails>() &&
            "No host-association found");
     const Fortran::semantics::Symbol &hsym = sym.GetUltimate();
     Fortran::lower::SymbolBox hsb = lookupOneLevelUpSymbol(hsym);
     assert(hsb && "Host symbol box not found");
     fir::ExtendedValue hexv = getExtendedValue(hsb);

     // 2) Fetch the copied one that will mask the original.
     Fortran::lower::SymbolBox sb = shallowLookupSymbol(sym);
     assert(sb && "Host-associated symbol box not found");
     assert(hsb.getAddr() != sb.getAddr() &&
            "Host and associated symbol boxes are the same");
     fir::ExtendedValue exv = getExtendedValue(sb);

     // 3) Perform the assignment.
     mlir::OpBuilder::InsertPoint insPt = builder->saveInsertionPoint();
-    if (lastPrivBlock)
-      builder->setInsertionPointToStart(lastPrivBlock);
+    if (copyAssignIP && copyAssignIP->isSet())
+      builder->restoreInsertionPoint(*copyAssignIP);
     else
       builder->setInsertionPointAfter(fir::getBase(exv).getDefiningOp());

     fir::ExtendedValue lhs, rhs;
-    if (lastPrivBlock) {
+    if (copyAssignIP && copyAssignIP->isSet() &&
+        sym.test(Fortran::semantics::Symbol::Flag::OmpLastPrivate)) {
       // lastprivate case
       lhs = hexv;
       rhs = exv;
     } else {
       lhs = exv;
       rhs = hexv;
     }

     mlir::Location loc = genLocation(sym.name());
     mlir::Type symType = genType(sym);
     if (auto seqTy = symType.dyn_cast<fir::SequenceType>()) {
       Fortran::lower::StatementContext stmtCtx;
       Fortran::lower::createSomeArrayAssignment(*this, lhs, rhs, localSymbols,
                                                 stmtCtx);
       stmtCtx.finalize();
     } else if (hexv.getBoxOf<fir::CharBoxValue>()) {
       fir::factory::CharacterExprHelper{*builder, loc}.createAssign(lhs, rhs);
     } else if (hexv.getBoxOf<fir::MutableBoxValue>()) {
       TODO(loc, "firstprivatisation of allocatable variables");
     } else {
       auto loadVal = builder->create<fir::LoadOp>(loc, fir::getBase(rhs));
       builder->create<fir::StoreOp>(loc, loadVal, fir::getBase(lhs));
     }

-    if (lastPrivBlock)
+    if (copyAssignIP && copyAssignIP->isSet() &&
+        sym.test(Fortran::semantics::Symbol::Flag::OmpLastPrivate))
       builder->restoreInsertionPoint(insPt);
   }

   //===--------------------------------------------------------------------===//
   // Utility methods
   //===--------------------------------------------------------------------===//

   void collectSymbolSet(
[... 2,804 unchanged lines not shown ...]
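
The lhs/rhs selection in copyHostAssociateVar above determines the direction of the copy. A minimal stand-alone sketch of that choice follows; ToyVar, CopyKind and copyHostAssociate are hypothetical names, and a plain double stands in for the FIR values.

```cpp
#include <cassert>

// Hypothetical stand-ins for the host variable and its privatized clone.
struct ToyVar {
  double value;
};

enum class CopyKind { FirstPrivate, LastPrivate };

// firstprivate: initialise the clone from the host variable.
// lastprivate: write the clone back to the host variable.
void copyHostAssociate(ToyVar &host, ToyVar &clone, CopyKind kind) {
  assert(&host != &clone && "host and clone must be distinct");
  ToyVar &lhs = (kind == CopyKind::LastPrivate) ? host : clone;
  const ToyVar &rhs = (kind == CopyKind::LastPrivate) ? clone : host;
  lhs.value = rhs.value;
}
```

In the patch itself the lastprivate direction is taken only when an insertion point is supplied and the symbol carries the OmpLastPrivate flag; the enum here collapses that condition into a single parameter.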

flang/lib/Lower/OpenMP.cpp

[... 55 unchanged lines not shown ...] std::visit(Fortran::common::visitors{
                     sym = name->symbol;
                   }
                 },
                 [&](const Fortran::parser::Name &name) { sym = name.symbol; }},
             ompObject.u);
   return sym;
 }

-static void
-privatizeSymbol(Fortran::lower::AbstractConverter &converter,
+template <typename Op>
+static void privatizeSymbol(
+    Op &op, Fortran::lower::AbstractConverter &converter,
     const Fortran::semantics::Symbol *sym,
-    [[maybe_unused]] mlir::Block *lastPrivBlock = nullptr) {
+    [[maybe_unused]] mlir::OpBuilder::InsertPoint *lastPrivIP = nullptr) {
   // Privatization for symbols which are pre-determined (like loop index
   // variables) happen separately, for everything else privatize here.
   if (sym->test(Fortran::semantics::Symbol::Flag::OmpPreDetermined))
     return;
   bool success = converter.createHostAssociateVarClone(*sym);
   (void)success;
   assert(success && "Privatization failed due to existing binding");
-  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate))
-    converter.copyHostAssociateVar(*sym);
+  if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate)) {
+    fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
+    mlir::OpBuilder::InsertPoint firstPrivIP, insPt;
+    if (mlir::isa<mlir::omp::SingleOp>(op)) {
+      insPt = firOpBuilder.saveInsertionPoint();
+      firOpBuilder.setInsertionPointToStart(&op.getRegion().front());
+      firstPrivIP = firOpBuilder.saveInsertionPoint();
+    }
+    converter.copyHostAssociateVar(*sym, &firstPrivIP);
+    if (mlir::isa<mlir::omp::SingleOp>(op))
+      firOpBuilder.restoreInsertionPoint(insPt);
+  }
   if (sym->test(Fortran::semantics::Symbol::Flag::OmpLastPrivate))
-    converter.copyHostAssociateVar(*sym, lastPrivBlock);
+    converter.copyHostAssociateVar(*sym, lastPrivIP);
 }

 template <typename Op>
 static bool privatizeVars(Op &op, Fortran::lower::AbstractConverter &converter,
                           const Fortran::parser::OmpClauseList &opClauseList,
                           Fortran::lower::pft::Evaluation &eval) {
   fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
   auto insPt = firOpBuilder.saveInsertionPoint();
   // Symbols in private, firstprivate, and/or lastprivate clauses.
   llvm::SetVector<const Fortran::semantics::Symbol *> privatizedSymbols;
auto collectOmpObjectListSymbol =		auto collectOmpObjectListSymbol =
[&](const Fortran::parser::OmpObjectList &ompObjectList,		[&](const Fortran::parser::OmpObjectList &ompObjectList,
llvm::SetVector<const Fortran::semantics::Symbol *> &symbolSet) {		llvm::SetVector<const Fortran::semantics::Symbol *> &symbolSet) {
for (const Fortran::parser::OmpObject &ompObject : ompObjectList.v) {		for (const Fortran::parser::OmpObject &ompObject : ompObjectList.v) {
Fortran::semantics::Symbol *sym = getOmpObjectSymbol(ompObject);		Fortran::semantics::Symbol *sym = getOmpObjectSymbol(ompObject);
symbolSet.insert(sym);		symbolSet.insert(sym);
}		}
};		};
// We need just one ICmpOp for multiple LastPrivate clauses.		// We need just one ICmpOp for multiple LastPrivate clauses.
mlir::arith::CmpIOp cmpOp;		mlir::arith::CmpIOp cmpOp;
mlir::Block *lastPrivBlock = nullptr;		mlir::OpBuilder::InsertPoint lastPrivIP;
bool hasLastPrivateOp = false;		bool hasLastPrivateOp = false;
for (const Fortran::parser::OmpClause &clause : opClauseList.v) {		for (const Fortran::parser::OmpClause &clause : opClauseList.v) {
if (const auto &privateClause =		if (const auto &privateClause =
std::get_if<Fortran::parser::OmpClause::Private>(&clause.u)) {		std::get_if<Fortran::parser::OmpClause::Private>(&clause.u)) {
collectOmpObjectListSymbol(privateClause->v, privatizedSymbols);		collectOmpObjectListSymbol(privateClause->v, privatizedSymbols);
} else if (const auto &firstPrivateClause =		} else if (const auto &firstPrivateClause =
std::get_if<Fortran::parser::OmpClause::Firstprivate>(		std::get_if<Fortran::parser::OmpClause::Firstprivate>(
&clause.u)) {		&clause.u)) {
[...]	if (const auto &privateClause =
if (!hasLastPrivateOp) {		if (!hasLastPrivateOp) {
cmpOp = firOpBuilder.create<mlir::arith::CmpIOp>(		cmpOp = firOpBuilder.create<mlir::arith::CmpIOp>(
wsLoopOp->getLoc(), mlir::arith::CmpIPredicate::eq,		wsLoopOp->getLoc(), mlir::arith::CmpIPredicate::eq,
wsLoopOp->getRegion().front().getArguments()[0],		wsLoopOp->getRegion().front().getArguments()[0],
wsLoopOp->getUpperBound()[0]);		wsLoopOp->getUpperBound()[0]);
}		}
mlir::scf::IfOp ifOp = firOpBuilder.create<mlir::scf::IfOp>(		mlir::scf::IfOp ifOp = firOpBuilder.create<mlir::scf::IfOp>(
wsLoopOp->getLoc(), cmpOp, /*else*/ false);		wsLoopOp->getLoc(), cmpOp, /*else*/ false);
lastPrivBlock = &ifOp.getThenRegion().front();		firOpBuilder.setInsertionPointToStart(&ifOp.getThenRegion().front());
		lastPrivIP = firOpBuilder.saveInsertionPoint();
} else {		} else {
TODO(converter.getCurrentLocation(),		TODO(converter.getCurrentLocation(),
"lastprivate clause in constructs other than worksharing-loop");		"lastprivate clause in constructs other than worksharing-loop");
}		}
collectOmpObjectListSymbol(lastPrivateClause->v, privatizedSymbols);		collectOmpObjectListSymbol(lastPrivateClause->v, privatizedSymbols);
hasLastPrivateOp = true;		hasLastPrivateOp = true;
}		}
}		}
[...]	static bool privatizeVars(Op &op, Fortran::lower::AbstractConverter &converter,
}		}

bool needBarrier = false;		bool needBarrier = false;
if (mlir::isa<mlir::omp::SectionOp>(op))		if (mlir::isa<mlir::omp::SectionOp>(op))
firOpBuilder.setInsertionPointToStart(&op.getRegion().back());		firOpBuilder.setInsertionPointToStart(&op.getRegion().back());
else		else
firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());		firOpBuilder.setInsertionPointToStart(firOpBuilder.getAllocaBlock());
for (auto sym : privatizedSymbols) {		for (auto sym : privatizedSymbols) {
privatizeSymbol(converter, sym, lastPrivBlock);		privatizeSymbol(op, converter, sym, &lastPrivIP);
if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate) &&		if (sym->test(Fortran::semantics::Symbol::Flag::OmpFirstPrivate) &&
sym->test(Fortran::semantics::Symbol::Flag::OmpLastPrivate))		sym->test(Fortran::semantics::Symbol::Flag::OmpLastPrivate))
needBarrier = true;		needBarrier = true;
}		}

for (auto sym : defaultSymbols)		for (auto sym : defaultSymbols)
if (!symbolsInNestedRegions.contains(sym) &&		if (!symbolsInNestedRegions.contains(sym) &&
!symbolsInParentRegions.contains(sym) &&		!symbolsInParentRegions.contains(sym) &&
!privatizedSymbols.contains(sym))		!privatizedSymbols.contains(sym))
privatizeSymbol(converter, sym);		privatizeSymbol(op, converter, sym);

// Emit implicit barrier to synchronize threads and avoid data races on		// Emit implicit barrier to synchronize threads and avoid data races on
// initialization of firstprivate variables and post-update of lastprivate		// initialization of firstprivate variables and post-update of lastprivate
// variables.		// variables.
// FIXME: Emit barrier for lastprivate clause when 'sections' directive has		// FIXME: Emit barrier for lastprivate clause when 'sections' directive has
// 'nowait' clause. Otherwise, emit barrier when 'sections' directive has		// 'nowait' clause. Otherwise, emit barrier when 'sections' directive has
// both firstprivate and lastprivate clause.		// both firstprivate and lastprivate clause.
// Emit implicit barrier for linear clause. Maybe on somewhere else.		// Emit implicit barrier for linear clause. Maybe on somewhere else.
[...]	createBodyOfOp(Op &op, Fortran::lower::AbstractConverter &converter,
// construct, create empty blocks for all evaluations.		// construct, create empty blocks for all evaluations.
if (eval.lowerAsUnstructured() && !outerCombined)		if (eval.lowerAsUnstructured() && !outerCombined)
createEmptyRegionBlocks(firOpBuilder, eval.getNestedEvaluations());		createEmptyRegionBlocks(firOpBuilder, eval.getNestedEvaluations());

// Insert the terminator.		// Insert the terminator.
if constexpr (std::is_same_v<Op, omp::WsLoopOp> ||		if constexpr (std::is_same_v<Op, omp::WsLoopOp> ||
std::is_same_v<Op, omp::SimdLoopOp>) {		std::is_same_v<Op, omp::SimdLoopOp>) {
mlir::ValueRange results;		mlir::ValueRange results;
firOpBuilder.create<mlir::omp::YieldOp>(loc, results);		firOpBuilder.create<mlir::omp::YieldOp>(loc, results);
		kiranchandramohan (inline comment): The barrier is created by the OpenMPIRBuilder. `copyprivate`'s barrier can be handled separately I guess. https://github.com/llvm/llvm-project/blob/29b37f319ac310e9c023d7c707ecfe1f709807ae/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp#L3352
} else {		} else {
firOpBuilder.create<mlir::omp::TerminatorOp>(loc);		firOpBuilder.create<mlir::omp::TerminatorOp>(loc);
}		}

// Reset the insert point to before the terminator.		// Reset the insert point to before the terminator.
resetBeforeTerminator(firOpBuilder, storeOp, block);		resetBeforeTerminator(firOpBuilder, storeOp, block);

// Handle privatization. Do not privatize if this is the outer operation.		// Handle privatization. Do not privatize if this is the outer operation.
[...]	createBodyOfOp<omp::ParallelOp>(parallelOp, converter, currentLocation,
eval, &opClauseList);		eval, &opClauseList);
} else if (blockDirective.v == llvm::omp::OMPD_master) {		} else if (blockDirective.v == llvm::omp::OMPD_master) {
auto masterOp =		auto masterOp =
firOpBuilder.create<mlir::omp::MasterOp>(currentLocation, argTy);		firOpBuilder.create<mlir::omp::MasterOp>(currentLocation, argTy);
createBodyOfOp<omp::MasterOp>(masterOp, converter, currentLocation, eval);		createBodyOfOp<omp::MasterOp>(masterOp, converter, currentLocation, eval);
} else if (blockDirective.v == llvm::omp::OMPD_single) {		} else if (blockDirective.v == llvm::omp::OMPD_single) {
auto singleOp = firOpBuilder.create<mlir::omp::SingleOp>(		auto singleOp = firOpBuilder.create<mlir::omp::SingleOp>(
currentLocation, allocateOperands, allocatorOperands, nowaitAttr);		currentLocation, allocateOperands, allocatorOperands, nowaitAttr);
createBodyOfOp<omp::SingleOp>(singleOp, converter, currentLocation, eval);		createBodyOfOp<omp::SingleOp>(singleOp, converter, currentLocation, eval,
		&opClauseList);
} else if (blockDirective.v == llvm::omp::OMPD_ordered) {		} else if (blockDirective.v == llvm::omp::OMPD_ordered) {
auto orderedOp = firOpBuilder.create<mlir::omp::OrderedRegionOp>(		auto orderedOp = firOpBuilder.create<mlir::omp::OrderedRegionOp>(
currentLocation, /*simd=*/nullptr);		currentLocation, /*simd=*/nullptr);
createBodyOfOp<omp::OrderedRegionOp>(orderedOp, converter, currentLocation,		createBodyOfOp<omp::OrderedRegionOp>(orderedOp, converter, currentLocation,
eval);		eval);
} else if (blockDirective.v == llvm::omp::OMPD_task) {		} else if (blockDirective.v == llvm::omp::OMPD_task) {
auto taskOp = firOpBuilder.create<mlir::omp::TaskOp>(		auto taskOp = firOpBuilder.create<mlir::omp::TaskOp>(
currentLocation, ifClauseOperand, finalClauseOperand, untiedAttr,		currentLocation, ifClauseOperand, finalClauseOperand, untiedAttr,
mergeableAttr, /*in_reduction_vars=*/ValueRange(),		mergeableAttr, /*in_reduction_vars=*/ValueRange(),
/*in_reductions=*/nullptr, priorityClauseOperand, allocateOperands,		/*in_reductions=*/nullptr, priorityClauseOperand, allocateOperands,
allocatorOperands);		allocatorOperands);
createBodyOfOp(taskOp, converter, currentLocation, eval, &opClauseList);		createBodyOfOp(taskOp, converter, currentLocation, eval, &opClauseList);
} else if (blockDirective.v == llvm::omp::OMPD_taskgroup) {		} else if (blockDirective.v == llvm::omp::OMPD_taskgroup) {
// TODO: Add task_reduction support		// TODO: Add task_reduction support
auto taskGroupOp = firOpBuilder.create<mlir::omp::TaskGroupOp>(		auto taskGroupOp = firOpBuilder.create<mlir::omp::TaskGroupOp>(
currentLocation, /*task_reduction_vars=*/ValueRange(),		currentLocation, /*task_reduction_vars=*/ValueRange(),
/*task_reductions=*/nullptr, allocateOperands, allocatorOperands);		/*task_reductions=*/nullptr, allocateOperands, allocatorOperands);
createBodyOfOp(taskGroupOp, converter, currentLocation, eval,		createBodyOfOp(taskGroupOp, converter, currentLocation, eval,
&opClauseList);		&opClauseList);
} else {		} else {
TODO(converter.getCurrentLocation(), "Unhandled block directive");		TODO(converter.getCurrentLocation(), "Unhandled block directive");
}		}
[...]
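
Note on the insertion-point handling in privatizeSymbol above: the firstprivate path for omp.single saves the builder position, redirects the copy to the start of the single region, and then restores the position so that later allocas still land in the alloca block. The following is a minimal sketch of that pattern only. ToyBlock, ToyInsertPoint, ToyBuilder, copyHostVar and privatizeFirstPrivate are hypothetical stand-ins written for this note; they are not MLIR, FIR or flang APIs, and the fallback used when no insertion point is set is a simplifying assumption rather than the behaviour of Bridge.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a block: an ordered list of op names.
struct ToyBlock {
  std::vector<std::string> ops;
};

// Hypothetical stand-in for an insertion point; unset until a block is given.
struct ToyInsertPoint {
  ToyBlock *block = nullptr;
  std::size_t index = 0;
  bool isSet() const { return block != nullptr; }
};

// Hypothetical stand-in for a builder that inserts ops at its current point.
class ToyBuilder {
public:
  ToyInsertPoint saveInsertionPoint() const { return ip; }
  void restoreInsertionPoint(const ToyInsertPoint &p) { ip = p; }
  void setInsertionPointToStart(ToyBlock *b) {
    ip.block = b;
    ip.index = 0;
  }
  void create(const std::string &op) {
    assert(ip.isSet() && "no insertion point");
    auto pos = ip.block->ops.begin() + static_cast<std::ptrdiff_t>(ip.index);
    ip.block->ops.insert(pos, op);
    ++ip.index; // later ops are placed after the one just created
  }

private:
  ToyInsertPoint ip;
};

// Emits the host-to-private copy. If copyIP is non-null and set, the copy is
// emitted there and the previous position is restored afterwards. Otherwise
// the copy is emitted at the current position (assumption of this sketch).
void copyHostVar(ToyBuilder &b, const std::string &name,
                 ToyInsertPoint *copyIP) {
  ToyInsertPoint saved = b.saveInsertionPoint();
  bool redirected = copyIP && copyIP->isSet();
  if (redirected)
    b.restoreInsertionPoint(*copyIP);
  b.create("load host " + name);
  b.create("store priv " + name);
  if (redirected)
    b.restoreInsertionPoint(saved);
}

// Allocates the private copy at the current position and, when a single
// region is supplied, sends the firstprivate copy to the start of that region.
void privatizeFirstPrivate(ToyBuilder &b, ToyBlock *singleRegion,
                           const std::string &name) {
  b.create("alloca " + name);
  ToyInsertPoint firstPrivIP; // unset by default
  if (singleRegion) {
    firstPrivIP.block = singleRegion;
    firstPrivIP.index = 0;
  }
  copyHostVar(b, name, &firstPrivIP);
}
```

To try it, point a ToyBuilder at the start of an "entry" ToyBlock, seed a second ToyBlock with "omp.terminator", and call privatizeFirstPrivate for each symbol: the allocas accumulate in the entry block while the load/store pairs appear in the region ahead of the terminator. Passing a null region keeps everything at the current position, which mirrors how an unset insertion point is ignored.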

flang/test/Lower/OpenMP/single.f90

	!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | FileCheck %s --check-prefixes="FIRDialect,OMPDialect"			!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | FileCheck %s
	!RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | fir-opt --cfg-conversion | fir-opt --fir-to-llvm-ir | FileCheck %s --check-prefixes="LLVMDialect,OMPDialect"			!RUN: bbc -emit-fir -fopenmp %s -o - | FileCheck %s
				domada (inline comment): Why do you use bbc instead of flang-new?
				peixin (author, inline comment, marked done): I use both `bbc` and `flang-new` for lowering test. I also remove the `fir-opt` check since it should not be here. @kiranchandramohan planned to change the lowering tests (removing the lowering to LLVMIR tests under flang/test/Lower/OpenMP) after upstreaming is done. @kiranchandramohan When do you plan to change them?
				kiranchandramohan (inline comment): Sorry about the delay here. It is on my list. I will get to it soon.

	!===============================================================================			!===============================================================================
	! Single construct			! Single construct
	!===============================================================================			!===============================================================================

	!FIRDialect-LABEL: func @_QPomp_single			!CHECK-LABEL: func @_QPomp_single
	!FIRDialect-SAME: (%[[x:.*]]: !fir.ref<i32> {fir.bindc_name = "x"})			!CHECK-SAME: (%[[x:.*]]: !fir.ref<i32> {fir.bindc_name = "x"})
	subroutine omp_single(x)			subroutine omp_single(x)
	integer, intent(inout) :: x			integer, intent(inout) :: x
	!OMPDialect: omp.parallel			!CHECK: omp.parallel
	!$omp parallel			!$omp parallel
	!OMPDialect: omp.single			!CHECK: omp.single
	!$omp single			!$omp single
	!FIRDialect: %[[xval:.*]] = fir.load %[[x]] : !fir.ref<i32>			!CHECK: %[[xval:.*]] = fir.load %[[x]] : !fir.ref<i32>
	!FIRDialect: %[[res:.*]] = arith.addi %[[xval]], %{{.*}} : i32			!CHECK: %[[res:.*]] = arith.addi %[[xval]], %{{.*}} : i32
	!FIRDialect: fir.store %[[res]] to %[[x]] : !fir.ref<i32>			!CHECK: fir.store %[[res]] to %[[x]] : !fir.ref<i32>
	x = x + 12			x = x + 12
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end single			!$omp end single
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end parallel			!$omp end parallel
	end subroutine omp_single			end subroutine omp_single

	!===============================================================================			!===============================================================================
	! Single construct with nowait			! Single construct with nowait
	!===============================================================================			!===============================================================================

	!FIRDialect-LABEL: func @_QPomp_single_nowait			!CHECK-LABEL: func @_QPomp_single_nowait
	!FIRDialect-SAME: (%[[x:.*]]: !fir.ref<i32> {fir.bindc_name = "x"})			!CHECK-SAME: (%[[x:.*]]: !fir.ref<i32> {fir.bindc_name = "x"})
	subroutine omp_single_nowait(x)			subroutine omp_single_nowait(x)
	integer, intent(inout) :: x			integer, intent(inout) :: x
	!OMPDialect: omp.parallel			!CHECK: omp.parallel
	!$omp parallel			!$omp parallel
	!OMPDialect: omp.single nowait			!CHECK: omp.single nowait
	!$omp single			!$omp single
	!FIRDialect: %[[xval:.*]] = fir.load %[[x]] : !fir.ref<i32>			!CHECK: %[[xval:.*]] = fir.load %[[x]] : !fir.ref<i32>
	!FIRDialect: %[[res:.*]] = arith.addi %[[xval]], %{{.*}} : i32			!CHECK: %[[res:.*]] = arith.addi %[[xval]], %{{.*}} : i32
	!FIRDialect: fir.store %[[res]] to %[[x]] : !fir.ref<i32>			!CHECK: fir.store %[[res]] to %[[x]] : !fir.ref<i32>
	x = x + 12			x = x + 12
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end single nowait			!$omp end single nowait
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end parallel			!$omp end parallel
	end subroutine omp_single_nowait			end subroutine omp_single_nowait

	!===============================================================================			!===============================================================================
	! Single construct with allocate			! Single construct with allocate
	!===============================================================================			!===============================================================================

	!FIRDialect-LABEL: func @_QPsingle_allocate			!CHECK-LABEL: func @_QPsingle_allocate
	subroutine single_allocate()			subroutine single_allocate()
	use omp_lib			use omp_lib
	integer :: x			integer :: x
	!OMPDialect: omp.parallel {			!CHECK: omp.parallel {
	!$omp parallel			!$omp parallel
	!OMPDialect: omp.single allocate(			!CHECK: omp.single allocate(%{{.+}} : i32 -> %{{.+}} : !fir.ref<i32>) {
	!FIRDialect: %{{.+}} : i32 -> %{{.+}} : !fir.ref<i32>
	!LLVMDialect: %{{.+}} : i32 -> %{{.+}} : !llvm.ptr<i32>
	!OMPDialect: ) {
	!$omp single allocate(omp_high_bw_mem_alloc: x) private(x)			!$omp single allocate(omp_high_bw_mem_alloc: x) private(x)
	!FIRDialect: arith.addi			!CHECK: arith.addi
	x = x + 12			x = x + 12
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end single			!$omp end single
	!OMPDialect: omp.terminator			!CHECK: omp.terminator
	!$omp end parallel			!$omp end parallel
	end subroutine single_allocate			end subroutine single_allocate

				!===============================================================================
				! Single construct with private/firstprivate
				!===============================================================================

				! CHECK-LABEL: func.func @_QPsingle_privatization(
				! CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<f32> {fir.bindc_name = "x"},
				! CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<f64> {fir.bindc_name = "y"}) {
				! CHECK: %[[VAL_2:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFsingle_privatizationEx"}
				! CHECK: %[[VAL_3:.*]] = fir.alloca f64 {bindc_name = "y", pinned, uniq_name = "_QFsingle_privatizationEy"}
				! CHECK: omp.single {
				! CHECK: %[[VAL_4:.*]] = fir.load %[[VAL_1]] : !fir.ref<f64>
				! CHECK: fir.store %[[VAL_4]] to %[[VAL_3]] : !fir.ref<f64>
				! CHECK: fir.call @_QPbar(%[[VAL_2]], %[[VAL_3]]) : (!fir.ref<f32>, !fir.ref<f64>) -> ()
				! CHECK: omp.terminator
				! CHECK: }
				! CHECK: return
				! CHECK: }

				subroutine single_privatization(x, y)
				real :: x
				real(8) :: y

				!$omp single private(x) firstprivate(y)
				call bar(x, y)
				!$omp end single
				end subroutine

				! CHECK-LABEL: func.func @_QPsingle_privatization2(
				! CHECK-SAME: %[[VAL_0:.*]]: !fir.ref<f32> {fir.bindc_name = "x"},
				! CHECK-SAME: %[[VAL_1:.*]]: !fir.ref<f64> {fir.bindc_name = "y"}) {
				! CHECK: omp.parallel {
				! CHECK: %[[VAL_2:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFsingle_privatization2Ex"}
				! CHECK: %[[VAL_3:.*]] = fir.alloca f64 {bindc_name = "y", pinned, uniq_name = "_QFsingle_privatization2Ey"}
				! CHECK: omp.single {
				! CHECK: %[[VAL_4:.*]] = fir.load %[[VAL_1]] : !fir.ref<f64>
				! CHECK: fir.store %[[VAL_4]] to %[[VAL_3]] : !fir.ref<f64>
				! CHECK: fir.call @_QPbar(%[[VAL_2]], %[[VAL_3]]) : (!fir.ref<f32>, !fir.ref<f64>) -> ()
				! CHECK: omp.terminator
				! CHECK: }
				! CHECK: omp.terminator
				! CHECK: }
				! CHECK: return
				! CHECK: }

				subroutine single_privatization2(x, y)
				real :: x
				real(8) :: y

				!$omp parallel
				!$omp single private(x) firstprivate(y)
				call bar(x, y)
				!$omp end single
				!$omp end parallel
				end subroutine

Revision Contents

Diff 465347

flang/include/flang/Lower/AbstractConverter.h

flang/lib/Lower/Bridge.cpp

flang/lib/Lower/OpenMP.cpp

flang/test/Lower/OpenMP/single.f90
