This is an archive of the discontinued LLVM Phabricator instance.

Differential D127272

[flang][OpenMP] Lowering support for atomic capture
Closed · Public

Authored by NimishMishra on Jun 8 2022, 12:56 AM.

Details

Reviewers

sscalpone
jdoerfert
shraiysh
kiranchandramohan
kiranktp
peixin
MatsPetersson

Commits

rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture

Summary

This patch adds lowering support for the atomic capture operation. First, a region (without any operands) is created for the atomic capture operation. Then, based on one of the following configurations...

[update-stmt, capture-stmt]
[capture-stmt, update-stmt]
[capture-stmt, write-stmt]

... the lowering proceeds by creating these individual operations inside the atomic capture's region.
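The three configurations listed in the summary can be pictured as a small classifier over the two statements in the construct. The following is a minimal, self-contained sketch and not code from the patch: `StmtKind`, `CaptureForm`, and `classifyCapture` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical statement kinds that can appear inside `atomic capture`.
enum class StmtKind { Capture, Update, Write };

// Hypothetical labels for the three configurations from the summary.
enum class CaptureForm {
  UpdateThenCapture,  // [update-stmt, capture-stmt]
  CaptureThenUpdate,  // [capture-stmt, update-stmt]
  CaptureThenWrite    // [capture-stmt, write-stmt]
};

// Map an ordered pair of statements to one of the three forms, or return
// std::nullopt when the pair is not one of them.
std::optional<CaptureForm> classifyCapture(StmtKind first, StmtKind second) {
  if (first == StmtKind::Update && second == StmtKind::Capture)
    return CaptureForm::UpdateThenCapture;
  if (first == StmtKind::Capture && second == StmtKind::Update)
    return CaptureForm::CaptureThenUpdate;
  if (first == StmtKind::Capture && second == StmtKind::Write)
    return CaptureForm::CaptureThenWrite;
  return std::nullopt;
}
```

A lowering routine could then branch on the returned form to decide which two operations to create inside the capture region; any other pair would be left for semantic checks to reject.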

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NimishMishra created this revision. Jun 8 2022, 12:56 AM

NimishMishra requested review of this revision. Jun 8 2022, 12:56 AM

Harbormaster completed remote builds in B168488: Diff 435062. Jun 8 2022, 12:57 AM

NimishMishra updated this revision to Diff 435066. Jun 8 2022, 1:01 AM

NimishMishra edited the summary of this revision.

NimishMishra removed a parent revision: D125668: [flang][OpenMP] Lowering support for atomic update construct.

Harbormaster completed remote builds in B168489: Diff 435066. Jun 8 2022, 1:02 AM

This follows after D125668.

TODO: handling pointers in atomic capture construct

Thanks for working on this. I have a few comments on the generated IR and the changes in OpenMPDialect.cpp. Please excuse any formatting issues, as I'm commenting from a mobile :')

flang/lib/Lower/OpenMP.cpp
1071	Shouldn't this be done in semantics?
flang/test/Lower/OpenMP/atomic-capture.f90
35	Shouldn't all of this (34-41) be wrapped in an omp.atomic.read and omp.atomic.update operation? Why can't we generate that here? Relaxing the restrictions on omp.capture is not a solution for this when it's possible to express this same thing in current syntax.
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
936	Is this move required?
941	Can you please add a testcase for this in `mlir/test/Dialect/OpenMP/ops.mlir`? I'm having trouble understanding why this is required.

NimishMishra added inline comments. Jun 9 2022, 9:33 PM

flang/lib/Lower/OpenMP.cpp
1071	This is the best approach I could think of in order to understand which capture construct combination to lower to. So I put basic "structural" checks for `v=x` and `x=x op expr` statements. There are more semantic checks attached with these. I assume we rely on the semantics phase to take care of them. These helper functions here are only doing structural checks.
flang/test/Lower/OpenMP/atomic-capture.f90
35	I do not understand. What do you mean by "wrapped" in a read and write operation? Generally, the read operation is not problematic. In the write operation however, the FIR for expression evaluation is involved. I was attempting to control insertion points to make this expression evaluation "outside" the omp.atomic.capture, but I couldn't do it. However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write). Please correct me if I am missing something here.
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
936	No. It's unintentional. I will revert it.

shraiysh added inline comments. Jun 9 2022, 10:20 PM

flang/test/Lower/OpenMP/atomic-capture.f90
34–43	If you can generate this, then that's the most accurate imo. This does not violate the semantics of atomic construct because it clearly says that - Only the read and write of the location designated by x are performed mutually atomically. Neither the evaluation of expr or expr_list, nor the write to the location designated by v, need be atomic with respect to the read or write of the location designated by x.
36–42	> I do not understand. What do you mean by "wrapped" in a read and write operation?

This is what I meant.

> However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write).

Ideally, the evaluation of the expression should not be inside the atomic region at all. However, if that's somehow not possible, it should be pushed inside an omp.atomic.update operation, because omp.atomic.update supports multiple operations in its region. If that cannot be done either, we should justify adding this relaxation to omp.atomic.capture independent of Flang by answering "what kind of capture executions cannot be expressed in MLIR with the current syntax?". As long as they can be expressed, it should be the job of flang to lower appropriately.

NimishMishra added inline comments. Jun 9 2022, 10:38 PM

flang/test/Lower/OpenMP/atomic-capture.f90
36–42	Okay. Then I will work on moving the evaluation of the expression outside the atomic region. I think I have a way to do that.
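The point agreed on here, that only the read and write of x need to be atomic while expr can be evaluated beforehand, can be sketched in plain C++ with `std::atomic`. This is an analogy under stated assumptions, not the IR that flang generates: `evalExpr` and `captureThenWrite` are hypothetical names, and `std::memory_order_relaxed` is chosen only as a stand-in for the case where no memory-order clause is given.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an arbitrary right-hand-side expression.
int evalExpr(int a, int b) { return a * b + 1; }

// Models `v = x; x = expr` under `atomic capture`.
// Returns the captured value v.
int captureThenWrite(std::atomic<int> &x, int a, int b) {
  // The expression is evaluated before the atomic step; it does not need
  // to be atomic with respect to the accesses to x.
  const int newValue = evalExpr(a, b);
  // Only the read and the write of x form the atomic operation.
  return x.exchange(newValue, std::memory_order_relaxed);
}
```

Keeping `evalExpr` outside the exchange mirrors the request to keep the FIR for expression evaluation outside the `omp.atomic.capture` region.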

shraiysh added inline comments. Jun 9 2022, 11:03 PM

flang/lib/Lower/OpenMP.cpp
1113–1114	This should not always be true. If the function call on rhs does not use the variable on the lhs, then this is an atomic write statement. Modeling it as an update sort of works, in the sense that there is no visible change in behavior of the program, but the generated IR will not be entirely accurate. If it is very hard to deduce write here, maybe mention it as a todo?
1115–1117	I meant can something like this work? Relying on semantics for a valid binary operator.

NimishMishra added inline comments. Jun 9 2022, 11:09 PM

flang/lib/Lower/OpenMP.cpp
1113–1114	You are right. I missed it. I will revisit these functions and see what can be done.
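The write-versus-update distinction raised for lines 1113–1114 comes down to whether the right-hand side refers to the variable on the left-hand side. A deliberately simplified sketch follows; it assumes the symbols used by the RHS have already been collected into a flat list, which is not how the parse tree is represented in flang, and `AtomicKind` and `classifyAssignment` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class AtomicKind { Write, Update };

// lhs: name of the variable being assigned.
// rhsSymbols: hypothetical flat list of variable names referenced by the
// right-hand side, including arguments of any function call.
AtomicKind classifyAssignment(const std::string &lhs,
                              const std::vector<std::string> &rhsSymbols) {
  const bool usesLhs =
      std::find(rhsSymbols.begin(), rhsSymbols.end(), lhs) != rhsSymbols.end();
  // If the RHS never mentions the LHS variable, the statement is a write.
  return usesLhs ? AtomicKind::Update : AtomicKind::Write;
}
```

A real check would also have to confirm that the statement has one of the permitted update forms, which this sketch leaves to the semantics phase.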

Improved design of the solution

NimishMishra added inline comments. Aug 1 2022, 4:55 AM

flang/test/Lower/OpenMP/atomic-capture.f90
82	I would like to raise a point for discussion here. The verifier for atomic capture checks `opsInRegion.size() == 3`, i.e. that the region contains exactly three operations. This issue came up before during lowering of `omp.atomic.write` inside atomic capture, since a write statement has expression evaluation, which also adds operations to the region. @shraiysh suggested keeping this expression evaluation outside the `omp.atomic.capture`, which is what I have done currently.

However, with pointers, the issue has resurfaced. This particular `b = a` lowers as

%0 = allocate c
%1 = allocate d
%2 = allocate a
%3 = allocate b
omp.atomic.capture{
  omp.atomic.update{......}
  %4 = load a
  %5 = load b
  omp.atomic.read %5 = %4
  omp.terminator
}

The discussion I wish to have is whether the verification method here should be changed, or whether we should evaluate all LHS and RHS of an atomic assignment statement beforehand. To change the verification, I was thinking of the following:

- Ensure that in the list of operations there are exactly two omp.atomic operations and exactly one omp.terminator operation
- The last operation in the region is omp.terminator
- The second-to-last operation in the region is necessarily an omp.atomic operation
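The relaxed verification proposed in this comment can be modelled over a plain list of operation names. This is only a sketch of the proposal as worded, not the MLIR verifier and not what the patch ended up doing: operations are represented as strings, and `isAtomicOp` and `verifyRelaxedCapture` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True for names such as "omp.atomic.read" or "omp.atomic.update".
bool isAtomicOp(const std::string &name) {
  return name.rfind("omp.atomic.", 0) == 0;
}

// Checks the three proposed rules on the ops of a capture region.
bool verifyRelaxedCapture(const std::vector<std::string> &ops) {
  std::size_t atomics = 0, terminators = 0;
  for (const std::string &name : ops) {
    if (isAtomicOp(name))
      ++atomics;
    else if (name == "omp.terminator")
      ++terminators;
  }
  // Rule 1: exactly two omp.atomic ops and exactly one terminator.
  if (atomics != 2 || terminators != 1)
    return false;
  // Rule 2: the last op is the terminator (ops.size() >= 3 is implied).
  if (ops.back() != "omp.terminator")
    return false;
  // Rule 3: the second-to-last op is an omp.atomic op.
  return isAtomicOp(ops[ops.size() - 2]);
}
```

The region from the comment (update, two loads, read, terminator) satisfies all three rules, whereas the existing three-operation check rejects it because it contains five operations.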

Harbormaster completed remote builds in B178552: Diff 448993. Aug 1 2022, 5:19 AM

Ping for review!

NimishMishra mentioned this in D126612: [flang][OpenMP] Added semantic checks for atomic capture construct. Aug 18 2022, 6:38 PM

Apologies for the delay in review. Please let me know what you think. I will join the next OpenMP call and we can maybe discuss this.

flang/test/Lower/OpenMP/atomic-capture.f90
82	The generated code should look like the following:

// %a_ptr : !fir.ref<!fir.ref<i32>> %b_ptr = !fir.ref<!fir.ref<i32>>
%a_addr = load %a_ptr : !fir.ref<!fir.ref<i32>>
%b_addr = load %b_ptr : !fir.ref<!fir.ref<i32>>
// %a_addr : !fir.ref<i32>, %b_addr : !fir.ref<i32>
omp.atomic.capture{
  omp.atomic.update %a_addr : !fir.ref<i32> {
  ^bb0(%a_val: i32):
    %b_val = load %b_addr : !fir.ref<i32>
    %temp = arith.addi %a_val, %b_val : i32
    omp.yield %temp
  }
  omp.atomic.read %b_addr = %a_addr
  omp.terminator
}

I get the sense that this is convoluted to generate on flang's end. It is, however, not a good idea to relax the constraints on omp.capture because of this. I will reiterate that if we come up with an OpenMP atomic evaluation that cannot be expressed by hand with the `omp.atomic.capture` operation, then we should definitely change it. That flang isn't able to generate it is not a good enough reason to alter the operation. In addition, if we change the number of operations inside atomic capture, we have to worry about lowering it to LLVM IR, which gets harder as the operation is relaxed. If you strongly need to relax the constraints on `omp.atomic.capture`, we should first make sure that the relaxed version translates properly to LLVM IR for execution (probably as a separate patch).

I wanted to put an idea out: maybe, to ease the difficulty of generating `omp.atomic.capture`, we can define an `fir.omp.atomic.capture` operation that accepts multiple operations under it. Then, during canonicalization (or some other pass) in FIR, we can push the unnecessary operations (load a, load b in your example) outside the `fir.omp.atomic.capture` operation to generate the `omp.atomic.capture` operation. Does that sound like it would make the implementation more straightforward?
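The idea of pushing unnecessary operations out of the capture region can be illustrated on a list of operation names. This is a toy model and not an MLIR pass: `HoistResult` and `hoistNonAtomicOps` are hypothetical, and the sketch assumes that every non-atomic operation is safe to move, i.e. it has no side effects and does not use values defined inside the region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of splitting a capture region: ops moved in front of the capture,
// and ops that stay inside it.
struct HoistResult {
  std::vector<std::string> hoisted;
  std::vector<std::string> region;
};

// Keep omp.atomic.* operations and the terminator inside the region and
// move everything else out, preserving relative order in both lists.
HoistResult hoistNonAtomicOps(const std::vector<std::string> &ops) {
  HoistResult result;
  for (const std::string &name : ops) {
    const bool staysInside =
        name.rfind("omp.atomic.", 0) == 0 || name == "omp.terminator";
    (staysInside ? result.region : result.hoisted).push_back(name);
  }
  return result;
}
```

Applied to the pointer example discussed in this thread, the two loads end up in `hoisted` and the region is left with exactly three operations, which is the shape the existing verifier accepts.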

LGTM

NimishMishra added inline comments. Aug 30 2022, 6:52 PM

flang/test/Lower/OpenMP/atomic-capture.f90
82	Ok. I will try to keep loading of the two addresses outside the capture region entirely.

Changed design of the patch to evaluate LHS and RHS of the two assignment statements before generating the omp.atomic.capture operation.

flang/test/Lower/OpenMP/atomic-capture.f90
82	@shraiysh Does this IR look ok?

Harbormaster completed remote builds in B184490: Diff 457145. Aug 31 2022, 6:54 PM

shraiysh added inline comments. Aug 31 2022, 9:19 PM

flang/test/Lower/OpenMP/atomic-capture.f90
82	Yes, the current testcases look perfect to me and I cannot spot any errors in them. Thanks for your patience and for getting it to work 👏. I have not reviewed the code itself, but functionality-wise, it looks okay to me. Please feel free to go ahead without my approval for the code, as it might be some time before I get to review it. If review comments lead you to change the testcases before landing this, let me know and I will review the updated testcases as soon as I can.

The following test case fails.

program main
  implicit none
  integer, parameter :: n1 = 10
  integer, parameter :: n2 = 100
  integer, parameter :: n = 30
  integer :: idx(n2)
  integer :: i
  integer(1) :: xi1(n1), yi1(n1), zi1(n1), oi1(n1), pi1(n1), qi1(n1), expecti1(n1)
  integer(8) :: xi8(n1), yi8(n1), zi8(n1), oi8(n1), pi8(n1), qi8(n1), expecti8(n1)
  logical :: rst(n) = .false.

  do i = 1, n2
    idx(i) = mod(i, n1) + 1
  end do
  ! add integer(1)
  xi1 = 0
  yi1 = 0
  zi1 = 0
  expecti1 = [38, -52, -42, -32, -22, -12, -2, 8, 18, 28]
  call atomic_capture_addi1(xi1, yi1, zi1, oi1, pi1, qi1, idx, n2)
  rst(1:10) = xi1 .eq. expecti1
  rst(11:20) = yi1 .eq. expecti1
  rst(21:30) = zi1 .eq. expecti1
  if (any(rst .neqv. .true.)) STOP 1
  print *, "PASS"

contains
  integer(1) function fi1(i)
    integer :: i
    fi1 = i
  end function
  integer(8) function fi8(i)
    integer :: i
    fi8 = i
  end function
  subroutine atomic_capture_addi1(x, y, z, o, p, q, idx, n)
    integer(1) :: x(*), y(*), z(*), o(*), p(*), q(*)
    integer :: idx(*)
    integer :: i, n
    !$omp parallel do shared(x, y, z, o, p, q, idx, n)
    do i = 1, n
      !$omp atomic capture
      x(idx(i)) = x(idx(i)) + fi1(i)
      o(idx(i)) = x(idx(i))
      !$omp end atomic
      !$omp atomic capture
      y(idx(i)) = fi1(i) + y(idx(i))
      p(idx(i)) = y(idx(i))
      !$omp end atomic
      !$omp atomic capture
      z(idx(i)) = z(idx(i)) + fi8(i)
      q(idx(i)) = z(idx(i))
      !$omp end atomic
    end do
  end subroutine
end program main

$ gfortran -fopenmp test.f90 && ./a.out
 PASS
$ flang-new -flang-experimental-exec -fopenmp test.f90 
flang-new: /home/qpx/compilers/llvm-community/omp-dev/llvm-project/flang/lib/Lower/OpenMP.cpp:1591: void genOmpAtomicUpdateStatement(Fortran::lower::AbstractConverter&, Fortran::lower::pft::Evaluation&, mlir::Value, mlir::Type, const Fortran::parser::Variable&, const Fortran::parser::Expr&, const Fortran::parser::OmpAtomicClauseList*, const Fortran::parser::OmpAtomicClauseList*): Assertion `name && name->symbol && "No symbol attached to atomic update variable"' failed.
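The `expecti1` values in the test case above can be reproduced with a serial model of the loop. The sketch assumes that `integer(1)` arithmetic wraps in two's complement on overflow, which the Fortran standard does not guarantee but which is consistent with the gfortran run shown; `wrapToInt8` and `expectedInt8Sums` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>

// Wrap an int into the signed 8-bit range [-128, 127] (two's complement).
int wrapToInt8(int value) {
  return ((value + 128) % 256 + 256) % 256 - 128;
}

// Serial model of `x(idx(i)) = x(idx(i)) + fi1(i)` for i = 1..100,
// with idx(i) = mod(i, 10) + 1 and x initialised to zero.
std::array<int, 10> expectedInt8Sums() {
  constexpr int n1 = 10, n2 = 100;
  std::array<int, n1> x{};
  for (int i = 1; i <= n2; ++i) {
    const int slot = i % n1;  // 0-based form of idx(i) = mod(i, n1) + 1
    x[slot] = wrapToInt8(x[slot] + i);
  }
  return x;
}
```

For example, the first element accumulates 10 + 20 + ... + 100 = 550, which wraps to 550 - 512 = 38, matching the first entry of `expecti1`.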

peixin requested changes to this revision. Sep 29 2022, 6:31 PM

This revision now requires changes to proceed. Sep 29 2022, 6:31 PM

Added a fix to handle Array Refs in atomic update constructs

Harbormaster completed remote builds in B208016: Diff 489503. Jan 16 2023, 5:22 AM

LGTM

See comment inline about the latest change.

Since we do not have a mechanism for handling array elements, I think you can add a hard TODO for the array element case. We can move ahead with the rest of the patch. Please file a GitHub issue for the array element case.

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I believe the semantics of `atomic.update` are that it will load from the `address` provided to it, and the loaded value will be available as the basic block argument `ARG`. The body of the update will use this loaded value `ARG` to perform the update and yield the updated value, which will be stored again at the `address`. I think this update operation will not work as expected, since it does not use the automatically loaded value in `ARG`; instead it loads the value at the address passed to the `atomic.update` op and adds the constant to it. This will end up computing `numbers(1) = numbers(1) + numbers(1) + 10`, which is not what we want. Let me know if I missed a point.

This revision now requires changes to proceed. Feb 19 2023, 11:15 AM

kiranchandramohan added inline comments. Feb 19 2023, 2:28 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I think I got that wrong. It will not double add since the new code is not touching the `ARG`. I tried fetching the patch but it shows some issues. Could you rebase? I would like to have a look at the IR that is generated. In particular, we have to check:

- whether the following load is an atomic load in the LLVM IR:
  !CHECK: %[[array_element_inner:.*]] = fir.load %[[array_element_ref]]
- whether the loop containing `cmpxchg` is well-formed.

NimishMishra added inline comments. Mar 9 2023, 6:16 AM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	Hi Kiran. I was trying to understand the requirement here. I do not completely understand why `cmpxchg` could be a problem here. Wouldn't the compare and the exchange happen inside the atomic region? I am just trying to understand what the generated IR should look like, to make sure I am doing it correctly.

Rebased with main. Fixed generation of omp.atomic.read

NimishMishra added inline comments. Mar 22 2023, 8:35 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

Hi Kiran.

The LLVM IR is as follows. Does it look ok? The load is within the Atomic Update block, so it should be fine right?

llvm.func @_QParray_refs() {
  %0 = llvm.mlir.constant(1.000000e+01 : f32) : f32
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x !llvm.array<5 x f32> {bindc_name = "numbers", in_type = !fir.array<5xf32>, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEnumbers"} : (i64) -> !llvm.ptr<array<5 x f32>>
  %3 = llvm.alloca %1 x f32 {bindc_name = "x", in_type = f32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEx"} : (i64) -> !llvm.ptr<f32>
  %4 = llvm.getelementptr %2[0, 0] : (!llvm.ptr<array<5 x f32>>) -> !llvm.ptr<f32>
  omp.atomic.capture {
    omp.atomic.update %4 : !llvm.ptr<f32> {
    ^bb0(%arg0: f32):
      %5 = llvm.load %4 : !llvm.ptr<f32>
      %6 = llvm.fadd %5, %0 {fastmathFlags = #llvm.fastmath<contract>} : f32
      omp.yield(%6 : f32)
    }
    omp.atomic.read %3 = %4 : !llvm.ptr<f32>, f32
  }
  llvm.return
}

Harbormaster completed remote builds in B221197: Diff 507581. Mar 22 2023, 8:46 PM

NimishMishra added inline comments. Mar 22 2023, 11:23 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

Apologies. That is the MLIR dialect. Please find the LLVM IR below. It also has a cmpxchg instruction

The test case is simple:

!$omp atomic capture
      x = y
      y = x + y
!$omp end atomic

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca float, i64 1, align 4
  store float 2.000000e+01, ptr %1, align 4
  store float 1.000000e+01, ptr %2, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %2 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %3 = phi i32 [ %.atomic.load, %entry ], [ %8, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %3 to float
  %4 = load float, ptr %1, align 4
  %5 = fadd contract float %4, %.atomic.fltCast
  store float %5, ptr %x.new.val, align 4
  %6 = load i32, ptr %x.new.val, align 4
  %7 = cmpxchg ptr %2, i32 %3, i32 %6 monotonic monotonic, align 4
  %8 = extractvalue { i32, i1 } %7, 0
  %9 = extractvalue { i32, i1 } %7, 1
  br i1 %9, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %.atomic.fltCast, ptr %1, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

NimishMishra added inline comments. Mar 22 2023, 11:28 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

And for the array reference test case

!$omp atomic capture
      x(1) = y(1)
      y(1) = x(1) + y(1)
!$omp end atomic

The following IR is generated:

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca [5 x float], i64 1, align 4
  %3 = alloca [5 x float], i64 1, align 4
  %4 = getelementptr [5 x float], ptr %2, i32 0, i32 0
  store float 2.000000e+01, ptr %4, align 4
  %5 = getelementptr [5 x float], ptr %3, i32 0, i32 0
  store float 1.000000e+01, ptr %5, align 4
  %6 = load float, ptr %4, align 4
  %7 = load float, ptr %5, align 4
  %8 = fadd contract float %6, %7
  store float %8, ptr %1, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %4 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %9 = phi i32 [ %.atomic.load, %entry ], [ %13, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %9 to float
  %10 = load float, ptr %5, align 4
  store float %10, ptr %x.new.val, align 4
  %11 = load i32, ptr %x.new.val, align 4
  %12 = cmpxchg ptr %4, i32 %9, i32 %11 monotonic monotonic, align 4
  %13 = extractvalue { i32, i1 } %12, 0
  %14 = extractvalue { i32, i1 } %12, 1
  br i1 %14, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %10, ptr %5, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

kiranchandramohan added inline comments. Mar 27 2023, 4:55 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	The concern here is that the atomically loaded value is not used in the update operation. AFAIU, the cmpxchg instruction only updates the location if the value currently at that location equals the expected value, i.e. the value that was loaded earlier. So, if we are doing y = y + x, an initial atomic load is made of y (= y_old), and the value at x is added to it to obtain y_old + x. Before storing this value at y, it is checked that the current resident value at y is equal to y_old. The problem here (for the array-element case) is that the value used for updating is not obtained from the atomically loaded value y_old, but from a different value, and that does not seem correct. Also, the update operation (addition here) has to be inside the loop, since the addition should be performed on the atomically loaded value.
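The compare-and-exchange loop described in this comment has a direct analogue in C++ `std::atomic`. The sketch below illustrates the required shape, with the addition inside the loop and computed from the atomically observed value; it is not the code flang or the OpenMP IR builder emits. `captureThenAdd` is a hypothetical name, and relaxed ordering is an assumption made to mirror the `monotonic` ordering in the posted LLVM IR.

```cpp
#include <atomic>
#include <cassert>

// Models `v = y; y = y + x` under `atomic capture` for a float location y.
// Returns the captured (old) value of y.
float captureThenAdd(std::atomic<float> &y, float x) {
  // Initial atomic load of y (y_old).
  float expected = y.load(std::memory_order_relaxed);
  float desired;
  do {
    // The update is recomputed on every iteration from the value that was
    // last observed in y, never from a separate non-atomic load.
    desired = expected + x;
    // On failure, compare_exchange_weak writes the current value of y into
    // `expected`, so the next iteration starts from fresh data.
  } while (!y.compare_exchange_weak(expected, desired,
                                    std::memory_order_relaxed));
  return expected;
}
```

In the array-element IR posted above, the value stored by `cmpxchg` does not depend on the phi node that carries the atomically loaded value, which is the discrepancy this comment points out.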

NimishMishra added inline comments. Mar 29 2023, 7:32 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I understand now. Thank you. The patch cannot go ahead in its current form then. Do you have any suggestions on how to go forward with fixing it? Johannes did mention an alternative strategy some time back, but I am not sure how to start in that direction. Can you give some initial direction?

Please remove the handling of the array element case and remove the test for it. We can handle array elements in a separate patch.

I believe the rest of the code looks good. Thanks for the patience and the changes.

flang/lib/Lower/OpenMP.cpp
1017–1018	For the array element case, please add a Not Yet Implemented TODO. We can handle this separately.

This revision was not accepted when it landed; it landed in state Needs Review. May 3 2023, 9:48 PM

Closed by commit rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture (authored by NimishMishra).

This revision was automatically updated to reflect the committed changes.

NimishMishra added a commit: rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture.

Revision Contents

Path                                            Size
flang/lib/Lower/OpenMP.cpp                      360 lines
flang/test/Lower/OpenMP/atomic-capture.f90      129 lines
flang/test/Lower/OpenMP/atomic-update.f90       155 lines
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp    25 lines

Diff 435062

flang/lib/Lower/OpenMP.cpp

Show First 20 Lines • Show All 270 Lines • ▼ Show 20 Lines

/// \param [in] outerCombined - is this an outer operation - prevents /// \param [in] outerCombined - is this an outer operation - prevents

/// privatization. /// privatization.

template <typename Op> template <typename Op>

static void static void

createBodyOfOp(Op &op, Fortran::lower::AbstractConverter &converter, createBodyOfOp(Op &op, Fortran::lower::AbstractConverter &converter,

mlir::Location &loc, Fortran::lower::pft::Evaluation &eval, mlir::Location &loc, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpClauseList *clauses = nullptr, const Fortran::parser::OmpClauseList *clauses = nullptr,

const SmallVector<const Fortran::semantics::Symbol *> &args = {}, const SmallVector<const Fortran::semantics::Symbol *> &args = {},

bool outerCombined = false) { bool outerCombined = false,

const Fortran::parser::Expr *expr = nullptr) {

fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();

// If an argument for the region is provided then create the block with that // If an argument for the region is provided then create the block with that

// argument. Also update the symbol's address with the mlir argument value. // argument. Also update the symbol's address with the mlir argument value.

// e.g. For loops the argument is the induction variable. And all further // e.g. For loops the argument is the induction variable. And all further

// uses of the induction variable should use this mlir value. // uses of the induction variable should use this mlir value.

mlir::Operation *storeOp = nullptr; mlir::Operation *storeOp = nullptr;

if (args.size()) { if (args.size()) {

std::size_t loopVarTypeSize = 0; std::size_t loopVarTypeSize = 0;

for (const Fortran::semantics::Symbol *arg : args) for (const Fortran::semantics::Symbol *arg : args)

loopVarTypeSize = std::max(loopVarTypeSize, arg->GetUltimate().size()); loopVarTypeSize = std::max(loopVarTypeSize, arg->GetUltimate().size());

mlir::Type loopVarType = getLoopVarType(converter, loopVarTypeSize); mlir::Type varType;

if constexpr (std::is_same_v<Op, omp::AtomicUpdateOp>) {

// In case of AtomicUpdate assignment statement, let LHS variable type =

// RHS expression type

Fortran::lower::StatementContext stmtCtx;

mlir::Value result = fir::getBase(

converter.genExprValue(*Fortran::semantics::GetExpr(*expr), stmtCtx));

varType = result.getType();

} else {

varType = getLoopVarType(converter, loopVarTypeSize);

}

SmallVector<Type> tiv; SmallVector<Type> tiv;

SmallVector<Location> locs; SmallVector<Location> locs;

for (int i = 0; i < (int)args.size(); i++) { for (int i = 0; i < (int)args.size(); i++) {

tiv.push_back(loopVarType); tiv.push_back(varType);

locs.push_back(loc); locs.push_back(loc);

} }

firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs); firOpBuilder.createBlock(&op.getRegion(), {}, tiv, locs);

int argIndex = 0; int argIndex = 0;

// The argument is not currently in memory, so make a temporary for the // The argument is not currently in memory, so make a temporary for the

// argument, and store it there, then bind that location to the argument. // argument, and store it there, then bind that location to the argument.

for (const Fortran::semantics::Symbol *arg : args) { for (const Fortran::semantics::Symbol *arg : args) {

mlir::Value val = mlir::Value val =

fir::getBase(op.getRegion().front().getArgument(argIndex)); fir::getBase(op.getRegion().front().getArgument(argIndex));

mlir::Value temp = firOpBuilder.createTemporary( mlir::Value temp = firOpBuilder.createTemporary(

loc, loopVarType, loc, varType,

llvm::ArrayRef<mlir::NamedAttribute>{ llvm::ArrayRef<mlir::NamedAttribute>{

Fortran::lower::getAdaptToByRefAttr(firOpBuilder)}); Fortran::lower::getAdaptToByRefAttr(firOpBuilder)});

storeOp = firOpBuilder.create<fir::StoreOp>(loc, val, temp); storeOp = firOpBuilder.create<fir::StoreOp>(loc, val, temp);

converter.bindSymbol(*arg, temp); converter.bindSymbol(*arg, temp);

argIndex++; argIndex++;

} }

} else { } else {

firOpBuilder.createBlock(&op.getRegion()); firOpBuilder.createBlock(&op.getRegion());

} }

// Set the insert for the terminator operation to go at the end of the // Set the insert for the terminator operation to go at the end of the

// block - this is either empty or the block with the stores above, // block - this is either empty or the block with the stores above,

// the end of the block works for both. // the end of the block works for both.

mlir::Block &block = op.getRegion().back(); mlir::Block &block = op.getRegion().back();

firOpBuilder.setInsertionPointToEnd(&block); firOpBuilder.setInsertionPointToEnd(&block);

// If it is an unstructured region and is not the outer region of a combined // If it is an unstructured region and is not the outer region of a combined

// construct, create empty blocks for all evaluations. // construct, create empty blocks for all evaluations.

if (eval.lowerAsUnstructured() && !outerCombined) if (eval.lowerAsUnstructured() && !outerCombined)

createEmptyRegionBlocks(firOpBuilder, eval.getNestedEvaluations()); createEmptyRegionBlocks(firOpBuilder, eval.getNestedEvaluations());

// Insert the terminator. // Insert the terminator.

if constexpr (std::is_same_v<Op, omp::WsLoopOp>) { if constexpr (std::is_same_v<Op, omp::WsLoopOp>) {

mlir::ValueRange results; mlir::ValueRange results;

firOpBuilder.create<mlir::omp::YieldOp>(loc, results); firOpBuilder.create<mlir::omp::YieldOp>(loc, results);

} else if constexpr (std::is_same_v<Op, omp::AtomicUpdateOp>) {

Fortran::lower::StatementContext stmtCtx;

auto result = fir::getBase(

converter.genExprValue(*Fortran::semantics::GetExpr(*expr), stmtCtx));

firOpBuilder.create<mlir::omp::YieldOp>(loc, result);

} else { } else {

firOpBuilder.create<mlir::omp::TerminatorOp>(loc); firOpBuilder.create<mlir::omp::TerminatorOp>(loc);

} }

// Reset the insert point to before the terminator. // Reset the insert point to before the terminator.

if (storeOp) if (storeOp)

firOpBuilder.setInsertionPointAfter(storeOp); firOpBuilder.setInsertionPointAfter(storeOp);

else else

▲ Show 20 Lines • Show All 624 Lines • ▼ Show 20 Lines if (auto ompClause = std::get_if<Fortran::parser::OmpClause>(&clause.u)) {

} else if (std::get_if<Fortran::parser::OmpClause::Release>( } else if (std::get_if<Fortran::parser::OmpClause::Release>(

&ompMemoryOrderClause->v.u)) { &ompMemoryOrderClause->v.u)) {

memory_order = mlir::omp::ClauseMemoryOrderKindAttr::get( memory_order = mlir::omp::ClauseMemoryOrderKindAttr::get(

firOpBuilder.getContext(), omp::ClauseMemoryOrderKind::Release); firOpBuilder.getContext(), omp::ClauseMemoryOrderKind::Release);

} }

static void genOmpAtomicCaptureStatement(

Fortran::lower::AbstractConverter &converter,

const Fortran::parser::Variable &assignmentStmtVariable,

const Fortran::parser::Expr &assignmentStmtExpr,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList) {

// Generate `omp.atomic.read` operation for atomic statements of the form `v =

// x`

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

Fortran::lower::StatementContext stmtCtx;

// Get the address of atomic read operands.

mlir::Value from_address = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx));

mlir::Value to_address = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

// If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

if (leftHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memory_order);

}

if (rightHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memory_order);

}

firOpBuilder.create<mlir::omp::AtomicReadOp>(currentLocation, from_address,

to_address, hint, memory_order);

}

static void static void genOmpAtomicWriteStatement(

kiranchandramohanUnsubmitted

Not Done

For the array element case, please add a Not Yet Implemented TODO. We can handle this separately.

genOmpAtomicWrite(Fortran::lower::AbstractConverter &converter, Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, const Fortran::parser::Variable &assignmentStmtVariable,

const Fortran::parser::OmpAtomicWrite &atomicWrite) { const Fortran::parser::Expr &assignmentStmtExpr,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList) {

// Generate `omp.atomic.write` operation for atomic statements of the form `x

// = expr`

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// Get the value and address of atomic write operands.

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicWrite.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicWrite.t);

const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicWrite.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicWrite.t).statement.t);

Fortran::lower::StatementContext stmtCtx;

// Get the address of atomic write operands.

mlir::Value value = fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx));

mlir::Value address = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

// If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

genOmpAtomicHintAndMemoryOrderClauses(converter, leftHandClauseList, hint, if (leftHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memory_order); memory_order);

genOmpAtomicHintAndMemoryOrderClauses(converter, rightHandClauseList, hint, }

if (rightHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memory_order); memory_order);

}

firOpBuilder.create<mlir::omp::AtomicWriteOp>(currentLocation, address, value,

hint, memory_order);

}

static void genOmpAtomicRead(Fortran::lower::AbstractConverter &converter, static bool checkForAtomicCaptureStmt(

const Fortran::parser::AssignmentStmt &assignmentStmt) {

// Check if the atomic statement is of the structure `v = x`

// Rely on previous phases to ensure correct semantics of `v = x`

const auto &expr{std::get<Fortran::parser::Expr>(assignmentStmt.t)};

return std::visit(

Fortran::common::visitors{

[&](const Fortran::common::Indirection<Fortran::parser::Designator>

&designator) {

if (getDesignatorNameIfDataRef(designator.value()))

return true; // found a variable on the RHS of the atomic

// statement expression

return false;

},

[&](const auto &) { return false; },

},

expr.u);

}

template <typename T>

bool isOmpAtomicUpdateStmtOperatorValid(

shraiyshUnsubmitted

Not Done

Shouldn't this be done in semantics?

NimishMishraAuthorUnsubmitted

Done

This is the best approach I could think of for determining which capture construct combination to lower to. So I put in basic "structural" checks for v = x and x = x op expr statements. There are more semantic checks attached to these; I assume we rely on the semantics phase to take care of them.

These helper functions here are only doing structural checks.
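The structural checks described in this reply can be pictured with a small standalone model. This is only a sketch under simplifying assumptions: `Stmt`, `RhsKind`, `isCaptureStmt`, `isUpdateStmt` and `classify` are hypothetical names, and a statement is reduced to an assigned name plus a very coarse description of its right-hand side. The actual patch inspects `Fortran::parser::AssignmentStmt` instead.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a parsed assignment `lhs = rhs`.
enum class RhsKind { Variable, Binary, Other };

struct Stmt {
  std::string lhs;   // name being assigned
  RhsKind kind;      // coarse shape of the right-hand side
  std::string left;  // the variable (Variable) or left operand (Binary)
  std::string right; // right operand (Binary only)
};

enum class CaptureForm { CaptureUpdate, UpdateCapture, CaptureWrite };

// `v = x`: the right-hand side is a bare variable.
bool isCaptureStmt(const Stmt &s) { return s.kind == RhsKind::Variable; }

// `x = x op expr` or `x = expr op x`: the assigned variable is an operand.
bool isUpdateStmt(const Stmt &s) {
  return s.kind == RhsKind::Binary && (s.left == s.lhs || s.right == s.lhs);
}

// Mirrors the fallback order used in the patch: anything that is not
// [capture, update] or [update, capture] is treated as [capture, write],
// on the assumption that semantics has already rejected invalid pairs.
CaptureForm classify(const Stmt &first, const Stmt &second) {
  if (isCaptureStmt(first) && isUpdateStmt(second))
    return CaptureForm::CaptureUpdate;
  if (isUpdateStmt(first) && isCaptureStmt(second))
    return CaptureForm::UpdateCapture;
  return CaptureForm::CaptureWrite;
}
```

The checks are deliberately shallow: they only decide which of the three lowering paths to take and say nothing about whether the construct is valid.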

const T &node, const Fortran::parser::Variable &variable) {

using AllowedBinaryOperators =

std::variant<Fortran::parser::Expr::Add, Fortran::parser::Expr::Multiply,

Fortran::parser::Expr::Subtract,

Fortran::parser::Expr::Divide, Fortran::parser::Expr::AND,

Fortran::parser::Expr::OR, Fortran::parser::Expr::EQV,

Fortran::parser::Expr::NEQV>;

using BinaryOperators =

std::variant<Fortran::parser::Expr::Add, Fortran::parser::Expr::Multiply,

Fortran::parser::Expr::Subtract,

Fortran::parser::Expr::Divide, Fortran::parser::Expr::AND,

Fortran::parser::Expr::OR, Fortran::parser::Expr::EQV,

Fortran::parser::Expr::NEQV, Fortran::parser::Expr::Power,

Fortran::parser::Expr::Concat, Fortran::parser::Expr::LT,

Fortran::parser::Expr::LE, Fortran::parser::Expr::EQ,

Fortran::parser::Expr::NE, Fortran::parser::Expr::GE,

Fortran::parser::Expr::GT>;

if constexpr (Fortran::common::HasMember<T, BinaryOperators>) {

const auto &variableName{variable.GetSource().ToString()};

const auto &exprLeft{std::get<0>(node.t)};

const auto &exprRight{std::get<1>(node.t)};

if ((exprLeft.value().source.ToString() != variableName) &&

(exprRight.value().source.ToString() != variableName)) {

return false;

}

return Fortran::common::HasMember<T, AllowedBinaryOperators>;

}

return false;

}

static bool checkForAtomicUpdateStmt(

const Fortran::parser::AssignmentStmt &assignmentStmt) {

// Check if the atomic statement is of the structure `x = x operator expr` OR

// `x = expr operator x` OR `x = intrinsic_procedure_name(x, expr_list)` OR `x

// = intrinsic_procedure_name(expr_list, x)`. Rely on previous phases to

// ensure correct semantics of these assignment statements.

const auto &expr{std::get<Fortran::parser::Expr>(assignmentStmt.t)};

const auto &var{std::get<Fortran::parser::Variable>(assignmentStmt.t)};

return std::visit(

Fortran::common::visitors{

[&](const Fortran::common::Indirection<

Fortran::parser::FunctionReference> &) { return true; },

shraiyshUnsubmitted

Not Done

This should not always be true. If the function call on rhs does not use the variable on the lhs, then this is an atomic write statement.

Modeling it as an update sort of works, in the sense that there is no visible change in behavior of the program, but the generated IR will not be entirely accurate. If it is very hard to deduce write here, maybe mention it as a todo?

NimishMishraAuthorUnsubmitted

Done

You are right. I missed it. I will revisit these functions and see what can be done.
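The distinction raised in this thread (a call that reads the assigned variable is an update, one that does not is a write) could be checked along the following lines. This is a sketch only: `CallAssignment` and `isUpdateViaCall` are hypothetical, and the argument list is assumed to be already flattened to plain names, which the real parse tree does not give you directly.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of `x = f(args...)`; not a flang data structure.
struct CallAssignment {
  std::string lhs;               // the assigned variable `x`
  std::vector<std::string> args; // argument names, assumed flattened
};

// Returns true when the call reads the assigned variable (update form),
// and false when it does not, which makes the statement an atomic write.
bool isUpdateViaCall(const CallAssignment &s) {
  return std::find(s.args.begin(), s.args.end(), s.lhs) != s.args.end();
}
```

For example, `x = max(x, y)` would be classified as an update, while `x = max(y, z)` would be classified as a write.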

[&](const auto &x) {

return isOmpAtomicUpdateStmtOperatorValid(x, var);

shraiyshUnsubmitted

Not Done

Fortran::parser::FunctionReference> &) { return true; },

- [&](const auto &x) {

- return isOmpAtomicUpdateStmtOperatorValid(x, var);

- },

+ [&](const Fortran::parser::Expr::IntrinsicBinary &node) {

+ const auto &variableName{var.GetSource().ToString()};

+ const auto &exprLeft{std::get<0>(node.t)};

+ const auto &exprRight{std::get<1>(node.t)};

+ if ((exprLeft.value().source.ToString() == variableName) &&

+ (exprRight.value().source.ToString() == variableName)) {

+ return true;

+ }

+ return false;

+ },

+ [&](const auto &x) {

+ return false;

+ },

},

expr.u);

I meant can something like this work? Relying on semantics for a valid binary operator.

},

},

expr.u);

}

static void genOmpAtomicUpdateStatement(

Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicRead &atomicRead) { const Fortran::parser::Variable &assignmentStmtVariable,

const Fortran::parser::Expr &assignmentStmtExpr,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList) {

// Generate `omp.atomic.update` operation atomic assignment statements of the

// form `x = x operator expr` OR `x = expr operator x` OR `x =

// intrinsic_procedure_name(x, expr_list)` OR `x =

// intrinsic_procedure_name(expr_list, x)`.

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// Get the address of atomic read operands. mlir::Value address;

SmallVector<const Fortran::semantics::Symbol *> symbolVector;

Fortran::lower::StatementContext stmtCtx;

if (auto varDesignator = std::get_if<

Fortran::common::Indirection<Fortran::parser::Designator>>(

&assignmentStmtVariable.u)) {

if (const auto *name = getDesignatorNameIfDataRef(varDesignator->value())) {

address = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

symbolVector.push_back(name->symbol);

}

}

// If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

if (leftHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memory_order);

}

if (rightHandClauseList) {

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memory_order);

}

auto atomicUpdateOp = firOpBuilder.create<mlir::omp::AtomicUpdateOp>(

currentLocation, address, hint, memory_order);

createBodyOfOp<omp::AtomicUpdateOp>(atomicUpdateOp, converter,

currentLocation, eval, nullptr,

symbolVector, false, &assignmentStmtExpr);

}

static void

genOmpAtomicWrite(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicWrite &atomicWrite) {

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicWrite.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicWrite.t);

const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicWrite.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicWrite.t).statement.t);

genOmpAtomicWriteStatement(converter, assignmentStmtVariable,

assignmentStmtExpr, &leftHandClauseList,

&rightHandClauseList);

}

static void genOmpAtomicRead(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicRead &atomicRead) {

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicRead.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicRead.t);

const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicRead.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicRead.t).statement.t);

Fortran::lower::StatementContext stmtCtx; genOmpAtomicCaptureStatement(converter, assignmentStmtVariable,

mlir::Value from_address = fir::getBase(converter.genExprAddr( assignmentStmtExpr, &leftHandClauseList,

*Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx)); &rightHandClauseList);

mlir::Value to_address = fir::getBase(converter.genExprAddr( }

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

// If no hint clause is specified, the effect is as if static void

// hint(omp_sync_hint_none) had been specified. genOmpAtomicUpdate(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicUpdate &atomicUpdate) {

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicUpdate.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicUpdate.t);

const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicUpdate.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicUpdate.t).statement.t);

genOmpAtomicUpdateStatement(converter, eval, assignmentStmtVariable,

assignmentStmtExpr, &leftHandClauseList,

&rightHandClauseList);

}

static void genOmpAtomic(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomic &atomicConstruct) {

const Fortran::parser::OmpAtomicClauseList &atomicClauseList =

std::get<Fortran::parser::OmpAtomicClauseList>(atomicConstruct.t);

const auto &assignmentStmtExpr = std::get<Fortran::parser::Expr>(

std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>(

atomicConstruct.t)

.statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>(

atomicConstruct.t)

.statement.t);

genOmpAtomicUpdateStatement(converter, eval, assignmentStmtVariable,

assignmentStmtExpr, &atomicClauseList, nullptr);

}

static void

genOmpAtomicCapture(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicCapture &atomicCapture) {

fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();

mlir::Location currentLocation = converter.getCurrentLocation();

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

const Fortran::parser::AssignmentStmt &stmt1 =

std::get<3>(atomicCapture.t).v.statement;

const Fortran::parser::AssignmentStmt &stmt2 =

std::get<4>(atomicCapture.t).v.statement;

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicCapture.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicCapture.t);

genOmpAtomicHintAndMemoryOrderClauses(converter, leftHandClauseList, hint,

memory_order);

genOmpAtomicHintAndMemoryOrderClauses(converter, rightHandClauseList, hint,

memory_order);

firOpBuilder.create<mlir::omp::AtomicReadOp>(currentLocation, from_address,

to_address, hint, memory_order); auto atomicCaptureOp = firOpBuilder.create<mlir::omp::AtomicCaptureOp>(

currentLocation, hint, memory_order);

firOpBuilder.createBlock(&atomicCaptureOp.getRegion());

mlir::Block &block = atomicCaptureOp.getRegion().back();

firOpBuilder.setInsertionPointToEnd(&block);

firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);

firOpBuilder.setInsertionPointToStart(&block);

if (checkForAtomicCaptureStmt(stmt1) && checkForAtomicUpdateStmt(stmt2)) {

// Atomic capture construct is of the form [capture-stmt, update-stmt]

const auto &assignmentStmt1Expr = std::get<Fortran::parser::Expr>(stmt1.t);

const auto &assignmentStmt1Variable =

std::get<Fortran::parser::Variable>(stmt1.t);

const auto &assignmentStmt2Expr = std::get<Fortran::parser::Expr>(stmt2.t);

const auto &assignmentStmt2Variable =

std::get<Fortran::parser::Variable>(stmt2.t);

genOmpAtomicCaptureStatement(

converter, assignmentStmt1Variable, assignmentStmt1Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

genOmpAtomicUpdateStatement(

converter, eval, assignmentStmt2Variable, assignmentStmt2Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

} else if (checkForAtomicUpdateStmt(stmt1) &&

checkForAtomicCaptureStmt(stmt2)) {

// Atomic capture construct is of the form [update-stmt, capture-stmt]

const auto &assignmentStmt1Expr = std::get<Fortran::parser::Expr>(stmt1.t);

const auto &assignmentStmt1Variable =

std::get<Fortran::parser::Variable>(stmt1.t);

const auto &assignmentStmt2Expr = std::get<Fortran::parser::Expr>(stmt2.t);

const auto &assignmentStmt2Variable =

std::get<Fortran::parser::Variable>(stmt2.t);

// `omp.atomic.read` operation must be created outside the

// `omp.atomic.update` Hence, create these operations in a bottom-up manner:

// first create `omp.atomic.read` and then `omp.atomic.update`

genOmpAtomicCaptureStatement(

converter, assignmentStmt2Variable, assignmentStmt2Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

firOpBuilder.setInsertionPointToStart(

&block); // insert `omp.atomic.update` "above" `omp.atomic.read`

genOmpAtomicUpdateStatement(

converter, eval, assignmentStmt1Variable, assignmentStmt1Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

} else {

// Atomic capture construct is of the form [capture-stmt, write-stmt]

const auto &assignmentStmt1Expr = std::get<Fortran::parser::Expr>(stmt1.t);

const auto &assignmentStmt1Variable =

std::get<Fortran::parser::Variable>(stmt1.t);

const auto &assignmentStmt2Expr = std::get<Fortran::parser::Expr>(stmt2.t);

const auto &assignmentStmt2Variable =

std::get<Fortran::parser::Variable>(stmt2.t);

genOmpAtomicCaptureStatement(

converter, assignmentStmt1Variable, assignmentStmt1Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

genOmpAtomicWriteStatement(

converter, assignmentStmt2Variable, assignmentStmt2Expr,

/*OmpAtomicClauseList =*/nullptr, /*OmpAtomicClauseList =*/nullptr);

}

} }

static void

genOMP(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OpenMPAtomicConstruct &atomicConstruct) {

std::visit(Fortran::common::visitors{

[&](const Fortran::parser::OmpAtomicRead &atomicRead) {

genOmpAtomicRead(converter, eval, atomicRead);

},

[&](const Fortran::parser::OmpAtomicWrite &atomicWrite) {

genOmpAtomicWrite(converter, eval, atomicWrite);

},

- [&](const auto &) {

- TODO(converter.getCurrentLocation(),

- "Atomic update & capture");

+ [&](const Fortran::parser::OmpAtomicUpdate &atomicUpdate) {

+ genOmpAtomicUpdate(converter, eval, atomicUpdate);

+ },

+ [&](const Fortran::parser::OmpAtomic &atomicConstruct) {

+ genOmpAtomic(converter, eval, atomicConstruct);

+ },

+ [&](const Fortran::parser::OmpAtomicCapture &atomicCapture) {

+ genOmpAtomicCapture(converter, eval, atomicCapture);

},

},

atomicConstruct.u);

}

void Fortran::lower::genOpenMPConstruct(

Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

flang/test/Lower/OpenMP/atomic-capture.f90

This file was added.

! RUN: bbc -fopenmp -emit-fir %s -o - | FileCheck %s

! RUN: flang-new -fc1 -emit-fir -fopenmp %s -o - | FileCheck %s --check-prefix=FIRDialect

! TODO: Add support for pointers

! This test checks the lowering of atomic capture construct

!CHECK: %[[TEMP_1:.*]] = fir.alloca i32 {adapt.valuebyref}

!CHECK: %[[TEMP_2:.*]] = fir.alloca i32 {adapt.valuebyref}

!CHECK: %[[VAR_X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}

!CHECK: %[[VAR_Y:.*]] = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}

!CHECK: omp.atomic.capture memory_order(release) {

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> {

!CHECK: ^bb0(%[[ARG:.*]]: i32):

!CHECK: fir.store %[[ARG]] to %[[TEMP_2]] : !fir.ref<i32>

!CHECK: %[[INTERMEDIATE_1:.*]] = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: %[[INTERMEDIATE_2:.*]] = fir.load %[[TEMP_2]] : !fir.ref<i32>

!CHECK: %[[RESULT:.*]] = arith.addi %[[INTERMEDIATE_1]], %[[INTERMEDIATE_2]] : i32

!CHECK: omp.yield(%[[RESULT]] : i32)

!CHECK: }

!CHECK: omp.atomic.capture hint(uncontended) {

!CHECK: omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> {

!CHECK: ^bb0(%[[ARG:.*]]: i32):

!CHECK: fir.store %[[ARG]] to %[[TEMP_1]] : !fir.ref<i32>

!CHECK: %[[INTERMEDIATE_3:.*]] = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: %[[INTERMEDIATE_4:.*]] = fir.load %[[TEMP_1]] : !fir.ref<i32>

!CHECK: %[[RESULT:.*]] = arith.muli %[[INTERMEDIATE_3]], %[[INTERMEDIATE_4]] : i32

!CHECK: omp.yield(%[[RESULT]] : i32)

!CHECK: }

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: }

!CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.constant {{.*}} : i32

shraiyshUnsubmitted

Not Done

Shouldn't all of this (34-41) be wrapped in an omp.atomic.read and omp.atomic.update operation? Why can't we generate that here? Relaxing the restrictions on omp.capture is not a solution for this when it's possible to express this same thing in current syntax.

NimishMishraAuthorUnsubmitted

Done

I do not understand. What do you mean by "wrapped" in a read and write operation?

Generally, the read operation is not problematic. In the write operation however, the FIR for expression evaluation is involved. I was attempting to control insertion points to make this expression evaluation "outside" the omp.atomic.capture, but I couldn't do it.

However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write). Please correct me if I am missing something here.

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

!CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

!CHECK: }

shraiyshUnsubmitted

Not Done

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

+ !CHECK: omp.atomic.update %[[VAR_Y]] {

+ !CHECK: ^bb0(%{{.+}}: i32):

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

- !CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

+ !CHECK: omp.yield (%[[INTERMEDIATE_5]]: i32)

+ !CHECK: }

!CHECK: }

I do not understand. What do you mean by "wrapped" in a read and write operation?

This is what I meant.

However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write).

Ideally, the evaluation of the expression should not be inside the atomic region at all. However, if that's somehow not possible, it should be pushed inside an omp.atomic.update operation, because omp.atomic.update supports multiple operations in its region. If that cannot be done either, we should justify adding this relaxation to omp.atomic.capture independent of Flang by answering "what kind of capture executions cannot be expressed in MLIR with the current syntax?". As long as they can be expressed, it should be the job of flang to lower appropriately.

NimishMishraAuthorUnsubmitted

Done

Okay. Then I will work on moving the evaluation of the expression outside the atomic region. I think I have a way to do that.

!CHECK: omp.atomic.capture {

shraiyshUnsubmitted

Not Done

!CHECK: }

- !CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

- !CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

+ !CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

+ !CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

!CHECK: }

!CHECK: omp.atomic.capture {

If you can generate this, then that's the most accurate imo.

This does not violate the semantics of atomic construct because it clearly says that -

Only the read and write of the location designated by x are performed mutually atomically. Neither the evaluation of expr or expr_list, nor the write to the location designated by v, need be atomic with respect to the read or write of the location designated by x.
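As a rough analogy for the rule quoted above, the [capture-stmt, write-stmt] form `v = x; x = expr` can be expressed with `std::atomic` in C++. This is an illustration of the semantics only, not what flang emits; `captureThenWrite` and its parameters are hypothetical, and `a + b` stands in for an arbitrary `expr`.

```cpp
#include <atomic>

// Analogy for `v = x; x = expr` under an atomic capture.
// The expression is evaluated outside the atomic step; only the
// read-and-replace of x happens as one atomic operation.
int captureThenWrite(std::atomic<int> &x, int a, int b) {
  int exprValue = a + b;        // non-atomic evaluation of `expr`
  return x.exchange(exprValue); // atomic: returns old x, stores the new value
}
```

Starting from `x == 7`, calling `captureThenWrite(x, 2, 3)` returns 7 (the captured value) and leaves `x == 5`. The evaluation of `a + b` corresponds to the operations that the suggested IR places before `omp.atomic.capture`.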

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: %4 = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

!CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

!CHECK: }

!CHECK: return

!CHECK: }

!FIRDialect: %[[TEMP_1:.*]] = fir.alloca i32 {adapt.valuebyref}

!FIRDialect: %[[TEMP_2:.*]] = fir.alloca i32 {adapt.valuebyref}

!FIRDialect: %[[VAR_X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}

!FIRDialect: %[[VAR_Y:.*]] = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}

!FIRDialect: omp.atomic.capture memory_order(release) {

!FIRDialect: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!FIRDialect: omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> {

!FIRDialect: ^bb0(%[[ARG:.*]]: i32):

!FIRDialect: fir.store %[[ARG]] to %[[TEMP_2]] : !fir.ref<i32>

!FIRDialect: %[[INTERMEDIATE_1:.*]] = fir.load %[[VAR_X]] : !fir.ref<i32>

!FIRDialect: %[[INTERMEDIATE_2:.*]] = fir.load %[[TEMP_2]] : !fir.ref<i32>

!FIRDialect: %[[RESULT:.*]] = arith.addi %[[INTERMEDIATE_1]], %[[INTERMEDIATE_2]] : i32

!FIRDialect: omp.yield(%[[RESULT]] : i32)

!FIRDialect: }

!FIRDialect: omp.atomic.capture hint(uncontended) {

!FIRDialect: omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> {

!FIRDialect: ^bb0(%[[ARG:.*]]: i32):

!FIRDialect: fir.store %[[ARG]] to %[[TEMP_1]] : !fir.ref<i32>

!FIRDialect: %[[INTERMEDIATE_3:.*]] = fir.load %[[VAR_X]] : !fir.ref<i32>

!FIRDialect: %[[INTERMEDIATE_4:.*]] = fir.load %[[TEMP_1]] : !fir.ref<i32>

!FIRDialect: %[[RESULT:.*]] = arith.muli %[[INTERMEDIATE_3]], %[[INTERMEDIATE_4]] : i32

!FIRDialect: omp.yield(%[[RESULT]] : i32)

!FIRDialect: }

!FIRDialect: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!FIRDialect: }

!FIRDialect: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

NimishMishraAuthorUnsubmitted

Done

I wanted to raise a point for discussion here. The verifier for atomic capture checks opsInRegion.size() == 3, i.e. that the region contains exactly 3 operations.

This issue came up before during lowering of omp.atomic.write inside atomic capture, since a write statement has expression evaluation which also takes up space. @shraiysh suggested keeping this expression evaluation outside the omp.atomic.capture, which is what I have done currently.

However, with pointers, the issue has resurfaced. This particular b = a lowers as

%0 = allocate c
%1 = allocate d
%2 = allocate a
%3 = allocate b

omp.atomic.capture {
  omp.atomic.update {......}
  %4 = load a
  %5 = load b
  omp.atomic.read %5 = %4
  omp.terminator
}

The discussion I wish to have is whether the verification method here should be changed, or should we evaluate all LHS and RHS of an atomic assignment statement beforehand. To change the verification, I was thinking like the following:

Ensure that, in the list of operations, there are exactly two omp.atomic operations and exactly one omp.terminator operation

The last operation in the region is omp.terminator

The second-to-last operation in the region is necessarily an omp.atomic operation
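The three conditions proposed above can be written down as a small standalone check. This is a sketch of the proposal only, under the assumption that operations are modelled as a plain enum; `OpKind`, `isAtomic` and `verifyRelaxedCapture` are hypothetical names, and the real verifier in OpenMPDialect.cpp works on `mlir::Operation` instead. Note that the reply that follows argues against relaxing the verifier in this way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical op kinds standing in for operations inside the region.
enum class OpKind { AtomicRead, AtomicWrite, AtomicUpdate, Terminator, Other };

bool isAtomic(OpKind k) {
  return k == OpKind::AtomicRead || k == OpKind::AtomicWrite ||
         k == OpKind::AtomicUpdate;
}

// Checks the three proposed conditions on a capture region.
bool verifyRelaxedCapture(const std::vector<OpKind> &ops) {
  std::size_t atomics = 0, terminators = 0;
  for (OpKind k : ops) {
    if (isAtomic(k))
      ++atomics;
    if (k == OpKind::Terminator)
      ++terminators;
  }
  // Condition 1: exactly two atomic ops and exactly one terminator.
  if (atomics != 2 || terminators != 1)
    return false;
  // Conditions 2 and 3: terminator last, an atomic op right before it.
  // ops.size() >= 3 is guaranteed by the counts checked above.
  return ops.back() == OpKind::Terminator && isAtomic(ops[ops.size() - 2]);
}
```

Under this check, the region shown earlier in this comment (an update, two loads, a read, then the terminator) would pass, while the current strict check on exactly three operations rejects it.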

shraiyshUnsubmitted

Not Done

The generated code should look like the following -

// %a_ptr : !fir.ref<!fir.ref<i32>>, %b_ptr : !fir.ref<!fir.ref<i32>>
%a_addr = load %a_ptr : !fir.ref<!fir.ref<i32>>
%b_addr = load %b_ptr : !fir.ref<!fir.ref<i32>>
// %a_addr : !fir.ref<i32>, %b_addr : !fir.ref<i32>
omp.atomic.capture{
  omp.atomic.update %a_addr : !fir.ref<i32> {
  ^bb0(%a_val: i32):
    %b_val = load %b_addr : !fir.ref<i32>
    %temp = arith.addi %a_val, %b_val : i32
    omp.yield %temp
  }
  omp.atomic.read %b_addr = %a_addr
  omp.terminator
}

I get the sense that this is convoluted for flang to generate. It is, however, not a good idea to relax the constraints on omp.capture because of this. I will reiterate that if we come up with an OpenMP atomic evaluation that cannot be expressed by hand with the omp.atomic.capture operation, then we should definitely change it. The fact that flang isn't able to generate it is not a good enough reason to alter the operation. In addition, if we change the number of operations allowed inside atomic capture, we have to worry about lowering it to LLVM IR, which gets harder as the operation is relaxed. If you strongly need to relax the constraints on omp.atomic.capture, we should first make sure that the relaxed version translates properly to LLVM IR for execution (probably as a separate patch).

I wanted to put an idea out - maybe to ease the difficulty of generation of omp.atomic.capture, we can define an fir.omp.atomic.capture operation that accepts multiple operations under it. Then during canonicalization (or some other pass) in FIR we can push the unnecessary operations (load a, load b in your example) outside the fir.omp.atomic.capture operation to generate omp.atomic.capture operation. Does that sound like it would make the implementation more straightforward?
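The hoisting step in this idea can be sketched as a simple partition of the relaxed region. Everything here is hypothetical: `Op`, its `staysInRegion` flag and `hoistOutOfCapture` are illustrative names rather than FIR or MLIR APIs, and the sketch assumes the hoisted operations do not depend on results produced inside the region.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical op model: a printable name plus a flag marking the ops that
// must stay inside the capture region (atomic ops and the terminator).
struct Op {
  std::string name;
  bool staysInRegion;
};

// Splits a relaxed region into (hoisted prelude, strict region), keeping
// the relative order within each part.
std::pair<std::vector<Op>, std::vector<Op>>
hoistOutOfCapture(const std::vector<Op> &region) {
  std::vector<Op> prelude, strict;
  for (const Op &op : region)
    (op.staysInRegion ? strict : prelude).push_back(op);
  return {prelude, strict};
}
```

Applied to the pointer example from this thread, the two loads would end up in the prelude, leaving exactly the update, the read and the terminator inside the capture region.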


NimishMishraAuthorUnsubmitted

Done

Ok. I will try to keep loading of the two addresses outside the capture region entirely.


NimishMishraAuthorUnsubmitted

Done

@shraiysh Does this IR look ok?


shraiyshUnsubmitted

Not Done

Yes, the current testcases look perfect to me and I cannot spot any errors in them. Thanks for your patience and for getting it to work 👏.

I have not reviewed the code itself, but functionality-wise it looks okay to me. Please feel free to go ahead without my approval for the code, as it might be some time before I can review it. If review comments lead you to change the testcases before landing this, let me know and I will review the updated testcases as soon as I can.


!FIRDialect: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>
!FIRDialect: {{.*}} = arith.constant {{.*}} : i32
!FIRDialect: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>
!FIRDialect: {{.*}} = arith.subi {{.*}}, {{.*}} : i32
!FIRDialect: {{.*}} = fir.no_reassoc {{.*}} : i32
!FIRDialect: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32
!FIRDialect: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32
!FIRDialect: }
!FIRDialect: omp.atomic.capture {
!FIRDialect: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>
!FIRDialect: {{.*}} = arith.constant {{.*}} : i32
!FIRDialect: %4 = fir.load %[[VAR_X]] : !fir.ref<i32>
!FIRDialect: {{.*}} = arith.subi {{.*}}, {{.*}} : i32
!FIRDialect: {{.*}} = fir.no_reassoc {{.*}} : i32
!FIRDialect: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32
!FIRDialect: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32
!FIRDialect: }
!FIRDialect: return
!FIRDialect: }

program sample
use omp_lib
integer :: x, y

!$omp atomic capture release
x = y
y = x + y
!$omp end atomic

!$omp atomic hint(omp_sync_hint_uncontended) capture
y = x * y
x = y
!$omp end atomic

!$omp atomic hint(omp_lock_hint_nonspeculative) capture acquire
x = y
y = 2 * 10 + (8 - x)
!$omp end atomic

!$omp atomic capture
x = y
y = 2 * 10 + (8 - x)
!$omp end atomic

end program

kiranchandramohanUnsubmitted

Not Done

I believe the semantics of atomic.update is that it will load the address that is provided to it and that will be available as the basic block argument ARG. The body of the update will use this loaded value ARG to perform the update and yield the updated value which will be stored again at the address.

I think this update operation will not work as expected, since it does not use the automatically loaded value in ARG; instead it loads the value at the address passed to the atomic.update op and adds the constant to it. This will end up computing numbers(1) = numbers(1) + numbers(1) + 10, which is not what we want.

Let me know if I missed a point.


kiranchandramohanUnsubmitted

Not Done

I think I got that wrong. It will not double add since the new code is not touching the ARG.

I tried fetching the patch but it shows some issues. Could you rebase? I would like to have a look at the IR that is generated.

In particular, we have to check:

- whether the following load is an atomic load in the LLVM IR:

!CHECK: %[[array_element_inner:.*]] = fir.load %[[array_element_ref]]

- whether the loop containing cmpxchg is well-formed.


NimishMishraAuthorUnsubmitted

Done

Hi Kiran.

I was trying to understand the requirement here. I do not completely understand why cmpxchg could be a problem. Wouldn't the compare and the exchange happen inside the atomic region?

I am just trying to understand what the generated IR should look like, to make sure I am doing it correctly.


NimishMishraAuthorUnsubmitted

Done

Hi Kiran.

The LLVM IR is as follows. Does it look ok? The load is within the atomic update block, so it should be fine, right?

llvm.func @_QParray_refs() {
  %0 = llvm.mlir.constant(1.000000e+01 : f32) : f32
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x !llvm.array<5 x f32> {bindc_name = "numbers", in_type = !fir.array<5xf32>, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEnumbers"} : (i64) -> !llvm.ptr<array<5 x f32>>
  %3 = llvm.alloca %1 x f32 {bindc_name = "x", in_type = f32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEx"} : (i64) -> !llvm.ptr<f32>
  %4 = llvm.getelementptr %2[0, 0] : (!llvm.ptr<array<5 x f32>>) -> !llvm.ptr<f32>
  omp.atomic.capture   {
    omp.atomic.update   %4 : !llvm.ptr<f32> {
    ^bb0(%arg0: f32):
      %5 = llvm.load %4 : !llvm.ptr<f32>
      %6 = llvm.fadd %5, %0  {fastmathFlags = #llvm.fastmath<contract>} : f32
      omp.yield(%6 : f32)
    }
    omp.atomic.read %3 = %4   : !llvm.ptr<f32>, f32
  }
  llvm.return
}


NimishMishraAuthorUnsubmitted

Done

Apologies. That is the MLIR LLVM dialect. Please find the LLVM IR below. It also has a cmpxchg instruction.

The test case is simple:

!$omp atomic capture
      x = y
      y = x + y
!$omp end atomic

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca float, i64 1, align 4
  store float 2.000000e+01, ptr %1, align 4
  store float 1.000000e+01, ptr %2, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %2 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %3 = phi i32 [ %.atomic.load, %entry ], [ %8, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %3 to float
  %4 = load float, ptr %1, align 4
  %5 = fadd contract float %4, %.atomic.fltCast
  store float %5, ptr %x.new.val, align 4
  %6 = load i32, ptr %x.new.val, align 4
  %7 = cmpxchg ptr %2, i32 %3, i32 %6 monotonic monotonic, align 4
  %8 = extractvalue { i32, i1 } %7, 0
  %9 = extractvalue { i32, i1 } %7, 1
  br i1 %9, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %.atomic.fltCast, ptr %1, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}


NimishMishraAuthorUnsubmitted

Done

And for the array reference test case:

!$omp atomic capture
      x(1) = y(1)
      y(1) = x(1) + y(1)
!$omp end atomic

The following IR is generated:

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca [5 x float], i64 1, align 4
  %3 = alloca [5 x float], i64 1, align 4
  %4 = getelementptr [5 x float], ptr %2, i32 0, i32 0
  store float 2.000000e+01, ptr %4, align 4
  %5 = getelementptr [5 x float], ptr %3, i32 0, i32 0
  store float 1.000000e+01, ptr %5, align 4
  %6 = load float, ptr %4, align 4
  %7 = load float, ptr %5, align 4
  %8 = fadd contract float %6, %7
  store float %8, ptr %1, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %4 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %9 = phi i32 [ %.atomic.load, %entry ], [ %13, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %9 to float
  %10 = load float, ptr %5, align 4
  store float %10, ptr %x.new.val, align 4
  %11 = load i32, ptr %x.new.val, align 4
  %12 = cmpxchg ptr %4, i32 %9, i32 %11 monotonic monotonic, align 4
  %13 = extractvalue { i32, i1 } %12, 0
  %14 = extractvalue { i32, i1 } %12, 1
  br i1 %14, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %10, ptr %5, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}


kiranchandramohanUnsubmitted

Not Done

The concern here is that the atomically loaded value is not used in the update operation.

AFAIU, the cmpxchg instruction only updates the location if the value currently at that location equals the expected value, that is, the value that was loaded earlier. So, if we are doing y = y + x, an initial atomic load is made of y (= y_old), and it is added to the value at x to obtain y_old + x. Before storing this value at y, it is checked that the value currently resident at y is equal to y_old. The problem here (for the array-element case) is that the value used for updating is not computed from the atomically loaded value y_old but from a different value, and that does not seem correct. Also, the update operation (addition here) has to be inside the loop, since the addition should be performed on the atomically loaded value.
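The requirement described above can be sketched with `std::atomic` in standard C++. This is only an illustration of the compare-exchange loop shape, not the code that the OpenMP IR builder emits; `atomicCaptureThenUpdate` is a hypothetical helper name, and relaxed ordering is used only because the pasted IR uses monotonic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper modelling "capture the old value, then update":
//   v = x; x = update(x)
// `update` plays the role of the update region and receives the atomically
// loaded value (the block argument), never the result of a separate plain load.
template <typename T, typename UpdateFn>
T atomicCaptureThenUpdate(std::atomic<T> &x, UpdateFn update) {
  T expected = x.load(std::memory_order_relaxed); // initial atomic load
  T desired = update(expected);                   // computed from loaded value
  // On failure, compare_exchange_weak writes the current value of x into
  // `expected`, so the update is recomputed inside the loop from fresh data.
  while (!x.compare_exchange_weak(expected, desired,
                                  std::memory_order_relaxed,
                                  std::memory_order_relaxed)) {
    desired = update(expected);
  }
  return expected; // value of x immediately before the successful update
}
```

With `y` initially 10 and `x` equal to 20, calling the helper on `y` with an update of `yOld + x` returns 10 and leaves 30 in `y`. If the lambda ignored its argument and used a value read outside the loop, the stored result could be stale after a failed exchange, which is the problem raised for the array-element case.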


NimishMishraAuthorUnsubmitted

Done

I understand now. Thank you.

The patch cannot go ahead in its current form then. Do you have any suggestions on how to go about fixing it? Johannes did mention an alternative strategy some time back, but I am not sure how to start in that direction. Can you give some initial direction?


flang/test/Lower/OpenMP/atomic-update.f90

This file was added.

				! This test checks lowering of atomic update construct
				! RUN: bbc -fopenmp -emit-fir %s -o - | \
				! RUN: FileCheck %s

				program OmpAtomicUpdate
				use omp_lib
				integer :: x, y, z
				integer, pointer :: a, b
				integer, target :: c, d
				a=>c
				b=>d

				!CHECK: %[[TEMP_1:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_2:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_3:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_4:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_5:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_6:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_7:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_8:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_9:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: %[[TEMP_10:.*]] = fir.alloca i32 {adapt.valuebyref}
				!CHECK: {{.*}} = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "a", uniq_name = "_QFEa"}
				!CHECK: {{.*}} = fir.alloca !fir.ptr<i32> {uniq_name = "_QFEa.addr"}
				!CHECK: {{.*}} = fir.zero_bits !fir.ptr<i32>
				!CHECK: fir.store {{.*}} to {{.*}} : !fir.ref<!fir.ptr<i32>>
				!CHECK: {{.*}} = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "b", uniq_name = "_QFEb"}
				!CHECK: %[[b_ADDR:.*]] = fir.alloca !fir.ptr<i32> {uniq_name = "_QFEb.addr"}
				!CHECK: {{.*}} = fir.zero_bits !fir.ptr<i32>
				!CHECK: fir.store {{.*}} to {{.*}} : !fir.ref<!fir.ptr<i32>>
				!CHECK: {{.*}} = fir.address_of(@_QFEc) : !fir.ref<i32>
				!CHECK: {{.*}} = fir.address_of(@_QFEd) : !fir.ref<i32>
				!CHECK: %[[VAR_X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}
				!CHECK: %[[VAR_Y:.*]] = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}
				!CHECK: %[[VAR_Z:.*]] = fir.alloca i32 {bindc_name = "z", uniq_name = "_QFEz"}
				!CHECK: {{.*}} = fir.convert {{.*}} : (!fir.ref<i32>) -> !fir.ptr<i32>
				!CHECK: fir.store {{.*}} to {{.*}} : !fir.ref<!fir.ptr<i32>>
				!CHECK: {{.*}} = fir.convert {{.*}} : (!fir.ref<i32>) -> !fir.ptr<i32>
				!CHECK: fir.store {{.*}} to {{.*}} : !fir.ref<!fir.ptr<i32>>
				!CHECK: %[[LOADED_a_ADDR:.*]] = fir.load {{.*}} : !fir.ref<!fir.ptr<i32>>


				!CHECK: omp.atomic.update %[[LOADED_a_ADDR]] : !fir.ptr<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_10]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[TEMP_10]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[b_ADDR]] : !fir.ref<!fir.ptr<i32>>
				!CHECK: {{.*}} = fir.load {{.*}} : !fir.ptr<i32>
				!CHECK: %[[RESULT:.*]] = arith.addi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!$omp atomic update
				a = a + b


				!CHECK: omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_9]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[TEMP_9]] : !fir.ref<i32>
				!CHECK: {{.*}} = arith.constant 1 : i32
				!CHECK: %[[RESULT:.*]] = arith.addi %{{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!CHECK: omp.atomic.update %[[VAR_Z]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_8]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>
				!CHECK: %{{.*}} = fir.load %[[TEMP_8]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.muli {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!$omp atomic
				y = y + 1
				!$omp atomic update
				z = x * z

				!CHECK: omp.atomic.update memory_order(relaxed) hint(uncontended) %[[VAR_X]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_7]] : !fir.ref<i32>
				!CHECK: %{{.*}} = fir.load %[[TEMP_7]] : !fir.ref<i32>
				!CHECK: %{{.*}} = arith.constant 1 : i32
				!CHECK: %[[RESULT:.*]] = arith.subi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK:}
				!CHECK: omp.atomic.update memory_order(relaxed) %[[VAR_Y]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_6]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[TEMP_6]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_Z]] : !fir.ref<i32>
				!CHECK: {{.*}} = arith.cmpi sgt, {{.*}}, {{.*}} : i32
				!CHECK: {{.*}} = arith.select {{.*}}, {{.*}}, {{.*}} : i32
				!CHECK: {{.*}} = arith.cmpi sgt, {{.*}}, {{.*}} : i32
				!CHECK: %[[RESULT:.*]] = arith.select {{.*}}, {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!CHECK: omp.atomic.update memory_order(relaxed) hint(contended) %[[VAR_Z]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_5]] : !fir.ref<i32>
				!CHECK: %{{.*}} = fir.load %[[TEMP_5]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.addi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!$omp atomic relaxed update hint(omp_sync_hint_uncontended)
				x = x - 1
				!$omp atomic update relaxed
				y = max(x, y, z)
				!$omp atomic relaxed hint(omp_sync_hint_contended)
				z = z + x

				!CHECK: omp.atomic.update memory_order(release) hint(contended) %[[VAR_Z]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_4]] : !fir.ref<i32>
				!CHECK: {{.*}} = arith.constant 10 : i32
				!CHECK: {{.*}} = fir.load %[[TEMP_4]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.muli {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!CHECK: omp.atomic.update memory_order(release) hint(speculative) %[[VAR_X]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_3]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[TEMP_3]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_Z]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.divsi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!$omp atomic release update hint(omp_lock_hint_contended)
				z = z * 10
				!$omp atomic hint(omp_lock_hint_speculative) update release
				x = x / z

				!CHECK: omp.atomic.update memory_order(seq_cst) hint(nonspeculative) %[[VAR_Y]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_2]] : !fir.ref<i32>
				!CHECK: {{.*}} = arith.constant 10 : i32
				!CHECK: {{.*}} = fir.load %[[TEMP_2]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.addi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!CHECK: omp.atomic.update memory_order(seq_cst) %[[VAR_Z]] : !fir.ref<i32> {
				!CHECK: ^bb0(%[[ARG:.*]]: i32):
				!CHECK: fir.store %[[ARG]] to %[[TEMP_1]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[VAR_Y]] : !fir.ref<i32>
				!CHECK: {{.*}} = fir.load %[[TEMP_1]] : !fir.ref<i32>
				!CHECK: %[[RESULT:.*]] = arith.addi {{.*}}, {{.*}} : i32
				!CHECK: omp.yield(%[[RESULT]] : i32)
				!CHECK: }
				!CHECK: return
				!CHECK: }
				!$omp atomic hint(omp_sync_hint_nonspeculative) seq_cst
				y = 10 + y
				!$omp atomic seq_cst update
				z = y + z
				end program OmpAtomicUpdate

mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp

 }

 LogicalResult AtomicCaptureOp::verify() {
   return verifySynchronizationHint(*this, hint_val());
 }

 LogicalResult AtomicCaptureOp::verifyRegions() {
   Block::OpListType &ops = region().front().getOperations();
-  if (ops.size() != 3)
+  int numberOfOmpOps{0};
+  for (auto &op : ops) {
+    if (dyn_cast<AtomicReadOp>(op) || dyn_cast<AtomicUpdateOp>(op) ||
+        dyn_cast<AtomicWriteOp>(op))
+      numberOfOmpOps++;
+  }
+  if (!(numberOfOmpOps == 2 && dyn_cast<TerminatorOp>(ops.back())))
     return emitError()
            << "expected three operations in omp.atomic.capture region (one "
               "terminator, and two atomic ops)";

   auto &firstOp = ops.front();
-  auto &secondOp = *ops.getNextNode(firstOp);
-  auto firstReadStmt = dyn_cast<AtomicReadOp>(firstOp);
   auto firstUpdateStmt = dyn_cast<AtomicUpdateOp>(firstOp);
+  auto firstReadStmt = dyn_cast<AtomicReadOp>(firstOp);

shraiyshUnsubmitted

Not Done

Is this move required?

NimishMishraAuthorUnsubmitted

Done

No. It's unintentional. I will revert it.

+  auto &secondOp = *ops.getNextNode(firstOp);
   auto secondReadStmt = dyn_cast<AtomicReadOp>(secondOp);
   auto secondUpdateStmt = dyn_cast<AtomicUpdateOp>(secondOp);
   auto secondWriteStmt = dyn_cast<AtomicWriteOp>(secondOp);
+  if (!secondWriteStmt && !secondUpdateStmt) {

shraiyshUnsubmitted

Not Done

Can you please add a testcase for this in `mlir/test/Dialect/OpenMP/ops.mlir`? I'm having trouble understanding why this is required.

+    // If second statement is neither `omp.atomic.write` nor
+    // `omp.atomic.update`, then the `omp.atomic.capture` structure is
+    // [capture-stmt, write-stmt] and `write-stmt` occurs (if it occurs at all!)
+    // as the second last statement of the block. Verify it thus
+
+    for (auto &op : ops) {
+      secondWriteStmt = dyn_cast<AtomicWriteOp>(op);
+      if (secondWriteStmt)
+        break;
+    }
+  }

   if (!((firstUpdateStmt && secondReadStmt) ||
         (firstReadStmt && secondUpdateStmt) ||
         (firstReadStmt && secondWriteStmt)))
     return ops.front().emitError()
            << "invalid sequence of operations in the capture region";
   if (firstUpdateStmt && secondReadStmt &&
       firstUpdateStmt.x() != secondReadStmt.x())
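For comparison, the strict rule that the original check enforces (two atomic operations followed by a terminator, in one of the three orders listed in the summary) can be written as a small standalone predicate. This is a sketch, not the MLIR verifier: `OpKind` and `isValidCaptureRegion` are hypothetical names, and the operand checks that the real verifier performs afterwards (such as comparing `x()` of the update and the read) are not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of the operations found in a capture region.
enum class OpKind { Read, Update, Write, Terminator, Other };

// Strict form: exactly two atomic ops followed by a terminator, in one of the
// accepted orders [update, read], [read, update], or [read, write].
bool isValidCaptureRegion(const std::vector<OpKind> &ops) {
  if (ops.size() != 3 || ops[2] != OpKind::Terminator)
    return false;
  const OpKind first = ops[0];
  const OpKind second = ops[1];
  return (first == OpKind::Update && second == OpKind::Read) ||
         (first == OpKind::Read && second == OpKind::Update) ||
         (first == OpKind::Read && second == OpKind::Write);
}
```

Under this predicate a region such as read, other, write, terminator is rejected because of the extra operation, which is the kind of region the relaxed check in the diff would accept.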
