This is an archive of the discontinued LLVM Phabricator instance.

Differential D127272

[flang][OpenMP] Lowering support for atomic capture
ClosedPublic

Authored by NimishMishra on Jun 8 2022, 12:56 AM.

Details

Reviewers

sscalpone
jdoerfert
shraiysh
kiranchandramohan
kiranktp
peixin
MatsPetersson

Commits

rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture

Summary

This patch adds lowering support for the atomic capture operation. First, a region (without any operands) is created for the atomic capture operation. Then, based on which of the following configurations is present...

[update-stmt, capture-stmt]
[capture-stmt, update-stmt]
[capture-stmt, write-stmt]

... the lowering proceeds by creating these individual operations inside the atomic capture's region.
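The choice between the three configurations can be sketched with two structural predicates per statement. This is a minimal illustration only, not the code of the patch: `Assignment`, `CaptureForm` and `classifyCapture` are hypothetical names, and the two boolean fields stand in for checks on the parse tree (a single variable on the RHS, and the LHS symbol appearing on the RHS).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a parsed assignment statement.
struct Assignment {
  bool rhsIsSingleVariable; // true for a statement shaped like `v = x`
  bool lhsAppearsOnRhs;     // true for a statement shaped like `x = x op expr`
};

enum class CaptureForm {
  UpdateThenCapture, // [update-stmt, capture-stmt]
  CaptureThenUpdate, // [capture-stmt, update-stmt]
  CaptureThenWrite   // [capture-stmt, write-stmt]
};

// Classify the two statements of an atomic capture construct.
// Assumes semantic checks have already rejected invalid combinations.
CaptureForm classifyCapture(const Assignment &stmt1, const Assignment &stmt2) {
  if (stmt1.rhsIsSingleVariable) {
    // The first statement is the capture; the second is an update only if
    // its LHS variable is also read on its RHS, otherwise it is a write.
    return stmt2.lhsAppearsOnRhs ? CaptureForm::CaptureThenUpdate
                                 : CaptureForm::CaptureThenWrite;
  }
  // Otherwise the first statement is the update and the second the capture.
  return CaptureForm::UpdateThenCapture;
}
```

For example, `v = x` followed by `x = x + 1` would be classified as `CaptureThenUpdate` under these assumptions.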

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NimishMishra created this revision. Jun 8 2022, 12:56 AM

Herald added a reviewer: sscalpone. Jun 8 2022, 12:56 AM

Herald added a project: Restricted Project.

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 23 others.

NimishMishra requested review of this revision. Jun 8 2022, 12:56 AM

Herald added a reviewer: jdoerfert. Jun 8 2022, 12:56 AM

Herald added a project: Restricted Project.

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B168488: Diff 435062. Jun 8 2022, 12:57 AM

NimishMishra updated this revision to Diff 435066. Jun 8 2022, 1:01 AM

NimishMishra edited the summary of this revision.

NimishMishra removed a parent revision: D125668: [flang][OpenMP] Lowering support for atomic update construct.

Harbormaster completed remote builds in B168489: Diff 435066. Jun 8 2022, 1:02 AM

This follows after D125668.

TODO: handling pointers in atomic capture construct

Thanks for working on this. I have a few comments on generated IR and changes in OpenMPDialect.cpp. Please excuse the formatting issues, if any as I'm commenting this from a mobile :')

flang/lib/Lower/OpenMP.cpp
2152	Shouldn't this be done in semantics?
flang/test/Lower/OpenMP/atomic-capture.f90
35	Shouldn't all of this (34-41) be wrapped in an omp.atomic.read and omp.atomic.update operation? Why can't we generate that here? Relaxing the restrictions on omp.capture is not a solution for this when it's possible to express this same thing in current syntax.
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
936 ↗	(On Diff #435066)	Is this move required?
941 ↗	(On Diff #435066)	Can you please add a testcase for this in `mlir/test/Dialect/OpenMP/ops.mlir`? I'm having trouble understanding why this is required.

Herald added a subscriber: Peiming. Jun 9 2022, 11:56 AM

NimishMishra added inline comments. Jun 9 2022, 9:33 PM

flang/lib/Lower/OpenMP.cpp
2152	This is the best approach I could think of in order to understand which capture construct combination to lower to. So I put basic "structural" checks for `v=x` and `x=x op expr` statements. There are more semantic checks attached with these. I assume we rely on the semantics phase to take care of them. These helper functions here are only doing structural checks.
flang/test/Lower/OpenMP/atomic-capture.f90
35	I do not understand. What do you mean by "wrapped" in a read and write operation? Generally, the read operation is not problematic. In the write operation however, the FIR for expression evaluation is involved. I was attempting to control insertion points to make this expression evaluation "outside" the omp.atomic.capture, but I couldn't do it. However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write). Please correct me if I am missing something here.
mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
936 ↗	(On Diff #435066)	No. It's unintentional. I will revert it.

shraiysh added inline comments. Jun 9 2022, 10:20 PM

flang/test/Lower/OpenMP/atomic-capture.f90
34–43	If you can generate this, then that's the most accurate imo. This does not violate the semantics of the atomic construct, because the specification clearly says: "Only the read and write of the location designated by x are performed mutually atomically. Neither the evaluation of expr or expr_list, nor the write to the location designated by v, need be atomic with respect to the read or write of the location designated by x."
36–42	> I do not understand. What do you mean by "wrapped" in a read and write operation?

This is what I meant.

> However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write).

Ideally, the evaluation of the expression should not be inside the atomic region at all. However, if that's somehow not possible, it should be pushed inside an omp.atomic.update operation, because omp.atomic.update supports multiple operations in its region. If that cannot be done either, we should justify adding this relaxation to omp.atomic.capture independently of Flang by answering "what kind of capture executions cannot be expressed in MLIR with the current syntax?". As long as they can be expressed, it should be the job of flang to lower appropriately.

NimishMishra added inline comments. Jun 9 2022, 10:38 PM

flang/test/Lower/OpenMP/atomic-capture.f90
36–42	Okay. Then I will work on moving the evaluation of the expression outside the atomic region. I think I have a way to do that.

shraiysh added inline comments. Jun 9 2022, 11:03 PM

flang/lib/Lower/OpenMP.cpp
2194–2195	This should not always be true. If the function call on rhs does not use the variable on the lhs, then this is an atomic write statement. Modeling it as an update sort of works, in the sense that there is no visible change in behavior of the program, but the generated IR will not be entirely accurate. If it is very hard to deduce write here, maybe mention it as a todo?
2196–2198	I meant can something like this work? Relying on semantics for a valid binary operator.

NimishMishra added inline comments. Jun 9 2022, 11:09 PM

flang/lib/Lower/OpenMP.cpp
2194–2195	You are right. I missed it. I will revisit these functions and see what can be done.
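The write-versus-update distinction raised for lines 2194–2195 can be illustrated with a small sketch. It assumes the symbols referenced by the RHS are already available as a list of names; `AtomicKind` and `classifyAssignment` are hypothetical and are not flang APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class AtomicKind { Write, Update };

// Hypothetical helper: `rhsSymbols` holds every symbol referenced by the RHS
// expression, including the arguments of any function call.
AtomicKind classifyAssignment(const std::string &lhs,
                              const std::vector<std::string> &rhsSymbols) {
  // If the LHS variable is read on the RHS, the statement is an update
  // (`x = x op expr`); otherwise it only writes `x`, even when the RHS is
  // a function call.
  const bool usesLhs =
      std::find(rhsSymbols.begin(), rhsSymbols.end(), lhs) != rhsSymbols.end();
  return usesLhs ? AtomicKind::Update : AtomicKind::Write;
}
```

Under this model, `x = f(i, j, k)` is a write, while `x = f(x, i, j)` is an update.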

Improved design of the solution

NimishMishra added inline comments. Aug 1 2022, 4:55 AM

flang/test/Lower/OpenMP/atomic-capture.f90
82	I had a discussion to raise at this point. The verifier for atomic capture checks `opsInRegion.size() == 3`, i.e. that the number of operations in the region is 3. This issue came up before during lowering of `omp.atomic.write` inside atomic capture, since a write statement has expression evaluation, which also takes up space. @shraiysh suggested keeping this expression evaluation outside the `omp.atomic.capture`, which is what I have done currently. However, with pointers, the issue has resurfaced. This particular `b = a` lowers as

  %0 = allocate c
  %1 = allocate d
  %2 = allocate a
  %3 = allocate b
  omp.atomic.capture{
    omp.atomic.update{......}
    %4 = load a
    %5 = load b
    omp.atomic.read %5 = %4
    omp.terminator
  }

The discussion I wish to have is whether the verification method here should be changed, or whether we should evaluate all LHS and RHS of an atomic assignment statement beforehand. To change the verification, I was thinking of the following:

- Ensure that, in the list of operations, there are exactly two omp.atomic operations and exactly one omp.terminator operation
- The last operation in the region is omp.terminator
- The second-last operation in the region is necessarily an omp.atomic operation
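The relaxed verification proposed in this comment could be modelled as follows. This is only a sketch: it treats a region as a plain list of operation names, and `isAtomicOp` and `verifyRelaxedCaptureRegion` are hypothetical names, not part of the MLIR OpenMP dialect verifier.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: an operation is identified only by its name.
bool isAtomicOp(const std::string &name) {
  return name.rfind("omp.atomic.", 0) == 0; // name starts with "omp.atomic."
}

// Checks the three proposed conditions on the ordered list of op names.
bool verifyRelaxedCaptureRegion(const std::vector<std::string> &ops) {
  if (ops.size() < 3)
    return false;
  const auto atomics = std::count_if(ops.begin(), ops.end(), isAtomicOp);
  const auto terminators =
      std::count(ops.begin(), ops.end(), std::string("omp.terminator"));
  // Exactly two omp.atomic operations and exactly one terminator.
  if (atomics != 2 || terminators != 1)
    return false;
  // The last operation must be the terminator.
  if (ops.back() != "omp.terminator")
    return false;
  // The second-last operation must be an omp.atomic operation.
  return isAtomicOp(ops[ops.size() - 2]);
}
```

With this model, the region shown in the comment (an update, two loads, a read and a terminator) would pass, whereas the existing `opsInRegion.size() == 3` check rejects it.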

Harbormaster completed remote builds in B178552: Diff 448993. Aug 1 2022, 5:19 AM

Ping for review!

NimishMishra mentioned this in D126612: [flang][OpenMP] Added semantic checks for atomic capture construct. Aug 18 2022, 6:38 PM

Apologies for the delay in review. Please let me know what you think. I will join the next OpenMP call and we can maybe discuss this.

flang/test/Lower/OpenMP/atomic-capture.f90
82	The generated code should look like the following -

  // %a_ptr : !fir.ref<!fir.ref<i32>> %b_ptr = !fir.ref<!fir.ref<i32>>
  %a_addr = load %a_ptr : !fir.ref<!fir.ref<i32>>
  %b_addr = load %b_ptr : !fir.ref<!fir.ref<i32>>
  // %a_addr : !fir.ref<i32>, %b_addr : !fir.ref<i32>
  omp.atomic.capture{
    omp.atomic.update %a_addr : !fir.ref<i32> {
    ^bb0(%a_val: i32):
      %b_val = load %b_addr : !fir.ref<i32>
      %temp = arith.addi %a_val, %b_val : i32
      omp.yield %temp
    }
    omp.atomic.read %b_addr = %a_addr
    omp.terminator
  }

I get the sense that this is convoluted on flang's end to generate it. It is however not a good idea to relax the constraints on omp.capture because of this. I will reiterate that if we come up with an OpenMP atomic evaluation that cannot be expressed by hand with the `omp.atomic.capture` operation, then we should definitely change it. Just because flang isn't able to generate it - this isn't a good enough reason to alter the operation. This, and the fact that if we change the number of operations inside atomic capture, we have to worry about lowering it to LLVM IR - which will be harder as the operation relaxes.

If you strongly need to relax the constraints on `omp.atomic.capture`, we should first make sure that the relaxed version translates properly to LLVM IR for execution (probably as a separate patch).

I wanted to put an idea out - maybe to ease the difficulty of generation of `omp.atomic.capture`, we can define an `fir.omp.atomic.capture` operation that accepts multiple operations under it. Then during canonicalization (or some other pass) in FIR we can push the unnecessary operations (load a, load b in your example) outside the `fir.omp.atomic.capture` operation to generate `omp.atomic.capture` operation. Does that sound like it would make the implementation more straightforward?

LGTM

NimishMishra added inline comments. Aug 30 2022, 6:52 PM

flang/test/Lower/OpenMP/atomic-capture.f90
82	Ok. I will try to keep loading of the two addresses outside the capture region entirely.

Changed design of the patch to evaluate LHS and RHS of the two assignment statements before generating the omp.atomic.capture operation.

flang/test/Lower/OpenMP/atomic-capture.f90
82	@shraiysh Does this IR look ok?

Harbormaster completed remote builds in B184490: Diff 457145. Aug 31 2022, 6:54 PM

shraiysh added inline comments. Aug 31 2022, 9:19 PM

flang/test/Lower/OpenMP/atomic-capture.f90
82	Yes, the current testcases look perfect to me and I cannot spot any errors in them. Thanks for your patience and for getting it to work 👏. I have not reviewed the code itself, but functionality-wise it looks okay to me. Please feel free to go ahead without my approval for the code, as it might be some time before I can review it. If review comments lead you to change the testcases before landing this, let me know and I will review the updated testcases as soon as I can.

The following test case fails.

program main
  implicit none
  integer, parameter :: n1 = 10
  integer, parameter :: n2 = 100
  integer, parameter :: n = 30
  integer :: idx(n2)
  integer :: i
  integer(1) :: xi1(n1), yi1(n1), zi1(n1), oi1(n1), pi1(n1), qi1(n1), expecti1(n1)
  integer(8) :: xi8(n1), yi8(n1), zi8(n1), oi8(n1), pi8(n1), qi8(n1), expecti8(n1)
  logical :: rst(n) = .false.

  do i = 1, n2
    idx(i) = mod(i, n1) + 1
  end do
  ! add integer(1)
  xi1 = 0
  yi1 = 0
  zi1 = 0
  expecti1 = [38, -52, -42, -32, -22, -12, -2, 8, 18, 28]
  call atomic_capture_addi1(xi1, yi1, zi1, oi1, pi1, qi1, idx, n2)
  rst(1:10) = xi1 .eq. expecti1
  rst(11:20) = yi1 .eq. expecti1
  rst(21:30) = zi1 .eq. expecti1
  if (any(rst .neqv. .true.)) STOP 1
  print *, "PASS"

contains
  integer(1) function fi1(i)
    integer :: i
    fi1 = i
  end function
  integer(8) function fi8(i)
    integer :: i
    fi8 = i
  end function
  subroutine atomic_capture_addi1(x, y, z, o, p, q, idx, n)
    integer(1) :: x(*), y(*), z(*), o(*), p(*), q(*)
    integer :: idx(*)
    integer :: i, n
    !$omp parallel do shared(x, y, z, o, p, q, idx, n)
    do i = 1, n
      !$omp atomic capture
      x(idx(i)) = x(idx(i)) + fi1(i)
      o(idx(i)) = x(idx(i))
      !$omp end atomic
      !$omp atomic capture
      y(idx(i)) = fi1(i) + y(idx(i))
      p(idx(i)) = y(idx(i))
      !$omp end atomic
      !$omp atomic capture
      z(idx(i)) = z(idx(i)) + fi8(i)
      q(idx(i)) = z(idx(i))
      !$omp end atomic
    end do
  end subroutine
end program main

$ gfortran -fopenmp test.f90 && ./a.out
 PASS
$ flang-new -flang-experimental-exec -fopenmp test.f90 
flang-new: /home/qpx/compilers/llvm-community/omp-dev/llvm-project/flang/lib/Lower/OpenMP.cpp:1591: void genOmpAtomicUpdateStatement(Fortran::lower::AbstractConverter&, Fortran::lower::pft::Evaluation&, mlir::Value, mlir::Type, const Fortran::parser::Variable&, const Fortran::parser::Expr&, const Fortran::parser::OmpAtomicClauseList*, const Fortran::parser::OmpAtomicClauseList*): Assertion `name && name->symbol && "No symbol attached to atomic update variable"' failed.

peixin requested changes to this revision. Sep 29 2022, 6:31 PM

This revision now requires changes to proceed. Sep 29 2022, 6:31 PM

Added a fix to handle Array Refs in atomic update constructs

Herald added a subscriber: sunshaoce. Jan 16 2023, 4:31 AM

Harbormaster completed remote builds in B208016: Diff 489503. Jan 16 2023, 5:22 AM

LGTM

See comment inline about the latest change.

Since we do not have a mechanism for handling array element, I think you can add a hard TODO for the array element case. We can move ahead with the rest of the patch. File a github issue for the array element case.

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I believe the semantics of `atomic.update` is that it will load the value at the `address` provided to it, and that value will be available as the basic block argument `ARG`. The body of the update will use this loaded value `ARG` to perform the update and yield the updated value, which will be stored again at the `address`. I think this update operation will not work as expected, since it does not use the automatically loaded value in `ARG`; instead it loads the value at the address passed to the `atomic.update` op and adds the constant to it. This will end up computing `numbers(1) = numbers(1) + numbers(1) + 10`, which is not what we want. Let me know if I missed a point.

This revision now requires changes to proceed. Feb 19 2023, 11:15 AM

kiranchandramohan added inline comments. Feb 19 2023, 2:28 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I think I got that wrong. It will not double add since the new code is not touching the `ARG`. I tried fetching the patch but it shows some issues. Could you rebase? I would like to have a look at the IR that is generated. Particularly, we have to check:
- whether the following load is an atomic load in the LLVM IR:
  !CHECK: %[[array_element_inner:.*]] = fir.load %[[array_element_ref]]
- whether the loop containing `cmpxchg` is well-formed.

NimishMishra added inline comments. Mar 9 2023, 6:16 AM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	Hi Kiran. I was trying to understand the requirement here. I do not completely understand why `cmpxchg` could be a problem here. Wouldn't the compare and the exchange happen inside the atomic region? I am just trying to understand what the generated IR should look like, to make sure I am doing it correctly.

Rebased with main. Fixed generation of omp.atomic.read

Herald added a subscriber: jplehr. Mar 22 2023, 8:30 PM

NimishMishra added inline comments. Mar 22 2023, 8:35 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

Hi Kiran.

The LLVM IR is as follows. Does it look ok? The load is within the Atomic Update block, so it should be fine right?

llvm.func @_QParray_refs() {
  %0 = llvm.mlir.constant(1.000000e+01 : f32) : f32
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x !llvm.array<5 x f32> {bindc_name = "numbers", in_type = !fir.array<5xf32>, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEnumbers"} : (i64) -> !llvm.ptr<array<5 x f32>>
  %3 = llvm.alloca %1 x f32 {bindc_name = "x", in_type = f32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEx"} : (i64) -> !llvm.ptr<f32>
  %4 = llvm.getelementptr %2[0, 0] : (!llvm.ptr<array<5 x f32>>) -> !llvm.ptr<f32>
  omp.atomic.capture {
    omp.atomic.update %4 : !llvm.ptr<f32> {
    ^bb0(%arg0: f32):
      %5 = llvm.load %4 : !llvm.ptr<f32>
      %6 = llvm.fadd %5, %0 {fastmathFlags = #llvm.fastmath<contract>} : f32
      omp.yield(%6 : f32)
    }
    omp.atomic.read %3 = %4 : !llvm.ptr<f32>, f32
  }
  llvm.return
}

Harbormaster completed remote builds in B221197: Diff 507581. Mar 22 2023, 8:46 PM

NimishMishra added inline comments. Mar 22 2023, 11:23 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

Apologies. That is the MLIR dialect. Please find the LLVM IR below. It also has a cmpxchg instruction

The test case is simple:

!$omp atomic capture
      x = y
      y = x + y
!$omp end atomic

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca float, i64 1, align 4
  store float 2.000000e+01, ptr %1, align 4
  store float 1.000000e+01, ptr %2, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %2 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %3 = phi i32 [ %.atomic.load, %entry ], [ %8, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %3 to float
  %4 = load float, ptr %1, align 4
  %5 = fadd contract float %4, %.atomic.fltCast
  store float %5, ptr %x.new.val, align 4
  %6 = load i32, ptr %x.new.val, align 4
  %7 = cmpxchg ptr %2, i32 %3, i32 %6 monotonic monotonic, align 4
  %8 = extractvalue { i32, i1 } %7, 0
  %9 = extractvalue { i32, i1 } %7, 1
  br i1 %9, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %.atomic.fltCast, ptr %1, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

NimishMishra added inline comments. Mar 22 2023, 11:28 PM

flang/test/Lower/OpenMP/atomic-capture.f90

144–154

And for the array reference test case

!$omp atomic capture
      x(1) = y(1)
      y(1) = x(1) + y(1)
!$omp end atomic

The following IR is generated:

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca [5 x float], i64 1, align 4
  %3 = alloca [5 x float], i64 1, align 4
  %4 = getelementptr [5 x float], ptr %2, i32 0, i32 0
  store float 2.000000e+01, ptr %4, align 4
  %5 = getelementptr [5 x float], ptr %3, i32 0, i32 0
  store float 1.000000e+01, ptr %5, align 4
  %6 = load float, ptr %4, align 4
  %7 = load float, ptr %5, align 4
  %8 = fadd contract float %6, %7
  store float %8, ptr %1, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %4 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %9 = phi i32 [ %.atomic.load, %entry ], [ %13, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %9 to float
  %10 = load float, ptr %5, align 4
  store float %10, ptr %x.new.val, align 4
  %11 = load i32, ptr %x.new.val, align 4
  %12 = cmpxchg ptr %4, i32 %9, i32 %11 monotonic monotonic, align 4
  %13 = extractvalue { i32, i1 } %12, 0
  %14 = extractvalue { i32, i1 } %12, 1
  br i1 %14, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %10, ptr %5, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}

kiranchandramohan added inline comments. Mar 27 2023, 4:55 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	The concern here is that the atomically loaded value is not used in the update operation. AFAIU, the cmpxchg instruction only updates the location if the value currently at that location equals the expected value, i.e. the value that was loaded earlier. So, if we are doing y = y + x, an initial atomic load is made of y (= y_old), and the value at x is added to it to obtain y_old + x. Before storing this value at y, it is checked that the value currently resident at y is still equal to y_old. The problem here (for the array-element case) is that the value used for the update is not computed from the atomically loaded value y_old but from a different value, and that does not seem correct. Also, the update operation (the addition here) has to be inside the loop, since the addition should be performed on the atomically loaded value.
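The compare-and-exchange pattern described in this comment can be illustrated outside of LLVM IR with `std::atomic`. This is a minimal sketch, assuming an integer `y` for simplicity (the IR in this thread uses `float` with bitcasts); `fetchAddViaCas` is a hypothetical name. The point is that the new value is recomputed from the atomically loaded `yOld` on every iteration of the loop.

```cpp
#include <atomic>
#include <cassert>

// Atomically performs `y = y + x` and returns the previous value of y,
// i.e. what a capture of the form `v = y; y = y + x` would store in v.
int fetchAddViaCas(std::atomic<int> &y, int x) {
  // Initial atomic load (relaxed corresponds to `monotonic` in LLVM IR).
  int yOld = y.load(std::memory_order_relaxed);
  int yNew;
  do {
    // The update is computed inside the loop from the atomically loaded
    // value, so a retry always uses the freshest yOld.
    yNew = yOld + x;
    // On failure, compare_exchange_weak writes the current value of y
    // back into yOld and the loop retries.
  } while (!y.compare_exchange_weak(yOld, yNew, std::memory_order_relaxed,
                                    std::memory_order_relaxed));
  return yOld;
}
```

Computing `yNew` before the loop, or from a separate non-atomic load of `y`, would let the exchange publish a value derived from stale data, which is the problem raised for the array-element case.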

NimishMishra added inline comments. Mar 29 2023, 7:32 PM

flang/test/Lower/OpenMP/atomic-capture.f90
144–154	I understand now. Thank you. The patch cannot go ahead in its current form then. Do you have any suggestions on how to go forward with fixing it? Johannes did mention an alternative strategy some time back, but I am not sure how to start in that direction. Can you give some initial direction?

Please remove the handling of the array element case and remove the test for it. We can handle array elements in a separate patch.

I believe the rest of the code looks good. Thanks for the patience and the changes.

flang/lib/Lower/OpenMP.cpp
2092	For the array element case, please add a Not Yet Implemented TODO. We can handle this separately.

This revision was not accepted when it landed; it landed in state Needs Review. May 3 2023, 9:48 PM

Closed by commit rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture (authored by NimishMishra).

This revision was automatically updated to reflect the committed changes.

NimishMishra added a commit: rGb2eceea3929e: [flang][OpenMP] Lowering support for atomic capture.

Revision Contents

Path                                          Size
flang/lib/Lower/OpenMP.cpp                    287 lines
flang/test/Lower/OpenMP/atomic-capture.f90    117 lines

Diff 519354

flang/lib/Lower/OpenMP.cpp

Show First 20 Lines • Show All 1,921 Lines • ▼ Show 20 Lines if (dir == llvm::omp::Directive::OMPD_parallel_sections) {

auto sectionsOp = firOpBuilder.create<mlir::omp::SectionsOp>( auto sectionsOp = firOpBuilder.create<mlir::omp::SectionsOp>(

currentLocation, reductionVars, /*reductions = */ nullptr, currentLocation, reductionVars, /*reductions = */ nullptr,

allocateOperands, allocatorOperands, noWaitClauseOperand); allocateOperands, allocatorOperands, noWaitClauseOperand);

createBodyOfOp<omp::SectionsOp>(sectionsOp, converter, currentLocation, createBodyOfOp<omp::SectionsOp>(sectionsOp, converter, currentLocation,

eval); eval);

} }

static bool checkForSingleVariableOnRHS(

const Fortran::parser::AssignmentStmt &assignmentStmt) {

// Check if the assignment statement has a single variable on the RHS

const Fortran::parser::Expr &expr{

std::get<Fortran::parser::Expr>(assignmentStmt.t)};

const Fortran::common::Indirection<Fortran::parser::Designator> *designator =

std::get_if<Fortran::common::Indirection<Fortran::parser::Designator>>(

&expr.u);

const Fortran::parser::Name *name =

designator ? getDesignatorNameIfDataRef(designator->value()) : nullptr;

return name != nullptr;

}

static bool

checkForSymbolMatch(const Fortran::parser::AssignmentStmt &assignmentStmt) {

// Check if the symbol on the LHS of the assignment statement is present in

// the RHS expression

const auto &var{std::get<Fortran::parser::Variable>(assignmentStmt.t)};

const auto &expr{std::get<Fortran::parser::Expr>(assignmentStmt.t)};

const auto *e{Fortran::semantics::GetExpr(expr)};

const auto *v{Fortran::semantics::GetExpr(var)};

const Fortran::semantics::Symbol &varSymbol =

Fortran::evaluate::GetSymbolVector(*v).front();

for (const Fortran::semantics::Symbol &symbol :

Fortran::evaluate::GetSymbolVector(*e))

if (varSymbol == symbol)

return true;

return false;

}

static void genOmpAtomicHintAndMemoryOrderClauses( static void genOmpAtomicHintAndMemoryOrderClauses(

Fortran::lower::AbstractConverter &converter, Fortran::lower::AbstractConverter &converter,

const Fortran::parser::OmpAtomicClauseList &clauseList, const Fortran::parser::OmpAtomicClauseList &clauseList,

mlir::IntegerAttr &hint, mlir::IntegerAttr &hint,

mlir::omp::ClauseMemoryOrderKindAttr &memoryOrder) { mlir::omp::ClauseMemoryOrderKindAttr &memoryOrder) {

auto &firOpBuilder = converter.getFirOpBuilder(); auto &firOpBuilder = converter.getFirOpBuilder();

for (const auto &clause : clauseList.v) { for (const auto &clause : clauseList.v) {

if (auto ompClause = std::get_if<Fortran::parser::OmpClause>(&clause.u)) { if (auto ompClause = std::get_if<Fortran::parser::OmpClause>(&clause.u)) {

Show All 22 Lines if (auto ompClause = std::get_if<Fortran::parser::OmpClause>(&clause.u)) {

&ompMemoryOrderClause->v.u)) { &ompMemoryOrderClause->v.u)) {

memoryOrder = mlir::omp::ClauseMemoryOrderKindAttr::get( memoryOrder = mlir::omp::ClauseMemoryOrderKindAttr::get(

firOpBuilder.getContext(), omp::ClauseMemoryOrderKind::Release); firOpBuilder.getContext(), omp::ClauseMemoryOrderKind::Release);

} }

static void genOmpAtomicCaptureStatement(

Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, mlir::Value from_address,

mlir::Value to_address,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList,

mlir::Type elementType) {

// Generate `omp.atomic.read` operation for atomic assigment statements

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

if (leftHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memory_order);

if (rightHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memory_order);

firOpBuilder.create<mlir::omp::AtomicReadOp>(

currentLocation, from_address, to_address,

mlir::TypeAttr::get(elementType), hint, memory_order);

}

static void genOmpAtomicWriteStatement(

Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, mlir::Value lhs_addr,

mlir::Value rhs_expr,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList,

mlir::Value *evaluatedExprValue = nullptr) {

// Generate `omp.atomic.write` operation for atomic assignment statements

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

if (leftHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memory_order);

if (rightHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memory_order);

firOpBuilder.create<mlir::omp::AtomicWriteOp>(currentLocation, lhs_addr,

rhs_expr, hint, memory_order);

}

static void genOmpAtomicUpdateStatement( static void genOmpAtomicUpdateStatement(

Fortran::lower::AbstractConverter &converter, Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval, mlir::Value lhs_addr,

const Fortran::parser::Variable &assignmentStmtVariable, mlir::Type varType, const Fortran::parser::Variable &assignmentStmtVariable,

const Fortran::parser::Expr &assignmentStmtExpr, const Fortran::parser::Expr &assignmentStmtExpr,

const Fortran::parser::OmpAtomicClauseList *leftHandClauseList, const Fortran::parser::OmpAtomicClauseList *leftHandClauseList,

const Fortran::parser::OmpAtomicClauseList *rightHandClauseList) { const Fortran::parser::OmpAtomicClauseList *rightHandClauseList) {

// Generate `omp.atomic.update` operation for atomic assignment statements // Generate `omp.atomic.update` operation for atomic assignment statements

auto &firOpBuilder = converter.getFirOpBuilder(); auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation(); auto currentLocation = converter.getCurrentLocation();

Fortran::lower::StatementContext stmtCtx;

mlir::Value address = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

// If no hint clause is specified, the effect is as if // If no hint clause is specified, the effect is as if

// hint(omp_sync_hint_none) had been specified. // hint(omp_sync_hint_none) had been specified.

mlir::IntegerAttr hint = nullptr; mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memoryOrder = nullptr; mlir::omp::ClauseMemoryOrderKindAttr memoryOrder = nullptr;

if (leftHandClauseList) if (leftHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint, genOmpAtomicHintAndMemoryOrderClauses(converter, *leftHandClauseList, hint,

memoryOrder); memoryOrder);

if (rightHandClauseList) if (rightHandClauseList)

genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint, genOmpAtomicHintAndMemoryOrderClauses(converter, *rightHandClauseList, hint,

memoryOrder); memoryOrder);

auto atomicUpdateOp = firOpBuilder.create<mlir::omp::AtomicUpdateOp>( auto atomicUpdateOp = firOpBuilder.create<mlir::omp::AtomicUpdateOp>(

currentLocation, address, hint, memoryOrder); currentLocation, lhs_addr, hint, memoryOrder);

//// Generate body of Atomic Update operation //// Generate body of Atomic Update operation

// If an argument for the region is provided then create the block with that // If an argument for the region is provided then create the block with that

// argument. Also update the symbol's address with the argument mlir value. // argument. Also update the symbol's address with the argument mlir value.

mlir::Type varType =

fir::getBase(

converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx))

.getType();

SmallVector<Type> varTys = {varType}; SmallVector<Type> varTys = {varType};

SmallVector<Location> locs = {currentLocation}; SmallVector<Location> locs = {currentLocation};

firOpBuilder.createBlock(&atomicUpdateOp.getRegion(), {}, varTys, locs); firOpBuilder.createBlock(&atomicUpdateOp.getRegion(), {}, varTys, locs);

mlir::Value val = mlir::Value val =

fir::getBase(atomicUpdateOp.getRegion().front().getArgument(0)); fir::getBase(atomicUpdateOp.getRegion().front().getArgument(0));

auto varDesignator = auto varDesignator =

std::get_if<Fortran::common::Indirection<Fortran::parser::Designator>>( std::get_if<Fortran::common::Indirection<Fortran::parser::Designator>>(

&assignmentStmtVariable.u); &assignmentStmtVariable.u);

assert(varDesignator && "Variable designator for atomic update assignment " assert(varDesignator && "Variable designator for atomic update assignment "

"statement does not exist"); "statement does not exist");

const auto *name = getDesignatorNameIfDataRef(varDesignator->value()); const auto *name = getDesignatorNameIfDataRef(varDesignator->value());

if (!name)

TODO(converter.getCurrentLocation(),

"Array references as atomic update variable");

assert(name && name->symbol && assert(name && name->symbol &&

"No symbol attached to atomic update variable"); "No symbol attached to atomic update variable");

converter.bindSymbol(*name->symbol, val); converter.bindSymbol(*name->symbol, val);

kiranchandramohanUnsubmitted

Not Done

For the array element case, please add a Not Yet Implemented TODO. We can handle this separately.

// Set the insert for the terminator operation to go at the end of the // Set the insert for the terminator operation to go at the end of the

// block. // block.

mlir::Block &block = atomicUpdateOp.getRegion().back(); mlir::Block &block = atomicUpdateOp.getRegion().back();

firOpBuilder.setInsertionPointToEnd(&block); firOpBuilder.setInsertionPointToEnd(&block);

mlir::Value result = fir::getBase(converter.genExprValue( Fortran::lower::StatementContext stmtCtx;

mlir::Value rhs_expr = fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx)); *Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx));

mlir::Value convertResult = mlir::Value convertResult =

firOpBuilder.createConvert(currentLocation, varType, result); firOpBuilder.createConvert(currentLocation, varType, rhs_expr);

// Insert the terminator: YieldOp. // Insert the terminator: YieldOp.

firOpBuilder.create<mlir::omp::YieldOp>(currentLocation, convertResult); firOpBuilder.create<mlir::omp::YieldOp>(currentLocation, convertResult);

// Reset the insert point to before the terminator. // Reset the insert point to before the terminator.

firOpBuilder.setInsertionPointToStart(&block); firOpBuilder.setInsertionPointToStart(&block);

} }

static void static void

genOmpAtomicWrite(Fortran::lower::AbstractConverter &converter, genOmpAtomicWrite(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicWrite &atomicWrite) { const Fortran::parser::OmpAtomicWrite &atomicWrite) {

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// Get the value and address of atomic write operands. // Get the value and address of atomic write operands.

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList = const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicWrite.t); std::get<2>(atomicWrite.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList = const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicWrite.t); std::get<0>(atomicWrite.t);

const auto &assignmentStmtExpr = const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicWrite.t).statement.t); std::get<Fortran::parser::Expr>(std::get<3>(atomicWrite.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>( const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicWrite.t).statement.t); std::get<3>(atomicWrite.t).statement.t);

Fortran::lower::StatementContext stmtCtx; Fortran::lower::StatementContext stmtCtx;

mlir::Value value = fir::getBase(converter.genExprValue( // Get the value and address of atomic write operands.

mlir::Value rhs_expr = fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx)); *Fortran::semantics::GetExpr(assignmentStmtExpr), stmtCtx));

mlir::Value address = fir::getBase(converter.genExprAddr( mlir::Value lhs_addr = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx)); *Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

// If no hint clause is specified, the effect is as if genOmpAtomicWriteStatement(converter, eval, lhs_addr, rhs_expr,

// hint(omp_sync_hint_none) had been specified. &leftHandClauseList, &rightHandClauseList);

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memoryOrder = nullptr;

genOmpAtomicHintAndMemoryOrderClauses(converter, leftHandClauseList, hint,

memoryOrder);

genOmpAtomicHintAndMemoryOrderClauses(converter, rightHandClauseList, hint,

memoryOrder);

firOpBuilder.create<mlir::omp::AtomicWriteOp>(currentLocation, address, value,

hint, memoryOrder);

} }

static void genOmpAtomicRead(Fortran::lower::AbstractConverter &converter, static void genOmpAtomicRead(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicRead &atomicRead) { const Fortran::parser::OmpAtomicRead &atomicRead) {

auto &firOpBuilder = converter.getFirOpBuilder();

auto currentLocation = converter.getCurrentLocation();

// Get the address of atomic read operands. // Get the address of atomic read operands.

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList = const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicRead.t); std::get<2>(atomicRead.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList = const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicRead.t); std::get<0>(atomicRead.t);

const auto &assignmentStmtExpr = const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicRead.t).statement.t); std::get<Fortran::parser::Expr>(std::get<3>(atomicRead.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>( const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicRead.t).statement.t); std::get<3>(atomicRead.t).statement.t);

Fortran::lower::StatementContext stmtCtx; Fortran::lower::StatementContext stmtCtx;

const Fortran::semantics::SomeExpr &fromExpr = const Fortran::semantics::SomeExpr &fromExpr =

*Fortran::semantics::GetExpr(assignmentStmtExpr); *Fortran::semantics::GetExpr(assignmentStmtExpr);

mlir::Type elementType = converter.genType(fromExpr); mlir::Type elementType = converter.genType(fromExpr);

mlir::Value fromAddress = mlir::Value fromAddress =

fir::getBase(converter.genExprAddr(fromExpr, stmtCtx)); fir::getBase(converter.genExprAddr(fromExpr, stmtCtx));

mlir::Value toAddress = fir::getBase(converter.genExprAddr( mlir::Value toAddress = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx)); *Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

shraiyshUnsubmitted

Not Done

Shouldn't this be done in semantics?

NimishMishraAuthorUnsubmitted

Done

This is the best approach I could think of to determine which capture construct combination to lower to, so I added basic "structural" checks for v = x and x = x op expr statements. There are further semantic checks associated with these statements; I assume we rely on the semantics phase to take care of them.

These helper functions here are only doing structural checks.
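The structural checks described here can be pictured with a small standalone sketch. It is only an illustration under stated assumptions: `ToyAssignment`, `CaptureForm`, `hasSingleVariableOnRHS`, `lhsAppearsOnRHS` and `classifyCapture` are hypothetical names operating on pre-tokenized strings, not the flang parse tree or the helpers in the patch, and a single right-hand-side token is assumed to be a variable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a parsed assignment `lhs = rhs` (hypothetical, not a flang type).
struct ToyAssignment {
  std::string lhs;
  std::vector<std::string> rhsTokens; // e.g. {"x", "+", "y"}
};

enum class CaptureForm { CaptureThenUpdate, CaptureThenWrite, UpdateThenCapture };

// Structural check for `v = x`: the right-hand side is a single token.
// Assumption: a lone token is a variable name, never a literal.
bool hasSingleVariableOnRHS(const ToyAssignment &stmt) {
  return stmt.rhsTokens.size() == 1;
}

// Structural check for `x = x op expr`: the assigned name reappears on the right.
bool lhsAppearsOnRHS(const ToyAssignment &stmt) {
  for (const std::string &token : stmt.rhsTokens)
    if (token == stmt.lhs)
      return true;
  return false;
}

// Pick one of the three configurations listed in the patch summary.
CaptureForm classifyCapture(const ToyAssignment &stmt1,
                            const ToyAssignment &stmt2) {
  if (hasSingleVariableOnRHS(stmt1))
    return lhsAppearsOnRHS(stmt2) ? CaptureForm::CaptureThenUpdate
                                  : CaptureForm::CaptureThenWrite;
  return CaptureForm::UpdateThenCapture;
}
```

With the statements from the test file, `x = y` followed by `y = x + y` would classify as capture-then-update, while `y = x * y` followed by `x = y` would classify as update-then-capture. Anything beyond these shape checks is left to semantics, as noted above.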

// If no hint clause is specified, the effect is as if genOmpAtomicCaptureStatement(converter, eval, fromAddress, toAddress,

// hint(omp_sync_hint_none) had been specified. &leftHandClauseList, &rightHandClauseList,

mlir::IntegerAttr hint = nullptr; elementType);

mlir::omp::ClauseMemoryOrderKindAttr memoryOrder = nullptr;

genOmpAtomicHintAndMemoryOrderClauses(converter, leftHandClauseList, hint,

memoryOrder);

genOmpAtomicHintAndMemoryOrderClauses(converter, rightHandClauseList, hint,

memoryOrder);

firOpBuilder.create<mlir::omp::AtomicReadOp>(

currentLocation, fromAddress, toAddress, mlir::TypeAttr::get(elementType),

hint, memoryOrder);

} }

static void static void

genOmpAtomicUpdate(Fortran::lower::AbstractConverter &converter, genOmpAtomicUpdate(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicUpdate &atomicUpdate) { const Fortran::parser::OmpAtomicUpdate &atomicUpdate) {

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList = const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicUpdate.t); std::get<2>(atomicUpdate.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList = const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicUpdate.t); std::get<0>(atomicUpdate.t);

const auto &assignmentStmtExpr = const auto &assignmentStmtExpr =

std::get<Fortran::parser::Expr>(std::get<3>(atomicUpdate.t).statement.t); std::get<Fortran::parser::Expr>(std::get<3>(atomicUpdate.t).statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>( const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<3>(atomicUpdate.t).statement.t); std::get<3>(atomicUpdate.t).statement.t);

genOmpAtomicUpdateStatement(converter, eval, assignmentStmtVariable, Fortran::lower::StatementContext stmtCtx;

assignmentStmtExpr, &leftHandClauseList, mlir::Value lhs_addr = fir::getBase(converter.genExprAddr(

&rightHandClauseList); *Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

mlir::Type varType =

fir::getBase(

converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx))

.getType();

genOmpAtomicUpdateStatement(converter, eval, lhs_addr, varType,

assignmentStmtVariable, assignmentStmtExpr,

&leftHandClauseList, &rightHandClauseList);

} }

static void genOmpAtomic(Fortran::lower::AbstractConverter &converter, static void genOmpAtomic(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomic &atomicConstruct) { const Fortran::parser::OmpAtomic &atomicConstruct) {

const Fortran::parser::OmpAtomicClauseList &atomicClauseList = const Fortran::parser::OmpAtomicClauseList &atomicClauseList =

std::get<Fortran::parser::OmpAtomicClauseList>(atomicConstruct.t); std::get<Fortran::parser::OmpAtomicClauseList>(atomicConstruct.t);

const auto &assignmentStmtExpr = std::get<Fortran::parser::Expr>( const auto &assignmentStmtExpr = std::get<Fortran::parser::Expr>(

std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>( std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>(

atomicConstruct.t) atomicConstruct.t)

.statement.t); .statement.t);

const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>( const auto &assignmentStmtVariable = std::get<Fortran::parser::Variable>(

std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>( std::get<Fortran::parser::Statement<Fortran::parser::AssignmentStmt>>(

atomicConstruct.t) atomicConstruct.t)

shraiyshUnsubmitted

Not Done

This should not always be true. If the function call on rhs does not use the variable on the lhs, then this is an atomic write statement.

Modeling it as an update sort of works, in the sense that there is no visible change in behavior of the program, but the generated IR will not be entirely accurate. If it is very hard to deduce write here, maybe mention it as a todo?

NimishMishraAuthorUnsubmitted

Done

You are right. I missed it. I will revisit these functions and see what can be done.

.statement.t); .statement.t);

Fortran::lower::StatementContext stmtCtx;

mlir::Value lhs_addr = fir::getBase(converter.genExprAddr(

shraiyshUnsubmitted

Not Done

Fortran::parser::FunctionReference> &) { return true; },

- [&](const auto &x) {

- return isOmpAtomicUpdateStmtOperatorValid(x, var);

- },

+ [&](const Fortran::parser::Expr::IntrinsicBinary &node) {

+ const auto &variableName{var.GetSource().ToString()};

+ const auto &exprLeft{std::get<0>(node.t)};

+ const auto &exprRight{std::get<1>(node.t)};

+ if ((exprLeft.value().source.ToString() == variableName) &&

+ (exprRight.value().source.ToString() == variableName)) {

+ return true;

+ }

+ return false;

+ },

+ [&](const auto &x) {

+ return false;

+ }, },

expr.u);

I meant: can something like this work, relying on semantics to ensure a valid binary operator?

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx));

mlir::Type varType =

fir::getBase(

converter.genExprValue(

*Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx))

.getType();

// If atomic-clause is not present on the construct, the behaviour is as if // If atomic-clause is not present on the construct, the behaviour is as if

// the update clause is specified // the update clause is specified

genOmpAtomicUpdateStatement(converter, eval, assignmentStmtVariable, genOmpAtomicUpdateStatement(converter, eval, lhs_addr, varType,

assignmentStmtExpr, &atomicClauseList, nullptr); assignmentStmtVariable, assignmentStmtExpr,

&atomicClauseList, nullptr);

}

static void

genOmpAtomicCapture(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OmpAtomicCapture &atomicCapture) {

fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();

mlir::Location currentLocation = converter.getCurrentLocation();

mlir::IntegerAttr hint = nullptr;

mlir::omp::ClauseMemoryOrderKindAttr memory_order = nullptr;

const Fortran::parser::OmpAtomicClauseList &rightHandClauseList =

std::get<2>(atomicCapture.t);

const Fortran::parser::OmpAtomicClauseList &leftHandClauseList =

std::get<0>(atomicCapture.t);

genOmpAtomicHintAndMemoryOrderClauses(converter, leftHandClauseList, hint,

memory_order);

genOmpAtomicHintAndMemoryOrderClauses(converter, rightHandClauseList, hint,

memory_order);

const Fortran::parser::AssignmentStmt &stmt1 =

std::get<3>(atomicCapture.t).v.statement;

const auto &stmt1Var{std::get<Fortran::parser::Variable>(stmt1.t)};

const auto &stmt1Expr{std::get<Fortran::parser::Expr>(stmt1.t)};

const Fortran::parser::AssignmentStmt &stmt2 =

std::get<4>(atomicCapture.t).v.statement;

const auto &stmt2Var{std::get<Fortran::parser::Variable>(stmt2.t)};

const auto &stmt2Expr{std::get<Fortran::parser::Expr>(stmt2.t)};

// Pre-evaluate expressions to be used in the various operations inside

// `omp.atomic.capture` since it is not desirable to have anything other than

// a `omp.atomic.read`, `omp.atomic.write`, or `omp.atomic.update` operation

// inside `omp.atomic.capture`

Fortran::lower::StatementContext stmtCtx;

mlir::Value stmt1LHSArg, stmt1RHSArg, stmt2LHSArg, stmt2RHSArg;

mlir::Type elementType;

// LHS evaluations are common to all combinations of `omp.atomic.capture`

stmt1LHSArg = fir::getBase(

converter.genExprAddr(*Fortran::semantics::GetExpr(stmt1Var), stmtCtx));

stmt2LHSArg = fir::getBase(

converter.genExprAddr(*Fortran::semantics::GetExpr(stmt2Var), stmtCtx));

// Operation specific RHS evaluations

if (checkForSingleVariableOnRHS(stmt1)) {

// Atomic capture construct is of the form [capture-stmt, update-stmt] or

// of the form [capture-stmt, write-stmt]

stmt1RHSArg = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(stmt1Expr), stmtCtx));

stmt2RHSArg = fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(stmt2Expr), stmtCtx));

} else {

// Atomic capture construct is of the form [update-stmt, capture-stmt]

stmt1RHSArg = fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(stmt1Expr), stmtCtx));

stmt2RHSArg = fir::getBase(converter.genExprAddr(

*Fortran::semantics::GetExpr(stmt2Expr), stmtCtx));

}

// Type information used in generation of `omp.atomic.update` operation

mlir::Type stmt1VarType =

fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(stmt1Var), stmtCtx))

.getType();

mlir::Type stmt2VarType =

fir::getBase(converter.genExprValue(

*Fortran::semantics::GetExpr(stmt2Var), stmtCtx))

.getType();

auto atomicCaptureOp = firOpBuilder.create<mlir::omp::AtomicCaptureOp>(

currentLocation, hint, memory_order);

firOpBuilder.createBlock(&atomicCaptureOp.getRegion());

mlir::Block &block = atomicCaptureOp.getRegion().back();

firOpBuilder.setInsertionPointToStart(&block);

if (checkForSingleVariableOnRHS(stmt1)) {

if (checkForSymbolMatch(stmt2)) {

// Atomic capture construct is of the form [capture-stmt, update-stmt]

const Fortran::semantics::SomeExpr &fromExpr =

*Fortran::semantics::GetExpr(stmt1Expr);

elementType = converter.genType(fromExpr);

genOmpAtomicCaptureStatement(converter, eval, stmt1RHSArg, stmt1LHSArg,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr,

elementType);

genOmpAtomicUpdateStatement(converter, eval, stmt1RHSArg, stmt2VarType,

stmt2Var, stmt2Expr,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr);

} else {

// Atomic capture construct is of the form [capture-stmt, write-stmt]

const Fortran::semantics::SomeExpr &fromExpr =

*Fortran::semantics::GetExpr(stmt1Expr);

elementType = converter.genType(fromExpr);

genOmpAtomicCaptureStatement(converter, eval, stmt1RHSArg, stmt1LHSArg,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr,

elementType);

genOmpAtomicWriteStatement(converter, eval, stmt1RHSArg, stmt2RHSArg,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr);

}

} else {

// Atomic capture construct is of the form [update-stmt, capture-stmt]

firOpBuilder.setInsertionPointToEnd(&block);

const Fortran::semantics::SomeExpr &fromExpr =

*Fortran::semantics::GetExpr(stmt2Expr);

elementType = converter.genType(fromExpr);

genOmpAtomicCaptureStatement(converter, eval, stmt1LHSArg, stmt2LHSArg,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr, elementType);

firOpBuilder.setInsertionPointToStart(&block);

genOmpAtomicUpdateStatement(converter, eval, stmt1LHSArg, stmt1VarType,

stmt1Var, stmt1Expr,

/*leftHandClauseList=*/nullptr,

/*rightHandClauseList=*/nullptr);

}

firOpBuilder.setInsertionPointToEnd(&block);

firOpBuilder.create<mlir::omp::TerminatorOp>(currentLocation);

firOpBuilder.setInsertionPointToStart(&block);

} }

static void static void

genOMP(Fortran::lower::AbstractConverter &converter, genOMP(Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

const Fortran::parser::OpenMPAtomicConstruct &atomicConstruct) { const Fortran::parser::OpenMPAtomicConstruct &atomicConstruct) {

std::visit(Fortran::common::visitors{ std::visit(Fortran::common::visitors{

[&](const Fortran::parser::OmpAtomicRead &atomicRead) { [&](const Fortran::parser::OmpAtomicRead &atomicRead) {

genOmpAtomicRead(converter, eval, atomicRead); genOmpAtomicRead(converter, eval, atomicRead);

}, },

[&](const Fortran::parser::OmpAtomicWrite &atomicWrite) { [&](const Fortran::parser::OmpAtomicWrite &atomicWrite) {

genOmpAtomicWrite(converter, eval, atomicWrite); genOmpAtomicWrite(converter, eval, atomicWrite);

}, },

[&](const Fortran::parser::OmpAtomic &atomicConstruct) { [&](const Fortran::parser::OmpAtomic &atomicConstruct) {

genOmpAtomic(converter, eval, atomicConstruct); genOmpAtomic(converter, eval, atomicConstruct);

}, },

[&](const Fortran::parser::OmpAtomicUpdate &atomicUpdate) { [&](const Fortran::parser::OmpAtomicUpdate &atomicUpdate) {

genOmpAtomicUpdate(converter, eval, atomicUpdate); genOmpAtomicUpdate(converter, eval, atomicUpdate);

}, },

[&](const auto &) { [&](const Fortran::parser::OmpAtomicCapture &atomicCapture) {

TODO(converter.getCurrentLocation(), "Atomic capture"); genOmpAtomicCapture(converter, eval, atomicCapture);

}, },

atomicConstruct.u); atomicConstruct.u);

} }

void Fortran::lower::genOpenMPConstruct( void Fortran::lower::genOpenMPConstruct(

Fortran::lower::AbstractConverter &converter, Fortran::lower::AbstractConverter &converter,

Fortran::lower::pft::Evaluation &eval, Fortran::lower::pft::Evaluation &eval,

flang/test/Lower/OpenMP/atomic-capture.f90

This file was added.

! RUN: %flang_fc1 -emit-fir -fopenmp %s -o - | FileCheck %s

! This test checks the lowering of atomic capture

program OmpAtomicCapture

use omp_lib

integer :: x, y

!CHECK: %[[X:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}

!CHECK: %[[Y:.*]] = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}

!CHECK: omp.atomic.capture memory_order(release) {

!CHECK: omp.atomic.read %[[X]] = %[[Y]] : !fir.ref<i32>

!CHECK: omp.atomic.update %[[Y]] : !fir.ref<i32> {

!CHECK: ^bb0(%[[ARG:.*]]: i32):

!CHECK: %[[temp:.*]] = fir.load %[[X]] : !fir.ref<i32>

!CHECK: %[[result:.*]] = arith.addi %[[temp]], %[[ARG]] : i32

!CHECK: omp.yield(%[[result]] : i32)

!CHECK: }

!$omp atomic capture release

x = y

y = x + y

!$omp end atomic

!CHECK: omp.atomic.capture hint(uncontended) {

!CHECK: omp.atomic.update %[[Y]] : !fir.ref<i32> {

!CHECK: ^bb0(%[[ARG:.*]]: i32):

!CHECK: %[[temp:.*]] = fir.load %[[X]] : !fir.ref<i32>

!CHECK: %[[result:.*]] = arith.muli %[[temp]], %[[ARG]] : i32

!CHECK: omp.yield(%[[result]] : i32)

!CHECK: }

!CHECK: omp.atomic.read %[[X]] = %[[Y]] : !fir.ref<i32>

!CHECK: }

shraiyshUnsubmitted

Not Done

Shouldn't all of this (34-41) be wrapped in an omp.atomic.read and omp.atomic.update operation? Why can't we generate that here? Relaxing the restrictions on omp.atomic.capture is not a solution when the same thing can be expressed in the current syntax.

NimishMishraAuthorUnsubmitted

Done

I do not understand. What do you mean by "wrapped" in a read and write operation?

Generally, the read operation is not problematic. The write operation, however, involves the FIR for expression evaluation. I tried to control the insertion points so that this expression evaluation ends up "outside" the omp.atomic.capture, but I couldn't get it to work.

However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write). Please correct me if I am missing something here.

!$omp atomic hint(omp_sync_hint_uncontended) capture

y = x * y

x = y

!$omp end atomic

!CHECK: %[[constant_20:.*]] = arith.constant 20 : i32

shraiyshUnsubmitted

Not Done

!CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

+ !CHECK: omp.atomic.update %[[VAR_Y]] {

+ !CHECK: ^bb0(%{{.+}}: i32):

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

- !CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

+ !CHECK: omp.yield (%[[INTERMEDIATE_5]]: i32)

+ !CHECK: }

!CHECK: }

> I do not understand. What do you mean by "wrapped" in a read and write operation?

This is what I meant.

> However, I am not too sure if it is wrong to put the FIR related to expression evaluation within the capture block (alongside the omp.atomic.write).

Ideally, the evaluation of the expression should not be inside the atomic region at all. However, if that's somehow not possible, it should be pushed inside an omp.atomic.update operation, because omp.atomic.update supports multiple operations in its region. If that cannot be done either, we should justify adding this relaxation to omp.atomic.capture independently of Flang by answering "what kind of capture executions cannot be expressed in MLIR with the current syntax?". As long as they can be expressed, it should be the job of flang to lower appropriately.

NimishMishraAuthorUnsubmitted

Done

Okay. Then I will work on moving the evaluation of the expression outside the atomic region. I think I have a way to do that.

!CHECK: %[[constant_8:.*]] = arith.constant 8 : i32

shraiyshUnsubmitted

Not Done

!CHECK: }

- !CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

- !CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.constant {{.*}} : i32

!CHECK: {{.*}} = fir.load %[[VAR_X]] : !fir.ref<i32>

!CHECK: {{.*}} = arith.subi {{.*}}, {{.*}} : i32

!CHECK: {{.*}} = fir.no_reassoc {{.*}} : i32

!CHECK: %[[INTERMEDIATE_5:.*]] = arith.addi {{.*}}, {{.*}} : i32

+ !CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

+ !CHECK: omp.atomic.read %[[VAR_X]] = %[[VAR_Y]] : !fir.ref<i32>

!CHECK: omp.atomic.write %[[VAR_Y]] = %[[INTERMEDIATE_5]] : !fir.ref<i32>, i32

!CHECK: }

!CHECK: omp.atomic.capture {

If you can generate this, then that's the most accurate imo.

This does not violate the semantics of the atomic construct, because the OpenMP specification clearly says:

Only the read and write of the location designated by x are performed mutually atomically. Neither the evaluation of expr or expr_list, nor the write to the location designated by v, need be atomic with respect to the read or write of the location designated by x.
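As a rough analogy for the rule quoted above, the three capture forms can be mimicked with `std::atomic` in C++. This is only a sketch for intuition, not what flang or the OpenMP dialect generates; the function names are made up for illustration. The point is that the expression value is computed before the atomic operation, and only the access to the location `x` is atomic.

```cpp
#include <atomic>
#include <cassert>

// Analogue of [capture-stmt, write-stmt]:  v = x ; x = expr
// `exprValue` is evaluated by the caller, outside the atomic operation.
int captureThenWrite(std::atomic<int> &x, int exprValue) {
  return x.exchange(exprValue); // returns the old value of x
}

// Analogue of [capture-stmt, update-stmt]:  v = x ; x = x + operand
int captureThenAdd(std::atomic<int> &x, int operand) {
  return x.fetch_add(operand); // returns the value before the update
}

// Analogue of [update-stmt, capture-stmt]:  x = x + operand ; v = x
int addThenCapture(std::atomic<int> &x, int operand) {
  return x.fetch_add(operand) + operand; // value written by this update
}
```

In each helper the write to the captured result happens after the atomic access to `x`, which matches the statement that the write to `v` need not be atomic with respect to `x`.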

!CHECK: %[[temp:.*]] = fir.load %[[X]] : !fir.ref<i32>

!CHECK: %[[result:.*]] = arith.subi %[[constant_8]], %[[temp]] : i32

!CHECK: %[[result_noreassoc:.*]] = fir.no_reassoc %[[result]] : i32

!CHECK: %[[result:.*]] = arith.addi %[[constant_20]], %[[result_noreassoc]] : i32

!CHECK: omp.atomic.capture memory_order(acquire) hint(nonspeculative) {

!CHECK: omp.atomic.read %[[X]] = %[[Y]] : !fir.ref<i32>

!CHECK: omp.atomic.write %[[Y]] = %[[result]] : !fir.ref<i32>, i32

!CHECK: }

!$omp atomic hint(omp_lock_hint_nonspeculative) capture acquire

x = y

y = 2 * 10 + (8 - x)

!$omp end atomic

!CHECK: %[[constant_20:.*]] = arith.constant 20 : i32

!CHECK: %[[constant_8:.*]] = arith.constant 8 : i32

!CHECK: %[[temp:.*]] = fir.load %[[X]] : !fir.ref<i32>

!CHECK: %[[result:.*]] = arith.subi %[[constant_8]], %[[temp]] : i32

!CHECK: %[[result_noreassoc:.*]] = fir.no_reassoc %[[result]] : i32

!CHECK: %[[result:.*]] = arith.addi %[[constant_20]], %[[result_noreassoc]] : i32

!CHECK: omp.atomic.capture {

!CHECK: omp.atomic.read %[[X]] = %[[Y]] : !fir.ref<i32>

!CHECK: omp.atomic.write %[[Y]] = %[[result]] : !fir.ref<i32>, i32

!CHECK: }

!$omp atomic capture

x = y

y = 2 * 10 + (8 - x)

!$omp end atomic

end program

subroutine pointers_in_atomic_capture()

!CHECK: %[[A:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "a", uniq_name = "_QFpointers_in_atomic_captureEa"}

!CHECK: {{.*}} = fir.zero_bits !fir.ptr<i32>

!CHECK: {{.*}} = fir.embox {{.*}} : (!fir.ptr<i32>) -> !fir.box<!fir.ptr<i32>>

!CHECK: fir.store {{.*}} to %[[A]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

NimishMishraAuthorUnsubmitted

Done

I would like to raise a point for discussion here. The verifier for atomic capture checks that opsInRegion.size() == 3, i.e. that the region contains exactly three operations.

This issue came up before during the lowering of omp.atomic.write inside atomic capture, since a write statement involves expression evaluation, which adds further operations to the region. @shraiysh suggested keeping this expression evaluation outside the omp.atomic.capture, which is what I have done currently.

However, with pointers, the issue has resurfaced. This particular b = a lowers as:

%0 = allocate c
%1 = allocate d
%2 = allocate a
%3 = allocate b

omp.atomic.capture {
  omp.atomic.update {......}

  %4 = load a
  %5 = load b
  omp.atomic.read %5 = %4

  omp.terminator
}

The question I wish to discuss is whether the verification method here should be changed, or whether we should evaluate all LHS and RHS of an atomic assignment statement beforehand. To change the verification, I was thinking of the following:

Ensure that the list of operations contains exactly two omp.atomic operations and exactly one omp.terminator operation

The last operation in the region is omp.terminator

The second-to-last operation in the region is necessarily an omp.atomic operation
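To make the proposed relaxation concrete, here is a minimal standalone sketch of the three conditions listed above, next to the strict size check the thread mentions. It is an assumption-laden toy: operations are represented by their names as strings, and `isAtomicOp`, `verifyStrict` and `verifyRelaxed` are hypothetical helpers, not the actual MLIR verifier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy predicate: the three atomic ops allowed inside a capture region.
bool isAtomicOp(const std::string &name) {
  return name == "omp.atomic.read" || name == "omp.atomic.write" ||
         name == "omp.atomic.update";
}

// The only rule stated in the thread for the existing verifier: exactly three ops.
bool verifyStrict(const std::vector<std::string> &ops) {
  return ops.size() == 3;
}

// The relaxed rule proposed in the comment above.
bool verifyRelaxed(const std::vector<std::string> &ops) {
  std::size_t atomics = 0, terminators = 0;
  for (const std::string &op : ops) {
    if (isAtomicOp(op))
      ++atomics;
    if (op == "omp.terminator")
      ++terminators;
  }
  // Exactly two atomic operations and exactly one terminator.
  if (atomics != 2 || terminators != 1)
    return false;
  // The terminator is last, and the op just before it is an atomic op.
  return ops.back() == "omp.terminator" && isAtomicOp(ops[ops.size() - 2]);
}
```

The pointer example above (an update, two loads, a read, then the terminator) would fail the strict check but pass the relaxed one. Whether such a relaxation is desirable is exactly what the rest of the thread discusses.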

shraiyshUnsubmitted

Not Done

The generated code should look like the following -

// %a_ptr : !fir.ref<!fir.ref<i32>>, %b_ptr : !fir.ref<!fir.ref<i32>>
%a_addr = load %a_ptr : !fir.ref<!fir.ref<i32>>
%b_addr = load %b_ptr : !fir.ref<!fir.ref<i32>>
// %a_addr : !fir.ref<i32>, %b_addr : !fir.ref<i32>
omp.atomic.capture{
  omp.atomic.update %a_addr : !fir.ref<i32> {
  ^bb0(%a_val: i32):
    %b_val = load %b_addr : !fir.ref<i32>
    %temp = arith.addi %a_val, %b_val : i32
    omp.yield %temp
  }
  omp.atomic.read %b_addr = %a_addr
  omp.terminator
}

I get the sense that generating this is convoluted on flang's end. It is, however, not a good idea to relax the constraints on omp.atomic.capture because of this. I will reiterate that if we come up with an OpenMP atomic evaluation that cannot be expressed by hand with the omp.atomic.capture operation, then we should definitely change it. The fact that flang isn't able to generate it is not a good enough reason to alter the operation. In addition, if we change the number of operations allowed inside atomic capture, we have to worry about lowering it to LLVM IR, which gets harder as the operation is relaxed. If you strongly need to relax the constraints on omp.atomic.capture, we should first make sure that the relaxed version translates properly to LLVM IR for execution (probably as a separate patch).

I wanted to put an idea out: to ease the difficulty of generating omp.atomic.capture, we could define an fir.omp.atomic.capture operation that accepts multiple operations under it. Then, during canonicalization (or some other pass) in FIR, we can push the unnecessary operations (load a, load b in your example) outside the fir.omp.atomic.capture operation to generate the omp.atomic.capture operation. Does that sound like it would make the implementation more straightforward?
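The hoisting idea can be sketched on the same kind of toy model. This is a hypothetical illustration only: no `fir.omp.atomic.capture` operation or such pass exists in the patch, operations are plain strings, and the sketch assumes every hoisted operation is free of side effects and does not depend on a value defined inside the region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of splitting a toy capture region (hypothetical helper type).
struct SplitRegion {
  std::vector<std::string> hoisted; // ops moved in front of the capture
  std::vector<std::string> region;  // ops left inside the capture
};

// In this toy model only atomic ops and the terminator stay in the region.
bool staysInCapture(const std::string &op) {
  return op.rfind("omp.atomic.", 0) == 0 || op == "omp.terminator";
}

// Move everything else out, preserving relative order on both sides.
SplitRegion hoistOutOfCapture(const std::vector<std::string> &ops) {
  SplitRegion result;
  for (const std::string &op : ops)
    (staysInCapture(op) ? result.region : result.hoisted).push_back(op);
  return result;
}
```

Applied to the pointer example, the two loads would end up before the capture and the region would be left with the update, the read and the terminator, which is the shape the strict verifier expects. A real pass would also have to prove that moving each operation is legal.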

NimishMishraAuthorUnsubmitted

Done

Ok. I will try to keep loading of the two addresses outside the capture region entirely.


NimishMishra (Author)

Done

@shraiysh Does this IR look ok?


shraiysh

Not Done

Yes, the current testcases look perfect to me and I cannot spot any errors in them. Thanks for your patience and for getting it to work 👏.

I have not reviewed the code itself, but functionality-wise, it looks okay to me. Please feel free to go ahead without my approval for the code, as it might be some time before I get to review it. If review comments lead you to change the testcases before landing this, let me know and I will review the updated testcases as soon as I can.


!CHECK: %[[B:.*]] = fir.alloca !fir.box<!fir.ptr<i32>> {bindc_name = "b", uniq_name = "_QFpointers_in_atomic_captureEb"}

!CHECK: {{.*}} = fir.zero_bits !fir.ptr<i32>

!CHECK: {{.*}} = fir.embox {{.*}} : (!fir.ptr<i32>) -> !fir.box<!fir.ptr<i32>>

!CHECK: fir.store {{.*}} to %[[B]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: %[[C:.*]] = fir.alloca i32 {bindc_name = "c", fir.target, uniq_name = "_QFpointers_in_atomic_captureEc"}

!CHECK: %[[D:.*]] = fir.alloca i32 {bindc_name = "d", fir.target, uniq_name = "_QFpointers_in_atomic_captureEd"}

!CHECK: {{.*}} = fir.embox {{.*}} : (!fir.ref<i32>) -> !fir.box<!fir.ptr<i32>>

!CHECK: fir.store {{.*}} to %[[A]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: {{.*}} = fir.embox {{.*}} : (!fir.ref<i32>) -> !fir.box<!fir.ptr<i32>>

!CHECK: fir.store {{.*}} to %[[B]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: %[[loaded_A:.*]] = fir.load %[[A]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: %[[loaded_A_addr:.*]] = fir.box_addr %[[loaded_A]] : (!fir.box<!fir.ptr<i32>>) -> !fir.ptr<i32>

!CHECK: %[[loaded_B:.*]] = fir.load %[[B]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: %[[loaded_B_addr:.*]] = fir.box_addr %[[loaded_B]] : (!fir.box<!fir.ptr<i32>>) -> !fir.ptr<i32>

!CHECK: omp.atomic.capture {

!CHECK: omp.atomic.update %[[loaded_A_addr]] : !fir.ptr<i32> {

!CHECK: ^bb0(%[[ARG:.*]]: i32):

!CHECK: %[[PRIVATE_LOADED_B:.*]] = fir.load %[[B]] : !fir.ref<!fir.box<!fir.ptr<i32>>>

!CHECK: %[[PRIVATE_LOADED_B_addr:.*]] = fir.box_addr %[[PRIVATE_LOADED_B]] : (!fir.box<!fir.ptr<i32>>) -> !fir.ptr<i32>

!CHECK: %[[loaded_value:.*]] = fir.load %[[PRIVATE_LOADED_B_addr]] : !fir.ptr<i32>

!CHECK: %[[result:.*]] = arith.addi %[[ARG]], %[[loaded_value]] : i32

!CHECK: omp.yield(%[[result]] : i32)

!CHECK: }

!CHECK: omp.atomic.read %[[loaded_B_addr]] = %[[loaded_A_addr]] : !fir.ptr<i32>, i32

!CHECK: }

integer, pointer :: a, b

integer, target :: c, d

a=>c

b=>d

!$omp atomic capture

a = a + b

b = a

!$omp end atomic

end subroutine

kiranchandramohan

Not Done

I believe the semantics of atomic.update is that it loads the value at the address provided to it and makes that value available as the basic block argument ARG. The body of the update uses this loaded value ARG to perform the update and yields the updated value, which is then stored back at the address.

I think this update operation will not work as expected, since it does not use the automatically loaded value in ARG. Instead, it loads the value at the address passed to the atomic.update op and adds the constant to it. This will end up computing numbers(1) = numbers(1) + numbers(1) + 10, which is not what we want.

Let me know if I missed a point.
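The update semantics described here can be modelled in a few lines of C++. This is a rough sketch only, assuming the region behaves like a callback applied to the atomically loaded value; `atomicUpdate` and `body` are hypothetical names, and the relaxed ordering is an assumption standing in for the absence of a memory-order clause.

```cpp
#include <atomic>

// Hypothetical stand-in for an omp.atomic.update region: `body` plays the
// role of the region and its parameter plays the role of the block
// argument ARG. Returns the value that was replaced.
template <typename T, typename Body>
T atomicUpdate(std::atomic<T> &location, Body body) {
  T old = location.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `old` with the current
  // contents, so body(old) is recomputed from a fresh value on each retry.
  while (!location.compare_exchange_weak(old, body(old),
                                         std::memory_order_relaxed)) {
  }
  return old;
}
```

For example, `atomicUpdate(a, [](int arg) { return arg + 10; })` derives the new value from the argument alone. A body that ignores its argument and reads memory directly would not be tied to the value the compare-exchange checks against.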


kiranchandramohan

Not Done

I think I got that wrong. It will not double add since the new code is not touching the ARG.

I tried fetching the patch but it shows some issues. Could you rebase? I would like to have a look at the IR that is generated.

In particular, we have to check:

whether the following load is an atomic load in the LLVM IR.

!CHECK: %[[array_element_inner:.*]] = fir.load %[[array_element_ref]]

whether the loop containing cmpxchg is well-formed.


NimishMishra (Author)

Done

Hi Kiran.

I was trying to understand the requirement here. I do not completely understand why cmpxchg could be a problem here. Wouldn't the compare and the exchange happen inside the atomic region?

I am just trying to understand what the generated IR should look like, to make sure I am doing it correctly.


NimishMishra (Author)

Done

Hi Kiran.

The LLVM IR is as follows. Does it look ok? The load is within the atomic update block, so it should be fine, right?

llvm.func @_QParray_refs() {
  %0 = llvm.mlir.constant(1.000000e+01 : f32) : f32
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x !llvm.array<5 x f32> {bindc_name = "numbers", in_type = !fir.array<5xf32>, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEnumbers"} : (i64) -> !llvm.ptr<array<5 x f32>>
  %3 = llvm.alloca %1 x f32 {bindc_name = "x", in_type = f32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFarray_refsEx"} : (i64) -> !llvm.ptr<f32>
  %4 = llvm.getelementptr %2[0, 0] : (!llvm.ptr<array<5 x f32>>) -> !llvm.ptr<f32>
  omp.atomic.capture {
    omp.atomic.update %4 : !llvm.ptr<f32> {
    ^bb0(%arg0: f32):
      %5 = llvm.load %4 : !llvm.ptr<f32>
      %6 = llvm.fadd %5, %0 {fastmathFlags = #llvm.fastmath<contract>} : f32
      omp.yield(%6 : f32)
    }
    omp.atomic.read %3 = %4 : !llvm.ptr<f32>, f32
  }
  llvm.return
}


NimishMishra (Author)

Done

Apologies, that was the MLIR dialect output. Please find the LLVM IR below. It also has a cmpxchg instruction.

The test case is simple:

!$omp atomic capture
      x = y
      y = x + y
!$omp end atomic

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca float, i64 1, align 4
  store float 2.000000e+01, ptr %1, align 4
  store float 1.000000e+01, ptr %2, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %2 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %3 = phi i32 [ %.atomic.load, %entry ], [ %8, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %3 to float
  %4 = load float, ptr %1, align 4
  %5 = fadd contract float %4, %.atomic.fltCast
  store float %5, ptr %x.new.val, align 4
  %6 = load i32, ptr %x.new.val, align 4
  %7 = cmpxchg ptr %2, i32 %3, i32 %6 monotonic monotonic, align 4
  %8 = extractvalue { i32, i1 } %7, 0
  %9 = extractvalue { i32, i1 } %7, 1
  br i1 %9, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %.atomic.fltCast, ptr %1, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}


NimishMishra (Author)

Done

And for the array reference test case

!$omp atomic capture
      x(1) = y(1)
      y(1) = x(1) + y(1)
!$omp end atomic

The following IR is generated:

; ModuleID = 'FIRModule'
source_filename = "FIRModule"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QPtest() {
  %x.new.val = alloca float, align 4
  %1 = alloca float, i64 1, align 4
  %2 = alloca [5 x float], i64 1, align 4
  %3 = alloca [5 x float], i64 1, align 4
  %4 = getelementptr [5 x float], ptr %2, i32 0, i32 0
  store float 2.000000e+01, ptr %4, align 4
  %5 = getelementptr [5 x float], ptr %3, i32 0, i32 0
  store float 1.000000e+01, ptr %5, align 4
  %6 = load float, ptr %4, align 4
  %7 = load float, ptr %5, align 4
  %8 = fadd contract float %6, %7
  store float %8, ptr %1, align 4
  br label %entry

entry:                                            ; preds = %0
  %.atomic.load = load atomic i32, ptr %4 monotonic, align 4
  br label %.atomic.cont

.atomic.cont:                                     ; preds = %.atomic.cont, %entry
  %9 = phi i32 [ %.atomic.load, %entry ], [ %13, %.atomic.cont ]
  %.atomic.fltCast = bitcast i32 %9 to float
  %10 = load float, ptr %5, align 4
  store float %10, ptr %x.new.val, align 4
  %11 = load i32, ptr %x.new.val, align 4
  %12 = cmpxchg ptr %4, i32 %9, i32 %11 monotonic monotonic, align 4
  %13 = extractvalue { i32, i1 } %12, 0
  %14 = extractvalue { i32, i1 } %12, 1
  br i1 %14, label %.atomic.exit, label %.atomic.cont

.atomic.exit:                                     ; preds = %.atomic.cont
  store float %10, ptr %5, align 4
  ret void
}

!llvm.module.flags = !{!0}

!0 = !{i32 2, !"Debug Info Version", i32 3}


kiranchandramohan

Not Done

The concern here is that the atomically loaded value is not used in the update operation.

AFAIU, the cmpxchg instruction only updates the location if the value currently at that location equals the expected value, i.e. the value that was loaded earlier. So, if we are doing y = y + x, an initial atomic load is made of y (= y_old), and the value at x is added to it to obtain y_old + x. Before this value is stored at y, it is checked that the value currently resident at y is equal to y_old. The problem here (for the array-element case) is that the value used for the update is not computed from the atomically loaded value y_old but from a different value, and that does not seem correct. Also, the update operation (the addition here) has to be inside the loop, since the addition should be performed on the atomically loaded value.
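The loop shape described above can be written out in C++ as a rough analogue of the scalar LLVM IR earlier in the thread. This is a sketch under assumptions: the helper names (`bitsToFloat`, `floatToBits`, `captureThenAdd`) are hypothetical, a 32-bit float is assumed, and `std::memory_order_relaxed` is used as the C++ counterpart of LLVM's `monotonic` ordering.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret the bits without changing them, like `bitcast i32 ... to float`.
static float bitsToFloat(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

static std::uint32_t floatToBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Models `v = y; y = y + x` as one atomic capture and returns the captured
// old value of y. The addition sits inside the retry loop and always starts
// from the value that the compare-exchange will compare against.
float captureThenAdd(std::atomic<std::uint32_t> &yBits, float x) {
  std::uint32_t oldBits = yBits.load(std::memory_order_relaxed);
  for (;;) {
    const float yOld = bitsToFloat(oldBits);
    const std::uint32_t newBits = floatToBits(yOld + x);
    // On failure, oldBits is refreshed with the current contents of y.
    if (yBits.compare_exchange_weak(oldBits, newBits,
                                    std::memory_order_relaxed))
      return yOld;
  }
}
```

If the new value were computed once before the loop, or from a separate plain load, a concurrent write to y between that load and the exchange could be silently overwritten with a stale result.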


NimishMishra (Author)

Done

I understand now. Thank you.

The patch cannot go ahead in its current form then. Do you have any suggestions on how to go about fixing it? Johannes did mention an alternative strategy some time back, but I am not sure how to start in that direction. Can you give some initial direction?



Revision Contents

Diff 519354

flang/lib/Lower/OpenMP.cpp

flang/test/Lower/OpenMP/atomic-capture.f90
