This is an archive of the discontinued LLVM Phabricator instance.

[Flang][OpenMP] Fix for atomic lowering with HLFIR
ClosedPublic

Authored by kiranchandramohan on Aug 18 2023, 10:08 AM.

Details

Summary

Atomic update operation is modelled in OpenMP dialect as
an operation that takes a reference to the operation being
updated. It also contains a region that will perform the
update. The block argument represents the loaded value from
the update location and the Yield operation is the value
that should be stored for the update.

OpenMP FIR lowering binds the value loaded from the update
address to the SymbolAddress. HLFIR lowering does not permit
SymbolAddresses to be a value. To work around this, the
lowering is now performed in two steps. First the body of
the atomic update is lowered into an SCF execute_region
operation. Then this is copied into the omp.atomic_update
as a second step that performs the following:
-> Create an omp.atomic_update with the block argument of
the correct type.
-> Copy the operations from the SCF execute_region. Convert
the scf.yield to an omp.yield.
-> Remove the loads of the update location and replace all
uses with the block argument.

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 18 2023, 10:08 AM
kiranchandramohan requested review of this revision.Aug 18 2023, 10:08 AM

TODO:
-> Add a detailed explanation
-> Add a test for HLFIR.

flang/lib/Lower/OpenMP.cpp
3113

Lowering is in three steps :

subroutine sb
  integer :: a, b
  !$omp atomic update
    a = a + b
end subroutine

Lower to scf.execute_region_op

func.func @_QPsb() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFsbEa"}
  %1 = fir.alloca i32 {bindc_name = "b", uniq_name = "_QFsbEb"}
  %2 = scf.execute_region -> i32 {
    %3 = fir.load %0 : !fir.ref<i32>
    %4 = fir.load %1 : !fir.ref<i32>
    %5 = arith.addi %3, %4 : i32
    scf.yield %5 : i32
  }
  return
}

Move out all the non-update loads.

func.func @_QPsb() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFsbEa"}
  %1 = fir.alloca i32 {bindc_name = "b", uniq_name = "_QFsbEb"}
  %2 = fir.load %1 : !fir.ref<i32>
  %3 = scf.execute_region -> i32 {
    %4 = fir.load %0 : !fir.ref<i32>
    %5 = arith.addi %4, %2 : i32
    scf.yield %5 : i32
  }
  return
}

Convert to atomic.update.

func.func @_QPsb() {
  %0 = fir.alloca i32 {bindc_name = "a", uniq_name = "_QFsbEa"}
  %1 = fir.alloca i32 {bindc_name = "b", uniq_name = "_QFsbEb"}
  %2 = fir.load %1 : !fir.ref<i32>
  omp.atomic.update   %0 : !fir.ref<i32> {
  ^bb0(%arg0: i32):
    %3 = arith.addi %arg0, %2 : i32
    omp.yield(%3 : i32)
  }
  return
}
tblah added inline comments.Aug 21 2023, 2:42 AM
flang/lib/Lower/OpenMP.cpp
3113

Why did you decide to lower to scf.execute_region first then convert to an atomic.update?

Reduce the scope of changes to only make omp.atomic_update work with
HLFIR. (Hoisting can come in a separate patch)
Add an hlfir test.

kiranchandramohan edited the summary of this revision. (Show Details)Aug 21 2023, 4:30 AM
kiranchandramohan marked an inline comment as done.Aug 21 2023, 4:34 AM
kiranchandramohan added inline comments.
flang/lib/Lower/OpenMP.cpp
3113

HLFIR lowering only permits a reference to be bound to SymbolAddresses. omp.atomic_update has a block argument that models the value loaded from the Address and this is used inside the atomic_update region. This was previously achieved by mapping the SymbolAddress to the block argument. This does not work anymore, so it is first lowered temporarily to an scf.execute_region and then the atomic_update operation is carefully constructed from the scf.execute_region.
Note: Have added some additional explanation to the summary.

tblah accepted this revision.Aug 21 2023, 5:29 AM

LG from a HLFIR perspective

This revision is now accepted and ready to land.Aug 21 2023, 5:29 AM

Innovative approach with the intermediate step!

The change seems reasonable to me - but I think your comment showing the multiple step transformation belongs as a comment in the code instead of just summarized in the description. Also, you may want to further break up genOmpAtomicUpdateStatement into multiple functions - feels like too much is happening in a single spot now.

(I am currently looking into reusing this logic for OpenACC so I will build my change on top of this one) :)

Thanks! Great work!

kiranchandramohan marked an inline comment as done.

Add code comments inline as suggested by @razvanlupusoru.
Will split the code in a follow-up patch.

razvanlupusoru accepted this revision.Aug 23 2023, 9:44 AM

Thank you!