This patch adds lowering support for atomic update construct. A region is associated with every omp.atomic.update operation wherein resides: (1) the evaluation of the expression on the RHS of the atomic assignment statement, and (2) a omp.yield operation that yields the extended value of expression evaluated in (1).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
nit: name the test file to atomic-update.f90 in accordance with the recent changes by Peixin.
Minor changes requested. Thanks!
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
746–747 | nit: move declarations closer to their use. | |
flang/test/Lower/OpenMP/atomic03.f90 | ||
48 | Can we please add more testcases. There are a many possibilities with memory order and hint and it would be nice to test most of them (ideally all of them). Having more testcases that test different scenarios can't hurt us. [Suggestion] Maybe we can separate them into different subroutines (based on memory order) for clarity. Feel free to separate them however you'd like to ensure readability. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
760 | This should be incorrect for pointers. You may use the following code: mlir::Value address = fir::getBase(converter.genExprAddr( *Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx)); |
Please add a description to the summary of the patch to describe what is being implemented in the patch along with a brief description of how it is implemented.
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
113 | Nit: please document this option. | |
127 | Would it be better to pass the mlir::Value result as an argument instead of Expr and also reuse it below at the terminator insertion? | |
138 | Nit: expand auto | |
778 | Nit: Can this function share a lot of code with the genOmpAtomicUpdate? The hint clause setting, getting the designator, creating the update op etc seem to be similar? | |
811 | Can you add a comment here on why the omp::AtomicUpdateOp is created here? Something like the following. "If atomic-clause is not present on the construct, the behavior is as if the update clause is specified." | |
flang/test/Lower/OpenMP/atomic-update.f90 | ||
1 ↗ | (On Diff #434749) | Nit: please test with the flang driver as well. |
1 ↗ | (On Diff #434749) | Nit: atomic and atomic update constructs. |
3 ↗ | (On Diff #434749) | Nit: can this be on the same line as the one above? |
46–49 ↗ | (On Diff #434749) | If the options in {{.*}} are significant then capture in a variable and use it in the operation where it is used. If it is not significant then please remove. This comment applies to all the tests. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
127 | This is causing a change. If earlier the IR was omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> { ^bb0(%[[ARG:.*]]: i32): fir.store %[[ARG]] to %[[TEMP_9]] : !fir.ref<i32> %4 = fir.load %[[TEMP_9]] : !fir.ref<i32> %5 = arith.constant 1 : i32 %[[RESULT:.*]] = arith.addi %4, %5 : i32 omp.yield(%[[RESULT]] : i32) } Now it becomes: %4 = fir.load %[[TEMP_9]] : !fir.ref<i32> %5 = arith.constant 1 : i32 %[[RESULT:.*]] = arith.addi %4, %5 : i32 omp.atomic.update %[[VAR_Y]] : !fir.ref<i32> { ^bb0(%[[ARG:.*]]: i32): fir.store %[[ARG]] to %[[TEMP_9]] : !fir.ref<i32> omp.yield(%[[RESULT]] : i32) } This does seem OK to me. But still if you could confirm it once. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
127 | This is not accurate IR. The "before" IR was correct (not optimal but correct). The value would get updated with that IR. The "after" IR is both incorrect and not optimal. It is loading an un-related value before the atomic update begins and so the result will not accurately "update" the value. The standard mentions that the evaluation of expr in x += expr is atomic. In this case, expr is %5, not %4 or %[[RESULT]]. Hence the evaluation of %5 can be outside the construct, but not of %[[RESULT]]. Solution: The best way to do this would be to avoid generation of load-store operations and directly use the argument. This load-store is required for worksharing loops but it would be great if we could avoid generating them for update. The arith.addi operation must be inside the update region. If it seems very hard to avoid load-store then the "before" IR is at least correct. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
127 |
Correction (apologies for mistyping some stuff) - The standard mentions that only the evaluation of expr in x += expr need not be atomic, but the update to x must be atomic. In this case, expr is %5, not %4 or %[[RESULT]]. Hence the evaluation of %5 can be outside the construct, but both %4 and %[[RESULT]] must be inside the atomic region. This still does not change the solution and the fact that the "after" IR is incorrect. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
111 | I am thinking if this should be split into one new single function instead of handling the special case in current createBodyOfOp, which is complex enough. The atomic update only uses part of code in createBodyOfOp, and it has nothing to do with the outerCombined and privitization. Actually, atomic update is one kind of special case, which defines the specific region statements in parsing. It is not the general body in the atomic region. So, would it be better to create one single function to support this? |
- created a new function for atomic update lowering
- modified the generated IR to use %arg0 in the region instead of separately loading the region argument
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
129 | // bind the argument to the symbol This is not necessary. The code can comment itself. | |
137 | Why do you generate the result twice? | |
673 | address = fir::getBase(converter.genExprAddr( *Fortran::semantics::GetExpr(assignmentStmtVariable), stmtCtx)); Move this out of if condition. | |
689 | createBodyOfAtomicUpdateOp is only used here, and it's not one large function, so it is OK to inline it directly. |
The generated code in testcases looks good. Please wait for @peixin's approval for the code part, but functionality-wise this LGTM.
Thanks for the update. The structure of lowering atomic update operation is much more clear now. I have a few more comments.
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
111 | Remove the inline. | |
132 | ||
669 | The vector of symbols seems to be not necessary. Use one symbol is enough since update operation always has one argument. Use one more description name -> updateSymbol, or better one. | |
689 | No need to pass currentLocation. It can be obtained from converter. | |
690 | symbolVector is from assignmentStmtVariable. You can move the code of getting the updateSymbol inside the callee. |
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
111 | I didn't mean add the inline attribute. What I mean is that remove this function and move the code into genOmpAtomicUpdateStatement. |
LGTM except for a few nits.
flang/lib/Lower/OpenMP.cpp | ||
---|---|---|
689 | Nit: Add one empty line after 1149. | |
690 | Ni: Remove the empty line. | |
692 | Nit | |
693 | Ni: Remove the empty line. | |
699 | tiv means type of induction variable. It's not appropriate for update variable. | |
700 | Then you can remove the following: tiv.push_back(varType); locs.push_back(currentLocation); |
I am getting:
/.../llvm-project/flang/lib/Lower/OpenMP.cpp:1134:21: error: variable 'updateSymbol' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
If I understand the code correctly, updateSymbol is sometimes initialized. Then, we always dereference the pointer later in the function:
converter.bindSymbol(*updateSymbol, val);
I could silence the warning with:
const Fortran::semantics::Symbol *updateSymbol = nullptr;
but that wouldn't fix the fundamental problem. Would you mind taking a look? Thanks!
Thanks @kazu for reaching out. I have put a potential fix at https://reviews.llvm.org/D129914.
@peixin @kiranchandramohan Could you take a quick look at the fix? The issue is failing some builds with [-Werror,-Wsometimes-uninitialized]
I am thinking if this should be split into one new single function instead of handling the special case in current createBodyOfOp, which is complex enough. The atomic update only uses part of code in createBodyOfOp, and it has nothing to do with the outerCombined and privitization.
Actually, atomic update is one kind of special case, which defines the specific region statements in parsing. It is not the general body in the atomic region. So, would it be better to create one single function to support this?