Fixes https://github.com/llvm/llvm-project/issues/61089 by updating the verification followed like translation from OpenMP+LLVM MLIR dialect to LLVM IR.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@kiranchandramohan I analysed your comment from https://github.com/llvm/llvm-project/issues/61089. Basically the following:
The fix should be based on a condition check: if the region has only two operations (the update and the yield) then try the atomic operation generation. If the atomic generation is not possible then generate cmpxchg as usual. If the region has more than two operations then generate cmpxchg.
Based on the behaviours I observed while lowering atomic update, I found that the first precondition (if the region has only two operations) is never really reached. Reason being that when we do x = x + y, the lowering always load a load on the non-atomic variable (y here) and then proceeds to do addition. Hence, the least we get are 3 operations: load, add, and yield.
If you are talking of the flang flow specifically, then the following code is an example where the condition is reached.
./bin/flang-new -fc1 -emit-fir -fopenmp at_int.f90 -o - | ./bin/fir-opt --fir-memref-dataflow-opt subroutine test() integer :: x integer :: y x = 20 y = 10 !$OMP ATOMIC y = x + y end subroutine test
Also, since the Dialect sits externally to flang in MLIR, we should not restrict ourselves only for the flang use case.
Thank you @kiranchandramohan for the comment. Apologies for the late response. I got caught up elsewhere.
For the test case you give, this patch generates the following IR:
func.func @_QQmain() attributes {fir.bindc_name = "sample"} { %0 = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"} %1 = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"} %c20_i32 = arith.constant 20 : i32 %c10_i32 = arith.constant 10 : i32 omp.atomic.update %1 : !fir.ref<i32> { ^bb0(%arg0: i32): %2 = arith.addi %c20_i32, %arg0 : i32 omp.yield(%2 : i32) } return } fir.global @_QQEnvironmentDefaults constant : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>> { %0 = fir.zero_bits !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>> fir.has_value %0 : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>> } }
Also, I think my initial assessment of this patch was wrong. This patch will work just fine for source code where there are exactly two operations in the atomic region. The reason for the same is as follows.
- *Before the patch*: Operation &innerUpdateOp = innerOpList.front();, the update operation is taken as the "first" operation inside the atomic region.
- *After the patch*: for (Operation &innerUpdateOp : innerOpList): the update operation is taken as "some" operation inside the atomic region
As the tests with this patch show, when there are >2 operations in atomic region, the loop correctly recognises the atomic operation and lowers it. However, for the test case you gave, the loop again correctly recognises the atomic operation as the "first" operation in the atomic region (which is indistinguishable from Operation &innerUpdateOp = innerOpList.front();) and generates IR for it accordingly. As we see, there is no cmpxchg instruction generated.
Apologies. Forgot to add the LLVM IR:
@_QQEnvironmentDefaults = constant ptr null declare ptr @malloc(i64) declare void @free(ptr) define void @_QQmain() { %1 = alloca i32, i64 1, align 4 %2 = alloca i32, i64 1, align 4 br label %entry entry: ; preds = %0 %3 = atomicrmw add ptr %2, i32 20 monotonic, align 4 %4 = add i32 %3, 20 ret void } !llvm.module.flags = !{!0, !1} !0 = !{i32 2, !"Debug Info Version", i32 3} !1 = !{i32 7, !"openmp", i32 11}
Thanks for the explanation. I understood that it works for the explanation provided.
In general, what I am trying to say here is that we are moving from a translation that was very strict (for 2 ops, the first being the binary operation that creates the updated value and the second the yield) to a more liberal translation. It will be ideal to ensure only good cases are translated and error out for the rest. If that is not possible, then we should explicitly specify what we are trying to do in the translation and also avoid crashes.
From what I understand in the new code, the first operation in the region that has as its operand the block argument is the update op. This need not always be the case. For eg, I believe the following will crash if we try to translate (with a debug build).
llvm.func @_QQmain() attributes {fir.bindc_name = "sample"} { %0 = llvm.mlir.constant(20 : i64) : i64 %1 = llvm.mlir.constant(1 : i64) : i64 %2 = llvm.alloca %1 x i32 {bindc_name = "y", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFEy"} : (i64) -> !llvm.ptr<i32> omp.atomic.update %2 : !llvm.ptr<i32> { ^bb0(%arg0: i32): %3 = llvm.sext %arg0 : i32 to i64 %4 = llvm.add %3, %0 : i64 %5 = llvm.trunc %4 : i64 to i32 omp.yield(%5 : i32) } llvm.return }
llvm.func @_QQmain() attributes {fir.bindc_name = "sample"} { %0 = llvm.mlir.constant(1 : i64) : i64 %1 = llvm.alloca %0 x i32 {bindc_name = "y", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFEy"} : (i64) -> !llvm.ptr<i32> omp.atomic.update %1 : !llvm.ptr<i32> { ^bb0(%arg0: i32): omp.yield(%arg0 : i32) } llvm.return }
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1135 | Nit: remove the commented code. | |
1220 | The innerUpdateOp might not be a binary op, check that before proceeding. | |
1224 | There might not be an operand(1) if it is not a binary op. | |
1237 | Add another check, if no update operation found then produce an error. |
LG. Please see comments inline.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1233–1234 | Add a comment on what is the expectation here and what we are trying to find. Is it something like Find the binary update operation that uses the region argument and get the expression used to update? | |
1234 | Should probably initialize isRegionArgUsed and isUpdateOpPresent tp false on each iteration of the loop. |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1233–1234 | @kiranchandramohan Thank you for the LG. I have one small question. By initializing isUpdateOpPresent and isRegionArgUsed to false inside the loop, the current code gives the error "no atomic update operation found inside atomic.update region" instead of the expected "the update operation inside the region must have the region argument as an operand" for the following test case: omp.atomic.update %x : !llvm.ptr<i32>{ ^bb0(%xval : i32): %newval = llvm.mul %expr, %expr : i32 omp.yield(%newval : i32) } This is because even though we have a valid binary operator, during execution of the 2nd iteration of the loop, isUpdateOpPresent is initialized to false, causing the first error "no atomic update operation found inside atomic.update region" to be emitted. Is it necessary to initialize the variables to false inside the loop? |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1233–1234 | The following code will be true for this case (%newval = llvm.mul %expr, %expr : i32). So why will isUpdateOpPresent be false? if (innerOp.getNumOperands() == 2) { isUpdateOpPresent = true; |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1233–1234 | Because we would be setting the variable to false in every iteration of the loop. Hence, for the first operation %newval = llvm.mul %expr, %expr : i32, the variable is set to true. But for the second operation, it remains false. Something like the code below. Is this what you meant by setting the two variables to false in every iteration of the loop: bool isRegionArgUsed, isUpdateOpPresent; for (Operation &innerOp : innerOpList) { isUpdateOpPresent = false; isRegionArgUsed = false; if (innerOp.getNumOperands() == 2) { // normal code here } } if (!isUpdateOpPresent) return atomicUpdateOp.emitError("no atomic update operation found" " inside atomic.update region"); if (!isRegionArgUsed) return atomicUpdateOp.emitError( "the update operation inside the region " "must have the region argument as an operand"); |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1233–1234 | Since we want both to be true at the same time, I think you can consider merging the error checks. And I assume isRegionArgUsed will only be true if isUpdateOpPresent is true after moving the initialization inside. if (!isRegionArgUsed) return atomicUpdateOp.emitError("no atomic update operation with region argument as operand found" " inside atomic.update region"); |
Nit: remove the commented code.