This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Fixed internal compiler error with atomic update operation verification
ClosedPublic

Authored by NimishMishra on Jun 17 2023, 9:59 PM.

Diff Detail

Event Timeline

NimishMishra created this revision.Jun 17 2023, 9:59 PM
NimishMishra requested review of this revision.Jun 17 2023, 9:59 PM

@kiranchandramohan I analysed your comment from https://github.com/llvm/llvm-project/issues/61089. Basically the following:

The fix should be based on a condition check:
if the region has only two operations (the update and the yield) then try the atomic operation generation. If the atomic generation is not possible then generate cmpxchg as usual.
If the region has more than two operations then generate cmpxchg.

Based on the behaviours I observed while lowering atomic update, I found that the first precondition (if the region has only two operations) is never really reached. Reason being that when we do x = x + y, the lowering always load a load on the non-atomic variable (y here) and then proceeds to do addition. Hence, the least we get are 3 operations: load, add, and yield.

@kiranchandramohan I analysed your comment from https://github.com/llvm/llvm-project/issues/61089. Basically the following:

The fix should be based on a condition check:
if the region has only two operations (the update and the yield) then try the atomic operation generation. If the atomic generation is not possible then generate cmpxchg as usual.
If the region has more than two operations then generate cmpxchg.

Based on the behaviours I observed while lowering atomic update, I found that the first precondition (if the region has only two operations) is never really reached. Reason being that when we do x = x + y, the lowering always load a load on the non-atomic variable (y here) and then proceeds to do addition. Hence, the least we get are 3 operations: load, add, and yield.

If you are talking of the flang flow specifically, then the following code is an example where the condition is reached.

./bin/flang-new -fc1 -emit-fir -fopenmp at_int.f90 -o - | ./bin/fir-opt --fir-memref-dataflow-opt
subroutine test()
        integer :: x
        integer :: y
        x = 20
        y = 10
        !$OMP ATOMIC
        y = x + y
end subroutine test

Also, since the Dialect sits externally to flang in MLIR, we should not restrict ourselves only for the flang use case.

Thank you @kiranchandramohan for the comment. Apologies for the late response. I got caught up elsewhere.

For the test case you give, this patch generates the following IR:

 func.func @_QQmain() attributes {fir.bindc_name = "sample"} {
    %0 = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}
    %1 = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}
    %c20_i32 = arith.constant 20 : i32
    %c10_i32 = arith.constant 10 : i32
    omp.atomic.update   %1 : !fir.ref<i32> {
    ^bb0(%arg0: i32):
      %2 = arith.addi %c20_i32, %arg0 : i32
      omp.yield(%2 : i32)
    }
    return
  }
  fir.global @_QQEnvironmentDefaults constant : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>> {
    %0 = fir.zero_bits !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>>
    fir.has_value %0 : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>>
  }
}

Also, I think my initial assessment of this patch was wrong. This patch will work just fine for source code where there are exactly two operations in the atomic region. The reason for the same is as follows.

  • *Before the patch*: Operation &innerUpdateOp = innerOpList.front();, the update operation is taken as the "first" operation inside the atomic region.
  • *After the patch*: for (Operation &innerUpdateOp : innerOpList): the update operation is taken as "some" operation inside the atomic region

As the tests with this patch show, when there are >2 operations in atomic region, the loop correctly recognises the atomic operation and lowers it. However, for the test case you gave, the loop again correctly recognises the atomic operation as the "first" operation in the atomic region (which is indistinguishable from Operation &innerUpdateOp = innerOpList.front();) and generates IR for it accordingly. As we see, there is no cmpxchg instruction generated.

Apologies. Forgot to add the LLVM IR:

@_QQEnvironmentDefaults = constant ptr null

declare ptr @malloc(i64)

declare void @free(ptr)

define void @_QQmain() {
  %1 = alloca i32, i64 1, align 4
  %2 = alloca i32, i64 1, align 4
  br label %entry

entry:                                            ; preds = %0
  %3 = atomicrmw add ptr %2, i32 20 monotonic, align 4
  %4 = add i32 %3, 20
  ret void
}

!llvm.module.flags = !{!0, !1}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 7, !"openmp", i32 11}

Thank you @kiranchandramohan for the comment. Apologies for the late response. I got caught up elsewhere.

For the test case you give, this patch generates the following IR:

 func.func @_QQmain() attributes {fir.bindc_name = "sample"} {
    %0 = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFEx"}
    %1 = fir.alloca i32 {bindc_name = "y", uniq_name = "_QFEy"}
    %c20_i32 = arith.constant 20 : i32
    %c10_i32 = arith.constant 10 : i32
    omp.atomic.update   %1 : !fir.ref<i32> {
    ^bb0(%arg0: i32):
      %2 = arith.addi %c20_i32, %arg0 : i32
      omp.yield(%2 : i32)
    }
    return
  }
  fir.global @_QQEnvironmentDefaults constant : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>> {
    %0 = fir.zero_bits !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>>
    fir.has_value %0 : !fir.ref<tuple<i32, !fir.ref<!fir.array<0xtuple<!fir.ref<i8>, !fir.ref<i8>>>>>>
  }
}

Also, I think my initial assessment of this patch was wrong. This patch will work just fine for source code where there are exactly two operations in the atomic region. The reason for the same is as follows.

  • *Before the patch*: Operation &innerUpdateOp = innerOpList.front();, the update operation is taken as the "first" operation inside the atomic region.
  • *After the patch*: for (Operation &innerUpdateOp : innerOpList): the update operation is taken as "some" operation inside the atomic region

As the tests with this patch show, when there are >2 operations in atomic region, the loop correctly recognises the atomic operation and lowers it. However, for the test case you gave, the loop again correctly recognises the atomic operation as the "first" operation in the atomic region (which is indistinguishable from Operation &innerUpdateOp = innerOpList.front();) and generates IR for it accordingly. As we see, there is no cmpxchg instruction generated.

Thanks for the explanation. I understood that it works for the explanation provided.
In general, what I am trying to say here is that we are moving from a translation that was very strict (for 2 ops, the first being the binary operation that creates the updated value and the second the yield) to a more liberal translation. It will be ideal to ensure only good cases are translated and error out for the rest. If that is not possible, then we should explicitly specify what we are trying to do in the translation and also avoid crashes.

From what I understand in the new code, the first operation in the region that has as its operand the block argument is the update op. This need not always be the case. For eg, I believe the following will crash if we try to translate (with a debug build).

llvm.func @_QQmain() attributes {fir.bindc_name = "sample"} {
  %0 = llvm.mlir.constant(20 : i64) : i64
  %1 = llvm.mlir.constant(1 : i64) : i64
  %2 = llvm.alloca %1 x i32 {bindc_name = "y", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFEy"} : (i64) -> !llvm.ptr<i32>
  omp.atomic.update   %2 : !llvm.ptr<i32> {
  ^bb0(%arg0: i32):
    %3 = llvm.sext %arg0 : i32 to i64
    %4 = llvm.add %3, %0  : i64
    %5 = llvm.trunc %4 : i64 to i32
    omp.yield(%5 : i32)
  }
  llvm.return
}
llvm.func @_QQmain() attributes {fir.bindc_name = "sample"} {
  %0 = llvm.mlir.constant(1 : i64) : i64
  %1 = llvm.alloca %0 x i32 {bindc_name = "y", in_type = i32, operand_segment_sizes = array<i32: 0, 0>, uniq_name = "_QFEy"} : (i64) -> !llvm.ptr<i32>
  omp.atomic.update   %1 : !llvm.ptr<i32> {
  ^bb0(%arg0: i32):
    omp.yield(%arg0 : i32)
  }
  llvm.return
}
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1135

Nit: remove the commented code.

1220

The innerUpdateOp might not be a binary op, check that before proceeding.

1224

There might not be an operand(1) if it is not a binary op.

1234

Add another check, if no update operation found then produce an error.

Addressed review comments.

NimishMishra marked 3 inline comments as done.Jul 12 2023, 11:34 PM

LG. Please see comments inline.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1230–1231

Add a comment on what is the expectation here and what we are trying to find. Is it something like Find the binary update operation that uses the region argument and get the expression used to update?

1231

Should probably initialize isRegionArgUsed and isUpdateOpPresent tp false on each iteration of the loop.

This revision is now accepted and ready to land.Jul 13 2023, 7:12 AM
NimishMishra added inline comments.Jul 17 2023, 4:09 AM
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1230–1231

@kiranchandramohan Thank you for the LG. I have one small question.

By initializing isUpdateOpPresent and isRegionArgUsed to false inside the loop, the current code gives the error "no atomic update operation found inside atomic.update region" instead of the expected "the update operation inside the region must have the region argument as an operand" for the following test case:

omp.atomic.update %x : !llvm.ptr<i32>{
   ^bb0(%xval : i32):
            %newval = llvm.mul %expr, %expr : i32
            omp.yield(%newval : i32)
}

This is because even though we have a valid binary operator, during execution of the 2nd iteration of the loop, isUpdateOpPresent is initialized to false, causing the first error "no atomic update operation found inside atomic.update region" to be emitted.

Is it necessary to initialize the variables to false inside the loop?

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1230–1231

The following code will be true for this case (%newval = llvm.mul %expr, %expr : i32). So why will isUpdateOpPresent be false?

if (innerOp.getNumOperands() == 2) {
  isUpdateOpPresent = true;
NimishMishra added inline comments.Jul 17 2023, 4:27 AM
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1230–1231

Because we would be setting the variable to false in every iteration of the loop. Hence, for the first operation %newval = llvm.mul %expr, %expr : i32, the variable is set to true. But for the second operation, it remains false.

Something like the code below. Is this what you meant by setting the two variables to false in every iteration of the loop:

bool isRegionArgUsed, isUpdateOpPresent;
for (Operation &innerOp : innerOpList) {
     isUpdateOpPresent = false;
     isRegionArgUsed = false;
     if (innerOp.getNumOperands() == 2) {
        // normal code here
     }
}

if (!isUpdateOpPresent)
    return atomicUpdateOp.emitError("no atomic update operation found"
                                      " inside atomic.update region");
if (!isRegionArgUsed)
    return atomicUpdateOp.emitError(
          "the update operation inside the region "
          "must have the region argument as an operand");
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
1230–1231

Since we want both to be true at the same time, I think you can consider merging the error checks. And I assume isRegionArgUsed will only be true if isUpdateOpPresent is true after moving the initialization inside.

if (!isRegionArgUsed)
      return atomicUpdateOp.emitError("no atomic update operation with region argument as operand found"
                                      " inside atomic.update region");