This is an archive of the discontinued LLVM Phabricator instance.

[flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region
ClosedPublic

Authored by kiranchandramohan on Jul 17 2022, 10:54 AM.

Details

Summary

[flang][OpenMP][OpenACC] Support stop statement in OpenMP/OpenACC region

This supports lowering of stop statement in OpenMP/OpenACC region.

  • OpenMP/OpenACC: Emit fir.unreachable only if the block is not terminated by any terminator. This avoids knocking off an existing OpenMP/OpenACC terminator.
  • OpenMP: Emit the OpenMP terminator instead of fir.unreachable since OpenMP regions can only be terminated by OpenMP terminators. This is currently skipped for OpenACC since unstructured code is not yet handled specially in OpenACC lowering.

Fixes #60737
Fixes #61877

Co-authored-by: Peixin Qiao <qiaopeixin@huawei.com>
Co-authored-by: Val Donaldson <vdonaldson@nvidia.com>

Diff Detail

Event Timeline

peixin created this revision.Jul 17 2022, 10:54 AM
peixin requested review of this revision.Jul 17 2022, 10:54 AM
  1. For OpenMP code, I noticed that fir.if operation is not generated even with -emit-fir enabled. But for OpenACC code, it is generated.
  2. For OpenMP code, the redundant branch is not optimized in test_stop_in_region2 in flang/test/Lower/OpenMP/stop-stmt-in-region.f90. But for OpenACC code, it is optimized.

@kiranchandramohan Do you know what's going on here?

  1. For OpenACC code, I got the following error for test_stop_in_region3 in flang/test/Lower/OpenACC/stop-stmt-in-region.f90:
error: loc("./acc.f90":18:5): operation with block successors must terminate its parent block
FATAL: verification of lowering to FIR failed

@clementval Could you please help check with this? The similar OpenMP code does not fail.

peixin updated this revision to Diff 503708.Mar 9 2023, 2:51 AM
peixin edited the summary of this revision. (Show Details)

rebase and ping for review @kiranchandramohan @clementval

I think for OpenACC, this should be a simple semantic check because it is not allowed.

2.5.4
A program may not branch into or out of a compute construct.

We have such condition enforce in flang/lib/Semantics/check-directive-structure.h by the NoBranchingEnforce. I think STOP should follow this. Sorry I didn't think about this restriction before.

peixin added a comment.Mar 9 2023, 3:09 AM

I think for OpenACC, this should be a simple semantic check because it is not allowed.

2.5.4
A program may not branch into or out of a compute construct.

As Fortran 2018 11.2, branching refers to GO TO statement. I also tried the following:

$ cat test.f90 
subroutine test_stop_in_region3()
  integer :: x
  !$acc parallel
    x = 3
    if (x > 1) stop x
  !$acc end parallel
end

$ gfortran -fopenacc -c test.f90
$ cat case.f90 
subroutine test_stop_in_region3()
  integer :: x
  !$acc parallel
    x = 3
    if (x > 1) goto 10
  !$acc end parallel
10 stop 1
end

$ gfortran -fopenacc -c case.f90
case.f90:5:22:

    5 |     if (x > 1) goto 10
      |                      ^
Error: invalid branch to/from OpenACC structured block
peixin added a comment.Mar 9 2023, 3:13 AM

We have such condition enforce in flang/lib/Semantics/check-directive-structure.h by the NoBranchingEnforce. I think STOP should follow this. Sorry I didn't think about this restriction before.

I removed the semantic checks in https://reviews.llvm.org/D126471. At that time, I have tested using gfortran to confirm it should be removed. Not your fault :)

clementval added a comment.EditedMar 9 2023, 3:33 AM

We have such condition enforce in flang/lib/Semantics/check-directive-structure.h by the NoBranchingEnforce. I think STOP should follow this. Sorry I didn't think about this restriction before.

I removed the semantic checks in https://reviews.llvm.org/D126471. At that time, I have tested using gfortran to confirm it should be removed. Not your fault :)

Ok fine. nvfortran also accept this.

peixin added a comment.Mar 9 2023, 4:20 AM

We have such condition enforce in flang/lib/Semantics/check-directive-structure.h by the NoBranchingEnforce. I think STOP should follow this. Sorry I didn't think about this restriction before.

I removed the semantic checks in https://reviews.llvm.org/D126471. At that time, I have tested using gfortran to confirm it should be removed. Not your fault :)

Ok fine. nvfortran also accept this.

OK. Thanks for the confirm.

kiranchandramohan added a subscriber: vdonaldson.

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

flang/lib/Lower/Bridge.cpp
2498

I am a bit surprised that this works. From the tests, it does work. How does it detect correctly whether the Stop is not in the final basic block for insertion of fir.unreachable?

This will have issues with the Block construct.

subroutine test_stop_in_region1()
  !$omp parallel
  block
    stop 1
  end block
  !$omp end parallel
end

For the following test, something goes wrong. Could be an existing issue.

subroutine test_stop_in_region1(x)
  integer :: x
  !$omp parallel
  if (x .gt. 1) then
    stop 1
  else
    stop 2
  end if
  !$omp end parallel
end
peixin added a comment.Mar 9 2023, 5:46 PM

Thanks @kiranchandramohan for the provided test cases. It looks like this method is not good one.

There is FIXME in endNewFunction for dead code elimination (https://github.com/llvm/llvm-project/blob/c417266db506c2e000931c1f7a78e81879135c0d/flang/lib/Lower/Bridge.cpp#L3691-L3694). There is also SimplifyRegionLite handling DCE (https://github.com/llvm/llvm-project/blob/c417266db506c2e000931c1f7a78e81879135c0d/flang/lib/Optimizer/Transforms/SimplifyRegionLite.cpp#L37-L47). I am thinking if we should handle it in SimplifyRegionLite pass:

void SimplifyRegionLitePass::runOnOperation() {
  auto op = getOperation();
  auto regions = op->getRegions();
  mlir::RewritePatternSet patterns(op.getContext());
  DummyRewriter rewriter(op.getContext());
  if (regions.empty())
    return;

  traverse the ops in the region:
    if op is `fir.unreachable`
      if it is not in the last Basic Block of one OpenMP/OpenACC region:
        erase this `fir.unreachable`

  (void)mlir::eraseUnreachableBlocks(rewriter, regions);
  (void)mlir::runRegionDCE(rewriter, regions);
}

For now, we can do the above things in Bridge.cpp. When the FIXME is removed, the code can be moved to SimplifyRegionLite pass.

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

If the problem is that an existing required omp.terminator or omp.yield op is in danger of being deleted, would it suffice to check for that in genStopStatement code? If blockIsUnterminated from Bridge.cpp were available for use in Runtime.cpp that could be done like this:

@@ -106,6 +106,7 @@ void Fortran::lower::genStopStatement(
   }

   builder.create<fir::CallOp>(loc, callee, operands);
+  if (blockIsUnterminated())
     genUnreachable(builder, loc);
 }

If that is the right idea, then blockIsUnterminated or an alternative way of getting that information must be available in Runtime.cpp.

schweitz resigned from this revision.Mar 27 2023, 8:29 AM

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

If the problem is that an existing required omp.terminator or omp.yield op is in danger of being deleted, would it suffice to check for that in genStopStatement code? If blockIsUnterminated from Bridge.cpp were available for use in Runtime.cpp that could be done like this:

@@ -106,6 +106,7 @@ void Fortran::lower::genStopStatement(
   }

   builder.create<fir::CallOp>(loc, callee, operands);
+  if (blockIsUnterminated())
     genUnreachable(builder, loc);
 }

If that is the right idea, then blockIsUnterminated or an alternative way of getting that information must be available in Runtime.cpp.

@peixin Can you implement what Val is suggesting here? This will not fix the two examples that I have provided but will fix a lot of usages of STOP. We can handle the other cases separately.

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

If the problem is that an existing required omp.terminator or omp.yield op is in danger of being deleted, would it suffice to check for that in genStopStatement code? If blockIsUnterminated from Bridge.cpp were available for use in Runtime.cpp that could be done like this:

@@ -106,6 +106,7 @@ void Fortran::lower::genStopStatement(
   }

   builder.create<fir::CallOp>(loc, callee, operands);
+  if (blockIsUnterminated())
     genUnreachable(builder, loc);
 }

If that is the right idea, then blockIsUnterminated or an alternative way of getting that information must be available in Runtime.cpp.

@peixin Can you implement what Val is suggesting here? This will not fix the two examples that I have provided but will fix a lot of usages of STOP. We can handle the other cases separately.

I tried that. It's not working for all cases. I had planned to investigate what the problem is in the MLIR DCE optimization code, but it was stopped by other works. blockIsUnterminated() is not the condition to generate the unreachable instruction. Maybe there is something wrong in the MLIR DCE.

kiranchandramohan added a comment.EditedJun 11 2023, 11:19 PM

Adding @vdonaldson also to check whether this is the right way to do this and also for the interaction with the block construct.

If the problem is that an existing required omp.terminator or omp.yield op is in danger of being deleted, would it suffice to check for that in genStopStatement code? If blockIsUnterminated from Bridge.cpp were available for use in Runtime.cpp that could be done like this:

@@ -106,6 +106,7 @@ void Fortran::lower::genStopStatement(
   }

   builder.create<fir::CallOp>(loc, callee, operands);
+  if (blockIsUnterminated())
     genUnreachable(builder, loc);
 }

If that is the right idea, then blockIsUnterminated or an alternative way of getting that information must be available in Runtime.cpp.

@peixin Can you implement what Val is suggesting here? This will not fix the two examples that I have provided but will fix a lot of usages of STOP. We can handle the other cases separately.

I tried that. It's not working for all cases. I had planned to investigate what the problem is in the MLIR DCE optimization code, but it was stopped by other works. blockIsUnterminated() is not the condition to generate the unreachable instruction. Maybe there is something wrong in the MLIR DCE.

@peixin Could you share a testcase that shows the MLIR DCE issue and a case where the blockIsUnterminated does not work? Lack of support for STOP leads to a lot of failures in testsuites which makes categorisation of issues difficult. For an example, In a testsuite, the blockIsUnterminated() fix removed half of the crashes (~90 -> 45). Have not checked whether there are other issues.

@peixin Could you share a testcase that shows the MLIR DCE issue

One example is what you provided:

subroutine test_stop_in_region1(x)
  integer :: x
  !$omp parallel
  if (x .gt. 1) then
    stop 1
  else
    stop 2
  end if
  !$omp end parallel
end

These two stop statements should behave the same after DCE, but it is not for now. I guess there is something wrong in DCE.

and a case where the blockIsUnterminated does not work?

Checking blockIsUnterminated in if-else statement (the else branch) does not work, if I remeber correctly.

@peixin Could you share a testcase that shows the MLIR DCE issue

One example is what you provided:

subroutine test_stop_in_region1(x)
  integer :: x
  !$omp parallel
  if (x .gt. 1) then
    stop 1
  else
    stop 2
  end if
  !$omp end parallel
end

These two stop statements should behave the same after DCE, but it is not for now. I guess there is something wrong in DCE.

and a case where the blockIsUnterminated does not work?

Checking blockIsUnterminated in if-else statement (the else branch) does not work, if I remeber correctly.

I see. Pasting the code with blockIsUnterminated below (without running simplifyRegions). Basically, all paths leading to the region terminator are lost due to the insertion of fir.unreachable instead of cf.br. This leads to the region terminator omp.terminator being removed in the simplifyRegions code.

@vdonaldson would you have any suggestions here?

func.func @_QPtest_stop_in_region1(%arg0: !fir.ref<i32> {fir.bindc_name = "x"}) {
  omp.parallel   {
    %0 = fir.load %arg0 : !fir.ref<i32>
    %c1_i32 = arith.constant 1 : i32
    %1 = arith.cmpi sgt, %0, %c1_i32 : i32
    cf.cond_br %1, ^bb1, ^bb3
  ^bb1:  // pred: ^bb0
    %c1_i32_0 = arith.constant 1 : i32
    %false = arith.constant false
    %false_1 = arith.constant false
    %2 = fir.call @_FortranAStopStatement(%c1_i32_0, %false, %false_1) fastmath<contract> : (i32, i1, i1) -> none
    fir.unreachable
  ^bb2:  // no predecessors
    cf.br ^bb5
  ^bb3:  // pred: ^bb0
    %c2_i32 = arith.constant 2 : i32
    %false_2 = arith.constant false
    %false_3 = arith.constant false
    %3 = fir.call @_FortranAStopStatement(%c2_i32, %false_2, %false_3) fastmath<contract> : (i32, i1, i1) -> none
    fir.unreachable
  ^bb4:  // no predecessors
    cf.br ^bb5
  ^bb5:  // 2 preds: ^bb2, ^bb4
    omp.terminator
  }
  return
}

If there is only a single if and no else then there is still path to the terminator and it works fine there.

func.func @_QPtest_stop_in_region1(%arg0: !fir.ref<i32> {fir.bindc_name = "x"}) {
  omp.parallel   {
    %0 = fir.load %arg0 : !fir.ref<i32>
    %c1_i32 = arith.constant 1 : i32
    %1 = arith.cmpi sgt, %0, %c1_i32 : i32
    cf.cond_br %1, ^bb1, ^bb3
  ^bb1:  // pred: ^bb0
    %false = arith.constant false
    %2 = fir.call @_FortranAStopStatement(%c1_i32, %false, %false) fastmath<contract> : (i32, i1, i1) -> none
    fir.unreachable
  ^bb2:  // no predecessors
    cf.br ^bb3
  ^bb3:  // 2 preds: ^bb0, ^bb2
    omp.terminator
  }
  return
}

I see. Pasting the code with blockIsUnterminated below (without running simplifyRegions). Basically, all paths leading to the region terminator are lost due to the insertion of fir.unreachable instead of cf.br. This leads to the region terminator omp.terminator being removed in the simplifyRegions code.

It does make nominal sense to delete the omp.terminator if it can never be reached and executed.

As far as I'm aware the insertion of a fir.unreachable op is a performance choice, not a correctness choice. If that assumption holds, one possibility would be to suppress fir.unreachable ops after all STOP statements in parallel code.

if (not in OpenMP code)
  genUnreachable(builder, loc);

And as a variation on that, can there be more than one omp.terminator op? If yes, you could try changing the STOP code to

if (in OpenMP code)
  generate an omp.terminator or omp.yield op
else
  genUnreachable(builder, loc);

I see. Pasting the code with blockIsUnterminated below (without running simplifyRegions). Basically, all paths leading to the region terminator are lost due to the insertion of fir.unreachable instead of cf.br. This leads to the region terminator omp.terminator being removed in the simplifyRegions code.

It does make nominal sense to delete the omp.terminator if it can never be reached and executed.

As far as I'm aware the insertion of a fir.unreachable op is a performance choice, not a correctness choice. If that assumption holds, one possibility would be to suppress fir.unreachable ops after all STOP statements in parallel code.

if (not in OpenMP code)
  genUnreachable(builder, loc);

And as a variation on that, can there be more than one omp.terminator op? If yes, you could try changing the STOP code to

if (in OpenMP code)
  generate an omp.terminator or omp.yield op
else
  genUnreachable(builder, loc);

Thanks @vdonaldson. That sounds fine to me. Multiple OpenMP terminators are allowed. This will fix the two tests I had previously put up.

Thanks @vdonaldson. That sounds fine to me. Multiple OpenMP terminators are allowed. This will fix the two tests I had previously put up.

@kiranchandramohan Do you want to commandeer this revision? I am distracted by other works recently. I am afraid I don't have time to work on this patch before June 30.

Thanks @vdonaldson. That sounds fine to me. Multiple OpenMP terminators are allowed. This will fix the two tests I had previously put up.

@kiranchandramohan Do you want to commandeer this revision? I am distracted by other works recently. I am afraid I don't have time to work on this patch before June 30.

OK. I will give this a go.

kiranchandramohan commandeered this revision.Jun 14 2023, 2:42 AM
kiranchandramohan edited reviewers, added: peixin; removed: kiranchandramohan.

Adding Peixin and Val as Co-authors and commandeering this patch. Authorship will revert to Peixin on submission.

  • OpenMP/OpenACC: Emit fir.unreachable only if the block is not terminated by any terminator. This avoids knocking off an existing OpenMP/OpenACC terminator.
  • OpenMP: Emit the OpenMP terminator instead of fir.unreachable since OpenMP regions can only be terminated by OpenMP terminators. This is currently skipped for OpenACC since unstructured code is not yet handled specially in OpenACC lowering.
kiranchandramohan edited the summary of this revision. (Show Details)Jun 14 2023, 2:43 AM
kiranchandramohan removed a reviewer: schweitz.
vdonaldson accepted this revision.Jun 14 2023, 10:37 AM

@kiranchandramohan Looks ok to me. This should hopefully resolve all variations of this problem with little or no cost due to a "missing" unreachable op.

This revision is now accepted and ready to land.Jun 14 2023, 10:37 AM
peixin accepted this revision.Jun 14 2023, 6:10 PM

Adding Peixin and Val as Co-authors and commandeering this patch. Authorship will revert to Peixin on submission.

@kiranchandramohan Thanks for this work. You can authorship this patch since you finished it. Adding me as co-author is OK to me.

LGTM except for one duplicate function definition.

flang/lib/Lower/Runtime.cpp
108–109
flang/lib/Lower/Runtime.cpp
108–109

I will try this in a separate patch. We will have to put it in a header so that it can be inlined.