This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
316	Maybe compute the location first, the push is complex enough. doNothingBodyGen should probably check loopInfos size and in the innermost case execute bodyGen for the innermost loop. We can deal with non-perfectly nested ones later a TODO would suffice for now I guess.

• post.kadirselcuk added a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt".Jul 10 2021, 5:55 PM

• post.kadirselcuk added a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack..Jul 10 2021, 8:06 PM

craig.topper removed a parent revision: D105762: [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack..Jul 10 2021, 9:47 PM

ftynse removed a child revision: D34362: [LNT] Support for different DataSet usage in Polybench for "lnt runtest nt".Jul 15 2021, 5:38 AM

Update

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
316	MLIR modeling won't let you have non-perfectly nested collapsed loops at the moment.

ftynse retitled this revision from [mlir] WIP: support collapsed loops in OpenMP-to-LLVM translation to [mlir] support collapsed loops in OpenMP-to-LLVM translation.Jul 15 2021, 7:01 AM

Harbormaster completed remote builds in B114230: Diff 358954.Jul 15 2021, 7:46 AM

@MatsPetersson is working on the Flang side flow of collapse. Adding him as a reviewer.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
316	We will probably sink the code to the body in such situations in the frontend in the Flang flow.

Leporacanthicus added inline comments.Jul 16 2021, 6:15 AM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
254–255	Am I missing something here? When I try using collapsed loops from flang, this catches me, and my compilation fails... Code on line 261 seems to imply it isn't intended to be 1 always?

Rebase

ftynse marked an inline comment as done.Jul 16 2021, 6:28 AM

ftynse added inline comments.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
254–255	Just a misrebase, this should have been removed.

Harbormaster completed remote builds in B114483: Diff 359304.Jul 16 2021, 7:04 AM

Leporacanthicus added inline comments.Jul 16 2021, 12:00 PM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
277–286	It's getting late on a Friday, but is is possible that this is only mapping a single value? I'm seeing a crash from translating this: omp.wsloop (%arg0, %arg1, %arg2) : i32 = (%4, %4, %4) to (%28, %29, %30) step (%4, %4, %4) collapse(3) inclusive { %31 = llvm.load %20 : !llvm.ptr<i32> %32 = llvm.add %31, %arg0 : i32 %33 = llvm.add %32, %arg1 : i32 %34 = llvm.add %33, %arg2 : i32 llvm.store %34, %20 : !llvm.ptr<i32> omp.yield } The crash happens when it gets to the second add, because arg1 can't be found and a null pointer is returned, which we try to get a type from. Sorry if I'm completely off-track here, I'm about to go home, and I haven't stepped through all the code to figure out what is really going on. The crash happens on a call from line 271 below - but there's about 20 layers of call-stack to the crash-point... I'm not working on Monday-Tuesday, but will be back here on Wednesday.

The use of OpenMPIRBuilder::collapseLoops looks fine, but don't really know the omp.wsloop modelling.

No unit test coming with this?

Sorry for the rather lengthy comment. I am not sure if I'm expecting the wrong thing, I'm doing something wrong, or this patch is incorrect, so I'll try to put as much information into this comment as I can.
Any suggestions, hints or similar would be welcome. I fully expect some of my workarounds and fixes to be called out as wrong - and I don't mean my global variable, which is clearly just a workaround for not understanding how to get the correct value out of where it is known.

Background: I'm currently implement collapse in the Flang lowering to MLIR. I _think_ this is working correctly (I do not have a public patch at this time, as I was hoping to have a least some sort of functional test as part of that patch).

So, the first issue, which I did mention in a comment on line 242, I worked around the failure to find symbols with this patch:

index 1d55d0230b8f..f3bb23520d87 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -213,8 +213,8 @@ convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,
   if (loop.lowerBound().empty())
     return failure();
 
-  if (loop.getNumLoops() != 1)
-    return opInst.emitOpError("collapsed loops not yet supported");
+  //  if (loop.getNumLoops() != 1)
+  //    return opInst.emitOpError("collapsed loops not yet supported");
 
   // Static is the default.
   omp::ClauseScheduleKind schedule = omp::ClauseScheduleKind::Static;
@@ -236,13 +236,15 @@ convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,
   SmallVector<llvm::CanonicalLoopInfo *> loopInfos;
   LogicalResult bodyGenStatus = success();
   auto bodyGen = [&](llvm::OpenMPIRBuilder::InsertPointTy ip, llvm::Value *iv) {
-    if (loopInfos.size() != loop.getNumLoops() - 1)
+    auto index = loopInfos.size();
+    // Make sure further conversions know about the induction variable.
+    moduleTranslation.mapValue(loop.getRegion().front().getArgument(index), iv);
+
+    if (index != loop.getNumLoops() - 1)
       return;
 
     llvm::IRBuilder<>::InsertPointGuard guard(builder);
 
-    // Make sure further conversions know about the induction variable.
-    moduleTranslation.mapValue(loop.getRegion().front().getArgument(0), iv);
 
     llvm::BasicBlock *entryBlock = ip.getBlock();
     llvm::BasicBlock *exitBlock =

This solves the crash that I talked about om the earlier comment, but then exposes a different problem. Calling collapseLoops in convertOmpWsLoop sets the AfterIP to the wrong place - this causes incorrect LLVM-IR to be generated - an empty block and one with two unconditional branches in a row, like this:

  br label %omp_collapsed.after, !dbg !23

omp_collapsed.after:                              ; preds = %omp_collapsed.exit
  br label %omp.par.pre_finalize, !dbg !24
  br label %omp_loop.after, !dbg !23

omp_loop.after:                                   ; preds = %omp_collapsed.after

omp.wsloop.region:                                ; preds = %omp_loop.body17

where, I think, the first branch in omp_collapsed.after should be in omp_loop.after. The incorrect code is causing a crash in the findAllocas function, because some function is returning an empty set of blocks, and the next layer of code expects that to be non-empty, and falls over on the null-pointer returned when it IS empty.

I added a very ugly hack to store the OrigAfter in the collapseLoops into a global variable, and them in the end of convertOMPWsLoop set the AfterIP to the relevant stored value.

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 43c6dd9ab997..cbaa77360cd5 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -32,6 +32,8 @@
 using namespace llvm;
 using namespace omp;
 
+BasicBlock *GlobalAfter;
+
 static cl::opt<bool>
     OptimisticAttributes("openmp-ir-builder-optimistic-attributes", cl::Hidden,
                          cl::desc("Use optimistic attributes describing "
@@ -1566,6 +1568,7 @@ OpenMPIRBuilder::collapseLoops(DebugLoc DL, ArrayRef<CanonicalLoopInfo *> Loops,
   CanonicalLoopInfo *Innermost = Loops.back();
   BasicBlock *OrigPreheader = Outermost->getPreheader();
   BasicBlock *OrigAfter = Outermost->getAfter();
+  GlobalAfter = OrigAfter;
   Function *F = OrigPreheader->getParent();
 
   // Setup the IRBuilder for inserting the trip count computation.
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index f3bb23520d87..c9d83358de62 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -347,6 +347,12 @@ convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,
         ompLoc, loopInfo, allocaIP, schedType, !loop.nowait(), chunk);
   }
 
+  // HACK!
+  extern llvm::BasicBlock *GlobalAfter;
+  if (GlobalAfter) {
+         afterIP = {GlobalAfter, GlobalAfter->begin()};
+         GlobalAfter = nullptr;
+  }
   // Continue building IR after the loop.
   builder.restoreIP(afterIP);
   return success();

This allows the code to get a bit further, but it's still not right, and bails out with this:

Instruction does not dominate all uses!
  %omp_loop.tripcount2 = select i1 %37, i32 0, i32 %39, !dbg !29
  %22 = mul nuw i32 %omp_loop.tripcount, %omp_loop.tripcount2
Instruction does not dominate all uses!
  %omp_loop.tripcount13 = select i1 %43, i32 0, i32 %45, !dbg !29
  %23 = mul nuw i32 %22, %omp_loop.tripcount13
Instruction does not dominate all uses!
  %omp_loop.tripcount13 = select i1 %43, i32 0, i32 %45, !dbg !29
  %30 = urem i32 %29, %omp_loop.tripcount13, !dbg !29
Instruction does not dominate all uses!
  %omp_loop.tripcount13 = select i1 %43, i32 0, i32 %45, !dbg !29
  %31 = udiv i32 %29, %omp_loop.tripcount13, !dbg !29
Instruction does not dominate all uses!
  %omp_loop.tripcount2 = select i1 %37, i32 0, i32 %39, !dbg !29
  %32 = urem i32 %31, %omp_loop.tripcount2, !dbg !29
Instruction does not dominate all uses!
  %omp_loop.tripcount2 = select i1 %37, i32 0, i32 %39, !dbg !29
  %33 = udiv i32 %31, %omp_loop.tripcount2, !dbg !29
Instruction does not dominate all uses!
  %51 = add i32 %50, 1
  %53 = add i32 %52, %51, !dbg !32
Instruction does not dominate all uses!
  %49 = add i32 %48, 1
  %54 = add i32 %53, %49, !dbg !33
error: could not emit LLVM-IR

I'm not sure if there's something wrong in the collapseLoop or in the way that it is used?

For reference: the MLIR used to reproduce this (I don't think this needs any of my code to process):

llvm.func @_QQmain() {
  %0 = llvm.mlir.constant(3 : i32) : i32
  %1 = llvm.mlir.constant(2 : i32) : i32
  %2 = llvm.mlir.constant(5 : i32) : i32
  %3 = llvm.mlir.constant(0 : i32) : i32
  %4 = llvm.mlir.constant(1 : i32) : i32
  %5 = llvm.mlir.constant(-1 : i32) : i32
  %6 = llvm.mlir.constant(19 : i32) : i32
  %7 = llvm.mlir.constant(1 : i64) : i64
  %8 = llvm.alloca %7 x i32 {bindc_name = "a", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEa"} : (i64) -> !llvm.ptr<i32>
  %9 = llvm.mlir.constant(1 : i64) : i64
  %10 = llvm.alloca %9 x i32 {bindc_name = "b", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEb"} : (i64) -> !llvm.ptr<i32>
  %11 = llvm.mlir.constant(1 : i64) : i64
  %12 = llvm.alloca %11 x i32 {bindc_name = "c", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEc"} : (i64) -> !llvm.ptr<i32>
  %13 = llvm.mlir.constant(1 : i64) : i64
  %14 = llvm.alloca %13 x i32 {bindc_name = "i", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEi"} : (i64) -> !llvm.ptr<i32>
  %15 = llvm.mlir.constant(1 : i64) : i64
  %16 = llvm.alloca %15 x i32 {bindc_name = "j", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEj"} : (i64) -> !llvm.ptr<i32>
  %17 = llvm.mlir.constant(1 : i64) : i64
  %18 = llvm.alloca %17 x i32 {bindc_name = "k", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEk"} : (i64) -> !llvm.ptr<i32>
  %19 = llvm.mlir.constant(1 : i64) : i64
  %20 = llvm.alloca %19 x i32 {bindc_name = "x", in_type = i32, operand_segment_sizes = dense<0> : vector<2xi32>, uniq_name = "_QEx"} : (i64) -> !llvm.ptr<i32>
  llvm.store %0, %8 : !llvm.ptr<i32>
  llvm.store %1, %10 : !llvm.ptr<i32>
  llvm.store %2, %12 : !llvm.ptr<i32>
  llvm.store %3, %20 : !llvm.ptr<i32>
  omp.parallel {
    %28 = llvm.load %8 : !llvm.ptr<i32>
    %29 = llvm.load %10 : !llvm.ptr<i32>
    %30 = llvm.load %12 : !llvm.ptr<i32>
    omp.wsloop (%arg0, %arg1, %arg2) : i32 = (%4, %4, %4) to (%28, %29, %30) step (%4, %4, %4) collapse(3) inclusive {
      %31 = llvm.load %20 : !llvm.ptr<i32>
      %32 = llvm.add %31, %arg0  : i32
      %33 = llvm.add %32, %arg1  : i32
      %34 = llvm.add %33, %arg2  : i32
      llvm.store %34, %20 : !llvm.ptr<i32>
      omp.yield
    }
    omp.terminator
  }
  %21 = llvm.mlir.addressof @_QQcl.2E2F636F6C6C617073652E66393000 : !llvm.ptr<array<15 x i8>>
  %22 = llvm.bitcast %21 : !llvm.ptr<array<15 x i8>> to !llvm.ptr<i8>
  %23 = llvm.call @_FortranAioBeginExternalListOutput(%5, %22, %6) : (i32, !llvm.ptr<i8>, i32) -> !llvm.ptr<i8>
  %24 = llvm.load %20 : !llvm.ptr<i32>
  %25 = llvm.sext %24 : i32 to i64
  %26 = llvm.call @_FortranAioOutputInteger64(%23, %25) : (!llvm.ptr<i8>, i64) -> i1
  %27 = llvm.call @_FortranAioEndIoStatement(%23) : (!llvm.ptr<i8>) -> i32
  llvm.return
}
llvm.func @_FortranAioBeginExternalListOutput(i32, !llvm.ptr<i8>, i32) -> !llvm.ptr<i8> attributes {fir.io, fir.runtime, sym_visibility = "private"}
llvm.mlir.global linkonce constant @_QQcl.2E2F636F6C6C617073652E66393000() : !llvm.array<15 x i8> {
  %0 = llvm.mlir.constant("./collapse.f90\00") : !llvm.array<15 x i8>
  llvm.return %0 : !llvm.array<15 x i8>
}
llvm.func @_FortranAioOutputInteger64(!llvm.ptr<i8>, i64) -> i1 attributes {fir.io, fir.runtime, sym_visibility = "private"}
llvm.func @_FortranAioEndIoStatement(!llvm.ptr<i8>) -> i32 attributes {fir.io, fir.runtime, sym_visibility = "private"}

In D105706#2904731, @Leporacanthicus wrote:

Calling collapseLoops in convertOmpWsLoop sets the AfterIP to the wrong place - this causes incorrect LLVM-IR to be generated - an empty block and one with two unconditional branches in a row, like this:

Where does AfterIP point to and where do you think it should point to after collapseLoops? Should the result of collapseLoops be changed or is it an issue of how convertOmpWsLoop uses CanonicalLoopInfo?

I would appreciate a test case to the patch that would help me reproduce the problem.

In D105706#2905915, @Meinersbur wrote:

In D105706#2904731, @Leporacanthicus wrote:

Calling collapseLoops in convertOmpWsLoop sets the AfterIP to the wrong place - this causes incorrect LLVM-IR to be generated - an empty block and one with two unconditional branches in a row, like this:

Where does AfterIP point to and where do you think it should point to after collapseLoops? Should the result of collapseLoops be changed or is it an issue of how convertOmpWsLoop uses CanonicalLoopInfo?

It points at omp_collapsed.after, I think it should point at omp_loop.after. This branch is inserted by convertOmpRegions on the second pass through (first one is the one that converts the body or some such - I'm still not 100% sure how all this hangs together to be honest).

I'm not at all sure this is right, but I'm just trying to make my stuff [which generates the MLIR as attached in the original long comment] work all the way to LLVM-IR.

Perhaps the problem is that we should have two different CanonicalLoopInfo variables, one for the outer scope, and one for the collapsed loops? Or maybe the problem is with my understanding of what is supposed to happen [although there appears to be a problem with the way that loops are collapsed and the arguments bound, as my initial attempt crashes during the code-generation just before collapseLoops.

I would appreciate a test case to the patch that would help me reproduce the problem.

I would love to have a better test than the MLIR I attached, but at present, that's as far as I have got. I'm currently just doing build/bin/tco collapse.mlir -o collapse.ll with the MLIR that I put at the end of the long comment.

Meinersbur added inline comments.Jul 27 2021, 12:45 PM

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
371	I think this is wrong. The CanonicalLoopInfo returned by createStaticWorkshareLoop represents the inner loop over chunks, not the contruct as a whole. Don't use the returned CanonicalLoopInfo unless you want to process the loop. The workshare reagion itself ends after `kmpc_for_static_fini` which is not part of the inner canonical loop.

Fix insertion points.

Perhaps the problem is that we should have two different CanonicalLoopInfo variables, one for the outer scope, and one for the collapsed loops? Or maybe the problem is with my understanding of what is supposed to happen [although there appears to be a problem with the way that loops are collapsed and the arguments bound, as my initial attempt crashes during the code-generation just before collapseLoops.

Yeah, what looks disturbing to me is the behavior of collapseLoops. I expected it to either mutate the outermost canonical loop, or create an entirely new loop and move the bodies of the old loop there. What it does, however, is to create an additional canonical loop half-way inside the outermost loop so that the canonical loop has an After block distinct from that of the outermost loop and, what's worse, preceding it in the control flow.

The rest, i.e. the dominance issues, is due to CanonicalLoopInfo::getBodyIP pointing to a different location (entry of the first body block) than the bodyIP of the body builder codegen (before the terminator of the first body block). This is not an issue if the loop is known to have a zero lower bound and a unit step, which was the case for my test cases.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
371	If you meant "returned by collapseLoops" instead of "returned by createStaticWorkshareLoop", then it was indeed wrong. `createStaticWorkshareLoop` returns a mutated canonical loop info that is perfectly fine to use here. `collapseLoops` does not, which was highly unintuitive for me, given that virtually all the other builder APIs return an object that contains the insertion point for next operations.

Harbormaster completed remote builds in B116773: Diff 362501.Jul 28 2021, 2:28 PM

Thanks for the updated patch, this seems to work for my simple use-case.

Herald added a subscriber: Chia-hungDuan. · View Herald TranscriptJul 29 2021, 7:20 AM

Thank you for the test case

Before collapse:

After collapse:

After workshare:

In D105706#2911257, @ftynse wrote:

I expected it to either mutate the outermost canonical loop, or create an entirely new loop and move the bodies of the old loop there. What it does, however, is to create an additional canonical loop half-way inside the outermost loop so that the canonical loop has an After block distinct from that of the outermost loop and, what's worse, preceding it in the control flow.

There might be interleaving code between the loops that cannot be discarded. The OpenMP spec basically allows it to be moved into the loop body of the collapsed loop (executing it multiple times per pre-collapse execution) which is what collapseLoop does. An alternative is to protect it with a conditional to only execute interleaving code when the inner loop's logical loop counter is zero.

The outermost CanonicalLoopInfo could have been reused, so could have the innermost. From an interface perspective, the caller should not make any assumptions on what happens with the original loop blocks (the ones that are under control of CanonicalLoopInfo) or the CanonicalLoopInfo itself.

The rest, i.e. the dominance issues, is due to CanonicalLoopInfo::getBodyIP pointing to a different location (entry of the first body block) than the bodyIP of the body builder codegen (before the terminator of the first body block). This is not an issue if the loop is known to have a zero lower bound and a unit step, which was the case for my test cases.

CanonicalLoopInfo::getBody has to return the block that follows the loop condition branch. If you want the original Body BB, store it in a variable and use that.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
371	I meant "returned by createStaticWorkshareLoop" (although I now think it is not the issue here). A canonical loop is a construct that that executes the body a number of times without itself having any side-effects per OpenMP specification(). This is not the case for calls to the runtime library such as `@__kmpc_for_static_fini`. Hence, it should NOT be part of the CanonicalLoopInfo and AfterIP return an InsertionPoint where there is no site-effect between the last execution of the body and whatever follows the loop. Calls that take a `CanonicalLoopInfo` as input are different to the other `Create` Calls in that it does not insert something new at an InsertionPoint, but takes an existing piece that is already part of the control flow and transforms it. Additional code should be appended after the code that the original `Create` inserted, not where the code has been transformed. Ity might have been better if we names those two kinds of `Create` methods difference. Such as `InsertXYZ` and `ApplyXYZ`. () Per specification, if there are side-effects, it is unspecified when they happen or how often.

Thanks for the visualization! My question is basically why can't we coalesce omp_loop.preheader/omp_collapsed.preheader omp_collapsed.after/omp_loop.after pairs?

The outermost CanonicalLoopInfo could have been reused, so could have the innermost. From an interface perspective, the caller should not make any assumptions on what happens with the original loop blocks (the ones that are under control of CanonicalLoopInfo) or the CanonicalLoopInfo itself.

Yes, that's why it feels strange to me that I can just store the InsertPoint of the builder (or, alternatively, use OutermostCanonicalLoopInfo->getAfterIP()) if it points to the block managed by CanonicalLoopInfo that was passed into the call and therefore potentially invalidated.

CanonicalLoopInfo::getBody has to return the block that follows the loop condition branch. If you want the original Body BB, store it in a variable and use that.

It's the same block, but the insertion iterator is different. getBodyIP() returns the begin() of the block, but there may be a mul/add pair that computes the non-canonical induction variable.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
371	Yes, using a different naming scheme and a clear documentation that the previous insertion point should still be used sounds good.

In D105706#2916159, @ftynse wrote:

My question is basically why can't we coalesce omp_loop.preheader/omp_collapsed.preheader omp_collapsed.after/omp_loop.after pairs?

Implementation-wise, the new preheader+after BBs are created by createLoopSkeleton and are just connected to the rest of the CFG.
omp_loop.preheader could be reused by appending instructions from omp_collapsed.preheader instead of creating that BB.
omp_loop.after could be reused instead of creating omp_collapsed.after by preprending instructions.
However, I don't see what gain the extra complexity would bring.

Yes, that's why it feels strange to me that I can just store the InsertPoint of the builder (or, alternatively, use OutermostCanonicalLoopInfo->getAfterIP()) if it points to the block managed by CanonicalLoopInfo that was passed into the call and therefore potentially invalidated.

Insertion points to user code should not be invalidated.
The After (and Body and Preheader) BBs contain user-code (Preheader=any code before the canonical loop; After=any code after the loop; Body=iteration code) and therefore are not under control of CanonicalLoopInfo. My goal was to not modify these except to connect them to new BBs. This should ensure that all InsertionPoints into these BBs remain valid.

At this point, I think it is less risky to not modify existing block and just redirect branch instructions. For instance, someone has an InsertionPoint to the terminator of omp_loop.preheader. I.e. it will insert instructions to the end of the preheader. If collapseLoop would insert instructions to the preheader (calculate the new trip count) and necessarily before the branch instruction (if it did not replace the branch instruction itself which would invalidate the InsertionPoint), the InsectionPoint would insert after the collapseLoop's control instruction when the original meaning was to insert before the loop and its control statements.

In case of collapseLoop, these control instructions should be sideeffect-free so it may not matter, but for CreateWorkshareLoop, an InsertionPoint before the loop should point continue to be before any control instructions intruduced, especially __kmpc_for_static_init.

CanonicalLoopInfo::getBody has to return the block that follows the loop condition branch. If you want the original Body BB, store it in a variable and use that.

It's the same block, but the insertion iterator is different. getBodyIP() returns the begin() of the block, but there may be a mul/add pair that computes the non-canonical induction variable.

It is the begin of the transformed canonical loop. If access to the (untransformed) induction variables is needed, use the Body BB from the loop that introduced it.

Anything else would not be composable. Let's say we apply one more transformation on the collapsed loop, say unrolling. For unrolling the only relevant loop induction variable is the one introduced by collapseLoop, starting to be valid at the new BodyIP. From that standpoint evertyhing from Body to Latch is "user code", what original induction variables have been computed from it is not its concern. Unrolling will also need to insert code that computes the value of the collapsed induction variable from the induction variable of the unrolled loop (after duplicating it). It needs to insert that before any code that uses the collapsed induction variable (and then do a RUOW) which is represented by ... getBodyIP().

If this is still unclear, let's have a conference call conversation.

In D105706#2917702, @Meinersbur wrote:

In D105706#2916159, @ftynse wrote:

Yes, that's why it feels strange to me that I can just store the InsertPoint of the builder (or, alternatively, use OutermostCanonicalLoopInfo->getAfterIP()) if it points to the block managed by CanonicalLoopInfo that was passed into the call and therefore potentially invalidated.

Insertion points to user code should not be invalidated.
The After (and Body and Preheader) BBs contain user-code (Preheader=any code before the canonical loop; After=any code after the loop; Body=iteration code) and therefore are not under control of CanonicalLoopInfo. My goal was to not modify these except to connect them to new BBs. This should ensure that all InsertionPoints into these BBs remain valid.

At this point, I think it is less risky to not modify existing block and just redirect branch instructions. For instance, someone has an InsertionPoint to the terminator of omp_loop.preheader. I.e. it will insert instructions to the end of the preheader. If collapseLoop would insert instructions to the preheader (calculate the new trip count) and necessarily before the branch instruction (if it did not replace the branch instruction itself which would invalidate the InsertionPoint), the InsectionPoint would insert after the collapseLoop's control instruction when the original meaning was to insert before the loop and its control statements.

It would be great if these expectations had been documented next to CanonicalLoopInfo, especially what is considered "user code" and what is not.

CanonicalLoopInfo::getBody has to return the block that follows the loop condition branch. If you want the original Body BB, store it in a variable and use that.

It's the same block, but the insertion iterator is different. getBodyIP() returns the begin() of the block, but there may be a mul/add pair that computes the non-canonical induction variable.

It is the begin of the transformed canonical loop. If access to the (untransformed) induction variables is needed, use the Body BB from the loop that introduced it.

Anything else would not be composable. Let's say we apply one more transformation on the collapsed loop, say unrolling. For unrolling the only relevant loop induction variable is the one introduced by collapseLoop, starting to be valid at the new BodyIP. From that standpoint evertyhing from Body to Latch is "user code", what original induction variables have been computed from it is not its concern. Unrolling will also need to insert code that computes the value of the collapsed induction variable from the induction variable of the unrolled loop (after duplicating it). It needs to insert that before any code that uses the collapsed induction variable (and then do a RUOW) which is represented by ... getBodyIP().

Ditto, it's better when this reasoning is provided in the inline documentation.

Let me work on improving the documentation. For "user-code", I didn't want to commit to what InsertPoint guarantees to make, as seen by CanonicalLoopInfo::collectControlBlocks which only excludes the Body block. I expected this to be developed over time. I didn't not expect people to use CanonicalLoopInfo for code regions that are not canonical loops.

In D105706#2920497, @Meinersbur wrote:

Let me work on improving the documentation.

Thanks!

For "user-code", I didn't want to commit to what InsertPoint guarantees to make, as seen by CanonicalLoopInfo::collectControlBlocks which only excludes the Body block. I expected this to be developed over time.

IMO, it's totally fine to say "currently, this does <...>, but this may change in the future".

Do you think this patch is okay to go or do you see other issues?

Herald added a subscriber: wrengr. · View Herald TranscriptAug 3 2021, 1:15 AM

In D105706#2921584, @ftynse wrote:

Do you think this patch is okay to go or do you see other issues?

There is this one issue of not using stale CanonicalLoopInfo.

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
371	Please don't any of the `loopInfos` anymore they have been consumed by `collapseLoops`. With the documentation patch I am adding and invalidation flag that ensures that the `CanonicalLoopInfo` is not used anymore after it does not exist anymore in the IR.

Done

ftynse marked an inline comment as done.Aug 4 2021, 4:01 AM

Harbormaster completed remote builds in B117861: Diff 364034.Aug 4 2021, 4:32 AM

Meinersbur mentioned this in D107540: [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC..Aug 5 2021, 1:36 AM

LGTM

This revision is now accepted and ready to land.Aug 5 2021, 3:23 PM

This revision was landed with ongoing or failed builds.Aug 6 2021, 8:13 AM

Closed by commit rGc4c103097660: [mlir] support collapsed loops in OpenMP-to-LLVM translation (authored by ftynse). · Explain Why

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rGc4c103097660: [mlir] support collapsed loops in OpenMP-to-LLVM translation.

Meinersbur mentioned this in rGb1de32d6ddd9: [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC..Aug 12 2021, 7:02 PM

kiranchandramohan mentioned this in D131674: [NFC][OpenMP] Update simd loop collapse support description.Aug 16 2022, 5:33 AM

domada mentioned this in D132023: [NFC][Flang] Add simd collapse test case.Aug 17 2022, 4:18 AM

Revision Contents

Path

Size

mlir/

lib/

Target/

LLVMIR/

Dialect/

OpenMP/

OpenMPToLLVMIRTranslation.cpp

91 lines

test/

Target/

LLVMIR/

openmp-llvm.mlir

61 lines

Diff 364801

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

	Show First 20 Lines • Show All 245 Lines • ▼ Show 20 Lines
	/// Converts an OpenMP workshare loop into LLVM IR using OpenMPIRBuilder.			/// Converts an OpenMP workshare loop into LLVM IR using OpenMPIRBuilder.
	static LogicalResult			static LogicalResult
	convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,			convertOmpWsLoop(Operation &opInst, llvm::IRBuilderBase &builder,
	LLVM::ModuleTranslation &moduleTranslation) {			LLVM::ModuleTranslation &moduleTranslation) {
	auto loop = cast<omp::WsLoopOp>(opInst);			auto loop = cast<omp::WsLoopOp>(opInst);
	// TODO: this should be in the op verifier instead.			// TODO: this should be in the op verifier instead.
	if (loop.lowerBound().empty())			if (loop.lowerBound().empty())
	return failure();			return failure();

	if (loop.getNumLoops() != 1)
	return opInst.emitOpError("collapsed loops not yet supported");

	// Static is the default.			// Static is the default.
				LeporacanthicusUnsubmitted Done Reply Inline Actions Am I missing something here? When I try using collapsed loops from flang, this catches me, and my compilation fails... Code on line 261 seems to imply it isn't intended to be 1 always? Leporacanthicus: Am I missing something here? When I try using collapsed loops from flang, this catches me, and…
				ftynseAuthorUnsubmitted Done Reply Inline Actions Just a misrebase, this should have been removed. ftynse: Just a misrebase, this should have been removed.
	omp::ClauseScheduleKind schedule = omp::ClauseScheduleKind::Static;			omp::ClauseScheduleKind schedule = omp::ClauseScheduleKind::Static;
	if (loop.schedule_val().hasValue())			if (loop.schedule_val().hasValue())
	schedule =			schedule =
	*omp::symbolizeClauseScheduleKind(loop.schedule_val().getValue());			*omp::symbolizeClauseScheduleKind(loop.schedule_val().getValue());

	// Find the loop configuration.
	llvm::Value *lowerBound = moduleTranslation.lookupValue(loop.lowerBound()[0]);
	llvm::Value *upperBound = moduleTranslation.lookupValue(loop.upperBound()[0]);
	llvm::Value *step = moduleTranslation.lookupValue(loop.step()[0]);
	llvm::Type *ivType = step->getType();
	llvm::Value *chunk =
	loop.schedule_chunk_var()
	? moduleTranslation.lookupValue(loop.schedule_chunk_var())
	: llvm::ConstantInt::get(ivType, 1);

	// Set up the source location value for OpenMP runtime.			// Set up the source location value for OpenMP runtime.
	llvm::DISubprogram *subprogram =			llvm::DISubprogram *subprogram =
	builder.GetInsertBlock()->getParent()->getSubprogram();			builder.GetInsertBlock()->getParent()->getSubprogram();
	const llvm::DILocation *diLoc =			const llvm::DILocation *diLoc =
	moduleTranslation.translateLoc(opInst.getLoc(), subprogram);			moduleTranslation.translateLoc(opInst.getLoc(), subprogram);
	llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder.saveIP(),			llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder.saveIP(),
	llvm::DebugLoc(diLoc));			llvm::DebugLoc(diLoc));

	// Generator of the canonical loop body. Produces an SESE region of basic			// Generator of the canonical loop body.
	// blocks.
	// TODO: support error propagation in OpenMPIRBuilder and use it instead of			// TODO: support error propagation in OpenMPIRBuilder and use it instead of
	// relying on captured variables.			// relying on captured variables.
				SmallVector<llvm::CanonicalLoopInfo *> loopInfos;
				SmallVector<llvm::OpenMPIRBuilder::InsertPointTy> bodyInsertPoints;
	LogicalResult bodyGenStatus = success();			LogicalResult bodyGenStatus = success();
	auto bodyGen = [&](llvm::OpenMPIRBuilder::InsertPointTy ip, llvm::Value *iv) {			auto bodyGen = [&](llvm::OpenMPIRBuilder::InsertPointTy ip, llvm::Value *iv) {
	llvm::IRBuilder<>::InsertPointGuard guard(builder);

	// Make sure further conversions know about the induction variable.			// Make sure further conversions know about the induction variable.
	moduleTranslation.mapValue(loop.getRegion().front().getArgument(0), iv);			moduleTranslation.mapValue(
				loop.getRegion().front().getArgument(loopInfos.size()), iv);

				// Capture the body insertion point for use in nested loops. BodyIP of the
				// CanonicalLoopInfo always points to the beginning of the entry block of
				// the body.
				bodyInsertPoints.push_back(ip);

				if (loopInfos.size() != loop.getNumLoops() - 1)
				return;
				LeporacanthicusUnsubmitted Done Reply Inline Actions It's getting late on a Friday, but is is possible that this is only mapping a single value? I'm seeing a crash from translating this: omp.wsloop (%arg0, %arg1, %arg2) : i32 = (%4, %4, %4) to (%28, %29, %30) step (%4, %4, %4) collapse(3) inclusive { %31 = llvm.load %20 : !llvm.ptr<i32> %32 = llvm.add %31, %arg0 : i32 %33 = llvm.add %32, %arg1 : i32 %34 = llvm.add %33, %arg2 : i32 llvm.store %34, %20 : !llvm.ptr<i32> omp.yield } The crash happens when it gets to the second add, because arg1 can't be found and a null pointer is returned, which we try to get a type from. Sorry if I'm completely off-track here, I'm about to go home, and I haven't stepped through all the code to figure out what is really going on. The crash happens on a call from line 271 below - but there's about 20 layers of call-stack to the crash-point... I'm not working on Monday-Tuesday, but will be back here on Wednesday. Leporacanthicus: It's getting late on a Friday, but is is possible that this is only mapping a single value?

				// Convert the body of the loop.
	llvm::BasicBlock *entryBlock = ip.getBlock();			llvm::BasicBlock *entryBlock = ip.getBlock();
	llvm::BasicBlock *exitBlock =			llvm::BasicBlock *exitBlock =
	entryBlock->splitBasicBlock(ip.getPoint(), "omp.wsloop.exit");			entryBlock->splitBasicBlock(ip.getPoint(), "omp.wsloop.exit");

	// Convert the body of the loop.
	convertOmpOpRegions(loop.region(), "omp.wsloop.region", *entryBlock,			convertOmpOpRegions(loop.region(), "omp.wsloop.region", *entryBlock,
	*exitBlock, builder, moduleTranslation, bodyGenStatus);			*exitBlock, builder, moduleTranslation, bodyGenStatus);
	};			};

	// Delegate actual loop construction to the OpenMP IRBuilder.			// Delegate actual loop construction to the OpenMP IRBuilder.
	// TODO: this currently assumes WsLoop is semantically similar to SCF loop,			// TODO: this currently assumes WsLoop is semantically similar to SCF loop,
	// i.e. it has a positive step, uses signed integer semantics. Reconsider			// i.e. it has a positive step, uses signed integer semantics. Reconsider
	// this code when WsLoop clearly supports more cases.			// this code when WsLoop clearly supports more cases.
	llvm::CanonicalLoopInfo *loopInfo =			llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
	moduleTranslation.getOpenMPBuilder()->createCanonicalLoop(			for (unsigned i = 0, e = loop.getNumLoops(); i < e; ++i) {
	ompLoc, bodyGen, lowerBound, upperBound, step, /IsSigned=/true,			llvm::Value *lowerBound =
	/InclusiveStop=/loop.inclusive());			moduleTranslation.lookupValue(loop.lowerBound()[i]);
				llvm::Value *upperBound =
				moduleTranslation.lookupValue(loop.upperBound()[i]);
				llvm::Value *step = moduleTranslation.lookupValue(loop.step()[i]);

				// Make sure loop trip count are emitted in the preheader of the outermost
				// loop at the latest so that they are all available for the new collapsed
				// loop will be created below.
				llvm::OpenMPIRBuilder::LocationDescription loc = ompLoc;
				llvm::OpenMPIRBuilder::InsertPointTy computeIP = ompLoc.IP;
				if (i != 0) {
				loc = llvm::OpenMPIRBuilder::LocationDescription(bodyInsertPoints.back(),
				llvm::DebugLoc(diLoc));
				computeIP = loopInfos.front()->getPreheaderIP();
				jdoerfertUnsubmitted Done Reply Inline Actions Maybe compute the location first, the push is complex enough. doNothingBodyGen should probably check loopInfos size and in the innermost case execute bodyGen for the innermost loop. We can deal with non-perfectly nested ones later a TODO would suffice for now I guess. jdoerfert: Maybe compute the location first, the push is complex enough. doNothingBodyGen should probably…
				ftynseAuthorUnsubmitted Done Reply Inline Actions MLIR modeling won't let you have non-perfectly nested collapsed loops at the moment. ftynse: MLIR modeling won't let you have non-perfectly nested collapsed loops at the moment.
				kiranchandramohanUnsubmitted Not Done Reply Inline Actions We will probably sink the code to the body in such situations in the frontend in the Flang flow. kiranchandramohan: We will probably sink the code to the body in such situations in the frontend in the Flang flow.
				}
				loopInfos.push_back(ompBuilder->createCanonicalLoop(
				loc, bodyGen, lowerBound, upperBound, step,
				/IsSigned=/true, loop.inclusive(), computeIP));

	if (failed(bodyGenStatus))			if (failed(bodyGenStatus))
	return failure();			return failure();
				}

				// Collapse loops. Store the insertion point because LoopInfos may get
				// invalidated.
				llvm::IRBuilderBase::InsertPoint afterIP = loopInfos.front()->getAfterIP();
				llvm::CanonicalLoopInfo *loopInfo =
				ompBuilder->collapseLoops(diLoc, loopInfos, {});

				// Find the loop configuration.
				llvm::Type *ivType = loopInfo->getIndVar()->getType();
				llvm::Value *chunk =
				loop.schedule_chunk_var()
				? moduleTranslation.lookupValue(loop.schedule_chunk_var())
				: llvm::ConstantInt::get(ivType, 1);
	llvm::OpenMPIRBuilder::InsertPointTy allocaIP =			llvm::OpenMPIRBuilder::InsertPointTy allocaIP =
	findAllocaInsertPoint(builder, moduleTranslation);			findAllocaInsertPoint(builder, moduleTranslation);
	llvm::OpenMPIRBuilder::InsertPointTy afterIP;
	llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
	if (schedule == omp::ClauseScheduleKind::Static) {			if (schedule == omp::ClauseScheduleKind::Static) {
	loopInfo = ompBuilder->createStaticWorkshareLoop(ompLoc, loopInfo, allocaIP,			ompBuilder->createStaticWorkshareLoop(ompLoc, loopInfo, allocaIP,
	!loop.nowait(), chunk);			!loop.nowait(), chunk);
	afterIP = loopInfo->getAfterIP();
	} else {			} else {
	llvm::omp::OMPScheduleType schedType;			llvm::omp::OMPScheduleType schedType;
	switch (schedule) {			switch (schedule) {
	case omp::ClauseScheduleKind::Dynamic:			case omp::ClauseScheduleKind::Dynamic:
	schedType = llvm::omp::OMPScheduleType::DynamicChunked;			schedType = llvm::omp::OMPScheduleType::DynamicChunked;
	break;			break;
	case omp::ClauseScheduleKind::Guided:			case omp::ClauseScheduleKind::Guided:
	schedType = llvm::omp::OMPScheduleType::GuidedChunked;			schedType = llvm::omp::OMPScheduleType::GuidedChunked;
	break;			break;
	case omp::ClauseScheduleKind::Auto:			case omp::ClauseScheduleKind::Auto:
	schedType = llvm::omp::OMPScheduleType::Auto;			schedType = llvm::omp::OMPScheduleType::Auto;
	break;			break;
	case omp::ClauseScheduleKind::Runtime:			case omp::ClauseScheduleKind::Runtime:
	schedType = llvm::omp::OMPScheduleType::Runtime;			schedType = llvm::omp::OMPScheduleType::Runtime;
	break;			break;
	default:			default:
	llvm_unreachable("Unknown schedule value");			llvm_unreachable("Unknown schedule value");
	break;			break;
	}			}

	afterIP = ompBuilder->createDynamicWorkshareLoop(			ompBuilder->createDynamicWorkshareLoop(ompLoc, loopInfo, allocaIP,
	ompLoc, loopInfo, allocaIP, schedType, !loop.nowait(), chunk);			schedType, !loop.nowait(), chunk);
	}			}

	// Continue building IR after the loop.			// Continue building IR after the loop. Note that the LoopInfo returned by
				// `collapseLoops` points inside the outermost loop and is intended for
				// potential further loop transformations. Use the insertion point stored
				// before collapsing loops instead.
	builder.restoreIP(afterIP);			builder.restoreIP(afterIP);
				MeinersburUnsubmitted Done Reply Inline Actions I think this is wrong. The CanonicalLoopInfo returned by createStaticWorkshareLoop represents the inner loop over chunks, not the contruct as a whole. Don't use the returned CanonicalLoopInfo unless you want to process the loop. The workshare reagion itself ends after `kmpc_for_static_fini` which is not part of the inner canonical loop. Meinersbur: I think this is wrong. The CanonicalLoopInfo returned by createStaticWorkshareLoop represents…
				ftynseAuthorUnsubmitted Done Reply Inline Actions If you meant "returned by collapseLoops" instead of "returned by createStaticWorkshareLoop", then it was indeed wrong. `createStaticWorkshareLoop` returns a mutated canonical loop info that is perfectly fine to use here. `collapseLoops` does not, which was highly unintuitive for me, given that virtually all the other builder APIs return an object that contains the insertion point for next operations. ftynse: If you meant "returned by collapseLoops" instead of "returned by createStaticWorkshareLoop"…
				MeinersburUnsubmitted Not Done Reply Inline Actions I meant "returned by createStaticWorkshareLoop" (although I now think it is not the issue here). A canonical loop is a construct that that executes the body a number of times without itself having any side-effects per OpenMP specification(). This is not the case for calls to the runtime library such as `@__kmpc_for_static_fini`. Hence, it should NOT be part of the CanonicalLoopInfo and AfterIP return an InsertionPoint where there is no site-effect between the last execution of the body and whatever follows the loop. Calls that take a `CanonicalLoopInfo` as input are different to the other `Create` Calls in that it does not insert something new at an InsertionPoint, but takes an existing piece that is already part of the control flow and transforms it. Additional code should be appended after the code that the original `Create` inserted, not where the code has been transformed. Ity might have been better if we names those two kinds of `Create` methods difference. Such as `InsertXYZ` and `ApplyXYZ`. () Per specification, if there are side-effects, it is unspecified when they happen or how often. Meinersbur: I meant "returned by createStaticWorkshareLoop" (although I now think it is not the issue here).
				ftynseAuthorUnsubmitted Done Reply Inline Actions Yes, using a different naming scheme and a clear documentation that the previous insertion point should still be used sounds good. ftynse: Yes, using a different naming scheme and a clear documentation that the previous insertion…
				MeinersburUnsubmitted Done Reply Inline Actions Please don't any of the `loopInfos` anymore they have been consumed by `collapseLoops`. With the documentation patch I am adding and invalidation flag that ensures that the `CanonicalLoopInfo` is not used anymore after it does not exist anymore in the IR. Meinersbur: Please don't any of the `loopInfos` anymore they have been consumed by `collapseLoops`. With…
	return success();			return success();
	}			}

	namespace {			namespace {

	/// Implementation of the dialect interface that converts operations belonging			/// Implementation of the dialect interface that converts operations belonging
	/// to the OpenMP dialect to LLVM IR.			/// to the OpenMP dialect to LLVM IR.
	class OpenMPDialectLLVMIRTranslationInterface			class OpenMPDialectLLVMIRTranslationInterface
	▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

mlir/test/Target/LLVMIR/openmp-llvm.mlir

Show First 20 Lines • Show All 461 Lines • ▼ Show 20 Lines	omp.wsloop (%iv) : i64 = (%lb) to (%ub) step (%step) schedule(guided) {
// CHECK: %[[cond:.*]] = icmp ne i32 %[[continue]], 0		// CHECK: %[[cond:.*]] = icmp ne i32 %[[continue]], 0
// CHECK br i1 %[[cond]], label %omp_loop.header{{.}}, label %omp_loop.exit{{.}}		// CHECK br i1 %[[cond]], label %omp_loop.header{{.}}, label %omp_loop.exit{{.}}
llvm.call @body(%iv) : (i64) -> ()		llvm.call @body(%iv) : (i64) -> ()
omp.yield		omp.yield
}		}
llvm.return		llvm.return
}		}

		// -----

// CHECK-LABEL: @omp_critical		// CHECK-LABEL: @omp_critical
llvm.func @omp_critical(%x : !llvm.ptr<i32>, %xval : i32) -> () {		llvm.func @omp_critical(%x : !llvm.ptr<i32>, %xval : i32) -> () {
// CHECK: call void @__kmpc_critical_with_hint({{.}}critical_user_.var{{.}}, i32 0)		// CHECK: call void @__kmpc_critical_with_hint({{.}}critical_user_.var{{.}}, i32 0)
// CHECK: br label %omp.critical.region		// CHECK: br label %omp.critical.region
// CHECK: omp.critical.region		// CHECK: omp.critical.region
omp.critical {		omp.critical {
// CHECK: store		// CHECK: store
llvm.store %xval, %x : !llvm.ptr<i32>		llvm.store %xval, %x : !llvm.ptr<i32>
omp.terminator		omp.terminator
}		}
// CHECK: call void @__kmpc_end_critical({{.}}critical_user_.var{{.}})		// CHECK: call void @__kmpc_end_critical({{.}}critical_user_.var{{.}})

// CHECK: call void @__kmpc_critical_with_hint({{.}}critical_user_mutex.var{{.}}, i32 2)		// CHECK: call void @__kmpc_critical_with_hint({{.}}critical_user_mutex.var{{.}}, i32 2)
// CHECK: br label %omp.critical.region		// CHECK: br label %omp.critical.region
// CHECK: omp.critical.region		// CHECK: omp.critical.region
omp.critical(@mutex) hint(contended) {		omp.critical(@mutex) hint(contended) {
// CHECK: store		// CHECK: store
llvm.store %xval, %x : !llvm.ptr<i32>		llvm.store %xval, %x : !llvm.ptr<i32>
omp.terminator		omp.terminator
}		}
// CHECK: call void @__kmpc_end_critical({{.}}critical_user_mutex.var{{.}})		// CHECK: call void @__kmpc_end_critical({{.}}critical_user_mutex.var{{.}})
		llvm.return
		}

		// -----

		// Check that the loop bounds are emitted in the correct location in case of
		// collapse. This only checks the overall shape of the IR, detailed checking
		// is done by the OpenMPIRBuilder.

		// CHECK-LABEL: @collapse_wsloop
		// CHECK: i32* noalias %[[TIDADDR:[0-9A-Za-z.]*]]
		// CHECK: load i32, i32* %[[TIDADDR]]
		// CHECK: store
		// CHECK: load
		// CHECK: %[[LB0:.*]] = load i32
		// CHECK: %[[UB0:.*]] = load i32
		// CHECK: %[[STEP0:.*]] = load i32
		// CHECK: %[[LB1:.*]] = load i32
		// CHECK: %[[UB1:.*]] = load i32
		// CHECK: %[[STEP1:.*]] = load i32
		// CHECK: %[[LB2:.*]] = load i32
		// CHECK: %[[UB2:.*]] = load i32
		// CHECK: %[[STEP2:.*]] = load i32
		llvm.func @collapse_wsloop(
		%0: i32, %1: i32, %2: i32,
		%3: i32, %4: i32, %5: i32,
		%6: i32, %7: i32, %8: i32,
		%20: !llvm.ptr<i32>) {
		omp.parallel {
		// CHECK: icmp slt i32 %[[LB0]], 0
		// CHECK-COUNT-4: select
		// CHECK: %[[TRIPCOUNT0:.*]] = select
		// CHECK: br label %[[PREHEADER:.*]]
		//
		// CHECK: [[PREHEADER]]:
		// CHECK: icmp slt i32 %[[LB1]], 0
		// CHECK-COUNT-4: select
		// CHECK: %[[TRIPCOUNT1:.*]] = select
		// CHECK: icmp slt i32 %[[LB2]], 0
		// CHECK-COUNT-4: select
		// CHECK: %[[TRIPCOUNT2:.*]] = select
		// CHECK: %[[PROD:.*]] = mul nuw i32 %[[TRIPCOUNT0]], %[[TRIPCOUNT1]]
		// CHECK: %[[TOTAL:.*]] = mul nuw i32 %[[PROD]], %[[TRIPCOUNT2]]
		// CHECK: br label %[[COLLAPSED_PREHEADER:.*]]
		//
		// CHECK: [[COLLAPSED_PREHEADER]]:
		// CHECK: store i32 0, i32*
		// CHECK: %[[TOTAL_SUB_1:.*]] = sub i32 %[[TOTAL]], 1
		// CHECK: store i32 %[[TOTAL_SUB_1]], i32*
		// CHECK: call void @__kmpc_for_static_init_4u
		omp.wsloop (%arg0, %arg1, %arg2) : i32 = (%0, %1, %2) to (%3, %4, %5) step (%6, %7, %8) collapse(3) {
		%31 = llvm.load %20 : !llvm.ptr<i32>
		%32 = llvm.add %31, %arg0 : i32
		%33 = llvm.add %32, %arg1 : i32
		%34 = llvm.add %33, %arg2 : i32
		llvm.store %34, %20 : !llvm.ptr<i32>
		omp.yield
		}
		omp.terminator
		}
llvm.return		llvm.return
}		}

This is an archive of the discontinued LLVM Phabricator instance.

[mlir] support collapsed loops in OpenMP-to-LLVM translationClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 364801

mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp

mlir/test/Target/LLVMIR/openmp-llvm.mlir

[mlir] support collapsed loops in OpenMP-to-LLVM translation
ClosedPublic