This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add translation of omp.wsloop to LLVM IR
ClosedPublic

Authored by ftynse on Nov 24 2020, 12:15 PM.

Details

Summary

Introduce a translation of OpenMP workshare loop construct to LLVM IR. This is
a minimalist version to enable the pipeline and currently only supports static
loop schedule (default in the specification) on non-collapsed loops. Other
features will be added on per-need basis.

Diff Detail

Event Timeline

ftynse created this revision.Nov 24 2020, 12:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2020, 12:15 PM
ftynse requested review of this revision.Nov 24 2020, 12:15 PM
ftynse added inline comments.Nov 24 2020, 12:24 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

@Meinersbur @jdoerfert I need your input here. Currently, OpenMPIRBuilder::createCanonicalLoop assumes that the body builder will only populate one basic block with no control flow (and so does CanonicalLoopInfo as far as I understand). Is there a further plan to support loops with control flow inside? This would be necessary for, e.g., nested loops (which I also introduced above).

It is sufficient for me to fix createCanonicalLoop to call the body builder callback _before_ it inserts the branch to the latch basic block and require the body builder callback to produce an single-entry-single-exit region + leave the insertion point at the end of the last block without terminating it. Not sure it will play nicely with the intended use of CanonicalLoopInfo.

ftynse marked an inline comment as not done.Nov 24 2020, 12:25 PM
jdoerfert added inline comments.Nov 25 2020, 7:14 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

We certainly need to allow loops with multiple blocks but I thought we already do. Can't you just split the block in the callback in case you want to introduce control flow?

body:
  br latch    <- body gen insertion point

will be split to

body:
  br body.end  <- introduced by the body gen, can be removed
body.end:
  br latch

So now you should be able to generate a single-entry single-exit region with body as entry and body.end as exit. Am I missing something?

ftynse added inline comments.Nov 25 2020, 3:00 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

I can split the block and get an SESE region, or just do nothing in the callback and later retarget the branch at the end of the body + introduce branches to latch (or to a new block that itself branches to latch if SESE is necessary). Current code does the latter.

The question is what does the builder and following transformations expect? SESE? Anything that eventually branches to latch?

The intent was to create a OpenMPIRBuilder::createWorksharingLoop that takes a CanonicalLoop as an argument to be workshared. The heavy lifting would be done by createWorksharingLoop such that its logic does not need to be duplicated by clang and the OpenMP dialect.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

There can be control flow in the loop body and OpenMPBuilderTests even tests it (search for SplitBlockAndInsertIfThenElse).

The OpenMP spec mandates that nothing jumps into the loop body or out of it that skips the loop control. The spec calls it a 'region', effectively SESE (which also forbids breaks), but also does not allow exceptions or longjumps into/out of it. Everything should be OK as long as it stays within the body region.

For the CFG, the builder expects that CanonicalLoopInfo::getBody() dominates the loop body code and CanonicalLoopInfo::getLatch() postdominates it, although. assertOK() does not explicitly check for that. 'postdominance' is a bit wishy-washy here because of the existence of statically infinite loops and unreachable terminators.

ftynse added inline comments.Nov 27 2020, 5:03 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

Thanks for the additional context! It would be helpful if this was stated in the documentation.

ftynse updated this revision to Diff 308024.Nov 27 2020, 5:04 AM

Update, add test and make ready for review.

ftynse retitled this revision from WIP: translate omp.wsloop to LLVM IR to [mlir] Add translation of omp.wsloop to LLVM IR.Nov 27 2020, 5:04 AM
ftynse edited the summary of this revision. (Show Details)
chelini added inline comments.Nov 27 2020, 6:10 AM
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
190

Minor: "loop" -> "loops"

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
566

Remove?

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

ftynse marked 2 inline comments as done.Dec 1 2020, 2:16 AM

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

I just don't have cycles for diving into two unfamiliar code bases right now... If somebody can use this to define createWorksharingLoop, I am happy to switch to that instead.

ftynse updated this revision to Diff 308592.Dec 1 2020, 2:17 AM

Address review

jdoerfert added inline comments.Dec 1 2020, 7:37 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532

Thanks for the additional context! It would be helpful if this was stated in the documentation.

Patches welcome :P

(Honestly, I agree with you. Maybe @Meinersbur will add some documentation to the API and CanonicalLoopInfo ;) ?)

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

I just don't have cycles for diving into two unfamiliar code bases right now... If somebody can use this to define createWorksharingLoop, I am happy to switch to that instead.

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

+1

ftynse added a comment.Dec 2 2020, 4:01 AM

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

+1

Okay, okay, you can have https://reviews.llvm.org/D92476. It's not exactly the same code, and as I expected I had to write most again.

ftynse updated this revision to Diff 310212.Dec 8 2020, 7:45 AM

Rebase on new OpenMPIRBuilder

ftynse edited the summary of this revision. (Show Details)Dec 8 2020, 7:46 AM
ftynse marked an inline comment as done.

PTAL

OMPBuilder part looks good to me, but probably an MLIR person should accept this patch.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Just to clarify the semantics: InclusiveStop=true models a for (int i = lowerBound; i <= upperBound; i+=step) loop (i.e. range includes the upper bound). This matches the semantics of wsloop?

ftynse updated this revision to Diff 310349.Dec 8 2020, 2:29 PM

Fix off-by-one error in loop bounds

ftynse added inline comments.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Nice catch! I actually need non-inclusive stop here, mis-merged with the previous version that was computing the bound here.

Meinersbur added inline comments.Dec 8 2020, 6:12 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

I looked up OpenMPOps.td and it doesn't document the semantics of lowerBound, upperBound, step. E.g. what happens when step is negative? Does it count down starting from upperBound? I chose to name the parameters createCanonicalLoop start and stop to avoid this ambiguity.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Yes, I have not defined these bounds and step. I was kind of postponing that decision.

As you know, the OpenMP standard does not specify whether the upperbound is included or excluded. The standard applies the workshare directive to the associated loop. In Fortran this would mean the end index is included. In C/C++ all relational operators are allowed, so depending on the relational operator used it can include or not include the end index. @ftynse has used this as a target for scf.parallel and scf.parallel does not include the upperbound.

What is finally visible to the user is the pretty-print syntax and it will look like the following (from https://reviews.llvm.org/D92327).

omp.wsloop (%iv) = (%lb) to (%ub) step (%step)  {
    omp.yield
}

If step is negative then it will count down from lb to ub.

ftynse added inline comments.Dec 9 2020, 2:17 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Yes, I assumed same syntax would mean same semantics, i.e. positive step non-inclusive upper bound. If we want negative step, we should really consider having a different syntax for it and expressing it as a unit/bool attribute. Otherwise, we may not know at compile time which comparison operator to emit. Say (%lb) downto (%ub) step. Same for including/excluding the stop condition. MLIR's (implicit) convention is that A to B includes A but excludes B. We can add an optional keyword and get something like A to B inclusive and map it to a unit attribute.

Meinersbur added inline comments.Dec 9 2020, 3:25 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

@kiranchandramohan
The OpenMP standard does not specify this because it is already specified by the host language. For C/C++ as well as Fortran must generate ws.loop that match its semantics. We should not define ub is inclusive in one case and exclusive in the other.

We can either:

  1. Define exclusive upperBound: Fortran front-ends must generate omp.wsloop (%iv) = (%lb) to (%last+1) step (%step)
  2. Define inclusive upperBound: C/C++ front-ends must generate omp.wsloop (%iv) = (%lb) to (%end-1) step (%step) (when the relational operation is < or >).

From @ftynse's first answer is seems to be the first (which is idiomatic C/C++).

Requiring (%lb) downto (%ub) step for counting-down loops is problematic when step is not known at compile-time, e.g. for (int i = 0; i != n; i+=stepsize) (OpenMP allows stepsize to be either 1 or -1 dynamic at runtime). C/C++ front-ends emitting MLIR had to emit constructions such as

if (stepsize > 0) {
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize)  { ... }
} else {
  omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize)  { ... }
}
Meinersbur added inline comments.Dec 9 2020, 4:36 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

However, I expect loop with non-constant step size to be rare, maybe the passes not requiring to handle this case is worth the versioning. On the other side CanonicalLoopInfo entirely operates on logical iteration counters, no such separation is necessary.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Thanks @Meinersbur, @ftynse for the discussion. I agree that we cannot have different semantics.

Just to complete the discussion, the fir.do_loop has the loop with upper-bounds included. This is as per the language standard.
https://github.com/llvm/llvm-project/blob/a0539298540e49cb734c7b82f93572ab46bf9b00/flang/include/flang/Optimizer/Dialect/FIROps.td#L1905

So finally when we use in Flang there will be some inclusive loops (FIR) and some exclusive loops (OpenMP).

For a general OpenMP loop in fortran, there are four candidates at the MLIR level. Note: here we do not know whether the loop is up or down-counting.

!$omp do
do i=start,end,incr
...
end do
!$omp end do
  1. use attributes for down-counting and inclusive loops.
%lb = %start
%stepsize = %incr
if (stepsize > 0) {
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) inclusive { ... }
} else {
  omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) inclusive { ... }
}
  1. use exclusive end bounds and an attribute for down-counting.
%lb = %start
%stepsize = %incr
if (%stepsize > 0) {
  %ub = %end + 1
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize)  { ... }
} else {
   %ub = %end - 1
  omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize)  { ... }
}
  1. use exclusive end bounds and differ from MLIR convention to have both down and up-counting with "to".
%lb = %start
%stepsize = %incr
if (%stepsize > 0) {
  %ub = %end + 1
} else {
   %ub = %end - 1
}
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize)  { ... }
  1. use inclusive end bounds and differ from MLIR convention to have both down and up-counting with "to".
%lb = %start
%stepsize = %incr
%ub = %end
omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize)  { ... }

What should we prefer?
Is (3) and (4) not recommended because we need to know the direction of counting at the MLIR layer?

ftynse added inline comments.Dec 10 2020, 2:38 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

I'm quite open to revising the convention and allowing for negative step in scf.for (affine loops are a separate story, they have a constant step anyway). I would just prefer to avoid implicit assumptions stemming from similar syntax. We can combine that with the inclusive attribute, and give all this information to OpenMPIRBuilder or, if we want to deparallelize wsloops to scf, emit the corresponding select to comply with its exlusive-upper-bound semantics.

Meinersbur added inline comments.Dec 10 2020, 10:31 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

As discussed in today's "OpenMP work for llvm-project/flang" call, there is also the possibility to redesign omp.wsloop to only take the loop's trip count with the resulting induction variable being canonical (start at 0 with step size 1). The calculation of how to compute the trip count, and deriving the loop counter value (i.e. the value between lb and ub) from the canonical induction variable (multiply by step and add ub) is done by the frontend according to the language's semantics. In case of C/C++ the front-end has to do this anyway in some cases, such as with for-loops on iterators and range-based for-loops.

In my last comment I did not consider that this is already possible with the current design, just use omp.wsloop (%iv) = (0) to (%tripcount) step (1). That is, code duplication would not be necessary. However, if front-ends need to implement the tripcount computation anyway, we can also simplify the wsloop operation to ONLY take the tripcount. This simplifies anything processing wsloop (optimizer, codegen), e.g. they don't have to consider cases such as lb larger than ub, downcounting loops and non-constant stepsize, etc. Are there concerns of breaking the current omp.wsloop syntax?

This is basically the design of OpenMPIRBuilder::createCanonicalLoop. The main version only takes the trip count. The overloads taking lower/upper bound and stepsize is only a convenience wrapper that computes the tripcount and wraps the BodyGen callback that derives the scaled/shiften loop counter from the canonical induction variable.

kiranchandramohan added inline comments.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Are there concerns of breaking the current omp.wsloop syntax?

I don't think this will break the omp.wsloop syntax. This will involve adding more constraints to the definition of the operation or the verifier. The concerns/questions are,

  1. Whether there is enough support in the flang frontend or lowering to do this normalisation transformation (i.e the conversion to up-counting loops with step 1 and iterating from 0 to tripcount).
  2. Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR?
  3. Whether this normalisation transformation will have or will be be impacted by the presence of clauses. I have not thought through this.

Does @schweitz or @clementval or @SouraVX have comments on 1, 2 ?

Meinersbur added inline comments.Dec 10 2020, 9:26 PM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

I was thinking about changing the syntax of omp.wsloop to:

omp.wsloop (%iv) = (%tripcount) { ... }

This would make any previous code defining %lb and %step invalid.

With normalization, we still have to agree on semantics compatible with all possible base languages, and which still would be incomplete because of range-based for-loops and iterators. Additionally, passes assuming normalization happened still need to check for it (an assert(->isNormalizes()) might be sufficient), and introduces a phase-ordering requirement.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Yes, having just the tripcount would make it invalid. I guess the tripcount only proposal also would mean that the frontend has to collapse the loops (if the collapse clause is present) as well and give only a single loop. Previously the plan was to leave the task of collapsing loops to OpenMPIRBuilder/MLIR.

With normalization, we still have to agree on semantics compatible with all possible base languages, and which still would be incomplete because of range-based for-loops and iterators.

This is correct. But this would leave lot of work for the frontends to do. Clang anway has the ability to convert both iterators and other kind of loops to normalized loops but we will have to code all this normalisation in the flang frontend. Other users (like scf.parallel) will also have to perform normalisation while converting to the omp.wsloop operation.

Additionally, passes assuming normalization happened still need to check for it (an assert(->isNormalizes()) might be sufficient), and introduces a phase-ordering requirement.

This is correct. But at the moment no such passes are planned for worksharing loop. So a possible place to do the normalisation is during conversion to LLVM dialect.

schweitz added inline comments.Dec 11 2020, 11:21 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

Are there concerns of breaking the current omp.wsloop syntax?

I don't think this will break the omp.wsloop syntax. This will involve adding more constraints to the definition of the operation or the verifier. The concerns/questions are,

  1. Whether there is enough support in the flang frontend or lowering to do this normalisation transformation (i.e the conversion to up-counting loops with step 1 and iterating from 0 to tripcount).
  2. Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR?
  3. Whether this normalisation transformation will have or will be be impacted by the presence of clauses. I have not thought through this.

Does @schweitz or @clementval or @SouraVX have comments on 1, 2 ?

The FIR loop ops have a semantics that is a middle ground between Fortran's DO construct (which has multiple different semantics) and a SESE counted loop. The plan is to perform loop transformations (including normalization) at the (F/ML)IR level rather than on the syntax parse tree. Some preliminary work investigating transforming FIR loops to the affine dialect, for example, have happened with success.

ftynse added inline comments.Dec 15 2020, 4:54 AM
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

We already have a "legalize-for-export" on the LLVM dialect that makes it compatible with the translator / LLVM IR since the dialect is slightly more expressive; we can also legalize OpenMP constructs in a similar way.

I may concerned by code duplication if the frontend has to emit loop bound "normalization", we will have roughly the same in FIR, in MLIR SCF and inside OpenMPIRBuilder. If the logic is sufficiently different, it may be fine though.

Some preliminary work investigating transforming FIR loops to the affine dialect, for example, have happened with success.

Affine only supports compile-time constant, positive steps :)

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565

We had a discussion today and there was consensus to go ahead with the worksharing loop with start, end and step.
(@Meinersbur notes that this will not be able to handle all the C++ variants, since OpenMP allows worksharing loop pragmas to be attached to loops with iterators. And for this case the frontend will have to do some work.)

Normalization of the worksharing loop can be handled in the OpenMPIRBuilder.

LGTM.

I think the following three points can be dealt in other patches.
-> Attribute for inclusive upperbound/stop. If at some point in the future we decide to do some transformation/normalization then would this require initially creating a wsloop operation with inclusive attributes and bounds and then transforming this to a new wsloop operation without the attribute and normalized bounds and step? And is such a transformation OK with the MLIR flow?
-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).

This revision is now accepted and ready to land.Dec 23 2020, 1:08 AM

LGTM.

I think the following three points can be dealt in other patches.
-> Attribute for inclusive upperbound/stop. If at some point in the future we decide to do some transformation/normalization then would this require initially creating a wsloop operation with inclusive attributes and bounds and then transforming this to a new wsloop operation without the attribute and normalized bounds and step? And is such a transformation OK with the MLIR flow?

This sounds good wrt MLIR flow. We can have the translator complain about non-normalized loops and/or have normalization as part of -legalize-llvm-for-export.

-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).

Will add a TODO in this diff to reconsider the code once other parts change.

This revision was automatically updated to reflect the committed changes.