Introduce a translation of the OpenMP workshare loop construct to LLVM IR. This is
a minimalist version to enable the pipeline; it currently only supports the static
loop schedule (the default in the specification) on non-collapsed loops. Other
features will be added on a per-need basis.
Event Timeline
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
@Meinersbur @jdoerfert I need your input here. Currently, OpenMPIRBuilder::createCanonicalLoop assumes that the body builder will only populate one basic block with no control flow (and so does CanonicalLoopInfo, as far as I understand). Is there a further plan to support loops with control flow inside? This would be necessary for, e.g., nested loops (which I also introduced above). It would be sufficient for me to fix createCanonicalLoop to call the body builder callback _before_ it inserts the branch to the latch basic block, and to require the body builder callback to produce a single-entry-single-exit region and leave the insertion point at the end of the last block without terminating it. I am not sure that would play nicely with the intended use of CanonicalLoopInfo.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
We certainly need to allow loops with multiple blocks, but I thought we already do. Can't you just split the block in the callback in case you want to introduce control flow?

  body:
    br latch        <- body gen insertion point

will be split to

  body:
    br body.end     <- introduced by the body gen, can be removed
  body.end:
    br latch

So now you should be able to generate a single-entry single-exit region with body as entry and body.end as exit. Am I missing something?
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
I can split the block and get an SESE region, or just do nothing in the callback and later retarget the branch at the end of the body and introduce branches to the latch (or to a new block that itself branches to the latch, if SESE is necessary). The current code does the latter. The question is: what do the builder and the following transformations expect? SESE? Anything that eventually branches to the latch?
The intent was to create an OpenMPIRBuilder::createWorksharingLoop that takes a CanonicalLoop as the argument to be workshared. The heavy lifting would be done by createWorksharingLoop so that its logic does not need to be duplicated by clang and the OpenMP dialect.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
There can be control flow in the loop body, and OpenMPBuilderTests even tests it (search for SplitBlockAndInsertIfThenElse). The OpenMP spec mandates that nothing jumps into the loop body, or out of it in a way that skips the loop control. The spec calls it a 'region', effectively SESE (which also forbids breaks), but it also does not allow exceptions or longjmps into/out of it. Everything should be OK as long as it stays within the body region. For the CFG, the builder expects that CanonicalLoopInfo::getBody() dominates the loop body code and CanonicalLoopInfo::getLatch() postdominates it, although assertOK() does not explicitly check for that. 'Postdominance' is a bit wishy-washy here because of the existence of statically infinite loops and unreachable terminators.
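For illustration, here is a minimal sketch (not the actual unit-test code) of a body-gen callback that introduces control flow by splitting the body block, along the lines of the SplitBlockAndInsertIfThenElse usage mentioned above. The callback shape, the condition, and all names are assumptions made only for this sketch, which presumes the insertion point sits before the body block's branch to the latch:

```cpp
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;

// Assumed shape of a body-gen callback: it receives an insertion point inside
// the body block (before its branch to the latch) and the canonical induction
// variable.
static void emitBodyWithControlFlow(OpenMPIRBuilder::InsertPointTy CodeGenIP,
                                    Value *IV) {
  IRBuilder<> Builder(CodeGenIP.getBlock(), CodeGenIP.getPoint());

  // Some condition on the induction variable (placeholder).
  Value *Zero = ConstantInt::get(IV->getType(), 0);
  Value *IsFirst = Builder.CreateICmpEQ(IV, Zero, "is.first");

  // Split the body at the insertion point into an if/then/else diamond. Both
  // arms rejoin in the tail block, which still falls through to the latch, so
  // the body remains a single-entry single-exit region: getBody() dominates
  // it and getLatch() postdominates it.
  Instruction *ThenTerm = nullptr, *ElseTerm = nullptr;
  SplitBlockAndInsertIfThenElse(IsFirst, &*Builder.GetInsertPoint(), &ThenTerm,
                                &ElseTerm);

  // ... emit per-arm code before ThenTerm and ElseTerm ...
}
```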
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
Thanks for the additional context! It would be helpful if this were stated in the documentation.
So why not create an OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?
I just don't have cycles for diving into two unfamiliar code bases right now... If somebody can use this to define createWorksharingLoop, I am happy to switch to that instead.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:531
Patches welcome :P (Honestly, I agree with you. Maybe @Meinersbur will add some documentation to the API and CanonicalLoopInfo ;) ?)
FWIW, you don't have to actually use it from Clang, but you should expose a reasonable API in the OpenMPIRBuilder, which might change once we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp when we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development, as I fear it creates more work and divides communities.
Okay, okay, you can have https://reviews.llvm.org/D92476. It's not exactly the same code, and as I expected, I had to write most of it again.
The OMPBuilder part looks good to me, but an MLIR person should probably accept this patch.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Just to clarify the semantics: InclusiveStop=true models a for (int i = lowerBound; i <= upperBound; i += step) loop (i.e. the range includes the upper bound). Does this match the semantics of wsloop?
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Nice catch! I actually need a non-inclusive stop here; this was mis-merged with the previous version, which computed the bound here.
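For reference, the two stop conventions differ by exactly one iteration for a positive step; a plain C/C++ illustration (variable names are placeholders, not tied to this patch):

```cpp
void stopConventions(int lb, int ub, int step) {
  // Inclusive stop: ub itself is iterated (when reachable in multiples of step).
  for (int i = lb; i <= ub; i += step) { /* ... */ }
  // Exclusive (non-inclusive) stop: the loop stops before reaching ub.
  for (int i = lb; i < ub; i += step) { /* ... */ }
}
```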
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
I looked up OpenMPOps.td and it doesn't document the semantics of lowerBound, upperBound, and step. E.g. what happens when step is negative? Does it count down starting from upperBound? I chose to name the parameters of createCanonicalLoop start and stop to avoid this ambiguity.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Yes, I have not defined these bounds and the step; I was kind of postponing that decision. As you know, the OpenMP standard does not specify whether the upper bound is included or excluded. The standard applies the workshare directive to the associated loop. In Fortran this would mean the end index is included. In C/C++ all relational operators are allowed, so depending on the relational operator used it can include or not include the end index. @ftynse has used this as a target for scf.parallel, and scf.parallel does not include the upper bound. What is finally visible to the user is the pretty-printed syntax, which will look like the following (from https://reviews.llvm.org/D92327):

  omp.wsloop (%iv) = (%lb) to (%ub) step (%step) {
    omp.yield
  }

If step is negative then it will count down from lb to ub.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Yes, I assumed the same syntax would mean the same semantics, i.e. a positive step and a non-inclusive upper bound. If we want a negative step, we should really consider having a different syntax for it and expressing it as a unit/bool attribute; otherwise, we may not know at compile time which comparison operator to emit. Say (%lb) downto (%ub) step. Same for including/excluding the stop condition: MLIR's (implicit) convention is that A to B includes A but excludes B. We can add an optional keyword, get something like A to B inclusive, and map it to a unit attribute.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
@kiranchandramohan We can either:

From @ftynse's first answer it seems to be the first (which is idiomatic C/C++). Requiring (%lb) downto (%ub) step for counting-down loops is problematic when the step is not known at compile time, e.g. for (int i = 0; i != n; i += stepsize) (OpenMP allows stepsize to be either 1 or -1 dynamically at runtime). C/C++ front-ends emitting MLIR would have to emit constructions such as:

  if (%stepsize > 0) {
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }
  } else {
    omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) { ... }
  }
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
However, I expect loops with a non-constant step size to be rare; maybe not requiring passes to handle this case is worth the versioning. On the other side, CanonicalLoopInfo operates entirely on logical iteration counters, so no such separation is necessary there.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Thanks @Meinersbur, @ftynse for the discussion. I agree that we cannot have different semantics. Just to complete the discussion, fir.do_loop has the upper bound included; this is as per the language standard. So finally, when we use this in Flang, there will be some inclusive loops (FIR) and some exclusive loops (OpenMP). For a general OpenMP loop in Fortran, there are four candidates at the MLIR level. Note: here we do not know whether the loop is up- or down-counting.

  !$omp do
  do i = start, end, incr
    ...
  end do
  !$omp end do

1)

  %lb = %start
  %stepsize = %incr
  if (%stepsize > 0) {
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) inclusive { ... }
  } else {
    omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) inclusive { ... }
  }

2)

  %lb = %start
  %stepsize = %incr
  if (%stepsize > 0) {
    %ub = %end + 1
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }
  } else {
    %ub = %end - 1
    omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) { ... }
  }

3)

  %lb = %start
  %stepsize = %incr
  if (%stepsize > 0) {
    %ub = %end + 1
  } else {
    %ub = %end - 1
  }
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }

4)

  %lb = %start
  %stepsize = %incr
  %ub = %end
  omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }

What should we prefer?
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
I'm quite open to revising the convention and allowing a negative step in scf.for (affine loops are a separate story, they have a constant step anyway). I would just prefer to avoid implicit assumptions stemming from similar syntax. We can combine that with the inclusive attribute and give all this information to the OpenMPIRBuilder, or, if we want to deparallelize wsloops to scf, emit the corresponding select to comply with its exclusive-upper-bound semantics.
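A rough sketch of the select mentioned above (names are placeholders, not an existing API): with an inclusive upper bound and a step whose sign is only known at runtime, the exclusive bound lies one step-direction past it, matching the %end + 1 / %end - 1 adjustments in candidates 2 and 3 above.

```cpp
// Sketch: derive an exclusive upper bound from an inclusive one when the
// sign of the step is only known at runtime.
int exclusiveUpperBound(int inclusiveUb, int step) {
  return step > 0 ? inclusiveUb + 1 : inclusiveUb - 1;
}
```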
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
As discussed in today's "OpenMP work for llvm-project/flang" call, there is also the possibility to redesign omp.wsloop to only take the loop's trip count, with the resulting induction variable being canonical (starting at 0 with a step size of 1). The calculation of the trip count, and the derivation of the loop counter value (i.e. the value between lb and ub) from the canonical induction variable (multiply by step and add lb), would be done by the frontend according to the language's semantics. In the case of C/C++ the front-end has to do this anyway in some cases, such as for-loops over iterators and range-based for-loops.

In my last comment I did not consider that this is already possible with the current design: just use omp.wsloop (%iv) = (0) to (%tripcount) step (1). That is, code duplication would not be necessary. However, if front-ends need to implement the trip count computation anyway, we can also simplify the wsloop operation to ONLY take the trip count. This simplifies anything processing wsloop (optimizer, codegen), e.g. they don't have to consider cases such as lb larger than ub, down-counting loops, non-constant step sizes, etc. Are there concerns about breaking the current omp.wsloop syntax?

This is basically the design of OpenMPIRBuilder::createCanonicalLoop. The main version only takes the trip count. The overload taking lower/upper bounds and a step size is only a convenience wrapper that computes the trip count and wraps the BodyGen callback to derive the scaled/shifted loop counter from the canonical induction variable.
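A small sketch of the computation a frontend would own under the tripcount-only proposal, assuming plain integers, a positive step, and an exclusive upper bound (the helper names are made up for illustration):

```cpp
// Trip count of `for (i = lb; i < ub; i += step)` with step > 0.
// For lb >= ub the loop runs zero times.
unsigned tripCount(int lb, int ub, int step) {
  if (lb >= ub)
    return 0;
  return (unsigned)(ub - lb + step - 1) / (unsigned)step; // ceiling division
}

// Recovering the user-visible loop counter from the canonical induction
// variable iv (0, 1, ..., tripCount-1): counter = lb + iv * step.
int loopCounter(int lb, int step, unsigned iv) { return lb + (int)iv * step; }
```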
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
I don't think this will break the omp.wsloop syntax; it will involve adding more constraints to the definition of the operation or to the verifier. The concerns/questions are:

Do @schweitz or @clementval or @SouraVX have comments on 1 and 2?
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
I was thinking about changing the syntax of omp.wsloop to:

  omp.wsloop (%iv) = (%tripcount) { ... }

This would make any previous code defining %lb and %step invalid. With normalization, we still have to agree on semantics compatible with all possible base languages, and those would still be incomplete because of range-based for-loops and iterators. Additionally, passes assuming that normalization has happened still need to check for it (an assert on isNormalized() might be sufficient), and it introduces a phase-ordering requirement.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
Yes, having just the trip count would make it invalid. I guess the tripcount-only proposal would also mean that the frontend has to collapse the loops (if the collapse clause is present) as well and provide only a single loop. Previously the plan was to leave the task of collapsing loops to the OpenMPIRBuilder/MLIR.

This is correct. But this would leave a lot of work for the frontends to do. Clang anyway has the ability to convert both iterator-based and other kinds of loops to normalized loops, but we would have to code all this normalisation in the flang frontend. Other users (like scf.parallel) would also have to perform normalisation while converting to the omp.wsloop operation.

This is correct. But at the moment no such passes are planned for the worksharing loop. So a possible place to do the normalisation is during conversion to the LLVM dialect.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
The FIR loop ops have semantics that are a middle ground between Fortran's DO construct (which has multiple different semantics) and a SESE counted loop. The plan is to perform loop transformations (including normalization) at the (F/ML)IR level rather than on the syntax parse tree. Some preliminary work investigating the transformation of FIR loops to the affine dialect, for example, has happened with success.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
We already have a "legalize-for-export" pass on the LLVM dialect that makes it compatible with the translator / LLVM IR, since the dialect is slightly more expressive; we can also legalize OpenMP constructs in a similar way. I may be concerned about code duplication if the frontend has to emit loop bound "normalization": we will have roughly the same logic in FIR, in MLIR SCF, and inside the OpenMPIRBuilder. If the logic is sufficiently different, it may be fine though.

Affine only supports compile-time constant, positive steps :)
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp:564
We had a discussion today and there was consensus to go ahead with the worksharing loop with start, end and step. Normalization of the worksharing loop can be handled in the OpenMPIRBuilder.
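A sketch of what that normalization inside the OpenMPIRBuilder could look like for a step of either sign, assuming an exclusive stop (this is an assumption about an eventual implementation, not existing code):

```cpp
// Trip count for a loop from start towards stop (exclusive) with a nonzero
// step of either sign; zero iterations if the bounds are already crossed.
unsigned normalizedTripCount(int start, int stop, int step) {
  if (step > 0)
    return start < stop ? (unsigned)(stop - start + step - 1) / (unsigned)step
                        : 0;
  return start > stop ? (unsigned)(start - stop - step - 1) / (unsigned)(-step)
                      : 0;
}
```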
LGTM.
I think the following three points can be dealt with in other patches.
-> Attribute for an inclusive upper bound/stop. If at some point in the future we decide to do some transformation/normalization, would this require initially creating a wsloop operation with the inclusive attribute and bounds, and then transforming it into a new wsloop operation without the attribute and with normalized bounds and step? And is such a transformation OK with the MLIR flow?
-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).
This sounds good wrt the MLIR flow. We can have the translator complain about non-normalized loops and/or perform normalization as part of -legalize-llvm-for-export.
-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).
Will add a TODO in this diff to reconsider the code once other parts change.
Minor: "loop" -> "loops"