This is an archive of the discontinued LLVM Phabricator instance.

Differential D92055

[mlir] Add translation of omp.wsloop to LLVM IR
ClosedPublic

Authored by ftynse on Nov 24 2020, 12:15 PM.

Details

Reviewers

Meinersbur
jdoerfert
kiranchandramohan
SouraVX
kiranktp
wsmoses
chelini

Commits

rG32a884c9c52c: [mlir] Add translation of omp.wsloop to LLVM IR

Summary

Introduce a translation of OpenMP workshare loop construct to LLVM IR. This is
a minimalist version to enable the pipeline and currently only supports static
loop schedule (default in the specification) on non-collapsed loops. Other
features will be added on per-need basis.
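
As background for the static schedule mentioned above, the following is a minimal sketch of how a static schedule might divide a loop's iterations into one contiguous block per thread. The name `staticChunk` is hypothetical, and the even split with the remainder spread over the first threads is an assumption made for illustration; the partition actually computed by the OpenMP runtime is not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of this patch or of OpenMPIRBuilder).
// Returns the half-open range [begin, end) of logical iterations assigned to
// `threadId` when `tripCount` iterations are split into contiguous blocks.
// Assumption: the first `tripCount % numThreads` threads get one extra
// iteration each.
std::pair<uint64_t, uint64_t> staticChunk(uint64_t tripCount,
                                          uint64_t numThreads,
                                          uint64_t threadId) {
  if (numThreads == 0 || threadId >= numThreads)
    return {0, 0}; // Invalid request: empty range.
  uint64_t base = tripCount / numThreads;
  uint64_t extra = tripCount % numThreads;
  uint64_t begin = threadId * base + std::min(threadId, extra);
  uint64_t size = base + (threadId < extra ? 1 : 0);
  return {begin, begin + size};
}
```

With 10 iterations and 4 threads, this sketch assigns [0, 3), [3, 6), [6, 8) and [8, 10).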

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Nov 24 2020, 12:15 PM

Herald added a project: Restricted Project. Nov 24 2020, 12:15 PM

Herald added subscribers: nimiwio, teijeong, rdzhabarov and 15 others.

ftynse requested review of this revision. Nov 24 2020, 12:15 PM

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache. Nov 24 2020, 12:15 PM

ftynse mentioned this in D91982: [mlir] Add conversion from SCF parallel loops to OpenMP. Nov 24 2020, 12:18 PM

ftynse added inline comments. Nov 24 2020, 12:24 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	@Meinersbur @jdoerfert I need your input here. Currently, `OpenMPIRBuilder::createCanonicalLoop` assumes that the body builder will only populate one basic block with no control flow (and so does CanonicalLoopInfo as far as I understand). Is there a further plan to support loops with control flow inside? This would be necessary for, e.g., nested loops (which I also introduced above). It is sufficient for me to fix `createCanonicalLoop` to call the body builder callback _before_ it inserts the branch to the latch basic block and require the body builder callback to produce a single-entry-single-exit region + leave the insertion point at the end of the last block without terminating it. Not sure it will play nicely with the intended use of `CanonicalLoopInfo`.

ftynse marked an inline comment as not done. Nov 24 2020, 12:25 PM

Harbormaster completed remote builds in B80005: Diff 307434. Nov 24 2020, 1:02 PM

jdoerfert added inline comments. Nov 25 2020, 7:14 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	We certainly need to allow loops with multiple blocks but I thought we already do. Can't you just split the block in the callback in case you want to introduce control flow?

    body:
      br latch      <- body gen insertion point

will be split to

    body:
      br body.end   <- introduced by the body gen, can be removed
    body.end:
      br latch

So now you should be able to generate a single-entry single-exit region with body as entry and body.end as exit. Am I missing something?

ftynse added inline comments. Nov 25 2020, 3:00 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	I can split the block and get an SESE region, or just do nothing in the callback and later retarget the branch at the end of the body + introduce branches to latch (or to a new block that itself branches to latch if SESE is necessary). Current code does the latter. The question is: what do the builder and the following transformations expect? SESE? Anything that eventually branches to latch?

The intent was to create an OpenMPIRBuilder::createWorksharingLoop that takes a CanonicalLoop as an argument to be workshared. The heavy lifting would be done by createWorksharingLoop such that its logic does not need to be duplicated by clang and the OpenMP dialect.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	There can be control flow in the loop body and OpenMPBuilderTests even tests it (search for `SplitBlockAndInsertIfThenElse`). The OpenMP spec mandates that nothing jumps into the loop body or out of it that skips the loop control. The spec calls it a 'region', effectively SESE (which also forbids breaks), but also does not allow exceptions or longjumps into/out of it. Everything should be OK as long as it stays within the body region. For the CFG, the builder expects that `CanonicalLoopInfo::getBody()` dominates the loop body code and `CanonicalLoopInfo::getLatch()` postdominates it, although `assertOK()` does not explicitly check for that. 'Postdominance' is a bit wishy-washy here because of the existence of statically infinite loops and unreachable terminators.

ftynse added inline comments. Nov 27 2020, 5:03 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	Thanks for the additional context! It would be helpful if this was stated in the documentation.

Update, add test and make ready for review.

ftynse retitled this revision from "WIP: translate omp.wsloop to LLVM IR" to "[mlir] Add translation of omp.wsloop to LLVM IR". Nov 27 2020, 5:04 AM

ftynse edited the summary of this revision.

Harbormaster completed remote builds in B80336: Diff 308024. Nov 27 2020, 5:22 AM

chelini added inline comments. Nov 27 2020, 6:10 AM

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
190	Minor: "loop" -> "loops"
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
566	Remove?

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

In D92055#2420899, @Meinersbur wrote:

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

I just don't have cycles for diving into two unfamiliar code bases right now... If somebody can use this to define createWorksharingLoop, I am happy to switch to that instead.

Address review

Harbormaster completed remote builds in B80630: Diff 308592. Dec 1 2020, 2:33 AM

SouraVX mentioned this in D87247: [MLIR,OpenMP] Added support for lowering MasterOp to LLVMIR. Dec 1 2020, 7:32 AM

jdoerfert added inline comments. Dec 1 2020, 7:37 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
532	"Thanks for the additional context! It would be helpful if this was stated in the documentation."

Patches welcome :P (Honestly, I agree with you. Maybe @Meinersbur will add some documentation to the API and `CanonicalLoopInfo` ;) ?)

In D92055#2425144, @ftynse wrote:

In D92055#2420899, @Meinersbur wrote:

So why not creating a OpenMPIRBuilder::createWorksharingLoop that can be used by clang as well?

I just don't have cycles for diving into two unfamiliar code bases right now... If somebody can use this to define createWorksharingLoop, I am happy to switch to that instead.

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

In D92055#2425674, @jdoerfert wrote:

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

In D92055#2425785, @mehdi_amini wrote:

In D92055#2425674, @jdoerfert wrote:

FWIW, you don't have to actually use it from Clang but expose a reasonable API in the OpenMPIRBuilder which might change as we use it from Clang. It seems somewhat wasteful to put the code into mlir/lib/Target/LLVMIR/ModuleTranslation.cpp while we know that exactly that code needs to go into the OpenMPIRBuilder. I imagine someone else will pick it up and move it, though I'm not in favor of this kind of stacked development as I fear it creates more work and divides communities.

+1

Okay, okay, you can have https://reviews.llvm.org/D92476. It's not exactly the same code, and as I expected I had to write most again.

Rebase on new OpenMPIRBuilder

PTAL

Harbormaster completed remote builds in B81463: Diff 310212. Dec 8 2020, 7:55 AM

OMPBuilder part looks good to me, but probably an MLIR person should accept this patch.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Just to clarify the semantics: InclusiveStop=true models a `for (int i = lowerBound; i <= upperBound; i+=step)` loop (i.e. the range includes the upper bound). Does this match the semantics of wsloop?
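
To make the off-by-one question concrete, here is a small sketch that enumerates the induction variable values for both readings of the stop condition. The helper `iterationValues` is hypothetical and only mirrors the `for` loop quoted above; it is not the OpenMPIRBuilder API, and it handles positive steps only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: lists the values taken by the induction variable.
// With inclusiveStop == true the loop condition is `i <= upperBound`,
// otherwise it is `i < upperBound`. Non-positive steps are out of scope
// for this sketch and yield an empty result.
std::vector<int64_t> iterationValues(int64_t lowerBound, int64_t upperBound,
                                     int64_t step, bool inclusiveStop) {
  std::vector<int64_t> values;
  if (step <= 0)
    return values;
  for (int64_t i = lowerBound;
       inclusiveStop ? i <= upperBound : i < upperBound; i += step)
    values.push_back(i);
  return values;
}
```

For bounds 0 and 4 with step 1, the inclusive reading visits 0 through 4 (five iterations) while the exclusive reading visits 0 through 3 (four iterations).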

Fix off-by-one error in loop bounds

ftynse added parent revisions: D92849: [OpenMPIRBuilder] Put the barrier in the exit block in createWorkshapeLoop, D92845: [mlir] Explicitly track branch instructions in translation to LLVM IR. Dec 8 2020, 2:29 PM

ftynse added inline comments.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Nice catch! I actually need non-inclusive stop here, mis-merged with the previous version that was computing the bound here.

Harbormaster completed remote builds in B81526: Diff 310349. Dec 8 2020, 2:47 PM

Meinersbur added inline comments. Dec 8 2020, 6:12 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	I looked up OpenMPOps.td and it doesn't document the semantics of lowerBound, upperBound, step. E.g. what happens when step is negative? Does it count down starting from upperBound? I chose to name the parameters of createCanonicalLoop `start` and `stop` to avoid this ambiguity.

kiranchandramohan added inline comments. Dec 9 2020, 2:08 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Yes, I have not defined these bounds and step. I was kind of postponing that decision. As you know, the OpenMP standard does not specify whether the upperbound is included or excluded. The standard applies the workshare directive to the associated loop. In Fortran this would mean the end index is included. In C/C++ all relational operators are allowed, so depending on the relational operator used it can include or not include the end index.

@ftynse has used this as a target for scf.parallel and scf.parallel does not include the upperbound.

What is finally visible to the user is the pretty-print syntax and it will look like the following (from https://reviews.llvm.org/D92327).

    omp.wsloop (%iv) = (%lb) to (%ub) step (%step) {
      omp.yield
    }

If step is negative then it will count down from lb to ub.

ftynse added inline comments. Dec 9 2020, 2:17 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Yes, I assumed same syntax would mean same semantics, i.e. positive step non-inclusive upper bound. If we want negative step, we should really consider having a different syntax for it and expressing it as a unit/bool attribute. Otherwise, we may not know at compile time which comparison operator to emit. Say `(%lb) downto (%ub) step`. Same for including/excluding the stop condition. MLIR's (implicit) convention is that `A to B` includes A but excludes B. We can add an optional keyword and get something like `A to B inclusive` and map it to a unit attribute.

Meinersbur added inline comments. Dec 9 2020, 3:25 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	@kiranchandramohan The OpenMP standard does not specify this because it is already specified by the host language. Front-ends for C/C++ as well as Fortran must generate an `omp.wsloop` that matches the language's semantics. We should not define ub as inclusive in one case and exclusive in the other. We can either:

- Define exclusive upperBound: Fortran front-ends must generate `omp.wsloop (%iv) = (%lb) to (%last+1) step (%step)`
- Define inclusive upperBound: C/C++ front-ends must generate `omp.wsloop (%iv) = (%lb) to (%end-1) step (%step)` (when the relational operation is `<` or `>`).

From @ftynse's first answer it seems to be the first (which is idiomatic C/C++).

Requiring `(%lb) downto (%ub) step` for counting-down loops is problematic when step is not known at compile-time, e.g. `for (int i = 0; i != n; i+=stepsize)` (OpenMP allows `stepsize` to be either 1 or -1, determined dynamically at runtime). C/C++ front-ends emitting MLIR would have to emit constructions such as

    if (stepsize > 0) {
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) {
        ...
      }
    } else {
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) {
        ...
      }
    }

Meinersbur added inline comments. Dec 9 2020, 4:36 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	However, I expect loops with non-constant step size to be rare; maybe sparing the passes from handling this case is worth the versioning. On the other hand, CanonicalLoopInfo operates entirely on logical iteration counters, so no such separation is necessary.

kiranchandramohan added inline comments. Dec 10 2020, 2:15 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Thanks @Meinersbur, @ftynse for the discussion. I agree that we cannot have different semantics.

Just to complete the discussion, the fir.do_loop has the loop with upper-bounds included. This is as per the language standard.
https://github.com/llvm/llvm-project/blob/a0539298540e49cb734c7b82f93572ab46bf9b00/flang/include/flang/Optimizer/Dialect/FIROps.td#L1905

So finally when we use it in Flang there will be some inclusive loops (FIR) and some exclusive loops (OpenMP). For a general OpenMP loop in Fortran, there are four candidates at the MLIR level. Note: here we do not know whether the loop is up or down-counting.

    !$omp do
    do i=start,end,incr
      ...
    end do
    !$omp end do

1. Use attributes for down-counting and inclusive loops.

    %lb = %start
    %stepsize = %incr
    if (stepsize > 0) {
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) inclusive {
        ...
      }
    } else {
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) inclusive {
        ...
      }
    }

2. Use exclusive end bounds and an attribute for down-counting.

    %lb = %start
    %stepsize = %incr
    if (%stepsize > 0) {
      %ub = %end + 1
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) {
        ...
      }
    } else {
      %ub = %end - 1
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) {
        ...
      }
    }

3. Use exclusive end bounds and differ from MLIR convention to have both down and up-counting with "to".

    %lb = %start
    %stepsize = %incr
    if (%stepsize > 0) {
      %ub = %end + 1
    } else {
      %ub = %end - 1
    }
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) {
      ...
    }

4. Use inclusive end bounds and differ from MLIR convention to have both down and up-counting with "to".

    %lb = %start
    %stepsize = %incr
    %ub = %end
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) {
      ...
    }

What should we prefer? Are (3) and (4) not recommended because we need to know the direction of counting at the MLIR layer?
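
The bound adjustment in candidate (3) can be checked with a short sketch. Both helper names are hypothetical, and the loop model (an exclusive `to` whose comparison direction follows the sign of the step) is an assumption taken from that candidate, not settled `omp.wsloop` semantics.

```cpp
#include <cstdint>

// Hypothetical helper: turns an inclusive Fortran-style `end` into the
// exclusive bound of candidate (3). Assumes incr != 0 and no overflow.
int64_t exclusiveBound(int64_t end, int64_t incr) {
  return incr > 0 ? end + 1 : end - 1;
}

// Hypothetical model of `(lb) to (ub) step (step)` with an exclusive bound:
// counts up while i < ub for a positive step and down while i > ub for a
// negative step. A zero step yields zero iterations in this sketch.
int64_t countIterations(int64_t lb, int64_t ub, int64_t step) {
  int64_t count = 0;
  if (step > 0) {
    for (int64_t i = lb; i < ub; i += step)
      ++count;
  } else if (step < 0) {
    for (int64_t i = lb; i > ub; i += step)
      ++count;
  }
  return count;
}
```

For `do i=10,1,-3` the adjusted bound is 0, and the modeled loop visits 10, 7, 4 and 1, the same four iterations as the Fortran loop.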

ftynse added inline comments. Dec 10 2020, 2:38 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	I'm quite open to revising the convention and allowing for negative step in `scf.for` (affine loops are a separate story, they have a constant step anyway). I would just prefer to avoid implicit assumptions stemming from similar syntax. We can combine that with the `inclusive` attribute, and give all this information to OpenMPIRBuilder or, if we want to deparallelize wsloops to scf, emit the corresponding select to comply with its exclusive-upper-bound semantics.

Meinersbur added inline comments. Dec 10 2020, 10:31 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	As discussed in today's "OpenMP work for llvm-project/flang" call, there is also the possibility to redesign `omp.wsloop` to only take the loop's trip count, with the resulting induction variable being canonical (starting at 0 with step size 1). Computing the trip count, and deriving the loop counter value (i.e. the value between lb and ub) from the canonical induction variable (multiply by step and add lb), is done by the frontend according to the language's semantics. In case of C/C++ the front-end has to do this anyway in some cases, such as with for-loops on iterators and range-based for-loops.

In my last comment I did not consider that this is already possible with the current design: just use `omp.wsloop (%iv) = (0) to (%tripcount) step (1)`. That is, code duplication would not be necessary. However, if front-ends need to implement the tripcount computation anyway, we can also simplify the `wsloop` operation to ONLY take the tripcount. This simplifies anything processing `wsloop` (optimizer, codegen), e.g. they don't have to consider cases such as lb larger than ub, downcounting loops and non-constant stepsize, etc. Are there concerns of breaking the current `omp.wsloop` syntax?

This is basically the design of `OpenMPIRBuilder::createCanonicalLoop`. The main version only takes the trip count. The overloads taking lower/upper bound and stepsize are only convenience wrappers that compute the tripcount and wrap the BodyGen callback with code that derives the scaled/shifted loop counter from the canonical induction variable.
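
The normalization described here can be sketched as follows. The function names are hypothetical, and the sketch assumes a half-open range [lb, ub) with a positive step; it illustrates the arithmetic only and is not the code inside `OpenMPIRBuilder::createCanonicalLoop`.

```cpp
#include <cstdint>

// Hypothetical helper: number of iterations of a loop over the half-open
// range [lb, ub) with a positive step, i.e. ceil((ub - lb) / step).
// Returns 0 for empty ranges and for non-positive steps, which this sketch
// does not handle.
uint64_t tripCount(int64_t lb, int64_t ub, int64_t step) {
  if (step <= 0 || ub <= lb)
    return 0;
  uint64_t span = static_cast<uint64_t>(ub) - static_cast<uint64_t>(lb);
  uint64_t ustep = static_cast<uint64_t>(step);
  return span / ustep + (span % ustep != 0 ? 1 : 0);
}

// Hypothetical helper: recovers the user-visible loop counter from the
// canonical induction variable (0, 1, 2, ...) by scaling with the step and
// offsetting from the lower bound.
int64_t counterFromCanonical(uint64_t canonicalIV, int64_t lb, int64_t step) {
  return lb + static_cast<int64_t>(canonicalIV) * step;
}
```

For lb = 0, ub = 10, step = 3 the trip count is 4, and canonical values 0 through 3 map back to the counters 0, 3, 6 and 9.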

kiranchandramohan added subscribers: clementval, schweitz. Dec 10 2020, 3:49 PM

kiranchandramohan added inline comments.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	"Are there concerns of breaking the current omp.wsloop syntax?"

I don't think this will break the omp.wsloop syntax. This will involve adding more constraints to the definition of the operation or the verifier. The concerns/questions are:

1. Whether there is enough support in the flang frontend or lowering to do this normalisation transformation (i.e. the conversion to up-counting loops with step 1 and iterating from 0 to tripcount).
2. Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR?
3. Whether this normalisation transformation will have an impact on, or will be impacted by, the presence of clauses. I have not thought through this.

Does @schweitz or @clementval or @SouraVX have comments on 1, 2?

Meinersbur added inline comments. Dec 10 2020, 9:26 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	I was thinking about changing the syntax of `omp.wsloop` to:

    omp.wsloop (%iv) = (%tripcount) { ... }

This would make any previous code defining %lb and %step invalid.

With normalization, we still have to agree on semantics compatible with all possible base languages, and which still would be incomplete because of range-based for-loops and iterators. Additionally, passes assuming normalization happened still need to check for it (an `assert(->isNormalized())` might be sufficient), and this introduces a phase-ordering requirement.

kiranchandramohan added inline comments. Dec 11 2020, 7:31 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	Yes, having just the tripcount would make it invalid. I guess the tripcount-only proposal also would mean that the frontend has to collapse the loops (if the collapse clause is present) as well and give only a single loop. Previously the plan was to leave the task of collapsing loops to OpenMPIRBuilder/MLIR.

"With normalization, we still have to agree on semantics compatible with all possible base languages, and which still would be incomplete because of range-based for-loops and iterators."

This is correct. But this would leave a lot of work for the frontends to do. Clang anyway has the ability to convert both iterators and other kinds of loops to normalized loops, but we will have to code all this normalisation in the flang frontend. Other users (like scf.parallel) will also have to perform normalisation while converting to the omp.wsloop operation.

"Additionally, passes assuming normalization happened still need to check for it (an assert(->isNormalized()) might be sufficient), and introduces a phase-ordering requirement."

This is correct. But at the moment no such passes are planned for worksharing loop. So a possible place to do the normalisation is during conversion to LLVM dialect.
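
To illustrate what collapsing would demand of a frontend under the tripcount-only proposal, here is a minimal sketch for a two-deep nest of already normalized loops. The helper names are hypothetical, and row-major flattening is an assumption chosen for illustration; it is not taken from Clang, Flang or OpenMPIRBuilder.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: trip count of the single loop that replaces a
// two-deep nest of normalized loops. Overflow is ignored in this sketch.
uint64_t collapsedTripCount(uint64_t outerTrips, uint64_t innerTrips) {
  return outerTrips * innerTrips;
}

// Hypothetical helper: recovers the (outer, inner) canonical indices from the
// flat index of the collapsed loop. Requires innerTrips > 0.
std::pair<uint64_t, uint64_t> expandCollapsedIndex(uint64_t flat,
                                                   uint64_t innerTrips) {
  return {flat / innerTrips, flat % innerTrips};
}
```

A 3 by 4 nest collapses to 12 iterations, and flat index 7 corresponds to outer index 1 and inner index 3.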

schweitz added inline comments. Dec 11 2020, 11:21 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	"'Are there concerns of breaking the current omp.wsloop syntax?' I don't think this will break the omp.wsloop syntax. This will involve adding more constraints to the definition of the operation or the verifier. The concerns/questions are, (1) Whether there is enough support in the flang frontend or lowering to do this normalisation transformation (i.e. the conversion to up-counting loops with step 1 and iterating from 0 to tripcount). (2) Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR? (3) Whether this normalisation transformation will have an impact on, or will be impacted by, the presence of clauses. I have not thought through this. Does @schweitz or @clementval or @SouraVX have comments on 1, 2?"

The FIR loop ops have a semantics that is a middle ground between Fortran's DO construct (which has multiple different semantics) and a SESE counted loop. The plan is to perform loop transformations (including normalization) at the (F/ML)IR level rather than on the syntax parse tree. Some preliminary work investigating transforming FIR loops to the affine dialect, for example, has happened with success.

ftynse added inline comments. Dec 15 2020, 4:54 AM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	We already have a "legalize-for-export" on the LLVM dialect that makes it compatible with the translator / LLVM IR since the dialect is slightly more expressive; we can also legalize OpenMP constructs in a similar way. I am concerned about code duplication if the frontend has to emit loop bound "normalization": we will have roughly the same in FIR, in MLIR SCF and inside OpenMPIRBuilder. If the logic is sufficiently different, it may be fine though.

"Some preliminary work investigating transforming FIR loops to the affine dialect, for example, have happened with success."

Affine only supports compile-time constant, positive steps :)

kiranchandramohan added inline comments. Dec 17 2020, 12:54 PM

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
565	We had a discussion today and there was consensus to go ahead with the worksharing loop with start, end and step. (@Meinersbur notes that this will not be able to handle all the C++ variants, since OpenMP allows worksharing loop pragmas to be attached to loops with iterators. And for this case the frontend will have to do some work.) Normalization of the worksharing loop can be handled in the OpenMPIRBuilder.

LGTM.

I think the following three points can be dealt with in other patches.
-> Attribute for inclusive upperbound/stop. If at some point in the future we decide to do some transformation/normalization then would this require initially creating a wsloop operation with inclusive attributes and bounds and then transforming this to a new wsloop operation without the attribute and normalized bounds and step? And is such a transformation OK with the MLIR flow?
-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).

This revision is now accepted and ready to land. Dec 23 2020, 1:08 AM

In D92055#2469626, @kiranchandramohan wrote:

LGTM.

I think the following three points can be dealt in other patches.
-> Attribute for inclusive upperbound/stop. If at some point in the future we decide to do some transformation/normalization then would this require initially creating a wsloop operation with inclusive attributes and bounds and then transforming this to a new wsloop operation without the attribute and normalized bounds and step? And is such a transformation OK with the MLIR flow?

This sounds good wrt MLIR flow. We can have the translator complain about non-normalized loops and/or have normalization as part of -legalize-llvm-for-export.

-> Additional work in the OpenMPIRBuilder for handling negative step.
-> Changes to SCF for negative step (if required).

Will add a TODO in this diff to reconsider the code once other parts change.

Rebase.

Harbormaster completed remote builds in B83379: Diff 313513. Dec 23 2020, 2:38 AM

Closed by commit rG32a884c9c52c: [mlir] Add translation of omp.wsloop to LLVM IR (authored by ftynse). Dec 23 2020, 2:52 AM

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rG32a884c9c52c: [mlir] Add translation of omp.wsloop to LLVM IR.

Revision Contents

Path                                                   Size
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td          5 lines
mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h    2 lines
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp           178 lines
mlir/test/Target/openmp-llvm.mlir                      41 lines

Diff 308592

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

 def WsLoopOp : OpenMP_Op<"wsloop", [AttrSizedOperandSegments]> {
 [...]
   let builders = [
     OpBuilderDAG<(ins "ValueRange":$lowerBound, "ValueRange":$upperBound,
                       "ValueRange":$step,
                       CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes)>
   ];

   let regions = (region AnyRegion:$region);

+  let extraClassDeclaration = [{
+    /// Returns the number of loops in the workshape loop nest.
+    unsigned getNumLoops() { return lowerBound().size(); }
+  }];
 }

 def YieldOp : OpenMP_Op<"yield", [NoSideEffect, ReturnLike, Terminator,
                                   HasParent<"WsLoopOp">]> {
   let summary = "loop yield and termination operation";
   let description = [{
     "omp.yield" yields SSA values from the OpenMP dialect op region and
     terminates the region. The semantics of how the values are yielded is
 [...]

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h

 [...]
 protected:
   virtual ~ModuleTranslation();

   virtual LogicalResult convertOperation(Operation &op,
                                          llvm::IRBuilder<> &builder);
   virtual LogicalResult convertOmpOperation(Operation &op,
                                             llvm::IRBuilder<> &builder);
   virtual LogicalResult convertOmpParallel(Operation &op,
                                            llvm::IRBuilder<> &builder);
+  virtual LogicalResult convertOmpWsLoop(Operation &opInst,
+                                         llvm::IRBuilder<> &builder);

   /// Converts the type from MLIR LLVM dialect to LLVM.
   llvm::Type *convertType(LLVMType type);

   static std::unique_ptr<llvm::Module>
   prepareLLVMModule(Operation *m, llvm::LLVMContext &llvmContext,
                     StringRef name);

 [...]

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp

 ModuleTranslation::convertOmpParallel(Operation &opInst,
 [...]
   // above.
   llvm::OpenMPIRBuilder::InsertPointTy allocaIP(builder.saveIP());
   builder.restoreIP(
       ompBuilder->createParallel(builder, allocaIP, bodyGenCB, privCB, finiCB,
                                  ifCond, numThreads, pbKind, isCancellable));
   return success();
 }

+/// Returns an LLVM function to call for initializing loop bounds using OpenMP
+/// static scheduling depending on `type`.
+static llvm::FunctionCallee
+getKmpcForStaticInitForType(Type type, llvm::Module &llvmModule,
+                            llvm::OpenMPIRBuilder &ompBuilder) {
+  // So far, all code is being lowered from `index` types for loop induction
+  // variables, which are considered signed.
+  // TODO: model signed/unsigned difference if necessary.
+  unsigned bitwidth = type.cast<LLVMIntegerType>().getBitWidth();
+  if (bitwidth == 32)
+    return ompBuilder.getOrCreateRuntimeFunction(
+        llvmModule,
+        llvm::omp::RuntimeFunction::OMPRTL___kmpc_for_static_init_4);
+  if (bitwidth == 64)
+    return ompBuilder.getOrCreateRuntimeFunction(
+        llvmModule,
+        llvm::omp::RuntimeFunction::OMPRTL___kmpc_for_static_init_8);
+
+  // TODO: we should have a verifier on WsLoopOp for this.
+  llvm_unreachable("unknown OpenMP loop iterator bitwidth");
+}
+
+/// Converts an OpenMP workshare loop into LLVM IR using OpenMPIRBuilder.
+LogicalResult ModuleTranslation::convertOmpWsLoop(Operation &opInst,
+                                                  llvm::IRBuilder<> &builder) {
+  auto loop = cast<omp::WsLoopOp>(opInst);
+  // TODO: this should be in the op verifier instead.
+  if (loop.lowerBound().empty())
+    return failure();
+
+  if (loop.getNumLoops() != 1)
+    return opInst.emitOpError("collapsed loops not yet supported");
+
+  if (loop.schedule_val().hasValue() &&
+      omp::symbolizeClauseScheduleKind(loop.schedule_val().getValue()) !=
+          omp::ClauseScheduleKind::Static)
+    return opInst.emitOpError(
+        "only static (default) loop schedule is currently supported");
+
+  llvm::Function *func = builder.GetInsertBlock()->getParent();
+  llvm::LLVMContext &llvmContext = llvmModule->getContext();
+
+  // Declare the necessary OpenMP runtime functions.
+  llvm::Module &llvmModule =
+      *builder.GetInsertBlock()->getParent()->getParent();
+  llvm::FunctionCallee staticInit = getKmpcForStaticInitForType(
+      loop.step()[0].getType(), llvmModule, *ompBuilder);
+  llvm::FunctionCallee staticFini = ompBuilder->getOrCreateRuntimeFunction(
+      llvmModule, llvm::omp::OMPRTL___kmpc_for_static_fini);
+  llvm::FunctionCallee globalThreadNum = ompBuilder->getOrCreateRuntimeFunction(
+      llvmModule, llvm::omp::OMPRTL___kmpc_global_thread_num);
+
+  // Store loop bounds in a `alloca`ed memory as expected by the upcoming call
+  // to the scheduler that can modify them. Note that the upper bound is
+  // expected to be inclusive by the runtime. Also prepare other arguments to
+  // the runtime call.
+  llvm::Type *i32Type = llvm::Type::getInt32Ty(builder.getContext());
+  llvm::Value *lastIter = builder.CreateAlloca(i32Type, nullptr, "p.lastiter");
+  llvm::Value *step = valueMapping.lookup(loop.step()[0]);
+  llvm::Type *ivType = step->getType();
+  llvm::Value *lowerBound = builder.CreateAlloca(ivType);
+  llvm::Value *upperBound = builder.CreateAlloca(ivType);
+  llvm::Value *stride = builder.CreateAlloca(ivType);
+  llvm::Constant *one = llvm::ConstantInt::get(ivType, 1);
+  llvm::Value *inclusiveUpperBound =
		ftynseAuthorUnsubmitted Done Reply Inline Actions Thanks for the additional context! It would be helpful if this was stated in the documentation. ftynse: Thanks for the additional context! It would be helpful if this was stated in the documentation.
		jdoerfertUnsubmitted Done Reply Inline Actions Thanks for the additional context! It would be helpful if this was stated in the documentation. Patches welcome :P (Honestly, I agree with you. Maybe @Meinersbur will add some documentation to the API and `CanonicalLoopInfo` ;) ?) jdoerfert: > Thanks for the additional context! It would be helpful if this was stated in the…
		builder.CreateSub(valueMapping[loop.upperBound()[0]], one);
		builder.CreateStore(valueMapping[loop.lowerBound()[0]], lowerBound);
		builder.CreateStore(inclusiveUpperBound, upperBound);
		builder.CreateStore(valueMapping[loop.step()[0]], stride);
		llvm::Value *chunk = loop.schedule_chunk_var()
		? valueMapping[loop.schedule_chunk_var()]
		: llvm::ConstantInt::get(ivType, 1);

		// Set up the source location value for OpenMP runtime.
		llvm::DISubprogram *subprogram =
		builder.GetInsertBlock()->getParent()->getSubprogram();
		const llvm::DILocation *diLoc =
		debugTranslation->translateLoc(opInst.getLoc(), subprogram);
		llvm::OpenMPIRBuilder::LocationDescription ompLoc(builder.saveIP(),
		llvm::DebugLoc(diLoc));
		llvm::Constant *srcLocStr = ompBuilder->getOrCreateSrcLocStr(ompLoc);
		llvm::Value *srcLoc = ompBuilder->getOrCreateIdent(srcLocStr);

		// Get the global thread id of the current thread.
		llvm::Value *threadNum = builder.CreateCall(globalThreadNum, {srcLoc});

		// TODO: extract scheduling type and map it to OMP constant. This is currently
		// happening in kmp.h and its ilk and needs to be moved to OpenMP.td first.
		constexpr int kStaticSchedType = 34;
		llvm::Constant *schedulingType =
		llvm::ConstantInt::get(i32Type, kStaticSchedType);

		// Call the runtime function to compute new loop bounds according to the
		// scheduler policy.
		builder.CreateCall(staticInit, {srcLoc, threadNum, schedulingType, lastIter,
		lowerBound, upperBound, stride, step, chunk});
		llvm::Value *lowerBoundVal = builder.CreateLoad(lowerBound);
		llvm::Value *upperBoundVal = builder.CreateLoad(upperBound);
Meinersbur: Just to clarify the semantics: InclusiveStop=true models a `for (int i = lowerBound; i <= upperBound; i+=step)` loop (i.e. the range includes the upper bound). Does this match the semantics of wsloop?
ftynse (author): Nice catch! I actually need a non-inclusive stop here; this was mis-merged with the previous version that was computing the bound here.
Meinersbur: I looked up OpenMPOps.td and it doesn't document the semantics of lowerBound, upperBound and step. E.g. what happens when step is negative? Does it count down starting from upperBound? I chose to name the parameters of `createCanonicalLoop` `start` and `stop` to avoid this ambiguity.
kiranchandramohan: Yes, I have not defined these bounds and step. I was kind of postponing that decision. As you know, the OpenMP standard does not specify whether the upper bound is included or excluded. The standard applies the workshare directive to the associated loop. In Fortran this would mean the end index is included. In C/C++ all relational operators are allowed, so depending on the relational operator used it can include or not include the end index. @ftynse has used this as a target for scf.parallel, and scf.parallel does not include the upper bound. What is finally visible to the user is the pretty-print syntax, and it will look like the following (from https://reviews.llvm.org/D92327).

    omp.wsloop (%iv) = (%lb) to (%ub) step (%step) {
      omp.yield
    }

If step is negative then it will count down from lb to ub.
ftynse (author): Yes, I assumed the same syntax would mean the same semantics, i.e. positive step and non-inclusive upper bound. If we want a negative step, we should really consider having a different syntax for it and expressing it as a unit/bool attribute, say `(%lb) downto (%ub) step`. Otherwise, we may not know at compile time which comparison operator to emit. The same goes for including or excluding the stop condition. MLIR's (implicit) convention is that `A to B` includes A but excludes B. We can add an optional keyword to get something like `A to B inclusive` and map it to a unit attribute.
Meinersbur: @kiranchandramohan The OpenMP standard does not specify this because it is already specified by the host language. Front-ends for C/C++ as well as Fortran must generate `omp.wsloop` operations that match the host language's semantics. We should not define ub as inclusive in one case and exclusive in the other. We can either:

  - Define an exclusive upperBound: Fortran front-ends must generate `omp.wsloop (%iv) = (%lb) to (%last+1) step (%step)`.
  - Define an inclusive upperBound: C/C++ front-ends must generate `omp.wsloop (%iv) = (%lb) to (%end-1) step (%step)` (when the relational operation is `<` or `>`).

From @ftynse's first answer it seems to be the first (which is idiomatic C/C++).

Requiring `(%lb) downto (%ub) step` for counting-down loops is problematic when step is not known at compile time, e.g. `for (int i = 0; i != n; i+=stepsize)` (OpenMP allows `stepsize` to be either 1 or -1, decided dynamically at runtime). C/C++ front-ends emitting MLIR would have to emit constructions such as

    if (stepsize > 0) {
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }
    } else {
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) { ... }
    }
Meinersbur: However, I expect loops with non-constant step size to be rare, so sparing the passes from handling this case may be worth the versioning. On the other hand, `CanonicalLoopInfo` operates entirely on logical iteration counters, so no such separation is necessary there.
kiranchandramohan: Thanks @Meinersbur, @ftynse for the discussion. I agree that we cannot have different semantics.

Just to complete the discussion, fir.do_loop has the upper bound included, as per the language standard:
https://github.com/llvm/llvm-project/blob/a0539298540e49cb734c7b82f93572ab46bf9b00/flang/include/flang/Optimizer/Dialect/FIROps.td#L1905
So finally, when we use this in Flang, there will be some inclusive loops (FIR) and some exclusive loops (OpenMP).

For a general OpenMP loop in Fortran, there are four candidates at the MLIR level. Note: here we do not know whether the loop is up-counting or down-counting.

    !$omp do
    do i=start,end,incr
      ...
    end do
    !$omp end do

1. Use attributes for down-counting and inclusive loops.

    %lb = %start
    %stepsize = %incr
    if (%stepsize > 0) {
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) inclusive { ... }
    } else {
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) inclusive { ... }
    }

2. Use exclusive end bounds and an attribute for down-counting.

    %lb = %start
    %stepsize = %incr
    if (%stepsize > 0) {
      %ub = %end + 1
      omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }
    } else {
      %ub = %end - 1
      omp.wsloop (%iv) = (%lb) downto (%ub) step (%stepsize) { ... }
    }

3. Use exclusive end bounds and differ from the MLIR convention to have both down-counting and up-counting with "to".

    %lb = %start
    %stepsize = %incr
    if (%stepsize > 0) {
      %ub = %end + 1
    } else {
      %ub = %end - 1
    }
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }

4. Use inclusive end bounds and differ from the MLIR convention to have both down-counting and up-counting with "to".

    %lb = %start
    %stepsize = %incr
    %ub = %end
    omp.wsloop (%iv) = (%lb) to (%ub) step (%stepsize) { ... }

What should we prefer? Are (3) and (4) not recommended because we need to know the direction of counting at the MLIR layer?
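The bound adjustment used in candidates (2) and (3) can be illustrated in plain C++. This is only a sketch of the arithmetic discussed in the comment, not code from the patch, MLIR or flang: `exclusiveBound` and `iterate` are hypothetical helper names, and the sketch assumes a non-zero step and no integer overflow.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: turn an inclusive end index (Fortran-style) into an
// exclusive bound, moving it one past the end in the counting direction.
// Assumes step != 0 and that end +/- 1 does not overflow.
std::int64_t exclusiveBound(std::int64_t end, std::int64_t step) {
  return step > 0 ? end + 1 : end - 1;
}

// Hypothetical helper: enumerate the values visited by a counted loop that
// starts at lb and advances by step, with an inclusive or exclusive bound ub.
std::vector<std::int64_t> iterate(std::int64_t lb, std::int64_t ub,
                                  std::int64_t step, bool inclusive) {
  std::vector<std::int64_t> values;
  auto inRange = [&](std::int64_t i) {
    if (step > 0)
      return inclusive ? i <= ub : i < ub;
    return inclusive ? i >= ub : i > ub;
  };
  for (std::int64_t i = lb; inRange(i); i += step)
    values.push_back(i);
  return values;
}
```

Under these assumptions, `iterate(lb, end, step, true)` and `iterate(lb, exclusiveBound(end, step), step, false)` visit the same values for both up-counting and down-counting loops, which is why the adjustment has to branch on the sign of the step when it is not known at compile time.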
ftynse (author): I'm quite open to revising the convention and allowing for a negative step in `scf.for` (affine loops are a separate story; they have a constant step anyway). I would just prefer to avoid implicit assumptions stemming from similar syntax. We can combine that with the `inclusive` attribute and give all this information to OpenMPIRBuilder or, if we want to deparallelize wsloops to scf, emit the corresponding select to comply with its exclusive-upper-bound semantics.
Meinersbur: As discussed in today's "OpenMP work for llvm-project/flang" call, there is also the possibility to redesign `omp.wsloop` to only take the loop's trip count, with the resulting induction variable being canonical (starting at 0 with step size 1). Computing the trip count, and deriving the loop counter value (i.e. the value between lb and ub) from the canonical induction variable (multiply by step and add lb), is done by the front-end according to the language's semantics. In the case of C/C++ the front-end has to do this anyway in some cases, such as for-loops on iterators and range-based for-loops.

In my last comment I did not consider that this is already possible with the current design: just use `omp.wsloop (%iv) = (0) to (%tripcount) step (1)`. That is, code duplication would not be necessary. However, if front-ends need to implement the trip count computation anyway, we can also simplify the `wsloop` operation to ONLY take the trip count. This simplifies anything processing `wsloop` (optimizer, codegen), e.g. they don't have to consider cases such as lb larger than ub, down-counting loops, non-constant step size, etc. Are there concerns of breaking the current `omp.wsloop` syntax?

This is basically the design of `OpenMPIRBuilder::createCanonicalLoop`. The main version only takes the trip count. The overloads taking lower/upper bound and step size are only a convenience wrapper that computes the trip count and wraps the BodyGen callback, which derives the scaled/shifted loop counter from the canonical induction variable.
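The normalization described above can be sketched in plain C++ as follows. This is an illustration of the arithmetic only, not the implementation inside `OpenMPIRBuilder`: `tripCount` and `loopCounter` are hypothetical names, and the sketch assumes a non-zero step and no integer overflow.

```cpp
#include <cstdint>

// Hypothetical helper: number of iterations of a counted loop from lb towards
// ub with the given step. The bound is exclusive unless `inclusive` is set.
// Returns 0 for an empty range. Assumes step != 0 and no overflow.
std::uint64_t tripCount(std::int64_t lb, std::int64_t ub, std::int64_t step,
                        bool inclusive) {
  // Distance to cover and step magnitude, both taken in the counting direction.
  std::int64_t span = step > 0 ? ub - lb : lb - ub;
  std::int64_t incr = step > 0 ? step : -step;
  if (inclusive)
    span += 1;
  if (span <= 0)
    return 0;
  // Round up: a partial final stride still executes one iteration.
  return static_cast<std::uint64_t>((span + incr - 1) / incr);
}

// Hypothetical helper: recover the user-visible loop counter from the
// canonical induction variable iv, which runs 0, 1, ..., tripCount - 1.
std::int64_t loopCounter(std::int64_t lb, std::int64_t step, std::uint64_t iv) {
  return lb + static_cast<std::int64_t>(iv) * step;
}
```

For the `wsloop_simple` test in this revision (lower bound 10, exclusive upper bound 42, step 1), this sketch gives a trip count of 32, and `loopCounter(10, 1, iv)` maps the canonical values 0 through 31 back to 10 through 41.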
kiranchandramohan:
> Are there concerns of breaking the current omp.wsloop syntax?
I don't think this will break the omp.wsloop syntax. It will involve adding more constraints to the definition of the operation or to the verifier.

The concerns/questions are:
1. Whether there is enough support in the flang front-end or lowering to do this normalisation transformation (i.e. the conversion to up-counting loops with step 1, iterating from 0 to tripcount).
2. Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR?
3. Whether this normalisation transformation will affect, or be affected by, the presence of clauses. I have not thought this through.

Do @schweitz, @clementval or @SouraVX have comments on 1 and 2?
Meinersbur: I was thinking about changing the syntax of `omp.wsloop` to:

    omp.wsloop (%iv) = (%tripcount) { ... }

This would make any previous code defining %lb and %step invalid.

With normalization, we still have to agree on semantics compatible with all possible base languages, which would still be incomplete because of range-based for-loops and iterators. Additionally, passes assuming normalization happened still need to check for it (an `assert(->isNormalizes())` might be sufficient), and this introduces a phase-ordering requirement.
kiranchandramohan: Yes, having just the tripcount would make it invalid.

I guess the tripcount-only proposal would also mean that the front-end has to collapse the loops (if the collapse clause is present) and give only a single loop. Previously the plan was to leave the task of collapsing loops to OpenMPIRBuilder/MLIR.

> With normalization, we still have to agree on semantics compatible with all possible base languages, and which still would be incomplete because of range-based for-loops and iterators.
This is correct. But this would leave a lot of work for the front-ends to do. Clang anyway has the ability to convert both iterators and other kinds of loops to normalized loops, but we would have to code all this normalisation in the flang front-end. Other users (like scf.parallel) would also have to perform normalisation while converting to the omp.wsloop operation.

> Additionally, passes assuming normalization happened still need to check for it (an assert(->isNormalizes()) might be sufficient), and introduces a phase-ordering requirement.
This is correct. But at the moment no such passes are planned for the worksharing loop. So a possible place to do the normalisation is during conversion to the LLVM dialect.
schweitz:
> > Are there concerns of breaking the current omp.wsloop syntax?
> I don't think this will break the omp.wsloop syntax. This will involve adding more constraints to the definition of the operation or the verifier.
> The concerns/questions are,
> 1. Whether there is enough support in the flang frontend or lowering to do this normalisation transformation (i.e the conversion to up-counting loops with step 1 and iterating from 0 to tripcount).
> 2. Whether this normalisation transformation is easier and more natural to do at the MLIR layer. And would it be possible to do without introducing another operation in MLIR?
> 3. Whether this normalisation transformation will have or will be impacted by the presence of clauses. I have not thought through this.
> Does @schweitz or @clementval or @SouraVX have comments on 1, 2 ?
The FIR loop ops have semantics that are a middle ground between Fortran's DO construct (which has multiple different semantics) and a SESE counted loop. The plan is to perform loop transformations (including normalization) at the (F/ML)IR level rather than on the syntax parse tree. Some preliminary work investigating transforming FIR loops to the affine dialect, for example, has happened with success.
ftynse (author): We already have a "legalize-for-export" pass on the LLVM dialect that makes it compatible with the translator / LLVM IR, since the dialect is slightly more expressive; we can also legalize OpenMP constructs in a similar way. I may be concerned by code duplication if the front-end has to emit loop bound "normalization": we will have roughly the same in FIR, in MLIR SCF and inside OpenMPIRBuilder. If the logic is sufficiently different, it may be fine though.

> Some preliminary work investigating transforming FIR loops to the affine dialect, for example, have happened with success.
Affine only supports compile-time constant, positive steps :)
kiranchandramohan: We had a discussion today and there was consensus to go ahead with the worksharing loop with start, end and step. (@Meinersbur notes that this will not be able to handle all the C++ variants, since OpenMP allows worksharing loop pragmas to be attached to loops with iterators. For this case the front-end will have to do some work.) Normalization of the worksharing loop can be handled in the OpenMPIRBuilder.

chelini: Remove?
		// Generator of the canonical loop body. Produces an SESE region of basic
		// blocks.
		// TODO: support error propagation in OpenMPIRBuilder and use it instead of
		// relying on captured variables.
		LogicalResult bodyGenStatus = success();
		auto bodyGen = [&](llvm::OpenMPIRBuilder::InsertPointTy ip, llvm::Value *iv) {
		llvm::IRBuilder<>::InsertPointGuard guard(builder);

		// Make sure further conversions know about the induction variable.
		valueMapping[loop.getRegion().front().getArgument(0)] = iv;

		llvm::BasicBlock *entryBlock = ip.getBlock();
		llvm::BasicBlock *exitBlock =
		entryBlock->splitBasicBlock(ip.getPoint(), "omp.wsloop.exit");

		// Convert the body of the loop.
		Region &region = loop.region();
		for (Block &bb : region) {
		llvm::BasicBlock *llvmBB =
		llvm::BasicBlock::Create(llvmContext, "omp.wsloop.region", func);
		blockMapping[&bb] = llvmBB;

		// Retarget the branch of the entry block to the entry block of the
		// converted region (regions are single-entry).
		if (bb.isEntryBlock()) {
		auto *branch = cast<llvm::BranchInst>(entryBlock->getTerminator());
		branch->setSuccessor(0, llvmBB);
		}
		}

		// Block conversion creates a new IRBuilder every time so need not bother
		// about maintaining the insertion point.
		llvm::SetVector<Block *> blocks = topologicalSort(region);
		for (Block *bb : blocks) {
		if (failed(convertBlock(*bb, bb->isEntryBlock()))) {
		bodyGenStatus = failure();
		return;
		}

		// Special handling for `omp.yield` terminators (we may have more than
		// one): they return the control to the parent WsLoop operation so replace
		// them with the branch to the exit block. We handle this here to avoid
		// relying on inter-function communication through the ModuleTranslation
		// class to set up the correct insertion point. This is also consistent
		// with MLIR's idiom of handling special region terminators in the same
		// code that handles the region-owning operation.
		if (isa<omp::YieldOp>(bb->getTerminator())) {
		llvm::BasicBlock *llvmBB = blockMapping[bb];
		builder.SetInsertPoint(llvmBB, llvmBB->end());
		builder.CreateBr(exitBlock);
		}
		}

		connectPHINodes(region, valueMapping, blockMapping);
		};

		// Delegate actual loop construction to the OpenMP IRBuilder.
		llvm::CanonicalLoopInfo *loopInfo = ompBuilder->createCanonicalLoop(
		builder, bodyGen, lowerBoundVal, upperBoundVal,
		valueMapping[loop.step()[0]], /*IsSigned=*/true,
		/*InclusiveStop=*/true);
		if (failed(bodyGenStatus))
		return failure();

		// Notify the scheduler when the loop is complete.
		builder.restoreIP(loopInfo->getAfterIP());
		builder.CreateCall(staticFini, {srcLoc, threadNum});

		return success();
		}

/// Given an OpenMP MLIR operation, create the corresponding LLVM IR		/// Given an OpenMP MLIR operation, create the corresponding LLVM IR
/// (including OpenMP runtime calls).		/// (including OpenMP runtime calls).
LogicalResult		LogicalResult
ModuleTranslation::convertOmpOperation(Operation &opInst,		ModuleTranslation::convertOmpOperation(Operation &opInst,
llvm::IRBuilder<> &builder) {		llvm::IRBuilder<> &builder) {
if (!ompBuilder) {		if (!ompBuilder) {
ompBuilder = std::make_unique<llvm::OpenMPIRBuilder>(*llvmModule);		ompBuilder = std::make_unique<llvm::OpenMPIRBuilder>(*llvmModule);
ompBuilder->initialize();		ompBuilder->initialize();
[24 lines not shown]	return llvm::TypeSwitch<Operation *, LogicalResult>(&opInst)
return success();		return success();
})		})
.Case([&](omp::TerminatorOp) {		.Case([&](omp::TerminatorOp) {
builder.CreateBr(ompContinuationIPStack.back());		builder.CreateBr(ompContinuationIPStack.back());
return success();		return success();
})		})
.Case(		.Case(
[&](omp::ParallelOp) { return convertOmpParallel(opInst, builder); })		[&](omp::ParallelOp) { return convertOmpParallel(opInst, builder); })
		.Case([&](omp::WsLoopOp) { return convertOmpWsLoop(opInst, builder); })
		.Case([&](omp::YieldOp op) {
		// Yields are loop terminators that can be just omitted. The loop
		// structure was created in the function that handles WsLoopOp.
		assert(op.getNumOperands() == 0 && "unexpected yield with operands");
		return success();
		})

.Default([&](Operation *inst) {		.Default([&](Operation *inst) {
return inst->emitError("unsupported OpenMP operation: ")		return inst->emitError("unsupported OpenMP operation: ")
<< inst->getName();		<< inst->getName();
});		});
}		}

/// Given a single MLIR operation, create the corresponding LLVM IR operation		/// Given a single MLIR operation, create the corresponding LLVM IR operation
/// using the `builder`. LLVM IR Builder does not have a generic interface so		/// using the `builder`. LLVM IR Builder does not have a generic interface so
[483 lines not shown]

mlir/test/Target/openmp-llvm.mlir

// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s		// RUN: mlir-translate -mlir-to-llvmir -split-input-file %s | FileCheck %s

// CHECK-LABEL: define void @test_stand_alone_directives()		// CHECK-LABEL: define void @test_stand_alone_directives()
llvm.func @test_stand_alone_directives() {		llvm.func @test_stand_alone_directives() {
// CHECK: [[OMP_THREAD:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @{{[0-9]+}})		// CHECK: [[OMP_THREAD:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @{{[0-9]+}})
// CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* @{{[0-9]+}}, i32 [[OMP_THREAD]])		// CHECK-NEXT: call void @__kmpc_barrier(%struct.ident_t* @{{[0-9]+}}, i32 [[OMP_THREAD]])
omp.barrier		omp.barrier

// CHECK: [[OMP_THREAD1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @{{[0-9]+}})		// CHECK: [[OMP_THREAD1:%.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @{{[0-9]+}})
[249 lines not shown]	// CHECK: call void @__kmpc_barrier
omp.terminator		omp.terminator
}		}

omp.barrier		omp.barrier
omp.terminator		omp.terminator
}		}
llvm.return		llvm.return
}		}

		// -----

		// CHECK: %struct.ident_t = type
		// CHECK: @[[$parallel_loc:.*]] = private unnamed_addr constant {{.*}} c";LLVMDialectModule;wsloop_simple;{{[0-9]+}};{{[0-9]+}};;\00"
		// CHECK: @[[$parallel_loc_struct:.*]] = private unnamed_addr constant %struct.ident_t {{.*}} @[[$parallel_loc]], {{.*}}

		// CHECK: @[[$wsloop_loc:.*]] = private unnamed_addr constant {{.*}} c";LLVMDialectModule;wsloop_simple;{{[0-9]+}};{{[0-9]+}};;\00"
		// CHECK: @[[$wsloop_loc_struct:.*]] = private unnamed_addr constant %struct.ident_t {{.*}} @[[$wsloop_loc]], {{.*}}

		// CHECK-LABEL: @wsloop_simple
		llvm.func @wsloop_simple(%arg0: !llvm.ptr<float>) {
		%0 = llvm.mlir.constant(42 : index) : !llvm.i64
		%1 = llvm.mlir.constant(10 : index) : !llvm.i64
		%2 = llvm.mlir.constant(1 : index) : !llvm.i64
		omp.parallel {
		"omp.wsloop"(%1, %0, %2) ( {
		^bb0(%arg1: !llvm.i64):
		// CHECK: %p.lastiter = alloca i32
		// CHECK: %[[PLOWER:.*]] = alloca i64
		// CHECK: %[[PUPPER:.*]] = alloca i64
		// CHECK: %[[PSTRIDE:.*]] = alloca i64
		// CHECK: store i64 {{.*}}, i64* %[[PLOWER]]
		// CHECK: store i64 {{.*}}, i64* %[[PUPPER]]
		// CHECK: store i64 {{.*}}, i64* %[[PSTRIDE]]
		// CHECK: %[[TID:.*]] = call i32 @__kmpc_global_thread_num(%struct.ident_t* @[[$wsloop_loc_struct]])
		// CHECK: call void @__kmpc_for_static_init_8(%struct.ident_t* @[[$wsloop_loc_struct]], i32 %[[TID]], i32 34, i32* %p.lastiter, i64* %[[PLOWER]], i64* %[[PUPPER]], i64* %[[PSTRIDE]], i64 1, i64 1)
		// CHECK: load i64, i64* %[[PLOWER]]
		// CHECK: load i64, i64* %[[PUPPER]]
		%3 = llvm.mlir.constant(2.000000e+00 : f32) : !llvm.float
		%4 = llvm.getelementptr %arg0[%arg1] : (!llvm.ptr<float>, !llvm.i64) -> !llvm.ptr<float>
		llvm.store %3, %4 : !llvm.ptr<float>
		omp.yield
		// CHECK: call void @__kmpc_for_static_fini(%struct.ident_t* @[[$wsloop_loc_struct]], i32 %[[TID]])
		}) {operand_segment_sizes = dense<[1, 1, 1, 0, 0, 0, 0, 0, 0]> : vector<9xi32>} : (!llvm.i64, !llvm.i64, !llvm.i64) -> ()
		omp.terminator
		}
		llvm.return
		}

Revision Contents

Diff 308592

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp

mlir/test/Target/openmp-llvm.mlir