This is an archive of the discontinued LLVM Phabricator instance.

Differential D86071

[MLIR][OpenMP] Add omp.wsloop operation
ClosedPublic

Authored by kiranchandramohan on Aug 17 2020, 6:25 AM.

Details

Reviewers

jdoerfert
mehdi_amini
ftynse
clementval
DavidTruby
bondhugula

Commits

rG843525075b87: [MLIR][OpenMP] Add omp.wsloop operation

Summary

This adds a simple definition of a "workshare loop" operation for
the OpenMP MLIR dialect, excluding the "reduction" and "allocate"
clauses and without a custom parser and pretty printer.

The schedule clause also does not yet accept the modifiers that are
permitted in OpenMP 5.0.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DavidTruby created this revision. Aug 17 2020, 6:25 AM

Herald added projects: Restricted Project, Restricted Project. Aug 17 2020, 6:25 AM

Herald added subscribers: llvm-commits, msifontes, jurahul and 15 others.

DavidTruby requested review of this revision. Aug 17 2020, 6:25 AM

Herald added a reviewer: jdoerfert. Aug 17 2020, 6:25 AM

Herald added subscribers: sstefan1, stephenneuendorffer, nicolasvasilache.

DavidTruby added reviewers: kiranchandramohan, mehdi_amini. Aug 17 2020, 6:28 AM

DavidTruby added a reviewer: ftynse.

Harbormaster completed remote builds in B68597: Diff 286000. Aug 17 2020, 7:17 AM

kiranchandramohan added inline comments. Aug 17 2020, 9:09 AM

llvm/include/llvm/Frontend/OpenMP/OMP.td
120	For the default clause in parallel we have used a prefix "def" to fix this issue. I think we need to standardize this. Would converting the first letter to caps be a reasonable workaround since reserved keywords do not have a letter with caps as first letter?

DavidTruby added a reviewer: clementval. Aug 18 2020, 6:33 AM

clementval added inline comments. Aug 18 2020, 11:34 AM

llvm/include/llvm/Frontend/OpenMP/OMP.td
120	That's probably a good idea to have a standard way to do that. +1 for the first letter capitalized if it works.

kiranchandramohan added inline comments. Aug 18 2020, 2:15 PM

llvm/include/llvm/Frontend/OpenMP/OMP.td
120	I have filed an issue https://bugs.llvm.org/show_bug.cgi?id=47225 regarding StrEnumAttr not accepting reserved C++ keywords.

A few general comments about the design of the workshare loop operation.

Should the iteration interval (start, end) and step be part of the loop operation or should we have one more version of omp do?

OpenMP for/do loops with static scheduling are implemented by runtime calls such as the following. How will these values be provided to the OpenMP IRBuilder if they are not carried in the loop operation?
void __kmpc_for_static_init_4(ident_t *loc, kmp_int32 gtid, kmp_int32 schedtype, kmp_int32 *plastiter, kmp_int32 *plower, kmp_int32 *pupper, kmp_int32 *pstride, kmp_int32 incr, kmp_int32 chunk)
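
To illustrate why the bounds and step need to reach the lowering, here is a rough sketch of the per-thread bounds a static schedule without a chunk size could hand out. It is a simplified stand-in written for this discussion, assuming inclusive bounds and a positive increment; `StaticBounds` and `staticPartition` are hypothetical names, and this is not the runtime's actual implementation of the call above.

```cpp
#include <cstdint>

// Inclusive iteration bounds assigned to one thread.
struct StaticBounds {
  int32_t lower;
  int32_t upper;
  bool hasWork; // false when the thread receives no iterations
};

// Hypothetical helper (not part of the OpenMP runtime): split the inclusive
// range [lower, upper], walked with a positive increment `incr`, into one
// contiguous block per thread. The first `extras` threads get one extra
// iteration so the blocks differ in size by at most one.
StaticBounds staticPartition(int32_t lower, int32_t upper, int32_t incr,
                             int32_t tid, int32_t numThreads) {
  // Reject inputs this simplified sketch does not handle.
  if (incr <= 0 || numThreads <= 0 || tid < 0 || tid >= numThreads ||
      upper < lower)
    return {0, 0, false};

  int32_t tripCount = (upper - lower) / incr + 1;
  int32_t smallChunk = tripCount / numThreads;
  int32_t extras = tripCount % numThreads;

  int32_t myCount = smallChunk + (tid < extras ? 1 : 0);
  if (myCount == 0)
    return {0, 0, false};

  // Index of this thread's first iteration within the whole trip count.
  int32_t firstIter = tid * smallChunk + (tid < extras ? tid : extras);
  int32_t myLower = lower + firstIter * incr;
  int32_t myUpper = myLower + (myCount - 1) * incr;
  return {myLower, myUpper, true};
}
```

With ten iterations (0 to 9, increment 1) and four threads, this sketch gives threads 0 and 1 three iterations each and threads 2 and 3 two each. None of this can be computed unless the loop operation, or something attached to it, carries the lower bound, upper bound and increment.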

Should the loop operation be Anyregion or just one block? (Note: @clementval felt it was OK with both anyregion or single block region for OpenACC loop.)

Unlike parallel, the loop operation will always have a loop associated with it. All higher-level dialects have loop operations. But since we lower to OpenMP + LLVM dialect (and llvm has no loop operation) we need to have any region and not just one block. An alternative would be to have two omp.do operations one which sits with the do loop (of one block) and one which sits without (any region).

Another version of omp.do suggested was a single block region with iteration interval and step size. This is similar to loops in other dialects like affine. This has the advantage of transforming to other dialects like affine or recreating the affine transformations as omp.do, as well as keeping it free of optimizations of other dialects. But this version alone will not be able to capture the various loops permitted by OpenMP and Fortran/C++ because there can be branches in Fortran code. Another issue is that we have to retain the OpenMP information somehow (to generate runtime calls) so fully transforming to other dialect loops is not possible.

kiranchandramohan added inline comments. Aug 28 2020, 2:24 PM

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
154	Is this a single value or a list of values?
155–156	Is there a schedule modifier also?
157	This should be an attribute. "The parameter of the collapse clause must be a constant positive integer expression."
159	This should also be an attribute. "The parameter of the ordered clause must be a constant positive integer expression if specified."

Herald added a subscriber: danielkiss. Aug 28 2020, 2:24 PM

ftynse added inline comments. Aug 31 2020, 12:36 AM

llvm/include/llvm/Frontend/OpenMP/OMP.td
123	Nit: could we have a space after comma?

DavidTruby added inline comments. Sep 7 2020, 6:09 AM

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
154	it's one step value per element in linear_vars. So a list of values. The two lists should always be the same length, but I don't think there's a way to enforce it here (it should be enforced later)
155–156	I'm leaving the schedule modifier for a later patch, as it requires more changes to the OMP.td file. I've added a clarification on this to the commit message.
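
The constraints raised in these inline comments (equal-length linear lists, positive constant `collapse` and `ordered` parameters) could be checked along the following lines. This is only a sketch under stated assumptions: `WsLoopClauses` and `verifyClauses` are hypothetical names made up for illustration, and the code does not use the MLIR verifier API or the actual operand names of the operation in this patch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the clause data of a workshare loop.
struct WsLoopClauses {
  std::vector<std::string> linearVars; // variables named in linear clauses
  std::vector<int64_t> linearSteps;    // expected: one step per linear variable
  bool hasCollapse = false;            // whether a collapse clause is present
  int64_t collapse = 1;                // constant parameter of collapse
  bool hasOrdered = false;             // whether ordered has a parameter
  int64_t ordered = 1;                 // constant parameter of ordered
};

// Returns an empty string when all checks pass, otherwise a diagnostic.
std::string verifyClauses(const WsLoopClauses &c) {
  if (c.linearVars.size() != c.linearSteps.size())
    return "linear variables and linear steps must have the same length";
  if (c.hasCollapse && c.collapse <= 0)
    return "collapse parameter must be a constant positive integer";
  if (c.hasOrdered && c.ordered <= 0)
    return "ordered parameter must be a constant positive integer";
  return "";
}
```

Keeping `collapse` and `ordered` as plain constants rather than runtime values is what makes the positivity checks decidable at verification time, which is the argument for modelling them as attributes.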

Address review comments.

Harbormaster completed remote builds in B70833: Diff 290276. Sep 7 2020, 7:22 AM

@DavidTruby, In general, I was asking whether all clause enumeration values can begin with a Capital letter? Just the reserved keywords would look odd right?

Also, can you mark fixed comments as done?

I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.

mlir/test/Dialect/OpenMP/ops.mlir
154	Nit: extra line.

This revision is now accepted and ready to land. Sep 7 2020, 1:50 PM

In D86071#2259756, @kiranchandramohan wrote:

@DavidTruby, In general, I was asking whether all clause enumeration values can begin with a Capital letter? Just the reserved keywords would look odd right?

Oh I see. Yes, we can do that. I'll make the same change for parallel in a separate patch.

DavidTruby marked 3 inline comments as done. Sep 8 2020, 4:52 AM

Capitalise all schedule clause values.

DavidTruby marked 2 inline comments as done. Sep 8 2020, 6:49 AM

Harbormaster completed remote builds in B70934: Diff 290474. Sep 8 2020, 7:27 AM

clementval accepted this revision. Sep 10 2020, 8:35 AM

I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.

I haven't seen answers to the questions about lowering to LLVM IR + OpenMP runtime, and it sounds suboptimal to push the patch before discussing and agreeing on the actual design.

In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an scf.for/scf.parallel as the only nested op? Is there a plan for a separate omp.for? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.

This revision now requires changes to proceed. Sep 10 2020, 8:54 AM

In D86071#2265876, @ftynse wrote:

In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an scf.for/scf.parallel as the only nested op? Is there a plan for a separate omp.for? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.

The last part is unclear to me TBH. What exactly do you expect to do with OpenMP worksharing loops on this level which is problematic with CFGs?

What exactly do you expect to do with OpenMP worksharing loops on this level which is problematic with CFGs?

It's the inverse that is problematic: having loop ops where CFG is expected. I would like to avoid seeing something like

omp.do <...> {
  omp.for %i = <...> {
    llvm.store <...>
    // other LLVM operations here
  }
}
// LLVM operations as CFG here

go into mlir-translate, which will have to outline and lower the loop during _translation_ in this case.

In D86071#2265964, @ftynse wrote:
What exactly do you expect to do with OpenMP worksharing loops on this level which is problematic with CFGs?

It's the inverse that is problematic: having loop ops where CFG is expected. I would like to avoid seeing something like
omp.do <...> {
  omp.for %i = <...> {
    llvm.store <...>
    // other LLVM operations here
  }
}
// LLVM operations as CFG here
go into mlir-translate, which will have to outline and lower the loop during _translation_ in this case.

I think I might simply misunderstand because I'm oblivious to the "MLIR way" but the loop belongs to the OpenMP directive.
It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it?

Yes.

So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

All OpenMP related LLVM-IR code generation is (eventually) going to be moved into OpenMPIRBuilder so we do not duplicate the rather nontrivial parts in two places which would become a maintenance nightmare. That is (among other things) the point ;)

In D86071#2266254, @jdoerfert wrote:

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it?

Yes.

Is there a written version somewhere we can see?

So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

All OpenMP related LLVM-IR code generation is (eventually) going to be moved into OpenMPIRBuilder so we do not duplicate the rather nontrivial parts in two places which would become a maintenance nightmare. That is (among other things) the point ;)

This sounds great! What I am missing is how this connects to MLIR. Two particular issues: how the OpenMP dialect interacts with other dialects, and can the OpenMPIRBuilder be designed so as to eventually take an mlir::Builder instead of llvm::IRBuilder<> and produce LLVM dialect instead of LLVM; there may be others. So far, the implementation of omp.do proposed here does not align with, if not contradicts, the original RFC. Bottom line, this deserves a proper discussion in a forum with better visibility than code review. Part of the discussion seemingly happens in flang channels and should be somehow summarized for mlir folks, as the OpenMP dialect being independent of any Fortran construct was a condition for accepting it in core.

In D86071#2265876, @ftynse wrote:

I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.

I haven't seen answers to the questions about lowering to LLVM IR + OpenMP runtime, and it sounds suboptimal to push the patch before discussing and agreeing on the actual design.

In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an scf.for/scf.parallel as the only nested op? Is there a plan for a separate omp.for? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.

Thanks @ftynse for the feedback. Yes, the whole flow for the OpenMP worksharing loop requires a detailed discussion in discourse. I was only suggesting that since the operation in this patch only models what is an OpenMP do directive it can go ahead. The RFC for the workshare operation is next on @DavidTruby's list. It is fine to have the RFC discussion before submitting this patch.

In D86071#2267473, @ftynse wrote:

In D86071#2266254, @jdoerfert wrote:

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it?

Yes.

Is there a written version somewhere we can see?

A written version of the plan? Yes. A written version of the code, not yet.

So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

All OpenMP related LLVM-IR code generation is (eventually) going to be moved into OpenMPIRBuilder so we do not duplicate the rather nontrivial parts in two places which would become a maintenance nightmare. That is (among other things) the point ;)

This sounds great!
[...]
and can the OpenMPIRBuilder be designed so as to eventually take an mlir::Builder instead of llvm::IRBuilder<> and produce LLVM dialect instead of LLVM;
[...]

So far that was not on my TODO list and it seems like a lot of work assuming you do not port various other things into MLIR land. Could you help me understand what we would gain by generating LLVM dialect?

In D86071#2267750, @jdoerfert wrote:

In D86071#2267473, @ftynse wrote:

In D86071#2266254, @jdoerfert wrote:

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it?

Yes.

Is there a written version somewhere we can see?

A written version of the plan? Yes. A written version of the code, not yet.

Could you post a link?

So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

All OpenMP related LLVM-IR code generation is (eventually) going to be moved into OpenMPIRBuilder so we do not duplicate the rather nontrivial parts in two places which would become a maintenance nightmare. That is (among other things) the point ;)

This sounds great!
[...]
and can the OpenMPIRBuilder be designed so as to eventually take an mlir::Builder instead of llvm::IRBuilder<> and produce LLVM dialect instead of LLVM;
[...]

So far that was not on my TODO list and it seems like a lot of work assuming you do not port various other things into MLIR land. Could you help me understand what we would gain by generating LLVM dialect?

Would you mind moving this to discourse / mailing list?

In D86071#2267754, @ftynse wrote:

In D86071#2267750, @jdoerfert wrote:

In D86071#2267473, @ftynse wrote:

In D86071#2266254, @jdoerfert wrote:

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it? So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

In D86071#2266169, @ftynse wrote:

It has to be the OpenMPIRBuilder that lowers it into a CFG (eventually) because it is not "a fortran/mlir/affine/... loop" but an OpenMP worksharing loop with all what that entails.

Is there an OpenMPIRBuilder::CreateForLoop or a plan to have it?

Yes.

Is there a written version somewhere we can see?

A written version of the plan? Yes. A written version of the code, not yet.

Could you post a link?

Sure: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

So far, it looks like there is a non-negligible amount of code in Clang that emits the IR for loops, and replicating that code in mlir-translate is a no-go.

All OpenMP related LLVM-IR code generation is (eventually) going to be moved into OpenMPIRBuilder so we do not duplicate the rather nontrivial parts in two places which would become a maintenance nightmare. That is (among other things) the point ;)

This sounds great!
[...]
and can the OpenMPIRBuilder be designed so as to eventually take an mlir::Builder instead of llvm::IRBuilder<> and produce LLVM dialect instead of LLVM;
[...]

So far that was not on my TODO list and it seems like a lot of work assuming you do not port various other things into MLIR land. Could you help me understand what we would gain by generating LLVM dialect?

Would you mind moving this to discourse / mailing list?

No, feel free to reply to the thread above or start a new one.

Sure: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

@jdoerfert This is probably the old flang-dev mailing list.

Is it OK for @jdoerfert and @ftynse if I initiate the discussion in discourse?

I'm fine with anything that is not a code review, which most people would just ignore.

In D86071#2268185, @kiranchandramohan wrote:

Sure: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

@jdoerfert This is probably the old flang-dev mailing list.

Is it OK for @jdoerfert and @ftynse if I initiate the discussion in discourse?

I'm not following discourse.

In D86071#2268408, @jdoerfert wrote:

In D86071#2268185, @kiranchandramohan wrote:

Sure: http://lists.flang-compiler.org/pipermail/flang-dev_lists.flang-compiler.org/2019-May/000197.html

@jdoerfert This is probably the old flang-dev mailing list.

Is it OK for @jdoerfert and @ftynse if I initiate the discussion in discourse?

I'm not following discourse.

That's the main discussion channel for MLIR, we expect RFCs to go there. We can go to llvm-dev@ for the OpenMPIRBuilder discussion, I think it is actually the best place given how it cuts across LLVM, Clang and Flang.

OK. I had almost typed in discourse. I primarily wanted to ask the following question before posing a general question in llvm-dev about the dialect or the OpenMP IRBuilder.

Consider that we want to parallelize an SCF loop with OpenMP. We can add the omp.do operation around the loop. This would look like:

func @some_op_inside_loop(%arg0: index, %arg1: index, %arg2: index) {
  omp.do {
    scf.for %i = %arg0 to %arg1 step %arg2 {
      "some.op"(%i) : (index) -> ()
    }
  }
  return
}

One way to pass this to the OpenMP IR Builder would be as follows. The loop exists as control flow. (Question: Can we have index, start, end and step variables here as operands or attributes?)

llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
  omp.do {
    llvm.br ^bb1(%arg0 : !llvm.i64)
  ^bb1(%0: !llvm.i64):  // 2 preds: ^bb0, ^bb2
    %1 = llvm.icmp "slt" %0, %arg1 : !llvm.i64
    llvm.cond_br %1, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    "some.op"(%0) : (!llvm.i64) -> ()
    %2 = llvm.add %0, %arg2 : !llvm.i64
    llvm.br ^bb1(%2 : !llvm.i64)
  ^bb3:  // pred: ^bb1
  }
  llvm.return
}

Another way would be as follows where the loop exists without the control flow of the loop.

llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
  omp.do index(%i: !llvm.i64) start(%arg0: !llvm.i64) stop(%arg1: !llvm.i64) step(%arg2: !llvm.i64) {
    "some.op"(%i) : (!llvm.i64) -> ()
  }
  return
}

The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?

Great to see this. Some minor comments.

llvm/include/llvm/Frontend/OpenMP/OMP.td
120	Please terminate with a period.
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
147	This description is missing a customary example in the end. (Please use triple backticks to include an example.)
157	Please use `Confined` and an `IntMinValue` argument to force this to be positive. It'll be automatically enforced and verified. As an example, please see `AllocLikeOp` in the standard dialect:
let arguments = (ins Variadic<Index>:$value, Confined<OptionalAttr<I64Attr>, [IntMinValue<0>]>:$alignment);

Herald added a subscriber: tatianashp. Sep 13 2020, 11:17 PM

In D86071#2265876, @ftynse wrote:

I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.

I haven't seen answers to the questions about lowering to LLVM IR + OpenMP runtime, and it sounds suboptimal to push the patch before discussing and agreeing on the actual design.

In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an scf.for/scf.parallel as the only nested op? Is there a plan for a separate omp.for? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.

+1 on having clarity on the lowering part to LLVM dialect. I think the op itself doesn't place any restrictions on what's inside. There could be scf.for ops nested inside or surrounding an omp.do given that they both use the same type subsystem (standard types). One would expect an scf.parallel to be converted to an omp.do? The remaining scf.for's would be lowered to LLVM dialect the usual way. It looks like we need some sort of a pre-pass on the LLVM dialect to lower/handle the omp.do and some of the implementation therein would have to duplicate/reuse the logic of scf.for lowering. @DavidTruby @kiranchandramohan - do you have an example sketch of how the LLVM dialect would look like for a simple / minimal omp.do loop right before it's translated out to LLVM IR proper? Would it require duplicating LLVM's OpenMPIRBuilder functionality here on the LLVM dialect?

Thanks @bondhugula for the review and comments. Very helpful. We will post out an RFC in a week or so and take it forward from there. Will try to answer your questions in the RFC. We will likely need some help to complete the flow.

I forgot to submit the first part of this response a few days ago, apologies.

In D86071#2268569, @kiranchandramohan wrote:
OK. I had almost typed in discourse. I primarily wanted to ask the following question before posing a general question in llvm-dev about the dialect or the OpenMP IRBuilder.

Consider that we want to parallelize an SCF loop with OpenMP. We can add the omp.do operation around the loop. This would look like:

func @some_op_inside_loop(%arg0: index, %arg1: index, %arg2: index) {
  omp.do {
    scf.for %i = %arg0 to %arg1 step %arg2 {
      "some.op"(%i) : (index) -> ()
    }
  }
  return
}

This is not parallelization but worksharing, but I think I get what you're saying.

One way to pass this to the OpenMP IR Builder would be as follows. The loop exists as control flow. (Question: Can we have index, start, end and step variables here as operands or attributes?)

llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
  omp.do {
    llvm.br ^bb1(%arg0 : !llvm.i64)
  ^bb1(%0: !llvm.i64):  // 2 preds: ^bb0, ^bb2
    %1 = llvm.icmp "slt" %0, %arg1 : !llvm.i64
    llvm.cond_br %1, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    "some.op"(%0) : (!llvm.i64) -> ()
    %2 = llvm.add %0, %arg2 : !llvm.i64
    llvm.br ^bb1(%2 : !llvm.i64)
  ^bb3:  // pred: ^bb1
  }
  llvm.return
}

Another way would be as follows where the loop exists without the control flow of the loop.

llvm.func @some_op_inside_loop(%arg0: !llvm.i64, %arg1: !llvm.i64, %arg2: !llvm.i64) {
  omp.do index(%i: !llvm.i64) start(%arg0: !llvm.i64) stop(%arg1: !llvm.i64) step(%arg2: !llvm.i64) {
    "some.op"(%i) : (!llvm.i64) -> ()
  }
  return
}

The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?

First, the loop belongs to the omp.do. If you want to lower an omp.do (w/ or w/o the OpenMPIRBuilder) you need the loop information (=bounds + step). The loop body is irrelevant at this point.
The interface will somewhat look like this:

InsertPos CreateWorksharingLoop(..., LowerBound, UpperBound, Step, ..., BodyCodeGenCallback)
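
To make the shape of such a callback-based interface concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `MockBuilder`, `BodyGenCallback` and `createWorksharingLoop` are hypothetical names, the "IR" is just recorded text, and the `static_init`/`static_fini` lines are placeholders rather than real runtime entry points. It is not the OpenMPIRBuilder API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical mock of an IR builder: it only records textual pseudo-IR.
struct MockBuilder {
  std::vector<std::string> emitted;
  void emit(const std::string &line) { emitted.push_back(line); }
};

// The callback receives the builder and the name of the induction variable,
// and is responsible for emitting the loop body.
using BodyGenCallback =
    std::function<void(MockBuilder &, const std::string &)>;

// Illustrative analogue of a "create worksharing loop" entry point: the
// caller supplies bounds and step, the builder owns the loop skeleton, and
// the body is produced by the callback.
void createWorksharingLoop(MockBuilder &b, const std::string &lowerBound,
                           const std::string &upperBound,
                           const std::string &step,
                           const BodyGenCallback &bodyGen) {
  b.emit("call static_init(" + lowerBound + ", " + upperBound + ", " + step +
         ")");
  b.emit("loop.header: iv = phi [thread_lower, preheader], [iv.next, latch]");
  b.emit("loop.body:");
  bodyGen(b, "iv"); // the builder never inspects what the body contains
  b.emit("loop.latch: iv.next = add iv, " + step);
  b.emit("call static_fini()");
}
```

A caller would pass something like `[](MockBuilder &mb, const std::string &iv) { mb.emit("  some.op(" + iv + ")"); }` as the body callback. The point of the design is that the loop skeleton is generated from the bounds and step alone, so the front end never has to materialize the loop control flow itself.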

In D86071#2270542, @bondhugula wrote:

In D86071#2265876, @ftynse wrote:

I guess for the general workshare loop design issues we can have an RFC in discourse. But this patch can go ahead.

I haven't seen answers to the questions about lowering to LLVM IR + OpenMP runtime, and it sounds suboptimal to push the patch before discussing and agreeing on the actual design.

In particular, it is not clear to me how this construct will connect to loops and what the lowering flow is. Does it expect an scf.for/scf.parallel as the only nested op? Is there a plan for a separate omp.for? How long do loops persist when we go to LLVM, given that OpenMPIRBuilder does not handle loop constructs and we really want to avoid converting loops to CFG during MLIR->LLVM IR translation.

+1 on having clarity on the lowering part to LLVM dialect. I think the op itself doesn't place any restrictions on what's inside. There could be scf.for ops nested inside or surrounding an omp.do given that they both use the same type subsystem (standard types).
One would expect an scf.parallel to be converted to an omp.do?

You can lower scf.parallel into omp parallel for/do *if* you prove the body does not contain (certain) OpenMP runtime calls and directives. So without analysis you cannot.

The remaining scf.for's would be lowered to LLVM dialect the usual way. It looks like we need some sort of a pre-pass on the LLVM dialect to lower/handle the omp.do and some of the implementation therein would have to duplicate/reuse the logic of scf.for lowering.

I don't see why this would be the case. As mentioned above, the loop, whatever "op" it might be, belongs to the omp do. There is no omp do without loop, there is no "loop" once omp do has been lowered (to runtime calls). The omp do lowering is also not duplicating scf.for code if you use the OpenMPIRBuilder.

@DavidTruby @kiranchandramohan - do you have an example sketch of how the LLVM dialect would look like for a simple / minimal omp.do loop right before it's translated out to LLVM IR proper? Would it require duplicating LLVM's OpenMPIRBuilder functionality here on the LLVM dialect?

Please, do not duplicate OpenMPIRBuilder functionality.

In D86071#2273026, @jdoerfert wrote:

The remaining scf.for's would be lowered to LLVM dialect the usual way. It looks like we need some sort of a pre-pass on the LLVM dialect to lower/handle the omp.do and some of the implementation therein would have to duplicate/reuse the logic of scf.for lowering.

I don't see why this would be the case. As mentioned above, the loop, whatever "op" it might be, belongs to the omp do. There is no omp do without loop, there is no "loop" once omp do has been lowered (to runtime calls). The omp do lowering is also not duplicating scf.for code if you use the OpenMPIRBuilder.

I see. So you are suggesting just preserving the omp.do all the way into the LLVM dialect and then use the OpenMPIRBuilder in the MLIR LLVM dialect to LLVM IR *translator*?

You can lower scf.parallel into omp parallel for/do *if* you prove the body does not contain (certain) OpenMP runtime calls and directives. So without analysis you cannot.

It's generally hard to prove that something is not contained in a region in MLIR. You can always have "unknown.op"() in there, and you'd have to treat it conservatively by assuming it may be an equivalent of the forbidden call or it lowers to such call.
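
The conservative treatment described here can be sketched as an allow-list walk over a region. The `Op` type and `isSafeToWorkshare` below are hypothetical and deliberately minimal (they are not MLIR's `Operation` or any existing analysis), and the allow-list contents are an assumption chosen by the caller.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal operation tree: a name plus nested operations.
struct Op {
  std::string name;
  std::vector<Op> nested;
};

// Conservative check: the region is accepted only if every operation in it,
// at any nesting depth, is on the allow-list. Anything unrecognized is
// assumed to possibly be, or lower to, a forbidden call.
bool isSafeToWorkshare(const Op &op, const std::set<std::string> &knownSafe) {
  if (knownSafe.count(op.name) == 0)
    return false;
  for (const Op &child : op.nested)
    if (!isSafeToWorkshare(child, knownSafe))
      return false;
  return true;
}
```

Under this scheme a single `"unknown.op"` anywhere in the body makes the whole conversion bail out, which is exactly why such a proof is hard to obtain in an open system of dialects.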

The presence of additional constraints on what is allowed inside the loop body suggests that OpenMP loops should be a different operation than scf.for or scf.parallel, potentially sharing interfaces/traits with the latter. That being said, introducing yet another "loop-like" operation also sounds suboptimal.

The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?

If scf.for persists until the point where we perform the MLIR-to-LLVM-IR translation, the translation will have to do (the equivalent of) scf-to-std and std-to-llvm passes, which contradicts the "progressive lowering" principle of MLIR. (Note that we originally had a translation from the standard dialect directly to LLVM IR, but we introduced the LLVM dialect specifically to avoid that translation growing in complexity.) If there is no explicit loop, the translator will have to analyze the CFG and essentially raise back to the loop form to be able to call the OpenMP IRBuilder, which would expect loop bounds.

To actually make the loops persist, we would need some fine-grained control over the SCF-to-std lowering that would ignore loops contained in omp.do somehow, but only the outermost (?). The entire layering of scf-to-std lowering depending on the OpenMP dialect is not clear to me. There are also hard edges on dialect mixture due to type system differences: SCF does not work on LLVM types and LLVM operations don't work on standard types. We'd need cast operations, type conversions and canonicalizations that remove redundant back-and-forth cast chains, all in translation, which sounds messy and poorly maintainable.

I don't see why this would be the case. As mentioned above, the loop, whatever "op" it might be, belongs to the omp do. There is no omp do without loop, there is no "loop" once omp do has been lowered (to runtime calls). The omp do lowering is also not duplicating scf.for code if you use the OpenMPIRBuilder.

It sounds like a satisfactory solution if we have an explicit loop-like construct in the OpenMP dialect, compatible with the LLVM dialect type system. At translation time, it is expected to contain LLVM+OpenMP dialect inside and the translation itself is straightforward thanks to a dedicated OpenMPIRBuilder method. It's still suboptimal to test this within MLIR, but we could trust OpenMPIRBuilder to be tested properly in LLVM.

Please, do not duplicate OpenMPIRBuilder functionality.

+1.

I see. So you are suggesting just preserving the omp.do all the way into the LLVM dialect and then use the OpenMPIRBuilder in the MLIR LLVM dialect to LLVM IR *translator*?

This was one of my initial concerns, the translator should be kept simple and OpenMPIRBuilder did not have a "create loop" function when I looked at it. So it was unclear to me if the folks implementing OpenMP intend to replicate it within MLIR, extend it in LLVM or do something else, hence my request for an RFC+discussion.

Two other options that are worth considering are:

using an attribute to annotate SCF loops as OpenMP loops; this adds a verification hook without requiring to duplicate an operation but the layering is still not very clear to me;
making OpenMPIRBuilder somehow extensible so that it can build the LLVM dialect instead of LLVM IR, at which point we can have an OpenMP-to-LLVM dialect conversion that uses it directly and is testable within MLIR, and an actually trivial translation.

I would be interested to hear from @mehdi_amini and @nicolasvasilache among others... @jdoerfert seems to have an account on Discourse, so he would normally receive an email if mentioned explicitly even if he doesn't follow the forum in general.

The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?
First, the loop belongs to the omp.do. If you want to lower an omp.do (w/ or w/o the OpenMPIRBuilder) you need the loop information (=bounds + step). The loop body is irrelevant at this point. The interface will somewhat look like this: InsertPos CreateWorksharingLoop(..., LowerBound, UpperBound, Step, ..., BodyCodeGenCallback)

Thanks @jdoerfert. I think there is an additional requirement: for a static schedule, for example, the kmpc_static_init call has to be inserted in the loop header and the kmpc_static_fini call in the loop footer.
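The interface shape discussed here (bounds, step, and a body callback, with runtime calls bracketing the loop) can be sketched in plain C++. Everything below is hypothetical: `createWorksharingLoopSketch` and `BodyCallback` are illustrative names, not the OpenMPIRBuilder API, and the exclusive upper bound and positive step are assumptions made only for the sketch. The strings `"static_init"` and `"static_fini"` merely stand in for the init and fini runtime calls mentioned above.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback type: receives the induction variable value.
using BodyCallback = std::function<void(int64_t iv)>;

// Records the order of emitted "events" so that the header/footer placement
// of the runtime calls is visible. A real lowering would emit IR instead.
// Assumes step > 0 and an exclusive upper bound.
std::vector<std::string> createWorksharingLoopSketch(int64_t lowerBound,
                                                     int64_t upperBound,
                                                     int64_t step,
                                                     const BodyCallback &body) {
  std::vector<std::string> trace;
  trace.push_back("static_init"); // stands in for the call in the loop header
  for (int64_t iv = lowerBound; iv < upperBound; iv += step) {
    body(iv); // the body is opaque to the loop construction
    trace.push_back("body");
  }
  trace.push_back("static_fini"); // stands in for the call in the loop footer
  return trace;
}
```

The point of the callback is that the loop construction only needs the bounds and step; what the body does is irrelevant to where the init and fini calls are placed.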

The question I have for @jdoerfert is which one of these will be preferable for the OpenMP IRBuilder? And the question I have for @ftynse is what is the issue that is there in these schemes?

If scf.for persists until the point where we perform the MLIR-to-LLVM-IR translation, the translation will have to do (the equivalent of) scf-to-std and std-to-llvm passes, which contradicts the "progressive lowering" principle of MLIR. (Note that we originally had a translation from the standard dialect directly to LLVM IR, but we introduced the LLVM dialect specifically to avoid that translation growing in complexity). If there is no explicit loop, the translator will have to analyze the CFG and essentially raise back to the loop form to be able to call the OpenMP IRBuilder, which would expect loop bounds.

To actually make the loops persist, we would need some fine-grained control over the SCF-to-std lowering that would ignore loops contained in omp.do somehow, but only the outermost (?). The entire layering of scf-to-std lowering depending on the OpenMP dialect is not clear to me. There are also hard edges on dialect mixture due to type system differences: SCF does not work on LLVM types and LLVM operations don't work on standard types. We'd need cast operations, type conversions and canonicalizations that remove redundant back-and-forth cast chains, all in translation, which sounds messy and poorly maintainable.

The broad plan is always to get to OpenMP operations with the LLVM dialect, so scf is not meant to persist until the MLIR-to-LLVM-IR translation. The plan is to have a conversion pattern (or does this fall under transformations?) that converts an scf.for nested inside an omp.do to a loop-like operation in the OpenMP dialect. A user can invoke this with the -convert-openmp-to-llvm conversion option of mlir-opt. The question I had is this: most of the loops in MLIR seem to be SizedRegion<1>. Is there any issue with having a loop-like operation (with bounds and step) that is AnyRegion with multiple blocks? The spv.loop seems to be AnyRegion, but it is a collection of blocks with control flow that has a well-defined structure and no bounds.

I don't see why this would be the case. As mentioned above, the loop, whatever "op" it might be, belongs to the omp do. There is no omp do without loop, there is no "loop" once omp do has been lowered (to runtime calls). The omp do lowering is also not duplicating scf.for code if you use the OpenMPIRBuilder.

It sounds like a satisfactory solution if we have an explicit loop-like construct in the OpenMP dialect, compatible with the LLVM dialect type system. At translation time, it is expected to contain LLVM+OpenMP dialect inside and the translation itself is straightforward thanks to a dedicated OpenMPIRBuilder method. It's still suboptimal to test this within MLIR, but we could trust OpenMPIRBuilder to be tested properly in LLVM.

OK. I was worried you had an objection here.

Please, do not duplicate OpenMPIRBuilder functionality.

+1.

I see. So you are suggesting just preserving the omp.do all the way into the LLVM dialect and then use the OpenMPIRBuilder in the MLIR LLVM dialect to LLVM IR *translator*?

This was one of my initial concerns, the translator should be kept simple and OpenMPIRBuilder did not have a "create loop" function when I looked at it. So it was unclear to me if the folks implementing OpenMP intend to replicate it within MLIR, extend it in LLVM or do something else, hence my request for an RFC+discussion.

There were more things to discuss, like

Should we have both the directive-like omp.do operation and the loop-like omp.do operation (which could be named omp.wsloop), or only the loop-like omp.do operation? @DavidTruby was writing an RFC on this.
How does Flang fit into the picture? The FIR developers told us that scf.for might not be in the path, and that the fir.do loop operation might be converted directly to the LLVM dialect. In that case there have to be additional conversions or passes in Flang to convert any fir.do inside an omp.do. Can the loop-like omp.do operation be the target of a fir.do that is concurrent?
Should we handle affine.for nested inside an omp.do? Is affine.for guaranteed to be converted to scf.for at some point, and will the handling of scf.for inside omp.do then kick in automatically?
Should we use the information that the omp.do loop is parallel to do additional optimisation, or even convert to something like affine loops where possible?

Two other options that are worth considering are:

using an attribute to annotate SCF loops as OpenMP loops; this adds a verification hook without requiring to duplicate an operation but the layering is still not very clear to me;

I was thinking that, for operations that are already parallel, there would be a conversion to the OpenMP dialect, e.g. -convert-scf-parallel-to-openmp. This can convert the parallel scf loop to the loop-like operation in the OpenMP dialect.

making OpenMPIRBuilder somehow extensible so that it can build the LLVM dialect instead of LLVM IR, at which point we can have an OpenMP-to-LLVM dialect conversion that uses it directly and is testable within MLIR, and an actually trivial translation.

I would be interested to hear from @mehdi_amini and @nicolasvasilache among others... @jdoerfert seems to have an account on Discourse, so he would normally receive an email if mentioned explicitly even if he doesn't follow the forum in general.

We are OK with submitting the RFC in discourse.

I have submitted an RFC on the openmp do loop design here: https://llvm.discourse.group/t/openmp-worksharing-loop-rfc/1815

Add indexes for loop-style implementation, and rename operation to omp.wsloop

DavidTruby retitled this revision from [MLIR][OpenMP] Add omp.do operation to [MLIR][OpenMP] Add omp.wsloop operation.Sep 30 2020, 5:33 AM

Harbormaster completed remote builds in B73494: Diff 295252.Sep 30 2020, 6:14 AM

clementval added inline comments.Sep 30 2020, 11:12 AM

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
116	Would it make sense to add the `DeclareOpInterfaceMethods<LoopLikeOpInterface>` trait since you added `lowerBound`, `upperBound` and `step`?
119	Since it is now a loop should the `associated loops` be rephrased?

ftynse added inline comments.Oct 5 2020, 1:29 AM

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
116	`LoopLikeOpInterface` is a bit of a misnomer. It has nothing to do with bounds, but instead registers the op to be processed by LICM. It will likely break OpenMP loops.
165	How does one terminate such loops?

bondhugula resigned from this revision.Oct 23 2020, 11:18 AM

Herald added a subscriber: rdzhabarov.Oct 23 2020, 11:18 AM

kiranchandramohan commandeered this revision.Oct 25 2020, 4:30 PM

kiranchandramohan edited reviewers, added: DavidTruby; removed: kiranchandramohan.

Taking over from @DavidTruby on this patch. It contains the following modifications:
Added a yield terminator (omp.yield).
Restricted loop indices to LLVM integer, integer, and index types.
Added an example.
Addressed other minor comments on the text.

kiranchandramohan marked 6 inline comments as done.Oct 28 2020, 8:32 AM

kiranchandramohan added inline comments.

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
116	Skipping since they are not applicable to OpenMP loops.
119	Did a minor rephrasing.
147	Added an example. Note that the pretty printer and parser are not part of this patch, but I am still using the pretty syntax.
165	Added a yield terminator. Current usage will be an empty yield.

ftynse accepted this revision.Nov 4 2020, 4:52 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
209–210	What is the semantics of multiple blocks terminated with `omp.yield` in the loop body? (Regions are not necessarily single-exit).
222	Nit, here and below: the `$` sign only appears in ODS input, not in the generated documentation or code.
223	Nit: there are no "variables" in MLIR. In this case, you are likely referring to operand groups.

This revision is now accepted and ready to land.Nov 4 2020, 4:52 AM

Addressed formatting comments and description changes suggested by @ftynse.

kiranchandramohan marked 2 inline comments as done.Nov 6 2020, 11:34 AM

kiranchandramohan added inline comments.

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
209–210	The OpenMP worksharing loop is a single-exit region, so multiple terminators are not expected. Should this be enforced through the verifier in this or a subsequent patch?
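A verifier rule of the kind asked about here could look roughly like the following. This is only a sketch over a toy model in plain C++: `ToyBlock` and `hasSingleYield` are hypothetical names, and the actual check (if one is added in this or a later patch) would be written against the MLIR region and block classes instead.

```cpp
#include <string>
#include <vector>

// Toy model: a block is represented only by the name of its terminator.
struct ToyBlock {
  std::string terminator;
};

// Hypothetical check: accept a loop body region only if exactly one of its
// blocks ends in "omp.yield", i.e. the region has a single exit.
bool hasSingleYield(const std::vector<ToyBlock> &region) {
  int yields = 0;
  for (const ToyBlock &block : region)
    if (block.terminator == "omp.yield")
      ++yields;
  return yields == 1;
}
```

A region with several blocks that branch to one another and a single yielding block passes; a region with two yielding blocks, or none, is rejected.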

Harbormaster completed remote builds in B77914: Diff 303507.Nov 6 2020, 12:30 PM

This revision was landed with ongoing or failed builds.Nov 16 2020, 7:25 AM

Closed by commit rG843525075b87: [MLIR][OpenMP] Add omp.wsloop operation (authored by DavidTruby, committed by kiranchandramohan).

This revision was automatically updated to reflect the committed changes.

kiranchandramohan added a commit: rG843525075b87: [MLIR][OpenMP] Add omp.wsloop operation.

Herald added a subscriber: teijeong.Nov 16 2020, 7:25 AM

Meinersbur added a subscriber: Meinersbur.Nov 17 2020, 3:41 PM

Meinersbur added inline comments.

llvm/include/llvm/Frontend/OpenMP/OMP.td
120	IIUC, these are meant to be directly passed by the frontend to get the schedule mode: `getScheduleKind(clausearg.str())`. I.e., this would require the user to write `#pragma omp for schedule(Static)`, since `#pragma omp for schedule(static)` will give you `OMP_SCHEDULE_Default`. See D91643.
126	To detect some invalid keyword, there should be some unknown constant as well. Otherwise the front-end must itself store a complete list of valid schedule arguments.

Thanks @Meinersbur for your comments. I will have a look soon. I suspect these are only used now for generating string enum classes in mlir openmp.
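The two concerns raised on OMP.td (case-sensitive matching against the capitalized names, and the lack of an "unknown" value) can be illustrated with a small stand-alone sketch. The functions below are hypothetical and hand-written; they are not the code TableGen generates for `getScheduleKind`, and the `Unknown` enumerator is an assumption added for illustration.

```cpp
#include <string>

// Hand-written stand-in for the schedule kind enum; Unknown is hypothetical.
enum class ScheduleKind { Static, Dynamic, Guided, Auto, Runtime, Default, Unknown };

// Models a case-sensitive lookup keyed on the capitalized spellings used in
// the ClauseVal definitions. Any other string falls through to Default, so
// the source spelling "static" is not recognized.
ScheduleKind lookupCapitalized(const std::string &arg) {
  if (arg == "Static") return ScheduleKind::Static;
  if (arg == "Dynamic") return ScheduleKind::Dynamic;
  if (arg == "Guided") return ScheduleKind::Guided;
  if (arg == "Auto") return ScheduleKind::Auto;
  if (arg == "Runtime") return ScheduleKind::Runtime;
  return ScheduleKind::Default;
}

// Hypothetical alternative: match the spelling that appears in source code
// and report Unknown for anything else, so a frontend can diagnose typos
// without keeping its own list of valid schedule arguments.
ScheduleKind lookupWithUnknown(const std::string &arg) {
  if (arg == "static") return ScheduleKind::Static;
  if (arg == "dynamic") return ScheduleKind::Dynamic;
  if (arg == "guided") return ScheduleKind::Guided;
  if (arg == "auto") return ScheduleKind::Auto;
  if (arg == "runtime") return ScheduleKind::Runtime;
  return ScheduleKind::Unknown;
}
```

With the first function, a misspelled argument and a correctly spelled lowercase one are indistinguishable, since both yield the default; the second separates them.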

Revision Contents

Path	Size
llvm/include/llvm/Frontend/OpenMP/OMP.td	25 lines
mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h	3 lines
mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td	92 lines
mlir/test/Dialect/OpenMP/ops.mlir	38 lines

Diff 305508

llvm/include/llvm/Frontend/OpenMP/OMP.td

[110 unchanged lines not shown]
def OMPC_ProcBind : Clause<"proc_bind"> {
let allowedClauseValues = [		let allowedClauseValues = [
OMP_PROC_BIND_master,		OMP_PROC_BIND_master,
OMP_PROC_BIND_close,		OMP_PROC_BIND_close,
OMP_PROC_BIND_spread,		OMP_PROC_BIND_spread,
OMP_PROC_BIND_default,		OMP_PROC_BIND_default,
OMP_PROC_BIND_unknown		OMP_PROC_BIND_unknown
];		];
}		}

		// static and auto are C++ keywords so need a capital to disambiguate.
		kiranchandramohan (Author, Done): For the default clause in parallel we have used a prefix "def" to fix this issue. I think we need to standardize this. Would capitalizing the first letter be a reasonable workaround, since reserved keywords never start with a capital letter?
		clementval (Done): That's probably a good idea to have a standard way to do that. +1 for the first letter capitalized if it works.
		kiranchandramohan (Author, Done): I have filed an issue https://bugs.llvm.org/show_bug.cgi?id=47225 regarding StrEnumAttr not accepting reserved C++ keywords.
		Meinersbur (Not Done): IIUC, these are meant to be directly passed by the frontend to get the schedule mode: `getScheduleKind(clausearg.str())`. I.e., this would require the user to write `#pragma omp for schedule(Static)`, since `#pragma omp for schedule(static)` will give you `OMP_SCHEDULE_Default`. See D91643.
		bondhugula (Done): Please terminate with a period.
		def OMP_SCHEDULE_Static : ClauseVal<"Static", 2, 1> {}
		def OMP_SCHEDULE_Dynamic : ClauseVal<"Dynamic", 3, 1> {}
		def OMP_SCHEDULE_Guided : ClauseVal<"Guided", 4, 1> {}
		ftynse (Done): Nit: could we have a space after comma?
		def OMP_SCHEDULE_Auto : ClauseVal<"Auto", 5, 1> {}
		def OMP_SCHEDULE_Runtime : ClauseVal<"Runtime", 6, 1> {}
		def OMP_SCHEDULE_Default : ClauseVal<"Default", 7, 0> { let isDefault = 1; }
		Meinersbur (Not Done): To detect some invalid keyword, there should be some unknown constant as well. Otherwise the front-end must itself store a complete list of valid schedule arguments.

def OMPC_Schedule : Clause<"schedule"> {		def OMPC_Schedule : Clause<"schedule"> {
let clangClass = "OMPScheduleClause";		let clangClass = "OMPScheduleClause";
let flangClass = "OmpScheduleClause";		let flangClass = "OmpScheduleClause";
		let enumClauseValue = "ScheduleKind";
		let allowedClauseValues = [
		OMP_SCHEDULE_Static,
		OMP_SCHEDULE_Dynamic,
		OMP_SCHEDULE_Guided,
		OMP_SCHEDULE_Auto,
		OMP_SCHEDULE_Runtime,
		OMP_SCHEDULE_Default
		];
}		}

def OMPC_Ordered : Clause<"ordered"> {		def OMPC_Ordered : Clause<"ordered"> {
let clangClass = "OMPOrderedClause";		let clangClass = "OMPOrderedClause";
let flangClassValue = "ScalarIntConstantExpr";		let flangClassValue = "ScalarIntConstantExpr";
let isValueOptional = true;		let isValueOptional = true;
}		}
def OMPC_NoWait : Clause<"nowait"> {		def OMPC_NoWait : Clause<"nowait"> {
let clangClass = "OMPNowaitClause";		let clangClass = "OMPNowaitClause";
let flangClass = "OmpNowait";		let flangClass = "OmpNowait";
[103 unchanged lines not shown]
}		}
def OMPC_Allocate : Clause<"allocate"> {		def OMPC_Allocate : Clause<"allocate"> {
let clangClass = "OMPAllocateClause";		let clangClass = "OMPAllocateClause";
let flangClass = "OmpAllocateClause";		let flangClass = "OmpAllocateClause";
}		}
def OMPC_NonTemporal : Clause<"nontemporal"> {		def OMPC_NonTemporal : Clause<"nontemporal"> {
let clangClass = "OMPNontemporalClause";		let clangClass = "OMPNontemporalClause";
}		}

		def OMP_ORDER_concurrent : ClauseVal<"default",2,0> { let isDefault = 1; }
def OMPC_Order : Clause<"order"> {		def OMPC_Order : Clause<"order"> {
let clangClass = "OMPOrderClause";		let clangClass = "OMPOrderClause";
		let enumClauseValue = "OrderKind";
		let allowedClauseValues = [
		OMP_ORDER_concurrent
		];
}		}
def OMPC_Destroy : Clause<"destroy"> {		def OMPC_Destroy : Clause<"destroy"> {
let clangClass = "OMPDestroyClause";		let clangClass = "OMPDestroyClause";
}		}
def OMPC_Detach : Clause<"detach"> {		def OMPC_Detach : Clause<"detach"> {
let clangClass = "OMPDetachClause";		let clangClass = "OMPDetachClause";
}		}
def OMPC_Inclusive : Clause<"inclusive"> {		def OMPC_Inclusive : Clause<"inclusive"> {
[1,338 unchanged lines not shown]

mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h

	//===- OpenMPDialect.h - MLIR Dialect for OpenMP ----------------- C++ --===//			//===- OpenMPDialect.h - MLIR Dialect for OpenMP ----------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file declares the OpenMP dialect in MLIR.			// This file declares the OpenMP dialect in MLIR.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_			#ifndef MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_
	#define MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_			#define MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_

				#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
	#include "mlir/IR/Dialect.h"			#include "mlir/IR/Dialect.h"
	#include "mlir/IR/OpDefinition.h"			#include "mlir/IR/OpDefinition.h"
				#include "mlir/Interfaces/ControlFlowInterfaces.h"
				#include "mlir/Interfaces/SideEffectInterfaces.h"

	#include "mlir/Dialect/OpenMP/OpenMPOpsDialect.h.inc"			#include "mlir/Dialect/OpenMP/OpenMPOpsDialect.h.inc"
	#include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"			#include "mlir/Dialect/OpenMP/OpenMPOpsEnums.h.inc"

	#define GET_OP_CLASSES			#define GET_OP_CLASSES
	#include "mlir/Dialect/OpenMP/OpenMPOps.h.inc"			#include "mlir/Dialect/OpenMP/OpenMPOps.h.inc"

	#endif // MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_			#endif // MLIR_DIALECT_OPENMP_OPENMPDIALECT_H_

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

[9 unchanged lines not shown]

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#ifndef OPENMP_OPS #ifndef OPENMP_OPS

#define OPENMP_OPS #define OPENMP_OPS

include "mlir/IR/OpBase.td" include "mlir/IR/OpBase.td"

include "mlir/Interfaces/SideEffectInterfaces.td"

include "mlir/Interfaces/ControlFlowInterfaces.td"

include "mlir/Dialect/LLVMIR/LLVMOpBase.td"

include "mlir/Dialect/OpenMP/OmpCommon.td" include "mlir/Dialect/OpenMP/OmpCommon.td"

def OpenMP_Dialect : Dialect { def OpenMP_Dialect : Dialect {

let name = "omp"; let name = "omp";

let cppNamespace = "::mlir::omp"; let cppNamespace = "::mlir::omp";

} }

class OpenMP_Op<string mnemonic, list<OpTrait> traits = []> : class OpenMP_Op<string mnemonic, list<OpTrait> traits = []> :

Op<OpenMP_Dialect, mnemonic, traits>; Op<OpenMP_Dialect, mnemonic, traits>;

// Type which can be constraint accepting standard integers, indices and

// LLVM integer types.

def IntLikeType : AnyTypeOf<[AnyInteger, Index, LLVM_AnyInteger]>;

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// 2.6 parallel Construct // 2.6 parallel Construct

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// Possible values for the default clause // Possible values for the default clause

def ClauseDefaultPrivate : StrEnumAttrCase<"defprivate">; def ClauseDefaultPrivate : StrEnumAttrCase<"defprivate">;

def ClauseDefaultFirstPrivate : StrEnumAttrCase<"deffirstprivate">; def ClauseDefaultFirstPrivate : StrEnumAttrCase<"deffirstprivate">;

def ClauseDefaultShared : StrEnumAttrCase<"defshared">; def ClauseDefaultShared : StrEnumAttrCase<"defshared">;

[62 unchanged lines not shown]

let description = [{

terminator takes no operands. The terminator op returns control to the terminator takes no operands. The terminator op returns control to the

enclosing op. enclosing op.

}]; }];

let assemblyFormat = "attr-dict"; let assemblyFormat = "attr-dict";

} }

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// 2.9.2 Workshare Loop Construct

//===----------------------------------------------------------------------===//

def WsLoopOp : OpenMP_Op<"wsloop", [AttrSizedOperandSegments]> {

clementval (Done): Would it make sense to add the `DeclareOpInterfaceMethods<LoopLikeOpInterface>` trait since you added `lowerBound`, `upperBound` and `step`?

ftynse (Done): `LoopLikeOpInterface` is a bit of a misnomer. It has nothing to do with bounds, but instead registers the op to be processed by LICM. It will likely break OpenMP loops.

kiranchandramohan (Author, Done): Skipping since they are not applicable to OpenMP loops.

let summary = "workshare loop construct";

let description = [{

The workshare loop construct specifies that the iterations of the loop(s)

clementval (Done): Since it is now a loop should the `associated loops` be rephrased?

kiranchandramohan (Author, Done): Did a minor rephrasing.

will be executed in parallel by threads in the current context. These

iterations are spread across threads that already exist in the enclosing

parallel region.

The body region can contain any number of blocks. The region is terminated

by "omp.yield" instruction without operands.

```

omp.wsloop (%i1, %i2) = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {

%a = load %arrA[%i1, %i2] : memref<?x?xf32>

%b = load %arrB[%i1, %i2] : memref<?x?xf32>

%sum = addf %a, %b : f32

store %sum, %arrC[%i1, %i2] : memref<?x?xf32>

omp.yield

}

```

`private_vars`, `firstprivate_vars`, `lastprivate_vars` and `linear_vars`

arguments are variadic list of operands that specify the data sharing

attributes of the list of values. The `linear_step_vars` operand

additionally specifies the step for each associated linear operand. Note

that the `linear_vars` and `linear_step_vars` variadic lists should contain

the same number of elements.

The optional `schedule_val` attribute specifies the loop schedule for this

loop, determining how the loop is distributed across the parallel threads.

The optional `schedule_chunk_var` associated with this determines further

controls this distribution.

bondhugula (Done): This description is missing a customary example in the end. (Please use triple backticks to include an example.)

kiranchandramohan (Author, Done): Added an example. Note that the pretty printer and parser are not part of this patch, but I am still using the pretty syntax.

The optional `collapse_val` attribute specifies the number of loops which

are collapsed to form the worksharing loop.

The `nowait` attribute, when present, signifies that there should be no

implicit barrier at the end of the loop.

kiranchandramohan (Author, Done): Is this a single value or a list of values?

DavidTruby (Done): It's one step value per element in linear_vars, so a list of values. The two lists should always be the same length, but I don't think there's a way to enforce it here (it should be enforced later).

The optional `ordered_val` attribute specifies how many loops are associated

with the do loop construct.

kiranchandramohan (Author, Done): Is there a schedule modifier also?

DavidTruby (Done): I'm leaving the schedule modifier for a later patch, as it requires more changes to the OMP.td file. I've added a clarification on this to the commit message.

kiranchandramohan (Author, Done): This should be an attribute. "The parameter of the collapse clause must be a constant positive integer expression."

bondhugula (Done): Please use `Confined` and an `IntMinValue` argument to force this to be positive. It'll be automatically enforced and verified. As an example, please see AllocLikeOp in the standard dialect.

let arguments = (ins Variadic<Index>:$value,
                  Confined<OptionalAttr<I64Attr>, [IntMinValue<0>]>:$alignment);

The optional `order` attribute specifies which order the iterations of the

associate loops are executed in. Currently the only option for this

kiranchandramohan (Author, Done): This should also be an attribute. "The parameter of the ordered clause must be a constant positive integer expression if specified."

attribute is "concurrent".

}];

let arguments = (ins Variadic<IntLikeType>:$lowerBound,

Variadic<IntLikeType>:$upperBound,

Variadic<IntLikeType>:$step,

ftynse (Done): How does one terminate such loops?

kiranchandramohan (Author, Done): Added a yield terminator. Current usage will be an empty yield.

Variadic<AnyType>:$private_vars,

Variadic<AnyType>:$firstprivate_vars,

Variadic<AnyType>:$lastprivate_vars,

Variadic<AnyType>:$linear_vars,

Variadic<AnyType>:$linear_step_vars,

OptionalAttr<ScheduleKind>:$schedule_val,

Optional<AnyType>:$schedule_chunk_var,

Confined<OptionalAttr<I64Attr>, [IntMinValue<0>]>:$collapse_val,

OptionalAttr<UnitAttr>:$nowait,

Confined<OptionalAttr<I64Attr>, [IntMinValue<0>]>:$ordered_val,

OptionalAttr<OrderKind>:$order_val);

let regions = (region AnyRegion:$region);

}

def YieldOp : OpenMP_Op<"yield", [NoSideEffect, ReturnLike, Terminator,

HasParent<"WsLoopOp">]> {

let summary = "loop yield and termination operation";

let description = [{

"omp.yield" yields SSA values from the OpenMP dialect op region and

terminates the region. The semantics of how the values are yielded is

defined by the parent operation.

If "omp.yield" has any operands, the operands must match the parent

operation's results.

}];

let arguments = (ins Variadic<AnyType>:$results);

let assemblyFormat = [{ ( `(` $results^ `:` type($results) `)` )? attr-dict}];

}

//===----------------------------------------------------------------------===//

// 2.10.4 taskyield Construct // 2.10.4 taskyield Construct

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

def TaskyieldOp : OpenMP_Op<"taskyield"> { def TaskyieldOp : OpenMP_Op<"taskyield"> {

let summary = "taskyield construct"; let summary = "taskyield construct";

let description = [{ let description = [{

The taskyield construct specifies that the current task can be suspended The taskyield construct specifies that the current task can be suspended

in favor of execution of a different task. in favor of execution of a different task.

}]; }];

let assemblyFormat = "attr-dict"; let assemblyFormat = "attr-dict";

} }

ftynse (Not Done): What is the semantics of multiple blocks terminated with `omp.yield` in the loop body? (Regions are not necessarily single-exit).

kiranchandramohan (Author, Done): The OpenMP worksharing loop is a single-exit region, so multiple terminators are not expected. Should this be enforced through the verifier in this or a subsequent patch?

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// 2.13.7 flush Construct // 2.13.7 flush Construct

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

def FlushOp : OpenMP_Op<"flush"> { def FlushOp : OpenMP_Op<"flush"> {

let summary = "flush construct"; let summary = "flush construct";

let description = [{ let description = [{

The flush construct executes the OpenMP flush operation. This operation The flush construct executes the OpenMP flush operation. This operation

makes a thread’s temporary view of memory consistent with memory and makes a thread’s temporary view of memory consistent with memory and

enforces an order on the memory operations of the variables explicitly enforces an order on the memory operations of the variables explicitly

specified or implied. specified or implied.

}]; }];

ftynseUnsubmitted

Done

}

```

- The $private_vars, $firstprivate_vars, $lastprivate_vars and $linear_vars

+ The `private_vars`, `firstprivate_vars`, `lastprivate_vars` and `linear_vars`

parameters are a variadic list of variables that specify the data sharing

Nit, here and below: the $ sign only appears in ODS input, not in the generated documentation or code.


  let arguments = (ins Variadic<AnyType>:$varList);

ftynseUnsubmitted

Done

Nit: there are no "variables" in MLIR. In this case, you are likely referring to operand groups.


  let assemblyFormat = [{ ( `(` $varList^ `:` type($varList) `)` )? attr-dict}];
}

//===----------------------------------------------------------------------===//
// 2.16 master Construct
//===----------------------------------------------------------------------===//

def MasterOp : OpenMP_Op<"master"> {


mlir/test/Dialect/OpenMP/ops.mlir

func @omp_parallel_pretty(%data_var : memref<i32>, %if_cond : i1, %num_threads : si32, %allocator : si32) -> () {
  // CHECK omp.parallel if(%{{.*}}) num_threads(%{{.*}} : si32) private(%{{.*}} : memref<i32>) proc_bind(close)
  omp.parallel num_threads(%num_threads : si32) if(%if_cond: i1)
    private(%data_var : memref<i32>) proc_bind(close) {
    omp.terminator
  }

  return
}

		func @omp_wsloop(%lb : index, %ub : index, %step : index,
		%data_var : memref<i32>, %linear_var : si32, %chunk_var : si32) -> () {

		// CHECK: "omp.wsloop"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}})
		"omp.wsloop" (%lb, %ub, %step, %data_var) ({
		omp.yield
		}) {operand_segment_sizes = dense<[1,1,1,1,0,0,0,0,0]> : vector<9xi32>, collapse_val = 2, ordered_val = 1} :
		(index, index, index, memref<i32>) -> ()

		// CHECK: "omp.wsloop"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}})
		"omp.wsloop" (%lb, %lb, %ub, %ub, %step, %step, %data_var) ({
		omp.yield
		}) {operand_segment_sizes = dense<[2,2,2,1,0,0,0,0,0]> : vector<9xi32>, collapse_val = 2, ordered_val = 1} :
		(index, index, index, index, index, index, memref<i32>) -> ()


		// CHECK: "omp.wsloop"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}})
		"omp.wsloop" (%lb, %ub, %step, %data_var, %linear_var) ({
		omp.yield
		}) {operand_segment_sizes = dense<[1,1,1,0,0,0,1,1,0]> : vector<9xi32>, schedule_val = "Static"} :
		(index, index, index, memref<i32>, si32) -> ()

		// CHECK: "omp.wsloop"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}})
		"omp.wsloop" (%lb, %ub, %step, %data_var, %data_var, %data_var, %data_var, %linear_var, %chunk_var) ({
		omp.yield
		}) {operand_segment_sizes = dense<[1,1,1,1,1,1,1,1,1]> : vector<9xi32>, schedule_val = "Dynamic", collapse_val = 3, ordered_val = 2} :
		kiranchandramohanAuthorUnsubmitted

		Not Done

		Nit: extra line.

		(index, index, index, memref<i32>, memref<i32>, memref<i32>, memref<i32>, si32, si32) -> ()

		// CHECK: "omp.wsloop"(%{{.*}}, %{{.*}}, %{{.*}}, %{{.*}})
		"omp.wsloop" (%lb, %ub, %step, %data_var) ({
		omp.yield
		}) {operand_segment_sizes = dense<[1,1,1,1,0,0,0,0,0]> : vector<9xi32>, nowait, schedule_val = "Auto"} :
		(index, index, index, memref<i32>) -> ()


		return
		}
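
The `operand_segment_sizes` attribute in the tests above tells the generic form how the flat operand list is divided among the op's nine operand groups. Below is a minimal Python sketch of that split. Only `private_vars`, `firstprivate_vars`, `lastprivate_vars` and `linear_vars` are named in the op description quoted in this review; the other group names and their order are assumptions inferred from the test operands, and `split_operands` is a hypothetical helper, not part of MLIR.

```python
from typing import Dict, List

# Group order is an assumption inferred from the test cases in this diff.
GROUP_NAMES = [
    "lower_bound", "upper_bound", "step",
    "private_vars", "firstprivate_vars", "lastprivate_vars",
    "linear_vars", "linear_step_vars", "schedule_chunk_var",
]


def split_operands(operands: List[str],
                   segment_sizes: List[int]) -> Dict[str, List[str]]:
    """Slice a flat operand list into named groups by segment size."""
    if len(segment_sizes) != len(GROUP_NAMES):
        raise ValueError("expected one segment size per operand group")
    if sum(segment_sizes) != len(operands):
        raise ValueError("segment sizes must sum to the operand count")
    groups = {}
    start = 0
    for name, size in zip(GROUP_NAMES, segment_sizes):
        groups[name] = operands[start:start + size]
        start += size
    return groups


# Third test case: no private operand, one linear variable plus its step.
groups = split_operands(
    ["%lb", "%ub", "%step", "%data_var", "%linear_var"],
    [1, 1, 1, 0, 0, 0, 1, 1, 0],
)
print(groups["linear_vars"], groups["linear_step_vars"])
```

Under this reading, the second test case (`[2,2,2,1,0,0,0,0,0]`) supplies two lower bounds, two upper bounds and two steps, which matches its `collapse_val = 2`.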

Revision Contents

Diff 305508

llvm/include/llvm/Frontend/OpenMP/OMP.td

mlir/include/mlir/Dialect/OpenMP/OpenMPDialect.h

mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td

mlir/test/Dialect/OpenMP/ops.mlir
