This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Introduce scf.execute_region op
ClosedPublic

Authored by bondhugula on Mar 8 2020, 10:54 PM.

Details

Summary

Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282

execute_region enables one to inline a function without lowering out all
other higher-level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus keeps the benefit of
transforms on higher-level control flow ops available in the presence of
inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary, and functions won't
have to remain outlined until later than desired. Abstractions like
affine execute_regions and lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
Two potential early use cases are: (1) an early inliner (which
can inline functions by introducing execute_region ops), and (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.
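
As an illustrative sketch (not taken from the patch; names are made up, and the std-dialect br/cond_br/constant spellings of the time are assumed), an execute_region encapsulating its own CFG inside otherwise structured code could look like this, with scf.yield returning control and a value to the enclosing scope:

```mlir
func @select_value(%cond: i1) -> i64 {
  // The region below holds a small CFG of its own; control enters it
  // exactly once and leaves through scf.yield.
  %res = scf.execute_region -> i64 {
    cond_br %cond, ^bb1, ^bb2
  ^bb1:
    %c1 = constant 1 : i64
    br ^bb3(%c1 : i64)
  ^bb2:
    %c2 = constant 2 : i64
    br ^bb3(%c2 : i64)
  ^bb3(%arg: i64):
    scf.yield %arg : i64
  }
  return %res : i64
}
```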

Diff Detail

Event Timeline

bondhugula created this revision. Mar 8 2020, 10:54 PM

This patch won't work without D71961 (which unties return from FuncOp).

nicolasvasilache added inline comments.
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
981 ↗(On Diff #249030)

I'll repaste my comment that I don't believe was addressed:

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (i.e., that use type information to encode the semantics and delay SSA value creation until inlining), you need both arguments and capture.
Can this be designed and implemented so it serves today’s needs that are already more general than “just capture”?
bondhugula marked an inline comment as done. Mar 9 2020, 7:16 AM
bondhugula added inline comments.
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
981 ↗(On Diff #249030)

I'm not sure what you may need for Linalg, so I can't say how it may help there. For the things this op will at least help with, please see the commit summary or the discussion thread for more details - it is based on today's needs, but your 'today' may be very different from mine! :-) I do see the need for ops that take dimensional arguments and captures (and have regions), but their goals are very different from those of this op. I suspect you might be thinking that this op works at a higher level than it really does, so you may need a different op for what you have in mind.

I'll reformulate based on your description

lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining

Lambdas often allow both captures and arguments.
I can see immediate use cases for this op allowing captures + arguments and see advantages in having one op that does both, as opposed to duplication/splitting into, e.g.:

  1. one op "that can only capture",
  2. one op "that can only take arguments" and
  3. one op "that can do both".

Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (i.e., that use type information to encode the semantics and delay SSA value creation until inlining), you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely". Can you provide an example of what you mean?
It isn't clear to me why we would keep arguments with such an op instead of always canonicalizing towards eliminating them.

bondhugula added a comment (edited). Mar 9 2020, 9:34 PM

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (i.e., that use type information to encode the semantics and delay SSA value creation until inlining), you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely". Can you provide an example of what you mean?
It isn't clear to me why we would keep arguments with such an op instead of always canonicalizing towards eliminating them.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since River is working on making dataflow able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt? If anything, allowing the removal of trivial args (where it makes sense) to be a canonicalization on the execute_region op avoids pushing the responsibility onto clients producing the op to create it in that form initially. E.g., when inlining a call as in the initial use case, you would just throw the FuncOp's region as-is into an execute_region op (updating "return" terminators perhaps), and transfer over the arg list of the call to the execute_region op. Otherwise, they would have to do the arg rewriting themselves (maybe we can just have a helper function for that, though).
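
A sketch of the inlining flow being described, in the no-explicit-capture form (hypothetical callee @double; std-era constant/muli spellings and the op's eventual scf spelling are assumed): the FuncOp's body is moved into the region, the entry block argument is replaced by the call operand, and return becomes the region terminator:

```mlir
// Before: a call we want to inline.
%r = call @double(%a) : (i64) -> i64

// After: the callee body lives in an execute_region. The callee's
// entry block argument has been RAUW'd with the call operand %a, so
// the region itself takes no arguments.
%r = scf.execute_region -> i64 {
  %c2 = constant 2 : i64
  %d = muli %a, %c2 : i64
  scf.yield %d : i64
}
```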

I guess I'm trying to understand whether we expect to see code like this:

if (auto executeRegion = dyn_cast<ExecuteRegionOp>(op)) {
  if (executeRegion.hasExplicitCaptures())
    return; // Darn, can't handle it.
}

I expect that we won't have code like this, and instead what we'll see is generic use-def following passes that silently aren't smart enough to handle explicit captures (such as when applying local rewrite patterns) and will fail to optimize. So I think the real question is balancing:

  1. The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation
  2. The potential lost optimization opportunities due to failing (for whatever reason: oversight, pass ordering issues, ...) to run the canonicalization pass to put the op into the no-explicit-capture form.

Neither seems massively compelling, so starting with the more restricted form seems like a good choice. We can loosen it later if needed.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since River is working on making dataflow able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt?

I missed this, do you have a pointer?
I assume this wouldn't be zero cost / transparent though.

  1. The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation

I may be missing something, but isn't it just a direct RAUW? What is the cost here?

Since River is working on making dataflow able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt?

I think there is some communication gap here and perhaps different things being mixed. If you've explicitly captured something, you've already created a barrier: e.g., consider an explicitly captured dynamically shaped memref, which prevents a static shape from flowing in via a memref cast used from above; unless you replace the memref with the statically shaped one, you won't see the static shape for whatever analysis/transform. For the affine graybox, I had done a detailed analysis of the costs of just explicitly capturing memrefs (see, e.g., how it complicates dead dealloc removal):
https://github.com/polymage-labs/mlirx/blob/master/mlir/rfc/rfc-graybox.md#maintaining-memref-operandsarguments
There is no way around registering and implementing canonicalizations for what would have otherwise just worked. The advantage of explicitly capturing memrefs in the context of the graybox was that you don't have to look inside the op to see which memory is being accessed - and you don't want to, because it's part of a different polyhedral scope with its own symbols; so the downsides of explicit capture IMO are outweighed by how they simplify polyhedral passes. For execute_region, there is no such argument in favor of explicit captures. You just have to do a simple replaceAllUsesWith to propagate what you thought of as explicit captures (this is what I already do when I convert an affine.graybox to an execute_region in D72223).
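
A sketch of the memref_cast situation described above (hypothetical IR; std-era op spellings and the op's eventual scf spelling are assumed): with implicit capture, ordinary use-def chains let canonicalization see through the region boundary, while binding %dyn to a region argument instead would block that:

```mlir
%static = alloc() : memref<16xf32>
%dyn = memref_cast %static : memref<16xf32> to memref<?xf32>
scf.execute_region {
  %c0 = constant 0 : index
  // With implicit capture, canonicalization can rewrite this load to
  // use %static directly, recovering the static shape in the region.
  %v = load %dyn[%c0] : memref<?xf32>
  scf.yield
}
```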

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

@nicolasvasilache is that ok with you?

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
978 ↗(On Diff #249030)

Make the summary more descriptive.

979 ↗(On Diff #249030)

Indent the description by two spaces (instead of 4) for consistency with the rest of the file.

1025 ↗(On Diff #249030)

Can you add a verifier that the region doesn't have args?

Also, I'm not super familiar with ODS, but does this specification autogenerate a verifier that there are no operands, or does it just not check anything about the op's operands? If the latter, please change it so that the verifier checks that there are no operands to this op.

rriddle added inline comments. Mar 18 2020, 10:20 PM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
974 ↗(On Diff #249030)

This op comes before ExtractElementOp alphabetically.

984 ↗(On Diff #249030)

nit: MLIR functions (FuncOp) -> FuncOp

994 ↗(On Diff #249030)

This should be indented in an mlir code block

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗(On Diff #249030)

Use /// For top-level comments.

bondhugula marked 7 inline comments as done.

Address review comments.

Thanks for the reviews! Updated.

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

This will still need the patch on the ReturnOp to land. On a related note, I think we shouldn't move this op to the loop dialect unless the latter is renamed first.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1025 ↗(On Diff #249030)

ODS doesn't support regions yet, and so you are right that we'll need to verify for zero region arguments. (Added that as well as > 0 blocks check.) But for operands, with no operands in the ODS description, the auto-generation will mark the op with the ZeroOperands trait, and the latter's verifier will check for it.
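
For instance, the added verifier would reject a region whose entry block takes arguments (sketch, written with the op's eventual scf spelling; non-entry blocks may still carry block arguments):

```mlir
scf.execute_region {
^bb0(%arg0: i64):  // verifier error: the region must not take arguments
  scf.yield
}
```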

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗(On Diff #249030)

Sorry, I didn't understand. These aren't doc comments, but comments for the implementation. Should this entire comment paragraph, along with the "Ex:", be ///?

bondhugula edited the summary of this revision. (Show Details)Mar 19 2020, 4:16 AM
bondhugula edited the summary of this revision. (Show Details)
bondhugula edited the summary of this revision. (Show Details)
nicolasvasilache added a comment (edited). Mar 19 2020, 7:09 AM

Responding in bulk below:

@nicolasvasilache is that ok with you?

I am not opposed to moving forward and iterating on code as we learn more.
But I still haven't seen a compelling reason to disallow arguments.
I would like some concrete example that illustrates why allowing arguments is a bad idea.

We can loosen it later if there's a compelling need.

This seems at odds with this other statement:

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

  1. there are fundamental difficulties involved, in which case refusing arguments pushes concerns onto all consumers. Shouldn't the difficulty be factored out in one place?
  2. or it's a simple extension, in which case why not just allow arguments?

I am unclear if we are in case 1., 2. or something else. Which is it?

@mehdi_amini Can you provide an example of what you mean?

See linalg.generic and linalg.indexed_generic: both have region arguments that are derived from the op operands but are not necessarily the same SSA value
(e.g., there is an interleaved load/store or even loop IV creation).

Now I can see how to use this op in its current form for my particular purpose: I can just move the content of my region inside a new execute_region op as I lower.

But I don't think my questions have been answered so I'll ask again:

Lambdas often allow both captures and arguments.
...
Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?
bondhugula added a comment (edited). Mar 19 2020, 9:33 AM

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

  1. there are fundamental difficulties involved, in which case refusing arguments pushes concerns onto all consumers. Shouldn't the difficulty be factored out in one place?
  2. or it's a simple extension, in which case why not just allow arguments?

It's not a simple extension. There are major costs. Straightforward SSA dominance vs having to pass through arguments (explicit captures) is akin to "intraprocedural optimization" vs "a good part of the complexity involved in interprocedural optimization" -- the latter has been established to be more difficult than intraprocedural for the same given transformation.

Is there a fundamental reason to disallow arguments in your op?

Yes. I'm going to copy-paste the same thing from above, but with a few extra lines below - I'm not sure if you had read this differently, because the answer is pretty straightforward. "Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across the region boundary, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it." You'd have to reimplement nearly all canonicalizations on this op, from propagation of constants, to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc. For an example on just memref arguments, see the link upthread on grayboxes for the kind of complexities you'd have to deal with if you explicitly captured memrefs (over there, IMO, explicit captures just for memrefs are worth that cost, hence a new op, affine.graybox). Did you skip reading the in-between messages?

To conclude, I just don't see the benefits of explicit captures in a few specific cases outweighing the widespread / large-scale negative impact on all lower-level SSA optimizations (where lower-level here means the std dialect, the loop dialect, and to some extent also the affine dialect - you'd have execute_region in the presence of these dialects' ops *at least*).

Lambdas often allow both captures and arguments.

Yes: and they serve very different purposes and have different semantics as well.
Basically, it seems like using different constructs to model different concepts is fairly standard and undisputed.

Is there a fundamental reason to disallow arguments in your op?

Seems like this is adding extra complexity, but I haven't seen a reason to motivate it. This seems like a good enough reason to me?

As I mentioned before, why wouldn't a canonicalize pattern just eliminate all the operands? And if so why do we allow it in the first place?

Sure, the traditional, run-of-the-mill properties are true:

  • implicit captures preserve use-def chains
  • arguments break use-def chains and if you want similar optimizations you'll want inlining or some form of IPO.

I see the discussion above as conflating semantics with optimization.
Allowing your op to take arguments absolutely does not mean you have to use them for everything all the time, yet the argumentation seems to take that as a premise.
I think this is particularly clear in the following:

You'd have to reimplement nearly all canonicalizations on this op, from propagation of constants, to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc.

I don't see how this is true and why you'd have to reimplement anything in this list.
If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

You could also want to use arguments for a subset of the values, isolate their users, inline and then apply the remaining canonicalizations.
That would be perfectly fine too.

I still see no compelling reason to strictly forbid arguments in this op: if you want to enforce somewhere that everything is by-capture only, it's easy to verify that numArguments == 0.
OTOH, adding arguments is trivial and will be transparent wrt everything you mention above: if you don't want to use arguments, just don't use arguments.
Literally, if you added the possibility for your op to have arguments, your canonicalization test would not change.

Plainly forbidding arguments has a finality to it that I view as unnecessary.

If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

So we are actually on the same page as far as the benefits of implicit captures go? I was under the impression that you were missing those, but you just want the option to use explicit captures when you really have such a use case. But with explicit captures comes the question of how the arguments obtain their values, and you can't have custom behavior there, because the lowering would need to know exactly how those arguments obtain their values or how operands bind to arguments - e.g., that there is a 1:1 match between the op's operands and the region's arguments. And if there is a 1:1 match, we are back to the question of why not just do a RAUW and eliminate those arguments in the first place. OTOH, if your arguments obtain values from operands or elsewhere in a more custom way, then the lowering would need to be aware of it in an unambiguous way, and you'd have to design/evaluate that. As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures), and to evaluate an explicit-capture option by first defining what exactly the capture-argument semantics are for the use case, how it impacts the lowering, and, mechanically, what the new syntax of the op would look like. (It is just two lines to knock off in the verifier if you want the op to take region arguments.)

@nicolasvasilache the real thing to evaluate for your linalg use case is the benefits of "having a separate op that could readily lower to execute_region when the time is right" vis-a-vis "adding explicit arg semantics to execute region op itself". Note that different client/higher level use cases may want different semantics with their explicit captures (and how the region arguments obtain their values), and they could benefit by modeling/handling those explicit captures on their own op (and dealing with the custom canonicalizations there) before lowering to execute_region.

So we are actually on the same page as far as the benefits of implicit captures goes?
...
As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures).

Yes no argument there, I was unclear if I missed something fundamental that makes it strictly necessary to forbid explicit arguments.
I am sympathetic with the arguments that "it is simpler" + "you won't need it in practice".

Since I seem to be the only one who would like a little more flexibility but since I can also easily work around this, let's land this and iterate later, if necessary.

bondhugula retitled this revision from Introduce std.execute_region op to [MLIR] Introduce std.execute_region op. Mar 20 2020, 8:49 PM

Ping reviewers @rriddle, @silvas - could you please see the tip of the threads? Comments on the patch code itself have been addressed.

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

silvas accepted this revision. Mar 24 2020, 1:35 PM
This revision is now accepted and ready to land. Mar 24 2020, 1:35 PM

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

Sounds good to me. What should the new terminator be called - std.yield?

rriddle accepted this revision. Mar 24 2020, 8:18 PM

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
933 ↗(On Diff #251338)

nit: wrap this in ``.

944 ↗(On Diff #251338)

nit: Ex: -> Example

947 ↗(On Diff #251338)

nit: Don't indent inside of the mlir code block.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗(On Diff #249030)

Yeah for consistency, we use /// everywhere.

mlir/test/IR/core-ops.mlir
605 ↗(On Diff #251338)

nit: Don't check the pred comment.

bondhugula marked 6 inline comments as done. Mar 24 2020, 10:15 PM

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not a trivial couple of lines, because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. (On another note, the YieldOp name is used by both the Loop and Linalg dialects without a namespace qualifier, which would cause a conflict.)

bondhugula marked an inline comment as done. Mar 24 2020, 11:35 PM
bondhugula edited the summary of this revision. (Show Details)

Address review comments; introduce std.yield

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not trivial because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. Also, the YieldOp name is used by both the Loop and Linalg dialects without a namespace qualifier, which causes a conflict and requires many updates. I've gone ahead and done those anyway. PTAL.

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

That makes sense - it's a separate patch that requires a discussion and review in itself.

Take out any yield op changes. Rebase.

This op will have to be moved to the right dialect once the std dialect split completes - most likely scf.

Herald added a project: Restricted Project.
ftynse added a subscriber: ftynse. May 4 2021, 7:59 AM

Hi Uday,

do you think it is possible to move this to the SCF dialect and use scf.yield instead of return?

This should move into the scf dialect and can be respun when the need arises.

bondhugula edited the summary of this revision. (Show Details) Jun 17 2021, 10:14 PM

Rebase on upstream tip. Move op to SCF dialect.

Drop duplicate attr dict parsing. Fix stale comment.

bondhugula edited the summary of this revision. (Show Details)

Update revision summary

bondhugula retitled this revision from [MLIR] Introduce std.execute_region op to [MLIR] Introduce scf.execute_region op. Jun 17 2021, 11:57 PM
ftynse accepted this revision. Jun 18 2021, 1:05 AM
bondhugula edited the summary of this revision. (Show Details)

Update commit summary - fix revision number.

This revision was landed with ongoing or failed builds. Jun 18 2021, 3:10 AM
This revision was automatically updated to reflect the committed changes.
mehdi_amini added inline comments. Jun 21 2021, 12:35 PM
mlir/include/mlir/Dialect/SCF/SCFOps.td
114

Seems like a canonicalization could be that it would return an SSA value defined in the enclosing region.

bondhugula marked an inline comment as done. Jul 9 2021, 12:09 AM
bondhugula added inline comments.
mlir/include/mlir/Dialect/SCF/SCFOps.td
114

Did you mean moving the slice that generates the yield values to the enclosing region (if they were inside the scf.execute_region)? This can't be done in O(1) time in general, nor are the utilities that allow one to do that available in the IR libraries - it could be part of Transforms/ though.

mehdi_amini added inline comments. Jul 9 2021, 9:36 AM
mlir/include/mlir/Dialect/SCF/SCFOps.td
114

I mean that if you have:

%value = ...
%execute_results:2 = scf.execute_region {
  ...
  %x = ...
   ...
  scf.yield %x, %value
}

Here, %execute_results#1 can be RAUW'd with %value, and the code turned into:

%value = ...
%execute_results = scf.execute_region {
  ...
  %x = ...
   ...
  scf.yield %x
}
bondhugula marked 2 inline comments as done. Jul 9 2021, 4:59 PM
bondhugula added inline comments.
mlir/include/mlir/Dialect/SCF/SCFOps.td
114

Okay, sure. (This is completely different from what I understood from your statement. Perhaps better stated as: "... canonicalization in the situation where it's returning a value defined in the enclosing region".)