This is an archive of the discontinued LLVM Phabricator instance.

Differential D75837

[MLIR] Introduce scf.execute_region op
ClosedPublic

Authored by bondhugula on Mar 8 2020, 10:54 PM.

Details

Reviewers

rriddle
silvas
mehdi_amini
nicolasvasilache
mravishankar
antiagainst
herhut
ftynse

Commits

rG18c8c934d858: [MLIR] Introduce scf.execute_region op

Summary

Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282

execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. Transforms on higher level control
flow ops thus remain available in the presence of the inlined calls, and
the inlined calls continue to benefit from propagation of SSA values
across their top boundary. Functions won’t have to remain outlined until
later than desired. Abstractions like affine execute_regions and lambdas
with implicit captures could be lowered to this without first lowering
out structured loops/ifs or outlining. Two potential early use cases are:
(1) an early inliner (which can inline functions by introducing
execute_region ops), (2) lowering of an affine.execute_region, which
cleanly maps to an scf.execute_region when going from the affine dialect
to the scf dialect.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bondhugula created this revision.Mar 8 2020, 10:54 PM

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2020, 10:54 PM

Herald added subscribers: llvm-commits, Joonsoo, liufengdb and 11 others. · View Herald Transcript

Harbormaster failed remote builds in B48515: Diff 249030!Mar 8 2020, 11:25 PM

This patch won't work without D71961 (which unties return from FuncOp).

bondhugula added reviewers: rriddle, silvas, mehdi_amini.Mar 9 2020, 12:03 AM

nicolasvasilache added a reviewer: nicolasvasilache.Mar 9 2020, 5:16 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td

981 ↗

(On Diff #249030)

I'll repaste my comment that I don't believe was addressed:

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.
Can this be designed and implemented so it serves today’s needs that are already more general than “just capture”?

bondhugula marked an inline comment as done.Mar 9 2020, 7:16 AM

bondhugula added inline comments.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
981 ↗	(On Diff #249030)	I'm not sure what you may need for LinAlg; so I can't say how it may help there. For the things this op will at least help with, please see the commit summary or the discussion thread for more details - it is based on today's needs but your 'today' may be very different from mine! :-) I do see the need for ops that take dimensional arguments and captures (and with regions), but their goals are very different from those of this op. I suspect you might be thinking that this op works at a higher level than it really does - so you may need a different op for what you have in mind.

I'll reformulate based on your description

lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining

Lambdas often allow both captures and arguments.
I can see immediate use cases for this op allowing captures + arguments and see advantages in having one op that does both, as opposed to duplication/splitting into, e.g.:

one op "that can only capture",
one op "that can only take arguments" and
one op "that can do both".

Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely"? Can you provide an example of what you?
It isn't clear to me why we would keep arguments with such an op instead of always canonicalizing towards eliminating them.

In D75837#1913781, @mehdi_amini wrote:

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely"? Can you provide an example of what you?
It isn't clear to me why we would keep arguments with such an op instead of always canonicalizing towards eliminating them.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt? If anything, making the removal of trivial args (where it makes sense) a canonicalization on the execute_region op avoids pushing the responsibility onto clients producing the op to create it in that form initially. E.g. when inlining a call as in the initial use case, you would just throw the FuncOp's region as-is into an execute_region op (updating "return" terminators perhaps), and transfer over the arg list of the call to the execute_region op. Otherwise, they would have to do the arg rewriting themselves (maybe we can just have a helper function for that though).

I guess I'm trying to understand whether we expect to see code like this:

if (auto executeRegion = dyn_cast<ExecuteRegionOp>(op)) {
  if (executeRegion.hasExplicitCaptures()) {
    break; // Darn, can't handle it.
  }
}

I expect that we won't have code like this, and instead what we'll see is generic use-def following passes that silently aren't smart enough to handle explicit captures (such as when applying local rewrite patterns) and will fail to optimize. So I think the real question is balancing:

The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation
The potential lost optimization opportunities due to failing (for whatever reason; oversight, pass ordering issues, ...) to run the canonicalization pass to put it into the no-explicit-capture form.

Neither seems massively compelling, so starting with the more restricted form seems like a good choice. We can loosen it later if needed.

In D75837#1927525, @silvas wrote:

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt?

I missed this, do you have a pointer?
I assume this wouldn't be zero cost / transparent though.

The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation

I may be missing something, but isn't it just a direct RAUW? What is the cost here?

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt?

I think there is some communication gap here and perhaps different things being mixed. If you've explicitly captured something, you've already created a barrier: e.g., consider a dynamically shaped memref explicitly captured that prevents a static shape from flowing in via a memref cast used from above; unless you replace the memref with the statically shaped one, you won't see the static shape for whatever analysis/transform. For the affine graybox, I had done a detailed analysis of the costs of just explicitly capturing memrefs (see e.g. how it complicates dead dealloc removal):
https://github.com/polymage-labs/mlirx/blob/master/mlir/rfc/rfc-graybox.md#maintaining-memref-operandsarguments
There is no way around registering and implementing canonicalizations for what would have otherwise worked. The advantage of explicitly capturing memrefs in the context of the graybox was that you don't have to look inside the op to see which memory is being accessed and you don't want to because it's part of a different polyhedral scope with its own symbols; so the downsides of explicit capture IMO are outweighed by how they simplify polyhedral passes. For execute_region, there is no such argument in favor of explicit captures. You just have to do a simple replaceAllUsesWith to propagate what you thought of as explicit captures (this is what I already do when I convert an affine.graybox to an execute_region in D72223).
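
The replaceAllUsesWith step described in this comment can be illustrated on a toy model. What follows is a minimal C++ sketch under stated assumptions, not MLIR's API: `ToyOp`, `ToyRegionOp`, and `forwardExplicitCaptures` are hypothetical names, SSA values are modeled as plain strings, and operands are assumed to bind 1:1 to region arguments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy IR (hypothetical, not MLIR): a value is just its name, and an op
// only records which values it uses.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
};

// Toy region-holding op with explicit captures. Assumption of this
// sketch: operands[i] is bound to regionArgs[i].
struct ToyRegionOp {
  std::vector<std::string> operands;
  std::vector<std::string> regionArgs;
  std::vector<ToyOp> body;
};

// Replaces each use of a region argument in the body with the outer value
// bound to it, then drops the operands and arguments. Returns false and
// leaves the op untouched if the binding is not 1:1.
bool forwardExplicitCaptures(ToyRegionOp &op) {
  if (op.operands.size() != op.regionArgs.size())
    return false;
  for (ToyOp &inner : op.body) {
    for (std::string &use : inner.operands) {
      for (std::size_t i = 0; i < op.regionArgs.size(); ++i) {
        if (use == op.regionArgs[i]) {
          use = op.operands[i]; // rewrite this use exactly once
          break;
        }
      }
    }
  }
  op.operands.clear();
  op.regionArgs.clear();
  return true;
}
```

After the rewrite, every use inside the body refers directly to a value from the enclosing scope, which is the implicit-capture form the op requires.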

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

@nicolasvasilache is that ok with you?

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
978 ↗	(On Diff #249030)	make the summary more descriptive.
979 ↗	(On Diff #249030)	Indent description by two spaces (instead of 4) for consistency with the rest of the file
1025 ↗	(On Diff #249030)	Can you add a verifier that the region doesn't have args? Also, I'm not super familiar with ODS, but does this specification autogenerate a verifier that there are no operands, or does it just not check anything about the op's operands? If the latter, please change it so that the verifier checks that there are no operands to this op.

rriddle added inline comments.Mar 18 2020, 10:20 PM

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
974 ↗	(On Diff #249030)	This op comes before ExtractElementOp alphabetically.
984 ↗	(On Diff #249030)	nit: MLIR functions (FuncOp) -> FuncOp
994 ↗	(On Diff #249030)	This should be indented in an mlir code block
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗	(On Diff #249030)	Use /// For top-level comments.

Address review comments.

Thanks for the reviews! Updated.

In D75837#1930465, @silvas wrote:

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

This will still need the patch on the ReturnOp to land. On a related note, I think we shouldn't move this op to the loop dialect unless the latter is renamed first.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1025 ↗	(On Diff #249030)	ODS doesn't support regions yet, and so you are right that we'll need to verify for zero region arguments. (Added that as well as > 0 blocks check.) But for operands, with no operands in the ODS description, the auto-generation will mark the op with the ZeroOperands trait, and the latter's verifier will check for it.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗	(On Diff #249030)	Sorry, I didn't understand. These aren't doc comments, but for the implementation. This entire comment para along with the "Ex:" should be ///?
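
The verifier checks discussed for Ops.td line 1025 (no operands, at least one block, no region arguments) can be sketched on a toy model. This is a hedged C++ illustration, not the code from the patch: `ToyBlock`, `ToyExecuteRegionOp`, `verifyToyExecuteRegion`, and the error strings are hypothetical. It assumes that "region arguments" means the arguments of the entry block, so later blocks may still take arguments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not MLIR classes).
struct ToyBlock {
  std::size_t numArguments;
};

struct ToyExecuteRegionOp {
  std::size_t numOperands;
  std::vector<ToyBlock> blocks; // blocks.front() is the entry block
};

// Returns an empty string if the op is well formed, otherwise a
// diagnostic message describing the first violated constraint.
std::string verifyToyExecuteRegion(const ToyExecuteRegionOp &op) {
  if (op.numOperands != 0)
    return "op must have zero operands";
  if (op.blocks.empty())
    return "region must have at least one block";
  // Only the entry block is checked: its arguments are the region's
  // arguments. Other blocks may carry arguments for internal branches.
  if (op.blocks.front().numArguments != 0)
    return "region cannot have any arguments";
  return "";
}
```

In the patch itself the zero-operand check is reported to come from an auto-generated trait verifier rather than hand-written code; the sketch folds all three checks into one function only for readability.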

bondhugula edited the summary of this revision. (Show Details)Mar 19 2020, 4:16 AM

bondhugula edited the summary of this revision. (Show Details)

Harbormaster failed remote builds in B49731: Diff 251338!Mar 19 2020, 4:50 AM

Responding in bulk below:

@nicolasvasilache is that ok with you?

I am not opposed to moving forward and iterating on code as we learn more.
But I still haven't seen a compelling reason to disallow arguments.
I would like some concrete example that illustrates why allowing arguments is a bad idea.

We can loosen it later if there's a compelling need.

This seems at odds with this other statement:

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

1. there are fundamental difficulties involved, in which case refusing arguments pushes concerns to all consumers. Shouldn't the difficulty be factored out in one place?
2. or it's a simple extension, in which case why not just allow arguments?

I am unclear if we are in case 1., 2. or something else. Which is it?

@mehdi_amini Can you provide an example of what you?

See linalg.generic and linalg.indexed_generic, both have region arguments that are derived from the op operands, but are not necessarily the same SSA value
(e.g there is an interleaved load/store or even loop IV creation).

Now I can see how to use this op in its current form for my particular purpose: I can just move the content of my region inside a new execute_region op as I lower.

But I don't think my questions have been answered so I'll ask again:

Lambdas often allow both captures and arguments.
...
Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

1. there are fundamental difficulties involved, in which case refusing arguments pushes concerns to all consumers. Shouldn't the difficulty be factored out in one place?

2. or it's a simple extension, in which case why not just allow arguments?

It's not a simple extension. There are major costs. Straightforward SSA dominance vs having to pass through arguments (explicit captures) is akin to "intraprocedural optimization" vs "a good part of the complexity involved in interprocedural optimization" -- the latter has been established to be more difficult than intraprocedural for the same given transformation.

Is there a fundamental reason to disallow arguments in your op?

Yes, I'm going to copy paste the same thing from above, but with a few extra lines below - I'm not sure if you had read this differently because the answer is pretty straightforward. "Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across the region boundary, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it." You'd have to reimplement nearly all canonicalizations on this op, from propagation of constants to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc. For an example on just memref arguments, see the link upthread on grayboxes for the kind of complexities you'd have to deal with if you explicitly captured memrefs (over there IMO explicit captures just for memrefs are worth those costs and hence a new op affine.graybox). Did you skip reading the in-between messages?

To conclude, I just don't see the benefits of explicit captures in a few specific cases outweighing the widespread / large scale negative impact on all lower level SSA optimizations (where low-level here is std dialect, loop dialect, and to some extent also the affine dialect - you'd have execute_region in the presence of these dialect ops *at least*).

Lambdas often allow both captures and arguments.

Yes: and they serve very different purpose, they have different well.
Basically it seems like using different constructs to model different concepts is fairly standard and undisputed.

Is there a fundamental reason to disallow arguments in your op?

Seems like this is adding extra complexity, but I haven't seen a reason to motivate it. This seems like a good enough reason to me?

As I mentioned before, why wouldn't a canonicalize pattern just eliminate all the operands? And if so why do we allow it in the first place?

Sure, the traditional, run-of-the-mill properties are true:

implicit captures preserve use-def chains
arguments break use-def chains and if you want similar optimizations you'll want inlining or some form of IPO.

I see the discussion above as conflating semantics with optimization.
Allowing your op to take arguments absolutely does not mean you have to use them for everything all the time, yet the argumentation seems to take that as a premise.
I think this is particularly clear in the following:

You'd have to reimplement nearly all canonicalizations on this op from propagation of constants, to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc..

I don't see how this is true and why you'd have to reimplement anything in this list.
If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

You could also want to use arguments for a subset of the values, isolate their users, inline and then apply the remaining canonicalizations.
That would be perfectly fine too.

I still see no compelling reason to strictly forbid arguments in this op: if you want to enforce somewhere that everything is by-capture only, it's easy to verify that numArguments == 0.
OTOH adding arguments is trivial and will be transparent wrt everything you mention above: if you don't want to use arguments just don't use arguments.
Literally, if you added the possibility for your op to have arguments, your canonicalization test would not change.

Plainly forbidding arguments has a finality to it that I view as unnecessary.

If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

So we are actually on the same page as far as the benefits of implicit captures goes? I was under the impression that you were missing those, but you just want the option to use explicit captures on this when you really have such a use case -- but then with the explicit captures comes the question of how the arguments obtain their values and you can't have custom behavior there because the lowering would need to know how exactly those arguments obtain those values or how operands bind to arguments, for eg. that there is a 1:1 match between its operands and arguments. And if there is a 1:1 match, we are back to the question why not just do a RAUW and eliminate those arguments in the first place? OTOH, if your arguments obtain values from operands or elsewhere in a more custom way, then the lowering would need to be aware of it in an unambiguous way, and you'd have to design/evaluate that. As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures), and evaluate an explicit capture option by first defining what exactly the capture argument semantics are for the use case, how it impacts the lowering, and mechanically, what the new syntax of the op would look like. (It is just two lines to knock off in the verifier if you want the op to take region arguments.)

@nicolasvasilache the real thing to evaluate for your linalg use case is the benefits of "having a separate op that could readily lower to execute_region when the time is right" vis-a-vis "adding explicit arg semantics to execute region op itself". Note that different client/higher level use cases may want different semantics with their explicit captures (and how the region arguments obtain their values), and they could benefit by modeling/handling those explicit captures on their own op (and dealing with the custom canonicalizations there) before lowering to execute_region.

So we are actually on the same page as far as the benefits of implicit captures goes?
...
As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures).

Yes no argument there, I was unclear if I missed something fundamental that makes it strictly necessary to forbid explicit arguments.
I am sympathetic with the arguments that "it is simpler" + "you won't need it in practice".

Since I seem to be the only one who would like a little more flexibility but since I can also easily work around this, let's land this and iterate later, if necessary.

bondhugula retitled this revision from Introduce std.execute_region op to [MLIR] Introduce std.execute_region op.Mar 20 2020, 8:49 PM

Ping reviewers @rriddle, @silvas - could you please see the tip of the threads? Comments on the patch code itself have been addressed.

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

silvas accepted this revision.Mar 24 2020, 1:35 PM

This revision is now accepted and ready to land.Mar 24 2020, 1:35 PM

In D75837#1940022, @silvas wrote:

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

Sounds good to me. What should the new terminator be called - std.yield?

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
933 ↗	(On Diff #251338)	nit: wrap this in ``.
944 ↗	(On Diff #251338)	nit: Ex: -> Example
947 ↗	(On Diff #251338)	nit: Don't indent inside of the mlir code block.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388 ↗	(On Diff #249030)	Yeah for consistency, we use /// everywhere.
mlir/test/IR/core-ops.mlir
605 ↗	(On Diff #251338)	nit: Don't check the pred comment,

In D75837#1940549, @rriddle wrote:

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not a trivial couple of lines because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. (On another note, the YieldOp name is used by both Loop and Linalg dialect without a namespace qualifier that would cause a conflict.)

bondhugula marked an inline comment as done.Mar 24 2020, 11:35 PM

Address review comments; introduce std.yield

Herald added a reviewer: mravishankar. · View Herald TranscriptMar 24 2020, 11:36 PM

Herald added a reviewer: antiagainst. · View Herald Transcript

Herald added a reviewer: herhut. · View Herald Transcript

Herald added subscribers: bader, csigg. · View Herald Transcript

Harbormaster completed remote builds in B50366: Diff 252508.Mar 25 2020, 12:30 AM

In D75837#1940549, @rriddle wrote:

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not trivial because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. Also, the YieldOp name is used by both Loop and Linalg dialect without a namespace qualifier which causes a conflict and requires many updates. I've anyway gone ahead and done those. PTAL.

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

In D75837#1947981, @silvas wrote:

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

That makes sense - it's a separate patch that requires a discussion and review in itself.

Take out any yield op changes. Rebase.

Harbormaster failed remote builds in B50847: Diff 253396!Mar 28 2020, 10:00 PM

bondhugula added a parent revision: D71961: [MLIR] Free ReturnOp from being restricted to a FuncOp.Mar 28 2020, 10:05 PM

bondhugula edited the summary of this revision. (Show Details)

bondhugula added a child revision: D72223: [MLIR] Introduce affine.execute_region op.Apr 18 2020, 1:31 PM

This op will have to be moved to the right dialect once the std dialect split completes - mostly scf.

Herald added a reviewer: bollu. · View Herald TranscriptFeb 8 2021, 4:30 PM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 5 others. · View Herald Transcript

Hi Uday,

do you think it is possible to move this to the SCF dialect and use scf.yield instead of return?

Herald added subscribers: dcaballe, cota. · View Herald TranscriptMay 4 2021, 7:59 AM

This should move into the scf dialect and can be respun when the need arises.

bondhugula edited the summary of this revision. (Show Details)Jun 17 2021, 10:14 PM

bondhugula removed a parent revision: D71961: [MLIR] Free ReturnOp from being restricted to a FuncOp.Jun 17 2021, 10:16 PM

bondhugula removed a reviewer: bollu.

Herald added a subscriber: bollu. · View Herald TranscriptJun 17 2021, 10:17 PM

Rebase on upstream tip. Move op to SCF dialect.

Drop duplicate attr dict parsing. Fix stale comment.

Update revision summary

bondhugula retitled this revision from [MLIR] Introduce std.execute_region op to [MLIR] Introduce scf.execute_region op.Jun 17 2021, 11:57 PM

ftynse accepted this revision.Jun 18 2021, 1:05 AM

Update commit summary - fix revision number.

This revision was landed with ongoing or failed builds.Jun 18 2021, 3:10 AM

Closed by commit rG18c8c934d858: [MLIR] Introduce scf.execute_region op (authored by bondhugula). · Explain Why

This revision was automatically updated to reflect the committed changes.

bondhugula added a commit: rG18c8c934d858: [MLIR] Introduce scf.execute_region op.

Harbormaster completed remote builds in B109887: Diff 352953.Jun 18 2021, 6:24 PM

bondhugula removed a child revision: D72223: [MLIR] Introduce affine.execute_region op.Jun 18 2021, 7:29 PM

mehdi_amini added inline comments.Jun 21 2021, 12:35 PM

mlir/include/mlir/Dialect/SCF/SCFOps.td
114	Seems like a canonicalization could be that it would return an SSA value defined in the enclosing region.

bondhugula marked an inline comment as done.Jul 9 2021, 12:09 AM

bondhugula added inline comments.

mlir/include/mlir/Dialect/SCF/SCFOps.td
114	Did you mean moving the slice that generates the yield values to the enclosing region (if they were inside the scf.execute_region)? This can't be done in O(1) time in general nor are the utilities that allow one to do that available in IR libraries - can be part of `Transforms/` though.

mehdi_amini added inline comments.Jul 9 2021, 9:36 AM

mlir/include/mlir/Dialect/SCF/SCFOps.td
114	I mean that if you have:

    %value = ...
    %execute_results:2 = scf.execute_region {
      ...
      %x = ...
      ...
      scf.yield %x, %value
    }

Here `%execute_results#1` can be RAUW with `%value` and the code turned to:

    %value = ...
    %execute_results = scf.execute_region {
      ...
      %x = ...
      ...
      scf.yield %x
    }

bondhugula marked 2 inline comments as done.Jul 9 2021, 4:59 PM

bondhugula added inline comments.

mlir/include/mlir/Dialect/SCF/SCFOps.td
114	Okay, sure. (This is completely different than what I understood from your statement. Perhaps better stated as: "... canonicalization in the situation where it's returning a value defined in the enclosing region".)
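
The canonicalization agreed on in this inline thread (forwarding a yielded value that is defined in the enclosing region) can be sketched on a toy model. This is a minimal C++ illustration under stated assumptions, not an MLIR rewrite pattern: `ToyYieldingRegion` and `forwardOuterYields` are hypothetical names, values are plain strings, and the region is assumed to have a single `scf.yield`-like terminator whose operands line up with the op's results.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical, not MLIR) of an execute_region-like op.
struct ToyYieldingRegion {
  std::vector<std::string> results;    // names of the op's results
  std::vector<std::string> yielded;    // terminator operands, same order
  std::set<std::string> definedInside; // values defined within the region
};

// For each yielded value that is NOT defined inside the region, records
// that the matching result can be replaced by that outer value, and prunes
// the result and the yield operand. Returns the result -> value map.
std::map<std::string, std::string> forwardOuterYields(ToyYieldingRegion &op) {
  std::map<std::string, std::string> replacements;
  if (op.results.size() != op.yielded.size())
    return replacements; // malformed input: do nothing
  std::vector<std::string> keptResults, keptYielded;
  for (std::size_t i = 0; i < op.yielded.size(); ++i) {
    if (op.definedInside.count(op.yielded[i]) != 0) {
      // Value is computed in the region, so the result must stay.
      keptResults.push_back(op.results[i]);
      keptYielded.push_back(op.yielded[i]);
    } else {
      // Value comes from the enclosing region: users of the result can
      // use it directly (the RAUW in the comment above).
      replacements[op.results[i]] = op.yielded[i];
    }
  }
  op.results = keptResults;
  op.yielded = keptYielded;
  return replacements;
}
```

A region with several blocks may have more than one terminator; handling that case would presumably require every terminator to yield the same outer value at a given position, which this sketch does not attempt.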

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

SCF/

SCFOps.td

68 lines

lib/

Dialect/

SCF/

SCF.cpp

59 lines

test/

Dialect/

SCF/

canonicalize.mlir

30 lines

invalid.mlir

13 lines

ops.mlir

25 lines

utils/

vim/

syntax/

mlir.vim

6 lines

Diff 352958

mlir/include/mlir/Dialect/SCF/SCFOps.td

def ConditionOp : SCF_Op<"condition",

  let assemblyFormat =
      [{ `(` $condition `)` attr-dict ($args^ `:` type($args))? }];

  // Override the default verifier, everything is checked by traits.
  let verifier = ?;
}

//===----------------------------------------------------------------------===//
// ExecuteRegionOp
//===----------------------------------------------------------------------===//

def ExecuteRegionOp : SCF_Op<"execute_region"> {
  let summary = "operation that executes its region exactly once";
  let description = [{
    The `execute_region` operation executes the region held exactly once. The op
    cannot have any operands, nor does its region have any arguments. All SSA
    values that dominate the op can be accessed inside. The op's region can have
    multiple blocks and the blocks can have terminators the same way as FuncOp.
    The values returned from this op's region define the op's results. The op
    primarily provides control flow encapsulation and isolation from a parent
    op's control flow restrictions if any; for example, it allows representation
    of inlined calls in the inside of structured control flow ops with
    restrictions like affine.for/if, scf.for/if ops, and thus the optimization
    of IR in such a mixed form.

    Example:

    ```mlir
    scf.for %i = 0 to 128 step %c1 {
      %y = scf.execute_region -> i32 {
        %x = load %A[%i] : memref<128xi32>
        scf.yield %x : i32
      }
    }

    affine.for %i = 0 to 100 {
      "foo"() : () -> ()
      %v = scf.execute_region -> i64 {
        cond_br %cond, ^bb1, ^bb2

      ^bb1:
        %c1 = constant 1 : i64
        br ^bb3(%c1 : i64)

      ^bb2:
        %c2 = constant 2 : i64
        br ^bb3(%c2 : i64)

      ^bb3(%x : i64):
        scf.yield %x : i64
      }
      "bar"(%v) : (i64) -> ()
    }
    ```
  }];

  let results = (outs Variadic<AnyType>);

  let regions = (region AnyRegion:$region);

  // TODO: If the parent is a func like op (which would be the case if all other
  // ops are from the std dialect), the inliner logic could be readily used to
  // inline.
  let hasCanonicalizer = 0;
mehdi_amini: Seems like a canonicalization could be that it would return an SSA value defined in the enclosing region.

bondhugula (author): Did you mean moving the slice that generates the yield values to the enclosing region (if they were inside the scf.execute_region)? This can't be done in O(1) time in general nor are the utilities that allow one to do that available in IR libraries - can be part of `Transforms/` though.

mehdi_amini: I mean that if you have:

```
%value = ...
%execute_results:2 = scf.execute_region {
  ...
  %x = ...
  ...
  scf.yield %x, %value
}
```

Here `%execute_results#1` can be RAUW with `%value` and the code turned to:

```
%value = ...
%execute_results = scf.execute_region {
  ...
  %x = ...
  ...
  scf.yield %x
}
```

bondhugula (author): Okay, sure. (This is completely different than what I understood from your statement. Perhaps better stated as: "... canonicalization in the situation where it's returning a value defined in the enclosing region".)
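The canonicalization discussed in this thread is not part of the patch (`hasCanonicalizer` stays 0). As a rough illustration only, the decision it would make can be sketched with toy types. `ToyValue`, `ForwardingPlan`, and `planYieldForwarding` are hypothetical names, not MLIR APIs, and the sketch assumes the region ends in a single `scf.yield`; with several yields, a result could only be forwarded if every yield returned the same outer value at that position.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an SSA value (hypothetical, not an MLIR type).
struct ToyValue {
  std::string name;
  bool definedInsideRegion; // true if produced by an op nested in the region
};

struct ForwardingPlan {
  // (result index, outer value that can replace all uses of that result).
  std::vector<std::pair<std::size_t, std::string>> replacements;
  // Operands that have to stay on the yield, in their original order.
  std::vector<std::string> keptYieldOperands;
};

// Splits the operands of the (single) yield into values that are defined
// outside the region, and can therefore be forwarded, and values that are
// defined inside and must still be yielded.
ForwardingPlan planYieldForwarding(const std::vector<ToyValue> &yieldOperands) {
  ForwardingPlan plan;
  for (std::size_t i = 0; i < yieldOperands.size(); ++i) {
    const ToyValue &v = yieldOperands[i];
    if (v.definedInsideRegion)
      plan.keptYieldOperands.push_back(v.name);
    else
      plan.replacements.emplace_back(i, v.name);
  }
  return plan;
}
```

For the example in the thread, `scf.yield %x, %value` would give one replacement (result 1 becomes `%value`) and leave `%x` as the only remaining yield operand.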

		// TODO: can fold if it returns a constant.
		// TODO: Single block execute_region ops can be readily inlined irrespective
		// of which op is a parent. Add a fold for this.
		let hasFolder = 0;
		}
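The second TODO above (inlining a single-block `execute_region` into its parent) is left unimplemented by the patch. A minimal sketch of the idea, using plain strings in place of operations, might look as follows. `ToyBlock`, `InlineResult`, and `inlineSingleBlock` are hypothetical names for illustration and do not correspond to MLIR utilities; the sketch assumes `index` is a valid position in the parent's op list.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the single block of an execute_region (hypothetical).
struct ToyBlock {
  std::vector<std::string> ops;           // non-terminator ops, in order
  std::vector<std::string> yieldOperands; // operands of the trailing yield
};

struct InlineResult {
  std::vector<std::string> parentOps;          // parent op list after inlining
  std::vector<std::string> resultReplacements; // i-th result -> this value
};

// Replaces the execute_region sitting at `index` in `parentOps` with the ops
// of its single block. The op's results are then replaced by the values the
// block's yield returned.
InlineResult inlineSingleBlock(std::vector<std::string> parentOps,
                               std::size_t index, const ToyBlock &body) {
  const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(index);
  // Drop the execute_region itself, then splice the body ops into its place.
  parentOps.erase(parentOps.begin() + pos);
  parentOps.insert(parentOps.begin() + pos, body.ops.begin(), body.ops.end());

  InlineResult out;
  out.parentOps = std::move(parentOps);
  out.resultReplacements = body.yieldOperands;
  return out;
}
```

With more than one block this splice is not valid in general, since the parent op (for example `scf.for`) may only allow a single block, which is the reason the TODO restricts itself to the single-block case.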

def ForOp : SCF_Op<"for",		def ForOp : SCF_Op<"for",
[DeclareOpInterfaceMethods<LoopLikeOpInterface>,		[DeclareOpInterfaceMethods<LoopLikeOpInterface>,
DeclareOpInterfaceMethods<RegionBranchOpInterface>,		DeclareOpInterfaceMethods<RegionBranchOpInterface>,
SingleBlockImplicitTerminator<"scf::YieldOp">,		SingleBlockImplicitTerminator<"scf::YieldOp">,
RecursiveSideEffects]> {		RecursiveSideEffects]> {
let summary = "for operation";		let summary = "for operation";
let description = [{		let description = [{
The "scf.for" operation represents a loop taking 3 SSA value as operands		The "scf.for" operation represents a loop taking 3 SSA value as operands
[527 lines not shown; context: `let extraClassDeclaration = [{`]
ConditionOp getConditionOp();		ConditionOp getConditionOp();
Block::BlockArgListType getAfterArguments();		Block::BlockArgListType getAfterArguments();
}];		}];

let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
}		}

def YieldOp : SCF_Op<"yield", [NoSideEffect, ReturnLike, Terminator,		def YieldOp : SCF_Op<"yield", [NoSideEffect, ReturnLike, Terminator,
ParentOneOf<["IfOp, ForOp", "ParallelOp",		ParentOneOf<["ExecuteRegionOp, ForOp",
"WhileOp"]>]> {		"IfOp, ParallelOp, WhileOp"]>]> {
let summary = "loop yield and termination operation";		let summary = "loop yield and termination operation";
let description = [{		let description = [{
"scf.yield" yields an SSA value from the SCF dialect op region and		"scf.yield" yields an SSA value from the SCF dialect op region and
terminates the regions. The semantics of how the values are yielded is		terminates the regions. The semantics of how the values are yielded is
defined by the parent operation.		defined by the parent operation.
If "scf.yield" has any operands, the operands must match the parent		If "scf.yield" has any operands, the operands must match the parent
operation's results.		operation's results.
If the parent operation defines no values, then the "scf.yield" may be		If the parent operation defines no values, then the "scf.yield" may be
[17 lines not shown]

mlir/lib/Dialect/SCF/SCF.cpp

[64 lines not shown]
}		}

/// Default callback for IfOp builders. Inserts a yield without arguments.		/// Default callback for IfOp builders. Inserts a yield without arguments.
void mlir::scf::buildTerminatedBody(OpBuilder &builder, Location loc) {		void mlir::scf::buildTerminatedBody(OpBuilder &builder, Location loc) {
builder.create<scf::YieldOp>(loc);		builder.create<scf::YieldOp>(loc);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// ExecuteRegionOp
		//===----------------------------------------------------------------------===//

		///
		/// (ssa-id `=`)? `execute_region` `->` function-result-type `{`
		/// block+
		/// `}`
		///
		/// Example:
		/// scf.execute_region -> i32 {
		/// %idx = load %rI[%i] : memref<128xi32>
		/// scf.yield %idx : i32
		/// }
		///
		static ParseResult parseExecuteRegionOp(OpAsmParser &parser,
		OperationState &result) {
		if (parser.parseOptionalArrowTypeList(result.types))
		return failure();

		// Introduce the body region and parse it.
		Region *body = result.addRegion();
		if (parser.parseRegion(body, /*arguments=*/{}, /*argTypes=*/{}) ||
		parser.parseOptionalAttrDict(result.attributes))
		return failure();

		return success();
		}

		static void print(OpAsmPrinter &p, ExecuteRegionOp op) {
		p << ExecuteRegionOp::getOperationName();
		if (op.getNumResults() > 0)
		p << " -> " << op.getResultTypes();

		p.printRegion(op.region(),
		/*printEntryBlockArgs=*/false,
		/*printBlockTerminators=*/true);

		p.printOptionalAttrDict(op->getAttrs());
		}

		static LogicalResult verify(ExecuteRegionOp op) {
		if (op.region().empty())
		return op.emitOpError("region needs to have at least one block");
		if (op.region().front().getNumArguments() > 0)
		return op.emitOpError("region cannot have any arguments");
		return success();
		}
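The two checks in `verify` can be restated outside of MLIR as a small standalone sketch. `ToyBlock`, `ToyRegion`, and `verifyExecuteRegion` are hypothetical stand-ins rather than MLIR classes; only the diagnostic strings are taken from the patch. Note that only the entry block is required to have no arguments, so later blocks such as `^bb3(%x : i64)` in the examples remain legal.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for a block and a region (hypothetical, not MLIR types).
struct ToyBlock {
  unsigned numArguments;
};

struct ToyRegion {
  std::vector<ToyBlock> blocks; // blocks.front() is the entry block
};

// Mirrors the structure of the verifier above: returns an empty string on
// success, otherwise the diagnostic text.
std::string verifyExecuteRegion(const ToyRegion &region) {
  if (region.blocks.empty())
    return "region needs to have at least one block";
  // Only the entry block is inspected; successor blocks may take arguments.
  if (region.blocks.front().numArguments > 0)
    return "region cannot have any arguments";
  return "";
}
```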

		//===----------------------------------------------------------------------===//
// ForOp		// ForOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void ForOp::build(OpBuilder &builder, OperationState &result, Value lb,		void ForOp::build(OpBuilder &builder, OperationState &result, Value lb,
Value ub, Value step, ValueRange iterArgs,		Value ub, Value step, ValueRange iterArgs,
BodyBuilderFn bodyBuilder) {		BodyBuilderFn bodyBuilder) {
result.addOperands({lb, ub, step});		result.addOperands({lb, ub, step});
result.addOperands(iterArgs);		result.addOperands(iterArgs);
[119 lines not shown; context: `static ParseResult parseForOp(OpAsmParser &parser, OperationState &result) {`]
regionArgs.push_back(inductionVariable);		regionArgs.push_back(inductionVariable);

if (succeeded(parser.parseOptionalKeyword("iter_args"))) {		if (succeeded(parser.parseOptionalKeyword("iter_args"))) {
// Parse assignment list and results type list.		// Parse assignment list and results type list.
if (parser.parseAssignmentList(regionArgs, operands) ||		if (parser.parseAssignmentList(regionArgs, operands) ||
parser.parseArrowTypeList(result.types))		parser.parseArrowTypeList(result.types))
return failure();		return failure();
// Resolve input operands.		// Resolve input operands.
for (auto operand_type : llvm::zip(operands, result.types))		for (auto operandType : llvm::zip(operands, result.types))
if (parser.resolveOperand(std::get<0>(operand_type),		if (parser.resolveOperand(std::get<0>(operandType),
std::get<1>(operand_type), result.operands))		std::get<1>(operandType), result.operands))
return failure();		return failure();
}		}
// Induction variable.		// Induction variable.
argTypes.push_back(indexType);		argTypes.push_back(indexType);
// Loop carried variables		// Loop carried variables
argTypes.append(result.types.begin(), result.types.end());		argTypes.append(result.types.begin(), result.types.end());
// Parse the body region.		// Parse the body region.
Region *body = result.addRegion();		Region *body = result.addRegion();
[16 lines not shown]

Region &ForOp::getLoopBody() { return region(); }		Region &ForOp::getLoopBody() { return region(); }

bool ForOp::isDefinedOutsideOfLoop(Value value) {		bool ForOp::isDefinedOutsideOfLoop(Value value) {
return !region().isAncestor(value.getParentRegion());		return !region().isAncestor(value.getParentRegion());
}		}

LogicalResult ForOp::moveOutOfLoop(ArrayRef<Operation *> ops) {		LogicalResult ForOp::moveOutOfLoop(ArrayRef<Operation *> ops) {
for (auto op : ops)		for (auto *op : ops)
op->moveBefore(*this);		op->moveBefore(*this);
return success();		return success();
}		}

ForOp mlir::scf::getForInductionVarOwner(Value val) {		ForOp mlir::scf::getForInductionVarOwner(Value val) {
auto ivArg = val.dyn_cast<BlockArgument>();		auto ivArg = val.dyn_cast<BlockArgument>();
if (!ivArg)		if (!ivArg)
return ForOp();		return ForOp();
[1,361 lines not shown]

Region &ParallelOp::getLoopBody() { return region(); }		Region &ParallelOp::getLoopBody() { return region(); }

bool ParallelOp::isDefinedOutsideOfLoop(Value value) {		bool ParallelOp::isDefinedOutsideOfLoop(Value value) {
return !region().isAncestor(value.getParentRegion());		return !region().isAncestor(value.getParentRegion());
}		}

LogicalResult ParallelOp::moveOutOfLoop(ArrayRef<Operation *> ops) {		LogicalResult ParallelOp::moveOutOfLoop(ArrayRef<Operation *> ops) {
for (auto op : ops)		for (auto *op : ops)
op->moveBefore(*this);		op->moveBefore(*this);
return success();		return success();
}		}

ParallelOp mlir::scf::getParallelForInductionVarOwner(Value val) {		ParallelOp mlir::scf::getParallelForInductionVarOwner(Value val) {
auto ivArg = val.dyn_cast<BlockArgument>();		auto ivArg = val.dyn_cast<BlockArgument>();
if (!ivArg)		if (!ivArg)
return ParallelOp();		return ParallelOp();
[473 lines not shown]

mlir/test/Dialect/SCF/canonicalize.mlir

[891 lines not shown; context: `func @combineIfs4(%arg0 : i1, %arg2: i64) {`]
}		}
return		return
}		}

// CHECK-NEXT: scf.if %arg0 {		// CHECK-NEXT: scf.if %arg0 {
// CHECK-NEXT: "test.firstCodeTrue"() : () -> ()		// CHECK-NEXT: "test.firstCodeTrue"() : () -> ()
// CHECK-NEXT: "test.secondCodeTrue"() : () -> ()		// CHECK-NEXT: "test.secondCodeTrue"() : () -> ()
// CHECK-NEXT: }		// CHECK-NEXT: }

		// -----

		// CHECK-LABEL: func @propagate_into_execute_region
		func @propagate_into_execute_region() {
		%cond = constant 0 : i1
		affine.for %i = 0 to 100 {
		"test.foo"() : () -> ()
		%v = scf.execute_region -> i64 {
		cond_br %cond, ^bb1, ^bb2

		^bb1:
		%c1 = constant 1 : i64
		br ^bb3(%c1 : i64)

		^bb2:
		%c2 = constant 2 : i64
		br ^bb3(%c2 : i64)

		^bb3(%x : i64):
		scf.yield %x : i64
		}
		"test.bar"(%v) : (i64) -> ()
		// CHECK: %[[C2:.*]] = constant 2 : i64
		// CHECK: scf.execute_region -> i64 {
		// CHECK-NEXT: scf.yield %[[C2]] : i64
		// CHECK-NEXT: }
		}
		return
		}

mlir/test/Dialect/SCF/invalid.mlir

[422 lines not shown; context: `func @parallel_invalid_yield(`]
}		}
return		return
}		}

// -----		// -----

func @yield_invalid_parent_op() {		func @yield_invalid_parent_op() {
"my.op"() ({		"my.op"() ({
// expected-error@+1 {{'scf.yield' op expects parent op to be one of 'scf.if, scf.for, scf.parallel, scf.while'}}		// expected-error@+1 {{'scf.yield' op expects parent op to be one of 'scf.execute_region, scf.for, scf.if, scf.parallel, scf.while'}}
scf.yield		scf.yield
}) : () -> ()		}) : () -> ()
return		return
}		}

// -----		// -----

func @while_parser_type_mismatch() {		func @while_parser_type_mismatch() {
[65 lines not shown; context: `func @while_bad_terminator() {`]
// expected-error@+1 {{expects the 'after' region to terminate with 'scf.yield'}}		// expected-error@+1 {{expects the 'after' region to terminate with 'scf.yield'}}
scf.while : () -> () {		scf.while : () -> () {
scf.condition(%true)		scf.condition(%true)
} do {		} do {
// expected-note@+1 {{terminator here}}		// expected-note@+1 {{terminator here}}
"some.other_terminator"() : () -> ()		"some.other_terminator"() : () -> ()
}		}
}		}

		// -----

		func @execute_region() {
		// expected-error @+1 {{region cannot have any arguments}}
		"scf.execute_region"() ({
		^bb0(%i : i32):
		scf.yield
		}) : () -> ()
		return
		}

mlir/test/Dialect/SCF/ops.mlir

[273 lines not shown; context: `scf.while : () -> () {`]
scf.condition(%true)		scf.condition(%true)
// CHECK: } do {		// CHECK: } do {
} do {		} do {
// CHECK: scf.yield		// CHECK: scf.yield
scf.yield		scf.yield
}		}
return		return
}		}

		// CHECK-LABEL: func @execute_region
		func @execute_region() -> i64 {
		// CHECK: scf.execute_region -> i64 {
		// CHECK-NEXT: constant
		// CHECK-NEXT: scf.yield
		// CHECK-NEXT: }
		%res = scf.execute_region -> i64 {
		%c1 = constant 1 : i64
		scf.yield %c1 : i64
		}

		// CHECK: scf.execute_region {
		// CHECK-NEXT: br ^bb1
		// CHECK-NEXT: ^bb1:
		// CHECK-NEXT: scf.yield
		// CHECK-NEXT: }
		"scf.execute_region"() ({
		^bb0:
		br ^bb1
		^bb1:
		scf.yield
		}) : () -> ()
		return %res : i64
		}

mlir/utils/vim/syntax/mlir.vim

[47 lines not shown]
	syn match mlirOps /\<affine\.dma_start\>/			syn match mlirOps /\<affine\.dma_start\>/
	syn match mlirOps /\<affine\.dma_wait\>/			syn match mlirOps /\<affine\.dma_wait\>/
	syn match mlirOps /\<affine\.for\>/			syn match mlirOps /\<affine\.for\>/
	syn match mlirOps /\<affine\.if\>/			syn match mlirOps /\<affine\.if\>/
	syn match mlirOps /\<affine\.load\>/			syn match mlirOps /\<affine\.load\>/
	syn match mlirOps /\<affine\.parallel\>/			syn match mlirOps /\<affine\.parallel\>/
	syn match mlirOps /\<affine\.prefetch\>/			syn match mlirOps /\<affine\.prefetch\>/
	syn match mlirOps /\<affine\.store\>/			syn match mlirOps /\<affine\.store\>/
	syn match mlirOps /\<loop\.for\>/			syn match mlirOps /\<scf\.execute_region\>/
	syn match mlirOps /\<loop\.if\>/			syn match mlirOps /\<scf\.for\>/
				syn match mlirOps /\<scf\.if\>/
				syn match mlirOps /\<scf\.yield\>/

	" TODO: dialect name prefixed ops (llvm or std).			" TODO: dialect name prefixed ops (llvm or std).

	" Keywords.			" Keywords.
	syn keyword mlirKeyword			syn keyword mlirKeyword
	\ affine_map			\ affine_map
	\ affine_set			\ affine_set
	\ dense			\ dense
[60 lines not shown]

Revision Contents

Diff 352958
