This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
mlir/
-
include/mlir/Dialect/StandardOps/IR/
-
mlir/
-
Dialect/
-
StandardOps/
-
IR/
11/12
Ops.td
-
lib/Dialect/StandardOps/IR/
-
Dialect/
-
StandardOps/
-
IR/
3/3
Ops.cpp
-
test/
-
IR/
1/1
core-ops.mlir
-
invalid-ops.mlir
-
Transforms/
-
canonicalize.mlir

Differential D75837

[MLIR] Introduce scf.execute_region op
ClosedPublic

Authored by bondhugula on Mar 8 2020, 10:54 PM.

Download Raw Diff

Details

Reviewers

rriddle
silvas
mehdi_amini
nicolasvasilache
mravishankar
antiagainst
herhut
ftynse

Commits

rG18c8c934d858: [MLIR] Introduce scf.execute_region op

Summary

Introduce the execute_region op that is able to hold a region which it
executes exactly once. The op encapsulates a CFG within itself while
isolating it from the surrounding control flow. Proposal discussed here:
https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282

execute_region enables one to inline a function without lowering out all
other higher level control flow constructs (affine.for/if, scf.for/if)
to the flat list of blocks / CFG form. It thus allows the benefit of
transforms on higher level control flow ops available in the presence of
the inlined calls. The inlined calls continue to benefit from
propagation of SSA values across their top boundary. Functions won’t
have to remain outlined until later than desired. Abstractions like
affine execute_regions, lambdas with implicit captures could be lowered
to this without first lowering out structured loops/ifs or outlining.
But two potential early use cases are of: (1) an early inliner (which
can inline functions by introducing execute_region ops), (2) lowering of
an affine.execute_region, which cleanly maps to an scf.execute_region
when going from the affine dialect to the scf dialect.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	90 ms	Clang.Driver::Unknown Unit Message ("")
	20 ms	LLVM.Linker::Unknown Unit Message ("")
	20 ms	LLVM.Linker::Unknown Unit Message ("")
	20 ms	LLVM.Linker::Unknown Unit Message ("")
	30 ms	LLVM.Linker::Unknown Unit Message ("")
		View Full Test Results (10 Failed)

Event Timeline

bondhugula created this revision.Mar 8 2020, 10:54 PM

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2020, 10:54 PM

Herald added subscribers: llvm-commits, Joonsoo, liufengdb and 11 others. · View Herald Transcript

Harbormaster failed remote builds in B48515: Diff 249030!Mar 8 2020, 11:25 PM

This patch won't work without D71961 (which unties return from FuncOp).

bondhugula added reviewers: rriddle, silvas, mehdi_amini.Mar 9 2020, 12:03 AM

nicolasvasilache added a reviewer: nicolasvasilache.Mar 9 2020, 5:16 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td

1046

I'll repaste my comment that I don't believe was addressed:

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.
Can this be designed and implemented so it serves today’s needs that are already more general than “just capture”?

bondhugula marked an inline comment as done.Mar 9 2020, 7:16 AM

bondhugula added inline comments.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1046	I'm not sure what you may need for LinAlg; so I can't say how it may help there. For the things this op will at least help with, please see the commit summary or the discussion thread for more details - it is based on today's needs but your 'today' may be very different from mine! :-) I do see the need for ops that take dimensional arguments and captures (and with regions), but their goals are very different from those of this op. I suspect you might be thinking that this op works at a higher level than it really does - so you may need a different op for what you have in mind.

I'll reformulate based on your description

lambdas with implicit captures (or even explicit when possible) could be lowered to this without first lowering out structured loops/ifs or outlining

Lambdas often allow both captures and arguments.
I can see immediate use cases for this op allowing captures + arguments and see advantages in having one op that does both, as opposed to duplication/splitting into, e.g.:

one op "that can only capture",
one op "that can only take arguments" and
one op "that can do both".

Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely"? Can you provide an example of what you?
It isn't clear to me why would we keep arguments with such op instead of always canonicalizing towards eliminating them.

In D75837#1913781, @mehdi_amini wrote:

To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture.

I don't understand how explicit capture contributes to "refuse to introduce SSA values prematurely"? Can you provide an example of what you?
It isn't clear to me why would we keep arguments with such op instead of always canonicalizing towards eliminating them.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt? If anything, allowing the removal of trivial args where it makes sense be a canonicalization on the execute_region op avoids pushing the responsibility on clients producing the op to create the op in that form initially. E.g. when inlining a calls as in the initial use case, you would just throw the FuncOp's region as-is into an execute_region op (updating "return" terminators perhaps), and transfer over the arg list of the call to the execute_regoin op. Otherwise, they would have to do the arg rewriting themselves (maybe we can just have a helper function for that though).

I guess I'm trying to understand whether we expect code to see code like this:

if (auto executeRegion = dyn_cast<ExecuteRegionOp>(op)) {
  if (executeRegion.hasExplicitCaptures()) {
    break; // Darn, can't handle it.
  }
}

I expect that we won't have code like this, and instead what we'll see is generic use-def following passes that silently aren't smart enough to handle explicit captures (such as when applying local rewrite patterns) and will fail to optimize. So I think the real question is balancing:

The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation
The potential lost optimization opportunities due failing (for whatever reason; oversight, pass ordering issues, ...) to run the canonicalization pass to put it into the no-explicit-capture form.

Neither seems massively compelling, so starting with the more restricted form seems like a good choice. We can loosen it later if needed.

In D75837#1927525, @silvas wrote:

+1 This is also exactly what I wanted to say. If there were arguments in the land you were starting from (say you were inlining a call), those arguments should just get propagated and eliminated. Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt?

I missed this, do you have a pointer?
I assume this wouldn't be zero cost / transparent though.

The cost of pushing all clients of this op to establish the canonical form of no explicit captures mandatorily upon creation

I may be missing something, but isn't it just a direct RAUW? What is the cost herE?

Since river is working on making dataflow be able to transparently look through non-explicit captures for ops like this, how important is it to not have explicit args?

That is, we can canonicalize the args away, but having the args shouldn't hurt? I

I think there is some communication gap here and perhaps different things being mixed. If you've explicitly captured something, you've already created a barrier: for eg. consider a dynamically shaped memref explicitly captured that prevents a static shape from flowing in via a memref cast used from above; unless you replace the memref with the statically shaped one, you won't see the static shape for whatever analysis/transform. For the affine graybox, I had done a detailed analysis of the costs of just explicitly capturing memrefs (see for eg. how it complicates dead dealloc removal):
https://github.com/polymage-labs/mlirx/blob/master/mlir/rfc/rfc-graybox.md#maintaining-memref-operandsarguments
There is no way around registering and implementing canonicalizations for what would have otherwise worked. The advantage of explicitly capturing memrefs in the context of the graybox was that you don't have to look inside the op to see which memory is being accessed and you don't want to because it's part of a different polyhedral scope with its own symbols; so the downsides of explicit capture IMO are outweighed by how they simplify polyhedral passes. For execute_region, there is no such argument in favor of explicit captures. You just have to do a simple replaceAllUsesWith to propagate what you thought of as explicit captures (this is what I already do when I convert an affine.graybox to an execute_region in D72223.

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

@nicolasvasilache is that ok with you?

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1043	make the summary more descriptive.
1044	Indent description by two spaces (instead of 4) for consistency with the rest of the file
1090	Can you add a verifier that the region doesn't have args? Also, I'm not super familiar with ODS, but does this specification autogenerate a verifier that there are no operands, or does it just not check anything about the op's operands? If the latter, please change it so that the verifier checks that there are no operands to this op.

rriddle added inline comments.Mar 18 2020, 10:20 PM

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1039	This op comes before ExtractElementOp alphabetically.
1049	nit: MLIR functions (FuncOp) -> FuncOp
1059	This should be indented in an mlir code block
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388	Use /// For top-level comments.

Address review comments.

Thanks for the reviews! Updated.

In D75837#1930465, @silvas wrote:

Okay, let's land this without allowing explicit captures, given that's the most restrictive semantics. We can loosen it later if there's a compelling need.

This will still need the patch on the ReturnOp to land. On a related note, I think we shouldn't move this op to the loop dialect unless the latter is renamed first.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
1090	ODS doesn't support regions yet, and so you are right that we'll need to verify for zero region arguments. (Added that as well as > 0 blocks check.) But for operands, with no operands in the ODS description, the auto-generation will mark the op with the ZeroOperands trait, and the latter's verifier will check for it.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388	Sorry, I didn't understand. These aren't doc comments, but for the implementation. This entire comment para along with the "Ex:" should be ///?

bondhugula edited the summary of this revision. (Show Details)Mar 19 2020, 4:16 AM

bondhugula edited the summary of this revision. (Show Details)

Harbormaster failed remote builds in B49731: Diff 251338!Mar 19 2020, 4:50 AM

Responding in bulk below:

@nicolasvasilache is that ok with you?

I am not opposed to moving forward and iterating on code as we learn more.
But I still haven't seen a compelling reason to disallow arguments.
I would like some concrete example that illustrates why allowing arguments is a bad idea.

We can loosen it later if there's a compelling need.

This seems at odds with this other statement:

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

there are fundamental difficulties involved in which case, refusing arguments pushes concerns to all consumers. Shouldn't difficulty be factored out in one place
or it's a simple extension, in which case why not just allow arguments?

I am unclear if we are in case 1., 2. or something else. Which is it?

@mehdi_amini Can you provide an example of what you?

See linalg.generic and linalg.indexed_generic, both have region arguments that are derived from the op operands, but are not necessarily the same SSA value
(e.g there is an interleaved load/store or even loop IV creation).

Now I can see how to use this op in its current form for my particular purpose: I can just move the content of my region inside a new execute_region op as I lower.

But I don't think my questions have been answered so I'll ask again:

Lambdas often allow both captures and arguments.
...
Is there a fundamental reason to disallow arguments in your op?
Assuming there exists such a reason, isn't it trivial to check preconditions such as "empty arguments" in verifiers that need it?

Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it.

In other words, either:

there are fundamental difficulties involved in which case, refusing arguments pushes concerns to all consumers. Shouldn't difficulty be factored out in one place

or it's a simple extension, in which case why not just allow arguments?

It's not a simple extension. There are major costs. Straightforward SSA dominance vs having to pass through arguments (explicit captures) is akin to "intraprocedural optimization" vs "a good part of the complexity involved in interprocedural optimization" -- the latter has been established to be more difficult than intraprocedural for the same given transformation.

Is there a fundamental reason to disallow arguments in your op?

Yes, I'm going to copy paste the same thing from above, but with a few extra lines below - I'm not sure if you had read this differently because the answer is pretty straightforward. "Keeping arguments around will necessitate all kinds of tracking/bookkeeping in moving code across the region boundary, reimplementing existing canonicalizations on this op and largely defeating the purpose of this op - which is to let SSA dominance and dataflow work freely from above and through it." You'd have to reimplement nearly all canonicalizations on this op from propagation of constants, to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc.. For an example on just memref arguments, see the link upthread on grayboxes on the kind of complexities you'd have to deal with if you explicitly captured memrefs (over there IMO explicit captures just for memrefs are worth those cost and hence a new op affine.graybox). Did you skip reading the in-between messages?

To conclude, I just don't see the benefits of explicit captures in a few specific cases to outweigh the widespread / large scale negative impact on all lower level SSA optimizations (where low-level here is std dialect, loop dialect, and to some extent also the affine dialect - you'd have execute_region in the presence of these dialect ops *at least*) .

Lambdas often allow both captures and arguments.

Yes: and they serve very different purpose, they have different well.
Basically it seems like using different construct to model different concepts is fairly standard and undisputed.

Is there a fundamental reason to disallow arguments in your op?

Seems like this is adding extra complexity, but I haven't seen a reason to motivate it. This seems like a good enough reason to me?

As I mentioned before, why wouldn't a canonicalize pattern just eliminate all the operands? And if so why do we allow it in the first place?

Sure, the traditional, run-of-the-mill properties are true:

implicit captures preserve use-def chains
arguments break use-def chains and if you want similar optimizations you'll want inlining or some form of IPO.

I see the discussion above as conflating semantics with optimization.
Allowing your op to take argument does absolutely not mean you have to use them for everything all the time, yet the argumentation seems to take that as a premisse.
I think this is particularly clear in the following:

You'd have to reimplement nearly all canonicalizations on this op from propagation of constants, to propagation of memref_casts, removal of dead deallocs, removal of dead allocs, subexpression elimination, etc..

I don't see how this is true and why you'd have to reimplement anything in this list.
If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

You could also want to use arguments for a subset of the values, isolate their users, inline and then apply the remaining canonicalizations.
That would be perfectly fine too.

I still see no compelling reason to strictly forbid arguments in this op: if you want to enforce somewhere that everything is by-capture only, it's easy to verify that numArguments == 0.
OTOH adding arguments is trivial and will be transparent wrt everything you mention above: if you don't want to use arguments just don't use argument.
Literally, if you added the possibility for your op to have arguments, your canonicalization test would not change.

Plainly forbidding arguments has a finality to it that I view as unnecessary.

If you want these canonicalizations to apply immediately, you should just use implicit capture for all values (which is what you propose).

So we are actually on the same page as far as the benefits of implicit captures goes? I was under the impression that you were missing those, but you just want the option to use explicit captures on this when you really have such a use case -- but then with the explicit captures comes the question of how the arguments obtain their values and you can't have custom behavior there because the lowering would need to know how exactly those arguments obtain those values or how operands bind to arguments, for eg. that there is a 1:1 match between its operands and arguments. And if there is a 1:1 match, we are back to the question why not just do a RAUW and eliminate those arguments in the first place? OTOH, if your arguments obtain values from operands or elsewhere in a more custom way, then the lowering would need to be aware of it in an unambiguous way, and you'd have to design/evaluate that. As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures), and evaluate an explicit capture option by first defining what exactly the capture argument semantics are for the use case, how it impacts the lowering, and mechanically, what the new syntax of the op would look like. (It is just two lines to knock off in the verifier if you want the op to take region arguments.)

@nicolasvasilache the real thing to evaluate for your linalg use case is the benefits of "having a separate op that could readily lower to execute_region when the time is right" vis-a-vis "adding explicit arg semantics to execute region op itself". Note that different client/higher level use cases may want different semantics with their explicit captures (and how the region arguments obtain their values), and they could benefit by modeling/handling those explicit captures on their own op (and dealing with the custom canonicalizations there) before lowering to execute_region.

So we are actually on the same page as far as the benefits of implicit captures goes?
...
As @silvas mentions too, this still means it makes sense to start from the most restrictive form (only implicit captures).

Yes no argument there, I was unclear if I missed something fundamental that makes it strictly necessary to forbid explicit arguments.
I am sympathetic with the arguments that "it is simpler" + "you won't need it in practice".

Since I seem to be the only one who would like a little more flexibility but since I can also easily work around this, let's land this and iterate later, if necessary.

bondhugula retitled this revision from Introduce std.execute_region op to [MLIR] Introduce std.execute_region op.Mar 20 2020, 8:49 PM

Ping reviewers @rriddle, @silvas - could you please see the tip of the threads? Comments on the patch code itself have been addressed.

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

silvas accepted this revision.Mar 24 2020, 1:35 PM

This revision is now accepted and ready to land.Mar 24 2020, 1:35 PM

In D75837#1940022, @silvas wrote:

LGTM from me. I think the "free returnop from funcop" discussion could go on for a while, so I would encourage you to introduce a new terminator for now so that we can land this.

Sounds good to me. What should the new terminator be called - std.yield?

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
997	nit: wrap this in ``.
1008	nit: Ex: -> Example
1011	nit: Don't indent inside of the mlir code block.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1388	Yeah for consistency, we use /// everywhere.
mlir/test/IR/core-ops.mlir
614	nit: Don't check the pred comment,

In D75837#1940549, @rriddle wrote:

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not a trivial couple of lines because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. (On another note, the YieldOp name is used by both Loop and Linalg dialect without a namespace qualifier that would cause a conflict.)

bondhugula marked an inline comment as done.Mar 24 2020, 11:35 PM

Address review comments; introduce std.yield

Herald added a reviewer: mravishankar. · View Herald TranscriptMar 24 2020, 11:36 PM

Herald added a reviewer: antiagainst. · View Herald Transcript

Herald added a reviewer: herhut. · View Herald Transcript

Herald added subscribers: bader, csigg. · View Herald Transcript

Harbormaster completed remote builds in B50366: Diff 252508.Mar 25 2020, 12:30 AM

In D75837#1940549, @rriddle wrote:

I added some more nits, mostly to keep this consistent with the changes coming in D76743

Also, std.yield seems good.

Should the std.yield terminator be introduced in this patch or another one? It's not a trivial because std.yield's verify should pretty much be doing what the FuncOp's verify does in D71961 for imperative ops. Also, the YieldOp name is used by both Loop and Linalg dialect without a namespace qualifier which causes a conflict and requires many updates. I've anyway gone ahead and done those. PTAL.

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

In D75837#1947981, @silvas wrote:

Presumably yield should replace the linalg and loop yields? I would add std.yield in a separate patch that refactors the other dialects to use it as well.

That makes sense - it's a separate patch that requires a discussion and review in itself.

Take out any yield op changes. Rebase.

Harbormaster failed remote builds in B50847: Diff 253396!Mar 28 2020, 10:00 PM

bondhugula added a parent revision: D71961: [MLIR] Free ReturnOp from being restricted to a FuncOp.Mar 28 2020, 10:05 PM

bondhugula edited the summary of this revision. (Show Details)

bondhugula added a child revision: D72223: [MLIR] Introduce affine.execute_region op.Apr 18 2020, 1:31 PM

This op will have to be moved to the right dialect once the std dialect split completes - mostly scf.

Herald added a reviewer: bollu. · View Herald TranscriptFeb 8 2021, 4:30 PM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 5 others. · View Herald Transcript

Hi Uday,

do you think it is possible to move this to the SCF dialect and use scf.yield instead of return?

Herald added subscribers: dcaballe, cota. · View Herald TranscriptMay 4 2021, 7:59 AM

This should move into the scf dialect and can be respun when the need arises.

bondhugula edited the summary of this revision. (Show Details)Jun 17 2021, 10:14 PM

bondhugula removed a parent revision: D71961: [MLIR] Free ReturnOp from being restricted to a FuncOp.Jun 17 2021, 10:16 PM

bondhugula removed a reviewer: bollu.

Herald added a subscriber: bollu. · View Herald TranscriptJun 17 2021, 10:17 PM

Rebase on upstream tip. Move op to SCF dialect.

Drop duplicate attr dict parsing. Fix stale comment.

Update revision summary

bondhugula retitled this revision from [MLIR] Introduce std.execute_region op to [MLIR] Introduce scf.execute_region op.Jun 17 2021, 11:57 PM

ftynse accepted this revision.Jun 18 2021, 1:05 AM

Update commit summary - fix revision number.

This revision was landed with ongoing or failed builds.Jun 18 2021, 3:10 AM

Closed by commit rG18c8c934d858: [MLIR] Introduce scf.execute_region op (authored by bondhugula). · Explain Why

This revision was automatically updated to reflect the committed changes.

bondhugula added a commit: rG18c8c934d858: [MLIR] Introduce scf.execute_region op.

Harbormaster completed remote builds in B109887: Diff 352953.Jun 18 2021, 6:24 PM

bondhugula removed a child revision: D72223: [MLIR] Introduce affine.execute_region op.Jun 18 2021, 7:29 PM

mehdi_amini added inline comments.Jun 21 2021, 12:35 PM

mlir/include/mlir/Dialect/SCF/SCFOps.td
114 ↗	(On Diff #352958)	Seems like a canonicalization could be that it would return an SSA value defined in the enclosing region.

bondhugula marked an inline comment as done.Jul 9 2021, 12:09 AM

bondhugula added inline comments.

mlir/include/mlir/Dialect/SCF/SCFOps.td
114 ↗	(On Diff #352958)	Did you mean moving the slice that generates the yield values to the enclosing region (if they were inside the scf.execute_region)? This can't be done in O(1) time in general nor are the utilities that allow one to do that available in IR libraries - can be part of `Transforms/` though.

mehdi_amini added inline comments.Jul 9 2021, 9:36 AM

mlir/include/mlir/Dialect/SCF/SCFOps.td
114 ↗	(On Diff #352958)	I mean that if you have: %value = ... %execute_results:2 = scf.execute_region { ... %x = ... ... scf.yield %x, %value } Here `%execute_results#1` can be RAUW with `%value` and the code turned to: %value = ... %execute_results = scf.execute_region { ... %x = ... ... scf.yield %x }

bondhugula marked 2 inline comments as done.Jul 9 2021, 4:59 PM

bondhugula added inline comments.

mlir/include/mlir/Dialect/SCF/SCFOps.td
114 ↗	(On Diff #352958)	Okay, sure. (This is completely different than what I understood from your statement. Perhaps better stated as: "... canonicalization in the situation where it's returning a value defined in the enclosing region".)

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

StandardOps/

IR/

Ops.td

64 lines

lib/

Dialect/

StandardOps/

IR/

Ops.cpp

55 lines

test/

IR/

core-ops.mlir

21 lines

invalid-ops.mlir

11 lines

Transforms/

canonicalize.mlir

27 lines

Diff 253396

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td

Show First 20 Lines • Show All 910 Lines • ▼ Show 20 Lines
// DivFOp		// DivFOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def DivFOp : FloatArithmeticOp<"divf"> {		def DivFOp : FloatArithmeticOp<"divf"> {
let summary = "floating point division operation";		let summary = "floating point division operation";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// ExecuteRegionOp
		//===----------------------------------------------------------------------===//

		def ExecuteRegionOp : Std_Op<"execute_region"> {
		let summary = "operation that executes its region exactly once";
		let description = [{
		The `execute_region` operation executes the region held exactly once. The op
		cannot have any operands, nor does its region have any arguments. All SSA
		values that dominate the op can be accessed inside. The op's region can have
		multiple blocks and the blocks can have terminators the same way as FuncOp.
		The values returned from this op's region define the op's results. The op
		primarily provides control flow encapsulation and isolation from a parent
		op's control flow restrictions if any; for example, it allows representation
		of inlined calls in the inside of structured control flow ops with
		restrictions like affine.for/if, loop.for/if ops, and thus the optimization
		of IR in such a mixed form.

		Example:

		```mlir
		loop.for %i = 0 to 128 {
		%y = execute_region -> i32 {
		%x = load %A[%i] : memref<128xi32>
		return %x : i32
		}
		}

		affine.for %i = 0 to 100 {
		"foo"() : () -> ()
		%v = execute_region -> i64 {
		cond_br %cond, ^bb1, ^bb2

		^bb1:
		%c1 = constant 1 : i64
		br ^bb3(%c1 : i64)

		^bb2:
		%c2 = constant 2 : i64
		br ^bb3(%c2 : i64)

		^bb3(%x : i64):
		return %x : i64
		}
		"bar"(%v) : (i64) -> ()
		}
		```
		}];

		let results = (outs Variadic<AnyType>);

		let regions = (region AnyRegion:$region);

		// TODO: If the parent is a func like op (which would be the case if all other
		// ops are from the std dialect), the inliner logic could be readily used to
		// inline.
		let hasCanonicalizer = 0;

		// TODO: can fold if it returns a constant.
		// TODO: Single block execute_region ops can be readily inlined irrespective
		// of which op is a parent. Add a fold for this.
		let hasFolder = 0;
		}

		//===----------------------------------------------------------------------===//
// ExpOp		// ExpOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def ExpOp : FloatUnaryOp<"exp"> {		def ExpOp : FloatUnaryOp<"exp"> {
let summary = "base-e exponential of the specified value";		let summary = "base-e exponential of the specified value";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ExtractElementOp		// ExtractElementOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def ExtractElementOp : Std_Op<"extract_element",		def ExtractElementOp : Std_Op<"extract_element",
[NoSideEffect,		[NoSideEffect,
TypesMatchWith<"result type matches element type of aggregate",		TypesMatchWith<"result type matches element type of aggregate",
"aggregate", "result",		"aggregate", "result",
		rriddleUnsubmitted Done Reply Inline Actions nit: wrap this in ``. rriddle: nit: wrap this in ``.
"$_self.cast<ShapedType>().getElementType()">]> {		"$_self.cast<ShapedType>().getElementType()">]> {
let summary = "element extract operation";		let summary = "element extract operation";
let description = [{		let description = [{
The "extract_element" op reads a tensor or vector and returns one element		The "extract_element" op reads a tensor or vector and returns one element
from it specified by an index list. The output of extract is a new value		from it specified by an index list. The output of extract is a new value
with the same type as the elements of the tensor or vector. The arity of		with the same type as the elements of the tensor or vector. The arity of
indices matches the rank of the accessed value (i.e., if a tensor is of rank		indices matches the rank of the accessed value (i.e., if a tensor is of rank
3, then 3 indices are required for the extract). The indices should all be		3, then 3 indices are required for the extract). The indices should all be
of index type. For example:		of index type. For example:

%3 = extract_element %0[%1, %2] : vector<4x4xi32>		%3 = extract_element %0[%1, %2] : vector<4x4xi32>
		rriddleUnsubmitted Done Reply Inline Actions nit: Ex: -> Example rriddle: nit: Ex: -> Example
}];		}];

let arguments = (ins AnyTypeOf<[AnyVector, AnyTensor]>:$aggregate,		let arguments = (ins AnyTypeOf<[AnyVector, AnyTensor]>:$aggregate,
		rriddleUnsubmitted Done Reply Inline Actions nit: Don't indent inside of the mlir code block. rriddle: nit: Don't indent inside of the mlir code block.
Variadic<Index>:$indices);		Variadic<Index>:$indices);
let results = (outs AnyType:$result);		let results = (outs AnyType:$result);

let builders = [OpBuilder<		let builders = [OpBuilder<
"Builder *builder, OperationState &result, Value aggregate,"		"Builder *builder, OperationState &result, Value aggregate,"
"ValueRange indices = {}", [{		"ValueRange indices = {}", [{
auto resType = aggregate.getType().cast<ShapedType>()		auto resType = aggregate.getType().cast<ShapedType>()
.getElementType();		.getElementType();
Show All 11 Lines	def ExtractElementOp : Std_Op<"extract_element",
let hasFolder = 1;		let hasFolder = 1;

let assemblyFormat = [{		let assemblyFormat = [{
$aggregate `[` $indices `]` attr-dict `:` type($aggregate)		$aggregate `[` $indices `]` attr-dict `:` type($aggregate)
}];		}];
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// FPExtOp		// FPExtOp
		rriddleUnsubmitted Done Reply Inline Actions This op comes before ExtractElementOp alphabetically. rriddle: This op comes before ExtractElementOp alphabetically.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def FPExtOp : CastOp<"fpext">, Arguments<(ins AnyType:$in)> {		def FPExtOp : CastOp<"fpext">, Arguments<(ins AnyType:$in)> {
let summary = "cast from floating-point to wider floating-point";		let summary = "cast from floating-point to wider floating-point";
		silvasUnsubmitted Done Reply Inline Actions make the summary more descriptive. silvas: make the summary more descriptive.
let description = [{		let description = [{
		silvasUnsubmitted Done Reply Inline Actions Indent description by two spaces (instead of 4) for consistency with the rest of the file silvas: Indent description by two spaces (instead of 4) for consistency with the rest of the file
Cast a floating-point value to a larger floating-point-typed value.		Cast a floating-point value to a larger floating-point-typed value.
The destination type must to be strictly wider than the source type.		The destination type must to be strictly wider than the source type.
		nicolasvasilacheUnsubmitted Done Reply Inline Actions I'll repaste my comment that I don't believe was addressed: To be generally useful for Linalg and other ops with regions that refuse to introduce SSA values prematurely (I.e. that use type information to encode the semantics and delay SSA value creation until inlining) you need both arguments and capture. Can this be designed and implemented so it serves today’s needs that are already more general than “just capture”? nicolasvasilache: I'll repaste my comment that I don't believe was addressed: ``` To be generally useful for…
		bondhugulaAuthorUnsubmitted Done Reply Inline Actions I'm not sure what you may need for LinAlg; so I can't say how it may help there. For the things this op will at least help with, please see the commit summary or the discussion thread for more details - it is based on today's needs but your 'today' may be very different from mine! :-) I do see the need for ops that take dimensional arguments and captures (and with regions), but their goals are very different from those of this op. I suspect you might be thinking that this op works at a higher level than it really does - so you may need a different op for what you have in mind. bondhugula: I'm not sure what you may need for LinAlg; so I can't say how it may help there. For the things…
Only scalars are currently supported.		Only scalars are currently supported.
}];		}];

		rriddleUnsubmitted Done Reply Inline Actions nit: MLIR functions (FuncOp) -> FuncOp rriddle: nit: MLIR functions (FuncOp) -> FuncOp
let extraClassDeclaration = [{		let extraClassDeclaration = [{
/// Return true if `a` and `b` are valid operand and result pairs for		/// Return true if `a` and `b` are valid operand and result pairs for
/// the operation.		/// the operation.
static bool areCastCompatible(Type a, Type b);		static bool areCastCompatible(Type a, Type b);
}];		}];

let hasFolder = 0;		let hasFolder = 0;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		rriddleUnsubmitted Done Reply Inline Actions This should be indented in an mlir code block rriddle: This should be indented in an mlir code block
// FPTruncOp		// FPTruncOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def FPTruncOp : CastOp<"fptrunc">, Arguments<(ins AnyType:$in)> {		def FPTruncOp : CastOp<"fptrunc">, Arguments<(ins AnyType:$in)> {
let summary = "cast from floating-point to narrower floating-point";		let summary = "cast from floating-point to narrower floating-point";
let description = [{		let description = [{
Truncate a floating-point value to a smaller floating-point-typed value.		Truncate a floating-point value to a smaller floating-point-typed value.
The destination type must be strictly narrower than the source type.		The destination type must be strictly narrower than the source type.
Show All 14 Lines
// IndexCastOp		// IndexCastOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def IndexCastOp : CastOp<"index_cast">, Arguments<(ins AnyType:$in)> {		def IndexCastOp : CastOp<"index_cast">, Arguments<(ins AnyType:$in)> {
let summary = "cast between index and integer types";		let summary = "cast between index and integer types";
let description = [{		let description = [{
Casts between integer scalars and 'index' scalars. Index is an integer of		Casts between integer scalars and 'index' scalars. Index is an integer of
platform-specific bit width. If casting to a wider integer, the value is		platform-specific bit width. If casting to a wider integer, the value is
sign-extended. If casting to a narrower integer, the value is truncated.		sign-extended. If casting to a narrower integer, the value is truncated.
		silvasUnsubmitted Not Done Reply Inline Actions Can you add a verifier that the region doesn't have args? Also, I'm not super familiar with ODS, but does this specification autogenerate a verifier that there are no operands, or does it just not check anything about the op's operands? If the latter, please change it so that the verifier checks that there are no operands to this op. silvas: Can you add a verifier that the region doesn't have args? Also, I'm not super familiar with…
		bondhugulaAuthorUnsubmitted Done Reply Inline Actions ODS doesn't support regions yet, and so you are right that we'll need to verify for zero region arguments. (Added that as well as > 0 blocks check.) But for operands, with no operands in the ODS description, the auto-generation will mark the op with the ZeroOperands trait, and the latter's verifier will check for it. bondhugula: ODS doesn't support regions yet, and so you are right that we'll need to verify for zero region…
}];		}];

let extraClassDeclaration = [{		let extraClassDeclaration = [{
/// Return true if `a` and `b` are valid operand and result pairs for		/// Return true if `a` and `b` are valid operand and result pairs for
/// the operation.		/// the operation.
static bool areCastCompatible(Type a, Type b);		static bool areCastCompatible(Type a, Type b);
}];		}];

▲ Show 20 Lines • Show All 1,065 Lines • Show Last 20 Lines

mlir/lib/Dialect/StandardOps/IR/Ops.cpp

Show First 20 Lines • Show All 1,376 Lines • ▼ Show 20 Lines	OpFoldResult IndexCastOp::fold(ArrayRef<Attribute> cstOperands) {
// of the constant might need to change.		// of the constant might need to change.
if (auto value = cstOperands[0].dyn_cast_or_null<IntegerAttr>())		if (auto value = cstOperands[0].dyn_cast_or_null<IntegerAttr>())
return IntegerAttr::get(getType(), value.getInt());		return IntegerAttr::get(getType(), value.getInt());

return {};		return {};
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// ExecuteRegionOp
		//===----------------------------------------------------------------------===//

		///
		rriddleUnsubmitted Done Reply Inline Actions Use /// For top-level comments. rriddle: Use /// For top-level comments.
		bondhugulaAuthorUnsubmitted Done Reply Inline Actions Sorry, I didn't understand. These aren't doc comments, but for the implementation. This entire comment para along with the "Ex:" should be ///? bondhugula: Sorry, I didn't understand. These aren't doc comments, but for the implementation. This entire…
		rriddleUnsubmitted Done Reply Inline Actions Yeah for consistency, we use /// everywhere. rriddle: Yeah for consistency, we use /// everywhere.
		/// (ssa-id `=`)? `execute_region` `->` function-result-type `{`
		/// block+
		/// `}`
		///
		/// Example:
		/// std.execute_region -> i32 {
		/// %idx = load %rI[%i] : memref<128xi32>
		/// return %idx : i32
		/// }
		///
		static ParseResult parseExecuteRegionOp(OpAsmParser &parser,
		OperationState &result) {
		if (parser.parseOptionalArrowTypeList(result.types))
		return failure();

		// Introduce the body region and parse it.
		Region *body = result.addRegion();
		if (parser.parseRegion(body, /arguments=/{}, /argTypes=*/{}) \|\|
		parser.parseOptionalAttrDict(result.attributes))
		return failure();

		// Parse the optional attribute list.
		if (parser.parseOptionalAttrDict(result.attributes))
		return failure();

		return success();
		}

		static void print(OpAsmPrinter &p, ExecuteRegionOp op) {
		p << ExecuteRegionOp::getOperationName();
		if (op.getNumResults() > 0)
		p << " -> " << op.getResultTypes();

		p.printRegion(op.region(),
		/printEntryBlockArgs=/false,
		/printBlockTerminators=/true);

		p.printOptionalAttrDict(op.getAttrs());
		}

		static LogicalResult verify(ExecuteRegionOp op) {
		if (op.region().empty())
		return op.emitOpError("region needs to have at least one block");

		if (op.region().front().getNumArguments() > 0)
		return op.emitOpError("region cannot have any arguments");

		return success();
		}

		//===----------------------------------------------------------------------===//
// LoadOp		// LoadOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static LogicalResult verify(LoadOp op) {		static LogicalResult verify(LoadOp op) {
if (op.getNumOperands() != 1 + op.getMemRefType().getRank())		if (op.getNumOperands() != 1 + op.getMemRefType().getRank())
return op.emitOpError("incorrect number of indices for load");		return op.emitOpError("incorrect number of indices for load");
return success();		return success();
}		}
▲ Show 20 Lines • Show All 1,139 Lines • Show Last 20 Lines

mlir/test/IR/core-ops.mlir

Show First 20 Lines • Show All 594 Lines • ▼ Show 20 Lines	func @calls(%arg0: i32) {
%f_0 = constant @return_op : (i32) -> i32		%f_0 = constant @return_op : (i32) -> i32

// CHECK: %3 = call_indirect %f_0(%arg0) : (i32) -> i32		// CHECK: %3 = call_indirect %f_0(%arg0) : (i32) -> i32
%2 = call_indirect %f_0(%arg0) : (i32) -> i32		%2 = call_indirect %f_0(%arg0) : (i32) -> i32

// CHECK: %4 = call_indirect %f_0(%arg0) : (i32) -> i32		// CHECK: %4 = call_indirect %f_0(%arg0) : (i32) -> i32
%3 = "std.call_indirect"(%f_0, %arg0) : ((i32) -> i32, i32) -> i32		%3 = "std.call_indirect"(%f_0, %arg0) : ((i32) -> i32, i32) -> i32

		// CHECK: execute_region -> i64 {
		// CHECK-NEXT: constant
		// CHECK-NEXT: return
		// CHECK-NEXT: }
		%4 = execute_region -> i64 {
		%c1 = constant 1 : i64
		return %c1 : i64
		}

		// CHECK: execute_region {
		// CHECK-NEXT: br ^bb1
		// CHECK-NEXT: ^bb1:
		rriddleUnsubmitted Done Reply Inline Actions nit: Don't check the pred comment, rriddle: nit: Don't check the pred comment,
		// CHECK-NEXT: return
		// CHECK-NEXT: }
		"std.execute_region"() ({
		^bb0:
		br ^bb1
		^bb1:
		return
		}) : () -> ()

return		return
}		}

// CHECK-LABEL: func @extract_element(%arg0: tensor<*xi32>, %arg1: tensor<4x4xf32>) -> i32 {		// CHECK-LABEL: func @extract_element(%arg0: tensor<*xi32>, %arg1: tensor<4x4xf32>) -> i32 {
func @extract_element(%arg0: tensor<*xi32>, %arg1 : tensor<4x4xf32>) -> i32 {		func @extract_element(%arg0: tensor<*xi32>, %arg1 : tensor<4x4xf32>) -> i32 {
%c0 = "std.constant"() {value = 0: index} : () -> index		%c0 = "std.constant"() {value = 0: index} : () -> index

// CHECK: %0 = extract_element %arg0[%c0, %c0, %c0, %c0] : tensor<*xi32>		// CHECK: %0 = extract_element %arg0[%c0, %c0, %c0, %c0] : tensor<*xi32>
▲ Show 20 Lines • Show All 165 Lines • Show Last 20 Lines

mlir/test/IR/invalid-ops.mlir

	Show First 20 Lines • Show All 1,152 Lines • ▼ Show 20 Lines
	// -----			// -----

	// 0 alignment value.			// 0 alignment value.
	func @assume_alignment(%0: memref<4x4xf16>) {			func @assume_alignment(%0: memref<4x4xf16>) {
	// expected-error@+1 {{'std.assume_alignment' op attribute 'alignment' failed to satisfy constraint: 32-bit signless integer attribute whose value is positive}}			// expected-error@+1 {{'std.assume_alignment' op attribute 'alignment' failed to satisfy constraint: 32-bit signless integer attribute whose value is positive}}
	std.assume_alignment %0, 0 : memref<4x4xf16>			std.assume_alignment %0, 0 : memref<4x4xf16>
	return			return
	}			}

				// -----

				func @execute_region() {
				// expected-error @+1 {{region cannot have any arguments}}
				"std.execute_region"() ({
				^bb0(%i : i32):
				return
				}) : () -> ()
				return
				}

mlir/test/Transforms/canonicalize.mlir

Show First 20 Lines • Show All 485 Lines • ▼ Show 20 Lines	func @hoist_constant(%arg0: memref<8xi32>) {
affine.for %arg1 = 0 to 8 {		affine.for %arg1 = 0 to 8 {
// CHECK-NEXT: store %c42_i32, %arg0[%arg1]		// CHECK-NEXT: store %c42_i32, %arg0[%arg1]
%c42_i32 = constant 42 : i32		%c42_i32 = constant 42 : i32
store %c42_i32, %arg0[%arg1] : memref<8xi32>		store %c42_i32, %arg0[%arg1] : memref<8xi32>
}		}
return		return
}		}

		// CHECK-LABEL: func @propagate_into_execute_region
		func @propagate_into_execute_region() {
		%cond = constant 0 : i1
		affine.for %i = 0 to 100 {
		"foo"() : () -> ()
		%v = execute_region -> i64 {
		cond_br %cond, ^bb1, ^bb2

		^bb1:
		%c1 = constant 1 : i64
		br ^bb3(%c1 : i64)

		^bb2:
		%c2 = constant 2 : i64
		br ^bb3(%c2 : i64)

		^bb3(%x : i64):
		return %x : i64
		}
		"bar"(%v) : (i64) -> ()
		// CHECK: std.execute_region -> i64 {
		// CHECK-NEXT: return %c2_i64 : i64
		// CHECK-NEXT: }
		}
		return
		}

// CHECK-LABEL: func @const_fold_propagate		// CHECK-LABEL: func @const_fold_propagate
func @const_fold_propagate() -> memref<?x?xf32> {		func @const_fold_propagate() -> memref<?x?xf32> {
%VT_i = constant 512 : index		%VT_i = constant 512 : index

%VT_i_s = affine.apply affine_map<(d0) -> (d0 floordiv 8)> (%VT_i)		%VT_i_s = affine.apply affine_map<(d0) -> (d0 floordiv 8)> (%VT_i)
%VT_k_l = affine.apply affine_map<(d0) -> (d0 floordiv 16)> (%VT_i)		%VT_k_l = affine.apply affine_map<(d0) -> (d0 floordiv 16)> (%VT_i)

// CHECK: = alloc() : memref<64x32xf32>		// CHECK: = alloc() : memref<64x32xf32>
▲ Show 20 Lines • Show All 396 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Introduce scf.execute_region opClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 253396

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td

mlir/lib/Dialect/StandardOps/IR/Ops.cpp

mlir/test/IR/core-ops.mlir

mlir/test/IR/invalid-ops.mlir

mlir/test/Transforms/canonicalize.mlir

[MLIR] Introduce scf.execute_region op
ClosedPublic