This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Enhance InferShapedTypeOpInterface and move LinalgOps to use them.
ClosedPublic

Authored by mravishankar on Mar 3 2021, 1:43 PM.

Details

Summary

For most Linalg operations, the shape of the output can be computed
from the shape of the inputs. A new interface is added that allows
operations to define the shape of the output in terms of the shapes of
their operands. A new canonicalization pattern is added to canonicalize dim
ops that query the shape of the result of operations that implement
this interface. This replaces the op -> dim canonicalization
patterns that were previously inserted in a piecewise manner.
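As a rough sketch of what such a canonicalization pattern looks like (the method name reifyReturnTypeShapesPerResultDim appears later in this review; the pattern structure and accessor names below are illustrative assumptions, not the exact code in the patch):

```cpp
// Sketch only: replace `dim %result, %c` with the equivalent value computed
// from the defining op's operands, when that op implements the interface.
struct DimOfInferShapedTypeOp : public OpRewritePattern<DimOp> {
  using OpRewritePattern<DimOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(DimOp dimOp,
                                PatternRewriter &rewriter) const override {
    OpResult result = dimOp.memrefOrTensor().dyn_cast<OpResult>();
    Optional<int64_t> dimIndex = dimOp.getConstantIndex();
    if (!result || !dimIndex)
      return failure();
    auto shapedTypeOp =
        dyn_cast<InferShapedTypeOpInterface>(result.getOwner());
    if (!shapedTypeOp)
      return failure();

    // Ask the defining op for the dims of all its results, expressed in
    // terms of its operands.
    SmallVector<SmallVector<Value>> reifiedShapes;
    if (failed(shapedTypeOp.reifyReturnTypeShapesPerResultDim(rewriter,
                                                              reifiedShapes)))
      return failure();

    rewriter.replaceOp(dimOp,
                       reifiedShapes[result.getResultNumber()][*dimIndex]);
    return success();
  }
};
```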

Diff Detail

Event Timeline

mravishankar created this revision.Mar 3 2021, 1:43 PM
mravishankar requested review of this revision.Mar 3 2021, 1:43 PM

Updating the patch to not use tensor.from_elements. Instead, change
the interface to return the Values directly for the dims of all results.

benvanik added inline comments.
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
1014

can you try using createOrFold here and elsewhere when the dims are created? in theory that'll immediately fold if the dim+source can resolve, preventing the insertion of the dim op entirely
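For reference, a minimal sketch of the difference (builder, location, and value names are illustrative):

```cpp
// b.create<DimOp> always materializes a dim op, even when the queried
// dimension of `source` is static.
Value d0 = b.create<DimOp>(loc, source, /*index=*/0);

// b.createOrFold<DimOp> runs the fold hook immediately: if dim + source can
// resolve (e.g. the dimension is static), the folded value is returned and
// no dim op is inserted at all.
Value d1 = b.createOrFold<DimOp>(loc, source, /*index=*/0);
```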

It's fine to have these separate for review, but please merge to avoid updating the interface in two consecutive changes where one undoes part of the previous.

Updating by folding in patch D97532.

mravishankar retitled this revision from [mlir] Change InferShapedTypeInterface to have a interface that returns shape of all results. to [mlir] Enhance InferShapedTypeOpInterface and move LinalgOps to use them..Mar 4 2021, 10:11 AM
mravishankar edited the summary of this revision. (Show Details)

I folded things into a single patch. (Was planning to do that, sorry I wasn't explicit before.)

@benvanik I saw your comment about using createOrFold (it isn't there on this patch anymore); that made me think we should have SmallVectorImpl<SmallVector<OpFoldResult>> &, which might just remove some unnecessary IR being created.

stellaraccident accepted this revision.Mar 8 2021, 8:06 AM

Thanks, this lgtm and it appears that review comments have been addressed.

This revision is now accepted and ready to land.Mar 8 2021, 8:06 AM
jpienaar added inline comments.Mar 8 2021, 10:17 AM
mlir/include/mlir/Interfaces/InferTypeOpInterface.td
118

Every shaped type only has one shape; this returns the value computed for the shape of the shaped type. Not sure why a vector of vectors is needed.

Rebase and address concerns from offline conversation

Rebase and remove unused methods.

jpienaar added inline comments.Mar 9 2021, 2:38 PM
mlir/include/mlir/Interfaces/InferTypeOpInterface.td
103

Why this change?

131

Also document that this method is restricted to ranked cases only and cannot be used for unranked shapes, in contrast to the more general method above.

140

Is this change needed? The InferTensorType trait has this method.

142

The top one uses Value, this one OpFoldResult, while the comment says Value. This method is reifying/building the ops to compute the shape. It is used in many contexts, and in some of those the computation could be extracted into pure compute parts. In cases where the type is (for example) the same, it would still require creating an Attribute (as the shape is stored in the type, but not as a separate Attribute). So if asked to reify the shape, and the return is a free-standing attribute, then the caller would still need to create a constant op.

Let's keep this as Value, as this function is intended for reification. It can be used as, but isn't meant to be, an optimized shape computation by folding, and it is also possible to only call reify for dynamic shapes/dims. That way reify doesn't become mayReify and require consumers to check whether a Value is returned or not.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1568

Should this be in StandardOps then? (E.g., this makes StandardOps depend on the tensor dialect; it does already, but I am not sure if that is the intended state there.)

1596

Should these be switched? E.g., I can see how one would use a helper method to implement both of these methods, so both interface methods would succeed, but the latter is more efficient if already in that form.

Adding Stephan as I'll be OOO

mravishankar marked an inline comment as done.Mar 9 2021, 3:04 PM
mravishankar added inline comments.Mar 9 2021, 3:04 PM
mlir/include/mlir/Interfaces/InferTypeOpInterface.td
103

Otherwise every op needs to implement this method. I was trying to make this more opt-in. If an op implements it and an analysis can use it, great. But I don't see a need for it to be forced on all operations that implement this interface.

131

Good point. Will do so.

140

It doesn't declare inferReturnTypeComponents. With the default implementation on this method, ODS will not generate the declaration for this method on the op by default. This gets back to the previous behavior of the defvar.

142

I am not sure what the use of keeping it as Value is. I see OpFoldResult as a superset of Value. If the op implementing the interface knows a dim is static, it will return the static value as an attribute. If the method querying this information can use OpFoldResult, then it's a win-win: a static value is maintained throughout without having to unnecessarily instantiate a Value only for the canonicalizer to make it go away. If the client needs a Value, it can always create one as needed.

Returning Value and relying on canonicalizers to make them constants is problematic. There are many cases where certain patterns kick in only when shapes are static. For example, if you are tiling + vectorizing an operation, then vectorization requires all shapes to be statically known. If tiling cannot ensure that the shapes are static by construction, and instead relies on canonicalization to propagate that information, it creates a break point: you have to run a pass to do tiling, then run a pass to do vectorization. That leads to other issues, because you need to make sure you are only vectorizing the op you tiled in a previous pass and not other ops (of the same type) that might exist in your program/function that are not meant to be vectorized. This is just an example where relying on canonicalizations to propagate static information just makes things harder.

Bottom line though is that OpFoldResult is strictly more expressive than Value (it is literally a PointerUnion of Attribute and Value). So you can use a Value if that is the case that suits you, or an Attribute if you can.
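A small sketch of how a client can consume an OpFoldResult, to make the trade-off concrete (the helper name is made up for illustration):

```cpp
// OpFoldResult is a PointerUnion of Attribute and Value, so a static extent
// can stay an attribute until (and unless) a client actually needs a Value.
Value materializeExtent(OpBuilder &b, Location loc, OpFoldResult ofr) {
  if (Value v = ofr.dyn_cast<Value>())
    return v;  // dynamic extent: already an SSA value
  // Static extent: only now materialize a constant.
  int64_t staticExtent = ofr.get<Attribute>().cast<IntegerAttr>().getInt();
  return b.create<ConstantIndexOp>(loc, staticExtent);
}
```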

herhut requested changes to this revision.Mar 10 2021, 9:40 AM
herhut added inline comments.
mlir/include/mlir/Interfaces/InferTypeOpInterface.td
142

I think the question here then is what reify means. If a certain output dimension of an operation has a static shape, then you can just query the type. No need to reify anything, as you have full static knowledge already. If it is a dynamic dimension, you can use this interface to produce IR that makes this dynamic value available to you.

How about changing this interface so that it only produces the shape for a specific dimension and result? That would keep it as a pure reification interface and the logic you want could be in a helper? That is essentially the same argument you gave, just the other way round.

A default implementation could be to fall back to the other interface and insert an extract.
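A minimal sketch of the kind of helper described here, assuming a hypothetical per-result/per-dim reification hook named reifyResultDim (not part of this patch):

```cpp
// Static extents come straight from the result type; only dynamic extents
// need IR to be reified.
Value getResultDimValue(OpBuilder &b, Location loc, OpResult result,
                        unsigned dim) {
  auto type = result.getType().cast<ShapedType>();
  if (!type.isDynamicDim(dim))
    return b.create<ConstantIndexOp>(loc, type.getDimSize(dim));
  // Dynamic case: ask the defining op to produce IR for this one dimension.
  auto op = cast<InferShapedTypeOpInterface>(result.getOwner());
  return op.reifyResultDim(b, result.getResultNumber(), dim);  // hypothetical
}
```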

mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
202

Why the orFold? What does the ConstantIndexOp fold into?

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1610

Is this a good canonicalization in general? In essence, it replicates the computation of the shape in IR instead of falling back on the memref descriptor. That is a good thing to do if you do not have descriptors at runtime. It is less good if you do, as you replicate the computation.

We have the same concept with shape_of and not dim. So generally, shape_of(a) can be replaced with the result of the other shape interface and that is useful in some cases for static analysis purposes and to discharge constraints. However, we do not have this as a canonicalization pattern as it also might duplicate computation. So we have it in a separate pass.

This revision now requires changes to proceed.Mar 10 2021, 9:40 AM

Moving the interface into Linalg for now till can converge on using InferShapedTypeOpInterface

@herhut I had some offline discussions with Jacques. For the time being moving the interface into Linalg. We can unify them once we iron out all the API of the interface. I am happy to pick that up based on what is finally converged upon.

W.R.T. your comment about using OpFoldResult, I think it comes down to what you are using the interface for. To me it just felt more uniform to get the shape of all the dimensions. There is no need on the analysis side to check if the shape is dynamic and then go query the interface. You query the interface and it gives you the entire shape (static and dynamic), and then it's a matter of just extracting the Value (or int64_t if you can use that) on the client side. Much cleaner usage AFAICS.

W.R.T. having an interface that allows you to get a single dim of a single result, that was one of the things done in an earlier version of the patch. But Jacques pointed out that it is not general enough. Some ops might not be able to compute a single dim without doing all the work to compute the entire shape. In that case, if you do need multiple dims of the same result, then you have redundant (potentially heavy) computation that might not CSE. That makes sense to me. There are some API issues to work out here. Landing this in Linalg for now.

Fix build file.

mravishankar retitled this revision from [mlir] Enhance InferShapedTypeOpInterface and move LinalgOps to use them. to [mlir][Linalg] Add an interface in Linalg to infer shapes of Linalg Ops..Mar 10 2021, 3:17 PM
mravishankar edited the summary of this revision. (Show Details)

For the time being moving the interface into Linalg. We can unify them once we iron out all the API of the interface. I am happy to pick that up based on what is finally converged upon.

What's the plan to get there? It seems like moving this to linalg is kind of working around the lack of consensus on the API; in practice it seems like it can split the implementation into "Linalg-specific" things and the general case, which will lead to implementations of analyses that are linalg-specific. While there are many reasons to have Linalg-specific analysis/transformations, I'm not sure this is such a case (which seems acknowledged by your sentence I'm quoting here).

mravishankar added inline comments.Mar 10 2021, 3:29 PM
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
1610

I see what you are saying here. I don't want to start a discussion about whether the memref descriptor is a good thing or not, but I don't think it is a good idea for IR canonicalizations to depend on how memref descriptors work. If the memref descriptors are replicating computation, then that seems to be an issue with the memref descriptor.

Here this pattern is really important. When you replace an operation that has dynamic shapes with another operation, the shape of the replacement obviously needs to match the shape of this op. If you use the result SSA value, then you cannot remove all uses of the replaced op (and you need to move the insertion point after the operation being replaced to make sure you are not violating use-def chains). Instead, resolving the dims in terms of the inputs avoids all of this. This is the reason why I started doing this work. So it is a good canonicalization to have :). In any case, this is now moved into the Linalg dialect only, so it should not affect your use case presently.

For the time being moving the interface into Linalg. We can unify them once we iron out all the API of the interface. I am happy to pick that up based on what is finally converged upon.

What's the plan to get there? It seems like moving this to linalg is kind of working around the lack of consensus on the API; in practice it seems like it can split the implementation into "Linalg-specific" things and the general case, which will lead to implementations of analyses that are linalg-specific. While there are many reasons to have Linalg-specific analysis/transformations, I'm not sure this is such a case (which seems acknowledged by your sentence I'm quoting here).

Yeah, I'd like to as well. The hope is that anything in this interface can be made available to the InferShapedTypeOpInterface by just redirecting the call. I think there are larger disagreements about how the shape inference work should evolve. I am not aware of all the opinions here and cannot comment on those.

mehdi_amini requested changes to this revision.Mar 10 2021, 4:42 PM

I think there are larger disagreements about how the shape inference work should evolve. I am not aware of all the opinions here and cannot comment on those.

I'm not plugged into the details here, but if there is a disagreement it should be addressed. I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

This revision now requires changes to proceed.Mar 10 2021, 4:42 PM

I think there are larger disagreements about how the shape inference work should evolve. I am not aware of all the opinions here and cannot comment on those.

I'm not plugged into the details here, but if there is a disagreement it should be addressed. I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

Any suggestions on how to proceed? I am honestly not the person who can drive consensus here because I do not have the larger context in which the upstream interface is designed. Based on offline discussion this was decided as an intermediate step till the pieces can be merged. This is going to be a simple interface with very fixed functionality. I'll leave it to @nicolasvasilache and @stellaraccident to decide if this is worth landing here or not. I need this functionality, so if this is a no-go, I will have to find less desirable workaround solutions (either in Linalg or in IREE).

I think there are larger disagreements about how the shape inference work should evolve. I am not aware of all the opinions here and cannot comment on those.

I'm not plugged into the details here, but if there is a disagreement it should be addressed. I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

Any suggestions on how to proceed? I am honestly not the person who can drive consensus here because I do not have the larger context in which the upstream interface is designed. Based on offline discussion this was decided as an intermediate step till the pieces can be merged. This is going to be a simple interface with very fixed functionality. I'll leave it to @nicolasvasilache and @stellaraccident to decide if this is worth landing here or not. I need this functionality, so if this is a no-go, I will have to find less desirable workaround solutions (either in Linalg or in IREE).

I can review the thread and try to catch up later, but if there is a conflicting opinion on the shape side, we probably need to resolve it with jpienaar, and I think he is out until next week.

As a side note, this is the cost of IREE being locked to Google's at-head development cycle: there is no ability to land a temporary fix past upstream head to make progress. CIRCT, for example, maintains a project-specific LLVM branch for this reason. I know more than anyone that there are reasons for this, but I feel like it is kind of hard for us to hold those reasons and feel urgency about a few days to converge -- when that is a byproduct of Google's development process, not LLVM's.

Fwiw, the tensorflow/xla side has at times been guilty of this false urgency too, and I would like to see us all be forced to improve in this area.

I think there are larger disagreements about how the shape inference work should evolve. I am not aware of all the opinions here and cannot comment on those.

I'm not plugged into the details here, but if there is a disagreement it should be addressed. I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

Any suggestions on how to proceed? I am honestly not the person who can drive consensus here because I do not have the larger context in which the upstream interface is designed. Based on offline discussion this was decided as an intermediate step till the pieces can be merged. This is going to be a simple interface with very fixed functionality. I'll leave it to @nicolasvasilache and @stellaraccident to decide if this is worth landing here or not. I need this functionality, so if this is a no-go, I will have to find less desirable workaround solutions (either in Linalg or in IREE).

I can review the thread and try to catch up later, but if there is a conflicting opinion on the shape side, we probably need to resolve it with jpienaar, and I think he is out until next week.

Summary of some issues that are unresolved.

  1. The current reify... methods need to populate a SmallVector<Value> with as many Values as the number of results of the operation. So each Value is potentially a tensor<rank x index>. This needs to be created by tensor.from_elements in the callee, and then the caller needs to tensor.extract the values it needs. Canonicalize and DCE/CSE will eliminate redundant ops. This makes sense for ops where the shape of the result is computed as a single value already, but not for ops where each dim of each result is computed independently. So the downside here is that a) everywhere you need to use that interface you need to depend on the tensor dialect, which seems unnecessary, and b) there is an unnecessary cost of adding IR just to remove these instructions in canonicalizations. This is purely an artifact of the interface.
  2. To account for the above, the next approach was to add a second interface method where the ops implementing the interface need to populate a SmallVector<SmallVector<OpFoldResult>>. So there are as many vectors as the number of results, and each vector is of size equal to the rank of the result. The use of OpFoldResult allows returning both static values and dynamic values. There was push back against using OpFoldResult instead of Value. From experience the use of OpFoldResult is really important. Many times (almost always) these returned shapes are used as the shapes of results for a replacement operation. Making the interface return Values means that the created operation is now dynamically shaped (even if the original value is statically shaped). To fix up all uses you need to add tensor.cast operations and then rely on canonicalizers to make the result statically shaped again and remove all the tensor casts. This is again unnecessary addition/deletion of instructions created due to the interface (assuming that the canonicalizers do everything that is needed, which is not a given).

I don't want to interfere with what the existing interface is doing. The intermediate interface added to Linalg can always be used to provide the implementation of everything InferShapedTypeOpInterface needs. So I don't see any conflict here.
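Sketched as signatures, the two approaches above look roughly like this (the exact parameter lists in the patch may differ):

```cpp
// (1) One Value per result; each Value is a rank-sized shape tensor, built
//     e.g. with tensor.from_elements, from which callers extract the dims
//     they need.
LogicalResult reifyReturnTypeShapes(OpBuilder &builder,
                                    SmallVectorImpl<Value> &reifiedShapes);

// (2) One vector per result, one entry per dimension. With OpFoldResult an
//     implementation can return an Attribute for static extents and a Value
//     only for dynamic ones.
LogicalResult reifyReturnTypeShapesPerResultDim(
    OpBuilder &builder,
    SmallVectorImpl<SmallVector<OpFoldResult>> &reifiedShapes);
```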

I feel like I can accept that there is a place for a more specialized shape query interface specifically for linalg, given that it has a more constrained representation, allowing for a narrower interface, compared to the global InferShapedTypeOpInterface.

I'd be inclined to say that the original proposal to add more constrained per-dimension query mechanisms to the global InferShapedTypeOpInterface was not the right direction, since the new methods would only apply to specific ops and circumstances versus being a general-purpose shape query mechanism. If we think we need something more specific for a defined subset of cases, then it is the right thing to not try to generalize the global interface in one step. Implementing locally for the case where it is deemed useful seems like the right incremental path forward.

I can take a more detailed final pass if that approach seems reasonable. It looks like Jacques delegated to herhut while he is out: herhut should get a chance to respond when he comes online for his day. It then looks like Mehdi was primarily objecting on non-consensus grounds. If herhut is not opposed to the approach, then neither am I.

mlir/include/mlir/Dialect/Linalg/IR/LinalgInferShapeInterface.td
19 ↗(On Diff #329790)

If I'm doing a fly-by and I see this interface, the first thing I want to know is "how is this different from the global InferShapedTypeOpInterface?"

As it turns out, I do know this area and can glean the difference, and I wonder if some name clarification (or comments) can help.

The global interface is generic over any shape transfer function, including, I believe, ranked, unranked and dynamic. This interface is providing a narrow facility to query individual dimensions of the results. I can see the arguments about why you would want this more direct access for various things, given that the ops here are constrained such that it is always valid to service such a query, whereas the global InferShapedTypeOpInterface is higher level (you can implement InferShapedTypeOpInterface in terms of InferShapeOp here).

If we go forward with this, it makes more sense to me for this interface to be something like InferResultDimsInterface and to make the method a bit cleaner (comment below).

41 ↗(On Diff #329790)

In what situations can this fail?

42 ↗(On Diff #329790)

getResultShapeDims if renaming.

44 ↗(On Diff #329790)

I don't love this calling convention. Up thread, Jacques had asked (on the original version) whether we wanted to return just dims for a certain result index or result+dim. In general, we've biased towards not economizing at the individual dim level (and as you have it, it can return an attribute for static dims, eliminating most practical redundancies). Any chance we could at least do this by result index and avoid the double vector? I don't feel super strongly about this - more of a smell.

Document the interface scope better.

mlir/include/mlir/Dialect/Linalg/IR/LinalgInferShapeInterface.td
19 ↗(On Diff #329790)

Yeah, this makes sense. That was feedback too, that I was maybe looking at optimizing for a specific use case. So the intent is to have a Linalg-specific interface for this optimized use case that can also tie into the larger InferShapedTypeOpInterface. I added an explicit note stating this here (kept the name the same for now, will change it based on other feedback).

41 ↗(On Diff #329790)

The failure is for two use cases. One is based on how the next iteration of this might look (it is described below); another reason is that this interface (I think) is only useful when the shape of the result of the operation is expressible using the shapes of the operands. The failure indicates that this is not possible for the op. Today this is not the case for any Linalg operation (and shouldn't be for any operation that is tensor based, AFAICS). Still, having an escape hatch here will help exit gracefully when failure does happen (instead of asserts that might not trigger depending on the build).

42 ↗(On Diff #329790)

(see below)

44 ↗(On Diff #329790)

The thought here is that each op might have different constraints w.r.t. how the shapes need to be computed. It might not be possible (or at least not optimal) for an op to compute the shape of every result dim individually. So there is a natural hierarchy of methods (each of which returns failure by default):

a) getResultShapes: Return the shape of all dims for all results
b) getResultShape (singular): Return the shape of all dims of a particular result
c) getResultDim : Return the shape of a particular dim of a particular result.

An op can override any of these methods. The default implementation of (a) uses (b), and the default implementation of (b) uses (c).

From the perspective of the analysis/transformation, the hierarchy goes in reverse. So we need a set of utility functions (static interface methods) that use the above interface as follows:
a) computeResultShapes(InferShapeOp op): Returns the result of op.getResultShape
b) computeResultShape(InferShapeOp op, unsigned index): Returns the result at position index of the op. If the op implements getResultShape, use that; otherwise use the result of computeResultShapes and extract the needed values.
c) computeResultDim(InferShapeOp op, unsigned index, unsigned dim): Returns the shape of a particular result and dim. If the op implements getResultDim, use that; if not, use computeResultShape and extract the needed values.
(I.e., (c) uses (b) as fallback and (b) uses (a) as fallback, i.e. the reverse of the above get* methods.)

All this is just an idea, and I am planning to implement it only on a need basis. For now, just getResultShape will suffice to see if we need this whole layering, either for optimization or correctness. All this was to justify why I went with the current calling convention. It is the least opinionated.
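A sketch of that layering with hypothetical signatures (the get* methods return failure by default, and the compute* helpers fall back from the specific query to the more general one):

```cpp
// (b) helper: prefer the op's per-result hook, otherwise fall back to
// computing all result shapes and selecting one (definition elided).
LogicalResult computeResultShape(InferShapeOp op, OpBuilder &b,
                                 unsigned resultIdx,
                                 SmallVectorImpl<Value> &shape);

// (c) helper: prefer the op's per-dim hook, otherwise fall back to the
// per-result helper and pick out the requested dimension.
LogicalResult computeResultDim(InferShapeOp op, OpBuilder &b,
                               unsigned resultIdx, unsigned dim, Value &out) {
  if (succeeded(op.getResultDim(b, resultIdx, dim, out)))
    return success();
  SmallVector<Value> shape;
  if (failed(computeResultShape(op, b, resultIdx, shape)))
    return failure();
  out = shape[dim];
  return success();
}
```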

mravishankar added inline comments.Mar 11 2021, 11:24 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgInferShapeInterface.td
44 ↗(On Diff #329790)

Typo:

a) computeResultShapes(InferShapeOp op): Returns the result of op.getResultShapes

mravishankar added inline comments.Mar 11 2021, 11:27 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgInferShapeInterface.td
44 ↗(On Diff #329790)

Another typo :

All this is just an idea, and I am planning to implement it only on a need basis. For now, just getResultShapes will suffice to see if we need this whole layering, either for optimization or correctness. All this was to justify why I went with the current calling convention. It is the least opinionated.

stellaraccident accepted this revision.Mar 11 2021, 2:30 PM

This seems fine to me, and I believe that Jacques is supportive of a local/specialized thing here. However, since he has open comments, I would prefer that we get his lgtm on Monday when he is back prior to proceeding.

nicolasvasilache added a comment.EditedMar 12 2021, 12:45 AM

I have not yet dug into the guts of the discussion and the implementation details, I'll just comment on 2 things:

There was push back against using OpFoldResult instead of Value. From experience the use of OpFoldResult is really important.

Where it makes sense, it is crucial to move towards a ValueOrAttr abstraction rather than stay stuck in Value land.
Maybe it does not (yet?) make sense for the Shape world and that is perfectly fine.
However this is where Linalg is: see in particular the getMixed methods in ViewLikeInterface.
The quality of life improvements brought about by that particular refactoring were enormous.

I've also seen similar discussions kick in in a vector context: one wants to use ValueOrAttr aggressively to get a static constant that can be used to build a new vector shape.
Any API in the middle that does not use ValueOrAttr when it could becomes a (multiplicative) efficiency loss.

I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

I don't see a particular need for consensus between the needs of Linalg and the solution provided by the Shape dialect at this point in spacetime.
What makes sense for Linalg today may or may not make sense for Shape tomorrow.

Stepping back from Linalg, I think we should generally stop wanting to solve all problems at the same time in the same abstraction: this is a recurrent source of a lot of wasted effort.
The ultimate goal is of course convergence but it must happen progressively through iteration.
Progress is not a step function: 0: the wheel 1: flying cars.

The risk is of course time spent in refactorings.
Yes it can be expensive; yet it is absolutely dwarfed by the cost of inertia and missed opportunities.

W.R.T. your comment about using OpFoldResult, I think it comes down to what you are using the interface for. To me it just felt more uniform to get the shape of all the dimensions. There is no need on the analysis side to check if the shape is dynamic and then go query the interface. You query the interface and it gives you the entire shape (static and dynamic), and then it's a matter of just extracting the Value (or int64_t if you can use that) on the client side. Much cleaner usage AFAICS.

I would have hidden this behind a helper that wraps the API, so that the reify function always reifies but only gets called if the shape is dynamic. However...

W.R.T. having an interface that allows you to get a single dim of a single result, that was one of the things done in an earlier version of the patch. But Jacques pointed out that it is not general enough. Some ops might not be able to compute a single dim without doing all the work to compute the entire shape. In that case, if you do need multiple dims of the same result, then you have redundant (potentially heavy) computation that might not CSE. That makes sense to me. There are some API issues to work out here. Landing this in Linalg for now.

... I did not think about this. I had simply assumed that reifying multiple times ultimately is free. If it is not, as you suggest, then reifying single dimensions is also not possible. And, consequently, the API can not be wrapped as nicely. In particular, it would potentially materialize shape computations that are not needed and cannot be CSE'd. The same argument again.

So if we assume that shape computations are not CSE-able, then something like OpFoldResult makes a lot of sense. I would give it a new name, though, but that is pure stylistics.

I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

I don't see a particular need for consensus between the needs of Linalg and the solution provided by the Shape dialect at this point in spacetime.
What makes sense for Linalg today may or may not make sense for Shape tomorrow.

Fwiw, this isn't about the shape dialect + linalg; the original patch was extending a global interface for shape inference whose claims seemed overlapping. I think that the point has been resolved that these are different (in that the linalg version is a specialization that we don't necessarily want on the global one). My pushback on landing until Jacques or Stephan weighed in was that they had taken the time to do detailed reviews of the original, and we should provide time for the conversation to close (and I know that Jacques is on vacation because he pinged me privately prior to heading out).

mehdi_amini added a comment.EditedMar 12 2021, 2:16 PM

I reiterate what I wrote above: it seems like moving this to Linalg is kind of working around the lack of consensus, do I miss something here?

I don't see a particular need for consensus between the needs of Linalg and the solution provided by the Shape dialect at this point in spacetime.
What makes sense for Linalg today may or may not make sense for Shape tomorrow.

If we're aiming upstream to build interfaces and general solutions to manipulate shape inference (reification or not), then I expect that we dogfood them. Yes, there is a cost, but that has to be the cost of developing linalg upstream.

I always mention to people that we only have the concept of OpInterface in MLIR in the first place *because* you had a need in linalg and MLIR didn't have a solution for this.

Stepping back from Linalg, I think we should generally stop wanting to solve all problems at the same time in the same abstraction: this is a recurrent source of a lot of wasted effort.
The ultimate goal is of course convergence but it must happen progressively through iteration.
Progress is not a step function: 0: the wheel 1: flying cars.

Yes: but progress is not a random walk either. You can think about the end goal and have an idea about whether your next step is getting you closer to the flying car ;)
And as a community that joins multiple teams/groups with various deployment/integration stories, sharing this understanding is a key prerequisite to iterating in the project.

Avoiding building the vision about the direction, and skipping these discussions for the sake of your "island's velocity", does not sit well with me: for example, your experience with Value / OpFoldResult is very important to be discussed and understood by others who don't work daily on linalg; this can impact the rest of the project. The points Mahesh is making in this thread make sense to me, and the most recent exchange with Stephan, for example, shows that discussing this is worthwhile in evolving our shared understanding of the tradeoffs.

The risk is of course time spent in refactorings.
Yes it can be expensive; yet it is absolutely dwarfed by the cost of inertia and missed opportunities.

I don't think it is that black or white, and I think that caution in the consensus when working upstream is critical to the project: I'm personally wary of building a "tech island" around linalg, with a lot of things that would not interface with the rest of the ecosystem.

I have not yet dug into the guts of the discussion and the implementation details, I'll just comment on 2 things:

There was push back against using OpFoldResult instead of Value. From experience the use of OpFoldResult is really important.

Where it makes sense, it is crucial to move towards a ValueOrAttr abstraction rather than stay stuck in Value land.
Maybe it does not (yet?) make sense for the Shape world and that is perfectly fine.

I find it funny how folks (excl. Stephan) miss something very simple here: the method that was being changed is called reify. It reifies the shape computation; it doesn't potentially reify. So Attributes don't make sense. The caller expects a Value corresponding to the reified shape computation. This would be like having a build method where the build may or may not build (even createOrFold returns a Value rather than an attribute), or a materializeConstant which decides not to materialize. It means the caller would need to check which state was produced and then create a constant op or not, so this pushes work onto the caller if we were to relax it.

Attributes aren't free here either: if you wanted to avoid overhead during inference, an int or an ArrayRef<int> or ShapedTypeComponents is better, since for intermediate shapes during inference, or when using existing shapes from a ShapedType, you don't want to create an attribute. That uniques it in the context without need; it never needs to be an attribute (it isn't one in the final result type, contrary to folding where it may be part of the constant op finally created, so there it is only potentially wasteful to make an attribute, whereas here it would always be). But again, that is a different method than reify.

W.R.T. having an interface that allows you to get a single dim of a single result, that was one of the things done in an earlier version of the patch. But Jacques pointed out that it is not general enough. Some ops might not be able to compute a single dim without doing all the work to compute the entire shape. In that case, if you do need multiple dims of the same result, then you have redundant (potentially heavy) computation that might not CSE. That makes sense to me. There are some API issues to work out here. Landing this in Linalg for now.

... I did not think about this. I had simply assumed that reifying multiple times ultimately is free. If it is not, as you suggest, then reifying single dimensions is also not possible. And, consequently, the API can not be wrapped as nicely. In particular, it would potentially materialize shape computations that are not needed and cannot be CSE'd. The same argument again.

Yes, so for cases where computing a single dim requires computing all the dims of the shape, it would seem more fragile (and potentially more expensive) to rely on CSE than having 0-1 tensor.concat (per result) or shape.from_extents, and relying on that and get_element_at_dim (forgot the name) to be able to compose. The objection was that one now has (potentially) a concat and multiple get-element calls which get inserted and removed later vs forwarding directly from the start (but again, that's an easy pattern that createOrFold could even do, and then it is just deleting the potentially optional concat, meaning at most 1 extra op here). And good point about wrapping, I didn't consider that end.

For Linalg, is this also true? E.g., are the output shape indices interrelated? If all of them are indeed computable per dim, then that changes things here, given the interface is Linalg-specific now.

This is fine for local iteration in Linalg; it can't be used by the shape inference pass, and I don't think the tradeoff of at most one op per result that needs to be inserted and deleted is really worth it (given how often this would be called - one reify call per bufferization?).

mlir/include/mlir/Dialect/Linalg/IR/LinalgInferShapeInterface.td
41 ↗(On Diff #329790)

So failure here could mean either that the input shapes are invalid or that the shape can't be obtained as attributes and values? InferShapedTypeOpInterface is used along with InferTypeOpInterface (mostly, not always, so perhaps I should expand the documentation there to make it clearer), so the verifier catches all invalid shapes and the reify method's return is less ambiguous (e.g., failure cannot imply invalid input shapes, as a verified op can't have them). Are the shapes for Linalg ops verified elsewhere?

44 ↗(On Diff #329790)

I agree with Stella here, but that is open for you two to discuss too.

I have not yet dug into the guts of the discussion and the implementation details, I'll just comment on 2 things:

There was push back against using OpFoldResult instead of Value. From experience the use of OpFoldResult is really important.

Where it makes sense, it is crucial to move towards a ValueOrAttr abstraction rather than stay stuck in Value land.
Maybe it does not (yet?) make sense for the Shape world and that is perfectly fine.

I find it funny how folks (excl. Stephan) miss something very simple here: the method that was being changed is called reify. It reifies the shape computation; it doesn't potentially reify. So Attributes don't make sense. The caller expects a Value corresponding to the reified shape computation. This would be like having a build method where the build may or may not build (even createOrFold returns a Value rather than an attribute), or a materializeConstant which decides not to materialize. It means the caller would need to check which state was produced and then create a constant op or not, so this pushes work onto the caller if we were to relax it.

Attributes aren't free here either: if you wanted to avoid overhead during inference, an int or an ArrayRef<int> or ShapedTypeComponents is better, since for intermediate shapes during inference, or when using existing shapes from a ShapedType, you don't want to create an attribute. That uniques it in the context without need; it never needs to be an attribute (it isn't one in the final result type, contrary to folding where it may be part of the constant op finally created, so there it is only potentially wasteful to make an attribute, whereas here it would always be). But again, that is a different method than reify.

I think that is a good reason to make this Linalg-only. In Linalg-based codegen, not having static shapes has real issues, and this interface addresses those issues. There is more work needed to bridge the gap between what is needed here and what the shape inference interface needs.

W.R.T. having an interface that allows you to get a single dim of a single result, that was one of the things done in an earlier version of the patch. But Jacques pointed out that it is not general enough. Some ops might not be able to compute a single dim without doing all the work to compute the entire shape. In that case, if you do need multiple dims of the same result, then you have redundant (potentially heavy) computation that might not CSE. That makes sense to me. There are some API issues to work out here. Landing this in Linalg for now.

... I did not think about this. I had simply assumed that reifying multiple times ultimately is free. If it is not, as you suggest, then reifying single dimensions is also not possible. And, consequently, the API can not be wrapped as nicely. In particular, it would potentially materialize shape computations that are not needed and cannot be CSE'd. The same argument again.

Yes, so for cases where computing a single dim requires computing all the dims of the shape, it would seem more fragile (and potentially more expensive) to rely on CSE than having 0-1 tensor.concat (per result) or shape.from_extents, and relying on that and get_element_at_dim (forgot the name) to be able to compose. The objection was that one now has (potentially) a concat and multiple get-element calls which get inserted and removed later vs forwarding directly from the start (but again, that's an easy pattern that createOrFold could even do, and then it is just deleting the potentially optional concat, meaning at most 1 extra op here). And good point about wrapping, I didn't consider that end.

For Linalg, is this also true? E.g., are the output shape indices interrelated? If all of them are indeed computable per dim, then that changes things here, given the interface is Linalg-specific now.

It is true for some ops, not true for others. The major case is for structured ops: you could compute the shape indices for each element, but there are some efficiencies that can be exploited for structured ops if you compute the shape of all results at once. For other ops like PadTensorOp and InitTensorOp, etc., the different indices are computed completely independently. So the default here is to compute everything at once. Depending on use cases we can make things a bit more fine-grained.

This is fine for local iteration in Linalg; it can't be used by the shape inference pass, and I don't think the tradeoff of at most one op per result that needs to be inserted and deleted is really worth it (given how often this would be called - one reify call per bufferization?).

Going back to the implementation that updated InferShapedTypeOpInterface,
but dropping the use of OpFoldResult.

mravishankar retitled this revision from [mlir][Linalg] Add an interface in Linalg to infer shapes of Linalg Ops. to [mlir] Enhance InferShapedTypeOpInterface and move LinalgOps to use them..Mar 18 2021, 9:41 PM
mravishankar marked 6 inline comments as done and an inline comment as not done.Mar 18 2021, 9:47 PM
mravishankar added inline comments.
mlir/include/mlir/Interfaces/InferTypeOpInterface.td
131

I can update the doc, but it seems like it could be implemented for unranked too (which I am assuming is tensor<*xf32>?). Sorry, not sure if that is unranked, but happy to update the doc.

142

Ok, I changed it to use Value

mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
202

Probably nothing. I was hoping that it would return a previous constant operation of the same value. It doesn't. Don't see any harm in leaving it that way.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
1014

Using the "mixed mode" avoids the issue now.

I'm having a bit of trouble following the back and forth on this patch (Linalg-specific vs touching the global InferShapedTypeOpInterface). I left some specific comments. I don't have an opinion (and defer to those who do) on the "reify" and OpFoldResult vs Value points.

I feel like the description should be updated to more clearly describe where this is landing.

mlir/include/mlir/Interfaces/InferTypeOpInterface.td
131

I can update the doc, but it seems like it could be implemented for unranked too (which I am assuming is tensor<*xf32>?). Sorry, not sure if that is unranked, but happy to update the doc.

I think that since this is returning a vector of dims, it is kind of implied that it only applies to ranked cases. Typically if returning a "shape" as per the above/pre-existing methods, an unranked result type would/could return a tensor<?xindex> (or equiv) as its "shape" (and have some kind of whole tensor operations that calculate it in a way that does not pre-suppose knowledge of a rank). With the way this is defined, you lose that generality, and therefore, it is restricted to ranked only.

mlir/lib/Dialect/Linalg/IR/LinalgInterfaces.cpp
202

I would make it just create to avoid confusion of people reading and wondering if they are missing something (I paused and tried to figure out what this could possibly be doing for a constant - better if that cognitive overhead was removed).

(also for consistency with the rest of the patch)

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
680 ↗(On Diff #331758)

Should this pattern have a dedicated unit test (vs being tested incidentally via a linalg op)?

697 ↗(On Diff #331758)

I feel that this could use a comment describing when this cascade (first checking reifyReturnTypeShapes -> reifyReturnTypeShapesPerResultDim) falls through to the second case.

713 ↗(On Diff #331758)

Nit: Maybe call it reifiedReturnDims? One of the things that is confusing me about this patch is that historically, when we say "shape" we refer to a single SSA value that is either a !shape.shape type or a tensor<?xindex>, whereas these per dim variants are inconsistently referring to static lists of dims as "shapes".

Also: You define a differently typed temporary in the block above with the same name. Please don't shadow names like this.

719 ↗(On Diff #331758)

Is there a case where we can get here and the result does not have a rank?

mlir/lib/Dialect/StandardOps/CMakeLists.txt
18 ↗(On Diff #331758)

I think you can revert the changes to this file?

mlir/lib/Interfaces/InferTypeOpInterface.cpp
15 ↗(On Diff #331758)

NFC this cleanup separately?

jpienaar accepted this revision.Mar 19 2021, 10:16 AM

LG to me interface wise modulo some ergonomics and questions others have there. Thanks!

mlir/include/mlir/Interfaces/InferTypeOpInterface.h
19 ↗(On Diff #331758)

Is this needed?

mlir/include/mlir/Interfaces/InferTypeOpInterface.td
104

Just to check: so both may be overridden, but at least one has to be?

The ideal here would be that one can define one or both, but the caller can call either - e.g., callers need not know which one is implemented and folks can override either. Now, I don't know if we could use some (potentially) C++ template magic to verify this vs having a mutually recursive call in the non-overridden case. Do you have an idea?

mravishankar marked 9 inline comments as done.

Addressing comments.

mravishankar added inline comments.Mar 19 2021, 4:38 PM
mlir/include/mlir/Interfaces/InferTypeOpInterface.h
19 ↗(On Diff #331758)

Thanks. Using just Value, this isn't needed anymore.

mlir/include/mlir/Interfaces/InferTypeOpInterface.td
104

I think it's better if only one is overridden (changed the description to say so). I don't know of a C++ way of enforcing this.

Based on the comments below, it looks like the existing method is strictly more general than the one added in this patch. So the default implementation of the existing method could be to call the method added here. The part that is unclear to me is how to implement this fallback without adding a dependence on the tensor dialect. On a previous patch, adding a dependence of an interface on a dialect was flagged as an issue, particularly because it can cause circular dependencies in the build. If an op from the tensor dialect needs to use this interface, then it will depend on the library here, and this will in turn depend on the tensor dialect. I am happy to implement the fallbacks as needed.
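For illustration, that fallback could look roughly like this (a sketch only; it is exactly the tensor.from_elements step that drags in the tensor dialect dependency):

```cpp
// Default implementation of the pre-existing whole-shape method in terms of
// the per-dim method added in this patch: build one shape tensor per result
// from the individual dim values.
LogicalResult reifyReturnTypeShapes(Operation *op, OpBuilder &builder,
                                    SmallVectorImpl<Value> &reifiedShapes) {
  auto iface = cast<InferShapedTypeOpInterface>(op);
  SmallVector<SmallVector<Value>> dims;
  if (failed(iface.reifyReturnTypeShapesPerResultDim(builder, dims)))
    return failure();
  for (auto &resultDims : dims)
    reifiedShapes.push_back(
        builder.create<tensor::FromElementsOp>(op->getLoc(), resultDims));
  return success();
}
```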

131

Thanks for the explanation. This makes sense. I updated the description to state that this only supports the ranked case.

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
680 ↗(On Diff #331758)

Added a new op to the test dialect to test the canonicalization pattern outside of Linalg.

697 ↗(On Diff #331758)

This is what the comment for the function itself is describing. I moved the comment here, where it's probably more relevant.

719 ↗(On Diff #331758)

Probably not, but this is more of a sanity check. Ideally, I would like to verify that this doesn't happen for an op that does implement this method, but the verification would have to introduce operations into the IR, so I don't see a good way to verify. For now this canonicalization will just fail gracefully and not crash.

mlir/lib/Dialect/StandardOps/CMakeLists.txt
18 ↗(On Diff #331758)

Thanks!

mlir/lib/Interfaces/InferTypeOpInterface.cpp
15 ↗(On Diff #331758)

With the use of Value I don't need this anymore.

frgossen accepted this revision.Mar 29 2021, 3:13 AM
This revision was not accepted when it landed; it landed in state Needs Review.Mar 29 2021, 11:40 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.