This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Graduate op availability to OpBase.td
Changes PlannedPublic

Authored by antiagainst on Aug 10 2021, 10:43 AM.

Details

Summary

This commit starts graduating op availability mechanisms
from the SPIR-V dialect into the core.

Op availability was designed in the beginning to be
flexible and extensible, so to support dialects with
versioning/availability requirements. It was prototyped
in the SPIR-V dialect and has been proven to work well
thus far. The prototype was written to be as detached
from SPIR-V as possible, in order to make graduation
easy.

See the RFCs for op availability:
https://groups.google.com/a/tensorflow.org/g/mlir/c/UinSwMkz5Fg/m/n6eNM0NiAwAJ
https://llvm.discourse.group/t/graduate-op-versioning-mechanism-to-opbase-td/3507

Depends on D108312

Diff Detail

Event Timeline

antiagainst created this revision.Aug 10 2021, 10:43 AM
antiagainst requested review of this revision.Aug 10 2021, 10:43 AM

This seems like it is missing directed tests and documentation.

This seems like it is missing directed tests and documentation.

Yup. I added tests in a following up patch (https://reviews.llvm.org/D107851). Trying to make this patch mostly just shuffling code around so easier to review. I'll have another patch for the documentation. :)

chhe accepted this revision.Aug 17 2021, 10:01 AM
This revision is now accepted and ready to land.Aug 17 2021, 10:01 AM
jpienaar added inline comments.Aug 17 2021, 10:28 AM
mlir/include/mlir/IR/OpBase.td
2305

When would we have multiple? This seems like Availability is an interface and so could have been defined in interfaces instead of OpBase. Why isn't it?

Esp if you allow specifying an interfaceName that implies to me there isn't just one and it should probably not be in OpBase.

rriddle requested changes to this revision.Aug 17 2021, 10:35 AM

Can you move the docs into this revision? It is a little hard to understand how these components are supposed to connect together which is making some parts a little more difficult to review.

mlir/include/mlir/IR/OpBase.td
2305

Yeah, do we actually need to move any of this into OpBase? This would be a good time to start decentralizing this file.

As to Jacques point, why couldn't this be defined like a normal interface?

This revision now requires changes to proceed.Aug 17 2021, 10:35 AM

Address comments

antiagainst marked 2 inline comments as done.Aug 18 2021, 11:39 AM

Can you move the docs into this revision? It is a little hard to understand how these components are supposed to connect together which is making some parts a little more difficult to review.

I've added the doc in https://reviews.llvm.org/D108313. I can merge from this one up to D108313 if that's prefered. Let me know. :)

mlir/include/mlir/IR/OpBase.td
2305

Availability is a layer on top of op interface; it generates more stuff than just the op interface. The concrete implementation of those interface methods are automatically synthesized; if availability is attached to enum attributes, helper functions for querying them can also be generated. I've added the doc in https://reviews.llvm.org/D108313 so hopefully that will make things clearer.

When would we have multiple?

Actually MinVersionBase and MaxVersionBase are already two different interfaces. (We need to further customize them with a dialect-specific versioning scheme though given each dialect can have different versioning schemes.) In SPIR-V, I additionally have Extension and Capability. These are four different interfaces, each with its own way.

Can you move the docs into this revision? It is a little hard to understand how these components are supposed to connect together which is making some parts a little more difficult to review.

I've added the doc in https://reviews.llvm.org/D108313. I can merge from this one up to D108313 if that's prefered. Let me know. :)

Thanks! Can you roll that into this commit? I find the code easier for me to review alongside the docs.

Merge everything into one patch

Thanks! Can you roll that into this commit? I find the code easier for me to review alongside the docs.

Sure thing. Done now.

Add more docs

What would this look like without being a first class ODS concept?
I mean: this seems like just some sugar on top of interfaces, but how much sugar is there? Could this be less intrusive in ODS itself by exposing C++ helpers and reusing existing ODS concepts?

mlir/docs/OpDefinitions.md
950

so must does not seem grammatically correct to me here, can you reformulate?

956

I don't think "the _availability_ of ops" is all there is about: the characteristics of an op can evolve as well. An attribute can be added or removed, or some attributes values that were previously valid becomes invalid and vice versa.

(here despite the italic, I read "availability of ops" as an English sentence and not as the MLIR specific meaning of "availability").

959
966
1016

Can you add extensive comments in the TableGen code here describing what is each things' purpose and effects?
I can follow, but someone less MLIR savvy may be quickly lost.

1143

Also: the doc does not seem complete enough to me right now. It does not seem enough for figuring out how to develop a versioning strategy for a dialect, maybe mostly because other than describing some internals and the interface generated, it does not really tell me what do we do with the generated code then?

antiagainst marked 6 inline comments as done.

Address comments

antiagainst edited the summary of this revision. (Show Details)Aug 23 2021, 4:43 PM

What would this look like without being a first class ODS concept?
I mean: this seems like just some sugar on top of interfaces, but how much sugar is there? Could this be less intrusive in ODS itself by exposing C++ helpers and reusing existing ODS concepts?

It actually generates a lot more than just the op interface. There are basically there components generated: 1) C++ op interface, 2) concrete C++ impl methods for all ops for every interface, and 3) utility functions for enum attribute. 2) is the major part: each op can have different versioning/capability/extension requirements so each op will have a different impl of every op interface. Taking SPIR-V as an example, a large portion of ops have requirements wrt MinVersion/MaxVersion/Capability/Extension. That's a lot of code to implement (several thousands LOC). Defining it in ODS allows us to have a single source of truth and avoid all those boilerplate. Updating is also trivial given we can just read in the spec and automatically adjust everything. So no manual work here basically (other than running a shell command). :)

Also: the doc does not seem complete enough to me right now. It does not seem enough for figuring out how to develop a versioning strategy for a dialect, maybe mostly because other than describing some internals and the interface generated, it does not really tell me what do we do with the generated code then?

Good catch! Thanks for pointing it out! I tired to lay out the doc in a progressive way and focus on explaining general concepts and then how to use. The details of generated code actually are explained in the very last, if the reader is interested to define its own availability control. But I guess due to the large amount of details under the hood and some of them are necessary to understand the ideas, it can lose focus. So I refined many aspects and included a summary subsection to recap what's needed for using the versioning mechanism, which is also how one use whatever availability controls. Let me know if this is better. :)

mlir/docs/OpDefinitions.md
956

I see your points here. Revised to explain more w.r.t. it.

Refine a bit more

It actually generates a lot more than just the op interface. There are basically there components generated: 1) C++ op interface, 2) concrete C++ impl methods for all ops for every interface, and 3) utility functions for enum attribute. 2) is the major part: each op can have different versioning/capability/extension requirements so each op will have a different impl of every op interface. Taking SPIR-V as an example, a large portion of ops have requirements wrt MinVersion/MaxVersion/Capability/Extension. That's a lot of code to implement (several thousands LOC). Defining it in ODS allows us to have a single source of truth and avoid all those boilerplate.

It still isn't clear to me here: the concrete C++ could be delegated to other mechanism and still be defined in ODS. For example using derived attribute to expose the availability of an operation.
Yes what you're proposing is adding sugar, but I'd like to see the alternative design which would be using as much existing ODS feature as possible before we actually change ODS itself.

It still isn't clear to me here: the concrete C++ could be delegated to other mechanism and still be defined in ODS. For example using derived attribute to expose the availability of an operation.
Yes what you're proposing is adding sugar, but I'd like to see the alternative design which would be using as much existing ODS feature as possible before we actually change ODS itself.

Coming back to this. Thanks for the comment, Mehdi! Trying to understand your point and make sure I answer properly. Is the concern about introducing new ODS mechanisms?

My take on whether we should introduce mechanisms is depending on

  • how common it is across multiple dialects,
  • how much leverage we can have by introducing it.

What version/capability/etc. an op require is not as ubiquitous as op/attribute/etc., but they are common enough (esp. for edge dialects) to have its own modelling and definition. Having that improves consistency across dialects needing it and also brings benefits for better readability/understandability/knowledge transfer/etc.. For the second point, a separate availability concept really drives generating lots of boilerplate C++ code across multiple components (which I explained before), as we can just customize the C++ code generation anchoring on it. Using other mechanisms not designed for this purpose means we need to piggyback/shorehorn in various ways. It's not a clear representation and could be a maintenance burden for other mechanisms too.

Specifically I've checked derived attribute, but I'm not sure that's the suitable mechanism for supporting the availability for the following reasons:

  • Derived attribute itself is already one level of interface. Using it for availability will mean we are effectively creating "interfaces" inside interfaces. IMO that's not the proper representation for a common concept like availability.
  • Also with this nested level of "interfaces", we are trying to create structures inside the derived attribute itself: all ops will need to have the same exact named attribute and querying will first go through the derived attribute interface symbol DerivedAttributeOpInterface and then a strings like (isDerivedAttribute("min_version")/isDerivedAttribute("max_version")), instead of just one level QueryMinVersionInterface/QueryMaxVersionInterface symbol. I think one of the goals of ODS is to drive C++ code generation to reduce this stringish representations as they are hard to check/validate. So we use ODS to define ops instead of checking their string names in C++. One could argue it's similar here.
  • Derived attribute requires having DerivedAttr on the op itself, with some code body embedded. That can introduce a lot of duplication and make it hard for readability. For example, many SPIR-V ops have multiple availability dimensions. It would mean defining multiple extra DerivedAttr on it like
def SPV_SomeOp ... {
  DerivedTypeAttr min_version = DerivedTypeAttr<"::mlir::spirv::Version", "<calling into some separately generated C++ utility function to compute as we cannot inline here given it's too complicated>">,
  DerivedTypeAttr max_version = DerivedTypeAttr<"::mlir::spirv::Version", "...">
  DerivedTypeAttr capabilities = DerivedTypeAttr<"::llvm::SmallVector<::llvm::ArrayRef<::mlir::spirv::Capability>, 1>", "...">
  DerivedTypeAttr extensions = DerivedTypeAttr<"::llvm::SmallVector<::llvm::ArrayRef<::mlir::spirv::Extension>, 1>", "...">
}

I find it hard to compare with the simplicity and conciseness of the following:

let availability = [
  MinVersion<SPV_V_1_3>,
  MaxVersion<SPV_V_1_5>,
  Extension<[]>,
  Capability<[SPV_C_GroupNonUniformBallot]>
];

Note that that above differs at a per-op level, as different ops have different requirements on version/capability/extension/etc.

  • Derived attribute interface is for ops and it cannot be used on attributes. In SPIR-V it's fairly common to have enums values requiring different version/capability/etc. Being able to attach that directly on the attribute is better for representation and avoids lots of duplication (in all user ops).

I also checked other existing ODS mechanisms and I don't find other good hosts to handle the problem here. (But I might miss something so recommendations certainly welcome!) The goal is to generate all those boilerplate C++ code so that we have one concise single source of truth in ODS. I think even we can probably piggyback/shorehorn different components on different mechanisms, it looks a disintegrated solution that will cause confusion and maintenance burden: folks will have difficulty to understand how everything fits together and that means harder to debug/improve/etc.

What's introduced in ODS is really quite well isolated. Aside from the main Availability and the MinVersionBase/MaxVersionBase definitions, we just need one additional field on the op and the enum attribute. They defaults to none and it won't affect folks don't care about it at all.

Hopefully this addresses the concerns. Otherwise feel free to let me know where I should explain/improve more. Thanks. :)

. Is the concern about introducing new ODS mechanisms?

I would say it this way: the concern is about introducing ODS mechanisms instead of building on top of ODS and by composition of existing mechanisms. I believe that this can easily lead to bloating ODS (which is already a complex and not always consistent system), when we'd better modularize and make it possible to build on top of it.

There is also an aspect of maturity: I'm not sure how widespread the concept of "availability" is (you mention edge, we but don't even have a handful of examples right now) and it isn't clear to me that the requirements are clear enough to make it a feature that will work for everyone. It is easy to anchor to one example and provide something that does not generalize to what's needed.

def SPV_SomeOp ... {

DerivedTypeAttr min_version = DerivedTypeAttr<"::mlir::spirv::Version", "<calling into some separately generated C++ utility function to compute as we cannot inline here given it's too complicated>">,
DerivedTypeAttr max_version = DerivedTypeAttr<"::mlir::spirv::Version", "...">
DerivedTypeAttr capabilities = DerivedTypeAttr<"::llvm::SmallVector<::llvm::ArrayRef<::mlir::spirv::Capability>, 1>", "...">
DerivedTypeAttr extensions = DerivedTypeAttr<"::llvm::SmallVector<::llvm::ArrayRef<::mlir::spirv::Extension>, 1>", "...">

}
I find it hard to compare with the simplicity and conciseness of the following:

let availability = [

MinVersion<SPV_V_1_3>,
MaxVersion<SPV_V_1_5>,
Extension<[]>,
Capability<[SPV_C_GroupNonUniformBallot]>

];

Right, that's the kind of examples I'm after. So you showed how to do it inline with DerivedTypeAttr fully specified here. But can this be refactored?
I would expect that (without much change to ODS) we could get to a syntax that would look like:

def SPV_SomeOp : MinVersion<SPV_V_1_3>, MaxVersion<SPV_V_1_4> {
 ...
}

where MinVersion and MaxVersion would be parent classes defining the right derived attribute and hiding the C++ boilerplate.

Similarly for arguments, we have something to annotate them with "decorators" right now, for example the MemRead here:

let arguments = (ins 
    Arg<AnyMemRef, "the reference to load from", [MemRead]>:$base,

I would look into generic extension to such mechanism before adding anything first class for availability.

chhe added a comment.Sep 28 2021, 3:56 PM

Any update on this?

mravishankar resigned from this revision.Feb 8 2022, 8:23 PM
antiagainst planned changes to this revision.Feb 24 2023, 2:12 PM
mlir/include/mlir/Dialect/SPIRV/IR/CMakeLists.txt