This is an archive of the discontinued LLVM Phabricator instance.

Expose callbacks for encoding of types/attributes
ClosedPublic

Authored by mfrancio on Jun 20 2023, 3:33 PM.

Details

Summary

[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.

Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.

Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.

Diff Detail

Event Timeline

mfrancio created this revision.Jun 20 2023, 3:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2023, 3:33 PM
mfrancio requested review of this revision.Jun 20 2023, 3:33 PM
rriddle requested changes to this revision.Jun 20 2023, 3:37 PM

I'm not sure I understand the rationale here. This looks like a very awkward side band, and quite invasive. I don't see why we'd want to open up the bytecode for arbitrary encodings, this should be driven solely by the dialect itself.

This revision now requires changes to proceed.Jun 20 2023, 3:37 PM
mehdi_amini added inline comments.Jun 20 2023, 3:43 PM
mlir/include/mlir/Bytecode/BytecodeWriter.h
63

Seems like not natural to me as an API to anchor this to the dialect, that seems a bit arbitrary to me.

When we talked about it I was thinking something more flat and simpler:

void addOverrideCallback(std::function<bool(Type)> callback) {
}
Smallvector<std::function<bool(Type)>> typeOverrideCallbacks;

And the writer would do:

// First try to process the given type with the provided override
for (auto &callback : typeOverrideCallbacks)
  if (callback(type)) return

// continue with normal emission
mehdi_amini added inline comments.Jun 20 2023, 3:58 PM
mlir/include/mlir/Bytecode/BytecodeWriter.h
63

I guess what I wrote is not enough: there is more than the exact encoding, there may be some remapping to another dialect as well.

I'm not sure I understand the rationale here. This looks like a very awkward side band, and quite invasive. I don't see why we'd want to open up the bytecode for arbitrary encodings, this should be driven solely by the dialect itself.

Thanks for your feedback. I completely agree in the principle: a dialect should be the only driver of the encoding. However, this comes with a limitation - if a versioned client dialect wants to control its own encoding, there is no way to do it currently without re-defining and reimplementing all the types and attributes. The patch tries to address this problem by exposing a callback, which offers clients the chance to decouple the encoding of types and attributes that are defined as part of the upstream dialects from their upstream encoding, so that a versioned client dialect that uses unversioned upstream types/attributes can maintain forward/backward compatibility independently from the upstream development.

mlir/include/mlir/Bytecode/BytecodeWriter.h
63

I found it as a compelling and explicit way to override Type and Attributes printer/parser of any dialect with a specific encoding. But agreed, it is arbitrary. I can remove this anchor and simplify it a little bit.

mfrancio updated this revision to Diff 533112.Jun 20 2023, 8:33 PM
mfrancio edited the summary of this revision. (Show Details)

Remove ability to specify the dialect associated with each callback.

mfrancio added inline comments.Jun 20 2023, 8:35 PM
mlir/include/mlir/Bytecode/BytecodeWriter.h
63

I think what implemented should be enough to be able to override the encoding of a type/attribute if its definition jumps from one dialect to another - this is true as long as the encoding is always owned by the callback, before and after the jump.

mfrancio updated this revision to Diff 538873.Jul 10 2023, 5:00 PM
mfrancio edited the summary of this revision. (Show Details)

Adds a test exercising roundtrip to bytecode with a custom encoding of IntegerType.

mfrancio added inline comments.Jul 10 2023, 5:07 PM
mlir/test/lib/IR/TestBytecodeCallbacks.cpp
90

@mehdi_amini this gives an idea of what we would do on parsing - we would get the dialect version to parse from the version map. Right now though there is no system to specify a version on writing, and this is the closest I could go :).

We could post-fix this once the proper API exists if you are ok in leaving the TODO, or I can propose an implementation and finalize the work.

mehdi_amini added inline comments.Jul 10 2023, 8:02 PM
mlir/test/lib/IR/TestBytecodeCallbacks.cpp
51

Can you write Type entryValue and drop the if constexpr in the body?

65

I'd be interested to see an example where you actually take a !test.i32 as input and write it as a builtin IntegerType (and show that we can parse it as such), and vice-versa.

90

LG

mfrancio updated this revision to Diff 539292.Jul 11 2023, 2:16 PM

Adds bytecode roundtrip tests with custom integer types.

mfrancio marked 3 inline comments as done.Jul 11 2023, 2:22 PM
mfrancio added inline comments.
mlir/test/lib/IR/TestBytecodeCallbacks.cpp
51

No, the callback needs to compile for both Type entryValue and Attribute entryValue. We could do this with two separate callbacks if you feel it's a bit cumbersome to force the use of auto in the callback signature.

65

Test added.

Note that it is not possible to parse !test.i32 as a native builtin integer type (hence, without adding a specific callback for it, or without a custom parser that falls back to the builtin integer type parser, which I did not implement for the sake of this test) because the the "owner" of such encoding is still the test dialect.

mfrancio updated this revision to Diff 539294.Jul 11 2023, 2:24 PM
mfrancio marked 2 inline comments as done.

clang-format commit.

mfrancio updated this revision to Diff 539301.Jul 11 2023, 2:52 PM
mehdi_amini added inline comments.Jul 11 2023, 3:21 PM
mlir/test/lib/IR/TestBytecodeCallbacks.cpp
51

What about taking a union of type/attr instead of templating this?
Right now this adds some cognitive complexity that does not seems necessarily justified to me.
Alternatively we could indeed split it in two callback registration, that can be fine as well.

65

I understand we need a callback, but can't you implement the callback in this file to show we can parse !test.i32 as a native builtin integer type?

mfrancio updated this revision to Diff 539379.Jul 11 2023, 10:19 PM
  • split callbacks between type and attributes
  • expose builtin bytecode dialect interface
  • parse explicitly with bytecode dialect interface when testing interoperability with bytecode types/attributes within a callback
mfrancio marked an inline comment as done.Jul 11 2023, 10:22 PM
mfrancio added inline comments.
mlir/test/lib/IR/TestBytecodeCallbacks.cpp
51

Switched to two callbacks - it is a bit more code, but the function signature is cleaner and more explicit.

65

I did export the integer type parser code within the callback - but effectively was not very clear. Right now I exported the bytecode dialect interface and used this explicitly to write/read, which helps showcasing the feature.

mehdi_amini added inline comments.Jul 12 2023, 1:14 AM
mlir/include/mlir/IR/AsmState.h
83

I would think we should distinguish between "no handling" and "error during parsing" here, that is the API should be able to fail the parsing.

mfrancio updated this revision to Diff 539639.Jul 12 2023, 10:54 AM
mfrancio marked an inline comment as done.
  • Handle failure in read callback API
  • Add test showcasing the feature
mfrancio marked an inline comment as done.Jul 12 2023, 10:55 AM
mfrancio added inline comments.
mlir/include/mlir/IR/AsmState.h
83

Thanks for pointing this out, it is indeed a very useful feature. I changed the API and added a test for it.

burmako added inline comments.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
1234

How do you envision the contract between customized serialization and deserialization? E.g. how does a consumer of bytecode payload know that a specific payload was generated via serializer callbacks and how do they know where to obtain the corresponding deserializer callbacks?

mfrancio marked an inline comment as done.Jul 12 2023, 12:48 PM
mfrancio added inline comments.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
1234

Since the callbacks are driven by the client, it's up to the client to decide. Assuming you are serializing a module that contains a versioned dialect, I would envision handling such scenario through the dialect version.

I don't think upstream dialects should use those callbacks in any way.

mfrancio updated this revision to Diff 540749.Jul 15 2023, 4:36 PM

Simplifies signature of the read callback by leveraging the dialect reader to retrieve context and dialect versions.

(forgot to add feedback earlier)

mlir/include/mlir/Bytecode/BytecodeWriter.h
57

Nit: let's keep these called writer rather than printer (could be emitter too as that matches some of the comments).

mlir/include/mlir/IR/AsmState.h
28

Let's keep these sorted

67

I don't think the _ naming convention is used anywhere else here, of you want to signal internal it could be in private section.

73

emitting -> parsing (or reading or ingesting)

580

So these are just flat arrays rather than grouped by (say) type it handles?

mlir/include/mlir/IR/BuiltinDialectBytecode.h
22 ↗(On Diff #540749)

I think we should just make this now builtin::detail

24 ↗(On Diff #540749)

I was semi in between exposing the interface vs just helper methods generated and then the add method below (the add method doesn't add much given full interface here). But I was thinking of different composition then.

47 ↗(On Diff #540749)

Don't know Mehdi s opinion, but I'd probably put this just in builtin namespace along with dialect. This is rather "public" level API for me.

mehdi_amini added inline comments.Jul 15 2023, 7:09 PM
mlir/include/mlir/IR/AsmState.h
579

Nit: can you make the return type explicit (ArrayRef<...>) here?

I don't think the auto return is widely used in the codebase, and it goes a bit against the "use auto only when the type is obvious from the context" practice.

mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

I agree: in general "detail" namespace aren't meant to contain things to be directly used by clients.
But also stepping back: I'm not convinced this entire file should be publicly exposed. We should be able to have a single entry point in MLIR for writing a type or an attribute.

mlir/lib/IR/BuiltinDialectBytecode.cpp
87 ↗(On Diff #540749)

I believe that in general we prefer using namespace in implementation files, and fully qualify function definition?

mfrancio updated this revision to Diff 540855.Jul 16 2023, 9:50 PM
  • Makes BuiltinDialectBytecodeInterface private
  • Exposes write/read bytecode functions for types and attributes using the builtin encoding
  • Addresses few nits and comments
mfrancio marked 11 inline comments as done.Jul 16 2023, 10:00 PM
mfrancio added inline comments.
mlir/include/mlir/IR/AsmState.h
580

Well, the idea is that you would encode a function that handles the abstract type and not the concrete, then handling the concrete types using a type switch within the callback itself as needed.

I feel passing a callback per concrete type would be more cumbersome to use?

mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

I avoided exposing the interface while adding an entry point to read/write types and attributes using the builtin encoding.

I used the builtin namespace even though I found out it wasn't used for the builtin dialect - it seems that this dialect lives directly at the mlir level.

However, I have the impression that read/write bytecode functions would be best under the builtin namespace to be explicit about the underlying encoding.

mehdi_amini added inline comments.Jul 16 2023, 10:21 PM
mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

I really meant that we should be able to call « writeAttribute » without knowing the dialect to target: I don’t quite get what is special about the built in dialect here?

mfrancio marked an inline comment as done.Jul 17 2023, 8:17 AM
mfrancio added inline comments.
mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

I got what you mean and yes, write functions for attribute and type could be made dialect agnostic - what made me stop is that it's now clear how that would extend to read functions, since the encoding does not contain dialect info, and there is nothing that prevents different dialects to use the same encoding for different things. I found this incongruence a bit confusing, to a point that it seemed suggesting that it would not fit in the current design?

For context, see for example Builtin and Quantization (those functions are generated through tablegen) - both dialect encode type 1 and there is no way to disambiguate in a "dialect agnostic" fashion.

Builtin:

static Type readType(MLIRContext* context, DialectBytecodeReader &reader) {
  uint64_t kind;
  if (failed(reader.readVarInt(kind)))
    return Type();
  switch (kind) {
    case 0:
      return readIntegerType(context, reader);
    case 1:
      return readIndexType(context, reader);

Quantization:

static Type readType(MLIRContext* context, DialectBytecodeReader &reader) {
  uint64_t kind;
  if (failed(reader.readVarInt(kind)))
    return Type();
  switch (kind) {
    case 1:
      return readAnyQuantizedType(context, reader);
    case 2:
      return readAnyQuantizedTypeWithExpressedType(context, reader);

Furthermore, it is reasonable to expect to have more of such conflicts as the MLIR bytecode gains popularity.

mehdi_amini added inline comments.Jul 17 2023, 11:52 AM
mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

OK I get it: it is similar to what we do for print/parse textual assembly.

The convention there is that every type/attribute class expose a print/parse method, can we align on the same model for bytecode? We wouldn't need to emit the discriminant for which attribute it is when we know which one to emit!
(similar to textual ASM)

mfrancio updated this revision to Diff 541718.Jul 18 2023, 1:52 PM

Removes readAttribute() and readType() APIs that take a version as argument since the dialect version is available through the dialect reader.

mfrancio added inline comments.Jul 18 2023, 1:55 PM
mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

A non-breaking exension of the ASM approach to bytecode does not seem straightforward because of our current bytecode serialization approach.

To summarize the textual ASM approach: builtin types/attributes have known tokens that can be used to detect if a type or an attribute is owned by builtin. If it is not, textual ASM uses a special token (!), which triggers emission of an extended type. At parsing, the extended type token is processed first and this triggers the use of the specific dialect parser.

Now porting this approach into bytecode would mean changing how types or attributes are encoded. Types and attributes are grouped together when emitted to bytecode. At parsing, we parse first the dialect owning the group, which will determine the encoding (this is done per-group and not per-attribute, as opposed to the ASM case).

We could implement an API that works similarly to textual ASM, but unless i am missing some details, integrating it into the existing bytecode format would require some additional work if we want to maintain backwards compatibility, which is probably beyond the scope of the current patch. We could attempt doing this in the future and I would be happy to take the work.

mehdi_amini added inline comments.Jul 18 2023, 2:49 PM
mlir/include/mlir/IR/BuiltinDialectBytecode.h
47 ↗(On Diff #540749)

I agree with you, the encoding difference makes it harder here. Seems reasonable.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
315

Formatting only?

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
815

Can you extract the above in a lambda "emitAttrOrTypeImpl(..)": you could do a lot of early return which would simplify the control flow and reduce the indentation.
The hasCustomEncoding boolean should disappear basically.

mfrancio updated this revision to Diff 542185.Jul 19 2023, 2:12 PM

Extend callbacks to allow remapping from one dialect group to another when writing/reading bytecode.

mfrancio marked an inline comment as done.Jul 19 2023, 2:12 PM
mfrancio updated this revision to Diff 542212.Jul 19 2023, 3:01 PM

Refactors emitAttrOrType lambda to simplify control flow.

mfrancio marked an inline comment as done.Jul 19 2023, 3:01 PM
mfrancio marked 7 inline comments as done.Jul 19 2023, 3:38 PM
mfrancio updated this revision to Diff 542263.Jul 19 2023, 5:43 PM
mfrancio updated this revision to Diff 544481.Jul 26 2023, 12:58 PM

improve few comments

rriddle added inline comments.Jul 26 2023, 1:54 PM
mlir/include/mlir/Bytecode/BytecodeWriter.h
56

Can you ArrayRef here and below instead?

mlir/include/mlir/IR/AsmState.h
41–44

It feels quite weird to have something bytecode related not in Bytecode/. Why does this need to be here instead of hooked into the BytecodeWriterConfig?

mlir/lib/IR/BuiltinDialectBytecode.cpp
115–130 ↗(On Diff #542263)

Why do we need to expose these at all? As opposed to casting the builtin dialect to DialectBytecodeInterface, and going through the normal path?

mfrancio marked an inline comment as done.Jul 26 2023, 2:35 PM
mfrancio added inline comments.
mlir/include/mlir/Bytecode/BytecodeWriter.h
56

Absolutely, I'll push a revision.

mlir/include/mlir/IR/AsmState.h
41–44

I agree and I tried to search for a better location. The issue is that we currently have a unified entry point for the parser (ParserConfig) which works for both text and bytecode. Unless we want to refactor this, at least the reader side of this class need to stay in this header. Since the same logic is used for AsmResourcePrinter (defined here, but used in bytecode writer config), I thought this could work?

mlir/lib/IR/BuiltinDialectBytecode.cpp
115–130 ↗(On Diff #542263)

That would require exposing the interface for builtin, which I had originally done in a previous version of the patch :). However, after few rounds of revisions the consensus was that it could have been better to leave the interface as an internal implementation detail and expose single hooks. In previous revisions, I was also asked to try to expose top level entry point for writing type/attributes in a dialect agnostic fashion, similarly to what the textual parser does, but we realized that it was not going to fit in the current design. Hence, exposing those hooks was the closest I could go to that idea, which preserves the original intent of @jpienaar (having the bytecode dialect interface as an internal implementation detail and not exposed).

Those functions here are only needed for the tests, but it is not unreasonable to expect clients to use them. Let me know if you are strongly against leaving as is, and I can revise as necessary.

rriddle added inline comments.Jul 26 2023, 4:01 PM
mlir/include/mlir/IR/AsmState.h
41–44

Hmrmrm, can we split out all of the bytecode config into a BytecodeReaderConfig and have that in the parser config? Would be good to keep all of the bytecode pieces isolated.

mlir/lib/IR/BuiltinDialectBytecode.cpp
115–130 ↗(On Diff #542263)

DialectBytecodeInterface is an already exposed api, you can just do cast<DialectBytecodeInterface>(type.getDialect()) to get the virtual instance. I don't see which part of that requires exposing anything from the builtin dialect?

mfrancio marked 2 inline comments as done.Jul 26 2023, 4:48 PM
mfrancio added inline comments.
mlir/include/mlir/IR/AsmState.h
41–44

Sure, it seems like a good change. I'll do that and update the patch. Thanks for the feedback.

mlir/lib/IR/BuiltinDialectBytecode.cpp
115–130 ↗(On Diff #542263)

I apologize for the confusion - it seems casting directly is not allowed, but I was able to retrieve the virtual instance through getRegisteredInterface<BytecodeDialectInterface>()! For some reason I had assumed at first that type id of the interface was going to be different. Thanks for pointing it out.

mfrancio updated this revision to Diff 544596.Jul 26 2023, 9:49 PM
mfrancio marked an inline comment as done.
  • Move bytecode related code into Bytecode/. with appropriate header
  • Make naming convention uniform (parse<->read, print<->write)
  • Rebase
mfrancio marked an inline comment as done.Jul 26 2023, 9:51 PM
mfrancio added inline comments.
mlir/include/mlir/IR/AsmState.h
41–44

I created a new header specific for the bytecode reader config to avoid a circular dependency (AsmState <-> BytecodeReader).

rriddle accepted this revision.Jul 26 2023, 10:30 PM

Nice

mlir/include/mlir/IR/AsmState.h
578

Could we have just an accessor for the BytecodeReaderConfig? Would help really limit the bytecode parts of this API.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
315

Was the formatting off here before?

This revision is now accepted and ready to land.Jul 26 2023, 10:30 PM
mfrancio updated this revision to Diff 544817.Jul 27 2023, 9:26 AM

Expose the bytecode reader config directly to the parser config.

mfrancio marked 2 inline comments as done.Jul 27 2023, 9:28 AM
mfrancio added inline comments.
mlir/include/mlir/IR/AsmState.h
578

done! Thanks for the review.

mfrancio updated this revision to Diff 544832.Jul 27 2023, 10:08 AM
mfrancio marked an inline comment as done.

Rebase

This revision was automatically updated to reflect the committed changes.