Page MenuHomePhabricator

[mlir] Add initial support for a binary serialization format
ClosedPublic

Authored by rriddle on Aug 11 2022, 8:17 PM.

Details

Summary

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rriddle updated this revision to Diff 452942.Aug 16 2022, 4:04 AM
rriddle marked 33 inline comments as done.
rriddle added inline comments.Aug 16 2022, 4:04 AM
mlir/docs/BytecodeFormat.md
191

Originally I was thinking that you can't do that if you load lazily, but if I use your other suggestion of doing per-isolated region numbering it should be possible. This will be much nicer, thanks!

203

Great suggestion!

221

I don't think it would make lazy-loading bad, given that we could encode the last location at the start of the region. I'm going to drop this behavior though, given that I'm not sure how many cases in practice it would help. This would also free up this optional behavior for something else (if that something else was more efficient for real-world use cases).

252

Yeah? e.g. External functions have empty regions. Almost every operation that has a region that can optionally be filled use an empty region, given that regions can't be dynamically added after op construction. Cleaned up this section though, we don't need the leading code, we can just use the block count.

269

Nice! I figured you would come up with something better ;)

mlir/include/mlir/Bytecode/BytecodeWriter.h
36

There was some reason why I did this before, but I forget now. I just dropped it.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
149

I thought I did:

This method is marked noinline to avoid pessimizing the common case of single byte encoding.

Tried to make it more obvious.

173

Right now, no. I need to setup the proper little -> native conversions. I'm deferring that to a follow up because I have to setup a virtual machine for big endian (which is annoying/time consuming). I also need to fix some things in the textual format related to big endian as well.

381

We will likely need some form of special API that can parse just the "immutable" part (i.e. the "name" in the LLVM struct case). For example, if an attribute/type *is* recursive, we could encode both its immutable and mutable encodings in one entry (with some header that has the size of the immutable part or something). Something like:

RecursiveEntry {
  immutableEncodingSize: varint,
  immutableEncoding: ...,
  mutableEncoding: ...
}

During processing we could first process the immutable entry, and then immediately process the mutable one. That way any recursive references would resolve properly, and then we'd fix the final reference afterwards. Something like:

// Parse the immutable first, so that we have something to give recursive references.
if (!(result = parseImmutable()))
  return failure();
// Parse the mutable afterwards. Pass in `result` so that it can populate the mutable bits?
if (failed(parseMutable(result))
  return failure();

Until we figure any of this out though, I'm just going to have them always use the string fallback for those attributes and types.

755

Deferring this to a follow up to help simplify this patch, added a TODO for now.

mlir/lib/Bytecode/Writer/IRNumbering.cpp
61

We would need to encode things differently in that case, i.e. if the attributes are not in order of dialect, they would each need to have an associated dialect id encoded with them. In the case of lots of attributes/types, that would be significant. Maybe we could come up a hybrid model? i.e. encode the most common 128 attributes/types, so that they fit in one byte (or two), and then encode the rest using dialect grouping.

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156

Makes sense to me, just dropped it. We can add a warning back in if enough people trip up on this (given bytecode generation is an explicit decision).

jpienaar added inline comments.Aug 16 2022, 7:55 AM
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
616

So encoding would be start and end offsets into a string table?

mehdi_amini added inline comments.Aug 16 2022, 3:49 PM
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
616

String being null terminated, you don't necessarily need the end offsets. But if we have an offset section separate from the string table: we just need to point to an entry number, same mechanism as attr/type reference.

(partial scan before meeting)

mlir/docs/BytecodeFormat.md
12

I just noticed ï and not i , I mean I guess writing this by hand and the other was that we don't actually take text file as bytecode (MĻîŘ just to bikeshed :))

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
10

Nit: I'd move this lower, I expect documentation here not todos :)

227

Dialect of this Attribute or Type ? (its mostly parent that makes me think OOP more than I normally think here, up to you)

309

attribute ?

mehdi_amini added inline comments.Aug 16 2022, 4:06 PM
mlir/docs/BytecodeFormat.md
12

I don't think your bike shed is ASCII though?

rriddle updated this revision to Diff 453166.Aug 16 2022, 4:41 PM
rriddle marked 5 inline comments as done.Aug 16 2022, 4:44 PM
rriddle updated this revision to Diff 453172.Aug 16 2022, 5:09 PM
rriddle marked an inline comment as done.Aug 16 2022, 5:10 PM
jpienaar accepted this revision.Aug 16 2022, 9:08 PM

Missing negative tests are a bit unfortunate, but good to hear they are coming soon and this seems like good starting point.

mlir/docs/BytecodeFormat.md
12

Yeah I marked the comment as done before sending as I wasn't serious (and I did also use an extended ASCII encoding without realizing).

But in seriousness: MLÏR wouldn't be a valid dialect name, so this looks good.

mlir/lib/Bytecode/Encoding.h
26

uint8_t here too? (I mean with 0 it doesn't matter)

80

We can now use binary literals (not sure it makes if it is more readable, but was reminded of it)

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
243

Unsigned needed?

414

Comment?

502

So this parses the string starting at front of reader? And null-terminated?

616

Indeed, null-termination means we can't have substrings referenced (not sure if that is common here, could think for error strings, but unsure about decoding cost).

mlir/lib/Bytecode/Writer/IRNumbering.cpp
61

Would sorting attributes per frequency per dialect? (Keep dialect attributes still together but just sort dialects per frequency). We could measure all three of course, doesn't require version bump ;-)

This revision is now accepted and ready to land.Aug 16 2022, 9:08 PM
rriddle updated this revision to Diff 453383.Aug 17 2022, 11:46 AM
rriddle marked 12 inline comments as done.
rriddle added inline comments.Aug 17 2022, 11:47 AM
mlir/lib/Bytecode/Encoding.h
26

kVersion is encoded as a varint now, so that we don't have to change it if we have some burst of changing versions (makes it easier to change version if we don't have a cap looming overhead). I suppose I could switch the general constants to use inline constexpr variables now (given we are on C++17), let me know your preference.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
502

Yeah, the front of the reader has an index to a string defined in the string section. Updated the comment.

mlir/test/Bytecode/general.mlir
2

Do you plan on reviving that @mehdi_amini ?

rriddle updated this revision to Diff 453397.Aug 17 2022, 12:32 PM
mehdi_amini added inline comments.Aug 17 2022, 2:53 PM
mlir/test/Bytecode/general.mlir
2

Yeah I should. I had memory that we couldn't reach consensus on it but I may be wrong.

mehdi_amini accepted this revision.Aug 17 2022, 3:04 PM

LG, great start :)

jpienaar added inline comments.Aug 17 2022, 3:05 PM
mlir/test/Bytecode/general.mlir
2

I think only question was on on by default or not (e.g., how much of testing tool mlir-opt really is, should it be used in directed testing only etc)

rriddle updated this revision to Diff 453794.Aug 18 2022, 2:40 PM
rriddle marked 2 inline comments as done.
jpienaar accepted this revision.Aug 18 2022, 2:51 PM

Nice, like the encoding change.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58

Why this change?

mlir/lib/Bytecode/Writer/IRNumbering.cpp
81

Could this just be a static function here?

rriddle marked 2 inline comments as done.Aug 18 2022, 7:23 PM
rriddle added inline comments.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58

So that we can load in enums.

rriddle updated this revision to Diff 453864.Aug 18 2022, 7:23 PM
rriddle marked an inline comment as done.
vitalybuka added inline comments.Aug 22 2022, 9:53 AM
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901

also a problem, it emplace_back may relocate container, but the for loop above uses readState which is the ref to the element of container.

925

This pop_back and then readState.isIsolatedFromAbove which from the regionStack?

rriddle marked 2 inline comments as done.Aug 22 2022, 9:58 AM
rriddle added inline comments.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901

This should be fine, given that we always return in this case (i.e. never touch to invalid reference again).

925

Thanks for catching this. I'm not sure why my local asan build didn't catch this (I'll try nuking and resetting it).

vitalybuka added inline comments.Aug 22 2022, 10:03 AM
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901

Thanks, I see.

925

I'm not sure why my local asan build didn't catch this

Probably you don't use libc++ or instrumented libc++?

I'm not sure why my local asan build didn't catch this I'm not sure why my local asan build didn't catch this

If you can fix it quickly go for it.
If not, please let me know, I have a patch to revert it with related fixes.

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scare of out of bound mem access in alive code?

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scare of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

vitalybuka added a comment.EditedAug 22 2022, 11:15 AM

Looks like it's already fixed with 96fd3f2

rriddle marked 2 inline comments as done.Aug 22 2022, 11:19 AM

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scare of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

Sure, I don't know that code. So XFAIL is OK to me if you accept implications. (having that @rriddle failed to reproduce locally, maybe UNSUPPORTED instead, in case if some asan setups will miss the issues)
But seems revert/reland safe and easy to do as well.

Also if this is the only issue

Trivial fix may work?

bool isIsolatedFromAbove = readState.isIsolatedFromAbove;
 regionStack.pop_back();
 if (isIsolatedFromAbove)
   valueScopes.pop_back();

Yeah, sorry I've been in meetings. I pushed https://github.com/llvm/llvm-project/commit/96fd3f2d5be21ded6ffed0ac75195df04ec679df an hour ago and have been watching the bot to see if that is the only issue.

Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":

XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir

(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":

XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir

(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

As of commit df4e637ca7ef4ef17b662845120864921e65bb67 the build bot is green again on s390x. Thanks!

RVP added a subscriber: RVP.Thu, Sep 29, 8:29 AM
RVP added inline comments.
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185

Is this parenthesized correctly?

jpienaar added inline comments.Thu, Sep 29, 8:34 AM
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185

This is checking if the value post shift is 0 (and relies on this function being called only when multi byte), what issue did you run into with this?

RVP added inline comments.Thu, Sep 29, 8:38 AM
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185

Shouldn't LLVM_LIKELY be around the whole condition instead of the shift expression? Isn't == 0 the likely case and not the shift result being non-zero?

RVP added inline comments.Thu, Sep 29, 8:53 AM
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185

I didn't see any issues. Was looking at the code and this question popped. I now saw that emitVarInt specially handles the common case (... >> 7) == 0. Maybe a comment here as well would have avoided the question. Thanks.