This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
mlir/
-
docs/
53/53
BytecodeFormat.md
-
include/mlir/
-
mlir/
-
Bytecode/
-
BytecodeReader.h
2/2
BytecodeWriter.h
-
IR/
-
OperationSupport.h
-
Tools/mlir-opt/
-
mlir-opt/
-
MlirOptMain.h
-
lib/
-
Bytecode/
-
CMakeLists.txt
5/5
Encoding.h
-
Reader/
36/38
BytecodeReader.cpp
-
CMakeLists.txt
-
Writer/
4
BytecodeWriter.cpp
-
CMakeLists.txt
-
IRNumbering.h
4/4
IRNumbering.cpp
-
CMakeLists.txt
-
Parser/
-
CMakeLists.txt
-
Parser.cpp
-
Tools/mlir-opt/
-
mlir-opt/
-
CMakeLists.txt
3/3
MlirOptMain.cpp
-
test/Bytecode/
-
Bytecode/
5/7
general.mlir

Differential D131747

[mlir] Add initial support for a binary serialization format
ClosedPublic

Authored by rriddle on Aug 11 2022, 8:17 PM.

Details

Reviewers

mehdi_amini
jpienaar
nicolasvasilache

Commits

rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format

Summary

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which is used
heavily in nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups;
the intention of this patch is to provide something that supports the
basic cases and can be built upon.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rriddle created this revision. Aug 11 2022, 8:17 PM

Herald added a project: Restricted Project. Aug 11 2022, 8:17 PM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 19 others.

rriddle requested review of this revision. Aug 11 2022, 8:17 PM

Herald added a project: Restricted Project. Aug 11 2022, 8:17 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rriddle added reviewers: mehdi_amini, jpienaar. Aug 11 2022, 8:18 PM

I don't have any failure tests right now because those are annoying to make/update, and I want to make sure we agree on various aspects before doing that.

Harbormaster completed remote builds in B180841: Diff 452063. Aug 11 2022, 8:36 PM

Nice, just started with doc for now.

I was wondering whether we should have mlir-translate do the text-to-binary conversion and vice versa, but I can see the convenience when stitching together passes, and it's not an external format.

mlir/docs/BytecodeFormat.md
36	OOC why not use it?
69	What is used instead?
121	Nit: [] instead of *? (I see the relational database or pointer notations of the latter but I find the former more intuitive)
128	Link to encoding section?
209	And 7 of 8 optional elements used?
214	So location is either previous or new? Unspecified doesn't mean unknown but previous. That's nice. The only other common case I could think of is successive lines in a file but that would take a bit.
247	So we could have a region with 0 blocks here that is considered not empty?

rriddle added inline comments. Aug 12 2022, 12:40 AM

mlir/docs/BytecodeFormat.md
36	The encoding described here, or more commonly referred to as PrefixVarInt, is essentially a variant of LEB. Encoding and decoding are generally faster using the prefix strategy, though. The decode for the prefix variant, for example, is effectively branchless given that you just need to count the trailing zeros, which is an intrinsic on most modern hardware. I've benchmarked a bunch of different strategies and implementations over the past week, using both a corpus of .mlir taken from various projects and just using random distributions of integers. This is what emerged from that testing.
247	Zero block regions should be marked as empty. I used codes for various "bool" like things just to make it easier to change things later on. E.g. if we wanted to change the way regions are encoded we could just have a new "kRegion2" code. Though that only matters after we start versioning things.
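For readers following the PrefixVarInt discussion, here is a minimal decoding sketch. It assumes the length prefix sits in the low bits of the first byte (n zero bits followed by a one bit means n additional bytes) and that the payload is little-endian; that layout is an assumption of this sketch, since the thread only names the encoding. `decodePrefixVarInt` is a hypothetical name, and the portable loop stands in for the count-trailing-zeros intrinsic mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes one prefix varint starting at data[pos].
// On success, stores the value in `result`, advances `pos`, and returns true.
bool decodePrefixVarInt(const std::vector<uint8_t> &data, size_t &pos,
                        uint64_t &result) {
  if (pos >= data.size())
    return false;
  uint8_t first = data[pos];

  // The number of trailing zero bits in the first byte is the number of
  // additional bytes. A production decoder would use a count-trailing-zeros
  // intrinsic here; the loop keeps this sketch portable.
  unsigned extraBytes = 0;
  while (extraBytes < 8 && ((first >> extraBytes) & 1) == 0)
    ++extraBytes;
  assert(extraBytes <= 8 && "at most eight zero prefix bits");
  if (data.size() - pos < extraBytes + 1)
    return false; // Truncated input.

  if (extraBytes == 8) {
    // Nine-byte form: the first byte is all prefix, and the value follows
    // as eight little-endian bytes.
    result = 0;
    for (unsigned i = 0; i < 8; ++i)
      result |= uint64_t(data[pos + 1 + i]) << (8 * i);
  } else {
    // Assemble the little-endian bytes, then shift out the prefix bits.
    uint64_t raw = 0;
    for (unsigned i = 0; i <= extraBytes; ++i)
      raw |= uint64_t(data[pos + i]) << (8 * i);
    result = raw >> (extraBytes + 1);
  }
  pos += extraBytes + 1;
  return true;
}
```

Under these assumptions the single byte 0x03 decodes to 1, and the byte pair 0xB2 0x04 decodes to 300.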

Just commenting on the doc right now.

mlir/docs/BytecodeFormat.md
26	No fixed64 ? Is the bet that we'd always use VBR for this?
36	Are you using PrefixVarInt or how does it differ? Is your variant format documented somewhere in the literature? I'd rather have us stick to something existing, I doubt that we'll invent a revolutionary trick here somehow. Can you position what you're proposing against varint-G8IU, PFOR, SIMD-BP12, ... ?
120	I'm not sure here why these varint are useful?
121	+1 for [] :)
142	What is `kAsmForm` ? Seems to refer to an enum not documented here.
148	I'm not sure yet how we dispatch to a dialect for loading a type/attribute
154	Basically this is differential encoding of the offset table, but that means you need to decompress the table here IIUC. Accessing the last attribute requires to read the entire table. (I assume we would do it once and for-all in the "BitcodeReaderContext" or whatever you're naming it)
168	That seems overly conservative to me: what about just storing the part after the dot and add a varint for the dialect ID?
191	I assume this field is needed to allow some level of lazy loading / random access? I'm not sure yet...
209	Actually 6? `firstResultIndex` and `numResults` seem coupled right?
248	Do we need this indirection here? You didn't provide one for the operation content.
265	block_element ?
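The point about differential encoding of the offset table (comment on line 154) can be illustrated with a running sum: absolute positions only exist after walking the table once. The sketch below is an illustration under assumptions, not the reader's actual code. `EntryRange` and `resolveOffsets` are hypothetical names, the table is assumed to hold one size per entry, and the bounds check is an added safeguard against malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record: where one attribute/type entry lives in its section.
struct EntryRange {
  size_t offset;
  size_t size;
};

// Converts per-entry sizes into absolute ranges with a single pass.
// Returns std::nullopt if an entry would run past the end of the section.
std::optional<std::vector<EntryRange>>
resolveOffsets(const std::vector<uint64_t> &entrySizes, size_t sectionSize) {
  std::vector<EntryRange> ranges;
  ranges.reserve(entrySizes.size());
  size_t currentOffset = 0;
  for (uint64_t size : entrySizes) {
    // Invariant: currentOffset <= sectionSize, so the subtraction is safe.
    assert(currentOffset <= sectionSize);
    if (size > sectionSize - currentOffset)
      return std::nullopt;
    ranges.push_back({currentOffset, static_cast<size_t>(size)});
    currentOffset += static_cast<size_t>(size);
  }
  return ranges;
}
```

Once the ranges are computed, any entry can be located directly, which is what makes later lazy decoding cheap.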

mehdi_amini added inline comments. Aug 12 2022, 2:39 PM

mlir/docs/BytecodeFormat.md
203	Having the regions in-line makes it hard/impossible to lazily load IR I think? (at least not without decoding the entire IR section).

rriddle updated this revision to Diff 452346. Aug 12 2022, 6:30 PM

rriddle marked 12 inline comments as done.

rriddle added inline comments. Aug 12 2022, 6:31 PM

mlir/docs/BytecodeFormat.md
26	I just didn't add it in. Technically right now we only use byte, so I dropped the rest and added a TODO to add larger widths as necessary.
36	Yeah, it's just PrefixVarInt. I completely missed explicitly saying that when rushing out the doc.
120	These are used to know how many attributes/types/operation names need to be parsed.
121	SG, I also like [].
148	We chatted a little about this offline. Attributes and Types are grouped by dialect, with each grouping emitted in the same order as the dialects in the dialect section. This allows for us to know which dialect an attribute/type belongs to based on its index (i.e. we could know attributes 0-5 are the builtin dialect, 6-9 are the func dialect and so on).
154	Yeah. We do a single pass to initialize the bytecode structure of all attributes/types, which indicates where the data is stored in the bytecode. Reading lazily after that is trivial, because we just read the previously computed data directly.
168	Agreed, I need to look into this. I went with the current thing because it was easier to bootstrap (e.g. I don't have to worry about the difference between builtin and non-builtin).
191	This field is the value index of the first result. Every value gets a number, which is what gets referred to in the `operands` list. For multiple results, we know the value number are consecutive, so we just need to know the first one.
203	Yeah. The idea I have right now is that the op encoding mask will indicate if the regions are inline or out-of-line. That way we just dispatch to two different code paths depending on the encoding.
209	Yeah, right now the mask uses 6 out of 8 possible bits. Whenever we do lazy loading that may bump up to 7 bits (we could get around that by encoding the "are the regions lazy" bit a different way). We could of course change from a mask to a set of values for each possible encoding type.
214	Yeah, the file locations encoding is gonna be interesting. I'm hoping that when builtin attributes have custom encodings it ends up being mostly okay (should just be a string index+two small varints for the line/col). This is something we will likely want to play around with, but thankfully it's easy to test (I have a huge IR file that inherits locations from the file).
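The dialect grouping described in the reply on line 148 implies that an attribute or type index alone identifies its dialect. A minimal sketch of that lookup, assuming the reader knows how many entries each dialect contributed (the function name and the count vector are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position (in dialect-section order) of the dialect that owns
// entry `index`, or -1 if the index is out of range.
int findOwningDialect(const std::vector<size_t> &countsPerDialect,
                      size_t index) {
  size_t firstIndex = 0; // Index of the first entry of the current dialect.
  for (size_t d = 0; d < countsPerDialect.size(); ++d) {
    // Entries of dialect d occupy [firstIndex, firstIndex + count).
    assert(index >= firstIndex);
    if (index - firstIndex < countsPerDialect[d])
      return static_cast<int>(d);
    firstIndex += countsPerDialect[d];
  }
  return -1;
}
```

With counts {6, 4}, matching the example in the reply (attributes 0-5 in the first dialect, 6-9 in the second), index 5 maps to dialect 0 and index 6 maps to dialect 1.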

Harbormaster completed remote builds in B181030: Diff 452346. Aug 12 2022, 7:00 PM

jpienaar added inline comments. Aug 12 2022, 7:13 PM

mlir/include/mlir/Bytecode/BytecodeWriter.h
36	This feels a little bit weird ... I'd almost expect like OpPrintingFlags having the config be separate from Operation* being printed. But perhaps it makes more sense below.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
21	mlir-bytecode-reader ?
120	Nit: I'm more used to null-terminated, with NUL being the character name.
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	I think yes & no. Piping these will be common and other uses seem like a mistake, but I don't know how foolproof this check is on all platforms, and the opt tool is not a user tool.
mlir/test/Bytecode/general.mlir
1	We will probably end up running all non-split or -error cases through a round-trip test to check, followed by fuzzing. It would almost seem possible to enumerate all of these kinds of constructs above.

mehdi_amini added inline comments. Aug 13 2022, 2:59 AM

mlir/docs/BytecodeFormat.md
101	Producer string?
120	Actually, following our offline discussion, they are needed to be able to match a type/attr back to a dialect, right?
132	Worth mentioning: the first section can't be decoded without the second one as the elements in the array of attrs/types don't include a size.
158	Maybe make it explicit: `, this allows to associate an attribute back to a dialect without including a dialect reference in each type/attr entry.`
168	Can't you capture this as a TODO at the end of this paragraph?
191	Right, but I'm still not sure why this needs to be encoded: if you load operations in order, you could just use an ever incrementing number for each Value.
203	I think we should think about isolated region from the get go: you don't document the value numbering in the doc but I think we can use the same "local scope" as the textual parser to ensure that the value IDs stays small (and so use less varint space).
220	Is this really the common case that (other than "unknown") the locations are repeating in sequence? I am not convinced by this choice right now because it will make future lazy loading harder (you need to stream back to find the previously defined location).
251	When is "region empty" useful? Is it legal for an operation to have an empty region?
268	I don't get this, why multiple `block_arguments` blocks? What about something like:
    block {
      encoding: varint, // (numOps << 1) | (hasBlockArgs)
      arguments: block_arguments?, // Optional based on encoding
      ops : op[]
    }
    block_arguments {
      firstArgIndex: varint,
      numArgs: varint?,
      args: block_argument[]
    }
    block_argument {
      typeIndexAndHasLoc: varint, // (typeIndex << 1) | (hasLoc)
      location: varint?
    }
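The `(numOps << 1) | (hasBlockArgs)` packing proposed above folds a boolean flag into the low bit of a count so that both travel in one varint. A small sketch of the packing and unpacking, with hypothetical names (`BlockHeader`, `packBlockHeader`, `unpackBlockHeader`) chosen only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory view of the packed block header.
struct BlockHeader {
  uint64_t numOps;
  bool hasBlockArgs;
};

// Packs the op count and the flag into one integer: the flag takes bit 0 and
// the count takes the remaining bits.
uint64_t packBlockHeader(BlockHeader header) {
  assert((header.numOps >> 63) == 0 && "numOps must leave room for the flag");
  return (header.numOps << 1) | (header.hasBlockArgs ? 1u : 0u);
}

// Reverses packBlockHeader.
BlockHeader unpackBlockHeader(uint64_t encoding) {
  return {encoding >> 1, (encoding & 1) != 0};
}
```

Because small counts stay small after the shift, a block with fewer than 64 operations would still fit in a single byte under a 7-bit-payload varint.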

mehdi_amini added inline comments. Aug 13 2022, 8:55 AM

mlir/docs/BytecodeFormat.md
120	Also right now it isn't used at all as far as I can tell.
261	Please document numValues :)
mlir/lib/Bytecode/Encoding.h
29	I'm not clear on how you manage "codes", or what are a "builtin section codes"
57	"a" top level operation... We parse in a block so we should have the ability to have multiple I think.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
135	You should sanity check sectionID here I think (and make the argument type the right enum in the API)
148	Document noinline please.
156	`assert(numBytes > 0 && numBytes <= 7);` ?
172	Does this work on a big endian machine?
216	We should have a pointer to the BytecodeDialect here I think, should be able to set it up in initializeOffsets
248	We should be able to have an enum for code here right?
281	`offsetReader`? (It was confusing to me when reading the code where it is used)
305	I think you should check that currentOffset does not exceed `sectionData.size()`; malformed bytecode could have offsets going beyond it.
380	I remember that Attr/Type were made "mutable" to support LLVM named struct (IIRC?), but isn't this encoding and loading scheme assuming there are no cycles? How are we gonna handle this?
565	This should move to `parseSection` I think. (`sectionID` is used in test/set already, seems unsafe)
592	Why aren't dialects lazy loaded?
615	I was thinking: could we have a stringpool top-level section and everywhere refer to strings with an id there? Mnemonic shared between op/attributes/types and across dialects would be stored once and for-all.
748	Please break the recursion :)
754	Not sure where to attached this comment, but there is something missing somewhere (unless I missed it?) to ensure that use-lists ordering is preserved.
mlir/lib/Bytecode/Writer/IRNumbering.cpp
60	We could record the number of times an attribute is used in order to sort them so that the most used ones have lower IDs (and have a better chance of fitting in one byte) :)
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	I think the difference with LLVM opt is that we're not having byte code as the default, hence we may not need to warn since the user has to opt-in to get there.
mlir/test/Bytecode/general.mlir
1	Reminds me of an old proposal of mine to add some flag to mlir-opt to automatically round-trip and diff, and enable this flag optionally to process the entire test-suite :) Seems like it would be useful here as well!

jpienaar added inline comments. Aug 13 2022, 9:18 AM

mlir/test/Bytecode/general.mlir
1	Yes indeed, I was actually wondering if we had that already :)

mehdi_amini added inline comments. Aug 13 2022, 9:24 AM

mlir/test/Bytecode/general.mlir
1	https://reviews.llvm.org/D90088

rriddle updated this revision to Diff 452942. Aug 16 2022, 4:04 AM

rriddle marked 33 inline comments as done.

rriddle added inline comments. Aug 16 2022, 4:04 AM

mlir/docs/BytecodeFormat.md
191	Originally I was thinking that you can't do that if you load lazily, but if I use your other suggestion of doing per-isolated region numbering it should be possible. This will be much nicer, thanks!
203	Great suggestion!
220	I don't think it would make lazy-loading bad, given that we could encode the last location at the start of the region. I'm going to drop this behavior though, given that I'm not sure how many cases in practice it would help. This would also free up this optional behavior for something else (if that something else was more efficient for real-world use cases).
251	Yeah? e.g. External functions have empty regions. Almost every operation that has a region that can optionally be filled use an empty region, given that regions can't be dynamically added after op construction. Cleaned up this section though, we don't need the leading code, we can just use the block count.
268	Nice! I figured you would come up with something better ;)
mlir/include/mlir/Bytecode/BytecodeWriter.h
36	There was some reason why I did this before, but I forget now. I just dropped it.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
148	I thought I did: This method is marked noinline to avoid pessimizing the common case of single byte encoding. Tried to make it more obvious.
172	Right now, no. I need to set up the proper little -> native conversions. I'm deferring that to a follow up because I have to set up a virtual machine for big endian (which is annoying/time consuming). I also need to fix some things in the textual format related to big endian as well.
380	We will likely need some form of special API that can parse just the "immutable" part (i.e. the "name" in the LLVM struct case). For example, if an attribute/type is recursive, we could encode both its immutable and mutable encodings in one entry (with some header that has the size of the immutable part or something). Something like:
    RecursiveEntry {
      immutableEncodingSize: varint,
      immutableEncoding: ...,
      mutableEncoding: ...
    }
During processing we could first process the immutable entry, and then immediately process the mutable one. That way any recursive references would resolve properly, and then we'd fix the final reference afterwards. Something like:
    // Parse the immutable first, so that we have something to give recursive references.
    if (!(result = parseImmutable()))
      return failure();
    // Parse the mutable afterwards. Pass in `result` so that it can populate the mutable bits?
    if (failed(parseMutable(result)))
      return failure();
Until we figure any of this out though, I'm just going to have them always use the string fallback for those attributes and types.
754	Deferring this to a follow up to help simplify this patch, added a TODO for now.
mlir/lib/Bytecode/Writer/IRNumbering.cpp
60	We would need to encode things differently in that case, i.e. if the attributes are not in order of dialect, they would each need to have an associated dialect id encoded with them. In the case of lots of attributes/types, that would be significant. Maybe we could come up with a hybrid model? i.e. encode the most common 128 attributes/types, so that they fit in one byte (or two), and then encode the rest using dialect grouping.
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	Makes sense to me, just dropped it. We can add a warning back in if enough people trip up on this (given bytecode generation is an explicit decision).
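The "little -> native conversions" mentioned in the reply on line 172 can be done without knowing the host byte order by assembling the value one byte at a time. This is a generic technique sketched under a hypothetical name (`loadLittleEndian`); it is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads `numBytes` (1 to 8) little-endian bytes into a host integer. The
// shifts operate on values, not on memory layout, so the result is the same
// on little-endian and big-endian hosts.
uint64_t loadLittleEndian(const uint8_t *data, size_t numBytes) {
  assert(numBytes > 0 && numBytes <= 8);
  uint64_t value = 0;
  for (size_t i = 0; i < numBytes; ++i)
    value |= static_cast<uint64_t>(data[i]) << (8 * i);
  return value;
}
```

The alternative, copying the bytes with `memcpy` and byte-swapping on big-endian hosts, is typically faster but needs a reliable way to detect the host byte order.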

Harbormaster completed remote builds in B181477: Diff 452942. Aug 16 2022, 4:28 AM

jpienaar added inline comments. Aug 16 2022, 7:55 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
615	So encoding would be start and end offsets into a string table?

mehdi_amini added inline comments. Aug 16 2022, 3:49 PM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
615	String being null terminated, you don't necessarily need the end offsets. But if we have an offset section separate from the string table: we just need to point to an entry number, same mechanism as attr/type reference.
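A string pool along the lines discussed here, with null-terminated strings and a separate offset list so that users refer to an entry number, might look like the sketch below. `StringTable` and its members are hypothetical names for illustration; the patch's actual string section may be laid out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical string pool: all strings live in one buffer, each terminated
// by NUL, and an offset list maps entry numbers to start positions.
class StringTable {
public:
  // Appends `str` and returns its entry number.
  size_t add(std::string_view str) {
    offsets.push_back(data.size());
    data.append(str);
    data.push_back('\0');
    return offsets.size() - 1;
  }

  // Looks up an entry. No end offset is needed: the view stops at the NUL.
  std::string_view get(size_t index) const {
    assert(index < offsets.size() && "string index out of range");
    return std::string_view(data.c_str() + offsets[index]);
  }

private:
  std::string data;
  std::vector<size_t> offsets;
};
```

As noted in the thread, the trade-off of relying on the terminator is that an entry cannot refer to a substring of another entry.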

(partial scan before meeting)

mlir/docs/BytecodeFormat.md
12	I just noticed ï and not i , I mean I guess writing this by hand and the other was that we don't actually take text file as bytecode (MĻîŘ just to bikeshed :))
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
10	Nit: I'd move this lower, I expect documentation here not todos :)
227	Dialect of this Attribute or Type ? (its mostly parent that makes me think OOP more than I normally think here, up to you)
309	attribute ?

mehdi_amini added inline comments. Aug 16 2022, 4:06 PM

mlir/docs/BytecodeFormat.md
12	I don't think your bike shed is ASCII though?

rriddle updated this revision to Diff 453166. Aug 16 2022, 4:41 PM

rriddle marked 5 inline comments as done. Aug 16 2022, 4:44 PM

Harbormaster completed remote builds in B181655: Diff 453166. Aug 16 2022, 5:08 PM

rriddle updated this revision to Diff 453172. Aug 16 2022, 5:09 PM

rriddle marked an inline comment as done. Aug 16 2022, 5:10 PM

Harbormaster completed remote builds in B181660: Diff 453172. Aug 16 2022, 5:33 PM

Missing negative tests are a bit unfortunate, but good to hear they are coming soon and this seems like good starting point.

mlir/docs/BytecodeFormat.md
12	Yeah I marked the comment as done before sending as I wasn't serious (and I did also use an extended ASCII encoding without realizing). But in seriousness: MLÏR wouldn't be a valid dialect name, so this looks good.
mlir/lib/Bytecode/Encoding.h
26	uint8_t here too? (I mean with 0 it doesn't matter)
80	We can now use binary literals (not sure if it makes this more readable, but I was reminded of it)
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
243	Unsigned needed?
414	Comment?
502	So this parses the string starting at front of reader? And null-terminated?
615	Indeed, null-termination means we can't have substrings referenced (not sure if that is common here, could think for error strings, but unsure about decoding cost).
mlir/lib/Bytecode/Writer/IRNumbering.cpp
60	What about sorting attributes per frequency per dialect? (Keep dialect attributes still together but just sort dialects per frequency). We could measure all three of course, doesn't require version bump ;-)
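One reading of the per-dialect frequency idea is to keep each dialect's attributes contiguous and order them by use count within the group. The sketch below shows only that ordering step; `NumberedAttr` and `orderForNumbering` are hypothetical names, and the use counts are assumed to have been collected beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for an attribute awaiting an ID.
struct NumberedAttr {
  std::string name;
  unsigned dialect;  // Position of the owning dialect in the dialect section.
  unsigned useCount; // How many times the attribute is referenced.
};

// Groups attributes by dialect, and within each dialect places the most used
// attributes first so they receive the smallest IDs.
void orderForNumbering(std::vector<NumberedAttr> &attrs) {
  std::stable_sort(attrs.begin(), attrs.end(),
                   [](const NumberedAttr &lhs, const NumberedAttr &rhs) {
                     if (lhs.dialect != rhs.dialect)
                       return lhs.dialect < rhs.dialect;
                     return lhs.useCount > rhs.useCount;
                   });
  // The dialect grouping must hold for index-based dialect lookup to work.
  assert(std::is_sorted(attrs.begin(), attrs.end(),
                        [](const NumberedAttr &lhs, const NumberedAttr &rhs) {
                          return lhs.dialect < rhs.dialect;
                        }));
}
```

A stable sort keeps the original relative order of equally used attributes, which keeps the output deterministic.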

This revision is now accepted and ready to land. Aug 16 2022, 9:08 PM

rriddle updated this revision to Diff 453383. Aug 17 2022, 11:46 AM

rriddle marked 12 inline comments as done.

rriddle added inline comments. Aug 17 2022, 11:47 AM

mlir/lib/Bytecode/Encoding.h
26	kVersion is encoded as a varint now, so that we don't have to change it if we have some burst of changing versions (makes it easier to change version if we don't have a cap looming overhead). I suppose I could switch the general constants to use inline constexpr variables now (given we are on C++17), let me know your preference.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
502	Yeah, the front of the reader has an index to a string defined in the string section. Updated the comment.
mlir/test/Bytecode/general.mlir
1	Do you plan on reviving that @mehdi_amini ?

rriddle updated this revision to Diff 453397. Aug 17 2022, 12:32 PM

Herald added a subscriber: mgrang. Aug 17 2022, 12:32 PM

Harbormaster completed remote builds in B181815: Diff 453397. Aug 17 2022, 1:51 PM

mehdi_amini added inline comments. Aug 17 2022, 2:53 PM

mlir/test/Bytecode/general.mlir
1	Yeah I should. I had memory that we couldn't reach consensus on it but I may be wrong.

LG, great start :)

jpienaar added inline comments. Aug 17 2022, 3:05 PM

mlir/test/Bytecode/general.mlir
1	I think the only question was whether it should be on by default or not (e.g., how much of a testing tool mlir-opt really is, should it be used in directed testing only, etc.)

rriddle updated this revision to Diff 453794. Aug 18 2022, 2:40 PM

rriddle marked 2 inline comments as done.

Nice, like the encoding change.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58	Why this change?
mlir/lib/Bytecode/Writer/IRNumbering.cpp
81	Could this just be a static function here?

Harbormaster completed remote builds in B182092: Diff 453794. Aug 18 2022, 4:13 PM

rriddle marked 2 inline comments as done. Aug 18 2022, 7:23 PM

rriddle added inline comments.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58	So that we can load in enums.

rriddle updated this revision to Diff 453864. Aug 18 2022, 7:23 PM

rriddle marked an inline comment as done.

Herald added a subscriber: arphaman. Aug 18 2022, 7:23 PM

Harbormaster completed remote builds in B182139: Diff 453864. Aug 18 2022, 8:02 PM

Closed by commit rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format (authored by rriddle). Aug 22 2022, 12:47 AM

This revision was automatically updated to reflect the committed changes.

rriddle added a commit: rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format.

rriddle mentioned this in rG93cf0e8a28e8: [mlir] Fix bots after bytecode support was added in D131747. Aug 22 2022, 1:31 AM

gflegar mentioned this in rG1d9b1427f4ea: [mlir][Bazel] Fix bazel build. Aug 22 2022, 3:08 AM

This fails asan https://lab.llvm.org/buildbot/#/builders/5/builds/26955

vitalybuka added inline comments. Aug 22 2022, 9:53 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
902	Also a problem: this emplace_back may reallocate the container, but the for loop above uses readState, which is a reference to an element of the container.
926	This calls pop_back and then reads readState.isIsolatedFromAbove, which refers to an element of the regionStack?

rriddle marked 2 inline comments as done. Aug 22 2022, 9:58 AM

rriddle added inline comments.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
902	This should be fine, given that we always return in this case (i.e. never touch the invalidated reference again).
926	Thanks for catching this. I'm not sure why my local asan build didn't catch this (I'll try nuking and resetting it).

vitalybuka added inline comments. Aug 22 2022, 10:03 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
902	Thanks, I see.
926	"I'm not sure why my local asan build didn't catch this": probably you don't use libc++, or an instrumented libc++? If you can fix it quickly, go for it. If not, please let me know; I have a patch to revert it with related fixes.

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

In D131747#3740334, @vitalybuka wrote:

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

Looks like it's already fixed with 96fd3f2

In D131747#3740368, @vitalybuka wrote:
In D131747#3740335, @mehdi_amini wrote:

In D131747#3740334, @vitalybuka wrote:

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

Sure, I don't know that code. So XFAIL is OK with me if you accept the implications. (Given that @rriddle failed to reproduce locally, maybe UNSUPPORTED instead, in case some asan setups miss the issue.)
But a revert/reland seems safe and easy to do as well.

Also if this is the only issue

Trivial fix may work?
bool isIsolatedFromAbove = readState.isIsolatedFromAbove;
regionStack.pop_back();
if (isIsolatedFromAbove)
  valueScopes.pop_back();
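The fix suggested above follows a general rule: copy whatever is still needed out of a container element before the element is destroyed. A self-contained sketch of the same pattern, using stand-in types (`RegionReadState`, `finishRegion`, and the `int` scopes are hypothetical and much simpler than the reader's real state):

```cpp
#include <cassert>
#include <vector>

// Stand-in for the per-region read state kept on the region stack.
struct RegionReadState {
  bool isIsolatedFromAbove;
};

// Pops the finished region and, if it was isolated from above, its value
// scope as well. The flag is copied before pop_back because a reference to
// regionStack.back() dangles once the element is destroyed.
void finishRegion(std::vector<RegionReadState> &regionStack,
                  std::vector<int> &valueScopes) {
  assert(!regionStack.empty());
  bool isIsolatedFromAbove = regionStack.back().isIsolatedFromAbove;
  regionStack.pop_back();
  if (isIsolatedFromAbove) {
    assert(!valueScopes.empty());
    valueScopes.pop_back();
  }
}
```

Reading the flag through a reference after `pop_back` is undefined behavior even when it appears to work, which is consistent with it being caught only by some sanitizer setups.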

Yeah, sorry I've been in meetings. I pushed https://github.com/llvm/llvm-project/commit/96fd3f2d5be21ded6ffed0ac75195df04ec679df an hour ago and have been watching the bot to see if that is the only issue.

Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":

XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir

(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

In D131747#3740383, @uweigand wrote:
Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":
XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir
(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

In D131747#3740387, @rriddle wrote:

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

As of commit df4e637ca7ef4ef17b662845120864921e65bb67 the build bot is green again on s390x. Thanks!

RVP added a subscriber: RVP. Sep 29 2022, 8:29 AM

RVP added inline comments.

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
186	Is this parenthesized correctly?

Herald added a reviewer: nicolasvasilache. Sep 29 2022, 8:29 AM

Herald added a subscriber: zero9178.

jpienaar added inline comments. Sep 29 2022, 8:34 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
186	This is checking if the value post shift is 0 (and relies on this function being called only when multi byte). What issue did you run into with this?

RVP added inline comments. Sep 29 2022, 8:38 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
186	Shouldn't `LLVM_LIKELY` be around the whole condition instead of the shift expression? Isn't `== 0` the likely case and not the shift result being non-zero?

RVP added inline comments. Sep 29 2022, 8:53 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
186	I didn't see any issues. Was looking at the code and this question popped. I now saw that `emitVarInt` specially handles the common case `(... >> 7) == 0`. Maybe a comment here as well would have avoided the question. Thanks.
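To make the branch-hint question concrete, the sketch below separates a hinted single-byte fast path from an out-of-line multi-byte path, with the hint wrapped around the whole comparison. It is an illustration, not the code under review: `SKETCH_LIKELY` is a local stand-in for `LLVM_LIKELY`, the function bodies are assumptions, and the byte layout assumes the prefix varint form with the length prefix in the low bits of the first byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-in for a branch hint macro; falls back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

// Slow path: values that need two to nine bytes.
static void emitMultiByteVarInt(std::vector<uint8_t> &out, uint64_t value) {
  assert((value >> 7) != 0 && "single-byte values take the fast path");
  for (unsigned numBytes = 2; numBytes <= 8; ++numBytes) {
    if ((value >> (7 * numBytes)) != 0)
      continue; // Does not fit in 7 * numBytes payload bits.
    // Prefix: (numBytes - 1) zero bits followed by a one bit.
    uint64_t encoded = (value << numBytes) | (uint64_t(1) << (numBytes - 1));
    for (unsigned i = 0; i < numBytes; ++i)
      out.push_back(static_cast<uint8_t>(encoded >> (8 * i)));
    return;
  }
  // Nine-byte form: an all-zero prefix byte, then the raw 64-bit value.
  out.push_back(0x00);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Fast path: the hint covers the whole comparison, marking "fits in one
// byte" as the expected outcome.
void emitVarInt(std::vector<uint8_t> &out, uint64_t value) {
  if (SKETCH_LIKELY((value >> 7) == 0)) {
    out.push_back(static_cast<uint8_t>((value << 1) | 1));
    return;
  }
  emitMultiByteVarInt(out, value);
}
```

Wrapping only the shift in the hint and comparing the result to zero computes the same condition, but the hint then describes the shift being non-zero, which is why its placement matters.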

Revision Contents

Path

Size

mlir/

docs/

BytecodeFormat.md

296 lines

include/

mlir/

Bytecode/

BytecodeReader.h

34 lines

BytecodeWriter.h

52 lines

IR/

OperationSupport.h

4 lines

Tools/

mlir-opt/

MlirOptMain.h

7 lines

lib/

Bytecode/

CMakeLists.txt

2 lines

Encoding.h

127 lines

Reader/

BytecodeReader.cpp

965 lines

CMakeLists.txt

11 lines

Writer/

474 lines

11 lines

173 lines

165 lines

1 line

Parser/

CMakeLists.txt

1 line

Parser.cpp

3 lines

Tools/

mlir-opt/

CMakeLists.txt

1 line

MlirOptMain.cpp

52 lines

test/

Bytecode/

general.mlir

30 lines

Diff 452346

mlir/docs/BytecodeFormat.md

This file was added.

				# MLIR Bytecode Format

				This document describes the MLIR bytecode format and its encoding.

				[TOC]

				## Magic Number

				MLIR uses the following four-byte magic number to indicate bytecode files:

				'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'

				## Format Overview

				An MLIR Bytecode file is comprised of a byte stream, with a few simple
				structural concepts layered on top.

				### Primitives

				#### Fixed-Width Integers

				```
				byte ::= `0x00`...`0xFF`
				```

				Fixed width integers are unsigned integers of a known byte size. The values are
				stored in little-endian byte order.

				TODO: Add larger fixed width integers as necessary.

				#### Variable-Width Integers

				Variable width integers, or `VarInt`s, provide a compact representation for
				integers. Each encoded VarInt consists of one to nine bytes, which together
				represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
				encoding for VarInts. This encoding is a variant of the
				jpienaar: OOC why not use it?
				rriddle: The encoding described here, or more commonly referred to as PrefixVarInt, is essentially a variant of LEB. Encoding and decoding are generally faster using the prefix strategy, though. The decode for the prefix variant, for example, is effectively branchless given that you just need to count the trailing zeros, which is an intrinsic on most modern hardware. I've benchmarked a bunch of different strategies and implementations over the past week, using both a corpus of .mlir taken from various projects and just using random distributions of integers. This is what emerged from that testing.
				mehdi_amini: Are you using PrefixVarInt or how does it differ? Is your variant format documented somewhere in the literature? I'd rather have us stick to something existing, I doubt that we'll invent a revolutionary trick here somehow. Can you position what you're proposing against varint-G8IU, PFOR, SIMD-BP12, ... ?
				rriddle: Yeah, it's just PrefixVarInt. I completely missed explicitly saying that when rushing out the doc.
				[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
				encoding, where each byte of the encoding provides up to 7 bits for the value,
				with the remaining bit used to store a tag indicating the number of bytes used
				for the encoding. This means that small unsigned integers (less than 2^7) may be
				stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
				etc.

				The first byte of the encoding includes a length prefix in the low bits. This
				prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
				byte. The number of '0' bits indicates the number of _additional_ bytes, not
				including the prefix byte, used to encode the value. All of the remaining bits
				in the first byte, along with all of the bits in the additional bytes, provide
				the value of the integer. Below are the various possible encodings of the prefix
				byte:

				```
				xxxxxxx1: 7 value bits, the encoding uses 1 byte
				xxxxxx10: 14 value bits, the encoding uses 2 bytes
				xxxxx100: 21 value bits, the encoding uses 3 bytes
				xxxx1000: 28 value bits, the encoding uses 4 bytes
				xxx10000: 35 value bits, the encoding uses 5 bytes
				xx100000: 42 value bits, the encoding uses 6 bytes
				x1000000: 49 value bits, the encoding uses 7 bytes
				10000000: 56 value bits, the encoding uses 8 bytes
				00000000: 64 value bits, the encoding uses 9 bytes
				```
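The prefix table above can be exercised with a small encoder/decoder pair. This is a sketch written from the description in this section, not the code in the patch; the names `encodeVarInt` and `decodeVarInt` are hypothetical, and the trailing-zero count is done with a plain loop rather than a hardware intrinsic to keep the example portable.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Encode `value` using the prefix scheme described above.
std::vector<uint8_t> encodeVarInt(uint64_t value) {
  std::vector<uint8_t> out;
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    // An encoding of `numBytes` bytes carries 7 * numBytes value bits.
    if ((value >> (7 * numBytes)) != 0)
      continue;
    // Place the terminal '1' below the value, then (numBytes - 1) zero bits.
    uint64_t encoded = ((value << 1) | 1) << (numBytes - 1);
    for (unsigned i = 0; i < numBytes; ++i)
      out.push_back(static_cast<uint8_t>(encoded >> (8 * i)));
    return out;
  }
  // More than 56 bits: an all-zero prefix byte, then the raw 64-bit value.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(static_cast<uint8_t>(value >> (8 * i)));
  return out;
}

// Decode one VarInt starting at `pos`, advancing `pos` past it.
// Returns std::nullopt if the input is truncated.
std::optional<uint64_t> decodeVarInt(const std::vector<uint8_t> &bytes,
                                     size_t &pos) {
  if (pos >= bytes.size())
    return std::nullopt;
  uint8_t prefix = bytes[pos];
  // Count the '0' bits below the terminal '1' (8 if the byte is all zeros).
  unsigned extra = 0;
  while (extra < 8 && ((prefix >> extra) & 1) == 0)
    ++extra;
  if (bytes.size() - pos < extra + 1u)
    return std::nullopt;
  uint64_t raw = 0;
  if (extra == 8) {
    // Nine-byte form: the eight bytes after the prefix hold the value.
    for (unsigned i = 0; i < 8; ++i)
      raw |= static_cast<uint64_t>(bytes[pos + 1 + i]) << (8 * i);
    pos += 9;
    return raw;
  }
  // Read all bytes little-endian, then shift out the prefix bits.
  for (unsigned i = 0; i <= extra; ++i)
    raw |= static_cast<uint64_t>(bytes[pos + i]) << (8 * i);
  pos += extra + 1;
  return raw >> (extra + 1);
}
```

For instance, under this sketch the value 128 needs 8 value bits and so falls in the `xxxxxx10` row, producing the two bytes `0x02 0x02`.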

				#### NUL Terminated Strings

				NUL Terminated Strings are terminated with the ASCII NUL character (whose byte
				value is zero). These are not used in cases where a string may contain an
				embedded NUL character. In cases that may hold an embedded NUL character, the
				string is encoded using a length and byte array.
				jpienaar: What is used instead?

				### Sections

				```
				section {
				id: byte
				length: varint
				}
				```

				Sections are a mechanism for grouping data within the bytecode. They enable
				delayed processing, which is useful for out-of-order processing of data,
				lazy-loading, and more. Each section contains a Section ID and a length (which
				allows for skipping over the section).

				TODO: Sections should also carry an optional alignment. Add this when necessary.
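To show why the length field makes delayed processing possible, the following sketch indexes sections without decoding their contents. It is illustrative only: `SectionRef`, `readVarInt`, and `indexSections` are hypothetical names, and the sketch assumes that the `length` bytes of section payload immediately follow the header, which this document implies but does not spell out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of where a section's payload lives in the buffer.
struct SectionRef {
  uint8_t id;
  size_t begin;
  size_t length;
};

// Minimal prefix-varint reader; returns false on truncated input.
bool readVarInt(const std::vector<uint8_t> &buf, size_t &pos, uint64_t &value) {
  if (pos >= buf.size())
    return false;
  unsigned extra = 0;
  while (extra < 8 && ((buf[pos] >> extra) & 1) == 0)
    ++extra;
  if (buf.size() - pos < extra + 1u)
    return false;
  unsigned first = (extra == 8) ? 1 : 0; // skip an all-zero prefix byte
  unsigned count = (extra == 8) ? 8 : extra + 1;
  uint64_t raw = 0;
  for (unsigned i = 0; i < count; ++i)
    raw |= static_cast<uint64_t>(buf[pos + first + i]) << (8 * i);
  value = (extra == 8) ? raw : raw >> (extra + 1);
  pos += extra + 1;
  return true;
}

// Walk the section headers starting at `pos`, recording each payload's
// location and skipping over it. Assumes payloads directly follow headers.
bool indexSections(const std::vector<uint8_t> &buf, size_t pos,
                   std::vector<SectionRef> &out) {
  while (pos < buf.size()) {
    uint8_t id = buf[pos++];
    uint64_t length = 0;
    if (!readVarInt(buf, pos, length))
      return false;
    if (length > buf.size() - pos)
      return false; // section claims more bytes than remain
    out.push_back({id, pos, static_cast<size_t>(length)});
    pos += static_cast<size_t>(length);
  }
  return true;
}
```

A reader built this way could decode a given section only when its contents are first requested.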

				## MLIR Encoding

				Given the generic structure of MLIR, the bytecode encoding is actually fairly
				simple. It effectively maps to the core components of MLIR.

				### Top Level Structure

				The top-level structure of the bytecode contains the 4-byte "magic number", a
				version number, and a list of sections. Each section is currently only expected
				to appear once within a bytecode file.

				```
				bytecode {
				magic: "MLïR",
				version: varint,
				mehdi_amini: Producer string?
				sections: section[]
				}
				```

				### Dialect Section

				The dialect section of the bytecode contains all of the dialects referenced
				within the encoded IR, and some information about the components of those
				dialects that were also referenced.

				```
				dialect_section {
				dialects: dialect[]
				}

				dialect {
				name: nul_terminated_string,
				numAttrs: varint,
				numTypes: varint,
				mehdi_amini: I'm not sure here why these varint are useful?
				rriddle: These are used to know how many attributes/types/operation names need to be parsed.
				mehdi_amini: Actually, following our offline discussion, they are needed to be able match to type/attr back to a dialect right?
				mehdi_amini: Also right now it isn't used at all as far as I can tell.
				numOpNames: varint,
				jpienaar: Nit: [] instead of *? (I see the relational database or pointer notations of the latter but I find the former more intuitive)
				mehdi_amini: +1 for [] :)
				rriddle: SG, I also like [].
				opNames: nul_terminated_string[]
				}
				```

				### Attribute/Type Sections

				Attributes and types are encoded using two [sections](#sections), one section
				jpienaar: Link to encoding section?
				(`attr_type_section`) containing the actual encoded representation, and another
				section (`attr_type_offset_section`) containing the offsets of each encoded
				attribute/type into the previous section. This allows for attributes and types
				to always be lazily loaded on demand.
				mehdi_amini: Worth mentioning: the first section can't be decoded without the second one as the elements in the array of attrs/types don't include a size.

				```
				attr_type_section {
				attrs: attribute[],
				types: type[]
				}
				attr_type_offset_section {
				offset: varint[]
				}

				mehdi_amini: What is `kAsmForm`? Seems to refer to an enum not documented here.
				attribute {
				code: byte, // kAsmForm
				encoding: ...
				}
				type {
				code: byte, // kAsmForm
				mehdi_amini: I'm not sure yet how we dispatch to a dialect for loading a type/attribute
				rriddle: We chatted a little about this offline. Attributes and Types are grouped by dialect, with each grouping emitted in the same order as the dialects in the dialect section. This allows for us to know which dialect an attribute/type belongs to based on its index (i.e. we could know attributes 0-5 are the builtin dialect, 6-9 are the func dialect and so on).
				encoding: ...
				}
				```

				Each `offset` in the `attr_type_offset_section` above is the size of the
				encoding for the attribute or type. We avoid using the direct offset into the
				mehdi_amini: Basically this is differential encoding of the offset table, but that means you need to decompress the table here IIUC. Accessing the last attribute requires to read the entire table. (I assume we would do it once and for-all in the "BitcodeReaderContext" or whatever you're naming it)
				rriddle: Yeah. We do a single pass to initialize the bytecode structure of all attributes/types, which indicates where the data is stored in the bytecode. Reading lazily after that is trivial, because we just read the previously computed data directly.
				`attr_type_section`, as smaller relative offsets provide more effective
				compression. Attributes and types are grouped by dialect, with each dialect
				grouping in the same order of the dialects within the
				[dialect section](#dialect-section).
				mehdi_amini: Maybe make it explicit: `, this allows to associate an attribute back to a dialect without including a dialect reference in each type/attr entry.`
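The two ideas in this paragraph, sizes instead of absolute offsets and grouping by dialect, can be sketched as follows. The names `EntryRange`, `resolveOffsets`, and `dialectForIndex` are hypothetical, and the per-dialect counts are assumed to be available from the dialect section; this is an illustration of the single decoding pass discussed in the review, not the reader's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical location of one encoded attribute/type in attr_type_section.
struct EntryRange {
  size_t begin;
  size_t size;
};

// Turn the stored per-entry sizes into absolute ranges with one prefix sum.
std::vector<EntryRange> resolveOffsets(const std::vector<uint64_t> &sizes) {
  std::vector<EntryRange> ranges;
  ranges.reserve(sizes.size());
  size_t cursor = 0;
  for (uint64_t size : sizes) {
    ranges.push_back({cursor, static_cast<size_t>(size)});
    cursor += static_cast<size_t>(size);
  }
  return ranges;
}

// Map an entry index back to its dialect, given how many entries each
// dialect contributed (in dialect-section order). Returns the number of
// dialects if the index is out of range.
size_t dialectForIndex(const std::vector<uint64_t> &countsPerDialect,
                       uint64_t index) {
  for (size_t d = 0; d < countsPerDialect.size(); ++d) {
    if (index < countsPerDialect[d])
      return d;
    index -= countsPerDialect[d];
  }
  return countsPerDialect.size();
}
```

With counts of 6 and 4, for example, indices 0 through 5 map to the first dialect and 6 through 9 to the second, matching the example given in the review discussion.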

				#### Attribute/Type Encodings

				In the previous section, the forms of `attribute` and `type` both start with a
				`code` field. This field indicates how the attribute or type was encoded. In the
				abstract, an attribute/type is encoded in one of two possible ways: via its
				assembly format, or via a custom dialect defined encoding.

				##### Assembly Format Fallback

				mehdi_amini: That seems overly conservative to me: what about just storing the part after the dot and add a varint for the dialect ID?
				rriddle: Agreed, I need to look into this. I went with the current thing because it was easier to bootstrap (e.g. I don't have to worry about the difference between builtin and non-builtin).
				mehdi_amini: Can't you capture this as a TODO at the end of this paragraph?
				In the case where a dialect does not define a method for encoding the attribute
				or type, the textual assembly format of that attribute or type is used as a
				fallback. For example, a type of `!bytecode.type` would be encoded as the null
				terminated string "!bytecode.type". This ensures that every attribute and type
				may be encoded, even if the owning dialect has not yet opted in to a more
				efficient serialization.

				##### Dialect Defined Encoding

				TODO: This is not yet supported.

				### IR Section

				The IR section contains the encoded form of operations within the bytecode.

				#### Operation Encoding

				```
				op {
				name: varint,
				encodingMask: byte,

				location: varint?,
				mehdi_amini: I assume this field is needed to allow some level of lazy loading / random access? I'm not sure yet...
				rriddle: This field is the value index of the first result. Every value gets a number, which is what gets referred to in the `operands` list. For multiple results, we know the value number are consecutive, so we just need to know the first one.
				mehdi_amini: Right, but I'm still not sure why this needs to be encoded: if you load operations in order, you could just use an ever incrementing number for each Value.
				rriddle: Originally I was thinking that you can't do that if you load lazily, but if I use your other suggestion of doing per-isolated region numbering it should be possible. This will be much nicer, thanks!

				attrDict: varint?,

				firstResultIndex: varint?,
				numResults: varint?,
				resultTypes: varint[],

				numOperands: varint?,
				operands: varint[],

				numSuccessors: varint?,
				successors: varint[],
				mehdi_amini: Having the regions in-line makes it hard/impossible to lazily load IR I think? (at least not without decoding the entire IR section).
				rriddle: Yeah. The idea I have right now is that the op encoding mask will indicate if the regions are inline or out-of-line. That way we just dispatch to two different code paths depending on the encoding.
				mehdi_amini: I think we should think about isolated region from the get go: you don't document the value numbering in the doc but I think we can use the same "local scope" as the textual parser to ensure that the value IDs stays small (and so use less varint space).
				rriddle: Great suggestion!

				numRegions: varint?,
				regions: region[]
				}
				```

				jpienaar: And 7 of 8 optional elements used?
				mehdi_amini: Actually 6? `firstResultIndex` and `numResults` seem coupled right?
				rriddle: Yeah, right now the mask uses 6 out of 8 possible bits. Whenever we do lazy loading that may bump up to 7 bits (we could get around that by encoding the "are the regions lazy" bit a different way). We could of course change from a mask to a set of values for each possible encoding type.
				The encoding of an operation is important because this is generally the most
				commonly appearing structure in the bytecode. A single encoding is used for
				every type of operation. Given this prevalence, many of the fields of an
				operation are optional. The `encodingMask` field is a bitmask which indicates
				which of the components of the operation are present.
				jpienaar: So location is either previous or new? Unspecified doesn't mean unknown but previous. That's nice. The only other common case I could think of is successive lines in a file but that would take a bit.
				rriddle: Yeah, the file locations encoding is gonna be interesting. I'm hoping that when builtin attributes have custom encodings it ends up being mostly okay (should just be a string index+two small varints for the line/col). This is something we will likely want to play around with, but thankfully it's easy to test (I have a huge IR file that inherits locations from the file).
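A bitmask of this kind could be computed along the following lines. The bit positions, the enum `OpEncodingBit`, and the `OpShape` struct are all hypothetical; the document states only that the mask records which optional components are present, and the review discussion notes that six of the eight bits are in use at this revision.

```cpp
#include <cstdint>

// Hypothetical bit assignments, one per optional component of `op`.
enum OpEncodingBit : uint8_t {
  kHasLocation = 1 << 0,
  kHasAttrs = 1 << 1,
  kHasResults = 1 << 2,
  kHasOperands = 1 << 3,
  kHasSuccessors = 1 << 4,
  kHasRegions = 1 << 5,
};

// Hypothetical summary of the parts of an operation that affect the mask.
struct OpShape {
  bool newLocation; // location differs from the previously encoded one
  bool hasAttrs;
  unsigned numResults;
  unsigned numOperands;
  unsigned numSuccessors;
  unsigned numRegions;
};

// Set one bit for each component that will actually be written.
uint8_t computeEncodingMask(const OpShape &op) {
  uint8_t mask = 0;
  if (op.newLocation)
    mask |= kHasLocation;
  if (op.hasAttrs)
    mask |= kHasAttrs;
  if (op.numResults != 0)
    mask |= kHasResults;
  if (op.numOperands != 0)
    mask |= kHasOperands;
  if (op.numSuccessors != 0)
    mask |= kHasSuccessors;
  if (op.numRegions != 0)
    mask |= kHasRegions;
  return mask;
}
```

A reader would test the same bits in the same order to decide which of the optional fields to parse.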

				##### Location

				If necessary to encode, i.e. if the location for this operation is different
				than the location for the last operation or block argument, the index of the
				location within the attribute table is encoded.
				mehdi_amini: Is this really the common case that (other than "unknown") the locations are repeating in sequence? I am not convinced by this choice right now because it will make future lazy loading harder (you need to stream back to find the previously defined location).
				rriddle: I don't think it would make lazy-loading bad, given that we could encode the last location at the start of the region. I'm going to drop this behavior though, given that I'm not sure how many cases in practice it would help. This would also free up this optional behavior for something else (if that something else was more efficient for real-world use cases).

				##### Attributes

				If the operation has attributes, the index of the operation attribute dictionary
				within the attribute table is encoded.

				##### Results

				If the operation has results, the value index of the first result is encoded.
				After that, the number of results and the indexes of the result types within the
				type table are encoded.

				##### Operands

				If the operation has operands, the number of operands and the value index of
				each operand is encoded.

				##### Successors

				If the operation has successors, the number of successors and the indexes of the
				successor blocks within the parent region are encoded.

				##### Regions

				If the operation has regions, the number of regions and each region are encoded.

				#### Region Encoding
				jpienaar: So we could have a region with 0 blocks here that is considered not empty?
				rriddle: Zero block regions should be marked as empty. I used codes for various "bool" like things just to make it easier to change things later on. E.g. if we wanted to change the way regions are encoded we could just have a new "kRegion2" code. Though that only matters after we start versioning things.

				mehdi_amini: Do we need this indirection here? You didn't provide one for the operation content.
				```
				region {
				code: byte, // kRegion | kRegionEmpty
				mehdi_amini: When is "region empty" useful? Is it legal for an operation to have an empty region?
				rriddle: Yeah? e.g. External functions have empty regions. Almost every operation that has a region that can optionally be filled use an empty region, given that regions can't be dynamically added after op construction. Cleaned up this section though, we don't need the leading code, we can just use the block count.

				numBlocks: varint?,
				numValues: varint?,
				blocks: block[]
				}
				```

				A region is encoded with a leading code followed by the body. The code indicates
				how the body is encoded. If the code is `kRegionEmpty`, the region has no body.
				If the code is `kRegion`, the body is present.
				mehdi_amini: Please document numValues :)

				#### Block Encoding

				```
				mehdi_amini: block_element ?
				block {
				block_code: byte, // kBlockArguments | kOp | kBlockEnd
				block_element: block_arguments | op | []
				mehdi_amini: I don't get this, why multiple `block_arguments` blocks? What about something like:
				  block {
				  encoding: varint, // (numOps << 1) | (hasBlockArgs)
				  arguments: block_arguments?, // Optional based on encoding
				  ops : op[]
				  }
				  block_arguments {
				  firstArgIndex: varint,
				  numArgs: varint?,
				  args: block_argument[]
				  }
				  block_argument {
				  typeIndexAndHasLoc: varint, // (typeIndex << 1) | (hasLoc)
				  location: varint?
				  }
				rriddle: Nice! I figured you would come up with something better ;)
				}

				block_arguments {
				code: byte, // kBlockArguments

				firstArgIndex: varint,
				numArgs: varint?,
				args: block_argument[]

				}
				block_argument {
				typeIndexAndHasLoc: varint, // (typeIndex << 1) | (hasLoc)
				location: varint?
				}

				```

				A block is encoded with an array of elements determined by a leading code. The
				terminal `kBlockEnd` code indicates the end of a block. The `kOp` code indicates
				that an operation follows. If the block has arguments, the first element of the
				block will contain the encoded representation of the arguments, or
				`block_arguments` above. The encoding for the block arguments includes the value
				index of the first argument, the number of arguments, and an encoded list of
				arguments. The `typeIndexAndHasLoc` field of the argument is a varint that in
				the high-bits holds the index for the type of that argument, and in the low bit
				contains a flag that indicates if the argument has a location encoded along with
				it. A location is encoded if the argument had a different location than the
				previously encoded argument or operation.
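The packing of `typeIndexAndHasLoc` follows directly from the `(typeIndex << 1) | (hasLoc)` comment in the grammar. A minimal sketch, with hypothetical helper and struct names:

```cpp
#include <cstdint>

// Hypothetical decoded form of the `typeIndexAndHasLoc` field.
struct BlockArgHeader {
  uint64_t typeIndex;
  bool hasLoc;
};

// Store the type index in the high bits and the location flag in bit 0.
uint64_t packTypeIndexAndHasLoc(uint64_t typeIndex, bool hasLoc) {
  return (typeIndex << 1) | (hasLoc ? 1u : 0u);
}

// Recover both parts from the packed value.
BlockArgHeader unpackTypeIndexAndHasLoc(uint64_t packed) {
  return {packed >> 1, (packed & 1) != 0};
}
```

Folding the flag into the same varint avoids spending a separate byte per argument, at the cost of one bit of range for the type index.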

mlir/include/mlir/Bytecode/BytecodeReader.h

This file was added.

				//===- BytecodeReader.h - MLIR Bytecode Reader ------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines interfaces to read MLIR bytecode files/streams.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_BYTECODE_BYTECODEREADER_H
				#define MLIR_BYTECODE_BYTECODEREADER_H

				#include "mlir/IR/AsmState.h"
				#include "mlir/Support/LLVM.h"

				namespace llvm {
				class MemoryBufferRef;
				} // namespace llvm

				namespace mlir {
				/// Returns true if the given buffer starts with the magic bytes that signal
				/// MLIR bytecode.
				bool isBytecode(llvm::MemoryBufferRef buffer);

				/// Read the operations defined within the given memory buffer, containing MLIR
				/// bytecode, into the provided block.
				LogicalResult readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
				const ParserConfig &config);
				} // namespace mlir

				#endif // MLIR_BYTECODE_BYTECODEREADER_H

mlir/include/mlir/Bytecode/BytecodeWriter.h

This file was added.

				//===- BytecodeWriter.h - MLIR Bytecode Writer ------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines interfaces to write MLIR bytecode files/streams.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_BYTECODE_BYTECODEWRITER_H
				#define MLIR_BYTECODE_BYTECODEWRITER_H

				#include "mlir/Support/LLVM.h"

				namespace mlir {
				class Operation;

				//===----------------------------------------------------------------------===//
				// BytecodeWriterConfig
				//===----------------------------------------------------------------------===//

				/// This class provides a configuration for the bytecode writer. It is the main
				/// injection of information into the writer.
				class BytecodeWriterConfig {
				struct Impl;

				public:
				BytecodeWriterConfig(Operation *op);
				~BytecodeWriterConfig();

				/// Return the root operation of the writer.
				Operation *getRootOp() const;

				jpienaar: This feels a little bit weird ... I'd almost expect like OpPrintingFlags having the config be separate from Operation* being printed. But perhaps it makes more sense below.
				rriddle: There was some reason why I did this before, but I forget now. I just dropped it.
				private:
				/// A pointer to the allocated storage for the impl state.
				std::unique_ptr<Impl> impl;
				};

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				/// Write the given bytecode configuration to the provided output stream. For
				/// streams where it matters, the given stream should be in "binary" mode.
				void writeBytecodeToFile(const BytecodeWriterConfig &config, raw_ostream &os);

				} // namespace mlir

				#endif // MLIR_BYTECODE_BYTECODEWRITER_H

mlir/include/mlir/IR/OperationSupport.h

@@ struct OperationState {
  /// Regions that the op will hold.
  SmallVector<std::unique_ptr<Region>, 1> regions;

  public:
  OperationState(Location location, StringRef name);
  OperationState(Location location, OperationName name);

  OperationState(Location location, OperationName name, ValueRange operands,
- TypeRange types, ArrayRef<NamedAttribute> attributes,
+ TypeRange types, ArrayRef<NamedAttribute> attributes = {},
  BlockRange successors = {},
  MutableArrayRef<std::unique_ptr<Region>> regions = {});
  OperationState(Location location, StringRef name, ValueRange operands,
- TypeRange types, ArrayRef<NamedAttribute> attributes,
+ TypeRange types, ArrayRef<NamedAttribute> attributes = {},
  BlockRange successors = {},
  MutableArrayRef<std::unique_ptr<Region>> regions = {});

  void addOperands(ValueRange newOperands);

  void addTypes(ArrayRef<Type> newTypes) {
  types.append(newTypes.begin(), newTypes.end());
  }

mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h

  /// "expected-(error|note|remark|warning)" are parsed in the input and matched
  /// against emitted diagnostics.
  /// - verifyPasses enables the IR verifier in-between each pass in the pipeline.
  /// - allowUnregisteredDialects allows to parse and create operation without
  /// registering the Dialect in the MLIRContext.
  /// - preloadDialectsInContext will trigger the upfront loading of all
  /// dialects from the global registry in the MLIRContext. This option is
  /// deprecated and will be removed soon.
+ /// - emitBytecode will generate bytecode output instead of text.
  LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
  std::unique_ptr<llvm::MemoryBuffer> buffer,
  const PassPipelineCLParser &passPipeline,
  DialectRegistry &registry, bool splitInputFile,
  bool verifyDiagnostics, bool verifyPasses,
  bool allowUnregisteredDialects,
- bool preloadDialectsInContext = false);
+ bool preloadDialectsInContext = false,
+ bool emitBytecode = false);

  /// Support a callback to setup the pass manager.
  /// - passManagerSetupFn is the callback invoked to setup the pass manager to
  /// apply on the loaded IR.
  LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
  std::unique_ptr<llvm::MemoryBuffer> buffer,
  PassPipelineFn passManagerSetupFn,
  DialectRegistry &registry, bool splitInputFile,
  bool verifyDiagnostics, bool verifyPasses,
  bool allowUnregisteredDialects,
- bool preloadDialectsInContext = false);
+ bool preloadDialectsInContext = false,
+ bool emitBytecode = false);

  /// Implementation for tools like `mlir-opt`.
  /// - toolName is used for the header displayed by `--help`.
  /// - registry should contain all the dialects that can be parsed in the source.
  /// - preloadDialectsInContext will trigger the upfront loading of all
  /// dialects from the global registry in the MLIRContext. This option is
  /// deprecated and will be removed soon.
  LogicalResult MlirOptMain(int argc, char **argv, llvm::StringRef toolName,

mlir/lib/Bytecode/CMakeLists.txt

This file was added.

				add_subdirectory(Reader)
				add_subdirectory(Writer)
				No newline at end of file

mlir/lib/Bytecode/Encoding.h

This file was added.

				//===- Encoding.h - MLIR binary format encoding information -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines enum values describing the structure of MLIR bytecode
				// files.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LIB_MLIR_BYTECODE_ENCODING_H
				#define LIB_MLIR_BYTECODE_ENCODING_H

				#include <cstdint>

				namespace mlir {
				namespace bytecode {
				//===----------------------------------------------------------------------===//
				// General constants
				//===----------------------------------------------------------------------===//

				enum {
				/// The current bytecode version.
				jpienaar: uint8_t here too? (I mean with 0 it doesn't matter)
				rriddle: kVersion is encoded as a varint now, so that we don't have to change it if we have some burst of changing versions (makes it easier to change version if we don't have a cap looming overhead). I suppose I could switch the general constants to use inline constexpr variables now (given we are on C++17), let me know your preference.
				kVersion = 0,

				/// The first non-builtin section code.
				mehdi_amini: I'm not clear on how you manage "codes", or what are a "builtin section codes"
				kFirstNonBuiltinCode = 16,
				};

				namespace BuiltinCode {
				enum : uint8_t {
				/// This value indicates the code for a section.
				kSection = 0,
				};
				} // namespace BuiltinCode

				//===----------------------------------------------------------------------===//
				// Sections
				//===----------------------------------------------------------------------===//

				namespace Section {
				enum ID : uint8_t {
				/// This section contains the dialects referenced within an IR module.
				kDialect = 0,

				/// This section contains the attributes and types referenced within an IR
				/// module.
				kAttrType = 1,

				/// This section contains the offsets for the attribute and types within the
				/// AttrType section.
				kAttrTypeOffset = 2,

				/// This section contains the top level operation, and its nested
				mehdi_amini: "a" top level operation... We parse in a block so we should have the ability to have multiple I think.
				/// regions/operations.
				kTopLevelOp = 3,

				/// The total number of section types.
				kNumSections = 4,
				};
				} // namespace Section

				//===----------------------------------------------------------------------===//
				// AttrType Section
				//===----------------------------------------------------------------------===//

				namespace AttrTypeCode {
				enum : uint8_t {
				/// This code represents an attribute or type represented in the textual
				/// assembly format.
				kAsmForm,
				};
				} // namespace AttrTypeCode

				//===----------------------------------------------------------------------===//
				// kTopLevelOp Section
				//===----------------------------------------------------------------------===//
				jpienaar: We can now use binary literals (not sure if it makes it more readable, but was reminded of it)

				namespace TopLevelOpCode {
				enum : uint8_t {
				//===--------------------------------------------------------------------===//
				// Operation Codes

				/// This code represents an operation.
				kOp = kFirstNonBuiltinCode,

				//===--------------------------------------------------------------------===//
				// Region Codes

				/// This code represents a non-empty region.
				kRegion,

				/// This code represents an empty region.
				kRegionEmpty,

				//===--------------------------------------------------------------------===//
				// Block Codes

				/// This code represents the argument list of a block.
				kBlockArguments,

				/// This code represents the end of a block.
				kBlockEnd,
				};
				} // namespace TopLevelOpCode

				/// This enum represents a mask of all of the potential components of an
				/// operation. This mask is used when encoding an operation to indicate which
				/// components are present in the bytecode.
				namespace OpEncodingMask {
				enum : uint8_t {
				kHasLoc = 1 << 0,
				kHasAttrs = 1 << 1,
				kHasResults = 1 << 2,
				kHasOperands = 1 << 3,
				kHasSuccessors = 1 << 4,
				kHasInlineRegions = 1 << 5,
				};
				} // namespace OpEncodingMask

				} // namespace bytecode
				} // namespace mlir

				#endif
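As a rough illustration of how a bitmask such as `OpEncodingMask` is typically used, the standalone sketch below builds a mask from the components an operation carries; a reader would test the same bits to decide which fields follow. The enum values are copied from the header above, but `OpShape` and `computeOpMask` are hypothetical names invented for this example and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from OpEncodingMask in Encoding.h above.
namespace OpEncodingMask {
enum : uint8_t {
  kHasLoc = 1 << 0,
  kHasAttrs = 1 << 1,
  kHasResults = 1 << 2,
  kHasOperands = 1 << 3,
  kHasSuccessors = 1 << 4,
  kHasInlineRegions = 1 << 5,
};
} // namespace OpEncodingMask

// Hypothetical summary of an operation's components (illustration only).
struct OpShape {
  bool hasLoc;
  unsigned numAttrs;
  unsigned numResults;
  unsigned numOperands;
  unsigned numSuccessors;
  unsigned numRegions;
};

// Set one bit per component that is present, so that absent components
// cost nothing in the encoded stream beyond the single mask byte.
uint8_t computeOpMask(const OpShape &op) {
  uint8_t mask = 0;
  if (op.hasLoc)
    mask |= OpEncodingMask::kHasLoc;
  if (op.numAttrs != 0)
    mask |= OpEncodingMask::kHasAttrs;
  if (op.numResults != 0)
    mask |= OpEncodingMask::kHasResults;
  if (op.numOperands != 0)
    mask |= OpEncodingMask::kHasOperands;
  if (op.numSuccessors != 0)
    mask |= OpEncodingMask::kHasSuccessors;
  if (op.numRegions != 0)
    mask |= OpEncodingMask::kHasInlineRegions;
  return mask;
}
```

On the reading side, a check such as `mask & OpEncodingMask::kHasOperands` would then gate whether an operand list is parsed at all.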

mlir/lib/Bytecode/Reader/BytecodeReader.cpp

This file was added.

				//===- BytecodeReader.cpp - MLIR Bytecode Reader --------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Bytecode/BytecodeReader.h"
				#include "../Encoding.h"
				jpienaar: Nit: I'd move this lower, I expect documentation here not todos :)
				#include "mlir/AsmParser/AsmParser.h"
				#include "mlir/IR/BuiltinDialect.h"
				#include "mlir/IR/OpImplementation.h"
				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/ScopeExit.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/Support/MemoryBufferRef.h"
				#include "llvm/Support/SaveAndRestore.h"

				#define DEBUG_TYPE "mlir-bytecode"

				jpienaar: mlir-bytecode-reader ?
				using namespace mlir;

				//===----------------------------------------------------------------------===//
				// EncodingReader
				//===----------------------------------------------------------------------===//

				namespace {
				class EncodingReader {
				public:
				explicit EncodingReader(ArrayRef<uint8_t> contents, Location fileLoc)
				: dataIt(contents.data()), dataEnd(contents.end()), fileLoc(fileLoc) {}
				explicit EncodingReader(StringRef contents, Location fileLoc)
				: EncodingReader({reinterpret_cast<const uint8_t *>(contents.data()),
				contents.size()},
				fileLoc) {}

				/// Returns true if the entire section has been read.
				bool empty() const { return dataIt == dataEnd; }

				/// Returns the remaining size of the bytecode.
				size_t size() const { return dataEnd - dataIt; }

				/// Emit an error using the given arguments.
				template <typename... Args>
				LogicalResult emitError(Args &&...args) const {
				return ::emitError(fileLoc).append(std::forward<Args>(args)...);
				}

				/// Parse a single byte from the stream.
				template <typename T>
				ParseResult parseByte(T &value) {
				if (empty())
				return emitError("attempting to parse a byte at the end of the bytecode");
				value = *dataIt++;
				return success();
				}
				/// Parse a range of bytes of 'length' into the given result.
				jpienaar: Why this change?
				rriddle: So that we can load in enums.
				ParseResult parseBytes(size_t length, ArrayRef<uint8_t> &result) {
				if (length > size()) {
				return emitError("attempting to parse ", length, " bytes when only ",
				size(), " remain");
				}
				result = {dataIt, length};
				dataIt += length;
				return success();
				}
				/// Parse a range of bytes of 'length' into the given result, which can be
				/// assumed to be large enough to hold `length`.
				ParseResult parseBytes(size_t length, uint8_t *result) {
				if (length > size()) {
				return emitError("attempting to parse ", length, " bytes when only ",
				size(), " remain");
				}
				memcpy(result, dataIt, length);
				dataIt += length;
				return success();
				}

				/// Parse a variable length encoded integer from the byte stream. The first
				/// encoded byte contains a prefix in the low bits indicating the encoded
				/// length of the value. This length prefix is a bit sequence of '0's followed
				/// by a '1'. The number of '0' bits indicate the number of _additional_ bytes
				/// (not including the prefix byte). All remaining bits in the first byte,
				/// along with all of the bits in additional bytes, provide the value of the
				/// integer encoded in little-endian order.
				ParseResult parseVarInt(uint64_t &result) {
				// Parse the first byte of the encoding, which contains the length prefix.
				if (parseByte(result))
				return failure();

				// Handle the overwhelmingly common case where the value is stored in a
				// single byte. In this case, the first bit is the `1` marker bit.
				if (LLVM_LIKELY(result & 1)) {
				result >>= 1;
				return success();
				}

				// Handle the overwhelmingly uncommon case where the value required all 8
				// bytes (i.e. a really really big number). In this case, the marker byte is
				// all zeros: `00000000`.
				if (LLVM_UNLIKELY(result == 0))
				return parseBytes(sizeof(result), reinterpret_cast<uint8_t *>(&result));
				return parseMultiByteVarInt(result);
				}

				/// Skip the first `length` bytes within the reader.
				ParseResult skipBytes(size_t length) {
				if (length > size()) {
				return emitError("attempting to skip ", length, " bytes when only ",
				size(), " remain");
				}
				dataIt += length;
				return success();
				}

				/// Parse a NUL terminated string into `result` (without including the NUL
				/// terminator).
				ParseResult parseNULTerminatedString(StringRef &result) {
				const char *startIt = (const char *)dataIt;
				jpienaar: Nit: I'm more used to null-terminated, with NUL being the character name.
				const char *nulIt = (const char *)memchr(startIt, 0, size());
				if (!nulIt)
				return emitError("malformed NUL terminated string, no NUL found");

				result = StringRef(startIt, nulIt - startIt);
				dataIt = (const uint8_t *)nulIt + 1;
				return success();
				}

				/// Parse a section header, placing the kind of section in `sectionID` and the
				/// contents of the section in `sectionData`.
				ParseResult parseSection(uint8_t &sectionID, ArrayRef<uint8_t> &sectionData) {
				uint64_t length;
				if (parseByte(sectionID) || parseVarInt(length))
				return failure();
				mehdi_amini: You should sanity check sectionID here I think (and make the argument type the right enum in the API)

				// Parse the actual section data now that we have its length.
				return parseBytes(length, sectionData);
				}

				private:
				/// Parse a variable length encoded integer from the byte stream. This method
				/// is a fallback when the number of bytes used to encode the value is greater
				/// than 1, but less than the max (9). The provided `result` value can be
				/// assumed to already contain the first byte of the value. This method is
				/// marked noinline to avoid pessimizing the common case of single byte
				/// encoding.
				LLVM_ATTRIBUTE_NOINLINE ParseResult parseMultiByteVarInt(uint64_t &result) {
				mehdi_amini: Document noinline please.
				rriddle: I thought I did: "This method is marked noinline to avoid pessimizing the common case of single byte encoding." Tried to make it more obvious.
				// Count the number of trailing zeros in the marker byte, this indicates the
				// number of trailing bytes that are part of the value. We use `uint32_t`
				// here because we only care about the first byte, and so that we actually
				// get ctz intrinsic calls when possible (the `uint8_t` overload uses a loop
				// implementation).
				uint32_t numBytes =
				llvm::countTrailingZeros<uint32_t>(result, llvm::ZB_Undefined);

				mehdi_amini: `assert(numBytes > 0 && numBytes <= 7);` ?
				// Parse in the remaining bytes of the value.
				if (parseBytes(numBytes, reinterpret_cast<uint8_t *>(&result) + 1))
				return failure();

				// Shift out the low-order bits that were used to mark how the value was
				// encoded.
				result >>= (numBytes + 1);
				return success();
				}

				/// The current data iterator, and an iterator to the end of the buffer.
				const uint8_t *dataIt, *dataEnd;

				/// A location for the bytecode used to report errors.
				Location fileLoc;
				};
				mehdi_amini: Does this work on a big endian machine?
				rriddle: Right now, no. I need to setup the proper little -> native conversions. I'm deferring that to a follow up because I have to setup a virtual machine for big endian (which is annoying/time consuming). I also need to fix some things in the textual format related to big endian as well.
				} // namespace
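The variable-length integer scheme documented on `parseVarInt` can be sketched on its own. The version below is an assumption-laden illustration, not the patch's code: `encodeVarInt` and `decodeVarInt` are hypothetical helpers, and the decoder assembles the value with shifts instead of writing bytes into a `uint64_t` through a pointer, so its result should not depend on host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode `value` with the prefix scheme described above: N low '0' bits
// followed by a '1' in the first byte mean N additional bytes follow.
void encodeVarInt(uint64_t value, std::vector<uint8_t> &out) {
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    // `numBytes` bytes carry 7 * numBytes value bits.
    if ((value >> (7 * numBytes)) != 0)
      continue;
    uint64_t encoded = (value << numBytes) | (uint64_t(1) << (numBytes - 1));
    for (unsigned i = 0; i < numBytes; ++i)
      out.push_back(uint8_t(encoded >> (8 * i)));
    return;
  }
  // Largest case: an all-zero marker byte, then the raw 8 bytes.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(uint8_t(value >> (8 * i)));
}

// Decode one value starting at `pos`. Returns false on truncated input
// (in which case `pos` may already have advanced past the first byte).
bool decodeVarInt(const std::vector<uint8_t> &bytes, size_t &pos,
                  uint64_t &result) {
  if (pos >= bytes.size())
    return false;
  const uint8_t first = bytes[pos++];
  if (first & 1) { // Common case: the whole value fits in one byte.
    result = first >> 1;
    return true;
  }
  // Count trailing zeros of the marker byte; an all-zero marker means 8.
  unsigned numExtra = 8;
  if (first != 0) {
    numExtra = 0;
    while (!((first >> numExtra) & 1))
      ++numExtra;
  }
  if (bytes.size() - pos < numExtra)
    return false;
  // Assemble the trailing bytes in little-endian order using shifts.
  uint64_t raw = 0;
  for (unsigned i = 0; i < numExtra; ++i)
    raw |= uint64_t(bytes[pos + i]) << (8 * i);
  pos += numExtra;
  if (first == 0) {
    result = raw;
    return true;
  }
  // Re-attach the first byte, then drop the length prefix bits.
  result = ((raw << 8) | first) >> (numExtra + 1);
  return true;
}
```

Under this scheme the value 5 occupies the single byte `0x0B`, and 128 needs two bytes (`0x02 0x02`), since one byte only leaves room for 7 value bits.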

				//===----------------------------------------------------------------------===//
				// BytecodeDialect
				//===----------------------------------------------------------------------===//

				namespace {
				/// This struct represents a dialect entry within the bytecode.
				struct BytecodeDialect {
				BytecodeDialect(Dialect *dialect, StringRef name, unsigned numAttrs,
				unsigned numTypes)
				: dialect(dialect), name(name), numAttrs(numAttrs), numTypes(numTypes) {}

				/// The loaded dialect entry, if available, otherwise nullptr.
				Dialect *dialect;

				/// The name of the dialect.
				StringRef name;

				/// The number of attributes owned by this dialect in the bytecode.
				unsigned numAttrs;

				/// The number of types owned by this dialect in the bytecode.
				unsigned numTypes;
				};
				} // namespace

				//===----------------------------------------------------------------------===//
				// Attribute/Type Reader
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class provides support for reading attribute and type entries from the
				/// bytecode. Attribute and Type entries are read lazily on demand, so we use
				/// this reader to manage when to actually parse them from the bytecode.
				class AttrTypeReader {
				/// This class represents a single attribute or type entry.
				template <typename T>
				struct Entry {
				/// The entry, or null if it hasn't been resolved yet.
				T entry = {};
				/// The raw data of this entry in the bytecode.
				ArrayRef<uint8_t> data;
				};
				mehdi_amini: We should have a pointer to the BytecodeDialect here I think, should be able to set it up in initializeOffsets
				using AttrEntry = Entry<Attribute>;
				using TypeEntry = Entry<Type>;

				public:
				AttrTypeReader(Location fileLoc) : fileLoc(fileLoc) {}

				/// Initialize the attribute and type information within the reader.
				LogicalResult initialize(ArrayRef<BytecodeDialect> dialects,
				ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData);

				jpienaar: Dialect of this Attribute or Type ? (it's mostly parent that makes me think OOP more than I normally think here, up to you)
				/// Resolve the attribute or type at the given index. Returns nullptr on
				/// failure.
				Attribute resolveAttribute(unsigned index) {
				return resolveEntry(attributes, index, "Attribute");
				}
				Type resolveType(unsigned index) {
				return resolveEntry(types, index, "Type");
				}

				private:
				/// Initialize the offsets for the attribute and type entries.
				LogicalResult initializeOffsets(ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData);

				/// Resolve the given entry at `index`.
				template <typename T>
				jpienaar: Unsigned needed?
				T resolveEntry(SmallVectorImpl<Entry<T>> &entries, unsigned index,
				StringRef entryType);

				/// Parse the value defined within the given reader. `code` indicates how the
				/// entry was encoded.
				mehdi_amini: We should be able to have an enum for code here right?
				LogicalResult parseEntry(EncodingReader &reader, uint8_t code,
				Attribute &result);
				LogicalResult parseEntry(EncodingReader &reader, uint8_t code, Type &result);

				/// The set of attribute and type entries.
				SmallVector<AttrEntry> attributes;
				SmallVector<TypeEntry> types;

				/// A location used for error emission.
				Location fileLoc;
				};
				} // namespace

				LogicalResult AttrTypeReader::initialize(ArrayRef<BytecodeDialect> dialects,
				ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData) {
				// Initialize the entries using the dialect information.
				unsigned numAttrs = 0, numTypes = 0;
				for (const BytecodeDialect &dialect : dialects) {
				numAttrs += dialect.numAttrs;
				numTypes += dialect.numTypes;
				}
				attributes.resize(numAttrs);
				types.resize(numTypes);

				// With the entries initialized, we can process the offsets.
				return initializeOffsets(sectionData, offsetSectionData);
				}

				LogicalResult
				AttrTypeReader::initializeOffsets(ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData) {
				EncodingReader reader(offsetSectionData, fileLoc);
				mehdi_amini: `offsetReader`? (It was confusing to me reading the code where it is used)

				// A functor used to accumulate the offsets for the entries in the given
				// range.
				uint64_t currentOffset = 0;
				auto accumulateOffsets = [&](auto &&range) {
				for (auto &entry : range) {
				uint64_t entrySize;
				if (reader.parseVarInt(entrySize))
				return failure();
				entry.data = sectionData.slice(currentOffset, entrySize);
				currentOffset += entrySize;
				}
				return success();
				};

				// Process each of the attributes, and then the types.
				if (failed(accumulateOffsets(attributes)) || failed(accumulateOffsets(types)))
				return failure();

				// Ensure that we read everything from the section.
				if (!reader.empty()) {
				return reader.emitError(
				"unexpected trailing data in the Attribute/Type offset section");
				}
				mehdi_amini: I think you should check that currentOffset does not exceed the `sectionData.size()`, a malformed byte code could have offsets going beyond.
				return success();
				}
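The bounds check requested in the review comment above could look roughly like the following standalone sketch. It is not the patch's code: `EntryRange` and `computeEntryRanges` are hypothetical names, and plain `std::vector` stands in for `ArrayRef`. The point it illustrates is comparing each entry size against the space remaining in the section, which also avoids overflow when adding attacker-controlled sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the (offset, size) slice of one entry.
struct EntryRange {
  size_t offset;
  size_t size;
};

// Turn a list of entry sizes into ranges within a section of `sectionSize`
// bytes. Returns false if any entry would extend past the section end.
bool computeEntryRanges(const std::vector<uint64_t> &entrySizes,
                        size_t sectionSize, std::vector<EntryRange> &ranges) {
  ranges.clear();
  uint64_t currentOffset = 0;
  for (uint64_t entrySize : entrySizes) {
    // Invariant: currentOffset <= sectionSize, so the subtraction cannot
    // wrap, and we never compute currentOffset + entrySize directly.
    if (entrySize > sectionSize - currentOffset)
      return false;
    ranges.push_back(EntryRange{size_t(currentOffset), size_t(entrySize)});
    currentOffset += entrySize;
  }
  return true;
}
```

A caller could additionally require that the final offset equals `sectionSize` if trailing unreferenced bytes should also count as malformed; the sketch leaves that policy open.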

				template <typename T>
				jpienaar: attribute ?
				T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>> &entries,
				unsigned index, StringRef entryType) {
				if (index >= entries.size()) {
				emitError(fileLoc) << "invalid " << entryType << " index: " << index;
				return {};
				}

				// If the entry has already been resolved, there is nothing left to do.
				Entry<T> &entry = entries[index];
				if (entry.entry)
				return entry.entry;

				// Parse the entry. Each entry starts with a specific code that indicates how
				// it is represented.
				EncodingReader reader(entry.data, fileLoc);
				uint8_t code;
				if (reader.parseByte(code) || failed(parseEntry(reader, code, entry.entry)))
				return T();
				if (!reader.empty()) {
				(void)reader.emitError("unexpected trailing bytes after " + entryType +
				" entry");
				return T();
				}
				return entry.entry;
				}

				LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader, uint8_t code,
				Attribute &result) {
				// Handle the fallback case, where the attribute was encoded using its
				// assembly format.
				if (code == bytecode::AttrTypeCode::kAsmForm) {
				StringRef attrStr;
				if (failed(reader.parseNULTerminatedString(attrStr)))
				return failure();

				size_t numRead = 0;
				if (!(result = parseAttribute(attrStr, fileLoc->getContext(), numRead)))
				return failure();
				if (numRead != attrStr.size()) {
				return reader.emitError(
				"trailing characters found after Attribute assembly format: ",
				attrStr.drop_front(numRead));
				}
				return success();
				}

				return reader.emitError("unexpected Attribute encoding: ", code);
				}

				LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader, uint8_t code,
				Type &result) {
				// Handle the fallback case, where the type was encoded using its
				// assembly format.
				if (code == bytecode::AttrTypeCode::kAsmForm) {
				StringRef typeStr;
				if (failed(reader.parseNULTerminatedString(typeStr)))
				return failure();

				size_t numRead = 0;
				if (!(result = parseType(typeStr, fileLoc->getContext(), numRead)))
				return failure();
				if (numRead != typeStr.size()) {
				return reader.emitError(
				"trailing characters found after Type assembly format: " +
				typeStr.drop_front(numRead));
				}
				return success();
				}

				return reader.emitError("unexpected Type encoding: ", code);
				}
				mehdi_amini: I remember that Attr/Type were made "mutable" to support LLVM named struct (IIRC?), but isn't this encoding and loading scheme assuming there are no cycles? How are we gonna handle this?
				rriddle: We will likely need some form of special API that can parse just the "immutable" part (i.e. the "name" in the LLVM struct case). For example, if an attribute/type is recursive, we could encode both its immutable and mutable encodings in one entry (with some header that has the size of the immutable part or something). Something like:
				  RecursiveEntry {
				    immutableEncodingSize: varint,
				    immutableEncoding: ...,
				    mutableEncoding: ...
				  }
				During processing we could first process the immutable entry, and then immediately process the mutable one. That way any recursive references would resolve properly, and then we'd fix the final reference afterwards. Something like:
				  // Parse the immutable first, so that we have something to give recursive references.
				  if (!(result = parseImmutable()))
				    return failure();
				  // Parse the mutable afterwards. Pass in `result` so that it can populate the mutable bits?
				  if (failed(parseMutable(result)))
				    return failure();
				Until we figure any of this out though, I'm just going to have them always use the string fallback for those attributes and types.

				//===----------------------------------------------------------------------===//
				// Bytecode Reader
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class is used to read a bytecode buffer and translate it into MLIR.
				class BytecodeReader {
				public:
				BytecodeReader(Location fileLoc, const ParserConfig &config)
				: config(config), fileLoc(fileLoc), attrTypeReader(fileLoc),
				forwardRefOpState(UnknownLoc::get(config.getContext()),
				"builtin.unrealized_conversion_cast", ValueRange(),
				NoneType::get(config.getContext())) {}

				/// Read the bytecode defined within `buffer` into the given block.
				LogicalResult read(llvm::MemoryBufferRef buffer, Block *block);

				private:
				/// Return the context for this config.
				MLIRContext *getContext() const { return config.getContext(); }

				//===--------------------------------------------------------------------===//
				// Dialect Section

				LogicalResult parseDialectSection(ArrayRef<uint8_t> sectionData);

				/// Parse an operation name reference using the given reader.
				FailureOr<OperationName> parseOpName(EncodingReader &reader);

				//===--------------------------------------------------------------------===//
				// Attribute/Type Section

				/// Parse an attribute or type using the given reader. Returns nullptr in the
				jpienaar: Comment?
				/// case of failure.
				Attribute parseAttribute(EncodingReader &reader);
				Type parseType(EncodingReader &reader);

				template <typename T>
				T parseAttribute(EncodingReader &reader) {
				if (Attribute attr = parseAttribute(reader)) {
				if (auto derivedAttr = attr.dyn_cast<T>())
				return derivedAttr;
				(void)reader.emitError("expected attribute of type: ",
				llvm::getTypeName<T>(), ", but got: ", attr);
				}
				return T();
				}

				//===--------------------------------------------------------------------===//
				// TopLevelOp Section

				LogicalResult parseTopLevelOpSection(ArrayRef<uint8_t> sectionData,
				Block *block);
				LogicalResult parseOp(EncodingReader &reader, Block *block,
				ArrayRef<Block *> regionBlocks, LocationAttr &lastLoc);
				LogicalResult parseRegion(EncodingReader &reader, Region *region,
				LocationAttr &lastLoc);
				LogicalResult parseBlock(EncodingReader &reader, Block *block,
				ArrayRef<Block *> regionBlocks,
				LocationAttr &lastLoc);
				LogicalResult parseBlockArguments(EncodingReader &reader, Block *block,
				LocationAttr &lastLoc);

				//===--------------------------------------------------------------------===//
				// Value Processing

				/// Parse an operand reference using the given reader. Returns nullptr in the
				/// case of failure.
				Value parseOperand(EncodingReader &reader);

				/// Sequentially define the given value range starting at the provided first
				/// value ID.
				LogicalResult defineValues(EncodingReader &reader, ValueRange values,
				unsigned firstValueID);

				/// Create a value to use for a forward reference.
				Value createForwardRef();

				//===--------------------------------------------------------------------===//
				// Fields

				/// The configuration of the parser.
				const ParserConfig &config;

				/// A location to use when emitting errors.
				Location fileLoc;

				/// The reader used to process attribute and types within the bytecode.
				AttrTypeReader attrTypeReader;

				/// The version of the bytecode being read.
				uint64_t version = 0;

				/// The table of IR units referenced within the bytecode file.
				SmallVector<BytecodeDialect> dialects;
				SmallVector<OperationName> opNames;

				/// The current set of available IR values.
				std::vector<Value> values;
				/// A block containing the set of operations defined to create forward
				/// references.
				Block forwardRefOps;
				/// A block containing previously created, and no longer used, forward
				/// reference operations.
				Block openForwardRefOps;
				/// An operation state used when instantiating forward references.
				OperationState forwardRefOpState;
				};
				} // namespace

				LogicalResult BytecodeReader::read(llvm::MemoryBufferRef buffer, Block *block) {
				EncodingReader reader(buffer.getBuffer(), fileLoc);

				// Skip over the bytecode header, this should have already been checked.
				if (reader.skipBytes(StringRef("ML\xefR").size()))
				return failure();

				// Parse the bytecode version.
				if (reader.parseVarInt(version))
				return failure();

				jpienaar: So this parses the string starting at front of reader? And null-terminated?
				rriddle: Yeah, the front of the reader has an index to a string defined in the string section. Updated the comment.
				// Validate the bytecode version.
				if (version < bytecode::kVersion) {
				return reader.emitError(
				"bytecode version ", version, " is older than the current version of ",
				bytecode::kVersion, ", and upgrade is not supported.");
				}
				if (version > bytecode::kVersion) {
				return reader.emitError("bytecode version ", version,
				" is newer than the current version ",
				bytecode::kVersion, ".");
				}

				// The raw data for the AttrTypeOffset section.
				Optional<ArrayRef<uint8_t>> attrTypeOffsetSection;

				BitVector seenSections(bytecode::Section::kNumSections);
				while (!reader.empty()) {
				// Read the next section from the bytecode.
				uint8_t code;
				if (reader.parseByte(code) || code != bytecode::BuiltinCode::kSection)
				return reader.emitError("expected top-level section code");
				uint8_t sectionID;
				ArrayRef<uint8_t> sectionData;
				if (reader.parseSection(sectionID, sectionData))
				return failure();

				// Check for duplicate sections, we only expect one instance of each.
				if (seenSections.test(sectionID))
				return reader.emitError("duplicate top-level section ID: ", sectionID);
				seenSections.set(sectionID);

				// Process the section.
				switch (sectionID) {
				case bytecode::Section::kDialect:
				if (failed(parseDialectSection(sectionData)))
				return failure();
				break;
				case bytecode::Section::kAttrType:
				if (!attrTypeOffsetSection) {
				return reader.emitError(
				"expected the AttrTypeOffset section before the AttrType section");
				}
				if (dialects.empty()) {
				return reader.emitError("expected the Dialect section before the "
				"AttrTypeOffset section");
				}

				// With everything ready, initialize the attribute/type reader.
				if (failed(attrTypeReader.initialize(dialects, sectionData,
				*attrTypeOffsetSection)))
				return failure();
				break;
				case bytecode::Section::kAttrTypeOffset:
				// We won't parse this section until we process the main AttrType section.
				// For now, just record the raw data.
				attrTypeOffsetSection = sectionData;
				break;
				case bytecode::Section::kTopLevelOp:
				if (failed(parseTopLevelOpSection(sectionData, block)))
				return failure();
				break;
				default:
				return reader.emitError("unexpected top-level section: ", sectionID);
				mehdi_amini: This should move to `parseSection` I think. (`sectionID` is used in test/set already, seems unsafe)
				}
				}
				return success();
				}

				//===----------------------------------------------------------------------===//
				// Dialect Section

				LogicalResult
				BytecodeReader::parseDialectSection(ArrayRef<uint8_t> sectionData) {
				MLIRContext *ctx = getContext();

				EncodingReader sectionReader(sectionData, fileLoc);
				while (!sectionReader.empty()) {
				// Read the name of the next dialect.
				StringRef dialectName;
				if (sectionReader.parseNULTerminatedString(dialectName))
				return failure();

				// Parse the attribute and type counts.
				uint64_t attrCount, typeCount;
				if (sectionReader.parseVarInt(attrCount) ||
				sectionReader.parseVarInt(typeCount))
				return failure();

				// Try to load the dialect.
				Dialect *dialect = ctx->getOrLoadDialect(dialectName);
				mehdi_amini: Why aren't dialects lazy loaded?
				if (!dialect && !ctx->allowsUnregisteredDialects()) {
				return sectionReader.emitError(
				"dialect '", dialectName,
				"' is unknown. If this is intended, please call "
				"allowUnregisteredDialects() on the MLIRContext, or use "
				"-allow-unregistered-dialect with the MLIR tool used.");
				}
				dialects.emplace_back(dialect, dialectName, attrCount, typeCount);

				// Parse the operation names of the dialect.
				uint64_t numOpNames;
				if (sectionReader.parseVarInt(numOpNames))
				return failure();
				SmallString<32> opNameStorage({dialectName, "."});
				while (numOpNames--) {
				StringRef opName;
				if (sectionReader.parseNULTerminatedString(opName))
				return failure();

				opNameStorage.resize(dialectName.size() + 1);
				opNameStorage.append(opName);
				opNames.push_back(OperationName(opNameStorage, ctx));
				}
				mehdi_amini: I was thinking: could we have a string pool top-level section and refer to strings everywhere with an ID there? Mnemonics shared between ops/attributes/types and across dialects would be stored once and for all.
				jpienaar: So the encoding would be start and end offsets into a string table?
				mehdi_amini: With strings being null terminated, you don't necessarily need the end offsets. But if we have an offset section separate from the string table, we just need to point to an entry number, the same mechanism as an attr/type reference.
				jpienaar: Indeed, null termination means we can't have substrings referenced (not sure if that is common here; could think of error strings, but unsure about decoding cost).
				}
				return success();
				}

				FailureOr<OperationName> BytecodeReader::parseOpName(EncodingReader &reader) {
				uint64_t opNameIdx;
				if (reader.parseVarInt(opNameIdx))
				return failure();

				if (opNameIdx >= opNames.size())
				return reader.emitError("invalid operation name index: ", opNameIdx);
				return opNames[opNameIdx];
				}

				//===----------------------------------------------------------------------===//
				// Attribute/Type Section

				Attribute BytecodeReader::parseAttribute(EncodingReader &reader) {
				uint64_t attrIdx;
				if (reader.parseVarInt(attrIdx))
				return Attribute();
				return attrTypeReader.resolveAttribute(attrIdx);
				}

				Type BytecodeReader::parseType(EncodingReader &reader) {
				uint64_t typeIdx;
				if (reader.parseVarInt(typeIdx))
				return Type();
				return attrTypeReader.resolveType(typeIdx);
				}

				//===----------------------------------------------------------------------===//
				// TopLevelOp Section

				LogicalResult
				BytecodeReader::parseTopLevelOpSection(ArrayRef<uint8_t> sectionData,
				Block *block) {
				EncodingReader reader(sectionData, fileLoc);

				LocationAttr lastLoc;
				if (failed(parseOp(reader, block, /*regionBlocks=*/llvm::None, lastLoc)))
				return failure();
				if (!forwardRefOps.empty())
				return reader.emitError(
				"not all forward operand references were resolved");
				return success();
				}

				LogicalResult BytecodeReader::parseOp(EncodingReader &reader, Block *block,
				ArrayRef<Block *> regionBlocks,
				LocationAttr &lastLoc) {
				// Parse the name of the operation.
				FailureOr<OperationName> opName = parseOpName(reader);
				if (failed(opName))
				return failure();

				// Parse the operation mask, which indicates which components of the operation
				// are present.
				uint8_t opMask;
				if (reader.parseByte(opMask))
				return failure();

				/// Check to see if this op has a new location.
				if (opMask & bytecode::OpEncodingMask::kHasLoc) {
				if (!(lastLoc = parseAttribute<LocationAttr>(reader)))
				return failure();
				}

				// With the location and name resolved, we can start building the operation
				// state.
				OperationState opState(lastLoc, *opName);

				// Parse the attributes of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasAttrs) {
				DictionaryAttr dictAttr = parseAttribute<DictionaryAttr>(reader);
				if (!dictAttr)
				return failure();
				opState.attributes = dictAttr;
				}

				/// Parse the results of the operation.
				Optional<uint64_t> firstResultID;
				if (opMask & bytecode::OpEncodingMask::kHasResults) {
				firstResultID.emplace(0);
				if (reader.parseVarInt(*firstResultID))
				return failure();

				// Parse the result types.
				uint64_t numResults;
				if (reader.parseVarInt(numResults))
				return failure();
				opState.types.resize(numResults);
				for (int i = 0, e = numResults; i < e; ++i)
				if (!(opState.types[i] = parseType(reader)))
				return failure();
				}

				/// Parse the operands of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasOperands) {
				uint64_t numOperands;
				if (reader.parseVarInt(numOperands))
				return failure();
				opState.operands.resize(numOperands);
				for (int i = 0, e = numOperands; i < e; ++i)
				if (!(opState.operands[i] = parseOperand(reader)))
				return failure();
				}

				/// Parse the successors of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasSuccessors) {
				uint64_t numSuccs;
				if (reader.parseVarInt(numSuccs))
				return failure();
				opState.successors.reserve(numSuccs);
				for (int i = 0, e = numSuccs; i < e; ++i) {
				uint64_t succID;
				if (reader.parseVarInt(succID))
				return failure();
				if (succID >= regionBlocks.size())
				return reader.emitError("invalid successor index: ", succID);
				opState.successors.push_back(regionBlocks[succID]);
				}
				}

				/// Parse the regions of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasInlineRegions) {
				uint64_t numRegions;
				if (reader.parseVarInt(numRegions))
				return failure();
				opState.regions.reserve(numRegions);
				for (int i = 0, e = numRegions; i < e; ++i) {
				opState.regions.push_back(std::make_unique<Region>());
				if (failed(parseRegion(reader, &*opState.regions.back(), lastLoc)))
				mehdi_amini: Please break the recursion :)
				return failure();
				}
				}

				// Create the operation.
				Operation *op = Operation::create(opState);
				mehdi_amini: Not sure where to attach this comment, but there is something missing somewhere (unless I missed it?) to ensure that use-list ordering is preserved.
				rriddle: Deferring this to a follow-up to help simplify this patch; added a TODO for now.
				block->push_back(op);

				// If the operation had results, update the value references.
				if (firstResultID)
				return defineValues(reader, op->getResults(), *firstResultID);
				return LogicalResult::success();
				}

				LogicalResult BytecodeReader::parseRegion(EncodingReader &reader,
				Region *region,
				LocationAttr &lastLoc) {
				// Read the code defining how this region was encoded.
				uint8_t regionCode;
				if (reader.parseByte(regionCode))
				return failure();

				// If it's an empty region, there is nothing more to do.
				if (regionCode == bytecode::TopLevelOpCode::kRegionEmpty)
				return success();

				// Otherwise, we need to parse the region body.
				if (regionCode != bytecode::TopLevelOpCode::kRegion)
				return reader.emitError("invalid region code: ", regionCode);

				// Parse the number of blocks and values in this region.
				uint64_t numBlocks, numValues;
				if (reader.parseVarInt(numBlocks) || reader.parseVarInt(numValues))
				return failure();

				// Reserve enough values for those defined in this region. Make sure to reset
				// the size of the value table after processing though.
				size_t origNumValues = values.size();
				auto atExit = llvm::make_scope_exit([&]() { values.resize(origNumValues); });
				values.resize(values.size() + numValues);

				// Create the blocks within this region. We do this before processing so that
				// we can rely on the blocks existing when creating operations.
				SmallVector<Block *> regionBlocks;
				regionBlocks.reserve(numBlocks);
				for (uint64_t i = 0; i < numBlocks; ++i) {
				regionBlocks.push_back(new Block());
				region->push_back(regionBlocks.back());
				}

				for (uint64_t i = 0; i < numBlocks; ++i)
				if (failed(parseBlock(reader, regionBlocks[i], regionBlocks, lastLoc)))
				return failure();
				return success();
				}

				LogicalResult BytecodeReader::parseBlock(EncodingReader &reader, Block *block,
				ArrayRef<Block *> regionBlocks,
				LocationAttr &lastLoc) {
				// Parse the first code of the block explicitly in case the block has
				// arguments.
				uint8_t blockCode = 0;
				if (reader.parseByte(blockCode))
				return failure();

				// Check for arguments to the block.
				if (blockCode == bytecode::TopLevelOpCode::kBlockArguments) {
				if (failed(parseBlockArguments(reader, block, lastLoc)))
				return failure();

				// Parse the next block code.
				if (reader.parseByte(blockCode))
				return failure();
				}

				while (blockCode != bytecode::TopLevelOpCode::kBlockEnd) {
				// Parse an operation within the block.
				if (blockCode == bytecode::TopLevelOpCode::kOp) {
				if (failed(parseOp(reader, block, regionBlocks, lastLoc)))
				return failure();
				} else {
				return reader.emitError("unknown block code: ", blockCode);
				}

				// Parse the next code.
				if (reader.parseByte(blockCode))
				return failure();
				}
				return success();
				}

				LogicalResult BytecodeReader::parseBlockArguments(EncodingReader &reader,
				Block *block,
				LocationAttr &lastLoc) {
				// Parse the value ID for the first argument, and the number of arguments.
				uint64_t firstArgID, numArgs;
				if (reader.parseVarInt(firstArgID) || reader.parseVarInt(numArgs))
				return failure();

				SmallVector<Type> argTypes;
				SmallVector<Location> argLocs;
				argTypes.reserve(numArgs);
				argLocs.reserve(numArgs);
				while (numArgs--) {
				uint64_t typeIdx;
				if (reader.parseVarInt(typeIdx))
				return failure();

				// Check the low bit of the type index to see if this argument has a new
				// location.
				bool hasNewLoc = (typeIdx & 1) != 0;
				typeIdx >>= 1;

				// Parse the type, and optionally the location.
				Type argType = attrTypeReader.resolveType(typeIdx);
				if (!argType)
				return failure();
				if (hasNewLoc && !(lastLoc = parseAttribute<LocationAttr>(reader)))
				return failure();

				argTypes.push_back(argType);
				argLocs.push_back(lastLoc);
				}
				block->addArguments(argTypes, argLocs);
				return defineValues(reader, block->getArguments(), firstArgID);
				}
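The low-bit trick used for block arguments (the reader checks `typeIdx & 1` and then shifts right by one; the writer emits `(typeID << 1) | flag`) can be illustrated in isolation. The following is only a minimal sketch: `packTypeIndex` and `unpackTypeIndex` are hypothetical helper names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helpers, not part of the patch.

// Pack a type index together with a "has new location" flag. The flag is
// stored in the low bit, so the index itself must fit in 63 bits.
inline uint64_t packTypeIndex(uint64_t typeIdx, bool hasNewLoc) {
  assert((typeIdx >> 63) == 0 && "type index must fit in 63 bits");
  return (typeIdx << 1) | (hasNewLoc ? 1 : 0);
}

// Recover the type index and the flag from the packed value.
inline std::pair<uint64_t, bool> unpackTypeIndex(uint64_t packed) {
  return {packed >> 1, (packed & 1) != 0};
}
```

Because small indices stay small after the shift, the packed value usually still fits in a single varint byte, which is presumably why the flag is folded into the index instead of being emitted as a separate byte.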

				//===----------------------------------------------------------------------===//
				// Value Processing

				Value BytecodeReader::parseOperand(EncodingReader &reader) {
				uint64_t valueIdx;
				if (failed(reader.parseVarInt(valueIdx)))
				return nullptr;
				if (valueIdx >= values.size())
				return (void)reader.emitError("invalid value index: ", valueIdx), Value();

				// Resolve it, or create a new forward reference if necessary.
				Value &value = values[valueIdx];
				if (!value)
				value = createForwardRef();
				return value;
				}

				LogicalResult BytecodeReader::defineValues(EncodingReader &reader,
				ValueRange newValues,
				unsigned firstValueID) {
				size_t maxId = firstValueID + newValues.size();
				if (maxId > values.size()) {
				return reader.emitError(
				"value index range was outside of the expected range for "
				"the parent region, got [",
				firstValueID, ", ", maxId, "), but the maximum index was ",
				values.size() - 1);
				vitalybuka: Also a problem: emplace_back may relocate the container, but the for loop above uses readState, which is a reference to an element of the container.
				rriddle: This should be fine, given that we always return in this case (i.e. we never touch the invalid reference again).
				vitalybuka: Thanks, I see.
				}

				// Assign the values and update any forward references.
				for (unsigned i = 0, e = newValues.size(); i != e; ++i) {
				Value newValue = newValues[i];

				// Check to see if a definition for this value already exists.
				if (Value oldValue = std::exchange(values[firstValueID + i], newValue)) {
				Operation *forwardRefOp = oldValue.getDefiningOp();
				if (!forwardRefOp || forwardRefOp->getBlock() != &forwardRefOps) {
				return reader.emitError("value index ", firstValueID + i,
				" was already defined");
				}

				oldValue.replaceAllUsesWith(newValue);
				forwardRefOp->moveBefore(&openForwardRefOps, openForwardRefOps.end());
				}
				}
				return LogicalResult::success();
				}

				Value BytecodeReader::createForwardRef() {
				// Check for an available existing operation to use. Otherwise, create a new
				// fake operation to use for the reference.
				vitalybuka: This does pop_back and then reads readState.isIsolatedFromAbove, which comes from the regionStack?
				rriddle: Thanks for catching this. I'm not sure why my local asan build didn't catch this (I'll try nuking and resetting it).
				vitalybuka: Re "I'm not sure why my local asan build didn't catch this": probably you don't use libc++, or not an instrumented libc++? If you can fix it quickly, go for it. If not, please let me know; I have a patch to revert it with related fixes.
				if (!openForwardRefOps.empty()) {
				Operation *op = &openForwardRefOps.back();
				op->moveBefore(&forwardRefOps, forwardRefOps.end());
				} else {
				forwardRefOps.push_back(Operation::create(forwardRefOpState));
				}
				return forwardRefOps.back().getResult(0);
				}

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				bool mlir::isBytecode(llvm::MemoryBufferRef buffer) {
				return buffer.getBuffer().startswith("ML\xefR");
				}
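For readers without the LLVM headers at hand, the magic-number check in `isBytecode` can be approximated with the standard library. This is a sketch only: `looksLikeMlirBytecode` is a hypothetical name, and it uses `std::string_view` (C++17) in place of `llvm::MemoryBufferRef`.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for mlir::isBytecode: a buffer counts as bytecode
// when it starts with the four bytes 'M', 'L', 0xEF, 'R'.
inline bool looksLikeMlirBytecode(std::string_view buffer) {
  constexpr std::string_view magic = "ML\xefR";
  // substr clamps to the buffer size, so short buffers simply compare unequal.
  return buffer.substr(0, magic.size()) == magic;
}
```

The 0xEF byte is not printable ASCII, so a textual `.mlir` file is unlikely to begin with this sequence by accident.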

				LogicalResult mlir::readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
				const ParserConfig &config) {
				Location sourceFileLoc =
				FileLineColLoc::get(config.getContext(), buffer.getBufferIdentifier(),
				/*line=*/0, /*column=*/0);
				if (!isBytecode(buffer)) {
				return emitError(sourceFileLoc,
				"input buffer is not an MLIR bytecode file");
				}

				Block parsedBlock;
				BytecodeReader reader(sourceFileLoc, config);
				if (failed(reader.read(buffer, &parsedBlock)))
				return failure();

				// Splice the parsed operations over to the provided top-level block.
				auto &parsedOps = parsedBlock.getOperations();
				auto &destOps = block->getOperations();
				destOps.splice(destOps.empty() ? destOps.end() : std::prev(destOps.end()),
				parsedOps, parsedOps.begin(), parsedOps.end());
				return success();
				}

mlir/lib/Bytecode/Reader/CMakeLists.txt

This file was added.

				add_mlir_library(MLIRBytecodeReader
				BytecodeReader.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode

				LINK_LIBS PUBLIC
				MLIRAsmParser
				MLIRIR
				MLIRSupport
				)

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp

This file was added.

				//===- BytecodeWriter.cpp - MLIR Bytecode Writer --------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Bytecode/BytecodeWriter.h"
				#include "../Encoding.h"
				#include "IRNumbering.h"
				#include "mlir/IR/BuiltinDialect.h"
				#include "mlir/IR/OpImplementation.h"
				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/Support/Debug.h"
				#include <random>

				#define DEBUG_TYPE "mlir-bytecode"

				using namespace mlir;
				using namespace mlir::bytecode::detail;

				//===----------------------------------------------------------------------===//
				// BytecodeWriterConfig
				//===----------------------------------------------------------------------===//

				struct BytecodeWriterConfig::Impl {
				explicit Impl(Operation *op) : rootOp(op) {}

				/// The root operation of the bytecode.
				Operation *rootOp;
				};

				BytecodeWriterConfig::BytecodeWriterConfig(Operation *op)
				: impl(std::make_unique<Impl>(op)) {}
				BytecodeWriterConfig::~BytecodeWriterConfig() = default;

				Operation *BytecodeWriterConfig::getRootOp() const { return impl->rootOp; }

				//===----------------------------------------------------------------------===//
				// EncodingEmitter
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class functions as the underlying encoding emitter for the bytecode
				/// writer. This class is a bit different compared to other types of encoders;
				/// it does not use a single buffer, but instead may contain several buffers
				/// (some owned by the writer, and some not) that get concatted during the final
				/// emission.
				class EncodingEmitter {
				public:
				EncodingEmitter() = default;
				EncodingEmitter(const EncodingEmitter &) = delete;
				EncodingEmitter &operator=(const EncodingEmitter &) = delete;

				/// Write the current contents to the provided stream.
				void writeTo(raw_ostream &os) const;

				/// Return the current size of the encoded buffer.
				size_t size() const { return prevResultSize + currentResult.size(); }

				//===--------------------------------------------------------------------===//
				// Emission
				//===--------------------------------------------------------------------===//

				/// Return a raw pointer into the result buffer at the specified offset.
				uint8_t *getRawPointer(uint64_t offset) {
				assert(offset < size() && offset >= prevResultSize &&
				"cannot get pointer to previously emitted data");
				return currentResult.data() + (offset - prevResultSize);
				}

				//===--------------------------------------------------------------------===//
				// Integer Emission

				/// Emit a single byte.
				void emitByte(uint8_t byte) { currentResult.push_back(byte); }

				/// Emit a range of bytes.
				void emitBytes(ArrayRef<uint8_t> bytes) {
				llvm::append_range(currentResult, bytes);
				}

				/// Emit a variable length integer. The first encoded byte contains a prefix
				/// in the low bits indicating the encoded length of the value. This length
				/// prefix is a bit sequence of '0's followed by a '1'. The number of '0' bits
				/// indicate the number of _additional_ bytes (not including the prefix byte).
				/// All remaining bits in the first byte, along with all of the bits in
				/// additional bytes, provide the value of the integer encoded in
				/// little-endian order.
				void emitVarInt(uint64_t value) {
				// In the most common case, the value can be represented in a single byte.
				// Given how hot this case is, explicitly handle that here.
				if ((value >> 7) == 0)
				return emitByte((value << 1) | 0x1);
				emitMultiByteVarInt(value);
				}

				//===--------------------------------------------------------------------===//
				// String Emission

				/// Emit the given string as a nul terminated string.
				void emitNulTerminatedString(StringRef str) {
				emitString(str);
				emitByte(0);
				}

				/// Emit the given string without a nul terminator.
				void emitString(StringRef str) {
				emitBytes({reinterpret_cast<const uint8_t *>(str.data()), str.size()});
				}

				//===--------------------------------------------------------------------===//
				// Section Emission

				/// Emit a nested section of the given code, whose contents are encoded in the
				/// provided emitter.
				void emitSection(bytecode::Section::ID code, EncodingEmitter &&emitter) {
				emitByte(bytecode::BuiltinCode::kSection);

				// Emit the section code and length.
				emitByte(code);
				emitVarInt(emitter.size());

				// Push our current buffer and then merge the provided section body into
				// ours.
				appendResult(std::move(currentResult));
				for (std::vector<uint8_t> &result : emitter.prevResultStorage)
				appendResult(std::move(result));
				appendResult(std::move(emitter.currentResult));
				}

				private:
				/// Emit the given value using a variable width encoding. This method is a
				/// fallback when the number of bytes needed to encode the value is greater
				/// than 1. We mark it noinline here so that the single byte hot path isn't
				/// pessimized.
				LLVM_ATTRIBUTE_NOINLINE void emitMultiByteVarInt(uint64_t value);

				/// Append a new result buffer to the current contents.
				void appendResult(std::vector<uint8_t> &&result) {
				prevResultSize += result.size();
				prevResultStorage.emplace_back(std::move(result));
				prevResultList.emplace_back(prevResultStorage.back());
				}

				/// The result of the emitter currently being built. We refrain from building
				/// a single buffer to simplify emitting sections, large data, and more. The
				/// result is thus represented using multiple distinct buffers, some of which
				/// we own (via prevResultStorage), and some of which are just pointers into
				/// externally owned buffers.
				std::vector<uint8_t> currentResult;
				std::vector<ArrayRef<uint8_t>> prevResultList;
				std::vector<std::vector<uint8_t>> prevResultStorage;

				/// An up-to-date total size of all of the buffers within `prevResultList`.
				/// This enables O(1) size checks of the current encoding.
				size_t prevResultSize = 0;
				};

				/// A simple raw_ostream wrapper around a EncodingEmitter. This removes the need
				/// to go through an intermediate buffer when interacting with code that wants a
				/// raw_ostream.
				class raw_emitter_ostream : public raw_ostream {
				public:
				explicit raw_emitter_ostream(EncodingEmitter &emitter) : emitter(emitter) {
				SetUnbuffered();
				}

				private:
				void write_impl(const char *ptr, size_t size) override {
				emitter.emitBytes({reinterpret_cast<const uint8_t *>(ptr), size});
				}
				uint64_t current_pos() const override { return emitter.size(); }

				/// The section being emitted to.
				EncodingEmitter &emitter;
				};
				} // namespace

				void EncodingEmitter::writeTo(raw_ostream &os) const {
				for (auto &prevResult : prevResultList)
				os.write((const char *)prevResult.data(), prevResult.size());
				os.write((const char *)currentResult.data(), currentResult.size());
				}
				RVP: Is this parenthesized correctly?
				jpienaar: This is checking if the value post shift is 0 (and relies on this function being called only when multi byte). What issue did you run into with this?
				RVP: Shouldn't `LLVM_LIKELY` be around the whole condition instead of the shift expression? Isn't `== 0` the likely case and not the shift result being non-zero?
				RVP: I didn't see any issues. I was looking at the code and this question popped up. I now see that `emitVarInt` specially handles the common case `(... >> 7) == 0`. Maybe a comment here as well would have avoided the question. Thanks.

				void EncodingEmitter::emitMultiByteVarInt(uint64_t value) {
				// Compute the number of bytes needed to encode the value. Each byte can hold
				// up to 7-bits of data. We only check up to the number of bits we can encode
				// in the first byte (8).
				uint64_t it = value >> 7;
				for (size_t numBytes = 2; numBytes < 9; ++numBytes) {
				if (LLVM_LIKELY(it >>= 7) == 0) {
				uint64_t encodedValue = (value << 1) | 0x1;
				encodedValue <<= (numBytes - 1);
				emitBytes({reinterpret_cast<uint8_t *>(&encodedValue), numBytes});
				return;
				}
				}

				// If the value is too large to encode with a length prefix (it needs more
				// than 56 bits), emit a special all zero marker byte and splat the value
				// directly.
				emitByte(0);
				emitBytes({reinterpret_cast<uint8_t *>(&value), sizeof(value)});
				}
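The variable-length integer scheme described in the `emitVarInt` comment can be exercised outside of MLIR. The following is a minimal standalone sketch, not the patch's implementation: `encodeVarInt` and `decodeVarInt` are hypothetical names, `std::vector` stands in for the emitter and reader classes, and bytes are written out one at a time so the result does not depend on host endianness (the patch instead reinterprets the integer's storage, which assumes a little-endian host).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not part of the patch.

// Encode `value` with a low-bit length prefix: (numBytes - 1) zero bits
// followed by a one bit, then the value bits, in little-endian byte order.
inline void encodeVarInt(uint64_t value, std::vector<uint8_t> &out) {
  // N total bytes carry 7 * N value bits, for N in [1, 8].
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    if ((value >> (7 * numBytes)) == 0) {
      uint64_t encoded = ((value << 1) | 0x1) << (numBytes - 1);
      for (unsigned i = 0; i < numBytes; ++i)
        out.push_back(static_cast<uint8_t>(encoded >> (8 * i)));
      return;
    }
  }
  // More than 56 bits: an all-zero prefix byte followed by the raw 8 bytes.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Decode one integer starting at `pos`. Returns true on success and advances
// `pos` past the consumed bytes; returns false if the input is truncated.
inline bool decodeVarInt(const std::vector<uint8_t> &in, size_t &pos,
                         uint64_t &value) {
  if (pos >= in.size())
    return false;
  uint8_t first = in[pos];
  if (first == 0) {
    // Marker byte: the next 8 bytes hold the value directly.
    if (in.size() - pos < 9)
      return false;
    value = 0;
    for (unsigned i = 0; i < 8; ++i)
      value |= static_cast<uint64_t>(in[pos + 1 + i]) << (8 * i);
    pos += 9;
    return true;
  }
  // The number of trailing zero bits gives the number of additional bytes.
  unsigned numBytes = 1;
  while ((first & 1) == 0) {
    first >>= 1;
    ++numBytes;
  }
  if (in.size() - pos < numBytes)
    return false;
  uint64_t encoded = 0;
  for (unsigned i = 0; i < numBytes; ++i)
    encoded |= static_cast<uint64_t>(in[pos + i]) << (8 * i);
  value = encoded >> numBytes; // Drop the numBytes-bit prefix.
  pos += numBytes;
  return true;
}
```

Under this sketch, 5 encodes to the single byte `0x0B`, and 128 encodes to the two bytes `0x02 0x02`. Putting the length in the first byte lets a decoder learn the total size up front instead of testing a continuation bit on every byte.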

				//===----------------------------------------------------------------------===//
				// Bytecode Writer
				//===----------------------------------------------------------------------===//

				namespace {
				class BytecodeWriter {
				public:
				BytecodeWriter(const BytecodeWriterConfig &config) : numberingState(config) {}

				/// Write the bytecode for the given root operation.
				void write(Operation *rootOp, raw_ostream &os);

				private:
				//===--------------------------------------------------------------------===//
				// Dialects

				void writeDialectSection(EncodingEmitter &emitter);

				//===--------------------------------------------------------------------===//
				// Attributes and Types

				void writeAttrTypeSection(EncodingEmitter &emitter);

				//===--------------------------------------------------------------------===//
				// Operations

				void writeBlock(EncodingEmitter &emitter, Block *block, Attribute &lastLoc);
				void writeOp(EncodingEmitter &emitter, Operation *op, Attribute &lastLoc);
				void writeRegion(EncodingEmitter &emitter, Region *region,
				Attribute &lastLoc);
				void writeTopLevelOp(EncodingEmitter &emitter, Operation *op);

				//===--------------------------------------------------------------------===//
				// Fields

				/// The IR numbering state generated for the root operation.
				IRNumberingState numberingState;
				};
				} // namespace

				void BytecodeWriter::write(Operation *rootOp, raw_ostream &os) {
				EncodingEmitter emitter;

				// Emit the bytecode file header. This is how we identify the output as a
				// bytecode file.
				emitter.emitString("ML\xefR");

				// Emit the bytecode version.
				emitter.emitVarInt(bytecode::kVersion);

				// Emit the dialect section.
				writeDialectSection(emitter);

				// Emit the attributes and types section.
				writeAttrTypeSection(emitter);

				// Emit the top level operation section.
				writeTopLevelOp(emitter, rootOp);

				// Write the generated bytecode to the provided output stream.
				emitter.writeTo(os);
				}

				//===----------------------------------------------------------------------===//
				// Dialects

				void BytecodeWriter::writeDialectSection(EncodingEmitter &emitter) {
				EncodingEmitter dialectEmitter;

				// Emit the referenced dialects.
				for (DialectNumbering &dialect : numberingState.getDialects()) {
				// Emit the dialect name.
				dialectEmitter.emitNulTerminatedString(dialect.name);

				// Emit the number of attributes and types emitted for this dialect.
				dialectEmitter.emitVarInt(dialect.attributes.size());
				dialectEmitter.emitVarInt(dialect.types.size());

				// Emit the referenced operation names of this dialect.
				dialectEmitter.emitVarInt(dialect.opNames.size());
				for (OpNameNumbering *opName : dialect.opNames)
				dialectEmitter.emitNulTerminatedString(opName->name.stripDialect());
				}

				emitter.emitSection(bytecode::Section::kDialect, std::move(dialectEmitter));
				}

				//===----------------------------------------------------------------------===//
				// Attributes and Types

				void BytecodeWriter::writeAttrTypeSection(EncodingEmitter &emitter) {
				EncodingEmitter attrTypeEmitter;
				EncodingEmitter offsetEmitter;

				// A functor used to emit an attribute or type entry.
				uint64_t prevOffset = 0;
				auto emitAttrOrType = [&](auto value) {
				// Emit the entry using the textual format.
				// TODO: Allow dialects to provide more optimal implementations of attribute
				// and type encodings.
				attrTypeEmitter.emitByte(bytecode::AttrTypeCode::kAsmForm);
				raw_emitter_ostream(attrTypeEmitter) << value;
				attrTypeEmitter.emitByte(0);

				// Record the offset of this entry.
				uint64_t curOffset = attrTypeEmitter.size();
				offsetEmitter.emitVarInt(curOffset - prevOffset);
				prevOffset = curOffset;
				};

				// Emit the attribute and type entries for each dialect.
				for (DialectNumbering &dialect : numberingState.getDialects())
				for (AttributeNumbering *attr : dialect.attributes)
				emitAttrOrType(attr->getValue());
				for (DialectNumbering &dialect : numberingState.getDialects())
				for (TypeNumbering *type : dialect.types)
				emitAttrOrType(type->getValue());

				// Emit the sections to the stream.
				emitter.emitSection(bytecode::Section::kAttrTypeOffset,
				std::move(offsetEmitter));
				emitter.emitSection(bytecode::Section::kAttrType, std::move(attrTypeEmitter));
				}
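The offset section written above stores, for each attribute or type entry, the distance from the end of the previous entry rather than an absolute position. A minimal standalone sketch of that idea follows, using `std::vector` in place of `EncodingEmitter`. The helper names `encodeOffsetDeltas` and `decodeEndOffsets` are hypothetical and not part of this patch, and the reader-side running sum is an assumption about how such a table could be consumed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: given the textual form of each entry, return the delta
// between consecutive end offsets, mirroring `curOffset - prevOffset` above.
std::vector<uint64_t>
encodeOffsetDeltas(const std::vector<std::string> &entries) {
  std::vector<uint64_t> deltas;
  uint64_t prevOffset = 0, curOffset = 0;
  for (const std::string &entry : entries) {
    // One code byte + the text + one NUL terminator, as in emitAttrOrType.
    curOffset += 1 + entry.size() + 1;
    deltas.push_back(curOffset - prevOffset);
    prevOffset = curOffset;
  }
  return deltas;
}

// Hypothetical helper: recover absolute end offsets with a running sum.
std::vector<uint64_t> decodeEndOffsets(const std::vector<uint64_t> &deltas) {
  std::vector<uint64_t> ends;
  uint64_t offset = 0;
  for (uint64_t delta : deltas) {
    assert(delta > 0 && "every entry occupies at least one byte");
    offset += delta;
    ends.push_back(offset);
  }
  return ends;
}
```

Storing deltas keeps each table entry small when the integers are written with a variable-length encoding, since an entry's size is usually much smaller than its absolute position in the section.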

				//===----------------------------------------------------------------------===//
				// Operations
				//===----------------------------------------------------------------------===//

				void BytecodeWriter::writeBlock(EncodingEmitter &emitter, Block *block,
				Attribute &lastLoc) {
				// Emit the arguments of the block.
				ArrayRef<BlockArgument> args = block->getArguments();
				if (!args.empty()) {
				emitter.emitByte(bytecode::TopLevelOpCode::kBlockArguments);

				// Emit the value number for the first argument, and the number of arguments
				// we are encoding.
				emitter.emitVarInt(numberingState.getNumber(args.front()));
				emitter.emitVarInt(args.size());

				for (const auto &it : llvm::enumerate(args)) {
				// Check to see if this argument has a new location.
				Attribute argLoc = it.value().getLoc();
				bool argHasNewLoc = argLoc != std::exchange(lastLoc, argLoc);

				// Emit the argument type. We use the low bit of the type number to
				// indicate if the argument changed locations.
				uint64_t typeID = numberingState.getNumber(it.value().getType());
				emitter.emitVarInt((typeID << 1) | (argHasNewLoc ? 1 : 0));
				if (argHasNewLoc)
				emitter.emitVarInt(numberingState.getNumber(argLoc));
				}
				}

				// Emit the operations within the block.
				for (Operation &op : *block) {
				emitter.emitByte(bytecode::TopLevelOpCode::kOp);
				writeOp(emitter, &op, lastLoc);
				}
				// Emit a terminal code to indicate when we are finished emitting operations.
				emitter.emitByte(bytecode::TopLevelOpCode::kBlockEnd);
				}
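Two small tricks in `writeBlock` may be easier to see in isolation: the "new location" flag is packed into the low bit of the type number, and `std::exchange` compares against the previous location while updating it in one expression. The sketch below uses plain integers as stand-ins for type numbers and locations. The function names are hypothetical, and the unpacking side is an assumption about what a matching reader would do.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Pack a type number and a "has new location" flag into one integer, as in
// `(typeID << 1) | (argHasNewLoc ? 1 : 0)`.
uint64_t packTypeAndLocFlag(uint64_t typeID, bool hasNewLoc) {
  assert((typeID >> 63) == 0 && "top bit would be shifted out");
  return (typeID << 1) | (hasNewLoc ? 1 : 0);
}

// Hypothetical reader side: split the packed value back into its two parts.
std::pair<uint64_t, bool> unpackTypeAndLocFlag(uint64_t packed) {
  return {packed >> 1, (packed & 1) != 0};
}

// Returns true when `loc` differs from the last seen location, and records
// `loc` as the new last location in the same step.
bool isNewLoc(int &lastLoc, int loc) {
  return loc != std::exchange(lastLoc, loc);
}
```

With this scheme an argument that shares its location with the previously emitted entity costs no extra integer; only the flag bit is spent.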

				void BytecodeWriter::writeOp(EncodingEmitter &emitter, Operation *op,
				Attribute &lastLoc) {
				emitter.emitVarInt(numberingState.getNumber(op->getName()));

				// Emit a mask for the operation components. We need to fill this in later
				// (when we actually know what needs to be emitted), so emit a placeholder for
				// now.
				uint64_t maskOffset = emitter.size();
				uint8_t opEncodingMask = 0;
				emitter.emitByte(0);

				// Emit the location for this operation.
				Attribute opLoc = op->getLoc();
				if (opLoc != std::exchange(lastLoc, opLoc)) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasLoc;
				emitter.emitVarInt(numberingState.getNumber(opLoc));
				}

				// Emit the attributes of this operation.
				DictionaryAttr attrs = op->getAttrDictionary();
				if (!attrs.empty()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasAttrs;
				emitter.emitVarInt(numberingState.getNumber(op->getAttrDictionary()));
				}

				// Emit the result types of the operation.
				if (unsigned numResults = op->getNumResults()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasResults;
				emitter.emitVarInt(numberingState.getNumber(op->getResult(0)));
				emitter.emitVarInt(numResults);
				for (Type type : op->getResultTypes())
				emitter.emitVarInt(numberingState.getNumber(type));
				}

				// Emit the operands of the operation.
				if (unsigned numOperands = op->getNumOperands()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasOperands;
				emitter.emitVarInt(numOperands);
				for (Value operand : op->getOperands())
				emitter.emitVarInt(numberingState.getNumber(operand));
				}

				// Emit the successors of the operation.
				if (unsigned numSuccessors = op->getNumSuccessors()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasSuccessors;
				emitter.emitVarInt(numSuccessors);
				for (Block *successor : op->getSuccessors())
				emitter.emitVarInt(numberingState.getNumber(successor));
				}

				// Check for regions.
				unsigned numRegions = op->getNumRegions();
				if (numRegions)
				opEncodingMask |= bytecode::OpEncodingMask::kHasInlineRegions;

				// Update the mask for the operation.
				*emitter.getRawPointer(maskOffset) = opEncodingMask;

				// With the mask emitted, we can now emit the regions of the operation. We do
				// this after mask emission to avoid offset complications that may arise by
				// emitting the regions first (e.g. if the regions are huge, backpatching the
				// op encoding mask is more annoying).
				if (numRegions) {
				emitter.emitVarInt(numRegions);
				for (Region &region : op->getRegions())
				writeRegion(emitter, &region, lastLoc);
				}
				}
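`writeOp` emits a placeholder mask byte first and overwrites it once it knows which components were written. A rough standalone sketch of that backpatching pattern is below. The flag values and the `encodeWithMask` function are made up for illustration (the real flags come from `bytecode::OpEncodingMask`, which is not shown in this part of the diff), and single bytes stand in for the varints used by the writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag values for illustration only.
enum : uint8_t { kHasAttrs = 0x01, kHasResults = 0x02, kHasOperands = 0x04 };

// Encode a toy "operation" into a byte buffer: a mask byte followed by only
// the components that are present.
std::vector<uint8_t> encodeWithMask(bool hasAttrs, uint8_t numResults,
                                    uint8_t numOperands) {
  std::vector<uint8_t> buffer;

  // Reserve the mask byte and remember where it lives.
  std::size_t maskOffset = buffer.size();
  buffer.push_back(0);
  uint8_t mask = 0;

  if (hasAttrs) {
    mask |= kHasAttrs;
    buffer.push_back(0xAA); // stand-in for an attribute dictionary number
  }
  if (numResults) {
    mask |= kHasResults;
    buffer.push_back(numResults);
  }
  if (numOperands) {
    mask |= kHasOperands;
    buffer.push_back(numOperands);
  }

  // Backpatch the placeholder now that the mask is known.
  assert(buffer[maskOffset] == 0 && "placeholder was overwritten");
  buffer[maskOffset] = mask;
  return buffer;
}
```

Because the mask occupies a fixed single byte, patching it never shifts the data emitted after it, which is presumably why the writer patches it before emitting the (potentially large) regions.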

				void BytecodeWriter::writeRegion(EncodingEmitter &emitter, Region *region,
				Attribute &lastLoc) {
				if (region->empty())
				return emitter.emitByte(bytecode::TopLevelOpCode::kRegionEmpty);

				// Emit the number of blocks and values within the region.
				unsigned numBlocks, numValues;
				std::tie(numBlocks, numValues) = numberingState.getBlockValueCount(region);
				emitter.emitByte(bytecode::TopLevelOpCode::kRegion);
				emitter.emitVarInt(numBlocks);
				emitter.emitVarInt(numValues);

				// Emit the blocks within the region.
				for (Block &block : *region)
				writeBlock(emitter, &block, lastLoc);
				}

				void BytecodeWriter::writeTopLevelOp(EncodingEmitter &emitter, Operation *op) {
				EncodingEmitter topLevelOpEmitter;

				Attribute lastLoc;
				writeOp(topLevelOpEmitter, op, lastLoc);

				emitter.emitSection(bytecode::Section::kTopLevelOp,
				std::move(topLevelOpEmitter));
				}

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				void mlir::writeBytecodeToFile(const BytecodeWriterConfig &config,
				raw_ostream &os) {
				BytecodeWriter writer(config);
				writer.write(config.getRootOp(), os);
				}

mlir/lib/Bytecode/Writer/CMakeLists.txt

This file was added.

				add_mlir_library(MLIRBytecodeWriter
				BytecodeWriter.cpp
				IRNumbering.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode

				LINK_LIBS PUBLIC
				MLIRIR
				MLIRSupport
				)

mlir/lib/Bytecode/Writer/IRNumbering.h

This file was added.

				//===- IRNumbering.h - MLIR bytecode IR numbering ---------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains various utilities that number IR structures in preparation
				// for bytecode emission.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H
				#define LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H

				#include "mlir/IR/OperationSupport.h"
				#include "llvm/ADT/MapVector.h"

				namespace mlir {
				class BytecodeWriterConfig;

				namespace bytecode {
				namespace detail {
				struct DialectNumbering;

				//===----------------------------------------------------------------------===//
				// Attribute and Type Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents a numbering entry for an Attribute or Type.
				struct AttrTypeNumbering {
				AttrTypeNumbering(PointerUnion<Attribute, Type> value) : value(value) {}

				/// The concrete value.
				PointerUnion<Attribute, Type> value;

				/// The number assigned to this value.
				unsigned number = 0;

				/// The dialect of this value.
				DialectNumbering *dialect = nullptr;
				};
				struct AttributeNumbering : public AttrTypeNumbering {
				AttributeNumbering(Attribute value) : AttrTypeNumbering(value) {}
				Attribute getValue() const { return value.get<Attribute>(); }
				};
				struct TypeNumbering : public AttrTypeNumbering {
				TypeNumbering(Type value) : AttrTypeNumbering(value) {}
				Type getValue() const { return value.get<Type>(); }
				};

				//===----------------------------------------------------------------------===//
				// OpName Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents the numbering entry of an operation name.
				struct OpNameNumbering {
				OpNameNumbering(OperationName name) : name(name) {}

				/// The concrete name.
				OperationName name;

				/// The number assigned to this name.
				unsigned number = 0;
				};

				//===----------------------------------------------------------------------===//
				// Dialect Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents a numbering entry for a Dialect.
				struct DialectNumbering {
				DialectNumbering(StringRef name, unsigned number)
				: name(name), number(number) {}

				/// The namespace of the dialect.
				StringRef name;

				/// The number assigned to the dialect.
				unsigned number;

				/// The loaded dialect, or nullptr if the dialect isn't loaded.
				Dialect *dialect = nullptr;

				/// Numbered sub-components of the dialect to be emitted.
				std::vector<OpNameNumbering *> opNames;
				std::vector<AttributeNumbering *> attributes;
				std::vector<TypeNumbering *> types;
				};

				//===----------------------------------------------------------------------===//
				// IRNumberingState
				//===----------------------------------------------------------------------===//

				/// This class manages numbering IR entities in preparation of bytecode
				/// emission.
				class IRNumberingState {
				public:
				IRNumberingState(const BytecodeWriterConfig &config);

				/// Return the numbered dialects.
				auto getDialects() {
				return llvm::make_pointee_range(llvm::make_second_range(dialects));
				}

				/// Return the number for the given IR unit.
				unsigned getNumber(Attribute attr) {
				assert(attrs.count(attr) && "attribute not numbered");
				return attrs[attr]->number;
				}
				unsigned getNumber(Block *block) {
				assert(blockIDs.count(block) && "block not numbered");
				return blockIDs[block];
				}
				unsigned getNumber(OperationName opName) {
				assert(opNames.count(opName) && "opName not numbered");
				return opNames[opName]->number;
				}
				unsigned getNumber(Type type) {
				assert(types.count(type) && "type not numbered");
				return types[type]->number;
				}
				unsigned getNumber(Value value) {
				assert(valueIDs.count(value) && "value not numbered");
				return valueIDs[value];
				}

				/// Return the block and value counts of the given region.
				std::pair<unsigned, unsigned> getBlockValueCount(Region *region) {
				assert(regionBlockValueCounts.count(region) && "region not numbered");
				return regionBlockValueCounts[region];
				}

				private:
				/// Number the given IR unit for bytecode emission.
				void number(Attribute attr);
				void number(Block &block);
				DialectNumbering &numberDialect(Dialect *dialect);
				DialectNumbering &numberDialect(StringRef dialect);
				void number(Operation &op);
				void number(OperationName opName);
				void number(Region &region);
				void number(Type type);

				/// Mapping from IR to the respective numbering entries.
				llvm::MapVector<Attribute, AttributeNumbering *> attrs;
				llvm::MapVector<OperationName, OpNameNumbering *> opNames;
				llvm::MapVector<Type, TypeNumbering *> types;
				llvm::MapVector<Dialect *, DialectNumbering *> registeredDialects;
				llvm::MapVector<StringRef, DialectNumbering *> dialects;

				/// Allocators used for the various numbering entries.
				llvm::SpecificBumpPtrAllocator<AttributeNumbering> attrAllocator;
				llvm::SpecificBumpPtrAllocator<DialectNumbering> dialectAllocator;
				llvm::SpecificBumpPtrAllocator<OpNameNumbering> opNameAllocator;
				llvm::SpecificBumpPtrAllocator<TypeNumbering> typeAllocator;

				/// The value ID for each Block and Value.
				DenseMap<Block *, unsigned> blockIDs;
				DenseMap<Value, unsigned> valueIDs;

				/// A map from region to the number of blocks and values within that region.
				DenseMap<Region *, std::pair<unsigned, unsigned>> regionBlockValueCounts;

				/// The next value ID to assign when numbering.
				unsigned nextValueID = 0;
				};
				} // namespace detail
				} // namespace bytecode
				} // namespace mlir

				#endif

mlir/lib/Bytecode/Writer/IRNumbering.cpp

This file was added.

				//===- IRNumbering.cpp - MLIR Bytecode IR numbering -----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "IRNumbering.h"
				#include "mlir/Bytecode/BytecodeWriter.h"
				#include "mlir/IR/BuiltinTypes.h"
				#include "mlir/IR/Dialect.h"
				#include "mlir/IR/Operation.h"

				using namespace mlir;
				using namespace mlir::bytecode::detail;

				//===----------------------------------------------------------------------===//
				// IR Numbering
				//===----------------------------------------------------------------------===//

				IRNumberingState::IRNumberingState(const BytecodeWriterConfig &config) {
				Operation *op = config.getRootOp();

				// Number the root operation.
				number(*op);

				// Push all of the regions of the root operation onto the worklist.
				SmallVector<std::pair<Region *, unsigned>, 8> numberContext;
				for (Region &region : op->getRegions())
				numberContext.emplace_back(&region, nextValueID);

				// Iteratively process each of the nested regions.
				while (!numberContext.empty()) {
				Region *region;
				std::tie(region, nextValueID) = numberContext.pop_back_val();
				number(*region);

				// Traverse into nested regions.
				for (Operation &op : region->getOps())
				for (Region &region : op.getRegions())
				numberContext.emplace_back(&region, nextValueID);
				}

				// Walk and number the recorded components within each dialect.
				unsigned attrID = 0, opNameID = 0, typeID = 0;
				for (DialectNumbering *dialect : llvm::make_second_range(dialects)) {
				for (AttributeNumbering *attr : dialect->attributes)
				attr->number = attrID++;
				for (OpNameNumbering *opName : dialect->opNames)
				opName->number = opNameID++;
				for (TypeNumbering *type : dialect->types)
				type->number = typeID++;
				}
				}
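The final loop of the constructor assigns IDs dialect by dialect, so all attributes of one dialect receive a contiguous range. The sketch below illustrates that grouping with standard containers only. `Entry` and `numberByDialect` are hypothetical names, the input is assumed to contain each attribute once, and a `std::vector` of names stands in for the insertion order that `llvm::MapVector` preserves.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: (dialect name, attribute text) in first-seen order.
using Entry = std::pair<std::string, std::string>;

// Assign IDs so that entries of the same dialect are numbered contiguously,
// with dialects visited in the order they were first seen.
std::map<std::string, unsigned>
numberByDialect(const std::vector<Entry> &seen) {
  std::vector<std::string> dialectOrder;
  std::map<std::string, std::vector<std::string>> perDialect;
  for (const Entry &entry : seen) {
    auto inserted = perDialect.try_emplace(entry.first);
    if (inserted.second)
      dialectOrder.push_back(entry.first);
    inserted.first->second.push_back(entry.second);
  }

  std::map<std::string, unsigned> ids;
  unsigned nextID = 0;
  for (const std::string &dialect : dialectOrder)
    for (const std::string &attr : perDialect[dialect])
      ids[attr] = nextID++;
  return ids;
}
```

For example, with entries seen in the order `builtin:a`, `func:b`, `builtin:c`, the IDs come out as `a = 0`, `c = 1`, `b = 2`. Since the dialect section records how many attributes each dialect owns, a reader can infer the owning dialect from the ID alone, which is the property the author cites in the review thread when discussing frequency-based ordering.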

				void IRNumberingState::number(Attribute attr) {
				auto it = attrs.insert({attr, nullptr});
				if (!it.second)
				return;
				mehdi_amini: We could record the number of times an attribute is used in order to sort them so that the most used ones have lower IDs (and have more chances to fit in one byte) :)
				rriddle (author): We would need to encode things differently in that case, i.e. if the attributes are not in order of dialect, they would each need to have an associated dialect id encoded with them. In the case of lots of attributes/types, that would be significant. Maybe we could come up with a hybrid model? i.e. encode the most common 128 attributes/types, so that they fit in one byte (or two), and then encode the rest using dialect grouping.
				jpienaar: Would sorting attributes per frequency per dialect work? (Keep dialect attributes still together but just sort dialects per frequency.) We could measure all three of course, doesn't require a version bump ;-)
				auto *numbering = new (attrAllocator.Allocate()) AttributeNumbering(attr);
				it.first->second = numbering;

				// Check for OpaqueAttr, which is a dialect-specific attribute that didn't
				// have a registered dialect when it got created. We don't want to encode this
				// as the builtin OpaqueAttr, we want to encode it as if the dialect was
				// actually loaded.
				if (OpaqueAttr opaqueAttr = attr.dyn_cast<OpaqueAttr>())
				numbering->dialect = &numberDialect(opaqueAttr.getDialectNamespace());
				else
				numbering->dialect = &numberDialect(&attr.getDialect());

				numbering->dialect->attributes.push_back(numbering);
				}

				void IRNumberingState::number(Block &block) {
				// Number the arguments of the block.
				for (BlockArgument arg : block.getArguments()) {
				valueIDs.try_emplace(arg, nextValueID++);
				number(arg.getLoc());
				number(arg.getType());
				jpienaar: Could this just be a static function here?
				}

				// Number the operations in this block.
				for (Operation &op : block)
				number(op);
				}

				auto IRNumberingState::numberDialect(Dialect *dialect) -> DialectNumbering & {
				DialectNumbering *&numbering = registeredDialects[dialect];
				if (!numbering) {
				numbering = &numberDialect(dialect->getNamespace());
				numbering->dialect = dialect;
				}
				return *numbering;
				}

				auto IRNumberingState::numberDialect(StringRef dialect) -> DialectNumbering & {
				DialectNumbering *&numbering = dialects[dialect];
				if (!numbering) {
				numbering = new (dialectAllocator.Allocate())
				DialectNumbering(dialect, dialects.size() - 1);
				}
				return *numbering;
				}

				void IRNumberingState::number(Region &region) {
				size_t firstValueID = nextValueID;

				// Number the blocks within this region.
				size_t blockCount = 0;
				for (auto &it : llvm::enumerate(region)) {
				blockIDs.try_emplace(&it.value(), it.index());
				number(it.value());
				++blockCount;
				}

				// Remember the number of blocks and values in this region.
				regionBlockValueCounts.try_emplace(&region, blockCount,
				nextValueID - firstValueID);
				}

				void IRNumberingState::number(Operation &op) {
				// Number the components of an operation that won't be numbered elsewhere
				// (e.g. we don't number operands, regions, or successors here).
				number(op.getName());
				for (OpResult result : op.getResults()) {
				valueIDs.try_emplace(result, nextValueID++);
				number(result.getType());
				}
				number(op.getAttrDictionary());
				number(op.getLoc());
				}

				void IRNumberingState::number(OperationName opName) {
				OpNameNumbering *&numbering = opNames[opName];
				if (numbering)
				return;
				DialectNumbering *dialectNumber = nullptr;
				if (Dialect *dialect = opName.getDialect())
				dialectNumber = &numberDialect(dialect);
				else
				dialectNumber = &numberDialect(opName.getDialectNamespace());
				numbering = new (opNameAllocator.Allocate()) OpNameNumbering(opName);
				dialectNumber->opNames.emplace_back(numbering);
				}

				void IRNumberingState::number(Type type) {
				auto it = types.insert({type, nullptr});
				if (!it.second)
				return;
				auto *numbering = new (typeAllocator.Allocate()) TypeNumbering(type);
				it.first->second = numbering;

				// Check for OpaqueType, which is a dialect-specific type that didn't have a
				// registered dialect when it got created. We don't want to encode this as the
				// builtin OpaqueType, we want to encode it as if the dialect was actually
				// loaded.
				if (OpaqueType opaqueType = type.dyn_cast<OpaqueType>())
				numbering->dialect = &numberDialect(opaqueType.getDialectNamespace());
				else
				numbering->dialect = &numberDialect(&type.getDialect());

				numbering->dialect->types.push_back(numbering);
				}

mlir/lib/CMakeLists.txt

	# Enable errors for any global constructors.			# Enable errors for any global constructors.
	add_flag_if_supported("-Werror=global-constructors" WERROR_GLOBAL_CONSTRUCTOR)			add_flag_if_supported("-Werror=global-constructors" WERROR_GLOBAL_CONSTRUCTOR)

	add_subdirectory(Analysis)			add_subdirectory(Analysis)
	add_subdirectory(AsmParser)			add_subdirectory(AsmParser)
				add_subdirectory(Bytecode)
	add_subdirectory(Conversion)			add_subdirectory(Conversion)
	add_subdirectory(Dialect)			add_subdirectory(Dialect)
	add_subdirectory(IR)			add_subdirectory(IR)
	add_subdirectory(Interfaces)			add_subdirectory(Interfaces)
	add_subdirectory(Parser)			add_subdirectory(Parser)
	add_subdirectory(Pass)			add_subdirectory(Pass)
	add_subdirectory(Reducer)			add_subdirectory(Reducer)
	add_subdirectory(Rewrite)			add_subdirectory(Rewrite)
	add_subdirectory(Support)			add_subdirectory(Support)
	add_subdirectory(TableGen)			add_subdirectory(TableGen)
	add_subdirectory(Target)			add_subdirectory(Target)
	add_subdirectory(Tools)			add_subdirectory(Tools)
	add_subdirectory(Transforms)			add_subdirectory(Transforms)
	add_subdirectory(ExecutionEngine)			add_subdirectory(ExecutionEngine)

mlir/lib/Parser/CMakeLists.txt

	add_mlir_library(MLIRParser			add_mlir_library(MLIRParser
	Parser.cpp			Parser.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Parser			${MLIR_MAIN_INCLUDE_DIR}/mlir/Parser

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRAsmParser			MLIRAsmParser
				MLIRBytecodeReader
	MLIRIR			MLIRIR
	)			)

mlir/lib/Parser/Parser.cpp

	//===- Parser.cpp - MLIR Unified Parser Interface -------------------------===//			//===- Parser.cpp - MLIR Unified Parser Interface -------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file implements the parser for the MLIR textual form.			// This file implements the parser for the MLIR textual form.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/Parser/Parser.h"			#include "mlir/Parser/Parser.h"
	#include "mlir/AsmParser/AsmParser.h"			#include "mlir/AsmParser/AsmParser.h"
				#include "mlir/Bytecode/BytecodeReader.h"
	#include "llvm/Support/SourceMgr.h"			#include "llvm/Support/SourceMgr.h"

	using namespace mlir;			using namespace mlir;

	LogicalResult mlir::parseSourceFile(const llvm::SourceMgr &sourceMgr,			LogicalResult mlir::parseSourceFile(const llvm::SourceMgr &sourceMgr,
	Block *block, const ParserConfig &config,			Block *block, const ParserConfig &config,
	LocationAttr *sourceFileLoc) {			LocationAttr *sourceFileLoc) {
	const auto *sourceBuf = sourceMgr.getMemoryBuffer(sourceMgr.getMainFileID());			const auto *sourceBuf = sourceMgr.getMemoryBuffer(sourceMgr.getMainFileID());
	if (sourceFileLoc) {			if (sourceFileLoc) {
	*sourceFileLoc = FileLineColLoc::get(config.getContext(),			*sourceFileLoc = FileLineColLoc::get(config.getContext(),
	sourceBuf->getBufferIdentifier(),			sourceBuf->getBufferIdentifier(),
	/*line=*/0, /*column=*/0);			/*line=*/0, /*column=*/0);
	}			}
				if (isBytecode(*sourceBuf))
				return readBytecodeFile(*sourceBuf, block, config);
	return parseAsmSourceFile(sourceMgr, block, config);			return parseAsmSourceFile(sourceMgr, block, config);
	}			}

	LogicalResult mlir::parseSourceFile(llvm::StringRef filename, Block *block,			LogicalResult mlir::parseSourceFile(llvm::StringRef filename, Block *block,
	const ParserConfig &config,			const ParserConfig &config,
	LocationAttr *sourceFileLoc) {			LocationAttr *sourceFileLoc) {
	llvm::SourceMgr sourceMgr;			llvm::SourceMgr sourceMgr;
	return parseSourceFile(filename, sourceMgr, block, config, sourceFileLoc);			return parseSourceFile(filename, sourceMgr, block, config, sourceFileLoc);
	[32 lines not shown]

mlir/lib/Tools/mlir-opt/CMakeLists.txt

	add_mlir_library(MLIROptLib			add_mlir_library(MLIROptLib
	MlirOptMain.cpp			MlirOptMain.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Tools/mlir-opt			${MLIR_MAIN_INCLUDE_DIR}/mlir/Tools/mlir-opt

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
				MLIRBytecodeWriter
	MLIRPass			MLIRPass
	MLIRParser			MLIRParser
	MLIRSupport			MLIRSupport
	)			)

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

//===- MlirOptMain.cpp - MLIR Optimizer Driver ----------------------------===//		//===- MlirOptMain.cpp - MLIR Optimizer Driver ----------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This is a utility that runs an optimization pass and prints the result back		// This is a utility that runs an optimization pass and prints the result back
// out. It is designed to support unit testing.		// out. It is designed to support unit testing.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Tools/mlir-opt/MlirOptMain.h"		#include "mlir/Tools/mlir-opt/MlirOptMain.h"
		#include "mlir/Bytecode/BytecodeWriter.h"
#include "mlir/IR/AsmState.h"		#include "mlir/IR/AsmState.h"
#include "mlir/IR/Attributes.h"		#include "mlir/IR/Attributes.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Diagnostics.h"		#include "mlir/IR/Diagnostics.h"
#include "mlir/IR/Dialect.h"		#include "mlir/IR/Dialect.h"
#include "mlir/IR/Location.h"		#include "mlir/IR/Location.h"
#include "mlir/IR/MLIRContext.h"		#include "mlir/IR/MLIRContext.h"
#include "mlir/Parser/Parser.h"		#include "mlir/Parser/Parser.h"
[19 lines not shown]
/// within the specified context.		/// within the specified context.
///		///
/// This typically parses the main source file, runs zero or more optimization		/// This typically parses the main source file, runs zero or more optimization
/// passes, then prints the output.		/// passes, then prints the output.
///		///
static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,		static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,
bool verifyPasses, SourceMgr &sourceMgr,		bool verifyPasses, SourceMgr &sourceMgr,
MLIRContext *context,		MLIRContext *context,
PassPipelineFn passManagerSetupFn) {		PassPipelineFn passManagerSetupFn,
		bool emitBytecode) {
DefaultTimingManager tm;		DefaultTimingManager tm;
applyDefaultTimingManagerCLOptions(tm);		applyDefaultTimingManagerCLOptions(tm);
TimingScope timing = tm.getRootScope();		TimingScope timing = tm.getRootScope();

// Disable multi-threading when parsing the input file. This removes the		// Disable multi-threading when parsing the input file. This removes the
// unnecessary/costly context synchronization when parsing.		// unnecessary/costly context synchronization when parsing.
bool wasThreadingEnabled = context->isMultithreadingEnabled();		bool wasThreadingEnabled = context->isMultithreadingEnabled();
context->disableMultithreading();		context->disableMultithreading();
[22 lines not shown]	if (failed(passManagerSetupFn(pm)))
return failure();		return failure();

// Run the pipeline.		// Run the pipeline.
if (failed(pm.run(*module)))		if (failed(pm.run(*module)))
return failure();		return failure();

// Print the output.		// Print the output.
TimingScope outputTiming = timing.nest("Output");		TimingScope outputTiming = timing.nest("Output");
		if (emitBytecode) {
		writeBytecodeToFile(module->getOperation(), os);
		} else {
module->print(os);		module->print(os);
os << '\n';		os << '\n';
		}
return success();		return success();
}		}

/// Parses the memory buffer. If successful, run a series of passes against		/// Parses the memory buffer. If successful, run a series of passes against
/// it and print the result.		/// it and print the result.
static LogicalResult		static LogicalResult
processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,		processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects, bool preloadDialectsInContext,		bool allowUnregisteredDialects, bool preloadDialectsInContext,
PassPipelineFn passManagerSetupFn, DialectRegistry &registry,		bool emitBytecode, PassPipelineFn passManagerSetupFn,
llvm::ThreadPool *threadPool) {		DialectRegistry &registry, llvm::ThreadPool *threadPool) {
// Tell sourceMgr about this buffer, which is what the parser will pick up.		// Tell sourceMgr about this buffer, which is what the parser will pick up.
SourceMgr sourceMgr;		SourceMgr sourceMgr;
sourceMgr.AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());		sourceMgr.AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());

// Create a context just for the current buffer. Disable threading on creation		// Create a context just for the current buffer. Disable threading on creation
// since we'll inject the thread-pool separately.		// since we'll inject the thread-pool separately.
MLIRContext context(registry, MLIRContext::Threading::DISABLED);		MLIRContext context(registry, MLIRContext::Threading::DISABLED);
if (threadPool)		if (threadPool)
context.setThreadPool(*threadPool);		context.setThreadPool(*threadPool);

// Parse the input file.		// Parse the input file.
if (preloadDialectsInContext)		if (preloadDialectsInContext)
context.loadAllAvailableDialects();		context.loadAllAvailableDialects();
context.allowUnregisteredDialects(allowUnregisteredDialects);		context.allowUnregisteredDialects(allowUnregisteredDialects);
if (verifyDiagnostics)		if (verifyDiagnostics)
context.printOpOnDiagnostic(false);		context.printOpOnDiagnostic(false);
context.getDebugActionManager().registerActionHandler<DebugCounter>();		context.getDebugActionManager().registerActionHandler<DebugCounter>();

// If we are in verify diagnostics mode then we have a lot of work to do,		// If we are in verify diagnostics mode then we have a lot of work to do,
// otherwise just perform the actions without worrying about it.		// otherwise just perform the actions without worrying about it.
if (!verifyDiagnostics) {		if (!verifyDiagnostics) {
SourceMgrDiagnosticHandler sourceMgrHandler(sourceMgr, &context);		SourceMgrDiagnosticHandler sourceMgrHandler(sourceMgr, &context);
return performActions(os, verifyDiagnostics, verifyPasses, sourceMgr,		return performActions(os, verifyDiagnostics, verifyPasses, sourceMgr,
&context, passManagerSetupFn);		&context, passManagerSetupFn, emitBytecode);
}		}

SourceMgrDiagnosticVerifierHandler sourceMgrHandler(sourceMgr, &context);		SourceMgrDiagnosticVerifierHandler sourceMgrHandler(sourceMgr, &context);

// Do any processing requested by command line flags. We don't care whether		// Do any processing requested by command line flags. We don't care whether
// these actions succeed or fail, we only care what diagnostics they produce		// these actions succeed or fail, we only care what diagnostics they produce
// and whether they match our expectations.		// and whether they match our expectations.
(void)performActions(os, verifyDiagnostics, verifyPasses, sourceMgr, &context,		(void)performActions(os, verifyDiagnostics, verifyPasses, sourceMgr, &context,
passManagerSetupFn);		passManagerSetupFn, emitBytecode);

// Verify the diagnostic handler to make sure that each of the diagnostics		// Verify the diagnostic handler to make sure that each of the diagnostics
// matched.		// matched.
return sourceMgrHandler.verify();		return sourceMgrHandler.verify();
}		}

LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,		LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
std::unique_ptr<MemoryBuffer> buffer,		std::unique_ptr<MemoryBuffer> buffer,
PassPipelineFn passManagerSetupFn,		PassPipelineFn passManagerSetupFn,
DialectRegistry &registry, bool splitInputFile,		DialectRegistry &registry, bool splitInputFile,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects,		bool allowUnregisteredDialects,
bool preloadDialectsInContext) {		bool preloadDialectsInContext,
		bool emitBytecode) {
		// Check to see if we are trying to output bytecode to a displayed stream.
		// TODO: Do we need to provide a -f option like LLVM? Should we even
		jpienaar: I think yes & no. piping these will be common and other uses seem like mistake, but I don't know how foolproof this check is on all platforms and opt tool is not a user tool.
		mehdi_amini: I think the difference with LLVM opt is that we're not having byte code as the default, hence we may not need to warn since the user has to opt-in to get there.
		rriddle (author): Makes sense to me, just dropped it. We can add a warning back in if enough people trip up on this (given bytecode generation is an explicit decision).
		// warn/disable in this case?
		if (emitBytecode && outputStream.is_displayed()) {
		llvm::errs()
		<< "warning: Attempting to output a bytecode file to a displayed "
		"stream.\n"
		"This is inadvisable as it may cause display problems, disabling "
		"bytecode output.\n\n";
		emitBytecode = false;
		}

// The split-input-file mode is a very specific mode that slices the file		// The split-input-file mode is a very specific mode that slices the file
// up into small pieces and checks each independently.		// up into small pieces and checks each independently.
// We use an explicit threadpool to avoid creating and joining/destroying		// We use an explicit threadpool to avoid creating and joining/destroying
// threads for each of the split.		// threads for each of the split.
ThreadPool *threadPool = nullptr;		ThreadPool *threadPool = nullptr;

// Create a temporary context for the sake of checking if		// Create a temporary context for the sake of checking if
// --mlir-disable-threading was passed on the command line.		// --mlir-disable-threading was passed on the command line.
// We use the thread-pool this context is creating, and avoid		// We use the thread-pool this context is creating, and avoid
// creating any thread when disabled.		// creating any thread when disabled.
MLIRContext threadPoolCtx;		MLIRContext threadPoolCtx;
if (threadPoolCtx.isMultithreadingEnabled())		if (threadPoolCtx.isMultithreadingEnabled())
threadPool = &threadPoolCtx.getThreadPool();		threadPool = &threadPoolCtx.getThreadPool();

auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,		auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,
raw_ostream &os) {		raw_ostream &os) {
return processBuffer(os, std::move(chunkBuffer), verifyDiagnostics,		return processBuffer(os, std::move(chunkBuffer), verifyDiagnostics,
verifyPasses, allowUnregisteredDialects,		verifyPasses, allowUnregisteredDialects,
preloadDialectsInContext, passManagerSetupFn, registry,		preloadDialectsInContext, emitBytecode,
threadPool);		passManagerSetupFn, registry, threadPool);
};		};
return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,		return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,
splitInputFile, /*insertMarkerInOutput=*/true);		splitInputFile, /*insertMarkerInOutput=*/true);
}		}

LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,		LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
std::unique_ptr<MemoryBuffer> buffer,		std::unique_ptr<MemoryBuffer> buffer,
const PassPipelineCLParser &passPipeline,		const PassPipelineCLParser &passPipeline,
DialectRegistry &registry, bool splitInputFile,		DialectRegistry &registry, bool splitInputFile,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects,		bool allowUnregisteredDialects,
bool preloadDialectsInContext) {		bool preloadDialectsInContext,
		bool emitBytecode) {
auto passManagerSetupFn = [&](PassManager &pm) {		auto passManagerSetupFn = [&](PassManager &pm) {
auto errorHandler = [&](const Twine &msg) {		auto errorHandler = [&](const Twine &msg) {
emitError(UnknownLoc::get(pm.getContext())) << msg;		emitError(UnknownLoc::get(pm.getContext())) << msg;
return failure();		return failure();
};		};
return passPipeline.addToPipeline(pm, errorHandler);		return passPipeline.addToPipeline(pm, errorHandler);
};		};
return MlirOptMain(outputStream, std::move(buffer), passManagerSetupFn,		return MlirOptMain(outputStream, std::move(buffer), passManagerSetupFn,
registry, splitInputFile, verifyDiagnostics, verifyPasses,		registry, splitInputFile, verifyDiagnostics, verifyPasses,
allowUnregisteredDialects, preloadDialectsInContext);		allowUnregisteredDialects, preloadDialectsInContext,
		emitBytecode);
}		}

LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,		LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
DialectRegistry &registry,		DialectRegistry &registry,
bool preloadDialectsInContext) {		bool preloadDialectsInContext) {
static cl::opt<std::string> inputFilename(		static cl::opt<std::string> inputFilename(
cl::Positional, cl::desc("<input file>"), cl::init("-"));		cl::Positional, cl::desc("<input file>"), cl::init("-"));

// ... 21 lines not shown ...
static cl::opt<bool> allowUnregisteredDialects(		static cl::opt<bool> allowUnregisteredDialects(
"allow-unregistered-dialect",		"allow-unregistered-dialect",
cl::desc("Allow operation with no registered dialects"), cl::init(false));		cl::desc("Allow operation with no registered dialects"), cl::init(false));

static cl::opt<bool> showDialects(		static cl::opt<bool> showDialects(
"show-dialects", cl::desc("Print the list of registered dialects"),		"show-dialects", cl::desc("Print the list of registered dialects"),
cl::init(false));		cl::init(false));

		static cl::opt<bool> emitBytecode(
		"emit-bytecode", cl::desc("Emit bytecode when generating output"),
		cl::init(false));

InitLLVM y(argc, argv);		InitLLVM y(argc, argv);

// Register any command line options.		// Register any command line options.
registerAsmPrinterCLOptions();		registerAsmPrinterCLOptions();
registerMLIRContextCLOptions();		registerMLIRContextCLOptions();
registerPassManagerCLOptions();		registerPassManagerCLOptions();
registerDefaultTimingManagerCLOptions();		registerDefaultTimingManagerCLOptions();
DebugCounter::registerCLOptions();		DebugCounter::registerCLOptions();
// ... 28 lines not shown ...
auto output = openOutputFile(outputFilename, &errorMessage);		auto output = openOutputFile(outputFilename, &errorMessage);
if (!output) {		if (!output) {
llvm::errs() << errorMessage << "\n";		llvm::errs() << errorMessage << "\n";
return failure();		return failure();
}		}

if (failed(MlirOptMain(output->os(), std::move(file), passPipeline, registry,		if (failed(MlirOptMain(output->os(), std::move(file), passPipeline, registry,
splitInputFile, verifyDiagnostics, verifyPasses,		splitInputFile, verifyDiagnostics, verifyPasses,
allowUnregisteredDialects, preloadDialectsInContext)))		allowUnregisteredDialects, preloadDialectsInContext,
		emitBytecode)))
return failure();		return failure();

// Keep the output file if the invocation of MlirOptMain was successful.		// Keep the output file if the invocation of MlirOptMain was successful.
output->keep();		output->keep();
return success();		return success();
}		}
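
The displayed-stream guard shown in this revision of `MlirOptMain` (and, per the author's inline reply, dropped from the final patch) can be sketched outside of LLVM as follows. This is a minimal illustration rather than the patch's code: `isDisplayed` and `shouldEmitBytecode` are hypothetical names, and POSIX `isatty`/`fileno` are assumed here as a stand-in for `raw_ostream::is_displayed()`.

```cpp
#include <cassert>
#include <stdio.h>  // FILE, fileno() (POSIX)
#include <unistd.h> // isatty() (POSIX-only assumption)

// Hypothetical helper, not part of the patch: reports whether `stream` is
// attached to an interactive terminal.
bool isDisplayed(FILE *stream) {
  assert(stream != nullptr && "expected an open stream");
  return isatty(fileno(stream)) != 0;
}

// Hypothetical helper mirroring the guard in the diff: binary output is only
// produced when it was requested and the destination is not a terminal.
bool shouldEmitBytecode(bool requested, bool displayed) {
  return requested && !displayed;
}
```

A driver could call `shouldEmitBytecode(emitBytecode, isDisplayed(stdout))` before writing. The reviewers' point is that, because bytecode output is opt-in, such a guard may not be needed at all.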

mlir/test/Bytecode/general.mlir

This file was added.

				// RUN: mlir-opt -allow-unregistered-dialect -emit-bytecode %s | mlir-opt -allow-unregistered-dialect | FileCheck %s
				jpienaar: We probably end up running all non-split or -error cases through a round trip tests to check, followed by fuzzing. It would almost seem possible to enumerate all of these kind of the constructs above.
				mehdi_amini: Reminds me an old proposal of mine to add some flag to mlir-opt to automatically round-trip and diff, and enable this flag optionally to process the entire test-suite :) Seems like it would be useful here as well!
				jpienaar: Yes indeed, I was actually wondering if we had that already :)
				mehdi_amini: https://reviews.llvm.org/D90088
				rriddle (author): Do you plan on reviving that @mehdi_amini ?
				mehdi_amini: Yeah I should. I had memory that we couldn't reach consensus on it but I may be wrong.
				jpienaar: I think only question was on on by default or not (e.g., how much of testing tool mlir-opt really is, should it be used in directed testing only etc)

				// CHECK-LABEL: "bytecode.test1"
				// CHECK-NEXT: "bytecode.empty"() : () -> ()
				// CHECK-NEXT: "bytecode.attributes"() {attra = 10 : i64, attrb = #bytecode.attr} : () -> ()
				// CHECK-NEXT: %[[RESULTS:.*]]:3 = "bytecode.results"() : () -> (i32, i64, i32)
				// CHECK-NEXT: "bytecode.operands"(%[[RESULTS]]#0, %[[RESULTS]]#1, %[[RESULTS]]#2) : (i32, i64, i32) -> ()
				// CHECK-NEXT: "bytecode.branch"()[^[[BLOCK:.*]]] : () -> ()
				// CHECK-NEXT: ^[[BLOCK]](%[[ARG0:.*]]: i32, %[[ARG1:.*]]: !bytecode.int, %[[ARG2:.*]]: !pdl.operation):
				// CHECK-NEXT: "bytecode.regions"() ({
				// CHECK-NEXT: "bytecode.operands"(%[[ARG0]], %[[ARG1]], %[[ARG2]]) : (i32, !bytecode.int, !pdl.operation) -> ()
				// CHECK-NEXT: "bytecode.return"() : () -> ()
				// CHECK-NEXT: }) : () -> ()
				// CHECK-NEXT: "bytecode.return"() : () -> ()
				// CHECK-NEXT: }) : () -> ()

				"bytecode.test1"() ({
				"bytecode.empty"() : () -> ()
				"bytecode.attributes"() {attra = 10, attrb = #bytecode.attr} : () -> ()
				%results:3 = "bytecode.results"() : () -> (i32, i64, i32)
				"bytecode.operands"(%results#0, %results#1, %results#2) : (i32, i64, i32) -> ()
				"bytecode.branch"()[^secondBlock] : () -> ()

				^secondBlock(%arg1: i32, %arg2: !bytecode.int, %arg3: !pdl.operation):
				"bytecode.regions"() ({
				"bytecode.operands"(%arg1, %arg2, %arg3) : (i32, !bytecode.int, !pdl.operation) -> ()
				"bytecode.return"() : () -> ()
				}) : () -> ()
				"bytecode.return"() : () -> ()
				}) : () -> ()
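
The round-trip-and-diff idea raised in the inline thread above can be sketched generically as below. This is only an illustration under stated assumptions: `Transform`, `roundTrips`, `toyEncode`, and `toyDecode` are hypothetical names, the toy codec stands in for the real bytecode writer and reader, and none of this is an MLIR API. It compares the printed text before and after a trip through the binary form, whereas the RUN line above matches FileCheck patterns instead.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for "write as bytes" / "read bytes and print as text".
using Transform = std::function<std::string(const std::string &)>;

// Returns true when text -> bytes -> text reproduces the input exactly.
bool roundTrips(const std::string &text, const Transform &toBytes,
                const Transform &toText) {
  return toText(toBytes(text)) == text;
}

// Toy codec used only to exercise the harness: prepends a one-byte tag.
std::string toyEncode(const std::string &text) {
  return std::string(1, '\x01') + text;
}

// Inverse of toyEncode: strips the leading tag byte.
std::string toyDecode(const std::string &bytes) {
  assert(!bytes.empty() && "toy codec expects a leading tag byte");
  return bytes.substr(1);
}
```

With a matching encoder and decoder, `roundTrips("\"bytecode.empty\"() : () -> ()", toyEncode, toyDecode)` holds; pairing `toyEncode` with a decoder that leaves the tag in place makes the check fail, which is the kind of mismatch such a flag is meant to surface.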

Revision Contents

Diff 452346
