This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
mlir/
-
docs/
53/53
BytecodeFormat.md
-
include/mlir/
-
mlir/
-
Bytecode/
-
BytecodeReader.h
2/2
BytecodeWriter.h
-
IR/
-
OperationSupport.h
-
Tools/mlir-opt/
-
mlir-opt/
-
MlirOptMain.h
-
lib/
-
Bytecode/
-
CMakeLists.txt
5/5
Encoding.h
-
Reader/
36/38
BytecodeReader.cpp
-
CMakeLists.txt
-
Writer/
4
BytecodeWriter.cpp
-
CMakeLists.txt
-
IRNumbering.h
4/4
IRNumbering.cpp
-
CMakeLists.txt
-
Parser/
-
CMakeLists.txt
-
Parser.cpp
-
Tools/mlir-opt/
-
mlir-opt/
-
CMakeLists.txt
3/3
MlirOptMain.cpp
-
test/Bytecode/
-
Bytecode/
5/7
general.mlir
-
invalid/
-
invalid-attr_type_offset_section-large_offset.mlirbc
-
invalid-attr_type_offset_section-trailing_data.mlirbc
-
invalid-attr_type_section-index.mlirbc
-
invalid-attr_type_section-trailing_data.mlirbc
-
invalid-dialect_section-dialect_string.mlirbc
-
invalid-dialect_section-opname_dialect.mlirbc
-
invalid-dialect_section-opname_string.mlirbc
-
invalid-dialect_section.mlir
-
invalid-ir_section-attr.mlirbc
-
invalid-ir_section-forwardref.mlirbc
-
invalid-ir_section-loc.mlirbc
-
invalid-ir_section-operands.mlirbc
-
invalid-ir_section-opname.mlirbc
-
invalid-ir_section-results.mlirbc
-
invalid-ir_section-successors.mlirbc
-
invalid-ir_section.mlir
-
invalid-string_section-count.mlirbc
-
invalid-string_section-large_string.mlirbc
-
invalid-string_section-no_string.mlirbc
-
invalid-string_section-trailing_data.mlirbc
-
invalid-string_section.mlir
-
invalid-structure-producer.mlirbc
-
invalid-structure-section-duplicate.mlirbc
-
invalid-structure-section-id-unknown.mlirbc
-
invalid-structure-section-length.mlirbc
-
invalid-structure-section-missing.mlirbc
-
invalid-structure-version.mlirbc
-
invalid-structure.mlir
-
invalid_attr_type_offset_section.mlir
-
invalid_attr_type_section.mlir

Differential D131747

[mlir] Add initial support for a binary serialization format
Closed · Public

Authored by rriddle on Aug 11 2022, 8:17 PM.

Details

Reviewers

mehdi_amini
jpienaar
nicolasvasilache

Commits

rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format

Summary

This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which is used
heavily in nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).

The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in follow-ups;
the intention for this patch is to provide something that supports the
basic cases and can be built upon.

https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rriddle created this revision. Aug 11 2022, 8:17 PM

Herald added a project: Restricted Project. Aug 11 2022, 8:17 PM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 19 others.

rriddle requested review of this revision. Aug 11 2022, 8:17 PM

Herald added a project: Restricted Project. Aug 11 2022, 8:17 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rriddle added reviewers: mehdi_amini, jpienaar. Aug 11 2022, 8:18 PM

I don't have any failure tests right now because those are annoying to make/update, and I want to make sure we agree on various aspects before doing that.

Harbormaster completed remote builds in B180841: Diff 452063. Aug 11 2022, 8:36 PM

Nice, just started with doc for now.

I was wondering if we should have mlir-translate do the text to binary and vice versa, but can see the convenience when stitching together passes and it's not an external format.

mlir/docs/BytecodeFormat.md
36	OOC why not use it?
69	What is used instead?
121	Nit: [] instead of *? (I see the relational database or pointer notations of the latter but I find the former more intuitive)
128	Link to encoding section?
209	And 7 of 8 optional elements used?
214	So location is either previous or new? Unspecified doesn't mean unknown but previous. That's nice. The only other common case I could think of is successive lines in a file but that would take a bit.
247	So we could have a region with 0 blocks here that is considered not empty?

rriddle added inline comments. Aug 12 2022, 12:40 AM

mlir/docs/BytecodeFormat.md
36	The encoding described here, or more commonly referred to as PrefixVarInt, is essentially a variant of LEB. Encoding and decoding are generally faster using the prefix strategy, though. The decode for the prefix variant, for example, is effectively branchless given that you just need to count the trailing zeros, which is an intrinsic on most modern hardware. I've benchmarked a bunch of different strategies and implementations over the past week, using both a corpus of .mlir taken from various projects and just using random distributions of integers. This is what emerged from that testing.
247	Zero block regions should be marked as empty. I used codes for various "bool" like things just to make it easier to change things later on. E.g. if we wanted to change the way regions are encoded we could just have a new "kRegion2" code. Though that only matters after we start versioning things.
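
For readers following the thread, the prefix scheme discussed for line 36 can be sketched roughly as below. This is an illustration built from the description in this review, not the code from the patch. It assumes that the number of trailing zero bits in the first byte gives the number of additional bytes, that a lone zero byte introduces a full 64-bit value, and that the payload is little-endian. `encodeVarInt` and `decodeVarInt` are hypothetical names, and the trailing-zero count is written as a plain loop where a real decoder would likely use a hardware intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder: the low bits of the first byte hold a run of 0s
// terminated by a 1; the number of 0s is the number of additional bytes.
std::vector<uint8_t> encodeVarInt(uint64_t value) {
  std::vector<uint8_t> out;
  // Forms of 1..8 bytes carry 7, 14, ..., 56 payload bits respectively.
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    if ((value >> (7 * numBytes)) != 0)
      continue; // Needs more than 7 * numBytes payload bits.
    uint64_t encoded = (value << numBytes) | (uint64_t(1) << (numBytes - 1));
    for (unsigned i = 0; i < numBytes; ++i)
      out.push_back(uint8_t(encoded >> (8 * i)));
    return out;
  }
  // Full 64-bit case: a zero prefix byte followed by 8 little-endian bytes.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(uint8_t(value >> (8 * i)));
  return out;
}

// Hypothetical decoder for the layout above. Assumes `bytes` holds one
// complete, well-formed encoding.
uint64_t decodeVarInt(const std::vector<uint8_t> &bytes) {
  uint8_t first = bytes[0];
  if (first == 0) {
    uint64_t value = 0;
    for (unsigned i = 0; i < 8; ++i)
      value |= uint64_t(bytes[1 + i]) << (8 * i);
    return value;
  }
  // Count trailing zeros of the prefix byte (first != 0, so this terminates).
  unsigned extra = 0;
  while (((first >> extra) & 1) == 0)
    ++extra;
  uint64_t encoded = 0;
  for (unsigned i = 0; i <= extra; ++i)
    encoded |= uint64_t(bytes[i]) << (8 * i);
  // Drop the prefix bits to recover the payload.
  return encoded >> (extra + 1);
}
```

Under these assumptions the length is known after inspecting only the first byte, which is the property the comment on line 36 contrasts with LEB-style continuation bits.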

Just commenting on the doc right now.

mlir/docs/BytecodeFormat.md
26	No fixed64 ? Is the bet that we'd always use VBR for this?
36	Are you using PrefixVarInt or how does it differ? Is your variant format documented somewhere in the literature? I'd rather have us stick to something existing, I doubt that we'll invent a revolutionary trick here somehow. Can you position what you're proposing against varint-G8IU, PFOR, SIMD-BP12, ... ?
120	I'm not sure here why these varint are useful?
121	+1 for [] :)
142	What is `kAsmForm` ? Seems to refer to an enum not documented here.
148	I'm not sure yet how we dispatch to a dialect for loading a type/attribute
154	Basically this is differential encoding of the offset table, but that means you need to decompress the table here IIUC. Accessing the last attribute requires to read the entire table. (I assume we would do it once and for-all in the "BitcodeReaderContext" or whatever you're naming it)
168	That seems overly conservative to me: what about just storing the part after the dot and add a varint for the dialect ID?
191	I assume this field is needed to allow some level of lazy loading / random access? I'm not sure yet...
209	Actually 6? `firstResultIndex` and `numResults` seem coupled right?
248	Do we need this indirection here? You didn't provide one for the operation content.
265	block_element ?
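
The point on line 154 about the offset table being differentially encoded can be illustrated with a small sketch. It assumes each table entry stores only the size of its attribute/type, so absolute offsets come from a single running sum. `resolveOffsets` is a hypothetical helper, not a function from the patch, and the bounds check is an addition in the spirit of rejecting malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: turn per-entry sizes into absolute start offsets.
// Returns std::nullopt when an entry would run past the data section.
std::optional<std::vector<uint64_t>>
resolveOffsets(const std::vector<uint64_t> &entrySizes, uint64_t sectionSize) {
  std::vector<uint64_t> offsets;
  offsets.reserve(entrySizes.size());
  uint64_t current = 0; // Invariant: current <= sectionSize.
  for (uint64_t size : entrySizes) {
    if (size > sectionSize - current)
      return std::nullopt;
    offsets.push_back(current);
    current += size;
  }
  return offsets;
}
```

Once the sums are computed, any entry can be located directly, so the cost of reading the whole table is paid only once.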

mehdi_amini added inline comments. Aug 12 2022, 2:39 PM

mlir/docs/BytecodeFormat.md
203	Having the regions in-line makes it hard/impossible to lazily load IR I think? (at least not without decoding the entire IR section).

rriddle updated this revision to Diff 452346. Aug 12 2022, 6:30 PM

rriddle marked 12 inline comments as done.

rriddle added inline comments. Aug 12 2022, 6:31 PM

mlir/docs/BytecodeFormat.md
26	I just didn't add it in. Technically right now we only use byte, so I dropped the rest and added a TODO to add larger widths as necessary.
36	Yeah, it's just PrefixVarInt. I completely missed explicitly saying that when rushing out the doc.
120	These are used to know how many attributes/types/operation names need to be parsed.
121	SG, I also like [].
148	We chatted a little about this offline. Attributes and Types are grouped by dialect, with each grouping emitted in the same order as the dialects in the dialect section. This allows for us to know which dialect an attribute/type belongs to based on its index (i.e. we could know attributes 0-5 are the builtin dialect, 6-9 are the func dialect and so on).
154	Yeah. We do a single pass to initialize the bytecode structure of all attributes/types, which indicates where the data is stored in the bytecode. Reading lazily after that is trivial, because we just read the previously computed data directly.
168	Agreed, I need to look into this. I went with the current thing because it was easier to bootstrap (e.g. I don't have to worry about the difference between builtin and non-builtin).
191	This field is the value index of the first result. Every value gets a number, which is what gets referred to in the `operands` list. For multiple results, we know the value numbers are consecutive, so we just need to know the first one.
203	Yeah. The idea I have right now is that the op encoding mask will indicate if the regions are inline or out-of-line. That way we just dispatch to two different code paths depending on the encoding.
209	Yeah, right now the mask uses 6 out of 8 possible bits. Whenever we do lazy loading that may bump up to 7 bits (we could get around that by encoding the "are the regions lazy" bit a different way). We could of course change from a mask to a set of values for each possible encoding type.
214	Yeah, the file locations encoding is gonna be interesting. I'm hoping that when builtin attributes have custom encodings it ends up being mostly okay (should just be a string index+two small varints for the line/col). This is something we will likely want to play around with, but thankfully it's easy to test (I have a huge IR file that inherits locations from the file).
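
The dialect grouping described in the reply on line 148 can be sketched as a range lookup. This assumes the reader knows how many attributes each dialect emitted, in dialect-section order. `dialectForAttr` and `countsPerDialect` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lookup: countsPerDialect[d] is the number of attributes that
// dialect d emitted, in dialect-section order. Returns the index of the
// dialect that owns attrIndex, or -1 if the index is out of range.
int dialectForAttr(const std::vector<size_t> &countsPerDialect,
                   size_t attrIndex) {
  size_t firstIndex = 0;
  for (size_t d = 0; d < countsPerDialect.size(); ++d) {
    if (attrIndex < firstIndex + countsPerDialect[d])
      return int(d);
    firstIndex += countsPerDialect[d];
  }
  return -1;
}
```

With counts of 6 and 4, indices 0-5 map to the first dialect and 6-9 to the second, matching the builtin/func example in the comment, and no per-attribute dialect reference needs to be stored.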

Harbormaster completed remote builds in B181030: Diff 452346. Aug 12 2022, 7:00 PM

jpienaar added inline comments. Aug 12 2022, 7:13 PM

mlir/include/mlir/Bytecode/BytecodeWriter.h
36	This feels a little bit weird ... I'd almost expect like OpPrintingFlags having the config be separate from Operation* being printed. But perhaps it makes more sense below.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
21	mlir-bytecode-reader ?
120	Nit: I'm more used to null-terminated, with NUL being the character name.
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	I think yes & no. piping these will be common and other uses seem like mistake, but I don't know how foolproof this check is on all platforms and opt tool is not a user tool.
mlir/test/Bytecode/general.mlir
2	We probably end up running all non-split or -error cases through a round trip tests to check, followed by fuzzing. It would almost seem possible to enumerate all of these kind of the constructs above.

mehdi_amini added inline comments. Aug 13 2022, 2:59 AM

mlir/docs/BytecodeFormat.md
102	Producer string?
120	Actually, following our offline discussion, they are needed to be able to match a type/attr back to a dialect, right?
133	Worth mentioning: the first section can't be decoded without the second one as the elements in the array of attrs/types don't include a size.
159	Maybe make it explicit: `, this allows to associate an attribute back to a dialect without including a dialect reference in each type/attr entry.`
168	Can't you capture this as a TODO at the end of this paragraph?
191	Right, but I'm still not sure why this needs to be encoded: if you load operations in order, you could just use an ever incrementing number for each Value.
203	I think we should think about isolated region from the get go: you don't document the value numbering in the doc but I think we can use the same "local scope" as the textual parser to ensure that the value IDs stays small (and so use less varint space).
221	Is this really the common case that (other than "unknown") the locations are repeating in sequence? I am not convinced by this choice right now because it will make future lazy loading harder (you need to stream back to find the previously defined location).
252	When is "region empty" useful? Is it legal for an operation to have an empty region?
269	I don't get this, why multiple `block_arguments` blocks? What about something like:
	block {
	  encoding: varint, // (numOps << 1) | (hasBlockArgs)
	  arguments: block_arguments?, // Optional based on encoding
	  ops: op[]
	}
	block_arguments {
	  firstArgIndex: varint,
	  numArgs: varint?,
	  args: block_argument[]
	}
	block_argument {
	  typeIndexAndHasLoc: varint, // (typeIndex << 1) | (hasLoc)
	  location: varint?
	}
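
The `(numOps << 1) | (hasBlockArgs)` packing suggested for line 269 can be sketched as follows. `BlockHeader`, `packBlockHeader`, and `unpackBlockHeader` are hypothetical names; the sketch only shows how a flag shares a varint with a count and makes no claim about the final format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory view of the packed `encoding` field.
struct BlockHeader {
  uint64_t numOps;
  bool hasBlockArgs;
};

// Store the flag in bit 0 and the operation count in the remaining bits.
uint64_t packBlockHeader(BlockHeader header) {
  return (header.numOps << 1) | (header.hasBlockArgs ? 1 : 0);
}

// Recover both fields from the packed value.
BlockHeader unpackBlockHeader(uint64_t encoding) {
  return BlockHeader{encoding >> 1, (encoding & 1) != 0};
}
```

The cost is one bit of range on the count, in exchange for not spending a separate byte on the flag.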

mehdi_amini added inline comments. Aug 13 2022, 8:55 AM

mlir/docs/BytecodeFormat.md
120	Also right now it isn't used at all as far as I can tell.
262	Please document numValues :)
mlir/lib/Bytecode/Encoding.h
30	I'm not clear on how you manage "codes", or what are a "builtin section codes"
58	"a" top level operation... We parse in a block so we should have the ability to have multiple I think.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
136	You should sanity check sectionID here I think (and make the argument type the right enum in the API)
149	Document noinline please.
157	`assert(numBytes > 0 && numBytes <= 7);` ?
173	Does this work on a big endian machine?
217	We should have a pointer to the BytecodeDialect here I think, should be able to set it up in initializeOffsets
249	We should be able to have an enum for code here right?
282	`offsetReader`? (It was confusing to me when reading the code where it is used)
306	I think you should check that currentOffset does not exceed the `sectionData.size()`; a malformed bytecode could have offsets going beyond it.
381	I remember that Attr/Type were made "mutable" to support LLVM named struct (IIRC?), but isn't this encoding and loading scheme assuming there are no cycles? How are we gonna handle this?
566	This should move to `parseSection`, I think. (`sectionID` is used in test/set already, seems unsafe)
593	Why aren't dialects lazy loaded?
616	I was thinking: could we have a stringpool top-level section and everywhere refer to strings with an id there? Mnemonic shared between op/attributes/types and across dialects would be stored once and for-all.
749	Please break the recursion :)
755	Not sure where to attach this comment, but there is something missing somewhere (unless I missed it?) to ensure that use-list ordering is preserved.
mlir/lib/Bytecode/Writer/IRNumbering.cpp
61	We could record the number of times an attribute is used in order to sort them so that the most used ones have lower IDs (and have more chances to fit in one byte) :)
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	I think the difference with LLVM opt is that we're not having byte code as the default, hence we may not need to warn since the user has to opt-in to get there.
mlir/test/Bytecode/general.mlir
2	Reminds me an old proposal of mine to add some flag to mlir-opt to automatically round-trip and diff, and enable this flag optionally to process the entire test-suite :) Seems like it would be useful here as well!

jpienaar added inline comments. Aug 13 2022, 9:18 AM

mlir/test/Bytecode/general.mlir
2	Yes indeed, I was actually wondering if we had that already :)

mehdi_amini added inline comments. Aug 13 2022, 9:24 AM

mlir/test/Bytecode/general.mlir
2	https://reviews.llvm.org/D90088

rriddle updated this revision to Diff 452942. Aug 16 2022, 4:04 AM

rriddle marked 33 inline comments as done.

rriddle added inline comments. Aug 16 2022, 4:04 AM

mlir/docs/BytecodeFormat.md
191	Originally I was thinking that you can't do that if you load lazily, but if I use your other suggestion of doing per-isolated region numbering it should be possible. This will be much nicer, thanks!
203	Great suggestion!
221	I don't think it would make lazy-loading bad, given that we could encode the last location at the start of the region. I'm going to drop this behavior though, given that I'm not sure how many cases in practice it would help. This would also free up this optional behavior for something else (if that something else was more efficient for real-world use cases).
252	Yeah? e.g. External functions have empty regions. Almost every operation that has a region that can optionally be filled use an empty region, given that regions can't be dynamically added after op construction. Cleaned up this section though, we don't need the leading code, we can just use the block count.
269	Nice! I figured you would come up with something better ;)
mlir/include/mlir/Bytecode/BytecodeWriter.h
36	There was some reason why I did this before, but I forget now. I just dropped it.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
149	I thought I did: This method is marked noinline to avoid pessimizing the common case of single byte encoding. Tried to make it more obvious.
173	Right now, no. I need to setup the proper little -> native conversions. I'm deferring that to a follow up because I have to setup a virtual machine for big endian (which is annoying/time consuming). I also need to fix some things in the textual format related to big endian as well.
381	We will likely need some form of special API that can parse just the "immutable" part (i.e. the "name" in the LLVM struct case). For example, if an attribute/type is recursive, we could encode both its immutable and mutable encodings in one entry (with some header that has the size of the immutable part or something). Something like:
	RecursiveEntry {
	  immutableEncodingSize: varint,
	  immutableEncoding: ...,
	  mutableEncoding: ...
	}
	During processing we could first process the immutable entry, and then immediately process the mutable one. That way any recursive references would resolve properly, and then we'd fix the final reference afterwards. Something like:
	// Parse the immutable first, so that we have something to give recursive references.
	if (!(result = parseImmutable()))
	  return failure();
	// Parse the mutable afterwards. Pass in `result` so that it can populate the mutable bits?
	if (failed(parseMutable(result)))
	  return failure();
	Until we figure any of this out though, I'm just going to have them always use the string fallback for those attributes and types.
755	Deferring this to a follow up to help simplify this patch, added a TODO for now.
mlir/lib/Bytecode/Writer/IRNumbering.cpp
61	We would need to encode things differently in that case, i.e. if the attributes are not in order of dialect, they would each need to have an associated dialect id encoded with them. In the case of lots of attributes/types, that would be significant. Maybe we could come up with a hybrid model? i.e. encode the most common 128 attributes/types, so that they fit in one byte (or two), and then encode the rest using dialect grouping.
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
156	Makes sense to me, just dropped it. We can add a warning back in if enough people trip up on this (given bytecode generation is an explicit decision).
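
On the big-endian question raised for line 173: one common way to make a multi-byte read independent of host byte order is to assemble the value with shifts instead of copying raw memory. The sketch below shows that technique only. `readLittleEndian` is a hypothetical name, and nothing here is taken from the follow-up that eventually addressed the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble numBytes little-endian bytes into a host integer using shifts.
// Shifts operate on values rather than memory layout, so the result is the
// same on little- and big-endian hosts.
uint64_t readLittleEndian(const uint8_t *data, size_t numBytes) {
  assert(numBytes > 0 && numBytes <= 8);
  uint64_t value = 0;
  for (size_t i = 0; i < numBytes; ++i)
    value |= uint64_t(data[i]) << (8 * i);
  return value;
}
```

A `memcpy` into a `uint64_t` would be faster on little-endian hosts but would need an explicit byte swap elsewhere, which is the conversion the reply says is still missing.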

Harbormaster completed remote builds in B181477: Diff 452942. Aug 16 2022, 4:28 AM

jpienaar added inline comments. Aug 16 2022, 7:55 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
616	So encoding would be start and end offsets into a string table?

mehdi_amini added inline comments. Aug 16 2022, 3:49 PM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
616	String being null terminated, you don't necessarily need the end offsets. But if we have an offset section separate from the string table: we just need to point to an entry number, same mechanism as attr/type reference.
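
The string-pool idea from the comments on line 616 might look roughly like this: strings are stored once, null-terminated, and referenced everywhere else by entry number through a separate offset list. `StringPool`, `intern`, and `get` are hypothetical names for illustration and are not the API that landed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical string pool: one data blob plus a table of start offsets.
class StringPool {
public:
  // Returns the entry number for s, adding it on first use so that a
  // mnemonic shared by several users is stored only once.
  size_t intern(const std::string &s) {
    auto it = ids.find(s);
    if (it != ids.end())
      return it->second;
    size_t id = offsets.size();
    offsets.push_back(data.size());
    data += s;
    data.push_back('\0');
    ids.emplace(s, id);
    return id;
  }

  // Looks an entry up by number. The null terminator marks the end, so no
  // end offset is stored.
  std::string_view get(size_t id) const {
    return std::string_view(data.c_str() + offsets[id]);
  }

private:
  std::string data;
  std::vector<size_t> offsets;
  std::unordered_map<std::string, size_t> ids;
};
```

As noted later in the thread, relying on null termination means an entry cannot refer to a substring of another entry.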

(partial scan before meeting)

mlir/docs/BytecodeFormat.md
12	I just noticed ï and not i , I mean I guess writing this by hand and the other was that we don't actually take text file as bytecode (MĻîŘ just to bikeshed :))
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
10	Nit: I'd move this lower, I expect documentation here not todos :)
227	Dialect of this Attribute or Type ? (its mostly parent that makes me think OOP more than I normally think here, up to you)
309	attribute ?

mehdi_amini added inline comments. Aug 16 2022, 4:06 PM

mlir/docs/BytecodeFormat.md
12	I don't think your bike shed is ASCII though?

rriddle updated this revision to Diff 453166. Aug 16 2022, 4:41 PM

rriddle marked 5 inline comments as done. Aug 16 2022, 4:44 PM

Harbormaster completed remote builds in B181655: Diff 453166. Aug 16 2022, 5:08 PM

rriddle updated this revision to Diff 453172. Aug 16 2022, 5:09 PM

rriddle marked an inline comment as done. Aug 16 2022, 5:10 PM

Harbormaster completed remote builds in B181660: Diff 453172. Aug 16 2022, 5:33 PM

Missing negative tests are a bit unfortunate, but good to hear they are coming soon and this seems like good starting point.

mlir/docs/BytecodeFormat.md
12	Yeah I marked the comment as done before sending as I wasn't serious (and I did also use an extended ASCII encoding without realizing). But in seriousness: MLÏR wouldn't be a valid dialect name, so this looks good.
mlir/lib/Bytecode/Encoding.h
26	uint8_t here too? (I mean with 0 it doesn't matter)
80	We can now use binary literals (not sure if it is more readable, but was reminded of it)
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
243	Unsigned needed?
414	Comment?
502	So this parses the string starting at front of reader? And null-terminated?
616	Indeed, null-termination means we can't have substrings referenced (not sure if that is common here, could think for error strings, but unsure about decoding cost).
mlir/lib/Bytecode/Writer/IRNumbering.cpp
61	Would sorting attributes per frequency per dialect? (Keep dialect attributes still together but just sort dialects per frequency). We could measure all three of course, doesn't require version bump ;-)
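
One reading of the per-dialect frequency idea for line 61 is sketched below: dialect groups stay contiguous and in dialect order, so the index-to-dialect mapping still works, while the most used entries come first within each group. `AttrEntry` and `sortWithinDialects` are hypothetical, and whether this ordering pays off would have to be measured, as the comment says.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical numbering record for one attribute.
struct AttrEntry {
  int dialect;     // Index of the owning dialect.
  int id;          // Some identifier for the attribute.
  size_t useCount; // How often the attribute is referenced.
};

// Order by dialect first, then by descending use count within a dialect.
// stable_sort keeps the original order for entries that tie on both keys.
void sortWithinDialects(std::vector<AttrEntry> &entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const AttrEntry &a, const AttrEntry &b) {
                     if (a.dialect != b.dialect)
                       return a.dialect < b.dialect;
                     return a.useCount > b.useCount;
                   });
}
```

Because the grouping is preserved, this variant would not need a dialect id stored per attribute, which was the concern raised about a purely global frequency sort.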

This revision is now accepted and ready to land. Aug 16 2022, 9:08 PM

rriddle updated this revision to Diff 453383. Aug 17 2022, 11:46 AM

rriddle marked 12 inline comments as done.

rriddle added inline comments. Aug 17 2022, 11:47 AM

mlir/lib/Bytecode/Encoding.h
26	kVersion is encoded as a varint now, so that we don't have to change it if we have some burst of changing versions (makes it easier to change version if we don't have a cap looming overhead). I suppose I could switch the general constants to use inline constexpr variables now (given we are on C++17), let me know your preference.
mlir/lib/Bytecode/Reader/BytecodeReader.cpp
502	Yeah, the front of the reader has an index to a string defined in the string section. Updated the comment.
mlir/test/Bytecode/general.mlir
2	Do you plan on reviving that @mehdi_amini ?

rriddle updated this revision to Diff 453397. Aug 17 2022, 12:32 PM

Herald added a subscriber: mgrang. Aug 17 2022, 12:32 PM

Harbormaster completed remote builds in B181815: Diff 453397. Aug 17 2022, 1:51 PM

mehdi_amini added inline comments. Aug 17 2022, 2:53 PM

mlir/test/Bytecode/general.mlir
2	Yeah I should. I had memory that we couldn't reach consensus on it but I may be wrong.

LG, great start :)

jpienaar added inline comments. Aug 17 2022, 3:05 PM

mlir/test/Bytecode/general.mlir
2	I think the only question was whether it should be on by default or not (e.g., how much of a testing tool mlir-opt really is, should it be used in directed testing only, etc.)

rriddle updated this revision to Diff 453794. Aug 18 2022, 2:40 PM

rriddle marked 2 inline comments as done.

Nice, like the encoding change.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58	Why this change?
mlir/lib/Bytecode/Writer/IRNumbering.cpp
81	Could this just be a static function here?

Harbormaster completed remote builds in B182092: Diff 453794. Aug 18 2022, 4:13 PM

rriddle marked 2 inline comments as done. Aug 18 2022, 7:23 PM

rriddle added inline comments.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
58	So that we can load in enums.

rriddle updated this revision to Diff 453864. Aug 18 2022, 7:23 PM

rriddle marked an inline comment as done.

Herald added a subscriber: arphaman. Aug 18 2022, 7:23 PM

Harbormaster completed remote builds in B182139: Diff 453864. Aug 18 2022, 8:02 PM

Closed by commit rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format (authored by rriddle). Aug 22 2022, 12:47 AM

This revision was automatically updated to reflect the committed changes.

rriddle added a commit: rGf3acb54c1b7b: [mlir] Add initial support for a binary serialization format.

rriddle mentioned this in rG93cf0e8a28e8: [mlir] Fix bots after bytecode support was added in D131747. Aug 22 2022, 1:31 AM

gflegar mentioned this in rG1d9b1427f4ea: [mlir][Bazel] Fix bazel build. Aug 22 2022, 3:08 AM

This fails asan https://lab.llvm.org/buildbot/#/builders/5/builds/26955

vitalybuka added inline comments. Aug 22 2022, 9:53 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901	Also a problem: emplace_back may reallocate the container, but the for loop above uses readState, which is a reference to an element of the container.
925	This does pop_back and then reads readState.isIsolatedFromAbove, which refers into the regionStack?

rriddle marked 2 inline comments as done. Aug 22 2022, 9:58 AM

rriddle added inline comments.

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901	This should be fine, given that we always return in this case (i.e. never touch the invalidated reference again).
925	Thanks for catching this. I'm not sure why my local asan build didn't catch this (I'll try nuking and resetting it).

vitalybuka added inline comments. Aug 22 2022, 10:03 AM

mlir/lib/Bytecode/Reader/BytecodeReader.cpp
901	Thanks, I see.
925	"I'm not sure why my local asan build didn't catch this": probably you don't use libc++ or an instrumented libc++? If you can fix it quickly, go for it. If not, please let me know; I have a patch to revert it with related fixes.

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

In D131747#3740334, @vitalybuka wrote:

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

Looks like it's already fixed with 96fd3f2

In D131747#3740368, @vitalybuka wrote:
In D131747#3740335, @mehdi_amini wrote:

In D131747#3740334, @vitalybuka wrote:

In D131747#3740323, @mehdi_amini wrote:

If you need to unbreak a sanitizer bot, we can XFAIL: asan the two tests, this is pretty cheap.

You are not scared of out of bound mem access in alive code?

Define "alive"? This is a new feature that has zero users and we're actively bootstrapping. So no I'm not scared by an out-of-bound here for a couple of days at most.

Sure, I don't know that code. So XFAIL is OK to me if you accept implications. (given that @rriddle failed to reproduce locally, maybe UNSUPPORTED instead, in case some asan setups miss the issue)
But seems revert/reland safe and easy to do as well.

Also if this is the only issue

Trivial fix may work?
bool isIsolatedFromAbove = readState.isIsolatedFromAbove;
 regionStack.pop_back();
 if (isIsolatedFromAbove)
   valueScopes.pop_back();
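
The trivial fix above can be put in a self-contained form to show why the copy matters. The types below are stand-ins and not the reader's real data structures: `RegionReadState` keeps only the one flag, and `valueScopes` is a plain vector of integers.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the reader's per-region state; only the flag is modelled.
struct RegionReadState {
  bool isIsolatedFromAbove;
};

// Pops the innermost region and, if it was isolated from above, its value
// scope as well.
void popRegion(std::vector<RegionReadState> &regionStack,
               std::vector<int> &valueScopes) {
  // Copy the flag first: a reference to regionStack.back() would dangle
  // once pop_back() destroys the element.
  bool isIsolatedFromAbove = regionStack.back().isIsolatedFromAbove;
  regionStack.pop_back();
  if (isIsolatedFromAbove)
    valueScopes.pop_back();
}
```

Reading the flag through a reference after `pop_back()` is the access that the sanitizer bot reported for line 925.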

Yeah, sorry I've been in meetings. I pushed https://github.com/llvm/llvm-project/commit/96fd3f2d5be21ded6ffed0ac75195df04ec679df an hour ago and have been watching the bot to see if that is the only issue.

Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":

XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir

(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

In D131747#3740383, @uweigand wrote:
Hi @rriddle , as of commit https://github.com/llvm/llvm-project/commit/93cf0e8a28e8c682f65d3e5c394d1eb169ca09ce the s390x build bot is still red due to "unexpected success":
XPASS: MLIR::invalid-string_section.mlir
XPASS: MLIR::invalid_attr_type_offset_section.mlir
XPASS: MLIR::invalid_attr_type_section.mlir
XPASS: MLIR::invalid-structure.mlir
XPASS: MLIR::invalid-ir_section.mlir
XPASS: MLIR::invalid-dialect_section.mlir
(see https://lab.llvm.org/buildbot/#/builders/199/builds/8674)

Can these "invalid" tests still legitimately pass even on a big-endian platform? It seems these XFAILs should either be removed or changed into UNSUPPORTED.

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

In D131747#3740387, @rriddle wrote:

@uweigand Thanks for the ping, it's possible big-endian is fine up to the point at which some of the tests fail (I haven't had time to setup a venv to test everything out). UNSUPPORTED is likely a better check than XFAIL there (I just copied from our other s390x broken test)

As of commit df4e637ca7ef4ef17b662845120864921e65bb67 the build bot is green again on s390x. Thanks!

RVP added a subscriber: RVP. Sep 29 2022, 8:29 AM

RVP added inline comments.

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185	Is this parenthesized correctly?

Herald added a reviewer: nicolasvasilache. Sep 29 2022, 8:29 AM

Herald added a subscriber: zero9178.

jpienaar added inline comments. Sep 29 2022, 8:34 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185	This is checking if the value post shift is 0 (and relies on this function being called only when multi byte), what issue did you run into with this?

RVP added inline comments. Sep 29 2022, 8:38 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185	Shouldn't `LLVM_LIKELY` be around the whole condition instead of the shift expression? Isn't `== 0` the likely case and not the shift result being non-zero?

RVP added inline comments. Sep 29 2022, 8:53 AM

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp
185	I didn't see any issues. Was looking at the code and this question popped. I now saw that `emitVarInt` specially handles the common case `(... >> 7) == 0`. Maybe a comment here as well would have avoided the question. Thanks.

Revision Contents

Path

Size

mlir/

docs/

BytecodeFormat.md

314 lines

include/

mlir/

Bytecode/

BytecodeReader.h

34 lines

BytecodeWriter.h

36 lines

IR/

OperationSupport.h

4 lines

Tools/

mlir-opt/

MlirOptMain.h

7 lines

lib/

Bytecode/

CMakeLists.txt

2 lines

Encoding.h

81 lines

Reader/

BytecodeReader.cpp

1222 lines

CMakeLists.txt

11 lines

Writer/

520 lines

11 lines

193 lines

251 lines

1 line

Parser/

CMakeLists.txt

1 line

Parser.cpp

3 lines

Tools/

mlir-opt/

CMakeLists.txt

1 line

MlirOptMain.cpp

40 lines

test/

Bytecode/

general.mlir

34 lines

invalid/

invalid-attr_type_offset_section-large_offset.mlirbc

invalid-attr_type_offset_section-trailing_data.mlirbc

invalid-attr_type_section-index.mlirbc

invalid-attr_type_section-trailing_data.mlirbc

invalid-dialect_section-dialect_string.mlirbc

invalid-dialect_section-opname_dialect.mlirbc

invalid-dialect_section-opname_string.mlirbc

invalid-dialect_section.mlir

19 lines

invalid-ir_section-attr.mlirbc

invalid-ir_section-forwardref.mlirbc

invalid-ir_section-loc.mlirbc

invalid-ir_section-operands.mlirbc

invalid-ir_section-opname.mlirbc

invalid-ir_section-results.mlirbc

invalid-ir_section-successors.mlirbc

invalid-ir_section.mlir

45 lines

invalid-string_section-count.mlirbc

invalid-string_section-large_string.mlirbc

invalid-string_section-no_string.mlirbc

invalid-string_section-trailing_data.mlirbc

invalid-string_section.mlir

26 lines

invalid-structure-producer.mlirbc

1 line

invalid-structure-section-duplicate.mlirbc

invalid-structure-section-id-unknown.mlirbc

invalid-structure-section-length.mlirbc

invalid-structure-section-missing.mlirbc

invalid-structure-version.mlirbc

1 line

invalid-structure.mlir

44 lines

invalid_attr_type_offset_section.mlir

16 lines

invalid_attr_type_section.mlir

16 lines

Diff 454397

mlir/docs/BytecodeFormat.md

This file was added.

				# MLIR Bytecode Format

				This document describes the MLIR bytecode format and its encoding.

				[TOC]

				## Magic Number

				MLIR uses the following four-byte magic number to indicate bytecode files:

				'\[‘M’<sub>8</sub>, ‘L’<sub>8</sub>, ‘ï’<sub>8</sub>, ‘R’<sub>8</sub>\]'

				jpienaar: I just noticed ï and not i, I mean I guess writing this by hand and the other was that we don't actually take text file as bytecode (MĻîŘ just to bikeshed :))
				mehdi_amini: I don't think your bike shed is ASCII though?
				jpienaar: Yeah I marked the comment as done before sending as I wasn't serious (and I did also use an extended ASCII encoding without realizing). But in seriousness: MLÏR wouldn't be a valid dialect name, so this looks good.

				In hex:

				'\[‘4D’<sub>8</sub>, ‘4C’<sub>8</sub>, ‘EF’<sub>8</sub>, ‘52’<sub>8</sub>\]'
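A reader can reject non-bytecode input early by comparing the first four bytes of a buffer against this sequence. The sketch below is a minimal standalone check; `kMagic` and `startsWithMagic` are hypothetical names used for illustration and are not the revision's `isBytecode` implementation.

```cpp
#include <cstddef>
#include <cstdint>

// The four magic bytes listed above: 'M', 'L', 0xEF ('ï'), 'R'.
constexpr std::uint8_t kMagic[4] = {0x4D, 0x4C, 0xEF, 0x52};

// Hypothetical helper: returns true if `data` starts with the magic number.
bool startsWithMagic(const std::uint8_t *data, std::size_t size) {
  if (size < sizeof(kMagic))
    return false; // Too short to contain the magic number at all.
  for (std::size_t i = 0; i < sizeof(kMagic); ++i)
    if (data[i] != kMagic[i])
      return false;
  return true;
}
```

Because the third byte is `0xEF` rather than ASCII `I` (`0x49`), a plain-text buffer that begins with `MLIR` does not match.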

				## Format Overview

				An MLIR Bytecode file is comprised of a byte stream, with a few simple
				structural concepts layered on top.

				### Primitives

				#### Fixed-Width Integers

				mehdi_amini: No fixed64? Is the bet that we'd always use VBR for this?
				rriddle (Author): I just didn't add it in. Technically right now we only use byte, so I dropped the rest and added a TODO to add larger widths as necessary.

				```
				byte ::= `0x00`...`0xFF`
				```

				Fixed width integers are unsigned integers of a known byte size. The values are
				stored in little-endian byte order.

				TODO: Add larger fixed width integers as necessary.

				#### Variable-Width Integers

				jpienaar: OOC why not use it?
				rriddle (Author): The encoding described here, or more commonly referred to as PrefixVarInt, is essentially a variant of LEB. Encoding and decoding are generally faster using the prefix strategy, though. The decode for the prefix variant, for example, is effectively branchless given that you just need to count the trailing zeros, which is an intrinsic on most modern hardware. I've benchmarked a bunch of different strategies and implementations over the past week, using both a corpus of .mlir taken from various projects and just using random distributions of integers. This is what emerged from that testing.
				mehdi_amini: Are you using PrefixVarInt or how does it differ? Is your variant format documented somewhere in the literature? I'd rather have us stick to something existing, I doubt that we'll invent a revolutionary trick here somehow. Can you position what you're proposing against varint-G8IU, PFOR, SIMD-BP12, ... ?
				rriddle (Author): Yeah, it's just PrefixVarInt. I completely missed explicitly saying that when rushing out the doc.

				Variable width integers, or `VarInt`s, provide a compact representation for
				integers. Each encoded VarInt consists of one to nine bytes, which together
				represent a single 64-bit value. The MLIR bytecode utilizes the "PrefixVarInt"
				encoding for VarInts. This encoding is a variant of the
				[LEB128 ("Little-Endian Base 128")](https://en.wikipedia.org/wiki/LEB128)
				encoding, where each byte of the encoding provides up to 7 bits for the value,
				with the remaining bit used to store a tag indicating the number of bytes used
				for the encoding. This means that small unsigned integers (less than 2^7) may be
				stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
				etc.

				The first byte of the encoding includes a length prefix in the low bits. This
				prefix is a bit sequence of '0's followed by a terminal '1', or the end of the
				byte. The number of '0' bits indicates the number of _additional_ bytes, not
				including the prefix byte, used to encode the value. All of the remaining bits
				in the first byte, along with all of the bits in the additional bytes, provide
				the value of the integer. Below are the various possible encodings of the prefix
				byte:

				```
				xxxxxxx1: 7 value bits, the encoding uses 1 byte
				xxxxxx10: 14 value bits, the encoding uses 2 bytes
				xxxxx100: 21 value bits, the encoding uses 3 bytes
				xxxx1000: 28 value bits, the encoding uses 4 bytes
				xxx10000: 35 value bits, the encoding uses 5 bytes
				xx100000: 42 value bits, the encoding uses 6 bytes
				x1000000: 49 value bits, the encoding uses 7 bytes
				10000000: 56 value bits, the encoding uses 8 bytes
				00000000: 64 value bits, the encoding uses 9 bytes
				```
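The prefix table above can be turned into code fairly directly. The following is a minimal sketch of an encoder and decoder, assuming the value bits are stored little-endian after the prefix; `encodeVarInt` and `decodeVarInt` are hypothetical names, and the trailing-zero count is written as a plain loop where a real implementation would likely use a count-trailing-zeros intrinsic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the PrefixVarInt encoding of `value` to `out`.
void encodeVarInt(std::uint64_t value, std::vector<std::uint8_t> &out) {
  // Find the smallest byte count in [1, 8] whose 7 * n value bits fit `value`.
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    if ((value >> (7 * numBytes)) == 0) {
      // Low bits: (numBytes - 1) zeros, the terminal '1', then the value.
      std::uint64_t encoded = ((value << 1) | 1) << (numBytes - 1);
      for (unsigned i = 0; i < numBytes; ++i)
        out.push_back(static_cast<std::uint8_t>(encoded >> (8 * i)));
      return;
    }
  }
  // Nine-byte form: an all-zero prefix byte, then the raw 64-bit value.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Decodes one VarInt starting at data[pos] and advances `pos` past it.
// Returns false if the buffer ends before the encoding does.
bool decodeVarInt(const std::vector<std::uint8_t> &data, std::size_t &pos,
                  std::uint64_t &value) {
  if (pos >= data.size())
    return false;
  // Count the trailing zero bits of the prefix byte (8 if it is all zeros).
  unsigned numZeros = 0;
  while (numZeros < 8 && ((data[pos] >> numZeros) & 1) == 0)
    ++numZeros;
  std::size_t numBytes = numZeros + 1;
  if (data.size() - pos < numBytes)
    return false;
  if (numZeros == 8) {
    // The value occupies the eight bytes after the prefix byte.
    value = 0;
    for (unsigned i = 0; i < 8; ++i)
      value |= static_cast<std::uint64_t>(data[pos + 1 + i]) << (8 * i);
  } else {
    std::uint64_t encoded = 0;
    for (unsigned i = 0; i < numBytes; ++i)
      encoded |= static_cast<std::uint64_t>(data[pos + i]) << (8 * i);
    // Drop the prefix: numZeros zero bits plus the terminal '1'.
    value = encoded >> (numZeros + 1);
  }
  pos += numBytes;
  return true;
}
```

Under these assumptions the value 5 encodes as the single byte `0x0B`, and 128 encodes as the two bytes `0x02 0x02`, whose first byte matches the `xxxxxx10` row of the table.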

				#### Strings

				jpienaar: What is used instead?
				Strings are blobs of characters with an associated length.

				### Sections

				```
				section {
				id: byte
				length: varint
				}
				```

				Sections are a mechanism for grouping data within the bytecode. They enable
				delayed processing, which is useful for out-of-order processing of data,
				lazy-loading, and more. Each section contains a Section ID and a length (which
				allows for skipping over the section).
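To illustrate how the length field allows a section to be skipped, the sketch below indexes every section in a buffer without decoding any payload. It assumes each header is followed immediately by `length` payload bytes and that a repeated section ID is an error; `SectionRange`, `readVarInt`, and `indexSections` are hypothetical names, not part of the revision.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct SectionRange {
  std::size_t offset; // Start of the payload within the buffer.
  std::size_t length; // Payload size in bytes.
};

// Compact PrefixVarInt reader; returns std::nullopt on truncated input.
std::optional<std::uint64_t> readVarInt(const std::vector<std::uint8_t> &buf,
                                        std::size_t &pos) {
  if (pos >= buf.size())
    return std::nullopt;
  unsigned zeros = 0;
  while (zeros < 8 && ((buf[pos] >> zeros) & 1) == 0)
    ++zeros;
  if (buf.size() - pos < zeros + 1)
    return std::nullopt;
  // In the nine-byte form the value starts after the all-zero prefix byte.
  std::size_t first = (zeros == 8) ? pos + 1 : pos;
  unsigned count = (zeros == 8) ? 8 : zeros + 1;
  std::uint64_t raw = 0;
  for (unsigned i = 0; i < count; ++i)
    raw |= static_cast<std::uint64_t>(buf[first + i]) << (8 * i);
  pos += zeros + 1;
  return (zeros == 8) ? raw : raw >> (zeros + 1);
}

// Records where each section's payload lives, starting at `pos`. Returns
// std::nullopt for a truncated header, an overlong length, or a repeated ID.
std::optional<std::map<std::uint8_t, SectionRange>>
indexSections(const std::vector<std::uint8_t> &buf, std::size_t pos) {
  std::map<std::uint8_t, SectionRange> sections;
  while (pos < buf.size()) {
    std::uint8_t id = buf[pos++];
    std::optional<std::uint64_t> length = readVarInt(buf, pos);
    if (!length || *length > buf.size() - pos)
      return std::nullopt;
    SectionRange range{pos, static_cast<std::size_t>(*length)};
    if (!sections.emplace(id, range).second)
      return std::nullopt;
    pos += range.length; // Skip the payload without reading it.
  }
  return sections;
}
```

A reader built this way can defer decoding a section until its contents are first needed, which is the delayed processing the paragraph above refers to.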

				TODO: Sections should also carry an optional alignment. Add this when necessary.

				## MLIR Encoding

				Given the generic structure of MLIR, the bytecode encoding is fairly simple.
				It effectively maps to the core components of MLIR.

				### Top Level Structure

				The top-level structure of the bytecode contains the 4-byte "magic number", a
				version number, a null-terminated producer string, and a list of sections. Each
				section is currently only expected to appear once within a bytecode file.

				mehdi_amini: Producer string?

				```
				bytecode {
				magic: "MLïR",
				version: varint,
				producer: string,
				sections: section[]
				}
				```

				### String Section

				```
				strings {
				numStrings: varint,
				reverseStringLengths: varint[],
				stringData: byte[]
				}
				```

				mehdi_amini: I'm not sure here why these varint are useful?
				rriddle (Author): These are used to know how many attributes/types/operation names need to be parsed.
				mehdi_amini: Actually, following our offline discussion, they are needed to be able match to type/attr back to a dialect right?
				mehdi_amini: Also right now it isn't used at all as far as I can tell.
				jpienaar: Nit: `[]` instead of `*`? (I see the relational database or pointer notations of the latter but I find the former more intuitive)
				mehdi_amini: +1 for `[]` :)
				rriddle (Author): SG, I also like `[]`.

				The string section contains a table of strings referenced within the bytecode,
				more easily enabling string sharing. This section is encoded first with the
				total number of strings, followed by the sizes of each of the individual strings
				in reverse order. The remaining encoding contains a single blob containing all
				of the strings concatenated together.
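One way to read the reverse-order sizes is to walk the blob from its end, so that the first size belongs to the last string. The sketch below assumes exactly that layout, with the strings themselves stored in table order and the sizes already decoded from their varints; `splitStringTable` is a hypothetical helper, and the sketch does not model any terminator bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Splits `blob` into strings given per-string sizes listed last-string-first.
// Returns std::nullopt if the sizes do not add up to exactly the blob size.
std::optional<std::vector<std::string>>
splitStringTable(const std::vector<std::uint64_t> &reverseLengths,
                 std::string_view blob) {
  std::vector<std::string> strings(reverseLengths.size());
  std::size_t end = blob.size();
  // The first size belongs to the last string, so fill the table back to
  // front while walking the blob from its end.
  for (std::size_t i = 0; i < reverseLengths.size(); ++i) {
    std::uint64_t length = reverseLengths[i];
    if (length > end)
      return std::nullopt; // A string would start before the blob does.
    end -= static_cast<std::size_t>(length);
    strings[strings.size() - 1 - i] =
        std::string(blob.substr(end, static_cast<std::size_t>(length)));
  }
  if (end != 0)
    return std::nullopt; // Leftover bytes at the front of the blob.
  return strings;
}
```

For the sizes `{4, 7}` and the blob `builtinfunc`, this yields the table `{"builtin", "func"}`.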

				### Dialect Section

				The dialect section of the bytecode contains all of the dialects referenced
				within the encoded IR, and some information about the components of those
				dialects that were also referenced.
				jpienaar: Link to encoding section?

				```
				dialect_section {
				numDialects: varint,
				dialectNames: varint[],
				opNames: op_name_group[]
				}

				op_name_group {
				dialect: varint,
				numOpNames: varint,
				opNames: varint[]
				}
				```
				mehdi_amini: Worth mentioning: the first section can't be decoded without the second one as the elements in the array of attrs/types don't include a size.
				mehdi_amini: What is `kAsmForm` ? Seems to refer to an enum not documented here.

				Dialects are encoded as indexes to the name string within the string section.
				Operation names are encoded in groups by dialect, with each group containing the
				dialect, the number of operation names, and the array of indexes to each name
				within the string section.

				mehdi_amini: I'm not sure yet how we dispatch to a dialect for loading a type/attribute
				rriddle (Author): We chatted a little about this offline. Attributes and Types are grouped by dialect, with each grouping emitted in the same order as the dialects in the dialect section. This allows for us to know which dialect an attribute/type belongs to based on its index (i.e. we could know attributes 0-5 are the builtin dialect, 6-9 are the func dialect and so on).

				### Attribute/Type Sections

				Attributes and types are encoded using two [sections](#sections), one section
				(`attr_type_section`) containing the actual encoded representation, and another
				section (`attr_type_offset_section`) containing the offsets of each encoded
				attribute/type into the previous section. This structure allows for attributes
				and types to always be lazily loaded on demand.

				mehdi_amini: Basically this is differential encoding of the offset table, but that means you need to decompress the table here IIUC. Accessing the last attribute requires to read the entire table. (I assume we would do it once and for-all in the "BitcodeReaderContext" or whatever you're naming it)
				rriddle (Author): Yeah. We do a single pass to initialize the bytecode structure of all attributes/types, which indicates where the data is stored in the bytecode. Reading lazily after that is trivial, because we just read the previously computed data directly.

				```
				attr_type_section {
				attrs: attribute[],
				types: type[]
				}

				attr_type_offset_section {
				numAttrs: varint,
				numTypes: varint,
				offsets: attr_type_offset_group[]
				}

				attr_type_offset_group {
				dialect: varint,
				numElements: varint,
				offsets: varint[] // (offset << 1) | (hasCustomEncoding)
				}

				attribute {
				encoding: ...
				}
				type {
				encoding: ...
				}
				```

				mehdi_amini: Maybe make it explicit: `, this allows to associate an attribute back to a dialect without including a dialect reference in each type/attr entry.`
				mehdi_amini: That seems overly conservative to me: what about just storing the part after the dot and add a varint for the dialect ID?
				rriddle (Author): Agreed, I need to look into this. I went with the current thing because it was easier to bootstrap (e.g. I don't have to worry about the difference between builtin and non-builtin).
				mehdi_amini: Can't you capture this as a TODO at the end of this paragraph?

				Each `offset` in the `attr_type_offset_section` above encodes the size of the
				encoding for the attribute or type, together with a flag indicating whether the
				encoding uses the textual assembly format or a custom bytecode encoding. We
				avoid using the direct offset into the `attr_type_section`, as smaller relative
				offsets provide more effective compression. Attributes and types are grouped by
				dialect, with each `attr_type_offset_group` in the offset section containing the
				corresponding parent dialect, number of elements, and offsets for each element
				within the group.
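Because each entry stores a size rather than an absolute position, a reader has to make one pass over the offset groups to recover where every attribute or type begins. The sketch below shows that pass under the assumption that entries are laid out back to back and share a single running offset; `OffsetGroup`, `EntryInfo`, and `resolveOffsets` are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <vector>

// One decoded attr_type_offset_group.
struct OffsetGroup {
  std::uint64_t dialect;              // Index into the dialect section.
  std::vector<std::uint64_t> entries; // Each: (size << 1) | hasCustomEncoding.
};

// What a reader needs in order to load one attribute or type lazily.
struct EntryInfo {
  std::uint64_t dialect;  // Owning dialect, inherited from the group.
  std::uint64_t offset;   // Absolute offset into the attr_type_section.
  std::uint64_t size;     // Encoded size in bytes.
  bool hasCustomEncoding; // False means the textual assembly fallback.
};

// Single pass that turns per-entry sizes into absolute offsets.
std::vector<EntryInfo> resolveOffsets(const std::vector<OffsetGroup> &groups) {
  std::vector<EntryInfo> table;
  std::uint64_t offset = 0;
  for (const OffsetGroup &group : groups) {
    for (std::uint64_t entry : group.entries) {
      std::uint64_t size = entry >> 1;
      table.push_back({group.dialect, offset, size, (entry & 1) != 0});
      offset += size; // The next entry starts where this one ends.
    }
  }
  return table;
}
```

After this pass, loading entry `i` needs no further decoding of the offset table, and its dialect is known from the group it came from rather than from a per-entry dialect reference.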

				#### Attribute/Type Encodings
				mehdi_amini: I assume this field is needed to allow some level of lazy loading / random access? I'm not sure yet...
				rriddle (Author): This field is the value index of the first result. Every value gets a number, which is what gets referred to in the `operands` list. For multiple results, we know the value number are consecutive, so we just need to know the first one.
				mehdi_amini: Right, but I'm still not sure why this needs to be encoded: if you load operations in order, you could just use an ever incrementing number for each Value.
				rriddle (Author): Originally I was thinking that you can't do that if you load lazily, but if I use your other suggestion of doing per-isolated region numbering it should be possible. This will be much nicer, thanks!

				In the abstract, an attribute/type is encoded in one of two possible ways: via
				its assembly format, or via a custom dialect defined encoding.

				##### Assembly Format Fallback

				In the case where a dialect does not define a method for encoding the attribute
				or type, the textual assembly format of that attribute or type is used as a
				fallback. For example, a type of `!bytecode.type` would be encoded as the null
				terminated string "!bytecode.type". This ensures that every attribute and type
				may be encoded, even if the owning dialect has not yet opted in to a more
				efficient serialization.
				mehdi_amini: Having the regions in-line makes it hard/impossible to lazily load IR I think? (at least not without decoding the entire IR section).
				rriddle (Author): Yeah. The idea I have right now is that the op encoding mask will indicate if the regions are inline or out-of-line. That way we just dispatch to two different code paths depending on the encoding.
				mehdi_amini: I think we should think about isolated region from the get go: you don't document the value numbering in the doc but I think we can use the same "local scope" as the textual parser to ensure that the value IDs stays small (and so use less varint space).
				rriddle (Author): Great suggestion!

				TODO: We shouldn't redundantly encode the dialect name here, we should use a
				reference to the parent dialect instead.

				##### Dialect Defined Encoding

				jpienaar: And 7 of 8 optional elements used?
				mehdi_amini: Actually 6? `firstResultIndex` and `numResults` seem coupled right?
				rriddle (Author): Yeah, right now the mask uses 6 out of 8 possible bits. Whenever we do lazy loading that may bump up to 7 bits (we could get around that by encoding the "are the regions lazy" bit a different way). We could of course change from a mask to a set of values for each possible encoding type.

				TODO: This is not yet supported.

				### IR Section

				The IR section contains the encoded form of operations within the bytecode.
				jpienaar: So location is either previous or new? Unspecified doesn't mean unknown but previous. That's nice. The only other common case I could think of is successive lines in a file but that would take a bit.
				rriddle (Author): Yeah, the file locations encoding is gonna be interesting. I'm hoping that when builtin attributes have custom encodings it ends up being mostly okay (should just be a string index+two small varints for the line/col). This is something we will likely want to play around with, but thankfully it's easy to test (I have a huge IR file that inherits locations from the file).

				#### Operation Encoding

				```
				op {
				name: varint,
				encodingMask: byte,
				location: varint,

				attrDict: varint?,

				numResults: varint?,
				resultTypes: varint[],

				numOperands: varint?,
				operands: varint[],

				numSuccessors: varint?,
				successors: varint[],

				regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)
				regions: region[]
				}
				```

				mehdi_amini: Is this really the common case that (other than "unknown") the locations are repeating in sequence? I am not convinced by this choice right now because it will make future lazy loading harder (you need to stream back to find the previously defined location).
				rriddle (Author): I don't think it would make lazy-loading bad, given that we could encode the last location at the start of the region. I'm going to drop this behavior though, given that I'm not sure how many cases in practice it would help. This would also free up this optional behavior for something else (if that something else was more efficient for real-world use cases).

				The encoding of an operation is important because this is generally the most
				commonly appearing structure in the bytecode. A single encoding is used for
				every type of operation. Given this prevalence, many of the fields of an
				operation are optional. The `encodingMask` field is a bitmask which indicates
				which of the components of the operation are present.
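A bitmask of this kind might be computed and queried as sketched below. The bit positions are purely hypothetical, since the format description does not list them; `OpEncodingBit`, `OpShape`, `computeEncodingMask`, and `hasComponent` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bit assignments, chosen for illustration only; the format
// description does not specify the actual values.
enum OpEncodingBit : std::uint8_t {
  kHasAttrs = 1 << 0,
  kHasResults = 1 << 1,
  kHasOperands = 1 << 2,
  kHasSuccessors = 1 << 3,
  kHasRegions = 1 << 4,
};

// The parts of an operation that decide which optional fields are written.
struct OpShape {
  bool hasAttrs;
  std::size_t numResults;
  std::size_t numOperands;
  std::size_t numSuccessors;
  std::size_t numRegions;
};

// Writer side: set one bit per optional component that is present.
std::uint8_t computeEncodingMask(const OpShape &op) {
  std::uint8_t mask = 0;
  if (op.hasAttrs)
    mask |= kHasAttrs;
  if (op.numResults != 0)
    mask |= kHasResults;
  if (op.numOperands != 0)
    mask |= kHasOperands;
  if (op.numSuccessors != 0)
    mask |= kHasSuccessors;
  if (op.numRegions != 0)
    mask |= kHasRegions;
  return mask;
}

// Reader side: test a bit before attempting to parse the matching field.
bool hasComponent(std::uint8_t mask, OpEncodingBit bit) {
  return (mask & bit) != 0;
}
```

With this scheme an operation that has no attributes, successors, or regions pays one mask byte and writes none of the corresponding optional fields.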

				##### Location

				jpienaar: So we could have a region with 0 blocks here that is considered not empty?
				rriddle (Author): Zero block regions should be marked as empty. I used codes for various "bool" like things just to make it easier to change things later on. E.g. if we wanted to change the way regions are encoded we could just have a new "kRegion2" code. Though that only matters after we start versioning things.

				The location is encoded as the index of the location within the attribute table.

				mehdi_amini: Do we need this indirection here? You didn't provide one for the operation content.

				##### Attributes

				If the operation has attributes, the index of the operation attribute dictionary
				within the attribute table is encoded.

				mehdi_amini: When is "region empty" useful? Is it legal for an operation to have an empty region?
				rriddle (Author): Yeah? e.g. External functions have empty regions. Almost every operation that has a region that can optionally be filled use an empty region, given that regions can't be dynamically added after op construction. Cleaned up this section though, we don't need the leading code, we can just use the block count.

				##### Results

				If the operation has results, the number of results and the indexes of the
				result types within the type table are encoded.

				##### Operands

				If the operation has operands, the number of operands and the value index of
				each operand is encoded. This value index is the relative ordering of the
				definition of that value from the start of the first ancestor isolated region.

				mehdi_amini: Please document numValues :)

				mehdi_amini: block_element ?

				##### Successors

				If the operation has successors, the number of successors and the indexes of the
				successor blocks within the parent region are encoded.
				mehdi_amini: I don't get this, why multiple `block_arguments` blocks? What about something like: block { encoding: varint, // (numOps << 1) | (hasBlockArgs) arguments: block_arguments?, // Optional based on encoding ops : op[] } block_arguments { firstArgIndex: varint, numArgs: varint?, args: block_argument[] } block_argument { typeIndexAndHasLoc: varint, // (typeIndex << 1) | (hasLoc) location: varint? }
				rriddle (Author): Nice! I figured you would come up with something better ;)

				##### Regions

				If the operation has regions, the number of regions and if the regions are
				isolated from above are encoded together in a single varint. Afterwards, each
				region is encoded inline.
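Folding a count and a boolean into one varint, as `regionEncoding` does with `(numRegions << 1) | (isIsolatedFromAbove)`, amounts to a shift and a mask. The helpers below are a minimal sketch with hypothetical names; the same packing applies to the block `encoding` field, which pairs `numOps` with `hasBlockArgs`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Packs a count and a one-bit flag as (count << 1) | flag.
std::uint64_t packCountAndFlag(std::uint64_t count, bool flag) {
  assert((count >> 63) == 0 && "count must leave room for the flag bit");
  return (count << 1) | (flag ? 1u : 0u);
}

// Splits a packed value back into {count, flag}.
std::pair<std::uint64_t, bool> unpackCountAndFlag(std::uint64_t packed) {
  return {packed >> 1, (packed & 1) != 0};
}
```

Two regions that are isolated from above pack to the value 5, which still fits in a one-byte varint; the flag costs one bit of the count's range rather than a separate byte.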

				#### Region Encoding

				```
				region {
				numBlocks: varint,

				numValues: varint?,
				blocks: block[]
				}
				```

				A region is encoded first with the number of blocks within. If the region is
				non-empty, the number of values defined directly within the region is encoded,
				followed by the blocks of the region.

				#### Block Encoding

				```
				block {
				encoding: varint, // (numOps << 1) | (hasBlockArgs)
				arguments: block_arguments?, // Optional based on encoding
				ops : op[]
				}

				block_arguments {
				numArgs: varint?,
				args: block_argument[]
				}

				block_argument {
				typeIndex: varint,
				location: varint
				}
				```

				A block is encoded with an array of operations and block arguments. The first
				field is an encoding that combines the number of operations in the block, with a
				flag indicating if the block has arguments.

mlir/include/mlir/Bytecode/BytecodeReader.h

This file was added.

				//===- BytecodeReader.h - MLIR Bytecode Reader ------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines interfaces to read MLIR bytecode files/streams.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_BYTECODE_BYTECODEREADER_H
				#define MLIR_BYTECODE_BYTECODEREADER_H

				#include "mlir/IR/AsmState.h"
				#include "mlir/Support/LLVM.h"

				namespace llvm {
				class MemoryBufferRef;
				} // namespace llvm

				namespace mlir {
				/// Returns true if the given buffer starts with the magic bytes that signal
				/// MLIR bytecode.
				bool isBytecode(llvm::MemoryBufferRef buffer);

				/// Read the operations defined within the given memory buffer, containing MLIR
				/// bytecode, into the provided block.
				LogicalResult readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
				                               const ParserConfig &config);
				} // namespace mlir

				#endif // MLIR_BYTECODE_BYTECODEREADER_H

mlir/include/mlir/Bytecode/BytecodeWriter.h

This file was added.

				//===- BytecodeWriter.h - MLIR Bytecode Writer ------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines interfaces to write MLIR bytecode files/streams.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_BYTECODE_BYTECODEWRITER_H
				#define MLIR_BYTECODE_BYTECODEWRITER_H

				#include "mlir/Support/LLVM.h"
				#include "llvm/ADT/StringRef.h"

				namespace mlir {
				class Operation;

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				/// Write the bytecode for the given operation to the provided output stream.
				/// For streams where it matters, the given stream should be in "binary" mode.
				/// `producer` is an optional string that can be used to identify the producer
				/// of the bytecode when reading. It has no functional effect on the bytecode
				/// serialization.
				void writeBytecodeToFile(Operation *op, raw_ostream &os,
				                         StringRef producer = "MLIR" LLVM_VERSION_STRING);

				} // namespace mlir

				#endif // MLIR_BYTECODE_BYTECODEWRITER_H
				jpienaar: This feels a little bit weird ... I'd almost expect like OpPrintingFlags having the config be separate from Operation* being printed. But perhaps it makes more sense below.
				rriddle (Author): There was some reason why I did this before, but I forget now. I just dropped it.

mlir/include/mlir/IR/OperationSupport.h

Show First 20 Lines • Show All 636 Lines • ▼ Show 20 Lines	struct OperationState {
/// Regions that the op will hold.		/// Regions that the op will hold.
SmallVector<std::unique_ptr<Region>, 1> regions;		SmallVector<std::unique_ptr<Region>, 1> regions;

public:		public:
OperationState(Location location, StringRef name);		OperationState(Location location, StringRef name);
OperationState(Location location, OperationName name);		OperationState(Location location, OperationName name);

OperationState(Location location, OperationName name, ValueRange operands,		OperationState(Location location, OperationName name, ValueRange operands,
TypeRange types, ArrayRef<NamedAttribute> attributes,		TypeRange types, ArrayRef<NamedAttribute> attributes = {},
BlockRange successors = {},		BlockRange successors = {},
MutableArrayRef<std::unique_ptr<Region>> regions = {});		MutableArrayRef<std::unique_ptr<Region>> regions = {});
OperationState(Location location, StringRef name, ValueRange operands,		OperationState(Location location, StringRef name, ValueRange operands,
TypeRange types, ArrayRef<NamedAttribute> attributes,		TypeRange types, ArrayRef<NamedAttribute> attributes = {},
BlockRange successors = {},		BlockRange successors = {},
MutableArrayRef<std::unique_ptr<Region>> regions = {});		MutableArrayRef<std::unique_ptr<Region>> regions = {});

void addOperands(ValueRange newOperands);		void addOperands(ValueRange newOperands);

void addTypes(ArrayRef<Type> newTypes) {		void addTypes(ArrayRef<Type> newTypes) {
types.append(newTypes.begin(), newTypes.end());		types.append(newTypes.begin(), newTypes.end());
}		}
▲ Show 20 Lines • Show All 294 Lines • Show Last 20 Lines

mlir/include/mlir/Tools/mlir-opt/MlirOptMain.h

	Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
	/// "expected-(error\|note\|remark\|warning)" are parsed in the input and matched			/// "expected-(error\|note\|remark\|warning)" are parsed in the input and matched
	/// against emitted diagnostics.			/// against emitted diagnostics.
	/// - verifyPasses enables the IR verifier in-between each pass in the pipeline.			/// - verifyPasses enables the IR verifier in-between each pass in the pipeline.
	/// - allowUnregisteredDialects allows to parse and create operation without			/// - allowUnregisteredDialects allows to parse and create operation without
	/// registering the Dialect in the MLIRContext.			/// registering the Dialect in the MLIRContext.
	/// - preloadDialectsInContext will trigger the upfront loading of all			/// - preloadDialectsInContext will trigger the upfront loading of all
	/// dialects from the global registry in the MLIRContext. This option is			/// dialects from the global registry in the MLIRContext. This option is
	/// deprecated and will be removed soon.			/// deprecated and will be removed soon.
				/// - emitBytecode will generate bytecode output instead of text.
	LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,			LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
	std::unique_ptr<llvm::MemoryBuffer> buffer,			std::unique_ptr<llvm::MemoryBuffer> buffer,
	const PassPipelineCLParser &passPipeline,			const PassPipelineCLParser &passPipeline,
	DialectRegistry &registry, bool splitInputFile,			DialectRegistry &registry, bool splitInputFile,
	bool verifyDiagnostics, bool verifyPasses,			bool verifyDiagnostics, bool verifyPasses,
	bool allowUnregisteredDialects,			bool allowUnregisteredDialects,
	bool preloadDialectsInContext = false);			bool preloadDialectsInContext = false,
				bool emitBytecode = false);

	/// Support a callback to setup the pass manager.			/// Support a callback to setup the pass manager.
	/// - passManagerSetupFn is the callback invoked to setup the pass manager to			/// - passManagerSetupFn is the callback invoked to setup the pass manager to
	/// apply on the loaded IR.			/// apply on the loaded IR.
	LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,			LogicalResult MlirOptMain(llvm::raw_ostream &outputStream,
	std::unique_ptr<llvm::MemoryBuffer> buffer,			std::unique_ptr<llvm::MemoryBuffer> buffer,
	PassPipelineFn passManagerSetupFn,			PassPipelineFn passManagerSetupFn,
	DialectRegistry &registry, bool splitInputFile,			DialectRegistry &registry, bool splitInputFile,
	bool verifyDiagnostics, bool verifyPasses,			bool verifyDiagnostics, bool verifyPasses,
	bool allowUnregisteredDialects,			bool allowUnregisteredDialects,
	bool preloadDialectsInContext = false);			bool preloadDialectsInContext = false,
				bool emitBytecode = false);

	/// Implementation for tools like `mlir-opt`.			/// Implementation for tools like `mlir-opt`.
	/// - toolName is used for the header displayed by `--help`.			/// - toolName is used for the header displayed by `--help`.
	/// - registry should contain all the dialects that can be parsed in the source.			/// - registry should contain all the dialects that can be parsed in the source.
	/// - preloadDialectsInContext will trigger the upfront loading of all			/// - preloadDialectsInContext will trigger the upfront loading of all
	/// dialects from the global registry in the MLIRContext. This option is			/// dialects from the global registry in the MLIRContext. This option is
	/// deprecated and will be removed soon.			/// deprecated and will be removed soon.
	LogicalResult MlirOptMain(int argc, char **argv, llvm::StringRef toolName,			LogicalResult MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
	...

mlir/lib/Bytecode/CMakeLists.txt

This file was added.

				add_subdirectory(Reader)
				add_subdirectory(Writer)

mlir/lib/Bytecode/Encoding.h

This file was added.

				//===- Encoding.h - MLIR binary format encoding information -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header defines enum values describing the structure of MLIR bytecode
				// files.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LIB_MLIR_BYTECODE_ENCODING_H
				#define LIB_MLIR_BYTECODE_ENCODING_H

				#include <cstdint>

				namespace mlir {
				namespace bytecode {
				//===----------------------------------------------------------------------===//
				// General constants
				//===----------------------------------------------------------------------===//

				enum {
				/// The current bytecode version.
				jpienaar: uint8_t here too? (I mean with 0 it doesn't matter)
				rriddle (author): kVersion is encoded as a varint now, so that we don't have to change it if we have some burst of changing versions (makes it easier to change version if we don't have a cap looming overhead). I suppose I could switch the general constants to use inline constexpr variables now (given we are on C++17), let me know your preference.
				kVersion = 0,
				};

				//===----------------------------------------------------------------------===//
				mehdi_amini: I'm not clear on how you manage "codes", or what "builtin section codes" are.
				// Sections
				//===----------------------------------------------------------------------===//

				namespace Section {
				enum ID : uint8_t {
				/// This section contains strings referenced within the bytecode.
				kString = 0,

				/// This section contains the dialects referenced within an IR module.
				kDialect = 1,

				/// This section contains the attributes and types referenced within an IR
				/// module.
				kAttrType = 2,

				/// This section contains the offsets for the attribute and types within the
				/// AttrType section.
				kAttrTypeOffset = 3,

				/// This section contains the list of operations serialized into the bytecode,
				/// and their nested regions/operations.
				kIR = 4,

				/// The total number of section types.
				kNumSections = 5,
				};
				} // namespace Section

				mehdi_amini: "a" top level operation... We parse in a block so we should have the ability to have multiple I think.
				//===----------------------------------------------------------------------===//
				// IR Section
				//===----------------------------------------------------------------------===//

				/// This enum represents a mask of all of the potential components of an
				/// operation. This mask is used when encoding an operation to indicate which
				/// components are present in the bytecode.
				namespace OpEncodingMask {
				enum : uint8_t {
				// clang-format off
				kHasAttrs = 0b00000001,
				kHasResults = 0b00000010,
				kHasOperands = 0b00000100,
				kHasSuccessors = 0b00001000,
				kHasInlineRegions = 0b00010000,
				// clang-format on
				};
				} // namespace OpEncodingMask

				} // namespace bytecode
				} // namespace mlir

				jpienaar: We can now use binary literals (not sure if it is more readable, but I was reminded of it)
				#endif
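
A minimal sketch, not taken from this patch, of how the `OpEncodingMask` byte defined above could be built on the writer side and tested on the reader side. `OpSummary`, `computeOpMask`, and `hasComponent` are hypothetical names introduced only for illustration; the mask values are copied from the enum in this header.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the OpEncodingMask enum in Encoding.h.
namespace OpEncodingMask {
enum : uint8_t {
  kHasAttrs = 0b00000001,
  kHasResults = 0b00000010,
  kHasOperands = 0b00000100,
  kHasSuccessors = 0b00001000,
  kHasInlineRegions = 0b00010000,
};
} // namespace OpEncodingMask

// Hypothetical stand-in for an operation: records only which components exist.
struct OpSummary {
  bool hasAttrs;
  bool hasResults;
  bool hasOperands;
  bool hasSuccessors;
  bool hasRegions;
};

// Build the mask byte that would precede the encoded components.
uint8_t computeOpMask(const OpSummary &op) {
  uint8_t mask = 0;
  if (op.hasAttrs)
    mask |= OpEncodingMask::kHasAttrs;
  if (op.hasResults)
    mask |= OpEncodingMask::kHasResults;
  if (op.hasOperands)
    mask |= OpEncodingMask::kHasOperands;
  if (op.hasSuccessors)
    mask |= OpEncodingMask::kHasSuccessors;
  if (op.hasRegions)
    mask |= OpEncodingMask::kHasInlineRegions;
  return mask;
}

// Test whether a component is present before attempting to read it.
bool hasComponent(uint8_t mask, uint8_t bit) { return (mask & bit) != 0; }
```

With such a mask, a reader only attempts to parse the components whose bits are set, so absent components cost no bytes beyond the mask itself.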

mlir/lib/Bytecode/Reader/BytecodeReader.cpp

This file was added.

				//===- BytecodeReader.cpp - MLIR Bytecode Reader --------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				// TODO: Support for big-endian architectures.
				// TODO: Properly preserve use lists of values.
				jpienaar: Nit: I'd move this lower, I expect documentation here not todos :)

				#include "mlir/Bytecode/BytecodeReader.h"
				#include "../Encoding.h"
				#include "mlir/AsmParser/AsmParser.h"
				#include "mlir/IR/BuiltinDialect.h"
				#include "mlir/IR/BuiltinOps.h"
				#include "mlir/IR/OpImplementation.h"
				#include "mlir/IR/Verifier.h"
				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/ScopeExit.h"
				#include "llvm/ADT/SmallString.h"
				jpienaar: mlir-bytecode-reader ?
				#include "llvm/Support/MemoryBufferRef.h"
				#include "llvm/Support/SaveAndRestore.h"

				#define DEBUG_TYPE "mlir-bytecode-reader"

				using namespace mlir;

				/// Stringify the given section ID.
				static std::string toString(bytecode::Section::ID sectionID) {
				switch (sectionID) {
				case bytecode::Section::kString:
				return "String (0)";
				case bytecode::Section::kDialect:
				return "Dialect (1)";
				case bytecode::Section::kAttrType:
				return "AttrType (2)";
				case bytecode::Section::kAttrTypeOffset:
				return "AttrTypeOffset (3)";
				case bytecode::Section::kIR:
				return "IR (4)";
				default:
				return ("Unknown (" + Twine(sectionID) + ")").str();
				}
				}

				//===----------------------------------------------------------------------===//
				// EncodingReader
				//===----------------------------------------------------------------------===//

				namespace {
				class EncodingReader {
				public:
				explicit EncodingReader(ArrayRef<uint8_t> contents, Location fileLoc)
				: dataIt(contents.data()), dataEnd(contents.end()), fileLoc(fileLoc) {}
				explicit EncodingReader(StringRef contents, Location fileLoc)
				: EncodingReader({reinterpret_cast<const uint8_t *>(contents.data()),
				contents.size()},
				jpienaar: Why this change?
				rriddle (author): So that we can load in enums.
				fileLoc) {}

				/// Returns true if the entire section has been read.
				bool empty() const { return dataIt == dataEnd; }

				/// Returns the remaining size of the bytecode.
				size_t size() const { return dataEnd - dataIt; }

				/// Emit an error using the given arguments.
				template <typename... Args>
				LogicalResult emitError(Args &&...args) const {
				return ::emitError(fileLoc).append(std::forward<Args>(args)...);
				}

				/// Parse a single byte from the stream.
				template <typename T>
				LogicalResult parseByte(T &value) {
				if (empty())
				return emitError("attempting to parse a byte at the end of the bytecode");
				value = static_cast<T>(*dataIt++);
				return success();
				}
				/// Parse a range of bytes of 'length' into the given result.
				LogicalResult parseBytes(size_t length, ArrayRef<uint8_t> &result) {
				if (length > size()) {
				return emitError("attempting to parse ", length, " bytes when only ",
				size(), " remain");
				}
				result = {dataIt, length};
				dataIt += length;
				return success();
				}
				/// Parse a range of bytes of 'length' into the given result, which can be
				/// assumed to be large enough to hold `length`.
				LogicalResult parseBytes(size_t length, uint8_t *result) {
				if (length > size()) {
				return emitError("attempting to parse ", length, " bytes when only ",
				size(), " remain");
				}
				memcpy(result, dataIt, length);
				dataIt += length;
				return success();
				}

				/// Parse a variable length encoded integer from the byte stream. The first
				/// encoded byte contains a prefix in the low bits indicating the encoded
				/// length of the value. This length prefix is a bit sequence of '0's followed
				/// by a '1'. The number of '0' bits indicates the number of _additional_ bytes
				/// (not including the prefix byte). All remaining bits in the first byte,
				/// along with all of the bits in additional bytes, provide the value of the
				/// integer encoded in little-endian order.
				LogicalResult parseVarInt(uint64_t &result) {
				// Parse the first byte of the encoding, which contains the length prefix.
				if (failed(parseByte(result)))
				return failure();

				// Handle the overwhelmingly common case where the value is stored in a
				// single byte. In this case, the first bit is the `1` marker bit.
				if (LLVM_LIKELY(result & 1)) {
				result >>= 1;
				return success();
				}
				jpienaar: Nit: I'm more used to null-terminated, with NUL being the character name.

				// Handle the overwhelmingly uncommon case where the value required all 8
				// bytes (i.e. a really really big number). In this case, the marker byte is
				// all zeros: `00000000`.
				if (LLVM_UNLIKELY(result == 0))
				return parseBytes(sizeof(result), reinterpret_cast<uint8_t *>(&result));
				return parseMultiByteVarInt(result);
				}

				/// Parse a variable length encoded integer whose low bit is used to encode an
				/// unrelated flag, i.e: `(integerValue << 1) | (flag ? 1 : 0)`.
				LogicalResult parseVarIntWithFlag(uint64_t &result, bool &flag) {
				if (failed(parseVarInt(result)))
				return failure();
				flag = result & 1;
				result >>= 1;
				mehdi_amini: You should sanity check sectionID here I think (and make the argument type the right enum in the API)
				return success();
				}

				/// Skip the first `length` bytes within the reader.
				LogicalResult skipBytes(size_t length) {
				if (length > size()) {
				return emitError("attempting to skip ", length, " bytes when only ",
				size(), " remain");
				}
				dataIt += length;
				return success();
				}

				mehdi_amini: Document noinline please.
				rriddle (author): I thought I did: "This method is marked noinline to avoid pessimizing the common case of single byte encoding." Tried to make it more obvious.
				/// Parse a null-terminated string into `result` (without including the NUL
				/// terminator).
				LogicalResult parseNullTerminatedString(StringRef &result) {
				const char *startIt = (const char *)dataIt;
				const char *nulIt = (const char *)memchr(startIt, 0, size());
				if (!nulIt)
				return emitError(
				"malformed null-terminated string, no null character found");
				mehdi_amini: `assert(numBytes > 0 && numBytes <= 7);` ?

				result = StringRef(startIt, nulIt - startIt);
				dataIt = (const uint8_t *)nulIt + 1;
				return success();
				}

				/// Parse a section header, placing the kind of section in `sectionID` and the
				/// contents of the section in `sectionData`.
				LogicalResult parseSection(bytecode::Section::ID &sectionID,
				ArrayRef<uint8_t> &sectionData) {
				size_t length;
				if (failed(parseByte(sectionID)) || failed(parseVarInt(length)))
				return failure();
				if (sectionID >= bytecode::Section::kNumSections)
				return emitError("invalid section ID: ", unsigned(sectionID));

				mehdi_amini: Does this work on a big endian machine?
				rriddle (author): Right now, no. I need to setup the proper little -> native conversions. I'm deferring that to a follow up because I have to setup a virtual machine for big endian (which is annoying/time consuming). I also need to fix some things in the textual format related to big endian as well.
				// Parse the actual section data now that we have its length.
				return parseBytes(length, sectionData);
				}

				private:
				/// Parse a variable length encoded integer from the byte stream. This method
				/// is a fallback when the number of bytes used to encode the value is greater
				/// than 1, but less than the max (9). The provided `result` value can be
				/// assumed to already contain the first byte of the value.
				/// NOTE: This method is marked noinline to avoid pessimizing the common case
				/// of single byte encoding.
				LLVM_ATTRIBUTE_NOINLINE LogicalResult parseMultiByteVarInt(uint64_t &result) {
				// Count the number of trailing zeros in the marker byte, this indicates the
				// number of trailing bytes that are part of the value. We use `uint32_t`
				// here because we only care about the first byte, and so that we actually
				// get ctz intrinsic calls when possible (the `uint8_t` overload uses a loop
				// implementation).
				uint32_t numBytes =
				llvm::countTrailingZeros<uint32_t>(result, llvm::ZB_Undefined);
				assert(numBytes > 0 && numBytes <= 7 &&
				"unexpected number of trailing zeros in varint encoding");

				// Parse in the remaining bytes of the value.
				if (failed(parseBytes(numBytes, reinterpret_cast<uint8_t *>(&result) + 1)))
				return failure();

				// Shift out the low-order bits that were used to mark how the value was
				// encoded.
				result >>= (numBytes + 1);
				return success();
				}

				/// The current data iterator, and an iterator to the end of the buffer.
				const uint8_t *dataIt, *dataEnd;

				/// A location for the bytecode used to report errors.
				Location fileLoc;
				};
				} // namespace
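
A standalone sketch of the varint scheme described in the `parseVarInt` comment above, written byte by byte so that it does not depend on host endianness. It is not the patch's implementation: `decodeVarInt` and `encodeVarInt` are hypothetical helpers, and the encoder is inferred from the reader's documentation rather than taken from the writer in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Decode one varint starting at `pos`, advancing `pos` past it on success.
// Returns std::nullopt if the buffer ends before the value is complete.
std::optional<uint64_t> decodeVarInt(const std::vector<uint8_t> &bytes,
                                     size_t &pos) {
  if (pos >= bytes.size())
    return std::nullopt;
  uint8_t first = bytes[pos];

  // The count of trailing zero bits in the first byte is the number of
  // additional bytes. An all-zero first byte means 8 additional bytes.
  unsigned numExtra = 0;
  while (numExtra < 8 && !(first & (1u << numExtra)))
    ++numExtra;
  if (bytes.size() - pos - 1 < numExtra)
    return std::nullopt;

  uint64_t value = 0;
  if (numExtra == 8) {
    // The 8 trailing bytes hold the full 64-bit value, little-endian.
    for (unsigned i = 0; i < 8; ++i)
      value |= uint64_t(bytes[pos + 1 + i]) << (8 * i);
    pos += 9;
    return value;
  }

  // Assemble the bytes little-endian, then shift out the length prefix.
  value = first;
  for (unsigned i = 0; i < numExtra; ++i)
    value |= uint64_t(bytes[pos + 1 + i]) << (8 * (i + 1));
  value >>= (numExtra + 1);
  pos += numExtra + 1;
  return value;
}

// Inferred inverse of the decoder: use the fewest bytes that fit the value.
std::vector<uint8_t> encodeVarInt(uint64_t value) {
  std::vector<uint8_t> out;
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    // `numBytes` total bytes leave 7 * numBytes bits for the value.
    if ((value >> (7 * numBytes)) == 0) {
      uint64_t encoded = (value << numBytes) | (uint64_t(1) << (numBytes - 1));
      for (unsigned i = 0; i < numBytes; ++i)
        out.push_back(uint8_t(encoded >> (8 * i)));
      return out;
    }
  }
  // Values needing more than 56 bits: zero marker byte plus 8 raw bytes.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(uint8_t(value >> (8 * i)));
  return out;
}
```

Under this scheme the value 5 occupies the single byte `0x0B`, and 300 occupies the two bytes `0xB2 0x04`.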

				/// Resolve an index into the given entry list. `entry` may either be a
				/// reference, in which case it is assigned to the corresponding value in
				/// `entries`, or a pointer, in which case it is assigned to the address of the
				/// element in `entries`.
				mehdi_amini: We should have a pointer to the BytecodeDialect here I think, should be able to set it up in initializeOffsets
				template <typename RangeT, typename T>
				static LogicalResult resolveEntry(EncodingReader &reader, RangeT &entries,
				uint64_t index, T &entry,
				StringRef entryStr) {
				if (index >= entries.size())
				return reader.emitError("invalid ", entryStr, " index: ", index);

				// If the provided entry is a pointer, resolve to the address of the entry.
				if constexpr (std::is_convertible_v<llvm::detail::ValueOfRange<RangeT>, T>)
				entry = entries[index];
				jpienaar: Dialect of this Attribute or Type ? (it's mostly parent that makes me think OOP more than I normally think here, up to you)
				else
				entry = &entries[index];
				return success();
				}

				/// Parse and resolve an index into the given entry list.
				template <typename RangeT, typename T>
				static LogicalResult parseEntry(EncodingReader &reader, RangeT &entries,
				T &entry, StringRef entryStr) {
				uint64_t entryIdx;
				if (failed(reader.parseVarInt(entryIdx)))
				return failure();
				return resolveEntry(reader, entries, entryIdx, entry, entryStr);
				}

				//===----------------------------------------------------------------------===//
				jpienaar: Unsigned needed?
				// BytecodeDialect
				//===----------------------------------------------------------------------===//

				namespace {
				/// This struct represents a dialect entry within the bytecode.
				struct BytecodeDialect {
				mehdi_amini: We should be able to have an enum for code here right?
				/// Load the dialect into the provided context if it hasn't been loaded yet.
				/// Returns failure if the dialect couldn't be loaded and the provided
				/// context does not allow unregistered dialects. The provided reader is used
				/// for error emission if necessary.
				LogicalResult load(EncodingReader &reader, MLIRContext *ctx) {
				if (dialect)
				return success();
				Dialect *loadedDialect = ctx->getOrLoadDialect(name);
				if (!loadedDialect && !ctx->allowsUnregisteredDialects()) {
				return reader.emitError(
				"dialect '", name,
				"' is unknown. If this is intended, please call "
				"allowUnregisteredDialects() on the MLIRContext, or use "
				"-allow-unregistered-dialect with the MLIR tool used.");
				}
				dialect = loadedDialect;
				return success();
				}

				/// The loaded dialect entry. This field is None if we haven't attempted to
				/// load, nullptr if we failed to load, otherwise the loaded dialect.
				Optional<Dialect *> dialect;

				/// The name of the dialect.
				StringRef name;
				};

				/// This struct represents an operation name entry within the bytecode.
				struct BytecodeOperationName {
				BytecodeOperationName(BytecodeDialect *dialect, StringRef name)
				: dialect(dialect), name(name) {}

				/// The loaded operation name, or None if it hasn't been processed yet.
				mehdi_amini: `offsetReader`? (It was confusing to me reading the code where it is used)
				Optional<OperationName> opName;

				/// The dialect that owns this operation name.
				BytecodeDialect *dialect;

				/// The name of the operation, without the dialect prefix.
				StringRef name;
				};
				} // namespace

				/// Parse a single dialect group encoded in the byte stream.
				static LogicalResult parseDialectGrouping(
				EncodingReader &reader, MutableArrayRef<BytecodeDialect> dialects,
				function_ref<LogicalResult(BytecodeDialect *)> entryCallback) {
				// Parse the dialect and the number of entries in the group.
				BytecodeDialect *dialect;
				if (failed(parseEntry(reader, dialects, dialect, "dialect")))
				return failure();
				uint64_t numEntries;
				if (failed(reader.parseVarInt(numEntries)))
				return failure();

				for (uint64_t i = 0; i < numEntries; ++i)
				if (failed(entryCallback(dialect)))
				mehdi_amini: I think you should check that currentOffset does not exceed the `sectionData.size()`, a malformed bytecode could have offsets going beyond.
				return failure();
				return success();
				}
				jpienaar: attribute ?

				//===----------------------------------------------------------------------===//
				// Attribute/Type Reader
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class provides support for reading attribute and type entries from the
				/// bytecode. Attribute and Type entries are read lazily on demand, so we use
				/// this reader to manage when to actually parse them from the bytecode.
				class AttrTypeReader {
				/// This class represents a single attribute or type entry.
				template <typename T>
				struct Entry {
				/// The entry, or null if it hasn't been resolved yet.
				T entry = {};
				/// The parent dialect of this entry.
				BytecodeDialect *dialect = nullptr;
				/// A flag indicating if the entry was encoded using a custom encoding,
				/// instead of using the textual assembly format.
				bool hasCustomEncoding = false;
				/// The raw data of this entry in the bytecode.
				ArrayRef<uint8_t> data;
				};
				using AttrEntry = Entry<Attribute>;
				using TypeEntry = Entry<Type>;

				public:
				AttrTypeReader(Location fileLoc) : fileLoc(fileLoc) {}

				/// Initialize the attribute and type information within the reader.
				LogicalResult initialize(MutableArrayRef<BytecodeDialect> dialects,
				ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData);

				/// Resolve the attribute or type at the given index. Returns nullptr on
				/// failure.
				Attribute resolveAttribute(size_t index) {
				return resolveEntry(attributes, index, "Attribute");
				}
				Type resolveType(size_t index) { return resolveEntry(types, index, "Type"); }

				private:
				/// Resolve the given entry at `index`.
				template <typename T>
				T resolveEntry(SmallVectorImpl<Entry<T>> &entries, size_t index,
				StringRef entryType);

				/// Parse the value defined within the given reader. `hasCustomEncoding`
				/// indicates how the entry was encoded.
				LogicalResult parseEntry(EncodingReader &reader, bool hasCustomEncoding,
				Attribute &result);
				LogicalResult parseEntry(EncodingReader &reader, bool hasCustomEncoding,
				Type &result);

				/// The set of attribute and type entries.
				SmallVector<AttrEntry> attributes;
				SmallVector<TypeEntry> types;

				/// A location used for error emission.
				Location fileLoc;
				};
				} // namespace

				LogicalResult
				AttrTypeReader::initialize(MutableArrayRef<BytecodeDialect> dialects,
				ArrayRef<uint8_t> sectionData,
				ArrayRef<uint8_t> offsetSectionData) {
				EncodingReader offsetReader(offsetSectionData, fileLoc);

				// Parse the number of attribute and type entries.
				uint64_t numAttributes, numTypes;
				if (failed(offsetReader.parseVarInt(numAttributes)) ||
				mehdi_amini: I remember that Attr/Type were made "mutable" to support LLVM named struct (IIRC?), but isn't this encoding and loading scheme assuming there are no cycles? How are we gonna handle this?
				rriddle (author): We will likely need some form of special API that can parse just the "immutable" part (i.e. the "name" in the LLVM struct case). For example, if an attribute/type is recursive, we could encode both its immutable and mutable encodings in one entry (with some header that has the size of the immutable part or something). Something like:
				    RecursiveEntry {
				      immutableEncodingSize: varint,
				      immutableEncoding: ...,
				      mutableEncoding: ...
				    }
				During processing we could first process the immutable entry, and then immediately process the mutable one. That way any recursive references would resolve properly, and then we'd fix the final reference afterwards. Something like:
				    // Parse the immutable first, so that we have something to give recursive references.
				    if (!(result = parseImmutable()))
				      return failure();
				    // Parse the mutable afterwards. Pass in `result` so that it can populate the mutable bits?
				    if (failed(parseMutable(result)))
				      return failure();
				Until we figure any of this out though, I'm just going to have them always use the string fallback for those attributes and types.
				failed(offsetReader.parseVarInt(numTypes)))
				return failure();
				attributes.resize(numAttributes);
				types.resize(numTypes);

				// A functor used to accumulate the offsets for the entries in the given
				// range.
				uint64_t currentOffset = 0;
				auto parseEntries = [&](auto &&range) {
				size_t currentIndex = 0, endIndex = range.size();

				// Parse an individual entry.
				auto parseEntryFn = [&](BytecodeDialect *dialect) {
				auto &entry = range[currentIndex++];

				uint64_t entrySize;
				if (failed(offsetReader.parseVarIntWithFlag(entrySize,
				entry.hasCustomEncoding)))
				return failure();

				// Verify that the offset is actually valid.
				if (currentOffset + entrySize > sectionData.size()) {
				return offsetReader.emitError(
				"Attribute or Type entry offset points past the end of section");
				}

				entry.data = sectionData.slice(currentOffset, entrySize);
				entry.dialect = dialect;
				currentOffset += entrySize;
				return success();
				};
				while (currentIndex != endIndex)
				if (failed(parseDialectGrouping(offsetReader, dialects, parseEntryFn)))
				jpienaar: Comment?
				return failure();
				return success();
				};

				// Process each of the attributes, and then the types.
				if (failed(parseEntries(attributes)) || failed(parseEntries(types)))
				return failure();

				// Ensure that we read everything from the section.
				if (!offsetReader.empty()) {
				return offsetReader.emitError(
				"unexpected trailing data in the Attribute/Type offset section");
				}
				return success();
				}
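
The offset accumulation in `AttrTypeReader::initialize` above can be illustrated in isolation. The following is a sketch under simplifying assumptions, not the patch's code: it ignores dialect grouping and the custom-encoding flag, and `EntrySlice` and `computeEntrySlices` are hypothetical names. The bounds check is written as a subtraction, a choice made in this sketch so that an oversized entry size cannot wrap the addition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record of where one entry lives inside the AttrType section.
struct EntrySlice {
  size_t offset;
  size_t size;
};

// Turn a list of per-entry sizes into (offset, size) slices of a section of
// `sectionSize` bytes. Returns std::nullopt if any entry would extend past
// the end of the section.
std::optional<std::vector<EntrySlice>>
computeEntrySlices(const std::vector<uint64_t> &entrySizes,
                   size_t sectionSize) {
  std::vector<EntrySlice> slices;
  uint64_t currentOffset = 0;
  for (uint64_t size : entrySizes) {
    // Invariant: currentOffset <= sectionSize, so the subtraction is safe.
    if (size > sectionSize - currentOffset)
      return std::nullopt;
    slices.push_back({size_t(currentOffset), size_t(size)});
    currentOffset += size;
  }
  return slices;
}
```

Because only sizes are stored, each entry's offset is implied by the sizes of the entries before it, which is why the reader has to walk the whole offset section up front even though the entries themselves are parsed lazily.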

				template <typename T>
				T AttrTypeReader::resolveEntry(SmallVectorImpl<Entry<T>> &entries, size_t index,
				StringRef entryType) {
				if (index >= entries.size()) {
				emitError(fileLoc) << "invalid " << entryType << " index: " << index;
				return {};
				}

				// If the entry has already been resolved, there is nothing left to do.
				Entry<T> &entry = entries[index];
				if (entry.entry)
				return entry.entry;

				// Parse the entry.
				EncodingReader reader(entry.data, fileLoc);
				if (failed(parseEntry(reader, entry.hasCustomEncoding, entry.entry)))
				return T();
				if (!reader.empty()) {
				(void)reader.emitError("unexpected trailing bytes after " + entryType +
				" entry");
				return T();
				}
				return entry.entry;
				}

				LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader,
				bool hasCustomEncoding,
				Attribute &result) {
				// Handle the fallback case, where the attribute was encoded using its
				// assembly format.
				if (!hasCustomEncoding) {
				StringRef attrStr;
				if (failed(reader.parseNullTerminatedString(attrStr)))
				return failure();

				size_t numRead = 0;
				if (!(result = parseAttribute(attrStr, fileLoc->getContext(), numRead)))
				return failure();
				if (numRead != attrStr.size()) {
				return reader.emitError(
				"trailing characters found after Attribute assembly format: ",
				attrStr.drop_front(numRead));
				}
				return success();
				}

				return reader.emitError("unexpected Attribute encoding");
				}

				LogicalResult AttrTypeReader::parseEntry(EncodingReader &reader,
				bool hasCustomEncoding, Type &result) {
				// Handle the fallback case, where the type was encoded using its
				// assembly format.
				if (!hasCustomEncoding) {
				StringRef typeStr;
				if (failed(reader.parseNullTerminatedString(typeStr)))
				return failure();

				size_t numRead = 0;
				if (!(result = parseType(typeStr, fileLoc->getContext(), numRead)))
				return failure();
				if (numRead != typeStr.size()) {
				return reader.emitError(
				"trailing characters found after Type assembly format: " +
				typeStr.drop_front(numRead));
				}
				return success();
				}

				return reader.emitError("unexpected Type encoding");
				}

				jpienaar: So this parses the string starting at front of reader? And null-terminated?
				rriddle (author): Yeah, the front of the reader has an index to a string defined in the string section. Updated the comment.
				//===----------------------------------------------------------------------===//
				// Bytecode Reader
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class is used to read a bytecode buffer and translate it into MLIR.
				class BytecodeReader {
				public:
				BytecodeReader(Location fileLoc, const ParserConfig &config)
				: config(config), fileLoc(fileLoc), attrTypeReader(fileLoc),
				// Use the builtin unrealized conversion cast operation to represent
				// forward references to values that aren't yet defined.
				forwardRefOpState(UnknownLoc::get(config.getContext()),
				"builtin.unrealized_conversion_cast", ValueRange(),
				NoneType::get(config.getContext())) {}

				/// Read the bytecode defined within `buffer` into the given block.
				LogicalResult read(llvm::MemoryBufferRef buffer, Block *block);

				private:
				/// Return the context for this config.
				MLIRContext *getContext() const { return config.getContext(); }

				/// Parse the bytecode version.
				LogicalResult parseVersion(EncodingReader &reader);

				//===--------------------------------------------------------------------===//
				// Dialect Section

				LogicalResult parseDialectSection(ArrayRef<uint8_t> sectionData);

				/// Parse an operation name reference using the given reader.
				FailureOr<OperationName> parseOpName(EncodingReader &reader);

				//===--------------------------------------------------------------------===//
				// Attribute/Type Section

				/// Parse an attribute or type using the given reader. Returns nullptr in the
				/// case of failure.
				Attribute parseAttribute(EncodingReader &reader);
				Type parseType(EncodingReader &reader);

				template <typename T>
				T parseAttribute(EncodingReader &reader) {
				if (Attribute attr = parseAttribute(reader)) {
				if (auto derivedAttr = attr.dyn_cast<T>())
				return derivedAttr;
				(void)reader.emitError("expected attribute of type: ",
				llvm::getTypeName<T>(), ", but got: ", attr);
				}
				return T();
				}

				//===--------------------------------------------------------------------===//
				// IR Section

				/// This struct represents the current read state of a range of regions. This
				/// struct is used to enable iterative parsing of regions.
				struct RegionReadState {
				RegionReadState(Operation *op, bool isIsolatedFromAbove)
				: RegionReadState(op->getRegions(), isIsolatedFromAbove) {}
				RegionReadState(MutableArrayRef<Region> regions, bool isIsolatedFromAbove)
				: curRegion(regions.begin()), endRegion(regions.end()),
				isIsolatedFromAbove(isIsolatedFromAbove) {}
				mehdi_amini: This should move to `parseSection`, I think. (`sectionID` is used in test/set already, seems unsafe)

				/// The current regions being read.
				MutableArrayRef<Region>::iterator curRegion, endRegion;

				/// The number of values defined immediately within this region.
				unsigned numValues = 0;

				/// The current blocks of the region being read.
				SmallVector<Block *> curBlocks;
				Region::iterator curBlock = {};

				/// The number of operations remaining to be read from the current block
				/// being read.
				uint64_t numOpsRemaining = 0;

				/// A flag indicating if the regions being read are isolated from above.
				bool isIsolatedFromAbove = false;
				};

				LogicalResult parseIRSection(ArrayRef<uint8_t> sectionData, Block *block);
				LogicalResult parseRegions(EncodingReader &reader,
				std::vector<RegionReadState> &regionStack,
				RegionReadState &readState);
				FailureOr<Operation *> parseOpWithoutRegions(EncodingReader &reader,
				RegionReadState &readState,
				bool &isIsolatedFromAbove);

				mehdi_amini: Why aren't dialects lazy loaded?
				LogicalResult parseRegion(EncodingReader &reader, RegionReadState &readState);
				LogicalResult parseBlock(EncodingReader &reader, RegionReadState &readState);
				LogicalResult parseBlockArguments(EncodingReader &reader, Block *block);

				//===--------------------------------------------------------------------===//
				// String Section

				LogicalResult parseStringSection(ArrayRef<uint8_t> sectionData);

				/// Parse a shared string from the string section. The shared string is
				/// encoded using an index to a corresponding string in the string section.
				LogicalResult parseSharedString(EncodingReader &reader, StringRef &result) {
				return parseEntry(reader, strings, result, "string");
				}

				//===--------------------------------------------------------------------===//
				// Value Processing

				/// Parse an operand reference using the given reader. Returns nullptr in the
				/// case of failure.
				Value parseOperand(EncodingReader &reader);

				/// Sequentially define the given value range.
				mehdi_amini: I was thinking: could we have a stringpool top-level section and everywhere refer to strings with an id there? Mnemonics shared between ops/attributes/types and across dialects would be stored once and for all.
				jpienaar: So encoding would be start and end offsets into a string table?
				mehdi_amini: Strings being null-terminated, you don't necessarily need the end offsets. But if we have an offset section separate from the string table, we just need to point to an entry number, same mechanism as attr/type reference.
				jpienaar: Indeed, null-termination means we can't have substrings referenced (not sure if that is common here, could think for error strings, but unsure about decoding cost).
				LogicalResult defineValues(EncodingReader &reader, ValueRange values);

				/// Create a value to use for a forward reference.
				Value createForwardRef();

				//===--------------------------------------------------------------------===//
				// Fields

				/// This class represents a single value scope, in which a value scope is
				/// delimited by isolated from above regions.
				struct ValueScope {
				/// Push a new region state onto this scope, reserving enough values for
				/// those defined within the current region of the provided state.
				void push(RegionReadState &readState) {
				nextValueIDs.push_back(values.size());
				values.resize(values.size() + readState.numValues);
				}

				/// Pop the values defined for the current region within the provided region
				/// state.
				void pop(RegionReadState &readState) {
				values.resize(values.size() - readState.numValues);
				nextValueIDs.pop_back();
				}

				/// The set of values defined in this scope.
				std::vector<Value> values;

				/// The ID for the next defined value for each region currently being
				/// processed in this scope.
				SmallVector<unsigned, 4> nextValueIDs;
				};

				/// The configuration of the parser.
				const ParserConfig &config;

				/// A location to use when emitting errors.
				Location fileLoc;

				/// The reader used to process attributes and types within the bytecode.
				AttrTypeReader attrTypeReader;

				/// The version of the bytecode being read.
				uint64_t version = 0;

				/// The producer of the bytecode being read.
				StringRef producer;

				/// The table of IR units referenced within the bytecode file.
				SmallVector<BytecodeDialect> dialects;
				SmallVector<BytecodeOperationName> opNames;

				/// The table of strings referenced within the bytecode file.
				SmallVector<StringRef> strings;

				/// The current set of available IR value scopes.
				std::vector<ValueScope> valueScopes;
				/// A block containing the set of operations defined to create forward
				/// references.
				Block forwardRefOps;
				/// A block containing previously created, and no longer used, forward
				/// reference operations.
				Block openForwardRefOps;
				/// An operation state used when instantiating forward references.
				OperationState forwardRefOpState;
				};
				} // namespace

				LogicalResult BytecodeReader::read(llvm::MemoryBufferRef buffer, Block *block) {
				EncodingReader reader(buffer.getBuffer(), fileLoc);

				// Skip over the bytecode header, this should have already been checked.
				if (failed(reader.skipBytes(StringRef("ML\xefR").size())))
				return failure();
				// Parse the bytecode version and producer.
				if (failed(parseVersion(reader)) ||
				failed(reader.parseNullTerminatedString(producer)))
				return failure();

				// Add a diagnostic handler that attaches a note that includes the original
				// producer of the bytecode.
				ScopedDiagnosticHandler diagHandler(getContext(), [&](Diagnostic &diag) {
				diag.attachNote() << "in bytecode version " << version
				<< " produced by: " << producer;
				return failure();
				});

				// Parse the raw data for each of the top-level sections of the bytecode.
				Optional<ArrayRef<uint8_t>> sectionDatas[bytecode::Section::kNumSections];
				while (!reader.empty()) {
				// Read the next section from the bytecode.
				bytecode::Section::ID sectionID;
				ArrayRef<uint8_t> sectionData;
				if (failed(reader.parseSection(sectionID, sectionData)))
				return failure();

				// Check for duplicate sections, we only expect one instance of each.
				if (sectionDatas[sectionID]) {
				return reader.emitError("duplicate top-level section: ",
				toString(sectionID));
				}
				sectionDatas[sectionID] = sectionData;
				}
				// Check that all of the sections were found.
				for (int i = 0; i < bytecode::Section::kNumSections; ++i) {
				if (!sectionDatas[i]) {
				return reader.emitError("missing data for top-level section: ",
				toString(bytecode::Section::ID(i)));
				}
				}

				// Process the string section first.
				if (failed(parseStringSection(*sectionDatas[bytecode::Section::kString])))
				return failure();

				// Process the dialect section.
				if (failed(parseDialectSection(*sectionDatas[bytecode::Section::kDialect])))
				return failure();

				// Process the attribute and type section.
				if (failed(attrTypeReader.initialize(
				dialects, *sectionDatas[bytecode::Section::kAttrType],
				*sectionDatas[bytecode::Section::kAttrTypeOffset])))
				return failure();

				// Finally, process the IR section.
				return parseIRSection(*sectionDatas[bytecode::Section::kIR], block);
				}
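
The section-gathering loop in `BytecodeReader::read` above can be sketched in isolation. This is a minimal standalone model, not the patch's code: `SectionID`, `SectionTable`, and `collectSections` are hypothetical names, payloads are plain strings, and errors are returned as messages instead of going through `emitError`. The explicit bounds check on the ID is an addition in this sketch so that indexing the table is always safe.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical section IDs for illustration; the patch defines the real ones
// in bytecode::Section.
enum SectionID : unsigned {
  kString, kDialect, kAttrType, kAttrTypeOffset, kIR, kNumSections
};

using SectionTable = std::array<std::optional<std::string>, kNumSections>;

// Collect each (id, payload) pair into `table`. Returns an empty string on
// success, or a diagnostic message on failure.
std::string collectSections(
    const std::vector<std::pair<unsigned, std::string>> &input,
    SectionTable &table) {
  for (const auto &entry : input) {
    unsigned id = entry.first;
    // Bounds check added in this sketch so the ID is safe to index with.
    if (id >= kNumSections)
      return "unknown section ID: " + std::to_string(id);
    // Each top-level section may appear at most once.
    if (table[id])
      return "duplicate top-level section: " + std::to_string(id);
    table[id] = entry.second;
  }
  // Every section must be present before any of them is processed.
  for (unsigned i = 0; i < kNumSections; ++i)
    if (!table[i])
      return "missing data for top-level section: " + std::to_string(i);
  return "";
}
```

Gathering everything first is what lets the reader process the sections in dependency order (strings, then dialects, then attributes/types, then IR) regardless of the order they appear in the file.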

				LogicalResult BytecodeReader::parseVersion(EncodingReader &reader) {
				if (failed(reader.parseVarInt(version)))
				return failure();

				mehdi_amini: Please break the recursion :)
				// Validate the bytecode version.
				uint64_t currentVersion = bytecode::kVersion;
				if (version < currentVersion) {
				return reader.emitError("bytecode version ", version,
				" is older than the current version of ",
				currentVersion, ", and upgrade is not supported");
				mehdi_amini: Not sure where to attach this comment, but there is something missing somewhere (unless I missed it?) to ensure that use-list ordering is preserved.
				rriddle: Deferring this to a follow up to help simplify this patch, added a TODO for now.
				}
				if (version > currentVersion) {
				return reader.emitError("bytecode version ", version,
				" is newer than the current version ",
				currentVersion);
				}
				return success();
				}

				//===----------------------------------------------------------------------===//
				// Dialect Section

				LogicalResult
				BytecodeReader::parseDialectSection(ArrayRef<uint8_t> sectionData) {
				EncodingReader sectionReader(sectionData, fileLoc);

				// Parse the number of dialects in the section.
				uint64_t numDialects;
				if (failed(sectionReader.parseVarInt(numDialects)))
				return failure();
				dialects.resize(numDialects);

				// Parse each of the dialects.
				for (uint64_t i = 0; i < numDialects; ++i)
				if (failed(parseSharedString(sectionReader, dialects[i].name)))
				return failure();

				// Parse the operation names, which are grouped by dialect.
				auto parseOpName = [&](BytecodeDialect *dialect) {
				StringRef opName;
				if (failed(parseSharedString(sectionReader, opName)))
				return failure();
				opNames.emplace_back(dialect, opName);
				return success();
				};
				while (!sectionReader.empty())
				if (failed(parseDialectGrouping(sectionReader, dialects, parseOpName)))
				return failure();
				return success();
				}

				FailureOr<OperationName> BytecodeReader::parseOpName(EncodingReader &reader) {
				BytecodeOperationName *opName = nullptr;
				if (failed(parseEntry(reader, opNames, opName, "operation name")))
				return failure();

				// Check to see if this operation name has already been resolved. If we
				// haven't, load the dialect and build the operation name.
				if (!opName->opName) {
				if (failed(opName->dialect->load(reader, getContext())))
				return failure();
				opName->opName.emplace((opName->dialect->name + "." + opName->name).str(),
				getContext());
				}
				return *opName->opName;
				}

				//===----------------------------------------------------------------------===//
				// Attribute/Type Section

				Attribute BytecodeReader::parseAttribute(EncodingReader &reader) {
				uint64_t attrIdx;
				if (failed(reader.parseVarInt(attrIdx)))
				return Attribute();
				return attrTypeReader.resolveAttribute(attrIdx);
				}

				Type BytecodeReader::parseType(EncodingReader &reader) {
				uint64_t typeIdx;
				if (failed(reader.parseVarInt(typeIdx)))
				return Type();
				return attrTypeReader.resolveType(typeIdx);
				}

				//===----------------------------------------------------------------------===//
				// IR Section

				LogicalResult BytecodeReader::parseIRSection(ArrayRef<uint8_t> sectionData,
				Block *block) {
				EncodingReader reader(sectionData, fileLoc);

				// A stack of operation regions currently being read from the bytecode.
				std::vector<RegionReadState> regionStack;

				// Parse the top-level block using a temporary module operation.
				OwningOpRef<ModuleOp> moduleOp = ModuleOp::create(fileLoc);
				regionStack.emplace_back(moduleOp, /*isIsolatedFromAbove=*/true);
				regionStack.back().curBlocks.push_back(moduleOp->getBody());
				regionStack.back().curBlock = regionStack.back().curRegion->begin();
				if (failed(parseBlock(reader, regionStack.back())))
				return failure();
				valueScopes.emplace_back(ValueScope());
				valueScopes.back().push(regionStack.back());

				// Iteratively parse regions until everything has been resolved.
				while (!regionStack.empty())
				if (failed(parseRegions(reader, regionStack, regionStack.back())))
				return failure();
				if (!forwardRefOps.empty()) {
				return reader.emitError(
				"not all forward unresolved forward operand references");
				}

				// Verify that the parsed operations are valid.
				if (failed(verify(*moduleOp)))
				return failure();

				// Splice the parsed operations over to the provided top-level block.
				auto &parsedOps = moduleOp->getBody()->getOperations();
				auto &destOps = block->getOperations();
				destOps.splice(destOps.empty() ? destOps.end() : std::prev(destOps.end()),
				parsedOps, parsedOps.begin(), parsedOps.end());
				return success();
				}

				LogicalResult
				BytecodeReader::parseRegions(EncodingReader &reader,
				std::vector<RegionReadState> &regionStack,
				RegionReadState &readState) {
				// Read the regions of this operation.
				for (; readState.curRegion != readState.endRegion; ++readState.curRegion) {
				// If the current block hasn't been setup yet, parse the header for this
				// region.
				if (readState.curBlock == Region::iterator()) {
				if (failed(parseRegion(reader, readState)))
				return failure();

				// If the region is empty, there is nothing more to do.
				if (readState.curRegion->empty())
				continue;
				}

				// Parse the blocks within the region.
				do {
				while (readState.numOpsRemaining--) {
				// Read in the next operation. We don't read its regions directly, we
				// handle those afterwards as necessary.
				bool isIsolatedFromAbove = false;
				FailureOr<Operation *> op =
				parseOpWithoutRegions(reader, readState, isIsolatedFromAbove);
				if (failed(op))
				return failure();

				// If the op has regions, add it to the stack for processing.
				if ((*op)->getNumRegions()) {
				regionStack.emplace_back(*op, isIsolatedFromAbove);
				vitalybuka: Also a problem: emplace_back may reallocate the container, but the for loop above uses readState, which is a reference to an element of the container.
				rriddle: This should be fine, given that we always return in this case (i.e. never touch the invalid reference again).
				vitalybuka: Thanks, I see.

				// If the op is isolated from above, push a new value scope.
				if (isIsolatedFromAbove)
				valueScopes.emplace_back(ValueScope());
				return success();
				}
				}

				// Move to the next block of the region.
				if (++readState.curBlock == readState.curRegion->end())
				break;
				if (failed(parseBlock(reader, readState)))
				return failure();
				} while (true);

				// Reset the current block and any values reserved for this region.
				readState.curBlock = {};
				valueScopes.back().pop(readState);
				}

				// When the regions have been fully parsed, pop them off of the read stack. If
				// the regions were isolated from above, we also pop the last value scope.
				regionStack.pop_back();
				if (readState.isIsolatedFromAbove)
				vitalybuka: This calls pop_back and then reads readState.isIsolatedFromAbove, which comes from the regionStack?
				rriddle: Thanks for catching this. I'm not sure why my local asan build didn't catch this (I'll try nuking and resetting it).
				vitalybuka: (Re: "I'm not sure why my local asan build didn't catch this") Probably you don't use libc++ or an instrumented libc++? If you can fix it quickly, go for it. If not, please let me know; I have a patch to revert it with related fixes.
				valueScopes.pop_back();
				return success();
				}
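
The review thread at the end of `parseRegions` points out that `readState` refers to an element of `regionStack`, so reading `readState.isIsolatedFromAbove` after `regionStack.pop_back()` touches a destroyed element. One possible shape for a fix is to copy the flag before popping. The sketch below is a standalone model under that assumption: `RegionState` and `popRegion` are hypothetical stand-ins, value scopes are modelled as plain `int`s, and this is not necessarily the change that was eventually committed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for RegionReadState, reduced to the one flag that
// matters for this example.
struct RegionState {
  bool isIsolatedFromAbove;
};

// Pop the innermost region state and, if it was isolated from above, the
// matching value scope as well.
void popRegion(std::vector<RegionState> &regionStack,
               std::vector<int> &valueScopes) {
  assert(!regionStack.empty() && "no region state to pop");
  // Copy the flag first: once pop_back() runs, any reference to the old
  // back() element dangles.
  bool wasIsolated = regionStack.back().isIsolatedFromAbove;
  regionStack.pop_back();
  if (wasIsolated) {
    assert(!valueScopes.empty() && "isolated region without a value scope");
    valueScopes.pop_back();
  }
}
```

Swapping the two statements (pop the value scope first, then the region stack) would be an equivalent ordering; either way the flag is read while the element is still alive.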

				FailureOr<Operation *>
				BytecodeReader::parseOpWithoutRegions(EncodingReader &reader,
				RegionReadState &readState,
				bool &isIsolatedFromAbove) {
				// Parse the name of the operation.
				FailureOr<OperationName> opName = parseOpName(reader);
				if (failed(opName))
				return failure();

				// Parse the operation mask, which indicates which components of the operation
				// are present.
				uint8_t opMask;
				if (failed(reader.parseByte(opMask)))
				return failure();

				/// Parse the location.
				LocationAttr opLoc = parseAttribute<LocationAttr>(reader);
				if (!opLoc)
				return failure();

				// With the location and name resolved, we can start building the operation
				// state.
				OperationState opState(opLoc, *opName);

				// Parse the attributes of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasAttrs) {
				DictionaryAttr dictAttr = parseAttribute<DictionaryAttr>(reader);
				if (!dictAttr)
				return failure();
				opState.attributes = dictAttr;
				}

				/// Parse the results of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasResults) {
				uint64_t numResults;
				if (failed(reader.parseVarInt(numResults)))
				return failure();
				opState.types.resize(numResults);
				for (int i = 0, e = numResults; i < e; ++i)
				if (!(opState.types[i] = parseType(reader)))
				return failure();
				}

				/// Parse the operands of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasOperands) {
				uint64_t numOperands;
				if (failed(reader.parseVarInt(numOperands)))
				return failure();
				opState.operands.resize(numOperands);
				for (int i = 0, e = numOperands; i < e; ++i)
				if (!(opState.operands[i] = parseOperand(reader)))
				return failure();
				}

				/// Parse the successors of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasSuccessors) {
				uint64_t numSuccs;
				if (failed(reader.parseVarInt(numSuccs)))
				return failure();
				opState.successors.resize(numSuccs);
				for (int i = 0, e = numSuccs; i < e; ++i) {
				if (failed(parseEntry(reader, readState.curBlocks, opState.successors[i],
				"successor")))
				return failure();
				}
				}

				/// Parse the regions of the operation.
				if (opMask & bytecode::OpEncodingMask::kHasInlineRegions) {
				uint64_t numRegions;
				if (failed(reader.parseVarIntWithFlag(numRegions, isIsolatedFromAbove)))
				return failure();

				opState.regions.reserve(numRegions);
				for (int i = 0, e = numRegions; i < e; ++i)
				opState.regions.push_back(std::make_unique<Region>());
				}

				// Create the operation at the back of the current block.
				Operation *op = Operation::create(opState);
				readState.curBlock->push_back(op);

				// If the operation had results, update the value references.
				if (op->getNumResults() && failed(defineValues(reader, op->getResults())))
				return failure();

				return op;
				}

				LogicalResult BytecodeReader::parseRegion(EncodingReader &reader,
				RegionReadState &readState) {
				// Parse the number of blocks in the region.
				uint64_t numBlocks;
				if (failed(reader.parseVarInt(numBlocks)))
				return failure();

				// If the region is empty, there is nothing else to do.
				if (numBlocks == 0)
				return success();

				// Parse the number of values defined in this region.
				uint64_t numValues;
				if (failed(reader.parseVarInt(numValues)))
				return failure();
				readState.numValues = numValues;

				// Create the blocks within this region. We do this before processing so that
				// we can rely on the blocks existing when creating operations.
				readState.curBlocks.clear();
				readState.curBlocks.reserve(numBlocks);
				for (uint64_t i = 0; i < numBlocks; ++i) {
				readState.curBlocks.push_back(new Block());
				readState.curRegion->push_back(readState.curBlocks.back());
				}

				// Prepare the current value scope for this region.
				valueScopes.back().push(readState);

				// Parse the entry block of the region.
				readState.curBlock = readState.curRegion->begin();
				return parseBlock(reader, readState);
				}

				LogicalResult BytecodeReader::parseBlock(EncodingReader &reader,
				RegionReadState &readState) {
				bool hasArgs;
				if (failed(reader.parseVarIntWithFlag(readState.numOpsRemaining, hasArgs)))
				return failure();

				// Parse the arguments of the block.
				if (hasArgs && failed(parseBlockArguments(reader, &*readState.curBlock)))
				return failure();

				// We don't parse the operations of the block here, that's done elsewhere.
				return success();
				}

				LogicalResult BytecodeReader::parseBlockArguments(EncodingReader &reader,
				Block *block) {
				// Parse the number of arguments.
				uint64_t numArgs;
				if (failed(reader.parseVarInt(numArgs)))
				return failure();

				SmallVector<Type> argTypes;
				SmallVector<Location> argLocs;
				argTypes.reserve(numArgs);
				argLocs.reserve(numArgs);

				while (numArgs--) {
				Type argType = parseType(reader);
				if (!argType)
				return failure();
				LocationAttr argLoc = parseAttribute<LocationAttr>(reader);
				if (!argLoc)
				return failure();

				argTypes.push_back(argType);
				argLocs.push_back(argLoc);
				}
				block->addArguments(argTypes, argLocs);
				return defineValues(reader, block->getArguments());
				}

				//===----------------------------------------------------------------------===//
				// String Section

				LogicalResult
				BytecodeReader::parseStringSection(ArrayRef<uint8_t> sectionData) {
				EncodingReader stringReader(sectionData, fileLoc);

				// Parse the number of strings in the section.
				uint64_t numStrings;
				if (failed(stringReader.parseVarInt(numStrings)))
				return failure();
				strings.resize(numStrings);

				// Parse each of the strings. The sizes of the strings are encoded in reverse
				// order, so that's the order we populate the table.
				size_t stringDataEndOffset = sectionData.size();
				size_t totalStringDataSize = 0;
				for (StringRef &string : llvm::reverse(strings)) {
				uint64_t stringSize;
				if (failed(stringReader.parseVarInt(stringSize)))
				return failure();
				if (stringDataEndOffset < stringSize) {
				return stringReader.emitError(
				"string size exceeds the available data size");
				}

				// Extract the string from the data, dropping the null character.
				size_t stringOffset = stringDataEndOffset - stringSize;
				string = StringRef(
				reinterpret_cast<const char *>(sectionData.data() + stringOffset),
				stringSize - 1);
				stringDataEndOffset = stringOffset;

				// Update the total string data size.
				totalStringDataSize += stringSize;
				}

				// Check that the only remaining data was for the strings
				if (stringReader.size() != totalStringDataSize) {
				return stringReader.emitError("unexpected trailing data between the "
				"offsets for strings and their data");
				}
				return success();
				}
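
The layout that `parseStringSection` expects (a count, then the string sizes in reverse order, then the null-terminated string data packed at the end of the section) can be illustrated with a small round trip. This is a simplified sketch, not the patch's format: the count and sizes are single bytes here instead of varints, and `encodeStringSection` / `decodeStringSection` are hypothetical helper names. The `size == 0` rejection is an extra guard in this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Encode: [count][size of last string]...[size of first string][data...].
// Each size includes the trailing null. Assumes fewer than 256 strings, each
// shorter than 255 bytes, because this sketch uses one-byte integers.
std::vector<uint8_t> encodeStringSection(const std::vector<std::string> &strings) {
  std::vector<uint8_t> out;
  out.push_back(static_cast<uint8_t>(strings.size()));
  for (auto it = strings.rbegin(); it != strings.rend(); ++it)
    out.push_back(static_cast<uint8_t>(it->size() + 1));
  for (const std::string &s : strings) {
    out.insert(out.end(), s.begin(), s.end());
    out.push_back(0);
  }
  return out;
}

// Decode by walking the size table forwards while peeling string data off the
// end of the section, mirroring the loop in parseStringSection.
std::optional<std::vector<std::string>>
decodeStringSection(const std::vector<uint8_t> &section) {
  if (section.empty())
    return std::nullopt;
  size_t pos = 0;
  size_t numStrings = section[pos++];
  std::vector<std::string> strings(numStrings);
  size_t dataEnd = section.size();
  size_t totalData = 0;
  for (size_t i = numStrings; i-- > 0;) {
    if (pos >= section.size())
      return std::nullopt;
    size_t size = section[pos++];
    // A valid entry holds at least the null terminator and fits in the data.
    if (size == 0 || dataEnd < size)
      return std::nullopt;
    size_t offset = dataEnd - size;
    strings[i].assign(reinterpret_cast<const char *>(section.data() + offset),
                      size - 1);
    dataEnd = offset;
    totalData += size;
  }
  // Everything after the size table must be exactly the string data.
  if (section.size() - pos != totalData)
    return std::nullopt;
  return strings;
}
```

Because the sizes are stored in reverse, the reader never needs a separate offset table: each size read from the front of the section identifies the next string counting back from the end.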

				//===----------------------------------------------------------------------===//
				// Value Processing

				Value BytecodeReader::parseOperand(EncodingReader &reader) {
				std::vector<Value> &values = valueScopes.back().values;
				Value *value = nullptr;
				if (failed(parseEntry(reader, values, value, "value")))
				return Value();

				// Create a new forward reference if necessary.
				if (!*value)
				*value = createForwardRef();
				return *value;
				}

				LogicalResult BytecodeReader::defineValues(EncodingReader &reader,
				ValueRange newValues) {
				ValueScope &valueScope = valueScopes.back();
				std::vector<Value> &values = valueScope.values;

				unsigned &valueID = valueScope.nextValueIDs.back();
				unsigned valueIDEnd = valueID + newValues.size();
				if (valueIDEnd > values.size()) {
				return reader.emitError(
				"value index range was outside of the expected range for "
				"the parent region, got [",
				valueID, ", ", valueIDEnd, "), but the maximum index was ",
				values.size() - 1);
				}

				// Assign the values and update any forward references.
				for (unsigned i = 0, e = newValues.size(); i != e; ++i, ++valueID) {
				Value newValue = newValues[i];

				// Check to see if a definition for this value already exists.
				if (Value oldValue = std::exchange(values[valueID], newValue)) {
				Operation *forwardRefOp = oldValue.getDefiningOp();

				// Assert that this is a forward reference operation. Given how we compute
				// definition ids (incrementally as we parse), it shouldn't be possible
				// for the value to be defined any other way.
				assert(forwardRefOp && forwardRefOp->getBlock() == &forwardRefOps &&
				"value index was already defined?");

				oldValue.replaceAllUsesWith(newValue);
				forwardRefOp->moveBefore(&openForwardRefOps, openForwardRefOps.end());
				}
				}
				return success();
				}

				Value BytecodeReader::createForwardRef() {
				// Check for an available existing operation to use. Otherwise, create a new
				// fake operation to use for the reference.
				if (!openForwardRefOps.empty()) {
				Operation *op = &openForwardRefOps.back();
				op->moveBefore(&forwardRefOps, forwardRefOps.end());
				} else {
				forwardRefOps.push_back(Operation::create(forwardRefOpState));
				}
				return forwardRefOps.back().getResult(0);
				}

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				bool mlir::isBytecode(llvm::MemoryBufferRef buffer) {
				return buffer.getBuffer().startswith("ML\xefR");
				}

				LogicalResult mlir::readBytecodeFile(llvm::MemoryBufferRef buffer, Block *block,
				const ParserConfig &config) {
				Location sourceFileLoc =
				FileLineColLoc::get(config.getContext(), buffer.getBufferIdentifier(),
				/*line=*/0, /*column=*/0);
				if (!isBytecode(buffer)) {
				return emitError(sourceFileLoc,
				"input buffer is not an MLIR bytecode file");
				}

				BytecodeReader reader(sourceFileLoc, config);
				return reader.read(buffer, block);
				}

mlir/lib/Bytecode/Reader/CMakeLists.txt

This file was added.

				add_mlir_library(MLIRBytecodeReader
				BytecodeReader.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode

				LINK_LIBS PUBLIC
				MLIRAsmParser
				MLIRIR
				MLIRSupport
				)

mlir/lib/Bytecode/Writer/BytecodeWriter.cpp

This file was added.

				//===- BytecodeWriter.cpp - MLIR Bytecode Writer --------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Bytecode/BytecodeWriter.h"
				#include "../Encoding.h"
				#include "IRNumbering.h"
				#include "mlir/IR/BuiltinDialect.h"
				#include "mlir/IR/OpImplementation.h"
				#include "llvm/ADT/CachedHashString.h"
				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/SmallString.h"
				#include "llvm/Support/Debug.h"
				#include <random>

				#define DEBUG_TYPE "mlir-bytecode-writer"

				using namespace mlir;
				using namespace mlir::bytecode::detail;

				//===----------------------------------------------------------------------===//
				// EncodingEmitter
				//===----------------------------------------------------------------------===//

				namespace {
				/// This class functions as the underlying encoding emitter for the bytecode
				/// writer. This class is a bit different compared to other types of encoders;
				/// it does not use a single buffer, but instead may contain several buffers
				/// (some owned by the writer, and some not) that get concatted during the final
				/// emission.
				class EncodingEmitter {
				public:
				EncodingEmitter() = default;
				EncodingEmitter(const EncodingEmitter &) = delete;
				EncodingEmitter &operator=(const EncodingEmitter &) = delete;

				/// Write the current contents to the provided stream.
				void writeTo(raw_ostream &os) const;

				/// Return the current size of the encoded buffer.
				size_t size() const { return prevResultSize + currentResult.size(); }

				//===--------------------------------------------------------------------===//
				// Emission
				//===--------------------------------------------------------------------===//

				/// Backpatch a byte in the result buffer at the given offset.
				void patchByte(uint64_t offset, uint8_t value) {
				assert(offset < size() && offset >= prevResultSize &&
				"cannot patch previously emitted data");
				currentResult[offset - prevResultSize] = value;
				}

				//===--------------------------------------------------------------------===//
				// Integer Emission

				/// Emit a single byte.
				template <typename T>
				void emitByte(T byte) {
				currentResult.push_back(static_cast<uint8_t>(byte));
				}

				/// Emit a range of bytes.
				void emitBytes(ArrayRef<uint8_t> bytes) {
				llvm::append_range(currentResult, bytes);
				}

				/// Emit a variable length integer. The first encoded byte contains a prefix
				/// in the low bits indicating the encoded length of the value. This length
				/// prefix is a bit sequence of '0's followed by a '1'. The number of '0' bits
				/// indicate the number of _additional_ bytes (not including the prefix byte).
				/// All remaining bits in the first byte, along with all of the bits in
				/// additional bytes, provide the value of the integer encoded in
				/// little-endian order.
				void emitVarInt(uint64_t value) {
				// In the most common case, the value can be represented in a single byte.
				// Given how hot this case is, explicitly handle that here.
				if ((value >> 7) == 0)
				return emitByte((value << 1) | 0x1);
				emitMultiByteVarInt(value);
				}

				/// Emit a variable length integer whose low bit is used to encode the
				/// provided flag, i.e. encoded as: (value << 1) | (flag ? 1 : 0).
				void emitVarIntWithFlag(uint64_t value, bool flag) {
				emitVarInt((value << 1) | (flag ? 1 : 0));
				}

				//===--------------------------------------------------------------------===//
				// String Emission

				/// Emit the given string as a nul terminated string.
				void emitNulTerminatedString(StringRef str) {
				emitString(str);
				emitByte(0);
				}

				/// Emit the given string without a nul terminator.
				void emitString(StringRef str) {
				emitBytes({reinterpret_cast<const uint8_t *>(str.data()), str.size()});
				}

				//===--------------------------------------------------------------------===//
				// Section Emission

				/// Emit a nested section of the given code, whose contents are encoded in the
				/// provided emitter.
				void emitSection(bytecode::Section::ID code, EncodingEmitter &&emitter) {
				// Emit the section code and length.
				emitByte(code);
				emitVarInt(emitter.size());

				// Push our current buffer and then merge the provided section body into
				// ours.
				appendResult(std::move(currentResult));
				for (std::vector<uint8_t> &result : emitter.prevResultStorage)
				appendResult(std::move(result));
				appendResult(std::move(emitter.currentResult));
				}

				private:
				/// Emit the given value using a variable width encoding. This method is a
				/// fallback when the number of bytes needed to encode the value is greater
				/// than 1. We mark it noinline here so that the single byte hot path isn't
				/// pessimized.
				LLVM_ATTRIBUTE_NOINLINE void emitMultiByteVarInt(uint64_t value);

				/// Append a new result buffer to the current contents.
				void appendResult(std::vector<uint8_t> &&result) {
				prevResultSize += result.size();
				prevResultStorage.emplace_back(std::move(result));
				prevResultList.emplace_back(prevResultStorage.back());
				}

				/// The result of the emitter currently being built. We refrain from building
				/// a single buffer to simplify emitting sections, large data, and more. The
				/// result is thus represented using multiple distinct buffers, some of which
				/// we own (via prevResultStorage), and some of which are just pointers into
				/// externally owned buffers.
				std::vector<uint8_t> currentResult;
				std::vector<ArrayRef<uint8_t>> prevResultList;
				std::vector<std::vector<uint8_t>> prevResultStorage;

				/// An up-to-date total size of all of the buffers within `prevResultList`.
				/// This enables O(1) size checks of the current encoding.
				size_t prevResultSize = 0;
				};

				/// A simple raw_ostream wrapper around an EncodingEmitter. This removes the need
				/// to go through an intermediate buffer when interacting with code that wants a
				/// raw_ostream.
				class raw_emitter_ostream : public raw_ostream {
				public:
				explicit raw_emitter_ostream(EncodingEmitter &emitter) : emitter(emitter) {
				SetUnbuffered();
				}

				private:
				void write_impl(const char *ptr, size_t size) override {
				emitter.emitBytes({reinterpret_cast<const uint8_t *>(ptr), size});
				}
				uint64_t current_pos() const override { return emitter.size(); }

				/// The section being emitted to.
				EncodingEmitter &emitter;
				};
				} // namespace

				void EncodingEmitter::writeTo(raw_ostream &os) const {
				for (auto &prevResult : prevResultList)
				os.write((const char *)prevResult.data(), prevResult.size());
				os.write((const char *)currentResult.data(), currentResult.size());
				}
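
The multi-buffer design of `EncodingEmitter` can be illustrated with a standalone sketch. The class below is hypothetical (`ChunkedEmitter`, `flatten`, and the single-byte section length are simplifications made up for this example, not part of the patch); it only shows how a section body can be adopted buffer by buffer while the total size stays an O(1) query.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical simplified emitter: one buffer being built plus a list of
// finished buffers, with a running total of the finished sizes.
class ChunkedEmitter {
public:
  void emitByte(uint8_t byte) { current.push_back(byte); }

  // O(1): finished buffers are tracked by a running total.
  std::size_t size() const { return prevSize + current.size(); }

  // Emit a section id and body length (one byte here, for brevity), then
  // adopt the body's buffers without copying their contents.
  void emitSection(uint8_t id, ChunkedEmitter &&body) {
    emitByte(id);
    emitByte(static_cast<uint8_t>(body.size()));
    append(std::move(current));
    current.clear(); // Leave the moved-from buffer in a known empty state.
    for (std::vector<uint8_t> &chunk : body.previous)
      append(std::move(chunk));
    append(std::move(body.current));
  }

  // Concatenate all buffers in emission order.
  std::vector<uint8_t> flatten() const {
    std::vector<uint8_t> out;
    for (const std::vector<uint8_t> &chunk : previous)
      out.insert(out.end(), chunk.begin(), chunk.end());
    out.insert(out.end(), current.begin(), current.end());
    return out;
  }

private:
  void append(std::vector<uint8_t> &&chunk) {
    prevSize += chunk.size();
    previous.push_back(std::move(chunk));
  }

  std::vector<std::vector<uint8_t>> previous;
  std::vector<uint8_t> current;
  std::size_t prevSize = 0;
};
```

Because the section length is only known once the body is complete, building the body in its own emitter and splicing it in afterwards avoids having to reserve and backpatch a length field.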

				void EncodingEmitter::emitMultiByteVarInt(uint64_t value) {
				// Compute the number of bytes needed to encode the value. Each byte can hold
				// up to 7-bits of data. We only check up to the number of bits we can encode
				// in the first byte (8).
				uint64_t it = value >> 7;
				for (size_t numBytes = 2; numBytes < 9; ++numBytes) {
				if (LLVM_LIKELY(it >>= 7) == 0) {
				RVP: Is this parenthesized correctly?
				jpienaar: This is checking if the value post shift is 0 (and relies on this function being called only when multi byte), what issue did you run into with this?
				RVP: Shouldn't `LLVM_LIKELY` be around the whole condition instead of the shift expression? Isn't `== 0` the likely case and not the shift result being non-zero?
				RVP: I didn't see any issues. Was looking at the code and this question popped. I now saw that `emitVarInt` specially handles the common case `(... >> 7) == 0`. Maybe a comment here as well would have avoided the question. Thanks.
				uint64_t encodedValue = (value << 1) | 0x1;
				encodedValue <<= (numBytes - 1);
				emitBytes({reinterpret_cast<uint8_t *>(&encodedValue), numBytes});
				return;
				}
				}

				// If the value is too large to encode in 8 bytes alongside its length
				// prefix, emit a special all zero marker byte and splat the value directly.
				emitByte(0);
				emitBytes({reinterpret_cast<uint8_t *>(&value), sizeof(value)});
				}
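
The prefix scheme used by `emitVarInt` and `emitMultiByteVarInt` can be exercised in isolation. The following is a minimal sketch with hypothetical free functions (`encodeVarInt` and `decodeVarInt` are names invented for this example; the decoder is not taken from the patch's reader). Unlike the code above, it writes bytes one at a time so the result does not depend on host endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode `value`: the number of low zero bits in the first byte gives the
// number of additional bytes; the remaining bits hold the value.
std::vector<uint8_t> encodeVarInt(uint64_t value) {
  std::vector<uint8_t> out;
  // Values below 2^56 fit in 1 to 8 bytes at 7 payload bits per byte.
  for (unsigned numBytes = 1; numBytes <= 8; ++numBytes) {
    if ((value >> (7 * numBytes)) != 0)
      continue;
    // Append the '1' marker bit, then shift in (numBytes - 1) zero bits.
    uint64_t encoded = ((value << 1) | 0x1) << (numBytes - 1);
    for (unsigned i = 0; i < numBytes; ++i)
      out.push_back(static_cast<uint8_t>(encoded >> (8 * i)));
    return out;
  }
  // Larger values: an all-zero marker byte followed by the raw 8 bytes.
  out.push_back(0);
  for (unsigned i = 0; i < 8; ++i)
    out.push_back(static_cast<uint8_t>(value >> (8 * i)));
  return out;
}

// Decode a buffer that holds exactly one encoded value.
uint64_t decodeVarInt(const std::vector<uint8_t> &bytes) {
  assert(!bytes.empty() && "expected at least one byte");
  uint8_t first = bytes[0];
  uint64_t raw = 0;
  if (first == 0) {
    assert(bytes.size() == 9 && "marker byte must be followed by 8 bytes");
    for (unsigned i = 0; i < 8; ++i)
      raw |= static_cast<uint64_t>(bytes[1 + i]) << (8 * i);
    return raw;
  }
  // Count the zero bits below the '1' marker.
  unsigned extraBytes = 0;
  while (((first >> extraBytes) & 1) == 0)
    ++extraBytes;
  assert(bytes.size() == extraBytes + 1 && "unexpected encoded length");
  for (unsigned i = 0; i <= extraBytes; ++i)
    raw |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  // Drop the prefix bits (the zeros plus the marker).
  return raw >> (extraBytes + 1);
}
```

For instance, under this sketch 127 encodes to the single byte `0xFF`, while 128 needs two bytes, `0x02 0x02`.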

				//===----------------------------------------------------------------------===//
				// Bytecode Writer
				//===----------------------------------------------------------------------===//

				namespace {
				class BytecodeWriter {
				public:
				BytecodeWriter(Operation *op) : numberingState(op) {}

				/// Write the bytecode for the given root operation.
				void write(Operation *rootOp, raw_ostream &os, StringRef producer);

				private:
				//===--------------------------------------------------------------------===//
				// Dialects

				void writeDialectSection(EncodingEmitter &emitter);

				//===--------------------------------------------------------------------===//
				// Attributes and Types

				void writeAttrTypeSection(EncodingEmitter &emitter);

				//===--------------------------------------------------------------------===//
				// Operations

				void writeBlock(EncodingEmitter &emitter, Block *block);
				void writeOp(EncodingEmitter &emitter, Operation *op);
				void writeRegion(EncodingEmitter &emitter, Region *region);
				void writeIRSection(EncodingEmitter &emitter, Operation *op);

				//===--------------------------------------------------------------------===//
				// Strings

				void writeStringSection(EncodingEmitter &emitter);

				/// Get the number for the given shared string, that is contained within the
				/// string section.
				size_t getSharedStringNumber(StringRef str);

				//===--------------------------------------------------------------------===//
				// Fields

				/// The IR numbering state generated for the root operation.
				IRNumberingState numberingState;

				/// A set of strings referenced within the bytecode. The value of the map is
				/// the number assigned to the string (see `getSharedStringNumber`).
				llvm::MapVector<llvm::CachedHashStringRef, size_t> strings;
				};
				} // namespace

				void BytecodeWriter::write(Operation *rootOp, raw_ostream &os,
				StringRef producer) {
				EncodingEmitter emitter;

				// Emit the bytecode file header. This is how we identify the output as a
				// bytecode file.
				emitter.emitString("ML\xefR");

				// Emit the bytecode version.
				emitter.emitVarInt(bytecode::kVersion);

				// Emit the producer.
				emitter.emitNulTerminatedString(producer);

				// Emit the dialect section.
				writeDialectSection(emitter);

				// Emit the attributes and types section.
				writeAttrTypeSection(emitter);

				// Emit the IR section.
				writeIRSection(emitter, rootOp);

				// Emit the string section.
				writeStringSection(emitter);

				// Write the generated bytecode to the provided output stream.
				emitter.writeTo(os);
				}

				//===----------------------------------------------------------------------===//
				// Dialects

				/// Write the given entries in contiguous groups with the same parent dialect.
				/// Each dialect sub-group is encoded with the parent dialect and number of
				/// elements, followed by the encoding for the entries. The given callback is
				/// invoked to encode each individual entry.
				template <typename EntriesT, typename EntryCallbackT>
				static void writeDialectGrouping(EncodingEmitter &emitter, EntriesT &&entries,
				EntryCallbackT &&callback) {
				for (auto it = entries.begin(), e = entries.end(); it != e;) {
				auto groupStart = it++;

				// Find the end of the group that shares the same parent dialect.
				DialectNumbering *currentDialect = groupStart->dialect;
				it = std::find_if(it, e, [&](const auto &entry) {
				return entry.dialect != currentDialect;
				});

				// Emit the dialect and number of elements.
				emitter.emitVarInt(currentDialect->number);
				emitter.emitVarInt(std::distance(groupStart, it));

				// Emit the entries within the group.
				for (auto &entry : llvm::make_range(groupStart, it))
				callback(entry);
				}
				}
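
The run detection in `writeDialectGrouping` can be tried on plain data. This is a sketch under assumptions: `EntrySketch` and `groupRuns` are hypothetical names, and the function returns the (dialect, count) headers instead of emitting them.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for a numbering entry; only the parent dialect
// number matters for grouping.
struct EntrySketch {
  unsigned dialect;
  int payload;
};

// Return one (dialect, count) pair per contiguous run of entries that share
// a parent dialect, in the order the runs appear.
std::vector<std::pair<unsigned, std::size_t>>
groupRuns(const std::vector<EntrySketch> &entries) {
  std::vector<std::pair<unsigned, std::size_t>> runs;
  for (auto it = entries.begin(), e = entries.end(); it != e;) {
    auto groupStart = it++;
    unsigned current = groupStart->dialect;
    // Advance to the first entry that belongs to a different dialect.
    it = std::find_if(it, e, [&](const EntrySketch &entry) {
      return entry.dialect != current;
    });
    runs.emplace_back(current,
                      static_cast<std::size_t>(std::distance(groupStart, it)));
  }
  return runs;
}
```

Note that only adjacent entries are merged: a dialect that reappears later in the list starts a new run, which is why the numbering step sorts entries by dialect beforehand.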

				void BytecodeWriter::writeDialectSection(EncodingEmitter &emitter) {
				EncodingEmitter dialectEmitter;

				// Emit the referenced dialects.
				auto dialects = numberingState.getDialects();
				dialectEmitter.emitVarInt(llvm::size(dialects));
				for (DialectNumbering &dialect : dialects)
				dialectEmitter.emitVarInt(getSharedStringNumber(dialect.name));

				// Emit the referenced operation names grouped by dialect.
				auto emitOpName = [&](OpNameNumbering &name) {
				dialectEmitter.emitVarInt(getSharedStringNumber(name.name.stripDialect()));
				};
				writeDialectGrouping(dialectEmitter, numberingState.getOpNames(), emitOpName);

				emitter.emitSection(bytecode::Section::kDialect, std::move(dialectEmitter));
				}

				//===----------------------------------------------------------------------===//
				// Attributes and Types

				void BytecodeWriter::writeAttrTypeSection(EncodingEmitter &emitter) {
				EncodingEmitter attrTypeEmitter;
				EncodingEmitter offsetEmitter;
				offsetEmitter.emitVarInt(llvm::size(numberingState.getAttributes()));
				offsetEmitter.emitVarInt(llvm::size(numberingState.getTypes()));

				// A functor used to emit an attribute or type entry.
				uint64_t prevOffset = 0;
				auto emitAttrOrType = [&](auto &entry) {
				// TODO: Allow dialects to provide more optimal implementations of attribute
				// and type encodings.
				bool hasCustomEncoding = false;

				// Emit the entry using the textual format.
				raw_emitter_ostream(attrTypeEmitter) << entry.getValue();
				attrTypeEmitter.emitByte(0);

				// Record the offset of this entry.
				uint64_t curOffset = attrTypeEmitter.size();
				offsetEmitter.emitVarIntWithFlag(curOffset - prevOffset, hasCustomEncoding);
				prevOffset = curOffset;
				};

				// Emit the attribute and type entries for each dialect.
				writeDialectGrouping(offsetEmitter, numberingState.getAttributes(),
				emitAttrOrType);
				writeDialectGrouping(offsetEmitter, numberingState.getTypes(),
				emitAttrOrType);

				// Emit the sections to the stream.
				emitter.emitSection(bytecode::Section::kAttrTypeOffset,
				std::move(offsetEmitter));
				emitter.emitSection(bytecode::Section::kAttrType, std::move(attrTypeEmitter));
				}
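
The relationship between the offset section and the data section can be sketched without any MLIR types. The names below (`AttrTypeSketch`, `encodeTextualEntries`) are hypothetical, entries are plain strings standing in for the printed attribute or type, and the offsets are kept as integers where the writer would emit varints.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result holder: one offset word per entry plus the data blob.
struct AttrTypeSketch {
  std::vector<uint64_t> offsets;
  std::string data;
};

AttrTypeSketch encodeTextualEntries(const std::vector<std::string> &entries) {
  AttrTypeSketch result;
  uint64_t prevOffset = 0;
  for (const std::string &entry : entries) {
    // Every entry uses the textual fallback in this sketch.
    const bool hasCustomEncoding = false;

    // Emit the nul-terminated textual form into the data blob.
    result.data += entry;
    result.data.push_back('\0');

    // Record the distance from the previous entry's end, with the flag
    // stored in the low bit.
    uint64_t curOffset = result.data.size();
    result.offsets.push_back(((curOffset - prevOffset) << 1) |
                             (hasCustomEncoding ? 1 : 0));
    prevOffset = curOffset;
  }
  return result;
}
```

Storing deltas instead of absolute offsets keeps each recorded number proportional to the entry's size, which tends to keep the encoded varints short.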

				//===----------------------------------------------------------------------===//
				// Operations

				void BytecodeWriter::writeBlock(EncodingEmitter &emitter, Block *block) {
				ArrayRef<BlockArgument> args = block->getArguments();
				bool hasArgs = !args.empty();

				// Emit the number of operations in this block, and if it has arguments. We
				// use the low bit of the operation count to indicate if the block has
				// arguments.
				unsigned numOps = numberingState.getOperationCount(block);
				emitter.emitVarIntWithFlag(numOps, hasArgs);

				// Emit the arguments of the block.
				if (hasArgs) {
				emitter.emitVarInt(args.size());
				for (BlockArgument arg : args) {
				emitter.emitVarInt(numberingState.getNumber(arg.getType()));
				emitter.emitVarInt(numberingState.getNumber(arg.getLoc()));
				}
				}

				// Emit the operations within the block.
				for (Operation &op : *block)
				writeOp(emitter, &op);
				}

				void BytecodeWriter::writeOp(EncodingEmitter &emitter, Operation *op) {
				emitter.emitVarInt(numberingState.getNumber(op->getName()));

				// Emit a mask for the operation components. We need to fill this in later
				// (when we actually know what needs to be emitted), so emit a placeholder for
				// now.
				uint64_t maskOffset = emitter.size();
				uint8_t opEncodingMask = 0;
				emitter.emitByte(0);

				// Emit the location for this operation.
				emitter.emitVarInt(numberingState.getNumber(op->getLoc()));

				// Emit the attributes of this operation.
				DictionaryAttr attrs = op->getAttrDictionary();
				if (!attrs.empty()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasAttrs;
				emitter.emitVarInt(numberingState.getNumber(op->getAttrDictionary()));
				}

				// Emit the result types of the operation.
				if (unsigned numResults = op->getNumResults()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasResults;
				emitter.emitVarInt(numResults);
				for (Type type : op->getResultTypes())
				emitter.emitVarInt(numberingState.getNumber(type));
				}

				// Emit the operands of the operation.
				if (unsigned numOperands = op->getNumOperands()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasOperands;
				emitter.emitVarInt(numOperands);
				for (Value operand : op->getOperands())
				emitter.emitVarInt(numberingState.getNumber(operand));
				}

				// Emit the successors of the operation.
				if (unsigned numSuccessors = op->getNumSuccessors()) {
				opEncodingMask |= bytecode::OpEncodingMask::kHasSuccessors;
				emitter.emitVarInt(numSuccessors);
				for (Block *successor : op->getSuccessors())
				emitter.emitVarInt(numberingState.getNumber(successor));
				}

				// Check for regions.
				unsigned numRegions = op->getNumRegions();
				if (numRegions)
				opEncodingMask |= bytecode::OpEncodingMask::kHasInlineRegions;

				// Update the mask for the operation.
				emitter.patchByte(maskOffset, opEncodingMask);

				// With the mask emitted, we can now emit the regions of the operation. We do
				// this after mask emission to avoid offset complications that may arise by
				// emitting the regions first (e.g. if the regions are huge, backpatching the
				// op encoding mask is more annoying).
				if (numRegions) {
				bool isIsolatedFromAbove = op->hasTrait<OpTrait::IsIsolatedFromAbove>();
				emitter.emitVarIntWithFlag(numRegions, isIsolatedFromAbove);

				for (Region &region : op->getRegions())
				writeRegion(emitter, &region);
				}
				}
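
The placeholder-then-patch pattern that `writeOp` uses for the encoding mask can be shown on a plain byte vector. This is a sketch only: the flag values and the `encodeOpSketch` function are hypothetical, counts are single bytes where the writer emits varints, and the real bit assignments live in `bytecode::OpEncodingMask`, which is not shown in this part of the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bits, chosen for this example only.
enum OpMaskSketch : uint8_t {
  kSketchHasResults = 0x1,
  kSketchHasOperands = 0x2,
};

std::vector<uint8_t> encodeOpSketch(uint8_t opName,
                                    const std::vector<uint8_t> &results,
                                    const std::vector<uint8_t> &operands) {
  std::vector<uint8_t> buffer;
  buffer.push_back(opName);

  // Reserve a byte for the mask; its value is not known yet.
  std::size_t maskOffset = buffer.size();
  buffer.push_back(0);
  uint8_t mask = 0;

  // Each optional component sets its bit and appends a count plus payload.
  if (!results.empty()) {
    mask |= kSketchHasResults;
    buffer.push_back(static_cast<uint8_t>(results.size()));
    buffer.insert(buffer.end(), results.begin(), results.end());
  }
  if (!operands.empty()) {
    mask |= kSketchHasOperands;
    buffer.push_back(static_cast<uint8_t>(operands.size()));
    buffer.insert(buffer.end(), operands.begin(), operands.end());
  }

  // Backpatch the placeholder now that the mask is complete.
  buffer[maskOffset] = mask;
  return buffer;
}
```

A reader of such a stream sees the mask before any optional component, so it can tell which fields follow without a per-field presence marker.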

				void BytecodeWriter::writeRegion(EncodingEmitter &emitter, Region *region) {
				// If the region is empty, we only need to emit the number of blocks (which is
				// zero).
				if (region->empty())
				return emitter.emitVarInt(/*numBlocks*/ 0);

				// Emit the number of blocks and values within the region.
				unsigned numBlocks, numValues;
				std::tie(numBlocks, numValues) = numberingState.getBlockValueCount(region);
				emitter.emitVarInt(numBlocks);
				emitter.emitVarInt(numValues);

				// Emit the blocks within the region.
				for (Block &block : *region)
				writeBlock(emitter, &block);
				}

				void BytecodeWriter::writeIRSection(EncodingEmitter &emitter, Operation *op) {
				EncodingEmitter irEmitter;

				// Write the IR section the same way as a block with no arguments. Note that
				// the low-bit of the operation count for a block is used to indicate if the
				// block has arguments, which in this case is always false.
				irEmitter.emitVarIntWithFlag(/*numOps*/ 1, /*hasArgs*/ false);

				// Emit the operations.
				writeOp(irEmitter, op);

				emitter.emitSection(bytecode::Section::kIR, std::move(irEmitter));
				}

				//===----------------------------------------------------------------------===//
				// Strings

				void BytecodeWriter::writeStringSection(EncodingEmitter &emitter) {
				EncodingEmitter stringEmitter;
				stringEmitter.emitVarInt(strings.size());

				// Emit the sizes in reverse order, so that we don't need to backpatch an
				// offset to the string data or have a separate section.
				for (const auto &it : llvm::reverse(strings))
				stringEmitter.emitVarInt(it.first.size() + 1);
				// Emit the string data itself.
				for (const auto &it : strings)
				stringEmitter.emitNulTerminatedString(it.first.val());

				emitter.emitSection(bytecode::Section::kString, std::move(stringEmitter));
				}

				size_t BytecodeWriter::getSharedStringNumber(StringRef str) {
				auto it = strings.insert({llvm::CachedHashStringRef(str), strings.size()});
				return it.first->second;
				}
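
The string handling above combines two ideas: strings are numbered in first-use order, and the section lists the sizes in reverse order ahead of the data. A minimal sketch, assuming a hypothetical `StringTableSketch` class built on standard containers instead of `llvm::MapVector`:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class StringTableSketch {
public:
  // Return the number of `str`, assigning the next free number on first use.
  std::size_t intern(const std::string &str) {
    auto result = indices.emplace(str, ordered.size());
    if (result.second)
      ordered.push_back(str);
    return result.first->second;
  }

  // Sizes of the strings (including the nul terminator), last string first.
  std::vector<std::size_t> reversedSizes() const {
    std::vector<std::size_t> sizes;
    for (auto it = ordered.rbegin(), e = ordered.rend(); it != e; ++it)
      sizes.push_back(it->size() + 1);
    return sizes;
  }

  // The nul-terminated strings concatenated in numbering order.
  std::string data() const {
    std::string out;
    for (const std::string &str : ordered) {
      out += str;
      out.push_back('\0');
    }
    return out;
  }

private:
  std::unordered_map<std::string, std::size_t> indices;
  std::vector<std::string> ordered;
};
```

Interning the same string twice returns the same number, so repeated dialect and operation names cost one table entry plus a small index per reference.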

				//===----------------------------------------------------------------------===//
				// Entry Points
				//===----------------------------------------------------------------------===//

				void mlir::writeBytecodeToFile(Operation *op, raw_ostream &os,
				StringRef producer) {
				BytecodeWriter writer(op);
				writer.write(op, os, producer);
				}

mlir/lib/Bytecode/Writer/CMakeLists.txt

This file was added.

				add_mlir_library(MLIRBytecodeWriter
				BytecodeWriter.cpp
				IRNumbering.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Bytecode

				LINK_LIBS PUBLIC
				MLIRIR
				MLIRSupport
				)

mlir/lib/Bytecode/Writer/IRNumbering.h

This file was added.

				//===- IRNumbering.h - MLIR bytecode IR numbering ----------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains various utilities that number IR structures in preparation
				// for bytecode emission.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H
				#define LIB_MLIR_BYTECODE_WRITER_IRNUMBERING_H

				#include "mlir/IR/OperationSupport.h"
				#include "llvm/ADT/MapVector.h"

				namespace mlir {
				class BytecodeWriterConfig;

				namespace bytecode {
				namespace detail {
				struct DialectNumbering;

				//===----------------------------------------------------------------------===//
				// Attribute and Type Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents a numbering entry for an Attribute or Type.
				struct AttrTypeNumbering {
				AttrTypeNumbering(PointerUnion<Attribute, Type> value) : value(value) {}

				/// The concrete value.
				PointerUnion<Attribute, Type> value;

				/// The number assigned to this value.
				unsigned number = 0;

				/// The number of references to this value.
				unsigned refCount = 1;

				/// The dialect of this value.
				DialectNumbering *dialect = nullptr;
				};
				struct AttributeNumbering : public AttrTypeNumbering {
				AttributeNumbering(Attribute value) : AttrTypeNumbering(value) {}
				Attribute getValue() const { return value.get<Attribute>(); }
				};
				struct TypeNumbering : public AttrTypeNumbering {
				TypeNumbering(Type value) : AttrTypeNumbering(value) {}
				Type getValue() const { return value.get<Type>(); }
				};

				//===----------------------------------------------------------------------===//
				// OpName Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents the numbering entry of an operation name.
				struct OpNameNumbering {
				OpNameNumbering(DialectNumbering *dialect, OperationName name)
				: dialect(dialect), name(name) {}

				/// The dialect of this value.
				DialectNumbering *dialect;

				/// The concrete name.
				OperationName name;

				/// The number assigned to this name.
				unsigned number = 0;

				/// The number of references to this name.
				unsigned refCount = 1;
				};

				//===----------------------------------------------------------------------===//
				// Dialect Numbering
				//===----------------------------------------------------------------------===//

				/// This class represents a numbering entry for a Dialect.
				struct DialectNumbering {
				DialectNumbering(StringRef name, unsigned number)
				: name(name), number(number) {}

				/// The namespace of the dialect.
				StringRef name;

				/// The number assigned to the dialect.
				unsigned number;

				/// The loaded dialect, or nullptr if the dialect isn't loaded.
				Dialect *dialect = nullptr;
				};

				//===----------------------------------------------------------------------===//
				// IRNumberingState
				//===----------------------------------------------------------------------===//

				/// This class manages numbering IR entities in preparation of bytecode
				/// emission.
				class IRNumberingState {
				public:
				IRNumberingState(Operation *op);

				/// Return the numbered dialects.
				auto getDialects() {
				return llvm::make_pointee_range(llvm::make_second_range(dialects));
				}
				auto getAttributes() { return llvm::make_pointee_range(orderedAttrs); }
				auto getOpNames() { return llvm::make_pointee_range(orderedOpNames); }
				auto getTypes() { return llvm::make_pointee_range(orderedTypes); }

				/// Return the number for the given IR unit.
				unsigned getNumber(Attribute attr) {
				assert(attrs.count(attr) && "attribute not numbered");
				return attrs[attr]->number;
				}
				unsigned getNumber(Block *block) {
				assert(blockIDs.count(block) && "block not numbered");
				return blockIDs[block];
				}
				unsigned getNumber(OperationName opName) {
				assert(opNames.count(opName) && "opName not numbered");
				return opNames[opName]->number;
				}
				unsigned getNumber(Type type) {
				assert(types.count(type) && "type not numbered");
				return types[type]->number;
				}
				unsigned getNumber(Value value) {
				assert(valueIDs.count(value) && "value not numbered");
				return valueIDs[value];
				}

				/// Return the block and value counts of the given region.
				std::pair<unsigned, unsigned> getBlockValueCount(Region *region) {
				assert(regionBlockValueCounts.count(region) && "region not numbered");
				return regionBlockValueCounts[region];
				}

				/// Return the number of operations in the given block.
				unsigned getOperationCount(Block *block) {
				assert(blockOperationCounts.count(block) && "block not numbered");
				return blockOperationCounts[block];
				}

				private:
				/// Number the given IR unit for bytecode emission.
				void number(Attribute attr);
				void number(Block &block);
				DialectNumbering &numberDialect(Dialect *dialect);
				DialectNumbering &numberDialect(StringRef dialect);
				void number(Operation &op);
				void number(OperationName opName);
				void number(Region &region);
				void number(Type type);

				/// Mapping from IR to the respective numbering entries.
				DenseMap<Attribute, AttributeNumbering *> attrs;
				DenseMap<OperationName, OpNameNumbering *> opNames;
				DenseMap<Type, TypeNumbering *> types;
				DenseMap<Dialect *, DialectNumbering *> registeredDialects;
				llvm::MapVector<StringRef, DialectNumbering *> dialects;
				std::vector<AttributeNumbering *> orderedAttrs;
				std::vector<OpNameNumbering *> orderedOpNames;
				std::vector<TypeNumbering *> orderedTypes;

				/// Allocators used for the various numbering entries.
				llvm::SpecificBumpPtrAllocator<AttributeNumbering> attrAllocator;
				llvm::SpecificBumpPtrAllocator<DialectNumbering> dialectAllocator;
				llvm::SpecificBumpPtrAllocator<OpNameNumbering> opNameAllocator;
				llvm::SpecificBumpPtrAllocator<TypeNumbering> typeAllocator;

				/// The value ID for each Block and Value.
				DenseMap<Block *, unsigned> blockIDs;
				DenseMap<Value, unsigned> valueIDs;

				/// The number of operations in each block.
				DenseMap<Block *, unsigned> blockOperationCounts;

				/// A map from region to the number of blocks and values within that region.
				DenseMap<Region *, std::pair<unsigned, unsigned>> regionBlockValueCounts;

				/// The next value ID to assign when numbering.
				unsigned nextValueID = 0;
				};
				} // namespace detail
				} // namespace bytecode
				} // namespace mlir

				#endif

mlir/lib/Bytecode/Writer/IRNumbering.cpp

This file was added.

				//===- IRNumbering.cpp - MLIR Bytecode IR numbering -----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "IRNumbering.h"
				#include "mlir/Bytecode/BytecodeWriter.h"
				#include "mlir/IR/BuiltinTypes.h"
				#include "mlir/IR/OpDefinition.h"

				using namespace mlir;
				using namespace mlir::bytecode::detail;

				//===----------------------------------------------------------------------===//
				// IR Numbering
				//===----------------------------------------------------------------------===//

				/// Group and sort the elements of the given range by their parent dialect. This
				/// grouping is applied to sub-sections of the range defined by how many bytes
				/// it takes to encode a varint index to that sub-section.
				template <typename T>
				static void groupByDialectPerByte(T range) {
				if (range.empty())
				return;

				// A functor used to sort by a given dialect, with a desired dialect to be
				// ordered first (to better enable sharing of dialects across byte groups).
				auto sortByDialect = [](unsigned dialectToOrderFirst, const auto &lhs,
				const auto &rhs) {
				if (lhs->dialect->number == dialectToOrderFirst)
				return rhs->dialect->number != dialectToOrderFirst;
				return lhs->dialect->number < rhs->dialect->number;
				};

				unsigned dialectToOrderFirst = 0;
				size_t elementsInByteGroup = 0;
				auto iterRange = range;
				for (unsigned i = 1; i < 9; ++i) {
				// Update the number of elements in the current byte grouping. Reminder
				// that varint encodes 7-bits per byte, so that's how we compute the
				// number of elements in each byte grouping.
				elementsInByteGroup = (1 << (7 * i)) - elementsInByteGroup;

				// Slice out the sub-set of elements that are in the current byte grouping
				// to be sorted.
				auto byteSubRange = iterRange.take_front(elementsInByteGroup);
				iterRange = iterRange.drop_front(byteSubRange.size());

				// Sort the sub range for this byte.
				llvm::stable_sort(byteSubRange, [&](const auto &lhs, const auto &rhs) {
				return sortByDialect(dialectToOrderFirst, lhs, rhs);
				});

				// Update the dialect to order first to be the dialect at the end of the
				// current grouping. This seeks to allow larger dialect groupings across
				// byte boundaries.
				dialectToOrderFirst = byteSubRange.back()->dialect->number;

				mehdi_amini: We could record the number of times an attribute is used in order to sort them, so that the most used ones have lower IDs (and have more chance of fitting in one byte) :)
				rriddle (author): We would need to encode things differently in that case, i.e. if the attributes are not in order of dialect, they would each need to have an associated dialect id encoded with them. In the case of lots of attributes/types, that would be significant. Maybe we could come up with a hybrid model? i.e. encode the most common 128 attributes/types, so that they fit in one byte (or two), and then encode the rest using dialect grouping.
				jpienaar: Would sorting attributes per frequency per dialect work? (Keep dialect attributes still together but just sort dialects per frequency.) We could measure all three of course, doesn't require version bump ;-)
				// If the data range is now empty, we are done.
				if (iterRange.empty())
				break;
				}

				// Assign the entry numbers based on the sort order.
				for (auto &entry : llvm::enumerate(range))
				entry.value()->number = entry.index();
				}

				IRNumberingState::IRNumberingState(Operation *op) {
				// Number the root operation.
				number(*op);

				// Push all of the regions of the root operation onto the worklist.
				SmallVector<std::pair<Region *, unsigned>, 8> numberContext;
				for (Region &region : op->getRegions())
				numberContext.emplace_back(&region, nextValueID);

				// Iteratively process each of the nested regions.
				jpienaar: Could this just be a static function here?
				while (!numberContext.empty()) {
				Region *region;
				std::tie(region, nextValueID) = numberContext.pop_back_val();
				number(*region);

				// Traverse into nested regions.
				for (Operation &op : region->getOps()) {
				// Isolated regions don't share value numbers with their parent, so we can
				// start numbering these regions at zero.
				unsigned opFirstValueID =
				op.hasTrait<OpTrait::IsIsolatedFromAbove>() ? 0 : nextValueID;
				for (Region &region : op.getRegions())
				numberContext.emplace_back(&region, opFirstValueID);
				}
				}

				// Number each of the dialects. For now this is just in the order they were
				// found, given that the number of dialects on average is small enough to fit
				// within a single byte (128). If we ever have real world use cases that have
				// a huge number of dialects, this could be made more intelligent.
				for (auto &it : llvm::enumerate(dialects))
				it.value().second->number = it.index();

				// Number each of the recorded components within each dialect.

				// First sort by ref count so that the most referenced elements are first. We
				// try to bias more heavily used elements to the front. This allows for more
				// frequently referenced things to be encoded using smaller varints.
				auto sortByRefCountFn = [](const auto &lhs, const auto &rhs) {
				return lhs->refCount > rhs->refCount;
				};
				llvm::stable_sort(orderedAttrs, sortByRefCountFn);
				llvm::stable_sort(orderedOpNames, sortByRefCountFn);
				llvm::stable_sort(orderedTypes, sortByRefCountFn);

				// After that, we apply a secondary ordering based on the parent dialect. This
				// ordering is applied to sub-sections of the element list defined by how many
				// bytes it takes to encode a varint index to that sub-section. This allows
				// for more efficiently encoding components of the same dialect (e.g. we only
				// have to encode the dialect reference once).
				groupByDialectPerByte(llvm::makeMutableArrayRef(orderedAttrs));
				groupByDialectPerByte(llvm::makeMutableArrayRef(orderedOpNames));
				groupByDialectPerByte(llvm::makeMutableArrayRef(orderedTypes));
				}
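The reference-count ordering above can be illustrated with a small standalone sketch. It assumes 7 payload bits per encoded byte, which matches the "single byte (128)" remark in the comment but is not taken from Encoding.h. `Entry`, `varIntByteSize`, `sortByRefCount`, and `totalIndexBytes` are hypothetical names, not part of this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the AttributeNumbering/OpNameNumbering/
// TypeNumbering records; only the reference count matters here.
struct Entry {
  std::string name;
  unsigned refCount;
};

// Bytes needed to encode `value`, ASSUMING 7 payload bits per byte.
inline unsigned varIntByteSize(uint64_t value) {
  unsigned bytes = 1;
  while ((value >>= 7) != 0)
    ++bytes;
  return bytes;
}

// Most referenced entries first; stable_sort keeps discovery order on ties.
inline void sortByRefCount(std::vector<Entry> &entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const Entry &lhs, const Entry &rhs) {
                     return lhs.refCount > rhs.refCount;
                   });
}

// Total bytes spent on indices when entry i is referenced refCount times.
inline uint64_t totalIndexBytes(const std::vector<Entry> &entries) {
  uint64_t total = 0;
  for (std::size_t i = 0; i < entries.size(); ++i)
    total += static_cast<uint64_t>(entries[i].refCount) * varIntByteSize(i);
  return total;
}
```

Under this assumption, with 129 entries where only the last one is referenced 100 times, sorting by reference count lowers the total from 328 to 229 index bytes, because the hot entry moves from a two-byte index to a one-byte index.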

				void IRNumberingState::number(Attribute attr) {
				auto it = attrs.insert({attr, nullptr});
				if (!it.second) {
				++it.first->second->refCount;
				return;
				}
				auto *numbering = new (attrAllocator.Allocate()) AttributeNumbering(attr);
				it.first->second = numbering;
				orderedAttrs.push_back(numbering);

				// Check for OpaqueAttr, which is a dialect-specific attribute that didn't
				// have a registered dialect when it got created. We don't want to encode this
				// as the builtin OpaqueAttr, we want to encode it as if the dialect was
				// actually loaded.
				if (OpaqueAttr opaqueAttr = attr.dyn_cast<OpaqueAttr>())
				numbering->dialect = &numberDialect(opaqueAttr.getDialectNamespace());
				else
				numbering->dialect = &numberDialect(&attr.getDialect());
				}

				void IRNumberingState::number(Block &block) {
				// Number the arguments of the block.
				for (BlockArgument arg : block.getArguments()) {
				valueIDs.try_emplace(arg, nextValueID++);
				number(arg.getLoc());
				number(arg.getType());
				}

				// Number the operations in this block.
				unsigned &numOps = blockOperationCounts[&block];
				for (Operation &op : block) {
				number(op);
				++numOps;
				}
				}

				auto IRNumberingState::numberDialect(Dialect *dialect) -> DialectNumbering & {
				DialectNumbering *&numbering = registeredDialects[dialect];
				if (!numbering) {
				numbering = &numberDialect(dialect->getNamespace());
				numbering->dialect = dialect;
				}
				return *numbering;
				}

				auto IRNumberingState::numberDialect(StringRef dialect) -> DialectNumbering & {
				DialectNumbering *&numbering = dialects[dialect];
				if (!numbering) {
				numbering = new (dialectAllocator.Allocate())
				DialectNumbering(dialect, dialects.size() - 1);
				}
				return *numbering;
				}

				void IRNumberingState::number(Region &region) {
				if (region.empty())
				return;
				size_t firstValueID = nextValueID;

				// Number the blocks within this region.
				size_t blockCount = 0;
				for (auto &it : llvm::enumerate(region)) {
				blockIDs.try_emplace(&it.value(), it.index());
				number(it.value());
				++blockCount;
				}

				// Remember the number of blocks and values in this region.
				regionBlockValueCounts.try_emplace(&region, blockCount,
				nextValueID - firstValueID);
				}
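The region worklist in the constructor and `number(Region &)` together decide where each region's value IDs start. The sketch below replays that logic on a toy IR. `ToyOp`, `ToyRegion`, and `computeFirstValueIDs` are hypothetical, and each region's values are collapsed into a single count for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy IR: regions live in a flat table, and ops refer to their
// nested regions by index into that table.
struct ToyOp {
  bool isolatedFromAbove;
  std::vector<std::size_t> regions;
};
struct ToyRegion {
  unsigned numValues; // block arguments + op results defined in the region
  std::vector<ToyOp> ops;
};

// Returns the first value ID assigned to each region.
inline std::vector<unsigned>
computeFirstValueIDs(const std::vector<ToyRegion> &regions,
                     const std::vector<std::size_t> &rootRegions,
                     unsigned nextValueID) {
  std::vector<unsigned> firstValueID(regions.size(), 0);
  std::vector<std::pair<std::size_t, unsigned>> worklist;
  for (std::size_t region : rootRegions)
    worklist.emplace_back(region, nextValueID);

  while (!worklist.empty()) {
    std::pair<std::size_t, unsigned> entry = worklist.back();
    worklist.pop_back();
    const ToyRegion &region = regions[entry.first];

    // Restore the counter saved when the region was queued, then "number"
    // every value defined directly in the region.
    nextValueID = entry.second;
    firstValueID[entry.first] = nextValueID;
    nextValueID += region.numValues;

    // Nested regions continue after the parent's values, unless the owning
    // op is isolated from above, in which case they restart at zero.
    for (const ToyOp &op : region.ops) {
      unsigned opFirstValueID = op.isolatedFromAbove ? 0 : nextValueID;
      for (std::size_t nested : op.regions)
        worklist.emplace_back(nested, opFirstValueID);
    }
  }
  return firstValueID;
}
```

Saving the counter alongside each queued region is what lets sibling regions reuse the same starting ID without recursion.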

				void IRNumberingState::number(Operation &op) {
				// Number the components of an operation that won't be numbered elsewhere
				// (e.g. we don't number operands, regions, or successors here).
				number(op.getName());
				for (OpResult result : op.getResults()) {
				valueIDs.try_emplace(result, nextValueID++);
				number(result.getType());
				}

				// Only number the operation's dictionary if it isn't empty.
				DictionaryAttr dictAttr = op.getAttrDictionary();
				if (!dictAttr.empty())
				number(dictAttr);

				number(op.getLoc());
				}

				void IRNumberingState::number(OperationName opName) {
				OpNameNumbering *&numbering = opNames[opName];
				if (numbering) {
				++numbering->refCount;
				return;
				}
				DialectNumbering *dialectNumber = nullptr;
				if (Dialect *dialect = opName.getDialect())
				dialectNumber = &numberDialect(dialect);
				else
				dialectNumber = &numberDialect(opName.getDialectNamespace());

				numbering =
				new (opNameAllocator.Allocate()) OpNameNumbering(dialectNumber, opName);
				orderedOpNames.push_back(numbering);
				}

				void IRNumberingState::number(Type type) {
				auto it = types.insert({type, nullptr});
				if (!it.second) {
				++it.first->second->refCount;
				return;
				}
				auto *numbering = new (typeAllocator.Allocate()) TypeNumbering(type);
				it.first->second = numbering;
				orderedTypes.push_back(numbering);

				// Check for OpaqueType, which is a dialect-specific type that didn't have a
				// registered dialect when it got created. We don't want to encode this as the
				// builtin OpaqueType, we want to encode it as if the dialect was actually
				// loaded.
				if (OpaqueType opaqueType = type.dyn_cast<OpaqueType>())
				numbering->dialect = &numberDialect(opaqueType.getDialectNamespace());
				else
				numbering->dialect = &numberDialect(&type.getDialect());
				}
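The `number(Attribute)`, `number(OperationName)`, and `number(Type)` overloads share one pattern: the first sighting creates a record in discovery order, and later sightings only bump a reference count. A minimal sketch of that pattern follows. `Numbering` and `ToyNumberingState` are hypothetical, and the map stores an index rather than an allocator-backed pointer as the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record; the numbering classes in the patch also carry a
// dialect pointer, which this sketch omits.
struct Numbering {
  std::string key;
  unsigned refCount;
};

class ToyNumberingState {
public:
  // First sighting: create a record in discovery order.
  // Later sightings: only bump the reference count.
  void number(const std::string &key) {
    auto result = lookup.emplace(key, ordered.size());
    if (!result.second) {
      ++ordered[result.first->second].refCount;
      return;
    }
    ordered.push_back(Numbering{key, 1});
  }

  const std::vector<Numbering> &getOrdered() const { return ordered; }

private:
  std::unordered_map<std::string, std::size_t> lookup;
  std::vector<Numbering> ordered;
};
```

Keeping the ordered list separate from the lookup map is what allows the later stable sort by reference count to preserve discovery order on ties.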

mlir/lib/CMakeLists.txt

	# Enable errors for any global constructors.			# Enable errors for any global constructors.
	add_flag_if_supported("-Werror=global-constructors" WERROR_GLOBAL_CONSTRUCTOR)			add_flag_if_supported("-Werror=global-constructors" WERROR_GLOBAL_CONSTRUCTOR)

	add_subdirectory(Analysis)			add_subdirectory(Analysis)
	add_subdirectory(AsmParser)			add_subdirectory(AsmParser)
				add_subdirectory(Bytecode)
	add_subdirectory(Conversion)			add_subdirectory(Conversion)
	add_subdirectory(Dialect)			add_subdirectory(Dialect)
	add_subdirectory(IR)			add_subdirectory(IR)
	add_subdirectory(Interfaces)			add_subdirectory(Interfaces)
	add_subdirectory(Parser)			add_subdirectory(Parser)
	add_subdirectory(Pass)			add_subdirectory(Pass)
	add_subdirectory(Reducer)			add_subdirectory(Reducer)
	add_subdirectory(Rewrite)			add_subdirectory(Rewrite)
	add_subdirectory(Support)			add_subdirectory(Support)
	add_subdirectory(TableGen)			add_subdirectory(TableGen)
	add_subdirectory(Target)			add_subdirectory(Target)
	add_subdirectory(Tools)			add_subdirectory(Tools)
	add_subdirectory(Transforms)			add_subdirectory(Transforms)
	add_subdirectory(ExecutionEngine)			add_subdirectory(ExecutionEngine)

mlir/lib/Parser/CMakeLists.txt

	add_mlir_library(MLIRParser			add_mlir_library(MLIRParser
	Parser.cpp			Parser.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Parser			${MLIR_MAIN_INCLUDE_DIR}/mlir/Parser

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRAsmParser			MLIRAsmParser
				MLIRBytecodeReader
	MLIRIR			MLIRIR
	)			)

mlir/lib/Parser/Parser.cpp

	//===- Parser.cpp - MLIR Unified Parser Interface -------------------------===//			//===- Parser.cpp - MLIR Unified Parser Interface -------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file implements the parser for the MLIR textual form.			// This file implements the parser for the MLIR textual form.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/Parser/Parser.h"			#include "mlir/Parser/Parser.h"
	#include "mlir/AsmParser/AsmParser.h"			#include "mlir/AsmParser/AsmParser.h"
				#include "mlir/Bytecode/BytecodeReader.h"
	#include "llvm/Support/SourceMgr.h"			#include "llvm/Support/SourceMgr.h"

	using namespace mlir;			using namespace mlir;

	LogicalResult mlir::parseSourceFile(const llvm::SourceMgr &sourceMgr,			LogicalResult mlir::parseSourceFile(const llvm::SourceMgr &sourceMgr,
	Block *block, const ParserConfig &config,			Block *block, const ParserConfig &config,
	LocationAttr *sourceFileLoc) {			LocationAttr *sourceFileLoc) {
	const auto *sourceBuf = sourceMgr.getMemoryBuffer(sourceMgr.getMainFileID());			const auto *sourceBuf = sourceMgr.getMemoryBuffer(sourceMgr.getMainFileID());
	if (sourceFileLoc) {			if (sourceFileLoc) {
	*sourceFileLoc = FileLineColLoc::get(config.getContext(),			*sourceFileLoc = FileLineColLoc::get(config.getContext(),
	sourceBuf->getBufferIdentifier(),			sourceBuf->getBufferIdentifier(),
	/*line=*/0, /*column=*/0);			/*line=*/0, /*column=*/0);
	}			}
				if (isBytecode(*sourceBuf))
				return readBytecodeFile(*sourceBuf, block, config);
	return parseAsmSourceFile(sourceMgr, block, config);			return parseAsmSourceFile(sourceMgr, block, config);
	}			}
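The dispatch above relies on `isBytecode` recognizing a bytecode buffer before any text parsing starts. A minimal sketch of such a check follows, assuming a four-byte magic number `0x4D 0x4C 0xEF 0x52` at the start of the buffer. That constant is an assumption of this sketch and should be verified against BytecodeFormat.md. `looksLikeBytecode` and `classifyInput` are hypothetical helpers, not the functions added in this patch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// ASSUMPTION: four-byte magic number "ML\xEFR"; check BytecodeFormat.md.
static const unsigned char kAssumedMagic[4] = {0x4D, 0x4C, 0xEF, 0x52};

enum class InputKind { Bytecode, Text };

// Hypothetical helper playing the role of isBytecode().
inline bool looksLikeBytecode(const std::string &buffer) {
  return buffer.size() >= sizeof(kAssumedMagic) &&
         std::memcmp(buffer.data(), kAssumedMagic, sizeof(kAssumedMagic)) == 0;
}

// Chooses which reader to hand the buffer to.
inline InputKind classifyInput(const std::string &buffer) {
  return looksLikeBytecode(buffer) ? InputKind::Bytecode : InputKind::Text;
}
```

Because the check only inspects a fixed prefix, textual inputs pay almost nothing for the extra branch.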

	LogicalResult mlir::parseSourceFile(llvm::StringRef filename, Block *block,			LogicalResult mlir::parseSourceFile(llvm::StringRef filename, Block *block,
	const ParserConfig &config,			const ParserConfig &config,
	LocationAttr *sourceFileLoc) {			LocationAttr *sourceFileLoc) {
	llvm::SourceMgr sourceMgr;			llvm::SourceMgr sourceMgr;
	return parseSourceFile(filename, sourceMgr, block, config, sourceFileLoc);			return parseSourceFile(filename, sourceMgr, block, config, sourceFileLoc);
	[32 lines not shown]

mlir/lib/Tools/mlir-opt/CMakeLists.txt

	add_mlir_library(MLIROptLib			add_mlir_library(MLIROptLib
	MlirOptMain.cpp			MlirOptMain.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Tools/mlir-opt			${MLIR_MAIN_INCLUDE_DIR}/mlir/Tools/mlir-opt

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
				MLIRBytecodeWriter
	MLIRPass			MLIRPass
	MLIRParser			MLIRParser
	MLIRSupport			MLIRSupport
	)			)

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp

//===- MlirOptMain.cpp - MLIR Optimizer Driver ----------------------------===//		//===- MlirOptMain.cpp - MLIR Optimizer Driver ----------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This is a utility that runs an optimization pass and prints the result back		// This is a utility that runs an optimization pass and prints the result back
// out. It is designed to support unit testing.		// out. It is designed to support unit testing.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Tools/mlir-opt/MlirOptMain.h"		#include "mlir/Tools/mlir-opt/MlirOptMain.h"
		#include "mlir/Bytecode/BytecodeWriter.h"
#include "mlir/IR/AsmState.h"		#include "mlir/IR/AsmState.h"
#include "mlir/IR/Attributes.h"		#include "mlir/IR/Attributes.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Diagnostics.h"		#include "mlir/IR/Diagnostics.h"
#include "mlir/IR/Dialect.h"		#include "mlir/IR/Dialect.h"
#include "mlir/IR/Location.h"		#include "mlir/IR/Location.h"
#include "mlir/IR/MLIRContext.h"		#include "mlir/IR/MLIRContext.h"
#include "mlir/Parser/Parser.h"		#include "mlir/Parser/Parser.h"
[19 lines not shown]
/// within the specified context.		/// within the specified context.
///		///
/// This typically parses the main source file, runs zero or more optimization		/// This typically parses the main source file, runs zero or more optimization
/// passes, then prints the output.		/// passes, then prints the output.
///		///
static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,		static LogicalResult performActions(raw_ostream &os, bool verifyDiagnostics,
bool verifyPasses, SourceMgr &sourceMgr,		bool verifyPasses, SourceMgr &sourceMgr,
MLIRContext *context,		MLIRContext *context,
PassPipelineFn passManagerSetupFn) {		PassPipelineFn passManagerSetupFn,
		bool emitBytecode) {
DefaultTimingManager tm;		DefaultTimingManager tm;
applyDefaultTimingManagerCLOptions(tm);		applyDefaultTimingManagerCLOptions(tm);
TimingScope timing = tm.getRootScope();		TimingScope timing = tm.getRootScope();

// Disable multi-threading when parsing the input file. This removes the		// Disable multi-threading when parsing the input file. This removes the
// unnecessary/costly context synchronization when parsing.		// unnecessary/costly context synchronization when parsing.
bool wasThreadingEnabled = context->isMultithreadingEnabled();		bool wasThreadingEnabled = context->isMultithreadingEnabled();
context->disableMultithreading();		context->disableMultithreading();
[22 lines not shown; enclosing context: if (failed(passManagerSetupFn(pm)))]
return failure();		return failure();

// Run the pipeline.		// Run the pipeline.
if (failed(pm.run(*module)))		if (failed(pm.run(*module)))
return failure();		return failure();

// Print the output.		// Print the output.
TimingScope outputTiming = timing.nest("Output");		TimingScope outputTiming = timing.nest("Output");
		if (emitBytecode) {
		writeBytecodeToFile(module->getOperation(), os);
		} else {
module->print(os);		module->print(os);
os << '\n';		os << '\n';
		}
return success();		return success();
}		}

/// Parses the memory buffer. If successful, run a series of passes against		/// Parses the memory buffer. If successful, run a series of passes against
/// it and print the result.		/// it and print the result.
static LogicalResult		static LogicalResult
processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,		processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects, bool preloadDialectsInContext,		bool allowUnregisteredDialects, bool preloadDialectsInContext,
PassPipelineFn passManagerSetupFn, DialectRegistry &registry,		bool emitBytecode, PassPipelineFn passManagerSetupFn,
llvm::ThreadPool *threadPool) {		DialectRegistry &registry, llvm::ThreadPool *threadPool) {
// Tell sourceMgr about this buffer, which is what the parser will pick up.		// Tell sourceMgr about this buffer, which is what the parser will pick up.
SourceMgr sourceMgr;		SourceMgr sourceMgr;
sourceMgr.AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());		sourceMgr.AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());

// Create a context just for the current buffer. Disable threading on creation		// Create a context just for the current buffer. Disable threading on creation
// since we'll inject the thread-pool separately.		// since we'll inject the thread-pool separately.
MLIRContext context(registry, MLIRContext::Threading::DISABLED);		MLIRContext context(registry, MLIRContext::Threading::DISABLED);
if (threadPool)		if (threadPool)
context.setThreadPool(*threadPool);		context.setThreadPool(*threadPool);

// Parse the input file.		// Parse the input file.
if (preloadDialectsInContext)		if (preloadDialectsInContext)
context.loadAllAvailableDialects();		context.loadAllAvailableDialects();
context.allowUnregisteredDialects(allowUnregisteredDialects);		context.allowUnregisteredDialects(allowUnregisteredDialects);
if (verifyDiagnostics)		if (verifyDiagnostics)
context.printOpOnDiagnostic(false);		context.printOpOnDiagnostic(false);
context.getDebugActionManager().registerActionHandler<DebugCounter>();		context.getDebugActionManager().registerActionHandler<DebugCounter>();

// If we are in verify diagnostics mode then we have a lot of work to do,		// If we are in verify diagnostics mode then we have a lot of work to do,
// otherwise just perform the actions without worrying about it.		// otherwise just perform the actions without worrying about it.
if (!verifyDiagnostics) {		if (!verifyDiagnostics) {
SourceMgrDiagnosticHandler sourceMgrHandler(sourceMgr, &context);		SourceMgrDiagnosticHandler sourceMgrHandler(sourceMgr, &context);
return performActions(os, verifyDiagnostics, verifyPasses, sourceMgr,		return performActions(os, verifyDiagnostics, verifyPasses, sourceMgr,
&context, passManagerSetupFn);		&context, passManagerSetupFn, emitBytecode);
}		}

SourceMgrDiagnosticVerifierHandler sourceMgrHandler(sourceMgr, &context);		SourceMgrDiagnosticVerifierHandler sourceMgrHandler(sourceMgr, &context);

// Do any processing requested by command line flags. We don't care whether		// Do any processing requested by command line flags. We don't care whether
// these actions succeed or fail, we only care what diagnostics they produce		// these actions succeed or fail, we only care what diagnostics they produce
// and whether they match our expectations.		// and whether they match our expectations.
(void)performActions(os, verifyDiagnostics, verifyPasses, sourceMgr, &context,		(void)performActions(os, verifyDiagnostics, verifyPasses, sourceMgr, &context,
passManagerSetupFn);		passManagerSetupFn, emitBytecode);

// Verify the diagnostic handler to make sure that each of the diagnostics		// Verify the diagnostic handler to make sure that each of the diagnostics
// matched.		// matched.
return sourceMgrHandler.verify();		return sourceMgrHandler.verify();
}		}

LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,		LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
std::unique_ptr<MemoryBuffer> buffer,		std::unique_ptr<MemoryBuffer> buffer,
PassPipelineFn passManagerSetupFn,		PassPipelineFn passManagerSetupFn,
DialectRegistry &registry, bool splitInputFile,		DialectRegistry &registry, bool splitInputFile,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects,		bool allowUnregisteredDialects,
bool preloadDialectsInContext) {		bool preloadDialectsInContext,
		bool emitBytecode) {
// The split-input-file mode is a very specific mode that slices the file		// The split-input-file mode is a very specific mode that slices the file
// up into small pieces and checks each independently.		// up into small pieces and checks each independently.
		jpienaar: I think yes & no. piping these will be common and other uses seem like mistake, but I don't know how foolproof this check is on all platforms and opt tool is not a user tool.
		mehdi_amini: I think the difference with LLVM opt is that we're not having byte code as the default, hence we may not need to warn since the user has to opt-in to get there.
		rriddle: Makes sense to me, just dropped it. We can add a warning back in if enough people trip up on this (given bytecode generation is an explicit decision).
// We use an explicit threadpool to avoid creating and joining/destroying		// We use an explicit threadpool to avoid creating and joining/destroying
// threads for each of the split.		// threads for each of the split.
ThreadPool *threadPool = nullptr;		ThreadPool *threadPool = nullptr;

// Create a temporary context for the sake of checking if		// Create a temporary context for the sake of checking if
// --mlir-disable-threading was passed on the command line.		// --mlir-disable-threading was passed on the command line.
// We use the thread-pool this context is creating, and avoid		// We use the thread-pool this context is creating, and avoid
// creating any thread when disabled.		// creating any thread when disabled.
MLIRContext threadPoolCtx;		MLIRContext threadPoolCtx;
if (threadPoolCtx.isMultithreadingEnabled())		if (threadPoolCtx.isMultithreadingEnabled())
threadPool = &threadPoolCtx.getThreadPool();		threadPool = &threadPoolCtx.getThreadPool();

auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,		auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,
raw_ostream &os) {		raw_ostream &os) {
return processBuffer(os, std::move(chunkBuffer), verifyDiagnostics,		return processBuffer(os, std::move(chunkBuffer), verifyDiagnostics,
verifyPasses, allowUnregisteredDialects,		verifyPasses, allowUnregisteredDialects,
preloadDialectsInContext, passManagerSetupFn, registry,		preloadDialectsInContext, emitBytecode,
threadPool);		passManagerSetupFn, registry, threadPool);
};		};
return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,		return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,
splitInputFile, /*insertMarkerInOutput=*/true);		splitInputFile, /*insertMarkerInOutput=*/true);
}		}

LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,		LogicalResult mlir::MlirOptMain(raw_ostream &outputStream,
std::unique_ptr<MemoryBuffer> buffer,		std::unique_ptr<MemoryBuffer> buffer,
const PassPipelineCLParser &passPipeline,		const PassPipelineCLParser &passPipeline,
DialectRegistry &registry, bool splitInputFile,		DialectRegistry &registry, bool splitInputFile,
bool verifyDiagnostics, bool verifyPasses,		bool verifyDiagnostics, bool verifyPasses,
bool allowUnregisteredDialects,		bool allowUnregisteredDialects,
bool preloadDialectsInContext) {		bool preloadDialectsInContext,
		bool emitBytecode) {
auto passManagerSetupFn = [&](PassManager &pm) {		auto passManagerSetupFn = [&](PassManager &pm) {
auto errorHandler = [&](const Twine &msg) {		auto errorHandler = [&](const Twine &msg) {
emitError(UnknownLoc::get(pm.getContext())) << msg;		emitError(UnknownLoc::get(pm.getContext())) << msg;
return failure();		return failure();
};		};
return passPipeline.addToPipeline(pm, errorHandler);		return passPipeline.addToPipeline(pm, errorHandler);
};		};
return MlirOptMain(outputStream, std::move(buffer), passManagerSetupFn,		return MlirOptMain(outputStream, std::move(buffer), passManagerSetupFn,
registry, splitInputFile, verifyDiagnostics, verifyPasses,		registry, splitInputFile, verifyDiagnostics, verifyPasses,
allowUnregisteredDialects, preloadDialectsInContext);		allowUnregisteredDialects, preloadDialectsInContext,
		emitBytecode);
}		}

LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,		LogicalResult mlir::MlirOptMain(int argc, char **argv, llvm::StringRef toolName,
DialectRegistry &registry,		DialectRegistry &registry,
bool preloadDialectsInContext) {		bool preloadDialectsInContext) {
static cl::opt<std::string> inputFilename(		static cl::opt<std::string> inputFilename(
cl::Positional, cl::desc("<input file>"), cl::init("-"));		cl::Positional, cl::desc("<input file>"), cl::init("-"));

[21 lines not shown]
static cl::opt<bool> allowUnregisteredDialects(		static cl::opt<bool> allowUnregisteredDialects(
"allow-unregistered-dialect",		"allow-unregistered-dialect",
cl::desc("Allow operation with no registered dialects"), cl::init(false));		cl::desc("Allow operation with no registered dialects"), cl::init(false));

static cl::opt<bool> showDialects(		static cl::opt<bool> showDialects(
"show-dialects", cl::desc("Print the list of registered dialects"),		"show-dialects", cl::desc("Print the list of registered dialects"),
cl::init(false));		cl::init(false));

		static cl::opt<bool> emitBytecode(
		"emit-bytecode", cl::desc("Emit bytecode when generating output"),
		cl::init(false));

InitLLVM y(argc, argv);		InitLLVM y(argc, argv);

// Register any command line options.		// Register any command line options.
registerAsmPrinterCLOptions();		registerAsmPrinterCLOptions();
registerMLIRContextCLOptions();		registerMLIRContextCLOptions();
registerPassManagerCLOptions();		registerPassManagerCLOptions();
registerDefaultTimingManagerCLOptions();		registerDefaultTimingManagerCLOptions();
DebugCounter::registerCLOptions();		DebugCounter::registerCLOptions();
[28 lines not shown]
auto output = openOutputFile(outputFilename, &errorMessage);		auto output = openOutputFile(outputFilename, &errorMessage);
if (!output) {		if (!output) {
llvm::errs() << errorMessage << "\n";		llvm::errs() << errorMessage << "\n";
return failure();		return failure();
}		}

if (failed(MlirOptMain(output->os(), std::move(file), passPipeline, registry,		if (failed(MlirOptMain(output->os(), std::move(file), passPipeline, registry,
splitInputFile, verifyDiagnostics, verifyPasses,		splitInputFile, verifyDiagnostics, verifyPasses,
allowUnregisteredDialects, preloadDialectsInContext)))		allowUnregisteredDialects, preloadDialectsInContext,
		emitBytecode)))
return failure();		return failure();

// Keep the output file if the invocation of MlirOptMain was successful.		// Keep the output file if the invocation of MlirOptMain was successful.
output->keep();		output->keep();
return success();		return success();
}		}

mlir/test/Bytecode/general.mlir

This file was added.

				// RUN: mlir-opt -allow-unregistered-dialect -emit-bytecode %s | mlir-opt -allow-unregistered-dialect | FileCheck %s

				jpienaar: We probably end up running all non-split or -error cases through a round trip tests to check, followed by fuzzing. It would almost seem possible to enumerate all of these kind of the constructs above.
				mehdi_amini: Reminds me an old proposal of mine to add some flag to mlir-opt to automatically round-trip and diff, and enable this flag optionally to process the entire test-suite :) Seems like it would be useful here as well!
				jpienaar: Yes indeed, I was actually wondering if we had that already :)
				mehdi_amini: https://reviews.llvm.org/D90088
				rriddle: Do you plan on reviving that @mehdi_amini ?
				mehdi_amini: Yeah I should. I had memory that we couldn't reach consensus on it but I may be wrong.
				jpienaar: I think only question was on on by default or not (e.g., how much of testing tool mlir-opt really is, should it be used in directed testing only etc)
				// CHECK-LABEL: "bytecode.test1"
				// CHECK-NEXT: "bytecode.empty"() : () -> ()
				// CHECK-NEXT: "bytecode.attributes"() {attra = 10 : i64, attrb = #bytecode.attr} : () -> ()
				// CHECK-NEXT: test.graph_region {
				// CHECK-NEXT: "bytecode.operands"(%[[RESULTS:.*]]#0, %[[RESULTS]]#1, %[[RESULTS]]#2) : (i32, i64, i32) -> ()
				// CHECK-NEXT: %[[RESULTS]]:3 = "bytecode.results"() : () -> (i32, i64, i32)
				// CHECK-NEXT: }
				// CHECK-NEXT: "bytecode.branch"()[^[[BLOCK:.*]]] : () -> ()
				// CHECK-NEXT: ^[[BLOCK]](%[[ARG0:.*]]: i32, %[[ARG1:.*]]: !bytecode.int, %[[ARG2:.*]]: !pdl.operation):
				// CHECK-NEXT: "bytecode.regions"() ({
				// CHECK-NEXT: "bytecode.operands"(%[[ARG0]], %[[ARG1]], %[[ARG2]]) : (i32, !bytecode.int, !pdl.operation) -> ()
				// CHECK-NEXT: "bytecode.return"() : () -> ()
				// CHECK-NEXT: }) : () -> ()
				// CHECK-NEXT: "bytecode.return"() : () -> ()
				// CHECK-NEXT: }) : () -> ()

				"bytecode.test1"() ({
				"bytecode.empty"() : () -> ()
				"bytecode.attributes"() {attra = 10, attrb = #bytecode.attr} : () -> ()
				test.graph_region {
				"bytecode.operands"(%results#0, %results#1, %results#2) : (i32, i64, i32) -> ()
				%results:3 = "bytecode.results"() : () -> (i32, i64, i32)
				}
				"bytecode.branch"()[^secondBlock] : () -> ()

				^secondBlock(%arg1: i32, %arg2: !bytecode.int, %arg3: !pdl.operation):
				"bytecode.regions"() ({
				"bytecode.operands"(%arg1, %arg2, %arg3) : (i32, !bytecode.int, !pdl.operation) -> ()
				"bytecode.return"() : () -> ()
				}) : () -> ()
				"bytecode.return"() : () -> ()
				}) : () -> ()

mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-large_offset.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-attr_type_offset_section-trailing_data.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-attr_type_section-index.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-attr_type_section-trailing_data.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-dialect_section-dialect_string.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-dialect_section-opname_dialect.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-dialect_section-opname_string.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-dialect_section.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// the dialect section.

				//===--------------------------------------------------------------------===//
				// Dialect Name
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-dialect_section-dialect_string.mlirbc 2>&1 | FileCheck %s --check-prefix=DIALECT_STR
				// DIALECT_STR: invalid string index: 15

				//===--------------------------------------------------------------------===//
				// OpName
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-dialect_section-opname_dialect.mlirbc 2>&1 | FileCheck %s --check-prefix=OPNAME_DIALECT
				// OPNAME_DIALECT: invalid dialect index: 7

				// RUN: not mlir-opt %S/invalid-dialect_section-opname_string.mlirbc 2>&1 | FileCheck %s --check-prefix=OPNAME_STR
				// OPNAME_STR: invalid string index: 31
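The diagnostics checked above all come from index lookups into a table (strings, dialects) that the reader must bounds-check before use. A minimal sketch of such a lookup follows. `Lookup` and `resolveIndex` are hypothetical names, and the message format is copied from the CHECK lines rather than from BytecodeReader.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Outcome of a table lookup: either a value or a diagnostic message.
struct Lookup {
  bool ok;
  std::string value;
  std::string error;
};

// Hypothetical helper: `kind` names the table in the diagnostic.
inline Lookup resolveIndex(const std::vector<std::string> &table,
                           uint64_t index, const std::string &kind) {
  if (index >= table.size())
    return Lookup{false, "",
                  "invalid " + kind + " index: " + std::to_string(index)};
  return Lookup{true, table[index], ""};
}
```

Routing every table access through one helper keeps the diagnostics uniform, which is what lets a single FileCheck pattern per table cover the malformed inputs.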

mlir/test/Bytecode/invalid/invalid-ir_section-attr.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-forwardref.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-loc.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-operands.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-opname.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-results.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section-successors.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-ir_section.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// the IR section.

				//===--------------------------------------------------------------------===//
				// Operations
				//===--------------------------------------------------------------------===//

				//===--------------------------------------------------------------------===//
				// Name

				// RUN: not mlir-opt %S/invalid-ir_section-opname.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_NAME
				// OP_NAME: invalid operation name index: 14

				//===--------------------------------------------------------------------===//
				// Loc

				// RUN: not mlir-opt %S/invalid-ir_section-loc.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_LOC
				// OP_LOC: expected attribute of type: {{.*}}, but got: {attra = 10 : i64, attrb = #bytecode.attr}

				//===--------------------------------------------------------------------===//
				// Attr

				// RUN: not mlir-opt %S/invalid-ir_section-attr.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_ATTR
				// OP_ATTR: expected attribute of type: {{.*}}, but got: loc(unknown)

				//===--------------------------------------------------------------------===//
				// Operands

				// RUN: not mlir-opt %S/invalid-ir_section-operands.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_OPERANDS
				// OP_OPERANDS: invalid value index: 6

				// RUN: not mlir-opt %S/invalid-ir_section-forwardref.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=FORWARD_REF
				// FORWARD_REF: not all forward unresolved forward operand references

				//===--------------------------------------------------------------------===//
				// Results

				// RUN: not mlir-opt %S/invalid-ir_section-results.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_RESULTS
				// OP_RESULTS: value index range was outside of the expected range for the parent region, got [3, 6), but the maximum index was 2

				//===--------------------------------------------------------------------===//
				// Successors

				// RUN: not mlir-opt %S/invalid-ir_section-successors.mlirbc -allow-unregistered-dialect 2>&1 | FileCheck %s --check-prefix=OP_SUCCESSORS
				// OP_SUCCESSORS: invalid successor index: 3
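
The OP_RESULTS diagnostic above reports a half-open index range together with the largest index the parent region allows. A minimal sketch of such a check follows. `checkValueRange` and its parameters are hypothetical names chosen for illustration, not the reader's actual code; only the message text is taken from the test, and the sketch assumes `begin <= end`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not the reader's actual API): validates that the
// half-open value index range [begin, end) fits in a region whose largest
// valid value index is maxIndex. Returns an empty string on success.
// Assumes begin <= end.
std::string checkValueRange(uint64_t begin, uint64_t end, uint64_t maxIndex) {
  // The last index covered by [begin, end) is end - 1, so the range fits
  // exactly when end - 1 <= maxIndex. An empty range always fits.
  if (end == begin || end - 1 <= maxIndex)
    return "";
  return "value index range was outside of the expected range for the "
         "parent region, got [" + std::to_string(begin) + ", " +
         std::to_string(end) + "), but the maximum index was " +
         std::to_string(maxIndex);
}
```

Calling `checkValueRange(3, 6, 2)` yields the same text that the OP_RESULTS check line expects.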

mlir/test/Bytecode/invalid/invalid-string_section-count.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-string_section-large_string.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-string_section-no_string.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-string_section-trailing_data.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-string_section.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// the string section.

				//===--------------------------------------------------------------------===//
				// Count
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-string_section-count.mlirbc 2>&1 | FileCheck %s --check-prefix=COUNT
				// COUNT: attempting to parse a byte at the end of the bytecode

				//===--------------------------------------------------------------------===//
				// Invalid String
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-string_section-no_string.mlirbc 2>&1 | FileCheck %s --check-prefix=NO_STRING
				// NO_STRING: attempting to parse a byte at the end of the bytecode

				// RUN: not mlir-opt %S/invalid-string_section-large_string.mlirbc 2>&1 | FileCheck %s --check-prefix=LARGE_STRING
				// LARGE_STRING: string size exceeds the available data size

				//===--------------------------------------------------------------------===//
				// Trailing data
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-string_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
				// TRAILING_DATA: unexpected trailing data between the offsets for strings and their data
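
The LARGE_STRING case above exercises a bounds check: a declared string size must not exceed the string data that is still available. The following is a rough sketch of that idea only. `checkStringSizes` is a hypothetical helper, the layout of sizes and data is simplified to a list of sizes plus a byte count, and only the message text comes from the test.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not the reader's actual code: given the declared size
// of each string and the number of bytes of string data available, return
// the diagnostic a reader could emit, or an empty string if every string fits.
std::string checkStringSizes(const std::vector<uint64_t> &declaredSizes,
                             uint64_t availableBytes) {
  for (uint64_t size : declaredSizes) {
    // Compare against the remaining bytes instead of summing the sizes, so
    // a huge declared size cannot overflow a running total.
    if (size > availableBytes)
      return "string size exceeds the available data size";
    availableBytes -= size;
  }
  return "";
}
```

Subtracting from the remaining byte count keeps the check safe for untrusted input, where a declared size may be arbitrarily large.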

mlir/test/Bytecode/invalid/invalid-structure-producer.mlirbc

This file was added.

This file uses an unknown character encoding.

				MLïR␁ÿ
				No newline at end of file

mlir/test/Bytecode/invalid/invalid-structure-section-duplicate.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-structure-section-id-unknown.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-structure-section-length.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-structure-section-missing.mlirbc

This binary file was added.

mlir/test/Bytecode/invalid/invalid-structure-version.mlirbc

This file was added.

This file uses an unknown character encoding.

				MLïRÿ
				No newline at end of file
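
The two header-only inputs shown above (invalid-structure-producer.mlirbc and invalid-structure-version.mlirbc) pair with the VERSION and PRODUCER checks in invalid-structure.mlir. The sketch below illustrates how such a header could be validated. It rests on two assumptions that this listing does not confirm: that the final byte rendered as `ÿ` is 0xFF, and that a one-byte varint sets its low bit and stores the value in the upper seven bits (under which 0xFF decodes to 127, matching the VERSION message). `checkVersionAndProducer` and `kCurrentVersion` are hypothetical names.

```cpp
#include <cstdint>
#include <string>

// The VERSION test reports the reader's current version as 0.
constexpr uint64_t kCurrentVersion = 0;

// Hypothetical helper: inspects the bytes that follow the 4-byte magic number
// and returns a diagnostic modeled on the test expectations, or an empty
// string if the version and producer fields look well formed.
std::string checkVersionAndProducer(const std::string &afterMagic) {
  if (afterMagic.empty())
    return "attempting to parse a byte at the end of the bytecode";

  // Assumption: a one-byte varint has its low bit set and carries the value
  // in the upper seven bits. Longer encodings are out of scope here.
  const auto first = static_cast<uint8_t>(afterMagic[0]);
  if ((first & 1) == 0)
    return "multi-byte varint (not handled by this sketch)";
  const uint64_t version = first >> 1;
  if (version > kCurrentVersion)
    return "bytecode version " + std::to_string(version) +
           " is newer than the current version " +
           std::to_string(kCurrentVersion);

  // The producer string must be terminated by a null byte.
  if (afterMagic.find('\0', 1) == std::string::npos)
    return "malformed null-terminated string, no null character found";
  return "";
}
```

Feeding the sketch the single byte 0xFF produces the version diagnostic, and feeding it 0x01 followed by 0xFF produces the producer diagnostic, mirroring the two inputs above.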

mlir/test/Bytecode/invalid/invalid-structure.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// a bytecode file.

				//===--------------------------------------------------------------------===//
				// Version
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-structure-version.mlirbc 2>&1 | FileCheck %s --check-prefix=VERSION
				// VERSION: bytecode version 127 is newer than the current version 0

				//===--------------------------------------------------------------------===//
				// Producer
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-structure-producer.mlirbc 2>&1 | FileCheck %s --check-prefix=PRODUCER
				// PRODUCER: malformed null-terminated string, no null character found

				//===--------------------------------------------------------------------===//
				// Section
				//===--------------------------------------------------------------------===//

				//===--------------------------------------------------------------------===//
				// Missing

				// RUN: not mlir-opt %S/invalid-structure-section-missing.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_MISSING
				// SECTION_MISSING: missing data for top-level section: String (0)

				//===--------------------------------------------------------------------===//
				// ID

				// RUN: not mlir-opt %S/invalid-structure-section-id-unknown.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_ID_UNKNOWN
				// SECTION_ID_UNKNOWN: invalid section ID: 255

				//===--------------------------------------------------------------------===//
				// Length

				// RUN: not mlir-opt %S/invalid-structure-section-length.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_LENGTH
				// SECTION_LENGTH: attempting to parse a byte at the end of the bytecode

				//===--------------------------------------------------------------------===//
				// Duplicate

				// RUN: not mlir-opt %S/invalid-structure-section-duplicate.mlirbc 2>&1 | FileCheck %s --check-prefix=SECTION_DUPLICATE
				// SECTION_DUPLICATE: duplicate top-level section: String (0)
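
The Section cases above cover three bookkeeping failures: an unknown section ID, a section that appears twice, and a required section that never appears. A small sketch of that bookkeeping follows. The section table is an assumption: only `String (0)` is confirmed by the expected messages, the second entry is a placeholder, and `checkSectionIDs` is a hypothetical helper rather than the reader's real interface.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical section table. Only "String" with ID 0 is confirmed by the
// tests; "Placeholder" exists purely for illustration.
const std::map<uint8_t, std::string> kSections = {{0, "String"},
                                                  {1, "Placeholder"}};

// Returns a diagnostic modeled on the test expectations for the first
// problem found in the sequence of section IDs, or an empty string.
std::string checkSectionIDs(const std::vector<uint8_t> &seenIDs) {
  std::map<uint8_t, bool> present;
  for (uint8_t id : seenIDs) {
    auto it = kSections.find(id);
    if (it == kSections.end())
      return "invalid section ID: " + std::to_string(id);
    if (present[id])
      return "duplicate top-level section: " + it->second + " (" +
             std::to_string(id) + ")";
    present[id] = true;
  }
  // Every known section is treated as required in this sketch.
  for (const auto &entry : kSections)
    if (!present[entry.first])
      return "missing data for top-level section: " + entry.second + " (" +
             std::to_string(entry.first) + ")";
  return "";
}
```

Whether the real reader treats every section as required, or checks in this order, is not something the tests here establish.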

mlir/test/Bytecode/invalid/invalid_attr_type_offset_section.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// the attribute/type offset section.

				//===--------------------------------------------------------------------===//
				// Offset
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-attr_type_offset_section-large_offset.mlirbc 2>&1 | FileCheck %s --check-prefix=LARGE_OFFSET
				// LARGE_OFFSET: Attribute or Type entry offset points past the end of section

				//===--------------------------------------------------------------------===//
				// Trailing Data
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-attr_type_offset_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
				// TRAILING_DATA: unexpected trailing data in the Attribute/Type offset section

mlir/test/Bytecode/invalid/invalid_attr_type_section.mlir

This file was added.

				// This file contains various failure test cases related to the structure of
				// the attribute/type section.

				//===--------------------------------------------------------------------===//
				// Index
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-attr_type_section-index.mlirbc 2>&1 | FileCheck %s --check-prefix=INDEX
				// INDEX: invalid Attribute index: 3

				//===--------------------------------------------------------------------===//
				// Trailing Data
				//===--------------------------------------------------------------------===//

				// RUN: not mlir-opt %S/invalid-attr_type_section-trailing_data.mlirbc 2>&1 | FileCheck %s --check-prefix=TRAILING_DATA
				// TRAILING_DATA: trailing characters found after Attribute assembly format: trailing

Diff 454397