IsolatedRegions are emitted in sections so that the reader can skip over them. A new class is exposed to manage the state and allow readers to load these IsolatedRegions on demand.
Nice! This is pretty much what I had in mind originally. Getting the bytecode format right is the major part of this; we can tune the policy/API as we use it.
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
---|---|---|
44–48 | Why return a functor instead of driving this through BytecodeReader? The returned functor references the BytecodeReader, so what are the benefits of returning a functor? The only situation I can think of is that it makes things slightly simpler when reading multiple bytecode files (which I run into a lot for my needs), removing the need to remember which one holds each op. | |
47 | Maybe getLazyOpMaterializer? Not blocking at all, but might be nice to keep consistent language for this. | |
mlir/lib/Bytecode/Reader/BytecodeReader.cpp | ||
1124 | Can you document this? Can be a simple callout to the main method (just feels weird having methods undocumented). | |
1130 | Can you add messages to these asserts? | |
1321 | Please document this. | |
1740 | Please add messages to the asserts. | |
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp | ||
742 | Waiting for Jacques' patch? |
Nice! Thanks
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | Is region_section defined here? (Seems it should be ir_section from below) | |
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp | ||
742 | Would be nice :-) | |
mlir/test/lib/IR/TestLazyLoading.cpp | ||
52 | So if I wanted to extract only a subgraph, one would materialize the op, recursively materialize everything in it while keeping track of referenced symbols, and then recursively materialize all of those as well? | |
60 | Nit: error convention is sentence fragments (start lower case and no trailing punctuation) |
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | Did you consider making it so that not all isolated-from-above ops are encoded the same way? Perhaps stealing a bit from the region encoding to indicate the encoding. That way one could mark only what is needed, so the writer could limit the depth, and it's also smaller in the old encoding. That does complicate the writer though (the writer config would need a set or some such to track what to encode). |
Thanks a lot!
mlir/lib/Bytecode/Reader/BytecodeReader.cpp | ||
---|---|---|
1654–1658 | This line seems unnecessary now? |
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | Sections aren't defined, so I'm not sure how to express it? I could make it: region_section { region }, but it is still implicit that it is a section; for example in op { name: varint, encodingMask: byte, ... } there is no section defined either. As for the encoding, I like the uniformity of the format and the capabilities: it's not clear to me that we need to push complexity onto the emitter instead of just leaving the flexibility to the consumer. Do you have something in mind? | |
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
44–48 | I was actually trying to have a single map lookup so that the API can directly do "check if this op can be materialized" and "materialize this op" without hitting the map twice! This may not be worth it though: the work to do is heavy enough that a map lookup is negligible. I'm not super happy with the API right now though, as it's not easy to know which operations need materialization or not. Another thing is how "unsafe" this all is from the client side; I'd like to think a bit more about the API to maybe provide some more guarantees. | |
mlir/lib/Bytecode/Writer/BytecodeWriter.cpp | ||
742 | Yes, I was waiting for Jacques, I'll rebase and fix the TODO! | |
mlir/test/lib/IR/TestLazyLoading.cpp | ||
52 | Yes: you would materialize the module, build a symbol table, and then run a worklist algorithm: add your entry point to the worklist and, while it isn't empty, pop the current function, materialize it and collect the symbols it references, using the symbol table to push the referenced functions onto the worklist. A sketch of that loop is below. |
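To make that concrete, here is a minimal sketch of the loop described above (not the actual code in mlir/test/lib/IR/TestLazyLoading.cpp). The materialize callable is a stand-in for whatever the lazy-loading API exposes; SymbolTable and SymbolTable::getSymbolUses are the real MLIR utilities.

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/SymbolTable.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Materialize the entry point and everything reachable from it through symbol
// references, using an explicit worklist instead of recursion.
LogicalResult materializeReachable(
    ModuleOp module, Operation *entryPoint,
    function_ref<LogicalResult(Operation *)> materialize) {
  SymbolTable symbolTable(module);
  SmallVector<Operation *> worklist{entryPoint};
  DenseSet<Operation *> visited;
  while (!worklist.empty()) {
    Operation *op = worklist.pop_back_val();
    if (!visited.insert(op).second)
      continue;
    // Load the body of this op (a no-op if it wasn't lazily skipped).
    if (failed(materialize(op)))
      return failure();
    // Collect the symbols referenced from the now-materialized body and queue
    // the corresponding ops.
    if (auto uses = SymbolTable::getSymbolUses(op)) {
      for (const SymbolTable::SymbolUse &use : *uses)
        if (Operation *target =
                symbolTable.lookup(use.getSymbolRef().getRootReference()))
          worklist.push_back(target);
    }
  }
  return success();
}
```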
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | Good point, we need to tag those a bit differently. Perhaps section<region>, and we could have op : Section { ... }. (Can be a follow-up; as you mention it's not consistent and we can do a NOP change to make it so.) Re encoding: I agree, uniform is better. I was just thinking it's probably rare that one would be lazily loading more than a handful of the regions, yet we pay a cost on each, at least size-wise. The uniformity does avoid additional cost during serialization/deserialization. I don't have anything quantitative though to back the tradeoff either way. |
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | I've been wondering about the performance, by the way; right now the cost is (if I don't miss anything):
Then in terms of runtime loading I'd be interested to benchmark "always lazy loading" vs "direct loading"; it should be very close. |
mlir/docs/BytecodeFormat.md | ||
---|---|---|
345 | 1 byte for the section id and 1 varint (that is, if there is no alignment or padding). Not high, but it means ~2.4 additional bytes per function. So if you crammed multiple models into one file, it could add up (especially if you use encodings where everything is a function rather than a region ... I've definitely been surprised by what Python frontends end up producing). Probably still only in the ~100-bytes-larger kind of category. |
- Rebase on Jacques' back-deployment support and test it
- Update the exposed APIs to avoid having to walk the IR to find ops to lazy-load.
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
---|---|---|
50–54 | When will these be used outside of just testing? I can't think of a situation where this would actually get used. |
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
---|---|---|
50–54 | You could have a list of functions to load and filter based on this? |
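A rough sketch of that "filter which functions to load" use case. LazyLoader, getLazyOps, and materialize are placeholders for the API under review, not its real names or signatures; SymbolTable::getSymbolName is the real MLIR helper for reading an op's symbol name.

```cpp
#include "mlir/IR/SymbolTable.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringSet.h"

// Placeholder for the reviewed API; the real class lives in
// mlir/include/mlir/Bytecode/BytecodeReader.h and may look different.
struct LazyLoader {
  llvm::SmallVector<mlir::Operation *> getLazyOps();
  mlir::LogicalResult materialize(mlir::Operation *op);
};

// Materialize only the lazily-loaded ops whose symbol name is in
// `functionsToLoad`; everything else stays unloaded.
void loadSelectedFunctions(LazyLoader &loader,
                           const llvm::StringSet<> &functionsToLoad) {
  for (mlir::Operation *op : loader.getLazyOps()) {
    mlir::StringAttr name = mlir::SymbolTable::getSymbolName(op);
    if (name && functionsToLoad.contains(name.getValue()))
      (void)loader.materialize(op);
  }
}
```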
Nice - I recall we talked about reachability too, i.e. materializing whatever is reachable from other materialized ops, and doing that in finalize if needed. But that is probably a higher-level function, and a more expensive one, that could be built on top of this instead.
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
---|---|---|
44 | This only needs to be materialized given what is already materialized, right? OK, that sentence is rough; I was mostly trying to think about the recursive part, and the user shouldn't expect that the parser would descend into an unmaterialized op | |
48 | Why couldn't this have the same API as finalize and just return whether to materialize or not? | |
51 | Why would a user care about this? | |
59 | make sure to use | |
62 | So upon the callback for an op to materialize, the callback could then call materialize? |
Update the callbacks to return a bool to provide control to the client to materialize as we go
mlir/include/mlir/Bytecode/BytecodeReader.h | ||
---|---|---|
44 | What about: "The lazyOps callback is invoked for every op that contains regions that can be lazily loaded."? I would also change the callback to return a bool to immediately materialize, as you suggested below. | |
51 | I don't know, but it's cheap to expose :) | |
62 | It technically can; we shouldn't, because that would recurse, so it's better to use a worklist algorithm, like I did in mlir/test/lib/IR/TestLazyLoading.cpp (around line 70) |
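For illustration, here is a sketch of driving materialization with a worklist instead of recursing from the callback. LazyReader and its materialize signature are placeholders for the API being discussed, not the real entry points in BytecodeReader.h or the test pass.

```cpp
#include "mlir/IR/Operation.h"
#include "mlir/Support/LogicalResult.h"
#include "llvm/ADT/SmallVector.h"

// Placeholder for the reviewed reader API.
struct LazyReader {
  // Materialize `op`; `onNestedLazyOp` is invoked for lazy ops discovered
  // inside the newly materialized regions.
  mlir::LogicalResult
  materialize(mlir::Operation *op,
              llvm::function_ref<void(mlir::Operation *)> onNestedLazyOp);
};

mlir::LogicalResult
materializeEverything(LazyReader &reader,
                      llvm::SmallVector<mlir::Operation *> worklist) {
  while (!worklist.empty()) {
    mlir::Operation *op = worklist.pop_back_val();
    // Nested lazy ops go back on the worklist rather than being materialized
    // recursively from inside the callback.
    if (failed(reader.materialize(
            op, [&](mlir::Operation *nested) { worklist.push_back(nested); })))
      return mlir::failure();
  }
  return mlir::success();
}
```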
The new test case Bytecode/bytecode-lazy-loading.mlir is failing on s390x due to what seems to be endian issues. (It looks likely that this has just exposed existing endian problems in the bytecode reader.)
Unfortunately the s390x builder was down for a couple of weeks due to maintenance, but now that it's back up, the failing test makes it red. Note that since this patch, three more patches have introduced additional failing test cases on s390x:
https://reviews.llvm.org/D149755 (Bytecode/uselist_orders.mlir)
https://reviews.llvm.org/D151386 (Bytecode/unregistered_dialect.mlir)
https://reviews.llvm.org/D151408 (Bytecode/./MLIRBytecodeTests/Bytecode/MultiModuleWithResource)
all for what appear to be similar (or even the same) endian problems.
See e.g. https://lab.llvm.org/buildbot/#/builders/199/builds/21088 for more details. Looking at the BytecodeReader.cpp file, I can see a number of places that obviously assume little-endian host byte order, e.g. in parseVarInt or parseMultiByteVarInt. Any suggestions on how to handle MLIR bytecode on big-endian hosts?
MLIR bytecode doesn't support big endian hosts at all, so we'll need to just disable tests in those cases, e.g. https://github.com/llvm/llvm-project/blob/52ca6ad755b0cc2aa603cfb3124bf58c04a47005/mlir/test/Bytecode/resources.mlir#L4
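For reference, disabling is just a lit directive at the top of the test file; the exact feature expression used in resources.mlir may differ slightly from this sketch:

```mlir
// Skip this test on big-endian s390x hosts, where MLIR bytecode is not supported.
// UNSUPPORTED: target=s390x-{{.*}}
```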
We've discussed it before, but most of the engineers working on the bytecode (myself included) don't have easy big-endian platform access.
I assume Qemu can emulate big-endian systems?
A quick Google search yields https://aircrack-ng.blogspot.com/2018/10/to-be-or-not-to-be-using-qemu-to-run.html ; seems like Debian on MIPS is a possible option.
Ah, right, I had forgotten about that. I've had a closer look now to identify the root causes of these failures, and I can see only two actual problems:
- Encoding/decoding of multi-byte integers in the bytecode reader/writer. This is trivial to fix (see the sketch after this list), and actually in itself resolves nearly all bytecode-related endian problems, including these new failures and existing test cases that were already marked unsupported.
- Handling of "dense resource element attributes". While "regular" attributes have a tryGetValues accessor that returns a custom iterator that performs endian conversion on access, the "resource" attributes use a tryGetAsArrayRef accessor that returns a standard ArrayRef, which does not do any conversion. It seems it should be possible to use a similar custom iterator approach here as well, which would fix the remaining issues.
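To illustrate the first item: the idea is to assemble multi-byte values explicitly in little-endian order (e.g. via llvm/Support/Endian.h) instead of copying raw bytes into a host-order integer. This is only a sketch of the idea, not the actual patch posted as D153567:

```cpp
#include "llvm/Support/Endian.h"
#include <cstdint>

// Read an 8-byte value that the bytecode stores in little-endian order.
// A plain memcpy into a uint64_t is only correct on little-endian hosts;
// read64le performs the byte swap on big-endian hosts such as s390x.
static uint64_t readBytecodeWord(const uint8_t *bytes) {
  return llvm::support::endian::read64le(bytes);
}
```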
I'd be happy to come up with patches for these issues, if you'd be willing to consider accepting such changes. Also, I'd be happy to investigate endian problems that might come up in the future. In the alternative (or in addition), we can also make access to an s390x machine available if you want to look into endian issues yourself.
I've now posted a patch for the bytecode reader/writer problem here: https://reviews.llvm.org/D153567
The resource decoding problem proved more complicated than I initially thought - I think I'll open an issue to discuss in more detail.
Issue now open as https://github.com/llvm/llvm-project/issues/63469 - any comments welcome!