This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Allow for attaching external resources to .mlir files
ClosedPublic

Authored by rriddle on May 25 2022, 9:32 PM.

Details

Summary

This commit enables support for providing and processing external
resources within MLIR assembly formats. This is a mechanism with which
dialects, and external clients, may attach additional information when
printing IR without that information being encoded in the IR itself.
External resources are not uniqued within the MLIR context, are not
attached directly to any operation, and are solely intended to live and be
processed outside of the immediate IR. There are many potential uses of this
functionality, for example MLIR's pass crash reproducer could utilize this to
attach the pass resource executing when a crash occurs. Other types of
uses may be embedding large amounts of binary data, such as weights in ML
applications, that shouldn't be copied directly into the MLIR context, but
need to be kept adjacent to the IR.

External resources are encoded using a key-value pair nested within a
dictionary anchored by name either on a dialect, or an externally registered
entity. The key is an identifier used to disambiguate the data. The value
may be stored in various limited forms, but general encodings use a string
(human readable) or blob format (binary). Within the textual format, an
example may be of the form:

mlir
{-#
  // The `dialect_resources` section within the file-level metadata
  // dictionary is used to contain any dialect resource entries.
  dialect_resources: {
    // Here is a dictionary anchored on "foo_dialect", which is a dialect
    // namespace.
    foo_dialect: {
      // `some_dialect_resource` is a key to be interpreted by the dialect,
      // and used to initialize/configure/etc.
      some_dialect_resource: "Some important resource value"
    }
  },
  // The `external_resources` section within the file-level metadata
  // dictionary is used to contain any non-dialect resource entries.
  external_resources: {
    // Here is a dictionary anchored on "mlir_reproducer", which is an
    // external entity representing MLIR's crash reproducer functionality.
    mlir_reproducer: {
      // `pipeline` is an entry that holds a crash reproducer pipeline
      // resource.
      pipeline: "func.func(canonicalize,cse)"
    }
  }

Diff Detail

Event Timeline

rriddle created this revision.May 25 2022, 9:32 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 25 2022, 9:32 PM
rriddle requested review of this revision.May 25 2022, 9:32 PM
rriddle planned changes to this revision.May 25 2022, 9:36 PM

What about using directly YAML for this section?

Doing it ourselves would keep MLIR light. And the external config API would just be built on top of YAML.

Mogball added inline comments.May 26 2022, 10:35 AM
mlir/include/mlir/IR/AsmState.h
57

Just curious, but what else could be stored here other than "external config"?

69

Is there any guarantee that dialect names won't collide with any random config key?

What about using directly YAML for this section?

I considered, but ultimately it felt like more complexity than the benefits it brought (especially for this draft). Having something
just for us felt much simpler and more expressive (we could use Attributes/Types for config values if we wanted). I could likely be
convinced otherwise though, using something like YAML would conceptually make supporting more complex config values
easier (nested structs and stuff) if we wanted to go that approach. There are tradeoffs though (e.g. error reporting, attribute/type values,
still need a generic interface to enable seamless bitcode integration, etc.).

mlir/include/mlir/IR/AsmState.h
57

The idea for the top-level dictionary was to provide a convenient place to put things in the future. For example, this would be a natural place to put the dialect_versions metadata in Mehdi's WIP versioning patch: https://reviews.llvm.org/D117761

69

Non-dialect config keys are always wrapped with <>, so there won't be collision between the two.

rriddle requested review of this revision.May 30 2022, 10:49 PM
rriddle updated this revision to Diff 433016.
rriddle retitled this revision from WIP: [mlir] Allow for attaching external configurations to .mlir files to [mlir] Allow for attaching external configurations to .mlir files.
rriddle edited the summary of this revision. (Show Details)

What about using directly YAML for this section?

I considered, but ultimately it felt like more complexity than the benefits it brought (especially for this draft). Having something
just for us felt much simpler and more expressive (we could use Attributes/Types for config values if we wanted). I could likely be
convinced otherwise though, using something like YAML would conceptually make supporting more complex config values
easier (nested structs and stuff) if we wanted to go that approach. There are tradeoffs though (e.g. error reporting, attribute/type values,
still need a generic interface to enable seamless bitcode integration, etc.).

If the AsmConfig* classes were a bit more POD/model-ish, then it would be pretty easy to wrap YAML around that, but the way the implementation is laid out would be a bit of a weird fit (and I think I see why you have it the way you do). In general, I agree with the tradeoffs: it will feel and act like a complete side-thing. From my point of view, how this grows towards bitcode is more salient to me, and binding the text format close to YAML will just make it weird (all my subjective opinion based on trying to do such things before).

mehdi_amini added inline comments.May 31 2022, 12:03 PM
mlir/test/IR/file-metadata-config.mlir
17

I do have some concerns with this example (to the extent that this will be an example in-tree).

There is a problem in that we can't load in the same context some piece of IR if there was already another piece of IR using "blob1" loaded (even if the IR is deleted since).

rriddle added inline comments.May 31 2022, 12:33 PM
mlir/test/IR/file-metadata-config.mlir
17

Oh yeah, thanks for pointing this out. I originally wrote this quickly as a simple example, but forgot to update it. Do you have any suggestions on what we want to do here to set a good example? Compare the data? Just compare hashes?

Have you considered using base64 instead of hex for binary data?

https://reviews.llvm.org/D126254 will bring base64 to LLVM and LLDB.

Have you considered using base64 instead of hex for binary data?

I would argue against base64 personally - I don't know if the benefits outweigh the issue that it inflates all data by 33%. It does beg the question though, of 'should we allow folks to choose some other encoding for data in this config struct?' If this is the only use case - that some folks want b64 data, then maybe it's better to just wait for the bitcode format and b64 encode the whole file if you want it to be Javascript-safe?

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.) IMO, base64 is:

  • smaller than hex
  • somewhat ubiquitous
  • compatible with the existing parser (I think?)
  • a reasonable default for situations where the size overhead is an acceptable tradeoff for having the MLIR file be text-based and self-contained.

For text-based MLIR file situations where the size increase is an issue, I could imagine the config data containing a string with the name of an external file, and an application-defined way to resolve the full path. That could be out-of-tree and/or dialect specific.

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.)

Good point! I was thinking of just the raw binary data, forgot that hex is silly that way :)

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.) IMO, base64 is:

  • smaller than hex
  • somewhat ubiquitous
  • compatible with the existing parser (I think?)
  • a reasonable default for situations where the size overhead is an acceptable tradeoff for having the MLIR file be text-based and self-contained.

For text-based MLIR file situations where the size increase is an issue, I could imagine the config data containing a string with the name of an external file, and an application-defined way to resolve the full path. That could be out-of-tree and/or dialect specific.

Yeah, I'd gladly switch to base64. We've only been using hex because it has prior art/use in-tree.

rriddle updated this revision to Diff 434252.Jun 4 2022, 12:52 AM
rriddle retitled this revision from [mlir] Allow for attaching external configurations to .mlir files to [mlir] Allow for attaching external resources to .mlir files.
rriddle edited the summary of this revision. (Show Details)
rriddle updated this revision to Diff 436069.Jun 10 2022, 3:59 PM
rriddle edited the summary of this revision. (Show Details)
stellaraccident accepted this revision.Jun 27 2022, 3:55 PM

A few very minor nits, but otherwise lgtm.

mlir/include/mlir/IR/AsmState.h
45

s/used/use/

97

sp. dataAlignment

mlir/lib/Parser/Parser.cpp
307

Missing closing ascii art?

This revision is now accepted and ready to land.Jun 27 2022, 3:55 PM
mehdi_amini accepted this revision.Jun 28 2022, 5:39 AM
mehdi_amini added inline comments.
mlir/include/mlir/IR/OpImplementation.h
32

Why didn't you keep an instance of the dialect here? Seems like you have to pass it around in a few places.

62

I find the naming and setup confusing with an inner class "Base" which inherits from its enclosing class? What are you trying to convey with this?
Is this some common pattern?

mlir/lib/Parser/Parser.cpp
317

I'm not sure what "otherwise" refer to here? This is the first comment in this function

mlir/test/lib/Dialect/Test/TestDialect.h
52

Where does deallocation happens?

rriddle marked 8 inline comments as done.Jun 28 2022, 12:19 PM
rriddle added inline comments.
mlir/include/mlir/IR/OpImplementation.h
32

No good reason, added a dialect field (which is likely cleaner/more general anyways).

62

We do this in a few other places as well. The intent is to provide a CRTP Base/Impl class that defines away all of the boiler plate for the given resource type. Users don't really interact with it, it's solely for simplifying the definition of concrete handle types. Do you have a suggestion on better naming here?

mlir/test/lib/Dialect/Test/TestDialect.h
52

The base AsmResourceBlob holds a deleter function which deallocates the data within the destructor.

mehdi_amini added inline comments.Jun 28 2022, 12:57 PM
mlir/include/mlir/IR/OpImplementation.h
62

Why is it an inner class?

mlir/test/lib/Dialect/Test/TestDialect.h
52

I was missing that you were inheriting the base constructor and that it is how we get the deleter...

rriddle marked 2 inline comments as done.Jun 28 2022, 12:58 PM
rriddle added inline comments.
mlir/include/mlir/IR/OpImplementation.h
62

Just to make it more contained (i.e. not pollute the mlir namespace), I could just pull it out if you prefer.

rriddle updated this revision to Diff 440771.Jun 28 2022, 2:30 PM
This revision was landed with ongoing or failed builds.Jun 29 2022, 12:17 PM
This revision was automatically updated to reflect the committed changes.

This commit has broken the mlir builder on s390x-ibm-linux: https://lab.llvm.org/buildbot/#/builders/199/builds/6417

The immediate error is this failing assertion:

mlir-opt: /home/uweigand/llvm/llvm-head/mlir/test/lib/Dialect/Test/TestDialect.cpp:191: {anonymous}::TestOpAsmInterface::parseResource(mlir::AsmParsedResourceEntry&) const::<lambda(unsigned int, unsigned int)>: Assertion `align == alignof(uint64_t) && "unexpected data alignment"' failed.

which occurs because the value of align is 0x08000000 instead of the expected 8. The alignment was taken from the binary blob, encoded in little-endian format, but decoded as uint32_t on the host platform, which in our case is big-endian.

Now this is easily fixed by using proper endian conversion when accessing the alignment, which fixes the assertion, but then the test cases still fail because they expect little-endian data in the generated blobs, while they will actually contain big-endian data when generated on a big-endian machine.

What is the intended design point here? Are those binary blobs intended to be portable across machine architectures? In this case, I think we'd have to define a particular binary format that needs to be used (e.g. the blobs are per definition always little-endian), and then the code would have to be updated to always respect that format. Or, instead, are those blobs intended to be machine-specific? In this case, at the least the tests need to be made machine specific as well.

Note that in general, there may of course be other issues apart from endianness when attempting to re-use binary blobs across machines, e.g. pointer and/or word size, or ABI struct alignment rules. To define an actually portable format, all of that would have to be addressed.

rriddle added a comment.EditedJul 8 2022, 9:52 AM

This commit has broken the mlir builder on s390x-ibm-linux: https://lab.llvm.org/buildbot/#/builders/199/builds/6417

The immediate error is this failing assertion:

mlir-opt: /home/uweigand/llvm/llvm-head/mlir/test/lib/Dialect/Test/TestDialect.cpp:191: {anonymous}::TestOpAsmInterface::parseResource(mlir::AsmParsedResourceEntry&) const::<lambda(unsigned int, unsigned int)>: Assertion `align == alignof(uint64_t) && "unexpected data alignment"' failed.

which occurs because the value of align is 0x08000000 instead of the expected 8. The alignment was taken from the binary blob, encoded in little-endian format, but decoded as uint32_t on the host platform, which in our case is big-endian.

Now this is easily fixed by using proper endian conversion when accessing the alignment, which fixes the assertion, but then the test cases still fail because they expect little-endian data in the generated blobs, while they will actually contain big-endian data when generated on a big-endian machine.

What is the intended design point here? Are those binary blobs intended to be portable across machine architectures? In this case, I think we'd have to define a particular binary format that needs to be used (e.g. the blobs are per definition always little-endian), and then the code would have to be updated to always respect that format. Or, instead, are those blobs intended to be machine-specific? In this case, at the least the tests need to be made machine specific as well.

Note that in general, there may of course be other issues apart from endianness when attempting to re-use binary blobs across machines, e.g. pointer and/or word size, or ABI struct alignment rules. To define an actually portable format, all of that would have to be addressed.

Thanks for bringing this up. I think for the sake of the textual format, we should just always use little endian (this matches what DenseElementsAttr stores as in hex form as well). For the binary format of MLIR, depending on what we want to do in the long term, we could craft something more intricate so that blobs are more optimal on non-little endian (so we wouldn't need to do conversions). I'm also happy with extending the API/serialized data for blobs here to cover any information necessary for endian conversions/etc.

Can we XFAIL the test on this platform in the meantime?

(bring the bot back to green should be a priority)

We have to fix the alignment tag for real, otherwise there will be failing assertions all over the place. With that fix in, there's still a failing test case - that one could indeed be XFAILed for now.

I've submitted a patch to implement this here: https://reviews.llvm.org/D129483