Page MenuHomePhabricator

[mlir] Allow for attaching external resources to .mlir files
Needs ReviewPublic

Authored by rriddle on May 25 2022, 9:32 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This commit enables support for providing and processing external
resources within MLIR assembly formats. This is a mechanism with which
dialects, and external clients, may attach additional information when
printing IR without that information being encoded in the IR itself.
External resources are not uniqued within the MLIR context, are not
attached directly to any operation, and are solely intended to live and be
processed outside of the immediate IR. There are many potential uses of this
functionality, for example MLIR's pass crash reproducer could utilize this to
attach the pass resource executing when a crash occurs. Other types of
uses may be embedding large amounts of binary data, such as weights in ML
applications, that shouldn't be copied directly into the MLIR context, but
need to be kept adjacent to the IR.

External resources are encoded using a key-value pair nested within a
dictionary anchored by name either on a dialect, or an externally registered
entity. The key is an identifier used to disambiguate the data. The value
may be stored in various limited forms, but general encodings use a string
(human readable) or blob format (binary). Within the textual format, an
example may be of the form:

mlir
{-#
  // The `dialect_resources` section within the file-level metadata
  // dictionary is used to contain any dialect resource entries.
  dialect_resources: {
    // Here is a dictionary anchored on "foo_dialect", which is a dialect
    // namespace.
    foo_dialect: {
      // `some_dialect_resource` is a key to be interpreted by the dialect,
      // and used to initialize/configure/etc.
      some_dialect_resource: "Some important resource value"
    }
  },
  // The `external_resources` section within the file-level metadata
  // dictionary is used to contain any non-dialect resource entries.
  external_resources: {
    // Here is a dictionary anchored on "mlir_reproducer", which is an
    // external entity representing MLIR's crash reproducer functionality.
    mlir_reproducer: {
      // `pipeline` is an entry that holds a crash reproducer pipeline
      // resource.
      pipeline: "func.func(canonicalize,cse)"
    }
  }

Diff Detail

Event Timeline

rriddle created this revision.May 25 2022, 9:32 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 25 2022, 9:32 PM
rriddle requested review of this revision.May 25 2022, 9:32 PM
rriddle planned changes to this revision.May 25 2022, 9:36 PM

What about using directly YAML for this section?

Doing it ourselves would keep MLIR light. And the external config API would just be built on top of YAML.

Mogball added inline comments.May 26 2022, 10:35 AM
mlir/include/mlir/IR/AsmState.h
57

Just curious, but what else could be stored here other than "external config"?

69

Is there any guarantee that dialect names won't collide with any random config key?

What about using directly YAML for this section?

I considered, but ultimately it felt like more complexity than the benefits it brought (especially for this draft). Having something
just for us felt much simpler and more expressive (we could use Attributes/Types for config values if we wanted). I could likely be
convinced otherwise though, using something like YAML would conceptually make supporting more complex config values
easier (nested structs and stuff) if we wanted to go that approach. There are tradeoffs though (e.g. error reporting, attribute/type values,
still need a generic interface to enable seamless bitcode integration, etc.).

mlir/include/mlir/IR/AsmState.h
57

The idea for the top-level dictionary was to provide a convenient place to put things in the future. For example, this would be a natural place to put the dialect_versions metadata in Mehdi's WIP versioning patch: https://reviews.llvm.org/D117761

69

Non-dialect config keys are always wrapped with <>, so there won't be collision between the two.

rriddle requested review of this revision.Mon, May 30, 10:49 PM
rriddle updated this revision to Diff 433016.
rriddle retitled this revision from WIP: [mlir] Allow for attaching external configurations to .mlir files to [mlir] Allow for attaching external configurations to .mlir files.
rriddle edited the summary of this revision. (Show Details)

What about using directly YAML for this section?

I considered, but ultimately it felt like more complexity than the benefits it brought (especially for this draft). Having something
just for us felt much simpler and more expressive (we could use Attributes/Types for config values if we wanted). I could likely be
convinced otherwise though, using something like YAML would conceptually make supporting more complex config values
easier (nested structs and stuff) if we wanted to go that approach. There are tradeoffs though (e.g. error reporting, attribute/type values,
still need a generic interface to enable seamless bitcode integration, etc.).

If the AsmConfig* classes were a bit more POD/model-ish, then it would be pretty easy to wrap YAML around that, but the way the implementation is laid out would be a bit of a weird fit (and I think I see why you have it the way you do). In general, I agree with the tradeoffs: it will feel and act like a complete side-thing. From my point of view, how this grows towards bitcode is more salient to me, and binding the text format close to YAML will just make it weird (all my subjective opinion based on trying to do such things before).

mehdi_amini added inline comments.Tue, May 31, 12:03 PM
mlir/test/IR/file-metadata-config.mlir
17 ↗(On Diff #433016)

I do have some concerns with this example (to the extent that this will be an example in-tree).

There is a problem in that we can't load in the same context some piece of IR if there was already another piece of IR using "blob1" loaded (even if the IR is deleted since).

rriddle added inline comments.Tue, May 31, 12:33 PM
mlir/test/IR/file-metadata-config.mlir
17 ↗(On Diff #433016)

Oh yeah, thanks for pointing this out. I originally wrote this quickly as a simple example, but forgot to update it. Do you have any suggestions on what we want to do here to set a good example? Compare the data? Just compare hashes?

Have you considered using base64 instead of hex for binary data?

https://reviews.llvm.org/D126254 will bring base64 to LLVM and LLDB.

Have you considered using base64 instead of hex for binary data?

I would argue against base64 personally - I don't know if the benefits outweigh the issue that it inflates all data by 33%. It does beg the question though, of 'should we allow folks to choose some other encoding for data in this config struct?' If this is the only use case - that some folks want b64 data, then maybe it's better to just wait for the bitcode format and b64 encode the whole file if you want it to be Javascript-safe?

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.) IMO, base64 is:

  • smaller than hex
  • somewhat ubiquitous
  • compatible with the existing parser (I think?)
  • a reasonable default for situations where the size overhead is an acceptable tradeoff for having the MLIR file be text-based and self-contained.

For text-based MLIR file situations where the size increase is an issue, I could imagine the config data containing a string with the name of an external file, and an application-defined way to resolve the full path. That could be out-of-tree and/or dialect specific.

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.)

Good point! I was thinking of just the raw binary data, forgot that hex is silly that way :)

The current implementation uses hexadecimal, which requires 16 bits (2 bytes) to encode 8 bits of binary data. base64 would be an improvement over the current implementation that uses hexadecimal, requiring 8 bits to encode 6 bits of binary data. (The other constraint here is that encoding needs to be handled by the parser, which in the original example, means it can't have a quotation mark.) IMO, base64 is:

  • smaller than hex
  • somewhat ubiquitous
  • compatible with the existing parser (I think?)
  • a reasonable default for situations where the size overhead is an acceptable tradeoff for having the MLIR file be text-based and self-contained.

For text-based MLIR file situations where the size increase is an issue, I could imagine the config data containing a string with the name of an external file, and an application-defined way to resolve the full path. That could be out-of-tree and/or dialect specific.

Yeah, I'd gladly switch to base64. We've only been using hex because it has prior art/use in-tree.

rriddle updated this revision to Diff 434252.Sat, Jun 4, 12:52 AM
rriddle retitled this revision from [mlir] Allow for attaching external configurations to .mlir files to [mlir] Allow for attaching external resources to .mlir files.
rriddle edited the summary of this revision. (Show Details)
rriddle updated this revision to Diff 436069.Fri, Jun 10, 3:59 PM
rriddle edited the summary of this revision. (Show Details)