This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Overhaul C/Python registration APIs to properly scope registration/loading activities.
ClosedPublic

Authored by stellaraccident on Jun 25 2022, 4:01 PM.

Details

Summary

Since the very first commits, the Python and C MLIR APIs have had mis-placed registration/load functionality for dialects, extensions, etc. This was done pragmatically in order to get bootstrapped and then just grew in. Downstreams largely bypass and do their own thing by providing various APIs to register things they need. Meanwhile, the C++ APIs have stabilized around this and it would make sense to follow suit.

The thing we have observed in canonical usage by downstreams is that each downstream tends to have native entry points that configure its installation to its preferences with one-stop APIs. This patch leans in to this approach with RegisterEverything.h and mlir._mlir_libs._mlirRegisterEverything being the one-stop entry points for the "upstream packages". The _mlir_libs.__init__.py now allows customization of the environment and Context by adding "initialization modules" to the _mlir_libs package. If present, _mlirRegisterEverything is treated as such a module. Others can be added by downstreams by adding a _site_initialize_{i}.py module, where '{i}' is a number starting with zero. The number will be incremented and corresponding module loaded until one is not found. Initialization modules can:

  • Perform load time customization to the global environment (i.e. registering passes, hooks, etc).
  • Define a register_dialects(registry: DialectRegistry) function that can extend the DialectRegistry that will be used to bootstrap the Context.
  • Define a context_init_hook(context: Context) function that will be added to a list of callbacks which will be invoked after dialect registration during Context initialization.

Note that the MLIRPythonExtension.RegisterEverything is not included by default when building a downstream (its corresponding behavior was prior). For downstreams which need the default MLIR initialization to take place, they must add this back in to their Python CMake build just like they add their own components (i.e. to add_mlir_python_common_capi_library and add_mlir_python_modules). It is perfectly valid to not do this, in which case, only the things explicitly depended on and initialized by downstreams will be built/packaged. If the downstream has not been set up for this, it is recommended to simply add this back for the time being and pay the build time/package size cost.

CMake changes:

  • MLIRCAPIRegistration -> MLIRCAPIRegisterEverything (renamed to signify what it does and force an evaluation: a number of places were incidentally linking this very expensive target)
  • MLIRPythonSoure.Passes removed (without replacement: just drop)
  • MLIRPythonExtension.AllPassesRegistration removed (without replacement: just drop)
  • MLIRPythonExtension.Conversions removed (without replacement: just drop)
  • MLIRPythonExtension.Transforms removed (without replacement: just drop)

Header changes:

  • mlir-c/Registration.h is deleted. Dialect registration functionality is now in IR.h. Registration of upstream features are in mlir-c/RegisterEverything.h. When updating MLIR and a couple of downstreams, I found that proper usage was commingled so required making a choice vs just blind S&R.

Python APIs removed:

  • mlir.transforms and mlir.conversions (previously only had an __init__.py which indirectly triggered mlirRegisterTransformsPasses() and mlirRegisterConversionPasses() respectively). Downstream impact: Remove these imports if present (they now happen as part of default initialization).
  • mlir._mlir_libs._all_passes_registration, mlir._mlir_libs._mlirTransforms, mlir._mlir_libs._mlirConversions. Downstream impact: None expected (these were internally used).

C-APIs changed:

  • mlirRegisterAllDialects(MlirContext) now takes an MlirDialectRegistry instead. It also used to trigger loading of all dialects, which was already marked with a TODO to remove -- it no longer does, and for direct use, dialects must be explicitly loaded. Downstream impact: Direct C-API users must ensure that needed dialects are loaded or call mlirContextLoadAllAvailableDialects(MlirContext) to emulate the prior behavior. Also see the ir.c test case (e.g. mlirContextGetOrLoadDialect(ctx, mlirStringRefCreateFromCString("func"));).
  • mlirDialectHandle* APIs were moved from Registration.h (which now is restricted to just global/upstream registration) to IR.h, arguably where it should have been. Downstream impact: include correct header (likely already doing so).

C-APIs added:

  • mlirContextLoadAllAvailableDialects(MlirContext): Corresponds to C++ API with the same purpose.

Python APIs added:

  • mlir.ir.DialectRegistry: Mapping for an MlirDialectRegistry.
  • mlir.ir.Context.append_dialect_registry(MlirDialectRegistry)
  • mlir.ir.Context.load_all_available_dialects()
  • mlir._mlir_libs._mlirAllRegistration: New native extension that exposes a register_dialects(MlirDialectRegistry) entry point and performs all upstream pass/conversion/transforms registration on init. In this first step, we eagerly load this as part of the __init__.py and use it to monkey patch the Context to emulate prior behavior.
  • Type caster and capsule support for MlirDialectRegistry

This should make it possible to build downstream Python dialects that only depend on a subset of MLIR. See: https://github.com/llvm/llvm-project/issues/56037

Here is an example PR, minimally adapting IREE to these changes: https://github.com/iree-org/iree/pull/9638/files In this situation, IREE is opting to not link everything, since it is already configuring the Context to its liking. For projects that would just like to not think about it and pull in everything, add MLIRPythonExtension.RegisterEverything to the list of Python sources getting built, and the old behavior will continue.

Diff Detail

Event Timeline

stellaraccident requested review of this revision.Jun 25 2022, 4:01 PM

Do you believe that there is a place for a HowTo.md that makes downstream users really happy?

Do you believe that there is a place for a HowTo.md that makes downstream users really happy?

As in a "How to Integrate into a Downstream Project?" guide. Yes, we should create that.

Do you believe that there is a place for a HowTo.md that makes downstream users really happy?

As in a "How to Integrate into a Downstream Project?" guide. Yes, we should create that.

Exactly.

Do you believe that there is a place for a HowTo.md that makes downstream users really happy?

As in a "How to Integrate into a Downstream Project?" guide. Yes, we should create that.

I will note that the Standalone example does wire this together, but it may be too simple. Since we never got the C/Python registration stuff right (from the get-go), my goal with this patch was to be as NFC as possible to set the stage. Then I was going to make changes to how it is all set up, reflected in the Standalone example. That would probably be the right time to add a guide as well. We have not previously specified how to "do registration right", and my view is that all of the downstreams made pragmatic (and different) choices there. We should establish a norm and use it upstream/Standalone-sample and advocate for downstreams to follow suit, imo.

ftynse accepted this revision.Jun 27 2022, 2:56 AM

Would you mind posting a PSA on the forum copying the change description and the tentative evolution plan?

mlir/include/mlir-c/Registration.h
1–6

Should we rename the file to RegisterEverything.h or something to make it clear?

mlir/lib/Bindings/Python/IRCore.cpp
19

Nit: drop this line

mlir/python/mlir/_mlir_libs/__init__.py
42

This looks unnecessary.

mlir/tools/mlir-pdll/mlir-pdll.cpp
122–126 ↗(On Diff #440014)

Looks unrelated.

This revision is now accepted and ready to land.Jun 27 2022, 2:56 AM
stellaraccident planned changes to this revision.Jun 27 2022, 7:56 AM

Would you mind posting a PSA on the forum copying the change description and the tentative evolution plan?

Thanks, Alex. I'm going to take another pass through with an eye to getting things lined up to the final state. Will ping for another look then.

Move closer to end state.

This revision is now accepted and ready to land.Jun 27 2022, 2:30 PM
stellaraccident marked an inline comment as done.Jun 27 2022, 2:33 PM
stellaraccident added inline comments.
mlir/python/mlir/_mlir_libs/__init__.py
42

Reworked all of this.

stellaraccident edited the summary of this revision. (Show Details)Jun 27 2022, 2:50 PM
ftynse accepted this revision.Jun 28 2022, 2:05 AM

I am not a huge fan of magic method / file names, but this is a common thing to do in Python. My only usability concern is about debugging:

with Context():
  ir.parse("custom.func @foo()",)

will work or break depending on where the Context is imported from: from mlir.ir import Context vs from custom.ir import Context. I suppose we can get used to it.

mlir/lib/Bindings/Python/IRCore.cpp
19

Still here...

mlir/lib/Bindings/Python/RegisterEverything.cpp
19–26

Not super-relevant for this patch, but the inconsistency between having to call a function to register dialects and having the passes registered on load feels confusing.

mehdi_amini accepted this revision.Jun 28 2022, 4:54 AM

I have some concerns with:

Define a register_dialects(registry: DialectRegistry) function that can extend the DialectRegistry that will be used to bootstrap the Context.
Define a context_init_hook(context: Context) function that will be added to a list of callbacks which will be invoked after dialect registration during Context initialization.

Because contrary to pass registration for example, there isn't a global context or global dialect registry in MLIR: that means that I should be able to create a context inside my python library and not have it "polluted" by something that was registered by another library for its own purpose.

That said this is still quite an big step forward from where we're at right now, even though I fear we may dig ourselves in a place where it'll be hard to get out of.

mlir/lib/Bindings/Python/RegisterEverything.cpp
19–26

This matches somehow MLIR though in that passes are in a global registries while Dialects aren't.

mlir/lib/CAPI/RegisterEverything/RegisterEverything.cpp
19

Great that you fixed this TODO! :)

I am not a huge fan of magic method / file names, but this is a common thing to do in Python. My only usability concern is about debugging:

with Context():
  ir.parse("custom.func @foo()",)

will work or break depending on where the Context is imported from: from mlir.ir import Context vs from custom.ir import Context. I suppose we can get used to it.

It is probably best to think of these as completely different libraries: iree.ir.Context, circt.ir.Context are IREE/CIRCT APIs respectively. It is kind of weird if their contexts don't come instantiated to do the job. If they were just open coding a Context, they would just declare their own, but in actuality, this class is very wired in to the core code and not easily modifiable by downstreams. Therefore we provide some hooks for customizing it, so they can make it their own... That part is more mechanics for working around practicalities of interop between the native code and extenders.

I have less of an opinion (after we land all of this) about whether the stock mlir.ir.Context comes pre registered with everything. Aside from matching current behavior, which is an important intermediate state during this transition, we could choose to make the mlirRegisterEverything part of the automatic setup be explicit for upstream in some way. So I see this as digging us halfway out of the hole we're in on that and giving downstreams the ability to climb out completely. Let's discuss the upstream register everything default in a follow-up. Anything we would do there would be an API break and we should discuss that explicitly. I think this patch leaves us the options we need there.

I have some concerns with:

Define a register_dialects(registry: DialectRegistry) function that can extend the DialectRegistry that will be used to bootstrap the Context.
Define a context_init_hook(context: Context) function that will be added to a list of callbacks which will be invoked after dialect registration during Context initialization.

Because contrary to pass registration for example, there isn't a global context or global dialect registry in MLIR: that means that I should be able to create a context inside my python library and not have it "polluted" by something that was registered by another library for its own purpose.

That said this is still quite an big step forward from where we're at right now, even though I fear we may dig ourselves in a place where it'll be hard to get out of.

Thanks for your analysis. To avoid double typing, I think I addressed some of them in my comment above to Alex. Lmk and happy to keep resolving the default registration case post this patch.

Sorry for the delay, but I finally got around to taking this for a spin.

In terms of implementation details, I think this looks good in general. The _site_initialize_i.py mechanism would be useful for us downstream in CIRCT, and I agree with previous comments about using it for the short term to continue registering everything upstream, until we decide to stop doing that by default.

In CIRCT, I measured a reduction from 379 MB to 84 MB in our add_mlir_python_common_capi_library library (we depend only on MLIRPythonSources.Core, no other dialects or transforms).

So in general, +1 from me. Thanks for cleaning this up @stellaraccident, and for providing a hook for downstreams to do site-specific initialization (and upstream to retain previous behavior).

mlir/lib/Bindings/Python/IRCore.cpp
19

Just bumping this since I'm reviewing right now.

mlir/lib/Bindings/Python/RegisterEverything.cpp
19–26

Agreed that this sort of matches MLIR. In the case of CIRCT, we've done something similar in our own bespoke version of this, where import circt registers passes, but for dialects, you have to call circt.register_dialects.

I think that will change to all happen automatically using the new _site_initialization hook after this patch, but just weighing in that this is how we split it downstream as well.

Sorry for the delay, but I finally got around to taking this for a spin.

In terms of implementation details, I think this looks good in general. The _site_initialize_i.py mechanism would be useful for us downstream in CIRCT, and I agree with previous comments about using it for the short term to continue registering everything upstream, until we decide to stop doing that by default.

In CIRCT, I measured a reduction from 379 MB to 84 MB in our add_mlir_python_common_capi_library library (we depend only on MLIRPythonSources.Core, no other dialects or transforms).

So in general, +1 from me. Thanks for cleaning this up @stellaraccident, and for providing a hook for downstreams to do site-specific initialization (and upstream to retain previous behavior).

Thanke, Mike. Not that it is super relevant to this patch but I'm curious: is that some kind of debug build? 379MiB seems quite hefty otherwise! I expect that maybe you are <10MiB in a stripped/release build?

Not that it is super relevant to this patch but I'm curious: is that some kind of debug build? 379MiB seems quite hefty otherwise! I expect that maybe you are <10MiB in a stripped/release build?

Yes, that was a debug build.

jdd added a subscriber: jdd.Jul 11 2022, 11:21 AM

Rebase, fix standalone example, attempt to patch bazel as best I could.

This revision was landed with ongoing or failed builds.Jul 16 2022, 5:28 PM
This revision was automatically updated to reflect the committed changes.

This broke SparseTensor integration tests. I will fix forward.

Sorry, yes - I submitted a follow-up and neglected to note on this thread: https://reviews.llvm.org/rGbeebffa9ab81bba3de91b2cee7a62578ebe1ae00

mlir/python/mlir/_mlir_libs/_mlir/ir.pyi