This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add support for "promised" interfaces
ClosedPublic

Authored by rriddle on Feb 22 2022, 4:19 PM.

Details

Summary

Promised interfaces allow for a dialect to "promise" the implementation of an interface, i.e.
declare that it supports an interface, but have the interface defined in an extension in a library
separate from the dialect itself. A promised interface is powerful in that it alerts the user when
the interface is attempted to be used (e.g. via cast/dyn_cast/etc.) and the implementation has
not yet been provided. This makes the system much more robust against misconfiguration,
and ensures that we do not lose the benefit we currently have of defining the interface in
the dialect library.

Diff Detail

Event Timeline

rriddle created this revision.Feb 22 2022, 4:19 PM
Herald added a project: Restricted Project. · View Herald Transcript
rriddle requested review of this revision.Feb 22 2022, 4:19 PM
ormris removed a subscriber: ormris.Feb 24 2022, 10:06 AM
springerm accepted this revision.Mar 25 2023, 5:16 AM

This revision is already quite old now but I rebased the "promised interfaces" part (D146414) and it worked well. Would be great to get this landed. I'm preparing a stack of revisions for a new interface with external model impls (D145681) and it would benefit from this revision.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
24 ↗(On Diff #410671)

This where the interface is defined. Based on the RFC (https://discourse.llvm.org/t/rfc-dialect-extensions-and-promised-interfaces/60451/3) I thought this dependency could be removed.

The only thing that is important is the type name of the interface (for the declarePromisedInterface API), which can be grabbed (e.g. via forward declaration) in ways such that you don’t depend on the library where the interface is defined.

But I think a forward declaration of the interface is not enough because it doesn't have the type ID of the interface. So this "promised interface" mechanism will not work for the BufferizableOpInterface which is defined in the bufferization dialect. Still, it would already improve the situation a lot for the majority of the interfaces, which are defined in mlir/Interfaces.

This revision is now accepted and ready to land.Mar 25 2023, 5:16 AM
rriddle updated this revision to Diff 529111.Jun 6 2023, 6:01 PM
rriddle edited the summary of this revision. (Show Details)
rriddle updated this revision to Diff 529800.Jun 8 2023, 6:26 PM
This revision was automatically updated to reflect the committed changes.
wrengr added a comment.Jun 9 2023, 1:35 PM

This commit breaks Bazel: https://buildkite.com/llvm-project/upstream-bazel/builds/64887#0188a16c-a007-47f6-9324-11b5132b31ea

The root cause (or at least one major issue) is that mlir-opt.cpp can't find the "mlir/include/mlir/InitAllExtensions.h" header. But since the cmake files don't care much about headers I'm having trouble reconstructing the intent. Can you post a fix?

This commit breaks Bazel: https://buildkite.com/llvm-project/upstream-bazel/builds/64887#0188a16c-a007-47f6-9324-11b5132b31ea

The root cause (or at least one major issue) is that mlir-opt.cpp can't find the "mlir/include/mlir/InitAllExtensions.h" header. But since the cmake files don't care much about headers I'm having trouble reconstructing the intent. Can you post a fix?

Bazel isn't required to support and I don't have bazel setup to fix anything, so unfortunately I won't be able to help here.

ftynse added a subscriber: ftynse.Jun 15 2023, 10:01 AM

After updating a dozen of dependent projects to account for this change, I have concerns about usability.

Practically, promised dialects create a rather tight coupling between a dialect with a promised interface and an extension that realizes that promise. This makes the extension mandatory, which conceptually contradicts the notion of extension. Specifically, func dialect is now unusable without the extension that implements the inliner interface. I haven't had time to dig deeper into this, but it appears that it is the case even when the interface isn't ever requested by the user code.

Furthermore, it is often impossible to register the extension as a dependency of the pass, beyond the need to do it manually rather than in .td of the pass declaration. Specifically, extension registration in multithreaded mode (even when the same extensions are present in the registry) triggers an assertion due to https://github.com/llvm/llvm-project/blob/bae2e56115d617cc5e152b4743484ab04649fe13/mlir/lib/IR/Dialect.cpp#L269-L271 and extensions not having a TypeID that could be used to deduplicate them. This happens for projects that create a pass manager inside a pass run by another pass manager. I would consider this practice dubious, but it is quite common.

To complicate the matters, the error message omits the dialect name most of the time and reads like there it is itself buggy: "checking for an interface (``) that was promised by dialect 'func' but never implemented.", is interface nullptr? Something went wrong with registration before so there is no name? Certainly it should be possible to use, e.g., macro stringization to print at least the C++ class name of the missing interface.

rnk added a subscriber: rnk.Jun 21 2023, 9:28 AM

I'm not very familiar with MLIR, dialects, extensions, etc, but from my perspective, this change seems to create more API update churn than the MLIR developer policy

When performing breaking changes, we expect to proceed with the most convenient way for the change to be reviewed and implemented, while minimizing churn. It is frequent to break changes in multiple commits to ease the review process. A breaking can be implemented incrementally within these guidelines, but we don’t go out of our way only to accomodate downstream users. As such we don’t have predefined deprecation period for APIs. A good practice is to communicate on Discourse about the timeline and plan for conducting such changes.

My understanding is that every essentially all code initializing an MLIR context, such as the toy examples in this CL, need an update to register function extensions. So, it affects every single user, which seems like gratuitous churn, as described in the paragraph above. I can't say what a dialect or an extension are or how they relate or depend on each other, but it seems like this could have been implemented transparently by registering extensions as part of registering dialects, which seems like something that every client already does when initializing an MLIR context.

I can't say what the right approach here is, but there are other tradeoffs on the spectrum between rapid development and minimizing churn, which is a stated goal of the developer policy. I encourage the project to weigh that goal with more care in the future.

mlir/examples/toy/Ch7/toyc.cpp
295

This seems like the most representative change required of essentially all clients initializing MLIR.

rnk added a comment.Jun 21 2023, 9:38 AM

My understanding is that every essentially all code initializing an MLIR context, such as the toy examples in this CL, need an update to register function extensions. So, it affects every single user, which seems like gratuitous churn, as described in the paragraph above.

After thinking more about this, MLIR update churn is super common, and the volume this change creates is probably inline with that. I think what made this different was that failures manifested at runtime, which makes API updates much more expensive, and that's something to consider when evaluating changes.

I am wondering why this changes requires any downstream fixes at all. The purpose of this change is to give users an error when an external model is not registered (to avoid misconfiguration). This error is triggered when casting to an op interface for which an external model exists, but that was not registered.

E.g.:

tensor::ExtractOp op;
// error if `ExtractOpInterface` in `BufferizableOpInterfaceImpl.cpp` is not registered
cast<BufferizableOpInterface>(op.getOperation());

It was my understanding that, as long as there is no cast to the interface (for which the registration is missing), there should be no runtime error. Similarly, *if* there is such a cast, and the model is not registered, this piece of code is likely not working correctly, because the cast cannot succeed. (Similarly, isa<...> returns false, even though the op does implement that interface and the model is just not registered.) Consequently, every time we are seeing an error (caused by this revision) in a downstream project, we are actually detecting a bug/misconfiguration in the project, that just went unnoticed until now. Does this make sense?

Now, there seems to be something special about func extensions and the inliner here, because that's the only component that seems to cause issues with many downstream projects...

I am wondering why this changes requires any downstream fixes at all. The purpose of this change is to give users an error when an external model is not registered (to avoid misconfiguration). This error is triggered when casting to an op interface for which an external model exists, but that was not registered.

It was my understanding that, as long as there is no cast to the interface (for which the registration is missing), there should be no runtime error. Similarly, *if* there is such a cast, and the model is not registered, this piece of code is likely not working correctly, because the cast cannot succeed. (Similarly, isa<...> returns false, even though the op does implement that interface and the model is just not registered.) Consequently, every time we are seeing an error (caused by this revision) in a downstream project, we are actually detecting a bug/misconfiguration in the project, that just went unnoticed until now. Does this make sense?

There is a description of the problem in my answer above. This change not only implements the promised interfaces feature, but also immediately uses it for implementing the inliner interface of the func dialect. I haven't had time yet to check if it indeed works if nobody requests this interface, but that's not the point.

While we did discover some additional misconfigurations in downstream projects in the process of debugging this, most of them were not misconfigured before this change. Prior to this change, it was sufficient for them to register the func dialect that came with the inliner interface implementation. After this change, it became necessary for them to also register the inliner interface implementation that _no longer_ comes with the dialect. This requires an explicit change and makes previously-correct configuration incorrect, with no warning.

Furthermore, using extension makes it impossible to add the interface in the getDependentDialects of a pass in the general case (see explanation above for the special case) in case it is needed down the road. Again, the pass was correctly registering the func dialect as dependent prior to this change. This also makes previously-correct configurations incorrect, and difficult to debug except for the situations where all registration happens upfront instead of progressively as pass dependencies.

The purpose of this change is to give users an error when an external model is not registered (to avoid misconfiguration).

Also, the change isn't very good to doing exactly that: the error message doesn't specify the model for which interface is missing in the overwhelming majority of cases.

We have also hit this issue with quite an unhelpful error message

LLVM ERROR: checking for an interface (``) that was promised by dialect 'func' but never implemented. This is generally an indication that the dialect extension implementing the interface was never registered.

And I also don't understand why this change was needed for func dialect specifically as it wasn't elaborated in commit description. Can we revert func dialect change?

Given that it's been two weeks since it was landed, I would not push for revert as downstreams may have adapted already and it would cause additional churn.

I suppose the change in func was a way to "test" the functionality since there don't seem to be any unit tests.

@Hardcode84 to fix that error, find all places where MLIRContext is created and add both the func dialect and the inliner extension at that point. We did this by printing the stack trace from the MLIRContext constructor (binaries like tf-opt crash the linker if building in mode...).

Given that it's been two weeks since it was landed, I would not push for revert as downstreams may have adapted already and it would cause additional churn.

I suppose the change in func was a way to "test" the functionality since there don't seem to be any unit tests.

@Hardcode84 to fix that error, find all places where MLIRContext is created and add both the func dialect and the inliner extension at that point. We did this by printing the stack trace from the MLIRContext constructor (binaries like tf-opt crash the linker if building in mode...).

We have already figured the fix, thanks, although it looks quite ugly in our case, as we need to register extension before context creation https://github.com/numba/numba-mlir/pull/145/files#diff-236f1f75d31eb1df427f7a1ccbf72e05d02497200d680e73348970aca3c5bd69R830

I would still vote for revert, as it will only be a compile-time breakage, not the run-time as previous one.

Given that it's been two weeks since it was landed, I would not push for revert as downstreams may have adapted already and it would cause additional churn.

I suppose the change in func was a way to "test" the functionality since there don't seem to be any unit tests.

@Hardcode84 to fix that error, find all places where MLIRContext is created and add both the func dialect and the inliner extension at that point. We did this by printing the stack trace from the MLIRContext constructor (binaries like tf-opt crash the linker if building in mode...).

We have already figured the fix, thanks, although it looks quite ugly in our case, as we need to register extension before context creation https://github.com/numba/numba-mlir/pull/145/files#diff-236f1f75d31eb1df427f7a1ccbf72e05d02497200d680e73348970aca3c5bd69R830

I would still vote for revert, as it will only be a compile-time breakage, not the run-time as previous one.

We should certainly not revert. The func changes are intentional, given we previously had a dep on ControlFlow that was only there for the inliner interface.

We should certainly not revert. The func changes are intentional, given we previously had a dep on ControlFlow that was only there for the inliner interface.

IMO, solution here causes more issues for users than the problem itself.

Practically, promised dialects create a rather tight coupling between a dialect with a promised interface and an extension that realizes that promise.

Can you clarify where is the coupling between the dialect and the extension?

This makes the extension mandatory, which conceptually contradicts the notion of extension.

I don't quite get the contradiction: the intent here is that we promise that the extension will be available (when needed), without any coupling to the actual implementation. The provider is free to supply an extension however they wish.

Specifically, func dialect is now unusable without the extension that implements the inliner interface. I haven't had time to dig deeper into this, but it appears that it is the case even when the interface isn't ever requested by the user code.

I'd be interested in a test case, right now I can't reproduce.
I commented out registerAllExtensions() in mlir-opt.cpp, and re-run check-mlir:

********************
Failed Tests (20):
  MLIR :: Conversion/AsyncToLLVM/convert-to-llvm.mlir
  MLIR :: Conversion/AsyncToLLVM/typed-pointers.mlir
  MLIR :: Dialect/Affine/inlining.mlir
  MLIR :: Dialect/Async/async-parallel-for-canonicalize.mlir
  MLIR :: Dialect/Async/async-parallel-for-compute-fn.mlir
  MLIR :: Dialect/Async/async-to-async-runtime.mlir
  MLIR :: Dialect/Bufferization/inlining.mlir
  MLIR :: Dialect/GPU/outlining.mlir
  MLIR :: Dialect/LLVMIR/inlining.mlir
  MLIR :: Dialect/Linalg/inlining.mlir
  MLIR :: Dialect/Tosa/inlining.mlir
  MLIR :: Examples/Toy/Ch5/affine-lowering.mlir
  MLIR :: Transforms/inlining-dce.mlir
  MLIR :: Transforms/inlining-recursive.mlir
  MLIR :: Transforms/inlining-repeated-use.mlir
  MLIR :: Transforms/inlining.mlir
  MLIR :: Transforms/test-inlining.mlir
  MLIR :: Transforms/test-legalizer-full.mlir
  MLIR :: Transforms/test-legalizer.mlir
  MLIR :: mlir-cpu-runner/async-group.mlir


Testing Time: 39.21s
  Unsupported:  265
  Passed     : 1872
  Failed     :   20

This seems consistent with the expected behavior I believe?

Furthermore, it is often impossible to register the extension as a dependency of the pass, beyond the need to do it manually rather than in .td of the pass declaration. Specifically, extension registration in multithreaded mode (even when the same extensions are present in the registry) triggers an assertion due to https://github.com/llvm/llvm-project/blob/bae2e56115d617cc5e152b4743484ab04649fe13/mlir/lib/IR/Dialect.cpp#L269-L271 and extensions not having a TypeID that could be used to deduplicate them. This happens for projects that create a pass manager inside a pass run by another pass manager. I would consider this practice dubious, but it is quite common.

The case you describe seems WAI if I understand correctly: the enclosing pass that is wrapping an inner pass-manager (not clear to me why it can't be replaced by the dynamic pipeline instead, but that's not changing the setup for this) should add it to the registry in the getDependentDialects() call.
I tried commenting out registerAllExtensions() in mlir-opt.cpp, and moving it to a getDependentDialects() in the InlinerPass:

void getDependentDialects(DialectRegistry &registry) const override {
  mlir::func::registerInlinerExtension(registry);
}

And ninja check-mlir is passing. Maybe I misunderstood what you're describing though?

To complicate the matters, the error message omits the dialect name most of the time and reads like there it is itself buggy: "checking for an interface (``) that was promised by dialect 'func' but never implemented.", is interface nullptr? Something went wrong with registration before so there is no name? Certainly it should be possible to use, e.g., macro stringization to print at least the C++ class name of the missing interface.

I think https://reviews.llvm.org/D153676 should fix this:

LLVM ERROR: checking for an interface (`mlir::DialectInlinerInterface`) that was promised by dialect 'func' but never implemented. This is generally an indication that the dialect extension implementing the interface was never registered.

IMO, solution here causes more issues for users than the problem itself.

  • I would prefer inliner pass to gracefully fail if extension wasn't registered instead of random abort with cryptic error message.

Feel free to suggest more improvement to the error message?
You can also try a patch to the inliner to check for the promise and fail gracefully.

I hit this while testing the registration problem Alex mentioned, and fixed it in 86e1c050e944
That said I don't quite get why you think it "defeats the entire purpose of this change": of course I'd like our upstream setup to be able to defend more against these dependent dialects issues, but this feature is entirely unrelated to this class of failures. The "promise" is intended prevent misconfiguration of the system with respect to external interfaces.

Right now I think what's still missing is enforcing that an interface was promised by a dialect to be allowed to be added externally.

Matt added a subscriber: Matt.Jun 24 2023, 9:03 PM

You can also try a patch to the inliner to check for the promise and fail gracefully.

No one asked for this feature, there weren't any PSA or public discussion (latest update in thread was more than a year ago), most users were just fine with status quo so reverting this is the easiest solution.

Right now I think what's still missing is enforcing that an interface was promised by a dialect to be allowed to be added externally.

Maybe revert and recommit after this is solved then?

Dependency between func and cf dialects may not be ideal, but at least it was enforced at compile time. To fix this minor annoyance you are introducing a whole new class of potential misconfigurations and runtime errors. Now, instead of (explicit and enforced) dependency between dialects, we will have an implicit dependency between pass and extension with no way to enforce it except in documentation (which doesn't exist, by the way).

Practically, promised dialects create a rather tight coupling between a dialect with a promised interface and an extension that realizes that promise.

Can you clarify where is the coupling between the dialect and the extension?

This makes the extension mandatory, which conceptually contradicts the notion of extension.

I don't quite get the contradiction: the intent here is that we promise that the extension will be available (when needed), without any coupling to the actual implementation. The provider is free to supply an extension however they wish.

Specifically, func dialect is now unusable without the extension that implements the inliner interface. I haven't had time to dig deeper into this, but it appears that it is the case even when the interface isn't ever requested by the user code.

I'd be interested in a test case, right now I can't reproduce.
I commented out registerAllExtensions() in mlir-opt.cpp, and re-run check-mlir:

My concern was due to seeing downstreams massively fail tests, which may be attribute those downstreams requesting an instance of DialectInlinerInterface outside of the actual inliner pass. I did not have time to investigate. It is unclear to me if the check for the implementation of the promised interface happens when the interface is first requested or before that. It appears that it's checked on request, so we shouldn't see non-inliner tests fail.

Arguably, having some sort of constructor finalization for MLIRContext that checks whether all promised interfaces were in fact provided would offer better usability than doing that on the first use, which may or may not happen. However, this contradicts my understanding of extensions: extensions provide additional features that aren't required for the "main" dialect to work. I suppose it's just word semantics though.

********************
Failed Tests (20):
  MLIR :: Conversion/AsyncToLLVM/convert-to-llvm.mlir
  MLIR :: Conversion/AsyncToLLVM/typed-pointers.mlir
  MLIR :: Dialect/Affine/inlining.mlir
  MLIR :: Dialect/Async/async-parallel-for-canonicalize.mlir
  MLIR :: Dialect/Async/async-parallel-for-compute-fn.mlir
  MLIR :: Dialect/Async/async-to-async-runtime.mlir
  MLIR :: Dialect/Bufferization/inlining.mlir
  MLIR :: Dialect/GPU/outlining.mlir
  MLIR :: Dialect/LLVMIR/inlining.mlir
  MLIR :: Dialect/Linalg/inlining.mlir
  MLIR :: Dialect/Tosa/inlining.mlir
  MLIR :: Examples/Toy/Ch5/affine-lowering.mlir
  MLIR :: Transforms/inlining-dce.mlir
  MLIR :: Transforms/inlining-recursive.mlir
  MLIR :: Transforms/inlining-repeated-use.mlir
  MLIR :: Transforms/inlining.mlir
  MLIR :: Transforms/test-inlining.mlir
  MLIR :: Transforms/test-legalizer-full.mlir
  MLIR :: Transforms/test-legalizer.mlir
  MLIR :: mlir-cpu-runner/async-group.mlir


Testing Time: 39.21s
  Unsupported:  265
  Passed     : 1872
  Failed     :   20

This seems consistent with the expected behavior I believe?

test-legalizer and affine-lowering look quite surprising to me here.

Furthermore, it is often impossible to register the extension as a dependency of the pass, beyond the need to do it manually rather than in .td of the pass declaration. Specifically, extension registration in multithreaded mode (even when the same extensions are present in the registry) triggers an assertion due to https://github.com/llvm/llvm-project/blob/bae2e56115d617cc5e152b4743484ab04649fe13/mlir/lib/IR/Dialect.cpp#L269-L271 and extensions not having a TypeID that could be used to deduplicate them. This happens for projects that create a pass manager inside a pass run by another pass manager. I would consider this practice dubious, but it is quite common.

The case you describe seems WAI if I understand correctly: the enclosing pass that is wrapping an inner pass-manager (not clear to me why it can't be replaced by the dynamic pipeline instead, but that's not changing the setup for this) should add it to the registry in the getDependentDialects() call.
I tried commenting out registerAllExtensions() in mlir-opt.cpp, and moving it to a getDependentDialects() in the InlinerPass:

void getDependentDialects(DialectRegistry &registry) const override {
  mlir::func::registerInlinerExtension(registry);
}

And ninja check-mlir is passing. Maybe I misunderstood what you're describing though?

MLIR doesn't seem to have a test where a new pass manager is constructed inside a pass that is being run by another pass manager. That would assert because it would attempt to register extensions while in multi-threaded mode because of the outer pass manager. I agree that this is a dubious setup, but it is quite widespread and not explicitly disallowed.

To complicate the matters, the error message omits the dialect name most of the time and reads like there it is itself buggy: "checking for an interface (``) that was promised by dialect 'func' but never implemented.", is interface nullptr? Something went wrong with registration before so there is no name? Certainly it should be possible to use, e.g., macro stringization to print at least the C++ class name of the missing interface.

I think https://reviews.llvm.org/D153676 should fix this:

LLVM ERROR: checking for an interface (`mlir::DialectInlinerInterface`) that was promised by dialect 'func' but never implemented. This is generally an indication that the dialect extension implementing the interface was never registered.

Thanks!

You can also try a patch to the inliner to check for the promise and fail gracefully.

No one asked for this feature, there weren't any PSA or public discussion (latest update in thread was more than a year ago), most users were just fine with status quo so reverting this is the easiest solution.

If by feature you're talking about "promised interface": then I asked for it when the concept of "external interface" was proposed: I would have made this a blocker had I know it'd take so long to get there: so no clearly this can't be a "solution".

If you're talking about using "promised interface" to break the dependency between func and cf, that's a different discussion, I have less opinions about the way to get there: I probably wouldn't have landed it in the same revision myself.

Right now I think what's still missing is enforcing that an interface was promised by a dialect to be allowed to be added externally.

Maybe revert and recommit after this is solved then?

How is it gonna help anything? This patch adds some checks, I'm saying we need more, but these aren't coupled: having these checks in isn't dependent in any way in having more checks.

Dependency between func and cf dialects may not be ideal, but at least it was enforced at compile time. To fix this minor annoyance you are introducing a whole new class of potential misconfigurations and runtime errors. Now, instead of (explicit and enforced) dependency between dialects, we will have an implicit dependency between pass and extension with no way to enforce it except in documentation (which doesn't exist, by the way).

There was not compile time enforcement before: misconfiguration would fail at runtime.
It's just that we hard-coded a dependency between dialects because they might be used in some configurations (combination of pass+dialect). If we didn't setup the dependency, it would just crash in worse way than that when the pass would be involved. The func dialect was only artificially depending on the cf dialect before.

test-legalizer and affine-lowering look quite surprising to me here.

Yeah these were just different cases of missing dependent dialect that this surfaced. Updated list:

Failed Tests (12):
  MLIR :: Dialect/Affine/inlining.mlir
  MLIR :: Dialect/Async/async-parallel-for-canonicalize.mlir
  MLIR :: Dialect/Async/async-parallel-for-compute-fn.mlir
  MLIR :: Dialect/Bufferization/inlining.mlir
  MLIR :: Dialect/LLVMIR/inlining.mlir
  MLIR :: Dialect/Linalg/inlining.mlir
  MLIR :: Dialect/Tosa/inlining.mlir
  MLIR :: Transforms/inlining-dce.mlir
  MLIR :: Transforms/inlining-recursive.mlir
  MLIR :: Transforms/inlining-repeated-use.mlir
  MLIR :: Transforms/inlining.mlir
  MLIR :: Transforms/test-inlining.mlir

MLIR doesn't seem to have a test where a new pass manager is constructed inside a pass that is being run by another pass manager. That would assert because it would attempt to register extensions while in multi-threaded mode because of the outer pass manager. I agree that this is a dubious setup, but it is quite widespread and not explicitly disallowed.

How is it different than dialect registration? Having a nested pass manager requires you to handle all registrations in getDependentDialects(), this is the case for as long as I can remember.
What changed is that the "support for inlining func.func" is now an extension whereas before we had this builtin in the func dialect initialization.