This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Fix circular dialect initialization
ClosedPublic

Authored by springerm on Oct 25 2022, 7:35 AM.

Details

Summary

This change fixes a bug where a dialect is initialized multiple times. This triggers an assertion when the ops of the dialect are registered (error: operation named ... is already registered).

This bug can be triggered as follows:

  1. Dialect A depends on dialect B (as per ADialect.td).
  1. Somewhere there is an extension of dialect B that depends on dialect A (e.g., it defines external models create ops from dialect A). E.g.:
registry.addExtension(+[](MLIRContext *ctx, BDialect *dialect) {
  BDialectOp::attachInterface ...
  ctx->loadDialect<ADialect>();
});
  1. When dialect A is loaded, its initialize function is called twice:
ADialect::ADialect()
   |     |
   |     v
   |   ADialect::initialize()
   v
getOrLoadDialect<BDialect>()
   |
   v
(load extension of BDialect)
   |
   v
ctx->loadDialect<ADialect>()  // user wrote this in the extension
   |
   v
getOrLoadDialect<ADialect>()  // the dialect is not "fully" loaded yet
   |
   v
ADialect::ADialect()
   |
   v
ADialect::initialize()

An example of a dialect extension that depends on other dialects is Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp. That particular dialect extension does not trigger this bug. (It would trigger this bug if the SCF dialect would depend on the Tensor dialect.)

This change introduces a new dialect state: dialects that are currently being loaded. Same as dialects that were already fully loaded (and initialized), dialects that are in the process of being loaded are not loaded a second time.

Diff Detail

Event Timeline

springerm created this revision.Oct 25 2022, 7:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
springerm requested review of this revision.Oct 25 2022, 7:35 AM

Note: I am not an expert on dialect loading/initialization. I can't tell whether this is the right way to fix the bug or not.

mehdi_amini added inline comments.Oct 25 2022, 10:01 AM
mlir/tools/mlir-tblgen/DialectGen.cpp
262

I think this require much more explanations: call out the recursive initialization through get-dependent.

But more importantly I don't actually understand how initialize() gets called twice, where is the second call coming from?
You mention in the description that there is a call to initialize() from ctx->loadDialect<ADialect>(), but how does this call initialize()? It may call another constructor, but we'd have the constructor called twice? Does not seem right, so I'm missing something :)

springerm added inline comments.Oct 26 2022, 12:41 AM
mlir/tools/mlir-tblgen/DialectGen.cpp
262

Yes, the same constructor is getting called twice.

When ADialect is loaded for the first time, an instance of ADialect is created, so the constructor is called. But the ADialect instance is not yet added to the list. This happens after the constructor is finished executing:

std::unique_ptr<Dialect> &dialect =
    impl.loadedDialects.insert({dialectNamespace, ctor()}).first->second;
assert(dialect && "dialect ctor failed");

The constructor (ADialect::ADialect) then indirectly calls ctx->loadDialect<ADialect>(), which calls getOrLoadDialect again. But we are still in the process of constructing the other ADialect instance, so it has not been added to loadedDialects yet. And therefore we create a second instance.

Note: impl.loadedDialects.insert is called twice. The second insert fails; i.e., it returns the already inserted ADialect instance, so one of the two ADialect objects is discarded.

springerm added inline comments.Oct 26 2022, 12:49 AM
mlir/tools/mlir-tblgen/DialectGen.cpp
262

so one of the two ADialect objects is discarded

This makes me wonder, are dialect objects allowed to have state? In that case, this change is wrong (as it may discard state).

However, it's still not ideal that ADialect is created twice....

springerm updated this revision to Diff 470742.Oct 26 2022, 1:42 AM

a better fix: do not load/create the same dialect multiple times

I uploaded a better fix that avoid creating two versions of ADialect in the first place.

springerm edited the summary of this revision. (Show Details)Oct 26 2022, 1:44 AM

On a technical level, the issue is with the fact that there's no entry in the dialect map when the second call is issued. Maybe we can proactively add a null pointer to that map before calling the constructor to work around this? It will result in the second getOrLoad call returning a null pointer. Does it happen only in extensions? If so, it may be okay as long as we document that extensions must not attempt to load dialects and use them, instead they should actually "extend" the dialect they want to use (since extensions may depend on multiple dialects).

On the conceptual level, loading dialects in extensions let us create cycles in the dialect dependency graph that are not visible in the dialect itself. I wonder if we can trigger a similar cycle without extensions, by having two mutually dependent dialects. If so, I'd go with whatever solution that addresses both problems.

mlir/lib/IR/MLIRContext.cpp
178
mehdi_amini added inline comments.Oct 26 2022, 10:35 AM
mlir/lib/IR/MLIRContext.cpp
180

I find this a bit annoying to have this separate data structure, what about Alex's suggestion of inserting a nullptr in the loadedDialects map?

mlir/tools/mlir-tblgen/DialectGen.cpp
262

Yes, the same constructor is getting called twice.

Right let's avoid that :)

This makes me wonder, are dialect objects allowed to have state?

Yes.

springerm edited the summary of this revision. (Show Details)

address comments

springerm marked 3 inline comments as done.Oct 27 2022, 12:27 AM

Update: Using nullptr to indicate "dialect loading" and also keeping isDialectLoading for error handling (report_fatal_error).

ftynse accepted this revision.Oct 27 2022, 1:44 AM
ftynse added inline comments.
mlir/lib/IR/MLIRContext.cpp
462

Nit: I'd expand this auto.

465

I'd rather do == nullptr to better match with the comment.

This revision is now accepted and ready to land.Oct 27 2022, 1:44 AM
springerm marked 3 inline comments as done.Oct 27 2022, 2:24 AM
springerm updated this revision to Diff 471084.Oct 27 2022, 2:25 AM

address comments

This revision was landed with ongoing or failed builds.Oct 27 2022, 2:50 AM
This revision was automatically updated to reflect the committed changes.
rriddle added inline comments.Oct 27 2022, 10:22 AM
mlir/include/mlir/IR/MLIRContext.h
101–102

This doesn't look like something that should be publicly exposed.

mlir/lib/IR/MLIRContext.cpp
436–438

Can you avoid the double lookup here? It'd be nicer to just assign std::unique_ptr<Dialect> &dialect = impl.loadedDialects[dialectNamespace]; first, and then call the constructor. The lookup should already default initialize it to null.