This is an archive of the discontinued LLVM Phabricator instance.

Preliminary python API for constructing operations.
AbandonedPublic

Authored by stellaraccident on Sep 6 2020, 3:53 PM.

Details

Reviewers
ftynse
zhanghb97
Summary
  • Primarily, this is just the scaffolding that allows a minimal operation to be created in the hierarchy.
  • A follow-up needs to focus on more pythonic/complete mechanisms for traversing the operation hierarchy
  • Threads through enough to test that the ownership model is sound and does not leak (verified with asan).

Diff Detail

Event Timeline

stellaraccident created this revision.Sep 6 2020, 3:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2020, 3:53 PM
stellaraccident requested review of this revision.Sep 6 2020, 3:53 PM
mehdi_amini added inline comments.Sep 6 2020, 9:54 PM
mlir/lib/Bindings/Python/IRModules.cpp
1015

We could reset the PyRegion reference to the new region held by the operation to avoid the dangling pointer I mention below.

It seems that to be really robust, we need some mechanism to keep a pointer back to the Py* objects almost everywhere, and we need to manage updates/invalidation as we will mutate the IR. It seems that any IR mutation outside Python method runs the risk of dangling pointers and later possible crashes when accessing these Python objects.

mlir/lib/Bindings/Python/IRModules.h
137

The extra context reference seems redundant with the MlirOperation: we should have an accessor to get the context from it.

(Same for PyModule)

mlir/test/Bindings/Python/ir_operation.py
103

I suspect that the region1 and region2 are dangling after the create_operation call and trying to use them would crash.

Example changes to show how comments can be addressed.

stellaraccident marked an inline comment as not done.Sep 6 2020, 10:49 PM
stellaraccident added inline comments.
mlir/lib/Bindings/Python/IRModules.cpp
1015

I made a hacky change here and to the corresponding test showing that this does indeed fix the dangling pointer. This needs a clean up, but I figure we're going to be discussing this for a bit and I wanted to show how it can work before a more invasive fix.

Regarding the "really robust" part, I agree but have been having trouble finding a design that gets us there and doesn't overly sacrifice legibility or accounting tax. I feel like how we model these mutable parts of the IR are pretty core and now that we are here, probably worth a more involved discussion.

I think we can likely get this factored so that Python-only inspection and mutation does not have any design-imposed (unfixable) lifetime or memory safety issues.

I've been toying with some mechanism for carrying around a context-owned lock on every mutable object that defines the scope in which the object is still valid (attempts to do anything to an invalid object would raise an error). This could be as simple as an incrementing integer value for each lock, with the possibility of auto-acquiring a lock if just open-coding against the API (i.e. only doing python mutations). If wanting to do anything more scoped, you would need to enter a context lock with something like:

module = ctx.parse_module(...)
with ctx:
  op = module.operation.first_region.first_block.first_operation
  # Do some queries and mutations

# Outside of the lock scope, any python references to parts of the Operation hierarchy would error on use.

Global APIs that perform unrestricted mutations (i.e. PassManager, etc) would take a new lock, invalidating any existing one.

If we had the lock facility that we had to manage anyway, we might want to also rework our cascading keep_alives to somehow instead tie created objects to the lock scope, in lieu of the current mess we have of keep_alives on non-context temporaries.

Shall we switch to discourse? I can try to come up with some options and then we can decide. Right now, the way the API is laid out, it is not possible to provide stronger lifetime guarantees, which isn't great, and as you've noted in other reviews, the keep_alive situation is imprecise and wasteful.

mlir/test/Bindings/Python/ir_operation.py
103

I didn't believe you so I tried and you are correct. I'm missing something in my mental model of how ownership is transferred into the operation. I had thought that it was just about who owned the pointer, but it is also about the pointer itself. It's a vague question, but can you explain why this was obvious to you?

The transfer semantics appear to be different for region and block: for the latter, the pointer remains valid. Not sure how much of this is implementation details that should be coded defensively.

ftynse added inline comments.Sep 7 2020, 7:30 AM
mlir/lib/Bindings/Python/IRModules.cpp
953–954

Why do we have a vector of PyType but a vector of pointers to everything else?

956

I'd consider a py::dict, potentially as an overload if we see lots of overhead in creating Identifiers on the fly.

1015

I like the context approach. +1 for moving to discourse.

mlir/test/Bindings/Python/ir_operation.py
79

This starts to diverge between ctx.create_operation / ctx.get_unknown_location and mlir.ir.StringAttr.get(ctx,..., and mlir.ir.F32Type(ctx). I'd prefer one consistent way of creating things.

Independently, there's no need to expose create_operation on a context object because loc can provide the context.

103

In C++, when you create an operation with regions, you std::move those regions into the constructor so they are unusable after that.

Under the hood, regions held by an operation are allocated together with the operation itself, and then the _contents_ of the region is moved. I suppose we can update the MlirRegion object to point to those new regions, and then it should become safe for Python to use after calling the op.

zhanghb97 added inline comments.Sep 7 2020, 8:55 AM
mlir/lib/Bindings/Python/IRModules.cpp
952

Should the PyLocation follow the rule in the D87091 - "making the location optional and internally creating an UnknownLocation if needed"?

mehdi_amini added inline comments.Sep 7 2020, 5:44 PM
mlir/test/Bindings/Python/ir_operation.py
103

It's a vague question, but can you explain why this was obvious to you?

I am just very familiar with this piece of infrastructure, and in particular I never really liked the way the OperationState is structured and exposed.

stellaraccident abandoned this revision.Sep 18 2020, 6:48 PM