This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Example/test code for Python binding of MLIR enum attributes
Changes PlannedPublic

Authored by jfurtek on Jun 21 2022, 1:13 PM.

Details

Summary

This diff contains test code that exercises the use of Enum attributes
(integer and bit) from Python code.

The implementation in this diff manually specifies the enum values. An approach
that leverages mlir-tblgen is possible, and is a logical next step. This test
code in this diff is intended to be a proof of concept, and to act as a template
for other dialects, as well as an eventual mlir-tblgen implementation.

Python binding for Enum attributes is necessary to implement fastmath
attributes in the Arithmetic dialect. (Other dialects generate arith
operations from Python, and recent changes to Python bindings require non-None
attribute values when creating these operations.)

Diff Detail

Event Timeline

jfurtek created this revision.Jun 21 2022, 1:13 PM
jfurtek requested review of this revision.Jun 21 2022, 1:14 PM

Adding @mehdi_amini to suggest reviewers, since none were automatically triggered.

Thanks for this. Just one comment of the variety of "have you considered doing it entirely differently) :) I'm not convinced either way -- open to discuss

mlir/test/python/lib/PythonTestCAPI.cpp
25

Just so I'm clear, you are wanting to auto generate this as a next step, right? (Which would be the first but if TableGen generated Capi we have)?

That would alleviate some of my concerns but is still a really heavy api and ask of the duplication comes with a real cost.

It occurs to me that as a class of things, we could supports common API for all enums based on some kind of opaque EnumHandle, just having one enums specific entrypoint to get the handle for a specific enums. If the common code also exposed an atoi and itoa like function, we could make the entire python implementation generic, based on reflection and exposed via an mlir.ir._DefEnum(handle_capsule).

Not that we care a ton, I guess, but pybind code really doesn't scale with either compile time or binary size to O(variants), and if reach attribute/enums just had one handle, we could reduce it to one c function and have a generated implementation that was pure c (as it would be trivial).

mlir/test/python/lib/PythonTestModule.cpp
14

This is pretty weird (directly including the .inc file here). I know why you're doing it...

(also sorry for some auto correct typos - on mobile)

jfurtek added inline comments.Jun 22 2022, 9:07 AM
mlir/test/python/lib/PythonTestCAPI.cpp
25

you are wanting to auto generate this as a next step, right?

It seems like it could be automated. If "this is the way" (to use enums from Python), and if enough dialects care about such a thing, then it probably should be automated. But I didn't want to presume this would be the right strategy and go straight to yet-another-tblgen output.

have you considered doing it entirely differently

Yes, but probably from a position of relative ignorance. ;-)

My thought process was:

  • Python code should have identifiers available for enum values (for readability)
  • The C interface used by the Python binding requires passing "plain" integer values (because the tblgen-erated code uses C++ enum classes)
  • Attribute creation from Python should fail if an invalid integer is used. (tblgen-erated C++ code already exists to check the integer value for validity, and there are other diffs "in flight" to automatically attach this to the MyEnumAttribute::getChecked() implementation.)

I thought a little about generating the enum definition in Python as opposed to C++/pybind. (Not a major simplification as you are proposing, but I was just trying to get something to work as a first step.) I abandoned this because it seemed a little clumsier to generate Python code with hard coded values (because the C++ enum identifiers are not visible).

we could reduce it to one c function and have a generated implementation that was pure c (as it would
be trivial)

I confess to not having totally grokked the inner workings of the MLIR python bindings. Would this require some sort of "registration" to allow enums/attributes from different dialects to be handled? Or would the capsule provide a pointer to the attribute-specific generator?

I'm (also) not a big fan of the verbosity of the approach in this diff. If you can see a path to something that is clearly better, and if you can spare a few minutes to describe it a little more, I can try to implement.

mlir/test/python/lib/PythonTestCAPI.cpp
25

you are wanting to auto generate this as a next step, right?

It seems like it could be automated. If "this is the way" (to use enums from Python), and if enough dialects care about such a thing, then it probably should be automated. But I didn't want to presume this would be the right strategy and go straight to yet-another-tblgen output.

I think I'm ok with this for this first case. In my experience, though, there is a big hurdle to "tablegen-enabling" a new part of the tree, and until someone does it, we tend to accumulate sub-par workarounds. In this case, neither the C API nor the Python API has generated code like this. I think it is fine to land one such exemplar case, but then we need to do it right...

have you considered doing it entirely differently

Yes, but probably from a position of relative ignorance. ;-)

My thought process was:

  • Python code should have identifiers available for enum values (for readability)
  • The C interface used by the Python binding requires passing "plain" integer values (because the tblgen-erated code uses C++ enum classes)
  • Attribute creation from Python should fail if an invalid integer is used. (tblgen-erated C++ code already exists to check the integer value for validity, and there are other diffs "in flight" to automatically attach this to the MyEnumAttribute::getChecked() implementation.)

I thought a little about generating the enum definition in Python as opposed to C++/pybind. (Not a major simplification as you are proposing, but I was just trying to get something to work as a first step.) I abandoned this because it seemed a little clumsier to generate Python code with hard coded values (because the C++ enum identifiers are not visible).

we could reduce it to one c function and have a generated implementation that was pure c (as it would
be trivial)

I confess to not having totally grokked the inner workings of the MLIR python bindings. Would this require some sort of "registration" to allow enums/attributes from different dialects to be handled? Or would the capsule provide a pointer to the attribute-specific generator?

I'm (also) not a big fan of the verbosity of the approach in this diff. If you can see a path to something that is clearly better, and if you can spare a few minutes to describe it a little more, I can try to implement.

I do see a hazy path in my mind. And I do think that my concern is that there are a lot of such cases spread up and down the users, and I generally wish we were not exposing so much API surface for a string/int mapping. Perhaps a half step to a better place would be to collapse the C API down to a single entry-point per enum attr:

struct MlirEnumAttributeHandle {
  bool (*isa)(MlirAttribute attr);
  MlirAttribute (*get_from_value)(int32_t value);
  ... Each of the others ...
};

const MlirEnumAttributeHandle *mlirPythonTestTestEnumAttributeGetHandle();

I am pretty sure we can extend something like that (which now gives us a pointer to a thing that lets us operate on the enum) into generic code on the Python side with a lot less coupling. In general, I'm looking to extend the DialectHandle support to be more reflectiony so that we can simplify the integration, and this may layer into that in the future.

Cons for doing it this way is that it deviates from how other attributes are defined, but I would argue that if we were defining another "class" of attributes to expose an API for like this in the future, we could follow this approach (we just haven't encountered that yet).

jfurtek added inline comments.Jun 24 2022, 7:46 AM
mlir/test/python/lib/PythonTestCAPI.cpp
25

Thanks for the review and for the explanation.

It seems like an approach that collapses enum handling to a single entry point would still benefit from tblgen code generation, and as you pointed out:

neither the C API nor the Python API has generated code like this.

Do you think that tblgen-ing code for the C API and/or Python is a reasonable path to go down?

mlir/test/python/lib/PythonTestCAPI.cpp
25

For the C API, yes, I think we need to do that: we already have some suboptimal things there because we don't before this patch.

For the python pybind API, I'm less convinced but open to it if we have to go there. My primary concern is that the native code is not really set up to scale to per op/type/attribute. Ideally, I would rather expose some more reflection capabilities at the CAPI level so that we could program more of this generically in python. That's where I'm going with the existing DialectHandle bits and if we at least had a C object for the other types of interest (vs a bunch of loose functions), it would go in that direction. That's why I suggested this as a halfway point.

And also, I'm roughly ok with this patch and open coding the arith enum. I'd just like to have a better plan before it spiders.

jfurtek planned changes to this revision.Jun 24 2022, 9:14 PM