This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tblgen][ods][python] Use keyword-only arguments for optional builder arguments in generated Python bindings
ClosedPublic

Authored by jfurtek on Apr 30 2022, 12:36 PM.

Details

Summary

This diff modifies mlir-tblgen to generate Python Operation class __init__()
functions that use Python keyword-only arguments.

Previously, all __init__() function arguments were positional. Python code to
create MLIR Operations was required to provide values for ALL builder arguments,
including optional arguments (attributes and operands). Callers that did not
provide, for example, an optional attribute would be forced to provide None
as an argument for EACH optional attribute. Proposed changes in this diff use
tblgen record information (as provided by ODS) to generate keyword arguments
for:

  • optional operands
  • optional attributes (which includes unit attributes)
  • default-valued attributes

These __init__() function keyword arguments have default None values (i.e.
the argument form is optionalAttr=None), allowing callers to create Operations
more easily.

Note that since optional arguments become keyword-only arguments (since they are
placed after the bare * argument), this diff will require ALL optional
operands and attributes to be provided using explicit keyword syntax. This may,
in the short term, break any out-of-tree Python code that provided values via
positional arguments. However, in the long term, it seems that requiring
keywords for optional arguments will be more robust to operation changes that
add arguments.

Tests were modified to reflect the updated Operation builder calling convention.

This diff partially addresses the requests made in the github issue below.

https://github.com/llvm/llvm-project/issues/54932

Diff Detail

Event Timeline

jfurtek created this revision.Apr 30 2022, 12:36 PM
Herald added a project: Restricted Project. · View Herald Transcript
jfurtek requested review of this revision.Apr 30 2022, 12:36 PM

(I'm OOO, but randomly checking so my latency will be high)

I was thinking about this recently for a change I was playing around with (moving result types so that adding build time type inference doesn't change signature). For these I was wondering what if we follow what we do for C++ and we only add defaults for the "trailing" defaults. Both C++ and Python have the same restriction wrt default valued arguments and this then keeps the argument order as specified in ODS, used in C++, docs generated for dialect and DRR. Then the dialect author has the choice (and semi encouraged to do so, as today, except C++ allows overloading which lessens encouragement) to have default valued attributes towards end (we could even add log a message). Point of discussion, esp as I don't know Python conventions, but keeping these consistent is appealing to me.

I also realized that we'd hit another issue (unrelated to this change but to default attributes and Python): the default attributes are currently lazily constructed in the C++ API while Python side this would not trigger. So querying the attribute would not return the default value if unset.

(I'm OOO, but randomly checking so my latency will be high)

I was thinking about this recently for a change I was playing around with (moving result types so that adding build time type inference doesn't change signature). For these I was wondering what if we follow what we do for C++ and we only add defaults for the "trailing" defaults. Both C++ and Python have the same restriction wrt default valued arguments and this then keeps the argument order as specified in ODS, used in C++, docs generated for dialect and DRR. Then the dialect author has the choice (and semi encouraged to do so, as today, except C++ allows overloading which lessens encouragement) to have default valued attributes towards end (we could even add log a message). Point of discussion, esp as I don't know Python conventions, but keeping these consistent is appealing to me.

I also realized that we'd hit another issue (unrelated to this change but to default attributes and Python): the default attributes are currently lazily constructed in the C++ API while Python side this would not trigger. So querying the attribute would not return the default value if unset.

Jacques: what is your preference here on proceeding? If we're going to be breaking things, I have a minor preference to do so all at once if not too bad (vs a sequence of incremental breaks to the same API)

I agree on breaking at once, too many breaking changes one after the other is painful (looking at JAX team needing to bump API number for small changes makes me feel bad). Perhaps we could roll https://gist.github.com/jpienaar/9d22b710db6122a9a41e8cd3d824a39e or something like it in here too (also breaking change but orthogonal to this one, but in similar spirit). Ripping off bandaid should result in one larger change but enable less breaking changes post. I do have preference for consistency with ODS and C++ here (it may also result in less breakages but that's not the reason why I prefer it). It puts convenience and usability in the hand of dialect authors. Now the signature can change if one changes the op def, but merely appending optional/default value attribute would not. But I might not be a "pythonic" person :-)

I can do follow up on removing lazy initialization once back as i think it doesn't pay for itself and makes things behave unexpected ways.

Apologies for the delay... I said I'd take a look at this from CIRCT's perspective a week ago. Going this route in general sounds good to me, and the current implementation seems to make sense. I'm happy to update CIRCT as needed.

stellaraccident accepted this revision.May 15 2022, 7:42 PM
This revision is now accepted and ready to land.May 15 2022, 7:42 PM
mikeurbach accepted this revision.May 16 2022, 4:19 PM

I took this for a spin in CIRCT, and we have already used the extension mechanism to write our own constructors in a lot of places this could break APIs. There was one CIRCT helper, used in two places, that required minor changes at the call sites. I just wrote up a patch since it was so simple: https://github.com/llvm/circt/pull/3134.

So, +1 from me, the kinds of breakage this causes are generally pretty trivial, and at least for CIRCT, already fixed. It might be more annoying for other downstreams that rely on the generated constructors more directly, but as far as MLIR breakages go, they are not a big deal IMHO, especially since you've already used Discourse to raise visibility about this change potentially coming. But thanks for waiting for our feedback on the CIRCT side!

jfurtek updated this revision to Diff 430682.May 19 2022, 7:57 AM
  • Fix clang-format errors
jfurtek updated this revision to Diff 430712.May 19 2022, 9:05 AM
  • Update _ml_program_ops_ext.py to use FuncOp keyword arg

@stellaraccident - I don't have commit privileges. Can you (or someone else) land this diff at your convenience? Thanks!

Yes, I will land it either tomorrow or this weekend. Thank you for the patch!