diff --git a/mlir/docs/Bindings/Python.md b/mlir/docs/Bindings/Python.md new file mode 100644 --- /dev/null +++ b/mlir/docs/Bindings/Python.md @@ -0,0 +1,310 @@ +# MLIR Python Bindings + +Current status: Under development and not enabled by default + + +## Building + +### Pre-requisites + +* [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to + be located by CMake. +* A relatively recent Python3 installation + +### CMake variables + +* **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL` + + Enables building the Python bindings. Defaults to `OFF`. + +* **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL` + + Links the native extension against the Python runtime library, which is + optional on some platforms. While setting this to `OFF` can yield some greater + deployment flexibility, linking in this way allows the linker to report + compile time errors for unresolved symbols on all platforms, which makes for a + smoother development workflow. Defaults to `ON`. + +* **`PYTHON_EXECUTABLE`**:`STRING` + + Specifies the `python` executable used for the LLVM build, including for + determining header/link flags for the Python bindings. On systems with + multiple Python implementations, setting this explicitly to the preferred + `python3` executable is strongly recommended. + + +## Design + +### Use cases + +There are likely two primary use cases for the MLIR python bindings: + +1. Support users who expect that an installed version of LLVM/MLIR will yield + the ability to `import mlir` and use the API in a pure way out of the box. + +2. Downstream integrations will likely want to include parts of the API in their + private namespace or specially built libraries, probably mixing it with other + python native bits. + + +### Composable modules + +In order to support use case #2, the Python bindings are organized into +composable modules that downstream integrators can include and re-export into +their own namespace if desired. 
This forces several design points: + +* Separate the construction/populating of a `py::module` from the `PYBIND11_MODULE` + global constructor. + +* Introduce headers for C++-only wrapper classes as other related C++ modules + will need to interop with them. + +* Separate any infectious global initialization into its own module/dependency + that can be optionally linked (currently `registerAllDialects` falls into this + category). + +There are many interrelated issues of shared library linkage, distribution +concerns, etc. that affect such things. Organizing the code into composable +modules (versus a monolithic `cpp` file) allows the flexibility to address many +of these as needed over time. Also, compilation time for all of the template +meta-programming in pybind scales with the number of things you define in a +translation unit. Splitting into multiple translation units can significantly aid +compile times for APIs with a large surface area. + +### Submodules + +Generally, the C++ codebase namespaces most things into the `mlir` namespace. +However, in order to modularize and make the Python bindings easier to +understand, sub-packages are defined that map roughly to the directory structure +of functional units in MLIR. + +Examples: + +* `mlir.ir` +* `mlir.passes` (`pass` is a reserved word :( ) +* `mlir.dialect` +* `mlir.execution_engine` (aside from namespacing, it is important that + "bulky"/optional parts like this are isolated) + +In addition, native global initialization should be in underscored (notionally +private) modules such as `_init` and linked separately. This allows downstream +integrators to completely customize what is included "in the box" and covers +things like dialect registration, pass registration, etc. + +### Loader + +LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with +other non-trivial native extensions. As such, the native extension (i.e.
the +`.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol +(`_mlir`), while a small set of Python code is provided in `mlir/__init__.py` +and siblings which loads and re-exports it. This split provides a place to stage +code that needs to prepare the environment *before* the shared library is loaded +into the Python runtime, and also provides a place where one-time initialization +code can be invoked apart from module constructors. + +To start with, the `mlir/__init__.py` loader shim can be very simple and grow with +future needs: + +```python +from _mlir import * +``` + +### Limited use of globals + +For normal operations, parent-child constructor relationships are realized with +constructor methods on a parent class as opposed to requiring +invocation/creation from a global symbol. + +For example, consider two code fragments: + +```python + +op = build_my_op() + +region = mlir.Region(op) + +``` + +vs + +```python + +op = build_my_op() + +region = op.new_region() + +``` + +For tightly coupled data structures like `Operation`, the latter is generally +preferred because: + +* It is syntactically harder to create something that is going to access + illegal memory (less error handling in the bindings, less testing, etc.). + +* It reduces the global-API surface area for creating related entities. This + makes it more likely that if constructing IR based on an Operation instance of + unknown provenance, receiving code can just call methods on it to do what it + wants versus needing to reach back into the global namespace and find the right + `Region` class. + +* It leaks fewer things that are in place for C++ convenience (i.e. default + constructors to invalid instances). + +### Use the C-API + +The Python APIs should seek to layer on top of the C-API to the degree possible. +Especially for the core, dialect-independent parts, such a binding enables +packaging decisions that would be difficult or impossible if spanning a C++ ABI +boundary.
In addition, factoring in this way side-steps some very difficult +issues that arise when combining RTTI-based modules (which pybind derived things +are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM). + + +## Style + +In general, for the core parts of MLIR, the Python bindings should be largely +isomorphic with the underlying C++ structures. However, concessions are made +either for practicality or to give the resulting library an appropriately +"Pythonic" flavor. + +### Properties vs get*() methods + +Generally favor converting trivial methods like `getContext()`, `getName()`, +`isEntryBlock()`, etc to read-only Python properties (i.e. `context`). It is +primarily a matter of calling `def_property_readonly` vs `def` in binding code, +and makes things feel much nicer to the Python side. + +### __repr__ methods + +Things that have nice printed representations are really great :) If there is a +reasonable printed form, it can be a significant productivity boost to wire that +to the `__repr__` method (and verify it with a doctest). + +### CamelCase vs snake_case + +Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As +a mechanical concession to Python style, this can go a long way to making the +API feel like it fits in with its peers in the Python landscape. + +### Prefer pseudo-containers + +Many core IR constructs provide methods directly on the instance to query count +and begin/end iterators. Prefer hoisting these to dedicated pseudo containers. + +For example, a direct mapping of blocks within regions could be done this way: + +```python +region = ... + +for block in region: + + pass +``` + +However, this way is preferred: + +```python +region = ... 
+ +for block in region.blocks: + + pass + +print(len(region.blocks)) +print(region.blocks[0]) +print(region.blocks[-1]) +``` + +Instead of leaking STL-derived identifiers (`front`, `back`, etc.), translate +them to appropriate `__dunder__` methods and iterator wrappers in the bindings. + +Note that this can be taken too far, so use good judgment. For example, block +arguments may appear container-like but have defined methods for lookup and +mutation that would be hard to model properly without making semantics +complicated. If you run into such cases, just mirror the C/C++ API. + +### Provide one-stop helpers for common things + +One-stop helpers that aggregate over multiple low-level entities can be +incredibly helpful and are encouraged within reason. For example, making +`Context` have a `parse_asm` or equivalent that avoids needing to explicitly +construct a `SourceMgr` can be quite nice. One-stop helpers do not have to be +mutually exclusive with a more complete mapping of the backing constructs. + +## Testing + +Tests should be added in the `test/Bindings/Python` directory and should +typically be `.py` files that have a lit `RUN` line. + +While lit can run any Python module, prefer to lay tests out according to these +rules: + +* For tests of the API surface area, prefer + [`doctest`](https://docs.python.org/3/library/doctest.html). +* For generative tests (those that produce IR), define a Python module that + constructs/prints the IR and pipe it through `FileCheck`. +* Parsing should be kept self-contained within the module under test by use of + raw constants and an appropriate `parse_asm` call. +* Any file I/O code should be staged through a tempfile vs relying on file + artifacts/paths outside of the test module. + +### Sample Doctest + +```python +# RUN: %PYTHON %s + +""" + >>> m = load_test_module() + +Test basics: + >>> m.operation.name + 'module' + >>> m.operation.is_registered + True + >>> ... etc ...
+ +Verify that repr prints: + >>> m.operation + +""" + +import mlir + +TEST_MLIR_ASM = r""" +func @test_operation_correct_regions() { + // ... +} +""" + +def load_test_module(): + ctx = mlir.ir.Context() + ctx.allow_unregistered_dialects = True + module = ctx.parse_asm(TEST_MLIR_ASM) + return module + + +if __name__ == "__main__": + import doctest + doctest.testmod() +``` + +### Sample FileCheck test + +```python +# RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s + +import mlir + +def print_module(f): + # Runs f at decoration time and prints the IR it returns. + m = f() + print("// -----") + print("// TEST_FUNCTION:", f.__name__) + print(m.to_asm()) + return f + +# CHECK-LABEL: TEST_FUNCTION: create_my_op +@print_module +def create_my_op(): + m = mlir.ir.Module() + builder = m.new_op_builder() + # CHECK: mydialect.my_operation ... + builder.my_op() + return m +```
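The `print_module` decorator in the FileCheck sample calls each decorated function once, at decoration time, and prints the result after a `// -----` separator, so running the file emits every test case in definition order. A minimal sketch of that mechanism that runs without an MLIR build is given here; `FakeModule` and its placeholder text are hypothetical stand-ins for illustration only and are not part of the bindings:

```python
class FakeModule:
    """Hypothetical stand-in for an MLIR module; not part of the bindings."""

    def __init__(self, asm):
        self._asm = asm

    def to_asm(self):
        # Mirrors the to_asm() call used by the sample test above.
        return self._asm


def print_module(f):
    # Call f immediately (at decoration time) and print what it built.
    m = f()
    print("// -----")
    print("// TEST_FUNCTION:", f.__name__)
    print(m.to_asm())
    # Return f unchanged so the name still refers to the original function.
    return f


@print_module
def create_empty_module():
    # Placeholder text; a real test would construct IR through the bindings.
    return FakeModule("module {\n}")
```

Because the decorator returns `f` unchanged, the test functions remain callable afterwards, which makes them easy to reuse from other tests if that turns out to be useful.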