- It was decided that this was the end of the line for the existing custom TC parser/generator, and this is the first step toward replacing it with a declarative format that maps well to mathy source languages.
- One such source language is implemented here: https://github.com/stellaraccident/mlir-linalgpy/blob/main/samples/mm.py
- In fact, this is the exact source of the declarative polymorphic_matmul in this change.
- I am working separately to clean this Python implementation up and add it to MLIR (probably as `mlir.tools.linalg_opgen` or equivalent). The scope of the Python side is greater than just generating named ops: the ops are callable and directly emit `linalg.generic` ops fully dynamically, and this is intended to be a feature for frontends like npcomp to define custom linear-algebra ops at runtime.
- There is more work required to handle full type polymorphism, especially for integer formulations, since they require more specificity with respect to types.
- Follow-ups to this change will bring the new generator to feature parity with the current one and then delete it. Roughly, this involves adding support for interface declarations and attribute symbol bindings.
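To make the "mathy DSL" idea above concrete, here is a minimal, self-contained sketch of how a definition can be captured symbolically so it can later be serialized (e.g. to YAML) or emitted dynamically. All names here (`TensorDef`, `Dim`, `Access`) are illustrative inventions for this sketch, not the actual mlir-linalgpy API:

```python
# Hypothetical sketch of a "mathy" op DSL; not the real mlir-linalgpy API.
# Indexing expressions like A[m, k] * B[k, n] are captured as data rather
# than evaluated, so a single definition can drive code generation.

class Dim:
    def __init__(self, name):
        self.name = name

class Access:
    """A symbolic tensor access like A[m, k]."""
    def __init__(self, tensor, dims):
        self.tensor, self.dims = tensor, dims
    def __mul__(self, other):
        # Capture the multiplication symbolically instead of computing it.
        return ("mul", self, other)
    def __repr__(self):
        return f"{self.tensor.name}({', '.join(d.name for d in self.dims)})"

class TensorDef:
    def __init__(self, name, *shape, output=False):
        self.name, self.shape, self.output = name, shape, output
    def __getitem__(self, dims):
        return Access(self, dims if isinstance(dims, tuple) else (dims,))

def matmul_description():
    """Builds a declarative description of a matmul-like op."""
    m, n, k = Dim("m"), Dim("n"), Dim("k")
    A, B = TensorDef("A", m, k), TensorDef("B", k, n)
    C = TensorDef("C", m, n, output=True)
    rhs = A[m, k] * B[k, n]  # captured as ("mul", A(m, k), B(k, n))
    return {
        "name": "polymorphic_matmul",
        "args": [t.name for t in (A, B, C)],
        "assignment": f"C(m, n) += {rhs[1]!r} * {rhs[2]!r}",
    }

print(matmul_description()["assignment"])
# prints: C(m, n) += A(m, k) * B(k, n)
```

The point of the sketch is the separation of concerns discussed below: the "frontend" (the definition) produces a plain data structure, and any number of backends (static gencode, dynamic emission) can consume it.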
Diff Detail
- Repository
- rG LLVM Github Monorepo
Is there an RFC or a previous thread about all this?
(This seems like a significant change)
There were various discussions on Discord among the recent contributors to this area. I'll leave it for Nicolas and Alex to comment before writing up a statement of intent. The feedback from those working on the current parser was uniformly negative (including from the author and recent contributors), indicating that the commingling of custom language, parser, and generation was unmaintainable.
Primarily, this change separates concerns and makes named op generation possible to reason about as a standalone activity. When I looked at the finished result, it seemed pretty incremental. I have a lot of places I want to take this eventually but am working on taking it step by step to improve the present state, which is quite unfit for purpose.
> The feedback from those working on the current parser was uniformly negative (including from the author and recent contributors), indicating that the commingling of custom language, parser, and generation was unmaintainable.
Yeah, that's not surprising though considering the genesis of the tool :)
This was developed quickly and ad-hoc to unblock progress on named ops at the time, and the TC inspiration was fairly natural considering Nicolas' experience.
The main purpose was to get the IR-level abstractions general and generic enough that zero lines of C++ or TableGen needed to be written by end users for things that fit the bill.
This has served its purpose and is ready to be retired for a better alternative that we will want to maintain.
Thanks for taking this up @stellaraccident!
I like the simplicity of the YAML level of indirection.
Re-reading the patch (and, more importantly, the description) a bit more carefully, I think I misjudged it last night: I thought it was the first step of something much bigger than it actually is. We're just missing documentation; otherwise I don't have any concern here.
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
- Line 166: Nit: remove trivial braces, and "no else after return" (same below)

mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp
- Lines 232–233: `else if (...` ?
To be clear, I did a fair amount of literature review and talking to stakeholders (both in this community and more domain-aligned) as part of reworking this, and I also looked at the prototypes that preceded the current parser. My conclusion was that the viewpoint of where this should go was quite sound and in line with how these things are done (there are other points in the design space, surely, but those can be explored independently). Regarding how it was done, I would just put it in the "pragmatic starting point" category and leave it at that. Some of the earlier prototypes were more principled from a usability standpoint but didn't layer well with respect to what MLIR was at the time.
As for mechanics, I don't think MLIR is in a good spot to embed a high-level, user-centric linalg-op description language directly into the static gencode pipeline. An attempt to do so (either with TableGen or an ad-hoc language) has pretty limited utility with the current toolset and is kind of the worst of all worlds: it doesn't quite match the expectations of the domain, and it traps the definitions deep in the bowels of the static parts of the compiler, where they can only be used for that one thing (violating the single source of truth when frontends also need to intersect with these definitions).
My observation was that separating concerns would let us evolve both aspects more independently. I isolated the latent data structures from the current parser, generalized them slightly to account for where we think this will go, and then wrote a brain-dead-simple gencode tool on top of them (i.e. operating closer to the "AST level" of the parser, had that been materialized separately). Separately, I built a Python DSL around a high-level mathy form, in line with what people in the domain expect, and made it able to generate the op-description data structures (it can actually generate a wider variety of op definitions and should give us a good path to further generalization, given more expressivity). The use of YAML to glue it together was a happy accident. I didn't know how that was going to go, but I was pretty happy with the result:
- LLVM's YAML support library is one of the best I've seen and required writing almost no code to populate the data structures.
- The result was both visually appealing (i.e. I can glance directly at it or diff it as a human and know what the op is doing/what has changed) and FileCheck friendly.
- It just so happens that other gencode solutions in the domain have settled on small YAML-based DSL descriptions for similar reasons, so there is some familiarity to the approach.
- Once I went through a few cycles with other ops, I found that the declarative nature really helped development of both the frontend and backend. I think we can grow a dev workflow here that people can understand.
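For illustration, a declarative op description along these lines might look roughly as follows. This is a hypothetical sketch only: the field names and structure here are assumptions for readability and do not reflect the actual schema consumed by mlir-linalg-ods-yaml-gen.cpp.

```yaml
# Hypothetical sketch; field names are illustrative, not the real schema.
--- !LinalgOpConfig
metadata:
  name: polymorphic_matmul
structured_op:
  args:
    - { name: A, usage: input,  shape: [S.M, S.K], element_type_var: T1 }
    - { name: B, usage: input,  shape: [S.K, S.N], element_type_var: T2 }
    - { name: C, usage: output, shape: [S.M, S.N], element_type_var: U }
  iterator_types: [parallel, parallel, reduction]
  assignments:
    - C(D.m, D.n) += cast(U, A(D.m, D.k)) * cast(U, B(D.k, D.n))
```

A format in this spirit is what makes the description both human-diffable and FileCheck-friendly, as noted above.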
Depending on how MLIR evolves, maybe we eliminate some of the indirection, but that time is definitely not now, given the state of the tools. With things separated, approaching that later would be incremental, whereas if we keep it as-is, it will just accumulate more ingrown logic that will require a rewrite down the road to simplify.
clang-tidy: `warning: invalid case style for function 'applyfn__add' [readability-identifier-naming]`

Not useful.