- Mostly imported from experimental repo as-is with cosmetic changes.
- Temporarily left out emission code (for building ops at runtime) to keep review size down.
- Documentation and lit tests added fresh.
- Sample op library representing the current Linalg named ops is included.
I can see how having a TC-like DSL to write nice IR and emit Linalg generic operations is gonna be very nice for our end-to-end story, but it isn't clear to me why the YAML is useful / desirable as part of the publicly exported MLIR python bindings?
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
678 ↗ | (On Diff #328345) | I believe there is an important layering here, with a strong separation between this tool and Python niceties on one side, and the "core" of MLIR, including the Linalg dialect, on the other. |
mlir/lib/Bindings/Python/CMakeLists.txt | ||
28 ↗ | (On Diff #328345) | Can you commit these changes ahead of this patch? They seem NFC and unrelated (right?) |
I suspect there are pieces of the design that aren't clear to me at the moment; in particular, the setup looks monolithic to me (some of my inline comments may hint at this as well). From what I gather in terms of "independent components", the coupling between the tool and the bindings / native code isn't obvious to me. It seems like the DSL can be a "pure" Python library, the tool is mostly about using the DSL and emitting YAML, and the bindings should be mostly about bridging the DSL to emit IR.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) | s/build/built/ ? Why do we need to build MLIR to be able to use this language and the tool? |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/affine.py | ||
30 | The doc here makes it look like a nice general frontend for AffineExpr; wouldn't it be suitable in mlir.ir? Or as mlir.dialects.affine? | |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/comprehension.py | ||
5 | "a tensor-comprehension" is something that deserve to be explained / documented, I am not sure it is a widely known concept, or at least not obvious enough in the context of MLIR to be used as-is I think. | |
188 | Thanks for providing the types as comments here, by the way! :) |
A couple of points. First, the way this is intertwined in the Python code right now is not great, creating a hard dependency on YAML for using any of it. In a first patch I had started to factor it into an optional serializer and make the core just export to lists/dicts and such, but backed off. That is mechanics, and I'll separate it some more in a future patch so that YAML becomes an optional export path from these definitions.
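For what it's worth, here is a minimal sketch of the layering I mean, assuming a hypothetical `OpDefinition`/`dump_yaml` pair (none of these names are the API in this patch): the core definitions only export themselves to plain lists/dicts, and PyYAML is confined to an optional, lazily imported serializer.

```python
# A minimal sketch (not the API in this patch) of the layering described
# above: core DSL objects export plain dicts/lists; YAML is an optional
# serializer on top of that.
from typing import Any, Dict, List, Sequence


class OpDefinition:
  """Hypothetical core DSL object; knows nothing about YAML."""

  def __init__(self, name: str, args: Sequence[str]):
    self.name = name
    self.args = list(args)

  def to_dict(self) -> Dict[str, Any]:
    # Plain Python containers that any serializer can consume.
    return {"name": self.name, "args": self.args}


def dump_yaml(ops: List[OpDefinition]) -> str:
  """Optional export path: PyYAML is only required if this is called."""
  import yaml  # Lazy import keeps the core free of a hard YAML dependency.
  return yaml.safe_dump_all([op.to_dict() for op in ops], sort_keys=False)


if __name__ == "__main__":
  print(dump_yaml([OpDefinition("matmul", ["A", "B", "C"])]))
```

With that shape, the YAML dependency stays out of the authoring path entirely and only matters when someone actually asks for the serialized form.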
As for utility, I've been iterating on this for a few weeks, and I think the utility of YAML as an export format is high:
- Tools like this work better with a human readable intermediate form. One of the issues with the prior tool flow was that it did everything at once, with a lot of magic in the processing of the DSL and no way to inspect what it was actually declaring without looking at generated C++ code. When I started this code, I was just exporting to a repr form for lit testing and found it to lack a structure that made it easy to discern exactly what the final result was doing. What we have here produces a high "glancability" index to easily discern whether what was heard by the tool was what you intended to say. That has obvious value for testing but has been pretty valuable for developing it too.
- We actually had a non-core kernel engineer hacking on some of these ops over the last week or so, and he found that a) handwriting the YAML op specs was better than what we had before, and he had made independent progress working directly on that, and b) when I showed him these forms, he immediately got it and wanted to use the DSL as a tool to author the op specs. The YAML form in Linalg is good for reading/diffing/at rest, but the Python form is better for authoring (see the sketch after this list) -- especially for those further from the compiler domain. I've got ample feedback on that point.
- Within the LLVM ecosystem, YAML is already well supported on the C++ side. Leveraging that and making our Python tools interop through an already well supported, readable interchange mechanism strictly increases value. I believe strongly that a tool like this needs to have an externalized form, and why shouldn't it be something that interops well and aids testing?
- In the domain, this is not an uncommon approach: PyTorch, for example, often represents the fanout of a bunch of math definitions through YAML intermediate forms that are amenable to code generation. It's just how things are increasingly done because it is easy/readable, and if your higher-level DSL needs to be replaced/overhauled, you still have an asset that you can use/extend/diff/transform.
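To make the authoring-vs-at-rest distinction concrete, here is an approximate sketch of what a named-op definition looks like in the DSL from this patch; the decorator and helper names follow the sample op library as I understand it, and the wildcard import path is illustrative rather than authoritative:

```python
# Approximate sketch of a matmul definition in the opdsl authoring language.
# The import path and exact helper spellings (linalg_structured_op, TensorDef,
# S, D, cast, ...) mirror the sample op library in this patch and should be
# treated as illustrative.
from mlir.tools.linalg_opdsl.lang import *


@linalg_structured_op
def matmul(A=TensorDef(T1, S.M, S.K),
           B=TensorDef(T2, S.K, S.N),
           C=TensorDef(U, S.M, S.N, output=True)):
  """Performs a matrix multiplication of two 2D inputs."""
  # One tensor-comprehension: a single assignment that yields a single
  # linalg.generic, with indexing maps and iterator types derived from the
  # index expressions on each side.
  C[D.m, D.n] += cast(U, A[D.m, D.k]) * cast(U, B[D.k, D.n])
```

The tool then serializes the derived op configuration (indexing maps, iterator types, scalar computation) to the YAML form that the build already consumes, which is the readable/diffable at-rest artifact argued for above.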
Also, for the record, I mean it when I say that I don't believe this should ever get promoted into the build system; it should remain a runtime tool and development aid for building MLIR-based compilers: it is much safer to do code reviews on the distilled YAML form and not leave critical op definition work strictly to a less constrained tool like this (which will require its own debugging, development, etc.). We can put the last-mile serialization in some other place in the codebase (not part of the API), but it will reduce testability. Once I fix the mechanics, it'll be one pure Python file, and it is better to keep it together.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) | It uses primitives like AffineMaps and AffineExpressions as-is, and in the IR generation mode (not yet committed), it is pretty tied to contexts and ops. I originally tried to keep those separate to preserve a pure Python version, but I eventually decided that it was nicer to just presume the presence of the MLIR API rather than build something abstracted from what already exists and what we support. The merit is in pulling things together, not in building another disconnected thing. |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/affine.py | ||
30 | +1 but I would like to do that as a followup. The current Python API package structure needs some evolution... I struggled a good deal with what the package layout of the "library" side of the API should be, and decided that it was better to have some solid examples like this in-tree, embedded in a tool, and then promote useful pieces out to the core API. It'll be easier to get a good feel for the package structure that way. | |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/comprehension.py | ||
188 | The types are all valid, btw (MyPy requires you to add these). I just couldn't figure out an easy way to get MyPy enabled in the MLIR repo in one step (it is in the presubmit of the side repo where this was developed). |
I'm not sure I followed everything you wrote here, so I'm not entirely sure we're focusing on the same aspect at the moment, but I may just be behind on the design and the direction as well.
You seem to be arguing in favor of using YAML here, but the context isn't clear to me. Since you're talking about the "prior tool flow", for which the YAML has already been committed as a replacement, I don't think there is much controversy there: using YAML as the input to the build system for Linalg seems fine to me.
That is, however, very different from the public Python bindings, i.e. I am surprised that import mlir would publicly expose a tool that is used to generate the YAML used to build MLIR itself.
On the other hand, import mlir exposing this nice authoring language and emitting MLIR IR (linalg generic) seems very natural to me.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) | (missed this one since phab does not show it in the current diff)
Right, as I tried to explain elsewhere, the "IR generation mode" clearly belongs in the public Python API. |
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) | Gotcha - so two things that might help.
With respect to the overall organization of the mlir namespace, I want to keep this piece isolated for now and design that properly. Right now, the namespace is really not organized for much beyond being a place for shims and helpers on top of the native libraries: it needs some more care to lay it out to house more general purpose libraries at a couple of levels, and I prefer to do that explicitly. Having this stuff off to the side and out of the way will help to see the right way to organize the overall namespace since it is easier to organize things when you can see some of the pieces that need to fit. I think I've demonstrated some good stewardship of this part of the codebase and would like some leeway to stage practical steps like this a bit incrementally. |
Thanks for pushing this forward @stellaraccident!
I'd characterize the current comment / discussion as the "layering needs to evolve", which is best served by having this landed and iterated upon by a distributed group of people.
Looking forward to playing with this.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
759 ↗ | (On Diff #328345) | Note to self: do we verify early and aggressively? |
770 ↗ | (On Diff #328345) | Note to self: I assume we dump the linalg.generic body and provide a reasonable error message on failure? |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/affine.py | ||
120 | FMI, what does this do? I don't see any setter/getter/deleter | |
260 | nit: should the global variables exported here be grouped together at the end of the file for ease of tracking? | |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/comprehension.py | ||
5 | @stellaraccident it seems easy to define the term "tensor-comprehension" in the #### Language Guide as a line in a TensorDef that generates a single linalg.generic? | |
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/config.py | ||
10 | +1 finding the good balance is an iterative process. |
Comments.
mlir/lib/Bindings/Python/mlir/tools/linalg_opdsl/lang/affine.py | ||
---|---|---|
120 | "local" here refers to symbols and dims as part of this build action. "global" accounts for if you created this AffineBuildState subordinate to another, which is what we do when constructing affine maps across a comprehension which share the symbol and dim space. Added a class comment. (I want to rework this/the API is awkward - it kind of grew out of the code you had in the original C++ TC lang parser) |
Thanks for the reviews. I generally agree with Mehdi's instincts, in the fullness of time, with respect to namespace organization and layering, but there are a number of things that need to be done better to strike the right balance. It is my opinion that keeping this isolated as we do that work is in the best interests of both iteration/progress and being purposeful about how we expand the intended-to-be-public parts of the API. We'll get there - it just takes time.
In general, can you please avoid landing patches while there are discussions that haven't settled?
We don't explicitly mark a revision as "request changes" when we comment, but it is somewhat implicit that these comments should be resolved before landing a patch: i.e. I don't consider that getting Nicolas' approval was enough in this context.
How can we strike a better discussion balance to ensure that "done is better than perfect" and that we can improve incrementally, rather than having to wait for the monolith to be shiny before unblocking others?
+1. While only one reviewer is required to approve before a patch can land, it is imperative that a reasonable consensus has been reached across all reviewers. If it isn't clear if a consensus has been reached, you should wait a bit for acknowledgement (especially if the discussion is still ongoing).
I'm not sure what you mean; I see your characterization as totally off here, so you may need to elaborate. Note that this patch landed less than 24h after being posted, and it isn't a "trivial" fix or a small change.
A key requirement for being able to move incrementally is discussing and agreeing on the general direction. Maybe I missed it, but I haven't seen a single public discussion about any of this (other than in the IREE chat room, which obviously can't really be a reference for upstream work).
Regardless, there is a cost to working upstream: it is a core part of what makes the quality of the project, and I have a very strong belief in the culture that is set there. This was also one of the main motivations for developing MLIR inside LLVM.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) |
OK!
This is fair, but that still does not mean it should be exposed in the public Python API. Anyone who needs to build native named ops would have to integrate with the build system, at which point they have our /bin/ available (just like they have mlir-tblgen available). So bin/linalg-dsl-to-yaml is a very reasonable tool entry point to have. And even if this tool is written in Python and uses the MLIR Python APIs (to reuse the DSL, for example), that still does not call for a public mlir.tools as far as I can tell. So: I don't disagree with any of your use cases, I just haven't yet seen why exposing mlir.tools is needed or desirable to achieve them.
Absolutely, I'm fine with incrementally developing and improving this. However, this works well when we build some shared understanding and agreement on the direction first. And right now I feel this whole track (YAML first, and the DSL now) fell a bit short on this: it looks like there was a missed step between the downstream discussion/prototyping and the landing upstream. |
Probably a topic for a longer discussion off thread, but I understand perfectly what Nicolas is referring to. I roughly slow my velocity down by 5-10x over what it would naturally be in order to stage and telegraph feature advancements that would not be controversial and would be obvious if done in a domain-aligned sub-project, but have a very high consensus overhead in the general-purpose MLIR project as a whole. I think there are a bunch of reasons for that, some good and some legacy, but in my experience it is not the norm in our other MLIR-derived sub-projects (two of which are LLVM projects themselves). I generally feel that the core MLIR project is trying to do too many things (as evidenced by this kind of "there's no place to put the tool I need to do my job" issue), especially as it straddles the line between general-purpose, mid-level IR infra and user-centric Tensor Compute domain abstractions. I think both of those have a strong place upstream, but if I had it to do over again, I would have made them two separate projects. That may still be the right answer, either by directory structure or by convention.
In this case, I made an explicit choice of reviewer: Nicolas is the right person to be providing both the technical and roadmap vision for this component. Further, where it is placed in the Python tree (complete with caveats and tradeoffs) was literally designed and mostly implemented by me, and I have a lot of state on what can/should go where while we are expanding it. I was annoyed, given the timezones, that he was literally asleep while the debate about structure took place -- when what I was looking for was a review of the merits. I did let that annoyance come into play when I re-read the threads this morning and decided that we had actually converged. In my defense, there is an extraordinary amount of debate about structure that goes on around here, and I have trouble determining what is drive-by vs. substance from indirect stakeholders.
I'll revert this patch if this is deemed an issue worthy of more discussion.
mlir/docs/Dialects/Linalg.md | ||
---|---|---|
774 ↗ | (On Diff #328345) |
Yes. And I will also take two followups:
|
I don't think it is worthy of a revert here. My intent is mostly to set expectations to avoid future friction.
Here I disagree with your assessment of the reviewer selection and approval process. MLIR is moving more slowly than downstream projects, and that is fine. The review time on a patch here does not prevent you from continuing to make progress in a downstream fork. Note that LLVM can move much, much more slowly as well.
So please, for such a patch in the future: wait at least a few days before landing upstream, to give it a chance to get a wider look. I'd also like to see more RFCs on Discourse to set the direction and involve the community.
We have good precedents for "getting agreement on the direction and implementing incrementally", for example the C API or the Python bindings. We also have very good examples of what happens when the process is shortcut, like the original Python bindings. ;)
Thanks.
Here I disagree with your assessment of the reviewer selection and approval process. MLIR is moving more slowly than downstream projects, and that is fine. The review time on a patch here does not prevent you from continuing to make progress in a downstream fork. Note that LLVM can move much, much more slowly as well.
It's too bad that it will be a while before we have real f2f community meetups again. I am going to disagree, and I feel that there is a real project dynamics and scoping issue here, but it is the kind of thing better worked out f2f with some empathy vs. across continents while our family members are texting us asking when we are going to make them lunch :)
So please, for such a patch in the future: wait at least a few days before landing upstream, to give it a chance to get a wider look. I'd also like to see more RFCs on Discourse to set the direction and involve the community.
We have good precedents for "getting agreement on the direction and implementing incrementally", for example the C API or the Python bindings. We also have very good examples of what happens when the process is shortcut, like the original Python bindings. ;)
I think we should also have a good community discussion on that one at some point. I agree that the outcome is showing early signs of being good, but I would hesitate to hold it up as a model for future work. I took that on personally because of a failure to converge on a starting point and implementation plan for quite a long time -- and I simply couldn't wait any longer to have a real thing to build user-centric solutions out of. I agree that once it got underway, we got the job done... quite a bit more slowly than I would have expected, but it got there. Slow is one thing, but I think the community's real bugs are in the activation energy and force of will required to start things or push new directions. Many try, but very few people are succeeding at picking up the initial velocity. And that's a real problem, because MLIR still has quite a way to go to be a generally useful piece of infra for the scale of problems we all want it to solve, and a lot of the gaps are in things that still need to be envisioned/invented.
Also, say what you will about the original Python bindings: they served a useful purpose, I am aware of at least one project that is still using them to good effect, and they were key to scaling their work to production.
My strong belief is that a project like this one can't scale or achieve its ambition without proper care and alignment in its development: we may share the same goal here, but we may also have divergent opinions on some aspects that I consider critical for a multi-party open-source project like this one.
The energy required to do things is inherent at some point in a project like MLIR. I believe it is much worse in LLVM itself, actually. Many users of LLVM have a fork and never contribute upstream; most of my work on LLVM has been organized as a pipeline: I always prototyped in a fork, with ~10 patches ahead, cleaning up the oldest ones and sending them for review, expecting the review process to take multiple days per patch. That never blocked me from progressing in my fork (even when this fork was Apple's internal downstream copy of LLVM). Many features actually landed in production inside some Apple products using LLVM before being cleaned up, refactored, and sent to LLVM (with a discussion on llvm-dev@ as necessary).
The impression I get from this track of work reminds me of other examples I don't really want to replicate in MLIR, like how TensorFlow is developed internally: discussion, brainstorming, design, and direction are hidden from the community, and when the code is there, it's already "pre-approved" and the review is really just rubber-stamping.
By the way, I would rather have an RFC discussed and agreed upon, and skip pre-commit code reviews on the patches (while following the coding standards and being responsive to post-commit comments!). This is more in line with my experience of "moving fast incrementally" as practiced in LLVM, for example.
Also, say what you will about the original Python bindings: they served a useful purpose, I am aware of at least one project that is still using them to good effect, and they were key to scaling their work to production.
There are many projects that are "serving a purpose" and are "useful in production" that don't belong to MLIR, and I'm perfectly fine with that. But I don't have to work on them or maintain them either.
In this particular case: while these bindings are serving a purpose, they will serve the purpose of a single user and won't generalize. If a user of MLIR wants something ad hoc to their needs, then developing it out-of-tree makes sense. When it's time to upstream, it has to generalize further.
There is a huge cost to refactoring and evolving things. I always believed that managing this carefully was important for the health of such a project, and I applied this in LLVM for years as much as possible. But this looks even more critical to me now that I know about Google: as soon as some code is written in MLIR and adopted inside Google, the cost of improving/refactoring/evolving it is prohibitive for any Googler (except River, for now...). It can quickly spiral to the point where the only hope of evolution for MLIR will come from non-Google contributors...
I have frequent reminders of this: most recently I was trying to work with the ExecutionEngine runtime, and I regretted every day not pushing back more on the patches that landed there! I have actually given up for now on many of the refactorings I was trying to conduct there. This is another area of the codebase that "is serving a purpose" for someone, but that may just be better off not being in the monorepo in this form.
You and I almost always agree on the destination, as in this case. I think there may be some practical differences on that path...
The energy required to do things is inherent at some point in a project like MLIR. I believe it is much worse in LLVM itself, actually. Many users of LLVM have a fork and never contribute upstream; most of my work on LLVM has been organized as a pipeline: I always prototyped in a fork, with ~10 patches ahead, cleaning up the oldest ones and sending them for review, expecting the review process to take multiple days per patch. That never blocked me from progressing in my fork (even when this fork was Apple's internal downstream copy of LLVM). Many features actually landed in production inside some Apple products using LLVM before being cleaned up, refactored, and sent to LLVM (with a discussion on llvm-dev@ as necessary).
When I'm working on my own or in a silo, that is my preferred operating style as well. I think our issue right now is that we are trying to synchronize the work of ~a dozen people on this aspect (not just at Google/IREE but a pretty good ragtag crew of people all trying to use this stuff for the first time across companies and projects), and you hit these choke-point items that cause a pipeline stall across individuals. Given the sprawling nature of the projects, forks, etc., it gets gross pretty quickly trying to synchronize people like that when you find a critical gap upstream.
The impression I get from this track of work reminds me of other examples I don't really want to replicate in MLIR, like how TensorFlow is developed internally: discussion, brainstorming, design, and direction are hidden from the community, and when the code is there, it's already "pre-approved" and the review is really just rubber-stamping.
I'll try to give more visibility on the lists next time around.
By the way, I would rather have an RFC discussed and agreed upon, and skip pre-commit code reviews on the patches (while following the coding standards and being responsive to post-commit comments!). This is more in line with my experience of "moving fast incrementally" as practiced in LLVM, for example.
Also, say what you will about the original Python bindings: they served a useful purpose, I am aware of at least one project that is still using them to good effect, and they were key to scaling their work to production.
There are many projects that are "serving a purpose" and are "useful in production" that don't belong to MLIR, and I'm perfectly fine with that. But I don't have to work on them or maintain them either.
In this particular case: while these bindings are serving a purpose, they will serve the purpose of a single user and won't generalize. If a user of MLIR wants something ad hoc to their needs, then developing it out-of-tree makes sense. When it's time to upstream, it has to generalize further.
There is a huge cost to refactoring and evolving things. I always believed that managing this carefully was important for the health of such a project, and I applied this in LLVM for years as much as possible. But this looks even more critical to me now that I know about Google: as soon as some code is written in MLIR and adopted inside Google, the cost of improving/refactoring/evolving it is prohibitive for any Googler (except River, for now...). It can quickly spiral to the point where the only hope of evolution for MLIR will come from non-Google contributors...
I probably shouldn't be saying this, but I'm pretty sure that the way Google is managing at-head development of LLVM is going to just grind to a halt at some point. On my side of the fence, we are very reluctant to contribute more code to Google's internal repo because of this force. New features get developed entirely on GitHub to the extent possible and are then only promoted once/if they are mature/needed and there is no other way to do it. If it continues on this path for ~years more, I estimate that I'll end up being one of those non-Google contributors.
I have frequent reminders of this: most recently I was trying to work with the ExecutionEngine runtime, and I regretted every day not pushing back more on the patches that landed there! I have actually given up for now on many of the refactorings I was trying to conduct there. This is another area of the codebase that "is serving a purpose" for someone, but that may just be better off not being in the monorepo in this form.
I took the afternoon and worked out how to open up the python directory tree/namespace and moved things into a more natural location: https://reviews.llvm.org/D98096