This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Details
Diff Detail
Event Timeline
Thanks for working on this, Sergei! Some initial comments!
mlir/include/mlir/Dialect/Vector/VectorOps.h | ||
---|---|---|
90 ↗ | (On Diff #327631) | add vector.store as well? |
mlir/lib/Dialect/Vector/VectorTransforms.cpp | ||
2766 ↗ | (On Diff #327631) | I think having a summary upfront of the different lowering options supported would be very useful. Same for transfer_write. |
2778 ↗ | (On Diff #327631) | I think this shouldn't be a TODO since transfer ops on tensors should be lowered to memrefs before this lowering. |
mlir/test/Dialect/Vector/vector-transfer-lowering.mlir | ||
96 ↗ | (On Diff #327631) | Add TODOs for unsupported cases? |
160 ↗ | (On Diff #327631) | There are semantic differences between a scalar load and a single-element vector load. Not sure if LLVM is optimizing the latter into the former but the latter might be rejected by a backend if that vector length is not supported. I think we should generate an std.load for these cases. |
mlir/test/lib/Transforms/TestVectorTransforms.cpp | ||
372 ↗ | (On Diff #327631) | I would like to know what others think about it but I think we should have an independent pass to perform the transfer op lowering and not only a test pass. It's very likely that different projects need the lowering at different stages of the pipeline. For example, we would like to lower transfer ops right after the vectorizer, others might want to keep transfer ops for a while after the vectorizer, linalg might want to lower them at some other point... Would it make more sense to have a pass that provides that level of flexibility? |
Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?
Also note that I added masked-load/store-> transfer_load/store canonicalization patterns before we had the regular vector loads that Diego introduced. Does could benefit from lowering directly to loads/stores too.
Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?
That sounds good to me but, if we always canonicalized transfer ops into vector and other ops, wouldn't that be a problem for existing Vector/Linalg passes expecting transfer ops?
In the sense that e.g. unmasked one becomes regular loads, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible, and have patterns in the rewriting rules that expect only the simplest form possible to avoid cluttering the patterns with implicit canonicalization.
In the sense that e.g. unmasked one becomes regular loads, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible, and have patterns in the rewriting rules that expect only the simplest form possible to avoid cluttering the patterns with implicit canonicalization.
Good point! I guess we could give it a try and see what happens. Canonicalization patterns wouldn't definitely be a problem for us! The way I see transfer ops is more like ops that provide a higher level of abstraction than [masked] vector loads/stores and others. Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map. In any case, we could always move the canonicalization patterns to a pass if they became problematic. Thanks for the feedback!
mlir/lib/Dialect/Vector/VectorOps.cpp | ||
---|---|---|
2510 | period at end | |
2516 | Start comment as full sentence. | |
2527 | Sorry, I should have called this out stronger in my original comment (note the "some of these"), but I am not sure we want all of these as part of canonicalization. I definitely think the masked -> unmasked version should be here, but replacing one transfer with broadcast with load feels indeed like a lowering pattern. So the rule more below for "store" should be here, and some of the rule here for "load". But anything that is not a clear simplification (1 transfer -> 1 broadcast and 1 load) should perhaps remain were it was. Of course, the way you have it now allows for easier code sharing. Let's see if Nicolas feels strongly about it one way or the other. For the part that remains, change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still) | |
2728 | Change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still) | |
2760 | This feels like it should be here, one complex transfer -> one simple load is a clear canonicalization ;-) |
Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map.
This may not belong to canonicalization then: we should not "lose high-level information" in the canonicalization, we should always have an IR easier to analyze when we run canonicalization in principle.
This is why I had the "some of these", as I also reiterated above. Dropping the mask is a simple canonicalizaiton imho, but adding another ops is not.
I like to have the rule that anything
n terms -> k terms with k < n or 1 term -> simpler term
belongs to folding/canonicalization, anything else is progressive lowering or rewriting. Even this within reason, the n/k terms shouldn't be overly complex either.
Thanks for making progress on this.
I would refrain from making it a blanket canonicalization.
The rationale is that vector transformations use vector.transfer_read / transfer_write.
Too early blanket canonicalizations are likely to not play nicely with the whole end-to-end transformation pipeline.
isMinorIdentityWithBroadcasting should live at the same place as isMinorIdentity.
There are a bunch other moving parts related to rewriting some of transformations related to unrolling and better support for vectors as programming models for GPU (including extensions towards scalable vectors).
For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.
For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.
Ok, let's go back to the previous approach. As I mentioned before, I would suggest that we create a regular pass for this lowering, not only a test pass, so that we can compose it as needed in the pipeline.
Agreed on the blanket. But how about the simple folding away of unmasked (or all-true/false mask). In the long run, it makes a lot more sense to have one place with that logic (folding/canonicalizaiton) and have all other rewriting rule that expect unmasked transfers simply work on the l/s instead?
I don't want to hold this CL for sgrechanik (so proceed as you see fit) but don't want to give up the underlying argument too easily either ;-)
Agreed on the blanket. But how about the simple folding away of unmasked (or all-true/false mask). In the long run, it makes a lot more sense to have one place with that logic (folding/canonicalizaiton) and have all other rewriting rule that expect unmasked transfers simply work on the l/s instead?
I think we could have canonicalization patterns as long as we don't lower the level of abstraction in the process. For example, we could have canonicalization patterns for:
- 'vector.masked_load' -> 'vector.load' // All-true mask.
- 'vector.masked_load' -> () // All-zero mask.
- 'vector.transfer_read [masked=true]' -> 'vector.transfer_read [masked=false]' // All-true mask.
- 'vector.transfer_read [masked=true]' -> () // All-zero mask.
- ...
but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41
For transfers, grep for:
template <typename TransferOp> static LogicalResult foldTransferMaskAttribute(TransferOp op) {
Feel free to add more as you see fit.
- ...
but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41
+1, canonicalization / folding need to keep the same level of abstraction when potentially conflicting transforms are present.
- Rolled back to the previous version (i.e. no new canonicalization patterns).
- Moved isMinorIdentityWithBroadcasting inside the AffineMap class (although not sure if it really deserves this).
Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.
Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.
Ok, that sounds reasonable. Thanks!
period at end