This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Lowering of transfer_read/write to vector.load/store
ClosedPublic

Authored by sgrechanik on Mar 2 2021, 5:32 PM.

Details

Summary

This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.

Diff Detail

Event Timeline

sgrechanik created this revision.Mar 2 2021, 5:32 PM
sgrechanik requested review of this revision.Mar 2 2021, 5:32 PM

Thanks for working on this, Sergei! Some initial comments!

mlir/include/mlir/Dialect/Vector/VectorOps.h
90 ↗(On Diff #327631)

add vector.store as well?

mlir/lib/Dialect/Vector/VectorTransforms.cpp
2766 ↗(On Diff #327631)

I think having a summary upfront of the different lowering options supported would be very useful. Same for transfer_write.

2778 ↗(On Diff #327631)

I think this shouldn't be a TODO since transfer ops on tensors should be lowered to memrefs before this lowering.

mlir/test/Dialect/Vector/vector-transfer-lowering.mlir
96 ↗(On Diff #327631)

Add TODOs for unsupported cases?

160 ↗(On Diff #327631)

There are semantic differences between a scalar load and a single-element vector load. Not sure if LLVM is optimizing the latter into the former but the latter might be rejected by a backend if that vector length is not supported. I think we should generate an std.load for these cases.

mlir/test/lib/Transforms/TestVectorTransforms.cpp
372 ↗(On Diff #327631)

I would like to know what others think about it but I think we should have an independent pass to perform the transfer op lowering and not only a test pass. It's very likely that different projects need the lowering at different stages of the pipeline. For example, we would like to lower transfer ops right after the vectorizer, others might want to keep transfer ops for a while after the vectorizer, linalg might want to lower them at some other point... Would it make more sense to have a pass that provides that level of flexibility?

sgrechanik updated this revision to Diff 328187.Mar 4 2021, 8:51 AM
sgrechanik marked 5 inline comments as done.

Updated the diff according to Diego's feedback

Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?

Also note that I added masked-load/store-> transfer_load/store canonicalization patterns before we had the regular vector loads that Diego introduced. Does could benefit from lowering directly to loads/stores too.

Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?

That sounds good to me but, if we always canonicalized transfer ops into vector and other ops, wouldn't that be a problem for existing Vector/Linalg passes expecting transfer ops?

That sounds good to me but, if we always canonicalized transfer ops into vector and other ops, wouldn't that be a problem for existing Vector/Linalg passes expecting transfer ops?

In the sense that e.g. unmasked one becomes regular loads, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible, and have patterns in the rewriting rules that expect only the simplest form possible to avoid cluttering the patterns with implicit canonicalization.

In the sense that e.g. unmasked one becomes regular loads, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible, and have patterns in the rewriting rules that expect only the simplest form possible to avoid cluttering the patterns with implicit canonicalization.

Good point! I guess we could give it a try and see what happens. Canonicalization patterns wouldn't definitely be a problem for us! The way I see transfer ops is more like ops that provide a higher level of abstraction than [masked] vector loads/stores and others. Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map. In any case, we could always move the canonicalization patterns to a pass if they became problematic. Thanks for the feedback!

sgrechanik updated this revision to Diff 328674.Mar 5 2021, 3:57 PM
sgrechanik edited the summary of this revision. (Show Details)

Moved the lowering pattern to canonicalization patterns.

aartbik added inline comments.Mar 5 2021, 5:44 PM
mlir/lib/Dialect/Vector/VectorOps.cpp
2510

period at end

2516

Start comment as full sentence.

2527

Sorry, I should have called this out stronger in my original comment (note the "some of these"), but I am not sure we want all of these as part of canonicalization. I definitely think the masked -> unmasked version should be here, but replacing one transfer with broadcast with load feels indeed like a lowering pattern. So the rule more below for "store" should be here, and some of the rule here for "load".

But anything that is not a clear simplification (1 transfer -> 1 broadcast and 1 load) should perhaps remain were it was. Of course, the way you have it now allows for easier code sharing.

Let's see if Nicolas feels strongly about it one way or the other.

For the part that remains, change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still)

2728

Change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still)

2760

This feels like it should be here, one complex transfer -> one simple load is a clear canonicalization ;-)

Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map.

This may not belong to canonicalization then: we should not "lose high-level information" in the canonicalization, we should always have an IR easier to analyze when we run canonicalization in principle.

This is why I had the "some of these", as I also reiterated above. Dropping the mask is a simple canonicalizaiton imho, but adding another ops is not.
I like to have the rule that anything

n terms -> k terms with k < n or  
1 term -> simpler term

belongs to folding/canonicalization, anything else is progressive lowering or rewriting. Even this within reason, the n/k terms shouldn't be overly complex either.

Thanks for making progress on this.
I would refrain from making it a blanket canonicalization.

The rationale is that vector transformations use vector.transfer_read / transfer_write.
Too early blanket canonicalizations are likely to not play nicely with the whole end-to-end transformation pipeline.

isMinorIdentityWithBroadcasting should live at the same place as isMinorIdentity.

There are a bunch other moving parts related to rewriting some of transformations related to unrolling and better support for vectors as programming models for GPU (including extensions towards scalable vectors).
For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.

For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.

Ok, let's go back to the previous approach. As I mentioned before, I would suggest that we create a regular pass for this lowering, not only a test pass, so that we can compose it as needed in the pipeline.

Too early blanket canonicalizations are likely to not play nicely with the whole end-to-end transformation pipeline.

Agreed on the blanket. But how about the simple folding away of unmasked (or all-true/false mask). In the long run, it makes a lot more sense to have one place with that logic (folding/canonicalizaiton) and have all other rewriting rule that expect unmasked transfers simply work on the l/s instead?

I don't want to hold this CL for sgrechanik (so proceed as you see fit) but don't want to give up the underlying argument too easily either ;-)

Agreed on the blanket. But how about the simple folding away of unmasked (or all-true/false mask). In the long run, it makes a lot more sense to have one place with that logic (folding/canonicalizaiton) and have all other rewriting rule that expect unmasked transfers simply work on the l/s instead?

I think we could have canonicalization patterns as long as we don't lower the level of abstraction in the process. For example, we could have canonicalization patterns for:

  • 'vector.masked_load' -> 'vector.load' // All-true mask.
  • 'vector.masked_load' -> () // All-zero mask.
  • 'vector.transfer_read [masked=true]' -> 'vector.transfer_read [masked=false]' // All-true mask.
  • 'vector.transfer_read [masked=true]' -> () // All-zero mask.
  • ...

but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41

Agreed on the blanket. But how about the simple folding away of unmasked (or all-true/false mask). In the long run, it makes a lot more sense to have one place with that logic (folding/canonicalizaiton) and have all other rewriting rule that expect unmasked transfers simply work on the l/s instead?

I think we could have canonicalization patterns as long as we don't lower the level of abstraction in the process. For example, we could have canonicalization patterns for:

  • 'vector.masked_load' -> 'vector.load' // All-true mask.
  • 'vector.masked_load' -> () // All-zero mask.
  • 'vector.transfer_read [masked=true]' -> 'vector.transfer_read [masked=false]' // All-true mask.
  • 'vector.transfer_read [masked=true]' -> () // All-zero mask.

For transfers, grep for:

template <typename TransferOp>
static LogicalResult foldTransferMaskAttribute(TransferOp op) {

Feel free to add more as you see fit.

  • ...

but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41

+1, canonicalization / folding need to keep the same level of abstraction when potentially conflicting transforms are present.

sgrechanik updated this revision to Diff 329123.Mar 8 2021, 1:33 PM
sgrechanik marked 2 inline comments as done.
sgrechanik edited the summary of this revision. (Show Details)
  • Rolled back to the previous version (i.e. no new canonicalization patterns).
  • Moved isMinorIdentityWithBroadcasting inside the AffineMap class (although not sure if it really deserves this).
aartbik added inline comments.Mar 8 2021, 5:35 PM
mlir/include/mlir/IR/AffineMap.h
108 ↗(On Diff #329123)

by 0 values read ambiguous: indicated by value 0 in the result

mlir/lib/IR/AffineMap.cpp
114 ↗(On Diff #329123)

same

nicolasvasilache accepted this revision.Mar 9 2021, 12:36 AM
This revision is now accepted and ready to land.Mar 9 2021, 12:36 AM
sgrechanik marked 2 inline comments as done.

As I mentioned before, I would suggest that we create a regular pass for this lowering, not only a test pass, so that we can compose it as needed in the pipeline.

Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.

dcaballe accepted this revision.Mar 10 2021, 11:13 AM

Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.

Ok, that sounds reasonable. Thanks!

This revision was landed with ongoing or failed builds.Mar 11 2021, 6:19 PM
This revision was automatically updated to reflect the committed changes.