This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Introduce 'vector.load' and 'vector.store' ops
ClosedPublic

Authored by dcaballe on Feb 5 2021, 3:13 PM.

Details

Summary

This patch adds the 'vector.load' and 'vector.store' ops to the Vector
dialect [1]. These operations model *contiguous* vector loads and stores
from/to memory. Their semantics are similar to the 'affine.vector_load' and
'affine.vector_store' counterparts but without the affine constraints. The
most relevant feature is that these new vector operations may perform a vector
load/store on memrefs with a non-vector element type, unlike 'std.load' and
'std.store' ops. This opens the representation to model more generic vector
load/store scenarios: unaligned vector loads/stores, mixed scalar and vector
memory accesses on the same memref, decoupling memory allocation constraints
from memory accesses, etc. [1]. These operations will also facilitate the
progressive lowering of both the Affine vector loads/stores and the Vector
transfer reads/writes that read/write contiguous slices from/to memory.
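
As a minimal sketch (the assembly format is the one defined in this patch; the
function and value names below are purely illustrative), a vector load/store on
a memref with a scalar element type would look like:

func @copy8(%A: memref<100xf32>, %i: index) {
  // Load 8 contiguous f32 elements starting at %A[%i] into a vector.
  %v = vector.load %A[%i] : memref<100xf32>, vector<8xf32>
  // Store them back to the same location.
  vector.store %v, %A[%i] : memref<100xf32>, vector<8xf32>
  return
}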

In particular, this patch adds the 'vector.load' and 'vector.store' ops to the
Vector dialect, implements their lowering to the LLVM dialect, and changes the
lowering of 'affine.vector_load' and 'affine.vector_store' ops to the new vector
ops. The lowering of Vector transfer reads/writes will be implemented in the
future, probably as an independent pass. The API of 'vector.maskedload' and
'vector.maskedstore' has also been changed slightly to align it with the
transfer read/write ops and the new vector ops. This will improve reusability
among all these operations. For example, the lowering of 'vector.load',
'vector.store', 'vector.maskedload' and 'vector.maskedstore' to the LLVM dialect
is implemented with a single template conversion pattern.
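
For example (a schematic sketch with illustrative names; the actual lowering
expands the affine map into explicit index arithmetic), an affine vector load
such as:

  %v = affine.vector_load %A[%i + 3] : memref<100xf32>, vector<8xf32>

would now be lowered through the new op roughly as:

  %c3 = constant 3 : index
  %idx = addi %i, %c3 : index
  %v = vector.load %A[%idx] : memref<100xf32>, vector<8xf32>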

[1] https://llvm.discourse.group/t/memref-type-and-data-layout/

Diff Detail

Event Timeline

dcaballe created this revision.Feb 5 2021, 3:13 PM
dcaballe requested review of this revision.Feb 5 2021, 3:13 PM
Herald added a project: Restricted Project.Feb 5 2021, 3:13 PM
bondhugula added inline comments.Feb 5 2021, 9:12 PM
mlir/include/mlir/Dialect/Vector/VectorOps.td
1322

"contiguous n-D slice of memory" is a bit inherently contradictory to me.

One nitpicky request already on the order in the td file.

We had been thinking for a while about completing the memory operations with these unmasked versions, so thanks for doing this.
I am okay with this, but you may want to also check with Nicolas, who had big future plans for the transfer operations
(but I feel these new versions are always good for progressive lowering).

I am a bit more on the fence about some of the name changes you made in the existing ops. Can you please clarify a bit what your reasoning was for these?

mlir/include/mlir/Dialect/Vector/VectorOps.td
1318

A bit of a nitpicky request, but I insist ;-)

The order in the ops file for memory operations is strictly in pairs:

Vector_TransferReadOp / Vector_TransferWriteOp
Vector_MaskedLoadOp / Vector_MaskedStoreOp
Vector_GatherOp / Vector_ScatterOp
Vector_ExpandLoadOp / Vector_CompressStoreOp

You break that convention by placing the unmasked versions right before the masked load and store, respectively.
I would prefer to have the new ones in between the Transfer and Masked versions, but paired together and in the order load/store as you have them.

Thanks for the feedback!

We had been thinking for a while about completing the memory operations with these unmasked versions, so thanks for doing this.
I am okay with this, but you may want to also check with Nicolas, who had big future plans for the transfer operations
(but I feel these new versions are always good for progressive lowering).

Yeah, this is what we discussed with @nicolasvasilache and @ftynse for lowering the vector transfer ops. Hopefully the plan hasn't changed.

I am a bit more on the fence about some of the name changes you made in the existing ops. Can you please clarify a bit what your reasoning was for these?

The intention was to align the APIs so that we can write generic code for all the vector load/store flavors. For example, vector.maskedload had getResultVectorType() and vector.maskedstore had getValueVectorType() to return the vector type used in the op. These two methods were changed to getVectorType() so that a single conversion pattern to LLVM could be written for both of them, and also for vector.load and vector.store. As you can see in ConvertVectorToLLVM.cpp, the VectorLoadStoreConversion pattern is used for all four ops thanks to the unified API, instead of having four independent patterns. For example, see the following code in VectorLoadStoreConversion:

// Resolve address.
auto vtype = this->typeConverter->convertType(loadOrStoreOp.getVectorType());

I think this will expose more reusability opportunities beyond the current LLVM conversion pattern since these four operations are pretty similar. The same approach is followed by the vector transfer ops and the affine vector loads/stores. They also have getVectorType(), getMemRefType(), valueToStore, etc. I thought it would be a good idea to align vector.load/store and vector.maskedload/maskedstore with that existing API to facilitate reusability. In a way, we are paving the way for a VectorMemoryOp interface. However, I have no strong opinion about the API names. We could also change the API of the transfer and affine ops, but that would require far more changes all over the place.

mlir/include/mlir/Dialect/Vector/VectorOps.td
1318

Sure, no problem at all! I thought that having the vector.load next to the vector.maskedload would help to quickly see the different variants we have for performing a vector load. I'll change that. Thanks!

1322

Ok, I can remove contiguous from this line. The line below should be enough: "This slice is contiguous along the respective dimensions of the shape". This is what we have in the description of the affine counterparts.

The intention was to align the APIs so that we can write generic code for all the vector load/store flavors. For example, vector.maskedload had getResultVectorType() and vector.maskedstore had getValueVectorType() to return the vector type used in the op. These two methods were changed to getVectorType() so that a single conversion pattern to LLVM could be written for both of them, and also for vector.load and vector.store. As you can see in ConvertVectorToLLVM.cpp, the VectorLoadStoreConversion pattern is used for all four ops thanks to the unified API, instead of having four independent patterns. For example, see the following code in VectorLoadStoreConversion:

Yes, aligning the API for that purpose makes sense, but you changed the API of masked load/store and not of gather/scatter and expand/compress, and all these mem ops more or less go together.
Perhaps, if you don't mind, you could send out another CL that does the API name change for all of these, just so that we have "progressive" lowering for this change as well (and consistency across all mem ops)?

nicolasvasilache requested changes to this revision.Feb 8 2021, 11:58 PM

Thanks for working on this Diego.
Let's please connect vector.transfer_read/write too; otherwise it is unclear whether the abstraction is the right progressive lowering funnel.

mlir/include/mlir/Dialect/Vector/VectorOps.td
1359

assumptions

1448

Same remark re. contiguous here: this makes me think that vector<2x4xf32> stores 32 contiguous bytes, which is not the case.
I'd actually rephrase this part and describe it as something carrying the intent of "strided by the memref strides" rather than contiguous.
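
A minimal sketch of the case being discussed (the memref shape, %A, %i and %j
are illustrative):

  %v = vector.load %A[%i, %j] : memref<4x8xf32>, vector<2x4xf32>

This reads two rows of 4 contiguous f32 elements each, but the rows are
separated by the memref's row stride (8 elements here), so the 32 bytes are not
contiguous overall.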

This revision now requires changes to proceed.Feb 8 2021, 11:58 PM
dcaballe updated this revision to Diff 322549.Feb 9 2021, 5:15 PM
dcaballe marked an inline comment as done.

Address feedback. Thanks!

Perhaps, if you don't mind, you could send out another CL that does the API name change for all of these, just so that we have "progressive" lowering for this change as well (and consistency across all mem ops)?

Sure! I will follow up with a patch for those ops.

Let's please connect vector.transfer_read/write too; otherwise it is unclear whether the abstraction is the right progressive lowering funnel.

Would you mind if we did that in a separate patch? It would be a new feature, and lowering vector transfers requires much more work since we have to analyze the stride, the permutation map, the mask, the padding, etc., and then generate the corresponding lower-level vector ops (vector.load/store, vector.broadcast, vector.shuffle, vector.maskedload/maskedstore, vector.gather/scatter, etc.). I was thinking about creating a LowerVectorTransferOps pass in a follow-up patch so that we can all contribute lowering cases incrementally. Does that sound reasonable?
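
As a hypothetical sketch of the simplest case such a pass could start with (an
unmasked, in-bounds, minor-identity 1-D transfer; names and shapes are
illustrative), the rewrite would be along the lines of:

  // Before:
  %v = vector.transfer_read %A[%i], %pad : memref<100xf32>, vector<8xf32>
  // After:
  %v = vector.load %A[%i] : memref<100xf32>, vector<8xf32>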

mlir/include/mlir/Dialect/Vector/VectorOps.td
1322

I removed contiguous and changed the last part of this paragraph. Let me know if it's better now.

1448

I removed contiguous and changed the last part of this paragraph. Let me know if it's better now.

bondhugula requested changes to this revision.Feb 9 2021, 6:31 PM
bondhugula added inline comments.
mlir/lib/Dialect/Vector/VectorOps.cpp
2368

affineMaps.empty() would be the right check here as opposed to affineMaps.size() != 1.

This revision now requires changes to proceed.Feb 9 2021, 6:31 PM
dcaballe added inline comments.Feb 9 2021, 9:02 PM
mlir/lib/Dialect/Vector/VectorOps.cpp
2368

Oh, if a single identity map is provided, it's not kept around. Got it. Thanks!

dcaballe updated this revision to Diff 322582.Feb 9 2021, 9:02 PM

Addressed Uday's comment and reorganized the code as suggested by Aart.

Let's please connect vector.transfer_read/write too; otherwise it is unclear whether the abstraction is the right progressive lowering funnel.

Would you mind if we did that in a separate patch? It would be a new feature, and lowering vector transfers requires much more work since we have to analyze the stride, the permutation map, the mask, the padding, etc., and then generate the corresponding lower-level vector ops (vector.load/store, vector.broadcast, vector.shuffle, vector.maskedload/maskedstore, vector.gather/scatter, etc.). I was thinking about creating a LowerVectorTransferOps pass in a follow-up patch so that we can all contribute lowering cases incrementally. Does that sound reasonable?

If this is a commitment that you will start connecting the pieces next, then fair enough :)

If this is a commitment that you will start connecting the pieces next, then fair enough :)

Yeah, we have that in our schedule. We'll start with the basic 1-D contiguous cases.

Any other comments?

This revision was not accepted when it landed; it landed in state Needs Review.Feb 12 2021, 10:53 AM
This revision was automatically updated to reflect the committed changes.