This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Lowering of transfer_read/write to vector.load/store
ClosedPublic

Authored by sgrechanik on Mar 2 2021, 5:32 PM.

Details

Summary

This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
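
To make the supported cases concrete, here is a rough sketch of the rewrites involved (op and attribute syntax approximated for the vector dialect as of early 2021; the authoritative list of supported cases is in the patterns themselves):

// An unmasked, minor-identity transfer_read ...
%v = vector.transfer_read %A[%i], %pad {masked = [false]}
    : memref<?xf32>, vector<8xf32>
// ... can be rewritten to a plain vector load:
%v = vector.load %A[%i] : memref<?xf32>, vector<8xf32>

// A broadcasting read (permutation map with a 0 result) additionally
// introduces a vector.broadcast after the load:
%r = vector.transfer_read %B[%i, %j], %pad
    {permutation_map = affine_map<(d0, d1) -> (0)>}
    : memref<?x?xf32>, vector<8xf32>
// ... becomes roughly:
%l = vector.load %B[%i, %j] : memref<?x?xf32>, vector<1xf32>
%r = vector.broadcast %l : vector<1xf32> to vector<8xf32>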

Event Timeline

sgrechanik created this revision.Mar 2 2021, 5:32 PM
sgrechanik requested review of this revision.Mar 2 2021, 5:32 PM

Thanks for working on this, Sergei! Some initial comments!

mlir/include/mlir/Dialect/Vector/VectorOps.h
90

add vector.store as well?

mlir/lib/Dialect/Vector/VectorTransforms.cpp
2766

I think having a summary upfront of the different lowering options supported would be very useful. Same for transfer_write.

2778

I think this shouldn't be a TODO since transfer ops on tensors should be lowered to memrefs before this lowering.

mlir/test/Dialect/Vector/vector-transfer-lowering.mlir
97

Add TODOs for unsupported cases?

161

There are semantic differences between a scalar load and a single-element vector load. Not sure if LLVM is optimizing the latter into the former but the latter might be rejected by a backend if that vector length is not supported. I think we should generate an std.load for these cases.
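
For illustration, the two forms being contrasted here are roughly (a hypothetical sketch, not code from the patch):

// Single-element vector load: a backend may reject vector<1xf32> if it
// does not support that vector length.
%v = vector.load %A[%i] : memref<?xf32>, vector<1xf32>

// Scalar load (std.load at the time of this review), re-wrapped into the
// vector form only if a consumer still needs it:
%s = load %A[%i] : memref<?xf32>
%v = vector.broadcast %s : f32 to vector<1xf32>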

mlir/test/lib/Transforms/TestVectorTransforms.cpp
372

I would like to know what others think about it but I think we should have an independent pass to perform the transfer op lowering and not only a test pass. It's very likely that different projects need the lowering at different stages of the pipeline. For example, we would like to lower transfer ops right after the vectorizer, others might want to keep transfer ops for a while after the vectorizer, linalg might want to lower them at some other point... Would it make more sense to have a pass that provides that level of flexibility?

sgrechanik updated this revision to Diff 328187.Mar 4 2021, 8:51 AM
sgrechanik marked 5 inline comments as done.

Updated the diff according to Diego's feedback

Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?

Also note that I added masked-load/store -> transfer-load/store canonicalization patterns before we had the regular vector loads that Diego introduced. Those could benefit from lowering directly to loads/stores too.

Don't we want to make some of these folders/canonicalization patterns? That way, everyone doing rewriting benefits automatically, and does not need to explicitly pull in these new specific rewriting patterns?

That sounds good to me but, if we always canonicalized transfer ops into vector and other ops, wouldn't that be a problem for existing Vector/Linalg passes expecting transfer ops?

That sounds good to me but, if we always canonicalized transfer ops into vector and other ops, wouldn't that be a problem for existing Vector/Linalg passes expecting transfer ops?

In the sense that, e.g., an unmasked one becomes a regular load, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible and have the rewriting rules expect only the simplest form possible, to avoid cluttering the patterns with implicit canonicalization.

In the sense that, e.g., an unmasked one becomes a regular load, yes. But AFAIK, this is very much in the spirit of MLIR, to canonicalize everything to the simplest form possible and have the rewriting rules expect only the simplest form possible, to avoid cluttering the patterns with implicit canonicalization.

Good point! I guess we could give it a try and see what happens. Canonicalization patterns definitely wouldn't be a problem for us! The way I see it, transfer ops provide a higher level of abstraction than [masked] vector loads/stores and other ops. Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map. In any case, we could always move the canonicalization patterns to a pass if they became problematic. Thanks for the feedback!

sgrechanik updated this revision to Diff 328674.Mar 5 2021, 3:57 PM
sgrechanik edited the summary of this revision. (Show Details)

Moved the lowering pattern to canonicalization patterns.

aartbik added inline comments.Mar 5 2021, 5:44 PM
mlir/lib/Dialect/Vector/VectorOps.cpp
2510 ↗(On Diff #328674)

period at end

2516 ↗(On Diff #328674)

Start comment as full sentence.

2527 ↗(On Diff #328674)

Sorry, I should have called this out more strongly in my original comment (note the "some of these"), but I am not sure we want all of these as part of canonicalization. I definitely think the masked -> unmasked version should be here, but replacing one transfer with a broadcast plus a load does feel like a lowering pattern. So the rule further below for "store" should be here, and some of the rule here for "load".

But anything that is not a clear simplification (1 transfer -> 1 broadcast and 1 load) should perhaps remain where it was. Of course, the way you have it now allows for easier code sharing.

Let's see if Nicolas feels strongly about it one way or the other.

For the part that remains, change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still)

2728 ↗(On Diff #328674)

Change comment to say "canonicalization" now instead of "progressive lowering" (the difference is subtle, but still)

2760 ↗(On Diff #328674)

This feels like it should be here, one complex transfer -> one simple load is a clear canonicalization ;-)

Some transfer ops might be canonicalized into more complicated low-level ops, including vector shuffles, so it would be more difficult to reason about shuffles than maybe a permutation map.

This may not belong in canonicalization then: we should not "lose high-level information" during canonicalization; in principle, running canonicalization should always leave the IR easier to analyze.

This is why I had the "some of these", as I also reiterated above. Dropping the mask is a simple canonicalization imho, but adding other ops is not.
I like to have the rule that anything

n terms -> k terms with k < n or  
1 term -> simpler term

belongs to folding/canonicalization; anything else is progressive lowering or rewriting. Even this is within reason: the n/k terms shouldn't be overly complex either.

Thanks for making progress on this.
I would refrain from making it a blanket canonicalization.

The rationale is that vector transformations use vector.transfer_read / transfer_write.
Too early blanket canonicalizations are likely to not play nicely with the whole end-to-end transformation pipeline.

isMinorIdentityWithBroadcasting should live at the same place as isMinorIdentity.

There are a bunch of other moving parts related to rewriting some of the transformations around unrolling and to better support for vectors as a programming model for GPUs (including extensions towards scalable vectors).
For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.

For now, I would say that hiding the patterns behind a populate function (but not a canonicalization) would be the simplest way forward.

Ok, let's go back to the previous approach. As I mentioned before, I would suggest that we create a regular pass for this lowering, not only a test pass, so that we can compose it as needed in the pipeline.

Too early blanket canonicalizations are likely to not play nicely with the whole end-to-end transformation pipeline.

Agreed on the blanket. But how about the simple folding away of the unmasked (or all-true/all-false mask) cases? In the long run, wouldn't it make a lot more sense to have one place with that logic (folding/canonicalization) and have all other rewriting rules that expect unmasked transfers simply work on the loads/stores instead?

I don't want to hold this CL for sgrechanik (so proceed as you see fit) but don't want to give up the underlying argument too easily either ;-)

Agreed on the blanket. But how about the simple folding away of the unmasked (or all-true/all-false mask) cases? In the long run, wouldn't it make a lot more sense to have one place with that logic (folding/canonicalization) and have all other rewriting rules that expect unmasked transfers simply work on the loads/stores instead?

I think we could have canonicalization patterns as long as we don't lower the level of abstraction in the process. For example, we could have canonicalization patterns for:

  • 'vector.masked_load' -> 'vector.load' // All-true mask.
  • 'vector.masked_load' -> () // All-zero mask.
  • 'vector.transfer_read [masked=true]' -> 'vector.transfer_read [masked=false]' // All-true mask.
  • 'vector.transfer_read [masked=true]' -> () // All-zero mask.
  • ...

but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41
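
For the transfer cases in that list, the fold amounts to toggling the masked attribute while staying on the transfer op, e.g. (syntax approximated for the dialect at the time):

// A read that is known to be in-bounds (an "all-true mask") ...
%v = vector.transfer_read %A[%i], %pad {masked = [true]}
    : memref<?xf32>, vector<8xf32>
// ... folds to its unmasked form, at the same level of abstraction:
%v = vector.transfer_read %A[%i], %pad {masked = [false]}
    : memref<?xf32>, vector<8xf32>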

Agreed on the blanket. But how about the simple folding away of the unmasked (or all-true/all-false mask) cases? In the long run, wouldn't it make a lot more sense to have one place with that logic (folding/canonicalization) and have all other rewriting rules that expect unmasked transfers simply work on the loads/stores instead?

I think we could have canonicalization patterns as long as we don't lower the level of abstraction in the process. For example, we could have canonicalization patterns for:

  • 'vector.masked_load' -> 'vector.load' // All-true mask.
  • 'vector.masked_load' -> () // All-zero mask.
  • 'vector.transfer_read [masked=true]' -> 'vector.transfer_read [masked=false]' // All-true mask.
  • 'vector.transfer_read [masked=true]' -> () // All-zero mask.

For transfers, grep for:

template <typename TransferOp>
static LogicalResult foldTransferMaskAttribute(TransferOp op) {

Feel free to add more as you see fit.

  • ...

but we shouldn't have canonicalization patterns that go from a transfer op to a vector masked or unmasked op or vice-versa since they are at a different level of abstraction. Those transformations should be part of the lowering pass. I think your recent patch aligns with this approach very well: https://reviews.llvm.org/rGe5c8fc776fbd2c93e25f5749049ee31cf73a0a41

+1, canonicalization / folding need to keep the same level of abstraction when potentially conflicting transforms are present.

sgrechanik updated this revision to Diff 329123.Mar 8 2021, 1:33 PM
sgrechanik marked 2 inline comments as done.
sgrechanik edited the summary of this revision. (Show Details)
  • Rolled back to the previous version (i.e. no new canonicalization patterns).
  • Moved isMinorIdentityWithBroadcasting inside the AffineMap class (although not sure if it really deserves this).
aartbik added inline comments.Mar 8 2021, 5:35 PM
mlir/include/mlir/IR/AffineMap.h
108 ↗(On Diff #329123)

"by 0 values" reads ambiguous; perhaps say "indicated by value 0 in the result".
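
For reference, a hypothetical example of the convention under discussion, where a broadcast dimension shows up as the constant result 0 in a minor identity map:

// Minor identity with broadcasting over (d0, d1, d2): d2 maps to the last
// result as usual, while the first result is the constant 0, marking that
// vector dimension as a broadcast.
affine_map<(d0, d1, d2) -> (0, d2)>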

mlir/lib/IR/AffineMap.cpp
114 ↗(On Diff #329123)

same

nicolasvasilache accepted this revision.Mar 9 2021, 12:36 AM
This revision is now accepted and ready to land.Mar 9 2021, 12:36 AM
sgrechanik marked 2 inline comments as done.

As I mentioned before, I would suggest that we create a regular pass for this lowering, not only a test pass, so that we can compose it as needed in the pipeline.

Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.

dcaballe accepted this revision.Mar 10 2021, 11:13 AM

Let's address this in a separate patch since the vector dialect doesn't currently have the pass infrastructure, so it will be quite a bit of somewhat unrelated code.

Ok, that sounds reasonable. Thanks!

This revision was landed with ongoing or failed builds.Mar 11 2021, 6:19 PM
This revision was automatically updated to reflect the committed changes.