This is an archive of the discontinued LLVM Phabricator instance.

Differential D114993

Patterns flattening vector transfers to 1D
Closed, Public

Authored by Benoit on Dec 2 2021, 1:18 PM.

Details

Reviewers

aartbik
nicolasvasilache
mravishankar
rriddle

Commits

rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D

Summary

This is needed at the moment to get good codegen from 2D vector.transfer
ops that aim to compile to SIMD load/store instructions but that can
only do so if the whole 2D transfer shape is handled in one piece, in
particular taking advantage of the memref being contiguous row-major.

For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
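
As a rough illustration of the difference, here is a standalone C++ sketch (not code from this patch; `Tile`, `loadRowByRow` and `loadFlattened` are hypothetical names). It contrasts four 4-byte row copies with a single 16-byte copy of a 4x4 tile of 8-bit elements, which is only valid when the rows are adjacent in memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical names for illustration only; none of this is in the patch.
constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 4;
using Tile = std::array<std::int8_t, kRows * kCols>;

// Mimics the dimension-peeling lowering: one 4-byte copy per row.
// rowStride is the distance, in elements, between consecutive rows.
inline Tile loadRowByRow(const std::int8_t *src, std::size_t rowStride) {
  assert(rowStride >= kCols && "rows must not overlap");
  Tile tile{};
  for (std::size_t r = 0; r < kRows; ++r)
    std::memcpy(tile.data() + r * kCols, src + r * rowStride, kCols);
  return tile;
}

// Mimics the flattened form: a single 16-byte (128-bit) copy. This is only
// correct when rowStride == kCols, i.e. the source is contiguous row-major.
inline Tile loadFlattened(const std::int8_t *src) {
  Tile tile{};
  std::memcpy(tile.data(), src, kRows * kCols);
  return tile;
}
```

When the source is contiguous, both functions return the same tile, but only the second has a shape that a compiler can plausibly turn into one 128-bit load.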

The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Benoit created this revision. Dec 2 2021, 1:18 PM

Herald added subscribers: sdasgup3, wenzhicui, wrengr and 20 others. Dec 2 2021, 1:18 PM

Herald added a reviewer: aartbik. Dec 2 2021, 1:18 PM

Benoit requested review of this revision. Dec 2 2021, 1:18 PM

Herald added a reviewer: nicolasvasilache. Dec 2 2021, 1:18 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Benoit added a reviewer: mravishankar. Dec 2 2021, 1:18 PM

Harbormaster completed remote builds in B137221: Diff 391443. Dec 2 2021, 1:50 PM

mravishankar added inline comments. Dec 2 2021, 2:35 PM

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3422 ↗	(On Diff #391443)	I think you can drop the dimsize != 1 condition. The `strides[i] != productOfInnerMostSizes` should still hold.

Cool, thanks much for tackling this, glad to see that it seems to get you to reasonable assembly (inferring from your description?)

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3394 ↗	(On Diff #391443)	Can we be more specific here: `collapseContiguousMemRefCollapseTo1D` ?
3398 ↗	(On Diff #391443)	nit:trivial braces.
3408 ↗	(On Diff #391443)	This should be sliced a bit differently, better reusing the API around BuiltinTypes.h::450. Basically, `getStridesAndOffset` is the future-proof way to get the offset and strides. You want to check that the last stride is 1. You can have this as a special helper `bool isContiguousMostMinorDimension()` or `bool isStrideOneMostMinor(int dim)` (whichever you find most natural and reusable). Then you need to determine if the whole type is contiguous. For this you should add a helper that: returns true for an empty or identity layout map (the weird trifecta I mentioned previously); then returns false if not fully static; then performs a direct MemRefType comparison using a proper combination of `getStridesAndOffset`, `makeStridedLinearLayoutMap`, `canonicalizeStridedLayout`. There should be something similar already using similar logic that can be refactored. Giving this helper a name and reusing it in multiple places will be a nice cleanup.
3411 ↗	(On Diff #391443)	Nit: we avoid trivial braces in LLVM, here and below
3419 ↗	(On Diff #391443)	nit: we use camelCase in LLVM, here and below.
3420 ↗	(On Diff #391443)	nit `dimSize != 1`
3443 ↗	(On Diff #391443)	nit: trivial braces here and below (lift the comment out of the condition)
3469 ↗	(On Diff #391443)	Nice!
3539 ↗	(On Diff #391443)	`VectorTransforms.cpp` is way too bloated and we should split it up (a bit like the standard dialect). Please either start a new properly named `Vector/XXXPatterns.cpp` file to put these new patterns or find an existing one.
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
23	You could use the form "memref<4x3x2x1xi8, offset: ?, strides: [6, 2, 1, 1]>". Due to weird biases, it is currently implemented with an underlying affine_map, but the information may be clearer like this. Unfortunately it will still print with affine_map for now.

nicolasvasilache requested changes to this revision. Dec 3 2021, 7:47 AM

This revision now requires changes to proceed. Dec 3 2021, 7:47 AM

add rank reducing subview to drop unit dims

Thanks for the review comments! I'll apply them on Monday. For now I've just updated this diff with the new idea to use rank-reducing subviews to drop unit dims (thanks @ThomasRaoux for the suggestion). That removes my need for https://reviews.llvm.org/D114821, so I can drop it now.

Benoit mentioned this in D114821: isReshapableDimBand: ignore strides of unit dims. Dec 3 2021, 8:19 PM

Harbormaster completed remote builds in B137486: Diff 391804. Dec 3 2021, 8:32 PM

nicolasvasilache requested changes to this revision. Dec 6 2021, 1:03 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3397 ↗	(On Diff #391804)	Please don't add more patterns to VectorTransforms.cpp, we need to split them out into better isolated logical units. Either add a new .cpp file at the same level with a proper name (see other XXXPatternXXX.cpp files) or put them in an already existing such file, depending on what is most appropriate.
3398 ↗	(On Diff #391804)	This should be its own pattern and return failure when it fails to apply.
3422 ↗	(On Diff #391804)	You could run this through clang-format. I locally have this in my `.bashrc`:

    function git-format-add-and-amend(){
      echo "git add $1 && git show --name-only | egrep \".(\.cpp|\.h)\" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend"
      git add $1 && git show --name-only | egrep ".(\.cpp|\.h)" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend
    }
3425 ↗	(On Diff #391804)	This should be its own pattern and return failure when it fails to apply.
3441 ↗	(On Diff #391804)	Suggested helper:

    static bool isStaticShapeAndContiguousRowMajor(MemRefType memrefType) {
      if (!memrefType.hasStaticShape())
        return false;
      int64_t offset;
      SmallVector<int64_t> strides;
      LogicalResult res = getStridesAndOffset(memrefType, strides, offset);
      if (failed(res))
        return false;
      // You may want to improve the APIs here to minimize the code below to
      // something that is expected to be reusable by others.
      AffineExpr expr = makeCanonicalStridedLayoutExpr(
          memrefType.getSizes(), memrefType.getContext());
      MemRefType canonicalMemRefType = MemRefType::get(
          memrefType.getSizes(), AffineMap::infer({expr}));
      int64_t canonicalOffset;
      SmallVector<int64_t> canonicalStrides;
      res = getStridesAndOffset(
          canonicalMemRefType, canonicalStrides, canonicalOffset);
      if (failed(res))
        llvm_unreachable("Unexpected stride extraction error");
      for (auto it : llvm::zip(strides, canonicalStrides))
        if (std::get<0>(it) != std::get<1>(it))
          return false;
      return true;
    }
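
The stride comparison that this suggestion boils down to can be sketched without any MLIR types. The following is a standalone C++ illustration under that assumption; `Dims`, `canonicalRowMajorStrides` and `isContiguousRowMajor` are hypothetical names, not MLIR APIs. It treats a static shape as contiguous row-major when its strides equal the canonical ones, where each stride is the product of all inner sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a static shape or stride list.
using Dims = std::vector<std::int64_t>;

// Canonical row-major strides: strides[i] is the product of shape[i+1..].
inline Dims canonicalRowMajorStrides(const Dims &shape) {
  Dims strides(shape.size(), 1);
  std::int64_t running = 1;
  for (std::size_t i = shape.size(); i-- > 0;) {
    strides[i] = running;
    running *= shape[i];
  }
  return strides;
}

// True when the given strides match the canonical row-major strides exactly.
// The offset is deliberately not part of the check.
inline bool isContiguousRowMajor(const Dims &shape, const Dims &strides) {
  assert(shape.size() == strides.size() && "rank mismatch");
  return strides == canonicalRowMajorStrides(shape);
}
```

With the shape 4x3x2x1 and strides [6, 2, 1, 1] from the test file comment above, the check succeeds. Note that this strict comparison also constrains the strides of unit dims, even though those strides never affect addressing.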

This revision now requires changes to proceed. Dec 6 2021, 1:03 AM

apply some review comments

Harbormaster completed remote builds in B138644: Diff 393458. Dec 10 2021, 5:56 AM

nicolasvasilache added inline comments. Dec 10 2021, 6:30 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
214	you'll need comments on this function and everywhere below (ideally with a short IR example but not strictly necessary)

split into 2 patterns

Harbormaster completed remote builds in B138661: Diff 393479. Dec 10 2021, 7:13 AM

Benoit added inline comments. Dec 10 2021, 7:32 AM

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
23	Thanks for the tip. For consistency between the MLIR code and the `CHECK`'s I will stick to the `affine_map<...>` form for now.

nicolasvasilache added inline comments. Dec 10 2021, 7:34 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
214	Nit, top-level function and class comments take 3 slashes `///`
215	nit:camelCase
227	use std::copy_if or one of the other stl transforms?
235	nice!
246	nit: camelCase `Drop`
251	This would only work for the static memref cases. It feels like you should either early exit if the type is not fully static or use the OpFoldResult-based builders.
259	std::count_if or llvm::count_if IIRC

nicolasvasilache added inline comments. Dec 10 2021, 8:14 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
269	@gysit, does this relate to the example we just discussed?

more review comments

Harbormaster completed remote builds in B138675: Diff 393499. Dec 10 2021, 8:16 AM

Benoit marked 6 inline comments as done. Dec 10 2021, 8:19 AM

Benoit added inline comments.

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
227	I didn't find a simple way to do that, because of how we need to create `reducedStrides`, not just `reducedShape`. While it would be possible to let `copy_if` create `reducedShape`, I thought that it was more readable if both vectors were created in the same way. In particular, it makes it plain that they have the same length, and that their i-th elements correspond to the same i-th dim.
251	The caller already has such an early exit, so I put assertions here.
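
To make the reply on line 227 concrete, here is a standalone C++ sketch of the "build both vectors in the same loop" idea, together with the `count_if`-style rank computation suggested for line 259. It is only an illustration: `Dims`, `dropUnitDimsFromShapeAndStrides` and `reducedRank` are hypothetical names, and plain `std::vector` stands in for the MLIR containers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a static shape or stride list.
using Dims = std::vector<std::int64_t>;

// Drops every dim of size 1, keeping shape and strides in lockstep so that
// the i-th reduced stride always belongs to the i-th reduced dim.
inline std::pair<Dims, Dims>
dropUnitDimsFromShapeAndStrides(const Dims &shape, const Dims &strides) {
  assert(shape.size() == strides.size() && "rank mismatch");
  Dims reducedShape, reducedStrides;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == 1)
      continue;
    reducedShape.push_back(shape[i]);
    reducedStrides.push_back(strides[i]);
  }
  return {reducedShape, reducedStrides};
}

// Number of dims that are not unit dims.
inline std::size_t reducedRank(const Dims &shape) {
  return static_cast<std::size_t>(
      std::count_if(shape.begin(), shape.end(),
                    [](std::int64_t dimSize) { return dimSize != 1; }));
}
```

A `copy_if` over the shape alone would produce `reducedShape` but not `reducedStrides`, since the predicate for the strides depends on the corresponding shape entry. That is the asymmetry the reply describes.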

gysit added a subscriber: gysit. Dec 10 2021, 8:39 AM

gysit added inline comments.

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
269	I think @Benoit's revision addresses a similar topic but at a different level of the stack. I was looking into generating good transfer ops without switching between tensors and vectors in between and making sure rank-reduction works well. @Benoit optimizes the lowering of what we generate higher up in the stack by flattening the vectors, AFAIU. Looking forward to understanding the performance implications, but I could imagine this helps quite a bit in combination with hoisting, in particular for convolutions, where we work on very high-dimensional vectors.

Thanks for pushing on this @Benoit !

I'd suggest slicing and dicing into smaller commits so we can better track if we ever need to bisect; some of the behavior is tricky to get right and the smaller the CLs + tests, the better we will be reviewing and revisiting in the future.
I think you can turn this into 4-5 commits that can then be more easily clicked.

Also, please don't forget about putting those outside of VectorTransforms.cpp.

These are nice developments, sorry it is getting longer than what I think you initially signed up for but OTOH you're doing things in the right and future-proof way, so I am grateful!

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
299	This is incorrect, the original transferReadOp may have some permutation (i.e. also be sure to insert a proper test). You want some projection map and compose that with the transfer permutation. @ThomasRaoux for off-EU-hours advice. Ah nm, my apologies I see you have a permutation_map test above. Could you just add a comment/TODO that would highlight this / provision for future work?
300	zeros is invalid as we discussed on discord. You can't go around a projection map and applying it to values here.
338	same as above re projection map and zeros / identity map.
343	This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz.
353	This is generally useful and should go to BuiltinTypes.h with proper doc (`///` prefix) plz.
367	This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz. Also, can we retire helper functions that were not good enough for your use case (if there are too many uses please ignore this last point).
381	This should be moved to a proper place in the memref dialect (maybe some utils file and maybe there is already something similar) ?
399	I'd use the name "contiguous" in the pattern name and def. in the description otherwise the doc by itself would have invalid assumptions.
432	same comment here re 0 and permutation map this case may be a bit trickier though
442	same comments as the above pattern
477	same comments re 0 and map

mravishankar added inline comments. Dec 10 2021, 10:55 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
243	There might be a simpler way to do this. `SubViewOp` already has constructors to generate rank-reduced subviews. You can use the `inferRankReducedSubview` method [here](https://github.com/llvm/llvm-project/blob/7f09aee0f6b4b00508d2cf86b0b1339c8d2ca2d1/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td#L1488) and get the result type and use that.

Apply Mahesh's suggestion to use inferRankReducedResultType.

Harbormaster completed remote builds in B138894: Diff 393792. Dec 12 2021, 7:18 PM

Benoit marked 15 inline comments as done. Dec 12 2021, 7:29 PM

Benoit marked 2 inline comments as done. Dec 12 2021, 7:35 PM

More review comments.

Harbormaster completed remote builds in B138897: Diff 393795. Dec 12 2021, 8:11 PM

Benoit marked 7 inline comments as done. Dec 12 2021, 8:11 PM

Hi Nicolas, thanks for the kind supportive words here regarding the usefulness of these patterns/helpers.

I think I've addressed the "make it correct" part of your comments, in particular, allZeroConstantIndexValues now checks that the indices of the transfer ops are really all zeros. And I've applied the other "localized" comments that you and Mahesh had (thanks again for those! in particular, Mahesh's comment allowed dropping ~20 lines of code by making dropUnitDims trivial).

I haven't yet addressed your comments about splitting this into multiple commits and about contributing these helpers to core headers:

Regarding splitting into multiple commits: note that these patterns are so far only enabled by test-only flags in TestVectorTransforms.cpp, so for now they are not going to affect anything outside these tests. If I understand correctly, the place where granularity will affect how well people can bisect any issues will be in how we eventually enable these patterns outside of these tests?
Regarding sharing helpers into core headers: I wonder if this would be best done anyway as a second step after this, and maybe even delay a little further to wait until a second use case arises from someone else, to get more context before blessing a particular helper into a core header? If you feel that you already have enough context to make this call now, that may be a sign that you or someone else with this experience, not I, should be making this move :-)
As you guessed, I am at this point looking for a time-economical way to wrap up this work :-)

rebased

Harbormaster completed remote builds in B138979: Diff 393908. Dec 13 2021, 8:45 AM

nicolasvasilache accepted this revision. Dec 13 2021, 11:50 AM

This revision is now accepted and ready to land. Dec 13 2021, 11:50 AM

nicolasvasilache mentioned this in rG0aea49a73083: [mlir][Vector] Patterns flattening vector transfers to 1D. Dec 13 2021, 1:50 PM

Sliced a first independent commit and landed as 0aea49a7308322e6987c7b45e4e0d7ab15609e78 as we discussed offline to help you land this.

Closed by commit rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D (authored by Benoit, committed by nicolasvasilache). Dec 13 2021, 2:42 PM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D.

Herald added a reviewer: rriddle. Dec 13 2021, 2:42 PM

Second part landed as aba437ceb2379f219935b98a10ca3c5081f0c8b7.

Note that I reduced the amount of tests as the combination did not bring much IMO, feel free to disagree and revive some of those.
Also, note that there are now 2 populate functions to get the behavior you wanted; both of them should be called in sequence.

Benoit mentioned this in D119202: Add case to handle 0-D vectors in FlattenContiguousRowMajorTransferWritePattern and FlattenContiguousRowMajorTransferReadPattern. Feb 7 2022, 7:49 PM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Vector/VectorOps.h	2 lines
mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp	287 lines
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir	206 lines
mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp	21 lines

Diff 393908

mlir/include/mlir/Dialect/Vector/VectorOps.h

...
/// Collect a set of leading one dimension removal patterns.
///
/// These patterns insert vector.shape_cast to remove leading one dimensions
/// to expose more canonical forms of read/write/insert/extract operations.
/// With them, there are more chances that we can cancel out extract-insert
/// pairs or forward write-read pairs.
void populateCastAwayVectorLeadingOneDimPatterns(RewritePatternSet &patterns);

		void populateFlattenVectorTransferPatterns(RewritePatternSet &patterns);

/// Collect a set of patterns that bubble up/down bitcast ops.
///
/// These patterns move vector.bitcast ops to be before insert ops or after
/// extract ops where suitable. With them, bitcast will happen on smaller
/// vectors and there are more chances to share extract/insert ops.
void populateBubbleVectorBitCastOpPatterns(RewritePatternSet &patterns);

/// Collect a set of transfer read/write lowering patterns.
...

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp

//===- VectorTransferOpTransforms.cpp - transfer op transforms ------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements functions concerned with optimizing transfer_read and
// transfer_write ops.
//
//===----------------------------------------------------------------------===//
		#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Dialect/Vector/VectorOps.h"
#include "mlir/Dialect/Vector/VectorTransforms.h"
#include "mlir/Dialect/Vector/VectorUtils.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Dominance.h"
		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "vector-transfer-opt"

#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")

using namespace mlir;
...
void TransferOptimization::storeToLoadForwarding(vector::TransferReadOp read) {
}

LLVM_DEBUG(DBGS() << "Forward value from " << *lastwrite.getOperation()
<< " to: " << *read.getOperation() << "\n");
read.replaceAllUsesWith(lastwrite.vector());
opToErase.push_back(read.getOperation());
}

		/// Drops unit dimensions from the input MemRefType.
		static MemRefType dropUnitDims(MemRefType inputType) {
		ArrayRef<int64_t> none{};
		Type rankReducedType = memref::SubViewOp::inferRankReducedResultType(
		0, inputType, none, none, none);
		return canonicalizeStridedLayout(rankReducedType.cast<MemRefType>());
		}

		/// Creates a rank-reducing memref.subview op that drops unit dims from its
		/// input. Or just returns the input if it was already without unit dims.
		static Value rankReducingSubviewDroppingUnitDims(PatternRewriter &rewriter,
		mlir::Location loc,
		Value input) {
		MemRefType inputType = input.getType().cast<MemRefType>();
		assert(inputType.hasStaticShape());
		MemRefType resultType = dropUnitDims(inputType);
		if (resultType == inputType)
		return input;
		SmallVector<int64_t> subviewOffsets(inputType.getRank(), 0);
		SmallVector<int64_t> subviewStrides(inputType.getRank(), 1);
		return rewriter.create<memref::SubViewOp>(
		loc, resultType, input, subviewOffsets, inputType.getShape(),
		subviewStrides);
		}

		/// Returns the number of dims that aren't unit dims.
		static int getReducedRank(ArrayRef<int64_t> shape) {
		return llvm::count_if(shape, [](int64_t dimSize) { return dimSize != 1; });
		}

		/// Returns true if all values are `arith.constant 0 : index`
		static bool allZeroConstantIndexValues(ValueRange values) {
		for (Value value : values) {
		auto cst = value.getDefiningOp<arith::ConstantIndexOp>();
		if (!cst)
		return false;
		if (cst.value() != 0)
		return false;
		}
		return true;
		}

		/// Rewrites vector.transfer_read ops where the source has unit dims, by
		/// inserting a memref.subview dropping those unit dims.
		class TransferReadDropUnitDimsPattern
		: public OpRewritePattern<vector::TransferReadOp> {
		using OpRewritePattern<vector::TransferReadOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(vector::TransferReadOp transferReadOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferReadOp.getLoc();
		Value vector = transferReadOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferReadOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (!sourceType.hasStaticShape())
		return failure();
		if (sourceType.getNumElements() != vectorType.getNumElements())
		return failure();
		// TODO: generalize this pattern, relax the requirements here.
		if (transferReadOp.hasOutOfBoundsDim())
		return failure();
		if (!transferReadOp.permutation_map().isMinorIdentity())
		return failure();
		int reducedRank = getReducedRank(sourceType.getShape());
		if (reducedRank == sourceType.getRank())
		return failure(); // The source shape can't be further reduced.
		if (reducedRank != vectorType.getRank())
		return failure(); // This pattern requires the vector shape to match the
		// reduced source shape.
		if (!allZeroConstantIndexValues(transferReadOp.indices()))
		return failure();
		Value reducedShapeSource =
		rankReducingSubviewDroppingUnitDims(rewriter, loc, source);
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		SmallVector<Value> zeros(reducedRank, c0);
		auto identityMap = rewriter.getMultiDimIdentityMap(reducedRank);
		rewriter.replaceOpWithNewOp<vector::TransferReadOp>(
		transferReadOp, vectorType, reducedShapeSource, zeros, identityMap);
		return success();
		}
		};

		/// Rewrites vector.transfer_write ops where the "source" (i.e. destination) has
		/// unit dims, by inserting a memref.subview dropping those unit dims.
		class TransferWriteDropUnitDimsPattern
		: public OpRewritePattern<vector::TransferWriteOp> {
		using OpRewritePattern<vector::TransferWriteOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(vector::TransferWriteOp transferWriteOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferWriteOp.getLoc();
		Value vector = transferWriteOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferWriteOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (!sourceType.hasStaticShape())
		return failure();
		if (sourceType.getNumElements() != vectorType.getNumElements())
		return failure();
		// TODO: generalize this pattern, relax the requirements here.
		if (transferWriteOp.hasOutOfBoundsDim())
		return failure();
		if (!transferWriteOp.permutation_map().isMinorIdentity())
		return failure();
		int reducedRank = getReducedRank(sourceType.getShape());
		if (reducedRank == sourceType.getRank())
		return failure(); // The source shape can't be further reduced.
		if (reducedRank != vectorType.getRank())
		return failure(); // This pattern requires the vector shape to match the
		// reduced source shape.
		if (!allZeroConstantIndexValues(transferWriteOp.indices()))
		return failure();
		Value reducedShapeSource =
		rankReducingSubviewDroppingUnitDims(rewriter, loc, source);
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		SmallVector<Value> zeros(reducedRank, c0);
		auto identityMap = rewriter.getMultiDimIdentityMap(reducedRank);
		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
		transferWriteOp, vector, reducedShapeSource, zeros, identityMap);
		return success();
		}
		};

		static AffineExpr getOffsetExpr(MemRefType memrefType) {
		SmallVector<AffineExpr> strides;
		AffineExpr offset;
		LogicalResult res = getStridesAndOffset(memrefType, strides, offset);
		assert(succeeded(res));
		(void)res;
		(void)strides;
		return offset;
		}

		static MemRefType makeContiguousRowMajorMemRefType(MLIRContext *context,
		ArrayRef<int64_t> shape,
		Type elementType,
		AffineExpr offset) {
		AffineExpr canonical = makeCanonicalStridedLayoutExpr(shape, context);
		AffineExpr contiguousRowMajor = canonical + offset;
		AffineMap contiguousRowMajorMap =
		AffineMap::inferFromExprList({contiguousRowMajor})[0];
		return MemRefType::get(shape, elementType, contiguousRowMajorMap);
		}

		/// Helper determining if a memref is static-shape and contiguous-row-major
		/// layout, still allowing an arbitrary offset (unlike some existing similar
		/// functions).
		static bool isStaticShapeAndContiguousRowMajor(MemRefType memrefType) {
		if (!memrefType.hasStaticShape()) {
		return false;
		}
		AffineExpr offset = getOffsetExpr(memrefType);
		MemRefType contiguousRowMajorMemRefType = makeContiguousRowMajorMemRefType(
		// Inline comment (nicolasvasilache, Not Done): This is generally useful and should go to BuiltinTypes.h with proper doc (`///` prefix) plz. Also, can we retire helper functions that were not good enough for your use case (if there are too many uses please ignore this last point).
		memrefType.getContext(), memrefType.getShape(),
		memrefType.getElementType(), offset);
		return canonicalizeStridedLayout(memrefType) ==
		canonicalizeStridedLayout(contiguousRowMajorMemRefType);
		}

		/// Creates a memref.collapse_shape collapsing all of the dimensions of the
		/// input into a 1D shape.
		static Value collapseContiguousRowMajorMemRefTo1D(PatternRewriter &rewriter,
		mlir::Location loc,
		Value input) {
		Value rankReducedInput =
		rankReducingSubviewDroppingUnitDims(rewriter, loc, input);
		ShapedType rankReducedInputType =
		// Inline comment (nicolasvasilache, Not Done): This should be moved to a proper place in the memref dialect (maybe some utils file and maybe there is already something similar)?
		rankReducedInput.getType().cast<ShapedType>();
		if (rankReducedInputType.getRank() == 1)
		return rankReducedInput;
		ReassociationIndices indices;
		for (int i = 0; i < rankReducedInputType.getRank(); ++i)
		indices.push_back(i);
		return rewriter.create<memref::CollapseShapeOp>(
		loc, rankReducedInput, std::array<ReassociationIndices, 1>{indices});
		}

		/// Rewrites contiguous row-major vector.transfer_read ops by inserting
		/// memref.collapse_shape on the source so that the resulting
		/// vector.transfer_read has a 1D source. Requires the source shape to be
		/// already reduced i.e. without unit dims.
		class FlattenContiguousRowMajorTransferReadPattern
		: public OpRewritePattern<vector::TransferReadOp> {
		using OpRewritePattern<vector::TransferReadOp>::OpRewritePattern;

		// Inline comment (nicolasvasilache, Done): I'd use the name "contiguous" in the pattern name and def. in the description otherwise the doc by itself would have invalid assumptions.
		LogicalResult matchAndRewrite(vector::TransferReadOp transferReadOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferReadOp.getLoc();
		Value vector = transferReadOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferReadOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (vectorType.getRank() == 1 && sourceType.getRank() == 1)
		// Already 1D, nothing to do.
		return failure();
		if (!isStaticShapeAndContiguousRowMajor(sourceType))
		return failure();
		if (getReducedRank(sourceType.getShape()) != sourceType.getRank())
		// This pattern requires the source to already be rank-reduced.
		return failure();
		if (sourceType.getNumElements() != vectorType.getNumElements())
		return failure();
		// TODO: generalize this pattern, relax the requirements here.
		if (transferReadOp.hasOutOfBoundsDim())
		return failure();
		if (!transferReadOp.permutation_map().isMinorIdentity())
		return failure();
		if (transferReadOp.mask())
		return failure();
		if (!allZeroConstantIndexValues(transferReadOp.indices()))
		return failure();
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		auto identityMap1D = rewriter.getMultiDimIdentityMap(1);
		VectorType vectorType1d = VectorType::get({sourceType.getNumElements()},
		sourceType.getElementType());
		Value source1d =
		collapseContiguousRowMajorMemRefTo1D(rewriter, loc, source);
		Value read1d = rewriter.create<vector::TransferReadOp>(
		// Inline comment (nicolasvasilache, Done): same comment here re 0 and permutation map; this case may be a bit trickier though.
		loc, vectorType1d, source1d, ValueRange{c0}, identityMap1D);
		rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(
		transferReadOp, vector.getType().cast<VectorType>(), read1d);
		return success();
		}
		};

		/// Rewrites contiguous row-major vector.transfer_write ops by inserting
		/// memref.collapse_shape on the source so that the resulting
		/// vector.transfer_write has a 1D source. Requires the source shape to be
		// Inline comment (nicolasvasilache, Done): same comments as the above pattern.
		/// already reduced i.e. without unit dims.
		class FlattenContiguousRowMajorTransferWritePattern
		: public OpRewritePattern<vector::TransferWriteOp> {
		using OpRewritePattern<vector::TransferWriteOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(vector::TransferWriteOp transferWriteOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferWriteOp.getLoc();
		Value vector = transferWriteOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferWriteOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (vectorType.getRank() == 1 && sourceType.getRank() == 1)
		// Already 1D, nothing to do.
		return failure();
		if (!isStaticShapeAndContiguousRowMajor(sourceType))
		return failure();
		if (getReducedRank(sourceType.getShape()) != sourceType.getRank())
		// This pattern requires the source to already be rank-reduced.
		return failure();
		if (sourceType.getNumElements() != vectorType.getNumElements())
		return failure();
		// TODO: generalize this pattern, relax the requirements here.
		if (transferWriteOp.hasOutOfBoundsDim())
		return failure();
		if (!transferWriteOp.permutation_map().isMinorIdentity())
		return failure();
		if (transferWriteOp.mask())
		return failure();
		if (!allZeroConstantIndexValues(transferWriteOp.indices()))
		return failure();
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		auto identityMap1D = rewriter.getMultiDimIdentityMap(1);
		VectorType vectorType1d = VectorType::get({sourceType.getNumElements()},
		sourceType.getElementType());
		// Inline comment (nicolasvasilache, Done): same comments re 0 and map.
		Value source1d =
		collapseContiguousRowMajorMemRefTo1D(rewriter, loc, source);
		Value vector1d =
		rewriter.create<vector::ShapeCastOp>(loc, vectorType1d, vector);
		rewriter.create<vector::TransferWriteOp>(loc, vector1d, source1d,
		ValueRange{c0}, identityMap1D);
		rewriter.eraseOp(transferWriteOp);
		return success();
		}
		};

		} // namespace

		void mlir::vector::transferOpflowOpt(FuncOp func) {
		TransferOptimization opt(func);
		// Run store to load forwarding first since it can expose more dead store
		// opportunity.
		func.walk([&](vector::TransferReadOp read) {
		if (read.getShapedType().isa<MemRefType>())
		opt.storeToLoadForwarding(read);
		});
		opt.removeDeadOp();
		func.walk([&](vector::TransferWriteOp write) {
		if (write.getShapedType().isa<MemRefType>())
		opt.deadStoreOp(write);
		});
		opt.removeDeadOp();
		}

		void mlir::vector::populateFlattenVectorTransferPatterns(
		RewritePatternSet &patterns) {
		patterns
		.add<TransferReadDropUnitDimsPattern, TransferWriteDropUnitDimsPattern,
		FlattenContiguousRowMajorTransferReadPattern,
		FlattenContiguousRowMajorTransferWritePattern>(
		patterns.getContext());
		populateShapeCastFoldingPatterns(patterns);
		}
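
As an illustration of what `isStaticShapeAndContiguousRowMajor` and the rank-reduction precondition are checking, here is a minimal standalone sketch in plain C++ with no MLIR dependency. The helpers `rowMajorStrides`, `isContiguousRowMajor` and `reducedRank` are hypothetical names introduced for this example and are not functions from the patch. The patch itself compares canonicalized strided layouts rather than raw stride vectors, and the sketch assumes that `getReducedRank` counts the non-unit dimensions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the patch): canonical row-major strides of a
// fully static shape, e.g. {5, 4, 3, 2} gives {24, 6, 2, 1}.
std::vector<int64_t> rowMajorStrides(const std::vector<int64_t> &shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];
  return strides;
}

// Hypothetical helper: true if `strides` are exactly the row-major strides of
// `shape`. The offset is deliberately not an input: any offset is allowed.
bool isContiguousRowMajor(const std::vector<int64_t> &shape,
                          const std::vector<int64_t> &strides) {
  assert(shape.size() == strides.size() && "rank mismatch");
  return strides == rowMajorStrides(shape);
}

// Hypothetical helper: rank that remains once unit dimensions are dropped
// (assumed to mirror what getReducedRank computes in the patch).
int64_t reducedRank(const std::vector<int64_t> &shape) {
  int64_t rank = 0;
  for (int64_t dim : shape)
    if (dim != 1)
      ++rank;
  return rank;
}
```

For the shape 5x4x3x2 used in the tests, strides [24, 6, 2, 1] pass this check, whereas the `d0 * 25` layout of the non-contiguous test and the `d0 + d1 * 5 + ...` layout of the non-row-major test do not.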

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir

This file was added.

				// RUN: mlir-opt %s -test-vector-transfer-flatten-patterns -split-input-file | FileCheck %s

				func @transfer_read_flattenable(%arg : memref<5x4x3x2xi8>) -> vector<5x4x3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<5x4x3x2xi8>, vector<5x4x3x2xi8>
				return %v : vector<5x4x3x2xi8>
				}

				// CHECK-LABEL: func @transfer_read_flattenable
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8>
				// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<5x4x3x2xi8> into memref<120xi8>
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[COLLAPSED]]
				// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[READ1D]] : vector<120xi8> to vector<5x4x3x2xi8>
				// CHECK: return %[[VEC2D]]

				// -----

				func @transfer_read_flattenable_with_offset(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 24 + d1 * 6 + d2 * 2 + d3 + s0)>>) -> vector<5x4x3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 24 + d1 * 6 + d2 * 2 + d3 + s0)>>, vector<5x4x3x2xi8>
				return %v : vector<5x4x3x2xi8>
				// Inline comment (nicolasvasilache, Done): You could use the form "memref<4x3x2x1xi8, offset: ?, strides: [6, 2, 1, 1]>". Due to weird biases, it is currently implemented with an underlying affine_map, but the information may be clearer like this. Unfortunately it will still print with affine_map for now.
				// Inline comment (Benoit, author, Done): Thanks for the tip. For consistency between the MLIR code and the `CHECK`'s I will stick to the `affine_map<...>` form for now.
				}

				// CHECK-LABEL: func @transfer_read_flattenable_with_offset
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8
				// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[COLLAPSED]]
				// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[READ1D]] : vector<120xi8> to vector<5x4x3x2xi8>
				// CHECK: return %[[VEC2D]]

				// -----

				func @transfer_read_flattenable_with_offset_with_rank_reducing_subview(%arg : memref<1x1x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 6 + d2 * 2 + d3 + s0)>>) -> vector<3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<1x1x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 6 + d2 * 2 + d3 + s0)>>, vector<3x2xi8>
				return %v : vector<3x2xi8>
				}

				// CHECK-LABEL: func @transfer_read_flattenable_with_offset_with_rank_reducing_subview
				// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8
				// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
				// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[SUBVIEW]] {{.}}[0, 1]
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[COLLAPSED]]
				// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[READ1D]] : vector<6xi8> to vector<3x2xi8>
				// CHECK: return %[[VEC2D]]

				// -----

				func @transfer_read_flattenable_with_offset_with_rank_reducing_subview_and_no_collapse(%arg : memref<1x1x1x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 2 + d1 * 2 + d2 * 2 + d3 + s0)>>) -> vector<2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<1x1x1x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 2 + d1 * 2 + d2 * 2 + d3 + s0)>>, vector<2xi8>
				return %v : vector<2xi8>
				}

				// CHECK-LABEL: func @transfer_read_flattenable_with_offset_with_rank_reducing_subview_and_no_collapse
				// CHECK-SAME: %[[ARG:.+]]: memref<1x1x1x2xi8
				// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 1, 2] [1, 1, 1, 1]
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[SUBVIEW]]
				// CHECK: return %[[READ1D]]

				// -----

				func @transfer_read_nonflattenable_out_of_bounds(%arg : memref<5x4x3x2xi8>, %i : index) -> vector<5x4x3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%i, %c0, %c0, %c0], %cst {in_bounds = [false, true, true, true]} : memref<5x4x3x2xi8>, vector<5x4x3x2xi8>
				return %v : vector<5x4x3x2xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_out_of_bounds
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8>,
				// CHECK-SAME: %[[I:.+]]: index
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]][%[[I]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_read_nonflattenable_non_contiguous(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 6 + d2 * 2 + d3)>>) -> vector<5x4x3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 6 + d2 * 2 + d3)>>, vector<5x4x3x2xi8>
				return %v : vector<5x4x3x2xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_non_contiguous
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8,
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_read_nonflattenable_non_row_major(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 5 + d2 * 20 + d3 * 60)>>) -> vector<5x4x3x2xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 5 + d2 * 20 + d3 * 60)>>, vector<5x4x3x2xi8>
				return %v : vector<5x4x3x2xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_non_row_major
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8,
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_write_flattenable(%arg : memref<5x4x3x2xi8>, %vec : vector<5x4x3x2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<5x4x3x2xi8>, memref<5x4x3x2xi8>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8>,
				// CHECK-SAME: %[[VEC:.+]]: vector<5x4x3x2xi8>
				// CHECK-DAG: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<5x4x3x2xi8> into memref<120xi8>
				// CHECK-DAG: %[[VEC1D:.+]] = vector.shape_cast %[[VEC]] : vector<5x4x3x2xi8> to vector<120xi8>
				// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]

				// -----

				func @transfer_write_flattenable_with_offset(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 24 + d1 * 6 + d2 * 2 + d3 + s0)>>, %vec : vector<5x4x3x2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<5x4x3x2xi8>, memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 24 + d1 * 6 + d2 * 2 + d3 + s0)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable_with_offset
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8, {{.+}}>,
				// CHECK-SAME: %[[VEC:.+]]: vector<5x4x3x2xi8>
				// CHECK-DAG: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<5x4x3x2xi8, {{.+}}> into memref<120xi8, {{.+}}>
				// CHECK-DAG: %[[VEC1D:.+]] = vector.shape_cast %[[VEC]] : vector<5x4x3x2xi8> to vector<120xi8>
				// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]

				// -----

				func @transfer_write_flattenable_with_offset_with_rank_reducing_subview(%arg : memref<1x1x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 6 + d2 * 2 + d3 + s0)>>, %vec : vector<3x2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<3x2xi8>, memref<1x1x3x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 6 + d2 * 2 + d3 + s0)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable_with_offset_with_rank_reducing_subview
				// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8, {{.+}}>,
				// CHECK-SAME: %[[VEC:.+]]: vector<3x2xi8>
				// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
				// CHECK-DAG: %[[COLLAPSED:.+]] = memref.collapse_shape %[[SUBVIEW]] {{.}}[0, 1]{{.}} : memref<3x2xi8, {{.+}}> into memref<6xi8, {{.+}}>
				// CHECK-DAG: %[[VEC1D:.+]] = vector.shape_cast %[[VEC]] : vector<3x2xi8> to vector<6xi8>
				// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]

				// -----

				func @transfer_write_flattenable_with_offset_with_rank_reducing_subview_and_no_collapse(%arg : memref<1x1x1x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 2 + d1 * 2 + d2 * 2 + d3 + s0)>>, %vec : vector<2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<2xi8>, memref<1x1x1x2xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 2 + d1 * 2 + d2 * 2 + d3 + s0)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable_with_offset_with_rank_reducing_subview_and_no_collapse
				// CHECK-SAME: %[[ARG:.+]]: memref<1x1x1x2xi8, {{.+}}>,
				// CHECK-SAME: %[[VEC:.+]]: vector<2xi8>
				// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 1, 2] [1, 1, 1, 1]
				// CHECK: vector.transfer_write %[[VEC]], %[[SUBVIEW]]

				// -----

				func @transfer_write_nonflattenable_out_of_bounds(%arg : memref<5x4x3x2xi8>, %vec : vector<5x4x3x2xi8>, %i : index) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%i, %c0, %c0, %c0] {in_bounds = [false, true, true, true]} : vector<5x4x3x2xi8>, memref<5x4x3x2xi8>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_out_of_bounds
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8>,
				// CHECK-SAME: %[[VEC:.+]]: vector<5x4x3x2xi8>
				// CHECK-SAME: %[[I:.+]]: index
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]

				// -----

				func @transfer_write_nonflattenable_non_contiguous(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 6 + d2 * 2 + d3)>>, %vec : vector<5x4x3x2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg[%c0, %c0, %c0, %c0] : vector<5x4x3x2xi8>, memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 25 + d1 * 6 + d2 * 2 + d3)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_non_contiguous
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8,
				// CHECK-SAME: %[[VEC:.+]]: vector<5x4x3x2xi8>
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]

				// -----

				func @transfer_write_nonflattenable_non_row_major(%arg : memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>, %vec : vector<5x4x3x2xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg[%c0, %c0, %c0, %c0] : vector<5x4x3x2xi8>, memref<5x4x3x2xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_non_row_major
				// CHECK-SAME: %[[ARG:.+]]: memref<5x4x3x2xi8,
				// CHECK-SAME: %[[VEC:.+]]: vector<5x4x3x2xi8>
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]
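
The flattenable cases above rely on one fact: in a contiguous row-major buffer, walking the N-D index space in row-major order visits consecutive linear offsets, so a full-shape N-D transfer touches the same elements as a 1-D transfer of the same element count. The following standalone C++ sketch illustrates that arithmetic. `linearize` and `numElements` are hypothetical helper names introduced here for illustration; they are not part of the patch or of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: row-major linear offset of `indices` within `shape`,
// computed with Horner's scheme.
int64_t linearize(const std::vector<int64_t> &indices,
                  const std::vector<int64_t> &shape) {
  assert(indices.size() == shape.size() && "rank mismatch");
  int64_t linear = 0;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    assert(indices[i] >= 0 && indices[i] < shape[i] && "index out of bounds");
    linear = linear * shape[i] + indices[i];
  }
  return linear;
}

// Hypothetical helper: total element count of a static shape, i.e. the length
// of the 1-D vector that a flattened transfer would use.
int64_t numElements(const std::vector<int64_t> &shape) {
  int64_t count = 1;
  for (int64_t dim : shape)
    count *= dim;
  return count;
}
```

For `memref<5x4x3x2xi8>` the first index maps to offset 0 and the last index `[4, 3, 2, 1]` maps to offset 119, which is why the CHECK lines expect `memref<120xi8>` and `vector<120xi8>`.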

mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp

		// (577 preceding lines not shown; context: struct TestVectorReduceToContractPatternsPatterns)
		}
		void runOnFunction() override {
		RewritePatternSet patterns(&getContext());
		populateVectorReductionToContractPatterns(patterns);
		(void)applyPatternsAndFoldGreedily(getFunction(), std::move(patterns));
		}
		};

		struct TestFlattenVectorTransferPatterns
		: public PassWrapper<TestFlattenVectorTransferPatterns, FunctionPass> {
		StringRef getArgument() const final {
		return "test-vector-transfer-flatten-patterns";
		}
		StringRef getDescription() const final {
		return "Test patterns to rewrite contiguous row-major N-dimensional "
		"vector.transfer_{read,write} ops into 1D transfers";
		}
		void getDependentDialects(DialectRegistry &registry) const override {
		registry.insert<memref::MemRefDialect>();
		}
		void runOnFunction() override {
		RewritePatternSet patterns(&getContext());
		populateFlattenVectorTransferPatterns(patterns);
		(void)applyPatternsAndFoldGreedily(getFunction(), std::move(patterns));
		}
		};

		} // namespace

		namespace mlir {
		namespace test {
		void registerTestVectorLowerings() {
		PassRegistration<TestVectorToVectorLowering>();

		PassRegistration<TestVectorContractionLowering>();
		// (14 lines not shown)

		PassRegistration<TestVectorTransferLoweringPatterns>();

		PassRegistration<TestVectorMultiReductionLoweringPatterns>();

		PassRegistration<TestVectorTransferCollapseInnerMostContiguousDims>();

		PassRegistration<TestVectorReduceToContractPatternsPatterns>();

		PassRegistration<TestFlattenVectorTransferPatterns>();
		}
		} // namespace test
		} // namespace mlir

Revision Contents

Diff 393908
