This is an archive of the discontinued LLVM Phabricator instance.

Differential D114993

Patterns flattening vector transfers to 1D
Closed · Public

Authored by Benoit on Dec 2 2021, 1:18 PM.

Details

Reviewers

aartbik
nicolasvasilache
mravishankar
rriddle

Commits

rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D

Summary

This is currently needed to get good codegen from 2D vector.transfer
ops that aim to compile to SIMD load/store instructions but can only
do so if the whole 2D transfer shape is handled in one piece, in
particular by taking advantage of the memref being contiguous row-major.

For instance, if the target architecture has 128-bit SIMD then we would
expect that contiguous row-major transfers of <4x4xi8> map to one SIMD
load/store instruction each.

The current generic lowering of multi-dimensional vector.transfer ops
can't achieve that because it peels dimensions one by one, so a transfer
of <4x4xi8> becomes 4 transfers of <4xi8>.
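
The arithmetic behind the <4x4xi8> example can be checked with a small standalone sketch. It does not use any MLIR API; the helper names `vectorSizeInBits` and `numTransfersAfterPeeling` are made up for illustration, and "peeling" is modeled simply as keeping only the innermost dimension in each resulting transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Total size in bits of a vector with the given static shape.
// Hypothetical helper, not part of MLIR.
int64_t vectorSizeInBits(const std::vector<int64_t> &shape,
                         int64_t elementBits) {
  assert(elementBits > 0 && "element width must be positive");
  int64_t numElements = std::accumulate(shape.begin(), shape.end(), int64_t{1},
                                        std::multiplies<int64_t>());
  return numElements * elementBits;
}

// Number of 1-D transfers left once every dimension except the innermost
// one has been peeled off. Hypothetical helper, not part of MLIR.
int64_t numTransfersAfterPeeling(const std::vector<int64_t> &shape) {
  int64_t count = 1;
  for (std::size_t i = 0; i + 1 < shape.size(); ++i)
    count *= shape[i];
  return count;
}
```

Under these assumptions, `vectorSizeInBits({4, 4}, 8)` is 128, which matches one 128-bit SIMD register, while `numTransfersAfterPeeling({4, 4})` is 4, matching the four <4xi8> transfers mentioned above.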

The new patterns here are only enabled for now by
-test-vector-transfer-flatten-patterns.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Benoit created this revision. · Dec 2 2021, 1:18 PM

Herald added subscribers: sdasgup3, wenzhicui, wrengr and 20 others. · Dec 2 2021, 1:18 PM

Herald added a reviewer: aartbik. · Dec 2 2021, 1:18 PM

Benoit requested review of this revision. · Dec 2 2021, 1:18 PM

Herald added a reviewer: nicolasvasilache. · Dec 2 2021, 1:18 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Benoit added a reviewer: mravishankar. · Dec 2 2021, 1:18 PM

Harbormaster completed remote builds in B137221: Diff 391443. · Dec 2 2021, 1:50 PM

mravishankar added inline comments. · Dec 2 2021, 2:35 PM

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3422	I think you can drop the dimsize != 1 condition. The `strides[i] != productOfInnerMostSizes` should still hold.
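
To make the condition under discussion concrete, here is a standalone sketch of the stride check on plain vectors. It mirrors the loop in the diff but is not the MLIR code: `isContiguousRowMajor` and the `ignoreUnitDims` flag are hypothetical, added only so both variants can be compared side by side.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if `strides` describe a contiguous row-major layout for
// `shape`. When `ignoreUnitDims` is true, the stride of a size-1 dimension
// is not checked. Hypothetical stand-in for the helper in the diff.
bool isContiguousRowMajor(const std::vector<int64_t> &shape,
                          const std::vector<int64_t> &strides,
                          bool ignoreUnitDims) {
  assert(shape.size() == strides.size() && "rank mismatch");
  int64_t productOfInnerMostSizes = 1;
  for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i) {
    bool skip = ignoreUnitDims && shape[i] == 1;
    if (!skip && strides[i] != productOfInnerMostSizes)
      return false;
    productOfInnerMostSizes *= shape[i];
  }
  return true;
}
```

In this sketch, shape 4x3x2x1 with strides [6, 2, 1, 1] passes with or without the unit-dim exemption. The two variants only differ when a unit dimension carries a non-canonical stride, for example [6, 2, 1, 7].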

Cool, thanks much for tackling this, glad to see that it seems to get you to reasonable assembly (inferring from your description?)

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3394	Can we be more specific here: `collapseContiguousMemRefCollapseTo1D` ?
3398	nit:trivial braces.
3408	This should be sliced a bit differently, better reusing the API around BuiltinTypes.h::450. Basically, `getStridesAndOffset` is the future-proof way to get the offset and strides. You want to check that the last stride is 1. You can have this as a special helper `bool isContiguousMostMinorDimension()` or `bool isStrideOneMostMinor(int dim)` (whichever you find most natural and reusable). Then you need to determine if the whole type is contiguous. For this you should add a helper that: (1) returns true for an empty or identity layout map (the weird trifecta I mentioned previously), (2) returns false if the type is not fully static, (3) performs a direct MemRefType comparison through proper usage of `getStridesAndOffset`, `makeStridedLinearLayoutMap`, `canonicalizeStridedLayout`. There should be something similar already using similar logic that can be refactored. Giving this helper a name and reusing it in multiple places will be a nice cleanup.
3411	Nit: we avoid trivial braces in LLVM, here and below
3419	nit: we use camelCase in LLVM, here and below.
3420	nit `dimSize != 1`
3443	nit: trivial braces here and below (lift the comment out of the condition)
3469	Nice!
3539	`VectorTransforms.cpp` is way too bloated and we should split it up (a bit like the standard dialect). Please either start a new properly named `Vector/XXXPatterns.cpp` file to put these new patterns or find an existing one.
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
22	You could use the form "memref<4x3x2x1xi8, offset: ?, strides: [6, 2, 1, 1]>". Due to weird biases, it is currently implemented with an underlying affine_map, but the information may be clearer like this. Unfortunately it will still print with affine_map for now.
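
The strided form and the affine_map form describe the same address computation: the linear position of an element is the offset plus the sum of index times stride over all dimensions. A minimal standalone sketch, assuming plain integer vectors rather than MLIR types (`linearIndex` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linear position of an element in a strided layout:
//   offset + sum_i indices[i] * strides[i]
// With strides [6, 2, 1, 1] this evaluates the same expression as
//   (d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 2 + d2 + d3 + s0).
int64_t linearIndex(const std::vector<int64_t> &indices,
                    const std::vector<int64_t> &strides, int64_t offset) {
  assert(indices.size() == strides.size() && "rank mismatch");
  int64_t result = offset;
  for (std::size_t i = 0; i < indices.size(); ++i)
    result += indices[i] * strides[i];
  return result;
}
```

For a 4x3x2x1 memref with zero offset, the last element (3, 2, 1, 0) lands at position 23, the final slot of the 24-element flattened buffer.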

nicolasvasilache requested changes to this revision. · Dec 3 2021, 7:47 AM

This revision now requires changes to proceed. · Dec 3 2021, 7:47 AM

add rank reducing subview to drop unit dims

Thanks for the review comments! I'll apply them on Monday. For now I've just updated this diff with the new idea of using rank-reducing subviews to drop unit dims (thanks @ThomasRaoux for the suggestion). That removes my need for https://reviews.llvm.org/D114821, so I can drop it now.

Benoit mentioned this in D114821: isReshapableDimBand: ignore strides of unit dims. · Dec 3 2021, 8:19 PM

Harbormaster completed remote builds in B137486: Diff 391804. · Dec 3 2021, 8:32 PM

nicolasvasilache requested changes to this revision. · Dec 6 2021, 1:03 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Vector/VectorTransforms.cpp
3393–3518	Please don't add more patterns to VectorTransforms.cpp, we need to split them out into better isolated logical units. Either add a new .cpp file at the same level with a proper name (see other XXXPatternXXX.cpp files) or put them in an already existing such file, depending on what is most appropriate.
3394	This should be its own pattern and return failure when it fails to apply.
3418	You could run this through clang-format. I locally have this in my `.bashrc`:
	function git-format-add-and-amend(){
	  echo "git add $1 && git show --name-only | egrep \".(\.cpp|\.h)\" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend"
	  git add $1 && git show --name-only | egrep ".(\.cpp|\.h)" | xargs -i clang-format --style=file -i {}; git add $1; git commit --amend
	}
3421	This should be its own pattern and return failure when it fails to apply.
3437	static bool isStaticShapeAndContiguousRowMajor(MemRefType memrefType) {
	  if (!memrefType.hasStaticShape())
	    return false;
	  int64_t offset;
	  SmallVector<int64_t> strides;
	  LogicalResult res = getStridesAndOffset(memrefType, strides, offset);
	  if (failed(res))
	    return false;
	  // You may want to improve the APIs here to minimize the code below to something
	  // that is expected to be reusable by others.
	  AffineExpr expr = makeCanonicalStridedLayoutExpr(
	      memrefType.getSizes(), memrefType.getContext());
	  MemRefType canonicalMemRefType = MemRefType::get(
	      memrefType.getSizes(), AffineMap::infer({expr}));
	  int64_t canonicalOffset;
	  SmallVector<int64_t> canonicalStrides;
	  res = getStridesAndOffset(
	      canonicalMemRefType, canonicalStrides, canonicalOffset);
	  if (failed(res))
	    llvm_unreachable("Unexpected stride extraction error");
	  for (auto it : llvm::zip(strides, canonicalStrides))
	    if (std::get<0>(it) != std::get<1>(it))
	      return false;
	  return true;
	}
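
The idea in the suggestion above (derive the canonical strides from the shape, then compare element by element) can be sketched without MLIR. This is only an illustration on plain vectors; `canonicalRowMajorStrides` and `matchesCanonicalRowMajorStrides` are hypothetical names and do not correspond to any MLIR helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Canonical row-major strides for a static shape: the innermost stride is 1
// and each outer stride is the product of all inner dimension sizes.
std::vector<int64_t>
canonicalRowMajorStrides(const std::vector<int64_t> &shape) {
  std::vector<int64_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i)
    strides[i] = strides[i + 1] * shape[i + 1];
  return strides;
}

// True if `strides` equal the canonical row-major strides of `shape`.
// The offset is deliberately not part of the comparison.
bool matchesCanonicalRowMajorStrides(const std::vector<int64_t> &shape,
                                     const std::vector<int64_t> &strides) {
  assert(shape.size() == strides.size() && "rank mismatch");
  return strides == canonicalRowMajorStrides(shape);
}
```

For shape 4x3x2x1 the canonical strides come out as [6, 2, 1, 1], the same values used in the test file's layout map.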

This revision now requires changes to proceed. · Dec 6 2021, 1:03 AM

apply some review comments

Harbormaster completed remote builds in B138644: Diff 393458. · Dec 10 2021, 5:56 AM

nicolasvasilache added inline comments. · Dec 10 2021, 6:30 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
213 ↗	(On Diff #393458)	you'll need comments on this function and everywhere below (ideally with a short IR example but not strictly necessary)

split into 2 patterns

Harbormaster completed remote builds in B138661: Diff 393479. · Dec 10 2021, 7:13 AM

Benoit added inline comments. · Dec 10 2021, 7:32 AM

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir
22	Thanks for the tip. For consistency between the MLIR code and the `CHECK`'s I will stick to the `affine_map<...>` form for now.

nicolasvasilache added inline comments. · Dec 10 2021, 7:34 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
213 ↗	(On Diff #393479)	Nit, top-level function and class comments take 3 slashes `///`
214 ↗	(On Diff #393479)	nit:camelCase
226 ↗	(On Diff #393479)	use std::copy_if or one of the other stl transforms?
234 ↗	(On Diff #393479)	nice!
245 ↗	(On Diff #393479)	nit: camelCase `Drop`
250 ↗	(On Diff #393479)	This would only work for the static memref cases. It feels like you should either early exit if the type is not fully static or use the OpFoldResult-based builders.
258 ↗	(On Diff #393479)	std::count_if or llvm::count_if IIRC

nicolasvasilache added inline comments. · Dec 10 2021, 8:14 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
268 ↗	(On Diff #393479)	@gysity, does this relate to the example we just discussed?

more review comments

Harbormaster completed remote builds in B138675: Diff 393499. · Dec 10 2021, 8:16 AM

Benoit marked 6 inline comments as done. · Dec 10 2021, 8:19 AM

Benoit added inline comments.

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
226 ↗	(On Diff #393479)	I didn't find a simple way to do that, because of how we need to create `reducedStrides`, not just `reducedShape`. While it would be possible to let `copy_if` create `reducedShape`, I thought that it was more readable if both vectors were created in the same way. In particular, it makes it plain that they have the same length, and that their i-th elements correspond to the same i-th dim.
250 ↗	(On Diff #393479)	The caller already has such an early exit, so I put assertions here.
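
The readability argument above (build `reducedShape` and `reducedStrides` in one loop so they stay aligned) can be illustrated with a standalone sketch. It works on plain vectors and is not the code from the revision; `ReducedLayout` and `dropUnitDims` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shape and strides that remain after removing all size-1 dimensions.
struct ReducedLayout {
  std::vector<int64_t> shape;
  std::vector<int64_t> strides;
};

// Builds both vectors in the same loop, so the i-th entries of the reduced
// shape and reduced strides always refer to the same dimension.
ReducedLayout dropUnitDims(const std::vector<int64_t> &shape,
                           const std::vector<int64_t> &strides) {
  assert(shape.size() == strides.size() && "rank mismatch");
  ReducedLayout reduced;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] == 1)
      continue;
    reduced.shape.push_back(shape[i]);
    reduced.strides.push_back(strides[i]);
  }
  return reduced;
}
```

A `std::copy_if` over the shape alone could produce `reduced.shape`, but the strides would need a second pass keyed on the shape values, which is the asymmetry the comment describes.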

gysit added a subscriber: gysit. · Dec 10 2021, 8:39 AM

gysit added inline comments.

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
268 ↗	(On Diff #393479)	I think @Benoit's revision addresses a similar topic but at a different level of the stack. I was looking into generating good transfer ops without switching between tensors and vectors in between and making sure rank-reduction works well. @Benoit optimizes the lowering of what we generate higher up in the stack by flattening the vectors, AFAIU. Looking forward to understanding the performance implications, but I could imagine this helps quite a bit in combination with hoisting, in particular for convolutions where we work on very high-dimensional vectors.

Thanks for pushing on this @Benoit !

I'd suggest slicing and dicing into smaller commits so we can better track if we ever need to bisect; some of the behavior is tricky to get right and the smaller the CLs + tests, the better we will be reviewing and revisiting in the future.
I think you can turn this into 4-5 commits that can then be more easily clicked.

Also, please don't forget about putting those outside of vectortransforms.cpp.

These are nice developments, sorry it is getting longer than what I think you initially signed up for but OTOH you're doing things in the right and future-proof way, so I am grateful!

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
298 ↗	(On Diff #393479)	This is incorrect, the original transferReadOp may have some permutation (i.e. also be sure to insert a proper test). You want some projection map and compose that with the transfer permutation. @ThomasRaoux for off-EU-hours advice. Ah nm, my apologies I see you have a permutation_map test above. Could you just add a comment/TODO that would highlight this / provision for future work?
299 ↗	(On Diff #393479)	zeros is invalid as we discussed on discord. You can't get around a projection map and applying it to values here.
337 ↗	(On Diff #393479)	same as above re projection map and zeros / identity map.
342 ↗	(On Diff #393479)	This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz.
352 ↗	(On Diff #393479)	This is generally useful and should go to BuiltinTypes.h with proper doc (`///` prefix) plz.
366 ↗	(On Diff #393479)	This is generally useful and should go to BuiltinTypes.h with proper doc (/// prefix) plz. Also, can we retire helper functions that were not good enough for your use case (if there are too many uses please ignore this last point).
380 ↗	(On Diff #393479)	This should be moved to a proper place in the memref dialect (maybe some utils file and maybe there is already something similar) ?
398 ↗	(On Diff #393479)	I'd use the name "contiguous" in the pattern name and def. in the description otherwise the doc by itself would have invalid assumptions.
431 ↗	(On Diff #393479)	same comment here re 0 and permutation map this case may be a bit trickier though
441 ↗	(On Diff #393479)	same comments as the above pattern
476 ↗	(On Diff #393479)	same comments re 0 and map

mravishankar added inline comments. · Dec 10 2021, 10:55 AM

mlir/lib/Dialect/Vector/VectorTransferOpTransforms.cpp
243 ↗	(On Diff #393499)	There might be a simpler way to do this. `SubViewOp` already has constructors to generate rank-reduced subviews. You can use the `inferRankReducedSubview` method [here](https://github.com/llvm/llvm-project/blob/7f09aee0f6b4b00508d2cf86b0b1339c8d2ca2d1/mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td#L1488) and get the result type and use that.

Apply Mahesh's suggestion to use inferRankReducedResultType.

Harbormaster completed remote builds in B138894: Diff 393792. · Dec 12 2021, 7:18 PM

Benoit marked 15 inline comments as done. · Dec 12 2021, 7:29 PM

Benoit marked 2 inline comments as done. · Dec 12 2021, 7:35 PM

More review comments.

Harbormaster completed remote builds in B138897: Diff 393795. · Dec 12 2021, 8:11 PM

Benoit marked 7 inline comments as done. · Dec 12 2021, 8:11 PM

Hi Nicolas, thanks for the kind supportive words here regarding the usefulness of these patterns/helpers.

I think I've addressed the "make it correct" part of your comments, in particular, allZeroConstantIndexValues now checks that the indices of the transfer ops are really all zeros. And I've applied the other "localized" comments that you and Mahesh had (thanks again for those! in particular, Mahesh's comment allowed dropping ~20 lines of code by making dropUnitDims trivial).
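
As a rough illustration of the "indices are really all zeros" check mentioned above, here is a standalone sketch. It is not the `allZeroConstantIndexValues` helper from the revision: each index is modeled as a `std::optional<int64_t>` that is empty when the value is not a known constant, and the name `allZeroConstantIndices` is hypothetical. It assumes C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Each index is either a known constant or unknown (std::nullopt).
// Returns true only if every index is a known constant equal to zero.
bool allZeroConstantIndices(
    const std::vector<std::optional<int64_t>> &indices) {
  return std::all_of(indices.begin(), indices.end(),
                     [](const std::optional<int64_t> &index) {
                       return index.has_value() && *index == 0;
                     });
}
```

An index that is unknown at compile time makes the check fail, which is the conservative choice for a pattern that replaces all indices by a single zero.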

I haven't yet addressed your comments about splitting this into multiple commits and about contributing these helpers to core headers:

Regarding splitting into multiple commits: note that these patterns are so far only enabled by test-only flags in TestVectorTransforms.cpp, so they are not for now going to affect anything outside these tests. If I understand correctly, the place where granularity will affect how well people can bisect any issues will be in how we eventually enable these patterns outside of these tests?
Regarding sharing helpers into core headers: I wonder if this would be best done anyway as a second step after this, and maybe even delay a little further to wait until a second use case arises from someone else, to get more context before blessing a particular helper into a core header? If you feel that you already have enough context to make this call now, that may be a sign that you or someone else with this experience, not I, should be making this move :-)
As you guessed, I am at this point looking for a time-economical way to wrap up this work :-)

rebased

Harbormaster completed remote builds in B138979: Diff 393908. · Dec 13 2021, 8:45 AM

nicolasvasilache accepted this revision. · Dec 13 2021, 11:50 AM

This revision is now accepted and ready to land. · Dec 13 2021, 11:50 AM

nicolasvasilache mentioned this in rG0aea49a73083: [mlir][Vector] Patterns flattening vector transfers to 1D. · Dec 13 2021, 1:50 PM

Sliced a first independent commit and landed as 0aea49a7308322e6987c7b45e4e0d7ab15609e78 as we discussed offline to help you land this.

Closed by commit rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D (authored by Benoit, committed by nicolasvasilache). · Dec 13 2021, 2:42 PM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rGaba437ceb237: [mlir][Vector] Patterns flattening vector transfers to 1D.

Herald added a reviewer: rriddle. · Dec 13 2021, 2:42 PM

Second part landed as aba437ceb2379f219935b98a10ca3c5081f0c8b7.

Note that I reduced the amount of tests as the combination did not bring much IMO, feel free to disagree and revive some of those.
Also, note that there are now 2 populate functions to get the behavior you wanted; both of them should be called in sequence.

Benoit mentioned this in D119202: Add case to handle 0-D vectors in FlattenContiguousRowMajorTransferWritePattern and FlattenContiguousRowMajorTransferReadPattern. · Feb 7 2022, 7:49 PM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Vector/VectorOps.h	2 lines
mlir/lib/Dialect/Vector/VectorTransforms.cpp	132 lines
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir	143 lines
mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp	21 lines

Diff 391443

mlir/include/mlir/Dialect/Vector/VectorOps.h

	/// Collect a set of leading one dimension removal patterns.
	///
	/// These patterns insert vector.shape_cast to remove leading one dimensions
	/// to expose more canonical forms of read/write/insert/extract operations.
	/// With them, there are more chances that we can cancel out extract-insert
	/// pairs or forward write-read pairs.
	void populateCastAwayVectorLeadingOneDimPatterns(RewritePatternSet &patterns);

				void populateFlattenVectorTransferPatterns(RewritePatternSet &patterns);

	/// Collect a set of patterns that bubble up/down bitcast ops.
	///
	/// These patterns move vector.bitcast ops to be before insert ops or after
	/// extract ops where suitable. With them, bitcast will happen on smaller
	/// vectors and there are more chances to share extract/insert ops.
	void populateBubbleVectorBitCastOpPatterns(RewritePatternSet &patterns);

	/// Collect a set of transfer read/write lowering patterns.

mlir/lib/Dialect/Vector/VectorTransforms.cpp

Value result = rewriter.create<vector::TransferReadOp>(
loc, resultTargetVecType, rankedReducedView,
readOp.indices().drop_back(dimsToDrop), permMap, readOp.padding(),
inBounds);
rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(readOp, targetType,
result);
return success();
}
};

		static Value collapseTo1D(PatternRewriter &rewriter, mlir::Location loc,
		Value input) {
		auto inputType = input.getType().cast<ShapedType>();
		ReassociationIndices indices;
		for (int i = 0; i < inputType.getRank(); ++i) {
		indices.push_back(i);
		}
		return rewriter.create<memref::CollapseShapeOp>(
		loc, input, std::array<ReassociationIndices, 1>{indices});
		}

		// Helper determining if a memref is static-shape and contiguous-row-major
		// layout, still allowing an arbitrary offset (unlike some existing similar
		// functions).
		static bool isStaticShapeAndContiguousRowMajor(MemRefType memrefType) {
		SmallVector<int64_t> strides;
		int64_t offset;
		if (!memrefType.hasStaticShape()) {
		return false;
		}
		if (failed(getStridesAndOffset(memrefType, strides, offset))) {
		return false;
		}
		int64_t productOfInnerMostSizes = 1;
		for (int i = memrefType.getRank() - 1; i >= 0; --i) {
		int64_t dimsize = memrefType.getDimSize(i);
		// The dimsize!=1 condition here means that we ignore the strides of
		// unit dims, as they don't make a practical difference.
		if (dimsize != 1 && strides[i] != productOfInnerMostSizes) {
		return false;
		}
		// This simple arithmetic is correct thanks to having ensured above that
		// we have a static shape.
		productOfInnerMostSizes *= dimsize;
		}
		return true;
		}

		class FlattenTransferReadPattern
		: public OpRewritePattern<vector::TransferReadOp> {
		using OpRewritePattern<vector::TransferReadOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(vector::TransferReadOp transferReadOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferReadOp.getLoc();
		Value vector = transferReadOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferReadOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (vectorType.getRank() == 1 && sourceType.getRank() == 1) {
		// Already 1D, nothing to do.
		return failure();
		}
		if (!isStaticShapeAndContiguousRowMajor(sourceType)) {
		return failure();
		}
		if (sourceType.getNumElements() != vectorType.getNumElements()) {
		return failure();
		}
		if (transferReadOp.hasOutOfBoundsDim()) {
		return failure();
		}
		if (!transferReadOp.permutation_map().isMinorIdentity()) {
		return failure();
		}
		if (transferReadOp.mask()) {
		return failure();
		}
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		auto identityMap1D = rewriter.getMultiDimIdentityMap(1);
		VectorType vectorType1d = VectorType::get({sourceType.getNumElements()},
		sourceType.getElementType());
		Value source1d = collapseTo1D(rewriter, loc, source);
		Value read1d = rewriter.create<vector::TransferReadOp>(
		loc, vectorType1d, source1d, ValueRange{c0}, identityMap1D);
		rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(
		transferReadOp, vector.getType().cast<VectorType>(), read1d);
		return success();
		}
		};

		class FlattenTransferWritePattern
		: public OpRewritePattern<vector::TransferWriteOp> {
		using OpRewritePattern<vector::TransferWriteOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(vector::TransferWriteOp transferWriteOp,
		PatternRewriter &rewriter) const override {
		auto loc = transferWriteOp.getLoc();
		Value vector = transferWriteOp.vector();
		VectorType vectorType = vector.getType().cast<VectorType>();
		Value source = transferWriteOp.source();
		MemRefType sourceType = source.getType().cast<MemRefType>();
		if (vectorType.getRank() == 1 && sourceType.getRank() == 1) {
		// Already 1D, nothing to do.
		return failure();
		}
		if (!isStaticShapeAndContiguousRowMajor(sourceType)) {
		return failure();
		}
		if (sourceType.getNumElements() != vectorType.getNumElements()) {
		return failure();
		}
		if (transferWriteOp.hasOutOfBoundsDim()) {
		return failure();
		}
		if (!transferWriteOp.permutation_map().isMinorIdentity()) {
		return failure();
		}
		if (transferWriteOp.mask()) {
		return failure();
		}
		Value c0 = rewriter.create<arith::ConstantIndexOp>(loc, 0);
		auto identityMap1D = rewriter.getMultiDimIdentityMap(1);
		VectorType vectorType1d = VectorType::get({sourceType.getNumElements()},
		sourceType.getElementType());
		Value source1d = collapseTo1D(rewriter, loc, source);
		Value vector1d =
		rewriter.create<vector::ShapeCastOp>(loc, vectorType1d, vector);
		rewriter.create<vector::TransferWriteOp>(loc, vector1d, source1d,
		ValueRange{c0}, identityMap1D);
		rewriter.eraseOp(transferWriteOp);
		return success();
		}
		};

void mlir::vector::populateVectorMaskMaterializationPatterns(
RewritePatternSet &patterns, bool indexOptimizations) {
patterns.add<VectorCreateMaskOpConversion,
MaterializeTransferMask<vector::TransferReadOp>,
MaterializeTransferMask<vector::TransferWriteOp>>(
patterns.getContext(), indexOptimizations);
}

void mlir::vector::populatePropagateVectorDistributionPatterns(
RewritePatternSet &patterns) {
patterns.add<PointwiseExtractPattern, ContractExtractPattern,
TransferReadExtractPattern, TransferWriteInsertPattern>(
patterns.getContext());
}

void mlir::vector::populateShapeCastFoldingPatterns(
RewritePatternSet &patterns) {
patterns.add<ShapeCastOpFolder>(patterns.getContext());
}

		void mlir::vector::populateFlattenVectorTransferPatterns(
		RewritePatternSet &patterns) {
		patterns.add<FlattenTransferReadPattern, FlattenTransferWritePattern>(
		patterns.getContext());
		populateShapeCastFoldingPatterns(patterns);
		}

void mlir::vector::populateBubbleVectorBitCastOpPatterns(
RewritePatternSet &patterns) {
patterns.add<BubbleDownVectorBitCastForExtract,
BubbleDownBitCastForStridedSliceExtract,
BubbleUpBitCastForStridedSliceInsert>(patterns.getContext());
}

void mlir::vector::populateVectorBroadcastLoweringPatterns(

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir

This file was added.

				// RUN: mlir-opt %s -test-vector-transfer-flatten-patterns -split-input-file | FileCheck %s

				func @transfer_read_flattenable(%arg : memref<4x3x2x1xi8>) -> vector<4x3x2x1xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<4x3x2x1xi8>, vector<4x3x2x1xi8>
				return %v : vector<4x3x2x1xi8>
				}

				// CHECK-LABEL: func @transfer_read_flattenable
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8>
				// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<4x3x2x1xi8> into memref<24xi8>
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[COLLAPSED]]
				// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[READ1D]] : vector<24xi8> to vector<4x3x2x1xi8>
				// CHECK: return %[[VEC2D]]

				// -----

				func @transfer_read_flattenable_with_offset(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 2 + d2 + d3 + s0)>>) -> vector<4x3x2x1xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 2 + d2 + d3 + s0)>>, vector<4x3x2x1xi8>
				nicolasvasilache: You could use the form "memref<4x3x2x1xi8, offset: ?, strides: [6, 2, 1, 1]>". Due to weird biases, it is currently implemented with an underlying affine_map, but the information may be clearer like this. Unfortunately it will still print with affine_map for now.
				Benoit: Thanks for the tip. For consistency between the MLIR code and the `CHECK` lines, I will stick to the `affine_map<...>` form for now.
				return %v : vector<4x3x2x1xi8>
				}

				// CHECK-LABEL: func @transfer_read_flattenable_with_offset
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8
				// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]
				// CHECK: %[[READ1D:.+]] = vector.transfer_read %[[COLLAPSED]]
				// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[READ1D]] : vector<24xi8> to vector<4x3x2x1xi8>
				// CHECK: return %[[VEC2D]]
				// -----
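
As a rough illustration of why the layout with strides [6, 2, 1, 1] and a dynamic offset is still flattenable, here is a standalone C++ sketch. It is not taken from the patch and does not use any MLIR API; `stridedAddress` and `layoutMatchesFlatIndex` are hypothetical names introduced only for this example. It evaluates the map `d0 * 6 + d1 * 2 + d2 + d3 + s0` over the whole 4x3x2x1 index space and checks that the k-th element in row-major order lands at address `s0 + k`, which is what reading the collapsed 24-element memref in one piece relies on.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): linear element address of a 4-D
// index under a strided layout with the given base offset.
std::int64_t stridedAddress(const std::array<std::int64_t, 4> &idx,
                            const std::array<std::int64_t, 4> &strides,
                            std::int64_t offset) {
  std::int64_t addr = offset;
  for (std::size_t i = 0; i < idx.size(); ++i)
    addr += idx[i] * strides[i];
  return addr;
}

// Walks the 4x3x2x1 index space in row-major order and checks that element
// number `flat` is stored at address `offset + flat` for strides [6, 2, 1, 1].
bool layoutMatchesFlatIndex(std::int64_t offset) {
  const std::array<std::int64_t, 4> strides = {6, 2, 1, 1};
  std::int64_t flat = 0;
  for (std::int64_t d0 = 0; d0 < 4; ++d0)
    for (std::int64_t d1 = 0; d1 < 3; ++d1)
      for (std::int64_t d2 = 0; d2 < 2; ++d2)
        for (std::int64_t d3 = 0; d3 < 1; ++d3, ++flat) {
          const std::array<std::int64_t, 4> idx = {d0, d1, d2, d3};
          if (stridedAddress(idx, strides, offset) != offset + flat)
            return false;
        }
  return flat == 24; // all 24 elements were visited
}
```

The offset only shifts every address by the same amount, so it does not affect whether the elements are contiguous.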

				func @transfer_read_nonflattenable_out_of_bounds(%arg : memref<4x3x2x1xi8>, %i : index) -> vector<4x3x2x1xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%i, %c0, %c0, %c0], %cst {in_bounds = [false, true, true, true]} : memref<4x3x2x1xi8>, vector<4x3x2x1xi8>
				return %v : vector<4x3x2x1xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_out_of_bounds
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8>,
				// CHECK-SAME: %[[I:.+]]: index
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]][%[[I]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_read_nonflattenable_non_contiguous(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 8 + d1 * 2 + d2 + d3)>>) -> vector<4x3x2x1xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 8 + d1 * 2 + d2 + d3)>>, vector<4x3x2x1xi8>
				return %v : vector<4x3x2x1xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_non_contiguous
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8,
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_read_nonflattenable_non_row_major(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>) -> vector<4x3x2x1xi8> {
				%c0 = arith.constant 0 : index
				%cst = arith.constant 0 : i8
				%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>, vector<4x3x2x1xi8>
				return %v : vector<4x3x2x1xi8>
				}

				// CHECK-LABEL: func @transfer_read_nonflattenable_non_row_major
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8,
				// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]]
				// CHECK: return %[[READ]]

				// -----

				func @transfer_write_flattenable(%arg : memref<4x3x2x1xi8>, %vec : vector<4x3x2x1xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<4x3x2x1xi8>, memref<4x3x2x1xi8>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8>,
				// CHECK-SAME: %[[VEC:.+]]: vector<4x3x2x1xi8>
				// CHECK-DAG: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<4x3x2x1xi8> into memref<24xi8>
				// CHECK-DAG: %[[VEC1D:.+]] = vector.shape_cast %[[VEC]] : vector<4x3x2x1xi8> to vector<24xi8>
				// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]

				// -----

				func @transfer_write_flattenable_with_offset(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 2 + d2 + d3 + s0)>>, %vec : vector<4x3x2x1xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] : vector<4x3x2x1xi8>, memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3)[s0] -> (d0 * 6 + d1 * 2 + d2 + d3 + s0)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_flattenable_with_offset
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8, {{.+}}>,
				// CHECK-SAME: %[[VEC:.+]]: vector<4x3x2x1xi8>
				// CHECK-DAG: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG]] {{.}}[0, 1, 2, 3]{{.}} : memref<4x3x2x1xi8, {{.+}}> into memref<24xi8, {{.+}}>
				// CHECK-DAG: %[[VEC1D:.+]] = vector.shape_cast %[[VEC]] : vector<4x3x2x1xi8> to vector<24xi8>
				// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]

				// -----

				func @transfer_write_nonflattenable_out_of_bounds(%arg : memref<4x3x2x1xi8>, %vec : vector<4x3x2x1xi8>, %i : index) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg [%i, %c0, %c0, %c0] {in_bounds = [false, true, true, true]} : vector<4x3x2x1xi8>, memref<4x3x2x1xi8>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_out_of_bounds
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8>,
				// CHECK-SAME: %[[VEC:.+]]: vector<4x3x2x1xi8>
				// CHECK-SAME: %[[I:.+]]: index
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]

				// -----

				func @transfer_write_nonflattenable_non_contiguous(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 8 + d1 * 2 + d2 + d3)>>, %vec : vector<4x3x2x1xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg[%c0, %c0, %c0, %c0] : vector<4x3x2x1xi8>, memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 * 8 + d1 * 2 + d2 + d3)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_non_contiguous
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8,
				// CHECK-SAME: %[[VEC:.+]]: vector<4x3x2x1xi8>
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]

				// -----

				func @transfer_write_nonflattenable_non_row_major(%arg : memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>, %vec : vector<4x3x2x1xi8>) {
				%c0 = arith.constant 0 : index
				vector.transfer_write %vec, %arg[%c0, %c0, %c0, %c0] : vector<4x3x2x1xi8>, memref<4x3x2x1xi8, affine_map<(d0, d1, d2, d3) -> (d0 + d1 * 4 + d2 * 12 + d3 * 24)>>
				return
				}

				// CHECK-LABEL: func @transfer_write_nonflattenable_non_row_major
				// CHECK-SAME: %[[ARG:.+]]: memref<4x3x2x1xi8,
				// CHECK-SAME: %[[VEC:.+]]: vector<4x3x2x1xi8>
				// CHECK: vector.transfer_write %[[VEC]], %[[ARG]]
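
The three layouts exercised by the tests above can be told apart with a simple stride check. The following is a minimal standalone C++ sketch under stated assumptions, not the check implemented in the patch: `isContiguousRowMajor` is a hypothetical helper that takes static shapes and strides as plain integer vectors and requires the innermost stride to be 1 and each outer stride to equal the product of the inner dimension sizes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check (an illustration, not the patch's code): a layout is
// treated as contiguous row-major when, going from the innermost dimension
// outwards, each stride equals the product of all inner dimension sizes.
bool isContiguousRowMajor(const std::vector<std::int64_t> &shape,
                          const std::vector<std::int64_t> &strides) {
  if (shape.empty() || shape.size() != strides.size())
    return false;
  std::int64_t expected = 1; // the innermost stride must be 1
  for (std::size_t i = shape.size(); i-- > 0;) {
    if (strides[i] != expected)
      return false;
    expected *= shape[i];
  }
  return true;
}
```

For shape 4x3x2x1 this accepts strides [6, 2, 1, 1] (the flattenable cases) and rejects [8, 2, 1, 1] (non-contiguous) and [1, 4, 12, 24] (not row-major). The sketch is deliberately strict: it also rejects unusual strides on unit-sized dimensions, which a real implementation might choose to tolerate.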

mlir/test/lib/Dialect/Vector/TestVectorTransforms.cpp

// ... (577 lines not shown; context: struct TestVectorReduceToContractPatternsPatterns) ...
}
void runOnFunction() override {
RewritePatternSet patterns(&getContext());
populateVectorReductionToContractPatterns(patterns);
(void)applyPatternsAndFoldGreedily(getFunction(), std::move(patterns));
}
};

		struct TestFlattenVectorTransferPatterns
		: public PassWrapper<TestFlattenVectorTransferPatterns, FunctionPass> {
		StringRef getArgument() const final {
		return "test-vector-transfer-flatten-patterns";
		}
		StringRef getDescription() const final {
		return "Test patterns to rewrite contiguous row-major N-dimensional "
		"vector.transfer_{read,write} ops into 1D transfers";
		}
		void getDependentDialects(DialectRegistry &registry) const override {
		registry.insert<memref::MemRefDialect>();
		}
		void runOnFunction() override {
		RewritePatternSet patterns(&getContext());
		populateFlattenVectorTransferPatterns(patterns);
		(void)applyPatternsAndFoldGreedily(getFunction(), std::move(patterns));
		}
		};

} // end anonymous namespace

namespace mlir {
namespace test {
void registerTestVectorLowerings() {
PassRegistration<TestVectorToVectorLowering>();

PassRegistration<TestVectorContractionLowering>();
// ... (14 lines not shown) ...

PassRegistration<TestVectorTransferLoweringPatterns>();

PassRegistration<TestVectorMultiReductionLoweringPatterns>();

PassRegistration<TestVectorTransferCollapseInnerMostContiguousDims>();

PassRegistration<TestVectorReduceToContractPatternsPatterns>();

		PassRegistration<TestFlattenVectorTransferPatterns>();
}
} // namespace test
} // namespace mlir

Revision Contents

Diff 391443