This is an archive of the discontinued LLVM Phabricator instance.

[mlir][MemRef] Canonicalize extract_strided_metadata(subview)
ClosedPublic

Authored by qcolombet on Sep 1 2022, 4:07 PM.

Details

Summary

Add a canonicalization step for extract_strided_metadata(subview).
The goal is to get rid of the subview while expressing its effects directly on the offset and strides of the base object.

In other words, this canonicalization replaces:

baseBuffer, offset, sizes, strides =
    extract_strided_metadata(
        subview(memref, subOffset, subSizes, subStrides))

With

baseBuffer, baseOffset, baseSizes, baseStrides =
    extract_strided_metadata(memref)
strides#i = baseStrides#i * subStrides#i
offset = baseOffset + sum(subOffset#i * baseStrides#i)
sizes = subSizes
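
For illustration (numbers not taken from the patch), consider a 2-D base memref with baseOffset = 0 and baseStrides = [16, 1], and a subview with subOffset = [2, 4], subSizes = [2, 2], and subStrides = [3, 2]. The rewrite then yields:

strides = [16 * 3, 1 * 2] = [48, 2]
offset  = 0 + 2 * 16 + 4 * 1 = 36
sizes   = [2, 2]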

Diff Detail

Event Timeline

qcolombet created this revision. Sep 1 2022, 4:07 PM
Herald added a project: Restricted Project. Sep 1 2022, 4:07 PM
qcolombet requested review of this revision. Sep 1 2022, 4:07 PM

Thanks for making progress on this @qcolombet !

Here is a first round of comments to reduce the impl size and make it more idiomatic.

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

We should have something in StaticValueUtils.h|cpp to hide away the if/else complexity and return an OpFoldResult.
Could you reuse or extend it?

1228 ↗(On Diff #457430)

Can we use a single AffineApplyOp here and below?
At the higher levels of abstraction this is a better way of representing such indexing computations (i.e. 1 op instead of N ops).

See e.g. https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp?L154
You can also improve the variadic process of defining symbols bound to a position by using something like: https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp?L281

1255 ↗(On Diff #457430)

Same as above; a lot of this could be compressed down to almost a one-liner.

1271 ↗(On Diff #457430)

Can this be factored out into a helper function that would live next to e.g. SubViewOp::inferResultType ?
It could return a vector of OpFoldResult to avoid passing a builder and then StaticValueUtils can be used or extended to materialize the constants here.

All this should then become much more idiomatic (and shorter).

mlir/test/Dialect/MemRef/canonicalize.mlir
868 ↗(On Diff #457430)

With the suggestion above, you'd only need to check a single affine.apply op here, with the properly captured AffineMap.

qcolombet added inline comments. Sep 2 2022, 12:08 PM
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

Thanks for the pointer!
I dug a little bit more, and I was struggling to find a generic way to describe the ValueRange we want to iterate on.

The high-level interface is OffsetSizeAndStrideOpInterface. I thought of using that, and then I stumbled on OffsetSizeAndStrideOpInterface::getMixedStrides(), which covers this and directly returns OpFoldResults.

So it looks like I may be able to avoid StaticValueUtils altogether.
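
For reference, a minimal sketch of what that looks like, assuming subview is the memref::SubViewOp feeding the extract_strided_metadata:

// The interface mixes the static attributes and the dynamic SSA operands of
// the subview into OpFoldResults, so no manual static/dynamic if/else is needed.
auto subOffsets = subview.getMixedOffsets();
auto subSizes = subview.getMixedSizes();
auto subStrides = subview.getMixedStrides();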

1228 ↗(On Diff #457430)

Thanks for that pointer as well.
Makes a lot of sense.

However, regarding 1 op vs. N ops: that's going to be true for the offset, but not for the strides.
Indeed, we still need to produce one value per stride, since these appear in the final result.

I need to take a closer look at how AffineApplyOps are built to avoid creating useless ones (when one of the strides is 1 or both strides are constant).

1271 ↗(On Diff #457430)

Thanks for the suggestion, looking!

qcolombet added inline comments. Sep 2 2022, 4:39 PM
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1228 ↗(On Diff #457430)

Alright, reporting back on this.

If we do this, the MemRef dialect will depend on the Affine dialect, which I am not sure is what the community wants in general.

On top of that, the Affine dialect already depends on the MemRef dialect, so we would create a circular dependency. That may not be a deal breaker, but I don't have enough MLIR experience at this point to know how it could be broken.

qcolombet updated this revision to Diff 457748. Sep 2 2022, 6:56 PM
  • Use getMixedXXX and getValueOrCreateConstant instead of manually handling the dynamic/static strides and sizes
  • Make the code more compact:
    • Compute the offset and strides in the same loop
    • Filter the sizes and strides dimensions in the same loop

On the "make the code more compact", we could go even further and populate the resulting strides and sizes directly at the same time that we compute the strides and offset.

I decided against it because I like the separation of concerns here, but I could be convinced otherwise. Similarly, I merged the offset and strides computations because it seems natural (and saved one loop), but if people prefer them separated, like in the original patch, that works for me too.

Hi @nicolasvasilache ,

Let me know what you want to do with respect to the Affine dialect. (Whether we should use it or not here.)

I feel that if we want to go in that direction, we would need to do this canonicalization somewhere else (i.e. out of the core MemRef dialect).
I am happy to go down that road if you feel that's the way to go, just tell me where this would fit.

For now, I've stuck with the arith expansions, with all the problems that come with them. (See the inline comments.)

Cheers,
-Quentin

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

I ended up giving up on OpFoldResult here because, unlike the AffineOps, the ArithmeticOps don't play nicely with that type.

Well, more precisely, you need to convert them to actual Values to be able to construct the arith operations, unless I missed something.
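
For the record, the conversion in question looks roughly like this (a sketch, assuming subStrides/origStrides hold OpFoldResults and a rewriter/loc are in scope); the helper is what ends up materializing the arith.constant ops:

// Each OpFoldResult has to become a Value before arith ops can consume it,
// which is where the extra constants come from.
Value subStrideVal =
    getValueOrCreateConstantIndexOp(rewriter, loc, subStrides[i]);
Value origStrideVal =
    getValueOrCreateConstantIndexOp(rewriter, loc, origStrides[i]);
Value newStride =
    rewriter.create<arith::MulIOp>(loc, subStrideVal, origStrideVal);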

1271 ↗(On Diff #457430)

I ended up populating the resulting strides and sizes in the same loop and I didn't see a good refactoring at this point.

I decided to go this way because we already have to loop through the dimensions to populate the strides.

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

Makes sense, thanks for trying!

Yes, the difference here is that AffineApply + AffineMap can represent constants as an attribute / part of the attribute, but this is not the case for individual arith ops.
Whatever you do, you have to materialize the constant, even when it folds perfectly.
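
A small contrast of the two, as a sketch (s0 is assumed bound via bindSymbols, subStride is assumed to be an index Value, and the constant 4 is arbitrary):

// Affine: the constant lives inside the expression, and the whole apply may
// even fold away to an attribute; nothing is materialized.
OpFoldResult folded =
    makeComposedFoldedAffineApply(rewriter, loc, s0 * 4, {subStride});

// Arith: the constant must exist as an SSA value before the multiply.
Value c4 = rewriter.create<arith::ConstantIndexOp>(loc, 4);
Value prod = rewriter.create<arith::MulIOp>(loc, subStride, c4);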

Side note: a lower-level indexing dialect with an affine_apply op in it would allow breaking unnecessary dependences, but the discussion churn trying to make progress on that has been too high; this is as good as we can do right now.

1228 ↗(On Diff #457430)

Fair enough.
Given that strides indeed require multiple ops anyway, this is fine.

For reference, one way to break cycles (but not applicable here) is to realize that canonicalization patterns don't necessarily need to be rooted on the consumer operation.
Here is one example: https://discourse.llvm.org/t/cross-dialect-folding-and-canonicalization/2740

The key insight (that does not apply here) is: "In this case, no dialect dependency (in the mlir registration/loading sense) is needed because your pattern does not produce ops from a different dialect."
Here we would produce an affine dialect op.

1271 ↗(On Diff #457430)

SG!

Not using AffineApplyOp triggers a different set of tradeoffs and this looks fine to me.

mlir/test/Dialect/MemRef/canonicalize.mlir
910 ↗(On Diff #457748)

In this example and below I believe some foldings and tests are missing.
The strides of %ARG are known, they are [64, 4, 1], so the return should be %[[C4]], %[[C1]].

It would be nice to also have some mixed tests where the function argument type has e.g. strides<?x2> etc.

You could have one fully dynamic test that spells out all the IR (adds, muls, etc.) and a few other tests that are just one-line checks verifying that the returns have the right constant in the proper place.

Actually, I forgot about some of the context in my last review, I am sorry... the patterns we are talking about here should probably not be canonicalization patterns, but rather patterns that we can apply with a specific new pass to fold away the subview ops in the presence of memref.extract_strided_metadata.
See https://reviews.llvm.org/D128986 for a PR that does these foldings with load/store operations.

I am unclear yet whether we want to add those patterns to the same pass or create a new pass but it seems possible to reuse and refactor pieces.
Helpers like getLinearAffineExpr can likely be reused and refactored to make more idiomatic use of OpFoldResult; emitting AffineApplyOp will be easier and not subject to cyclic dependences.

the patterns we are talking about here should probably not be canonicalization patterns but rather patterns that we can apply with a specific new pass to fold away the subview ops in the presence of memref.extract_strided_metadata.
See https://reviews.llvm.org/D128986 for a PR that does these foldings with load/store operations.

Thanks for the pointer!

I'll move the logic into that pass while using the affine ops, and then we can decide whether we want a new pass or not.

mlir/test/Dialect/MemRef/canonicalize.mlir
910 ↗(On Diff #457748)

Good point!
I was hoping the getStrides would do the right thing, but it doesn't.
Looks like we need to introduce another getter to get the attribute of the input memref, and not the produced values, in that case. (More precisely, I think we'll want a getter that returns an OpFoldResult.)

Looking.
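
The type-level query already exists; a minimal sketch of how it could be used here (subview being the producing memref::SubViewOp):

// Static strides/offset straight from the source type; dynamic entries come
// back as the ShapedType "dynamic" sentinel, and only those need to fall back
// to the values produced by extract_strided_metadata.
SmallVector<int64_t> sourceStrides;
int64_t sourceOffset;
if (failed(getStridesAndOffset(subview.getSourceType(), sourceStrides,
                               sourceOffset)))
  return failure();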

qcolombet updated this revision to Diff 458342. Sep 6 2022, 6:43 PM
  • Move the canonicalization into its own pass (and call it simplification!)
  • Use affine apply to expand the offset and strides computations
  • Add a test with everything being dynamic to demonstrate all the computations
qcolombet added inline comments. Sep 6 2022, 7:03 PM
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
89

Is it possible for the getStridesAndOffset method to fail on the source of a subview?

Put differently, how can I write a test for that part?

108

FWIW, I'm not super happy with this helper function, because we end up materializing (and then deleting later) constants.

I've tried to use the getStridesAndOffset function directly with the AffineExpr constructs, but I was not satisfied with the code.

Essentially, I would end up in the main loop below with something like:

AffineConstantExpr cst;

if (known && (cst = strideExpr.dyn_cast<AffineConstantExpr>())) {
  makeComposedFolded(s0 * cst, {subStride});
} else {
  makeComposedFolded(s0 * s1, {subStride, origStride});
}

The main problem here is that the number of operands is different (subStride vs. subStride, origStride), so getting rid of the if is not easy (at least I didn't see a nice way to do it.)

What I tried for that was something like:

expr = s0
operands = {subStride}
if (known && cst)
  expr = expr * cst
else {
  expr = expr * s1
  operands.push_back(origStride)
}
makeComposedFolded(expr, operands)

And I didn't find it particularly readable.

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
72

@nicolasvasilache I've kept some of the mixed dynamic/static tests.
Compared to the full dynamic test (extract_strided_metadata_of_subview_all_dynamic at the end of this file), these tests don't add much, but I feel they are more approachable than dissecting the huge affine map of the last test.

Let me know if you want to remove them nonetheless.

qcolombet added inline comments. Sep 6 2022, 7:06 PM
mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

Nit question: Is there a way to produce an affine map with a nice ordering of the arguments?
E.g., origOffset, stride0, subStride0, subOffset0, ...

Although the current map is correct, the arguments are all over the place :).

mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
70
SmallVector<Type> sizeStrideTypes(sourceRank, indexType);
89

If sourceType comes from a valid subview op, then you shouldn't be able to fail here.
You may want to assert success instead and simplify the code below.

108

I'm having trouble seeing it in this form, but there must be significantly simpler ways of writing this.
Let's apply the first round of cleanups and revisit?

121

Iteratively composing makes it hard to control your operands and indices here.

You could do something like:

SmallVector<OpFoldResult> newStrides;
for (unsigned i = 0; i < sourceRank; ++i) {
  newStrides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {getOrigStrideAtIdx(i), subStrides[i]}));
}

// One symbol for the original offset, plus one (origStride, subOffset) pair
// per dimension.
SmallVector<OpFoldResult> values;
values.reserve(2 * sourceRank + 1);
values.push_back(origOffset);
SmallVector<AffineExpr> symbols(2 * sourceRank + 1);

// Note: you may need to impl. bindSymbols yourself as the existing version
// works for variadic templates only.
bindSymbols(symbols, ctx);
AffineExpr expr = symbols.front();
for (unsigned i = 0; i < sourceRank; ++i) {
  expr = expr + symbols[1 + 2 * i] * symbols[1 + 2 * i + 1];
  values.push_back(getOrigStrideAtIdx(i));
  values.push_back(subOffsets[i]);
}

OpFoldResult finalOffset =
    makeComposedFoldedAffineApply(rewriter, origLoc, expr, values);
mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
223

Can you please manually reflow and align?

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

See my comment above; I believe it's because you compose AffineMaps iteratively.
You could create a flat list of AffineExpr with the proper expressions, match it to the proper values, and thus better control the alignment of values.
This would also resist canonicalizations (e.g., if two values are the same, the corresponding symbol gets dropped and the other ones are shifted, preserving the original alignment of the other values).

qcolombet updated this revision to Diff 458524. Sep 7 2022, 12:03 PM
  • Introduce a bindSymbolsList function (it had to get a different name because the compiler would otherwise be confused about which overload to pick); a possible shape is sketched after this list
  • Build the affine expr upfront and call makeComposedFoldedAffineApply just once
  • Reformat the tests manually to help readability
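
A possible shape for that list-based helper (sketch only; the exact signature in the patch may differ, since the in-tree variadic bindSymbols only works with a compile-time number of symbols):

// Bind symbol #i to position i for a dynamically sized list of AffineExprs.
static void bindSymbolsList(MLIRContext *ctx,
                            SmallVectorImpl<AffineExpr> &exprs) {
  for (unsigned i = 0, e = exprs.size(); i < e; ++i)
    exprs[i] = getAffineSymbolExpr(i, ctx);
}
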
qcolombet marked 2 inline comments as done. Sep 7 2022, 12:12 PM
qcolombet added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
89

I was suspecting something like that.

Thanks for the confirmation.

108

With the suggested cleanups it doesn't look bad.
We still create the constants and delete them on the fly though.
If you have ideas on how to fix that, that would be great!

121

Great!
The order makes much more sense now and, more importantly, is predictable.

Thanks!

BTW, I have one loop for the new strides and one loop for the offset. Let me know if you prefer that we merge both loops.

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

Works great!

qcolombet updated this revision to Diff 458528. Sep 7 2022, 12:13 PM
qcolombet marked an inline comment as done.
  • Nit: SmallVector initialize for indexType
qcolombet marked an inline comment as done. Sep 7 2022, 12:14 PM
nicolasvasilache accepted this revision. Sep 7 2022, 3:43 PM

Looks good, thanks!

mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
93

You can just use IntegerAttr Builder::getIndexAttr(int64_t value) to avoid materializing the constant.
This should work out of the box with OpFoldResult.

113

This feels like too many levels of indirection for no good reason.
How about something resembling:

for (unsigned i = 0; i < sourceRank; ++i) {
  OpFoldResult origStride =
      ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
          ? OpFoldResult(origStrides[i])
          : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
  strides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {subStrides[i], origStride}));
}
121

Same here, just spell it out with a ternary cond; the level of indirection does not pay for itself.

145

nit: arrray

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
104

typo

152

typo

This revision is now accepted and ready to land. Sep 7 2022, 3:43 PM
qcolombet added inline comments. Sep 7 2022, 4:15 PM
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
93

Ah great!
Looking.

113

Sounds like I should merge the loop for the strides and the loop for the offset then.
Otherwise I will repeat this in both loops:

OpFoldResult origStride = ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
    ? OpFoldResult(origStrides[i])
    : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
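
For reference, a minimal sketch of the merged loop under discussion (names as in the earlier snippets; origOffset is the offset produced on the source, and offsetExpr is the single affine expression built beforehand as symbols.front() plus the sum of per-dimension products):

SmallVector<OpFoldResult> strides;
SmallVector<OpFoldResult> values = {origOffset};
for (unsigned i = 0; i < sourceRank; ++i) {
  // Resolve the original stride once per dimension...
  OpFoldResult origStride =
      ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
          ? OpFoldResult(origStrides[i])
          : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
  // ...use it for the new stride (subStride * origStride)...
  strides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {subStrides[i], origStride}));
  // ...and as an operand of the single offset expression.
  values.push_back(subOffsets[i]);
  values.push_back(origStride);
}
OpFoldResult finalOffset =
    makeComposedFoldedAffineApply(rewriter, origLoc, offsetExpr, values);
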
qcolombet updated this revision to Diff 458600. Sep 7 2022, 4:33 PM
  • Remove lambdas and use ternary op instead
  • Use getIndexAttr instead of getValueOrCreateConstantIndexOp
  • Merge the offset computation loop and the strides computation loop
  • Fix typos
qcolombet marked 8 inline comments as done. Sep 7 2022, 4:38 PM
qcolombet added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
113

@nicolasvasilache I'm going to leave the PR open until tomorrow to give you a chance to see the merged loops.
(In other words, what I did to avoid the duplication of the ternary op for the original stride after getting rid of the lambda.)

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
104

Nice catch!

qcolombet updated this revision to Diff 458604. Sep 7 2022, 4:44 PM
qcolombet marked an inline comment as done.
  • Cleanup unused includes
nicolasvasilache accepted this revision. Sep 8 2022, 9:56 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
113

WFM, thanks!