This helps expose more fusion opportunities.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1001: We had discussed a bunch of times in the past to move reshapes towards either function entry or return, depending on whether it is an expanding or contracting reshape, to make the "reshape"-free block of ops higher-dimensional. I thought @mravishankar had implemented some of those already. Could you please comment on how this overlaps with, complements or extends what (I think) exists?
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1001: This is something we discussed with @mravishankar on Discord. My understanding is that what has been done so far fuses the reshape into the elementwise op when the conversion increases the rank. I believe there is no solution for the case where we need to reduce the rank of the generic op. Here I always try to sink the reshape, because named ops are fused with generic users, so moving the reshape down allows removing the reshape between named ops and their users. Let me know if you think we should discuss this more.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1001: Ok, it seems I mistook wishful thinking for discussions that turned into code :)
Very cool, nice to see this line of work taking shape (pun intended :) )!
Feel free to ignore the refactoring comments for now if that sounds too painful; we'll have a bunch of refactoring and helper functions to add anyway once the various reshape ops are rebalanced.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
110: "Sinking an op" involves a notion of loop nest in my mind.

mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1068: Note: we discussed with @pifon2a and we are going to do some refactoring on the various reshape ops. The rationale is that the maps were provisioned to allow more advanced reindexings than just permutations, but in practice this is overkill and will be hard to use: composition with other ops will keep it simpler.
1086: Typo: inserting.
1088: This looks like it would be a good helper function on the reshape op too.
1102: Should this be refactored somewhere as helpers on linalg.reshape/linalg.generic?
Initial comments. Will do a detailed review soon.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1001: Sorry, I missed this. Will take a look at it. But at the outset this is the converse of the existing pattern that expands generic/indexed_generic to fold away reshapes. It always makes sense to expand dimensions and execute in higher dimensionality when possible. Once that is done, we need the converse: reduce the dimensionality of generic/indexed_generic operations to fold away more reshapes. So both are needed. Either way, I think there is a lot of opportunity to reuse/extend the logic used for the expansion mode here; this looks like a rewrite/replication of that functionality. Maybe there is something missing in the above; it would be good to unify these.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1033: This should work for IndexedGeneric too, right? I would suggest moving the actual logic into a function which takes LinalgOp as an argument (as is done for other functions here).
1037: Better to check: if (!op.hasTensorSemantics() || op.getNumParallelLoops() == op.getNumLoops()) return failure(); (see the skeleton sketch after this comment list).
1061: Kind of a very specific check. Actually, it would be better to separate the core logic of fusing a linalg.generic/indexed_generic operation with the linalg.tensor_reshape from the heuristics that decide when to do the fusion. This check is better off in the pattern itself, but outside of the core transformation.
1073: I am confused why there is a mergedDim and a removedDim. All dims are merged into another dim during collapse.
1102: I think a similar comment applies here. You can just use the constructor with SmallVector<ReassociationIndices>.
1112: Don't need to use the maps. You can just use the build method with SmallVector<ReassociationIndices> directly (see the builder sketch after this comment list).
1130: Mentioned this offline too, but noting it here. AFAIK, the build method used here is for the case where the source is collapsed to get the dest. Here the src is expanded to get the dest, so the output type needs to be provided explicitly.
1491: Nit: would rename this to populateFoldReshapeOpsByCollapsingPatterns (it's the dual of populateFoldReshapeOpsByExpansionPatterns).

mlir/test/Dialect/Linalg/fusion-sink-reshape.mlir
100 (On Diff #338221): Add a dynamic shape test too?
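Referring to the comment at line 1037: a minimal sketch of where such an early-exit guard could sit in the pattern, assuming the usual MLIR rewrite-pattern setup. The pattern name and the elided fusion logic are placeholders; only the condition itself is taken from the comment.

```cpp
// Hypothetical pattern skeleton; only the guard comes from the comment at
// line 1037, everything else is a placeholder.
struct PushReshapeDownSketch : public OpRewritePattern<linalg::GenericOp> {
  using OpRewritePattern<linalg::GenericOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(linalg::GenericOp op,
                                PatternRewriter &rewriter) const override {
    // Early-exit guard as suggested above.
    if (!op.hasTensorSemantics() ||
        op.getNumParallelLoops() == op.getNumLoops())
      return failure();
    // ... locate the linalg.tensor_reshape operand and perform the fusion ...
    return failure();
  }
};
```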
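And for the comments at lines 1102/1112/1130: a minimal sketch of the two reshape builder flavors, assuming the SmallVector<ReassociationIndices>-based overloads referred to above; rewriter, loc, src and expandedType are placeholder names for the surrounding rewrite context.

```cpp
// Reassociation merging dims (0, 1) and (2, 3) of a 4-D tensor into 2-D;
// ReassociationIndices is essentially a vector<vector<int64_t>> entry.
SmallVector<ReassociationIndices> reassociation = {{0, 1}, {2, 3}};

// Collapsing direction: the collapsed result type can be inferred from the
// source type, so the builder without an explicit result type suffices.
Value collapsed =
    rewriter.create<linalg::TensorReshapeOp>(loc, src, reassociation);

// Expanding direction (comment at line 1130): the source is expanded to get
// the destination, so the expanded result type must be passed explicitly.
Value expanded = rewriter.create<linalg::TensorReshapeOp>(
    loc, expandedType, collapsed, reassociation);
```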
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
110: Changed the naming to push.

mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1033: Good point, I made it a template.
1061: This is just because I'm not inserting reshapes for the other operands, so I only check that all the reshape operands match. As discussed, I can extend it in a separate patch.
1068: It does seem much simpler. I changed the code to be based on SmallVector<ReassociationIndices> (as suggested by Mahesh below), which is basically a vector<vector<int>>. It should be easy to switch in the future.
1073: This was a bit confusing indeed. It was there to figure out which dimension was left after merging. I simplified this logic; now everything is based on the reassociation indices used as the reverse map, so this should be much easier to understand (see the sketch after this reply list).
1088: I removed this code and use the reshape builder instead.
1491: It doesn't really fold the reshape, so I'm not sure about calling it fold. I renamed it to push, but I can change it again if you think there is a better name.

mlir/test/Dialect/Linalg/fusion-sink-reshape.mlir
100 (On Diff #338221): Added a dynamic dim in the first test.
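To illustrate the reply at line 1073: a small sketch of using the reassociation indices as the reverse map, assuming each ReassociationIndices entry lists the original dimensions merged into one collapsed dimension; the helper name is made up for illustration.

```cpp
// Build the reverse map from each original dimension to the collapsed
// dimension it is merged into, e.g. [[0, 1], [2], [3, 4]] maps original
// dims 0..4 to collapsed dims {0, 0, 1, 2, 2}.
static SmallVector<int64_t>
getOrigDimToCollapsedDim(ArrayRef<ReassociationIndices> reassociation,
                         unsigned origRank) {
  SmallVector<int64_t> origToCollapsed(origRank);
  for (auto en : llvm::enumerate(reassociation))
    for (int64_t origDim : en.value())
      origToCollapsed[origDim] = en.index();
  return origToCollapsed;
}
```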
Logic looks fine. Just some points about code structure. Thanks and sorry for nitpicking a bit. Trying to keep a consistent flow across patterns in this file.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
1033: I'd still like it refactored though. The core logic of this has nothing specific to Generic or IndexedGeneric, so you can move the core logic to a method fuseWithReshapeByCollapsing(LinalgOp linalgOp, TensorReshapeOp reshapeOp, unsigned fusedTensorIndex, PatternRewriter &rewriter), similar to fuseWithReshapeByExpansion above. There is a separate method isFusable* above that checks all the conditions for the fusion to be valid. So the approach has been:

    bool isFusableWithReshapeByCollapsing(LinalgOp linalgOp,
                                          TensorReshapeOp reshapeOp, ...) {
      ...
    }

    static Optional<SmallVector<Value, 1>> fuseWithReshapeByCollapsing(...) {
      ...
    }

    template <typename GenericOpTy>
    struct FuseWithReshapeByCollapsing : OpRewritePattern<GenericOpTy> {
      LogicalResult matchAndRewrite(GenericOpTy genericOp,
                                    PatternRewriter &rewriter) const override {
        if (!isFusableWithReshapeByCollapsing(...))
          return failure();
        auto replacement = fuseWithReshapeByCollapsing(
            cast<LinalgOp>(genericOp.getOperation()), reshapeOp);
        if (!replacement)
          return failure();
        rewriter.replaceOp(genericOp, (*replacement)[0]);
        return success();
      }
    };

1065: I don't think this is necessary. You can stop at the first TensorReshapeOp you find that is fusable by collapsing (see the use of the isFusable... method above). You implement the fusion for that reshape -> generic op. If there are other reshape ops, they will be checked independently of the first match. It should still compose this way, and this extra logic is not needed (a small sketch follows below).
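A minimal sketch of the "stop at the first fusable reshape" approach described in the comment at line 1065, reusing the hypothetical isFusableWithReshapeByCollapsing name from the comment at line 1033; note that the follow-up below explains why this particular pattern instead ends up folding all reshape operands at once.

```cpp
// Scan the operands and return the index of the first one produced by a
// linalg.tensor_reshape that is fusable by collapsing, if any.
static Optional<unsigned> findFusableReshapeOperand(LinalgOp linalgOp) {
  for (auto en : llvm::enumerate(linalgOp.getOperation()->getOperands())) {
    auto reshapeOp = en.value().getDefiningOp<linalg::TensorReshapeOp>();
    if (!reshapeOp)
      continue;
    if (isFusableWithReshapeByCollapsing(linalgOp, reshapeOp, en.index()))
      return static_cast<unsigned>(en.index());
  }
  return llvm::None;
}
```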
After an offline conversation with Thomas, I realize I was missing one aspect: all reshapes need to be folded at the same time here, as opposed to other patterns where each reshape can be folded one at a time (this is because the pattern does not introduce new reshapes). So LGTM for now. Can revisit this later.
"Sinking an op" involves a notion of loop nest in my mind.
I'd prefer terminology such as "push" towards block end / "pull" towards block begin; other suggestions welcome.