This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Rethink fusion of linalg ops with reshape ops.
ClosedPublic

Authored by mravishankar on Oct 7 2020, 12:52 PM.

Details

Summary

The current fusion on tensors fuses reshape ops with generic ops by
linearizing the indexing maps of the fused tensor in the generic
op. This has some limitations:

  • It only works for static shapes.
  • The resulting indexing map contains a linearization that can potentially prevent fusion later on (for example, tile + fuse); see the sketch below.
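
A rough sketch of the second limitation, with hypothetical shapes: if a
generic op result of type tensor<?x?x4xf32> is collapsed by a reshape into
tensor<?x?xf32>, the linearization-based fusion rewrites the producer's
indexing map for that result roughly as follows.

  // Hypothetical indexing maps, sketch only.
  affine_map<(d0, d1, d2) -> (d0, d1, d2)>        // before fusion: a permutation
  affine_map<(d0, d1, d2) -> (d0, d1 * 4 + d2)>   // after fusion: the collapsed dims are linearized

The constant stride 4 is only available because the trailing dimension is
static, and the linearized expression is no longer a permutation, which is
what gets in the way of later transformations such as tile + fuse.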

Instead, try to fuse the reshape consumer (producer) with the generic op
producer (consumer) by expanding the dimensionality of the generic op
when the reshape is expanding (folding).
This approach conflicts with the linearization approach. Linearization is
kept as the default for now, but the approach added in this change will be
made the default after further experimentation.
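
A hedged sketch of the expansion direction, using the same hypothetical
shapes: instead of linearizing, the generic op is rewritten to operate at
the higher rank implied by the reshape's reassociation, so its indexing
maps stay permutations.

  // Hypothetical indexing maps, sketch only.
  affine_map<(d0, d1) -> (d0, d1)>          // generic op map at the original rank
  affine_map<(d0, d1, d2) -> (d0, d1, d2)>  // after expanding the fused dimension

Since no linearized expression is introduced, the fused op remains amenable
to later transformations such as tile + fuse.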

Diff Detail

Event Timeline

mravishankar created this revision.Oct 7 2020, 12:52 PM
mravishankar requested review of this revision.Oct 7 2020, 12:53 PM

First pass...

mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
325–326

nit: return early with if (!reshapeOp.getSrcType().hasStaticShape()) return false; to simplify the condition below.

385

I wonder about the case where the reshape is collapsing dims: how do we avoid linearization?

398

Very nice! And the assumption here is that, by pushing reshapes up the chain, the topmost reshape will be folded away (e.g. in IREE, or in general a reshape of arguments).

447

nit: s/folded/collapsed

nicolasvasilache added a comment.EditedOct 8 2020, 12:57 AM

I am unclear why this is implemented as part of fusion?
My thinking was that reshapes that expand are pulled upwards, reshapes that contract are pushed downwards.
Applying this reshape folding onto producer and consumer ops enables tiling and fusion in higher-D.

Inside the tiles, depending on the tile sizes and the contiguity aspects, we may or may not be able to just "reshape-without-copy" to convert a matmul in 4-D form to a regular 2-D matmul (think fusing the output of the last conv layer with the following linear layer).
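
As a concrete but hypothetical instance of that last example (shapes made
up for illustration): the output of a last conv layer is collapsed so that
the following linear layer becomes a plain 2-D matmul, assuming the layout
is contiguous enough for a copy-free reshape.

  // Hypothetical shapes, sketch only.
  //   conv output             : tensor<1x7x7x512xf32>
  //   collapse [0, 1, 2], [3] : tensor<1x7x7x512xf32> into tensor<49x512xf32>
  //   linear layer            : matmul of tensor<49x512xf32> with tensor<512x1000xf32> -> tensor<49x1000xf32>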

Can we revisit this to just introduce 2 patterns for each case and keep fusion separate from this?

mravishankar marked an inline comment as done.Oct 8 2020, 10:28 AM

I am unclear why this is implemented as part of fusion?
My thinking was that reshapes that expand are pulled upwards, reshapes that contract are pushed downwards.
Applying this reshape folding onto producer and consumer ops enables tiling and fusion in higher-D.

Inside the tiles, depending on the tile sizes and the contiguity aspects, we may or may not be able to just "reshape-without-copy" to convert a matmul in 4-D form to a regular 2-D matmul (think fusing the output of the last conv layer with the following linear layer).

Can we revisit this to just introduce 2 patterns for each case and keep fusion separate from this?

We can do this as folding, but the "fusion" added here conflicts with the existing fusion that uses "linearization", which folds a collapsing reshape with its producer by linearizing the collapsed dimensions in the indexing map of the fused tensor in the producer. That is the opposite of the pattern here, where a collapsing reshape gets fused with its consumer. (In the previous two statements you can swap "collapsing" with "expanding" and "producer" with "consumer" and they still hold.) So I think it makes sense to drop the linearization approach; I am just trying to stage that here. Here is what I plan:

  1. Land this change in its current state.
  2. Replace the reshape + (generic/indexed-generic) op fusion with patterns, both for the expansion approach here and the linearization approach. The "expansion" pattern can then be moved into a folding pattern, with the linearization patterns dropped from the fusion pass here. I just need to verify that this works in IREE (which is probably the main user of these fusions). So far that seems to be the case, but the switch will be less painful if it is staged a bit.

WDYT?
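
For reference, a minimal schematic of the two conflicting rewrite
directions described above (op names illustrative only, not actual IR):

  // Starting point:         producer_generic -> collapsing_reshape -> consumer_generic
  // Linearization patterns: fold the reshape into producer_generic (linearized indexing maps)
  // Expansion patterns:     fold the reshape into consumer_generic (consumer expanded to the higher rank)
  // Both pattern sets target the same reshape, hence the conflict.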

mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
385

They conflict, so eventually I want to deprecate linearization. No one is really using it, so maybe I will refactor these into separate patterns (they kind of already are). I will wait for resolution on Nicolas' comment.

398

That's right. The "collapsing" reshapes get pushed down the function to the returns, and the "expanding" reshapes get pushed all the way up to the function arguments.

asaadaldien accepted this revision.Oct 8 2020, 1:30 PM

L(very)GTM, thanks Mahesh! I will leave the pattern splitting/refactoring discussion to you and Nicolas.

This revision is now accepted and ready to land.Oct 8 2020, 1:30 PM

SGTM, I am fine with evolving after this CL.

Just added a small comment.
I am tight on time this week and will not have time to review more deeply.
I'll take a deeper pass when this gets rewritten as independent patterns.

Thanks for pushing this!

mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp
553

There is a clone in the interface LinalgStructuredOpsInterface.td.
Can we (improve and) reuse that?

mravishankar marked an inline comment as done.

Rebase

Just added a small comment.
I am tight on time this week and will not have time to review more deeply.
I'll take a deeper pass when this gets rewritten as independent patterns.

Thanks for pushing this!

I added a couple of patches on top of this that do the refactoring to use patterns (in particular: https://reviews.llvm.org/D89201). PTAL. I would like to squash and submit them in one go.

nicolasvasilache accepted this revision.Oct 14 2020, 7:59 AM

Please go ahead and squash merge once child comments are addressed.
Thanks for these improvements!