This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Add pattern to convert batch_matmul to matmul
AbandonedPublic

Authored by ThomasRaoux on Feb 17 2023, 12:57 PM.

Details

Summary

If the batch dimension of a batch matmul is 1, it can be converted to a
linalg.matmul. This avoids leftover reshape ops when converting a mix of
generic ops with batch matmul.

Diff Detail

Event Timeline

ThomasRaoux created this revision.Feb 17 2023, 12:57 PM
ThomasRaoux requested review of this revision.Feb 17 2023, 12:57 PM
mravishankar requested changes to this revision.Feb 17 2023, 7:36 PM

Probably needs to be into NamedOpConversion pass

This revision now requires changes to proceed.Feb 17 2023, 7:36 PM
nicolasvasilache requested changes to this revision.Feb 20 2023, 1:21 AM

I am unclear why this special casing is needed...
I routinely convert higher-D ops into lower-D generic ops with rank-reducing patterns.
In my use cases, I am not relying on the named ops for further transformations / optimizations.

IMO this just points at the need to finally start that inverse generalization transform: linalg.generic -> linalg.named op.
I was still dragging my feet because I am still hoping we can do significantly better than special casing, but when the special casing leaks into N^2 land, it is time to just do it.

So bottom line, could you start a new file called Specialization.cpp with a functional transform API FailureOr<LinalgOp> specialize(linalg::GenericOp genericOp).

For this use case, you should only need:

if (isaContractionOpInterface(genericOp) && /*check indexing maps*/ && /*check reduction type*/)
  return rewriter.replaceOpWithNewOp<linalg::MatmulOp>(...);

That switch can grow over time and is the only place in the compiler where we need such logic.
This will compose with rank-reducing patterns that we have refined over time to properly use rank-reducing slices.


We are seeing quite a few problems related to named ops indeed. See the analysis made by Sean here:
https://github.com/iree-org/iree/issues/12214#issuecomment-1437481667

I'm not sure I understand what you are suggesting for this case, though. The problem is that when a matmul is surrounded by generic ops, the generic ops get reshaped but the named op does not, which leaves reshape ops in the graph that block further optimizations.
Are you saying the right fix is to change the existing pattern on linalg.generic to work on the LinalgOp interface, and therefore convert linalg.batch_matmul into linalg.generic?

That sounds like a good direction, but it means generalization would apply even when the user doesn't want it. The op may get specialized again, but in some cases it won't, which makes it harder for the user to handle all the cases.
Could you give a bit more detail on how you think the flow should work?

ThomasRaoux abandoned this revision.Feb 23 2023, 1:54 PM