The pass still drops unit dims of Linalg ops.
Repository: rG LLVM Github Monorepo
I tried this change with IREE and did not see any failures. (I did not run the various iree/tests/e2e tests.)
Please don't remove these patterns. If these patterns are interfering with things, please split them into two populate* methods.
These patterns prevent the opposite transforms from becoming canonicalization patterns.
Because of tensor::CollapseShapeOp::getCanonicalizationPatterns(patterns, context);.
```cpp
/// Patterns that are used to canonicalize the use of unit-extent dims for
/// broadcasting.
void mlir::linalg::populateFoldUnitExtentDimsPatterns(
    RewritePatternSet &patterns) {
  auto *context = patterns.getContext();
  patterns.add<FoldUnitDimLoops, AddInitOperandsToInput, ReplaceUnitExtents,
               RankReducedExtractSliceOp,
               RankReducedInsertSliceOp<tensor::InsertSliceOp>,
               RankReducedInsertSliceOp<tensor::ParallelInsertSliceOp>>(
      context);
  linalg::FillOp::getCanonicalizationPatterns(patterns, context);
  tensor::CollapseShapeOp::getCanonicalizationPatterns(patterns, context);
  tensor::EmptyOp::getCanonicalizationPatterns(patterns, context);
  tensor::ExpandShapeOp::getCanonicalizationPatterns(patterns, context);
  memref::populateResolveRankedShapeTypeResultDimsPatterns(patterns);
  memref::populateResolveShapedTypeResultDimsPatterns(patterns);
}
```
I'd like to understand if and why these patterns are needed.
Dropping unit dims with expand/collapse is very problematic for quite a few reasons; it even triggers cases where we cannot statically know how to lower to LLVM (@qcolombet has been digging here).
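As an illustration (a hypothetical example, not taken from the patch), an `expand_shape` whose reassociation group contains more than one dynamic dimension is the kind of ambiguous case alluded to above: there is no static rule for how to split the dynamic extent between the result dims.

```mlir
// Hypothetical example. Both result dims in the reassociation group
// [0, 1] are dynamic, so there is no static way to decide how %arg0's
// extent should be factored between them when lowering.
%0 = tensor.expand_shape %arg0 [[0, 1]]
    : tensor<?xf32> into tensor<?x?xf32>
```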
Short-term, I am not opposed to keeping those in a quarantined populate* for now if you have a compelling use case, but just be aware that they need to go and the expand/collapse verifier is likely going to be tightened to disallow the ambiguous cases, which will make these patterns fail in certain cases.
The right way to drop 1s is via insert/extract_slice, which also works generally with tiling.
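To make this concrete (an illustrative sketch with arbitrarily chosen types, not code from the patch), the two routes for dropping the unit dims of a value look like this:

```mlir
// Dropping the unit dims of %t : tensor<1x8x1xf32>.

// Via reshape (the problematic route):
%a = tensor.collapse_shape %t [[0, 1, 2]]
    : tensor<1x8x1xf32> into tensor<8xf32>

// Via a rank-reducing extract_slice (the preferred route); the unit
// dims are dropped because their static slice size is 1:
%b = tensor.extract_slice %t[0, 0, 0] [1, 8, 1] [1, 1, 1]
    : tensor<1x8x1xf32> to tensor<8xf32>
```

The extract_slice form composes with tiling because tiled loops already produce and consume slices.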
We're still gathering data but please share compelling use cases that we would need to take into account.
The fact that IREE does not seem affected by this is a strong data point that Matthias provides.
I am not even sure you looked at what the pattern is trying to do: it is trying to coax the program into using rank-reducing extracts and inserts. This is used in IREE precisely because the input program does NOT use rank-reducing extract/insert slices, and these patterns were trying to get there. The reshapes introduced serve one of two purposes: either to rematerialize the unit dims on the result of the slice to satisfy its users (these are cleaned up later), or to get the source of the insert slice into a shape that makes the tensor.insert_slice rank-reducing.
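As a sketch of the second case (hypothetical types, not from the patch), a reshape on the source lets the subsequent insert become rank-reducing:

```mlir
// Before: the insert carries the unit dims in its source type.
%0 = tensor.insert_slice %src into %dest[0, 0, 0] [1, 1, 8] [1, 1, 1]
    : tensor<1x1x8xf32> into tensor<4x4x8xf32>

// After: collapse the unit dims of the source, then use a
// rank-reducing tensor.insert_slice (static sizes 1 are dropped
// from the source rank).
%c = tensor.collapse_shape %src [[0, 1, 2]]
    : tensor<1x1x8xf32> into tensor<8xf32>
%1 = tensor.insert_slice %c into %dest[0, 0, 0] [1, 1, 8] [1, 1, 1]
    : tensor<8xf32> into tensor<4x4x8xf32>
```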
> We're still gathering data but please share compelling use cases that we would need to take into account.
> The fact that IREE does not seem affected by this is a strong data point that Matthias provides.
There might not be failures, but that doesn't mean they aren't affected. We don't track all the metrics in CI that might be affected (CI infra is still being built up to track all of these). These issues were uncovered while looking through models and reading through the IR coming from the input. So these changes might not show failures now but could show up later when evaluating models. CI on the IREE side needs to get better at catching material changes on patches (that's WIP), and lit tests don't allow you to check whole-model behavior.
I am really confused. Adding these as canonicalization patterns is a red flag for me. We've been burnt by adding too many things into canonicalizers; why is that same rubric not being applied in this case?