This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Support tensor.pad bufferization
Abandoned · Public

Authored by tpopp on Apr 4 2022, 6:07 AM.

Details

Summary

tensor.pad can be represented as an initialization of the result
with the padding values, followed by a strided insert of the input into
the output. To honor the invariants enforced by bufferization, the pad
operation creates bufferized forms of both of these steps at the same
time.
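
For illustration, here is a minimal sketch of that decomposition on a small static example. All shapes, constants, and value names below are illustrative assumptions, not taken from the patch:

    // The op being bufferized:
    //   %padded = tensor.pad %src low[1, 1] high[2, 2] {
    //   ^bb0(%i: index, %j: index):
    //     tensor.yield %pad_val : f32
    //   } : tensor<4x4xf32> to tensor<7x7xf32>

    // Sketch of the bufferized form: fill, then a strided copy into the interior.
    func.func @pad_sketch(%src: memref<4x4xf32>, %pad_val: f32) -> memref<7x7xf32> {
      %c0 = arith.constant 0 : index
      %c1 = arith.constant 1 : index
      %c7 = arith.constant 7 : index
      // Step 1: initialize the whole result buffer with the padding value.
      %alloc = memref.alloc() : memref<7x7xf32>
      scf.parallel (%i, %j) = (%c0, %c0) to (%c7, %c7) step (%c1, %c1) {
        memref.store %pad_val, %alloc[%i, %j] : memref<7x7xf32>
      }
      // Step 2: strided insert of the source into the interior of the result.
      %dst = memref.subview %alloc[1, 1] [4, 4] [1, 1]
          : memref<7x7xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
      memref.copy %src, %dst
          : memref<4x4xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
      return %alloc : memref<7x7xf32>
    }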

Diff Detail

Event Timeline

tpopp created this revision. Apr 4 2022, 6:07 AM
Herald added a project: Restricted Project. Apr 4 2022, 6:07 AM
tpopp requested review of this revision. Apr 4 2022, 6:07 AM
pifon2a accepted this revision. Apr 4 2022, 9:23 AM
pifon2a added inline comments.
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
746

nit: loc?

754–760

What about making this function part of the extraClassDeclaration for PadOp?

760

Cache the results of getMixedLowPad and getMixedHighPad, since they each construct a new vector?

771

super-nit: "maybeResult" and "result" are a bit confusing here. What about just calling it allocOr and alloc?

783

upperBounds.reserve(rank)?

785

nit: for (int i = 0, nextDynamicIndex = 0; i < rank; i++)?

873

nit: remove empty line?

This revision is now accepted and ready to land. Apr 4 2022, 9:23 AM
springerm added inline comments. Apr 4 2022, 9:29 AM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
867

The buffer of padOp.source() is not modified, so this can return false.

876

bufferize() always allocates a brand new buffer, right? In that case, always return {} in this function.

springerm accepted this revision. Apr 4 2022, 9:41 AM
springerm added inline comments.
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

See comment below. It would be cleaner to query the loop bounds with ReifyRankedShapedTypeOpInterface. This can be done in a separate commit, or add a TODO.

mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp
43

Why is this needed?

mlir/test/Dialect/Tensor/bufferize.mlir
379–382

Try this with a test case that has a dynamic result dim size. I think the bufferization will fail.

PadOp should implement ReifyRankedShapedTypeOpInterface, which it does not currently. Once it does, the dynamic bufferization test case should pass.

(Can be done in a separate commit.)
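
For reference, such a dynamic test case might look roughly like this (hypothetical function name and pad amounts, not taken from the patch):

    func.func @tensor_pad_dynamic(%t: tensor<?x?xf32>, %l: index, %h: index,
                                  %pad: f32) -> tensor<?x?xf32> {
      %0 = tensor.pad %t low[%l, 0] high[0, %h] {
      ^bb0(%i: index, %j: index):
        tensor.yield %pad : f32
      } : tensor<?x?xf32> to tensor<?x?xf32>
      return %0 : tensor<?x?xf32>
    }

Bufferizing this requires computing the dynamic result dimension sizes, which is what ReifyRankedShapedTypeOpInterface provides.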

springerm added inline comments. Apr 4 2022, 11:47 PM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

You can try rebasing on D123108, which implements ReifyRankedShapedTypeOpInterface. That should simplify the index computation part.

bondhugula requested changes to this revision. Apr 5 2022, 12:31 AM
bondhugula added subscribers: ftynse, bondhugula.
bondhugula added inline comments.
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

As a general comment, I'm not sure why scf.parallel ops are being generated here instead of affine.parallel + affine.store -- these are trivially affine. And if you then run affine -> scf on this, you'll get exactly this scf output. So you have a strictly better path lowering this through affine.

I notice there are other lowerings that have chosen this as well, and that's just because the counter-pattern was missed during review and kept propagating.

CC: @ftynse @mehdi_amini
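
For illustration, the fill step in both forms (a sketch; %alloc, %cst, and the index constants are assumed to be defined nearby, and the static 7x7 shape is illustrative):

    // What the lowering in this patch emits:
    scf.parallel (%i, %j) = (%c0, %c0) to (%c7, %c7) step (%c1, %c1) {
      memref.store %cst, %alloc[%i, %j] : memref<7x7xf32>
    }

    // The trivially affine equivalent; running -lower-affine on it
    // produces exactly the scf.parallel form above:
    affine.parallel (%i, %j) = (0, 0) to (7, 7) {
      affine.store %cst, %alloc[%i, %j] : memref<7x7xf32>
    }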

This revision now requires changes to proceed. Apr 5 2022, 12:31 AM

springerm added inline comments. Apr 5 2022, 1:34 AM
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

E.g., tensor.generate also lowers to scf.parallel. I don't think there was any particular reason to implement it that way; when I last touched that bufferization pattern, I just did not think of lowering to affine and then to SCF.

springerm added inline comments. Apr 5 2022, 3:10 AM
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
769–770

Just noticed that the op already implements the interface via an external model: mlir::tensor::registerInferTypeOpInterfaceExternalModels.

bondhugula added inline comments. Apr 5 2022, 7:52 PM
mlir/test/Dialect/Tensor/bufferize.mlir
395–399

tensor -> affine -> scf would be the natural lowering and de-abstraction here. Lowering directly from tensor -> scf, whenever tensor -> affine is possible, is just the wrong direction and strictly worse in all possible ways. We shouldn't propagate this further, and we should fix the rest; they were just missed during review.

To follow on Uday's comment, this should not generate loops at all.
In linalg, you'd generate linalg.fill + subview + linalg.copy and later reuse the lowering to whichever loop flavor you want.

It seems like you are missing a memref.fill or memref.memset abstraction.
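
For illustration, a sketch of that linalg-level structure on a static example (shapes and names are illustrative, not from the patch):

    %alloc = memref.alloc() : memref<7x7xf32>
    // Fill the whole result with the padding value; loops are only
    // materialized later by whichever linalg lowering is chosen.
    linalg.fill ins(%pad_val : f32) outs(%alloc : memref<7x7xf32>)
    // Copy the source into the interior through a strided view.
    %interior = memref.subview %alloc[1, 1] [4, 4] [1, 1]
        : memref<7x7xf32> to memref<4x4xf32, strided<[7, 1], offset: 8>>
    linalg.copy ins(%src : memref<4x4xf32>)
                outs(%interior : memref<4x4xf32, strided<[7, 1], offset: 8>>)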

nicolasvasilache requested changes to this revision. Apr 6 2022, 1:32 AM
tpopp added a comment. May 12 2022, 1:19 AM

Sorry for delaying this for so long. I am instead proposing a TensorToLinalgPass here: https://reviews.llvm.org/D125384

I believe that bufferization should try to stay simple: tensor ops that access data are converted to the corresponding memref ops, and other operations just have their types converted from tensors to memrefs.

For lowerings like tensor.pad that involve additional control flow, as discussed here, I believe the lowering should not be considered part of bufferization; it should instead be a more principled lowering at another point in time. This separates the concerns of tensor operations that are closely tied to the tensor type from operations that are there for convenience but would be just as relevant somewhere like linalg.

tpopp abandoned this revision. May 23 2022, 7:12 AM