This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

mlir/lib/Dialect/Bufferization/Transforms/Bufferize.cpp
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
mlir/test/Dialect/Tensor/bufferize.mlir
mlir/test/Dialect/Tensor/one-shot-bufferize.mlir

Differential D132355

[mlir][tensor][bufferize] Bufferize tensor.pad
Closed Public

Authored by springerm on Aug 22 2022, 1:41 AM.

Details

Reviewers

nicolasvasilache
mravishankar
tpopp

Commits

rG9ee12f477859: [mlir][tensor][bufferize] Bufferize tensor.pad

Summary

tensor.pad is lowered to tensor.generate + tensor.insert_slice during bufferization. For best performance with constant padding values, users should vectorize the IR before bufferizing it.

This change also relaxes the restriction that no new ops that bufferize to a memory write may be added during bufferization. Since bufferization was split into two steps a while ago (tensor copy insertion + bufferization), it is reasonable to allow this now.
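The lowering named in the summary can be modeled outside MLIR. The following is a minimal sketch in plain C++, not the MLIR implementation: `Matrix`, `generate`, `insertSlice`, and `pad` are hypothetical names introduced here for illustration, and the model is restricted to two dimensions with unit strides. It shows that each result dimension is low + source + high, and that the generate step evaluates the padding body for every element of the result, including those that the subsequent slice insertion overwrites.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 2-D stand-in for a ranked tensor; this is not an MLIR type.
using Matrix = std::vector<std::vector<long>>;
using Body = std::function<long(std::size_t, std::size_t)>;

// Models tensor.generate: every element of the rows x cols result is
// produced by calling the body with that element's indices.
Matrix generate(std::size_t rows, std::size_t cols, const Body &body) {
  Matrix result(rows, std::vector<long>(cols));
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      result[i][j] = body(i, j);
  return result;
}

// Models tensor.insert_slice with unit strides: copy src into dest,
// starting at (off0, off1), and return the updated copy of dest.
Matrix insertSlice(const Matrix &src, Matrix dest, std::size_t off0,
                   std::size_t off1) {
  for (std::size_t i = 0; i < src.size(); ++i)
    for (std::size_t j = 0; j < src[i].size(); ++j)
      dest[off0 + i][off1 + j] = src[i][j];
  return dest;
}

// Models the pad lowering: generate the full padded result first, then
// insert the source at the low-padding offsets.
Matrix pad(const Matrix &src, std::size_t low0, std::size_t low1,
           std::size_t high0, std::size_t high1, const Body &body) {
  std::size_t srcCols = src.empty() ? 0 : src[0].size();
  std::size_t rows = low0 + src.size() + high0; // low + source + high
  std::size_t cols = low1 + srcCols + high1;
  return insertSlice(src, generate(rows, cols, body), low0, low1);
}
```

In this model the body runs once per result element, so the cost of the generate step scales with the padded size rather than with the padding region alone.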

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

springerm created this revision. Aug 22 2022, 1:41 AM

Herald added a project: Restricted Project. Aug 22 2022, 1:41 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 18 others.

springerm requested review of this revision. Aug 22 2022, 1:41 AM

Herald added a project: Restricted Project. Aug 22 2022, 1:41 AM

Herald added a subscriber: stephenneuendorffer.

springerm added a reviewer: tpopp. Aug 22 2022, 1:46 AM

Harbormaster completed remote builds in B182526: Diff 454414. Aug 22 2022, 2:10 AM

Nice!

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
789–790	These methods recompute a vector of values every time, so ideally move them out of the loop. Alternatively, you could only obtain the dynamic values and keep a separate variable for indexing into them, but I like this more.
800	Would it be better to move over all attributes except the few that are PadOp specific? I've seen some place in MLIR provide a filter function for this.
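The hoisting suggested in the comment on lines 789–790 can be illustrated with a small standalone sketch. It does not use MLIR: `MockPadOp` is a hypothetical stand-in whose accessors rebuild a vector on every call (as the comment says the real ones do) and count how often they run, and `paddedSizes` is an illustrative helper name. Static integers replace the mixed static/dynamic values of the real op.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an op whose accessors build a fresh vector on
// each call. The counter exists only to make the cost visible.
struct MockPadOp {
  std::vector<std::int64_t> lowPad;
  std::vector<std::int64_t> highPad;
  mutable int accessorCalls = 0;

  std::vector<std::int64_t> getMixedLowPad() const {
    ++accessorCalls;
    return lowPad; // returns a copy, i.e. a newly built vector
  }
  std::vector<std::int64_t> getMixedHighPad() const {
    ++accessorCalls;
    return highPad;
  }
};

// Computes low + high + source size per dimension. Both accessors are
// called once before the loop instead of once per iteration.
std::vector<std::int64_t>
paddedSizes(const MockPadOp &op, const std::vector<std::int64_t> &srcSizes) {
  const std::vector<std::int64_t> low = op.getMixedLowPad();
  const std::vector<std::int64_t> high = op.getMixedHighPad();
  std::vector<std::int64_t> sizes;
  sizes.reserve(srcSizes.size());
  for (std::size_t i = 0; i < srcSizes.size(); ++i)
    sizes.push_back(low[i] + high[i] + srcSizes[i]);
  return sizes;
}
```

With the calls hoisted, the number of accessor invocations stays at two regardless of rank; calling them inside the loop would make it grow with the number of dimensions visited.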

This revision is now accepted and ready to land. Aug 22 2022, 7:34 AM

address comments

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
800	Not sure, it's not clear if these attributes would have any meaning on the GenerateOp. (Should they be added to the GenerateOp or the InsertSliceOp, or both?) The `escape` attribute is very specific to bufferization and only makes sense for the GenerateOp.

This revision was landed with ongoing or failed builds. Aug 22 2022, 8:05 AM

Closed by commit rG9ee12f477859: [mlir][tensor][bufferize] Bufferize tensor.pad (authored by springerm).

This revision was automatically updated to reflect the committed changes.

springerm added a commit: rG9ee12f477859: [mlir][tensor][bufferize] Bufferize tensor.pad.

Harbormaster completed remote builds in B182602: Diff 454507. Aug 22 2022, 9:22 AM

nicolasvasilache added inline comments. Aug 23 2022, 1:49 AM

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
746	Some comments and warnings that this will bufferize to tensor.generate which in turn will jump the abstraction gap and bufferize to loops. Also a // TODO: Reevaluate when we have a higher-level representation on buffers for generate we can avoid that jump.
775	This feels like it should be a first-class citizen of StaticValueUtils.
793	AffineApply of 2 symbols would generally be better and more composable.
811	Can we hoist this into a properly named helper in some util file close to a notional BuiltinTypeUtils.h?

springerm marked an inline comment as done. Aug 23 2022, 3:18 AM

springerm added inline comments.

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
775	That would require a dependency of StaticValueUtils (part of DialectUtils) on a dialect (Arith in this case).

tpopp added inline comments. Aug 23 2022, 3:49 AM

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp
811	I feel like there is a missing method that should exist on the Op similar to `getMixedLowPad()`

springerm marked 2 inline comments as done. Aug 23 2022, 11:58 PM

Revision Contents

Path	Size
mlir/lib/Dialect/Bufferization/Transforms/Bufferize.cpp	11 lines
mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp	86 lines
mlir/test/Dialect/Tensor/bufferize.mlir	33 lines
mlir/test/Dialect/Tensor/one-shot-bufferize.mlir	18 lines

Diff 454414

mlir/lib/Dialect/Bufferization/Transforms/Bufferize.cpp

@@ 345 lines not shown @@ void notifyOperationInserted(Operation *op) override {
     if (!hasTensorSemantics(op))
       return;

     // Skip ops that are not allowed to be bufferized.
     auto const &options = analysisState.getOptions();
     if (!options.isOpAllowed(op) || (opFilter && !opFilter->isOpAllowed(op)))
       return;

-#ifndef NDEBUG
-    // Read-only tensor ops may be created during bufferization. Ops that are
-    // writing should not be created because such ops were never analyzed.
-    // Bufferizing such ops could introduce a RaW conflict.
-    for (OpOperand &operand : op->getOpOperands())
-      if (operand.get().getType().isa<TensorType>())
-        assert(!analysisState.bufferizesToMemoryWrite(operand) &&
-               "creating tensor ops that bufferize to a memory write is not "
-               "allowed during bufferization");
-#endif // NDEBUG
-
     // Add op to worklist.
     worklist.push_back(op);
   }

 private:
   /// A set of all erased ops.
   DenseSet<Operation *> &erasedOps;

@@ 125 lines not shown @@

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp

 //===- BufferizableOpInterfaceImpl.cpp - Impl. of BufferizableOpInterface -===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "mlir/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.h"
 #include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"
 #include "mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h"
 #include "mlir/Dialect/Bufferization/IR/Bufferization.h"
 #include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/SCF/IR/SCF.h"
 #include "mlir/Dialect/Tensor/IR/Tensor.h"
+#include "mlir/Dialect/Utils/StaticValueUtils.h"
 #include "mlir/IR/Dialect.h"
 #include "mlir/IR/Operation.h"

 using namespace mlir;
 using namespace mlir::bufferization;
 using namespace mlir::tensor;

 namespace mlir {
@@ 710 lines not shown @@ LogicalResult bufferize(Operation *op, RewriterBase &rewriter,
     if (failed(options.createMemCpy(rewriter, loc, *srcMemref, subView)))
       return failure();

     replaceOpWithBufferizedValues(rewriter, op, *dstMemref);
     return success();
   }
 };

+/// Bufferization of tensor.pad. Replace with tensor.generate + insert_slice.
+/// For best performance, vectorize before bufferization (better performance in
+/// case of padding with a constant).
+struct PadOpInterface
+    : public BufferizableOpInterface::ExternalModel<PadOpInterface,
+                                                    tensor::PadOp> {
+  bool bufferizesToAllocation(Operation *op, OpResult opResult) const {
+    return true;
+  }
+
+  bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
+                              const AnalysisState &state) const {
+    return true;
+  }
+
+  bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
+                               const AnalysisState &state) const {
+    return false;
+  }
+
+  SmallVector<OpResult> getAliasingOpResult(Operation *op, OpOperand &opOperand,
+                                            const AnalysisState &state) const {
+    return {};
+  }
+
+  LogicalResult bufferize(Operation *op, RewriterBase &rewriter,
+                          const BufferizationOptions &options) const {
+    auto padOp = cast<tensor::PadOp>(op);
+    Location loc = padOp.getLoc();
+    RankedTensorType resultType = padOp.getResultType();
+    RankedTensorType srcType = padOp.getSourceType();
+
+    auto toValue = [&](OpFoldResult ofr) {
+      if (ofr.is<Value>())
+        return ofr.get<Value>();
+      return rewriter
+          .create<arith::ConstantIndexOp>(loc, *getConstantIntValue(ofr))
+          .getResult();
+    };
+
+    // Compute dynamic result dimensions.
+    SmallVector<Value> dynamicSizes;
+    for (int64_t i = 0; i < resultType.getRank(); ++i) {
+      if (!resultType.isDynamicDim(i))
+        continue;
+      Value srcDim = rewriter.create<tensor::DimOp>(loc, padOp.getSource(), i);
+      Value lowPad = toValue(padOp.getMixedLowPad()[i]);
+      Value highPad = toValue(padOp.getMixedHighPad()[i]);
+      Value s1 = rewriter.create<arith::AddIOp>(loc, lowPad, highPad);
+      Value s2 = rewriter.create<arith::AddIOp>(loc, s1, srcDim);
+      dynamicSizes.push_back(s2);
+    }
+
+    // Create tensor::GenerateOp.
+    auto generateOp =
+        rewriter.create<tensor::GenerateOp>(loc, resultType, dynamicSizes);
+    // Move over "escape" attribute if present.
+    if (padOp->hasAttr(BufferizationDialect::kEscapeAttrName))
+      generateOp->setAttr(
+          BufferizationDialect::kEscapeAttrName,
+          padOp->getAttr(BufferizationDialect::kEscapeAttrName));
+    // TODO: Memory space
+    rewriter.inlineRegionBefore(padOp.getRegion(), generateOp.getBody(),
+                                generateOp.getBody().begin());
+
+    // Create tensor::InsertSliceOp.
+    SmallVector<OpFoldResult> sliceSizes, sliceStrides;
+    for (int64_t i = 0; i < resultType.getRank(); ++i) {
+      sliceStrides.push_back(rewriter.getIndexAttr(1));
+      if (srcType.isDynamicDim(i)) {
+        Value size = rewriter.create<tensor::DimOp>(loc, padOp.getSource(), i);
+        sliceSizes.push_back(size);
+      } else {
+        sliceSizes.push_back(rewriter.getIndexAttr(srcType.getDimSize(i)));
+      }
+    }
+    rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(
+        padOp, padOp.getSource(), generateOp.getResult(),
+        /*offsets=*/padOp.getMixedLowPad(), sliceSizes, sliceStrides);
+
+    return success();
+  }
+};
+
 /// Bufferization of tensor.rank. Replace with memref.rank.
 struct RankOpInterface
     : public BufferizableOpInterface::ExternalModel<RankOpInterface,
                                                     tensor::RankOp> {
   bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
                               const AnalysisState &state) const {
     return true;
   }
@@ 227 lines not shown @@ registry.addExtension(+[](MLIRContext *ctx, tensor::TensorDialect *dialect) {
     DimOp::attachInterface<DimOpInterface>(*ctx);
     ExpandShapeOp::attachInterface<ExpandShapeOpInterface>(*ctx);
     ExtractSliceOp::attachInterface<ExtractSliceOpInterface>(*ctx);
     ExtractOp::attachInterface<ExtractOpInterface>(*ctx);
     FromElementsOp::attachInterface<FromElementsOpInterface>(*ctx);
     GenerateOp::attachInterface<GenerateOpInterface>(*ctx);
     InsertOp::attachInterface<InsertOpInterface>(*ctx);
     InsertSliceOp::attachInterface<InsertSliceOpInterface>(*ctx);
+    PadOp::attachInterface<PadOpInterface>(*ctx);
     ParallelInsertSliceOp::attachInterface<ParallelInsertSliceOpInterface>(
         *ctx);
     RankOp::attachInterface<RankOpInterface>(*ctx);
     ReshapeOp::attachInterface<ReshapeOpInterface>(*ctx);

     // Load additional dialects of which ops may get created.
     ctx->loadDialect<arith::ArithmeticDialect, scf::SCFDialect>();
   });
 }

mlir/test/Dialect/Tensor/bufferize.mlir

@@ 538 lines not shown @@ func.func @tensor.reshape(%t1: tensor<?x10xf32>) -> tensor<2x2x5xf32> {

   // CHECK: %[[reshaped:.*]] = memref.reshape %[[m1]](%[[alloc]]) : (memref<?x10xf32>, memref<3xi64>) -> memref<2x2x5xf32>
   %reshaped = tensor.reshape %t1(%shape) : (tensor<?x10xf32>, tensor<3xi64>) -> tensor<2x2x5xf32>

   // CHECK: %[[r:.*]] = bufferization.to_tensor %[[reshaped]]
   // CHECK: return %[[r]]
   return %reshaped : tensor<2x2x5xf32>
 }
+
+// -----
+
+// CHECK-LABEL: func @tensor.pad(
+// CHECK-SAME: %[[t1:.*]]: tensor<?x10xindex>, %[[l2:.*]]: index, %[[h1:.*]]: index, %[[h2:.*]]: index
+func.func @tensor.pad(%t1: tensor<?x10xindex>, %l2: index, %h1: index,
+                      %h2: index) -> tensor<?x?xindex> {
+  // CHECK-DAG: %[[m1:.*]] = bufferization.to_memref %[[t1]] : memref<?x10xindex>
+  // CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index
+  // CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index
+  // CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index
+  // CHECK-DAG: %[[dim0:.*]] = memref.dim %[[m1]], %[[c0]]
+  // CHECK-DAG: %[[dim1:.*]] = memref.dim %[[m1]], %[[c1]]
+  // CHECK-DAG: %[[pad0:.*]] = arith.addi %[[c5]], %[[h1]]
+  // CHECK-DAG: %[[size0:.*]] = arith.addi %[[pad0]], %[[dim0]]
+  // CHECK-DAG: %[[pad1:.*]] = arith.addi %[[l2]], %[[h2]]
+  // CHECK-DAG: %[[size1:.*]] = arith.addi %[[pad1]], %[[dim1]]
+  // CHECK: %[[alloc:.*]] = memref.alloc(%[[size0]], %[[size1]]) {{.*}} : memref<?x?xindex>
+  // CHECK: scf.parallel ({{.*}}) = (%[[c0]], %[[c0]]) to (%[[size0]], %[[size1]]) step (%[[c1]], %[[c1]]) {
+  // CHECK: memref.store
+  // CHECK: }
+  // CHECK: %[[subview:.*]] = memref.subview %[[alloc]][5, %[[l2]]] [%[[dim0]], 10] [1, 1]
+  // CHECK: memref.copy %[[m1]], %[[subview]]
+  %0 = tensor.pad %t1 low[5, %l2] high[%h1, %h2] {
+  ^bb0(%arg0: index, %arg1: index):
+    %m = arith.muli %arg0, %arg1 : index
+    tensor.yield %m : index
+  } : tensor<?x10xindex> to tensor<?x?xindex>
+
+  // CHECK: %[[r:.*]] = bufferization.to_tensor %[[alloc]]
+  // CHECK: return %[[r]] : tensor<?x?xindex>
+  return %0 : tensor<?x?xindex>
+}

mlir/test/Dialect/Tensor/one-shot-bufferize.mlir

@@ 230 lines not shown @@ func.func @dealloc_generate_buffer(%arg: tensor<*xf32>, %sz: index, %idx: index)
   %0 = tensor.generate %sz {
   ^bb0(%i : index):
     %elem = tensor.dim %arg, %i : tensor<*xf32>
     tensor.yield %elem : index
   } : tensor<?xindex>
   %r = tensor.extract %0[%idx] : tensor<?xindex>
   return %r : index
 }
+
+// -----
+
+// CHECK-LABEL: func @dealloc_pad_buffer
+func.func @dealloc_pad_buffer(%t1: tensor<?x10xindex>, %l2: index, %h1: index,
+                              %h2: index, %idx: index) -> index {
+  // CHECK: memref.alloc
+  // CHECK: scf.parallel
+  // CHECK: memref.load
+  // CHECK: memref.dealloc
+  %0 = tensor.pad %t1 low[5, %l2] high[%h1, %h2] {
+  ^bb0(%arg0: index, %arg1: index):
+    %m = arith.muli %arg0, %arg1 : index
+    tensor.yield %m : index
+  } : tensor<?x10xindex> to tensor<?x?xindex>
+  %r = tensor.extract %0[%idx, %idx] : tensor<?x?xindex>
+  return %r : index
+}