This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tensor][bufferize] Support memory_space for tensor.pad
Closed · Public

Authored by springerm on Oct 19 2022, 9:03 AM.

Details

Reviewers

KoolJBlack
pifon2a
nicolasvasilache

Commits

rG09dfb4419397: [mlir][tensor][bufferize] Support memory_space for tensor.pad

Summary

This change adds memory space support to tensor.pad. (tensor.generate and tensor.from_elements do not support memory spaces yet.)

The memory space is inferred from the buffer of the source tensor.
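To make the inference rule concrete, here is a minimal standalone sketch. It does not use the MLIR API: `ToyBufferType` and `inferPadResultBufferType` are hypothetical stand-ins for `BaseMemRefType` and the `getBufferType` hook in the diff below. The assumption is that the result buffer takes its shape from the padded result type, its memory space from the source buffer, and a default (identity) layout, mirroring the null `MemRefLayoutAttrInterface` passed to `MemRefType::get` in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model of a memref type (not an MLIR class).
struct ToyBufferType {
  std::vector<int64_t> shape; // -1 models a dynamic dimension
  unsigned memorySpace;       // e.g. 3 in the one-shot-bufferize test
  bool identityLayout;        // true models a null (default) layout attribute
};

// Shape comes from the padded result type; memory space is copied from the
// source buffer; the layout is reset to the default.
ToyBufferType inferPadResultBufferType(const ToyBufferType &srcBuffer,
                                       const std::vector<int64_t> &resultShape) {
  assert(srcBuffer.shape.size() == resultShape.size() &&
         "padding preserves the rank");
  return ToyBufferType{resultShape, srcBuffer.memorySpace,
                       /*identityLayout=*/true};
}
```

With a source buffer in memory space 3 and a result shape of `{15}`, the sketch yields a 15-element buffer type in memory space 3, which is the shape of the `memref<15xf32, 3>` checked in the new test.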

Instead of lowering tensor.pad to tensor.generate + tensor.insert_slice, it is now lowered to bufferization.alloc_tensor (with the correct memory space) + linalg.map + tensor.insert_slice.
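The three steps of this lowering (allocate, fill every element from the pad body, insert the source slice) can be pictured with a small 1-D model. This is only a sketch under stated assumptions: `padLikeLowering` is a hypothetical helper operating on `std::vector`, not MLIR code, and the comments map each step to the op that, per the diff below, plays that role.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 1-D model of the lowering; not part of MLIR.
std::vector<float> padLikeLowering(
    const std::vector<float> &src, std::size_t low, std::size_t high,
    const std::function<float(std::size_t)> &body) {
  assert(body && "the padding body must be callable");
  // Step 1 (bufferization.alloc_tensor): allocate the padded result.
  // Its size is srcDim + lowPad + highPad, like the affine sum in the diff.
  std::vector<float> result(src.size() + low + high);
  // Step 2 (element-wise fill): evaluate the body at every index, including
  // positions that step 3 overwrites.
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = body(i);
  // Step 3 (tensor.insert_slice): copy the source at offset `low`, stride 1.
  for (std::size_t i = 0; i < src.size(); ++i)
    result[low + i] = src[i];
  return result;
}
```

For example, padding `{1, 2, 3}` with `low = 2`, `high = 1` and a body that always returns `0` produces six elements: two zeros, the source values, then one zero. The fill in step 2 touches the interior as well, which matches the diff's remark that only a part of the generated tensor is needed.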

Memory space support for the remaining two tensor ops is deferred, as it requires further design discussion.

Depends On D136767

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

springerm created this revision. Oct 19 2022, 9:03 AM

Herald added a project: Restricted Project. Oct 19 2022, 9:03 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 18 others.

springerm requested review of this revision. Oct 19 2022, 9:03 AM

Herald added a reviewer: nicolasvasilache. Oct 19 2022, 9:03 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

springerm mentioned this in D136064: [mlir][tensor][bufferize] Support memory_space on all tensor dialect ops. Oct 19 2022, 9:03 AM

update

Harbormaster completed remote builds in B193026: Diff 468943. Oct 19 2022, 9:29 AM

Awesome, thanks for setting this up.

This revision is now accepted and ready to land. Oct 24 2022, 3:09 PM

This triggers all sorts of red flags for me (lowering to loops directly during bufferization, increasing the number of uses of ToMemref/ToTensor, having to choose a loop type) and I don't see this as a good path forward.

This revision now requires changes to proceed. Oct 25 2022, 2:40 AM

address comments

springerm edited the summary of this revision. Oct 26 2022, 7:15 AM

springerm added a parent revision: D136767: [mlir][tensor][bufferize] Lower tensor.generate to linalg.map.

Harbormaster completed remote builds in B194414: Diff 470810. Oct 26 2022, 7:29 AM

Awesome, thanks for improving this!

This revision is now accepted and ready to land. Oct 27 2022, 12:42 AM

This revision was landed with ongoing or failed builds. Oct 27 2022, 3:30 AM

Closed by commit rG09dfb4419397: [mlir][tensor][bufferize] Support memory_space for tensor.pad (authored by springerm).

This revision was automatically updated to reflect the committed changes.

springerm added a commit: rG09dfb4419397: [mlir][tensor][bufferize] Support memory_space for tensor.pad.

Revision Contents

Path (Size)

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp (47 lines)
mlir/test/Dialect/Tensor/bufferize.mlir (8 lines)
mlir/test/Dialect/Tensor/one-shot-bufferize.mlir (28 lines)

Diff 471106

mlir/lib/Dialect/Tensor/Transforms/BufferizableOpInterfaceImpl.cpp

[773 lines not shown]	LogicalResult bufferize(Operation *op, RewriterBase &rewriter,
if (failed(options.createMemCpy(rewriter, loc, *srcMemref, subView)))		if (failed(options.createMemCpy(rewriter, loc, *srcMemref, subView)))
return failure();		return failure();

replaceOpWithBufferizedValues(rewriter, op, *dstMemref);		replaceOpWithBufferizedValues(rewriter, op, *dstMemref);
return success();		return success();
}		}
};		};

/// Bufferization of tensor.pad. Replace with tensor.generate + insert_slice.		/// Bufferization of tensor.pad. Replace with bufferization.alloc_tensor +
		/// linalg.map + insert_slice.
/// For best performance, vectorize before bufferization (better performance in		/// For best performance, vectorize before bufferization (better performance in
/// case of padding with a constant).		/// case of padding with a constant).
struct PadOpInterface		struct PadOpInterface
: public BufferizableOpInterface::ExternalModel<PadOpInterface,		: public BufferizableOpInterface::ExternalModel<PadOpInterface,
tensor::PadOp> {		tensor::PadOp> {
bool bufferizesToAllocation(Operation *op, OpResult opResult) const {		bool bufferizesToAllocation(Operation *op, OpResult opResult) const {
return true;		return true;
}		}

bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,		bool bufferizesToMemoryRead(Operation *op, OpOperand &opOperand,
const AnalysisState &state) const {		const AnalysisState &state) const {
return true;		return true;
}		}

bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,		bool bufferizesToMemoryWrite(Operation *op, OpOperand &opOperand,
const AnalysisState &state) const {		const AnalysisState &state) const {
return false;		return false;
}		}

SmallVector<OpResult> getAliasingOpResult(Operation *op, OpOperand &opOperand,		SmallVector<OpResult> getAliasingOpResult(Operation *op, OpOperand &opOperand,
const AnalysisState &state) const {		const AnalysisState &state) const {
return {};		return {};
}		}

		FailureOr<BaseMemRefType>
		getBufferType(Operation *op, Value value, const BufferizationOptions &options,
		const DenseMap<Value, BaseMemRefType> &fixedTypes) const {
		// Infer memory space from the source tensor.
		auto padOp = cast<tensor::PadOp>(op);
		auto maybeSrcBufferType =
		bufferization::getBufferType(padOp.getSource(), options, fixedTypes);
		if (failed(maybeSrcBufferType))
		return failure();
		MemRefLayoutAttrInterface layout;
		return MemRefType::get(padOp.getResultType().getShape(),
		padOp.getResultType().getElementType(), layout,
		maybeSrcBufferType->getMemorySpace());
		}

LogicalResult bufferize(Operation *op, RewriterBase &rewriter,		LogicalResult bufferize(Operation *op, RewriterBase &rewriter,
const BufferizationOptions &options) const {		const BufferizationOptions &options) const {
auto padOp = cast<tensor::PadOp>(op);		auto padOp = cast<tensor::PadOp>(op);
Location loc = padOp.getLoc();		Location loc = padOp.getLoc();
RankedTensorType resultType = padOp.getResultType();		RankedTensorType resultType = padOp.getResultType();
RankedTensorType srcType = padOp.getSourceType();		RankedTensorType srcType = padOp.getSourceType();

auto toValue = [&](OpFoldResult ofr) {		auto toValue = [&](OpFoldResult ofr) {
[17 lines not shown]	for (int64_t i = 0; i < resultType.getRank(); ++i) {
AffineExpr s0, s1, s2;		AffineExpr s0, s1, s2;
bindSymbols(op->getContext(), s0, s1, s2);		bindSymbols(op->getContext(), s0, s1, s2);
AffineExpr sumExpr = s0 + s1 + s2;		AffineExpr sumExpr = s0 + s1 + s2;
Value sum = rewriter.create<AffineApplyOp>(		Value sum = rewriter.create<AffineApplyOp>(
loc, sumExpr, ValueRange{srcDim, lowPad, highPad});		loc, sumExpr, ValueRange{srcDim, lowPad, highPad});
dynamicSizes.push_back(sum);		dynamicSizes.push_back(sum);
}		}

// Create tensor::GenerateOp.		// Should the buffer be deallocated?
auto generateOp =		bool dealloc =
rewriter.create<tensor::GenerateOp>(loc, resultType, dynamicSizes);		shouldDeallocateOpResult(padOp.getResult().cast<OpResult>(), options);
// Move over "escape" attribute if present.		// Allocate a buffer for the padded result.
if (padOp->hasAttr(BufferizationDialect::kEscapeAttrName))		FailureOr<Value> tensorAlloc =
generateOp->setAttr(		allocateTensorForShapedValue(rewriter, loc, padOp.getResult(),
BufferizationDialect::kEscapeAttrName,		/*escape=*/!dealloc, options,
padOp->getAttr(BufferizationDialect::kEscapeAttrName));		/*copy=*/false);
// TODO: Memory space		if (failed(tensorAlloc))
rewriter.inlineRegionBefore(padOp.getRegion(), generateOp.getBody(),		return failure();
generateOp.getBody().begin());
		// tensor::PadOp is like tensor::GenerateOp: The only difference is that
		// only a part of the generated tensor is needed. For simplicity, we reuse
		// the same functionality here.
		Value filledBuffer = lowerGenerateLikeOpBody(
		rewriter, loc, *tensorAlloc, dynamicSizes, padOp.getBodyRegion());

// Create tensor::InsertSliceOp.		// Create tensor::InsertSliceOp.
SmallVector<OpFoldResult> sliceSizes =		SmallVector<OpFoldResult> sliceSizes =
getMixedSizes(rewriter, loc, padOp.getSource());		getMixedSizes(rewriter, loc, padOp.getSource());
SmallVector<OpFoldResult> sliceStrides(srcType.getRank(),		SmallVector<OpFoldResult> sliceStrides(srcType.getRank(),
rewriter.getIndexAttr(1));		rewriter.getIndexAttr(1));
rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(		rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(
padOp, padOp.getSource(), generateOp.getResult(),		padOp, padOp.getSource(), filledBuffer,
/*offsets=*/padOp.getMixedLowPad(), sliceSizes, sliceStrides);		/*offsets=*/padOp.getMixedLowPad(), sliceSizes, sliceStrides);

return success();		return success();
}		}
};		};

/// Bufferization of tensor.rank. Replace with memref.rank.		/// Bufferization of tensor.rank. Replace with memref.rank.
struct RankOpInterface		struct RankOpInterface
[172 lines not shown]

mlir/test/Dialect/Tensor/bufferize.mlir

[533 lines not shown]	func.func @tensor.reshape(%t1: tensor<?x10xf32>) -> tensor<2x2x5xf32> {

// CHECK: %[[r:.*]] = bufferization.to_tensor %[[reshaped]]		// CHECK: %[[r:.*]] = bufferization.to_tensor %[[reshaped]]
// CHECK: return %[[r]]		// CHECK: return %[[r]]
return %reshaped : tensor<2x2x5xf32>		return %reshaped : tensor<2x2x5xf32>
}		}

// -----		// -----

// CHECK: #[[$sum_map:.+]] = affine_map<()[s0, s1, s2] -> (s0 + s1 + s2)>		// CHECK: #[[$sum_map_1:.+]] = affine_map<()[s0, s1] -> (s1 + s0 + 5)>
		// CHECK: #[[$sum_map_2:.+]] = affine_map<()[s0, s1] -> (s0 + s1 + 10)>
// CHECK-LABEL: func @tensor.pad(		// CHECK-LABEL: func @tensor.pad(
// CHECK-SAME: %[[t1:.*]]: tensor<?x10xindex>, %[[l2:.*]]: index, %[[h1:.*]]: index, %[[h2:.*]]: index		// CHECK-SAME: %[[t1:.*]]: tensor<?x10xindex>, %[[l2:.*]]: index, %[[h1:.*]]: index, %[[h2:.*]]: index
func.func @tensor.pad(%t1: tensor<?x10xindex>, %l2: index, %h1: index,		func.func @tensor.pad(%t1: tensor<?x10xindex>, %l2: index, %h1: index,
%h2: index) -> tensor<?x?xindex> {		%h2: index) -> tensor<?x?xindex> {
// CHECK-DAG: %[[m1:.*]] = bufferization.to_memref %[[t1]] : memref<?x10xindex>		// CHECK-DAG: %[[m1:.*]] = bufferization.to_memref %[[t1]] : memref<?x10xindex>
// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index
// CHECK-DAG: %[[dim0:.*]] = memref.dim %[[m1]], %[[c0]]		// CHECK-DAG: %[[dim0:.*]] = memref.dim %[[m1]], %[[c0]]
// CHECK-DAG: %[[dim1:.*]] = memref.dim %[[m1]], %[[c1]]		// CHECK-DAG: %[[dim1:.*]] = memref.dim %[[m1]], %[[c1]]
// CHECK-DAG: %[[size0:.*]] = affine.apply #[[$sum_map]]()[%[[dim0]], %[[c5]], %[[h1]]]		// CHECK-DAG: %[[size0:.*]] = affine.apply #[[$sum_map_1]]()[%[[h1]], %[[dim0]]]
// CHECK-DAG: %[[size1:.*]] = affine.apply #[[$sum_map]]()[%[[dim1]], %[[l2]], %[[h2]]]		// CHECK-DAG: %[[size1:.*]] = affine.apply #[[$sum_map_2]]()[%[[l2]], %[[h2]]]
// CHECK: %[[alloc:.*]] = memref.alloc(%[[size0]], %[[size1]]) {{.*}} : memref<?x?xindex>		// CHECK: %[[alloc:.*]] = memref.alloc(%[[size0]], %[[size1]]) {{.*}} : memref<?x?xindex>
// CHECK: %[[alloc_t:.*]] = bufferization.to_tensor %[[alloc]]		// CHECK: %[[alloc_t:.*]] = bufferization.to_tensor %[[alloc]]
// CHECK: %[[mapped:.*]] = linalg.map		// CHECK: %[[mapped:.*]] = linalg.map
// CHECK: outs(%[[alloc_t]] : tensor<?x?xindex>)		// CHECK: outs(%[[alloc_t]] : tensor<?x?xindex>)
// CHECK: %[[index0:.*]] = linalg.index 0		// CHECK: %[[index0:.*]] = linalg.index 0
// CHECK: %[[index1:.*]] = linalg.index 1		// CHECK: %[[index1:.*]] = linalg.index 1
// CHECK: %[[mul:.*]] = arith.muli %[[index0]], %[[index1]]		// CHECK: %[[mul:.*]] = arith.muli %[[index0]], %[[index1]]
// CHECK: linalg.yield %[[mul]]		// CHECK: linalg.yield %[[mul]]
[14 lines not shown]

mlir/test/Dialect/Tensor/one-shot-bufferize.mlir

[245 lines not shown]	func.func @insert_equivalent_tensor(%t: tensor<10xf32>) -> tensor<10xf32> {
// CHECK-NOT: memref.alloc		// CHECK-NOT: memref.alloc
%cst = arith.constant 4.200000e+01 : f32		%cst = arith.constant 4.200000e+01 : f32
// CHECK: linalg.fill		// CHECK: linalg.fill
%0 = linalg.fill ins(%cst : f32) outs(%t : tensor<10xf32>) -> tensor<10xf32>		%0 = linalg.fill ins(%cst : f32) outs(%t : tensor<10xf32>) -> tensor<10xf32>
// CHECK-NOT: memref.copy		// CHECK-NOT: memref.copy
%1 = tensor.insert_slice %0 into %t[0][10][1] : tensor<10xf32> into tensor<10xf32>		%1 = tensor.insert_slice %0 into %t[0][10][1] : tensor<10xf32> into tensor<10xf32>
return %1 : tensor<10xf32>		return %1 : tensor<10xf32>
}		}

		// -----

		// CHECK-LABEL: func @pad_memory_space(
		// CHECK-SAME: %[[t:.*]]: memref<?xf32, strided<[?], offset: ?>>
		func.func @pad_memory_space(%t: tensor<?xf32>, %h1: index, %f: f32, %pos: index) -> f32
		{
		// CHECK: %[[alloc_tensor:.*]] = memref.alloc{{.*}} : memref<?xf32, 3>
		// CHECK: memref.copy %[[t]], %[[alloc_tensor]]
		%0 = bufferization.alloc_tensor() copy(%t)
		{memory_space = 3 : ui64} : tensor<?xf32>
		// CHECK: %[[padded_alloc:.*]] = memref.alloc() {{.*}} : memref<15xf32, 3>
		// CHECK: linalg.map
		// CHECK: outs(%[[padded_alloc]] : memref<15xf32, 3>)
		// CHECK: linalg.yield %{{.*}}
		// CHECK: }
		// CHECK: %[[subview:.*]] = memref.subview {{.*}} : memref<15xf32, 3> to memref<?xf32, strided<[1], offset: 2>, 3>
		// CHECK: memref.copy %[[alloc_tensor]], %[[subview]]
		%1 = tensor.pad %0 low[2] high[%h1] {
		^bb0(%arg0: index):
		tensor.yield %f : f32
		} : tensor<?xf32> to tensor<15xf32>
		// CHECK: memref.load {{.*}} : memref<15xf32, 3>
		%2 = tensor.extract %1[%pos] : tensor<15xf32>
		// CHECK-DAG: memref.dealloc %[[alloc_tensor]]
		// CHECK-DAG: memref.dealloc %[[padded_alloc]]
		return %2 : f32
		}