This is an archive of the discontinued LLVM Phabricator instance.

Can you please add the integration test cases I suggested in the other patch? I'm still pretty sure this will miscompile. Fundamentally you are writing this as an "in-place" bufferization and that cannot be done with the current framework because it involves non-local reasoning.

Can you please revert this commit and the previous one I objected to? I think they are fundamentally going in the wrong direction.

https://llvm.org/docs/CodeReview.html#post-commit-review

If a community member expresses a concern about a recent commit, and this concern would have been significant enough to warrant a conversation during pre-commit review (including around the need for more design discussions), they may ask for a revert to the original author who is responsible to revert the patch promptly. Developers often disagree, and erring on the side of the developer asking for more review prevents any lingering disagreement over code in the tree. This does not indicate any fault from the patch author, this is inherent to our post-commit review practices. Reverting a patch ensures that design discussions can happen without blocking other development; it’s entirely possible the patch will end up being reapplied essentially as-is once concerns have been resolved.

pifon2a added a reverting change: rG967578f0b8b1: Revert "[mlir] Change the pattern for TiledLoopOp bufferization.". Aug 11 2021, 1:03 AM

In D107858#2938298, @silvas wrote:

Can you please add the integration test cases I suggested in the other patch? I'm still pretty sure this will miscompile. Fundamentally you are writing this as an "in-place" bufferization and that cannot be done with the current framework because it involves non-local reasoning.

I reverted both PRs. This is not really an in-place bufferization; it is an attempt to reason about the subsets that the tiled loop operates on in the presence of an ugly terminator. That is why there is special logic to bufferize extract_slice/insert_slice ops that have a block argument of the tiled loop as their operand. If tiled_loop had a separate region to specify the subset (tiled_range or full_range), none of that would be needed. Changing this operation would take time, and it is not clear whether it makes sense to do that in MLIR OSS.

Thanks Alex!

In D107858#2938914, @pifon2a wrote:

In D107858#2938298, @silvas wrote:

Can you please add the integration test cases I suggested in the other patch? I'm still pretty sure this will miscompile. Fundamentally you are writing this as an "in-place" bufferization and that cannot be done with the current framework because it involves non-local reasoning.

I reverted both PRs. This is not really an in-place bufferization; it is an attempt to reason about the subsets that the tiled loop operates on in the presence of an ugly terminator. That is why there is special logic to bufferize extract_slice/insert_slice ops that have a block argument of the tiled loop as their operand. If tiled_loop had a separate region to specify the subset (tiled_range or full_range), none of that would be needed. Changing this operation would take time, and it is not clear whether it makes sense to do that in MLIR OSS.

I'm not as much worried about the handling of extract_slice/insert_slice (I haven't looked in detail at that aspect). The most important problem I see is:

auto newLoop = rewriter.create<TiledLoopOp>(
    loc, adaptor.lowerBound(), adaptor.upperBound(), adaptor.step(),
    adaptor.inputs(), adaptor.outputs(), adaptor.iterator_types(),
    adaptor.distribution_types());

Note the use of adaptor.outputs() directly as the outs. Instead, it should be a copy of the outputs, to avoid miscompiles when the outputs are readonly or aliased. That is what I refer to when I say "in place bufferization".
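To make the aliasing concern concrete, here is a minimal C++ sketch. It is not MLIR code: it assumes a tensor can be modeled as a std::vector<float> with value semantics, and the names Buffer, insertSliceWithCopy, insertSliceInPlace, and sum are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a bufferized tensor.
using Buffer = std::vector<float>;

// Conservative lowering: clone the destination, then write the slice into the
// clone. The original destination stays valid for any other reader.
Buffer insertSliceWithCopy(const Buffer &dest, const Buffer &slice,
                           std::size_t offset) {
  assert(offset + slice.size() <= dest.size());
  Buffer result = dest; // the copy the review asks for
  for (std::size_t i = 0; i < slice.size(); ++i)
    result[offset + i] = slice[i];
  return result;
}

// In-place lowering: write straight into the destination buffer. This is only
// safe if nothing else reads the old contents and the buffer is writable.
Buffer &insertSliceInPlace(Buffer &dest, const Buffer &slice,
                           std::size_t offset) {
  assert(offset + slice.size() <= dest.size());
  for (std::size_t i = 0; i < slice.size(); ++i)
    dest[offset + i] = slice[i];
  return dest;
}

// Helper used to observe a buffer's contents.
float sum(const Buffer &buffer) {
  return std::accumulate(buffer.begin(), buffer.end(), 0.0f);
}
```

With a destination of {1, 1, 1, 1} and a slice {5, 5} inserted at offset 1, the copying version leaves the destination summing to 4 while the result sums to 12. The in-place version changes the destination itself, so a later reader of the original tensor value would observe 12 instead of 4.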

Revision Contents

Path                                              Size
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp  155 lines
mlir/test/Dialect/Linalg/bufferize.mlir           69 lines

Diff 365583

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp

@@ 207 unchanged lines hidden @@ matchAndRewrite(LinalgOp op, ArrayRef<Value> operands,
     // We abuse the GenericOpAdaptor here.
     // TODO: Manually create an Adaptor that captures inputs and outputs for all
     // linalg::LinalgOp interface ops.
     linalg::GenericOpAdaptor adaptor(operands, op->getAttrDictionary());

     Location loc = op.getLoc();
     SmallVector<Value, 2> newOutputBuffers;

-    if (op->getParentOfType<TiledLoopOp>()) {
-      newOutputBuffers = adaptor.outputs();
-    } else if (failed(allocateBuffersForResults(loc, op, adaptor.outputs(),
-                                                newOutputBuffers, rewriter))) {
+    if (failed(allocateBuffersForResults(loc, op, adaptor.outputs(),
+                                         newOutputBuffers, rewriter))) {
       return op.emitOpError()
              << "Failed to allocate buffers for tensor results.";
     }

     // Delegate to the linalg generic pattern.
     if (auto genericOp = dyn_cast<linalg::GenericOp>(*op)) {
       finalizeBufferAllocationForGenericOp(rewriter, genericOp,
                                            adaptor.inputs(), newOutputBuffers);
       return success();
     }

     finalizeBufferAllocation(rewriter, op, adaptor.inputs(), newOutputBuffers);
     return success();
   }
 };

-bool IsBlockArgOfTiledLoop(Value tensor) {
-  if (auto tensorLoad = tensor.getDefiningOp<memref::TensorLoadOp>())
-    if (auto blockArgument = tensorLoad.memref().dyn_cast<BlockArgument>())
-      if (isa<TiledLoopOp>(blockArgument.getOwner()->getParentOp()))
-        return true;
-  return false;
-}
-
 /// Convert `extract_slice %t [offsets][sizes][strides] -> %st` to an
 /// alloc + copy pattern.
 /// ```
 ///   %a = alloc(sizes)
 ///   %sv = subview %source [offsets][sizes][strides]
 ///   linalg_copy(%sv, %a)
 /// ```
 ///
 /// This pattern is arguable a std pattern once linalg::CopyOp becomes
 /// std::CopyOp.
 class ExtractSliceOpConverter
     : public OpConversionPattern<tensor::ExtractSliceOp> {
 public:
   using OpConversionPattern<tensor::ExtractSliceOp>::OpConversionPattern;

   LogicalResult
   matchAndRewrite(tensor::ExtractSliceOp op, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const final {
     tensor::ExtractSliceOpAdaptor adaptor(operands, op->getAttrDictionary());
     Value sourceMemref = adaptor.source();
     assert(sourceMemref.getType().isa<MemRefType>());

-    // Block arguments of the tiled_loop can be bufferized inplace.
-    if (IsBlockArgOfTiledLoop(op.source())) {
-      Value subView = rewriter.create<memref::SubViewOp>(
-          op.getLoc(), sourceMemref, op.getMixedOffsets(), op.getMixedSizes(),
-          op.getMixedStrides());
-      rewriter.replaceOp(op, subView);
-      return success();
-    }
-
     MemRefType subviewMemRefType =
         getTypeConverter()->convertType(op.getType()).cast<MemRefType>();
     // op.sizes() capture exactly the dynamic alloc operands matching the
     // subviewMemRefType thanks to subview/slice canonicalization and
     // verification.
     Value alloc = rewriter.create<memref::AllocOp>(
         op.getLoc(), subviewMemRefType, op.sizes());
     Value subView = rewriter.create<memref::SubViewOp>(
@@ 27 unchanged lines hidden @@ matchAndRewrite(tensor::InsertSliceOp op, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const final {
     tensor::InsertSliceOpAdaptor adaptor(operands, op->getAttrDictionary());
     Value sourceMemRef = adaptor.source();
     assert(sourceMemRef.getType().isa<MemRefType>());

     // For now, be conservative and copy the converted input memref.
     // In general, the converted input memref here could be aliased or could
     // point into constant memory, so mutating it would lead to miscompilations.
-    // Block arguments of the tiled_loop can be bufferized inplace.
-    Value destMemRef;
-    if (IsBlockArgOfTiledLoop(op.dest()))
-      destMemRef = adaptor.dest();
-    else
-      destMemRef = cloneMemref(op.getLoc(), adaptor.dest(), rewriter);
+    Value destMemRef = cloneMemref(op.getLoc(), adaptor.dest(), rewriter);
     assert(destMemRef.getType().isa<MemRefType>());

     // Take a subview to copy the small memref.
     Value subview = rewriter.create<memref::SubViewOp>(
         op.getLoc(), destMemRef, op.getMixedOffsets(), op.getMixedSizes(),
         op.getMixedStrides());
     // Copy the small memref.
     rewriter.create<linalg::CopyOp>(op.getLoc(), sourceMemRef, subview);
     rewriter.replaceOp(op, destMemRef);
     return success();
   }
 };

+bool isBlockArgOfTiledLoop(Value tensor) {
+  if (auto blockArgument = tensor.dyn_cast<BlockArgument>())
+    return isa<TiledLoopOp>(blockArgument.getOwner()->getParentOp());
+  return false;
+}
+
+SmallVector<Value, 3> convertOperands(ValueRange operands,
+                                      BlockAndValueMapping &bvm) {
+  SmallVector<Value, 3> newOperands;
+  newOperands.reserve(operands.size());
+  for (auto operand : operands)
+    newOperands.push_back(bvm.lookupOrDefault(operand));
+  return newOperands;
+}
+
 class TiledLoopOpConverter : public OpConversionPattern<TiledLoopOp> {
 public:
   using OpConversionPattern<TiledLoopOp>::OpConversionPattern;

   LogicalResult
-  matchAndRewrite(TiledLoopOp tiledLoop, ArrayRef<Value> operands,
+  matchAndRewrite(TiledLoopOp loop, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const final {
-    TiledLoopOp::Adaptor adaptor(operands, tiledLoop->getAttrDictionary());
-    Location loc = tiledLoop.getLoc();
-    if (tiledLoop.getNumResults() == 0)
+    TiledLoopOp::Adaptor adaptor(operands, loop->getAttrDictionary());
+    if (loop.getNumResults() == 0)
       return failure();
-    auto newTiledLoop = rewriter.create<TiledLoopOp>(
+    Location loc = loop.getLoc();
+    auto newLoop = rewriter.create<TiledLoopOp>(
         loc, adaptor.lowerBound(), adaptor.upperBound(), adaptor.step(),
         adaptor.inputs(), adaptor.outputs(), adaptor.iterator_types(),
         adaptor.distribution_types());

     // Clone the region.
     BlockAndValueMapping bvm;
-    bvm.map(tiledLoop.getInductionVars(), newTiledLoop.getInductionVars());
+    bvm.map(loop.getInductionVars(), newLoop.getInductionVars());
+    bvm.map(loop.getRegionInputArgs(), newLoop.getRegionInputArgs());
+    bvm.map(loop.getRegionOutputArgs(), newLoop.getRegionOutputArgs());

     OpBuilder innerBuilder =
-        OpBuilder::atBlockEnd(newTiledLoop.getBody(), rewriter.getListener());
+        OpBuilder::atBlockEnd(newLoop.getBody(), rewriter.getListener());

-    // Remap input block arguments.
-    SmallVector<Value, 2> inputs;
-    for (auto en : llvm::zip(newTiledLoop.getRegionInputArgs(),
-                             tiledLoop.getRegionInputArgs())) {
-      auto &newInputArg = std::get<0>(en);
-      if (!newInputArg.getType().isa<ShapedType>()) {
-        inputs.push_back(std::get<0>(en));
-        continue;
-      }
-      inputs.push_back(
-          innerBuilder.create<memref::TensorLoadOp>(loc, newInputArg));
-    }
-    bvm.map(tiledLoop.getRegionInputArgs(), inputs);
-
-    // Remap output block arguments.
-    SmallVector<Value, 2> outputs;
-    for (auto en : llvm::zip(newTiledLoop.getRegionOutputArgs(),
-                             tiledLoop.getRegionOutputArgs())) {
-      auto &newOutputArg = std::get<0>(en);
-      if (!newOutputArg.getType().isa<ShapedType>()) {
-        outputs.push_back(std::get<0>(en));
-        continue;
-      }
-      outputs.push_back(
-          innerBuilder.create<memref::TensorLoadOp>(loc, newOutputArg));
-    }
-    bvm.map(tiledLoop.getRegionOutputArgs(), outputs);
-
-    for (auto &op : tiledLoop.getBody()->without_terminator())
-      innerBuilder.clone(op, bvm);
+    for (auto &op : loop.getBody()->getOperations()) {
+      Location loc = op.getLoc();
+      if (auto extractSlice = dyn_cast<tensor::ExtractSliceOp>(op)) {
+        if (isBlockArgOfTiledLoop(extractSlice.source())) {
+          auto newOperands = convertOperands(extractSlice.getOperands(), bvm);
+          auto srcMemRefType =
+              bvm.lookup(extractSlice.source()).getType().cast<MemRefType>();
+          auto dstMemRefType =
+              memref::SubViewOp::inferResultType(
+                  srcMemRefType,
+                  extractFromI64ArrayAttr(extractSlice.static_offsets()),
+                  extractFromI64ArrayAttr(extractSlice.static_sizes()),
+                  extractFromI64ArrayAttr(extractSlice.static_strides()))
+                  .cast<MemRefType>();
+
+          Value subView = innerBuilder.create<memref::SubViewOp>(
+              loc, TypeRange{dstMemRefType}, newOperands,
+              extractSlice->getAttrs());
+          bvm.map(extractSlice.getResult(), subView);
+          continue;
+        }
+      }
+      if (auto insertSlice = dyn_cast<tensor::InsertSliceOp>(op)) {
+        if (isBlockArgOfTiledLoop(insertSlice.dest())) {
+          continue;
+        }
+      }
+      if (auto yield = dyn_cast<linalg::YieldOp>(op)) {
+        for (OpOperand &operand : yield->getOpOperands()) {
+          if (auto insert =
+                  operand.get().getDefiningOp<tensor::InsertSliceOp>()) {
+
+            auto dstMemRefType = memref::SubViewOp::inferResultType(
+                getTypeConverter()
+                    ->convertType(insert.source().getType())
+                    .cast<MemRefType>(),
+                extractFromI64ArrayAttr(insert.static_offsets()),
+                extractFromI64ArrayAttr(insert.static_sizes()),
+                extractFromI64ArrayAttr(insert.static_strides()));
+
+            Value subView = innerBuilder.create<memref::SubViewOp>(
+                loc, dstMemRefType, bvm.lookup(insert.dest()),
+                convertOperands(insert.offsets(), bvm),
+                convertOperands(insert.sizes(), bvm),
+                convertOperands(insert.strides(), bvm), insert.static_offsets(),
+                insert.static_sizes(), insert.static_strides());
+
+            Value cast = innerBuilder.create<memref::BufferCastOp>(
+                loc,
+                getTypeConverter()
+                    ->convertType(insert.source().getType())
+                    .cast<MemRefType>(),
+                bvm.lookup(insert.source()));
+
+            innerBuilder.create<linalg::CopyOp>(loc, cast, subView);
+            continue;
+          }
+          auto dst = newLoop.getRegionOutputArgs()[operand.getOperandNumber()];
+          Value cast = innerBuilder.create<memref::BufferCastOp>(
+              loc, dst.getType(), bvm.lookup(operand.get()));
+          innerBuilder.create<linalg::CopyOp>(loc, cast, dst);
+        }
+        continue;
+      }
+      innerBuilder.clone(op, bvm);
+    }
     innerBuilder.create<linalg::YieldOp>(loc);
-    rewriter.replaceOp(tiledLoop, newTiledLoop.outputs());
+    rewriter.replaceOp(loop, newLoop.outputs());
     return success();
   }
 };

 class VectorTransferReadOpConverter
     : public OpConversionPattern<vector::TransferReadOp> {
 public:
   using OpConversionPattern<vector::TransferReadOp>::OpConversionPattern;
@@ 146 unchanged lines hidden @@

mlir/test/Dialect/Linalg/bufferize.mlir

@@ 333 unchanged lines hidden @@ %A_sub = tensor.extract_slice %A_[%i] [%c2] [1]
       : tensor<10xf32> to tensor<?xf32>
     %B_sub = tensor.extract_slice %B_[%i] [%c2] [1]
       : tensor<10xf32> to tensor<?xf32>
     %dot_sub = linalg.dot ins(%A_sub, %B_sub : tensor<?xf32>, tensor<?xf32>)
                           outs(%C_ : tensor<f32>) -> tensor<f32>
     linalg.yield %dot_sub : tensor<f32>
   }
   // CHECK: linalg.tiled_loop
-  // CHECK-SAME: ins (%[[A:.*]] = %{{.*}}: memref<10xf32>, %[[B:.*]] = %{{.*}}: memref<10xf32>)
-  // CHECK-SAME: outs (%[[C:.*]] = %{{.*}}: memref<f32>)
-  // CHECK-NOT: alloc
-  // CHECK: %[[SV_A:.*]] = memref.subview %[[A]]
-  // CHECK: %[[SV_B:.*]] = memref.subview %[[B]]
-  // CHECK: linalg.dot ins(%[[SV_A]], %[[SV_B]]
-  // CHECK-SAME: outs(%[[C]] : memref<f32>)
-  // CHECK: linalg.yield
+  // CHECK-SAME: ins (%[[A:arg[0-9]]] = %{{[0-9]}}: memref<10xf32>,
+  // CHECK-SAME: %[[B:arg[0-9]]] = %{{[0-9]}}: memref<10xf32>
+  // CHECK-SAME: outs (%[[C:arg[0-9]]] = %{{[0-9]}}: memref<f32>)
+
+  // CHECK-NEXT: %[[SV_A:.*]] = memref.subview %[[A]]
+  // CHECK-NEXT: %[[SV_B:.*]] = memref.subview %[[B]]
+  // CHECK-NEXT: %[[TMP:.*]] = memref.alloc
+  // CHECK-NEXT: linalg.copy(%[[C]], %[[TMP]])
+  // CHECK-NEXT: linalg.dot ins(%[[SV_A]], %[[SV_B]]
+  // CHECK-SAME: outs(%[[TMP]] : memref<f32>)
+  // CHECK-NEXT: linalg.copy(%[[TMP]], %[[C]])
+  // CHECK-NEXT: linalg.yield
   return %dot : tensor<f32>
 }
+
+// -----
+
+#map0 = affine_map<(d0) -> (d0)>
+
+func @tiled_add(%A: tensor<10xf32>, %B: tensor<10xf32>,
+                %C: tensor<10xf32>) -> tensor<10xf32> {
+  %c0 = constant 0 : index
+  %c2 = constant 2 : index
+  %c10 = constant 10 : index
+
+  %sum = linalg.tiled_loop (%i) = (%c0) to (%c10) step (%c2)
+       ins (%A_ = %A: tensor<10xf32>, %B_ = %B: tensor<10xf32>)
+       outs (%C_ = %C: tensor<10xf32>) {
+    %A_sub = tensor.extract_slice %A_[%i] [%c2] [1]
+      : tensor<10xf32> to tensor<?xf32>
+    %B_sub = tensor.extract_slice %B_[%i] [%c2] [1]
+      : tensor<10xf32> to tensor<?xf32>
+    %C_sub = tensor.extract_slice %C_[%i] [%c2] [1]
+      : tensor<10xf32> to tensor<?xf32>
+    %sum_sub = linalg.generic {
+      indexing_maps = [#map0, #map0, #map0],
+      iterator_types = ["parallel"]
+    } ins(%A_sub, %B_sub : tensor<?xf32>, tensor<?xf32>)
+      outs(%C_sub : tensor<?xf32>) {
+      ^bb0(%a: f32, %b: f32, %c: f32):
+        %0 = std.addf %a, %b : f32
+        linalg.yield %0 : f32
+    } -> tensor<?xf32>
+    %update = tensor.insert_slice %sum_sub into %C_[%i] [%c2] [1]
+      : tensor<?xf32> into tensor<10xf32>
+    linalg.yield %update : tensor<10xf32>
+  }
+  // CHECK: linalg.tiled_loop
+  // CHECK-SAME: ins (%[[A:arg[0-9]]] = %{{[0-9]}}: memref<10xf32>,
+  // CHECK-SAME: %[[B:arg[0-9]]] = %{{[0-9]}}: memref<10xf32>
+  // CHECK-SAME: outs (%[[C:arg[0-9]]] = %{{[0-9]}}: memref<10xf32>)
+
+  // CHECK-NEXT: %[[SV_A:.*]] = memref.subview %[[A]]
+  // CHECK-NEXT: %[[SV_B:.*]] = memref.subview %[[B]]
+  // CHECK-NEXT: %[[TMP:.*]] = memref.alloc
+  // CHECK-NEXT: linalg.generic
+  // CHECK-SAME: ins(%[[SV_A]], %[[SV_B]]
+  // CHECK-SAME: outs(%[[TMP]] : memref<2xf32>)
+  // CHECK: %[[SV_C:.*]] = memref.subview %[[C]]
+  // CHECK-NEXT: linalg.copy(%[[TMP]], %[[SV_C]])
+  // CHECK-NEXT: linalg.yield
+  return %sum : tensor<10xf32>
+}
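The loop shape that the tiled_add CHECK lines expect (subview of the inputs, a temporary allocation, the computation into the temporary, then a copy into a subview of the output) can be modeled in plain C++. This is only a rough sketch and not generated code: the function name tiledAdd is hypothetical, memrefs are modeled as std::vector<float>, and the clamp on the tile size is an added assumption so the sketch also handles lengths that are not a multiple of the step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the bufferized tiled loop: c[i] = a[i] + b[i],
// computed one tile at a time.
void tiledAdd(const std::vector<float> &a, const std::vector<float> &b,
              std::vector<float> &c, std::size_t step) {
  assert(a.size() == b.size() && b.size() == c.size() && step > 0);
  for (std::size_t i = 0; i < c.size(); i += step) {
    // Assumption: clamp the last tile; the test above always uses full tiles.
    const std::size_t size = std::min(step, c.size() - i);

    // memref.subview of A and B: views into the inputs, no copy.
    const float *svA = a.data() + i;
    const float *svB = b.data() + i;

    // memref.alloc: temporary buffer that receives the tile result.
    std::vector<float> tmp(size);

    // linalg.generic: elementwise add, written to the temporary.
    for (std::size_t j = 0; j < size; ++j)
      tmp[j] = svA[j] + svB[j];

    // memref.subview of C, then linalg.copy(tmp, subview).
    std::copy(tmp.begin(), tmp.end(),
              c.begin() + static_cast<std::ptrdiff_t>(i));
  }
}
```

Only the final copy writes to the output buffer, and it touches only the current tile, which is the subset reasoning the pattern tries to encode.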

[mlir] Change the pattern for TiledLoopOp bufferization. (Closed, Public)
