This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization] Improve analysis for element-wise operation
Closed, Public

Authored by springerm on Aug 2 2023, 6:26 AM.

Details

Summary

Before this change, two equivalent operands that bufferize to a memory read and a memory write, respectively, were always considered conflicting. This change improves the analysis for ops that bufferize to element-wise access. Such ops can bufferize in-place because an original element value is no longer needed once the updated value of that element has been computed and written.

This change allows ops such as the following one to bufferize in-place:

%0 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
    ins(%a, %b : tensor<5xf32>, tensor<5xf32>)
    outs(%a : tensor<5xf32>) -> tensor<5xf32>
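
For illustration only (a hand-written sketch, not taken from the patch), this is roughly the in-place bufferized form that the improved analysis enables, assuming identity layout maps and function-boundary bufferization; the function name is made up:

func.func @add_inplace(%a: memref<5xf32>, %b: memref<5xf32>) {
  // %a serves as both input and output buffer; no new allocation or copy is needed.
  linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
      ins(%a, %b : memref<5xf32>, memref<5xf32>)
      outs(%a : memref<5xf32>)
  return
}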

Event Timeline

springerm created this revision. Aug 2 2023, 6:26 AM
Herald added a project: Restricted Project. Aug 2 2023, 6:26 AM
springerm requested review of this revision. Aug 2 2023, 6:26 AM
springerm edited the summary of this revision. Aug 3 2023, 2:31 AM
steeve added a subscriber: steeve. (Edited) Aug 3 2023, 3:42 AM

Hi,

Thank you for this quick patch!
I am running into this issue ("wrong allocation"). OP's patch works, but only if both operands have the same type. If one of them is an arith.constant, an allocation still occurs. I believe this is because areEquivalentBufferizedValues fails.

Sample MLIR:

func.func @relu(%a: tensor<5xf32>) -> tensor<5xf32> {
    %c0f = arith.constant 0.0 : f32
    %0 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
        ins(%a, %c0f : tensor<5xf32>, f32)
        outs(%a : tensor<5xf32>) -> tensor<5xf32>
    return %0 : tensor<5xf32>
}
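
For reference, a rough sketch (hand-written, not tool output) of the unwanted bufferization that results when the analysis flags a conflict: the op writes into a freshly allocated buffer instead of updating %a in place. Names are illustrative.

func.func @relu(%a: memref<5xf32>) -> memref<5xf32> {
  %c0f = arith.constant 0.0 : f32
  // Out-of-place result: a new buffer is allocated for the output.
  %alloc = memref.alloc() : memref<5xf32>
  linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
      ins(%a, %c0f : memref<5xf32>, f32)
      outs(%alloc : memref<5xf32>)
  return %alloc : memref<5xf32>
}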
springerm updated this revision to Diff 546836. Aug 3 2023, 6:34 AM
springerm edited the summary of this revision.

address comments: better analysis

steeve added a comment. (Edited) Aug 3 2023, 7:08 AM

Thank you!
This now works with the example, but it fails if the tensor is, e.g., 32xf32 instead of 5xf32:

!TC = tensor<32xf32>
func.func @relu(%a: !TC) -> !TC {
    %c0f = arith.constant 0.0 : f32
    %0 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
        ins(%a, %c0f : !TC, f32)
        outs(%a : !TC) -> !TC
    return %0 : !TC
}

EDIT: it fails whenever the tensor is larger than 8xf32, i.e. the first failing case is 9xf32, no matter how many dimensions (8x8x8x8 works, 9x9x9x9 doesn't).

> This now works with the example, but fails if the tensor is 32xf32 instead of 5xf32 […]

Hmm, I cannot reproduce this. Did you run it as part of the test case that I added? It bufferizes in-place on my machine:

func.func @relu(%arg0: tensor<32xf32> {bufferization.access = "read-write"}) -> tensor<32xf32> {
  %cst = arith.constant 0.000000e+00 : f32
  %0 = linalg.elemwise_binary {__inplace_operands_attr__ = ["true", "none", "true"], fun = #linalg.binary_fn<add>} ins(%arg0, %cst : tensor<32xf32>, f32) outs(%arg0 : tensor<32xf32>) -> tensor<32xf32>
  return {__equivalent_func_args__ = [0], __inplace_operands_attr__ = ["true"]} %0 : tensor<32xf32>
}
nicolasvasilache accepted this revision. Aug 3 2023, 7:25 AM

Thanks!

mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td
100

nit: in compiler lingo we'd also say "free of loop-carried dependences"

102

typo: hypothetical

140

that the op is not element-wise. -> that the op is element-wise. ?

mlir/lib/Dialect/Linalg/Transforms/BufferizableOpInterfaceImpl.cpp
118

move this TODO to l129 ?

125

Technically this would break if we allowed mixed tensor/buffer (which we did in the past).
Guard against memref types too?

This revision is now accepted and ready to land. Aug 3 2023, 7:25 AM
springerm marked 5 inline comments as done. Aug 3 2023, 7:32 AM
springerm added inline comments.
mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td
140

This is correct as is. Being element-wise enables an additional optimization. If we don't know for sure, we don't optimize.
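
To make the distinction concrete, here is a hypothetical counterexample (not part of this patch): a transpose written as linalg.generic is not element-wise, because element (d0, d1) of the result depends on element (d1, d0) of the input, so letting the result reuse the input buffer could overwrite values that are still needed.

#transposed = affine_map<(d0, d1) -> (d1, d0)>
#identity = affine_map<(d0, d1) -> (d0, d1)>
func.func @transpose_not_elementwise(%a: tensor<5x5xf32>) -> tensor<5x5xf32> {
  // Reads %a at (d1, d0) while writing the init at (d0, d1): not element-wise,
  // so the analysis must not let the result share %a's buffer.
  %0 = linalg.generic {indexing_maps = [#transposed, #identity],
                       iterator_types = ["parallel", "parallel"]}
      ins(%a : tensor<5x5xf32>) outs(%a : tensor<5x5xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<5x5xf32>
  return %0 : tensor<5x5xf32>
}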

springerm marked an inline comment as done. Aug 3 2023, 7:36 AM
This revision was landed with ongoing or failed builds. Aug 3 2023, 7:36 AM
This revision was automatically updated to reflect the committed changes.
mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td
140

ah yes sorry, confused myself over nothing :)

steeve added a comment. Aug 3 2023, 7:52 AM
This comment was removed by steeve.
steeve added a comment. (Edited) Aug 3 2023, 8:00 AM

Here is a full example; the problem arises when the tensor dimension is bigger than the tile size in transform.structured.fuse:

!TT = tensor<32xf32>
func.func @elementwise_no_conflict(%a: !TT,
                                   %b: !TT) -> !TT {
    %c0f = arith.constant 0.0 : f32
    %0 = linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
        ins(%a, %c0f : !TT, f32)
        outs(%a : !TT) -> !TT
  return %0 : !TT
}

transform.sequence failures(propagate) {
^bb0(%module: !transform.any_op):
    %max = transform.structured.match ops{["linalg.elemwise_binary"]} in %module
        : (!transform.any_op) -> !transform.any_op

    %max_tiled, %loops:2 = transform.structured.fuse %max { tile_sizes = [8, 8] } // <----------
        : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)

    %module_2 = transform.bufferization.one_shot_bufferize
        layout{IdentityLayoutMap} %module
        { bufferize_function_boundaries = true, allow_return_allocs = true }
        : (!transform.any_op) -> !transform.any_op

    transform.yield
}

Run with:

$ mlir-opt test.mlir --test-transform-dialect-interpreter --test-transform-dialect-erase-schedule

> Here is a full example; the problem arises when the tensor dimension is bigger than the tile size in transform.structured.fuse: […]

There's probably an alloc already in some shape or form (tensor.empty, bufferization.alloc_tensor, etc.) before it hits bufferization.
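
For example (an illustrative snippet, not taken from the failing IR), destination tensors such as these typically force a new buffer during bufferization, independently of the element-wise analysis:

// A destination tensor created out of thin air has no existing buffer to reuse.
%empty = tensor.empty() : tensor<8xf32>
// This op explicitly requests a new allocation when bufferized.
%alloc = bufferization.alloc_tensor() : tensor<8xf32>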