This is an archive of the discontinued LLVM Phabricator instance.

Differential D159286

[mlir][bufferization] Privatize buffers for parallel regions
Closed, Public

Authored by springerm on Aug 31 2023, 7:39 AM.

Details

Reviewers

nicolasvasilache
maerhart

Commits

rG1e1a3112f123: [mlir][bufferization] Privatize buffers for parallel regions

Summary

One-Shot Bufferize correctly handles RaW conflicts around repetitive regions (loops). Special handling is needed for parallel regions. These are a special kind of repetitive region that can have additional RaW conflicts that would not be present if the regions were executed sequentially.

Example:

%0 = bufferization.alloc_tensor()
scf.forall ... {
  %1 = linalg.fill ins(...) outs(%0)
  ...
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %1 into ...
  }
}

A separate (private) buffer must be allocated for each iteration of the scf.forall loop.

This change adds a new interface method to BufferizableOpInterface to detect parallel regions. By default, regions are assumed to be sequential.

A buffer is privatized if an OpOperand bufferizes to a memory read inside a parallel region that is different from the parallel region where the operand's value is defined.
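
To make the race concrete, here is a minimal C++ sketch of the example above. It is a toy model and not MLIR: `runShared` and `runPrivatized` are hypothetical names, and running all fills before all inserts is just one legal interleaving of parallel iterations, chosen so the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not MLIR: iteration i fills a 1-element scratch buffer with the
// value i, then inserts the scratch contents into out[i]. Running all fills
// before all inserts is one legal interleaving of parallel iterations.

// All iterations share one scratch buffer (the fill runs in-place).
std::vector<float> runShared(std::size_t n) {
  std::vector<float> out(n, 0.0f);
  float scratch = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    scratch = static_cast<float>(i); // later fills overwrite earlier ones
  for (std::size_t i = 0; i < n; ++i)
    out[i] = scratch; // every iteration reads the last fill
  return out;
}

// Each iteration gets a private scratch buffer (the fill runs out-of-place).
std::vector<float> runPrivatized(std::size_t n) {
  std::vector<float> out(n, 0.0f);
  std::vector<float> scratch(n, 0.0f);
  for (std::size_t i = 0; i < n; ++i)
    scratch[i] = static_cast<float>(i);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = scratch[i];
  return out;
}
```

With the shared buffer, every output element ends up holding the value of the last fill. With one buffer per iteration, each element holds its own iteration's value, which is what a sequential execution would produce.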

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

springerm created this revision. Aug 31 2023, 7:39 AM

Herald added a project: Restricted Project. Aug 31 2023, 7:39 AM

Herald added subscribers: bviyer, Moerafaat, zero9178 and 22 others.

springerm requested review of this revision. Aug 31 2023, 7:39 AM

Herald added a project: Restricted Project. Aug 31 2023, 7:39 AM

Herald added subscribers: limo1996, stephenneuendorffer.

Harbormaster completed remote builds in B256029: Diff 555046. Aug 31 2023, 8:01 AM

Herald added a subscriber: jsetoain. Aug 31 2023, 8:01 AM

maerhart added inline comments. Sep 4 2023, 7:56 AM

mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h
687	region?
mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp
127	Nit: maybe assert that it is also a repetitive region since that is requested in the documentation?
mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
551	Does this include reads of the result of the region-containing op? I.e., there are writes inside the parallel region, but they are basically discarded (like doing nothing in the `in_parallel`), is it guaranteed that they won't show up?

address comments

mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp
551	If an alias of the `scf.forall` result (which is also an alias of the init_arg) is written in the loop body and read afterwards, something (usually the op inside the loop) must bufferize out-of-place. This is irrespective of parallelism. But I added a test case where there is no read that bufferizes in-place.

Harbormaster completed remote builds in B256589: Diff 555829. Sep 5 2023, 2:16 AM

maerhart accepted this revision. Sep 5 2023, 3:10 AM

This revision is now accepted and ready to land. Sep 5 2023, 3:10 AM

Closed by commit rG1e1a3112f123: [mlir][bufferization] Privatize buffers for parallel regions (authored by springerm). Sep 6 2023, 5:37 AM

This revision was automatically updated to reflect the committed changes.

springerm added a commit: rG1e1a3112f123: [mlir][bufferization] Privatize buffers for parallel regions.

Revision Contents

Path                                                                     Size
mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h     8 lines
mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td    19 lines
mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp            15 lines
mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp            37 lines
mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp          4 lines
mlir/test/Dialect/SCF/one-shot-bufferize-analysis.mlir                   103 lines

Diff 556013

mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.h

 /// owner of the block. In case of an OpResult that is the defining op.
 Operation *getOwnerOfValue(Value value);

 /// Assuming that the given region is repetitive, find the next enclosing
 /// repetitive region.
 Region *getNextEnclosingRepetitiveRegion(Region *region,
                                          const BufferizationOptions &options);

+/// If `region` is a parallel region, return `region`. Otherwise, find the first
+/// enclosing parallel region of `region`. If there is no such region, return
+/// "nullptr".
+///
+/// Note: Whether a region is parallel or sequential is queried from the
+/// `BufferizableOpInterface`.
+Region *getParallelRegion(Region *region, const BufferizationOptions &options);
+
 namespace detail {
 /// This is the default implementation of
 /// BufferizableOpInterface::getAliasingOpOperands. Should not be called from
 /// other places.
 AliasingOpOperandList defaultGetAliasingOpOperands(Value value,
                                                    const AnalysisState &state);

 /// This is the default implementation of

mlir/include/mlir/Dialect/Bufferization/IR/BufferizableOpInterface.td

@@ ... @@ let methods = [
         /*methodName=*/"isRepetitiveRegion",
         /*args=*/(ins "unsigned":$index),
         /*methodBody=*/"",
         /*defaultImplementation=*/[{
           return ::mlir::bufferization::detail::defaultIsRepetitiveRegion(
               ::llvm::cast<BufferizableOpInterface>($_op.getOperation()), index);
         }]
       >,
+      InterfaceMethod<
+        /*desc=*/[{
+          Return `true` if the given region of this op is parallel, i.e.,
+          multiple instances of the region may be executing at the same time.
+          If a region is parallel, it must also be marked as "repetitive".
+
+          The RaW conflict detection of One-Shot Analysis is more strict inside
+          parallel regions: Buffer may have to be privatized.
+
+          By default, regions are assumed to be sequential.
+        }],
+        /*retType=*/"bool",
+        /*methodName=*/"isParallelRegion",
+        /*args=*/(ins "unsigned":$index),
+        /*methodBody=*/"",
+        /*defaultImplementation=*/[{
+          return false;
+        }]
+      >,
       StaticInterfaceMethod<
         /*desc=*/[{
           Return `true` if the op and this interface implementation supports
           unstructured control flow. I.e., regions with multiple blocks. This is
           not supported in most ops, so the default answer is `false`.
         }],
         /*retType=*/"bool",
         /*methodName=*/"supportsUnstructuredControlFlow",
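
The contract described by the new interface method (sequential by default, and parallel implies repetitive) can be sketched in plain C++ with virtual methods. This is only an analogy for the TableGen-generated interface: every class name below is hypothetical, and the base-class default for `isRepetitiveRegion` is simplified to `false`, whereas the real default delegates to `defaultIsRepetitiveRegion`.

```cpp
#include <cassert>

// Hypothetical stand-in for an op interface with default implementations.
struct ToyBufferizableOp {
  virtual ~ToyBufferizableOp() = default;
  // Simplified default (assumption): regions are not repetitive.
  virtual bool isRepetitiveRegion(unsigned /*index*/) const { return false; }
  // Default from the patch: regions are assumed to be sequential.
  virtual bool isParallelRegion(unsigned /*index*/) const { return false; }
};

// A sequential loop overrides only the repetitive query.
struct ToySequentialLoop : ToyBufferizableOp {
  bool isRepetitiveRegion(unsigned /*index*/) const override { return true; }
};

// A parallel loop ties the parallel query to the repetitive one, so a region
// reported as parallel is always reported as repetitive too.
struct ToyParallelLoop : ToyBufferizableOp {
  bool isRepetitiveRegion(unsigned /*index*/) const override { return true; }
  bool isParallelRegion(unsigned index) const override {
    return isRepetitiveRegion(index);
  }
};

// The documented requirement: parallel implies repetitive.
bool satisfiesContract(const ToyBufferizableOp &op, unsigned index) {
  return !op.isParallelRegion(index) || op.isRepetitiveRegion(index);
}
```

Ops that do not override the method keep their current behavior, which is why the patch only has to touch the `scf.forall` implementation.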

mlir/lib/Dialect/Bufferization/IR/BufferizableOpInterface.cpp

@@ ... @@ Region *bufferization::getNextEnclosingRepetitiveRegion(
   assert(isRepetitiveRegion(region, options) && "expected repetitive region");
   while ((region = region->getParentRegion())) {
     if (isRepetitiveRegion(region, options))
       break;
   }
   return region;
 }

+Region *bufferization::getParallelRegion(Region *region,
+                                         const BufferizationOptions &options) {
+  while (region) {
+    auto bufferizableOp = options.dynCastBufferizableOp(region->getParentOp());
+    if (bufferizableOp &&
+        bufferizableOp.isParallelRegion(region->getRegionNumber())) {
+      assert(isRepetitiveRegion(region, options) &&
+             "expected that all parallel regions are also repetitive regions");
+      return region;
+    }
+    region = region->getParentRegion();
+  }
+  return nullptr;
+}
+
 Operation *bufferization::getOwnerOfValue(Value value) {
   if (auto opResult = llvm::dyn_cast<OpResult>(value))
     return opResult.getDefiningOp();
   return llvm::cast<BlockArgument>(value).getOwner()->getParentOp();
 }

 bool bufferization::allocationDoesNotEscape(OpResult opResult) {
 #ifndef NDEBUG
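
The walk performed by `getParallelRegion` can be tried outside of MLIR with a small stand-in for the region tree. `ToyRegion` below is a hypothetical type with explicit flags. In the patch the flags are instead queried from the `BufferizableOpInterface` of the parent op.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for mlir::Region.
struct ToyRegion {
  ToyRegion *parent;
  bool repetitive;
  bool parallel;
  ToyRegion(ToyRegion *parent, bool repetitive, bool parallel)
      : parent(parent), repetitive(repetitive), parallel(parallel) {}
};

// Return `region` if it is parallel, otherwise the closest enclosing parallel
// region, or nullptr if there is none.
ToyRegion *getParallelRegion(ToyRegion *region) {
  while (region) {
    if (region->parallel) {
      assert(region->repetitive &&
             "expected that all parallel regions are also repetitive regions");
      return region;
    }
    region = region->parent;
  }
  return nullptr;
}
```

For a function body that contains a parallel loop that in turn contains a conditional, the query returns the loop region for both the loop and the conditional, and `nullptr` for the function body.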

mlir/lib/Dialect/Bufferization/Transforms/OneShotAnalysis.cpp

 /// actually reads another definition W2.
 static bool
 hasReadAfterWriteInterference(const DenseSet<OpOperand *> &usesRead,
                               const DenseSet<OpOperand *> &usesWrite,
                               const DominanceInfo &domInfo,
                               OneShotAnalysisState &state) {
   const BufferizationOptions &options = state.getOptions();

+  // Before going through the main RaW analysis, find cases where a buffer must
+  // be privatized due to parallelism. If the result of a write is never read,
+  // privatization is not necessary (and large parts of the IR are likely dead).
+  if (!usesRead.empty()) {
+    for (OpOperand *uConflictingWrite : usesWrite) {
+      // Find the allocation point or last write (definition) of the buffer.
+      // Note: In contrast to `findDefinitions`, this also returns results of
+      // ops that do not bufferize to memory write when no other definition
+      // could be found. E.g., "bufferization.alloc_tensor" would be included,
+      // even though that op just bufferizes to an allocation but does define
+      // the contents of the buffer.
+      SetVector<Value> definitionsOrLeaves =
+          state.findValueInReverseUseDefChain(
+              uConflictingWrite->get(),
+              [&](Value v) { return state.bufferizesToMemoryWrite(v); });
+      assert(!definitionsOrLeaves.empty() &&
+             "expected at least one definition or leaf");
+
+      // The writing op must bufferize out-of-place if the definition is in a
+      // different parallel region than this write.
+      for (Value def : definitionsOrLeaves) {
+        if (getParallelRegion(def.getParentRegion(), options) !=
+            getParallelRegion(uConflictingWrite->getOwner()->getParentRegion(),
+                              options)) {
+          LLVM_DEBUG(
+              llvm::dbgs()
+              << "\n- bufferizes out-of-place due to parallel region:\n");
+          LLVM_DEBUG(llvm::dbgs()
+                     << "  unConflictingWrite = operand "
+                     << uConflictingWrite->getOperandNumber() << " of "
+                     << *uConflictingWrite->getOwner() << "\n");
+          return true;
+        }
+      }
+    }
+  }
+
   for (OpOperand *uRead : usesRead) {
     Operation *readingOp = uRead->getOwner();
     LLVM_DEBUG(llvm::dbgs() << "\n- check conflict:\n");
     LLVM_DEBUG(llvm::dbgs() << "  uRead = operand " << uRead->getOperandNumber()
                             << " of " << *readingOp << "\n");

     // Find the definition of uRead by following the SSA use-def chain.
     // E.g.:
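
The decision made by the new pre-check can be summarized in a few lines of standalone C++. This is a sketch under simplifying assumptions: `SimpleRegion`, `enclosingParallel` and `mustPrivatize` are hypothetical names, a write is described only by the region it sits in and by the regions of its definitions or leaves, and the `hasReads` flag stands in for `!usesRead.empty()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a region with a "parallel" flag.
struct SimpleRegion {
  SimpleRegion *parent;
  bool parallel;
  SimpleRegion(SimpleRegion *parent, bool parallel)
      : parent(parent), parallel(parallel) {}
};

// Closest parallel region that is `region` itself or encloses it.
SimpleRegion *enclosingParallel(SimpleRegion *region) {
  while (region && !region->parallel)
    region = region->parent;
  return region;
}

// A write must bufferize out-of-place if some definition of the written
// buffer lives in a different parallel region than the write itself.
// If nothing is ever read, no privatization is needed.
bool mustPrivatize(const std::vector<SimpleRegion *> &definitionRegions,
                   SimpleRegion *writeRegion, bool hasReads) {
  if (!hasReads)
    return false;
  for (SimpleRegion *defRegion : definitionRegions)
    if (enclosingParallel(defRegion) != enclosingParallel(writeRegion))
      return true;
  return false;
}
```

Read against the test cases of this revision, a definition outside the loop forces privatization, a definition inside the same loop does not, a mix of both does, and a write whose result is never read stays in-place.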

mlir/lib/Dialect/SCF/Transforms/BufferizableOpInterfaceImpl.cpp

@@ ... @@ for (auto [lb, ub, step] :
       if (!stepConstant)
         return true;

       if (*lbConstant + *stepConstant < *ubConstant)
         return true;
     }
     return false;
   }

+  bool isParallelRegion(Operation *op, unsigned index) const {
+    return isRepetitiveRegion(op, index);
+  }
 };

 /// Nothing to do for InParallelOp.
 struct InParallelOpInterface
     : public BufferizableOpInterface::ExternalModel<InParallelOpInterface,
                                                     InParallelOp> {
   LogicalResult bufferize(Operation *op, RewriterBase &b,
                           const BufferizationOptions &options) const {
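
Since `isParallelRegion` simply forwards to `isRepetitiveRegion` here, a loop that is known to run at most one iteration is treated as sequential. The bound check visible in the context lines can be sketched as follows. `ToyBounds`, `isRepetitive` and `isParallel` are hypothetical names, and an empty `std::optional` models a bound that is not a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one loop dimension. An empty optional means
// that the value is dynamic (not a known constant).
struct ToyBounds {
  std::optional<int64_t> lb, ub, step;
};

// A loop is repetitive if any dimension may execute more than once.
// Dynamic bounds are treated conservatively as repetitive.
bool isRepetitive(const std::vector<ToyBounds> &dims) {
  for (const ToyBounds &d : dims) {
    if (!d.ub || !d.lb || !d.step)
      return true;
    if (*d.lb + *d.step < *d.ub)
      return true;
  }
  return false;
}

// Mirrors the patch: the body is parallel exactly when it is repetitive.
bool isParallel(const std::vector<ToyBounds> &dims) {
  return isRepetitive(dims);
}
```

A loop over `[0, 320)` with step 1 is parallel under this rule, a loop over `[0, 1)` with step 1 is not, and a loop with a dynamic upper bound is conservatively treated as parallel.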

mlir/test/Dialect/SCF/one-shot-bufferize-analysis.mlir

@@ ... @@ scf.for %iv2 = %a to %b step %c {
         %2 = tensor.extract_slice %t[0][4][1] : tensor<10xf32> to tensor<4xf32>
         %3 = tensor.extract %2[%a] : tensor<4xf32>
         vector.print %3 : f32
       }
     }
   }
   return
 }

+// -----
+
+// CHECK-LABEL: func @parallel_region()
+func.func @parallel_region() -> tensor<320xf32>
+{
+  %alloc0 = bufferization.alloc_tensor() : tensor<320xf32>
+  %alloc1 = bufferization.alloc_tensor() : tensor<1xf32>
+  %c320 = arith.constant 320 : index
+  // CHECK: scf.forall
+  %0 = scf.forall (%arg0) in (%c320) shared_outs(%arg1 = %alloc0) -> (tensor<320xf32>) {
+    %val = "test.foo"() : () -> (f32)
+    // linalg.fill must bufferize out-of-place because every thread needs a
+    // private copy of %alloc1.
+    // CHECK: linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+    %fill = linalg.fill ins(%val : f32) outs(%alloc1 : tensor<1xf32>) -> tensor<1xf32>
+    scf.forall.in_parallel {
+      // CHECK: tensor.parallel_insert_slice {{.*}} {__inplace_operands_attr__ = ["true", "true", "none"]}
+      tensor.parallel_insert_slice %fill into %arg1[%arg0] [1] [1] : tensor<1xf32> into tensor<320xf32>
+    }
+  }
+  // CHECK: } {__inplace_operands_attr__ = ["none", "true"]}
+  return %0 : tensor<320xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @parallel_region_mixed_def(
+func.func @parallel_region_mixed_def(%c: i1) -> tensor<320xf32>
+{
+  %alloc0 = bufferization.alloc_tensor() : tensor<320xf32>
+  %alloc1 = bufferization.alloc_tensor() : tensor<1xf32>
+  %c320 = arith.constant 320 : index
+  // CHECK: scf.forall
+  %0 = scf.forall (%arg0) in (%c320) shared_outs(%arg1 = %alloc0) -> (tensor<320xf32>) {
+    %alloc2 = bufferization.alloc_tensor() : tensor<1xf32>
+    %selected = scf.if %c -> tensor<1xf32> {
+      scf.yield %alloc1 : tensor<1xf32>
+    } else {
+      scf.yield %alloc2 : tensor<1xf32>
+    }
+    %val = "test.foo"() : () -> (f32)
+    // linalg.fill must bufferize out-of-place because every thread needs a
+    // private copy of %alloc1.
+    // CHECK: linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+    %fill = linalg.fill ins(%val : f32) outs(%selected : tensor<1xf32>) -> tensor<1xf32>
+    scf.forall.in_parallel {
+      // CHECK: tensor.parallel_insert_slice {{.*}} {__inplace_operands_attr__ = ["true", "true", "none"]}
+      tensor.parallel_insert_slice %fill into %arg1[%arg0] [1] [1] : tensor<1xf32> into tensor<320xf32>
+    }
+  }
+  // CHECK: } {__inplace_operands_attr__ = ["none", "true"]}
+  return %0 : tensor<320xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @parallel_region_two_writes(
+func.func @parallel_region_two_writes(%f: f32) -> tensor<320xf32>
+{
+  %alloc0 = bufferization.alloc_tensor() : tensor<320xf32>
+  %alloc1 = bufferization.alloc_tensor() : tensor<1xf32>
+  %c320 = arith.constant 320 : index
+  %c0 = arith.constant 0 : index
+  // CHECK: scf.forall
+  %0 = scf.forall (%arg0) in (%c320) shared_outs(%arg1 = %alloc0) -> (tensor<320xf32>) {
+    %val = "test.foo"() : () -> (f32)
+    // linalg.fill must bufferize out-of-place because every thread needs a
+    // private copy of %alloc1.
+    // CHECK: linalg.fill {__inplace_operands_attr__ = ["none", "false"]}
+    %fill = linalg.fill ins(%val : f32) outs(%alloc1 : tensor<1xf32>) -> tensor<1xf32>
+    // CHECK: tensor.insert
+    // CHECK-SAME: __inplace_operands_attr__ = ["none", "true", "none"]
+    %inserted = tensor.insert %f into %fill[%c0] : tensor<1xf32>
+
+    scf.forall.in_parallel {
+      // CHECK: tensor.parallel_insert_slice {{.*}} {__inplace_operands_attr__ = ["true", "true", "none"]}
+      tensor.parallel_insert_slice %inserted into %arg1[%arg0] [1] [1] : tensor<1xf32> into tensor<320xf32>
+    }
+  }
+  // CHECK: } {__inplace_operands_attr__ = ["none", "true"]}
+  return %0 : tensor<320xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @parallel_region_no_read()
+func.func @parallel_region_no_read()
+{
+  %alloc0 = bufferization.alloc_tensor() : tensor<320xf32>
+  %alloc1 = bufferization.alloc_tensor() : tensor<1xf32>
+  %c320 = arith.constant 320 : index
+  // CHECK: scf.forall
+  scf.forall (%arg0) in (%c320) {
+    %val = "test.foo"() : () -> (f32)
+    // linalg.fill can bufferize in-place because no alias of %alloc1 is read.
+    // CHECK: linalg.fill {__inplace_operands_attr__ = ["none", "true"]}
+    %fill = linalg.fill ins(%val : f32) outs(%alloc1 : tensor<1xf32>) -> tensor<1xf32>
+    scf.forall.in_parallel {
+    }
+  }
+  return
+}
