This is an archive of the discontinued LLVM Phabricator instance.

Differential D101693

[mlir] Add ComprehensiveBufferize pass for function and modules (step 1/n)
ClosedPublic

Authored by nicolasvasilache on May 1 2021, 9:48 AM.

Details

Reviewers

silvas
mehdi_amini
ftynse
herhut
mravishankar
pifon2a
ThomasRaoux

Commits

rG1e01a8919f8d: [mlir][Linalg] Add ComprehensiveBufferize for functions(step 1/n)

Summary

This is the first step towards upstreaming comprehensive bufferization following the
discourse post: https://llvm.discourse.group/t/rfc-linalg-on-tensors-update-and-comprehensive-bufferization-rfc/3373/6.

This first commit introduces a basic pass for bufferizing within function boundaries,
assuming that the inplaceable function boundaries have been marked as such.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. May 1 2021, 9:48 AM

Herald added subscribers: dcaballe, cota, teijeong and 16 others. May 1 2021, 9:48 AM

nicolasvasilache requested review of this revision. May 1 2021, 9:48 AM

Herald added a project: Restricted Project. May 1 2021, 9:48 AM

Herald added subscribers: limo1996, stephenneuendorffer.

nicolasvasilache added reviewers: silvas, mehdi_amini, ftynse, herhut, mravishankar, pifon2a, ThomasRaoux. May 1 2021, 9:50 AM

Harbormaster completed remote builds in B102110: Diff 342159. May 1 2021, 9:57 AM

Thanks! It seems to me that the main missing piece right now is to express the per-op conversion through a dedicated interface instead of through the LinalgOp interface.

mlir/include/mlir/Dialect/Tensor/IR/TensorBase.td
50 ↗	(On Diff #342159)	Doc
mlir/include/mlir/Transforms/Passes.td
384 ↗	(On Diff #342159)	This is limited to tensor->memref conversion, worth mentioning. Should this be in the tensor dialect actually?
mlir/lib/Transforms/ComprehensiveBufferization.cpp
155 ↗	(On Diff #342159)	This seems ill defined to me as tensors are immutable values...
159 ↗	(On Diff #342159)	Is this a real condition or is it dead? I'd rather see an "assert" for an invariant than another code path.
164 ↗	(On Diff #342159)	Seems like worth making this a first class attribute instead of an array of string and performing continuous stringification and string matching everywhere.
180 ↗	(On Diff #342159)
218 ↗	(On Diff #342159)	assert?
388 ↗	(On Diff #342159)	Honestly I find such a condition easier to read inline than hidden behind this helper, which has a single use in the codebase. If you feel such a test is worth an API, it should likely be a method on the Value class.
453 ↗	(On Diff #342159)	I don't quite get why this is taken by reference: it does not seem modified.
454 ↗	(On Diff #342159)	is it possible to have a nullptr here?
472 ↗	(On Diff #342159)	Seems like what needs a new interface to abstract it from `LinalgOp`
481 ↗	(On Diff #342159)	This assignment seems dead?
565 ↗	(On Diff #342159)
565 ↗	(On Diff #342159)	I am not sure why `op == parentOp` is useful here? (what does it catch?) But I also don't understand why the filter in the first place?
659 ↗	(On Diff #342159)	We need an interface here right?
702–705 ↗	(On Diff #342159)
738 ↗	(On Diff #342159)	This is duplicating the (I think dead) condition at call-site.
751–753 ↗	(On Diff #342159)
754 ↗	(On Diff #342159)
775–776 ↗	(On Diff #342159)
779 ↗	(On Diff #342159)	Where is `convertedCallOps` populated?
794 ↗	(On Diff #342159)	Function passes aren't scheduled on declaration I believe, this condition is likely dead?
798 ↗	(On Diff #342159)	I don't think you can use this from a FunctionOp because it'll break the multi-threading aspect unfortunately.

herhut added inline comments. May 11 2021, 3:46 AM

mlir/lib/Transforms/ComprehensiveBufferization.cpp
54 ↗	(On Diff #342159)	Is this legal IR? Is it not a use after free?
61 ↗	(On Diff #342159)	So this assumes that the prelude with the allocation can always be hoisted to the caller. Is this correct (just making sure I understand right)?
129 ↗	(On Diff #342159)	What does this do?
303 ↗	(On Diff #342159)	Is the problem here that every user creates its own alloc (and hence the pattern to recognize to hoist these out to the caller gets messy) or that you have multiple copies? Ultimately, you want to hoist the alloc and copy out, but only if they appear on all paths through the function, right? As a post optimization this would be hoisting copies + allocs. How do you envision this here? Collect this information before the bufferization in form of attributes?
358 ↗	(On Diff #342159)	How do you envision this to compose? This is the equivalent of bufferization patterns for other dialects, right?
400 ↗	(On Diff #342159)	So you assume that there is a single block for the entire computation that you are allocating here? Or at least that no allocated memory escapes the block? Is this the reason for the `tensor.load` in the example comment? So cross-block flow is expected to be handled outside of this transformation?
481 ↗	(On Diff #342159)	I think it is the wrong way round, this should communicate the updates `tmpSliceRef` back to the reference parameter if the match succeeded.
684 ↗	(On Diff #342159)	Why is this different from the code below?
745 ↗	(On Diff #342159)	Can we lift this to an arbitrary region with a single block and terminator? That would allow the use in other contexts, too.

Address reviews.

thanks for reviewing despite the WIP state!

mlir/lib/Transforms/ComprehensiveBufferization.cpp
54 ↗	(On Diff #342159)	It is illegal IR and if it cannot be legalized by hoisting, the pass will need to fail.
61 ↗	(On Diff #342159)	It is meant to support the cases where the hoisting can happen. Cases where the hoisting can't happen are unsupported by this pass at this time.
129 ↗	(On Diff #342159)	nothing atm
159 ↗	(On Diff #342159)	It is a real condition: `getTiedOpResult` may return null as it is perfectly fine that certain op operands do not map to results.
303 ↗	(On Diff #342159)	The latter (i.e. you have multiple copies). When no interfering reads occur, we could reuse the same buffer but the current impl does not perform the optimization atm.
358 ↗	(On Diff #342159)	not too sure how to parse this sentence, I just refactored as a DimOp bufferization pattern and deleted this. This was a remnant from experimenting with other traversal orders.
400 ↗	(On Diff #342159)	Yes, it is a best-effort pass and branchy control-flow is generally unsupported. In particular this pass has no intention of generalizing to the graph level and may just fail. Added some more doc at the top-level to make it more explicit.
481 ↗	(On Diff #342159)	yes, thanks!
565 ↗	(On Diff #342159)	older cruft from when I used a backwardSlice, thanks!
659 ↗	(On Diff #342159)	I feel this is premature, this pass is not meant to generalize to an open set of ops atm and I don't know that there is such a generic interface that makes sense for other ops. Note that this is a "non-conversion" + inplace-aware specialization of the linalg bufferization pattern used in: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L191. There could technically be refactorings that allow BlockAndValueMapping-based code to be reused by conversion patterns but it feels premature here.
684 ↗	(On Diff #342159)	cruft from refactorings, thanks!
745 ↗	(On Diff #342159)	I will think about it as I flush my stack and try as a followup, I have multiple refactorings in flight and the generalization should be separable.
779 ↗	(On Diff #342159)	in my other repo :)
798 ↗	(On Diff #342159)	right, the original pass was a full Module pass, I'll reinsert at the cross function boundary when it makes sense.

Hmm phab. did not like the file moving, a few unaddressed comments re interfaces dropped.

Harbormaster completed remote builds in B103849: Diff 344551. May 11 2021, 1:51 PM

Add InplaceInterface.

Harbormaster completed remote builds in B103887: Diff 344599. May 11 2021, 5:12 PM

Thanks for the replies.

mlir/lib/Transforms/ComprehensiveBufferization.cpp
303 ↗	(On Diff #342159)	Still don't get this. If you have a non-reusable input argument you create a copy and then mutate the copy, correct? I don't see how a single copy fits in here. If you have multiple cases of in-place updates, you also need multiple copies so that you can materialize the updated value somewhere. If the updates lie on different control flow paths, having a copy per path is not worse than a single shared copy (except for code size maybe). But that is not supported yet anyway, right?
358 ↗	(On Diff #342159)	That answers my question. I was wondering how to enable bufferizing other operations and whether we would need more static functions for those. If they are patterns that are provided by the context, that should work.

nicolasvasilache marked 4 inline comments as done. May 12 2021, 5:50 AM

nicolasvasilache added inline comments.

mlir/lib/Transforms/ComprehensiveBufferization.cpp
303 ↗	(On Diff #342159)	A single control flow path is supported so that part is out of the equation. Imagine:
	func @foo(%A: tensor<...>) {
	  %B = op1(%A)
	  op2(%B) // some print.
	  %C = op3(%A)
	  return %C
	}
`%A` will bufferize to a new alloc `%bA`; after op2 we are done with the uses of the value `%B`. If op1 and op3 were inplaceable, we could reuse `%bA` as the buffer for `%C` too but atm this optimization is not supported and a new alloc would trigger for the use of `%A` in `op2`.
472 ↗	(On Diff #342159)	it's the same as the one needed to drop `getTiedOpResult` for `OpOperand`.

Address more review.

nicolasvasilache retitled this revision from [WIP][mlir] Add ComprehensiveBufferize pass for function and modules (step 1/n) to [mlir] Add ComprehensiveBufferize pass for function and modules (step 1/n). May 12 2021, 5:54 AM

nicolasvasilache edited the summary of this revision.

nicolasvasilache marked an inline comment as done. May 12 2021, 6:00 AM

nicolasvasilache added inline comments.

mlir/lib/Transforms/ComprehensiveBufferization.cpp
164 ↗	(On Diff #342159)	I'd like to keep this for a followup (and ideally farm out as an intro task), if you don't mind?

Harbormaster completed remote builds in B104033: Diff 344790. May 12 2021, 6:11 AM

Add cmake.

Herald added a subscriber: mgorny. May 12 2021, 6:18 AM

Harbormaster completed remote builds in B104039: Diff 344800. May 12 2021, 6:54 AM

ezhulenev added a subscriber: ezhulenev. May 12 2021, 10:41 AM

ezhulenev added inline comments.

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferization.cpp
62 ↗	(On Diff #344800)	Is it legal to `memref.tensor_load` after `%2` was deallocated? Or it is intentionally an invalid intermediate IR that will be gone after bufferization?

nicolasvasilache marked an inline comment as done. May 12 2021, 11:35 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferization.cpp
62 ↗	(On Diff #344800)	The latter: there will be a cross-"function/call" analysis that will say such alloc/dealloc pairs must be lifted outside the function at each call site and become new function operands (see comments l67-73). The pass will fail if this is not feasible for some reason.

rriddle added inline comments. May 12 2021, 11:38 AM

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferization.cpp
726–727 ↗	(On Diff #344800)	Did you mean for one of these to be getResultTypes?

Address comment.

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferization.cpp
726–727 ↗	(On Diff #344800)	yes, thanks for catching this!

Harbormaster completed remote builds in B104098: Diff 344894. May 12 2021, 12:43 PM

silvas added inline comments. May 12 2021, 5:15 PM

mlir/include/mlir/Interfaces/InplaceInterface.td
21 ↗	(On Diff #344894)	I know some folks have asked for this interface, but I have to disagree here. I don't think this interface is a good idea, because it doesn't really have any semantics independent of a particular bufferization scheme, and in particular the exact buffer op sequence that a tensor op gets lowered to (that is, it's really a way for a pass to "predict" some details about the lowered op sequence and makes no sense in isolation). Since ComprehensiveBufferization is already doing an explicit enumeration to handle the lowering code, it seems natural that it would have an explicit enumeration of this op property which is tightly coupled to the rewrites. Or to put it another way, either:
1. Have a single interface that provides both the getTiedOpResult function and the lowering code that converts the tensor op into a sequence of buffer ops, or
2. Do explicit case enumeration for getTiedOpResult and for the lowering code and delete this interface.
I strongly prefer 2, because 1 feels contrary to the current intent here of having a scoped-down bufferization for the types of code that linalg transformations generate, to avoid needing to do the heavy lift of generalizing the existing bufferization infra to support inplace, which is more involved due to its grander scope. Maybe as we see how users are intending to use this, we will find that some interface is needed for pluggability (e.g. maybe IREE needs some pluggability for a few ops at the ABI boundaries) -- I would rather wait and see what those concrete needs are before creating such an interface (which should be clearly labeled as specific to this pass -- something like ComprehensiveBufferizeOpInterface).
mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferization.cpp
1 ↗	(On Diff #344894)	nit: can we call this ComprehensiveBufferize.cpp instead of ComprehensiveBufferization.cpp, to keep the term "Bufferize" consistent?
464 ↗	(On Diff #344894)	TODO is out of date.

nicolasvasilache added inline comments. May 13 2021, 1:38 AM

mlir/include/mlir/Interfaces/InplaceInterface.td
21 ↗	(On Diff #344894)	Basically the information needed is that OpOperand and OpResult are the same tensor type and size. This information is unrelated to any transformation and is purely a property of the op. In the static case, this is trivial and nothing special is required. In the dynamic case, this is a property of the op and each op spells this differently. I can change the naming of the interface and the doc to reflect the property above and drop any mention of bufferization. Would that alleviate your concerns? Note that this is just a minimal subset of the interface that you have been using on the IREE side: https://github.com/google/iree/blob/22a905c485e3ae2ff9743dc9d195de3b90b08ed5/iree/compiler/Dialect/IREE/IR/IREEInterfaces.td#L24

Address comments.
Simplify inplace patterns.
Fix read interference bug for func args.
Add negative tests.

Rebase.

nicolasvasilache added a child revision: D102395: [mlir][Linalg] Add support for vector.transfer ops to comprehensive bufferization (2/n). May 13 2021, 5:19 AM

Harbormaster completed remote builds in B104247: Diff 345093. May 13 2021, 5:41 AM

Rebase.

Harbormaster completed remote builds in B104321: Diff 345191. May 13 2021, 11:00 AM

nicolasvasilache marked an inline comment as done. May 13 2021, 12:51 PM

nicolasvasilache added inline comments.

mlir/include/mlir/Interfaces/InplaceInterface.td
21 ↗	(On Diff #344894)	FYI, after discussing with @silvas we settled on reverting back to the pass-local switch for now.

Address comments.

Rebase.

aartbik added inline comments. May 13 2021, 1:36 PM

mlir/lib/Transforms/ComprehensiveBufferization.cpp
155 ↗	(On Diff #342159)	Note that this is very similar to the "fastOutput" experimental flag I have been using in the sparse compiler to get "in place" bufferization semantics. I am not sure how we can make the semantics cleaner, but there is clearly a need for something like this!

Cleanup.

Nicolas and I had a long conversation regarding the "in place op interface". We managed to tease out two separate concepts that are independent.

1. Whether a tensor result is known to have the same shape, element type, and encoding as an input. This is a property that can be described at the tensor level. A motivating example here is a 10-ary elementwise add op -- all ten inputs have the same shape, element type, and encoding as the result. This type of tensor-level compatibility is necessary to allow a bufferization pass to be "smart" and reuse the storage of one operand for the result, but it is not sufficient for a bufferization pass to safely decide whether an op can be done in place or not.

2. The ultimate determination of whether it is safe for a tensor op to bufferize to an in place op is strictly a function of the choice of lowering (and cannot be described at the level of a tensor op). E.g. a (data-moving) transpose op could be implemented as for (...) dst[i,j] = src[j,i] in which case it could not be done in place, or it could be implemented with a transpose algorithm that can be done in place (block swapping etc.). This is purely an implementation detail of the lowering that happens during bufferization, and cannot be known independently of the pattern which is used to bufferize the op (you could put such a pattern in an op interface, but that doesn't change the fact that the "inplaceability" is a function of the pattern, not the op). A similar situation occurs with e.g. a square tensor-level matmul D=A*B+C. All of A, B, C, D can have the same shape, element type, and encoding, but whether it is possible to reuse A to hold the storage for D depends on the lowering (and I'm not aware of any matmul lowering that operates in place with LHS/RHS).

Thus, the tensor-level InplaceOpInterface that we had here before was more of an attempt at 1, but since that is not a sufficient condition for the pass, it was deemed a premature generalization to add that (something like that might be useful, or a generalization of that which allows e.g. a "concat" to indicate that the output ).

Indeed, the pass uses an open-coded enumeration of the lowering patterns, so a matching open-coded enumeration for the "inplaceability" information coupled to those lowering patterns makes sense and is consistent with this pass's goals of being fairly isolated and standalone -- we don't want to prematurely add interfaces for something like that until we have a user.

I'm personally ok for this to go in. It's self-contained, and serves as a benchmark for the functionality needed if/when we add this kind of behavior to the other bufferization system.

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp
132	Document the contract of this: it says which results will be reused inplace by the bufferization patterns in bufferizeFuncOpInternals, so that the analysis can correctly predict the inplaced final result for its safety analysis.

This revision is now accepted and ready to land. May 13 2021, 2:04 PM

Harbormaster completed remote builds in B104373: Diff 345269. May 13 2021, 3:13 PM

Address comment.

This revision was landed with ongoing or failed builds. May 13 2021, 3:25 PM

Closed by commit rG1e01a8919f8d: [mlir][Linalg] Add ComprehensiveBufferize for functions(step 1/n) (authored by nicolasvasilache).

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rG1e01a8919f8d: [mlir][Linalg] Add ComprehensiveBufferize for functions(step 1/n).

nicolasvasilache marked an inline comment as done. May 13 2021, 3:28 PM

Harbormaster completed remote builds in B104397: Diff 345300. May 13 2021, 4:37 PM

Revision Contents

Path                                                             Size
mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td                10 lines
mlir/include/mlir/Dialect/Linalg/Passes.h                        6 lines
mlir/include/mlir/Dialect/Linalg/Passes.td                       15 lines
mlir/lib/Dialect/Linalg/IR/LinalgTypes.cpp                       27 lines
mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt                1 line
mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp    785 lines
mlir/test/Dialect/Linalg/comprehensive-func-bufferize.mlir       83 lines

Diff 345303

mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td

... (31 lines not shown; enclosing context: let description = [{) ...
      Document](https://mlir.llvm.org/docs/Rationale/RationaleLinalgDialect) are
      are also available and should be read first before going in the details of
      the op semantics.
    }];
    let cppNamespace = "::mlir::linalg";
    let dependentDialects = [
      "AffineDialect", "StandardOpsDialect", "tensor::TensorDialect"
    ];
+   let hasOperationAttrVerify = 1;
    let extraClassDeclaration = [{
+     /// Attribute name used to to memoize indexing maps for named ops.
+     constexpr const static ::llvm::StringLiteral
+         kMemoizedIndexingMapsAttrName = "linalg.memoized_indexing_maps";
+
+     /// Attribute name used to mark region arguments that can be bufferized
+     /// in-place during linalg comprehensive bufferization.
+     constexpr const static ::llvm::StringLiteral
+         kInplaceableAttrName = "linalg.inplaceable";
+
      using RegionBuilderFunType = llvm::function_ref<void(Block &, ValueRange)>;
      RegionBuilderFunType getRegionBuilder(StringRef name) {
        return namedStructuredOpRegionBuilders.lookup(name);
      }
    private:
      llvm::StringMap<RegionBuilderFunType> namedStructuredOpRegionBuilders;
    }];
  }

  // Whether a type is a RangeType.
  def LinalgIsRangeTypePred : CPred<"$_self.isa<RangeType>()">;
  def Range : DialectType<Linalg_Dialect, LinalgIsRangeTypePred, "range">;

  #endif // LINALG_BASE

mlir/include/mlir/Dialect/Linalg/Passes.h

... (47 lines not shown) ...
  /// memref.load/memref.store accesses.
  std::unique_ptr<OperationPass<FuncOp>> createConvertLinalgToParallelLoopsPass();

  /// Create a pass to convert Linalg operations to affine.for loops and
  /// affine_load/affine_store accesses.
  /// Placeholder for now, this is NYI.
  std::unique_ptr<OperationPass<FuncOp>> createConvertLinalgToAffineLoopsPass();

+ /// Create a pass that bufferizes the body of a FuncOp and tries to reuse the
+ /// buffers for those arguments that:
+ ///   a) have been annotated 'inplaceable' and
+ ///   b) whose buffer uses would be free of memory hazards.
+ std::unique_ptr<Pass> createLinalgComprehensiveFuncBufferizePass();
+
  /// Create a pass to convert Linalg operations which work on tensors to use
  /// buffers instead.
  std::unique_ptr<OperationPass<FuncOp>> createLinalgBufferizePass();

  /// Create a pass to conver named Linalg operations to Linalg generic
  /// operations.
  std::unique_ptr<OperationPass<FuncOp>> createLinalgGeneralizationPass();

... (15 lines not shown) ...

mlir/include/mlir/Dialect/Linalg/Passes.td

... (16 lines not shown; enclosing context: let description = [{) ...
      Convert ops with the `ElementwiseMappable` trait to linalg parallel loops.

      This pass only converts ops that operate on ranked tensors.
    }];
    let constructor = "mlir::createConvertElementwiseToLinalgPass()";
    let dependentDialects = ["linalg::LinalgDialect", "memref::MemRefDialect"];
  }

+ def LinalgComprehensiveFuncBufferize :
+     FunctionPass<"linalg-comprehensive-func-bufferize"> {
+   let summary = "Bufferize (tensor into memref) the body of a FuncOp and try "
+                 "to reuse the buffers for those arguments that "
+                 "a) have been annotated 'inplaceable' and "
+                 "b) whose buffer uses would be free of memory hazards";
+   let description = [{
+     This pass implements a cross-dialect bufferization approach and performs an
+     analysis to determine which op operands and results may be bufferized in the
+     same buffers. The analysis is performed on SSA use-def chains starting from
+     function operands that are annotated with the 'inplaceable' attribute
+   }];
+   let constructor = "mlir::createLinalgComprehensiveFuncBufferizePass()";
+ }
+
  def LinalgFoldUnitExtentDims : FunctionPass<"linalg-fold-unit-extent-dims"> {
    let summary = "Remove unit-extent dimension in Linalg ops on tensors";
    let constructor = "mlir::createLinalgFoldUnitExtentDimsPass()";
    let options = [
      Option<"foldOneTripLoopsOnly", "fold-one-trip-loops-only", "bool",
             /*default=*/"false",
             "Only folds the one-trip loops from Linalg ops on tensors "
             "(for testing purposes only)">
... (174 lines not shown) ...

mlir/lib/Dialect/Linalg/IR/LinalgTypes.cpp

... (9 lines not shown) ...
  //
  //===----------------------------------------------------------------------===//

  #include "mlir/Dialect/Linalg/IR/LinalgTypes.h"
  #include "mlir/Dialect/Linalg/IR/LinalgOps.h"
  #include "mlir/IR/BuiltinTypes.h"
  #include "mlir/IR/Dialect.h"
  #include "mlir/IR/DialectImplementation.h"
+ #include "mlir/IR/FunctionSupport.h"
  #include "mlir/Parser.h"
  #include "mlir/Support/LLVM.h"
  #include "mlir/Transforms/InliningUtils.h"

  #include "llvm/ADT/StringExtras.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace mlir;
... (26 lines not shown) ...
  };

  } // end anonymous namespace

  //===----------------------------------------------------------------------===//
  // LinalgDialect
  //===----------------------------------------------------------------------===//

+ /// Attribute name used to to memoize indexing maps for named ops.
+ constexpr const ::llvm::StringLiteral
+     LinalgDialect::kMemoizedIndexingMapsAttrName;
+
+ /// Attribute name used to mark region arguments that can be bufferized
+ /// in-place during linalg comprehensive bufferization.
+ constexpr const ::llvm::StringLiteral LinalgDialect::kInplaceableAttrName;
+
  /// Trait to check if T provides a `regionBuilder` method.
  template <typename T, typename... Args>
  using has_region_builder = decltype(T::regionBuilder);
  template <typename T>
  using detect_has_region_builder = llvm::is_detected<has_region_builder, T>;

  /// SFINAE helper for single C++ class without a `regionBuilder` method (e.g.
  /// an OpInterface).
... (58 lines not shown) ...

  /// RangeType prints as just "range".
  static void print(RangeType rt, DialectAsmPrinter &os) { os << "range"; }

  void mlir::linalg::LinalgDialect::printType(Type type,
                                              DialectAsmPrinter &os) const {
    print(type.cast<RangeType>(), os);
  }
+
+ LogicalResult LinalgDialect::verifyOperationAttribute(Operation *op,
+                                                       NamedAttribute attr) {
+   if (attr.first == LinalgDialect::kInplaceableAttrName) {
+     if (!attr.second.isa<BoolAttr>()) {
+       return op->emitError() << "'" << LinalgDialect::kInplaceableAttrName
+                              << "' is expected to be a boolean attribute";
+     }
+     if (!op->hasTrait<OpTrait::FunctionLike>())
+       return op->emitError() << "expected " << attr.first
+                              << " to be used on function-like operations";
+     return success();
+   }
+   if (attr.first == LinalgDialect::kMemoizedIndexingMapsAttrName)
+     return success();
+   return op->emitError() << "attribute '" << attr.first
+                          << "' not supported by the linalg dialect";
+ }

mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt

  add_mlir_dialect_library(MLIRLinalgTransforms
    Bufferize.cpp
    CodegenStrategy.cpp
+   ComprehensiveBufferize.cpp
    Detensorize.cpp
    DropUnitDims.cpp
    ElementwiseToLinalg.cpp
    Fusion.cpp
    FusionOnTensors.cpp
    Generalization.cpp
    Hoisting.cpp
    Interchange.cpp
... (36 lines not shown) ...

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp

This file was added.

				//===- ComprehensiveBufferize.cpp - Single pass bufferization -------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Perform inplace bufferization within function boundaries.
				// This is a specialized pass that supports inplace analysis for a fixed subset
				// of ops that have well-defined inplace semantics.
				// This pass caters to high-performance codegen where buffer reuse is deemed
				// necessary: the pass should fail if the bufferized form of the function needs
				// to return any buffer.
				// Generic control-flow and branching are unsupported.
				// Composability with extensible set of ops is not a first-class concern.
				//
				// Bufferization occurs by:
				// a. performing an inPlace analysis `inPlaceAnalysisFuncOpInternals`
				// which marks each operation within the function with the
				// `kInPlaceResultsAttrName` attribute.
				// b. traversing each operation in the function and rewriting it in
				// buffer form and keeping a BlockAndValueMapping mapping of the
				// rewrites. New allocations are introduced during this step.
				// TODO: Allocation + depending op hoisting to outermost enclosing
				// sequential scope.
				// c. at the end of this bufferization, 2 cases may occur:
				// * inplaceable function arguments may be reused in place after the
				// function itself has been bufferized. This is encoded by IR resembling:
				//
				// ```
				// #map = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>
				// func @foo(%A: tensor<?xf32> {linalg.inplaceable = true}) -> tensor<?xf32> {
				// %0 = memref.buffer_cast %A : memref<?xf32, #map>
				// // ... uses of %0
				// %res = memref.tensor_load %0 : memref<?xf32, #map>
				// return %res : tensor<?xf32>
				// }
				// ```
				//
				// this is the cue that the function foo (and calls to it) may bufferize
				// to `func @foo(%A: memref<?xf32, some_layout>)`.
				// To fully achieve bufferization, an additional analysis is needed to
				// determine whether function argument/operand pairs bufferize to a single
				// inplace buffer argument (i.e. functions may return tensors in arbitrary
				// order that may not match argument numbers).
				// * results that don't map to an inplaceable function argument must be
				// allocated. Since memref semantics wrt ownership of the underlying
				// memory region are not well-defined, comprehensive bufferization chooses
				// to perform allocations in a scoped fashion: returning memrefs is always
				// considered illegal. Such scenarios are encoded by IR resembling:
				//
				// ```
				// #map = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>
				// func @foo(%A: tensor<?xf32> {linalg.inplaceable = true}) -> tensor<?xf32> {
				// %0 = memref.buffer_cast %A : memref<?xf32, #map>
				// %1 = memref.dim %0, %c0 : memref<?xf32, #map>
				// %2 = memref.alloc(%1) : memref<?xf32>
				// %3 = memref.cast %2 : memref<?xf32> to memref<?xf32, #map>
				// // ... uses of %3
				// memref.dealloc %2 : memref<?xf32>
				// %res = memref.tensor_load %3 : memref<?xf32, #map>
				// return %res : tensor<?xf32>
				// }
				// ```
				//
				// this is the cue that the function foo (and calls to it) must
				// bufferize to
				// `func @foo(%A: memref<?xf32, some_layout>,
				// %B: memref<?xf32, some_layout>)` (i.e. make a cloned
				// allocation of the result tensor).
				// To fully achieve bufferization, the alloc/dealloc pair must be lifted
				// out of the function at each call site.
				//
				// Lastly, note that the layout map chosen for bufferization is the most dynamic
				// canonical strided layout of the proper rank. This ensures compatibility with
				// expected layouts after transformations. Combinations of memref.cast +
				// canonicalization are responsible for clean ups.

				#include "PassDetail.h"
				#include "mlir/Analysis/SliceAnalysis.h"
				#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
				#include "mlir/Dialect/Linalg/Passes.h"
				#include "mlir/Dialect/MemRef/IR/MemRef.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Interfaces/LoopLikeInterface.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/BufferUtils.h"

				#include "llvm/ADT/ScopeExit.h"
				#include "llvm/ADT/TypeSwitch.h"

				#define DEBUG_TYPE "comprehensive-func-bufferize"

				using namespace mlir;
				using namespace linalg;
				using namespace tensor;

				#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")

				//===----------------------------------------------------------------------===//
				// Op-specific semantics helper to retrieve matching inplaceable result.
				//===----------------------------------------------------------------------===//

				/// Return the OpResult that matches an operand.
				/// Return null if no such result exists.
				OpResult getMatchingOpResult(LinalgOp linalgOp, OpOperand &opOperand) {
				if (!opOperand.get().getType().isa<RankedTensorType>())
				return OpResult();
				// For now assume inputs are never inplaceable.
				// TODO: refine this.
				if (opOperand.getOperandNumber() < linalgOp.getNumInputs())
				return OpResult();
				// For now assume if the operand appears twice, it is not inplaceable.
				// TODO: refine this.
				for (auto &opOperand2 : linalgOp->getOpOperands()) {
				if (opOperand.getOperandNumber() == opOperand2.getOperandNumber())
				continue;
				if (opOperand.get() == opOperand2.get())
				return OpResult();
				}
				int64_t outputOperandIndex =
				opOperand.getOperandNumber() - linalgOp.getNumInputs();
				int64_t numOutputBuffers = 0;
				for (unsigned idx = 0; idx < outputOperandIndex; ++idx)
				if (!linalgOp.getOutputShapedType(idx).isa<TensorType>())
				++numOutputBuffers;
				return linalgOp->getResult(outputOperandIndex - numOutputBuffers);
				}
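The index arithmetic in `getMatchingOpResult` can be easier to follow in isolation. The following is a minimal standalone sketch, assuming operands are laid out as inputs followed by outputs and that only tensor outputs produce results. `matchingResultIndex` and `outputIsTensor` are hypothetical names introduced for illustration, and the "operand appears twice" check from the patch is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the result tied to operand `operandNumber`, or -1 if
// there is none. Operands are laid out as [inputs..., outputs...];
// `outputIsTensor[k]` says whether output k is a tensor (true) or a buffer.
// Illustrative model only; this is not an MLIR API.
inline int matchingResultIndex(std::size_t operandNumber,
                               std::size_t numInputs,
                               const std::vector<bool> &outputIsTensor) {
  if (operandNumber < numInputs)
    return -1; // inputs are never tied to a result in this sketch
  std::size_t outputIndex = operandNumber - numInputs;
  assert(outputIndex < outputIsTensor.size() && "operand out of range");
  if (!outputIsTensor[outputIndex])
    return -1; // buffer outputs produce no result
  // Results exist only for tensor outputs, so skip the buffer outputs that
  // precede this one.
  int numBuffersBefore = 0;
  for (std::size_t k = 0; k < outputIndex; ++k)
    if (!outputIsTensor[k])
      ++numBuffersBefore;
  return static_cast<int>(outputIndex) - numBuffersBefore;
}
```

With two inputs and outputs `{buffer, tensor, tensor}`, operand 3 maps to result 0 and operand 4 maps to result 1, because one buffer output precedes each of them.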

				/// Determine which results may be reused inplace by the bufferization
				/// patterns of `bufferizeFuncOpInternals`.
				silvas: Document the contract of this: it says which results will be reused inplace by the bufferization patterns in bufferizeFuncOpInternals, so that the analysis can correctly predict the inplaced final result for its safety analysis.
				/// The inplace analysis uses this information along with interfering read
				/// analysis to determine which op results reuse the same buffer as some
				/// operand.
				OpResult getMatchingOpResult(OpOperand &opOperand) {
				OpResult res =
				llvm::TypeSwitch<Operation *, OpResult>(opOperand.getOwner())
				.Case([&](LinalgOp op) { return getMatchingOpResult(op, opOperand); })
				.Default([&](Operation *op) { return OpResult(); });
				return res;
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific attribute manipulation.
				//===----------------------------------------------------------------------===//

				/// Attribute marker to specify op results that can be bufferized inPlace.
				constexpr StringLiteral kInPlaceResultsAttrName = "__inplace_results_attr__";

				// TODO: proper enum.
				enum class InPlaceSpec {
				False,
				True,
				None,
				};

				static StringRef stringify(InPlaceSpec val) {
				switch (val) {
				case InPlaceSpec::False:
				return "false";
				case InPlaceSpec::True:
				return "true";
				case InPlaceSpec::None:
				return "none";
				}
				return "";
				}

				static Optional<InPlaceSpec> symbolize(StringRef str) {
				return StringSwitch<Optional<InPlaceSpec>>(str)
				.Case("false", InPlaceSpec::False)
				.Case("true", InPlaceSpec::True)
				.Case("none", InPlaceSpec::None)
				.Default(None);
				}
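The `stringify`/`symbolize` pair above is an enum-to-string round trip. A standalone sketch of the same idea follows, using `std::string_view` and `std::optional` as stand-ins for `StringRef` and `Optional` (an assumption made so the snippet compiles without LLVM). The names `stringifySpec` and `symbolizeSpec` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in for the InPlaceSpec enum in the patch.
enum class InPlaceSpec { False, True, None };

// Enum -> string, mirroring `stringify` in the patch.
constexpr std::string_view stringifySpec(InPlaceSpec val) {
  switch (val) {
  case InPlaceSpec::False:
    return "false";
  case InPlaceSpec::True:
    return "true";
  case InPlaceSpec::None:
    return "none";
  }
  return "";
}

// String -> enum; std::nullopt for any unknown spelling.
inline std::optional<InPlaceSpec> symbolizeSpec(std::string_view str) {
  if (str == "false")
    return InPlaceSpec::False;
  if (str == "true")
    return InPlaceSpec::True;
  if (str == "none")
    return InPlaceSpec::None;
  return std::nullopt;
}
```

Because the parser returns an optional, a caller that knows the string was produced by the matching printer has to dereference the result explicitly, which is what `getInPlace(OpResult)` does further down.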

				/// Mark whether OpResult can actually be bufferized inplace. If `inPlace` is
				/// `InPlaceSpec::True`, the use-def chain analysis has guaranteed that no
				/// subsequent read of the tensor value occurs (i.e. the result can be
				/// bufferized inPlace).
				static void setInPlaceOpResult(OpResult opResult,
				InPlaceSpec inPlace = InPlaceSpec::True) {
				if (!opResult)
				return;

				Operation *op = opResult.getOwner();
				auto attr =
				op->getAttr(kInPlaceResultsAttrName).dyn_cast_or_null<ArrayAttr>();
				SmallVector<StringRef> inPlaceVector =
				attr ? SmallVector<StringRef>(
				llvm::to_vector<4>(attr.getAsValueRange<StringAttr>()))
				: SmallVector<StringRef>(op->getNumResults(),
				stringify(InPlaceSpec::None));
				LLVM_DEBUG(DBGS() << "Set inPlace=" << stringify(inPlace) << ": " << *op
				<< " @idx=" << opResult.getResultNumber() << "\n");
				inPlaceVector[opResult.getResultNumber()] = stringify(inPlace);
				op->setAttr(kInPlaceResultsAttrName,
				OpBuilder(op).getStrArrayAttr(inPlaceVector));
				}

				/// Get the InPlaceSpec attribute entry `kInPlaceResultsAttrName` for
				/// `opResult`. If the result is `InPlaceSpec::True`, the use-def chain analysis
				/// has guaranteed that no subsequent read of the tensor value occurs and the
				/// result can be bufferized inPlace.
				/// If no InPlaceSpec attribute has been set for `opResult`, return
				/// InPlaceSpec::None.
				static InPlaceSpec getInPlace(OpResult opResult) {
				if (!opResult)
				return InPlaceSpec::None;

				Operation *op = opResult.getOwner();
				auto attr =
				op->getAttr(kInPlaceResultsAttrName).dyn_cast_or_null<ArrayAttr>();
				if (!attr)
				return InPlaceSpec::None;

				// Must return a proper value.
				return *symbolize(*(attr.getAsValueRange<StringAttr>().begin() +
				opResult.getResultNumber()));
				}

				/// Get inPlace information for `bbArg`.
				/// If it does not come from a function, return InPlaceSpec::False.
				static InPlaceSpec getInPlace(BlockArgument bbArg) {
				auto funcOp = dyn_cast<FuncOp>(bbArg.getOwner()->getParentOp());
				if (!funcOp)
				return InPlaceSpec::False;
				auto attr = funcOp.getArgAttrOfType<BoolAttr>(
				bbArg.getArgNumber(), LinalgDialect::kInplaceableAttrName);
				if (!attr)
				return InPlaceSpec::None;
				return attr.getValue() ? InPlaceSpec::True : InPlaceSpec::False;
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific BlockAndValueMapping support with debugging.
				//===----------------------------------------------------------------------===//

				/// Wrapper for better debugging.
				static void map(BlockAndValueMapping &bvm, ValueRange keys, ValueRange values) {
				assert(!keys.empty() && "Unexpected empty keys");
				LLVM_DEBUG(DBGS() << "Map: " << keys.front() << " to " << values.front()
				<< "\n");
				return bvm.map(keys, values);
				}

				/// Wrapper for better debugging.
				static void map(BlockAndValueMapping &bvm, Value key, Value value) {
				LLVM_DEBUG(DBGS() << "Map: " << key << " to " << value << "\n");
				return bvm.map(key, value);
				}

				/// Wrapper for better debugging.
				static Value lookup(BlockAndValueMapping &bvm, Value key) {
				// TODO: if key comes from bbArg, forward.
				assert(key.getType().isa<TensorType>());
				if (!bvm.lookupOrNull(key)) {
				if (auto bbArg = key.dyn_cast<BlockArgument>()) {
				if (isa<FuncOp>(key.getParentBlock()->getParentOp()))
				key.getParentBlock()->getParentOp()->dump();
				else
				key.getParentBlock()->getParentOp()->getParentOfType<FuncOp>()->dump();
				bbArg.getOwner()->getParentOp()->dump();
				} else {
				key.getDefiningOp()->getParentOfType<FuncOp>()->dump();
				}
				llvm::errs() << "NO VALUE FOR KEY: " << key << "\n";
				abort();
				}
				return bvm.lookup(key);
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific support.
				//===----------------------------------------------------------------------===//

				/// Determine whether any subsequent read of the tensor `opOperand` may occur.
				/// For now, this assumes any use is a read. If any use of the tensor does not
				/// properly dominate `opOperand.getOwner()`, then the tensor cannot be
				/// bufferized inPlace.
				// TODO: For now, this assumes any use is a read. Refine this.
				bool hasInterferingTensorRead(OpOperand &opOperand,
				const DominanceInfo &domInfo) {
				if (!opOperand.get().getType().isa<RankedTensorType>())
				return false;
				for (auto &use : opOperand.get().getUses()) {
				Operation *user = use.getOwner();

				// If `user` properly dominates the owner of `opOperand`, there is a clear
				// sequence point and the read can be dismissed.
				if (domInfo.properlyDominates(user, opOperand.getOwner()))
				continue;
				// Otherwise, we need to analyze self-dependencies, for now just let it go.
				// TODO: proper self-dependence analysis.
				if (domInfo.dominates(user, opOperand.getOwner()))
				continue;
				if (user == opOperand.getOwner() &&
				use.getOperandNumber() == opOperand.getOperandNumber())
				continue;
				LLVM_DEBUG(DBGS() << "found interfering read operand #"
				<< opOperand.getOperandNumber()
				<< " in op: " << *opOperand.getOwner() << "\n");
				return true;
				}
				LLVM_DEBUG(DBGS() << "no interfering read\n");
				return false;
				}

				/// Return false if any of the following holds:
				/// 1. `opOperand` is produced by a constant op. For now this is assumed to be
				/// bufferized to a GlobalMemrefOp that cannot be written. Generalize in the
				/// future.
				/// 2. `opOperand` is a BlockArgument of a FuncOp that is not known to be
				/// bufferizable inplace.
				/// 3. `opOperand` has an interfering tensor read.
				/// Return true otherwise.
				bool isBufferizableInPlace(OpOperand &opOperand, const DominanceInfo &domInfo) {
				// Constant tensors are deemed not bufferizable for now.
				if (auto constantOp =
				dyn_cast_or_null<ConstantOp>(opOperand.get().getDefiningOp()))
				return !constantOp.getResult().getType().isa<RankedTensorType>();
				if (auto bbArg = opOperand.get().dyn_cast<BlockArgument>()) {
				// Uses of function arguments that may not be written-to need to be copied.
				// If the function argument itself is not inplaceable, early return false.
				// If it is inplaceable, interfering tensor reads need to be checked.
				//
				// TODO: better propagate the fact that we want a single clone inside the
				// function. Atm every user that wants to write inplace will create its own
				// alloc, irrespective of whether or not interfering reads occur.
				if (isa<FuncOp>(bbArg.getOwner()->getParentOp()))
				if (getInPlace(bbArg) != InPlaceSpec::True)
				return false;
				}
				return !hasInterferingTensorRead(opOperand, domInfo);
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific MemRefType support.
				//===----------------------------------------------------------------------===//

				/// Return a contiguous MemRefType (i.e. with canonical/empty layout map) with
				/// the same shape as `shapedType` and specified `layout` and `addressSpace`.
				static MemRefType getContiguousMemRefType(ShapedType shapedType,
				ArrayRef<AffineMap> layout = {},
				unsigned addressSpace = 0) {
				if (RankedTensorType tensorType = shapedType.dyn_cast<RankedTensorType>())
				return MemRefType::get(tensorType.getShape(), tensorType.getElementType(),
				layout, addressSpace);
				MemRefType memrefType = shapedType.cast<MemRefType>();
				return MemRefType::get(memrefType.getShape(), memrefType.getElementType(),
				layout, addressSpace);
				}

				/// Return a contiguous MemRefType (i.e. with canonical/empty layout map) with
				/// the same shape as `shapedType` and specified `layout` and `addressSpace` or
				/// an UnrankedMemRefType otherwise.
				static Type getContiguousOrUnrankedMemRefType(Type type,
				ArrayRef<AffineMap> layout = {},
				unsigned addressSpace = 0) {
				if (type.isa<RankedTensorType, MemRefType>())
				return getContiguousMemRefType(type.cast<ShapedType>(), layout,
				addressSpace);
				assert(layout.empty() && "expected empty layout with UnrankedMemRefType");
				return UnrankedMemRefType::get(getElementTypeOrSelf(type), addressSpace);
				}

				/// Return a MemRefType to which the `tensorType` can be bufferized in a
				/// composable fashion. The layout must be the most dynamic possible and
				/// canonicalize away once bufferization is finished.
				static MemRefType getDynamicMemRefType(RankedTensorType tensorType,
				unsigned addressSpace = 0) {
				// TODO: address space decisions to connect with the actual alloc.
				int64_t dynamicOffset = ShapedType::kDynamicStrideOrOffset;
				SmallVector<int64_t> dynamicStrides(tensorType.getRank(),
				ShapedType::kDynamicStrideOrOffset);
				AffineMap stridedLayout = makeStridedLinearLayoutMap(
				dynamicStrides, dynamicOffset, tensorType.getContext());
				return MemRefType::get(tensorType.getShape(), tensorType.getElementType(),
				stridedLayout, addressSpace);
				}
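To make the "most dynamic strided layout" concrete: a strided layout maps an index tuple to a linear element offset as `offset + sum_i index[i] * stride[i]`, and `getDynamicMemRefType` leaves the offset and every stride as runtime symbols (the `s0`, `s1` in `affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>`). A minimal standalone sketch of that address computation follows; `linearize` and its parameters are hypothetical names used for illustration and are not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Linear element offset of `indices` under a strided layout:
//   offset + sum_i indices[i] * strides[i]
// In the fully dynamic layout, `offset` and `strides` are only known at
// runtime, which is what the symbols s0, s1, ... in the affine map express.
inline int64_t linearize(const std::vector<int64_t> &indices,
                         const std::vector<int64_t> &strides,
                         int64_t offset) {
  assert(indices.size() == strides.size() && "rank mismatch");
  int64_t linear = offset;
  for (std::size_t i = 0; i < indices.size(); ++i)
    linear += indices[i] * strides[i];
  return linear;
}
```

For the 1-D map above, `linearize({d0}, {s1}, s0)` evaluates to `d0 * s1 + s0`. A contiguous buffer is the special case `s0 = 0`, `s1 = 1`.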

				//===----------------------------------------------------------------------===//
				// Bufferization-specific inPlace pattern matching support.
				//===----------------------------------------------------------------------===//

				/// First assign `op` if `sliceRef.back()` isa `T`, then check condition.
				/// If anything fails just return failure. Otherwise update `sliceRef` by
				/// dropping `sliceRef.back()`, then return success().
				template <typename T>
				static LogicalResult
				matchAndDropBack(ArrayRef<Operation *> &sliceRef, T &op,
				llvm::function_ref<LogicalResult(T)> condition = nullptr) {
				if (sliceRef.empty())
				return failure();
				op = dyn_cast<T>(sliceRef.back());
				if (!op || (condition && failed(condition(op))))
				return failure();
				sliceRef = sliceRef.drop_back();
				return success();
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific scoped alloc/dealloc insertion support.
				//===----------------------------------------------------------------------===//

				/// Create an AllocOp/DeallocOp pair, where the AllocOp is after
				/// `shapedValue.getDefiningOp` (or at the top of the block in case of a bbArg)
				/// and the DeallocOp is at the end of the block.
				static Value createNewAllocDeallocPairForShapedValue(
				OpBuilder &b, Location loc, Value shapedValue,
				SmallVector<Value, 4> dynOperands = {}) {
				// Take a guard before anything else.
				OpBuilder::InsertionGuard g(b);

				// TODO: non-zero address space.
				// TODO: layout information if relevant.
				// Cannot allocate an unranked memref so just always go for the contiguous
				// form.
				MemRefType allocMemRefType =
				getContiguousMemRefType(shapedValue.getType().cast<ShapedType>());
				assert(shapedValue.getType().isa<ShapedType>());
				MemRefType memRefType = shapedValue.getType().dyn_cast<MemRefType>();
				memRefType = memRefType ? memRefType : allocMemRefType;

				if (auto bbArg = shapedValue.dyn_cast<BlockArgument>()) {
				b.setInsertionPointToStart(bbArg.getOwner());
				loc = bbArg.getOwner()->getParentOp()->getLoc();
				} else {
				b.setInsertionPointAfter(shapedValue.getDefiningOp());
				loc = shapedValue.getDefiningOp()->getLoc();
				}

				// If the dynOperands are not passed explicitly, compute them.
				// This circumvents currently missing dim(init_tensor) canonicalizations.
				// TODO: dim(init_tensor) canonicalization.
				if (dynOperands.empty()) {
				for (auto dim : llvm::enumerate(memRefType.getShape()))
				if (dim.value() == ShapedType::kDynamicSize)
				dynOperands.push_back(
				b.create<memref::DimOp>(loc, shapedValue, dim.index()));
				}

				Value allocated =
				b.create<memref::AllocOp>(loc, allocMemRefType, dynOperands);
				Value casted = allocated;
				if (memRefType != allocMemRefType)
				casted = b.create<memref::CastOp>(loc, memRefType, allocated);
				b.setInsertionPoint(allocated.getParentBlock()->getTerminator());
				b.create<memref::DeallocOp>(loc, allocated);
				return casted;
				}

				//===----------------------------------------------------------------------===//
				// Bufferization-specific inPlace analysis support.
				//===----------------------------------------------------------------------===//

				/// Detect the simple terminator pattern:
				/// ```
				/// candidate -> ... -> inplaceable_op(candidate) -> term
				/// ```
				template <typename ContainerOp, typename TerminatorOp>
				static LogicalResult detectInplaceOpToTerminator(Operation *parentOp,
				BlockArgument candidate,
				ArrayRef<Operation *> slice) {
				assert(parentOp && "Unexpected null parent op");
				if (!isa<ContainerOp>(parentOp))
				return failure();
				TerminatorOp terminatorOp;
				// Match returnOp and update slice.
				if (failed(matchAndDropBack(slice, terminatorOp))) {
				LLVM_DEBUG(DBGS() << "FAIL: inplaceOpToTerm pattern -> slice must end with "
				"a known terminator\n");
				return failure();
				}
				return success();
				}

				/// The following uses internal knowledge of the position of tied operand /
				/// results.
				static void propagateInPlace(const SmallVector<OpOperand *> &initalWorklist,
				const DominanceInfo &domInfo) {
				LLVM_DEBUG(DBGS() << "\n\n");
				LLVM_DEBUG(DBGS() << "Start propagateInPlace from initial WL\n");
				for (OpOperand *operand : initalWorklist)
				LLVM_DEBUG(DBGS() << "WL item: " << operand->get() << " used by "
				<< *operand->getOwner() << "\n");
				SmallVector<OpOperand *> worklist(initalWorklist);
				for (unsigned idx = 0; idx < worklist.size(); ++idx) {
				// TODO: bail on subtensor/subtensor_insert and vector.transfer_read/write
				// that should have been already captured in destructive update patterns?
				OpOperand &operand = *worklist[idx];
				LLVM_DEBUG(DBGS() << "WL item: " << *operand.getOwner() << "\n");
				// If the owner turns out to be a CallOp without
				// `kWriteableFuncBufferArgsAttrName` this will be a noop.
				if (isBufferizableInPlace(operand, domInfo)) {
				LLVM_DEBUG(DBGS() << "bufferizable inplace\n");
				setInPlaceOpResult(getMatchingOpResult(operand));
				}
				LLVM_DEBUG(DBGS() << "propagatedInPlace: " << *operand.getOwner() << "\n");
				// use can have interfering reads that prevent it from being written inPlace
				// but the values it produces are still themselves candidates for inPlace at
				// their point of use.
				for (Value v : operand.getOwner()->getResults()) {
				LLVM_DEBUG(DBGS() << "propagate result: " << v << "\n");
				for (auto &use : v.getUses()) {
				LLVM_DEBUG(DBGS() << "add use to WL: " << use.get() << "\n");
				worklist.push_back(&use);
				}
				}
				}
				LLVM_DEBUG(DBGS() << "\n\n");
				}

				static void propagateInPlace(BlockArgument &bbArg,
				const DominanceInfo &domInfo) {
				SmallVector<OpOperand *> worklist;
				for (auto &use : bbArg.getUses())
				worklist.push_back(&use);
				propagateInPlace(worklist, domInfo);
				}
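The worklist loop in `propagateInPlace` grows the list while iterating over it. A minimal standalone sketch of that pattern on a toy graph follows. It models whole ops rather than individual `OpOperand` uses, and `UseGraph` and `propagate` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the IR: `users[i]` lists the ops that consume a result of
// op `i`. This is an illustrative model, not an MLIR data structure.
using UseGraph = std::vector<std::vector<int>>;

// Visit every op reachable from `roots` by following result -> user edges.
// The loop indexes into the worklist instead of using iterators because
// push_back may reallocate and invalidate them. Like the loop in the patch,
// there is no visited set, so an op reachable along several paths is visited
// once per path; termination relies on the graph being acyclic.
inline std::vector<int> propagate(const UseGraph &users,
                                  const std::vector<int> &roots) {
  std::vector<int> worklist(roots);
  for (std::size_t idx = 0; idx < worklist.size(); ++idx) {
    int op = worklist[idx];
    for (int user : users[op])
      worklist.push_back(user);
  }
  return worklist;
}
```

On a diamond-shaped graph where op 0 feeds ops 1 and 2 and both feed op 3, op 3 is enqueued twice. That is harmless for an idempotent action such as setting an attribute, though a visited set would avoid the repeated work.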

				/// Iterate over bbArgs of `block` and determine if they are the root of a
				/// known destructive update chain. Such a destructive update is related to
				/// traditional loop nest + memory analysis but provides a simpler SSA use-def
				/// chain-based abstraction.
				static void destructiveUpdateAnalysis(Block *block,
				const DominanceInfo &domInfo) {
				Operation *parentOp = block->getParentOp();
				for (BlockArgument candidate : block->getArguments()) {
				LLVM_DEBUG(llvm::dbgs() << "\n\n");
				LLVM_DEBUG(DBGS() << "Destructive update analysis on candidate: "
				<< candidate << "\nof:\n"
				<< *parentOp << "\n");

				if (!candidate.getType().isa<ShapedType>()) {
				LLVM_DEBUG(DBGS() << "Not a tensor\n");
				continue;
				}

				// FuncOp arguments must be inplaceable otherwise they cannot be the root of
				// a destructive update chain.
				if (isa<FuncOp>(parentOp) && getInPlace(candidate) != InPlaceSpec::True) {
				LLVM_DEBUG(DBGS() << "Not inplace\n");
				continue;
				}

				llvm::SetVector<Operation *> slice;
				getForwardSlice(candidate, &slice,
				[&](Operation *op) { return op->getBlock() == block; });

				LLVM_DEBUG(DBGS() << "Slice:\n");
				for (auto *op : slice)
				LLVM_DEBUG(DBGS() << *op << "\n");

				bool failedDetectingDestructiveUpdate =
				// func / return inplace patterns.
				failed(detectInplaceOpToTerminator<FuncOp, ReturnOp>(
				parentOp, candidate, slice.getArrayRef()));
				if (failedDetectingDestructiveUpdate) {
				LLVM_DEBUG(DBGS() << "Failed to detect a destructive update pattern\n");
				continue;
				}

				propagateInPlace(candidate, domInfo);
				}
				}

				//===----------------------------------------------------------------------===//
				// Bufferization as simple BlockAndValueMapping rewrites.
				//===----------------------------------------------------------------------===//

				/// Helper function for LinalgOp bufferization.
				/// Operate on mixed tensor + buffer Linalg ops for progressive bufferization.
				/// Allocate the output buffers for the remaining tensor output operands of
				/// the Linalg op. If the tensor is an "init" tensor (i.e. its value is
				/// actually used in the payload region), we additionally copy the original
				/// value into the newly allocated buffer.
				static void allocateBuffersForResults(OpBuilder &b, Location loc, LinalgOp op,
				SmallVectorImpl<Value> &resultBuffers,
				BlockAndValueMapping &bvm) {
				// Take a guard before anything else.
				OpBuilder::InsertionGuard g(b);

				// Lazily compute loopRanges.
				SmallVector<Range, 4> loopRanges;

				// Linalg invariant: output tensors and result match 1-1.
				assert(op.getNumOutputTensors() == op->getNumResults());
				for (auto &opOperand : op.getOutputOpOperands()) {
				Value output = opOperand.get();
				if (output.getType().isa<MemRefType>()) {
				resultBuffers.push_back(output);
				continue;
				}

				// If output tensor is marked inPlace, just use the buffer.
				// The following uses internal knowledge of the position of tied operand /
				// results.
				OpResult tiedResult = getMatchingOpResult(op, opOperand);
				if (getInPlace(tiedResult) == InPlaceSpec::True) {
				resultBuffers.push_back(lookup(bvm, output));
				continue;
				}

				Value dimTensor = bvm.lookupOrDefault(output);
				Value alloc = createNewAllocDeallocPairForShapedValue(b, loc, dimTensor);
				b.setInsertionPointAfter(alloc.getDefiningOp());
				resultBuffers.push_back(alloc);

				// Additionally, if the output buffer is used, clone its value for now.
				if (op.payloadUsesValueFromOpOperand(&opOperand))
				b.create<CopyOp>(loc, lookup(bvm, output), alloc);
				}
				if (op->getNumResults())
				map(bvm, op->getResults(), resultBuffers);
				}

				static void finalizeBufferAllocation(OpBuilder &b, LinalgOp op,
				ValueRange inputs, ValueRange outputs,
				BlockAndValueMapping &bvm) {
				SmallVector<Value, 8> newOperands = inputs;
				newOperands.append(outputs.begin(), outputs.end());
				auto otherOperands = op.getAssumedNonShapedOperands();
				newOperands.append(otherOperands.begin(), otherOperands.end());
				Location loc = op.getLoc();
				op.clone(b, loc, /*resultTypes=*/TypeRange{}, newOperands);

				// Replace the results of the old op with the new output buffers.
				if (op->getNumResults())
				map(bvm, op->getResults(), outputs);
				if (!op.hasTensorSemantics())
				op->erase();
				}

				/// Generic conversion for any LinalgOp.
				/// Operate on mixed tensor + buffer Linalg ops for progressive bufferization.
				static LogicalResult convertAnyLinalgOp(OpBuilder &b, LinalgOp op,
				BlockAndValueMapping &bvm) {
				// Take a guard before anything else.
				OpBuilder::InsertionGuard g(b);

				if (op.hasBufferSemantics())
				return failure();

				LLVM_DEBUG(DBGS() << "convert: " << *op << "\n");

				b.setInsertionPoint(op);
				Location loc = op.getLoc();
				SmallVector<Value, 2> newInputBuffers;
				newInputBuffers.reserve(op.getNumInputs());
				for (Value v : op.getInputs())
				newInputBuffers.push_back(lookup(bvm, v));
				SmallVector<Value, 2> newOutputBuffers;
				allocateBuffersForResults(b, loc, op, newOutputBuffers, bvm);
				finalizeBufferAllocation(b, op, newInputBuffers, newOutputBuffers, bvm);
				return success();
				}

				/// DimOp tensor operand is modified inplace. This allows leaving dead tensors
				/// behind that will get DCE'd.
				static LogicalResult convertDimOp(OpBuilder &b, memref::DimOp dimOp,
				BlockAndValueMapping &bvm) {
				if (dimOp.memrefOrTensor().getType().isa<RankedTensorType>())
				dimOp.memrefOrTensorMutable().assign(lookup(bvm, dimOp.memrefOrTensor()));
				return success();
				}

				/// FuncOp always creates TensorToMemRef ops.
				static LogicalResult convertFuncOp(OpBuilder &b, FuncOp funcOp,
				BlockAndValueMapping &bvm) {
				// Take a guard before anything else.
				OpBuilder::InsertionGuard g(b);
				b.setInsertionPointToStart(&funcOp.body().front());
				for (auto bbArg : funcOp.getArguments()) {
				auto tensorType = bbArg.getType().dyn_cast<TensorType>();
				if (!tensorType)
				continue;
				auto rankedTensorType = tensorType.dyn_cast<RankedTensorType>();
				// Cast the tensor to the most dynamic buffer possible. Further
				// canonicalizations will clean up.
				Type memRefType = rankedTensorType
				? getDynamicMemRefType(rankedTensorType)
				: getContiguousOrUnrankedMemRefType(tensorType);
				Value tensorToMemref =
				b.create<memref::BufferCastOp>(funcOp.getLoc(), memRefType, bbArg);
				map(bvm, bbArg, tensorToMemref);
				}
				return success();
				}

				/// ReturnOp always creates memref::TensorLoadOp.
				static LogicalResult convertReturnOp(OpBuilder &b, ReturnOp returnOp,
				BlockAndValueMapping &bvm) {
				// Take a guard before anything else.
				OpBuilder::InsertionGuard g(b);
				b.setInsertionPoint(returnOp);

				FuncOp funcOp = cast<FuncOp>(returnOp->getParentOp());
				assert(funcOp && "only support FuncOp parent for ReturnOp");
				for (OpOperand &operand : returnOp->getOpOperands()) {
				auto tensorType = operand.get().getType().dyn_cast<TensorType>();
				if (!tensorType)
				continue;
				operand.set(b.create<memref::TensorLoadOp>(returnOp.getLoc(),
				lookup(bvm, operand.get())));
				}
				return success();
				}

				static void inPlaceAnalysisFuncOpInternals(FuncOp funcOp,
				const DominanceInfo &domInfo) {
				assert(funcOp && funcOp->getNumRegions() > 0 && !funcOp.body().empty() &&
				"expected a funcOp definition with a body");

				// Start propagating from FuncOp bbArgs.
				destructiveUpdateAnalysis(&funcOp.body().front(), domInfo);
				}

				static LogicalResult bufferizeFuncOpInternals(
				FuncOp funcOp, BlockAndValueMapping &bvm,
				const DenseMap<FuncOp, SmallVector<int64_t>> &tiedResultsMap) {
				OpBuilder b(funcOp->getContext());
				/// Start by converting `funcOp` arguments.
				if (failed(convertFuncOp(b, funcOp, bvm)))
				return failure();
				WalkResult result = funcOp.walk<WalkOrder::PreOrder>([&](Operation *op) {
				LogicalResult status =
				llvm::TypeSwitch<Operation *, LogicalResult>(op)
				// Skip BufferCast and TensorLoad ops.
				.Case<memref::BufferCastOp, memref::TensorLoadOp>(
				[&](auto) { return success(); })
				.Case([&](memref::DimOp op) { return convertDimOp(b, op, bvm); })
				.Case([&](LinalgOp op) { return convertAnyLinalgOp(b, op, bvm); })
				.Case([&](ReturnOp op) { return convertReturnOp(b, op, bvm); })
				.Default([&](Operation *op) {
				auto isaTensor = [](Type t) { return t.isa<TensorType>(); };
				if (llvm::any_of(op->getOperandTypes(), isaTensor) ||
				llvm::any_of(op->getResultTypes(), isaTensor))
				return failure();
				return success();
				});
				if (failed(status)) {
				op->emitError("Failed bufferization");
				return WalkResult::interrupt();
				}
				return WalkResult::advance();
				});
				if (result.wasInterrupted())
				return failure();
				return success();
				}

				namespace {
				struct LinalgComprehensiveFuncBufferize
				    : public LinalgComprehensiveFuncBufferizeBase<
				          LinalgComprehensiveFuncBufferize> {
				  void runOnFunction() override;

				  void getDependentDialects(DialectRegistry &registry) const override {
				    registry.insert<linalg::LinalgDialect, memref::MemRefDialect>();
				  }
				};
				} // end namespace

				void LinalgComprehensiveFuncBufferize::runOnFunction() {
				  auto funcOp = getFunction();
				  DominanceInfo domInfo(funcOp);
				  BlockAndValueMapping bvm;
				  DenseMap<FuncOp, SmallVector<int64_t>> tiedResultsMap;
				  inPlaceAnalysisFuncOpInternals(funcOp, domInfo);

				  LLVM_DEBUG(DBGS() << "Begin BufferizeFuncOpInternals:\n" << funcOp << "\n");
				  auto guard = llvm::make_scope_exit([&] {
				    funcOp.walk(
				        [&](Operation *op) { op->removeAttr(kInPlaceResultsAttrName); });
				    LLVM_DEBUG(DBGS() << "End BufferizeFuncOpInternals:\n" << funcOp << "\n");
				  });
				  if (failed(bufferizeFuncOpInternals(funcOp, bvm, tiedResultsMap)))
				    signalPassFailure();
				}

				std::unique_ptr<Pass> mlir::createLinalgComprehensiveFuncBufferizePass() {
				  return std::make_unique<LinalgComprehensiveFuncBufferize>();
				}

mlir/test/Dialect/Linalg/comprehensive-func-bufferize.mlir

This file was added.

				// RUN: mlir-opt %s -linalg-comprehensive-func-bufferize -split-input-file | FileCheck %s

				// CHECK-DAG: #[[$map_2d_dyn:.*]] = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>

				// CHECK-LABEL: func @fill_inplace(
				// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: tensor<?xf32> {linalg.inplaceable = true})
				func @fill_inplace(%A : tensor<?xf32> {linalg.inplaceable = true}) -> tensor<?xf32> {
				// CHECK: %[[I:.*]] = memref.buffer_cast %[[A]] : memref<?xf32, #[[$map_2d_dyn]]>

				// CHECK: %[[F0:.*]] = constant 0.000000e+00 : f32
				%f0 = constant 0.0 : f32

				/// Inplaceable, no alloc
				// CHECK-NOT: alloc
				// CHECK: linalg.fill(%[[I]], %[[F0]]) : memref<?xf32, #[[$map_2d_dyn]]>, f32
				%r = linalg.fill(%A, %f0) : tensor<?xf32>, f32 -> tensor<?xf32>

				// CHECK: %[[R:.*]] = memref.tensor_load %[[I]] : memref<?xf32, #[[$map_2d_dyn]]>
				// CHECK: return %[[R]] : tensor<?xf32>
				return %r: tensor<?xf32>
				}

				// -----

				// CHECK-DAG: #[[$map_2d_dyn:.*]] = affine_map<(d0)[s0, s1] -> (d0 * s1 + s0)>

				/// No linalg.inplaceable flag, must allocate.
				// CHECK-LABEL: func @not_inplace(
				// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: tensor<?xf32>)
				func @not_inplace(%A : tensor<?xf32>) -> tensor<?xf32> {
				// CHECK: %[[I:.*]] = memref.buffer_cast %[[A]] : memref<?xf32, #[[$map_2d_dyn]]>

				// CHECK: %[[D0:.*]] = memref.dim %[[I]], {{.*}} : memref<?xf32, #[[$map_2d_dyn]]>
				// CHECK: %[[ALLOC:.*]] = memref.alloc(%[[D0]]) : memref<?xf32>
				// CHECK: %[[I2:.*]] = memref.cast %[[ALLOC]] : memref<?xf32> to memref<?xf32, #map>

				// CHECK: %[[F0:.*]] = constant 0.000000e+00 : f32
				%f0 = constant 0.0 : f32

				// CHECK: linalg.fill(%[[I2]], %[[F0]]) : memref<?xf32, #[[$map_2d_dyn]]>, f32
				%r = linalg.fill(%A, %f0) : tensor<?xf32>, f32 -> tensor<?xf32>

				// CHECK: dealloc %[[ALLOC]] : memref<?xf32>
				// CHECK: %[[R:.*]] = memref.tensor_load %[[I2]] : memref<?xf32, #[[$map_2d_dyn]]>
				// CHECK: return %[[R]] : tensor<?xf32>
				return %r: tensor<?xf32>
				}

				// -----

				// CHECK-LABEL: func @not_inplace
				// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: tensor<?x?xf32>
				func @not_inplace(%A : tensor<?x?xf32> {linalg.inplaceable = true}) -> tensor<?x?xf32> {
				%f0 = constant 0.0 : f32

				// CHECK: %[[BUFFER_CAST:.*]] = memref.buffer_cast %[[A]] : memref<?x?xf32

				/// Cross-op multiple uses of %A, the first op which has interfering reads must alloc.
				// CHECK: %[[ALLOC:.*]] = memref.alloc
				// CHECK: %[[CAST:.*]] = memref.cast %[[ALLOC]]
				// CHECK: linalg.fill(%[[CAST]]
				%f = linalg.fill(%A, %f0) : tensor<?x?xf32>, f32 -> tensor<?x?xf32>

				/// The second op has no interfering reads and can reuse.
				// CHECK-NOT: alloc
				// CHECK: linalg.matmul{{.*}}outs(%[[BUFFER_CAST]]
				%r = linalg.matmul ins(%f, %f: tensor<?x?xf32>, tensor<?x?xf32>)
				outs(%A: tensor<?x?xf32>)
				-> tensor<?x?xf32>
				return %r: tensor<?x?xf32>
				}

				// -----

				// CHECK-LABEL: func @not_inplace
				func @not_inplace(%A : tensor<?x?xf32> {linalg.inplaceable = true}) -> tensor<?x?xf32> {
				/// Within op multiple uses of %A, must alloc.
				// CHECK: alloc
				%r = linalg.matmul ins(%A, %A: tensor<?x?xf32>, tensor<?x?xf32>)
				outs(%A: tensor<?x?xf32>)
				-> tensor<?x?xf32>
				return %r: tensor<?x?xf32>
				}
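
The four test cases above exercise one rule of thumb: a result may reuse the buffer of its output operand only when that buffer is writable and no read of it would be clobbered. The following is a minimal sketch of that rule as a toy model in plain C++. It is not the pass's analysis; `ToyOp`, `decideInPlace`, and the string-named tensors are hypothetical names introduced only for illustration, and the model ignores control flow and dominance entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of an op: not an MLIR data structure.
struct ToyOp {
  std::vector<std::string> ins; // tensors that are only read
  std::string out;              // tensor tied to the result (read-write)
  std::string result;           // name of the tensor produced by the op
};

static bool contains(const std::vector<std::string> &v, const std::string &s) {
  return std::find(v.begin(), v.end(), s) != v.end();
}

// For each op (in program order), returns true if the result may reuse the
// buffer of `out`, and false if a fresh allocation is needed.
// Assumption: `writableArgs` lists the function arguments marked inplaceable.
std::vector<bool> decideInPlace(const std::vector<ToyOp> &ops,
                                const std::vector<std::string> &writableArgs) {
  std::vector<std::string> writable = writableArgs;
  std::vector<bool> inPlace;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const ToyOp &op = ops[i];
    // The destination must be writable in the first place.
    bool ok = contains(writable, op.out);
    // Within-op interference: the op also reads its destination as an input.
    ok = ok && !contains(op.ins, op.out);
    // Cross-op interference: a later op still uses the old value.
    for (std::size_t j = i + 1; ok && j < ops.size(); ++j)
      ok = !contains(ops[j].ins, op.out) && ops[j].out != op.out;
    inPlace.push_back(ok);
    // The result is writable either way: it aliases a writable buffer or
    // lives in a fresh allocation.
    writable.push_back(op.result);
  }
  return inPlace;
}
```

Under this model, `fill` followed by `matmul` writing into the same `%A` yields "allocate" for the `fill` and "reuse" for the `matmul`, which matches the CHECK lines of the third test case.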

Diff 345303
