This is an archive of the discontinued LLVM Phabricator instance.

Differential D87756

[mlir] Refactored BufferPlacement transformation into BufferDeallocation and BufferHoisting.
Closed · Public

Authored by dfki-mako on Sep 16 2020, 3:51 AM.

Details

Reviewers

nicolasvasilache
herhut
pifon2a
bondhugula

Commits

rG1b1c61ff47f8: [mlir] Refactored BufferPlacement transformation.

Summary

The current BufferPlacement transformation contains several concepts for hoisting allocations. However, more advanced hoisting techniques should not be integrated into the BufferPlacement transformation. Hence, this CL refactors the current BufferPlacement pass into two separate pieces: BufferDeallocation and BufferHoisting. Moreover, it extends the hoisting functionality by allowing allocations to be moved out of loops.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dfki-mako created this revision. Sep 16 2020, 3:51 AM

Herald added a project: Restricted Project. Sep 16 2020, 3:51 AM

Herald added subscribers: tatianashp, msifontes, jurahul and 14 others.

dfki-mako requested review of this revision. Sep 16 2020, 3:51 AM

Herald added a reviewer: nicolasvasilache. Sep 16 2020, 3:51 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

dfki-mako added reviewers: herhut, pifon2a. Sep 16 2020, 3:53 AM

Harbormaster completed remote builds in B71858: Diff 292173. Sep 16 2020, 4:11 AM

Some first comments.

mlir/include/mlir/Transforms/BufferPlacement.h
296 ↗	(On Diff #292173)	Could this just be a tuple of Value, Block and Operation without any associated logic?
354 ↗	(On Diff #292173)	Why is this called `Operation`? If it is a transformation, use `Transformation`. Or maybe `Pass`?
366 ↗	(On Diff #292173)	Could this be a static function? It does not need to be an inherited member.
382 ↗	(On Diff #292173)	Do all passes that inherit this really need all the below information?
mlir/lib/Transforms/BufferDeallocation.cpp
250	Why do we need to store the `placementBlock` here? Does this ever need to be updated? As far as I can see, it is always the block where the corresponding alloc was initially. Would `allocValue->getDefiningOp()->getBlock()` not always do the same?
264	This logic could be moved to the use site or is this reused by anything other than the 'BufferPlacementHoisting' pass?
mlir/lib/Transforms/BufferOptimizations.cpp
31	Nit: `Hosting`. Maybe also drop the Placement and instead call it `BufferAllocationHoisting`?
78	This can only be done if the alloc you are moving is loop invariant. This is ensured by the calling context but is not clear from this code. Also, you could query the `LoopLikeOpInterface` as to whether the operands are loop independent. This is also not an optimization if the loop is never executed. What happens if the allocated buffer escapes the loop over a backedge? Something like

	%0 = alloc
	scf.for ... init(%alloc) {
	bb0(%arg0...):
	  %1 = alloc
	  <update %1 using %arg0>
	  yield %1
	}

	I think with your rewrite, you would allocate `%1` once and, from the second iteration on, the two buffers would alias.
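
The back-edge concern raised for line 78 can be reproduced outside MLIR with a small model. This is a minimal sketch, not code from the patch: `runLoop`, `LoopResult` and the `hoistAlloc` flag are hypothetical names, buffers are plain `std::vector<int>` objects, and the loop body is assumed to first initialize the freshly allocated buffer and then update it from the iteration argument.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Buffer = std::vector<int>;

// Outcome of the modeled loop: whether the iteration argument ever shared
// storage with the buffer written in the body, and the final value.
struct LoopResult {
  bool aliased;
  int finalValue;
};

// Models: for ... iter_args(%arg0 = %init) { %1 = alloc; <update %1 using %arg0>; yield %1 }
// With hoistAlloc == true, the single allocation for %1 is made before the loop.
LoopResult runLoop(int iterations, bool hoistAlloc) {
  assert(iterations >= 0);
  auto init = std::make_shared<Buffer>(1, 0);
  std::shared_ptr<Buffer> hoisted =
      hoistAlloc ? std::make_shared<Buffer>(1, 0) : nullptr;
  std::shared_ptr<Buffer> iterArg = init;
  bool aliased = false;
  for (int i = 0; i < iterations; ++i) {
    // Either reuse the hoisted buffer or allocate a fresh one per iteration.
    std::shared_ptr<Buffer> result =
        hoistAlloc ? hoisted : std::make_shared<Buffer>(1, 0);
    aliased = aliased || (result.get() == iterArg.get());
    (*result)[0] = 0;                  // initialize the "new" buffer
    (*result)[0] += (*iterArg)[0] + 1; // update it using the iteration argument
    iterArg = result;                  // the buffer escapes over the back-edge
  }
  return {aliased, (*iterArg)[0]};
}
```

In this model, three iterations without hoisting count up to 3, while the hoisted variant overwrites its own input from the second iteration on and ends at 1.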

pifon2a added inline comments. Sep 21 2020, 1:44 AM

mlir/include/mlir/Transforms/BufferPlacement.h
277 ↗	(On Diff #292173)	nit: I would add `begin()` just for consistency. In that case you would not need to explain why you only have `end()` like you do now.
310–317 ↗	(On Diff #292173)	nit: move constructors to the top of the struct. Also: why do you need them to be explicit?
mlir/include/mlir/Transforms/Passes.td
126	+1 for formatting
mlir/lib/Transforms/BufferDeallocation.cpp
268	`const LivenessBlockInfo& livenessInfo = *liveness.getLiveness(placementBlock);`
mlir/lib/Transforms/BufferOptimizations.cpp
2	nit: remove empty line
58	You don't really need the `Block* placementBlock` line. The code is shorter without this tmp variable.

	if (operands.size() < 1) {
	  // If not, we have to find the common dominator of all aliases and move
	  // the allocation out of nested loops.
	  auto resultAliases = aliases.resolve(alloc);
	  allocEntry.placementBlock =
	      findCommonDominator(alloc, resultAliases, dominators);
	  moveOutOfLoop(allocEntry.placementBlock);
	  return;
	}
	// If this node has dependencies, check all dependent nodes with respect
	// to a common post dominator in which all values are available.
	ValueSetT dependencies(++operands.begin(), operands.end());
	allocEntry.placementBlock =
	    findCommonDominator(*operands.begin(), dependencies, postDominators);

68	`std::next(operands.begin())`
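
The `std::next` suggestion for line 68 avoids incrementing a temporary iterator. A minimal sketch, with a standard container standing in for the operand range; `splitFirst` is a hypothetical helper and not part of the patch.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Splits a non-empty range into its first element and the remaining ones.
// std::next returns an advanced copy of the iterator, so it also works when
// the iterator type is a raw pointer, where `++operands.begin()` would not
// compile because a temporary pointer cannot be incremented.
std::pair<int, std::vector<int>> splitFirst(const std::vector<int> &operands) {
  assert(!operands.empty());
  std::vector<int> rest(std::next(operands.begin()), operands.end());
  return {*operands.begin(), rest};
}
```

The precondition uses `!operands.empty()` rather than a size comparison, which matches the idiom requested later in the review.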

pifon2a requested changes to this revision. Sep 21 2020, 1:44 AM

This revision now requires changes to proceed. Sep 21 2020, 1:44 AM

Addressed reviewer comments and changed the BufferAllocationHoisting pass to pay attention to escaping buffers via backedges.

dfki-mako added inline comments. Sep 30 2020, 7:39 AM

mlir/lib/Transforms/BufferDeallocation.cpp
264	This logic will be used by other passes in the near future - in one of the upcoming CLs :)

Harbormaster completed remote builds in B73508: Diff 295278. Sep 30 2020, 8:01 AM

Mostly minor comments on readability.

mlir/include/mlir/Transforms/BufferPlacement.h
313–314 ↗	(On Diff #295278)	Could this comment be improved to document the return value better? A single `Operation *` is returned while "associated deallocs" is mentioned.
mlir/include/mlir/Transforms/Passes.h
31–41	Add a few more words to this comment (like the one below) - it's almost stating the obvious.
mlir/include/mlir/Transforms/Passes.td
105–107	This is a welcome change, thanks!
mlir/lib/Transforms/BufferOptimizations.cpp
10–12	Nit: use width
35	Avoid `auto`.
59	Missing assert for `getDefiningOp`.
62	`!operands.empty()` - less expensive in general and more idiomatic.
109	Nit: analysis -> info
111	analysis -> info
127	Nit: Terminate with period.

This revision now requires changes to proceed. Oct 13 2020, 12:50 AM

Herald added a subscriber: rdzhabarov. Oct 13 2020, 12:50 AM

Refined all parts of this PR and addressed all reviewer comments.

Harbormaster completed remote builds in B74921: Diff 297821. Oct 13 2020, 4:44 AM

Much nicer!

mlir/include/mlir/Transforms/Passes.h
39	Does it reduce copies? I thought the goal was to avoid re-allocations inside of loops and instead share a single buffer for the whole loop?
mlir/lib/Transforms/BufferDeallocation.cpp
379	nit: This no longer finds the initial allocation block.
415–416	Maybe clarify this a bit. It ensures that all allocs in the program have a corresponding de-allocation. As a side-effect, it might also introduce copies (which lead to allocs again).
mlir/lib/Transforms/BufferOptimizations.cpp
45	It is not clear to me why this is computed first for all allocations here and then there is a second round in `hoist()` that applies this. One could equally well compute one placement and then hoist the alloc, avoiding the creation of intermediate state. So in essence, move the loop outwards and do `foreach alloc : allocs { computePos; hoist; }`.
51	nit: common
59	`all values this allocation depends on are available`?
90	Does this address my example? This moves up the blocks until all aliases are covered. So in my example, if there was another aliasing on some outer level, it would still break the semantics, no?
109	This moves within one region level, correct?
119	I would spell this out for readability: `state.isLegalPlacementOp(parentOp)`
124	`recordPotentialPlacement` as this does not move?
148	The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here. `isa<LoopLikeOpInterface>` is a very weak condition. There might be loops that do not implement this. So a better way to phrase this would be that the operation also does implement the `RegionBranchOpInterface` and has no back-edges between regions. Otherwise it is a loop and hoisting out of loops is illegal.
161	So this will walk all the way up and if it finds a loop anywhere during that walk it will hoist. This also means hoisting out of conditionals (might be bad) and through unknown region based control flow. This should only hoist out of operations that implement `LoopLikeOpInterface` and only if the allocation does not escape that loop.
mlir/test/Transforms/buffer-loop-hoisting.mlir
2	It is not clear to me why these tests are in this file. Not all of them are concerned with loops.
4	BufferLoopHoisting here and below?
87	This comment does not describe what is tested. In the test, nothing is moved.
177	Why is this legal to do? `%3` escapes this loop on the back-edge and result. So if there was another use of `iterBuf` after the alloc, this would introduce an aliasing of buffers that did not exist before. In my mind, it is only legal to hoist an allocation out of the loop if it does not escape the loop otherwise.
205	Two issues: (1) the then case might be rare, in which case this is not an improvement; (2) the allocation escapes.
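
One way to read the request about `RegionBranchOpInterface` in the comment on line 148: treat the regions of an operation as graph nodes, the region-to-region control-flow edges as directed edges, and regard the operation as a loop if that graph contains a cycle. The following is a sketch under that assumption. `RegionGraph` and `hasRegionBackEdge` are hypothetical names, and the adjacency list stands in for whatever the interface would report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// successors[r] lists the regions that control flow may enter after region r.
using RegionGraph = std::vector<std::vector<std::size_t>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search that reports an edge into a region still on the stack.
static bool hasCycleFrom(const RegionGraph &graph, std::size_t region,
                         std::vector<Mark> &marks) {
  marks[region] = Mark::InProgress;
  for (std::size_t succ : graph[region]) {
    assert(succ < graph.size());
    if (marks[succ] == Mark::InProgress)
      return true; // back-edge found
    if (marks[succ] == Mark::Unvisited && hasCycleFrom(graph, succ, marks))
      return true;
  }
  marks[region] = Mark::Done;
  return false;
}

// Returns true if any region can be re-entered, i.e. the op behaves like a loop.
bool hasRegionBackEdge(const RegionGraph &graph) {
  std::vector<Mark> marks(graph.size(), Mark::Unvisited);
  for (std::size_t r = 0; r < graph.size(); ++r)
    if (marks[r] == Mark::Unvisited && hasCycleFrom(graph, r, marks))
      return true;
  return false;
}
```

A conditional with two independent regions has no such edge, while a single body region that branches back to itself does. This check only classifies the operation; whether an allocation escapes the loop would still need a separate test.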

This revision now requires changes to proceed. Oct 13 2020, 5:37 AM

Refactored code and addressed reviewer comments.

mlir/lib/Transforms/BufferOptimizations.cpp
148	"The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here." The function `findPlacementBlock` does not move the placement block above the `dominatorBlock`, which has been determined before. Since the `dominatorBlock` represents the immediate common dominator of all aliases (while taking potential dependencies into account), the memory pressure cannot increase significantly because the allocation will not be moved any further. "`isa<LoopLikeOpInterface>` is a very weak condition." I agree that we might want to capture these cases as well.
161	The current implementation behaves differently. However, if we want to ensure that the allocation does not escape the loop in all cases, we have to restructure the implementation slightly.
mlir/test/Transforms/buffer-loop-hoisting.mlir
2	It is not required to have all of these tests. However, our intention was to ensure that the allocations are not moved in tests without any loops.

Harbormaster completed remote builds in B75068: Diff 298135. Oct 14 2020, 6:26 AM

herhut added inline comments. Oct 14 2020, 7:16 AM

mlir/lib/Transforms/BufferOptimizations.cpp
207	nit: `the one` -> `one`
259	Please fix.
mlir/test/Transforms/buffer-hoisting.mlir
5	nit: to their
mlir/test/Transforms/buffer-loop-hoisting.mlir
13	It does not move it, right?
223	This is no longer the same.
259	It is hard to see what the actual structure is that this is checking.
311	Can you add small tests that check simple cases: one alloc hoisted out of a loop; one hoisted through multiple loops; alloc not hoisted out of a conditional; alloc hoisted out of a loop in a conditional; alloc with dependencies not hoisted out of a loop; alloc with dependencies in a loop nest hoisted as far as possible. That likely makes it easier to write CHECK patterns.

Addressed reviewer comments.

herhut accepted this revision. Oct 16 2020, 3:58 AM

Harbormaster completed remote builds in B75289: Diff 298590. Oct 16 2020, 4:26 AM

This revision was not accepted when it landed; it landed in state Needs Review. Oct 19 2020, 3:54 AM

Closed by commit rG1b1c61ff47f8: [mlir] Refactored BufferPlacement transformation. (authored by dfki-mako).

This revision was automatically updated to reflect the committed changes.

dfki-mako added a commit: rG1b1c61ff47f8: [mlir] Refactored BufferPlacement transformation.

Revision Contents

Path                                                                Size
mlir/include/mlir/Transforms/Bufferize.h                            138 lines
mlir/include/mlir/Transforms/Passes.h                               13 lines
mlir/include/mlir/Transforms/Passes.td                              73 lines
mlir/lib/Transforms/BufferDeallocation.cpp / BufferPlacement.cpp    617 lines
mlir/lib/Transforms/BufferOptimizations.cpp                         258 lines
mlir/lib/Transforms/BufferPlacement.cpp
mlir/lib/Transforms/CMakeLists.txt                                  3 lines
mlir/test/Dialect/Linalg/bufferize.mlir                             2 lines
mlir/test/Transforms/buffer-deallocation.mlir / buffer-placement.mlir    387 lines
mlir/test/Transforms/buffer-hoisting.mlir                           905 lines
mlir/test/Transforms/buffer-loop-hoisting.mlir                      490 lines
mlir/test/Transforms/buffer-placement.mlir

Diff 298998

mlir/include/mlir/Transforms/Bufferize.h

Show All 22 Lines
// those cases, BufferizeConversionPattern and its related classes should be		// those cases, BufferizeConversionPattern and its related classes should be
// used, which provide access to a BufferizeTypeConverter directly.		// used, which provide access to a BufferizeTypeConverter directly.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef MLIR_TRANSFORMS_BUFFERIZE_H		#ifndef MLIR_TRANSFORMS_BUFFERIZE_H
#define MLIR_TRANSFORMS_BUFFERIZE_H		#define MLIR_TRANSFORMS_BUFFERIZE_H

		#include "mlir/Analysis/Liveness.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/Builders.h"		#include "mlir/IR/Builders.h"
		#include "mlir/IR/Dominance.h"
#include "mlir/IR/Function.h"		#include "mlir/IR/Function.h"
#include "mlir/IR/Operation.h"		#include "mlir/IR/Operation.h"
#include "mlir/Transforms/DialectConversion.h"		#include "mlir/Transforms/DialectConversion.h"

namespace mlir {		namespace mlir {

/// A helper type converter class for using inside Buffer Assignment operation		/// A helper type converter class for using inside Buffer Assignment operation
/// conversion patterns. The default constructor keeps all the types intact		/// conversion patterns. The default constructor keeps all the types intact
▲ Show 20 Lines • Show All 230 Lines • ▼ Show 20 Lines	populateWithBufferizeOpConversionPatterns(MLIRContext *context,
patterns.insert<		patterns.insert<
BufferizeCallOpConverter,		BufferizeCallOpConverter,
BufferizeFuncOpConverter,		BufferizeFuncOpConverter,
BufferizeReturnOpConverter		BufferizeReturnOpConverter
<ReturnOpSourceTy, ReturnOpTargetTy, CopyOpTy>		<ReturnOpSourceTy, ReturnOpTargetTy, CopyOpTy>
>(context, converter);		>(context, converter);
// clang-format on		// clang-format on
}		}

		/// A straight-forward alias analysis which ensures that all aliases of all
		/// values will be determined. This is a requirement for the BufferPlacement
		/// class since you need to determine safe positions to place alloc and
		/// deallocs.
		class BufferPlacementAliasAnalysis {
		public:
		using ValueSetT = SmallPtrSet<Value, 16>;
		using ValueMapT = llvm::DenseMap<Value, ValueSetT>;

		public:
		/// Constructs a new alias analysis using the op provided.
		BufferPlacementAliasAnalysis(Operation *op);

		/// Find all immediate aliases this value could potentially have.
		ValueMapT::const_iterator find(Value value) const {
		return aliases.find(value);
		}

		/// Returns the begin iterator to iterate over all aliases.
		ValueMapT::const_iterator begin() const { return aliases.begin(); }

		/// Returns the end iterator that can be used in combination with find.
		ValueMapT::const_iterator end() const { return aliases.end(); }

		/// Find all immediate and indirect aliases this value could potentially
		/// have. Note that the resulting set will also contain the value provided as
		/// it is an alias of itself.
		ValueSetT resolve(Value value) const;

		/// Removes the given values from all alias sets.
		void remove(const SmallPtrSetImpl<Value> &aliasValues);

		private:
		/// This function constructs a mapping from values to its immediate aliases.
		void build(Operation *op);

		/// Maps values to all immediate aliases this value can have.
		ValueMapT aliases;
		};

		/// A simple analysis that detects allocation operations.
		class BufferPlacementAllocs {
		public:
		/// Represents a tuple of allocValue and deallocOperation.
		using AllocEntry = std::tuple<Value, Operation *>;

		/// Represents a list containing all alloc entries.
		using AllocEntryList = SmallVector<AllocEntry, 8>;

		/// Get the start operation to place the given alloc value withing the
		// specified placement block.
		static Operation *getStartOperation(Value allocValue, Block *placementBlock,
		const Liveness &liveness);

		/// Find an associated dealloc operation that is linked to the given
		/// allocation node (if any).
		static Operation *findDealloc(Value allocValue);

		public:
		/// Initializes the internal list by discovering all supported allocation
		/// nodes.
		BufferPlacementAllocs(Operation *op);

		/// Returns the begin iterator to iterate over all allocations.
		AllocEntryList::const_iterator begin() const { return allocs.begin(); }

		/// Returns the end iterator that can be used in combination with begin.
		AllocEntryList::const_iterator end() const { return allocs.end(); }

		/// Returns the begin iterator to iterate over all allocations.
		AllocEntryList::iterator begin() { return allocs.begin(); }

		/// Returns the end iterator that can be used in combination with begin.
		AllocEntryList::iterator end() { return allocs.end(); }

		/// Registers a new allocation entry.
		void registerAlloc(const AllocEntry &entry) { allocs.push_back(entry); }

		private:
		/// Searches for and registers all supported allocation entries.
		void build(Operation *op);

		private:
		/// Maps allocation nodes to their associated blocks.
		AllocEntryList allocs;
		};

		/// The base class for all BufferPlacement transformations.
		class BufferPlacementTransformationBase {
		public:
		using ValueSetT = BufferPlacementAliasAnalysis::ValueSetT;

		/// Finds a common dominator for the given value while taking the positions
		/// of the values in the value set into account. It supports dominator and
		/// post-dominator analyses via template arguments.
		template <typename DominatorT>
		static Block *findCommonDominator(Value value, const ValueSetT &values,
		const DominatorT &doms) {
		// Start with the current block the value is defined in.
		Block *dom = value.getParentBlock();
		// Iterate over all aliases and their uses to find a safe placement block
		// according to the given dominator information.
		for (Value childValue : values) {
		for (Operation *user : childValue.getUsers()) {
		// Move upwards in the dominator tree to find an appropriate
		// dominator block that takes the current use into account.
		dom = doms.findNearestCommonDominator(dom, user->getBlock());
		}
		// Take values without any users into account.
		dom = doms.findNearestCommonDominator(dom, childValue.getParentBlock());
		}
		return dom;
		}

		/// Returns true if the given operation represents a loop by testing whether
		/// it implements the `LoopLikeOpInterface` or the `RegionBranchOpInterface`.
		/// In the case of a `RegionBranchOpInterface`, it checks all region-based
		/// control-flow edges for cycles.
		static bool isLoop(Operation *op);

		/// Constructs a new operation base using the given root operation.
		BufferPlacementTransformationBase(Operation *op);

		protected:
		/// Alias information that can be updated during the insertion of copies.
		BufferPlacementAliasAnalysis aliases;

		/// Stores all internally managed allocations.
		BufferPlacementAllocs allocs;

		/// The underlying liveness analysis to compute fine grained information
		/// about alloc and dealloc positions.
		Liveness liveness;
		};

} // end namespace mlir		} // end namespace mlir

#endif // MLIR_TRANSFORMS_BUFFERIZE_H		#endif // MLIR_TRANSFORMS_BUFFERIZE_H
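
The folding pattern in `findCommonDominator` can be illustrated without MLIR. This is a minimal sketch under stated assumptions: blocks are plain indices, `DomTree` is a hypothetical stand-in for the dominance info that stores each block's immediate dominator, and the nearest common dominator is computed as the lowest common ancestor in that tree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// idom[b] is the immediate dominator of block b; the entry block is its own
// immediate dominator.
struct DomTree {
  std::vector<std::size_t> idom;

  // Number of steps from block b up to the entry block.
  std::size_t depth(std::size_t b) const {
    std::size_t d = 0;
    while (idom[b] != b) {
      b = idom[b];
      ++d;
    }
    return d;
  }

  // Nearest common dominator = lowest common ancestor in the dominator tree.
  std::size_t nearestCommonDominator(std::size_t a, std::size_t b) const {
    std::size_t da = depth(a), db = depth(b);
    while (da > db) { a = idom[a]; --da; }
    while (db > da) { b = idom[b]; --db; }
    while (a != b) { a = idom[a]; b = idom[b]; }
    return a;
  }
};

// Starts at the defining block and folds in the block of every user.
std::size_t findCommonDominator(const DomTree &doms, std::size_t defBlock,
                                const std::vector<std::size_t> &userBlocks) {
  assert(defBlock < doms.idom.size());
  std::size_t dom = defBlock;
  for (std::size_t user : userBlocks)
    dom = doms.nearestCommonDominator(dom, user);
  return dom;
}
```

For a diamond-shaped CFG in which the entry block immediately dominates all other blocks, a value defined in one branch and used in the join block yields the entry block as the common dominator.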

mlir/include/mlir/Transforms/Passes.h

	Show All 22 Lines
	namespace mlir {			namespace mlir {

	class AffineForOp;			class AffineForOp;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Passes			// Passes
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Creates an instance of the BufferPlacement pass.			/// Creates an instance of the BufferDeallocation pass to free all allocated
	std::unique_ptr<Pass> createBufferPlacementPass();			/// buffers.
				std::unique_ptr<Pass> createBufferDeallocationPass();

				/// Creates a pass that moves allocations upwards to reduce the number of
				/// required copies that are inserted during the BufferDeallocation pass.
				std::unique_ptr<Pass> createBufferHoistingPass();

				/// Creates a pass that moves allocations upwards out of loops. This avoids
				/// reallocations inside of loops.
				std::unique_ptr<Pass> createBufferLoopHoistingPass();

	/// Creates an instance of the Canonicalizer pass.			/// Creates an instance of the Canonicalizer pass.
	std::unique_ptr<Pass> createCanonicalizerPass();			std::unique_ptr<Pass> createCanonicalizerPass();

	/// Create a pass that removes unnecessary Copy operations.			/// Create a pass that removes unnecessary Copy operations.
	std::unique_ptr<Pass> createCopyRemovalPass();			std::unique_ptr<Pass> createCopyRemovalPass();

	/// Creates a pass to perform common sub expression elimination.			/// Creates a pass to perform common sub expression elimination.
	▲ Show 20 Lines • Show All 69 Lines • Show Last 20 Lines

mlir/include/mlir/Transforms/Passes.td

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	module {
return		return
}		}
}		}
```		```
}];		}];
let constructor = "mlir::createPipelineDataTransferPass()";		let constructor = "mlir::createPipelineDataTransferPass()";
}		}

def BufferPlacement : FunctionPass<"buffer-placement"> {		def BufferDeallocation : FunctionPass<"buffer-deallocation"> {
let summary = "Optimizes placement of alloc and dealloc operations";		let summary = "Adds all required dealloc operations for all allocations in the "
		"input program";
let description = [{		let description = [{
This pass implements an algorithm to optimize the placement of alloc and		This pass implements an algorithm to automatically introduce all required
dealloc operations. This pass also inserts missing dealloc operations		deallocation operations for all buffers in the input program. This ensures that
automatically to reclaim memory.		the resulting program does not have any memory leaks.


Input		Input

```mlir		```mlir
#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>
module {		module {
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {		linalg.generic {
		args_in = 1 : i64,
		args_out = 1 : i64,
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]} %arg1, %0 {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}: memref<2xf32>, memref<2xf32>		}: memref<2xf32>, memref<2xf32>
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
}		}

```		```

Output		Output

```mlir		```mlir
#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>
module {		module {
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1: // pred: ^bb0		^bb1: // pred: ^bb0
br ^bb3(%arg1 : memref<2xf32>)		%0 = alloc() : memref<2xf32>
		linalg.copy(%arg1, %0) : memref<2xf32>, memref<2xf32>
		br ^bb3(%0 : memref<2xf32>)
^bb2: // pred: ^bb0		^bb2: // pred: ^bb0
linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {		%1 = alloc() : memref<2xf32>
		linalg.generic {
		args_in = 1 : i64,
		args_out = 1 : i64,
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]} %arg1, %1 {
^bb0(%arg3: f32, %arg4: f32): // no predecessors		^bb0(%arg3: f32, %arg4: f32): // no predecessors
%2 = exp %arg3 : f32		%4 = exp %arg3 : f32
linalg.yield %2 : f32		linalg.yield %4 : f32
}: memref<2xf32>, memref<2xf32>		}: memref<2xf32>, memref<2xf32>
br ^bb3(%0 : memref<2xf32>)		%2 = alloc() : memref<2xf32>
^bb3(%1: memref<2xf32>): // 2 preds: ^bb1, ^bb2		linalg.copy(%1, %2) : memref<2xf32>, memref<2xf32>
linalg.copy(%1, %arg2) : memref<2xf32>, memref<2xf32>		dealloc %1 : memref<2xf32>
dealloc %0 : memref<2xf32>		br ^bb3(%2 : memref<2xf32>)
		^bb3(%3: memref<2xf32>): // 2 preds: ^bb1, ^bb2
		linalg.copy(%3, %arg2) : memref<2xf32>, memref<2xf32>
		dealloc %3 : memref<2xf32>
return		return
}		}

}		}
```		```

}];		}];
let constructor = "mlir::createBufferPlacementPass()";		let constructor = "mlir::createBufferDeallocationPass()";
// TODO: this pass likely shouldn't depend on Linalg?		// TODO: this pass likely shouldn't depend on Linalg?
let dependentDialects = ["linalg::LinalgDialect"];		let dependentDialects = ["linalg::LinalgDialect"];
}		}

		def BufferHoisting : FunctionPass<"buffer-hoisting"> {
		let summary = "Optimizes placement of allocation operations by moving them "
		"into common dominators and out of nested regions";
		let description = [{
		This pass implements an approach to aggressively move allocations upwards
		into common dominators and out of nested regions.
		}];
		let constructor = "mlir::createBufferHoistingPass()";
		}

		def BufferLoopHoisting : FunctionPass<"buffer-loop-hoisting"> {
		let summary = "Optimizes placement of allocation operations by moving them "
		"out of loop nests";
		let description = [{
		This pass implements an approach to aggressively move allocations upwards
		out of loop nests. It does not move allocations into common dominators.
		}];
		let constructor = "mlir::createBufferLoopHoistingPass()";
		}

def Canonicalizer : Pass<"canonicalize"> {		def Canonicalizer : Pass<"canonicalize"> {
let summary = "Canonicalize operations";		let summary = "Canonicalize operations";
let description = [{		let description = [{
This pass performs various types of canonicalizations over a set of		This pass performs various types of canonicalizations over a set of
operations. See [Operation Canonicalization](Canonicalization.md) for more		operations. See [Operation Canonicalization](Canonicalization.md) for more
details.		details.
}];		}];
let constructor = "mlir::createCanonicalizerPass()";		let constructor = "mlir::createCanonicalizerPass()";
▲ Show 20 Lines • Show All 349 Lines • Show Last 20 Lines

mlir/lib/Transforms/BufferDeallocation.cpp

This file was moved from mlir/lib/Transforms/BufferPlacement.cpp.

//===- BufferPlacement.cpp - the impl for buffer placement ---------------===//		//===- BufferDeallocation.cpp - the impl for buffer deallocation ----------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements logic for computing correct alloc and dealloc positions.		// This file implements logic for computing correct alloc and dealloc positions.
// Furthermore, buffer placement also adds required new alloc and copy		// Furthermore, buffer placement also adds required new alloc and copy
// operations to ensure that all buffers are deallocated.The main class is the		// operations to ensure that all buffers are deallocated. The main class is the
// BufferPlacementPass class that implements the underlying algorithm. In order		// BufferDeallocationPass class that implements the underlying algorithm. In
// to put allocations and deallocations at safe positions, it is significantly		// order to put allocations and deallocations at safe positions, it is
// important to put them into the correct blocks. However, the liveness analysis		// significantly important to put them into the correct blocks. However, the
// does not pay attention to aliases, which can occur due to branches (and their		// liveness analysis does not pay attention to aliases, which can occur due to
// associated block arguments) in general. For this purpose, BufferPlacement		// branches (and their associated block arguments) in general. For this purpose,
// firstly finds all possible aliases for a single value (using the		// BufferDeallocation firstly finds all possible aliases for a single value
// BufferPlacementAliasAnalysis class). Consider the following example:		// (using the BufferPlacementAliasAnalysis class). Consider the following
		// example:
//		//
// ^bb0(%arg0):		// ^bb0(%arg0):
// cond_br %cond, ^bb1, ^bb2		// cond_br %cond, ^bb1, ^bb2
// ^bb1:		// ^bb1:
// br ^exit(%arg0)		// br ^exit(%arg0)
// ^bb2:		// ^bb2:
// %new_value = ...		// %new_value = ...
// br ^exit(%new_value)		// br ^exit(%new_value)
// ^exit(%arg1):		// ^exit(%arg1):
// return %arg1;		// return %arg1;
//		//
// Using liveness information on its own would cause us to place the allocs and		// We should place the dealloc for %new_value in exit. However, we have to free
// deallocs in the wrong block. This is due to the fact that %new_value will not		// the buffer in the same block, because it cannot be freed in the post
// be liveOut of its block. Instead, we can place the alloc for %new_value		// dominator. However, this requires a new copy buffer for %arg1 that will
// in bb0 and its associated dealloc in exit. Alternatively, the alloc can stay
// (or even has to stay due to additional dependencies) at this location and we
// have to free the buffer in the same block, because it cannot be freed in the
// post dominator. However, this requires a new copy buffer for %arg1 that will
// contain the actual contents. Using the class BufferPlacementAliasAnalysis, we		// contain the actual contents. Using the class BufferPlacementAliasAnalysis, we
// will find out that %new_value has a potential alias %arg1. In order to find		// will find out that %new_value has a potential alias %arg1. In order to find
// the dealloc position we have to find all potential aliases, iterate over		// the dealloc position we have to find all potential aliases, iterate over
// their uses and find the common post-dominator block (note that additional		// their uses and find the common post-dominator block (note that additional
// copies and buffers remove potential aliases and will influence the placement		// copies and buffers remove potential aliases and will influence the placement
// of the deallocs). In all cases, the computed block can be safely used to free		// of the deallocs). In all cases, the computed block can be safely used to free
// the %new_value buffer (may be exit or bb2) as it will die and we can use		// the %new_value buffer (may be exit or bb2) as it will die and we can use
// liveness information to determine the exact operation after which we have to		// liveness information to determine the exact operation after which we have to
// insert the dealloc. Finding the alloc position is similar and non-obvious.		// insert the dealloc. However, the algorithm supports introducing copy buffers
// However, the algorithm supports moving allocs to other places and introducing		// and placing deallocs in safe locations to ensure that all buffers will be
// copy buffers and placing deallocs in safe places to ensure that all buffers		// freed in the end.
// will be freed in the end.
//		//
// TODO:		// TODO:
// The current implementation does not support explicit-control-flow loops and		// The current implementation does not support explicit-control-flow loops and
// the resulting code will be invalid with respect to program semantics.		// the resulting code will be invalid with respect to program semantics.
// However, structured control-flow loops are fully supported. Furthermore, it		// However, structured control-flow loops are fully supported. Furthermore, it
// doesn't accept functions which return buffers already.		// doesn't accept functions which return buffers already.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "PassDetail.h"		#include "PassDetail.h"
#include "mlir/Analysis/Liveness.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"		#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/Dominance.h"
#include "mlir/IR/Operation.h"		#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/ControlFlowInterfaces.h"		#include "mlir/Interfaces/ControlFlowInterfaces.h"
		#include "mlir/Interfaces/LoopLikeInterface.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
		#include "mlir/Transforms/Bufferize.h"
#include "mlir/Transforms/Passes.h"		#include "mlir/Transforms/Passes.h"
#include "llvm/ADT/SetOperations.h"		#include "llvm/ADT/SetOperations.h"

using namespace mlir;		using namespace mlir;

/// Walks over all immediate return-like terminators in the given region.		/// Walks over all immediate return-like terminators in the given region.
template <typename FuncT>		template <typename FuncT>
static void walkReturnOperations(Region *region, const FuncT &func) {		static void walkReturnOperations(Region *region, const FuncT &func) {
// ... 16 lines not shown ...	static void getSuccessorRegions(RegionBranchOpInterface regionInterface,
SmallVector<Attribute, 2> operandAttributes(		SmallVector<Attribute, 2> operandAttributes(
regionInterface.getOperation()->getNumOperands());		regionInterface.getOperation()->getNumOperands());

// Get all successor regions using the temporarily allocated		// Get all successor regions using the temporarily allocated
// `operandAttributes`.		// `operandAttributes`.
regionInterface.getSuccessorRegions(index, operandAttributes, successors);		regionInterface.getSuccessorRegions(index, operandAttributes, successors);
}		}

namespace {
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementAliasAnalysis		// BufferPlacementAliasAnalysis
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// A straight-forward alias analysis which ensures that all aliases of all
/// values will be determined. This is a requirement for the BufferPlacement
/// class since you need to determine safe positions to place alloc and
/// deallocs.
class BufferPlacementAliasAnalysis {
public:
using ValueSetT = SmallPtrSet<Value, 16>;
using ValueMapT = llvm::DenseMap<Value, ValueSetT>;

public:
/// Constructs a new alias analysis using the op provided.		/// Constructs a new alias analysis using the op provided.
BufferPlacementAliasAnalysis(Operation *op) { build(op); }		BufferPlacementAliasAnalysis::BufferPlacementAliasAnalysis(Operation *op) {
		build(op);
/// Find all immediate aliases this value could potentially have.
ValueMapT::const_iterator find(Value value) const {
return aliases.find(value);
}		}

/// Returns the end iterator that can be used in combination with find.
ValueMapT::const_iterator end() const { return aliases.end(); }

/// Find all immediate and indirect aliases this value could potentially		/// Find all immediate and indirect aliases this value could potentially
/// have. Note that the resulting set will also contain the value provided as		/// have. Note that the resulting set will also contain the value provided as
/// it is an alias of itself.		/// it is an alias of itself.
ValueSetT resolve(Value value) const {		BufferPlacementAliasAnalysis::ValueSetT
		BufferPlacementAliasAnalysis::resolve(Value value) const {
ValueSetT result;		ValueSetT result;
resolveRecursive(value, result);
return result;
}

/// Removes the given values from all alias sets.
void remove(const SmallPtrSetImpl<Value> &aliasValues) {
for (auto &entry : aliases)
llvm::set_subtract(entry.second, aliasValues);
}

private:
/// Recursively determines alias information for the given value. It stores		/// Recursively determines alias information for the given value. It stores
/// all newly found potential aliases in the given result set.		/// all newly found potential aliases in the given result set.
void resolveRecursive(Value value, ValueSetT &result) const {		std::function<void(Value)> resolveRecursive = [&](Value current) {
if (!result.insert(value).second)		if (!result.insert(current).second)
return;		return;
auto it = aliases.find(value);		auto it = aliases.find(current);
if (it == aliases.end())		if (it == aliases.end())
return;		return;
for (Value alias : it->second)		for (Value alias : it->second)
resolveRecursive(alias, result);		resolveRecursive(alias);
		};

		resolveRecursive(value);
		return result;
		}

		/// Removes the given values from all alias sets.
		void BufferPlacementAliasAnalysis::remove(
		const SmallPtrSetImpl<Value> &aliasValues) {
		for (auto &entry : aliases)
		llvm::set_subtract(entry.second, aliasValues);
}		}
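The transitive lookup performed by resolve() and the pruning done by remove() can be sketched outside of MLIR. This is only an illustration under stated assumptions: values are plain strings instead of mlir::Value, the names AliasMap, ValueSet and removeAliases are hypothetical, and "%view" is an invented extra alias that does not appear in the example at the top of the file.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for the MLIR types: a value is just its name.
using ValueSet = std::set<std::string>;
using AliasMap = std::map<std::string, ValueSet>;

// Inserts `value` and everything reachable through the immediate-alias map
// into `result`. The early return on a failed insert stops the recursion on
// values that were already visited, so cyclic alias chains terminate.
static void resolveRecursive(const AliasMap &aliases, const std::string &value,
                             ValueSet &result) {
  if (!result.insert(value).second)
    return;
  auto it = aliases.find(value);
  if (it == aliases.end())
    return;
  for (const std::string &alias : it->second)
    resolveRecursive(aliases, alias, result);
}

// Returns all immediate and indirect aliases, including `value` itself.
ValueSet resolve(const AliasMap &aliases, const std::string &value) {
  ValueSet result;
  resolveRecursive(aliases, value, result);
  return result;
}

// Removes the given values from every alias set (the keys stay in place).
void removeAliases(AliasMap &aliases, const ValueSet &toRemove) {
  for (auto &entry : aliases)
    for (const std::string &value : toRemove)
      entry.second.erase(value);
}
```

With the map {%arg0 -> {%arg1}, %new_value -> {%arg1}, %arg1 -> {%view}}, resolving %new_value yields the value itself plus %arg1 and %view. Once %arg1 has been removed from all sets (as introduceCopies does for values that receive their own copy buffer), the same query returns only %new_value.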

/// This function constructs a mapping from values to its immediate aliases.		/// This function constructs a mapping from values to its immediate aliases.
/// It iterates over all blocks, gets their predecessors, determines the		/// It iterates over all blocks, gets their predecessors, determines the
/// values that will be passed to the corresponding block arguments and		/// values that will be passed to the corresponding block arguments and
/// inserts them into the underlying map. Furthermore, it wires successor		/// inserts them into the underlying map. Furthermore, it wires successor
/// regions and branch-like return operations from nested regions.		/// regions and branch-like return operations from nested regions.
void build(Operation *op) {		void BufferPlacementAliasAnalysis::build(Operation *op) {
// Registers all aliases of the given values.		// Registers all aliases of the given values.
auto registerAliases = [&](auto values, auto aliases) {		auto registerAliases = [&](auto values, auto aliases) {
for (auto entry : llvm::zip(values, aliases))		for (auto entry : llvm::zip(values, aliases))
this->aliases[std::get<0>(entry)].insert(std::get<1>(entry));		this->aliases[std::get<0>(entry)].insert(std::get<1>(entry));
};		};

// Add additional aliases created by view changes to the alias list.		// Add additional aliases created by view changes to the alias list.
op->walk([&](ViewLikeOpInterface viewInterface) {		op->walk([&](ViewLikeOpInterface viewInterface) {
aliases[viewInterface.getViewSource()].insert(		aliases[viewInterface.getViewSource()].insert(
viewInterface.getOperation()->getResult(0));		viewInterface.getOperation()->getResult(0));
});		});

// Query all branch interfaces to link block argument aliases.		// Query all branch interfaces to link block argument aliases.
op->walk([&](BranchOpInterface branchInterface) {		op->walk([&](BranchOpInterface branchInterface) {
Block *parentBlock = branchInterface.getOperation()->getBlock();		Block *parentBlock = branchInterface.getOperation()->getBlock();
for (auto it = parentBlock->succ_begin(), e = parentBlock->succ_end();		for (auto it = parentBlock->succ_begin(), e = parentBlock->succ_end();
it != e; ++it) {		it != e; ++it) {
// Query the branch op interface to get the successor operands.		// Query the branch op interface to get the successor operands.
auto successorOperands =		auto successorOperands =
branchInterface.getSuccessorOperands(it.getIndex());		branchInterface.getSuccessorOperands(it.getIndex());
if (!successorOperands.hasValue())		if (!successorOperands.hasValue())
continue;		continue;
// Build the actual mapping of values to their immediate aliases.		// Build the actual mapping of values to their immediate aliases.
registerAliases(successorOperands.getValue(), (*it)->getArguments());		registerAliases(successorOperands.getValue(), (*it)->getArguments());
}		}
});		});

// Query the RegionBranchOpInterface to find potential successor regions.		// Query the RegionBranchOpInterface to find potential successor regions.
op->walk([&](RegionBranchOpInterface regionInterface) {		op->walk([&](RegionBranchOpInterface regionInterface) {
// Extract all entry regions and wire all initial entry successor inputs.		// Extract all entry regions and wire all initial entry successor inputs.
SmallVector<RegionSuccessor, 2> entrySuccessors;		SmallVector<RegionSuccessor, 2> entrySuccessors;
getSuccessorRegions(regionInterface, /*index=*/llvm::None,		getSuccessorRegions(regionInterface, /*index=*/llvm::None, entrySuccessors);
entrySuccessors);
for (RegionSuccessor &entrySuccessor : entrySuccessors) {		for (RegionSuccessor &entrySuccessor : entrySuccessors) {
// Wire the entry region's successor arguments with the initial		// Wire the entry region's successor arguments with the initial
// successor inputs.		// successor inputs.
assert(entrySuccessor.getSuccessor() &&		assert(entrySuccessor.getSuccessor() &&
"Invalid entry region without an attached successor region");		"Invalid entry region without an attached successor region");
registerAliases(regionInterface.getSuccessorEntryOperands(		registerAliases(regionInterface.getSuccessorEntryOperands(
entrySuccessor.getSuccessor()->getRegionNumber()),		entrySuccessor.getSuccessor()->getRegionNumber()),
entrySuccessor.getSuccessorInputs());		entrySuccessor.getSuccessorInputs());
}		}

// Wire flow between regions and from region exits.		// Wire flow between regions and from region exits.
for (Region &region : regionInterface.getOperation()->getRegions()) {		for (Region &region : regionInterface.getOperation()->getRegions()) {
// Iterate over all successor region entries that are reachable from the		// Iterate over all successor region entries that are reachable from the
// current region.		// current region.
SmallVector<RegionSuccessor, 2> successorRegions;		SmallVector<RegionSuccessor, 2> successorRegions;
getSuccessorRegions(regionInterface, region.getRegionNumber(),		getSuccessorRegions(regionInterface, region.getRegionNumber(),
successorRegions);		successorRegions);
for (RegionSuccessor &successorRegion : successorRegions) {		for (RegionSuccessor &successorRegion : successorRegions) {
// Iterate over all immediate terminator operations and wire the		// Iterate over all immediate terminator operations and wire the
// successor inputs with the operands of each terminator.		// successor inputs with the operands of each terminator.
walkReturnOperations(&region, [&](Operation *terminator) {		walkReturnOperations(&region, [&](Operation *terminator) {
registerAliases(terminator->getOperands(),		registerAliases(terminator->getOperands(),
successorRegion.getSuccessorInputs());		successorRegion.getSuccessorInputs());
});		});
}		}
}		}
});		});
}		}

/// Maps values to all immediate aliases this value can have.		//===----------------------------------------------------------------------===//
ValueMapT aliases;		// BufferPlacementAllocs
		//===----------------------------------------------------------------------===//

		/// Get the start operation to place the given alloc value within the
		/// specified placement block.
		Operation *BufferPlacementAllocs::getStartOperation(Value allocValue,
		Block *placementBlock,
		const Liveness &liveness) {
		// We have to ensure that we place the alloc before its first use in this
		// block.
		const LivenessBlockInfo &livenessInfo = *liveness.getLiveness(placementBlock);
		Operation *startOperation = livenessInfo.getStartOperation(allocValue);
		// Check whether the start operation lies in the desired placement block.
		// If not, we will use the terminator as this is the last operation in
		// this block.
		if (startOperation->getBlock() != placementBlock) {
		Operation *opInPlacementBlock =
		placementBlock->findAncestorOpInBlock(*startOperation);
		startOperation = opInPlacementBlock ? opInPlacementBlock
		: placementBlock->getTerminator();
		}

		return startOperation;
		}

		/// Finds associated deallocs that can be linked to our allocation nodes (if
		/// any).
		Operation *BufferPlacementAllocs::findDealloc(Value allocValue) {
		auto userIt = llvm::find_if(allocValue.getUsers(), [&](Operation *user) {
		auto effectInterface = dyn_cast<MemoryEffectOpInterface>(user);
		if (!effectInterface)
		return false;
		// Try to find a free effect that is applied to one of our values
		// that will be automatically freed by our pass.
		SmallVector<MemoryEffects::EffectInstance, 2> effects;
		effectInterface.getEffectsOnValue(allocValue, effects);
		return llvm::any_of(effects, [&](MemoryEffects::EffectInstance &it) {
		return isa<MemoryEffects::Free>(it.getEffect());
		});
		});
		// Assign the associated dealloc operation (if any).
		return userIt != allocValue.user_end() ? *userIt : nullptr;
		}

		/// Initializes the internal list by discovering all supported allocation
		/// nodes.
		BufferPlacementAllocs::BufferPlacementAllocs(Operation *op) { build(op); }

		/// Searches for and registers all supported allocation entries.
		void BufferPlacementAllocs::build(Operation *op) {
		herhut: Why do we need to store the `placementBlock` here? Does this ever need to be updated? As far as I can see, it is always the block where the corresponding alloc was initially. Would `allocValue->getDefiningOp()->getBlock()` not always do the same?
		op->walk([&](MemoryEffectOpInterface opInterface) {
		// Try to find a single allocation result.
		SmallVector<MemoryEffects::EffectInstance, 2> effects;
		opInterface.getEffects(effects);

		SmallVector<MemoryEffects::EffectInstance, 2> allocateResultEffects;
		llvm::copy_if(
		effects, std::back_inserter(allocateResultEffects),
		[=](MemoryEffects::EffectInstance &it) {
		Value value = it.getValue();
		return isa<MemoryEffects::Allocate>(it.getEffect()) && value &&
		value.isa<OpResult>() &&
		it.getResource() !=
		SideEffects::AutomaticAllocationScopeResource::get();
		herhut: This logic could be moved to the use site or is this reused by anything other than the 'BufferPlacementHoisting' pass?
		dfki-mako (author): This logic will be used by other passes in the near future - in one of the upcoming CLs :)
		});
		// If there is one result only, we will be able to move the allocation and
		// (possibly existing) deallocation ops.
		if (allocateResultEffects.size() != 1)
		pifon2a: `const LivenessBlockInfo& livenessInfo = *liveness.getLiveness(placementBlock);`
		return;
		// Get allocation result.
		Value allocValue = allocateResultEffects[0].getValue();
		// Find the associated dealloc value and register the allocation entry.
		allocs.push_back(std::make_tuple(allocValue, findDealloc(allocValue)));
		});
		}

		//===----------------------------------------------------------------------===//
		// BufferPlacementTransformationBase
		//===----------------------------------------------------------------------===//

		/// Constructs a new transformation base using the given root operation.
		BufferPlacementTransformationBase::BufferPlacementTransformationBase(
		Operation *op)
		: aliases(op), allocs(op), liveness(op) {}

		/// Returns true if the given operation represents a loop by testing whether it
		/// implements the `LoopLikeOpInterface` or the `RegionBranchOpInterface`. In
		/// the case of a `RegionBranchOpInterface`, it checks all region-based control-
		/// flow edges for cycles.
		bool BufferPlacementTransformationBase::isLoop(Operation *op) {
		// If the operation implements the `LoopLikeOpInterface` it can be considered
		// a loop.
		if (isa<LoopLikeOpInterface>(op))
		return true;

		// If the operation does not implement the `RegionBranchOpInterface`, it is
		// (currently) not possible to detect a loop.
		RegionBranchOpInterface regionInterface;
		if (!(regionInterface = dyn_cast<RegionBranchOpInterface>(op)))
		return false;

		// Recurses into a region using the current region interface to find potential
		// cycles.
		SmallPtrSet<Region *, 4> visitedRegions;
		std::function<bool(Region *)> recurse = [&](Region *current) {
		if (!current)
		return false;
		// If we have found a back edge, the parent operation induces a loop.
		if (!visitedRegions.insert(current).second)
		return true;
		// Recurses into all region successors.
		SmallVector<RegionSuccessor, 2> successors;
		getSuccessorRegions(regionInterface, current->getRegionNumber(),
		successors);
		for (RegionSuccessor &regionEntry : successors)
		if (recurse(regionEntry.getSuccessor()))
		return true;
		return false;
};		};

		// Start with all entry regions and test whether they induce a loop.
		SmallVector<RegionSuccessor, 2> successorRegions;
		getSuccessorRegions(regionInterface, /*index=*/llvm::None, successorRegions);
		for (RegionSuccessor &regionEntry : successorRegions) {
		if (recurse(regionEntry.getSuccessor()))
		return true;
		visitedRegions.clear();
		}

		return false;
		}
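The region-based part of isLoop() can be illustrated with a small standalone sketch. It assumes a hypothetical RegionGraph in which regions are integers and successors[i] lists the regions reachable from region i; the name inducesLoop is likewise made up for this illustration. Like the code above, it reports a loop as soon as a region is reached a second time during one traversal, and it resets the visited set between entry regions.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Hypothetical region graph: successors[i] holds the regions that control
// flow can branch to from region i. Exits to the parent op are not modeled.
using RegionGraph = std::vector<std::vector<int>>;

// Returns true if a traversal starting at any entry region reaches a region
// that was already visited during the same traversal.
bool inducesLoop(const RegionGraph &successors,
                 const std::vector<int> &entryRegions) {
  std::set<int> visited;
  std::function<bool(int)> recurse = [&](int current) {
    // A failed insert means this region was seen before: treat it as a loop.
    if (!visited.insert(current).second)
      return true;
    for (int successor : successors[current])
      if (recurse(successor))
        return true;
    return false;
  };

  for (int entry : entryRegions) {
    if (recurse(entry))
      return true;
    // Start the next entry region with a fresh visited set.
    visited.clear();
  }
  return false;
}
```

For an if-like op with two regions that both leave the op, the graph {{}, {}} with entries {0, 1} yields false. For a while-like op whose two regions branch to each other, {{1}, {0}} with entry {0} yields true. Because the visited set is never shrunk within one traversal, two different paths converging on the same region would also be reported as a loop in this sketch.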

		namespace {

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Backedges		// Backedges analysis
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// A straight-forward program analysis which detects loop backedges induced by		/// A straight-forward program analysis which detects loop backedges induced by
/// explicit control flow.		/// explicit control flow.
class Backedges {		class Backedges {
public:		public:
using BlockSetT = SmallPtrSet<Block *, 16>;		using BlockSetT = SmallPtrSet<Block *, 16>;
using BackedgeSetT = llvm::DenseSet<std::pair<Block *, Block *>>;		using BackedgeSetT = llvm::DenseSet<std::pair<Block *, Block *>>;
// ... 26 lines not shown ...	private:
void exit(Block &current) { visited.erase(&current); }		void exit(Block &current) { visited.erase(&current); }

/// Recurses into the given operation while taking all attached regions into		/// Recurses into the given operation while taking all attached regions into
/// account.		/// account.
void recurse(Operation *op, Block *predecessor) {		void recurse(Operation *op, Block *predecessor) {
Block *current = op->getBlock();		Block *current = op->getBlock();
// If the current op implements the `BranchOpInterface`, there can be		// If the current op implements the `BranchOpInterface`, there can be
// cycles in the scope of all successor blocks.		// cycles in the scope of all successor blocks.
if (isa<BranchOpInterface>(op)) {		if (isa<BranchOpInterface>(op)) {
		herhut: nit: This no longer finds the initial allocation block.
for (Block *succ : current->getSuccessors())		for (Block *succ : current->getSuccessors())
recurse(*succ, current);		recurse(*succ, current);
}		}
// Recurse into all distinct regions and check for explicit control-flow		// Recurse into all distinct regions and check for explicit control-flow
// loops.		// loops.
for (Region &region : op->getRegions())		for (Region &region : op->getRegions())
recurse(region.front(), current);		recurse(region.front(), current);
}		}

/// Recurses into explicit control-flow structures that are given by		/// Recurses into explicit control-flow structures that are given by
/// the successor relation defined on the block level.		/// the successor relation defined on the block level.
void recurse(Block &block, Block *predecessor) {		void recurse(Block &block, Block *predecessor) {
// Try to enter the current block. If this is not possible, we are		// Try to enter the current block. If this is not possible, we are
// currently processing this block and can safely return here.		// currently processing this block and can safely return here.
if (!enter(block, predecessor))		if (!enter(block, predecessor))
return;		return;

// Recurse into all operations and successor blocks.		// Recurse into all operations and successor blocks.
for (auto &op : block.getOperations())		for (Operation &op : block.getOperations())
recurse(&op, predecessor);		recurse(&op, predecessor);

// Leave the current block.		// Leave the current block.
exit(block);		exit(block);
}		}

/// Stores all blocks that are currently visited and on the processing stack.		/// Stores all blocks that are currently visited and on the processing stack.
BlockSetT visited;		BlockSetT visited;

/// Stores all backedges in the format (source, target).		/// Stores all backedges in the format (source, target).
BackedgeSetT edgeSet;		BackedgeSetT edgeSet;
};		};
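The enter/exit bookkeeping of the Backedges class can be sketched on a plain block graph. The part of the class that records an edge is hidden in this view, so the sketch assumes that an edge (predecessor, block) is stored whenever `block` is already on the processing stack, which is what the member comments suggest. Blocks are integers here, and Cfg, BackedgeFinder and findBackedges are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical control-flow graph: cfg[b] lists the successor blocks of b.
using Cfg = std::vector<std::vector<int>>;
using Edge = std::pair<int, int>; // (source, target)

struct BackedgeFinder {
  explicit BackedgeFinder(const Cfg &graph) : cfg(graph) {}

  // Depth-first walk. A block that is already on the processing stack is not
  // entered again; instead the edge leading to it is recorded as a backedge.
  void recurse(int block, int predecessor) {
    if (!onStack.insert(block).second) {
      edges.insert(Edge(predecessor, block));
      return;
    }
    for (int successor : cfg[block])
      recurse(successor, block);
    // Leave the block so that other paths reaching it are not misclassified.
    onStack.erase(block);
  }

  const Cfg &cfg;
  std::set<int> onStack; // blocks currently on the processing stack
  std::set<Edge> edges;  // all backedges found so far
};

// Convenience wrapper; -1 marks the missing predecessor of the entry block.
std::set<Edge> findBackedges(const Cfg &cfg, int entry) {
  BackedgeFinder finder(cfg);
  finder.recurse(entry, -1);
  return finder.edges;
}
```

For the graph 0 -> 1 -> 2 with 2 -> 1 and 2 -> 3, the only backedge is (2, 1). A diamond (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) produces none, because each block is removed from the stack again when the walk leaves it.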

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacement		// BufferDeallocation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// The main buffer placement analysis used to place allocs, copies and deallocs.		/// The buffer deallocation transformation which ensures that all allocs in the
		herhut: Maybe clarify this a bit. It ensures that all allocs in the program have a corresponding de-allocation. As a side-effect, it might also introduce copies (which lead to allocs again).
class BufferPlacement {		/// program have a corresponding de-allocation. As a side-effect, it might also
		/// introduce copies that in turn lead to additional allocs and de-allocations.
		class BufferDeallocation : BufferPlacementTransformationBase {
public:		public:
using ValueSetT = BufferPlacementAliasAnalysis::ValueSetT;		BufferDeallocation(Operation *op)
		: BufferPlacementTransformationBase(op), dominators(op),
/// An intermediate representation of a single allocation node.		postDominators(op) {}
struct AllocEntry {
/// A reference to the associated allocation node.		/// Performs the actual placement/creation of all temporary alloc, copy and
Value allocValue;		/// dealloc nodes.
		void deallocate() {
/// The associated placement block in which the allocation should be
/// performed.
Block *placementBlock;

/// The associated dealloc operation (if any).
Operation *deallocOperation;
};

using AllocEntryList = SmallVector<AllocEntry, 8>;

public:
BufferPlacement(Operation *op)
: operation(op), aliases(op), liveness(op), dominators(op),
postDominators(op) {
// Gather all allocation nodes
initBlockMapping();
}

/// Performs the actual placement/creation of all alloc, copy and dealloc
/// nodes.
void place() {
// Place all allocations.
placeAllocs();
// Add additional allocations and copies that are required.		// Add additional allocations and copies that are required.
introduceCopies();		introduceCopies();
// Find all associated dealloc nodes.
findDeallocs();
// Place deallocations for all allocation entries.		// Place deallocations for all allocation entries.
placeDeallocs();		placeDeallocs();
}		}

private:		private:
/// Initializes the internal block mapping by discovering allocation nodes. It
/// maps all allocation nodes to their initial block in which they can be
/// safely allocated.
void initBlockMapping() {
operation->walk([&](MemoryEffectOpInterface opInterface) {
// Try to find a single allocation result.
SmallVector<MemoryEffects::EffectInstance, 2> effects;
opInterface.getEffects(effects);

SmallVector<MemoryEffects::EffectInstance, 2> allocateResultEffects;
llvm::copy_if(
effects, std::back_inserter(allocateResultEffects),
[=](MemoryEffects::EffectInstance &it) {
Value value = it.getValue();
return isa<MemoryEffects::Allocate>(it.getEffect()) && value &&
value.isa<OpResult>() &&
it.getResource() !=
SideEffects::AutomaticAllocationScopeResource::get();
});
// If there is one result only, we will be able to move the allocation and
// (possibly existing) deallocation ops.
if (allocateResultEffects.size() != 1)
return;
// Get allocation result.
auto allocResult = allocateResultEffects[0].getValue().cast<OpResult>();
// Find the initial allocation block and register this result.
allocs.push_back(
{allocResult, getInitialAllocBlock(allocResult), nullptr});
});
}

/// Computes a valid allocation position in a dominator (if possible) for the
/// given allocation result.
Block *getInitialAllocBlock(OpResult result) {
// Get all allocation operands as these operands are important for the
// allocation operation.
Operation *owner = result.getOwner();
auto operands = owner->getOperands();
Block *dominator;
if (operands.size() < 1)
dominator =
findCommonDominator(result, aliases.resolve(result), dominators);
else {
// If this node has dependencies, check all dependent nodes with respect
// to a common post dominator in which all values are available.
ValueSetT dependencies(++operands.begin(), operands.end());
dominator =
findCommonDominator(*operands.begin(), dependencies, postDominators);
}

// Do not move allocs out of their parent regions to keep them local.
if (dominator->getParent() != owner->getParentRegion())
return &owner->getParentRegion()->front();
return dominator;
}

/// Finds correct alloc positions according to the algorithm described at
/// the top of the file for all alloc nodes that can be handled by this
/// analysis.
void placeAllocs() const {
for (const AllocEntry &entry : allocs) {
Value alloc = entry.allocValue;
// Get the actual block to place the alloc and get liveness information
// for the placement block.
Block *placementBlock = entry.placementBlock;
// We have to ensure that we place the alloc before its first use in this
// block.
const LivenessBlockInfo *livenessInfo =
liveness.getLiveness(placementBlock);
Operation *startOperation = livenessInfo->getStartOperation(alloc);
// Check whether the start operation lies in the desired placement block.
// If not, we will use the terminator as this is the last operation in
// this block.
if (startOperation->getBlock() != placementBlock)
startOperation = placementBlock->getTerminator();

// Move the alloc in front of the start operation.
Operation *allocOperation = alloc.getDefiningOp();
allocOperation->moveBefore(startOperation);
}
}

/// Introduces required allocs and copy operations to avoid memory leaks.		/// Introduces required allocs and copy operations to avoid memory leaks.
void introduceCopies() {		void introduceCopies() {
// Initialize the set of values that require a dedicated memory free		// Initialize the set of values that require a dedicated memory free
// operation since their operands cannot be safely deallocated in a post		// operation since their operands cannot be safely deallocated in a post
// dominator.		// dominator.
SmallPtrSet<Value, 8> valuesToFree;		SmallPtrSet<Value, 8> valuesToFree;
llvm::SmallDenseSet<std::tuple<Value, Block *>> visitedValues;		llvm::SmallDenseSet<std::tuple<Value, Block *>> visitedValues;
SmallVector<std::tuple<Value, Block *>, 8> toProcess;		SmallVector<std::tuple<Value, Block *>, 8> toProcess;
// ... 20 lines not shown ...	auto findUnsafeValues = [&](Value source, Block *definingBlock) {
valuesToFree.insert(value);		valuesToFree.insert(value);
} else if (visitedValues.insert(std::make_tuple(value, definingBlock))		} else if (visitedValues.insert(std::make_tuple(value, definingBlock))
.second)		.second)
toProcess.emplace_back(value, definingBlock);		toProcess.emplace_back(value, definingBlock);
}		}
};		};

// Detect possibly unsafe aliases starting from all allocations.		// Detect possibly unsafe aliases starting from all allocations.
for (auto &entry : allocs)		for (BufferPlacementAllocs::AllocEntry &entry : allocs) {
findUnsafeValues(entry.allocValue, entry.placementBlock);		Value allocValue = std::get<0>(entry);
		findUnsafeValues(allocValue, allocValue.getDefiningOp()->getBlock());
		}
// Try to find block arguments that require an explicit free operation		// Try to find block arguments that require an explicit free operation
// until we reach a fix point.		// until we reach a fix point.
while (!toProcess.empty()) {		while (!toProcess.empty()) {
auto current = toProcess.pop_back_val();		auto current = toProcess.pop_back_val();
findUnsafeValues(std::get<0>(current), std::get<1>(current));		findUnsafeValues(std::get<0>(current), std::get<1>(current));
}		}

// Update buffer aliases to ensure that we free all buffers and block		// Update buffer aliases to ensure that we free all buffers and block
// arguments at the correct locations.		// arguments at the correct locations.
aliases.remove(valuesToFree);		aliases.remove(valuesToFree);

// Add new allocs and additional copy operations.		// Add new allocs and additional copy operations.
for (Value value : valuesToFree) {		for (Value value : valuesToFree) {
if (auto blockArg = value.dyn_cast<BlockArgument>())		if (auto blockArg = value.dyn_cast<BlockArgument>())
introduceBlockArgCopy(blockArg);		introduceBlockArgCopy(blockArg);
else		else
introduceValueCopyForRegionResult(value);		introduceValueCopyForRegionResult(value);

// Register the value to require a final dealloc. Note that we do not have		// Register the value to require a final dealloc. Note that we do not have
// to assign a block here since we do not want to move the allocation node		// to assign a block here since we do not want to move the allocation node
// to another location.		// to another location.
allocs.push_back({value, nullptr, nullptr});		allocs.registerAlloc(std::make_tuple(value, nullptr));
}		}
}		}

/// Introduces temporary allocs in all predecessors and copies the source		/// Introduces temporary allocs in all predecessors and copies the source
/// values into the newly allocated buffers.		/// values into the newly allocated buffers.
void introduceBlockArgCopy(BlockArgument blockArg) {		void introduceBlockArgCopy(BlockArgument blockArg) {
// Allocate a buffer for the current block argument in the block of		// Allocate a buffer for the current block argument in the block of
// the associated value (which will be a predecessor block by		// the associated value (which will be a predecessor block by
Show All 39 Lines	introduceCopiesForRegionSuccessors(
// Find a predecessor of our argRegion.		// Find a predecessor of our argRegion.
return successorRegion.getSuccessor() == argRegion;		return successorRegion.getSuccessor() == argRegion;
});		});

// Check whether the block argument belongs to an entry region of the		// Check whether the block argument belongs to an entry region of the
// parent operation. In this case, we have to introduce an additional copy		// parent operation. In this case, we have to introduce an additional copy
// for buffer that is passed to the argument.		// for buffer that is passed to the argument.
SmallVector<RegionSuccessor, 2> successorRegions;		SmallVector<RegionSuccessor, 2> successorRegions;
getSuccessorRegions(regionInterface, llvm::None, successorRegions);		getSuccessorRegions(regionInterface, /*index=*/llvm::None,
		successorRegions);
auto *it =		auto *it =
llvm::find_if(successorRegions, [&](RegionSuccessor &successorRegion) {		llvm::find_if(successorRegions, [&](RegionSuccessor &successorRegion) {
return successorRegion.getSuccessor() == argRegion;		return successorRegion.getSuccessor() == argRegion;
});		});
if (it == successorRegions.end())		if (it == successorRegions.end())
return;		return;

// Determine the actual operand to introduce a copy for and rewire the		// Determine the actual operand to introduce a copy for and rewire the
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	Value introduceBufferCopy(Value sourceValue, Operation *terminator) {
// allocation to the new one.		// allocation to the new one.
builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue, alloc);		builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue, alloc);

// Remember the copy of original source value.		// Remember the copy of original source value.
copiedValues.insert(alloc);		copiedValues.insert(alloc);
return alloc;		return alloc;
}		}

/// Finds associated deallocs that can be linked to our allocation nodes (if
/// any).
void findDeallocs() {
for (auto &entry : allocs) {
auto userIt =
llvm::find_if(entry.allocValue.getUsers(), [&](Operation *user) {
auto effectInterface = dyn_cast<MemoryEffectOpInterface>(user);
if (!effectInterface)
return false;
// Try to find a free effect that is applied to one of our values
// that will be automatically freed by our pass.
SmallVector<MemoryEffects::EffectInstance, 2> effects;
effectInterface.getEffectsOnValue(entry.allocValue, effects);
return llvm::any_of(
effects, [&](MemoryEffects::EffectInstance &it) {
return isa<MemoryEffects::Free>(it.getEffect());
});
});
// Assign the associated dealloc operation (if any).
if (userIt != entry.allocValue.user_end())
entry.deallocOperation = *userIt;
}
}

/// Finds correct dealloc positions according to the algorithm described at		/// Finds correct dealloc positions according to the algorithm described at
/// the top of the file for all alloc nodes and block arguments that can be		/// the top of the file for all alloc nodes and block arguments that can be
/// handled by this analysis.		/// handled by this analysis.
void placeDeallocs() const {		void placeDeallocs() const {
// Move or insert deallocs using the previously computed information.		// Move or insert deallocs using the previously computed information.
// These deallocations will be linked to their associated allocation nodes		// These deallocations will be linked to their associated allocation nodes
// since they don't have any aliases that can (potentially) increase their		// since they don't have any aliases that can (potentially) increase their
// liveness.		// liveness.
for (const AllocEntry &entry : allocs) {		for (const BufferPlacementAllocs::AllocEntry &entry : allocs) {
Value alloc = entry.allocValue;		Value alloc = std::get<0>(entry);
auto aliasesSet = aliases.resolve(alloc);		auto aliasesSet = aliases.resolve(alloc);
assert(aliasesSet.size() > 0 && "must contain at least one alias");		assert(aliasesSet.size() > 0 && "must contain at least one alias");

// Determine the actual block to place the dealloc and get liveness		// Determine the actual block to place the dealloc and get liveness
// information.		// information.
Block *placementBlock =		Block *placementBlock =
findCommonDominator(alloc, aliasesSet, postDominators);		findCommonDominator(alloc, aliasesSet, postDominators);
const LivenessBlockInfo *livenessInfo =		const LivenessBlockInfo *livenessInfo =
liveness.getLiveness(placementBlock);		liveness.getLiveness(placementBlock);

// We have to ensure that the dealloc will be after the last use of all		// We have to ensure that the dealloc will be after the last use of all
// aliases of the given value. We first assume that there are no uses in		// aliases of the given value. We first assume that there are no uses in
// the placementBlock and that we can safely place the dealloc at the		// the placementBlock and that we can safely place the dealloc at the
// beginning.		// beginning.
Operation *endOperation = &placementBlock->front();		Operation *endOperation = &placementBlock->front();

// Iterate over all aliases and ensure that the endOperation will point		// Iterate over all aliases and ensure that the endOperation will point
// to the last operation of all potential aliases in the placementBlock.		// to the last operation of all potential aliases in the placementBlock.
for (Value alias : aliasesSet) {		for (Value alias : aliasesSet) {
		// Ensure that the start operation is at least the defining operation of
		// the current alias to avoid invalid placement of deallocs for aliases
		// without any uses.
		Operation *beforeOp = endOperation;
		if (alias.getDefiningOp() &&
		!(beforeOp = placementBlock->findAncestorOpInBlock(
		*alias.getDefiningOp())))
		continue;

Operation *aliasEndOperation =		Operation *aliasEndOperation =
livenessInfo->getEndOperation(alias, endOperation);		livenessInfo->getEndOperation(alias, beforeOp);
// Check whether the aliasEndOperation lies in the desired block and		// Check whether the aliasEndOperation lies in the desired block and
// whether it is behind the current endOperation. If yes, this will be		// whether it is behind the current endOperation. If yes, this will be
// the new endOperation.		// the new endOperation.
if (aliasEndOperation->getBlock() == placementBlock &&		if (aliasEndOperation->getBlock() == placementBlock &&
endOperation->isBeforeInBlock(aliasEndOperation))		endOperation->isBeforeInBlock(aliasEndOperation))
endOperation = aliasEndOperation;		endOperation = aliasEndOperation;
}		}
// endOperation is the last operation behind which we can safely store		// endOperation is the last operation behind which we can safely store
// the dealloc taking all potential aliases into account.		// the dealloc taking all potential aliases into account.

// If there is an existing dealloc, move it to the right place.		// If there is an existing dealloc, move it to the right place.
if (entry.deallocOperation) {		Operation *deallocOperation = std::get<1>(entry);
entry.deallocOperation->moveAfter(endOperation);		if (deallocOperation) {
		deallocOperation->moveAfter(endOperation);
} else {		} else {
// If the Dealloc position is at the terminator operation of the		// If the Dealloc position is at the terminator operation of the
// block, then the value should escape from a deallocation.		// block, then the value should escape from a deallocation.
Operation *nextOp = endOperation->getNextNode();		Operation *nextOp = endOperation->getNextNode();
if (!nextOp)		if (!nextOp)
continue;		continue;
// If there is no dealloc node, insert one in the right place.		// If there is no dealloc node, insert one in the right place.
OpBuilder builder(nextOp);		OpBuilder builder(nextOp);
builder.create<DeallocOp>(alloc.getLoc(), alloc);		builder.create<DeallocOp>(alloc.getLoc(), alloc);
}		}
}		}
}		}

/// Finds a common dominator for the given value while taking the positions		/// The dominator info to find the appropriate start operation to move the
/// of the values in the value set into account. It supports dominator and		/// allocs.
/// post-dominator analyses via template arguments.
template <typename DominatorT>
Block *findCommonDominator(Value value, const ValueSetT &values,
const DominatorT &doms) const {
// Start with the current block the value is defined in.
Block *dom = value.getParentBlock();
// Iterate over all aliases and their uses to find a safe placement block
// according to the given dominator information.
for (Value childValue : values)
for (Operation *user : childValue.getUsers()) {
// Move upwards in the dominator tree to find an appropriate
// dominator block that takes the current use into account.
dom = doms.findNearestCommonDominator(dom, user->getBlock());
}
return dom;
}

/// The operation this transformation was constructed from.
Operation *operation;

/// Alias information that can be updated during the insertion of copies.
BufferPlacementAliasAnalysis aliases;

/// Maps allocation nodes to their associated blocks.
AllocEntryList allocs;

// Stores already copied allocations to avoid additional copies of copies.
ValueSetT copiedValues;

/// The underlying liveness analysis to compute fine grained information
/// about alloc and dealloc positions.
Liveness liveness;

/// The dominator analysis to place deallocs in the appropriate blocks.
DominanceInfo dominators;		DominanceInfo dominators;

/// The post dominator analysis to place deallocs in the appropriate blocks.		/// The post dominator info to move the dependent allocs in the right
		/// position.
PostDominanceInfo postDominators;		PostDominanceInfo postDominators;

		/// Stores already copied allocations to avoid additional copies of copies.
		ValueSetT copiedValues;
};		};

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementPass		// BufferDeallocationPass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// The actual buffer placement pass that moves alloc and dealloc nodes into		/// The actual buffer deallocation pass that inserts and moves dealloc nodes
/// the right positions. It uses the algorithm described at the top of the		/// into the right positions. Furthermore, it inserts additional allocs and
/// file.		/// copies if necessary. It uses the algorithm described at the top of the file.
struct BufferPlacementPass : BufferPlacementBase<BufferPlacementPass> {		struct BufferDeallocationPass : BufferDeallocationBase<BufferDeallocationPass> {

void runOnFunction() override {		void runOnFunction() override {
// Ensure that there are supported loops only.		// Ensure that there are supported loops only.
Backedges backedges(getFunction());		Backedges backedges(getFunction());
if (backedges.size()) {		if (backedges.size()) {
getFunction().emitError(		getFunction().emitError(
"Structured control-flow loops are supported only.");		"Structured control-flow loops are supported only.");
return;		return;
}		}

// Place all required alloc, copy and dealloc nodes.		// Place all required temporary alloc, copy and dealloc nodes.
BufferPlacement placement(getFunction());		BufferDeallocation deallocation(getFunction());
placement.place();		deallocation.deallocate();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementPass construction		// BufferDeallocationPass construction
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

std::unique_ptr<Pass> mlir::createBufferPlacementPass() {		std::unique_ptr<Pass> mlir::createBufferDeallocationPass() {
return std::make_unique<BufferPlacementPass>();		return std::make_unique<BufferDeallocationPass>();
}		}
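Editorial sketch: the `findCommonDominator` helper in the left-hand column above starts at the block that defines a value and folds in the block of every user via `findNearestCommonDominator`. The following self-contained toy model illustrates that folding step only. `ToyDomTree`, the integer block ids, and the `idom` vector are assumptions made for illustration and are not part of MLIR; the real code queries `DominanceInfo` or `PostDominanceInfo` instead.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree (hypothetical, not an MLIR type): idom[b] is the
// immediate dominator of block b, and the root block is its own idom.
struct ToyDomTree {
  std::vector<int> idom;

  // Number of tree edges between block b and the root.
  int depth(int b) const {
    int d = 0;
    while (idom[b] != b) {
      b = idom[b];
      ++d;
    }
    return d;
  }

  // Nearest block that dominates both a and b.
  int nearestCommonDominator(int a, int b) const {
    int da = depth(a), db = depth(b);
    // Lift the deeper block until both are at the same depth.
    while (da > db) { a = idom[a]; --da; }
    while (db > da) { b = idom[b]; --db; }
    // Walk both upwards in lock step until they meet.
    while (a != b) { a = idom[a]; b = idom[b]; }
    return a;
  }
};

// Same shape as findCommonDominator in the diff: start with the defining
// block and move upwards in the tree for every block that contains a user.
int findCommonDominator(const ToyDomTree &tree, int definingBlock,
                        const std::vector<int> &userBlocks) {
  assert(definingBlock >= 0 &&
         definingBlock < static_cast<int>(tree.idom.size()));
  int dom = definingBlock;
  for (int userBlock : userBlocks)
    dom = tree.nearestCommonDominator(dom, userBlock);
  return dom;
}
```

For the diamond CFG used in the `condBranch` test (bb0 branches to bb1 and bb2, both branch to bb3), every block has bb0 as its immediate dominator, so a value defined in bb2 with a user in bb3 yields bb0. Feeding the same routine a tree rooted at the exit block would model the post-dominator variant.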

mlir/lib/Transforms/BufferOptimizations.cpp

This file was added.

				//===- BufferOptimizations.cpp - pre-pass optimizations for bufferization -===//
				//
				pifon2a (Done): nit: remove empty line
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements logic for two optimization passes. These passes try to
				// hoist alloc nodes to reduce the number of allocations and copies during
				// buffer deallocation.

				bondhugula (Done): Nit: use width
				#include "PassDetail.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Interfaces/LoopLikeInterface.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/Bufferize.h"
				#include "mlir/Transforms/Passes.h"

				using namespace mlir;

				namespace {

				//===----------------------------------------------------------------------===//
				// BufferAllocationHoisting
				//===----------------------------------------------------------------------===//

				/// A base implementation compatible with the `BufferAllocationHoisting` class.
				struct BufferAllocationHoistingStateBase {
				/// A pointer to the current dominance info.
				DominanceInfo *dominators;
				herhut (Done): Nit: `Hosting`. Maybe also drop the Placement and instead call it `BufferAllocationHoisting`?

				/// The current allocation value.
				Value allocValue;

				bondhugula (Done): Avoid `auto`.
				/// The current placement block (if any).
				Block *placementBlock;

				/// Initializes the state base.
				BufferAllocationHoistingStateBase(DominanceInfo *dominators, Value allocValue,
				Block *placementBlock)
				: dominators(dominators), allocValue(allocValue),
				placementBlock(placementBlock) {}
				};

				herhut (Done): It is not clear to me why this is computed first for all allocations here and then there is a second round in `hoist()` that applies this. One could equally well compute one placement and then hoist the alloc, avoiding creating intermediate state. So in essence, move the loop outwards and do `foreach alloc : allocs { computePos; hoist; }`.
				/// Implements the actual hoisting logic for allocation nodes.
				template <typename StateT>
				class BufferAllocationHoisting : public BufferPlacementTransformationBase {
				private:
				/// Returns true if the given operation implements a known high-level region-
				/// based control-flow interface.
				herhut (Done): nit: common
				static bool isKnownControlFlowInterface(Operation *op) {
				return isa<LoopLikeOpInterface, RegionBranchOpInterface>(op);
				}

				public:
				BufferAllocationHoisting(Operation *op)
				: BufferPlacementTransformationBase(op), dominators(op),
				pifon2a (Done): you don't really need the `Block* placementBlock` line. The code is shorter without this tmp variable.
				  if (operands.size() < 1) {
				    // If not, we have to find the common dominator of all aliases and move
				    // the allocation out of nested loops.
				    auto resultAliases = aliases.resolve(alloc);
				    allocEntry.placementBlock = findCommonDominator(alloc, resultAliases, dominators);
				    moveOutOfLoop(allocEntry.placementBlock);
				    return;
				  }
				  // If this node has dependencies, check all dependent nodes with respect
				  // to a common post dominator in which all values are available.
				  ValueSetT dependencies(++operands.begin(), operands.end());
				  allocEntry.placementBlock = findCommonDominator(operands.begin(), dependencies, postDominators);
				postDominators(op) {}
				bondhugula (Done): Missing assert for `getDefiningOp`.
				herhut (Done): `all values this allocation depends on are available`?

				/// Moves allocations upwards.
				void hoist() {
				bondhugula (Done): `!operands.empty()` - less expensive in general and more idiomatic.
				for (BufferPlacementAllocs::AllocEntry &entry : allocs) {
				Value allocValue = std::get<0>(entry);
				Operation *definingOp = allocValue.getDefiningOp();
				assert(definingOp && "No defining op");
				auto operands = definingOp->getOperands();
				auto resultAliases = aliases.resolve(allocValue);
				pifon2a (Done): `std::next(operands.begin())`
				// Determine the common dominator block of all aliases.
				Block *dominatorBlock =
				findCommonDominator(allocValue, resultAliases, dominators);
				// Init the initial hoisting state.
				StateT state(&dominators, allocValue, allocValue.getParentBlock());
				// Check for additional allocation dependencies to compute an upper bound
				// for hoisting.
				Block *dependencyBlock = nullptr;
				if (!operands.empty()) {
				// If this node has dependencies, check all dependent nodes with respect
				herhut (Done): This can only be done if the alloc you are moving is loop invariant. This is ensured by the calling context but not clear from this code. Also, you could query the `LoopLikeOpInterface` as to whether the operands are loop independent. This also is not an optimization if the loop is never executed. What happens if the allocated buffer escapes the loop over a backedge? Something like
				  %0 = alloc
				  scf.for ... init(%alloc) {
				  bb0(%arg0...):
				    %1 = alloc
				    <update %1 using %arg0>
				    yield %1
				  }
				I think with your rewrite, you would allocate `%1` once and from the second iteration, the two buffers would now alias.
				// to a common post dominator. This ensures that all dependency values
				// have been computed before allocating the buffer.
				ValueSetT dependencies(std::next(operands.begin()), operands.end());
				dependencyBlock = findCommonDominator(*operands.begin(), dependencies,
				postDominators);
				}

				// Find the actual placement block and determine the start operation using
				// an upper placement-block boundary. The idea is that placement block
				// cannot be moved any further upwards than the given upper bound.
				Block *placementBlock = findPlacementBlock(
				state, state.computeUpperBound(dominatorBlock, dependencyBlock));
				herhut (Done): Does this address my example? This moves up the blocks until all aliases are covered. So in my example, if there was another aliasing on some outer level, it would still break the semantics, no?
				Operation *startOperation = BufferPlacementAllocs::getStartOperation(
				allocValue, placementBlock, liveness);

				// Move the alloc in front of the start operation.
				Operation *allocOperation = allocValue.getDefiningOp();
				allocOperation->moveBefore(startOperation);
				}
				}

				private:
				/// Finds a valid placement block by walking upwards in the CFG until we
				/// either cannot continue our walk due to constraints (given by the StateT
				/// implementation) or we have reached the upper-most dominator block.
				Block *findPlacementBlock(StateT &state, Block *upperBound) {
				Block *currentBlock = state.placementBlock;
				// Walk from the innermost regions/loops to the outermost regions/loops and
				// find an appropriate placement block that satisfies the constraint of the
				// current StateT implementation. Walk until we reach the upperBound block
				// (if any).
				bondhugula (Done): Nit: analysis -> info
				herhut (Done): This moves within one region level, correct?

				// If we are not able to find a valid parent operation or an associated
				bondhugula (Done): analysis -> info
				// parent block, break the walk loop.
				Operation *parentOp;
				Block *parentBlock;
				while ((parentOp = currentBlock->getParentOp()) &&
				(parentBlock = parentOp->getBlock()) &&
				(!upperBound ||
				dominators.properlyDominates(upperBound, currentBlock))) {
				// Try to find an immediate dominator and check whether the parent block
				herhut (Done): I would spell this out for readability: `state.isLegalPlacementOp(parentOp)`
				// is above the immediate dominator (if any).
				DominanceInfoNode *idom = dominators.getNode(currentBlock)->getIDom();
				if (idom && dominators.properlyDominates(parentBlock, idom->getBlock())) {
				// If the current immediate dominator is below the placement block, move
				// to the immediate dominator block.
				herhut (Done): `recordPotentialPlacement` as this does not move?
				currentBlock = idom->getBlock();
				state.recordMoveToDominator(currentBlock);
				} else {
				bondhugula (Done): Nit: Terminate with period.
				// We have to move to our parent block since an immediate dominator does
				// either not exist or is above our parent block. If we cannot move to
				// our parent operation due to constraints given by the StateT
				// implementation, break the walk loop. Furthermore, we should not move
				// allocations out of unknown region-based control-flow operations.
				if (!isKnownControlFlowInterface(parentOp) ||
				!state.isLegalPlacement(parentOp))
				break;
				// Move to our parent block by notifying the current StateT
				// implementation.
				currentBlock = parentBlock;
				state.recordMoveToParent(currentBlock);
				}
				}
				// Return the finally determined placement block.
				return state.placementBlock;
				}

				/// The dominator info to find the appropriate start operation to move the
				/// allocs.
				DominanceInfo dominators;
				herhut (Done): The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here. `isa<LoopLikeOpInterface>` is a very weak condition. There might be loops that do not implement this. So a better way to phrase this would be that the operation also does implement the `RegionBranchOpInterface` and has no back-edges between regions. Otherwise it is a loop and hoisting out of loops is illegal.
				dfki-mako (Author, Done):
				> The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here.
				The function `findPlacementBlock` does not move the placement block above the `dominatorBlock`, which has been determined before. Since the `dominatorBlock` represents the immediate common dominator of all aliases (while taking potential dependencies into account), it cannot happen that the memory pressure is significantly increased because the allocation will not be moved further.
				> `isa<LoopLikeOpInterface>` is a very weak condition
				I agree that we might want to capture these cases, as well.

				/// The post dominator info to move the dependent allocs in the right
				/// position.
				PostDominanceInfo postDominators;

				/// The map storing the final placement blocks of a given alloc value.
				llvm::DenseMap<Value, Block *> placementBlocks;
				};

				/// A state implementation compatible with the `BufferAllocationHoisting` class
				/// that hoists allocations into dominator blocks while keeping them inside of
				/// loops.
				struct BufferAllocationHoistingState : BufferAllocationHoistingStateBase {
				herhut (Done): So this will walk all the way up and if it finds a loop anywhere during that walk it will hoist. This also means hoisting out of conditionals (might be bad) and through unknown region based control flow. This should only hoist out of operations that implement `LoopLikeOpInterface` and only if the allocation does not escape that loop.
				dfki-mako (Author, Done): The current implementation behaves differently. However, if we want to ensure that the allocation does not escape the loop in all cases, we have to restructure the implementation slightly.
				using BufferAllocationHoistingStateBase::BufferAllocationHoistingStateBase;

				/// Computes the upper bound for the placement block search.
				Block *computeUpperBound(Block *dominatorBlock, Block *dependencyBlock) {
				// If we do not have a dependency block, the upper bound is given by the
				// dominator block.
				if (!dependencyBlock)
				return dominatorBlock;

				// Find the "lower" block of the dominator and the dependency block to
				// ensure that we do not move allocations above this block.
				return dominators->properlyDominates(dominatorBlock, dependencyBlock)
				? dependencyBlock
				: dominatorBlock;
				}

				/// Returns true if the given operation does not represent a loop.
				bool isLegalPlacement(Operation *op) {
				return !BufferPlacementTransformationBase::isLoop(op);
				}

				/// Sets the current placement block to the given block.
				void recordMoveToDominator(Block *block) { placementBlock = block; }

				/// Sets the current placement block to the given block.
				void recordMoveToParent(Block *block) { recordMoveToDominator(block); }
				};

				/// A state implementation compatible with the `BufferAllocationHoisting` class
				/// that hoists allocations out of loops.
				struct BufferAllocationLoopHoistingState : BufferAllocationHoistingStateBase {
				using BufferAllocationHoistingStateBase::BufferAllocationHoistingStateBase;

				/// Remembers the dominator block of all aliases.
				Block *aliasDominatorBlock;

				/// Computes the upper bound for the placement block search.
				Block *computeUpperBound(Block *dominatorBlock, Block *dependencyBlock) {
				aliasDominatorBlock = dominatorBlock;
				// If there is a dependency block, we have to use this block as an upper
				// bound to satisfy all allocation value dependencies.
				return dependencyBlock ? dependencyBlock : nullptr;
				}

				/// Returns true if the given operation represents a loop and one of the
				/// aliases caused the `aliasDominatorBlock` to be "above" the block of the
				herhut (Done): nit: `the one` -> `one`
				/// given loop operation. If this is the case, it indicates that the
				/// allocation is passed via a back edge.
				bool isLegalPlacement(Operation *op) {
				return BufferPlacementTransformationBase::isLoop(op) &&
				!dominators->dominates(aliasDominatorBlock, op->getBlock());
				}

				/// Does not change the internal placement block, as we want to move
				/// operations out of loops only.
				void recordMoveToDominator(Block *block) {}

				/// Sets the current placement block to the given block.
				void recordMoveToParent(Block *block) { placementBlock = block; }
				};

				//===----------------------------------------------------------------------===//
				// BufferOptimizationPasses
				//===----------------------------------------------------------------------===//

				/// The buffer hoisting pass that hoists allocation nodes into dominating
				/// blocks.
				struct BufferHoistingPass : BufferHoistingBase<BufferHoistingPass> {

				void runOnFunction() override {
				// Hoist all allocations into dominator blocks.
				BufferAllocationHoisting<BufferAllocationHoistingState> optimizer(
				getFunction());
				optimizer.hoist();
				}
				};

				/// The buffer loop hoisting pass that hoists allocation nodes out of loops.
				struct BufferLoopHoistingPass : BufferLoopHoistingBase<BufferLoopHoistingPass> {

				void runOnFunction() override {
				// Hoist all allocations out of loops.
				BufferAllocationHoisting<BufferAllocationLoopHoistingState> optimizer(
				getFunction());
				optimizer.hoist();
				}
				};

				} // end anonymous namespace

				std::unique_ptr<Pass> mlir::createBufferHoistingPass() {
				return std::make_unique<BufferHoistingPass>();
				}

				std::unique_ptr<Pass> mlir::createBufferLoopHoistingPass() {
				return std::make_unique<BufferLoopHoistingPass>();
				}
				herhut (Done): Please fix.
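
Editorial sketch: `BufferAllocationHoisting` is parameterized over a state type whose callbacks decide which upward moves actually update the placement block. The toy model below illustrates only that policy split. `ToyBlock`, the integer block ids, and the simplified `isLegalPlacement(bool)` signature are assumptions made for illustration; the upper-bound check, the alias dominator check, and all real MLIR dominance queries are deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a block in nested regions.
struct ToyBlock {
  int idom;          // immediate dominator in the same region, -1 if none
  int parentBlock;   // block containing the enclosing op, -1 at top level
  bool parentIsLoop; // whether the enclosing op is a loop
};

// Hoists into dominators but treats loops as barriers.
struct HoistingState {
  int placementBlock;
  bool isLegalPlacement(bool parentIsLoop) const { return !parentIsLoop; }
  void recordMoveToDominator(int block) { placementBlock = block; }
  void recordMoveToParent(int block) { placementBlock = block; }
};

// Hoists out of loops only: moves to dominators are walked but not recorded.
struct LoopHoistingState {
  int placementBlock;
  bool isLegalPlacement(bool parentIsLoop) const { return parentIsLoop; }
  void recordMoveToDominator(int) {}
  void recordMoveToParent(int block) { placementBlock = block; }
};

// Walks upwards from the starting block and lets the state decide which
// steps count as a new placement.
template <typename StateT>
int findPlacementBlock(const std::vector<ToyBlock> &blocks, StateT state) {
  int current = state.placementBlock;
  assert(current >= 0 && current < static_cast<int>(blocks.size()));
  while (true) {
    const ToyBlock &block = blocks[current];
    if (block.idom != -1) {
      // Stay in the same region and step to the immediate dominator.
      current = block.idom;
      state.recordMoveToDominator(current);
    } else if (block.parentBlock != -1 &&
               state.isLegalPlacement(block.parentIsLoop)) {
      // Leave the region through the enclosing operation.
      current = block.parentBlock;
      state.recordMoveToParent(current);
    } else {
      break;
    }
  }
  return state.placementBlock;
}
```

With blocks 0 and 1 at function level (1 dominated by 0), a loop in block 1 whose body entry is block 2, and block 3 dominated by block 2, starting at block 3 gives block 2 for `HoistingState` (the allocation stays inside the loop) and block 1 for `LoopHoistingState` (the allocation leaves the loop but is not pushed further up).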

mlir/lib/Transforms/BufferPlacement.cpp

This file was moved to mlir/lib/Transforms/BufferDeallocation.cpp.

mlir/lib/Transforms/CMakeLists.txt

	add_subdirectory(Utils)			add_subdirectory(Utils)

	add_mlir_library(MLIRTransforms			add_mlir_library(MLIRTransforms
	BufferPlacement.cpp			BufferDeallocation.cpp
				BufferOptimizations.cpp
	Bufferize.cpp			Bufferize.cpp
	Canonicalizer.cpp			Canonicalizer.cpp
	CopyRemoval.cpp			CopyRemoval.cpp
	CSE.cpp			CSE.cpp
	DialectConversion.cpp			DialectConversion.cpp
	Inliner.cpp			Inliner.cpp
	LocationSnapshot.cpp			LocationSnapshot.cpp
	LoopCoalescing.cpp			LoopCoalescing.cpp
	Show All 31 Lines

mlir/test/Dialect/Linalg/bufferize.mlir

	// RUN: mlir-opt -linalg-bufferize -buffer-placement -split-input-file %s | FileCheck %s			// RUN: mlir-opt -linalg-bufferize -buffer-hoisting -buffer-deallocation -split-input-file %s | FileCheck %s

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @multiple_results			// CHECK-LABEL: func @multiple_results
	func @multiple_results(%arg0: tensor<4xf32>) -> (tensor<4xf32>, tensor<4xf32>) {			func @multiple_results(%arg0: tensor<4xf32>) -> (tensor<4xf32>, tensor<4xf32>) {
	%0, %1 = linalg.generic {			%0, %1 = linalg.generic {
	indexing_maps = [#map0, #map0, #map0],			indexing_maps = [#map0, #map0, #map0],
	iterator_types = ["parallel"]			iterator_types = ["parallel"]
	▲ Show 20 Lines • Show All 306 Lines • Show Last 20 Lines

mlir/test/Transforms/buffer-deallocation.mlir

This file was moved from mlir/test/Transforms/buffer-placement.mlir.

// RUN: mlir-opt -buffer-placement -split-input-file %s | FileCheck %s		// RUN: mlir-opt -buffer-deallocation -split-input-file %s | FileCheck %s

// This file checks the behaviour of BufferPlacement pass for moving Alloc and		// This file checks the behaviour of BufferDeallocation pass for moving and
// Dealloc operations and inserting the missing the DeallocOps in their correct		// inserting missing DeallocOps in their correct positions. Furthermore,
// positions.		// copies and their corresponding AllocOps are inserted.

// Test Case:		// Test Case:
// bb0		// bb0
// / \		// / \
// bb1 bb2 <- Initial position of AllocOp		// bb1 bb2 <- Initial position of AllocOp
// \ /		// \ /
// bb3		// bb3
// BufferPlacement Expected Behaviour: It should move the existing AllocOp to		// BufferDeallocation expected behavior: bb2 contains an AllocOp which is
// the entry block, and insert a DeallocOp at the exit block after CopyOp since		// passed to bb3. In the latter block, there should be a deallocation.
// %1 is an alias for %0 and %arg1.		// Since bb1 does not contain an adequate alloc and the alloc in bb2 is not
		// moved to bb0, we need to insert allocs and copies.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @condBranch		// CHECK-LABEL: func @condBranch
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
Show All 9 Lines	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
		// CHECK: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: br ^bb3(%[[ALLOC0]]
		// CHECK: %[[ALLOC1:.*]] = alloc()
		// CHECK-NEXT: linalg.generic
		// CHECK: %[[ALLOC2:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: dealloc %[[ALLOC1]]
		// CHECK-NEXT: br ^bb3(%[[ALLOC2]]
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
// bb0		// bb0
// / \		// / \
// bb1 bb2 <- Initial position of AllocOp		// bb1 bb2 <- Initial position of AllocOp
// \ /		// \ /
// bb3		// bb3
// BufferPlacement Expected Behaviour: It should not move the existing AllocOp		// BufferDeallocation expected behavior: The existing AllocOp has a dynamic
// to any other block since the alloc has a dynamic dependency to block argument		// dependency to block argument %0 in bb2. Since the dynamic type is passed
// %0 in bb2. Since the dynamic type is passed to bb3 via the block argument %2,		// to bb3 via the block argument %2, it is currently required to allocate a
// it is currently required to allocate a temporary buffer for %2 that gets		// temporary buffer for %2 that gets copies of %arg0 and %1 with their
// copies of %arg0 and %1 with their appropriate shape dimensions. The copy		// appropriate shape dimensions. The copy buffer deallocation will be applied
// buffer deallocation will be applied to %2 in block bb3.		// to %2 in block bb3.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @condBranchDynamicType		// CHECK-LABEL: func @condBranchDynamicType
func @condBranchDynamicType(		func @condBranchDynamicType(
%arg0: i1,		%arg0: i1,
%arg1: memref<?xf32>,		%arg1: memref<?xf32>,
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
Show All 17 Lines	^bb3(%2: memref<?xf32>):
"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()		"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
// CHECK: %[[DIM0:.*]] = dim		// CHECK: %[[DIM0:.*]] = dim
// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])
// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])		// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
		// CHECK-NEXT: br ^bb3(%[[ALLOC0]]
// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})		// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})
// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])		// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])
// CHECK-NEXT: linalg.generic		// CHECK-NEXT: linalg.generic
// CHECK: %[[DIM1:.*]] = dim %[[ALLOC1]]		// CHECK: %[[DIM1:.*]] = dim %[[ALLOC1]]
// CHECK-NEXT: %[[ALLOC2:.*]] = alloc(%[[DIM1]])		// CHECK-NEXT: %[[ALLOC2:.*]] = alloc(%[[DIM1]])
// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC2]])		// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC2]])
// CHECK-NEXT: dealloc %[[ALLOC1]]		// CHECK-NEXT: dealloc %[[ALLOC1]]
// CHECK-NEXT: br ^bb3		// CHECK-NEXT: br ^bb3
Show All 11 Lines
// | / \		// | / \
// | bb3 bb4		// | bb3 bb4
// | \ /		// | \ /
// \ bb5		// \ bb5
// \ /		// \ /
// bb6		// bb6
// |		// |
// bb7		// bb7
// BufferPlacement Expected Behaviour: It should not move the existing AllocOp		// BufferDeallocation expected behavior: The existing AllocOp has a dynamic
// to any other block since the alloc has a dynamic dependency to block argument		// dependency to block argument %0 in bb2. Since the dynamic type is passed to
// %0 in bb2. Since the dynamic type is passed to bb5 via the block argument %2		// bb5 via the block argument %2 and to bb6 via block argument %3, it is
// and to bb6 via block argument %3, it is currently required to allocate		// currently required to allocate temporary buffers for %2 and %3 that gets
// temporary buffers for %2 and %3 that gets copies of %1 and %arg0 1 with their		// copies of %1 and %arg0 1 with their appropriate shape dimensions. The copy
// appropriate shape dimensions. The copy buffer deallocations will be applied		// buffer deallocations will be applied to %2 in block bb5 and to %3 in block
// to %2 in block bb5 and to %3 in block bb6. Furthermore, there should be no		// bb6. Furthermore, there should be no copy inserted for %4.
// copy inserted for %4.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @condBranchDynamicType		// CHECK-LABEL: func @condBranchDynamicTypeNested
func @condBranchDynamicTypeNested(		func @condBranchDynamicTypeNested(
%arg0: i1,		%arg0: i1,
%arg1: memref<?xf32>,		%arg1: memref<?xf32>,
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
%arg3: index) {		%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3: index)		cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:		^bb1:
br ^bb6(%arg1 : memref<?xf32>)		br ^bb6(%arg1 : memref<?xf32>)
Show All 22 Lines	^bb7(%4: memref<?xf32>):
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
// CHECK: ^bb1		// CHECK: ^bb1
// CHECK: %[[DIM0:.*]] = dim		// CHECK: %[[DIM0:.*]] = dim
// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])
// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])		// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
		// CHECK-NEXT: br ^bb6
// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})		// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})
// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])		// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])
// CHECK-NEXT: linalg.generic		// CHECK-NEXT: linalg.generic
// CHECK: cond_br		// CHECK: cond_br
// CHECK: ^bb3:		// CHECK: ^bb3:
// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})		// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
// CHECK: ^bb4:		// CHECK: ^bb4:
// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})		// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
// CHECK-NEXT: ^bb5(%[[ALLOC2:.*]]:{{.*}})		// CHECK-NEXT: ^bb5(%[[ALLOC2:.*]]:{{.*}})
// CHECK: %[[DIM2:.*]] = dim %[[ALLOC2]]		// CHECK: %[[DIM2:.*]] = dim %[[ALLOC2]]
// CHECK-NEXT: %[[ALLOC3:.*]] = alloc(%[[DIM2]])		// CHECK-NEXT: %[[ALLOC3:.*]] = alloc(%[[DIM2]])
// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC3]])		// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC3]])
// CHECK-NEXT: dealloc %[[ALLOC1]]		// CHECK-NEXT: dealloc %[[ALLOC1]]
// CHECK-NEXT: br ^bb6(%[[ALLOC3]]{{.*}})		// CHECK-NEXT: br ^bb6(%[[ALLOC3]]{{.*}})
// CHECK-NEXT: ^bb6(%[[ALLOC4:.*]]:{{.*}})		// CHECK-NEXT: ^bb6(%[[ALLOC4:.*]]:{{.*}})
// CHECK-NEXT: br ^bb7(%[[ALLOC4]]{{.*}})		// CHECK-NEXT: br ^bb7(%[[ALLOC4]]{{.*}})
// CHECK-NEXT: ^bb7(%[[ALLOC5:.*]]:{{.*}})		// CHECK-NEXT: ^bb7(%[[ALLOC5:.*]]:{{.*}})
// CHECK: linalg.copy(%[[ALLOC5]],		// CHECK: linalg.copy(%[[ALLOC5]],
// CHECK-NEXT: dealloc %[[ALLOC4]]		// CHECK-NEXT: dealloc %[[ALLOC4]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case: Existing AllocOp with no users.		// Test Case: Existing AllocOp with no users.
// BufferPlacement Expected Behaviour: It should insert a DeallocOp right before		// BufferDeallocation expected behavior: It should insert a DeallocOp right
// ReturnOp.		// before ReturnOp.

// CHECK-LABEL: func @emptyUsesValue		// CHECK-LABEL: func @emptyUsesValue
func @emptyUsesValue(%arg0: memref<4xf32>) {		func @emptyUsesValue(%arg0: memref<4xf32>) {
%0 = alloc() : memref<4xf32>		%0 = alloc() : memref<4xf32>
return		return
}		}
// CHECK-NEXT: %[[ALLOC:.*]] = alloc()		// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc %[[ALLOC]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
// bb0		// bb0
// / \		// / \
// | bb1 <- Initial position of AllocOp		// | bb1 <- Initial position of AllocOp
// \ /		// \ /
// bb2		// bb2
// BufferPlacement Expected Behaviour: It should move the existing AllocOp to		// BufferDeallocation expected behavior: It should insert a DeallocOp at the
// the entry block and insert a DeallocOp at the exit block after CopyOp since		// exit block after CopyOp since %1 is an alias for %0 and %arg1. Furthermore,
// %1 is an alias for %0 and %arg1.		// we have to insert a copy and an alloc in the beginning of the function.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @criticalEdge		// CHECK-LABEL: func @criticalEdge
func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)		cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
^bb1:		^bb1:
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]}		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%0: memref<2xf32>) {		outs(%0: memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
br ^bb2(%0 : memref<2xf32>)		br ^bb2(%0 : memref<2xf32>)
^bb2(%1: memref<2xf32>):		^bb2(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %[[ALLOC:.*]] = alloc()		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
		// CHECK: %[[ALLOC1:.*]] = alloc()
		// CHECK-NEXT: linalg.generic
		// CHECK: %[[ALLOC2:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: dealloc %[[ALLOC1]]
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
// bb0 <- Initial position of AllocOp		// bb0 <- Initial position of AllocOp
// / \		// / \
// | bb1		// | bb1
// \ /		// \ /
// bb2		// bb2
// BufferPlacement Expected Behaviour: It shouldn't move the alloc position. It		// BufferDeallocation expected behavior: It only inserts a DeallocOp at the
// only inserts a DeallocOp at the exit block after CopyOp since %1 is an alias		// exit block after CopyOp since %1 is an alias for %0 and %arg1.
// for %0 and %arg1.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @invCriticalEdge		// CHECK-LABEL: func @invCriticalEdge
func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
Show All 18 Lines
// -----		// -----

// Test Case:		// Test Case:
// bb0 <- Initial position of the first AllocOp		// bb0 <- Initial position of the first AllocOp
// / \		// / \
// bb1 bb2		// bb1 bb2
// \ /		// \ /
// bb3 <- Initial position of the second AllocOp		// bb3 <- Initial position of the second AllocOp
// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only		// BufferDeallocation expected behavior: It only inserts two missing
// inserts two missing DeallocOps in the exit block. %5 is an alias for %0.		// DeallocOps in the exit block. %5 is an alias for %0. Therefore, the
// Therefore, the DeallocOp for %0 should occur after the last GenericOp. The		// DeallocOp for %0 should occur after the last GenericOp. The Dealloc for %7
// Dealloc for %7 should happen after the CopyOp.		// should happen after the CopyOp.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @ifElse		// CHECK-LABEL: func @ifElse
func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
Show All 38 Lines
// -----		// -----

// Test Case: No users for buffer in if-else CFG		// Test Case: No users for buffer in if-else CFG
// bb0 <- Initial position of AllocOp		// bb0 <- Initial position of AllocOp
// / \		// / \
// bb1 bb2		// bb1 bb2
// \ /		// \ /
// bb3		// bb3
// BufferPlacement Expected Behaviour: It shouldn't move the AllocOp. It only		// BufferDeallocation expected behavior: It only inserts a missing DeallocOp
// inserts a missing DeallocOp in the exit block since %5 or %6 are the latest		// in the exit block since %5 or %6 are the latest aliases of %0.
// aliases of %0.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @ifElseNoUsers		// CHECK-LABEL: func @ifElseNoUsers
func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
Show All 12 Lines
^bb2(%3: memref<2xf32>, %4: memref<2xf32>):		^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)		br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
^bb3(%5: memref<2xf32>, %6: memref<2xf32>):		^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()		// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
// CHECK: dealloc %[[FIRST_ALLOC]]		// CHECK: linalg.copy
		// CHECK-NEXT: dealloc %[[FIRST_ALLOC]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
// bb0 <- Initial position of the first AllocOp		// bb0 <- Initial position of the first AllocOp
// / \		// / \
// bb1 bb2		// bb1 bb2
// | / \		// | / \
// | bb3 bb4		// | bb3 bb4
// \ \ /		// \ \ /
// \ /		// \ /
// bb5 <- Initial position of the second AllocOp		// bb5 <- Initial position of the second AllocOp
// BufferPlacement Expected Behaviour: AllocOps shouldn't be moved.		// BufferDeallocation expected behavior: Two missing DeallocOps should be
// Two missing DeallocOps should be inserted in the exit block.		// inserted in the exit block.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @ifElseNested		// CHECK-LABEL: func @ifElseNested
func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
Show All 37 Lines
// CHECK: dealloc %[[FIRST_ALLOC]]		// CHECK: dealloc %[[FIRST_ALLOC]]
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]		// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case: Dead operations in a single block.		// Test Case: Dead operations in a single block.
// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only		// BufferDeallocation expected behavior: It only inserts the two missing
// inserts the two missing DeallocOps after the last GenericOp.		// DeallocOps after the last GenericOp.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @redundantOperations		// CHECK-LABEL: func @redundantOperations
func @redundantOperations(%arg0: memref<2xf32>) {		func @redundantOperations(%arg0: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
Show All 16 Lines	func @redundantOperations(%arg0: memref<2xf32>) {
}		}
return		return
}		}

// CHECK: (%[[ARG0:.*]]: {{.*}})		// CHECK: (%[[ARG0:.*]]: {{.*}})
// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()		// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
// CHECK-NEXT: linalg.generic {{.*}} ins(%[[ARG0]]{{.*}}outs(%[[FIRST_ALLOC]]		// CHECK-NEXT: linalg.generic {{.*}} ins(%[[ARG0]]{{.*}}outs(%[[FIRST_ALLOC]]
// CHECK: %[[SECOND_ALLOC:.*]] = alloc()		// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
// CHECK-NEXT: linalg.generic {{.*}} ins(%[[FIRST_ALLOC]]{{.*}}outs(%[[SECOND_ALLOC]]		// CHECK-NEXT: linalg.generic {{.*}} ins
		// CHECK-SAME: (%[[FIRST_ALLOC]]{{.*}}outs(%[[SECOND_ALLOC]]
// CHECK: dealloc		// CHECK: dealloc
// CHECK-NEXT: dealloc		// CHECK-NEXT: dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
// bb0		// bb0
// / \		// / \
// Initial pos of the 1st AllocOp -> bb1 bb2 <- Initial pos of the 2nd AllocOp		// Initial pos of the 1st AllocOp -> bb1 bb2 <- Initial pos of the 2nd AllocOp
// \ /		// \ /
// bb3		// bb3
// BufferPlacement Expected Behaviour: Both AllocOps should be moved to the		// BufferDeallocation expected behavior: We need to introduce a copy for each
// entry block. Both missing DeallocOps should be moved to the exit block after		// buffer since the buffers are passed to bb3. Both missing DeallocOps are
// CopyOp since %arg2 is an alias for %0 and %1.		// inserted in the respective block of the allocs. The copy is freed in the exit
		// block.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc		// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc
func @moving_alloc_and_inserting_missing_dealloc(		func @moving_alloc_and_inserting_missing_dealloc(
%cond: i1,		%cond: i1,
%arg0: memref<2xf32>,		%arg0: memref<2xf32>,
%arg1: memref<2xf32>) {		%arg1: memref<2xf32>) {
Show All 22 Lines	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
linalg.yield %tmp2 : f32		linalg.yield %tmp2 : f32
}		}
br ^exit(%1 : memref<2xf32>)		br ^exit(%1 : memref<2xf32>)
^exit(%arg2: memref<2xf32>):		^exit(%arg2: memref<2xf32>):
"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %{{.*}} = alloc()		// CHECK-NEXT: cond_br
// CHECK-NEXT: %{{.*}} = alloc()		// CHECK: ^bb1
		// CHECK: ^bb1
		// CHECK: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: linalg.generic
		// CHECK: %[[ALLOC1:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: dealloc %[[ALLOC0]]
		// CHECK-NEXT: br ^bb3(%[[ALLOC1]]
		// CHECK-NEXT: ^bb2
		// CHECK-NEXT: %[[ALLOC2:.*]] = alloc()
		// CHECK-NEXT: linalg.generic
		// CHECK: %[[ALLOC3:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: dealloc %[[ALLOC2]]
		// CHECK-NEXT: br ^bb3(%[[ALLOC3]]
		// CHECK-NEXT: ^bb3(%[[ALLOC4:.*]]:{{.*}})
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: dealloc %[[ALLOC4]]
// CHECK-NEXT: dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case: Invalid position of the DeallocOp. There is a user after		// Test Case: Invalid position of the DeallocOp. There is a user after
// deallocation.		// deallocation.
// bb0		// bb0
// / \		// / \
// bb1 bb2 <- Initial position of AllocOp		// bb1 bb2 <- Initial position of AllocOp
// \ /		// \ /
// bb3		// bb3
// BufferPlacement Expected Behaviour: It should move the AllocOp to the entry		// BufferDeallocation expected behavior: The existing DeallocOp should be
// block. The existing DeallocOp should be moved to exit block.		// moved to exit block.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @moving_invalid_dealloc_op_complex		// CHECK-LABEL: func @moving_invalid_dealloc_op_complex
func @moving_invalid_dealloc_op_complex(		func @moving_invalid_dealloc_op_complex(
%cond: i1,		%cond: i1,
%arg0: memref<2xf32>,		%arg0: memref<2xf32>,
%arg1: memref<2xf32>) {		%arg1: memref<2xf32>) {
		%1 = alloc() : memref<2xf32>
cond_br %cond, ^bb1, ^bb2		cond_br %cond, ^bb1, ^bb2
^bb1:		^bb1:
br ^exit(%arg0 : memref<2xf32>)		br ^exit(%arg0 : memref<2xf32>)
^bb2:		^bb2:
%1 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]}		iterator_types = ["parallel"]}
ins(%arg0: memref<2xf32>)		ins(%arg0: memref<2xf32>)
outs(%1: memref<2xf32>) {		outs(%1: memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
dealloc %1 : memref<2xf32>		dealloc %1 : memref<2xf32>
br ^exit(%1 : memref<2xf32>)		br ^exit(%1 : memref<2xf32>)
^exit(%arg2: memref<2xf32>):		^exit(%arg2: memref<2xf32>):
"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %{{.*}} = alloc()		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: cond_br
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: dealloc %[[ALLOC0]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case: Inserting missing DeallocOp in a single block.		// Test Case: Inserting missing DeallocOp in a single block.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

Show All 10 Lines	func @inserting_missing_dealloc_simple(
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: dealloc %[[ALLOC0]]

// -----		// -----

// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a		// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a
// single block.		// single block.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

Show All 9 Lines	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
dealloc %0 : memref<2xf32>		dealloc %0 : memref<2xf32>
"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

		// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: dealloc %[[ALLOC0]]

// -----		// -----

// Test Case: Nested regions - This test defines a GenericOp inside the region of		// Test Case: Nested regions - This test defines a GenericOp inside the region
// another GenericOp.		// of another GenericOp.
// BufferPlacement Expected Behaviour: The AllocOp of inner GenericOp should remain		// BufferDeallocation expected behavior: The AllocOp of inner GenericOp should
// inside the region of outer GenericOp and it should insert the missing DeallocOp		// remain inside the region of outer GenericOp and it should insert the missing
// in the same region. The AllocOp of the outer GenericOp should be moved to the		// DeallocOp in the same region. The missing DeallocOp should be inserted after
// entry block and its missing DeallocOp should be inserted after Linalg.Copy.		// Linalg.Copy.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @nested_regions_and_cond_branch		// CHECK-LABEL: func @nested_regions_and_cond_branch
func @nested_regions_and_cond_branch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @nested_regions_and_cond_branch(
		%arg0: i1,
		%arg1: memref<2xf32>,
		%arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}		linalg.generic {
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%0: memref<2xf32>) {		outs(%0: memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%1 = alloc() : memref<2xf32>		%1 = alloc() : memref<2xf32>
linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}		linalg.generic {
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%1: memref<2xf32>) {		outs(%1: memref<2xf32>) {
^bb0(%gen2_arg0: f32, %gen2_arg1: f32):		^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
%tmp2 = exp %gen2_arg0 : f32		%tmp2 = exp %gen2_arg0 : f32
linalg.yield %tmp2 : f32		linalg.yield %tmp2 : f32
}		}
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})		// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})
// CHECK-NEXT: %[[GENERIC1_ALLOC:.*]] = alloc()
// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]		// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]
		// CHECK: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: linalg.copy(%[[ARG1]], %[[ALLOC0]])
// CHECK: ^[[BB2]]:		// CHECK: ^[[BB2]]:
// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[GENERIC1_ALLOC]]		// CHECK: %[[ALLOC1:.*]] = alloc()
// CHECK: %[[GENERIC2_ALLOC:.*]] = alloc()		// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOC1]]
// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[GENERIC2_ALLOC]]		// CHECK: %[[ALLOC2:.*]] = alloc()
// CHECK: dealloc %[[GENERIC2_ALLOC]]		// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOC2]]
		// CHECK: dealloc %[[ALLOC2]]
// CHECK-NEXT: %{{.*}} = exp		// CHECK-NEXT: %{{.*}} = exp
		// CHECK: %[[ALLOC3:.*]] = alloc()
		// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC3]])
		// CHECK-NEXT: dealloc %[[ALLOC1]]
// CHECK: ^[[BB3:.*]]({{.*}}):		// CHECK: ^[[BB3:.*]]({{.*}}):
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc %[[GENERIC1_ALLOC]]		// CHECK-NEXT: dealloc

// -----		// -----

// Test Case: buffer deallocation escaping		// Test Case: buffer deallocation escaping
// BufferPlacement Expected Behaviour: It must not dealloc %arg1 and %x		// BufferDeallocation expected behavior: It must not dealloc %arg1 and %x
// since they are operands of return operation and should escape from		// since they are operands of return operation and should escape from
// deallocating. It should dealloc %y after linalg.copy.		// deallocating. It should dealloc %y after linalg.copy.

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @memref_in_function_results		// CHECK-LABEL: func @memref_in_function_results
func @memref_in_function_results(%arg0: memref<5xf32>, %arg1: memref<10xf32>, %arg2: memref<5xf32>) -> (memref<10xf32>, memref<15xf32>) {		func @memref_in_function_results(
		%arg0: memref<5xf32>,
		%arg1: memref<10xf32>,
		%arg2: memref<5xf32>) -> (memref<10xf32>, memref<15xf32>) {
%x = alloc() : memref<15xf32>		%x = alloc() : memref<15xf32>
%y = alloc() : memref<5xf32>		%y = alloc() : memref<5xf32>
linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}		linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
ins(%arg0: memref<5xf32>)		ins(%arg0: memref<5xf32>)
outs(%y: memref<5xf32>) {		outs(%y: memref<5xf32>) {
^bb0(%arg3: f32, %arg4: f32):		^bb0(%arg3: f32, %arg4: f32):
%2 = exp %arg3 : f32		%2 = exp %arg3 : f32
linalg.yield %2 : f32		linalg.yield %2 : f32
}		}
linalg.copy(%y, %arg2) : memref<5xf32>, memref<5xf32>		linalg.copy(%y, %arg2) : memref<5xf32>, memref<5xf32>
return %arg1, %x : memref<10xf32>, memref<15xf32>		return %arg1, %x : memref<10xf32>, memref<15xf32>
}		}
// CHECK: (%[[ARG0:.*]]: memref<5xf32>, %[[ARG1:.*]]: memref<10xf32>, %[[RESULT:.*]]: memref<5xf32>)		// CHECK: (%[[ARG0:.*]]: memref<5xf32>, %[[ARG1:.*]]: memref<10xf32>,
		// CHECK-SAME: %[[RESULT:.*]]: memref<5xf32>)
// CHECK: %[[X:.*]] = alloc()		// CHECK: %[[X:.*]] = alloc()
// CHECK: %[[Y:.*]] = alloc()		// CHECK: %[[Y:.*]] = alloc()
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK: dealloc %[[Y]]		// CHECK: dealloc %[[Y]]
// CHECK: return %[[ARG1]], %[[X]]		// CHECK: return %[[ARG1]], %[[X]]

// -----		// -----

// Test Case: nested region control flow		// Test Case: nested region control flow
// The alloc position of %1 does not need to be changed and flows through		// The alloc %1 flows through both if branches until it is finally returned.
// both if branches until it is finally returned. Hence, it does not		// Hence, it does not require a specific dealloc operation. However, %3
// require a specific dealloc operation. However, %3 requires a dealloc.		// requires a dealloc.

// CHECK-LABEL: func @nested_region_control_flow		// CHECK-LABEL: func @nested_region_control_flow
func @nested_region_control_flow(		func @nested_region_control_flow(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index		%0 = cmpi "eq", %arg0, %arg1 : index
%1 = alloc(%arg0, %arg0) : memref<?x?xf32>		%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
%2 = scf.if %0 -> (memref<?x?xf32>) {		%2 = scf.if %0 -> (memref<?x?xf32>) {
Show All 12 Lines
// CHECK-NEXT: dealloc %[[ALLOC2]]		// CHECK-NEXT: dealloc %[[ALLOC2]]
// CHECK-NEXT: scf.yield %[[ALLOC0]]		// CHECK-NEXT: scf.yield %[[ALLOC0]]
// CHECK: return %[[ALLOC1]]		// CHECK: return %[[ALLOC1]]

// -----		// -----

// Test Case: nested region control flow with a nested buffer allocation in a		// Test Case: nested region control flow with a nested buffer allocation in a
// divergent branch.		// divergent branch.
// The alloc positions of %1, %3 does not need to be changed since		// Buffer deallocation places a copy for both %1 and %3, since they are
// BufferPlacement does not move allocs out of nested regions at the moment.		// returned in the end.
// However, since %3 is allocated and "returned" in a divergent branch, we have
// to allocate a temporary buffer (like in condBranchDynamicTypeNested).

// CHECK-LABEL: func @nested_region_control_flow_div		// CHECK-LABEL: func @nested_region_control_flow_div
func @nested_region_control_flow_div(		func @nested_region_control_flow_div(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index		%0 = cmpi "eq", %arg0, %arg1 : index
%1 = alloc(%arg0, %arg0) : memref<?x?xf32>		%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
%2 = scf.if %0 -> (memref<?x?xf32>) {		%2 = scf.if %0 -> (memref<?x?xf32>) {
Show All 15 Lines
// CHECK-NEXT: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])		// CHECK-NEXT: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
// CHECK: dealloc %[[ALLOC3]]		// CHECK: dealloc %[[ALLOC3]]
// CHECK: scf.yield %[[ALLOC4]]		// CHECK: scf.yield %[[ALLOC4]]
// CHECK: dealloc %[[ALLOC0]]		// CHECK: dealloc %[[ALLOC0]]
// CHECK-NEXT: return %[[ALLOC1]]		// CHECK-NEXT: return %[[ALLOC1]]

// -----		// -----

// Test Case: deeply nested region control flow with a nested buffer allocation
// in a divergent branch.
// The alloc positions of %1, %4 and %5 does not need to be changed since
// BufferPlacement does not move allocs out of nested regions at the moment.
// However, since %4 is allocated and "returned" in a divergent branch, we have
// to allocate several temporary buffers (like in condBranchDynamicTypeNested).

// CHECK-LABEL: func @nested_region_control_flow_div_nested
func @nested_region_control_flow_div_nested(
%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index
%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
%2 = scf.if %0 -> (memref<?x?xf32>) {
%3 = scf.if %0 -> (memref<?x?xf32>) {
scf.yield %1 : memref<?x?xf32>
} else {
%4 = alloc(%arg0, %arg1) : memref<?x?xf32>
scf.yield %4 : memref<?x?xf32>
}
scf.yield %3 : memref<?x?xf32>
} else {
%5 = alloc(%arg1, %arg1) : memref<?x?xf32>
scf.yield %5 : memref<?x?xf32>
}
return %2 : memref<?x?xf32>
}
// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if
// CHECK-NEXT: %[[ALLOC2:.*]] = scf.if
// CHECK: %[[ALLOC3:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC0]], %[[ALLOC3]])
// CHECK: scf.yield %[[ALLOC3]]
// CHECK: %[[ALLOC4:.*]] = alloc(%arg0, %arg1)
// CHECK: %[[ALLOC5:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC4]], %[[ALLOC5]])
// CHECK: dealloc %[[ALLOC4]]
// CHECK: scf.yield %[[ALLOC5]]
// CHECK: %[[ALLOC6:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC6]])
// CHECK: dealloc %[[ALLOC2]]
// CHECK: scf.yield %[[ALLOC6]]
// CHECK: %[[ALLOC7:.*]] = alloc(%arg1, %arg1)
// CHECK: %[[ALLOC8:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC7]], %[[ALLOC8]])
// CHECK: dealloc %[[ALLOC7]]
// CHECK: scf.yield %[[ALLOC8]]
// CHECK: dealloc %[[ALLOC0]]
// CHECK-NEXT: return %[[ALLOC1]]

// -----

// Test Case: nested region control flow within a region interface.		// Test Case: nested region control flow within a region interface.
// The alloc positions of %0 does not need to be changed and no copies are		// No copies are required in this case since the allocation finally escapes
// required in this case since the allocation finally escapes the method.		// the method.

// CHECK-LABEL: func @inner_region_control_flow		// CHECK-LABEL: func @inner_region_control_flow
func @inner_region_control_flow(%arg0 : index) -> memref<?x?xf32> {		func @inner_region_control_flow(%arg0 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg0) : memref<?x?xf32>		%0 = alloc(%arg0, %arg0) : memref<?x?xf32>
%1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then {		%1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then {
^bb0(%arg1 : memref<?x?xf32>):		^bb0(%arg1 : memref<?x?xf32>):
test.region_if_yield %arg1 : memref<?x?xf32>		test.region_if_yield %arg1 : memref<?x?xf32>
} else {		} else {
Show All 13 Lines
// CHECK: ^bb0(%[[ALLOC3:.*]]:{{.*}}):		// CHECK: ^bb0(%[[ALLOC3:.*]]:{{.*}}):
// CHECK-NEXT: test.region_if_yield %[[ALLOC3]]		// CHECK-NEXT: test.region_if_yield %[[ALLOC3]]
// CHECK: ^bb0(%[[ALLOC4:.*]]:{{.*}}):		// CHECK: ^bb0(%[[ALLOC4:.*]]:{{.*}}):
// CHECK-NEXT: test.region_if_yield %[[ALLOC4]]		// CHECK-NEXT: test.region_if_yield %[[ALLOC4]]
// CHECK: return %[[ALLOC1]]		// CHECK: return %[[ALLOC1]]

// -----		// -----

// Test Case: nested region control flow within a region interface including an
// allocation in a divergent branch.
// The alloc positions of %1 and %2 does not need to be changed since
// BufferPlacement does not move allocs out of nested regions at the moment.
// However, since %2 is allocated and yielded in a divergent branch, we have
// to allocate several temporary buffers (like in condBranchDynamicTypeNested).

// CHECK-LABEL: func @inner_region_control_flow_div
func @inner_region_control_flow_div(
%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg0) : memref<?x?xf32>
%1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then {
^bb0(%arg2 : memref<?x?xf32>):
test.region_if_yield %arg2 : memref<?x?xf32>
} else {
^bb0(%arg2 : memref<?x?xf32>):
%2 = alloc(%arg0, %arg1) : memref<?x?xf32>
test.region_if_yield %2 : memref<?x?xf32>
} join {
^bb0(%arg2 : memref<?x?xf32>):
test.region_if_yield %arg2 : memref<?x?xf32>
}
return %1 : memref<?x?xf32>
}

// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
// CHECK-NEXT: %[[ALLOC1:.*]] = test.region_if
// CHECK-NEXT: ^bb0(%[[ALLOC2:.*]]:{{.*}}):
// CHECK: %[[ALLOC3:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC3]])
// CHECK-NEXT: test.region_if_yield %[[ALLOC3]]
// CHECK: ^bb0(%[[ALLOC4:.*]]:{{.*}}):
// CHECK: %[[ALLOC5:.*]] = alloc
// CHECK: %[[ALLOC6:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC5]], %[[ALLOC6]])
// CHECK-NEXT: dealloc %[[ALLOC5]]
// CHECK-NEXT: test.region_if_yield %[[ALLOC6]]
// CHECK: ^bb0(%[[ALLOC7:.*]]:{{.*}}):
// CHECK: %[[ALLOC8:.*]] = alloc
// CHECK-NEXT: linalg.copy(%[[ALLOC7]], %[[ALLOC8]])
// CHECK-NEXT: dealloc %[[ALLOC7]]
// CHECK-NEXT: test.region_if_yield %[[ALLOC8]]
// CHECK: dealloc %[[ALLOC0]]
// CHECK-NEXT: return %[[ALLOC1]]

// -----

// CHECK-LABEL: func @subview		// CHECK-LABEL: func @subview
func @subview(%arg0 : index, %arg1 : index, %arg2 : memref<?x?xf32>) {		func @subview(%arg0 : index, %arg1 : index, %arg2 : memref<?x?xf32>) {
%0 = alloc() : memref<64x4xf32, offset: 0, strides: [4, 1]>		%0 = alloc() : memref<64x4xf32, offset: 0, strides: [4, 1]>
%1 = subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :		%1 = subview %0[%arg0, %arg1][%arg0, %arg1][%arg0, %arg1] :
memref<64x4xf32, offset: 0, strides: [4, 1]>		memref<64x4xf32, offset: 0, strides: [4, 1]>
to memref<?x?xf32, offset: ?, strides: [?, ?]>		to memref<?x?xf32, offset: ?, strides: [?, ?]>
"linalg.copy"(%1, %arg2) :		"linalg.copy"(%1, %arg2) :
(memref<?x?xf32, offset: ?, strides: [?, ?]>, memref<?x?xf32>) -> ()		(memref<?x?xf32, offset: ?, strides: [?, ?]>, memref<?x?xf32>) -> ()
return		return
}		}

// CHECK-NEXT: %[[ALLOC:.*]] = alloc()		// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
// CHECK-NEXT: subview		// CHECK-NEXT: subview
// CHECK-NEXT: linalg.copy		// CHECK-NEXT: linalg.copy
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc %[[ALLOC]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

		// Test Case: In the presence of AllocaOps only the AllocOps has top be freed.
		// Therefore, all allocas are not handled.

// CHECK-LABEL: func @condBranchAlloca		// CHECK-LABEL: func @condBranchAlloca
func @condBranchAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranchAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
%0 = alloca() : memref<2xf32>		%0 = alloca() : memref<2xf32>
linalg.generic {		linalg.generic {
Show All 17 Lines
// CHECK-NEXT: ^bb3		// CHECK-NEXT: ^bb3
// CHECK-NEXT: linalg.copy		// CHECK-NEXT: linalg.copy
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

		// Test Case: In the presence of AllocaOps only the AllocOps has top be freed.
		// Therefore, all allocas are not handled. In this case, only alloc %0 has a
		// dealloc.

// CHECK-LABEL: func @ifElseAlloca		// CHECK-LABEL: func @ifElseAlloca
func @ifElseAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @ifElseAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]}		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%0: memref<2xf32>) {		outs(%0: memref<2xf32>) {
Show All 31 Lines
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @ifElseNestedAlloca		// CHECK-LABEL: func @ifElseNestedAlloca
func @ifElseNestedAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @ifElseNestedAlloca(
		%arg0: i1,
		%arg1: memref<2xf32>,
		%arg2: memref<2xf32>) {
%0 = alloca() : memref<2xf32>		%0 = alloca() : memref<2xf32>
linalg.generic {		linalg.generic {
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]}		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%0: memref<2xf32>) {		outs(%0: memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
Show All 33 Lines
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc %[[ALLOC]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

#map0 = affine_map<(d0) -> (d0)>		#map0 = affine_map<(d0) -> (d0)>

// CHECK-LABEL: func @nestedRegionsAndCondBranchAlloca		// CHECK-LABEL: func @nestedRegionsAndCondBranchAlloca
func @nestedRegionsAndCondBranchAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @nestedRegionsAndCondBranchAlloca(
		%arg0: i1,
		%arg1: memref<2xf32>,
		%arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}		linalg.generic {
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%0: memref<2xf32>) {		outs(%0: memref<2xf32>) {
^bb0(%gen1_arg0: f32, %gen1_arg1: f32):		^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
%1 = alloca() : memref<2xf32>		%1 = alloca() : memref<2xf32>
linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}		linalg.generic {
		indexing_maps = [#map0, #map0],
		iterator_types = ["parallel"]}
ins(%arg1: memref<2xf32>)		ins(%arg1: memref<2xf32>)
outs(%1: memref<2xf32>) {		outs(%1: memref<2xf32>) {
^bb0(%gen2_arg0: f32, %gen2_arg1: f32):		^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
%tmp2 = exp %gen2_arg0 : f32		%tmp2 = exp %gen2_arg0 : f32
linalg.yield %tmp2 : f32		linalg.yield %tmp2 : f32
}		}
%tmp1 = exp %gen1_arg0 : f32		%tmp1 = exp %gen1_arg0 : f32
linalg.yield %tmp1 : f32		linalg.yield %tmp1 : f32
}		}
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})		// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})
// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]		// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]
		// CHECK: ^[[BB1]]:
		// CHECK: %[[ALLOC0:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
// CHECK: ^[[BB2]]:		// CHECK: ^[[BB2]]:
// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOC]]		// CHECK: %[[ALLOC1:.*]] = alloc()
		// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOC1]]
// CHECK: %[[ALLOCA:.*]] = alloca()		// CHECK: %[[ALLOCA:.*]] = alloca()
// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOCA]]		// CHECK-NEXT: linalg.generic {{{.*}}} ins(%[[ARG1]]{{.*}}outs(%[[ALLOCA]]
// CHECK: %{{.*}} = exp		// CHECK: %{{.*}} = exp
		// CHECK: %[[ALLOC2:.*]] = alloc()
		// CHECK-NEXT: linalg.copy
		// CHECK-NEXT: dealloc %[[ALLOC1]]
// CHECK: ^[[BB3:.*]]({{.*}}):		// CHECK: ^[[BB3:.*]]({{.*}}):
// CHECK: linalg.copy		// CHECK: linalg.copy
// CHECK-NEXT: dealloc %[[ALLOC]]		// CHECK-NEXT: dealloc

// -----		// -----

// CHECK-LABEL: func @nestedRegionControlFlowAlloca		// CHECK-LABEL: func @nestedRegionControlFlowAlloca
func @nestedRegionControlFlowAlloca(		func @nestedRegionControlFlowAlloca(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index		%0 = cmpi "eq", %arg0, %arg1 : index
Show All 12 Lines
// CHECK: scf.yield %[[ALLOC0]]		// CHECK: scf.yield %[[ALLOC0]]
// CHECK: %[[ALLOCA:.*]] = alloca(%arg0, %arg1)		// CHECK: %[[ALLOCA:.*]] = alloca(%arg0, %arg1)
// CHECK-NEXT: scf.yield %[[ALLOC0]]		// CHECK-NEXT: scf.yield %[[ALLOC0]]
// CHECK: return %[[ALLOC1]]		// CHECK: return %[[ALLOC1]]

// -----		// -----

// Test Case: structured control-flow loop using a nested alloc.		// Test Case: structured control-flow loop using a nested alloc.
// The alloc positions of %3 will not be changed, but the iteration argument		// The iteration argument %iterBuf has to be freed before yielding %3 to avoid
// %iterBuf has to be freed before yielding %3 to avoid memory leaks.		// memory leaks.

// -----

// CHECK-LABEL: func @loop_alloc		// CHECK-LABEL: func @loop_alloc
func @loop_alloc(		func @loop_alloc(
%lb: index,		%lb: index,
%ub: index,		%ub: index,
%step: index,		%step: index,
%buf: memref<2xf32>,		%buf: memref<2xf32>,
%res: memref<2xf32>) {		%res: memref<2xf32>) {
%0 = alloc() : memref<2xf32>		%0 = alloc() : memref<2xf32>
%1 = scf.for %i = %lb to %ub step %step		%1 = scf.for %i = %lb to %ub step %step
iter_args(%iterBuf = %buf) -> memref<2xf32> {		iter_args(%iterBuf = %buf) -> memref<2xf32> {
%2 = cmpi "eq", %i, %ub : index		%2 = cmpi "eq", %i, %ub : index
%3 = alloc() : memref<2xf32>		%3 = alloc() : memref<2xf32>
scf.yield %3 : memref<2xf32>		scf.yield %3 : memref<2xf32>
}		}
"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()		"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}

// CHECK: %[[ALLOC0:.*]] = alloc()		// CHECK: %[[ALLOC0:.*]] = alloc()
// CHECK-NEXT: dealloc %[[ALLOC0]]		// CHECK-NEXT: dealloc %[[ALLOC0]]
// CHECK-NEXT: %[[ALLOC1:.*]] = alloc()		// CHECK-NEXT: %[[ALLOC1:.*]] = alloc()
// CHECK: linalg.copy(%arg3, %[[ALLOC1]])		// CHECK: linalg.copy(%arg3, %[[ALLOC1]])
// CHECK: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args(%[[IALLOC:.*]] = %[[ALLOC1]]		// CHECK: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args
		// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]
// CHECK: cmpi		// CHECK: cmpi
// CHECK: dealloc %[[IALLOC]]		// CHECK: dealloc %[[IALLOC]]
// CHECK: %[[ALLOC3:.*]] = alloc()		// CHECK: %[[ALLOC3:.*]] = alloc()
// CHECK: %[[ALLOC4:.*]] = alloc()		// CHECK: %[[ALLOC4:.*]] = alloc()
// CHECK: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])		// CHECK: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
// CHECK: dealloc %[[ALLOC3]]		// CHECK: dealloc %[[ALLOC3]]
// CHECK: scf.yield %[[ALLOC4]]		// CHECK: scf.yield %[[ALLOC4]]
// CHECK: }		// CHECK: }
▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	%1 = scf.for %i = %lb to %ub step %step
scf.yield %3 : memref<2xf32>		scf.yield %3 : memref<2xf32>
}		}
return %1 : memref<2xf32>		return %1 : memref<2xf32>
}		}

// CHECK: %[[ALLOC0:.*]] = alloc()		// CHECK: %[[ALLOC0:.*]] = alloc()
// CHECK: %[[ALLOC1:.*]] = alloc()		// CHECK: %[[ALLOC1:.*]] = alloc()
// CHECK-NEXT: linalg.copy(%arg3, %[[ALLOC1]])		// CHECK-NEXT: linalg.copy(%arg3, %[[ALLOC1]])
// CHECK-NEXT: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args(%[[IALLOC:.*]] = %[[ALLOC1]]		// CHECK-NEXT: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args
		// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]
// CHECK: dealloc %[[IALLOC]]		// CHECK: dealloc %[[IALLOC]]
// CHECK: %[[ALLOC3:.*]] = scf.if		// CHECK: %[[ALLOC3:.*]] = scf.if

// CHECK: %[[ALLOC4:.*]] = alloc()		// CHECK: %[[ALLOC4:.*]] = alloc()
// CHECK-NEXT: %[[ALLOC5:.*]] = alloc()		// CHECK-NEXT: %[[ALLOC5:.*]] = alloc()
// CHECK-NEXT: linalg.copy(%[[ALLOC4]], %[[ALLOC5]])		// CHECK-NEXT: linalg.copy(%[[ALLOC4]], %[[ALLOC5]])
// CHECK-NEXT: dealloc %[[ALLOC4]]		// CHECK-NEXT: dealloc %[[ALLOC4]]
// CHECK-NEXT: scf.yield %[[ALLOC5]]		// CHECK-NEXT: scf.yield %[[ALLOC5]]
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines
// CHECK-NEXT: scf.yield %[[ALLOC4]]		// CHECK-NEXT: scf.yield %[[ALLOC4]]

// CHECK: linalg.copy(%[[VAL_7]], %arg4)		// CHECK: linalg.copy(%[[VAL_7]], %arg4)
// CHECK-NEXT: dealloc %[[VAL_7]]		// CHECK-NEXT: dealloc %[[VAL_7]]

// -----		// -----

// Test Case: explicit control-flow loop with a dynamically allocated buffer.		// Test Case: explicit control-flow loop with a dynamically allocated buffer.
// The BufferPlacement transformation should fail on this explicit		// The BufferDeallocation transformation should fail on this explicit
// control-flow loop since they are not supported.		// control-flow loop since they are not supported.

// CHECK-LABEL: func @loop_dynalloc		// CHECK-LABEL: func @loop_dynalloc
func @loop_dynalloc(		func @loop_dynalloc(
%arg0 : i32,		%arg0 : i32,
%arg1 : i32,		%arg1 : i32,
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
%arg3: memref<?xf32>) {		%arg3: memref<?xf32>) {
Show All 18 Lines	^exit(%buff3 : memref<?xf32>):
return		return
}		}

// expected-error@+1 {{Structured control-flow loops are supported only}}		// expected-error@+1 {{Structured control-flow loops are supported only}}

// -----		// -----

// Test Case: explicit control-flow loop with a dynamically allocated buffer.		// Test Case: explicit control-flow loop with a dynamically allocated buffer.
// The BufferPlacement transformation should fail on this explicit		// The BufferDeallocation transformation should fail on this explicit
// control-flow loop since they are not supported.		// control-flow loop since they are not supported.

// CHECK-LABEL: func @do_loop_alloc		// CHECK-LABEL: func @do_loop_alloc
func @do_loop_alloc(		func @do_loop_alloc(
%arg0 : i32,		%arg0 : i32,
%arg1 : i32,		%arg1 : i32,
%arg2: memref<2xf32>,		%arg2: memref<2xf32>,
%arg3: memref<2xf32>) {		%arg3: memref<2xf32>) {
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

mlir/test/Transforms/buffer-hoisting.mlir

This file was added.

				// RUN: mlir-opt -buffer-hoisting -split-input-file %s | FileCheck %s

				// This file checks the behaviour of BufferHoisting pass for moving Alloc
				// operations to their correct positions.

				herhut: nit: to their
				// Test Case:
				//    bb0
				//   /   \
				//  bb1  bb2 <- Initial position of AllocOp
				//   \   /
				//    bb3
				// BufferHoisting expected behavior: It should move the existing AllocOp to
				// the entry block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranch
				func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: cond_br

				// -----

				// Test Case:
				//    bb0
				//   /   \
				//  bb1  bb2 <- Initial position of AllocOp
				//   \   /
				//    bb3
				// BufferHoisting expected behavior: It should not move the existing AllocOp
				// to any other block since the alloc has a dynamic dependency to block argument
				// %0 in bb2.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranchDynamicType
				func @condBranchDynamicType(
				%arg0: i1,
				%arg1: memref<?xf32>,
				%arg2: memref<?xf32>,
				%arg3: index) {
				cond_br %arg0, ^bb1, ^bb2(%arg3: index)
				^bb1:
				br ^bb3(%arg1 : memref<?xf32>)
				^bb2(%0: index):
				%1 = alloc(%0) : memref<?xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<?xf32>)
				outs(%1: memref<?xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%1 : memref<?xf32>)
				^bb3(%2: memref<?xf32>):
				"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: ^bb2
				// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[IDX]])
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case:
				//      bb0
				//     /    \
				//   bb1    bb2 <- Initial position of AllocOp
				//    |     /  \
				//    |   bb3  bb4
				//    |     \  /
				//    \     bb5
				//     \    /
				//       bb6
				//        |
				//       bb7
				// BufferHoisting expected behavior: It should not move the existing AllocOp
				// to any other block since the alloc has a dynamic dependency to block argument
				// %0 in bb2.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranchDynamicTypeNested
				func @condBranchDynamicTypeNested(
				%arg0: i1,
				%arg1: memref<?xf32>,
				%arg2: memref<?xf32>,
				%arg3: index) {
				cond_br %arg0, ^bb1, ^bb2(%arg3: index)
				^bb1:
				br ^bb6(%arg1 : memref<?xf32>)
				^bb2(%0: index):
				%1 = alloc(%0) : memref<?xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<?xf32>)
				outs(%1: memref<?xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				cond_br %arg0, ^bb3, ^bb4
				^bb3:
				br ^bb5(%1 : memref<?xf32>)
				^bb4:
				br ^bb5(%1 : memref<?xf32>)
				^bb5(%2: memref<?xf32>):
				br ^bb6(%2 : memref<?xf32>)
				^bb6(%3: memref<?xf32>):
				br ^bb7(%3 : memref<?xf32>)
				^bb7(%4: memref<?xf32>):
				"linalg.copy"(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: ^bb2
				// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[IDX]])
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case:
				//    bb0
				//   /   \
				//  |    bb1 <- Initial position of AllocOp
				//   \   /
				//    bb2
				// BufferHoisting expected behavior: It should move the existing AllocOp to
				// the entry block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @criticalEdge
				func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
				^bb1:
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb2(%0 : memref<2xf32>)
				^bb2(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: cond_br

				// -----

				// Test Case:
				//    bb0 <- Initial position of the first AllocOp
				//   /   \
				//  bb1  bb2
				//   \   /
				//    bb3 <- Initial position of the second AllocOp
				// BufferHoisting expected behavior: It shouldn't move the AllocOps.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElse
				func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
				^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
				%7 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%7: memref<2xf32>)
				outs(%7: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				"linalg.copy"(%7, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: br ^bb3
				// CHECK: br ^bb3
				// CHECK-NEXT: ^bb3
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: linalg.copy(%[[ALLOC1]]
				// CHECK-NEXT: return

				// -----

				// Test Case: No users for buffer in if-else CFG
				//    bb0 <- Initial position of AllocOp
				//   /   \
				//  bb1  bb2
				//   \   /
				//    bb3
				// BufferHoisting expected behavior: It shouldn't move the AllocOp.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElseNoUsers
				func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
				^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
				"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case:
				//      bb0 <- Initial position of the first AllocOp
				//     /    \
				//   bb1    bb2
				//    |     /  \
				//    |   bb3  bb4
				//    \     \  /
				//     \     /
				//       bb5 <- Initial position of the second AllocOp
				// BufferHoisting expected behavior: AllocOps shouldn't be moved.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElseNested
				func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>)
				^bb3(%5: memref<2xf32>):
				br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>)
				^bb4(%6: memref<2xf32>):
				br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>)
				^bb5(%7: memref<2xf32>, %8: memref<2xf32>):
				%9 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%7: memref<2xf32>)
				outs(%9: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				"linalg.copy"(%9, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: br ^bb5
				// CHECK: br ^bb5
				// CHECK: br ^bb5
				// CHECK-NEXT: ^bb5
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case: Dead operations in a single block.
				// BufferHoisting expected behavior: It shouldn't move the AllocOps.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @redundantOperations
				func @redundantOperations(%arg0: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg0: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				%1 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%0: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				return
				}

				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case:
				//                                     bb0
				//                                    /   \
				// Initial pos of the 1st AllocOp -> bb1  bb2 <- Initial pos of the 2nd AllocOp
				//                                    \   /
				//                                     bb3
				// BufferHoisting expected behavior: Both AllocOps should be moved to the
				// entry block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc
				func @moving_alloc_and_inserting_missing_dealloc(
				%cond: i1,
				%arg0: memref<2xf32>,
				%arg1: memref<2xf32>) {
				cond_br %cond, ^bb1, ^bb2
				^bb1:
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg0: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^exit(%0 : memref<2xf32>)
				^bb2:
				%1 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg0: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				br ^exit(%1 : memref<2xf32>)
				^exit(%arg2: memref<2xf32>):
				"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK-NEXT: cond_br

				// -----

				// Test Case: Invalid position of the DeallocOp. There is a user after
				// deallocation.
				//    bb0
				//   /   \
				//  bb1  bb2 <- Initial position of AllocOp
				//   \   /
				//    bb3
				// BufferHoisting expected behavior: It should move the AllocOp to the entry
				// block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @moving_invalid_dealloc_op_complex
				func @moving_invalid_dealloc_op_complex(
				%cond: i1,
				%arg0: memref<2xf32>,
				%arg1: memref<2xf32>) {
				cond_br %cond, ^bb1, ^bb2
				^bb1:
				br ^exit(%arg0 : memref<2xf32>)
				^bb2:
				%1 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg0: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				dealloc %1 : memref<2xf32>
				br ^exit(%1 : memref<2xf32>)
				^exit(%arg2: memref<2xf32>):
				"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK-NEXT: cond_br

				// -----

				// Test Case: Nested regions - This test defines a GenericOp inside the region
				// of another GenericOp.
				// BufferHoisting expected behavior: The AllocOp of inner GenericOp should
				// remain inside the region of outer GenericOp. The AllocOp of the outer
				// GenericOp should be moved to the entry block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @nested_regions_and_cond_branch
				func @nested_regions_and_cond_branch(
				%arg0: i1,
				%arg1: memref<2xf32>,
				%arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%1 = alloc() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: cond_br
				// CHECK: linalg.generic
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case: nested region control flow
				// The alloc position of %1 does not need to be changed and flows through
				// both if branches until it is finally returned.

				// CHECK-LABEL: func @nested_region_control_flow
				func @nested_region_control_flow(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%3 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %1 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %{{.*}} = scf.if
				// CHECK: else
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%arg0, %arg1)

				// -----

				// Test Case: nested region control flow with a nested buffer allocation in a
				// divergent branch.
				// The alloc position of %1 does not need to be changed. %3 is moved upwards.

				// CHECK-LABEL: func @nested_region_control_flow_div
				func @nested_region_control_flow_div(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%3 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %3 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%arg0, %arg1)
				// CHECK-NEXT: %{{.*}} = scf.if

				// -----

				// Test Case: deeply nested region control flow with a nested buffer allocation
				// in a divergent branch.
				// The alloc position of %1 does not need to be changed. Allocs %4 and %5 are
				// moved upwards.

				// CHECK-LABEL: func @nested_region_control_flow_div_nested
				func @nested_region_control_flow_div_nested(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				%3 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%4 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %4 : memref<?x?xf32>
				}
				scf.yield %3 : memref<?x?xf32>
				} else {
				%5 = alloc(%arg1, %arg1) : memref<?x?xf32>
				scf.yield %5 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}
				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%arg0, %arg1)
				// CHECK-NEXT: %[[ALLOC2:.*]] = alloc(%arg1, %arg1)
				// CHECK-NEXT: %{{.*}} = scf.if

				// -----

				// Test Case: nested region control flow within a region interface.
				// The alloc position of %0 does not need to be changed.

				// CHECK-LABEL: func @inner_region_control_flow
				func @inner_region_control_flow(%arg0 : index) -> memref<?x?xf32> {
				%0 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then {
				^bb0(%arg1 : memref<?x?xf32>):
				test.region_if_yield %arg1 : memref<?x?xf32>
				} else {
				^bb0(%arg1 : memref<?x?xf32>):
				test.region_if_yield %arg1 : memref<?x?xf32>
				} join {
				^bb0(%arg1 : memref<?x?xf32>):
				test.region_if_yield %arg1 : memref<?x?xf32>
				}
				return %1 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: {{.*}} test.region_if

				// -----

				// Test Case: nested region control flow within a region interface including an
				// allocation in a divergent branch.
				// The alloc position of %0 does not need to be changed. %2 is moved upwards.

				// CHECK-LABEL: func @inner_region_control_flow_div
				func @inner_region_control_flow_div(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%1 = test.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>) then {
				^bb0(%arg2 : memref<?x?xf32>):
				test.region_if_yield %arg2 : memref<?x?xf32>
				} else {
				^bb0(%arg2 : memref<?x?xf32>):
				%2 = alloc(%arg0, %arg1) : memref<?x?xf32>
				test.region_if_yield %2 : memref<?x?xf32>
				} join {
				^bb0(%arg2 : memref<?x?xf32>):
				test.region_if_yield %arg2 : memref<?x?xf32>
				}
				return %1 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%arg0, %arg1)
				// CHECK-NEXT: {{.*}} test.region_if

				// -----

				#map0 = affine_map<(d0) -> (d0)>

				// Test Case: Alloca operations shouldn't be moved.

				// CHECK-LABEL: func @condBranchAlloca
				func @condBranchAlloca(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloca() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: ^bb2
				// CHECK: ^bb2
				// CHECK-NEXT: %[[ALLOCA:.*]] = alloca()
				// CHECK-NEXT: linalg.generic

				// -----

				#map0 = affine_map<(d0) -> (d0)>

				// Test Case: Alloca operations shouldn't be moved. The alloc operation also
				// shouldn't be moved analogously to the ifElseNested test.

				// CHECK-LABEL: func @ifElseNestedAlloca
				func @ifElseNestedAlloca(
				%arg0: i1,
				%arg1: memref<2xf32>,
				%arg2: memref<2xf32>) {
				%0 = alloca() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>)
				^bb3(%5: memref<2xf32>):
				br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>)
				^bb4(%6: memref<2xf32>):
				br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>)
				^bb5(%7: memref<2xf32>, %8: memref<2xf32>):
				%9 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%7: memref<2xf32>)
				outs(%9: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				"linalg.copy"(%9, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOCA:.*]] = alloca()
				// CHECK-NEXT: linalg.generic
				// CHECK: ^bb5
				// CHECK: ^bb5
				// CHECK: ^bb5
				// CHECK-NEXT: ^bb5
				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				#map0 = affine_map<(d0) -> (d0)>

				// Test Case: Alloca operations shouldn't be moved. The alloc operation should
				// be moved to the beginning, analogous to the nestedRegionsAndCondBranch test.

				// CHECK-LABEL: func @nestedRegionsAndCondBranchAlloca
				func @nestedRegionsAndCondBranchAlloca(
				%arg0: i1,
				%arg1: memref<2xf32>,
				%arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%1 = alloca() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}
				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: cond_br
				// CHECK: linalg.generic
				// CHECK: %[[ALLOCA:.*]] = alloca()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case: structured control-flow loop using a nested alloc.
				// The alloc position of %3 will be moved upwards.

				// CHECK-LABEL: func @loop_alloc
				func @loop_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = cmpi "eq", %i, %ub : index
				%3 = alloc() : memref<2xf32>
				scf.yield %3 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: {{.*}} scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc()

				// -----

				// Test Case: structured control-flow loop with a nested if operation using
				// a deeply nested buffer allocation.
				// The allocation %4 is not moved upwards.

				// CHECK-LABEL: func @loop_nested_if_alloc
				func @loop_nested_if_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>) -> memref<2xf32> {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = cmpi "eq", %i, %ub : index
				%3 = scf.if %2 -> (memref<2xf32>) {
				%4 = alloc() : memref<2xf32>
				scf.yield %4 : memref<2xf32>
				} else {
				scf.yield %0 : memref<2xf32>
				}
				scf.yield %3 : memref<2xf32>
				}
				return %1 : memref<2xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: {{.*}} scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc()

				// -----

				// Test Case: several nested structured control-flow loops with a deeply nested
				// buffer allocation inside an if operation.
				// Same behavior as in loop_nested_if_alloc: The allocs are not moved upwards.

				// CHECK-LABEL: func @loop_nested_alloc
				func @loop_nested_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<2xf32> {
				%3 = scf.for %i3 = %lb to %ub step %step
				iter_args(%iterBuf3 = %iterBuf2) -> memref<2xf32> {
				%4 = alloc() : memref<2xf32>
				%5 = cmpi "eq", %i, %ub : index
				%6 = scf.if %5 -> (memref<2xf32>) {
				%7 = alloc() : memref<2xf32>
				scf.yield %7 : memref<2xf32>
				} else {
				scf.yield %iterBuf3 : memref<2xf32>
				}
				scf.yield %6 : memref<2xf32>
				}
				scf.yield %3 : memref<2xf32>
				}
				scf.yield %2 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc()
				// CHECK: %[[ALLOC2:.*]] = alloc()

				// -----

				// CHECK-LABEL: func @loop_nested_alloc_dyn_dependency
				func @loop_nested_alloc_dyn_dependency(
				%lb: index,
				%ub: index,
				%step: index,
				%arg0: index,
				%buf: memref<?xf32>,
				%res: memref<?xf32>) {
				%0 = alloc(%arg0) : memref<?xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<?xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<?xf32> {
				%3 = scf.for %i3 = %lb to %ub step %step
				iter_args(%iterBuf3 = %iterBuf2) -> memref<?xf32> {
				%5 = cmpi "eq", %i, %ub : index
				%6 = scf.if %5 -> (memref<?xf32>) {
				%7 = alloc(%i3) : memref<?xf32>
				scf.yield %7 : memref<?xf32>
				} else {
				scf.yield %iterBuf3 : memref<?xf32>
				}
				scf.yield %6 : memref<?xf32>
				}
				scf.yield %3 : memref<?xf32>
				}
				scf.yield %0 : memref<?xf32>
				}
				"linalg.copy"(%1, %res) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}


				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc({{.*}})
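
The diamond-shaped test cases in buffer-hoisting.mlir expect an allocation made in one branch and passed on to the join block to end up in the entry block. One way to read that expectation is that the allocation is placed in the common dominator of the allocating block and every block that uses the buffer. The Python sketch below models only that reading. It is not the code from BufferOptimizations.cpp; the function names `dominators` and `common_dominator` are made up for illustration, and dynamic-shape dependencies, nested regions, and `alloca` are ignored.

```python
def dominators(cfg, entry):
    """Map each block to the set of blocks that dominate it.

    `cfg` maps every block name to the list of its successors.
    Uses the classic iterative data-flow formulation.
    """
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].add(block)

    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def common_dominator(dom, blocks):
    """Return the deepest block that dominates every block in `blocks`."""
    shared = set.intersection(*(dom[b] for b in blocks))
    # The dominators of a block form a chain, so the deepest shared
    # dominator is the one with the largest dominator set of its own.
    return max(shared, key=lambda b: len(dom[b]))


# The diamond from the test comments: bb0 -> {bb1, bb2} -> bb3.
cfg = {"bb0": ["bb1", "bb2"], "bb1": ["bb3"], "bb2": ["bb3"], "bb3": []}
dom = dominators(cfg, "bb0")

# Alloc in bb2 whose buffer is forwarded to bb3: placed in the entry block.
print(common_dominator(dom, ["bb2", "bb3"]))  # bb0
# Alloc that is only used in its own block: it stays where it is.
print(common_dominator(dom, ["bb2"]))         # bb2
```

Under this model, both allocations in @moving_alloc_and_inserting_missing_dealloc map to bb0, which matches the two `alloc()` lines checked before `cond_br`.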

mlir/test/Transforms/buffer-loop-hoisting.mlir

This file was added.

				// RUN: mlir-opt -buffer-loop-hoisting -split-input-file %s | FileCheck %s

				herhut: It is not clear to me why these tests are in this file. Not all of them are concerned with loops.
				dfki-mako (author): It is not required to have all of these tests. However, our intention was to ensure that the allocations are not moved in tests without any loops.
				// This file checks the behavior of the BufferLoopHoisting pass for moving Alloc
				// operations to their correct positions.
				herhut: BufferLoopHoisting here and below?

				// Test Case:
				// bb0
				// / \
				// bb1 bb2 <- Initial position of AllocOp
				// \ /
				// bb3
				// BufferLoopHoisting expected behavior: It should not move the AllocOp.

				herhut: It does not move it, right?
				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranch
				func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: %[[ALLOC:.*]] = alloc()

				// -----

				// Test Case:
				// bb0
				// / \
				// bb1 bb2 <- Initial position of AllocOp
				// \ /
				// bb3
				// BufferLoopHoisting expected behavior: It should not move the existing AllocOp
				// to any other block since the alloc has a dynamic dependency on block argument
				// %0 in bb2.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranchDynamicType
				func @condBranchDynamicType(
				%arg0: i1,
				%arg1: memref<?xf32>,
				%arg2: memref<?xf32>,
				%arg3: index) {
				cond_br %arg0, ^bb1, ^bb2(%arg3: index)
				^bb1:
				br ^bb3(%arg1 : memref<?xf32>)
				^bb2(%0: index):
				%1 = alloc(%0) : memref<?xf32>
				linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel"]}
				ins(%arg1: memref<?xf32>)
				outs(%1: memref<?xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%1 : memref<?xf32>)
				^bb3(%2: memref<?xf32>):
				"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: ^bb2
				// CHECK: ^bb2(%[[IDX:.*]]:{{.*}})
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[IDX]])
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case: Nested regions - This test defines a GenericOp inside the region
				// of another GenericOp.
				// BufferLoopHoisting expected behavior: The AllocOp of inner GenericOp should
				herhut: This comment does not describe what is tested. In the test, nothing is moved.
				// remain inside the region of outer GenericOp. The AllocOp of the outer
				// GenericOp should not be moved during this pass.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @nested_regions_and_cond_branch
				func @nested_regions_and_cond_branch(
				%arg0: i1,
				%arg1: memref<2xf32>,
				%arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%0: memref<2xf32>) {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%1 = alloc() : memref<2xf32>
				linalg.generic {
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]}
				ins(%arg1: memref<2xf32>)
				outs(%1: memref<2xf32>) {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}
				// CHECK-NEXT: cond_br
				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK: linalg.generic
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic

				// -----

				// Test Case: nested region control flow
				// The alloc position of %1 does not need to be changed and flows through
				// both if branches until it is finally returned.

				// CHECK-LABEL: func @nested_region_control_flow
				func @nested_region_control_flow(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%3 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %1 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %{{.*}} = scf.if
				// CHECK: else
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%arg0, %arg1)

				// -----

				// Test Case: structured control-flow loop using a nested alloc.
				// The alloc position of %3 should not be changed.

				// CHECK-LABEL: func @loop_alloc
				func @loop_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = cmpi "eq", %i, %ub : index
				%3 = alloc() : memref<2xf32>
				scf.yield %3 : memref<2xf32>
				}
				herhut: Why is this legal to do? `%3` escapes this loop on the back-edge and result. So if there was another use of `iterBuf` after the alloc, this would introduce an aliasing of buffers that did not exist before. In my mind, it is only legal to hoist an allocation out of the loop if it does not escape the loop otherwise.
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: {{.*}} scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc()

				// -----

				// Test Case: structured control-flow loop with a nested if operation using
				// a deeply nested buffer allocation.
				// The allocation %4 should not be moved upwards due to a back-edge dependency.

				// CHECK-LABEL: func @loop_nested_if_alloc
				func @loop_nested_if_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>) -> memref<2xf32> {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = cmpi "eq", %i, %ub : index
				%3 = scf.if %2 -> (memref<2xf32>) {
				%4 = alloc() : memref<2xf32>
				scf.yield %4 : memref<2xf32>
				} else {
				herhut: Two issues:
				* the then case might be rare, in which case this is not an improvement
				* the allocation escapes
				scf.yield %0 : memref<2xf32>
				}
				scf.yield %3 : memref<2xf32>
				}
				return %1 : memref<2xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: {{.*}} scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc()

				// -----

				// Test Case: several nested structured control-flow loops with deeply nested
				// buffer allocations inside an if operation.
				// Behavior: The allocs %0, %4 and %9 are moved upwards, while %7 and %8 stay
				// in their positions.

				herhut: This is no longer the same.
				// CHECK-LABEL: func @loop_nested_alloc
				func @loop_nested_alloc(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<2xf32> {
				%3 = scf.for %i3 = %lb to %ub step %step
				iter_args(%iterBuf3 = %iterBuf2) -> memref<2xf32> {
				%4 = alloc() : memref<2xf32>
				%5 = cmpi "eq", %i, %ub : index
				%6 = scf.if %5 -> (memref<2xf32>) {
				%7 = alloc() : memref<2xf32>
				%8 = alloc() : memref<2xf32>
				scf.yield %8 : memref<2xf32>
				} else {
				scf.yield %iterBuf3 : memref<2xf32>
				}
				%9 = alloc() : memref<2xf32>
				scf.yield %6 : memref<2xf32>
				}
				scf.yield %3 : memref<2xf32>
				}
				scf.yield %2 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc()
				herhut: It is hard to see what the actual structure is that this is checking.
				// CHECK-NEXT: %[[ALLOC2:.*]] = alloc()
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK: {{.*}} = scf.if
				// CHECK: %[[ALLOC3:.*]] = alloc()
				// CHECK: %[[ALLOC4:.*]] = alloc()

				// -----

				// CHECK-LABEL: func @loop_nested_alloc_dyn_dependency
				func @loop_nested_alloc_dyn_dependency(
				%lb: index,
				%ub: index,
				%step: index,
				%arg0: index,
				%buf: memref<?xf32>,
				%res: memref<?xf32>) {
				%0 = alloc(%arg0) : memref<?xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<?xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<?xf32> {
				%3 = scf.for %i3 = %lb to %ub step %step
				iter_args(%iterBuf3 = %iterBuf2) -> memref<?xf32> {
				%4 = alloc(%i3) : memref<?xf32>
				%5 = cmpi "eq", %i, %ub : index
				%6 = scf.if %5 -> (memref<?xf32>) {
				%7 = alloc(%i3) : memref<?xf32>
				scf.yield %7 : memref<?xf32>
				} else {
				scf.yield %iterBuf3 : memref<?xf32>
				}
				%8 = alloc(%i3) : memref<?xf32>
				scf.yield %6 : memref<?xf32>
				}
				scf.yield %3 : memref<?xf32>
				}
				scf.yield %0 : memref<?xf32>
				}
				"linalg.copy"(%1, %res) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK: %[[ALLOC1:.*]] = alloc({{.*}})
				// CHECK: %[[ALLOC2:.*]] = alloc({{.*}})

				// -----
				herhut: Can you add small tests that check simple cases:
				* one alloc hoisted out of loop, one hoisted through multiple loops
				* alloc not hoisted out of conditional, alloc hoisted out of loop in conditional
				* alloc with dependencies not hoisted out of loop, alloc with dependencies in loop nest hoisted as far as possible.
				That likely makes it easier to write CHECK patterns.

				// CHECK-LABEL: func @hoist_one_loop
				func @hoist_one_loop(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = alloc() : memref<2xf32>
				scf.yield %0 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for

				// -----

				// CHECK-LABEL: func @no_hoist_one_loop
				func @no_hoist_one_loop(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%1 = alloc() : memref<2xf32>
				scf.yield %1 : memref<2xf32>
				}
				"linalg.copy"(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: {{.*}} = scf.for
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc({{.*}})

				// -----

				// CHECK-LABEL: func @hoist_multiple_loop
				func @hoist_multiple_loop(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<2xf32> {
				%3 = alloc() : memref<2xf32>
				scf.yield %0 : memref<2xf32>
				}
				scf.yield %0 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for

				// -----

				// CHECK-LABEL: func @no_hoist_one_loop_conditional
				func @no_hoist_one_loop_conditional(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%1 = cmpi "eq", %i, %ub : index
				%2 = scf.if %1 -> (memref<2xf32>) {
				%3 = alloc() : memref<2xf32>
				scf.yield %3 : memref<2xf32>
				} else {
				scf.yield %iterBuf : memref<2xf32>
				}
				scf.yield %2 : memref<2xf32>
				}
				"linalg.copy"(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: {{.*}} = scf.for
				// CHECK: {{.*}} = scf.if
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc({{.*}})

				// -----

				// CHECK-LABEL: func @hoist_one_loop_conditional
				func @hoist_one_loop_conditional(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = cmpi "eq", %lb, %ub : index
				%2 = scf.if %1 -> (memref<2xf32>) {
				%3 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%4 = alloc() : memref<2xf32>
				scf.yield %0 : memref<2xf32>
				}
				scf.yield %0 : memref<2xf32>
				}
				else
				{
				scf.yield %0 : memref<2xf32>
				}
				"linalg.copy"(%2, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: {{.*}} = scf.if
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK: {{.*}} = scf.for

				// -----

				// CHECK-LABEL: func @no_hoist_one_loop_dependency
				func @no_hoist_one_loop_dependency(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = alloc(%i) : memref<?xf32>
				scf.yield %0 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc({{.*}})

				// -----

				// CHECK-LABEL: func @partial_hoist_multiple_loop_dependency
				func @partial_hoist_multiple_loop_dependency(
				%lb: index,
				%ub: index,
				%step: index,
				%buf: memref<2xf32>,
				%res: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				%1 = scf.for %i = %lb to %ub step %step
				iter_args(%iterBuf = %buf) -> memref<2xf32> {
				%2 = scf.for %i2 = %lb to %ub step %step
				iter_args(%iterBuf2 = %iterBuf) -> memref<2xf32> {
				%3 = alloc(%i) : memref<?xf32>
				scf.yield %0 : memref<2xf32>
				}
				scf.yield %0 : memref<2xf32>
				}
				"linalg.copy"(%1, %res) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: %[[ALLOC0:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc({{.*}})
				// CHECK-NEXT: {{.*}} = scf.for
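
The small tests at the end of buffer-loop-hoisting.mlir (from @hoist_one_loop to @partial_hoist_multiple_loop_dependency) can be summarized by three rules: an allocation that is yielded out of its region stays put, an allocation is never hoisted out of a conditional, and an allocation whose size depends on a value defined by a loop cannot leave that loop. The Python sketch below encodes only these three rules as inferred from the CHECK lines and the review discussion. The `Scope`, `Alloc`, and `nesting_after_hoist` names are hypothetical, and the model is a simplification, not the pass implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    kind: str                         # "for" or "if"
    defines: frozenset = frozenset()  # values introduced by the scope, e.g. %i


@dataclass(frozen=True)
class Alloc:
    deps: frozenset = frozenset()     # values the dynamic sizes depend on
    escapes: bool = False             # True if the buffer is yielded out


def nesting_after_hoist(scopes, alloc):
    """Return how many of `scopes` (outermost first) still enclose the alloc."""
    depth = len(scopes)
    if alloc.escapes:
        return depth                  # escaping buffers are left in place
    while depth > 0:
        scope = scopes[depth - 1]
        if scope.kind != "for":
            break                     # never hoist out of a conditional
        if alloc.deps & scope.defines:
            break                     # size depends on this loop's values
        depth -= 1
    return depth


loop_i = Scope("for", frozenset({"%i"}))
loop_i2 = Scope("for", frozenset({"%i2"}))
cond = Scope("if")

# Each entry: test name -> (enclosing scopes, allocation).
cases = {
    "hoist_one_loop": ([loop_i], Alloc()),
    "no_hoist_one_loop": ([loop_i], Alloc(escapes=True)),
    "hoist_multiple_loop": ([loop_i, loop_i2], Alloc()),
    "no_hoist_one_loop_conditional": ([loop_i, cond], Alloc(escapes=True)),
    "hoist_one_loop_conditional": ([cond, loop_i], Alloc()),
    "no_hoist_one_loop_dependency": ([loop_i], Alloc(deps=frozenset({"%i"}))),
    "partial_hoist_multiple_loop_dependency":
        ([loop_i, loop_i2], Alloc(deps=frozenset({"%i"}))),
}
for name, (scopes, alloc) in cases.items():
    print(name, nesting_after_hoist(scopes, alloc))
```

A result of 0 means the allocation ends up at function level, and a result equal to the original nesting depth means it is not moved. For @partial_hoist_multiple_loop_dependency the model returns 1: the allocation leaves the inner loop but stays inside the outer one, as the CHECK lines expect.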

mlir/test/Transforms/buffer-placement.mlir

This file was moved to mlir/test/Transforms/buffer-deallocation.mlir.

Revision Contents

Diff 298998

mlir/include/mlir/Transforms/Bufferize.h

mlir/include/mlir/Transforms/Passes.h

mlir/include/mlir/Transforms/Passes.td

mlir/lib/Transforms/BufferDeallocation.cpp

mlir/lib/Transforms/BufferOptimizations.cpp

mlir/lib/Transforms/BufferPlacement.cpp

mlir/lib/Transforms/CMakeLists.txt

mlir/test/Dialect/Linalg/bufferize.mlir

mlir/test/Transforms/buffer-deallocation.mlir

mlir/test/Transforms/buffer-hoisting.mlir

mlir/test/Transforms/buffer-loop-hoisting.mlir

mlir/test/Transforms/buffer-placement.mlir