Diff 270710

mlir/lib/Transforms/BufferPlacement.cpp

//===- BufferPlacement.cpp - the impl for buffer placement ---------------===//		//===- BufferPlacement.cpp - the impl for buffer placement ---------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements logic for computing correct alloc and dealloc positions.		// This file implements logic for computing correct alloc and dealloc positions.
// The main class is the BufferPlacementPass class that implements the		// Furthermore, buffer placement also adds required new alloc and copy
// underlying algorithm. In order to put allocations and deallocations at safe		// operations to ensure that all buffers are deallocated. The main class is the
		herhut: Incomplete sentence `to ensure that all buffers`
// positions, it is significantly important to put them into the correct blocks.		// BufferPlacementPass class that implements the underlying algorithm. In order
// However, the liveness analysis does not pay attention to aliases, which can		// to put allocations and deallocations at safe positions, it is significantly
// occur due to branches (and their associated block arguments) in general. For		// important to put them into the correct blocks. However, the liveness analysis
// this purpose, BufferPlacement firstly finds all possible aliases for a single		// does not pay attention to aliases, which can occur due to branches (and their
// value (using the BufferPlacementAliasAnalysis class). Consider the following		// associated block arguments) in general. For this purpose, BufferPlacement
// example:		// firstly finds all possible aliases for a single value (using the
		// BufferPlacementAliasAnalysis class). Consider the following example:
//		//
// ^bb0(%arg0):		// ^bb0(%arg0):
// cond_br %cond, ^bb1, ^bb2		// cond_br %cond, ^bb1, ^bb2
// ^bb1:		// ^bb1:
// br ^exit(%arg0)		// br ^exit(%arg0)
// ^bb2:		// ^bb2:
// %new_value = ...		// %new_value = ...
// br ^exit(%new_value)		// br ^exit(%new_value)
// ^exit(%arg1):		// ^exit(%arg1):
// return %arg1;		// return %arg1;
//		//
// Using liveness information on its own would cause us to place the allocs and		// Using liveness information on its own would cause us to place the allocs and
// deallocs in the wrong block. This is due to the fact that %new_value will not		// deallocs in the wrong block. This is due to the fact that %new_value will not
// be liveOut of its block. Instead, we have to place the alloc for %new_value		// be liveOut of its block. Instead, we can place the alloc for %new_value
// in bb0 and its associated dealloc in exit. Using the class		// in bb0 and its associated dealloc in exit. Alternatively, the alloc can stay
// BufferPlacementAliasAnalysis, we will find out that %new_value has a		// (or even has to stay due to additional dependencies) at this location and we
// potential alias %arg1. In order to find the dealloc position we have to find		// have to free the buffer in the same block, because it cannot be freed in the
// all potential aliases, iterate over their uses and find the common		// post dominator. However, this requires a new copy buffer for %arg1 that will
// post-dominator block. In this block we can safely be sure that %new_value		// contain the actual contents. Using the class BufferPlacementAliasAnalysis, we
		herhut: In other words, the analysis uses the post-dominator to free the allocated memory. If no such post-dominator exists, aliases are removed by inserting allocs and copies. Is that the idea?
		dfki-mako (author): Yep, that's the general idea.
// will die and can use liveness information to determine the exact operation		// will find out that %new_value has a potential alias %arg1. In order to find
// after which we have to insert the dealloc. Finding the alloc position is		// the dealloc position we have to find all potential aliases, iterate over
// highly similar and non-obvious. Again, we have to consider all potential		// their uses and find the common post-dominator block (note that additional
// aliases and find the common dominator block to place the alloc.		// copies and buffers remove potential aliases and will influence the placement
		// of the deallocs). In all cases, the computed block can be safely used to free
		// the %new_value buffer (may be exit or bb2) as it will die and we can use
		// liveness information to determine the exact operation after which we have to
		// insert the dealloc. Finding the alloc position is similar and non-obvious.
		herhut: Nit: Remove `highly`. So alloc works the same way but this time with a common dominator? If no such dominator exists, copies are inserted?
		// However, the algorithm supports moving allocs to other places and introducing
		// copy buffers and placing deallocs in safe places to ensure that all buffers
		// will be freed in the end.
//		//
// TODO:		// TODO:
// The current implementation does not support loops and the resulting code will		// The current implementation does not support loops and the resulting code will
// be invalid with respect to program semantics. The only thing that is		// be invalid with respect to program semantics. The only thing that is
// currently missing is a high-level loop analysis that allows us to move allocs		// currently missing is a high-level loop analysis that allows us to move allocs
// and deallocs outside of the loop blocks. Furthermore, it does not accept		// and deallocs outside of the loop blocks. Furthermore, it does not accept
// functions that already return buffers.		// functions that already return buffers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
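The header describes the placement rule: collect all aliases and the blocks that use them, then take the nearest common dominator for the alloc and the nearest common post-dominator for the dealloc. That rule can be checked by hand on the four-block example. The following is a minimal toy sketch, not MLIR code. Blocks are plain strings, and the (post-)dominator trees for the example CFG are written out by hand rather than computed by `DominanceInfo`/`PostDominanceInfo`. `Tree`, `pathToRoot`, `nearestCommon`, and `placementBlock` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy (post-)dominator tree: maps each block to its immediate
// (post-)dominator; the root maps to "". Stands in for DominanceInfo.
using Tree = std::map<std::string, std::string>;

// Returns the blocks from `block` up to the root, including `block` itself.
std::vector<std::string> pathToRoot(const Tree &tree, std::string block) {
  std::vector<std::string> path;
  while (!block.empty()) {
    path.push_back(block);
    block = tree.at(block);
  }
  return path;
}

// Nearest common ancestor of `a` and `b` in the tree, i.e. what
// findNearestCommonDominator returns on real dominance information.
std::string nearestCommon(const Tree &tree, const std::string &a,
                          const std::string &b) {
  std::vector<std::string> pathA = pathToRoot(tree, a);
  std::set<std::string> onPathA(pathA.begin(), pathA.end());
  for (const std::string &candidate : pathToRoot(tree, b))
    if (onPathA.count(candidate))
      return candidate;
  return ""; // Only reachable if the blocks are in different trees.
}

// Starts at the defining block and folds in every block that uses the value
// or one of its aliases, like findPlacementBlock in the old code.
std::string placementBlock(const Tree &tree, const std::string &defBlock,
                           const std::vector<std::string> &useBlocks) {
  std::string result = defBlock;
  for (const std::string &use : useBlocks)
    result = nearestCommon(tree, result, use);
  return result;
}
```

In the dominator tree every block's immediate dominator is `bb0`, and in the post-dominator tree every block's immediate post-dominator is `exit`. Folding over the block using `%new_value` (`bb2`) and the block using its alias `%arg1` (`exit`) yields `bb0` for the alloc and `exit` for the dealloc. This matches the positions stated in the header.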

#include "mlir/Transforms/BufferPlacement.h"		#include "mlir/Transforms/BufferPlacement.h"
		#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
		#include "mlir/IR/Operation.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/Passes.h"		#include "mlir/Transforms/Passes.h"
		#include "llvm/ADT/SetOperations.h"

using namespace mlir;		using namespace mlir;

namespace {		namespace {

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementAliasAnalysis		// BufferPlacementAliasAnalysis
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// A straight-forward alias analysis which ensures that all aliases of all		/// A straight-forward alias analysis which ensures that all aliases of all
/// values will be determined. This is a requirement for the BufferPlacement		/// values will be determined. This is a requirement for the BufferPlacement
/// class since you need to determine safe positions to place alloc and		/// class since you need to determine safe positions to place alloc and
/// deallocs.		/// deallocs.
class BufferPlacementAliasAnalysis {		class BufferPlacementAliasAnalysis {
public:		public:
using ValueSetT = SmallPtrSet<Value, 16>;		using ValueSetT = SmallPtrSet<Value, 16>;
		using ValueMapT = llvm::DenseMap<Value, ValueSetT>;

public:		public:
/// Constructs a new alias analysis using the op provided.		/// Constructs a new alias analysis using the op provided.
BufferPlacementAliasAnalysis(Operation *op) { build(op->getRegions()); }		BufferPlacementAliasAnalysis(Operation *op) { build(op->getRegions()); }

/// Finds all immediate and indirect aliases this value could potentially		/// Find all immediate aliases this value could potentially have.
		ValueMapT::const_iterator find(Value value) const {
		return aliases.find(value);
		}

		/// Returns the end iterator that can be used in combination with find.
		ValueMapT::const_iterator end() const { return aliases.end(); }

		/// Find all immediate and indirect aliases this value could potentially
/// have. Note that the resulting set will also contain the value provided as		/// have. Note that the resulting set will also contain the value provided as
/// it is an alias of itself.		/// it is an alias of itself.
ValueSetT resolve(Value value) const {		ValueSetT resolve(Value value) const {
ValueSetT result;		ValueSetT result;
resolveRecursive(value, result);		resolveRecursive(value, result);
return result;		return result;
}		}

		/// Removes the given values from all alias sets.
		void remove(const SmallPtrSetImpl<BlockArgument> &aliasValues) {
		for (auto &entry : aliases)
		llvm::set_subtract(entry.second, aliasValues);
		}

private:		private:
/// Recursively determines alias information for the given value. It stores		/// Recursively determines alias information for the given value. It stores
/// all newly found potential aliases in the given result set.		/// all newly found potential aliases in the given result set.
void resolveRecursive(Value value, ValueSetT &result) const {		void resolveRecursive(Value value, ValueSetT &result) const {
if (!result.insert(value).second)		if (!result.insert(value).second)
return;		return;
auto it = aliases.find(value);		auto it = aliases.find(value);
if (it == aliases.end())		if (it == aliases.end())
// [30 unchanged lines not shown]	for (Region &region : regions) {
}		}
}		}
}		}
}		}
}		}
}		}

/// Maps values to all immediate aliases this value can have.		/// Maps values to all immediate aliases this value can have.
llvm::DenseMap<Value, ValueSetT> aliases;		ValueMapT aliases;
};		};
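The recursive resolution in `BufferPlacementAliasAnalysis` computes a transitive closure over the map of immediate aliases, and `remove` cuts entries out of that map. Below is a minimal standalone sketch. It assumes strings in place of `mlir::Value` and `std::set`/`std::map` in place of `SmallPtrSet`/`DenseMap`. The class name `ToyAliasAnalysis` and its `addAlias` helper are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical toy stand-in for BufferPlacementAliasAnalysis. `aliases` maps a
// value to its immediate aliases (e.g. a branch operand to the successor block
// argument it is passed to).
class ToyAliasAnalysis {
public:
  using ValueSet = std::set<std::string>;

  void addAlias(const std::string &from, const std::string &to) {
    aliases[from].insert(to);
  }

  // Transitive closure; the result also contains `value` itself.
  ValueSet resolve(const std::string &value) const {
    ValueSet result;
    resolveRecursive(value, result);
    return result;
  }

  // Drops the given values from every immediate-alias set, like remove().
  void remove(const ValueSet &values) {
    for (auto &entry : aliases)
      for (const std::string &value : values)
        entry.second.erase(value);
  }

private:
  void resolveRecursive(const std::string &value, ValueSet &result) const {
    // Already visited: stop here, which also terminates on alias cycles.
    if (!result.insert(value).second)
      return;
    auto it = aliases.find(value);
    if (it == aliases.end())
      return;
    for (const std::string &alias : it->second)
      resolveRecursive(alias, result);
  }

  std::map<std::string, ValueSet> aliases;
};
```

Suppose `%new_value` and `%arg0` both alias `%arg1`, and `%arg1` in turn aliases `%arg2`. Resolving `%new_value` yields all three values. After `remove({"%arg1"})`, resolving `%new_value` yields only itself, while `%arg1` still reaches `%arg2`. This edge cutting is what the later review comment on `aliases.remove(blockArgsToFree)` is concerned with.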

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementPositions		// BufferPlacement
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Stores correct alloc and dealloc positions to place dialect-specific alloc		// The main buffer placement analysis used to place allocs, copies and deallocs.
/// and dealloc operations.		class BufferPlacement {
struct BufferPlacementPositions {
public:		public:
BufferPlacementPositions()		using ValueSetT = BufferPlacementAliasAnalysis::ValueSetT;
: allocPosition(nullptr), deallocPosition(nullptr) {}

/// Creates a new positions tuple including alloc and dealloc positions.		/// An intermediate representation of a single allocation node.
BufferPlacementPositions(Operation *allocPosition, Operation *deallocPosition)		struct AllocEntry {
: allocPosition(allocPosition), deallocPosition(deallocPosition) {}		/// A reference to the associated allocation node.
		Value allocValue;
/// Returns the alloc position before which the alloc operation has to be
/// inserted.		/// The associated placement block in which the allocation should be
Operation *getAllocPosition() const { return allocPosition; }		/// performed.
		Block *placementBlock;
/// Returns the dealloc position after which the dealloc operation has to be
/// inserted.
Operation *getDeallocPosition() const { return deallocPosition; }

private:		/// The associated dealloc operation (if any).
Operation *allocPosition;		Operation *deallocOperation;
Operation *deallocPosition;
};		};

//===----------------------------------------------------------------------===//		using AllocEntryList = SmallVector<AllocEntry, 8>;
// BufferPlacementAnalysis
//===----------------------------------------------------------------------===//

		herhut: Optimizations should be independent of `BufferPlacement`, so this TODO does not belong here.
// The main buffer placement analysis used to place allocs and deallocs.
class BufferPlacementAnalysis {
public:		public:
using DeallocSetT = SmallPtrSet<Operation *, 2>;		BufferPlacement(Operation *op)
		: operation(op), aliases(op), liveness(op), dominators(op),
		postDominators(op) {
		// Gather all allocation nodes
		initBlockMapping();
		herhut: Nit: `Finds` -> `Find`?
		}

		/// Performs the actual placement/creation of all alloc, copy and dealloc
		/// nodes.
		void place() {
		// Place all allocations.
		placeAllocs();
		// Add additional allocations and copies that are required.
		introduceCopies();
		// Find all associated dealloc nodes.
		findDeallocs();
		// Place deallocations for all allocation entries.
		placeDeallocs();
		}

public:		private:
BufferPlacementAnalysis(Operation *op)		/// Initializes the internal block mapping by discovering allocation nodes. It
: operation(op), liveness(op), dominators(op), postDominators(op),		/// maps all allocation nodes to their initial block in which they can be
aliases(op) {}		/// safely allocated.
		void initBlockMapping() {
/// Computes the actual positions to place allocs and deallocs for the given		operation->walk([&](MemoryEffectOpInterface opInterface) {
		herhut: Would it make sense to compute this once and share it between the different phases? The moving of allocs does not influence the aliasing.
/// value.		// Try to find a single allocation result.
BufferPlacementPositions
computeAllocAndDeallocPositions(OpResult result) const {
if (result.use_empty())
return BufferPlacementPositions(result.getOwner(), result.getOwner());
// Get all possible aliases.
auto possibleValues = aliases.resolve(result);
return BufferPlacementPositions(getAllocPosition(result, possibleValues),
getDeallocPosition(result, possibleValues));
}

/// Finds all associated dealloc nodes for the alloc nodes using alias
/// information.
DeallocSetT findAssociatedDeallocs(OpResult allocResult) const {
DeallocSetT result;
auto possibleValues = aliases.resolve(allocResult);
for (Value alias : possibleValues)
for (Operation *op : alias.getUsers()) {
// Check for an existing memory effect interface.
auto effectInstance = dyn_cast<MemoryEffectOpInterface>(op);
if (!effectInstance)
continue;
// Check whether the associated value will be freed using the current
// operation.
SmallVector<MemoryEffects::EffectInstance, 2> effects;		SmallVector<MemoryEffects::EffectInstance, 2> effects;
effectInstance.getEffectsOnValue(alias, effects);		opInterface.getEffects(effects);
if (llvm::any_of(effects, [=](MemoryEffects::EffectInstance &it) {
return isa<MemoryEffects::Free>(it.getEffect());		SmallVector<MemoryEffects::EffectInstance, 2> allocateResultEffects;
}))		llvm::copy_if(effects, std::back_inserter(allocateResultEffects),
result.insert(op);		[=](MemoryEffects::EffectInstance &it) {
		Value value = it.getValue();
		return isa<MemoryEffects::Allocate>(it.getEffect()) &&
		value && value.isa<OpResult>();
		});
		// If there is one result only, we will be able to move the allocation and
		// (possibly existing) deallocation ops.
		if (allocateResultEffects.size() != 1)
		return;
		// Get allocation result.
		herhut: This should not be an assert. Instead either ignore the `alloc` (which AFAIK was the mode before) or fail the transformation and report an error. Generally, the reason this was not supported before was that multiple results create situations where the alloc cannot be moved. Can the new system not tolerate this with copies? I am OK with this not being supported, as a multi-result `alloc` seems a very boutique use.
		dfki-mako (author): In principle, yes; however, we currently remove all `Dealloc` nodes at the beginning of the transformation. If we keep them for unsupported `Alloc` ops, this should not be an issue.
		mehdi_amini: Can we be safe in this case instead of failing entirely?
		dfki-mako (author): @mehdi_amini If we do not remove `Dealloc` nodes (or nodes with `free` semantics) that are related to unsupported `alloc` nodes, we can be sure that the program will not be "worse" than before with respect to these buffers.
		mehdi_amini: Right, so we shouldn't need to emit an error here then? We can just skip here?
		dfki-mako (author): The latest version skips these allocation nodes without aborting the transformation process.
		auto allocResult = allocateResultEffects[0].getValue().cast<OpResult>();
		// Find the initial allocation block and register this result.
		allocs.push_back(
		{allocResult, getInitialAllocBlock(allocResult), nullptr});
		});
		}
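The `copy_if` filter in `initBlockMapping` registers an operation only when exactly one `Allocate` effect applies to an op result. Everything else is skipped, which is the outcome of the review discussion above. Below is a toy sketch of that decision. It uses simplified stand-ins (`EffectKind`, `ToyEffect`, `singleAllocatedResult`), which are hypothetical and not the `MemoryEffects` API.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for MemoryEffects::{Allocate, Free, Read, Write}.
enum class EffectKind { Allocate, Free, Read, Write };

struct ToyEffect {
  EffectKind kind;
  int resultIndex; // Index of the affected op result, or -1 if none.
};

// Mirrors the filter in initBlockMapping(): returns the index of the single
// allocated op result, or -1 if there are zero or several such effects, in
// which case the operation is skipped rather than treated as an error.
int singleAllocatedResult(const std::vector<ToyEffect> &effects) {
  std::vector<ToyEffect> allocEffects;
  std::copy_if(effects.begin(), effects.end(), std::back_inserter(allocEffects),
               [](const ToyEffect &effect) {
                 return effect.kind == EffectKind::Allocate &&
                        effect.resultIndex >= 0;
               });
  return allocEffects.size() == 1 ? allocEffects.front().resultIndex : -1;
}
```

Returning a sentinel instead of asserting keeps multi-result allocations, and any deallocations tied to them, untouched. That is the "skip" behavior the author settled on.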

		/// Computes a valid allocation position in a dominator (if possible) for the
		/// given allocation result.
		Block *getInitialAllocBlock(OpResult result) {
		// Get all allocation operands as these operands are important for the
		// allocation operation.
		herhut: Allocation node with known shape? Another way to look at it would be to check whether the allocation has any operands, which structurally is the reason it cannot be moved.
		auto operands = result.getOwner()->getOperands();
		if (operands.size() < 1)
		return findCommonDominator(result, aliases.resolve(result), dominators);

		// If this node has dependencies, check all dependent nodes with respect
		// to a common post dominator in which all values are available.
		ValueSetT dependencies(++operands.begin(), operands.end());
		return findCommonDominator(*operands.begin(), dependencies, postDominators);
		}

		herhut: Same here, this could just use the operands of the alloc.
		/// Finds correct alloc positions according to the algorithm described at
		/// the top of the file for all alloc nodes that can be handled by this
		herhut: It is a clever reuse of `findPlacementBlock`. Maybe it should no longer talk about aliases. Instead, it is a helper that finds a dominator/postdominator for a range of values.
		/// analysis.
		void placeAllocs() const {
		for (auto &entry : allocs) {
		Value alloc = entry.allocValue;
		// Get the actual block to place the alloc and get liveness information
		// for the placement block.
		Block *placementBlock = entry.placementBlock;
		// We have to ensure that we place the alloc before its first use in this
		// block.
		const LivenessBlockInfo *livenessInfo =
		liveness.getLiveness(placementBlock);
		Operation *startOperation = livenessInfo->getStartOperation(alloc);
		// Check whether the start operation lies in the desired placement block.
		// If not, we will use the terminator as this is the last operation in
		// this block.
		if (startOperation->getBlock() != placementBlock)
		startOperation = placementBlock->getTerminator();

		// Move the alloc in front of the start operation.
		Operation *allocOperation = alloc.getDefiningOp();
		allocOperation->moveBefore(startOperation);
}		}
		herhut: Is this `dependency information` that you are lacking?
return result;
}		}
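In `placeAllocs`, the alloc is moved in front of the operation at which its value becomes live in the placement block, falling back to the terminator. The sketch below approximates `LivenessBlockInfo::getStartOperation` as "the first operation in the block that uses the value". That approximation, and the names `ToyOp` and `allocInsertionIndex`, are assumptions for illustration only. Passing a set of values also models the old `getAllocPosition`, which took the earliest start over all aliases.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy operation: a label plus the SSA values it uses.
struct ToyOp {
  std::string name;
  std::set<std::string> operands;
};

// Index of the op before which the alloc is moved: the first op in the
// placement block that uses any of `values`, otherwise the terminator (the
// last op). Assumes a non-empty block that ends with its terminator.
std::size_t allocInsertionIndex(const std::vector<ToyOp> &block,
                                const std::set<std::string> &values) {
  for (std::size_t i = 0; i < block.size(); ++i)
    for (const std::string &value : values)
      if (block[i].operands.count(value))
        return i;
  return block.size() - 1;
}
```

Take a placement block `bb0` that holds only a constant and the `cond_br` terminator. No operation there uses `%new_value`, so the alloc lands right before the terminator.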

/// Dumps the buffer placement information to the given stream.		/// Introduces required allocs and copy operations to avoid memory leaks.
void print(raw_ostream &os) const {		void introduceCopies() {
os << "// ---- Buffer Placement -----\n";		// Initialize the set of block arguments that require a dedicated memory
		// free operation since their arguments cannot be safely deallocated in a
for (Region &region : operation->getRegions())		// post dominator.
for (Block &block : region)		SmallPtrSet<BlockArgument, 8> blockArgsToFree;
for (Operation &operation : block)		llvm::SmallDenseSet<std::tuple<BlockArgument, Block *>> visitedBlockArgs;
for (OpResult result : operation.getResults()) {		SmallVector<std::tuple<BlockArgument, Block *>, 8> toProcess;
BufferPlacementPositions positions =
computeAllocAndDeallocPositions(result);		// Check dominance relation for proper dominance properties. If the given
os << "Positions for ";		// value node does not dominate an alias, we will have to create a copy in
result.print(os);		// order to free all buffers that can potentially leak into a post
os << "\n Alloc: ";		// dominator.
positions.getAllocPosition()->print(os);		auto findUnsafeValues = [&](Value source, Block *definingBlock) {
		herhut: With this change, you consider all transitive aliases of a value. However, as soon as you insert a copy (by adding a blockarg to the list of blocks to free), that set would need updating, as you now have fewer transitive aliases. In your example, consider the case where there is another aliasing block (say b7) after b6. Then we would get a copy from b5 to b6, as b2 does not dominate b6. This would also take care of the block b7 after b6, as long as b6 dominates it. However, with the current approach, you would still check whether b2 dominates this additional block b7 and likely insert a copy from b6 to b7. I think that, when adding a blockarg to the free list, you have to use that block as the dominating block (as it now becomes the allocating place) when processing aliases of the newly freed blockarg.
os << "\n Dealloc: ";		auto it = aliases.find(source);
positions.getDeallocPosition()->print(os);		if (it == aliases.end())
os << "\n";		return;
		for (Value value : it->second) {
		auto blockArg = value.cast<BlockArgument>();
		if (blockArgsToFree.count(blockArg) > 0)
		continue;
		// Check whether we have to free this particular block argument.
		if (!dominators.dominates(definingBlock, blockArg.getOwner())) {
		toProcess.emplace_back(blockArg, blockArg.getParentBlock());
		blockArgsToFree.insert(blockArg);
		} else if (visitedBlockArgs.insert({blockArg, definingBlock}).second)
		toProcess.emplace_back(blockArg, definingBlock);
}		}
		};

		// Detect possibly unsafe aliases starting from all allocations.
		for (auto &entry : allocs)
		herhut: You could also populate `toProcess` here and then just have the loop below. This works for me too, though.
		dfki-mako (author): This would also work. However, we would prefer the more explicit version to help others understand the code more easily.
		findUnsafeValues(entry.allocValue, entry.placementBlock);
		herhut: Nit: Remove the `{` `}`.

		// Try to find block arguments that require an explicit free operation
		// until we reach a fix point.
		while (!toProcess.empty()) {
		auto current = toProcess.pop_back_val();
		findUnsafeValues(std::get<0>(current), std::get<1>(current));
		}
		herhut: Aren't there two cases here? If this immediate alias is not dominated, then a copy is inserted, the alias turns effectively into an allocation and also needs to be processed. If we do not insert a copy, then the alias still has to be processed, using the original allocation as the block that needs to dominate. I think the second case is missing here.
		dfki-mako (author): We have added a separate test case to simulate such a case and changed the CL accordingly.
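The worklist in `introduceCopies` can be modeled outside MLIR. The minimal sketch below uses hypothetical names throughout, and dominance is answered from a hand-written immediate-dominator map. Consider an alias that is a block argument. If the block its buffer is allocated in does not dominate it, the argument is marked for an explicit alloc, copy, and dealloc, and is then processed with its own block as the new allocation block. Otherwise, it is processed with the original allocation block, which is the second case raised in the review thread above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy dominator tree: block -> immediate dominator ("" = entry).
using DomTree = std::map<std::string, std::string>;

// `a` dominates `b` if `a` lies on the path from `b` up to the entry block.
bool dominates(const DomTree &idom, const std::string &a, std::string b) {
  while (!b.empty()) {
    if (b == a)
      return true;
    b = idom.at(b);
  }
  return false;
}

// A block argument and the block that owns it.
struct ToyBlockArg {
  std::string name;
  std::string owner;
};

// Immediate aliases: a value -> the block arguments it is forwarded to.
using AliasMap = std::map<std::string, std::vector<ToyBlockArg>>;
// (value, block in which the buffer behind that value is allocated)
using WorkItem = std::pair<std::string, std::string>;

// Returns the block arguments that need their own alloc, copy, and dealloc.
std::set<std::string> findBlockArgsToFree(const AliasMap &aliases,
                                          const std::vector<WorkItem> &allocs,
                                          const DomTree &idom) {
  std::set<std::string> toFree;
  std::set<WorkItem> visited;
  std::vector<WorkItem> worklist(allocs.begin(), allocs.end());
  while (!worklist.empty()) {
    WorkItem current = worklist.back();
    worklist.pop_back();
    auto it = aliases.find(current.first);
    if (it == aliases.end())
      continue;
    for (const ToyBlockArg &arg : it->second) {
      if (toFree.count(arg.name))
        continue;
      if (!dominates(idom, current.second, arg.owner)) {
        // Not dominated: the argument becomes an allocation in its own block.
        toFree.insert(arg.name);
        worklist.emplace_back(arg.name, arg.owner);
      } else if (visited.insert({arg.name, current.second}).second) {
        // Still covered by the original allocation block; keep following.
        worklist.emplace_back(arg.name, current.second);
      }
    }
  }
  return toFree;
}
```

On the header example, suppose the alloc of `%new_value` has to stay in `bb2`. Then `%arg1` (owned by `exit`) is reported, because `bb2` does not dominate `exit`. If the alloc can be hoisted to `bb0`, nothing is reported.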

		// Update buffer aliases to ensure that we free all buffers and block
		// arguments at the correct locations.
		aliases.remove(blockArgsToFree);

		// Add new allocs and additional copy operations.
		for (BlockArgument blockArg : blockArgsToFree) {
		Block *block = blockArg.getOwner();

		// Allocate a buffer for the current block argument in the block of
		// the associated value (which will be a predecessor block by
		// definition).
		for (auto it = block->pred_begin(), e = block->pred_end(); it != e;
		++it) {
		// Get the terminator and the value that will be passed to our
		// argument.
		Operation *terminator = (*it)->getTerminator();
		auto successorOperand =
		cast<BranchOpInterface>(terminator)
		.getMutableSuccessorOperands(it.getSuccessorIndex())
		.getValue()
		.slice(blockArg.getArgNumber(), 1);
		Value sourceValue = ((OperandRange)successorOperand)[0];

		// Create a new alloc at the current location of the terminator.
		auto memRefType = sourceValue.getType().cast<MemRefType>();
		OpBuilder builder(terminator);

		// Extract information about dynamically shaped types by
		// extracting their dynamic dimensions.
		SmallVector<Value, 4> dynamicOperands;
		for (auto shapeElement : llvm::enumerate(memRefType.getShape())) {
		if (!ShapedType::isDynamic(shapeElement.value()))
		continue;
		dynamicOperands.push_back(builder.create<DimOp>(
		terminator->getLoc(), sourceValue, shapeElement.index()));
}		}

private:		// TODO: provide a generic interface to create dialect-specific
/// Finds a correct placement block to store alloc/dealloc node according to		// Alloc and CopyOp nodes.
/// the algorithm described at the top of the file. It supports dominator and		auto alloc = builder.create<AllocOp>(terminator->getLoc(), memRefType,
/// post-dominator analyses via template arguments.		dynamicOperands);
template <typename DominatorT>		// Wire new alloc and successor operand.
Block *		successorOperand.assign(alloc);
findPlacementBlock(OpResult result,		// Create a new copy operation that copies the contents of the old
const BufferPlacementAliasAnalysis::ValueSetT &aliases,		// allocation to the new one.
const DominatorT &doms) const {		builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue,
// Start with the current block the value is defined in.		alloc);
Block *dom = result.getOwner()->getBlock();		}
// Iterate over all aliases and their uses to find a safe placement block
// according to the given dominator information.		// Register the block argument to require a final dealloc. Note that
for (Value alias : aliases)		// we do not have to assign a block here since we do not want to
for (Operation *user : alias.getUsers()) {		// move the allocation node to another location.
// Move upwards in the dominator tree to find an appropriate		allocs.push_back({blockArg, nullptr, nullptr});
// dominator block that takes the current use into account.
dom = doms.findNearestCommonDominator(dom, user->getBlock());
}		}
return dom;
}		}
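For each block argument that needs its own buffer, `introduceCopies` creates a new `AllocOp` in every predecessor. It passes that op one `DimOp` per dynamic dimension of the source memref. The index selection can be sketched as follows. The sentinel `kDynamic = -1` is an assumption standing in for the `ShapedType::isDynamic` check, and `dynamicDimIndices` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel for a dynamic extent; stands in for ShapedType::isDynamic.
constexpr int64_t kDynamic = -1;

// Returns the indices of the dynamic dimensions of a memref shape. In the
// diff, each such index yields one DimOp on the source value, and those
// results become the dynamic operands of the new AllocOp.
std::vector<std::size_t> dynamicDimIndices(const std::vector<int64_t> &shape) {
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < shape.size(); ++i)
    if (shape[i] == kDynamic)
      indices.push_back(i);
  return indices;
}
```

For a `memref<4x?x8x?xf32>` source, this selects dimensions 1 and 3, so two `DimOp`s are created before the new alloc. A fully static shape needs none.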

/// Finds a correct alloc position according to the algorithm described at		/// Finds associated deallocs that can be linked to our allocation nodes (if
		herhut: This does not account for nested blocks in nested regions. Also, why is this needed?
		dfki-mako (author): Obsolete; we have added a new fix-point iteration.
/// the top of the file.		/// any).
Operation *getAllocPosition(		void findDeallocs() {
OpResult result,		for (auto &entry : allocs) {
const BufferPlacementAliasAnalysis::ValueSetT &aliases) const {		auto userIt =
// Determine the actual block to place the alloc and get liveness		llvm::find_if(entry.allocValue.getUsers(), [&](Operation *user) {
// information.		auto effectInterface = dyn_cast<MemoryEffectOpInterface>(user);
Block *placementBlock = findPlacementBlock(result, aliases, dominators);		if (!effectInterface)
const LivenessBlockInfo *livenessInfo =		return false;
liveness.getLiveness(placementBlock);		// Try to find a free effect that is applied to one of our values
		// that will be automatically freed by our pass.
		herhut: I do not understand what the block argument refers to here.
		SmallVector<MemoryEffects::EffectInstance, 2> effects;
		effectInterface.getEffectsOnValue(entry.allocValue, effects);
		return llvm::any_of(
		effects, [&](MemoryEffects::EffectInstance &it) {
		return isa<MemoryEffects::Free>(it.getEffect());
		});
		herhut: Would it make sense to store `BlockArgument` to start with? Aliases are always `BlockArguments`, right?
		});
		// Assign the associated dealloc operation (if any).
		herhut: Somewhat sad that `MutableOperandRange` does not support getting values out...
		dfki-mako (author): Unfortunately, it seems to be the only way at the moment...
		if (userIt != entry.allocValue.user_end())
		herhut: This breaks the aliasing between `source` and `value`. If there are further aliases of `source` that alias via `value`, they would no longer be an alias. But we would still potentially insert a copy to break the aliasing?
		herhut: Nit: `delloc` -> `dealloc`
		entry.deallocOperation = *userIt;
		}
		}

		/// Finds correct dealloc positions according to the algorithm described at
		/// the top of the file for all alloc nodes and block arguments that can be
		/// handled by this analysis.
		void placeDeallocs() const {
		// Move or insert deallocs using the previously computed information.
		// These deallocations will be linked to their associated allocation nodes
		// since they don't have any aliases that can (potentially) increase their
		// liveness.
		for (auto &entry : allocs) {
		Value alloc = entry.allocValue;
		auto aliasesSet = aliases.resolve(alloc);
		assert(aliasesSet.size() > 0 && "must contain at least one alias");

// We have to ensure that the alloc will be before the first use of all
// aliases of the given value. We first assume that there are no uses in the
// placementBlock and that we can safely place the alloc before the
// terminator at the end of the block.
Operation *startOperation = placementBlock->getTerminator();
// Iterate over all aliases and ensure that the startOperation will point to
// the first operation of all potential aliases in the placementBlock.
for (Value alias : aliases) {
Operation *aliasStartOperation = livenessInfo->getStartOperation(alias);
// Check whether the aliasStartOperation lies in the desired block and
// whether it is before the current startOperation. If yes, this will be
// the new startOperation.
if (aliasStartOperation->getBlock() == placementBlock &&
aliasStartOperation->isBeforeInBlock(startOperation))
startOperation = aliasStartOperation;
}
// startOperation is the first operation before which we can safely store
// the alloc taking all potential aliases into account.
return startOperation;
}

/// Finds a correct dealloc position according to the algorithm described at
/// the top of the file.
Operation *getDeallocPosition(
OpResult result,
const BufferPlacementAliasAnalysis::ValueSetT &aliases) const {
// Determine the actual block to place the dealloc and get liveness		// Determine the actual block to place the dealloc and get liveness
// information.		// information.
Block *placementBlock = findPlacementBlock(result, aliases, postDominators);		Block *placementBlock =
		findCommonDominator(alloc, aliasesSet, postDominators);
const LivenessBlockInfo *livenessInfo =		const LivenessBlockInfo *livenessInfo =
liveness.getLiveness(placementBlock);		liveness.getLiveness(placementBlock);

// We have to ensure that the dealloc will be after the last use of all		// We have to ensure that the dealloc will be after the last use of all
// aliases of the given value. We first assume that there are no uses in the		// aliases of the given value. We first assume that there are no uses in
// placementBlock and that we can safely place the dealloc at the beginning.		// the placementBlock and that we can safely place the dealloc at the
		// beginning.
Operation *endOperation = &placementBlock->front();		Operation *endOperation = &placementBlock->front();
// Iterate over all aliases and ensure that the endOperation will point to		// Iterate over all aliases and ensure that the endOperation will point
// the last operation of all potential aliases in the placementBlock.		// to the last operation of all potential aliases in the placementBlock.
for (Value alias : aliases) {		for (Value alias : aliasesSet) {
Operation *aliasEndOperation =		Operation *aliasEndOperation =
livenessInfo->getEndOperation(alias, endOperation);		livenessInfo->getEndOperation(alias, endOperation);
// Check whether the aliasEndOperation lies in the desired block and		// Check whether the aliasEndOperation lies in the desired block and
// whether it is behind the current endOperation. If yes, this will be the		// whether it is behind the current endOperation. If yes, this will be
// new endOperation.		// the new endOperation.
if (aliasEndOperation->getBlock() == placementBlock &&		if (aliasEndOperation->getBlock() == placementBlock &&
endOperation->isBeforeInBlock(aliasEndOperation))		endOperation->isBeforeInBlock(aliasEndOperation))
endOperation = aliasEndOperation;		endOperation = aliasEndOperation;
}		}
// endOperation is the last operation behind which we can safely store the		// endOperation is the last operation behind which we can safely store
// dealloc taking all potential aliases into account.		// the dealloc taking all potential aliases into account.
return endOperation;
		// If there is an existing dealloc, move it to the right place.
		if (entry.deallocOperation) {
		entry.deallocOperation->moveAfter(endOperation);
		} else {
		// If the Dealloc position is at the terminator operation of the block,
		// then the value should escape from a deallocation.
		Operation *nextOp = endOperation->getNextNode();
		if (!nextOp)
		continue;
		// If there is no dealloc node, insert one in the right place.
		OpBuilder builder(nextOp);
		builder.create<DeallocOp>(alloc.getLoc(), alloc);
		}
		}
		}

		/// Finds a common dominator for the given value while taking the positions
		/// of the values in the value set into account. It supports dominator and
		/// post-dominator analyses via template arguments.
		template <typename DominatorT>
		Block *findCommonDominator(Value value, const ValueSetT &values,
		const DominatorT &doms) const {
		// Start with the current block the value is defined in.
		Block *dom = value.getParentBlock();
		// Iterate over all aliases and their uses to find a safe placement block
		// according to the given dominator information.
		for (Value childValue : values)
		for (Operation *user : childValue.getUsers()) {
		// Move upwards in the dominator tree to find an appropriate
		// dominator block that takes the current use into account.
		herhut: Why would we want to insert before the end operation? That would free it before the last use?
		dom = doms.findNearestCommonDominator(dom, user->getBlock());
		}
		return dom;
}		}

/// The operation this transformation was constructed from.		/// The operation this transformation was constructed from.
Operation *operation;		Operation *operation;

/// The underlying liveness analysis to compute fine grained information about		/// Alias information that can be updated during the insertion of copies.
/// alloc and dealloc positions.		BufferPlacementAliasAnalysis aliases;

		/// Maps allocation nodes to their associated blocks.
		AllocEntryList allocs;

		/// The underlying liveness analysis to compute fine grained information
		/// about alloc and dealloc positions.
Liveness liveness;		Liveness liveness;

/// The dominator analysis to place allocs in the appropriate blocks.		/// The dominator analysis to place deallocs in the appropriate blocks.
DominanceInfo dominators;		DominanceInfo dominators;

/// The post dominator analysis to place deallocs in the appropriate blocks.		/// The post dominator analysis to place deallocs in the appropriate blocks.
PostDominanceInfo postDominators;		PostDominanceInfo postDominators;

/// The internal alias analysis to ensure that allocs and deallocs take all
/// their potential aliases into account.
BufferPlacementAliasAnalysis aliases;
};		};

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementPass		// BufferPlacementPass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// The actual buffer placement pass that moves alloc and dealloc nodes into		/// The actual buffer placement pass that moves alloc and dealloc nodes into
/// the right positions. It uses the algorithm described at the top of the file.		/// the right positions. It uses the algorithm described at the top of the
		/// file.
struct BufferPlacementPass		struct BufferPlacementPass
: mlir::PassWrapper<BufferPlacementPass, FunctionPass> {		: mlir::PassWrapper<BufferPlacementPass, FunctionPass> {
void runOnFunction() override {
// Get required analysis information first.
auto &analysis = getAnalysis<BufferPlacementAnalysis>();

// Compute an initial placement of all nodes.
llvm::SmallVector<std::pair<OpResult, BufferPlacementPositions>, 16>
placements;
getFunction().walk([&](MemoryEffectOpInterface op) {
// Try to find a single allocation result.
SmallVector<MemoryEffects::EffectInstance, 2> effects;
op.getEffects(effects);

SmallVector<MemoryEffects::EffectInstance, 2> allocateResultEffects;
llvm::copy_if(effects, std::back_inserter(allocateResultEffects),
[=](MemoryEffects::EffectInstance &it) {
Value value = it.getValue();
return isa<MemoryEffects::Allocate>(it.getEffect()) &&
value && value.isa<OpResult>();
});
// If there is one result only, we will be able to move the allocation and
// (possibly existing) deallocation ops.
if (allocateResultEffects.size() == 1) {
// Insert allocation result.
auto allocResult = allocateResultEffects[0].getValue().cast<OpResult>();
placements.emplace_back(
allocResult, analysis.computeAllocAndDeallocPositions(allocResult));
}
});

// Move alloc (and dealloc - if any) nodes into the right places and insert		void runOnFunction() override {
// dealloc nodes if necessary.		// Place all required alloc, copy and dealloc nodes.
for (auto &entry : placements) {		BufferPlacement placement(getFunction());
		herhut: `a an` -> `an`
// Find already associated dealloc nodes.		placement.place();
OpResult alloc = entry.first;
auto deallocs = analysis.findAssociatedDeallocs(alloc);
if (deallocs.size() > 1) {
emitError(alloc.getLoc(),
"not supported number of associated dealloc operations");
return;
}

// Move alloc node to the right place.
BufferPlacementPositions &positions = entry.second;
Operation *allocOperation = alloc.getOwner();
allocOperation->moveBefore(positions.getAllocPosition());

// If there is an existing dealloc, move it to the right place.
Operation *nextOp = positions.getDeallocPosition()->getNextNode();
// If the Dealloc position is at the terminator operation of the block,
// then the value should escape from a deallocation.
if (!nextOp) {
assert(deallocs.empty() &&
"There should be no dealloc for the returned buffer");
continue;
}
if (deallocs.size()) {
(*deallocs.begin())->moveBefore(nextOp);
} else {
// If there is no dealloc node, insert one in the right place.
OpBuilder builder(nextOp);
builder.create<DeallocOp>(allocOperation->getLoc(), alloc);
}
}		}
};		};
};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferAssignmentPlacer		// BufferAssignmentPlacer
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Creates a new assignment placer.		/// Creates a new assignment placer.

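For readers following the dealloc logic in `placeDeallocs()` and `getDeallocPosition()` above, here is a minimal sketch of the same two steps outside of MLIR. Everything in it is hypothetical and simplified: blocks and operations are plain integers, the post-dominator tree is an immediate-post-dominator array whose single exit block is the root, and explicit use positions stand in for the `Liveness` end operations, so values that are only live-out of a block are not modeled. `Use`, `PostDomTree`, `DeallocPlacement`, and `placeDealloc` are illustrative names, not MLIR API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified IR model (not the MLIR API): blocks are numbered
// 0..N-1 and an operation is identified by its position inside its block.
// The last position of a block is its terminator.
struct Use {
  int block;     // block that contains the using operation
  int position;  // index of the using operation inside that block
};

// Post-dominator tree stored as immediate-post-dominator parents.
// Assumes a single exit block that forms the root (parent == -1).
struct PostDomTree {
  std::vector<int> parent;
  std::vector<int> depth;  // distance from the root

  // Nearest common post-dominator: repeatedly lift the deeper block.
  int nearestCommon(int a, int b) const {
    while (a != b) {
      if (depth[a] < depth[b])
        std::swap(a, b);
      a = parent[a];
    }
    return a;
  }
};

struct DeallocPlacement {
  int block;     // block that receives the dealloc
  int afterOp;   // the dealloc goes directly after this position
  bool escapes;  // last use is the terminator: no dealloc is inserted
};

// Step 1: start at the alloc's block and walk up the post-dominator tree
// until every block that uses any alias is covered.
// Step 2: start at the front of that block and move behind the last
// operation that uses any alias (or defines the alloc) in that block.
DeallocPlacement placeDealloc(int allocBlock, int allocPos,
                              const std::vector<std::vector<Use>> &aliasUses,
                              const std::vector<int> &blockSizes,
                              const PostDomTree &pdom) {
  assert(!aliasUses.empty() && "must contain at least one alias");
  int block = allocBlock;
  for (const auto &uses : aliasUses)
    for (const Use &use : uses)
      block = pdom.nearestCommon(block, use.block);

  int endPos = 0;
  if (block == allocBlock)
    endPos = std::max(endPos, allocPos);
  for (const auto &uses : aliasUses)
    for (const Use &use : uses)
      if (use.block == block)
        endPos = std::max(endPos, use.position);

  bool escapes = endPos == blockSizes[block] - 1;
  return {block, endPos, escapes};
}
```

On the `@condBranch` CFG of the test file below (bb0 branching to bb1 and bb2, which join in bb3), the sketch selects bb3 and places the dealloc directly after the `linalg.copy`, which is consistent with that test's CHECK lines. If the last use is the terminator, it reports an escaping buffer, mirroring the `if (!nextOp) continue;` branch of the pass.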
mlir/test/Transforms/buffer-placement.mlir

	// RUN: mlir-opt -buffer-placement -split-input-file %s | FileCheck %s			// RUN: mlir-opt -buffer-placement -split-input-file %s | FileCheck %s

	// This file checks the behaviour of BufferPlacement pass for moving Alloc and Dealloc			// This file checks the behaviour of BufferPlacement pass for moving Alloc and
	// operations and inserting the missing DeallocOps in their correct positions.			// Dealloc operations and inserting the missing DeallocOps in their correct
				// positions.

	// Test Case:			// Test Case:
	//     bb0			//     bb0
	//    /   \			//    /   \
	//  bb1   bb2 <- Initial position of AllocOp			//  bb1   bb2 <- Initial position of AllocOp
	//    \   /			//    \   /
	//     bb3			//     bb3
	// BufferPlacement Expected Behaviour: It should move the existing AllocOp to the entry block,			// BufferPlacement Expected Behaviour: It should move the existing AllocOp to
	// and insert a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.			// the entry block, and insert a DeallocOp at the exit block after CopyOp since
				// %1 is an alias for %0 and %arg1.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @condBranch			// CHECK-LABEL: func @condBranch
	func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	cond_br %arg0, ^bb1, ^bb2			cond_br %arg0, ^bb1, ^bb2
	^bb1:			^bb1:
	br ^bb3(%arg1 : memref<2xf32>)			br ^bb3(%arg1 : memref<2xf32>)
	^bb2:			^bb2:
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	br ^bb3(%0 : memref<2xf32>)			br ^bb3(%0 : memref<2xf32>)
	^bb3(%1: memref<2xf32>):			^bb3(%1: memref<2xf32>):
	"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %[[ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
	// CHECK-NEXT: cond_br			// CHECK-NEXT: cond_br
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc %[[ALLOC]]			// CHECK-NEXT: dealloc %[[ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----
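The alias reasoning in the test above ("%1 is an alias for %0 and %arg1") amounts to a transitive closure over the edges from branch operands to the block arguments they feed, which appears to be the role of `aliases.resolve(alloc)` in `placeDeallocs()`. The following is a minimal sketch under stated assumptions: values are plain strings, and the direct alias table is written by hand instead of being computed by `BufferPlacementAliasAnalysis`. `AliasMap` and `resolveAliases` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical direct alias table: each value maps to the block arguments
// that a branch forwards it to.
using AliasMap = std::map<std::string, std::vector<std::string>>;

// Worklist-based transitive closure starting at `value`; the result includes
// the value itself, matching the "at least one alias" assertion of the pass.
std::set<std::string> resolveAliases(const AliasMap &direct,
                                     const std::string &value) {
  std::set<std::string> result;
  std::vector<std::string> worklist{value};
  while (!worklist.empty()) {
    std::string current = worklist.back();
    worklist.pop_back();
    if (!result.insert(current).second)
      continue;  // already visited
    auto it = direct.find(current);
    if (it == direct.end())
      continue;  // not forwarded to any block argument
    for (const std::string &next : it->second)
      worklist.push_back(next);
  }
  return result;
}
```

For `@ifElse` further below, the table %0 -> {%2, %3}, %2 -> {%6}, %3 -> {%5} resolves %0 to {%0, %2, %3, %5, %6}, which is why the dealloc for %0 must come after the last use of %5 in the exit block.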

				// Test Case:
				//     bb0
				//    /   \
				//  bb1   bb2 <- Initial position of AllocOp
				//    \   /
				//     bb3
				// BufferPlacement Expected Behaviour: It should not move the existing AllocOp
				// to any other block since the alloc has a dynamic dependency on block argument
				// %0 in bb2. Since the dynamic type is passed to bb3 via the block argument %2,
				// it is currently required to allocate a temporary buffer for %2 that gets
				// copies of %arg1 and %1 with their appropriate shape dimensions. The copy
				// buffer deallocation will be applied to %2 in block bb3.
				pifon2a: can this (and all other comments and also IR below) fit 80 chars?

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranchDynamicType
				func @condBranchDynamicType(
				%arg0: i1,
				%arg1: memref<?xf32>,
				%arg2: memref<?xf32>,
				%arg3: index) {
				cond_br %arg0, ^bb1, ^bb2(%arg3: index)
				^bb1:
				br ^bb3(%arg1 : memref<?xf32>)
				^bb2(%0: index):
				%1 = alloc(%0) : memref<?xf32>
				linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %1 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<?xf32>, memref<?xf32>
				br ^bb3(%1 : memref<?xf32>)
				^bb3(%2: memref<?xf32>):
				"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: %[[DIM0:.*]] = dim
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])
				// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
				herhut: The numbering is strange of `ALLOC02`
				dfki-mako (Author): The name `ALLOC02` should indicate that the allocations `ALLOC0` and `ALLOC2` will be freed at this location.
				// CHECK: ^bb2(%[[IDX:.]]:{{.}})
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])
				// CHECK-NEXT: linalg.generic
				// CHECK: %[[DIM1:.*]] = dim %[[ALLOC1]]
				// CHECK-NEXT: %[[ALLOC2:.*]] = alloc(%[[DIM1]])
				// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC2]])
				// CHECK-NEXT: dealloc %[[ALLOC1]]
				// CHECK-NEXT: br ^bb3
				// CHECK-NEXT: ^bb3(%[[ALLOC3:.]]:{{.}})
				// CHECK: linalg.copy(%[[ALLOC3]],
				// CHECK-NEXT: dealloc %[[ALLOC3]]
				// CHECK-NEXT: return

				// -----
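The reason the dynamic alloc in the test above stays in bb2 can be phrased as a dominance check: an alloc can only be hoisted into a block that is dominated by the definitions of all of its operands, and %0 is a block argument of bb2. A minimal sketch, assuming a hypothetical dominator tree stored as an immediate-dominator array (`idom`, with -1 for the entry block) and operands identified only by the block that defines them. It illustrates the rule stated in the test comment, not the pass's actual code, and it ignores the position of a definition inside the target block. `dominates` and `canHoistAlloc` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Returns true if block `a` dominates block `b`, walking up the
// hypothetical immediate-dominator array from `b` to the entry block.
bool dominates(const std::vector<int> &idom, int a, int b) {
  for (int cur = b; cur != -1; cur = idom[cur])
    if (cur == a)
      return true;
  return false;
}

// An alloc may only move into `target` if every operand (for example a
// dynamic size) is defined in a block that dominates `target`.
bool canHoistAlloc(const std::vector<int> &idom,
                   const std::vector<int> &operandDefBlocks, int target) {
  for (int defBlock : operandDefBlocks)
    if (!dominates(idom, defBlock, target))
      return false;
  return true;
}
```

In `@condBranch` the alloc has no operands, so the check trivially succeeds for bb0, which matches that test's expectation that the alloc is moved into the entry block.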

				// Test Case:
				//      bb0
				//     /   \
				//   bb1   bb2 <- Initial position of AllocOp
				//    |    /  \
				//    |  bb3  bb4
				//    |    \  /
				//     \   bb5
				//      \  /
				//       bb6
				//        |
				//       bb7
				// BufferPlacement Expected Behaviour: It should not move the existing AllocOp
				// to any other block since the alloc has a dynamic dependency on block argument
				// %0 in bb2. Since the dynamic type is passed to bb5 via the block argument %2
				// and to bb6 via block argument %3, it is currently required to allocate
				// temporary buffers for %2 and %3 that get copies of %1 and %arg1 with their
				// appropriate shape dimensions. The copy buffer deallocations will be applied
				// to %2 in block bb5 and to %3 in block bb6. Furthermore, there should be no
				// copy inserted for %4.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranchDynamicTypeNested
				func @condBranchDynamicTypeNested(
				%arg0: i1,
				%arg1: memref<?xf32>,
				%arg2: memref<?xf32>,
				%arg3: index) {
				cond_br %arg0, ^bb1, ^bb2(%arg3: index)
				^bb1:
				br ^bb6(%arg1 : memref<?xf32>)
				^bb2(%0: index):
				%1 = alloc(%0) : memref<?xf32>
				linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %1 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<?xf32>, memref<?xf32>
				cond_br %arg0, ^bb3, ^bb4
				herhut: Could you add all block numbers here. That makes it easier to follow where the `br` operations belong.
				^bb3:
				br ^bb5(%1 : memref<?xf32>)
				^bb4:
				br ^bb5(%1 : memref<?xf32>)
				^bb5(%2: memref<?xf32>):
				br ^bb6(%2 : memref<?xf32>)
				^bb6(%3: memref<?xf32>):
				br ^bb7(%3 : memref<?xf32>)
				^bb7(%4: memref<?xf32>):
				"linalg.copy"(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
				return
				}

				// CHECK-NEXT: cond_br
				// CHECK: ^bb1
				// CHECK: %[[DIM0:.*]] = dim
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc(%[[DIM0]])
				// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
				// CHECK: ^bb2(%[[IDX:.]]:{{.}})
				// CHECK-NEXT: %[[ALLOC1:.*]] = alloc(%[[IDX]])
				// CHECK-NEXT: linalg.generic
				// CHECK: cond_br
				// CHECK: ^bb3:
				// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
				// CHECK: ^bb4:
				// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
				// CHECK-NEXT: ^bb5(%[[ALLOC2:.]]:{{.}})
				// CHECK: %[[DIM2:.*]] = dim %[[ALLOC2]]
				// CHECK-NEXT: %[[ALLOC3:.*]] = alloc(%[[DIM2]])
				// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC3]])
				// CHECK-NEXT: dealloc %[[ALLOC1]]
				// CHECK-NEXT: br ^bb6(%[[ALLOC3]]{{.*}})
				// CHECK-NEXT: ^bb6(%[[ALLOC4:.]]:{{.}})
				// CHECK-NEXT: br ^bb7(%[[ALLOC4]]{{.*}})
				// CHECK-NEXT: ^bb7(%[[ALLOC5:.]]:{{.}})
				// CHECK: linalg.copy(%[[ALLOC5]],
				// CHECK-NEXT: dealloc %[[ALLOC4]]
				// CHECK-NEXT: return

				// -----

	// Test Case: Existing AllocOp with no users.			// Test Case: Existing AllocOp with no users.
	// BufferPlacement Expected Behaviour: It should insert a DeallocOp right before ReturnOp.			// BufferPlacement Expected Behaviour: It should insert a DeallocOp right before
				// ReturnOp.

	// CHECK-LABEL: func @emptyUsesValue			// CHECK-LABEL: func @emptyUsesValue
	func @emptyUsesValue(%arg0: memref<4xf32>) {			func @emptyUsesValue(%arg0: memref<4xf32>) {
	%0 = alloc() : memref<4xf32>			%0 = alloc() : memref<4xf32>
	return			return
	}			}
	// CHECK-NEXT: %[[ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
	// CHECK-NEXT: dealloc %[[ALLOC]]			// CHECK-NEXT: dealloc %[[ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case:			// Test Case:
	//    bb0			//    bb0
	//   /   \			//   /   \
	//  |    bb1 <- Initial position of AllocOp			//  |    bb1 <- Initial position of AllocOp
	//   \   /			//   \   /
	//    bb2			//    bb2
	// BufferPlacement Expected Behaviour: It should move the existing AllocOp to the entry block			// BufferPlacement Expected Behaviour: It should move the existing AllocOp to
	// and insert a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.			// the entry block and insert a DeallocOp at the exit block after CopyOp since
				// %1 is an alias for %0 and %arg1.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @criticalEdge			// CHECK-LABEL: func @criticalEdge
	func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)			cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
	^bb1:			^bb1:
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	br ^bb2(%0 : memref<2xf32>)			br ^bb2(%0 : memref<2xf32>)
	^bb2(%1: memref<2xf32>):			^bb2(%1: memref<2xf32>):
	"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %[[ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
	// CHECK-NEXT: cond_br			// CHECK-NEXT: cond_br
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc %[[ALLOC]]			// CHECK-NEXT: dealloc %[[ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case:			// Test Case:
	//    bb0 <- Initial position of AllocOp			//    bb0 <- Initial position of AllocOp
	//   /   \			//   /   \
	//  |    bb1			//  |    bb1
	//   \   /			//   \   /
	//    bb2			//    bb2
	// BufferPlacement Expected Behaviour: It shouldn't move the alloc position. It only inserts			// BufferPlacement Expected Behaviour: It shouldn't move the alloc position. It
	// a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.			// only inserts a DeallocOp at the exit block after CopyOp since %1 is an alias
				// for %0 and %arg1.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @invCriticalEdge			// CHECK-LABEL: func @invCriticalEdge
	func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)			cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
	^bb1:			^bb1:
	br ^bb2(%0 : memref<2xf32>)			br ^bb2(%0 : memref<2xf32>)
	^bb2(%1: memref<2xf32>):			^bb2(%1: memref<2xf32>):
	"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK: dealloc			// CHECK: dealloc
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case:			// Test Case:
	//     bb0 <- Initial position of the first AllocOp			//     bb0 <- Initial position of the first AllocOp
	//    /   \			//    /   \
	//  bb1   bb2			//  bb1   bb2
	//    \   /			//    \   /
	//     bb3 <- Initial position of the second AllocOp			//     bb3 <- Initial position of the second AllocOp
	// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only inserts two missing DeallocOps in the exit block.			// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only
	// %5 is an alias for %0. Therefore, the DeallocOp for %0 should occur after the last GenericOp. The Dealloc for %7 should			// inserts two missing DeallocOps in the exit block. %5 is an alias for %0.
	// happen after the CopyOp.			// Therefore, the DeallocOp for %0 should occur after the last GenericOp. The
				// Dealloc for %7 should happen after the CopyOp.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @ifElse			// CHECK-LABEL: func @ifElse
	func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)			cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
	^bb1(%1: memref<2xf32>, %2: memref<2xf32>):			^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
	br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)			br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
	^bb2(%3: memref<2xf32>, %4: memref<2xf32>):			^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
	br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)			br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
	^bb3(%5: memref<2xf32>, %6: memref<2xf32>):			^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
	%7 = alloc() : memref<2xf32>			%7 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %5, %7 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %5, %7 {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):			^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = exp %gen2_arg0 : f32			%tmp2 = exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32			linalg.yield %tmp2 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	"linalg.copy"(%7, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%7, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				herhut: I cannot follow this without the block numbers in the checks. Why do we get copies here?
				dfki-mako (Author): The updated algorithm does not insert any copies in these cases any more.
	// CHECK-NEXT: linalg.generic			// CHECK-NEXT: linalg.generic
	// CHECK: %[[SECOND_ALLOC:.*]] = alloc()			// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
	// CHECK-NEXT: linalg.generic			// CHECK-NEXT: linalg.generic
	// CHECK: dealloc %[[FIRST_ALLOC]]			// CHECK: dealloc %[[FIRST_ALLOC]]
	// CHECK-NEXT: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]			// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case: No users for buffer in if-else CFG			// Test Case: No users for buffer in if-else CFG
	//     bb0 <- Initial position of AllocOp			//     bb0 <- Initial position of AllocOp
	//    /   \			//    /   \
	//  bb1   bb2			//  bb1   bb2
	//    \   /			//    \   /
	//     bb3			//     bb3
	// BufferPlacement Expected Behaviour: It shouldn't move the AllocOp. It only inserts a missing DeallocOp			// BufferPlacement Expected Behaviour: It shouldn't move the AllocOp. It only
	// in the exit block since %5 or %6 are the latest aliases of %0.			// inserts a missing DeallocOp in the exit block since %5 or %6 are the latest
				// aliases of %0.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @ifElseNoUsers			// CHECK-LABEL: func @ifElseNoUsers
	func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)			cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
	^bb1(%1: memref<2xf32>, %2: memref<2xf32>):			^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
	br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)			br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
	^bb2(%3: memref<2xf32>, %4: memref<2xf32>):			^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
	br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)			br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
	^bb3(%5: memref<2xf32>, %6: memref<2xf32>):			^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
	"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK: dealloc			// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				// CHECK: dealloc %[[FIRST_ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case:			// Test Case:
	//      bb0 <- Initial position of the first AllocOp			//      bb0 <- Initial position of the first AllocOp
	//     /   \			//     /   \
	//   bb1   bb2			//   bb1   bb2
	//    |    /  \			//    |    /  \
	//    |  bb3  bb4			//    |  bb3  bb4
	//     \   \  /			//     \   \  /
	//      \   /			//      \   /
	//       bb5 <- Initial position of the second AllocOp			//       bb5 <- Initial position of the second AllocOp
	// BufferPlacement Expected Behaviour: AllocOps shouldn't be moved.			// BufferPlacement Expected Behaviour: AllocOps shouldn't be moved.
	// Two missing DeallocOps should be inserted in the exit block.			// Two missing DeallocOps should be inserted in the exit block.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @ifElseNested			// CHECK-LABEL: func @ifElseNested
	func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {			func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg1, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)			cond_br %arg0,
				^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>),
				^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
	^bb1(%1: memref<2xf32>, %2: memref<2xf32>):			^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
	br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>)			br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>)
	^bb2(%3: memref<2xf32>, %4: memref<2xf32>):			^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
	cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>)			cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>)
	^bb3(%5: memref<2xf32>):			^bb3(%5: memref<2xf32>):
	br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>)			br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>)
	^bb4(%6: memref<2xf32>):			^bb4(%6: memref<2xf32>):
	br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>)			br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>)
	^bb5(%7: memref<2xf32>, %8: memref<2xf32>):			^bb5(%7: memref<2xf32>, %8: memref<2xf32>):
	%9 = alloc() : memref<2xf32>			%9 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %7, %9 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %7, %9 {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):			^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = exp %gen2_arg0 : f32			%tmp2 = exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32			linalg.yield %tmp2 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	"linalg.copy"(%9, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%9, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
	// CHECK-NEXT: linalg.generic			// CHECK-NEXT: linalg.generic
	// CHECK: %[[SECOND_ALLOC:.*]] = alloc()			// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
				herhut: Same here.
	// CHECK-NEXT: linalg.generic			// CHECK-NEXT: linalg.generic
	// CHECK: dealloc %[[FIRST_ALLOC]]			// CHECK: dealloc %[[FIRST_ALLOC]]
	// CHECK-NEXT: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]			// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case: Dead operations in a single block.			// Test Case: Dead operations in a single block.
	// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only inserts the two missing DeallocOps			// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only
	// after the last GenericOp.			// inserts the two missing DeallocOps after the last GenericOp.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @redundantOperations			// CHECK-LABEL: func @redundantOperations
	func @redundantOperations(%arg0: memref<2xf32>) {			func @redundantOperations(%arg0: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	%1 = alloc() : memref<2xf32>			%1 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %0, %1 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %0, %1 {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):			^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = exp %gen2_arg0 : f32			%tmp2 = exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32			linalg.yield %tmp2 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	return			return
	}			}

	// CHECK: (%[[ARG0:.*]]: {{.*}})			// CHECK: (%[[ARG0:.*]]: {{.*}})
	// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()			// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
	// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[FIRST_ALLOC]]			// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[FIRST_ALLOC]]
	// CHECK: %[[SECOND_ALLOC:.*]] = alloc()			// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
	// CHECK-NEXT: linalg.generic {{.*}} %[[FIRST_ALLOC]], %[[SECOND_ALLOC]]			// CHECK-NEXT: linalg.generic {{.*}} %[[FIRST_ALLOC]], %[[SECOND_ALLOC]]
	// CHECK: dealloc			// CHECK: dealloc
	// CHECK-NEXT: dealloc			// CHECK-NEXT: dealloc
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case:			// Test Case:
	//                                             bb0			//                                      bb0
	//                                            /   \			//                                     /   \
	// Initial position of the first AllocOp -> bb1   bb2 <- Initial position of the second AllocOp			// Initial pos of the 1st AllocOp -> bb1   bb2 <- Initial pos of the 2nd AllocOp
	//                                            \   /			//                                     \   /
	//                                             bb3			//                                      bb3
	// BufferPlacement Expected Behaviour: Both AllocOps should be moved to the entry block. Both missing DeallocOps should be inserted in			// BufferPlacement Expected Behaviour: Both AllocOps should be moved to the
	// the exit block after CopyOp since %arg2 is an alias for %0 and %1.			// entry block. Both missing DeallocOps should be inserted in the exit block
				// after CopyOp since %arg2 is an alias for %0 and %1.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc			// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc
	func @moving_alloc_and_inserting_missing_dealloc(%cond: i1, %arg0: memref<2xf32>, %arg1: memref<2xf32>){			func @moving_alloc_and_inserting_missing_dealloc(
				%cond: i1,
				%arg0: memref<2xf32>,
				%arg1: memref<2xf32>) {
	cond_br %cond, ^bb1, ^bb2			cond_br %cond, ^bb1, ^bb2
	^bb1:			^bb1:
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	br ^exit(%0 : memref<2xf32>)			br ^exit(%0 : memref<2xf32>)
	^bb2:			^bb2:
	%1 = alloc() : memref<2xf32>			%1 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %1 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %1 {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):			^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = exp %gen2_arg0 : f32			%tmp2 = exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32			linalg.yield %tmp2 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	br ^exit(%1 : memref<2xf32>)			br ^exit(%1 : memref<2xf32>)
	^exit(%arg2: memref<2xf32>):			^exit(%arg2: memref<2xf32>):
	"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %{{.*}} = alloc()			// CHECK-NEXT: %{{.*}} = alloc()
	// CHECK-NEXT: %{{.*}} = alloc()			// CHECK-NEXT: %{{.*}} = alloc()
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc			// CHECK-NEXT: dealloc
	// CHECK-NEXT: dealloc			// CHECK-NEXT: dealloc
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case: Invalid position of the DeallocOp. There is a user after deallocation.			// Test Case: Invalid position of the DeallocOp. There is a user after
				// deallocation.
	//    bb0			//    bb0
	//   /   \			//   /   \
	// bb1   bb2 <- Initial position of AllocOp			// bb1   bb2 <- Initial position of AllocOp
	//   \   /			//   \   /
	//    bb3			//    bb3
	// BufferPlacement Expected Behaviour: It should move the AllocOp to the entry block.			// BufferPlacement Expected Behaviour: It should move the AllocOp to the entry
	// The existing DeallocOp should be moved to the exit block.			// block. The existing DeallocOp should be moved to the exit block.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @moving_invalid_dealloc_op_complex			// CHECK-LABEL: func @moving_invalid_dealloc_op_complex
	func @moving_invalid_dealloc_op_complex(%cond: i1, %arg0: memref<2xf32>, %arg1: memref<2xf32>){			func @moving_invalid_dealloc_op_complex(
				%cond: i1,
				%arg0: memref<2xf32>,
				%arg1: memref<2xf32>) {
	cond_br %cond, ^bb1, ^bb2			cond_br %cond, ^bb1, ^bb2
	^bb1:			^bb1:
	br ^exit(%arg0 : memref<2xf32>)			br ^exit(%arg0 : memref<2xf32>)
	^bb2:			^bb2:
	%1 = alloc() : memref<2xf32>			%1 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %1 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %1 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	dealloc %1 : memref<2xf32>			dealloc %1 : memref<2xf32>
	br ^exit(%1 : memref<2xf32>)			br ^exit(%1 : memref<2xf32>)
	^exit(%arg2: memref<2xf32>):			^exit(%arg2: memref<2xf32>):
	"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK-NEXT: %{{.*}} = alloc()			// CHECK-NEXT: %{{.*}} = alloc()
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc			// CHECK-NEXT: dealloc
	// CHECK-NEXT: return			// CHECK-NEXT: return

	// -----			// -----

	// Test Case: Inserting missing DeallocOp in a single block.			// Test Case: Inserting missing DeallocOp in a single block.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @inserting_missing_dealloc_simple			// CHECK-LABEL: func @inserting_missing_dealloc_simple
	func @inserting_missing_dealloc_simple(%arg0 : memref<2xf32>, %arg1: memref<2xf32>){			func @inserting_missing_dealloc_simple(
	%0 = alloc() : memref<2xf32>			%arg0 : memref<2xf32>,
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {			%arg1: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK-NEXT: dealloc			// CHECK-NEXT: dealloc

	// -----			// -----

	// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a single block.			// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a
				// single block.

	#map0 = affine_map<(d0) -> (d0)>			#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @moving_invalid_dealloc_op			// CHECK-LABEL: func @moving_invalid_dealloc_op
	func @moving_invalid_dealloc_op(%arg0 : memref<2xf32>, %arg1: memref<2xf32>){			func @moving_invalid_dealloc_op(%arg0 : memref<2xf32>, %arg1: memref<2xf32>) {
	%0 = alloc() : memref<2xf32>			%0 = alloc() : memref<2xf32>
	linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {			linalg.generic {
				args_in = 1 : i64,
				args_out = 1 : i64,
				indexing_maps = [#map0, #map0],
				iterator_types = ["parallel"]} %arg0, %0 {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):			^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = exp %gen1_arg0 : f32			%tmp1 = exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32			linalg.yield %tmp1 : f32
	}: memref<2xf32>, memref<2xf32>			}: memref<2xf32>, memref<2xf32>
	dealloc %0 : memref<2xf32>			dealloc %0 : memref<2xf32>
	"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()			"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
	return			return
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Extended BufferPlacement to support more sophisticated scenarios in which allocations cannot be moved freely and can remain in divergent control flow.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 270710

mlir/lib/Transforms/BufferPlacement.cpp

mlir/test/Transforms/buffer-placement.mlir
