This is an archive of the discontinued LLVM Phabricator instance.


Differential D81926

[mlir] Extended BufferPlacement to support nested region control flow.
ClosedPublic

Authored by dfki-mako on Jun 16 2020, 4:11 AM.

Details

Reviewers

pifon2a
herhut

Commits

rG6f5da84f7bb3: [mlir] Extended BufferPlacement to support nested region control flow.

Summary

The current BufferPlacement implementation does not support nested region control flow. This CL adds support for nested regions via the RegionBranchOpInterface and the detection of branch-like (ReturnLike) terminators inside nested regions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dfki-mako created this revision. Jun 16 2020, 4:11 AM

Herald added a project: Restricted Project. Jun 16 2020, 4:11 AM

Herald added subscribers: msifontes, jurahul, Kayjukh and 15 others.

dfki-mako added reviewers: pifon2a, herhut. Jun 16 2020, 4:12 AM

Can you also add a test with deeper nesting?

mlir/lib/Transforms/BufferPlacement.cpp
72	Another way to describe `exitParentRegion` would be to only consider immediately nested regions. You can use the approach you have now, or simply use a loop over regions/blocks and get their terminators in the true case. Less traversal work.
78	would `terminator->getParentRegion()->getParentOp() == operation` do the same?
148	`detailed information` is not very helpful in a comment. What does it query?
162	Maybe `llvm::for_each`? Drop the `{`,`}`.
176	So these are the entry regions for this operation. Maybe write that in the comment.
180	This ties the operands of the op with a region to the block arguments of the target region,
181	`Block &successorBlock = regionSuccessor.getSuccessor()->front();` ?
182	`for_each` or drop the brackets.
186	Now that you have queried the flow-in, you can also query the flow within the op. For this, you can call `regionInterface.getSuccessorRegions` once for each region of the op. The alias registration is the same as with the initial flow into the op, except when `regionSuccessor.getSuccessor()` is `nullptr`. That signals that the terminator of the region will exit the region. So you have to tie the `regionSuccessor.getSuccessorInputs()` to `parentOp->getResults()` in that case.
190	`/*exitParentRegion=*/false`
190	I don't think this is needed with the above implemented.
294	`front()`?
412	This copies a region result, so maybe reflect that in the name.
420	This also needs to use the `RegionBranchOpInterface` to find all the terminators in the operation and their successor inputs (they might be in a different order than the op results). Querying the terminators directly makes assumptions about ordering that do not exist.
mlir/test/Transforms/buffer-placement.mlir
779	There should be a `dealloc` here, right?
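
The comment on line 186 above describes wiring the flow within an op, including the case where a region's successor is null and control leaves the op. A minimal standalone sketch of that idea follows. It does not use MLIR: `ToyRegion`, `ToyOp`, and `wireRegionFlow` are hypothetical stand-ins, a negative successor index plays the role of a null successor, and operands are assumed to match successor inputs 1:1 by position.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for MLIR concepts. These are NOT the real MLIR classes.
using Value = std::string;
using AliasMap = std::map<Value, std::set<Value>>;

struct ToyRegion {
  std::vector<Value> entryArgs;          // block arguments of the entry block
  std::vector<Value> terminatorOperands; // operands of the return-like terminator
  // Indices of successor regions. A negative index plays the role of a null
  // successor: the terminator leaves the parent operation.
  std::vector<int> successors;
};

struct ToyOp {
  std::vector<Value> results;
  std::vector<ToyRegion> regions;
};

// Wires the terminator operands of every region to the inputs of each of its
// successors: entry block arguments for a region, op results for an exit.
inline void wireRegionFlow(const ToyOp &op, AliasMap &aliases) {
  for (const ToyRegion &region : op.regions) {
    for (int successor : region.successors) {
      const std::vector<Value> &inputs =
          successor < 0
              ? op.results
              : op.regions[static_cast<std::size_t>(successor)].entryArgs;
      // Assumption of this sketch: operands and inputs match 1:1 by position.
      assert(inputs.size() == region.terminatorOperands.size());
      for (std::size_t i = 0; i < inputs.size(); ++i)
        aliases[region.terminatorOperands[i]].insert(inputs[i]);
    }
  }
}
```

In this sketch the successor inputs become aliases of the terminator operands, so a buffer yielded from the last region ends up aliased by the op result.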

This revision now requires changes to proceed. Jun 16 2020, 5:04 AM

Harbormaster completed remote builds in B60453: Diff 271037. Jun 16 2020, 5:31 AM

dfki-mako retitled this revision from "[mlir] Extended BufferPlacement to support nested region control flow." to "[mlir] WIP: Extended BufferPlacement to support nested region control flow." Jun 17 2020, 4:16 AM

Added extended support for the RegionBranchOpInterface to query successor bindings for successor regions.

Harbormaster failed remote builds in B60999: Diff 272028! Jun 19 2020, 6:27 AM

rriddle added inline comments. Jun 19 2020, 11:05 AM

mlir/lib/Transforms/BufferPlacement.cpp
72	Please move static functions to the global namespace. Anonymous namespaces should only really be used for things like classes.
151	nit: Use a lambda instead.
166	nit: I don't think this really saves anything over a normal for loop; if anything, it is much less efficient.
175	nit: Use the full name for the type.
200	nit: Use parameter names when passing constants, i.e., `/*someName=*/`.
403	You are completely disregarding the fact that getMutableSuccessorOperands can return None, which would cause this to crash. Where are you checking that this is valid?
435	Is this guaranteed to be a RegionBranchOpInterface?
453	llvm::find?

herhut added inline comments. Jun 22 2020, 1:02 AM

mlir/lib/Transforms/BufferPlacement.cpp
175	Why is this called `operands`? These are the results of the overall operation, right?
177	This region does not have a valid successor block if it terminates the parent operation. In that case, we wire its successor operands with the parent operations results. This could be more obvious in the code if you would pass the parent operation. Please at least rename `operands` to `results`.
182	The loop and the zip could be part of the helper function `registerAliasFunc`.
187	Isn't this the wrong way round? The results alias the successor inputs?
213	This only follows one level of branching. So if the entry region goes to region 2, and that one goes to region 3, the second link will not be seen. It should be good enough to loop over all regions of an operation and then do this linking to successor regions.
403	The underlying assumption is that if the successor has block arguments, then the branch in the predecessor needs to have operands for those. Can we rely on this?
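
The comment on line 213 points out that following only one level of branching loses the link from region 2 to region 3. The sketch below illustrates why every region-to-region link has to be registered before aliases are resolved transitively. It is a standalone model in the spirit of `resolveRecursive` from the diff, using `std::map` and `std::set` instead of the LLVM containers; `buildChain` and the value names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy value type; the real analysis works on mlir::Value.
using Value = std::string;
using AliasMap = std::map<Value, std::set<Value>>;

// Collects `value` and all of its direct and indirect aliases.
inline void resolveRecursive(const AliasMap &aliases, const Value &value,
                             std::set<Value> &result) {
  if (!result.insert(value).second)
    return; // already visited
  auto it = aliases.find(value);
  if (it == aliases.end())
    return;
  for (const Value &alias : it->second)
    resolveRecursive(aliases, alias, result);
}

inline std::set<Value> resolve(const AliasMap &aliases, const Value &value) {
  std::set<Value> result;
  resolveRecursive(aliases, value, result);
  return result;
}

// Builds the chain entry region -> region 2 -> region 3. When
// `followAllRegions` is false, only the first link is registered.
inline AliasMap buildChain(bool followAllRegions) {
  AliasMap aliases;
  aliases["%buffer"].insert("%region2_arg");
  if (followAllRegions)
    aliases["%region2_arg"].insert("%region3_arg");
  return aliases;
}

// Usage example: dropping the second link hides the region 3 argument.
inline void demoChain() {
  assert(resolve(buildChain(true), "%buffer").count("%region3_arg") == 1);
  assert(resolve(buildChain(false), "%buffer").count("%region3_arg") == 0);
}
```

With only the entry link registered, the resolved set for `%buffer` stops at the region 2 argument, which is the gap the reviewer describes.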

rriddle added inline comments. Jun 22 2020, 1:46 AM

mlir/lib/Transforms/BufferPlacement.cpp
403	Branching operations are not required to have a Value already materialized for a block argument. There are certain classes of operations that internally generate the value that is passed to the branch. For example, operations like LLVM Callbr and certain SIL switches.

Refactored implementation and simplified the iteration over all successor regions.

dfki-mako marked 6 inline comments as done. Jun 23 2020, 6:06 AM

dfki-mako added inline comments.

mlir/lib/Transforms/BufferPlacement.cpp
403	We have changed the code to emit an error message. We prefer the error message over silently ignoring this case as the retrieved alias information can be invalid if an operation passes values implicitly to a block argument.
435	The value should either be a `BlockArgument` or a value resulting from a `RegionBranchOpInterface` operation.
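
The exchange on line 403 is about branches whose successor operands are unavailable because the operation passes values to a block argument implicitly. The following sketch shows one way to report that case instead of ignoring it. It is not the committed code: `ToyBranchEdge` and `registerBranchAliases` are hypothetical, `std::optional` stands in for the optional operand range, and treating an edge without block arguments as success is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

using Value = std::string;
using AliasMap = std::map<Value, std::set<Value>>;

// Toy model of one branch edge. This is NOT the MLIR BranchOpInterface.
struct ToyBranchEdge {
  // Empty optional: the branch produces the forwarded values internally, so
  // there are no explicit operands to inspect.
  std::optional<std::vector<Value>> successorOperands;
  std::vector<Value> targetBlockArgs;
};

// Returns an error message if alias information cannot be derived for the
// edge; an empty optional means success.
inline std::optional<std::string>
registerBranchAliases(const ToyBranchEdge &edge, AliasMap &aliases) {
  if (!edge.successorOperands) {
    if (edge.targetBlockArgs.empty())
      return std::nullopt; // nothing to wire (assumption of this sketch)
    return std::string("values are passed implicitly to block arguments");
  }
  const std::vector<Value> &operands = *edge.successorOperands;
  if (operands.size() != edge.targetBlockArgs.size())
    return std::string("operand and block argument counts differ");
  for (std::size_t i = 0; i < operands.size(); ++i)
    aliases[operands[i]].insert(edge.targetBlockArgs[i]);
  return std::nullopt;
}

// Usage example: an edge without explicit operands is reported, not skipped.
inline void demoImplicitEdge() {
  AliasMap aliases;
  ToyBranchEdge edge;
  edge.targetBlockArgs = {"%arg0"};
  assert(registerBranchAliases(edge, aliases).has_value());
  assert(aliases.empty());
}
```

Returning a message keeps the caller in charge of how the failure is surfaced; the alias map is left untouched for the failing edge.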

Harbormaster failed remote builds in B61382: Diff 272697! Jun 23 2020, 6:53 AM

dfki-mako retitled this revision from "[mlir] WIP: Extended BufferPlacement to support nested region control flow." to "[mlir] Extended BufferPlacement to support nested region control flow." Jun 24 2020, 3:51 AM

herhut added inline comments. Jun 24 2020, 4:00 AM

mlir/lib/Transforms/BufferPlacement.cpp
162	nit: interface
189	`entryRegion.getSuccessorInputs` returns the inputs of the target region, typically the block arguments of the first block. So this maps them with itself. Instead, this should map the result from `regionInterface().getSuccessorEntryOperands` to the `entryRegion.getSuccessorInputs`.
205	The terminator should also implement the `BranchOpInterface`, right? So, one should query the inputs to the successor using `BranchOpInterface.getSuccessorOperands`, correct? Here, `successorRegion.getSuccessorInputs` returns the input values to the region, which normally would be the block arguments or, in case this leaves the operation, the results.
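
The comment on line 189 notes that zipping the successor inputs with themselves registers nothing useful, and that the entry operands of the op should be wired to the inputs of the entry region instead. A small standalone sketch of that wiring, with a hypothetical `wireEntry` helper and plain strings in place of MLIR values, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Value = std::string;
using AliasMap = std::map<Value, std::set<Value>>;

// Registers each successor input (typically a block argument of the entry
// block) as an alias of the op operand that is forwarded into it.
inline void wireEntry(const std::vector<Value> &entryOperands,
                      const std::vector<Value> &successorInputs,
                      AliasMap &aliases) {
  // Assumption of this sketch: operands and inputs match 1:1 by position.
  assert(entryOperands.size() == successorInputs.size());
  for (std::size_t i = 0; i < entryOperands.size(); ++i)
    aliases[entryOperands[i]].insert(successorInputs[i]);
}
```

Calling `wireEntry({"%init"}, {"%arg0"}, aliases)` records `%arg0` as an alias of `%init`, whereas passing the inputs on both sides would only record each input as an alias of itself.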

Added support for region-region control flow within operations that implement the RegionBranchOpInterface.
Added new test operations to verify more advanced region-region control-flow scenarios.

Harbormaster failed remote builds in B61689: Diff 273290! Jun 25 2020, 4:17 AM

Cool stuff. Great to see this work with complex region interfaces, as well!

So, this now assumes that there is a 1:1 mapping between return-like op operands and successor inputs and we likely want an interface that makes this configurable. Maybe the returnlike trait could provide a function that enables this. (We discussed this, just writing it down.)

Also, we need a clean-up pass to remove unneeded alloc+copy+dealloc triples. This is underway.

With the nits addressed and some of the comments cleaned up, this is good to go.

mlir/lib/Transforms/BufferPlacement.cpp
190	Mega-nit: `entryRegion->getSuccessor` reads weird. It reads like getting the successor of the entry region, but it actually gets the entry region itself :) Maybe rename this `entrySuccessor`?
194	Maybe `Wire flow between regions and from region exits.`?
200	The length of `operandAttributes` might be wrong here, as it was built for the entry successors.
202	I think this comment is now off. It always wires the terminator operands with the successor inputs. The latter can be block arguments of a region's entry block or the result values, if the terminator exits the op.
293	Should this apply to the case above, as well?
394	Where is this conversion? Or is this comment off?
399	Nit: Doesn't this create a new alloc and a copy? If so, please fix the comment.
446	So is it OK to pass an empty ArrayRef here? If so, why can we not do this in the other cases?
mlir/test/lib/Dialect/Test/TestOps.td
1344 ↗	(On Diff #273290)	Please use `let description = [{ ... }];` instead of a comment.

This revision now requires changes to proceed. Jun 25 2020, 12:14 PM

With my comments addressed, this is good to go.

This revision is now accepted and ready to land. Jun 26 2020, 4:03 AM

Addressed reviewer comments.

mlir/lib/Transforms/BufferPlacement.cpp
189	Yeah makes sense to me; we should wire `getSuccessorEntryOperands` with `getSuccessorInputs`.
200	According to our interpretation of the comment "`operands` is a set of optional attributes that correspond to a constant value for each operand", this should be an array containing values for each operand of the defining operation, although it might make sense to include an attribute for each block argument. However, we should keep this for now, as it is fully compatible with the current implementation of all instantiations of the `RegionBranchOpInterface`.
205	I guess querying a `BranchOpInterface` might not make sense in the case of a `ReturnLike` op, since it does not branch to any block in general (although this can still be expressed via the `BranchOpInterface`). It feels like a `ReturnLike` terminator should provide a method to access all operands, similar to the `BranchOpInterface`.

Harbormaster completed remote builds in B61906: Diff 273669. Jun 26 2020, 5:24 AM

Closed by commit rG6f5da84f7bb3: [mlir] Extended BufferPlacement to support nested region control flow. (authored by dfki-mako, committed by herhut). Jun 30 2020, 3:14 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
mlir/lib/Transforms/BufferPlacement.cpp	269 lines
mlir/test/Transforms/buffer-placement.mlir	63 lines

Diff 271037

mlir/lib/Transforms/BufferPlacement.cpp

(61 lines not shown)
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/Passes.h"		#include "mlir/Transforms/Passes.h"
#include "llvm/ADT/SetOperations.h"		#include "llvm/ADT/SetOperations.h"

using namespace mlir;		using namespace mlir;

namespace {		namespace {

		/// Walks over all return-like terminators that either exit the parent region or
		/// a nested region.
		template <bool exitParentRegion, typename FuncT>
		herhut: Another way to describe `exitParentRegion` would be to only consider immediately nested regions. You can use the approach you have now, or simply use a loop over regions/blocks and get their terminators in the true case. Less traversal work.
		rriddle: Please move static functions to the global namespace. anonymous namespace should only really be used by things like classes.
		static void walkReturnOperations(Operation *operation, const FuncT &func) {
		auto attachedRegions = operation->getRegions();
		operation->walk([&](Operation *terminator) {
		// Skip non-return-like terminators or return-like ones that do not satisfy
		// the escaping constraint.
		if (terminator->hasTrait<OpTrait::ReturnLike>() &&
		herhut: would `terminator->getParentRegion()->getParentOp() == operation` do the same?
		(llvm::find_if(attachedRegions, [&](Region &region) {
		return &region == terminator->getParentRegion();
		}) == attachedRegions.end()) != exitParentRegion)
		func(terminator);
		});
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacementAliasAnalysis		// BufferPlacementAliasAnalysis
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// A straight-forward alias analysis which ensures that all aliases of all		/// A straight-forward alias analysis which ensures that all aliases of all
/// values will be determined. This is a requirement for the BufferPlacement		/// values will be determined. This is a requirement for the BufferPlacement
/// class since you need to determine safe positions to place alloc and		/// class since you need to determine safe positions to place alloc and
/// deallocs.		/// deallocs.
class BufferPlacementAliasAnalysis {		class BufferPlacementAliasAnalysis {
public:		public:
using ValueSetT = SmallPtrSet<Value, 16>;		using ValueSetT = SmallPtrSet<Value, 16>;
using ValueMapT = llvm::DenseMap<Value, ValueSetT>;		using ValueMapT = llvm::DenseMap<Value, ValueSetT>;

public:		public:
/// Constructs a new alias analysis using the op provided.		/// Constructs a new alias analysis using the op provided.
BufferPlacementAliasAnalysis(Operation *op) { build(op->getRegions()); }		BufferPlacementAliasAnalysis(Operation *op) { build(op); }

/// Find all immediate aliases this value could potentially have.		/// Find all immediate aliases this value could potentially have.
ValueMapT::const_iterator find(Value value) const {		ValueMapT::const_iterator find(Value value) const {
return aliases.find(value);		return aliases.find(value);
}		}

/// Returns the end iterator that can be used in combination with find.		/// Returns the end iterator that can be used in combination with find.
ValueMapT::const_iterator end() const { return aliases.end(); }		ValueMapT::const_iterator end() const { return aliases.end(); }

/// Find all immediate and indirect aliases this value could potentially		/// Find all immediate and indirect aliases this value could potentially
/// have. Note that the resulting set will also contain the value provided as		/// have. Note that the resulting set will also contain the value provided as
/// it is an alias of itself.		/// it is an alias of itself.
ValueSetT resolve(Value value) const {		ValueSetT resolve(Value value) const {
ValueSetT result;		ValueSetT result;
resolveRecursive(value, result);		resolveRecursive(value, result);
return result;		return result;
}		}

/// Removes the given values from all alias sets.		/// Removes the given values from all alias sets.
void remove(const SmallPtrSetImpl<BlockArgument> &aliasValues) {		void remove(const SmallPtrSetImpl<Value> &aliasValues) {
for (auto &entry : aliases)		for (auto &entry : aliases)
llvm::set_subtract(entry.second, aliasValues);		llvm::set_subtract(entry.second, aliasValues);
}		}

private:		private:
/// Recursively determines alias information for the given value. It stores		/// Recursively determines alias information for the given value. It stores
/// all newly found potential aliases in the given result set.		/// all newly found potential aliases in the given result set.
void resolveRecursive(Value value, ValueSetT &result) const {		void resolveRecursive(Value value, ValueSetT &result) const {
if (!result.insert(value).second)		if (!result.insert(value).second)
return;		return;
auto it = aliases.find(value);		auto it = aliases.find(value);
if (it == aliases.end())		if (it == aliases.end())
return;		return;
for (Value alias : it->second)		for (Value alias : it->second)
resolveRecursive(alias, result);		resolveRecursive(alias, result);
}		}

		/// Registers a new alias tuple entry (first element is the alias, the second
		/// one is the source value).
		void registerAlias(std::tuple<Value, Value> aliasEntry) {
		aliases[std::get<1>(aliasEntry)].insert(std::get<0>(aliasEntry));
		}

/// This function constructs a mapping from values to its immediate aliases.		/// This function constructs a mapping from values to its immediate aliases.
/// It iterates over all blocks, gets their predecessors, determines the		/// It iterates over all blocks, gets their predecessors, determines the
/// values that will be passed to the corresponding block arguments and		/// values that will be passed to the corresponding block arguments and
/// inserts them into the underlying map.		/// inserts them into the underlying map. Furthermore, it queries detailed
		herhut: `detailed information` is not very helpful in a comment. What does it query?
void build(MutableArrayRef<Region> regions) {		/// information about successor regions and branch-like return operations
for (Region &region : regions) {		/// from nested regions.
for (Block &block : region) {		void build(Operation *op) {
		rriddle: nit: Use a lambda instead.
// Iterate over all predecessor and get the mapped values to their		op->walk([&](BranchOpInterface branchInterface) {
// corresponding block arguments values.		Block *parentBlock = branchInterface.getOperation()->getBlock();
for (auto it = block.pred_begin(), e = block.pred_end(); it != e;		for (auto it = parentBlock->succ_begin(), e = parentBlock->succ_end();
++it) {		it != e; ++it) {
unsigned successorIndex = it.getSuccessorIndex();
// Get the terminator and the values that will be passed to our block.
auto branchInterface =
dyn_cast<BranchOpInterface>((*it)->getTerminator());
if (!branchInterface)
continue;
// Query the branch op interace to get the successor operands.		// Query the branch op interace to get the successor operands.
auto successorOperands =		auto successorOperands =
branchInterface.getSuccessorOperands(successorIndex);		branchInterface.getSuccessorOperands(it.getIndex());
if (successorOperands.hasValue()) {		if (!successorOperands.hasValue())
		continue;
// Build the actual mapping of values to their immediate aliases.		// Build the actual mapping of values to their immediate aliases.
for (auto argPair : llvm::zip(block.getArguments(),		for (auto argPair :
		herhut: Maybe `llvm::for_each`? Drop the `{`,`}`.
		herhut: nit: interface
successorOperands.getValue())) {		llvm::zip((*it)->getArguments(), successorOperands.getValue())) {
aliases[std::get<1>(argPair)].insert(std::get<0>(argPair));		registerAlias(argPair);
}		}
}		}
		rriddle: nit: I don't think this really saves anything over a normal for loop, if anything it is much less efficient.
		});

		// Query the RegionBranchOpInterface to find potential successor regions.
		op->walk([&](RegionBranchOpInterface regionInterface) {
		// Create an empty attribute for each operand to comply with the
		// `getSuccessorRegions` interface definition that requires a single
		// attribute per operand.
		SmallVector<Attribute, 2> operandAttributes(
		regionInterface.getOperation()->getNumOperands());
		rriddle: nit: Use the full name for the type.
		herhut: Why is this called `operands`? These are the results of the overall operation, right?
		// Extract all region successors.
		herhut: So these are the entry regions for this operation. Maybe write that in the comment.
		SmallVector<RegionSuccessor, 2> regionSuccessors;
		herhut: This region does not have a valid successor block if it terminates the parent operation. In that case, we wire its successor operands with the parent operations results. This could be more obvious in the code if you would pass the parent operation. Please at least rename `operands` to `results`.
		regionInterface.getSuccessorRegions(llvm::None, operandAttributes,
		regionSuccessors);
		for (auto regionSuccessor : regionSuccessors) {
		herhut: This ties the operands of the op with a region to the block arguments of the target region,
		Block successorBlock = &regionSuccessor.getSuccessor()->begin();
		herhut: `Block &successorBlock = regionSuccessor.getSuccessor()->front();` ?
		for (auto argPair : llvm::zip(successorBlock->getArguments(),
		herhut: `for_each` or drop the brackets.
		herhut: The loop and the zip could be part of the helper function `registerAliasFunc`.
		regionSuccessor.getSuccessorInputs())) {
		registerAlias(argPair);
}		}
}		}
		herhut: Now that you have queried the flow-in, you can also query the flow within the op. For this, you can call `regionInterface.getSuccessorRegions` once for each region of the op. The alias registration is the same as with the initial flow into the op, except when `regionSuccessor.getSuccessor()` is `nullptr`. That signals that the terminator of the region will exit the region. So you have to tie the `regionSuccessor.getSuccessorInputs()` to `parentOp->getResults()` in that case.
		});
		herhut: Isn't this the wrong way round? The results alias the successor inputs?

		// Query all return operations that leave nested regions.
		herhut: `entryRegion.getSuccessorInputs` returns the inputs of the target region, typically the block arguments of the first block. So this maps them with itself. Instead, this should map the result from `regionInterface().getSuccessorEntryOperands` to the `entryRegion.getSuccessorInputs`.
		dfki-mako: Yeah makes sense to me; we should wire `getSuccessorEntryOperands` with `getSuccessorInputs`.
		walkReturnOperations<false>(op, [&](Operation *terminator) {
		herhut: `/*exitParentRegion=*/false`
		herhut: I don't think this is needed with the above implemented.
		herhut: Mega-nit: `entryRegion->getSuccessor` reads weird. It reads like getting the successor of the entry region, but it actually gets the entry region itself :) Maybe rename this `entrySuccessor`?
		Operation *parentOp = terminator->getParentOp();
		for (auto argPair :
		llvm::zip(parentOp->getResults(), terminator->getOperands())) {
		registerAlias(argPair);
		herhut: Maybe `Wire flow between regions and from region exits.`?
}		}
		});
}		}

/// Maps values to all immediate aliases this value can have.		/// Maps values to all immediate aliases this value can have.
ValueMapT aliases;		ValueMapT aliases;
		rriddle: nit: Use parameter names when passing constants, i.e., `/*someName=*/`.
		herhut: The length of `operandAttributes` might be wrong here, as it was built for the entry successors.
		dfki-mako: According to our interpretation of the comment "`operands` is a set of optional attributes that correspond to a constant value for each operand", this should be an array containing values for each operand of the defining operation, although it might make sense to include an attribute for each block argument. However, we should keep this for now, as it is fully compatible with the current implementation of all instantiations of the `RegionBranchOpInterface`.
};		};

		herhut: I think this comment is now off. It always wires the terminator operands with the successor inputs. The latter can be block arguments of a region's entry block or the result values, if the terminator exits the op.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferPlacement		// BufferPlacement
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		herhut: The terminator should also implement the `BranchOpInterface`, right? So, one should query the inputs to the successor using `BranchOpInterface.getSuccessorOperands`, correct? Here, `successorRegion.getSuccessorInputs` returns the input values to the region, which normally would be the block arguments or, in case this leaves the operation, the results.
		dfki-mako: I guess querying a `BranchOpInterface` might not make sense in the case of a `ReturnLike` op, since it does not branch to any block in general (although this can still be expressed via the `BranchOpInterface`). It feels like a `ReturnLike` terminator should provide a method to access all operands, similar to the `BranchOpInterface`.

// The main buffer placement analysis used to place allocs, copies and deallocs.		// The main buffer placement analysis used to place allocs, copies and deallocs.
class BufferPlacement {		class BufferPlacement {
public:		public:
using ValueSetT = BufferPlacementAliasAnalysis::ValueSetT;		using ValueSetT = BufferPlacementAliasAnalysis::ValueSetT;

/// An intermediate representation of a single allocation node.		/// An intermediate representation of a single allocation node.
struct AllocEntry {		struct AllocEntry {
		herhut: This only follows one level of branching. So if the entry region goes to region 2, and that one goes to region 3, the second link will not be seen. It should be good enough to loop over all regions of an operation and then do this linking to successor regions.
/// A reference to the associated allocation node.		/// A reference to the associated allocation node.
Value allocValue;		Value allocValue;

/// The associated placement block in which the allocation should be		/// The associated placement block in which the allocation should be
/// performed.		/// performed.
Block *placementBlock;		Block *placementBlock;

/// The associated dealloc operation (if any).		/// The associated dealloc operation (if any).
(52 lines not shown; inside `void initBlockMapping() {`)
});		});
}		}

/// Computes a valid allocation position in a dominator (if possible) for the		/// Computes a valid allocation position in a dominator (if possible) for the
/// given allocation result.		/// given allocation result.
Block *getInitialAllocBlock(OpResult result) {		Block *getInitialAllocBlock(OpResult result) {
// Get all allocation operands as these operands are important for the		// Get all allocation operands as these operands are important for the
// allocation operation.		// allocation operation.
auto operands = result.getOwner()->getOperands();		Operation *owner = result.getOwner();
		auto operands = owner->getOperands();
if (operands.size() < 1)		if (operands.size() < 1)
return findCommonDominator(result, aliases.resolve(result), dominators);		return findCommonDominator(result, aliases.resolve(result), dominators);

// If this node has dependencies, check all dependent nodes with respect		// If this node has dependencies, check all dependent nodes with respect
// to a common post dominator in which all values are available.		// to a common post dominator in which all values are available.
ValueSetT dependencies(++operands.begin(), operands.end());		ValueSetT dependencies(++operands.begin(), operands.end());
return findCommonDominator(*operands.begin(), dependencies, postDominators);		Block *dominator =
		findCommonDominator(*operands.begin(), dependencies, postDominators);
		// Do not move allocs out of their parent regions to keep them local.
		if (dominator->getParent() != owner->getParentRegion())
		herhut: Should this apply to the case above, as well?
		return &*owner->getParentRegion()->begin();
		herhut: `front()`?
		return dominator;
}		}

/// Finds correct alloc positions according to the algorithm described at		/// Finds correct alloc positions according to the algorithm described at
/// the top of the file for all alloc nodes that can be handled by this		/// the top of the file for all alloc nodes that can be handled by this
/// analysis.		/// analysis.
void placeAllocs() const {		void placeAllocs() const {
for (auto &entry : allocs) {		for (auto &entry : allocs) {
Value alloc = entry.allocValue;		Value alloc = entry.allocValue;
(14 lines not shown; inside `for (auto &entry : allocs) {`)
// Move the alloc in front of the start operation.		// Move the alloc in front of the start operation.
Operation *allocOperation = alloc.getDefiningOp();		Operation *allocOperation = alloc.getDefiningOp();
allocOperation->moveBefore(startOperation);		allocOperation->moveBefore(startOperation);
}		}
}		}

/// Introduces required allocs and copy operations to avoid memory leaks.		/// Introduces required allocs and copy operations to avoid memory leaks.
void introduceCopies() {		void introduceCopies() {
// Initialize the set of block arguments that require a dedicated memory		// Initialize the set of values that require a dedicated memory free
// free operation since their arguments cannot be safely deallocated in a		// operation since their operands cannot be safely deallocated in a post
// post dominator.		// dominator.
SmallPtrSet<BlockArgument, 8> blockArgsToFree;		SmallPtrSet<Value, 8> valuesToFree;
llvm::SmallDenseSet<std::tuple<BlockArgument, Block *>> visitedBlockArgs;		llvm::SmallDenseSet<std::tuple<Value, Block *>> visitedValues;
SmallVector<std::tuple<BlockArgument, Block *>, 8> toProcess;		SmallVector<std::tuple<Value, Block *>, 8> toProcess;

// Check dominance relation for proper dominance properties. If the given		// Check dominance relation for proper dominance properties. If the given
// value node does not dominate an alias, we will have to create a copy in		// value node does not dominate an alias, we will have to create a copy in
// order to free all buffers that can potentially leak into a post		// order to free all buffers that can potentially leak into a post
// dominator.		// dominator.
auto findUnsafeValues = [&](Value source, Block *definingBlock) {		auto findUnsafeValues = [&](Value source, Block *definingBlock) {
auto it = aliases.find(source);		auto it = aliases.find(source);
if (it == aliases.end())		if (it == aliases.end())
return;		return;
for (Value value : it->second) {		for (Value value : it->second) {
auto blockArg = value.cast<BlockArgument>();		if (valuesToFree.count(value) > 0)
if (blockArgsToFree.count(blockArg) > 0)
continue;		continue;
// Check whether we have to free this particular block argument.		// Check whether we have to free this particular block argument.
if (!dominators.dominates(definingBlock, blockArg.getOwner())) {		if (!dominators.dominates(definingBlock, value.getParentBlock())) {
toProcess.emplace_back(blockArg, blockArg.getParentBlock());		toProcess.emplace_back(value, value.getParentBlock());
blockArgsToFree.insert(blockArg);		valuesToFree.insert(value);
} else if (visitedBlockArgs		} else if (visitedValues.insert(std::make_tuple(value, definingBlock))
.insert(std::make_tuple(blockArg, definingBlock))
.second)		.second)
toProcess.emplace_back(blockArg, definingBlock);		toProcess.emplace_back(value, definingBlock);
}		}
};		};

// Detect possibly unsafe aliases starting from all allocations.		// Detect possibly unsafe aliases starting from all allocations.
for (auto &entry : allocs)		for (auto &entry : allocs)
findUnsafeValues(entry.allocValue, entry.placementBlock);		findUnsafeValues(entry.allocValue, entry.placementBlock);

// Try to find block arguments that require an explicit free operation		// Try to find block arguments that require an explicit free operation
// until we reach a fix point.		// until we reach a fix point.
while (!toProcess.empty()) {		while (!toProcess.empty()) {
auto current = toProcess.pop_back_val();		auto current = toProcess.pop_back_val();
findUnsafeValues(std::get<0>(current), std::get<1>(current));		findUnsafeValues(std::get<0>(current), std::get<1>(current));
}		}

// Update buffer aliases to ensure that we free all buffers and block		// Update buffer aliases to ensure that we free all buffers and block
// arguments at the correct locations.		// arguments at the correct locations.
aliases.remove(blockArgsToFree);		aliases.remove(valuesToFree);

// Add new allocs and additional copy operations.		// Add new allocs and additional copy operations.
for (BlockArgument blockArg : blockArgsToFree) {		for (Value value : valuesToFree) {
Block *block = blockArg.getOwner();		if (auto blockArg = value.dyn_cast<BlockArgument>())
		introduceBlockArgCopy(blockArg);
		else
		introduceValueCopy(value);

		// Register the value to require a final dealloc. Note that we do not have
		// to assign a block here since we do not want to move the allocation node
		// to another location.
		allocs.push_back({value, nullptr, nullptr});
		}
		}

		/// Introduces temporary allocs in all predecessors and copies the source
		/// values into the newly allocated buffers.
		void introduceBlockArgCopy(BlockArgument blockArg) {
// Allocate a buffer for the current block argument in the block of		// Allocate a buffer for the current block argument in the block of
// the associated value (which will be a predecessor block by		// the associated value (which will be a predecessor block by
// definition).		// definition).
for (auto it = block->pred_begin(), e = block->pred_end(); it != e;		Block *block = blockArg.getOwner();
++it) {		for (auto it = block->pred_begin(), e = block->pred_end(); it != e; ++it) {
// Get the terminator and the value that will be passed to our		// Get the terminator and the value that will be passed to our
// argument.		// argument.
Operation *terminator = (*it)->getTerminator();		Operation *terminator = (*it)->getTerminator();
auto branchInterface = cast<BranchOpInterface>(terminator);		auto branchInterface = cast<BranchOpInterface>(terminator);
		herhut: Where is this conversion? Or is this comment off?
// Convert the mutable operand range to an immutable range and query the		// Convert the mutable operand range to an immutable range and query the
// associated source value.		// associated source value.
Value sourceValue =		Value sourceValue =
branchInterface.getSuccessorOperands(it.getSuccessorIndex())		branchInterface.getSuccessorOperands(it.getSuccessorIndex())
.getValue()[blockArg.getArgNumber()];		.getValue()[blockArg.getArgNumber()];
		herhut: Nit: Doesn't this create a new alloc and a copy? If so, please fix the comment.
// Create a new alloc at the current location of the terminator.		// Create a new alloc at the current location of the terminator.
		Value alloc = introduceBufferCopy(sourceValue, terminator);
		// Wire new alloc and successor operand.
		branchInterface.getMutableSuccessorOperands(it.getSuccessorIndex())
		rriddle: You are completely disregarding the fact that getMutableSuccessorOperands can return None, which would cause this to crash. Where are you checking that this is valid?
		herhut: The underlying assumption is that if the successor has block arguments, then the branch in the predecessor needs to have operands for those. Can we rely on this?
		rriddle: Branching operations are not required to have a Value already materialized for a block argument. There are certain classes of operations that internally generate the value that is passed to the branch. For example, operations like LLVM Callbr and certain SIL switches.
		dfki-mako (Author): We have changed the code to emit an error message. We prefer the error message over silently ignoring this case, as the retrieved alias information can be invalid if an operation passes values implicitly to a block argument.
		.getValue()
		.slice(blockArg.getArgNumber(), 1)
		.assign(alloc);
		}
		}

		/// Introduces temporary allocs in front of all associated nested-region
		/// terminators and copies the source values into the newly allocated buffers.
		void introduceValueCopy(Value value) {
		herhut: This copies a region result, so maybe reflect that in the name.
		// Get the actual result index in the scope of the parent terminator.
		Operation *operation = value.getDefiningOp();
		auto resultIndex =
		llvm::find_if(operation->getResults(), [&](OpResult result) {
		return result == value;
		}).getIndex();

		walkReturnOperations<true>(operation, [&](Operation *terminator) {
		herhut: This also needs to use the `RegionBranchOpInterface` to find all the terminators in the operation and their successor inputs (they might be in a different order than the op results). Querying the terminators directly makes assumptions about ordering that do not exist.
		// Extract the source value from the current terminator.
		Value sourceValue = terminator->getOperand(resultIndex);
		// Create a new alloc at the current location of the terminator.
		Value alloc = introduceBufferCopy(sourceValue, terminator);
		// Wire alloc and terminator operand.
		terminator->setOperand(resultIndex, alloc);
		});
		}

		/// Creates a new memory allocation for the given source value and copies its
		/// content into the newly allocated buffer. The terminator operation is used
		/// to insert the alloc and copy operations at the right places.
		Value introduceBufferCopy(Value sourceValue, Operation *terminator) {
		// Create a new alloc at the current location of the terminator.
auto memRefType = sourceValue.getType().cast<MemRefType>();		auto memRefType = sourceValue.getType().cast<MemRefType>();
		rriddle: Is this guaranteed to be a RegionBranchOpInterface?
		dfki-mako (Author): The value should either be a `BlockArgument` or a value resulting from a `RegionBranchOpInterface` operation.
OpBuilder builder(terminator);		OpBuilder builder(terminator);

// Extract information about dynamically shaped types by		// Extract information about dynamically shaped types by
// extracting their dynamic dimensions.		// extracting their dynamic dimensions.
SmallVector<Value, 4> dynamicOperands;		SmallVector<Value, 4> dynamicOperands;
for (auto shapeElement : llvm::enumerate(memRefType.getShape())) {		for (auto shapeElement : llvm::enumerate(memRefType.getShape())) {
if (!ShapedType::isDynamic(shapeElement.value()))		if (!ShapedType::isDynamic(shapeElement.value()))
continue;		continue;
dynamicOperands.push_back(builder.create<DimOp>(		dynamicOperands.push_back(builder.create<DimOp>(
terminator->getLoc(), sourceValue, shapeElement.index()));		terminator->getLoc(), sourceValue, shapeElement.index()));
}		}
		herhut: So is it OK to pass an empty ArrayRef here? If so, why can we not do this in the other cases?

// TODO: provide a generic interface to create dialect-specific		// TODO: provide a generic interface to create dialect-specific
// Alloc and CopyOp nodes.		// Alloc and CopyOp nodes.
auto alloc = builder.create<AllocOp>(terminator->getLoc(), memRefType,		auto alloc = builder.create<AllocOp>(terminator->getLoc(), memRefType,
dynamicOperands);		dynamicOperands);
// Wire new alloc and successor operand.
branchInterface.getMutableSuccessorOperands(it.getSuccessorIndex())
.getValue()
.slice(blockArg.getArgNumber(), 1)
.assign(alloc);
// Create a new copy operation that copies the contents of the old		// Create a new copy operation that copies the contents of the old
		rriddle: llvm::find?
// allocation to the new one.		// allocation to the new one.
builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue,		builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue, alloc);
alloc);
}

// Register the block argument to require a final dealloc. Note that		return alloc;
// we do not have to assign a block here since we do not want to
// move the allocation node to another location.
allocs.push_back({blockArg, nullptr, nullptr});
}
}		}

/// Finds associated deallocs that can be linked to our allocation nodes (if		/// Finds associated deallocs that can be linked to our allocation nodes (if
/// any).		/// any).
void findDeallocs() {		void findDeallocs() {
for (auto &entry : allocs) {		for (auto &entry : allocs) {
auto userIt =		auto userIt =
llvm::find_if(entry.allocValue.getUsers(), [&](Operation *user) {		llvm::find_if(entry.allocValue.getUsers(), [&](Operation *user) {

mlir/test/Transforms/buffer-placement.mlir

	}			}
	// CHECK: (%[[ARG0:.*]]: memref<5xf32>, %[[ARG1:.*]]: memref<10xf32>, %[[RESULT:.*]]: memref<5xf32>)			// CHECK: (%[[ARG0:.*]]: memref<5xf32>, %[[ARG1:.*]]: memref<10xf32>, %[[RESULT:.*]]: memref<5xf32>)
	// CHECK: %[[X:.*]] = alloc()			// CHECK: %[[X:.*]] = alloc()
	// CHECK: %[[Y:.*]] = alloc()			// CHECK: %[[Y:.*]] = alloc()
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK: dealloc %[[Y]]			// CHECK: dealloc %[[Y]]
	// CHECK: return %[[ARG1]], %[[X]]			// CHECK: return %[[ARG1]], %[[X]]

				// -----

				// Test Case: nested region control flow
				// The alloc position of %1 does not need to be changed and flows through
				// both if branches until it is finally returned. Hence, it does not
				// require a specific dealloc operation. However, %3 requires a dealloc.

				func @nested_region_control_flow(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%3 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %1 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if
				// CHECK: scf.yield %[[ALLOC0]]
				// CHECK: %[[ALLOC2:.*]] = alloc(%arg0, %arg1)
				// CHECK-NEXT: dealloc %[[ALLOC2]]
				// CHECK-NEXT: scf.yield %[[ALLOC0]]
				// CHECK: return %[[ALLOC1]]

				// -----

				// Test Case: nested region control flow with a nested buffer allocation in a
				// divergent branch.
				// The alloc positions of %1 and %3 do not need to be changed since
				// BufferPlacement does not move allocs out of nested regions at the moment.
				// However, since %3 is allocated and "returned" in a divergent branch, we have
				// to allocate a temporary buffer (like in condBranchDynamicTypeNested).

				func @nested_region_control_flow_div(
				%arg0 : index,
				%arg1 : index) -> memref<?x?xf32> {
				%0 = cmpi "eq", %arg0, %arg1 : index
				%1 = alloc(%arg0, %arg0) : memref<?x?xf32>
				%2 = scf.if %0 -> (memref<?x?xf32>) {
				scf.yield %1 : memref<?x?xf32>
				} else {
				%3 = alloc(%arg0, %arg1) : memref<?x?xf32>
				scf.yield %3 : memref<?x?xf32>
				}
				return %2 : memref<?x?xf32>
				}

				// CHECK: %[[ALLOC0:.*]] = alloc(%arg0, %arg0)
				// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if
				// CHECK: %[[ALLOC2:.*]] = alloc
				// CHECK-NEXT: linalg.copy(%[[ALLOC0]], %[[ALLOC2]])
				// CHECK: scf.yield %[[ALLOC2]]
				// CHECK: %[[ALLOC3:.*]] = alloc(%arg0, %arg1)
				// CHECK: %[[ALLOC4:.*]] = alloc
				// CHECK-NEXT: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
				// CHECK: scf.yield %[[ALLOC4]]
				herhut: There should be a `dealloc` here, right?
				// CHECK: dealloc %[[ALLOC0]]
				// CHECK-NEXT: return %[[ALLOC1]]