This is an archive of the discontinued LLVM Phabricator instance.


Differential D78484

Providing buffer assignment for MLIR
ClosedPublic

Authored by dfki-ehna on Apr 20 2020, 4:32 AM.


Details

Reviewers

mehdi_amini
rriddle
herhut
pifon2a

Commits

rG5c352e69e76a: Providing buffer assignment for MLIR

Summary

We have provided a generic buffer assignment transformation ported from TensorFlow. This generic transformation pass automatically analyzes the values and their aliases (also in other blocks) and returns the valid positions for Alloc and Dealloc operations. To find these positions, the algorithm uses the block Dominator and Post-Dominator analyses. In our proposed algorithm, we have considered aliasing, liveness, nested regions, branches, conditional branches, critical edges, and independence from custom block terminators. This implementation doesn't support block loops. However, we have considered them in our design: supporting them only requires a loop analysis, so that Alloc and Dealloc operations can be inserted outside of these loops in some special cases.
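To make the role of the dominator analysis more concrete, here is a minimal standalone sketch. It is not the code from this revision and does not use MLIR's dominance analyses. It computes dominator sets for a toy control-flow graph and picks the nearest block that dominates every block in which a value or one of its aliases appears, which is the kind of position an alloc could be placed in. `Cfg`, `intersect`, `computeDominators`, `nearestCommonDominator`, and `exampleCfg` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy CFG: preds[b] lists the predecessors of block b.
// Block 0 is the entry block; every other block is assumed to be reachable.
struct Cfg {
  std::vector<std::vector<int>> preds;
};

// Returns the elements present in both sets.
static std::set<int> intersect(const std::set<int> &a, const std::set<int> &b) {
  std::set<int> out;
  for (int v : a)
    if (b.count(v))
      out.insert(v);
  return out;
}

// Iterative dataflow: dom(entry) = {entry}; for any other block b,
// dom(b) = {b} united with the intersection of dom(p) over all predecessors p.
std::vector<std::set<int>> computeDominators(const Cfg &cfg) {
  const int n = static_cast<int>(cfg.preds.size());
  assert(n > 0 && "the CFG needs at least an entry block");
  std::set<int> all;
  for (int b = 0; b < n; ++b)
    all.insert(b);
  std::vector<std::set<int>> dom(n, all);
  dom[0] = {0};
  bool changed = true;
  while (changed) {
    changed = false;
    for (int b = 1; b < n; ++b) {
      std::set<int> next = all;
      for (int p : cfg.preds[b])
        next = intersect(next, dom[p]);
      next.insert(b);
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// Picks the deepest block that dominates all blocks in `blocks`. Dominators
// of a block form a chain, so the deepest one has the largest dominator set.
int nearestCommonDominator(const std::vector<std::set<int>> &dom,
                           const std::vector<int> &blocks) {
  assert(!blocks.empty());
  std::set<int> common = dom[blocks.front()];
  for (int b : blocks)
    common = intersect(common, dom[b]);
  int best = 0;
  for (int d : common)
    if (dom[d].size() > dom[best].size())
      best = d;
  return best;
}

// Diamond-shaped CFG: 0 -> {1, 2}, 1 -> 3, 2 -> 3.
Cfg exampleCfg() {
  Cfg cfg;
  cfg.preds.resize(4);
  cfg.preds[1].push_back(0);
  cfg.preds[2].push_back(0);
  cfg.preds[3].push_back(1);
  cfg.preds[3].push_back(2);
  return cfg;
}
```

For the graph built by `exampleCfg()`, a value defined in block 2 and forwarded to block 3 gives block 0 as the common dominator. Under the same assumptions, a dealloc position could be sketched by running the same computation on the reversed graph to obtain post-dominators.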

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dfki-ehna created this revision. Apr 20 2020, 4:32 AM

Herald added a project: Restricted Project. Apr 20 2020, 4:32 AM

Herald added subscribers: llvm-commits, frgossen, grosul1 and 13 others.

dfki-ehna added reviewers: mehdi_amini, rriddle, herhut, pifon2a. Apr 20 2020, 4:34 AM

dfki-mako added a subscriber: dfki-mako. Apr 20 2020, 4:36 AM

Harbormaster failed remote builds in B53950: Diff 258702! Apr 20 2020, 4:48 AM

Replacing fake test-purpose operations by Linalg.Generic operations. For testing computeAllocPosition of Buffer Assignment (BA), GenericOpConverter is introduced inside TestBufferAssignmentPreparationPass to convert tensor-type linalg.generic operations to memref ones. FunctionAndBlockSignatureConverter and NonVoidToVoidReturnOpConverter of BA are also tested.

Harbormaster completed remote builds in B53966: Diff 258747. Apr 20 2020, 8:37 AM

Thanks! Left some stylistic comments for now, will review in more detail later.

mlir/include/mlir/Transforms/BufferAssignment.h
74 ↗	(On Diff #258747)	Please use /// for top-level comments.
91 ↗	(On Diff #258747)	Same here.
102 ↗	(On Diff #258747)	Same here.
107 ↗	(On Diff #258747)	Do we need to hard-code FuncOp here? Could we instead just check the number of arguments in the entry block of the function? Or is there some better way to make this more general?
mlir/lib/Transforms/BufferAssignment.cpp
41 ↗	(On Diff #258747)	Please drop the `(username)` from the TODO.
105 ↗	(On Diff #258747)	Use the predecessor iterators instead, so that you get the successor index for free: for (auto it = block.pred_begin(), e = block.pred_end(); it != e; ++it) { unsigned successorIndex = it.getSuccessorIndex(); }
113 ↗	(On Diff #258747)	Please prefer early exit when possible, e.g.: auto branchInterface = ...; if (!branchInterface) continue;
197 ↗	(On Diff #258747)	nit: auto -> Value. Please use the full type if possible.
198 ↗	(On Diff #258747)	nit: auto -> Operation *
330 ↗	(On Diff #258747)	nit: Please remove the `(username)`
330 ↗	(On Diff #258747)	Generally, using traits and interfaces is how you should generalize a pass, instead of templating.
384 ↗	(On Diff #258747)	Please just construct an InsertionPoint directly if you need one.
446 ↗	(On Diff #258747)	Why block2? Can we just use block?
460 ↗	(On Diff #258747)	PassRegistration should use the declarative tablegen backend, i.e. you should add your pass here: https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Transforms/Passes.td

This revision now requires changes to proceed. Apr 20 2020, 9:55 AM

mehdi_amini added a project: Restricted Project. Apr 20 2020, 4:12 PM

Thanks for upstreaming this!

mlir/include/mlir/Transforms/BufferAssignment.h
9 ↗	(On Diff #258747)	Typo `assginment`
mlir/test/Transforms/buffer-assignment.mlir
1 ↗	(On Diff #258747)	Can you try to use only operations with registered dialects? The test dialect in-tree is a potential host for such operations, and `std.call @external_func` is also frequently a good escape hatch.
64 ↗	(On Diff #258747)	Can you document the property you're trying to test in each test-case? It really helps to interpret minimal check lines.

mehdi_amini added inline comments. Apr 20 2020, 8:09 PM

mlir/include/mlir/Transforms/BufferAssignment.h
50 ↗	(On Diff #258747)	There seems to be a precondition that it does not work for a BlockArgument, can you take an `OpResult` as argument instead?
58 ↗	(On Diff #258747)	Can you expand on the intended uses of it? Maybe a small example?
107 ↗	(On Diff #258747)	I agree. Some tricky part is that this goes with the other pattern above which is also hard-coded on FuncOp and this pattern can only operate correctly after the other one rewrote the function. This is an even stronger point of coupling I believe.
113 ↗	(On Diff #258747)	Nit: `numFuncArgs - numReturnValues` is a loop invariant, and it could help readability to hoist it out and name it, like `int firstReturnParameter = numFuncArgs - numReturnValues;` or similar.
120 ↗	(On Diff #258747)	I am puzzled about why not just `rewriter.setInsertionPoint(returnOp.getOperation());` here?
mlir/lib/Transforms/BufferAssignment.cpp
44 ↗	(On Diff #258747)	When you say "does not support loops": can you be more precise in what will happen in the presence of loops (miscompile, crash, or just suboptimal allocations?)
117 ↗	(On Diff #258747)	`successorOps`->`successorOperands` ("Op" is for "Operation" in general)
120 ↗	(On Diff #258747)	Nit: `Value arg`
122 ↗	(On Diff #258747)	I'd consider iterating on `llvm::zip(block.getArguments(), successorOps.getValue())`
126 ↗	(On Diff #258747)	what happens in the "else" case with respect to aliasing? It isn't clear to me that this is safe?
185 ↗	(On Diff #258747)	Nit: avoid trivial braces
194 ↗	(On Diff #258747)	Isn't there a trait we could use instead of hard-coding the `AllocOp` here (and everywhere)?
232 ↗	(On Diff #258747)	Can you write the API as taking an `OpResult` instead of the base `Value`?
249 ↗	(On Diff #258747)	Does it need to be a template? This is a private method and I see a single use?
368 ↗	(On Diff #258747)	Isn't this all assuming a particular model with a single allocation / deallocation per buffer? What about a pattern like the following with two allocs feeding into one dealloc? (we can imagine more complex cases)
	cond_br %cond, ^bb1, ^bb2
	^bb1:
	  %buffer1 = alloc ...
	  br ^exit(%buffer1 : ....)
	^bb2:
	  %buffer2 = alloc ...
	  br ^exit(%buffer2 : ...)
	^exit(%buffer: ...):
	  dealloc %buffer
388 ↗	(On Diff #258747)	This does not seem to be "computing" anything? Is this like "future work"?
409 ↗	(On Diff #258747)	Nit: auto -> Type
416 ↗	(On Diff #258747)	Typo: `arugments`
426 ↗	(On Diff #258747)	Can't you just call `setType()` on the BlockArgument? (also don't use auto when unnecessary)
446 ↗	(On Diff #258747)	`block` instead of `block2` here? Also don't use auto when it does not make the code more readable, I think that `Block &block` would be just fine here.
449 ↗	(On Diff #258747)	Remove the `legality` boolean and just `if (all_of(...)) return false;` here. You can even replace the outer loop with an all_of/any_of ;) return all_of(funcOp.getBlocks(), [](Block &block) { return all_of(block.getArguments(), isLegalBlockArg); });
464 ↗	(On Diff #258747)	It seems to me that the pass is a misnomer: it does not really "assign" buffers, but optimizes the placement, would there be a more accurate name? Also saying "into their proper positions" seems like it is intended for correctness.
mlir/test/lib/Transforms/TestBufferAssignment.cpp
29 ↗	(On Diff #258747)	Can you clarify this sentence? I am missing context
mlir/tools/mlir-opt/mlir-opt.cpp
97	This isn't in the test directory, so it shouldn't be registered here but use the same mechanism as the other non-tests passes.
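The nested `all_of` suggested above for line 449 can be tried out without MLIR. The following is a minimal sketch using `std::all_of` over hypothetical stand-ins: `ToyBlock`, `isLegalBlockArg`, and `hasLegalSignature` are made up for illustration, and argument types are modeled as plain strings. The real code would iterate over `funcOp.getBlocks()` and `block.getArguments()` as in the comment.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a block: only its argument types matter here.
struct ToyBlock {
  std::vector<std::string> argumentTypes;
};

// Hypothetical legality rule: tensor-typed arguments are not yet converted.
bool isLegalBlockArg(const std::string &type) { return type != "tensor"; }

// A function signature is legal when every argument of every block is legal.
// No mutable `legality` flag and no explicit loops are needed.
bool hasLegalSignature(const std::vector<ToyBlock> &blocks) {
  return std::all_of(blocks.begin(), blocks.end(), [](const ToyBlock &block) {
    return std::all_of(block.argumentTypes.begin(), block.argumentTypes.end(),
                       isLegalBlockArg);
  });
}
```

An empty list of blocks, or a block without arguments, counts as legal in this sketch because `std::all_of` returns true for an empty range.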

benvanik added a subscriber: benvanik. Apr 21 2020, 1:29 PM

benvanik added inline comments.

mlir/lib/Transforms/BufferAssignment.cpp
398 ↗	(On Diff #258747)	How tied is this pass to memref? If we have our own dialect type that represents buffers that we want to use with our own dialect alloc/dealloc ops, how can we use that here? Specifically this kind of function type conversion seems better served by a TypeConverter that can be provided by the target dialect. For us, for example, we'd not have it change types at all probably, and instead just use this for inserting our alloc/dealloc markers.

silvas added a subscriber: silvas. Apr 21 2020, 2:22 PM

silvas added inline comments.

mlir/test/lib/Transforms/TestBufferAssignment.cpp
57 ↗	(On Diff #258747)	How would this pass be extended to support dynamic shapes? It would be good to have that written down in a comment (and hopefully implemented as a fast follow-on to this CL).

dfki-mako added inline comments. Apr 22 2020, 1:00 AM

mlir/lib/Transforms/BufferAssignment.cpp
330 ↗	(On Diff #258747)	Definitely. We are going to make this pass more generic in one of the next CLs.
388 ↗	(On Diff #258747)	This function does not "compute" anything. This is just a simple abstraction to make the integration into the dialect-specific legalization phases more convenient and practical with respect to future extensions. For instance, adding support for additional types and interfaces might require an extension of this functionality.
398 ↗	(On Diff #258747)	Currently, the implementation is strongly coupled to memref types. However, this only affects the helper converters provided. The underlying pass will be extended in a follow-up CL to work on alloc and free interfaces instead of AllocOp and DeallocOp. This will make the general pass compatible with arbitrary dialects that implement the required interfaces.
464 ↗	(On Diff #258747)	I guess a better name would be "BufferPlacement", since the pass moves allocations/deallocations into "better" positions.
mlir/test/lib/Transforms/TestBufferAssignment.cpp
57 ↗	(On Diff #258747)	We can add more specific comments about the feature you mentioned. Support for dynamic shapes would in fact be added in one of the next follow-up CLs. One thing that has to be adapted is the computation of the alloc and free positions. Moreover, it might be necessary to adapt the BufferAssignmentPlacer (however, I don't think so at the moment).

Resolving all the comments and adding description for tests.

dfki-ehna marked 33 inline comments as done. Apr 22 2020, 4:21 AM

dfki-mako added inline comments. Apr 22 2020, 4:31 AM

mlir/lib/Transforms/BufferAssignment.cpp
194 ↗	(On Diff #258747)	There is a `MemoryEffectsOpInterface` that could be used instead of hard-coding the standard allocation operations. However, I would prefer making BA more generic in a follow-up CL. Furthermore, this would require this CL to be merged first in order to support standard Alloc and Dealloc nodes.
368 ↗	(On Diff #258747)	This pass currently assumes a single-allocation/deallocation model that usually appears during straight-forward legalization (lowering) of operations. We wanted to keep the first version simple and would like to significantly extend the functionality in one of the follow-up CLs.
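The two-alloc pattern raised for line 368 can be modeled with a small standalone sketch of the alias map. It mirrors the shape of `BufferPlacementAliasAnalysis` in this revision (a map from a branch operand to the block arguments it is forwarded to, resolved recursively), but it uses strings instead of MLIR values. `ToyAliasAnalysis`, `addBranchOperand`, and `twoAllocExample` are hypothetical names introduced here.

```cpp
#include <map>
#include <set>
#include <string>

using ValueSet = std::set<std::string>;

// Toy model of an alias map built from branch operands.
class ToyAliasAnalysis {
public:
  // Records that `operand` is passed to block argument `blockArg` by a branch.
  void addBranchOperand(const std::string &operand,
                        const std::string &blockArg) {
    aliases[operand].insert(blockArg);
  }

  // Returns the value itself plus all direct and indirect aliases.
  ValueSet resolve(const std::string &value) const {
    ValueSet result;
    resolveRecursive(value, result);
    return result;
  }

private:
  void resolveRecursive(const std::string &value, ValueSet &result) const {
    // Stop if the value was already visited; this also guards against cycles.
    if (!result.insert(value).second)
      return;
    auto it = aliases.find(value);
    if (it == aliases.end())
      return;
    for (const std::string &alias : it->second)
      resolveRecursive(alias, result);
  }

  std::map<std::string, ValueSet> aliases;
};

// Models: ^bb1 branches to ^exit(%buffer1), ^bb2 branches to ^exit(%buffer2).
ToyAliasAnalysis twoAllocExample() {
  ToyAliasAnalysis analysis;
  analysis.addBranchOperand("%buffer1", "%buffer");
  analysis.addBranchOperand("%buffer2", "%buffer");
  return analysis;
}
```

In this toy model each alloc resolves to itself and the shared block argument `%buffer`, while resolving `%buffer` returns only `%buffer`, because the map is directional. That is one way to see why a single dealloc fed by two allocs needs extra care.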

Harbormaster failed remote builds in B54222: Diff 259228! Apr 22 2020, 5:22 AM

Change NonVoidToVoidReturnOpConverter to use the entry block arguments instead of FuncOp.

dfki-ehna marked an inline comment as done. Apr 22 2020, 6:35 AM

dfki-ehna added inline comments.

mlir/include/mlir/Transforms/BufferAssignment.h
107 ↗	(On Diff #258747)	Thanks for the suggestion. Replaced with the block entry.

Harbormaster failed remote builds in B54235: Diff 259263! Apr 22 2020, 7:33 AM

benvanik added inline comments. Apr 22 2020, 11:54 AM

mlir/lib/Transforms/BufferAssignment.cpp
398 ↗	(On Diff #258747)	If that future work could remove the use of MemRefType to instead use a TypeConverter that'd be awesome. Is there a bug you are using to track this work that I could follow along on to see when it lands? I really like the behavior of the pass and am excited to plug it in to our stuff :)

Overall LGTM, it'd be nice if someone else could also approve though (@silvas ?)

mlir/include/mlir/Transforms/BufferPlacement.h
120	Almost all the uses of auto above could benefit from using the actual types.
mlir/lib/Transforms/BufferAssignment.cpp
368 ↗	(On Diff #258747)	Sure, but please document this restriction in the TableGen pass description and add a TODO in the code so the reader is aware that this is a known limitation (in general it helps me reviewing code when the known limitations are spelled out and there is explicit acknowledgement of what will be addressed in followup revisions). Even in single-allocation/deallocation, would the pass pessimize the case of conditional allocation? For example in the following (pseudo) code I suspect you'd increase the dynamic memory consumption by always allocating both buffers:
	if (cond)
	  %buffer1 = alloc ...
	else
	  %buffer2 = alloc ...
	...
	if (cond) {
	  consume(buffer1)
	  dealloc(buffer1)
	} else {
	  consume(buffer2)
	  dealloc(buffer2)
	}
mlir/lib/Transforms/BufferPlacement.cpp
252	Can you spell the types above?
283	(here as well)
292	`Operation *` ?
mlir/test/lib/Transforms/TestBufferPlacement.cpp
128	FuncOp

rriddle added inline comments. Apr 22 2020, 1:41 PM

mlir/include/mlir/Transforms/BufferPlacement.h
14	typo TRANSFORM -> TRANSFORMS
21	This header isn't necessary I believe.
69	Please remove the trailing _ from these variable names.
mlir/include/mlir/Transforms/Passes.td
109	Can you add some example input/output here?
mlir/lib/Transforms/BufferPlacement.cpp
95	nit: auto -> Value
183	nit: Add punctuation here.
350	Can you emit an error and use `signalPassFailure()` instead of assert here?
422	This isn't really valid to do directly in a pattern, as it is being done outside of the rewriter. Seems like this pattern can just be replaced by using a TypeConverter instead.

Herald added a subscriber: Kayjukh. Apr 22 2020, 1:41 PM

dfki-mako added inline comments. Apr 23 2020, 4:40 AM

mlir/lib/Transforms/BufferAssignment.cpp
368 ↗	(On Diff #258747)	In this case, the current allocation policies will cause the dynamic memory allocation to increase, as you outlined above. However, we are going to extend the documentation to describe the current limitations and assumptions. In one of the future CLs, the buffer placement transformation will include several optimization passes to optimize the overall memory consumption.

dfki-ehna marked 2 inline comments as done. Apr 23 2020, 6:45 AM

dfki-ehna added inline comments.

mlir/lib/Transforms/BufferAssignment.cpp
398 ↗	(On Diff #258747)	@benvanik We are definitely going to use TypeConverter instead, either in this CL or in the follow-up one. Are you referring to an open discussion issue?
mlir/lib/Transforms/BufferPlacement.cpp
422	TypeConverter has convertBlockSignature which returns SignatureConversion but there is no applySignatureConversion for the rewriter that gets a Block as an input (the current version only accepts a region). Are we missing the point?

Resolved second pass comments.

Polish BufferPlacement.h

dfki-ehna marked 12 inline comments as done. Apr 23 2020, 7:59 AM

dfki-ehna added inline comments.

mlir/include/mlir/Transforms/BufferPlacement.h
21	BufferAssignmentOpConversionPattern inherits from OpConversionPattern in this file.

Harbormaster failed remote builds in B54398: Diff 259570! Apr 23 2020, 8:38 AM

Harbormaster completed remote builds in B54400: Diff 259573.

rriddle added inline comments. Apr 23 2020, 11:11 AM

mlir/include/mlir/Transforms/BufferPlacement.h
21	Sorry, I meant the one above it. Support/LLVM.h is included transitively.
mlir/lib/Transforms/BufferPlacement.cpp
422	When passing a type converter the non-entry blocks are converted automatically using that converter. After that I would expect that the default function conversion pattern would remove the need for this pattern: https://github.com/llvm/llvm-project/blob/3d178581ac7f5336b1ac75e31001de074ecca937/mlir/include/mlir/Transforms/DialectConversion.h#L300

Provide BufferAssignmentTypeConverter for using inside FunctionAndBlockSignatureConverter.

dfki-ehna marked 3 inline comments as done. Apr 24 2020, 5:39 AM

dfki-ehna added inline comments.

mlir/lib/Transforms/BufferPlacement.cpp
422	Thanks. Resolved.

rriddle accepted this revision. Apr 24 2020, 11:04 AM

rriddle added inline comments.

mlir/lib/Transforms/BufferPlacement.cpp
355	Can you change this walk to return WalkResult? That allows for interrupting the walk early. You can return WalkResult::interrupt() (or failure()) to stop the walk, and WalkResult::advance()/success() to continue the walk.
438	An easy way of doing this is just: target.addDynamicallyLegalOp<FuncOp>([&](FuncOp funcOp) { return typeConverter.isSignatureLegal(funcOp.getType()); });
mlir/test/lib/Transforms/TestBufferPlacement.cpp
30	A function pass is not allowed to mutate the public type of the function, so this should be a module pass.
131	Can you just do this inside of the converter constructor? Otherwise, you don't need a specific converter class.
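The early-exit walk requested for line 355 can be illustrated with a toy model. This is only a sketch of the control flow and does not use the MLIR API: `ToyWalkResult`, `ToyOp`, `walk`, and `countVisitedUntil` are hypothetical stand-ins, and the pre-order traversal is chosen for brevity rather than to match MLIR's walk order.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a walk result with two states.
enum class ToyWalkResult { Advance, Interrupt };

// Hypothetical operation with a name and nested operations.
struct ToyOp {
  std::string name;
  std::vector<ToyOp> nested;
};

// Visits `op` and its nested ops in pre-order. As soon as the callback
// returns Interrupt, the traversal stops and the interrupt is propagated.
ToyWalkResult
walk(const ToyOp &op,
     const std::function<ToyWalkResult(const ToyOp &)> &callback) {
  if (callback(op) == ToyWalkResult::Interrupt)
    return ToyWalkResult::Interrupt;
  for (const ToyOp &child : op.nested)
    if (walk(child, callback) == ToyWalkResult::Interrupt)
      return ToyWalkResult::Interrupt;
  return ToyWalkResult::Advance;
}

// Counts how many ops are visited before (and including) the first op named
// `stopAt`; all ops are counted if no op has that name.
int countVisitedUntil(const ToyOp &root, const std::string &stopAt) {
  int visited = 0;
  walk(root, [&](const ToyOp &op) {
    ++visited;
    return op.name == stopAt ? ToyWalkResult::Interrupt
                             : ToyWalkResult::Advance;
  });
  return visited;
}
```

A pass could use the returned result to decide whether to report failure after the walk, instead of asserting inside the callback.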

This revision is now accepted and ready to land. Apr 24 2020, 11:04 AM

Resolve the latest comments.

dfki-ehna marked 5 inline comments as done. Apr 27 2020, 3:14 AM

dfki-ehna added inline comments.

mlir/lib/Transforms/BufferPlacement.cpp
438	Thanks. We got rid of FunctionAndBlockSignatureConverter::addDynamicallyLegalFuncOp and added it directly to the TestBufferPlacement pass. So, the dialect experts also need to add this to their targets in their legalization passes.

Harbormaster completed remote builds in B54772: Diff 260263. Apr 27 2020, 3:43 AM

Taking Context and ConversionTarget out of the scope of function walk.

Harbormaster completed remote builds in B54779: Diff 260276. Apr 27 2020, 5:52 AM

Closed by commit rG5c352e69e76a: Providing buffer assignment for MLIR (authored by dfki-ehna, committed by dfki-mako). Apr 28 2020, 1:34 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

mlir/

include/

mlir/

Transforms/

BufferPlacement.h

137 lines

Passes.h

3 lines

Passes.td

10 lines

lib/

Transforms/

BufferPlacement.cpp

455 lines

CMakeLists.txt

1 line

test/

Transforms/

buffer-placement-prepration.mlir

143 lines

buffer-placement.mlir

412 lines

lib/

Transforms/

CMakeLists.txt

1 line

TestBufferPlacement.cpp

149 lines

tools/

mlir-opt/

mlir-opt.cpp

2 lines

Diff 259263

mlir/include/mlir/Transforms/BufferPlacement.h

This file was added.

//===- BufferPlacement.h - Buffer Assignment Utilities ---------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This header file defines buffer assignment helper methods to compute correct
// and valid positions for placing Alloc and Dealloc operations.
//
//===----------------------------------------------------------------------===//

#ifndef MLIR_TRANSFORM_BUFFERPLACEMENT_H
#define MLIR_TRANSFORM_BUFFERPLACEMENT_H

#include "mlir/Analysis/Dominance.h"
#include "mlir/Analysis/Liveness.h"
#include "mlir/IR/Builders.h"
#include "mlir/IR/Operation.h"
#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/DialectConversion.h"

namespace mlir {

/// Prepares a buffer placement phase. It can place (user-defined) alloc
/// nodes. This simplifies the integration of the actual buffer-placement
/// pass. Sample usage:
///   BufferAssignmentPlacer baHelper(regionOp);
///   -> determine alloc positions
///   auto allocPosition = baHelper.computeAllocPosition(value);
///   -> place alloc
///   allocBuilder.setInsertionPoint(positions.getAllocPosition());
///   <create alloc>
/// Note: this class is intended to be used during legalization. In order
/// to move alloc and dealloc nodes into the right places you can use the
/// createBufferPlacementPass() function.
class BufferAssignmentPlacer {
public:
  /// Creates a new assignment builder.
  explicit BufferAssignmentPlacer(Operation *op);

  /// Returns the operation this analysis was constructed from.
  Operation *getOperation() const { return operation; }

  /// Computes the actual position to place allocs for the given result.
  OpBuilder::InsertPoint computeAllocPosition(OpResult result);

private:
  /// The operation this analysis was constructed from.
  Operation *operation;
};

/// Helper conversion pattern that encapsulates a BufferAssignmentPlacer
/// instance. Sample usage:
///   class CustomConversionPattern : public
///       BufferAssignmentOpConversionPattern<MyOpT>
///   {
///     ... matchAndRewrite(...) {
///       -> Access stored BufferAssignmentPlacer
///       bufferAssignment->computeAllocPosition(resultOp);
///     }
///   };
template <typename SourceOp>
class BufferAssignmentOpConversionPattern
    : public OpConversionPattern<SourceOp> {
public:
  explicit BufferAssignmentOpConversionPattern(
      MLIRContext *context_,
      BufferAssignmentPlacer *bufferAssignment_ = nullptr,
      PatternBenefit benefit_ = 1)
      : OpConversionPattern<SourceOp>(context_, benefit_),
        bufferAssignment(bufferAssignment_) {}

protected:
  BufferAssignmentPlacer *bufferAssignment;
};

/// Converts only the tensor-type function and block arguments to memref-type.
class FunctionAndBlockSignatureConverter
    : public BufferAssignmentOpConversionPattern<FuncOp> {
public:
  using BufferAssignmentOpConversionPattern<
      FuncOp>::BufferAssignmentOpConversionPattern;

  /// Adding functions whose arguments are memref type to the set of legal
  /// operations.
  static void addDynamicallyLegalFuncOp(ConversionTarget &target);

  /// Performs the actual signature rewriting step.
  LogicalResult
  matchAndRewrite(FuncOp funcOp, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const final;
};

/// This pattern converter transforms a non-void ReturnOpSourceTy into a void
/// return of type ReturnOpTargetTy. It uses a copy operation of type CopyOpTy
/// to copy the results to the output buffer.
template <typename ReturnOpSourceTy, typename ReturnOpTargetTy,
          typename CopyOpTy>
class NonVoidToVoidReturnOpConverter
    : public BufferAssignmentOpConversionPattern<ReturnOpSourceTy> {
public:
  using BufferAssignmentOpConversionPattern<
      ReturnOpSourceTy>::BufferAssignmentOpConversionPattern;

  /// Performs the actual return-op conversion step.
  LogicalResult
  matchAndRewrite(ReturnOpSourceTy returnOp, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const final {
    auto numReturnValues = returnOp.getNumOperands();
    Block &entryBlock = returnOp.getParentRegion()->front();
    auto numFuncArgs = entryBlock.getNumArguments();
    auto loc = returnOp.getLoc();

    // Find the corresponding output buffer for each operand.
    auto firstReturnParameter = numFuncArgs - numReturnValues;
    for (auto operand : llvm::enumerate(operands)) {
      auto returnArgNumber = firstReturnParameter + operand.index();
      auto dstBuffer = entryBlock.getArgument(returnArgNumber);
      if (dstBuffer == operand.value())
        continue;

      // Insert the copy operation to copy before the return.
      rewriter.setInsertionPoint(returnOp);
      rewriter.create<CopyOpTy>(loc, operand.value(),
                                entryBlock.getArgument(returnArgNumber));
    }
    // Insert the new target return operation.
    rewriter.replaceOpWithNewOp<ReturnOpTargetTy>(returnOp);
    return success();
  }
};

} // end namespace mlir

#endif // MLIR_TRANSFORM_BUFFERPLACEMENT_H
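The index arithmetic in `NonVoidToVoidReturnOpConverter` is easier to follow in isolation. The following standalone sketch models it with strings instead of MLIR values: the returned values are matched against the trailing entry block arguments, and a copy is planned only when the returned value is not already the output buffer. `Copy` and `planReturnCopies` are hypothetical names, and the sketch only plans the copies rather than rewriting any IR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A planned copy as a (source, destination) pair of value names.
using Copy = std::pair<std::string, std::string>;

// Assumes the last `returnOperands.size()` entry block arguments are the
// output buffers, in the same order as the returned values.
std::vector<Copy> planReturnCopies(const std::vector<std::string> &entryArgs,
                                   const std::vector<std::string> &returnOperands) {
  assert(returnOperands.size() <= entryArgs.size());
  // Loop invariant: index of the first output buffer among the arguments.
  const std::size_t firstReturnParameter =
      entryArgs.size() - returnOperands.size();
  std::vector<Copy> copies;
  for (std::size_t i = 0; i < returnOperands.size(); ++i) {
    const std::string &dstBuffer = entryArgs[firstReturnParameter + i];
    // Nothing to copy if the result already lives in the output buffer.
    if (dstBuffer == returnOperands[i])
      continue;
    copies.emplace_back(returnOperands[i], dstBuffer);
  }
  return copies;
}
```

For entry arguments `%arg0, %out0, %out1` and returned values `%tmp, %out1`, this plans a single copy from `%tmp` to `%out0`.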

mlir/include/mlir/Transforms/Passes.h

	namespace mlir {			namespace mlir {

	class AffineForOp;			class AffineForOp;
	class FuncOp;			class FuncOp;
	class ModuleOp;			class ModuleOp;
	class Pass;			class Pass;
	template <typename T> class OperationPass;			template <typename T> class OperationPass;

				/// Creates an instance of the BufferPlacement pass.
				std::unique_ptr<Pass> createBufferPlacementPass();

	/// Creates an instance of the Canonicalizer pass.			/// Creates an instance of the Canonicalizer pass.
	std::unique_ptr<Pass> createCanonicalizerPass();			std::unique_ptr<Pass> createCanonicalizerPass();

	/// Creates a pass to perform common sub expression elimination.			/// Creates a pass to perform common sub expression elimination.
	std::unique_ptr<Pass> createCSEPass();			std::unique_ptr<Pass> createCSEPass();

	/// Creates a loop fusion pass which fuses loops. Buffers of size less than or			/// Creates a loop fusion pass which fuses loops. Buffers of size less than or
	/// equal to `localBufSizeThreshold` are promoted to memory space			/// equal to `localBufSizeThreshold` are promoted to memory space

mlir/include/mlir/Transforms/Passes.td

module {
return		return
}		}
}		}
```		```
}];		}];
let constructor = "mlir::createPipelineDataTransferPass()";		let constructor = "mlir::createPipelineDataTransferPass()";
}		}

		def BufferPlacement : Pass<"buffer-placement"> {
		let summary = "Optimizes placement of alloc and dealloc operations";
		let description = [{
		This pass implements an algorithm to optimize the placement of alloc and
		dealloc operations. This pass also inserts missing dealloc operations
		automatically to reclaim memory.
		}];
		let constructor = "mlir::createBufferPlacementPass()";
		}

def Canonicalizer : Pass<"canonicalize"> {		def Canonicalizer : Pass<"canonicalize"> {
let summary = "Canonicalize operations";		let summary = "Canonicalize operations";
let description = [{		let description = [{
This pass performs various types of canonicalizations over a set of		This pass performs various types of canonicalizations over a set of
operations. See [Operation Canonicalization](Canonicalization.md) for more		operations. See [Operation Canonicalization](Canonicalization.md) for more
details.		details.
}];		}];
let constructor = "mlir::createCanonicalizerPass()";		let constructor = "mlir::createCanonicalizerPass()";

mlir/lib/Transforms/BufferPlacement.cpp

This file was added.

				//===- BufferPlacement.cpp - the impl for buffer placement ---------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements logic for computing correct alloc and dealloc positions.
				// The main class is the BufferPlacementPass class that implements the
				// underlying algorithm. In order to put allocations and deallocations at safe
				// positions, it is significantly important to put them into the correct blocks.
				// However, the liveness analysis does not pay attention to aliases, which can
				// occur due to branches (and their associated block arguments) in general. For
				// this purpose, BufferPlacement firstly finds all possible aliases for a single
				// value (using the BufferPlacementAliasAnalysis class). Consider the following
				// example:
				//
				// ^bb0(%arg0):
				// cond_br %cond, ^bb1, ^bb2
				// ^bb1:
				// br ^exit(%arg0)
				// ^bb2:
				// %new_value = ...
				// br ^exit(%new_value)
				// ^exit(%arg1):
				// return %arg1;
				//
				// Using liveness information on its own would cause us to place the allocs and
				// deallocs in the wrong block. This is due to the fact that %new_value will not
				// be liveOut of its block. Instead, we have to place the alloc for %new_value
				// in bb0 and its associated dealloc in exit. Using the class
				// BufferPlacementAliasAnalysis, we will find out that %new_value has a
				// potential alias %arg1. In order to find the dealloc position we have to find
				// all potential aliases, iterate over their uses and find the common
				// post-dominator block. In this block we can safely be sure that %new_value
				// will die and can use liveness information to determine the exact operation
				// after which we have to insert the dealloc. Finding the alloc position is
				// highly similar and non- obvious. Again, we have to consider all potential
				// aliases and find the common dominator block to place the alloc.
				//
				// TODO:
				// The current implementation does not support loops and the resulting code will
				// be invalid with respect to program semantics. The only thing that is
				// currently missing is a high-level loop analysis that allows us to move allocs
				// and deallocs outside of the loop blocks.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Transforms/BufferPlacement.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/IR/Function.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/Passes.h"

				using namespace mlir;

				namespace {

				//===----------------------------------------------------------------------===//
				// BufferPlacementAliasAnalysis
				//===----------------------------------------------------------------------===//

				/// A straight-forward alias analysis which ensures that all aliases of all
				/// values will be determined. This is a requirement for the BufferPlacement
				/// class since you need to determine safe positions to place alloc and
				/// deallocs.
				class BufferPlacementAliasAnalysis {
				public:
				using ValueSetT = SmallPtrSet<Value, 16>;

				public:
				/// Constructs a new alias analysis using the op provided.
				BufferPlacementAliasAnalysis(Operation *op) { build(op->getRegions()); }

				/// Finds all immediate and indirect aliases this value could potentially
				/// have. Note that the resulting set will also contain the value provided as
				/// it is an alias of itself.
				ValueSetT resolve(Value value) const {
				ValueSetT result;
				resolveRecursive(value, result);
				return result;
				}

				private:
				/// Recursively determines alias information for the given value. It stores
				/// all newly found potential aliases in the given result set.
				void resolveRecursive(Value value, ValueSetT &result) const {
				if (!result.insert(value).second)
				return;
				auto it = aliases.find(value);
				if (it == aliases.end())
				return;
				for (auto alias : it->second)
				[Inline comment, Done] rriddle: nit: auto -> Value
				resolveRecursive(alias, result);
				}

				/// This function constructs a mapping from values to their immediate aliases.
				/// It iterates over all blocks, gets their predecessors, determines the
				/// values that will be passed to the corresponding block arguments and
				/// inserts them into the underlying map.
				void build(MutableArrayRef<Region> regions) {
				for (Region &region : regions) {
				for (Block &block : region) {
				// Iterate over all predecessors and get the values that are mapped to
				// the corresponding block arguments.
				for (auto it = block.pred_begin(), e = block.pred_end(); it != e;
				++it) {
				unsigned successorIndex = it.getSuccessorIndex();
				// Get the terminator and the values that will be passed to our block.
				auto branchInterface =
				dyn_cast<BranchOpInterface>((*it)->getTerminator());
				if (!branchInterface)
				continue;
				// Query the branch op interface to get the successor operands.
				auto successorOperands =
				branchInterface.getSuccessorOperands(successorIndex);
				if (successorOperands.hasValue()) {
				// Build the actual mapping of values to their immediate aliases.
				for (auto argPair : llvm::zip(block.getArguments(),
				successorOperands.getValue())) {
				aliases[std::get<1>(argPair)].insert(std::get<0>(argPair));
				}
				}
				}
				}
				}
				}

				/// Maps values to all immediate aliases this value can have.
				llvm::DenseMap<Value, ValueSetT> aliases;
				};
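
The transitive resolution performed by `resolve` and `resolveRecursive` above can be sketched outside of MLIR. This is a minimal standalone model, not the code under review: `ValueId`, `ValueSet` and `AliasMap` are hypothetical stand-ins for `Value`, `ValueSetT` and the `aliases` map, with values represented as plain integers.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical stand-ins: a value is identified by a small integer.
using ValueId = int;
using ValueSet = std::set<ValueId>;
// Maps a value to the block arguments that directly receive it via a branch.
using AliasMap = std::map<ValueId, ValueSet>;

// Adds `value` and everything reachable from it through `aliases` to `result`.
// Inserting before recursing makes the walk terminate on cyclic alias graphs.
void resolveRecursive(const AliasMap &aliases, ValueId value,
                      ValueSet &result) {
  if (!result.insert(value).second)
    return; // Already visited.
  auto it = aliases.find(value);
  if (it == aliases.end())
    return; // No block argument receives this value.
  for (ValueId alias : it->second)
    resolveRecursive(aliases, alias, result);
}

// Returns all direct and indirect aliases of `value`, including `value`.
ValueSet resolve(const AliasMap &aliases, ValueId value) {
  ValueSet result;
  resolveRecursive(aliases, value, result);
  assert(result.count(value) == 1 && "a value is always an alias of itself");
  return result;
}
```

In this model, a value forwarded to two block arguments, one of which is forwarded again, resolves to the full chain, while a value that is never passed to a successor block resolves to a set containing only itself.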

				//===----------------------------------------------------------------------===//
				// BufferPlacementPositions
				//===----------------------------------------------------------------------===//

				/// Stores correct alloc and dealloc positions to place dialect-specific alloc
				/// and dealloc operations.
				struct BufferPlacementPositions {
				public:
				BufferPlacementPositions()
				: allocPosition(nullptr), deallocPosition(nullptr) {}

				/// Creates a new positions tuple including alloc and dealloc positions.
				BufferPlacementPositions(Operation *allocPosition, Operation *deallocPosition)
				: allocPosition(allocPosition), deallocPosition(deallocPosition) {}

				/// Returns the alloc position before which the alloc operation has to be
				/// inserted.
				Operation *getAllocPosition() const { return allocPosition; }

				/// Returns the dealloc position after which the dealloc operation has to be
				/// inserted.
				Operation *getDeallocPosition() const { return deallocPosition; }

				private:
				Operation *allocPosition;
				Operation *deallocPosition;
				};

				//===----------------------------------------------------------------------===//
				// BufferPlacementAnalysis
				//===----------------------------------------------------------------------===//

				// The main buffer placement analysis used to place allocs and deallocs.
				class BufferPlacementAnalysis {
				public:
				using DeallocSetT = SmallPtrSet<Operation *, 2>;

				public:
				BufferPlacementAnalysis(Operation *op)
				: operation(op), liveness(op), dominators(op), postDominators(op),
				aliases(op) {}

				/// Computes the actual positions to place allocs and deallocs for the given
				/// value.
				BufferPlacementPositions
				computeAllocAndDeallocPositions(OpResult result) const {
				if (result.use_empty())
				return BufferPlacementPositions(result.getOwner(), result.getOwner());
				// Get all possible aliases
				[Inline comment, Done] rriddle: nit: Add punctuation here.
				auto possibleValues = aliases.resolve(result);
				return BufferPlacementPositions(getAllocPosition(result, possibleValues),
				getDeallocPosition(result, possibleValues));
				}

				/// Finds all associated dealloc nodes for the alloc nodes using alias
				/// information.
				DeallocSetT findAssociatedDeallocs(AllocOp alloc) const {
				DeallocSetT result;
				auto possibleValues = aliases.resolve(alloc);
				for (Value alias : possibleValues)
				for (Operation *user : alias.getUsers()) {
				if (isa<DeallocOp>(user))
				result.insert(user);
				}
				return result;
				}

				/// Dumps the buffer placement information to the given stream.
				void print(raw_ostream &os) const {
				os << "// ---- Buffer Placement -----\n";

				for (Region &region : operation->getRegions())
				for (Block &block : region)
				for (Operation &operation : block)
				for (OpResult result : operation.getResults()) {
				BufferPlacementPositions positions =
				computeAllocAndDeallocPositions(result);
				os << "Positions for ";
				result.print(os);
				os << "\n Alloc: ";
				positions.getAllocPosition()->print(os);
				os << "\n Dealloc: ";
				positions.getDeallocPosition()->print(os);
				os << "\n";
				}
				}

				private:
				/// Finds a correct placement block to store alloc/dealloc node according to
				/// the algorithm described at the top of the file. It supports dominator and
				/// post-dominator analyses via template arguments.
				template <typename DominatorT>
				Block *
				findPlacementBlock(OpResult result,
				const BufferPlacementAliasAnalysis::ValueSetT &aliases,
				const DominatorT &doms) const {
				// Start with the current block the value is defined in.
				Block *dom = result.getOwner()->getBlock();
				// Iterate over all aliases and their uses to find a safe placement block
				// according to the given dominator information.
				for (Value alias : aliases)
				for (Operation *user : alias.getUsers()) {
				// Move upwards in the dominator tree to find an appropriate
				// dominator block that takes the current use into account.
				dom = doms.findNearestCommonDominator(dom, user->getBlock());
				}
				return dom;
				}

				/// Finds a correct alloc position according to the algorithm described at
				/// the top of the file.
				Operation *getAllocPosition(
				OpResult result,
				const BufferPlacementAliasAnalysis::ValueSetT &aliases) const {
				// Determine the actual block to place the alloc and get liveness
				// information.
				auto placementBlock = findPlacementBlock(result, aliases, dominators);
				auto livenessInfo = liveness.getLiveness(placementBlock);
				[Inline comment, Done] mehdi_amini: Can you spell the types above?

				// We have to ensure that the alloc will be before the first use of all
				// aliases of the given value. We first assume that there are no uses in the
				// placementBlock and that we can safely place the alloc before the
				// terminator at the end of the block.
				Operation *startOperation = placementBlock->getTerminator();
				// Iterate over all aliases and ensure that the startOperation will point to
				// the first operation of all potential aliases in the placementBlock.
				for (Value alias : aliases) {
				Operation *aliasStartOperation = livenessInfo->getStartOperation(alias);
				// Check whether the aliasStartOperation lies in the desired block and
				// whether it is before the current startOperation. If yes, this will be
				// the new startOperation.
				if (aliasStartOperation->getBlock() == placementBlock &&
				aliasStartOperation->isBeforeInBlock(startOperation))
				startOperation = aliasStartOperation;
				}
				// startOperation is the first operation before which we can safely store
				// the alloc taking all potential aliases into account.
				return startOperation;
				}

				/// Finds a correct dealloc position according to the algorithm described at
				/// the top of the file.
				Operation *getDeallocPosition(
				OpResult result,
				const BufferPlacementAliasAnalysis::ValueSetT &aliases) const {
				// Determine the actual block to place the dealloc and get liveness
				// information.
				auto placementBlock = findPlacementBlock(result, aliases, postDominators);
				auto livenessInfo = liveness.getLiveness(placementBlock);
				[Inline comment, Done] mehdi_amini: (here as well)

				// We have to ensure that the dealloc will be after the last use of all
				// aliases of the given value. We first assume that there are no uses in the
				// placementBlock and that we can safely place the dealloc at the beginning.
				Operation *endOperation = &placementBlock->front();
				// Iterate over all aliases and ensure that the endOperation will point to
				// the last operation of all potential aliases in the placementBlock.
				for (Value alias : aliases) {
				auto aliasEndOperation =
				[Inline comment, Done] mehdi_amini: `Operation *` ?
				livenessInfo->getEndOperation(alias, endOperation);
				// Check whether the aliasEndOperation lies in the desired block and
				// whether it is behind the current endOperation. If yes, this will be the
				// new endOperation.
				if (aliasEndOperation->getBlock() == placementBlock &&
				endOperation->isBeforeInBlock(aliasEndOperation))
				endOperation = aliasEndOperation;
				}
				// endOperation is the last operation behind which we can safely store the
				// dealloc taking all potential aliases into account.
				return endOperation;
				}

				/// The operation this transformation was constructed from.
				Operation *operation;

				/// The underlying liveness analysis to compute fine grained information about
				/// alloc and dealloc positions.
				Liveness liveness;

				/// The dominator analysis to place allocs in the appropriate blocks.
				DominanceInfo dominators;

				/// The post dominator analysis to place deallocs in the appropriate blocks.
				PostDominanceInfo postDominators;

				/// The internal alias analysis to ensure that allocs and deallocs take all
				/// their potential aliases into account.
				BufferPlacementAliasAnalysis aliases;
				};
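
The block selection in `findPlacementBlock` above amounts to folding a nearest-common-dominator query over the blocks of all uses. The following is a minimal standalone sketch under stated assumptions, not MLIR's `DominanceInfo` API: blocks are plain integers, and the tree is given as a hypothetical `idom` vector in which the root block is its own parent. Passing an immediate post-dominator vector instead gives the dealloc variant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical representation: tree[b] is the immediate (post-)dominator of
// block b, and the root block is its own parent.
using DomTree = std::vector<int>;

// Number of tree edges between `block` and the root.
int depthOf(const DomTree &tree, int block) {
  assert(block >= 0 && block < static_cast<int>(tree.size()));
  int depth = 0;
  while (tree[block] != block) {
    block = tree[block];
    ++depth;
  }
  return depth;
}

// Walks both blocks up the tree until they meet.
int nearestCommonDominator(const DomTree &tree, int a, int b) {
  int depthA = depthOf(tree, a), depthB = depthOf(tree, b);
  while (depthA > depthB) { a = tree[a]; --depthA; }
  while (depthB > depthA) { b = tree[b]; --depthB; }
  while (a != b) { a = tree[a]; b = tree[b]; }
  return a;
}

// Starts at the defining block and moves up for every block containing a use.
int findPlacementBlock(const DomTree &tree, int definingBlock,
                       const std::vector<int> &useBlocks) {
  int dom = definingBlock;
  for (int useBlock : useBlocks)
    dom = nearestCommonDominator(tree, dom, useBlock);
  return dom;
}
```

For a diamond CFG in which bb0 branches to bb1 and bb2 and both join in bb3, the dominator vector is `{0, 0, 0, 0}` and the post-dominator vector is `{3, 3, 3, 3}`. With a buffer defined in bb2 and an alias used in bb3, the sketch selects bb0 for the alloc and bb3 for the dealloc.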

				//===----------------------------------------------------------------------===//
				// BufferPlacementPass
				//===----------------------------------------------------------------------===//

				/// The actual buffer placement pass that moves alloc and dealloc nodes into
				/// the right positions. It uses the algorithm described at the top of the file.
				// TODO: create a templated version that allows to match dialect-specific
				// alloc/dealloc nodes and to insert dialect-specific dealloc node.
				struct BufferPlacementPass
				: mlir::PassWrapper<BufferPlacementPass, FunctionPass> {
				void runOnFunction() override {
				// Get required analysis information first.
				auto &analysis = getAnalysis<BufferPlacementAnalysis>();

				// Compute an initial placement of all nodes.
				llvm::SmallDenseMap<Value, BufferPlacementPositions, 16> placements;
				getFunction().walk([&](AllocOp alloc) {
				placements[alloc] = analysis.computeAllocAndDeallocPositions(
				alloc.getOperation()->getResult(0));
				});

				// Move alloc (and dealloc - if any) nodes into the right places
				// and insert dealloc nodes if necessary.
				getFunction().walk([&](AllocOp alloc) {
				// Find already associated dealloc nodes.
				auto deallocs = analysis.findAssociatedDeallocs(alloc);
				assert(deallocs.size() < 2 &&
				[Inline comment, Done] rriddle: Can you emit an error and use `signalPassFailure()` instead of assert here?
				"Not supported number of associated dealloc operations");

				// Move alloc node to the right place.
				BufferPlacementPositions &positions = placements[alloc];
				Operation *allocOperation = alloc.getOperation();
				[Inline comment, Done] rriddle: Can you change this walk to return WalkResult? That allows for interrupting the walk early. You can return WalkResult::interrupt() (or failure()) to stop the walk, and WalkResult::advance()/success() to continue the walk.
				allocOperation->moveBefore(positions.getAllocPosition());

				// If there is an existing dealloc, move it to the right place.
				if (deallocs.size()) {
				Operation *nextOp = positions.getDeallocPosition()->getNextNode();
				assert(nextOp && "Invalid Dealloc operation position");
				(*deallocs.begin())->moveBefore(nextOp);
				} else {
				// If there is no dealloc node, insert one in the right place.
				OpBuilder builder(alloc);
				builder.setInsertionPointAfter(positions.getDeallocPosition());
				builder.create<DeallocOp>(allocOperation->getLoc(), alloc);
				}
				});
				};
				};

				} // end anonymous namespace
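
The inline suggestion to replace the assert with an interruptible walk can be illustrated with a standalone model. This is only a sketch of the control flow, not MLIR's walk implementation: `WalkStatus`, `Node` and `walk` are hypothetical names, and the post-order traversal is an assumption made for the example.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a walk result: continue or stop the traversal.
enum class WalkStatus { Advance, Interrupt };

// Hypothetical operation tree; `id` identifies a node in the example.
struct Node {
  int id;
  std::vector<Node> children;
};

// Visits children first, then the node itself. As soon as any callback
// returns Interrupt, no further nodes are visited and Interrupt is returned.
WalkStatus walk(const Node &node,
                const std::function<WalkStatus(const Node &)> &callback) {
  assert(callback && "callback must be callable");
  for (const Node &child : node.children)
    if (walk(child, callback) == WalkStatus::Interrupt)
      return WalkStatus::Interrupt;
  return callback(node);
}
```

A caller can then treat an `Interrupt` result as the signal to report an error and fail the pass, instead of aborting on the first unsupported operation.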

				//===----------------------------------------------------------------------===//
				// BufferAssignmentPlacer
				//===----------------------------------------------------------------------===//

				/// Creates a new assignment placer.
				BufferAssignmentPlacer::BufferAssignmentPlacer(Operation *op) : operation(op) {}

				/// Computes the actual position to place allocs for the given value.
				OpBuilder::InsertPoint
				BufferAssignmentPlacer::computeAllocPosition(OpResult result) {
				Operation *owner = result.getOwner();
				return OpBuilder::InsertPoint(owner->getBlock(), Block::iterator(owner));
				}

				//===----------------------------------------------------------------------===//
				// FunctionAndBlockSignatureConverter
				//===----------------------------------------------------------------------===//

				// Performs the actual signature rewriting step.
				LogicalResult FunctionAndBlockSignatureConverter::matchAndRewrite(
				FuncOp funcOp, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const {
				auto toMemrefConverter = [&](Type t) -> Type {
				if (auto tensorType = t.dyn_cast<RankedTensorType>())
				return MemRefType::get(tensorType.getShape(),
				tensorType.getElementType());
				return t;
				};
				// Converting tensor-type function arguments to memref-type.
				auto funcType = funcOp.getType();
				TypeConverter::SignatureConversion conversion(funcType.getNumInputs());
				for (auto argType : llvm::enumerate(funcType.getInputs()))
				conversion.addInputs(argType.index(), toMemrefConverter(argType.value()));
				for (Type resType : funcType.getResults())
				conversion.addInputs(toMemrefConverter(resType));
				rewriter.updateRootInPlace(funcOp, [&] {
				funcOp.setType(
				rewriter.getFunctionType(conversion.getConvertedTypes(), llvm::None));
				rewriter.applySignatureConversion(&funcOp.getBody(), conversion);
				});
				// Converting tensor-type block arguments of all blocks inside the
				// function region to memref-type except for the entry block.
				for (Block &block : funcOp.getBlocks()) {
				if (block.isEntryBlock())
				continue;
				for (int i = 0, e = block.getNumArguments(); i < e; ++i) {
				auto arg = block.getArgument(i);
				arg.setType(toMemrefConverter(arg.getType()));
				[Inline comment, Not Done] rriddle: This isn't really valid to do directly in a pattern, as it is being done outside of the rewriter. Seems like this pattern can just be replaced by using a TypeConverter instead.
				[Inline comment, Done] dfki-ehna (author): TypeConverter has convertBlockSignature which returns SignatureConversion but there is no applySignatureConversion for the rewriter that gets a Block as an input (the current version only accepts a region). Are we missing the point?
				[Inline comment, Done] rriddle: When passing a type converter the non-entry blocks are converted automatically using that converter. After that I would expect that the default function conversion pattern would remove the need for this pattern: https://github.com/llvm/llvm-project/blob/3d178581ac7f5336b1ac75e31001de074ecca937/mlir/include/mlir/Transforms/DialectConversion.h#L300
				[Inline comment, Done] dfki-ehna (author): Thanks. Resolved.
				}
				}
				return success();
				}

				/// A helper method that marks functions as legal when all of their block
				/// argument types are MemRef or non-shaped types. BufferAssignmentPlacer
				/// expects all function and block argument types to be MemRef or non-shaped
				/// types. Using this helper method together with
				/// FunctionAndBlockSignatureConverter as a conversion pattern ensures that the
				/// block argument types are compatible with BufferAssignmentPlacer.
				void FunctionAndBlockSignatureConverter::addDynamicallyLegalFuncOp(
				ConversionTarget &target) {
				auto isLegalBlockArg = [](BlockArgument arg) -> bool {
				auto type = arg.getType();
				return type.isa<MemRefType>() || !type.isa<ShapedType>();
				[Inline comment, Done] rriddle: An easy way of doing this is just: `target.addDynamicallyLegalOp<FuncOp>([&](FuncOp funcOp) { return typeConverter.isSignatureLegal(funcOp.getType()); });`
				[Inline comment, Done] dfki-ehna (author): Thanks. We got rid of FunctionAndBlockSignatureConverter::addDynamicallyLegalFuncOp and added it directly to the TestBufferPlacement pass. So, the dialect experts also need to add this to their targets in their legalization passes.
				};
				target.addDynamicallyLegalOp<FuncOp>([&](FuncOp funcOp) {
				for (Block &block : funcOp.getBlocks()) {
				if (!llvm::all_of(block.getArguments(), isLegalBlockArg))
				return false;
				}
				return true;
				});
				}

				//===----------------------------------------------------------------------===//
				// BufferPlacementPass construction
				//===----------------------------------------------------------------------===//

				std::unique_ptr<Pass> mlir::createBufferPlacementPass() {
				return std::make_unique<BufferPlacementPass>();
				}

mlir/lib/Transforms/CMakeLists.txt

	add_subdirectory(Utils)			add_subdirectory(Utils)

	add_mlir_library(MLIRTransforms			add_mlir_library(MLIRTransforms
				BufferPlacement.cpp
	Canonicalizer.cpp			Canonicalizer.cpp
	CSE.cpp			CSE.cpp
	DialectConversion.cpp			DialectConversion.cpp
	Inliner.cpp			Inliner.cpp
	LocationSnapshot.cpp			LocationSnapshot.cpp
	LoopCoalescing.cpp			LoopCoalescing.cpp
	LoopFusion.cpp			LoopFusion.cpp
	LoopInvariantCodeMotion.cpp			LoopInvariantCodeMotion.cpp
	(27 unchanged lines not shown)

mlir/test/Transforms/buffer-placement-prepration.mlir

This file was added.

				// RUN: mlir-opt -test-buffer-placement-preparation -split-input-file %s | FileCheck %s -dump-input-on-failure

				// CHECK-LABEL: func @func_signature_conversion
				func @func_signature_conversion(%arg0: tensor<4x8xf32>) {
				return
				}
				// CHECK: ({{.*}}: memref<4x8xf32>) {

				// -----

				// CHECK-LABEL: func @non_void_to_void_return_op_converter
				func @non_void_to_void_return_op_converter(%arg0: tensor<4x8xf32>) -> tensor<4x8xf32> {
				return %arg0 : tensor<4x8xf32>
				}
				// CHECK: (%[[ARG0:.*]]: [[TYPE:.*]]<[[RANK:.*]]>, %[[RESULT:.*]]: [[TYPE]]<[[RANK]]>) {
				// CHECK-NEXT: linalg.copy(%[[ARG0]], %[[RESULT]])
				// CHECK-NEXT: return

				// -----

				// CHECK-LABEL: func @func_and_block_signature_conversion
				func @func_and_block_signature_conversion(%arg0 : tensor<2xf32>, %cond : i1, %arg1: tensor<4x4xf32>) -> tensor<4x4xf32>{
				cond_br %cond, ^bb1, ^bb2
				^bb1:
				br ^exit(%arg0 : tensor<2xf32>)
				^bb2:
				br ^exit(%arg0 : tensor<2xf32>)
				^exit(%arg2: tensor<2xf32>):
				return %arg1 : tensor<4x4xf32>
				}
				// CHECK: (%[[ARG0:.*]]: [[ARG0_TYPE:.*]], %[[COND:.*]]: i1, %[[ARG1:.*]]: [[ARG1_TYPE:.*]], %[[RESULT:.*]]: [[RESULT_TYPE:.*]]) {
				// CHECK: br ^[[EXIT_BLOCK:.*]](%[[ARG0]] : [[ARG0_TYPE]])
				// CHECK: br ^[[EXIT_BLOCK]](%[[ARG0]] : [[ARG0_TYPE]])
				// CHECK: ^[[EXIT_BLOCK]](%{{.*}}: [[ARG0_TYPE]])
				// CHECK-NEXT: linalg.copy(%[[ARG1]], %[[RESULT]])
				// CHECK-NEXT: return

				// -----

				// Test Case: Simple case for checking if BufferAssignmentPlacer creates AllocOps right before GenericOps.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @compute_allocs_position_simple
				func @compute_allocs_position_simple(%cond: i1, %arg0: tensor<2xf32>) -> tensor<2xf32>{
				%0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0 {
				^bb0(%gen1_arg0: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				%1 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %0 {
				^bb0(%gen2_arg0: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				return %1 : tensor<2xf32>
				}
				// CHECK: (%{{.*}}: {{.*}}, %[[ARG0:.*]]: memref<2xf32>,
				// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[FIRST_ALLOC]]
				// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[FIRST_ALLOC]], %[[SECOND_ALLOC]]

				// -----

				// Test Case: if-else case for checking if BufferAssignmentPlacer creates AllocOps right before GenericOps.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @compute_allocs_position
				func @compute_allocs_position(%cond: i1, %arg0: tensor<2xf32>) -> tensor<2xf32>{
				%0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0 {
				^bb0(%gen1_arg0: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				%1 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %0 {
				^bb0(%gen2_arg0: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				cond_br %cond, ^bb1(%arg0, %0: tensor<2xf32>, tensor<2xf32>),
				^bb2(%0, %arg0: tensor<2xf32>, tensor<2xf32>)
				^bb1(%arg1 : tensor<2xf32>, %arg2 : tensor<2xf32>):
				%2 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0 {
				^bb0(%gen3_arg0: f32):
				%tmp3 = exp %gen3_arg0 : f32
				linalg.yield %tmp3 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				%3 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %2 {
				^bb0(%gen4_arg0: f32):
				%tmp4 = exp %gen4_arg0 : f32
				linalg.yield %tmp4 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				br ^exit(%arg1, %arg2 : tensor<2xf32>, tensor<2xf32>)
				^bb2(%arg3 : tensor<2xf32>, %arg4 : tensor<2xf32>):
				%4 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0 {
				^bb0(%gen5_arg0: f32):
				%tmp5 = exp %gen5_arg0 : f32
				linalg.yield %tmp5 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				%5 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %4 {
				^bb0(%gen6_arg0: f32):
				%tmp6 = exp %gen6_arg0 : f32
				linalg.yield %tmp6 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				br ^exit(%arg3, %arg4 : tensor<2xf32>, tensor<2xf32>)
				^exit(%arg5 : tensor<2xf32>, %arg6 : tensor<2xf32>):
				%6 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0 {
				^bb0(%gen7_arg0: f32):
				%tmp7 = exp %gen7_arg0 : f32
				linalg.yield %tmp7 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				%7 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %6 {
				^bb0(%gen8_arg0: f32):
				%tmp8 = exp %gen8_arg0 : f32
				linalg.yield %tmp8 : f32
				}: tensor<2xf32> -> tensor<2xf32>
				return %7 : tensor<2xf32>
				}
				// CHECK: (%{{.*}}: {{.*}}, %[[ARG0:.*]]: memref<2xf32>,
				// CHECK-NEXT: %[[ALLOC0:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[ALLOC0]]
				// CHECK: %[[ALLOC1:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ALLOC0]], %[[ALLOC1]]
				// CHECK: cond_br %{{.*}}, ^[[BB0:.*]]({{.*}}), ^[[BB1:.*]](
				// CHECK-NEXT: ^[[BB0]]
				// CHECK-NEXT: %[[ALLOC2:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[ALLOC2]]
				// CHECK: %[[ALLOC3:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ALLOC2]], %[[ALLOC3]]
				// CHECK: br ^[[EXIT:.*]]({{.*}})
				// CHECK-NEXT: ^[[BB1]]
				// CHECK-NEXT: %[[ALLOC4:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[ALLOC4]]
				// CHECK: %[[ALLOC5:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ALLOC4]], %[[ALLOC5]]
				// CHECK: br ^[[EXIT]]
				// CHECK-NEXT: ^[[EXIT]]
				// CHECK-NEXT: %[[ALLOC6:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[ALLOC6]]
				// CHECK: %[[ALLOC7:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ALLOC6]], %[[ALLOC7]]

mlir/test/Transforms/buffer-placement.mlir

This file was added.

				// RUN: mlir-opt -buffer-placement -split-input-file %s | FileCheck %s -dump-input-on-failure

				// This file checks the behaviour of the BufferPlacement pass for moving Alloc and Dealloc
				// operations and inserting the missing DeallocOps in their correct positions.

				// Test Case:
				//    bb0
				//   /   \
				//  bb1  bb2 <- Initial position of AllocOp
				//   \   /
				//    bb3
				// BufferPlacement Expected Behaviour: It should move the existing AllocOp to the entry block,
				// and insert a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @condBranch
				func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				br ^bb3(%arg1 : memref<2xf32>)
				^bb2:
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				br ^bb3(%0 : memref<2xf32>)
				^bb3(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: cond_br
				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc %[[ALLOC]]
				// CHECK-NEXT: return

				// -----

				// Test Case: Existing AllocOp with no users.
				// BufferPlacement Expected Behaviour: It should insert a DeallocOp right before ReturnOp.

				// CHECK-LABEL: func @emptyUsesValue
				func @emptyUsesValue(%arg0: memref<4xf32>) {
				%0 = alloc() : memref<4xf32>
				return
				}
				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: dealloc %[[ALLOC]]
				// CHECK-NEXT: return

				// -----

				// Test Case:
				//    bb0
				//   /   \
				//  |    bb1 <- Initial position of AllocOp
				//   \   /
				//    bb2
				// BufferPlacement Expected Behaviour: It should move the existing AllocOp to the entry block
				// and insert a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @criticalEdge
				func @criticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
				^bb1:
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				br ^bb2(%0 : memref<2xf32>)
				^bb2(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[ALLOC:.*]] = alloc()
				// CHECK-NEXT: cond_br
				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc %[[ALLOC]]
				// CHECK-NEXT: return

				// -----

				// Test Case:
				//    bb0 <- Initial position of AllocOp
				//   /   \
				//  |    bb1
				//   \   /
				//    bb2
				// BufferPlacement Expected Behaviour: It shouldn't move the alloc position. It only inserts
				// a DeallocOp at the exit block after CopyOp since %1 is an alias for %0 and %arg1.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @invCriticalEdge
				func @invCriticalEdge(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				cond_br %arg0, ^bb1, ^bb2(%arg1 : memref<2xf32>)
				^bb1:
				br ^bb2(%0 : memref<2xf32>)
				^bb2(%1: memref<2xf32>):
				"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: dealloc
				// CHECK-NEXT: return

				// -----

				// Test Case:
				//    bb0 <- Initial position of the first AllocOp
				//   /   \
				//  bb1  bb2
				//   \   /
				//    bb3 <- Initial position of the second AllocOp
				// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only inserts two missing DeallocOps in the exit block.
				// %5 is an alias for %0. Therefore, the DeallocOp for %0 should occur after the last GenericOp. The Dealloc for %7 should
				// happen after the CopyOp.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElse
				func @ifElse(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
				^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
				%7 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %5, %7 {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: memref<2xf32>, memref<2xf32>
				"linalg.copy"(%7, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: dealloc %[[FIRST_ALLOC]]
				// CHECK-NEXT: linalg.copy
				// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]
				// CHECK-NEXT: return

				// -----

				// Test Case: No users for buffer in if-else CFG
				//    bb0 <- Initial position of AllocOp
				//   /   \
				//  bb1  bb2
				//   \   /
				//    bb3
				// BufferPlacement Expected Behaviour: It shouldn't move the AllocOp. It only inserts a missing DeallocOp
				// in the exit block since %5 or %6 are the latest aliases of %0.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElseNoUsers
				func @ifElseNoUsers(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb3(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				br ^bb3(%3, %4 : memref<2xf32>, memref<2xf32>)
				^bb3(%5: memref<2xf32>, %6: memref<2xf32>):
				"linalg.copy"(%arg1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: dealloc
				// CHECK-NEXT: return

				// -----

				// Test Case:
				//      bb0 <- Initial position of the first AllocOp
				//     /   \
				//   bb1  bb2
				//    |   /  \
				//    |  bb3  bb4
				//    \     \  /
				//     \     /
				//       bb5 <- Initial position of the second AllocOp
				// BufferPlacement Expected Behaviour: AllocOps shouldn't be moved.
				// Two missing DeallocOps should be inserted in the exit block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @ifElseNested
				func @ifElseNested(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg1, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				cond_br %arg0, ^bb1(%arg1, %0 : memref<2xf32>, memref<2xf32>), ^bb2(%0, %arg1 : memref<2xf32>, memref<2xf32>)
				^bb1(%1: memref<2xf32>, %2: memref<2xf32>):
				br ^bb5(%1, %2 : memref<2xf32>, memref<2xf32>)
				^bb2(%3: memref<2xf32>, %4: memref<2xf32>):
				cond_br %arg0, ^bb3(%3 : memref<2xf32>), ^bb4(%4 : memref<2xf32>)
				^bb3(%5: memref<2xf32>):
				br ^bb5(%5, %3 : memref<2xf32>, memref<2xf32>)
				^bb4(%6: memref<2xf32>):
				br ^bb5(%3, %6 : memref<2xf32>, memref<2xf32>)
				^bb5(%7: memref<2xf32>, %8: memref<2xf32>):
				%9 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %7, %9 {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: memref<2xf32>, memref<2xf32>
				"linalg.copy"(%9, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic
				// CHECK: dealloc %[[FIRST_ALLOC]]
				// CHECK-NEXT: linalg.copy
				// CHECK-NEXT: dealloc %[[SECOND_ALLOC]]
				// CHECK-NEXT: return

				// -----

				// Test Case: Dead operations in a single block.
				// BufferPlacement Expected Behaviour: It shouldn't move the AllocOps. It only inserts the two missing DeallocOps
				// after the last GenericOp.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @redundantOperations
				func @redundantOperations(%arg0: memref<2xf32>) {
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				%1 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %0, %1 {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: memref<2xf32>, memref<2xf32>
				return
				}

				// CHECK: (%[[ARG0:.*]]: {{.*}})
				// CHECK-NEXT: %[[FIRST_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[ARG0]], %[[FIRST_ALLOC]]
				// CHECK: %[[SECOND_ALLOC:.*]] = alloc()
				// CHECK-NEXT: linalg.generic {{.*}} %[[FIRST_ALLOC]], %[[SECOND_ALLOC]]
				// CHECK: dealloc
				// CHECK-NEXT: dealloc
				// CHECK-NEXT: return

				// -----

				// Test Case:
				//                                            bb0
				//                                           /   \
				// Initial position of the first AllocOp -> bb1  bb2 <- Initial position of the second AllocOp
				//                                           \   /
				//                                            bb3
				// BufferPlacement Expected Behaviour: Both AllocOps should be moved to the entry block. Both missing DeallocOps should be inserted in
				// the exit block after the CopyOp since %arg2 is an alias for %0 and %1.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @moving_alloc_and_inserting_missing_dealloc
				func @moving_alloc_and_inserting_missing_dealloc(%cond: i1, %arg0: memref<2xf32>, %arg1: memref<2xf32>){
				cond_br %cond, ^bb1, ^bb2
				^bb1:
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				br ^exit(%0 : memref<2xf32>)
				^bb2:
				%1 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %1 {
				^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
				%tmp2 = exp %gen2_arg0 : f32
				linalg.yield %tmp2 : f32
				}: memref<2xf32>, memref<2xf32>
				br ^exit(%1 : memref<2xf32>)
				^exit(%arg2: memref<2xf32>):
				"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc
				// CHECK-NEXT: dealloc
				// CHECK-NEXT: return

				// -----

				// Test Case: Invalid position of the DeallocOp. There is a user after deallocation.
				//    bb0
				//   /   \
				//  bb1  bb2 <- Initial position of AllocOp
				//   \   /
				//    bb3
				// BufferPlacement Expected Behaviour: It should move the AllocOp to the entry block.
				// The existing DeallocOp should be moved to exit block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @moving_invalid_dealloc_op_complex
				func @moving_invalid_dealloc_op_complex(%cond: i1, %arg0: memref<2xf32>, %arg1: memref<2xf32>){
				cond_br %cond, ^bb1, ^bb2
				^bb1:
				br ^exit(%arg0 : memref<2xf32>)
				^bb2:
				%1 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %1 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				dealloc %1 : memref<2xf32>
				br ^exit(%1 : memref<2xf32>)
				^exit(%arg2: memref<2xf32>):
				"linalg.copy"(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK-NEXT: %{{.*}} = alloc()
				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc
				// CHECK-NEXT: return

				// -----

				// Test Case: Inserting missing DeallocOp in a single block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @inserting_missing_dealloc_simple
				func @inserting_missing_dealloc_simple(%arg0 : memref<2xf32>, %arg1: memref<2xf32>){
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc

				// -----

				// Test Case: Moving invalid DeallocOp (there is a user after deallocation) in a single block.

				#map0 = affine_map<(d0) -> (d0)>

				// CHECK-LABEL: func @moving_invalid_dealloc_op
				func @moving_invalid_dealloc_op(%arg0 : memref<2xf32>, %arg1: memref<2xf32>){
				%0 = alloc() : memref<2xf32>
				linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel"]} %arg0, %0 {
				^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
				%tmp1 = exp %gen1_arg0 : f32
				linalg.yield %tmp1 : f32
				}: memref<2xf32>, memref<2xf32>
				dealloc %0 : memref<2xf32>
				"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
				return
				}

				// CHECK: linalg.copy
				// CHECK-NEXT: dealloc
				No newline at end of file
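
The two single-block cases above (`@inserting_missing_dealloc_simple` and `@moving_invalid_dealloc_op`) both expect the `dealloc` to end up directly after the last user of the buffer. The following is a minimal sketch of that rule only, written against a toy op list rather than the MLIR API. `ToyOp`, `usesValue`, and `placeDealloc` are hypothetical names introduced for illustration; aliases, multiple blocks, and the dominance-based analyses that the actual pass relies on are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an operation in a single block (not the MLIR API):
// an op name plus the names of the values it uses.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
};

// True if `op` uses the value named `value`.
static bool usesValue(const ToyOp &op, const std::string &value) {
  return std::find(op.operands.begin(), op.operands.end(), value) !=
         op.operands.end();
}

// Drops any existing dealloc of `buffer` and inserts a single dealloc
// directly after the last remaining user. If the buffer has no other users,
// the input block is returned unchanged (that case is out of scope here).
std::vector<ToyOp> placeDealloc(const std::vector<ToyOp> &ops,
                                const std::string &buffer) {
  // Keep everything except deallocs of `buffer`.
  std::vector<ToyOp> kept;
  for (const ToyOp &op : ops)
    if (!(op.name == "dealloc" && usesValue(op, buffer)))
      kept.push_back(op);

  // Find the last op that still uses the buffer.
  std::size_t lastUser = kept.size();
  for (std::size_t i = 0; i < kept.size(); ++i)
    if (usesValue(kept[i], buffer))
      lastUser = i;
  if (lastUser == kept.size())
    return ops;

  // The terminator must stay last, so the last user may not be the final op.
  assert(lastUser + 1 < kept.size() && "last user must not be the terminator");
  kept.insert(kept.begin() + static_cast<std::ptrdiff_t>(lastUser + 1),
              ToyOp{"dealloc", {buffer}});
  return kept;
}
```

Applied to a toy version of `@moving_invalid_dealloc_op` (alloc, generic, dealloc, copy, return), the sketch yields alloc, generic, copy, dealloc, return, which matches the `CHECK: linalg.copy` / `CHECK-NEXT: dealloc` expectation.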

mlir/test/lib/Transforms/CMakeLists.txt

	add_llvm_library(MLIRTestTransforms			add_llvm_library(MLIRTestTransforms
	TestAllReduceLowering.cpp			TestAllReduceLowering.cpp
				TestBufferPlacement.cpp
	TestCallGraph.cpp			TestCallGraph.cpp
	TestConstantFold.cpp			TestConstantFold.cpp
	TestConvertGPUKernelToCubin.cpp			TestConvertGPUKernelToCubin.cpp
	TestDominance.cpp			TestDominance.cpp
	TestLoopFusion.cpp			TestLoopFusion.cpp
	TestGpuMemoryPromotion.cpp			TestGpuMemoryPromotion.cpp
	TestGpuParallelLoopMapping.cpp			TestGpuParallelLoopMapping.cpp
	TestInlining.cpp			TestInlining.cpp
	(45 lines not shown)

mlir/test/lib/Transforms/TestBufferPlacement.cpp

This file was added.

				//===- TestBufferPlacement.cpp - Test for buffer placement ------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements logic for testing buffer placement including its
				// utility converters.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/IR/Function.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Pass/PassManager.h"
				#include "mlir/Transforms/BufferPlacement.h"

				using namespace mlir;

				namespace {
				/// This pass tests the computeAllocPosition helper method and two provided
				/// operation converters, FunctionAndBlockSignatureConverter and
				/// NonVoidToVoidReturnOpConverter. Furthermore, this pass converts linalg
				/// operations on tensors to linalg operations on buffers to prepare them for
				/// the BufferPlacement pass that can be applied afterwards.
				struct TestBufferPlacementPreparationPass
				rriddle (inline comment): A function pass is not allowed to mutate the public type of the function, so this should be a module pass.
				: mlir::PassWrapper<TestBufferPlacementPreparationPass, FunctionPass> {

				/// Converts tensor-type generic linalg operations to memref ones using buffer
				/// assignment.
				class GenericOpConverter
				: public BufferAssignmentOpConversionPattern<linalg::GenericOp> {
				public:
				using BufferAssignmentOpConversionPattern<
				linalg::GenericOp>::BufferAssignmentOpConversionPattern;

				LogicalResult
				matchAndRewrite(linalg::GenericOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const final {
				auto loc = op.getLoc();
				SmallVector<Value, 4> args(operands.begin(), operands.end());

				// Update all types to memref types.
				auto results = op.getOperation()->getResults();
				for (auto result : results) {
				auto type = result.getType().dyn_cast<ShapedType>();
				if (!type)
				return op.emitOpError()
				<< "tensor to buffer conversion expects ranked results";
				if (!type.hasStaticShape())
				return rewriter.notifyMatchFailure(
				op, "dynamic shapes not currently supported");
				auto memrefType =
				MemRefType::get(type.getShape(), type.getElementType());

				// Compute alloc position and insert a custom allocation node.
				OpBuilder::InsertionGuard guard(rewriter);
				rewriter.restoreInsertionPoint(
				bufferAssignment->computeAllocPosition(result));
				auto alloc = rewriter.create<AllocOp>(loc, memrefType);
				result.replaceAllUsesWith(alloc);
				args.push_back(alloc);
				}

				// Generate a new linalg operation that works on buffers.
				auto linalgOp = rewriter.create<linalg::GenericOp>(
				loc, llvm::None, args, rewriter.getI64IntegerAttr(operands.size()),
				rewriter.getI64IntegerAttr(results.size()), op.indexing_maps(),
				op.iterator_types(), op.docAttr(), op.library_callAttr());

				// Move regions from the old operation to the new one.
				auto &region = linalgOp.region();
				rewriter.inlineRegionBefore(op.region(), region, region.end());

				// TODO: verify the internal memref-based linalg functionality.
				auto &entryBlock = region.front();
				for (auto result : results) {
				auto type = result.getType().cast<ShapedType>();
				entryBlock.addArgument(type.getElementType());
				}

				rewriter.eraseOp(op);
				return success();
				}
				};

				void populateTensorLinalgToBufferLinalgConversionPattern(
				MLIRContext *context, BufferAssignmentPlacer *placer,
				OwningRewritePatternList *patterns) {
				// clang-format off
				patterns->insert<
				FunctionAndBlockSignatureConverter,
				GenericOpConverter,
				NonVoidToVoidReturnOpConverter<
				ReturnOp, ReturnOp, linalg::CopyOp>
				>(context,placer);
				// clang-format on
				}

				void runOnFunction() override {
				OwningRewritePatternList patterns;
				auto &context = getContext();
				ConversionTarget target(context);

				// Make all linalg operations illegal as long as they work on tensors.
				auto isLegalOperation = [](Operation *op) {
				auto isIllegalValue = [](Value operand) {
				return operand.getType().isa<TensorType>();
				};
				auto operands = op->getOperands();
				auto results = op->getResults();
				return std::none_of(operands.begin(), operands.end(), isIllegalValue) &&
				std::none_of(results.begin(), results.end(), isIllegalValue);
				};
				target.addLegalDialect<StandardOpsDialect>();
				target.addDynamicallyLegalDialect<linalg::LinalgDialect>(
				Optional<ConversionTarget::DynamicLegalityCallbackFn>(
				isLegalOperation));

				// Mark return operations illegal as long as they return values.
				target.addDynamicallyLegalOp<mlir::ReturnOp>(
				[](mlir::ReturnOp returnOp) { return returnOp.getNumOperands() == 0; });

				auto function = getFunction();
				mehdi_amini (inline comment): FuncOp
				BufferAssignmentPlacer placer(function);
				FunctionAndBlockSignatureConverter::addDynamicallyLegalFuncOp(target);
				populateTensorLinalgToBufferLinalgConversionPattern(function.getContext(),
				rriddle (inline comment): Can you just do this inside of the converter constructor? Otherwise, you don't need a specific converter class.
				&placer, &patterns);

				// Do partial conversion so we can have unknown ops in tests.
				if (failed(applyPartialConversion(function, target, patterns, nullptr))) {
				signalPassFailure();
				}
				};
				};
				} // end anonymous namespace

				namespace mlir {
				void registerTestBufferPlacementPreparationPass() {
				PassRegistration<TestBufferPlacementPreparationPass>(
				"test-buffer-placement-preparation",
				"Tests buffer placement helper methods including its "
				"operation-conversion patterns");
				}
				} // end namespace mlir
				No newline at end of file
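
The legality callback in `runOnFunction` treats a linalg operation as legal only once none of its operands or results has a tensor type. The following is a minimal, self-contained sketch of that predicate, using toy types instead of the MLIR classes. `ToyType`, `ToyOperation`, and `legalityExamples` are hypothetical names introduced for illustration; only the `std::none_of` structure mirrors the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy type kinds standing in for MLIR types (hypothetical, not the MLIR API).
enum class ToyType { Tensor, MemRef, Scalar };

// Toy operation: just the types of its operands and results.
struct ToyOperation {
  std::vector<ToyType> operandTypes;
  std::vector<ToyType> resultTypes;
};

// An operation is legal when no operand and no result is tensor-typed.
bool isLegalOperation(const ToyOperation &op) {
  auto isIllegalType = [](ToyType type) { return type == ToyType::Tensor; };
  return std::none_of(op.operandTypes.begin(), op.operandTypes.end(),
                      isIllegalType) &&
         std::none_of(op.resultTypes.begin(), op.resultTypes.end(),
                      isIllegalType);
}

// Usage example: a buffer-only op is legal, any tensor makes it illegal.
void legalityExamples() {
  ToyOperation onBuffers{{ToyType::MemRef, ToyType::MemRef}, {}};
  ToyOperation onTensors{{ToyType::Tensor}, {ToyType::Tensor}};
  assert(isLegalOperation(onBuffers));
  assert(!isLegalOperation(onTensors));
}
```

Because `std::none_of` returns true for an empty range, an operation without operands and results counts as legal under this predicate.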

mlir/tools/mlir-opt/mlir-opt.cpp

(35 lines not shown)
void registerPatternsTestPass();		void registerPatternsTestPass();
void registerPrintOpAvailabilityPass();		void registerPrintOpAvailabilityPass();
void registerSideEffectTestPasses();		void registerSideEffectTestPasses();
void registerSimpleParametricTilingPass();		void registerSimpleParametricTilingPass();
void registerSymbolTestPasses();		void registerSymbolTestPasses();
void registerTestAffineDataCopyPass();		void registerTestAffineDataCopyPass();
void registerTestAllReduceLoweringPass();		void registerTestAllReduceLoweringPass();
void registerTestAffineLoopUnswitchingPass();		void registerTestAffineLoopUnswitchingPass();
		void registerTestBufferPlacementPreparationPass();
void registerTestLinalgMatmulToVectorPass();		void registerTestLinalgMatmulToVectorPass();
void registerTestLoopPermutationPass();		void registerTestLoopPermutationPass();
void registerTestCallGraphPass();		void registerTestCallGraphPass();
void registerTestConstantFold();		void registerTestConstantFold();
void registerTestConvertGPUKernelToCubinPass();		void registerTestConvertGPUKernelToCubinPass();
void registerTestDominancePass();		void registerTestDominancePass();
void registerTestFunc();		void registerTestFunc();
void registerTestGpuMemoryPromotionPass();		void registerTestGpuMemoryPromotionPass();
(36 lines not shown)
verifyPasses("verify-each",		verifyPasses("verify-each",
cl::desc("Run the verifier after each transformation pass"),		cl::desc("Run the verifier after each transformation pass"),
cl::init(true));		cl::init(true));

static cl::opt<bool> allowUnregisteredDialects(		static cl::opt<bool> allowUnregisteredDialects(
"allow-unregistered-dialect",		"allow-unregistered-dialect",
cl::desc("Allow operation with no registered dialects"), cl::init(false));		cl::desc("Allow operation with no registered dialects"), cl::init(false));

void registerTestPasses() {		void registerTestPasses() {
registerConvertToTargetEnvPass();		registerConvertToTargetEnvPass();
		mehdi_amini (inline comment): This isn't in the test directory, so it shouldn't be registered here but use the same mechanism as the other non-tests passes.
registerInliner();		registerInliner();
registerMemRefBoundCheck();		registerMemRefBoundCheck();
registerPassManagerTestPass();		registerPassManagerTestPass();
registerPatternsTestPass();		registerPatternsTestPass();
registerPrintOpAvailabilityPass();		registerPrintOpAvailabilityPass();
registerSideEffectTestPasses();		registerSideEffectTestPasses();
registerSimpleParametricTilingPass();		registerSimpleParametricTilingPass();
registerSymbolTestPasses();		registerSymbolTestPasses();
registerTestAffineDataCopyPass();		registerTestAffineDataCopyPass();
registerTestAllReduceLoweringPass();		registerTestAllReduceLoweringPass();
registerTestAffineLoopUnswitchingPass();		registerTestAffineLoopUnswitchingPass();
registerTestLinalgMatmulToVectorPass();		registerTestLinalgMatmulToVectorPass();
registerTestLoopPermutationPass();		registerTestLoopPermutationPass();
registerTestCallGraphPass();		registerTestCallGraphPass();
registerTestConstantFold();		registerTestConstantFold();
#if MLIR_CUDA_CONVERSIONS_ENABLED		#if MLIR_CUDA_CONVERSIONS_ENABLED
registerTestConvertGPUKernelToCubinPass();		registerTestConvertGPUKernelToCubinPass();
#endif		#endif
		registerTestBufferPlacementPreparationPass();
registerTestDominancePass();		registerTestDominancePass();
registerTestFunc();		registerTestFunc();
registerTestGpuMemoryPromotionPass();		registerTestGpuMemoryPromotionPass();
registerTestLinalgTransforms();		registerTestLinalgTransforms();
registerTestLivenessPass();		registerTestLivenessPass();
registerTestLoopFusion();		registerTestLoopFusion();
registerTestLoopMappingPass();		registerTestLoopMappingPass();
registerTestMatchers();		registerTestMatchers();
(62 lines not shown)

Diff 259263
