This is an archive of the discontinued LLVM Phabricator instance.


Differential D99172

[mlir] Introduce CloneOp and adapt test cases in BufferDeallocation.
Closed, Public

Authored by dfki-jugr on Mar 23 2021, 4:13 AM.


Details

Reviewers

herhut
silvas
mehdi_amini
pifon2a

Commits

rG06b03800f3fc: [mlir] Introduce CloneOp and adapt test cases in BufferDeallocation.

Summary

Add a new clone operation to the memref dialect. This operation implicitly
copies data from a source buffer to a new buffer. In contrast to the linalg.copy
operation, this operation does not accept a target buffer as an argument.
Instead, this operation performs a conceptual allocation which does not need to
be performed manually.
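
A toy Python model makes the API difference concrete. Buffers are plain lists. `copy_into` and `clone` are hypothetical stand-ins for `linalg.copy` and `memref.clone`, not MLIR APIs. The copy needs a target that the caller allocated, while the clone allocates its result itself.

```python
from typing import List

Buffer = List[float]  # toy flat buffer; shapes and layouts are ignored


def copy_into(source: Buffer, target: Buffer) -> None:
    """Target-taking copy (in the spirit of linalg.copy): the caller must
    have allocated `target` with a matching size beforehand."""
    if len(source) != len(target):
        raise ValueError("source and target sizes differ")
    target[:] = source


def clone(source: Buffer) -> Buffer:
    """Clone (in the spirit of memref.clone): allocation and copy happen in
    one step, so no target buffer is passed in."""
    return list(source)


src = [1.0, 2.0]
dst = [0.0, 0.0]   # explicit allocation required before copy_into
copy_into(src, dst)
res = clone(src)   # no explicit allocation needed
```

The model cannot express the restriction requested later in the review: neither the source nor the result of a clone may be mutated after the clone has executed.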

Furthermore, this operation removes the BufferDeallocation pass's dependency on
the linalg dialect. In addition, we extended the canonicalization patterns to
fold clone operations. This makes canonicalization a competitive alternative to
the more expensive copy removal pass.
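
Here is a toy Python model of this kind of folding. It loosely follows the conditions discussed in the review:

- the clone's source is produced by an alloc and deallocated later in the same block;
- the clone result is not freed before that dealloc.

The block encoding, `find_dealloc`, and `fold_clones` are hypothetical. This is a sketch, not the canonicalization pattern added by this revision.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of the ops of a single block:
#   ("alloc", res) | ("clone", res, src) | ("use", value) | ("dealloc", value)
Op = Tuple[str, ...]


def find_dealloc(ops: List[Op], value: str, start: int) -> Optional[int]:
    """Index of the first dealloc of `value` at or after `start`, if any."""
    for index in range(start, len(ops)):
        if ops[index] == ("dealloc", value):
            return index
    return None


def fold_clones(block: List[Op]) -> List[Op]:
    """Fold `res = clone src` into `src` when `src` comes from an alloc, is
    deallocated later in the block, and `res` is not freed before that."""
    ops = list(block)
    folded = True
    while folded:
        folded = False
        allocated = {op[1] for op in ops if op[0] == "alloc"}
        for i, op in enumerate(ops):
            if op[0] != "clone" or op[2] not in allocated:
                continue
            _, res, src = op
            src_dealloc = find_dealloc(ops, src, i + 1)
            if src_dealloc is None:
                continue
            res_dealloc = find_dealloc(ops, res, i + 1)
            if res_dealloc is not None and res_dealloc < src_dealloc:
                continue  # folding could free src before its remaining uses
            # Drop the clone and the source dealloc; uses of res now read src.
            ops = [
                (o[0],) + tuple(src if v == res else v for v in o[1:])
                for j, o in enumerate(ops)
                if j not in (i, src_dealloc)
            ]
            folded = True
            break
    return ops
```

Folding is only sound because neither buffer may be mutated after the clone. With a mutable copy, redirecting uses of the result to the source could change observable values.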

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dfki-jugr created this revision. Mar 23 2021, 4:13 AM

Herald added subscribers: dcaballe, cota, teijeong and 17 others. Mar 23 2021, 4:13 AM

dfki-jugr requested review of this revision. Mar 23 2021, 4:13 AM

Herald added a project: Restricted Project. Mar 23 2021, 4:13 AM

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

dfki-jugr added reviewers: herhut, silvas, mehdi_amini. Mar 23 2021, 4:14 AM

Harbormaster completed remote builds in B95228: Diff 332607. Mar 23 2021, 6:32 AM

Nice!

mlir/docs/BufferDeallocationInternals.md
312–313	Nit, should we keep the comment that explains the reason for the clone?
608–609	Is this pass still needed? The copy removal as described here relies on the fact that source and destination of the copy are not mutated after the copy operation. While this is true for `memref.clone`, it is not true for copy operations. So to do this kind of optimization, one would need to analyse the program first. As the bufferization pass no longer introduces copy operations, maybe we should rather drop the incomplete pass for now.
696–697	This is no longer true.
mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
354	Can you add a remark that mutating the source and result of `clone` after the clone operation has executed has undefined behavior? This is a requirement for making the bufferization logic work. And while at it, could you also add a similar comment to `tensor_load` (the source may not be mutated afterwards, at least when used in the context of bufferization) and `memref.buffer_cast` (the result may not be mutated). As we are restricting the existing `tensor_load` here, maybe it would be better to have a new operation with the restricted semantics but that is outside of the scope of this diff.
mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h
26	With the guarantee that the cloned buffer cannot be mutated, is this actually required?
30	This only finds one dealloc. For its use, would it not be OK to find the dealloc in the current block (can only be one).
mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp
22	should this be `&&`?
mlir/lib/Transforms/BufferDeallocation.cpp
405	Is this still true?
mlir/test/Transforms/canonicalize.mlir
1122	Why does this remain?
1184	It is not clear to me what these tests are testing.

Addressed comments.

mlir/lib/Transforms/BufferDeallocation.cpp
405	Yes, it can still happen that there are clones of clones.
mlir/test/Transforms/canonicalize.mlir
1122	%3 results from the scf.if. The canonicalization pattern only checks if %3 is an alloc operation. Since this is not the case, this clone remains.

Harbormaster completed remote builds in B95478: Diff 332971. Mar 24 2021, 11:36 AM

herhut added inline comments. Mar 25 2021, 5:13 AM

mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
362	nit: source or result.
1157	nit: source source
mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h
26	These are only used for the canonicalization pattern, so I'd prefer to make them local.
mlir/test/Transforms/canonicalize.mlir
1122	Now that the operand to clone is required to not be mutated, why do we need to see the alloc? Is it not good enough if a `clone` has a following `dealloc` of the source, if the clone itself does not have an earlier dealloc? We can land it like it is (which is quite restricted) but this should be revisited again to make it apply more broadly.

Addressed comments.

mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h
26	findDealloc is also used in BufferDeallocation.

Harbormaster completed remote builds in B95857: Diff 333524. Mar 26 2021, 4:29 AM

pifon2a accepted this revision. Mar 28 2021, 11:16 PM

pifon2a added a subscriber: pifon2a.

pifon2a added inline comments.

mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td
359	nit: `arg 1` -> `arg1`

This revision is now accepted and ready to land. Mar 28 2021, 11:16 PM

Closed by commit rG06b03800f3fc: [mlir] Introduce CloneOp and adapt test cases in BufferDeallocation. (authored by dfki-jugr). Mar 29 2021, 1:20 AM

This revision was automatically updated to reflect the committed changes.

dfki-jugr added a commit: rG06b03800f3fc: [mlir] Introduce CloneOp and adapt test cases in BufferDeallocation..

pifon2a added a reverting change: rG883912abe669: Revert "[mlir] Introduce CloneOp and adapt test cases in BufferDeallocation.". Mar 29 2021, 3:49 AM

@dfki-jugr But why does this revision remove the copy removal pass when there are things that generate copies? The extension to canonicalization done eliminates clone ops, but IR with copy ops will no longer have a way to remove copies? For eg. -linalg-bufferize would still generate copy operations, right? @silvas

In D99172#2670712, @bondhugula wrote:

@dfki-jugr But why does this revision remove the copy removal pass when there are things that generate copies? The extension to canonicalization done eliminates clone ops, but IR with copy ops will no longer have a way to remove copies? For eg. -linalg-bufferize would still generate copy operations, right? @silvas

I asked for it to be removed, as I was unaware of where it got used. I assumed it was only used for copies created by the bufferization pass itself. The copy removal pass was created for bufferization and only works if all assumptions that bufferization makes about copies hold. We had reports where this was not the case. So instead of keeping the broken one, I thought it would be better to remove it for now and implement a proper one.

In D99172#2670712, @bondhugula wrote:

@dfki-jugr But why does this revision remove the copy removal pass when there are things that generate copies? The extension to canonicalization done eliminates clone ops, but IR with copy ops will no longer have a way to remove copies? For eg. -linalg-bufferize would still generate copy operations, right? @silvas

I believe all uses of copy in linalg-bufferize can be replaced with memref.clone easily -- it's a much more natural representation and we should do that cleanup.

In D99172#2672229, @silvas wrote:

I believe all uses of copy in linalg-bufferize can be replaced with memref.clone easily -- it's a much more natural representation and we should do that cleanup.

Yes, but I wish -copy-removal was removed *after* this cleanup was done --- since there are now common copy patterns for which there is no longer anything in the infrastructure to eliminate/optimize.

In D99172#2674514, @bondhugula wrote:

Yes, but I wish -copy-removal was removed *after* this cleanup was done --- since there are now common copy patterns for which there is no longer anything in the infrastructure to eliminate/optimize.

If this patch is breaking you, please revert it, by all means! If your concern is hypothetical though, I tend to trust that the patch authors did their homework on this to make sure it isn't an issue in practice (AFAIK they are well in tune with all users of this code).

I have concerns about this kind of restriction on an operation that is named so broadly: a "memref.clone" shouldn't impose anything about mutability!
Can you rename this into something really unambiguously tied to bufferization, like memref.__bufferize__clone or something like that please?

Revision Contents

Path	Size
mlir/
docs/
BufferDeallocationInternals.md	320 lines
include/
mlir/
Dialect/
MemRef/
IR/
MemRef.h	1 line
MemRefOps.td	47 lines
Utils/
MemRefUtils.h	29 lines
Transforms/
BufferUtils.h	4 lines
Passes.h	3 lines
Passes.td	7 lines
lib/
Dialect/
MemRef/
CMakeLists.txt	23 lines
IR/
CMakeLists.txt
MemRefOps.cpp	71 lines
Utils/
MemRefUtils.cpp	35 lines
Transforms/
BufferDeallocation.cpp	133 lines
BufferUtils.cpp	21 lines
CMakeLists.txt	1 line
CopyRemoval.cpp
test/
Transforms/
buffer-deallocation.mlir	114 lines
canonicalize.mlir	85 lines
copy-removal.mlir

Diff 333524

mlir/docs/BufferDeallocationInternals.md

…
func @condBranch(%arg0: i1, %arg1: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>) {
%0 = memref.alloc() : memref<2xf32>		%0 = memref.alloc() : memref<2xf32>
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3()		br ^bb3()
^bb2:		^bb2:
partial_write(%0, %0)		partial_write(%0, %0)
br ^bb3()		br ^bb3()
^bb3():		^bb3():
"linalg.copy"(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()		test.copy(%0, %arg1) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
```		```

The maintenance of the SSA like properties is only needed in the bufferization		The maintenance of the SSA like properties is only needed in the bufferization
process. Afterwards, for example in optimization processes, the property is no		process. Afterwards, for example in optimization processes, the property is no
longer needed.		longer needed.

…
![branch_example_pre_move](/includes/img/branch_example_pre_move.svg)		![branch_example_pre_move](/includes/img/branch_example_pre_move.svg)

```mlir		```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
%0 = alloc() : memref<2xf32> // aliases: %1		%0 = memref.alloc() : memref<2xf32> // aliases: %1
use(%0)		use(%0)
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>): // %1 could be %0 or %arg1		^bb3(%1: memref<2xf32>): // %1 could be %0 or %arg1
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
```		```

Applying the BufferHoisting pass on this program results in the following piece		Applying the BufferHoisting pass on this program results in the following piece
of code:		of code:

![branch_example_post_move](/includes/img/branch_example_post_move.svg)		![branch_example_post_move](/includes/img/branch_example_post_move.svg)

```mlir		```mlir
func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {		func @condBranch(%arg0: i1, %arg1: memref<2xf32>, %arg2: memref<2xf32>) {
%0 = alloc() : memref<2xf32> // moved to bb0		%0 = memref.alloc() : memref<2xf32> // moved to bb0
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
br ^bb3(%arg1 : memref<2xf32>)		br ^bb3(%arg1 : memref<2xf32>)
^bb2:		^bb2:
use(%0)		use(%0)
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
"linalg.copy"(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
```		```

The alloc is moved from bb2 to the beginning and it is passed as an argument to		The alloc is moved from bb2 to the beginning and it is passed as an argument to
bb3.		bb3.

The following example demonstrates an allocation using dynamically shaped		The following example demonstrates an allocation using dynamically shaped
types. Due to the data dependency of the allocation to %0, we cannot move the		types. Due to the data dependency of the allocation to %0, we cannot move the
allocation out of bb2 in this case:		allocation out of bb2 in this case:

```mlir		```mlir
func @condBranchDynamicType(		func @condBranchDynamicType(
%arg0: i1,		%arg0: i1,
%arg1: memref<?xf32>,		%arg1: memref<?xf32>,
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
%arg3: index) {		%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3: index)		cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:		^bb1:
br ^bb3(%arg1 : memref<?xf32>)		br ^bb3(%arg1 : memref<?xf32>)
^bb2(%0: index):		^bb2(%0: index):
%1 = alloc(%0) : memref<?xf32> // cannot be moved upwards to the data		%1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards to the data
// dependency to %0		// dependency to %0
use(%1)		use(%1)
br ^bb3(%1 : memref<?xf32>)		br ^bb3(%1 : memref<?xf32>)
^bb3(%2: memref<?xf32>):		^bb3(%2: memref<?xf32>):
"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()		test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
return		return
}		}
```		```
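
The placement decision in the two `condBranch` samples can be approximated by a toy model:

1. Walk the dominator chain of the allocation's block, from the entry block downward.
2. Stop at the first block where every operand of the alloc (here, the dynamic sizes) is available.

The function name `hoist_target` and the hand-written block data are assumptions. The BufferHoisting pass itself is not part of this diff and may apply further conditions.

```python
from typing import Dict, List, Set


def hoist_target(dom_chain: List[str], defs: Dict[str, Set[str]],
                 operands: Set[str]) -> str:
    """Highest block on `dom_chain` (entry block first, the alloc's own block
    last) at which every operand of the alloc is already defined."""
    available: Set[str] = set()
    for block in dom_chain:
        available |= defs.get(block, set())
        if operands <= available:
            return block
    return dom_chain[-1]  # operands not available earlier: stay in place


# @condBranch: memref.alloc() has no operands, so it can move to bb0.
static = hoist_target(["bb0", "bb2"], {"bb0": {"%arg0", "%arg1", "%arg2"}}, set())

# @condBranchDynamicType: alloc(%0) depends on the block argument %0 of bb2.
dynamic = hoist_target(
    ["bb0", "bb2"],
    {"bb0": {"%arg0", "%arg1", "%arg2", "%arg3"}, "bb2": {"%0"}},
    {"%0"},
)
```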

## Introduction of Copies		## Introduction of Clones

In order to guarantee that all allocated buffers are freed properly, we have to		In order to guarantee that all allocated buffers are freed properly, we have to
pay attention to the control flow and all potential aliases a buffer allocation		pay attention to the control flow and all potential aliases a buffer allocation
can have. Since not all allocations can be safely freed with respect to their		can have. Since not all allocations can be safely freed with respect to their
aliases (see the following code snippet), it is often required to introduce		aliases (see the following code snippet), it is often required to introduce
copies to eliminate them. Consider the following example in which the		copies to eliminate them. Consider the following example in which the
allocations have already been placed:		allocations have already been placed:

```mlir		```mlir
func @branch(%arg0: i1) {		func @branch(%arg0: i1) {
%0 = alloc() : memref<2xf32> // aliases: %2		%0 = memref.alloc() : memref<2xf32> // aliases: %2
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
%1 = alloc() : memref<2xf32> // resides here for demonstration purposes		%1 = memref.alloc() : memref<2xf32> // resides here for demonstration purposes
// aliases: %2		// aliases: %2
br ^bb3(%1 : memref<2xf32>)		br ^bb3(%1 : memref<2xf32>)
^bb2:		^bb2:
use(%0)		use(%0)
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%2: memref<2xf32>):		^bb3(%2: memref<2xf32>):
…		…
return		return
…
allocations can be safely freed in their associated post-dominator blocks.		allocations can be safely freed in their associated post-dominator blocks.
Instead, we have to pay attention to all of their aliases.		Instead, we have to pay attention to all of their aliases.

Applying the BufferDeallocation pass to the program above yields the following		Applying the BufferDeallocation pass to the program above yields the following
result:		result:

```mlir		```mlir
func @branch(%arg0: i1) {		func @branch(%arg0: i1) {
%0 = alloc() : memref<2xf32>		%0 = memref.alloc() : memref<2xf32>
cond_br %arg0, ^bb1, ^bb2		cond_br %arg0, ^bb1, ^bb2
^bb1:		^bb1:
%1 = alloc() : memref<2xf32>		%1 = memref.alloc() : memref<2xf32>
%3 = alloc() : memref<2xf32> // temp copy for %1		%3 = memref.clone %1 : (memref<2xf32>) -> (memref<2xf32>)
"linalg.copy"(%1, %3) : (memref<2xf32>, memref<2xf32>) -> ()		memref.dealloc %1 : memref<2xf32> // %1 can be safely freed here
dealloc %1 : memref<2xf32> // %1 can be safely freed here
br ^bb3(%3 : memref<2xf32>)		br ^bb3(%3 : memref<2xf32>)
^bb2:		^bb2:
use(%0)		use(%0)
%4 = alloc() : memref<2xf32> // temp copy for %0		%4 = memref.clone %0 : (memref<2xf32>) -> (memref<2xf32>)
"linalg.copy"(%0, %4) : (memref<2xf32>, memref<2xf32>) -> ()
br ^bb3(%4 : memref<2xf32>)		br ^bb3(%4 : memref<2xf32>)
^bb3(%2: memref<2xf32>):		^bb3(%2: memref<2xf32>):
…		…
dealloc %2 : memref<2xf32> // free temp buffer %2		memref.dealloc %2 : memref<2xf32> // free temp buffer %2
dealloc %0 : memref<2xf32> // %0 can be safely freed here		memref.dealloc %0 : memref<2xf32> // %0 can be safely freed here
return		return
}		}
```		```

Note that a temporary buffer for %2 was introduced to free all allocations		Note that a temporary buffer for %2 was introduced to free all allocations
properly. Note further that the unnecessary allocation of %3 can be easily		properly. Note further that the unnecessary allocation of %3 can be easily
removed using one of the post-pass transformations.		removed using one of the post-pass transformations or the canonicalization
		pass.
Reconsider the previously introduced sample demonstrating dynamically shaped
types:

```mlir
func @condBranchDynamicType(
%arg0: i1,
%arg1: memref<?xf32>,
%arg2: memref<?xf32>,
%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:
br ^bb3(%arg1 : memref<?xf32>)
^bb2(%0: index):
%1 = alloc(%0) : memref<?xf32> // aliases: %2
use(%1)
br ^bb3(%1 : memref<?xf32>)
^bb3(%2: memref<?xf32>):
"linalg.copy"(%2, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
return
}
```

In the presence of DSTs, we have to parameterize the allocations with		The presented example also works with dynamically shaped types.
additional dimension information of the source buffers, we want to copy from.
BufferDeallocation automatically introduces all required operations to extract
dimension specifications and wires them with the associated allocations:

```mlir
func @condBranchDynamicType(
%arg0: i1,
%arg1: memref<?xf32>,
%arg2: memref<?xf32>,
%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3 : index)
^bb1:
%c0 = constant 0 : index
%0 = dim %arg1, %c0 : memref<?xf32> // dimension operation to parameterize
// the following temp allocation
%1 = alloc(%0) : memref<?xf32>
"linalg.copy"(%arg1, %1) : (memref<?xf32>, memref<?xf32>) -> ()
br ^bb3(%1 : memref<?xf32>)
^bb2(%2: index):
%3 = alloc(%2) : memref<?xf32>
use(%3)
%c0_0 = constant 0 : index
%4 = dim %3, %c0_0 : memref<?xf32> // dimension operation to parameterize
// the following temp allocation
%5 = alloc(%4) : memref<?xf32>
"linalg.copy"(%3, %5) : (memref<?xf32>, memref<?xf32>) -> ()
dealloc %3 : memref<?xf32> // %3 can be safely freed here
br ^bb3(%5 : memref<?xf32>)
^bb3(%6: memref<?xf32>):
"linalg.copy"(%6, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
dealloc %6 : memref<?xf32> // %6 can be safely freed here
return
}
```

BufferDeallocation performs a fix-point iteration taking all aliases of all		BufferDeallocation performs a fix-point iteration taking all aliases of all
tracked allocations into account. We initialize the general iteration process		tracked allocations into account. We initialize the general iteration process
using all tracked allocations and their associated aliases. As soon as we		using all tracked allocations and their associated aliases. As soon as we
encounter an alias that is not properly dominated by our allocation, we mark		encounter an alias that is not properly dominated by our allocation, we mark
this alias as _critical_ (needs to be freed and tracked by the internal		this alias as _critical_ (needs to be freed and tracked by the internal
fix-point iteration). The following sample demonstrates the presence of		fix-point iteration). The following sample demonstrates the presence of
critical and non-critical aliases:		critical and non-critical aliases:

![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg)		![nested_branch_example_pre_move](/includes/img/nested_branch_example_pre_move.svg)

```mlir		```mlir
func @condBranchDynamicTypeNested(		func @condBranchDynamicTypeNested(
%arg0: i1,		%arg0: i1,
%arg1: memref<?xf32>, // aliases: %3, %4		%arg1: memref<?xf32>, // aliases: %3, %4
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
%arg3: index) {		%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3: index)		cond_br %arg0, ^bb1, ^bb2(%arg3: index)
^bb1:		^bb1:
br ^bb6(%arg1 : memref<?xf32>)		br ^bb6(%arg1 : memref<?xf32>)
^bb2(%0: index):		^bb2(%0: index):
%1 = alloc(%0) : memref<?xf32> // cannot be moved upwards due to the data		%1 = memref.alloc(%0) : memref<?xf32> // cannot be moved upwards due to the data
// dependency to %0		// dependency to %0
// aliases: %2, %3, %4		// aliases: %2, %3, %4
use(%1)		use(%1)
cond_br %arg0, ^bb3, ^bb4		cond_br %arg0, ^bb3, ^bb4
^bb3:		^bb3:
br ^bb5(%1 : memref<?xf32>)		br ^bb5(%1 : memref<?xf32>)
^bb4:		^bb4:
br ^bb5(%1 : memref<?xf32>)		br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>): // non-crit. alias of %1, since %1 dominates %2		^bb5(%2: memref<?xf32>): // non-crit. alias of %1, since %1 dominates %2
br ^bb6(%2 : memref<?xf32>)		br ^bb6(%2 : memref<?xf32>)
^bb6(%3: memref<?xf32>): // crit. alias of %arg1 and %2 (in other words %1)		^bb6(%3: memref<?xf32>): // crit. alias of %arg1 and %2 (in other words %1)
br ^bb7(%3 : memref<?xf32>)		br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>): // non-crit. alias of %3, since %3 dominates %4		^bb7(%4: memref<?xf32>): // non-crit. alias of %3, since %3 dominates %4
"linalg.copy"(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()		test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
return		return
}		}
```		```

Applying BufferDeallocation yields the following output:		Applying BufferDeallocation yields the following output:

![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg)		![nested_branch_example_post_move](/includes/img/nested_branch_example_post_move.svg)

```mlir		```mlir
func @condBranchDynamicTypeNested(		func @condBranchDynamicTypeNested(
%arg0: i1,		%arg0: i1,
%arg1: memref<?xf32>,		%arg1: memref<?xf32>,
%arg2: memref<?xf32>,		%arg2: memref<?xf32>,
%arg3: index) {		%arg3: index) {
cond_br %arg0, ^bb1, ^bb2(%arg3 : index)		cond_br %arg0, ^bb1, ^bb2(%arg3 : index)
^bb1:		^bb1:
%c0 = constant 0 : index		// temp buffer required due to alias %3
%d0 = dim %arg1, %c0 : memref<?xf32>		%5 = memref.clone %arg1 : (memref<?xf32>) -> (memref<?xf32>)
%5 = alloc(%d0) : memref<?xf32> // temp buffer required due to alias %3
"linalg.copy"(%arg1, %5) : (memref<?xf32>, memref<?xf32>) -> ()
br ^bb6(%5 : memref<?xf32>)		br ^bb6(%5 : memref<?xf32>)
^bb2(%0: index):		^bb2(%0: index):
%1 = alloc(%0) : memref<?xf32>		%1 = memref.alloc(%0) : memref<?xf32>
use(%1)		use(%1)
cond_br %arg0, ^bb3, ^bb4		cond_br %arg0, ^bb3, ^bb4
^bb3:		^bb3:
br ^bb5(%1 : memref<?xf32>)		br ^bb5(%1 : memref<?xf32>)
^bb4:		^bb4:
br ^bb5(%1 : memref<?xf32>)		br ^bb5(%1 : memref<?xf32>)
^bb5(%2: memref<?xf32>):		^bb5(%2: memref<?xf32>):
%c0_0 = constant 0 : index		%6 = memref.clone %1 : (memref<?xf32>) -> (memref<?xf32>)
%d1 = dim %2, %c0_0 : memref<?xf32>		memref.dealloc %1 : memref<?xf32>
%6 = alloc(%d1) : memref<?xf32> // temp buffer required due to alias %3
"linalg.copy"(%1, %6) : (memref<?xf32>, memref<?xf32>) -> ()
dealloc %1 : memref<?xf32>
br ^bb6(%6 : memref<?xf32>)		br ^bb6(%6 : memref<?xf32>)
^bb6(%3: memref<?xf32>):		^bb6(%3: memref<?xf32>):
br ^bb7(%3 : memref<?xf32>)		br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):		^bb7(%4: memref<?xf32>):
"linalg.copy"(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()		test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>) -> ()
dealloc %3 : memref<?xf32> // free %3, since %4 is a non-crit. alias of %3		memref.dealloc %3 : memref<?xf32> // free %3, since %4 is a non-crit. alias of %3
return		return
}		}
```		```

Since %3 is a critical alias, BufferDeallocation introduces an additional		Since %3 is a critical alias, BufferDeallocation introduces an additional
temporary copy in all predecessor blocks. %3 has an additional (non-critical)		temporary copy in all predecessor blocks. %3 has an additional (non-critical)
alias %4 that extends the live range until the end of bb7. Therefore, we can		alias %4 that extends the live range until the end of bb7. Therefore, we can
free %3 after its last use, while taking all aliases into account. Note that %4		free %3 after its last use, while taking all aliases into account. Note that %4
does not need to be freed, since we did not introduce a copy for it.		does not need to be freed, since we did not introduce a copy for it.
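
A toy Python model reproduces two things from the nested sample. The first is the classification: %2 and %4 are non-critical, and %3 is critical. The second is where the clones go. The fix-point search follows forwarded values from each tracked allocation. An alias that is properly dominated by its root is followed further. Any other alias is marked critical and tracked in turn. Every predecessor edge that feeds a critical block argument then receives a clone.

The function names, the edge encoding and the hand-written dominator sets are assumptions for illustration. The pass itself works on MLIR IR, using dominance information and the `BranchOpInterface`.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (predecessor block, forwarded value, block argument it flows into)
Edge = Tuple[str, str, str]


def find_critical_aliases(allocs: Dict[str, str], arg_block: Dict[str, str],
                          edges: List[Edge],
                          dominators: Dict[str, Set[str]]) -> Set[str]:
    """Fix-point search for aliases not properly dominated by the tracked
    value they alias; each critical alias is then tracked itself."""
    forwards: Dict[str, List[str]] = defaultdict(list)
    for _, value, arg in edges:
        forwards[value].append(arg)

    critical: Set[str] = set()
    worklist = list(allocs.items())  # (tracked value, defining block)
    while worklist:
        root, root_block = worklist.pop()
        stack, seen = [root], {root}
        while stack:
            value = stack.pop()
            for alias in forwards[value]:
                if alias in seen:
                    continue
                seen.add(alias)
                block = arg_block[alias]
                if block != root_block and root_block in dominators[block]:
                    stack.append(alias)  # non-critical: keep following it
                elif alias not in critical:
                    critical.add(alias)  # critical: becomes a tracked value
                    worklist.append((alias, block))
    return critical


def clone_sites(critical: Set[str], edges: List[Edge]) -> List[Edge]:
    """Each predecessor edge feeding a critical block argument gets a clone."""
    return sorted(edge for edge in edges if edge[2] in critical)


# Hand-written encoding of @condBranchDynamicTypeNested (an assumption).
dominators = {
    "bb0": {"bb0"}, "bb1": {"bb0", "bb1"}, "bb2": {"bb0", "bb2"},
    "bb3": {"bb0", "bb2", "bb3"}, "bb4": {"bb0", "bb2", "bb4"},
    "bb5": {"bb0", "bb2", "bb5"}, "bb6": {"bb0", "bb6"},
    "bb7": {"bb0", "bb6", "bb7"},
}
edges = [
    ("bb1", "%arg1", "%3"), ("bb3", "%1", "%2"), ("bb4", "%1", "%2"),
    ("bb5", "%2", "%3"), ("bb6", "%3", "%4"),
]
arg_block = {"%2": "bb5", "%3": "bb6", "%4": "bb7"}
critical = find_critical_aliases({"%1": "bb2"}, arg_block, edges, dominators)
sites = clone_sites(critical, edges)
```

The model places clones in bb1 and bb5, matching the output above. In bb5 the output clones %1 rather than %2. Both name the same buffer there, because %2 is a non-critical alias of %1.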

The actual introduction of buffer copies is done after the fix-point iteration		The actual introduction of buffer copies is done after the fix-point iteration
has been terminated and all critical aliases have been detected. A critical		has been terminated and all critical aliases have been detected. A critical
alias can be either a block argument or another value that is returned by an		alias can be either a block argument or another value that is returned by an
operation. Copies for block arguments are handled by analyzing all predecessor		operation. Copies for block arguments are handled by analyzing all predecessor
blocks. This is primarily done by querying the `BranchOpInterface` of the		blocks. This is primarily done by querying the `BranchOpInterface` of the
associated branch terminators that can jump to the current block. Consider the		associated branch terminators that can jump to the current block. Consider the
following example which involves a simple branch and the critical block		following example which involves a simple branch and the critical block
…
operation returns a result to the parent operation. This sample demonstrates		operation returns a result to the parent operation. This sample demonstrates
the use of the `RegionBranchOpInterface` to determine predecessors in order to		the use of the `RegionBranchOpInterface` to determine predecessors in order to
infer the high-level control flow:		infer the high-level control flow:

```mlir		```mlir
func @inner_region_control_flow(		func @inner_region_control_flow(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg0) : memref<?x?xf32>		%0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)		%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1		then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1
custom.region_if_yield %arg2 : memref<?x?xf32>		custom.region_if_yield %arg2 : memref<?x?xf32>
} else(%arg3 : memref<?x?xf32>) { // aliases: %arg4, %1		} else(%arg3 : memref<?x?xf32>) { // aliases: %arg4, %1
custom.region_if_yield %arg3 : memref<?x?xf32>		custom.region_if_yield %arg3 : memref<?x?xf32>
} join(%arg4 : memref<?x?xf32>) { // aliases: %1		} join(%arg4 : memref<?x?xf32>) { // aliases: %1
custom.region_if_yield %arg4 : memref<?x?xf32>		custom.region_if_yield %arg4 : memref<?x?xf32>
}		}
return %1 : memref<?x?xf32>		return %1 : memref<?x?xf32>
}		}
```		```

![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg)		![region_branch_example_pre_move](/includes/img/region_branch_example_pre_move.svg)
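
The `// aliases:` annotations in this sample follow from a transitive closure over value-forwarding edges. The `RegionBranchOpInterface` lets the pass infer three kinds of edge here:

- the operand flows into the region arguments;
- each `custom.region_if_yield` forwards into the next region;
- the join region's yield forwards into the result.

The edge map below is written by hand, including the assumption that %0 flows into both %arg2 and %arg3. `region_aliases` is a hypothetical helper, not the pass's implementation.

```python
from typing import Dict, List, Set


def region_aliases(flows: Dict[str, List[str]], value: str) -> Set[str]:
    """Every value that `value` may be forwarded to, transitively."""
    aliases: Set[str] = set()
    stack = [value]
    while stack:
        current = stack.pop()
        for target in flows.get(current, []):
            if target not in aliases:
                aliases.add(target)
                stack.append(target)
    return aliases


# Hand-written flow edges for @inner_region_control_flow (an assumption).
flows = {
    "%0": ["%arg2", "%arg3"],  # operand of custom.region_if -> then/else args
    "%arg2": ["%arg4"],        # then region yields %arg2 -> join argument
    "%arg3": ["%arg4"],        # else region yields %arg3 -> join argument
    "%arg4": ["%1"],           # join region yields %arg4 -> result %1
}
```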

Non-block arguments (other values) can become aliases when they are returned by		Non-block arguments (other values) can become aliases when they are returned by
dialect-specific operations. BufferDeallocation supports this behavior via the		dialect-specific operations. BufferDeallocation supports this behavior via the
`RegionBranchOpInterface`. Consider the following example that uses an “scf.if”		`RegionBranchOpInterface`. Consider the following example that uses an “scf.if”
operation to determine the value of %2 at runtime which creates an alias:		operation to determine the value of %2 at runtime which creates an alias:

```mlir		```mlir
func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {		func @nested_region_control_flow(%arg0 : index, %arg1 : index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index		%0 = cmpi "eq", %arg0, %arg1 : index
%1 = alloc(%arg0, %arg0) : memref<?x?xf32>		%1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
%2 = scf.if %0 -> (memref<?x?xf32>) {		%2 = scf.if %0 -> (memref<?x?xf32>) {
scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1		scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1
} else {		} else {
%3 = alloc(%arg0, %arg1) : memref<?x?xf32> // nested allocation in a div.		%3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // nested allocation in a div.
// branch		// branch
use(%3)		use(%3)
scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1		scf.yield %1 : memref<?x?xf32> // %2 will be an alias of %1
}		}
return %2 : memref<?x?xf32>		return %2 : memref<?x?xf32>
}		}
```		```

In this example, a dealloc is inserted to release the buffer within the else		In this example, a dealloc is inserted to release the buffer within the else
block since it cannot be accessed by the remainder of the program. Accessing		block since it cannot be accessed by the remainder of the program. Accessing
the `RegionBranchOpInterface`, allows us to infer that %2 is a non-critical		the `RegionBranchOpInterface`, allows us to infer that %2 is a non-critical
alias of %1 which does not need to be tracked.		alias of %1 which does not need to be tracked.

```mlir		```mlir
func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> {		func @nested_region_control_flow(%arg0: index, %arg1: index) -> memref<?x?xf32> {
%0 = cmpi "eq", %arg0, %arg1 : index		%0 = cmpi "eq", %arg0, %arg1 : index
%1 = alloc(%arg0, %arg0) : memref<?x?xf32>		%1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
%2 = scf.if %0 -> (memref<?x?xf32>) {		%2 = scf.if %0 -> (memref<?x?xf32>) {
scf.yield %1 : memref<?x?xf32>		scf.yield %1 : memref<?x?xf32>
} else {		} else {
%3 = alloc(%arg0, %arg1) : memref<?x?xf32>		%3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
use(%3)		use(%3)
dealloc %3 : memref<?x?xf32> // %3 can be safely freed here		memref.dealloc %3 : memref<?x?xf32> // %3 can be safely freed here
scf.yield %1 : memref<?x?xf32>		scf.yield %1 : memref<?x?xf32>
}		}
return %2 : memref<?x?xf32>		return %2 : memref<?x?xf32>
}		}
```		```
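
A toy model approximates where the dealloc lands in the else region. An allocation made inside a region that never reaches the region's terminator can be freed right after its last use inside that region. The op encoding and `insert_local_deallocs` are hypothetical. The model also ignores aliases created inside the region, which the pass itself does track.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: ("alloc", v) | ("use", v) | ("yield", v, ...)
Op = Tuple[str, ...]


def insert_local_deallocs(region: List[Op]) -> List[Op]:
    """Free allocations that are not yielded to the parent operation,
    directly after their last use inside the region."""
    yielded = {v for op in region if op[0] == "yield" for v in op[1:]}
    local = [op[1] for op in region if op[0] == "alloc" and op[1] not in yielded]
    last_use: Dict[str, int] = {}
    for index, op in enumerate(region):
        for v in op[1:]:
            if v in local:
                last_use[v] = index
    result: List[Op] = []
    for index, op in enumerate(region):
        result.append(op)
        for v in local:
            if last_use[v] == index:
                result.append(("dealloc", v))
    return result


# The else region of @nested_region_control_flow.
else_region = [("alloc", "%3"), ("use", "%3"), ("yield", "%1")]
```

A yielded allocation, as in `@inner_region_control_flow_div`, is left untouched by this model because it escapes to the parent operation.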

Analogous to the previous case, we have to detect all terminator operations in		Analogous to the previous case, we have to detect all terminator operations in
all attached regions of “scf.if” that provide a value to its parent operation		all attached regions of “scf.if” that provide a value to its parent operation
(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows		(in this sample via scf.yield). Querying the `RegionBranchOpInterface` allows
us to determine the regions that “return” a result to their parent operation.		us to determine the regions that “return” a result to their parent operation.
Like before, we have to update all `ReturnLike` terminators as described above.		Like before, we have to update all `ReturnLike` terminators as described above.
Reconsider a slightly adapted version of the “custom.region_if” example from		Reconsider a slightly adapted version of the “custom.region_if” example from
above that uses a nested allocation:		above that uses a nested allocation:

```mlir		```mlir
func @inner_region_control_flow_div(		func @inner_region_control_flow_div(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg0) : memref<?x?xf32>		%0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)		%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1		then(%arg2 : memref<?x?xf32>) { // aliases: %arg4, %1
custom.region_if_yield %arg2 : memref<?x?xf32>		custom.region_if_yield %arg2 : memref<?x?xf32>
} else(%arg3 : memref<?x?xf32>) {		} else(%arg3 : memref<?x?xf32>) {
%2 = alloc(%arg0, %arg1) : memref<?x?xf32> // aliases: %arg4, %1		%2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32> // aliases: %arg4, %1
custom.region_if_yield %2 : memref<?x?xf32>		custom.region_if_yield %2 : memref<?x?xf32>
} join(%arg4 : memref<?x?xf32>) { // aliases: %1		} join(%arg4 : memref<?x?xf32>) { // aliases: %1
custom.region_if_yield %arg4 : memref<?x?xf32>		custom.region_if_yield %arg4 : memref<?x?xf32>
}		}
return %1 : memref<?x?xf32>		return %1 : memref<?x?xf32>
}		}
```		```

Since the allocation %2 happens in a divergent branch and cannot be safely		Since the allocation %2 happens in a divergent branch and cannot be safely
deallocated in a post-dominator, %arg4 will be considered a critical alias.		deallocated in a post-dominator, %arg4 will be considered a critical alias.
Furthermore, %arg4 is returned to its parent operation and has an alias %1.		Furthermore, %arg4 is returned to its parent operation and has an alias %1.
This causes BufferDeallocation to introduce additional copies:		This causes BufferDeallocation to introduce additional copies:

```mlir		```mlir
func @inner_region_control_flow_div(		func @inner_region_control_flow_div(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = alloc(%arg0, %arg0) : memref<?x?xf32>		%0 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)		%1 = custom.region_if %0 : memref<?x?xf32> -> (memref<?x?xf32>)
then(%arg2 : memref<?x?xf32>) {		then(%arg2 : memref<?x?xf32>) {
%c0 = constant 0 : index // determine dimension extents for temp allocation		%4 = memref.clone %arg2 : memref<?x?xf32> to memref<?x?xf32>
%2 = dim %arg2, %c0 : memref<?x?xf32>
%c1 = constant 1 : index
%3 = dim %arg2, %c1 : memref<?x?xf32>
%4 = alloc(%2, %3) : memref<?x?xf32> // temp buffer required due to critic.
// alias %arg4
linalg.copy(%arg2, %4) : memref<?x?xf32>, memref<?x?xf32>
custom.region_if_yield %4 : memref<?x?xf32>		custom.region_if_yield %4 : memref<?x?xf32>
} else(%arg3 : memref<?x?xf32>) {		} else(%arg3 : memref<?x?xf32>) {
%2 = alloc(%arg0, %arg1) : memref<?x?xf32>		%2 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
%c0 = constant 0 : index // determine dimension extents for temp allocation		%5 = memref.clone %2 : memref<?x?xf32> to memref<?x?xf32>
%3 = dim %2, %c0 : memref<?x?xf32>		memref.dealloc %2 : memref<?x?xf32>
%c1 = constant 1 : index
%4 = dim %2, %c1 : memref<?x?xf32>
%5 = alloc(%3, %4) : memref<?x?xf32> // temp buffer required due to critic.
// alias %arg4
linalg.copy(%2, %5) : memref<?x?xf32>, memref<?x?xf32>
dealloc %2 : memref<?x?xf32>
custom.region_if_yield %5 : memref<?x?xf32>		custom.region_if_yield %5 : memref<?x?xf32>
} join(%arg4: memref<?x?xf32>) {		} join(%arg4: memref<?x?xf32>) {
%c0 = constant 0 : index // determine dimension extents for temp allocation		%4 = memref.clone %arg4 : memref<?x?xf32> to memref<?x?xf32>
%2 = dim %arg4, %c0 : memref<?x?xf32>		memref.dealloc %arg4 : memref<?x?xf32>
%c1 = constant 1 : index
%3 = dim %arg4, %c1 : memref<?x?xf32>
%4 = alloc(%2, %3) : memref<?x?xf32> // this allocation will be removed by
// applying the copy removal pass
linalg.copy(%arg4, %4) : memref<?x?xf32>, memref<?x?xf32>
dealloc %arg4 : memref<?x?xf32>
custom.region_if_yield %4 : memref<?x?xf32>		custom.region_if_yield %4 : memref<?x?xf32>
}		}
dealloc %0 : memref<?x?xf32> // %0 can be safely freed here		memref.dealloc %0 : memref<?x?xf32> // %0 can be safely freed here
return %1 : memref<?x?xf32>		return %1 : memref<?x?xf32>
}		}
```		```
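Each `memref.clone` in the listing above replaces the sequence that the old version emitted: two `dim` queries for the extents, an `alloc` of that shape, and a `linalg.copy`. As a rough illustration outside of MLIR, the C++ sketch below models a `memref<?x?xf32>` as a hypothetical `Buffer` struct and contrasts the two forms. `Buffer`, `allocAndCopy`, and `cloneBuffer` are made-up names for this sketch, not MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of a memref<?x?xf32>: dynamic extents plus
// row-major storage. This is not an MLIR type.
struct Buffer {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<float> data;
};

// Old expansion: query both extents (`dim`), allocate a buffer of that
// shape (`alloc`), then copy the contents (`linalg.copy`).
Buffer allocAndCopy(const Buffer &src) {
  assert(src.data.size() == src.rows * src.cols);
  Buffer dst;
  dst.rows = src.rows;                   // %2 = dim %src, %c0
  dst.cols = src.cols;                   // %3 = dim %src, %c1
  dst.data.resize(dst.rows * dst.cols);  // %4 = alloc(%2, %3)
  for (std::size_t i = 0; i < src.data.size(); ++i)
    dst.data[i] = src.data[i];           // linalg.copy(%src, %4)
  return dst;
}

// Clone form: allocating a buffer with the source's shape is implied by the
// operation itself, so no extents and no target buffer are passed in.
Buffer cloneBuffer(const Buffer &src) {
  Buffer dst = src;  // same shape and contents in a single step
  return dst;
}
```

Both functions produce a buffer with the same shape and contents. The clone form never asks the caller to compute extents or provide a target, which is what allows BufferDeallocation to drop its dependency on the linalg dialect.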

## Placement of Deallocs		## Placement of Deallocs

After introducing allocs and copies, deallocs have to be placed to free		After introducing allocs and copies, deallocs have to be placed to free
allocated memory and avoid memory leaks. The deallocation needs to take place		allocated memory and avoid memory leaks. The deallocation needs to take place
func @loop_nested_if(
%ub: index,		%ub: index,
%step: index,		%step: index,
%buf: memref<2xf32>,		%buf: memref<2xf32>,
%res: memref<2xf32>) {		%res: memref<2xf32>) {
%0 = scf.for %i = %lb to %ub step %step		%0 = scf.for %i = %lb to %ub step %step
iter_args(%iterBuf = %buf) -> memref<2xf32> {		iter_args(%iterBuf = %buf) -> memref<2xf32> {
%1 = cmpi "eq", %i, %ub : index		%1 = cmpi "eq", %i, %ub : index
%2 = scf.if %1 -> (memref<2xf32>) {		%2 = scf.if %1 -> (memref<2xf32>) {
%3 = alloc() : memref<2xf32> // makes %2 a critical alias due to a		%3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias due to a
// divergent allocation		// divergent allocation
use(%3)		use(%3)
scf.yield %3 : memref<2xf32>		scf.yield %3 : memref<2xf32>
} else {		} else {
scf.yield %iterBuf : memref<2xf32>		scf.yield %iterBuf : memref<2xf32>
}		}
scf.yield %2 : memref<2xf32>		scf.yield %2 : memref<2xf32>
}		}
"linalg.copy"(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()		test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
return		return
}		}
```		```

In this example, the _then_ branch of the nested “scf.if” operation returns a		In this example, the _then_ branch of the nested “scf.if” operation returns a
newly allocated buffer.		newly allocated buffer.

Since this allocation happens in the scope of a divergent branch, %2 becomes a		Since this allocation happens in the scope of a divergent branch, %2 becomes a
critical alias that needs to be handled. As before, we have to insert		critical alias that needs to be handled. As before, we have to insert
additional copies to eliminate this alias using copies of %3 and %iterBuf. This		additional copies to eliminate this alias using copies of %3 and %iterBuf. This
guarantees that %2 will be a newly allocated buffer that is returned in each		guarantees that %2 will be a newly allocated buffer that is returned in each
iteration. However, “returning” %2 to its alias %iterBuf turns %iterBuf into a		iteration. However, “returning” %2 to its alias %iterBuf turns %iterBuf into a
critical alias as well. In other words, we have to create a copy of %2 to pass		critical alias as well. In other words, we have to create a copy of %2 to pass
it to %iterBuf. Since this jump represents a back edge, and %2 will always be a		it to %iterBuf. Since this jump represents a back edge, and %2 will always be a
new buffer, we have to free the buffer from the previous iteration to avoid		new buffer, we have to free the buffer from the previous iteration to avoid
memory leaks:		memory leaks:

```mlir		```mlir
func @loop_nested_if(		func @loop_nested_if(
%lb: index,		%lb: index,
%ub: index,		%ub: index,
%step: index,		%step: index,
%buf: memref<2xf32>,		%buf: memref<2xf32>,
%res: memref<2xf32>) {		%res: memref<2xf32>) {
%4 = alloc() : memref<2xf32>		%4 = memref.clone %buf : memref<2xf32> to memref<2xf32>
"linalg.copy"(%buf, %4) : (memref<2xf32>, memref<2xf32>) -> ()
%0 = scf.for %i = %lb to %ub step %step		%0 = scf.for %i = %lb to %ub step %step
iter_args(%iterBuf = %4) -> memref<2xf32> {		iter_args(%iterBuf = %4) -> memref<2xf32> {
%1 = cmpi "eq", %i, %ub : index		%1 = cmpi "eq", %i, %ub : index
%2 = scf.if %1 -> (memref<2xf32>) {		%2 = scf.if %1 -> (memref<2xf32>) {
%3 = alloc() : memref<2xf32> // makes %2 a critical alias		%3 = memref.alloc() : memref<2xf32> // makes %2 a critical alias
use(%3)		use(%3)
%5 = alloc() : memref<2xf32> // temp copy due to crit. alias %2		%5 = memref.clone %3 : memref<2xf32> to memref<2xf32>
"linalg.copy"(%3, %5) : memref<2xf32>, memref<2xf32>		memref.dealloc %3 : memref<2xf32>
dealloc %3 : memref<2xf32>
scf.yield %5 : memref<2xf32>		scf.yield %5 : memref<2xf32>
} else {		} else {
%6 = alloc() : memref<2xf32> // temp copy due to crit. alias %2		%6 = memref.clone %iterBuf : memref<2xf32> to memref<2xf32>
"linalg.copy"(%iterBuf, %6) : memref<2xf32>, memref<2xf32>
scf.yield %6 : memref<2xf32>		scf.yield %6 : memref<2xf32>
}		}
%7 = alloc() : memref<2xf32> // temp copy due to crit. alias %iterBuf		%7 = memref.clone %2 : memref<2xf32> to memref<2xf32>
"linalg.copy"(%2, %7) : memref<2xf32>, memref<2xf32>		memref.dealloc %2 : memref<2xf32>
dealloc %2 : memref<2xf32>		memref.dealloc %iterBuf : memref<2xf32> // free backedge iteration variable
dealloc %iterBuf : memref<2xf32> // free backedge iteration variable
scf.yield %7 : memref<2xf32>		scf.yield %7 : memref<2xf32>
}		}
"linalg.copy"(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()		test.copy(%0, %res) : (memref<2xf32>, memref<2xf32>) -> ()
dealloc %0 : memref<2xf32> // free temp copy %0		memref.dealloc %0 : memref<2xf32> // free temp copy %0
return		return
}		}
```		```

Example for loop-like control flow. The CFG contains back edges that have to be		Example for loop-like control flow. The CFG contains back edges that have to be
handled to avoid memory leaks. The bufferization is able to free the backedge		handled to avoid memory leaks. The bufferization is able to free the backedge
iteration variable %iterBuf.		iteration variable %iterBuf.
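To see why the back edge needs the extra `memref.dealloc %iterBuf`, the transformed listing can be simulated in plain C++. This is a sketch under stated assumptions: `Buf`, `Tracker`, and `loopNestedIf` are hypothetical names, the buffers only count allocations, and the `cmpi` condition is replaced by a caller-supplied predicate so that both branches can run.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy stand-ins for memref<2xf32> and the memref.alloc,
// memref.dealloc, and memref.clone operations. They only count live buffers.
struct Buf {
  float v[2] = {0.0f, 0.0f};
};

struct Tracker {
  int live = 0;  // buffers currently allocated
  int peak = 0;  // maximum number of buffers live at the same time

  Buf *alloc() {
    ++live;
    if (live > peak) peak = live;
    return new Buf();
  }
  void dealloc(Buf *b) {
    assert(b != nullptr && live > 0);
    --live;
    delete b;
  }
  Buf *clone(const Buf *src) {
    Buf *b = alloc();  // implicit allocation performed by the clone
    *b = *src;         // implicit copy of the contents
    return b;
  }
};

// Mirrors the transformed @loop_nested_if listing. The branch condition is a
// caller-supplied predicate (an assumption of this sketch).
void loopNestedIf(Tracker &t, std::size_t lb, std::size_t ub, std::size_t step,
                  const Buf *buf, Buf *res, bool (*cond)(std::size_t)) {
  assert(step > 0);
  Buf *iterBuf = t.clone(buf);              // %4 = memref.clone %buf
  for (std::size_t i = lb; i < ub; i += step) {
    Buf *two = nullptr;                     // %2
    if (cond(i)) {
      Buf *three = t.alloc();               // %3 = memref.alloc()
      three->v[0] = static_cast<float>(i);  // use(%3)
      two = t.clone(three);                 // %5 = memref.clone %3
      t.dealloc(three);                     // memref.dealloc %3
    } else {
      two = t.clone(iterBuf);               // %6 = memref.clone %iterBuf
    }
    Buf *seven = t.clone(two);              // %7 = memref.clone %2
    t.dealloc(two);                         // memref.dealloc %2
    t.dealloc(iterBuf);                     // free back-edge iteration variable
    iterBuf = seven;                        // scf.yield %7
  }
  *res = *iterBuf;                          // test.copy(%0, %res)
  t.dealloc(iterBuf);                       // memref.dealloc %0
}
```

In this model every iteration frees both %2 and the previous %iterBuf, so at most three buffers are live at any point regardless of the trip count, and none remain after the final `memref.dealloc %0`.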

placed in front of this position. However, the most important analysis is the		placed in front of this position. However, the most important analysis is the
alias analysis that is needed to introduce copies and to place all		alias analysis that is needed to introduce copies and to place all
deallocations.		deallocations.

# Post Phase		# Post Phase

In order to limit the complexity of the BufferDeallocation transformation, some		In order to limit the complexity of the BufferDeallocation transformation, some
tiny code-polishing/optimization transformations are not applied on-the-fly		tiny code-polishing/optimization transformations are not applied on-the-fly
during placement. Currently, there is only the CopyRemoval transformation to		during placement. Currently, a canonicalization pattern on the clone
remove unnecessary copy and allocation operations.		operation removes unnecessary clones.

Note: further transformations might be added to the post-pass phase in the		Note: further transformations might be added to the post-pass phase in the
future.		future.

## CopyRemoval Pass		## Clone Canonicalization
		herhut: Is this pass still needed? The copy removal as described here relies on the fact that source and destination of the copy are not mutated after the copy operation. While this is true for `memref.clone`, it is not true for copy operations. So to do this kind of optimization, one would need to analyse the program first. As the bufferization pass no longer introduces copy operations, maybe we should rather drop the incomplete pass for now.

A common pattern that arises during placement is the introduction of		During placement, unnecessary clones may be inserted. If such a clone
unnecessary temporary copies that are used instead of the original source		appears in the same block as its corresponding dealloc operation, the
buffer. For this reason, there is a post-pass transformation that removes these		canonicalizer can remove these unnecessary operations. Note that this
allocations and copies via `-copy-removal`. This pass, besides removing		step must run after the buffer deallocation step has inserted all clones
unnecessary copy operations, will also remove the dead allocations and their		and deallocs. The canonicalization handles both the newly created target
corresponding deallocation operations. The CopyRemoval pass can currently be		value of the clone operation and its source.
applied to operations that implement the `CopyOpInterface` in either of these
two situations:
		## Canonicalization of the Source Buffer of the Clone Operation
* reusing the source buffer of the copy operation.
* reusing the target buffer of the copy operation.		In this case, the source of the clone operation can be used instead of its
		target. The unused allocation and deallocation operations that are defined for
## Reusing the Source Buffer of the Copy Operation		this clone operation are also removed. Here is a working example generated by
		the BufferDeallocation pass that allocates a buffer with dynamic size. A deeper
In this case, the source of the copy operation can be used instead of target.
The unused allocation and deallocation operations that are defined for this
copy operation are also removed. Here is a working example generated by the
BufferDeallocation pass that allocates a buffer with dynamic size. A deeper
analysis of this sample reveals that the alloc of %10, the copy, and the dealloc of %7 are redundant		analysis of this sample reveals that the clone and the dealloc of %1 are redundant
and can be removed.		and can be removed.

```mlir		```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {		func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
%7 = alloc(%arg0, %arg1) : memref<?x?xf32>		%1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
%c0_0 = constant 0 : index		%2 = memref.clone %1 : memref<?x?xf32> to memref<?x?xf32>
%8 = dim %7, %c0_0 : memref<?x?xf32>		memref.dealloc %1 : memref<?x?xf32>
%c1_1 = constant 1 : index		return %2 : memref<?x?xf32>
%9 = dim %7, %c1_1 : memref<?x?xf32>
%10 = alloc(%8, %9) : memref<?x?xf32>
linalg.copy(%7, %10) : memref<?x?xf32>, memref<?x?xf32>
dealloc %7 : memref<?x?xf32>
return %10 : memref<?x?xf32>
}		}
```		```

Will be transformed to:		Will be transformed to:

```mlir		```mlir
func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {		func @dynamic_allocation(%arg0: index, %arg1: index) -> memref<?x?xf32> {
%7 = alloc(%arg0, %arg1) : memref<?x?xf32>		%1 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
%c0_0 = constant 0 : index		return %1 : memref<?x?xf32>
%8 = dim %7, %c0_0 : memref<?x?xf32>
%c1_1 = constant 1 : index
%9 = dim %7, %c1_1 : memref<?x?xf32>
return %7 : memref<?x?xf32>
}		}
```		```

In this case, the additional copy %10 can be replaced with its original source		In this case, the additional copy %2 can be replaced with its original source
buffer %7, and the associated dealloc operation of %7 is removed as well.		buffer %1, and the associated dealloc operation of %1 is removed as well.
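The source-reuse rewrite can be illustrated on a toy, string-based model of a single block. This is a simplified sketch, not the actual canonicalization pattern: `Op` and `reuseCloneSource` are hypothetical names, and the guard that %src has no other users between the clone and its dealloc is an assumption borrowed from the constraints of the former CopyRemoval pass; the real pattern may check different conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of one block: every op has a kind, an optional
// result name, and operand names (SSA values as strings). Not the MLIR C++ API.
struct Op {
  std::string kind;                   // "alloc", "clone", "dealloc", "return", ...
  std::string result;                 // empty if the op defines no value
  std::vector<std::string> operands;
};

// Folds `%dst = clone %src` when `dealloc %src` appears later in the same
// block and %src has no other users in between: all uses of %dst are
// redirected to %src, and both the clone and the dealloc of %src are erased.
// Returns true if the block was changed.
bool reuseCloneSource(std::vector<Op> &block) {
  for (std::size_t c = 0; c < block.size(); ++c) {
    if (block[c].kind != "clone" || block[c].operands.size() != 1)
      continue;
    const std::string src = block[c].operands[0];  // copied because erase()
    const std::string dst = block[c].result;       // invalidates references
    std::size_t d = c + 1;
    while (d < block.size() &&
           !(block[d].kind == "dealloc" && block[d].operands.size() == 1 &&
             block[d].operands[0] == src))
      ++d;
    if (d == block.size())
      continue;  // no dealloc of %src in this block: keep the clone
    bool srcUsedInBetween = false;
    for (std::size_t k = c + 1; k < d; ++k)
      for (const std::string &v : block[k].operands)
        if (v == src) srcUsedInBetween = true;
    if (srcUsedInBetween)
      continue;  // conservative: %src is still used after the clone
    for (Op &op : block)
      for (std::string &v : op.operands)
        if (v == dst) v = src;  // replace all uses of %dst with %src
    block.erase(block.begin() + static_cast<std::ptrdiff_t>(d));  // dealloc %src
    block.erase(block.begin() + static_cast<std::ptrdiff_t>(c));  // the clone
    return true;
  }
  return false;
}
```

Applied once to the `@dynamic_allocation` block (alloc, clone, dealloc, return), the sketch leaves only the `alloc` of %1 followed by a return of %1, which matches the transformed listing.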

To limit the complexity of this transformation, it only removes copy operations		## Canonicalization of the Target Buffer of the Clone Operation
when the following constraints are met:
		In this case, the target buffer of the clone operation can be used instead of
* The copy operation, the defining operation for the target value, and the		its source. The unused deallocation operation that is defined for this clone
deallocation of the source value lie in the same block.		operation is also removed.
* There are no users/aliases of the target value between the defining operation
of the target value and its copy operation.		Consider the following example where a generic test operation writes the result
* There are no users/aliases of the source value between its associated copy		to %temp and then copies %temp to %result. These two operations
operation and the deallocation of the source value.		can be merged into a single step. Canonicalization removes the clone operation
		and %temp, and replaces the uses of %temp with %result:
## Reusing the Target Buffer of the Copy Operation

In this case, the target buffer of the copy operation can be used instead of
its source. The unused allocation and deallocation operations that are defined
for this copy operation are also removed.

Consider the following example where a generic linalg operation writes the
result to %temp and then copies %temp to %result. These two operations
can be merged into a single step. Copy removal removes the copy operation and
%temp, and replaces the uses of %temp with %result:

```mlir		```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){		func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
%temp = alloc() : memref<2xf32>		%temp = memref.alloc() : memref<2xf32>
linalg.generic {		test.generic {
args_in = 1 : i64,		args_in = 1 : i64,
args_out = 1 : i64,		args_out = 1 : i64,
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]} %arg0, %temp {		iterator_types = ["parallel"]} %arg0, %temp {
^bb0(%gen2_arg0: f32, %gen2_arg1: f32):		^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
%tmp2 = exp %gen2_arg0 : f32		%tmp2 = exp %gen2_arg0 : f32
linalg.yield %tmp2 : f32		test.yield %tmp2 : f32
}: memref<2xf32>, memref<2xf32>		}: memref<2xf32>, memref<2xf32>
"linalg.copy"(%temp, %result) : (memref<2xf32>, memref<2xf32>) -> ()		%result = memref.clone %temp : memref<2xf32> to memref<2xf32>
dealloc %temp : memref<2xf32>		memref.dealloc %temp : memref<2xf32>
return		return
}		}
```		```

Will be transformed to:		Will be transformed to:

```mlir		```mlir
func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){		func @reuseTarget(%arg0: memref<2xf32>, %result: memref<2xf32>){
linalg.generic {		test.generic {
args_in = 1 : i64,		args_in = 1 : i64,
args_out = 1 : i64,		args_out = 1 : i64,
indexing_maps = [#map0, #map0],		indexing_maps = [#map0, #map0],
iterator_types = ["parallel"]} %arg0, %result {		iterator_types = ["parallel"]} %arg0, %result {
^bb0(%gen2_arg0: f32, %gen2_arg1: f32):		^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
%tmp2 = exp %gen2_arg0 : f32		%tmp2 = exp %gen2_arg0 : f32
linalg.yield %tmp2 : f32		test.yield %tmp2 : f32
}: memref<2xf32>, memref<2xf32>		}: memref<2xf32>, memref<2xf32>
return		return
}		}
```		```

Like before, several constraints to use the transformation apply:

* The copy operation, the defining operation of the source value, and the
deallocation of the source value lie in the same block.
* There are no users/aliases of the target value between the defining operation
of the source value and the copy operation.
* There are no users/aliases of the source value between the copy operation and
the deallocation of the source value.

## Known Limitations		## Known Limitations

BufferDeallocation introduces additional copies using allocations from the		BufferDeallocation introduces additional clones using the “memref” dialect
		herhut: This is no longer true.
“memref” dialect (“memref.alloc”). Analogously, all deallocations use the		operation “memref.clone”. Analogously, all deallocations use the “memref”
“memref” dialect free operation “memref.dealloc”. The actual copy process is		dialect free operation “memref.dealloc”. The copy itself is performed
realized using “linalg.copy”. Furthermore, buffers are essentially immutable		implicitly by “memref.clone”. Furthermore, buffers are essentially immutable
after their creation in a block. Further limitations are known in the case		after their creation in a block. Further limitations are known in the case
of unstructured control flow.		of unstructured control flow.

mlir/include/mlir/Dialect/MemRef/IR/MemRef.h

	//===- MemRef.h - MemRef dialect --------------------------------- C++ --===//			//===- MemRef.h - MemRef dialect --------------------------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_DIALECT_MEMREF_IR_MEMREF_H_			#ifndef MLIR_DIALECT_MEMREF_IR_MEMREF_H_
	#define MLIR_DIALECT_MEMREF_IR_MEMREF_H_			#define MLIR_DIALECT_MEMREF_IR_MEMREF_H_

	#include "mlir/IR/Dialect.h"			#include "mlir/IR/Dialect.h"
	#include "mlir/Interfaces/CallInterfaces.h"			#include "mlir/Interfaces/CallInterfaces.h"
	#include "mlir/Interfaces/CastInterfaces.h"			#include "mlir/Interfaces/CastInterfaces.h"
				#include "mlir/Interfaces/CopyOpInterface.h"
	#include "mlir/Interfaces/SideEffectInterfaces.h"			#include "mlir/Interfaces/SideEffectInterfaces.h"
	#include "mlir/Interfaces/ViewLikeInterface.h"			#include "mlir/Interfaces/ViewLikeInterface.h"

	namespace mlir {			namespace mlir {

	class Location;			class Location;
	class OpBuilder;			class OpBuilder;


mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td

//===- MemRefOps.td - MemRef op definitions ----------------- tablegen --===//		//===- MemRefOps.td - MemRef op definitions ----------------- tablegen --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef MEMREF_OPS		#ifndef MEMREF_OPS
#define MEMREF_OPS		#define MEMREF_OPS

include "mlir/Dialect/MemRef/IR/MemRefBase.td"		include "mlir/Dialect/MemRef/IR/MemRefBase.td"
include "mlir/Interfaces/CastInterfaces.td"		include "mlir/Interfaces/CastInterfaces.td"
		include "mlir/Interfaces/CopyOpInterface.td"
include "mlir/Interfaces/SideEffectInterfaces.td"		include "mlir/Interfaces/SideEffectInterfaces.td"
include "mlir/Interfaces/ViewLikeInterface.td"		include "mlir/Interfaces/ViewLikeInterface.td"
include "mlir/IR/SymbolInterfaces.td"		include "mlir/IR/SymbolInterfaces.td"

class MemRef_Op<string mnemonic, list<OpTrait> traits = []>		class MemRef_Op<string mnemonic, list<OpTrait> traits = []>
: Op<MemRef_Dialect, mnemonic, traits> {		: Op<MemRef_Dialect, mnemonic, traits> {
let printer = [{ return ::print(p, *this); }];		let printer = [{ return ::print(p, *this); }];
let verifier = [{ return ::verify(*this); }];		let verifier = [{ return ::verify(*this); }];
def MemRef_BufferCastOp : MemRef_Op<"buffer_cast",
let summary = "tensor to memref cast operation";		let summary = "tensor to memref cast operation";
let description = [{		let description = [{
Casts a tensor to a memref.		Casts a tensor to a memref.

```mlir		```mlir
// Result type is tensor<4x?xf32>		// Result type is tensor<4x?xf32>
%12 = memref.buffer_cast %10 : memref<4x?xf32, #map0, 42>		%12 = memref.buffer_cast %10 : memref<4x?xf32, #map0, 42>
```		```

		Note that mutating the result of the buffer cast operation leads to
		undefined behavior.
}];		}];

let arguments = (ins AnyTensor:$tensor);		let arguments = (ins AnyTensor:$tensor);
let results = (outs AnyRankedOrUnrankedMemRef:$memref);		let results = (outs AnyRankedOrUnrankedMemRef:$memref);
// This op is fully verified by traits.		// This op is fully verified by traits.
let verifier = ?;		let verifier = ?;

let assemblyFormat = "$tensor attr-dict `:` type($memref)";		let assemblyFormat = "$tensor attr-dict `:` type($memref)";
let extraClassDeclaration = [{

Value getViewSource() { return source(); }		Value getViewSource() { return source(); }
}];		}];

let hasFolder = 1;		let hasFolder = 1;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// CloneOp
		//===----------------------------------------------------------------------===//

		def CloneOp : MemRef_Op<"clone", [
		CopyOpInterface,
		DeclareOpInterfaceMethods<MemoryEffectsOpInterface>
		]> {
		let builders = [
		OpBuilder<(ins "Value":$value), [{
		return build($_builder, $_state, value.getType(), value);
		}]>];

		let description = [{
		Clones the data in the input view into an implicitly defined output view.
		herhut: Can you add a remark that mutating the source and result of `clone` after the clone operation has executed has undefined behavior? This is a requirement to making the bufferization logic work. And while at it, could you also add a similar comment to `tensor_load` (the source may not be mutated afterwards, at least when used in the context of bufferization) and `memref.buffer_cast` (the result may not be mutated). As we are restricting the existing `tensor_load` here, maybe it would be better to have a new operation with the restricted semantics but that is outside of the scope of this diff.

		Usage:

		```mlir
		%arg1 = memref.clone %arg0 : memref<?xf32> to memref<?xf32>
		pifon2a: nit: `arg 1` -> `arg1`
		```

		Note that mutating the source or result of the clone operation leads to
		herhut: nit: source or result.
		undefined behavior.
		}];

		let arguments = (ins Arg<AnyMemRef, "", []>:$input);
		let results = (outs Arg<AnyMemRef, "", []>:$output);

		let extraClassDeclaration = [{
		Value getSource() { return input(); }
		Value getTarget() { return output(); }
		}];

		let assemblyFormat = "$input attr-dict `:` type($input) `to` type($output)";

		let hasFolder = 1;
		let hasCanonicalizer = 1;
		}

		//===----------------------------------------------------------------------===//
// DeallocOp		// DeallocOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def MemRef_DeallocOp : MemRef_Op<"dealloc", [MemRefsNormalizable]> {		def MemRef_DeallocOp : MemRef_Op<"dealloc", [MemRefsNormalizable]> {
let summary = "memory deallocation operation";		let summary = "memory deallocation operation";
let description = [{		let description = [{
The `dealloc` operation frees the region of memory referenced by a memref		The `dealloc` operation frees the region of memory referenced by a memref
which was originally created by the `alloc` operation.		which was originally created by the `alloc` operation.
let description = [{
involving tensors and memrefs.		involving tensors and memrefs.

Example:		Example:

```mlir		```mlir
// Produces a value of tensor<4x?xf32> type.		// Produces a value of tensor<4x?xf32> type.
%12 = memref.tensor_load %10 : memref<4x?xf32, #layout, memspace0>		%12 = memref.tensor_load %10 : memref<4x?xf32, #layout, memspace0>
```		```

		If `tensor_load` is used during bufferization, mutating the source
		herhut: nit: source source
		buffer after loading leads to undefined behavior.
}];		}];

let arguments = (ins Arg<AnyRankedOrUnrankedMemRef,		let arguments = (ins Arg<AnyRankedOrUnrankedMemRef,
"the reference to load from", [MemRead]>:$memref);		"the reference to load from", [MemRead]>:$memref);
let results = (outs AnyTensor:$result);		let results = (outs AnyTensor:$result);
// TensorLoadOp is fully verified by traits.		// TensorLoadOp is fully verified by traits.
let verifier = ?;		let verifier = ?;


mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h

This file was added.

				//===- MemRefUtils.h - MemRef transformation utilities ----------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header file defines prototypes for various transformation utilities for
				// the MemRef dialect. These are not passes by themselves but are used
				// either by passes, optimization sequences, or in turn by other transformation
				// utilities.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_MEMREF_UTILS_MEMREFUTILS_H
				#define MLIR_DIALECT_MEMREF_UTILS_MEMREFUTILS_H

				#include "mlir/Dialect/MemRef/IR/MemRef.h"

				namespace mlir {

				/// Finds the associated dealloc that can be linked to our allocation nodes (if
				/// any).
				Operation *findDealloc(Value allocValue);

				herhut: With the guarantee that the cloned buffer cannot be mutated, is this actually required?
				herhut: These are only used for the canonicalization pattern, so I'd prefer to make them local.
				dfki-jugr (author): findDealloc is also used in BufferDeallocation.
				} // end namespace mlir

				#endif // MLIR_DIALECT_MEMREF_UTILS_MEMREFUTILS_H
				herhut: This only finds one dealloc. For its use, would it not be OK to find the dealloc in the current block (can only be one).

mlir/include/mlir/Transforms/BufferUtils.h

public:
/// Represents a list containing all alloc entries.		/// Represents a list containing all alloc entries.
using AllocEntryList = SmallVector<AllocEntry, 8>;		using AllocEntryList = SmallVector<AllocEntry, 8>;

/// Get the start operation to place the given alloc value within the		/// Get the start operation to place the given alloc value within the
/// specified placement block.		/// specified placement block.
static Operation *getStartOperation(Value allocValue, Block *placementBlock,		static Operation *getStartOperation(Value allocValue, Block *placementBlock,
const Liveness &liveness);		const Liveness &liveness);

/// Find an associated dealloc operation that is linked to the given
/// allocation node (if any).
static Operation *findDealloc(Value allocValue);

public:		public:
/// Initializes the internal list by discovering all supported allocation		/// Initializes the internal list by discovering all supported allocation
/// nodes.		/// nodes.
BufferPlacementAllocs(Operation *op);		BufferPlacementAllocs(Operation *op);

/// Returns the begin iterator to iterate over all allocations.		/// Returns the begin iterator to iterate over all allocations.
AllocEntryList::const_iterator begin() const { return allocs.begin(); }		AllocEntryList::const_iterator begin() const { return allocs.begin(); }

▲ Show 20 Lines • Show All 90 Lines • Show Last 20 Lines

mlir/include/mlir/Transforms/Passes.h

[... 57 unchanged lines not shown ...]
 std::unique_ptr<FunctionPass> createFinalizingBufferizePass();

 /// Creates a pass that converts memref function results to out-params.
 std::unique_ptr<Pass> createBufferResultsToOutParamsPass();

 /// Creates an instance of the Canonicalizer pass.
 std::unique_ptr<Pass> createCanonicalizerPass();

-/// Create a pass that removes unnecessary Copy operations.
-std::unique_ptr<Pass> createCopyRemovalPass();
-
 /// Creates a pass to perform common sub expression elimination.
 std::unique_ptr<Pass> createCSEPass();

 /// Creates a loop fusion pass which fuses loops. Buffers of size less than or
 /// equal to `localBufSizeThreshold` are promoted to memory space
 /// `fastMemorySpace'.
 std::unique_ptr<OperationPass<FuncOp>>
 createLoopFusionPass(unsigned fastMemorySpace = 0,
[... 75 unchanged lines not shown ...]

mlir/include/mlir/Transforms/Passes.td

[... 276 unchanged lines not shown ...]
         return
       }

     }
     ```

   }];
   let constructor = "mlir::createBufferDeallocationPass()";
-  // TODO: this pass likely shouldn't depend on Linalg?
-  let dependentDialects = ["linalg::LinalgDialect"];
 }

 def BufferHoisting : FunctionPass<"buffer-hoisting"> {
   let summary = "Optimizes placement of allocation operations by moving them "
                 "into common dominators and out of nested regions";
   let description = [{
     This pass implements an approach to aggressively move allocations upwards
     into common dominators and out of nested regions.
[... 66 unchanged lines not shown ...]
     This pass performs various types of canonicalizations over a set of
     operations. See [Operation Canonicalization](Canonicalization.md) for more
     details.
   }];
   let constructor = "mlir::createCanonicalizerPass()";
   let dependentDialects = ["memref::MemRefDialect"];
 }

-def CopyRemoval : FunctionPass<"copy-removal"> {
-  let summary = "Remove the redundant copies from input IR";
-  let constructor = "mlir::createCopyRemovalPass()";
-}
-
 def CSE : Pass<"cse"> {
   let summary = "Eliminate common sub-expressions";
   let description = [{
     This pass implements a generalized algorithm for common sub-expression
     elimination. This pass relies on information provided by the
     `Memory SideEffect` interface to identify when it is safe to eliminate
     operations. See [Common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination)
     for more general details on this optimization.
[... 371 unchanged lines not shown ...]

mlir/lib/Dialect/MemRef/CMakeLists.txt

-add_subdirectory(IR)
+add_mlir_dialect_library(MLIRMemRef
+  IR/MemRefDialect.cpp
+  IR/MemRefOps.cpp
+  Utils/MemRefUtils.cpp
+
+  ADDITIONAL_HEADER_DIRS
+  ${PROJECT_SOURCE_DIR}/inlude/mlir/Dialect/MemRefDialect
+
+  DEPENDS
+  MLIRStandardOpsIncGen
+  MLIRMemRefOpsIncGen
+
+  LINK_COMPONENTS
+  Core
+
+  LINK_LIBS PUBLIC
+  MLIRDialect
+  MLIRIR
+  MLIRStandard
+  MLIRTensor
+  MLIRViewLikeInterface
+  )

mlir/lib/Dialect/MemRef/IR/CMakeLists.txt

This file was deleted.

add_mlir_dialect_library(MLIRMemRef
  MemRefDialect.cpp
  MemRefOps.cpp

  ADDITIONAL_HEADER_DIRS
  ${PROJECT_SOURCE_DIR}/inlude/mlir/Dialect/MemRefDialect

  DEPENDS
  MLIRStandardOpsIncGen
  MLIRMemRefOpsIncGen

  LINK_COMPONENTS
  Core

  LINK_LIBS PUBLIC
  MLIRDialect
  MLIRIR
  MLIRStandard
  MLIRTensor
  MLIRViewLikeInterface
  )

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp

 //===----------------------------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #include "mlir/Dialect/MemRef/IR/MemRef.h"
+#include "mlir/Dialect/MemRef/Utils/MemRefUtils.h"
 #include "mlir/Dialect/StandardOps/IR/Ops.h"
 #include "mlir/Dialect/StandardOps/Utils/Utils.h"
 #include "mlir/Dialect/Tensor/IR/Tensor.h"
 #include "mlir/IR/AffineMap.h"
 #include "mlir/IR/Builders.h"
 #include "mlir/IR/BuiltinTypes.h"
 #include "mlir/IR/Matchers.h"
 #include "mlir/IR/PatternMatch.h"
[... 442 unchanged lines not shown ...]
   return false;
 }

 OpFoldResult CastOp::fold(ArrayRef<Attribute> operands) {
   return succeeded(foldMemRefCast(*this)) ? getResult() : Value();
 }

 //===----------------------------------------------------------------------===//
+// CloneOp
+//===----------------------------------------------------------------------===//
+
+static LogicalResult verify(CloneOp op) { return success(); }
+
+void CloneOp::getEffects(
+    SmallVectorImpl<SideEffects::EffectInstance<MemoryEffects::Effect>>
+        &effects) {
+  effects.emplace_back(MemoryEffects::Read::get(), input(),
+                       SideEffects::DefaultResource::get());
+  effects.emplace_back(MemoryEffects::Write::get(), output(),
+                       SideEffects::DefaultResource::get());
+}
+
+namespace {
+/// Fold Dealloc operations that are deallocating an AllocOp that is only used
+/// by other Dealloc operations.
+struct SimplifyClones : public OpRewritePattern<CloneOp> {
+  using OpRewritePattern<CloneOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(CloneOp cloneOp,
+                                PatternRewriter &rewriter) const override {
+    if (cloneOp.use_empty()) {
+      rewriter.eraseOp(cloneOp);
+      return success();
+    }
+
+    Value source = cloneOp.input();
+
+    // Removes the clone operation and the corresponding dealloc and alloc
+    // operation (if any).
+    auto tryRemoveClone = [&](Operation *sourceOp, Operation *dealloc,
+                              Operation *alloc) {
+      if (!sourceOp || !dealloc || !alloc ||
+          alloc->getBlock() != dealloc->getBlock())
+        return false;
+      rewriter.replaceOp(cloneOp, source);
+      rewriter.eraseOp(dealloc);
+      return true;
+    };
+
+    // Removes unnecessary clones that are derived from the result of the clone
+    // op.
+    Operation *deallocOp = findDealloc(cloneOp.output());
+    Operation *sourceOp = source.getDefiningOp();
+    if (tryRemoveClone(sourceOp, deallocOp, sourceOp))
+      return success();
+
+    // Removes unnecessary clones that are derived from the source of the clone
+    // op.
+    deallocOp = findDealloc(source);
+    if (tryRemoveClone(sourceOp, deallocOp, cloneOp))
+      return success();
+
+    return failure();
+  }
+};
+
+} // end anonymous namespace.
+
+void CloneOp::getCanonicalizationPatterns(OwningRewritePatternList &results,
+                                          MLIRContext *context) {
+  results.insert<SimplifyClones>(context);
+}
+
+OpFoldResult CloneOp::fold(ArrayRef<Attribute> operands) {
+  return succeeded(foldMemRefCast(*this)) ? getResult() : Value();
+}
+
+//===----------------------------------------------------------------------===//
 // DeallocOp
 //===----------------------------------------------------------------------===//
 namespace {
 /// Fold Dealloc operations that are deallocating an AllocOp that is only used
 /// by other Dealloc operations.
 struct SimplifyDeadDealloc : public OpRewritePattern<DeallocOp> {
   using OpRewritePattern<DeallocOp>::OpRewritePattern;

[... 1,623 unchanged lines not shown ...]
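The two folding cases of `SimplifyClones` are easier to follow outside the rewriter machinery. Below is a standalone C++ sketch of its decision logic on a hypothetical toy IR: `ToyOp`, `ToyFunc`, `simplifyClone` and the other names are invented for illustration and are not MLIR APIs. The sketch only reports what the pattern would do (erase an unused clone, or replace the clone with its source and erase one dealloc); it rewrites nothing. Like `findDealloc` in this revision, it takes the first matching dealloc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not MLIR): each op reads some value ids and defines at
// most one value id. Values without a defining op model block arguments.
struct ToyOp {
  std::string kind;          // "alloc", "clone", "dealloc" or "use"
  int block;                 // id of the enclosing block
  std::vector<int> operands; // value ids read by the op
  int result;                // value id defined by the op, or -1
};
using ToyFunc = std::vector<ToyOp>;

// Index of the op defining `value`, or -1 (e.g. for a block argument).
int definingOp(const ToyFunc &f, int value) {
  for (std::size_t i = 0; i < f.size(); ++i)
    if (f[i].result == value)
      return static_cast<int>(i);
  return -1;
}

// Index of the first dealloc of `value`, or -1 (first match only).
int findDeallocOf(const ToyFunc &f, int value) {
  for (std::size_t i = 0; i < f.size(); ++i)
    if (f[i].kind == "dealloc" && f[i].operands.size() == 1 &&
        f[i].operands[0] == value)
      return static_cast<int>(i);
  return -1;
}

// True if any op (a dealloc included) reads `value`.
bool hasUses(const ToyFunc &f, int value) {
  for (const ToyOp &op : f)
    for (int v : op.operands)
      if (v == value)
        return true;
  return false;
}

enum class CloneAction { EraseClone, ReplaceWithSource, Keep };

struct Decision {
  CloneAction action;
  int deallocToErase; // op index of the dealloc to erase, or -1
};

// Decides what the SimplifyClones logic would do for the clone at `cloneIdx`.
Decision simplifyClone(const ToyFunc &f, int cloneIdx) {
  const ToyOp &clone = f[cloneIdx];
  if (!hasUses(f, clone.result))
    return {CloneAction::EraseClone, -1};

  int source = clone.operands[0];
  int sourceOp = definingOp(f, source);

  // Same conditions as tryRemoveClone: every op exists and the "alloc" and
  // the dealloc live in the same block.
  auto canRemove = [&](int dealloc, int alloc) {
    return sourceOp >= 0 && dealloc >= 0 && alloc >= 0 &&
           f[alloc].block == f[dealloc].block;
  };

  // Case 1: the clone's result is freed in the block defining the source.
  int dealloc = findDeallocOf(f, clone.result);
  if (canRemove(dealloc, sourceOp))
    return {CloneAction::ReplaceWithSource, dealloc};

  // Case 2: the source is freed in the block containing the clone.
  dealloc = findDeallocOf(f, source);
  if (canRemove(dealloc, cloneIdx))
    return {CloneAction::ReplaceWithSource, dealloc};

  return {CloneAction::Keep, -1};
}
```

For `%1 = clone %0` where `%0` is allocated and `%1` is freed in the same block, the sketch reports `ReplaceWithSource` together with that dealloc, mirroring the first `tryRemoveClone` call; a clone of a block argument is always kept because there is no defining op.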

mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp

This file was added.

//===- Utils.cpp - Utilities to support the MemRef dialect ----------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements utilities for the MemRef dialect.
//
//===----------------------------------------------------------------------===//

#include "mlir/Dialect/MemRef/Utils/MemRefUtils.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"

using namespace mlir;

/// Finds associated deallocs that can be linked to our allocation nodes (if
/// any).
Operation *mlir::findDealloc(Value allocValue) {
  auto userIt = llvm::find_if(allocValue.getUsers(), [&](Operation *user) {
    auto effectInterface = dyn_cast<MemoryEffectOpInterface>(user);
herhut: should this be `&&`?
    if (!effectInterface)
      return false;
    // Try to find a free effect that is applied to one of our values
    // that will be automatically freed by our pass.
    SmallVector<MemoryEffects::EffectInstance, 2> effects;
    effectInterface.getEffectsOnValue(allocValue, effects);
    return llvm::any_of(effects, [&](MemoryEffects::EffectInstance &it) {
      return isa<MemoryEffects::Free>(it.getEffect());
    });
  });
  // Assign the associated dealloc operation (if any).
  return userIt != allocValue.user_end() ? *userIt : nullptr;
}
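The search in `findDealloc` is a `find_if` over the users of the value combined with an `any_of` over each user's effects. The sketch below reproduces that shape in plain C++. `Effect`, `ToyUser` and `findFreeingUser` are hypothetical names, not MLIR APIs, and each user's `effectsOnValue` list stands in for what `MemoryEffectOpInterface::getEffectsOnValue` would report. The optional `block` parameter is only an illustration of herhut's suggestion to restrict the search to one block; it is not part of the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for memory effects and for the users of a value.
enum class Effect { Allocate, Free, Read, Write };

struct ToyUser {
  int id;                             // identifies the user op
  int block;                          // id of the block containing the user
  std::vector<Effect> effectsOnValue; // effects this user has on the value
  bool hasEffectInterface = true;     // false: the op reports no effects
};

// Returns the id of the first user that frees the value, optionally only
// considering users inside `block`, or std::nullopt if there is none.
std::optional<int> findFreeingUser(const std::vector<ToyUser> &users,
                                   std::optional<int> block = std::nullopt) {
  auto it = std::find_if(users.begin(), users.end(), [&](const ToyUser &user) {
    // Users without effect information are skipped, as in the diff.
    if (!user.hasEffectInterface)
      return false;
    if (block && user.block != *block)
      return false;
    return std::any_of(user.effectsOnValue.begin(), user.effectsOnValue.end(),
                       [](Effect e) { return e == Effect::Free; });
  });
  if (it == users.end())
    return std::nullopt;
  return it->id;
}
```

Only the first freeing user is returned, which is the behavior herhut's comment on `MemRefUtils.h` points out ("This only finds one dealloc").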

mlir/lib/Transforms/BufferDeallocation.cpp

 //===- BufferDeallocation.cpp - the impl for buffer deallocation ----------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements logic for computing correct alloc and dealloc positions.
-// Furthermore, buffer placement also adds required new alloc and copy
-// operations to ensure that all buffers are deallocated. The main class is the
+// Furthermore, buffer deallocation also adds required new clone operations to
+// ensure that all buffers are deallocated. The main class is the
 // BufferDeallocationPass class that implements the underlying algorithm. In
 // order to put allocations and deallocations at safe positions, it is
 // significantly important to put them into the correct blocks. However, the
 // liveness analysis does not pay attention to aliases, which can occur due to
 // branches (and their associated block arguments) in general. For this purpose,
 // BufferDeallocation firstly finds all possible aliases for a single value
-// (using the BufferAliasAnalysis class). Consider the following
-// example:
+// (using the BufferAliasAnalysis class). Consider the following example:
 //
 // ^bb0(%arg0):
 //   cond_br %cond, ^bb1, ^bb2
 // ^bb1:
 //   br ^exit(%arg0)
 // ^bb2:
 //   %new_value = ...
 //   br ^exit(%new_value)
 // ^exit(%arg1):
 //   return %arg1;
 //
 // We should place the dealloc for %new_value in exit. However, we have to free
 // the buffer in the same block, because it cannot be freed in the post
-// dominator. However, this requires a new copy buffer for %arg1 that will
+// dominator. However, this requires a new clone buffer for %arg1 that will
 // contain the actual contents. Using the class BufferAliasAnalysis, we
 // will find out that %new_value has a potential alias %arg1. In order to find
 // the dealloc position we have to find all potential aliases, iterate over
 // their uses and find the common post-dominator block (note that additional
-// copies and buffers remove potential aliases and will influence the placement
+// clones and buffers remove potential aliases and will influence the placement
 // of the deallocs). In all cases, the computed block can be safely used to free
 // the %new_value buffer (may be exit or bb2) as it will die and we can use
 // liveness information to determine the exact operation after which we have to
-// insert the dealloc. However, the algorithm supports introducing copy buffers
+// insert the dealloc. However, the algorithm supports introducing clone buffers
 // and placing deallocs in safe locations to ensure that all buffers will be
 // freed in the end.
 //
 // TODO:
 // The current implementation does not support explicit-control-flow loops and
 // the resulting code will be invalid with respect to program semantics.
 // However, structured control-flow loops are fully supported. Furthermore, it
 // doesn't accept functions which return buffers already.
 //
 //===----------------------------------------------------------------------===//
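The `^exit` example in this header hinges on a dominance test: `%new_value` is defined in `^bb2`, which does not dominate `^exit`, so its alias `%arg1` needs a clone. The standalone C++ sketch below reproduces that test on the same four-block CFG using the textbook iterative dominator algorithm. The string-based CFG, `computeDominators` and `needsClone` are hypothetical; the pass itself relies on the dominance information held in its `dominators` member and also tracks aliases, which this sketch does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using BlockName = std::string;
using CFG = std::map<BlockName, std::vector<BlockName>>; // block -> successors
using DomSets = std::map<BlockName, std::set<BlockName>>;

// Textbook iterative dominator computation:
//   Dom(entry) = {entry}
//   Dom(b)     = {b} union (intersection of Dom(p) over predecessors p of b)
DomSets computeDominators(const CFG &cfg, const BlockName &entry) {
  std::set<BlockName> all;
  std::map<BlockName, std::vector<BlockName>> preds;
  for (const auto &[block, succs] : cfg) {
    all.insert(block);
    for (const BlockName &succ : succs) {
      all.insert(succ);
      preds[succ].push_back(block);
    }
  }

  DomSets dom;
  for (const BlockName &b : all)
    dom[b] = (b == entry) ? std::set<BlockName>{entry} : all;

  bool changed = true;
  while (changed) {
    changed = false;
    for (const BlockName &b : all) {
      if (b == entry)
        continue;
      std::set<BlockName> next = all;
      for (const BlockName &p : preds[b]) {
        std::set<BlockName> meet;
        std::set_intersection(next.begin(), next.end(), dom[p].begin(),
                              dom[p].end(), std::inserter(meet, meet.begin()));
        next = std::move(meet);
      }
      next.insert(b);
      if (next != dom[b]) {
        dom[b] = std::move(next);
        changed = true;
      }
    }
  }
  return dom;
}

// An alias living in `aliasBlock` needs a clone if the block defining the
// source buffer does not dominate it.
bool needsClone(const DomSets &dom, const BlockName &sourceBlock,
                const BlockName &aliasBlock) {
  return dom.at(aliasBlock).count(sourceBlock) == 0;
}
```

On the header's CFG, `Dom(^exit)` is `{^bb0, ^exit}`, so `needsClone(dom, "bb2", "exit")` is true, matching the comment's conclusion that `%arg1` requires a new clone buffer.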

 #include "PassDetail.h"
-#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
 #include "mlir/Dialect/MemRef/IR/MemRef.h"
 #include "mlir/Dialect/StandardOps/IR/Ops.h"
-#include "mlir/Dialect/StandardOps/Utils/Utils.h"
 #include "mlir/IR/Operation.h"
 #include "mlir/Interfaces/ControlFlowInterfaces.h"
 #include "mlir/Interfaces/LoopLikeInterface.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Transforms/BufferUtils.h"
 #include "mlir/Transforms/Passes.h"
 #include "llvm/ADT/SetOperations.h"

[... 115 unchanged lines not shown ...]
 };

 //===----------------------------------------------------------------------===//
 // BufferDeallocation
 //===----------------------------------------------------------------------===//

 /// The buffer deallocation transformation which ensures that all allocs in the
 /// program have a corresponding de-allocation. As a side-effect, it might also
-/// introduce copies that in turn leads to additional allocs and de-allocations.
+/// introduce clones that in turn leads to additional deallocations.
 class BufferDeallocation : BufferPlacementTransformationBase {
 public:
   BufferDeallocation(Operation *op)
       : BufferPlacementTransformationBase(op), dominators(op),
         postDominators(op) {}

-  /// Performs the actual placement/creation of all temporary alloc, copy and
-  /// dealloc nodes.
+  /// Performs the actual placement/creation of all temporary clone and dealloc
+  /// nodes.
   void deallocate() {
-    // Add additional allocations and copies that are required.
-    introduceCopies();
+    // Add additional clones that are required.
+    introduceClones();
     // Place deallocations for all allocation entries.
     placeDeallocs();
   }

 private:
-  /// Introduces required allocs and copy operations to avoid memory leaks.
-  void introduceCopies() {
+  /// Introduces required clone operations to avoid memory leaks.
+  void introduceClones() {
     // Initialize the set of values that require a dedicated memory free
     // operation since their operands cannot be safely deallocated in a post
     // dominator.
     SmallPtrSet<Value, 8> valuesToFree;
     llvm::SmallDenseSet<std::tuple<Value, Block *>> visitedValues;
     SmallVector<std::tuple<Value, Block *>, 8> toProcess;

     // Check dominance relation for proper dominance properties. If the given
-    // value node does not dominate an alias, we will have to create a copy in
+    // value node does not dominate an alias, we will have to create a clone in
     // order to free all buffers that can potentially leak into a post
     // dominator.
     auto findUnsafeValues = [&](Value source, Block *definingBlock) {
       auto it = aliases.find(source);
       if (it == aliases.end())
         return;
       for (Value value : it->second) {
         if (valuesToFree.count(value) > 0)
[... 24 unchanged lines not shown ...]
       auto current = toProcess.pop_back_val();
       findUnsafeValues(std::get<0>(current), std::get<1>(current));
     }

     // Update buffer aliases to ensure that we free all buffers and block
     // arguments at the correct locations.
     aliases.remove(valuesToFree);

-    // Add new allocs and additional copy operations.
+    // Add new allocs and additional clone operations.
     for (Value value : valuesToFree) {
       if (auto blockArg = value.dyn_cast<BlockArgument>())
         introduceBlockArgCopy(blockArg);
       else
         introduceValueCopyForRegionResult(value);

       // Register the value to require a final dealloc. Note that we do not have
       // to assign a block here since we do not want to move the allocation node
       // to another location.
       allocs.registerAlloc(std::make_tuple(value, nullptr));
     }
   }

-  /// Introduces temporary allocs in all predecessors and copies the source
+  /// Introduces temporary clones in all predecessors and copies the source
   /// values into the newly allocated buffers.
   void introduceBlockArgCopy(BlockArgument blockArg) {
     // Allocate a buffer for the current block argument in the block of
     // the associated value (which will be a predecessor block by
     // definition).
     Block *block = blockArg.getOwner();
     for (auto it = block->pred_begin(), e = block->pred_end(); it != e; ++it) {
       // Get the terminator and the value that will be passed to our
       // argument.
       Operation *terminator = (*it)->getTerminator();
       auto branchInterface = cast<BranchOpInterface>(terminator);
       // Query the associated source value.
       Value sourceValue =
           branchInterface.getSuccessorOperands(it.getSuccessorIndex())
               .getValue()[blockArg.getArgNumber()];
-      // Create a new alloc and copy at the current location of the terminator.
-      Value alloc = introduceBufferCopy(sourceValue, terminator);
-      // Wire new alloc and successor operand.
+      // Create a new clone at the current location of the terminator.
+      Value clone = introduceCloneBuffers(sourceValue, terminator);
+      // Wire new clone and successor operand.
       auto mutableOperands =
           branchInterface.getMutableSuccessorOperands(it.getSuccessorIndex());
       if (!mutableOperands.hasValue())
         terminator->emitError() << "terminators with immutable successor "
                                    "operands are not supported";
       else
         mutableOperands.getValue()
             .slice(blockArg.getArgNumber(), 1)
-            .assign(alloc);
+            .assign(clone);
     }

     // Check whether the block argument has implicitly defined predecessors via
     // the RegionBranchOpInterface. This can be the case if the current block
     // argument belongs to the first block in a region and the parent operation
     // implements the RegionBranchOpInterface.
     Region *argRegion = block->getParent();
     Operation *parentOp = argRegion->getParentOp();
     RegionBranchOpInterface regionInterface;
     if (!argRegion || &argRegion->front() != block ||
         !(regionInterface = dyn_cast<RegionBranchOpInterface>(parentOp)))
       return;

-    introduceCopiesForRegionSuccessors(
+    introduceClonesForRegionSuccessors(
         regionInterface, argRegion->getParentOp()->getRegions(), blockArg,
         [&](RegionSuccessor &successorRegion) {
           // Find a predecessor of our argRegion.
           return successorRegion.getSuccessor() == argRegion;
         });

     // Check whether the block argument belongs to an entry region of the
-    // parent operation. In this case, we have to introduce an additional copy
+    // parent operation. In this case, we have to introduce an additional clone
     // for buffer that is passed to the argument.
     SmallVector<RegionSuccessor, 2> successorRegions;
     regionInterface.getSuccessorRegions(/*index=*/llvm::None, successorRegions);
     auto *it =
         llvm::find_if(successorRegions, [&](RegionSuccessor &successorRegion) {
           return successorRegion.getSuccessor() == argRegion;
         });
     if (it == successorRegions.end())
       return;

-    // Determine the actual operand to introduce a copy for and rewire the
-    // operand to point to the copy instead.
+    // Determine the actual operand to introduce a clone for and rewire the
+    // operand to point to the clone instead.
     Value operand =
         regionInterface.getSuccessorEntryOperands(argRegion->getRegionNumber())
             [llvm::find(it->getSuccessorInputs(), blockArg).getIndex()];
-    Value copy = introduceBufferCopy(operand, parentOp);
+    Value clone = introduceCloneBuffers(operand, parentOp);

     auto op = llvm::find(parentOp->getOperands(), operand);
     assert(op != parentOp->getOperands().end() &&
            "parentOp does not contain operand");
-    parentOp->setOperand(op.getIndex(), copy);
+    parentOp->setOperand(op.getIndex(), clone);
   }
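The first loop of `introduceBlockArgCopy` visits every predecessor edge into the block, clones the value passed to the block argument right before the terminator, and rewires that successor operand to the clone. The toy, string-based C++ sketch below models only that rewiring. `ToyBlock`, `ToySuccessor` and `cloneBlockArgInputs` are hypothetical names, the emitted `memref.clone` text omits types, and the `RegionBranchOpInterface` handling in the second half of the function is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical textual CFG model (not MLIR): a terminator edge passes a list
// of values to the arguments of a successor block.
struct ToySuccessor {
  std::string block;                 // successor block name
  std::vector<std::string> operands; // values passed to its block arguments
};

struct ToyBlock {
  std::string name;
  std::vector<std::string> ops;         // non-terminator ops, in order
  std::vector<ToySuccessor> successors; // edges of the terminator
};

// For every edge into `target`, clones the value passed to argument
// `argNumber` right before the terminator and passes the clone instead.
// Returns the names of the created clones, in visiting order.
std::vector<std::string> cloneBlockArgInputs(std::vector<ToyBlock> &blocks,
                                             const std::string &target,
                                             std::size_t argNumber) {
  std::vector<std::string> clones;
  for (ToyBlock &pred : blocks) {
    for (ToySuccessor &edge : pred.successors) {
      if (edge.block != target)
        continue;
      std::string &operand = edge.operands.at(argNumber);
      std::string clone = "%clone" + std::to_string(clones.size());
      // Appending to `ops` places the clone just before the terminator.
      pred.ops.push_back(clone + " = memref.clone " + operand);
      operand = clone; // rewire the successor operand to the clone
      clones.push_back(clone);
    }
  }
  return clones;
}
```

Applied to the `^bb1`/`^bb2` to `^exit` example from the file header, each incoming value of `%arg1` gets its own clone, so `%new_value` can be freed in `^bb2` while `%arg1` owns a separate buffer.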

/// Introduces temporary allocs in front of all associated nested-region		/// Introduces temporary clones in front of all associated nested-region
/// terminators and copies the source values into the newly allocated buffers.		/// terminators and copies the source values into the newly allocated buffers.
void introduceValueCopyForRegionResult(Value value) {		void introduceValueCopyForRegionResult(Value value) {
// Get the actual result index in the scope of the parent terminator.		// Get the actual result index in the scope of the parent terminator.
Operation *operation = value.getDefiningOp();		Operation *operation = value.getDefiningOp();
auto regionInterface = cast<RegionBranchOpInterface>(operation);		auto regionInterface = cast<RegionBranchOpInterface>(operation);
// Filter successors that return to the parent operation.		// Filter successors that return to the parent operation.
auto regionPredicate = [&](RegionSuccessor &successorRegion) {		auto regionPredicate = [&](RegionSuccessor &successorRegion) {
// If the RegionSuccessor has no associated successor, it will return to		// If the RegionSuccessor has no associated successor, it will return to
// its parent operation.		// its parent operation.
return !successorRegion.getSuccessor();		return !successorRegion.getSuccessor();
};		};
// Introduce a copy for all region "results" that are returned to the parent		// Introduce a clone for all region "results" that are returned to the
// operation. This is required since the parent's result value has been		// parent operation. This is required since the parent's result value has
// considered critical. Therefore, the algorithm assumes that a copy of a		// been considered critical. Therefore, the algorithm assumes that a clone
// previously allocated buffer is returned by the operation (like in the		// of a previously allocated buffer is returned by the operation (like in
// case of a block argument).		// the case of a block argument).
introduceCopiesForRegionSuccessors(regionInterface, operation->getRegions(),		introduceClonesForRegionSuccessors(regionInterface, operation->getRegions(),
value, regionPredicate);		value, regionPredicate);
}		}

/// Introduces buffer copies for all terminators in the given regions. The		/// Introduces buffer clones for all terminators in the given regions. The
/// regionPredicate is applied to every successor region in order to restrict		/// regionPredicate is applied to every successor region in order to restrict
/// the copies to specific regions.		/// the clones to specific regions.
template <typename TPredicate>		template <typename TPredicate>
void introduceCopiesForRegionSuccessors(		void introduceClonesForRegionSuccessors(
RegionBranchOpInterface regionInterface, MutableArrayRef<Region> regions,		RegionBranchOpInterface regionInterface, MutableArrayRef<Region> regions,
Value argValue, const TPredicate &regionPredicate) {		Value argValue, const TPredicate &regionPredicate) {
for (Region &region : regions) {		for (Region &region : regions) {
// Query the regionInterface to get all successor regions of the current		// Query the regionInterface to get all successor regions of the current
// one.		// one.
SmallVector<RegionSuccessor, 2> successorRegions;		SmallVector<RegionSuccessor, 2> successorRegions;
regionInterface.getSuccessorRegions(region.getRegionNumber(),		regionInterface.getSuccessorRegions(region.getRegionNumber(),
successorRegions);		successorRegions);
... 9 unchanged lines not shown ...	for (Region &region : regions) {
.getIndex();		.getIndex();

// Iterate over all immediate terminator operations to introduce		// Iterate over all immediate terminator operations to introduce
// new buffer allocations. Thereby, the appropriate terminator operand		// new buffer allocations. Thereby, the appropriate terminator operand
// will be adjusted to point to the newly allocated buffer instead.		// will be adjusted to point to the newly allocated buffer instead.
walkReturnOperations(&region, [&](Operation *terminator) {		walkReturnOperations(&region, [&](Operation *terminator) {
// Extract the source value from the current terminator.		// Extract the source value from the current terminator.
Value sourceValue = terminator->getOperand(operandIndex);		Value sourceValue = terminator->getOperand(operandIndex);
// Create a new alloc at the current location of the terminator.		// Create a new clone at the current location of the terminator.
Value alloc = introduceBufferCopy(sourceValue, terminator);		Value clone = introduceCloneBuffers(sourceValue, terminator);
// Wire alloc and terminator operand.		// Wire clone and terminator operand.
terminator->setOperand(operandIndex, alloc);		terminator->setOperand(operandIndex, clone);
});		});
}		}
}		}

/// Creates a new memory allocation for the given source value and copies		/// Creates a new memory allocation for the given source value and clones
/// its content into the newly allocated buffer. The terminator operation is		/// its content into the newly allocated buffer. The terminator operation is
/// used to insert the alloc and copy operations at the right places.		/// used to insert the clone operation at the right place.
Value introduceBufferCopy(Value sourceValue, Operation *terminator) {		Value introduceCloneBuffers(Value sourceValue, Operation *terminator) {
// Avoid multiple copies of the same source value. This can happen in the		// Avoid multiple clones of the same source value. This can happen in the
		herhut: Is this still true?
		dfki-jugr (author): Yes, it can still happen that there are clones of clones.
// presence of loops when a branch acts as a backedge while also having		// presence of loops when a branch acts as a backedge while also having
// another successor that returns to its parent operation. Note that		// another successor that returns to its parent operation. Note that
// copying copied buffers can introduce memory leaks since the invariant of		// copying copied buffers can introduce memory leaks since the invariant of
// BufferPlacement assumes that a buffer will be only copied once into a		// BufferDeallocation assumes that a buffer will be only cloned once into a
// temporary buffer. Hence, the construction of copy chains introduces		// temporary buffer. Hence, the construction of clone chains introduces
// additional allocations that are not tracked automatically by the		// additional allocations that are not tracked automatically by the
// algorithm.		// algorithm.
if (copiedValues.contains(sourceValue))		if (clonedValues.contains(sourceValue))
return sourceValue;		return sourceValue;
// Create a new alloc at the current location of the terminator.		// Create a new clone operation that copies the contents of the old
auto memRefType = sourceValue.getType().cast<MemRefType>();		// buffer to the new one.
OpBuilder builder(terminator);		OpBuilder builder(terminator);
		auto cloneOp =
		builder.create<memref::CloneOp>(terminator->getLoc(), sourceValue);

// Extract information about dynamically shaped types by		// Remember the clone of original source value.
// extracting their dynamic dimensions.		clonedValues.insert(cloneOp);
auto dynamicOperands =		return cloneOp;
getDynOperands(terminator->getLoc(), sourceValue, builder);

// TODO: provide a generic interface to create dialect-specific
// Alloc and CopyOp nodes.
auto alloc = builder.create<memref::AllocOp>(terminator->getLoc(),
memRefType, dynamicOperands);

// Create a new copy operation that copies the contents of the old
// allocation to the new one.
builder.create<linalg::CopyOp>(terminator->getLoc(), sourceValue, alloc);

// Remember the copy of original source value.
copiedValues.insert(alloc);
return alloc;
}		}

/// Finds correct dealloc positions according to the algorithm described at		/// Finds correct dealloc positions according to the algorithm described at
/// the top of the file for all alloc nodes and block arguments that can be		/// the top of the file for all alloc nodes and block arguments that can be
/// handled by this analysis.		/// handled by this analysis.
void placeDeallocs() const {		void placeDeallocs() const {
// Move or insert deallocs using the previously computed information.		// Move or insert deallocs using the previously computed information.
// These deallocations will be linked to their associated allocation nodes		// These deallocations will be linked to their associated allocation nodes
... 61 unchanged lines not shown ...	private:
/// The dominator info to find the appropriate start operation to move the		/// The dominator info to find the appropriate start operation to move the
/// allocs.		/// allocs.
DominanceInfo dominators;		DominanceInfo dominators;

/// The post dominator info to move the dependent allocs in the right		/// The post dominator info to move the dependent allocs in the right
/// position.		/// position.
PostDominanceInfo postDominators;		PostDominanceInfo postDominators;

/// Stores already copied allocations to avoid additional copies of copies.		/// Stores already cloned buffers to avoid additional clones of clones.
ValueSetT copiedValues;		ValueSetT clonedValues;
};		};
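To make the successor filter in `introduceValueCopyForRegionResult` concrete, here is a minimal standalone C++17 sketch. It does not use MLIR: `RegionSuccessorModel`, `returnsToParent`, and `countPassingSuccessors` are hypothetical names, and an empty `std::optional` stands in for a `RegionSuccessor` whose `getSuccessor()` is null, i.e. one that hands control (and its operands) back to the parent operation.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical toy model, not the MLIR API. A successor either names another
// region of the same op (an index) or, when empty, means "return to parent".
struct RegionSuccessorModel {
  std::optional<int> successorRegion; // std::nullopt => returns to the parent
};

// Mirrors `regionPredicate`: keep only successors without a target region,
// i.e. those whose terminator operands become the parent op's results.
bool returnsToParent(const RegionSuccessorModel &s) {
  return !s.successorRegion.has_value();
}

// Counts how many successor edges pass the filter.
int countPassingSuccessors(const std::vector<RegionSuccessorModel> &succs) {
  int n = 0;
  for (const auto &s : succs)
    if (returnsToParent(s))
      ++n;
  return n;
}
```

Only edges that pass this filter feed values into the parent's results, which is why the pass places clones in front of the terminators of those regions.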

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferDeallocationPass		// BufferDeallocationPass
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// The actual buffer deallocation pass that inserts and moves dealloc nodes		/// The actual buffer deallocation pass that inserts and moves dealloc nodes
/// into the right positions. Furthermore, it inserts additional allocs and		/// into the right positions. Furthermore, it inserts additional clones if
/// copies if necessary. It uses the algorithm described at the top of the file.		/// necessary. It uses the algorithm described at the top of the file.
struct BufferDeallocationPass : BufferDeallocationBase<BufferDeallocationPass> {		struct BufferDeallocationPass : BufferDeallocationBase<BufferDeallocationPass> {

void runOnFunction() override {		void runOnFunction() override {
// Ensure that there are supported loops only.		// Ensure that there are supported loops only.
Backedges backedges(getFunction());		Backedges backedges(getFunction());
if (backedges.size()) {		if (backedges.size()) {
getFunction().emitError(		getFunction().emitError(
"Structured control-flow loops are supported only.");		"Structured control-flow loops are supported only.");
return signalPassFailure();		return signalPassFailure();
}		}

// Check that the control flow structures are supported.		// Check that the control flow structures are supported.
if (!validateSupportedControlFlow(getFunction().getRegion())) {		if (!validateSupportedControlFlow(getFunction().getRegion())) {
return signalPassFailure();		return signalPassFailure();
}		}

// Place all required temporary alloc, copy and dealloc nodes.		// Place all required temporary clone and dealloc nodes.
BufferDeallocation deallocation(getFunction());		BufferDeallocation deallocation(getFunction());
deallocation.deallocate();		deallocation.deallocate();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// BufferDeallocationPass construction		// BufferDeallocationPass construction
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

std::unique_ptr<Pass> mlir::createBufferDeallocationPass() {		std::unique_ptr<Pass> mlir::createBufferDeallocationPass() {
return std::make_unique<BufferDeallocationPass>();		return std::make_unique<BufferDeallocationPass>();
}		}
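The deduplication in `introduceCloneBuffers` can be sketched the same way. This is a hypothetical C++17 model, not the pass itself: buffers are integer ids, and `CloneTracker`, `nextId`, and `clonesCreated` are invented for illustration. As in the diff, the set records the newly created clone rather than its source, so asking to clone a clone returns it unchanged.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical toy model, not the MLIR API: buffers are plain integer ids.
// `clonedValues` plays the role of the member of the same name in
// BufferDeallocation: it records buffers the pass itself created as clones.
struct CloneTracker {
  std::unordered_set<int> clonedValues; // ids of clones created so far
  int nextId = 1000;                    // hypothetical id source for new clones
  int clonesCreated = 0;                // number of clone "operations" emitted

  // Returns the buffer that should be wired into the terminator operand.
  int introduceClone(int source) {
    // The source is already a clone made by this pass: reuse it as-is.
    if (clonedValues.count(source))
      return source;
    int clone = nextId++;
    ++clonesCreated;
    clonedValues.insert(clone); // remember the clone, not the source
    return clone;
  }
};
```

Cloning the same original buffer from two different terminators still produces two clones; only re-cloning a value the pass itself created is skipped, which is the clone-chain case the code comment warns about.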

mlir/lib/Transforms/BufferUtils.cpp

//===- BufferUtils.cpp - buffer transformation utilities ------------------===//		//===- BufferUtils.cpp - buffer transformation utilities ------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements utilities for buffer optimization passes.		// This file implements utilities for buffer optimization passes.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Transforms/BufferUtils.h"		#include "mlir/Transforms/BufferUtils.h"
#include "PassDetail.h"		#include "PassDetail.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"		#include "mlir/Dialect/MemRef/Utils/MemRefUtils.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/Operation.h"		#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/ControlFlowInterfaces.h"		#include "mlir/Interfaces/ControlFlowInterfaces.h"
#include "mlir/Interfaces/LoopLikeInterface.h"		#include "mlir/Interfaces/LoopLikeInterface.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/Passes.h"		#include "mlir/Transforms/Passes.h"
#include "llvm/ADT/SetOperations.h"		#include "llvm/ADT/SetOperations.h"

... 20 unchanged lines not shown ...	Operation *opInPlacementBlock =
placementBlock->findAncestorOpInBlock(*startOperation);		placementBlock->findAncestorOpInBlock(*startOperation);
startOperation = opInPlacementBlock ? opInPlacementBlock		startOperation = opInPlacementBlock ? opInPlacementBlock
: placementBlock->getTerminator();		: placementBlock->getTerminator();
}		}

return startOperation;		return startOperation;
}		}

/// Finds associated deallocs that can be linked to our allocation nodes (if
/// any).
Operation *BufferPlacementAllocs::findDealloc(Value allocValue) {
auto userIt = llvm::find_if(allocValue.getUsers(), [&](Operation *user) {
auto effectInterface = dyn_cast<MemoryEffectOpInterface>(user);
if (!effectInterface)
return false;
// Try to find a free effect that is applied to one of our values
// that will be automatically freed by our pass.
SmallVector<MemoryEffects::EffectInstance, 2> effects;
effectInterface.getEffectsOnValue(allocValue, effects);
return llvm::any_of(effects, [&](MemoryEffects::EffectInstance &it) {
return isa<MemoryEffects::Free>(it.getEffect());
});
});
// Assign the associated dealloc operation (if any).
return userIt != allocValue.user_end() ? *userIt : nullptr;
}

/// Initializes the internal list by discovering all supported allocation		/// Initializes the internal list by discovering all supported allocation
/// nodes.		/// nodes.
BufferPlacementAllocs::BufferPlacementAllocs(Operation *op) { build(op); }		BufferPlacementAllocs::BufferPlacementAllocs(Operation *op) { build(op); }

/// Searches for and registers all supported allocation entries.		/// Searches for and registers all supported allocation entries.
void BufferPlacementAllocs::build(Operation *op) {		void BufferPlacementAllocs::build(Operation *op) {
op->walk([&](MemoryEffectOpInterface opInterface) {		op->walk([&](MemoryEffectOpInterface opInterface) {
// Try to find a single allocation result.		// Try to find a single allocation result.
... 78 unchanged lines not shown ...

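`BufferPlacementAllocs::findDealloc`, removed from this file by the diff, returns the first user of an allocation that reports a `MemoryEffects::Free` effect on it. A minimal hypothetical C++17 model of that search follows; `Effect`, `User`, and the `implementsEffectInterface` flag are stand-ins for MLIR's effect interface, not real API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model, not the MLIR API.
enum class Effect { Read, Write, Allocate, Free };

struct User {
  std::string name;
  std::vector<Effect> effectsOnValue;    // effects this user has on the value
  bool implementsEffectInterface = true; // models dyn_cast<MemoryEffectOpInterface>
};

// Returns the first user that frees the value, or nullptr if there is none.
const User *findDealloc(const std::vector<User> &users) {
  auto it = std::find_if(users.begin(), users.end(), [](const User &u) {
    if (!u.implementsEffectInterface)
      return false; // unknown effects: never treated as a dealloc
    return std::any_of(u.effectsOnValue.begin(), u.effectsOnValue.end(),
                       [](Effect e) { return e == Effect::Free; });
  });
  return it != users.end() ? &*it : nullptr;
}
```

As in the original, a user that does not implement the effect interface is never treated as a deallocation, and only the first match is returned.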
mlir/lib/Transforms/CMakeLists.txt

	add_subdirectory(Utils)			add_subdirectory(Utils)

	add_mlir_library(MLIRTransforms			add_mlir_library(MLIRTransforms
	BufferDeallocation.cpp			BufferDeallocation.cpp
	BufferOptimizations.cpp			BufferOptimizations.cpp
	BufferResultsToOutParams.cpp			BufferResultsToOutParams.cpp
	BufferUtils.cpp			BufferUtils.cpp
	Bufferize.cpp			Bufferize.cpp
	Canonicalizer.cpp			Canonicalizer.cpp
	CopyRemoval.cpp
	CSE.cpp			CSE.cpp
	Inliner.cpp			Inliner.cpp
	LocationSnapshot.cpp			LocationSnapshot.cpp
	LoopCoalescing.cpp			LoopCoalescing.cpp
	LoopFusion.cpp			LoopFusion.cpp
	LoopInvariantCodeMotion.cpp			LoopInvariantCodeMotion.cpp
	MemRefDataFlowOpt.cpp			MemRefDataFlowOpt.cpp
	NormalizeMemRefs.cpp			NormalizeMemRefs.cpp
	... 28 unchanged lines not shown ...

mlir/lib/Transforms/CopyRemoval.cpp

This file was deleted.

	//===- CopyRemoval.cpp - Removing the redundant copies --------------------===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//

	#include "mlir/Interfaces/CopyOpInterface.h"
	#include "mlir/Interfaces/SideEffectInterfaces.h"
	#include "mlir/Pass/Pass.h"
	#include "mlir/Transforms/Passes.h"

	using namespace mlir;
	using namespace MemoryEffects;

	namespace {

	//===----------------------------------------------------------------------===//
	// CopyRemovalPass
	//===----------------------------------------------------------------------===//

	/// This pass removes the redundant Copy operations. Additionally, it
	/// removes the leftover definition and deallocation operations by erasing the
	/// copy operation.
	class CopyRemovalPass : public PassWrapper<CopyRemovalPass, OperationPass<>> {
	public:
	void runOnOperation() override {
	getOperation()->walk([&](CopyOpInterface copyOp) {
	reuseCopySourceAsTarget(copyOp);
	reuseCopyTargetAsSource(copyOp);
	});
	for (std::pair<Value, Value> &pair : replaceList)
	pair.first.replaceAllUsesWith(pair.second);
	for (Operation *op : eraseList)
	op->erase();
	}

	private:
	/// List of operations that need to be removed.
	llvm::SmallPtrSet<Operation *, 4> eraseList;

	/// List of values that need to be replaced with their counterparts.
	llvm::SmallDenseSet<std::pair<Value, Value>, 4> replaceList;

	/// Returns the allocation operation for `value` in `block` if it exists.
	/// nullptr otherwise.
	Operation *getAllocationOpInBlock(Value value, Block *block) {
	assert(block && "Block cannot be null");
	Operation *op = value.getDefiningOp();
	if (op && op->getBlock() == block) {
	auto effects = dyn_cast<MemoryEffectOpInterface>(op);
	if (effects && effects.hasEffect<Allocate>())
	return op;
	}
	return nullptr;
	}

	/// Returns the deallocation operation for `value` in `block` if it exists.
	/// nullptr otherwise.
	Operation *getDeallocationOpInBlock(Value value, Block *block) {
	assert(block && "Block cannot be null");
	auto valueUsers = value.getUsers();
	auto it = llvm::find_if(valueUsers, [&](Operation *op) {
	auto effects = dyn_cast<MemoryEffectOpInterface>(op);
	return effects && op->getBlock() == block && effects.hasEffect<Free>();
	});
	return (it == valueUsers.end() ? nullptr : *it);
	}

	/// Returns true if an operation between start and end operations has memory
	/// effect.
	bool hasMemoryEffectOpBetween(Operation *start, Operation *end) {
	assert((start || end) && "Start and end operations cannot be null");
	assert(start->getBlock() == end->getBlock() &&
	"Start and end operations should be in the same block.");
	Operation *op = start->getNextNode();
	while (op->isBeforeInBlock(end)) {
	if (isa<MemoryEffectOpInterface>(op))
	return true;
	op = op->getNextNode();
	}
	return false;
	};

	/// Returns true if `val` value has at least a user between `start` and
	/// `end` operations.
	bool hasUsersBetween(Value val, Operation *start, Operation *end) {
	assert((start || end) && "Start and end operations cannot be null");
	Block *block = start->getBlock();
	assert(block == end->getBlock() &&
	"Start and end operations should be in the same block.");
	return llvm::any_of(val.getUsers(), [&](Operation *op) {
	return op->getBlock() == block && start->isBeforeInBlock(op) &&
	op->isBeforeInBlock(end);
	});
	};

	bool areOpsInTheSameBlock(ArrayRef<Operation *> operations) {
	assert(!operations.empty() &&
	"The operations list should contain at least a single operation");
	Block *block = operations.front()->getBlock();
	return llvm::none_of(
	operations, [&](Operation *op) { return block != op->getBlock(); });
	}

	/// Input:
	/// func(){
	/// %from = alloc()
	/// write_to(%from)
	/// %to = alloc()
	/// copy(%from,%to)
	/// dealloc(%from)
	/// return %to
	/// }
	///
	/// Output:
	/// func(){
	/// %from = alloc()
	/// write_to(%from)
	/// return %from
	/// }
	/// Constraints:
	/// 1) %to, copy and dealloc must all be defined and lie in the same block.
	/// 2) This transformation cannot be applied if there is a single user/alias
	/// of `to` value between the defining operation of `to` and the copy
	/// operation.
	/// 3) This transformation cannot be applied if there is a single user/alias
	/// of `from` value between the copy operation and the deallocation of `from`.
	/// TODO: Alias analysis is not available at the moment. Currently, we check
	/// if there are any operations with memory effects between copy and
	/// deallocation operations.
	void reuseCopySourceAsTarget(CopyOpInterface copyOp) {
	if (eraseList.count(copyOp))
	return;

	Value from = copyOp.getSource();
	Value to = copyOp.getTarget();

	Operation *copy = copyOp.getOperation();
	Block *copyBlock = copy->getBlock();
	Operation *fromDefiningOp = from.getDefiningOp();
	Operation *fromFreeingOp = getDeallocationOpInBlock(from, copyBlock);
	Operation *toDefiningOp = getAllocationOpInBlock(to, copyBlock);
	if (!fromDefiningOp || !fromFreeingOp || !toDefiningOp ||
	!areOpsInTheSameBlock({fromFreeingOp, toDefiningOp, copy}) ||
	hasUsersBetween(to, toDefiningOp, copy) ||
	hasUsersBetween(from, copy, fromFreeingOp) ||
	hasMemoryEffectOpBetween(copy, fromFreeingOp))
	return;

	replaceList.insert({to, from});
	eraseList.insert(copy);
	eraseList.insert(toDefiningOp);
	eraseList.insert(fromFreeingOp);
	}

	/// Input:
	/// func(){
	/// %to = alloc()
	/// %from = alloc()
	/// write_to(%from)
	/// copy(%from,%to)
	/// dealloc(%from)
	/// return %to
	/// }
	///
	/// Output:
	/// func(){
	/// %to = alloc()
	/// write_to(%to)
	/// return %to
	/// }
	/// Constraints:
	/// 1) %from, copy and dealloc must all be defined and lie in the same block.
	/// 2) This transformation cannot be applied if there is a single user/alias
	/// of `to` value between the defining operation of `from` and the copy
	/// operation.
	/// 3) This transformation cannot be applied if there is a single user/alias
	/// of `from` value between the copy operation and the deallocation of `from`.
	/// TODO: Alias analysis is not available at the moment. Currently, we check
	/// if there are any operations with memory effects between copy and
	/// deallocation operations.
	void reuseCopyTargetAsSource(CopyOpInterface copyOp) {
	if (eraseList.count(copyOp))
	return;

	Value from = copyOp.getSource();
	Value to = copyOp.getTarget();

	Operation *copy = copyOp.getOperation();
	Block *copyBlock = copy->getBlock();
	Operation *fromDefiningOp = getAllocationOpInBlock(from, copyBlock);
	Operation *fromFreeingOp = getDeallocationOpInBlock(from, copyBlock);
	if (!fromDefiningOp || !fromFreeingOp ||
	!areOpsInTheSameBlock({fromFreeingOp, fromDefiningOp, copy}) ||
	hasUsersBetween(to, fromDefiningOp, copy) ||
	hasUsersBetween(from, copy, fromFreeingOp) ||
	hasMemoryEffectOpBetween(copy, fromFreeingOp))
	return;

	replaceList.insert({from, to});
	eraseList.insert(copy);
	eraseList.insert(fromDefiningOp);
	eraseList.insert(fromFreeingOp);
	}
	};

	} // end anonymous namespace

	//===----------------------------------------------------------------------===//
	// CopyRemovalPass construction
	//===----------------------------------------------------------------------===//

	std::unique_ptr<Pass> mlir::createCopyRemovalPass() {
	return std::make_unique<CopyRemovalPass>();
	}
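The ordering checks that guarded `reuseCopySourceAsTarget` in the deleted pass can be modeled on a single block represented as a vector. This sketch is hypothetical: `Op`, `hasMemoryEffect`, and `canReuseSourceAsTarget` are invented names, and it covers only the three ordering conditions, not the checks that the defining and freeing ops exist and share a block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not the MLIR API. A block is a vector of ops and an
// op's index stands in for its position (Operation::isBeforeInBlock).
struct Op {
  std::string name;
  std::vector<std::string> operands; // names of the values the op uses
  // Approximates "implements MemoryEffectOpInterface", which is what the
  // deleted pass actually tested.
  bool hasMemoryEffect = false;
};

// True if some op strictly between `start` and `end` uses `val`.
bool hasUsersBetween(const std::vector<Op> &block, const std::string &val,
                     std::size_t start, std::size_t end) {
  for (std::size_t i = start + 1; i < end; ++i) {
    const auto &ops = block[i].operands;
    if (std::find(ops.begin(), ops.end(), val) != ops.end())
      return true;
  }
  return false;
}

// True if some op strictly between `start` and `end` has memory effects.
bool hasMemoryEffectOpBetween(const std::vector<Op> &block, std::size_t start,
                              std::size_t end) {
  for (std::size_t i = start + 1; i < end; ++i)
    if (block[i].hasMemoryEffect)
      return true;
  return false;
}

// Mirrors the ordering guard of reuseCopySourceAsTarget for
//   %to = alloc(); copy(%from, %to); dealloc(%from)
// given the indices of those three ops in the same block.
bool canReuseSourceAsTarget(const std::vector<Op> &block,
                            const std::string &from, const std::string &to,
                            std::size_t toAlloc, std::size_t copy,
                            std::size_t fromDealloc) {
  return !hasUsersBetween(block, to, toAlloc, copy) &&
         !hasUsersBetween(block, from, copy, fromDealloc) &&
         !hasMemoryEffectOpBetween(block, copy, fromDealloc);
}
```

For the `Input:` example in the doc comment (`%to` allocated at index 2, copy at 3, dealloc at 4), all three checks pass; placing any use of `%from` between the copy and the dealloc makes the rewrite illegal.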

mlir/test/Transforms/buffer-deallocation.mlir

... 24 unchanged lines not shown ...	^bb2:
test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>)		test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>)
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.clone
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: br ^bb3(%[[ALLOC0]]		// CHECK-NEXT: br ^bb3(%[[ALLOC0]]
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK: %[[ALLOC1:.*]] = memref.alloc
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: %[[ALLOC2:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.clone %[[ALLOC1]]
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK-NEXT: br ^bb3(%[[ALLOC2]]		// CHECK-NEXT: br ^bb3(%[[ALLOC2]]
// CHECK: test.copy		// CHECK: test.copy
// CHECK-NEXT: memref.dealloc		// CHECK-NEXT: memref.dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

... 24 unchanged lines not shown ...	^bb2(%0: index):
test.buffer_based in(%arg1: memref<?xf32>) out(%1: memref<?xf32>)		test.buffer_based in(%arg1: memref<?xf32>) out(%1: memref<?xf32>)
br ^bb3(%1 : memref<?xf32>)		br ^bb3(%1 : memref<?xf32>)
^bb3(%2: memref<?xf32>):		^bb3(%2: memref<?xf32>):
test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>)		test.copy(%2, %arg2) : (memref<?xf32>, memref<?xf32>)
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
// CHECK: %[[DIM0:.*]] = memref.dim		// CHECK: %[[ALLOC0:.*]] = memref.clone
// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc(%[[DIM0]])
// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
// CHECK-NEXT: br ^bb3(%[[ALLOC0]]		// CHECK-NEXT: br ^bb3(%[[ALLOC0]]
// CHECK: ^bb2(%[[IDX:.]]:{{.}})		// CHECK: ^bb2(%[[IDX:.]]:{{.}})
// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]])		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]])
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: %[[DIM1:.*]] = memref.dim %[[ALLOC1]]		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.clone
// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc(%[[DIM1]])
// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC2]])
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK-NEXT: br ^bb3		// CHECK-NEXT: br ^bb3
// CHECK-NEXT: ^bb3(%[[ALLOC3:.]]:{{.}})		// CHECK-NEXT: ^bb3(%[[ALLOC3:.]]:{{.}})
// CHECK: test.copy(%[[ALLOC3]],		// CHECK: test.copy(%[[ALLOC3]],
// CHECK-NEXT: memref.dealloc %[[ALLOC3]]		// CHECK-NEXT: memref.dealloc %[[ALLOC3]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----
... 39 unchanged lines not shown ...	^bb5(%2: memref<?xf32>):
br ^bb6(%2 : memref<?xf32>)		br ^bb6(%2 : memref<?xf32>)
^bb6(%3: memref<?xf32>):		^bb6(%3: memref<?xf32>):
br ^bb7(%3 : memref<?xf32>)		br ^bb7(%3 : memref<?xf32>)
^bb7(%4: memref<?xf32>):		^bb7(%4: memref<?xf32>):
test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>)		test.copy(%4, %arg2) : (memref<?xf32>, memref<?xf32>)
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br{{.*}}
// CHECK: ^bb1		// CHECK-NEXT: ^bb1
// CHECK: %[[DIM0:.*]] = memref.dim		// CHECK-NEXT: %[[ALLOC0:.*]] = memref.clone
// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc(%[[DIM0]])		// CHECK-NEXT: br ^bb6(%[[ALLOC0]]
// CHECK-NEXT: linalg.copy(%{{.*}}, %[[ALLOC0]])
// CHECK-NEXT: br ^bb6
// CHECK: ^bb2(%[[IDX:.]]:{{.}})		// CHECK: ^bb2(%[[IDX:.]]:{{.}})
// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]])		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc(%[[IDX]])
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: cond_br		// CHECK: cond_br
// CHECK: ^bb3:		// CHECK: ^bb3:
// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})		// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
// CHECK: ^bb4:		// CHECK: ^bb4:
// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})		// CHECK-NEXT: br ^bb5(%[[ALLOC1]]{{.*}})
// CHECK-NEXT: ^bb5(%[[ALLOC2:.]]:{{.}})		// CHECK-NEXT: ^bb5(%[[ALLOC2:.]]:{{.}})
// CHECK: %[[DIM2:.*]] = memref.dim %[[ALLOC2]]		// CHECK-NEXT: %[[ALLOC3:.*]] = memref.clone %[[ALLOC2]]
// CHECK-NEXT: %[[ALLOC3:.*]] = memref.alloc(%[[DIM2]])
// CHECK-NEXT: linalg.copy(%[[ALLOC2]], %[[ALLOC3]])
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK-NEXT: br ^bb6(%[[ALLOC3]]{{.*}})		// CHECK-NEXT: br ^bb6(%[[ALLOC3]]{{.*}})
// CHECK-NEXT: ^bb6(%[[ALLOC4:.]]:{{.}})		// CHECK-NEXT: ^bb6(%[[ALLOC4:.]]:{{.}})
// CHECK-NEXT: br ^bb7(%[[ALLOC4]]{{.*}})		// CHECK-NEXT: br ^bb7(%[[ALLOC4]]{{.*}})
// CHECK-NEXT: ^bb7(%[[ALLOC5:.]]:{{.}})		// CHECK-NEXT: ^bb7(%[[ALLOC5:.]]:{{.}})
// CHECK: test.copy(%[[ALLOC5]],		// CHECK: test.copy(%[[ALLOC5]],
// CHECK-NEXT: memref.dealloc %[[ALLOC4]]		// CHECK-NEXT: memref.dealloc %[[ALLOC4]]
// CHECK-NEXT: return		// CHECK-NEXT: return
... 32 unchanged lines not shown ...	^bb1:
%0 = memref.alloc() : memref<2xf32>		%0 = memref.alloc() : memref<2xf32>
test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>)		test.buffer_based in(%arg1: memref<2xf32>) out(%0: memref<2xf32>)
br ^bb2(%0 : memref<2xf32>)		br ^bb2(%0 : memref<2xf32>)
^bb2(%1: memref<2xf32>):		^bb2(%1: memref<2xf32>):
test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)
return		return
}		}

// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC0:.*]] = memref.clone
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK: %[[ALLOC1:.*]] = memref.alloc()
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: %[[ALLOC2:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.clone %[[ALLOC1]]
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK: test.copy		// CHECK: test.copy
// CHECK-NEXT: memref.dealloc		// CHECK-NEXT: memref.dealloc
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----

// Test Case:		// Test Case:
... 188 unchanged lines not shown ...	^bb2:
%1 = memref.alloc() : memref<2xf32>		%1 = memref.alloc() : memref<2xf32>
test.buffer_based in(%arg0: memref<2xf32>) out(%1: memref<2xf32>)		test.buffer_based in(%arg0: memref<2xf32>) out(%1: memref<2xf32>)
br ^exit(%1 : memref<2xf32>)		br ^exit(%1 : memref<2xf32>)
^exit(%arg2: memref<2xf32>):		^exit(%arg2: memref<2xf32>):
test.copy(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>)		test.copy(%arg2, %arg1) : (memref<2xf32>, memref<2xf32>)
return		return
}		}

// CHECK-NEXT: cond_br		// CHECK-NEXT: cond_br{{.*}}
// CHECK: ^bb1		// CHECK-NEXT: ^bb1
// CHECK: ^bb1
// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.alloc()
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.clone %[[ALLOC0]]
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC0]]		// CHECK-NEXT: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: br ^bb3(%[[ALLOC1]]		// CHECK-NEXT: br ^bb3(%[[ALLOC1]]
// CHECK-NEXT: ^bb2		// CHECK-NEXT: ^bb2
// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc()
// CHECK-NEXT: test.buffer_based		// CHECK-NEXT: test.buffer_based
// CHECK: %[[ALLOC3:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC3:.*]] = memref.clone %[[ALLOC2]]
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC2]]		// CHECK-NEXT: memref.dealloc %[[ALLOC2]]
// CHECK-NEXT: br ^bb3(%[[ALLOC3]]		// CHECK-NEXT: br ^bb3(%[[ALLOC3]]
// CHECK-NEXT: ^bb3(%[[ALLOC4:.]]:{{.}})		// CHECK-NEXT: ^bb3(%[[ALLOC4:.]]:{{.}})
// CHECK: test.copy		// CHECK: test.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC4]]		// CHECK-NEXT: memref.dealloc %[[ALLOC4]]
// CHECK-NEXT: return		// CHECK-NEXT: return

// -----		// -----
... 96 unchanged lines not shown ...	^bb2:
}		}
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)
return		return
}		}
// CHECK: (%[[cond:.]]: {{.}}, %[[ARG1:.]]: {{.}}, %{{.}}: {{.}})		// CHECK: (%[[cond:.]]: {{.}}, %[[ARG1:.]]: {{.}}, %{{.}}: {{.}})
// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.]], ^[[BB2:.]]		// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.]], ^[[BB2:.]]
// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.clone %[[ARG1]]
// CHECK-NEXT: linalg.copy(%[[ARG1]], %[[ALLOC0]])
// CHECK: ^[[BB2]]:		// CHECK: ^[[BB2]]:
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK: %[[ALLOC1:.*]] = memref.alloc()
// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]]		// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]]
// CHECK: %[[ALLOC2:.*]] = memref.alloc()		// CHECK: %[[ALLOC2:.*]] = memref.alloc()
// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC2]]		// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC2]]
// CHECK: memref.dealloc %[[ALLOC2]]		// CHECK: memref.dealloc %[[ALLOC2]]
// CHECK-NEXT: %{{.*}} = math.exp		// CHECK-NEXT: %{{.*}} = math.exp
// CHECK: %[[ALLOC3:.*]] = memref.alloc()		// CHECK: %[[ALLOC3:.*]] = memref.clone %[[ALLOC1]]
// CHECK-NEXT: linalg.copy(%[[ALLOC1]], %[[ALLOC3]])
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK: ^[[BB3:.*]]({{.*}}):		// CHECK: ^[[BB3:.*]]({{.*}}):
// CHECK: test.copy		// CHECK: test.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: memref.dealloc

// -----		// -----

// Test Case: buffer deallocation escaping		// Test Case: buffer deallocation escaping
// BufferDeallocation expected behavior: It must not dealloc %arg1 and %x		// BufferDeallocation expected behavior: It must not dealloc %arg1 and %x
// since they are operands of the return operation and should escape		// since they are operands of the return operation and should escape
// deallocation. It should dealloc %y after CopyOp.		// deallocation. It should dealloc %y after CopyOp.

… 65 lines not shown …	%2 = scf.if %0 -> (memref<?x?xf32>) {
%3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>		%3 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
scf.yield %3 : memref<?x?xf32>		scf.yield %3 : memref<?x?xf32>
}		}
return %2 : memref<?x?xf32>		return %2 : memref<?x?xf32>
}		}

// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0)		// CHECK: %[[ALLOC0:.*]] = memref.alloc(%arg0, %arg0)
// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if		// CHECK-NEXT: %[[ALLOC1:.*]] = scf.if
// CHECK: %[[ALLOC2:.*]] = memref.alloc		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.clone %[[ALLOC0]]
// CHECK-NEXT: linalg.copy(%[[ALLOC0]], %[[ALLOC2]])
// CHECK: scf.yield %[[ALLOC2]]		// CHECK: scf.yield %[[ALLOC2]]
// CHECK: %[[ALLOC3:.*]] = memref.alloc(%arg0, %arg1)		// CHECK: %[[ALLOC3:.*]] = memref.alloc(%arg0, %arg1)
// CHECK: %[[ALLOC4:.*]] = memref.alloc		// CHECK-NEXT: %[[ALLOC4:.*]] = memref.clone %[[ALLOC3]]
// CHECK-NEXT: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
// CHECK: memref.dealloc %[[ALLOC3]]		// CHECK: memref.dealloc %[[ALLOC3]]
// CHECK: scf.yield %[[ALLOC4]]		// CHECK: scf.yield %[[ALLOC4]]
// CHECK: memref.dealloc %[[ALLOC0]]		// CHECK: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: return %[[ALLOC1]]		// CHECK-NEXT: return %[[ALLOC1]]

// -----		// -----

// Test Case: nested region control flow within a region interface.		// Test Case: nested region control flow within a region interface.
… 160 lines not shown …	^bb2:
br ^bb3(%0 : memref<2xf32>)		br ^bb3(%0 : memref<2xf32>)
^bb3(%1: memref<2xf32>):		^bb3(%1: memref<2xf32>):
test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %arg2) : (memref<2xf32>, memref<2xf32>)
return		return
}		}
// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})		// CHECK: (%[[cond:.*]]: {{.*}}, %[[ARG1:.*]]: {{.*}}, %{{.*}}: {{.*}})
// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]		// CHECK-NEXT: cond_br %[[cond]], ^[[BB1:.*]], ^[[BB2:.*]]
// CHECK: ^[[BB1]]:		// CHECK: ^[[BB1]]:
// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.clone
// CHECK-NEXT: linalg.copy
// CHECK: ^[[BB2]]:		// CHECK: ^[[BB2]]:
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK: %[[ALLOC1:.*]] = memref.alloc()
// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]]		// CHECK-NEXT: test.region_buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOC1]]
// CHECK: %[[ALLOCA:.*]] = memref.alloca()		// CHECK: %[[ALLOCA:.*]] = memref.alloca()
// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOCA]]		// CHECK-NEXT: test.buffer_based in(%[[ARG1]]{{.*}}out(%[[ALLOCA]]
// CHECK: %{{.*}} = math.exp		// CHECK: %{{.*}} = math.exp
// CHECK: %[[ALLOC2:.*]] = memref.alloc()		// CHECK: %[[ALLOC2:.*]] = memref.clone %[[ALLOC1]]
// CHECK-NEXT: linalg.copy
// CHECK-NEXT: memref.dealloc %[[ALLOC1]]		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]
// CHECK: ^[[BB3:.*]]({{.*}}):		// CHECK: ^[[BB3:.*]]({{.*}}):
// CHECK: test.copy		// CHECK: test.copy
// CHECK-NEXT: dealloc		// CHECK-NEXT: memref.dealloc

// -----		// -----

// CHECK-LABEL: func @nestedRegionControlFlowAlloca		// CHECK-LABEL: func @nestedRegionControlFlowAlloca
func @nestedRegionControlFlowAlloca(		func @nestedRegionControlFlowAlloca(
%arg0 : index,		%arg0 : index,
%arg1 : index) -> memref<?x?xf32> {		%arg1 : index) -> memref<?x?xf32> {
%0 = cmpi eq, %arg0, %arg1 : index		%0 = cmpi eq, %arg0, %arg1 : index
… 35 lines not shown …	%1 = scf.for %i = %lb to %ub step %step
scf.yield %3 : memref<2xf32>		scf.yield %3 : memref<2xf32>
}		}
test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>)
return		return
}		}

// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.alloc()
// CHECK-NEXT: memref.dealloc %[[ALLOC0]]		// CHECK-NEXT: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.clone %arg3
// CHECK: linalg.copy(%arg3, %[[ALLOC1]])
// CHECK: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args		// CHECK: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args
// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]		// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]
// CHECK: cmpi		// CHECK: cmpi
// CHECK: memref.dealloc %[[IALLOC]]		// CHECK: memref.dealloc %[[IALLOC]]
// CHECK: %[[ALLOC3:.*]] = memref.alloc()		// CHECK: %[[ALLOC3:.*]] = memref.alloc()
// CHECK: %[[ALLOC4:.*]] = memref.alloc()		// CHECK: %[[ALLOC4:.*]] = memref.clone %[[ALLOC3]]
// CHECK: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
// CHECK: memref.dealloc %[[ALLOC3]]		// CHECK: memref.dealloc %[[ALLOC3]]
// CHECK: scf.yield %[[ALLOC4]]		// CHECK: scf.yield %[[ALLOC4]]
// CHECK: }		// CHECK: }
// CHECK: test.copy(%[[ALLOC2]], %arg4)		// CHECK: test.copy(%[[ALLOC2]], %arg4)
// CHECK-NEXT: memref.dealloc %[[ALLOC2]]		// CHECK-NEXT: memref.dealloc %[[ALLOC2]]

// -----		// -----

… 61 lines not shown …	%3 = scf.if %2 -> (memref<2xf32>) {
scf.yield %0 : memref<2xf32>		scf.yield %0 : memref<2xf32>
}		}
scf.yield %3 : memref<2xf32>		scf.yield %3 : memref<2xf32>
}		}
return %1 : memref<2xf32>		return %1 : memref<2xf32>
}		}

// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.alloc()
// CHECK: %[[ALLOC1:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.clone %arg3
// CHECK-NEXT: linalg.copy(%arg3, %[[ALLOC1]])
// CHECK-NEXT: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args		// CHECK-NEXT: %[[ALLOC2:.*]] = scf.for {{.*}} iter_args
// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]		// CHECK-SAME: (%[[IALLOC:.*]] = %[[ALLOC1]]
// CHECK: memref.dealloc %[[IALLOC]]		// CHECK: memref.dealloc %[[IALLOC]]
// CHECK: %[[ALLOC3:.*]] = scf.if		// CHECK: %[[ALLOC3:.*]] = scf.if

// CHECK: %[[ALLOC4:.*]] = memref.alloc()		// CHECK: %[[ALLOC4:.*]] = memref.alloc()
// CHECK-NEXT: %[[ALLOC5:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC5:.*]] = memref.clone %[[ALLOC4]]
// CHECK-NEXT: linalg.copy(%[[ALLOC4]], %[[ALLOC5]])
// CHECK-NEXT: memref.dealloc %[[ALLOC4]]		// CHECK-NEXT: memref.dealloc %[[ALLOC4]]
// CHECK-NEXT: scf.yield %[[ALLOC5]]		// CHECK-NEXT: scf.yield %[[ALLOC5]]

// CHECK: %[[ALLOC6:.*]] = memref.alloc()		// CHECK: %[[ALLOC6:.*]] = memref.clone %[[ALLOC0]]
// CHECK-NEXT: linalg.copy(%[[ALLOC0]], %[[ALLOC6]])
// CHECK-NEXT: scf.yield %[[ALLOC6]]		// CHECK-NEXT: scf.yield %[[ALLOC6]]

// CHECK: %[[ALLOC7:.*]] = memref.alloc()		// CHECK: %[[ALLOC7:.*]] = memref.clone %[[ALLOC3]]
// CHECK-NEXT: linalg.copy(%[[ALLOC3:.*]], %[[ALLOC7]])
// CHECK-NEXT: memref.dealloc %[[ALLOC3]]		// CHECK-NEXT: memref.dealloc %[[ALLOC3]]
// CHECK-NEXT: scf.yield %[[ALLOC7]]		// CHECK-NEXT: scf.yield %[[ALLOC7]]

// CHECK: memref.dealloc %[[ALLOC0]]		// CHECK: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: return %[[ALLOC2]]		// CHECK-NEXT: return %[[ALLOC2]]

// -----		// -----

… 31 lines not shown …	%1 = scf.for %i = %lb to %ub step %step
scf.yield %2 : memref<2xf32>		scf.yield %2 : memref<2xf32>
}		}
test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>)		test.copy(%1, %res) : (memref<2xf32>, memref<2xf32>)
return		return
}		}

// CHECK: %[[ALLOC0:.*]] = memref.alloc()		// CHECK: %[[ALLOC0:.*]] = memref.alloc()
// CHECK-NEXT: memref.dealloc %[[ALLOC0]]		// CHECK-NEXT: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: %[[ALLOC1:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC1:.*]] = memref.clone %arg3
// CHECK-NEXT: linalg.copy(%arg3, %[[ALLOC1]])
// CHECK-NEXT: %[[VAL_7:.*]] = scf.for {{.*}} iter_args		// CHECK-NEXT: %[[VAL_7:.*]] = scf.for {{.*}} iter_args
// CHECK-SAME: (%[[IALLOC0:.*]] = %[[ALLOC1]])		// CHECK-SAME: (%[[IALLOC0:.*]] = %[[ALLOC1]])
// CHECK: %[[ALLOC2:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.clone %[[IALLOC0]]
// CHECK-NEXT: linalg.copy(%[[IALLOC0]], %[[ALLOC2]])
// CHECK-NEXT: memref.dealloc %[[IALLOC0]]		// CHECK-NEXT: memref.dealloc %[[IALLOC0]]
// CHECK-NEXT: %[[ALLOC3:.*]] = scf.for {{.*}} iter_args		// CHECK-NEXT: %[[ALLOC3:.*]] = scf.for {{.*}} iter_args
// CHECK-SAME: (%[[IALLOC1:.*]] = %[[ALLOC2]])		// CHECK-SAME: (%[[IALLOC1:.*]] = %[[ALLOC2]])
// CHECK: %[[ALLOC5:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC5:.*]] = memref.clone %[[IALLOC1]]
// CHECK-NEXT: linalg.copy(%[[IALLOC1]], %[[ALLOC5]])
// CHECK-NEXT: memref.dealloc %[[IALLOC1]]		// CHECK-NEXT: memref.dealloc %[[IALLOC1]]

// CHECK: %[[ALLOC6:.*]] = scf.for {{.*}} iter_args		// CHECK: %[[ALLOC6:.*]] = scf.for {{.*}} iter_args
// CHECK-SAME: (%[[IALLOC2:.*]] = %[[ALLOC5]])		// CHECK-SAME: (%[[IALLOC2:.*]] = %[[ALLOC5]])
// CHECK: %[[ALLOC8:.*]] = memref.alloc()		// CHECK: %[[ALLOC8:.*]] = memref.alloc()
// CHECK-NEXT: memref.dealloc %[[ALLOC8]]		// CHECK-NEXT: memref.dealloc %[[ALLOC8]]
// CHECK: %[[ALLOC9:.*]] = scf.if		// CHECK: %[[ALLOC9:.*]] = scf.if

// CHECK: %[[ALLOC11:.*]] = memref.alloc()		// CHECK: %[[ALLOC11:.*]] = memref.alloc()
// CHECK-NEXT: %[[ALLOC12:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC12:.*]] = memref.clone %[[ALLOC11]]
// CHECK-NEXT: linalg.copy(%[[ALLOC11]], %[[ALLOC12]])
// CHECK-NEXT: memref.dealloc %[[ALLOC11]]		// CHECK-NEXT: memref.dealloc %[[ALLOC11]]
// CHECK-NEXT: scf.yield %[[ALLOC12]]		// CHECK-NEXT: scf.yield %[[ALLOC12]]

// CHECK: %[[ALLOC13:.*]] = memref.alloc()		// CHECK: %[[ALLOC13:.*]] = memref.clone %[[IALLOC2]]
// CHECK-NEXT: linalg.copy(%[[IALLOC2]], %[[ALLOC13]])
// CHECK-NEXT: scf.yield %[[ALLOC13]]		// CHECK-NEXT: scf.yield %[[ALLOC13]]

// CHECK: memref.dealloc %[[IALLOC2]]		// CHECK: memref.dealloc %[[IALLOC2]]
// CHECK-NEXT: %[[ALLOC10:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC10:.*]] = memref.clone %[[ALLOC9]]
// CHECK-NEXT: linalg.copy(%[[ALLOC9]], %[[ALLOC10]])
// CHECK-NEXT: memref.dealloc %[[ALLOC9]]		// CHECK-NEXT: memref.dealloc %[[ALLOC9]]
// CHECK-NEXT: scf.yield %[[ALLOC10]]		// CHECK-NEXT: scf.yield %[[ALLOC10]]

// CHECK: %[[ALLOC7:.*]] = memref.alloc()		// CHECK: %[[ALLOC7:.*]] = memref.clone %[[ALLOC6]]
// CHECK-NEXT: linalg.copy(%[[ALLOC6]], %[[ALLOC7]])
// CHECK-NEXT: memref.dealloc %[[ALLOC6]]		// CHECK-NEXT: memref.dealloc %[[ALLOC6]]
// CHECK-NEXT: scf.yield %[[ALLOC7]]		// CHECK-NEXT: scf.yield %[[ALLOC7]]

// CHECK: %[[ALLOC4:.*]] = memref.alloc()		// CHECK: %[[ALLOC4:.*]] = memref.clone %[[ALLOC3]]
// CHECK-NEXT: linalg.copy(%[[ALLOC3]], %[[ALLOC4]])
// CHECK-NEXT: memref.dealloc %[[ALLOC3]]		// CHECK-NEXT: memref.dealloc %[[ALLOC3]]
// CHECK-NEXT: scf.yield %[[ALLOC4]]		// CHECK-NEXT: scf.yield %[[ALLOC4]]

// CHECK: test.copy(%[[VAL_7]], %arg4)		// CHECK: test.copy(%[[VAL_7]], %arg4)
// CHECK-NEXT: memref.dealloc %[[VAL_7]]		// CHECK-NEXT: memref.dealloc %[[VAL_7]]

// -----		// -----

… 85 lines not shown …
// CHECK-SAME: %[[ARG1:.*]]: {{.*}},		// CHECK-SAME: %[[ARG1:.*]]: {{.*}},
// CHECK-SAME: %[[ARG2:.*]]: {{.*}}		// CHECK-SAME: %[[ARG2:.*]]: {{.*}}
// CHECK: %[[UNUSED_RESULT:.*]] = shape.assuming %[[ARG0]]		// CHECK: %[[UNUSED_RESULT:.*]] = shape.assuming %[[ARG0]]
// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc()		// CHECK-NEXT: %[[ALLOC0:.*]] = memref.alloc()
// CHECK-NEXT: memref.dealloc %[[ALLOC0]]		// CHECK-NEXT: memref.dealloc %[[ALLOC0]]
// CHECK-NEXT: shape.assuming_yield %[[ARG1]]		// CHECK-NEXT: shape.assuming_yield %[[ARG1]]
// CHECK: %[[ASSUMING_RESULT:.*]] = shape.assuming %[[ARG0]]		// CHECK: %[[ASSUMING_RESULT:.*]] = shape.assuming %[[ARG0]]
// CHECK-NEXT: %[[TMP_ALLOC:.*]] = memref.alloc()		// CHECK-NEXT: %[[TMP_ALLOC:.*]] = memref.alloc()
// CHECK-NEXT: %[[RETURNING_ALLOC:.*]] = memref.alloc()		// CHECK-NEXT: %[[RETURNING_ALLOC:.*]] = memref.clone %[[TMP_ALLOC]]
// CHECK-NEXT: linalg.copy(%[[TMP_ALLOC]], %[[RETURNING_ALLOC]])
// CHECK-NEXT: memref.dealloc %[[TMP_ALLOC]]		// CHECK-NEXT: memref.dealloc %[[TMP_ALLOC]]
// CHECK-NEXT: shape.assuming_yield %[[RETURNING_ALLOC]]		// CHECK-NEXT: shape.assuming_yield %[[RETURNING_ALLOC]]
// CHECK: test.copy(%[[ASSUMING_RESULT:.*]], %[[ARG2]])		// CHECK: test.copy(%[[ASSUMING_RESULT:.*]], %[[ARG2]])
// CHECK-NEXT: memref.dealloc %[[ASSUMING_RESULT]]		// CHECK-NEXT: memref.dealloc %[[ASSUMING_RESULT]]

// -----		// -----

// Test Case: The op "test.bar" does not implement the RegionBranchOpInterface.		// Test Case: The op "test.bar" does not implement the RegionBranchOpInterface.
… 13 lines not shown …

mlir/test/Transforms/canonicalize.mlir

… 1,053 lines not shown …	func @subtensor(%t: tensor<8x16x4xf32>, %arg0 : index, %arg1 : index)
// CHECK-SAME: tensor<7x11x2xf32> to tensor<2x?x2xf32>		// CHECK-SAME: tensor<7x11x2xf32> to tensor<2x?x2xf32>
// CHECK: tensor.cast %{{.*}} : tensor<2x?x2xf32> to tensor<?x?x?xf32>		// CHECK: tensor.cast %{{.*}} : tensor<2x?x2xf32> to tensor<?x?x?xf32>
%2 = subtensor %1[%c0, %c0, %c0] [%c2, %arg0, %c2] [%c1, %c1, %c1]		%2 = subtensor %1[%c0, %c0, %c0] [%c2, %arg0, %c2] [%c1, %c1, %c1]
: tensor<?x?x?xf32> to tensor<?x?x?xf32>		: tensor<?x?x?xf32> to tensor<?x?x?xf32>

return %2 : tensor<?x?x?xf32>		return %2 : tensor<?x?x?xf32>
}		}

		// -----

		// CHECK-LABEL: func @simple_clone_elimination
		func @simple_clone_elimination() -> memref<5xf32> {
		%ret = memref.alloc() : memref<5xf32>
		%temp = memref.clone %ret : memref<5xf32> to memref<5xf32>
		memref.dealloc %temp : memref<5xf32>
		return %ret : memref<5xf32>
		}
		// CHECK-NEXT: %[[ret:.*]] = memref.alloc()
		// CHECK-NOT: %[[temp:.*]] = memref.clone
		// CHECK-NOT: memref.dealloc %[[temp]]
		// CHECK: return %[[ret]]

		// -----

		// CHECK-LABEL: func @clone_loop_alloc
		func @clone_loop_alloc(%arg0: index, %arg1: index, %arg2: index, %arg3: memref<2xf32>, %arg4: memref<2xf32>) {
		%0 = memref.alloc() : memref<2xf32>
		memref.dealloc %0 : memref<2xf32>
		%1 = memref.clone %arg3 : memref<2xf32> to memref<2xf32>
		%2 = scf.for %arg5 = %arg0 to %arg1 step %arg2 iter_args(%arg6 = %1) -> (memref<2xf32>) {
		%3 = cmpi eq, %arg5, %arg1 : index
		memref.dealloc %arg6 : memref<2xf32>
		%4 = memref.alloc() : memref<2xf32>
		%5 = memref.clone %4 : memref<2xf32> to memref<2xf32>
		memref.dealloc %4 : memref<2xf32>
		%6 = memref.clone %5 : memref<2xf32> to memref<2xf32>
		memref.dealloc %5 : memref<2xf32>
		scf.yield %6 : memref<2xf32>
		}
		linalg.copy(%2, %arg4) : memref<2xf32>, memref<2xf32>
		memref.dealloc %2 : memref<2xf32>
		return
		}

		// CHECK-NEXT: %[[ALLOC0:.*]] = memref.clone
		// CHECK-NEXT: %[[ALLOC1:.*]] = scf.for
		// CHECK-NEXT: memref.dealloc
		// CHECK-NEXT: %[[ALLOC2:.*]] = memref.alloc
		// CHECK-NEXT: scf.yield %[[ALLOC2]]
		// CHECK: linalg.copy(%[[ALLOC1]]
		// CHECK-NEXT: memref.dealloc %[[ALLOC1]]

		// -----

		// CHECK-LABEL: func @clone_nested_region
		func @clone_nested_region(%arg0: index, %arg1: index) -> memref<?x?xf32> {
		%0 = cmpi eq, %arg0, %arg1 : index
		%1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
		%2 = scf.if %0 -> (memref<?x?xf32>) {
		%3 = scf.if %0 -> (memref<?x?xf32>) {
		%9 = memref.clone %1 : memref<?x?xf32> to memref<?x?xf32>
		scf.yield %9 : memref<?x?xf32>
		} else {
		%7 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
		%10 = memref.clone %7 : memref<?x?xf32> to memref<?x?xf32>
		memref.dealloc %7 : memref<?x?xf32>
		scf.yield %10 : memref<?x?xf32>
		}
		%6 = memref.clone %3 : memref<?x?xf32> to memref<?x?xf32>
		herhut: Why does this remain?
		dfki-jugr (Author): %3 results from the scf.if. The canonicalization pattern only checks whether %3 is defined by an alloc operation. Since this is not the case, this clone remains.
		herhut: Now that the operand to clone is required to not be mutated, why do we need to see the alloc? Is it not good enough if a `clone` has a following `dealloc` of the source, if the clone itself does not have an earlier dealloc? We can land it as is (which is quite restricted), but this should be revisited to make it apply more broadly.
		memref.dealloc %3 : memref<?x?xf32>
		scf.yield %6 : memref<?x?xf32>
		} else {
		%3 = memref.alloc(%arg1, %arg1) : memref<?x?xf32>
		%6 = memref.clone %3 : memref<?x?xf32> to memref<?x?xf32>
		memref.dealloc %3 : memref<?x?xf32>
		scf.yield %6 : memref<?x?xf32>
		}
		memref.dealloc %1 : memref<?x?xf32>
		return %2 : memref<?x?xf32>
		}

		// CHECK: %[[ALLOC1:.*]] = memref.alloc
		// CHECK-NEXT: %[[ALLOC2:.*]] = scf.if
		// CHECK-NEXT: %[[ALLOC3_1:.*]] = scf.if
		// CHECK-NEXT: %[[ALLOC4_1:.*]] = memref.clone %[[ALLOC1]]
		// CHECK-NEXT: scf.yield %[[ALLOC4_1]]
		// CHECK: %[[ALLOC4_2:.*]] = memref.alloc
		// CHECK-NEXT: scf.yield %[[ALLOC4_2]]
		// CHECK: scf.yield %[[ALLOC3_1]]
		// CHECK: %[[ALLOC3_2:.*]] = memref.alloc
		// CHECK-NEXT: scf.yield %[[ALLOC3_2]]
		// CHECK: memref.dealloc %[[ALLOC1]]
		// CHECK-NEXT: return %[[ALLOC2]]
		herhut: It is not clear to me what these tests are testing.
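The restricted folding rule described in this thread can be illustrated with a small, hypothetical Python model. In that rule, a clone is folded only when its operand is defined by an `alloc`. This is not the C++ canonicalization pattern from this revision. Ops are plain records in one straight-line block. As an extra conservative assumption of this sketch, not something the thread states, the fold fires only when the first later op that touches the source or the clone is a `dealloc` of one of them. The names `Op`, `defining_op` and `fold_clones` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Toy op: `result = name(operands)`; result is None for dealloc/yield/return."""
    name: str
    operands: Tuple[str, ...] = ()
    result: Optional[str] = None


def defining_op(block: List[Op], value: str) -> Optional[Op]:
    """Return the op in `block` that produces `value`, or None (e.g. a block argument)."""
    return next((op for op in block if op.result == value), None)


def fold_clones(block: List[Op]) -> List[Op]:
    """Hypothetical model of the restricted clone fold on one straight-line block.

    `dst = clone src` is removed when `src` is produced by an `alloc` in the same
    block and the first later op mentioning `src` or `dst` is a dealloc of either
    one. That dealloc is removed too, and later uses of `dst` are renamed to `src`.
    """
    block = list(block)
    changed = True
    while changed:
        changed = False
        for i, op in enumerate(block):
            if op.name != "clone":
                continue
            src, dst = op.operands[0], op.result
            src_def = defining_op(block, src)
            if src_def is None or src_def.name != "alloc":
                continue  # e.g. a block argument or an scf.if result
            # Index of the first later op that touches src or dst, if any.
            j = next((k for k in range(i + 1, len(block))
                      if src in block[k].operands or dst in block[k].operands), None)
            if j is None or block[j].name != "dealloc":
                continue
            # Drop the clone and the matching dealloc, then let src stand in for dst.
            rest = block[:i] + block[i + 1:j] + block[j + 1:]
            block = [Op(o.name, tuple(src if v == dst else v for v in o.operands), o.result)
                     for o in rest]
            changed = True
            break  # restart the scan on the rewritten block
    return block


# Loop body of @clone_loop_alloc, without the unused cmpi.
body = [
    Op("dealloc", ("arg6",)),
    Op("alloc", (), "4"),
    Op("clone", ("4",), "5"),
    Op("dealloc", ("4",)),
    Op("clone", ("5",), "6"),
    Op("dealloc", ("5",)),
    Op("yield", ("6",)),
]
print(fold_clones(body))
```

On the loop body, the model folds both clones and leaves `dealloc arg6`, `4 = alloc` and `yield 4`. That result is consistent with the `@clone_loop_alloc` CHECK lines above. A clone whose source is a block argument or an `scf.if` result is left unchanged, which matches the behavior dfki-jugr describes for `@clone_nested_region`.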

mlir/test/Transforms/copy-removal.mlir

This file was deleted.

	// RUN: mlir-opt -copy-removal -split-input-file %s | FileCheck %s

	// All linalg copies except linalg.copy(%1, %9) must be removed. That copy must
	// stay because the defining operation of %1 and its DeallocOp are in another block.

	// CHECK-LABEL: func @nested_region_control_flow_div_nested
	func @nested_region_control_flow_div_nested(%arg0: index, %arg1: index) -> memref<?x?xf32> {
	%0 = cmpi eq, %arg0, %arg1 : index
	%1 = memref.alloc(%arg0, %arg0) : memref<?x?xf32>
	// CHECK: %{{.*}} = scf.if
	%2 = scf.if %0 -> (memref<?x?xf32>) {
	// CHECK: %[[PERCENT3:.*]] = scf.if
	%3 = scf.if %0 -> (memref<?x?xf32>) {
	%c0_0 = constant 0 : index
	%7 = memref.dim %1, %c0_0 : memref<?x?xf32>
	%c1_1 = constant 1 : index
	%8 = memref.dim %1, %c1_1 : memref<?x?xf32>
	%9 = memref.alloc(%7, %8) : memref<?x?xf32>
	// CHECK: linalg.copy({{.*}}, %[[PERCENT9:.*]])
	linalg.copy(%1, %9) : memref<?x?xf32>, memref<?x?xf32>
	// CHECK: scf.yield %[[PERCENT9]]
	scf.yield %9 : memref<?x?xf32>
	} else {
	// CHECK: %[[PERCENT7:.*]] = memref.alloc
	%7 = memref.alloc(%arg0, %arg1) : memref<?x?xf32>
	%c0_0 = constant 0 : index
	%8 = memref.dim %7, %c0_0 : memref<?x?xf32>
	%c1_1 = constant 1 : index
	%9 = memref.dim %7, %c1_1 : memref<?x?xf32>
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NOT: linalg.copy(%[[PERCENT7]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[PERCENT7]]
	%10 = memref.alloc(%8, %9) : memref<?x?xf32>
	linalg.copy(%7, %10) : memref<?x?xf32>, memref<?x?xf32>
	memref.dealloc %7 : memref<?x?xf32>
	// CHECK: scf.yield %[[PERCENT7]]
	scf.yield %10 : memref<?x?xf32>
	}
	%c0 = constant 0 : index
	%4 = memref.dim %3, %c0 : memref<?x?xf32>
	%c1 = constant 1 : index
	%5 = memref.dim %3, %c1 : memref<?x?xf32>
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NOT: linalg.copy(%[[PERCENT3]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[PERCENT3]]
	%6 = memref.alloc(%4, %5) : memref<?x?xf32>
	linalg.copy(%3, %6) : memref<?x?xf32>, memref<?x?xf32>
	memref.dealloc %3 : memref<?x?xf32>
	// CHECK: scf.yield %[[PERCENT3]]
	scf.yield %6 : memref<?x?xf32>
	} else {
	// CHECK: %[[PERCENT3:.*]] = memref.alloc
	%3 = memref.alloc(%arg1, %arg1) : memref<?x?xf32>
	%c0 = constant 0 : index
	%4 = memref.dim %3, %c0 : memref<?x?xf32>
	%c1 = constant 1 : index
	%5 = memref.dim %3, %c1 : memref<?x?xf32>
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NOT: linalg.copy(%[[PERCENT3]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[PERCENT3]]
	%6 = memref.alloc(%4, %5) : memref<?x?xf32>
	linalg.copy(%3, %6) : memref<?x?xf32>, memref<?x?xf32>
	memref.dealloc %3 : memref<?x?xf32>
	// CHECK: scf.yield %[[PERCENT3]]
	scf.yield %6 : memref<?x?xf32>
	}
	memref.dealloc %1 : memref<?x?xf32>
	return %2 : memref<?x?xf32>
	}

	// -----

	// CHECK-LABEL: func @simple_test
	func @simple_test() -> memref<5xf32> {
	%temp = memref.alloc() : memref<5xf32>
	%ret = memref.alloc() : memref<5xf32>
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	memref.dealloc %ret : memref<5xf32>
	return %temp : memref<5xf32>
	}
	// CHECK-SAME: () -> memref<5xf32>
	// CHECK-NEXT: %[[ret:.*]] = memref.alloc()
	// CHECK-NOT: linalg.copy(%[[ret]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[ret]]
	// CHECK: return %[[ret]]

	// -----

	// It is legal to remove the copy operation even though %ret has a usage before
	// the copy operation. The allocation of %temp and the deallocation of %ret should
	// also be removed.

	// CHECK-LABEL: func @test_with_ret_usage_before_copy
	func @test_with_ret_usage_before_copy() -> memref<5xf32> {
	%ret = memref.alloc() : memref<5xf32>
	%temp = memref.alloc() : memref<5xf32>
	%c0 = constant 0 : index
	%dimension = memref.dim %ret, %c0 : memref<5xf32>
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	memref.dealloc %ret : memref<5xf32>
	return %temp : memref<5xf32>
	}
	// CHECK-NEXT: %[[ret:.*]] = memref.alloc()
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NEXT: %{{.*}} = constant
	// CHECK-NEXT: %[[DIM:.*]] = memref.dim %[[ret]]
	// CHECK-NOT: linalg.copy(%[[ret]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[ret]]
	// CHECK: return %[[ret]]

	// -----

	// It is illegal to remove the copy operation because %ret has a usage after the
	// copy operation.

	// CHECK-LABEL: func @test_with_ret_usage_after_copy
	func @test_with_ret_usage_after_copy() -> memref<5xf32> {
	%ret = memref.alloc() : memref<5xf32>
	%temp = memref.alloc() : memref<5xf32>
	// CHECK: linalg.copy
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	%c0 = constant 0 : index
	%dimension = memref.dim %ret, %c0 : memref<5xf32>
	memref.dealloc %ret : memref<5xf32>
	return %temp : memref<5xf32>
	}

	// -----

	// It is illegal to remove the copy operation because %temp has a usage before the
	// copy operation.

	// CHECK-LABEL: func @test_with_temp_usage_before_copy
	func @test_with_temp_usage_before_copy() -> memref<5xf32> {
	%ret = memref.alloc() : memref<5xf32>
	%temp = memref.alloc() : memref<5xf32>
	%c0 = constant 0 : index
	%dimension = memref.dim %temp, %c0 : memref<5xf32>
	// CHECK: linalg.copy
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	memref.dealloc %ret : memref<5xf32>
	return %temp : memref<5xf32>
	}

	// -----

	// It is legal to remove the copy operation even though %temp has a usage after the
	// copy operation. The allocation of %temp and the deallocation of %ret could also be
	// removed.

	// However, the following pattern is not handled by copy removal:
	// %from = memref.alloc()
	// %to = memref.alloc()
	// copy(%from, %to)
	// read_from(%from) + write_to(%something_else)
	// memref.dealloc(%from)
	// return %to
	// In particular, linalg.generic is a memoryEffectOp between copy and dealloc.
	// Since no alias analysis is performed and no distinction is made between reads
	// and writes, the linalg.generic with effects blocks copy removal.

	#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @test_with_temp_usage_after_copy
	func @test_with_temp_usage_after_copy() -> memref<5xf32> {
	%ret = memref.alloc() : memref<5xf32>
	%res = memref.alloc() : memref<5xf32>
	%temp = memref.alloc() : memref<5xf32>
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	linalg.generic {
	indexing_maps = [#map0, #map0],
	iterator_types = ["parallel"]}
	ins(%temp : memref<5xf32>)
	outs(%res : memref<5xf32>) {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = math.exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32
	}
	memref.dealloc %ret : memref<5xf32>
	return %temp : memref<5xf32>
	}
	// CHECK-NEXT: %[[ret:.*]] = memref.alloc()
	// CHECK-NEXT: %[[res:.*]] = memref.alloc()
	// CHECK-NEXT: %[[temp:.*]] = memref.alloc()
	// CHECK-NEXT: linalg.copy(%[[ret]], %[[temp]])
	// CHECK-NEXT: linalg.generic
	// CHECK: memref.dealloc %[[ret]]
	// CHECK: return %[[temp]]

	// -----

	// CHECK-LABEL: func @make_allocation
	func @make_allocation() -> memref<5xf32> {
	%mem = memref.alloc() : memref<5xf32>
	return %mem : memref<5xf32>
	}

	// CHECK-LABEL: func @test_with_function_call
	func @test_with_function_call() -> memref<5xf32> {
	// CHECK-NEXT: %[[ret:.*]] = call @make_allocation() : () -> memref<5xf32>
	%ret = call @make_allocation() : () -> (memref<5xf32>)
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NOT: linalg.copy(%[[ret]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[ret]]
	%temp = memref.alloc() : memref<5xf32>
	linalg.copy(%ret, %temp) : memref<5xf32>, memref<5xf32>
	memref.dealloc %ret : memref<5xf32>
	// CHECK: return %[[ret]]
	return %temp : memref<5xf32>
	}

	// -----

	// CHECK-LABEL: func @multiple_deallocs_in_different_blocks
	func @multiple_deallocs_in_different_blocks(%cond : i1) -> memref<5xf32> {
	// CHECK-NEXT: %[[PERCENT0:.*]] = memref.alloc()
	%0 = memref.alloc() : memref<5xf32>
	cond_br %cond, ^bb1, ^bb2
	^bb1:
	memref.dealloc %0 : memref<5xf32>
	// CHECK: br ^[[BB3:.*]](%[[PERCENT0]]
	br ^bb3(%0 : memref<5xf32>)
	^bb2:
	// CHECK-NOT: %{{.*}} = memref.alloc
	// CHECK-NOT: linalg.copy(%[[PERCENT0]], %{{.*}})
	// CHECK-NOT: memref.dealloc %[[PERCENT0]]
	%temp = memref.alloc() : memref<5xf32>
	linalg.copy(%0, %temp) : memref<5xf32>, memref<5xf32>
	memref.dealloc %0 : memref<5xf32>
	// CHECK: br ^[[BB3]](%[[PERCENT0]]
	br ^bb3(%temp : memref<5xf32>)
	^bb3(%res : memref<5xf32>):
	return %res : memref<5xf32>
	}

	// -----

	#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @test_ReuseCopyTargetAsSource
	func @test_ReuseCopyTargetAsSource(%arg0: memref<2xf32>, %result: memref<2xf32>){
	// CHECK-SAME: (%[[ARG0:.*]]: memref<2xf32>, %[[RES:.*]]: memref<2xf32>)
	// CHECK-NOT: %{{.*}} = memref.alloc
	%temp = memref.alloc() : memref<2xf32>
	// CHECK-NEXT: linalg.generic
	// CHECK-SAME: ins(%[[ARG0]]{{.*}}outs(%[[RES]]
	// CHECK-NOT: linalg.copy(%{{.*}}, %[[RES]])
	// CHECK-NOT: memref.dealloc %{{.*}}
	linalg.generic {
	indexing_maps = [#map0, #map0],
	iterator_types = ["parallel"]}
	ins(%arg0 : memref<2xf32>)
	outs(%temp : memref<2xf32>) {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = math.exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32
	}
	linalg.copy(%temp, %result) : memref<2xf32>, memref<2xf32>
	memref.dealloc %temp : memref<2xf32>
	// CHECK: return
	return
	}

	// -----

	// The copy operation must not be removed since an operation writes to %to
	// before the copy.

	#map0 = affine_map<(d0) -> (d0)>

	// CHECK-LABEL: func @test_ReuseCopyTargetAsSource
	func @test_ReuseCopyTargetAsSource(%arg0: memref<2xf32>){
	%to = memref.alloc() : memref<2xf32>
	%temp = memref.alloc() : memref<2xf32>
	linalg.generic {
	indexing_maps = [#map0, #map0],
	iterator_types = ["parallel"]}
	ins(%arg0 : memref<2xf32>)
	outs(%temp : memref<2xf32>) {
	^bb0(%gen1_arg0: f32, %gen1_arg1: f32):
	%tmp1 = math.exp %gen1_arg0 : f32
	linalg.yield %tmp1 : f32
	}
	linalg.generic {
	indexing_maps = [#map0, #map0],
	iterator_types = ["parallel"]}
	ins(%arg0 : memref<2xf32>)
	outs(%to : memref<2xf32>) {
	^bb0(%gen2_arg0: f32, %gen2_arg1: f32):
	%tmp2 = math.exp %gen2_arg0 : f32
	linalg.yield %tmp2 : f32
	}
	// CHECK: linalg.copy
	linalg.copy(%temp, %to) : memref<2xf32>, memref<2xf32>
	memref.dealloc %temp : memref<2xf32>
	return
	}

	// -----

	// The only redundant copy is linalg.copy(%4, %5)

	// CHECK-LABEL: func @loop_alloc
	func @loop_alloc(%arg0: index, %arg1: index, %arg2: index, %arg3: memref<2xf32>, %arg4: memref<2xf32>) {
	// CHECK: %{{.*}} = memref.alloc()
	%0 = memref.alloc() : memref<2xf32>
	memref.dealloc %0 : memref<2xf32>
	// CHECK: %{{.*}} = memref.alloc()
	%1 = memref.alloc() : memref<2xf32>
	// CHECK: linalg.copy
	linalg.copy(%arg3, %1) : memref<2xf32>, memref<2xf32>
	%2 = scf.for %arg5 = %arg0 to %arg1 step %arg2 iter_args(%arg6 = %1) -> (memref<2xf32>) {
	%3 = cmpi eq, %arg5, %arg1 : index
	// CHECK: memref.dealloc
	memref.dealloc %arg6 : memref<2xf32>
	// CHECK: %[[PERCENT4:.*]] = memref.alloc()
	%4 = memref.alloc() : memref<2xf32>
	// CHECK-NOT: memref.alloc
	// CHECK-NOT: linalg.copy
	// CHECK-NOT: memref.dealloc
	%5 = memref.alloc() : memref<2xf32>
	linalg.copy(%4, %5) : memref<2xf32>, memref<2xf32>
	memref.dealloc %4 : memref<2xf32>
	// CHECK: %[[PERCENT6:.*]] = memref.alloc()
	%6 = memref.alloc() : memref<2xf32>
	// CHECK: linalg.copy(%[[PERCENT4]], %[[PERCENT6]])
	linalg.copy(%5, %6) : memref<2xf32>, memref<2xf32>
	scf.yield %6 : memref<2xf32>
	}
	// CHECK: linalg.copy
	linalg.copy(%2, %arg4) : memref<2xf32>, memref<2xf32>
	memref.dealloc %2 : memref<2xf32>
	return
	}

	// -----

	// The linalg.copy operation can be removed in addition to the alloc and dealloc
	// operations. All uses of %0 are then replaced with %arg2.

	// CHECK-LABEL: func @check_with_affine_dialect
	func @check_with_affine_dialect(%arg0: memref<4xf32>, %arg1: memref<4xf32>, %arg2: memref<4xf32>) {
	// CHECK-SAME: (%[[ARG0:.*]]: memref<4xf32>, %[[ARG1:.*]]: memref<4xf32>, %[[RES:.*]]: memref<4xf32>)
	// CHECK-NOT: memref.alloc
	%0 = memref.alloc() : memref<4xf32>
	affine.for %arg3 = 0 to 4 {
	%5 = affine.load %arg0[%arg3] : memref<4xf32>
	%6 = affine.load %arg1[%arg3] : memref<4xf32>
	%7 = cmpf ogt, %5, %6 : f32
	// CHECK: %[[SELECT_RES:.*]] = select
	%8 = select %7, %5, %6 : f32
	// CHECK-NEXT: affine.store %[[SELECT_RES]], %[[RES]]
	affine.store %8, %0[%arg3] : memref<4xf32>
	}
	// CHECK-NOT: linalg.copy
	// CHECK-NOT: dealloc
	linalg.copy(%0, %arg2) : memref<4xf32>, memref<4xf32>
	memref.dealloc %0 : memref<4xf32>
	//CHECK: return
	return
	}

Revision Contents

Diff 333524

mlir/docs/BufferDeallocationInternals.md

mlir/include/mlir/Dialect/MemRef/IR/MemRef.h

mlir/include/mlir/Dialect/MemRef/IR/MemRefOps.td

mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h

mlir/include/mlir/Transforms/BufferUtils.h

mlir/include/mlir/Transforms/Passes.h

mlir/include/mlir/Transforms/Passes.td

mlir/lib/Dialect/MemRef/CMakeLists.txt

mlir/lib/Dialect/MemRef/IR/CMakeLists.txt

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp

mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp

mlir/lib/Transforms/BufferDeallocation.cpp

mlir/lib/Transforms/BufferUtils.cpp

mlir/lib/Transforms/CMakeLists.txt

mlir/lib/Transforms/CopyRemoval.cpp

mlir/test/Transforms/buffer-deallocation.mlir

mlir/test/Transforms/canonicalize.mlir

mlir/test/Transforms/copy-removal.mlir
