This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization] Improve buffer deallocation pass
ClosedPublic

Authored by maerhart on Aug 21 2023, 6:44 AM.

Details

Summary

Add a new Buffer Deallocation pass replacing the old one with the goal of
inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.

Depends on D156663

Diff Detail

Event Timeline

maerhart created this revision.Aug 21 2023, 6:44 AM
Herald added a project: Restricted Project. · View Herald Transcript
maerhart requested review of this revision.Aug 21 2023, 6:44 AM
maerhart updated this revision to Diff 552385.Aug 22 2023, 8:26 AM

clang-format

springerm added inline comments.Aug 28 2023, 2:16 AM
mlir/docs/Bufferization.md
230

"I.e., after running One-Shot Bufferize, the result IR may have a number of memref.alloc ops, but no memref.dealloc ops."

238

result of the -buffer-deallocation pass

246

arguments

249

acquired?

254

with the same allocated base buffer

257

Describe what this means in practice: If a function is external, we assume that the ABI is as described above. If a function is not external, it will be processed by the buffer deallocation pass, which will rewrite the function in such a way that the ABI is respected (if needed).

266

remove

267–268

: it lowers to

268–269

. When enough aliasing information is statically available, operands or the entire op may fold away.

269–272

remove

277

materialize as

277

ownership indicators

278

I would remove this. Conceptually, there is an ownership indicator for every memref, but it can be optimized away in many cases (?)

319

need

388

the

390–392

What does that mean?

394–395

What happens in that case?

397–398

Can we make the pass fail in case there are existing deallocs?

409

of

409

maybe add in parentheses that "private" means cannot be called externally

springerm added inline comments.Aug 28 2023, 2:16 AM
mlir/docs/Bufferization.md
226

We could have the high-level compilation flow as a diagram here:

one-shot-bufferize  -> buffer-deallocation -> buffer-deallocation-simplification -> canonicalize -> bufferization-to-memref
417

something missing here?

431

as follows

435

Mention that no ownership will be taken for %memref.

459

remove

468

Mention that the ownership for alloca is always false

504

I think it would be useful to mention the bufferization-to-memref pass and show how to a single bufferization.dealloc op is lowered. You can probably copy it mostly from the C++ comments that we have with that pass.

springerm added inline comments.Aug 28 2023, 6:35 AM
mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
45

buildBoolValue

204–215

You can use isEqualConstantIntOrValue

257–258

Describe what handleOp is supposed to do.

277–278

what does this mean?

285

implements

287

typo

287

typo

289

It would be

509

const

531

can this be const

580–581

Wouldn't this lead to spurious deallocs (double deallocs, deallocs of buffers that we don't own, etc.)?

604

Can we use a more descriptive name? set sounds like it would overwrite the original ownership. Maybe joinOwnership or combineOwnership?

768

It seems like it is expected that memrefsToDeallocatePerBlock was already populated. Can we turn this into a memrefsToDeallocatePerBlock.find and assert if the block is not in the map?

780–781

Add a comment that this is needed because the same memref that was produced by the alloc op must be passed to dealloc (and not a subview etc.).

954–955

What are the exact assumptions and can we check for them here? Mention that these assumptions are satisfied for scf.for and scf.while.

987–990

You can use DenseMap::find and then assert(it != ownershipMap.end()).

1012

I think this can be written as auto [newMemRef, condition]

1051–1053

only exactly 1 successor is supported

1065

if (op.getSuccessorOperands(0).getProducedOperandCount() > 0) return emitError("produced operands are not supported");

1150–1152

Can we check if the operand is defined by an op with an alloc side effect and both are in the same block. And otherwise return an error to make sure we don't miscompile/have double deallocs?

1152

the

1195–1202

Is is possible to have a separate handleInterface for func.return and somehow make sure that the impl for RegionBranchTerminatorOpInterface will not be called? (To keep this implementation here simpler.)

1205

how

1206

which

mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp
209

Why is this needed. There's a ValueComparator in OneShotAnalysis.h with a simpler implementation, would that suffice?

struct ValueComparator {
  bool operator()(const Value &lhs, const Value &rhs) const {
    return lhs.getImpl() < rhs.getImpl();
  }
};

Do we need this so that the result IR is deterministic?

maerhart updated this revision to Diff 555740.Sep 4 2023, 6:28 AM
maerhart marked 52 inline comments as done.

address comments and rebase

mlir/docs/Bufferization.md
390–392

In theory, you could implement your own kind of CallOpInterface, for example, and not use the upstream one. In that case, the pass wouldn't do the correct thing. But I guess that's obvious and I could remove this point.

394–395

I think it should just work with the current implementation, but there are no regression tests for that yet, so I didn't remove it. I thought of doing this in a later PR to keep this one a bit smaller (there are already a lot of regression tests to look at in this PR).

397–398

It fails with a nice error message if bufferization.dealloc of memref.dealloc operations are already present, but we cannot check for the dealloc side-effect on memref values, because the memref.realloc operation is modeled using a dealloc+alloc side-effect and there are use-cases that need realloc support.

417

Hmm, that's interesting. I thought I wrote something here. Maybe I forgot to commit the last few changes or lost them during export or rebase...
I've added something again :)

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
277–278

Oh, this is actually outdated, I removed this part.

580–581

At the point where the dealloc operation is inserted, the ownership values for all memrefs in this list are queried and added as deallocation condition to the dealloc op. If a memref produced by an alloca is added here, the ownership value will be false and no deallocation will actually happen (and most likely be optimized away in the simplification pass). It's more dangerous to not add a value that might have to be deallocated in (only) some situations and thus leading to memory leaks.

768

If there is a block in which no memref has to be deallocated, there will be no entry in the map. In that case we would get the default constructed empty list here which is the behavior we want. Alternatively, we could establish the invariant that every processed block has to add an entry (with a possibly empty list as value), but I don't really see why this would be strictly better.

1012

The problem is that newMemref is used in the lambda body below which is a C++20 feature and LLVM uses C++17.

1150–1152

I fixed this by lowering memref.realloc operations in a separate pass beforehand in a way that doesn't insert the deallocation operation for the original buffer. That means, the deallocation pass will insert the memref.dealloc operation for the realloc. Additionally, I check that no dealloc side-effect is present on memref values (or otherwise emit an error).

1195–1202

This way we also support function operations that are not func.func (but still implement the FunctionOpInterface and ReturnLike on the return-like terminator). Also note, that we still need all of the code for func.return because it can be in both private and public functions.

mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp
209

Yes, exactly. The implementation in OneShotAnalysis.h compares pointers and as long as the Values defined in a function aren't guaranteed to always be stored in a particular order in memory (and thus provide some guarantees on the pointer values returned by getImpl()) it is not sufficient to generate deterministic IR.

springerm added inline comments.Sep 6 2023, 7:32 AM
mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
23–85

Is this still up to date?

89

Can you describe this briefly in the description part? What's different when this flag is enabled or disabled?

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
264–266

A bit of high-level information is still missing here. Presumably the task of handleOp is to insert bufferization.dealloc ops. That should be mentioned. What if an op implements multiple interfaces? Will multiple bufferization.dealloc be inserted?

297

auxiliary

333

an

355

Mention that regions do not take ownership of memrefs that are passed to them from the outside. Similarly, no ownership is taken of implicitly captured memrefs (for ops that are not isolated from above).

1126–1127

Why is this needed? Shouldn't there be at least one ReturnOp from which you can always take the result types?

1137

: no ownership is taken (?)

1145

Does it matter in which order the blocks are traversed? E.g., do the ownerships of the bbargs already have to be available in the internal state? Note that the order of traversal here is not necessarily according to block dominance.

1149

Maybe populateRemainingOwnerships?

1179

This is a bit ambiguous. Maybe rephrase as: to pass to the "bufferization.dealloc" op as "memrefs to dealloc"

1181

Maybe ownerships is a more descriptive name here?

1188

Maybe thenOwnerships and elseOwnerships would be more descriptive.

1189

Add comment: // Retain trueDestOperands` if "true" branch is taken.`

1190

To make this function a bit simpler, can you just pass a Value branchTaken here and construct the AndIOp inside of insertDeallocForBranch?

1191–1193

Is there a reason why is this is emitError instead of assert? When can this happen?

1196

Add comment: Add comment: // Retain falseDestOperands` if "false" branch is taken.`

1223

Can you add a comment here.

Ownership of implicitly captured memrefs from other regions is never taken, but ownership of memref in the same region (but different block) is taken if I understand correctly.

1224–1225

Does this assume that the first region is executed when entering this op? Maybe getEntrySuccessorRegions/getSuccessorRegions should be called?

1227

Add comment: No ownership is taken for any memrefs that are passed to the region from outside.

1284

How does this work when branching back to the entry block of a non-private function? (For which no ownership indicators are added (?))

1299–1301

I cannot follow this sentence. Can you rephrase? I think there are 2 separate cases. clone and something like alloca?

1310

nit: Store this in a helper variable at the beginning of the function, and call only once instead of twice.

1312

Under what circumstances would getMemrefWithUniqueOwnership insert a clone here? Just for me to understand, no need to add any IR as a comment to the code....

mlir/lib/Dialect/Bufferization/Transforms/Bufferize.cpp
630–680

nit: Can you move this into a separate revision. We can land this already.

maerhart updated this revision to Diff 556132.Sep 7 2023, 5:16 AM
maerhart marked 25 inline comments as done.

address comments

maerhart added inline comments.Sep 7 2023, 5:17 AM
mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
23–85

I've updated it now. The alloc, copy, and dealloc in ^bb2 are not inserted anymore.

mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
781

We don't need the debug names anymore since we now register those patterns as well in the BufferDeallocationSimplification pass.

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
1126–1127

I haven't checked if the verifier of func.func enforces something, but in principle you could have an infinite loop by branching in the entry block to the entry block again and it would be not returnLike op. But that'd only be relevant when we lift the explicit control flow loop limitation.

I added a comment mentioning this, but no regression test because we don't support unstructured control flow loops yet at all.

1145

Excellent point, this should traverse in dominance order.

1191–1193

Right, that's actually overly conservative, it can only happen when the BufferDeallocationOpInterface or a default handler in this file is implemented incorrectly. I'll convert it to an assert.

1284

Currently, this case would be rejected with a nice error message because we don't support unstructured control flow loops yet. If just removing this precondition check, this would lead to a memory leak, so we have to special-case it. Thanks for bringing this up!

1312

Assuming that the BufferDeallocationOpInterface is not implemented for arith.select, the following would lead to a clone:

func.func private @f(...
func.func @top() {
%alloc = memref.alloc
%alloca = memref.alloca
%select = arith.select %cond, %alloc, %alloca
func.call @f(%select)
...
}
maerhart updated this revision to Diff 556133.Sep 7 2023, 5:20 AM

remove canonicalization pattern debug names

maerhart added inline comments.Sep 8 2023, 5:40 AM
mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
1284

I took a closer look and it turns out that for any region it's impossible to branch to the entry block because that's checked in the verifier (the entry block may not have any predecessors).

springerm accepted this revision.Sep 11 2023, 5:24 AM
This revision is now accepted and ready to land.Sep 11 2023, 5:24 AM
This revision was landed with ongoing or failed builds.Sep 13 2023, 2:31 AM
This revision was automatically updated to reflect the committed changes.
mlir/test/Dialect/Bufferization/Transforms/BufferDeallocation/invalid-buffer-deallocation.mlir