This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization] Improve buffer deallocation pass
ClosedPublic

Authored by maerhart on Aug 21 2023, 6:44 AM.

Details

Summary

Add a new Buffer Deallocation pass replacing the old one with the goal of
inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.

Depends on D156663

Diff Detail

Event Timeline

maerhart created this revision.Aug 21 2023, 6:44 AM
Herald added a project: Restricted Project. · View Herald Transcript
maerhart requested review of this revision.Aug 21 2023, 6:44 AM
maerhart updated this revision to Diff 552385.Aug 22 2023, 8:26 AM

clang-format

springerm added inline comments.Aug 28 2023, 2:16 AM
mlir/docs/Bufferization.md
230

"I.e., after running One-Shot Bufferize, the result IR may have a number of memref.alloc ops, but no memref.dealloc ops."

238

result of the -buffer-deallocation pass

246

arguments

249

acquired?

254

with the same allocated base buffer

257

Describe what this means in practice: If a function is external, we assume that the ABI is as described above. If a function is not external, it will be processed by the buffer deallocation pass, which will rewrite the function in such a way that the ABI is respected (if needed).

266

remove

267–268

: it lowers to

268–269

. When enough aliasing information is statically available, operands or the entire op may fold away.

269–272

remove

277

materialize as

277

ownership indicators

278

I would remove this. Conceptually, there is an ownership indicator for every memref, but it can be optimized away in many cases (?)

319

need

388

the

390–392

What does that mean?

394–395

What happens in that case?

397–398

Can we make the pass fail in case there are existing deallocs?

409

of

409

maybe add in parentheses that "private" means cannot be called externally

springerm added inline comments.Aug 28 2023, 2:16 AM
mlir/docs/Bufferization.md
226

We could have the high-level compilation flow as a diagram here:

one-shot-bufferize  -> buffer-deallocation -> buffer-deallocation-simplification -> canonicalize -> bufferization-to-memref
417

something missing here?

431

as follows

435

Mention that no ownership will be taken for %memref.

459

remove

468

Mention that the ownership for alloca is always false

504

I think it would be useful to mention the bufferization-to-memref pass and show how to a single bufferization.dealloc op is lowered. You can probably copy it mostly from the C++ comments that we have with that pass.

springerm added inline comments.Aug 28 2023, 6:35 AM
mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
46–47

buildBoolValue

205–216

You can use isEqualConstantIntOrValue

251–252

Describe what handleOp is supposed to do.

296–297

what does this mean?

297–487

implements

299

typo

299

typo

301

It would be

531

const

553

can this be const

610–611

Wouldn't this lead to spurious deallocs (double deallocs, deallocs of buffers that we don't own, etc.)?

634

Can we use a more descriptive name? set sounds like it would overwrite the original ownership. Maybe joinOwnership or combineOwnership?

795

It seems like it is expected that memrefsToDeallocatePerBlock was already populated. Can we turn this into a memrefsToDeallocatePerBlock.find and assert if the block is not in the map?

807–808

Add a comment that this is needed because the same memref that was produced by the alloc op must be passed to dealloc (and not a subview etc.).

894–895

What are the exact assumptions and can we check for them here? Mention that these assumptions are satisfied for scf.for and scf.while.

927–930

You can use DenseMap::find and then assert(it != ownershipMap.end()).

952

I think this can be written as auto [newMemRef, condition]

991–993

only exactly 1 successor is supported

1005

if (op.getSuccessorOperands(0).getProducedOperandCount() > 0) return emitError("produced operands are not supported");

1214–1216

Can we check if the operand is defined by an op with an alloc side effect and both are in the same block. And otherwise return an error to make sure we don't miscompile/have double deallocs?

1216

the

1252–1259

Is is possible to have a separate handleInterface for func.return and somehow make sure that the impl for RegionBranchTerminatorOpInterface will not be called? (To keep this implementation here simpler.)

1264

how

1265

which

mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp
210

Why is this needed. There's a ValueComparator in OneShotAnalysis.h with a simpler implementation, would that suffice?

struct ValueComparator {
  bool operator()(const Value &lhs, const Value &rhs) const {
    return lhs.getImpl() < rhs.getImpl();
  }
};

Do we need this so that the result IR is deterministic?

maerhart updated this revision to Diff 555740.Sep 4 2023, 6:28 AM
maerhart marked 52 inline comments as done.

address comments and rebase

mlir/docs/Bufferization.md
390–392

In theory, you could implement your own kind of CallOpInterface, for example, and not use the upstream one. In that case, the pass wouldn't do the correct thing. But I guess that's obvious and I could remove this point.

394–395

I think it should just work with the current implementation, but there are no regression tests for that yet, so I didn't remove it. I thought of doing this in a later PR to keep this one a bit smaller (there are already a lot of regression tests to look at in this PR).

397–398

It fails with a nice error message if bufferization.dealloc of memref.dealloc operations are already present, but we cannot check for the dealloc side-effect on memref values, because the memref.realloc operation is modeled using a dealloc+alloc side-effect and there are use-cases that need realloc support.

417

Hmm, that's interesting. I thought I wrote something here. Maybe I forgot to commit the last few changes or lost them during export or rebase...
I've added something again :)

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
296–297

Oh, this is actually outdated, I removed this part.

610–611

At the point where the dealloc operation is inserted, the ownership values for all memrefs in this list are queried and added as deallocation condition to the dealloc op. If a memref produced by an alloca is added here, the ownership value will be false and no deallocation will actually happen (and most likely be optimized away in the simplification pass). It's more dangerous to not add a value that might have to be deallocated in (only) some situations and thus leading to memory leaks.

795

If there is a block in which no memref has to be deallocated, there will be no entry in the map. In that case we would get the default constructed empty list here which is the behavior we want. Alternatively, we could establish the invariant that every processed block has to add an entry (with a possibly empty list as value), but I don't really see why this would be strictly better.

952

The problem is that newMemref is used in the lambda body below which is a C++20 feature and LLVM uses C++17.

1214–1216

I fixed this by lowering memref.realloc operations in a separate pass beforehand in a way that doesn't insert the deallocation operation for the original buffer. That means, the deallocation pass will insert the memref.dealloc operation for the realloc. Additionally, I check that no dealloc side-effect is present on memref values (or otherwise emit an error).

1252–1259

This way we also support function operations that are not func.func (but still implement the FunctionOpInterface and ReturnLike on the return-like terminator). Also note, that we still need all of the code for func.return because it can be in both private and public functions.

mlir/lib/Dialect/Bufferization/Transforms/BufferUtils.cpp
210

Yes, exactly. The implementation in OneShotAnalysis.h compares pointers and as long as the Values defined in a function aren't guaranteed to always be stored in a particular order in memory (and thus provide some guarantees on the pointer values returned by getImpl()) it is not sufficient to generate deterministic IR.

springerm added inline comments.Sep 6 2023, 7:32 AM
mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
54–119

Is this still up to date?

143

Can you describe this briefly in the description part? What's different when this flag is enabled or disabled?

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
258–260

A bit of high-level information is still missing here. Presumably the task of handleOp is to insert bufferization.dealloc ops. That should be mentioned. What if an op implements multiple interfaces? Will multiple bufferization.dealloc be inserted?

309

auxiliary

345

an

367

Mention that regions do not take ownership of memrefs that are passed to them from the outside. Similarly, no ownership is taken of implicitly captured memrefs (for ops that are not isolated from above).

1190–1191

Why is this needed? Shouldn't there be at least one ReturnOp from which you can always take the result types?

1201

: no ownership is taken (?)

1209

Does it matter in which order the blocks are traversed? E.g., do the ownerships of the bbargs already have to be available in the internal state? Note that the order of traversal here is not necessarily according to block dominance.

1213

Maybe populateRemainingOwnerships?

1243

This is a bit ambiguous. Maybe rephrase as: to pass to the "bufferization.dealloc" op as "memrefs to dealloc"

1245

Maybe ownerships is a more descriptive name here?

1245

Maybe thenOwnerships and elseOwnerships would be more descriptive.

1246

Add comment: // Retain trueDestOperands` if "true" branch is taken.`

1253

Add comment: Add comment: // Retain falseDestOperands` if "false" branch is taken.`

1254

To make this function a bit simpler, can you just pass a Value branchTaken here and construct the AndIOp inside of insertDeallocForBranch?

1255–1257

Is there a reason why is this is emitError instead of assert? When can this happen?

1283–1284

Does this assume that the first region is executed when entering this op? Maybe getEntrySuccessorRegions/getSuccessorRegions should be called?

1286

Add comment: No ownership is taken for any memrefs that are passed to the region from outside.

1287

Can you add a comment here.

Ownership of implicitly captured memrefs from other regions is never taken, but ownership of memref in the same region (but different block) is taken if I understand correctly.

1347

How does this work when branching back to the entry block of a non-private function? (For which no ownership indicators are added (?))

1362–1364

I cannot follow this sentence. Can you rephrase? I think there are 2 separate cases. clone and something like alloca?

1372

nit: Store this in a helper variable at the beginning of the function, and call only once instead of twice.

1375

Under what circumstances would getMemrefWithUniqueOwnership insert a clone here? Just for me to understand, no need to add any IR as a comment to the code....

mlir/lib/Dialect/Bufferization/Transforms/Bufferize.cpp
679–729 ↗(On Diff #555740)

nit: Can you move this into a separate revision. We can land this already.

maerhart updated this revision to Diff 556132.Sep 7 2023, 5:16 AM
maerhart marked 25 inline comments as done.

address comments

maerhart added inline comments.Sep 7 2023, 5:17 AM
mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td
54–119

I've updated it now. The alloc, copy, and dealloc in ^bb2 are not inserted anymore.

mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
781 ↗(On Diff #555740)

We don't need the debug names anymore since we now register those patterns as well in the BufferDeallocationSimplification pass.

mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
1190–1191

I haven't checked if the verifier of func.func enforces something, but in principle you could have an infinite loop by branching in the entry block to the entry block again and it would be not returnLike op. But that'd only be relevant when we lift the explicit control flow loop limitation.

I added a comment mentioning this, but no regression test because we don't support unstructured control flow loops yet at all.

1209

Excellent point, this should traverse in dominance order.

1255–1257

Right, that's actually overly conservative, it can only happen when the BufferDeallocationOpInterface or a default handler in this file is implemented incorrectly. I'll convert it to an assert.

1347

Currently, this case would be rejected with a nice error message because we don't support unstructured control flow loops yet. If just removing this precondition check, this would lead to a memory leak, so we have to special-case it. Thanks for bringing this up!

1375

Assuming that the BufferDeallocationOpInterface is not implemented for arith.select, the following would lead to a clone:

func.func private @f(...
func.func @top() {
%alloc = memref.alloc
%alloca = memref.alloca
%select = arith.select %cond, %alloc, %alloca
func.call @f(%select)
...
}
maerhart updated this revision to Diff 556133.Sep 7 2023, 5:20 AM

remove canonicalization pattern debug names

maerhart added inline comments.Sep 8 2023, 5:40 AM
mlir/lib/Dialect/Bufferization/Transforms/BufferDeallocation.cpp
1347

I took a closer look and it turns out that for any region it's impossible to branch to the entry block because that's checked in the verifier (the entry block may not have any predecessors).

springerm accepted this revision.Sep 11 2023, 5:24 AM
This revision is now accepted and ready to land.Sep 11 2023, 5:24 AM
This revision was landed with ongoing or failed builds.Sep 13 2023, 2:31 AM
This revision was automatically updated to reflect the committed changes.
mlir/test/Dialect/Bufferization/Transforms/BufferDeallocation/invalid-buffer-deallocation.mlir