Page MenuHomePhabricator

[mlir] Async: add automatic reference counting at async.runtime operations level

Authored by ezhulenev on Jan 25 2021, 1:16 PM.



Depends On D95311

Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail.

New pass mostly relies on the save liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors.

This is almost NFC change. No new reference counting ideas, just a cleanup of the previous version.

Diff Detail

Event Timeline

ezhulenev created this revision.Jan 25 2021, 1:16 PM
ezhulenev requested review of this revision.Jan 25 2021, 1:16 PM
ezhulenev edited the summary of this revision. (Show Details)Jan 25 2021, 1:20 PM
ezhulenev added a reviewer: mehdi_amini.

Fix integration tests

@mehdi_amini will you have time to take a look at this change? Diffs looks large, but really no new ideas here. I plan to get back to working on async stuff next week, and need this to land first.

Excellent documentation in the code!!

I haven't read it all, but I won't finish tonight so flushing these comments.


I'd add here on which abstraction level it is intended to work (async.execute vs async.runtime/async.coro)


The parenthesis isn't clear to me: does it imply that a function argument in the function body is considered a new async value and start with a ref count of 1?
Why isn't a function argument just whatever the caller provides?


If the function call is also the last use of this value in a block, is this needed?


One possible "optimization" may be to collect the block to consider from the users of the value first and iterate only these blocks?

I suspect we could also take advantage of this traversal of the uses ahead of time to collect them and avoid re-traversing them in every block where the value "dies" (line 170).

Something like:

DenseMap<Block *, Operation *> last_users;
for (Operation *user : value.getUsers()) {
   Block block = definingRegion->findAncestorOpInRegion(user->getParent());
   auto it = last_users.insert({block, user});
   if (!it.second && user->isBeforeInBlock(it.first.second))
     it.first.second = user;

for (const auto &block_and_last_user :  last_users) {
   Block *block = block_and_last_user.first;
   Operation lastUser = block_and_last_user.second;
   // check liveness now
  if (....)

What the guaranteed on this?
Isn't there valid IR that can hit this?

To take the example provided above and slightly tweaking it:

///   ^entry:
///     %token = async.runtime.create : !async.token
///     cond_br %cond, ^bb1, ^bb2
///   ^bb1:
///     async.runtime.await %token
///     return
///   ^bb2:
///     br ^bb1

The token is live in ^bb2 even if not used there.


Why does the isBeforeInBlock matters?
I'd expect a drop_ref followed by an add_ref should be optimizable just like in the other direction?


The double lookup in the map can be avoided:

bool inserted = cancellable.insert({dropRef.getOperation(), addRef.getOperation()}).second;
if (inserted)
ezhulenev updated this revision to Diff 336631.Sat, Apr 10, 2:18 PM
ezhulenev marked 5 inline comments as done.

Update reference counting pass documentation


I've reworded it a little bit and added documentation for passing args to functions.


No, but to keeps things simple it is still added there, and ref-counting-opt will remove it. This is now documented in the beginning.

ezhulenev updated this revision to Diff 336633.Sat, Apr 10, 2:44 PM
ezhulenev marked 3 inline comments as done.

Do not lookup in the hash map twice.


Slightly updated the documentation, I think that @token_arg_cond_br_await_with_fallthough in the async-runtime-ref-counting.mlir tests exactly that.


This might hide bugs at runtime, drop_ref can drop reference count to zero and delete the value, and the following add_ref must fail (with asan or msan, it likely lead to segfault in the regular build).

ezhulenev marked an inline comment as done.

Traverse only blocks where the ref counted value has users


Done. Also updated the documentation a bit to explain how/why it works.

mehdi_amini accepted this revision.Mon, Apr 12, 10:29 AM



Should we return here?


If I understand correctly, you're saying that an incorrect input will not crash at runtime?

It is fine in general to do that: most optimization have to assume that the input program is "correct" and optimize accordingly (LLVM optimizations will always do that).

This revision is now accepted and ready to land.Mon, Apr 12, 10:29 AM
ezhulenev marked 2 inline comments as done.

Return after signaling pass failure

ezhulenev added inline comments.Mon, Apr 12, 10:50 AM

I'm not that confident in the correctness of reference counting pass to silently ignore this type of errors :) Also if the program is correct wrt reference counting such optimization opportunities should not exists.