Page MenuHomePhabricator

[mlir] Automatic reference counting for Async values + runtime support for ref counted objects

Authored by ezhulenev on Nov 3 2020, 1:57 PM.



Depends On D89963

Automatic reference counting algorithm outline:

  1. ReturnLike operations forward the reference counted values without modifying the reference count.
  2. Use liveness analysis to find blocks in the CFG where the lifetime of reference counted values ends, and insert drop_ref operations after the last use of the value.
  3. Insert add_ref before the async.execute operation capturing the value, and pairing drop_ref before the async body region terminator, to release the captured reference counted value when execution completes.
  4. If the reference counted value is passed only to some of the block successors, insert drop_ref operations in the beginning of the blocks that do not have reference coutned value uses.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ftynse added inline comments.Nov 5 2020, 4:43 AM

No they will not be visible. Conversion almost never changes operations in-place. replaceOpWithNewOp and the likes inject a new op, and keep the old op until the conversion completes in case one needs to examine the original op or its operand. The list of the operands to the op being rewritten is formed by combining the results of the new ops if they were rewritten and existing ops if they were not. This is why we pass operands into matchAndRewrite, otherwise it would have been a useless copy of op->getOperands().

rriddle added inline comments.Nov 6 2020, 1:16 PM

Missing static on all of these?

ezhulenev updated this revision to Diff 303566.Nov 6 2020, 3:13 PM
ezhulenev marked an inline comment as done.

Add static to functions in AsyncRefCounting.cpp

herhut added inline comments.Nov 13 2020, 3:40 AM

Why not produce the ValueRange in place from the two arguments?


Nit: are.


I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had inc_rc and dec_rc explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop.

That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible.

Seems less fragile and would work with existing interfaces for region control flow. It would also allow to pass async values to operations that do not implement the reference counting consumer interface.

ezhulenev added inline comments.Nov 13 2020, 4:02 AM

FWIW Swift SIL has all reference counting explicit (

There are two types of ref-counted value users:

  1. "forwarding": std.return, function call arg - they do not change the ref count
  2. "consumers" - everything else.

Async automatic ref counting will need to either have a closed set of supported users, or rely in op interfaces to distinguish between user types.

ezhulenev added inline comments.Nov 13 2020, 4:41 AM

And there is also operation like mlirAsyncRuntimeAddTokenToGroup that consumes reference at some indeterminate point in the future, so if IR has drop_ref, then the operation will need to have add_ref to compensate for that or marked as "forwarding" (reference counting responsibility forwarded to the runtime)

ezhulenev edited the summary of this revision. (Show Details)Nov 13 2020, 12:38 PM
ezhulenev removed reviewers: ftynse, aartbik, mehdi_amini, herhut.
silvas added a subscriber: silvas.Nov 13 2020, 6:17 PM
silvas added inline comments.

It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment?

I also don't understand "Inside this operation statically known number of uses is 1" - if %cond is false it will be 0.


nit: looks like line wrapping here forgot to insert //.Same on the async.drop_ref below.


nit: you might want to clarify somwhere that when you say "instances" here, it is "per instance of result's owner".

ezhulenev updated this revision to Diff 305458.Nov 16 2020, 3:34 AM

Use liveness analysis for reference counting

ezhulenev edited the summary of this revision. (Show Details)Nov 16 2020, 3:39 AM
ezhulenev added reviewers: silvas, mehdi_amini, herhut.
ezhulenev updated this revision to Diff 305469.Nov 16 2020, 4:22 AM
ezhulenev marked 10 inline comments as done.

Construct ValueRange directly as an argument to create call


I've pushed a new revision based on liveness analysis and explicit drop_ref instead of implicit "ref consumer".

silvas added inline comments.Nov 17 2020, 9:03 AM

Why only ExecuteOp? Why not use NumberOfExecutions?

ezhulenev marked an inline comment as done.Nov 17 2020, 9:11 AM
ezhulenev added inline comments.

Because operations after the async.execute can be executed before the operations nested under the async.execute, this is currently the only operation that has this property.


%token = ...
async.execute {
  async.await %token : !async.token // await #1
async.await %token : !async.token // await #2

It is impossible to determine which of the async.await operations will be the "last use" at runtime. Ref counting will pick second await as the last user and will create drop_ref after it, however if first await will be executed later it needs to keep the token alive.

silvas added inline comments.Nov 17 2020, 4:05 PM

nit: "All values are semantically created"


unclear what "owner" means in this context. Is this referring to a runtime construct or IR construct?


should it start with "create" to match the others?


rewriter has some helpers to avoid these raw get calls.


This should use operands[0] for the converted operands since this is doing a type conversion.


Discuss runtime refcounting ABI conventions for runtime functions in this comment. And conventions for IR functions that accept/return refcounted objects.


Add the explanation from your other review comment here justifying the special treatment of async.execute.


nit: typo coutned


typo: dialect types are


explain why not nested blocks (or leave TODO; also, we should probably signalPassFailure if we encounter uses in nested region)


typo: in in


findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):

for (Operation *user : value.getUsers()) {
  if (user->getParent() == block) {
    userInTheBlock = user;

Also, recommend putting this in a static helper, per


I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing while (user->getRegion() != definingRegion). That would make this code simpler as well.


I would prefer to keep such optimizations in a separate pass. Advantages:

  1. Easy to show and test tricky cases of this optimization (the current code requires a level of indirection -- one has to imagine which ops are inserted, and then removed)
  2. When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
  3. Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with with a single walk of each block, applying the optimization to all refcounted Value's in that block at the same time.
  4. Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.

(if you want to omit this optimization from the initial patch, that is fine too).

ezhulenev updated this revision to Diff 306216.Nov 18 2020, 1:58 PM
ezhulenev marked 14 inline comments as done.

Add a separate AsyncRefCountingOptimization pass + address PR comments

ezhulenev added inline comments.Nov 18 2020, 2:02 PM

Changed the documentation to reflect the new implementation of automatic reference counting.


createTokenFunctionType == function type for createToken function. Renamed to addOrDropRefFunctionType to make it clear that it is for add_ref and drop_ref ops.


Yes, also fixed a similar bug below.


Added few lines to explain why ignoring nested regions is ok.


findAncestorOpInBlock required to find the last use in the block even if the "real" use is deep inside nested region.

%token = ...
scf.for %i = ... {      <<<----- `scf.for` will be the last user
  async.await %token : !async.token
asyn.drop_ref %token.   <<<---- will be added after the last use in the CFG

Cleaned up code a little bit.


I moved it to a separate async-ref-counting-optimization pass. It is still not as efficient as it could be, but I added a small preprocessing step + iterate only the blocks that have uses of value.

ezhulenev updated this revision to Diff 306229.Nov 18 2020, 2:39 PM

Fix a bug in ref counting optimization

ezhulenev updated this revision to Diff 306230.Nov 18 2020, 2:52 PM

Break the loop early in user is after dropRef

ezhulenev updated this revision to Diff 306232.Nov 18 2020, 2:54 PM


ezhulenev updated this revision to Diff 306239.Nov 18 2020, 3:14 PM

Mark symbol declaration private

silvas accepted this revision.Nov 19 2020, 6:08 PM

Thanks! This looks great!


nit: could -> can


nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".


typo: yied


suggest putting this helper in include/Dialect/Async/IR/Async.h; it is used in the other file too.


Is it interesting to test async.execute[%token]?


is scf.if essential to this test case? If not, remove it. if so, describe it in the comment.


The input IR here seems strange to me. Will it create a leak if %arg1 == false?

I don't see a test case that produces IR that looks like this in async-ref-counting.mlir. Perhaps it would be good to add.


nit: inconsistency of CHECK: drop_ref vs CHECK: async.drop_ref


Is there a missing CHECK: async.add_ref %[[TOKEN]] on the line before %token_0 = async.execute and a missing CHECK: async.drop_ref %[[TOKEN_0]] before the return?

(best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there)

25 ↗(On Diff #306239)

generally we don't test propreties verified by traits/interfaces.

This revision is now accepted and ready to land.Nov 19 2020, 6:08 PM
ezhulenev updated this revision to Diff 306635.Nov 20 2020, 2:42 AM
ezhulenev marked 11 inline comments as done.

Address PR comments

Thanks for the review!


Added a test, it is indeed quite common pattern with nested async execute operations.


I was not really thinking about ref counting correctness when writing this tests :) Added an explicit note to the test where this property is violated.


Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to few other tests.