This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Automatic reference counting for Async values + runtime support for ref counted objects
ClosedPublic

Authored by ezhulenev on Nov 3 2020, 1:57 PM.

Details

Summary

Depends On D89963

Automatic reference counting algorithm outline:

  1. ReturnLike operations forward the reference counted values without modifying the reference count.
  2. Use liveness analysis to find blocks in the CFG where the lifetime of reference counted values ends, and insert drop_ref operations after the last use of the value.
  3. Insert add_ref before the async.execute operation capturing the value, and pairing drop_ref before the async body region terminator, to release the captured reference counted value when execution completes.
  4. If the reference counted value is passed only to some of the block successors, insert drop_ref operations in the beginning of the blocks that do not have reference coutned value uses.

Diff Detail

Event Timeline

ezhulenev created this revision.Nov 3 2020, 1:57 PM
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
ezhulenev requested review of this revision.Nov 3 2020, 1:57 PM
ezhulenev edited the summary of this revision. (Show Details)Nov 3 2020, 2:02 PM
ezhulenev added reviewers: mehdi_amini, herhut.
ftynse added inline comments.Nov 5 2020, 2:11 AM
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669

You probably want to take the operand from operands rather than from the op directly in case it was modified by another pattern.

AddRefOpAdaptor is an autogenerated class that is constructible from ArrayRef<Value> and provides an API similar to the Op it models, i.e. you can call adaptor.operand().

681

Could we do something like

template <typename OpTy>
class RefToCallLoweringPattern : public OpConversionPattern<OpTy> {
  RefLoweringPatter(MLIRContext *ctx, StringRef funcName) : OpConversionPattern<OpTy>(ctx), funcName(funcName) {}

  matchAndRewrite(...) {
    ...
    rewruter.replaceOpWithNewOp<CallOp>(op, Type(), funcName, ValueRange(args));
  }
};

and remove duplicate code?

911

I would recommend to make ConstantOp legal, not the whole StandardDialect, which has lots of different things.

mlir/lib/Dialect/Async/IR/Async.cpp
349–350 ↗(On Diff #302681)

Just declare it as IntNonNegative in ODS.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38

MLIR uses /// for top-level comments.

65

Out of scope: I am interested in seing this as a generic OpInterface, just yesterday the need for this popped up in another discussion.

144

Any particular reason for using 32bit integers for refcount? In this struct, it may not even save space because the compiler will insert padding.

276

19 looks very unconventional. We usually try to estimate what would be the common "small" number of entries and round it up to a power of two.

mlir/lib/ExecutionEngine/AsyncRuntime.cpp
94

please fix

ezhulenev updated this revision to Diff 303074.Nov 5 2020, 3:50 AM
ezhulenev marked 6 inline comments as done.

Remove code duplication in op lowering + fix style guide violations

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669

Wouldn't the changes be also visible through the op? From the auto generated code is seems that they are identical:

::mlir::Value AddRefOpAdaptor::operand() {
  return *getODSOperands(0).begin();
}

vs

::mlir::Operation::operand_range AddRefOp::getODSOperands(unsigned index) {
  auto valueRange = getODSOperandIndexAndLength(index);
  return {std::next(getOperation()->operand_begin(), valueRange.first),
           std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
}

::mlir::Value AddRefOp::operand() {
  return *getODSOperands(0).begin();
}
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
65

Yeah, seems like a useful property in many contexts. Will leave it for the followup.

144

Not really, just to match the type of the count arg in add_ref/drop_ref ops, but that choice is also arbitrary.

276

That was a typo, it was supposed to be 10 :) Changed to 8 here and below, because that seems like a reasonable upper bound for number of uses for an async value,

ezhulenev marked an inline comment as not done.Nov 5 2020, 3:51 AM
ezhulenev updated this revision to Diff 303077.Nov 5 2020, 3:55 AM

Use IntPositive trait for ref count attr

ftynse added inline comments.Nov 5 2020, 4:43 AM
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669

No they will not be visible. Conversion almost never changes operations in-place. replaceOpWithNewOp and the likes inject a new op, and keep the old op until the conversion completes in case one needs to examine the original op or its operand. The list of the operands to the op being rewritten is formed by combining the results of the new ops if they were rewritten and existing ops if they were not. This is why we pass operands into matchAndRewrite, otherwise it would have been a useless copy of op->getOperands().

rriddle added inline comments.Nov 6 2020, 1:16 PM
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
40

Missing static on all of these?

ezhulenev updated this revision to Diff 303566.Nov 6 2020, 3:13 PM
ezhulenev marked an inline comment as done.

Add static to functions in AsyncRefCounting.cpp

herhut added inline comments.Nov 13 2020, 3:40 AM
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
675

Why not produce the ValueRange in place from the two arguments?

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38

Nit: are.

170

I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had inc_rc and dec_rc explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop.

That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible.

Seems less fragile and would work with existing interfaces for region control flow. It would also allow to pass async values to operations that do not implement the reference counting consumer interface.

ezhulenev added inline comments.Nov 13 2020, 4:02 AM
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170

FWIW Swift SIL has all reference counting explicit (https://github.com/apple/swift/blob/main/docs/ARCOptimization.rst).

There are two types of ref-counted value users:

  1. "forwarding": std.return, function call arg - they do not change the ref count
  2. "consumers" - everything else.

Async automatic ref counting will need to either have a closed set of supported users, or rely in op interfaces to distinguish between user types.

ezhulenev added inline comments.Nov 13 2020, 4:41 AM
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170

And there is also operation like mlirAsyncRuntimeAddTokenToGroup that consumes reference at some indeterminate point in the future, so if IR has drop_ref, then the operation will need to have add_ref to compensate for that or marked as "forwarding" (reference counting responsibility forwarded to the runtime)

ezhulenev edited the summary of this revision. (Show Details)Nov 13 2020, 12:38 PM
ezhulenev removed reviewers: ftynse, aartbik, mehdi_amini, herhut.
silvas added a subscriber: silvas.Nov 13 2020, 6:17 PM
silvas added inline comments.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170

It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment?

I also don't understand "Inside this operation statically known number of uses is 1" - if %cond is false it will be 0.

181

nit: looks like line wrapping here forgot to insert //.Same on the async.drop_ref below.

273

nit: you might want to clarify somwhere that when you say "instances" here, it is "per instance of result's owner".

ezhulenev updated this revision to Diff 305458.Nov 16 2020, 3:34 AM

Use liveness analysis for reference counting

ezhulenev edited the summary of this revision. (Show Details)Nov 16 2020, 3:39 AM
ezhulenev added reviewers: silvas, mehdi_amini, herhut.
ezhulenev updated this revision to Diff 305469.Nov 16 2020, 4:22 AM
ezhulenev marked 10 inline comments as done.

Construct ValueRange directly as an argument to create call

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170

I've pushed a new revision based on liveness analysis and explicit drop_ref instead of implicit "ref consumer".

silvas added inline comments.Nov 17 2020, 9:03 AM
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
192

Why only ExecuteOp? Why not use NumberOfExecutions?

ezhulenev marked an inline comment as done.Nov 17 2020, 9:11 AM
ezhulenev added inline comments.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
192

Because operations after the async.execute can be executed before the operations nested under the async.execute, this is currently the only operation that has this property.

Example:

%token = ...
async.execute {
  async.await %token : !async.token // await #1
  async.yield
}
async.await %token : !async.token // await #2

It is impossible to determine which of the async.await operations will be the "last use" at runtime. Ref counting will pick second await as the last user and will create drop_ref after it, however if first await will be executed later it needs to keep the token alive.

silvas added inline comments.Nov 17 2020, 4:05 PM
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
237

nit: "All values are semantically created"

238

unclear what "owner" means in this context. Is this referring to a runtime construct or IR construct?

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54

should it start with "create" to match the others?

670

rewriter has some helpers to avoid these raw get calls.

674

This should use operands[0] for the converted operands since this is doing a type conversion.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
37

Discuss runtime refcounting ABI conventions for runtime functions in this comment. And conventions for IR functions that accept/return refcounted objects.

48

Add the explanation from your other review comment here justifying the special treatment of async.execute.

55

nit: typo coutned

62

typo: dialect types are

92

explain why not nested blocks (or leave TODO; also, we should probably signalPassFailure if we encounter uses in nested region)

107

typo: in in

121

findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):

for (Operation *user : value.getUsers()) {
  if (user->getParent() == block) {
    userInTheBlock = user;
    break;
  }
}

Also, recommend putting this in a static helper, per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

211

I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing while (user->getRegion() != definingRegion). That would make this code simpler as well.

243

I would prefer to keep such optimizations in a separate pass. Advantages:

  1. Easy to show and test tricky cases of this optimization (the current code requires a level of indirection -- one has to imagine which ops are inserted, and then removed)
  2. When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
  3. Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with with a single walk of each block, applying the optimization to all refcounted Value's in that block at the same time.
  4. Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.

(if you want to omit this optimization from the initial patch, that is fine too).

ezhulenev updated this revision to Diff 306216.Nov 18 2020, 1:58 PM
ezhulenev marked 14 inline comments as done.

Add a separate AsyncRefCountingOptimization pass + address PR comments

ezhulenev added inline comments.Nov 18 2020, 2:02 PM
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
238

Changed the documentation to reflect the new implementation of automatic reference counting.

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54

createTokenFunctionType == function type for createToken function. Renamed to addOrDropRefFunctionType to make it clear that it is for add_ref and drop_ref ops.

674

Yes, also fixed a similar bug below.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
92

Added few lines to explain why ignoring nested regions is ok.

121

findAncestorOpInBlock required to find the last use in the block even if the "real" use is deep inside nested region.

%token = ...
scf.for %i = ... {      <<<----- `scf.for` will be the last user
  async.await %token : !async.token
}
asyn.drop_ref %token.   <<<---- will be added after the last use in the CFG

Cleaned up code a little bit.

243

I moved it to a separate async-ref-counting-optimization pass. It is still not as efficient as it could be, but I added a small preprocessing step + iterate only the blocks that have uses of value.

ezhulenev updated this revision to Diff 306229.Nov 18 2020, 2:39 PM

Fix a bug in ref counting optimization

ezhulenev updated this revision to Diff 306230.Nov 18 2020, 2:52 PM

Break the loop early in user is after dropRef

ezhulenev updated this revision to Diff 306232.Nov 18 2020, 2:54 PM

ValueUser->UserInfo

ezhulenev updated this revision to Diff 306239.Nov 18 2020, 3:14 PM

Mark symbol declaration private

silvas accepted this revision.Nov 19 2020, 6:08 PM

Thanks! This looks great!

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
249

nit: could -> can

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
47

nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".

76

typo: yied

mlir/lib/Dialect/Async/Transforms/AsyncRefCountingOptimization.cpp
40 ↗(On Diff #306239)

suggest putting this helper in include/Dialect/Async/IR/Async.h; it is used in the other file too.

mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
1 ↗(On Diff #306239)

Is it interesting to test async.execute[%token]?

55 ↗(On Diff #306239)

is scf.if essential to this test case? If not, remove it. if so, describe it in the comment.

58 ↗(On Diff #306239)

The input IR here seems strange to me. Will it create a leak if %arg1 == false?

I don't see a test case that produces IR that looks like this in async-ref-counting.mlir. Perhaps it would be good to add.

64 ↗(On Diff #306239)

nit: inconsistency of CHECK: drop_ref vs CHECK: async.drop_ref

mlir/test/Dialect/Async/async-ref-counting.mlir
146

Is there a missing CHECK: async.add_ref %[[TOKEN]] on the line before %token_0 = async.execute and a missing CHECK: async.drop_ref %[[TOKEN_0]] before the return?

(best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there)

mlir/test/Dialect/Async/verify.mlir
25

generally we don't test propreties verified by traits/interfaces.

This revision is now accepted and ready to land.Nov 19 2020, 6:08 PM
ezhulenev updated this revision to Diff 306635.Nov 20 2020, 2:42 AM
ezhulenev marked 11 inline comments as done.

Address PR comments

Thanks for the review!

mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
1 ↗(On Diff #306239)

Added a test, it is indeed quite common pattern with nested async execute operations.

58 ↗(On Diff #306239)

I was not really thinking about ref counting correctness when writing this tests :) Added an explicit note to the test where this property is violated.

mlir/test/Dialect/Async/async-ref-counting.mlir
146

Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to few other tests.