This is an archive of the discontinued LLVM Phabricator instance.

Differential D90716

[mlir] Automatic reference counting for Async values + runtime support for ref counted objects
ClosedPublic

Authored by ezhulenev on Nov 3 2020, 1:57 PM.

Details

Reviewers

ftynse
aartbik
silvas
mehdi_amini
herhut

Commits

rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref…

Summary

Depends On D89963

Automatic reference counting algorithm outline:

ReturnLike operations forward the reference counted values without modifying the reference count.
Use liveness analysis to find blocks in the CFG where the lifetime of reference counted values ends, and insert drop_ref operations after the last use of the value.
Insert add_ref before the async.execute operation capturing the value, and a paired drop_ref before the async body region terminator, to release the captured reference counted value when execution completes.
If the reference counted value is passed only to some of the block successors, insert drop_ref operations at the beginning of the blocks that do not have reference counted value uses.
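
The "drop_ref after the last use" step of the outline can be illustrated with a toy model of a single straight-line block. This is only a sketch under stated assumptions: `ToyOp`, `insertDropRefAfterLastUse` and `demoBlockAfterInsertion` are hypothetical names, the block is a plain vector rather than MLIR IR, and no cross-block liveness analysis is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an operation in a single block: a name plus the values
// it uses. This is NOT the MLIR API, only an illustration.
struct ToyOp {
  std::string name;
  std::vector<std::string> uses;
};

// Inserts a "drop_ref" right after the last operation in `block` that uses
// `value`. Returns false (and leaves the block alone) if there is no use.
bool insertDropRefAfterLastUse(std::vector<ToyOp> &block,
                               const std::string &value) {
  int lastUse = -1;
  for (int i = 0; i < static_cast<int>(block.size()); ++i)
    for (const std::string &use : block[i].uses)
      if (use == value)
        lastUse = i;
  if (lastUse < 0)
    return false;
  block.insert(block.begin() + lastUse + 1, ToyOp{"drop_ref", {value}});
  return true;
}

// Builds a small block, runs the insertion, and returns the op names in order.
std::vector<std::string> demoBlockAfterInsertion() {
  std::vector<ToyOp> block = {
      {"create_token", {}},
      {"await", {"%token"}},
      {"print", {}},
      {"await", {"%token"}},
      {"return", {}},
  };
  insertDropRefAfterLastUse(block, "%token");
  std::vector<std::string> names;
  for (const ToyOp &op : block)
    names.push_back(op.name);
  return names;
}
```

In this model the `drop_ref` lands between the second `await` and the `return`; the real pass additionally has to handle values that stay live across block boundaries.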

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ezhulenev created this revision. Nov 3 2020, 1:57 PM

Herald added a reviewer: ftynse. Nov 3 2020, 1:57 PM

Herald added a reviewer: aartbik.

Herald added a project: Restricted Project.

Herald added subscribers: rdzhabarov, tatianashp, msifontes and 15 others.

ezhulenev requested review of this revision. Nov 3 2020, 1:57 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Nov 3 2020, 1:57 PM

ezhulenev edited the summary of this revision. Nov 3 2020, 2:02 PM

ezhulenev added reviewers: mehdi_amini, herhut.

Harbormaster completed remote builds in B77466: Diff 302681. Nov 3 2020, 2:12 PM

ftynse added inline comments. Nov 5 2020, 2:11 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669	You probably want to take the operand from `operands` rather than from the op directly in case it was modified by another pattern. `AddRefOpAdaptor` is an autogenerated class that is constructible from `ArrayRef<Value>` and provides an API similar to the Op it models, i.e. you can call `adaptor.operand()`.
681	Could we do something like
	template <typename OpTy>
	class RefToCallLoweringPattern : public OpConversionPattern<OpTy> {
	  RefToCallLoweringPattern(MLIRContext *ctx, StringRef funcName)
	      : OpConversionPattern<OpTy>(ctx), funcName(funcName) {}
	  matchAndRewrite(...) {
	    ...
	    rewriter.replaceOpWithNewOp<CallOp>(op, Type(), funcName, ValueRange(args));
	  }
	};
	and remove duplicate code?
910	I would recommend to make ConstantOp legal, not the whole StandardDialect, which has lots of different things.
mlir/lib/Dialect/Async/IR/Async.cpp
349–350 ↗	(On Diff #302681)	Just declare it as `IntNonNegative` in ODS.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38	MLIR uses `///` for top-level comments.
65	Out of scope: I am interested in seeing this as a generic OpInterface; just yesterday the need for this popped up in another discussion.
144	Any particular reason for using 32bit integers for refcount? In this struct, it may not even save space because the compiler will insert padding.
276	19 looks very unconventional. We usually try to estimate what would be the common "small" number of entries and round it up to a power of two.
mlir/lib/ExecutionEngine/AsyncRuntime.cpp
94	please fix
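
The padding remark on line 144 above can be checked with a small layout experiment. The two structs below are hypothetical (they are not the struct from the patch), and the outcome depends on the target ABI; on typical 64-bit targets both are likely to occupy the same number of bytes because the 32-bit counter is followed by padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: a pointer followed by a 32-bit counter.
struct WithInt32Count {
  void *object;
  int32_t count;
};

// Same layout with a 64-bit counter.
struct WithInt64Count {
  void *object;
  int64_t count;
};

// Returns true when the narrower counter does not make the struct smaller
// on the current target (for example because of trailing padding).
bool narrowCounterSavesNothing() {
  return sizeof(WithInt32Count) == sizeof(WithInt64Count);
}
```

Whether the function returns true is target dependent, so the sketch deliberately reports the result instead of asserting a fixed size.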

Remove code duplication in op lowering + fix style guide violations

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669	Wouldn't the changes also be visible through the op? From the autogenerated code it seems that they are identical:
	::mlir::Value AddRefOpAdaptor::operand() {
	  return *getODSOperands(0).begin();
	}
	vs
	::mlir::Operation::operand_range AddRefOp::getODSOperands(unsigned index) {
	  auto valueRange = getODSOperandIndexAndLength(index);
	  return {std::next(getOperation()->operand_begin(), valueRange.first),
	          std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
	}
	::mlir::Value AddRefOp::operand() {
	  return *getODSOperands(0).begin();
	}
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
65	Yeah, seems like a useful property in many contexts. Will leave it for the followup.
144	Not really, just to match the type of the `count` arg in add_ref/drop_ref ops, but that choice is also arbitrary.
276	That was a typo, it was supposed to be 10 :) Changed to 8 here and below, because that seems like a reasonable upper bound for the number of uses of an async value.

ezhulenev marked an inline comment as not done. Nov 5 2020, 3:51 AM

Use IntPositive trait for ref count attr

Harbormaster completed remote builds in B77681: Diff 303077. Nov 5 2020, 4:11 AM

Harbormaster completed remote builds in B77679: Diff 303074. Nov 5 2020, 4:15 AM

ftynse added inline comments. Nov 5 2020, 4:43 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
669	No they will not be visible. Conversion almost never changes operations in-place. `replaceOpWithNewOp` and the likes inject a new op, and keep the old op until the conversion completes in case one needs to examine the original op or its operand. The list of the operands to the op being rewritten is formed by combining the results of the new ops if they were rewritten and existing ops if they were not. This is why we pass `operands` into `matchAndRewrite`, otherwise it would have been a useless copy of `op->getOperands()`.
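
The distinction between the original op's operands and the `operands` passed to `matchAndRewrite` can be pictured with a toy model. Everything below is hypothetical and deliberately avoids the MLIR API: `ToyOp`, `ValueMapping` and `remapOperands` only mimic the idea that the original op stays unchanged while the pattern receives a remapped operand list.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy op: only records the names of the values it uses. Not the MLIR API.
struct ToyOp {
  std::vector<std::string> operands;
};

// Toy rewriter state: maps an original value to its replacement. The original
// ops are left untouched until the (imaginary) conversion finishes.
using ValueMapping = std::map<std::string, std::string>;

// Builds the operand list a pattern would receive: the replacement if the
// value was rewritten, the original value otherwise.
std::vector<std::string> remapOperands(const ToyOp &op,
                                       const ValueMapping &mapping) {
  std::vector<std::string> remapped;
  for (const std::string &operand : op.operands) {
    auto it = mapping.find(operand);
    remapped.push_back(it == mapping.end() ? operand : it->second);
  }
  return remapped;
}
```

Reading `op.operands` in this model still yields the old value name, while `remapOperands` yields the converted one, which is the reason given above for preferring the adaptor.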

rriddle added inline comments. Nov 6 2020, 1:16 PM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
40	Missing static on all of these?

rriddle mentioned this in D90922: [mlir] Add NumberOfExecutions analysis + update RegionBranchOpInterface interface to query number of region invocations. Nov 6 2020, 1:18 PM

Add static to functions in AsyncRefCounting.cpp

Harbormaster completed remote builds in B77948: Diff 303566. Nov 6 2020, 4:25 PM

ezhulenev mentioned this in D89963: [mlir] Transform scf.parallel to scf.for + async.execute. Nov 13 2020, 3:11 AM

herhut added inline comments. Nov 13 2020, 3:40 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
675	Why not produce the `ValueRange` in place from the two arguments?
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
37	Nit: are.
169	I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had `inc_rc` and `dec_rc` explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop. That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible. Seems less fragile and would work with existing interfaces for region control flow. It would also allow to pass async values to operations that do not implement the reference counting consumer interface.
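
The hoisting argument above can be made concrete with a toy counter that records how many reference count operations run. This is a sketch, not the runtime's type: `ToyRefCounted`, `naiveLoop` and `hoistedLoop` are hypothetical, and the transformation is only assumed to be valid when nothing inside the loop consumes a reference.

```cpp
#include <cassert>
#include <cstdint>

// Toy reference-counted object that also counts how many refcount
// operations were performed on it. Not the runtime's actual type.
struct ToyRefCounted {
  int64_t refCount = 1;
  int64_t numRefCountOps = 0;
  void addRef(int64_t n = 1) { refCount += n; ++numRefCountOps; }
  void dropRef(int64_t n = 1) { refCount -= n; ++numRefCountOps; }
};

// Naive placement: one add_ref/drop_ref pair per loop iteration.
void naiveLoop(ToyRefCounted &obj, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    obj.addRef();
    // ... use obj ...
    obj.dropRef();
  }
}

// After hoisting the increment and sinking the decrement out of the loop.
void hoistedLoop(ToyRefCounted &obj, int iterations) {
  obj.addRef();
  for (int i = 0; i < iterations; ++i) {
    // ... use obj ...
  }
  obj.dropRef();
}
```

Both variants leave the reference count unchanged; the hoisted one performs two refcount operations regardless of the trip count instead of two per iteration.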

ezhulenev added inline comments. Nov 13 2020, 4:02 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
169	FWIW Swift SIL has all reference counting explicit (https://github.com/apple/swift/blob/main/docs/ARCOptimization.rst). There are two types of ref-counted value users:
	- "forwarding": std.return, function call arg; they do not change the ref count
	- "consumers": everything else
	Async automatic ref counting will need to either have a closed set of supported users, or rely on op interfaces to distinguish between user types.

ezhulenev added inline comments. Nov 13 2020, 4:41 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
169	There is also an operation like `mlirAsyncRuntimeAddTokenToGroup` that consumes a reference at some indeterminate point in the future, so if the IR has `drop_ref`, the operation will need an `add_ref` to compensate for that, or it must be marked as "forwarding" (reference counting responsibility forwarded to the runtime).

ezhulenev edited the summary of this revision. Nov 13 2020, 12:38 PM

ezhulenev removed reviewers: ftynse, aartbik, mehdi_amini, herhut.

Herald added a reviewer: ftynse. Nov 13 2020, 12:38 PM

Herald added a reviewer: aartbik.

silvas added a subscriber: silvas. Nov 13 2020, 6:17 PM

silvas added inline comments.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
169	It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment? I also don't understand "Inside this operation statically known number of uses is 1" - if %cond is false it will be 0.
180	nit: looks like line wrapping here forgot to insert `//`. Same on the async.drop_ref below.
272	nit: you might want to clarify somewhere that when you say "instances" here, it is "per instance of `result`'s owner".

Use liveness analysis for reference counting

Herald added a subscriber: teijeong. Nov 16 2020, 3:35 AM

ezhulenev edited the summary of this revision. Nov 16 2020, 3:39 AM

ezhulenev added reviewers: silvas, mehdi_amini, herhut.

Harbormaster completed remote builds in B78943: Diff 305458. Nov 16 2020, 3:49 AM

Construct ValueRange directly as an argument to create call

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
169	I've pushed a new revision based on liveness analysis and explicit `drop_ref` instead of implicit "ref consumer".

Harbormaster completed remote builds in B78947: Diff 305469. Nov 16 2020, 4:44 AM

silvas added inline comments. Nov 17 2020, 9:03 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
193	Why only ExecuteOp? Why not use NumberOfExecutions?

ezhulenev marked an inline comment as done. Nov 17 2020, 9:11 AM

ezhulenev added inline comments.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
193	Because operations after the `async.execute` can be executed before the operations nested under the `async.execute`; this is currently the only operation that has this property. Example:
	%token = ...
	async.execute {
	  async.await %token : !async.token // await #1
	  async.yield
	}
	async.await %token : !async.token // await #2
	It is impossible to determine which of the `async.await` operations will be the "last use" at runtime. Ref counting will pick the second await as the last user and will create a `drop_ref` after it; however, if the first await executes later, it needs to keep the `token` alive.
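
The ordering problem in the example above can be simulated without threads by queueing the `async.execute` body and running it later. This is a toy sketch under stated assumptions: `ToyToken`, `TaskQueue`, `runAll` and `tokenSurvivesUntilAsyncBodyRuns` are hypothetical names, and the deferred queue merely stands in for the real runtime's scheduling.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Toy token: marked destroyed when the reference count reaches zero.
struct ToyToken {
  int refCount = 1; // created with a reference count of +1
  bool destroyed = false;
  void addRef() { assert(!destroyed); ++refCount; }
  void dropRef() {
    assert(!destroyed && refCount > 0);
    if (--refCount == 0)
      destroyed = true;
  }
  void await() const { assert(!destroyed && "use after release"); }
};

// Toy executor: "async.execute" bodies are queued and run at a later point.
using TaskQueue = std::queue<std::function<void()>>;

void runAll(TaskQueue &queue) {
  while (!queue.empty()) {
    queue.front()();
    queue.pop();
  }
}

// Mirrors the IR from the comment: the body of async.execute runs after the
// code that follows it. Returns true if the token is still alive once the
// caller has dropped its own reference.
bool tokenSurvivesUntilAsyncBodyRuns(ToyToken &token, TaskQueue &queue) {
  token.addRef(); // add_ref before async.execute captures the token
  queue.push([&token] {
    token.await();   // await #1, executed later
    token.dropRef(); // drop_ref before the body terminator
  });
  token.await();   // await #2
  token.dropRef(); // drop_ref after the last use in the CFG
  return !token.destroyed;
}
```

Without the `add_ref` before the capture, the caller's `drop_ref` would release the token while await #1 is still pending, which is the situation the special handling of `async.execute` is meant to prevent.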

silvas added inline comments. Nov 17 2020, 4:05 PM

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
234	nit: "All values are semantically created"
235	unclear what "owner" means in this context. Is this referring to a runtime construct or IR construct?
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54	should it start with "create" to match the others?
670	rewriter has some helpers to avoid these raw `get` calls.
674	This should use `operands[0]` for the converted operands since this is doing a type conversion.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38	Discuss runtime refcounting ABI conventions for runtime functions in this comment. And conventions for IR functions that accept/return refcounted objects.
49	Add the explanation from your other review comment here justifying the special treatment of async.execute.
56	nit: typo coutned
63	typo: dialect types are
93	explain why not nested blocks (or leave TODO; also, we should probably signalPassFailure if we encounter uses in nested region)
108	typo: in in
122	findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):
	for (Operation *user : value.getUsers()) {
	  if (user->getParent() == block) {
	    userInTheBlock = user;
	    break;
	  }
	}
	Also, recommend putting this in a static helper, per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
212	I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing `while (user->getRegion() != definingRegion)`. That would make this code simpler as well.
244	I would prefer to keep such optimizations in a separate pass. Advantages:
	- Easy to show and test tricky cases of this optimization (the current code requires a level of indirection: one has to imagine which ops are inserted, and then removed).
	- When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
	- Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with a single walk of each block, applying the optimization to all refcounted Value's in that block at the same time.
	- Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.
	(If you want to omit this optimization from the initial patch, that is fine too.)

Add a separate AsyncRefCountingOptimization pass + address PR comments

Herald added a subscriber: mgrang. Nov 18 2020, 1:58 PM

ezhulenev added inline comments. Nov 18 2020, 2:02 PM

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
235	Changed the documentation to reflect the new implementation of automatic reference counting.
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54	`createTokenFunctionType` == function type for `createToken` function. Renamed to `addOrDropRefFunctionType` to make it clear that it is for `add_ref` and `drop_ref` ops.
674	Yes, also fixed a similar bug below.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
93	Added a few lines to explain why ignoring nested regions is ok.
122	`findAncestorOpInBlock` is required to find the last use in the block even if the "real" use is deep inside a nested region.
	%token = ...
	scf.for %i = ... {                     <<<----- `scf.for` will be the last user
	  async.await %token : !async.token
	}
	async.drop_ref %token                  <<<----- will be added after the last use in the CFG
	Cleaned up the code a little bit.
244	I moved it to a separate `async-ref-counting-optimization` pass. It is still not as efficient as it could be, but I added a small preprocessing step + iterate only the blocks that have uses of `value`.
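
The role of `findAncestorOpInBlock` in the reply above can be sketched with a toy op/block hierarchy. The types below are hypothetical stand-ins rather than the MLIR classes, and the helper only mimics the idea of walking up parent operations until one lives directly in the block of interest.

```cpp
#include <cassert>

struct ToyBlock;

// Toy operation: knows the block that contains it. Not the MLIR classes.
struct ToyOp {
  const char *name;
  ToyBlock *block; // block this op lives in
};

// Toy block: knows the op whose region contains it (null for the top level).
struct ToyBlock {
  ToyOp *parentOp;
};

// Walks up from `op` until reaching an operation that lives directly in
// `block`. Returns nullptr if `op` is not nested under `block` at all.
ToyOp *findAncestorOpInBlock(ToyOp *op, ToyBlock *block) {
  while (op && op->block != block)
    op = op->block ? op->block->parentOp : nullptr;
  return op;
}
```

For an `async.await` nested in the body of an `scf.for`, the helper returns the `scf.for` when asked about the enclosing block, so a `drop_ref` can be placed after the loop rather than inside it.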

Harbormaster completed remote builds in B79359: Diff 306216. Nov 18 2020, 2:17 PM

Fix a bug in ref counting optimization

Break the loop early if user is after dropRef

ValueUser->UserInfo

Harbormaster completed remote builds in B79368: Diff 306229. Nov 18 2020, 2:59 PM

Harbormaster completed remote builds in B79369: Diff 306230. Nov 18 2020, 3:08 PM

Harbormaster completed remote builds in B79370: Diff 306232. Nov 18 2020, 3:11 PM

Mark symbol declaration private

Harbormaster completed remote builds in B79372: Diff 306239. Nov 18 2020, 3:31 PM

Thanks! This looks great!

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
246	nit: could -> can
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
47	nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".
76	typo: yied
mlir/lib/Dialect/Async/Transforms/AsyncRefCountingOptimization.cpp
40 ↗	(On Diff #306239)	suggest putting this helper in include/Dialect/Async/IR/Async.h; it is used in the other file too.
mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
1 ↗	(On Diff #306239)	Is it interesting to test `async.execute[%token]`?
55 ↗	(On Diff #306239)	is scf.if essential to this test case? If not, remove it. if so, describe it in the comment.
58 ↗	(On Diff #306239)	The input IR here seems strange to me. Will it create a leak if `%arg1 == false`? I don't see a test case that produces IR that looks like this in async-ref-counting.mlir. Perhaps it would be good to add.
64 ↗	(On Diff #306239)	nit: inconsistency of `CHECK: drop_ref` vs `CHECK: async.drop_ref`
mlir/test/Dialect/Async/async-ref-counting.mlir
146	Is there a missing `CHECK: async.add_ref %[[TOKEN]]` on the line before `%token_0 = async.execute` and a missing `CHECK: async.drop_ref %[[TOKEN_0]]` before the return? (best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there)
mlir/test/Dialect/Async/verify.mlir
25	generally we don't test properties verified by traits/interfaces.

This revision is now accepted and ready to land. Nov 19 2020, 6:08 PM

Address PR comments

Thanks for the review!

mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
1 ↗	(On Diff #306239)	Added a test; it is indeed quite a common pattern with nested async execute operations.
58 ↗	(On Diff #306239)	I was not really thinking about ref counting correctness when writing these tests :) Added an explicit note to the test where this property is violated.
mlir/test/Dialect/Async/async-ref-counting.mlir
146	Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to a few other tests.

Harbormaster completed remote builds in B79584: Diff 306635. Nov 20 2020, 2:57 AM

Closed by commit rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref… (authored by ezhulenev). Nov 20 2020, 3:08 AM

This revision was automatically updated to reflect the committed changes.

ezhulenev added a commit: rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref….

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Async/IR/AsyncBase.td	4 lines
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td	46 lines
mlir/include/mlir/Dialect/Async/Passes.h	2 lines
mlir/include/mlir/Dialect/Async/Passes.td	6 lines
mlir/include/mlir/ExecutionEngine/AsyncRuntime.h	12 lines
mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-1d.mlir	1 line
mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-2d.mlir	1 line
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp	67 lines
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp	360 lines
mlir/lib/Dialect/Async/Transforms/CMakeLists.txt	1 line
mlir/lib/ExecutionEngine/AsyncRuntime.cpp	148 lines
mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir	15 lines
mlir/test/Dialect/Async/async-ref-counting.mlir	181 lines
mlir/test/Dialect/Async/ops.mlir	14 lines
mlir/test/Dialect/Async/verify.mlir	14 lines
mlir/test/mlir-cpu-runner/async-group.mlir	3 lines

Diff 303566

mlir/include/mlir/Dialect/Async/IR/AsyncBase.td

	Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines

	def Async_AnyValueType : DialectType<AsyncDialect,			def Async_AnyValueType : DialectType<AsyncDialect,
	CPred<"$_self.isa<::mlir::async::ValueType>()">,			CPred<"$_self.isa<::mlir::async::ValueType>()">,
	"async value type">;			"async value type">;

	def Async_AnyValueOrTokenType : AnyTypeOf<[Async_AnyValueType,			def Async_AnyValueOrTokenType : AnyTypeOf<[Async_AnyValueType,
	Async_TokenType]>;			Async_TokenType]>;

				def Async_AnyAsyncType : AnyTypeOf<[Async_AnyValueType,
				Async_TokenType,
				Async_GroupType]>;

	#endif // ASYNC_BASE_TD			#endif // ASYNC_BASE_TD

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td

Show First 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	def Async_AwaitAllOp : Async_Op<"await_all", []> {
}];		}];

let arguments = (ins Async_GroupType:$operand);		let arguments = (ins Async_GroupType:$operand);
let results = (outs);		let results = (outs);

let assemblyFormat = "$operand attr-dict";		let assemblyFormat = "$operand attr-dict";
}		}

		//===----------------------------------------------------------------------===//
		// Async Dialect Automatic Reference Counting Operations.
		//===----------------------------------------------------------------------===//

		// All async values (values, tokens, groups) are reference counted at runtime
		// and automatically destructed when reference count drops to 0.
		//
		// All values semantically created with a reference count of +1 and it is
		// the responsibility of the async value owner to add/drop reference count
		// based on the number of uses.
		//
		// See `AsyncRefCountingPass` for the automatic reference counting
		// implementation details.

		def Async_AddRefOp : Async_Op<"add_ref"> {
		let summary = "adds a reference to async value";
		let description = [{
		The `async.add_ref` operation adds a reference(s) to async value (token,
		value or group).
		}];

		let arguments = (ins Async_AnyAsyncType:$operand,
		Confined<I32Attr, [IntPositive]>:$count);
		let results = (outs );

		let assemblyFormat = [{
		$operand attr-dict `:` type($operand)
		}];
		}

		def Async_DropRefOp : Async_Op<"drop_ref"> {
		let summary = "drops a reference to async value";
		let description = [{
		The `async.drop_ref` operation drops a reference(s) to async value (token,
		value or group).
		}];

		let arguments = (ins Async_AnyAsyncType:$operand,
		Confined<I32Attr, [IntPositive]>:$count);
		let results = (outs );

		let assemblyFormat = [{
		$operand attr-dict `:` type($operand)
		}];
		}

#endif // ASYNC_OPS		#endif // ASYNC_OPS

mlir/include/mlir/Dialect/Async/Passes.h

	Show All 13 Lines
	#define MLIR_DIALECT_ASYNC_PASSES_H_			#define MLIR_DIALECT_ASYNC_PASSES_H_

	#include "mlir/Pass/Pass.h"			#include "mlir/Pass/Pass.h"

	namespace mlir {			namespace mlir {

	std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass();			std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass();

				std::unique_ptr<OperationPass<FuncOp>> createAsyncRefCountingPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/Async/Passes.h.inc"			#include "mlir/Dialect/Async/Passes.h.inc"

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_ASYNC_PASSES_H_			#endif // MLIR_DIALECT_ASYNC_PASSES_H_

mlir/include/mlir/Dialect/Async/Passes.td

Show All 18 Lines	let options = [
Option<"numConcurrentAsyncExecute", "num-concurrent-async-execute",		Option<"numConcurrentAsyncExecute", "num-concurrent-async-execute",
"int32_t", /*default=*/"4",		"int32_t", /*default=*/"4",
"The number of async.execute operations that will be used for concurrent "		"The number of async.execute operations that will be used for concurrent "
"loop execution.">		"loop execution.">
];		];
let dependentDialects = ["async::AsyncDialect", "scf::SCFDialect"];		let dependentDialects = ["async::AsyncDialect", "scf::SCFDialect"];
}		}

		def AsyncRefCounting : FunctionPass<"async-ref-counting"> {
		let summary = "Automatic reference counting for Async dialect data types";
		let constructor = "mlir::createAsyncRefCountingPass()";
		let dependentDialects = ["async::AsyncDialect"];
		}

#endif // MLIR_DIALECT_ASYNC_PASSES		#endif // MLIR_DIALECT_ASYNC_PASSES

mlir/include/mlir/ExecutionEngine/AsyncRuntime.h

	Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
	typedef struct AsyncGroup MLIR_AsyncGroup;			typedef struct AsyncGroup MLIR_AsyncGroup;

	// Async runtime uses LLVM coroutines to represent asynchronous tasks. Task			// Async runtime uses LLVM coroutines to represent asynchronous tasks. Task
	// function is a coroutine handle and a resume function that continue coroutine			// function is a coroutine handle and a resume function that continue coroutine
	// execution from a suspension point.			// execution from a suspension point.
	using CoroHandle = void *; // coroutine handle			using CoroHandle = void *; // coroutine handle
	using CoroResume = void (*)(void *); // coroutine resume function			using CoroResume = void (*)(void *); // coroutine resume function

				// Async runtime uses reference counting to manage the lifetime of async values
				// (values of async types like tokens, values and groups).
				using RefCountedObjPtr = void *;

				// Adds references to reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeAddRef(RefCountedObjPtr, int32_t);

				// Drops references from reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeDropRef(RefCountedObjPtr, int32_t);

	// Create a new `async.token` in not-ready state.			// Create a new `async.token` in not-ready state.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncToken *mlirAsyncRuntimeCreateToken();			extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncToken *mlirAsyncRuntimeCreateToken();

	// Create a new `async.group` in empty state.			// Create a new `async.group` in empty state.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup();			extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup();

	extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t			extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t
	mlirAsyncRuntimeAddTokenToGroup(AsyncToken *, AsyncGroup *);			mlirAsyncRuntimeAddTokenToGroup(AsyncToken *, AsyncGroup *);
	Show All 35 Lines
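
The `mlirAsyncRuntimeAddRef` and `mlirAsyncRuntimeDropRef` declarations above take an opaque pointer and a 32-bit count. A runtime object supporting that contract might look roughly like the sketch below. It is an assumption-laden illustration, not the code from `AsyncRuntime.cpp`: `ToyRefCounted` and `ToyToken` are hypothetical names, and the object is assumed to be heap-allocated with `new`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical base class for reference-counted runtime objects. Objects are
// created with a reference count of +1 and delete themselves at zero.
class ToyRefCounted {
public:
  explicit ToyRefCounted(int32_t initial = 1) : refCount(initial) {}
  virtual ~ToyRefCounted() { assert(refCount.load() == 0); }

  // Adds `count` references.
  void addRef(int32_t count = 1) { refCount.fetch_add(count); }

  // Drops `count` references and destroys the object when none remain.
  void dropRef(int32_t count = 1) {
    int32_t previous = refCount.fetch_sub(count);
    assert(previous >= count && "reference count underflow");
    if (previous == count)
      delete this;
  }

private:
  std::atomic<int32_t> refCount;
};

// Hypothetical token type that reports its destruction through a flag, so
// the lifetime can be observed from the outside.
struct ToyToken : ToyRefCounted {
  explicit ToyToken(bool *destroyedFlag) : destroyedFlag(destroyedFlag) {}
  ~ToyToken() override { *destroyedFlag = true; }
  bool *destroyedFlag;
};
```

Taking a count argument lets a caller add or drop several references with one atomic operation, which matches the `count` attribute on `async.add_ref` and `async.drop_ref`.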

mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-1d.mlir

	// RUN: mlir-opt %s -async-parallel-for \			// RUN: mlir-opt %s -async-parallel-for \
				// RUN: -async-ref-counting \
	// RUN: -convert-async-to-llvm \			// RUN: -convert-async-to-llvm \
	// RUN: -convert-scf-to-std \			// RUN: -convert-scf-to-std \
	// RUN: -convert-std-to-llvm \			// RUN: -convert-std-to-llvm \
	// RUN: \| mlir-cpu-runner \			// RUN: \| mlir-cpu-runner \
	// RUN: -e entry -entry-point-result=void -O0 \			// RUN: -e entry -entry-point-result=void -O0 \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\
	// RUN: \| FileCheck %s --dump-input=always			// RUN: \| FileCheck %s --dump-input=always
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-2d.mlir

	// RUN: mlir-opt %s -async-parallel-for \			// RUN: mlir-opt %s -async-parallel-for \
				// RUN: -async-ref-counting \
	// RUN: -convert-async-to-llvm \			// RUN: -convert-async-to-llvm \
	// RUN: -convert-scf-to-std \			// RUN: -convert-scf-to-std \
	// RUN: -convert-std-to-llvm \			// RUN: -convert-std-to-llvm \
	// RUN: \| mlir-cpu-runner \			// RUN: \| mlir-cpu-runner \
	// RUN: -e entry -entry-point-result=void -O0 \			// RUN: -e entry -entry-point-result=void -O0 \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\
	// RUN: \| FileCheck %s --dump-input=always			// RUN: \| FileCheck %s --dump-input=always
	▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp

Show All 27 Lines

// Prefix for functions outlined from `async.execute` op regions.		// Prefix for functions outlined from `async.execute` op regions.
static constexpr const char kAsyncFnPrefix[] = "async_execute_fn";		static constexpr const char kAsyncFnPrefix[] = "async_execute_fn";

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Async Runtime C API declaration.		// Async Runtime C API declaration.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		static constexpr const char *kAddRef = "mlirAsyncRuntimeAddRef";
		static constexpr const char *kDropRef = "mlirAsyncRuntimeDropRef";
static constexpr const char *kCreateToken = "mlirAsyncRuntimeCreateToken";		static constexpr const char *kCreateToken = "mlirAsyncRuntimeCreateToken";
static constexpr const char *kCreateGroup = "mlirAsyncRuntimeCreateGroup";		static constexpr const char *kCreateGroup = "mlirAsyncRuntimeCreateGroup";
static constexpr const char *kEmplaceToken = "mlirAsyncRuntimeEmplaceToken";		static constexpr const char *kEmplaceToken = "mlirAsyncRuntimeEmplaceToken";
static constexpr const char *kAwaitToken = "mlirAsyncRuntimeAwaitToken";		static constexpr const char *kAwaitToken = "mlirAsyncRuntimeAwaitToken";
static constexpr const char *kAwaitGroup = "mlirAsyncRuntimeAwaitAllInGroup";		static constexpr const char *kAwaitGroup = "mlirAsyncRuntimeAwaitAllInGroup";
static constexpr const char *kExecute = "mlirAsyncRuntimeExecute";		static constexpr const char *kExecute = "mlirAsyncRuntimeExecute";
static constexpr const char *kAddTokenToGroup =		static constexpr const char *kAddTokenToGroup =
"mlirAsyncRuntimeAddTokenToGroup";		"mlirAsyncRuntimeAddTokenToGroup";
static constexpr const char *kAwaitAndExecute =		static constexpr const char *kAwaitAndExecute =
"mlirAsyncRuntimeAwaitTokenAndExecute";		"mlirAsyncRuntimeAwaitTokenAndExecute";
static constexpr const char *kAwaitAllAndExecute =		static constexpr const char *kAwaitAllAndExecute =
"mlirAsyncRuntimeAwaitAllInGroupAndExecute";		"mlirAsyncRuntimeAwaitAllInGroupAndExecute";

namespace {		namespace {
// Async Runtime API function types.		// Async Runtime API function types.
struct AsyncAPI {		struct AsyncAPI {
		static FunctionType refCountingFunctionType(MLIRContext *ctx) {
		auto ref = LLVM::LLVMType::getInt8PtrTy(ctx);
		auto count = IntegerType::get(32, ctx);
		return FunctionType::get({ref, count}, {}, ctx);
		}

static FunctionType createTokenFunctionType(MLIRContext *ctx) {		static FunctionType createTokenFunctionType(MLIRContext *ctx) {
return FunctionType::get({}, {TokenType::get(ctx)}, ctx);		return FunctionType::get({}, {TokenType::get(ctx)}, ctx);
}		}

static FunctionType createGroupFunctionType(MLIRContext *ctx) {		static FunctionType createGroupFunctionType(MLIRContext *ctx) {
return FunctionType::get({}, {GroupType::get(ctx)}, ctx);		return FunctionType::get({}, {GroupType::get(ctx)}, ctx);
}		}

(44 lines not shown)

// Adds Async Runtime C API declarations to the module.		// Adds Async Runtime C API declarations to the module.
static void addAsyncRuntimeApiDeclarations(ModuleOp module) {		static void addAsyncRuntimeApiDeclarations(ModuleOp module) {
auto builder = OpBuilder::atBlockTerminator(module.getBody());		auto builder = OpBuilder::atBlockTerminator(module.getBody());

MLIRContext *ctx = module.getContext();		MLIRContext *ctx = module.getContext();
Location loc = module.getLoc();		Location loc = module.getLoc();

		if (!module.lookupSymbol(kAddRef))
		builder.create<FuncOp>(loc, kAddRef,
		AsyncAPI::refCountingFunctionType(ctx));

		if (!module.lookupSymbol(kDropRef))
		builder.create<FuncOp>(loc, kDropRef,
		AsyncAPI::refCountingFunctionType(ctx));

if (!module.lookupSymbol(kCreateToken))		if (!module.lookupSymbol(kCreateToken))
builder.create<FuncOp>(loc, kCreateToken,		builder.create<FuncOp>(loc, kCreateToken,
AsyncAPI::createTokenFunctionType(ctx));		AsyncAPI::createTokenFunctionType(ctx));

if (!module.lookupSymbol(kCreateGroup))		if (!module.lookupSymbol(kCreateGroup))
builder.create<FuncOp>(loc, kCreateGroup,		builder.create<FuncOp>(loc, kCreateGroup,
AsyncAPI::createGroupFunctionType(ctx));		AsyncAPI::createGroupFunctionType(ctx));

(507 lines not shown)
rewriter.replaceOpWithNewOp<CallOp>(op, resultTypes, call.callee(),		rewriter.replaceOpWithNewOp<CallOp>(op, resultTypes, call.callee(),
call.getOperands());		call.getOperands());

return success();		return success();
}		}
};		};
} // namespace		} // namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// Async reference counting ops lowering (`async.add_ref` and `async.drop_ref`
		// to the corresponding API calls).
		//===----------------------------------------------------------------------===//

		namespace {

		template <typename RefCountingOp>
		class RefCountingOpLowering : public ConversionPattern {
		public:
		explicit RefCountingOpLowering(MLIRContext *ctx, StringRef apiFunctionName)
		: ConversionPattern(RefCountingOp::getOperationName(), 1, ctx),
		apiFunctionName(apiFunctionName) {}

		LogicalResult
		matchAndRewrite(Operation *op, ArrayRef<Value> operands,
		ConversionPatternRewriter &rewriter) const override {
		RefCountingOp refCountingOp = cast<RefCountingOp>(op);

		auto i32 = IntegerType::get(32, op->getContext());
		ftynse: You probably want to take the operand from `operands` rather than from the op directly, in case it was modified by another pattern. `AddRefOpAdaptor` is an autogenerated class that is constructible from `ArrayRef<Value>` and provides an API similar to the Op it models, i.e. you can call `adaptor.operand()`.
		ezhulenev: Wouldn't the changes also be visible through the op? From the autogenerated code it seems that they are identical:
		    ::mlir::Value AddRefOpAdaptor::operand() {
		      return *getODSOperands(0).begin();
		    }
		vs
		    ::mlir::Operation::operand_range AddRefOp::getODSOperands(unsigned index) {
		      auto valueRange = getODSOperandIndexAndLength(index);
		      return {std::next(getOperation()->operand_begin(), valueRange.first),
		              std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
		    }
		    ::mlir::Value AddRefOp::operand() {
		      return *getODSOperands(0).begin();
		    }
		ftynse: No, they will not be visible. Conversion almost never changes operations in place. `replaceOpWithNewOp` and the like inject a new op and keep the old op until the conversion completes, in case one needs to examine the original op or its operands. The list of operands for the op being rewritten is formed by combining the results of the new ops (where the producers were rewritten) with the existing values (where they were not). This is why we pass `operands` into `matchAndRewrite`; otherwise it would have been a useless copy of `op->getOperands()`.
		auto count = IntegerAttr::get(i32, refCountingOp.count());
		silvas: rewriter has some helpers to avoid these raw `get` calls.
		auto countCst = rewriter.create<ConstantOp>(op->getLoc(), i32, count);

		SmallVector<Value, 2> args = {refCountingOp.operand(), countCst};
		rewriter.replaceOpWithNewOp<CallOp>(op, Type(), apiFunctionName,
		silvas: This should use `operands[0]` for the converted operands since this is doing a type conversion.
		ezhulenev: Yes, also fixed a similar bug below.
		ValueRange(args));
		herhut: Why not produce the `ValueRange` in place from the two arguments?
		return success();
		}

		private:
		StringRef apiFunctionName;
		};
		ftynse: Could we do something like

		    template <typename OpTy>
		    class RefToCallLoweringPattern : public OpConversionPattern<OpTy> {
		      RefToCallLoweringPattern(MLIRContext *ctx, StringRef funcName)
		          : OpConversionPattern<OpTy>(ctx), funcName(funcName) {}
		      matchAndRewrite(...) {
		        ...
		        rewriter.replaceOpWithNewOp<CallOp>(op, Type(), funcName, ValueRange(args));
		      }
		    };

		and remove duplicate code?
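
The inline discussion above about `operands` versus `op->getOperands()` can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyValue`, `ToyOp`, `ToyMapping`, and `remappedOperands` are hypothetical names invented here and are not MLIR APIs. The model assumes the conversion driver keeps the original op untouched and records replacements in a side mapping, as described in the thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy stand-in for an SSA value: just its name.
using ToyValue = std::string;

// Hypothetical toy op: holds the operands exactly as written in the input IR.
struct ToyOp {
  std::vector<ToyValue> operands;
};

// Models the driver's side table from replaced values to their replacements.
using ToyMapping = std::map<ToyValue, ToyValue>;

// Builds the operand list a pattern would receive: the replacement where one
// exists, otherwise the original value. The op itself is never modified.
std::vector<ToyValue> remappedOperands(const ToyOp &op,
                                       const ToyMapping &mapping) {
  std::vector<ToyValue> result;
  for (const ToyValue &value : op.operands) {
    ToyMapping::const_iterator it = mapping.find(value);
    result.push_back(it == mapping.end() ? value : it->second);
  }
  return result;
}
```

In this model, reading the operand from the op returns the stale, unconverted value, while the remapped list returns the converted one. That is the reason given in the review for using `operands[0]` in the lowering.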

		// async.add_ref op lowering to mlirAsyncRuntimeAddRef function call.
		class AddRefOpLowering : public RefCountingOpLowering<AddRefOp> {
		public:
		explicit AddRefOpLowering(MLIRContext *ctx)
		: RefCountingOpLowering(ctx, kAddRef) {}
		};

		// async.drop_ref op lowering to mlirAsyncRuntimeDropRef function call.
		class DropRefOpLowering : public RefCountingOpLowering<DropRefOp> {
		public:
		explicit DropRefOpLowering(MLIRContext *ctx)
		: RefCountingOpLowering(ctx, kDropRef) {}
		};

		} // namespace

		//===----------------------------------------------------------------------===//
// async.create_group op lowering to mlirAsyncRuntimeCreateGroup function call.		// async.create_group op lowering to mlirAsyncRuntimeCreateGroup function call.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
class CreateGroupOpLowering : public ConversionPattern {		class CreateGroupOpLowering : public ConversionPattern {
public:		public:
explicit CreateGroupOpLowering(MLIRContext *ctx)		explicit CreateGroupOpLowering(MLIRContext *ctx)
: ConversionPattern(CreateGroupOp::getOperationName(), 1, ctx) {}		: ConversionPattern(CreateGroupOp::getOperationName(), 1, ctx) {}
(189 lines not shown)
void ConvertAsyncToLLVMPass::runOnOperation() {		void ConvertAsyncToLLVMPass::runOnOperation() {
MLIRContext *ctx = &getContext();		MLIRContext *ctx = &getContext();

// Convert async dialect types and operations to LLVM dialect.		// Convert async dialect types and operations to LLVM dialect.
AsyncRuntimeTypeConverter converter;		AsyncRuntimeTypeConverter converter;
OwningRewritePatternList patterns;		OwningRewritePatternList patterns;

populateFuncOpTypeConversionPattern(patterns, ctx, converter);		populateFuncOpTypeConversionPattern(patterns, ctx, converter);
patterns.insert<CallOpOpConversion>(ctx);		patterns.insert<CallOpOpConversion>(ctx);
		patterns.insert<AddRefOpLowering, DropRefOpLowering>(ctx);
patterns.insert<CreateGroupOpLowering, AddToGroupOpLowering>(ctx);		patterns.insert<CreateGroupOpLowering, AddToGroupOpLowering>(ctx);
patterns.insert<AwaitOpLowering, AwaitAllOpLowering>(ctx, outlinedFunctions);		patterns.insert<AwaitOpLowering, AwaitAllOpLowering>(ctx, outlinedFunctions);

ConversionTarget target(*ctx);		ConversionTarget target(*ctx);
		target.addLegalOp<ConstantOp>();
		ftynse: I would recommend making ConstantOp legal, not the whole StandardDialect, which has lots of different things.
target.addLegalDialect<LLVM::LLVMDialect>();		target.addLegalDialect<LLVM::LLVMDialect>();
target.addIllegalDialect<AsyncDialect>();		target.addIllegalDialect<AsyncDialect>();
target.addDynamicallyLegalOp<FuncOp>(		target.addDynamicallyLegalOp<FuncOp>(
[&](FuncOp op) { return converter.isSignatureLegal(op.getType()); });		[&](FuncOp op) { return converter.isSignatureLegal(op.getType()); });
target.addDynamicallyLegalOp<CallOp>(		target.addDynamicallyLegalOp<CallOp>(
[&](CallOp op) { return converter.isLegal(op.getResultTypes()); });		[&](CallOp op) { return converter.isLegal(op.getResultTypes()); });

if (failed(applyPartialConversion(module, target, std::move(patterns))))		if (failed(applyPartialConversion(module, target, std::move(patterns))))
signalPassFailure();		signalPassFailure();
}		}
} // namespace		} // namespace

std::unique_ptr<OperationPass<ModuleOp>> mlir::createConvertAsyncToLLVMPass() {		std::unique_ptr<OperationPass<ModuleOp>> mlir::createConvertAsyncToLLVMPass() {
return std::make_unique<ConvertAsyncToLLVMPass>();		return std::make_unique<ConvertAsyncToLLVMPass>();
}		}

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp

This file was added.

				//===- AsyncRefCounting.cpp - Implementation of Async Ref Counting --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements automatic reference counting for Async dialect data
				// types.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/Async/IR/Async.h"
				#include "mlir/Dialect/Async/Passes.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/IR/PatternMatch.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
				#include "llvm/ADT/SmallSet.h"

				using namespace mlir;
				using namespace mlir::async;

				#define DEBUG_TYPE "async-ref-counting"

				namespace {

				struct AsyncRefCountingPass
				: public AsyncRefCountingBase<AsyncRefCountingPass> {
				AsyncRefCountingPass() = default;
				void runOnFunction() override;
				};

				} // namespace

				/// Returns true if the type is reference counted. All async dialect types are
				herhut: Nit: are.
				/// reference counted at runtime.
				ftynse: MLIR uses `///` for top-level comments.
				silvas: Discuss runtime refcounting ABI conventions for runtime functions in this comment, and conventions for IR functions that accept/return refcounted objects.
				static bool isRefCounted(Type type) {
				return type.isa<TokenType, ValueType, GroupType>();
				rriddle: Missing static on all of these?
				}

				/// Returns true if the operation `op` supports async reference counting.
				///
				/// It is the responsibility of the async value consumer to drop the reference
				/// count when the value is no longer needed. If the async value is passed to
				/// a consumer that is not aware of reference counting, it will leak at
				silvas: nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".
				/// runtime.
				static bool isSupportedConsumer(Operation *op) {
				silvas: Add the explanation from your other review comment here justifying the special treatment of async.execute.
				// Return operation transfers ownership to the caller.
				if (isa<ReturnOp>(op))
				return true;

				// Async dialect operations correctly handle reference counted values.
				if (isa<ExecuteOp, AwaitOp, AwaitAllOp, AddToGroupOp>(op))
				return true;
				silvas: nit: typo coutned

				return false;
				}

				/// Returns the statically known number of instances for all operations in the
				/// attached region (the number of times each operation inside the attached
				/// region will be executed).
				silvas: typo: dialect types are
				static Optional<int32_t> getStaticNumberOfInstances(Operation *op) {
				assert(!op->getRegions().empty() && "operation must have attached regions");
				ftynse: Out of scope: I am interested in seeing this as a generic OpInterface; just yesterday the need for this popped up in another discussion.
				ezhulenev: Yeah, seems like a useful property in many contexts. Will leave it for the followup.

				// `async.execute` will execute all operations exactly once.
				if (isa<ExecuteOp>(op))
				return 1;

				// TODO: Loops with statically known bounds have a statically known number of
				// operation instances in the loop body.
				return None;
				}

				/// Returns the statically known number of instances of the `user` operation
				silvas: typo: yied
				/// that consumes async values produced by the `owner` operation. Returns empty
				/// optional if the number of instances is dynamic.
				///
				/// Examples:
				///
				/// 1. `owner` and `user` are in the same region.
				///
				/// %token = ...
				/// "use"(%token): (!async.token) -> ()
				///
				/// Number of instances: 1
				///
				/// 2. `user` is inside the region with statically known execution.
				///
				/// %token = ...
				/// async.execute {
				/// "use"(%token): (!async.token) -> ()
				silvas: Explain why not nested blocks (or leave a TODO; also, we should probably signalPassFailure if we encounter uses in a nested region).
				ezhulenev: Added a few lines to explain why ignoring nested regions is ok.
				/// }
				///
				/// Number of instances: 1 (async.execute will execute all operations in the
				/// attached body region)
				///
				/// 3. `user` is inside the dynamic control flow operation (e.g. `scf.if`,
				/// `scf.for` or `scf.parallel`).
				///
				/// %token = ...
				/// scf.if %condition {
				/// "use"(%token): (!async.token) -> ()
				/// } else {
				/// "some_other_operation"(): () -> ()
				/// }
				///
				silvas: typo: in in
				/// Number of instances: <unknown> (it is not statically known if the
				/// execution will go into the first region).
				///
				/// If we know the number of `user` instances statically, we can increment the
				/// reference count for the async value produced by the `owner`:
				///
				/// %token = ...
				/// async.add_ref %token {count = <static-number-of-instances - 1>}
				///
				/// For dynamic instances we can safely add a reference only in the same region
				/// as the `user` parent region. See details below.
				static Optional<int32_t> getStaticNumberOfInstances(Operation *owner,
				Operation *user) {
				int32_t result = 1;
				silvas: findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):
				    for (Operation *user : value.getUsers()) {
				      if (user->getParent() == block) {
				        userInTheBlock = user;
				        break;
				      }
				    }
				Also, recommend putting this in a static helper, per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
				ezhulenev: `findAncestorOpInBlock` is required to find the last use in the block even if the "real" use is deep inside a nested region.
				    %token = ...
				    scf.for %i = ... {        <<<----- `scf.for` will be the last user
				      async.await %token : !async.token
				    }
				    async.drop_ref %token     <<<----- will be added after the last use in the CFG
				Cleaned up the code a little bit.

				Operation *ownerParent = owner->getParentOp();
				Operation *userParent = user->getParentOp();

				while (ownerParent != userParent) {
				if (auto num = getStaticNumberOfInstances(userParent))
				result = num;
				else
				return None;

				userParent = userParent->getParentOp();
				}

				return result;
				}
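
The parent walk in `getStaticNumberOfInstances(owner, user)` can be tried out in isolation with a toy operation tree. The following is a minimal sketch, not the pass itself: `ToyKind`, `ToyNode`, and `staticInstances` are hypothetical names, `-1` stands in for the empty `Optional`, and only the execute-like kind is treated as running its body exactly once, mirroring the single special case in the code above.

```cpp
#include <cassert>

// Hypothetical toy op kinds; only Execute runs its body exactly once.
enum class ToyKind { Func, Execute, ScfIf, ScfFor, Other };

struct ToyNode {
  ToyKind kind;
  const ToyNode *parent; // enclosing operation, nullptr at the top level
};

// Returns the statically known number of instances of `user` relative to
// `owner`, or -1 when an enclosing operation is dynamic control flow.
int staticInstances(const ToyNode &owner, const ToyNode &user) {
  int result = 1;
  const ToyNode *ownerParent = owner.parent;
  const ToyNode *userParent = user.parent;
  while (userParent != ownerParent) {
    // Anything other than an execute-like parent is treated as dynamic.
    if (userParent == nullptr || userParent->kind != ToyKind::Execute)
      return -1;
    // An execute-like parent runs its body once, so the count is unchanged.
    userParent = userParent->parent;
  }
  return result;
}
```

The three documented cases (same region, nested under `async.execute`, nested under `scf.if`) map to a result of 1, 1, and "unknown" respectively in this model.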

				namespace {
				struct DynamicInstanceProperties {
				/// The static number of instances of the `user` operation inside the dynamic
				/// operation.
				int32_t staticNumberOfInstances;

				ftynse: Any particular reason for using 32-bit integers for the refcount? In this struct, it may not even save space because the compiler will insert padding.
				ezhulenev: Not really, just to match the type of the `count` arg in the add_ref/drop_ref ops, but that choice is also arbitrary.
				/// The block owned by the region attached to the dynamic operation.
				Block *dynamicBlock;

				/// Operation in the same region as async value `owner` that contains the
				/// dynamic operation (can be the dynamic operation itself). We'll use this as
				/// an anchor to add explicit `async.drop_ref` operation after it.
				Operation *refCountAnchor;
				};
				} // namespace

				/// Returns the dynamic instance properties of the `user` operation that
				/// consumes async value produced by the `owner` operation.
				///
				/// Example:
				///
				/// %token = ...
				/// scf.for %i = %c0 to %c2 step %c1 {
				/// scf.if %cond {
				/// async.execute {
				/// async.await %token : !async.token
				/// }
				/// }
				/// }
				///
				/// Innermost dynamic operation that contains the async value user `async.await`
				herhut: I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had `inc_rc` and `dec_rc` explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop. That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on the other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible. Seems less fragile and would work with existing interfaces for region control flow. It would also allow passing async values to operations that do not implement the reference counting consumer interface.
				ezhulenev: FWIW Swift SIL has all reference counting explicit (https://github.com/apple/swift/blob/main/docs/ARCOptimization.rst). There are two types of ref-counted value users:
				  - "forwarding": std.return, function call arg; they do not change the ref count
				  - "consumers": everything else
				Async automatic ref counting will need to either have a closed set of supported users, or rely on op interfaces to distinguish between user types.
				ezhulenev: And there are also operations like `mlirAsyncRuntimeAddTokenToGroup` that consume a reference at some indeterminate point in the future, so if the IR has `drop_ref`, then the operation will need to have `add_ref` to compensate for that, or be marked as "forwarding" (reference counting responsibility forwarded to the runtime).
				silvas: It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment? I also don't understand "Inside this operation statically known number of uses is 1": if %cond is false it will be 0.
				ezhulenev: I've pushed a new revision based on liveness analysis and explicit `drop_ref` instead of implicit "ref consumer".
				/// is `scf.for`. Inside this operation statically known number of uses is 1.
				///
				/// Dynamic reference counting must be added to the block that owns the
				/// operation that has statically known number of instances of the async uses.
				///
				/// With automatic reference counting this should become:
				///
				/// %token = ...
				///
				/// // Add a reference count statically because we know that we have one
				/// // dynamic use inside the `scf.for` operation.
				silvas: nit: looks like line wrapping here forgot to insert `//`. Same on the async.drop_ref below.
				/// async.add_ref %token {count = 1 : i32} : !async.token
				///
				/// scf.for %i = %c0 to %c2 step %c1 {
				/// scf.if %cond {
				/// // Add a reference count dynamically.
				/// async.add_ref %token {count = 1 : i32} : !async.token
				/// async.execute {
				/// async.await %token : !async.token
				/// }
				/// }
				/// }
				///
				silvas: Why only ExecuteOp? Why not use NumberOfExecutions?
				ezhulenev: Because operations after the `async.execute` can be executed before the operations nested under the `async.execute`; this is currently the only operation that has this property. Example:
				    %token = ...
				    async.execute {
				      async.await %token : !async.token // await #1
				      async.yield
				    }
				    async.await %token : !async.token // await #2
				It is impossible to determine which of the `async.await` operations will be the "last use" at runtime. Ref counting will pick the second await as the last user and will create `drop_ref` after it; however, if the first await is executed later, it needs to keep the `token` alive.
				/// // Explicitly drop the static reference that we added on behalf of the
				/// // `scf.for` operation.
				/// async.drop_ref %token {count = 1 : i32} : !async.token
				static DynamicInstanceProperties getDynamicInstanceProperties(Operation *owner,
				Operation *user) {
				assert(!getStaticNumberOfInstances(owner, user).hasValue() &&
				"user must have dynamic number of instances");

				// Compute the number of static instances before we reach the first dynamic
				// parent.
				int32_t numberOfStaticInstances = 1;

				// Operation with statically known number of instances of the `user`
				// operation (can be `user` operation itself).
				Operation *staticUser = user;

				Operation *ownerParent = owner->getParentOp();
				Operation *userParent = user->getParentOp();

				silvas: I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing `while (user->getRegion() != definingRegion)`. That would make this code simpler as well.
				// Find the parent with statically known number of instances.
				while (ownerParent != userParent) {
				if (auto n = getStaticNumberOfInstances(userParent))
				numberOfStaticInstances = n;
				else
				break;

				staticUser = userParent;
				userParent = userParent->getParentOp();
				}

				assert(ownerParent != userParent && "did not find dynamic operation");

				// Block that owns operation with statically known number of user instances,
				// but the parent has dynamic number of instances.
				Block *dynamicBlock = staticUser->getBlock();
				Operation *dynamicOperation = userParent;

				assert(dynamicBlock->getParentOp() == dynamicOperation);

				// Find the operation that owns the operation with dynamic number of
				// instances and has the same parent as the `owner`.
				Operation *dynamicParent = dynamicOperation->getParentOp();
				while (ownerParent != dynamicParent) {
				dynamicOperation = dynamicParent;
				dynamicParent = dynamicOperation->getParentOp();
				}

				return {numberOfStaticInstances, dynamicBlock, dynamicOperation};
				}
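
The review thread above about `async.execute` (the body may run after the operations that follow it) can be replayed as a deterministic sequence of events on a single counter. This is only an illustrative model under stated assumptions: `ToyEvent` and `replayIsSafe` are hypothetical names, the counter starts at 1 to model the reference held when the value is created, and any event that touches the value after the count reaches zero is treated as a use after destruction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical events acting on one reference counted value.
enum class ToyEvent { AddRef, DropRef, Use };

// Replays `events` on a counter that starts at 1. Returns false if any event
// happens after the count has already dropped to zero.
bool replayIsSafe(const std::vector<ToyEvent> &events) {
  int count = 1;
  for (ToyEvent event : events) {
    if (count == 0)
      return false; // the value was already destroyed
    switch (event) {
    case ToyEvent::AddRef:
      ++count;
      break;
    case ToyEvent::DropRef:
      --count;
      break;
    case ToyEvent::Use:
      break;
    }
  }
  return true;
}
```

With the schedule "await #2, drop_ref, await #1" the nested await touches a destroyed value. Adding a reference before the execute-like operation and dropping it at the end of its body makes the same interleaving safe in this model.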

				static LogicalResult addAutomaticRefCounting(OpResult result) {
				silvas: I would prefer to keep such optimizations in a separate pass. Advantages:
				  1. Easy to show and test tricky cases of this optimization (the current code requires a level of indirection -- one has to imagine which ops are inserted, and then removed).
				  2. When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
				  3. Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with a single walk of each block, applying the optimization to all refcounted Values in that block at the same time.
				  4. Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.
				(If you want to omit this optimization from the initial patch, that is fine too.)
				ezhulenev: I moved it to a separate `async-ref-counting-optimization` pass. It is still not as efficient as it could be, but I added a small preprocessing step and iterate only over the blocks that have uses of `value`.
				Operation *op = result.getOwner();
				MLIRContext *ctx = op->getContext();

				Location loc = result.getLoc();

				OpBuilder builder(op);
				builder.setInsertionPointAfter(op);

				auto i32 = IntegerType::get(32, ctx);

				// Drop ref count -1 if the result has no users.
				if (result.getUsers().empty()) {
				builder.create<DropRefOp>(loc, result, IntegerAttr::get(i32, 1));
				return success();
				}

				// Verify that all users support automatic reference counting.
				for (Operation *user : result.getUsers()) {
				if (!isSupportedConsumer(user)) {
				op->emitError() << "result #" << result.getResultNumber()
				<< " passed to the operation that does not support "
				"automatic async reference counting: "
				<< user->getName();
				return failure();
				}
				}

				// The number of statically known uses of the `result`.
				silvas: nit: you might want to clarify somewhere that when you say "instances" here, it is "per instance of `result`'s owner".
				int32_t staticInstances = 0;

				// Collect properties of the dynamic uses of the `result`.
				SmallVector<DynamicInstanceProperties, 8> dynamicInstances;
				ftynse: 19 looks very unconventional. We usually try to estimate what would be the common "small" number of entries and round it up to a power of two.
				ezhulenev: That was a typo, it was supposed to be 10 :) Changed to 8 here and below, because that seems like a reasonable upper bound for the number of uses of an async value.

				for (const OpOperand &use : result.getUses()) {
				Operation *user = use.getOwner();

				// Check if we know the number of user instances statically.
				if (auto knownStatically = getStaticNumberOfInstances(op, user)) {
				staticInstances += *knownStatically;
				continue;
				}

				// Collect dynamic instance properties otherwise.
				DynamicInstanceProperties props = getDynamicInstanceProperties(op, user);
				dynamicInstances.push_back(props);
				}

				// Remove redundant reference counting from the same anchors ...
				llvm::SmallSet<Operation *, 8> anchors;
				// ... and aggregate the number of static instances per dynamic block.
				llvm::DenseMap<Block *, int32_t> blockStaticCounts;

				for (DynamicInstanceProperties &props : dynamicInstances) {
				blockStaticCounts[props.dynamicBlock] += props.staticNumberOfInstances;
				anchors.insert(props.refCountAnchor);
				}

				// We'll add +1 reference for each static instance of the user operation, and
				// also +1 for every dynamic instance anchor operation. Adding references for
				// dynamic instance anchors is required to keep reference counted objects
				// alive until the control flow reaches `async.add_ref` operation inside the
				// dynamic region.
				int32_t useCount = staticInstances + anchors.size();

				// Add +1 reference for each result use to eventually drop the reference count
				// to zero.
				if (useCount > 1)
				builder.create<AddRefOp>(loc, result, IntegerAttr::get(i32, useCount - 1));

				// Drop reference count immediately after the anchor operation.
				for (Operation *anchor : anchors) {
				builder.setInsertionPointAfter(anchor);
				builder.create<DropRefOp>(loc, result, IntegerAttr::get(i32, 1));
				}

				// Add statically known references at the beginning of the dynamic block.
				for (auto &kv : blockStaticCounts) {
				Block *block = kv.first;
				int32_t count = kv.second;
				builder.setInsertionPointToStart(block);
				builder.create<AddRefOp>(loc, result, IntegerAttr::get(i32, count));
				}

				return success();
				}

				static LogicalResult addAutomaticRefCounting(Operation *op) {
				for (unsigned i = 0; i < op->getNumResults(); ++i) {
				Type resultType = op->getResultTypes()[i];
				if (!isRefCounted(resultType))
				continue;

				if (failed(addAutomaticRefCounting(op->getResult(i))))
				return failure();
				}
				return success();
				}

				void AsyncRefCountingPass::runOnFunction() {
				FuncOp func = getFunction();

				// Add `async.add_ref` operations to match the number of uses for each async
				// value.
				WalkResult walkResult = func.walk([](Operation *op) -> WalkResult {
				if (failed(addAutomaticRefCounting(op)))
				return WalkResult::interrupt();
				return WalkResult::advance();
				});

				if (walkResult.wasInterrupted())
				signalPassFailure();
				}

				std::unique_ptr<OperationPass<FuncOp>> mlir::createAsyncRefCountingPass() {
				return std::make_unique<AsyncRefCountingPass>();
				}
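The counting rule in `addAutomaticRefCounting` above can be sketched in isolation. This is a simplified model, not the pass itself: the `Use` struct and `initialAddRefCount` are hypothetical names, each static use is assumed to contribute exactly one instance, and plain standard containers stand in for the MLIR and LLVM types.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for one use of a reference counted value.
struct Use {
  bool isStatic; // true if the user runs exactly once per definition
  int anchorId;  // for dynamic uses: id of the enclosing op with a dynamic
                 // number of instances (e.g. an scf.if or scf.for)
};

// Returns the count for the add_ref emitted right after the definition, or 0
// when no add_ref is needed. Mirrors `useCount = staticInstances +
// anchors.size()` followed by `add_ref(useCount - 1)` when `useCount > 1`.
int32_t initialAddRefCount(const std::vector<Use> &uses) {
  int32_t staticInstances = 0;
  std::set<int> anchors; // several dynamic uses may share one anchor
  for (const Use &use : uses) {
    if (use.isStatic)
      ++staticInstances;
    else
      anchors.insert(use.anchorId);
  }
  int32_t useCount = staticInstances + static_cast<int32_t>(anchors.size());
  return useCount > 1 ? useCount - 1 : 0;
}
```

Under these assumptions, a value with one await and one return (two static uses) gets an `add_ref` of 1, while a value used only inside a single `scf.if` gets none at the definition, because the single anchor accounts for the one reference the value already has.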

mlir/lib/Dialect/Async/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRAsyncTransforms			add_mlir_dialect_library(MLIRAsyncTransforms
	AsyncParallelFor.cpp			AsyncParallelFor.cpp
				AsyncRefCounting.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Async			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Async

	DEPENDS			DEPENDS
	MLIRAsyncPassIncGen			MLIRAsyncPassIncGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRIR			MLIRIR
	MLIRAsync			MLIRAsync
	MLIRSCF			MLIRSCF
	MLIRPass			MLIRPass
	MLIRTransforms			MLIRTransforms
	MLIRTransformUtils			MLIRTransformUtils
	)			)

mlir/lib/ExecutionEngine/AsyncRuntime.cpp

	Show All 10 Lines
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/ExecutionEngine/AsyncRuntime.h"			#include "mlir/ExecutionEngine/AsyncRuntime.h"

	#ifdef MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS			#ifdef MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS

	#include <atomic>			#include <atomic>
				#include <cassert>
	#include <condition_variable>			#include <condition_variable>
	#include <functional>			#include <functional>
	#include <iostream>			#include <iostream>
	#include <mutex>			#include <mutex>
	#include <thread>			#include <thread>
	#include <vector>			#include <vector>

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Async runtime API.			// Async runtime API.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	struct AsyncToken {			namespace {
	bool ready = false;
				// Forward declare class defined below.
				class RefCounted;

				// -------------------------------------------------------------------------- //
				// AsyncRuntime orchestrates all async operations and Async runtime API is built
				// on top of the default runtime instance.
				// -------------------------------------------------------------------------- //

				class AsyncRuntime {
				public:
				AsyncRuntime() : numRefCountedObjects(0) {}

				~AsyncRuntime() {
				assert(getNumRefCountedObjects() == 0 &&
				"all ref counted objects must be destroyed");
				}

				int32_t getNumRefCountedObjects() {
				return numRefCountedObjects.load(std::memory_order_relaxed);
				}

				private:
				friend class RefCounted;

				// Count the total number of reference counted objects in this instance
				// of an AsyncRuntime. For debugging purposes only.
				void addNumRefCountedObjects() {
				numRefCountedObjects.fetch_add(1, std::memory_order_relaxed);
				}
				void dropNumRefCountedObjects() {
				numRefCountedObjects.fetch_sub(1, std::memory_order_relaxed);
				}

				std::atomic<int32_t> numRefCountedObjects;
				};

				// Returns the default per-process instance of an async runtime.
				AsyncRuntime *getDefaultAsyncRuntimeInstance() {
				static auto runtime = std::make_unique<AsyncRuntime>();
				return runtime.get();
				}

				// -------------------------------------------------------------------------- //
				// A base class for all reference counted objects created by the async runtime.
				// -------------------------------------------------------------------------- //

				class RefCounted {
				public:
				RefCounted(AsyncRuntime *runtime, int32_t refCount = 1)
				: runtime(runtime), refCount(refCount) {
				runtime->addNumRefCountedObjects();
				}

				virtual ~RefCounted() {
				assert(refCount.load() == 0 && "reference count must be zero");
				runtime->dropNumRefCountedObjects();
				}

				RefCounted(const RefCounted &) = delete;
				RefCounted &operator=(const RefCounted &) = delete;

				void addRef(int32_t count = 1) { refCount.fetch_add(count); }
				ftynse: please fix

				void dropRef(int32_t count = 1) {
				int32_t previous = refCount.fetch_sub(count);
				assert(previous >= count && "reference count should not go below zero");
				if (previous == count)
				destroy();
				}

				protected:
				virtual void destroy() { delete this; }

				private:
				AsyncRuntime *runtime;
				std::atomic<int32_t> refCount;
				};

				} // namespace
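The `dropRef` logic above relies on `fetch_sub` returning the value held before the subtraction, so exactly one caller observes `previous == count` and triggers destruction. A minimal sketch of that idea follows; `Counter` and `destroyCalls` are hypothetical names used only for illustration, and a counter increment stands in for `delete this`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference counter that records destruction instead of
// deleting itself, so the behaviour can be inspected afterwards.
struct Counter {
  std::atomic<int32_t> refCount;
  int destroyCalls = 0;

  explicit Counter(int32_t initial) : refCount(initial) {}

  void addRef(int32_t count = 1) { refCount.fetch_add(count); }

  void dropRef(int32_t count = 1) {
    // fetch_sub returns the value before the subtraction.
    int32_t previous = refCount.fetch_sub(count);
    assert(previous >= count && "reference count should not go below zero");
    // Only the call that releases the last references sees previous == count.
    if (previous == count)
      ++destroyCalls; // stands in for `delete this`
  }
};
```

Comparing against the value returned by `fetch_sub`, rather than re-reading the counter afterwards, is what keeps the "last reference" check race-free when several threads drop references concurrently.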

				struct AsyncToken : public RefCounted {
				// AsyncToken created with a reference count of 2 because it will be returned
				// to the `async.execute` caller and also will be later on emplaced by the
				// asynchronously executed task. If the caller immediately will drop its
				// reference we must ensure that the token will be alive until the
				// asynchronous operation is completed.
				AsyncToken(AsyncRuntime *runtime) : RefCounted(runtime, /*count=*/2) {}

				// Internal state below guarded by a mutex.
	std::mutex mu;			std::mutex mu;
	std::condition_variable cv;			std::condition_variable cv;

				bool ready = false;
	std::vector<std::function<void()>> awaiters;			std::vector<std::function<void()>> awaiters;
	};			};

	struct AsyncGroup {			struct AsyncGroup : public RefCounted {
	std::atomic<int> pendingTokens{0};			AsyncGroup(AsyncRuntime *runtime)
	std::atomic<int> rank{0};			: RefCounted(runtime), pendingTokens(0), rank(0) {}

				std::atomic<int> pendingTokens;
				std::atomic<int> rank;

				// Internal state below guarded by a mutex.
	std::mutex mu;			std::mutex mu;
	std::condition_variable cv;			std::condition_variable cv;

	std::vector<std::function<void()>> awaiters;			std::vector<std::function<void()>> awaiters;
	};			};

				// Adds references to reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeAddRef(RefCountedObjPtr ptr, int32_t count) {
				RefCounted *refCounted = static_cast<RefCounted *>(ptr);
				refCounted->addRef(count);
				}

				// Drops references from reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeDropRef(RefCountedObjPtr ptr, int32_t count) {
				RefCounted *refCounted = static_cast<RefCounted *>(ptr);
				refCounted->dropRef(count);
				}

	// Create a new `async.token` in not-ready state.			// Create a new `async.token` in not-ready state.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncToken *mlirAsyncRuntimeCreateToken() {			extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncToken *mlirAsyncRuntimeCreateToken() {
	AsyncToken *token = new AsyncToken;			AsyncToken *token = new AsyncToken(getDefaultAsyncRuntimeInstance());
	return token;			return token;
	}			}

	// Create a new `async.group` in empty state.			// Create a new `async.group` in empty state.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup() {			extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup() {
	AsyncGroup *group = new AsyncGroup;			AsyncGroup *group = new AsyncGroup(getDefaultAsyncRuntimeInstance());
	return group;			return group;
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t			extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t
	mlirAsyncRuntimeAddTokenToGroup(AsyncToken *token, AsyncGroup *group) {			mlirAsyncRuntimeAddTokenToGroup(AsyncToken *token, AsyncGroup *group) {
	std::unique_lock<std::mutex> lockToken(token->mu);			std::unique_lock<std::mutex> lockToken(token->mu);
	std::unique_lock<std::mutex> lockGroup(group->mu);			std::unique_lock<std::mutex> lockGroup(group->mu);

				// Get the rank of the token inside the group before we drop the reference.
				int rank = group->rank.fetch_add(1);
	group->pendingTokens.fetch_add(1);			group->pendingTokens.fetch_add(1);

	auto onTokenReady = [group]() {			auto onTokenReady = [group, token]() {
	// Run all group awaiters if it was the last token in the group.			// Run all group awaiters if it was the last token in the group.
	if (group->pendingTokens.fetch_sub(1) == 1) {			if (group->pendingTokens.fetch_sub(1) == 1) {
	group->cv.notify_all();			group->cv.notify_all();
	for (auto &awaiter : group->awaiters)			for (auto &awaiter : group->awaiters)
	awaiter();			awaiter();
	}			}

				// We no longer need the token or the group, drop references on them.
				group->dropRef();
				token->dropRef();
	};			};

	if (token->ready)			if (token->ready)
	onTokenReady();			onTokenReady();
	else			else
	token->awaiters.push_back([onTokenReady]() { onTokenReady(); });			token->awaiters.push_back([onTokenReady]() { onTokenReady(); });

	return group->rank.fetch_add(1);			return rank;
	}			}

	// Switches `async.token` to ready state and runs all awaiters.			// Switches `async.token` to ready state and runs all awaiters.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeEmplaceToken(AsyncToken *token) {			mlirAsyncRuntimeEmplaceToken(AsyncToken *token) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);
	token->ready = true;			token->ready = true;
	token->cv.notify_all();			token->cv.notify_all();
	for (auto &awaiter : token->awaiters)			for (auto &awaiter : token->awaiters)
	awaiter();			awaiter();

				token->dropRef();
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeAwaitToken(AsyncToken *token) {			mlirAsyncRuntimeAwaitToken(AsyncToken *token) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);
	if (!token->ready)			if (!token->ready)
	token->cv.wait(lock, [token] { return token->ready; });			token->cv.wait(lock, [token] { return token->ready; });

				token->dropRef();
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeAwaitAllInGroup(AsyncGroup *group) {			mlirAsyncRuntimeAwaitAllInGroup(AsyncGroup *group) {
	std::unique_lock<std::mutex> lock(group->mu);			std::unique_lock<std::mutex> lock(group->mu);
	if (group->pendingTokens != 0)			if (group->pendingTokens != 0)
	group->cv.wait(lock, [group] { return group->pendingTokens == 0; });			group->cv.wait(lock, [group] { return group->pendingTokens == 0; });

				group->dropRef();
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeExecute(CoroHandle handle, CoroResume resume) {			mlirAsyncRuntimeExecute(CoroHandle handle, CoroResume resume) {
	#if LLVM_ENABLE_THREADS			#if LLVM_ENABLE_THREADS
	std::thread thread([handle, resume]() { (*resume)(handle); });			std::thread thread([handle, resume]() { (*resume)(handle); });
	thread.detach();			thread.detach();
	#else			#else
	(*resume)(handle);			(*resume)(handle);
	#endif			#endif
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeAwaitTokenAndExecute(AsyncToken *token, CoroHandle handle,			mlirAsyncRuntimeAwaitTokenAndExecute(AsyncToken *token, CoroHandle handle,
	CoroResume resume) {			CoroResume resume) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);

	auto execute = [handle, resume]() {			auto execute = [handle, resume, token]() {
				token->dropRef();
	mlirAsyncRuntimeExecute(handle, resume);			mlirAsyncRuntimeExecute(handle, resume);
	};			};

	if (token->ready)			if (token->ready)
	execute();			execute();
	else			else
	token->awaiters.push_back([execute]() { execute(); });			token->awaiters.push_back([execute]() { execute(); });
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeAwaitAllInGroupAndExecute(AsyncGroup *group, CoroHandle handle,			mlirAsyncRuntimeAwaitAllInGroupAndExecute(AsyncGroup *group, CoroHandle handle,
	CoroResume resume) {			CoroResume resume) {
	std::unique_lock<std::mutex> lock(group->mu);			std::unique_lock<std::mutex> lock(group->mu);

	auto execute = [handle, resume]() {			auto execute = [handle, resume, group]() {
				group->dropRef();
	mlirAsyncRuntimeExecute(handle, resume);			mlirAsyncRuntimeExecute(handle, resume);
	};			};

	if (group->pendingTokens == 0)			if (group->pendingTokens == 0)
	execute();			execute();
	else			else
	group->awaiters.push_back([execute]() { execute(); });			group->awaiters.push_back([execute]() { execute(); });
	}			}
	Show All 12 Lines
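One detail in `mlirAsyncRuntimeAddTokenToGroup` is that the rank is now read before the `onTokenReady` callback can run. If the token is already ready, the callback runs synchronously and may drop the last reference to the group, so touching `group->rank` afterwards would be a use-after-free. The sketch below models only that ordering; `Group` and `addTokenToGroup` are hypothetical simplifications (no mutex, no awaiter list), not the runtime's types.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified group: a rank counter plus a ref count.
struct Group {
  std::atomic<int> rank{0};
  std::atomic<int32_t> refs{1};
};

// Returns the rank of the added token. `group` is set to nullptr if the
// callback released the last reference and destroyed the group.
int addTokenToGroup(Group *&group, bool tokenAlreadyReady) {
  // Read the rank first: after the callback runs, `group` may be gone.
  int rank = group->rank.fetch_add(1);

  auto onTokenReady = [&group]() {
    if (group->refs.fetch_sub(1) == 1) {
      delete group;
      group = nullptr;
    }
  };

  // In the real runtime a not-yet-ready token stores the callback in its
  // awaiters list; this sketch only models the synchronous path.
  if (tokenAlreadyReady)
    onTokenReady();

  return rank; // safe even when the group was destroyed above
}
```

With a group holding a single reference, the call returns rank 0 and leaves the pointer null, which is the case the reordering in the diff protects against.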

mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir

	// RUN: mlir-opt %s -split-input-file -convert-async-to-llvm \| FileCheck %s			// RUN: mlir-opt %s -split-input-file -convert-async-to-llvm \| FileCheck %s

				// CHECK-LABEL: reference_counting
				func @reference_counting(%arg0: !async.token) {
				// CHECK: %[[C2:.*]] = constant 2 : i32
				// CHECK: call @mlirAsyncRuntimeAddRef(%arg0, %[[C2]])
				async.add_ref %arg0 {count = 2 : i32} : !async.token

				// CHECK: %[[C1:.*]] = constant 1 : i32
				// CHECK: call @mlirAsyncRuntimeDropRef(%arg0, %[[C1]])
				async.drop_ref %arg0 {count = 1 : i32} : !async.token

				return
				}

				// -----

	// CHECK-LABEL: execute_no_async_args			// CHECK-LABEL: execute_no_async_args
	func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {			func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {
	// CHECK: %[[TOKEN:.*]] = call @async_execute_fn(%arg0, %arg1)			// CHECK: %[[TOKEN:.*]] = call @async_execute_fn(%arg0, %arg1)
	%token = async.execute {			%token = async.execute {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	store %arg0, %arg1[%c0] : memref<1xf32>			store %arg0, %arg1[%c0] : memref<1xf32>
	async.yield			async.yield
	}			}
	▲ Show 20 Lines • Show All 188 Lines • Show Last 20 Lines

mlir/test/Dialect/Async/async-ref-counting.mlir

This file was added.

				// RUN: mlir-opt %s -async-ref-counting \| FileCheck %s

				// CHECK-LABEL: @token_no_uses
				func @token_no_uses() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				%token = async.execute {
				async.yield
				}
				return
				}

				// CHECK-LABEL: @token_return
				func @token_return() -> !async.token {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: return %token
				return %token : !async.token
				}

				// CHECK-LABEL: @token_with_await
				func @token_with_await() -> !async.token {
				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				%token = async.execute {
				async.yield
				}
				async.await %token : !async.token
				// CHECK: return %token
				return %token : !async.token
				}

				// CHECK-LABEL: @token_capture
				func @token_capture() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				%token = async.execute {
				async.yield
				}

				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute {
				async.await %token : !async.token
				async.yield
				}

				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.await %[[TOKEN_0]]
				async.await %token : !async.token
				async.await %token_0 : !async.token

				// CHECK: return
				return
				}

				// CHECK-LABEL: @token_dependency
				func @token_dependency() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				%token = async.execute {
				async.yield
				}

				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute[%token] {
				async.yield
				}

				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.await %[[TOKEN_0]]
				async.await %token : !async.token
				async.await %token_0 : !async.token

				// CHECK: return
				return
				}

				// CHECK-LABEL: @value_operand
				func @value_operand() -> f32 {
				// CHECK: %[[TOKEN:.*]], %[[RESULTS:.*]] = async.execute
				// CHECK: async.add_ref %[[RESULTS]] {count = 1 : i32}
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				%token, %results = async.execute -> !async.value<f32> {
				%0 = constant 0.0 : f32
				async.yield %0 : f32
				}

				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute[%token](%results as %arg0 : !async.value<f32>) {
				async.yield
				}

				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.await %[[TOKEN_0]]
				async.await %token : !async.token
				async.await %token_0 : !async.token

				// CHECK: async.await %[[RESULTS]]
				%0 = async.await %results : !async.value<f32>

				// CHECK: return
				return %0 : f32
				}

				// CHECK-LABEL: @async_group
				func @async_group() {
				// CHECK: %[[GROUP:.*]] = async.create_group
				// CHECK: async.add_ref %[[GROUP]] {count = 1 : i32} : !async.group
				%0 = async.create_group

				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token = async.execute { async.yield }
				%token_0 = async.execute { async.yield }

				// CHECK: async.add_to_group %[[TOKEN]], %[[GROUP]]
				// CHECK: async.add_to_group %[[TOKEN_0]], %[[GROUP]]
				%1 = async.add_to_group %token, %0 : !async.token
				%2 = async.add_to_group %token_0, %0 : !async.token

				// CHECK: return
				return
				}

				// CHECK-LABEL: @capture_by_scf_if
				func @capture_by_scf_if(%arg0 : i1) {
				%token = async.execute { async.yield }

				scf.if %arg0 {
				// CHECK: async.add_ref %token {count = 2 : i32}
				async.await %token : !async.token
				async.await %token : !async.token
				} else {
				// CHECK: async.add_ref %token {count = 1 : i32}
				async.await %token : !async.token
				}
				// CHECK: async.drop_ref %token {count = 1 : i32}

				return
				}

				// CHECK-LABEL: @capture_by_scf_if_with_async_execute
				func @capture_by_scf_if_with_async_execute(%arg0 : i1) {
				%token = async.execute { async.yield }
				silvas: Is there a missing `CHECK: async.add_ref %[[TOKEN]]` on the line before `%token_0 = async.execute` and a missing `CHECK: async.drop_ref %[[TOKEN_0]]` before the return? (Best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there.)
				ezhulenev (Author): Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to a few other tests.

				// `async.await` from the `async.execute` rolled up to the first
				// operation with dynamic number of instances.
				scf.if %arg0 {
				// CHECK: async.add_ref %token {count = 2 : i32}
				async.execute {
				async.await %token : !async.token
				async.await %token : !async.token
				async.yield
				}
				} else {
				// CHECK: async.add_ref %token {count = 1 : i32}
				async.await %token : !async.token
				}
				// CHECK: async.drop_ref %token {count = 1 : i32}

				return
				}

				// CHECK-LABEL: @capture_by_scf_for
				func @capture_by_scf_for() {
				%token = async.execute { async.yield }

				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%c2 = constant 0 : index

				scf.for %i = %c0 to %c2 step %c1 {
				// CHECK: async.add_ref %token {count = 1 : i32}
				async.await %token : !async.token
				}
				// CHECK: async.drop_ref %token {count = 1 : i32}

				return
				}
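The expected `add_ref`/`drop_ref` placement in the `scf.if` and `scf.for` tests above can be checked by hand: every path should bring the count back to zero. The sketch below does that arithmetic, assuming the value starts with a single reference owned by the defining function and that each `async.await` consumes one reference; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Net reference count after @capture_by_scf_if, following its CHECK lines.
int32_t balanceAfterScfIf(bool cond) {
  int32_t refs = 1; // assumption: one reference held at the definition
  if (cond) {
    refs += 2; // async.add_ref {count = 2} at the start of the then-block
    refs -= 2; // two async.await operations, one reference each
  } else {
    refs += 1; // async.add_ref {count = 1} at the start of the else-block
    refs -= 1; // one async.await
  }
  refs -= 1; // async.drop_ref {count = 1} right after the scf.if anchor
  return refs;
}

// Net reference count after @capture_by_scf_for with a given trip count.
int32_t balanceAfterScfFor(int iterations) {
  int32_t refs = 1; // assumption: one reference held at the definition
  for (int i = 0; i < iterations; ++i) {
    refs += 1; // async.add_ref {count = 1} at the start of the loop body
    refs -= 1; // async.await inside the body
  }
  refs -= 1; // async.drop_ref {count = 1} right after the scf.for anchor
  return refs;
}
```

The count reaches zero regardless of which branch is taken or how many iterations run, including a zero-trip loop, which is why the reference kept alive for the anchor is dropped after the anchor rather than inside it.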

mlir/test/Dialect/Async/ops.mlir

func @create_group_and_await_all(%arg0: !async.token, %arg1: !async.value<f32>) -> index {
// CHECK: async.add_to_group %arg1		// CHECK: async.add_to_group %arg1
%1 = async.add_to_group %arg0, %0 : !async.token		%1 = async.add_to_group %arg0, %0 : !async.token
%2 = async.add_to_group %arg1, %0 : !async.value<f32>		%2 = async.add_to_group %arg1, %0 : !async.value<f32>
async.await_all %0		async.await_all %0

%3 = addi %1, %2 : index		%3 = addi %1, %2 : index
return %3 : index		return %3 : index
}		}

		// CHECK-LABEL: @add_ref
		func @add_ref(%arg0: !async.token) {
		// CHECK: async.add_ref %arg0 {count = 1 : i32}
		async.add_ref %arg0 {count = 1 : i32} : !async.token
		return
		}

		// CHECK-LABEL: @drop_ref
		func @drop_ref(%arg0: !async.token) {
		// CHECK: async.drop_ref %arg0 {count = 1 : i32}
		async.drop_ref %arg0 {count = 1 : i32} : !async.token
		return
		}

mlir/test/Dialect/Async/verify.mlir

	Show All 13 Lines
	}			}

	// -----			// -----

	func @wrong_async_await_result_type(%arg0: !async.value<f32>) {			func @wrong_async_await_result_type(%arg0: !async.value<f32>) {
	// expected-error @+1 {{'async.await' op result type 'f64' does not match async value type 'f32'}}			// expected-error @+1 {{'async.await' op result type 'f64' does not match async value type 'f32'}}
	%0 = "async.await"(%arg0): (!async.value<f32>) -> f64			%0 = "async.await"(%arg0): (!async.value<f32>) -> f64
	}			}

				// -----

				func @wrong_add_ref_count(%arg0: !async.token) {
				silvas: Generally we don't test properties verified by traits/interfaces.
				// expected-error @+1 {{'async.add_ref' op attribute 'count' failed to satisfy constraint: 32-bit signless integer attribute whose value is positive}}
				async.add_ref %arg0 {count = 0 : i32} : !async.token
				}

				// -----

				func @wrong_drop_ref_count(%arg0: !async.token) {
				// expected-error @+1 {{'async.drop_ref' op attribute 'count' failed to satisfy constraint: 32-bit signless integer attribute whose value is positive}}
				async.drop_ref %arg0 {count = 0 : i32} : !async.token
				}

mlir/test/mlir-cpu-runner/async-group.mlir

	// RUN: mlir-opt %s -convert-async-to-llvm \			// RUN: mlir-opt %s -async-ref-counting \
				// RUN: -convert-async-to-llvm \
	// RUN: -convert-std-to-llvm \			// RUN: -convert-std-to-llvm \
	// RUN: \| mlir-cpu-runner \			// RUN: \| mlir-cpu-runner \
	// RUN: -e main -entry-point-result=void -O0 \			// RUN: -e main -entry-point-result=void -O0 \
	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_c_runner_utils%shlibext \			// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_c_runner_utils%shlibext \
	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_async_runtime%shlibext \			// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_async_runtime%shlibext \
	// RUN: \| FileCheck %s			// RUN: \| FileCheck %s

	Show All 31 Lines

Diff 303566
