This is an archive of the discontinued LLVM Phabricator instance.

Differential D90716

[mlir] Automatic reference counting for Async values + runtime support for ref counted objects
ClosedPublic

Authored by ezhulenev on Nov 3 2020, 1:57 PM.

Details

Reviewers

ftynse
aartbik
silvas
mehdi_amini
herhut

Commits

rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref…

Summary

Depends On D89963

Automatic reference counting algorithm outline:

1. ReturnLike operations forward the reference counted values without modifying the reference count.
2. Use liveness analysis to find blocks in the CFG where the lifetime of reference counted values ends, and insert drop_ref operations after the last use of the value.
3. Insert add_ref before the async.execute operation capturing the value, and a matching drop_ref before the async body region terminator, to release the captured reference counted value when execution completes.
4. If the reference counted value is passed only to some of the block successors, insert drop_ref operations at the beginning of the blocks that do not have reference counted value uses.
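
The last-use rule from the outline can be illustrated on a single block. This is a minimal sketch under stated assumptions, not the pass itself: `ToyOp` and `insertDropRef` are hypothetical names, values are plain strings, and there is no CFG or liveness analysis, only a linear scan of one block.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an operation: a name plus the values it uses.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
};

// Returns a copy of `block` with a drop_ref placed after the last user of
// `value`. A return-like last user forwards the reference, so nothing is
// inserted. If the block never uses the value, it is dropped at the start.
std::vector<ToyOp> insertDropRef(const std::vector<ToyOp> &block,
                                 const std::string &value) {
  int lastUse = -1; // index of the last op using `value`, -1 if none
  for (int i = 0; i < static_cast<int>(block.size()); ++i) {
    const std::vector<std::string> &operands = block[i].operands;
    if (std::find(operands.begin(), operands.end(), value) != operands.end())
      lastUse = i;
  }
  if (lastUse >= 0 && block[lastUse].name == "return")
    return block;

  std::vector<ToyOp> result;
  if (lastUse < 0)
    result.push_back(ToyOp{"async.drop_ref", {value}});
  for (int i = 0; i < static_cast<int>(block.size()); ++i) {
    result.push_back(block[i]);
    if (i == lastUse)
      result.push_back(ToyOp{"async.drop_ref", {value}});
  }
  return result;
}
```

For a block `async.await %token; print`, the sketch places the drop between the two operations; a block ending in `return %token` is left unchanged.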

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ezhulenev created this revision. Nov 3 2020, 1:57 PM

Herald added a reviewer: ftynse. Nov 3 2020, 1:57 PM

Herald added a reviewer: aartbik.

Herald added a project: Restricted Project.

Herald added subscribers: rdzhabarov, tatianashp, msifontes and 15 others.

ezhulenev requested review of this revision. Nov 3 2020, 1:57 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Nov 3 2020, 1:57 PM

ezhulenev edited the summary of this revision. Nov 3 2020, 2:02 PM

ezhulenev added reviewers: mehdi_amini, herhut.

Harbormaster completed remote builds in B77466: Diff 302681. Nov 3 2020, 2:12 PM

ftynse added inline comments. Nov 5 2020, 2:11 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
621	You probably want to take the operand from `operands` rather than from the op directly in case it was modified by another pattern. `AddRefOpAdaptor` is an autogenerated class that is constructible from `ArrayRef<Value>` and provides an API similar to the Op it models, i.e. you can call `adaptor.operand()`.
633	Could we do something like
	template <typename OpTy>
	class RefToCallLoweringPattern : public OpConversionPattern<OpTy> {
	  RefToCallLoweringPattern(MLIRContext *ctx, StringRef funcName)
	      : OpConversionPattern<OpTy>(ctx), funcName(funcName) {}
	  matchAndRewrite(...) {
	    ...
	    rewriter.replaceOpWithNewOp<CallOp>(op, Type(), funcName, ValueRange(args));
	  }
	};
and remove duplicate code?
862	I would recommend to make ConstantOp legal, not the whole StandardDialect, which has lots of different things.
mlir/lib/Dialect/Async/IR/Async.cpp
349–350 ↗	(On Diff #302681)	Just declare it as `IntNonNegative` in ODS.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38	MLIR uses `///` for top-level comments.
65	Out of scope: I am interested in seeing this as a generic OpInterface, just yesterday the need for this popped up in another discussion.
144	Any particular reason for using 32bit integers for refcount? In this struct, it may not even save space because the compiler will insert padding.
276	19 looks very unconventional. We usually try to estimate what would be the common "small" number of entries and round it up to a power of two.
mlir/lib/ExecutionEngine/AsyncRuntime.cpp
94	please fix
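
The padding remark on the 32-bit refcount (line 144 above) can be checked with `sizeof`. This is a sketch with two hypothetical layouts, `NarrowCount` and `WideCount`; they are not the struct from AsyncRefCounting.cpp, and the exact sizes depend on the target ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layouts: a pointer followed by a 32-bit or a 64-bit counter.
struct NarrowCount {
  void *object;
  int32_t refCount;
};
struct WideCount {
  void *object;
  int64_t refCount;
};

// Bytes of T that hold neither the pointer nor the counter, i.e. padding.
template <typename T, typename Counter>
constexpr std::size_t paddingBytes() {
  return sizeof(T) - sizeof(void *) - sizeof(Counter);
}
```

On a typical 64-bit ABI both structs are expected to occupy the same number of bytes, because the narrow variant is padded up to pointer alignment; `paddingBytes<NarrowCount, int32_t>()` reports how many bytes that costs on the target at hand.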

Remove code duplication in op lowering + fix style guide violations

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
621	Wouldn't the changes also be visible through the op? From the autogenerated code it seems that they are identical:
	::mlir::Value AddRefOpAdaptor::operand() {
	  return *getODSOperands(0).begin();
	}
vs
	::mlir::Operation::operand_range AddRefOp::getODSOperands(unsigned index) {
	  auto valueRange = getODSOperandIndexAndLength(index);
	  return {std::next(getOperation()->operand_begin(), valueRange.first),
	          std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
	}
	::mlir::Value AddRefOp::operand() {
	  return *getODSOperands(0).begin();
	}
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
65	Yeah, seems like a useful property in many contexts. Will leave it for the followup.
144	Not really, just to match the type of the `count` arg in add_ref/drop_ref ops, but that choice is also arbitrary.
276	That was a typo, it was supposed to be 10 :) Changed to 8 here and below, because that seems like a reasonable upper bound for the number of uses of an async value.

ezhulenev marked an inline comment as not done. Nov 5 2020, 3:51 AM

Use IntPositive trait for ref count attr

Harbormaster completed remote builds in B77681: Diff 303077. Nov 5 2020, 4:11 AM

Harbormaster completed remote builds in B77679: Diff 303074. Nov 5 2020, 4:15 AM

ftynse added inline comments. Nov 5 2020, 4:43 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
621	No they will not be visible. Conversion almost never changes operations in-place. `replaceOpWithNewOp` and the likes inject a new op, and keep the old op until the conversion completes in case one needs to examine the original op or its operand. The list of the operands to the op being rewritten is formed by combining the results of the new ops if they were rewritten and existing ops if they were not. This is why we pass `operands` into `matchAndRewrite`, otherwise it would have been a useless copy of `op->getOperands()`.
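
The point about `operands` can be modelled without MLIR. In this sketch (all names hypothetical: `ToyOp`, `ValueMapping`, `remapOperands`), the driver never mutates the original op; it keeps a side table from replaced values to their replacements and hands the pattern a remapped operand list.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy operation: keeps its original operands for the whole conversion.
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
};

// Replaced value -> replacement value, recorded as other patterns fire.
using ValueMapping = std::map<std::string, std::string>;

// Builds the operand list a pattern would receive: replacements where a
// value was rewritten, the original value otherwise. `op` is not modified.
std::vector<std::string> remapOperands(const ToyOp &op,
                                       const ValueMapping &mapping) {
  std::vector<std::string> remapped;
  for (const std::string &operand : op.operands) {
    auto it = mapping.find(operand);
    remapped.push_back(it == mapping.end() ? operand : it->second);
  }
  return remapped;
}
```

Reading `op.operands` in this model still yields the pre-conversion value, which is the mismatch the comment warns about.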

rriddle added inline comments. Nov 6 2020, 1:16 PM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
40	Missing static on all of these?

rriddle mentioned this in D90922: [mlir] Add NumberOfExecutions analysis + update RegionBranchOpInterface interface to query number of region invocations. Nov 6 2020, 1:18 PM

Add static to functions in AsyncRefCounting.cpp

Harbormaster completed remote builds in B77948: Diff 303566. Nov 6 2020, 4:25 PM

ezhulenev mentioned this in D89963: [mlir] Transform scf.parallel to scf.for + async.execute. Nov 13 2020, 3:11 AM

herhut added inline comments. Nov 13 2020, 3:40 AM

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
627	Why not produce the `ValueRange` in place from the two arguments?
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38	Nit: are.
170	I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had `inc_rc` and `dec_rc` explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop. That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible. Seems less fragile and would work with existing interfaces for region control flow. It would also allow to pass async values to operations that do not implement the reference counting consumer interface.
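
The hoisting argument can be made concrete by counting refcount updates. A minimal sketch, assuming a hypothetical `Counted` object with an atomic counter; `naiveLoop` keeps the increment and decrement inside the loop, `hoistedLoop` moves them out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical ref-counted object that also records how many atomic
// updates were performed on its counter.
struct Counted {
  std::atomic<int64_t> refCount{1};
  int64_t atomicOps = 0;
  void addRef() { refCount.fetch_add(1); ++atomicOps; }
  void dropRef() { refCount.fetch_sub(1); ++atomicOps; }
};

// Increment and decrement tied to every use inside the loop.
void naiveLoop(Counted &obj, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    obj.addRef();
    // ... use obj ...
    obj.dropRef();
  }
}

// Increment hoisted above the loop, decrement sunk below it.
void hoistedLoop(Counted &obj, int iterations) {
  obj.addRef();
  for (int i = 0; i < iterations; ++i) {
    // ... use obj ...
  }
  obj.dropRef();
}
```

Both variants leave the count where it started, but the hoisted one performs two updates regardless of the trip count.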

ezhulenev added inline comments. Nov 13 2020, 4:02 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170	FWIW Swift SIL has all reference counting explicit (https://github.com/apple/swift/blob/main/docs/ARCOptimization.rst). There are two types of ref-counted value users:
	"forwarding": std.return, function call arg; they do not change the ref count
	"consumers": everything else
Async automatic ref counting will need to either have a closed set of supported users, or rely on op interfaces to distinguish between user types.

ezhulenev added inline comments. Nov 13 2020, 4:41 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170	And there is also an operation like `mlirAsyncRuntimeAddTokenToGroup` that consumes a reference at some indeterminate point in the future, so if the IR has `drop_ref`, then the operation will need to have `add_ref` to compensate for that, or be marked as `"forwarding"` (reference counting responsibility forwarded to the runtime).

ezhulenev edited the summary of this revision. Nov 13 2020, 12:38 PM

ezhulenev removed reviewers: ftynse, aartbik, mehdi_amini, herhut.

Herald added a reviewer: ftynse. Nov 13 2020, 12:38 PM

Herald added a reviewer: aartbik.

silvas added a subscriber: silvas. Nov 13 2020, 6:17 PM

silvas added inline comments.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170	It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment? I also don't understand "Inside this operation statically known number of uses is 1" - if %cond is false it will be 0.
181	nit: looks like line wrapping here forgot to insert `//`. Same on the async.drop_ref below.
273	nit: you might want to clarify somewhere that when you say "instances" here, it is "per instance of `result`'s owner".

Use liveness analysis for reference counting

Herald added a subscriber: teijeong. Nov 16 2020, 3:35 AM

ezhulenev edited the summary of this revision. Nov 16 2020, 3:39 AM

ezhulenev added reviewers: silvas, mehdi_amini, herhut.

Harbormaster completed remote builds in B78943: Diff 305458. Nov 16 2020, 3:49 AM

Construct ValueRange directly as an argument to create call

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
170	I've pushed a new revision based on liveness analysis and explicit `drop_ref` instead of implicit "ref consumer".

Harbormaster completed remote builds in B78947: Diff 305469. Nov 16 2020, 4:44 AM

silvas added inline comments. Nov 17 2020, 9:03 AM

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
193	Why only ExecuteOp? Why not use NumberOfExecutions?

ezhulenev marked an inline comment as done. Nov 17 2020, 9:11 AM

ezhulenev added inline comments.

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
193	Because operations after the `async.execute` can be executed before the operations nested under the `async.execute`; this is currently the only operation that has this property. Example:
	%token = ...
	async.execute {
	  async.await %token : !async.token // await #1
	  async.yield
	}
	async.await %token : !async.token // await #2
It is impossible to determine which of the `async.await` operations will be the "last use" at runtime. Ref counting will pick the second await as the last user and will create `drop_ref` after it; however, if the first await is executed later, it needs to keep the `token` alive.
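
The same hazard exists in plain C++ when a deferred task outlives the caller's reference. The sketch below is an analogy only, using `std::shared_ptr` in place of the runtime counter; `Token`, `TaskQueue`, `executeAsync` and `runAll` are hypothetical names. Copying the pointer into the closure plays the role of `add_ref` before `async.execute`, and destroying the closure plays the role of the paired `drop_ref`.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct Token {
  bool ready = true;
};

// Tasks queued here run at some later point, like an async.execute body.
using TaskQueue = std::vector<std::function<void()>>;

// Queues a task that "awaits" the token (await #1 in the example above).
// The by-value capture keeps the token alive until the task is destroyed.
void executeAsync(TaskQueue &queue, std::shared_ptr<Token> token,
                  bool &sawReady) {
  queue.push_back([token, &sawReady] { sawReady = token->ready; });
}

// Runs every queued task, then destroys the closures and their captures.
void runAll(TaskQueue &queue) {
  for (std::function<void()> &task : queue)
    task();
  queue.clear();
}
```

If the caller resets its own pointer after queuing the task (await #2 followed by `drop_ref`), the token stays alive until `runAll` has executed and cleared the queue.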

silvas added inline comments. Nov 17 2020, 4:05 PM

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
237	nit: "All values are semantically created"
238	unclear what "owner" means in this context. Is this referring to a runtime construct or IR construct?
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54	should it start with "create" to match the others?
622	rewriter has some helpers to avoid these raw `get` calls.
626	This should use `operands[0]` for the converted operands since this is doing a type conversion.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
38	Discuss runtime refcounting ABI conventions for runtime functions in this comment. And conventions for IR functions that accept/return refcounted objects.
49	Add the explanation from your other review comment here justifying the special treatment of async.execute.
56	nit: typo coutned
63	typo: dialect types are
93	explain why not nested blocks (or leave TODO; also, we should probably signalPassFailure if we encounter uses in nested region)
108	typo: in in
122	findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):
	for (Operation *user : value.getUsers()) {
	  if (user->getParent() == block) {
	    userInTheBlock = user;
	    break;
	  }
	}
Also, recommend putting this in a static helper, per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
212	I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing `while (user->getRegion() != definingRegion)`. That would make this code simpler as well.
244	I would prefer to keep such optimizations in a separate pass. Advantages:
	Easy to show and test tricky cases of this optimization (the current code requires a level of indirection: one has to imagine which ops are inserted, and then removed).
	When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
	Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with a single walk of each block, applying the optimization to all refcounted Values in that block at the same time.
	Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.
(If you want to omit this optimization from the initial patch, that is fine too.)

Add a separate AsyncRefCountingOptimization pass + address PR comments

Herald added a subscriber: mgrang. Nov 18 2020, 1:58 PM

ezhulenev added inline comments. Nov 18 2020, 2:02 PM

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
238	Changed the documentation to reflect the new implementation of automatic reference counting.
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp
54	`createTokenFunctionType` == function type for `createToken` function. Renamed to `addOrDropRefFunctionType` to make it clear that it is for `add_ref` and `drop_ref` ops.
626	Yes, also fixed a similar bug below.
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
93	Added a few lines to explain why ignoring nested regions is ok.
122	`findAncestorOpInBlock` is required to find the last use in the block even if the "real" use is deep inside a nested region.
	%token = ...
	scf.for %i = ... {   <<<----- `scf.for` will be the last user
	  async.await %token : !async.token
	}
	async.drop_ref %token   <<<---- will be added after the last use in the CFG
Cleaned up the code a little bit.
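
The ancestor walk described in this reply can be sketched on a toy op/block tree. The types and the helper name `ancestorInBlock` are hypothetical; the real helper works on `mlir::Operation` and `mlir::Block`.

```cpp
#include <cassert>

struct ToyBlock;

// Toy operation: knows only the block that directly contains it.
struct ToyOp {
  ToyBlock *block;
};

// Toy block: knows the op whose region holds it (nullptr at top level).
struct ToyBlock {
  ToyOp *parentOp;
};

// Walks up from `op` through enclosing ops until reaching one that sits
// directly in `block`. Returns nullptr if `op` is not nested under `block`.
ToyOp *ancestorInBlock(ToyOp *op, ToyBlock *block) {
  while (op && op->block != block)
    op = op->block ? op->block->parentOp : nullptr;
  return op;
}
```

For an `async.await` nested in the body of an `scf.for`, the walk returns the `scf.for`, which is the operation the `drop_ref` is placed after.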
244	I moved it to a separate `async-ref-counting-optimization` pass. It is still not as efficient as it could be, but I added a small preprocessing step + iterate only the blocks that have uses of `value`.

Harbormaster completed remote builds in B79359: Diff 306216. Nov 18 2020, 2:17 PM

Fix a bug in ref counting optimization

Break the loop early if user is after dropRef

ValueUser->UserInfo

Harbormaster completed remote builds in B79368: Diff 306229. Nov 18 2020, 2:59 PM

Harbormaster completed remote builds in B79369: Diff 306230. Nov 18 2020, 3:08 PM

Harbormaster completed remote builds in B79370: Diff 306232. Nov 18 2020, 3:11 PM

Mark symbol declaration private

Harbormaster completed remote builds in B79372: Diff 306239. Nov 18 2020, 3:31 PM

Thanks! This looks great!

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td
249	nit: could -> can
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp
47	nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".
76	typo: yied
mlir/lib/Dialect/Async/Transforms/AsyncRefCountingOptimization.cpp
41	suggest putting this helper in include/Dialect/Async/IR/Async.h; it is used in the other file too.
mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
2	Is it interesting to test `async.execute[%token]`?
56	is scf.if essential to this test case? If not, remove it. if so, describe it in the comment.
59	The input IR here seems strange to me. Will it create a leak if `%arg1 == false`? I don't see a test case that produces IR that looks like this in async-ref-counting.mlir. Perhaps it would be good to add.
65	nit: inconsistency of `CHECK: drop_ref` vs `CHECK: async.drop_ref`
mlir/test/Dialect/Async/async-ref-counting.mlir
146	Is there a missing `CHECK: async.add_ref %[[TOKEN]]` on the line before `%token_0 = async.execute` and a missing `CHECK: async.drop_ref %[[TOKEN_0]]` before the return? (best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there)
mlir/test/Dialect/Async/verify.mlir
25 ↗	(On Diff #306239)	generally we don't test properties verified by traits/interfaces.

This revision is now accepted and ready to land. Nov 19 2020, 6:08 PM

Address PR comments

Thanks for the review!

mlir/test/Dialect/Async/async-ref-counting-optimization.mlir
2	Added a test, it is indeed quite a common pattern with nested async execute operations.
59	I was not really thinking about ref counting correctness when writing these tests :) Added an explicit note to the test where this property is violated.
mlir/test/Dialect/Async/async-ref-counting.mlir
146	Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to a few other tests.

Harbormaster completed remote builds in B79584: Diff 306635. Nov 20 2020, 2:57 AM

Closed by commit rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref… (authored by ezhulenev). Nov 20 2020, 3:08 AM

This revision was automatically updated to reflect the committed changes.

ezhulenev added a commit: rGa86a9b5ef777: [mlir] Automatic reference counting for Async values + runtime support for ref….

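Revision Contents

Path	Size
mlir/include/mlir/Dialect/Async/IR/Async.h	10 lines
mlir/include/mlir/Dialect/Async/IR/AsyncBase.td	4 lines
mlir/include/mlir/Dialect/Async/IR/AsyncOps.td	58 lines
mlir/include/mlir/Dialect/Async/Passes.h	4 lines
mlir/include/mlir/Dialect/Async/Passes.td	14 lines
mlir/include/mlir/ExecutionEngine/AsyncRuntime.h	12 lines
mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-1d.mlir	1 line
mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-2d.mlir	1 line
mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp	64 lines
mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp	324 lines
mlir/lib/Dialect/Async/Transforms/AsyncRefCountingOptimization.cpp	218 lines
mlir/lib/Dialect/Async/Transforms/CMakeLists.txt	2 lines
mlir/lib/ExecutionEngine/AsyncRuntime.cpp	181 lines
mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir	15 lines
mlir/test/Dialect/Async/async-ref-counting-optimization.mlir	113 lines
mlir/test/Dialect/Async/async-ref-counting.mlir	253 lines
mlir/test/Dialect/Async/ops.mlir	14 lines
mlir/test/mlir-cpu-runner/async-group.mlir	3 lines
mlir/test/mlir-cpu-runner/async.mlir	3 lines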

Diff 306641

mlir/include/mlir/Dialect/Async/IR/Async.h

...
 };

 /// The group type to represent async tokens or values grouped together.
 class GroupType : public Type::TypeBase<GroupType, Type, TypeStorage> {
 public:
   using Base::Base;
 };

+// -------------------------------------------------------------------------- //
+// Helper functions of Async dialect transformations.
+// -------------------------------------------------------------------------- //
+
+/// Returns true if the type is reference counted. All async dialect types are
+/// reference counted at runtime.
+inline bool isRefCounted(Type type) {
+  return type.isa<TokenType, ValueType, GroupType>();
+}
+
 } // namespace async
 } // namespace mlir

 #define GET_OP_CLASSES
 #include "mlir/Dialect/Async/IR/AsyncOps.h.inc"

 #include "mlir/Dialect/Async/IR/AsyncOpsDialect.h.inc"

 #endif // MLIR_DIALECT_ASYNC_IR_ASYNC_H

mlir/include/mlir/Dialect/Async/IR/AsyncBase.td

...

 def Async_AnyValueType : DialectType<AsyncDialect,
     CPred<"$_self.isa<::mlir::async::ValueType>()">,
     "async value type">;

 def Async_AnyValueOrTokenType : AnyTypeOf<[Async_AnyValueType,
                                            Async_TokenType]>;

+def Async_AnyAsyncType : AnyTypeOf<[Async_AnyValueType,
+                                    Async_TokenType,
+                                    Async_GroupType]>;
+
 #endif // ASYNC_BASE_TD

mlir/include/mlir/Dialect/Async/IR/AsyncOps.td

 def Async_AwaitAllOp : Async_Op<"await_all", []> {
 ...
   }];

   let arguments = (ins Async_GroupType:$operand);
   let results = (outs);

   let assemblyFormat = "$operand attr-dict";
 }

+//===----------------------------------------------------------------------===//
+// Async Dialect Automatic Reference Counting Operations.
+//===----------------------------------------------------------------------===//
+
+// All async values (values, tokens, groups) are reference counted at runtime
+// and automatically destructed when reference count drops to 0.
+//
+// All values are semantically created with a reference count of +1 and it is
+// the responsibility of the last async value user to drop reference count.
+//
+// Async values created when:
+//   1. Operation returns async result (e.g. the result of an `async.execute`).
+//   2. Async value passed in as a block argument.
+//
+// It is the responsiblity of the async value user to extend the lifetime by
+// adding a +1 reference, if the reference counted value captured by the
+// asynchronously executed region (`async.execute` operation), and drop it after
+// the last nested use.
+//
+// Reference counting operations can be added to the IR using automatic
+// reference count pass, that relies on liveness analysis to find the last uses
+// of all reference counted values and automatically inserts
+// `drop_ref` operations.
+//
+// See `AsyncRefCountingPass` documentation for the implementation details.
+
+def Async_AddRefOp : Async_Op<"add_ref"> {
+  let summary = "adds a reference to async value";
+  let description = [{
+    The `async.add_ref` operation adds a reference(s) to async value (token,
+    value or group).
+  }];
+
+  let arguments = (ins Async_AnyAsyncType:$operand,
+                       Confined<I32Attr, [IntPositive]>:$count);
+  let results = (outs );
+
+  let assemblyFormat = [{
+    $operand attr-dict `:` type($operand)
+  }];
+}
+
+def Async_DropRefOp : Async_Op<"drop_ref"> {
+  let summary = "drops a reference to async value";
+  let description = [{
+    The `async.drop_ref` operation drops a reference(s) to async value (token,
+    value or group).
+  }];
+
+  let arguments = (ins Async_AnyAsyncType:$operand,
+                       Confined<I32Attr, [IntPositive]>:$count);
+  let results = (outs );
+
+  let assemblyFormat = [{
+    $operand attr-dict `:` type($operand)
+  }];
+}
+
 #endif // ASYNC_OPS

mlir/include/mlir/Dialect/Async/Passes.h

...
 #define MLIR_DIALECT_ASYNC_PASSES_H_

 #include "mlir/Pass/Pass.h"

 namespace mlir {

 std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass();

+std::unique_ptr<OperationPass<FuncOp>> createAsyncRefCountingPass();
+
+std::unique_ptr<OperationPass<FuncOp>> createAsyncRefCountingOptimizationPass();
+
 //===----------------------------------------------------------------------===//
 // Registration
 //===----------------------------------------------------------------------===//

 /// Generate the code for registering passes.
 #define GEN_PASS_REGISTRATION
 #include "mlir/Dialect/Async/Passes.h.inc"

 } // namespace mlir

 #endif // MLIR_DIALECT_ASYNC_PASSES_H_

mlir/include/mlir/Dialect/Async/Passes.td

...
   let options = [
     Option<"numConcurrentAsyncExecute", "num-concurrent-async-execute",
            "int32_t", /*default=*/"4",
            "The number of async.execute operations that will be used for concurrent "
            "loop execution.">
   ];
   let dependentDialects = ["async::AsyncDialect", "scf::SCFDialect"];
 }

+def AsyncRefCounting : FunctionPass<"async-ref-counting"> {
+  let summary = "Automatic reference counting for Async dialect data types";
+  let constructor = "mlir::createAsyncRefCountingPass()";
+  let dependentDialects = ["async::AsyncDialect"];
+}
+
+def AsyncRefCountingOptimization :
+    FunctionPass<"async-ref-counting-optimization"> {
+  let summary = "Optimize automatic reference counting operations for the"
+                "Async dialect by removing redundant operations";
+  let constructor = "mlir::createAsyncRefCountingOptimizationPass()";
+  let dependentDialects = ["async::AsyncDialect"];
+}
+
 #endif // MLIR_DIALECT_ASYNC_PASSES

mlir/include/mlir/ExecutionEngine/AsyncRuntime.h

...
 typedef struct AsyncGroup MLIR_AsyncGroup;

 // Async runtime uses LLVM coroutines to represent asynchronous tasks. Task
 // function is a coroutine handle and a resume function that continue coroutine
 // execution from a suspension point.
 using CoroHandle = void *;           // coroutine handle
 using CoroResume = void (*)(void *); // coroutine resume function

+// Async runtime uses reference counting to manage the lifetime of async values
+// (values of async types like tokens, values and groups).
+using RefCountedObjPtr = void *;
+
+// Adds references to reference counted runtime object.
+extern "C" MLIR_ASYNCRUNTIME_EXPORT void
+mlirAsyncRuntimeAddRef(RefCountedObjPtr, int32_t);
+
+// Drops references from reference counted runtime object.
+extern "C" MLIR_ASYNCRUNTIME_EXPORT void
+mlirAsyncRuntimeDropRef(RefCountedObjPtr, int32_t);
+
 // Create a new `async.token` in not-ready state.
 extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncToken *mlirAsyncRuntimeCreateToken();

 // Create a new `async.group` in empty state.
 extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup();

 extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t
 mlirAsyncRuntimeAddTokenToGroup(AsyncToken *, AsyncGroup *);
...
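
The two new entry points in AsyncRuntime.h take an object pointer and a count. One way such a counter could behave is sketched below, assuming an atomic 32-bit counter that starts at +1 and an object that deletes itself when the count reaches zero. `RefCountedSketch` and its `destroyed` flag are hypothetical and exist only for illustration; this is not the code in AsyncRuntime.cpp.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical self-deleting object with an intrusive reference count.
class RefCountedSketch {
public:
  // `destroyedFlag` is set to true just before the object is deleted, so a
  // caller can observe destruction without touching freed memory.
  explicit RefCountedSketch(bool *destroyedFlag) : destroyed(destroyedFlag) {}

  // Adds `count` references.
  void addRef(int32_t count = 1) { refCount.fetch_add(count); }

  // Drops `count` references and deletes the object when none remain.
  void dropRef(int32_t count = 1) {
    int32_t previous = refCount.fetch_sub(count);
    assert(previous >= count && "reference count underflow");
    if (previous == count) {
      *destroyed = true;
      delete this;
    }
  }

private:
  ~RefCountedSketch() = default; // only dropRef may destroy the object

  std::atomic<int32_t> refCount{1}; // objects start with a +1 reference
  bool *destroyed;
};
```

Taking a count in both calls lets one runtime call stand in for several adjacent single-reference updates.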

mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-1d.mlir

 // RUN: mlir-opt %s -async-parallel-for \
+// RUN: -async-ref-counting \
 // RUN: -convert-async-to-llvm \
 // RUN: -convert-scf-to-std \
 // RUN: -convert-std-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: -e entry -entry-point-result=void -O0 \
 // RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
 // RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\
 // RUN: | FileCheck %s --dump-input=always
...

mlir/integration_test/Dialect/Async/CPU/test-async-parallel-for-2d.mlir

 // RUN: mlir-opt %s -async-parallel-for \
+// RUN: -async-ref-counting \
 // RUN: -convert-async-to-llvm \
 // RUN: -convert-scf-to-std \
 // RUN: -convert-std-to-llvm \
 // RUN: | mlir-cpu-runner \
 // RUN: -e entry -entry-point-result=void -O0 \
 // RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
 // RUN: -shared-libs=%mlir_integration_test_dir/libmlir_async_runtime%shlibext\
 // RUN: | FileCheck %s --dump-input=always
...

mlir/lib/Conversion/AsyncToLLVM/AsyncToLLVM.cpp

Show All 27 Lines

// Prefix for functions outlined from `async.execute` op regions.		// Prefix for functions outlined from `async.execute` op regions.
static constexpr const char kAsyncFnPrefix[] = "async_execute_fn";		static constexpr const char kAsyncFnPrefix[] = "async_execute_fn";

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Async Runtime C API declaration.		// Async Runtime C API declaration.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		static constexpr const char *kAddRef = "mlirAsyncRuntimeAddRef";
		static constexpr const char *kDropRef = "mlirAsyncRuntimeDropRef";
static constexpr const char *kCreateToken = "mlirAsyncRuntimeCreateToken";		static constexpr const char *kCreateToken = "mlirAsyncRuntimeCreateToken";
static constexpr const char *kCreateGroup = "mlirAsyncRuntimeCreateGroup";		static constexpr const char *kCreateGroup = "mlirAsyncRuntimeCreateGroup";
static constexpr const char *kEmplaceToken = "mlirAsyncRuntimeEmplaceToken";		static constexpr const char *kEmplaceToken = "mlirAsyncRuntimeEmplaceToken";
static constexpr const char *kAwaitToken = "mlirAsyncRuntimeAwaitToken";		static constexpr const char *kAwaitToken = "mlirAsyncRuntimeAwaitToken";
static constexpr const char *kAwaitGroup = "mlirAsyncRuntimeAwaitAllInGroup";		static constexpr const char *kAwaitGroup = "mlirAsyncRuntimeAwaitAllInGroup";
static constexpr const char *kExecute = "mlirAsyncRuntimeExecute";		static constexpr const char *kExecute = "mlirAsyncRuntimeExecute";
static constexpr const char *kAddTokenToGroup =		static constexpr const char *kAddTokenToGroup =
"mlirAsyncRuntimeAddTokenToGroup";		"mlirAsyncRuntimeAddTokenToGroup";
static constexpr const char *kAwaitAndExecute =		static constexpr const char *kAwaitAndExecute =
"mlirAsyncRuntimeAwaitTokenAndExecute";		"mlirAsyncRuntimeAwaitTokenAndExecute";
static constexpr const char *kAwaitAllAndExecute =		static constexpr const char *kAwaitAllAndExecute =
"mlirAsyncRuntimeAwaitAllInGroupAndExecute";		"mlirAsyncRuntimeAwaitAllInGroupAndExecute";

namespace {		namespace {
// Async Runtime API function types.		// Async Runtime API function types.
struct AsyncAPI {		struct AsyncAPI {
		static FunctionType addOrDropRefFunctionType(MLIRContext *ctx) {
		silvas: should it start with "create" to match the others?
		ezhulenev (author): `createTokenFunctionType` == function type for `createToken` function. Renamed to `addOrDropRefFunctionType` to make it clear that it is for `add_ref` and `drop_ref` ops.
		auto ref = LLVM::LLVMType::getInt8PtrTy(ctx);
		auto count = IntegerType::get(32, ctx);
		return FunctionType::get({ref, count}, {}, ctx);
		}

static FunctionType createTokenFunctionType(MLIRContext *ctx) {		static FunctionType createTokenFunctionType(MLIRContext *ctx) {
return FunctionType::get({}, {TokenType::get(ctx)}, ctx);		return FunctionType::get({}, {TokenType::get(ctx)}, ctx);
}		}

static FunctionType createGroupFunctionType(MLIRContext *ctx) {		static FunctionType createGroupFunctionType(MLIRContext *ctx) {
return FunctionType::get({}, {GroupType::get(ctx)}, ctx);		return FunctionType::get({}, {GroupType::get(ctx)}, ctx);
}		}

[48 lines not shown]
	static void addAsyncRuntimeApiDeclarations(ModuleOp module) {

auto addFuncDecl = [&](StringRef name, FunctionType type) {		auto addFuncDecl = [&](StringRef name, FunctionType type) {
if (module.lookupSymbol(name))		if (module.lookupSymbol(name))
return;		return;
builder.create<FuncOp>(module.getLoc(), name, type).setPrivate();		builder.create<FuncOp>(module.getLoc(), name, type).setPrivate();
};		};

MLIRContext *ctx = module.getContext();		MLIRContext *ctx = module.getContext();
		addFuncDecl(kAddRef, AsyncAPI::addOrDropRefFunctionType(ctx));
		addFuncDecl(kDropRef, AsyncAPI::addOrDropRefFunctionType(ctx));
addFuncDecl(kCreateToken, AsyncAPI::createTokenFunctionType(ctx));		addFuncDecl(kCreateToken, AsyncAPI::createTokenFunctionType(ctx));
addFuncDecl(kCreateGroup, AsyncAPI::createGroupFunctionType(ctx));		addFuncDecl(kCreateGroup, AsyncAPI::createGroupFunctionType(ctx));
addFuncDecl(kEmplaceToken, AsyncAPI::emplaceTokenFunctionType(ctx));		addFuncDecl(kEmplaceToken, AsyncAPI::emplaceTokenFunctionType(ctx));
addFuncDecl(kAwaitToken, AsyncAPI::awaitTokenFunctionType(ctx));		addFuncDecl(kAwaitToken, AsyncAPI::awaitTokenFunctionType(ctx));
addFuncDecl(kAwaitGroup, AsyncAPI::awaitGroupFunctionType(ctx));		addFuncDecl(kAwaitGroup, AsyncAPI::awaitGroupFunctionType(ctx));
addFuncDecl(kExecute, AsyncAPI::executeFunctionType(ctx));		addFuncDecl(kExecute, AsyncAPI::executeFunctionType(ctx));
addFuncDecl(kAddTokenToGroup, AsyncAPI::addTokenToGroupFunctionType(ctx));		addFuncDecl(kAddTokenToGroup, AsyncAPI::addTokenToGroupFunctionType(ctx));
addFuncDecl(kAwaitAndExecute, AsyncAPI::awaitAndExecuteFunctionType(ctx));		addFuncDecl(kAwaitAndExecute, AsyncAPI::awaitAndExecuteFunctionType(ctx));
addFuncDecl(kAwaitAllAndExecute, AsyncAPI::awaitAllAndExecuteFunctionType(ctx));		addFuncDecl(kAwaitAllAndExecute,
		AsyncAPI::awaitAllAndExecuteFunctionType(ctx));
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// LLVM coroutines intrinsics declarations.		// LLVM coroutines intrinsics declarations.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static constexpr const char *kCoroId = "llvm.coro.id";		static constexpr const char *kCoroId = "llvm.coro.id";
static constexpr const char *kCoroSizeI64 = "llvm.coro.size.i64";		static constexpr const char *kCoroSizeI64 = "llvm.coro.size.i64";
[451 lines not shown]
	rewriter.replaceOpWithNewOp<CallOp>(op, resultTypes, call.callee(),
call.getOperands());		call.getOperands());

return success();		return success();
}		}
};		};
} // namespace		} // namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// Async reference counting ops lowering (`async.add_ref` and `async.drop_ref`
		// to the corresponding API calls).
		//===----------------------------------------------------------------------===//

		namespace {

		template <typename RefCountingOp>
		class RefCountingOpLowering : public ConversionPattern {
		public:
		explicit RefCountingOpLowering(MLIRContext *ctx, StringRef apiFunctionName)
		: ConversionPattern(RefCountingOp::getOperationName(), 1, ctx),
		apiFunctionName(apiFunctionName) {}

		LogicalResult
		matchAndRewrite(Operation *op, ArrayRef<Value> operands,
		ConversionPatternRewriter &rewriter) const override {
		RefCountingOp refCountingOp = cast<RefCountingOp>(op);

		auto count = rewriter.create<ConstantOp>(
		ftynse: You probably want to take the operand from `operands` rather than from the op directly in case it was modified by another pattern. `AddRefOpAdaptor` is an autogenerated class that is constructible from `ArrayRef<Value>` and provides an API similar to the Op it models, i.e. you can call `adaptor.operand()`.
		ezhulenev (author): Wouldn't the changes also be visible through the op? From the autogenerated code it seems that they are identical:
		    ::mlir::Value AddRefOpAdaptor::operand() { return getODSOperands(0).begin(); }
		vs
		    ::mlir::Operation::operand_range AddRefOp::getODSOperands(unsigned index) {
		      auto valueRange = getODSOperandIndexAndLength(index);
		      return {std::next(getOperation()->operand_begin(), valueRange.first),
		              std::next(getOperation()->operand_begin(), valueRange.first + valueRange.second)};
		    }
		    ::mlir::Value AddRefOp::operand() { return getODSOperands(0).begin(); }
		ftynse: No, they will not be visible. Conversion almost never changes operations in-place. `replaceOpWithNewOp` and the like inject a new op, and keep the old op until the conversion completes in case one needs to examine the original op or its operand. The list of the operands to the op being rewritten is formed by combining the results of the new ops if they were rewritten and existing ops if they were not. This is why we pass `operands` into `matchAndRewrite`, otherwise it would have been a useless copy of `op->getOperands()`.
		op->getLoc(), rewriter.getI32Type(),
		silvas: rewriter has some helpers to avoid these raw `get` calls.
		rewriter.getI32IntegerAttr(refCountingOp.count()));

		rewriter.replaceOpWithNewOp<CallOp>(op, TypeRange(), apiFunctionName,
		ValueRange({operands[0], count}));
		silvas: This should use `operands[0]` for the converted operands since this is doing a type conversion.
		ezhulenev (author): Yes, also fixed a similar bug below.

		herhut: Why not produce the `ValueRange` in place from the two arguments?
		return success();
		}

		private:
		StringRef apiFunctionName;
		};
		ftynse: Could we do something like
		    template <typename OpTy>
		    class RefToCallLoweringPattern : public OpConversionPattern<OpTy> {
		      RefToCallLoweringPattern(MLIRContext ctx, StringRef funcName)
		          : OpConversionPattern<OpTy>(ctx), funcName(funcName) {}
		      matchAndRewrite(...) {
		        ...
		        rewriter.replaceOpWithNewOp<CallOp>(op, Type(), funcName, ValueRange(args));
		      }
		    };
		and remove duplicate code?

		// async.add_ref op lowering to mlirAsyncRuntimeAddRef function call.
		class AddRefOpLowering : public RefCountingOpLowering<AddRefOp> {
		public:
		explicit AddRefOpLowering(MLIRContext *ctx)
		: RefCountingOpLowering(ctx, kAddRef) {}
		};

		// async.drop_ref op lowering to mlirAsyncRuntimeDropRef function call.
		class DropRefOpLowering : public RefCountingOpLowering<DropRefOp> {
		public:
		explicit DropRefOpLowering(MLIRContext *ctx)
		: RefCountingOpLowering(ctx, kDropRef) {}
		};

		} // namespace

		//===----------------------------------------------------------------------===//
// async.create_group op lowering to mlirAsyncRuntimeCreateGroup function call.		// async.create_group op lowering to mlirAsyncRuntimeCreateGroup function call.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
class CreateGroupOpLowering : public ConversionPattern {		class CreateGroupOpLowering : public ConversionPattern {
public:		public:
explicit CreateGroupOpLowering(MLIRContext *ctx)		explicit CreateGroupOpLowering(MLIRContext *ctx)
: ConversionPattern(CreateGroupOp::getOperationName(), 1, ctx) {}		: ConversionPattern(CreateGroupOp::getOperationName(), 1, ctx) {}
[189 lines not shown]
	void ConvertAsyncToLLVMPass::runOnOperation() {
MLIRContext *ctx = &getContext();		MLIRContext *ctx = &getContext();

// Convert async dialect types and operations to LLVM dialect.		// Convert async dialect types and operations to LLVM dialect.
AsyncRuntimeTypeConverter converter;		AsyncRuntimeTypeConverter converter;
OwningRewritePatternList patterns;		OwningRewritePatternList patterns;

populateFuncOpTypeConversionPattern(patterns, ctx, converter);		populateFuncOpTypeConversionPattern(patterns, ctx, converter);
patterns.insert<CallOpOpConversion>(ctx);		patterns.insert<CallOpOpConversion>(ctx);
		patterns.insert<AddRefOpLowering, DropRefOpLowering>(ctx);
patterns.insert<CreateGroupOpLowering, AddToGroupOpLowering>(ctx);		patterns.insert<CreateGroupOpLowering, AddToGroupOpLowering>(ctx);
patterns.insert<AwaitOpLowering, AwaitAllOpLowering>(ctx, outlinedFunctions);		patterns.insert<AwaitOpLowering, AwaitAllOpLowering>(ctx, outlinedFunctions);

ConversionTarget target(*ctx);		ConversionTarget target(*ctx);
		target.addLegalOp<ConstantOp>();
		ftynse: I would recommend to make ConstantOp legal, not the whole StandardDialect, which has lots of different things.
target.addLegalDialect<LLVM::LLVMDialect>();		target.addLegalDialect<LLVM::LLVMDialect>();
target.addIllegalDialect<AsyncDialect>();		target.addIllegalDialect<AsyncDialect>();
target.addDynamicallyLegalOp<FuncOp>(		target.addDynamicallyLegalOp<FuncOp>(
[&](FuncOp op) { return converter.isSignatureLegal(op.getType()); });		[&](FuncOp op) { return converter.isSignatureLegal(op.getType()); });
target.addDynamicallyLegalOp<CallOp>(		target.addDynamicallyLegalOp<CallOp>(
[&](CallOp op) { return converter.isLegal(op.getResultTypes()); });		[&](CallOp op) { return converter.isLegal(op.getResultTypes()); });

if (failed(applyPartialConversion(module, target, std::move(patterns))))		if (failed(applyPartialConversion(module, target, std::move(patterns))))
signalPassFailure();		signalPassFailure();
}		}
} // namespace		} // namespace

std::unique_ptr<OperationPass<ModuleOp>> mlir::createConvertAsyncToLLVMPass() {		std::unique_ptr<OperationPass<ModuleOp>> mlir::createConvertAsyncToLLVMPass() {
return std::make_unique<ConvertAsyncToLLVMPass>();		return std::make_unique<ConvertAsyncToLLVMPass>();
}		}
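
For orientation, the lowering above turns `async.add_ref` and `async.drop_ref` into calls that take an opaque pointer and a 32-bit count. The following is a minimal sketch of what the receiving side of such calls could look like. It is not the code from AsyncRuntime.cpp: the `sketch::RefCounted` and `sketch::Token` classes and the `sketchAddRef`/`sketchDropRef` functions are hypothetical names used only for illustration, and the only thing taken from the diff is the `(pointer, i32 count)` signature.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical base class for runtime objects. An object is created holding
// one reference, matching the "+1 on creation" convention in the summary.
class RefCounted {
public:
  explicit RefCounted(int32_t initial = 1) : refCount(initial) {}
  virtual ~RefCounted() = default;

  // Adds `count` references.
  void addRef(int32_t count) {
    refCount.fetch_add(count, std::memory_order_relaxed);
  }

  // Drops `count` references and destroys the object when none remain.
  void dropRef(int32_t count) {
    int32_t previous = refCount.fetch_sub(count, std::memory_order_acq_rel);
    assert(previous >= count && "reference count underflow");
    if (previous == count)
      delete this;
  }

private:
  std::atomic<int32_t> refCount;
};

// Hypothetical token type that records its own destruction for testing.
struct Token : RefCounted {
  explicit Token(bool *destroyed) : destroyed(destroyed) {}
  ~Token() override { *destroyed = true; }
  bool *destroyed;
};

} // namespace sketch

// C entry points with the (opaque pointer, i32 count) shape used by the
// lowering. The pointer must have been obtained from a `RefCounted *`.
extern "C" void sketchAddRef(void *ptr, int32_t count) {
  static_cast<sketch::RefCounted *>(ptr)->addRef(count);
}

extern "C" void sketchDropRef(void *ptr, int32_t count) {
  static_cast<sketch::RefCounted *>(ptr)->dropRef(count);
}
```

Taking a count argument rather than always adjusting by one lets a later optimization fold several adjacent `add_ref` or `drop_ref` operations into a single call.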

mlir/lib/Dialect/Async/Transforms/AsyncRefCounting.cpp

This file was added.

				//===- AsyncRefCounting.cpp - Implementation of Async Ref Counting --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements automatic reference counting for Async dialect data
				// types.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Analysis/Liveness.h"
				#include "mlir/Dialect/Async/IR/Async.h"
				#include "mlir/Dialect/Async/Passes.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/IR/PatternMatch.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
				#include "llvm/ADT/SmallSet.h"

				using namespace mlir;
				using namespace mlir::async;

				#define DEBUG_TYPE "async-ref-counting"

				namespace {

				class AsyncRefCountingPass : public AsyncRefCountingBase<AsyncRefCountingPass> {
				public:
				AsyncRefCountingPass() = default;
				void runOnFunction() override;

				private:
				/// Adds automatic reference counting to the `value`.
				///
				/// All values are semantically created with a reference count of +1 and it is
				ftynse: MLIR uses `///` for top-level comments.
				herhut: Nit: are.
				silvas: Discuss runtime refcounting ABI conventions for runtime functions in this comment. And conventions for IR functions that accept/return refcounted objects.
				/// the responsibility of the last async value user to drop reference count.
				///
				rriddle: Missing static on all of these?
				/// Async values are created when:
				/// 1. An operation returns an async result (e.g. the result of an
				/// `async.execute`).
				/// 2. An async value is passed in as a block argument.
				///
				/// To implement automatic reference counting, we must insert a +1 reference
				/// before each `async.execute` operation using the value, and drop it after
				silvas: nit: "it is the responsibility of the async value user" seems to imply that it is not this pass's responsibility. Suggest "To implement automatic reference counting, we must insert a +1 reference before each Operation using the value".
				/// the last use inside the async body region (we currently drop the reference
				/// before the `async.yield` terminator).
				silvas: Add the explanation from your other review comment here justifying the special treatment of async.execute.
				///
				/// Automatic reference counting algorithm outline:
				///
				/// 1. `ReturnLike` operations forward the reference counted values without
				/// modifying the reference count.
				///
				/// 2. Use liveness analysis to find blocks in the CFG where the lifetime of
				silvas: nit: typo coutned
				/// reference counted values ends, and insert `drop_ref` operations after
				/// the last use of the value.
				///
				/// 3. Insert `add_ref` before the `async.execute` operation capturing the
				/// value, and pairing `drop_ref` before the async body region terminator,
				/// to release the captured reference counted value when execution
				/// completes.
				silvas: typo: dialect types are
				///
				/// 4. If the reference counted value is passed only to some of the block
				ftynse: Out of scope: I am interested in seeing this as a generic OpInterface, just yesterday the need for this popped up in another discussion.
				ezhulenev (author): Yeah, seems like a useful property in many contexts. Will leave it for the followup.
				/// successors, insert `drop_ref` operations in the beginning of the blocks
				/// that do not have reference counted value uses.
				///
				///
				/// Example:
				///
				/// %token = ...
				/// async.execute {
				/// async.await %token : !async.token // await #1
				/// async.yield
				/// }
				silvas: typo: yied
				/// async.await %token : !async.token // await #2
				///
				/// Based on the liveness analysis await #2 is the last use of the %token,
				/// however the execution of the async region can be delayed, and to guarantee
				/// that the %token is still alive when await #1 executes we need to
				/// explicitly extend its lifetime using `add_ref` operation.
				///
				/// After automatic reference counting:
				///
				/// %token = ...
				///
				/// // Make sure that %token is alive inside async.execute.
				/// async.add_ref %token {count = 1 : i32} : !async.token
				///
				/// async.execute {
				/// async.await %token : !async.token // await #1
				///
				silvas: explain why not nested blocks (or leave TODO; also, we should probably signalPassFailure if we encounter uses in nested region)
				ezhulenev (author): Added a few lines to explain why ignoring nested regions is ok.
				/// // Drop the extra reference added to keep %token alive.
				/// async.drop_ref %token {count = 1 : i32} : !async.token
				///
				/// async.yield
				/// }
				/// async.await %token : !async.token // await #2
				///
				/// // Drop the reference after the last use of %token.
				/// async.drop_ref %token {count = 1 : i32} : !async.token
				///
				LogicalResult addAutomaticRefCounting(Value value);
				};

				} // namespace
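
The example in the comment above can be replayed without MLIR to see why the extra `add_ref` matters. The following is a toy model under stated assumptions, not the pass or the runtime: `sketch::Token`, `addRef`, `dropRef` and `tokenAliveAtAwait1` are hypothetical names, and the async body is modelled as a deferred callback that runs only after await #2 and its `drop_ref` have already executed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

namespace sketch {

// Toy token: only the reference count is modelled.
struct Token {
  int32_t refCount = 1; // values are created with a reference count of +1
  bool alive() const { return refCount > 0; }
};

void addRef(Token &token, int32_t count) { token.refCount += count; }

void dropRef(Token &token, int32_t count) {
  assert(token.refCount >= count && "reference count underflow");
  token.refCount -= count;
}

// Replays the example with the async body delayed until after await #2.
// Returns whether the token was still alive when await #1 ran.
bool tokenAliveAtAwait1(bool insertAddRef) {
  Token token; // %token = ...
  std::vector<std::function<void()>> deferred;
  bool aliveAtAwait1 = false;

  // async.add_ref before async.execute (step 3 of the outline).
  if (insertAddRef)
    addRef(token, 1);

  // Body of async.execute, scheduled to run later.
  deferred.push_back([&] {
    aliveAtAwait1 = token.alive(); // await #1
    if (insertAddRef)
      dropRef(token, 1); // drop_ref before async.yield
  });

  // await #2 is the last use found by liveness, so drop_ref follows it.
  dropRef(token, 1);

  // The delayed async body finally runs.
  for (auto &task : deferred)
    task();
  return aliveAtAwait1;
}

} // namespace sketch
```

With `insertAddRef` set to `false` the count reaches zero before the deferred body runs, which is the use-after-release that the paired `add_ref`/`drop_ref` is meant to prevent.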

				silvas: typo: in in
				LogicalResult AsyncRefCountingPass::addAutomaticRefCounting(Value value) {
				MLIRContext *ctx = value.getContext();
				OpBuilder builder(ctx);

				// Set insertion point after the operation producing a value, or at the
				// beginning of the block if the value is defined by the block argument.
				if (Operation *op = value.getDefiningOp())
				builder.setInsertionPointAfter(op);
				else
				builder.setInsertionPointToStart(value.getParentBlock());

				Location loc = value.getLoc();
				auto i32 = IntegerType::get(32, ctx);

				silvas: findAncestorOpInBlock is tricky. Can you do this? (or leave a comment explaining the tricky case):
				    for (Operation user : value.getUsers()) {
				      if (user->getParent() == block) {
				        userInTheBlock = user;
				        break;
				      }
				    }
				Also, recommend putting this in a static helper, per https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code
				ezhulenev (author): `findAncestorOpInBlock` is required to find the last use in the block even if the "real" use is deep inside a nested region.
				    %token = ...
				    scf.for %i = ... {        <<<----- `scf.for` will be the last user
				      async.await %token : !async.token
				    }
				    async.drop_ref %token     <<<---- will be added after the last use in the CFG
				Cleaned up the code a little bit.
				// Drop the reference count immediately if the value has no uses.
				if (value.getUses().empty()) {
				builder.create<DropRefOp>(loc, value, IntegerAttr::get(i32, 1));
				return success();
				}

				// Use liveness analysis to find the placement of `drop_ref` operation.
				auto liveness = getAnalysis<Liveness>();

				// We analyse only the blocks of the region that defines the `value`, and do
				// not check nested blocks attached to operations.
				//
				// By analyzing only the `definingRegion` CFG we potentially lose an
				// opportunity to drop the reference count earlier and can extend the lifetime
				// of reference counted value longer than is really required.
				//
				// We also assume that all nested regions finish their execution before the
				// completion of the owner operation. The only exception to this rule is
				// `async.execute` operation, which is handled explicitly below.
				Region *definingRegion = value.getParentRegion();

				// ------------------------------------------------------------------------ //
				ftynse: Any particular reason for using 32bit integers for refcount? In this struct, it may not even save space because the compiler will insert padding.
				ezhulenev (author): Not really, just to match the type of the `count` arg in add_ref/drop_ref ops, but that choice is also arbitrary.
				// Find blocks where the `value` dies: the value is in `liveIn` set and not
				// in the `liveOut` set. We place `drop_ref` immediately after the last use
				// of the `value` in such blocks.
				// ------------------------------------------------------------------------ //

				// Last users of the `value` inside all blocks where the value dies.
				llvm::SmallSet<Operation *, 4> lastUsers;

				for (Block &block : definingRegion->getBlocks()) {
				const LivenessBlockInfo *blockLiveness = liveness.getLiveness(&block);

				// Value in live input set or was defined in the block.
				bool liveIn = blockLiveness->isLiveIn(value) ||
				blockLiveness->getBlock() == value.getParentBlock();
				if (!liveIn)
				continue;

				// Value is in the live out set.
				bool liveOut = blockLiveness->isLiveOut(value);
				if (liveOut)
				continue;

				// We proved that `value` dies in the `block`. Now find the last use of the
				// `value` inside the `block`.

				// Find any user of the `value` inside the block (including uses in nested
				herhut: I would argue for not having the users consume reference counts, as this makes it impossible to optimize the decrement operations in IR (they are tied to the ops). For instance, if you had `inc_rc` and `dec_rc` explicit, and both were in a loop, you could hoist the increments and sink the decrements, removing the overhead from the loop. That might be a better way to optimize this in general. First insert all increments and decrements trivially where needed (the buffer deallocation pass could do this for you, see my comment on other CL) and then have a pass that pushes increments and decrements up/down, combining them where possible. Seems less fragile and would work with existing interfaces for region control flow. It would also allow to pass async values to operations that do not implement the reference counting consumer interface.
				ezhulenev (author): FWIW Swift SIL has all reference counting explicit (https://github.com/apple/swift/blob/main/docs/ARCOptimization.rst). There are two types of ref-counted value users:
				    "forwarding": std.return, function call arg - they do not change the ref count
				    "consumers": everything else.
				Async automatic ref counting will need to either have a closed set of supported users, or rely on op interfaces to distinguish between user types.
				ezhulenev (author): And there are also operations like `mlirAsyncRuntimeAddTokenToGroup` that consume a reference at some indeterminate point in the future, so if the IR has `drop_ref`, then the operation will need to have `add_ref` to compensate for that, or be marked as "forwarding" (reference counting responsibility forwarded to the runtime).
				silvas: It is unclear what "dynamic operation" means in this context and why scf.for is the "innermost". Can you adjust the comment? I also don't understand "Inside this operation statically known number of uses is 1" - if %cond is false it will be 0.
				ezhulenev (author): I've pushed a new revision based on liveness analysis and explicit `drop_ref` instead of implicit "ref consumer".
				// regions attached to the operations in the block).
				Operation *userInTheBlock = nullptr;
				for (Operation *user : value.getUsers()) {
				userInTheBlock = block.findAncestorOpInBlock(*user);
				if (userInTheBlock)
				break;
				}

				// Values with zero users are handled explicitly in the beginning; if the
				// value dies in the block it must have at least one use in the block.
				assert(userInTheBlock && "value must have a user in the block");
				silvas: nit: looks like line wrapping here forgot to insert `//`. Same on the async.drop_ref below.

				// Find the last user of the `value` in the block.
				Operation *lastUser = blockLiveness->getEndOperation(value, userInTheBlock);
				assert(lastUsers.count(lastUser) == 0 && "last users must be unique");
				lastUsers.insert(lastUser);
				}

				// Process all the last users of the `value` inside each block where the value
				// dies.
				for (Operation *lastUser : lastUsers) {
				// Return like operations forward reference count.
				if (lastUser->hasTrait<OpTrait::ReturnLike>())
				silvas: Why only ExecuteOp? Why not use NumberOfExecutions?
				ezhulenev (author): Because operations after the `async.execute` can be executed before the operations nested under the `async.execute`; this is currently the only operation that has this property. Example:
				    %token = ...
				    async.execute {
				      async.await %token : !async.token // await #1
				      async.yield
				    }
				    async.await %token : !async.token // await #2
				It is impossible to determine which of the `async.await` operations will be the "last use" at runtime. Ref counting will pick the second await as the last user and will create `drop_ref` after it; however, if the first await is executed later, it needs to keep the `token` alive.
				continue;

				// We can't currently handle other types of terminators.
				if (lastUser->hasTrait<OpTrait::IsTerminator>())
				return lastUser->emitError() << "async reference counting can't handle "
				"terminators that are not ReturnLike";

				// Add a drop_ref immediately after the last user.
				builder.setInsertionPointAfter(lastUser);
				builder.create<DropRefOp>(loc, value, IntegerAttr::get(i32, 1));
				}

				// ------------------------------------------------------------------------ //
				// Find blocks where the `value` is in `liveOut` set, however it is not in
				// the `liveIn` set of all successors. If the `value` is not in the successor
				// `liveIn` set, we add a `drop_ref` to the beginning of it.
				// ------------------------------------------------------------------------ //

				// Successors that need a `drop_ref` for the `value`.
				silvas: I think you can avoid findAncestorBlockInRegion/findAncestorOpInBlock by just doing `while (user->getRegion() != definingRegion)`. That would make this code simpler as well.
				llvm::SmallSet<Block *, 4> dropRefSuccessors;

				for (Block &block : definingRegion->getBlocks()) {
				const LivenessBlockInfo *blockLiveness = liveness.getLiveness(&block);

				// Skip the block if value is not in the `liveOut` set.
				if (!blockLiveness->isLiveOut(value))
				continue;

				// Find successors that do not have `value` in the `liveIn` set.
				for (Block *successor : block.getSuccessors()) {
				const LivenessBlockInfo *succLiveness = liveness.getLiveness(successor);

				if (!succLiveness->isLiveIn(value))
				dropRefSuccessors.insert(successor);
				}
				}

				// Drop reference in all successor blocks that do not have the `value` in
				// their `liveIn` set.
				for (Block *dropRefSuccessor : dropRefSuccessors) {
				builder.setInsertionPointToStart(dropRefSuccessor);
				builder.create<DropRefOp>(loc, value, IntegerAttr::get(i32, 1));
				}
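
The two block-level rules used so far (drop after the last use where the value dies, and drop at the start of successors that never see the value) can be sketched on a plain CFG description. This is an illustrative model only: `sketch::BlockInfo`, `sketch::DropPlacement` and `sketch::placeDropRefs` are hypothetical names, the liveness facts are supplied by hand rather than computed by `mlir::Liveness`, and the handling of terminators and `async.execute` is left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

namespace sketch {

// Hand-written liveness facts about one value for a single block.
struct BlockInfo {
  bool definesValue = false;   // the value is defined in this block
  bool liveIn = false;         // the value is in the block's live-in set
  bool liveOut = false;        // the value is in the block's live-out set
  std::vector<int> successors; // indices of successor blocks
};

// Where `drop_ref` operations would be placed, as block indices.
struct DropPlacement {
  std::set<int> afterLastUse; // blocks where the value dies
  std::set<int> atBlockStart; // successors that do not receive the value
};

DropPlacement placeDropRefs(const std::vector<BlockInfo> &blocks) {
  DropPlacement result;
  for (int i = 0; i < static_cast<int>(blocks.size()); ++i) {
    const BlockInfo &block = blocks[i];

    // The value dies here: it is live in (or defined here) and not live out.
    if ((block.liveIn || block.definesValue) && !block.liveOut)
      result.afterLastUse.insert(i);

    // The value leaves this block alive, but some successors may not take it.
    if (!block.liveOut)
      continue;
    for (int successor : block.successors)
      if (!blocks[successor].liveIn)
        result.atBlockStart.insert(successor);
  }
  return result;
}

} // namespace sketch
```

For a block that branches to one successor using the value and one that does not, the first successor gets a `drop_ref` after its last use and the second gets one at its start, so every path releases the reference exactly once.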

				// ------------------------------------------------------------------------ //
				// Find all `async.execute` operations that take `value` as an operand
				// (dependency token or async value), or capture it implicitly in the nested
				// region. Each `async.execute` operation will require an `add_ref` operation
				// to keep all captured values alive until it finishes its execution.
				// ------------------------------------------------------------------------ //

				silvas (inline comment, Done): I would prefer to keep such optimizations in a separate pass. Advantages:
				1. Easy to show and test tricky cases of this optimization (the current code requires a level of indirection -- one has to imagine which ops are inserted, and then removed).
				2. When debugging a miscompile, it is easier to bisect by removing an optimization pass which should not affect correctness.
				3. Can do this more efficiently. The current algorithm is O(BlockSize^3); many ML programs are single blocks of >1000 ops. I think this algorithm can be replaced with a single walk of each block, applying the optimization to all refcounted Values in that block at the same time.
				4. Makes test cases for this pass clearer because users can see all the ops inserted and follow along with the code.
				(If you want to omit this optimization from the initial patch, that is fine too.)
				ezhulenev (author, inline comment, Done): I moved it to a separate `async-ref-counting-optimization` pass. It is still not as efficient as it could be, but I added a small preprocessing step + iterate only the blocks that have uses of `value`.
				llvm::SmallSet<ExecuteOp, 4> executeOperations;

				auto trackAsyncExecute = [&](Operation *op) {
				if (auto execute = dyn_cast<ExecuteOp>(op))
				executeOperations.insert(execute);
				};

				for (Operation *user : value.getUsers()) {
				// Follow parent operations up until the operation in the `definingRegion`.
				while (user->getParentRegion() != definingRegion) {
				trackAsyncExecute(user);
				user = user->getParentOp();
				assert(user != nullptr && "value user lies outside of the value region");
				}

				// Don't forget to process the parent in the `definingRegion` (can be the
				// original user operation itself).
				trackAsyncExecute(user);
				}

				// Process all `async.execute` operations capturing `value`.
				for (ExecuteOp execute : executeOperations) {
				// Add a reference before the execute operation to keep the reference
				// counted value alive until the async region completes execution.
				builder.setInsertionPoint(execute.getOperation());
				builder.create<AddRefOp>(loc, value, IntegerAttr::get(i32, 1));

				// Drop the reference inside the async region before completion.
				OpBuilder executeBuilder = OpBuilder::atBlockTerminator(execute.getBody());
				silvas (inline comment, Done): nit: you might want to clarify somewhere that when you say "instances" here, it is "per instance of `result`'s owner".
				executeBuilder.create<DropRefOp>(loc, value, IntegerAttr::get(i32, 1));
				}

				ftynse (inline comment, Done): 19 looks very unconventional. We usually try to estimate what would be the common "small" number of entries and round it up to a power of two.
				ezhulenev (author, inline comment, Done): That was a typo, it was supposed to be 10 :) Changed to 8 here and below, because that seems like a reasonable upper bound for number of uses for an async value.
				return success();
				}

				void AsyncRefCountingPass::runOnFunction() {
				FuncOp func = getFunction();

				// Check that we do not have explicit `add_ref` or `drop_ref` in the IR
				// because otherwise automatic reference counting will produce incorrect
				// results.
				WalkResult refCountingWalk = func.walk([&](Operation *op) -> WalkResult {
				if (isa<AddRefOp, DropRefOp>(op))
				return op->emitError() << "explicit reference counting is not supported";
				return WalkResult::advance();
				});

				if (refCountingWalk.wasInterrupted())
				signalPassFailure();

				// Add reference counting to block arguments.
				WalkResult blockWalk = func.walk([&](Block *block) -> WalkResult {
				for (BlockArgument arg : block->getArguments())
				if (isRefCounted(arg.getType()))
				if (failed(addAutomaticRefCounting(arg)))
				return WalkResult::interrupt();

				return WalkResult::advance();
				});

				if (blockWalk.wasInterrupted())
				signalPassFailure();

				// Add reference counting to operation results.
				WalkResult opWalk = func.walk([&](Operation *op) -> WalkResult {
				for (unsigned i = 0; i < op->getNumResults(); ++i)
				if (isRefCounted(op->getResultTypes()[i]))
				if (failed(addAutomaticRefCounting(op->getResult(i))))
				return WalkResult::interrupt();

				return WalkResult::advance();
				});

				if (opWalk.wasInterrupted())
				signalPassFailure();
				}

				std::unique_ptr<OperationPass<FuncOp>> mlir::createAsyncRefCountingPass() {
				return std::make_unique<AsyncRefCountingPass>();
				}
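
The successor-block rule used above (a block has the value in its `liveOut` set, but some successor does not have it in `liveIn`) can be illustrated without MLIR. The following is a minimal sketch and not the code from this patch: `ToyCfg`, `ToyLiveness`, `findDropRefSuccessors`, and `conditionalAwaitExample` are hypothetical names introduced here, and plain strings stand in for `Block *`. The liveness sets are supplied by hand rather than computed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG: block name -> names of successor blocks.
using ToyCfg = std::map<std::string, std::vector<std::string>>;

// Hypothetical liveness facts for ONE reference counted value.
struct ToyLiveness {
  std::set<std::string> liveIn;  // blocks where the value is live on entry
  std::set<std::string> liveOut; // blocks where the value is live on exit
};

// Returns the blocks that need a `drop_ref` at their beginning: successors of
// a block with the value in `liveOut` that do not have the value in `liveIn`.
std::set<std::string> findDropRefSuccessors(const ToyCfg &cfg,
                                            const ToyLiveness &live) {
  std::set<std::string> dropRefSuccessors;
  for (const auto &entry : cfg) {
    const std::string &block = entry.first;
    // Skip the block if the value is not in its `liveOut` set.
    if (live.liveOut.count(block) == 0)
      continue;
    for (const std::string &successor : entry.second) {
      assert(cfg.count(successor) == 1 && "successor must be a block of the CFG");
      if (live.liveIn.count(successor) == 0)
        dropRefSuccessors.insert(successor);
    }
  }
  return dropRefSuccessors;
}

// Shaped like the `token_arg_conditional_await` test: the entry block branches
// to bb1 (no uses of the token) and bb2 (awaits the token).
std::set<std::string> conditionalAwaitExample() {
  ToyCfg cfg;
  cfg["entry"] = {"bb1", "bb2"};
  cfg["bb1"] = {};
  cfg["bb2"] = {};

  ToyLiveness live;
  live.liveOut = {"entry"}; // the token is still needed after the branch
  live.liveIn = {"bb2"};    // only bb2 uses it
  return findDropRefSuccessors(cfg, live);
}
```

In this toy setup only `bb1` is returned, which corresponds to the `drop_ref` that the test expects at the start of `^bb1`. The `drop_ref` after the `async.await` in `^bb2` comes from the separate last-use rule, which this sketch does not model.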

mlir/lib/Dialect/Async/Transforms/AsyncRefCountingOptimization.cpp

This file was added.

				//===- AsyncRefCountingOptimization.cpp - Async Ref Counting --------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Optimize Async dialect reference counting operations.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/Async/IR/Async.h"
				#include "mlir/Dialect/Async/Passes.h"
				#include "llvm/ADT/SmallSet.h"

				using namespace mlir;
				using namespace mlir::async;

				#define DEBUG_TYPE "async-ref-counting"

				namespace {

				class AsyncRefCountingOptimizationPass
				: public AsyncRefCountingOptimizationBase<
				AsyncRefCountingOptimizationPass> {
				public:
				AsyncRefCountingOptimizationPass() = default;
				void runOnFunction() override;

				private:
				LogicalResult optimizeReferenceCounting(Value value);
				};

				} // namespace

				LogicalResult
				AsyncRefCountingOptimizationPass::optimizeReferenceCounting(Value value) {
				Region *definingRegion = value.getParentRegion();

				silvas (inline comment, Done): suggest putting this helper in include/Dialect/Async/IR/Async.h; it is used in the other file too.
				// Find all users of the `value` inside each block, including operations that
				// do not use `value` directly, but have a direct use inside nested region(s).
				//
				// Example:
				//
				// ^bb1:
				// %token = ...
				// scf.if %cond {
				// ^bb2:
				// async.await %token : !async.token
				// }
				//
				// %token has a use inside ^bb2 (`async.await`) and inside ^bb1 (`scf.if`).
				//
				// In addition to the operation that uses the `value` we also keep track if
				// this user is an `async.execute` operation itself, or has `async.execute`
				// operations in the nested regions that do use the `value`.

				struct UserInfo {
				Operation *operation;
				bool hasExecuteUser;
				};

				struct BlockUsersInfo {
				llvm::SmallVector<AddRefOp, 4> addRefs;
				llvm::SmallVector<DropRefOp, 4> dropRefs;
				llvm::SmallVector<UserInfo, 4> users;
				};

				llvm::DenseMap<Block *, BlockUsersInfo> blockUsers;

				auto updateBlockUsersInfo = [&](UserInfo user) {
				BlockUsersInfo &info = blockUsers[user.operation->getBlock()];
				info.users.push_back(user);

				if (auto addRef = dyn_cast<AddRefOp>(user.operation))
				info.addRefs.push_back(addRef);
				if (auto dropRef = dyn_cast<DropRefOp>(user.operation))
				info.dropRefs.push_back(dropRef);
				};

				for (Operation *user : value.getUsers()) {
				bool isAsyncUser = isa<ExecuteOp>(user);

				while (user->getParentRegion() != definingRegion) {
				updateBlockUsersInfo({user, isAsyncUser});
				user = user->getParentOp();
				isAsyncUser |= isa<ExecuteOp>(user);
				assert(user != nullptr && "value user lies outside of the value region");
				}

				updateBlockUsersInfo({user, isAsyncUser});
				}

				// Sort all operations found in the block.
				auto preprocessBlockUsersInfo = [](BlockUsersInfo &info) -> BlockUsersInfo & {
				auto isBeforeInBlock = [](Operation *a, Operation *b) -> bool {
				return a->isBeforeInBlock(b);
				};
				llvm::sort(info.addRefs, isBeforeInBlock);
				llvm::sort(info.dropRefs, isBeforeInBlock);
				llvm::sort(info.users, [&](UserInfo a, UserInfo b) -> bool {
				return isBeforeInBlock(a.operation, b.operation);
				});

				return info;
				};

				// Find and erase matching pairs of `add_ref` / `drop_ref` operations in the
				// blocks that modify the reference count of the `value`.
				for (auto &kv : blockUsers) {
				BlockUsersInfo &info = preprocessBlockUsersInfo(kv.second);

				// Find all cancellable pairs first and erase them later to keep all
				// pointers in the `info` valid until the end.
				//
				// Mapping from `dropRef.getOperation()` to `addRef.getOperation()`.
				llvm::SmallDenseMap<Operation *, Operation *> cancellable;

				for (AddRefOp addRef : info.addRefs) {
				for (DropRefOp dropRef : info.dropRefs) {
				// `drop_ref` operation after the `add_ref` with matching count.
				if (dropRef.count() != addRef.count() ||
				dropRef.getOperation()->isBeforeInBlock(addRef.getOperation()))
				continue;

				// `drop_ref` was already marked for removal.
				if (cancellable.find(dropRef.getOperation()) != cancellable.end())
				continue;

				// Check `value` users between `addRef` and `dropRef` in the `block`.
				Operation *addRefOp = addRef.getOperation();
				Operation *dropRefOp = dropRef.getOperation();

				// If there is a "regular" user after the `async.execute` user it is
				// unsafe to erase cancellable reference counting operations pair,
				// because async region can complete before the "regular" user and
				// destroy the reference counted value.
				bool hasExecuteUser = false;
				bool unsafeToCancel = false;

				for (UserInfo &user : info.users) {
				Operation *op = user.operation;

				// `user` operation lies after `addRef` ...
				if (op == addRefOp || op->isBeforeInBlock(addRefOp))
				continue;
				// ... and before `dropRef`.
				if (op == dropRefOp || dropRefOp->isBeforeInBlock(op))
				break;

				bool isRegularUser = !user.hasExecuteUser;
				bool isExecuteUser = user.hasExecuteUser;

				// It is unsafe to cancel `addRef` / `dropRef` pair.
				if (isRegularUser && hasExecuteUser) {
				unsafeToCancel = true;
				break;
				}

				hasExecuteUser |= isExecuteUser;
				}

				// Mark the pair of reference counting operations for removal.
				if (!unsafeToCancel)
				cancellable[dropRef.getOperation()] = addRef.getOperation();

				// If it is unsafe to cancel the `addRef <-> dropRef` pair at this point,
				// all the following pairs will also be unsafe.
				break;
				}
				}

				// Erase all cancellable `addRef <-> dropRef` operation pairs.
				for (auto &kv : cancellable) {
				kv.first->erase();
				kv.second->erase();
				}
				}

				return success();
				}

				void AsyncRefCountingOptimizationPass::runOnFunction() {
				FuncOp func = getFunction();

				// Optimize reference counting for values defined by block arguments.
				WalkResult blockWalk = func.walk([&](Block *block) -> WalkResult {
				for (BlockArgument arg : block->getArguments())
				if (isRefCounted(arg.getType()))
				if (failed(optimizeReferenceCounting(arg)))
				return WalkResult::interrupt();

				return WalkResult::advance();
				});

				if (blockWalk.wasInterrupted())
				signalPassFailure();

				// Optimize reference counting for values defined by operation results.
				WalkResult opWalk = func.walk([&](Operation *op) -> WalkResult {
				for (unsigned i = 0; i < op->getNumResults(); ++i)
				if (isRefCounted(op->getResultTypes()[i]))
				if (failed(optimizeReferenceCounting(op->getResult(i))))
				return WalkResult::interrupt();

				return WalkResult::advance();
				});

				if (opWalk.wasInterrupted())
				signalPassFailure();
				}

				std::unique_ptr<OperationPass<FuncOp>>
				mlir::createAsyncRefCountingOptimizationPass() {
				return std::make_unique<AsyncRefCountingOptimizationPass>();
				}
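
The safety rule applied by `optimizeReferenceCounting` (a "regular" user after an `async.execute` user, between `add_ref` and `drop_ref`, makes the pair unsafe to cancel) can be shown on a flat list of operations. This is a simplified sketch under stated assumptions, not the pass itself: `ToyOp` and `canCancel` are hypothetical names, the block is modeled as a vector in program order, and nested regions and matching of counts are left out. As in the pass, every user that is not an execute user is treated as a regular user.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kinds of users of one reference counted value inside a block.
enum class ToyOp { AddRef, DropRef, RegularUser, ExecuteUser };

// Returns true if the `add_ref` at `addIdx` and the `drop_ref` at `dropIdx`
// can be erased together. `ops` lists the users of the value in block order.
bool canCancel(const std::vector<ToyOp> &ops, std::size_t addIdx,
               std::size_t dropIdx) {
  assert(addIdx < dropIdx && dropIdx < ops.size());
  assert(ops[addIdx] == ToyOp::AddRef && ops[dropIdx] == ToyOp::DropRef);

  bool hasExecuteUser = false;
  for (std::size_t i = addIdx + 1; i < dropIdx; ++i) {
    if (ops[i] == ToyOp::ExecuteUser) {
      hasExecuteUser = true;
    } else if (hasExecuteUser) {
      // A regular user after an execute user: the async region may finish
      // first and destroy the value, so the extra reference must stay.
      return false;
    }
  }
  return true;
}
```

For example, the sequence `add_ref`, execute user, `drop_ref` (as in `cancellable_operations_1`) is cancellable in this model, while `add_ref`, execute user, regular user, `drop_ref` (as in `not_cancellable_operations_0`) is not.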

mlir/lib/Dialect/Async/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRAsyncTransforms			add_mlir_dialect_library(MLIRAsyncTransforms
	AsyncParallelFor.cpp			AsyncParallelFor.cpp
				AsyncRefCounting.cpp
				AsyncRefCountingOptimization.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Async			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Async

	DEPENDS			DEPENDS
	MLIRAsyncPassIncGen			MLIRAsyncPassIncGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRIR			MLIRIR
	MLIRAsync			MLIRAsync
	MLIRSCF			MLIRSCF
	MLIRPass			MLIRPass
	MLIRTransforms			MLIRTransforms
	MLIRTransformUtils			MLIRTransformUtils
	)			)

mlir/lib/ExecutionEngine/AsyncRuntime.cpp

	[10 lines not shown]
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/ExecutionEngine/AsyncRuntime.h"			#include "mlir/ExecutionEngine/AsyncRuntime.h"

	#ifdef MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS			#ifdef MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS

	#include <atomic>			#include <atomic>
				#include <cassert>
	#include <condition_variable>			#include <condition_variable>
	#include <functional>			#include <functional>
	#include <iostream>			#include <iostream>
	#include <mutex>			#include <mutex>
	#include <thread>			#include <thread>
	#include <vector>			#include <vector>

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Async runtime API.			// Async runtime API.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	struct AsyncToken {			namespace {
	bool ready = false;
				// Forward declare class defined below.
				class RefCounted;

				// -------------------------------------------------------------------------- //
				// AsyncRuntime orchestrates all async operations and Async runtime API is built
				// on top of the default runtime instance.
				// -------------------------------------------------------------------------- //

				class AsyncRuntime {
				public:
				AsyncRuntime() : numRefCountedObjects(0) {}

				~AsyncRuntime() {
				assert(getNumRefCountedObjects() == 0 &&
				"all ref counted objects must be destroyed");
				}

				int32_t getNumRefCountedObjects() {
				return numRefCountedObjects.load(std::memory_order_relaxed);
				}

				private:
				friend class RefCounted;

				// Count the total number of reference counted objects in this instance
				// of an AsyncRuntime. For debugging purposes only.
				void addNumRefCountedObjects() {
				numRefCountedObjects.fetch_add(1, std::memory_order_relaxed);
				}
				void dropNumRefCountedObjects() {
				numRefCountedObjects.fetch_sub(1, std::memory_order_relaxed);
				}

				std::atomic<int32_t> numRefCountedObjects;
				};

				// Returns the default per-process instance of an async runtime.
				AsyncRuntime *getDefaultAsyncRuntimeInstance() {
				static auto runtime = std::make_unique<AsyncRuntime>();
				return runtime.get();
				}

				// -------------------------------------------------------------------------- //
				// A base class for all reference counted objects created by the async runtime.
				// -------------------------------------------------------------------------- //

				class RefCounted {
				public:
				RefCounted(AsyncRuntime *runtime, int32_t refCount = 1)
				: runtime(runtime), refCount(refCount) {
				runtime->addNumRefCountedObjects();
				}

				virtual ~RefCounted() {
				assert(refCount.load() == 0 && "reference count must be zero");
				runtime->dropNumRefCountedObjects();
				}

				RefCounted(const RefCounted &) = delete;
				RefCounted &operator=(const RefCounted &) = delete;

				void addRef(int32_t count = 1) { refCount.fetch_add(count); }
				ftynse (inline comment, Done): please fix

				void dropRef(int32_t count = 1) {
				int32_t previous = refCount.fetch_sub(count);
				assert(previous >= count && "reference count should not go below zero");
				if (previous == count)
				destroy();
				}

				protected:
				virtual void destroy() { delete this; }

				private:
				AsyncRuntime *runtime;
				std::atomic<int32_t> refCount;
				};

				} // namespace

				struct AsyncToken : public RefCounted {
				// AsyncToken created with a reference count of 2 because it will be returned
				// to the `async.execute` caller and also will be later on emplaced by the
				// asynchronously executed task. If the caller immediately will drop its
				// reference we must ensure that the token will be alive until the
				// asynchronous operation is completed.
				AsyncToken(AsyncRuntime *runtime) : RefCounted(runtime, /*count=*/2) {}

				// Internal state below guarded by a mutex.
	std::mutex mu;			std::mutex mu;
	std::condition_variable cv;			std::condition_variable cv;

				bool ready = false;
	std::vector<std::function<void()>> awaiters;			std::vector<std::function<void()>> awaiters;
	};			};

	struct AsyncGroup {			struct AsyncGroup : public RefCounted {
	std::atomic<int> pendingTokens{0};			AsyncGroup(AsyncRuntime *runtime)
	std::atomic<int> rank{0};			: RefCounted(runtime), pendingTokens(0), rank(0) {}

				std::atomic<int> pendingTokens;
				std::atomic<int> rank;

				// Internal state below guarded by a mutex.
	std::mutex mu;			std::mutex mu;
	std::condition_variable cv;			std::condition_variable cv;

	std::vector<std::function<void()>> awaiters;			std::vector<std::function<void()>> awaiters;
	};			};

				// Adds references to reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeAddRef(RefCountedObjPtr ptr, int32_t count) {
				RefCounted *refCounted = static_cast<RefCounted *>(ptr);
				refCounted->addRef(count);
				}

				// Drops references from reference counted runtime object.
				extern "C" MLIR_ASYNCRUNTIME_EXPORT void
				mlirAsyncRuntimeDropRef(RefCountedObjPtr ptr, int32_t count) {
				RefCounted *refCounted = static_cast<RefCounted *>(ptr);
				refCounted->dropRef(count);
				}

	// Create a new `async.token` in not-ready state.			// Create a new `async.token` in not-ready state.
	extern "C" AsyncToken *mlirAsyncRuntimeCreateToken() {			extern "C" AsyncToken *mlirAsyncRuntimeCreateToken() {
	AsyncToken *token = new AsyncToken;			AsyncToken *token = new AsyncToken(getDefaultAsyncRuntimeInstance());
	return token;			return token;
	}			}

	// Create a new `async.group` in empty state.			// Create a new `async.group` in empty state.
	extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup() {			extern "C" MLIR_ASYNCRUNTIME_EXPORT AsyncGroup *mlirAsyncRuntimeCreateGroup() {
	AsyncGroup *group = new AsyncGroup;			AsyncGroup *group = new AsyncGroup(getDefaultAsyncRuntimeInstance());
	return group;			return group;
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t			extern "C" MLIR_ASYNCRUNTIME_EXPORT int64_t
	mlirAsyncRuntimeAddTokenToGroup(AsyncToken *token, AsyncGroup *group) {			mlirAsyncRuntimeAddTokenToGroup(AsyncToken *token, AsyncGroup *group) {
	std::unique_lock<std::mutex> lockToken(token->mu);			std::unique_lock<std::mutex> lockToken(token->mu);
	std::unique_lock<std::mutex> lockGroup(group->mu);			std::unique_lock<std::mutex> lockGroup(group->mu);

				// Get the rank of the token inside the group before we drop the reference.
				int rank = group->rank.fetch_add(1);
	group->pendingTokens.fetch_add(1);			group->pendingTokens.fetch_add(1);

	auto onTokenReady = [group]() {			auto onTokenReady = [group, token](bool dropRef) {
	// Run all group awaiters if it was the last token in the group.			// Run all group awaiters if it was the last token in the group.
	if (group->pendingTokens.fetch_sub(1) == 1) {			if (group->pendingTokens.fetch_sub(1) == 1) {
	group->cv.notify_all();			group->cv.notify_all();
	for (auto &awaiter : group->awaiters)			for (auto &awaiter : group->awaiters)
	awaiter();			awaiter();
	}			}

				// We no longer need the token or the group, drop references on them.
				if (dropRef) {
				group->dropRef();
				token->dropRef();
				}
	};			};

	if (token->ready)			if (token->ready) {
	onTokenReady();			onTokenReady(false);
	else			} else {
	token->awaiters.push_back([onTokenReady]() { onTokenReady(); });			group->addRef();
				token->addRef();
				token->awaiters.push_back([onTokenReady]() { onTokenReady(true); });
				}

	return group->rank.fetch_add(1);			return rank;
	}			}

	// Switches `async.token` to ready state and runs all awaiters.			// Switches `async.token` to ready state and runs all awaiters.
	extern "C" void mlirAsyncRuntimeEmplaceToken(AsyncToken *token) {			extern "C" void mlirAsyncRuntimeEmplaceToken(AsyncToken *token) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);
	token->ready = true;			token->ready = true;
	token->cv.notify_all();			token->cv.notify_all();
	for (auto &awaiter : token->awaiters)			for (auto &awaiter : token->awaiters)
	awaiter();			awaiter();

				// Async tokens created with a ref count `2` to keep token alive until the
				// async task completes. Drop this reference explicitly when token emplaced.
				token->dropRef();
	}			}

	extern "C" void mlirAsyncRuntimeAwaitToken(AsyncToken *token) {			extern "C" void mlirAsyncRuntimeAwaitToken(AsyncToken *token) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);
	if (!token->ready)			if (!token->ready)
	token->cv.wait(lock, [token] { return token->ready; });			token->cv.wait(lock, [token] { return token->ready; });
	}			}

	[13 lines not shown]
	#endif			#endif
	}			}

	extern "C" void mlirAsyncRuntimeAwaitTokenAndExecute(AsyncToken *token,			extern "C" void mlirAsyncRuntimeAwaitTokenAndExecute(AsyncToken *token,
	CoroHandle handle,			CoroHandle handle,
	CoroResume resume) {			CoroResume resume) {
	std::unique_lock<std::mutex> lock(token->mu);			std::unique_lock<std::mutex> lock(token->mu);

	auto execute = [handle, resume]() {			auto execute = [handle, resume, token](bool dropRef) {
				if (dropRef)
				token->dropRef();
	mlirAsyncRuntimeExecute(handle, resume);			mlirAsyncRuntimeExecute(handle, resume);
	};			};

	if (token->ready)			if (token->ready) {
	execute();			execute(false);
	else			} else {
	token->awaiters.push_back([execute]() { execute(); });			token->addRef();
				token->awaiters.push_back([execute]() { execute(true); });
				}
	}			}

	extern "C" MLIR_ASYNCRUNTIME_EXPORT void			extern "C" MLIR_ASYNCRUNTIME_EXPORT void
	mlirAsyncRuntimeAwaitAllInGroupAndExecute(AsyncGroup *group, CoroHandle handle,			mlirAsyncRuntimeAwaitAllInGroupAndExecute(AsyncGroup *group, CoroHandle handle,
	CoroResume resume) {			CoroResume resume) {
	std::unique_lock<std::mutex> lock(group->mu);			std::unique_lock<std::mutex> lock(group->mu);

	auto execute = [handle, resume]() {			auto execute = [handle, resume, group](bool dropRef) {
				if (dropRef)
				group->dropRef();
	mlirAsyncRuntimeExecute(handle, resume);			mlirAsyncRuntimeExecute(handle, resume);
	};			};

	if (group->pendingTokens == 0)			if (group->pendingTokens == 0) {
	execute();			execute(false);
	else			} else {
	group->awaiters.push_back([execute]() { execute(); });			group->addRef();
				group->awaiters.push_back([execute]() { execute(true); });
				}
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Small async runtime support library for testing.			// Small async runtime support library for testing.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	extern "C" void mlirAsyncRuntimePrintCurrentThreadId() {			extern "C" void mlirAsyncRuntimePrintCurrentThreadId() {
	static thread_local std::thread::id thisId = std::this_thread::get_id();			static thread_local std::thread::id thisId = std::this_thread::get_id();
	std::cout << "Current thread id: " << thisId << "\n";			std::cout << "Current thread id: " << thisId << "\n";
	}			}

	#endif // MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS			#endif // MLIR_ASYNCRUNTIME_DEFINE_FUNCTIONS
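
The reason `AsyncToken` starts with a reference count of 2 can be seen in a small standalone model. The sketch below is illustrative only: `ToyRefCounted`, `ToyToken`, `liveObjects`, and `tokenLifetimeDemo` are hypothetical names, the global counter stands in for the per-runtime debug counter, and no threads or awaiters are involved. It assumes the two references are owned by the caller of `async.execute` and by the asynchronous task that later emplaces the token, as described in the comment on `AsyncToken`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical debug counter of live reference counted objects.
static std::atomic<int> liveObjects{0};

// Simplified stand-in for the runtime's reference counted base class.
class ToyRefCounted {
public:
  explicit ToyRefCounted(int32_t refCount = 1) : refCount(refCount) {
    liveObjects.fetch_add(1);
  }
  virtual ~ToyRefCounted() { liveObjects.fetch_sub(1); }

  ToyRefCounted(const ToyRefCounted &) = delete;
  ToyRefCounted &operator=(const ToyRefCounted &) = delete;

  void addRef(int32_t count = 1) { refCount.fetch_add(count); }

  void dropRef(int32_t count = 1) {
    int32_t previous = refCount.fetch_sub(count);
    assert(previous >= count && "reference count should not go below zero");
    if (previous == count)
      delete this; // the last reference was dropped
  }

private:
  std::atomic<int32_t> refCount;
};

// One reference for the caller, one for the task that emplaces the token.
struct ToyToken : ToyRefCounted {
  ToyToken() : ToyRefCounted(/*refCount=*/2) {}
  bool ready = false;
};

// Returns {live objects after the caller's drop, live objects after emplace}.
std::pair<int, int> tokenLifetimeDemo() {
  ToyToken *token = new ToyToken();

  token->dropRef(); // the caller drops its reference immediately
  int afterCallerDrop = liveObjects.load(); // the token must still be alive

  token->ready = true; // the task "emplaces" the token ...
  token->dropRef();    // ... and drops the second reference
  int afterEmplace = liveObjects.load();

  return std::make_pair(afterCallerDrop, afterEmplace);
}
```

With a starting count of 1 the first `dropRef` would already destroy the token, and the later write to `ready` would touch freed memory; the second reference is what keeps the object valid until the task is done with it.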

mlir/test/Conversion/AsyncToLLVM/convert-to-llvm.mlir

	// RUN: mlir-opt %s -split-input-file -convert-async-to-llvm | FileCheck %s			// RUN: mlir-opt %s -split-input-file -convert-async-to-llvm | FileCheck %s

				// CHECK-LABEL: reference_counting
				func @reference_counting(%arg0: !async.token) {
				// CHECK: %[[C2:.*]] = constant 2 : i32
				// CHECK: call @mlirAsyncRuntimeAddRef(%arg0, %[[C2]])
				async.add_ref %arg0 {count = 2 : i32} : !async.token

				// CHECK: %[[C1:.*]] = constant 1 : i32
				// CHECK: call @mlirAsyncRuntimeDropRef(%arg0, %[[C1]])
				async.drop_ref %arg0 {count = 1 : i32} : !async.token

				return
				}

				// -----

	// CHECK-LABEL: execute_no_async_args			// CHECK-LABEL: execute_no_async_args
	func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {			func @execute_no_async_args(%arg0: f32, %arg1: memref<1xf32>) {
	// CHECK: %[[TOKEN:.*]] = call @async_execute_fn(%arg0, %arg1)			// CHECK: %[[TOKEN:.*]] = call @async_execute_fn(%arg0, %arg1)
	%token = async.execute {			%token = async.execute {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	store %arg0, %arg1[%c0] : memref<1xf32>			store %arg0, %arg1[%c0] : memref<1xf32>
	async.yield			async.yield
	}			}
	[188 lines not shown]

mlir/test/Dialect/Async/async-ref-counting-optimization.mlir

This file was added.

				// RUN: mlir-opt %s -async-ref-counting-optimization | FileCheck %s

				silvas (inline comment, Done): Is it interesting to test `async.execute[%token]`?
				ezhulenev (author, inline comment, Done): Added a test, it is indeed quite a common pattern with nested async execute operations.
				// CHECK-LABEL: @cancellable_operations_0
				func @cancellable_operations_0(%arg0: !async.token) {
				// CHECK-NOT: async.add_ref
				// CHECK-NOT: async.drop_ref
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: return
				return
				}

				// CHECK-LABEL: @cancellable_operations_1
				func @cancellable_operations_1(%arg0: !async.token) {
				// CHECK-NOT: async.add_ref
				// CHECK: async.execute
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				async.execute [%arg0] {
				// CHECK: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK-NEXT: async.yield
				async.yield
				}
				// CHECK-NOT: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: return
				return
				}

				// CHECK-LABEL: @cancellable_operations_2
				func @cancellable_operations_2(%arg0: !async.token) {
				// CHECK: async.await
				// CHECK-NEXT: async.await
				// CHECK-NEXT: async.await
				// CHECK-NEXT: return
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				async.await %arg0 : !async.token
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				async.await %arg0 : !async.token
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				async.await %arg0 : !async.token
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				return
				}

				// CHECK-LABEL: @cancellable_operations_3
				func @cancellable_operations_3(%arg0: !async.token) {
				// CHECK-NOT: add_ref
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				%token = async.execute {
				async.await %arg0 : !async.token
				// CHECK: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				async.yield
				}
				// CHECK-NOT: async.drop_ref
				silvas (inline comment, Done): Is scf.if essential to this test case? If not, remove it. If so, describe it in the comment.
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: async.await
				async.await %arg0 : !async.token
				silvas (inline comment, Done): The input IR here seems strange to me. Will it create a leak if `%arg1 == false`? I don't see a test case that produces IR that looks like this in async-ref-counting.mlir. Perhaps it would be good to add.
				ezhulenev (author, inline comment, Done): I was not really thinking about ref counting correctness when writing these tests :) Added an explicit note to the test where this property is violated.
				// CHECK: return
				return
				}

				// CHECK-LABEL: @not_cancellable_operations_0
				func @not_cancellable_operations_0(%arg0: !async.token, %arg1: i1) {
				silvas (inline comment, Done): nit: inconsistency of `CHECK: drop_ref` vs `CHECK: async.drop_ref`
				// It is unsafe to cancel `add_ref` / `drop_ref` pair because it is possible
				// that the body of the `async.execute` operation will run before the await
				// operation in the function body, and will destroy the `%arg0` token.
				// CHECK: add_ref
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				%token = async.execute {
				// CHECK: async.await
				async.await %arg0 : !async.token
				// CHECK: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: async.yield
				async.yield
				}
				// CHECK: async.await
				async.await %arg0 : !async.token
				// CHECK: drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: return
				return
				}

				// CHECK-LABEL: @not_cancellable_operations_1
				func @not_cancellable_operations_1(%arg0: !async.token, %arg1: i1) {
				// Same reason as above, although `async.execute` is inside the nested
				// region of a "regular" operation.
				//
				// NOTE: This test is not correct w.r.t. reference counting, and at runtime
				// would leak %arg0 value if %arg1 is false. IR like this will not be
				// constructed by automatic reference counting pass, because it would
				// place `async.add_ref` right before the `async.execute` inside `scf.if`.

				// CHECK: async.add_ref
				async.add_ref %arg0 {count = 1 : i32} : !async.token
				scf.if %arg1 {
				%token = async.execute {
				async.await %arg0 : !async.token
				// CHECK: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				async.yield
				}
				}
				// CHECK: async.await
				async.await %arg0 : !async.token
				// CHECK: async.drop_ref
				async.drop_ref %arg0 {count = 1 : i32} : !async.token
				// CHECK: return
				return
				}

mlir/test/Dialect/Async/async-ref-counting.mlir

This file was added.

				// RUN: mlir-opt %s -async-ref-counting | FileCheck %s

				// CHECK-LABEL: @cond
				func private @cond() -> i1

				// CHECK-LABEL: @token_arg_no_uses
				func @token_arg_no_uses(%arg0: !async.token) {
				// CHECK: async.drop_ref %arg0 {count = 1 : i32}
				return
				}

				// CHECK-LABEL: @token_arg_conditional_await
				func @token_arg_conditional_await(%arg0: !async.token, %arg1: i1) {
				cond_br %arg1, ^bb1, ^bb2
				^bb1:
				// CHECK: async.drop_ref %arg0 {count = 1 : i32}
				return
				^bb2:
				// CHECK: async.await %arg0
				// CHECK: async.drop_ref %arg0 {count = 1 : i32}
				async.await %arg0 : !async.token
				return
				}

				// CHECK-LABEL: @token_no_uses
				func @token_no_uses() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				%token = async.execute {
				async.yield
				}
				return
				}

				// CHECK-LABEL: @token_return
				func @token_return() -> !async.token {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: return %[[TOKEN]]
				return %token : !async.token
				}

				// CHECK-LABEL: @token_await
				func @token_await() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: async.await %[[TOKEN]]
				async.await %token : !async.token
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: return
				return
				}

				// CHECK-LABEL: @token_await_and_return
				func @token_await_and_return() -> !async.token {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: async.await %[[TOKEN]]
				// CHECK-NOT: async.drop_ref
				async.await %token : !async.token
				// CHECK: return %[[TOKEN]]
				return %token : !async.token
				}

				// CHECK-LABEL: @token_await_inside_scf_if
				func @token_await_inside_scf_if(%arg0: i1) {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: scf.if %arg0 {
				scf.if %arg0 {
				// CHECK: async.await %[[TOKEN]]
				async.await %token : !async.token
				}
				// CHECK: }
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: return
				return
				}

				// CHECK-LABEL: @token_conditional_await
				func @token_conditional_await(%arg0: i1) {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				cond_br %arg0, ^bb1, ^bb2
				^bb1:
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				return
				^bb2:
				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.await %token : !async.token
				return
				}

				// CHECK-LABEL: @token_await_in_the_loop
				func @token_await_in_the_loop() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				br ^bb1
				^bb1:
				// CHECK: async.await %[[TOKEN]]
				async.await %token : !async.token
				%0 = call @cond(): () -> (i1)
				cond_br %0, ^bb1, ^bb2
				^bb2:
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				return
				}

				// CHECK-LABEL: @token_defined_in_the_loop
				func @token_defined_in_the_loop() {
				br ^bb1
				^bb1:
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}
				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.await %token : !async.token
				%0 = call @cond(): () -> (i1)
				cond_br %0, ^bb1, ^bb2
				^bb2:
				return
				}

				// CHECK-LABEL: @token_capture
				func @token_capture() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}

				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				silvas: Is there a missing `CHECK: async.add_ref %[[TOKEN]]` on the line before `%token_0 = async.execute` and a missing `CHECK: async.drop_ref %[[TOKEN_0]]` before the return? (best to show all add_ref/drop_ref, or use CHECK-NOT to show that they are not produced there)
				ezhulenev (author): Yes, forgot to update some tests after decoupling it from ref counting optimization. Added back missing checks to few other tests.
				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute {
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK-NEXT: async.yield
				async.await %token : !async.token
				async.yield
				}
				// CHECK: async.drop_ref %[[TOKEN_0]] {count = 1 : i32}
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: return
				return
				}

				// CHECK-LABEL: @token_nested_capture
				func @token_nested_capture() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}

				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute {
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: %[[TOKEN_1:.*]] = async.execute
				%token_1 = async.execute {
				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: %[[TOKEN_2:.*]] = async.execute
				%token_2 = async.execute {
				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.await %token : !async.token
				async.yield
				}
				// CHECK: async.drop_ref %[[TOKEN_2]] {count = 1 : i32}
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.yield
				}
				// CHECK: async.drop_ref %[[TOKEN_1]] {count = 1 : i32}
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.yield
				}
				// CHECK: async.drop_ref %[[TOKEN_0]] {count = 1 : i32}
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: return
				return
				}

				// CHECK-LABEL: @token_dependency
				func @token_dependency() {
				// CHECK: %[[TOKEN:.*]] = async.execute
				%token = async.execute {
				async.yield
				}

				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute[%token] {
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK-NEXT: async.yield
				async.yield
				}

				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.await %token : !async.token
				// CHECK: async.await %[[TOKEN_0]]
				// CHECK: async.drop_ref %[[TOKEN_0]] {count = 1 : i32}
				async.await %token_0 : !async.token

				// CHECK: return
				return
				}

				// CHECK-LABEL: @value_operand
				func @value_operand() -> f32 {
				// CHECK: %[[TOKEN:.*]], %[[RESULTS:.*]] = async.execute
				%token, %results = async.execute -> !async.value<f32> {
				%0 = constant 0.0 : f32
				async.yield %0 : f32
				}

				// CHECK: async.add_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: async.add_ref %[[RESULTS]] {count = 1 : i32}
				// CHECK: %[[TOKEN_0:.*]] = async.execute
				%token_0 = async.execute[%token](%results as %arg0 : !async.value<f32>) {
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				// CHECK: async.drop_ref %[[RESULTS]] {count = 1 : i32}
				// CHECK: async.yield
				async.yield
				}

				// CHECK: async.await %[[TOKEN]]
				// CHECK: async.drop_ref %[[TOKEN]] {count = 1 : i32}
				async.await %token : !async.token

				// CHECK: async.await %[[TOKEN_0]]
				// CHECK: async.drop_ref %[[TOKEN_0]] {count = 1 : i32}
				async.await %token_0 : !async.token

				// CHECK: async.await %[[RESULTS]]
				// CHECK: async.drop_ref %[[RESULTS]] {count = 1 : i32}
				%0 = async.await %results : !async.value<f32>

				// CHECK: return
				return %0 : f32
				}
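
The CHECK lines in these tests can be read as a balance requirement: for each reference counted value, the `add_ref` and `drop_ref` operations must net out so that the value is released exactly once. A minimal sketch of that bookkeeping follows, assuming every value starts with a count of one; that starting count, and the names `RefEvent`, `isBalanced`, and `valueOperandTrace`, are hypothetical and not taken from the patch.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical trace entry: a positive delta models async.add_ref, a negative
// delta models async.drop_ref.
struct RefEvent {
  std::string value;
  int delta;
};

// Returns true if no value is touched after its count reached zero and every
// value ends at zero. Assumption: each value starts with a count of one.
bool isBalanced(const std::vector<RefEvent>& trace) {
  std::map<std::string, int> counts;
  for (const RefEvent& event : trace) {
    auto it = counts.insert({event.value, 1}).first;
    if (it->second == 0) return false;  // use after the value was released
    it->second += event.delta;
    if (it->second < 0) return false;   // dropped more than was held
  }
  for (const auto& entry : counts) {
    if (entry.second != 0) return false;  // leaked reference
  }
  return true;
}

// The add_ref/drop_ref operations expected by @value_operand, in one possible
// execution order (the async body finishes before the function-level awaits).
std::vector<RefEvent> valueOperandTrace() {
  return {
      {"token", +1},   {"results", +1},  // add_ref before the second execute
      {"token", -1},   {"results", -1},  // drop_ref before the body's yield
      {"token", -1},                     // after async.await %token
      {"token_0", -1},                   // after async.await %token_0
      {"results", -1},                   // after async.await %results
  };
}
```

Under these assumptions the `@value_operand` trace balances, while a trace with an extra `add_ref` (a leak) or an extra `drop_ref` (a double release) is rejected.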

mlir/test/Dialect/Async/ops.mlir

		func @create_group_and_await_all(%arg0: !async.token, %arg1: !async.value<f32>) -> index {
		// CHECK: async.add_to_group %arg1
		%1 = async.add_to_group %arg0, %0 : !async.token
		%2 = async.add_to_group %arg1, %0 : !async.value<f32>
		async.await_all %0

		%3 = addi %1, %2 : index
		return %3 : index
		}

		// CHECK-LABEL: @add_ref
		func @add_ref(%arg0: !async.token) {
		// CHECK: async.add_ref %arg0 {count = 1 : i32}
		async.add_ref %arg0 {count = 1 : i32} : !async.token
		return
		}

		// CHECK-LABEL: @drop_ref
		func @drop_ref(%arg0: !async.token) {
		// CHECK: async.drop_ref %arg0 {count = 1 : i32}
		async.drop_ref %arg0 {count = 1 : i32} : !async.token
		return
		}

mlir/test/mlir-cpu-runner/async-group.mlir

-	// RUN: mlir-opt %s -convert-async-to-llvm \
+	// RUN: mlir-opt %s -async-ref-counting \
+	// RUN: -convert-async-to-llvm \
 	// RUN: -convert-std-to-llvm \
 	// RUN: | mlir-cpu-runner \
 	// RUN: -e main -entry-point-result=void -O0 \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_c_runner_utils%shlibext \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_async_runtime%shlibext \
 	// RUN: | FileCheck %s

mlir/test/mlir-cpu-runner/async.mlir

-	// RUN: mlir-opt %s -convert-async-to-llvm \
+	// RUN: mlir-opt %s -async-ref-counting \
+	// RUN: -convert-async-to-llvm \
 	// RUN: -convert-linalg-to-loops \
 	// RUN: -convert-linalg-to-llvm \
 	// RUN: -convert-std-to-llvm \
 	// RUN: | mlir-cpu-runner \
 	// RUN: -e main -entry-point-result=void -O0 \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_c_runner_utils%shlibext \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_runner_utils%shlibext \
 	// RUN: -shared-libs=%linalg_test_lib_dir/libmlir_async_runtime%shlibext \


Revision Contents

Diff 306641
