Index: docs/Coroutines.rst
===================================================================
--- docs/Coroutines.rst
+++ docs/Coroutines.rst
@@ -73,7 +73,7 @@
 Let's look at an example of an LLVM coroutine with the behavior sketched
 by the following pseudo-code.
 
-.. code-block:: C++
+.. code-block:: c++
 
   void *f(int n) {
     for(;;) {
@@ -215,15 +215,9 @@
 dynamic allocation by storing the coroutine frame as a static `alloca` in its
 caller.
 
-If a coroutine uses allocation and deallocation functions that are known to
-LLVM, unused calls to `malloc` and calls to `free` with `null` argument will be
-removed as dead code. However, if custom allocation functions are used, the
-`coro.alloc` and `coro.free` intrinsics can be used to enable removal of custom
-allocation and deallocation code when coroutine does not require dynamic
-allocation of the coroutine frame.
-
 In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
-when dynamic allocation is required, and non-null otherwise:
+when dynamic allocation is required, and an address of an alloca on the caller's
+frame where the coroutine frame can be stored if dynamic allocation is elided.
 
 .. code-block:: llvm
 
@@ -256,8 +250,7 @@
   ...
 
 With allocations and deallocations represented as described as above, after
-coroutine heap allocation elision optimization, the resulting main will end up
-looking just like it was when we used `malloc` and `free`:
+coroutine heap allocation elision optimization, the resulting main will be:
 
 .. code-block:: llvm
 
@@ -274,7 +267,7 @@
 
 Let's consider the coroutine that has more than one suspend point:
 
-.. code-block:: C++
+.. code-block:: c++
 
   void *f(int n) {
     for(;;) {
@@ -419,12 +412,19 @@
   entry:
     %promise = alloca i32
     %pv = bitcast i32* %promise to i8*
+    %elide = call i8* @llvm.coro.alloc()
+    %need.dyn.alloc = icmp eq i8* %elide, null
+    br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
+  dyn.alloc:
     %size = call i32 @llvm.coro.size.i32()
     %alloc = call i8* @malloc(i32 %size)
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
+    br label %coro.begin
+  coro.begin:
+    %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
+    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
     br label %loop
   loop:
-    %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
+    %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
     %inc = add nsw i32 %n.val, 1
     store i32 %n.val, i32* %promise
     %0 = call i8 @llvm.coro.suspend(token none, i1 false)
@@ -461,8 +461,7 @@
     ret i32 0
   }
 
-After example in this section is compiled, result of the compilation will
-exactly like the result of the very first example:
+After the example in this section is compiled, the result will be:
 
 .. code-block:: llvm
 
@@ -528,7 +527,7 @@
 Python frontend would inject two more suspend points, so that the actual code
 looks like this:
 
-.. code-block:: C
+.. code-block:: c
 
   void* coroutine(int n) {
     int current_value;
@@ -542,7 +541,7 @@
 
 and python iterator `__next__` would look like:
 
-.. code-block:: C++
+.. code-block:: c++
 
   int __next__(void* hdl) {
     coro.resume(hdl);
@@ -758,14 +757,13 @@
 Overview:
 """""""""
 
-The '``llvm.coro.begin``' intrinsic returns an address of the
-coroutine frame.
+The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
 
 Arguments:
 """"""""""
 
-The first argument is a pointer to a block of memory in which coroutine frame
-may use if memory for the coroutine frame needs to be allocated dynamically.
+The first argument is a pointer to a block of memory where the coroutine frame
+will be stored.
 The second argument provides information on the alignment of the memory returned
 by the allocation function and given to `coro.begin` by the first argument. If
@@ -788,7 +786,7 @@
 instructions that express relative access to data can be more compactly
 encoded with small positive and negative offsets).
 
-Frontend should emit exactly one `coro.begin` intrinsic per coroutine.
+A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
 
 .. _coro.free:
 
@@ -861,10 +859,8 @@
 If the coroutine is eligible for heap elision, this intrinsic is lowered to an
 alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
 
-This intrinsic only needs to be used if a custom allocation function is used
-(i.e. a function not recognized by LLVM as a memory allocation function) and the
-language rules allow for custom allocation / deallocation to be elided when not
-needed.
+
+A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
 
 Example:
 """"""""
@@ -1076,7 +1072,7 @@
 Overview:
 """""""""
 
-The '``llvm.coro.param``' is used by the frontend to mark up the code used to
+A frontend uses the '``llvm.coro.param``' intrinsic to mark up the code used to
 construct and destruct copies of the parameters. If the optimizer discovers
 that a particular parameter copy is not used after any suspends, it can remove
 the construction and destruction of the copy by replacing corresponding coro.param
@@ -1101,7 +1097,7 @@
 points. The code that would be DCE'd if the `coro.param` is replaced with
 `i1 false` is not considered to be a use of the parameter copy.
 
-The frontend can emit this intrinsic if its language rules allow for this
+A frontend can emit this intrinsic if its language rules allow for this
 optimization.
 
 Example:
@@ -1109,7 +1105,7 @@
 Consider the following example. A coroutine takes two parameters `a` and `b`
 that has a destructor and a move constructor.
 
-.. code-block:: C++
+.. code-block:: c++
 
   struct A { ~A(); A(A&&); bool foo(); void bar(); };
 
@@ -1180,8 +1176,8 @@
 
 Upstreaming sequence (rough plan)
 =================================
-#. Add documentation. <= we are here
-#. Add coroutine intrinsics.
+#. Add documentation.
+#. Add coroutine intrinsics. <= we are here
 #. Add empty coroutine passes.
 #. Add coroutine devirtualization + tests.
 #. Add CGSCC restart trigger + tests.
Index: include/llvm/Analysis/TargetTransformInfoImpl.h
===================================================================
--- include/llvm/Analysis/TargetTransformInfoImpl.h
+++ include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -152,6 +152,15 @@
     case Intrinsic::var_annotation:
     case Intrinsic::experimental_gc_result:
     case Intrinsic::experimental_gc_relocate:
+    case Intrinsic::coro_alloc:
+    case Intrinsic::coro_begin:
+    case Intrinsic::coro_free:
+    case Intrinsic::coro_end:
+    case Intrinsic::coro_frame:
+    case Intrinsic::coro_size:
+    case Intrinsic::coro_suspend:
+    case Intrinsic::coro_param:
+    case Intrinsic::coro_subfn_addr:
       // These intrinsics don't actually represent code after lowering.
       return TTI::TCC_Free;
     }
Index: include/llvm/IR/Intrinsics.td
===================================================================
--- include/llvm/IR/Intrinsics.td
+++ include/llvm/IR/Intrinsics.td
@@ -597,7 +597,46 @@
                                 [llvm_token_ty, llvm_i32_ty, llvm_i32_ty],
                                 [IntrReadMem]>;
 
-//===-------------------------- Other Intrinsics --------------------------===//
+//===------------------------ Coroutine Intrinsics ------------------------===//
+// These are documented in docs/Coroutines.rst
+
+// Coroutine Structure Intrinsics.
+
+def int_coro_alloc : Intrinsic<[llvm_ptr_ty], [], []>;
+def int_coro_begin : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i32_ty,
+                                               llvm_ptr_ty, llvm_ptr_ty],
+                               [WriteOnly<0>, ReadNone<2>, ReadOnly<3>,
+                                NoCapture<3>]>;
+
+def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
+                              [IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
+def int_coro_end : Intrinsic<[], [llvm_ptr_ty, llvm_i1_ty], []>;
+
+def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
+def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
+
+def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;
+def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;
+
+def int_coro_param : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_ptr_ty],
+                               [IntrNoMem, ReadNone<0>, ReadNone<1>]>;
+
+// Coroutine Manipulation Intrinsics.
+
+def int_coro_resume : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
+def int_coro_destroy : Intrinsic<[], [llvm_ptr_ty], [Throws]>;
+def int_coro_done : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty],
+                              [IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
+def int_coro_promise : Intrinsic<[llvm_ptr_ty],
+                                 [llvm_ptr_ty, llvm_i32_ty, llvm_i1_ty],
+                                 [IntrNoMem, NoCapture<0>]>;
+
+// Coroutine Lowering Intrinsics. Used internally by coroutine passes.
+
+def int_coro_subfn_addr : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i8_ty],
+                                    [IntrArgMemOnly, ReadOnly<0>,NoCapture<0>]>;
+
+//===-------------------------- Other Intrinsics --------------------------===//
 //
 def int_flt_rounds : Intrinsic<[llvm_i32_ty]>,
                      GCCBuiltin<"__builtin_flt_rounds">;
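The heap-elision protocol documented in this patch (`coro.alloc` yields null when the coroutine frame must be allocated dynamically, and otherwise an address of storage in the caller's frame) can be modeled in plain C++ to make the two paths into the `coro.begin` block explicit. This is a hypothetical sketch, not LLVM code: `FrameStorage`, `acquireFrame`, and `releaseFrame` are invented names, and the rule that only heap-allocated frames are released is an assumption about how the matching cleanup would behave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical model of the entry-block logic in the coroutine IR example;
// these names are illustrations, not LLVM APIs.
struct FrameStorage {
  void* mem;  // address that would be passed to llvm.coro.begin
  bool heap;  // true if the frame was obtained from malloc
};

// `elided` stands in for the result of llvm.coro.alloc: null when the frame
// must be allocated dynamically, otherwise storage reserved by the caller.
// `frameSize` stands in for llvm.coro.size and is only used on the heap path.
FrameStorage acquireFrame(void* elided, std::size_t frameSize) {
  bool needDynAlloc = (elided == nullptr);
  if (needDynAlloc)
    return {std::malloc(frameSize), true};  // the dyn.alloc path
  return {elided, false};                    // elided: use caller storage as-is
}

// Assumption: only heap-allocated frames are released; an elided frame
// lives in the caller and must not be passed to free.
void releaseFrame(FrameStorage s) {
  if (s.heap)
    std::free(s.mem);
}
```

Under these assumptions, passing caller-owned storage returns that same address with `heap` false, while passing null falls back to `malloc`; the two returns correspond to the two incoming edges of the `phi` in the `coro.begin` block.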
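The `TargetTransformInfoImpl.h` hunk adds the structural coroutine intrinsics to the group of cases that the cost model reports as `TCC_Free`, because they "don't actually represent code after lowering". The self-contained sketch below models the shape of that switch. `IntrinsicID`, `TargetCost`, and `intrinsicCost` are hypothetical stand-ins rather than the real LLVM declarations, and the `TCC_Basic` fallback for every other intrinsic is an assumption made only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for Intrinsic::ID and the TTI cost constants;
// these are NOT the real LLVM declarations.
enum class IntrinsicID {
  coro_alloc, coro_begin, coro_free, coro_end, coro_frame, coro_size,
  coro_suspend, coro_param, coro_subfn_addr,
  coro_resume,  // declared in Intrinsics.td but not in the patched TTI list
  other
};
enum TargetCost { TCC_Free = 0, TCC_Basic = 1 };

// Mirrors the grouping added by the patch: every listed coroutine intrinsic
// falls through to the same "free" result.
TargetCost intrinsicCost(IntrinsicID id) {
  switch (id) {
  case IntrinsicID::coro_alloc:
  case IntrinsicID::coro_begin:
  case IntrinsicID::coro_free:
  case IntrinsicID::coro_end:
  case IntrinsicID::coro_frame:
  case IntrinsicID::coro_size:
  case IntrinsicID::coro_suspend:
  case IntrinsicID::coro_param:
  case IntrinsicID::coro_subfn_addr:
    // These intrinsics don't actually represent code after lowering.
    return TCC_Free;
  default:
    // Assumed fallback for this sketch only.
    return TCC_Basic;
  }
}
```

Grouping the cases as fall-throughs keeps the list easy to extend as further coroutine intrinsics land in later steps of the upstreaming plan.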