This is an archive of the discontinued LLVM Phabricator instance.

Differential D22603

[coroutines] Part 1 of N: Documentation
ClosedPublic

Authored by GorNishanov on Jul 20 2016, 4:08 PM.

Details

Reviewers

majnemer
pitrou

Commits

rGa653927e8b98: [coroutines] Part 1 of N: Documentation
rL276513: [coroutines] Part 1 of N: Documentation

Summary

This is the first patch in the coroutine series. It contains the documentation for the coroutine intrinsics and their usage.

Upstreaming sequence (rough plan)

#. Add documentation. <= we are here
#. Add coroutine intrinsics.
#. Add empty coroutine passes.
#. Add coroutine devirtualization + tests.
#. Add CGSCC restart trigger + tests.
#. Add coroutine heap elision + tests.
#. Add the rest of the logic (split into more patches)

Event Timeline

GorNishanov updated this revision to Diff 64787. Jul 20 2016, 4:08 PM

GorNishanov retitled this revision from to [coroutines] Part 1 of N: Documentation.

GorNishanov updated this object.

GorNishanov added a reviewer: pitrou.

GorNishanov added a subscriber: llvm-commits.

Out of curiosity, what led you to use tokens for coro.save and coro.suspend?

docs/Coroutines.rst
659	Why not return i8* ? Bitcasting the i8* seems nicer unless we expect the result to be in another LLVM address space.
719–720	Shouldn't this be llvm.coro.size.i32 and llvm.coro.size.i64?
909	In other places, you call this <handle>. I'd recommend you make the two variants agree.

Fixed issues identified by David:

added .i32 and .i64 to coro.size intrinsic names.
made parameters to all intrinsics uniformly named as <name>.

Fixed typo: s/i32/i64 in i64 coro.size.i32

In D22603#490604, @majnemer wrote:

Out of curiosity, what led you to use tokens for coro.save and coro.suspend?

I was looking for other examples where two intrinsics or instructions are linked together. They used tokens as means to link them, so I chose this model.

For example, catchpad - catchret pair, or gc_statepoint - gc_result.

Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

docs/Coroutines.rst
659	If I change the intrinsic to

declare i8* @llvm.coro.promise.p0<type>(i8* <handle>)

it will no longer be considered overloaded, and getIntrinsicID() will return 0 (not_intrinsic) for it.

dberris added a subscriber: dberris. Jul 20 2016, 10:36 PM

In D22603#490753, @GorNishanov wrote:

In D22603#490604, @majnemer wrote:

Out of curiosity, what led you to use tokens for coro.save and coro.suspend?

I was looking for other examples where two intrinsics or instructions are linked together. They used tokens as means to link them, so I chose this model.

For example, catchpad - catchret pair, or gc_statepoint - gc_result.

Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

OK, so it is critical that you be able to find the coro.suspend with its associated coro.save?

docs/Coroutines.rst
660	I was suggesting: declare i8* @llvm.coro.promise(i8* <handle>)

In D22603#490812, @majnemer wrote:

In D22603#490753, @GorNishanov wrote:

In D22603#490604, @majnemer wrote:

Out of curiosity, what led you to use tokens for coro.save and coro.suspend?

[snip] Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

OK, so it is critical that you be able to find the coro.suspend with its associated coro.save?

Yes, for example, given a coro.suspend describing a suspend/resume point 5, I need to find its coro save and replace it with:

%index.addr = gep %hdl <index-field>
store 5, index.addr

The motivating example for splitting a suspend point into two instructions, save and suspend, is given in the description of the coro.save intrinsic.

%save1 = call token @llvm.coro.save(i8* %hdl)
call void @async_op1(i8* %hdl)
%suspend1 = call i8 @llvm.coro.suspend(token %save1, i1 false)
switch i8 %suspend1, label %suspend [i8 0, label %resume1
                                     i8 1, label %cleanup]

It will get lowered to:

%index.addr = gep %hdl <index-field>
store 1, index.addr 
call void async_op1(i8* %hdl)
ret void

Preparing the coroutine for suspension (just saving the current suspend point index in the coroutine frame) allows the coroutine to be resumed by async_op from the current or a different thread.

If we did not have coro.save (and were saving the current suspend point index at the point of coro.suspend), then the store to the index would race with a possible resumption, and chaos and mayhem would ensue. :-)

docs/Coroutines.rst
660	The coro.promise / from.promise intrinsics hide the structure of the coroutine frame. To figure out the offset from where the coroutine handle points to where the promise lives, LLVM needs to know its type (or, more precisely, its alignment).

Let's say we are on an architecture with 4-byte pointers, but the promise requires 16-byte alignment. In this case, the coroutine frame might be:

0000           0004            0008                 0016
[ResumeFnPtr]  [DestroyFnPtr]  less aligned things  [promise]
                               or padding

Given that we don't really care about the type and only need to know the promise alignment, I take your suggestion and counter-offer the following as a replacement for both the coro.promise and coro.from.promise intrinsics:

declare i8* @llvm.coro.promise(i8* <handle>, i32 <promise-alignment>, i1 <from>)

old model                              new model
---------                              ---------
@llvm.coro.promise.p0i32(%hdl)         @llvm.coro.promise(%hdl, 4, false)
@llvm.coro.from.promise.p0i32(%hdl)    @llvm.coro.promise(%hdl, 4, true)

Better now?

In D22603#491320, @GorNishanov wrote:

[snip] If we did not have coro.save (and were saving the current suspend point index at the point of coro.suspend), then, the store to the index will race with possible resumption and chaos and mayhem will ensue. :-)

OK, sounds like a great use of token!

docs/Coroutines.rst
660	Looks great.

Collapsed coro.promise and coro.from.promise to a single intrinsic:

declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)

Added an example to a section describing coro.promise intrinsic.

GorNishanov added a reviewer: majnemer. Jul 21 2016, 4:28 PM

David:

I don't have commit rights (yet). If it looks good to you, can you submit the patch when appropriate?

GorNishanov added a child revision: D22659: [coroutines] Part 2 of N: Adding Coroutine Intrinsics. Jul 21 2016, 7:12 PM

dberris added a comment. Jul 21 2016, 7:54 PM

This comment was removed by dberris.

Some questions (not really blockers, but I think they are worth addressing somewhere):

Are the coroutines guaranteed to "leak" if they are never run to full completion?
Is there a way to explicitly "clone" the state of a coroutine so that it can be restored to a previous state? The use case is for running the same suspended coroutine in different threads, perhaps because there's some non-determinism involved.
Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

I don't know if these should be documented here or somewhere else (and if they've been asked before, then apologies for being late to the party).

In D22603#492436, @dberris wrote:

Some questions (not really blockers, but I think they are worth addressing somewhere):

Are the coroutines guaranteed to "leak" if they are never run to full completion?

A suspended coroutine can always be destroyed by a call to coro.destroy. Note that in Coroutines.rst I always say "runs to completion or destroyed via call to the coro.destroy". Of course, it is the frontend's responsibility to correctly set up the resume and destroy control flow for every coro.suspend.

Is there a way to explicitly "clone" the state of a coroutine so that it can be restored to a previous state? The use case is for running the same suspended coroutine in different threads, perhaps because there's some non-determinism involved.

I don't know how to deterministically clone the coroutine state at LLVM level. Consider:

generator<Instruction*> foo() {
   SmallVector<Instruction*, 8> WorkList;
   fillTheWorkList(WorkList);
   for (auto It = WorkList.begin(), E = WorkList.end(); It != E; ++It)
      co_yield *It;
}

To be able to clone this coroutine, LLVM needs to understand when the iterators It and E point into the coroutine frame itself (when WorkList contains at most 8 instructions and therefore uses its inline storage); in this case, you would need to adjust It and E to point at the new memory. If WorkList contains more than 8 instructions, then to clone it you would need to know what allocator to use to clone the WorkList for your cloned coroutine.

Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

This is a very exciting topic! We can do that review when we look at the CoroutineFrameBuilder.cpp patch. Or, if you think it is a good idea, before even getting to the patch, I can open a discussion on llvm-dev on the algorithm itself. Let me know what you think works best.

I don't know if these should be documented here or somewhere else (and if they've been asked before, then apologies for being late to the party).

I think I may add a Q&A section at the end of Coroutines.rst to address frequently asked questions. (Not necessarily as a part of this patch).

I think I was too cryptic in my response to your second question. I can provide brief answers to your points now without deferring for later reviews or discussions on llvm-dev.

In D22603#492436, @dberris wrote:

Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

The former. All of the coroutine logic is fully contained in the coroutine passes. One of the design goals for this feature was to keep coroutine handling completely segregated in those passes. No other passes should care about coroutines.

The idea is that a coroutine is a normal function with some intrinsics, it travels through the pipelines and is optimized just like a normal function.

At the end of the IPO pipeline, we split the coroutine into a state plus functions manipulating that state, add those functions to the SCC, and restart the pipeline, so that we now optimize the individual functions driving the state machine.

Figuring out which objects need to go to the coroutine frame and which can stay on the stack is done as a part of the coroutine splitting pass (where all of the meat is; the other three coroutine passes are tiny and very simple).

LGTM

This revision is now accepted and ready to land. Jul 22 2016, 9:08 PM

Closed by commit rL276513: [coroutines] Part 1 of N: Documentation (authored by majnemer). Jul 22 2016, 9:12 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                 Size
docs/Coroutines.rst  1206 lines
docs/index.rst       4 lines

Diff 64813

docs/Coroutines.rst

This file was added.

				=====================================
				Coroutines in LLVM
				=====================================

				.. contents::
				:local:
				:depth: 3

				.. warning::
				This is a work in progress. Compatibility across LLVM releases is not
				guaranteed.

				Introduction
				============

				.. _coroutine handle:

				LLVM coroutines are functions that have one or more `suspend points`_.
				When a suspend point is reached, the execution of a coroutine is suspended and
				control is returned back to its caller. A suspended coroutine can be resumed
				to continue execution from the last suspend point or it can be destroyed.

				In the following example, we call function `f` (which may or may not be a
				coroutine itself) that returns a handle to a suspended coroutine
				(coroutine handle) that is used by `main` to resume the coroutine twice and
				then destroy it:

				.. code-block:: llvm

				define i32 @main() {
				entry:
				%hdl = call i8* @f(i32 4)
				call void @llvm.coro.resume(i8* %hdl)
				call void @llvm.coro.resume(i8* %hdl)
				call void @llvm.coro.destroy(i8* %hdl)
				ret i32 0
				}

				.. _coroutine frame:

				In addition to the function stack frame which exists when a coroutine is
				executing, there is an additional region of storage that contains objects that
				keep the coroutine state when a coroutine is suspended. This region of storage
				is called the coroutine frame. It is created when a coroutine is called and
				destroyed when a coroutine runs to completion or is destroyed by a call to
				the `coro.destroy`_ intrinsic.

				An LLVM coroutine is represented as an LLVM function that has calls to
				`coroutine intrinsics`_ defining the structure of the coroutine.
				After lowering, a coroutine is split into several
				functions that represent three different ways of how control can enter the
				coroutine:

				1. a ramp function, which represents an initial invocation of the coroutine that
				creates the coroutine frame and executes the coroutine code until it
				encounters a suspend point or reaches the end of the function;

				2. a coroutine resume function that is invoked when the coroutine is resumed;

				3. a coroutine destroy function that is invoked when the coroutine is destroyed.

				.. note:: Splitting out resume and destroy functions is just one of the
				possible ways of lowering the coroutine. We chose it for initial
				implementation as it matches closely the mental model and results in
				reasonably nice code.

				Coroutines by Example
				=====================

				Coroutine Representation
				------------------------

				Let's look at an example of an LLVM coroutine with the behavior sketched
				by the following pseudo-code.

				.. code-block:: C++

				void *f(int n) {
				for(;;) {
				print(n++);
				<suspend> // returns a coroutine handle on first suspend
				}
				}

				This coroutine calls some function `print` with value `n` as an argument and
				suspends execution. Every time this coroutine resumes, it calls `print` again
				with an argument one bigger than the last time. This coroutine never completes
				by itself and must be destroyed explicitly. If we use this coroutine with
				the `main` shown in the previous section, it will call `print` with values 4, 5
				and 6, after which the coroutine will be destroyed.

				The LLVM IR for this coroutine looks like this:

				.. code-block:: llvm

				define i8* @f(i32 %n) {
				entry:
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
				br label %loop
				loop:
				%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
				%inc = add nsw i32 %n.val, 1
				call void @print(i32 %n.val)
				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %loop
				i8 1, label %cleanup]
				cleanup:
				%mem = call i8* @llvm.coro.free(i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend
				suspend:
				call void @llvm.coro.end(i8* %hdl, i1 false)
				ret i8* %hdl
				}

				The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
				lowered to a constant representing the size required for the coroutine frame.
				The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
				coroutine handle. The first parameter of `coro.begin` is given a block of memory
				to be used if the coroutine frame needs to be allocated dynamically.

				The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
				given the coroutine handle, returns a pointer to the memory block to be freed or
				`null` if the coroutine frame was not allocated dynamically. The `cleanup`
				block is entered when the coroutine runs to completion by itself or is destroyed
				via a call to the `coro.destroy`_ intrinsic.

				The `suspend` block contains code to be executed when the coroutine runs to
				completion or is suspended. The `coro.end`_ intrinsic marks the point where
				a coroutine needs to return control back to the caller if it is not an initial
				invocation of the coroutine.

				The `loop` block represents the body of the coroutine. The `coro.suspend`_
				intrinsic in combination with the following switch indicates what happens to
				control flow when a coroutine is suspended (default case), resumed (case 0) or
				destroyed (case 1).

				Coroutine Transformation
				------------------------

				One of the steps of coroutine lowering is building the coroutine frame. The
				def-use chains are analyzed to determine which objects need to be kept alive across
				suspend points. In the coroutine shown in the previous section, use of virtual register
				`%n.val` is separated from the definition by a suspend point, therefore, it
				cannot reside on the stack frame since the latter goes away once the coroutine
				is suspended and control is returned back to the caller. An i32 slot is
				allocated in the coroutine frame and `%n.val` is spilled and reloaded from that
				slot as needed.

				We also store addresses of the resume and destroy functions so that the
				`coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
				when its identity cannot be determined statically at compile time. For our
				example, the coroutine frame will be:

				.. code-block:: llvm

				  %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }

				After resume and destroy parts are outlined, function `f` will contain only the
				code responsible for creation and initialization of the coroutine frame and
				execution of the coroutine until a suspend point is reached:

				.. code-block:: llvm

				  define i8* @f(i32 %n) {
				  entry:
				    %alloc = call noalias i8* @malloc(i32 24)
				    %0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
				    %frame = bitcast i8* %0 to %f.frame*
				    %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
				    store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
				    %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
				    store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2

				    %inc = add nsw i32 %n, 1
				    %inc.spill.addr = getelementptr inbounds %f.frame, %f.frame* %frame, i32 0, i32 2
				    store i32 %inc, i32* %inc.spill.addr
				    call void @print(i32 %n)

				    ret i8* %0
				  }

				Outlined resume part of the coroutine will reside in function `f.resume`:

				.. code-block:: llvm

				  define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
				  entry:
				    %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
				    %inc.spill = load i32, i32* %inc.spill.addr, align 4
				    %inc = add i32 %inc.spill, 1
				    store i32 %inc, i32* %inc.spill.addr, align 4
				    tail call void @print(i32 %inc.spill)
				    ret void
				  }

				Whereas function `f.destroy` will contain the cleanup code for the coroutine:

				.. code-block:: llvm

				define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
				entry:
				%0 = bitcast %f.frame* %frame.ptr.destroy to i8*
				tail call void @free(i8* %0)
				ret void
				}
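
For readers who find C++ easier to follow than IR, the split described above can be approximated by hand. This is only an illustrative sketch, not what LLVM emits: `FFrame`, `f_ramp`, `f_resume`, `f_destroy`, `run_example` and the `printed` log (which stands in for the external `print`) are hypothetical names introduced here, and the frame is allocated with `malloc` as in the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Stand-in for the external `print` function: records every printed value.
std::vector<int> printed;
void print(int v) { printed.push_back(v); }

// Hypothetical C++ mirror of %f.frame: two function pointers and one spill slot.
struct FFrame {
  void (*resume)(FFrame *);
  void (*destroy)(FFrame *);
  int inc_spill; // value to print on the next resumption
};

// Resume part: reload the spilled value, print it, and spill its successor.
void f_resume(FFrame *frame) {
  int n = frame->inc_spill;
  frame->inc_spill = n + 1;
  print(n);
}

// Destroy part: release the coroutine frame.
void f_destroy(FFrame *frame) { std::free(frame); }

// Ramp function: allocate and initialize the frame, then run the body up to
// the first suspend point.
FFrame *f_ramp(int n) {
  FFrame *frame = static_cast<FFrame *>(std::malloc(sizeof(FFrame)));
  assert(frame && "allocation failed");
  frame->resume = f_resume;
  frame->destroy = f_destroy;
  frame->inc_spill = n + 1;
  print(n);
  return frame;
}

// Mirrors `main` from the introduction: resume twice, then destroy.
std::vector<int> run_example() {
  printed.clear();
  FFrame *hdl = f_ramp(4);
  hdl->resume(hdl);  // indirect call through the pointer stored in the frame
  hdl->resume(hdl);
  hdl->destroy(hdl);
  return printed;
}
```

Calling `run_example()` records the values 4, 5 and 6, matching the behavior described for `main`. The indirect calls through `hdl->resume` and `hdl->destroy` correspond to the case where the identity of the coroutine is not known statically.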

				Avoiding Heap Allocations
				-------------------------

				A particular coroutine usage pattern, which is illustrated by the `main`
				function in the overview section, where a coroutine is created, manipulated and
				destroyed by the same calling function, is common for coroutines implementing
				the RAII idiom and is suitable for the allocation elision optimization, which
				avoids dynamic allocation by storing the coroutine frame as a static `alloca`
				in its caller.

				If a coroutine uses allocation and deallocation functions that are known to
				LLVM, unused calls to `malloc` and calls to `free` with `null` argument will be
				removed as dead code. However, if custom allocation functions are used, the
				`coro.alloc` and `coro.free` intrinsics can be used to enable removal of custom
				allocation and deallocation code when coroutine does not require dynamic
				allocation of the coroutine frame.

				In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
				when dynamic allocation is required, and non-null otherwise:

				.. code-block:: llvm

				  entry:
				    %elide = call i8* @llvm.coro.alloc()
				    %need.dyn.alloc = icmp eq i8* %elide, null
				    br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
				  dyn.alloc:
				    %size = call i32 @llvm.coro.size.i32()
				    %alloc = call i8* @CustomAlloc(i32 %size)
				    br label %coro.begin
				  coro.begin:
				    %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
				    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)

				In the cleanup block, we will make freeing the coroutine frame conditional on
				`coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
				thus skipping the deallocation code:

				.. code-block:: llvm

				cleanup:
				%mem = call i8* @llvm.coro.free(i8* %hdl)
				%need.dyn.free = icmp ne i8* %mem, null
				br i1 %need.dyn.free, label %dyn.free, label %if.end
				dyn.free:
				call void @CustomFree(i8* %mem)
				br label %if.end
				if.end:
				...

				With allocations and deallocations represented as described above, after
				coroutine heap allocation elision optimization, the resulting main will end up
				looking just like it was when we used `malloc` and `free`:

				.. code-block:: llvm

				define i32 @main() {
				entry:
				call void @print(i32 4)
				call void @print(i32 5)
				call void @print(i32 6)
				ret i32 0
				}

				Multiple Suspend Points
				-----------------------

				Let's consider the coroutine that has more than one suspend point:

				.. code-block:: C++

				void *f(int n) {
				for(;;) {
				print(n++);
				<suspend>
				print(-n);
				<suspend>
				}
				}

				Matching LLVM code would look like (with the rest of the code remaining the same
				as the code in the previous section):

				.. code-block:: llvm

				loop:
				%n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
				call void @print(i32 %n.addr)
				%2 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %2, label %suspend [i8 0, label %loop.resume
				i8 1, label %cleanup]
				loop.resume:
				%inc = add nsw i32 %n.addr, 1
				%sub = xor i32 %n.addr, -1
				call void @print(i32 %sub)
				%3 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %3, label %suspend [i8 0, label %loop
				i8 1, label %cleanup]

				In this case, the coroutine frame would include a suspend index that will
				indicate at which suspend point the coroutine needs to resume. The resume
				function will use an index to jump to an appropriate basic block and will look
				as follows:

				.. code-block:: llvm

				define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
				entry.Resume:
				%index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
				%index = load i8, i8* %index.addr, align 1
				%switch = icmp eq i8 %index, 0
				%n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
				%n = load i32, i32* %n.addr, align 4
				br i1 %switch, label %loop.resume, label %loop

				loop.resume:
				%sub = xor i32 %n, -1
				call void @print(i32 %sub)
				br label %suspend
				loop:
				%inc = add nsw i32 %n, 1
				store i32 %inc, i32* %n.addr, align 4
				tail call void @print(i32 %inc)
				br label %suspend

				suspend:
				%storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
				store i8 %storemerge, i8* %index.addr, align 1
				ret void
				}

				If different cleanup code needs to get executed for different suspend points,
				a similar switch will be in the `f.destroy` function.

				.. note ::

				Using suspend index in a coroutine state and having a switch in `f.resume` and
				`f.destroy` is one of the possible implementation strategies. We explored
				another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
				every suspend point, and instead of storing an index, the resume and destroy
				function pointers are updated at every suspend. Early testing showed that the
				current approach is easier on the optimizer than the latter, so it is the
				lowering strategy implemented at the moment.
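
The suspend-index strategy can be pictured with a small hand-written C++ state machine. This is a sketch under stated assumptions rather than the output of the lowering: it follows the pseudo-code of the two-suspend-point coroutine, and `Frame2`, `start`, `resume`, `run_two_suspends` and the `out` log (standing in for `print`) are hypothetical names.

```cpp
#include <cassert>
#include <vector>

std::vector<int> out; // stand-in for `print`: records printed values

// Hypothetical frame for the coroutine with two suspend points.
struct Frame2 {
  unsigned char index; // which suspend point the coroutine is stopped at
  int n;               // the variable that lives across suspend points
};

// Ramp part: run up to the first suspend point, i.e. print(n++).
void start(Frame2 &fr, int n) {
  out.push_back(n);
  fr.n = n + 1;
  fr.index = 0; // stopped at the first suspend point
}

// Resume part: a switch on the stored index selects where to continue.
void resume(Frame2 &fr) {
  switch (fr.index) {
  case 0: // stopped after print(n++): continue with print(-n)
    out.push_back(-fr.n);
    fr.index = 1;
    break;
  case 1: // stopped after print(-n): next loop iteration, print(n++)
    out.push_back(fr.n);
    fr.n += 1;
    fr.index = 0;
    break;
  default:
    assert(false && "invalid suspend index");
  }
}

// Start the coroutine with n = 4 and resume it three times.
std::vector<int> run_two_suspends() {
  out.clear();
  Frame2 fr{};
  start(fr, 4);
  resume(fr);
  resume(fr);
  resume(fr);
  return out;
}
```

Here `run_two_suspends()` records 4, -5, 5, -6. The alternative strategy mentioned in the note would replace the `switch` with one function per suspend point and update a function pointer instead of `index`.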

				Distinct Save and Suspend
				-------------------------

				In the previous example, setting a resume index (or some other state change that
				needs to happen to prepare a coroutine for resumption) happens at the same time as
				a suspension of a coroutine. However, in certain cases, it is necessary to control
				when a coroutine is prepared for resumption and when it is suspended.

				In the following example, a coroutine represents some activity that is driven
				by completions of asynchronous operations `async_op1` and `async_op2` which get
				a coroutine handle as a parameter and resume the coroutine once async
				operation is finished.

				.. code-block:: C++

				  void g() {
				    for (;;) {
				      if (cond()) {
				        async_op1(<coroutine-handle>); // will resume once async_op1 completes
				        <suspend>
				        do_one();
				      }
				      else {
				        async_op2(<coroutine-handle>); // will resume once async_op2 completes
				        <suspend>
				        do_two();
				      }
				    }
				  }

				In this case, coroutine should be ready for resumption prior to a call to
				`async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
				point when coroutine should be ready for resumption (namely, when a resume index
				should be stored in the coroutine frame, so that it can be resumed at the
				correct resume point):

				.. code-block:: llvm

				  if.true:
				    %save1 = call token @llvm.coro.save(i8* %hdl)
				    call void @async_op1(i8* %hdl)
				    %suspend1 = call i8 @llvm.coro.suspend(token %save1, i1 false)
				    switch i8 %suspend1, label %suspend [i8 0, label %resume1
				                                         i8 1, label %cleanup]
				  if.false:
				    %save2 = call token @llvm.coro.save(i8* %hdl)
				    call void @async_op2(i8* %hdl)
				    %suspend2 = call i8 @llvm.coro.suspend(token %save2, i1 false)
				    switch i8 %suspend2, label %suspend [i8 0, label %resume2
				                                         i8 1, label %cleanup]
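
Why the save has to come before the asynchronous call can be seen in a single-threaded C++ sketch in which the operation completes immediately and resumes the coroutine before returning. Everything here is a hypothetical simplification: `GFrame`, `g_step`, `g_resume`, `async_op`, `run_g` and the `events` log are illustrative names, only one loop iteration is modeled, and a real asynchronous operation could just as well resume from another thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events; // records the order of events

// Hypothetical frame: only the resume index matters for this sketch.
struct GFrame {
  int index = 0; // 0 means "no resume point saved yet"
};

// Resume part: dispatch on whatever index was saved last.
void g_resume(GFrame &frame) {
  assert(frame.index != 0 && "resumed before a resume index was saved");
  events.push_back(frame.index == 1 ? "do_one" : "do_two");
}

// Hypothetical async operation that completes immediately and therefore
// resumes the coroutine before returning to its caller.
void async_op(GFrame &frame, const char *name) {
  events.push_back(name);
  g_resume(frame);
}

// One iteration of g(): save first, then start the operation, then suspend.
void g_step(GFrame &frame, bool cond) {
  if (cond) {
    frame.index = 1;              // role of coro.save: store the resume index
    async_op(frame, "async_op1"); // may resume the coroutine right away
  } else {
    frame.index = 2;              // role of coro.save for the second branch
    async_op(frame, "async_op2");
  }
  // role of coro.suspend: nothing left to store, just return to the caller
}

// Run a single iteration and return the recorded event order.
std::vector<std::string> run_g(bool cond) {
  events.clear();
  GFrame frame;
  g_step(frame, cond);
  return events;
}
```

If the assignment to `frame.index` were moved after the call to `async_op`, the early resumption would observe a stale index; with several threads the same mistake becomes a data race on the index.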

				.. _coroutine promise:

				Coroutine Promise
				-----------------

				A coroutine author or a frontend may designate a distinguished `alloca` that can
				be used to communicate with the coroutine. This distinguished alloca is called
				the coroutine promise and is provided as the third parameter to the `coro.begin`_
				intrinsic.

				The following coroutine designates a 32 bit integer `promise` and uses it to
				store the current value produced by a coroutine.

				.. code-block:: llvm

				define i8* @f(i32 %n) {
				entry:
				%promise = alloca i32
				%pv = bitcast i32* %promise to i8*
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
				br label %loop
				loop:
				%n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
				%inc = add nsw i32 %n.val, 1
				store i32 %n.val, i32* %promise
				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %loop
				i8 1, label %cleanup]
				cleanup:
				%mem = call i8* @llvm.coro.free(i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend
				suspend:
				call void @llvm.coro.end(i8* %hdl, i1 false)
				ret i8* %hdl
				}

				A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
				coroutine promise.

				.. code-block:: llvm

				define i32 @main() {
				entry:
				%hdl = call i8* @f(i32 4)
				%promise.addr = call i32* @llvm.coro.promise.p0i32(i8* %hdl)
				%val0 = load i32, i32* %promise.addr
				call void @print(i32 %val0)
				call void @llvm.coro.resume(i8* %hdl)
				%val1 = load i32, i32* %promise.addr
				call void @print(i32 %val1)
				call void @llvm.coro.resume(i8* %hdl)
				%val2 = load i32, i32* %promise.addr
				call void @print(i32 %val2)
				call void @llvm.coro.destroy(i8* %hdl)
				ret i32 0
				}

				There is also an intrinsic `coro.from.promise`_ that performs a reverse
				operation. Given an address of a coroutine promise, it obtains a coroutine handle.
				This intrinsic is the only mechanism for a user code outside of the coroutine
				to get access to the coroutine handle.

				After the example in this section is compiled, the result of the compilation
				will be exactly like the result of the very first example:

				.. code-block:: llvm

				define i32 @main() {
				entry:
				tail call void @print(i32 4)
				tail call void @print(i32 5)
				tail call void @print(i32 6)
				ret i32 0
				}

				.. _final:
				.. _final suspend:

				Final Suspend
				-------------

				A coroutine author or a frontend may designate a particular suspend to be final,
				by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
				Such a suspend point has two properties:

				* it is possible to check whether a suspended coroutine is at the final suspend
				point via the `coro.done`_ intrinsic;

				* a resumption of a coroutine stopped at the final suspend point leads to
				undefined behavior. The only possible action for a coroutine at a final
				suspend point is destroying it via the `coro.destroy`_ intrinsic.

				From the user perspective, the final suspend point represents the idea of a
				coroutine reaching the end. From the compiler perspective, it is an optimization
				opportunity for reducing the number of resume points (and therefore switch cases) in
				the resume function.

				The following is an example of a function that keeps resuming the coroutine
				until the final suspend point is reached, after which the coroutine is
				destroyed:

				.. code-block:: llvm

				define i32 @main() {
				entry:
				%hdl = call i8* @f(i32 4)
				br label %while
				while:
				call void @llvm.coro.resume(i8* %hdl)
				%done = call i1 @llvm.coro.done(i8* %hdl)
				br i1 %done, label %end, label %while
				end:
				call void @llvm.coro.destroy(i8* %hdl)
				ret i32 0
				}

				Usually, the final suspend point is a frontend-injected suspend point that does not
				correspond to any explicitly authored suspend point of the high-level language.
				For example, for a Python generator that has only one suspend point:

				.. code-block:: python

				  def coroutine(n):
				    for i in range(n):
				      yield i

				A Python frontend would inject two more suspend points, so that the actual code
				looks like this:

				.. code-block:: C

				  void* coroutine(int n) {
				    int current_value;
				    <designate current_value to be coroutine promise>
				    <SUSPEND> // injected suspend point, so that the coroutine starts suspended
				    for (int i = 0; i < n; ++i) {
				      current_value = i; <SUSPEND>; // corresponds to "yield i"
				    }
				    <SUSPEND final=true> // injected final suspend point
				  }

				and the Python iterator `__next__` would look like:

				.. code-block:: C++

				  int __next__(void* hdl) {
				    coro.resume(hdl);
				    if (coro.done(hdl)) throw StopIteration();
				    return *(int*)coro.promise(hdl); // read current_value through the promise pointer
				  }
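
The injected suspend points can also be modeled outside of LLVM. The following is a minimal Python sketch, not the actual lowering: the `CoroutineFrame` class, its fields, and the helpers `next_value` and `drain` are hypothetical names chosen for illustration. The constructor stands for the injected initial suspend, `resume` stands for `coro.resume`, the `done` field stands for `coro.done`, and the `promise` field stands for `current_value`.

```python
class CoroutineFrame:
    """Hand-lowered model of `coroutine(n)`; all names are illustrative."""

    def __init__(self, n):
        self.n = n
        self.i = 0
        self.promise = None    # models current_value, the coroutine promise
        self.done = False      # True once the final suspend point is reached
        # Injected initial suspend: construction runs no loop iteration.

    def resume(self):
        if self.done:
            # Undefined behavior in the document; the model raises instead.
            raise RuntimeError("resumed at the final suspend point")
        if self.i < self.n:
            self.promise = self.i   # current_value = i
            self.i += 1
            return                  # <SUSPEND>, corresponds to "yield i"
        self.done = True            # <SUSPEND final=true>


def next_value(frame):
    # Mirrors the __next__ pseudocode: resume, check done, read the promise.
    frame.resume()
    if frame.done:
        raise StopIteration
    return frame.promise


def drain(frame):
    # Collect every yielded value until the final suspend point is reached.
    values = []
    while True:
        try:
            values.append(next_value(frame))
        except StopIteration:
            return values
```

In this model, `drain(CoroutineFrame(3))` collects the values 0, 1 and 2, and a coroutine created with `n = 0` goes straight to the final suspend point on its first resumption.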

				Intrinsics
				==========

				Coroutine Manipulation Intrinsics
				---------------------------------

				Intrinsics described in this section are used to manipulate an existing
				coroutine. They can be used in any function that happens to have a pointer
				to a `coroutine frame`_ or a pointer to a `coroutine promise`_.

				.. _coro.destroy:

				'llvm.coro.destroy' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				::

				declare void @llvm.coro.destroy(i8* <handle>)

				Overview:
				"""""""""

				The '``llvm.coro.destroy``' intrinsic destroys a suspended
				coroutine.

				Arguments:
				""""""""""

				The argument is a coroutine handle to a suspended coroutine.

				Semantics:
				""""""""""

				When possible, the `coro.destroy` intrinsic is replaced with a direct call to
				the coroutine destroy function. Otherwise it is replaced with an indirect call
				based on the function pointer for the destroy function stored in the coroutine
				frame. Destroying a coroutine that is not suspended leads to undefined behavior.

				.. _coro.resume:

				'llvm.coro.resume' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				::

				declare void @llvm.coro.resume(i8* <handle>)

				Overview:
				"""""""""

				The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine.

				Arguments:
				""""""""""

				The argument is a handle to a suspended coroutine.

				Semantics:
				""""""""""

				When possible, the `coro.resume` intrinsic is replaced with a direct call to the
				coroutine resume function. Otherwise it is replaced with an indirect call based
				on the function pointer for the resume function stored in the coroutine frame.
				Resuming a coroutine that is not suspended leads to undefined behavior.
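
The two lowering strategies described for `coro.resume` and `coro.destroy` can be pictured with a small model. This is a Python sketch under assumptions: the `Frame` class, its `resume_fn` and `destroy_fn` fields, and all helper names are hypothetical and only mimic a frame that stores two function pointers; it says nothing about the real frame layout.

```python
class Frame:
    """Hypothetical coroutine frame holding two function pointers."""

    def __init__(self, resume_fn, destroy_fn):
        self.resume_fn = resume_fn
        self.destroy_fn = destroy_fn
        self.trace = []


def f_resume(frame):
    # Stand-in for the outlined resume part of a coroutine f.
    frame.trace.append("f.resume")


def f_destroy(frame):
    # Stand-in for the outlined destroy part of a coroutine f.
    frame.trace.append("f.destroy")


def coro_resume_indirect(frame):
    # Callee unknown: load the resume pointer from the frame and call it.
    frame.resume_fn(frame)


def coro_destroy_indirect(frame):
    # Callee unknown: load the destroy pointer from the frame and call it.
    frame.destroy_fn(frame)


def coro_resume_direct(frame):
    # Callee known to be f: the call can name f_resume directly.
    f_resume(frame)
```

Both forms reach the same function; the direct form simply skips the load from the frame, which is what makes it a better candidate for later inlining.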

				.. _coro.done:

				'llvm.coro.done' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				::

				declare i1 @llvm.coro.done(i8* <handle>)

				Overview:
				"""""""""

				The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at
				the final suspend point or not.

				Arguments:
				""""""""""

				The argument is a handle to a suspended coroutine.

				Semantics:
				""""""""""

				Using this intrinsic on a coroutine that does not have a `final suspend`_ point
				or on a coroutine that is not suspended leads to undefined behavior.

				.. _coro.promise:

				'llvm.coro.promise' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				::

				declare <type>* @llvm.coro.promise.p0<type>(i8* <handle>)

				majnemer: Why not return i8*? Bitcasting the i8* seems nicer unless we expect the result to be in another LLVM address space.

				GorNishanov: If I change the intrinsic to

				declare i8* @llvm.coro.promise.p0<type>(i8* <handle>)

				it will no longer be considered overloaded, and getIntrinsicID() will return 0=not_intrinsic for it.

				majnemer: I was suggesting:

				declare i8* @llvm.coro.promise(i8* <handle>)

				GorNishanov: The coro.promise / from.promise intrinsics hide the structure of the coroutine frame. To figure out the offset from where the coroutine handle points to where promise lives, LLVM needs to know its type (or more precisely alignment).

				Let's say we are on the architecture with 4 byte pointers, but, the promise requires 16 byte alignment. In this case, the coroutine frame might be:

				0000           0004            008                  0016
				[ResumeFnPtr]  [DestroyFnPtr]  less aligned things  [promise]
				                               or padding

				Given that we don't really care about the type and only need to know promise alignment, I take your suggestion and counter offer the following as a replacement for both coro.promise and coro.from.promise intrinsics.

				decl i8* @llvm.coro.promise(i8* <handle>, i32 <promise-alignment>, i1 <from>)

				old model                            new model
				---------                            ---------
				@llvm.coro.promise.p0i32(%hdl)       @llvm.coro.promise(%hdl, 4, false)
				@llvm.coro.from.promise.p0i32(%hdl)  @llvm.coro.promise(%hdl, 4, true)

				Better now?

				majnemer: Looks great.

				Overview:
				"""""""""

				The '``llvm.coro.promise``' intrinsic returns a pointer to a
				`coroutine promise`_.

				Arguments:
				""""""""""

				The argument is a handle to a coroutine.

				Semantics:
				""""""""""

				Using this intrinsic on a coroutine that does not have a coroutine promise
				leads to undefined behavior. It is possible to read and modify the coroutine
				promise of the coroutine that is currently executing. The coroutine author and
				the coroutine user are responsible for making sure there are no data races.

				.. _coro.from.promise:

				'llvm.coro.from.promise' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				::

				declare i8* @llvm.coro.from.promise.p0<type>(<type>* <promise>)

				Overview:
				"""""""""

				The '``llvm.coro.from.promise``' intrinsic returns a coroutine
				handle given the coroutine promise.

				Arguments:
				""""""""""

				An address of a coroutine promise.

				Semantics:
				""""""""""

				Using this intrinsic on a coroutine that does not have a coroutine promise
				results in undefined behavior.
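
To see why the promise alignment matters for `coro.promise` and `coro.from.promise`, consider a sketch of the address arithmetic. It is written in Python for illustration and rests on assumptions that are not part of this document: the frame is assumed to start with two pointers (resume and destroy), and the promise is assumed to sit at the first suitably aligned offset after them. The helper names `align_up`, `promise_offset`, `coro_promise` and `coro_from_promise` are hypothetical.

```python
def align_up(offset, alignment):
    # Round offset up to the next multiple of alignment (a power of two).
    return (offset + alignment - 1) & ~(alignment - 1)


def promise_offset(pointer_size, promise_alignment):
    # Assumed layout: [resume fn ptr][destroy fn ptr][padding...][promise].
    header_size = 2 * pointer_size
    return align_up(header_size, promise_alignment)


def coro_promise(handle_addr, pointer_size, promise_alignment):
    # Handle address -> promise address.
    return handle_addr + promise_offset(pointer_size, promise_alignment)


def coro_from_promise(promise_addr, pointer_size, promise_alignment):
    # Promise address -> handle address (the reverse operation).
    return promise_addr - promise_offset(pointer_size, promise_alignment)
```

With 4-byte pointers and a promise that requires 16-byte alignment, this model places the promise at offset 16 from the handle, and the two helpers are inverses of each other for any fixed pointer size and alignment.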

				.. _coroutine intrinsics:

				Coroutine Structure Intrinsics
				------------------------------
				Intrinsics described in this section are used within a coroutine to describe
				the coroutine structure. They should not be used outside of a coroutine.

				.. _coro.size:

				'llvm.coro.size' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i32 @llvm.coro.size.i32()
				declare i64 @llvm.coro.size.i64()

				majnemer: Shouldn't this be llvm.coro.size.i32 and llvm.coro.size.i64?

				Overview:
				"""""""""

				The '``llvm.coro.size``' intrinsic returns the number of bytes
				required to store a `coroutine frame`_.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				The `coro.size` intrinsic is lowered to a constant representing the size of
				the coroutine frame.

				.. _coro.begin:

				'llvm.coro.begin' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>)

				Overview:
				"""""""""

				The '``llvm.coro.begin``' intrinsic returns an address of the
				coroutine frame.

				Arguments:
				""""""""""

				The first argument is a pointer to a block of memory that the coroutine frame
				may use if memory for the coroutine frame needs to be allocated dynamically.

				The second argument provides information on the alignment of the memory returned
				by the allocation function and given to `coro.begin` by the first argument. If
				this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
				This argument only accepts constants.

				The third argument, if not `null`, designates a particular alloca instruction to
				be a `coroutine promise`_.

				The fourth argument is `null` before the coroutine is split, and is later replaced
				with a pointer to a private global constant array containing function pointers to
				the outlined resume and destroy parts of the coroutine.

				Semantics:
				""""""""""

				Depending on the alignment requirements of the objects in the coroutine frame
				and/or for codegen compactness reasons, the pointer returned from `coro.begin`
				may be at an offset from the `%mem` argument. (This could be beneficial if
				instructions that express relative access to data can be more compactly encoded
				with small positive and negative offsets).

				A frontend should emit exactly one `coro.begin` intrinsic per coroutine.

				.. _coro.free:

				'llvm.coro.free' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8* @llvm.coro.free(i8* <frame>)

				Overview:
				"""""""""

				The '``llvm.coro.free``' intrinsic returns a pointer to the block of memory where
				the coroutine frame is stored, or `null` if this instance of a coroutine did not use
				dynamically allocated memory for its coroutine frame.

				Arguments:
				""""""""""

				A pointer to the coroutine frame. This should be the same pointer that was
				returned by a prior `coro.begin` call.

				Example (custom deallocation function):
				"""""""""""""""""""""""""""""""""""""""

				.. code-block:: llvm

				cleanup:
				%mem = call i8* @llvm.coro.free(i8* %frame)
				%mem_not_null = icmp ne i8* %mem, null
				br i1 %mem_not_null, label %if.then, label %if.end
				if.then:
				call void @CustomFree(i8* %mem)
				br label %if.end
				if.end:
				ret void

				Example (standard deallocation functions):
				""""""""""""""""""""""""""""""""""""""""""

				.. code-block:: llvm

				cleanup:
				%mem = call i8* @llvm.coro.free(i8* %frame)
				call void @free(i8* %mem)
				ret void

				.. _coro.alloc:

				'llvm.coro.alloc' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8* @llvm.coro.alloc()

				Overview:
				"""""""""

				The '``llvm.coro.alloc``' intrinsic returns the address of memory in the
				caller's frame where the coroutine frame of this coroutine can be placed, or `null`
				otherwise.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				If the coroutine is eligible for heap elision, this intrinsic is lowered to an
				alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.
				This intrinsic only needs to be used if a custom allocation function is used
				(i.e. a function not recognized by LLVM as a memory allocation function) and the
				language rules allow for custom allocation / deallocation to be elided when not
				needed.

				Example:
				""""""""

				.. code-block:: llvm

				entry:
				%elide = call i8* @llvm.coro.alloc()
				%0 = icmp ne i8* %elide, null
				br i1 %0, label %coro.begin, label %coro.alloc

				coro.alloc:
				%frame.size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @MyAlloc(i32 %frame.size)
				br label %coro.begin

				coro.begin:
				%phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
				%frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
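
The interplay between `coro.alloc` and `coro.free` with a custom allocator can be sketched as follows. This is a Python model under assumptions: `Allocator`, `begin_frame` and `end_frame` are hypothetical names, `None` stands in for `null`, and a `bytearray` stands in for raw memory. It only illustrates that the custom allocation and deallocation calls are skipped together when the frame lives in the caller.

```python
class Allocator:
    """Hypothetical custom allocator that counts live blocks."""

    def __init__(self):
        self.live = 0

    def alloc(self, size):
        self.live += 1
        return bytearray(size)

    def free(self, mem):
        self.live -= 1


def begin_frame(caller_slot, frame_size, allocator):
    # caller_slot models the result of coro.alloc: memory in the caller's
    # frame when heap elision applies, otherwise None (null).
    if caller_slot is not None:
        return caller_slot, False
    return allocator.alloc(frame_size), True


def end_frame(mem, heap_allocated, allocator):
    # Models coro.free followed by a null check: only heap memory is freed.
    to_free = mem if heap_allocated else None
    if to_free is not None:
        allocator.free(to_free)
```

When `begin_frame` receives a caller-provided slot, the allocator is never touched, which mirrors the case where the intrinsic is lowered to an alloca and the deallocation branch becomes dead.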

				.. _coro.frame:

				'llvm.coro.frame' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8* @llvm.coro.frame()

				Overview:
				"""""""""

				The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
				the enclosing coroutine.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
				a frontend convenience intrinsic that makes it easier to refer to the
				coroutine frame.

				.. _coro.end:

				'llvm.coro.end' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare void @llvm.coro.end(i8* <handle>, i1 <unwind>)

				majnemer: In other places, you call this <handle>. I'd recommend you make the two variants agree.

				Overview:
				"""""""""

				The '``llvm.coro.end``' marks the point where execution of the resume part of
				the coroutine should end and control returns back to the caller.


				Arguments:
				""""""""""

				The first argument should refer to the coroutine handle of the enclosing coroutine.

				The second argument should be `true` if this coro.end is in the block that is
				part of the unwind sequence leaving the coroutine body due to an exception raised
				before the coroutine first reaches any suspend point, and `false` otherwise.

				Semantics:
				""""""""""
				The `coro.end`_ intrinsic is a no-op during an initial invocation of the
				coroutine. When the coroutine resumes, the intrinsic marks the point where the
				coroutine needs to return control back to the caller.

				This intrinsic is removed by the CoroSplit pass when a coroutine is split into
				the start, resume and destroy parts. In the start part, the intrinsic is removed;
				in the resume and destroy parts, it is replaced with `ret void` instructions and
				the rest of the block containing the `coro.end` instruction is discarded.

				In landing pads it is replaced with an appropriate instruction to unwind to
				the caller.

				A frontend is allowed to supply null as the first parameter; in this case the
				`coro-early` pass will replace the null with an appropriate coroutine handle
				value.

				.. _coro.suspend:
				.. _suspend points:

				'llvm.coro.suspend' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8 @llvm.coro.suspend(token <save>, i1 <final>)

				Overview:
				"""""""""

				The '``llvm.coro.suspend``' marks the point where execution of the coroutine
				needs to be suspended and control returned back to the caller.
				Conditional branches consuming the result of this intrinsic lead to basic blocks
				where the coroutine should proceed when suspended (-1), resumed (0) or destroyed
				(1).

				Arguments:
				""""""""""

				The first argument refers to a token of `coro.save` intrinsic that marks the
				point when coroutine state is prepared for suspension. If `none` token is passed,
				the intrinsic behaves as if there were a `coro.save` immediately preceding
				the `coro.suspend` intrinsic.

				The second argument indicates whether this suspension point is `final`_.
				The second argument only accepts constants. If more than one suspend point is
				designated as final, the resume and destroy branches should lead to the same
				basic blocks.

				Example (normal suspend point):
				"""""""""""""""""""""""""""""""

				.. code-block:: llvm

				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]

				Example (final suspend point):
				""""""""""""""""""""""""""""""

				.. code-block:: llvm

				while.end:
				%s.final = call i8 @llvm.coro.suspend(token none, i1 true)
				switch i8 %s.final, label %suspend [i8 0, label %trap
				i8 1, label %cleanup]
				trap:
				call void @llvm.trap()
				unreachable

				Semantics:
				""""""""""

				If a coroutine that was suspended at the suspend point marked by this intrinsic
				is resumed via `coro.resume`_, control will transfer to the basic block
				of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
				basic block indicated by the 1-case. To suspend, the coroutine proceeds to the
				default label.

				If the suspend intrinsic is marked as final, the optimizer can consider the resume
				branch (the 0-case) unreachable and can perform optimizations that take advantage
				of that fact.
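
The three-way dispatch on the result of `coro.suspend` can be summarized in a few lines. The following Python sketch is only a model: the constant names and the functions `after_suspend` and `after_final_suspend` are hypothetical, and the returned strings stand for the basic-block labels used in the examples above.

```python
SUSPEND, RESUME, DESTROY = -1, 0, 1


def after_suspend(result):
    # Mirrors: switch i8 %0, label %suspend [i8 0, label %resume
    #                                        i8 1, label %cleanup]
    if result == RESUME:
        return "resume"
    if result == DESTROY:
        return "cleanup"
    return "suspend"  # default label: return control to the caller


def after_final_suspend(result):
    # At a final suspend point the resume case is never legitimately taken;
    # the final-suspend example routes it to a trap, modeled as an exception.
    if result == RESUME:
        raise RuntimeError("trap: resumed at the final suspend point")
    return after_suspend(result)
```

The default label is taken for any value other than 0 and 1, which is why the suspended case needs no explicit entry in the switch.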

				.. _coro.save:

				'llvm.coro.save' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare token @llvm.coro.save(i8* <handle>)

				Overview:
				"""""""""

				The '``llvm.coro.save``' marks the point where a coroutine needs to update its
				state so that it can be considered suspended (and thus eligible
				for resumption).

				Arguments:
				""""""""""

				The argument points to the coroutine handle of the enclosing coroutine.

				Semantics:
				""""""""""

				Whatever coroutine state changes are required to enable resumption of
				the coroutine from the corresponding suspend point should be done at the point
				of `coro.save` intrinsic.

				Example:
				""""""""

				Separate save and suspend points are necessary when a coroutine is used to
				represent an asynchronous control flow driven by callbacks representing
				completions of asynchronous operations.

				In such a case, a coroutine should be ready for resumption prior to a call to
				`async_op` function that may trigger resumption of a coroutine from the same or
				a different thread possibly prior to `async_op` call returning control back
				to the coroutine:

				.. code-block:: llvm

				  %save1 = call token @llvm.coro.save(i8* %hdl)
				  call void @async_op1(i8* %hdl)
				  %suspend1 = call i8 @llvm.coro.suspend(token %save1, i1 false)
				  switch i8 %suspend1, label %suspend [i8 0, label %resume1
				                                       i8 1, label %cleanup]
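
Why the save has to precede the asynchronous call can be illustrated with a toy state machine. This is a Python sketch under assumptions: the `Coro` class, its `resume_index` field and the `eager_async_op` function are hypothetical, and the model is single-threaded, so it only covers the case where the operation completes before it returns.

```python
class Coro:
    """Hypothetical state machine with one asynchronous suspend point."""

    def __init__(self):
        self.resume_index = 0   # which resume point a resumption should enter
        self.trace = []

    def start(self, async_op):
        self.trace.append("before async_op")
        self.resume_index = 1          # models coro.save: state is ready
        async_op(self.resume)          # may call resume before returning
        self.trace.append("suspend")   # models coro.suspend, default label

    def resume(self):
        if self.resume_index != 1:
            raise RuntimeError("resumed before state was saved")
        self.trace.append("resume1")


def eager_async_op(callback):
    # Completes immediately and resumes the coroutine from inside the call.
    callback()
```

Because the resume index is written before `async_op` runs, the early resumption enters the right resume point; if the update were deferred until the suspend itself, the same callback would find the coroutine in a stale state.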

				.. _coro.param:

				'llvm.coro.param' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)

				Overview:
				"""""""""

				The '``llvm.coro.param``' is used by the frontend to mark up the code used to
				construct and destruct copies of the parameters. If the optimizer discovers that
				a particular parameter copy is not used after any suspends, it can remove the
				construction and destruction of the copy by replacing corresponding coro.param
				with `i1 false` and replacing any use of the `copy` with the `original`.

				Arguments:
				""""""""""

				The first argument points to an `alloca` storing the value of a parameter to a
				coroutine.

				The second argument points to an `alloca` storing the value of the copy of that
				parameter.

				Semantics:
				""""""""""

				The optimizer is free to always replace this intrinsic with `i1 true`.

				The optimizer is also allowed to replace it with `i1 false` provided that the
				parameter copy is only used prior to control flow reaching any of the suspend
				points. The code that would be DCE'd if the `coro.param` is replaced with
				`i1 false` is not considered to be a use of the parameter copy.

				The frontend can emit this intrinsic if its language rules allow for this
				optimization.

				Example:
				""""""""
				Consider the following example. A coroutine takes two parameters `a` and `b`
				of a type that has a destructor and a move constructor.

				.. code-block:: C++

				  struct A { ~A(); A(A&&); bool foo(); void bar(); };

				  task<int> f(A a, A b) {
				    if (a.foo())
				      co_return 42; // a coroutine returns a value with co_return

				    a.bar();
				    co_await read_async(); // introduces suspend point
				    b.bar();
				  }

				Note that `b` is used after a suspend point and thus must be copied
				into the coroutine frame, whereas `a` does not have to be, since it is never used
				after a suspend.

				A frontend can create parameter copies for `a` and `b` as follows:

				.. code-block:: C++

				  task<int> f(A a', A b') {
				    a = alloca A;
				    b = alloca A;
				    // move parameters to their copies
				    if (coro.param(a', a)) A::A(a, A&& a');
				    if (coro.param(b', b)) A::A(b, A&& b');
				    ...
				    // destroy parameter copies
				    if (coro.param(a', a)) A::~A(a);
				    if (coro.param(b', b)) A::~A(b);
				  }

				The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
				of `a` with `a'`, since it is not used after suspend.

				The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
				after suspend and therefore, it has to reside in the coroutine frame.
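
The decision described here boils down to whether a parameter is used after some suspend point. A minimal Python sketch of that check follows; it assumes a single straight-line path through the coroutine body, whereas a real optimizer would have to reason over the whole control-flow graph. The function name `needs_copy` and the event encoding are hypothetical.

```python
def needs_copy(events, param):
    # events: ordered list of ("use", name) or ("suspend", None) entries for
    # one straight-line path through the coroutine body.
    seen_suspend = False
    for kind, name in events:
        if kind == "suspend":
            seen_suspend = True
        elif kind == "use" and name == param and seen_suspend:
            return True    # coro.param must become `i1 true`
    return False           # coro.param may become `i1 false`


# The body of f above: a.foo(), a.bar(), co_await, b.bar().
BODY_OF_F = [("use", "a"), ("use", "a"), ("suspend", None), ("use", "b")]
```

For the body of `f`, the check reports that `b` needs a copy and `a` does not, matching the replacement of `coro.param(b', b)` with `i1 true` and of `coro.param(a', a)` with `i1 false`.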

				Coroutine Transformation Passes
				===============================
				CoroEarly
				---------
				The pass CoroEarly lowers coroutine intrinsics that hide the details of the
				structure of the coroutine frame but otherwise do not need to be preserved to
				help later coroutine passes. This pass lowers the `coro.frame`_, `coro.done`_,
				`coro.promise`_ and `coro.from.promise`_ intrinsics.

				.. _CoroSplit:

				CoroSplit
				---------
				The pass CoroSplit builds the coroutine frame and outlines the resume and destroy parts
				into separate functions.

				CoroElide
				---------
				The pass CoroElide examines whether the inlined coroutine is eligible for the heap
				allocation elision optimization. If so, it replaces the `coro.alloc` and
				`coro.begin` intrinsics with the address of a coroutine frame placed on its caller
				and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
				This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
				calls to resume and destroy functions for a particular coroutine where possible.

				CoroCleanup
				-----------
				This pass runs late to lower all coroutine related intrinsics not replaced by
				earlier passes.

				Upstreaming sequence (rough plan)
				=================================
				#. Add documentation. <= we are here
				#. Add coroutine intrinsics.
				#. Add empty coroutine passes.
				#. Add coroutine devirtualization + tests.
				#. Add CGSCC restart trigger + tests.
				#. Add coroutine heap elision + tests.
				#. Add custom allocation heap elision + tests.
				#. Add coroutine splitting logic + tests.
				#. Add simple coroutine frame builder + tests.
				#. Add the rest of the logic + tests. (Maybe split further as needed).

				Areas Requiring Attention
				=========================
				#. A coroutine frame is bigger than it could be. Adding stack packing and
				stack-coloring-like optimizations on the coroutine frame will result in tighter
				coroutine frames.

				#. Take advantage of the lifetime intrinsics for the data that goes into the
				coroutine frame. Leave lifetime intrinsics as is for the data that stays in
				allocas.

				#. The CoroElide optimization pass relies on the coroutine ramp function being
				inlined. It would be beneficial to split the ramp function further to
				increase the chance that it will get inlined into its caller.

				#. Design a convention that would make it possible to apply coroutine heap
				elision optimization across ABI boundaries.

				#. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).

				#. Alignment is ignored by coro.begin and coro.free intrinsics.

				#. Make required changes to make sure that coroutine optimizations work with
				LTO.

				#. More tests, more tests, more tests

docs/index.rst

(260 lines not shown)

.. toctree::
InAlloca
BigEndianNEON
CoverageMappingFormat
Statepoints
MergeFunctions
TypeMetadata
FaultMaps
MIRLangRef
Coroutines

:doc:`WritingAnLLVMPass`
Information on how to write LLVM transformations and analyses.

:doc:`WritingAnLLVMBackend`
Information on how to write LLVM backends for machine targets.

:doc:`CodeGenerator`

(96 lines not shown)

:doc:`InAlloca`
Description of the ``inalloca`` argument attribute.

:doc:`FaultMaps`
LLVM support for folding control flow into faulting machine instructions.

:doc:`CompileCudaWithLLVM`
LLVM support for CUDA.

:doc:`Coroutines`
LLVM support for coroutines.

Development Process Documentation
=================================

Information about LLVM's development process.

.. toctree::
:hidden:

(108 lines not shown)