This is an archive of the discontinued LLVM Phabricator instance.

[coroutines] Part 1 of N: Documentation
ClosedPublic

Authored by GorNishanov on Jul 20 2016, 4:08 PM.

Details

Summary

This is the first patch in the coroutine series. It contains the documentation for the coroutine intrinsics and their usage.

Upstreaming sequence (rough plan)

#. Add documentation. <= we are here
#. Add coroutine intrinsics.
#. Add empty coroutine passes.
#. Add coroutine devirtualization + tests.
#. Add CGSCC restart trigger + tests.
#. Add coroutine heap elision + tests.
#. Add the rest of the logic (split into more patches)

Diff Detail

Event Timeline

GorNishanov retitled this revision from to [coroutines] Part 1 of N: Documentation.
GorNishanov updated this object.
GorNishanov added a reviewer: pitrou.
GorNishanov added a subscriber: llvm-commits.

Out of curiosity, what lead you to use tokens for coro.save and coro.suspend?

docs/Coroutines.rst
659

Why not return i8* ? Bitcasting the i8* seems nicer unless we expect the result to be in another LLVM address space.

719–720

Shouldn't this be llvm.coro.size.i32 and llvm.coro.size.i64?

909

In other places, you call this <handle>. I'd recommend you make the two variants agree.

GorNishanov marked 2 inline comments as done.

Fixed issues identified by David:

  • added .i32 and .i64 to coro.size intrinsic names.
  • make parameters to all intrinsics unformly named as <name>.

Fixed typo: s/i32/i64 in i64 coro.size.i32

Out of curiosity, what lead you to use tokens for coro.save and coro.suspend?

I was looking for other examples where two intrinsics or instructions are linked together. They used tokens as means to link them, so I chose this model.

For example, catchpad - catchret pair, or gc_statepoint - gc_result.

Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

docs/Coroutines.rst
659

If I change the intrinsic to

declare i8* @llvm.coro.promise.p0<type>(i8* <handle>)

It will no longer be considered overloaded, and getIntrinsicID() will return 0=not_intrinsic for it.

Out of curiosity, what lead you to use tokens for coro.save and coro.suspend?

I was looking for other examples where two intrinsics or instructions are linked together. They used tokens as means to link them, so I chose this model.

For example, catchpad - catchret pair, or gc_statepoint - gc_result.

Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

OK, so it is critical that you be able to find the coro.suspend with its associated coro.save?

docs/Coroutines.rst
660

I was suggesting:

declare i8* @llvm.coro.promise(i8* <handle>)

Out of curiosity, what lead you to use tokens for coro.save and coro.suspend?

[snip] Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

OK, so it is critical that you be able to find the coro.suspend with its associated coro.save?

Yes, for example, given a coro.suspend describing a suspend/resume point 5, I need to find its coro save and replace it with:

%index.addr = gep %hdl <index-field>
store 5, index.addr

The motivating example why a suspend point is split into two instructions, save and suspend is mentioned in the description of coro.save intrinsic.

%save1 = call token @llvm.coro.save(i8* %hdl)
call void async_op1(i8* %hdl)
%suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
switch i8 %suspend1, label %suspend [i8 0, label %resume1
                                     i8 1, label %cleanup]

It will get lowered to:

%index.addr = gep %hdl <index-field>
store 1, index.addr 
call void async_op1(i8* %hdl)
ret void

Preparing coroutine for suspension, (just saving the current suspend point index in the coroutine frame), allows the coroutine to be resumed by async_op from the current or a different thread.

If we did not have coro.save (and were saving the current suspend point index at the point of coro.suspend), then, the store to the index will race with possible resumption and chaos and mayhem will ensue. :-)

docs/Coroutines.rst
660

The coro.promise / from.promise intrinsics hide the structure of the coroutine frame. To figure out the offset from where the coroutine handle points to where promise lives, LLVM needs to know its type (or more precisely alignment).

Let's say we are on the architecture with 4 byte pointers, but, the promise requires 16 byte alignment. In this case, the coroutine frame might be:

0000          0004           008                 0016
[ResumeFnPtr] [DestroyFnPtr] less aligned things [promise]
                             or padding

Given that we don't really care about the type and only need to know promise alignment, I take your suggestion and counter offer the following as a replacement for both coro.promise and coro.from.promise intrinsics.

decl i8* @llvm.coro.promise(i8* <handle>, i32 <promise-alignment>, i1 <from>)

old model                            new model
---------                            ---------
@llvm.coro.promise.p0i32(%hdl)       @llvm.coro.promise(%hdl, 4, false)
@llvm.coro.from.promise.p0i32(%hdl)  @llvm.coro.promise(%hdl, 4, true)

Better now?

Out of curiosity, what lead you to use tokens for coro.save and coro.suspend?

[snip] Besides, the only purpose of return value of coro.save is to give it to matching coro.suspend. It does not have any other meaning, besides providing a facility of linking those two intrinsics together.

OK, so it is critical that you be able to find the coro.suspend with its associated coro.save?

Yes, for example, given a coro.suspend describing a suspend/resume point 5, I need to find its coro save and replace it with:

%index.addr = gep %hdl <index-field>
store 5, index.addr

The motivating example why a suspend point is split into two instructions, save and suspend is mentioned in the description of coro.save intrinsic.

%save1 = call token @llvm.coro.save(i8* %hdl)
call void async_op1(i8* %hdl)
%suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
switch i8 %suspend1, label %suspend [i8 0, label %resume1
                                     i8 1, label %cleanup]

It will get lowered to:

%index.addr = gep %hdl <index-field>
store 1, index.addr 
call void async_op1(i8* %hdl)
ret void

Preparing coroutine for suspension, (just saving the current suspend point index in the coroutine frame), allows the coroutine to be resumed by async_op from the current or a different thread.

If we did not have coro.save (and were saving the current suspend point index at the point of coro.suspend), then, the store to the index will race with possible resumption and chaos and mayhem will ensue. :-)

OK, sounds like a great use of token!

docs/Coroutines.rst
660

Looks great.

  1. Collapsed coro.promise and coro.from.promise to a single intrinsic:

    declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
  1. Added an example to a section describing coro.promise intrinsic.
GorNishanov marked 5 inline comments as done.Jul 21 2016, 4:30 PM

David:

I don't have commit rights (yet). If is LGTM to you, can you submit the patch when appropriate?

This comment was removed by dberris.

Some questions (not really blockers but I think is worth addressing somewhere):

  • Are the coroutines guaranteed to "leak" if they are never run to full completion?
  • Is there a way to explicitly "clone" the state of a coroutine so that it can be restored to a previous state? The use case is for running the same suspended coroutine in different threads, perhaps because there's some non-determinism involved.
  • Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

I don't know if these should be documented here or somewhere else (and if they're been asked before then apologies for being late to the party).

Some questions (not really blockers but I think is worth addressing somewhere):

  • Are the coroutines guaranteed to "leak" if they are never run to full completion?

A suspended coroutine can be always destroyed by a call to coro.destroy. Note that in the Coroutines.rst, I always say: "runs to completion or destroyed via call to the coro.destroy. Of course, it is the frontend responsibility to correctly setup resume and destroy control flow for every coro.suspend.

  • Is there a way to explicitly "clone" the state of a coroutine so that it can be restored to a previous state? The use case is for running the same suspended coroutine in different threads, perhaps because there's some non-determinism involved.

I don't know how to deterministically clone the coroutine state at LLVM level. Consider:

generator<Instruction*> foo() {
   SmallVector<Instruction*, 8> WorkList;
   fillTheWorkList(WorkList);
   for (auto It = WorkList.begin(), E = WorkList.end(); It != E; ++It)
      co_yield *It;
}

To be able to clone this coroutine, LLVM needs to understand when iterator It and E point to the coroutine frame itself (when WorkList contains less than 8 instructions), in this case, you would need to adjust It and E to point at the new memory. If WorkList contains more than 8 instructions, to clone, you would need to know what allocator to use to clone the WorkList for your cloned coroutine.

  • Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

This is very exciting topic! We can do that review when we will be looking at CoroutineFrameBuilder.cpp patch. Or, if you think it is a good idea, before even getting to the patch, I can open a discussion on llvm-devs on the algorithm itself. Let me know what do you think works best.

I don't know if these should be documented here or somewhere else (and if they're been asked before then apologies for being late to the party).

I think I may add a Q&A section at the end of Coroutines.rst to address frequently asked questions. (Not necessarily as a part of this patch).

GorNishanov added a comment.EditedJul 22 2016, 8:59 AM

I think I was too cryptic in my repsonse to your second question. I can provide brief answers to your points now without deferring for later reviews or discussions on llvm-dev.

  • Will the escape analysis figure out whether there are resources that need cleaning up once execution has flowed through resume points? Say for example, there's a local variable in the preamble that never gets touched again in the resume points -- will the analysis be able to see that there aren't references anymore in the resume points? i.e. is it possible to do just in lowering, or do you need to teach some of the other optimisations about what the final layout would be for coroutines to be able to make these kinds of optimisations?

The former. All of the coroutine logic is fully contained in the coroutine passes. This was one of the design goals for this feature to make coroutine handling completely segregated in those passes. No other passes should care about coroutines.

The idea is that a coroutine is a normal function with some intrinsics, it travels through the pipelines and is optimized just like a normal function.

At the end of the IPO pipeline, we split the coroutine into state + function manipulating the state and add those functions to SCC and restart the pipeline, so that now we optimize individual functions driving the state machine.

Figuring out which objects need to go to the coroutine frame and which can stay on the stack, is done as a part of the coroutine splitting pass (where all of the meat is, other three coroutine passes are tiny and very simple)

majnemer accepted this revision.Jul 22 2016, 9:08 PM
majnemer edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jul 22 2016, 9:08 PM
This revision was automatically updated to reflect the committed changes.