This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Introduce `always_complete_coroutine` attribute to guide optimization (1/2)
Needs ReviewPublic

Authored by ChuanqiXu on Sep 15 2022, 8:06 PM.

Details

Summary

Prepare for https://reviews.llvm.org/D134010.

From the perspective of the middle end, the attribute teaches the compiler how to reduce the control flow graph which was impossible.

Currently the optimization will reduce the size of the destroy function since we know the destroy function will only be called after the coroutine completes.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Sep 15 2022, 8:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 8:06 PM
ChuanqiXu requested review of this revision.Sep 15 2022, 8:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 8:06 PM
ChuanqiXu edited the summary of this revision. (Show Details)Sep 15 2022, 8:20 PM
ChuanqiXu added reviewers: rjmccall, ychen.
ChuanqiXu set the repository for this revision to rG LLVM Github Monorepo.
ChuanqiXu updated this revision to Diff 460620.Sep 15 2022, 8:27 PM

Add the new line

nikic added a subscriber: nikic.Sep 16 2022, 12:46 AM

Can you please specify the semantics of the new attribute in LangRef?

ChuanqiXu updated this revision to Diff 460677.Sep 16 2022, 2:00 AM

Address comments.

Can you please specify the semantics of the new attribute in LangRef?

Oh, sorry I forgot due to I thought this in the frontend side.

high-level design comment (I didn't look at the code; wouldn't understand it anyway):
I think there is another important case to cover, besides "a coroutine can only be destroyed after it ran to its final suspension point":
"A coroutine can only be destroy after it ran to its final suspension point *or* before it executed until after its initial suspension point"

Assuming an execution model similar to https://github.com/lewissbaker/cppcoro, i.e. coroutines are always constructed in suspended state, and there is a when_all function to combine multiple coroutine tasks, I tend to write code like

lazy<int> foo(std::unique_ptr<int>);
lazy<int> bar(std::unique_ptr<float>);

lazy<int> foobar() {
     lazy<int> coro1 = foo(make_unique<int>(12));
     lazy<int> coro2 = bar(make_unique<float>(1.2));
     auto [x,y] = co_await when_all(std::move(coro1), std::move(coro2));
     return x + y;
}

In this case, coro1 and coro2 can be destroyed after both of them finished execution (i.e. after the when_all returned).
In addition, coro1 could be destroyed before it is ever co_awaited on, in case bar throws.
However, if coro1/coro2 ever left their initial suspend point, i.e. if they were ever co_awaited on, we know for sure that they won't be destroyed while suspended at one of the intermediate suspension points.
I guess this usage pattern is rather common? If so, I think we should design this attribute, such that it also accomodates that use case...

high-level design comment (I didn't look at the code; wouldn't understand it anyway):
I think there is another important case to cover, besides "a coroutine can only be destroyed after it ran to its final suspension point":
"A coroutine can only be destroy after it ran to its final suspension point *or* before it executed until after its initial suspension point"

Assuming an execution model similar to https://github.com/lewissbaker/cppcoro, i.e. coroutines are always constructed in suspended state, and there is a when_all function to combine multiple coroutine tasks, I tend to write code like

lazy<int> foo(std::unique_ptr<int>);
lazy<int> bar(std::unique_ptr<float>);

lazy<int> foobar() {
     lazy<int> coro1 = foo(make_unique<int>(12));
     lazy<int> coro2 = bar(make_unique<float>(1.2));
     auto [x,y] = co_await when_all(std::move(coro1), std::move(coro2));
     return x + y;
}

In this case, coro1 and coro2 can be destroyed after both of them finished execution (i.e. after the when_all returned).
In addition, coro1 could be destroyed before it is ever co_awaited on, in case bar throws.
However, if coro1/coro2 ever left their initial suspend point, i.e. if they were ever co_awaited on, we know for sure that they won't be destroyed while suspended at one of the intermediate suspension points.
I guess this usage pattern is rather common? If so, I think we should design this attribute, such that it also accomodates that use case...

Yeah, it is about the design in the language side. I got your point. Generally, if we relax the restriction more, we can do less optimizations. If we want to use the attribute in the example case, then we can't optimize the destroy functions.
But I don't think we need to make the decision now. Since we can add more than one attribute! I mean we can add that attribute later if we found this is really useful. And it looks like not a blocking reason to me.

Agree, not a blocker

avogelsgesang added inline comments.Sep 19 2022, 6:51 AM
llvm/docs/Coroutines.rst
1732 ↗(On Diff #460677)

I am a bit surprised that we introduce a new IR-attribute for this.

Can't we reuse the existing branching structure of the suspension points?
I could imagine that we could use the switch after each suspend to encode whether the coroutine can be destroyed at that suspension point. If we point the switch for i8 1, i.e. the cleanup control flow edge, to an unreachable basic block, the coroutine splitting should not optimize out the corresponding code from the destroy function already, doesn't it?

I imagine something like

  // For this suspension point, we want to optimize under the assumption that the
  // coroutine will not be destroyed while suspended.
  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
  // Hence we let the "cleanup" result (`i8 1`) branch to an "unreachable" block
  switch i8 %0, label %suspend [i8 0, label %continue
                                i8 1, label %unreachable_bb]
continue:
  // After the resumption point, execution continues here...
  // Let's assume we suspend a 2nd time. 
  %1 = call i8 @llvm.coro.suspend(token none, i1 false)
  // At this suspension point, we allow the coroutine to be destructed.
  switch i8 %1, label %suspend [i8 0, label %continue2
                                i8 1, label %cleanup]
continue2:
  // Some other stuff happens here...
cleanup:
  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
  call void @free(i8* %mem)
  br label %suspend
suspend:
  %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
  ret i8* %hdl
unreachable_bb:
  unreachable
}

To prove the viability of this alternative approach, I copied the LLVM code for https://github.com/llvm/llvm-project/issues/56980 from https://godbolt.org/z/P84MPzq4q and manually replaced the await.cleanup, await2.cleanup ..., await7.cleanup basic blocks by "unreachable". You can find the changed LLVM code in https://godbolt.org/z/n93TTj8vo. As you can see, the generated Foo() [clone .destroy] function does not contain any code for the GlobalSetter destructors

I think re-using the switch control flow is preferable over introducing a new attribute because:

  1. it allows for more fine-grained control: we can control for each suspension point independently whether the coroutine can be destroyed while suspended at this suspension point
  2. it reuses an existing concept instead of introducing a new attribute, thereby keeping LLVM's code simpler
ChuanqiXu added inline comments.Sep 19 2022, 7:12 PM
llvm/docs/Coroutines.rst
1732 ↗(On Diff #460677)

Yeah, it is a good idea if we only want to reduce the destroy function. However, if we look at the following explanation for the control flow information, we can't infer such information any more by the trick. And the implementation of the patch is not complex indeed, if you look at the code actually, you'll find the meaningful change 5 lines of code in CoroSplit.cpp.

Also I feel like the attribute is a general property, it may be possible for other languages to use it. So I guess it might not be a big deal to introduce it.

avogelsgesang added inline comments.Sep 20 2022, 3:46 AM
llvm/docs/Coroutines.rst
1732 ↗(On Diff #460677)

However, if we look at the following explanation for the control flow information, we can't infer such information any more by the trick.

Why not? From my understanding, given the unreachable, we should also be able to "envision the `ret in bb{i} (i != n) as branch instruction to the next block bb{i+1}` with a unknown function call".

Yes, currently the control flow analysis does not yet treat the unreachable in a way such that it uses this additional control flow information, yet. But from what I can tell, also the always_complete_coroutine does not use the additional control flow, either. And I don't see a technical reason why inferring the additional control flow information from the always_complete_coroutine would be harder than inferring it from the unreachable. Or am I missing something?

Also I feel like the attribute is a general property, it may be possible for other languages to use it

Not sure I can follow. Afaict, the unreachable could just as well be used by other language frontends?

ChuanqiXu abandoned this revision.Oct 16 2022, 8:00 PM
This comment was removed by ChuanqiXu.
ChuanqiXu reclaimed this revision.Oct 16 2022, 8:01 PM

Mis-operations