This is an archive of the discontinued LLVM Phabricator instance.

[C++] [Coroutines] Introduce `coro_always_done` attribute to help optimizations (2/2)
Needs ReviewPublic

Authored by ChuanqiXu on Sep 15 2022, 8:08 PM.

Details

Summary

Closing https://github.com/llvm/llvm-project/issues/56980

Coroutines are split into pieces, which complicates the control flow graph, so the compiler must be conservative to avoid miscompilation. However, in some cases the programmer understands the behavior of their coroutines but has no way to pass that information to the compiler. This patch addresses that.

Currently, the only implemented optimization reduces the size of the destroy function. I have other ideas, but they still need to be proven. Even so, I think this is already worthwhile: for an internal, coroutine-heavy example, the code size was reduced by 4% after I applied the attribute.
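
As a rough illustration of the intended usage (a sketch only; `Task` and `Sleep` are hypothetical types, not part of this patch), the attribute is spelled on the coroutine return type and promises that such coroutines always run to completion, so the destroy function no longer needs the cleanup code for the intermediate suspend points:

C++
#include <coroutine>
#include <string>
#include <vector>

// Trivial awaitable; a real library type would actually suspend.
struct Sleep {
    bool await_ready() const noexcept { return true; }
    void await_suspend(std::coroutine_handle<>) noexcept {}
    void await_resume() const noexcept {}
};

// The attribute goes on the coroutine return type.
struct [[clang::coro_always_done]] Task {
    struct promise_type {
        Task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        // noexcept matters here: an escaping exception could otherwise prevent
        // the coroutine from reaching its final suspend point.
        void unhandled_exception() noexcept {}
    };
};

Task demo() {
    std::vector<int> big(1024);
    std::string name = "example";
    co_await Sleep{}; // `big` and `name` live across this suspend point; without
    co_await Sleep{}; // the attribute, destroy() must be able to run their
                      // destructors when the coroutine is destroyed while suspended here.
}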

Diff Detail

Event Timeline

ChuanqiXu created this revision.Sep 15 2022, 8:08 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: jdoerfert. · View Herald Transcript
ChuanqiXu requested review of this revision.Sep 15 2022, 8:08 PM
ChuanqiXu edited the summary of this revision. (Show Details)Sep 15 2022, 8:16 PM
ChuanqiXu added reviewers: erichkeane, ychen, rjmccall.
ChuanqiXu updated this revision to Diff 460685.Sep 16 2022, 2:34 AM

Add description with `__has_attribute(coro_always_done)`.

I only vaguely understand what this is supposed to do; please see if you can take another run at the AttrDocs.td changes to make it clearer from the perspective of a user.

Source changes are otherwise minor, but I'm still on the fence as to whether there is value here, or whether this is something the optimizer should be able to figure out on its own.

clang/docs/ReleaseNotes.rst
251
clang/include/clang/Basic/AttrDocs.td
6720

I don't get what this sentence means. Could you try again? I only barely understand the rest of this first section; otherwise I'd try to suggest an edit.

This documentation should also explain what the programmer is promising the compiler here.

I only vaguely understand what this is supposed to do; please see if you can take another run at the AttrDocs.td changes to make it clearer from the perspective of a user.

Yeah, I'll try to refactor later.

Source changes are otherwise minor, but I'm still on the fence as to whether there is value here, or whether this is something the optimizer should be able to figure out on its own.

Let me answer the value question first. The attribute helps with two optimizations (that I know of now; there may be more): one is implemented in the previous patch, and the other is a bit harder to implement. I'm fine with implementing both if you want, although I feel the first one alone is good enough, since the frontend changes are fairly trivial.

The first optimization is that we can shrink the destroy function, just like the example shows. I'll try to explain/refactor it more if it is not clear enough.

The second optimization is about coroutine elision, or HALO, which eliminates the dynamic allocation for coroutines. In case you're not familiar with it: dynamic allocation is in principle necessary for coroutines, and the compiler has an optimization called coroutine elision that tries to eliminate it. For example:

C++
Lazy<int> foo(); // Lazy is a classic coroutine task type.
void bar() {
     auto f = foo();
} // The coroutine frame for f will be destroyed automatically by the destructor of `Lazy`

In this example, the compiler can optimize out the allocation since the lifetime of the Lazy f is pretty simple: it is created after bar() starts and destroyed before bar() ends, so the coroutine frame of f can safely become a local variable of bar().
However, this simple case is not common in real code (of course). A more realistic and common code pattern looks like:

C++
Lazy<int> foo(); // Lazy is a classic coroutine task type.
Lazy<int> bar() {
     auto f = foo();
     co_await f; // coroutine `bar()` will be suspended, and the executor
                 // will decide when the coroutine `bar()` will be resumed.
     // other stuff
} // The coroutine frame for f will be destroyed automatically by the destructor of `Lazy`

Now, from the perspective of the compiler, it can't know anything about the executor, so it can't infer the lifetime of f. The compiler can only follow the instructions of the language spec: allocate the coroutine frame of f dynamically.

However, the programmer knows the behavior of their program. They know the coroutine bar() must be resumed, but they have no way to tell the compiler. That is where the attribute I introduced comes in: if the coroutine is marked with the new attribute, the compiler gets this information, so the attribute can enable these optimizations.

However, the second optimization is somewhat removed from the end user. It is a bit like how the compiler won't tell you the actual optimizations it performs based on the [[likely]] attribute or assume intrinsics. So I'd like to mention these optimizations only vaguely in the document.

But again, even if there is only the first optimization (shrinking the destroy function), I think it is good enough to be adopted.

ChuanqiXu updated this revision to Diff 461127.Sep 18 2022, 8:49 PM
ChuanqiXu marked an inline comment as done.

Address comments.

ChuanqiXu added a subscriber: avogelsgesang.

@avogelsgesang I guess you're able to review the docs to see if it is clear enough from the perspective of a user.

took a look at the documentation; didn't read the implementation itself

clang/docs/ReleaseNotes.rst
252

can we add a direct link to the attribute documentation here?

clang/include/clang/Basic/AttrDocs.td
6721–6726
6755
6756
6757
6761

I am a bit surprised that we apply [[clang::coro_always_done]] to the coroutine type and not the promise_type. Usually I would expect all things that influence the way the compiler generates code for coroutines to live inside the promise_type.

6776–6778
6790–6791


Thanks for the explanation, I see the justification for it. A lot of that would fit very well in the commit message (and perhaps the docs!). Please do your best to take another run at the documentation to make the intent and functionality much clearer.

ChuanqiXu added inline comments.Sep 19 2022, 7:29 PM
clang/include/clang/Basic/AttrDocs.td
6761

There are two reasons for this:

  • The current style is easier to implement. We track the relationship between a coroutine and its promise_type in the Sema (semantic analysis) part, but we generate the code in the CodeGen part, where that relation is lost. So it is easier to make the decision based on the return type.
  • Also, it wouldn't be hard to make it depend on the promise_type instead. But the other reason is that when people talk about different coroutine behaviors, we generally say "generator coroutine", "task coroutine", or even "cppcoro task coroutine", "folly task coroutine". In other words, we're always talking about the return type, and the promise_type spelling sounds more like a library implementer's tongue. (I won't argue if you say the attribute is mostly for library writers.)

So I think the current choice might not be bad.

avogelsgesang added inline comments.Sep 20 2022, 4:35 AM
clang/include/clang/Basic/AttrDocs.td
6761

when people talk about different coroutine behaviors, we generally say "generator coroutine", "task coroutine" or even "cppcoro task coroutine", "folly task coroutine"

Good point. I wonder, though, if it might make sense to apply this attribute to other return types as well, like std::future. For std::future, a library implementor is able to specialize coroutine_traits accordingly, so they can provide their own promise_type for std::future. In that case, the library writer won't be able to add the [[clang::coro_always_done]] attribute to std::future itself, but only to the promise_type injected by the coroutine_traits specialization.
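
For reference, here is a sketch of the coroutine_traits mechanism being described (simplified; a production version would forward more overloads and handle references): the injected promise_type is the only place such a library author could attach an attribute, because std::future itself cannot be modified.

C++
#include <coroutine>
#include <exception>
#include <future>
#include <utility>

// Program-provided specialization that lets std::future<T> be used as a
// coroutine return type.
template <typename T, typename... Args>
struct std::coroutine_traits<std::future<T>, Args...> {
    struct promise_type { // an attribute could only go here, not on std::future
        std::promise<T> p;
        std::future<T> get_return_object() { return p.get_future(); }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_value(T value) { p.set_value(std::move(value)); }
        void unhandled_exception() { p.set_exception(std::current_exception()); }
    };
};

std::future<int> compute() { co_return 42; }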

6780–6783

can you explain to me why exceptions are a problem here? The documentation for the always_complete_coroutine LLVM-IR attribute in https://reviews.llvm.org/D134009 does not mention this restriction.

Would exceptions still be problematic if we used the unreachable instead of the new LLVM-IR attribute as proposed in https://reviews.llvm.org/D134009#inline-1292827?

avogelsgesang added inline comments.Sep 20 2022, 8:28 AM
clang/include/clang/Basic/AttrDocs.td
6761

Another related thought:

I think instead of the [[clang::coro_always_done]] attribute, we might want to introduce a [[clang::co_await_always_resumes]] attribute. This attribute would apply to the awaitable instead of the coroutine itself.

Proposed semantics

The [[clang::co_await_always_resumes]] attribute can be applied either to a co_await expression, as in

task<int> foo();
task<int> foo2();

task<int> bar() {
   int x = [[clang::co_await_always_resumes]] co_await foo();
   co_return x + [[clang::co_await_always_resumes]] co_await foo2();
}

or directly to an Awaitable class

class [[clang::co_await_always_resumes]] task {...};

If applied to a co_await, the attribute implies that for this co_await suspension point the coroutine will always be resumed. Destroying the coroutine while suspended at this suspension point is undefined behavior.

If applied to a struct/class, all co_awaits waiting on an instance of this struct/class are guaranteed to be resumed.

Given, e.g.,

class [[clang::co_await_always_resumes]] task {...};
class std::lazy {...}; // from the standard library; not annotated with `co_await_always_resumes`

task<int> foo1();
std::lazy<int> foo2();
std::lazy<int> foo3();

task<int> bar() {
   int x = co_await foo1();
   int y = co_await foo2();
   int z = [[clang::co_await_always_resumes]] co_await foo3();
   co_return x + y + z;
}

the coroutine bar

  • cannot be destroyed at suspension point foo1, because the task class is annotated with co_await_always_resumes
  • can be destroyed at suspension point foo2, because std::lazy is not annotated with co_await_always_resumes
  • cannot be destroyed at suspension point foo3, because the co_await is annotated with co_await_always_resumes.

Whether bar can be destroyed while suspended at the initial/final suspension point can be customized by annotating the return type of task::promise_type::{initial_suspend,final_suspend} with co_await_always_resumes.

Underlying mental model

Every time I co_await on something, I am stopping the coroutine and handing over control to the Awaitable. The Awaitable then decides what it wants to do with the std::coroutine_handle and when to resume it, e.g. calling resume after some asynchronous read operation has finished. Given that the Awaitable decides when to call resume, it would be natural to let the Awaitable make the promise "I will always call resume on the received coroutine handle" by adding the co_await_always_resumes attribute.
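
A minimal sketch of that mental model (hypothetical names; post_to_executor stands in for whatever scheduling mechanism the library uses): the awaiter receives the handle in await_suspend() and is the party that promises to resume it, which is exactly the promise the proposed attribute would express.

C++
#include <coroutine>
#include <functional>

// Stand-in for the library's scheduler; here it just runs the callback
// immediately, while a real executor would defer it.
inline void post_to_executor(std::function<void()> fn) { fn(); }

// The proposed (not yet existing) attribute marks an awaiter that always
// resumes the coroutine it suspended.
struct [[clang::co_await_always_resumes]] AlwaysResumes {
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) {
        // The awaiter takes the handle and guarantees to call resume() on it.
        post_to_executor([h] { h.resume(); });
    }
    void await_resume() const noexcept {}
};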

Usage examples

With the co_await_always_resumes attribute, we can decide for each co_await suspension point independently whether the coroutine can be destroyed while suspended at that point. I see this capability as relevant in practice, as shown by the following examples.

Usage example: Asynchronous generators

Let's assume I have an asynchronous generator similar to

namespace io {
  template <typename T>
  class [[clang::co_await_always_resumes]] task { ... };
  class FileHandle {...};
  task<FileHandle> openFile(std::path p);
  task<std::string> readLine(FileHandle f);
}

async_generator<std::string> readLinesFromFiles(std::span<std::path> filePaths) {
     for (auto& filePath : filePaths) {
         auto fileHandle = co_await io::openFile(filePath);
         while (!fileHandle.eof_reached()) {
              std::string line = co_await io::readLine(fileHandle);
              co_yield std::move(line);
         }
     }
}

io::task<int> findLineNumberOfFirstEmptyLine(std::span<std::path> filePaths) {
    int i = 0;
    // The special syntax `for co_await (auto line : lineGenerator)` for async generators
    // was part of https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4775.pdf
    // but was removed in https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0664r8.html#35.
    // Although the syntax is no longer supported directly, I think the concept of async
    // generators still makes sense.
    async_generator<std::string> lineGenerator = readLinesFromFiles(filePaths);
    while (!lineGenerator.done()) {
         auto line = co_await lineGenerator.next();
         if (line.empty()) {
             co_return i;
         }
         ++i;
     }
     co_return i;
}

In this case, while suspended at the co_await openFile or co_await readLine, our coroutine is guaranteed to be resumed eventually. The compiler knows this because the io::task class which we are co_awaiting on is annotated as co_await_always_resumes. Hence, LLVM does not need to generate the corresponding destructor calls in the destroy function of our coroutine.

When suspended at the co_yield, however, the coroutine might be destroyed. In fact, the usage in findLineNumberOfFirstEmptyLine returns from the loop early and does not execute the generator until completion. Correspondingly we would not annotate the Awaitable returned by async_generator::promise_type::yield_value with co_await_always_resumes.

Usage example: Asynchronous tasks which can be destroyed while at the initial suspension point

(Use case identical to https://reviews.llvm.org/D134009#3796543)

Let's assume an execution model similar to https://github.com/lewissbaker/cppcoro, i.e. coroutines are always constructed in a suspended state, and there is a when_all function to combine multiple coroutine tasks. Given the code:

template<typename T>
struct [[clang::co_await_always_resumes]] lazy {
    struct promise_type {
        auto initial_suspend() { return std::suspend_always{}; }
        // [...] rest of the boilerplate code stripped for readability
    };
    bool await_ready() { return false; }
    std::coroutine_handle<> await_suspend(std::coroutine_handle<> hdl) { /* [...] */ }
    T await_resume() { /* [...] */ }
};

lazy<int> foo(std::unique_ptr<int>);
lazy<int> getFileSize(std::path p);
lazy<int> getFileSizeSum(std::vector<std::path> paths) {
    int sum = 0;
    for (auto p : paths) {
       sum += co_await getFileSize(p);
    }
    co_return sum;
}

lazy<int> foobar(std::vector<std::path> paths) {
     lazy<int> coro1 = foo(make_unique<int>(12));
     lazy<int> coro2 = getFileSizeSum(paths);
     auto [x,y] = co_await when_all(std::move(coro1), std::move(coro2));
     co_return x + y;
}

In this case, coro1 and coro2 can be destroyed after both coroutines finished execution (i.e. after the when_all returned). In addition, coro1 could be destroyed before it is ever co_awaited on, e.g. in case the coroutine allocator for getFileSizeSum throws. However, if coro1/coro2 ever get resumed from their initial suspend point, i.e. if they were ever co_awaited on, we know for sure that they won't be destroyed while suspended at one of the intermediate suspension points.

The above code communicates that the coroutine will not be destroyed while suspended at the co_awaits inside the function, by annotating the lazy task type with co_await_always_resumes. At the same time, initial_suspend returns std::suspend_always, which is not annotated with co_await_always_resumes. Hence, the coroutine is allowed to be destroyed at the initial suspension point.

ChuanqiXu added inline comments.Sep 20 2022, 8:27 PM
clang/include/clang/Basic/AttrDocs.td
6761

I wonder, though, if it might make sense to apply this attribute to other return types as well, like std::future. For std::future, a library implementor is able to specialize coroutine_traits accordingly, so they can provide their own promise_type for std::future. In that case, the library writer won't be able to add the [[clang::coro_always_done]] attribute to std::future itself, but only to the promise_type injected by the coroutine_traits specialization.

Yeah, the example looks compelling. I'll try to make the attribute work for promise_type.

I think instead of the [[clang::coro_always_done]] attribute, we might want to introduce a [[clang::co_await_always_resumes]] attribute.

Thanks for the write-up! But I think there are two problems: one about the implementation and one about the language semantics. Let's talk about the semantics first. Say we write:

C++
struct [[clang::co_await_always_resumes]]  awaitable {...};

We're expecting that every time the awaitable gets co_awaited, the compiler knows it will be resumed. However, that can't be true in all cases, since the behavior of co_await awaitable_instance; is also controlled by promise_type::await_transform (if present) of the enclosing coroutine, and the awaitable can't know anything about its enclosing coroutines. So the semantics look a little weird: we would need to double-check promise_type::await_transform to know the actual behavior whenever we see co_await awaitable_type{};.
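
To illustrate the concern (a sketch with hypothetical types, not code from this patch): even if NetworkRead promised "always resumes", the enclosing coroutine's promise_type can intercept it via await_transform and substitute an awaiter with different behavior, so the annotation on the awaitable type alone does not determine what actually happens at the suspend point.

C++
#include <coroutine>

struct NetworkRead { // imagine this type were annotated co_await_always_resumes
    bool await_ready() const noexcept { return true; }
    void await_suspend(std::coroutine_handle<>) noexcept {}
    void await_resume() const noexcept {}
};

struct CancellableRead { // may destroy the coroutine instead of resuming it
    bool await_ready() const noexcept { return true; }
    void await_suspend(std::coroutine_handle<>) noexcept {}
    void await_resume() const noexcept {}
};

struct Task {
    struct promise_type {
        Task get_return_object() { return {}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() noexcept {}
        // The enclosing coroutine decides what `co_await NetworkRead{}` really means.
        CancellableRead await_transform(NetworkRead) { return {}; }
    };
};

Task consumer() {
    co_await NetworkRead{}; // actually awaits a CancellableRead
}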

Another point is about the implementation. I won't say it is impossible to implement, but it'll be much harder. Let me go into some detail.

First of all, from the following LLVM IR we can't infer that the suspend point is guaranteed to be resumed:

%1 = call @llvm.coro.suspend(...)
switch i8 %1, label %suspend [i8 0, label %continue2
                              i8 1, label %unreachable]

The compiler can only get the information that %1 won't be 1 in this case. From the perspective of LLVM IR, we can't know that the coroutine must be resumed from the suspend point. The idea here is that LLVM IR and C++ are two different languages, and LLVM IR has its own semantics. It doesn't make sense to require LLVM IR to behave as if its semantics had changed without actually changing them. (BTW, to be honest, we did do this for implementing symmetric transfer. It was not a good idea, but we had no choice at the time, since symmetric transfer is required by the standard and we had chosen the two-phase model. But let's go in the right direction from now on.)

So in the end, we must add something to the call @llvm.coro.suspend(...) expression above to inform the middle end about the semantic change. However, the @llvm.coro.suspend intrinsics are removed after the coroutine is split. The resume function will look like the examples I wrote in D134009:

%coro_index = load i64, ptr %frame.index_addr
switch i64 %coro_index, label %unreachable [i64 0, label %bb0
                                            i64 1, label %bb1
                                            i64 2, label %bb2
                                            ...
                                            i64 n, label bb_n]

bb0:
  ...
  %coro_next_index = add i64 %coro_index, 1
  store i64 %coro_next_index, ptr %frame.index_addr
  ret

bb1:
  ...
  %coro_next_index = add i64 %coro_index, 1
  store i64 %coro_next_index, ptr %frame.index_addr
  ret

Again, the compiler can't get the information that there is a control flow from bb_i to bb_{i+1} unless we add something else, which would make it much more complex.

So it would be much harder to implement, and I can see less benefit. I think it is sufficient to add [[clang::coro_always_done]] rather than the attribute you proposed. It is much easier to implement and useful in practice. On the one side, as the enclosing coroutine, we can be as sure as possible about the behavior (via promise_type::await_transform). On the other side, in case we want to co_await an awaiter which may not always resume, we have many options.

For example, if we're going to await a network IO operation and we can't be sure we'll get the result, we can write:

Task foo(...) { // Task is ``coro_always_done``
   co_await when_any(IO_Awaiter{...}, TimeoutAwaiter5s);
   // ... handling the results.
}

In this case, IO_Awaiter and TimeoutAwaiter5s may not be coro_always_done, depending on the implementation details of when_any, but we can still be sure that foo() will eventually complete. And there are many other ways to design the interfaces in such cases. So I think the current style is sufficient.

6780–6783

Would exceptions still be problematic if we used the unreachable instead of the new LLVM-IR attribute as proposed in https://reviews.llvm.org/D134009#inline-1292827?

Short answer: yes, there would still be problems.

can you explain to me why exceptions are a problem here? The documentation for the always_complete_coroutine LLVM-IR attribute in https://reviews.llvm.org/D134009 does not mention this restriction.

Yes. The reason D134009 doesn't mention the restriction is that LLVM IR and C++ are two different languages, so they have different semantics.

The reason is a little complex. Let's see if I can explain it clearly.

First things first: in fact, exceptions are not a problem most of the time. For example, the following is always fine:

template<typename T>
struct [[clang::coro_always_done]] task { 
    struct promise_type {
        void unhandled_exception() noexcept {...}
        ...
    };
};

As long as unhandled_exception() doesn't throw, the paragraph we're discussing is not a problem. In other words, the current paragraph says that if unhandled_exception() throws and the coroutine function is coro_always_done, the behavior is UB, although the standard requires coroutine_handle<>::done() to return true here. The reasoning behind this is a little complex; let me explain it from the perspective of C++ instead of LLVM IR to keep it simpler.

See the following code:

C++
try {
   coroutine-function-body
} catch (...) {
   unhandled_exception();
}
final_suspend:
    ...

As long as the coroutine function actually reaches the final_suspend: point every time, it is OK to mark the coroutine function as coro_always_done. Exceptions are not a problem as long as they are handled in `unhandled_exception()`, in which case the coroutine still reaches `final_suspend` normally. However, if unhandled_exception() throws, control flow can't reach final_suspend any more. And the most confusing part is that the standard says the coroutine is considered done in this case. So I think it would be confusing if people found that their coroutine_handle<>::done() always returns true, yet their code can't work with coro_always_done.

@avogelsgesang would you like to take another look?

ChuanqiXu abandoned this revision.Oct 16 2022, 8:00 PM
This comment was removed by ChuanqiXu.
ChuanqiXu reclaimed this revision.Oct 16 2022, 8:01 PM

Mis-Operations

Sorry for the late reply - and thanks for your detailed answers!

To summarize my current position:

  • I am fine with introducing a [[clang::coro_always_done]] attribute. I still think a [[clang::co_await_always_resumes]] attribute would be more flexible, but I agree with you that [[clang::coro_always_done]] also provides value, and therefore I do not oppose its introduction.
  • I am opposed to introducing a new LLVM IR attribute for this
    • I think we can reach the same codegen benefits by using unreachable inside the destroy block.
    • I understood your point that this unreachable will not be sufficient to "get the information that there is a control flow from bb_i to bb_{i+1}". But I did not yet understand why this information would be useful for the compiler and which additional optimizations it would enable.
    • To convince me otherwise, I would like to see examples of the additional optimizations that "getting the information that there is a control flow from bb_i to bb_{i+1}" unlocks.
  • I would prefer that we don't declare "throwing unhandled_exception()" undefined behavior. Users can already declare unhandled_exception() as noexcept. I see two potential solutions here:
    • We allow unhandled_exception() to throw even if [[clang::coro_always_done]] is specified. To get the full optimization potential, users can just declare their unhandled_exception as noexcept. Currently this is my preferred solution.
    • We make it a compiler error if users use [[clang::coro_always_done]] without marking unhandled_exception as noexcept. That way, we avoid undefined behavior while getting all the same optimization benefits for the destroy function.

More detailed answers/arguments are in the two sub-threads below.

clang/include/clang/Basic/AttrDocs.td
6761

Thanks for the write-up! But I think there are two problems:

Language Semantics

the behavior of co_await awaitable_instance; is also controlled by promise_type::await_transform (if present) of the enclosing coroutine

Right, there are also await_transform and operator co_await. Allowing [[clang::co_await_always_resumes]] directly on the co_await call is not a good idea. Also, we should only look for it on the Awaiter, not on the Awaitable (using the terminology from https://lewissbaker.github.io/2017/11/17/understanding-operator-co-await). That way, await_transform and operator co_await would be taken into account as part of the usual protocol to get the Awaiter from an Awaitable.

Similarly, for co_yield we would also first resolve the Awaiter and then look for the co_await_always_resumes attribute on that Awaiter.

Implementation

The compiler can only get the information that %1 won't be 1 in this case. *From the perspective* of LLVM IR, we can't know that the coroutine must be resumed from the suspend point

I think I had a misunderstanding of LLVM semantics here: My understanding so far was that the switch after a @llvm.coro.suspend has 3 outgoing edges:

  • resume: this control flow edge is still allowed by the co_await_always_resumes attribute
  • destroy: this one is forbidden by the unreachable and the optimizer knows it won't be taken
  • suspend: my understanding of this edge so far was that this is only an artificial control flow edge needed for the LLVM passes, but that this edge would not actually be allowed to be taken.

It seems my understanding of the "suspend" edge was incorrect. So far, I thought of this as undefined behavior since it would leak memory. Are you saying "In LLVM IR, it is valid to suspend a coroutine forever, i.e. we never call resume/destroy on it"?

With that out of the way, I now also understood your point

Again, the compiler can't get the information that there is a control flow from bb_i to bb_{i+1}

Thanks for patiently explaining it! :)

A couple of follow-up questions, though:

  • what would we lose by the compiler not knowing that there is control flow from bb_i to bb_{i+1}? Do you have any specific optimization in mind which would be blocked by not having this information?
  • My understanding of the coroutine design is that all optimizations across resumption points should be done before coroutine splitting. Why would we want to do some of the cross-suspend-point optimizations after splitting?
  • how does D134009 solve the problem? Afaict, with that approach we currently also lose the information that bb_i resumes at bb_{i+1} during coroutine splitting?

General discussion

I think it is sufficient to add [[clang::coro_always_done]] rather than the attribute you proposed. It is [...] useful in practice.

Agree, [[clang::coro_always_done]] is useful in practice. I am not opposed to introducing it, although I think [[clang::co_await_always_resumes]] would be more widely applicable and wouldn't be harder to implement (at least not if we just lower both as unreachable).

Implementation-wise, I am still not convinced that we indeed need a new LLVM IR attribute for this. The main difference between unreachable and the new attribute seems to be "the compiler knows that the coroutine will not be suspended forever". But I don't yet understand what the LLVM optimizer would do with this additional knowledge (see follow-up questions above).

6780–6783

Thanks for this detailed explanation. Based on some godbolt experiments (https://godbolt.org/z/e9c8Yco5v), I understood that the destroy function needs to know whether the final suspension point was reached due to an exception or through normal control flow, so that it knows whether it needs to destruct the FinalAwaitable. Is that understanding correct? Or are there additional points/nuances which I missed?

I would prefer if we don't declare "unhandled_exception throws" as undefined behavior, even if [[clang::coro_always_done]] is applied.

This is based on my current understanding that:

  1. [[clang::coro_always_done]] simplifies the destroy function, even if we still have a throwing unhandled_exception.
  2. by allowing [[clang::coro_always_done]] also on throwing unhandled_exception, it can be applied in more cases.
  3. by declaring unhandled_exception as noexcept, the user could still achieve the full optimization potential

@avogelsgesang hi, thanks for the detailed reply. After reading it, my mind has changed a little. So before replying to your comments, I want to hear your opinion on another idea: how about a coro_always_elide attribute? For example, we could mark a coroutine function with the coro_always_elide attribute, or we could mark it at the call site. Just like:

[[clang::coro_always_elide]] coro_func(); // no dynamic allocation

The idea is that this attribute is much easier to understand, and still useful. I feel that even if we reach some consensus in this discussion, other users may still find it confusing. Also, coroutine elision would be more helpful than shrinking the destroy functions.

Interesting... I will need to read up on coro_always_elide a bit before being able to form an informed opinion.
I found https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2594r0.pdf
Any other relevant resources?

Also, how does "guaranteed HALO" contribute to reducing the amount of code inside the destroy function? How does it address/help https://github.com/llvm/llvm-project/issues/56980, i.e. the starting point of this review?

My (very rough, current) understanding is:

  • HALO relies on inlining the ramp function
  • Even after HALO, the destroy function is still called through a function call (which might be inlined separately by the normal function inlining functionality)
  • HALO by itself will not simplify the destroy function

As such, I currently see independent value in always_coro_elide (avoids a heap allocation) and coro_always_done/co_await_always_resumes (removes unreachable code from the destroy function).
Both of which seem like useful goals. I don't see always_coro_elide as a better alternative to coro_always_done/co_await_always_resumes, I think both address independent goals

Interesting... I will need to read up on coro_always_elide a bit before being able to form an informed opinion.
I found https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2594r0.pdf
Any other relevant resources?

Here is the original paper about HALO: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html.
I think my proposal may be helpful too: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2477r3.html. That paper tries to let the user control coroutine elision at the standard level. I feel it may be too slow, or not feasible, to get it into the standard, so I am wondering whether it is better to make it an attribute instead.

Also, how does "guaranteed HALO" contribute to reducing the amount of code inside the destroy function? How does it address/help https://github.com/llvm/llvm-project/issues/56980, i.e. the starting point of this review?

No, it is not related. They are two separate problems.

My (very rough, current) understanding is:

  • HALO relies on inlining the ramp function
  • Even after HALO, the destroy function is still called through a function call (which might be inlined separately by the normal function inlining functionality)
  • HALO by itself will not simplify the destroy function

As such, I currently see independent value in always_coro_elide (avoids a heap allocation) and coro_always_done/co_await_always_resumes (removes unreachable code from the destroy function).
Both of which seem like useful goals. I don't see always_coro_elide as a better alternative to coro_always_done/co_await_always_resumes, I think both address independent goals

In fact, coro_always_done has two goals:

  • Reduce the size of the destroy function. (Done in this series of patches.)
  • Give helpful information to the coroutine elision optimization. (Not implemented yet; still an idea.)

Both goals are meant to be achieved through the additional control flow graph information provided by the coro_always_done attribute.

Now let me give some more background on coroutine elision. There are two conditions for performing coroutine elision:

  • The ramp function is inlined.
  • For the inlined ramp function, every path from the ramp function (the creation point) to an exit point of the enclosing function will (be known to) reach a destroy point (coroutine_handle::destroy()).

So for example, the following generator case will be elided all the way:

class generator {
    //...

    ~generator() {
        std::coroutine_handle<promise_type>::from_promise(promise).destroy();
    }

private:
    promise_type promise;
};

void foo() {
    generator g;
    // any other information
}

After inlining and some optimization, the body of foo will look like:

void foo() {
     void *g_frame = new (...); // the creation for the frame of generator g
     ...
     destroy(g_frame);
}

Every path from the creation of the coroutine frame to the exit of the function reaches the destroy call, which satisfies the condition for coroutine elision. This is why we would expect a generator never to need a dynamic allocation.

Before moving on, let me emphasize another point: the "(be known to)" part. What does this mean? See the following generator example:

C++
class generator {
    //...

    ~generator() {
        std::coroutine_handle<promise_type>::from_promise(promise).destroy();
    }

private:
    promise_type promise;
    // any other field member
};

void any_other_function(void*);

void foo() {
    generator g;
    any_other_function(&g);
    // any other information
}

then the generated code after inlining and optimization would look like:

void foo() {
     void *g_frame = new (...); // the creation for the frame of generator g
     generator g(g_frame);
     any_other_function(&g);
     // ...
     destroy(g.promise - 16); // compute the coroutine frame address from the promise address and destroy it
}

Now we can't perform coroutine elision. What's the problem? The compiler can't be sure that g.promise - 16 refers to the same coroutine frame as the one that was created, since it is entirely possible that g.promise was changed during the execution of any_other_function(...)! So, to stay on the safe side, the compiler can't assume anything here.

OK, let's move on to another point: the case where the enclosing function is itself a coroutine:

// task is the general task type.
// And let's assume that the destructor of task (or the await_resume) would destroy the corresponding task.
task g();
task f() {
   co_await g();
}

then the generated code would look like:

task f() {
    void *g_frame = new(...);
    task g(g_frame);
    auto awaiter = operator co_await(g);
    if (! awaiter.await_ready()) {
       awaiter.await_suspend();
       return; // There is an additional exit point!
    }
    auto v = awaiter.await_resume();
    destroy(g_frame);
}

Look at the above example: the second condition for coroutine elision is broken, since there is a path from the creation of the coroutine to an exit of the function which doesn't reach the destroy call. We can't be sure that the suspended coroutine f will always be resumed again. (It may be destroyed directly, or it may never be accessed again and its memory leaks.)

Now it should be clear why coro_always_done and coro_always_resume are helpful for coroutine elision: they help exactly in such cases.

So let me answer the question: why do I want to introduce coro_always_elide to replace coro_always_done/coro_always_resume? Because for coro_always_done/coro_always_resume, my original purpose was mainly coroutine elision rather than shrinking the destroy function. Shrinking the destroy function definitely helps code size (from my investigation, it can reduce the size of a coroutine-intensive project by 4%), but we care about runtime performance much more than size (and the size change alone is not compelling). And from the above analysis, you can see that coro_always_done/coro_always_resume describe the process and the reason; they are harder to understand, and there are cases they can't handle (see the any_other_function case). coro_always_elide describes the result, and it is much easier to understand. That is why I want to introduce it now.

Now it should be clear why coro_always_done and coro_always_resume are helpful for coroutine elision: they help exactly in such cases.

Yes, I understood that point now :)
Now I see why the information "that there is a control flow from bb_i to bb_{i+1}" is very valuable, and why we also need to give the middle end the information that the case "coroutine gets suspended, but never destroyed/resumed" cannot occur.

So let me answer the question: why do I want to introduce coro_always_elide to replace coro_always_done/coro_always_resume? Because for coro_always_done/coro_always_resume, my original purpose was mainly coroutine elision rather than shrinking the destroy function. Shrinking the destroy function definitely helps code size (from my investigation, it can reduce the size of a coroutine-intensive project by 4%), but we care about runtime performance much more than size (and the size change alone is not compelling).

100% with you. HALO is much more important than code size in my use case as well. The size reduction is still quite nice, and I would still take it if we can find a solution that solves both with the same mechanism.

And from the above analysis, you can see that coro_always_done/coro_always_resume describe the process and the reason; they are harder to understand, and there are cases they can't handle (see the any_other_function case). coro_always_elide describes the result, and it is much easier to understand. That is why I want to introduce it now.

Personally, I think there is still value to have both coro_always_resume and coro_always_elide.
coro_always_resume targets library authors, while coro_always_elide targets library users.
coro_always_resume could be added to the awaitable classes inside a coroutine library by the library author: the library author knows that their coroutine task type will always resume, and can provide this guarantee to the compiler. However, the author of the coroutine library cannot know what their users are doing with the task instances and whether they swap them out "under the hood".
Through coro_always_elide, the user of the library can let the compiler know: yes, I am leaking a reference to my task, but don't worry, I will not swap the coroutine_handle, you can still do HALO.
I would see coro_always_resume as the default solution which would capture 90% of the use cases. Provided library authors add it to their libraries, users rarely have to worry about it.
coro_always_elide I would see as an escape hatch for the hard cases where coro_always_resume is insufficient.

But I agree: coro_always_elide is also necessary, and I would love it if we had it in clang :)

Which brings me to another question though: How would you implement the coro_always_elide attribute?
Both on the LLVM IR level and on the C++ level. E.g., it's not completely clear to me where exactly the coro_always_elide would be placed... on the co_await?
I think we should move the discussion on coro_always_elide to discourse, so that it gets more visibility...

Yeah, it is good to know we have some consensus now.

Which brings me to another question though: How would you implement the coro_always_elide attribute?

Both on the LLVM IR level and on the C++ level. E.g., it's not completely clear to me where exactly the coro_always_elide would be placed... on the co_await?

I had some implementation experience with this before, so I am sure it is implementable. As for the C++ syntax, it should work exactly the same way as always_inline. We know that always_inline can be applied to function declarations, and recently it became applicable to statements as well. Here is an example: https://clang.llvm.org/docs/AttributeReference.html#always-inline-force-inline. So we could make it work for the following case:

C++
Task<int> foo();
Task<int> bar() {
    int i = 0;
    [[clang::coro_always_elide]] i = co_await foo(); // OK.
    int j = co_await foo(); // doesn't work, since a statement attribute can't be applied to
                            // a declaration statement; this defect **may** be fixable in the future.
    co_return i + j;
}

I think we should move the discussion on coro_always_elide to discourse, so that it gets more visibility...

Agreed. I'll send one after I send a patch.

Personally, I think there is still value to have both coro_always_resume and coro_always_elide.

Agreed. I'm only adjusting the priority here.