Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151
Given its context is pretty long, I'll give it a simple summarize here.
The issue pattern here is:
C++ struct Awaiter { SomeAwaitable&& awaitable; bool suspended; bool await_ready() { return false; } std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h) { // Assume we will suspend unless proven otherwise below. We must do // this *before* calling Register, since we may be destroyed by another // thread asynchronously as soon as we have registered. suspended = true; // Attempt to hand off responsibility for resuming/destroying the coroutine. const auto to_resume = awaitable.Register(h); if (!to_resume) { // The awaitable is already ready. In this case we know that Register didn't // hand off responsibility for the coroutine. So record the fact that we didn't // actually suspend, and tell the compiler to resume us inline. suspended = false; return h; } // Resume whatever Register wants us to resume. return to_resume; } void await_resume() { // If we didn't suspend, make note of that fact. if (!suspended) { DidntSuspend(); } } };
In the example, the program would only access the coroutine frame conditionally. However, the optimizer failed to think the variable suspended may be aliased with the coroutine handle h in await_suspend. In this case, the value of suspended is the same with the nullness of to_resume. So the variable suspended is optimized out and the nullness of the to_resume is stored to the coroutine frame unconditionally. So that it is a UB if the coroutine handle get destroyed in awaitable.Register() in another thread while we access the coroutine frame.
The root cause of the problem is that the optimizer can't recognize the may_alias relationship between the data members of the awaiter with the coroutine handle. So I thought I must fight with AA initially. But I realized that it is sufficient to mark the awaiter as escaped during the await_suspend. This is not a workaround but a proper fix to the underlying issue since the C++ language doesn't allow the programmers to access the coroutine handle except via await_suspend.
And it is pretty easy to mark certain variables as escaped. We can make it by passing its address to a foreign function simply. So it is the framework of the patch. We introduced a LLVM intrinsic @llvm.coro.opt.blocker to block the optimization of awaiter.
Another important point is that we shouldn't make it for awaiter with no non-static data members. The instance of an empty class may still occupy 1 byte due to the requirement of C++ language. But the optimizer can optimize such variables out if it can inline await_ready, await_suspend and await_resume. So it doesn't matter. However, it won't be done after we introduced @llvm.coro.opt.blocker. This is not acceptable. So I spent some time in this page to make it not happen with the empty awaiter.
Ideally, it will be better to emit @llvm.coro.opt.blocker only if we observed the coroutine handle leaked from await_suspend and there is conditional access after that. But it is more complex and it is also questionable if the benefits worth the cost. Currently, coroutines already compile slowly. So let's leave it to the future.
Note that the may cause a crash if there are some types we are not able to covered. This is intentional. Since such crash is easy to fix by adding new type to the TypeVisitor. I guess this may be understandable since the type system of clang frontend (or should I say C++?) is really complex.
Also note that this only matters with people who enabled assertions, possibly developers or testers.
For end users who use a clean release build, it is completely OK to not match the type. They may only need to pay 1 byte memory overhead. I feel this is better for the user experience.