Clang front-end emits an AST that looks like this for an co_await expression:
|- ExprWithCleanups |- -CoawaitExpr |- -MaterializeTemporaryExpr ... Awaiter ... |- -CXXMemberCallExpr ... .await_ready ... |- -CallExpr ... __builtin_coro_resume ... |- -CXXMemberCallExpr ... .await_resume ...
ExprWithCleanups is responsible for cleaning up (including calling dtors) for the temporaries generated in the wrapping expression).
In the above structure, the __builtin_coro_resume part (which corresponds to the code for the suspend case in the co_await with symmetric transfer), the pseudocode looks like this:
__builtin_coro_resume( awaiter.await_suspend( from_address( __builtin_coro_frame())).address());
One of the temporaries that's generated as part of this code is the coroutine handle returned from awaiter.await_suspend() call. The call returns a handle which is a prvalue (since it's a returned value on the fly). In order to call the address() method on it, it needs to be converted into an xvalue. Hence a materialized temp is created to hold it. This temp will need to be cleaned up eventually. Now, since all cleanups happen at the end of the entire co_await expression, which is after the <coro.suspend> suspension point, the compiler will think that such a temp needs to live across suspensions, and need to be put on the coroutine frame, even though it's only used temporarily just to call address() method.
Such a phenomena not only unnecessarily increases the frame size, but can lead to ASAN failures, if the coroutine was already destroyed as part of the await_suspend() call. This is because if the coroutine was already destroyed, the frame no longer exists, and one can not store anything into it. But if the temporary object is considered to need to live on the frame, it will be stored into the frame after await_suspend() returns.
A fix attempt was done in D87470. Unfortunately it is incorrect. The reason is that cleanups in Clang works more like linearly than nested. There is one current state indicating whether it needs cleanup, and an ExprWithCleanups resets that state. This means that an ExprWithCleanups must be capable of cleaning up all temporaries created in the wrapping expression, otherwise there will be dangling temporaries cleaned up at the wrong place.
I eventually found a walk-around (D89066) that doesn't break any existing tests while fixing the issue. But it targets the final co_await only. If we ever have a co_await that's not on the final awaiter and the frame gets destroyed after suspend, we are in trouble. Hence we need a proper fix.
This patch is the proper fix. It does the folllowing things to fully resolve the issue:
- The AST has to be generated in the order according to their nesting relationship. We should not generate AST out of order because then the code generator would incorrectly track the state of temporaries and when a cleanup is needed. So the code in buildCoawaitCalls is reorganized so that we will be generating the AST for each coawait member call in order along with their child AST.
- await_ready() call is wrapped with an ExprWithCleanups so that temporaries in it gets cleaned up as early as possible to avoid living across suspension.
- await_suspend() call is wrapped with an ExprWithCleanups if it's not a symmetric transfer. In the case of a symmetric transfer, in order to maintain the musttail call contract, the ExprWithCleanups is wraaped before the resume call.
- In the end, we mark again that it needs a cleanup, so that the entire CoawaitExpr will be wrapped with a ExprWithCleanups which will clean up the Awaiter object associated with the await expression.