One of the challenges with the alloca analysis in CoroSplit is that in a few cases we need to make sure the allocas must be put on the stack, not on the frame.
One of the cases is symmetric transfer. Symmetric transfer is a newly introduced feature in C++ coroutines that allows for immediate transfer to a different coroutine when the current coroutine is suspended.
The await_suspend() call will return a coroutine handle type, and when that happens, the compiler should generate code to resume the returned handle. Like this:
coroutine_handle tmp = awaiter.await_suspend(); __builtin_coro_resume(tmp.address());
It's very common that after the call to await_suspend(), the current coroutine frame is already destroyed, which means we should not be accessing the coroutine frame from there.
And we shouldn't because we we use here is a temporary variable which will be short-lived. However in a debug build when we don't have lifetime intrinsics, it's very hard for the compiler to determine that tmp doesn't escape. This bug can be reproduced in this example: https://godbolt.org/z/KvPY66
It results in a TSAN failure because we are accessing the heap after it's destroyed.
There are two specific challenges here:
- If the address() function call is not inlined (this should be the default case with -O0), we will have a function call that takes tmp as a pointer. The compiler does not know that the address call will not capture. This will lead to tmp being put on the frame. We could potentially special handle the address function in either front-end or CoroSplit, but both are fragile (we will need to do some name pattern matching).
- If the address() function call is inlined (in some versions of libc++, address seems to have "always_inline" attribute), we will end up with a series of store/load instructions. For a naive analysis, a store of the pointer will also be treated as escape. To solve that problem, I introduced D91305, which tries to match this specific store/load pattern and be able to deal with it. It looks very hacky.
To solve this problem once for all, and provide a framework for solving similar problems in the future, this patch introduces 2 new intrinsics to mark a region where all data accessed must be put on the stack.
In the case of symmetric transfer, in order to be able to insert code during front-end codegen right after the await_suspend call, we need to split the Suspend subnode CoroutineSuspendExpr at await_suspend call, as the new AwaitSuspendCall subnode.
Then we create a OpaqueValueExpr to wrap around AwaitSuspendCall, and use it to continue build the rest of the Suspend subnode. OpaqueValueExpr is necessary because we don't want to emit the await_suspend call twice. OpaqueValueExpr serves as a stopper in codegen.
If there is no symmetric transfer, the new nodes will be nullptr.
After this patch, now right after the await_suspend() call, we will see a llvm.coro.forcestack.begin() intrinsic, and then right before coro.suspend(), we will see a llvm.coro.forcestack.end() intrinsic.
CoroSplit will then be able to use this information to decide whether some data must be put on the stack.
We are also able to remove the code that tries to match the special store/load instruction sequence.
It looks strange for the change of CoroutineSuspendExpr at the first glance. It is easy to understand the coroutine suspend expression is consists of three parts: Ready, Suspend and resume. It is written in the language documentation. And the new added AwaitSuspendCall is confusing.