tl;dr A correct implementation of coroutines requires lifetime intrinsics to be available.
Coroutine functions are functions that can be suspended and resumed later. To support this, data that needs to stay alive across a suspension must be placed on the heap (i.e. in the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figuring out whether it should live on the stack or in the frame.
In most cases, when we are unable to accurately analyze the lifetime of some data, we can just conservatively put it on the heap.
Unfortunately, there exist a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze the lifetime of that data.
In more detail, there exist cases where, at certain code points, the current coroutine frame may already have been destroyed. Hence no frame access is allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutines:
auto tmp = await_suspend(); __builtin_coro_resume(tmp.address()); return;
In the above code example, await_suspend() returns a new coroutine handle; we obtain its address and then resume that coroutine. This essentially "transfers" control from the current coroutine to a different coroutine.
During the call to await_suspend(), the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However, when LLVM emits IR for the above code, it needs to emit an AllocaInst for tmp and then call address() on tmp. address() is a member function of the coroutine handle, and there is no way for the LLVM optimizer to know that it does not capture the tmp pointer. So when the optimizer looks at the call, it has to conservatively assume that tmp may escape, and hence puts it on the heap. Furthermore, in some cases the address() call is inlined, which generates a series of store/load instructions that move the tmp pointer around. Those stores also make the compiler think that tmp might escape.
A repro of the crash can be found here: https://godbolt.org/z/KvPY66
To summarize, it's really difficult for the mid-end to figure out that the tmp data is short-lived.
I made an attempt in D98638, but it turned out to be way too complex and basically does the same thing as inserting lifetime intrinsics in coroutines.
Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893
I still need to fix a few tests, but I am sending this out early for feedback.
Can we be sure the frontend will always call this API to emit lifetime.start? I mean, the frontend may call EmitIntrinsic or create the lifetime.start intrinsic directly, whether through IRBuilder::CreateXXX or Intrinsic::Create(...). I worry that this could lead to changes outside of the intended design.
Also, if we add a check in EmitLifetimeStart, why don't we add one in EmitLifetimeEnd as well?