This is an alternative to D97915, which did not properly deallocate the
over-allocated frame. This patch handles both allocation and deallocation.
Both user-defined promise/local variables and compiler-synthesized local variables can cause the coroutine frame to be overaligned. A related description can be found in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf (issue #3).
Unlike D97915, this patch implements the over-allocation in the backend instead of the frontend, for the following reasons:
- The alloca of the raw frame pointer (suppose we insert it in the frontend) would be included in the non-overaligned frame if we don't teach CoroFrame how to elide it.
- Extra code is inserted only when the frame is known to be overaligned.
- The implementation is simpler.
- Clients could turn it on/off by using llvm.coro.size.aligned instead of llvm.coro.size. It indicates to LLVM that it should handle overaligned frames.
Overall approach:
- Overalignment handling is only performed when llvm.coro.size.aligned is used.
- llvm.coro.size.aligned returns sizeof(coroutine frame) + max(0, alignof(coroutine frame) - __STDCPP_DEFAULT_NEW_ALIGNMENT__), which equals llvm.coro.size when the coroutine frame is not overaligned.
- In CoroFrame, as soon as the frame is known to be overaligned, extra code is emitted to 1) create a new alloca that stores the memory address returned by malloc, 2) dynamically adjust the frame pointer so that its address is correctly aligned, and 3) remember the frame index of the newly created alloca so that it can be used later for deallocation.
- In coro::replaceCoroFree, when it is decided that heap allocation cannot be elided, instead of free(<frame ptr>), do Value *v = gep <frame ptr>, 0, frame_ptr_addr_index; free(load(v)).
- Let Clang switch to llvm.coro.size.aligned. If Clang later gains support for handling this by itself, it can switch back to llvm.coro.size.
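
The approach above can be sketched in plain C++ as a rough illustration; this is not the code the patch emits. The names kDefaultNewAlign, alignedFrameSize, Frame, rawPtr, allocateFrame and freeFrame are hypothetical, the frame layout is made up, and the sketch assumes the allocator returns memory aligned to at least the default new alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumption: use the C++17 macro when available, otherwise fall back to
// alignof(std::max_align_t) as a stand-in for the default new alignment.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t kDefaultNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t kDefaultNewAlign = alignof(std::max_align_t);
#endif

// Mirrors the size formula described for llvm.coro.size.aligned:
// size + max(0, align - default new alignment).
constexpr std::size_t alignedFrameSize(std::size_t size, std::size_t align) {
  return size + (align > kDefaultNewAlign ? align - kDefaultNewAlign : 0);
}

// Hypothetical overaligned frame with one slot reserved for the raw
// address returned by the allocator.
struct alignas(64) Frame {
  void *rawPtr;
  char payload[100];
};

Frame *allocateFrame() {
  const std::size_t bytes = alignedFrameSize(sizeof(Frame), alignof(Frame));
  void *raw = ::operator new(bytes);  // plain, non-aligned allocation

  // Round the raw address up to the frame's alignment.
  const auto addr = reinterpret_cast<std::uintptr_t>(raw);
  const std::uintptr_t mask = alignof(Frame) - 1;
  const std::uintptr_t aligned = (addr + mask) & ~mask;

  // The padding budget covers the adjustment if raw is default-aligned.
  assert(aligned - addr + sizeof(Frame) <= bytes);

  Frame *frame = new (reinterpret_cast<void *>(aligned)) Frame{};
  frame->rawPtr = raw;  // remembered for deallocation
  return frame;
}

void freeFrame(Frame *frame) {
  // Free the address the allocator returned, not the adjusted frame pointer.
  ::operator delete(frame->rawPtr);
}
```

Storing the raw pointer inside the frame is what lets deallocation work with only the frame pointer in hand, which corresponds to the gep/load/free sequence described for coro::replaceCoroFree.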
https://reviews.llvm.org/P8260 is an IR diff (llvm.coro.size vs llvm.coro.size.aligned) for the test case in llvm/test/Transforms/Coroutines/coro-padding.ll.
Do we need to change __builtin_coro_size? The argument will always be 1, right?
If I read the implementation correctly, the behavior only starts to change in the LLVM intrinsics.