Prepare for https://reviews.llvm.org/D134010.
From the perspective of the middle end, the attribute teaches the compiler how to reduce the control flow graph in a way that was previously impossible.
Currently, the optimization reduces the size of the destroy function, since we know that the destroy function will only be called after the coroutine completes.
I am a bit surprised that we introduce a new IR-attribute for this.
Can't we reuse the existing branching structure of the suspension points?
I could imagine that we could use the switch after each suspend to encode whether the coroutine can be destroyed at that suspension point. If we point the switch case for i8 1, i.e. the cleanup control flow edge, to an unreachable basic block, coroutine splitting should already optimize out the corresponding code from the destroy function, shouldn't it?
I imagine something like
    ; For this suspension point, we want to optimize under the assumption that the
    ; coroutine will not be destroyed while suspended.
    %0 = call i8 @llvm.coro.suspend(token none, i1 false)
    ; Hence we let the "cleanup" result (`i8 1`) branch to an "unreachable" block
    switch i8 %0, label %suspend [i8 0, label %continue
                                  i8 1, label %unreachable_bb]
    continue:
      ; After the resumption point, execution continues here...
      ; Let's assume we suspend a 2nd time.
      %1 = call i8 @llvm.coro.suspend(token none, i1 false)
      ; At this suspension point, we allow the coroutine to be destroyed.
      switch i8 %1, label %suspend [i8 0, label %continue2
                                    i8 1, label %cleanup]
    continue2:
      ; Some other stuff happens here...
    cleanup:
      %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
      call void @free(i8* %mem)
      br label %suspend
    suspend:
      %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
      ret i8* %hdl
    unreachable_bb:
      unreachable
    }

To demonstrate the viability of this alternative approach, I copied the LLVM code for https://github.com/llvm/llvm-project/issues/56980 from https://godbolt.org/z/P84MPzq4q and manually replaced the await.cleanup, await2.cleanup, ..., await7.cleanup basic blocks with "unreachable". You can find the changed LLVM code at https://godbolt.org/z/n93TTj8vo. As you can see, the generated Foo() [clone .destroy] function does not contain any code for the GlobalSetter destructors.
I think re-using the switch control flow is preferable to introducing a new attribute because: