This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Coroutines] Introduce `@llvm.coro.opt.blocker` to block potentially incorrect optimizations of the awaiter
Abandoned · Public

Authored by ChuanqiXu on Aug 3 2023, 11:03 PM.

Details

Reviewers
ilya-biryukov
rjmccall
cor3ntin
weiwang
bruno
MatzeB
Group Reviewers
Restricted Project
Summary

Close https://github.com/llvm/llvm-project/issues/56301
Close https://github.com/llvm/llvm-project/issues/64151

Since the context is pretty long, I'll give a short summary here.

The issue pattern here is:

C++
struct Awaiter {
  SomeAwaitable&& awaitable;
  bool suspended;

  bool await_ready() { return false; }

  std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h) {
    // Assume we will suspend unless proven otherwise below. We must do
    // this *before* calling Register, since we may be destroyed by another
    // thread asynchronously as soon as we have registered.
    suspended = true;

    // Attempt to hand off responsibility for resuming/destroying the coroutine.
    const auto to_resume = awaitable.Register(h);

    if (!to_resume) {
      // The awaitable is already ready. In this case we know that Register didn't
      // hand off responsibility for the coroutine. So record the fact that we didn't
      // actually suspend, and tell the compiler to resume us inline.
      suspended = false;
      return h;
    }

    // Resume whatever Register wants us to resume.
    return to_resume;
  }

  void await_resume() {
    // If we didn't suspend, make note of that fact.
    if (!suspended) {
      DidntSuspend();
    }
  }
};

In the example, the program only accesses the coroutine frame conditionally. However, the optimizer fails to realize that the variable suspended may alias the coroutine handle h in await_suspend. Since the value of suspended matches the nullness of to_resume, the variable suspended is optimized out and the nullness of to_resume is stored to the coroutine frame unconditionally. That is undefined behavior if the coroutine handle gets destroyed in awaitable.Register() on another thread while we access the coroutine frame.
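
To make this concrete, here is a rough source-level sketch of what the miscompiled code effectively behaves like (illustrative only; the actual transformation happens in the IR, as shown later in this thread):

C++
std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h) {
  const auto to_resume = awaitable.Register(h);
  // The conditional store to `suspended` has effectively been sunk below
  // Register() and made unconditional. If Register() handed the coroutine
  // off and another thread has already destroyed it, this writes into a
  // destroyed coroutine frame.
  suspended = (to_resume != nullptr);
  return to_resume ? to_resume : h;
}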

The root cause of the problem is that the optimizer can't recognize the may-alias relationship between the data members of the awaiter and the coroutine handle. So initially I thought I would have to fight with alias analysis. But I realized that it is sufficient to mark the awaiter as escaped during await_suspend. This is not a workaround but a proper fix to the underlying issue, since the C++ language doesn't allow programmers to access the coroutine handle except via await_suspend.

And it is pretty easy to mark a variable as escaped: we can simply pass its address to a foreign function. That is the framework of the patch. We introduce an LLVM intrinsic @llvm.coro.opt.blocker to block the problematic optimization of the awaiter.
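
At the source level, the effect is roughly analogous to the following sketch (opaque_escape and suspend_with_blocker are hypothetical names, not part of the patch):

C++
#include <coroutine>

// Hypothetical opaque function defined in another translation unit. Passing a
// pointer to it "escapes" the pointee, so the optimizer must assume unknown
// code may read or write through that pointer from now on.
extern void opaque_escape(void*);

template <typename Awaiter>
std::coroutine_handle<> suspend_with_blocker(Awaiter& awaiter,
                                             std::coroutine_handle<> h) {
  opaque_escape(&awaiter);          // roughly the effect of @llvm.coro.opt.blocker
  return awaiter.await_suspend(h);  // conditional stores to the awaiter's members
                                    // can no longer be proven unobservable
}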

Another important point is that we shouldn't do this for an awaiter with no non-static data members. An instance of an empty class may still occupy 1 byte due to the requirements of the C++ language, but the optimizer can optimize such a variable away if it can inline await_ready, await_suspend and await_resume, so normally it doesn't matter. However, that optimization would no longer happen once we introduce @llvm.coro.opt.blocker, which is not acceptable. So part of this patch makes sure the intrinsic is not emitted for an empty awaiter.
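
For reference, a minimal sketch of such an empty awaiter, in the spirit of std::suspend_always:

C++
#include <coroutine>

// An awaiter with no non-static data members. Once await_ready/await_suspend/
// await_resume are inlined, the optimizer can remove the object entirely, so
// we don't want to emit @llvm.coro.opt.blocker for it and pin its 1-byte
// storage into the coroutine frame.
struct EmptyAwaiter {
  bool await_ready() const noexcept { return false; }
  void await_suspend(std::coroutine_handle<>) const noexcept {}
  void await_resume() const noexcept {}
};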

Ideally, it would be better to emit @llvm.coro.opt.blocker only if we observe the coroutine handle leaking from await_suspend and there is a conditional access afterwards. But that is more complex, and it is questionable whether the benefits are worth the cost; coroutines already compile slowly. So let's leave it for the future.

Diff Detail

Event Timeline

ChuanqiXu created this revision. Aug 3 2023, 11:03 PM
Herald added a project: Restricted Project. Aug 3 2023, 11:03 PM
Herald added a subscriber: hiraditya.
ChuanqiXu requested review of this revision. Aug 3 2023, 11:03 PM
Herald added a project: Restricted Project. Aug 3 2023, 11:03 PM
ChuanqiXu updated this revision to Diff 547107. Aug 3 2023, 11:12 PM
ChuanqiXu added inline comments. Aug 3 2023, 11:14 PM
clang/lib/CodeGen/CGCoroutine.cpp
158–169

Note that this may cause a crash if there are types we haven't covered. This is intentional, since such a crash is easy to fix by adding the new type to the TypeVisitor. I guess this is understandable, since the type system of the clang frontend (or should I say of C++?) is really complex.

Also note that this only matters for people who enable assertions, typically developers or testers.

For end users on a clean release build, it is completely fine if we fail to match the type; they only pay 1 byte of memory overhead. I feel this is better for the user experience.

ChuanqiXu updated this revision to Diff 547134. Aug 4 2023, 1:15 AM

Let me try to rephrase to see if I understand the problem here. In the coroutine function, the frontend allocates the awaiter object as a temporary like any other, i.e. with just an alloca. The optimizer doesn't see any escapes of that alloca, so it thinks it can freely optimize accesses to it. But in fact the awaiter object does escape and can be accessed separately somehow, so the proposed fix is just to emit a call which looks like an escape of the pointer.

I don't really understand how the awaiter object escapes here — how is it possible to access it given just a std::coroutine_handle? But given that it apparently does escape, the proposed fix seems like the best way to record that.

ChuanqiXu added a comment (edited). Aug 9 2023, 7:28 PM

Let me try to rephrase to see if I understand the problem here. In the coroutine function, the frontend allocates the awaiter object as a temporary like any other, i.e. with just an alloca. The optimizer doesn't see any escapes of that alloca, so it thinks it can freely optimize accesses to it. But in fact the awaiter object does escape and can be accessed separately somehow, so the proposed fix is just to emit a call which looks like an escape of the pointer.

Yes. Exactly.

I don't really understand how the awaiter object escapes here — how is it possible to access it given just a std::coroutine_handle?

The std::coroutine_handle refers to the coroutine frame, which is a semi-opaque object. The language doesn't allow programmers to access the local variables in the coroutine frame directly, but it does allow them to resume/destroy the corresponding coroutine. So the programmer can affect the local variables indirectly through the coroutine handle.
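
For example (a hedged sketch; consumer is a hypothetical function, possibly running on another thread):

C++
#include <coroutine>

// Hypothetical consumer of the handle passed to await_suspend.
void consumer(std::coroutine_handle<> h) {
  // Either call can free the coroutine frame, and with it the awaiter and
  // every local variable stored in the frame:
  h.destroy();    // destroys the suspended coroutine immediately
  // h.resume();  // or resumes it; the coroutine may then run to completion
                  // and free its own frame
}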

For this example,

C++
std::coroutine_handle<> await_suspend(const std::coroutine_handle<> h) {
    // Assume we will suspend unless proven otherwise below. We must do
    // this *before* calling Register, since we may be destroyed by another
    // thread asynchronously as soon as we have registered.
    suspended = true;

    // Attempt to hand off responsibility for resuming/destroying the coroutine.
    const auto to_resume = awaitable.Register(h);

    if (!to_resume) {
      // The awaitable is already ready. In this case we know that Register didn't
      // hand off responsibility for the coroutine. So record the fact that we didn't
      // actually suspend, and tell the compiler to resume us inline.
      suspended = false;
      return h;
    }

    // Resume whatever Register wants us to resume.
    return to_resume;
  }

The problem is that the call awaitable.Register(h) may destroy the coroutine referred to by h, so it is problematic to access the local variable suspended after that. The source code is fine since it only accesses the local variable conditionally. But the optimizer ignores the fact that the local variable may alias the coroutine handle, so it turns the conditional access into an unconditional one, and the problem happens.


The above explanation is also why this only matters for the awaiter: the language only exposes the coroutine handle through await_suspend, which is a member function of the awaiter. So it seems sufficient to me to treat the awaiter specially.

ChuanqiXu edited the summary of this revision. Aug 10 2023, 1:50 AM

Hmm. This is quite interesting. So we've got two things going on that aren't quite kosher from an LLVM perspective:

  • The allocas in the coroutine are potentially deallocated whenever there's a suspend. Normally this isn't a problem because code in the coroutine can't run when that happens, but:
  • The call to await_suspend is not really part of the coroutine execution, and this is really sneakily important in a lot of ways.

One of those ways that I, at least, hadn't realized before: the local variables of the await_suspend call *must not be allocated into the coroutine frame*, because otherwise they can be deallocated before they're allowed to be. This works out today because inlining adds lifetime markers to the allocas of the inlined function, and coroutine frame lowering detects when an alloca has lifetime markers that don't cross a suspend and leaves it as an alloca in the split function. But with await_suspend we're semantically reliant on that; as long as we can inline await_suspend into an unlowered coroutine, then if we have any gaps in that "optimization" at all, it can cause a miscompile.

The awaiter temporary has to go into the coroutine frame: its formal lifetime lasts until the end of the full-expression in the coroutine that contains the co_await, and of course we also have to call await_resume() after resuming from suspension. So necessarily await_suspend has to handle the this object being destroyed during the execution of the method: e.g. anything it does after enqueuing the coroutine to be asynchronously resumed is unordered (racing) with the destruction of this. If we inline await_suspend — which in general we'd really like to do because a lot of its data flow is just local to the coroutine — we have to be careful about how that's optimized, which ends up being the direct bug here. But we *also* have to be careful about how anything it might store a reference to might be optimized: the awaiter could capture a pointer to some other local variable from the coroutine, and that variable will also apparently be deallocated during its alloca lifetime.
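
To make that last case concrete, a hedged sketch (all names hypothetical):

C++
#include <coroutine>

// Hypothetical scheduler hook: hands the handle (and the captured pointer)
// to another thread, which may resume or destroy the coroutine at any time.
void enqueue_for_resumption(std::coroutine_handle<> h, int* p);

// Hypothetical awaiter that stores a pointer to another local variable of
// the awaiting coroutine. If the awaiter escapes, the pointer it holds
// escapes transitively, so the pointee must also be treated as externally
// visible even though its apparent alloca lifetime may end at a suspend.
struct CountingAwaiter {
  int* counter_p;   // points at a local in the coroutine body

  bool await_ready() { return false; }
  void await_suspend(std::coroutine_handle<> h) {
    enqueue_for_resumption(h, counter_p);
  }
  void await_resume() {}
};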

If marking an alloca as escaping (as your current patch does) is really sufficient to fix this bug, we should be good. For example, it implicitly fixes the problem with captured pointers to other allocas just by the nature of transitive escapes: if a pointer to the awaiter can escape, and the awaiter may be storing a pointer to some other variable A, then the pointer to A can also escape. I'm concerned that marking an alloca as escaping might not be sufficient, though. The optimizer does have to be a lot more conservative about optimizing accesses to an escaped alloca, but there are still *some* optimizations it can do because they can't be detected by well-behaved code. For example, if the optimizer sees a conditional store to an alloca followed by a return, it could certainly still make it unconditional (or eliminate it entirely) — the memory is guaranteed to exist, and there's no valid way to observe the store because there are no memory accesses or synchronizations before it goes out of scope. In the example, what actually blocks the optimization is that the store is followed by an intrinsic call, which the optimizer has to pessimistically assume can observe the store through the escaped pointer. So I'm worried that just marking the variable as having escaped isn't sufficient to prevent all mis-optimization here, and the analysis of why it probably won't happen seems really brittle. Maybe it's fine, though.
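
As a small non-coroutine illustration of that point (observer is a hypothetical opaque function): the transformation described above is perfectly legal in an ordinary function even though the local has escaped.

C++
// `observer` is opaque to the optimizer and may stash the pointer somewhere.
void observer(int*);

int f(bool c) {
  int x = 0;
  observer(&x);   // x escapes here
  if (c)
    x = 1;        // conditional store, but there is no later load, call, or
                  // synchronization before x's lifetime ends at the return,
  return 0;       // so no well-behaved code can observe it; the optimizer may
}                 // make the store unconditional or delete it outright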

What does the coroutine IR actually look like here? We must be setting up the coroutine frame for the suspension before the call to await_suspend, or else it won't be properly ordered before whatever dispatch we do there; but the actual suspension point must come *after* the call to await_suspend, or the optimizer's understanding of the control flow will be completely messed up. I'm wondering if we might need to keep the await_suspend call outlined somehow until after coroutine splitting.

Hmm. This is quite interesting. So we've got two things going on that aren't quite kosher from an LLVM perspective:

  • The allocas in the coroutine are potentially deallocated whenever there's a suspend. Normally this isn't a problem because code in the coroutine can't run when that happens, but:
  • The call to await_suspend is not really part of the coroutine execution, and this is really sneakily important in a lot of ways.

Yes, exactly. And for the second point, it is stated formally that the coroutine is considered suspended after await_ready returns false: http://eel.is/c++draft/expr.await#5.1. And we already considered this when designing the coroutine intrinsics: this is the job of @llvm.coro.save: https://llvm.org/docs/Coroutines.html#llvm-coro-save-intrinsic. I feel we just had an oversight before.

One of those ways that I, at least, hadn't realized before: the local variables of the await_suspend call *must not be allocated into the coroutine frame*, because otherwise they can be deallocated before they're allowed to be. This works out today because inlining adds lifetime markers to the allocas of the inlined function, and coroutine frame lowering detects when an alloca has lifetime markers that don't cross a suspend and leaves it as an alloca in the split function. But with await_suspend we're semantically reliant on that; as long as we can inline await_suspend into an unlowered coroutine, then if we have any gaps in that "optimization" at all, it can cause a miscompile.

Yes, and I don't feel it is scary. The semantics here are clear, so if such a case is miscompiled, it implies there is a bug either in CoroFrame or in the inlining pass, and we'll know how to handle it.

The awaiter temporary has to go into the coroutine frame: its formal lifetime lasts until the end of the full-expression in the coroutine that contains the co_await, and of course we also have to call await_resume() after resuming from suspension. So necessarily await_suspend has to handle the this object being destroyed during the execution of the method: e.g. anything it does after enqueuing the coroutine to be asynchronously resumed is unordered (racing) with the destruction of this. If we inline await_suspend — which in general we'd really like to do because a lot of its data flow is just local to the coroutine — we have to be careful about how that's optimized, which ends up being the direct bug here. But we *also* have to be careful about how anything it might store a reference to might be optimized: the awaiter could capture a pointer to some other local variable from the coroutine, and that variable will also apparently be deallocated during its alloca lifetime.

Yes, and while I don't think we have any gaps here, I just want to note that there are cases where we can choose not to put the awaiter into the coroutine frame: when await_ready, await_suspend and await_resume are all inlined and no non-static data members are used in await_resume. Such cases are quite common, e.g. std::suspend_always. Of course, putting the awaiter into the coroutine frame is always the safe option.

If marking an alloca as escaping (as your current patch does) is really sufficient to fix this bug, we should be good. For example, it implicitly fixes the problem with captured pointers to other allocas just by the nature of transitive escapes: if a pointer to the awaiter can escape, and the awaiter may be storing a pointer to some other variable A, then the pointer to A can also escape. I'm concerned that marking an alloca as escaping might not be sufficient, though. The optimizer does have to be a lot more conservative about optimizing accesses to an escaped alloca, but there are still *some* optimizations it can do because they can't be detected by well-behaved code. For example, if the optimizer sees a conditional store to an alloca followed by a return, it could certainly still make it unconditional (or eliminate it entirely) — the memory is guaranteed to exist, and there's no valid way to observe the store because there are no memory accesses or synchronizations before it goes out of scope. In the example, what actually blocks the optimization is that the store is followed by an intrinsic call, which the optimizer has to pessimistically assume can observe the store through the escaped pointer. So I'm worried that just marking the variable as having escaped isn't sufficient to prevent all mis-optimization here, and the analysis of why it probably won't happen seems really brittle. Maybe it's fine, though.

I think we need to trust the other optimizations for the cases you mentioned. Otherwise, not only is the design of LLVM coroutines potentially broken, but LLVM itself would not be stable either, since the cases you describe are not limited to coroutines. So I feel we have to trust the other optimizations, and if we hit any other bugs, let's fix them.

I'm wondering if we might need to keep the await_suspend call outlined somehow until after coroutine splitting.

This works. But I feel this is more or less a policy issue. The reason we put some of the coroutine functionality in the middle end is that we want to benefit from the middle-end optimizations, and in fact we do: we get much better performance than GCC for coroutines. The other side of that decision is that we hit many more middle-end bugs than GCC. In fact, many of those bugs could have been fixed automatically by moving the CoroSplit pass in front of certain other optimization passes, but that is not what we did; we always tried to find the solution with minimal impact.

What does the coroutine IR actually look like here? We must be setting up the coroutine frame for the suspension before the call to await_suspend, or else it won't be properly ordered before whatever dispatch we do there; but the actual suspension point must come *after* the call to await_suspend, or the optimizer's understanding of the control flow will be completely messed up.

The actual IR generated here before inlining is:

await.suspend:                                    ; preds = %init.ready
  %10 = call token @llvm.coro.save(ptr null)
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %ref.tmp11) #3
  %call14 = call ptr @_ZNSt7__n486116coroutine_handleIN6MyTask12promise_typeEE12from_addressEPv(ptr noundef %4)
  %call18 = call ptr @_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE(ptr noundef nonnull align 8 dereferenceable(9) %ref.tmp7, ptr %call14)
  store ptr %call18, ptr %ref.tmp11, align 8
  %call20 = call noundef ptr @_ZNKSt7__n486116coroutine_handleIvE7addressEv(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp11) #3
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %ref.tmp11) #3
  %11 = call ptr @llvm.coro.subfn.addr(ptr %call20, i8 0)
  call fastcc void %11(ptr %call20) #3
  %12 = call i8 @llvm.coro.suspend(token %10, i1 false)
  switch i8 %12, label %coro.ret [
    i8 0, label %await.ready
    i8 1, label %cleanup21
  ]

By the semantics of https://llvm.org/docs/Coroutines.html#llvm-coro-save-intrinsic, the call to @llvm.coro.save implies that we've entered the suspended state. And in fact, @llvm.coro.save will be lowered into an update of the index in the coroutine frame. So as far as I understand, this looks basically fine.

And for the issue itself, the IR after inlining looks like:

await.suspend:                                    ; preds = %init.ready
  %10 = call token @llvm.coro.save(ptr null)
  call void @llvm.lifetime.start.p0(i64 8, ptr nonnull %ref.tmp11) #3
  %suspended.i = getelementptr inbounds %struct.Awaiter, ptr %ref.tmp7, i64 0, i32 1
  store i8 1, ptr %suspended.i, align 8, !tbaa !5
  %11 = load ptr, ptr %ref.tmp7, align 8, !tbaa !11
  %call.i = call ptr @_ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE(ptr noundef nonnull align 1 dereferenceable(1) %11, ptr %4) #3
  %tobool.i.not.i = icmp eq ptr %call.i, null
  br i1 %tobool.i.not.i, label %if.then.i, label %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit

if.then.i:                                        ; preds = %await.suspend
  store i8 0, ptr %suspended.i, align 8, !tbaa !5
  br label %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit

_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit: ; preds = %await.suspend, %if.then.i
  %retval.sroa.0.0.i = phi ptr [ %4, %if.then.i ], [ %call.i, %await.suspend ]
  store ptr %retval.sroa.0.0.i, ptr %ref.tmp11, align 8
  %12 = load ptr, ptr %ref.tmp11, align 8, !tbaa !12
  call void @llvm.lifetime.end.p0(i64 8, ptr nonnull %ref.tmp11) #3
  %13 = call ptr @llvm.coro.subfn.addr(ptr %12, i8 0)
  call fastcc void %13(ptr %12) #3
  %14 = call i8 @llvm.coro.suspend(token %10, i1 false)
  switch i8 %14, label %coro.ret [
    i8 0, label %await.ready
    i8 1, label %cleanup21
  ]

await.ready:                                      ; preds = %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit, %init.ready
  %suspended.i43 = getelementptr inbounds %struct.Awaiter, ptr %ref.tmp7, i64 0, i32 1
  %15 = load i8, ptr %suspended.i43, align 8, !tbaa !5, !range !14, !noundef !15
  %tobool.not.i = icmp eq i8 %15, 0
  br i1 %tobool.not.i, label %if.then.i44, label %_ZN7Awaiter12await_resumeEv.exit

Here %ref.tmp7 is the awaiter and %11 is the awaitable reference loaded from it (the this pointer passed to Register). The IR is correct so far: the potentially-on-the-frame variable %suspended.i (awaiter->suspended) is only accessed if the coroutine is not destroyed, and that semantics comes from the programmer. Then the optimizer finds that the value of %suspended.i (awaiter->suspended) is consistent with the nullness of the return value of _ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE and that the awaiter has not escaped, so it chooses to optimize the variable %suspended.i (awaiter->suspended) out:

await.suspend:                                    ; preds = %init.ready
  %9 = call token @llvm.coro.save(ptr null)
  %call.i = call ptr @_ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE(ptr noundef nonnull align 1 dereferenceable(1) %7, ptr %4) #3
  %tobool.i.not.i = icmp eq ptr %call.i, null
  br i1 %tobool.i.not.i, label %if.then.i, label %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit

if.then.i:                                        ; preds = %await.suspend
  br label %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit

_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit: ; preds = %await.suspend, %if.then.i
  %ref.tmp7.sroa.5.0 = phi i8 [ 0, %if.then.i ], [ 1, %await.suspend ]
  %retval.sroa.0.0.i = phi ptr [ %4, %if.then.i ], [ %call.i, %await.suspend ]
  %10 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.i, i8 0)
  call fastcc void %10(ptr %retval.sroa.0.0.i) #3
  %11 = call i8 @llvm.coro.suspend(token %9, i1 false)
  switch i8 %11, label %coro.ret [
    i8 0, label %await.ready
    i8 1, label %cleanup21
  ]

await.ready:                                      ; preds = %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit, %init.ready
  %ref.tmp7.sroa.5.1 = phi i8 [ %8, %init.ready ], [ %ref.tmp7.sroa.5.0, %_ZN7Awaiter13await_suspendENSt7__n486116coroutine_handleIvEE.exit ]
  %tobool.not.i = icmp eq i8 %ref.tmp7.sroa.5.1, 0
  br i1 %tobool.not.i, label %if.then.i44, label %_ZN7Awaiter12await_resumeEv.exit

Then after a trivial optimization (SimplifyCFG):

coro.init:
  %5 = call token @llvm.coro.save(ptr null)
  %call.i = call ptr @_ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE(ptr noundef nonnull align 1 dereferenceable(1) %ref.tmp8, ptr %4) #3
  %tobool.i.not.i = icmp eq ptr %call.i, null
  %ref.tmp7.sroa.5.0 = select i1 %tobool.i.not.i, i8 0, i8 1
  %retval.sroa.0.0.i = select i1 %tobool.i.not.i, ptr %4, ptr %call.i
  %6 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.i, i8 0)
  call fastcc void %6(ptr %retval.sroa.0.0.i) #3
  %7 = call i8 @llvm.coro.suspend(token %5, i1 false)
  switch i8 %7, label %coro.ret [
    i8 0, label %await.ready
    i8 1, label %cleanup21
  ]

await.ready:                                      ; preds = %coro.init
  %tobool.not.i = icmp eq i8 %ref.tmp7.sroa.5.0, 0
  br i1 %tobool.not.i, label %if.then.i44, label %_ZN7Awaiter12await_resumeEv.exit

if.then.i44:                                      ; preds = %await.ready
  call void @_Z12DidntSuspendv() #3
  br label %_ZN7Awaiter12await_resumeEv.exit

Now the value %ref.tmp7.sroa.5.0 (tested by %tobool.not.i) is live across the suspend point, so it belongs in the coroutine frame. That means we write to the coroutine frame unconditionally after the call to @_ZN13SomeAwaitable8RegisterENSt7__n486116coroutine_handleIvEE, and boom: the coroutine handle may already have been destroyed inside that foreign function. That is the complete story.


So my personal summary is that we already handle the await_suspend case with @llvm.coro.save. But we don't leak the awaiter before await_suspend, and the other optimizations don't know the relationship between the awaiter and the coroutine handle; that is where the problem comes from. So I feel we can solve the issue simply by marking the awaiter as escaped.

Yes, and I don't feel it is scary. The semantics here are clear, so if such a case is miscompiled, it implies there is a bug either in CoroFrame or in the inlining pass, and we'll know how to handle it.

Hmm. The problem is that both CoroFrame and the inliner treat those as best-effort, because they're just trying to provide optimizations, not preserve a critical semantic property. With that said, I don't have a better alternative than your suggestion of just fixing bugs as they come up.

I think we need to trust the other optimizations for the cases you mentioned. Otherwise, not only is the design of LLVM coroutines potentially broken, but LLVM itself would not be stable either, since the cases you describe are not limited to coroutines. So I feel we have to trust the other optimizations, and if we hit any other bugs, let's fix them.

I think you're misunderstanding my point. That optimization would be correct in a non-coroutine. You're relying on stronger properties here than LLVM IR guarantees.

This works. But I feel this is more or less a policy issue.

Not really, no. Ideally the performance of generated code would never be in conflict with correctness, but when it is, it becomes a clearly subordinate goal. So the obligation here is to demonstrate that we can still compile code correctly while making the more aggressive representational choices that are enabling those performance benefits. If we can't demonstrate that, we need to make less aggressive choices.

The transformation in CoroSplit is incorrect according to normal IR semantics because the execution of the call to await_suspend can become asynchronous to the execution of the coroutine, breaking normal assumptions about instruction ordering and local allocation.

And in fact, @llvm.coro.save will be lowered into an update of the index in the coroutine frame.

That isn't all that coro.save has to do — it also has to make sure that spills are written into the frame. And in fact there's a subtle thing with that I'm not sure we're getting right, which again comes back to the special behavior of await_suspend and the ways our representation choices defy ordinary IR semantics.

Consider an awaiter that schedules a coroutine to be resumed asynchronously. It is important that stores that occur before this scheduling in program order remain before it after optimization. Now, optimizations like Mem2Reg lose information about when loads and stores are performed, but that normally doesn't matter because Mem2Reg has proven that the memory isn't visible non-locally and so memory ordering is irrelevant. But control flow in a coroutine doesn't really follow the CFG when you've got a non-trivial await_suspend — the control flow of the coroutine really stops at the coro.save and then picks up again after the coro.suspend, and the execution of await_suspend is in some sense asynchronous after the coro.save. And this is directly relevant because, when Mem2Reg needs to introduce a phi, that phi will generally appear *after* the stores that went into it, which is to say, not necessarily prior to the scheduling that might happen within await_suspend.

Let's make that more concrete. Suppose that our coroutine looks like this:

void *var = nullptr;
co_await someFunction(&var);
free(var);

And suppose that the awaiter returned by someFunction contains the pointer &var, and its await_suspend does something like *var_p = malloc(16) before asynchronously scheduling the coroutine handle. In the unoptimized code, var is just part of the coro frame, and this store happens in its usual program order, and as long as that happens-before the async scheduling, it'll happen-before the resumption of the coroutine and so will be visible when we execute free(var).
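
A hedged sketch of that awaiter (names invented for illustration):

C++
#include <coroutine>
#include <cstdlib>

// Hypothetical scheduler hook: enqueues the coroutine for asynchronous
// resumption; the coroutine may resume on another thread as soon as the
// handle has been enqueued.
void schedule_async(std::coroutine_handle<> h);

struct PointerWritingAwaiter {
  void** var_p;   // &var from the coroutine body

  bool await_ready() { return false; }
  void await_suspend(std::coroutine_handle<> h) {
    *var_p = std::malloc(16);   // must happen-before the resumption that
    schedule_async(h);          // this call may trigger, so that free(var)
  }                             // sees the allocated pointer
  void await_resume() {}
};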

But consider what happens under optimization. After inlining and a little bit of optimization, we'll do the coro.save, then call malloc, store the result to var, do the coro.suspend, then load var. Mem2Reg should then directly forward that store to the load, creating a direct value dependency across the coro.suspend. CoroFrame knows how to handle this: it has to spill the value into the coro frame. But where should it spill? It cannot spill at the point where the original store was done, because that information is lost. It cannot spill at the coro.save, because the malloc call hasn't happened yet. It cannot wait until the coro.suspend to spill, because that would not necessarily be ordered before the async scheduling. Fortunately, CoroFrame will generally spill values at the point where the value was defined, which in this case happens to be approximately where the original store was, which is great.

However, consider what happens if the store was conditional within await_suspend. Mem2Reg will still do its work, but the value that's live across coro.suspend will now be a phi, and that phi will be introduced at the join point in the CFG. That point is no longer necessarily prior to the async scheduling, and so when CoroFrame spills it, it will insert a store that isn't necessarily ordered before the resumption, and we will have a miscompile.

I am very concerned we are going to be fighting problems like this indefinitely because of the way we are abusing IR.

ChuanqiXu planned changes to this revision. Aug 10 2023, 10:41 PM

Thanks for your valuable input; it makes a lot of sense. I'll try marking await_suspend as noinline as a (temporary) solution.

That isn't all that coro.save has to do — it also has to make sure that spills are written into the frame. And in fact there's a subtle thing with that I'm not sure we're getting right, which again comes back to the special behavior of await_suspend and the ways our representation choices defy ordinary IR semantics.

This is the key point that convinced me to agree to mark await_suspend as noinline for now. Currently we don't actually implement the "it also has to make sure that spills are written into the frame" semantics for coro.save. The coro.save intrinsic takes the coroutine handle as its first argument so that the coroutine handle counts as escaped, but the problem is that the other optimizations don't know that other local variables may alias that coroutine handle. So the problem you described is possible, and I think this is the real root cause of the issue. Thanks for your insight again.

ChuanqiXu abandoned this revision. Aug 16 2023, 11:58 PM