This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/Coroutines/
-
Transforms/
-
Coroutines/
4
CoroFrame.cpp
-
test/Transforms/Coroutines/
-
Transforms/
-
Coroutines/
-
coro-alloca-07.ll
-
coro-alloca-08.ll

Differential D96922

[Coroutine] Check indirect uses of alloca when checking lifetime info
ClosedPublic

Authored by lxfind on Feb 17 2021, 5:34 PM.

Download Raw Diff

Details

Reviewers

junparser
ChuanqiXu
rjmccall

Commits

rGc38000a9fb2c: [Coroutine] Check indirect uses of alloca when checking lifetime info

Summary

In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and an use of alloca crosses suspension point.
This approach is unfortunately incorrect. An use of alloca does not need to be a direct use, but can be an indirect use through alias.
Only checking direct uses can miss cases where indirect uses are crossing suspension point.
This can be demonstrated in the newly added test case 007.
In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend.
If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas to the stack, and then capture their addresses in the frame.

Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better.
We still checks the lifetime info in the same way as before, but with two differences:

The collection of liftime.start is moved into AllocaUseVisitor to make the logic more concentrated.
When looking at lifetime.start and use pairs, we not only checks the direct uses as before, but in this patch we check all uses collected by AllocaUseVisitor, which would include all indirect uses through alias. This will make the analysis more accurate without throwing away the lifetime optimization.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lxfind created this revision.Feb 17 2021, 5:34 PM

Herald added subscribers: hoy, modimo, wenlei, hiraditya. · View Herald TranscriptFeb 17 2021, 5:34 PM

lxfind requested review of this revision.Feb 17 2021, 5:34 PM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2021, 5:34 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

ChuanqiXu added inline comments.Feb 17 2021, 6:37 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1012	usees -> uses
1025	Should we add `else return true;`? Since `Users` already contains the alias of the `alloca`, the code above already checks whether lifetime start info and the indirect uses cross suspend points.

address comments

Harbormaster completed remote builds in B89643: Diff 324476.Feb 17 2021, 7:33 PM

Harbormaster completed remote builds in B89652: Diff 324495.Feb 17 2021, 8:37 PM

In general, we cannot ever assume that we can see all uses of the alloca, whether direct or indirect; we can do our best when we can reason about particular uses, but we have to have some ability to recognize that the address may have escaped and therefore basically any point within the lifetime of the alloca might be a use. It doesn't seem like this gets us to that point yet.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1010	precise

In D96922#2572787, @rjmccall wrote:

In general, we cannot ever assume that we can see all uses of the alloca, whether direct or indirect; we can do our best when we can reason about particular uses, but we have to have some ability to recognize that the address may have escaped and therefore basically any point within the lifetime of the alloca might be a use. It doesn't seem like this gets us to that point yet.

Maybe we need to introduce AA or memorySSA to this pass to do that. And from my point of view, the solution now (by AllocaUseVisitor) can be a workaround. We could do analysis in AllocaUseVisitor as conservatively as we could before we introduce memorySSA or AA.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

1030

Maybe we need to forward the judgement for PI.isEscaped() to the front. Below is an example I find:

%a = alloca ...
%b = alloca ...
llvm.lifetime.start(%a)
call @f(%a, %b) ; %a maybe alias with %b and the `users` doesn't contain `%b` 
suspend point
; usage of %b ; so the memory access may not be right

In D96922#2573433, @ChuanqiXu wrote:

In D96922#2572787, @rjmccall wrote:

In general, we cannot ever assume that we can see all uses of the alloca, whether direct or indirect; we can do our best when we can reason about particular uses, but we have to have some ability to recognize that the address may have escaped and therefore basically any point within the lifetime of the alloca might be a use. It doesn't seem like this gets us to that point yet.

Maybe we need to introduce AA or memorySSA to this pass to do that. And from my point of view, the solution now (by AllocaUseVisitor) can be a workaround. We could do analysis in AllocaUseVisitor as conservatively as we could before we introduce memorySSA or AA.

We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

In D96922#2573725, @rjmccall wrote:

When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

Yes, this is the problem that the AllocaUseVisitor is trying to solve. Maybe the implementation detail have bugs, but I think our direction is same.

We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

The fundamental problem (and what makes this so hard) is that we need to be very accurate to make the program correct. Being conservative is insufficient for correctness. This is due to the fact that we often access data after the frame is destroyed, and anything accessed after the frame is destroyed cannot be on the frame. So for those data we need to be able to accurately tell if they should be on the stack or if the developer has a bug.

Below is some IR that troubled me for a while (removed some lines that are irrelevant to simplify it). The alloca in question is % coro_gro. Its lifetime started early, used after coro.end.
Since it's used after coro.end, at which point the coroutine frame is already destroyed, coro_gro cannot be put on the frame and has to be put on the stack.
However technically it lives across suspension points and should be put on the frame.
The current implementation happens to work and put it on the stack because suspension checker stops propagating when it sees coro.end. So any use of data after coro.end is considered to not cross suspension point to anything before coro.suspend.
Also, I am not sure if we can prove that _ZN5folly15expected_detail7PromiseIiN12_GLOBALN_13ErrEE17get_return_objectEv never captures %coro_gro. Maybe a function cannot capture an argument that's sret? Not sure.

; Function Attrs: sanitize_address uwtable
define hidden i64 @_Z3foov() #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
entry:
  %__coro_gro = alloca %"struct.folly::expected_detail::PromiseReturn", align 8
  %__promise = alloca %"struct.folly::expected_detail::Promise", align 8
  %0 = bitcast %"struct.folly::expected_detail::Promise"* %__promise to i8*
  %1 = call token @llvm.coro.id(i32 16, i8* nonnull %0, i8* bitcast (i64 ()* @_Z3foov to i8*), i8* null)
  %2 = call i1 @llvm.coro.alloc(token %1)
  br i1 %2, label %coro.alloc, label %invoke.cont21

coro.alloc:                                       ; preds = %entry
  %3 = tail call i64 @llvm.coro.size.i64()
  %call = call i8* @_Znwm(i64 %3)
  br label %invoke.cont21

invoke.cont21:                                    ; preds = %coro.alloc, %entry
  %4 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
  %5 = call i8* @llvm.coro.begin(token %1, i8* %4) #13
  %6 = bitcast %"struct.folly::expected_detail::PromiseReturn"* %__coro_gro to i8*
  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %6) #6
  call fastcc void @_ZN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEE17get_return_objectEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull sret %__coro_gro, %"struct.folly::expected_detail::Promise"* nonnull %__promise) #6
  %call26 = call fastcc zeroext i1 @_ZNK5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE11await_readyEv(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast) #6
  br i1 %call26, label %cont37, label %await.suspend

await.suspend:                                    ; preds = %invoke.cont21
  %10 = call token @llvm.coro.save(i8* null)
  %11 = bitcast %"class.std::experimental::coroutines_v1::coroutine_handle.21"* %retval.i109 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %11)
  call fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEEEEC2Ev(%"class.std::experimental::coroutines_v1::coroutine_handle.21"* nonnull %retval.i109) #6
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %11)
  invoke fastcc void @_ZN5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE13await_suspendIiEEvNSt12experimental13coroutines_v116coroutine_handleINS0_7PromiseIT_S3_EEEE(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast, i8* %5)
          to label %invoke.cont35 unwind label %lpad

invoke.cont35:                                    ; preds = %await.suspend
  %12 = call i8 @llvm.coro.suspend(token %10, i1 false)
  switch i8 %12, label %coro.ret [
    i8 0, label %cont37
    i8 1, label %cleanup49
  ]

...

cleanup83:                                        ; preds = %invoke.cont54, %cleanup49, %cleanup49.thread
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #6
  %24 = call i8* @llvm.coro.free(token %1, i8* %5)
  %25 = icmp eq i8* %24, null
  br i1 %25, label %coro.ret, label %coro.free

coro.free:                                        ; preds = %cleanup83
  %26 = tail call i64 @llvm.coro.size.i64()
  call void @_ZdlPvm(i8* nonnull %24, i64 %26) #6
  br label %coro.ret

coro.ret:                                         ; preds = %coro.free, %cleanup83, %invoke.cont35
  %27 = call i1 @llvm.coro.end(i8* null, i1 false) #13
  %call95 = invoke fastcc i64 @_ZNR5folly15expected_detail13PromiseReturnIiN12_GLOBAL__N_13ErrEEcvNS_8ExpectedIiS3_EEEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull %__coro_gro)
          to label %cleanup.action unwind label %lpad93
...
}

In D96922#2575118, @lxfind wrote:
We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

The fundamental problem (and what makes this so hard) is that we need to be very accurate to make the program correct. Being conservative is insufficient for correctness. This is due to the fact that we often access data after the frame is destroyed, and anything accessed after the frame is destroyed cannot be on the frame. So for those data we need to be able to accurately tell if they should be on the stack or if the developer has a bug.

Below is some IR that troubled me for a while (removed some lines that are irrelevant to simplify it). The alloca in question is % coro_gro. Its lifetime started early, used after coro.end.
Since it's used after coro.end, at which point the coroutine frame is already destroyed, coro_gro cannot be put on the frame and has to be put on the stack.
However technically it lives across suspension points and should be put on the frame.
The current implementation happens to work and put it on the stack because suspension checker stops propagating when it sees coro.end. So any use of data after coro.end is considered to not cross suspension point to anything before coro.suspend.
Also, I am not sure if we can prove that _ZN5folly15expected_detail7PromiseIiN12_GLOBALN_13ErrEE17get_return_objectEv never captures %coro_gro. Maybe a function cannot capture an argument that's sret? Not sure.
; Function Attrs: sanitize_address uwtable
define hidden i64 @_Z3foov() #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
entry:
  %__coro_gro = alloca %"struct.folly::expected_detail::PromiseReturn", align 8
  %__promise = alloca %"struct.folly::expected_detail::Promise", align 8
  %0 = bitcast %"struct.folly::expected_detail::Promise"* %__promise to i8*
  %1 = call token @llvm.coro.id(i32 16, i8* nonnull %0, i8* bitcast (i64 ()* @_Z3foov to i8*), i8* null)
  %2 = call i1 @llvm.coro.alloc(token %1)
  br i1 %2, label %coro.alloc, label %invoke.cont21

coro.alloc:                                       ; preds = %entry
  %3 = tail call i64 @llvm.coro.size.i64()
  %call = call i8* @_Znwm(i64 %3)
  br label %invoke.cont21

invoke.cont21:                                    ; preds = %coro.alloc, %entry
  %4 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
  %5 = call i8* @llvm.coro.begin(token %1, i8* %4) #13
  %6 = bitcast %"struct.folly::expected_detail::PromiseReturn"* %__coro_gro to i8*
  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %6) #6
  call fastcc void @_ZN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEE17get_return_objectEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull sret %__coro_gro, %"struct.folly::expected_detail::Promise"* nonnull %__promise) #6
  %call26 = call fastcc zeroext i1 @_ZNK5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE11await_readyEv(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast) #6
  br i1 %call26, label %cont37, label %await.suspend

await.suspend:                                    ; preds = %invoke.cont21
  %10 = call token @llvm.coro.save(i8* null)
  %11 = bitcast %"class.std::experimental::coroutines_v1::coroutine_handle.21"* %retval.i109 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %11)
  call fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEEEEC2Ev(%"class.std::experimental::coroutines_v1::coroutine_handle.21"* nonnull %retval.i109) #6
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %11)
  invoke fastcc void @_ZN5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE13await_suspendIiEEvNSt12experimental13coroutines_v116coroutine_handleINS0_7PromiseIT_S3_EEEE(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast, i8* %5)
          to label %invoke.cont35 unwind label %lpad

invoke.cont35:                                    ; preds = %await.suspend
  %12 = call i8 @llvm.coro.suspend(token %10, i1 false)
  switch i8 %12, label %coro.ret [
    i8 0, label %cont37
    i8 1, label %cleanup49
  ]

...

cleanup83:                                        ; preds = %invoke.cont54, %cleanup49, %cleanup49.thread
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #6
  %24 = call i8* @llvm.coro.free(token %1, i8* %5)
  %25 = icmp eq i8* %24, null
  br i1 %25, label %coro.ret, label %coro.free

coro.free:                                        ; preds = %cleanup83
  %26 = tail call i64 @llvm.coro.size.i64()
  call void @_ZdlPvm(i8* nonnull %24, i64 %26) #6
  br label %coro.ret

coro.ret:                                         ; preds = %coro.free, %cleanup83, %invoke.cont35
  %27 = call i1 @llvm.coro.end(i8* null, i1 false) #13
  %call95 = invoke fastcc i64 @_ZNR5folly15expected_detail13PromiseReturnIiN12_GLOBAL__N_13ErrEEcvNS_8ExpectedIiS3_EEEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull %__coro_gro)
          to label %cleanup.action unwind label %lpad93
...
}

It's really rare to me that program would try to access variable after coro.end. I can't image the corresponding C++ code right now. Does this program call @_ZNR5folly15expected_detail13PromiseReturnIiN12_GLOBAL__N_13ErrEEcvNS_8ExpectedIiS3_EEEv in destructor of promise_type? If it is the case, maybe the problem is related to the destroy order of coroutine frame and promise_type. Not sure.

In D96922#2575118, @lxfind wrote:

The current implementation happens to work and put it on the stack because suspension checker stops propagating when it sees coro.end. So any use of data after coro.end is considered to not cross suspension point to anything before coro.suspend.

If it's the solution, we should add an verify phase after we built coroutine frame to verify if there is any use of frame variable after coro.end.

In D96922#2575118, @lxfind wrote:

Also, I am not sure if we can prove that _ZN5folly15expected_detail7PromiseIiN12_GLOBALN_13ErrEE17get_return_objectEv never captures %coro_gro. Maybe a function cannot capture an argument that's sret?

I think we could only prove that only if we can inline get_return_object into foo. I am also not sure whether a function can capture a sret argument. But I think case by case fixes may not be good direction.

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

In D96922#2581141, @rjmccall wrote:

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

IIRC, coroutine developers already complain about the design of coroutine many times like D96928. But just like @lxfind says, it is impossible that we can redesign the mechanism in a short time. I agree with that we should discuss how to refactor the struct of coroutine to solve these problems fundamentally. Before we get a clear solution, I think it is ok to fix bugs like this patch and add verify phase to tell the user what's going wrong.

In D96922#2581141, @rjmccall wrote:

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

A special alloca for coroutines will certainly help but it's not sufficient. A special alloca and a set of special rules around data allocation in coroutines can help resolve the escape-before-coro.begin issue (though I haven't seen any new bugs around that).
There are other problems that need to be solved. I need some more time to think about how to put them all together.
I think we are doing OK identifying variables that need to be on the heap (a conservative approach will do just that).
But it gets complex when we also need to be able to identify a set of variables that MUST be put on the stack. These are the variables that are being used after a coroutine is suspended (i.e. coro.suspend switch goes to the default destination), at which point the coroutine frame may have been destroyed.
In the example I had above, the coroutine return object is allocated and returned in the very end. Since it's the return object, it has to be on the stack. But the compiler isn't powerful enough to tell that for two reasons:

It appears that it lives through coro.suspend, but it really only is used at the suspend path, not any other paths.
It's constructed by calling a (sret) struct pointer, which suggests that it may capture, further complicating the problem.

I think a proper solution might require introducing a new IR instruction for coroutine suspension, instead of relying on intrinsics. I have similar conclusions in another bug that involves LICM and coroutine.

To make this patch fully correct, we need to first check PI.isEscaped() before relying on lifetime information.
Then for the return object case to work, we have two short-term solutions:

We can treat sret as nocapture. This isn't 100% accurate. But the only case where C++ can capture a sret pointer is the constructor captures "this". But even that happens, in order for the "this" to really escape the object needs to be used again. So I think we should be safe there.
We can special handle the return object variable, and make sure it's always on the stack. This should also work, I think.

In D96922#2581141, @rjmccall wrote:

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

Actually, this gives me an idea, perhaps we can emit metadata to the allocas to teach CoroSplit whether one must be on the stack.

Added detailed documentation on next steps. While I need to continue iterating on them, this patch incrementally improves the exisiting code by considering indirect use, so it should be a pure win

Harbormaster completed remote builds in B90697: Diff 326221.Feb 24 2021, 5:58 PM

In D96922#2583920, @lxfind wrote:

In D96922#2581141, @rjmccall wrote:

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

A special alloca for coroutines will certainly help but it's not sufficient. A special alloca and a set of special rules around data allocation in coroutines can help resolve the escape-before-coro.begin issue (though I haven't seen any new bugs around that).
There are other problems that need to be solved. I need some more time to think about how to put them all together.
I think we are doing OK identifying variables that need to be on the heap (a conservative approach will do just that).
But it gets complex when we also need to be able to identify a set of variables that MUST be put on the stack. These are the variables that are being used after a coroutine is suspended (i.e. coro.suspend switch goes to the default destination), at which point the coroutine frame may have been destroyed.
In the example I had above, the coroutine return object is allocated and returned in the very end. Since it's the return object, it has to be on the stack. But the compiler isn't powerful enough to tell that for two reasons:

It appears that it lives through coro.suspend, but it really only is used at the suspend path, not any other paths.

It's constructed by calling a (sret) struct pointer, which suggests that it may capture, further complicating the problem.

I think a proper solution might require introducing a new IR instruction for coroutine suspension, instead of relying on intrinsics. I have similar conclusions in another bug that involves LICM and coroutine.

To make this patch fully correct, we need to first check PI.isEscaped() before relying on lifetime information.
Then for the return object case to work, we have two short-term solutions:

We can treat sret as nocapture. This isn't 100% accurate. But the only case where C++ can capture a sret pointer is the constructor captures "this". But even that happens, in order for the "this" to really escape the object needs to be used again. So I think we should be safe there.

We can special handle the return object variable, and make sure it's always on the stack. This should also work, I think.

Introducing new IR instruction and other special rules may be a long term target. And for the issue introduced by get_return_object, it looks better to emit special metadata for the alloca of get_return_object in the clang frontend.

this patch incrementally improves the exisiting code by considering indirect use, so it should be a pure win

Yes, this Patch LGTM.

This revision is now accepted and ready to land.Feb 24 2021, 6:01 PM

Closed by commit rGc38000a9fb2c: [Coroutine] Check indirect uses of alloca when checking lifetime info (authored by lxfind). · Explain WhyFeb 24 2021, 6:29 PM

This revision was automatically updated to reflect the committed changes.

lxfind added a commit: rGc38000a9fb2c: [Coroutine] Check indirect uses of alloca when checking lifetime info.

In D96922#2583920, @lxfind wrote:

In D96922#2581141, @rjmccall wrote:

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

A special alloca for coroutines will certainly help but it's not sufficient. A special alloca and a set of special rules around data allocation in coroutines can help resolve the escape-before-coro.begin issue (though I haven't seen any new bugs around that).

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

There are other problems that need to be solved. I need some more time to think about how to put them all together.
I think we are doing OK identifying variables that need to be on the heap (a conservative approach will do just that).
But it gets complex when we also need to be able to identify a set of variables that MUST be put on the stack. These are the variables that are being used after a coroutine is suspended (i.e. coro.suspend switch goes to the default destination), at which point the coroutine frame may have been destroyed.

Yeah, I agree with this basic statement of the problem. If we can figure out a way to reliably recognize the allocations that have to be on the stack but are really just meant to be local to various special computations which aren't fully part of the coroutine, that would make this much easier.

I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed. Where is it coming from? If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting? I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.

In the example I had above, the coroutine return object is allocated and returned in the very end. Since it's the return object, it has to be on the stack. But the compiler isn't powerful enough to tell that for two reasons:

It appears that it lives through coro.suspend, but it really only is used at the suspend path, not any other paths.

It's constructed by calling a (sret) struct pointer, which suggests that it may capture, further complicating the problem.

I don't quite understand what you're saying here. If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it. Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?

I think a proper solution might require introducing a new IR instruction for coroutine suspension, instead of relying on intrinsics. I have similar conclusions in another bug that involves LICM and coroutine.

Hmm. I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions. One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.

To make this patch fully correct, we need to first check PI.isEscaped() before relying on lifetime information.

Yeah, this is the main thing I was trying to point out.

Then for the return object case to work, we have two short-term solutions:

We can treat sret as nocapture. This isn't 100% accurate. But the only case where C++ can capture a sret pointer is the constructor captures "this". But even that happens, in order for the "this" to really escape the object needs to be used again. So I think we should be safe there.

Once it's constructed and has possibly escaped, basically any side-effectful thing might indirectly access it. Usually functions don't have a problem with captures of this for the return value because constructing the return value is the last thing that happens in the function, except for a few destructors. That might be legitimately different in C++-style coroutines.

We can special handle the return object variable, and make sure it's always on the stack. This should also work, I think.

Yes, if Clang is emitting this return object variable as part of emitting the return from the ramp function, it should be easy to force it to use a different pattern, or at least mark it with some metadata.

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.

I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed. Where is it coming from? If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting? I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.

It's not due to inlining. There are two cases here, one is the return object case (which I will talk a bit more below). The other is when a call to suspend_away returns a handle, and we need to do symmetric transfer by resuming that returned handle. In a case like this, the current coroutine (the one that is being suspended) may have been destroyed already. So all the instructions in-between the call to suspend_away and the resume of the handle should not touch the coroutine frame. This used to be quite complicated to deal with, but I patched it up in the front end a few months ago so that the lifetime of any temporaries used in that range is minimum as possible, so that a decent lifetime analysis/usage analysis should be able to decide that they need to be on the stack. However I still worry that if there exists any function call there that might appears to escape (especially when objects are constructed through sret), it will be broken.

I don't quite understand what you're saying here. If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it. Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?

The return object is obtained by calling the get_return_object() function on the promise. And since the return object is a struct, it first do alloca to allocate a return object, then call the get_return_object() function with sret parameter. In that case, the return object isn't captured but the compiler doesn't know that, which makes it tricky.

Hmm. I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions. One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.

It's possible that we may diverge more and more between these different lowerings..

In D96922#2586732, @lxfind wrote:

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.

That seems like a very good approach. And if you really want to avoid the call overhead, you can always force-inline the coroutine ramp after lowering.

I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed. Where is it coming from? If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting? I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.

It's not due to inlining. There are two cases here, one is the return object case (which I will talk a bit more below). The other is when a call to suspend_away returns a handle, and we need to do symmetric transfer by resuming that returned handle. In a case like this, the current coroutine (the one that is being suspended) may have been destroyed already. So all the instructions in-between the call to suspend_away and the resume of the handle should not touch the coroutine frame. This used to be quite complicated to deal with, but I patched it up in the front end a few months ago so that the lifetime of any temporaries used in that range is minimum as possible, so that a decent lifetime analysis/usage analysis should be able to decide that they need to be on the stack. However I still worry that if there exists any function call there that might appears to escape (especially when objects are constructed through sret), it will be broken.

Ah, I see that I've failed to know about yet another feature of C++ coroutines.

I don't quite understand what you're saying here. If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it. Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?

The return object is obtained by calling the get_return_object() function on the promise. And since the return object is a struct, it first do alloca to allocate a return object, then call the get_return_object() function with sret parameter. In that case, the return object isn't captured but the compiler doesn't know that, which makes it tricky.

Should the return object variable actually be NRVO'ed? It certainly sounds like it could be.

Hmm. I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions. One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.

It's possible that we may diverge more and more between these different lowerings..

It's possible, but if we're going to start adding "harder" IR support, we should make an effort to see if we can find common ground first.

In D96922#2586732, @lxfind wrote:

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.

This idea looks to emit two function (the new ramp and the original one with whose argument is FramePtr) in the frontend. If it is the way, it could achieve the goal I try to discuss in llvm-dev before: Normalize Ramp Function. So I think it would be a good idea too.

In D96922#2586758, @rjmccall wrote:

Should the return object variable actually be NRVO'ed? It certainly sounds like it could be.

IIRC, it the return object would be RVO if its size exceed some threshold.

dongAxis1944 added a subscriber: dongAxis1944.Feb 26 2021, 12:35 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

Coroutines/

CoroFrame.cpp

73 lines

test/

Transforms/

Coroutines/

coro-alloca-07.ll

104 lines

coro-alloca-08.ll

83 lines

Diff 326262

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

Show First 20 Lines • Show All 851 Lines • ▼ Show 20 Lines
namespace {		namespace {
struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {		struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
using Base = PtrUseVisitor<AllocaUseVisitor>;		using Base = PtrUseVisitor<AllocaUseVisitor>;
AllocaUseVisitor(const DataLayout &DL, const DominatorTree &DT,		AllocaUseVisitor(const DataLayout &DL, const DominatorTree &DT,
const CoroBeginInst &CB, const SuspendCrossingInfo &Checker)		const CoroBeginInst &CB, const SuspendCrossingInfo &Checker)
: PtrUseVisitor(DL), DT(DT), CoroBegin(CB), Checker(Checker) {}		: PtrUseVisitor(DL), DT(DT), CoroBegin(CB), Checker(Checker) {}

void visit(Instruction &I) {		void visit(Instruction &I) {
UserBBs.insert(I.getParent());		Users.insert(&I);
Base::visit(I);		Base::visit(I);
// If the pointer is escaped prior to CoroBegin, we have to assume it would		// If the pointer is escaped prior to CoroBegin, we have to assume it would
// be written into before CoroBegin as well.		// be written into before CoroBegin as well.
if (PI.isEscaped() && !DT.dominates(&CoroBegin, PI.getEscapingInst())) {		if (PI.isEscaped() && !DT.dominates(&CoroBegin, PI.getEscapingInst())) {
MayWriteBeforeCoroBegin = true;		MayWriteBeforeCoroBegin = true;
}		}
}		}
// We need to provide this overload as PtrUseVisitor uses a pointer based		// We need to provide this overload as PtrUseVisitor uses a pointer based
▲ Show 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
}		}

void visitGetElementPtrInst(GetElementPtrInst &GEPI) {		void visitGetElementPtrInst(GetElementPtrInst &GEPI) {
// The base visitor will adjust Offset accordingly.		// The base visitor will adjust Offset accordingly.
Base::visitGetElementPtrInst(GEPI);		Base::visitGetElementPtrInst(GEPI);
handleAlias(GEPI);		handleAlias(GEPI);
}		}

		void visitIntrinsicInst(IntrinsicInst &II) {
		if (II.getIntrinsicID() != Intrinsic::lifetime_start)
		return Base::visitIntrinsicInst(II);
		LifetimeStarts.insert(&II);
		}

void visitCallBase(CallBase &CB) {		void visitCallBase(CallBase &CB) {
for (unsigned Op = 0, OpCount = CB.getNumArgOperands(); Op < OpCount; ++Op)		for (unsigned Op = 0, OpCount = CB.getNumArgOperands(); Op < OpCount; ++Op)
if (U->get() == CB.getArgOperand(Op) && !CB.doesNotCapture(Op))		if (U->get() == CB.getArgOperand(Op) && !CB.doesNotCapture(Op))
PI.setEscaped(&CB);		PI.setEscaped(&CB);
handleMayWrite(CB);		handleMayWrite(CB);
}		}

bool getShouldLiveOnFrame() const {		bool getShouldLiveOnFrame() const {
Show All 17 Lines
private:		private:
const DominatorTree &DT;		const DominatorTree &DT;
const CoroBeginInst &CoroBegin;		const CoroBeginInst &CoroBegin;
const SuspendCrossingInfo &Checker;		const SuspendCrossingInfo &Checker;
// All alias to the original AllocaInst, created before CoroBegin and used		// All alias to the original AllocaInst, created before CoroBegin and used
// after CoroBegin. Each entry contains the instruction and the offset in the		// after CoroBegin. Each entry contains the instruction and the offset in the
// original Alloca. They need to be recreated after CoroBegin off the frame.		// original Alloca. They need to be recreated after CoroBegin off the frame.
DenseMap<Instruction *, llvm::Optional<APInt>> AliasOffetMap{};		DenseMap<Instruction *, llvm::Optional<APInt>> AliasOffetMap{};
SmallPtrSet<BasicBlock *, 2> UserBBs{};		SmallPtrSet<Instruction *, 4> Users{};
		SmallPtrSet<IntrinsicInst *, 2> LifetimeStarts{};
bool MayWriteBeforeCoroBegin{false};		bool MayWriteBeforeCoroBegin{false};

mutable llvm::Optional<bool> ShouldLiveOnFrame{};		mutable llvm::Optional<bool> ShouldLiveOnFrame{};

bool computeShouldLiveOnFrame() const {		bool computeShouldLiveOnFrame() const {
		// If lifetime information is available, we check it first since it's
		// more precise. We look at every pair of lifetime.start intrinsic and
		rjmccallUnsubmitted Not Done Reply Inline Actions precise rjmccall: precise
		// every basic block that uses the pointer to see if they cross suspension
		// points. The uses cover both direct uses as well as indirect uses.
		ChuanqiXuUnsubmitted Not Done Reply Inline Actions usees -> uses ChuanqiXu: usees -> uses
		if (!LifetimeStarts.empty()) {
		for (auto *I : Users)
		for (auto *S : LifetimeStarts)
		if (Checker.isDefinitionAcrossSuspend(*S, I))
		return true;
		return false;
		}
		// FIXME: Ideally the isEscaped check should come at the beginning.
		// However there are a few loose ends that need to be fixed first before
		// we can do that. We need to make sure we are not over-conservative, so
		// that the data accessed in-between await_suspend and symmetric transfer
		// is always put on the stack, and also data accessed after coro.end is
		// always put on the stack (esp the return object). To fix that, we need
		ChuanqiXuUnsubmitted Not Done Reply Inline Actions Should we add `else return true;`? Since `Users` already contains the alias of the `alloca`, the code above already checks whether lifetime start info and the indirect uses cross suspend points. ChuanqiXu: Should we add `else return true;`? Since `Users` already contains the alias of the `alloca`…
		// to:
		// 1) Potentially treat sret as nocapture in calls
		// 2) Special handle the return object and put it on the stack
		// 3) Utilize lifetime.end intrinsic
if (PI.isEscaped())		if (PI.isEscaped())
		ChuanqiXuUnsubmitted Not Done Reply Inline Actions Maybe we need to forward the judgement for `PI.isEscaped()` to the front. Below is an example I find: %a = alloca ... %b = alloca ... llvm.lifetime.start(%a) call @f(%a, %b) ; %a maybe alias with %b and the `users` doesn't contain `%b` suspend point ; usage of %b ; so the memory access may not be right ChuanqiXu: Maybe we need to forward the judgement for `PI.isEscaped()` to the front. Below is an example I…
return true;		return true;

for (auto *BB1 : UserBBs)		for (auto *U1 : Users)
for (auto *BB2 : UserBBs)		for (auto *U2 : Users)
if (Checker.hasPathCrossingSuspendPoint(BB1, BB2))		if (Checker.isDefinitionAcrossSuspend(*U1, U2))
return true;		return true;

return false;		return false;
}		}

void handleMayWrite(const Instruction &I) {		void handleMayWrite(const Instruction &I) {
if (!DT.dominates(&CoroBegin, &I))		if (!DT.dominates(&CoroBegin, &I))
MayWriteBeforeCoroBegin = true;		MayWriteBeforeCoroBegin = true;
▲ Show 20 Lines • Show All 1,073 Lines • ▼ Show 20 Lines	for (BasicBlock *DomBB : DomSet) {
}		}
}		}
}		}
}		}

static void collectFrameAllocas(Function &F, coro::Shape &Shape,		static void collectFrameAllocas(Function &F, coro::Shape &Shape,
const SuspendCrossingInfo &Checker,		const SuspendCrossingInfo &Checker,
SmallVectorImpl<AllocaInfo> &Allocas) {		SmallVectorImpl<AllocaInfo> &Allocas) {
// Collect lifetime.start info for each alloca.
using LifetimeStart = SmallPtrSet<Instruction *, 2>;
llvm::DenseMap<AllocaInst *, std::unique_ptr<LifetimeStart>> LifetimeMap;
for (Instruction &I : instructions(F)) {
auto *II = dyn_cast<IntrinsicInst>(&I);
if (!II \|\| II->getIntrinsicID() != Intrinsic::lifetime_start)
continue;

if (auto *OpInst = dyn_cast<Instruction>(II->getOperand(1))) {
if (auto *AI = dyn_cast<AllocaInst>(OpInst->stripPointerCasts())) {

if (LifetimeMap.find(AI) == LifetimeMap.end())
LifetimeMap[AI] = std::make_unique<LifetimeStart>();
LifetimeMap[AI]->insert(isa<AllocaInst>(OpInst) ? II : OpInst);
}
}
}

for (Instruction &I : instructions(F)) {		for (Instruction &I : instructions(F)) {
auto *AI = dyn_cast<AllocaInst>(&I);		auto *AI = dyn_cast<AllocaInst>(&I);
if (!AI)		if (!AI)
continue;		continue;
// The PromiseAlloca will be specially handled since it needs to be in a		// The PromiseAlloca will be specially handled since it needs to be in a
// fixed position in the frame.		// fixed position in the frame.
if (AI == Shape.SwitchLowering.PromiseAlloca) {		if (AI == Shape.SwitchLowering.PromiseAlloca) {
continue;		continue;
}		}
bool ShouldLiveOnFrame = false;
auto Iter = LifetimeMap.find(AI);
if (Iter != LifetimeMap.end()) {
// Check against lifetime.start if the instruction has the info.
for (User *U : I.users()) {
for (auto S : Iter->second)
if ((ShouldLiveOnFrame = Checker.isDefinitionAcrossSuspend(*S, U)))
break;
if (ShouldLiveOnFrame)
break;
}
if (!ShouldLiveOnFrame)
continue;
}
// At this point, either ShouldLiveOnFrame is true or we didn't have
// lifetime information. We will need to rely on more precise pointer
// tracking.
DominatorTree DT(F);		DominatorTree DT(F);
AllocaUseVisitor Visitor{F.getParent()->getDataLayout(), DT,		AllocaUseVisitor Visitor{F.getParent()->getDataLayout(), DT,
*Shape.CoroBegin, Checker};		*Shape.CoroBegin, Checker};
Visitor.visitPtr(*AI);		Visitor.visitPtr(*AI);
if (!Visitor.getShouldLiveOnFrame())		if (!Visitor.getShouldLiveOnFrame())
continue;		continue;
Allocas.emplace_back(AI, Visitor.getAliasesCopy(),		Allocas.emplace_back(AI, Visitor.getAliasesCopy(),
Visitor.getMayWriteBeforeCoroBegin());		Visitor.getMayWriteBeforeCoroBegin());
▲ Show 20 Lines • Show All 217 Lines • Show Last 20 Lines

llvm/test/Transforms/Coroutines/coro-alloca-07.ll

This file was added.

				; Tests that CoroSplit can succesfully determine allocas should live on the frame
				; if their aliases are used across suspension points through PHINode.
				; RUN: opt < %s -coro-split -S \| FileCheck %s
				; RUN: opt < %s -passes=coro-split -S \| FileCheck %s

				define i8* @f(i1 %n) "coroutine.presplit"="1" {
				entry:
				%x = alloca i64
				%y = alloca i64
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				br i1 %n, label %flag_true, label %flag_false

				flag_true:
				%x.alias = bitcast i64* %x to i8*
				call void @llvm.lifetime.start.p0i8(i64 8, i8* %x.alias)
				br label %merge

				flag_false:
				%y.alias = bitcast i64* %y to i8*
				call void @llvm.lifetime.start.p0i8(i64 8, i8* %y.alias)
				br label %merge

				merge:
				%alias_phi = phi i8* [ %x.alias, %flag_true ], [ %y.alias, %flag_false ]
				store i8 1, i8* %alias_phi
				%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sp1, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				call void @print(i8* %alias_phi)
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8, i8, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture)

				declare void @print(i8*)
				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

				; Verify that both x and y are put in the frame.
				; CHECK: %f.Frame = type { void (%f.Frame), void (%f.Frame), i64, i64, i8*, i1 }

				; CHECK-LABEL: @f(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ID:%.]] = call token @llvm.coro.id(i32 0, i8 null, i8* null, i8* bitcast ([3 x void (%f.Frame)]* @f.resumers to i8*))
				; CHECK-NEXT: [[ALLOC:%.]] = call i8 @malloc(i32 48)
				; CHECK-NEXT: [[HDL:%.]] = call noalias nonnull i8 @llvm.coro.begin(token [[ID]], i8* [[ALLOC]])
				; CHECK-NEXT: [[FRAMEPTR:%.]] = bitcast i8 [[HDL]] to %f.Frame*
				; CHECK-NEXT: [[RESUME_ADDR:%.]] = getelementptr inbounds [[F_FRAME:%.]], %f.Frame* [[FRAMEPTR]], i32 0, i32 0
				; CHECK-NEXT: store void (%f.Frame) @f.resume, void (%f.Frame)* [[RESUME_ADDR]], align 8
				; CHECK-NEXT: [[DESTROY_ADDR:%.]] = getelementptr inbounds [[F_FRAME]], %f.Frame [[FRAMEPTR]], i32 0, i32 1
				; CHECK-NEXT: store void (%f.Frame) @f.destroy, void (%f.Frame)* [[DESTROY_ADDR]], align 8
				; CHECK-NEXT: [[X_RELOAD_ADDR:%.]] = getelementptr inbounds [[F_FRAME]], %f.Frame [[FRAMEPTR]], i32 0, i32 2
				; CHECK-NEXT: [[Y_RELOAD_ADDR:%.]] = getelementptr inbounds [[F_FRAME]], %f.Frame [[FRAMEPTR]], i32 0, i32 3
				; CHECK-NEXT: br i1 [[N:%.]], label [[FLAG_TRUE:%.]], label [[FLAG_FALSE:%.*]]
				; CHECK: flag_true:
				; CHECK-NEXT: [[X_ALIAS:%.]] = bitcast i64 [[X_RELOAD_ADDR]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 8, i8* [[X_ALIAS]])
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: flag_false:
				; CHECK-NEXT: [[Y_ALIAS:%.]] = bitcast i64 [[Y_RELOAD_ADDR]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 8, i8* [[Y_ALIAS]])
				; CHECK-NEXT: br label [[MERGE]]
				; CHECK: merge:
				; CHECK-NEXT: [[ALIAS_PHI:%.]] = phi i8 [ [[X_ALIAS]], [[FLAG_TRUE]] ], [ [[Y_ALIAS]], [[FLAG_FALSE]] ]
				; CHECK-NEXT: [[ALIAS_PHI_SPILL_ADDR:%.]] = getelementptr inbounds [[F_FRAME]], %f.Frame [[FRAMEPTR]], i32 0, i32 4
				; CHECK-NEXT: store i8* [[ALIAS_PHI]], i8** [[ALIAS_PHI_SPILL_ADDR]], align 8
				; CHECK-NEXT: store i8 1, i8* [[ALIAS_PHI]], align 1
				; CHECK-NEXT: [[INDEX_ADDR1:%.]] = getelementptr inbounds [[F_FRAME]], %f.Frame [[FRAMEPTR]], i32 0, i32 5
				; CHECK-NEXT: store i1 false, i1* [[INDEX_ADDR1]], align 1
				; CHECK-NEXT: ret i8* [[HDL]]
				;
				;
				; CHECK-LABEL: @f.resume(
				; CHECK-NEXT: entry.resume:
				; CHECK-NEXT: [[VFRAME:%.]] = bitcast %f.Frame [[FRAMEPTR:%.]] to i8
				; CHECK-NEXT: [[ALIAS_PHI_RELOAD_ADDR:%.]] = getelementptr inbounds [[F_FRAME:%.]], %f.Frame* [[FRAMEPTR]], i32 0, i32 4
				; CHECK-NEXT: [[ALIAS_PHI_RELOAD:%.]] = load i8, i8** [[ALIAS_PHI_RELOAD_ADDR]], align 8
				; CHECK-NEXT: call void @print(i8* [[ALIAS_PHI_RELOAD]])
				; CHECK-NEXT: call void @free(i8* [[VFRAME]])
				; CHECK-NEXT: ret void

llvm/test/Transforms/Coroutines/coro-alloca-08.ll

This file was added.

				; RUN: opt < %s -coro-split -S \| FileCheck %s
				; RUN: opt < %s -passes=coro-split -S \| FileCheck %s

				%"struct.std::coroutine_handle" = type { i8* }
				%"struct.std::coroutine_handle.0" = type { %"struct.std::coroutine_handle" }
				%"struct.lean_future<int>::Awaiter" = type { i32, %"struct.std::coroutine_handle.0" }

				declare i8* @malloc(i64)

				%i8.array = type { [100 x i8] }
				declare void @consume.i8.array(%i8.array*)

				; The lifetime of testval starts and ends before coro.suspend. Even though consume.i8.array
				; might capture it, we can safely say it won't live across suspension.
				define void @foo() "coroutine.presplit"="1" {
				entry:
				%testval = alloca %i8.array
				%cast = getelementptr inbounds %i8.array, %i8.array* %testval, i64 0, i32 0, i64 0
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%alloc = call i8* @malloc(i64 16) #3
				%vFrame = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %alloc)

				call void @llvm.lifetime.start.p0i8(i64 100, i8* %cast)
				call void @consume.i8.array(%i8.array* %testval)
				call void @llvm.lifetime.end.p0i8(i64 100, i8* %cast)

				%save = call token @llvm.coro.save(i8* null)
				%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
				switch i8 %suspend, label %exit [
				i8 0, label %await.ready
				i8 1, label %exit
				]
				await.ready:
				%StrayCoroSave = call token @llvm.coro.save(i8* null)
				br label %exit
				exit:
				call i1 @llvm.coro.end(i8* null, i1 false)
				ret void
				}

				; The lifetime of testval starts after coro.suspend. So it will never live across suspension
				; points.
				define void @bar() "coroutine.presplit"="1" {
				entry:
				%testval = alloca %i8.array
				%cast = getelementptr inbounds %i8.array, %i8.array* %testval, i64 0, i32 0, i64 0
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%alloc = call i8* @malloc(i64 16) #3
				%vFrame = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %alloc)
				%save = call token @llvm.coro.save(i8* null)
				%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
				switch i8 %suspend, label %exit [
				i8 0, label %await.ready
				i8 1, label %exit
				]
				await.ready:
				%StrayCoroSave = call token @llvm.coro.save(i8* null)

				call void @llvm.lifetime.start.p0i8(i64 100, i8* %cast)
				call void @consume.i8.array(%i8.array* %testval)
				call void @llvm.lifetime.end.p0i8(i64 100, i8* %cast)

				br label %exit
				exit:
				call i1 @llvm.coro.end(i8* null, i1 false)
				ret void
				}

				; Verify that for both foo and bar, testval isn't put on the frame.
				; CHECK: %foo.Frame = type { void (%foo.Frame), void (%foo.Frame), i1 }
				; CHECK: %bar.Frame = type { void (%bar.Frame), void (%bar.Frame), i1 }

				declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*)
				declare i1 @llvm.coro.alloc(token) #3
				declare i64 @llvm.coro.size.i64() #5
				declare i8* @llvm.coro.begin(token, i8* writeonly) #3
				declare token @llvm.coro.save(i8*) #3
				declare i8* @llvm.coro.frame() #5
				declare i8 @llvm.coro.suspend(token, i1) #3
				declare i8* @llvm.coro.free(token, i8* nocapture readonly) #2
				declare i1 @llvm.coro.end(i8*, i1) #3
				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #4
				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #4

This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Check indirect uses of alloca when checking lifetime infoClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 326262

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

llvm/test/Transforms/Coroutines/coro-alloca-07.ll

llvm/test/Transforms/Coroutines/coro-alloca-08.ll

[Coroutine] Check indirect uses of alloca when checking lifetime info
ClosedPublic