This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Do not CoroElide if there are musttail calls
ClosedPublic

Authored by lxfind on Jan 15 2021, 3:08 PM.

Details

Summary

This is to address https://bugs.llvm.org/show_bug.cgi?id=48626.
When there are musttail calls that use parameters aliasing the newly created coroutine frame, the existing implementation will fatal.
We simply cannot perform CoroElide in such cases. In theory a precise analysis can be done to check whether the parameters of the musttail call
actually alias the frame, but it's very hard to do it before the transformation happens. Also in most cases the existence of musttail call is
generated due to symmetric transfers, and in those cases alias analysis won't be able to tell that they don't alias anyway.

Diff Detail

Event Timeline

lxfind created this revision.Jan 15 2021, 3:08 PM
lxfind requested review of this revision.Jan 15 2021, 3:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2021, 3:08 PM
lxfind added a subscriber: palebedev.
bruno added a reviewer: bruno.Jan 15 2021, 5:17 PM
bruno added a subscriber: bruno.

Nice correctness improvement!

llvm/lib/Transforms/Coroutines/CoroElide.cpp
259

How about walking the BBs and check by using getTerminatingMustTailCall()?

lxfind added inline comments.Jan 15 2021, 5:48 PM
llvm/lib/Transforms/Coroutines/CoroElide.cpp
259

That's nice. Thanks!

lxfind updated this revision to Diff 317130.Jan 15 2021, 5:56 PM

address comments

I confirm this fixes the problem I've originally reported, thank you! Too bad this is yet another CoroElide pessimization...

I am a little unclear about this problem. From my point of view, it seems like that there is a Coroutine C elided in a normal function F. Then in the Coroutine Body of C, it would try to switch to itself by symmetric transfer. However, the Coroutine frame of C now is a stack variable. Then the tail call would pass the address of the stack variable whose lifetime has ended, so here is the corruption. Did I understand the situation?

junparser accepted this revision.Jan 17 2021, 10:42 PM

Thanks for fix this. Furthermore, we can add runtime check to see whether this pointer is current coroutine frame, and optimize them separately.

This revision is now accepted and ready to land.Jan 17 2021, 10:42 PM

I am a little unclear about this problem. From my point of view, it seems like that there is a Coroutine C elided in a normal function F. Then in the Coroutine Body of C, it would try to switch to itself by symmetric transfer. However, the Coroutine frame of C now is a stack variable. Then the tail call would pass the address of the stack variable whose lifetime has ended, so here is the corruption. Did I understand the situation?

Yes I believe that's an accurate description

This revision was automatically updated to reflect the committed changes.

I am a little unclear about this problem. From my point of view, it seems like that there is a Coroutine C elided in a normal function F. Then in the Coroutine Body of C, it would try to switch to itself by symmetric transfer. However, the Coroutine frame of C now is a stack variable. Then the tail call would pass the address of the stack variable whose lifetime has ended, so here is the corruption. Did I understand the situation?

Yes I believe that's an accurate description

I'm looking into this bug now. Now it is more confusing to me since it seems like that this process shouldn't happen.

The IR generated for the attachment of bug48626 is like:

%19 = tail call noalias nonnull i8* @llvm.coro.begin(token %15, i8* %18), !noalias !1082 ; The handle we are going to elide
// ... in another BB
%35 = call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1) #28 ; destroy %19
// ... in another BB
musttail call fastcc void %51(i8* %retval.sroa.0.0.copyload.i.i.i120) ; dominated by `call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1)`
ret void

It seems like that the argument of the musttail call can't alias %19 in any sense since %19 already been destroyed before the musttail call. Then it makes sense to all situations I can image. Since we only elide the coro handles who wouldn't escape. And we did the escape analysis very conservatively (Not by AA analysis, but by a simple path based escape analysis). So I think it may be better to remove the check in the end of ShouldElide function. It seems like that the memory wouldn't crash due to the argument of the musttail call alias with the coroutine handle we elided. And if any corruption happens, I think is is a bug for the escape analysis.

I am a little unclear about this problem. From my point of view, it seems like that there is a Coroutine C elided in a normal function F. Then in the Coroutine Body of C, it would try to switch to itself by symmetric transfer. However, the Coroutine frame of C now is a stack variable. Then the tail call would pass the address of the stack variable whose lifetime has ended, so here is the corruption. Did I understand the situation?

Yes I believe that's an accurate description

I'm looking into this bug now. Now it is more confusing to me since it seems like that this process shouldn't happen.

The IR generated for the attachment of bug48626 is like:

%19 = tail call noalias nonnull i8* @llvm.coro.begin(token %15, i8* %18), !noalias !1082 ; The handle we are going to elide
// ... in another BB
%35 = call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1) #28 ; destroy %19
// ... in another BB
musttail call fastcc void %51(i8* %retval.sroa.0.0.copyload.i.i.i120) ; dominated by `call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1)`
ret void

It seems like that the argument of the musttail call can't alias %19 in any sense since %19 already been destroyed before the musttail call. Then it makes sense to all situations I can image. Since we only elide the coro handles who wouldn't escape. And we did the escape analysis very conservatively (Not by AA analysis, but by a simple path based escape analysis). So I think it may be better to remove the check in the end of ShouldElide function. It seems like that the memory wouldn't crash due to the argument of the musttail call alias with the coroutine handle we elided. And if any corruption happens, I think is is a bug for the escape analysis.

It won't crash in this particular case. But as discussed in the bug thread, in theory the symmetric transfer could return the same handle as passed in, and in that case you won't be able to do tail call, right?

I am a little unclear about this problem. From my point of view, it seems like that there is a Coroutine C elided in a normal function F. Then in the Coroutine Body of C, it would try to switch to itself by symmetric transfer. However, the Coroutine frame of C now is a stack variable. Then the tail call would pass the address of the stack variable whose lifetime has ended, so here is the corruption. Did I understand the situation?

Yes I believe that's an accurate description

I'm looking into this bug now. Now it is more confusing to me since it seems like that this process shouldn't happen.

The IR generated for the attachment of bug48626 is like:

%19 = tail call noalias nonnull i8* @llvm.coro.begin(token %15, i8* %18), !noalias !1082 ; The handle we are going to elide
// ... in another BB
%35 = call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1) #28 ; destroy %19
// ... in another BB
musttail call fastcc void %51(i8* %retval.sroa.0.0.copyload.i.i.i120) ; dominated by `call i8* @llvm.coro.subfn.addr(i8* nonnull %19, i8 1)`
ret void

It seems like that the argument of the musttail call can't alias %19 in any sense since %19 already been destroyed before the musttail call. Then it makes sense to all situations I can image. Since we only elide the coro handles who wouldn't escape. And we did the escape analysis very conservatively (Not by AA analysis, but by a simple path based escape analysis). So I think it may be better to remove the check in the end of ShouldElide function. It seems like that the memory wouldn't crash due to the argument of the musttail call alias with the coroutine handle we elided. And if any corruption happens, I think is is a bug for the escape analysis.

It won't crash in this particular case. But as discussed in the bug thread, in theory the symmetric transfer could return the same handle as passed in, and in that case you won't be able to do tail call, right?

Yes and my thoughts about this case is that CoroElide pass won't elide the frame. As a general example:

define void @xxx.resume(%FrameTy* %FramePtr) {
entry:
     ; ...
BB:
     %handle = tail call noalias nonnull i8* @llvm.coro.begin(...)
     ; ...
BB1:
     call i8* @llvm.subfn.addr(%handle, 1)
     ; ...
BB.final:
      musttail call fastcc void @resume.call(i8* %val) 
      ret void
}

And I think there are two cases about this example:
(1) The musttail call is dominated by @llvm.subfn.addr(%handle, 1) calls. Then %handle is destroyed at the musttail call, so it makes no sense that the musttail call returns %handle.
(2) The musttail call isn't dominated by @llvm.subfn.addr(%handle, 1) calls. Then the musttail call may returns %handle. And it doesn't matter since %handle won't be elided in this case.

And I think there are two cases about this example:
(1) The musttail call is dominated by @llvm.subfn.addr(%handle, 1) calls. Then %handle is destroyed at the musttail call, so it makes no sense that the musttail call returns %handle.
(2) The musttail call isn't dominated by @llvm.subfn.addr(%handle, 1) calls. Then the musttail call may returns %handle. And it doesn't matter since %handle won't be elided in this case.

In case (1), why is it that '%handle' must be destroyed at the musttail call?

And I think there are two cases about this example:
(1) The musttail call is dominated by @llvm.subfn.addr(%handle, 1) calls. Then %handle is destroyed at the musttail call, so it makes no sense that the musttail call returns %handle.
(2) The musttail call isn't dominated by @llvm.subfn.addr(%handle, 1) calls. Then the musttail call may returns %handle. And it doesn't matter since %handle won't be elided in this case.

In case (1), why is it that '%handle' must be destroyed at the musttail call?

In my mind, @llvm.subfn.addr(%handle, 1) means a call which would destroy %handle. And CoroElide Pass also treat the call as a lifetime end marker for %handle. CoroElide pass only elides coroutine handle who is guarded by @llvm.subfn.addr(%handle, 1) .

And I think there are two cases about this example:
(1) The musttail call is dominated by @llvm.subfn.addr(%handle, 1) calls. Then %handle is destroyed at the musttail call, so it makes no sense that the musttail call returns %handle.
(2) The musttail call isn't dominated by @llvm.subfn.addr(%handle, 1) calls. Then the musttail call may returns %handle. And it doesn't matter since %handle won't be elided in this case.

In case (1), why is it that '%handle' must be destroyed at the musttail call?

In my mind, @llvm.subfn.addr(%handle, 1) means a call which would destroy %handle. And CoroElide Pass also treat the call as a lifetime end marker for %handle. CoroElide pass only elides coroutine handle who is guarded by @llvm.subfn.addr(%handle, 1) .

I guess that means we could improve this by teaching alias analysis that llvm.subfn.addr kills the handle.

And I think there are two cases about this example:
(1) The musttail call is dominated by @llvm.subfn.addr(%handle, 1) calls. Then %handle is destroyed at the musttail call, so it makes no sense that the musttail call returns %handle.
(2) The musttail call isn't dominated by @llvm.subfn.addr(%handle, 1) calls. Then the musttail call may returns %handle. And it doesn't matter since %handle won't be elided in this case.

In case (1), why is it that '%handle' must be destroyed at the musttail call?

In my mind, @llvm.subfn.addr(%handle, 1) means a call which would destroy %handle. And CoroElide Pass also treat the call as a lifetime end marker for %handle. CoroElide pass only elides coroutine handle who is guarded by @llvm.subfn.addr(%handle, 1) .

I guess that means we could improve this by teaching alias analysis that llvm.subfn.addr kills the handle.

I agree with that is a good solution to use AA if we can teach AA that llvm.subfn.addr(%handle, 1) kills %handle. But for my understanding to AA, it is not easy to do so. Maybe we need to refactor both AA and CoroElide to do that.
Let's go back to this bug. I think we can solve this problem by removing the static check for musttail call in removeTailCallAttribute.