This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Make dealing with alloca spills more robust
ClosedPublic

Authored by lxfind on Aug 30 2020, 11:29 PM.

Details

Summary

D66230 attempted to fix a problem where when there are allocas used before CoroBegin.
It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed).
To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior.

There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc.
Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases.

While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230.

Diff Detail

Event Timeline

lxfind created this revision.Aug 30 2020, 11:29 PM
lxfind updated this revision to Diff 288971.Aug 31 2020, 9:53 AM

Add comments

lxfind updated this revision to Diff 288975.Aug 31 2020, 10:01 AM

Amend description

lxfind published this revision for review.Aug 31 2020, 10:01 AM
lxfind updated this revision to Diff 288977.Aug 31 2020, 10:05 AM
lxfind edited the summary of this revision. (Show Details)

Add reviewers

lxfind planned changes to this revision.Aug 31 2020, 11:03 AM
lxfind added inline comments.
llvm/test/Transforms/Coroutines/ArgAddr.ll
50 ↗(On Diff #288977)

Looks like this is incorrect. It doesn't work well when CoroSaves are in loops. I will look into this.

lxfind updated this revision to Diff 289209.Sep 1 2020, 10:13 AM

Use a different approach

lxfind updated this revision to Diff 289212.Sep 1 2020, 10:18 AM
lxfind retitled this revision from [Coroutine] Properly deal with alloca spills to [Coroutine] Make dealing with alloca spills more robust.
lxfind edited the summary of this revision. (Show Details)

Update description

ChuanqiXu added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1002

If there is a case:

%a = alloca..
%b = bitcast  %a to ...
@coro.begin
// some use of b but no use of a

in this case, all the use of %a is dominated by Coro.begin, so the variable MightNeedToCopy may not be true, then the codes before won't be executed.

ChuanqiXu added inline comments.Sep 3 2020, 1:29 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643

Would there be cast* or GEP instruction of Alloca before Coro.Begin? If it is, it seems like we need to visit all the use of Alloca in AllocaVisitor

lxfind added a comment.Sep 3 2020, 9:29 AM

@ChuanqiXu Thank you for taking a look!

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643

Yes that's exactly what we are doing in the visitor.

1002

MightNeedToCopy will be true if any use if the AllocaInst is not dominated by coro.begin. So in your example, since %b uses %a and happens before coro.begin, it's not dominated by coro.begin, and hence MightNeedToCopy will be true.

ChuanqiXu added inline comments.Sep 3 2020, 6:28 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1002

Yes, you are right.

ChuanqiXu added inline comments.Sep 3 2020, 6:30 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643

Sorry, I made a mistake. It is right.

lxfind updated this revision to Diff 289853.Sep 3 2020, 9:19 PM

address lint

modimo added a subscriber: modimo.Sep 3 2020, 9:40 PM
hoy added a subscriber: hoy.Sep 3 2020, 9:43 PM
hoy added inline comments.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
628–643

typo : pointing

634–638

This does sound fragile. I'm wondering if this is a defined behavior in the language standard. For a local variable first allocated on the local frame and later on copied into the coroutine frame, all subsequent accesses to it should be redirected to the coroutine frame? If so, can the coroutine frame be allocated ealier?

lxfind added inline comments.Sep 3 2020, 9:48 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

This is defined behavior in the language. How to manage the data in the heap is implementation details and should not affect program correctness. So it's certainly true that this implementation is not quite right yet. The hard part is that some optimization passes will insert allocas before coro.begin, which makes it difficult to track. A proper implementation should provide strong guarantees.
This patch only makes existing algorithm slightly more robust, but does not solve the more fundamental problem, which will take some time to design and implement.

hoy added inline comments.Sep 3 2020, 9:56 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

I see. Is it possible to fix those passes allocating locals before coro.begin? If that's too involved, can we move the coro.begin to the beginning of a function and redirect all local references to the coro frame? It's basically similar to outline all the function code except for the coro.begin call.

lxfind added inline comments.Sep 3 2020, 10:00 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

The problem is that coro.begin depends on the creation of the frame on the heap, and the creation of the heap sometimes could depend on parameters of the function, which will lead to allocas before coro.begin.
I will need to look into how this is done in the Swift ABI and potentially use that model.

Looking at https://bugs.llvm.org/show_bug.cgi?id=36578:
Andrew Kelley 2019-08-11 09:00:59 PDT
I'm no longer subscribed to this bug report. I've come to the conclusion that LLVM's coroutine API is not worth using, and resorting to implementing coroutines directly in the frontend.

That's an option to look into for solutions here. Also official docs for coro.begin: https://llvm.org/docs/Coroutines.html#llvm-coro-begin-intrinsic

Having anything emitted before frame-setup is tricky because any operation there needs to be tied into DWARF unwind codes. Mirroring these operations as the current design is doing may be a better solution given that. The assertion will definitely be a bug farm as every corner gets tested.

hoy added inline comments.Sep 3 2020, 10:06 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

How does the frame creation depend on the function local variable values? Does the frame size depend on the values ?

Looking at https://bugs.llvm.org/show_bug.cgi?id=36578:
Andrew Kelley 2019-08-11 09:00:59 PDT
I'm no longer subscribed to this bug report. I've come to the conclusion that LLVM's coroutine API is not worth using, and resorting to implementing coroutines directly in the frontend.

That's an option to look into for solutions here. Also official docs for coro.begin: https://llvm.org/docs/Coroutines.html#llvm-coro-begin-intrinsic

Having anything emitted before frame-setup is tricky because any operation there needs to be tied into DWARF unwind codes. Mirroring these operations as the current design is doing may be a better solution given that. The assertion will definitely be a bug farm as every corner gets tested.

Yeah I will talk to Bruno latter and see what he thinks.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

@hoy My understanding is that you could provide a custom Allocator for the frame creation, which comes from a parameter, which will be moved to a local variable the first thing in the function.

hoy added inline comments.Sep 3 2020, 10:18 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

I see. Thanks for the explanation. That sounds like a special use of locals which should have a very short lifetime. For regular local variables, their computation may be able to be deferred until they are allocated on the coroutine frame.

lxfind added inline comments.Sep 3 2020, 10:36 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

I agree. All the cases I have seen so far are Allocas generated for parameters, and there are only simple casts/GEPs happening before coro.begin. So I hope this patch is good for a while, and give us sometime to think about this more systematically.

wenlei added inline comments.Sep 4 2020, 12:11 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

can we move the coro.begin to the beginning of a function and redirect all local references to the coro frame

This seems promising to me. Relying on pointer not escaping data flow analysis for correctness is a bit scary.

We could have something like a canonicalization for the sequence of frame allocation, followed by coro.begin and then user alloca. The idea is still similar to D66230, but without changing the order of the interfering def-use.

hoy added inline comments.Sep 4 2020, 12:20 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
674

Does a load from an alloca need to be rewritten?

679

If the use is the pointer, should the store be rewritten if it is dominated by coro.begin?

694

Is a normal function call needed to be handled?

hoy added inline comments.Sep 4 2020, 12:27 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

Yeah, still trying to figure out what locals could be used before coro.begin. Are the lifetimes used by the coro.begin call all compiler-generated, not user code? If so, the compiler can ensure that all locals are allocated on the coroutine frame before they are used.

lxfind added inline comments.Sep 4 2020, 8:59 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
674

No because it would already have the right value.

679

We only visit instructions before coro.begin, so it won't be dominated by coro.begin.

694

It's already handled in the base class, where it assumes escape.

lxfind added inline comments.Sep 4 2020, 11:13 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

@wenlei @hoy
The frame allocation (and hence coro.begin) is supposed to be the first thing that's happening in the coroutine function. However, there is one complication.
The frame allocation can use a user-defined placement form of operator new to allocate the memory, which can take arguments from the arguments list in the function (refer to https://en.cppreference.com/w/cpp/language/coroutines, the Heap Allocation section).
Allocas can then be introduced when trying to pass parameters of the function into the frame allocation call. When other optimization passes are involved, sometimes alias of the allocas for the parameters can be created which are used latter after coro.begin.
In theory, the frame allocation call can also have side-effects to the parameters, which are already stored in Allocas.
So in the current model without any major redesign, I don't think there is a perfect solution. We can only make a best-effort try to copy anything that might have been modified and recreate any pointer that points to the stack.
Assuming that all the complications are introduced due to parameter passing, I do think that the pointer analysis is powerful enough to handle all cases that's reasonable. But to make it truly robust, we would need to do something entirely different.

lxfind marked 4 inline comments as done.Sep 4 2020, 11:15 AM
hoy added inline comments.Sep 4 2020, 2:03 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

So sounds like only parameter locations need to be seen by coro.begin. We might be able to allow special temporary allocas for the sake of coro.begin while allocating real parameter and other locals on the coroutine frame directly.

hoy added inline comments.Sep 4 2020, 2:57 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

Talked offline, current coro.begin intrinsic call doesn't kill values crossing it, thus it doesn't prevent optimization passes from setting a value before it and using the value after it. This is also not modeled by any IR intrinsic attribute either. I guess we can assign coro.begin an existing attribute to prevent any function call being moved across it to minimize the escaped case.

The patch looks a decent workaround to me without introducing fundamental changes to the current coroutine implementation model.

hoy added inline comments.Sep 4 2020, 3:15 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679

Should we check if the use is the left-hand side or right-hand side?

hoy added inline comments.Sep 4 2020, 3:17 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679

Ignore my last comment please. I'm wondering why it's changed from escaped to aborted.

lxfind updated this revision to Diff 290053.Sep 4 2020, 5:37 PM

address typo

hoy added inline comments.Sep 4 2020, 6:38 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679

So why do we change from setAborted to setEscaped?

lxfind added inline comments.Sep 4 2020, 9:12 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679

setAborted will cause the visiting process to terminate right after this instruction, which was fine in the previous implementation because all it needed was to find out whether it escapes or gets modified. But now I also need to track all aliases, which mean I have to visit every instruction and cannot abort in the middle.

hoy accepted this revision.Sep 7 2020, 8:22 PM
This revision is now accepted and ready to land.Sep 7 2020, 8:22 PM
wenlei added inline comments.Sep 8 2020, 8:50 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

The frame allocation can use a user-defined placement form of operator new to allocate the memory, which can take arguments from the arguments list in the function

Ok, so having some sequence before coro.begin may be required for frame allocation on heap via custom allocator. But that isn't problematic by itself, as long as nothing from frame allocation will cross coro.begin, right? And frame allocation should be a well structured sequence, wondering what would cause something to be used by both frame allocation and after coro.begin?

For this case below you mentioned in comment, assuming neither %a nor %b is used in frame allocation, would moving coro.begin above them (but still below frame allocation sequence) solve the problem? I can see that your fix makes things better, but trying to understand its overlap/tradeoff against the move approach (improving Gor's original fix to avoid breaking def-use chain).

%a = AllocaInst ...
%b = call @computeAddress(... %a)
lxfind added inline comments.Sep 8 2020, 9:06 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

So far the only problematic cases I know of are when the parameters of the function are localized though AllocaInst before coro.begin, and then used after coro.begin. This can happen when the custom allocator needs to use those parameters or some other optimization passes inserted AllocaInst in the front for whatever reason.
I don't actually think the counter-example in my comment (%b = call @computeAddress(... %a)) would really happen today, otherwise this solution won't work. I put down the comment just to note what the current solution cannot cover in case we hit it in the future. If this does happen, it would still most likely be due to the custom allocator (say the allocation call is inlined, and latter some optimization pass makes some of the local variables to be shared across coro.begin boundary)

lxfind updated this revision to Diff 290505.Sep 8 2020, 9:23 AM

rebase before land

hoy added inline comments.Sep 8 2020, 9:48 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

I'm also wondering maybe we should assert for the cases we've not seen and cannot handle so far, i.e., escaped local addresses before coro.begin.

lxfind added inline comments.Sep 8 2020, 9:52 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638

I did consider this, but decided not to, because it's very likely the escape analysis in LLVM is not good enough to tell what could really escape. For instance, the custom allocator function might not be marked as non-escaping even if it won't escape.

This revision was automatically updated to reflect the committed changes.

Chatted with @lxfind off patch in the morning. A quick summary:

We could still move instructions as much as we can after coro.begin, but defs used by both heap allocation and after coro.begin will still necessitate the need for alias tracking for things that cannot be moved.

However on top of the alias tracking implemented in this patch, moving things after coro.begin could handle some corner cases that alias tracking may fail. Let's have this in first, and we can add the instruction moving as an optimization later.

Thanks for working on this!