Page MenuHomePhabricator

[coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem.
ClosedPublic

Authored by GorNishanov on Aug 14 2019, 10:14 AM.

Details

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=36578 and https://bugs.llvm.org/show_bug.cgi?id=36296.
Supersedes: https://reviews.llvm.org/D55966

One of the fundamental transformation that CoroSplit pass performs before splitting the coroutine is to find which values need to survive between suspend and resume and provide a slot for them in the coroutine frame to spill and restore the value as needed.

Coroutine frame becomes available once the storage for it was allocated and that point is marked in the pre-split coroutine with a llvm.coro.begin intrinsic.

FE normally puts all of the user-authored code that would be accessing those values after llvm.coro.begin, however, sometimes instructions accessing those values would end up prior to coro.begin. For example, writing out a value of the parameter into the alloca done by the FE or instructions that are added by the optimization passes such as SROA when it rewrites allocas.

Prior to this change, CoroSplit pass would try to move instructions that may end up accessing the values in the coroutine frame after CoroBegin. However it would run into problems (report_fatal_error) if some of the values would be used both in the allocation function (for example allocator is passed as a parameter to a coroutine) and in the use-authored body of the coroutine.

To handle this case and to simplify the instruction moving logic, this change removes all of the instruction moving. Instead, we only change the uses of the spilled values that are dominated by coro.begin and leave other instructions intact.

Before:

%var = alloca i32
%1 = getelementptr .. %var; ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(

After:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin(

If we discover that there is a potential write into an alloca, prior to coro.begin we would copy its value from the alloca into the spill slot in the coroutine frame.

Before:

%var = alloca i32
store .. %var ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(

After:

%var = alloca i32
store .. %var ;stays put
%f = call i8* @llvm.coro.begin(
%tmp = load %var
store %tmp, %spill.slot.for.var

Note: This change does not handle array allocas as that is something that C++ FE does not produce, but, it can be added in the future if need arises

Diff Detail

Repository
rL LLVM

Event Timeline

GorNishanov created this revision.Aug 14 2019, 10:14 AM
GorNishanov edited the summary of this revision. (Show Details)Aug 14 2019, 10:28 AM
modocache accepted this revision.Aug 14 2019, 12:24 PM

Thanks for looking into this, @GorNishanov! LGTM aside from a tiny nit-pick.

lib/Transforms/Coroutines/CoroFrame.cpp
476 ↗(On Diff #215156)

A nit-pick, but: it seems like most other variable names in this function are capitalized (e.g.: PtrI), but this one is a lowercase v. Is there a reason they're not consistent?

This revision is now accepted and ready to land.Aug 14 2019, 12:24 PM

Thank you very much for the reivew, Brian!
Nit addressed. Preparing to land.

GorNishanov added a reviewer: rjmccall.
GorNishanov marked an inline comment as done.

Merged with the latest CoroFrame changes from @rjmccall .

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2019, 5:51 PM

Seems reasonable to me. If we're no longer imposing any *restrictions* for @llvm.coro.begin, is it serving any real purpose? Is it just a way of getting a handle to the coroutine frame?

Seems reasonable to me. If we're no longer imposing any *restrictions* for @llvm.coro.begin, is it serving any real purpose? Is it just a way of getting a handle to the coroutine frame?

I think the original reason for the coro.begin is intact. LLVM Coroutines.rst stays "Depending on the alignment requirements of the objects in the coroutine frame and/or on the codegen compactness reasons the pointer returned from coro.begin may be at offset to the %mem argument. "

C++ FE obtains a memory in some way for the coroutine frame and then give it to coro.begin, saying: "Here. Use it for coroutine frame." Lowering of coro.begin may simply return the same pointer or it may do some adjustment to it. One possible adjustment would be, if there are any over-aligned variables with alignment larger than the alignment of the allocator (supplied to coro.id), coro.begin would do (NOT YET IMPLEMENTED) a dynamic adjustment to make sure everything is properly aligned.

That reminded me, there was a brief common C++ coroutine ABI discussion at Cologne and I don't think the results were shared with the interested parties. I'll send an e-mail shortly

Is there a case you're imagining where the adjustment would have side-effects? Because I can see a reason to have an intrinsic that returns a frame pointer, but I don't see why that intrinsic would have any of the restrictions of @llvm.coro.begin.

Is there a case you're imagining where the adjustment would have side-effects? Because I can see a reason to have an intrinsic that returns a frame pointer, but I don't see why that intrinsic would have any of the restrictions of @llvm.coro.begin.

Which restrictions are you thinking about?
Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.
If you are thinking about the attributes on the intrinsic itself, those probably can be relaxed.

Is there a case you're imagining where the adjustment would have side-effects? Because I can see a reason to have an intrinsic that returns a frame pointer, but I don't see why that intrinsic would have any of the restrictions of @llvm.coro.begin.

Which restrictions are you thinking about?

"A frontend should emit exactly one coro.begin intrinsic per coroutine."

Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.

Does it? Why? There already has to be a single coro.id in the coroutine, and that's the intrinsic that takes useful information. coro.begin just has a bunch of requirements and no clear purpose except to return the frame, which could just as well be done with a duplicable intrinsic. Frame allocation has to happen in the ramp prologue anyway, and the underlying observation of this particular patch is that trying to move coro.begin to reflect its logical order in the function is just a lot of complication for no clear purpose. And honestly the frame pointer is not usually a useful thing to expose in a coroutine representation; aspects of frame layout being exposed in the ABI is a property of the switch lowering, not coroutines in general.

John.

aspects of frame layout being exposed in the ABI is a property of the switch lowering, not coroutines in general.

You may be right. Given that now we have two frontends targeting LLVM Coroutines, some refactoring may be in order. I need to study more the Swift approach before I can form an opinion.

Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.

Does it? Why? There already has to be a single coro.id in the coroutine, and that's the intrinsic that takes useful information. coro.begin just has a bunch of requirements and no clear purpose except to return the frame, which could just as well be done with a duplicable intrinsic.

Here is how it looks to me: C++ frontends emits the following structure (simplified):

%id = coro.id(stuff)
%mem = SomeAllocCodeCouldBeAnything(stuff)
%frame = coro.begin(%mem)

coro.begin "blesses" the memory as the one to be used in the coroutine and therefore should dominate any possible uses of data that go into the coroutine frame and gives a convenient place to dump spills, copies, etc.

If we did not allow arbitrary allocation logic in c++, there would be no need for coro.begin at all.

Gor

aspects of frame layout being exposed in the ABI is a property of the switch lowering, not coroutines in general.

You may be right. Given that now we have two frontends targeting LLVM Coroutines, some refactoring may be in order. I need to study more the Swift approach before I can form an opinion.

Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.

Does it? Why? There already has to be a single coro.id in the coroutine, and that's the intrinsic that takes useful information. coro.begin just has a bunch of requirements and no clear purpose except to return the frame, which could just as well be done with a duplicable intrinsic.

Here is how it looks to me: C++ frontends emits the following structure (simplified):

%id = coro.id(stuff)
%mem = SomeAllocCodeCouldBeAnything(stuff)
%frame = coro.begin(%mem)

coro.begin "blesses" the memory as the one to be used in the coroutine and therefore should dominate any possible uses of data that go into the coroutine frame and gives a convenient place to dump spills, copies, etc.

If we did not allow arbitrary allocation logic in c++, there would be no need for coro.begin at all.

Ah, okay, I see. Yes, if there has to be inlined code to do the allocation, then something like coro.begin does seem necessary; although of course the danger is that that arbitrary code — if nothing else, after inlining — might do something that we naively think needs the coroutine frame. For example, if the allocation called a user-defined allocation function, and we inlined that call before splitting the coroutine, and the inlined code contained an alloca (scoped to the call, of course, but we haven't taught CoroFrame to optimize based on alloca lifetimes yet), that would presumably cause serious problems for lowering because the alloca would have uses prior to coro.begin.

How arbitrary can allocation really be? If it can be emitted in a separate function call that can be provided abstractly to coro.id, then coroutine lowering can just insert a call to that function (or whatever more complicated pattern is necessary to enable static elimination of the allocation) and then trigger further optimization/inlining as necessary to optimize that new call. If we need to handle exceptions out of it then maybe we can make coro.id non-nounwind. (We can almost get away with just saying "allocation is the first thing the function does, so it never happens in an interesting EH context. Unfortunately, there are some features/ABIs that require parameters to be destroyed in the callee, which can mean that everything in the function is in an interesting EH context, even generated code in the prologue.)