This is an archive of the discontinued LLVM Phabricator instance.


Differential D86859

[Coroutine] Make dealing with alloca spills more robust
Closed (Public)

Authored by lxfind on Aug 30 2020, 11:29 PM.


Details

Reviewers

wenlei
rjmccall
GorNishanov
junparser
lewissbaker
bruno
hoy

Commits

rG59a467ee4fae: [Coroutine] Make dealing with alloca spills more robust

Summary

D66230 attempted to fix a problem that occurs when there are allocas used before CoroBegin.
It keeps allocas and their uses in place if there are no escapes/changes to the data before CoroBegin.
Unfortunately, that's incorrect.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
With D66230, %1 stays put; however, if a store through %1 happens after coro.begin and modifies the content, the change is not reflected in the coroutine frame (and the store will eventually be DCEed).
To generalize the problem: if any alias pointer to an Alloca is created before coro.begin and that alias is later written through after coro.begin, the result is incorrect behavior.
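To make the failure mode concrete, here is a toy C++ model (not LLVM code; `ToyFrame` and `valueSeenByFrame` are hypothetical names). A plain `int` stands in for the alloca and a struct field stands in for its frame slot; the comments map each line to the IR above. The store through the alias created before coro.begin updates the stack copy, so the frame never observes it.

```cpp
#include <cassert>

// Toy model, not LLVM code: the coroutine frame slot that replaces the
// alloca %var once coro.begin has run.
struct ToyFrame {
  int var_slot = 0;
};

// Returns what the frame slot holds after a store through an alias that
// was created before coro.begin.
int valueSeenByFrame(ToyFrame &frame, int stored) {
  int var = 1;           // %var = alloca i32 (given an initial value here)
  int *alias = &var;     // %1 = getelementptr .. %var, created before coro.begin

  frame.var_slot = var;  // coro.begin: %var is copied into the frame
  assert(frame.var_slot == 1);

  *alias = stored;       // store ... %1 after coro.begin still hits the stack
  return frame.var_slot; // later code reads the frame, which missed the store
}
```

Calling `valueSeenByFrame(frame, 42)` returns 1: the write is lost, which is exactly the class of bug the patch targets.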

There are also a few other minor issues, such as an incorrect dominance check in the pointer visitor, unhandled memory intrinsics, etc.
This patch attempts to fix some of these issues and makes the handling of aliases more robust.

While visiting the uses of the alloca pointer, we also keep track of all aliases that are created before CoroBegin and used after it. We track the offset of each alias, and then recreate these aliases after CoroBegin using these offsets.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about how to completely eliminate these issues, likely through the route @rjmccall mentioned in D66230.
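The alias recreation described above (record each alias's offset, rebuild the alias after CoroBegin) can be sketched with a toy C++ model. This is not the LLVM implementation: `AliasRecord`, `recordAlias`, `recreateAlias`, and `Pair` are hypothetical, and raw byte offsets into a `Pair` stand in for GEP offsets into an alloca.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy model, not the LLVM implementation: an alias created before coro.begin
// is remembered as a byte offset from the start of its alloca, then rebuilt
// from the frame slot address after coro.begin.
struct AliasRecord {
  std::size_t offset;  // byte offset of the alias within the alloca
};

AliasRecord recordAlias(const unsigned char *alloca_start,
                        const unsigned char *alias) {
  assert(alias >= alloca_start);
  return AliasRecord{static_cast<std::size_t>(alias - alloca_start)};
}

unsigned char *recreateAlias(unsigned char *frame_slot, const AliasRecord &rec) {
  return frame_slot + rec.offset;  // same offset, new base
}

struct Pair {
  int first;
  int second;
};

// The alias points at stack_obj.second (like a GEP before coro.begin). After
// the spill, writing through the recreated alias updates the frame copy.
int writeThroughRecreatedAlias(int value) {
  Pair stack_obj{1, 2};  // stands in for the alloca
  auto *start = reinterpret_cast<unsigned char *>(&stack_obj);
  auto *alias = reinterpret_cast<unsigned char *>(&stack_obj.second);
  AliasRecord rec = recordAlias(start, alias);

  Pair frame_copy;  // coro.begin: the alloca is copied into its frame slot
  std::memcpy(&frame_copy, &stack_obj, sizeof(Pair));

  unsigned char *new_alias =
      recreateAlias(reinterpret_cast<unsigned char *>(&frame_copy), rec);
  std::memcpy(new_alias, &value, sizeof(int));  // the store after coro.begin
  return frame_copy.second;
}
```

Rebuilding the alias from a known offset is what redirects later stores to the frame. It only works when each alias has a single, known offset from one alloca.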

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lxfind created this revision. Aug 30 2020, 11:29 PM

Herald added a project: Restricted Project. Aug 30 2020, 11:29 PM

Herald added subscribers: llvm-commits, wenlei, danielkiss and 2 others.

Harbormaster completed remote builds in B70067: Diff 288892. Aug 31 2020, 12:08 AM

Add comments

Amend description

lxfind published this revision for review. Aug 31 2020, 10:01 AM

Add reviewers

Harbormaster completed remote builds in B70110: Diff 288971. Aug 31 2020, 10:40 AM

Harbormaster completed remote builds in B70116: Diff 288977. Aug 31 2020, 10:53 AM

Harbormaster completed remote builds in B70114: Diff 288975. Aug 31 2020, 11:02 AM

lxfind planned changes to this revision. Aug 31 2020, 11:03 AM

lxfind added inline comments.

llvm/test/Transforms/Coroutines/ArgAddr.ll
50 (On Diff #288977)	Looks like this is incorrect. It doesn't work well when CoroSaves are in loops. I will look into this.

Use a different approach

Update description

lxfind added reviewers: lewissbaker, bruno. Sep 1 2020, 10:21 AM

Herald added a subscriber: dexonsmith. Sep 1 2020, 10:21 AM

Harbormaster completed remote builds in B70260: Diff 289209. Sep 1 2020, 10:56 AM

Harbormaster completed remote builds in B70262: Diff 289212. Sep 1 2020, 11:01 AM

ChuanqiXu added a subscriber: ChuanqiXu. Sep 3 2020, 1:14 AM

ChuanqiXu added inline comments.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1002	If there is a case like `%a = alloca..`, `%b = bitcast %a to ...`, `@coro.begin`, followed by some use of `%b` but no use of `%a`, then all the uses of `%a` are dominated by coro.begin, so the variable `MightNeedToCopy` may not be true, and the code before won't be executed.

ChuanqiXu added inline comments. Sep 3 2020, 1:29 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643	Could there be cast* or GEP instructions of `Alloca` before coro.begin? If so, it seems like we need to visit all the uses of Alloca in AllocaVisitor.

@ChuanqiXu Thank you for taking a look!

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643	Yes that's exactly what we are doing in the visitor.
1002	MightNeedToCopy will be true if any use of the AllocaInst is not dominated by coro.begin. So in your example, since %b uses %a and happens before coro.begin, it's not dominated by coro.begin, and hence MightNeedToCopy will be true.
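The rule lxfind states can be sketched in a toy model that assumes a single straight-line basic block, so "coro.begin dominates I" reduces to "I comes after coro.begin" (the real code queries the DominatorTree). `ToyInst` and `mightNeedToCopy` are hypothetical names; the first test case is ChuanqiXu's example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not LLVM code: one straight-line basic block, so "coro.begin
// dominates I" reduces to "I comes after coro.begin".
struct ToyInst {
  std::string name;               // e.g. "%b"
  std::vector<std::string> uses;  // operands, e.g. {"%a"}
};

// True if some instruction that uses the alloca directly is not dominated by
// coro.begin, i.e. it appears before coro.begin in the block.
bool mightNeedToCopy(const std::vector<ToyInst> &block,
                     const std::string &alloca_name,
                     std::size_t coro_begin_index) {
  assert(coro_begin_index < block.size());
  for (std::size_t i = 0; i < coro_begin_index; ++i)
    for (const std::string &op : block[i].uses)
      if (op == alloca_name)
        return true;
  return false;
}
```

For `%a`, `%b = bitcast %a`, `coro.begin`, use of `%b`, the bitcast is a use of `%a` before coro.begin, so the function returns true.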

lxfind mentioned this in D66230: [coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem. Sep 3 2020, 9:31 AM

ChuanqiXu added inline comments. Sep 3 2020, 6:28 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1002	Yes, you are right.

ChuanqiXu added inline comments. Sep 3 2020, 6:30 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
643	Sorry, I made a mistake. It is right.

address lint

modimo added a subscriber: modimo. Sep 3 2020, 9:40 PM

hoy added a subscriber: hoy. Sep 3 2020, 9:43 PM

hoy added inline comments.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
628–643	Typo: pointing
634–638	This does sound fragile. I'm wondering if this is defined behavior in the language standard. For a local variable first allocated on the local frame and later copied into the coroutine frame, should all subsequent accesses to it be redirected to the coroutine frame? If so, can the coroutine frame be allocated earlier?

lxfind added inline comments. Sep 3 2020, 9:48 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	This is defined behavior in the language. How the data is managed on the heap is an implementation detail and should not affect program correctness, so this implementation is certainly not quite right yet. The hard part is that some optimization passes insert allocas before coro.begin, which makes them difficult to track. A proper implementation should provide strong guarantees. This patch only makes the existing algorithm slightly more robust; it does not solve the more fundamental problem, which will take some time to design and implement.

hoy added inline comments. Sep 3 2020, 9:56 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	I see. Is it possible to fix the passes that allocate locals before `coro.begin`? If that's too involved, can we move `coro.begin` to the beginning of the function and redirect all local references to the coro frame? It's basically similar to outlining all the function code except for the `coro.begin` call.

lxfind added inline comments. Sep 3 2020, 10:00 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	The problem is that coro.begin depends on the creation of the frame on the heap, and that heap allocation can sometimes depend on parameters of the function, which leads to allocas before coro.begin. I will need to look into how this is done in the Swift ABI and potentially use that model.

Looking at https://bugs.llvm.org/show_bug.cgi?id=36578:
Andrew Kelley 2019-08-11 09:00:59 PDT
I'm no longer subscribed to this bug report. I've come to the conclusion that LLVM's coroutine API is not worth using, and resorting to implementing coroutines directly in the frontend.

That's an option to look into for solutions here. Also official docs for coro.begin: https://llvm.org/docs/Coroutines.html#llvm-coro-begin-intrinsic

Having anything emitted before frame-setup is tricky because any operation there needs to be tied into DWARF unwind codes. Mirroring these operations as the current design is doing may be a better solution given that. The assertion will definitely be a bug farm as every corner gets tested.

hoy added inline comments. Sep 3 2020, 10:06 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	How does the frame creation depend on the function's local variable values? Does the frame size depend on those values?

In D86859#2255833, @modimo wrote: (the comment quoted in full above)

Yeah, I will talk to Bruno later and see what he thinks.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	@hoy My understanding is that you could provide a custom Allocator for the frame creation, which comes from a parameter, which will be moved to a local variable the first thing in the function.

Harbormaster completed remote builds in B70613: Diff 289853. Sep 3 2020, 10:16 PM

hoy added inline comments. Sep 3 2020, 10:18 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	I see. Thanks for the explanation. That sounds like a special use of locals which should have a very short lifetime. For regular local variables, their computation may be able to be deferred until they are allocated on the coroutine frame.

lxfind added inline comments. Sep 3 2020, 10:36 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	I agree. All the cases I have seen so far are Allocas generated for parameters, with only simple casts/GEPs happening before coro.begin. So I hope this patch is good for a while and gives us some time to think about this more systematically.

wenlei added inline comments. Sep 4 2020, 12:11 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	Re "can we move the coro.begin to the beginning of a function and redirect all local references to the coro frame": this seems promising to me. Relying on pointer-escape data-flow analysis for correctness is a bit scary. We could have something like a canonicalization for the sequence: frame allocation, followed by coro.begin, and then user allocas. The idea is still similar to D66230, but without changing the order of the interfering def-use.

hoy added inline comments. Sep 4 2020, 12:20 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
674	Does a load from an alloca need to be rewritten?
679	If the use is the pointer, should the store be rewritten if it is dominated by coro.begin?
694	Is a normal function call needed to be handled?

hoy added inline comments. Sep 4 2020, 12:27 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	Yeah, still trying to figure out what locals could be used before coro.begin. Are the lifetimes used by the coro.begin call all compiler-generated, not user code? If so, the compiler can ensure that all locals are allocated on the coroutine frame before they are used.

lxfind added inline comments. Sep 4 2020, 8:59 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
674	No because it would already have the right value.
679	We only visit instructions before coro.begin, so it won't be dominated by coro.begin.
694	It's already handled in the base class, where it assumes escape.

lxfind added inline comments. Sep 4 2020, 11:13 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	@wenlei @hoy The frame allocation (and hence coro.begin) is supposed to be the first thing that happens in the coroutine function. However, there is one complication. The frame allocation can use a user-defined placement form of `operator new` to allocate the memory, which can take arguments from the function's argument list (refer to https://en.cppreference.com/w/cpp/language/coroutines, the Heap Allocation section). Allocas can then be introduced when passing parameters of the function into the frame allocation call. When other optimization passes are involved, aliases of the allocas for the parameters are sometimes created and then used later, after coro.begin. In theory, the frame allocation call can also have side effects on the parameters, which are already stored in Allocas. So in the current model, without any major redesign, I don't think there is a perfect solution. We can only make a best-effort attempt to copy anything that might have been modified and to recreate any pointer that points to the stack. Assuming that all the complications are introduced by parameter passing, I do think the pointer analysis is powerful enough to handle all reasonable cases. But to make it truly robust, we would need to do something entirely different.

lxfind marked 4 inline comments as done. Sep 4 2020, 11:15 AM

hoy added inline comments. Sep 4 2020, 2:03 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	So sounds like only parameter locations need to be seen by `coro.begin`. We might be able to allow special temporary allocas for the sake of `coro.begin` while allocating real parameter and other locals on the coroutine frame directly.

hoy added inline comments. Sep 4 2020, 2:57 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	Talked offline, current `coro.begin` intrinsic call doesn't kill values crossing it, thus it doesn't prevent optimization passes from setting a value before it and using the value after it. This is also not modeled by any IR intrinsic attribute either. I guess we can assign `coro.begin` an existing attribute to prevent any function call being moved across it to minimize the escaped case. The patch looks a decent workaround to me without introducing fundamental changes to the current coroutine implementation model.

hoy added inline comments. Sep 4 2020, 3:15 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679	Should we check if the use is the left-hand side or right-hand side?

hoy added inline comments. Sep 4 2020, 3:17 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679	Ignore my last comment please. I'm wondering why it's changed from escaped to aborted.

address typo

Harbormaster completed remote builds in B70713: Diff 290053. Sep 4 2020, 6:06 PM

hoy added inline comments. Sep 4 2020, 6:38 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679	So why do we change from `setAborted` to `setEscaped`?

lxfind added inline comments. Sep 4 2020, 9:12 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
679	setAborted causes the visiting process to terminate right after this instruction. That was fine in the previous implementation, because all it needed was to find out whether the alloca escapes or gets modified. But now I also need to track all aliases, which means I have to visit every instruction and cannot abort in the middle.
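The difference between `setAborted` and `setEscaped` described here can be illustrated with a toy traversal. The types are hypothetical and this is not `PtrUseVisitor`; `Cast` stands in for the bitcasts/GEPs that create aliases. Aborting at the first store loses any alias created after it, while recording the write and continuing still collects it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the traversal change, not PtrUseVisitor itself.
enum class UseKind { Load, Store, Cast };  // Cast stands in for bitcast/GEP

struct ToyUse {
  UseKind kind;
  std::string result;  // name of the alias produced by a Cast, else empty
};

struct VisitResult {
  bool may_be_written = false;       // analogous to isEscaped() in the patch
  std::vector<std::string> aliases;  // aliases collected for recreation
};

// Walks the uses in order. With abort_on_store, the walk stops at the first
// store (like PI.setAborted); otherwise it records the write and continues
// (like PI.setEscaped), so later aliases are still collected.
VisitResult visitUses(const std::vector<ToyUse> &uses, bool abort_on_store) {
  VisitResult r;
  for (const ToyUse &u : uses) {
    switch (u.kind) {
    case UseKind::Load:
      break;  // a load reads the current value; nothing to rewrite
    case UseKind::Store:
      r.may_be_written = true;
      if (abort_on_store)
        return r;
      break;
    case UseKind::Cast:
      r.aliases.push_back(u.result);
      break;
    }
  }
  return r;
}
```

With the use sequence load, store, cast, both modes report a possible write, but only the non-aborting walk returns the alias produced by the cast.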

hoy accepted this revision. Sep 7 2020, 8:22 PM

This revision is now accepted and ready to land. Sep 7 2020, 8:22 PM

wenlei added inline comments. Sep 8 2020, 8:50 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	Re "The frame allocation can use a user-defined placement form of operator new to allocate the memory, which can take arguments from the arguments list in the function": OK, so having some sequence before coro.begin may be required for frame allocation on the heap via a custom allocator. But that isn't problematic by itself, as long as nothing from the frame allocation crosses coro.begin, right? And frame allocation should be a well-structured sequence, so I'm wondering what would cause something to be used by both the frame allocation and code after coro.begin. For the case below that you mentioned in the comment, assuming neither %a nor %b is used in frame allocation, would moving coro.begin above them (but still below the frame allocation sequence) solve the problem? I can see that your fix makes things better, but I'm trying to understand its overlap/tradeoff against the move approach (improving Gor's original fix to avoid breaking the def-use chain).
    %a = AllocaInst ...
    %b = call @computeAddress(... %a)

lxfind added inline comments. Sep 8 2020, 9:06 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	So far the only problematic cases I know of are when the parameters of the function are localized through AllocaInst before coro.begin and then used after coro.begin. This can happen when the custom allocator needs to use those parameters, or when some other optimization pass inserts AllocaInst at the front for whatever reason. I don't actually think the counter-example in my comment (%b = call @computeAddress(... %a)) would really happen today; otherwise this solution wouldn't work. I wrote the comment just to note what the current solution cannot cover, in case we hit it in the future. If it does happen, it would still most likely be due to the custom allocator (say the allocation call is inlined, and later some optimization pass makes some of the local variables shared across the coro.begin boundary).
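A rough sketch of why the `computeAddress` case is out of reach: an offset can be accumulated through casts and constant-offset GEPs, but a pointer returned by an opaque call carries no offset information. The types below are hypothetical, the sketch assumes an acyclic (SSA) derivation chain, and it is only an illustration of the idea, not the visitor's actual offset tracking.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Toy model, not LLVM code. Each derived pointer records how it was made.
enum class DeriveKind { Cast, ConstGEP, OpaqueCall };

struct Derivation {
  std::string base;     // pointer it was derived from
  DeriveKind kind;
  std::int64_t offset;  // byte offset for ConstGEP; ignored otherwise
};

// Computes the byte offset of `ptr` from `root` by walking its derivation
// chain (assumed acyclic). Returns std::nullopt if the chain passes through
// an opaque call such as %b = call @computeAddress(... %a).
std::optional<std::int64_t>
offsetFromRoot(const std::unordered_map<std::string, Derivation> &defs,
               const std::string &root, std::string ptr) {
  std::int64_t total = 0;
  while (ptr != root) {
    auto it = defs.find(ptr);
    if (it == defs.end())
      return std::nullopt;  // not derived from root at all
    const Derivation &d = it->second;
    if (d.kind == DeriveKind::OpaqueCall)
      return std::nullopt;  // no way to recover an offset through a call
    if (d.kind == DeriveKind::ConstGEP)
      total += d.offset;
    ptr = d.base;
  }
  return total;
}
```

A cast followed by an 8-byte GEP from `%a` yields offset 8, so that alias could be rebuilt from the frame; `%b` from the opaque call yields no offset and cannot.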

rebase before land

hoy added inline comments. Sep 8 2020, 9:48 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	I'm also wondering whether we should assert on the cases we haven't seen and cannot handle so far, i.e., escaped local addresses before `coro.begin`.

lxfind added inline comments. Sep 8 2020, 9:52 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
634–638	I did consider this, but decided not to, because it's very likely the escape analysis in LLVM is not good enough to tell what could really escape. For instance, the custom allocator function might not be marked as non-escaping even if it won't escape.

Harbormaster completed remote builds in B70952: Diff 290505. Sep 8 2020, 10:01 AM

Closed by commit rG59a467ee4fae: [Coroutine] Make dealing with alloca spills more robust (authored by lxfind). Sep 8 2020, 11:10 AM

This revision was automatically updated to reflect the committed changes.

lxfind added a commit: rG59a467ee4fae: [Coroutine] Make dealing with alloca spills more robust.

Chatted with @lxfind off patch in the morning. A quick summary:

We could still move as many instructions as we can after coro.begin, but defs used by both the heap allocation and code after coro.begin will still require alias tracking for the things that cannot be moved.

However, on top of the alias tracking implemented in this patch, moving things after coro.begin could handle some corner cases where alias tracking may fail. Let's have this in first, and we can add the instruction moving as an optimization later.

Thanks for working on this!
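The "move what we can after coro.begin" idea from this summary can be sketched as a partition of the instructions before coro.begin. This is a toy model under loud assumptions: `PreInst` and `sinkableAfterCoroBegin` are hypothetical, the allocation sequence is given as a set of root names, and the legality checks a real pass would need (side effects, memory dependences) are ignored. Anything that must stay but is also used after coro.begin would still need this patch's alias tracking.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model, not LLVM code: instructions that precede coro.begin in one block,
// listed in program order (operands are always defined earlier).
struct PreInst {
  std::string name;
  std::vector<std::string> operands;
};

// Returns the instructions that could be sunk below coro.begin: everything
// except the allocation sequence (alloc_roots, a hypothetical input) and what
// it transitively depends on.
std::vector<std::string>
sinkableAfterCoroBegin(const std::vector<PreInst> &insts,
                       const std::set<std::string> &alloc_roots) {
  std::set<std::string> must_stay(alloc_roots);
  // Walk backwards so every user is seen before the values it uses.
  for (std::size_t i = insts.size(); i-- > 0;) {
    const PreInst &inst = insts[i];
    if (must_stay.count(inst.name))
      for (const std::string &op : inst.operands)
        must_stay.insert(op);
  }
  std::vector<std::string> sinkable;
  for (const PreInst &inst : insts)
    if (!must_stay.count(inst.name))
      sinkable.push_back(inst.name);
  return sinkable;
}
```

If the allocation `%mem` uses only `%p`, then `%a` and a cast `%c` of it can move below coro.begin, and no alias of `%a` crosses the boundary.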

junparser mentioned this in D89768: [Coroutine] Properly determine whether an alloca should live on the frame. Oct 27 2020, 7:53 AM

lxfind mentioned this in D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function. Apr 13 2021, 3:26 PM

Revision Contents

Path                                                  Size
llvm/lib/Transforms/Coroutines/CoroFrame.cpp          121 lines
llvm/test/Transforms/Coroutines/coro-param-copy.ll    57 lines

Diff 290533

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

[619 unchanged lines not shown; context: static StructType *buildFrameType(Function &F, coro::Shape &Shape, ...]
}		}

return FrameTy;		return FrameTy;
}		}

// We use a pointer use visitor to discover if there are any writes into an		// We use a pointer use visitor to discover if there are any writes into an
// alloca that dominates CoroBegin. If that is the case, insertSpills will copy		// alloca that dominates CoroBegin. If that is the case, insertSpills will copy
// the value from the alloca into the coroutine frame spill slot corresponding		// the value from the alloca into the coroutine frame spill slot corresponding
// to that alloca.		// to that alloca. We also collect any alias pointing to the alloca created
		// before CoroBegin but used after CoroBegin. These alias will be recreated
		// after CoroBegin from the frame address so that latter references are
		// pointing to the frame instead of the stack.
		// Note: We are repurposing PtrUseVisitor's isEscaped() to mean whether the
		// pointer is potentially written into.
		// TODO: If the pointer is really escaped, we are in big trouble because we
		// will be escaping a pointer to a stack address that would no longer exist
		// soon. However most escape analysis isn't good enough to precisely tell,
		// so we are assuming that if a pointer is escaped that it's written into.
		// TODO: Another potential issue is if we are creating an alias through
		// a function call, e.g:
		// %a = AllocaInst ...
		// %b = call @computeAddress(... %a)
		// If %b is an alias of %a and will be used after CoroBegin, this will be broken
		// and there is nothing we can do about it.
namespace {		namespace {
struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {		struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {
using Base = PtrUseVisitor<AllocaUseVisitor>;		using Base = PtrUseVisitor<AllocaUseVisitor>;
AllocaUseVisitor(const DataLayout &DL, const DominatorTree &DT,		AllocaUseVisitor(const DataLayout &DL, const DominatorTree &DT,
const CoroBeginInst &CB)		const CoroBeginInst &CB)
: PtrUseVisitor(DL), DT(DT), CoroBegin(CB) {}		: PtrUseVisitor(DL), DT(DT), CoroBegin(CB) {}

// We are only interested in uses that dominate coro.begin.		// We are only interested in uses that's not dominated by coro.begin.
void visit(Instruction &I) {		void visit(Instruction &I) {
if (DT.dominates(&I, &CoroBegin))		if (!DT.dominates(&CoroBegin, &I))
Base::visit(I);		Base::visit(I);
}		}
// We need to provide this overload as PtrUseVisitor uses a pointer based		// We need to provide this overload as PtrUseVisitor uses a pointer based
// visiting function.		// visiting function.
void visit(Instruction *I) { return visit(*I); }		void visit(Instruction *I) { return visit(*I); }

void visitLoadInst(LoadInst &) {} // Good. Nothing to do.		// We cannot handle PHI node and SelectInst because they could be selecting
		// between two addresses that point to different Allocas.
		void visitPHINode(PHINode &I) {
		assert(!usedAfterCoroBegin(I) &&
		"Unable to handle PHI node of aliases created before CoroBegin but "
		"used after CoroBegin");
		}

		void visitSelectInst(SelectInst &I) {
		assert(!usedAfterCoroBegin(I) &&
		"Unable to handle Select of aliases created before CoroBegin but "
		"used after CoroBegin");
		}

		void visitLoadInst(LoadInst &) {}
		hoy: Does a load from an alloca need to be rewritten?
		lxfind (Author): No, because it would already have the right value.

// If the use is an operand, the pointer escaped and anything can write into		// If the use is an operand, the pointer escaped and anything can write into
// that memory. If the use is the pointer, we are definitely writing into the		// that memory. If the use is the pointer, we are definitely writing into the
// alloca and therefore we need to copy.		// alloca and therefore we need to copy.
void visitStoreInst(StoreInst &SI) { PI.setAborted(&SI); }		void visitStoreInst(StoreInst &SI) { PI.setEscaped(&SI); }
		hoy: If the use is the pointer, should the store be rewritten if it is dominated by coro.begin?
		lxfind (Author): We only visit instructions before coro.begin, so it won't be dominated by coro.begin.
		hoy: Should we check if the use is the left-hand side or right-hand side?
		hoy: Please ignore my last comment. I'm wondering why it's changed from escaped to aborted.
		hoy: So why do we change from `setAborted` to `setEscaped`?
		lxfind (Author): `setAborted` causes the visiting process to terminate right after this instruction. That was fine in the previous implementation, because all it needed was to find out whether the pointer escapes or gets modified. But now I also need to track all aliases, which means I have to visit every instruction and cannot abort in the middle.
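To illustrate the difference lxfind describes, here is a minimal C++ sketch that does not use LLVM. `ToyUse`, `ToyWalkResult`, and `walkUses` are hypothetical names, and the real PtrUseVisitor walks a worklist of uses through the use-def graph rather than a flat list. The only point shown is that stopping at the first write (the old `setAborted` behavior) can miss an alias that appears later, whereas recording the write and continuing (the new `setEscaped` behavior) still collects it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of a pointer-use walk: the uses of
// the alloca (and of anything derived from it) are given as a flat list.
enum class ToyUseKind { Load, Store, Call, Cast, Gep };

struct ToyUse {
  ToyUseKind Kind;
  std::string Name;
};

struct ToyWalkResult {
  bool Escaped = false;             // some use may modify the memory
  bool Aborted = false;             // the walk stopped early
  std::vector<std::string> Aliases; // casts/GEPs reached during the walk
};

// AbortOnWrite == true mimics the old behavior (abort on the first store or
// call); false mimics the new one (mark as escaped, then keep visiting so
// that later aliases are still collected).
ToyWalkResult walkUses(const std::vector<ToyUse> &Uses, bool AbortOnWrite) {
  ToyWalkResult R;
  for (const ToyUse &U : Uses) {
    switch (U.Kind) {
    case ToyUseKind::Load:
      break; // a load cannot change the stored value
    case ToyUseKind::Store:
    case ToyUseKind::Call:
      if (AbortOnWrite) {
        R.Aborted = true;
        return R; // nothing after this use is visited
      }
      R.Escaped = true;
      break;
    case ToyUseKind::Cast:
    case ToyUseKind::Gep:
      R.Aliases.push_back(U.Name);
      break;
    }
  }
  return R;
}
```

With the uses `call @use` followed by a cast `%x.cast`, the aborting walk returns no aliases, while the continuing walk reports the call as an escape and still records `%x.cast`.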

		// All mem intrinsics modify the data.
		void visitMemIntrinsic(MemIntrinsic &MI) { PI.setEscaped(&MI); }

		void visitBitCastInst(BitCastInst &BC) {
		Base::visitBitCastInst(BC);
		handleAlias(BC);
		}

		void visitAddrSpaceCastInst(AddrSpaceCastInst &ASC) {
		Base::visitAddrSpaceCastInst(ASC);
		handleAlias(ASC);
		}

// Any other instruction that is not filtered out by PtrUseVisitor, will		void visitGetElementPtrInst(GetElementPtrInst &GEPI) {
		hoy: Does a normal function call need to be handled?
		lxfind (Author): It's already handled in the base class, which assumes the pointer escapes.
// result in the copy.		// The base visitor will adjust Offset accordingly.
void visitInstruction(Instruction &I) { PI.setAborted(&I); }		Base::visitGetElementPtrInst(GEPI);
		handleAlias(GEPI);
		}

		const SmallVector<std::pair<Instruction *, APInt>, 1> &getAliases() const {
		return Aliases;
		}

private:		private:
const DominatorTree &DT;		const DominatorTree &DT;
const CoroBeginInst &CoroBegin;		const CoroBeginInst &CoroBegin;
};		// All aliases to the original AllocaInst that are used after CoroBegin.
} // namespace		// Each entry contains the instruction and the offset in the original Alloca.
static bool mightWriteIntoAllocaPtr(AllocaInst &A, const DominatorTree &DT,		SmallVector<std::pair<Instruction *, APInt>, 1> Aliases{};
const CoroBeginInst &CB) {
const DataLayout &DL = A.getModule()->getDataLayout();		bool usedAfterCoroBegin(Instruction &I) {
AllocaUseVisitor Visitor(DL, DT, CB);		for (auto &U : I.uses())
auto PtrI = Visitor.visitPtr(A);		if (DT.dominates(&CoroBegin, U))
if (PtrI.isEscaped() || PtrI.isAborted()) {
auto *PointerEscapingInstr = PtrI.getEscapingInst()
? PtrI.getEscapingInst()
: PtrI.getAbortingInst();
if (PointerEscapingInstr) {
LLVM_DEBUG(
dbgs() << "AllocaInst copy was triggered by instruction: "
<< *PointerEscapingInstr << "\n");
}
return true;		return true;
}
return false;		return false;
}		}

		void handleAlias(Instruction &I) {
		if (!usedAfterCoroBegin(I))
		return;

		assert(IsOffsetKnown && "Can only handle alias with known offset created "
		"before CoroBegin and used after");
		Aliases.emplace_back(&I, Offset);
		}
		};
		} // namespace
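The alias bookkeeping in `handleAlias` can be pictured with a toy model: each derived pointer (a bitcast, or a GEP with constant indices) adds a known byte offset to its base, and only pointers that still have uses after coro.begin are kept, paired with their cumulative offset into the alloca. This is a hypothetical sketch; `ToyPtr`, `ToyAlias`, and `collectAliases` are illustrative names, whereas in the patch the offset is the `APInt Offset` that PtrUseVisitor maintains, with `IsOffsetKnown` asserted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy IR: each derived pointer names the pointer it is derived
// from (-1 means the alloca itself) and the constant byte offset it adds.
// A bitcast adds 0; a GEP with constant indices adds its byte offset.
struct ToyPtr {
  std::string Name;
  int Base;
  std::int64_t ByteOffset;
  bool UsedAfterCoroBegin;
};

// An alias that must be recreated after coro.begin, together with its
// offset from the start of the alloca.
struct ToyAlias {
  std::string Name;
  std::int64_t OffsetInAlloca;
};

// Pointers must be listed in definition order, so each base is processed
// before anything derived from it.
std::vector<ToyAlias> collectAliases(const std::vector<ToyPtr> &Ptrs) {
  std::vector<std::int64_t> Offset(Ptrs.size(), 0);
  std::vector<ToyAlias> Aliases;
  for (std::size_t I = 0; I < Ptrs.size(); ++I) {
    const ToyPtr &P = Ptrs[I];
    std::int64_t BaseOffset =
        P.Base < 0 ? 0 : Offset[static_cast<std::size_t>(P.Base)];
    Offset[I] = BaseOffset + P.ByteOffset;
    // Like handleAlias: only aliases still used after coro.begin matter.
    if (P.UsedAfterCoroBegin)
      Aliases.push_back({P.Name, Offset[I]});
  }
  return Aliases;
}
```

For a chain `%p.cast` (bitcast, +0), then `%p.field` (+8), then `%p.elt` (+4), where only `%p.elt` is used after coro.begin, the result is a single alias at byte offset 12.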

// We need to make room to insert a spill after initial PHIs, but before		// We need to make room to insert a spill after initial PHIs, but before
// catchswitch instruction. Placing it before violates the requirement that		// catchswitch instruction. Placing it before violates the requirement that
// catchswitch, like all other EHPads must be the first nonPHI in a block.		// catchswitch, like all other EHPads must be the first nonPHI in a block.
//		//
// Split away catchswitch into a separate block and insert in its place:		// Split away catchswitch into a separate block and insert in its place:
//		//
// cleanuppad <InsertPt> cleanupret.		// cleanuppad <InsertPt> cleanupret.
//		//
[257 unchanged lines not shown]
	if (!UsersToUpdate.empty()) {
for (Instruction *I : UsersToUpdate)		for (Instruction *I : UsersToUpdate)
I->replaceUsesOfWith(A, G);		I->replaceUsesOfWith(A, G);
}		}
}		}
// If we discovered such uses not dominated by CoroBegin, see if any of them		// If we discovered such uses not dominated by CoroBegin, see if any of them
// precede coro.begin and have instructions that can modify the		// precede coro.begin and have instructions that can modify the
// value of the alloca and therefore would require copying the value into		// value of the alloca and therefore would require copying the value into
// the spill slot in the coroutine frame.		// the spill slot in the coroutine frame.
if (MightNeedToCopy) {		if (MightNeedToCopy) {
		ChuanqiXu: Consider this case:
		%a = alloca ...
		%b = bitcast %a to ...
		@coro.begin
		// some use of %b but no use of %a
		In this case, all the uses of `%a` are dominated by coro.begin, so `MightNeedToCopy` may not be true, and then this code won't be executed.
		lxfind (Author): MightNeedToCopy will be true if any use of the AllocaInst is not dominated by coro.begin. So in your example, since %b uses %a and comes before coro.begin, it is not dominated by coro.begin, and hence MightNeedToCopy will be true.
		ChuanqiXu: Yes, you are right.
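lxfind's answer hinges on the fact that `MightNeedToCopy` looks at the direct users of the AllocaInst, and the bitcast `%b` is itself such a user placed before coro.begin. A minimal sketch, assuming a single basic block so that dominance reduces to instruction order; `ToyInst`, `positionOf`, and `mightNeedToCopy` are hypothetical and only model this check, not LLVM's DominatorTree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: one basic block, so "X dominates Y" reduces to
// "X appears no later than Y". Each instruction lists the names it uses.
struct ToyInst {
  std::string Name;
  std::vector<std::string> Operands;
};

// Position of the instruction called Name, or -1 if it is not present.
static int positionOf(const std::vector<ToyInst> &BB, const std::string &Name) {
  for (std::size_t I = 0; I < BB.size(); ++I)
    if (BB[I].Name == Name)
      return static_cast<int>(I);
  return -1;
}

// True if some direct user of Alloca is not dominated by CoroBegin, which in
// this single-block model means the user comes before coro.begin.
bool mightNeedToCopy(const std::vector<ToyInst> &BB, const std::string &Alloca,
                     const std::string &CoroBegin) {
  const int CB = positionOf(BB, CoroBegin);
  for (std::size_t I = 0; I < BB.size(); ++I)
    for (const std::string &Op : BB[I].Operands)
      if (Op == Alloca && static_cast<int>(I) < CB)
        return true;
  return false;
}
```

In ChuanqiXu's example the bitcast `%b` precedes `%hdl` (standing in for coro.begin), so the function returns true; if the bitcast were moved after coro.begin, it would return false.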
Builder.SetInsertPoint(FramePtr->getNextNode());		Builder.SetInsertPoint(FramePtr->getNextNode());

for (auto &P : Allocas) {		for (auto &P : Allocas) {
AllocaInst *const A = P.first;		AllocaInst *const A = P.first;
if (mightWriteIntoAllocaPtr(A, DT, CB)) {		AllocaUseVisitor Visitor(A->getModule()->getDataLayout(), DT, *CB);
		auto PtrI = Visitor.visitPtr(*A);
		assert(!PtrI.isAborted());
		if (PtrI.isEscaped()) {
		// isEscaped really means potentially modified before CoroBegin.
if (A->isArrayAllocation())		if (A->isArrayAllocation())
report_fatal_error(		report_fatal_error(
"Coroutines cannot handle copying of array allocas yet");		"Coroutines cannot handle copying of array allocas yet");

auto *G = GetFramePointer(P.second, A);		auto *G = GetFramePointer(P.second, A);
auto *Value = Builder.CreateLoad(A->getAllocatedType(), A);		auto *Value = Builder.CreateLoad(A->getAllocatedType(), A);
Builder.CreateStore(Value, G);		Builder.CreateStore(Value, G);
}		}
		// For each alias to Alloca created before CoroBegin but used after
		// CoroBegin, we recreate them after CoroBegin by applying the offset
		// to the pointer in the frame.
		for (const auto &Alias : Visitor.getAliases()) {
		auto *FramePtr = GetFramePointer(P.second, A);
		auto *FramePtrRaw =
		Builder.CreateBitCast(FramePtr, Type::getInt8PtrTy(C));
		auto *AliasPtr = Builder.CreateGEP(
		FramePtrRaw, ConstantInt::get(Type::getInt64Ty(C), Alias.second));
		auto *AliasPtrTyped =
		Builder.CreateBitCast(AliasPtr, Alias.first->getType());
		Alias.first->replaceUsesWithIf(
		AliasPtrTyped, [&](Use &U) { return DT.dominates(CB, U); });
		}
}		}
}		}
return FramePtr;		return FramePtr;
}		}
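The aliases must be recreated because of the scenario from the summary: a pointer derived from the alloca before coro.begin still points at the stack copy, so a write through it after coro.begin never reaches the frame slot. The sketch below models the stack alloca and its frame slot as two 8-byte buffers; `ToyStorage`, `recreateAlias`, and `writeAfterCoroBegin` are hypothetical names. `recreateAlias` mirrors the patch's recipe (the frame slot address viewed as `i8*`, plus the recorded byte offset), but it is only a model, not the IRBuilder code above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy: an i64 alloca on the stack and its slot in the coroutine
// frame, both modeled as zero-initialized 8-byte buffers.
struct ToyStorage {
  std::array<unsigned char, 8> Stack{};
  std::array<unsigned char, 8> FrameSlot{};
};

// Mirrors the patch's recipe: take the frame slot, view it as raw bytes
// (i8*), and add the byte offset recorded for the alias before coro.begin.
unsigned char *recreateAlias(ToyStorage &S, std::ptrdiff_t Offset) {
  return S.FrameSlot.data() + Offset;
}

// Simulates `memset(alias, 1, 4)` after coro.begin, either through the stale
// alias created before coro.begin (which still points into the stack copy)
// or through the alias recreated from the frame slot.
void writeAfterCoroBegin(ToyStorage &S, std::ptrdiff_t Offset, bool Recreate) {
  unsigned char *Alias =
      Recreate ? recreateAlias(S, Offset) : S.Stack.data() + Offset;
  std::memset(Alias, 1, 4);
}
```

Writing 4 bytes through the stale alias leaves the frame slot untouched, while writing through the recreated alias updates it. This corresponds to the `memset` on `%y.cast` in coro-param-copy.ll, which the CHECK lines expect to use the recreated frame address.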

// Sets the unwind edge of an instruction to a particular successor.		// Sets the unwind edge of an instruction to a particular successor.
static void setUnwindEdgeTo(Instruction *TI, BasicBlock *Succ) {		static void setUnwindEdgeTo(Instruction *TI, BasicBlock *Succ) {
if (auto *II = dyn_cast<InvokeInst>(TI))		if (auto *II = dyn_cast<InvokeInst>(TI))
[811 unchanged lines not shown]

llvm/test/Transforms/Coroutines/coro-param-copy.ll

	; Check that we copy the data from the alloca into the coroutine			; Check that we copy the data from the alloca into the coroutine
	; frame slot if it was written to.			; frame slot if it was written to.
	; RUN: opt < %s -coro-split -S | FileCheck %s			; RUN: opt < %s -coro-split -S | FileCheck %s
	; RUN: opt < %s -passes=coro-split -S | FileCheck %s			; RUN: opt < %s -passes=coro-split -S | FileCheck %s

	define i8* @f() "coroutine.presplit"="1" {			define i8* @f() "coroutine.presplit"="1" {
	entry:			entry:
				%a.addr = alloca i64 ; read-only before coro.begin
				%a = load i64, i64* %a.addr ; cannot modify the value, don't need to copy

	%x.addr = alloca i64			%x.addr = alloca i64
	call void @use(i64* %x.addr) ; might write to %x			call void @use(i64* %x.addr) ; uses %x.addr before coro.begin

	%y.addr = alloca i64			%y.addr = alloca i64
	%y = load i64, i64* %y.addr ; cannot modify the value, don't need to copy			%y.cast = bitcast i64* %y.addr to i8* ; alias created and used after coro.begin
	call void @print(i64 %y)
				%z.addr = alloca i64
				%flag = call i1 @check()
				br i1 %flag, label %flag_true, label %flag_merge

				flag_true:
				call void @use(i64* %z.addr) ; conditionally used %z.addr
				br label %flag_merge

				flag_merge:
	%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)			%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
	%size = call i32 @llvm.coro.size.i32()			%size = call i32 @llvm.coro.size.i32()
	%alloc = call i8* @myAlloc(i64 %y, i32 %size)			%alloc = call i8* @myAlloc(i32 %size)
	%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)			%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				call void @llvm.memset.p0i8.i32(i8* %y.cast, i8 1, i32 4, i1 false)
	%0 = call i8 @llvm.coro.suspend(token none, i1 false)			%0 = call i8 @llvm.coro.suspend(token none, i1 false)
	switch i8 %0, label %suspend [i8 0, label %resume			switch i8 %0, label %suspend [i8 0, label %resume
	i8 1, label %cleanup]			i8 1, label %cleanup]
	resume:			resume:
				call void @use(i64* %a.addr)
	call void @use(i64* %x.addr)			call void @use(i64* %x.addr)
	call void @use(i64* %y.addr)			call void @use(i64* %y.addr)
				call void @use(i64* %z.addr)
	br label %cleanup			br label %cleanup

	cleanup:			cleanup:
	%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)			%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
	call void @free(i8* %mem)			call void @free(i8* %mem)
	br label %suspend			br label %suspend
	suspend:			suspend:
	call i1 @llvm.coro.end(i8* %hdl, i1 0)			call i1 @llvm.coro.end(i8* %hdl, i1 0)
	ret i8* %hdl			ret i8* %hdl
	}			}

	; See that we added both x and y to the frame.			; See that we added both x and y to the frame.
	; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i64, i1 }			; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i64, i64, i64, i1 }

	; See that all of the uses prior to coro-begin stay put.			; See that all of the uses prior to coro-begin stay put.
	; CHECK-LABEL: define i8* @f() {			; CHECK-LABEL: define i8* @f() {
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: %a.addr = alloca i64
	; CHECK-NEXT: %x.addr = alloca i64			; CHECK-NEXT: %x.addr = alloca i64
	; CHECK-NEXT: call void @use(i64* %x.addr)			; CHECK-NEXT: call void @use(i64* %x.addr)
	; CHECK-NEXT: %y.addr = alloca i64			; CHECK-NEXT: %y.addr = alloca i64
	; CHECK-NEXT: %y = load i64, i64* %y.addr			; CHECK-NEXT: %z.addr = alloca i64
	; CHECK-NEXT: call void @print(i64 %y)

	; See that we only copy the x as y was not modified prior to coro.begin.			; See that we only copy the x as y was not modified prior to coro.begin.
	; CHECK: store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr			; CHECK: store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
	; CHECK-NEXT: %0 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2			; The next 3 instructions are to copy data in %x.addr from stack to frame.
	; CHECK-NEXT: %1 = load i64, i64* %x.addr			; CHECK-NEXT: %0 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 3
	; CHECK-NEXT: store i64 %1, i64* %0			; CHECK-NEXT: %1 = load i64, i64* %x.addr, align 4
	; CHECK-NEXT: %index.addr1 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4			; CHECK-NEXT: store i64 %1, i64* %0, align 4
	; CHECK-NEXT: store i1 false, i1* %index.addr1			; The next 2 instructions are to recreate %y.cast in the original IR.
				; CHECK-NEXT: %2 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4
				; CHECK-NEXT: %3 = bitcast i64* %2 to i8*
				; The next 3 instructions are to copy data in %z.addr from stack to frame.
				; CHECK-NEXT: %4 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
				; CHECK-NEXT: %5 = load i64, i64* %z.addr, align 4
				; CHECK-NEXT: store i64 %5, i64* %4, align 4
				; CHECK-NEXT: call void @llvm.memset.p0i8.i32(i8* %3, i8 1, i32 4, i1 false)
				; CHECK-NEXT: %index.addr1 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 6
				; CHECK-NEXT: store i1 false, i1* %index.addr1, align 1
	; CHECK-NEXT: ret i8* %hdl			; CHECK-NEXT: ret i8* %hdl


	declare i8* @llvm.coro.free(token, i8*)			declare i8* @llvm.coro.free(token, i8*)
	declare i32 @llvm.coro.size.i32()			declare i32 @llvm.coro.size.i32()
	declare i8 @llvm.coro.suspend(token, i1)			declare i8 @llvm.coro.suspend(token, i1)
	declare void @llvm.coro.resume(i8*)			declare void @llvm.coro.resume(i8*)
	declare void @llvm.coro.destroy(i8*)			declare void @llvm.coro.destroy(i8*)

	declare token @llvm.coro.id(i32, i8*, i8*, i8*)			declare token @llvm.coro.id(i32, i8*, i8*, i8*)
	declare i1 @llvm.coro.alloc(token)			declare i1 @llvm.coro.alloc(token)
	declare i8* @llvm.coro.begin(token, i8*)			declare i8* @llvm.coro.begin(token, i8*)
	declare i1 @llvm.coro.end(i8*, i1)			declare i1 @llvm.coro.end(i8*, i1)

	declare noalias i8* @myAlloc(i64, i32)			declare void @llvm.memset.p0i8.i32(i8*, i8, i32, i1)
	declare void @print(i64)
				declare noalias i8* @myAlloc(i32)
	declare void @use(i64*)			declare void @use(i64*)
	declare void @free(i8*)			declare void @free(i8*)
				declare i1 @check()