This is an archive of the discontinued LLVM Phabricator instance.

[coroutine] Fixes "cannot move instruction since its users are not dominated by CoroBegin" problem.
ClosedPublic

Authored by GorNishanov on Aug 14 2019, 10:14 AM.

Details

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=36578 and https://bugs.llvm.org/show_bug.cgi?id=36296.
Supersedes: https://reviews.llvm.org/D55966

One of the fundamental transformations that the CoroSplit pass performs before splitting the coroutine is to find which values need to survive between suspend and resume and to provide a slot for them in the coroutine frame, so the values can be spilled and restored as needed.
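For example (a minimal sketch; @compute and @consume are made-up helpers), a value that is live across a suspend point must be given a slot in the frame:

  %x = call i32 @compute()                        ; computed in the ramp
  %sp = call i8 @llvm.coro.suspend(token none, i1 false)
  switch i8 %sp, label %suspend [i8 0, label %resume
                                 i8 1, label %cleanup]
resume:
  call void @consume(i32 %x)                      ; %x is live across the suspend,
                                                  ; so CoroSplit spills it into the frame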

The coroutine frame becomes available once the storage for it has been allocated; that point is marked in the pre-split coroutine with the llvm.coro.begin intrinsic.

The FE normally places all of the user-authored code that accesses those values after llvm.coro.begin; however, instructions accessing them can sometimes end up prior to coro.begin. Examples are the FE-generated store of a parameter into its alloca, or instructions added by optimization passes such as SROA when it rewrites allocas.

Prior to this change, the CoroSplit pass would try to move instructions that may end up accessing values in the coroutine frame to a point after CoroBegin. However, it would run into problems (report_fatal_error) if some of the values were used both in the allocation code (for example, when an allocator is passed as a parameter to a coroutine) and in the user-authored body of the coroutine.
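For illustration, here is the shape of IR that used to trigger the report_fatal_error (a hypothetical sketch; %AllocatorTy, @my_allocate and @use_allocator are made up): the alloca has a user before coro.begin (the allocation call), so it cannot simply be moved after coro.begin.

  %alloc.args = alloca %AllocatorTy               ; holds the allocator parameter
  ; ... initialize %alloc.args from the coroutine's allocator parameter ...
  %mem = call i8* @my_allocate(%AllocatorTy* %alloc.args, i32 %size)
  %f = call i8* @llvm.coro.begin(token %id, i8* %mem)
  call void @use_allocator(%AllocatorTy* %alloc.args)   ; user-authored body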

To handle this case and to simplify the instruction-moving logic, this change removes instruction moving altogether. Instead, we rewrite only those uses of the spilled values that are dominated by coro.begin and leave the other instructions intact.

Before:

%var = alloca i32
%1 = getelementptr .. %var ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(

After:

%var = alloca i32
%1 = getelementptr .. %var ; stays put
%f = call i8* @llvm.coro.begin(

If we discover that there is a potential write into an alloca prior to coro.begin, we copy its value from the alloca into the spill slot in the coroutine frame right after coro.begin.

Before:

%var = alloca i32
store .. %var ; will move this one after coro.begin
%f = call i8* @llvm.coro.begin(

After:

%var = alloca i32
store .. %var ; stays put
%f = call i8* @llvm.coro.begin(
%tmp = load %var
store %tmp, %spill.slot.for.var

Note: This change does not handle array allocas, as the C++ FE does not produce them, but support can be added in the future if the need arises.
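To make the rewrite concrete, here is a rough sketch of the resulting IR; the frame type %foo.Frame and the field index are hypothetical, since the actual layout is computed by CoroFrame:

  ; assume %foo.Frame has an i32 field at index 2 that serves as the spill slot for %var
  %var = alloca i32
  store i32 0, i32* %var                          ; pre-coro.begin write, stays put
  %f = call i8* @llvm.coro.begin(token %id, i8* %alloc)
  %FramePtr = bitcast i8* %f to %foo.Frame*
  %var.spill.addr = getelementptr inbounds %foo.Frame, %foo.Frame* %FramePtr, i32 0, i32 2
  %tmp = load i32, i32* %var                      ; copy the possibly written value
  store i32 %tmp, i32* %var.spill.addr            ; into its spill slot in the frame
  ; uses of %var dominated by coro.begin are redirected to %var.spill.addr;
  ; uses before coro.begin keep referring to the original alloca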

Diff Detail

Repository
rL LLVM

Event Timeline

GorNishanov created this revision. Aug 14 2019, 10:14 AM
GorNishanov edited the summary of this revision. Aug 14 2019, 10:28 AM
modocache accepted this revision. Aug 14 2019, 12:24 PM

Thanks for looking into this, @GorNishanov! LGTM aside from a tiny nit-pick.

lib/Transforms/Coroutines/CoroFrame.cpp
476 (On Diff #215156)

A nit-pick, but: it seems like most other variable names in this function are capitalized (e.g.: PtrI), but this one is a lowercase v. Is there a reason they're not consistent?

This revision is now accepted and ready to land. Aug 14 2019, 12:24 PM

Thank you very much for the review, Brian!
Nit addressed. Preparing to land.

GorNishanov added a reviewer: rjmccall.
GorNishanov marked an inline comment as done.

Merged with the latest CoroFrame changes from @rjmccall.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Aug 14 2019, 5:51 PM

Seems reasonable to me. If we're no longer imposing any *restrictions* for @llvm.coro.begin, is it serving any real purpose? Is it just a way of getting a handle to the coroutine frame?

I think the original reason for coro.begin is intact. LLVM's Coroutines.rst says: "Depending on the alignment requirements of the objects in the coroutine frame and/or on the codegen compactness reasons the pointer returned from coro.begin may be at offset to the %mem argument."

The C++ FE obtains memory for the coroutine frame in some way and then gives it to coro.begin, saying: "Here. Use this for the coroutine frame." Lowering of coro.begin may simply return the same pointer, or it may do some adjustment to it. One possible adjustment: if there are any over-aligned variables whose alignment is larger than what the allocator (supplied to coro.id) guarantees, coro.begin would do (NOT YET IMPLEMENTED) a dynamic adjustment to make sure everything is properly aligned.
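Purely as an illustration of that (not yet implemented) idea, with made-up alignment constants, such a dynamic adjustment could conceptually look like this:

  ; Illustration only: suppose the frame needs 32-byte alignment but the
  ; allocator only guarantees 16; round %mem up to the next 32-byte boundary
  ; (the allocation itself would also have to be padded accordingly).
  %raw   = ptrtoint i8* %mem to i64
  %bump  = add i64 %raw, 31
  %fixed = and i64 %bump, -32
  %frame = inttoptr i64 %fixed to i8*             ; what coro.begin would return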

That reminded me: there was a brief discussion of a common C++ coroutine ABI at Cologne, and I don't think the results were shared with the interested parties. I'll send an e-mail shortly.

Is there a case you're imagining where the adjustment would have side-effects? Because I can see a reason to have an intrinsic that returns a frame pointer, but I don't see why that intrinsic would have any of the restrictions of @llvm.coro.begin.

Which restrictions are you thinking about?
Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.
If you are thinking about the attributes on the intrinsic itself, those probably can be relaxed.

Which restrictions are you thinking about?

"A frontend should emit exactly one coro.begin intrinsic per coroutine."

Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.

Does it? Why? There already has to be a single coro.id in the coroutine, and that's the intrinsic that takes useful information. coro.begin just has a bunch of requirements and no clear purpose except to return the frame, which could just as well be done with a duplicable intrinsic. Frame allocation has to happen in the ramp prologue anyway, and the underlying observation of this particular patch is that trying to move coro.begin to reflect its logical order in the function is just a lot of complication for no clear purpose. And honestly the frame pointer is not usually a useful thing to expose in a coroutine representation; aspects of frame layout being exposed in the ABI is a property of the switch lowering, not coroutines in general.

John.

aspects of frame layout being exposed in the ABI is a property of the switch lowering, not coroutines in general.

You may be right. Given that we now have two frontends targeting LLVM Coroutines, some refactoring may be in order. I need to study the Swift approach more before I can form an opinion.

Marking it as NoDuplicate in CoroEarly helps simplify the CoroSplit logic.

Does it? Why? There already has to be a single coro.id in the coroutine, and that's the intrinsic that takes useful information. coro.begin just has a bunch of requirements and no clear purpose except to return the frame, which could just as well be done with a duplicable intrinsic.

Here is how it looks to me: the C++ frontend emits the following structure (simplified):

%id = coro.id(stuff)
%mem = SomeAllocCodeCouldBeAnything(stuff)
%frame = coro.begin(%mem)

coro.begin "blesses" the memory as the one to be used for the coroutine frame; it therefore should dominate any possible uses of data that go into the coroutine frame, and it gives a convenient place to dump spills, copies, etc.

If we did not allow arbitrary allocation logic in C++, there would be no need for coro.begin at all.
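For reference, a slightly fuller version of that structure, along the lines of the example in Coroutines.rst (with @malloc standing in for whatever allocation logic the frontend emits), looks roughly like this:

entry:
  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
  %need.alloc = call i1 @llvm.coro.alloc(token %id)
  br i1 %need.alloc, label %dyn.alloc, label %coro.start

dyn.alloc:
  %size = call i32 @llvm.coro.size.i32()
  %mem = call i8* @malloc(i32 %size)              ; arbitrary user allocation logic goes here
  br label %coro.start

coro.start:
  %phi = phi i8* [ null, %entry ], [ %mem, %dyn.alloc ]
  %frame = call i8* @llvm.coro.begin(token %id, i8* %phi)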

Gor

Ah, okay, I see. Yes, if there has to be inlined code to do the allocation, then something like coro.begin does seem necessary; although of course the danger is that that arbitrary code — if nothing else, after inlining — might do something that we naively think needs the coroutine frame. For example, if the allocation called a user-defined allocation function, and we inlined that call before splitting the coroutine, and the inlined code contained an alloca (scoped to the call, of course, but we haven't taught CoroFrame to optimize based on alloca lifetimes yet), that would presumably cause serious problems for lowering because the alloca would have uses prior to coro.begin.

How arbitrary can allocation really be? If it can be emitted in a separate function call that can be provided abstractly to coro.id, then coroutine lowering can just insert a call to that function (or whatever more complicated pattern is necessary to enable static elimination of the allocation) and then trigger further optimization/inlining as necessary to optimize that new call. If we need to handle exceptions out of it, then maybe we can make coro.id non-nounwind. (We can almost get away with just saying "allocation is the first thing the function does, so it never happens in an interesting EH context." Unfortunately, there are some features/ABIs that require parameters to be destroyed in the callee, which can mean that everything in the function is in an interesting EH context, even generated code in the prologue.)

lxfind added a subscriber: lxfind. Edited Aug 30 2020, 9:54 AM

@GorNishanov This fix seems problematic.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1

After this fix, %1 now stays put; however, if a store through it happens after coro.begin and hence modifies the contents, that modification will not be reflected in the coroutine frame (and will eventually be DCE'd).
To generalize the problem: if any aliasing pointer to an alloca is created before coro.begin, and that pointer is later written through after coro.begin, it will lead to incorrect behavior.
I wonder what would be a correct fix?

Also, there seem to be a few other minor problems with this fix. For instance, in AllocaUseVisitor we only check escape and store instructions, which is insufficient to cover all potential writes: a call instruction can be non-escaping yet still modify the contents of the pointer (e.g. llvm.memcpy). Also, in AllocaUseVisitor::visit we check DT.dominates(&I, &CoroBegin), which should really be !DT.dominates(&CoroBegin, &I).

Overall, I find it difficult to patch this change to make it correct; a fundamental rewrite of this part of the algorithm seems necessary. I would be happy to look into it, but would like to hear your opinion first, @GorNishanov.

A full repro IR of the issue for reference:

%"struct_foo" = type <{ i64, i64, [8 x i8] }>

define i8* @foo(%"struct_foo"* byval(%"struct_foo") align 8 %arg) "coroutine.presplit"="1" {
entry:
  %local = alloca [24 x i8], align 8
  %local.sub = getelementptr inbounds [24 x i8], [24 x i8]* %local, i64 0, i64 0
  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
  %size = call i32 @llvm.coro.size.i32()
  %alloc = call i8* @myAlloc(i32 %size)
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
  %arg.addr = bitcast %"struct_foo"* %arg to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(24) %local.sub, i8* nonnull align 8 dereferenceable(24) %arg.addr, i64 24, i1 false)
  %0 = call i8 @llvm.coro.suspend(token none, i1 false)
  switch i8 %0, label %suspend [i8 0, label %resume
                                i8 1, label %cleanup]
resume:
  call void @print2([24 x i8]* %local)
  br label %cleanup

cleanup:
  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
  call void @free(i8* %mem)
  br label %suspend
suspend:
  call i1 @llvm.coro.end(i8* %hdl, i1 0)
  ret i8* %hdl
}

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg)

declare i8* @llvm.coro.free(token, i8*)
declare i32 @llvm.coro.size.i32()
declare i8  @llvm.coro.suspend(token, i1)
declare void @llvm.coro.resume(i8*)
declare void @llvm.coro.destroy(i8*)

declare token @llvm.coro.id(i32, i8*, i8*, i8*)
declare i1 @llvm.coro.alloc(token)
declare i8* @llvm.coro.begin(token, i8*)
declare i1 @llvm.coro.end(i8*, i1)

declare noalias i8* @myAlloc(i32)
declare double @print(double)
declare void @print2([24 x i8]*)
declare void @free(i8*)

Yes, we at Apple had similar problems and basically had to revert this internally. The basic problem is that LLVM's alloca representation is not well-suited for coroutines — we really need a guarantee that there are no stack allocations live across the coro.begin, or at least none that stay alive until the first suspend. That would be easy with something like SIL's alloc_stack, but LLVM's desire to push local allocations into the entry block really makes it difficult to even talk about the appropriate restrictions here.

Maybe we could force all local allocations in coroutines to be done with the llvm.coro.alloca.* intrinsics that I added? We could then actually verify that no allocations are live across the coro.begin. That would require a fair bit of abstraction in Clang, though, and possibly in some other passes that introduce allocas, and it might severely block optimization unless we teach mem2reg how to handle those intrinsics.

lxfind added a comment. Sep 3 2020, 9:31 AM

Thank you @rjmccall for your feedback! Yes, I agree that there seems to be no practical way to make this work 100% of the time. I made an attempt in D86859, which should at least make this more robust and handle the majority of cases. In the meantime, I will look into how this is done using coro.alloca. It would be great if you could take a look at D86859 too.