
Commit b2a9c02

Committed Aug 10, 2016
[Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
the RAII idiom and is suitable for the allocation elision optimization, which avoids
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.

coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:

```
entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
                      i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...
```

If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with a null constant.

Also, if any tail calls reference the coroutine frame, we need to remove the tail call attribute from them, since the coroutine frame now lives on the stack.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1. Add documentation. (https://reviews.llvm.org/D22603)
2. Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3. Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4. Add coroutine devirtualization + tests.
   ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
   c) Do devirtualization (https://reviews.llvm.org/D23229)
5. Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6. Add coroutine heap elision + tests. <= we are here
7. Add the rest of the logic (split into more patches)

Reviewers: mehdi_amini, majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23245

llvm-svn: 278242
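The guard pattern described in the summary can be modeled outside of LLVM. The following is a minimal C++ sketch, not code from this patch: `Frame`, `createFrame`, `frameToFree`, and `destroyFrame` are hypothetical names, the `elide` parameter stands in for the value produced by `coro.alloc`, and `frameToFree` stands in for `coro.free`. It only illustrates why substituting a caller-provided address for `coro.alloc` and `null` for `coro.free` suppresses both the allocation and the deallocation.

```cpp
#include <cassert>
#include <cstdlib>

// Toy stand-in for a coroutine frame (hypothetical; the real frame layout is
// produced by the coroutine passes).
struct Frame {
  int value;
  bool onHeap; // bookkeeping for this sketch only
};

// Models the creation path. `elide` plays the role of the coro.alloc result:
// non-null means the caller supplied storage, so no dynamic allocation occurs.
Frame *createFrame(void *elide, int &heapAllocs) {
  void *mem = elide;
  if (!mem) {
    mem = std::malloc(sizeof(Frame));
    if (!mem)
      return nullptr;
    ++heapAllocs;
  }
  Frame *f = static_cast<Frame *>(mem);
  f->value = 0;
  f->onHeap = (elide == nullptr);
  return f;
}

// Models the coro.free result: the memory to release, or nullptr when the
// allocation was elided.
void *frameToFree(Frame *f) { return f->onHeap ? f : nullptr; }

// Models the cleanup path: deallocation is conditional on frameToFree.
void destroyFrame(Frame *f, int &heapFrees) {
  if (void *mem = frameToFree(f)) {
    std::free(mem);
    ++heapFrees;
  }
}
```

Calling `createFrame(&local, allocs)` with storage owned by the caller leaves both counters at zero, while `createFrame(nullptr, allocs)` takes the heap path and `destroyFrame` releases it.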
1 parent 1758658 commit b2a9c02

9 files changed: +410 -70 lines changed

llvm/docs/Coroutines.rst (+42 -31)
@@ -95,7 +95,8 @@ The LLVM IR for this coroutine looks like this:
   entry:
     %size = call i32 @llvm.coro.size.i32()
     %alloc = call i8* @malloc(i32 %size)
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+    %hdl = call noalias i8* @llvm.coro.frame(token %beg)
     br label %loop
   loop:
     %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
@@ -115,9 +116,10 @@ The LLVM IR for this coroutine looks like this:

 The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
 lowered to a constant representing the size required for the coroutine frame.
-The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
-coroutine handle. The first parameter of `coro.begin` is given a block of memory
-to be used if the coroutine frame needs to be allocated dynamically.
+The `coro.begin`_ intrinsic initializes the coroutine frame and returns a
+token that is used to obtain the coroutine handle via `coro.frame` intrinsic.
+The first parameter of `coro.begin` is given a block of memory to be used if the
+coroutine frame needs to be allocated dynamically.

 The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
 given the coroutine handle, returns a pointer of the memory block to be freed or
@@ -160,12 +162,13 @@ After resume and destroy parts are outlined, function `f` will contain only the
 code responsible for creation and initialization of the coroutine frame and
 execution of the coroutine until a suspend point is reached:

-.. code-block:: llvm
+.. code-block:: none

   define i8* @f(i32 %n) {
   entry:
     %alloc = call noalias i8* @malloc(i32 24)
-    %0 = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null, i8* null)
+    %0 = call i8* @llvm.coro.frame(token %beg)
     %frame = bitcast i8* %0 to %f.frame*
     %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
     store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
@@ -219,7 +222,7 @@ In the entry block, we will call `coro.alloc`_ intrinsic that will return `null`
 when dynamic allocation is required, and an address of an alloca on the caller's
 frame where coroutine frame can be stored if dynamic allocation is elided.

-.. code-block:: llvm
+.. code-block:: none

   entry:
     %elide = call i8* @llvm.coro.alloc()
@@ -231,7 +234,7 @@ frame where coroutine frame can be stored if dynamic allocation is elided.
     br label %coro.begin
   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* null, i32 0, i8* null, i8* null)

 In the cleanup block, we will make freeing the coroutine frame conditional on
 `coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
@@ -421,7 +424,8 @@ store the current value produced by a coroutine.
     br label %coro.begin
   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %phi, i32 0, i8* %pv, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* %pv, i8* null)
+    %hdl = call i8* @llvm.coro.frame(token %beg)
     br label %loop
   loop:
     %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
@@ -687,15 +691,16 @@ a coroutine user are responsible to makes sure there is no data races.
 Example:
 """"""""

-.. code-block:: llvm
+.. code-block:: text

   define i8* @f(i32 %n) {
   entry:
     %promise = alloca i32
     %pv = bitcast i32* %promise to i8*
     ...
-    ; the third argument to coro.begin points to the coroutine promise.
-    %hdl = call noalias i8* @llvm.coro.begin(i8* %alloc, i32 0, i8* %pv, i8* null)
+    ; the fourth argument to coro.begin points to the coroutine promise.
+    %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* %pv, i8* null)
+    %hdl = call noalias i8* @llvm.coro.frame(token %beg)
     ...
     store i32 42, i32* %promise ; store something into the promise
     ...
@@ -752,39 +757,43 @@ the coroutine frame.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ::

-  declare i8* @llvm.coro.begin(i8* <mem>, i32 <align>, i8* <promise>, i8* <fnaddr>)
+  declare i8* @llvm.coro.begin(i8* <mem>, i8* <elide>, i32 <align>, i8* <promise>, i8* <fnaddr>)

 Overview:
 """""""""

-The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
+The '``llvm.coro.begin``' intrinsic captures coroutine initialization
+information and returns a token that can be used by `coro.frame` intrinsic to
+return an address of the coroutine frame.

 Arguments:
 """"""""""

 The first argument is a pointer to a block of memory where coroutine frame
 will be stored.

-The second argument provides information on the alignment of the memory returned
+The second argument is either null or an SSA value of `coro.alloc` intrinsic.
+
+The third argument provides information on the alignment of the memory returned
 by the allocation function and given to `coro.begin` by the first argument. If
 this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
 This argument only accepts constants.

-The third argument, if not `null`, designates a particular alloca instruction to
+The fourth argument, if not `null`, designates a particular alloca instruction to
 be a `coroutine promise`_.

-The fourth argument is `null` before coroutine is split, and later is replaced
+The fifth argument is `null` before coroutine is split, and later is replaced
 to point to a private global constant array containing function pointers to
 outlined resume and destroy parts of the coroutine.

 Semantics:
 """"""""""

 Depending on the alignment requirements of the objects in the coroutine frame
-and/or on the codegen compactness reasons the pointer returned from `coro.begin`
-may be at offset to the `%mem` argument. (This could be beneficial if
-instructions that express relative access to data can be more compactly encoded
-with small positive and negative offsets).
+and/or on the codegen compactness reasons the pointer returned from `coro.frame`
+associated with a particular `coro.begin` may be at offset to the `%mem`
+argument. (This could be beneficial if instructions that express relative access
+to data can be more compactly encoded with small positive and negative offsets).

 A frontend should emit exactly one `coro.begin` intrinsic per coroutine.

@@ -807,7 +816,7 @@ Arguments:
 """"""""""

 A pointer to the coroutine frame. This should be the same pointer that was
-returned by prior `coro.begin` call.
+returned by prior `coro.frame` call.

 Example (custom deallocation function):
 """""""""""""""""""""""""""""""""""""""
@@ -862,10 +871,13 @@ alloca storing the coroutine frame. Otherwise, it is lowered to constant `null`.

 A frontend should emit at most one `coro.alloc` intrinsic per coroutine.

+If `coro.alloc` is present, the second parameter to `coro.begin` should refer
+to it.
+
 Example:
 """"""""

-.. code-block:: llvm
+.. code-block:: text

   entry:
     %elide = call i8* @llvm.coro.alloc()
@@ -879,7 +891,8 @@ Example:

   coro.begin:
     %phi = phi i8* [ %elide, %entry ], [ %alloc, %coro.alloc ]
-    %frame = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* null)
+    %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* null, i8* null)
+    %frame = call i8* @llvm.coro.frame(token %beg)

 .. _coro.frame:

@@ -898,14 +911,12 @@ the enclosing coroutine.
 Arguments:
 """"""""""

-None
+A token that refers to `coro.begin` instruction.

 Semantics:
 """"""""""

-This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
-a frontend convenience intrinsic that makes it easier to refer to the
-coroutine frame.
+This intrinsic is lowered to refer to address of the coroutine frame.

 .. _coro.end:

@@ -1164,7 +1175,7 @@ CoroElide
 ---------
 The pass CoroElide examines if the inlined coroutine is eligible for heap
 allocation elision optimization. If so, it replaces `coro.alloc` and
-`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+`coro.frame` intrinsic with an address of a coroutine frame placed on its caller
 and replaces `coro.free` intrinsics with `null` to remove the deallocation code.
 This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
 calls to resume and destroy functions for a particular coroutine where possible.
@@ -1178,11 +1189,11 @@ Upstreaming sequence (rough plan)
 =================================
 #. Add documentation.
 #. Add coroutine intrinsics.
-#. Add empty coroutine passes. <== we are here
+#. Add empty coroutine passes.
 #. Add coroutine devirtualization + tests.
 #. Add CGSCC restart trigger + tests.
 #. Add coroutine heap elision + tests.
-#. Add custom allocation heap elision + tests.
+#. Add custom allocation heap elision + tests. <== we are here
 #. Add coroutine splitting logic + tests.
 #. Add simple coroutine frame builder + tests.
 #. Add the rest of the logic + tests. (Maybe split further as needed).

llvm/include/llvm/IR/Intrinsics.td (+5 -5)
@@ -603,16 +603,16 @@ def int_experimental_gc_relocate : Intrinsic<[llvm_any_ty],
 // Coroutine Structure Intrinsics.

 def int_coro_alloc : Intrinsic<[llvm_ptr_ty], [], []>;
-def int_coro_begin : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i32_ty,
-                     llvm_ptr_ty, llvm_ptr_ty],
-                     [WriteOnly<0>, ReadNone<2>, ReadOnly<3>,
-                     NoCapture<3>]>;
+def int_coro_begin : Intrinsic<[llvm_token_ty], [llvm_ptr_ty, llvm_ptr_ty,
+                     llvm_i32_ty, llvm_ptr_ty, llvm_ptr_ty],
+                     [WriteOnly<0>, WriteOnly<0>,
+                     ReadNone<3>, ReadOnly<4>, NoCapture<4>]>;

 def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
                     [IntrArgMemOnly, ReadOnly<0>, NoCapture<0>]>;
 def int_coro_end : Intrinsic<[], [llvm_ptr_ty, llvm_i1_ty], []>;

-def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
+def int_coro_frame : Intrinsic<[llvm_ptr_ty], [llvm_token_ty], [IntrNoMem]>;
 def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;

 def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;

llvm/lib/Transforms/Coroutines/CoroElide.cpp (+131 -27)
@@ -16,6 +16,7 @@
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/Pass.h"
+#include "llvm/Support/ErrorHandling.h"

 using namespace llvm;

@@ -39,11 +40,29 @@ struct CoroElide : FunctionPass {

   bool runOnFunction(Function &F) override;
   void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<AAResultsWrapperPass>();
     AU.setPreservesCFG();
   }
 };
 }

+char CoroElide::ID = 0;
+INITIALIZE_PASS_BEGIN(
+    CoroElide, "coro-elide",
+    "Coroutine frame allocation elision and indirect calls replacement", false,
+    false)
+INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
+INITIALIZE_PASS_END(
+    CoroElide, "coro-elide",
+    "Coroutine frame allocation elision and indirect calls replacement", false,
+    false)
+
+Pass *llvm::createCoroElidePass() { return new CoroElide(); }
+
+//===----------------------------------------------------------------------===//
+// Implementation
+//===----------------------------------------------------------------------===//
+
 // Go through the list of coro.subfn.addr intrinsics and replace them with the
 // provided constant.
 static void replaceWithConstant(Constant *Value,
@@ -68,24 +87,103 @@ static void replaceWithConstant(Constant *Value,
     replaceAndRecursivelySimplify(I, Value);
 }

+// See if any operand of the call instruction references the coroutine frame.
+static bool operandReferences(CallInst *CI, AllocaInst *Frame, AAResults &AA) {
+  for (Value *Op : CI->operand_values())
+    if (AA.alias(Op, Frame) != NoAlias)
+      return true;
+  return false;
+}
+
+// Look for any tail calls referencing the coroutine frame and remove tail
+// attribute from them, since now coroutine frame resides on the stack and tail
+// call implies that the function does not reference anything on the stack.
+static void removeTailCallAttribute(AllocaInst *Frame, AAResults &AA) {
+  Function &F = *Frame->getFunction();
+  MemoryLocation Mem(Frame);
+  for (Instruction &I : instructions(F))
+    if (auto *Call = dyn_cast<CallInst>(&I))
+      if (Call->isTailCall() && operandReferences(Call, Frame, AA)) {
+        // FIXME: If we ever hit this check. Evaluate whether it is more
+        // appropriate to retain musttail and allow the code to compile.
+        if (Call->isMustTailCall())
+          report_fatal_error("Call referring to the coroutine frame cannot be "
+                             "marked as musttail");
+        Call->setTailCall(false);
+      }
+}
+
+// Given a resume function @f.resume(%f.frame* %frame), returns %f.frame type.
+static Type *getFrameType(Function *Resume) {
+  auto *ArgType = Resume->getArgumentList().front().getType();
+  return cast<PointerType>(ArgType)->getElementType();
+}
+
+// Finds first non alloca instruction in the entry block of a function.
+static Instruction *getFirstNonAllocaInTheEntryBlock(Function *F) {
+  for (Instruction &I : F->getEntryBlock())
+    if (!isa<AllocaInst>(&I))
+      return &I;
+  llvm_unreachable("no terminator in the entry block");
+}
+
+// To elide heap allocations we need to suppress code blocks guarded by
+// llvm.coro.alloc and llvm.coro.free instructions.
+static void elideHeapAllocations(CoroBeginInst *CoroBegin, Type *FrameTy,
+                                 CoroAllocInst *AllocInst, AAResults &AA) {
+  LLVMContext &C = CoroBegin->getContext();
+  auto *InsertPt = getFirstNonAllocaInTheEntryBlock(CoroBegin->getFunction());
+
+  // FIXME: Design how to transmit alignment information for every alloca that
+  // is spilled into the coroutine frame and recreate the alignment information
+  // here. Possibly we will need to do a mini SROA here and break the coroutine
+  // frame into individual AllocaInst recreating the original alignment.
+  auto *Frame = new AllocaInst(FrameTy, "", InsertPt);
+  auto *FrameVoidPtr =
+      new BitCastInst(Frame, Type::getInt8PtrTy(C), "vFrame", InsertPt);
+
+  // Replacing llvm.coro.alloc with non-null value will suppress dynamic
+  // allocation as it is expected for the frontend to generate the code that
+  // looks like:
+  //   mem = coro.alloc();
+  //   if (!mem) mem = malloc(coro.size());
+  //   coro.begin(mem, ...)
+  AllocInst->replaceAllUsesWith(FrameVoidPtr);
+  AllocInst->eraseFromParent();
+
+  // To suppress deallocation code, we replace all llvm.coro.free intrinsics
+  // associated with this coro.begin with null constant.
+  auto *NullPtr = ConstantPointerNull::get(Type::getInt8PtrTy(C));
+  coro::replaceAllCoroFrees(CoroBegin, NullPtr);
+  CoroBegin->lowerTo(FrameVoidPtr);
+
+  // Since now coroutine frame lives on the stack we need to make sure that
+  // any tail call referencing it, must be made non-tail call.
+  removeTailCallAttribute(Frame, AA);
+}
+
 // See if there are any coro.subfn.addr intrinsics directly referencing
 // the coro.begin. If found, replace them with an appropriate coroutine
 // subfunction associated with that coro.begin.
-static bool replaceIndirectCalls(CoroBeginInst *CoroBegin) {
+static bool replaceIndirectCalls(CoroBeginInst *CoroBegin, AAResults &AA) {
   SmallVector<CoroSubFnInst *, 8> ResumeAddr;
   SmallVector<CoroSubFnInst *, 8> DestroyAddr;

-  for (User *U : CoroBegin->users()) {
-    if (auto *II = dyn_cast<CoroSubFnInst>(U)) {
-      switch (II->getIndex()) {
-      case CoroSubFnInst::ResumeIndex:
-        ResumeAddr.push_back(II);
-        break;
-      case CoroSubFnInst::DestroyIndex:
-        DestroyAddr.push_back(II);
-        break;
-      default:
-        llvm_unreachable("unexpected coro.subfn.addr constant");
+  for (User *CF : CoroBegin->users()) {
+    assert(isa<CoroFrameInst>(CF) &&
+           "CoroBegin can be only used by coro.frame instructions");
+    for (User *U : CF->users()) {
+      if (auto *II = dyn_cast<CoroSubFnInst>(U)) {
+        switch (II->getIndex()) {
+        case CoroSubFnInst::ResumeIndex:
+          ResumeAddr.push_back(II);
+          break;
+        case CoroSubFnInst::DestroyIndex:
+          DestroyAddr.push_back(II);
+          break;
+        default:
+          llvm_unreachable("unexpected coro.subfn.addr constant");
+        }
       }
     }
   }
@@ -99,11 +197,28 @@ static bool replaceIndirectCalls(CoroBeginInst *CoroBegin) {
          "of coroutine subfunctions");
   auto *ResumeAddrConstant =
       ConstantExpr::getExtractValue(Resumers, CoroSubFnInst::ResumeIndex);
+  replaceWithConstant(ResumeAddrConstant, ResumeAddr);
+
+  if (DestroyAddr.empty())
+    return true;
+
   auto *DestroyAddrConstant =
       ConstantExpr::getExtractValue(Resumers, CoroSubFnInst::DestroyIndex);
-
-  replaceWithConstant(ResumeAddrConstant, ResumeAddr);
   replaceWithConstant(DestroyAddrConstant, DestroyAddr);
+
+  // If llvm.coro.begin refers to llvm.coro.alloc, we can elide the allocation.
+  if (auto *AllocInst = CoroBegin->getAlloc()) {
+    // FIXME: The check above is overly lax. It only checks for whether we have
+    // an ability to elide heap allocations, not whether it is safe to do so.
+    // We need to do something like:
+    // If for every exit from the function where coro.begin is
+    // live, there is a coro.free or coro.destroy dominating that exit block,
+    // then it is safe to elide heap allocation, since the lifetime of coroutine
+    // is fully enclosed in its caller.
+    auto *FrameTy = getFrameType(cast<Function>(ResumeAddrConstant));
+    elideHeapAllocations(CoroBegin, FrameTy, AllocInst, AA);
+  }
+
   return true;
 }

@@ -143,20 +258,9 @@ bool CoroElide::runOnFunction(Function &F) {
   if (CoroBegins.empty())
     return Changed;

+  AAResults &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
   for (auto *CB : CoroBegins)
-    Changed |= replaceIndirectCalls(CB);
+    Changed |= replaceIndirectCalls(CB, AA);

   return Changed;
 }
-
-char CoroElide::ID = 0;
-INITIALIZE_PASS_BEGIN(
-    CoroElide, "coro-elide",
-    "Coroutine frame allocation elision and indirect calls replacement", false,
-    false)
-INITIALIZE_PASS_END(
-    CoroElide, "coro-elide",
-    "Coroutine frame allocation elision and indirect calls replacement", false,
-    false)
-
-Pass *llvm::createCoroElidePass() { return new CoroElide(); }
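
The tail-call handling in `removeTailCallAttribute` can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not the pass itself: `Call` and `demoteTailCalls` are hypothetical names, and plain pointer equality replaces the alias-analysis query (`AA.alias(Op, Frame) != NoAlias`) that the patch uses, so it is less conservative than the real check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a call instruction: a tail marker plus operands.
struct Call {
  bool isTail;
  std::vector<const void *> operands;
};

// Simplified version of operandReferences: exact pointer equality instead of
// an alias-analysis query.
static bool operandReferences(const Call &c, const void *frame) {
  for (const void *op : c.operands)
    if (op == frame)
      return true;
  return false;
}

// Clears the tail marker on every tail call that takes the stack-resident
// frame as an operand. Returns how many calls were demoted.
int demoteTailCalls(std::vector<Call> &calls, const void *frame) {
  int demoted = 0;
  for (Call &c : calls)
    if (c.isTail && operandReferences(c, frame)) {
      c.isTail = false;
      ++demoted;
    }
  return demoted;
}
```

This mirrors the expectation in the heap elision test of this commit: the call that passes the frame loses `tail`, while the call that passes `null` keeps it.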

llvm/lib/Transforms/Coroutines/CoroInstr.h (+63 -1)
@@ -62,11 +62,57 @@ class LLVM_LIBRARY_VISIBILITY CoroSubFnInst : public IntrinsicInst {
   }
 };

+/// This represents the llvm.coro.alloc instruction.
+class LLVM_LIBRARY_VISIBILITY CoroAllocInst : public IntrinsicInst {
+public:
+  // Methods to support type inquiry through isa, cast, and dyn_cast:
+  static inline bool classof(const IntrinsicInst *I) {
+    return I->getIntrinsicID() == Intrinsic::coro_alloc;
+  }
+  static inline bool classof(const Value *V) {
+    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+  }
+};
+
+/// This represents the llvm.coro.frame instruction.
+class LLVM_LIBRARY_VISIBILITY CoroFrameInst : public IntrinsicInst {
+public:
+  // Methods to support type inquiry through isa, cast, and dyn_cast:
+  static inline bool classof(const IntrinsicInst *I) {
+    return I->getIntrinsicID() == Intrinsic::coro_frame;
+  }
+  static inline bool classof(const Value *V) {
+    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+  }
+};
+
+/// This represents the llvm.coro.free instruction.
+class LLVM_LIBRARY_VISIBILITY CoroFreeInst : public IntrinsicInst {
+public:
+  // Methods to support type inquiry through isa, cast, and dyn_cast:
+  static inline bool classof(const IntrinsicInst *I) {
+    return I->getIntrinsicID() == Intrinsic::coro_free;
+  }
+  static inline bool classof(const Value *V) {
+    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
+  }
+};
+
 /// This class represents the llvm.coro.begin instruction.
 class LLVM_LIBRARY_VISIBILITY CoroBeginInst : public IntrinsicInst {
-  enum { MemArg, AlignArg, PromiseArg, InfoArg };
+  enum { MemArg, ElideArg, AlignArg, PromiseArg, InfoArg };

 public:
+  CoroAllocInst *getAlloc() const {
+    if (auto *CAI = dyn_cast<CoroAllocInst>(
+            getArgOperand(ElideArg)->stripPointerCasts()))
+      return CAI;
+
+    return nullptr;
+  }
+
+  Value *getMem() const { return getArgOperand(MemArg); }
+
   Constant *getRawInfo() const {
     return cast<Constant>(getArgOperand(InfoArg)->stripPointerCasts());
   }
@@ -108,6 +154,22 @@ class LLVM_LIBRARY_VISIBILITY CoroBeginInst : public IntrinsicInst {
     return Result;
   }

+  // Replaces all coro.frame intrinsics that are associated with this coro.begin
+  // to a replacement value and removes coro.begin and all of the coro.frame
+  // intrinsics.
+  void lowerTo(Value* Replacement) {
+    SmallVector<CoroFrameInst*, 4> FrameInsts;
+    for (auto *CF : this->users())
+      FrameInsts.push_back(cast<CoroFrameInst>(CF));
+
+    for (auto *CF : FrameInsts) {
+      CF->replaceAllUsesWith(Replacement);
+      CF->eraseFromParent();
+    }
+
+    this->eraseFromParent();
+  }
+
   // Methods for support type inquiry through isa, cast, and dyn_cast:
   static inline bool classof(const IntrinsicInst *I) {
     return I->getIntrinsicID() == Intrinsic::coro_begin;
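
`lowerTo` first copies the `coro.frame` users into `FrameInsts` and only then erases them. A likely reason (an assumption; the patch does not state it) is that erasing an instruction changes the user list that is being iterated. The sketch below shows the same collect-then-mutate idiom on a plain `std::list`; `User` and `lowerAll` are hypothetical names and nothing here uses LLVM APIs.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical stand-in for a user of coro.begin.
struct User {
  int id;
  bool isCoroFrame;
};

// Removes every coro.frame-like user and returns the ids that were removed.
std::vector<int> lowerAll(std::list<User> &users) {
  // Phase 1: snapshot the elements to remove without touching the container.
  std::vector<std::list<User>::iterator> frameInsts;
  for (auto it = users.begin(); it != users.end(); ++it)
    if (it->isCoroFrame)
      frameInsts.push_back(it);

  // Phase 2: mutate. std::list::erase invalidates only the erased iterator,
  // so the remaining snapshot entries stay valid.
  std::vector<int> erased;
  for (auto it : frameInsts) {
    erased.push_back(it->id);
    users.erase(it);
  }
  return erased;
}
```

`coro::replaceAllCoroFrees` in this commit follows the same two-phase shape with its `CoroFrees` vector.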

llvm/lib/Transforms/Coroutines/CoroInternal.h (+1)
@@ -42,6 +42,7 @@ void initializeCoroCleanupPass(PassRegistry &);
 namespace coro {

 bool declaresIntrinsics(Module &M, std::initializer_list<StringRef>);
+void replaceAllCoroFrees(CoroBeginInst *CB, Value *Replacement);

 // Keeps data and helper functions for lowering coroutine intrinsics.
 struct LowererBase {

llvm/lib/Transforms/Coroutines/Coroutines.cpp (+18)
@@ -122,3 +122,21 @@ bool coro::declaresIntrinsics(Module &M,

   return false;
 }
+
+// Find all llvm.coro.free instructions associated with the provided coro.begin
+// and replace them with the provided replacement value.
+void coro::replaceAllCoroFrees(CoroBeginInst *CB, Value *Replacement) {
+  SmallVector<CoroFreeInst *, 4> CoroFrees;
+  for (User *FramePtr: CB->users())
+    for (User *U : FramePtr->users())
+      if (auto *CF = dyn_cast<CoroFreeInst>(U))
+        CoroFrees.push_back(CF);
+
+  if (CoroFrees.empty())
+    return;
+
+  for (CoroFreeInst *CF : CoroFrees) {
+    CF->replaceAllUsesWith(Replacement);
+    CF->eraseFromParent();
+  }
+}

llvm/test/Transforms/Coroutines/coro-elide.ll (+9 -6)
@@ -1,6 +1,6 @@
 ; Tests that the coro.destroy and coro.resume are devirtualized where possible,
 ; SCC pipeline restarts and inlines the direct calls.
-; RUN: opt < %s -S -inline -coro-elide | FileCheck %s
+; RUN: opt < %s -S -inline -coro-elide -dce | FileCheck %s

 declare void @print(i32) nounwind

@@ -22,15 +22,16 @@ define fastcc void @f.destroy(i8*) {
 ; a coroutine start function
 define i8* @f() {
 entry:
-  %hdl = call i8* @llvm.coro.begin(i8* null, i32 0, i8* null,
+  %tok = call token @llvm.coro.begin(i8* null, i8* null, i32 0, i8* null,
                           i8* bitcast ([2 x void (i8*)*]* @f.resumers to i8*))
+  %hdl = call i8* @llvm.coro.frame(token %tok)
   ret i8* %hdl
 }

 ; CHECK-LABEL: @callResume(
 define void @callResume() {
 entry:
-; CHECK: call i8* @llvm.coro.begin
+; CHECK: call token @llvm.coro.begin
   %hdl = call i8* @f()

 ; CHECK-NEXT: call void @print(i32 0)
@@ -50,7 +51,7 @@ entry:
 ; CHECK-LABEL: @eh(
 define void @eh() personality i8* null {
 entry:
-; CHECK: call i8* @llvm.coro.begin
+; CHECK: call token @llvm.coro.begin
   %hdl = call i8* @f()

 ; CHECK-NEXT: call void @print(i32 0)
@@ -70,7 +71,8 @@ ehcleanup:
 ; no devirtualization here, since coro.begin info parameter is null
 define void @no_devirt_info_null() {
 entry:
-  %hdl = call i8* @llvm.coro.begin(i8* null, i32 0, i8* null, i8* null)
+  %tok = call token @llvm.coro.begin(i8* null, i8* null, i32 0, i8* null, i8* null)
+  %hdl = call i8* @llvm.coro.frame(token %tok)

 ; CHECK: call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 0)
   %0 = call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 0)
@@ -106,5 +108,6 @@ entry:
 }


-declare i8* @llvm.coro.begin(i8*, i32, i8*, i8*)
+declare token @llvm.coro.begin(i8*, i8*, i32, i8*, i8*)
+declare i8* @llvm.coro.frame(token)
 declare i8* @llvm.coro.subfn.addr(i8*, i8)
@@ -0,0 +1,125 @@
+; Tests that the dynamic allocation and deallocation of the coroutine frame is
+; elided and any tail calls referencing the coroutine frame have the tail
+; call attribute removed.
+; RUN: opt < %s -S -inline -coro-elide -instsimplify -simplifycfg | FileCheck %s
+
+declare void @print(i32) nounwind
+
+%f.frame = type {i32}
+
+declare void @bar(i8*)
+
+declare fastcc void @f.resume(%f.frame*)
+declare fastcc void @f.destroy(%f.frame*)
+
+declare void @may_throw()
+declare i8* @CustomAlloc(i32)
+declare void @CustomFree(i8*)
+
+@f.resumers = internal constant
+  [2 x void (%f.frame*)*] [void (%f.frame*)* @f.resume, void (%f.frame*)* @f.destroy]
+
+; a coroutine start function
+define i8* @f() personality i8* null {
+entry:
+  %elide = call i8* @llvm.coro.alloc()
+  %need.dyn.alloc = icmp ne i8* %elide, null
+  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
+dyn.alloc:
+  %alloc = call i8* @CustomAlloc(i32 4)
+  br label %coro.begin
+coro.begin:
+  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
+  %beg = call token @llvm.coro.begin(i8* %phi, i8* %elide, i32 0, i8* null,
+                          i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
+  %hdl = call i8* @llvm.coro.frame(token %beg)
+  invoke void @may_throw()
+    to label %ret unwind label %ehcleanup
+ret:
+  ret i8* %hdl
+
+ehcleanup:
+  %tok = cleanuppad within none []
+  %mem = call i8* @llvm.coro.free(i8* %hdl)
+  %need.dyn.free = icmp ne i8* %mem, null
+  br i1 %need.dyn.free, label %dyn.free, label %if.end
+dyn.free:
+  call void @CustomFree(i8* %mem)
+  br label %if.end
+if.end:
+  cleanupret from %tok unwind to caller
+}
+
+; CHECK-LABEL: @callResume(
+define void @callResume() {
+entry:
+; CHECK: alloca %f.frame
+; CHECK-NOT: coro.begin
+; CHECK-NOT: CustomAlloc
+; CHECK: call void @may_throw()
+  %hdl = call i8* @f()
+
+; Need to remove 'tail' from the first call to @bar
+; CHECK-NOT: tail call void @bar(
+; CHECK: call void @bar(
+  tail call void @bar(i8* %hdl)
+; CHECK: tail call void @bar(
+  tail call void @bar(i8* null)
+
+; CHECK-NEXT: call fastcc void bitcast (void (%f.frame*)* @f.resume to void (i8*)*)(i8* %vFrame)
+  %0 = call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 0)
+  %1 = bitcast i8* %0 to void (i8*)*
+  call fastcc void %1(i8* %hdl)
+
+; CHECK-NEXT: call fastcc void bitcast (void (%f.frame*)* @f.destroy to void (i8*)*)(i8* %vFrame)
+  %2 = call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 1)
+  %3 = bitcast i8* %2 to void (i8*)*
+  call fastcc void %3(i8* %hdl)
+
+; CHECK-NEXT: ret void
+  ret void
+}
+
+; a coroutine start function (cannot elide heap alloc, due to second argument to
+; coro.begin not pointing to coro.alloc)
+define i8* @f_no_elision() personality i8* null {
+entry:
+  %alloc = call i8* @CustomAlloc(i32 4)
+  %beg = call token @llvm.coro.begin(i8* %alloc, i8* null, i32 0, i8* null,
+                          i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
+  %hdl = call i8* @llvm.coro.frame(token %beg)
+  ret i8* %hdl
+}
+
+; CHECK-LABEL: @callResume_no_elision(
+define void @callResume_no_elision() {
+entry:
+; CHECK: call i8* @CustomAlloc(
+  %hdl = call i8* @f_no_elision()
+
+; Tail calls should remain tail calls
+; CHECK: tail call void @bar(
+  tail call void @bar(i8* %hdl)
+; CHECK: tail call void @bar(
+  tail call void @bar(i8* null)
+
+; CHECK-NEXT: call fastcc void bitcast (void (%f.frame*)* @f.resume to void (i8*)*)(i8*
+  %0 = call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 0)
+  %1 = bitcast i8* %0 to void (i8*)*
+  call fastcc void %1(i8* %hdl)
+
+; CHECK-NEXT: call fastcc void bitcast (void (%f.frame*)* @f.destroy to void (i8*)*)(i8*
+  %2 = call i8* @llvm.coro.subfn.addr(i8* %hdl, i8 1)
+  %3 = bitcast i8* %2 to void (i8*)*
+  call fastcc void %3(i8* %hdl)
+
+; CHECK-NEXT: ret void
+  ret void
+}
+
+
+declare i8* @llvm.coro.alloc()
+declare i8* @llvm.coro.free(i8*)
+declare token @llvm.coro.begin(i8*, i8*, i32, i8*, i8*)
+declare i8* @llvm.coro.frame(token)
+declare i8* @llvm.coro.subfn.addr(i8*, i8)
@@ -0,0 +1,16 @@
+; Verifies that restart trigger forces IPO pipelines restart and the same
+; coroutine is looked at by CoroSplit pass twice.
+; REQUIRES: asserts
+; RUN: opt < %s -S -O0 -enable-coroutines -debug-only=coro-split 2>&1 | FileCheck %s
+; RUN: opt < %s -S -O1 -enable-coroutines -debug-only=coro-split 2>&1 | FileCheck %s
+
+; CHECK: CoroSplit: Processing coroutine 'f' state: 0
+; CHECK-NEXT: CoroSplit: Processing coroutine 'f' state: 1
+
+declare token @llvm.coro.begin(i8*, i8*, i32, i8*, i8*)
+
+; a coroutine start function
+define void @f() {
+  call token @llvm.coro.begin(i8* null, i8* null, i32 0, i8* null, i8* null)
+  ret void
+}
