This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible
ClosedPublic

Authored by GorNishanov on Aug 6 2016, 11:05 PM.
Tokens
"Yellow Medal" token, awarded by GorNishanov.

Details

Summary

A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing
RAII idiom and is suitable for allocation elision optimization which avoid
dynamic allocation by storing the coroutine frame as a static alloca in its
caller.

coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:

entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
                          i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))

and

  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...

If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant.

Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests. <= we are here
7.Add the rest of the logic (split into more patches)

Diff Detail

Event Timeline

GorNishanov retitled this revision from to Part 5: Elide dynamic allocation of coroutine frame when possible.
GorNishanov updated this object.
GorNishanov added reviewers: majnemer, mehdi_amini.
GorNishanov added a subscriber: llvm-commits.
GorNishanov updated this revision to Diff 67101.Aug 7 2016, 7:49 AM
GorNishanov retitled this revision from Part 5: Elide dynamic allocation of coroutine frame when possible to [Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible.

clang-format the changes. Fix the title.

majnemer added inline comments.Aug 8 2016, 8:10 PM
lib/Transforms/Coroutines/CoroElide.cpp
91–95

This doesn't seem conservative enough because you are only check if the call's operands alias the alloca. I could imagine the call referencing the coroutine frame via other means (load/store via global).

I think getModRefInfo() != MRI_NoModRef would capture this better.

104–106

Should we report_fatal_error if the CallSite is must_tail?

111

auto *

130

Do we have to worry about the alignment of this alloca?

209–210

Is this still safe if there are multiple coro.begin calls and some of them refer to llvm.coro.alloc and others do not?

209–218

We would typically sink this assignment into the if.

219

auto *

GorNishanov updated this revision to Diff 67404.Aug 9 2016, 1:13 PM
GorNishanov marked 7 inline comments as done.

Address the feedback

lib/Transforms/Coroutines/CoroElide.cpp
130

Added comment:

// FIXME: Design how to transmit alignment information for every alloca that
// is spilled into the coroutine frame and recreate the alignment information
// here. Possibly we will need to do a mini SROA here and break the coroutine
// frame into individual AllocaInst recreating the original alignment.
209–210

Addressed by making coro.begin return the token and coro.frame taken that token returning the coroutine frame address. This should prevent coro.begin being duplicated.

GorNishanov updated this revision to Diff 67412.Aug 9 2016, 1:49 PM

modRef check was too conservative. Reverting back to original behavior.

  1. coro.begin now has a parameter pointing directly at coro.alloc if it is present or null, otherwise
  2. documentation + test + code updated to support this tweak
  3. clarified the FIXME comment explaining that we need to add a safety check when heap allocation elision is possible
  4. added a no_elision unit test for the cases when coro.alloc is not provided
majnemer accepted this revision.Aug 10 2016, 8:40 AM
majnemer edited edge metadata.

LGTM with nits addressed.

lib/Transforms/Coroutines/CoroInstr.h
106–112

I'd phrase this another way to make it more conservative:

CoroAllocInst *getAlloc() const {
  if (auto *CAI = dyn_cast<CoroAllocInst>getArgOperand(ElideArg->stripPointerCasts()))
    return CAI;

  return nullptr;
}

This way, we won't crash if undef or a bitcast or null or some other weird operation is sitting in the elide arg.

This revision is now accepted and ready to land.Aug 10 2016, 8:40 AM
GorNishanov edited edge metadata.

Nits addressed, Rebased to top of the tree, will land soon

This revision was automatically updated to reflect the committed changes.