This is an archive of the discontinued LLVM Phabricator instance.


Differential D142620

[Coroutines] Improve rematerialization stage
ClosedPublic

Authored by dstuttard on Jan 26 2023, 5:41 AM.


Details

Reviewers

sebastian-ne
jsilvanus
ChuanqiXu

Commits

rG3e51af9b5b3a: [Coroutines] Improve rematerialization stage

Summary

As originally implemented, the rematerialization of valid instructions across
the suspend point would iterate 4 times, meaning that up to 4 instructions could
be rematerialized.

This implementation changes that approach to instead build a graph of
rematerializable instructions, then move all of them. This is faster than the
original approach and is not limited to an arbitrary limit.
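
The difference between the two schemes can be illustrated with a minimal standalone sketch in plain C++ (this is not the LLVM code). `ToyInst`, `collectFixedPasses`, `collectGraph` and `makeChain` are hypothetical names, and the model assumes that each pass of the old scheme reaches exactly one further level of operands.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for an instruction: only the indices of its operands.
struct ToyInst {
  std::vector<std::size_t> Operands;
};

// Old scheme as modelled here (an assumption): every pass reaches one more
// level of operands, and the number of passes is capped.
std::unordered_set<std::size_t>
collectFixedPasses(const std::vector<ToyInst> &Insts, std::size_t Root,
                   int MaxPasses) {
  std::unordered_set<std::size_t> Seen;
  std::vector<std::size_t> Frontier{Root};
  for (int Pass = 0; Pass < MaxPasses && !Frontier.empty(); ++Pass) {
    std::vector<std::size_t> Next;
    for (std::size_t I : Frontier) {
      if (!Seen.insert(I).second)
        continue; // already collected in an earlier pass
      for (std::size_t Op : Insts[I].Operands)
        Next.push_back(Op);
    }
    Frontier = std::move(Next);
  }
  return Seen;
}

// New scheme as modelled here: walk the whole operand graph once with a
// worklist, with no cap on depth.
std::unordered_set<std::size_t>
collectGraph(const std::vector<ToyInst> &Insts, std::size_t Root) {
  std::unordered_set<std::size_t> Seen;
  std::deque<std::size_t> WorkList{Root};
  while (!WorkList.empty()) {
    std::size_t I = WorkList.front();
    WorkList.pop_front();
    if (!Seen.insert(I).second)
      continue; // node already in the graph
    for (std::size_t Op : Insts[I].Operands)
      WorkList.push_back(Op);
  }
  return Seen;
}

// Builds a dependency chain: instruction K uses instruction K + 1.
std::vector<ToyInst> makeChain(std::size_t Length) {
  std::vector<ToyInst> Insts(Length);
  for (std::size_t K = 0; K + 1 < Length; ++K)
    Insts[K].Operands.push_back(K + 1);
  return Insts;
}
```

In this toy model, a chain of six dependent instructions is only partly collected when the pass count is capped at 4, while the worklist walk collects all six.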

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dstuttard created this revision. Jan 26 2023, 5:41 AM

Herald added a project: Restricted Project. Jan 26 2023, 5:41 AM

Herald added subscribers: ChuanqiXu, hiraditya.

dstuttard requested review of this revision. Jan 26 2023, 5:41 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B210099: Diff 492411. Jan 26 2023, 5:41 AM

dstuttard added a child revision: D142621: [Couroutines] Modify CoroFrame materializable into a callback. Jan 26 2023, 5:44 AM

dstuttard added a parent revision: D142619: [Coroutines] Presubmit test for more coro remats.

dstuttard added reviewers: sebastian-ne, jsilvanus, ChuanqiXu. Jan 26 2023, 5:47 AM

Looks good to me, but please give it a few days in case someone else has a comment.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
366–372	Maybe it makes sense to use a Set for the Worklist?

This revision is now accepted and ready to land. Jan 27 2023, 10:55 AM

LGTM -- one inline comment on style.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
360	Maybe the indentation here can be reduced a bit with early exiting out of the outermost if, moving the initialization of D out of the if, merging the two ifs above, and early exiting here as well?

jsilvanus accepted this revision. Jan 30 2023, 12:51 AM

Thanks for working on this! One problem I found is that ReversePostOrderTraversal is expensive to construct, and this patch constructs it in a loop, so this does not look good to me.

I also wonder whether such a complex structure and algorithm is necessary. What I had in mind is a worklist that stores the materialized instructions, which we then operate on, so that we avoid duplicated and meaningless iterations. I feel this would be easier to implement, and it does not look bad.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
326	What does `Children` mean here?
387–388	Are the 2 methods used?
2227–2228	The comment will no longer be precise once this patch lands.
2251	It is expensive to create ReversePostOrderTraversal, so constructing it in a loop does not look good.
2252	I don't think it is necessary or helpful to declare the type for the iterator.
2902–2908	This is not good. It may cause the behavior to become inconsistent after we materialize DVI instructions. See https://github.com/llvm/llvm-project/issues/55276 for an example.
2929
2931–2934	We may prefer such styles to shorten the indentation.
2933–2934	nit: We can use `auto` if we can see the type on the right-hand side.
2939–2941	It looks not bad to use `auto` in this case.
2953–2956	We can construct the IRBuilder in rewriteMaterializableInstructions, and we don't need to clear the Spills explicitly.

This revision now requires changes to proceed. Jan 30 2023, 10:18 PM

jsilvanus added inline comments. Jan 31 2023, 12:41 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2251	Pedantically speaking, I'm not sure constructing the `ReversePostOrderTraversal` in a loop here is an issue: It being "expensive" just means it does the graph traversal in the constructor, so its run time is linear in the size of the graph. But here we are using it to traverse different graphs, all of which have been constructed before, so the runtime can be amortized into the construction of those graphs, or also into the traversal that is done later. What we should not do is re-creating `ReversePostOrderTraversal` iterator objects for the same graph in a loop, because that wastes runtime. Still, one might argue that constructing all those graphs with overlapping nodes, i.e. possibly multiple graphs having a node for the same `Instruction*`, is a fundamental runtime issue. Not sure if that really can become an issue?
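
The point that a reverse post-order costs time linear in the size of the graph it walks can be sketched in standalone C++. This is a toy recursive version under stated assumptions, not LLVM's `ReversePostOrderTraversal`; `ToyGraph`, `postOrder` and `reversePostOrder` are hypothetical names, and edges point from a node to its operands as in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph: G[N] lists the nodes that node N uses as operands.
using ToyGraph = std::vector<std::vector<std::size_t>>;

// Depth-first post-order walk. Visits counts how many nodes are expanded,
// which is at most once per reachable node.
static void postOrder(const ToyGraph &G, std::size_t N,
                      std::vector<bool> &Visited,
                      std::vector<std::size_t> &Order, std::size_t &Visits) {
  if (Visited[N])
    return;
  Visited[N] = true;
  ++Visits;
  for (std::size_t Op : G[N])
    postOrder(G, Op, Visited, Order, Visits);
  Order.push_back(N);
}

// Reverse post-order from Entry. With use-to-operand edges in a DAG, every
// node appears before all of its operands.
std::vector<std::size_t> reversePostOrder(const ToyGraph &G, std::size_t Entry,
                                          std::size_t &Visits) {
  std::vector<bool> Visited(G.size(), false);
  std::vector<std::size_t> Order;
  Visits = 0;
  postOrder(G, Entry, Visited, Order, Visits);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a diamond-shaped graph (node 0 uses 1 and 2, both of which use 3), the shared operand 3 is expanded only once even though two nodes refer to it. Building one traversal per distinct graph therefore costs no more than walking that graph once; the remaining open question in the discussion is how much separate graphs overlap.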

ChuanqiXu added inline comments. Jan 31 2023, 1:11 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2251	Yeah, the key point here is how many overlapping nodes there are. Have you measured the compile time, run-time performance, or memory usage? Then we would have a better feeling. For example, we could then decide whether we want to limit the depth of the graph.

Addressing reviewer comment

Thanks for the feedback - see the comments and the updated patch(es)

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
326	I just needed a reasonable name for the next nodes - they are defined as being one edge further away from the root of the graph, so seemed like a reasonable name to use. Do you think something else would be better?
360	I think I get what you mean - I've updated it with less indenting.
366–372	Maybe - are you thinking that a set would remove the need to check for duplicates? I'm not sure it makes things much better - maybe it removes the need to iterate the worklist. I can't remember if there's a requirement to do this in order though.
387–388	No - it appears they aren't! Based on the examples for using the RPOT template I thought they were.
2227–2228	I'm not sure that the result of this patch is any different from what happened before - other than you might get more than 4 dependent instructions rematerialized. What do you think needs changing here?
2251	This work was done to speed up materialization. We needed a lot of rematerialization to happen, and initially just increased the number of iterations from 4 to a larger number. This didn't work very well and was extremely slow - hence this re-work. I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful. I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications. I'm not sure about the overlapping nodes actually being an issue here - I did attempt to create a test case that demonstrated this, but I'm not sure I was entirely successful (all the tests ended up with the minimum set for the instructions being rematerialized).
2902–2908	I think this is here because I created the original patch on an older version of CoroFrame which did this. Is removing this the right approach?

Harbormaster completed remote builds in B212166: Diff 495212. Feb 6 2023, 2:24 PM

ChuanqiXu added inline comments. Feb 6 2023, 6:31 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
324	Since `RematNode` is only used in `RematGraph`, I feel it is slightly better to make `RematNode` a private class definition for `RematGraph`.
326	I guess `operands` or `reversed_successor` or something similar may be better. What's more important is that we lack a lot of comments here for `RematNode` and `RematGraph`. Otherwise readers of the code can't understand what they mean.
339	I feel slightly better to add an assertion here.
360	We can still improve this. For example:
    if (Remats.count(N->Node))
      return;
    if (Remats.count(D)) {
      // Already have this in the graph
      N->Children.push_back(Remats[D].get());
      continue;
    }
366–372	Given the graph should be a DAG, the order should not be important here. So a set here may be better.
388	nit: It looks slightly better to provide an in-class definition for `dump()`. So we can reduce one `!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)`.
2227–2228	The comment is `For every use of "the value"`. However, rewriteMaterializableInstructions doesn't have `the value` in its signature. Also, the reader can't know what `RematGraph` is, so it may be hard for users to understand.
2251	> I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.
> I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.
I am not sure whether limiting the depth is a good idea either. I mean we need more data to make the decision, since this change is not a pure win theoretically. There may be edge cases, or everything may be fine. I am not blocking this. I am just saying we need more evidence to convince ourselves that this is a good change in general. In particular, folly may be a good choice for a test, or any other library that has a lot of coroutines.
2902–2908	Yes, we should remove it.
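
One possible reading of the "use a set for the worklist" suggestion on lines 366–372 is sketched below in standalone C++. It is not what the patch does (the patch keeps a `std::deque` and scans it for duplicates). `ToyNode`, `ToyGraphBuilder`, `getOrCreate` and `build` are hypothetical names, and integer ids stand in for `Instruction *`. The idea is to keep every created node, queued or finished, in one map, so that the duplicate check becomes a hash lookup instead of a scan over the worklist.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <unordered_map>
#include <vector>

struct ToyNode {
  int Id;
  std::vector<ToyNode *> Operands;
  explicit ToyNode(int Id) : Id(Id) {}
};

struct ToyGraphBuilder {
  // Every node created so far, whether finished or still queued, keyed by id.
  std::unordered_map<int, std::unique_ptr<ToyNode>> Nodes;
  // Ids whose operands still have to be expanded.
  std::deque<int> WorkList;

  // Returns the node for Id, creating and queueing it the first time it is seen.
  ToyNode *getOrCreate(int Id) {
    auto It = Nodes.find(Id);
    if (It != Nodes.end())
      return It->second.get();
    auto Inserted = Nodes.emplace(Id, std::make_unique<ToyNode>(Id));
    WorkList.push_back(Id);
    return Inserted.first->second.get();
  }

  // Deps maps an id to the ids of its operands.
  ToyNode *build(int Root,
                 const std::unordered_map<int, std::vector<int>> &Deps) {
    ToyNode *Entry = getOrCreate(Root);
    while (!WorkList.empty()) {
      int Id = WorkList.front();
      WorkList.pop_front();
      ToyNode *N = Nodes.at(Id).get();
      auto It = Deps.find(Id);
      if (It == Deps.end())
        continue; // leaf: no operands to expand
      for (int Op : It->second)
        N->Operands.push_back(getOrCreate(Op));
    }
    return Entry;
  }
};
```

Because the nodes are heap-allocated through `std::unique_ptr`, the raw pointers stored in `Operands` stay valid when the map rehashes. Whether this is worth the rework for the small graphs seen in practice is the judgment call discussed in the thread.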

More changes based on feedback

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
324	Defining the GraphTraits is harder if RematNode is a private class definition for RematGraph - but I agree there's no reason that RematNode shouldn't be a class definition in RematGraph, so I've done that.
326	I think I prefer Operands, so I've changed it to that. I've also added some more comments explaining RematGraph and RematNode.
366–372	I don't think there's much of an advantage to doing this, so I've left it as a deque for now. I'm open to being convinced that a set is better, but I'd rather not rework this if possible.
2227–2228	I've reworded this. Hopefully it's clearer now.
2251	I tried using folly to test this - not sure my methodology is sound though:

Compile clang and clang++ with/without these changes
Set up to build folly by setting:
CC=/just/built/clang
CXX=/just/built/clang++
CXXFLAGS="-std=c++20"
CCACHE_DISABLE=1 (to allow for multiple build runs)

I think that should be sufficient to enable it?

Also ran folly tests (verified from build output that the correct compiler is being used).

Results:

Build time	Run	real	user	sys
WITH changes	1	1m58.627s	39m49.991s	1m27.272s
WITH changes	2	1m54.844s	39m55.198s	1m28.330s
WITHOUT changes	1	1m55.352s	39m33.938s	1m25.716s
WITHOUT changes	2	1m58.287s	39m30.488s	1m26.247s
WITHOUT changes	3	1m54.915s	39m31.783s	1m25.420s

Test time WITH changes (multiple runs): 32.28, 41.39, 34.91, 36.18, 35.94 secs
Test time WITHOUT changes (multiple runs): 42.23, 36.24, 41.64, 40.92 secs

If this is valid - then it doesn't seem to make much difference. Arguably the test time is slightly better with the changes, but there seems to be more variation run-to-run than there is with/without.
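
As a rough check on the "more variation run-to-run" remark, the test times quoted above can be summarised with a mean and a sample standard deviation. This is a minimal sketch; the helper names `mean` and `sampleStdDev` and the variable names are hypothetical, and only the numbers come from the comment.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the samples.
double mean(const std::vector<double> &Xs) {
  double Sum = 0.0;
  for (double X : Xs)
    Sum += X;
  return Sum / static_cast<double>(Xs.size());
}

// Sample standard deviation (divides by N - 1).
double sampleStdDev(const std::vector<double> &Xs) {
  double M = mean(Xs);
  double SqSum = 0.0;
  for (double X : Xs)
    SqSum += (X - M) * (X - M);
  return std::sqrt(SqSum / static_cast<double>(Xs.size() - 1));
}

// folly test times in seconds, copied from the comment above.
const std::vector<double> TestSecsWith{32.28, 41.39, 34.91, 36.18, 35.94};
const std::vector<double> TestSecsWithout{42.23, 36.24, 41.64, 40.92};
```

The means come out at 36.14 s with the changes and about 40.26 s without. With only four or five runs per configuration and a spread of several seconds within each set, this supports reading the result as "no measurable regression" rather than as a clear speed-up.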

Harbormaster completed remote builds in B212414: Diff 495564. Feb 7 2023, 10:04 AM

LGTM. Thanks.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2251	Good enough to know that this won't make things worse.

This revision is now accepted and ready to land. Feb 7 2023, 5:59 PM

Removed assert that was incorrect (and causing build-bot pre-checkin failures)

clang-format change

Harbormaster completed remote builds in B212780: Diff 496091. Feb 9 2023, 6:42 AM

This revision was landed with ongoing or failed builds. Feb 13 2023, 3:06 AM

Closed by commit rG3e51af9b5b3a: [Coroutines] Improve rematerialization stage (authored by dstuttard).

This revision was automatically updated to reflect the committed changes.

dstuttard added a commit: rG3e51af9b5b3a: [Coroutines] Improve rematerialization stage.

Revision Contents

Path	Size
llvm/lib/Transforms/Coroutines/CoroFrame.cpp	311 lines
llvm/test/Transforms/Coroutines/coro-materialize.ll	6 lines
llvm/test/Transforms/Coroutines/coro-retcon-remat.ll	2 lines

Diff 496909

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

... 10 lines not shown ...
  // Using the information discovered we form a Coroutine Frame structure to
  // contain those values. All uses of those values are replaced with appropriate
  // GEP + load from the coroutine frame. At the point of the definition we spill
  // the value into the coroutine frame.
  //===----------------------------------------------------------------------===//
  #include "CoroInternal.h"
  #include "llvm/ADT/BitVector.h"
+ #include "llvm/ADT/PostOrderIterator.h"
  #include "llvm/ADT/ScopeExit.h"
  #include "llvm/ADT/SmallString.h"
  #include "llvm/Analysis/PtrUseVisitor.h"
  #include "llvm/Analysis/StackLifetime.h"
  #include "llvm/Config/llvm-config.h"
  #include "llvm/IR/CFG.h"
  #include "llvm/IR/DIBuilder.h"
  #include "llvm/IR/DebugInfo.h"
  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/InstIterator.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/Support/Debug.h"
  #include "llvm/Support/MathExtras.h"
  #include "llvm/Support/OptimizedStructLayout.h"
  #include "llvm/Support/circular_raw_ostream.h"
  #include "llvm/Support/raw_ostream.h"
  #include "llvm/Transforms/Utils/BasicBlockUtils.h"
  #include "llvm/Transforms/Utils/Local.h"
  #include "llvm/Transforms/Utils/PromoteMemToReg.h"
  #include <algorithm>
+ #include <deque>
  #include <optional>
  using namespace llvm;
  // The "coro-suspend-crossing" flag is very noisy. There is another debug type,
  // "coro-frame", which results in leaner debug spew.
  #define DEBUG_TYPE "coro-suspend-crossing"

... 55 lines not shown ...
    iterator_range<succ_iterator> successors(BlockData const &BD) const {
      BasicBlock *BB = Mapping.indexToBlock(&BD - &Block[0]);
      return llvm::successors(BB);
    }
    BlockData &getBlockData(BasicBlock *BB) {
      return Block[Mapping.blockToIndex(BB)];
    }
+ #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
    void dump() const;
    void dump(StringRef Label, BitVector const &BV) const;
+ #endif
    SuspendCrossingInfo(Function &F, coro::Shape &Shape);
    /// Returns true if there is a path from \p From to \p To crossing a suspend
    /// point without crossing \p From a 2nd time.
    bool hasPathCrossingSuspendPoint(BasicBlock *From, BasicBlock *To) const {
      size_t const FromIndex = Mapping.blockToIndex(From);
      size_t const ToIndex = Mapping.blockToIndex(To);
... 188 lines not shown (enclosing scope: for (size_t I = 0; I < N; ++I) {) ...
            LLVM_DEBUG(dump("SavedCons", SavedConsumes));
          }
    } while (Changed);
    LLVM_DEBUG(dump());
  }
+ static bool materializable(Instruction &V);
+ namespace {

ChuanqiXu (Done)

Since RematNode is only used in RematGraph, I feel it is slightly better to make RematNode a private class definition for RematGraph.

dstuttard (Author, Done)

Defining the GraphTraits is harder if RematNode is a private class definition for RematGraph - but I agree there's no reason that RematNode shouldn't be a class definition in RematGraph, so I've done that.

+ // RematGraph is used to construct a DAG for rematerializable instructions
+ // When the constructor is invoked with a candidate instruction (which is

ChuanqiXu (Done)

What does Children mean here?

dstuttard (Author, Done)

I just needed a reasonable name for the next nodes - they are defined as being one edge further away from the root of the graph, so seemed like a reasonable name to use.
Do you think something else would be better?

ChuanqiXu (Done)

I guess operands or reversed_successor or something similar may be better. What's more important is that we lack a lot of comments here for RematNode and RematGraph. Otherwise readers of the code can't understand what they mean.

dstuttard (Author, Done)

I think I prefer Operands, so I've changed it to that.
I've also added some more comments explaining RematGraph and RematNode.

+ // materializable) it builds a DAG of materializable instructions from that
+ // point.
+ // Typically, for each instruction identified as re-materializable across a
+ // suspend point, a RematGraph will be created.
+ struct RematGraph {
+   // Each RematNode in the graph contains the edges to instructions providing
+   // operands in the current node.
+   struct RematNode {
+     Instruction *Node;
+     SmallVector<RematNode *> Operands;
+     RematNode() = default;
+     RematNode(Instruction *V) : Node(V) {}
+   };

ChuanqiXu (Done)

    RematGraph(Instruction *I, SuspendCrossingInfo &Checker) : Checker(Checker) {
+     assert(materializable(*I));
      std::unique_ptr<RematNode> FirstNode = std::make_unique<RematNode>(I);
      EntryNode = FirstNode.get();

I feel slightly better to add an assertion here.

+   RematNode *EntryNode;
+   using RematNodeMap =
+       SmallMapVector<Instruction *, std::unique_ptr<RematNode>, 8>;
+   RematNodeMap Remats;
+   SuspendCrossingInfo &Checker;
+   RematGraph(Instruction *I, SuspendCrossingInfo &Checker) : Checker(Checker) {
+     std::unique_ptr<RematNode> FirstNode = std::make_unique<RematNode>(I);
+     EntryNode = FirstNode.get();
+     std::deque<std::unique_ptr<RematNode>> WorkList;
+     addNode(std::move(FirstNode), WorkList, cast<User>(I));
+     while (WorkList.size()) {
+       std::unique_ptr<RematNode> N = std::move(WorkList.front());
+       WorkList.pop_front();
+       addNode(std::move(N), WorkList, cast<User>(I));
+     }
+   }
+   void addNode(std::unique_ptr<RematNode> NUPtr,
+                std::deque<std::unique_ptr<RematNode>> &WorkList,

jsilvanus (Done)

Maybe the indentation here can be reduced a bit with early exiting out of the outermost if, moving the initialization of D out of the if, merging the two ifs above, and early exiting here as well?

dstuttard (Author, Done)

I think I get what you mean - I've updated it with less indenting.

ChuanqiXu (Done)

We can still improve this. For example:

    if (Remats.count(N->Node))
      return;

    if (Remats.count(D)) {
      // Already have this in the graph
      N->Children.push_back(Remats[D].get());
      continue;
    }

+                User *FirstUse) {
+     RematNode *N = NUPtr.get();
+     if (Remats.count(N->Node))
+       return;
+     // We haven't see this node yet - add to the list
+     Remats[N->Node] = std::move(NUPtr);
+     for (auto &Def : N->Node->operands()) {
+       Instruction *D = dyn_cast<Instruction>(Def.get());
+       if (!D || !materializable(*D) ||
+           !Checker.isDefinitionAcrossSuspend(*D, FirstUse))
+         continue;

sebastian-ne (Done)

Maybe it makes sense to use a Set for the Worklist?

dstuttard (Author, Done)

Maybe - are you thinking that a set would remove the need to check for duplicates? I'm not sure it makes things much better - maybe it removes the need to iterate the worklist. I can't remember if there's a requirement to do this in order though.

ChuanqiXu (Done)

Given the graph should be a DAG, the order should not be important here. So a set here may be better.

dstuttard (Author, Done)

I don't think there's much of an advantage to doing this, so I've left it as a deque for now. I'm open to being convinced that a set is better, but I'd rather not rework this if possible.

+       if (Remats.count(D)) {
+         // Already have this in the graph
+         N->Operands.push_back(Remats[D].get());
+         continue;
+       }
+       bool NoMatch = true;
+       for (auto &I : WorkList) {
+         if (I->Node == D) {
+           NoMatch = false;
+           N->Operands.push_back(I.get());
+           break;
+         }
+       }
+       if (NoMatch) {

ChuanqiXu (Done)

Are the 2 methods used?

dstuttard (Author, Done)

No - it appears they aren't!
Based on the examples for using the RPOT template I thought they were.

ChuanqiXu (Done)

nit: It looks slightly better to provide an in-class definition for dump(). So we can reduce one !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP).

+         // Create a new node
+         std::unique_ptr<RematNode> ChildNode = std::make_unique<RematNode>(D);
+         N->Operands.push_back(ChildNode.get());
+         WorkList.push_back(std::move(ChildNode));
+       }
+     }
+   }
+ #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+   void dump() const {
+     dbgs() << "Entry (";
+     if (EntryNode->Node->getParent()->hasName())
+       dbgs() << EntryNode->Node->getParent()->getName();
+     else
+       EntryNode->Node->getParent()->printAsOperand(dbgs(), false);
+     dbgs() << ") : " << *EntryNode->Node << "\n";
+     for (auto &E : Remats) {
+       dbgs() << *(E.first) << "\n";
+       for (RematNode *U : E.second->Operands)
+         dbgs() << " " << *U->Node << "\n";
+     }
+   }
+ #endif
+ };
+ } // end anonymous namespace
+ namespace llvm {
+ template <> struct GraphTraits<RematGraph *> {
+   using NodeRef = RematGraph::RematNode *;
+   using ChildIteratorType = RematGraph::RematNode **;
+   static NodeRef getEntryNode(RematGraph *G) { return G->EntryNode; }
+   static ChildIteratorType child_begin(NodeRef N) {
+     return N->Operands.begin();
+   }
+   static ChildIteratorType child_end(NodeRef N) { return N->Operands.end(); }
+ };
+ } // end namespace llvm

  #undef DEBUG_TYPE // "coro-suspend-crossing"
  #define DEBUG_TYPE "coro-frame"
  namespace {
  class FrameTypeBuilder;
  // Mapping from the to-be-spilled value to all the users that need reload.
  using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;
  struct AllocaInfo {
... 95 lines not shown ...
  static void dumpSpills(StringRef Title, const SpillInfo &Spills) {
    dbgs() << "------------- " << Title << "--------------\n";
    for (const auto &E : Spills) {
      E.first->dump();
      dbgs() << " user: ";
      for (auto *I : E.second)
        I->dump();
    }
  }
+ static void dumpRemats(
+     StringRef Title,
+     const SmallMapVector<Instruction *, std::unique_ptr<RematGraph>, 8> &RM) {
+   dbgs() << "------------- " << Title << "--------------\n";
+   for (const auto &E : RM) {
+     E.second->dump();
+     dbgs() << "--\n";
+   }
+ }
  static void dumpAllocas(const SmallVectorImpl<AllocaInfo> &Allocas) {
    dbgs() << "------------- Allocas --------------\n";
    for (const auto &A : Allocas) {
      A.Alloca->dump();
    }
  }
  #endif
... 1,661 lines not shown ...
  }
  // Check for structural coroutine intrinsics that should not be spilled into
  // the coroutine frame.
  static bool isCoroutineStructureIntrinsic(Instruction &I) {
    return isa<CoroIdInst>(&I) || isa<CoroSaveInst>(&I) ||
           isa<CoroSuspendInst>(&I);
  }

- // For every use of the value that is across suspend point, recreate that value
- // after a suspend point.
- static void rewriteMaterializableInstructions(IRBuilder<> &IRB,
-                                               const SpillInfo &Spills) {
-   for (const auto &E : Spills) {
-     Value *Def = E.first;
-     BasicBlock *CurrentBlock = nullptr;
-     Instruction *CurrentMaterialization = nullptr;
-     for (Instruction *U : E.second) {
-       // If we have not seen this block, materialize the value.
-       if (CurrentBlock != U->getParent()) {
-         bool IsInCoroSuspendBlock = isa<AnyCoroSuspendInst>(U);
-         CurrentBlock = U->getParent();
-         auto *InsertBlock = IsInCoroSuspendBlock
-                                 ? CurrentBlock->getSinglePredecessor()
-                                 : CurrentBlock;
-         CurrentMaterialization = cast<Instruction>(Def)->clone();
-         CurrentMaterialization->setName(Def->getName());
+ // For each instruction identified as materializable across the suspend point,
+ // and its associated DAG of other rematerializable instructions,
+ // recreate the DAG of instructions after the suspend point.

ChuanqiXu (Not Done)

The comment will no longer be precise once this patch lands.

dstuttard (Author, Done)

I'm not sure that the result of this patch is any different from what happened before - other than you might get more than 4 dependent instructions rematerialized.
What do you think needs changing here?

ChuanqiXu (Done)

The comment is For every use of "the value". However, rewriteMaterializableInstructions doesn't have the value in its signature. Also, the reader can't know what RematGraph is, so it may be hard for users to understand.

dstuttard (Author, Done)

I've reworded this. Hopefully it's clearer now.

+ static void rewriteMaterializableInstructions(
+     const SmallMapVector<Instruction *, std::unique_ptr<RematGraph>, 8>
+         &AllRemats) {
+   // This has to be done in 2 phases
+   // Do the remats and record the required defs to be replaced in the
+   // original use instructions
+   // Once all the remats are complete, replace the uses in the final
+   // instructions with the new defs
+   typedef struct {
+     Instruction *Use;
+     Instruction *Def;
+     Instruction *Remat;
+   } ProcessNode;
+   SmallVector<ProcessNode> FinalInstructionsToProcess;
+   for (const auto &E : AllRemats) {
+     Instruction *Use = E.first;
+     Instruction *CurrentMaterialization = nullptr;
+     RematGraph *RG = E.second.get();
+     ReversePostOrderTraversal<RematGraph *> RPOT(RG);

ChuanqiXuUnsubmitted

Not Done

It is expensive to create ReversePostOrderTraversal. So it looks not good to construct it in a loop.

ChuanqiXu: It is expensive to create ReversePostOrderTraversal. So it looks not good to construct it in a…

jsilvanusUnsubmitted

Not Done

Pedantically speaking, I'm not sure constructing the ReversePostOrderTraversal in a loop here is an issue: It being "expensive" just means it does the graph traversal in the constructor, so its run time is linear in the size of the graph.
But here we are using it to traverse *different* graphs, all of which have been constructed before, so the runtime can be amortized into the construction of those graphs, or also into the traversal that is done later.

What we should not do is re-creating ReversePostOrderTraversal iterator objects for the same graph in a loop, because that wastes runtime.

Still, one might argue that constructing all those graphs with overlapping nodes, i.e. possibly multiple graphs having a node for the same Instruction*, is a fundamental runtime issue. Not sure if that really can become an issue?

jsilvanus: Pedantically speaking, I'm not sure constructing the `ReversePostOrderTraversal` in a loop here…

ChuanqiXuUnsubmitted

Not Done

Yeah, the key point here is that how many overlapping nodes there is. Have you measured the compile-time, run time performance or memory usages? Then we can have a better feeling. For example, we can decide if we want to limit the depth of the graph then.

ChuanqiXu: Yeah, the key point here is that how many overlapping nodes there is. Have you measured the…

dstuttardAuthorUnsubmitted

Done

This work was done to speed up materialization. We needed a lot of rematerialization to happen, and initially just increased the number of iterations from 4 to a larger number.
This didn't work very well and was extremely slow - hence this re-work.

I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.

I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.

I'm not sure about the overlapping nodes actually being an issue here - I did attempt to create a test case that demonstrated this, but I'm not sure I was entirely successful (all the tests ended up with the minimum set for the instructions being rematerialized).

dstuttard: This work was done to speed up materialization. We needed a lot of rematerialization to happen…

ChuanqiXuUnsubmitted

Done

I haven't done timings for smaller amounts of remat, but I can do that if you think it is useful.

I did wonder though if limiting the depth with an option might be useful - we want as much as possible, but that's probably not true for all applications.

I am not sure whether limiting the depth is a good idea either. I mean that we need more data to make the decision, since this change is not a pure win in theory. There may be edge cases, or everything may be fine. I am not blocking this. I am just saying we need more evidence to convince ourselves that this is a good change in general. In particular, folly may be a good choice for a test, or any other library that has a lot of coroutines.

dstuttard (Author)

Done

I tried using folly to test this - not sure my methodology is sound though:

Compile clang and clang++ with/without these changes
Set up to build folly by setting:
CC=/just/built/clang
CXX=/just/built/clang++
CXXFLAGS="-std=c++20"
CCACHE_DISABLE=1 (to allow for multiple build runs)

I think that should be sufficient to enable it?

Also ran folly tests
(Verified from build output that the correct compiler is being used).

Results:
Build time WITH changes (tried it a couple of times):
Run 1
real 1m58.627s
user 39m49.991s
sys 1m27.272s

Run 2
real 1m54.844s
user 39m55.198s
sys 1m28.330s

Test time WITH changes (multiple runs):
32.28 secs
41.39 secs
34.91 secs
36.18 secs
35.94 secs

Build time WITHOUT changes (multiple runs):
Run 1
real 1m55.352s
user 39m33.938s
sys 1m25.716s

Run 2
real 1m58.287s
user 39m30.488s
sys 1m26.247s

Run 3
real 1m54.915s
user 39m31.783s
sys 1m25.420s

Test time WITHOUT changes (multiple runs):
42.23 secs
36.24 secs
41.64 secs
40.92 secs

If this is valid - then it doesn't seem to make much difference. Arguably the test time is slightly better with the changes, but there seems to be more variation run-to-run than there is with/without.
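
As a quick check of that reading, the test times above can be reduced to a mean and a min-to-max range per configuration. This is only a sketch: the sample values are copied from the results above, while `Summary`, `summarize` and `spreadExceedsMeanGap` are hypothetical helper names, and with four or five samples per side this is nowhere near a rigorous statistical test.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Mean and min-to-max range of a set of wall-clock samples, in seconds.
struct Summary {
  double Mean;
  double Range;
};

Summary summarize(const std::vector<double> &Samples) {
  assert(!Samples.empty() && "need at least one sample");
  double Sum = std::accumulate(Samples.begin(), Samples.end(), 0.0);
  auto MinMax = std::minmax_element(Samples.begin(), Samples.end());
  return {Sum / Samples.size(), *MinMax.second - *MinMax.first};
}

// Folly test times copied from the results above.
const std::vector<double> WithChanges = {32.28, 41.39, 34.91, 36.18, 35.94};
const std::vector<double> WithoutChanges = {42.23, 36.24, 41.64, 40.92};

// True when the run-to-run spread within a configuration is larger than
// the gap between the two configuration means.
bool spreadExceedsMeanGap() {
  Summary With = summarize(WithChanges);
  Summary Without = summarize(WithoutChanges);
  double Gap = std::abs(Without.Mean - With.Mean);
  return std::max(With.Range, Without.Range) > Gap;
}
```

For these samples the means come out at about 36.1 s with the changes and 40.3 s without, while the runs with the changes alone span about 9.1 s, so the spread within one configuration is larger than the difference between the two.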

ChuanqiXu

Not Done

Good enough to know that this won't make things worse.


- CurrentMaterialization->insertBefore(
+ SmallVector<Instruction *> InstructionsToProcess;

ChuanqiXu

Done

ReversePostOrderTraversal<RematGraph *> RPOT(RG);

- using rpo_iterator = ReversePostOrderTraversal<RematGraph *>::rpo_iterator;

SmallVector<Instruction *> InstructionsToProcess;

I don't think it is necessary or helpful to declare a type alias for the iterator.

- IsInCoroSuspendBlock ? InsertBlock->getTerminator()
-                      : &*InsertBlock->getFirstInsertionPt());
+ // If the target use is actually a suspend instruction then we have to
+ // insert the remats into the end of the predecessor (there should only be
+ // one). This is so that suspend blocks always have the suspend instruction
+ // as the first instruction.
+ auto InsertPoint = &*Use->getParent()->getFirstInsertionPt();
+ if (isa<AnyCoroSuspendInst>(Use)) {
+   BasicBlock *SuspendPredecessorBlock =
+       Use->getParent()->getSinglePredecessor();
+   assert(SuspendPredecessorBlock && "malformed coro suspend instruction");
+   InsertPoint = SuspendPredecessorBlock->getTerminator();
+ }
+ // Note: skip the first instruction as this is the actual use that we're
+ // rematerializing everything for.
+ auto I = RPOT.begin();
+ ++I;
+ for (; I != RPOT.end(); ++I) {
+   Instruction *D = (*I)->Node;
+   CurrentMaterialization = D->clone();
+   CurrentMaterialization->setName(D->getName());
+   CurrentMaterialization->insertBefore(InsertPoint);
+   InsertPoint = CurrentMaterialization;
+   // Replace all uses of Def in the instructions being added as part of this
+   // rematerialization group
+   for (auto &I : InstructionsToProcess)
+     I->replaceUsesOfWith(D, CurrentMaterialization);
+   // Don't replace the final use at this point as this can cause problems
+   // for other materializations. Instead, for any final use that uses a
+   // define that's being rematerialized, record the replace values
+   for (unsigned i = 0, E = Use->getNumOperands(); i != E; ++i)
+     if (Use->getOperand(i) == D) // Is this operand pointing to oldval?
+       FinalInstructionsToProcess.push_back(
+           {Use, D, CurrentMaterialization});
+   InstructionsToProcess.push_back(CurrentMaterialization);

} }

if (auto *PN = dyn_cast<PHINode>(U)) { }

assert(PN->getNumIncomingValues() == 1 &&

"unexpected number of incoming " // Finally, replace the uses with the defines that we've just rematerialized

for (auto &R : FinalInstructionsToProcess) {

if (auto *PN = dyn_cast<PHINode>(R.Use)) {

assert(PN->getNumIncomingValues() == 1 && "unexpected number of incoming "

"values in the PHINode"); "values in the PHINode");

PN->replaceAllUsesWith(CurrentMaterialization); PN->replaceAllUsesWith(R.Remat);

PN->eraseFromParent(); PN->eraseFromParent();

continue; continue;

} }

// Replace all uses of Def in the current instruction with the R.Use->replaceUsesOfWith(R.Def, R.Remat);

// CurrentMaterialization for the block.

U->replaceUsesOfWith(Def, CurrentMaterialization);

}

} }

// Splits the block at a particular instruction unless it is the first
// instruction in the block with a single predecessor.
static BasicBlock *splitBlockIfNotFirst(Instruction *I, const Twine &Name) {
  auto *BB = I->getParent();
  if (&BB->front() == I) {
[... 570 lines not shown ...]
  if (auto *I = dyn_cast<Instruction>(Storage))
    InsertPt = I->getInsertionPointAfterDef();
  else if (isa<Argument>(Storage))
    InsertPt = &*F->getEntryBlock().begin();
  if (InsertPt)
    DVI->moveBefore(InsertPt);
}

static void doRematerializations(Function &F, SuspendCrossingInfo &Checker) {

SpillInfo Spills;

// See if there are materializable instructions across suspend points

// We record these as the starting point to also identify materializable

// defs of uses in these operations

for (Instruction &I : instructions(F)) {

if (!materializable(I))

continue;

for (User *U : I.users())

if (Checker.isDefinitionAcrossSuspend(I, U))

Spills[&I].push_back(cast<Instruction>(U));

}

// Process each of the identified rematerializable instructions

// and add predecessor instructions that can also be rematerialized.

// This is actually a graph of instructions since we could potentially

// have multiple uses of a def in the set of predecessor instructions.

// The approach here is to maintain a graph of instructions for each bottom

ChuanqiXu

Done

This is not good. It may cause the behavior to become inconsistent after we materialize DVI instructions. See https://github.com/llvm/llvm-project/issues/55276 for an example.

dstuttard (Author)

Done

I think this is here because I created the original patch on an older version of CoroFrame which did this.
Is removing this the right approach?

ChuanqiXu

Done

Yes, we should remove it.


// level instruction - where we have a unique set of instructions (nodes)

// and edges between them. We then walk the graph in reverse post-dominator

// order to insert them past the suspend point, but ensure that ordering is

// correct. We also rely on CSE removing duplicate defs for remats of

// different instructions with a def in common (rather than maintaining more

// complex graphs for each suspend point)

// We can do this by adding new nodes to the list for each suspend

// point. Then using standard GraphTraits to give a reverse post-order

// traversal when we insert the nodes after the suspend

SmallMapVector<Instruction *, std::unique_ptr<RematGraph>, 8> AllRemats;

for (auto &E : Spills) {

for (Instruction *U : E.second) {

// Don't process a user twice (this can happen if the instruction uses

// more than one rematerializable def)

if (AllRemats.count(U))

continue;

// Constructor creates the whole RematGraph for the given Use

auto RematUPtr = std::make_unique<RematGraph>(U, Checker);

ChuanqiXu

Done

for (Instruction *U : E.second) {

- // Don't process a use twice (this can happen if the instruction uses

+ // Don't process a user twice (this can happen if the instruction uses

// more than one rematerializable def)


LLVM_DEBUG(dbgs() << "***** Next remat group *****\n";

ReversePostOrderTraversal<RematGraph *> RPOT(RematUPtr.get());

for (auto I = RPOT.begin(); I != RPOT.end();

++I) { (*I)->Node->dump(); } dbgs()

<< "\n";);

ChuanqiXu

Done

// more than one rematerializable def)

- if (!AllRemats.count(U)) {

- // Constructor creates the whole RematGraph for the given Use

- std::unique_ptr<RematGraph> RematUPtr =

- std::make_unique<RematGraph>(U, Checker);

+ if (AllRemats.count(U))

+ continue;

LLVM_DEBUG(

We may prefer this style, since it reduces the indentation.

ChuanqiXu

Done

// Constructor creates the whole RematGraph for the given Use

- std::unique_ptr<RematGraph> RematUPtr =

- std::make_unique<RematGraph>(U, Checker);

+ auto RematUPtr = std::make_unique<RematGraph>(U, Checker);

LLVM_DEBUG(

nit: We can use auto when the type is visible on the right-hand side.

AllRemats[U] = std::move(RematUPtr);

}

// Rewrite materializable instructions to be materialized at the use

// point.

ChuanqiXu

Done

ReversePostOrderTraversal<RematGraph *> RPOT(RematUPtr.get());

- using rpo_iterator =

- ReversePostOrderTraversal<RematGraph *>::rpo_iterator;

- for (rpo_iterator I = RPOT.begin(); I != RPOT.end();

+ for (auto I = RPOT.begin(); I != RPOT.end();

++I) { (*I)->Node->dump(); } dbgs()

Using auto looks fine in this case.

LLVM_DEBUG(dumpRemats("Materializations", AllRemats));

rewriteMaterializableInstructions(AllRemats);

}
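
The ordering argument behind the graph walk can be illustrated with a small standalone model. The sketch below does not use LLVM's `ReversePostOrderTraversal` or `RematGraph`; `Graph`, `postOrder`, `reversePostOrder` and `rematOrder` are hypothetical names, strings stand in for instructions, and edges point from an instruction to its rematerializable operands. It shows, under those assumptions, why skipping the first node (the use) and inserting each later node before the previously inserted one leaves every def ahead of its users.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Node -> operand defs that can be rematerialized (toy model).
using Graph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order walk starting at N.
static void postOrder(const std::string &N, const Graph &G,
                      std::set<std::string> &Visited,
                      std::vector<std::string> &Out) {
  if (!Visited.insert(N).second)
    return;
  auto It = G.find(N);
  if (It != G.end())
    for (const std::string &Op : It->second)
      postOrder(Op, G, Visited, Out);
  Out.push_back(N);
}

// Reverse post-order from the use: each node precedes its operands.
std::vector<std::string> reversePostOrder(const std::string &Use,
                                          const Graph &G) {
  std::set<std::string> Visited;
  std::vector<std::string> Order;
  postOrder(Use, G, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}

// Mirrors the insertion pattern in the patch: skip the use itself, then
// place each def in front of the one inserted before it.
std::vector<std::string> rematOrder(const std::string &Use, const Graph &G) {
  std::vector<std::string> RPO = reversePostOrder(Use, G);
  std::vector<std::string> Block; // final program order, ending before Use
  for (std::size_t I = 1; I < RPO.size(); ++I)
    Block.insert(Block.begin(), RPO[I]);
  return Block;
}
```

For a use with operands `a` and `b` that both depend on `c`, the walk visits `use, b, a, c` and the resulting block reads `c, a, b`, so the shared def is emitted once per graph and ahead of both of its users.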

void coro::buildCoroutineFrame(Function &F, Shape &Shape) {
  // Don't eliminate swifterror in async functions that won't be split.
  if (Shape.ABI != coro::ABI::Async || !Shape.CoroSuspends.empty())
    eliminateSwiftError(F, Shape);
  if (Shape.ABI == coro::ABI::Switch &&
      Shape.SwitchLowering.PromiseAlloca) {
    Shape.getSwitchCoroId()->clearPromise();
  }
  // Make sure that all coro.save, coro.suspend and the fallthrough coro.end

ChuanqiXu

Done

LLVM_DEBUG(dumpRemats("Materializations", AllRemats));

- IRBuilder<> Builder(F.getContext());

rewriteMaterializableInstructions(Builder, AllRemats);

- Spills.clear();

}

void coro::buildCoroutineFrame(Function &F, Shape &Shape) {

We can construct the IRBuilder in rewriteMaterializableInstructions, and we don't need to clear Spills explicitly.

  // intrinsics are in their own blocks to simplify the logic of building up
  // SuspendCrossing data.
  for (auto *CSI : Shape.CoroSuspends) {
    if (auto *Save = CSI->getCoroSave())
      splitAround(Save, "CoroSave");
    splitAround(CSI, "CoroSuspend");
  }
[... 24 lines not shown ...]
  // Transforms multi-edge PHI Nodes, so that any value feeding into a PHI will
  // never has its definition separated from the PHI by the suspend point.
  rewritePHIs(F);
  // Build suspend crossing info.
  SuspendCrossingInfo Checker(F, Shape);
- IRBuilder<> Builder(F.getContext());
+ doRematerializations(F, Checker);
  FrameDataInfo FrameData;
  SmallVector<CoroAllocaAllocInst*, 4> LocalAllocas;
  SmallVector<Instruction*, 4> DeadInstructions;

- {
-   SpillInfo Spills;
-   for (int Repeat = 0; Repeat < 4; ++Repeat) {
-     // See if there are materializable instructions across suspend points.
-     // FIXME: We can use a worklist to track the possible materialize
-     // instructions instead of iterating the whole function again and again.
-     for (Instruction &I : instructions(F))
-       if (materializable(I)) {
-         for (User *U : I.users())
-           if (Checker.isDefinitionAcrossSuspend(I, U))
-             Spills[&I].push_back(cast<Instruction>(U));
-       }
-     if (Spills.empty())
-       break;
-     // Rewrite materializable instructions to be materialized at the use
-     // point.
-     LLVM_DEBUG(dumpSpills("Materializations", Spills));
-     rewriteMaterializableInstructions(Builder, Spills);
-     Spills.clear();
-   }

  if (Shape.ABI != coro::ABI::Async && Shape.ABI != coro::ABI::Retcon &&
      Shape.ABI != coro::ABI::RetconOnce)
    sinkLifetimeStartMarkers(F, Shape, Checker);
  // Collect the spills for arguments and other not-materializable values.
  for (Argument &A : F.args())
    for (User *U : A.users())
      if (Checker.isDefinitionAcrossSuspend(A, U))
[... 79 lines not shown ...]

llvm/test/Transforms/Coroutines/coro-materialize.ll

; Verifies that we materialize instruction across suspend points
; RUN: opt < %s -passes='cgscc(coro-split),simplifycfg,early-cse' -S | FileCheck %s

; See that we only spilled one value for f
; CHECK: %f.Frame = type { ptr, ptr, i32, i1 }
; Check other variants where different levels of materialization are achieved
- ; CHECK: %f_multiple_remat.Frame = type { ptr, ptr, i32, i32, i32, i1 }
+ ; CHECK: %f_multiple_remat.Frame = type { ptr, ptr, i32, i1 }
- ; CHECK: %f_common_def.Frame = type { ptr, ptr, i32, i32, i32, i1 }
+ ; CHECK: %f_common_def.Frame = type { ptr, ptr, i32, i1 }
- ; CHECK: %f_common_def_multi_result.Frame = type { ptr, ptr, i32, i32, i32, i32, i32, i32, i32, i1 }
+ ; CHECK: %f_common_def_multi_result.Frame = type { ptr, ptr, i32, i1 }
; CHECK-LABEL: @f(
; CHECK-LABEL: @f_multiple_remat(
; CHECK-LABEL: @f_common_def(
; CHECK-LABEL: @f_common_def_multi_result(

define ptr @f(i32 %n) presplitcoroutine {
entry:
%id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
[... 156 lines not shown ...]

llvm/test/Transforms/Coroutines/coro-retcon-remat.ll

; Check that a remat that inserts rematerialized instructions in the single predecessor block works
; as expected
; RUN: opt < %s -O0 -S | FileCheck %s

- ; CHECK: %f.Frame = type { i32, i32 }
+ ; CHECK: %f.Frame = type { i32 }

define { i8, i32 } @f(i8 %buffer, i32 %n) {
entry:
%id = call token @llvm.coro.id.retcon(i32 8, i32 4, i8* %buffer, i8* bitcast ({ i8, i32 } (i8, i1)* @f_prototype to i8), i8 bitcast (i8* (i32)* @allocate to i8), i8 bitcast (void (i8) @deallocate to i8*))
%hdl = call i8* @llvm.coro.begin(token %id, i8* null)
br label %loop

loop:
[... 36 lines not shown ...]


Revision Contents

Diff 496909

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

llvm/test/Transforms/Coroutines/coro-materialize.ll

llvm/test/Transforms/Coroutines/coro-retcon-remat.ll
