This is an archive of the discontinued LLVM Phabricator instance.

Differential D89768

[Coroutine] Properly determine whether an alloca should live on the frame
ClosedPublic

Authored by lxfind on Oct 19 2020, 11:39 PM.

Details

Reviewers

wenlei
junparser
ChuanqiXu
rjmccall

Commits

rG9f5a2beadce4: [Coroutine] Properly determine whether an alloca should live on the frame

Summary

The existing logic for determining whether an alloca should live on the frame only looks at explicit def-use relationships. However, a value defined by an alloca may be implicitly needed across suspension points, either because an alias has a def-use relationship that crosses a suspension point, or because the alloca escapes through a store, a call, or a memory intrinsic. To handle all of these cases, we have to visit the alloca pointer up front. This patch extends the existing alloca use visitor to determine whether an alloca should live on the frame.
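
To illustrate the gap described in the summary, here is a minimal sketch in standalone C++ rather than LLVM code. The toy instruction list, the Kind enum, and both function names are assumptions made up for this example; the real patch works on LLVM IR through PtrUseVisitor. The sketch contrasts a check that only looks at direct users of an alloca with one that also follows casts derived from it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR: a straight-line list of instructions.
enum class Kind { Alloca, Cast, Load, Suspend };

struct Inst {
  Kind kind;
  int operand; // index of the pointer operand, or -1 if there is none
};

// Naive check: is there a *direct* user of the alloca after a suspend?
bool directUseCrossesSuspend(const std::vector<Inst> &prog, int alloca) {
  bool sawSuspend = false;
  for (int i = alloca + 1; i < static_cast<int>(prog.size()); ++i) {
    if (prog[i].kind == Kind::Suspend)
      sawSuspend = true;
    else if (prog[i].operand == alloca && sawSuspend)
      return true;
  }
  return false;
}

// Alias-aware check: also follow pointers derived from the alloca by casts.
bool anyUseCrossesSuspend(const std::vector<Inst> &prog, int alloca) {
  std::vector<bool> derived(prog.size(), false);
  derived[alloca] = true;
  bool sawSuspend = false;
  for (int i = alloca + 1; i < static_cast<int>(prog.size()); ++i) {
    const Inst &inst = prog[i];
    if (inst.kind == Kind::Suspend) {
      sawSuspend = true;
      continue;
    }
    if (inst.operand < 0 || !derived[inst.operand])
      continue; // not a use of the alloca or of one of its aliases
    if (sawSuspend)
      return true; // a use (possibly through an alias) after the suspend
    if (inst.kind == Kind::Cast)
      derived[i] = true; // the cast result is a new alias of the alloca
  }
  return false;
}
```

For the sequence alloca, cast, suspend, load-through-cast, the direct check reports no crossing while the alias-aware check does, which is the kind of case the summary describes.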

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	270 ms	windows > lld.ELF/invalid::symtab-sh-info.s

Event Timeline

lxfind created this revision. Oct 19 2020, 11:39 PM

Herald added a project: Restricted Project. Oct 19 2020, 11:39 PM

Herald added subscribers: llvm-commits, modimo, modocache, hiraditya.

lxfind requested review of this revision. Oct 19 2020, 11:39 PM

Harbormaster completed remote builds in B75652: Diff 299271. Oct 20 2020, 12:17 AM

Harbormaster completed remote builds in B75740: Diff 299431. Oct 20 2020, 12:32 PM

There's an existing StackLifetime analysis that does some of this work and also considers the lifetime intrinsics; can we take advantage of that?

In D89768#2345468, @rjmccall wrote:

There's an existing StackLifetime analysis that does some of this work and also considers the lifetime intrinsics; can we take advantage of that?

My understanding is quite the opposite, that StackLifetime analysis purely relies on lifetime intrinsics, and does not look at how an alloca is being used.
https://github.com/llvm/llvm-project/blob/e10e7829bf6f10c053c05e42b676d7acaf54a221/llvm/include/llvm/Analysis/StackLifetime.h#L111-L112

Okay. Well, it does seem to me that we need to be considering lifetimes. If we can also optimize when we don't have lifetime information and the variable doesn't escape, that's great, but we often have pretty good lifetime information that we can use. And in optimized builds we'll usually have already eliminated allocas that don't escape, so the remaining allocas are very likely to have all escaped.

In D89768#2345916, @rjmccall wrote:

Okay. Well, it does seem to me that we need to be considering lifetimes. If we can also optimize when we don't have lifetime information and the variable doesn't escape, that's great, but we often have pretty good lifetime information that we can use. And in optimized builds we'll usually have already eliminated allocas that don't escape, so the remaining allocas are very likely to have all escaped.

Yes, we are considering lifetimes. The algorithm works by first checking whether there are lifetime markers and, if there are, using them. After that we use the pointer visitor to do more precise tracking.

Great! To my understanding, if lifetime information is available, the behavior of this patch is the same as the behavior of the previous implementation. Is my understanding right?

In D89768#2348849, @ChuanqiXu wrote:

Great! To my understanding, if lifetime information is available, the behavior of this patch is the same as the behavior of the previous implementation. Is my understanding right?

That's correct.

Oh, I see, sorry.

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seem to be over-analyzed. Can we just leave aliases on frame? I ask because I have no idea what effect this has on real workloads.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
919–921	This function assumes a dominance relation between arg1 and arg2, so we cannot use hasPathCrossingSuspendPoint on two user basic blocks directly.

lxfind added inline comments.Oct 26 2020, 10:15 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
919–921	Hmm, if that's the case, I think we might already have some buggy code here. The checks on lifetime markers don't necessarily have dominance relationships. I saw you removed the dominance assertion in https://reviews.llvm.org/D75664. Do you remember why?

In D89768#2353488, @junparser wrote:

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seem to be over-analyzed. Can we just leave aliases on frame? I ask because I have no idea what effect this has on real workloads.

What do you mean by "leave aliases on frame"?

In D89768#2354119, @lxfind wrote:

What do you mean by "leave aliases on frame"?

Just setEscaped for pointer cast used before coro.begin, or can we do something like llvm::PointerMayBeCapturedBefore to check the pointer directly? I'm not sure about this.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
919–921	The bitcast users hit the assertion; however, they do not cross the suspend point, thanks to rewriteMaterializableInstructions. Apart from that, each lifetime marker has a dominance relationship with some of the users. Iterating over all the lifetime markers has the same effect as checking from the def.

In D89768#2355576, @junparser wrote:

Just setEscaped for pointer cast used before coro.begin, or can we do something like llvm::PointerMayBeCapturedBefore to check the pointer directly? I'm not sure about this.

@lxfind, sorry for the confusing question, please ignore it. After reading D86859, I see that one of the suggestions there is that, once we have found all of the aliases, we can do the same thing as rewriteMaterializableInstructions.

junparser added inline comments.Oct 27 2020, 8:00 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2040	We can call this at the beginning of this function, then sink the aliases after coro.begin as we do in rewriteMaterializableInstructions, keeping the original logic unchanged. I think we could even deal with unknown offsets.

lxfind added inline comments.Oct 27 2020, 8:57 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2040	I am not exactly sure what you mean, but let me explain a bit more about the alias analysis here. The alias analysis serves three purposes. The first purpose, which was introduced in D86859, is to identify aliases created before CoroBegin so that we can recreate them after CoroBegin. In this case, we cannot deal with an unknown offset because we simply cannot recreate an alias if the offset is unknown. The second purpose, which was also introduced in D86859, is to identify allocas that may have been written into before CoroBegin. If that happens, we need to copy the content from the stack to the frame. The third purpose, which is new in this patch, is to identify which allocas should go to the frame. We need alias analysis for this because, without it, we won't be able to detect cases where an alloca needs to stay alive across suspension points due to indirect escape (as demonstrated in the added test cases). rewriteMaterializableInstructions cannot handle those indirect escapes. Which case do you think we can/should simplify?
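
The first purpose, and the reason an unknown offset is a problem, can be sketched with a small standalone model. This is not the patch's code: AliasOffsets, record, offsetOf, and allRecreatable are hypothetical names, std::optional stands in for llvm::Optional, and aliases are identified by strings instead of Instruction pointers. It assumes that an alias reached with two different offsets (for example through a PHI) is treated as having an unknown offset.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the alias bookkeeping: alias name -> byte offset
// into the alloca, or std::nullopt when no single offset is known.
class AliasOffsets {
public:
  // Record that `alias` points `offset` bytes into the alloca. Pass
  // std::nullopt when the offset could not be computed.
  void record(const std::string &alias, std::optional<int64_t> offset) {
    auto it = offsets_.find(alias);
    if (it == offsets_.end()) {
      offsets_.emplace(alias, offset);
      return;
    }
    // Seen before (e.g. reached along two paths): keep the offset only if
    // both visits agree on it, otherwise mark it unknown.
    if (it->second != offset)
      it->second = std::nullopt;
  }

  // Offset of `alias`, or std::nullopt if it is unknown or never recorded.
  std::optional<int64_t> offsetOf(const std::string &alias) const {
    auto it = offsets_.find(alias);
    if (it == offsets_.end())
      return std::nullopt;
    return it->second;
  }

  // An alias can only be recreated from the frame address if its offset is
  // known, so a single unknown offset makes the whole set unusable here.
  bool allRecreatable() const {
    for (const auto &entry : offsets_)
      if (!entry.second.has_value())
        return false;
    return true;
  }

private:
  std::map<std::string, std::optional<int64_t>> offsets_;
};
```

Recording the same alias twice with offset 8 keeps the offset, while recording it once with 0 and once with 16 leaves it unknown, which corresponds to the case the comment says cannot be recreated.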

Thank you for the explanation!

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	We still need to call Checker.hasPathCrossingSuspendPoint(AllocaDef, Pointeruser).

junparser added inline comments.Oct 27 2020, 7:27 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2040	That makes it clearer to me! LGTM.

lxfind added inline comments.Oct 27 2020, 10:43 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	We cannot do that because the whole purpose of this rewrite is to check if uses of the alloca belong to different suspend regions, which causes the spill. I thought about this a bit more, and my conclusion is that although the API is not intended to be used this way, it actually works here. The goal for this loop is to check whether an alloca needs to live across a suspension. If an alloca needs to live across a suspension, it means there exists a coro.suspend such that a user BB happens before it and a user BB happens after it, and these two BBs must dominate each other through coro.suspend. Hence one of the user BB pairs must return true on the hasPathCrossingSuspendPoint check. On the other hand, for BBs that don't even dominate each other, hasPathCrossingSuspendPoint will return false anyway and won't affect the algorithm.
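
The pairwise check discussed here can be sketched on a toy control-flow graph. Everything in this block is an assumption for illustration: ToyCFG and shouldLiveOnFrame are hypothetical, suspend points are modeled as dedicated blocks, user blocks are assumed not to be suspend blocks, and hasPathCrossingSuspendPoint is reimplemented as a plain graph search rather than the dataflow that SuspendCrossingInfo actually uses.

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG: blocks are indices, succs[b] lists successors of b.
struct ToyCFG {
  std::vector<std::vector<int>> succs;
  std::vector<bool> isSuspend; // true for blocks that model a coro.suspend

  // Is there a path from `from` to `to` that enters at least one suspend
  // block along the way?
  bool hasPathCrossingSuspendPoint(int from, int to) const {
    // seen[b][c]: block b already queued with crossed-flag c.
    std::vector<std::vector<bool>> seen(succs.size(),
                                        std::vector<bool>(2, false));
    std::queue<std::pair<int, bool>> work;
    work.push({from, false});
    while (!work.empty()) {
      const int block = work.front().first;
      const bool crossed = work.front().second;
      work.pop();
      for (int next : succs[block]) {
        const bool nextCrossed = crossed || isSuspend[next];
        if (next == to && nextCrossed)
          return true;
        if (!seen[next][nextCrossed]) {
          seen[next][nextCrossed] = true;
          work.push({next, nextCrossed});
        }
      }
    }
    return false;
  }
};

// Sketch of the decision: an escaped alloca goes on the frame conservatively;
// otherwise it goes there only if some pair of user blocks is separated by a
// suspend point.
bool shouldLiveOnFrame(const ToyCFG &cfg, const std::set<int> &userBlocks,
                       bool escaped) {
  if (escaped)
    return true;
  for (int a : userBlocks)
    for (int b : userBlocks)
      if (cfg.hasPathCrossingSuspendPoint(a, b))
        return true;
  return false;
}
```

In a graph shaped like the if/else example raised in this thread (the entry block branches either to a suspend block followed by use1, or directly to use2), the user set {use1, use2} yields false, while adding a use in the entry block yields true.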

junparser added inline comments.Oct 27 2020, 10:52 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	Let's say:
void foo () {
  alloca a;
  if (cond) {
    co_await
    use1 a;
  } else
    use2 a;
}
In such a case, hasPathCrossingSuspendPoint(use1, use2) returns false, which means a is not kept on the frame. This is wrong.

lxfind added inline comments.Oct 27 2020, 10:56 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	In this case, "a" is just defined but not used at all before the co_await, so it would be correct to not keep it on frame, right? Could you explain why in this case "a" needs to be on the frame?

junparser added inline comments.Oct 27 2020, 11:01 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	In reply to @lxfind's comment above: I do not understand why there would be a user BB that happens before coro.suspend.

junparser added inline comments.Oct 27 2020, 11:05 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	It may be used before the co_await, as long as initial_suspend never suspends and cond is false. This pattern can also be put in a loop. We do not know which part will be executed.

lxfind added inline comments.Oct 27 2020, 11:10 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	If an alloca needs to live across a suspension, there must exist two uses of the alloca (directly or through an alias) such that one happens before a coro.suspend and one happens after that same coro.suspend. If no two uses of the alloca can cross a suspend, the alloca never needs to live on the frame. Does this part sound right? Next we need to prove that if there exists such a pair of uses, one pair will return true on hasPathCrossingSuspendPoint. It's obvious that the alloca use after coro.suspend must be dominated by coro.suspend. The alloca use that happens before coro.suspend is a bit trickier to think about. In the simpler case, if the alloca use that happens before coro.suspend also dominates coro.suspend, then we have a pair of user BBs that would return true on the check, because they dominate each other through coro.suspend. If the alloca use that happens before coro.suspend does not dominate coro.suspend, and yet it's meaningfully used, it would either escape at some point, or it will be propagated into a PHINode eventually and that PHINode will dominate coro.suspend. Since we also track PHINode in the alias analysis, we will eventually find this pair that dominates each other.

lxfind added inline comments.Oct 27 2020, 11:13 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	In your code example, "a" is neither used nor initialized before co_await, so it doesn't matter whether it lives on the frame or not. If it is used, then there will exist a user pair that crosses suspensions. Could you explain what could go wrong in your example above if "a" is not kept on the frame?

junparser added inline comments.Oct 27 2020, 11:16 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	Quoting @lxfind: "If an alloca needs to live across a suspension, there must exists two uses of the alloca (or through an alias) that one happens before a coro.suspend and one happens after that same coro.suspend. If no two uses of alloca can cross a suspend, the alloca does not ever need to live on the frame. Does this part sound right?"
There is no before/after relation; it is enough that two uses are in different suspend regions. The case can be changed to:
void foo () {
  alloca a;
  if (cond) {
    co_await
    use1 a;
  } else {
    co_await
    use2 a;
  }
}

lxfind added inline comments.Oct 27 2020, 11:24 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	Could you explain what could go wrong in this example if "a" is not on the frame? The "alloca" instruction simply defines a value, but nothing else. So your example code
void foo () {
  alloca a;
  if (cond) {
    co_await
    use1 a;
  } else {
    co_await
    use2 a;
  }
}
behaves exactly as
void foo () {
  if (cond) {
    co_await
    alloca a1
    use1 a1;
  } else {
    co_await
    alloca a2
    use2 a2;
  }
}

junparser added inline comments.Oct 27 2020, 11:36 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
921	Hmm... I get your point. Makes sense to me.

Thanks for the patch!

This revision is now accepted and ready to land.Oct 27 2020, 11:44 PM

junparser added inline comments.Oct 28 2020, 12:00 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2022	@lxfind, with pointer tracking, I wonder whether lifetime check can be removed.

lxfind added inline comments.Oct 28 2020, 9:13 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2022	The lifetime checks may provide more accurate information in cases where allocas appear to have escaped, so I think it's helpful to keep them.

junparser added inline comments.Oct 28 2020, 7:00 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
898	I prefer this to be report_fatal_error.

This revision was landed with ongoing or failed builds.Oct 29 2020, 11:56 PM

Closed by commit rG9f5a2beadce4: [Coroutine] Properly determine whether an alloca should live on the frame (authored by lxfind).

This revision was automatically updated to reflect the committed changes.

lxfind added a commit: rG9f5a2beadce4: [Coroutine] Properly determine whether an alloca should live on the frame.

Fails here http://lab.llvm.org:8011/#/builders/70/builds/507

vitalybuka added inline comments.Oct 30 2020, 12:32 AM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
912–917	not needed

In D89768#2363899, @vitalybuka wrote:

Fails here http://lab.llvm.org:8011/#/builders/70/builds/507

Fixed with 1455259546996dd86236ef8c70bc65a27b457ba7

vitalybuka mentioned this in rG36fa658db525: [NFC] Fix "ambiguous overload for ‘operator=’".Oct 30 2020, 12:43 AM

In D89768#2363912, @vitalybuka wrote:

Fixed with 1455259546996dd86236ef8c70bc65a27b457ba7

oops. Thank you @vitalybuka for fixing it!

Hello,

A git bisect has identified this change as the likely candidate for a new set of asserts / crashes in SwiftShader when attempting to use the coroutine passes. This is having knock-on issues with internal Google projects.

When passing this IR to ./bin/opt crash.ir -coro-early -coro-split -coro-elide -S with this change, we now get this crash.
Running the same command on the parent change does not crash, and behaves as expected.

I'd like to file a bug, but LLVM's Bugzilla is not letting me sign in, possibly due to spam restrictions. :-/

I'll do some investigation myself tomorrow, but any assistance here would be gratefully appreciated.

Many thanks,
Ben

In D89768#2377441, @ben-clayton wrote:

Hello,

A git bisect has identified this change as the likely candidate for a new set of asserts / crashes in SwiftShader when attempting to use the coroutine passes. This is having knock-on issues with internal Google projects.

When passing this IR to ./bin/opt crash.ir -coro-early -coro-split -coro-elide -S with this change, we now get this crash.
Running the same command on the parent change does not crash, and behaves as expected.

I'd like to file a bug, but LLVM's Bugzilla is not letting me sign in, possibly due to spam restrictions. :-/

I'll do some investigation myself tomorrow, but any assistance here would be gratefully appreciated.

Many thanks,
Ben

Thanks for reporting. I will take a look.

I can confirm that this patch introduced a bug.
It can be triggered when there is an alloca instruction defined after coro.begin and used after a suspension. These allocas are not properly moved to the .resume function.
I will think about how to fix this.

lxfind mentioned this in D90977: [Coroutine] Move all used local allocas to the .resume function.Nov 6 2020, 3:04 PM

lxfind mentioned this in rGc2cb093d9b96: [Coroutine] Move all used local allocas to the .resume function.Nov 9 2020, 5:25 PM

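Revision Contents

Path	Size
llvm/lib/Transforms/Coroutines/CoroFrame.cpp	354 lines
llvm/test/Transforms/Coroutines/coro-alloca-01.ll	67 lines
llvm/test/Transforms/Coroutines/coro-alloca-02.ll	54 lines
llvm/test/Transforms/Coroutines/coro-alloca-03.ll	57 lines
llvm/test/Transforms/Coroutines/coro-alloca-04.ll	65 lines
llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll	8 lines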

Diff 299431

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

[290 lines not shown]

#undef DEBUG_TYPE // "coro-suspend-crossing"

#define DEBUG_TYPE "coro-frame"

namespace {

class FrameTypeBuilder;

// Mapping from the to-be-spilled value to all the users that need reload.

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;

+ struct AllocaInfo {

+ AllocaInst *Alloca;

+ DenseMap<Instruction *, llvm::Optional<APInt>> Aliases;

+ bool MayWriteBeforeCoroBegin;

+ AllocaInfo(AllocaInst *Alloca,

+ DenseMap<Instruction *, llvm::Optional<APInt>> Aliases,

+ bool MayWriteBeforeCoroBegin)

+ : Alloca(Alloca), Aliases(std::move(Aliases)),

+ MayWriteBeforeCoroBegin(MayWriteBeforeCoroBegin) {}

+ };

struct FrameDataInfo {

// All the values (that are not allocas) that needs to be spilled to the

// frame.

SpillInfo Spills;

// Allocas contains all values defined as allocas that need to live in the

// frame.

- SmallVector<AllocaInst *, 8> Allocas;

+ SmallVector<AllocaInfo, 8> Allocas;

SmallVector<Value *, 8> getAllDefs() const {

SmallVector<Value *, 8> Defs;

for (const auto &P : Spills)

Defs.push_back(P.first);

- for (auto *A : Allocas)

- Defs.push_back(A);

+ for (const auto &A : Allocas)

+ Defs.push_back(A.Alloca);

return Defs;

}

uint32_t getFieldIndex(Value *V) const {

auto Itr = FieldIndexMap.find(V);

assert(Itr != FieldIndexMap.end() &&

"Value does not have a frame field index");

return Itr->second;

[25 lines not shown]

static void dumpSpills(StringRef Title, const SpillInfo &Spills) {

for (const auto &E : Spills) {

E.first->dump();

dbgs() << " user: ";

for (auto *I : E.second)

I->dump();

}

- static void dumpAllocas(const SmallVectorImpl<AllocaInst *> &Allocas) {

+ static void dumpAllocas(const SmallVectorImpl<AllocaInfo> &Allocas) {

dbgs() << "------------- Allocas --------------\n";

- for (auto *A : Allocas)

- A->dump();

+ for (const auto &A : Allocas) {

+ A.Alloca->dump();

}

#endif

namespace {

using FieldIDType = size_t;

// We cannot rely solely on natural alignment of a type when building a

// coroutine frame and if the alignment specified on the Alloca instruction

// differs from the natural alignment of the alloca type we will need to insert

[120 lines not shown]

void FrameDataInfo::updateLayoutIndex(FrameTypeBuilder &B) {

auto Updater = [&](Value *I) {

setFieldIndex(I, B.getLayoutFieldIndex(getFieldIndex(I)));

};

LayoutIndexUpdateStarted = true;

for (auto &S : Spills)

Updater(S.first);

- for (auto *A : Allocas)

- Updater(A);

+ for (const auto &A : Allocas)

+ Updater(A.Alloca);

LayoutIndexUpdateStarted = false;

}

void FrameTypeBuilder::addFieldForAllocas(const Function &F,

FrameDataInfo &FrameData,

coro::Shape &Shape) {

DenseMap<AllocaInst *, unsigned int> AllocaIndex;

using AllocaSetType = SmallVector<AllocaInst *, 4>;

[10 lines not shown]

for (auto AllocaList : NonOverlapedAllocas) {

auto *LargestAI = *AllocaList.begin();

FieldIDType Id = addFieldForAlloca(LargestAI);

for (auto *Alloca : AllocaList)

FrameData.setFieldIndex(Alloca, Id);

}

});

if (!Shape.ReuseFrameSlot && !EnableReuseStorageInFrame) {

- for (auto *Alloca : FrameData.Allocas) {

+ for (const auto &A : FrameData.Allocas) {

+ AllocaInst *Alloca = A.Alloca;

AllocaIndex[Alloca] = NonOverlapedAllocas.size();

NonOverlapedAllocas.emplace_back(AllocaSetType(1, Alloca));

}

return;

}

// Because there are pathes from the lifetime.start to coro.end

// for each alloca, the liferanges for every alloca is overlaped

[13 lines not shown]

for (auto U : CoroSuspendInst->users()) {

if (auto *ConstSWI = dyn_cast<SwitchInst>(U)) {

auto *SWI = const_cast<SwitchInst *>(ConstSWI);

DefaultSuspendDest[SWI] = SWI->getDefaultDest();

SWI->setDefaultDest(SWI->getSuccessor(1));

}

- StackLifetime StackLifetimeAnalyzer(F, FrameData.Allocas,

+ auto ExtractAllocas = [&]() {

+ AllocaSetType Allocas;

+ Allocas.reserve(FrameData.Allocas.size());

+ for (const auto &A : FrameData.Allocas)

+ Allocas.push_back(A.Alloca);

+ return Allocas;

+ };

+ StackLifetime StackLifetimeAnalyzer(F, ExtractAllocas(),

StackLifetime::LivenessType::May);

StackLifetimeAnalyzer.run();

auto IsAllocaInferenre = [&](const AllocaInst *AI1, const AllocaInst *AI2) {

return StackLifetimeAnalyzer.getLiveRange(AI1).overlaps(

StackLifetimeAnalyzer.getLiveRange(AI2));

};

- auto GetAllocaSize = [&](const AllocaInst *AI) {

- Optional<uint64_t> RetSize = AI->getAllocationSizeInBits(DL);

+ auto GetAllocaSize = [&](const AllocaInfo &A) {

+ Optional<uint64_t> RetSize = A.Alloca->getAllocationSizeInBits(DL);

assert(RetSize && "We can't handle scalable type now.\n");

return RetSize.getValue();

};

// Put larger allocas in the front. So the larger allocas have higher

// priority to merge, which can save more space potentially. Also each

// AllocaSet would be ordered. So we can get the largest Alloca in one

// AllocaSet easily.

- sort(FrameData.Allocas, [&](auto Iter1, auto Iter2) {

+ sort(FrameData.Allocas, [&](const auto &Iter1, const auto &Iter2) {

return GetAllocaSize(Iter1) > GetAllocaSize(Iter2);

});

- for (auto *Alloca : FrameData.Allocas) {

+ for (const auto &A : FrameData.Allocas) {

+ AllocaInst *Alloca = A.Alloca;

bool Merged = false;

// Try to find if the Alloca is not inferenced with any existing

// NonOverlappedAllocaSet. If it is true, insert the alloca to that

// NonOverlappedAllocaSet.

for (auto &AllocaSet : NonOverlapedAllocas) {

assert(!AllocaSet.empty() && "Processing Alloca Set is not empty.\n");

bool CouldMerge = none_of(AllocaSet, [&](auto Iter) {

return IsAllocaInferenre(Alloca, Iter);

[188 lines not shown]

Shape.RetconLowering.IsFrameInlineInStorage

B.getStructAlign() <= Id->getStorageAlignment());

break;

}

return FrameTy;

}

- // We use a pointer use visitor to discover if there are any writes into an

- // alloca that dominates CoroBegin. If that is the case, insertSpills will copy

- // the value from the alloca into the coroutine frame spill slot corresponding

- // to that alloca. We also collect any alias pointing to the alloca created

- // before CoroBegin but used after CoroBegin. These alias will be recreated

- // after CoroBegin from the frame address so that latter references are

- // pointing to the frame instead of the stack.

- // Note: We are repurposing PtrUseVisitor's isEscaped() to mean whether the

- // pointer is potentially written into.

- // TODO: If the pointer is really escaped, we are in big trouble because we

- // will be escaping a pointer to a stack address that would no longer exist

- // soon. However most escape analysis isn't good enough to precisely tell,

- // so we are assuming that if a pointer is escaped that it's written into.

- // TODO: Another potential issue is if we are creating an alias through

- // a function call, e.g:

- // %a = AllocaInst ...

- // %b = call @computeAddress(... %a)

- // If %b is an alias of %a and will be used after CoroBegin, this will be broken

- // and there is nothing we can do about it.

+ // We use a pointer use visitor to track how an alloca is being used.

+ // The goal is to be able to answer the following three questions:

+ // 1. Should this alloca be allocated on the frame instead.

+ // 2. Could the content of the alloca be modified prior to CoroBegn, which would

+ // require copying the data from alloca to the frame after CoroBegin.

+ // 3. Is there any alias created for this alloca prior to CoroBegin, but used

+ // after CoroBegin. In that case, we will need to recreate the alias after

+ // CoroBegin based off the frame. To answer question 1, we track two things:

+ // a. List of all BasicBlocks that use this alloca or any of the aliases of

+ // the alloca. In the end, we check if there exists any two basic blocks that

+ // cross suspension points. If so, this alloca must be put on the frame. b.

+ // Whether the alloca or any alias of the alloca is escaped at some point,

+ // either by storing the address somewhere, or the address is used in a

+ // function call that might capture. If it's ever escaped, this alloca must be

+ // put on the frame conservatively.

+ // To answer quetion 2, we track through the variable MayWriteBeforeCoroBegin.

+ // Whenever a potential write happens, either through a store instruction, a

+ // function call or any of the memory intrinsics, we check whether this

+ // instruction is prior to CoroBegin. To answer question 3, we track the offsets

+ // of all aliases created for the alloca prior to CoroBegin but used after

+ // CoroBegin. llvm::Optional is used to be able to represent the case when the

+ // offset is unknown (e.g. when you have a PHINode that takes in different

+ // offset values). We cannot handle unknown offsets and will assert. This is the

+ // potential issue left out. An ideal solution would likely require a

+ // significant redesign.

namespace {

struct AllocaUseVisitor : PtrUseVisitor<AllocaUseVisitor> {

using Base = PtrUseVisitor<AllocaUseVisitor>;

AllocaUseVisitor(const DataLayout &DL, const DominatorTree &DT,

- const CoroBeginInst &CB)

- : PtrUseVisitor(DL), DT(DT), CoroBegin(CB) {}

+ const CoroBeginInst &CB, const SuspendCrossingInfo &Checker)

+ : PtrUseVisitor(DL), DT(DT), CoroBegin(CB), Checker(Checker) {}

// We are only interested in uses that's not dominated by coro.begin.

void visit(Instruction &I) {

- if (!DT.dominates(&CoroBegin, &I))

+ UserBBs.insert(I.getParent());

Base::visit(I);

+ // If the pointer is escaped prior to CoroBegin, we have to assume it would

+ // be written into before CoroBegin as well.

+ if (PI.isEscaped() && !DT.dominates(&CoroBegin, PI.getEscapingInst())) {

+ MayWriteBeforeCoroBegin = true;

}

// We need to provide this overload as PtrUseVisitor uses a pointer based

// visiting function.

void visit(Instruction *I) { return visit(*I); }

// We cannot handle PHI node and SelectInst because they could be selecting

// between two addresses that point to different Allocas.

void visitPHINode(PHINode &I) {

- assert(!usedAfterCoroBegin(I) &&

- "Unable to handle PHI node of aliases created before CoroBegin but "

- "used after CoroBegin");

+ enqueueUsers(I);

+ handleAlias(I);

}

void visitSelectInst(SelectInst &I) {

- assert(!usedAfterCoroBegin(I) &&

- "Unable to handle Select of aliases created before CoroBegin but "

- "used after CoroBegin");

+ enqueueUsers(I);

+ handleAlias(I);

}

void visitLoadInst(LoadInst &) {}

- // If the use is an operand, the pointer escaped and anything can write into

- // that memory. If the use is the pointer, we are definitely writing into the

- // alloca and therefore we need to copy.

- void visitStoreInst(StoreInst &SI) { PI.setEscaped(&SI); }

+ void visitStoreInst(StoreInst &SI) {

+ // Base visit function will handle escape setting.

+ Base::visitStoreInst(SI);

+ // Regardless whether the alias of the alloca is the value operand or the

+ // pointer operand, we need to assume the alloca is been written.

+ handleMayWrite(SI);

+ }

// All mem intrinsics modify the data.

- void visitMemIntrinsic(MemIntrinsic &MI) { PI.setEscaped(&MI); }

+ void visitMemIntrinsic(MemIntrinsic &MI) { handleMayWrite(MI); }

void visitBitCastInst(BitCastInst &BC) {

Base::visitBitCastInst(BC);

handleAlias(BC);

}

void visitAddrSpaceCastInst(AddrSpaceCastInst &ASC) {

Base::visitAddrSpaceCastInst(ASC);

handleAlias(ASC);

}

void visitGetElementPtrInst(GetElementPtrInst &GEPI) {

// The base visitor will adjust Offset accordingly.

Base::visitGetElementPtrInst(GEPI);

handleAlias(GEPI);

}

- const SmallVector<std::pair<Instruction *, APInt>, 1> &getAliases() const {

- return Aliases;

+ void visitCallBase(CallBase &CB) {

+ for (unsigned Op = 0, OpCount = CB.getNumArgOperands(); Op < OpCount; ++Op)

+ if (U->get() == CB.getArgOperand(Op) && !CB.doesNotCapture(Op))

+ PI.setEscaped(&CB);

+ handleMayWrite(CB);

}

+ bool getShouldLiveOnFrame() const {

+ if (!ShouldLiveOnFrame)

+ ShouldLiveOnFrame = computeShouldLiveOnFrame();

+ return ShouldLiveOnFrame.getValue();

+ }

+ bool getMayWriteBeforeCoroBegin() const { return MayWriteBeforeCoroBegin; }

+ DenseMap<Instruction *, llvm::Optional<APInt>> getAliasesCopy() const {

+ assert(getShouldLiveOnFrame() && "This method should only be called if the "

+ "alloca needs to live on the frame.");

+ for (const auto &P : AliasOffetMap) {

+ assert(P.second && "Unable to handle an alias with unknown offset.");

junparser: I prefer this to be report_fatal_error.

}

return AliasOffetMap;

}

private:

const DominatorTree &DT;

const CoroBeginInst &CoroBegin;

const SuspendCrossingInfo &Checker;

// All alias to the original AllocaInst, and are used after CoroBegin.

// Each entry contains the instruction and the offset in the original Alloca.

SmallVector<std::pair<Instruction *, APInt>, 1> Aliases{};

DenseMap<Instruction *, llvm::Optional<APInt>> AliasOffetMap{};

SmallPtrSet<BasicBlock *, 2> UserBBs{};

bool MayWriteBeforeCoroBegin{false};

mutable llvm::Optional<bool> ShouldLiveOnFrame{};

bool computeShouldLiveOnFrame() const {

if (PI.isEscaped())

return true;

vitalybuka: not needed

// after CoroBegin. Each entry contains the instruction and the offset in the
// original Alloca. They need to be recreated after CoroBegin off the frame.
- DenseMap<Instruction *, llvm::Optional<APInt>> AliasOffetMap{};
- SmallPtrSet<BasicBlock *, 2> UserBBs{};
+ DenseMap<Instruction *, llvm::Optional<APInt>> AliasOffetMap;
+ SmallPtrSet<BasicBlock *, 2> UserBBs;
bool MayWriteBeforeCoroBegin{false};
- mutable llvm::Optional<bool> ShouldLiveOnFrame{};
+ mutable llvm::Optional<bool> ShouldLiveOnFrame;
bool computeShouldLiveOnFrame() const {

for (auto *BB1 : UserBBs)

for (auto *BB2 : UserBBs)

if (Checker.hasPathCrossingSuspendPoint(BB1, BB2))

junparser: This function implies a dominance relation between arg1 and arg2, so we cannot use hasPathCrossingSuspendPoint directly on two user basic blocks.

lxfind: Hmm, if that's the case, I think we might already have some buggy code here. The checks on lifetime markers don't necessarily have dominance relationships. I saw you removed the dominance assertion in https://reviews.llvm.org/D75664. Do you remember why?

junparser: The bitcast users hit the assertion; however, they do not cross the suspend point, thanks to rewriteMaterializableInstructions. Apart from that, each lifetime marker has a dominance relationship with some of the users. Iterating over all the lifetime markers has the same effect as using the def.

junparser: We still need to call Checker.hasPathCrossingSuspendPoint(AllocaDef, Pointeruser).

lxfind: We cannot do that because the whole purpose of this rewrite is to check whether uses of the alloca belong to different suspend regions, which is what causes the spill.
I thought about this a bit more, and my conclusion is that although the API is not intended to be used this way, it actually works here.
The goal of this loop is to check whether an alloca needs to live across a suspension. If it does, there exists a coro.suspend such that one user BB happens before it and another user BB happens after it, and these two BBs must dominate each other through coro.suspend. Hence one of the user BB pairs must return true on the hasPathCrossingSuspendPoint check. On the other hand, for BBs that don't even dominate each other, hasPathCrossingSuspendPoint will return false anyway and won't affect the algorithm.

junparser: Let's say:

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else
      use2 a;
}

In such a case, hasPathCrossingSuspendPoint(use1, use2) returns false, which means a is not kept on the frame. This is wrong.

lxfind: In this case, "a" is just defined but not used at all before the co_await, so it would be correct to not keep it on the frame, right? Could you explain why "a" needs to be on the frame in this case?

junparser: It may be used before the co_await, as long as initial_suspend never suspends and cond is false. This pattern can also be placed in a loop. We do not know which part will be executed.

lxfind:
> It may be used before the co_await, as long as initial_suspend never suspends and cond is false. This pattern can also be placed in a loop. We do not know which part will be executed.

In your code example, "a" is neither used nor initialized before co_await, so it doesn't matter whether it lives on the frame. If it is used, then there will exist a user pair that crosses a suspension.
Could you explain what could go wrong in your example above if "a" is not kept on the frame?

junparser:
> We cannot do that because the whole purpose of this rewrite is to check whether uses of the alloca belong to different suspend regions, which is what causes the spill.
> I thought about this a bit more, and my conclusion is that although the API is not intended to be used this way, it actually works here.
> The goal of this loop is to check whether an alloca needs to live across a suspension. If it does, there exists a coro.suspend such that one user BB happens before it and another user BB happens after it, and these two BBs must dominate each other through coro.suspend. Hence one of the user BB pairs must return true on the hasPathCrossingSuspendPoint check. On the other hand, for BBs that don't even dominate each other, hasPathCrossingSuspendPoint will return false anyway and won't affect the algorithm.

I do not understand why there must be a user BB that happens before coro.suspend.

lxfind: If an alloca needs to live across a suspension, there must exist two uses of the alloca (direct or through an alias) such that one happens before a coro.suspend and the other happens after that same coro.suspend. If no two uses of the alloca can cross a suspend, the alloca never needs to live on the frame. Does this part sound right?

Next we need to prove that if such a pair of uses exists, one pair will return true on hasPathCrossingSuspendPoint.
It's obvious that the alloca use after coro.suspend must be dominated by coro.suspend. The alloca use that happens before coro.suspend is a bit trickier to think about. In the simpler case, if the use that happens before coro.suspend also dominates coro.suspend, then we have a pair of user BBs that would return true on the check, because they dominate each other through coro.suspend. If the use that happens before coro.suspend does not dominate coro.suspend, and yet it's meaningfully used, it would either escape at some point, or it will be propagated into a PHINode eventually and that PHINode will dominate coro.suspend. Since we also track PHINode in the alias analysis, we will eventually find this pair that dominates each other.

junparser:
> If an alloca needs to live across a suspension, there must exist two uses of the alloca (direct or through an alias) such that one happens before a coro.suspend and the other happens after that same coro.suspend. If no two uses of the alloca can cross a suspend, the alloca never needs to live on the frame. Does this part sound right?

There is no before/after relation; it is enough that the two uses are in different suspend regions. The case can be changed to:

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else{
      co_await
      use2 a;
    }
}

> Next we need to prove that if such a pair of uses exists, one pair will return true on hasPathCrossingSuspendPoint.
> It's obvious that the alloca use after coro.suspend must be dominated by coro.suspend. The alloca use that happens before coro.suspend is a bit trickier to think about. In the simpler case, if the use that happens before coro.suspend also dominates coro.suspend, then we have a pair of user BBs that would return true on the check, because they dominate each other through coro.suspend. If the use that happens before coro.suspend does not dominate coro.suspend, and yet it's meaningfully used, it would either escape at some point, or it will be propagated into a PHINode eventually and that PHINode will dominate coro.suspend. Since we also track PHINode in the alias analysis, we will eventually find this pair that dominates each other.

lxfind: Could you explain what could go wrong in this example if "a" is not on the frame?
The "alloca" instruction simply defines a value, and does nothing else.
So your example code

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else{
      co_await
      use2 a;
    }
}

behaves exactly as

void foo () {
  if (cond)
    {
      co_await
      alloca a1
      use1 a1;
    }else{
      co_await
      alloca a2
      use2 a2;
    }
}


junparser: Hmm... I get your point. Makes sense to me.

return true;

return false;

}
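The pairwise check debated in the inline thread above can be modeled outside LLVM. What follows is a minimal sketch, not the code in CoroFrame.cpp: `ToyCFG`, `reachableFrom`, `hasPathCrossingSuspend`, `shouldLiveOnFrame` and `branchExample` are hypothetical names, blocks are plain integers, and every suspend point is assumed to sit alone in its own block. It makes no claim about how `SuspendCrossingInfo` computes `hasPathCrossingSuspendPoint`; it only uses plain reachability to show why junparser's branch example yields no crossing pair.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy control-flow graph: blocks are indices, Succs[B] lists B's successors.
// Assumption: each suspend point sits alone in its own block.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
  std::set<int> SuspendBlocks;
};

// Blocks reachable from Start by following at least one edge.
std::set<int> reachableFrom(const ToyCFG &G, int Start) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < G.Succs.size());
  std::set<int> Seen;
  std::vector<int> Work(G.Succs[Start].begin(), G.Succs[Start].end());
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (!Seen.insert(B).second)
      continue; // already visited
    Work.insert(Work.end(), G.Succs[B].begin(), G.Succs[B].end());
  }
  return Seen;
}

// True if some path From -> suspend block -> To exists.
bool hasPathCrossingSuspend(const ToyCFG &G, int From, int To) {
  std::set<int> FromReach = reachableFrom(G, From);
  for (int S : G.SuspendBlocks)
    if (FromReach.count(S) && reachableFrom(G, S).count(To))
      return true;
  return false;
}

// Mirrors the double loop over UserBBs, including the pair (BB, BB).
bool shouldLiveOnFrame(const ToyCFG &G, const std::set<int> &UserBBs) {
  for (int BB1 : UserBBs)
    for (int BB2 : UserBBs)
      if (hasPathCrossingSuspend(G, BB1, BB2))
        return true;
  return false;
}

// junparser's first example: 0 = entry (alloca), 1 = co_await,
// 2 = use1, 3 = use2, 4 = exit.
ToyCFG branchExample() {
  ToyCFG G;
  G.Succs = {{1, 3}, {2}, {4}, {4}, {}};
  G.SuspendBlocks = {1};
  return G;
}
```

With `branchExample()`, the user blocks {2, 3} produce no crossing pair, which matches lxfind's argument that "a" can stay off the frame. Adding a use in the entry block (user blocks {0, 2}) makes the check return true.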

void handleMayWrite(const Instruction &I) {

if (!DT.dominates(&CoroBegin, &I))

MayWriteBeforeCoroBegin = true;

}

bool usedAfterCoroBegin(Instruction &I) {

for (auto &U : I.uses())

if (DT.dominates(&CoroBegin, U))

return true;

return false;

}

  void handleAlias(Instruction &I) {
-   if (!usedAfterCoroBegin(I))
+   // We track all aliases created prior to CoroBegin but used after.
+   // These aliases may need to be recreated after CoroBegin if the alloca
+   // need to live on the frame.
+   if (DT.dominates(&CoroBegin, &I) || !usedAfterCoroBegin(I))
      return;

-   assert(IsOffsetKnown && "Can only handle alias with known offset created "
-                           "before CoroBegin and used after");
-   Aliases.emplace_back(&I, Offset);
+   if (!IsOffsetKnown) {
+     AliasOffetMap[&I] = {};
+   } else {
+     auto Itr = AliasOffetMap.find(&I);
+     if (Itr == AliasOffetMap.end()) {
+       AliasOffetMap[&I] = Offset;
+     } else if (Itr->second.hasValue() && Itr->second.getValue() != Offset) {
+       // If we have seen two different possible values for this alias, we set
+       // it to empty.
+       AliasOffetMap[&I] = {};
+     }
+   }
  }
  };

} // namespace
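The offset bookkeeping in the new handleAlias can be tried in isolation. This is a minimal sketch under stated assumptions: `AliasOffsetMap` and `recordAlias` are hypothetical names (the patch itself spells the member `AliasOffetMap`), string keys stand in for `Instruction *`, and `std::optional<std::int64_t>` stands in for `llvm::Optional<APInt>`. An empty optional means "offset unknown or ambiguous".

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for DenseMap<Instruction *, llvm::Optional<APInt>>.
using AliasOffsetMap = std::map<std::string, std::optional<std::int64_t>>;

// Records one visit of Alias. Offset is std::nullopt when the visitor could
// not compute a constant offset for this path to the alias.
void recordAlias(AliasOffsetMap &Map, const std::string &Alias,
                 std::optional<std::int64_t> Offset) {
  assert(!Alias.empty() && "alias needs a name in this toy model");
  if (!Offset) {
    Map[Alias] = std::nullopt; // unknown offset poisons the entry
    return;
  }
  auto It = Map.find(Alias);
  if (It == Map.end()) {
    Map[Alias] = Offset; // first sighting: remember the offset
  } else if (It->second && *It->second != *Offset) {
    // Two different possible offsets were seen: mark the entry as unknown.
    It->second = std::nullopt;
  }
}
```

Once an entry has become empty it stays empty, even if a later visit reports a known offset. That matches the branch structure in the patch, where a known offset only overwrites a missing entry.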

// We need to make room to insert a spill after initial PHIs, but before

// catchswitch instruction. Placing it before violates the requirement that

// catchswitch, like all other EHPads must be the first nonPHI in a block.

[174 lines not shown]

static Instruction *insertSpills(const FrameDataInfo &FrameData,

SpillBlock->splitBasicBlock(&SpillBlock->front(), "PostSpill");

Shape.AllocaSpillBlock = SpillBlock;

// retcon and retcon.once lowering assumes all uses have been sunk.

if (Shape.ABI == coro::ABI::Retcon || Shape.ABI == coro::ABI::RetconOnce) {

// If we found any allocas, replace all of their remaining uses with Geps.

Builder.SetInsertPoint(&SpillBlock->front());

for (const auto &P : FrameData.Allocas) {

auto *G = GetFramePointer(P);

AllocaInst *Alloca = P.Alloca;

auto *G = GetFramePointer(Alloca);

// We are not using ReplaceInstWithInst(P.first, cast<Instruction>(G))

// here, as we are changing location of the instruction.

G->takeName(P);

G->takeName(Alloca);

P->replaceAllUsesWith(G);

Alloca->replaceAllUsesWith(G);

P->eraseFromParent();

Alloca->eraseFromParent();

}

return FramePtr;

}

// If we found any alloca, replace all of their remaining uses with GEP

// instructions. Because new dbg.declare have been created for these alloca,

// we also delete the original dbg.declare and replace other uses with undef.

// Note: We cannot replace the alloca with GEP instructions indiscriminately,

// as some of the uses may not be dominated by CoroBegin.

bool MightNeedToCopy = false;

Builder.SetInsertPoint(&Shape.AllocaSpillBlock->front());

SmallVector<Instruction *, 4> UsersToUpdate;

for (AllocaInst *A : FrameData.Allocas) {

for (const auto &A : FrameData.Allocas) {

AllocaInst *Alloca = A.Alloca;

UsersToUpdate.clear();

for (User *U : A->users()) {

for (User *U : Alloca->users()) {

auto *I = cast<Instruction>(U);

if (DT.dominates(CB, I))

UsersToUpdate.push_back(I);

else

MightNeedToCopy = true;

}

if (!UsersToUpdate.empty()) {

if (UsersToUpdate.empty())

auto *G = GetFramePointer(A);

continue;

G->setName(A->getName() + Twine(".reload.addr"));

auto *G = GetFramePointer(Alloca);

TinyPtrVector<DbgDeclareInst *> DIs = FindDbgDeclareUses(A);

G->setName(Alloca->getName() + Twine(".reload.addr"));

TinyPtrVector<DbgDeclareInst *> DIs = FindDbgDeclareUses(Alloca);

if (!DIs.empty())

DIBuilder(*A->getModule(),

DIBuilder(*Alloca->getModule(),

/*AllowUnresolved*/ false)

.insertDeclare(G, DIs.front()->getVariable(),

DIs.front()->getExpression(),

DIs.front()->getDebugLoc(), DIs.front());

for (auto *DI : FindDbgDeclareUses(A))

for (auto *DI : FindDbgDeclareUses(Alloca))

DI->eraseFromParent();

replaceDbgUsesWithUndef(A);

replaceDbgUsesWithUndef(Alloca);

for (Instruction *I : UsersToUpdate)

I->replaceUsesOfWith(A, G);

I->replaceUsesOfWith(Alloca, G);

}

// If we discovered such uses not dominated by CoroBegin, see if any of them

// preceed coro begin and have instructions that can modify the

// value of the alloca and therefore would require a copying the value into

// the spill slot in the coroutine frame.

if (MightNeedToCopy) {

Builder.SetInsertPoint(FramePtr->getNextNode());

for (const auto &A : FrameData.Allocas) {

for (AllocaInst *A : FrameData.Allocas) {

AllocaInst *Alloca = A.Alloca;

AllocaUseVisitor Visitor(A->getModule()->getDataLayout(), DT, *CB);

if (A.MayWriteBeforeCoroBegin) {

auto PtrI = Visitor.visitPtr(*A);

assert(!PtrI.isAborted());

if (PtrI.isEscaped()) {

// isEscaped really means potentially modified before CoroBegin.

if (A->isArrayAllocation())

if (Alloca->isArrayAllocation())

report_fatal_error(

"Coroutines cannot handle copying of array allocas yet");

auto *G = GetFramePointer(A);

auto *G = GetFramePointer(Alloca);

auto *Value = Builder.CreateLoad(A->getAllocatedType(), A);

auto *Value = Builder.CreateLoad(Alloca->getAllocatedType(), Alloca);

Builder.CreateStore(Value, G);

}

// For each alias to Alloca created before CoroBegin but used after

// CoroBegin, we recreate them after CoroBegin by appplying the offset

// to the pointer in the frame.

for (const auto &Alias : Visitor.getAliases()) {

for (const auto &Alias : A.Aliases) {

auto *FramePtr = GetFramePointer(A);

auto *FramePtr = GetFramePointer(Alloca);

auto *FramePtrRaw =

Builder.CreateBitCast(FramePtr, Type::getInt8PtrTy(C));

auto *AliasPtr = Builder.CreateGEP(

FramePtrRaw, ConstantInt::get(Type::getInt64Ty(C), Alias.second));

FramePtrRaw,

ConstantInt::get(Type::getInt64Ty(C), Alias.second.getValue()));

auto *AliasPtrTyped =

Builder.CreateBitCast(AliasPtr, Alias.first->getType());

Alias.first->replaceUsesWithIf(

AliasPtrTyped, [&](Use &U) { return DT.dominates(CB, U); });

}

return FramePtr;

}
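The two fix-ups at the end of insertSpills (copying an alloca that may have been written before CoroBegin, and recreating an alias at a known byte offset) can be pictured with ordinary memory. This is a minimal sketch, assuming a hypothetical `Pair` payload and hypothetical helpers `copyToFrame` and `recreateAlias`; it models the load/store pair and the bitcast, GEP, bitcast sequence with plain C++ pointers and is not the IR the pass emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical alloca payload with a field at a non-zero offset.
struct Pair {
  std::int32_t Lo;
  std::int32_t Hi;
};

// Models the load/store emitted when the alloca may have been written before
// CoroBegin: the stack contents are copied into the frame slot.
void copyToFrame(void *FrameSlot, const void *StackSlot, std::size_t Size) {
  assert(FrameSlot && StackSlot);
  std::memcpy(FrameSlot, StackSlot, Size);
}

// Models "cast the frame slot to a byte pointer, add a constant offset, cast
// to the alias type". Offset must be known, as in the patch.
template <typename AliasT>
AliasT *recreateAlias(void *FrameSlot, std::size_t Offset) {
  assert(FrameSlot);
  auto *Raw = static_cast<unsigned char *>(FrameSlot);
  return reinterpret_cast<AliasT *>(Raw + Offset);
}
```

After the copy, writes through the recreated alias land in the frame slot and leave the original stack slot untouched, which is the property the rewritten uses after CoroBegin rely on.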

// Sets the unwind edge of an instruction to a particular successor.

static void setUnwindEdgeTo(Instruction *TI, BasicBlock *Succ) {

if (auto *II = dyn_cast<InvokeInst>(TI))

II->setUnwindDest(Succ);

else if (auto *CS = dyn_cast<CatchSwitchInst>(TI))

[753 lines not shown]

for (BasicBlock *DomBB : DomSet) {

break;

}

static void collectFrameAllocas(Function &F, coro::Shape &Shape,

SuspendCrossingInfo &Checker,

const SuspendCrossingInfo &Checker,

SmallVectorImpl<AllocaInst *> &Allocas) {

SmallVectorImpl<AllocaInfo> &Allocas) {

// Collect lifetime.start info for each alloca.

using LifetimeStart = SmallPtrSet<Instruction *, 2>;

llvm::DenseMap<AllocaInst *, std::unique_ptr<LifetimeStart>> LifetimeMap;

for (Instruction &I : instructions(F)) {

auto *II = dyn_cast<IntrinsicInst>(&I);

if (!II || II->getIntrinsicID() != Intrinsic::lifetime_start)

continue;

[11 lines not shown]

for (Instruction &I : instructions(F)) {

auto *AI = dyn_cast<AllocaInst>(&I);

if (!AI)

continue;

// The PromiseAlloca will be specially handled since it needs to be in a

// fixed position in the frame.

if (AI == Shape.SwitchLowering.PromiseAlloca) {

continue;

}

auto Iter = LifetimeMap.find(AI);

for (User *U : I.users()) {

bool ShouldLiveOnFrame = false;

auto Iter = LifetimeMap.find(AI);

if (Iter != LifetimeMap.end()) {

junparser: @lxfind, with pointer tracking, I wonder whether the lifetime check can be removed.

lxfind: The lifetime checks may provide more accurate information in the case where allocas seem escaped. So I think it's helpful to keep them.

// Check against lifetime.start if the instruction has the info.

if (Iter != LifetimeMap.end())

for (User *U : I.users()) {

for (auto *S : *Iter->second) {

for (auto *S : *Iter->second)

if ((ShouldLiveOnFrame = Checker.isDefinitionAcrossSuspend(*S, U)))

break;

}

if (ShouldLiveOnFrame)

else

ShouldLiveOnFrame = Checker.isDefinitionAcrossSuspend(I, U);

if (ShouldLiveOnFrame) {

Allocas.push_back(AI);

break;

}

if (!ShouldLiveOnFrame)

continue;

}

// At this point, either ShouldLiveOnFrame is true or we didn't have

// lifetime information. We will need to rely on more precise pointer

// tracking.

DominatorTree DT(F);

AllocaUseVisitor Visitor{F.getParent()->getDataLayout(), DT,

*Shape.CoroBegin, Checker};

Visitor.visitPtr(*AI);

junparser: We can call this at the beginning of this function, and then sink the aliases after coro.begin as we do in rewriteMaterializableInstructions, keeping the original logic unchanged.

I think we can even deal with unknown offsets.

lxfind: I am not exactly sure what you mean, but let me explain a bit more about the alias analysis here.
The alias analysis serves three purposes.
The first purpose, which was introduced in D86859, is to identify aliases created before CoroBegin so that we can recreate them after CoroBegin. In this case, we cannot deal with an unknown offset because we simply cannot recreate an alias if the offset is unknown.
The second purpose, which was also introduced in D86859, is to identify allocas that may have been written into before CoroBegin. If that happens, we need to copy the content from the stack to the frame.
The third purpose, which is new in this patch, is to identify which allocas should go to the frame. We need alias analysis because without it, we won't be able to detect cases where an alloca needs to stay alive across suspension points due to indirect escape (as demonstrated in the added test cases). rewriteMaterializableInstructions cannot handle those indirect escapes.

Which case do you think we can/should simplify?

junparser: That makes it much clearer to me. LGTM.

if (!Visitor.getShouldLiveOnFrame())

continue;

Allocas.emplace_back(AI, Visitor.getAliasesCopy(),

Visitor.getMayWriteBeforeCoroBegin());

}
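The decision order in the new collectFrameAllocas (lifetime markers first, pointer tracking second) can be condensed into a few lines. This is a minimal sketch with hypothetical names: `allocaGoesOnFrame` is not a function in the patch, `LifetimeCrossesSuspend` is empty when the alloca has no lifetime.start markers, and the callback stands in for running `AllocaUseVisitor` and calling `getShouldLiveOnFrame()`.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// LifetimeCrossesSuspend:
//   std::nullopt -> no lifetime.start info for this alloca
//   false        -> markers exist, none of them crosses a suspend to a user
//   true         -> some marker/user pair crosses a suspend
// PointerTrackingSaysFrame models the (more expensive) visitor result.
bool allocaGoesOnFrame(std::optional<bool> LifetimeCrossesSuspend,
                       const std::function<bool()> &PointerTrackingSaysFrame) {
  assert(PointerTrackingSaysFrame && "callback must be set");
  // With lifetime info and no crossing, the alloca is skipped outright.
  if (LifetimeCrossesSuspend && !*LifetimeCrossesSuspend)
    return false;
  // Otherwise rely on the more precise pointer tracking.
  return PointerTrackingSaysFrame();
}
```

The early return is why lxfind wants to keep the lifetime check: it can keep an alloca off the frame even when the visitor alone would report it as escaped.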

void coro::buildCoroutineFrame(Function &F, Shape &Shape) {

eliminateSwiftError(F, Shape);

if (Shape.ABI == coro::ABI::Switch &&

Shape.SwitchLowering.PromiseAlloca) {

[107 lines not shown]

for (User *U : I.users())

}

LLVM_DEBUG(dumpSpills("Spills", FrameData.Spills));

if (Shape.ABI == coro::ABI::Retcon || Shape.ABI == coro::ABI::RetconOnce)

sinkSpillUsesAfterCoroBegin(F, FrameData, Shape.CoroBegin);

Shape.FrameTy = buildFrameType(F, Shape, FrameData);

  // Add PromiseAlloca to Allocas list so that it is processed in insertSpills.
  if (Shape.ABI == coro::ABI::Switch && Shape.SwitchLowering.PromiseAlloca)
-   FrameData.Allocas.push_back(Shape.SwitchLowering.PromiseAlloca);
+   // TODO: We assume that the promise alloca won't be modified before
+   // CoroBegin and no alias will be create before CoroBegin.
+   FrameData.Allocas.emplace_back(
+       Shape.SwitchLowering.PromiseAlloca,
+       DenseMap<Instruction *, llvm::Optional<APInt>>{}, false);

Shape.FramePtr = insertSpills(FrameData, Shape);

lowerLocalAllocas(LocalAllocas, DeadInstructions);

for (auto I : DeadInstructions)

I->eraseFromParent();

}

llvm/test/Transforms/Coroutines/coro-alloca-01.ll

This file was added.

				; Tests that CoroSplit can succesfully determine allocas should live on the frame
				; if their aliases are used across suspension points through PHINode.
				; RUN: opt < %s -coro-split -S | FileCheck %s
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				define i8* @f(i1 %n) "coroutine.presplit"="1" {
				entry:
				%x = alloca i64
				%y = alloca i64
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				br i1 %n, label %flag_true, label %flag_false

				flag_true:
				%x.alias = bitcast i64* %x to i32*
				br label %merge

				flag_false:
				%y.alias = bitcast i64* %y to i32*
				br label %merge

				merge:
				%alias_phi = phi i32* [ %x.alias, %flag_true ], [ %y.alias, %flag_false ]
				%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sp1, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				call void @print(i32* %alias_phi)
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; both %x and %y, as well as %alias_phi would all go to the frame.
				; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i64, i32*, i1 }
				; CHECK-LABEL: @f(
				; CHECK: %x.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK: %y.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 3
				; CHECK: %x.alias = bitcast i64* %x.reload.addr to i32*
				; CHECK: %y.alias = bitcast i64* %y.reload.addr to i32*
				; CHECK: %alias_phi = select i1 %n, i32* %x.alias, i32* %y.alias
				; CHECK: %alias_phi.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4
				; CHECK: store i32* %alias_phi, i32** %alias_phi.spill.addr, align 8

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare void @print(i32*)
				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-alloca-02.ll

This file was added.

				; Tests that if an alloca is escaped through storing the address,
				; the alloac will be put on the frame.
				; RUN: opt < %s -coro-split -S | FileCheck %s
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				define i8* @f() "coroutine.presplit"="1" {
				entry:
				%x = alloca i64
				%y = alloca i32*
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%x.alias = bitcast i64* %x to i32*
				store i32* %x.alias, i32** %y
				%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sp1, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%x1 = load i32*, i32** %y
				call void @print(i32* %x1)
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i32*, i1 }
				; CHECK-LABEL: define i8* @f()
				; CHECK: %x.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK: %y.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 3
				; CHECK: %x.alias = bitcast i64* %x.reload.addr to i32*
				; CHECK: store i32* %x.alias, i32** %y.reload.addr, align 8

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare void @print(i32*)
				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-alloca-03.ll

This file was added.

				; Tests that allocas escaped through function calls will live on the frame.
				; RUN: opt < %s -coro-split -S | FileCheck %s
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				define i8* @f() "coroutine.presplit"="1" {
				entry:
				%x = alloca i64
				%y = alloca i64
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%x.alias = bitcast i64* %x to i32*
				call void @capture_call(i32* %x.alias)
				%y.alias = bitcast i64* %y to i32*
				call void @nocapture_call(i32* %y.alias)
				%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sp1, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; %x needs to go to the frame since it's escaped; %y will stay as local since it doesn't escape.
				; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i1 }
				; CHECK-LABEL: define i8* @f()
				; CHECK: %y = alloca i64, align 8
				; CHECK: %x.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK: %x.alias = bitcast i64* %x.reload.addr to i32*
				; CHECK: call void @capture_call(i32* %x.alias)
				; CHECK: %y.alias = bitcast i64* %y to i32*
				; CHECK: call void @nocapture_call(i32* %y.alias)

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare void @capture_call(i32*)
				declare void @nocapture_call(i32* nocapture)
				declare noalias i8* @malloc(i32)
				declare void @free(i8*)
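The difference between %x and %y in coro-alloca-03.ll comes down to the per-argument capture test in visitCallBase. This is a minimal sketch of that test with hypothetical names (`ToyArg`, `callMayCapture`); integer ids stand in for pointer values and a bool stands in for the `nocapture` parameter attribute. It models only the escape decision, not the separate handleMayWrite bookkeeping.

```cpp
#include <cassert>
#include <vector>

// One call argument in the toy model.
struct ToyArg {
  int PointerId;  // which pointer value is passed in this position
  bool NoCapture; // models the nocapture parameter attribute
};

// True if Ptr is passed in some position that is allowed to capture it.
bool callMayCapture(const std::vector<ToyArg> &Args, int Ptr) {
  assert(Ptr >= 0 && "pointer ids are non-negative in this toy model");
  for (const ToyArg &A : Args)
    if (A.PointerId == Ptr && !A.NoCapture)
      return true;
  return false;
}
```

In the test above, the call to @capture_call corresponds to `NoCapture == false`, so %x is treated as escaped and moved to the frame, while the call to @nocapture_call leaves %y as a local alloca.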

llvm/test/Transforms/Coroutines/coro-alloca-04.ll

This file was added.

				; Tests that CoroSplit can succesfully determine allocas should live on the frame
				; if their aliases are used across suspension points through PHINode.
				; RUN: opt < %s -coro-split -S | FileCheck %s
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				define i8* @f(i1 %n) "coroutine.presplit"="1" {
				entry:
				%x = alloca i64
				br i1 %n, label %flag_true, label %flag_false

				flag_true:
				%x.alias1 = bitcast i64* %x to i32*
				br label %merge

				flag_false:
				%x.alias2 = bitcast i64* %x to i32*
				br label %merge

				merge:
				%alias_phi = phi i32* [ %x.alias1, %flag_true ], [ %x.alias2, %flag_false ]
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sp1, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				call void @print(i32* %alias_phi)
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; both %x and %alias_phi would go to the frame.
				; CHECK: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i64, i32*, i1 }
				; CHECK-LABEL: @f(
				; CHECK: store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
				; CHECK-NEXT: %0 = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK-NEXT: %1 = bitcast i64* %0 to i8*
				; CHECK-NEXT: %2 = bitcast i8* %1 to i32*
				; CHECK-NEXT: %alias_phi.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 3
				; CHECK-NEXT: store i32* %2, i32** %alias_phi.spill.addr

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare void @print(i32*)
				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll

[20 lines not shown]
  ; The CHECKs verify that dbg.declare intrinsics are created for the coroutine
  ; funclet 'f.resume', and that they reference the address of the variables on
  ; the coroutine frame. The debug locations for the original function 'f' are
  ; static (!11 and !13), whereas the coroutine funclet will have its own new
  ; ones with identical line and column numbers.
  ;
  ; CHECK-LABEL: define void @f() {
  ; CHECK: entry:
+ ; CHECK: %j = alloca i32, align 4
  ; CHECK: [[IGEP:%.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4
- ; CHECK: [[JGEP:%.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
  ; CHECK: init.ready:
  ; CHECK: call void @llvm.dbg.declare(metadata i32* [[IGEP]], metadata ![[IVAR:[0-9]+]], metadata !DIExpression()), !dbg ![[IDBGLOC:[0-9]+]]
  ; CHECK: await.ready:
- ; CHECK: call void @llvm.dbg.declare(metadata i32* [[JGEP]], metadata ![[JVAR:[0-9]+]], metadata !DIExpression()), !dbg ![[JDBGLOC:[0-9]+]]
+ ; CHECK: call void @llvm.dbg.declare(metadata i32* %j, metadata ![[JVAR:[0-9]+]], metadata !DIExpression()), !dbg ![[JDBGLOC:[0-9]+]]
  ;
  ; CHECK-LABEL: define internal fastcc void @f.resume({{.*}}) {
  ; CHECK: entry.resume:
+ ; CHECK: %j = alloca i32, align 4
  ; CHECK: [[IGEP_RESUME:%.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4
- ; CHECK: [[JGEP_RESUME:%.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
  ; CHECK: init.ready:
  ; CHECK: call void @llvm.dbg.declare(metadata i32* [[IGEP_RESUME]], metadata ![[IVAR_RESUME:[0-9]+]], metadata !DIExpression()), !dbg ![[IDBGLOC_RESUME:[0-9]+]]
  ; CHECK: await.ready:
- ; CHECK: call void @llvm.dbg.declare(metadata i32* [[JGEP_RESUME]], metadata ![[JVAR_RESUME:[0-9]+]], metadata !DIExpression()), !dbg ![[JDBGLOC_RESUME:[0-9]+]]
+ ; CHECK: call void @llvm.dbg.declare(metadata i32* %j, metadata ![[JVAR_RESUME:[0-9]+]], metadata !DIExpression()), !dbg ![[JDBGLOC_RESUME:[0-9]+]]
  ;
  ; CHECK: ![[IVAR]] = !DILocalVariable(name: "i"
  ; CHECK: ![[SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: !8, file: !1, line: 23, column: 12)
  ; CHECK: ![[IDBGLOC]] = !DILocation(line: 24, column: 7, scope: ![[SCOPE]])
	; CHECK: ![[JVAR]] = !DILocalVariable(name: "j"			; CHECK: ![[JVAR]] = !DILocalVariable(name: "j"
	; CHECK: ![[JDBGLOC]] = !DILocation(line: 32, column: 7, scope: ![[SCOPE]])			; CHECK: ![[JDBGLOC]] = !DILocation(line: 32, column: 7, scope: ![[SCOPE]])
	; CHECK: ![[IVAR_RESUME]] = !DILocalVariable(name: "i"			; CHECK: ![[IVAR_RESUME]] = !DILocalVariable(name: "i"
	; CHECK: ![[RESUME_SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: !8, file: !1, line: 23, column: 12)			; CHECK: ![[RESUME_SCOPE:[0-9]+]] = distinct !DILexicalBlock(scope: !8, file: !1, line: 23, column: 12)
