This is an archive of the discontinued LLVM Phabricator instance.


Differential D140231

CoroFrame: Put escaped variables with multiple lifetimes on coroutine frame
Closed Public

Authored by MatzeB on Dec 16 2022, 10:44 AM.

Details

Reviewers

GorNishanov
ChuanqiXu
wlei
aschwaighofer

Commits

rGae7bf2b80b9b: CoroFrame: Put escaped variables with multiple lifetimes on coroutine frame

Summary

The llvm.lifetime.start intrinsic guarantees that the address for a
given alloca is always the same. So variables with escaped addresses
reaching a lifetime start/end block before and after a suspend
must be placed onto the coroutine frame even if the variable itself
is not alive across the suspend point.

This computes a new KillLoop flag in the suspend crossing data flow
analysis to catch the case where a lifetime marker can reach itself
via a suspend-crossing path.

This fixes https://llvm.org/PR52501
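
The rule described in the summary can be sketched independently of LLVM. The block below is an illustration under stated assumptions, not the code from the patch: `ToyCFG`, `pathOrLoopCrossesSuspend` and `escapedAllocaNeedsFrame` are hypothetical names, blocks are plain indices, and reachability is answered by a brute-force search instead of the bit-vector data flow used in CoroFrame.cpp. It also allows paths that revisit the start block, which is a simplification of the definition used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Toy CFG; block indices and suspend flags are made up for illustration.
struct ToyCFG {
  std::vector<std::vector<std::size_t>> Succs; // Succs[B] = successors of B
  std::vector<bool> IsSuspend;                 // IsSuspend[B]: B suspends
};

// Returns true if some non-empty path From -> ... -> To leaves a suspend
// block along the way. From == To asks whether a loop crosses a suspend.
bool pathOrLoopCrossesSuspend(const ToyCFG &G, std::size_t From,
                              std::size_t To) {
  const std::size_t N = G.Succs.size();
  assert(From < N && To < N && G.IsSuspend.size() == N);
  // Seen[B][C]: block B was reached with C == "a suspend was crossed".
  std::vector<std::vector<bool>> Seen(N, std::vector<bool>(2, false));
  std::queue<std::pair<std::size_t, bool>> Work;
  Work.push({From, false});
  while (!Work.empty()) {
    const std::pair<std::size_t, bool> Cur = Work.front();
    Work.pop();
    // Leaving a suspend block counts as crossing the suspend point.
    const bool NextCrossed = Cur.second || G.IsSuspend[Cur.first];
    for (std::size_t S : G.Succs[Cur.first]) {
      if (S == To && NextCrossed)
        return true;
      if (!Seen[S][NextCrossed]) {
        Seen[S][NextCrossed] = true;
        Work.push({S, NextCrossed});
      }
    }
  }
  return false;
}

// Sketch of the new check only: an escaped alloca needs a frame slot if any
// lifetime.start block reaches a lifetime.start block (possibly itself)
// across a suspend point. The pre-existing def/use check is not modeled.
bool escapedAllocaNeedsFrame(const ToyCFG &G, bool Escaped,
                             const std::vector<std::size_t> &LifetimeStarts) {
  if (!Escaped)
    return false;
  for (std::size_t A : LifetimeStarts)
    for (std::size_t B : LifetimeStarts)
      if (pathOrLoopCrossesSuspend(G, A, B))
        return true;
  return false;
}
```

Modeling the test case from this revision as entry(0) → loop(1) → suspend(2) → loop(1) or exit(3), with the only lifetime.start in block 1, the sketch reports that the escaped variable needs a frame slot, and that it does not once the address no longer escapes or the suspend is removed.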

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MatzeB created this revision. Dec 16 2022, 10:44 AM

Herald added a project: Restricted Project. Dec 16 2022, 10:44 AM

Herald added subscribers: modimo, wenlei, hiraditya, mcrosier.

MatzeB requested review of this revision. Dec 16 2022, 10:44 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B203660: Diff 483591. Dec 16 2022, 11:45 AM

The patch is self-contained and good.

I am curious about why https://github.com/llvm/llvm-project/issues/51843 is arm related since the patch doesn't involve the backend.

And another point is about the assumption:

The llvm.lifetime.start intrinsic guarantees that the address for a given alloca is always the same.

In the manual (https://llvm.org/docs/LangRef.html#int-lifestart), it says:

After llvm.lifetime.end is called, ‘llvm.lifetime.start’ on the stack object can be called again. The second ‘llvm.lifetime.start’ call marks the object as alive, but it does not change the address of the object.

I want to say these two statements don't look the same. Maybe it would be better to send another patch to edit the manual too?

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
82–84	Is this change necessary? If there is a path that reaches block 'i' while repeating 'i', we can always reduce it to a path that reaches 'i' without repeating 'i'. Am I misunderstanding something?

This revision is now accepted and ready to land. Dec 18 2022, 7:40 PM

In D140231#4004061, @ChuanqiXu wrote:

The patch is self-contained and good.

I am curious about why https://github.com/llvm/llvm-project/issues/51843 is arm related since the patch doesn't involve the backend.

The issue hits ARM somewhat accidentally: for the program in question, passing a struct with 2 members is modeled as a 2xi64 vector type by the ARM calling convention. This means there are instructions constructing this 2xi64 vector, and those happen to get hoisted out of a loop. One of those 2 values is the address of a local variable (cast to i64 because of the calling convention), so the address suddenly needs to be constant even across the resume point within the loop, even though the variable is not alive throughout the whole loop as marked by the lifetime.start/end intrinsics.
The issue does not manifest on x86_64 because the struct calling convention uses 2 separate integer values instead of constructing a vector, and those seem simple enough not to get hoisted out of the loop.

And another point is about the assumption:

The llvm.lifetime.start intrinsic guarantees that the address for a given alloca is always the same.

I expressed this a bit sloppily. It's always the same for a single call (or, for a coroutine, a single call/ramp-up and all following suspends and continuations). It obviously doesn't need to be the same for separate calls.

In the manual (https://llvm.org/docs/LangRef.html#int-lifestart), it says:

After llvm.lifetime.end is called, ‘llvm.lifetime.start’ on the stack object can be called again. The second ‘llvm.lifetime.start’ call marks the object as alive, but it does not change the address of the object.

I want to say these two statements don't look the same. Maybe it would be better to send another patch to edit the manual too?

The manual is fine as-is and doesn't need a change, does it?

PS: I am hearing reports that my fix repairs some of the problems our users had but not all of them, so I will do some more testing and investigation before landing this.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
82–84	I am clarifying the definition for a case like the following and a query whether "B kills B":

    A
    ↓
    B ←           ←
    ↓             ↑
    C (suspend) → ↑
    ↓
    D

If you queried whether the path from "B" to "B" crosses a suspend point, the "Kills" flags computed here report `false`. That result ignores the path around the loop B->C->B, though, which is another way to reach B and contains a suspend point in C. For most users in this file this interpretation is fine: they compare a "Def" and a "Use", and both being in the same block just means the Def and the Use are in the same block; it doesn't matter that we could also reach the block through a loop. (...And we may or may not be right in this assumption that the "Def" always comes before the "Use" in the same block; but that is a discussion of corner cases for another day...) In the case of this change, however, when comparing lifetime.start values we do need to test whether any path from a lifetime.start to a lifetime.start contains a suspend point, even if it is the same point reached through a loop. Hence I introduce the new `KillLoop` flag here, which captures this loop situation, as the "Kills" flag alone does not catch it.
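
The "B kills B" case can be reproduced with a small stand-alone model of the data flow. This is a simplified sketch and not the LLVM implementation: `ToyCrossingInfo` is a hypothetical name, blocks are plain indices, the CFG is passed as an edge list, and the coro.end handling is left out. Only the propagation step and the `KillLoop` update are modeled on the hunk in this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using BitVec = std::vector<bool>;

// Dst |= Src, element by element.
static void orInto(BitVec &Dst, const BitVec &Src) {
  for (std::size_t I = 0; I < Dst.size(); ++I)
    if (Src[I])
      Dst[I] = true;
}

// Toy stand-in for SuspendCrossingInfo (assumption: index-based blocks).
struct ToyCrossingInfo {
  struct BlockData {
    BitVec Consumes, Kills;
    bool Suspend = false;
    bool KillLoop = false;
    std::vector<std::size_t> Succs;
  };
  std::vector<BlockData> Block;

  ToyCrossingInfo(std::size_t N,
                  const std::vector<std::pair<std::size_t, std::size_t>> &Edges,
                  const std::vector<std::size_t> &SuspendBlocks)
      : Block(N) {
    for (std::size_t I = 0; I < N; ++I) {
      Block[I].Consumes.assign(N, false);
      Block[I].Kills.assign(N, false);
      Block[I].Consumes[I] = true; // every block trivially reaches itself
    }
    for (const auto &E : Edges)
      Block[E.first].Succs.push_back(E.second);
    for (std::size_t S : SuspendBlocks) {
      Block[S].Suspend = true;
      orInto(Block[S].Kills, Block[S].Consumes); // kills what it consumes
    }
    bool Changed;
    do { // propagate along edges until nothing changes
      Changed = false;
      for (std::size_t I = 0; I < N; ++I) {
        for (std::size_t SuccNo : Block[I].Succs) {
          BlockData &B = Block[I];
          BlockData &S = Block[SuccNo];
          const BitVec SavedConsumes = S.Consumes, SavedKills = S.Kills;
          orInto(S.Consumes, B.Consumes);
          orInto(S.Kills, B.Kills);
          if (B.Suspend)
            orInto(S.Kills, B.Consumes);
          if (S.Suspend) {
            orInto(S.Kills, S.Consumes);
          } else {
            // Record that S reached itself across a suspend before the
            // self bit is cleared from its kill set.
            S.KillLoop = S.KillLoop || S.Kills[SuccNo];
            S.Kills[SuccNo] = false;
          }
          Changed = Changed || S.Kills != SavedKills ||
                    S.Consumes != SavedConsumes;
        }
      }
    } while (Changed);
  }

  bool hasPathCrossingSuspendPoint(std::size_t From, std::size_t To) const {
    assert(From < Block.size() && To < Block.size());
    return Block[To].Kills[From];
  }

  bool hasPathOrLoopCrossingSuspendPoint(std::size_t From,
                                         std::size_t To) const {
    return hasPathCrossingSuspendPoint(From, To) ||
           (From == To && Block[To].KillLoop);
  }
};
```

With A=0, B=1, C=2 (suspend), D=3 and the edges A→B, B→C, C→B, C→D, the plain query for (B, B) answers false while the path-or-loop query answers true, which is the difference described above.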

In D140231#4004199, @MatzeB wrote:

In D140231#4004061, @ChuanqiXu wrote:

The patch is self-contained and good.

I am curious about why https://github.com/llvm/llvm-project/issues/51843 is arm related since the patch doesn't involve the backend.

The issue hits ARM somewhat accidentally: for the program in question, passing a struct with 2 members is modeled as a 2xi64 vector type by the ARM calling convention. This means there are instructions constructing this 2xi64 vector, and those happen to get hoisted out of a loop. One of those 2 values is the address of a local variable (cast to i64 because of the calling convention), so the address suddenly needs to be constant even across the resume point within the loop, even though the variable is not alive throughout the whole loop as marked by the lifetime.start/end intrinsics.
The issue does not manifest on x86_64 because the struct calling convention uses 2 separate integer values instead of constructing a vector, and those seem simple enough not to get hoisted out of the loop.

Thanks for the explanation.

And another point is about the assumption:

The llvm.lifetime.start intrinsic guarantees that the address for a given alloca is always the same.

I expressed this a bit sloppily. It's always the same for a single call (or, for a coroutine, a single call/ramp-up and all following suspends and continuations). It obviously doesn't need to be the same for separate calls.

In the manual (https://llvm.org/docs/LangRef.html#int-lifestart), it says:

After llvm.lifetime.end is called, ‘llvm.lifetime.start’ on the stack object can be called again. The second ‘llvm.lifetime.start’ call marks the object as alive, but it does not change the address of the object.

I want to say these two statements don't look the same. Maybe it would be better to send another patch to edit the manual too?

The manual is fine as-is and doesn't need a change, does it?

Yes. After thinking about this again, I think we need to do this even without lifetime markers (I'm not asking for a change). I mean we need to do this because of the semantics of coroutines rather than the semantics of lifetime markers. I'm not requesting changes, since we always generate lifetime markers for coroutines as a helper. So we might not need to change the semantics of lifetime markers.

PS: I am hearing reports that my fix repairs some of the problems our users had but not all of them, so I will do some more testing and investigation before landing this.

Got it.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
82–84	Thanks for the explanation, but I guess you misunderstood my point. By `this change` I mean the comment changes about `Kills`. I want to ask whether the new comments add new information compared to the old comments, or whether they are just cleaner. (I'm not asking for a change.)

Decided to rebase and land this now.

MatzeB updated this revision to Diff 486257. Jan 4 2023, 6:12 AM

Harbormaster completed remote builds in B205674: Diff 486257. Jan 4 2023, 7:26 AM

Closed by commit rGae7bf2b80b9b: CoroFrame: Put escaped variables with multiple lifetimes on coroutine frame (authored by MatzeB). Jan 4 2023, 7:30 AM

This revision was automatically updated to reflect the committed changes.

MatzeB added a commit: rGae7bf2b80b9b: CoroFrame: Put escaped variables with multiple lifetimes on coroutine frame.

Revision Contents

Path	Size
llvm/lib/Transforms/Coroutines/CoroFrame.cpp	49 lines
llvm/test/Transforms/Coroutines/coro-alloca-loop-carried-address.ll	86 lines

Diff 486286

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

(71 lines not shown)
} // end anonymous namespace		} // end anonymous namespace

// The SuspendCrossingInfo maintains data that allows to answer a question		// The SuspendCrossingInfo maintains data that allows to answer a question
// whether given two BasicBlocks A and B there is a path from A to B that		// whether given two BasicBlocks A and B there is a path from A to B that
// passes through a suspend point.		// passes through a suspend point.
//		//
// For every basic block 'i' it maintains a BlockData that consists of:		// For every basic block 'i' it maintains a BlockData that consists of:
// Consumes: a bit vector which contains a set of indices of blocks that can		// Consumes: a bit vector which contains a set of indices of blocks that can
// reach block 'i'		// reach block 'i'. A block can trivially reach itself.
// Kills: a bit vector which contains a set of indices of blocks that can		// Kills: a bit vector which contains a set of indices of blocks that can
// reach block 'i', but one of the path will cross a suspend point		// reach block 'i' but there is a path crossing a suspend point
		// not repeating 'i' (path to 'i' without cycles containing 'i').
// Suspend: a boolean indicating whether block 'i' contains a suspend point.		// Suspend: a boolean indicating whether block 'i' contains a suspend point.
// End: a boolean indicating whether block 'i' contains a coro.end intrinsic.		// End: a boolean indicating whether block 'i' contains a coro.end intrinsic.
		// KillLoop: There is a path from 'i' to 'i' not otherwise repeating 'i' that
		// crosses a suspend point.
//		//
namespace {		namespace {
struct SuspendCrossingInfo {		struct SuspendCrossingInfo {
BlockToIndexMapping Mapping;		BlockToIndexMapping Mapping;

struct BlockData {		struct BlockData {
BitVector Consumes;		BitVector Consumes;
BitVector Kills;		BitVector Kills;
bool Suspend = false;		bool Suspend = false;
bool End = false;		bool End = false;
		bool KillLoop = false;
};		};
SmallVector<BlockData, SmallVectorThreshold> Block;		SmallVector<BlockData, SmallVectorThreshold> Block;

iterator_range<succ_iterator> successors(BlockData const &BD) const {		iterator_range<succ_iterator> successors(BlockData const &BD) const {
BasicBlock *BB = Mapping.indexToBlock(&BD - &Block[0]);		BasicBlock *BB = Mapping.indexToBlock(&BD - &Block[0]);
return llvm::successors(BB);		return llvm::successors(BB);
}		}

BlockData &getBlockData(BasicBlock *BB) {		BlockData &getBlockData(BasicBlock *BB) {
return Block[Mapping.blockToIndex(BB)];		return Block[Mapping.blockToIndex(BB)];
}		}

void dump() const;		void dump() const;
void dump(StringRef Label, BitVector const &BV) const;		void dump(StringRef Label, BitVector const &BV) const;

SuspendCrossingInfo(Function &F, coro::Shape &Shape);		SuspendCrossingInfo(Function &F, coro::Shape &Shape);

bool hasPathCrossingSuspendPoint(BasicBlock *DefBB, BasicBlock *UseBB) const {		/// Returns true if there is a path from \p From to \p To crossing a suspend
size_t const DefIndex = Mapping.blockToIndex(DefBB);		/// point without crossing \p From a 2nd time.
size_t const UseIndex = Mapping.blockToIndex(UseBB);		bool hasPathCrossingSuspendPoint(BasicBlock *From, BasicBlock *To) const {
		size_t const FromIndex = Mapping.blockToIndex(From);
bool const Result = Block[UseIndex].Kills[DefIndex];		size_t const ToIndex = Mapping.blockToIndex(To);
LLVM_DEBUG(dbgs() << UseBB->getName() << " => " << DefBB->getName()		bool const Result = Block[ToIndex].Kills[FromIndex];
		LLVM_DEBUG(dbgs() << From->getName() << " => " << To->getName()
<< " answer is " << Result << "\n");		<< " answer is " << Result << "\n");
return Result;		return Result;
}		}

		/// Returns true if there is a path from \p From to \p To crossing a suspend
		/// point without crossing \p From a 2nd time. If \p From is the same as \p To
		/// this will also check if there is a looping path crossing a suspend point.
		bool hasPathOrLoopCrossingSuspendPoint(BasicBlock *From,
		BasicBlock *To) const {
		size_t const FromIndex = Mapping.blockToIndex(From);
		size_t const ToIndex = Mapping.blockToIndex(To);
		bool Result = Block[ToIndex].Kills[FromIndex] ||
		(From == To && Block[ToIndex].KillLoop);
		LLVM_DEBUG(dbgs() << From->getName() << " => " << To->getName()
		<< " answer is " << Result << " (path or loop)\n");
		return Result;
		}

bool isDefinitionAcrossSuspend(BasicBlock *DefBB, User *U) const {		bool isDefinitionAcrossSuspend(BasicBlock *DefBB, User *U) const {
auto *I = cast<Instruction>(U);		auto *I = cast<Instruction>(U);

// We rewrote PHINodes, so that only the ones with exactly one incoming		// We rewrote PHINodes, so that only the ones with exactly one incoming
// value need to be analyzed.		// value need to be analyzed.
if (auto *PN = dyn_cast<PHINode>(I))		if (auto *PN = dyn_cast<PHINode>(I))
if (PN->getNumIncomingValues() > 1)		if (PN->getNumIncomingValues() > 1)
return false;		return false;
(136 lines not shown)	for (size_t I = 0; I < N; ++I) {
// If block S is an end block, it should not propagate kills as the		// If block S is an end block, it should not propagate kills as the
// blocks following coro.end() are reached during initial invocation		// blocks following coro.end() are reached during initial invocation
// of the coroutine while all the data are still available on the		// of the coroutine while all the data are still available on the
// stack or in the registers.		// stack or in the registers.
S.Kills.reset();		S.Kills.reset();
} else {		} else {
// This is reached when S block it not Suspend nor coro.end and it		// This is reached when S block it not Suspend nor coro.end and it
// need to make sure that it is not in the kill set.		// need to make sure that it is not in the kill set.
		S.KillLoop |= S.Kills[SuccNo];
S.Kills.reset(SuccNo);		S.Kills.reset(SuccNo);
}		}

// See if anything changed.		// See if anything changed.
Changed |= (S.Kills != SavedKills) || (S.Consumes != SavedConsumes);		Changed |= (S.Kills != SavedKills) || (S.Consumes != SavedConsumes);

if (S.Kills != SavedKills) {		if (S.Kills != SavedKills) {
LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI->getName()		LLVM_DEBUG(dbgs() << "\nblock " << I << " follower " << SI->getName()
(1,153 lines not shown)	bool computeShouldLiveOnFrame() const {
// more precise. We look at every pair of lifetime.start intrinsic and		// more precise. We look at every pair of lifetime.start intrinsic and
// every basic block that uses the pointer to see if they cross suspension		// every basic block that uses the pointer to see if they cross suspension
// points. The uses cover both direct uses as well as indirect uses.		// points. The uses cover both direct uses as well as indirect uses.
if (ShouldUseLifetimeStartInfo && !LifetimeStarts.empty()) {		if (ShouldUseLifetimeStartInfo && !LifetimeStarts.empty()) {
for (auto *I : Users)		for (auto *I : Users)
for (auto *S : LifetimeStarts)		for (auto *S : LifetimeStarts)
if (Checker.isDefinitionAcrossSuspend(*S, I))		if (Checker.isDefinitionAcrossSuspend(*S, I))
return true;		return true;
		// Addresses are guaranteed to be identical after every lifetime.start so
		// we cannot use the local stack if the address escaped and there is a
		// suspend point between lifetime markers. This should also cover the
		// case of a single lifetime.start intrinsic in a loop with suspend point.
		if (PI.isEscaped()) {
		for (auto *A : LifetimeStarts) {
		for (auto *B : LifetimeStarts) {
		if (Checker.hasPathOrLoopCrossingSuspendPoint(A->getParent(),
		B->getParent()))
		return true;
		}
		}
		}
return false;		return false;
}		}
// FIXME: Ideally the isEscaped check should come at the beginning.		// FIXME: Ideally the isEscaped check should come at the beginning.
// However there are a few loose ends that need to be fixed first before		// However there are a few loose ends that need to be fixed first before
// we can do that. We need to make sure we are not over-conservative, so		// we can do that. We need to make sure we are not over-conservative, so
// that the data accessed in-between await_suspend and symmetric transfer		// that the data accessed in-between await_suspend and symmetric transfer
// is always put on the stack, and also data accessed after coro.end is		// is always put on the stack, and also data accessed after coro.end is
// always put on the stack (esp the return object). To fix that, we need		// always put on the stack (esp the return object). To fix that, we need
(1,415 lines not shown)

llvm/test/Transforms/Coroutines/coro-alloca-loop-carried-address.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -passes='cgscc(coro-split),simplifycfg,early-cse' -S | FileCheck %s

				@escape_hatch0 = external global i64
				@escape_hatch1 = external global i64

				define void @foo() presplitcoroutine {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[STACKVAR0:%.*]] = alloca i64, align 8
				; CHECK-NEXT: [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr @foo.resumers)
				; CHECK-NEXT: [[ALLOC:%.*]] = call ptr @malloc(i64 40)
				; CHECK-NEXT: [[VFRAME:%.*]] = call noalias nonnull ptr @llvm.coro.begin(token [[ID]], ptr [[ALLOC]])
				; CHECK-NEXT: store ptr @foo.resume, ptr [[VFRAME]], align 8
				; CHECK-NEXT: [[DESTROY_ADDR:%.*]] = getelementptr inbounds [[FOO_FRAME:%.*]], ptr [[VFRAME]], i32 0, i32 1
				; CHECK-NEXT: store ptr @foo.destroy, ptr [[DESTROY_ADDR]], align 8
				; CHECK-NEXT: [[STACKVAR0_RELOAD_ADDR:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 2
				; CHECK-NEXT: [[STACKVAR1_RELOAD_ADDR:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 3
				; CHECK-NEXT: [[STACKVAR0_INT:%.*]] = ptrtoint ptr [[STACKVAR0_RELOAD_ADDR]] to i64
				; CHECK-NEXT: store i64 [[STACKVAR0_INT]], ptr @escape_hatch0, align 4
				; CHECK-NEXT: [[STACKVAR1_INT:%.*]] = ptrtoint ptr [[STACKVAR1_RELOAD_ADDR]] to i64
				; CHECK-NEXT: store i64 [[STACKVAR1_INT]], ptr @escape_hatch1, align 4
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: store i64 1234, ptr [[STACKVAR0_RELOAD_ADDR]], align 4
				; CHECK-NEXT: call void @bar()
				; CHECK-NEXT: [[INDEX_ADDR1:%.*]] = getelementptr inbounds [[FOO_FRAME]], ptr [[VFRAME]], i32 0, i32 4
				; CHECK-NEXT: store i1 false, ptr [[INDEX_ADDR1]], align 1
				; CHECK-NEXT: br i1 false, label [[LOOP]], label [[AFTERCOROEND:%.*]]
				; CHECK: AfterCoroEnd:
				; CHECK-NEXT: ret void
				;
				entry:
				%stackvar0 = alloca i64
				%stackvar1 = alloca i64

				; address of %stackvar escapes and may be relied upon even after
				; suspending/resuming the coroutine regardless of the lifetime markers.
				%id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
				%size = call i64 @llvm.coro.size.i64()
				%alloc = call ptr @malloc(i64 %size)
				%vFrame = call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %alloc)

				; %stackvar0 must be rewritten to reference the coroutine Frame!
				%stackvar0_int = ptrtoint ptr %stackvar0 to i64
				store i64 %stackvar0_int, ptr @escape_hatch0
				; %stackvar1 must be rewritten to reference the coroutine Frame!
				%stackvar1_int = ptrtoint ptr %stackvar1 to i64
				store i64 %stackvar1_int, ptr @escape_hatch1

				br label %loop

				loop:
				call void @llvm.lifetime.start(i64 8, ptr %stackvar0)

				store i64 1234, ptr %stackvar0

				; Call could potentially change value in memory referenced by %stackvar0 /
				; %stackvar1 and rely on it staying the same across suspension.
				call void @bar()

				call void @llvm.lifetime.end(i64 8, ptr %stackvar0)

				%save = call token @llvm.coro.save(ptr null)
				%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
				switch i8 %suspend, label %exit [
				i8 0, label %loop
				i8 1, label %exit
				]

				exit:
				call i1 @llvm.coro.end(ptr null, i1 false)
				ret void
				}

				declare void @bar()
				declare ptr @malloc(i64)

				declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr)
				declare i64 @llvm.coro.size.i64()
				declare ptr @llvm.coro.begin(token, ptr writeonly)
				declare token @llvm.coro.save(ptr)
				declare i8 @llvm.coro.suspend(token, i1)
				declare i1 @llvm.coro.end(ptr, i1)
				declare void @llvm.lifetime.start(i64, ptr nocapture)
				declare void @llvm.lifetime.end(i64, ptr nocapture)