This is an archive of the discontinued LLVM Phabricator instance.


Differential D99227

[Coroutine][Clang] Force emit lifetime intrinsics for Coroutines
Closed, Public

Authored by lxfind on Mar 23 2021, 4:53 PM.


Details

Reviewers

ChuanqiXu
junparser
rjmccall

Commits

rGc7a39c833af1: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

Summary

tl;dr Correct implementation of coroutines requires having lifetime intrinsics available.

Coroutine functions are functions that can be suspended and resumed later. To do so, data that needs to stay alive after suspension must be put on the heap (i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figuring out whether it should be put on the stack or the frame.
In most cases, for data whose lifetime we are unable to analyze accurately, we can just conservatively put it on the heap.
Unfortunately, there exist a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze the lifetime of that data.
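
The placement decision described above can be sketched as a small model. This is only an illustration under stated assumptions, not LLVM's actual analysis: instruction positions are plain integers, and `Placement`, `LiveRange` and `placeAlloca` are hypothetical names that do not exist in LLVM. It shows why a missing lifetime range forces the conservative choice of the frame.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not LLVM's API: instruction positions are plain indices.
enum class Placement { Stack, Frame };

struct LiveRange {
  bool known;  // false when no lifetime.start/end markers were emitted
  int start;   // position of lifetime.start (meaningful only if known)
  int end;     // position of lifetime.end (meaningful only if known)
};

// A value has to live in the coroutine frame if a suspend point falls inside
// its live range. Without markers the range is unknown, so the value must be
// assumed live across every suspend point.
inline Placement placeAlloca(const LiveRange &range,
                             const std::vector<int> &suspendPoints) {
  if (!range.known)
    return suspendPoints.empty() ? Placement::Stack : Placement::Frame;
  assert(range.start <= range.end && "malformed live range");
  for (int s : suspendPoints)
    if (range.start < s && s < range.end)
      return Placement::Frame;
  return Placement::Stack;
}
```

In this model, a temporary whose markers bracket positions 2 to 4 stays on the stack when the only suspend point is at position 5, while the same temporary without markers is pushed to the frame.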

To go into more detail, there exist cases where, at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access is allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutines:

auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;

In the above code example, await_suspend() returns a new coroutine handle, whose address we obtain and then use to resume that coroutine. This essentially "transfers" control from the current coroutine to a different coroutine.
During the call to await_suspend(), the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However, when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for tmp. It will then call the address function on tmp. address is a member function of the coroutine handle, and there is no way for the LLVM optimizer to know that it does not capture the tmp pointer. So when the optimizer looks at it, it has to conservatively assume that tmp may escape and hence put it on the heap. Furthermore, in some cases the address call is inlined, which generates a series of store/load instructions that move the tmp pointer around. Those stores also make the compiler think that tmp might escape.
A repro of the crash can be found here: https://godbolt.org/z/KvPY66
To summarize, it's really difficult for the mid-end to figure out that the tmp data is short-lived.
I made an attempt in D98638, but it appears to be way too complex and basically does the same thing as inserting lifetime intrinsics in coroutines.
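
The symmetric-transfer hazard described above can be modeled without any coroutine machinery. The following is a rough sketch, assuming a simplified frame that only tracks whether it is still alive; `Frame`, `TransferResult` and `symmetricTransfer` are hypothetical names chosen for illustration and are not part of Clang or LLVM. It records, rather than performs, the invalid access that happens when `tmp` is placed on the frame.

```cpp
#include <cassert>

// Hypothetical model of a coroutine frame; none of these names are LLVM's.
struct Frame {
  bool alive = true;
};

struct TransferResult {
  void *next;             // address of the coroutine to resume
  bool touchedDeadFrame;  // true if tmp would be stored into a dead frame
};

// Models `auto tmp = await_suspend(); resume(tmp.address());` for the case
// where await_suspend() destroys the current frame before returning.
inline TransferResult symmetricTransfer(Frame &frame, bool tmpOnFrame,
                                        void *nextCoroutine) {
  frame.alive = false;  // await_suspend() destroyed the current coroutine
  TransferResult result{nullptr, false};
  if (tmpOnFrame) {
    // Storing the returned handle into the frame would be a use-after-free;
    // the model only records that this would happen.
    result.touchedDeadFrame = !frame.alive;
    result.next = nextCoroutine;
  } else {
    void *tmp = nextCoroutine;  // a stack slot never touches the frame
    result.next = tmp;
  }
  return result;
}
```

With `tmpOnFrame` set to false, which is what tight lifetime markers allow the optimizer to choose, the transfer completes without touching the destroyed frame.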

Also, for reference, we already force emission of lifetime intrinsics at O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893

I still need to fix a few tests, but I am sending this out early for feedback.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
80 ms	x64 debian > Clang.CodeGenCoroutines::coro-alloc.cpp
50 ms	x64 debian > Clang.CodeGenCoroutines::coro-await-resume-eh.cpp
120 ms	x64 debian > Clang.CodeGenCoroutines::coro-await.cpp
110 ms	x64 debian > Clang.CodeGenCoroutines::coro-dest-slot.cpp
90 ms	x64 debian > Clang.CodeGenCoroutines::coro-params.cpp
Full test results: 13 failed

Event Timeline

lxfind created this revision. Mar 23 2021, 4:53 PM

Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei. Mar 23 2021, 4:53 PM

lxfind requested review of this revision. Mar 23 2021, 4:53 PM

Herald added a project: Restricted Project. Mar 23 2021, 4:53 PM

Herald added a subscriber: cfe-commits.

lxfind edited the summary of this revision. Mar 23 2021, 4:59 PM

lxfind added reviewers: ChuanqiXu, junparser, rjmccall.

I have no objection to trying to always emit lifetime intrinsics in coroutines since it has a less-trivial runtime cost. I am skeptical that it's reasonable to do this for *correctness*, however; I don't think the frontend unconditionally emits lifetime intrinsics. But since I think this is fine to do regardless, I have no objection to the patch.

I think you should just set ShouldEmitLifetimeMarkers correctly in the first place instead of adding this as an extra condition to every place that considers it, however.

In D99227#2646532, @rjmccall wrote:

I am skeptical that it's reasonable to do this for *correctness*, however; I don't think the frontend unconditionally emits lifetime intrinsics.

Sorry, I re-read this after posting, and it's not exactly clear what I was saying. There are a lot of situations where Clang doesn't emit lifetime intrinsics for every alloca it emits, or emits unnecessarily weak bounds. Certain LLVM transforms can also introduce allocas that don't have corresponding lifetime intrinsics. So I think it's problematic to consider it a correctness condition that we're emitting optimally-tight lifetimes.

My only concern with emitting lifetime markers even at O0 is whether this would cause allocas to be optimized even at O0. If so, I wonder whether it would confuse programmers, since they may find that some variables surprisingly disappear. Or perhaps there would be no optimization, since every function would be marked with the optnone attribute. I am not sure about this.

If I understand this problem correctly, this patch could fix problems for the return value of symmetric transfer and the gro that we discussed in D98638. Then D98638 may be unneeded. I prefer the implementation in this patch.

clang/lib/CodeGen/CGDecl.cpp
1318	Can we be sure the frontend always calls this API to emit lifetime start? The frontend might call EmitIntrinsic, or create the lifetime.start intrinsic directly via IRBuilder::CreateXXX or Intrinsic::Create(...). I worry that this could cause changes outside the intended design. Also, if we add a check in EmitLifetimeStart, why not add one in EmitLifetimeEnd too?

Harbormaster completed remote builds in B95370: Diff 332826. Mar 23 2021, 9:46 PM

I think you should just set ShouldEmitLifetimeMarkers correctly in the first place instead of adding this as an extra condition to every place that considers it, however.

This is set when a CodeGenFunction is constructed; at that point it doesn't yet know whether this function is a coroutine.
I could make ShouldEmitLifetimeMarkers non-const and modify it once CodeGenFunction realizes it's a coroutine, if that's better than the current approach.
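
The alternative mentioned above, a flag computed at construction and raised later once the function turns out to be a coroutine, could look roughly like the following. This is a minimal sketch under assumptions: `FunctionEmitter`, `noteIsCoroutine` and `emitsLifetimeMarkers` are hypothetical names, and deriving the initial value from the optimization level alone is a simplification of whatever Clang actually checks.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-function code generator; the real
// CodeGenFunction is far larger and its members are not reproduced here.
class FunctionEmitter {
public:
  // Assumption: markers are enabled by default only when optimizing.
  explicit FunctionEmitter(unsigned optLevel)
      : shouldEmitLifetimeMarkers(optLevel > 0) {}

  // Called once the function body is known to be a coroutine.
  void noteIsCoroutine() { shouldEmitLifetimeMarkers = true; }

  bool emitsLifetimeMarkers() const { return shouldEmitLifetimeMarkers; }

private:
  bool shouldEmitLifetimeMarkers;  // non-const so it can be raised later
};
```

Every site that consults the flag then needs no extra coroutine check, which is the simplification the reviewer asked for.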

Sorry, I re-read this after posting, and it's not exactly clear what I was saying. There are a lot of situations where Clang doesn't emit lifetime intrinsics for every alloca it emits, or emits unnecessarily weak bounds. Certain LLVM transforms can also introduce allocas that don't have corresponding lifetime intrinsics. So I think it's problematic to consider it a correctness condition that we're emitting optimally-tight lifetimes.

I tend to agree. Relying on lifetime for correctness seems fragile.
I wonder if there is a better way to inform optimizer that a "variable" is really a temporary value that should die at the end of an expression?
For instance, whenever we do something simple like:

foo().bar();
co_await ...

If we compile it under -O0 without lifetime intrinsics, the return value of foo() will always be put on the coroutine frame, unless the compiler knows in advance that bar() does not capture.
This becomes a problem if this code appears at a location where the current coroutine frame may be destroyed (but the code itself isn't wrong, it simply doesn't access the frame).
The case for symmetric transfer is exactly this situation.

An alternative way to solve the problem for symmetric transfer is to change the design of symmetric transfer. For example, if we let await_suspend return void* instead of coroutine_handle, we won't have this problem in the first place, because we no longer need to call address(). Maybe @lewissbaker can comment on the viability of that.

In D99227#2646568, @ChuanqiXu wrote:

My only concern with emitting lifetime markers even at O0 is whether this would cause allocas to be optimized even at O0. If so, I wonder whether it would confuse programmers, since they may find that some variables surprisingly disappear. Or perhaps there would be no optimization, since every function would be marked with the optnone attribute. I am not sure about this.

It will only cause variables to be put on the stack instead of on the frame, which shouldn't affect the developer's view?

If I understand this problem correctly, this patch could fix problems for the return value of symmetric transfer and the gro that we discussed in D98638. Then D98638 may be unneeded. I prefer the implementation in this patch.

I doubt it can fix the gro problem. I will need to double check on that later.

lxfind added inline comments. Mar 23 2021, 10:58 PM

clang/lib/CodeGen/CGDecl.cpp
1318	I searched the codebase, and the front-end always calls this API to emit lifetime start. Also, for coroutines to behave correctly, we really only need SD_FullExpression to be able to emit it. Other cases are less critical. Usually when the front-end emits a LifetimeStart instruction, it stores the result somewhere and later checks it to decide whether it needs to emit a lifetime end. That is why no check is needed for lifetime end.
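
The start/end pairing described in this reply can be illustrated with a small model. It is a sketch under assumptions, not Clang's code: `MarkerLog` and `emitTemporary` are hypothetical names, and the emitter records strings instead of building IR. The point is that the end marker needs no separate enable check because it is only reached with a token returned by the start call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical emitter that records marker names instead of building IR.
struct MarkerLog {
  bool enabled;
  std::vector<std::string> events;

  // Returns a non-null token only when a start marker was emitted.
  const char *emitLifetimeStart(const char *name) {
    if (!enabled)
      return nullptr;
    events.push_back(std::string("start ") + name);
    return name;
  }

  // No `enabled` check here: callers only get this far with a start token.
  void emitLifetimeEnd(const char *token) {
    assert(token && "end requires a token from emitLifetimeStart");
    events.push_back(std::string("end ") + token);
  }
};

// The caller keeps the token and uses it to decide whether to emit the end.
inline void emitTemporary(MarkerLog &log, const char *name) {
  const char *token = log.emitLifetimeStart(name);
  if (token)
    log.emitLifetimeEnd(token);
}
```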

In D99227#2646719, @lxfind wrote:

In D99227#2646568, @ChuanqiXu wrote:

My only concern with emitting lifetime markers even at O0 is whether this would cause allocas to be optimized even at O0. If so, I wonder whether it would confuse programmers, since they may find that some variables surprisingly disappear. Or perhaps there would be no optimization, since every function would be marked with the optnone attribute. I am not sure about this.

It will only cause variables to be put on the stack instead of on the frame, which shouldn't affect the developer's view?

Yes, I was just worried that variables marked with lifetime intrinsics would be optimized by other passes. But since functions get the optnone attribute at O0, my worry may be unfounded. It is OK with me to emit lifetime intrinsics all the time.

Is it feasible to outline the initial segment that you don't want to be part of the coroutine, and then have coroutine splitting force that outlined function to be inlined into the ramp function? IIUC, you were saying that the splitting patch was difficult, but maybe thinking about it as outlining simplifies things. I know we had some nasty representational problems with the async lowering that we solved with outlining and force-inlining.

In D99227#2646710, @lxfind wrote:

I think you should just set ShouldEmitLifetimeMarkers correctly in the first place instead of adding this as an extra condition to every place that considers it, however.

This is set when a CodeGenFunction is constructed; at that point it doesn't yet know whether this function is a coroutine.
I could make ShouldEmitLifetimeMarkers non-const and modify it once CodeGenFunction realizes it's a coroutine, if that's better than the current approach.

That would be fine.

In D99227#2646819, @rjmccall wrote:

Is it feasible to outline the initial segment that you don't want to be part of the coroutine, and then have coroutine splitting force that outlined function to be inlined into the ramp function? IIUC, you were saying that the splitting patch was difficult, but maybe thinking about it as outlining simplifies things. I know we had some nasty representational problems with the async lowering that we solved with outlining and force-inlining.

That's a good idea. I will think about it. Thanks!

Address comments, and fix all tests

lxfind mentioned this in D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call. Mar 25 2021, 10:25 AM

Harbormaster completed remote builds in B95717: Diff 333338. Mar 25 2021, 11:09 AM

LGTM

This revision is now accepted and ready to land. Mar 25 2021, 1:39 PM

This revision was landed with ongoing or failed builds. Mar 25 2021, 1:46 PM

Closed by commit rGc7a39c833af1: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines (authored by lxfind).

This revision was automatically updated to reflect the committed changes.

lxfind added a commit: rGc7a39c833af1: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Revision Contents

Path	Size
clang/lib/CodeGen/CGDecl.cpp	3 lines
clang/lib/CodeGen/CGExpr.cpp	3 lines
clang/lib/CodeGen/CodeGenFunction.cpp	3 lines
clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp	17 lines

Diff 332826

clang/lib/CodeGen/CGDecl.cpp

[1,309 preceding lines not shown; context: void CodeGenFunction::EmitAutoVarDecl(const VarDecl &D) {]
   AutoVarEmission emission = EmitAutoVarAlloca(D);
   EmitAutoVarInit(emission);
   EmitAutoVarCleanups(emission);
 }

 /// Emit a lifetime.begin marker if some criteria are satisfied.
 /// \return a pointer to the temporary size Value if a marker was emitted, null
 /// otherwise
 llvm::Value *CodeGenFunction::EmitLifetimeStart(uint64_t Size,
                                                 llvm::Value *Addr) {
-  if (!ShouldEmitLifetimeMarkers)
+  // Coroutine relies on lifetime markers to properly place data.
+  if (!ShouldEmitLifetimeMarkers && !isCoroutine())
     return nullptr;

   assert(Addr->getType()->getPointerAddressSpace() ==
              CGM.getDataLayout().getAllocaAddrSpace() &&
          "Pointer should be in alloca address space");
   llvm::Value *SizeV = llvm::ConstantInt::get(Int64Ty, Size);
   Addr = Builder.CreateBitCast(Addr, AllocaInt8PtrTy);
   llvm::CallInst *C =
[1,282 following lines not shown]

clang/lib/CodeGen/CGExpr.cpp

[529 preceding lines not shown; context: case SD_Automatic:]
               CGM.getDataLayout().getTypeAllocSize(Alloca.getElementType()),
               Alloca.getPointer())) {
         pushCleanupAfterFullExpr<CallLifetimeEnd>(NormalEHLifetimeMarker,
                                                   Alloca, Size);
       }
       break;

     case SD_FullExpression: {
-      if (!ShouldEmitLifetimeMarkers)
+      // Coroutine relies on lifetime markers to properly place data.
+      if (!ShouldEmitLifetimeMarkers && !isCoroutine())
         break;

       // Avoid creating a conditional cleanup just to hold an llvm.lifetime.end
       // marker. Instead, start the lifetime of a conditional temporary earlier
       // so that it's unconditional. Don't do this with sanitizers which need
       // more precise lifetime marks.
       ConditionalEvaluation *OldConditional = nullptr;
       CGBuilderTy::InsertPoint OldIP;
[4,891 following lines not shown]

clang/lib/CodeGen/CodeGenFunction.cpp

[1,307 preceding lines not shown; context: void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,]
     if (const FunctionDecl *SpecDecl = FD->getTemplateInstantiationPattern())
       if (SpecDecl->hasBody(SpecDecl))
         Loc = SpecDecl->getLocation();

   Stmt *Body = FD->getBody();

   // Initialize helper which will detect jumps which can cause invalid lifetime
   // markers.
-  if (Body && ShouldEmitLifetimeMarkers)
+  // Coroutines always emit lifetime markers.
+  if (Body && (ShouldEmitLifetimeMarkers || isa<CoroutineBodyStmt>(Body)))
     Bypasses.Init(Body);

   // Emit the standard function prologue.
   StartFunction(GD, ResTy, Fn, FnInfo, Args, Loc, BodyRange.getBegin());

   // Generate the body of the function.
   PGO.assignRegionCounters(GD, CurFn);
   if (isa<CXXDestructorDecl>(FD))
[1,342 following lines not shown]

clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp

-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O0 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s

 #include "Inputs/coroutine.h"

 namespace coro = std::experimental::coroutines_v1;

 struct detached_task {
   struct promise_type {
     detached_task get_return_object() noexcept {
[35 lines not shown]
 };

 detached_task foo() {
   co_return;
 }

 // check that the lifetime of the coroutine handle used to obtain the address is contained within single basic block, and hence does not live across suspension points.
 // CHECK-LABEL: final.suspend:
-// CHECK: %[[PTR1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP:.+]] to i8*
-// CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
-// CHECK: call i8* @{{.address.}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* {{[^,]*}} %[[ADDR_TMP]])
-// CHECK-NEXT: %[[PTR2:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] to i8*
-// CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
+// CHECK: %{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK: %[[HDL_CAST1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL:.+]] to i8*
+// CHECK: call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HDL_CAST1]])
+// CHECK: %[[CALL:.+]] = call i8* @_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(
+// CHECK: %[[HDL_CAST2:.+]] = getelementptr inbounds %"struct.std::experimental::coroutines_v1::coroutine_handle.0", %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]], i32 0, i32 0
+// CHECK: store i8* %[[CALL]], i8** %[[HDL_CAST2]], align 8
+// CHECK: %[[HDL_TRANSFER:.+]] = call i8* @_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull dereferenceable(8) %[[HDL]])
+// CHECK: %[[HDL_CAST3:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]] to i8*
+// CHECK: call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HDL_CAST3]])
+// CHECK: call void @llvm.coro.resume(i8* %[[HDL_TRANSFER]])
