This is an archive of the discontinued LLVM Phabricator instance.

Differential D124361

[Coroutines] Add coro_maychange intrinsic to solve TLS problem (2/5)
Abandoned · Public

Authored by ChuanqiXu on Apr 25 2022, 12:31 AM.

Details

Reviewers

rjmccall
jyknight
nhaehnle
jdoerfert
efriedma

Summary

This is intended to fix the TLS problem in coroutines described in https://github.com/llvm/llvm-project/issues/47179

In short, the optimizer assumes that the address of a TLS variable is the same throughout a function, since a function is expected to execute on a single thread. That assumption does not hold for an unlowered coroutine, which may resume on a different thread. This patch tries to fix the problem by adding a wrapper around every TLS variable to block alias analysis. Note that we cannot do this for unlowered coroutines only, because of inlining. The compiler remains free to optimize TLS variables in lowered coroutines.
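
The per-thread nature of TLS addresses can be seen without any coroutine machinery. The following is a minimal standalone sketch, not part of the patch: `tls_variable` mirrors the name used in the test added by this revision, while `address_differs_on_new_thread` is a hypothetical helper introduced only for illustration.

```cpp
#include <thread>

thread_local int tls_variable = 0;

// Returns true if a freshly started thread sees `tls_variable` at a different
// address than the calling thread does.
bool address_differs_on_new_thread() {
  int *here = &tls_variable; // address of the calling thread's copy
  bool differs = false;
  std::thread worker([&] {
    // Each thread has its own instance of a thread_local variable, so this
    // expression names the worker thread's copy.
    differs = (&tls_variable != here);
  });
  worker.join();
  return differs;
}
```

Any code that caches `&tls_variable` and then continues on another thread ends up holding the first thread's address, which is the situation an unlowered coroutine can get into.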

Event Timeline

ChuanqiXu created this revision. (Apr 25 2022, 12:31 AM)

Herald added a project: Restricted Project. (Apr 25 2022, 12:31 AM)

Herald added a subscriber: hiraditya.

ChuanqiXu requested review of this revision. (Apr 25 2022, 12:31 AM)

Herald added a project: Restricted Project. (Apr 25 2022, 12:31 AM)

Herald added subscribers: llvm-commits, jdoerfert.

ChuanqiXu added a parent revision: D124360: [Pipelines] Hoist CoroCleanup to avoid blocking optimizations. (Apr 25 2022, 1:16 AM)

ChuanqiXu added reviewers: rjmccall, jyknight, nhaehnle.

ChuanqiXu added a reviewer: jdoerfert.

Harbormaster completed remote builds in B161107: Diff 424829. (Apr 25 2022, 1:32 AM)

ChuanqiXu added a child revision: D124362: [NFC] [Pipelines] Hoist CoroCleanup as Module Pass (3/5). (Apr 25 2022, 1:32 AM)

efriedma added a subscriber: efriedma. (Apr 25 2022, 7:44 AM)

efriedma added inline comments.

llvm/lib/Transforms/Coroutines/CoroEarly.cpp
265 ↗	(On Diff #424829)	UserInst might not be a legal insertion point for a call if it's a PHI node.

ChuanqiXu added inline comments. (Apr 25 2022, 7:01 PM)

llvm/lib/Transforms/Coroutines/CoroEarly.cpp
265 ↗	(On Diff #424829)	Oh, good catch!

ChuanqiXu added a reviewer: efriedma. (Apr 25 2022, 8:32 PM)

Emit llvm.coro.maychange in the frontend according to the discussion in https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015

Herald added a subscriber: sstefan1. (Apr 27 2022, 3:43 AM)

Harbormaster completed remote builds in B161570: Diff 425470. (Apr 27 2022, 4:28 AM)

ChuanqiXu mentioned this in D124592: [DRAFT] [Coroutines] Add coro_maychange intrinsic to coroutines only for TLS variables. (Apr 27 2022, 8:36 PM)

ChuanqiXu mentioned this in D124363: [Coroutines] Don't optimize readnone function before we split coroutine (4/5). (Apr 28 2022, 2:11 AM)

rjmccall added inline comments. (Apr 28 2022, 3:27 PM)

clang/lib/CodeGen/CGExpr.cpp
2623	I guess this is unnecessary under OpenMP because the privatization logic will already anchor this appropriately in the function. That's worth mentioning in a comment so that readers don't think the combination is somehow busted.

Same question as the other patch: is there any way to safely only do this in code that's actually going to be part of a coroutine? Because `getLangOpts().Coroutines` is true for everyone using `-std=c++20`, and most of that code is probably not using coroutines. It seems like we have two options:

1. Use this pattern everywhere in the translation unit, and then eliminate it from all functions after we've split all coroutines.
2. Use this pattern in the frontend when directly emitting unsplit coroutine bodies, and also change LLVM to introduce this pattern when cloning code into an unsplit coroutine body (most importantly, in the inliner).

I think the second option makes more sense. In the first option, the mere possibility of having coroutines in the module could significantly impede optimization. The second option also means you'll have to find and remove this pattern in all functions, not just when transforming coroutines.

You could also use the second option for your `readnone` problem: you can have your early pass make `readnone` calls directly from coroutine bodies `coro_readnone` instead, and the inliner can do the same for `readnone` calls being inlined into unsplit coroutines.

efriedma added inline comments. (Apr 28 2022, 3:53 PM)

clang/lib/CodeGen/CGExpr.cpp
2623	I'm not really happy with the idea that IR semantics are different in the body of a coroutine; that implies that every interprocedural analysis or optimization author has to think about the possible interactions with coroutine semantics. It's much simpler to just pretend a coroutine is a thread. I mean, you mention inlining, but there's also IPSCCP, and Attributor, maybe GlobalOpt, and other interactions I'm probably not thinking of. And really, as long as alias analysis understands llvm.coro.maychange, we can do almost all the optimizations we want to do on thread-local variables anyway. But maybe I'm leaning too far towards conceptual purity, as opposed to practicality.

rjmccall added inline comments. (Apr 28 2022, 4:56 PM)

clang/lib/CodeGen/CGExpr.cpp
2623	Like I said in Discourse, I think our hands are pretty forced here for TLS, because LLVM IR is assuming in its basic representation that functions are pinned to a single thread, which simply is not true in coroutine bodies. We have three options:

1. We can globally change the structure of TLS access. You would not be able to simply use `thread_local` GVs as constants; instead, you would have to use some instruction which yields the address of the TLS for the current thread.
2. We can require different structure for TLS access in unlowered coroutines.
3. We can decline to support coroutines.

I don't think #3 is an option, and #1 isn't really a fight I want to fight, so we're left with #2. Maybe there's an option I'm not thinking of, though.

efriedma added inline comments. (Apr 28 2022, 5:24 PM)

clang/lib/CodeGen/CGExpr.cpp
2623	I think that's basically the three options, yes.

I guess strictly speaking, there is a fourth option: we could mess with the semantics of coroutines. We can say, for example, that the address of thread-local variables is always the address on entry to the coroutine. Then we can just hoist the computation to the entry block of the coroutine. But the spec probably doesn't allow that.

My intuition is that #1 is not actually that terrible. We don't really optimize thread-local variables very much anyway; as long as alias analysis doesn't explode, we probably won't miss many optimizations. And I'm not sure #2 is actually significantly less work to implement. But I guess it's easier to prove the changes involved in #2 don't have any impact on code that doesn't use coroutines.

Address comments:

- Don't add a filter for OpenMP.
- Add the readonly and inaccessiblememonly attributes to the llvm.coro.maychange intrinsic.

I think it is better to discuss whether or not to insert llvm.coro.maychange in non-coroutines on Discourse: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015

Harbormaster completed remote builds in B161916: Diff 425954. (Apr 28 2022, 9:02 PM)

Abandon this one since we prefer the proposal from James in https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/21

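Revision Contents

Path	Size
clang/lib/CodeGen/CGExpr.cpp	23 lines
clang/lib/CodeGen/ItaniumCXXABI.cpp	5 lines
clang/test/CodeGenCXX/const-init-cxx2a.cpp	4 lines
clang/test/CodeGenCXX/cxx2a-thread-local-constinit.cpp	11 lines
clang/test/CodeGenCoroutines/coro-tls.cpp	59 lines
llvm/docs/Coroutines.rst	33 lines
llvm/include/llvm/IR/IRBuilder.h	3 lines
llvm/include/llvm/IR/Intrinsics.td	3 lines
llvm/lib/IR/IRBuilder.cpp	6 lines
llvm/lib/Transforms/Coroutines/CoroCleanup.cpp	6 lines
llvm/lib/Transforms/Coroutines/Coroutines.cpp	1 line
llvm/test/Transforms/Coroutines/coro-TLS-01.ll	69 lines
llvm/test/Transforms/Coroutines/coro-TLS-02.ll	65 lines
llvm/test/Transforms/Coroutines/coro-TLS-03.ll	70 lines
llvm/test/Transforms/Coroutines/coro-TLS-04.ll	65 lines
llvm/test/Transforms/Coroutines/coro-cleanup-maychange.ll	73 lines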

Diff 425470

clang/lib/CodeGen/CGExpr.cpp

@@ (2,585 lines not shown) @@ LValue CodeGenFunction::EmitLoadOfPointerLValue(Address PtrAddr,
   Address Addr = EmitLoadOfPointer(PtrAddr, PtrTy, &BaseInfo, &TBAAInfo);
   return MakeAddrLValue(Addr, PtrTy->getPointeeType(), BaseInfo, TBAAInfo);
 }

 static LValue EmitGlobalVarDeclLValue(CodeGenFunction &CGF,
                                       const Expr *E, const VarDecl *VD) {
   QualType T = E->getType();

-  // If it's thread_local, emit a call to its wrapper function instead.
+  // If it's a dynamic thread_local, and the ABI requires a wrapper function,
+  // emit a call to its wrapper function instead.
   if (VD->getTLSKind() == VarDecl::TLS_Dynamic &&
       CGF.CGM.getCXXABI().usesThreadWrapperFunction(VD))
     return CGF.CGM.getCXXABI().EmitThreadLocalVarDeclLValue(CGF, VD, T);
   // Check if the variable is marked as declare target with link clause in
   // device codegen.
   if (CGF.getLangOpts().OpenMPIsDevice) {
     Address Addr = emitDeclTargetVarDeclLValue(CGF, VD, T);
     if (Addr.isValid())
       return CGF.MakeAddrLValue(Addr, T, AlignmentSource::Decl);
   }

+  bool ShouldEmitPrivate = CGF.getLangOpts().OpenMP &&
+                           !CGF.getLangOpts().OpenMPSimd &&
+                           VD->hasAttr<OMPThreadPrivateDeclAttr>();

   llvm::Value *V = CGF.CGM.GetAddrOfGlobalVar(VD);
   llvm::Type *RealVarTy = CGF.getTypes().ConvertTypeForMem(VD->getType());
   V = EmitBitCastOfLValueToProperType(CGF, V, RealVarTy);

+  // Previously, the optimizer could assume the address of a TLS variable is
+  // same in the same function. The assumption is broken now after we introduced
+  // coroutines. Mark the TLS variable with llvm.coro.maychange intrinsic to
+  // block the optimizations before we split coroutines. After the coroutine get
+  // splitted, the llvm.coro.maychange intrinsics would be removed. Then the
+  // compiler is free to optimize them.
+  if (VD->getTLSKind() != VarDecl::TLS_None && !ShouldEmitPrivate &&
+      CGF.getLangOpts().Coroutines)
+    V = CGF.Builder.CreateCoroMayChange(V);

   CharUnits Alignment = CGF.getContext().getDeclAlign(VD);
   Address Addr(V, RealVarTy, Alignment);
   // Emit reference to the private copy of the variable if it is an OpenMP
   // threadprivate variable.
-  if (CGF.getLangOpts().OpenMP && !CGF.getLangOpts().OpenMPSimd &&
-      VD->hasAttr<OMPThreadPrivateDeclAttr>()) {
+  if (ShouldEmitPrivate)
     return EmitThreadPrivateVarDeclLValue(CGF, VD, T, Addr, RealVarTy,
                                           E->getExprLoc());
-  }
   LValue LV = VD->getType()->isReferenceType() ?
       CGF.EmitLoadOfReferenceLValue(Addr, VD->getType(),
                                     AlignmentSource::Decl) :
       CGF.MakeAddrLValue(Addr, T, AlignmentSource::Decl);
   setObjCGCLValueClass(CGF.getContext(), E, LV);
   return LV;
 }

@@ (3,004 lines not shown) @@

clang/lib/CodeGen/ItaniumCXXABI.cpp

@@ (2,977 lines not shown) @@ if (HasConstantInitialization) {
       Builder.CreateBr(ExitBB);

       Builder.SetInsertPoint(ExitBB);
     }

     // For a reference, the result of the wrapper function is a pointer to
     // the referenced object.
     llvm::Value *Val = Var;
+    // The address of a TLS variable in one coroutine may change if the
+    // coroutine resumes in another thread.
+    if (CGM.getLangOpts().Coroutines)
+      Val = Builder.CreateCoroMayChange(Val);

     if (VD->getType()->isReferenceType()) {
       CharUnits Align = CGM.getContext().getDeclAlign(VD);
       Val = Builder.CreateAlignedLoad(Var->getValueType(), Var, Align);
     }
     if (Val->getType() != Wrapper->getReturnType())
       Val = Builder.CreatePointerBitCastOrAddrSpaceCast(
           Val, Wrapper->getReturnType(), "");
     Builder.CreateRet(Val);
@@ (1,837 lines not shown) @@

clang/test/CodeGenCXX/const-init-cxx2a.cpp

-// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s -std=c++2a | FileCheck %s --implicit-check-not=cxx_global_var_init --implicit-check-not=cxa_atexit
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm -o - %s -std=c++2a -disable-llvm-passes | FileCheck %s --implicit-check-not=cxx_global_var_init --implicit-check-not=cxa_atexit

 // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-pch -o %t.pch %s -std=c++2a
-// RUN: %clang_cc1 -triple x86_64-linux-gnu -include-pch %t.pch -x c++ /dev/null -emit-llvm -o - -std=c++2a | FileCheck %s --implicit-check-not=cxx_global_var_init --implicit-check-not=cxa_atexit
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -include-pch %t.pch -x c++ /dev/null -emit-llvm -o - -std=c++2a -disable-llvm-passes | FileCheck %s --implicit-check-not=cxx_global_var_init --implicit-check-not=cxa_atexit

 // CHECK: @a ={{.*}} global i32 123,
 int a = (delete new int, 123);

 struct B {
   constexpr B() {}
   constexpr ~B() { n *= 5; }
   int n = 123;
@@ (46 lines not shown) @@

clang/test/CodeGenCXX/cxx2a-thread-local-constinit.cpp

-// RUN: %clang_cc1 -no-opaque-pointers -triple x86_64-linux-gnu -std=c++2a %s -emit-llvm -o - | FileCheck --check-prefix=CHECK --check-prefix=LINUX %s
-// RUN: %clang_cc1 -no-opaque-pointers -triple x86_64-apple-darwin12 -std=c++2a %s -emit-llvm -o - | FileCheck --check-prefix=CHECK --check-prefix=DARWIN %s
+// RUN: %clang_cc1 -no-opaque-pointers -triple x86_64-linux-gnu -std=c++2a %s -emit-llvm -o - -disable-llvm-passes | FileCheck --check-prefix=CHECK --check-prefix=LINUX %s
+// RUN: %clang_cc1 -no-opaque-pointers -triple x86_64-apple-darwin12 -std=c++2a %s -emit-llvm -o - -disable-llvm-passes | FileCheck --check-prefix=CHECK --check-prefix=DARWIN %s

 // Check variable definitions/declarations. Note that on Darwin, typically the
 // variable's symbol is marked internal, and only the _ZTW function is
 // exported. Except: constinit variables do get exported, even on darwin.

 // CHECK-DAG: @a = external thread_local global i32
 // CHECK-DAG: @b = external thread_local global i32
 // LINUX-DAG: @c ={{.*}} thread_local global i32 0, align 4
@@ (15 lines not shown) @@
 // LINUX-LABEL: define linkonce_odr {{.*}} @_ZTW1a()
 // LINUX: br i1
 // LINUX: call void @_ZTH1a()
 // LINUX: }
 // DARWIN-NOT: define {{.*}}@_ZTW1a()

 // CHECK-LABEL: define{{.*}} i32 @_Z5get_bv()
 // CHECK-NOT: call
-// CHECK: load i32, i32* @b
+// CHECK: %0 = call i32* @llvm.coro.maychange.p0i32(i32* @b)
+// CHECK: load i32, i32* %0
 // CHECK-NOT: call
 // CHECK: }
 int get_b() { return b; }

 // CHECK-NOT: define {{.*}} @_ZTW1b()

 extern thread_local int c;

 // CHECK-LABEL: define{{.*}} i32 @_Z5get_cv()
 // LINUX: call {{(cxx_fast_tlscc )?}}i32* @_ZTW1c()
 // CHECK: load i32, i32* %
 // CHECK: }
 int get_c() { return c; }

 // Note: use of 'c' does not trigger initialization of 'd', because 'c' has a
 // constant initializer.
 // DARWIN-LABEL: define cxx_fast_tlscc {{.*}} @_ZTW1c()
 // LINUX-LABEL: define weak_odr {{.*}} @_ZTW1c()
 // CHECK-NOT: br i1
-// CHECK-NOT: call
-// CHECK: ret i32* @c
+// CHECK: %[[C:[0-9]+]] = call i32* @llvm.coro.maychange.p0i32(i32* @c)
+// CHECK-NEXT: ret i32* %[[C]]
 // CHECK: }

 thread_local int c = 0;

 // PR51079: We must assume an incomplete class type might have non-trivial
 // destruction, and so speculatively call the thread wrapper.

 // CHECK-LABEL: define {{.*}} @_Z6get_e3v(
@@ (40 lines not shown) @@

clang/test/CodeGenCoroutines/coro-tls.cpp

This file was added.

+// This tests that
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++20 -O3 -emit-llvm %s -o - | FileCheck %s

+#include "Inputs/coroutine.h"

+struct awaitable {
+  bool await_ready() { return false; }
+  void await_suspend(std::coroutine_handle<> h);
+  void await_resume() {}
+};
+awaitable switch_to_new_thread();

+struct task {
+  struct promise_type {
+    task get_return_object() { return {}; }
+    std::suspend_never initial_suspend() { return {}; }
+    std::suspend_never final_suspend() noexcept { return {}; }
+    void return_void() {}
+    void unhandled_exception() {}
+  };
+};

+thread_local int tls_variable = 0;

+bool non_coroutine() {
+  auto *i = &tls_variable;
+  auto *j = &tls_variable;
+  return i == j;
+}

+// CHECK-LABEL: @_Z13non_coroutinev()
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret i1 true

+void check(int *i, int *j);

+task resuming_on_new_thread() {
+  auto *i = &tls_variable;
+  co_await switch_to_new_thread();
+  auto *j = &tls_variable;
+  if (i != j)
+    check(i, j);
+}

+// CHECK-LABEL: define internal fastcc void @_Z22resuming_on_new_threadv.resume
+// CHECK: %[[RELOAD:.+]] = load ptr, ptr %[[RELOAD_ADDR:.+reload.addr.*]]
+// CHECK: %[[CMP:.+]] = icmp eq ptr %[[RELOAD]], @tls_variable
+// CHECK: tail call void @_Z5checkPiS_({{.*}}%[[RELOAD]], {{.*}}@tls_variable)

+task resuming_on_new_thread2() {
+  co_await switch_to_new_thread();
+  auto *j = &tls_variable;
+  auto *i = &tls_variable;
+  if (i != j)
+    check(i, j);
+}

+// CHECK-LABEL: _Z23resuming_on_new_thread2v.resume
+// CHECK-NOT: call void @_Z5checkPiS_
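
To make the scenario exercised by `resuming_on_new_thread` concrete without relying on coroutine lowering, here is a hand-written sketch that models the coroutine as a two-step state machine and runs the second step on another thread. It is not part of the patch: the `ManualCoroutine` type and both driver functions are hypothetical names used only for this illustration, and a real coroutine frame is generated by the compiler.

```cpp
#include <thread>

thread_local int tls_variable = 0;

// Hand-rolled stand-in for a coroutine frame: `i` lives across the suspension.
struct ManualCoroutine {
  int *i = nullptr;
  bool addresses_match = true;

  // Code before the suspension point.
  void start() { i = &tls_variable; }

  // Code after the suspension point; it may run on any thread, so the
  // address has to be recomputed here instead of being reused from start().
  void resume() {
    int *j = &tls_variable;
    addresses_match = (i == j);
  }
};

// Runs start() on the calling thread and resume() on a new thread.
bool addresses_match_after_thread_switch() {
  ManualCoroutine coro;
  coro.start();
  std::thread worker([&coro] { coro.resume(); });
  worker.join();
  return coro.addresses_match;
}

// Runs both halves on the calling thread, like an ordinary function.
bool addresses_match_on_same_thread() {
  ManualCoroutine coro;
  coro.start();
  coro.resume();
  return coro.addresses_match;
}
```

If an optimizer folded `i == j` to `true` in `resume()` on the grounds that both are `&tls_variable`, the thread-switching driver would report the wrong answer. That folding is what the test's CHECK lines for `resuming_on_new_thread` guard against.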

llvm/docs/Coroutines.rst

@@ (1,684 lines not shown) @@
 abnormally (non-zero).

 In a normal coroutine, it is undefined behavior if the coroutine executes
 a call to ``llvm.coro.suspend.retcon`` after resuming abnormally.

 In a yield-once coroutine, it is undefined behavior if the coroutine
 executes a call to ``llvm.coro.suspend.retcon`` after resuming in any way.

+Coroutine Helper Intrinsics
+------------------------------
+Intrinsics described in this section are used as a helper to show
+the changed properties after we introduced coroutines.

+.. _coro.maychange:

+'llvm.coro.maychange' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+::

+  declare ptr @llvm.coro.maychange(ptr)

+Overview:
+"""""""""

+The '``llvm.coro.maychange``' intrinsic refers to the value which may
+change in coroutine but never change in normal functions. A typicall example
+is the address of the TLS variables. The addresses of TLS variables are
+thought to be constant in one function. But it is not true in coroutines
+due to a coroutine may resume in another thread.

+Arguments:
+""""""""""

+None

+Semantics:
+""""""""""

+The `llvm.coro.maychange` intrinsic would be replaced with its argument
+after we lowered all the coroutines.

 Coroutine Transformation Passes
 ===============================
 CoroEarly
 ---------
 The pass CoroEarly lowers coroutine intrinsics that hide the details of the
 structure of the coroutine frame, but, otherwise not needed to be preserved to
 help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
 and `coro.promise`_ intrinsics.
@@ (52 lines not shown) @@
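
As a loose source-level analogy for the `llvm.coro.maychange` semantics documented in the hunk above, the sketch below shows an identity function that the optimizer cannot see through. This is an assumption-laden illustration, not how the intrinsic is implemented: `may_change` and `same_address_twice` are hypothetical names, and the `volatile` round trip merely stands in for an opaque intrinsic call.

```cpp
// Identity function that is opaque to the optimizer: the value is written to
// and read back from a volatile object, so the compiler must treat the result
// as unknown even though it always equals `ptr` at run time.
template <typename T>
T *may_change(T *ptr) {
  T *volatile slot = ptr;
  return slot;
}

thread_local int tls_variable = 0;

// Both addresses pass through the barrier, so the comparison is evaluated at
// run time instead of being folded to `true` at compile time.
bool same_address_twice() {
  int *i = may_change(&tls_variable);
  int *j = may_change(&tls_variable);
  return i == j;
}
```

Unlike this analogy, the intrinsic is temporary: per the documentation text above, it is replaced with its argument once all coroutines have been lowered, after which the usual folding becomes possible again.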

llvm/include/llvm/IR/IRBuilder.h

@@ (732 lines not shown) @@ #endif
   /// If the pointer isn't i8* it will be converted.
   CallInst *CreateLifetimeEnd(Value *Ptr, ConstantInt *Size = nullptr);

   /// Create a call to invariant.start intrinsic.
   ///
   /// If the pointer isn't i8* it will be converted.
   CallInst *CreateInvariantStart(Value *Ptr, ConstantInt *Size = nullptr);

+  /// Create a call to coro.may_change intrinsic.
+  CallInst *CreateCoroMayChange(Value *Ptr);

   /// Create a call to Masked Load intrinsic
   CallInst *CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask,
                              Value *PassThru = nullptr, const Twine &Name = "");

   /// Create a call to Masked Store intrinsic
   CallInst *CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment,
                               Value *Mask);

@@ (1,808 lines not shown) @@

llvm/include/llvm/IR/Intrinsics.td

@@ (1,318 lines not shown) @@

 // Coroutine Lowering Intrinsics. Used internally by coroutine passes.

 def int_coro_subfn_addr : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty, llvm_i8_ty],
                                     [IntrReadMem, IntrArgMemOnly,
                                      ReadOnly<ArgIndex<0>>,
                                      NoCapture<ArgIndex<0>>]>;

+// Coroutine Lowering Intrinsics to block optimizations.
+def int_coro_maychange : Intrinsic<[llvm_anyptr_ty], [LLVMMatchType<0>]>;

 ///===-------------------------- Other Intrinsics --------------------------===//
 //
 def int_trap : Intrinsic<[], [], [IntrNoReturn, IntrCold]>,
                GCCBuiltin<"__builtin_trap">;
 def int_debugtrap : Intrinsic<[]>,
                     GCCBuiltin<"__builtin_debugtrap">;
 def int_ubsantrap : Intrinsic<[], [llvm_i8_ty],
                       [IntrNoReturn, IntrCold, ImmArg<ArgIndex<0>>]>;
@@ (715 lines not shown) @@

llvm/lib/IR/IRBuilder.cpp

@@ (493 lines not shown) @@ CallInst *IRBuilderBase::CreateInvariantStart(Value *Ptr, ConstantInt *Size) {
   // Fill in the single overloaded type: memory object type.
   Type *ObjectPtr[1] = {Ptr->getType()};
   Module *M = BB->getParent()->getParent();
   Function *TheFn =
       Intrinsic::getDeclaration(M, Intrinsic::invariant_start, ObjectPtr);
   return createCallHelper(TheFn, Ops, this);
 }

+CallInst *IRBuilderBase::CreateCoroMayChange(Value *Ptr) {
+  Type *Ty = Ptr->getType();
+  assert(Ty->isPointerTy() && "llvm.coro.maychange is allowed for pointer.\n");
+  return CreateIntrinsic(llvm::Intrinsic::coro_maychange, {Ty}, {Ptr});
+}

 CallInst *
 IRBuilderBase::CreateAssumption(Value *Cond,
                                 ArrayRef<OperandBundleDef> OpBundles) {
   assert(Cond->getType() == getInt1Ty() &&
          "an assumption condition must be of type i1");

   Value *Ops[] = { Cond };
   Module *M = BB->getParent()->getParent();
@@ (767 lines not shown) @@

llvm/lib/Transforms/Coroutines/CoroCleanup.cpp

@@ if (auto *II = dyn_cast<IntrinsicInst>(&I)) {
         break;
       case Intrinsic::coro_end:
       case Intrinsic::coro_suspend_retcon:
         if (IsPrivateAndUnprocessed) {
           II->replaceAllUsesWith(UndefValue::get(II->getType()));
         } else
           continue;
         break;
+      case Intrinsic::coro_maychange:
+        II->replaceAllUsesWith(II->getOperand(0));
+        break;
       case Intrinsic::coro_async_size_replace:
         auto *Target = cast<ConstantStruct>(
             cast<GlobalVariable>(II->getArgOperand(0)->stripPointerCasts())
                 ->getInitializer());
         auto *Source = cast<ConstantStruct>(
             cast<GlobalVariable>(II->getArgOperand(1)->stripPointerCasts())
                 ->getInitializer());
         auto *TargetSize = Target->getOperand(1);
         auto *SourceSize = Source->getOperand(1);
         if (TargetSize->isElementWiseEqual(SourceSize)) {
           break;
         }
         auto *TargetRelativeFunOffset = Target->getOperand(0);
         auto *NewFuncPtrStruct = ConstantStruct::get(
             Target->getType(), TargetRelativeFunOffset, SourceSize);
         Target->replaceAllUsesWith(NewFuncPtrStruct);
         break;
       }

       II->eraseFromParent();
       Changed = true;
     }
   }

   if (Changed) {
     // After replacement were made we can cleanup the function body a little.
     simplifyCFG(F);
   }

   return Changed;
 }

 static bool declaresCoroCleanupIntrinsics(const Module &M) {
   return coro::declaresIntrinsics(
       M, {"llvm.coro.alloc", "llvm.coro.begin", "llvm.coro.subfn.addr",
           "llvm.coro.free", "llvm.coro.id", "llvm.coro.id.retcon",
           "llvm.coro.id.retcon.once", "llvm.coro.async.size.replace",
-          "llvm.coro.async.resume"});
+          "llvm.coro.async.resume", "llvm.coro.maychange"});
 }

 PreservedAnalyses CoroCleanupPass::run(Function &F,
                                        FunctionAnalysisManager &AM) {
   auto &M = *F.getParent();
   if (!declaresCoroCleanupIntrinsics(M) ||
       !Lowerer(M).lowerRemainingCoroIntrinsics(F))
     return PreservedAnalyses::all();

   return PreservedAnalyses::none();
 }

llvm/lib/Transforms/Coroutines/Coroutines.cpp

@@ static const char *const CoroIntrinsics[] = {
     "llvm.coro.end",
     "llvm.coro.end.async",
     "llvm.coro.frame",
     "llvm.coro.free",
     "llvm.coro.id",
     "llvm.coro.id.async",
     "llvm.coro.id.retcon",
     "llvm.coro.id.retcon.once",
+    "llvm.coro.maychange",
     "llvm.coro.noop",
     "llvm.coro.prepare.async",
     "llvm.coro.prepare.retcon",
     "llvm.coro.promise",
     "llvm.coro.resume",
     "llvm.coro.save",
     "llvm.coro.size",
     "llvm.coro.subfn.addr",

llvm/test/Transforms/Coroutines/coro-TLS-01.ll

This file was added.

				; Tests that the TLS variables which cross suspend points wouldn't be misoptimized.
				; RUN: opt < %s -S -passes=coro-early,sroa,early-cse,coro-split,coro-cleanup,simplifycfg -opaque-pointers | FileCheck %s

				@tls_variable = thread_local global i32 0

				define ptr @f() "coroutine.presplit"="0" {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%i = alloca ptr
				%j = alloca ptr
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%tls_variable = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable, ptr %i
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%tls_variable2 = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable2, ptr %j
				%i_value = load ptr, ptr %i
				%j_value = load ptr, ptr %j
				%cmp = icmp eq ptr %i_value, %j_value
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK-LABEL: f.resume(
				; CHECK: br i1 %cmp, label %same, label %diff
				; CHECK-EMPTY:
				; CHECK-NEXT: same:
				; CHECK-NEXT: call void @print_same()
				; CHECK-NEXT: br label %cleanup
				; CHECK-EMPTY:
				; CHECK-NEXT: diff:
				; CHECK-NEXT: call void @print_diff()
				; CHECK-NEXT: br label %cleanup

				declare ptr @llvm.coro.maychange(ptr)
				declare void @print_same()
				declare void @print_diff()
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-TLS-02.ll

This file was added.

				; Tests that the TLS variables which don't cross suspend points would be optimized correctly.
				; RUN: opt < %s -S -passes=coro-early,coro-split,coro-cleanup,sroa,early-cse,simplifycfg -opaque-pointers | FileCheck %s

				@tls_variable = thread_local global i32 0

				define ptr @f() "coroutine.presplit"="0" {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%i = alloca ptr
				%j = alloca ptr
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%tls_variable = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable, ptr %i
				%tls_variable2 = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable2, ptr %j
				%i_value = load ptr, ptr %i
				%j_value = load ptr, ptr %j
				%cmp = icmp eq ptr %i_value, %j_value
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK: void @f.resume
				; CHECK-NEXT: entry.resume:
				; CHECK-NEXT: call void @print_same()
				; CHECK-NEXT: call void @free(ptr %hdl)
				; CHECK-NEXT: ret void

				declare ptr @llvm.coro.maychange(ptr)

				declare void @print_same()
				declare void @print_diff()
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-TLS-03.ll

This file was added.

				; Tests that the TLS variables which cross suspend points wouldn't be misoptimized during O2 pipeline.
				; RUN: opt < %s -S -passes='default<O2>' -opaque-pointers | FileCheck %s

				@tls_variable = thread_local global i32 0

				define ptr @f() "coroutine.presplit"="0" {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%i = alloca ptr
				%j = alloca ptr
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%tls_variable = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable, ptr %i
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%tls_variable2 = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable2, ptr %j
				%i_value = load ptr, ptr %i
				%j_value = load ptr, ptr %j
				%cmp = icmp eq ptr %i_value, %j_value
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK-LABEL: f.resume(
				; CHECK: br i1 %cmp, label %same, label %diff
				; CHECK-EMPTY:
				; CHECK-NEXT: same:
				; CHECK-NEXT: call void @print_same()
				; CHECK-NEXT: br label
				; CHECK-EMPTY:
				; CHECK-NEXT: diff:
				; CHECK-NEXT: call void @print_diff()
				; CHECK-NEXT: br label

				declare ptr @llvm.coro.maychange(ptr)

				declare void @print_same()
				declare void @print_diff()
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-TLS-04.ll

This file was added.

				; Tests that the TLS variables which don't cross suspend points would be optimized correctly during O2 pipelines.
				; RUN: opt < %s -S -passes='default<O2>' -opaque-pointers | FileCheck %s

				@tls_variable = thread_local global i32 0

				define ptr @f() "coroutine.presplit"="0" {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%i = alloca ptr
				%j = alloca ptr
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%tls_variable = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable, ptr %i
				%tls_variable2 = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable2, ptr %j
				%i_value = load ptr, ptr %i
				%j_value = load ptr, ptr %j
				%cmp = icmp eq ptr %i_value, %j_value
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; CHECK: void @f.resume
				; CHECK-NEXT: resume:
				; CHECK-NEXT: call void @print_same(
				; CHECK-NEXT: call void @free(
				; CHECK-NEXT: ret void

				declare ptr @llvm.coro.maychange(ptr)

				declare void @print_same()
				declare void @print_diff()
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

llvm/test/Transforms/Coroutines/coro-cleanup-maychange.ll

This file was added.

				; Test that coro-cleanup would convert llvm.coro.maychange intrinsics
				; correctly.
				; RUN: opt < %s -S -passes=coro-cleanup -opaque-pointers | FileCheck %s

				%f.Frame = type { ptr, ptr, ptr, ptr, i1 }

				@tls_variable = thread_local global i32 0
				@f.resumers = private constant [3 x ptr] [ptr @f.resume, ptr @f.destroy, ptr @f.cleanup]

				define ptr @f() {
				entry:
				%id = call token @llvm.coro.id(i32 0, ptr null, ptr @f, ptr @f.resumers)
				%alloc = call ptr @malloc(i32 40)
				%i = alloca ptr, align 8
				%j = alloca ptr, align 8
				%hdl = call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %alloc)
				%resume.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 0
				store ptr @f.resume, ptr %resume.addr, align 8
				%destroy.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 1
				store ptr @f.destroy, ptr %destroy.addr, align 8
				%i.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 2
				%j.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 3
				; CHECK: store ptr @tls_variable, ptr %i.reload.addr, align 8
				%tls_variable1 = call ptr @llvm.coro.maychange(ptr @tls_variable)
				store ptr %tls_variable1, ptr %i.reload.addr, align 8
				%index.addr2 = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 4
				store i1 false, ptr %index.addr2, align 1
				ret ptr %hdl
				}

				define internal fastcc void @f.resume(ptr noalias nonnull align 8 dereferenceable(40) %hdl) {
				entry.resume:
				%i.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 2
				%j.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 3
				%tls_variable = call ptr @llvm.coro.maychange(ptr @tls_variable)
				; CHECK: store ptr @tls_variable, ptr %j.reload.addr, align 8
				store ptr %tls_variable, ptr %j.reload.addr, align 8
				call void @consume(ptr %i.reload.addr)
				call void @consume(ptr %j.reload.addr)
				call void @free(ptr %hdl)
				ret void
				}

				define internal fastcc void @f.destroy(ptr noalias nonnull align 8 dereferenceable(40) %hdl) {
				entry.destroy:
				%i.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 2
				%j.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 3
				call void @free(ptr %hdl)
				ret void
				}

				define internal fastcc void @f.cleanup(ptr noalias nonnull align 8 dereferenceable(40) %hdl) {
				entry.cleanup:
				%i.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 2
				%j.reload.addr = getelementptr inbounds %f.Frame, ptr %hdl, i32 0, i32 3
				call void @free(ptr null)
				ret void
				}

				declare void @consume(ptr)
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)

				declare ptr @llvm.coro.maychange(ptr)
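
The coro-TLS tests above compare the address of @tls_variable taken before a suspend point with the address taken after it. As a rough illustration of why those two addresses need not be equal once a coroutine resumes on another thread, here is a minimal standalone C++ sketch. It does not use coroutines or the proposed llvm.coro.maychange intrinsic; it only shows that each thread has its own copy of a thread_local variable. The helper names currentTlsAddress and tlsAddressDiffersAcrossThreads are hypothetical and are not part of this patch.

```cpp
#include <cstdint>
#include <thread>

// Stand-in for @tls_variable in the tests; every thread gets its own copy.
thread_local int tls_variable = 0;

// Returns the address of the calling thread's copy of tls_variable.
// (Hypothetical helper, not part of the patch.)
std::uintptr_t currentTlsAddress() {
  return reinterpret_cast<std::uintptr_t>(&tls_variable);
}

// Takes the address once on the calling thread and once on a second thread.
// The calling thread stays alive while the worker runs, so both copies exist
// at the same time and must therefore live at distinct addresses.
bool tlsAddressDiffersAcrossThreads() {
  std::uintptr_t here = currentTlsAddress();
  std::uintptr_t there = 0;
  std::thread worker([&there] { there = currentTlsAddress(); });
  worker.join();
  return here != there;
}
```

Calling tlsAddressDiffersAcrossThreads() is expected to return true, while two calls to currentTlsAddress() on the same thread agree. A coroutine that suspends on one thread and resumes on another is in the first situation, which is the case coro-TLS-01.ll and coro-TLS-03.ll guard against.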