This is an archive of the discontinued LLVM Phabricator instance.

[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
Closed · Public

Authored by lxfind on Feb 17 2021, 8:02 PM.


Details

Reviewers

junparser
efriedma
ChuanqiXu

Commits

rG03f668613c44: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions

Summary

See pr46990 (https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions into loop exit blocks that are reached across coro.suspend intrinsics. Doing so breaks the semantics of the coro.suspend intrinsic, which returns directly to the caller. It can also lead to a use-after-free in a multithreaded environment if the coroutine is freed before control returns to the caller.

This patch disables promotion by checking whether the loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is the better option, not only for correctness but also for performance.
In most cases, what LICM sinks are memory operations. For a coroutine, sinking a memory operation out of the loop does not improve performance, since the coroutine needs to get the data from the frame anyway. In fact, LICM would hurt coroutine performance, since it adds more entries to the frame.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lxfind created this revision. Feb 17 2021, 8:02 PM

Herald added subscribers: hoy, modimo, wenlei and 2 others. Feb 17 2021, 8:02 PM

lxfind requested review of this revision. Feb 17 2021, 8:02 PM

Herald added a project: Restricted Project. Feb 17 2021, 8:02 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B89660: Diff 324507. Feb 17 2021, 9:30 PM

ChuanqiXu added inline comments. Feb 17 2021, 10:16 PM

llvm/lib/Transforms/Scalar/LICM.cpp
373	coroutine

please see comments of D87817.

In D96928#2570981, @junparser wrote:

please see comments of D87817.

I don't think we should go that route, because LICM will mostly hurt coroutines, as I explained in the summary.
Hence we should simply disable LICM for coroutines, and I don't think this is a temporary change. What do you think?

@efriedma, do you think the inline documentation is sufficient? Or would you prefer to see it somewhere else too?

I want to see changes to LangRef and/or the coroutine documentation to describe the semantic restriction. If there's a correctness issue, it clearly isn't specific to LICM, so I want to see the rule described in general terms. And please ping llvm-dev when you have it written up.

It's not completely clear to me that disabling sinking is always a performance win, but it's not that important.

In D96928#2571879, @lxfind wrote:

I don't think we should go that route, because LICM will mostly hurt coroutines, as I explained in the summary.
Hence we should simply disable LICM for coroutines, and I don't think this is a temporary change. What do you think?

I do not think we should disable LICM for coroutines. Also, this is not a semantic restriction of coroutines (GCC does not do this). It is just caused by the current pipeline of LLVM coroutines as well as by debug info issues. I was thinking maybe we can invoke CoroSplit as early as possible (not considering performance). Anyway, we can discuss this in D95807.

In D96928#2576084, @junparser wrote:

I do not think we should disable LICM for coroutines. Also, this is not a semantic restriction of coroutines (GCC does not do this). It is just caused by the current pipeline of LLVM coroutines as well as by debug info issues. I was thinking maybe we can invoke CoroSplit as early as possible (not considering performance). Anyway, we can discuss this in D95807.

Could you explain why we want to enable LICM for coroutines (from a performance perspective)?
My theory is this:

The majority of what LICM moves should be memory operations. Constant function calls that are not inlined should be relatively rare compared to memory operations.
For memory operations, LICM does not reduce the number of memory operations in the loop in the case of coroutines; instead it adds one extra entry to the coroutine frame, increasing memory usage.

In D96928#2576084, @junparser wrote:

I do not think we should disable LICM for coroutines. Also, this is not a semantic restriction of coroutines (GCC does not do this). It is just caused by the current pipeline of LLVM coroutines as well as by debug info issues. I was thinking maybe we can invoke CoroSplit as early as possible (not considering performance). Anyway, we can discuss this in D95807.

Indeed, GCC splits coroutines very early on, so it is not exposed to many of the issues that Clang is.
However, if we do that, there will be no chance to optimize, and we will always end up with a huge coroutine frame (unless we can redesign it in a way that can still optimize the coroutine frame post-split).

In D96928#2576209, @lxfind wrote:

Could you explain why we want to enable LICM for coroutines (from a performance perspective)?
My theory is this:

The majority of what LICM moves should be memory operations. Constant function calls that are not inlined should be relatively rare compared to memory operations.

For memory operations, LICM does not reduce the number of memory operations in the loop in the case of coroutines; instead it adds one extra entry to the coroutine frame, increasing memory usage.

LICM moves the memory operations out of the loop, so it does reduce the number of memory operations. More importantly, I agree with @efriedma that either we have a general solution or we document these restrictions, rather than fixing them in individual places.

LICM moves the memory operations out of the loop, so it does reduce the number of memory operations. More importantly, I agree with @efriedma that either we have a general solution or we document these restrictions, rather than fixing them in individual places.

Let me elaborate in more detail:
This patch does not attempt to disable all of LICM in the presence of coroutines. Instead, it disables one specific part of LICM: promoting memory references to scalars.
Such promotion works by sinking stores out of the loop and hoisting loads to before the loop. Let's look at each of the two cases and see why neither reduces memory operations for coroutines:

Sinking stores out of the loop. LICM sinks stores out of the loop by turning the memory stores into scalar assignments and then, outside the loop, storing the scalar to memory. It is important to note that LICM thereby introduces a scalar in the loop that must stay alive until the loop ends, so that it can be stored to memory. In the presence of coroutines, that is, when the loop can suspend and resume, anything that needs to live through the loop must be put on the coroutine frame (i.e. the heap). So even though LICM turns the memory store into a scalar assignment, with coroutines that scalar has to live on the coroutine frame, and the scalar assignment becomes a memory store again. Effectively, we still have the same number of memory stores in the loop, and furthermore we have introduced one more entry in the frame to hold the sunk scalar.
Moving loads to before the loop. The reasoning is similar here. To move loads to before the loop, we need a scalar to hold the result of the load, so that the loop can access that scalar. However, in the presence of coroutines, if the scalar value needs to live through the loop, it also needs to be put on the coroutine frame, which is on the heap. Hence every read of the scalar value in the loop is still a memory load. We still end up with the same number of memory loads, and we have also added one more entry to the frame.

Does this make sense?
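A toy count of frame accesses makes the store-sinking argument above concrete. This is a sketch under stated assumptions, not LLVM's code generation: like the argument, it assumes that every value live across a suspend point is kept in the heap-allocated frame, so each update of that value is a frame store. `Frame`, `without_promotion` and `with_promotion` are hypothetical names, and the suspend point is only indicated by a comment.

```cpp
#include <cassert>

// Toy model (not LLVM code): the coroutine frame is heap memory, and every
// access to it is counted so that two shapes of the same loop can be compared.
struct Frame {
  long p = 0;        // the original location (an alloca moved into the frame)
  long p_spill = 0;  // hypothetical extra slot for the promoted scalar
  int stores = 0;
  int loads = 0;

  void store(long &slot, long v) { slot = v; ++stores; }
  long load(const long &slot) { ++loads; return slot; }
};

// Before promotion: one frame store per iteration. A suspend point sits in
// the loop body, but no extra value has to survive it.
void without_promotion(Frame &f, long n) {
  assert(n >= 0);
  for (long i = 0; i < n; ++i) {
    // ... llvm.coro.suspend ...
    f.store(f.p, i);
  }
}

// After promotion: p is loaded in the preheader, kept in a "scalar" and
// stored once at the exit. Under the assumption above, that scalar is live
// across the suspend point, so it lives in the frame and every update of it
// is still a frame store.
void with_promotion(Frame &f, long n) {
  assert(n >= 0);
  f.store(f.p_spill, f.load(f.p));  // preheader load, kept in the frame
  for (long i = 0; i < n; ++i) {
    // ... llvm.coro.suspend: p_spill must survive it ...
    f.store(f.p_spill, i);          // the promoted scalar, written to the frame
  }
  f.store(f.p, f.load(f.p_spill));  // sunk store at the loop exit
}
```

With n = 3 both functions leave p == 2, but the promoted shape performs 5 frame stores and 2 frame loads versus 3 stores, and it needs the extra p_spill slot.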

In D96928#2579340, @lxfind wrote:

Does this make sense?

Yes, many of the cases follow these rules. However, as long as the stored values are loop invariant, LICM should move the store out of the loop to reduce memory operations.
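The loop-invariant case can be sketched the same way. This toy model (hypothetical names `CountedSlot`, `store_in_loop`, `store_sunk`) only shows the net effect of sinking a store of a constant, such as the `store i64 1, i64* %p` in this patch's sink-with-coroutine.ll test, out of a loop. Real LICM builds SSA values and may insert a preheader load rather than the explicit guard used here, and the sketch deliberately ignores the suspend edge, which is what this patch is about.

```cpp
#include <cassert>

// Toy model: a memory location that counts how many times it is written.
struct CountedSlot {
  long value = 0;
  int stores = 0;
  void store(long v) { value = v; ++stores; }
};

// A loop that stores the same constant on every iteration.
void store_in_loop(CountedSlot &p, long trips) {
  assert(trips >= 0);
  for (long i = 0; i < trips; ++i)
    p.store(1);
}

// Net effect of sinking that loop-invariant store: a single store after the
// loop. The guard keeps a zero-trip loop from writing, like store_in_loop.
void store_sunk(CountedSlot &p, long trips) {
  assert(trips >= 0);
  if (trips > 0)
    p.store(1);
}
```

Both versions leave the same final value, but for 5 iterations the sunk version writes once instead of five times.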

BTW, although this patch is good enough for pr46990 and the use-after-free issue, fixing these coroutine issues case by case is not good enough. That's the reason why I agree with efriedma.

BTW, although this patch is good enough for pr46990 and the use-after-free issue, fixing these coroutine issues case by case is not good enough. That's the reason why I agree with efriedma.

The problem I see right now is that there is no general fix available for this problem. From the LLVM IR perspective, the default edge of the coro.suspend switch is neither something that can be reliably discovered nor something developers can avoid moving instructions across. So even if we want to document it, I don't know what to document in LangRef other than explaining in Coroutines.rst how this works and what the issue is.
Perhaps we need to introduce a new IR instruction for suspend instead of relying on intrinsics, but that's not something we can redesign in a short amount of time.
I am happy to add more detailed documentation in Coroutines.rst. But beyond that I don't see much I can do here.

In D96928#2580756, @lxfind wrote:

The problem I see right now is that there is no general fix available for this problem. From the LLVM IR perspective, the default edge of the coro.suspend switch is neither something that can be reliably discovered nor something developers can avoid moving instructions across. So even if we want to document it, I don't know what to document in LangRef other than explaining in Coroutines.rst how this works and what the issue is.
Perhaps we need to introduce a new IR instruction for suspend instead of relying on intrinsics, but that's not something we can redesign in a short amount of time.
I am happy to add more detailed documentation in Coroutines.rst. But beyond that I don't see much I can do here.

@efriedma, any idea?

Perhaps we need to introduce a new IR instruction for suspend instead of relying on intrinsics, but that's not something we can redesign in a short amount of time.

How about adding a fence instruction between coro.suspend and the ret block?

In D96928#2583784, @junparser wrote:

How about adding a fence instruction between coro.suspend and the ret block?

Can you elaborate using the test case from this patch?
The problem is at the edge from the switch to the block. There doesn't seem to be a way to prevent instructions from being moved across an edge.
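The edge problem can be seen in the test case from this patch by computing the loop's exit blocks the way promotion does (the patched code calls `L->getUniqueExitBlocks(ExitBlocks)`). This is a toy model over block names, not LLVM's Loop API. `CFG`, `uniqueExitBlocks` and `licmTestCFG` are hypothetical helpers, and the CFG is transcribed by hand from `@licm` in sink-with-coroutine.ll.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Successor lists keyed by basic-block name.
using CFG = std::map<std::string, std::vector<std::string>>;

// Toy analogue of Loop::getUniqueExitBlocks: the blocks outside the loop that
// are reached directly from a block inside it, each listed once.
std::set<std::string> uniqueExitBlocks(const CFG &cfg,
                                       const std::set<std::string> &loop) {
  std::set<std::string> exits;
  for (const std::string &bb : loop) {
    assert(cfg.count(bb) == 1 && "loop block missing from the CFG");
    for (const std::string &succ : cfg.at(bb))
      if (loop.count(succ) == 0)
        exits.insert(succ);
  }
  return exits;
}

// CFG of @licm: "bb2" is both the default (suspend) destination of the
// coro.suspend switch in "loop" and the normal exit from "await.ready".
CFG licmTestCFG() {
  return CFG{
      {"entry", {"bb0"}},
      {"bb0", {"loop"}},
      {"loop", {"bb2", "await.ready"}},  // switch: default -> bb2, 0 -> await.ready
      {"await.ready", {"loop", "bb2"}},
      {"bb2", {}},
  };
}
```

With the loop {loop, await.ready}, the only exit block is bb2. Because bb2 is also where the suspend path goes, any store that promotion sinks into the exit block would run on the suspend path too.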

I'd be satisfied with a description of the issue in Coroutines.rst. Having a description of the problem is the first step to fixing it.

Yes, many of the cases follow these rules. However, as long as the stored values are loop invariant, LICM should move the store out of the loop to reduce memory operations.

You are right that in some cases LICM does reduce memory operations for coroutines.
Now that we have all agreed on the documentation part, let me think a bit more about this patch vs. D87817.

Add more detailed documentation

Harbormaster completed remote builds in B90973: Diff 326602. Feb 26 2021, 1:04 AM

In D96928#2589101, @lxfind wrote:

You are right that in some cases LICM does reduce memory operations for coroutines.
Now that we have all agreed on the documentation part, let me think a bit more about this patch vs. D87817.

I concluded that there is no better way than disabling LICM for coroutines right now.

LGTM, you may need @efriedma's approval

LGTM

This revision is now accepted and ready to land. Mar 3 2021, 3:00 PM

Closed by commit rG03f668613c44: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions (authored by lxfind). Mar 3 2021, 3:22 PM

This revision was automatically updated to reflect the committed changes.

lxfind added a commit: rG03f668613c44: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

Revision Contents

Path | Size
llvm/docs/Coroutines.rst | 8 lines
llvm/lib/Transforms/Scalar/LICM.cpp | 18 lines
llvm/test/Transforms/Coroutines/ArgAddr.ll | 44 lines
llvm/test/Transforms/LICM/sink-with-coroutine.ll | 52 lines

Diff 327944

llvm/docs/Coroutines.rst


	CoroCleanup			CoroCleanup
	-----------			-----------
	This pass runs late to lower all coroutine related intrinsics not replaced by			This pass runs late to lower all coroutine related intrinsics not replaced by
	earlier passes.			earlier passes.

	Areas Requiring Attention			Areas Requiring Attention
	=========================			=========================
				#. When coro.suspend returns -1, the coroutine is suspended, and it's possible
				that the coroutine has already been destroyed (hence the frame has been freed).
				We cannot access anything on the frame on the suspend path.
				However there is nothing that prevents the compiler from moving instructions
				along that path (e.g. LICM), which can lead to use-after-free. At the moment
				we disabled LICM for loops that have coro.suspend, but the general problem still
				exists and requires a general solution.

	#. Take advantage of the lifetime intrinsics for the data that goes into the			#. Take advantage of the lifetime intrinsics for the data that goes into the
	coroutine frame. Leave lifetime intrinsics as is for the data that stays in			coroutine frame. Leave lifetime intrinsics as is for the data that stays in
	allocas.			allocas.

	#. The CoroElide optimization pass relies on coroutine ramp function to be			#. The CoroElide optimization pass relies on coroutine ramp function to be
	inlined. It would be beneficial to split the ramp function further to			inlined. It would be beneficial to split the ramp function further to
	increase the chance that it will get inlined into its caller.			increase the chance that it will get inlined into its caller.


llvm/lib/Transforms/Scalar/LICM.cpp

In LoopInvariantCodeMotion::runOnLoop():
if (hasDisableLICMTransformsHint(L)) {		if (hasDisableLICMTransformsHint(L)) {
return false;		return false;
}		}

std::unique_ptr<AliasSetTracker> CurAST;		std::unique_ptr<AliasSetTracker> CurAST;
std::unique_ptr<MemorySSAUpdater> MSSAU;		std::unique_ptr<MemorySSAUpdater> MSSAU;
std::unique_ptr<SinkAndHoistLICMFlags> Flags;		std::unique_ptr<SinkAndHoistLICMFlags> Flags;

		// Don't sink stores from loops with coroutine suspend instructions.
		// LICM would sink instructions into the default destination of
		// the coroutine switch. The default destination of the switch is to
		// handle the case where the coroutine is suspended, by which point the
		// coroutine frame may have been destroyed. No instruction can be sunk there.
		// FIXME: This would unfortunately hurt the performance of coroutines, however
		// there is currently no general solution for this. Similar issues could also
		// potentially happen in other passes where instructions are being moved
		// across that edge.
		bool HasCoroSuspendInst = llvm::any_of(L->getBlocks(), [](BasicBlock *BB) {
		return llvm::any_of(*BB, [](Instruction &I) {
		IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I);
		return II && II->getIntrinsicID() == Intrinsic::coro_suspend;
		});
		});

if (!MSSA) {		if (!MSSA) {
LLVM_DEBUG(dbgs() << "LICM: Using Alias Set Tracker.\n");		LLVM_DEBUG(dbgs() << "LICM: Using Alias Set Tracker.\n");
CurAST = collectAliasInfoForLoop(L, LI, AA);		CurAST = collectAliasInfoForLoop(L, LI, AA);
Flags = std::make_unique<SinkAndHoistLICMFlags>(		Flags = std::make_unique<SinkAndHoistLICMFlags>(
LicmMssaOptCap, LicmMssaNoAccForPromotionCap, /*IsSink=*/true);		LicmMssaOptCap, LicmMssaNoAccForPromotionCap, /*IsSink=*/true);
} else {		} else {
LLVM_DEBUG(dbgs() << "LICM: Using MemorySSA.\n");		LLVM_DEBUG(dbgs() << "LICM: Using MemorySSA.\n");
MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);		MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
In LoopInvariantCodeMotion::runOnLoop(), after 30 unchanged lines:
// Now that all loop invariants have been removed from the loop, promote any		// Now that all loop invariants have been removed from the loop, promote any
// memory references to scalars that we can.		// memory references to scalars that we can.
// Don't sink stores from loops without dedicated block exits. Exits		// Don't sink stores from loops without dedicated block exits. Exits
// containing indirect branches are not transformed by loop simplify,		// containing indirect branches are not transformed by loop simplify,
// make sure we catch that. An additional load may be generated in the		// make sure we catch that. An additional load may be generated in the
// preheader for SSA updater, so also avoid sinking when no preheader		// preheader for SSA updater, so also avoid sinking when no preheader
// is available.		// is available.
if (!DisablePromotion && Preheader && L->hasDedicatedExits() &&		if (!DisablePromotion && Preheader && L->hasDedicatedExits() &&
!Flags->tooManyMemoryAccesses()) {		!Flags->tooManyMemoryAccesses() && !HasCoroSuspendInst) {
// Figure out the loop exits and their insertion points		// Figure out the loop exits and their insertion points
SmallVector<BasicBlock *, 8> ExitBlocks;		SmallVector<BasicBlock *, 8> ExitBlocks;
L->getUniqueExitBlocks(ExitBlocks);		L->getUniqueExitBlocks(ExitBlocks);

// We can't insert into a catchswitch.		// We can't insert into a catchswitch.
bool HasCatchSwitch = llvm::any_of(ExitBlocks, [](BasicBlock *Exit) {		bool HasCatchSwitch = llvm::any_of(ExitBlocks, [](BasicBlock *Exit) {
return isa<CatchSwitchInst>(Exit->getTerminator());		return isa<CatchSwitchInst>(Exit->getTerminator());
});		});
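For readers without an LLVM build at hand, the check added to runOnLoop can be modelled with plain STL types. This is a sketch only: `IntrinsicID`, `Instr`, `Block`, `hasCoroSuspendInst` and `promotionAllowed` are hypothetical stand-ins, while the real code uses `dyn_cast<IntrinsicInst>` and `Intrinsic::coro_suspend` as shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction and llvm::BasicBlock; only the
// intrinsic ID matters for this check.
enum class IntrinsicID { None, CoroSuspend, CoroEnd };

struct Instr {
  IntrinsicID id = IntrinsicID::None;
};

using Block = std::vector<Instr>;

// Same nested any_of shape as HasCoroSuspendInst: true if any instruction in
// any block of the loop is a call to llvm.coro.suspend.
bool hasCoroSuspendInst(const std::vector<Block> &loopBlocks) {
  return std::any_of(loopBlocks.begin(), loopBlocks.end(), [](const Block &bb) {
    return std::any_of(bb.begin(), bb.end(), [](const Instr &i) {
      return i.id == IntrinsicID::CoroSuspend;
    });
  });
}

// Mirrors the condition guarding promotion after the patch.
bool promotionAllowed(bool disablePromotion, bool hasPreheader,
                      bool hasDedicatedExits, bool tooManyMemoryAccesses,
                      bool hasCoroSuspend) {
  return !disablePromotion && hasPreheader && hasDedicatedExits &&
         !tooManyMemoryAccesses && !hasCoroSuspend;
}
```

A loop whose blocks contain a coro.suspend call makes `promotionAllowed` return false regardless of the other flags, while the rest of LICM still runs on that loop.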

llvm/test/Transforms/Coroutines/ArgAddr.ll

	; Need to move users of allocas that were moved into the coroutine frame after			; Need to move users of allocas that were moved into the coroutine frame after
	; coro.begin.			; coro.begin.
	; RUN: opt < %s -preserve-alignment-assumptions-during-inlining=false -O2 -enable-coroutines -S | FileCheck %s			; RUN: opt < %s -coro-split -S | FileCheck %s
	; RUN: opt < %s -preserve-alignment-assumptions-during-inlining=false -aa-pipeline=basic-aa -passes='default<O2>' -enable-coroutines -S | FileCheck %s			; RUN: opt < %s -passes=coro-split -S | FileCheck %s

	define nonnull i8* @f(i32 %n) {			define nonnull i8* @f(i32 %n) "coroutine.presplit"="1" {
				; CHECK-LABEL: @f(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[ID:%.*]] = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* bitcast ([3 x void (%f.Frame*)*]* @f.resumers to i8*))
				; CHECK-NEXT: [[N_ADDR:%.*]] = alloca i32, align 4
				; CHECK-NEXT: store i32 [[N:%.*]], i32* [[N_ADDR]], align 4
				; CHECK-NEXT: [[CALL:%.*]] = tail call i8* @malloc(i32 24)
				; CHECK-NEXT: [[TMP0:%.*]] = tail call noalias nonnull i8* @llvm.coro.begin(token [[ID]], i8* [[CALL]])
				; CHECK-NEXT: [[FRAMEPTR:%.*]] = bitcast i8* [[TMP0]] to %f.Frame*
				; CHECK-NEXT: [[RESUME_ADDR:%.*]] = getelementptr inbounds [[F_FRAME:%.*]], %f.Frame* [[FRAMEPTR]], i32 0, i32 0
				; CHECK-NEXT: store void (%f.Frame*)* @f.resume, void (%f.Frame*)** [[RESUME_ADDR]], align 8
				; CHECK-NEXT: [[DESTROY_ADDR:%.*]] = getelementptr inbounds [[F_FRAME]], %f.Frame* [[FRAMEPTR]], i32 0, i32 1
				; CHECK-NEXT: store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** [[DESTROY_ADDR]], align 8
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[F_FRAME]], %f.Frame* [[FRAMEPTR]], i32 0, i32 2
				; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[N_ADDR]], align 4
				; CHECK-NEXT: store i32 [[TMP2]], i32* [[TMP1]], align 4
				;
	entry:			entry:
	%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null);			%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null);
	%n.addr = alloca i32			%n.addr = alloca i32
	store i32 %n, i32* %n.addr ; this needs to go after coro.begin			store i32 %n, i32* %n.addr ; this needs to go after coro.begin
	%0 = tail call i32 @llvm.coro.size.i32()			%0 = tail call i32 @llvm.coro.size.i32()
	%call = tail call i8* @malloc(i32 %0)			%call = tail call i8* @malloc(i32 %0)
	%1 = tail call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %call)			%1 = tail call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %call)
	%2 = bitcast i32* %n.addr to i8*			%2 = bitcast i32* %n.addr to i8*
	call void @ctor(i8* %2)			call void @ctor(i8* %2)
	br label %for.cond			br label %for.cond

	for.cond:			for.cond:
	%3 = load i32, i32* %n.addr			%3 = load i32, i32* %n.addr
	%dec = add nsw i32 %3, -1			%dec = add nsw i32 %3, -1
	store i32 %dec, i32* %n.addr			store i32 %dec, i32* %n.addr
	call void @print(i32 %3)			call void @print(i32 %3)
	%4 = call i8 @llvm.coro.suspend(token none, i1 false)			%4 = call i8 @llvm.coro.suspend(token none, i1 false)
	%conv = sext i8 %4 to i32			%conv = sext i8 %4 to i32
	switch i32 %conv, label %coro_Suspend [			switch i32 %conv, label %coro_Suspend [
	i32 0, label %for.cond			i32 0, label %for.cond
	i32 1, label %coro_Cleanup			i32 1, label %coro_Cleanup
	]			]

	coro_Cleanup:			coro_Cleanup:
	%5 = call i8* @llvm.coro.free(token %id, i8* nonnull %1)			%5 = call i8* @llvm.coro.free(token %id, i8* nonnull %1)
	call void @free(i8* %5)			call void @free(i8* %5)
	br label %coro_Suspend			br label %coro_Suspend

	coro_Suspend:			coro_Suspend:
	call i1 @llvm.coro.end(i8* null, i1 false)			call i1 @llvm.coro.end(i8* null, i1 false)
	ret i8* %1			ret i8* %1
	}			}

	; CHECK-LABEL: @main			; CHECK-LABEL: @main
	define i32 @main() {			define i32 @main() {
	entry:			entry:
	%hdl = call i8* @f(i32 4)			%hdl = call i8* @f(i32 4)
	call void @llvm.coro.resume(i8* %hdl)			call void @llvm.coro.resume(i8* %hdl)
	call void @llvm.coro.resume(i8* %hdl)			call void @llvm.coro.resume(i8* %hdl)
	call void @llvm.coro.destroy(i8* %hdl)			call void @llvm.coro.destroy(i8* %hdl)
	ret i32 0			ret i32 0
	; CHECK: call void @ctor
	; CHECK-NEXT: %dec1.spill.addr.i = getelementptr inbounds i8, i8* %call.i, i64 20
	; CHECK-NEXT: bitcast i8* %dec1.spill.addr.i to i32*
	; CHECK-NEXT: store i32 4
	; CHECK-NEXT: call void @print(i32 4)
	; CHECK-NEXT: %index.addr12.i = getelementptr inbounds i8, i8* %call.i, i64 24
	; CHECK-NEXT: bitcast i8* %index.addr12.i to i1*
	; CHECK-NEXT: store i1 false
	; CHECK-NEXT: store i32 3
	; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl
	; CHECK-NEXT: store i32 3
	; CHECK-NEXT: call void @print(i32 3)
	; CHECK-NEXT: store i1 false
	; CHECK-NEXT: store i32 2
	; CHECK-NEXT: call void @llvm.experimental.noalias.scope.decl
	; CHECK-NEXT: store i32 2
	; CHECK-NEXT: call void @print(i32 2)
	; CHECK: ret i32 0
	}			}

	declare i8* @malloc(i32)			declare i8* @malloc(i32)
	declare void @free(i8*)			declare void @free(i8*)
	declare void @print(i32)			declare void @print(i32)
	declare void @ctor(i8* nocapture readonly)			declare void @ctor(i8* nocapture readonly)

	declare token @llvm.coro.id(i32, i8*, i8*, i8*)			declare token @llvm.coro.id(i32, i8*, i8*, i8*)
	declare i32 @llvm.coro.size.i32()			declare i32 @llvm.coro.size.i32()
	declare i8* @llvm.coro.begin(token, i8*)			declare i8* @llvm.coro.begin(token, i8*)
	declare i8 @llvm.coro.suspend(token, i1)			declare i8 @llvm.coro.suspend(token, i1)
	declare i8* @llvm.coro.free(token, i8*)			declare i8* @llvm.coro.free(token, i8*)
	declare i1 @llvm.coro.end(i8*, i1)			declare i1 @llvm.coro.end(i8*, i1)

	declare void @llvm.coro.resume(i8*)			declare void @llvm.coro.resume(i8*)
	declare void @llvm.coro.destroy(i8*)			declare void @llvm.coro.destroy(i8*)

llvm/test/Transforms/LICM/sink-with-coroutine.ll

This file was added.

				; Verifies that LICM is disabled for loops that contain coro.suspend.
				; RUN: opt -S < %s -passes=licm | FileCheck %s

				define i64 @licm(i64 %n) #0 {
				; CHECK-LABEL: @licm(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[P:%.*]] = alloca i64, align 8
				; CHECK-NEXT: br label [[BB0:%.*]]
				; CHECK: bb0:
				; CHECK-NEXT: br label [[LOOP:%.*]]
				; CHECK: loop:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, [[BB0]] ], [ [[T5:%.*]], [[AWAIT_READY:%.*]] ]
				; CHECK-NEXT: [[T5]] = add i64 [[I]], 1
				; CHECK-NEXT: [[SUSPEND:%.*]] = call i8 @llvm.coro.suspend(token none, i1 false)
				; CHECK-NEXT: switch i8 [[SUSPEND]], label [[BB2:%.*]] [
				; CHECK-NEXT: i8 0, label [[AWAIT_READY]]
				; CHECK-NEXT: ]
				; CHECK: await.ready:
				; CHECK-NEXT: store i64 1, i64* [[P]], align 4
				; CHECK-NEXT: [[T6:%.*]] = icmp ult i64 [[T5]], [[N:%.*]]
				; CHECK-NEXT: br i1 [[T6]], label [[LOOP]], label [[BB2]]
				; CHECK: bb2:
				; CHECK-NEXT: [[RES:%.*]] = call i1 @llvm.coro.end(i8* null, i1 false)
				; CHECK-NEXT: ret i64 0
				;
				entry:
				%p = alloca i64
				br label %bb0

				bb0:
				br label %loop

				loop:
				%i = phi i64 [ 0, %bb0 ], [ %t5, %await.ready ]
				%t5 = add i64 %i, 1
				%suspend = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %suspend, label %bb2 [
				i8 0, label %await.ready
				]

				await.ready:
				store i64 1, i64* %p
				%t6 = icmp ult i64 %t5, %n
				br i1 %t6, label %loop, label %bb2

				bb2:
				%res = call i1 @llvm.coro.end(i8* null, i1 false)
				ret i64 0
				}

				declare i8 @llvm.coro.suspend(token, i1)
				declare i1 @llvm.coro.end(i8*, i1)