This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Analysis/
-
Analysis/
1/1
BasicAliasAnalysis.cpp
-
test/Transforms/MemCpyOpt/
-
Transforms/
-
MemCpyOpt/
-
coroutine-params.ll

Differential D104007

[BasicAA] Properly mark that coroutine suspension may modify parameters
AbandonedPublic

Authored by lxfind on Jun 9 2021, 7:48 PM.

Download Raw Diff

Details

Reviewers

nikic
asbirlea
ChuanqiXu
rjmccall

Summary

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
When there is a memcpy from a byvalue argument (which is marked as readonly) to a stack alloca, and then latter another memcpy from the stack alloca to another stack alloca, MemCpyOpt will optimize the two memcpys
and turn them into one memcpy. Unfortunately this is incorrect in a coroutine where during a coroutine suspension, the byvalue argument may die outside of the coroutine.
Ultimately, this could be expressed as that a coroutine suspension could potentially modify (and reference) the arguments, so even though the argument is marked as readonly, we still need to assume that they may change during the call to coro_suspend intrinsic.
This change properly fixed the bug around memcpy optimization.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lxfind created this revision.Jun 9 2021, 7:48 PM

Herald added subscribers: hoy, modimo, wenlei, hiraditya. · View Herald TranscriptJun 9 2021, 7:48 PM

lxfind requested review of this revision.Jun 9 2021, 7:48 PM

Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2021, 7:48 PM

Herald added a subscriber: llvm-commits. · View Herald Transcript

lxfind mentioned this in D101510: Do not merge memcpy if the first source is a parameter of coroutine function.Jun 9 2021, 7:49 PM

Harbormaster completed remote builds in B108532: Diff 351048.Jun 9 2021, 8:50 PM

Great! Although I am not expect for AA, the general idea looks natural and proper to me. Previously, we think we should handle coroutine in the coroutine module and don't pollute other passes about the meaning of coroutine. On one hand, it looks not easy or not cheap to handle these semantics for coroutine in the coroutine module only. On the other hand, it is really natural to add it in AA since after suspend the argument is really possible be modified.

Although it may be restrict now, I think we could relax them in the future. For example, we could extract a analysis pass from coroutine and the AA could get information from that analysis pass to make more precise analysis.

nikic added inline comments.Jun 10 2021, 2:12 PM

llvm/lib/Analysis/BasicAliasAnalysis.cpp
831	Underlying object determination is best-effort. `Object` may still be based on an argument without getUnderlyingObject() detecting it. I think to be correct you'd have to check something like `!isIdentifiedObject(Object) \|\| isa<Argument>(Object)`.

address comment

lxfind marked an inline comment as done.Jun 10 2021, 4:46 PM

Harbormaster completed remote builds in B108713: Diff 351299.Jun 10 2021, 5:51 PM

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

In D104007#2812383, @efriedma wrote:

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.
But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

In D104007#2812383, @efriedma wrote:

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

In fact, I think this patch may as well fix other bugs that we are not aware of yet.
As long as there exists a pointer in the parameters, its value may change during a coroutine suspension. Without this patch, I imagine there can exist other cases where some optimization will think the value of a parameter never change during the function and hence do some optimizations based on it.

In D104007#2812395, @lxfind wrote:

In D104007#2812383, @efriedma wrote:

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.

I don't understand. As far as I can tell, the destination of both memcpys in the testcase are allocas.

But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

I think we're talking past each other.

During coroutine suspension, allocas won't change because corosplit rewrites them. I'm saying that the same rewrite should apply to byval arguments. A byval argument is basically the same thing as an alloca: it's memory owned by the callee that's deallocated when the function returns.

In D104007#2812401, @efriedma wrote:

In D104007#2812395, @lxfind wrote:

In D104007#2812383, @efriedma wrote:

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.

I don't understand. As far as I can tell, the destination of both memcpys in the testcase are allocas.

OK, so the way coroutine codegen works is to first copy all parameters to local alloca, and then if any of those allocas need to live through coroutine suspension, the alloca will be put on the coroutine frame during the CoroSplit pass.

But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

I think we're talking past each other.

During coroutine suspension, allocas won't change because corosplit rewrites them. I'm saying that the same rewrite should apply to byval arguments. A byval argument is basically the same thing as an alloca: it's memory owned by the callee that's deallocated when the function returns.

Just to be clear, MemCpyOptPass happens before CoroSplit pass. So at the point of MemCpyOptPass, all it sees are two memcpys, one that copies from a byval readonly parameter to a local alloca, and another one from the local alloca to another. The only way to prevent these two memcpys to merge is to tell AA that coro_suspend intrinsics call could potentially modify the parameter pointer.

From my point of view, the difference comes from:

@lxfind and me think this patch could solve bug 48857 well.
@efriedma think that coroutine suspend wouldn't make alloca really 'may alias'. We are just trying to make a lie to the MemCpyOpt Pass.

Is it the main difference until now?

So here is my thought about the reason why this patch isn't a lie. Let's start from bug 48857, why is the transformation made by MemCpyOpt pass wrong? I think the root reason is that the introduce of coroutine breaks the data flow! For example, (use pseudo code for simplicity. Remind me if any thing is confusing )

void foo(int a) {
     %a.tmp = alloca ...
     %b = alloca ...
     copy %a to %a.tmp
     coro.suspend
     copy %a.tmp to %b ...
     use of %b
}

would be splited to:

void foo.ramp(int a);
void foo.resume(%foo.Frame);

And if all the caller of foo, calls foo.resume immediately foo.ramp, everything should be right. However, the actual behavior of the program between foo.ramp and foo.resume is undetermined. In other words, there may be any instruction executed between foo.ramp and foo.resume. From the side of static analysis, it looks like there is a call to function without definition known at compile time in every coro.suspend call.

So it looks natural to mark arguments may alias after coro.suspend to me. How do you think about it?

@efriedma think that coroutine suspend wouldn't make alloca really 'may alias'. We are just trying to make a lie to the MemCpyOpt Pass.

Yes. I suspect the bug is actually in the implementation of corosplit.

So it looks natural to mark arguments may alias after coro.suspend to me. How do you think about it?

For a normal pointer argument, that reasoning makes sense. But the normal alias analysis rules should make that happen automatically. For a byval argument, it does not make sense.

there may be any instruction executed between foo.ramp and foo.resume

In general, calling a function with a byval argument makes a copy of the memory in question, into memory owned by the callee. That memory is local to the coroutine function. So the code outside the function can't access it unless a pointer to the memory is explicitly escaped inside the callee.

In the context of a coroutine, that means instructions between foo.ramp and foo.resume can't modify the byval memory unless the address escapes.

I dug a bit more, and I think I see where the confusion is coming from. If you look at the clang output for the testcase from the bug before any optimizations run, there's an alloca named "%a11", which is specific to coroutines. Coroutines have to do some extra work to support arguments that are passed indirectly: we have to move-construct the argument from caller-owned memory into the coroutine frame. The "extra" alloca is coming out of this code.

If an argument is getting passed indirectly, so it's actually just a plain pointer at the LLVM IR level, this makes perfect sense. For arguments that are getting passed directly, or byval, it's nonsense: the argument was never in caller-owned memory in the first place, so copying it is extra work for nothing.

If you fix the clang code so it isn't generating these extra useless allocas, the problem with byval should become obvious: even without any optimizations, we end up with a use-after-free. corosplit isn't preserving memory that's supposed to be part of the callee frame.

The only reason byval arguments currently work at all is that the extra allocas cover up the bug in corosplit.

In D104007#2812536, @efriedma wrote:

If you fix the clang code so it isn't generating these extra useless allocas, the problem with byval should become obvious: even without any optimizations, we end up with a use-after-free. corosplit isn't preserving memory that's supposed to be part of the callee frame.

Thanks. I get your points now.
Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

In D104007#2813714, @efriedma wrote:

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

In D104007#2814244, @efriedma wrote:

In D104007#2813714, @efriedma wrote:

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

hmm sounds like hasPassPointeeByValueCopyAttr() is actually what I need. In CoroSplit, when deciding what to put on the coroutine frame, for arguments, I will need to decide whether it's enough to just put the pointer on the frame, or if I need to copy the content of the pointer (much like allocas) to the frame. Sounds like for inalloca and preallocated memory I still want to copy the content to the frame, not just the pointer.

lxfind mentioned this in D104184: [Coroutine] Properly deal with byval and noalias parameters.Jun 12 2021, 10:12 AM

In D104007#2814244, @efriedma wrote:

In D104007#2813714, @efriedma wrote:

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

In D104007#2815732, @lxfind wrote:

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

I'm not sure what you mean. An LLVM IR argument either has the byval attribute, or it doesn't. If it does, the allocation belongs to the callee. If it doesn't, the pointer refers to an allocation constructed in the caller. There isn't any other sense in which an argument can be "byval".

A given C++ function can have a bunch of variations in the LLVM IR signature, depending on the target, but that's not relevant here. Once clang is finished emitting IR, the C++ type signature becomes irrelevant.

In D104007#2815847, @efriedma wrote:

In D104007#2815732, @lxfind wrote:

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

I'm not sure what you mean. An LLVM IR argument either has the byval attribute, or it doesn't. If it does, the allocation belongs to the callee. If it doesn't, the pointer refers to an allocation constructed in the caller. There isn't any other sense in which an argument can be "byval".

A given C++ function can have a bunch of variations in the LLVM IR signature, depending on the target, but that's not relevant here. Once clang is finished emitting IR, the C++ type signature becomes irrelevant.

Unfortunately what we really care about here is whether an argument is by-value from C++ perspective, not just whether it has a byval attribute. Let me try to explain the problem top-down and see if I got it wrong somewhere:

In C++, when we call a function with a parameter passed by value, such as foo(std::move(a)), the passed value is typically a temporary value allocated at caller, and only the pointer is passed to the callee. When callee returns, the caller will be able to deallocate that temporary value's memory (in LLVM IR terms, there will be a lifetime.end marking the temporary value after the callee returns).
If the callee is a normal routine (non-coroutine), the callee don't need to worry about the lifetime of the parameter because it will stay alive through the entire life cycle of the callee.
However if the callee is a coroutine, the situation is different because the coroutine can return in the middle of the function and latter resume. As soon as the coroutine returns for the first time (i.e. suspended the first time), the temporary value (the pass-by-value parameter) would die in the caller. But if the callee needs to use that argument after coroutine resumption, we need to make sure that the argument value will be put on the coroutine frame (similar to allocas), so that latter access won't be accessing invalid memory.
However there is one major difference between a parameter and an alloca: for a parameter that is a pointer type, we need to know whether we need to copy the value that the pointer points to to the coroutine frame, or just the pointer it self. For instance, you could call a coroutine by passing a direct pointer, and in that case we should only need to copy the pointer to the frame. But for the C++ pass-by-value arguments, we need to copy the value that the pointer points to to the frame. The question is then, during CoroSplit (the pass where we put things to the frame), are we able to tell, given an argument pointer, whether we need to copy the value or just the pointer to the frame? Knowing that a C++ pass-by-value argument doesn't necessarily end up with an argument with a byval attribute, we know that it is not possible to tell, hence we cannot deal with this properly only in CoroSplit pass.

That this sound right?

I don't think you're characterizing the C++ issues correctly.

From clang's perspective, what it needs to ensure is that every argument is stored in an allocation local to the callee (in other words, allocations corosplit can handle). "allocations corosplit can handle" should be either allocas, or arguments marked byval. These are pointers that don't exist in the caller, so corosplit can mess with the address in memory without worrying about code it can't see in the caller.

Between clang IR generation and corosplit, normal alias analysis should ensure we don't assume arguments allocated in the caller are live past the first coro.suspend. (This might involve ensuring clang doesn't attach certain attributes, like noalias, to the arguments.)

corosplit then takes the alloca/byval allocations, and rewrites them to refer to the coroutine frame.

From clang's perspective, what it needs to ensure is that every argument is stored in an allocation local to the callee (in other words, allocations corosplit can handle). "allocations corosplit can handle" should be either allocas, or arguments marked byval. These are pointers that don't exist in the caller, so corosplit can mess with the address in memory without worrying about code it can't see in the caller.

How do you prevent optimization passes (prior to CoroSplit) from messing up with those pointers, though?
Let's say we have an argument that's not marked with byval (due to target specifics) but is pass-by-value in C++. In Clang we could emit IR to copy the argument value to a local alloca. However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.
Or are you suggesting we need to modify Clang such that every pass-by-value argument in C++ must be marked with byval?

However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.

If an argument is just a pointer without any argument attributes, BasicAA will assume coro_suspend frees the underlying allocations. Not because it has any particular knowledge of how coro_suspend works, but because that's the default assumption for any function call with unknown semantics.

BasicAA treats arguments marked byval differently because it understands the semantics of byval: it uses escape analysis to prove an arbitrary function call can't modify the contents.

I guess in some sense, it's possible to specify a world where coro_suspend frees byval allocations. But such a world doesn't really make sense. And we'd need to audit every pass that knows anything about byval semantics, so I really don't want to go down that path.

In D104007#2817612, @efriedma wrote:

However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.

If an argument is just a pointer without any argument attributes, BasicAA will assume coro_suspend frees the underlying allocations. Not because it has any particular knowledge of how coro_suspend works, but because that's the default assumption for any function call with unknown semantics.

BasicAA treats arguments marked byval differently because it understands the semantics of byval: it uses escape analysis to prove an arbitrary function call can't modify the contents.

I guess in some sense, it's possible to specify a world where coro_suspend frees byval allocations. But such a world doesn't really make sense. And we'd need to audit every pass that knows anything about byval semantics, so I really don't want to go down that path.

Thank you so much! I understand it now.
Let me also play with inalloca attribute just to double check.

Abandon this in favor of D104184

lxfind abandoned this revision.Jun 16 2021, 10:34 AM

lxfind mentioned this in rG3522167efd80: [Coroutine] Properly deal with byval and noalias parameters.Jun 17 2021, 7:06 PM

Revision Contents

Path

Size

llvm/

lib/

Analysis/

BasicAliasAnalysis.cpp

6 lines

test/

Transforms/

MemCpyOpt/

coroutine-params.ll

136 lines

Diff 351299

llvm/lib/Analysis/BasicAliasAnalysis.cpp

Show First 20 Lines • Show All 819 Lines • ▼ Show 20 Lines	if (const CallInst *CI = dyn_cast<CallInst>(Call))
return ModRefInfo::NoModRef;		return ModRefInfo::NoModRef;

// Stack restore is able to modify unescaped dynamic allocas. Assume it may		// Stack restore is able to modify unescaped dynamic allocas. Assume it may
// modify them even though the alloca is not escaped.		// modify them even though the alloca is not escaped.
if (auto *AI = dyn_cast<AllocaInst>(Object))		if (auto *AI = dyn_cast<AllocaInst>(Object))
if (!AI->isStaticAlloca() && isIntrinsicCall(Call, Intrinsic::stackrestore))		if (!AI->isStaticAlloca() && isIntrinsicCall(Call, Intrinsic::stackrestore))
return ModRefInfo::Mod;		return ModRefInfo::Mod;

		// When a coroutine is suspended, the arguments could potentially be modified
		// outside of the coroutine.
		if (Call->getIntrinsicID() == Intrinsic::coro_suspend &&
		(!isIdentifiedObject(Object) \|\| isa<Argument>(Object)))
		nikicUnsubmitted Done Reply Inline Actions Underlying object determination is best-effort. `Object` may still be based on an argument without getUnderlyingObject() detecting it. I think to be correct you'd have to check something like `!isIdentifiedObject(Object) \|\| isa<Argument>(Object)`. nikic: Underlying object determination is best-effort. `Object` may still be based on an argument…
		return ModRefInfo::ModRef;

// If the pointer is to a locally allocated object that does not escape,		// If the pointer is to a locally allocated object that does not escape,
// then the call can not mod/ref the pointer unless the call takes the pointer		// then the call can not mod/ref the pointer unless the call takes the pointer
// as an argument, and itself doesn't capture it.		// as an argument, and itself doesn't capture it.
if (!isa<Constant>(Object) && Call != Object &&		if (!isa<Constant>(Object) && Call != Object &&
isNonEscapingLocalObject(Object, &AAQI.IsCapturedCache)) {		isNonEscapingLocalObject(Object, &AAQI.IsCapturedCache)) {

// Optimistically assume that call doesn't touch Object and check this		// Optimistically assume that call doesn't touch Object and check this
// assumption in the following loop.		// assumption in the following loop.
▲ Show 20 Lines • Show All 997 Lines • Show Last 20 Lines

llvm/test/Transforms/MemCpyOpt/coroutine-params.ll

This file was added.

				; RUN: opt < %s -passes=memcpyopt -S \| FileCheck %s

				%promise_type = type { i8 }
				%struct.A = type <{ i64, i64, i32, [4 x i8] }>

				; Function Attrs: noinline ssp uwtable mustprogress
				define %promise_type* @foo(%struct.A* nocapture readonly byval(%struct.A) align 8 %a1) #0 {
				entry:
				%a1153 = alloca [24 x i8], align 8
				%__promise = alloca %promise_type, align 1
				%a2 = alloca %struct.A, align 8
				%a1153.sub = getelementptr inbounds [24 x i8], [24 x i8]* %a1153, i64 0, i64 0
				%0 = getelementptr inbounds %promise_type, %promise_type* %__promise, i64 0, i32 0
				%1 = call token @llvm.coro.id(i32 16, i8* nonnull %0, i8* bitcast (%promise_type* (%struct.A) @foo to i8), i8 null)
				%2 = call i1 @llvm.coro.alloc(token %1)
				br i1 %2, label %coro.alloc, label %coro.init

				coro.alloc: ; preds = %entry
				%3 = call i64 @llvm.coro.size.i64()
				%call = call noalias nonnull i8* @_Znwm(i64 %3) #8
				br label %coro.init

				coro.init: ; preds = %coro.alloc, %entry
				%4 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
				%5 = call i8* @llvm.coro.begin(token %1, i8* %4) #9
				%a1153.0.a1153.sub.sroa_idx = getelementptr inbounds [24 x i8], [24 x i8]* %a1153, i64 0, i64 0
				call void @llvm.lifetime.start.p0i8(i64 24, i8* %a1153.0.a1153.sub.sroa_idx)
				%6 = bitcast %struct.A* %a1 to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(24) %a1153.sub, i8* noundef nonnull align 8 dereferenceable(24) %6, i64 24, i1 false)
				call void @llvm.lifetime.start.p0i8(i64 1, i8* nonnull %0) #2
				%call2 = call %promise_type* @_ZN4task12promise_type17get_return_objectEv(%promise_type* nonnull dereferenceable(1) %__promise)
				call void @initial_suspend(%promise_type* nonnull dereferenceable(1) %__promise)
				%7 = call token @llvm.coro.save(i8* null)
				call fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN4task12promise_typeEE12from_addressEPv(i8* %5) #2
				%8 = call i8 @llvm.coro.suspend(token %7, i1 false)
				switch i8 %8, label %coro.ret [
				i8 0, label %init.ready
				i8 1, label %cleanup33
				]

				init.ready: ; preds = %coro.init
				%9 = bitcast %struct.A* %a2 to i8*
				call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %9) #2
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(24) %9, i8* noundef nonnull align 8 dereferenceable(24) %a1153.sub, i64 24, i1 false)
				call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %9) #2
				call void @_ZN4task12promise_type13final_suspendEv(%promise_type* nonnull dereferenceable(1) %__promise) #2
				%10 = call token @llvm.coro.save(i8* null)
				call fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN4task12promise_typeEE12from_addressEPv(i8* %5) #2
				%11 = call i8 @llvm.coro.suspend(token %10, i1 true) #9
				%switch = icmp ult i8 %11, 2
				br i1 %switch, label %cleanup33, label %coro.ret

				cleanup33: ; preds = %init.ready, %coro.init
				call void @llvm.lifetime.end.p0i8(i64 1, i8* nonnull %0) #2
				%a1153.0.a1153.sub.sroa_idx58 = getelementptr inbounds [24 x i8], [24 x i8]* %a1153, i64 0, i64 0
				call void @llvm.lifetime.end.p0i8(i64 24, i8* %a1153.0.a1153.sub.sroa_idx58)
				%12 = call i8* @llvm.coro.free(token %1, i8* %5)
				%.not = icmp eq i8* %12, null
				br i1 %.not, label %coro.ret, label %coro.free

				coro.free: ; preds = %cleanup33
				call void @_ZdlPv(i8* nonnull %12) #2
				br label %coro.ret

				coro.ret: ; preds = %coro.free, %cleanup33, %init.ready, %coro.init
				%13 = call i1 @llvm.coro.end(i8* null, i1 false) #9
				ret %promise_type* %call2
				}

				; CHECK-LABEL: define %promise_type* @foo(
				; CHECK: %[[A1:.]] = bitcast %struct.A %a1 to i8*
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(24) %a1153.sub, i8* noundef nonnull align 8 dereferenceable(24) %[[A1]], i64 24, i1 false)
				; CHECK: %[[A2:.]] = bitcast %struct.A %a2 to i8*
				; CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(24) %[[A2]], i8* noundef nonnull align 8 dereferenceable(24) %a1153.sub, i64 24, i1 false)

				; Function Attrs: argmemonly nounwind readonly
				declare token @llvm.coro.id(i32, i8* readnone, i8* nocapture readonly, i8*) #1

				; Function Attrs: nounwind
				declare i1 @llvm.coro.alloc(token) #2

				; Function Attrs: nobuiltin nofree allocsize(0)
				declare nonnull i8* @_Znwm(i64) local_unnamed_addr #3

				; Function Attrs: nounwind readnone
				declare i64 @llvm.coro.size.i64() #4

				; Function Attrs: nounwind
				declare i8* @llvm.coro.begin(token, i8* writeonly) #2

				; Function Attrs: argmemonly nofree nosync nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #5

				; Function Attrs: argmemonly nofree nosync nounwind willreturn
				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #5

				; Function Attrs: noinline nounwind ssp uwtable willreturn mustprogress
				declare %promise_type* @_ZN4task12promise_type17get_return_objectEv(%promise_type* nonnull dereferenceable(1)) local_unnamed_addr #6 align 2

				; Function Attrs: noinline nounwind ssp uwtable willreturn mustprogress
				declare void @initial_suspend(%promise_type* nonnull dereferenceable(1)) local_unnamed_addr #6 align 2

				; Function Attrs: nounwind
				declare token @llvm.coro.save(i8*) #2

				; Function Attrs: noinline nounwind ssp uwtable willreturn mustprogress
				declare hidden fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN4task12promise_typeEE12from_addressEPv(i8*) unnamed_addr #6 align 2

				; Function Attrs: argmemonly nofree nosync nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #5

				; Function Attrs: nounwind
				declare i8 @llvm.coro.suspend(token, i1) #2

				; Function Attrs: noinline nounwind ssp uwtable willreturn mustprogress
				declare void @_ZN4task12promise_type13final_suspendEv(%promise_type* nonnull dereferenceable(1)) local_unnamed_addr #6 align 2

				; Function Attrs: nounwind
				declare i1 @llvm.coro.end(i8*, i1) #2

				; Function Attrs: nobuiltin nounwind
				declare void @_ZdlPv(i8*) local_unnamed_addr #7

				; Function Attrs: argmemonly nounwind readonly
				declare i8* @llvm.coro.free(token, i8* nocapture readonly) #1

				attributes #0 = { noinline ssp uwtable mustprogress "coroutine.presplit"="1" "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" }
				attributes #1 = { argmemonly nounwind readonly }
				attributes #2 = { nounwind }
				attributes #3 = { nobuiltin nofree allocsize(0) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" }
				attributes #4 = { nounwind readnone }
				attributes #5 = { argmemonly nofree nosync nounwind willreturn }
				attributes #6 = { noinline nounwind ssp uwtable willreturn mustprogress "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" }
				attributes #7 = { nobuiltin nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "tune-cpu"="generic" }
				attributes #8 = { allocsize(0) }
				attributes #9 = { noduplicate }