This is an archive of the discontinued LLVM Phabricator instance.

[BasicAA] Properly mark that coroutine suspension may modify parameters
AbandonedPublic

Authored by lxfind on Jun 9 2021, 7:48 PM.

Details

Summary

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
When there is a memcpy from a byvalue argument (which is marked as readonly) to a stack alloca, and then latter another memcpy from the stack alloca to another stack alloca, MemCpyOpt will optimize the two memcpys
and turn them into one memcpy. Unfortunately this is incorrect in a coroutine where during a coroutine suspension, the byvalue argument may die outside of the coroutine.
Ultimately, this could be expressed as that a coroutine suspension could potentially modify (and reference) the arguments, so even though the argument is marked as readonly, we still need to assume that they may change during the call to coro_suspend intrinsic.
This change properly fixed the bug around memcpy optimization.

Diff Detail

Event Timeline

lxfind created this revision.Jun 9 2021, 7:48 PM
lxfind requested review of this revision.Jun 9 2021, 7:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2021, 7:48 PM

Great! Although I am not expect for AA, the general idea looks natural and proper to me. Previously, we think we should handle coroutine in the coroutine module and don't pollute other passes about the meaning of coroutine. On one hand, it looks not easy or not cheap to handle these semantics for coroutine in the coroutine module only. On the other hand, it is really natural to add it in AA since after suspend the argument is really possible be modified.

Although it may be restrict now, I think we could relax them in the future. For example, we could extract a analysis pass from coroutine and the AA could get information from that analysis pass to make more precise analysis.

nikic added inline comments.Jun 10 2021, 2:12 PM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
831

Underlying object determination is best-effort. Object may still be based on an argument without getUnderlyingObject() detecting it.

I think to be correct you'd have to check something like !isIdentifiedObject(Object) || isa<Argument>(Object).

lxfind updated this revision to Diff 351299.Jun 10 2021, 4:45 PM

address comment

lxfind marked an inline comment as done.Jun 10 2021, 4:46 PM

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.
But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

In fact, I think this patch may as well fix other bugs that we are not aware of yet.
As long as there exists a pointer in the parameters, its value may change during a coroutine suspension. Without this patch, I imagine there can exist other cases where some optimization will think the value of a parameter never change during the function and hence do some optimizations based on it.

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.

I don't understand. As far as I can tell, the destination of both memcpys in the testcase are allocas.

But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

I think we're talking past each other.

During coroutine suspension, allocas won't change because corosplit rewrites them. I'm saying that the same rewrite should apply to byval arguments. A byval argument is basically the same thing as an alloca: it's memory owned by the callee that's deallocated when the function returns.

This doesn't seem right. A byval argument is essentially a local variable. We should treat it the same way as we would an alloca. If it's live across the first suspend point, it should become part of the coroutine frame.

It will become part of the coroutine frame, by copying it through memcpy. That's actually what's the first memcpy for in the test case.

I don't understand. As far as I can tell, the destination of both memcpys in the testcase are allocas.

OK, so the way coroutine codegen works is to first copy all parameters to local alloca, and then if any of those allocas need to live through coroutine suspension, the alloca will be put on the coroutine frame during the CoroSplit pass.

But the parameter pointer itself will still die during a coroutine suspension. Because of that I would say the parameter pointer is different from an alloca pointer, because during a coroutine suspension, allocas won't change.

I think we're talking past each other.

During coroutine suspension, allocas won't change because corosplit rewrites them. I'm saying that the same rewrite should apply to byval arguments. A byval argument is basically the same thing as an alloca: it's memory owned by the callee that's deallocated when the function returns.

Just to be clear, MemCpyOptPass happens before CoroSplit pass. So at the point of MemCpyOptPass, all it sees are two memcpys, one that copies from a byval readonly parameter to a local alloca, and another one from the local alloca to another. The only way to prevent these two memcpys to merge is to tell AA that coro_suspend intrinsics call could potentially modify the parameter pointer.

From my point of view, the difference comes from:

  • @lxfind and me think this patch could solve bug 48857 well.
  • @efriedma think that coroutine suspend wouldn't make alloca really 'may alias'. We are just trying to make a lie to the MemCpyOpt Pass.

Is it the main difference until now?

So here is my thought about the reason why this patch isn't a lie. Let's start from bug 48857, why is the transformation made by MemCpyOpt pass wrong? I think the root reason is that the introduce of coroutine breaks the data flow! For example, (use pseudo code for simplicity. Remind me if any thing is confusing )

void foo(int a) {
     %a.tmp = alloca ...
     %b = alloca ...
     copy %a to %a.tmp
     coro.suspend
     copy %a.tmp to %b ...
     use of %b
}

would be splited to:

void foo.ramp(int a);
void foo.resume(%foo.Frame);

And if all the caller of foo, calls foo.resume immediately foo.ramp, everything should be right. However, the actual behavior of the program between foo.ramp and foo.resume is undetermined. In other words, there may be any instruction executed between foo.ramp and foo.resume. From the side of static analysis, it looks like there is a call to function without definition known at compile time in every coro.suspend call.

So it looks natural to mark arguments may alias after coro.suspend to me. How do you think about it?

@efriedma think that coroutine suspend wouldn't make alloca really 'may alias'. We are just trying to make a lie to the MemCpyOpt Pass.

Yes. I suspect the bug is actually in the implementation of corosplit.

So it looks natural to mark arguments may alias after coro.suspend to me. How do you think about it?

For a normal pointer argument, that reasoning makes sense. But the normal alias analysis rules should make that happen automatically. For a byval argument, it does not make sense.

there may be any instruction executed between foo.ramp and foo.resume

In general, calling a function with a byval argument makes a copy of the memory in question, into memory owned by the callee. That memory is local to the coroutine function. So the code outside the function can't access it unless a pointer to the memory is explicitly escaped inside the callee.

In the context of a coroutine, that means instructions between foo.ramp and foo.resume can't modify the byval memory unless the address escapes.


I dug a bit more, and I think I see where the confusion is coming from. If you look at the clang output for the testcase from the bug before any optimizations run, there's an alloca named "%a11", which is specific to coroutines. Coroutines have to do some extra work to support arguments that are passed indirectly: we have to move-construct the argument from caller-owned memory into the coroutine frame. The "extra" alloca is coming out of this code.

If an argument is getting passed indirectly, so it's actually just a plain pointer at the LLVM IR level, this makes perfect sense. For arguments that are getting passed directly, or byval, it's nonsense: the argument was never in caller-owned memory in the first place, so copying it is extra work for nothing.

If you fix the clang code so it isn't generating these extra useless allocas, the problem with byval should become obvious: even without any optimizations, we end up with a use-after-free. corosplit isn't preserving memory that's supposed to be part of the callee frame.

The only reason byval arguments currently work at all is that the extra allocas cover up the bug in corosplit.

If you fix the clang code so it isn't generating these extra useless allocas, the problem with byval should become obvious: even without any optimizations, we end up with a use-after-free. corosplit isn't preserving memory that's supposed to be part of the callee frame.

Thanks. I get your points now.
Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

hmm sounds like hasPassPointeeByValueCopyAttr() is actually what I need. In CoroSplit, when deciding what to put on the coroutine frame, for arguments, I will need to decide whether it's enough to just put the pointer on the frame, or if I need to copy the content of the pointer (much like allocas) to the frame. Sounds like for inalloca and preallocated memory I still want to copy the content to the frame, not just the pointer.

Do you know if there is a stable contract in IR to tell whether an argument is in fact a callee-owned memory?

Argument::hasPassPointeeByValueCopyAttr()

Sorry, that's not right. The question you want to answer here is whether the pointer in the caller points to different memory from the pointer in the callee. The only attribute that causes that is byval, and it's very unlikely we'll ever add any others.

inalloca/preallocated memory is part of the callee stack frame, at a low level. But it's actually allocated in the caller, so from an alias-analysis perspective, it doesn't count as a local allocation like byval would.

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

I'm not sure what you mean. An LLVM IR argument either has the byval attribute, or it doesn't. If it does, the allocation belongs to the callee. If it doesn't, the pointer refers to an allocation constructed in the caller. There isn't any other sense in which an argument can be "byval".

A given C++ function can have a bunch of variations in the LLVM IR signature, depending on the target, but that's not relevant here. Once clang is finished emitting IR, the C++ type signature becomes irrelevant.

It seems that there is no stable contract to tell whether an argument is byval, and it depends on the target.
For example: https://godbolt.org/z/YcrYPbMdc
So we won't be able to fix this just in CoroSplit pass. Either we change the front-end to force all byval arguments to have byval attribute (not sure if that's possible), or we may have to conservatively go with this patch.

I'm not sure what you mean. An LLVM IR argument either has the byval attribute, or it doesn't. If it does, the allocation belongs to the callee. If it doesn't, the pointer refers to an allocation constructed in the caller. There isn't any other sense in which an argument can be "byval".

A given C++ function can have a bunch of variations in the LLVM IR signature, depending on the target, but that's not relevant here. Once clang is finished emitting IR, the C++ type signature becomes irrelevant.

Unfortunately what we really care about here is whether an argument is by-value from C++ perspective, not just whether it has a byval attribute. Let me try to explain the problem top-down and see if I got it wrong somewhere:

  1. In C++, when we call a function with a parameter passed by value, such as foo(std::move(a)), the passed value is typically a temporary value allocated at caller, and only the pointer is passed to the callee. When callee returns, the caller will be able to deallocate that temporary value's memory (in LLVM IR terms, there will be a lifetime.end marking the temporary value after the callee returns).
  2. If the callee is a normal routine (non-coroutine), the callee don't need to worry about the lifetime of the parameter because it will stay alive through the entire life cycle of the callee.
  3. However if the callee is a coroutine, the situation is different because the coroutine can return in the middle of the function and latter resume. As soon as the coroutine returns for the first time (i.e. suspended the first time), the temporary value (the pass-by-value parameter) would die in the caller. But if the callee needs to use that argument after coroutine resumption, we need to make sure that the argument value will be put on the coroutine frame (similar to allocas), so that latter access won't be accessing invalid memory.
  4. However there is one major difference between a parameter and an alloca: for a parameter that is a pointer type, we need to know whether we need to copy the value that the pointer points to to the coroutine frame, or just the pointer it self. For instance, you could call a coroutine by passing a direct pointer, and in that case we should only need to copy the pointer to the frame. But for the C++ pass-by-value arguments, we need to copy the value that the pointer points to to the frame. The question is then, during CoroSplit (the pass where we put things to the frame), are we able to tell, given an argument pointer, whether we need to copy the value or just the pointer to the frame? Knowing that a C++ pass-by-value argument doesn't necessarily end up with an argument with a byval attribute, we know that it is not possible to tell, hence we cannot deal with this properly only in CoroSplit pass.

That this sound right?

I don't think you're characterizing the C++ issues correctly.

From clang's perspective, what it needs to ensure is that every argument is stored in an allocation local to the callee (in other words, allocations corosplit can handle). "allocations corosplit can handle" should be either allocas, or arguments marked byval. These are pointers that don't exist in the caller, so corosplit can mess with the address in memory without worrying about code it can't see in the caller.

Between clang IR generation and corosplit, normal alias analysis should ensure we don't assume arguments allocated in the caller are live past the first coro.suspend. (This might involve ensuring clang doesn't attach certain attributes, like noalias, to the arguments.)

corosplit then takes the alloca/byval allocations, and rewrites them to refer to the coroutine frame.

From clang's perspective, what it needs to ensure is that every argument is stored in an allocation local to the callee (in other words, allocations corosplit can handle). "allocations corosplit can handle" should be either allocas, or arguments marked byval. These are pointers that don't exist in the caller, so corosplit can mess with the address in memory without worrying about code it can't see in the caller.

How do you prevent optimization passes (prior to CoroSplit) from messing up with those pointers, though?
Let's say we have an argument that's not marked with byval (due to target specifics) but is pass-by-value in C++. In Clang we could emit IR to copy the argument value to a local alloca. However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.
Or are you suggesting we need to modify Clang such that every pass-by-value argument in C++ must be marked with byval?

However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.

If an argument is just a pointer without any argument attributes, BasicAA will assume coro_suspend frees the underlying allocations. Not because it has any particular knowledge of how coro_suspend works, but because that's the default assumption for any function call with unknown semantics.

BasicAA treats arguments marked byval differently because it understands the semantics of byval: it uses escape analysis to prove an arbitrary function call can't modify the contents.


I guess in some sense, it's possible to specify a world where coro_suspend frees byval allocations. But such a world doesn't really make sense. And we'd need to audit every pass that knows anything about byval semantics, so I really don't want to go down that path.

However latter optimization passes could very well optimize out that copy because the copy would appear to be useless without considering what coro_suspend does, which is why we are changing BasicAA to prevent such elimination.

If an argument is just a pointer without any argument attributes, BasicAA will assume coro_suspend frees the underlying allocations. Not because it has any particular knowledge of how coro_suspend works, but because that's the default assumption for any function call with unknown semantics.

BasicAA treats arguments marked byval differently because it understands the semantics of byval: it uses escape analysis to prove an arbitrary function call can't modify the contents.


I guess in some sense, it's possible to specify a world where coro_suspend frees byval allocations. But such a world doesn't really make sense. And we'd need to audit every pass that knows anything about byval semantics, so I really don't want to go down that path.

Thank you so much! I understand it now.
Let me also play with inalloca attribute just to double check.

Abandon this in favor of D104184

lxfind abandoned this revision.Jun 16 2021, 10:34 AM