- User Since
- May 3 2020, 4:37 PM (49 w, 3 d)
Mon, Apr 12
Set the attributes in Clang instead of CoroEarly
Sun, Apr 11
Ah, if the pass does more than just setting the attribute, then sure, it makes sense to keep it. But I do think we should be requiring the attribute to be added by frontends, since it's really an IR invariant that it's present on all unlowered coroutines.
Sat, Apr 10
Fri, Apr 9
I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?
Thu, Mar 25
check null on S
Abandoning in favor of D99227
@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.
Address comments, and fix all tests
Wed, Mar 24
Tue, Mar 23
I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:%0 = bitcast %a to ... call @llvm.coro.suspend ; .... use of %a use of %0
Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.
It will only cause variables to be put on the stack instead of on the frame, which shouldn't affect developer's view?
I think you just set ShouldEmitLifetimeMarkers correctly in the first place instead of adding this as an extra condition to every place that considers it, however.
This was set when a CodeGenFunction is constructed, at that point it doesn't yet know if this function is a coroutine.
I could turn ShouldEmitLifetimeMarkers to non-const, and then modify it once it realizes it's a coroutine though, if that's better than the current approach.
Curious, is %0 = bitcast %a to ... the majority of the cases we are missing? If so, could we turn all the dbg.value instructions to use their original pointer by stripping pointer cast?
Sun, Mar 21
@bruno Thanks for the review!
Wed, Mar 17
Looks like it's calling "llvm::dbgs()" directly.
I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not necessarily the maximum alignment, e.g. if the alignment of new is 16, the returned pointer can still be a multiple 32. And that difference matters.
Tue, Mar 16
Can't we did as inline comments?
No, because it would have already been too late. SuspendExpr returns the result of __builtin_coro_resume(awaiter.await_suspend().address()), which is different from the result of awaiter.await_suspend().
We need to be able to control the placement of awaiter.await_suspend(), which is why I had to break up the AST at that boundary.
Then if we want to put the result of the await_suspend in the stack, I think we can do it under CodeGen part only. It should be easy to judge the return type of await_suspend and create a call to llvm.coro.forcestack to the return value of await_suspend.
We probably could, but it would be very very tedious.
During CodeGen, we only have the AST that's calling __builtin_coro_resume, which we will call Emit as a whole.
So we need to manually match the AST 2 levels down to find the await_suspend call, get its name, and then walk through the emitted IR to find a call with the same name, and then find the tmp that's used to store the return value of the call, and then emit llvm.coro.forcestack.
Then we need to answer the question: how can we prove that the result of symmetric transfer and %gro are the only exceptions from the above rules. Or how can we know the list of exceptions wouldn't get longer and longer in the future?
Then go back to the example in the summary. From my point of view, the key problem is that our escape analysis isn't powerful enough. I don't ask us to do excellent escape analysis. It may beyond our abilities. I just want to say how can we know the result of symmetric transfer and %gro are the only exceptions.
That's a fair point. I agree that we have no guarantee these are the only two cases.
It does seem to me that coroutine implementation somewhat relies on proper lifetime markers so that data are being put correctly, which may be the fundamental problem we are trying to solve.
Well, I guess another potential solution is to force emitting lifetime intrinsics for this part of coroutine in the front-end.
Here what I want to say is we shouldn't handle all the symmetric transfer from the above analysis. And we shouldn't change the ASTNodes and Sema part. We need to solve about the above pattern. It is not easy to give a solution since user could implement symmetric transfer in final awaiter without destroying the handle, which is more common.
Just to clarify, in case there are any confusions around this. This patch would work no matter whether the coroutine frame is destroyed or not during await_suspend(). It simply makes sure that the temporary handle returned by await_suspend will be put in the stack instead of heap, and it will always be safe to do so, no matter what happens.
Whether or not the current coroutine frame would be destroyed completely depend on the implementation of await_suspend. So we cannot predict or know in advance. Therefore, the temporary handle returned by await_suspend must be put on the stack. I don't really see any other solutions other than this.
This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66
It will result in a crash, because we will be accessing memory that's already freed. If you run:
bin/clang -fcoroutines-ts -std=c++14 -stdlib=libc++ ../clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp -o - -emit-llvm -S -Xclang -disable-llvm-passes
You can see that in the final.suspend basic block, there are IRs like this:
%call19 = call i8* @_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(%"struct.detached_task::promise_type::final_awaiter"* nonnull dereferenceable(1) %ref.tm p10, i8* %22) #2 %coerce.dive20 = getelementptr inbounds %"struct.std::experimental::coroutines_v1::coroutine_handle.0", %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %coerce, i32 0, i32 0 store i8* %call19, i8** %coerce.dive20, align 8 %call21 = call i8* @_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull dereferenceable(8) %coerce) #2 call void @llvm.coro.resume(i8* %call21)
The temporary variable %coerce will be put on the frame because it's used by the call to address function and LLVM thinks it may escape. But the call to await_suspend() (the first line) in reality could destroy the current coroutine frame. Hence after the call to await_suspend, it will be accessing the frame, leading to memory corruption.
Mar 15 2021
Mar 10 2021
Thanks. I am landing it.
But feel free to comment here if anything isn't right. @ABataev
Mar 9 2021
Mar 6 2021
Mar 4 2021
Could you describe in more detail what problem this patch solves?
Also, since you are adding a new intrinsics, please also update the coroutine documentation regarding this new intrinsics.
Mar 3 2021
Mar 2 2021
Feb 25 2021
Add more detailed documentation
Yes, lots of the cases follow this rules. However, as long as the values stored are loop invariant, then LICM should move store out of the loop to reduce memory operation.
You are right that in some cases LICM does reduce memory operations for coroutines.
Now that we have all agreed on the documentation part, let me think a bit more about this patch vs D87817
Feb 24 2021
Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.
I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.
I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed. Where is it coming from? If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting? I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.
It's not due to inlining. There are two cases here, one is the return object case (which I will talk a bit more below). The other is when a call to suspend_away returns a handle, and we need to do symmetric transfer by resuming that returned handle. In a case like this, the current coroutine (the one that is being suspended) may have been destroyed already. So all the instructions in-between the call to suspend_away and the resume of the handle should not touch the coroutine frame. This used to be quite complicated to deal with, but I patched it up in the front end a few months ago so that the lifetime of any temporaries used in that range is minimum as possible, so that a decent lifetime analysis/usage analysis should be able to decide that they need to be on the stack. However I still worry that if there exists any function call there that might appears to escape (especially when objects are constructed through sret), it will be broken.
I don't quite understand what you're saying here. If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it. Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?
The return object is obtained by calling the get_return_object() function on the promise. And since the return object is a struct, it first do alloca to allocate a return object, then call the get_return_object() function with sret parameter. In that case, the return object isn't captured but the compiler doesn't know that, which makes it tricky.
Hmm. I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions. One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.
It's possible that we may diverge more and more between these different lowerings..
Added detailed documentation on next steps. While I need to continue iterating on them, this patch incrementally improves the exisiting code by considering indirect use, so it should be a pure win
Feb 23 2021
Feb 22 2021
BTW, although this patch is OK enough for pr46990 and use-after-free issue, Fixing these issues case by case for coroutine is not good enough. That's the reason why i agree with efriedma.
LICM move the memory operations out of the loop. It does reduce the number of memory operations. More importantly, I agree with @efriedma that either we have general solution or we describe these restrictions other than fix them anywhere.
Feb 19 2021
We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).
Feb 18 2021
Not remove the attribute for musttail
@efriedma, do you think the inlined document is sufficient? Or would you prefer to see it somewhere else too?
Feb 17 2021
Feb 15 2021
@junparser I have been re-thinking about this patch, and now I actually think this is the right patch to go.
Other than the correctness reasons, I realize that it is almost always harmful to do LICM for a loop that contains a coroutine suspension. The reason is that in most cases LICM moves memory operations out of the loop. (LICM does move constant calls as well but it is relatively rare to have constant calls that's not inlined, furthermore there are other complications such as TLS access and etc.).
For memory operations, moving them out of the loop does not make the loop faster if there is a coroutine suspend in the loop, because the value moved out needs to be put on the coroutine frame anyway. So LICM not only does not eliminate memory access but increases the frame size.
I propose to bring this patch back. (I can help send it too)
What do you think?
Feb 12 2021
fix clang-tidy warnings
Feb 11 2021
Note: I believe it's still possible to take advantage of lifetime intrinsics, but the approach will be very different and I would prefer to implement that in a separate patch.