Page MenuHomePhabricator

lxfind (Xun Li)
User

Projects

User does not belong to any projects.

User Details

User Since
May 3 2020, 4:37 PM (49 w, 3 d)

Recent Activity

Today

lxfind added a comment to D76526: Add an algorithm for performing "optimal" layout of a struct..

Out of curiosity, what's the motivating use case for the fixed offset fields in this tool?

Wed, Apr 14, 8:17 AM · Restricted Project

Yesterday

lxfind added a comment to D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

I think the setting is in CoroEarly from the beginning is that it is an implementation detail? Clients should only worry about coroutine shape. Maybe we could set noinline in frontends to express the intent and remove it in coroearly/corosplit?

Tue, Apr 13, 8:16 PM · Restricted Project, Restricted Project
lxfind updated the diff for D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function.

some cleanups

Tue, Apr 13, 5:17 PM · Restricted Project, Restricted Project
lxfind updated the summary of D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function.
Tue, Apr 13, 3:32 PM · Restricted Project, Restricted Project
lxfind added reviewers for D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function: rjmccall, junparser, ChuanqiXu, bruno, wenlei.
Tue, Apr 13, 3:31 PM · Restricted Project, Restricted Project
lxfind requested review of D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function.
Tue, Apr 13, 3:26 PM · Restricted Project, Restricted Project
lxfind updated the diff for D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

Update test

Tue, Apr 13, 2:17 PM · Restricted Project, Restricted Project

Mon, Apr 12

lxfind retitled D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass from [Coroutines] Move CoroEarly pass to before AlwaysInliner to [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.
Mon, Apr 12, 8:47 AM · Restricted Project, Restricted Project
lxfind updated the diff for D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

Set the attributes in Clang instead of CoroEarly

Mon, Apr 12, 8:46 AM · Restricted Project, Restricted Project

Sun, Apr 11

lxfind added a comment to D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

Ah, if the pass does more than just setting the attribute, then sure, it makes sense to keep it. But I do think we should be requiring the attribute to be added by frontends, since it's really an IR invariant that it's present on all unlowered coroutines.

Sun, Apr 11, 10:47 PM · Restricted Project, Restricted Project
lxfind added a comment to D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

Why does this pass even exist? We should just expect the frontend to set the attribute. It's not like frontends don't have to otherwise know that they're emitting a coroutine; a ton of things about the expected entire IR pattern are different.

Sun, Apr 11, 10:30 PM · Restricted Project, Restricted Project
lxfind requested review of D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.
Sun, Apr 11, 10:07 PM · Restricted Project, Restricted Project

Sat, Apr 10

lxfind added inline comments to D76526: Add an algorithm for performing "optimal" layout of a struct..
Sat, Apr 10, 10:13 PM · Restricted Project

Fri, Apr 9

lxfind added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

Fri, Apr 9, 2:28 PM · debug-info, Restricted Project

Thu, Mar 25

lxfind committed rGf490a5969bd5: [OpenMP][InstrProfiling] Fix a missing instr profiling counter (authored by lxfind).
[OpenMP][InstrProfiling] Fix a missing instr profiling counter
Thu, Mar 25, 1:53 PM
lxfind closed D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.
Thu, Mar 25, 1:52 PM · Restricted Project
lxfind updated the diff for D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

check null on S

Thu, Mar 25, 1:50 PM · Restricted Project
lxfind committed rGc7a39c833af1: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines (authored by lxfind).
[Coroutine][Clang] Force emit lifetime intrinsics for Coroutines
Thu, Mar 25, 1:46 PM
lxfind closed D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.
Thu, Mar 25, 1:46 PM · Restricted Project
lxfind added a comment to D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

Cannot answer right now. It would be much easier to fix this if you could provide additional info about what tests are exactly failed, what are the constructs that do not support it, etc.

Thu, Mar 25, 11:22 AM · Restricted Project
lxfind abandoned D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Abandoning in favor of D99227

Thu, Mar 25, 10:25 AM · Restricted Project, Restricted Project
lxfind added a comment to D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because of data inconsistency.

Thu, Mar 25, 10:23 AM · Restricted Project
lxfind abandoned D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.
Thu, Mar 25, 10:23 AM · Restricted Project
lxfind updated the diff for D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Address comments, and fix all tests

Thu, Mar 25, 10:22 AM · Restricted Project

Wed, Mar 24

lxfind added a comment to D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Is it feasible to outline the initial segment that you don't want to be part of the coroutine, and then have coroutine splitting force that outlined function to be inlined into the ramp function? IIUC, you were saying that the splitting patch was difficult, but maybe thinking about it as outlining simplifies things. I know we had some nasty representational problems with the async lowering that we solved with outlining and force-inlining.

Wed, Mar 24, 8:51 AM · Restricted Project
lxfind added inline comments to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.
Wed, Mar 24, 8:43 AM · debug-info, Restricted Project

Tue, Mar 23

lxfind added inline comments to D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.
Tue, Mar 23, 10:58 PM · Restricted Project
lxfind added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

Tue, Mar 23, 10:26 PM · debug-info, Restricted Project
lxfind added a comment to D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Only one problem I had for emitting lifetime markers even at O0 is that would action make allocas to be optimized even at O0? If so, I wonder if it confuses programmers since they may find some variables disappear surprisingly. Or there would be no optimization since every function would be marked with optnone attribute. I am not sure about this.

It will only cause variables to be put on the stack instead of on the frame, which shouldn't affect developer's view?

Tue, Mar 23, 10:22 PM · Restricted Project
lxfind updated subscribers of D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

I think you just set ShouldEmitLifetimeMarkers correctly in the first place instead of adding this as an extra condition to every place that considers it, however.

This was set when a CodeGenFunction is constructed, at that point it doesn't yet know if this function is a coroutine.
I could turn ShouldEmitLifetimeMarkers to non-const, and then modify it once it realizes it's a coroutine though, if that's better than the current approach.

Tue, Mar 23, 10:19 PM · Restricted Project
lxfind added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Curious, is %0 = bitcast %a to ... the majority of the cases we are missing? If so, could we turn all the dbg.value instructions to use their original pointer by stripping pointer cast?

Tue, Mar 23, 5:17 PM · debug-info, Restricted Project
lxfind updated the summary of D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.
Tue, Mar 23, 4:59 PM · Restricted Project
lxfind requested review of D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.
Tue, Mar 23, 4:53 PM · Restricted Project

Sun, Mar 21

lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

@bruno Thanks for the review!

Sun, Mar 21, 11:59 AM · Restricted Project, Restricted Project

Wed, Mar 17

lxfind added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Looks like it's calling "llvm::dbgs()" directly.

Wed, Mar 17, 12:30 PM · debug-info, Restricted Project
lxfind added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not necessarily the maximum alignment, e.g. if the alignment of new is 16, the returned pointer can still be a multiple 32. And that difference matters.

Wed, Mar 17, 12:07 PM · Restricted Project, Restricted Project

Tue, Mar 16

lxfind added inline comments to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 11:09 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Can't we did as inline comments?

No, because it would have already been too late. SuspendExpr returns the result of __builtin_coro_resume(awaiter.await_suspend().address()), which is different from the result of awaiter.await_suspend().
We need to be able to control the placement of awaiter.await_suspend(), which is why I had to break up the AST at that boundary.

Tue, Mar 16, 11:06 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Then if we want to put the result of the await_suspend in the stack, I think we can do it under CodeGen part only. It should be easy to judge the return type of await_suspend and create a call to llvm.coro.forcestack to the return value of await_suspend.

We probably could, but it would be very very tedious.
During CodeGen, we only have the AST that's calling __builtin_coro_resume, which we will call Emit as a whole.
So we need to manually match the AST 2 levels down to find the await_suspend call, get its name, and then walk through the emitted IR to find a call with the same name, and then find the tmp that's used to store the return value of the call, and then emit llvm.coro.forcestack.

Tue, Mar 16, 10:35 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Then we need to answer the question: how can we prove that the result of symmetric transfer and %gro are the only exceptions from the above rules. Or how can we know the list of exceptions wouldn't get longer and longer in the future?

Then go back to the example in the summary. From my point of view, the key problem is that our escape analysis isn't powerful enough. I don't ask us to do excellent escape analysis. It may beyond our abilities. I just want to say how can we know the result of symmetric transfer and %gro are the only exceptions.

That's a fair point. I agree that we have no guarantee these are the only two cases.
It does seem to me that coroutine implementation somewhat relies on proper lifetime markers so that data are being put correctly, which may be the fundamental problem we are trying to solve.

Tue, Mar 16, 10:33 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Well, I guess another potential solution is to force emitting lifetime intrinsics for this part of coroutine in the front-end.

Tue, Mar 16, 9:24 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

Here what I want to say is we shouldn't handle all the symmetric transfer from the above analysis. And we shouldn't change the ASTNodes and Sema part. We need to solve about the above pattern. It is not easy to give a solution since user could implement symmetric transfer in final awaiter without destroying the handle, which is more common.

Just to clarify, in case there are any confusions around this. This patch would work no matter whether the coroutine frame is destroyed or not during await_suspend(). It simply makes sure that the temporary handle returned by await_suspend will be put in the stack instead of heap, and it will always be safe to do so, no matter what happens.
Whether or not the current coroutine frame would be destroyed completely depend on the implementation of await_suspend. So we cannot predict or know in advance. Therefore, the temporary handle returned by await_suspend must be put on the stack. I don't really see any other solutions other than this.

Tue, Mar 16, 9:13 PM · Restricted Project, Restricted Project
lxfind added inline comments to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 5:50 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

I am a little confused about the first problem. Would it cause the program to crash? (e.g., we access the fields of coroutine frame after the frame gets destroyed). Or it just wastes some storage?

This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66

Tue, Mar 16, 5:37 PM · Restricted Project, Restricted Project
lxfind updated the summary of D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 5:36 PM · Restricted Project, Restricted Project
lxfind added inline comments to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 12:19 PM · Restricted Project, Restricted Project
lxfind added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

It looks like there are two things this patch wants to do:

  1. Don't put the temporary generated by symmetric-transfer on the coroutine frame.
  2. Offer a mechanism to force some values (it is easy to extend Alloca to Value) to put in the stack instead of the coroutine frame.

I am a little confused about the first problem. Would it cause the program to crash? (e.g., we access the fields of coroutine frame after the frame gets destroyed). Or it just wastes some storage?
And I want to ask about the change of the AST nodes and SemaCoroutine. Can we know if a CoroutineSuspendExpr stands for a symmetric-transfer? If yes, it seems we can only do changes in CodeGen part.

It will result in a crash, because we will be accessing memory that's already freed. If you run:

bin/clang -fcoroutines-ts -std=c++14 -stdlib=libc++ ../clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp -o - -emit-llvm -S -Xclang -disable-llvm-passes

You can see that in the final.suspend basic block, there are IRs like this:

  %call19 = call i8* @_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(%"struct.detached_task::promise_type::final_awaiter"* nonnull dereferenceable(1) %ref.tm
p10, i8* %22) #2
  %coerce.dive20 = getelementptr inbounds %"struct.std::experimental::coroutines_v1::coroutine_handle.0", %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %coerce, i32 0, i32 0
  store i8* %call19, i8** %coerce.dive20, align 8
  %call21 = call i8* @_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull dereferenceable(8) %coerce) #2
  call void @llvm.coro.resume(i8* %call21)

The temporary variable %coerce will be put on the frame because it's used by the call to address function and LLVM thinks it may escape. But the call to await_suspend() (the first line) in reality could destroy the current coroutine frame. Hence after the call to await_suspend, it will be accessing the frame, leading to memory corruption.

Tue, Mar 16, 12:15 PM · Restricted Project, Restricted Project

Mar 15 2021

lxfind added reviewers for D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call: ChuanqiXu, junparser, rjmccall, bruno.
Mar 15 2021, 10:42 AM · Restricted Project, Restricted Project
lxfind requested review of D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Mar 15 2021, 10:36 AM · Restricted Project, Restricted Project

Mar 10 2021

lxfind added a comment to D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

Thanks for the note!
I don't really know anything about OMP though, not sure how to handle it.
Would you mind taking a look at this issue? Feel free to send a different patch!

Could you create a PR for the problem so we could track it?

Mar 10 2021, 11:21 AM · Restricted Project
lxfind added a comment to D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

There is a problem. We actually do not emit S here directly, instead, we use CodeGen lambdas, which may not be equal to S, in some cases S is nullptr here. It may result in not quite accurate results.

Mar 10 2021, 11:08 AM · Restricted Project
lxfind added a comment to D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

Thanks. I am landing it.
But feel free to comment here if anything isn't right. @ABataev

Mar 10 2021, 9:29 AM · Restricted Project

Mar 9 2021

lxfind updated the diff for D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

address comment

Mar 9 2021, 8:03 PM · Restricted Project

Mar 6 2021

lxfind updated the diff for D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.

add test

Mar 6 2021, 10:38 PM · Restricted Project
lxfind requested review of D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter.
Mar 6 2021, 8:40 PM · Restricted Project

Mar 4 2021

lxfind added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

Could you describe in more detail what problem this patch solves?
Also, since you are adding a new intrinsics, please also update the coroutine documentation regarding this new intrinsics.

Mar 4 2021, 9:09 AM · Restricted Project, Restricted Project

Mar 3 2021

lxfind committed rG03f668613c44: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions (authored by lxfind).
[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
Mar 3 2021, 3:22 PM
lxfind closed D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.
Mar 3 2021, 3:22 PM · Restricted Project

Mar 2 2021

lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

Yes, lots of the cases follow this rules. However, as long as the values stored are loop invariant, then LICM should move store out of the loop to reduce memory operation.

You are right that in some cases LICM does reduce memory operations for coroutines.
Now that we have all agreed on the documentation part, let me think a bit more about this patch vs D87817

Mar 2 2021, 1:54 PM · Restricted Project

Feb 25 2021

lxfind updated the diff for D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

Add more detailed documentation

Feb 25 2021, 10:52 PM · Restricted Project
lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

Yes, lots of the cases follow this rules. However, as long as the values stored are loop invariant, then LICM should move store out of the loop to reduce memory operation.

You are right that in some cases LICM does reduce memory operations for coroutines.
Now that we have all agreed on the documentation part, let me think a bit more about this patch vs D87817

Feb 25 2021, 4:43 PM · Restricted Project

Feb 24 2021

lxfind added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.

I'm not sure I understand what this allocation is that's only being used after the coroutine frame is destroyed. Where is it coming from? If it's actually just a local allocation in a function that we're inlining into the epilogue (a destructor?), can we reasonably delay that inlining to only happen after splitting? I think in any case we'll need to limit early inlining in those sections if we want to have stronger structural rules about allocation for them; either that or we'll need to be able to introduce our stronger rules during inlining.

It's not due to inlining. There are two cases here, one is the return object case (which I will talk a bit more below). The other is when a call to suspend_away returns a handle, and we need to do symmetric transfer by resuming that returned handle. In a case like this, the current coroutine (the one that is being suspended) may have been destroyed already. So all the instructions in-between the call to suspend_away and the resume of the handle should not touch the coroutine frame. This used to be quite complicated to deal with, but I patched it up in the front end a few months ago so that the lifetime of any temporaries used in that range is minimum as possible, so that a decent lifetime analysis/usage analysis should be able to decide that they need to be on the stack. However I still worry that if there exists any function call there that might appears to escape (especially when objects are constructed through sret), it will be broken.

I don't quite understand what you're saying here. If we're returning the return object indirectly (this is the return object from the ramp function, i.e. the coroutine value itself, right?), we should be constructing it in-place, and there's no local allocation for it. Are we constructing it into a temporary and then copying that (with memcpy? or a copy constructor?) for the actual return?

The return object is obtained by calling the get_return_object() function on the promise. And since the return object is a struct, it first do alloca to allocate a return object, then call the get_return_object() function with sret parameter. In that case, the return object isn't captured but the compiler doesn't know that, which makes it tricky.

Hmm. I can certainly believe that some of the code-motion passes need to understand that they can't always move things around suspensions. One problem with actually changing to an instruction, though, is that coroutine suspension actually looks quite different in the different lowerings.

It's possible that we may diverge more and more between these different lowerings..

Feb 24 2021, 10:02 PM · Restricted Project
lxfind committed rGc38000a9fb2c: [Coroutine] Check indirect uses of alloca when checking lifetime info (authored by lxfind).
[Coroutine] Check indirect uses of alloca when checking lifetime info
Feb 24 2021, 6:29 PM
lxfind closed D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.
Feb 24 2021, 6:29 PM · Restricted Project
lxfind updated the diff for D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Added detailed documentation on next steps. While I need to continue iterating on them, this patch incrementally improves the exisiting code by considering indirect use, so it should be a pure win

Feb 24 2021, 3:29 PM · Restricted Project
lxfind added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

Feb 24 2021, 3:27 PM · Restricted Project

Feb 23 2021

lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

Perhaps we need to introduce a new IR instruction for suspend instead of relying on intrinsics, but that's not going to be something we can redesign in a short amount of time.

How about add fence instruction between coro.suspend and ret block?

Feb 23 2021, 9:56 PM · Restricted Project
lxfind added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

Feb 23 2021, 9:47 PM · Restricted Project

Feb 22 2021

lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

BTW, although this patch is OK enough for pr46990 and use-after-free issue, Fixing these issues case by case for coroutine is not good enough. That's the reason why i agree with efriedma.

Feb 22 2021, 9:06 PM · Restricted Project
lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

LICM move the memory operations out of the loop. It does reduce the number of memory operations. More importantly, I agree with @efriedma that either we have general solution or we describe these restrictions other than fix them anywhere.

Feb 22 2021, 10:27 AM · Restricted Project

Feb 19 2021

lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

please see comments of D87817.

I don't think we should go that route, because LICM will mostly hurt coroutine, as I explained in the summary.
Hence we should simply disable LICM for coroutine, and I don't think this is a temporary change. What do you think?

I do not think we should disable LICM for coroutine, also this is not semantic restriction of coroutine (GCC does not do this). It just caused by current pipeline of llvm coroutine as well as debug info issues. I was thinking maybe we can invoke corosplit as early as possible (not considering performance). Anyway , we can discuss this in D95807.

Feb 19 2021, 7:27 PM · Restricted Project
lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

please see comments of D87817.

I don't think we should go that route, because LICM will mostly hurt coroutine, as I explained in the summary.
Hence we should simply disable LICM for coroutine, and I don't think this is a temporary change. What do you think?

I do not think we should disable LICM for coroutine, also this is not semantic restriction of coroutine (GCC does not do this). It just caused by current pipeline of llvm coroutine as well as debug info issues. I was thinking maybe we can invoke corosplit as early as possible (not considering performance). Anyway , we can discuss this in D95807.

Feb 19 2021, 7:26 PM · Restricted Project
lxfind added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

Feb 19 2021, 10:56 AM · Restricted Project

Feb 18 2021

lxfind committed rG3bf8f162a0a9: [Coroutine] Relax CoroElide musttail check (authored by lxfind).
[Coroutine] Relax CoroElide musttail check
Feb 18 2021, 7:36 PM
lxfind closed D96926: [Coroutine] Relax CoroElide musttail check.
Feb 18 2021, 7:36 PM · Restricted Project
lxfind planned changes to D92661: [RFC] Fix TLS and Coroutine.
Feb 18 2021, 10:36 AM · Restricted Project, Restricted Project
lxfind planned changes to D92662: [Clang][Coroutine] Drop const attribute on pthread_self when coroutine is enabled.
Feb 18 2021, 10:35 AM · Restricted Project
lxfind updated the diff for D96926: [Coroutine] Relax CoroElide musttail check.

Not remove the attribute for musttail

Feb 18 2021, 10:07 AM · Restricted Project
lxfind added a comment to D96926: [Coroutine] Relax CoroElide musttail check.

This patch looks good to me.

Feb 18 2021, 9:40 AM · Restricted Project
lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

@efriedma, do you think the inlined document is sufficient? Or would you prefer to see it somewhere else too?

Feb 18 2021, 9:14 AM · Restricted Project
lxfind planned changes to D95807: [RFC][Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened.
Feb 18 2021, 8:57 AM · Restricted Project
lxfind added a comment to D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.

please see comments of D87817.

Feb 18 2021, 8:55 AM · Restricted Project

Feb 17 2021

lxfind requested review of D96928: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions.
Feb 17 2021, 8:02 PM · Restricted Project
lxfind requested review of D96926: [Coroutine] Relax CoroElide musttail check.
Feb 17 2021, 7:13 PM · Restricted Project
lxfind updated the diff for D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

address comments

Feb 17 2021, 7:06 PM · Restricted Project
lxfind abandoned D96566: [Coroutine] Properly use lifetime intrinsics to analyze allocas.
Feb 17 2021, 5:35 PM · Restricted Project
lxfind abandoned D96441: [Coroutines] Don't use lifetime marker to decide whether an alloca should be put on the frame.
Feb 17 2021, 5:35 PM · Restricted Project
lxfind requested review of D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.
Feb 17 2021, 5:34 PM · Restricted Project

Feb 15 2021

lxfind added a comment to D86190: [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions..

@junparser I have been re-thinking about this patch, and now I actually think this is the right patch to go.
Other than the correctness reasons, I realize that it is almost always harmful to do LICM for a loop that contains a coroutine suspension. The reason is that in most cases LICM moves memory operations out of the loop. (LICM does move constant calls as well but it is relatively rare to have constant calls that's not inlined, furthermore there are other complications such as TLS access and etc.).
For memory operations, moving them out of the loop does not make the loop faster if there is a coroutine suspend in the loop, because the value moved out needs to be put on the coroutine frame anyway. So LICM not only does not eliminate memory access but increases the frame size.
I propose to bring this patch back. (I can help send it too)
What do you think?

Feb 15 2021, 11:04 AM · Restricted Project

Feb 12 2021

lxfind updated the diff for D96566: [Coroutine] Properly use lifetime intrinsics to analyze allocas.

fix clang-tidy warnings

Feb 12 2021, 10:49 AM · Restricted Project
lxfind committed rGa0d09ce4600b: [NFC][Coroutine] Fix an error message on coro.id verification (authored by lxfind).
[NFC][Coroutine] Fix an error message on coro.id verification
Feb 12 2021, 10:44 AM
lxfind closed D96447: [NFC][Coroutine] Fix an error message on coro.id verification.
Feb 12 2021, 10:44 AM · Restricted Project
lxfind added a reviewer for D96441: [Coroutines] Don't use lifetime marker to decide whether an alloca should be put on the frame: rjmccall.
Feb 12 2021, 10:33 AM · Restricted Project
lxfind added a reviewer for D96566: [Coroutine] Properly use lifetime intrinsics to analyze allocas: rjmccall.
Feb 12 2021, 10:33 AM · Restricted Project

Feb 11 2021

lxfind added reviewers for D96566: [Coroutine] Properly use lifetime intrinsics to analyze allocas: junparser, ChuanqiXu.
Feb 11 2021, 5:39 PM · Restricted Project
lxfind requested review of D96566: [Coroutine] Properly use lifetime intrinsics to analyze allocas.
Feb 11 2021, 5:37 PM · Restricted Project
lxfind added a comment to D96441: [Coroutines] Don't use lifetime marker to decide whether an alloca should be put on the frame.

Note: I believe it's still possible to take advantage of lifetime intrinsics, but the approach will be very different and I would prefer to implement that in a separate patch.

Feb 11 2021, 9:14 AM · Restricted Project

Feb 10 2021

lxfind added reviewers for D96447: [NFC][Coroutine] Fix an error message on coro.id verification: ChuanqiXu, junparser, rjmccall.
Feb 10 2021, 12:53 PM · Restricted Project
lxfind requested review of D96447: [NFC][Coroutine] Fix an error message on coro.id verification.
Feb 10 2021, 12:52 PM · Restricted Project
lxfind added a comment to D95807: [RFC][Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened.

By the way, is everyone comfortable with dropping the test lines using the legacy pass manager for Coroutine?
It's really painful to update the tests to keep both legacy pass manager and new pass manager, now that the pass sequence becomes much different. (CoroSplit runs late in CGSCC pipeline for NewPM, but early in legacy pm).
If we want to keep the legacy PM tests for a while for coroutine, I could wait until everyone is comfortable before updating this patch again.

I think testing is more important than the develop experience. It is more suitable to remove the tests for legacy pass manager when the old pass manager got replaced entirely.

Feb 10 2021, 11:58 AM · Restricted Project