This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Sink lifetime markers after switch of suspend blocks to avoid disturbing must tail calls
AbandonedPublic

Authored by ChuanqiXu on Jun 3 2021, 12:30 AM.

Details

Summary

This fixes bug 50544.

The original implementation would try to insert new lifetime markers between coro.suspend and the switch that uses coro.suspends. After lowering, it would make a pattern:

fastcc call anything ; // should be musttail
%0 = getelementptr ...
lifetime.start(size, %0)
ret void

which prevent the musttail call and it doesn't make sense actually. This patch tries to insert new lifetime markers after the switch block instead, which solves the bug and makes more sense.

Test Plan: check-all

Diff Detail

Event Timeline

ChuanqiXu created this revision.Jun 3 2021, 12:30 AM
ChuanqiXu requested review of this revision.Jun 3 2021, 12:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 12:30 AM
ChuanqiXu updated this revision to Diff 349469.Jun 3 2021, 12:36 AM

Formatting.

rjmccall added inline comments.Jun 3 2021, 6:44 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

Can we assert that this is a switch block somehow? This is very dangerous if the code pattern isn't what we expect.

2442

Is this not just NewLifetime->setOperand(1, NewBitCast)?

ChuanqiXu updated this revision to Diff 349755.Jun 3 2021, 8:46 PM

Address comments.

ChuanqiXu added inline comments.Jun 3 2021, 8:48 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

The terminator of SuspendBlock->getSingleSuccessor() isn't necessarily a SwitchInst actually. Precious name is misunderstanding, my bad. I add comments to clarify it.

ChuanqiXu updated this revision to Diff 349757.Jun 3 2021, 8:50 PM

Update comments.

rjmccall added inline comments.Jun 4 2021, 9:20 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

Do we ever not emit a block that does this kind of branch/switch? What if there's only one suspend point? It's extremely important that this not accidentally walk into the user-provided parts of the function, so if there's any way for us to actually assert that the successor block has the form we expect (nothing but a switch or branch on the state value), that would make me feel a lot better.

ChuanqiXu updated this revision to Diff 350175.Jun 6 2021, 10:44 PM

Add guard codes to check the pattern.

I find that the style of suspend block for async and retcon is a little different with switch. So I choose to handle switch style suspend specially. It should be OK since the intuition of the patch is to make sinkLifetimeStartMarkers avoid preventing symmetric transfer. And it seems like that only the switch style coroutine (c++ coroutine) has the definition about symmetric transfer.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

Do we ever not emit a block that does this kind of branch/switch?

We always emit a switch in the frontend. But if there are identical destination basic block in the switch instruction, it may be converted into a conditional branch instruction. And if all the destination blocks are the same (happens in the test case), it would be converted to an unconditional branch instruction.

> What if there's only one suspend point?

From my point of view, it should be OK.

It's extremely important that this not accidentally walk into the user-provided parts of the function, so if there's any way for us to actually assert that the successor block has the form we expect (nothing but a switch or branch on the state value), that would make me feel a lot better.

Agreed. However, it seems hard to code the pattern in a one assert statement. So I tried to add codes to check the pattern. The codes look a little bit ugly to me. But I find it isn't easy to make it in a simple way. One alternative was to add function to verify the IR in the beginning of coro split pass.

By the way, there are other side effects for these guard codes. It breaks some tests whose pattern are unusual for IR produced by the frontend. But from the documentation (https://llvm.org/docs/Coroutines.html#llvm-coro-suspend-retcon-intrinsic), I think these tests are legal. After all, LLVM IR are independent language. So if we decide to continue in this way. We need to modify the document about coroutine API to make it consistent.

lxfind added inline comments.Jun 8 2021, 9:27 AM
llvm/test/Transforms/Coroutines/coro-debug.ll
22

This seems very common in our test cases, are you sure this kind of conversion can never happen?

ChuanqiXu added inline comments.Jun 8 2021, 6:41 PM
llvm/test/Transforms/Coroutines/coro-debug.ll
22

Probably yes. Since the code emitted in the clang frontend is:

%suspendResult = call i8 @llvm.coro.suspend()
%switch i8 % suspendResult, label ...

To my knowledge, it's hard to believe that there would be conversion from i8 to i32 in the middle end. So I think this code pattern is only live in the test cases.

The things that I worried about is that the guard codes would restrict the semantics of Coroutine APIs. Since in the documentation, it didn't say we must use a switch follow the llvm.coro.suspend. I wonder if the restriction is too hard. How do you guys think about it? @rjmccall and @lxfind .

rjmccall added inline comments.Jun 12 2021, 11:49 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

We always emit a switch in the frontend. But if there are identical destination basic block in the switch instruction, it may be converted into a conditional branch instruction. And if all the destination blocks are the same (happens in the test case), it would be converted to an unconditional branch instruction.

Well, two blocks separated by an unconditional branch can be merged, so I don't this we can rely on seeing this pattern, even if it were true that the frontend guaranteed to emit it.

Stepping back, are lifetime intrinsics that point into the coroutine frame really at all useful after splitting? Should we just erase lifetime intrinsics for allocas that we've moved into the frame? (Lifetimes for allocas that we haven't moved into the frame should be fine to leave in, since we'll only do this when the lifetimes don't overlap a suspend.)

ChuanqiXu added inline comments.Jun 14 2021, 7:30 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2343–2383

Stepping back, are lifetime intrinsics that point into the coroutine frame really at all useful after splitting?

In my mind, if we did coroutine elision and SROA for the frame, the lifetime intrinsics may be useful. I didn't write test cases to prove it. But I guess it may be possible.

From my point of view, there may be two solutions:

  1. Remove these check-codes simply. Keep it as the original patch.
  2. Try to remove the lifetime markers when we convert tail call to musttail call at the end of Coro-split pass.

I think the first may be better. Because the root cause of the bug is that we didn't put lifetime markers properly which breaks the symmetric transfer. So we need to put the lifetime markers properly. And there are two questions followed:

  1. Do we put lifetime markers in the right place? Or do the lifetime markers would make the program wrong (crash)?
  2. Can we prove that the lifetime markers inserted wouldn't break the symmetric transfer?

And my answer for the first question is yes. Since the condition to sink lifetimes didn't change and I think it is correct.
For the second question, which may be your main concern, my answer is yes if other passes didn't do any surprising transformations. Given the codes generated in the frontend, we can know the switch (or branch) statement should be in the same BB of the corresponding coro.suspend. Then after splitting around coro.suspend, the switch statement should be in the single successor block of the coro.suspend. So it should be fine in my mind.

For the second solution, removing lifetimes when we generating musttail call. It looks a little bit weird for me since it didn't remove all the lifetime markers as you said nor prove the removed lifetime markers wouldn't be used in future transformation.

To be honest, none of the solutions seems satisfying but I couldn't propose a better one.

At a high level, what we are trying to achieve with sinkLifetimeStartMarkers is to make the lifetime of each alloca more accurate, so that we could avoid putting unnecessary data on the frame. The problem itself is valid: how to make lifetime analysis more accurate. But maybe the solution is hacky to begin with (I know, I added this function). Maybe we want to rely on some more formal/universal lifetime analysis to determine the true alive range of each alloca, instead of relying on moving around lifetime intrinsics. That said, it's possible that we don't want to sink lifetime markers at all, but want to use a more robust solution to determine the lifetimes.

On the other hand, I don't know what kind of lifetime analysis is out there that could be used by CoroSplit and is good enough.
I wonder how bad it would be if we just disable/remove the sinkLifetimeStartMarkers step for now, and focus on finding a better lifetime analysis framework that would give us what we want?

On the other hand, I don't know what kind of lifetime analysis is out there that could be used by CoroSplit and is good enough.
I wonder how bad it would be if we just disable/remove the sinkLifetimeStartMarkers step for now, and focus on finding a better lifetime analysis framework that would give us what we want?

To my knowledge, there isn't analysis/transformation available for our purpose now (reduce the lifetime range marked by lifetime markers). There is a pass named StackColoring, which could reduce the lifetime for stack variables. But we can't use it directly since they are implemented for MIR.


To summarize up, the solutions we found currently are:

  1. Sink lifetime markers after the switch statement followed by 'coro.suspend' intrinsics. One key point we discussed for this solution is whether we should add guard codes to check the pattern. And if yes, how do we check the pattern?
  2. Erase all the lifetime markers for the allocas we put on the frame after we made splitting. @rjmccall provided.
    1. Try to remove the lifetime markers when we convert tail call to musttail call at the end of Coro-split pass.
  3. Remove sinkLifetimeStartMarkers simply and try to make a better lifetime analysis. @lxfind provided.

There are my thoughts about them.
For the first: I prefer this method. Here are my reasons:

  1. The algorithm of sinkLifetimeStartMarkers looks like right and it works well for a relative long time. We hadn't find bug reports for it for a while.
  2. The problem we found now is caused by sinking lifetime start markers between 'coro.suspend' and the switch statement, which would break the transformation to the symmetric transfer.

So it looks like a minor issue of sinkLifetimeStartMarkers and we only need to sink lifetime start markers to the place where wouldn't break the transformation. And the third question is:

  1. Is the new place we choose good?

I think the answer is yes. Since we didn't change the logic of sinkLifetimeStartMarkers. And if sinkLifetimeStartMarkers is right, the codes after patching should be right too. Otherwise we should fix sinkLifetimeStartMarkers .
Another things to diccuss is whether to add check-codes. I think we can omit it and I said the reasons in the above.

For the second solution: Erase all the lifetime markers for the allocas we put on the frame after we made splitting.
The worries I got is that if the coroutine got elided and the frame got SROA, then the lifetime markers could be useful. But we had erased them.
The positive side is that this is simple to implement and the downside looks strict to trigger.

For the third solution: Try to remove the lifetime markers when we convert tail call to musttail call at the end of Coro-split pass
It looks like the second solution, which makes the condition to erase the lifetime markers more strict.

For the final solution: Remove sinkLifetimeStartMarkers simply and try to make a better lifetime analysis.
It consists of two steps:

  1. Remove sinkLifetimeStartMarkers.
  2. Make a better and formal lifetime analysis.

The first step could fix the bug clearly. But we need to answer how much regression we would got if we removed this optimization. It's a hard question. It reveals that we lack the benchmark for coroutine again. BTW, I had tried to make a benchmark. But I find it's much much harder than I think. Maybe we could discuss this in another thread.
For the second step, it should be harder to add a separate pass. But I agree with that we can improve the current analysis further more.

Overall, I prefer the first solution since the downside is relatively minimal from my perspective. The second solution is also acceptable to me since the condition to trigger the bad results are rare and the bad behavior is not so bad. If we really care it, we could choose the third solution.
For the final solution, I think the better order should be 'Make a better lifetime analysis' and 'Replace sinkLifetimeStartMarkers with the new one'. Although we can't know exactly the impact of removing sinkLifetimeStartMarkers , it doesn't make sense to me that we remove an optimization arbitrarily. I agree with that we could and should improve sinkLifetimeStartMarkers , but I think we could make it in the future instead of this patch (It seems like the TODO list is already long though)

I guess I am fine with solution 1 and see how far we can go with it.
In the long run, we should look at cases where this optimization (sinkLifetimeStartMarkers) is effective and figure out why the lifetime start markers are emitted early, so that we could improve the lifetime marker creation in the first place. Overall it still feels a bit strange to move lifetime markers around (a sign that the front-end didn't do the right thing)

I guess I am fine with solution 1 and see how far we can go with it.
In the long run, we should look at cases where this optimization (sinkLifetimeStartMarkers) is effective and figure out why the lifetime start markers are emitted early, so that we could improve the lifetime marker creation in the first place. Overall it still feels a bit strange to move lifetime markers around (a sign that the front-end didn't do the right thing)

All Agreed. Improving the accuracy for lifetime markers is a goal for me too. As I mentioned in bug 50372, I think we can do better to reduce the frame size.

By the way, while you are at it, I was looking at some of the sink lifetime tests, and I was confused by this test:
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
It's supposed to test that we can sink the lifetime (and hence will not put the alloca on the frame). But looks like we are not able to sink it and the test is testing that it will be on the frame?

By the way, while you are at it, I was looking at some of the sink lifetime tests, and I was confused by this test:
https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
It's supposed to test that we can sink the lifetime (and hence will not put the alloca on the frame). But looks like we are not able to sink it and the test is testing that it will be on the frame?

Yeah, this test is intended to not sink the lifetime to avoid troubles. Since the use of 'testval' would happen in both ramp function and resume function, it should be on the frame. From the commit log, it looks like that there are some problems with the original implementation which is a little bit too aggressive. So the fix tried to be more strict on that.

It seems like @rjmccall is out of work recently.
Since @lxfind you said that you may be fine with the first solution, what's your opinion about the guard codes?

I am OK with this. You may want to move those part of the checking code to a separate function for readability. (since this check is independent from lifetime sinking, we could also consider doing the check earlier for all corosuspends too.

I am OK with this. You may want to move those part of the checking code to a separate function for readability. (since this check is independent from lifetime sinking, we could also consider doing the check earlier for all corosuspends too.

In my opinion, I prefer to remove these checking codes in this diff or add the checking codes in another patch.
Here are reasons:

  • This check is independent from this patch.
  • It should be harmless.

Let me explain the second point more. First the algorithm of sinkLifetimeStartMarkers is to check if the candidate insert point is safe to insert lifetime markers. Second, this patch would add the successors of the SuspendBlock as candidate insert points instead of the SuspendBlock itself. So I think this patch only reduce the range that the lifetimes could be inserted to affect. It doesn't touch the algorithm that judge the safeness. So I mean, as long as the algorithm is right, this patch should do nothing harmful. This algorithm looks good to me. And in case that the algorithm has bugs, we should fix the bug in the algorithm instead.

So I prefer to remove the checking codes simply.

ChuanqiXu planned changes to this revision.Nov 15 2021, 2:25 AM
ChuanqiXu abandoned this revision.Nov 15 2021, 7:37 PM