This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Properly determine whether an alloca should live on the frame
ClosedPublic

Authored by lxfind on Oct 19 2020, 11:39 PM.

Details

Summary

The existing logic in determining whether an alloca should live on the frame only looks explicit def-use relationships. However a value defined by an alloca may be implicitly needed across suspension points, either because an alias has across-suspension-point def-use relationship, or escaped by store/call/memory intrinsics. To properly handle all these cases, we have to properly visit the alloca pointer up-front. Thie patch extends the exisiting alloca use visitor to determine whether an alloca should live on the frame.

Diff Detail

Event Timeline

lxfind created this revision.Oct 19 2020, 11:39 PM
lxfind requested review of this revision.Oct 19 2020, 11:39 PM
lxfind updated this revision to Diff 299431.Oct 20 2020, 11:40 AM

clang-tidy

There's an existing StackLifetime analysis that does some of this work and also considers the lifetime intrinsics; can we take advantage of that?

There's an existing StackLifetime analysis that does some of this work and also considers the lifetime intrinsics; can we take advantage of that?

My understanding is quite the opposite, that StackLifetime analysis purely relies on lifetime intrinsics, and does not look at how an alloca is being used.
https://github.com/llvm/llvm-project/blob/e10e7829bf6f10c053c05e42b676d7acaf54a221/llvm/include/llvm/Analysis/StackLifetime.h#L111-L112

Okay. Well, it does seem to me that we need to be considering lifetimes. If we can also optimize when we don't have lifetime information and the variable doesn't escape, that's great, but we often have pretty good lifetime information that we can use. And in optimized builds we'll usually have already eliminated allocas that don't escape, so the remaining allocas are very likely to have all escaped.

Okay. Well, it does seem to me that we need to be considering lifetimes. If we can also optimize when we don't have lifetime information and the variable doesn't escape, that's great, but we often have pretty good lifetime information that we can use. And in optimized builds we'll usually have already eliminated allocas that don't escape, so the remaining allocas are very likely to have all escaped.

Yes we are considering lifetimes. The algorithms by first checking to see if there are lifetimes, and if there are use them. After that we use the pointer visitor to do more precise tracking.

Great! To my understanding, if there are lifetime informations, the behavior of this patch is same with the behavior of previous implementation. Is my understanding right?

Great! To my understanding, if there are lifetime informations, the behavior of this patch is same with the behavior of previous implementation. Is my understanding right?

That's correct.

Oh, I see, sorry.

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seems to be over-analyzed. Can we just leave aliases on frame? Cause I have no idea about this effect on real workloads.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
923–925

this function implies dominate relation for arg1 and arg2, so we can not use hasPathCrossingSuspendPoint for two user basic blocks directly.

lxfind added inline comments.Oct 26 2020, 10:15 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
923–925

hmm if that's the case, I think we might already have some buggy code here. The checks on lifetime markers don't necessary have dominance relationships. I saw you removed the dominance assertion in https://reviews.llvm.org/D75664. Do you remember why?

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seems to be over-analyzed. Can we just leave aliases on frame? Cause I have no idea about this effect on real workloads.

What do you mean by "leave aliases on frame"?

junparser added a comment.EditedOct 26 2020, 11:36 PM

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seems to be over-analyzed. Can we just leave aliases on frame? Cause I have no idea about this effect on real workloads.

What do you mean by "leave aliases on frame"?

Just setEscaped for pointer cast used before coro.begin, or can we do something like llvm::PointerMayBeCapturedBefore to check the pointer directly? I'm not sure about this.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
923–925

The bitcast users hit the assertion, however, they do not across the suspend point due to rewriteMaterializableInstructions. Except that, each lifetime marker has dominance relationship with some of the users. Iterate all the lifetime marker has effect as same as def

Thanks for the patch! I think I generally agree with this patch.
One thing is that the aliases seems to be over-analyzed. Can we just leave aliases on frame? Cause I have no idea about this effect on real workloads.

What do you mean by "leave aliases on frame"?

Just setEscaped for pointer cast used before coro.begin, or can we do something like llvm::PointerMayBeCapturedBefore to check the pointer directly? I'm not sure about this.

@lxfind, sorry for the confusing question, please ignore it. After I read about the D86859, one of the suggestion is that after we find all of the aliases, we can do same thing like rewriteMaterializableInstructions

junparser added inline comments.Oct 27 2020, 8:00 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2044

we can call this at the beginning of this function, and then sink the aliases after coro.begin as we do in rewriteMaterializableInstructions, keep the original logic unchanged.

I think we can even deal with unknown offset.

lxfind added inline comments.Oct 27 2020, 8:57 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2044

I am not exactly sure what you mean, but let me explain a bit more about the alias analysis here.
The alias analysis serves three purposes.
The first purpose, which was introduced in D86859, is to identify aliases created before CoroBegin so that we can recreate them after CoroBegin. In this case, we cannot deal with unknown offset because we simply cannot recreate an alias if the offset is unknow.
The second purpose, which was also introduced in D86859, is to identify allocas that may have been written into before CoroBegin. If that happens, we need to copy the content from stack to the frame.
The third purpose, which is new in this patch, is to identify which alloca should go to the frame. And we need alias analysis because without it, we won't be able to detect cases where an alloca needs to stay alive across suspension points due to indirect escape (as demonstrated in the added test cases). rewriteMaterializableInstructions cannot handle those indirect escapes.

Which case do you think we can/should simplify?

Thank you for the explanation!

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

Still need call Checker.hasPathCrossingSuspendPoint(AllocaDef, Pointeruser)

junparser added inline comments.Oct 27 2020, 7:27 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2044

This makes me more clear! LGTM.

lxfind added inline comments.Oct 27 2020, 10:43 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

We cannot do that because the whole purpose of this rewrite is to check if uses of the alloca belong to different suspend regions, which causes the spill.
I thought about this a bit more, and my conclusion is that although the API is not intended to be used this way, it actually works here.
The goal for this loop is to check whether an alloca needs to live across a suspension. If an alloca needs to live across a suspension, it means there exists a coro.suspend such that a user BB happens before it and a user BB happens after it, and these two BBs must dominate each other through coro.susend. Hence one of the user BB pairs must return true on the hasPathCrossingSuspendPoint check. On the other hand, for BBs that don't even dominate each other, hasPathCrossingSuspendPoint will return false anyway and won't affect the algorithm.

junparser added inline comments.Oct 27 2020, 10:52 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

let's say :

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else
      use2 a;
}

in such case, hasPathCrossingSuspendPoint(use1, use2) return false which means a is not keep on frame, this is wrong.

lxfind added inline comments.Oct 27 2020, 10:56 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

In this case, "a" is just defined but not used at all before the co_await, so it would be correct to not keep it on frame, right? Could you explain why in this case "a" needs to be on the frame?

junparser added inline comments.Oct 27 2020, 11:01 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

We cannot do that because the whole purpose of this rewrite is to check if uses of the alloca belong to different suspend regions, which causes the spill.
I thought about this a bit more, and my conclusion is that although the API is not intended to be used this way, it actually works here.
The goal for this loop is to check whether an alloca needs to live across a suspension. If an alloca needs to live across a suspension, it means there exists a coro.suspend such that a user BB happens before it and a user BB happens after it, and these two BBs must dominate each other through coro.susend. Hence one of the user BB pairs must return true on the hasPathCrossingSuspendPoint check. On the other hand, for BBs that don't even dominate each other, hasPathCrossingSuspendPoint will return false anyway and won't affect the algorithm.

I do not understand why there is a user bb happends before coro.suspend?

junparser added inline comments.Oct 27 2020, 11:05 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

It may used before the co_await, as long as initial_suspend never suspend and cond is false. This pattern can also be put in a loop. we do not know whether part will be executed.

lxfind added inline comments.Oct 27 2020, 11:10 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

If an alloca needs to live across a suspension, there must exists two uses of the alloca (or through an alias) that one happens before a coro.suspend and one happens after that same coro.suspend. If no two uses of alloca can cross a suspend, the alloca does not ever need to live on the frame. Does this part sound right?

Next we need to prove that if there exists such a pair of use, one of them will return true on hasPathCrossingSuspendPoint.
It's obvious that the alloca use after coro.suspend must be dominated by coro.suspend. The alloca happens before coro.suspend is a bit trickier to think about. In the simpler case, if the alloca happens before coro.suspend also dominates coro.suspend, then we have a pair of user BBs that would return true on the check, because they dominate each other through coro.suspend. If the alloca happens before coro.suspend does not dominate coro.suspend, and yet it's meaningfully used, it would either escape at some point, or it will be propagated into a PHINode eventually and that PHINode will dominate coro.suspend. Since we also track PHINode in the alias analysis, we will eventually find this pair that dominates each other.

lxfind added inline comments.Oct 27 2020, 11:13 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

It may used before the co_await, as long as initial_suspend never suspend and cond is false. This pattern can also be put in a loop. we do not know whether part will be executed.

In your code example, "a" is not used before co_await, neither initialized. So it doesn't matter if it lives on the frame or not. If it is used, then there will exists a user-pair that crosses suspensions.
Could you explain what could go wrong in your example above if "a" is not kept in the frame?

junparser added inline comments.Oct 27 2020, 11:16 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

If an alloca needs to live across a suspension, there must exists two uses of the alloca (or through an alias) that one happens before a coro.suspend and one happens after that same coro.suspend. If no two uses of alloca can cross a suspend, the alloca does not ever need to live on the frame. Does this part sound right?

There is no before/after relation. as long as two uses used in different suspend region. the case can be changed to

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else{
      co_await
      use2 a;
    }
}

Next we need to prove that if there exists such a pair of use, one of them will return true on hasPathCrossingSuspendPoint.
It's obvious that the alloca use after coro.suspend must be dominated by coro.suspend. The alloca happens before coro.suspend is a bit trickier to think about. In the simpler case, if the alloca happens before coro.suspend also dominates coro.suspend, then we have a pair of user BBs that would return true on the check, because they dominate each other through coro.suspend. If the alloca happens before coro.suspend does not dominate coro.suspend, and yet it's meaningfully used, it would either escape at some point, or it will be propagated into a PHINode eventually and that PHINode will dominate coro.suspend. Since we also track PHINode in the alias analysis, we will eventually find this pair that dominates each other.

lxfind added inline comments.Oct 27 2020, 11:24 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

Could you explain what could go wrong in this example if "a" is not on the frame?
The "alloca" instruction simply defines a value, but nothing else.
So your example code

void foo () {
  alloca a;
  if (cond)
    {
      co_await
      use1 a;
    }else{
      co_await
      use2 a;
    }
}

behaves exactly as

void foo () {
  if (cond)
    {
      co_await
      alloca a1
      use1 a1;
    }else{
      co_await
      alloca a2
      use2 a2;
    }
}
junparser added inline comments.Oct 27 2020, 11:36 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
925

hmm... get your point. make sense to me.

junparser accepted this revision.Oct 27 2020, 11:44 PM

Thanks for the patch!

This revision is now accepted and ready to land.Oct 27 2020, 11:44 PM
junparser added inline comments.Oct 28 2020, 12:00 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2026

@lxfind, with pointer tracking, I wonder whether lifetime check can be removed.

lxfind added inline comments.Oct 28 2020, 9:13 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2026

The lifetime checks may provide more accurate information in the case where allocas seem escaped. So I think it's helpful to keep it.

junparser added inline comments.Oct 28 2020, 7:00 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
898

I prefer this to be report_fatal_error.

This revision was landed with ongoing or failed builds.Oct 29 2020, 11:56 PM
This revision was automatically updated to reflect the committed changes.
vitalybuka added inline comments.Oct 30 2020, 12:32 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
916–921

not needed

Hello,

A git bisect has identified this change as the likely candidate for a new set of asserts / crashes in SwiftShader when attempting to use the coroutine passes. This is having knock-on issues with internal Google projects.

When passing this IR to ./bin/opt crash.ir -coro-early -coro-split -coro-elide -S with this change, we now get this crash.
Running the same command on the parent change does not crash, and behaves as expected.

I'd like to file a bug, but LLVM's Bugzilla is not letting me sign in, possibly due to spam restrictions. :-/

I'll do some investigation myself tomorrow, but any assistance here would be gratefully appreciated.

Many thanks,
Ben

lxfind added a comment.Nov 5 2020, 3:08 PM

Hello,

A git bisect has identified this change as the likely candidate for a new set of asserts / crashes in SwiftShader when attempting to use the coroutine passes. This is having knock-on issues with internal Google projects.

When passing this IR to ./bin/opt crash.ir -coro-early -coro-split -coro-elide -S with this change, we now get this crash.
Running the same command on the parent change does not crash, and behaves as expected.

I'd like to file a bug, but LLVM's Bugzilla is not letting me sign in, possibly due to spam restrictions. :-/

I'll do some investigation myself tomorrow, but any assistance here would be gratefully appreciated.

Many thanks,
Ben

Thanks for reporting. I will take a look.

Hello,

A git bisect has identified this change as the likely candidate for a new set of asserts / crashes in SwiftShader when attempting to use the coroutine passes. This is having knock-on issues with internal Google projects.

When passing this IR to ./bin/opt crash.ir -coro-early -coro-split -coro-elide -S with this change, we now get this crash.
Running the same command on the parent change does not crash, and behaves as expected.

I'd like to file a bug, but LLVM's Bugzilla is not letting me sign in, possibly due to spam restrictions. :-/

I'll do some investigation myself tomorrow, but any assistance here would be gratefully appreciated.

Many thanks,
Ben

I can confirm that this patch introduced a bug.
It can be triggered when there is an alloca instruction defined after coro.begin and used after a suspension. These allocas are not properly moved to the .resume function.
I will think about how to fix this.