This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Add coro_maychange intrinsic to solve TLS problem (2/5)
AbandonedPublic

Authored by ChuanqiXu on Apr 25 2022, 12:31 AM.

Details

Summary

This is intended to fix the TLS problem in coroutine described in https://github.com/llvm/llvm-project/issues/47179

Simply, we would assume the address of a TLS variable is same in one function. Since a function should be executed in one thread only. However, it is not true for unlowered coroutine. This patch tries to fix the problem by adding a wrapper for every TLS variable to block the alias analysis. Note that we couldn't do this for unlowered coroutine only due to inlining. Also the compiler is still available to optimize TLS variables for lowered coroutine.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Apr 25 2022, 12:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 12:31 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
ChuanqiXu requested review of this revision.Apr 25 2022, 12:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 12:31 AM
efriedma added inline comments.
llvm/lib/Transforms/Coroutines/CoroEarly.cpp
265 ↗(On Diff #424829)

UserInst might not be a legal insertion point for a call if it's a PHI node.

ChuanqiXu added inline comments.Apr 25 2022, 7:01 PM
llvm/lib/Transforms/Coroutines/CoroEarly.cpp
265 ↗(On Diff #424829)

Oh, good Catcha!

ChuanqiXu updated this revision to Diff 425470.Apr 27 2022, 3:43 AM

Emit llvm.coro.maychange in the frontend according to the discussion in https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015

rjmccall added inline comments.Apr 28 2022, 3:27 PM
clang/lib/CodeGen/CGExpr.cpp
2623

I guess this is unnecessary under OpenMP because the privatization logic will already anchor this appropriately in the function. That's worth mentioning in a comment so that readers don't think the combination is somehow busted.

Same question as the other patch: is there any way to safely only do this in code that's actually going to be part of a coroutine? Because getLangOpts().Coroutines is true for everyone using -std=c++20, and most of that code is probably not using coroutines. It seems like we have two options:

  • Use this pattern everywhere in the translation unit, and then eliminate it from all functions after we've split all coroutines
  • Use this pattern in the frontend when directly emitting unsplit coroutine bodies, and also change LLVM to introduce this pattern when cloning code into an unsplit coroutine body (most importantly, in the inliner)

I think the second option makes more sense. In the first option, the mere possibility of having coroutines in the module could significantly impede optimization. The second option also means you'll have to find and remove this pattern in *all* functions, not just when transforming coroutines.

You could also use the second option for your readnone problem: you can have your early pass make readnone calls directly from coroutine bodies coro_readnone instead, and the inliner can do the same for readnone calls being inlined into unsplit coroutines.

efriedma added inline comments.Apr 28 2022, 3:53 PM
clang/lib/CodeGen/CGExpr.cpp
2623

I'm not really happy with the idea that IR semantics are different in the body of a coroutine; that implies that every interprocedural analysis or optimization author has to think about the possible interactions with coroutine semantics. It's much simpler to just pretend a coroutine is a thread. I mean, you mention inlining, but there's also IPSCCP, and Attributor, maybe GlobalOpt, and other interactions I'm probably not thinking of.

And really, as long as alias analysis understands llvm.coro.maychange, we can do almost all the optimizations we want to do on thread-local variables anyway.

But maybe I'm leaning too far towards conceptual purity, as opposed to practicality.

rjmccall added inline comments.Apr 28 2022, 4:56 PM
clang/lib/CodeGen/CGExpr.cpp
2623

Like I said in Discourse, I think our hands are pretty forced here for TLS, because LLVM IR is assuming in its basic representation that functions are pinned to a single thread, which simply is not true in coroutine bodies. We have three options:

  1. We can globally change the structure of TLS access. You would not be able to simply use thread_local GVs as constants; instead, you would have to use some instruction which yields the address of the TLS for the current thread.
  2. We can require different structure for TLS access in unlowered coroutines.
  3. We can decline to support coroutines.

I don't think #3 is an option, and #1 isn't really a fight I want to fight, so we're left with #2. Maybe there's an option I'm not thinking of, though.

efriedma added inline comments.Apr 28 2022, 5:24 PM
clang/lib/CodeGen/CGExpr.cpp
2623

I think that's basically the three options, yes.

I guess strictly speaking, there is a fourth option: we could mess with the semantics of coroutines. We can say, for example, that the address of thread-local variables is always the address on entry to the coroutine. Then we can just hoist the computation to the entry block of the coroutine. But the spec probably doesn't allow that.

My intuition is that #1 is not actually that terrible. We don't really optimize thread-local variables very much anyway; as long as alias analysis doesn't explode, we're probably don't miss many optimizations. And I'm not sure #2 is actually significantly less work to implement. But I guess it's easier to prove the changes involved in #2 don't have any impact on code that doesn't use coroutines.

ChuanqiXu updated this revision to Diff 425954.Apr 28 2022, 8:20 PM

Address comments:

  • Don't add filter for OpenMP
  • Add readonly and inaccessiblememonly attribute to llvm.coro.may_change intrinsics.

I think it is better to discuss whether or not to insert llvm.coro.may_change in non-coroutines in discourse: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015