This is an archive of the discontinued LLVM Phabricator instance.

Introduce noread_thread_id to address the thread identification problem in coroutines
AbandonedPublic

Authored by ChuanqiXu on Aug 22 2022, 1:12 AM.

Details

Summary

This implements the suggested solution from @nhaehnle, @jyknight and @fhahn (who prefer this than D127383). The suggested solution is described in: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48. Here is the referenced solutions:

  1. Introduce new attribute noread_thread_id
  2. Emit noread_thread_id in addition to readnone in the frontend for functions that are known to not read the thread ID implicitly or explicitly.
  3. Mark the TLS intrinsic as only as readnone.
  4. Many checks for hasAttribute(ReadNone) (e.g. to guard CSE) have to become hasAttribute(ReadNone) && (!isCoroutine(CurrentFn) || !hasAttribute(NoReadThreadID). Though, note that this isn’t the first time that happens. For example, we have a bunch of places that need to check hasAttribute(ReadNone) && !hasAttribute(Convergent). So there’s precedent.

And the patch doesn't implement the 2ed and 4th solutions completely, I think we could complete them step by step since we lack enough tests now. The decision may be a little bit painful but I think it would be more stable.

Test Plans: check-all, https://godbolt.org/z/TGcPPP57K, https://godbolt.org/z/TGcPPP57K, folly and our internal projects.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Aug 22 2022, 1:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 1:12 AM
ChuanqiXu requested review of this revision.Aug 22 2022, 1:12 AM
rjmccall added inline comments.Aug 22 2022, 1:42 PM
llvm/docs/LangRef.rst
1931

Suggestion:

This attribute indicates that the function does not rely on the
identity of the current thread in any way, such as by reading the
current thread ID or taking the address of a thread-local variable.
If the function does rely on the identity of the current thread,
the behavior is undefined.

rjmccall added inline comments.Aug 22 2022, 1:49 PM
clang/lib/CodeGen/CGStmt.cpp
2260 ↗(On Diff #454398)

Hmm, my comment here got lost somehow.

This looks like a new semantic assumption. Can we split this patch so that this is separate from the introduction of the new attribute? I know we were assuming this until a few weeks ago, but still, it's generally best practice to do representation changes independent from semantic changes.

Also, this is basically saying that relying on the thread identity is a side-effect that needs to be reflected in the constraints. Is there documentation justifying that assumption? Doesn't this make some kinds of existing code invalid?

ChuanqiXu updated this revision to Diff 454683.Aug 22 2022, 8:42 PM
ChuanqiXu edited the summary of this revision. (Show Details)

Address comments:

  • Edit LangRef.rst.
  • Split the add-the-attributes part in later revisions.
ChuanqiXu marked an inline comment as done.Aug 22 2022, 8:43 PM
rjmccall added a comment.EditedAug 22 2022, 8:57 PM

Okay. Doc parts LGTM, and I have some naming suggestions for the core methods.

llvm/include/llvm/IR/InstrTypes.h
1855

The closest existing precedent would suggest doesNotReadThreadID and setDoesNotReadThreadID.

1857
1863

This is an odd use of "nor". Maybe take a different approach — canReadDifferentThreadIDIfMoved()?

ChuanqiXu updated this revision to Diff 454696.Aug 22 2022, 9:58 PM
ChuanqiXu marked 3 inline comments as done.

Address comments.

I post the incorrect version the last time.

rjmccall added inline comments.Aug 23 2022, 9:25 AM
llvm/include/llvm/IR/InstrTypes.h
1863

Oh, I didn't notice this last night — canReadDifferentThreadIDIfMoved has the opposite sense of the old method, so either you need to negate the logic in the method and all its call sites, or you need to rename it something like cannotReadDifferentThreadIDIfMoved.

ChuanqiXu updated this revision to Diff 455209.Aug 24 2022, 7:47 AM

Address comments.

ChuanqiXu marked an inline comment as done.Aug 24 2022, 7:47 AM
ChuanqiXu added inline comments.
llvm/include/llvm/IR/InstrTypes.h
1863

Oh, my bad. I should check that. Thanks for double checking!

Seems okay to me, but like I said, it'd be good to get AA eyes on it.

nikic added a comment.Aug 29 2022, 6:34 AM

Okay, this is a bit tricky because we have three different things:

  1. The noread_thread_id attribute, the lack of which was causing issues with intrinsics in the previous version
  2. The meaning of the readnone (etc) attributes, which for pragmatic reasons has to exclude thread IDs for now
  3. The meaning of doesNotReadMemory() etc queries, which in the previous version included thread ID accesses, but in the new version require a separate call

I think my question here would be why this did not stick with the previous implementation approach that also affects doesNotReadMemory and AA queries (and thus makes everything "automatically correct"), and only added the noread_thread_id attribute to make intrinsic handling more precise?

My general vision for this area was that after D130896, we would add ThreadID as an additional ModRef location, which gets removed for non-presplit-coroutines due to being constant. This would follow the interpretation that the thread ID is part of "memory" though, which kind of goes against the approach here.

ChuanqiXu marked an inline comment as done.Aug 29 2022, 7:41 PM

Okay, this is a bit tricky because we have three different things:

  1. The noread_thread_id attribute, the lack of which was causing issues with intrinsics in the previous version
  2. The meaning of the readnone (etc) attributes, which for pragmatic reasons has to exclude thread IDs for now
  3. The meaning of doesNotReadMemory() etc queries, which in the previous version included thread ID accesses, but in the new version require a separate call

I think my question here would be why this did not stick with the previous implementation approach that also affects doesNotReadMemory and AA queries (and thus makes everything "automatically correct"), and only added the noread_thread_id attribute to make intrinsic handling more precise?

My general vision for this area was that after D130896, we would add ThreadID as an additional ModRef location, which gets removed for non-presplit-coroutines due to being constant. This would follow the interpretation that the thread ID is part of "memory" though, which kind of goes against the approach here.

I think the key point here is whether or not "thread_id" is part of "memory". According to https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48, we agree to treat "thread_id" is not part of memory. I feel the idea is to make these attributes more composable. (@nhaehnle ) And it looks like @jyknight @fhahn @rjmccall @efriedma tend to agree the direction if I don't misread. And your proposed solution should be available too. I think we need to get in consensus that whether or not "thread_id" is part of the "memory".

And another benefit of this method is that it is helpful to solve the potential similar problem in green threads (which is called stackful coroutines, or fibers). We mention about it here: https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/28

(Some backgrounds for stackful coroutines: The stackful coroutines are not standard features and a vendor extension. the coroutine intrinsics in LLVM currently works for stackless coroutines. And the general implementation of stackful coroutine is not compiler dependent. The stackful coroutines save each register manually when switching. So it is hard to detect stackful coroutines in compiler.)

Stackful coroutine bodies should be straightforward to support on top of the other work you've been doing, if anyone's actually interested in pursuing them. As far as the optimizer needs to know, a stackful coroutine function is just like a presplit stackless coroutine except that calls and returns work normally and it's never split. Because it's never split, the backends would need to understand that they can't arbitrarily reorder TLS materializations and so on in those functions, which would probably be the most complicated piece of work there. Otherwise, I think we'd just need to mark stackful coroutine bodies with some new attribute and then change cannotReadDifferentThreadIDIfMoved to check for that, the same way it checks for presplit stackless coroutines.

Stackful coroutine bodies should be straightforward to support on top of the other work you've been doing, if anyone's actually interested in pursuing them. As far as the optimizer needs to know, a stackful coroutine function is just like a presplit stackless coroutine except that calls and returns work normally and it's never split. Because it's never split, the backends would need to understand that they can't arbitrarily reorder TLS materializations and so on in those functions, which would probably be the most complicated piece of work there. Otherwise, I think we'd just need to mark stackful coroutine bodies with some new attribute and then change cannotReadDifferentThreadIDIfMoved to check for that, the same way it checks for presplit stackless coroutines.

As far as I understand, we can't mark stackful coroutine bodies with special attributes. It is slightly different from stackless coroutine. A stackless coroutine is a suspendable function. So we can mark the function. But the stackful coroutine is a thread in the user space actually. (Or we can think stackful coroutine as a stack instead of a function) In another word, if a function A in a stackful coroutine calls another function B, then B lives in the stackful coroutine too. The stackful coroutine switches by user(library) implemented methods and they are not standardized so we can't even do hacks for them.

I don't pursue stackful coroutine too. I raise the example to show the idea of noread_thread_id may have some slight advantage.

Stackful coroutine bodies should be straightforward to support on top of the other work you've been doing, if anyone's actually interested in pursuing them. As far as the optimizer needs to know, a stackful coroutine function is just like a presplit stackless coroutine except that calls and returns work normally and it's never split. Because it's never split, the backends would need to understand that they can't arbitrarily reorder TLS materializations and so on in those functions, which would probably be the most complicated piece of work there. Otherwise, I think we'd just need to mark stackful coroutine bodies with some new attribute and then change cannotReadDifferentThreadIDIfMoved to check for that, the same way it checks for presplit stackless coroutines.

As far as I understand, we can't mark stackful coroutine bodies with special attributes. It is slightly different from stackless coroutine. A stackless coroutine is a suspendable function. So we can mark the function. But the stackful coroutine is a thread in the user space actually. (Or we can think stackful coroutine as a stack instead of a function) In another word, if a function A in a stackful coroutine calls another function B, then B lives in the stackful coroutine too. The stackful coroutine switches by user(library) implemented methods and they are not standardized so we can't even do hacks for them.

Well, it's complicated. Stackful coroutines don't generally involve creating a true thread, just a stack, and then yes, you swap the stack on the current thread in some system-specific way. What I'm pointing out is that stackful coroutines can therefore observe changes in their thread ID across suspension points, which is the exact same problem as stackless coroutines have. So we actually *would* need them to be identified specially, even if they're otherwise straightforward to compile, just so that we understand those special semantics and don't e.g. miscompile TLV accesses by hoisting them over a suspension. (That this is still necessary might perhaps undermine some of the arguments for pursuing stackful coroutines in the first place; nonetheless, it's true.)

I don't pursue stackful coroutine too. I raise the example to show the idea of noread_thread_id may have some slight advantage.

Sure, we don't need to talk about this any further, since nobody's interested in pursuing it right now. I'm just trying to underline the connections to what we've already done.

ChuanqiXu abandoned this revision.Oct 16 2022, 7:59 PM