This is an archive of the discontinued LLVM Phabricator instance.

Introduce @llvm.threadlocal.address intrinsic to access TLS variable
ClosedPublic

Authored by ChuanqiXu on May 9 2022, 11:41 PM.

Details

Summary

This belongs to a series of patches which try to solve the thread identification problem in coroutines. See https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 for a full background.

The problem consists of two concrete problems: TLS variable and readnone functions. This patch tries to convert the TLS problem to readnone problem by converting the access of TLS variable to an intrinsic which is marked as readnone.

The readnone problem would be addressed in following patches.

Diff Detail

Event Timeline

ChuanqiXu created this revision.May 9 2022, 11:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:41 PM
ChuanqiXu requested review of this revision.May 9 2022, 11:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2022, 1:50 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ormris removed a subscriber: ormris.May 16 2022, 10:57 AM

I don't really understand how this is supposed to interact with D125292; even if you strip the readnone attribute from the call instruction, we'll still treat a call to the intrinsic as readnone.

I think I'd prefer to lower the intrinsic as part of the codegen optimization pipeline, not the optimization pipeline. So maybe just in in SelectionDAG? I mean, it doesn't matter much for your planned usage in coroutines, but it's more similar to other intrinsics, and more like what we expect it to look like in the future.

I don't really understand how this is supposed to interact with D125292; even if you strip the readnone attribute from the call instruction, we'll still treat a call to the intrinsic as readnone.

Yeah, the point here is that we used operand bundle in D125292. The operand bundles would break the logic you said, see: https://github.com/llvm/llvm-project/blob/3b762b3ab8d205cd6a7d42c96d39d5f4f701f2ab/llvm/include/llvm/IR/InstrTypes.h#L2315-L2325.

I think I'd prefer to lower the intrinsic as part of the codegen optimization pipeline, not the optimization pipeline. So maybe just in in SelectionDAG? I mean, it doesn't matter much for your planned usage in coroutines, but it's more similar to other intrinsics, and more like what we expect it to look like in the future.

Good idea. Would do.

ChuanqiXu edited reviewers, added: efriedma; removed: eli.friedman.May 18 2022, 7:30 PM

Address comments:

  • Lowering the introduced intrinsic in CodeGen pipelines.

Cleanup codes.

ChuanqiXu added inline comments.May 18 2022, 11:36 PM
llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
1 ↗(On Diff #430585)

I feel this is a better place than SelectDAG.

You can use the "Edit Related Revisions" option in the right-hand side menu of Phabricator to link this revision with the others of the series. I can't really speak for the Clang parts, but the LLVM parts looks reasonable to me modulo some detail comments.

llvm/docs/LangRef.rst
24392–24403

LangRef should be written in present tense. Something like:

The address of a thread local variable is not a constant, since it depends on the calling thread. The `llvm.threadlocal.address` intrinsic returns the address of the given thread local variable in the calling thread.

llvm/include/llvm/IR/IRBuilder.h
756–757

This could be a GlobalValue* operand to reduce the risk of misuse, right?

nhaehnle added inline comments.May 24 2022, 1:09 PM
llvm/include/llvm/IR/Intrinsics.td
1407–1408

Whether IntrNoMem is truly correct here depends on the overall solution of the thread identification issue, i.e. it depends on whether readnone implies "doesn't read thread ID". We'd best discuss that separately.

Address comments.

ChuanqiXu marked 2 inline comments as done.May 24 2022, 11:45 PM
ChuanqiXu added inline comments.
llvm/include/llvm/IR/IRBuilder.h
756–757

We could use Value* here to keep consistency with other functions and the uses. We added assertion in implementations to avoid misuses.

llvm/include/llvm/IR/Intrinsics.td
1407–1408

Yeah, let's discuss this in discourse thread.

nhaehnle added inline comments.May 26 2022, 8:46 AM
llvm/include/llvm/IR/IRBuilder.h
756–757

Thanks!

llvm/include/llvm/IR/Intrinsics.td
1407–1408

Just to follow up, based on the latest comments including that of @jyknight, IntroNoMem is correct.

So, I've been spending some significant time looking into this. I found that I couldn't really review this change for correctness, because I find it basically impossible to figure out whether the intrinsic calls have actually been added to all the right places in Clang or not. (And my intuition said "not", but then couldn't tell me where it's wrong.)

So, I started hacking up a prototype of a change to make the type of a TLS global variable be token instead of ptr. This allows missing calls to manifest as IR construction errors.

So far the biggest missing piece that identified in this review is function-local thread_locals (although I haven't actually gotten something fully working). Sadly, the handling of function-local statics in Clang is really rather hairy, what with objc blocks and openmp support both doing things that seem rather ill-advised to me. I'm toying with some cleanups there, to see if it can be simplified a bit. I don't have a full idea, yet, what changes need to be made to this review.

Anyhow -- I think the prototype I'm fiddling with is also along the path to the ideal long-term state, but pushing it beyond a prototype seems like it'll be a pain in the ass due to the bitcode compatibility requirement. (The bitcode upgrader would need to know how to transform all constant expressions using a TLS global into non-constant IR instructions, starting with a call to llvm.threadlocal.address -- in every function where the "constant" is referenced. For uses outside a function body, it transforms to an arbitrary address (e.g. null), instead.)

clang/lib/CodeGen/CGExpr.cpp
2609–2610 ↗(On Diff #428297)

This should be handled by using an overloaded intrinsic, so you get the entire family llvm.threadlocal.address.* with any pointer-type as the argument and the same type as the return value (that'll happen when you switch the intrinsic to use llvm_anyptr_ty).

llvm/include/llvm/IR/Intrinsics.td
1407

I believe this should be declared exactly like int_ptrmask right above.

ChuanqiXu marked an inline comment as done.Jun 6 2022, 1:02 AM

You can use the "Edit Related Revisions" option in the right-hand side menu of Phabricator to link this revision with the others of the series.

I forgot to reply this one. It is intended to not add related revisions. Since these revisions don't depend on each other from the perspective of codes. We could commit them in either order.

So, I've been spending some significant time looking into this. I found that I couldn't really review this change for correctness, because I find it basically impossible to figure out whether the intrinsic calls have actually been added to all the right places in Clang or not. (And my intuition said "not", but then couldn't tell me where it's wrong.)

Thanks for spending the time! To answer the question "Whether the intrinsic calls have actually been added to all the right places in Clang or not", I insert a verify pass in the beginning of the middle end pipeline to verify if all the uses of TLS variable is wrapped by the intrinsic. And my conclusion is NO. Not all the places is guarded by the intrinsic. Concretely, the dynamic initializer of TLS variable. See https://www.godbolt.org/z/a4bK8v67o. I am not sure if this is the RIGHT place you said.

So, I started hacking up a prototype of a change to make the type of a TLS global variable be token instead of ptr. This allows missing calls to manifest as IR construction errors.

So far the biggest missing piece that identified in this review is function-local thread_locals (although I haven't actually gotten something fully working). Sadly, the handling of function-local statics in Clang is really rather hairy, what with objc blocks and openmp support both doing things that seem rather ill-advised to me. I'm toying with some cleanups there, to see if it can be simplified a bit. I don't have a full idea, yet, what changes need to be made to this review.

Thanks for pointing this out! I also spent significant time to try to handle function-local thread locals in the last few days. And I have't found a solution yet. It is more complex indeed.

Anyhow -- I think the prototype I'm fiddling with is also along the path to the ideal long-term state, but pushing it beyond a prototype seems like it'll be a pain in the ass due to the bitcode compatibility requirement. (The bitcode upgrader would need to know how to transform all constant expressions using a TLS global into non-constant IR instructions, starting with a call to llvm.threadlocal.address -- in every function where the "constant" is referenced. For uses outside a function body, it transforms to an arbitrary address (e.g. null), instead.)

Oh, I never heard about the IR upgrader before. This is missed indeed.


Summarize things up. Here are 3 problems we found:
(1) The use in the dynamic initializer of TLS variable.
(2) Function local thread locals.
(3) IR Upgrader.

From my perspective, only the second problem (2) is the problem we must solve. The first problem looks fine to me in practice. And I am wondering if it is problem for the third problem. Since if I understand the problem correctly, it means that the newer compiler couldn't compile the IR from older compiler. And I think it wouldn't be that case after the patch landed. Do I misunderstand it?

ChuanqiXu retitled this revision from Introduce @llvm.threadlocal.address intrinsic to access TLS variable (1/3) to Introduce @llvm.threadlocal.address intrinsic to access TLS variable.Jun 9 2022, 1:28 AM
ChuanqiXu updated this revision to Diff 435461.Jun 9 2022, 2:23 AM

Handle function local thread locals.

ChuanqiXu updated this revision to Diff 435475.Jun 9 2022, 2:58 AM

Address inline comments.

ChuanqiXu marked 4 inline comments as done.Jun 9 2022, 3:05 AM

It looks like there are two problems now:
(1) The use of TLS variable in the dynamic initializer and the use of generated TLS variable (__tls_guard) doesn't get wrapped in @llvm.threadlocal_address() intrinsics. From my perspective, it is fine since the initializers should never be coroutines. (I meant to fix the coroutines bugs at the very start).
(2) The IR upgrader problem. It is fine to me too since we won't block the use of TLS variable directly after the patch landed (maybe we would in the longer future). So the higher version of LLVM will be able to compile the IR from older version after the patch landed. So it is not a problem to me. (It looks like the backward compatibility is not emphasized. This is the first time I saw the problem in the community)

clang/lib/CodeGen/CGExpr.cpp
2609–2610 ↗(On Diff #428297)

Yeah, it could be handled by an overloaded intrinsic. But given the process of opaque pointers goes well really, I feel like it is redundant to support non opaque pointer mode. It shouldn't affect users since opaque pointers is enabled by default as far as I know.

nikic added a subscriber: nikic.Jun 14 2022, 2:36 AM

Anyhow -- I think the prototype I'm fiddling with is also along the path to the ideal long-term state, but pushing it beyond a prototype seems like it'll be a pain in the ass due to the bitcode compatibility requirement. (The bitcode upgrader would need to know how to transform all constant expressions using a TLS global into non-constant IR instructions, starting with a call to llvm.threadlocal.address -- in every function where the "constant" is referenced. For uses outside a function body, it transforms to an arbitrary address (e.g. null), instead.)

I have implemented support for converting constant expressions to instructions in the bitcode reader in https://reviews.llvm.org/D127729. This was originally intended for constant expression removal, but I think with that infrastructure in place, upgrading TLS global references to use an intrinsic would be fairly simple.

Anyhow -- I think the prototype I'm fiddling with is also along the path to the ideal long-term state, but pushing it beyond a prototype seems like it'll be a pain in the ass due to the bitcode compatibility requirement. (The bitcode upgrader would need to know how to transform all constant expressions using a TLS global into non-constant IR instructions, starting with a call to llvm.threadlocal.address -- in every function where the "constant" is referenced. For uses outside a function body, it transforms to an arbitrary address (e.g. null), instead.)

I have implemented support for converting constant expressions to instructions in the bitcode reader in https://reviews.llvm.org/D127729. This was originally intended for constant expression removal, but I think with that infrastructure in place, upgrading TLS global references to use an intrinsic would be fairly simple.

Oh, it looks great! Especially in getValueForInitializer, we could solve the motivation problem given by @jyknight :

@x = thread_local global i32 0, align 4
@y = global i32* @x, align 8

Now we are possible to emit proper diagnostic message. The only issue is that the current framework of D127729 would map a constexpr into an instruction. But we might need to map a constexpr with TLS variables to multiple instructions (containing @llvm.threadaddress() intrinsics). But I believe this problem might be minor.

@jyknight How do you think about the status quo now?

@jyknight How do you think about the status now? I want to fix the thread local problem for coroutines in clang15 since the problem have been found for years...

@jyknight ping!

Or someone else is willing to review this one?

Ping @jyknight

Or anyone else will review this one? I want us to fix the thread problems in clang15, which would be created on July 26th.

@rjmccall @nhaehnle @danilaml @efriedma

nikic added a comment.Jul 5 2022, 12:17 AM

FWIW the bitcode patch has landed, so implementing the variant using a token type should be possible now. I'm not sure whether it's better to start with the current patch where the intrinsic is optional, or go directly to the one where it is required.

llvm/docs/LangRef.rst
24415

Don't we need to overload this intrinsic by return type, so it works with different address spaces?

24420

Should we enforce here (with a verifier check) that the argument is a thread-local global variable? I assume we don't want to allow weird things like @llvm.threadlocal.address(c ? @g1 : @g2) right? (Though I guess, without thread-local globals using a token type, nothing would prevent optimizations from forming this.)

llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
231 ↗(On Diff #435475)

I don't think this belongs here, this should get dropped by SelectionDAGBuilder.

llvm/test/Transforms/PreISelIntrinsicLowering/threadlocal_address.ll
1 ↗(On Diff #435475)

-opaque-pointers flag is not necessary.

ChuanqiXu updated this revision to Diff 442469.Jul 6 2022, 1:55 AM

Address comments:

  • Overload the intrinsic to make it to work with different address spaces.
  • Remove the limits in opaque pointers so we get many more tests involved.
  • Remove the intrinsics in SelectionDAGBuilder.
ChuanqiXu marked 2 inline comments as done.

FWIW the bitcode patch has landed, so implementing the variant using a token type should be possible now. I'm not sure whether it's better to start with the current patch where the intrinsic is optional, or go directly to the one where it is required.

I preferred to start with the current patch. Since my original idea is to fix the thread problems in coroutines and @jyknight said we could fix the two problems in one approach. But I still want to fix the thread problems in coroutines first.


A big problem I meet now is about the constant expression would merge into the argument automatically so the verifier would fail. Here is the example:

C++
union U {
  int a;
  float f;
  constexpr U() : f(0.0) {}
};
static thread_local U u;
void *p = &u;

Then it would generate following code for u:

define internal void @__cxx_global_var_init() #0 section ".text.startup" {
entry:
  %0 = call %union.U* @llvm.threadlocal.address.p0s_union.Us(%union.U* bitcast ({ float }* @_ZL1u to %union.U*))
  %1 = bitcast %union.U* %0 to i8*
  store i8* %1, i8** @p, align 8
  ret void
}

Then the verifier would complain since the argument of @llvm.threadlocal.address is not theadlocal global value. And I feel this might not be a simple problem to fix. I feel like we could only fix it after we made https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179. But it should be a long term goal to me.


So now it looks like there are two options:

  1. Remove the verifier part.
  2. Wait until we removed most constant expressions. In this case, I could only to pick up original methods to solve the thread problems in coroutines.

@jyknight @nikic @nhaehnle @efriedma @rjmccall How do you think about this?

llvm/docs/LangRef.rst
24415

Done. So we could remove the limit for opaque pointers too. So now it covers more tests.

24420

I originally added assertions when creating the intrinsics. But I add a check in the verifier in this revision.

ChuanqiXu marked 3 inline comments as done.Jul 6 2022, 2:11 AM

I can't judge the Clang changes.

On the LLVM side, can you also add the implementation and a test for the GlobalISel path? Plus, I have some minor inline comments.

llvm/docs/LangRef.rst
24563

The first argument is a pointer

24568–24570

LangRef should focus on how IR is defined, not the historical evolution. I think everything that needs to be said here is actually already said in the Semantics section.

ychen added inline comments.Jul 14 2022, 5:43 PM
llvm/docs/LangRef.rst
24558

Like @jyknight already did, any idea what it would take to change the parameter type from pointer to token? Looking at the test changes, it looks like OpenMP is impacted a lot. I suggest splitting this patch into Clang and LLVM parts to gain insights from the frontend experts.

llvm/lib/IR/IRBuilder.cpp
531

Here and below llvm::outs() are for debugging and should be removed?

ChuanqiXu marked 3 inline comments as done.

Address comments and remove verifier due to the automatic merging for constant expression I mentioned above.

ChuanqiXu marked an inline comment as done.Jul 14 2022, 10:44 PM

I can't judge the Clang changes.

Now the clang part is moved to D129833.

On the LLVM side, can you also add the implementation and a test for the GlobalISel path? Plus, I have some minor inline comments.

I don't understand it a lot since I lack experience in the backend. I've implemented it in SelectionDAG. And it looks redundant to implement it again in GlobalISel. May you explain it?

llvm/docs/LangRef.rst
24558

The change for OpenMP is relatively large since they used utils/update_cc_test_checks.py to update their tests automatically.

I've split the clang part into https://reviews.llvm.org/D129833

nikic accepted this revision.Jul 29 2022, 6:10 AM

LG. Now that LLVM 15 has branched, I think it is safe to land this, and there will be enough time before LLVM 16 to convert it to the token variant.

llvm/docs/LangRef.rst
24546

I'd replace variable -> global here.

This revision is now accepted and ready to land.Jul 29 2022, 6:10 AM
ChuanqiXu updated this revision to Diff 448904.Jul 31 2022, 7:50 PM
ChuanqiXu marked an inline comment as done.

Address comments.

LG. Now that LLVM 15 has branched, I think it is safe to land this, and there will be enough time before LLVM 16 to convert it to the token variant.

Yeah, thanks for reviewing!

This revision was landed with ongoing or failed builds.Jul 31 2022, 7:53 PM
This revision was automatically updated to reflect the committed changes.
xbolva00 added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1408

Returns nonnull?

nikic added a comment.Dec 15 2022, 6:08 AM

So, I've been spending some significant time looking into this. I found that I couldn't really review this change for correctness, because I find it basically impossible to figure out whether the intrinsic calls have actually been added to all the right places in Clang or not. (And my intuition said "not", but then couldn't tell me where it's wrong.)

So, I started hacking up a prototype of a change to make the type of a TLS global variable be token instead of ptr. This allows missing calls to manifest as IR construction errors.

So far the biggest missing piece that identified in this review is function-local thread_locals (although I haven't actually gotten something fully working). Sadly, the handling of function-local statics in Clang is really rather hairy, what with objc blocks and openmp support both doing things that seem rather ill-advised to me. I'm toying with some cleanups there, to see if it can be simplified a bit. I don't have a full idea, yet, what changes need to be made to this review.

Anyhow -- I think the prototype I'm fiddling with is also along the path to the ideal long-term state, but pushing it beyond a prototype seems like it'll be a pain in the ass due to the bitcode compatibility requirement. (The bitcode upgrader would need to know how to transform all constant expressions using a TLS global into non-constant IR instructions, starting with a call to llvm.threadlocal.address -- in every function where the "constant" is referenced. For uses outside a function body, it transforms to an arbitrary address (e.g. null), instead.)

Do you plan to pursue this? We now have the necessary infrastructure to make the bitcode upgrade simple, and I think it would be great to include this change in LLVM 16, so we directly introduce llvm.threadlocal.address in its final form.