
[RFC] Fix TLS and Coroutine
Changes Planned · Public

Authored by lxfind on Dec 4 2020, 8:47 AM.



This patch is to address
A relevant discussion regarding pthread_self and TLS can be found here:

A coroutine may suspend and then resume on a different thread, so the address of a thread_local variable may change across a suspension point.
In the existing design, the address of a TLS variable is obtained through a direct reference such as @tls_variable. Such a value is a constant, so it can be freely moved around or replaced anywhere within the same function. In a coroutine, this leads to the TLS address being incorrectly cached across suspension points.
To fix this, we turn the TLS address access into an intrinsic call, so that it cannot simply be CSE-ed.
After CoroSplit there are no coroutines left, so the TLS intrinsics can safely be lowered back into direct references.
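To make the hazard concrete, here is a minimal C++ sketch. It assumes that a coroutine resuming on another thread can be modeled by running the post-suspension code on a freshly spawned std::thread; `SplitObservation` and `simulate_resume_on_other_thread` are hypothetical names, not part of the patch. The address of the thread_local taken before the simulated suspension is not the address the resuming thread sees, which is exactly what a cached @tls_variable would get wrong.

```cpp
#include <cassert>
#include <thread>

// Every thread gets its own instance of a thread_local variable, at its own address.
thread_local int tls_variable = 0;

// What the code on either side of a simulated suspension point observes.
struct SplitObservation {
  const int* cached_before_suspend;  // &tls_variable taken before "suspending"
  bool same_address_after_resume;    // does &tls_variable after "resuming" equal the cached one?
};

// Hypothetical model of a coroutine that suspends on the calling thread and resumes
// on another one: the "before" half runs here, the "after" half on a new std::thread.
SplitObservation simulate_resume_on_other_thread() {
  SplitObservation obs{&tls_variable, true};
  std::thread resumer([&obs] {
    // Compared while both threads' copies are alive: the resuming thread has its own copy.
    obs.same_address_after_resume = (&tls_variable == obs.cached_before_suspend);
  });
  resumer.join();
  // Back on the calling thread, the cached address is still that thread's copy.
  assert(obs.cached_before_suspend == &tls_variable);
  return obs;
}
```

Calling `simulate_resume_on_other_thread()` reports `same_address_after_resume == false`: code that kept using `cached_before_suspend` after resuming would read and write the original thread's copy instead of the current thread's.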

The current placement of LowerThreadLocalIntrinsicPass may not be ideal, and I am not sure how best to organize it. Suggestions welcome!
Test coverage is not yet sufficient, and some tests may still fail. I will add or fix tests if this patch is heading in the right direction.

Diff Detail

Unit Tests: Failed

50 ms · x64 windows > LLVM.CodeGen/XCore::threads.ll
Script: -- : 'RUN: at line 1'; c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\llc.exe -march=xcore < C:\ws\w16c2-1\llvm-project\premerge-checks\llvm\test\CodeGen\XCore\threads.ll | c:\ws\w16c2-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w16c2-1\llvm-project\premerge-checks\llvm\test\CodeGen\XCore\threads.ll

Event Timeline

lxfind created this revision. Dec 4 2020, 8:47 AM
lxfind requested review of this revision. Dec 4 2020, 8:47 AM
Herald added projects: Restricted Project, Restricted Project.
hoy added inline comments. Dec 4 2020, 9:28 AM

With the intrinsic, can references to a TLS variable within the same coroutine or regular function still be DCE-ed?

hoy added inline comments. Dec 4 2020, 9:31 AM

Sorry, I meant CSE-ed.

lxfind added inline comments. Dec 4 2020, 10:08 AM

Since the intrinsic does not have the readnone attribute, it won't be CSE-ed before CoroSplit.
However, after CoroSplit it is lowered back to a direct reference to the TLS variable, and later passes can CSE it as usual.
I can add a test function to demonstrate that too.
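The interaction between readnone and CSE can be sketched with a toy value-numbering pass. This is a simplified C++ model, not LLVM's EarlyCSE/GVN and not the patch's actual LowerThreadLocalIntrinsicPass; the intrinsic name `tls.address` and all struct/function names are hypothetical. Calls that are not marked readnone are never merged, and a separate lowering step (standing in for what runs after CoroSplit) replaces each intrinsic call with the global it names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy instruction `result = op(args...)`. `readnone` mirrors LLVM's attribute:
// only instructions without memory effects are candidates for merging.
struct Inst {
  std::string result;
  std::string op;
  std::vector<std::string> args;
  bool readnone;
};

// Hypothetical name for the TLS-address intrinsic in this toy model.
const std::string kTlsIntrinsic = "tls.address";

// Rewrites `args` in place according to `rename` (dropped result -> surviving value).
void apply_renames(std::vector<std::string>& args,
                   const std::map<std::string, std::string>& rename) {
  for (std::string& a : args) {
    auto it = rename.find(a);
    if (it != rename.end()) a = it->second;
  }
}

// Local CSE by value numbering: a readnone instruction whose (op, args) pair was
// already seen is dropped, and its uses are renamed to the earlier result.
std::vector<Inst> cse(const std::vector<Inst>& body) {
  std::map<std::pair<std::string, std::vector<std::string>>, std::string> seen;
  std::map<std::string, std::string> rename;
  std::vector<Inst> out;
  for (Inst inst : body) {
    apply_renames(inst.args, rename);
    if (inst.readnone) {
      auto [it, inserted] = seen.emplace(std::make_pair(inst.op, inst.args), inst.result);
      if (!inserted) {
        rename[inst.result] = it->second;
        continue;
      }
    }
    out.push_back(inst);
  }
  return out;
}

// Stand-in for the post-CoroSplit lowering: each intrinsic call is removed and
// its uses are rewritten to the global it names (the direct reference).
std::vector<Inst> lower_tls_intrinsics(const std::vector<Inst>& body) {
  std::map<std::string, std::string> rename;
  std::vector<Inst> out;
  for (Inst inst : body) {
    apply_renames(inst.args, rename);
    if (inst.op == kTlsIntrinsic) {
      assert(!inst.args.empty() && "the intrinsic takes the global as its operand");
      rename[inst.result] = inst.args[0];
      continue;
    }
    out.push_back(inst);
  }
  return out;
}
```

With `readnone` left false, `cse` keeps both intrinsic calls; marking them readnone would let the second call be merged into the first, which is precisely the caching that must not happen before CoroSplit. After `lower_tls_intrinsics`, both compare operands become `@tls_variable`.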

hoy added inline comments. Dec 4 2020, 1:38 PM

Sounds good. Can you point out which optimization passes CSE-ed TLS references without this patch? I'm wondering whether those optimizations could be postponed until after CoroSplit.

lxfind added inline comments. Dec 4 2020, 3:01 PM

To clarify, it isn't just CSE that merges references to the same TLS variable.
For instance, without this patch, a reference to tls_variable is simply the constant @tls_variable. Consider code like this:

@tls_variable = internal thread_local global i32 0, align 4

define i32* @foo() {
  ret i32* @tls_variable
}

define void @bar() {
  %tls1 = call i32* @foo()
  %tls2 = call i32* @foo()
  %cond = icmp eq i32* %tls1, %tls2
  ret void
}

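When foo() is inlined into bar(), all uses of %tls1 and %tls2 are replaced with @tls_variable, so %cond folds to true without any CSE pass being involved. If bar() were a coroutine with a suspension point between the two calls, that fold would be wrong, because the coroutine may resume on a different thread.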
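The effect of inlining alone can be illustrated with another toy C++ model. `Instr`, `inline_trivial_callee` and `folds_to_true` are hypothetical names, and the opcode strings only mimic the IR above; this is not how LLVM's inliner or InstSimplify are implemented. Inlining @foo replaces %tls1 and %tls2 with @tls_variable, after which an InstSimplify-style rule folds `icmp eq X, X` to true.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction `result = op lhs, rhs`, where op is either "call" or "icmp.eq".
struct Instr {
  std::string result;
  std::string op;
  std::string lhs;  // callee for "call", first operand for "icmp.eq"
  std::string rhs;  // second operand for "icmp.eq", empty for "call"
};

// Inlines every call to `callee`, whose body is assumed to be just `ret returned`:
// the call disappears and each later use of its result becomes `returned`.
std::vector<Instr> inline_trivial_callee(const std::vector<Instr>& body,
                                         const std::string& callee,
                                         const std::string& returned) {
  std::vector<std::string> inlined_results;
  std::vector<Instr> out;
  for (Instr inst : body) {
    if (inst.op == "call" && inst.lhs == callee) {
      inlined_results.push_back(inst.result);
      continue;
    }
    for (const std::string& r : inlined_results) {
      if (inst.lhs == r) inst.lhs = returned;
      if (inst.rhs == r) inst.rhs = returned;
    }
    out.push_back(inst);
  }
  return out;
}

// InstSimplify-style rule: `icmp eq X, X` is always true.
bool folds_to_true(const Instr& cmp) {
  assert(cmp.op == "icmp.eq" && "only compares are handled here");
  return cmp.lhs == cmp.rhs;
}
```

Before inlining, the compare in @bar has distinct operands (%tls1, %tls2) and does not fold; after `inline_trivial_callee(bar, "@foo", "@tls_variable")` both operands are `@tls_variable` and `folds_to_true` returns true.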

hoy added inline comments. Dec 7 2020, 11:16 PM

Thanks for the explanation. I have a dumb question: why isn't CoroSplit placed at the very beginning of the pipeline?

lxfind added inline comments. Dec 8 2020, 8:27 AM

The coroutine frame size is determined during CoroSplit. If CoroSplit runs too early, before any optimizations, the frame will always be very large and there is no chance to shrink it.
This is indeed a fundamental trade-off: if CoroSplit ran much earlier, it would be immune to this kind of problem.
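As a rough illustration of why the frame size depends on earlier optimizations, here is a simplified C++ model assuming straight-line code: a value is spilled to the frame when a suspension point lies between its definition and one of its uses. Real CoroSplit frame construction also handles control flow, allocas, alignment and rematerialization, all ignored here; `Value`, `live_across_suspend` and `frame_size` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A value in straight-line code: where it is defined, where it is used, and its size.
struct Value {
  std::size_t def;                // position of the defining instruction
  std::vector<std::size_t> uses;  // positions of its uses
  std::size_t size;               // bytes it would occupy in the coroutine frame
};

// True if some suspension point lies strictly between the definition and a use,
// i.e. the value must survive the suspension and therefore lives in the frame.
bool live_across_suspend(const Value& v, const std::vector<std::size_t>& suspends) {
  for (std::size_t use : v.uses) {
    assert(use > v.def && "in straight-line code a use follows its definition");
    for (std::size_t s : suspends)
      if (v.def < s && s < use) return true;
  }
  return false;
}

// Modeled frame size: sum of the sizes of all values live across a suspension.
std::size_t frame_size(const std::vector<Value>& values,
                       const std::vector<std::size_t>& suspends) {
  std::size_t total = 0;
  for (const Value& v : values)
    if (live_across_suspend(v, suspends)) total += v.size;
  return total;
}
```

For example, with a single suspension point at position 5 and values {def 1, use 8, 8 bytes}, {def 2, use 9, 8 bytes}, {def 3, use 4, 4 bytes}, the modeled frame is 16 bytes; if an optimization sinks the second definition to position 6, past the suspension point, it no longer needs a frame slot and the frame drops to 8 bytes. Running CoroSplit before such optimizations forfeits these savings.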

nhaehnle added inline comments.

Unrelated change

lxfind updated this revision to Diff 310575. Dec 9 2020, 10:18 AM

Fix all failing tests

lxfind planned changes to this revision. Feb 18 2021, 10:36 AM