This is an archive of the discontinued LLVM Phabricator instance.

[1/3] TLS loads opimization (hoist)
ClosedPublic

Authored by xiangzhangllvm on Feb 16 2022, 7:19 PM.

Details

Summary

When we access a TLS variable in PIC mode, it usually get the TLS address by calling a lib function, some like
callq __tls_get_addr@PLT
This call was not show in IR or MIR, usually tag by a target-special flag (like pic) and generated in Assembly Printing.
So it is usually call it every time when TLS variable is accessed. Many of them are duplicated, especially in loops.

This patch is try to optimize it. It identifies/eliminate Redundant TLS address call by hoist the TLS access when the related option is set.

For example:

static __thread int x;
int g();
int f(int c) {
  int *px = &x;
  while (c--)
    *px += g();
  return *px;
}

will generated Redundant TLS Loads by compiling it with
Clang++ -fPIC -ftls-model=global-dynamic -O2 -S

.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        callq   _Z1gv@PLT
        movl    %eax, %ebp
        leaq    _ZL1x@TLSLD(%rip), %rdi
        callq   __tls_get_addr@PLT
        addl    _ZL1x@DTPOFF(%rax), %ebp
        movl    %ebp, _ZL1x@DTPOFF(%rax)
        addl    $-1, %ebx
        jne     .LBB0_2
        jmp     .LBB0_3
.LBB0_4:                                # %entry.while.end_crit_edge
        leaq    _ZL1x@TLSLD(%rip), %rdi
        callq   __tls_get_addr@PLT
        movl    _ZL1x@DTPOFF(%rax), %ebp

The Redundant TLS Loads will hurt the performance, especially in loops.
So we try to eliminate/move them if required by customers, let it be:

# %bb.0:                                # %entry
         ...
         movl    %edi, %ebx
         leaq    _ZL1x@TLSLD(%rip), %rdi
         callq   __tls_get_addr@PLT
         leaq    _ZL1x@DTPOFF(%rax), %r14
         testl   %ebx, %ebx
         je      .LBB0_1
 .LBB0_2:                                # %while.body
                                         # =>This Inner Loop Header: Depth=1
         callq   _Z1gv@PLT
         addl    (%r14), %eax
         movl    %eax, (%r14)
         addl    $-1, %ebx
         jne     .LBB0_2
         jmp     .LBB0_3

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
xiangzhangllvm added inline comments.Feb 21 2022, 1:23 AM
llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
294–296

This pass is very later at Scalar optmization. I think we no need to preserve all. Let me carefully re-consider here, thanks!

llvm/test/CodeGen/X86/tls-loads-control.ll
1

The test contain "!3 = !{i32 1, !"tls-load-control", !"Optimize"}" , it has same functonality with "--tls-load-control=Optimize"
But test both of them here is not bad, I think.

11–13

This test is directly generate from a clang test I'll commit latter.
So let it be the raw result of source file out put (I comment at line 8) is good to check the source of it.
So I didn't manually change it more.

First let me quickly update some parts of it. Thanks a lot!

LuoYuanke added inline comments.Feb 21 2022, 5:13 AM
llvm/include/llvm/Transforms/Scalar/TLSVariableControl.h
75

I curious why not include the header file BasicBlock.h?

95

uses -> users?

llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
47

"Eleminate remove" -> Eliminate?

136

Drop brace.

xiangzhangllvm added inline comments.Feb 21 2022, 4:32 PM
llvm/include/llvm/Transforms/Scalar/TLSVariableControl.h
75

We prefer to include the *.h at x.cpp , this make x.h is simple. This style can easy find example in other files, for example "Scalar/GVNSink.cpp"

95

Make sense, The Use self is no problem, it means where is it used. But I name the "struct TLSUser" for its element, let me change it, thanks!

xiangzhangllvm added inline comments.Feb 21 2022, 4:35 PM
llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
136

I remember we should not remove "{}" for "else" if "if" has "{}"

Refine the code according to reviews, thanks a lot!

xiangzhangllvm marked 20 inline comments as done.Feb 21 2022, 6:47 PM
xiangzhangllvm added inline comments.
llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
115

The pass will change the code "move GV ahead", so let's just preserve CFG here.

294–296

The pass may change the code "move GV ahead", so let's just preserve CFG here.

Refine name "control" --> "hoist"

Fix tests (MLIR.Examples/standalone::test.toy and libFuzzer.libFuzzer::large.test looks no relation with this patch)

pengfei added inline comments.Feb 21 2022, 11:34 PM
llvm/include/llvm/LinkAllPasses.h
180

Keep the same format looks better. Up to you.

llvm/include/llvm/Transforms/Scalar.h
431

What's "prepares a function" mean?

llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
1 ↗(On Diff #410450)

Still less than 80.

21 ↗(On Diff #410450)

clang

llvm/test/CodeGen/AArch64/O3-pipeline.ll
69 ↗(On Diff #410450)

Maybe preserve loop infor too. The pass does nothing with it. This may help with the following passes not run it again.

llvm/test/CodeGen/X86/tls-loads-control.ll
177

There meta data is annoying, especially the OneAPI info here doesn't make sense to llvm. The same below.

llvm/test/CodeGen/X86/tls-loads-control3.ll
161

Add nounwind in attributes to avoid cfi directives.

Address Phoebe's reviews, thanks a lot!

xiangzhangllvm marked 7 inline comments as done.Feb 22 2022, 1:49 AM
xiangzhangllvm added inline comments.
llvm/include/llvm/LinkAllPasses.h
180

Yes, this is follow clang-format, let it be.

llvm/test/CodeGen/AArch64/O3-pipeline.ll
69 ↗(On Diff #410450)

This pass changed the instructions, so BB info should better be updated.

llvm/test/CodeGen/X86/tls-loads-control.ll
177

Done, but I think keep more info is not bad for test, maybe it is personal habits, I usually prefer to sync the *.ll with the *.c/cpp

llvm/test/CodeGen/X86/tls-loads-control3.ll
161

Add nounwind to #0, but seems cfi still here, anyway, that is not important, I think.

pengfei added inline comments.Feb 22 2022, 3:17 AM
llvm/test/CodeGen/X86/tls-loads-control3.ll
41–43

data16 and rex64 don't seem correct instructions.

161

Did you re-generate the tests? It should work. lit test will fail if you didn't update the tests.

xiangzhangllvm marked 3 inline comments as done.Feb 22 2022, 4:57 PM
xiangzhangllvm added inline comments.
llvm/test/CodeGen/X86/tls-loads-control3.ll
41–43

Yes, there are prefix not real instruction, I can see them in X86AsmParser.cpp , not clear what they are used for ? Seems no relation with this patch.

161

Yes, re-generate it with update_llc_test_checks.py
let me show my local status:

[xiangzh1@..]$./llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/tls-loads-control3.ll
[xiangzh1@..]$git diff
[xiangzh1@..]$llvm-lit llvm/test/CodeGen/X86/tls-loads-control3.ll
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/X86/tls-loads-control3.ll (1 of 1)

Testing Time: 0.21s
  Passed: 1
xiangzhangllvm added inline comments.Feb 22 2022, 5:09 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
30 ↗(On Diff #410479)

Don't know why clang-format will move it ahead, seems clang-format's bug.
Let me re-clang-format for other places.

Do clang format

pengfei accepted this revision.Feb 22 2022, 7:41 PM

LGTM. Please wait some days for other reviewers' opinions.

This revision is now accepted and ready to land.Feb 22 2022, 7:41 PM
craig.topper added inline comments.Feb 22 2022, 8:29 PM
llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
64 ↗(On Diff #410682)

Is SetVector used by this file?

68 ↗(On Diff #410682)

are any algorithms used in this file?

70 ↗(On Diff #410682)

Is vector used by this file?

122 ↗(On Diff #410682)

Varibles -> Variables.

Can we use llvm::MapVector to avoid a separate SmallVector?

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
10 ↗(On Diff #410682)

exmaple -> example

10 ↗(On Diff #410682)

PLS -> Please?

131 ↗(On Diff #410682)

The GV field in TLSCandidate isn't assigned is it?

136 ↗(On Diff #410682)

TLSCandMap[GV].addUser(Inst, Idx); works even if GV isn't in the map. The entry will be default constructed before addUser is called.

145 ↗(On Diff #410682)

can we use llvm::any_of here?

186 ↗(On Diff #410682)

Is this before the allocas?

218 ↗(On Diff #410682)

Replaced |= tryReplaceTLSCandidate(Fn, GV);

226 ↗(On Diff #410682)

Should we call skipFunction() for opt-bisect-limit and optnone support?

228 ↗(On Diff #410682)

Do we normally use capitalized strings?

craig.topper added inline comments.Feb 22 2022, 8:34 PM
llvm/include/llvm/IR/Module.h
914

Why a module flag? What is the policy for LTO merging?

xiangzhangllvm added inline comments.Feb 22 2022, 9:52 PM
llvm/include/llvm/IR/Module.h
914

Because the TLS is Global Value which has Scope for module.
I am not clear about the LTO merging, I think one module's flag should not "spread" to another module.

llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
122 ↗(On Diff #410682)

Let me check the MapVector, rarely use it before : ) thanks a lot!

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
131 ↗(On Diff #410682)

Yes, I should remove the "GV" field from TLSCandidate.

136 ↗(On Diff #410682)

Yes, So here the "else" means TLSCandMap.count(GV) != 0 (GV is in the map)

145 ↗(On Diff #410682)

Yes, I checked it, that is good, I never use it before : )

186 ↗(On Diff #410682)

Sorry, don't much understand,
What the problem if it before allocas ?
This is in IR level and the Global Value do not need "allocas"

218 ↗(On Diff #410682)

I think "|=" is a bit operation. Here is bool, the stardard way is "||"

226 ↗(On Diff #410682)

1 Yes, we should consider skipFunction, good catch!
2 optnone will not created this pass at TargetPassConfig.cpp

228 ↗(On Diff #410682)

I checked the other similar uses, we normally not use capitalized strings, let me change it, thanks a lot!

craig.topper added inline comments.Feb 22 2022, 11:22 PM
llvm/include/llvm/IR/Module.h
914

The first thing LTO does is merges all modules into a single module. Then the optimization pipeline runs on that unified module.

If we made it a function attribute would anything break? Could two different functions have a different hoisting policy? That would avoid the LTO issue.

llvm/lib/IR/Module.cpp
738

I believe the ModFlagBehavior::Error here controls what happens in LTO. It will fail to merge if the modules have different flags.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
186 ↗(On Diff #410682)

The alloca instructions for the function's local variables are the first instructons in the entry basic block. Not if we should be putting the bitcast before them.

218 ↗(On Diff #410682)

MadeChange |= is a frequent pattern used by passes in llvm.

226 ↗(On Diff #410682)

There is an optnone function attribute which is different than the global optlevel.

llvm/include/llvm/IR/Module.h
914

OK, let me move it into function attribute. Thanks a lot!

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
186 ↗(On Diff #410682)

OK, Let me re-place the bitcast position.

226 ↗(On Diff #410682)

OK, let me check it again, but for skipFunction, I can't call it here, because the TLSVariableHoistPass not inherit FunctionPass,
but I already check it at TLSVariableHoistLegacyPass::runOnFunction.

LuoYuanke added inline comments.Feb 22 2022, 11:52 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
186 ↗(On Diff #410682)

Would you add a test case with alloca for checking?

xiangzhangllvm added inline comments.Feb 23 2022, 1:42 AM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
186 ↗(On Diff #410682)

Yes, no problem

Address Craig's reviews

xiangzhangllvm marked 13 inline comments as done.Feb 23 2022, 2:57 AM
xiangzhangllvm added inline comments.
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
186 ↗(On Diff #410682)

Let me refine here a little later, I am considering get the insert position by checking dominate relation to reduce life range.

craig.topper added inline comments.Feb 23 2022, 12:59 PM
llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
68 ↗(On Diff #410758)

You're not using std::map in this file

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
126 ↗(On Diff #410758)

Merge into the previous if with ||

135 ↗(On Diff #410758)

This line should work for the case when GV isn't already in the map. The operator[] on the MapVector will default construct a TLS candidate before calling addUser. So you don't need to check TLSCandMap.count

craig.topper added inline comments.Feb 23 2022, 12:59 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
223 ↗(On Diff #410758)

This might be handled by the pass manager for the new pass manager. For the old pass manager it is part of skipFunction.

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
45–52 ↗(On Diff #410758)

This should be an enum, not strings

xiangzhangllvm added a comment.EditedFeb 23 2022, 5:00 PM

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

This only try hoist TLS address call, the TLS variable will only used in thread function, so it is always legal to do this transformation in a function.
Sorry, what you mean "opt-in" ?

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
45–52 ↗(On Diff #410758)

yes, we usually use enum here, but I find string is better/easier, because we don't need to define the enum (sometimes may define 2 or more times if we want to use it in clang, for example transfer a clang option to "-mllvm -tls-load-hoist=xxx")

135 ↗(On Diff #410758)

Make sense!

223 ↗(On Diff #410758)

Yes, Both pass manager will call runImpl, it called by "TLSVariableHoistPass::run" and ."TLSVariableHoistLegacyPass::runOnFunction" , (I forget which is new and which is old)

1 Address Craig's reviews.
2 Re-write the function findInsertPos, use dominate relation to choose the insert position, this will help generate shorter life ranges.

pengfei added inline comments.Feb 24 2022, 4:39 AM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
168 ↗(On Diff #411048)

You need run clang-format for your new code.

llvm/test/CodeGen/X86/tls.ll
1–8 ↗(On Diff #411048)

Why remove the existing test?

512–513 ↗(On Diff #411048)

The existing tests also prefer simple meta data. So remove them in the new test.

Clang-format and recover tls.ll test. Thanks a lot!

xiangzhangllvm marked an inline comment as done.Feb 24 2022, 7:02 PM

Hi Craig, If you feel no problem, I'll submit it in these days. And go on the 2nd patch for clang. Thanks!

(The biggest change from your last review is that I re-write the function findInsertPos, use dominate relation to choose the insert position.)

I just noticed this patch from Phabricator "Recent Activity" and did not look into it closely. How does it interact with createCleanupLocalDynamicTLSPass?
Do we need multiple passes to deal with __tls_get_addr calls?

Do you have a plan to implement clang -mtls-dialect=gnu2? I have had such a plan for a long time but always have more prioritized work to do...
ld.lld has great TLSDESC support since D116900. If we can add support and let GCC/Clang default to TLS descriptors, there is just no need adding more code dealing with these traditional (legacy) TLS models.

Do you have a plan to implement clang -mtls-dialect=gnu2? I have had such a plan for a long time but always have more prioritized work to do...
ld.lld has great TLSDESC support since D116900. If we can add support and let GCC/Clang default to TLS descriptors, there is just no need adding more code dealing with these traditional (legacy) TLS models.

PLS let me take a look on createCleanupLocalDynamicTLSPass, I never read it before.
Could you give me more information about implement clang -mtls-dialect=gnu2? I am happy to do that, PLS let me try. (If my manager not transfer me urgent jobs)

xiangzhangllvm added a comment.EditedFeb 27 2022, 6:52 PM

Clear now. The createCleanupLocalDynamicTLSPass is integrated in ISel , it is mostly a local/peephole optimization for variables with same TLS base address.
It can't analysis the TLS variables in different BBs or Loops. PLS refer https://godbolt.org/z/f7WrqK5he
But it is a good supplement for current patch. because it can optimize different TLS variables in local. (Current patch is focus on same TLS variable which used in different BBs or Loops)

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

This only try hoist TLS address call, the TLS variable will only used in thread function, so it is always legal to do this transformation in a function.

What about profitability, is it always a win?

Sorry, what you mean "opt-in" ?

Why is it not enabled by default?

What about profitability, is it always a win?

It remove duplicated TLS address call, In theory, it has big profitability to win.
But similar with most optimization, I can't say it must win in any case.

Sorry, what you mean "opt-in" ?

Why is it not enabled by default?

This is a general optimization which will affect all targets test, I'd like to enable it by another independent patch with mainly modify tests.

Let me commit it and go on the 2nd patch. Thanks for all your reviews!

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 4:58 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 6:37 PM
This revision was automatically updated to reflect the committed changes.

Did anyone sign off on the new insertion position code? It looks like the only approval is from before you added that. This should have been moved back to needs review.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
190 ↗(On Diff #412302)

Why not pass the Loop* you already looked up in the caller. Why get the loop again?

194 ↗(On Diff #412302)

outmost -> outermost

201 ↗(On Diff #412302)

I'm not sure that's true. Control flow in IR is explicit, there is no fallthrough. If it's a predecessor it must have a branch instruction of some kind.

craig.topper added inline comments.Mar 1 2022, 7:07 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
156 ↗(On Diff #412302)

method names should be lowercase

198 ↗(On Diff #412302)

Predecessor and PreHeader aren't quite the same thing. A PreHeader also has the loop as it's only successor.

No problem, let me revert and refine. Thanks!

xiangzhangllvm reopened this revision.Mar 1 2022, 10:11 PM
This revision is now accepted and ready to land.Mar 1 2022, 10:11 PM
xiangzhangllvm added inline comments.Mar 1 2022, 10:39 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
198 ↗(On Diff #412302)

Sorry, not much clear about it.
My understand is the PreHeader here not in loop (currently it is the outermost loop), so if here has PreHeader for the loop, we can directly insert the bitcast instruction in PreHeader.
Because the PreHeader must dominate the Loop (It is the only way to go to Loop).

PreHeader
    |
    V
   Loop
craig.topper added inline comments.Mar 1 2022, 11:29 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
198 ↗(On Diff #412302)

There is a function called getLoopPredecessor and another called getLoopPreheader. You called the former, but named the variable like you called the latter.

xiangzhangllvm added inline comments.Mar 1 2022, 11:39 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
198 ↗(On Diff #412302)

Oh! That's really a big mistake! Thanks so much!

nlopes added a subscriber: nlopes.Mar 2 2022, 1:07 AM

I have some concerns with this patch:

  • Why is it adding a new function attribute (tls-load-hoist) that is not documented in LangRef?
  • Why can't this transformation be handled with existing optimizations? Adding e.g. the readonly & speculatable attributes should allow GVN & friends to remove duplicates and hoist these out of loops. If not, then maybe existing optimizations should be improved so others benefit as well.

In summary, do we really need this new pass or is this just a hack?

I have some concerns with this patch:

  • Why is it adding a new function attribute (tls-load-hoist) that is not documented in LangRef?

This is only [1/2] patch, the 2nd will add clang option for it, it will enable this optimization by generating function attribute (tls-load-hoist)

  • Why can't this transformation be handled with existing optimizations? Adding e.g. the readonly & speculatable attributes should allow GVN & friends to remove duplicates and hoist these out of loops. If not, then maybe existing optimizations should be improved so others benefit as well.

In summary, do we really need this new pass or is this just a hack?

I don't know which existing optimizations can do this job for the TLS Value (A readable/writable Global Value). Force add readonly attribute to a writable GV is OK/Safe ? Is GVN really suitable to directly handle it ? In fact I am not an mid-end expert, If it can, Please let me know, I'll go to take a look.

Address Craig's Review.

xiangzhangllvm marked 6 inline comments as done.Mar 2 2022, 2:59 AM
nlopes added a comment.Mar 2 2022, 5:04 AM

AFAIU, you want to remove redundant calls to __tls_get_addr@PLT.
The question is why are these redundant? Is it because no other function (visible to compiler) can change the memory in a way that changes the result of this function?
If so, we consider this kind of functions as accessing "inaccessible memory" only. The semantics might be more complicated, I don't how TLS works.

If we are able to match a set of existing function attributes with the semantics of these TLS functions, then LLVM will remove the function calls for you.

Since I don't know enough of TLS, I can't help much. But if you describe precisely why these calls are redundant, we can tell you which attributes apply (if any).

xiangzhangllvm added a comment.EditedMar 2 2022, 5:41 PM

Hi, Lopes, first thank you for your attention about it.
First let me give a short explain about TLS Variable:
The TLS Variable can be simply seem as a common Global Value (for example thread_local int thl_x) "shared" by different threads, but each thread read/write it without affect other threads use it.
So each thread used different address of it. The function __tls_get_addr is used to get the specific address of TLS Value for current thread. (You can simply seem thl_x as an array thl_x[thread_num],
and seem tls_get_addr as "return &thl_x[thread_id]"). So in same thread, the call of 'tls_get_addr' for same TLS Variable is never changed.

How TLS Variable show in LLVM IR and MIR:
LLVM do not distinguish TLS Variables by threads, the llvm IR/MIR directly use it like other normal Global Value. (for example load the thl_x is some like " %1 = load i32, i32* @thl_x, align 4"), So the called
function __tls_get_addr is invistiable to mid-end and back-end. That is why current optimization don't work on it.

AFAIK, Adding readonly & speculatable attributes let GVN optimization is not suitable. The main job of GVN is simplify the common expression to one GV. The TLS Variable is already a simple GV.
And constantHoist or LICM is also not suitable to do this (even suppose marking readonly attribute to it is no problem). Currently these 2 passes are very clean/pure to do their job, I don't want to mix them with handling TLS.

What's more current patch is simple and small, easy to control. I don't think intergrate it into other optimization will be more simple or clean.

nlopes added a comment.Mar 3 2022, 3:25 AM

Thank you for the explanation!
My confusion I guess stemmed from all the assembly in the comments and tests, which is quite confusing given that the transformation works at IR level.
Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

I guess in the end you want a single fn call per variable, which could be introduced always after the allocas? Unless the price for the call is too high, and then you want to restrict it to the paths where it existed already.

I second Roman's question: why is this not enabled by default?
Also, you cannot introduce a new attribute in LLVM IR without documenting it first in LangRef. Also, we don't use attributes to toggle optimizations on/off. Please remove it and use a cmd flag if the optimization can't be enabled by default for some reason.

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

This is running late enough in the pipeline that it's probably fine. Optimization passes that are part of of the codegen pipeline, like ConstantHoisting/CodeGenPrepare/etc. have an implicit contract with each other and SelectionDAG to make this sort of thing work.

Really, this transform should handled by some sort of general LICM, but we can't use IR LICM because the operations aren't visible in IR, and we can't use MachineLICM because we can't really hoist calls after isel. (Making MachineLICM handle this isn't impossible, but it gets messy because you're dealing with target-specific call sequences.)

More generally, the current representation of constants in IR isn't ideal, but improving constant expressions isn't really anyone's priority at the moment. Maybe someone will look at it once the opaque pointers are done.


Would it make sense to do this transform as part of ConstantHoisting, instead of as a separate transform? Most of the necessary infrastructure is already there.

I don't understand why you'd want a clang flag to control this. The backend knows when it's going to generate a call to get the tls pointer. And when we generate a call, hoisting that call is pretty obviously profitable.

xiangzhangllvm added a comment.EditedMar 3 2022, 5:30 PM

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

This is running late enough in the pipeline that it's probably fine. Optimization passes that are part of of the codegen pipeline, like ConstantHoisting/CodeGenPrepare/etc. have an implicit contract with each other and SelectionDAG to make this sort of thing work.

Yes, @nlopes , I think efriedma has answered your question, and that is why I put the TLSHoist pass at the late pass of mid-end.

Really, this transform should handled by some sort of general LICM, but we can't use IR LICM because the operations aren't visible in IR, and we can't use MachineLICM because we can't really hoist calls after isel. (Making MachineLICM handle this isn't impossible, but it gets messy because you're dealing with target-specific call sequences.)

Yes, that is right, but for MIR, it still can not see the call func (__tls_get_addr), MIR just mark the TLS with a flag, some like:

TLS_base_addr64 $noreg, 1, $noreg, target-flags(x86-tlsld) @thl_x, $noreg,   ...

And MachineLICM only focus on Loops, current patch want to handle the duplicated call in all the function.

Is this much different than the artificial bitcast we use in ConstantHoisting?

More generally, the current representation of constants in IR isn't ideal, but improving constant expressions isn't really anyone's priority at the moment. Maybe someone will look at it once the opaque pointers are done.
Would it make sense to do this transform as part of ConstantHoisting, instead of as a separate transform? Most of the necessary infrastructure is already there.

In fact, I though about it when I begin do this job, their logic is similar. The main reason I didn't go such way (integrate them) is

  1. I want to let TLS hoist pass more later than it.
  2. I want to keep ConstantHoisting is pure to handle Constant (In fact, TLS is not constant).
  3. Let TLSHoist be simple and easy to control (enable or disable it).

I second Roman's question: why is this not enabled by default?

I answer this question before, just want to split the tests updates into a independent patch.
I don't want to let the patch too big. I wish we can quickly push this part in, and go on the next one.

Also, you cannot introduce a new attribute in LLVM IR without documenting it first in >LangRef. Also, we don't use attributes to toggle optimizations on/off. Please remove it >and use a cmd flag if the optimization can't be enabled by default for some reason.
I don't understand why you'd want a clang flag to control this. The backend knows when it's going to generate a call to get the tls pointer. And when we generate a call, hoisting that call is pretty obviously profitable.

Because I want to add an clang option for it. So I provide such a function attribute (flag) to let clang option "control" it.
May be this is not good idea, but I think that is another independent topic, I can refine it in [2/2] patch.

xiangzhangllvm retitled this revision from [1/2] TLS loads opimization (hoist) to [1/3] TLS loads opimization (hoist).Mar 3 2022, 7:03 PM
xiangzhangllvm edited the summary of this revision. (Show Details)

Hello, anybody still have question or concern ? (I think the discussion before is clear.)

Hello @craig.topper , I think you have reviewed all the code, could you help accept it ?
It has last a long time, I wish to go on it. Thanks a lot!

One minor concern; otherwise seems fine.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
52 ↗(On Diff #412365)

Why is this off by default? Do you plan to turn it on by default in a followup?

craig.topper added inline comments.Mar 7 2022, 1:26 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
200 ↗(On Diff #412365)

If the Preheader exists, it isn't empty. It will always have a branch to the loop. There are no fallthroughs in IR. So the terminator will not be nullptr.

239 ↗(On Diff #412365)

Why would DT be null? The pass has DominatorTree as a requirement.

248 ↗(On Diff #412365)

Why would LI be null? The pass has it as a requirement

xiangzhangllvm added inline comments.Mar 7 2022, 4:46 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
52 ↗(On Diff #412365)

Yes, As I answered this question before, the last patch [3/3] will turn it on and update
affected tests.

200 ↗(On Diff #412365)

Eh ..., Yes, Seems make sense, just a question, in which case there will be an empty BB in IR level ? (even the last BB I see is always append with ret instruction) .
I check the BasicBlock::getTerminator() , it is possible return nullptr.

const Instruction *BasicBlock::getTerminator() const {
  if (InstList.empty() || !InstList.back().isTerminator())
    return nullptr;
  return &InstList.back();
}
239 ↗(On Diff #412365)

I am not sure if its requirement will must successful build/generate a DominatorTree.
Or here let me change to assert (DT) ?

248 ↗(On Diff #412365)

The same with DT, thanks for your review!

craig.topper added inline comments.Mar 7 2022, 5:56 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
200 ↗(On Diff #412365)

It can happen when the basic block is first created and not connected to the CFG. But if it's connected, it has a terminator. The successor list of a basic block is stored directly in the terminator instruction. The predecessor list is found by iterating all of the users of the BasicBlock* and looking at which uses are terminator instructions.

craig.topper added inline comments.Mar 7 2022, 5:58 PM
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
239 ↗(On Diff #412365)

You can use assert(DT)

Address Craig's review
+ add function attribute "tls-load-hoist" to /docs/LangRef.rst

xiangzhangllvm marked 5 inline comments as done.Mar 7 2022, 7:40 PM
xiangzhangllvm added inline comments.
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
200 ↗(On Diff #412365)

Thanks for your explain!

xiangzhangllvm marked an inline comment as done.Mar 8 2022, 4:26 PM

If no problem, anyone can help re-accepted it? Let me go on. Thanks a lot!

Hello @craig.topper Could you help re-accept it?
I think there is no big problem after times of review, what's more, currently this pass is disable in default.

Thanks you so much!
Thanks for all reviews!

This revision was landed with ongoing or failed builds.Mar 9 2022, 5:31 PM
This revision was automatically updated to reflect the committed changes.