This is an archive of the discontinued LLVM Phabricator instance.

Differential D120000

[1/3] TLS loads opimization (hoist)
ClosedPublic

Authored by xiangzhangllvm on Feb 16 2022, 7:19 PM.

Details

Reviewers

LuoYuanke
pengfei
andrew.w.kaylor
clin1
craig.topper
efriedma
nlopes

Commits

rGc31014322c0b: TLS loads opimization (hoist)
rG30e612ebdfb0: TLS loads opimization (hoist)

Summary

When a TLS variable is accessed in PIC mode, its address is usually obtained by calling a library function, such as
callq __tls_get_addr@PLT
This call is not visible in IR or MIR; it is usually represented by a target-specific flag (such as pic) and generated during assembly printing.
As a result, the call is usually made every time the TLS variable is accessed. Many of these calls are redundant, especially in loops.

This patch tries to optimize this. It identifies and eliminates redundant TLS address calls by hoisting the TLS access when the related option is set.

For example:

static __thread int x;
int g();
int f(int c) {
  int *px = &x;
  while (c--)
    *px += g();
  return *px;
}

generates redundant TLS loads when compiled with
clang++ -fPIC -ftls-model=global-dynamic -O2 -S

.LBB0_2:                                # %while.body
                                        # =>This Inner Loop Header: Depth=1
        callq   _Z1gv@PLT
        movl    %eax, %ebp
        leaq    _ZL1x@TLSLD(%rip), %rdi
        callq   __tls_get_addr@PLT
        addl    _ZL1x@DTPOFF(%rax), %ebp
        movl    %ebp, _ZL1x@DTPOFF(%rax)
        addl    $-1, %ebx
        jne     .LBB0_2
        jmp     .LBB0_3
.LBB0_4:                                # %entry.while.end_crit_edge
        leaq    _ZL1x@TLSLD(%rip), %rdi
        callq   __tls_get_addr@PLT
        movl    _ZL1x@DTPOFF(%rax), %ebp

The redundant TLS loads hurt performance, especially in loops.
So we try to eliminate or move them when the user requests it, producing:

# %bb.0:                                # %entry
         ...
         movl    %edi, %ebx
         leaq    _ZL1x@TLSLD(%rip), %rdi
         callq   __tls_get_addr@PLT
         leaq    _ZL1x@DTPOFF(%rax), %r14
         testl   %ebx, %ebx
         je      .LBB0_1
 .LBB0_2:                                # %while.body
                                         # =>This Inner Loop Header: Depth=1
         callq   _Z1gv@PLT
         addl    (%r14), %eax
         movl    %eax, (%r14)
         addl    $-1, %ebx
         jne     .LBB0_2
         jmp     .LBB0_3
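
As a rough source-level picture of the difference between the two listings above, the following C++ sketch counts address lookups before and after hoisting. It is only an illustration under stated assumptions: `fake_tls_get_addr`, `reset`, and `lookups` are hypothetical helpers invented here and are not part of the patch, and the real transformation works on LLVM IR, not on source code.

```cpp
#include <cassert>

static thread_local int x = 0;
static int lookup_count = 0;

// Hypothetical stand-in for __tls_get_addr: returns the address of the
// calling thread's copy of x and counts how often it is asked.
static int *fake_tls_get_addr() {
  ++lookup_count;
  return &x;
}

static int g() { return 2; }

// Roughly mirrors the first listing: one lookup per loop iteration, or a
// single lookup on the path where the loop body never runs.
int f_naive(int c) {
  assert(c >= 0);
  if (c == 0)
    return *fake_tls_get_addr();
  int last = 0;
  while (c--) {
    int *p = fake_tls_get_addr();
    *p += g();
    last = *p;
  }
  return last;
}

// Roughly mirrors the second listing: the lookup is hoisted into the entry
// block and the resulting pointer is reused everywhere.
int f_hoisted(int c) {
  assert(c >= 0);
  int *px = fake_tls_get_addr();
  while (c--)
    *px += g();
  return *px;
}

// Helpers for experimenting with the two variants.
void reset() {
  x = 0;
  lookup_count = 0;
}
int lookups() { return lookup_count; }
```

With `c = 3`, `f_naive` performs three lookups and `f_hoisted` performs one, and both return the same value.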

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

xiangzhangllvm added inline comments. Feb 21 2022, 1:23 AM

llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
294–296 ↗	(On Diff #409470)	This pass runs very late in scalar optimization, so I don't think we need to preserve everything. Let me reconsider this carefully, thanks!
llvm/test/CodeGen/X86/tls-loads-control.ll
2	The test contains "!3 = !{i32 1, !"tls-load-control", !"Optimize"}", which has the same functionality as "--tls-load-control=Optimize". But testing both of them here does no harm, I think.
12–14	This test is generated directly from a clang test I'll commit later, so keeping the raw output of the source file (see my comment at line 8) makes it easy to check where it came from. That is why I didn't change it manually.

First let me quickly update some parts of it. Thanks a lot!

xiangzhangllvm added a reviewer: craig.topper. Feb 21 2022, 1:25 AM

Harbormaster completed remote builds in B150653: Diff 410243. Feb 21 2022, 2:15 AM

LuoYuanke added inline comments. Feb 21 2022, 5:13 AM

llvm/include/llvm/Transforms/Scalar/TLSVariableControl.h
74 ↗	(On Diff #410243)	I'm curious why you don't include the header file BasicBlock.h?
94 ↗	(On Diff #410243)	uses -> users?
llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
46 ↗	(On Diff #410243)	"Eleminate remove" -> Eliminate?
135 ↗	(On Diff #410243)	Drop brace.

xiangzhangllvm added inline comments. Feb 21 2022, 4:32 PM

llvm/include/llvm/Transforms/Scalar/TLSVariableControl.h
74 ↗	(On Diff #410243)	We prefer to include the *.h files in x.cpp, which keeps x.h simple. Examples of this style are easy to find in other files, for example "Scalar/GVNSink.cpp".
94 ↗	(On Diff #410243)	Makes sense. "Use" itself is not wrong, since it means where the variable is used, but I named the element type "struct TLSUser", so let me change it, thanks!

xiangzhangllvm added inline comments. Feb 21 2022, 4:35 PM

llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
135 ↗	(On Diff #410243)	As I remember, we should not remove the "{}" of an "else" if the "if" has "{}".

Refine the code according to reviews, thanks a lot!

xiangzhangllvm marked 20 inline comments as done. Feb 21 2022, 6:47 PM

xiangzhangllvm added inline comments.

llvm/lib/Transforms/Scalar/TLSVariableControl.cpp
115 ↗	(On Diff #409470)	The pass will change the code ("move GV ahead"), so let's just preserve the CFG here.
294–296 ↗	(On Diff #409470)	The pass may change the code ("move GV ahead"), so let's just preserve the CFG here.

Refine name "control" --> "hoist"

Harbormaster completed remote builds in B150793: Diff 410431. Feb 21 2022, 7:55 PM

Fix tests (MLIR.Examples/standalone::test.toy and libFuzzer.libFuzzer::large.test appear unrelated to this patch)

Herald added subscribers: kerbowa, jvesely, nemanjai. Feb 21 2022, 10:29 PM

Harbormaster completed remote builds in B150810: Diff 410450. Feb 21 2022, 11:17 PM

pengfei added inline comments. Feb 21 2022, 11:34 PM

llvm/include/llvm/LinkAllPasses.h
180	Keeping the same format looks better. Up to you.
llvm/include/llvm/Transforms/Scalar.h
432	What does "prepares a function" mean?
llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
2	Still less than 80.
22	clang
llvm/test/CodeGen/AArch64/O3-pipeline.ll
69	Maybe preserve loop info too. The pass does nothing with it. This may save the following passes from computing it again.
llvm/test/CodeGen/X86/tls-loads-control.ll
177	This metadata is annoying; in particular, the OneAPI info here doesn't make sense to LLVM. The same below.
llvm/test/CodeGen/X86/tls-loads-control3.ll
161	Add `nounwind` in attributes to avoid cfi directives.

Address Phoebe's reviews, thanks a lot!

xiangzhangllvm marked 7 inline comments as done. Feb 22 2022, 1:49 AM

xiangzhangllvm added inline comments.

llvm/include/llvm/LinkAllPasses.h
180	Yes, this follows clang-format, so let it be.
llvm/test/CodeGen/AArch64/O3-pipeline.ll
69	This pass changes the instructions, so the BB info had better be updated.
llvm/test/CodeGen/X86/tls-loads-control.ll
177	Done, but I think keeping more info does no harm in a test. Maybe it is a personal habit; I usually prefer to keep the .ll in sync with the .c/.cpp.
llvm/test/CodeGen/X86/tls-loads-control3.ll
161	I added nounwind to #0, but the cfi directives still seem to be there. Anyway, that is not important, I think.

Harbormaster completed remote builds in B150830: Diff 410479. Feb 22 2022, 2:10 AM

pengfei added inline comments. Feb 22 2022, 3:17 AM

llvm/test/CodeGen/X86/tls-loads-control3.ll
41–43	`data16` and `rex64` don't seem to be correct instructions.
161	Did you re-generate the tests? It should work. The lit test will fail if you didn't update the tests.

xiangzhangllvm marked 3 inline comments as done. Feb 22 2022, 4:57 PM

xiangzhangllvm added inline comments.

llvm/test/CodeGen/X86/tls-loads-control3.ll

41–43

Yes, these are prefixes, not real instructions. I can see them in X86AsmParser.cpp, but it is not clear to me what they are used for. They seem unrelated to this patch.

161

Yes, I re-generated it with update_llc_test_checks.py.
Let me show my local status:

[xiangzh1@..]$./llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/tls-loads-control3.ll
[xiangzh1@..]$git diff
[xiangzh1@..]$llvm-lit llvm/test/CodeGen/X86/tls-loads-control3.ll
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/X86/tls-loads-control3.ll (1 of 1)

Testing Time: 0.21s
  Passed: 1

xiangzhangllvm added inline comments. Feb 22 2022, 5:09 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
31	I don't know why clang-format moves it ahead; it seems to be a clang-format bug. Let me re-run clang-format on the other places.

Do clang format

Harbormaster completed remote builds in B150968: Diff 410682. Feb 22 2022, 6:13 PM

LGTM. Please wait some days for other reviewers' opinions.

This revision is now accepted and ready to land. Feb 22 2022, 7:41 PM

craig.topper added inline comments. Feb 22 2022, 8:29 PM

llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
65	Is SetVector used by this file?
69	Are any algorithms used in this file?
71	Is vector used by this file?
123	Varibles -> Variables. Can we use llvm::MapVector to avoid a separate SmallVector?
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
11	exmaple -> example
11	PLS -> Please?
132	The GV field in TLSCandidate isn't assigned is it?
137	`TLSCandMap[GV].addUser(Inst, Idx);` works even if GV isn't in the map. The entry will be default constructed before addUser is called.
146	can we use llvm::any_of here?
187	Is this before the allocas?
219	`Replaced |= tryReplaceTLSCandidate(Fn, GV);`
227	Should we call skipFunction() for opt-bisect-limit and optnone support?
229	Do we normally use capitalized strings?

craig.topper added inline comments. Feb 22 2022, 8:34 PM

llvm/include/llvm/IR/Module.h
914 ↗	(On Diff #410682)	Why a module flag? What is the policy for LTO merging?

xiangzhangllvm added inline comments. Feb 22 2022, 9:52 PM

llvm/include/llvm/IR/Module.h
914 ↗	(On Diff #410682)	Because a TLS variable is a global value, which has module scope. I am not clear about LTO merging; I think one module's flag should not "spread" to another module.
llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
123	Let me check MapVector; I have rarely used it before : ) thanks a lot!
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
132	Yes, I should remove the "GV" field from TLSCandidate.
137	Yes, so here the "else" means TLSCandMap.count(GV) != 0 (GV is in the map).
146	Yes, I checked it and it works well. I had never used it before : )
187	Sorry, I don't quite understand. What is the problem if it is before the allocas? This is at the IR level, and a global value does not need "allocas".
219	I think "|=" is a bitwise operation. Here the value is a bool, so the standard way is "||".
227	1. Yes, we should consider skipFunction, good catch! 2. With optnone, this pass is not created in TargetPassConfig.cpp.
229	I checked the other similar uses; we normally don't use capitalized strings, so let me change it, thanks a lot!

craig.topper added inline comments. Feb 22 2022, 11:22 PM

llvm/include/llvm/IR/Module.h
914 ↗	(On Diff #410682)	The first thing LTO does is merge all modules into a single module. Then the optimization pipeline runs on that unified module. If we made it a function attribute would anything break? Could two different functions have a different hoisting policy? That would avoid the LTO issue.
llvm/lib/IR/Module.cpp
738 ↗	(On Diff #410682)	I believe the `ModFlagBehavior::Error` here controls what happens in LTO. It will fail to merge if the modules have different flags.
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
187	The alloca instructions for the function's local variables are the first instructions in the entry basic block. Not sure if we should be putting the bitcast before them.
219	`MadeChange |=` is a frequent pattern used by passes in llvm.
227	There is an optnone function attribute which is different than the global optlevel.
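
To make the `|=` versus `||` point on line 219 concrete, here is a small self-contained C++ sketch. It is only an illustration: `tryReplace` and both driver functions are hypothetical stand-ins invented here, not code from the patch. It shows that accumulating with `|=` always evaluates the right-hand side, while `Changed = Changed || f()` stops calling `f` after the first success, which would silently skip candidates in a loop like this one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for tryReplaceTLSCandidate: records that it ran and
// pretends that only even-numbered candidates are rewritten.
static bool tryReplace(int candidate, std::vector<int> &visited) {
  visited.push_back(candidate);
  return candidate % 2 == 0;
}

// Accumulate with |=: the call on the right-hand side always happens.
std::vector<int> visitWithOrAssign(const std::vector<int> &candidates,
                                   bool &changed) {
  std::vector<int> visited;
  changed = false;
  for (int c : candidates)
    changed |= tryReplace(c, visited);
  return visited;
}

// Accumulate with ||: once changed is true, the call is short-circuited.
std::vector<int> visitWithLogicalOr(const std::vector<int> &candidates,
                                    bool &changed) {
  std::vector<int> visited;
  changed = false;
  for (int c : candidates)
    changed = changed || tryReplace(c, visited);
  return visited;
}
```

For the candidates 1, 2, 3, 4 the `|=` version visits all four, while the `||` version stops visiting after candidate 2.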

xiangzhangllvm added inline comments. Feb 22 2022, 11:39 PM

llvm/include/llvm/IR/Module.h
914 ↗	(On Diff #410682)	OK, let me move it into a function attribute. Thanks a lot!
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
187	OK, let me change the position of the bitcast.
227	OK, let me check it again. As for skipFunction, I can't call it here because TLSVariableHoistPass does not inherit from FunctionPass, but I already check it in TLSVariableHoistLegacyPass::runOnFunction.

LuoYuanke added inline comments. Feb 22 2022, 11:52 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
187	Would you add a test case with alloca for checking?

xiangzhangllvm added inline comments. Feb 23 2022, 1:42 AM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
187	Yes, no problem

Address Craig's reviews

xiangzhangllvm marked 13 inline comments as done. Feb 23 2022, 2:57 AM

xiangzhangllvm added inline comments.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
187	Let me refine this a little later. I am considering choosing the insert position by checking the dominance relation, to reduce the live range.

Harbormaster completed remote builds in B151023: Diff 410758. Feb 23 2022, 3:39 AM

craig.topper added inline comments. Feb 23 2022, 12:59 PM

llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h
69	You're not using std::map in this file
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
127	Merge into the previous if with ||
136	This line should work for the case when GV isn't already in the map. The operator[] on the MapVector will default construct a TLS candidate before calling addUser. So you don't need to check TLSCandMap.count
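
The point about `operator[]` can be sketched in plain C++. This is a minimal illustration under stated assumptions: `std::map` stands in for `llvm::MapVector` (as the comment above notes, both default-construct a missing entry in `operator[]`), and `Candidate`, `CandidateMap`, and `recordUse` are hypothetical, simplified names rather than the patch's actual types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the patch's TLSCandidate.
struct Candidate {
  std::vector<std::pair<std::string, unsigned>> Users;
  void addUser(const std::string &Inst, unsigned Idx) {
    Users.emplace_back(Inst, Idx);
  }
};

// std::map is used here in place of llvm::MapVector.
using CandidateMap = std::map<std::string, Candidate>;

// No count() check is needed: operator[] default-constructs the Candidate
// the first time a given key is seen, then addUser runs on that entry.
void recordUse(CandidateMap &Map, const std::string &GV,
               const std::string &Inst, unsigned Idx) {
  Map[GV].addUser(Inst, Idx);
}
```

Calling `recordUse` twice for one variable and once for another leaves two entries in the map, with two users and one user respectively.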

craig.topper added inline comments. Feb 23 2022, 12:59 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
224	This might be handled by the pass manager for the new pass manager. For the old pass manager it is part of skipFunction.

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
46–53	This should be an enum, not strings

In D120000#3341406, @lebedev.ri wrote:

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

This only tries to hoist the TLS address call. A TLS variable is only used within a thread's function, so it is always legal to do this transformation within a function.
Sorry, what do you mean by "opt-in"?

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
46–53	Yes, we usually use an enum here, but I find a string better/easier because we don't need to define the enum (sometimes it has to be defined two or more times if we want to use it in clang, for example when translating a clang option to "-mllvm -tls-load-hoist=xxx").
136	Makes sense!
224	Yes, both pass managers will call runImpl; it is called by "TLSVariableHoistPass::run" and "TLSVariableHoistLegacyPass::runOnFunction" (I forget which is new and which is old).

1. Address Craig's reviews.
2. Rewrite the function findInsertPos to use the dominance relation to choose the insert position, which helps generate shorter live ranges.

Herald added a subscriber: mstorsjo. Feb 24 2022, 2:20 AM

Harbormaster completed remote builds in B151215: Diff 411048. Feb 24 2022, 3:12 AM

pengfei added inline comments. Feb 24 2022, 4:39 AM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
169	You need to run clang-format on your new code.
llvm/test/CodeGen/X86/tls.ll
1–8 ↗	(On Diff #411048)	Why remove the existing test?
512–513 ↗	(On Diff #411048)	The existing tests also prefer simple metadata. So remove them in the new test.

Clang-format and recover tls.ll test. Thanks a lot!

Harbormaster completed remote builds in B151381: Diff 411270. Feb 24 2022, 5:53 PM

xiangzhangllvm marked an inline comment as done. Feb 24 2022, 7:02 PM

Hi Craig, if you see no problems, I'll submit it in the next few days and move on to the 2nd patch for clang. Thanks!

(The biggest change since your last review is that I rewrote the function findInsertPos to use the dominance relation to choose the insert position.)

I just noticed this patch from Phabricator "Recent Activity" and did not look into it closely. How does it interact with createCleanupLocalDynamicTLSPass?
Do we need multiple passes to deal with __tls_get_addr calls?

Do you have a plan to implement clang -mtls-dialect=gnu2? I have had such a plan for a long time but always have more prioritized work to do...
ld.lld has great TLSDESC support since D116900. If we can add support and let GCC/Clang default to TLS descriptors, there is just no need adding more code dealing with these traditional (legacy) TLS models.

In D120000#3348110, @MaskRay wrote:

Do you have a plan to implement clang -mtls-dialect=gnu2? I have had such a plan for a long time but always have more prioritized work to do...
ld.lld has great TLSDESC support since D116900. If we can add support and let GCC/Clang default to TLS descriptors, there is just no need adding more code dealing with these traditional (legacy) TLS models.

Please let me take a look at createCleanupLocalDynamicTLSPass; I have never read it before.
Could you give me more information about implementing clang -mtls-dialect=gnu2? I am happy to do that, so please let me try (if my manager doesn't hand me urgent jobs).

Clear now. createCleanupLocalDynamicTLSPass is integrated into ISel; it is mostly a local/peephole optimization for variables with the same TLS base address.
It can't analyze TLS variables across different BBs or loops. Please refer to https://godbolt.org/z/f7WrqK5he
But it is a good complement to the current patch, because it can optimize different TLS variables locally. (The current patch focuses on the same TLS variable used in different BBs or loops.)

In D120000#3341823, @xiangzhangllvm wrote:

In D120000#3341406, @lebedev.ri wrote:

What are the legality considerations for this transformation,
when is it legal to perform it? I'm mainly confused why it's opt-in.

This only tries to hoist the TLS address call. A TLS variable is only used within a thread's function, so it is always legal to do this transformation within a function.

What about profitability, is it always a win?

Sorry, what do you mean by "opt-in"?

Why is it not enabled by default?

In D120000#3348559, @lebedev.ri wrote:

What about profitability, is it always a win?

It removes duplicated TLS address calls, so in theory it should be a big win.
But as with most optimizations, I can't say it must win in every case.

Sorry, what do you mean by "opt-in"?

Why is it not enabled by default?

This is a general optimization that will affect the tests of all targets, so I'd like to enable it in a separate patch that mainly modifies tests.

Let me commit it and move on to the 2nd patch. Thanks for all your reviews!

Herald added a project: Restricted Project. Mar 1 2022, 4:58 PM

dexonsmith removed a subscriber: dexonsmith. Mar 1 2022, 5:06 PM

This revision was landed with ongoing or failed builds. Mar 1 2022, 6:37 PM

Closed by commit rG30e612ebdfb0: TLS loads opimization (hoist) (authored by xiangzhangllvm).

This revision was automatically updated to reflect the committed changes.

xiangzhangllvm added a commit: rG30e612ebdfb0: TLS loads opimization (hoist).

Did anyone sign off on the new insertion position code? It looks like the only approval is from before you added that. This should have been moved back to needs review.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
191	Why not pass the Loop* you already looked up in the caller. Why get the loop again?
195	outmost -> outermost
202	I'm not sure that's true. Control flow in IR is explicit, there is no fallthrough. If it's a predecessor it must have a branch instruction of some kind.

craig.topper added inline comments. Mar 1 2022, 7:07 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
157	method names should be lowercase
199	Predecessor and PreHeader aren't quite the same thing. A PreHeader also has the loop as its only successor.

No problem, let me revert and refine. Thanks!

xiangzhangllvm added a reverting change: rG65588a0776ae: Revert "TLS loads opimization (hoist)". Mar 1 2022, 10:11 PM

xiangzhangllvm reopened this revision. Mar 1 2022, 10:11 PM

This revision is now accepted and ready to land. Mar 1 2022, 10:11 PM

xiangzhangllvm added inline comments. Mar 1 2022, 10:39 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
199	Sorry, I'm not quite clear about it. My understanding is that the PreHeader here is not in the loop (currently it is the outermost loop), so if the loop has a PreHeader, we can directly insert the bitcast instruction in the PreHeader, because the PreHeader must dominate the loop (it is the only way to enter the loop). PreHeader --> Loop

craig.topper added inline comments. Mar 1 2022, 11:29 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
199	There is a function called getLoopPredecessor and another called getLoopPreheader. You called the former, but named the variable like you called the latter.
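
The distinction between a loop predecessor and a loop preheader can be sketched on a toy control-flow graph. This is a hedged, self-contained model, not LLVM code: `ToyCFG`, `ToyLoop`, `loopPredecessor`, and `loopPreheader` are hypothetical names, blocks are plain indices, and the sketch only models the two properties mentioned in this thread (a unique out-of-loop predecessor of the header, and that block having the header as its only successor). The real `getLoopPredecessor` and `getLoopPreheader` may apply further checks.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy CFG: blocks are indices, Succs[b] lists the successors of block b.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
};

// Toy loop: a header block plus the set of blocks inside the loop.
struct ToyLoop {
  int Header;
  std::set<int> Blocks;
};

// Returns the unique block outside the loop that branches to the header,
// or -1 if there is none or more than one.
int loopPredecessor(const ToyCFG &G, const ToyLoop &L) {
  assert(L.Blocks.count(L.Header) == 1);
  int Out = -1;
  for (std::size_t I = 0; I < G.Succs.size(); ++I) {
    int B = static_cast<int>(I);
    if (L.Blocks.count(B))
      continue; // edges from inside the loop are back edges, not entries
    for (int S : G.Succs[I]) {
      if (S != L.Header)
        continue;
      if (Out != -1 && Out != B)
        return -1; // more than one outside predecessor
      Out = B;
    }
  }
  return Out;
}

// A preheader is a loop predecessor whose only successor is the header.
int loopPreheader(const ToyCFG &G, const ToyLoop &L) {
  int Out = loopPredecessor(G, L);
  if (Out == -1)
    return -1;
  if (G.Succs[static_cast<std::size_t>(Out)].size() != 1)
    return -1;
  return Out;
}
```

A block that guards the loop with a conditional branch (one edge into the header, one edge skipping the loop) is a predecessor but not a preheader, so naming the result of the former as if it were the latter hides a real difference.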

xiangzhangllvm added inline comments. Mar 1 2022, 11:39 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
199	Oh! That's really a big mistake! Thanks so much!

I have some concerns with this patch:

Why is it adding a new function attribute (tls-load-hoist) that is not documented in LangRef?
Why can't this transformation be handled with existing optimizations? Adding e.g. the readonly & speculatable attributes should allow GVN & friends to remove duplicates and hoist these out of loops. If not, then maybe existing optimizations should be improved so others benefit as well.

In summary, do we really need this new pass or is this just a hack?

In D120000#3353738, @nlopes wrote:

I have some concerns with this patch:

Why is it adding a new function attribute (tls-load-hoist) that is not documented in LangRef?

This is only the [1/2] patch; the 2nd will add a clang option for it, which will enable this optimization by generating the function attribute (tls-load-hoist).

Why can't this transformation be handled with existing optimizations? Adding e.g. the readonly & speculatable attributes should allow GVN & friends to remove duplicates and hoist these out of loops. If not, then maybe existing optimizations should be improved so others benefit as well.

In summary, do we really need this new pass or is this just a hack?

I don't know which existing optimizations can do this job for a TLS value (a readable/writable global value). Is it OK/safe to force the readonly attribute onto a writable GV? Is GVN really suitable for handling it directly? In fact I am not a mid-end expert; if it can, please let me know and I'll take a look.

Address Craig's Review.

xiangzhangllvm marked 6 inline comments as done. Mar 2 2022, 2:59 AM

Harbormaster completed remote builds in B152128: Diff 412365. Mar 2 2022, 4:26 AM

AFAIU, you want to remove redundant calls to __tls_get_addr@PLT.
The question is why are these redundant? Is it because no other function (visible to compiler) can change the memory in a way that changes the result of this function?
If so, we consider this kind of functions as accessing "inaccessible memory" only. The semantics might be more complicated, I don't know how TLS works.

If we are able to match a set of existing function attributes with the semantics of these TLS functions, then LLVM will remove the function calls for you.

Since I don't know enough of TLS, I can't help much. But if you describe precisely why these calls are redundant, we can tell you which attributes apply (if any).

Hi Lopes, first of all, thank you for your attention to this.
Let me first give a short explanation of TLS variables:
A TLS variable can simply be seen as a common global value (for example thread_local int thl_x) "shared" by different threads, except that each thread reads/writes it without affecting the other threads that use it.
So each thread uses a different address for it. The function __tls_get_addr is used to get the specific address of the TLS variable for the current thread. (You can simply view thl_x as an array thl_x[thread_num],
and __tls_get_addr as "return &thl_x[thread_id]".) So within the same thread, the result of '__tls_get_addr' for the same TLS variable never changes.
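
The "array indexed by thread" picture above can be checked with a few lines of standard C++. This is a minimal sketch, assuming only the standard `thread_local` and `std::thread` facilities; `Observation`, `addr_of_thl_x`, and `observe` are hypothetical names invented for the illustration. It records that the address of `thl_x` is stable within one thread and differs between two threads that are alive at the same time.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

thread_local int thl_x = 0;

// Address of thl_x as seen by the calling thread.
static std::uintptr_t addr_of_thl_x() {
  return reinterpret_cast<std::uintptr_t>(&thl_x);
}

struct Observation {
  bool stable_within_thread;
  bool distinct_across_threads;
  int main_value;
  int worker_value;
};

Observation observe() {
  // Two lookups in the same thread, with a write in between.
  std::uintptr_t first = addr_of_thl_x();
  thl_x = 1;
  std::uintptr_t second = addr_of_thl_x();

  // A second thread looks up and writes its own copy while the calling
  // thread is still alive, so the two copies must be distinct objects.
  std::uintptr_t worker_addr = 0;
  int worker_value = -1;
  std::thread worker([&] {
    worker_addr = addr_of_thl_x();
    thl_x = 42;
    worker_value = thl_x;
  });
  worker.join();

  Observation o;
  o.stable_within_thread = (first == second);
  o.distinct_across_threads = (worker_addr != first);
  o.main_value = thl_x; // unaffected by the worker's write
  o.worker_value = worker_value;
  return o;
}
```

The worker's write of 42 does not disturb the calling thread's value of 1, which is the property that makes repeated address lookups within one thread redundant.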

How TLS variables appear in LLVM IR and MIR:
LLVM does not distinguish TLS variables by thread; LLVM IR/MIR uses them directly like any other normal global value (for example, loading thl_x looks something like "%1 = load i32, i32* @thl_x, align 4"). So the call to
__tls_get_addr is invisible to the mid-end and back-end. That is why the current optimizations don't work on it.

AFAIK, adding the readonly & speculatable attributes so that GVN can optimize it is not suitable. The main job of GVN is to simplify common expressions to one GV, and a TLS variable is already a simple GV.
ConstantHoisting or LICM are also not suitable for this (even supposing that marking it with the readonly attribute is no problem). Currently these 2 passes are very clean/pure in doing their job, and I don't want to mix TLS handling into them.

What's more, the current patch is simple and small, and easy to control. I don't think integrating it into other optimizations would be simpler or cleaner.

Thank you for the explanation!
My confusion I guess stemmed from all the assembly in the comments and tests, which is quite confusing given that the transformation works at IR level.
Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

I guess in the end you want a single fn call per variable, which could be introduced always after the allocas? Unless the price for the call is too high, and then you want to restrict it to the paths where it existed already.

I second Roman's question: why is this not enabled by default?
Also, you cannot introduce a new attribute in LLVM IR without documenting it first in LangRef. Also, we don't use attributes to toggle optimizations on/off. Please remove it and use a cmd flag if the optimization can't be enabled by default for some reason.

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

In D120000#3357413, @craig.topper wrote:

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

In D120000#3357639, @nlopes wrote:

In D120000#3357413, @craig.topper wrote:

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Is this much different than the artificial bitcast we use in ConstantHoisting?

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

This is running late enough in the pipeline that it's probably fine. Optimization passes that are part of the codegen pipeline, like ConstantHoisting/CodeGenPrepare/etc. have an implicit contract with each other and SelectionDAG to make this sort of thing work.

Really, this transform should be handled by some sort of general LICM, but we can't use IR LICM because the operations aren't visible in IR, and we can't use MachineLICM because we can't really hoist calls after isel. (Making MachineLICM handle this isn't impossible, but it gets messy because you're dealing with target-specific call sequences.)

More generally, the current representation of constants in IR isn't ideal, but improving constant expressions isn't really anyone's priority at the moment. Maybe someone will look at it once the opaque pointers are done.

Would it make sense to do this transform as part of ConstantHoisting, instead of as a separate transform? Most of the necessary infrastructure is already there.

I don't understand why you'd want a clang flag to control this. The backend knows when it's going to generate a call to get the tls pointer. And when we generate a call, hoisting that call is pretty obviously profitable.

In D120000#3358477, @efriedma wrote:

In D120000#3357639, @nlopes wrote:

In D120000#3357413, @craig.topper wrote:

Anyway, this transformation works by creating some artificial bitcasts that you expect to be carried over to the backend so the lowering can share the calls to the TLS function.

This strategy seems very brittle to me. If some later optimization decides to remove the bitcast, your optimization will stop working. It's very likely that will be the case once opaque pointers take over.

Didn't know there was a precedent of using this technique. Seems fragile to me, but...

This is running late enough in the pipeline that it's probably fine. Optimization passes that are part of the codegen pipeline, like ConstantHoisting/CodeGenPrepare/etc. have an implicit contract with each other and SelectionDAG to make this sort of thing work.

Yes, @nlopes, I think efriedma has answered your question, and that is why I put the TLSHoist pass late in the mid-end.

Really, this transform should be handled by some sort of general LICM, but we can't use IR LICM because the operations aren't visible in IR, and we can't use MachineLICM because we can't really hoist calls after isel. (Making MachineLICM handle this isn't impossible, but it gets messy because you're dealing with target-specific call sequences.)

Yes, that is right, but MIR still cannot see the call to the function (__tls_get_addr); MIR just marks the TLS access with a flag, something like:

TLS_base_addr64 $noreg, 1, $noreg, target-flags(x86-tlsld) @thl_x, $noreg,   ...

Also, MachineLICM only focuses on loops, while the current patch wants to handle duplicated calls across the whole function.

Is this much different than the artificial bitcast we use in ConstantHoisting?

More generally, the current representation of constants in IR isn't ideal, but improving constant expressions isn't really anyone's priority at the moment. Maybe someone will look at it once the opaque pointers are done.
Would it make sense to do this transform as part of ConstantHoisting, instead of as a separate transform? Most of the necessary infrastructure is already there.

In fact, I thought about it when I began this work; their logic is similar. The main reasons I didn't go that way (integrating them) are:

I want the TLS hoist pass to run later than it.
I want to keep ConstantHoisting purely about handling constants (in fact, a TLS variable is not a constant).
I want TLSHoist to be simple and easy to control (enable or disable).

I second Roman's question: why is this not enabled by default?

I answer this question before, just want to split the tests updates into a independent patch.
I don't want to let the patch too big. I wish we can quickly push this part in, and go on the next one.

Also, you cannot introduce a new attribute in LLVM IR without documenting it first in >LangRef. Also, we don't use attributes to toggle optimizations on/off. Please remove it >and use a cmd flag if the optimization can't be enabled by default for some reason.
I don't understand why you'd want a clang flag to control this. The backend knows when it's going to generate a call to get the tls pointer. And when we generate a call, hoisting that call is pretty obviously profitable.

Because I want to add a clang option for it, I provide a function attribute (flag) so that the clang option can "control" it.
Maybe this is not a good idea, but I think that is a separate topic; I can refine it in the [2/2] patch.

xiangzhangllvm added reviewers: efriedma, nlopes. Mar 3 2022, 6:59 PM

xiangzhangllvm retitled this revision from [1/2] TLS loads opimization (hoist) to [1/3] TLS loads opimization (hoist). Mar 3 2022, 7:03 PM

xiangzhangllvm edited the summary of this revision.

Hello, does anybody still have questions or concerns? (I think the earlier discussion is clear.)

Hello @craig.topper, I think you have reviewed all the code; could you help accept it?
This has been going on for a long time, and I would like to move forward. Thanks a lot!

One minor concern; otherwise seems fine.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
53	Why is this off by default? Do you plan to turn it on by default in a followup?

craig.topper added inline comments. Mar 7 2022, 1:26 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
201	If the Preheader exists, it isn't empty. It will always have a branch to the loop. There are no fallthroughs in IR. So the terminator will not be nullptr.
240	Why would DT be null? The pass has DominatorTree as a requirement.
249	Why would LI be null? The pass has it as a requirement

xiangzhangllvm added inline comments. Mar 7 2022, 4:46 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
53	Yes, as I answered before, the last patch [3/3] will turn it on and update the affected tests.
201	Yes, that seems to make sense. Just a question: in which case would there be an empty BB at the IR level? (Even the last BB I see always ends with a ret instruction.) I checked BasicBlock::getTerminator(), and it can return nullptr: const Instruction *BasicBlock::getTerminator() const { if (InstList.empty() || !InstList.back().isTerminator()) return nullptr; return &InstList.back(); }
240	I am not sure whether the requirement guarantees that a DominatorTree is successfully built. Or should I change this to assert(DT)?
249	The same as for DT. Thanks for your review!

craig.topper added inline comments. Mar 7 2022, 5:56 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
201	It can happen when the basic block is first created and not connected to the CFG. But if it's connected, it has a terminator. The successor list of a basic block is stored directly in the terminator instruction. The predecessor list is found by iterating all of the users of the BasicBlock* and looking at which uses are terminator instructions.
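The relationship described above can be sketched with a toy model. The `Block` and `Inst` types below are hypothetical simplifications, not the LLVM classes: the terminator check mirrors the `getTerminator()` logic quoted earlier, successors are read from the terminator, and predecessors are found by searching for terminators that target the block. (LLVM walks the use list of the `BasicBlock` rather than scanning every block; the scan here is only for brevity.)

```cpp
#include <string>
#include <vector>

// Toy model only; these are NOT the LLVM classes.
struct Block;

struct Inst {
  std::string Name;
  bool IsTerminator;
  std::vector<Block *> Targets; // only meaningful for terminators
};

struct Block {
  std::string Name;
  std::vector<Inst> Insts;

  // Null when the block is empty or its last instruction is not a
  // terminator, as in the quoted getTerminator() body.
  const Inst *getTerminator() const {
    if (Insts.empty() || !Insts.back().IsTerminator)
      return nullptr;
    return &Insts.back();
  }

  // The successor list lives in the terminator itself.
  std::vector<Block *> successors() const {
    if (const Inst *T = getTerminator())
      return T->Targets;
    return {};
  }
};

// Predecessors are not stored anywhere; they are derived from the
// terminators that branch to BB.
std::vector<const Block *> predecessors(const std::vector<Block *> &Fn,
                                        const Block *BB) {
  std::vector<const Block *> Preds;
  for (const Block *B : Fn) {
    for (Block *S : B->successors()) {
      if (S == BB) {
        Preds.push_back(B);
        break;
      }
    }
  }
  return Preds;
}
```

In this model a freshly created, unconnected block has no terminator, while a preheader that is wired to its loop necessarily has one, because the edge to the loop only exists through that terminator.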

craig.topper added inline comments. Mar 7 2022, 5:58 PM

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
240	You can use assert(DT)

Address Craig's review
+ add function attribute "tls-load-hoist" to /docs/LangRef.rst

Herald added a subscriber: jdoerfert. Mar 7 2022, 7:35 PM

xiangzhangllvm marked 5 inline comments as done. Mar 7 2022, 7:40 PM

xiangzhangllvm added inline comments.

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp
201	Thanks for your explanation!

Harbormaster completed remote builds in B153070: Diff 413687. Mar 7 2022, 8:27 PM

If there are no problems, could anyone help re-accept it so I can move on? Thanks a lot!

Hello @craig.topper, could you help re-accept it?
I think there are no big problems left after several rounds of review; what's more, this pass is currently disabled by default.

LGTM

Thank you so much!
Thanks for all reviews!

This revision was landed with ongoing or failed builds. Mar 9 2022, 5:31 PM

Closed by commit rGc31014322c0b: TLS loads opimization (hoist) (authored by xiangzhangllvm).

This revision was automatically updated to reflect the committed changes.

xiangzhangllvm added a commit: rGc31014322c0b: TLS loads opimization (hoist).

Revision Contents

Path	Size
llvm/docs/LangRef.rst	5 lines
llvm/include/llvm/CodeGen/MachinePassRegistry.def	1 line
llvm/include/llvm/InitializePasses.h	1 line
llvm/include/llvm/LinkAllPasses.h	1 line
llvm/include/llvm/Transforms/Scalar.h	6 lines
llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h	131 lines
llvm/lib/CodeGen/TargetPassConfig.cpp	3 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt	1 line
llvm/lib/Transforms/Scalar/Scalar.cpp	1 line
llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp	313 lines
llvm/test/CodeGen/AArch64/O3-pipeline.ll	2 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	7 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll	1 line
llvm/test/CodeGen/PowerPC/O3-pipeline.ll	1 line
llvm/test/CodeGen/X86/opt-pipeline.ll	2 lines
llvm/test/CodeGen/X86/tls-loads-control.ll	248 lines
llvm/test/CodeGen/X86/tls-loads-control2.ll	51 lines
llvm/test/CodeGen/X86/tls-loads-control3.ll	358 lines
llvm/tools/llc/llc.cpp	1 line

Diff 414250

llvm/docs/LangRef.rst

@@ ``"denormal-fp-math-f32"``
  attempt is made to diagnose unsupported uses. Currently this
  attribute is respected by the AMDGPU and NVPTX backends.

  ``"thunk"``
  This attribute indicates that the function will delegate to some other
  function with a tail call. The prototype of a thunk should not be used for
  optimization purposes. The caller is expected to cast the thunk prototype to
  match the thunk target prototype.

+ ``"tls-load-hoist"``
+ This attribute indicates that the function will try to reduce redundant
+ tls address caculation by hoisting tls variable.

  ``uwtable[(sync|async)]``
  This attribute indicates that the ABI being targeted requires that
  an unwind table entry be produced for this function even if we can
  show that no exceptions passes by it. This is normally the case for
  the ELF x86-64 abi, but it can be disabled for some compilation
  units. The optional parameter describes what kind of unwind tables
  to generate: ``sync`` for normal unwind tables, ``async`` for asynchronous
  (instruction precise) unwind tables. Without the parameter, the attribute

llvm/include/llvm/CodeGen/MachinePassRegistry.def

  FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())
  FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())
  FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))
  FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))
  FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())
  FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())
  FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())
  FUNCTION_PASS("scalarize-masked-mem-intrin", ScalarizeMaskedMemIntrinPass, ())
+ FUNCTION_PASS("tlshoist", TLSVariableHoistPass, ())
  FUNCTION_PASS("verify", VerifierPass, ())
  #undef FUNCTION_PASS

  #ifndef LOOP_PASS
  #define LOOP_PASS(NAME, PASS_NAME, CONSTRUCTOR)
  #endif
  LOOP_PASS("loop-reduce", LoopStrengthReducePass, ())
  #undef LOOP_PASS

llvm/include/llvm/InitializePasses.h

  void initializeStripSymbolsPass(PassRegistry&);
  void initializeStructurizeCFGLegacyPassPass(PassRegistry &);
  void initializeTailCallElimPass(PassRegistry&);
  void initializeTailDuplicatePass(PassRegistry&);
  void initializeTargetLibraryInfoWrapperPassPass(PassRegistry&);
  void initializeTargetPassConfigPass(PassRegistry&);
  void initializeTargetTransformInfoWrapperPassPass(PassRegistry&);
  void initializeThreadSanitizerLegacyPassPass(PassRegistry&);
+ void initializeTLSVariableHoistLegacyPassPass(PassRegistry &);
  void initializeTwoAddressInstructionPassPass(PassRegistry&);
  void initializeTypeBasedAAWrapperPassPass(PassRegistry&);
  void initializeTypePromotionPass(PassRegistry&);
  void initializeUnifyFunctionExitNodesLegacyPassPass(PassRegistry &);
  void initializeUnifyLoopExitsLegacyPassPass(PassRegistry &);
  void initializeUnpackMachineBundlesPass(PassRegistry&);
  void initializeUnreachableBlockElimLegacyPassPass(PassRegistry&);
  void initializeUnreachableMachineBlockElimPass(PassRegistry&);

llvm/include/llvm/LinkAllPasses.h

@@ ForcePassLinking() {
  (void) llvm::createSafeStackPass();
  (void) llvm::createSROAPass();
  (void) llvm::createSingleLoopExtractorPass();
  (void) llvm::createStripSymbolsPass();
  (void) llvm::createStripNonDebugSymbolsPass();
  (void) llvm::createStripDeadDebugInfoPass();
  (void) llvm::createStripDeadPrototypesPass();
  (void) llvm::createTailCallEliminationPass();
+ (void)llvm::createTLSVariableHoistPass();
Inline comment (pengfei, done): Keeping the same format looks better. Up to you.
Inline comment (xiangzhangllvm, author, done): Yes, this follows clang-format; let it be.
  (void) llvm::createJumpThreadingPass();
  (void) llvm::createDFAJumpThreadingPass();
  (void) llvm::createUnifyFunctionExitNodesPass();
  (void) llvm::createInstCountPass();
  (void) llvm::createConstantHoistingPass();
  (void) llvm::createCodeGenPreparePass();
  (void) llvm::createEntryExitInstrumenterPass();
  (void) llvm::createPostInlineEntryExitInstrumenterPass();

llvm/include/llvm/Transforms/Scalar.h

  //===----------------------------------------------------------------------===//
  //
  // LowerExpectIntrinsics - Removes llvm.expect intrinsics and creates
  // "block_weights" metadata.
  FunctionPass *createLowerExpectIntrinsicPass();

  //===----------------------------------------------------------------------===//
  //
+ // TLSVariableHoist - This pass reduce duplicated TLS address call.
Inline comment (pengfei, done): What does "prepares a function" mean?
+ //
+ FunctionPass *createTLSVariableHoistPass();
+
+ //===----------------------------------------------------------------------===//
+ //
  // LowerConstantIntrinsicss - Expand any remaining llvm.objectsize and
  // llvm.is.constant intrinsic calls, even for the unknown cases.
  //
  FunctionPass *createLowerConstantIntrinsicsPass();

  //===----------------------------------------------------------------------===//
  //
  // PartiallyInlineLibCalls - Tries to inline the fast path of library

llvm/include/llvm/Transforms/Scalar/TLSVariableHoist.h

This file was added.

				//==- TLSVariableHoist.h ------ Remove Redundant TLS Loads -------*- C++ -*-==//
				//
				Inline comment (pengfei, done): Still less than 80.
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass identifies/eliminates Redundant TLS Loads if related option is set.
				// For example:
				// static __thread int x;
				// int g();
				// int f(int c) {
				// int *px = &x;
				// while (c--)
				// *px += g();
				// return *px;
				// }
				//
				// will generate Redundant TLS Loads by compiling it with
				// clang++ -fPIC -ftls-model=global-dynamic -O2 -S
				//
				Inline comment (pengfei, done): clang
				// .LBB0_2: # %while.body
				// # =>This Inner Loop Header: Depth=1
				// callq _Z1gv@PLT
				// movl %eax, %ebp
				// leaq _ZL1x@TLSLD(%rip), %rdi
				// callq __tls_get_addr@PLT
				// addl _ZL1x@DTPOFF(%rax), %ebp
				// movl %ebp, _ZL1x@DTPOFF(%rax)
				// addl $-1, %ebx
				// jne .LBB0_2
				// jmp .LBB0_3
				// .LBB0_4: # %entry.while.end_crit_edge
				// leaq _ZL1x@TLSLD(%rip), %rdi
				// callq __tls_get_addr@PLT
				// movl _ZL1x@DTPOFF(%rax), %ebp
				//
				// The Redundant TLS Loads will hurt the performance, especially in loops.
				// So we try to eliminate/move them if required by customers, let it be:
				//
				// # %bb.0: # %entry
				// ...
				// movl %edi, %ebx
				// leaq _ZL1x@TLSLD(%rip), %rdi
				// callq __tls_get_addr@PLT
				// leaq _ZL1x@DTPOFF(%rax), %r14
				// testl %ebx, %ebx
				// je .LBB0_1
				// .LBB0_2: # %while.body
				// # =>This Inner Loop Header: Depth=1
				// callq _Z1gv@PLT
				// addl (%r14), %eax
				// movl %eax, (%r14)
				// addl $-1, %ebx
				// jne .LBB0_2
				// jmp .LBB0_3
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_TLSVARIABLEHOIST_H
				#define LLVM_TRANSFORMS_SCALAR_TLSVARIABLEHOIST_H

				#include "llvm/ADT/MapVector.h"
				#include "llvm/ADT/SmallVector.h"
				Inline comment (craig.topper, done): Is SetVector used by this file?
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/IR/PassManager.h"

				namespace llvm {
				Inline comment (craig.topper, done): Are any algorithms used in this file?
				Inline comment (craig.topper, not done): You're not using std::map in this file.

				class BasicBlock;
				Inline comment (craig.topper, done): Is vector used by this file?
				class DominatorTree;
				class Function;
				class GlobalVariable;
				class Instruction;

				/// A private "module" namespace for types and utilities used by
				/// TLSVariableHoist. These are implementation details and should
				/// not be used by clients.
				namespace tlshoist {

				/// Keeps track of the user of a TLS variable and the operand index
				/// where the variable is used.
				struct TLSUser {
				Instruction *Inst;
				unsigned OpndIdx;

				TLSUser(Instruction *Inst, unsigned Idx) : Inst(Inst), OpndIdx(Idx) {}
				};

				/// Keeps track of a TLS variable candidate and its users.
				struct TLSCandidate {
				SmallVector<TLSUser, 8> Users;

				/// Add the user to the use list and update the cost.
				void addUser(Instruction *Inst, unsigned Idx) {
				Users.push_back(TLSUser(Inst, Idx));
				}
				};

				} // end namespace tlshoist

				class TLSVariableHoistPass : public PassInfoMixin<TLSVariableHoistPass> {
				public:
				PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

				// Glue for old PM.
				bool runImpl(Function &F, DominatorTree &DT, LoopInfo &LI);

				private:
				DominatorTree *DT;
				LoopInfo *LI;

				/// Keeps track of TLS variable candidates found in the function.
				using TLSCandMapType = MapVector<GlobalVariable *, tlshoist::TLSCandidate>;
				TLSCandMapType TLSCandMap;

				void collectTLSCandidates(Function &Fn);
				void collectTLSCandidate(Instruction *Inst);
				Instruction *getNearestLoopDomInst(BasicBlock *BB, Loop *L);
				Instruction *getDomInst(Instruction *I1, Instruction *I2);
				BasicBlock::iterator findInsertPos(Function &Fn, GlobalVariable *GV,
				BasicBlock *&PosBB);
				Inline comment (craig.topper, done): Varibles -> Variables. Can we use llvm::MapVector to avoid a separate SmallVector?
				Inline comment (xiangzhangllvm, author, done): Let me check MapVector; I have rarely used it before :) Thanks a lot!
				Instruction *genBitCastInst(Function &Fn, GlobalVariable *GV);
				bool tryReplaceTLSCandidates(Function &Fn);
				bool tryReplaceTLSCandidate(Function &Fn, GlobalVariable *GV);
				};

				} // end namespace llvm

				#endif // LLVM_TRANSFORMS_SCALAR_TLSVARIABLEHOIST_H
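
The header keeps candidates in a `MapVector`, as suggested in the inline comment, so that one container provides both lookup by key and iteration in insertion order. The sketch below imitates that behaviour with standard containers only. `OrderedCandMap`, `User` and `Candidate` are hypothetical names for illustration, and a string key stands in for `GlobalVariable *`; it is not the LLVM implementation.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for tlshoist::TLSUser: an instruction name and
// the operand index where the TLS variable is used.
struct User {
  std::string Inst;
  unsigned OpndIdx;
};

// Hypothetical stand-in for tlshoist::TLSCandidate.
struct Candidate {
  std::vector<User> Users;
  void addUser(const std::string &Inst, unsigned Idx) {
    Users.push_back(User{Inst, Idx});
  }
};

// Minimal insertion-ordered map: the vector fixes the iteration order,
// the hash map provides lookup by key.
class OrderedCandMap {
public:
  using Entry = std::pair<std::string, Candidate>;
  using const_iterator = std::vector<Entry>::const_iterator;

  // Returns the candidate for Key, creating it on first use. The
  // reference may be invalidated by a later insertion.
  Candidate &operator[](const std::string &Key) {
    auto It = Index.find(Key);
    if (It == Index.end()) {
      It = Index.emplace(Key, Entries.size()).first;
      Entries.emplace_back(Key, Candidate{});
    }
    return Entries[It->second].second;
  }

  const_iterator begin() const { return Entries.begin(); }
  const_iterator end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }

private:
  std::vector<Entry> Entries;
  std::unordered_map<std::string, std::size_t> Index;
};
```

Iterating such a map visits variables in the order they were first seen, so the result does not depend on pointer values or hash order.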

llvm/lib/CodeGen/TargetPassConfig.cpp

@@ void TargetPassConfig::addIRPasses() {
  // the unsupported intrinsic will be replaced with a chain of basic blocks,
  // that stores/loads element one-by-one if the appropriate mask bit is set.
  addPass(createScalarizeMaskedMemIntrinLegacyPass());

  // Expand reduction intrinsics into shuffle sequences if the target wants to.
  // Allow disabling it for testing purposes.
  if (!DisableExpandReductions)
  addPass(createExpandReductionsPass());

+ if (getOptLevel() != CodeGenOpt::None)
+ addPass(createTLSVariableHoistPass());
  }

  /// Turn exception handling constructs into something the code generators can
  /// handle.
  void TargetPassConfig::addPassesToHandleExceptions() {
  const MCAsmInfo *MCAI = TM->getMCAsmInfo();
  assert(MCAI && "No MCAsmInfo");
  switch (MCAI->getExceptionHandlingType()) {

llvm/lib/Passes/PassBuilder.cpp

  #include "llvm/Transforms/Scalar/Scalarizer.h"
  #include "llvm/Transforms/Scalar/SeparateConstOffsetFromGEP.h"
  #include "llvm/Transforms/Scalar/SimpleLoopUnswitch.h"
  #include "llvm/Transforms/Scalar/SimplifyCFG.h"
  #include "llvm/Transforms/Scalar/Sink.h"
  #include "llvm/Transforms/Scalar/SpeculativeExecution.h"
  #include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
  #include "llvm/Transforms/Scalar/StructurizeCFG.h"
+ #include "llvm/Transforms/Scalar/TLSVariableHoist.h"
  #include "llvm/Transforms/Scalar/TailRecursionElimination.h"
  #include "llvm/Transforms/Scalar/WarnMissedTransforms.h"
Inline comment (pengfei, done): `L` is before `a`, so move it ahead.
  #include "llvm/Transforms/Utils/AddDiscriminators.h"
  #include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
  #include "llvm/Transforms/Utils/BreakCriticalEdges.h"
  #include "llvm/Transforms/Utils/CanonicalizeAliases.h"
  #include "llvm/Transforms/Utils/CanonicalizeFreezeInLoops.h"
  #include "llvm/Transforms/Utils/Debugify.h"
  #include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
  #include "llvm/Transforms/Utils/FixIrreducible.h"

llvm/lib/Passes/PassRegistry.def

  FUNCTION_PASS("verify<domtree>", DominatorTreeVerifierPass())
  FUNCTION_PASS("verify<loops>", LoopVerifierPass())
  FUNCTION_PASS("verify<memoryssa>", MemorySSAVerifierPass())
  FUNCTION_PASS("verify<regions>", RegionInfoVerifierPass())
  FUNCTION_PASS("verify<safepoint-ir>", SafepointIRVerifierPass())
  FUNCTION_PASS("verify<scalar-evolution>", ScalarEvolutionVerifierPass())
  FUNCTION_PASS("view-cfg", CFGViewerPass())
  FUNCTION_PASS("view-cfg-only", CFGOnlyViewerPass())
+ FUNCTION_PASS("tlshoist", TLSVariableHoistPass())
  FUNCTION_PASS("transform-warning", WarnMissedTransformationsPass())
  FUNCTION_PASS("tsan", ThreadSanitizerPass())
  FUNCTION_PASS("memprof", MemProfilerPass())
  #undef FUNCTION_PASS

  #ifndef FUNCTION_PASS_WITH_PARAMS
  #define FUNCTION_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
  #endif

llvm/lib/Transforms/Scalar/CMakeLists.txt

@@ add_llvm_component_library(LLVMScalarOpts
  SeparateConstOffsetFromGEP.cpp
  SimpleLoopUnswitch.cpp
  SimplifyCFGPass.cpp
  Sink.cpp
  SpeculativeExecution.cpp
  StraightLineStrengthReduce.cpp
  StructurizeCFG.cpp
  TailRecursionElimination.cpp
+ TLSVariableHoist.cpp
  WarnMissedTransforms.cpp

  ADDITIONAL_HEADER_DIRS
  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms
  ${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms/Scalar

  DEPENDS
  intrinsics_gen

llvm/lib/Transforms/Scalar/Scalar.cpp

@@ void llvm::initializeScalarOpts(PassRegistry &Registry) {
  initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);
  initializeSCCPLegacyPassPass(Registry);
  initializeSROALegacyPassPass(Registry);
  initializeCFGSimplifyPassPass(Registry);
  initializeStructurizeCFGLegacyPassPass(Registry);
  initializeSimpleLoopUnswitchLegacyPassPass(Registry);
  initializeSinkingLegacyPassPass(Registry);
  initializeTailCallElimPass(Registry);
+ initializeTLSVariableHoistLegacyPassPass(Registry);
  initializeSeparateConstOffsetFromGEPLegacyPassPass(Registry);
  initializeSpeculativeExecutionLegacyPassPass(Registry);
  initializeStraightLineStrengthReduceLegacyPassPass(Registry);
  initializePlaceBackedgeSafepointsImplPass(Registry);
  initializePlaceSafepointsPass(Registry);
  initializeFloat2IntLegacyPassPass(Registry);
  initializeLoopDistributeLegacyPass(Registry);
  initializeLoopLoadEliminationPass(Registry);

llvm/lib/Transforms/Scalar/TLSVariableHoist.cpp

This file was added.

				//===- TLSVariableHoist.cpp -------- Remove Redundant TLS Loads ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass identifies/eliminate Redundant TLS Loads if related option is set.
				// The example: Please refer to the comment at the head of TLSVariableHoist.h.
				//
				Inline comment (craig.topper, done): exmaple -> example
				Inline comment (craig.topper, done): PLS -> Please?
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/IR/BasicBlock.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/InstrTypes.h"
				#include "llvm/IR/Instruction.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/Value.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Casting.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Scalar/TLSVariableHoist.h"
				#include <algorithm>
				Inline comment (xiangzhangllvm, author, done): I don't know why clang-format moves it ahead; it seems to be a clang-format bug. Let me re-run clang-format on the other places.
				#include <cassert>
				#include <cstdint>
				#include <iterator>
				#include <tuple>
				#include <utility>

				using namespace llvm;
				using namespace tlshoist;

				#define DEBUG_TYPE "tlshoist"

				// TODO: Support "strict" model if we need to strictly load TLS address,
				// because "non-optimize" may also do some optimization in other passes.
				static cl::opt<std::string> TLSLoadHoist(
				"tls-load-hoist",
				cl::desc(
				"hoist the TLS loads in PIC model: "
				"tls-load-hoist=optimize: Eleminate redundant TLS load(s)."
				"tls-load-hoist=strict: Strictly load TLS address before every use."
				"tls-load-hoist=non-optimize: Generally load TLS before use(s)."),
				cl::init("non-optimize"), cl::Hidden);

				Inline comment (lebedev.ri, not done): This should be an enum, not strings.
				Inline comment (xiangzhangllvm, author, done): Yes, we usually use an enum here, but I find a string better/easier because we don't need to define the enum (sometimes we would have to define it two or more times if we want to use it in clang, for example to translate a clang option into "-mllvm -tls-load-hoist=xxx").
				Inline comment (efriedma, not done): Why is this off by default? Do you plan to turn it on by default in a followup?
				Inline comment (xiangzhangllvm, author, done): Yes, as I answered before, the last patch [3/3] will turn it on and update the affected tests.
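
One possible middle ground for the enum-versus-string question, sketched in plain C++ without the `cl::opt` API: keep the string at the command-line boundary and convert it to an enum once, so the rest of the code does not compare strings. `TLSHoistMode` and `parseTLSHoistMode` are hypothetical names; the patch itself keeps the option as a `std::string`.

```cpp
#include <string>

// Hypothetical enum mirroring the three documented option values, plus
// a value for anything unrecognized.
enum class TLSHoistMode { Optimize, Strict, NonOptimize, Invalid };

// Converts the option text to the enum exactly once.
TLSHoistMode parseTLSHoistMode(const std::string &S) {
  if (S == "optimize")
    return TLSHoistMode::Optimize;
  if (S == "strict")
    return TLSHoistMode::Strict;
  if (S == "non-optimize")
    return TLSHoistMode::NonOptimize;
  return TLSHoistMode::Invalid;
}
```

With this shape a misspelled value becomes an explicit `Invalid` case instead of silently behaving like another mode.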
				namespace {

				/// The TLS Variable hoist pass.
				class TLSVariableHoistLegacyPass : public FunctionPass {
				public:
				static char ID; // Pass identification, replacement for typeid

				TLSVariableHoistLegacyPass() : FunctionPass(ID) {
				initializeTLSVariableHoistLegacyPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &Fn) override;

				StringRef getPassName() const override { return "TLS Variable Hoist"; }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();
				}

				private:
				TLSVariableHoistPass Impl;
				};

				} // end anonymous namespace

				char TLSVariableHoistLegacyPass::ID = 0;

				INITIALIZE_PASS_BEGIN(TLSVariableHoistLegacyPass, "tlshoist",
				"TLS Variable Hoist", false, false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
				INITIALIZE_PASS_END(TLSVariableHoistLegacyPass, "tlshoist",
				"TLS Variable Hoist", false, false)

				FunctionPass *llvm::createTLSVariableHoistPass() {
				return new TLSVariableHoistLegacyPass();
				}

				/// Perform the TLS Variable Hoist optimization for the given function.
				bool TLSVariableHoistLegacyPass::runOnFunction(Function &Fn) {
				if (skipFunction(Fn))
				return false;

				LLVM_DEBUG(dbgs() << "******** Begin TLS Variable Hoist ********\n");
				LLVM_DEBUG(dbgs() << "********** Function: " << Fn.getName() << '\n');

				bool MadeChange =
				Impl.runImpl(Fn, getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
				getAnalysis<LoopInfoWrapperPass>().getLoopInfo());

				if (MadeChange) {
				LLVM_DEBUG(dbgs() << "********** Function after TLS Variable Hoist: "
				<< Fn.getName() << '\n');
				LLVM_DEBUG(dbgs() << Fn);
				}
				LLVM_DEBUG(dbgs() << "******** End TLS Variable Hoist ********\n");

				return MadeChange;
				}

				void TLSVariableHoistPass::collectTLSCandidate(Instruction *Inst) {
				// Skip all cast instructions. They are visited indirectly later on.
				if (Inst->isCast())
				return;

				// Scan all operands.
				for (unsigned Idx = 0, E = Inst->getNumOperands(); Idx != E; ++Idx) {
				auto *GV = dyn_cast<GlobalVariable>(Inst->getOperand(Idx));
				if (!GV || !GV->isThreadLocal())
				continue;

				// Add Candidate to TLSCandMap (GV --> Candidate).
				craig.topper: Merge into the previous if with ||.
				TLSCandMap[GV].addUser(Inst, Idx);
				}
				}
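The `TLSCandMap[GV].addUser(Inst, Idx)` line relies on `operator[]` default-constructing a missing entry, as discussed in the review. A minimal sketch of that behavior follows, using `std::map` as a stand-in for LLVM's `MapVector` and strings in place of IR objects; `Candidate` and `recordUse` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for tlshoist::TLSCandidate; not the patch's type.
struct Candidate {
  std::vector<std::pair<std::string, unsigned>> Users;
  void addUser(const std::string &Inst, unsigned Idx) {
    Users.emplace_back(Inst, Idx);
  }
};

// Records one use. operator[] inserts an empty Candidate the first time a
// key is seen, so no explicit count()/find() check is needed beforehand.
inline void recordUse(std::map<std::string, Candidate> &CandMap,
                      const std::string &GV, const std::string &Inst,
                      unsigned Idx) {
  CandMap[GV].addUser(Inst, Idx);
}
```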

				void TLSVariableHoistPass::collectTLSCandidates(Function &Fn) {
				craig.topper: The GV field in TLSCandidate isn't assigned, is it?
				xiangzhangllvm (Author): Yes, I should remove the "GV" field from TLSCandidate.
				// First, quickly check if there is TLS Variable.
				Module *M = Fn.getParent();

				bool HasTLS = llvm::any_of(
				craig.topper: This line should work for the case when GV isn't already in the map. The operator[] on the MapVector will default-construct a TLS candidate before calling addUser, so you don't need to check TLSCandMap.count.
				xiangzhangllvm (Author): Makes sense!
				M->globals(), [](GlobalVariable &GV) { return GV.isThreadLocal(); });
				craig.topper: `TLSCandMap[GV].addUser(Inst, Idx);` works even if GV isn't in the map. The entry will be default constructed before addUser is called.
				xiangzhangllvm (Author): Yes, so here the "else" means TLSCandMap.count(GV) != 0 (GV is in the map).

				// If there is none, return directly.
				if (!HasTLS)
				return;

				TLSCandMap.clear();

				// Then, collect TLS Variable info.
				for (BasicBlock &BB : Fn) {
				craig.topper: Can we use llvm::any_of here?
				xiangzhangllvm (Author): Yes, I checked it and it works well; I had never used it before : )
				// Ignore unreachable basic blocks.
				if (!DT->isReachableFromEntry(&BB))
				continue;

				for (Instruction &Inst : BB)
				collectTLSCandidate(&Inst);
				}
				}
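The early exit at the top of `collectTLSCandidates` uses `llvm::any_of`, a range-based wrapper around the standard algorithm. A minimal standalone sketch of the same check with `std::any_of` is shown here; the `Global` struct and `hasThreadLocal` function are hypothetical stand-ins, not LLVM types.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for a module-level global variable.
struct Global {
  std::string Name;
  bool ThreadLocal;
};

// Returns true if at least one global is thread-local. std::any_of stops
// at the first element for which the predicate returns true.
inline bool hasThreadLocal(const std::vector<Global> &Globals) {
  return std::any_of(Globals.begin(), Globals.end(),
                     [](const Global &G) { return G.ThreadLocal; });
}
```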

				static bool oneUseOutsideLoop(tlshoist::TLSCandidate &Cand, LoopInfo *LI) {
				if (Cand.Users.size() != 1)
				craig.topper: Method names should be lowercase.
				return false;

				BasicBlock *BB = Cand.Users[0].Inst->getParent();
				if (LI->getLoopFor(BB))
				return false;

				return true;
				}

				Instruction *TLSVariableHoistPass::getNearestLoopDomInst(BasicBlock *BB,
				Loop *L) {
				assert(L && "Unexpected Loop status!");
				pengfei: You need to run clang-format on your new code.

				// Get the outermost loop.
				while (Loop *Parent = L->getParentLoop())
				L = Parent;

				BasicBlock *PreHeader = L->getLoopPreheader();

				// There is a unique predecessor outside the loop.
				if (PreHeader)
				return PreHeader->getTerminator();

				BasicBlock *Header = L->getHeader();
				BasicBlock *Dom = Header;
				for (BasicBlock *PredBB : predecessors(Header))
				Dom = DT->findNearestCommonDominator(Dom, PredBB);

				assert(Dom && "Dominator BB not found!");
				Instruction *Term = Dom->getTerminator();
				craig.topper: Is this before the allocas?
				xiangzhangllvm (Author): Sorry, I don't quite understand. What is the problem if it is before the allocas? This is at the IR level, and the global value does not need allocas.
				craig.topper: The alloca instructions for the function's local variables are the first instructions in the entry basic block. Not sure if we should be putting the bitcast before them.
				xiangzhangllvm (Author): OK, let me reposition the bitcast.
				LuoYuanke: Would you add a test case with an alloca for checking?
				xiangzhangllvm (Author): Yes, no problem.
				xiangzhangllvm (Author): Let me refine this a little later; I am considering getting the insert position by checking the dominance relation, to reduce the live range.

				return Term;
				}

				craig.topper: Why not pass the Loop* you already looked up in the caller? Why get the loop again?
				Instruction *TLSVariableHoistPass::getDomInst(Instruction *I1,
				Instruction *I2) {
				if (!I1)
				return I2;
				craig.topper: outmost -> outermost
				if (DT->dominates(I1, I2))
				return I1;
				if (DT->dominates(I2, I1))
				return I2;
				craig.topper: Predecessor and PreHeader aren't quite the same thing. A PreHeader also has the loop as its only successor.
				xiangzhangllvm (Author): Sorry, I'm not quite clear about this. My understanding is that the PreHeader here is not in the loop (currently it is the outermost loop), so if the loop has a PreHeader, we can insert the bitcast instruction directly in the PreHeader, because the PreHeader must dominate the loop (it is the only way into the loop): PreHeader -> Loop.
				craig.topper: There is a function called getLoopPredecessor and another called getLoopPreheader. You called the former, but named the variable as if you had called the latter.
				xiangzhangllvm (Author): Oh! That's really a big mistake! Thanks so much!

				// If there is no dominance relation, use common dominator.
				craig.topper: If the Preheader exists, it isn't empty. It will always have a branch to the loop. There are no fallthroughs in IR. So the terminator will not be nullptr.
				xiangzhangllvm (Author): Yes, that seems to make sense. Just a question: in which case will there be an empty BB at the IR level? (Even the last BB I see always ends with a ret instruction.) I checked BasicBlock::getTerminator(), and it can return nullptr: const Instruction *BasicBlock::getTerminator() const { if (InstList.empty() || !InstList.back().isTerminator()) return nullptr; return &InstList.back(); }
				craig.topper: It can happen when the basic block is first created and not connected to the CFG. But if it's connected, it has a terminator. The successor list of a basic block is stored directly in the terminator instruction. The predecessor list is found by iterating all of the users of the BasicBlock* and looking at which uses are terminator instructions.
				xiangzhangllvm (Author): Thanks for your explanation!
				BasicBlock *DomBB =
				craig.topper: I'm not sure that's true. Control flow in IR is explicit, there is no fallthrough. If it's a predecessor it must have a branch instruction of some kind.
				DT->findNearestCommonDominator(I1->getParent(), I2->getParent());

				Instruction *Dom = DomBB->getTerminator();
				assert(Dom && "Common dominator not found!");

				return Dom;
				}
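The selection logic in `getDomInst` (take whichever position dominates the other, otherwise fall back to the nearest common dominator) can be sketched on a toy dominator tree. This is an illustration under stated assumptions, not LLVM's `DominatorTree`: blocks are integers, `IDom[B]` is the immediate dominator of block `B`, the entry block is its own immediate dominator, and `ToyDomTree` and `pickDominatingBlock` are hypothetical names. It works at block granularity only, whereas the patch compares instructions.

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree. IDom[B] is the immediate dominator of block B; the
// entry block is the one whose immediate dominator is itself.
struct ToyDomTree {
  std::vector<int> IDom;

  // Number of edges from B up to the entry block.
  int depth(int B) const {
    int D = 0;
    while (B != IDom[B]) {
      B = IDom[B];
      ++D;
    }
    return D;
  }

  // A dominates B if A is B itself or an ancestor of B in the tree.
  bool dominates(int A, int B) const {
    while (true) {
      if (A == B)
        return true;
      if (B == IDom[B])
        return false;
      B = IDom[B];
    }
  }

  // Lift the deeper block to the same depth, then walk both up together.
  int nearestCommonDominator(int A, int B) const {
    int DA = depth(A), DB = depth(B);
    while (DA > DB) {
      A = IDom[A];
      --DA;
    }
    while (DB > DA) {
      B = IDom[B];
      --DB;
    }
    while (A != B) {
      A = IDom[A];
      B = IDom[B];
    }
    return A;
  }
};

// Block-level analogue of getDomInst: prefer a block that already
// dominates the other; otherwise use the nearest common dominator.
inline int pickDominatingBlock(const ToyDomTree &DT, int B1, int B2) {
  if (DT.dominates(B1, B2))
    return B1;
  if (DT.dominates(B2, B1))
    return B2;
  return DT.nearestCommonDominator(B1, B2);
}
```

For a diamond-shaped CFG (entry 0 branching to blocks 1 and 2), neither side dominates the other, so the chosen position falls back to the entry block; in the patch the insertion point is then that block's terminator.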

				BasicBlock::iterator TLSVariableHoistPass::findInsertPos(Function &Fn,
				GlobalVariable *GV,
				BasicBlock *&PosBB) {
				tlshoist::TLSCandidate &Cand = TLSCandMap[GV];

				// We should hoist the TLS use out of the loop, so choose the nearest
				// instruction that dominates the loop and its enclosing loops (if any).
				Instruction *LastPos = nullptr;
				for (auto &User : Cand.Users) {
				craig.topper: `Replaced |= tryReplaceTLSCandidate(Fn, GV);`
				xiangzhangllvm (Author): I think "|=" is a bitwise operation. Here the value is a bool, so the standard way is "||".
				craig.topper: `MadeChange |=` is a frequent pattern used by passes in llvm.
				BasicBlock *BB = User.Inst->getParent();
				Instruction *Pos = User.Inst;
				if (Loop *L = LI->getLoopFor(BB)) {
				Pos = getNearestLoopDomInst(BB, L);
				assert(Pos && "Insert position outside the loop not found!");
				craig.topper: This might be handled by the pass manager for the new pass manager. For the old pass manager it is part of skipFunction.
				xiangzhangllvm (Author): Yes, both pass managers will call runImpl; it is called by "TLSVariableHoistPass::run" and "TLSVariableHoistLegacyPass::runOnFunction" (I forget which is new and which is old).
				}
				Pos = getDomInst(LastPos, Pos);
				LastPos = Pos;
				craig.topper: Should we call skipFunction() for opt-bisect-limit and optnone support?
				xiangzhangllvm (Author): 1. Yes, we should consider skipFunction, good catch! 2. With optnone, this pass will not be created in TargetPassConfig.cpp.
				craig.topper: There is an optnone function attribute, which is different from the global opt level.
				xiangzhangllvm (Author): OK, let me check it again. As for skipFunction, I can't call it here because TLSVariableHoistPass does not inherit from FunctionPass, but I already check it in TLSVariableHoistLegacyPass::runOnFunction.
				}

				craig.topper: Do we normally use capitalized strings?
				xiangzhangllvm (Author): I checked other similar uses; we normally do not use capitalized strings. Let me change it, thanks a lot!
				assert(LastPos && "Unexpected insert position!");
				BasicBlock *Parent = LastPos->getParent();
				PosBB = Parent;
				return LastPos->getIterator();
				}

				// Generate a bitcast (no type change) to replace the uses of TLS Candidate.
				Instruction *TLSVariableHoistPass::genBitCastInst(Function &Fn,
				GlobalVariable *GV) {
				BasicBlock *PosBB = &Fn.getEntryBlock();
				BasicBlock::iterator Iter = findInsertPos(Fn, GV, PosBB);
				craig.topper: Why would DT be null? The pass has DominatorTree as a requirement.
				xiangzhangllvm (Author): I am not sure whether the requirement guarantees that a DominatorTree is successfully built. Or shall I change this to assert(DT)?
				craig.topper: You can use assert(DT).
				Type *Ty = GV->getType();
				auto *CastInst = new BitCastInst(GV, Ty, "tls_bitcast");
				PosBB->getInstList().insert(Iter, CastInst);
				return CastInst;
				}

				bool TLSVariableHoistPass::tryReplaceTLSCandidate(Function &Fn,
				GlobalVariable *GV) {

				craig.topper: Why would LI be null? The pass has it as a requirement.
				xiangzhangllvm (Author): The same as with DT, thanks for your review!
				tlshoist::TLSCandidate &Cand = TLSCandMap[GV];

				// If it is used only once and not in a loop, there is no need to replace it.
				if (oneUseOutsideLoop(Cand, LI))
				return false;

				// Generate a bitcast (no type change)
				auto *CastInst = genBitCastInst(Fn, GV);

				// to replace the uses of TLS Candidate
				for (auto &User : Cand.Users)
				User.Inst->setOperand(User.OpndIdx, CastInst);

				return true;
				}

				bool TLSVariableHoistPass::tryReplaceTLSCandidates(Function &Fn) {
				if (TLSCandMap.empty())
				return false;

				bool Replaced = false;
				for (auto &GV2Cand : TLSCandMap) {
				GlobalVariable *GV = GV2Cand.first;
				Replaced |= tryReplaceTLSCandidate(Fn, GV);
				}

				return Replaced;
				}
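The choice of `|=` over `||` for the `Replaced` accumulator is not only stylistic. A minimal sketch of the difference follows, assuming the alternative would have been written as `Changed = Changed || step()`; `Step`, `runAllBitOr`, and `runAllLogicalOr` are hypothetical names for illustration. With `|=` the right-hand side is always evaluated, so every candidate is processed; with `||` evaluation short-circuits once the flag is true and later candidates are skipped.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-candidate step: reports whether it changed anything
// and counts how many times it was invoked.
struct Step {
  int Calls = 0;
  bool run(bool WillChange) {
    ++Calls;
    return WillChange;
  }
};

// Accumulate with |=: run() is called for every element.
inline bool runAllBitOr(Step &S, const std::vector<bool> &Work) {
  bool Changed = false;
  for (bool W : Work)
    Changed |= S.run(W);
  return Changed;
}

// Accumulate with ||: once Changed is true, run() is no longer called.
inline bool runAllLogicalOr(Step &S, const std::vector<bool> &Work) {
  bool Changed = false;
  for (bool W : Work)
    Changed = Changed || S.run(W);
  return Changed;
}
```

Both variants return the same flag, but only the `|=` form guarantees that each TLS candidate is actually visited.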

				/// Optimize expensive TLS variables in the given function.
				bool TLSVariableHoistPass::runImpl(Function &Fn, DominatorTree &DT,
				LoopInfo &LI) {
				if (Fn.hasOptNone())
				return false;

				if (TLSLoadHoist != "optimize" &&
				!Fn.getAttributes().hasFnAttr("tls-load-hoist"))
				return false;

				this->LI = &LI;
				this->DT = &DT;
				assert(this->LI && this->DT && "Unexpected requirement!");

				// Collect all TLS variable candidates.
				collectTLSCandidates(Fn);

				bool MadeChange = tryReplaceTLSCandidates(Fn);

				return MadeChange;
				}

				PreservedAnalyses TLSVariableHoistPass::run(Function &F,
				FunctionAnalysisManager &AM) {

				auto &LI = AM.getResult<LoopAnalysis>(F);
				auto &DT = AM.getResult<DominatorTreeAnalysis>(F);

				if (!runImpl(F, DT, LI))
				return PreservedAnalyses::all();

				PreservedAnalyses PA;
				PA.preserveSet<CFGAnalyses>();
				return PA;
				}

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	  ; CHECK-NEXT: Branch Probability Analysis
	  ; CHECK-NEXT: Block Frequency Analysis
	  ; CHECK-NEXT: Constant Hoisting
	  ; CHECK-NEXT: Replace intrinsics with calls to vector library
	  ; CHECK-NEXT: Partially inline calls to library functions
	  ; CHECK-NEXT: Expand vector predication intrinsics
	  ; CHECK-NEXT: Scalarize Masked Memory Intrinsics
	  ; CHECK-NEXT: Expand reduction intrinsics
	+ ; CHECK-NEXT: Natural Loop Information
	+ ; CHECK-NEXT: TLS Variable Hoist
	  ; CHECK-NEXT: Stack Safety Analysis
	  ; CHECK-NEXT: FunctionPass Manager
	  ; CHECK-NEXT: Dominator Tree Construction
	  ; CHECK-NEXT: Natural Loop Information
				pengfei: Maybe preserve loop info too. The pass does nothing with it. This may spare the following passes from running it again.
				xiangzhangllvm (Author): This pass changes the instructions, so it is better for the BB info to be updated.
	  ; CHECK-NEXT: Scalar Evolution Analysis
	  ; CHECK-NEXT: Stack Safety Local Analysis
	  ; CHECK-NEXT: FunctionPass Manager
	  ; CHECK-NEXT: Dominator Tree Construction
	  ; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	  ; CHECK-NEXT: Function Alias Analysis Results
	  ; CHECK-NEXT: AArch64 Stack Tagging
	  ; CHECK-NEXT: Function Alias Analysis Results

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	  ; GCN-O1-NEXT: Branch Probability Analysis
	  ; GCN-O1-NEXT: Block Frequency Analysis
	  ; GCN-O1-NEXT: Constant Hoisting
	  ; GCN-O1-NEXT: Replace intrinsics with calls to vector library
	  ; GCN-O1-NEXT: Partially inline calls to library functions
	  ; GCN-O1-NEXT: Expand vector predication intrinsics
	  ; GCN-O1-NEXT: Scalarize Masked Memory Intrinsics
	  ; GCN-O1-NEXT: Expand reduction intrinsics
	+ ; GCN-O1-NEXT: Natural Loop Information
	+ ; GCN-O1-NEXT: TLS Variable Hoist
	  ; GCN-O1-NEXT: AMDGPU Attributor
	  ; GCN-O1-NEXT: CallGraph Construction
	  ; GCN-O1-NEXT: Call Graph SCC Pass Manager
	  ; GCN-O1-NEXT: AMDGPU Annotate Kernel Features
	  ; GCN-O1-NEXT: FunctionPass Manager
	  ; GCN-O1-NEXT: AMDGPU Lower Kernel Arguments
	  ; GCN-O1-NEXT: Dominator Tree Construction
	  ; GCN-O1-NEXT: Natural Loop Information
	  ...
	  ; GCN-O1-OPTS-NEXT: Branch Probability Analysis
	  ; GCN-O1-OPTS-NEXT: Block Frequency Analysis
	  ; GCN-O1-OPTS-NEXT: Constant Hoisting
	  ; GCN-O1-OPTS-NEXT: Replace intrinsics with calls to vector library
	  ; GCN-O1-OPTS-NEXT: Partially inline calls to library functions
	  ; GCN-O1-OPTS-NEXT: Expand vector predication intrinsics
	  ; GCN-O1-OPTS-NEXT: Scalarize Masked Memory Intrinsics
	  ; GCN-O1-OPTS-NEXT: Expand reduction intrinsics
	+ ; GCN-O1-OPTS-NEXT: Natural Loop Information
	+ ; GCN-O1-OPTS-NEXT: TLS Variable Hoist
	  ; GCN-O1-OPTS-NEXT: Early CSE
	  ; GCN-O1-OPTS-NEXT: AMDGPU Attributor
	  ; GCN-O1-OPTS-NEXT: CallGraph Construction
	  ; GCN-O1-OPTS-NEXT: Call Graph SCC Pass Manager
	  ; GCN-O1-OPTS-NEXT: AMDGPU Annotate Kernel Features
	  ; GCN-O1-OPTS-NEXT: FunctionPass Manager
	  ; GCN-O1-OPTS-NEXT: AMDGPU Lower Kernel Arguments
	  ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
	  ...
	  ; GCN-O2-NEXT: Branch Probability Analysis
	  ; GCN-O2-NEXT: Block Frequency Analysis
	  ; GCN-O2-NEXT: Constant Hoisting
	  ; GCN-O2-NEXT: Replace intrinsics with calls to vector library
	  ; GCN-O2-NEXT: Partially inline calls to library functions
	  ; GCN-O2-NEXT: Expand vector predication intrinsics
	  ; GCN-O2-NEXT: Scalarize Masked Memory Intrinsics
	  ; GCN-O2-NEXT: Expand reduction intrinsics
	+ ; GCN-O2-NEXT: Natural Loop Information
	+ ; GCN-O2-NEXT: TLS Variable Hoist
	  ; GCN-O2-NEXT: Early CSE
	  ; GCN-O2-NEXT: AMDGPU Attributor
	  ; GCN-O2-NEXT: CallGraph Construction
	  ; GCN-O2-NEXT: Call Graph SCC Pass Manager
	  ; GCN-O2-NEXT: AMDGPU Annotate Kernel Features
	  ; GCN-O2-NEXT: FunctionPass Manager
	  ; GCN-O2-NEXT: AMDGPU Lower Kernel Arguments
	  ; GCN-O2-NEXT: Dominator Tree Construction
	  ...
	  ; GCN-O3-NEXT: Block Frequency Analysis
	  ; GCN-O3-NEXT: Constant Hoisting
	  ; GCN-O3-NEXT: Replace intrinsics with calls to vector library
	  ; GCN-O3-NEXT: Partially inline calls to library functions
	  ; GCN-O3-NEXT: Expand vector predication intrinsics
	  ; GCN-O3-NEXT: Scalarize Masked Memory Intrinsics
	  ; GCN-O3-NEXT: Expand reduction intrinsics
	  ; GCN-O3-NEXT: Natural Loop Information
	+ ; GCN-O3-NEXT: TLS Variable Hoist
	  ; GCN-O3-NEXT: Phi Values Analysis
	  ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
	  ; GCN-O3-NEXT: Function Alias Analysis Results
	  ; GCN-O3-NEXT: Memory Dependence Analysis
	  ; GCN-O3-NEXT: Lazy Branch Probability Analysis
	  ; GCN-O3-NEXT: Lazy Block Frequency Analysis
	  ; GCN-O3-NEXT: Optimization Remark Emitter
	  ; GCN-O3-NEXT: Global Value Numbering

llvm/test/CodeGen/ARM/O3-pipeline.ll

	  ; CHECK-NEXT: Block Frequency Analysis
	  ; CHECK-NEXT: Constant Hoisting
	  ; CHECK-NEXT: Replace intrinsics with calls to vector library
	  ; CHECK-NEXT: Partially inline calls to library functions
	  ; CHECK-NEXT: Expand vector predication intrinsics
	  ; CHECK-NEXT: Scalarize Masked Memory Intrinsics
	  ; CHECK-NEXT: Expand reduction intrinsics
	  ; CHECK-NEXT: Natural Loop Information
	+ ; CHECK-NEXT: TLS Variable Hoist
	  ; CHECK-NEXT: Scalar Evolution Analysis
	  ; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	  ; CHECK-NEXT: Function Alias Analysis Results
	  ; CHECK-NEXT: Transform functions to use DSP intrinsics
	  ; CHECK-NEXT: Interleaved Access Pass
	  ; CHECK-NEXT: Type Promotion
	  ; CHECK-NEXT: CodeGen Prepare
	  ; CHECK-NEXT: Dominator Tree Construction

llvm/test/CodeGen/PowerPC/O3-pipeline.ll

	  ; CHECK-NEXT: Block Frequency Analysis
	  ; CHECK-NEXT: Constant Hoisting
	  ; CHECK-NEXT: Replace intrinsics with calls to vector library
	  ; CHECK-NEXT: Partially inline calls to library functions
	  ; CHECK-NEXT: Expand vector predication intrinsics
	  ; CHECK-NEXT: Scalarize Masked Memory Intrinsics
	  ; CHECK-NEXT: Expand reduction intrinsics
	  ; CHECK-NEXT: Natural Loop Information
	+ ; CHECK-NEXT: TLS Variable Hoist
	  ; CHECK-NEXT: CodeGen Prepare
	  ; CHECK-NEXT: Dominator Tree Construction
	  ; CHECK-NEXT: Exception handling preparation
	  ; CHECK-NEXT: Natural Loop Information
	  ; CHECK-NEXT: Scalar Evolution Analysis
	  ; CHECK-NEXT: Prepare loop for ppc preferred instruction forms
	  ; CHECK-NEXT: Scalar Evolution Analysis
	  ; CHECK-NEXT: Lazy Branch Probability Analysis

llvm/test/CodeGen/X86/opt-pipeline.ll

	Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Branch Probability Analysis			; CHECK-NEXT: Branch Probability Analysis
	; CHECK-NEXT: Block Frequency Analysis			; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Constant Hoisting			; CHECK-NEXT: Constant Hoisting
	; CHECK-NEXT: Replace intrinsics with calls to vector library			; CHECK-NEXT: Replace intrinsics with calls to vector library
	; CHECK-NEXT: Partially inline calls to library functions			; CHECK-NEXT: Partially inline calls to library functions
	; CHECK-NEXT: Expand vector predication intrinsics			; CHECK-NEXT: Expand vector predication intrinsics
	; CHECK-NEXT: Scalarize Masked Memory Intrinsics			; CHECK-NEXT: Scalarize Masked Memory Intrinsics
	; CHECK-NEXT: Expand reduction intrinsics			; CHECK-NEXT: Expand reduction intrinsics
				; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: TLS Variable Hoist
	; CHECK-NEXT: Interleaved Access Pass			; CHECK-NEXT: Interleaved Access Pass
	; CHECK-NEXT: X86 Partial Reduction			; CHECK-NEXT: X86 Partial Reduction
	; CHECK-NEXT: Expand indirectbr instructions			; CHECK-NEXT: Expand indirectbr instructions
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: CodeGen Prepare			; CHECK-NEXT: CodeGen Prepare
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Exception handling preparation			; CHECK-NEXT: Exception handling preparation
	; CHECK-NEXT: Safe Stack instrumentation pass			; CHECK-NEXT: Safe Stack instrumentation pass

llvm/test/CodeGen/X86/tls-loads-control.ll

This file was added.

				; RUN: llc -mtriple=x86_64-unknown-unknown -O2 --relocation-model=pic --tls-load-hoist=optimize --stop-after=tlshoist -o - %s \| FileCheck %s
				; RUN: llc -mtriple=x86_64-unknown-unknown -O2 --relocation-model=pic --stop-after=tlshoist -o - %s \| FileCheck %s
				pengfei: Not needed, since both RUN lines check the same thing.
				xiangzhangllvm: The test contains "!3 = !{i32 1, !"tls-load-control", !"Optimize"}", which has the same functionality as "--tls-load-control=Optimize". But testing both of them here does no harm, I think.

				pengfei: ditto.
				; This test comes from compiling clang/test/CodeGen/intel/tls_loads.cpp with:
				; (clang tls_loads.cpp -fPIC -ftls-model=global-dynamic -O2 -S -emit-llvm)

				; // Variable declaration and definition:
				; thread_local int thl_x;
				; thread_local int thl_x2;
				;
				; struct SS {
				; char thl_c;
				; int num;
				; };
				pengfei: What are they used for?
				xiangzhangllvm: This test is generated directly from a clang test that I'll commit later. Keeping it as the raw output of the source file (which I note in the comment at line 8) makes it easy to check where it came from, so I didn't manually change it further.
				;
				; int gfunc();
				; int gfunc2(int);

				; // First function (@_Z2f1i):
				; int f1(int c) {
				; while (c)
				; c++;
				;
				; int *px = &thl_x;
				; c -= gfunc();
				;
				; while(c++) {
				; c = gfunc();
				; while (c--)
				; *px += gfunc2(thl_x2);
				; }
				; return *px;
				; }

				$_ZTW5thl_x = comdat any

				$_ZTW6thl_x2 = comdat any

				@thl_x = thread_local global i32 0, align 4
				@thl_x2 = thread_local global i32 0, align 4
				@_ZZ2f2iE2st.0 = internal thread_local unnamed_addr global i8 0, align 4
				@_ZZ2f2iE2st.1 = internal thread_local unnamed_addr global i32 0, align 4

				; Function Attrs: mustprogress uwtable
				define noundef i32 @_Z2f1i(i32 noundef %c) local_unnamed_addr #0 {
				; CHECK-LABEL: _Z2f1i
				; CHECK: entry:
				; CHECK-NEXT: %call = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %phi.cmp = icmp eq i32 %call, 0
				; CHECK-NEXT: %tls_bitcast1 = bitcast i32* @thl_x to i32*
				; CHECK-NEXT: br i1 %phi.cmp, label %while.end11, label %while.body4.preheader

				; CHECK: while.body4.preheader:
				; CHECK-NEXT: %tls_bitcast = bitcast i32* @thl_x2 to i32*
				; CHECK-NEXT: br label %while.body4

				; CHECK: while.body4:
				; CHECK-NEXT: %call5 = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %tobool7.not18 = icmp eq i32 %call5, 0
				; CHECK-NEXT: br i1 %tobool7.not18, label %while.body4.backedge, label %while.body8.preheader

				; CHECK: while.body8.preheader:
				; CHECK-NEXT: br label %while.body8

				; CHECK: while.body4.backedge.loopexit:
				; CHECK-NEXT: br label %while.body4.backedge

				; CHECK: while.body4.backedge:
				; CHECK-NEXT: br label %while.body4, !llvm.loop !4

				; CHECK: while.body8:
				; CHECK-NEXT: %c.addr.219 = phi i32 [ %dec, %while.body8 ], [ %call5, %while.body8.preheader ]
				; CHECK-NEXT: %dec = add i32 %c.addr.219, -1
				; CHECK-NEXT: %0 = load i32, i32* %tls_bitcast, align 4
				; CHECK-NEXT: %call9 = tail call noundef i32 @_Z6gfunc2i(i32 noundef %0)
				; CHECK-NEXT: %1 = load i32, i32* %tls_bitcast1, align 4
				; CHECK-NEXT: %add = add nsw i32 %1, %call9
				; CHECK-NEXT: store i32 %add, i32* %tls_bitcast1, align 4
				; CHECK-NEXT: %tobool7.not = icmp eq i32 %dec, 0
				; CHECK-NEXT: br i1 %tobool7.not, label %while.body4.backedge.loopexit, label %while.body8, !llvm.loop !4

				; CHECK: while.end11:
				; CHECK-NEXT: %2 = load i32, i32* %tls_bitcast1, align 4
				; CHECK-NEXT: ret i32 %2

				entry:
				%call = tail call noundef i32 @_Z5gfuncv()
				%phi.cmp = icmp eq i32 %call, 0
				br i1 %phi.cmp, label %while.end11, label %while.body4

				while.body4: ; preds = %entry, %while.body4.backedge
				%call5 = tail call noundef i32 @_Z5gfuncv()
				%tobool7.not18 = icmp eq i32 %call5, 0
				br i1 %tobool7.not18, label %while.body4.backedge, label %while.body8

				while.body4.backedge: ; preds = %while.body8, %while.body4
				br label %while.body4, !llvm.loop !4

				while.body8: ; preds = %while.body4, %while.body8
				%c.addr.219 = phi i32 [ %dec, %while.body8 ], [ %call5, %while.body4 ]
				%dec = add nsw i32 %c.addr.219, -1
				%0 = load i32, i32* @thl_x2, align 4
				%call9 = tail call noundef i32 @_Z6gfunc2i(i32 noundef %0)
				%1 = load i32, i32* @thl_x, align 4
				%add = add nsw i32 %1, %call9
				store i32 %add, i32* @thl_x, align 4
				%tobool7.not = icmp eq i32 %dec, 0
				br i1 %tobool7.not, label %while.body4.backedge, label %while.body8, !llvm.loop !4

				while.end11: ; preds = %entry
				%2 = load i32, i32* @thl_x, align 4
				ret i32 %2
				}

				; // Second function (@_Z2f2i):
				; int f2(int c) {
				; thread_local struct SS st;
				; c += gfunc();
				; while (c--) {
				; thl_x += gfunc();
				; st.thl_c += (char)gfunc();
				; st.num += gfunc();
				; }
				; return thl_x;
				; }
				declare noundef i32 @_Z5gfuncv() local_unnamed_addr #1

				declare noundef i32 @_Z6gfunc2i(i32 noundef) local_unnamed_addr #1

				; Function Attrs: mustprogress uwtable
				define noundef i32 @_Z2f2i(i32 noundef %c) local_unnamed_addr #0 {
				; CHECK-LABEL: _Z2f2i
				; CHECK: entry:
				; CHECK-NEXT: %call = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %add = add nsw i32 %call, %c
				; CHECK-NEXT: %tobool.not12 = icmp eq i32 %add, 0
				; CHECK-NEXT: %tls_bitcast = bitcast i32* @thl_x to i32*
				; CHECK-NEXT: br i1 %tobool.not12, label %while.end, label %while.body.preheader

				; CHECK: while.body.preheader:
				; CHECK-NEXT: %tls_bitcast1 = bitcast i8* @_ZZ2f2iE2st.0 to i8*
				; CHECK-NEXT: %tls_bitcast2 = bitcast i32* @_ZZ2f2iE2st.1 to i32*
				; CHECK-NEXT: br label %while.body

				; CHECK: while.body:
				; CHECK-NEXT: %c.addr.013 = phi i32 [ %dec, %while.body ], [ %add, %while.body.preheader ]
				; CHECK-NEXT: %dec = add i32 %c.addr.013, -1
				; CHECK-NEXT: %call1 = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %0 = load i32, i32* %tls_bitcast, align 4
				; CHECK-NEXT: %add2 = add nsw i32 %0, %call1
				; CHECK-NEXT: store i32 %add2, i32* %tls_bitcast, align 4
				; CHECK-NEXT: %call3 = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %1 = load i8, i8* %tls_bitcast1, align 4
				; CHECK-NEXT: %2 = trunc i32 %call3 to i8
				; CHECK-NEXT: %conv7 = add i8 %1, %2
				; CHECK-NEXT: store i8 %conv7, i8* %tls_bitcast1, align 4
				; CHECK-NEXT: %call8 = tail call noundef i32 @_Z5gfuncv()
				; CHECK-NEXT: %3 = load i32, i32* %tls_bitcast2, align 4
				; CHECK-NEXT: %add9 = add nsw i32 %3, %call8
				; CHECK-NEXT: store i32 %add9, i32* %tls_bitcast2, align 4
				; CHECK-NEXT: %tobool.not = icmp eq i32 %dec, 0
				; CHECK-NEXT: br i1 %tobool.not, label %while.end.loopexit, label %while.body

				; CHECK: while.end.loopexit:
				pengfei: What are they used for?
				; CHECK-NEXT: br label %while.end

				; CHECK: while.end:
				; CHECK-NEXT: %4 = load i32, i32* %tls_bitcast, align 4
				; CHECK-NEXT: ret i32 %4
				entry:
				%call = tail call noundef i32 @_Z5gfuncv()
				%add = add nsw i32 %call, %c
				%tobool.not12 = icmp eq i32 %add, 0
				br i1 %tobool.not12, label %while.end, label %while.body

				while.body: ; preds = %entry, %while.body
				%c.addr.013 = phi i32 [ %dec, %while.body ], [ %add, %entry ]
				pengfei: This metadata is annoying; in particular, the OneAPI info here doesn't make sense to LLVM. The same applies below.
				xiangzhangllvm: Done. But I think keeping more info is not bad for a test. Maybe it is a personal habit: I usually prefer to keep the .ll in sync with the .c/.cpp.
				%dec = add nsw i32 %c.addr.013, -1
				%call1 = tail call noundef i32 @_Z5gfuncv()
				%0 = load i32, i32* @thl_x, align 4
				%add2 = add nsw i32 %0, %call1
				store i32 %add2, i32* @thl_x, align 4
				%call3 = tail call noundef i32 @_Z5gfuncv()
				%1 = load i8, i8* @_ZZ2f2iE2st.0, align 4
				%2 = trunc i32 %call3 to i8
				%conv7 = add i8 %1, %2
				store i8 %conv7, i8* @_ZZ2f2iE2st.0, align 4
				pengfei: Remove metadata other than `tls-load-control`.
				%call8 = tail call noundef i32 @_Z5gfuncv()
				%3 = load i32, i32* @_ZZ2f2iE2st.1, align 4
				%add9 = add nsw i32 %3, %call8
				store i32 %add9, i32* @_ZZ2f2iE2st.1, align 4
				%tobool.not = icmp eq i32 %dec, 0
				br i1 %tobool.not, label %while.end, label %while.body

				while.end: ; preds = %while.body, %entry
				%4 = load i32, i32* @thl_x, align 4
				ret i32 %4
				}

				; // Third function (@_Z2f3i):
				; int f3(int c) {
				; int *px = &thl_x;
				; gfunc2(*px);
				; gfunc2(*px);
				; return 1;
				; }

				; Function Attrs: mustprogress uwtable
				define noundef i32 @_Z2f3i(i32 noundef %c) local_unnamed_addr #0 {
				; CHECK-LABEL: _Z2f3i
				; CHECK: entry:
				; CHECK-NEXT: %tls_bitcast = bitcast i32* @thl_x to i32*
				; CHECK-NEXT: %0 = load i32, i32* %tls_bitcast, align 4
				; CHECK-NEXT: %call = tail call noundef i32 @_Z6gfunc2i(i32 noundef %0)
				; CHECK-NEXT: %1 = load i32, i32* %tls_bitcast, align 4
				; CHECK-NEXT: %call1 = tail call noundef i32 @_Z6gfunc2i(i32 noundef %1)
				; CHECK-NEXT: ret i32 1
				entry:
				%0 = load i32, i32* @thl_x, align 4
				%call = tail call noundef i32 @_Z6gfunc2i(i32 noundef %0)
				%1 = load i32, i32* @thl_x, align 4
				%call1 = tail call noundef i32 @_Z6gfunc2i(i32 noundef %1)
				ret i32 1
				}

				; Function Attrs: uwtable
				define weak_odr hidden noundef i32* @_ZTW5thl_x() local_unnamed_addr #2 comdat {
				ret i32* @thl_x
				}

				; Function Attrs: uwtable
				define weak_odr hidden noundef i32* @_ZTW6thl_x2() local_unnamed_addr #2 comdat {
				ret i32* @thl_x2
				}

				attributes #0 = { mustprogress uwtable "tls-load-hoist" "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { uwtable "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

				!llvm.module.flags = !{!0, !1, !2}
				!llvm.ident = !{!3}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 7, !"PIC Level", i32 2}
				!2 = !{i32 7, !"uwtable", i32 2}
				!3 = !{!"clang version 15.0.0"}
				!4 = distinct !{!4, !5}
				!5 = !{!"llvm.loop.mustprogress"}
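
As a source-level illustration of what the CHECK lines in tls-loads-control.ll expect, the sketch below loosely mirrors `f2` from the test comments: the address of each thread_local variable is taken once before the loop and reused inside it. The names `next_value`, `f2_plain`, and `f2_hoisted` are hypothetical, `next_value` is only a deterministic stand-in for `gfunc`, and the initial `c += gfunc()` is omitted. The actual pass works on IR (through the `%tls_bitcast` values), not on C++ source, so this is an analogy rather than the implementation.

```cpp
#include <cassert>

// Deterministic stand-in for gfunc() from the test source (assumption).
static int g_counter = 0;
int next_value() { return ++g_counter; }

thread_local int thl_x = 0;

struct SS {
  char thl_c;
  int num;
};

// Every access names the thread_local variable directly, as in the test input.
int f2_plain(int c) {
  assert(c >= 0);  // a negative count would make `while (c--)` run very long
  thread_local SS st{};
  g_counter = 0; thl_x = 0; st = SS{};  // reset state so calls are repeatable
  while (c--) {
    thl_x += next_value();
    st.thl_c += static_cast<char>(next_value());
    st.num += next_value();
  }
  return thl_x;
}

// Same computation, with each thread_local address taken once up front.
int f2_hoisted(int c) {
  assert(c >= 0);
  thread_local SS st{};
  g_counter = 0; thl_x = 0; st = SS{};
  int *px = &thl_x;  // address obtained once, before the loop
  SS *pst = &st;     // likewise for the function-local thread_local struct
  while (c--) {
    *px += next_value();
    pst->thl_c += static_cast<char>(next_value());
    pst->num += next_value();
  }
  return *px;
}
```

Both functions return the same value for the same `c`, which is the property the transformation has to preserve: only the place where the address is computed changes, not the loads and stores themselves.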

llvm/test/CodeGen/X86/tls-loads-control2.ll

This file was added.

				; RUN: opt -S -mtriple=x86_64-unknown-unknown -tlshoist --relocation-model=pic --tls-load-hoist=optimize -o - %s \| FileCheck %s --check-prefix=HOIST0
				; RUN: opt -S -mtriple=x86_64-unknown-unknown -tlshoist --relocation-model=pic --tls-load-hoist=non-optimize -o - %s \| FileCheck %s --check-prefix=HOIST2
				pengfei: Use `--check-prefix` for a single prefix. The same applies below.
				; RUN: opt -S -mtriple=x86_64-unknown-unknown -tlshoist --relocation-model=pic -o - %s \| FileCheck %s --check-prefix=HOIST2

				$_ZTW5thl_x = comdat any

				@thl_x = thread_local global i32 0, align 4

				; Function Attrs: mustprogress uwtable
				define i32 @_Z2f1i(i32 %c) local_unnamed_addr #0 {
				entry:
				%0 = load i32, i32* @thl_x, align 4
				%call = tail call i32 @_Z5gfunci(i32 %0)
				%1 = load i32, i32* @thl_x, align 4
				%call1 = tail call i32 @_Z5gfunci(i32 %1)
				ret i32 1
				}

				;HOIST0-LABEL: _Z2f1i
				;HOIST0: entry:
				;HOIST0-NEXT: %tls_bitcast = bitcast i32* @thl_x to i32*
				;HOIST0-NEXT: %0 = load i32, i32* %tls_bitcast, align 4
				;HOIST0-NEXT: %call = tail call i32 @_Z5gfunci(i32 %0)
				;HOIST0-NEXT: %1 = load i32, i32* %tls_bitcast, align 4
				;HOIST0-NEXT: %call1 = tail call i32 @_Z5gfunci(i32 %1)
				;HOIST0-NEXT: ret i32 1

				;HOIST2-LABEL: _Z2f1i
				;HOIST2: entry:
				;HOIST2-NEXT: %0 = load i32, i32* @thl_x, align 4
				;HOIST2-NEXT: %call = tail call i32 @_Z5gfunci(i32 %0)
				;HOIST2-NEXT: %1 = load i32, i32* @thl_x, align 4
				;HOIST2-NEXT: %call1 = tail call i32 @_Z5gfunci(i32 %1)
				;HOIST2-NEXT: ret i32 1

				declare i32 @_Z5gfunci(i32) local_unnamed_addr #1

				; Function Attrs: uwtable
				define weak_odr hidden i32* @_ZTW5thl_x() local_unnamed_addr #2 comdat {
				ret i32* @thl_x
				}

				pengfei: What's it used for?
				attributes #0 = { mustprogress uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #2 = { uwtable "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

				!llvm.module.flags = !{!0, !1, !2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 7, !"PIC Level", i32 2}
				!2 = !{i32 7, !"uwtable", i32 1}

llvm/test/CodeGen/X86/tls-loads-control3.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=x86_64-unknown-unknown -O2 --relocation-model=pic --tls-load-hoist=optimize -o - %s \| FileCheck %s --check-prefix=HOIST0
				; RUN: llc -mtriple=x86_64-unknown-unknown -O2 --relocation-model=pic --tls-load-hoist=non-optimize -o - %s \| FileCheck %s --check-prefix=HOIST2
				pengfei: ditto.
				; RUN: llc -mtriple=x86_64-unknown-unknown -O2 --relocation-model=pic -o - %s \| FileCheck %s --check-prefix=HOIST2

				; This test has no module flag {"tls-load-hoist", i32 0}, so use --tls-load-hoist=x
				; to choose the way of loading thread_local address.

				; This test comes from compiling clang/test/CodeGen/intel/tls_loads.cpp with:
				; (clang tls_loads.cpp -fPIC -ftls-model=global-dynamic -O2 -S -emit-llvm)

				$_ZTW5thl_x = comdat any

				$_ZTW6thl_x2 = comdat any

				pengfei: No use.
				@thl_x = thread_local global i32 0, align 4
				@thl_x2 = thread_local global i32 0, align 4
				@_ZZ2f2iE2st.0 = internal thread_local unnamed_addr global i8 0, align 4
				@_ZZ2f2iE2st.1 = internal thread_local unnamed_addr global i32 0, align 4

				; For HOIST0, check call __tls_get_addr@PLT only one time for each thread_local variable.
				; For HOIST2, Check the default way: usually call __tls_get_addr@PLT every time when use thread_local variable.

				; Function Attrs: mustprogress uwtable
				define i32 @_Z2f1i(i32 %c) local_unnamed_addr #0 {
				; HOIST0-LABEL: _Z2f1i:
				; HOIST0: # %bb.0: # %entry
				; HOIST0-NEXT: pushq %r15
				; HOIST0-NEXT: .cfi_def_cfa_offset 16
				; HOIST0-NEXT: pushq %r14
				; HOIST0-NEXT: .cfi_def_cfa_offset 24
				; HOIST0-NEXT: pushq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 32
				; HOIST0-NEXT: .cfi_offset %rbx, -32
				; HOIST0-NEXT: .cfi_offset %r14, -24
				; HOIST0-NEXT: .cfi_offset %r15, -16
				; HOIST0-NEXT: movl %edi, %ebx
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: rex64
				; HOIST0-NEXT: callq __tls_get_addr@PLT
				pengfei: `data16` and `rex64` don't seem to be correct instructions.
				xiangzhangllvm: Yes, they are prefixes, not real instructions. I can see them in X86AsmParser.cpp, but it is not clear to me what they are used for. They seem unrelated to this patch.
				; HOIST0-NEXT: movq %rax, %r14
				; HOIST0-NEXT: testl %ebx, %ebx
				; HOIST0-NEXT: je .LBB0_4
				; HOIST0-NEXT: # %bb.1: # %while.body.preheader
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: leaq thl_x2@TLSGD(%rip), %rdi
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: rex64
				; HOIST0-NEXT: callq __tls_get_addr@PLT
				; HOIST0-NEXT: movq %rax, %r15
				; HOIST0-NEXT: .p2align 4, 0x90
				; HOIST0-NEXT: .LBB0_2: # %while.body
				; HOIST0-NEXT: # =>This Inner Loop Header: Depth=1
				; HOIST0-NEXT: movl (%r15), %edi
				; HOIST0-NEXT: callq _Z6gfunc2i@PLT
				; HOIST0-NEXT: addl (%r14), %eax
				; HOIST0-NEXT: movl %eax, (%r14)
				; HOIST0-NEXT: decl %ebx
				; HOIST0-NEXT: jne .LBB0_2
				; HOIST0-NEXT: jmp .LBB0_3
				; HOIST0-NEXT: .LBB0_4: # %entry.while.end_crit_edge
				; HOIST0-NEXT: movl (%r14), %eax
				; HOIST0-NEXT: .LBB0_3: # %while.end
				; HOIST0-NEXT: popq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 24
				; HOIST0-NEXT: popq %r14
				; HOIST0-NEXT: .cfi_def_cfa_offset 16
				; HOIST0-NEXT: popq %r15
				; HOIST0-NEXT: .cfi_def_cfa_offset 8
				; HOIST0-NEXT: retq
				;
				; HOIST2-LABEL: _Z2f1i:
				; HOIST2: # %bb.0: # %entry
				; HOIST2-NEXT: pushq %rbp
				; HOIST2-NEXT: .cfi_def_cfa_offset 16
				; HOIST2-NEXT: pushq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 24
				; HOIST2-NEXT: pushq %rax
				; HOIST2-NEXT: .cfi_def_cfa_offset 32
				; HOIST2-NEXT: .cfi_offset %rbx, -24
				; HOIST2-NEXT: .cfi_offset %rbp, -16
				; HOIST2-NEXT: testl %edi, %edi
				; HOIST2-NEXT: je .LBB0_4
				; HOIST2-NEXT: # %bb.1:
				; HOIST2-NEXT: movl %edi, %ebx
				; HOIST2-NEXT: .p2align 4, 0x90
				; HOIST2-NEXT: .LBB0_2: # %while.body
				; HOIST2-NEXT: # =>This Inner Loop Header: Depth=1
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x2@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: movl (%rax), %edi
				; HOIST2-NEXT: callq _Z6gfunc2i@PLT
				; HOIST2-NEXT: movl %eax, %ebp
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: addl (%rax), %ebp
				; HOIST2-NEXT: movl %ebp, (%rax)
				; HOIST2-NEXT: decl %ebx
				; HOIST2-NEXT: jne .LBB0_2
				; HOIST2-NEXT: jmp .LBB0_3
				; HOIST2-NEXT: .LBB0_4: # %entry.while.end_crit_edge
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: movl (%rax), %ebp
				; HOIST2-NEXT: .LBB0_3: # %while.end
				; HOIST2-NEXT: movl %ebp, %eax
				; HOIST2-NEXT: addq $8, %rsp
				; HOIST2-NEXT: .cfi_def_cfa_offset 24
				; HOIST2-NEXT: popq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 16
				; HOIST2-NEXT: popq %rbp
				; HOIST2-NEXT: .cfi_def_cfa_offset 8
				; HOIST2-NEXT: retq
				entry:
				%tobool.not3 = icmp eq i32 %c, 0
				br i1 %tobool.not3, label %entry.while.end_crit_edge, label %while.body

				entry.while.end_crit_edge: ; preds = %entry
				%.pre = load i32, i32* @thl_x, align 4
				br label %while.end

				while.body: ; preds = %entry, %while.body
				%c.addr.04 = phi i32 [ %dec, %while.body ], [ %c, %entry ]
				%dec = add nsw i32 %c.addr.04, -1
				%0 = load i32, i32* @thl_x2, align 4
				%call = tail call i32 @_Z6gfunc2i(i32 %0)
				%1 = load i32, i32* @thl_x, align 4
				%add = add nsw i32 %1, %call
				store i32 %add, i32* @thl_x, align 4
				%tobool.not = icmp eq i32 %dec, 0
				br i1 %tobool.not, label %while.end, label %while.body

				while.end: ; preds = %while.body, %entry.while.end_crit_edge
				%2 = phi i32 [ %.pre, %entry.while.end_crit_edge ], [ %add, %while.body ]
				ret i32 %2
				}

				declare i32 @_Z6gfunc2i(i32) local_unnamed_addr #1

				; Function Attrs: mustprogress uwtable
				define i32 @_Z2f2i(i32 %c) local_unnamed_addr #0 {
				; HOIST0-LABEL: _Z2f2i:
				; HOIST0: # %bb.0: # %entry
				; HOIST0-NEXT: pushq %r15
				; HOIST0-NEXT: .cfi_def_cfa_offset 16
				pengfei: Add `nounwind` in attributes to avoid cfi directives.
				xiangzhangllvm: I added nounwind to #0, but the cfi directives still seem to be there. Anyway, that is not important, I think.
				pengfei: Did you regenerate the tests? It should work. The lit test will fail if you didn't update the tests.
				xiangzhangllvm: Yes, I regenerated it with update_llc_test_checks.py. Here is my local status:
				    [xiangzh1@..]$ ./llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/X86/tls-loads-control3.ll
				    [xiangzh1@..]$ git diff
				    [xiangzh1@..]$ llvm-lit llvm/test/CodeGen/X86/tls-loads-control3.ll
				    -- Testing: 1 tests, 1 workers --
				    PASS: LLVM :: CodeGen/X86/tls-loads-control3.ll (1 of 1)
				    Testing Time: 0.21s
				    Passed: 1
				; HOIST0-NEXT: pushq %r14
				; HOIST0-NEXT: .cfi_def_cfa_offset 24
				; HOIST0-NEXT: pushq %r12
				; HOIST0-NEXT: .cfi_def_cfa_offset 32
				; HOIST0-NEXT: pushq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 40
				; HOIST0-NEXT: pushq %rax
				; HOIST0-NEXT: .cfi_def_cfa_offset 48
				; HOIST0-NEXT: .cfi_offset %rbx, -40
				; HOIST0-NEXT: .cfi_offset %r12, -32
				; HOIST0-NEXT: .cfi_offset %r14, -24
				; HOIST0-NEXT: .cfi_offset %r15, -16
				; HOIST0-NEXT: movl %edi, %ebx
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: rex64
				; HOIST0-NEXT: callq __tls_get_addr@PLT
				; HOIST0-NEXT: movq %rax, %r14
				; HOIST0-NEXT: testl %ebx, %ebx
				; HOIST0-NEXT: je .LBB1_3
				; HOIST0-NEXT: # %bb.1: # %while.body.preheader
				; HOIST0-NEXT: leaq _ZZ2f2iE2st.0@TLSLD(%rip), %rdi
				; HOIST0-NEXT: callq __tls_get_addr@PLT
				; HOIST0-NEXT: movq %rax, %rcx
				; HOIST0-NEXT: leaq _ZZ2f2iE2st.0@DTPOFF(%rax), %r15
				; HOIST0-NEXT: leaq _ZZ2f2iE2st.1@DTPOFF(%rax), %r12
				; HOIST0-NEXT: .p2align 4, 0x90
				; HOIST0-NEXT: .LBB1_2: # %while.body
				; HOIST0-NEXT: # =>This Inner Loop Header: Depth=1
				; HOIST0-NEXT: callq _Z5gfuncv@PLT
				; HOIST0-NEXT: addl %eax, (%r14)
				; HOIST0-NEXT: callq _Z5gfuncv@PLT
				; HOIST0-NEXT: addb %al, (%r15)
				; HOIST0-NEXT: callq _Z5gfuncv@PLT
				; HOIST0-NEXT: addl %eax, (%r12)
				; HOIST0-NEXT: decl %ebx
				; HOIST0-NEXT: jne .LBB1_2
				; HOIST0-NEXT: .LBB1_3: # %while.end
				; HOIST0-NEXT: movl (%r14), %eax
				; HOIST0-NEXT: addq $8, %rsp
				; HOIST0-NEXT: .cfi_def_cfa_offset 40
				; HOIST0-NEXT: popq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 32
				; HOIST0-NEXT: popq %r12
				; HOIST0-NEXT: .cfi_def_cfa_offset 24
				; HOIST0-NEXT: popq %r14
				; HOIST0-NEXT: .cfi_def_cfa_offset 16
				; HOIST0-NEXT: popq %r15
				; HOIST0-NEXT: .cfi_def_cfa_offset 8
				; HOIST0-NEXT: retq
				;
				; HOIST2-LABEL: _Z2f2i:
				; HOIST2: # %bb.0: # %entry
				; HOIST2-NEXT: pushq %rbp
				; HOIST2-NEXT: .cfi_def_cfa_offset 16
				; HOIST2-NEXT: pushq %r14
				; HOIST2-NEXT: .cfi_def_cfa_offset 24
				; HOIST2-NEXT: pushq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 32
				; HOIST2-NEXT: .cfi_offset %rbx, -32
				; HOIST2-NEXT: .cfi_offset %r14, -24
				; HOIST2-NEXT: .cfi_offset %rbp, -16
				; HOIST2-NEXT: testl %edi, %edi
				; HOIST2-NEXT: je .LBB1_3
				; HOIST2-NEXT: # %bb.1: # %while.body.preheader
				; HOIST2-NEXT: movl %edi, %ebx
				; HOIST2-NEXT: .p2align 4, 0x90
				; HOIST2-NEXT: .LBB1_2: # %while.body
				; HOIST2-NEXT: # =>This Inner Loop Header: Depth=1
				; HOIST2-NEXT: callq _Z5gfuncv@PLT
				; HOIST2-NEXT: movl %eax, %ebp
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: addl %ebp, (%rax)
				; HOIST2-NEXT: callq _Z5gfuncv@PLT
				; HOIST2-NEXT: movl %eax, %ebp
				; HOIST2-NEXT: leaq _ZZ2f2iE2st.0@TLSLD(%rip), %rdi
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: movq %rax, %r14
				; HOIST2-NEXT: addb %bpl, _ZZ2f2iE2st.0@DTPOFF(%rax)
				; HOIST2-NEXT: callq _Z5gfuncv@PLT
				; HOIST2-NEXT: movl %eax, %ecx
				; HOIST2-NEXT: movq %r14, %rax
				; HOIST2-NEXT: addl %ecx, _ZZ2f2iE2st.1@DTPOFF(%r14)
				; HOIST2-NEXT: decl %ebx
				; HOIST2-NEXT: jne .LBB1_2
				; HOIST2-NEXT: .LBB1_3: # %while.end
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: movl (%rax), %eax
				; HOIST2-NEXT: popq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 24
				; HOIST2-NEXT: popq %r14
				; HOIST2-NEXT: .cfi_def_cfa_offset 16
				; HOIST2-NEXT: popq %rbp
				; HOIST2-NEXT: .cfi_def_cfa_offset 8
				; HOIST2-NEXT: retq
				entry:
				%tobool.not9 = icmp eq i32 %c, 0
				br i1 %tobool.not9, label %while.end, label %while.body

				while.body: ; preds = %entry, %while.body
				%c.addr.010 = phi i32 [ %dec, %while.body ], [ %c, %entry ]
				%dec = add nsw i32 %c.addr.010, -1
				%call = tail call i32 @_Z5gfuncv()
				%0 = load i32, i32* @thl_x, align 4
				%add = add nsw i32 %0, %call
				store i32 %add, i32* @thl_x, align 4
				%call1 = tail call i32 @_Z5gfuncv()
				%1 = load i8, i8* @_ZZ2f2iE2st.0, align 4
				%2 = trunc i32 %call1 to i8
				%conv5 = add i8 %1, %2
				store i8 %conv5, i8* @_ZZ2f2iE2st.0, align 4
				%call6 = tail call i32 @_Z5gfuncv()
				%3 = load i32, i32* @_ZZ2f2iE2st.1, align 4
				%add7 = add nsw i32 %3, %call6
				store i32 %add7, i32* @_ZZ2f2iE2st.1, align 4
				%tobool.not = icmp eq i32 %dec, 0
				br i1 %tobool.not, label %while.end, label %while.body

				while.end: ; preds = %while.body, %entry
				%4 = load i32, i32* @thl_x, align 4
				ret i32 %4
				}

				declare i32 @_Z5gfuncv() local_unnamed_addr #1

				; Function Attrs: mustprogress uwtable
				define i32 @_Z2f3i(i32 %c) local_unnamed_addr #0 {
				; HOIST0-LABEL: _Z2f3i:
				; HOIST0: # %bb.0: # %entry
				; HOIST0-NEXT: pushq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 16
				; HOIST0-NEXT: .cfi_offset %rbx, -16
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: data16
				; HOIST0-NEXT: rex64
				; HOIST0-NEXT: callq __tls_get_addr@PLT
				; HOIST0-NEXT: movq %rax, %rbx
				; HOIST0-NEXT: movl (%rax), %edi
				; HOIST0-NEXT: callq _Z6gfunc2i@PLT
				; HOIST0-NEXT: movl (%rbx), %edi
				; HOIST0-NEXT: callq _Z6gfunc2i@PLT
				; HOIST0-NEXT: movl $1, %eax
				; HOIST0-NEXT: popq %rbx
				; HOIST0-NEXT: .cfi_def_cfa_offset 8
				; HOIST0-NEXT: retq
				;
				; HOIST2-LABEL: _Z2f3i:
				; HOIST2: # %bb.0: # %entry
				; HOIST2-NEXT: pushq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 16
				; HOIST2-NEXT: .cfi_offset %rbx, -16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: leaq thl_x@TLSGD(%rip), %rdi
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: data16
				; HOIST2-NEXT: rex64
				; HOIST2-NEXT: callq __tls_get_addr@PLT
				; HOIST2-NEXT: movq %rax, %rbx
				; HOIST2-NEXT: movl (%rax), %edi
				; HOIST2-NEXT: callq _Z6gfunc2i@PLT
				; HOIST2-NEXT: movl (%rbx), %edi
				; HOIST2-NEXT: callq _Z6gfunc2i@PLT
				; HOIST2-NEXT: movl $1, %eax
				; HOIST2-NEXT: popq %rbx
				; HOIST2-NEXT: .cfi_def_cfa_offset 8
				; HOIST2-NEXT: retq
				entry:
				%0 = load i32, i32* @thl_x, align 4
				%call = tail call i32 @_Z6gfunc2i(i32 %0)
				%1 = load i32, i32* @thl_x, align 4
				%call1 = tail call i32 @_Z6gfunc2i(i32 %1)
				ret i32 1
				}

				attributes #0 = { nounwind mustprogress uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				attributes #1 = { "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
				pengfei: Add `nounwind` to avoid cfi directives.
				attributes #2 = { uwtable "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

				!llvm.module.flags = !{!0, !1, !2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 7, !"PIC Level", i32 2}
				!2 = !{i32 7, !"uwtable", i32 1}
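
The HOIST0 and HOIST2 checks in tls-loads-control3.ll differ mainly in how often `__tls_get_addr@PLT` is called. The sketch below models that difference with a counting stand-in. Everything here is hypothetical (`mock_tls_get_addr`, `run_unhoisted`, `run_hoisted`); it does not call the real runtime resolver and only illustrates the call-count saving under that assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime TLS resolver; it only counts calls.
static int resolver_calls = 0;
static int storage = 0;

int *mock_tls_get_addr() {
  ++resolver_calls;
  return &storage;
}

// HOIST2-like shape: the address is resolved again at every access.
int run_unhoisted(int iterations) {
  assert(iterations >= 0);
  resolver_calls = 0;
  storage = 0;
  for (int i = 0; i < iterations; ++i)
    *mock_tls_get_addr() += 1;
  int result = *mock_tls_get_addr();  // final read after the loop
  (void)result;
  return resolver_calls;
}

// HOIST0-like shape: the address is resolved once and then reused.
int run_hoisted(int iterations) {
  assert(iterations >= 0);
  resolver_calls = 0;
  storage = 0;
  int *p = mock_tls_get_addr();  // single resolution before the loop
  for (int i = 0; i < iterations; ++i)
    *p += 1;
  int result = *p;  // final read reuses the same pointer
  (void)result;
  return resolver_calls;
}
```

With five iterations, the unhoisted shape resolves the address six times (five in the loop, one for the final read), while the hoisted shape resolves it once, independent of the trip count.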

llvm/tools/llc/llc.cpp

int main(int argc, char **argv) {
initializeScalarOpts(*Registry);		initializeScalarOpts(*Registry);
initializeVectorization(*Registry);		initializeVectorization(*Registry);
initializeScalarizeMaskedMemIntrinLegacyPassPass(*Registry);		initializeScalarizeMaskedMemIntrinLegacyPassPass(*Registry);
initializeExpandReductionsPass(*Registry);		initializeExpandReductionsPass(*Registry);
initializeExpandVectorPredicationPass(*Registry);		initializeExpandVectorPredicationPass(*Registry);
initializeHardwareLoopsPass(*Registry);		initializeHardwareLoopsPass(*Registry);
initializeTransformUtils(*Registry);		initializeTransformUtils(*Registry);
initializeReplaceWithVeclibLegacyPass(*Registry);		initializeReplaceWithVeclibLegacyPass(*Registry);
		initializeTLSVariableHoistLegacyPassPass(*Registry);

// Initialize debugging passes.		// Initialize debugging passes.
initializeScavengerTestPass(*Registry);		initializeScavengerTestPass(*Registry);

// Register the target printer for --version.		// Register the target printer for --version.
cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);		cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);

cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");		cl::ParseCommandLineOptions(argc, argv, "llvm system compiler\n");

Diff 414250