When a TLS variable is accessed in PIC mode, its address is usually obtained by calling a library function, for example:
callq __tls_get_addr@PLT
This call does not appear in IR or MIR; it is usually marked by a target-specific flag (such as pic) and generated during assembly printing.
As a result, it is usually emitted every time a TLS variable is accessed. Many of these calls are redundant, especially in loops.
This patch tries to optimize this. When the related option is set, it identifies redundant TLS address calls and eliminates them by hoisting the TLS access.
For example:
static __thread int x;
int g();
int f(int c) {
  int *px = &x;
  while (c--)
    *px += g();
  return *px;
}
generates redundant TLS loads when compiled with
clang++ -fPIC -ftls-model=global-dynamic -O2 -S
.LBB0_2: # %while.body
# =>This Inner Loop Header: Depth=1
callq _Z1gv@PLT
movl %eax, %ebp
leaq _ZL1x@TLSLD(%rip), %rdi
callq __tls_get_addr@PLT
addl _ZL1x@DTPOFF(%rax), %ebp
movl %ebp, _ZL1x@DTPOFF(%rax)
addl $-1, %ebx
jne .LBB0_2
jmp .LBB0_3
.LBB0_4: # %entry.while.end_crit_edge
leaq _ZL1x@TLSLD(%rip), %rdi
callq __tls_get_addr@PLT
movl _ZL1x@DTPOFF(%rax), %ebp
These redundant TLS loads hurt performance, especially in loops.
So we try to eliminate or hoist them when the user requests it, so that the output becomes:
# %bb.0: # %entry
...
movl %edi, %ebx
leaq _ZL1x@TLSLD(%rip), %rdi
callq __tls_get_addr@PLT
leaq _ZL1x@DTPOFF(%rax), %r14
testl %ebx, %ebx
je .LBB0_1
.LBB0_2: # %while.body
# =>This Inner Loop Header: Depth=1
callq _Z1gv@PLT
addl (%r14), %eax
movl %eax, (%r14)
addl $-1, %ebx
jne .LBB0_2
jmp .LBB0_3
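As a rough source-level analogy for the effect of the hoisting, the sketch below contrasts a loop that names the TLS variable on every iteration with one that takes its address once and reuses the pointer. It is only an illustration: the patch itself works inside the compiler, and, as the example above shows, writing `int *px = &x;` in the source does not by itself prevent the address from being recomputed in the loop. The function names `f_naive` and `f_hoisted` and the fixed-value body of `g` are hypothetical and are not part of the patch.

```c
/* Thread-local counter, as in the example above. */
static __thread int x;

/* Hypothetical stand-in for g(): returns a fixed value so results are easy to check. */
static int g(void) { return 2; }

/* Each access to x may require recomputing the TLS address
 * (one __tls_get_addr call per access in PIC mode). */
int f_naive(int c) {
    x = 0;
    while (c--)
        x += g();
    return x;
}

/* Conceptual result of hoisting: the address is computed once,
 * before the loop, and every access goes through the saved pointer. */
int f_hoisted(int c) {
    int *px = &x;
    *px = 0;
    while (c--)
        *px += g();
    return *px;
}
```

Both functions compute the same value; the difference the patch targets is only in how often the TLS address is computed in the generated code.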
Why a module flag? What is the policy for LTO merging?