When a TLS variable is accessed in PIC mode, its address is usually obtained by calling a library function, for example:
callq __tls_get_addr@PLT
This call is not visible in IR or MIR. The access is usually tagged with a target-specific flag (such as PIC), and the call itself is generated during assembly printing.
As a result, the call is usually emitted every time a TLS variable is accessed. Many of these calls are redundant, especially in loops.
This patch tries to optimize this. When the related option is set, it identifies redundant TLS address calls and eliminates them by hoisting the TLS access.
For example:
    static __thread int x;
    int g();
    int f(int c) {
      int *px = &x;
      while (c--)
        *px += g();
      return *px;
    }
generates redundant TLS loads when compiled with
    clang++ -fPIC -ftls-model=global-dynamic -O2 -S
    .LBB0_2:                                # %while.body
                                            # =>This Inner Loop Header: Depth=1
            callq   _Z1gv@PLT
            movl    %eax, %ebp
            leaq    _ZL1x@TLSLD(%rip), %rdi
            callq   __tls_get_addr@PLT
            addl    _ZL1x@DTPOFF(%rax), %ebp
            movl    %ebp, _ZL1x@DTPOFF(%rax)
            addl    $-1, %ebx
            jne     .LBB0_2
            jmp     .LBB0_3
    .LBB0_4:                                # %entry.while.end_crit_edge
            leaq    _ZL1x@TLSLD(%rip), %rdi
            callq   __tls_get_addr@PLT
            movl    _ZL1x@DTPOFF(%rax), %ebp
These redundant TLS loads hurt performance, especially in loops.
So, when the user requests it, we eliminate or hoist them, so that the output becomes:
    # %bb.0:                                # %entry
            ...
            movl    %edi, %ebx
            leaq    _ZL1x@TLSLD(%rip), %rdi
            callq   __tls_get_addr@PLT
            leaq    _ZL1x@DTPOFF(%rax), %r14
            testl   %ebx, %ebx
            je      .LBB0_1
    .LBB0_2:                                # %while.body
                                            # =>This Inner Loop Header: Depth=1
            callq   _Z1gv@PLT
            addl    (%r14), %eax
            movl    %eax, (%r14)
            addl    $-1, %ebx
            jne     .LBB0_2
            jmp     .LBB0_3
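
As a rough source-level analogy only (not the pass itself, which works inside the compiler), the two code shapes above correspond to the sketch below. The names f_per_access and f_hoisted are hypothetical, g() is given a dummy body so the sketch is self-contained, and thread_local stands in for __thread. Note that the original example already takes &x once in the source, yet the unoptimized listing still calls __tls_get_addr inside the loop, so writing the source in the hoisted form does not by itself guarantee the hoisted code.

```cpp
// Source-level analogy of the transformation; an illustration, not the patch.
static thread_local int x = 0;

// Hypothetical stand-in: in the original example g() is an external function.
static int g() { return 2; }

// Shape of the unoptimized output: every access to x conceptually
// recomputes the TLS address (one __tls_get_addr call per access in PIC mode).
int f_per_access(int c) {
  while (c--)
    x += g();
  return x;
}

// Shape of the optimized output: the address is computed once before the
// loop and reused through a pointer kept in a register.
int f_hoisted(int c) {
  int *px = &x;
  while (c--)
    *px += g();
  return *px;
}
```

Both functions compute the same result; the intended difference is only in how often the TLS address is materialized.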
Why a module flag? What is the policy for LTO merging?