This is an archive of the discontinued LLVM Phabricator instance.

[X86] Support -fno-plt __tls_get_addr calls
ClosedPublic

Authored by MaskRay on May 19 2019, 2:10 AM.

Details

Summary

In general dynamic/local dynamic TLS models, with -fno-plt,

  • x86: emit calll *___tls_get_addr@GOT(%ebx) instead of calll ___tls_get_addr@PLT Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls.
  • x86_64: emit callq *__tls_get_addr@GOTPCREL(%rip) instead of callq __tls_get_addr@PLT

Reorganize the code by separating 32-bit and 64-bit.

Diff Detail

Event Timeline

MaskRay created this revision.May 19 2019, 2:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2019, 2:10 AM
MaskRay updated this revision to Diff 200174.May 19 2019, 2:11 AM
MaskRay edited the summary of this revision. (Show Details)

Fix markdown in the description...

Harbormaster completed remote builds in B32117: Diff 200174.
dalias added a subscriber: dalias.May 19 2019, 11:36 PM

Hardcoding ebx defeats most of the purpose of no-plt, which was motivated by: https://ewontfix.com/18/

https://ewontfix.com/18/ was enlightening. I read it two or three years ago for the first time but apparently I lacked background knowledge to understand these stuff then :) Now I understand how -fno-plt benefits people who don't care about lazy binding...

call    __x86.get_pc_thunk.cx
jmp *bar@GOT+_GLOBAL_OFFSET_TABLE_(%ecx)

I think the relocation type is unavailable. This requires something like G+GOT+A-* (think R_X86_64_GOTPCREL G+GOT+A-P) but that is not in the psABI.

We can still do better by switching to %eax, %ecx or %edx, but I think this function does not have information to make the decision. We should probably propagate the -fno-plt (RtLibUseGot) information to several levels above but I know really little about the x86 codegen to achieve that...

I have to add a TODO item in the description to note down this future improvement.

MaskRay updated this revision to Diff 200222.May 20 2019, 2:15 AM
MaskRay edited the summary of this revision. (Show Details)

Add a TODO

I'm not sure how LLVM internals work, but if it's anything like GCC's, I think you just need to make it take a register operand for the GOT address as an input/operand. Then the higher-level code should ensure it's available, and then you just use it in the code emitted. Touching higher-level code shouldn't be needed.

rnk accepted this revision.May 22 2019, 2:33 PM

Looks good to me.

I'm not 100% following the discussion of using EBX as the PIC register, but I think it's a pre-existing issue for 32-bit code. I think it's unlikely that anyone will ever care enough to improve our 32-bit codegen, so it seems reasonable to leave it alone for now (or forever).

lib/Target/X86/X86MCInstLower.cpp
690

I think it would be more readable to retain this NeedsPadding variable, but make it evaluate to SRVK == MCSymbolRefExpr::VK_TLSGD.

This revision is now accepted and ready to land.May 22 2019, 2:33 PM
MaskRay updated this revision to Diff 200836.May 22 2019, 4:40 PM
MaskRay edited the summary of this revision. (Show Details)

Address review comment

This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Jul 7 2019, 4:56 AM

I'm seeing linker failures that I suspect are related to this change:

/usr/bin/ld: /home/nikic/rust/foo/libtest2.rlib(test2.test2.3a1fbbbh-cgu.0.rcgu.o): TLS transition from R_X86_64_TLSLD to R_X86_64_TPOFF32 against `_ZN5test24TEST17h9cebda1c5e0bffdeE' at 0x4 in section `.text._ZN5test28get_test17h7831513ccc936b83E' failed

Old:

0000000000000000 <_ZN5test28get_test17h7831513ccc936b83E>:
   0:   50                      push   %rax
   1:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8 <_ZN5test28get_test17h7831513ccc936b83E+0x8>
            4: R_X86_64_TLSLD   _ZN5test24TEST17h9cebda1c5e0bffdeE-0x4
   8:   e8 00 00 00 00          callq  d <_ZN5test28get_test17h7831513ccc936b83E+0xd>
            9: R_X86_64_PLT32   __tls_get_addr-0x4
   d:   48 8b 80 00 00 00 00    mov    0x0(%rax),%rax
            10: R_X86_64_DTPOFF32   _ZN5test24TEST17h9cebda1c5e0bffdeE
  14:   59                      pop    %rcx
  15:   c3                      retq

New:

0000000000000000 <_ZN5test28get_test17h7831513ccc936b83E>:
   0:   50                      push   %rax
   1:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8 <_ZN5test28get_test17h7831513ccc936b83E+0x8>
            4: R_X86_64_TLSLD   _ZN5test24TEST17h9cebda1c5e0bffdeE-0x4
   8:   ff 15 00 00 00 00       callq  *0x0(%rip)        # e <_ZN5test28get_test17h7831513ccc936b83E+0xe>
            a: R_X86_64_GOTPCREL    __tls_get_addr-0x4
   e:   48 8b 80 00 00 00 00    mov    0x0(%rax),%rax
            11: R_X86_64_DTPOFF32   _ZN5test24TEST17h9cebda1c5e0bffdeE
  15:   59                      pop    %rcx
  16:   c3                      retq

I'm not really familiar with TLS relocations -- do you have any pointers on what could be going wrong here?

MaskRay added a comment.EditedJul 7 2019, 5:47 AM

I'm seeing linker failures that I suspect are related to this change:

/usr/bin/ld: /home/nikic/rust/foo/libtest2.rlib(test2.test2.3a1fbbbh-cgu.0.rcgu.o): TLS transition from R_X86_64_TLSLD to R_X86_64_TPOFF32 against `_ZN5test24TEST17h9cebda1c5e0bffdeE' at 0x4 in section `.text._ZN5test28get_test17h7831513ccc936b83E' failed

Old:

0000000000000000 <_ZN5test28get_test17h7831513ccc936b83E>:
   0:   50                      push   %rax
   1:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8 <_ZN5test28get_test17h7831513ccc936b83E+0x8>
            4: R_X86_64_TLSLD   _ZN5test24TEST17h9cebda1c5e0bffdeE-0x4
   8:   e8 00 00 00 00          callq  d <_ZN5test28get_test17h7831513ccc936b83E+0xd>
            9: R_X86_64_PLT32   __tls_get_addr-0x4
   d:   48 8b 80 00 00 00 00    mov    0x0(%rax),%rax
            10: R_X86_64_DTPOFF32   _ZN5test24TEST17h9cebda1c5e0bffdeE
  14:   59                      pop    %rcx
  15:   c3                      retq

New:

0000000000000000 <_ZN5test28get_test17h7831513ccc936b83E>:
   0:   50                      push   %rax
   1:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 8 <_ZN5test28get_test17h7831513ccc936b83E+0x8>
            4: R_X86_64_TLSLD   _ZN5test24TEST17h9cebda1c5e0bffdeE-0x4
   8:   ff 15 00 00 00 00       callq  *0x0(%rip)        # e <_ZN5test28get_test17h7831513ccc936b83E+0xe>
            a: R_X86_64_GOTPCREL    __tls_get_addr-0x4
   e:   48 8b 80 00 00 00 00    mov    0x0(%rax),%rax
            11: R_X86_64_DTPOFF32   _ZN5test24TEST17h9cebda1c5e0bffdeE
  15:   59                      pop    %rcx
  16:   c3                      retq

I'm not really familiar with TLS relocations -- do you have any pointers on what could be going wrong here?

% cat a.cc
static thread_local int a;
int main() { return a; }
% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=no -fuse-ld=gold
% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=no -fuse-ld=lld
% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=no -fuse-ld=bfd
/usr/bin/ld.bfd: /tmp/a-9fd9f7.o: TLS transition from R_X86_64_TLSLD to R_X86_64_TPOFF32 against `_ZL1a' at 0x17 in section `.text' failed
# bfd/elf-x86-64.c:elf_x86_64_check_tls_transition
...
    if (largepic)
      return r_type == R_X86_64_PLTOFF64;
    else if (indirect_call)
      return r_type == R_X86_64_GOTPCRELX; // it doesn't allow R_X86_64_GOTPCREL
    else
      return (r_type == R_X86_64_PC32 || r_type == R_X86_64_PLT32);

ld.bfd doesn't allow R_X86_64_GOTPCREL. You can work around the issue with:

% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=yes -fuse-ld=bfd

Filed a bug against binutils-gdb: https://sourceware.org/bugzilla/show_bug.cgi?id=24784

nikic added a comment.EditedJul 7 2019, 6:39 AM

@MaskRay Thanks for checking! Can you tell me which binutils version you are using? It looks like using plain ld works for you, but I'm seeing this error with binutils 2.30 ld (gold 1.15 works though).

Specifically I'm seeing this with /usr/bin/ld -o test file1.o file2.rlib with the files F9486230 and F9486238.

Edit: Ooops, never mind. I did not realize that ld and ld.bfd are the same thing.

nikic added a comment.Jul 7 2019, 7:22 AM

ld.bfd doesn't allow R_X86_64_GOTPCREL. You can work around the issue with:

% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=yes -fuse-ld=bfd

Thanks, I've verified that setting RelaxELFRelocations in TargetOptions indeed works around the issue. I'm a bit concerned about what other compatibility issues this will cause though. If I understood correctly, relax relocations is a relatively recent feature and not compatible with older linkers, so may not be a suitable compiler default.

ld.bfd doesn't allow R_X86_64_GOTPCREL. You can work around the issue with:

% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=yes -fuse-ld=bfd

Thanks, I've verified that setting RelaxELFRelocations in TargetOptions indeed works around the issue. I'm a bit concerned about what other compatibility issues this will cause though. If I understood correctly, relax relocations is a relatively recent feature and not compatible with older linkers, so may not be a suitable compiler default.

I think R_X86_64_GOTPCRELX should be usable in ld.bfd and gold since 2016. In gold, -fno-plt R_X86_64_GOTPCREL was supported in Aug 2016 (https://sourceware.org/bugzilla/show_bug.cgi?id=20216).

Do you know why the Rust lib uses -fno-plt TLS code sequence? I'm thinking if its depended llvm should be configured with -DENABLE_X86_RELAX_RELOCATIONS=ON so that you don't need -Wa,-mrelax-relocations=yes

% llvm-objdump -dr file2.rlib
...
       8: ff 15 00 00 00 00             callq   *(%rip)
                000000000000000a:  R_X86_64_GOTPCREL    __tls_get_addr-4
nikic added a comment.Jul 7 2019, 1:12 PM

ld.bfd doesn't allow R_X86_64_GOTPCREL. You can work around the issue with:

% clang -fno-plt -fpic a.cc -Wa,-mrelax-relocations=yes -fuse-ld=bfd

Thanks, I've verified that setting RelaxELFRelocations in TargetOptions indeed works around the issue. I'm a bit concerned about what other compatibility issues this will cause though. If I understood correctly, relax relocations is a relatively recent feature and not compatible with older linkers, so may not be a suitable compiler default.

I think R_X86_64_GOTPCRELX should be usable in ld.bfd and gold since 2016. In gold, -fno-plt R_X86_64_GOTPCREL was supported in Aug 2016 (https://sourceware.org/bugzilla/show_bug.cgi?id=20216).

Thanks for the information. As 2016 is still quite recent, I don't think we'll be able to turn on relax relocations by default and need some other solution. Given the flaky linker support, would it be possible to add an option to keep using PLT for __tls_get_addr despite RtLibUseGOT?

Do you know why the Rust lib uses -fno-plt TLS code sequence?

As rust uses full relro, using PLT doesn't really have any advantage, so it is disabled by default nowadays.

dalias added a comment.Jul 7 2019, 5:19 PM

Have you checked what gcc does? It should be safe to do the same, no? Or is the issue that gcc leaves it to gas, and it's reasonably assumed that gas and ld.bfd versions will match, whereas clang uses an internal assembler that might not match?

MaskRay added a comment.EditedJul 7 2019, 8:16 PM

Have you checked what gcc does? It should be safe to do the same, no? Or is the issue that gcc leaves it to gas, and it's reasonably assumed that gas and ld.bfd versions will match, whereas clang uses an internal assembler that might not match?

gcc does something like:

(define_insn "*tls_global_dynamic_64_<mode>"
...
if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
  return "call\t%P2";
return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}";

I haven't built it but it apparently it guards the issue with the configure-time check. I'll have to work around the bug with a source change.. D64304

@nikic Can you share a reproduce of the Rust code?