This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Fix LLD performing an invalid IE->LE relaxation for a thread-local variable on x86-64
AbandonedPublic

Authored by artem on Mar 1 2023, 5:26 AM.

Details

Reviewers
MaskRay
Summary

Reproducer

main.cpp:

void do_some_fun();

thread_local double kek;

int main() {
  kek = 42.0;
  do_some_fun();
}

bug.cpp:

#include <cstdio>

extern thread_local __attribute__((weak)) double kek;

void do_some_fun() {
  if (&kek != nullptr) {
    printf("The address is not nullptr!\n");
  } else {
    printf("The address is nullptr :(\n");
  }
  printf("Actually, the address is %p\n", &kek);
  printf("And the value is %f\n", kek);
}

Commands to reproduce:

clang++ -O3 -c main.cpp -o main.o
clang++ -O3 -c bug.cpp -o bug.o
clang++ -fuse-ld=lld main.o bug.o -o program

Expected behavior

The address is not nullptr!
Actually, the address is 0x7fcb82adb738
And the value is 42.000000

Actual behavior

The address is nullptr :(
Actually, the address is 0x7fa226ab5738
And the value is 42.000000

Explanation

Clang generates the following assembly:

movq    %fs:0, %rax
addq    kek@GOTTPOFF(%rip), %rax ; ADDQ sets ZF if the address is NULL
leaq    .Lstr(%rip), %rax        ; "The value is nullptr :("
leaq    .Lstr.3(%rip), %rdi      ; "The value is not nullptr!"
cmoveq  %rax, %rdi               ; Checks ZF
callq   puts@PLT

and a R_X86_64_GOTTPOFF relocation.

Then, when LLD discovers location of kek, it tries to relax the relocation to R_X86_64_TPOFF32. The transformation uses LEAQ thus proper ZF is lost and CMOVEQ misbehaves.

Diff Detail

Event Timeline

artem created this revision.Mar 1 2023, 5:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2023, 5:26 AM
artem requested review of this revision.Mar 1 2023, 5:26 AM
artem edited the summary of this revision. (Show Details)
MaskRay requested changes to this revision.Mar 1 2023, 9:43 AM

Clang generates the following assembly:

Which Clang version do you use? I cannot reproduce with clang trunk, 14.0, or 15.0 with clang++ main.cpp bug.cpp; ./a.out.

Anyway I don't think there is a bug in lld. lld's behavior matches gold while GNU ld has an assertion failure for some cases in tls-opt.s: assertion fail ../../bfd/elf64-x86-64.c:4159.
The GNU ld issue suggests that it doesn't expect the TLS code sequence provided by your Clang.

The weak attribute for TLS can essentially be ignored as undefined weak TLS doesn't really have the "if undefined, resolve to zero" semantics.
The user code extern thread_local __attribute__((weak)) double kek; probably should lead to an error.

This revision now requires changes to proceed.Mar 1 2023, 9:43 AM
artem added a comment.Mar 1 2023, 1:38 PM

Clang generates the following assembly:

Which Clang version do you use? I cannot reproduce with clang trunk, 14.0, or 15.0 with clang++ main.cpp bug.cpp; ./a.out.

Ubuntu clang version 15.0.6. I think you need to specify -O3 for compiler to produce such code

Anyway I don't think there is a bug in lld. lld's behavior matches gold while GNU ld has an assertion failure for some cases in tls-opt.s: assertion fail ../../bfd/elf64-x86-64.c:4159.
The GNU ld issue suggests that it doesn't expect the TLS code sequence provided by your Clang.

I used GNU ld (GNU Binutils for Ubuntu) 2.39. Without any extra flags, it successfully links the binary, which shows exactly the same wrong behavior. HOWERVER, passing -rdynamic to BFD magically fixes the issue, it emits ADD instruction ¯\_(ツ)_/¯

The weak attribute for TLS can essentially be ignored as undefined weak TLS doesn't really have the "if undefined, resolve to zero" semantics.
The user code extern thread_local __attribute__((weak)) double kek; probably should lead to an error.

I understand, but I added the __attribute__((weak)) with the only reason for Clang to not compile out the condition check.

P.S. I have a reproducer for GCC+LLD, it does not declare the variable as weak, but requires code to be instrumented with UBSan and is much less concise. Actually, that's the piece of code which originally helped me to discover the issue (the code in this differential's summary is hand-crafted). You can view it in attachments (tested with gcc version 12.2.0):

Clang generates the following assembly:

Which Clang version do you use? I cannot reproduce with clang trunk, 14.0, or 15.0 with clang++ main.cpp bug.cpp; ./a.out.

Ubuntu clang version 15.0.6. I think you need to specify -O3 for compiler to produce such code

I forgot to say but I have tried -O0/-O2/-O3 for all of 14.0, 15.0, and trunk from official releases. Not reproducible...
I don't know whether Ubuntu has made some patching which may change the behavior.

Anyway I don't think there is a bug in lld. lld's behavior matches gold while GNU ld has an assertion failure for some cases in tls-opt.s: assertion fail ../../bfd/elf64-x86-64.c:4159.
The GNU ld issue suggests that it doesn't expect the TLS code sequence provided by your Clang.

I used GNU ld (GNU Binutils for Ubuntu) 2.39. Without any extra flags, it successfully links the binary, which shows exactly the same wrong behavior. HOWERVER, passing -rdynamic to BFD magically fixes the issue, it emits ADD instruction ¯\_(ツ)_/¯

Thanks. I confirm this behavior with and without -rdynamic, so I think GNU ls is doing something strange...
Whether or not the TLS symbol or its associated TLS init function for is exported or not, the TLS optimization behavior should not change.

The weak attribute for TLS can essentially be ignored as undefined weak TLS doesn't really have the "if undefined, resolve to zero" semantics.
The user code extern thread_local __attribute__((weak)) double kek; probably should lead to an error.

I understand, but I added the __attribute__((weak)) with the only reason for Clang to not compile out the condition check.

P.S. I have a reproducer for GCC+LLD, it does not declare the variable as weak, but requires code to be instrumented with UBSan and is much less concise. Actually, that's the piece of code which originally helped me to discover the issue (the code in this differential's summary is hand-crafted). You can view it in attachments (tested with gcc version 12.2.0):

That said, I believe this is more of a GCC bug. If you can reproduce it with Clang, it is a Clang bug as well.

Link-time TLS optimization is a protocol between the compiler and the linker.
The linker just blindly trusts that the compiler output is legal and causes the situation: not updating eflags due to add=>lea transition causes a condition to fail.
For your reproduce, it is a bug of GCC's -fsanitize=undefined.
Since lld's behavior matches GNU ld's relaxation scheme (even if it is just the non-export-dynamic case), I don't think this a lld bug.

I will investigate more and add more information about _ZTH*, weak on my https://maskray.me/blog/2021-02-14-all-about-thread-local-storage

With these said, since this patch looks like a code simplification, if we can get GNU ld do something similar, we can revisit this patch.
I am not so sure whether changing lea to add (which will update eflags) may cause other unknown problems, though.

If you can get a smaller reproduce, you can report this on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93623

The key is extern thread_local double precise_now (__thread will paper over the bug) which has a undefined weak symbol _ZTH* which will be instrumented by -fsanitize=undefined.

artem added a comment.EditedMar 2 2023, 7:48 AM

I forgot to say but I have tried -O0/-O2/-O3 for all of 14.0, 15.0, and trunk from official releases. Not reproducible...
I don't know whether Ubuntu has made some patching which may change the behavior.

Here's a godbolt link: https://godbolt.org/z/3vMW8vz1s

Link-time TLS optimization is a protocol between the compiler and the linker.
The linker just blindly trusts that the compiler output is legal and causes the situation: not updating eflags due to add=>lea transition causes a condition to fail.

While I tend agree with this statement, I have to admit ELF ABI for TLS seems to be language agnostic. If it is possible to write an assembly program that does not violate the ABI but triggers the problem, it is outside the scope of both GCC and Clang. I quickly listed through the "ELF Handling For Thread-Local Storage" and couldn't find an explicit prohibition against dependency on the side effects of accessing IE variables.

With these said, since this patch looks like a code simplification, if we can get GNU ld do something similar, we can revisit this patch.
I am not so sure whether changing lea to add (which will update eflags) may cause other unknown problems, though.

Should I submit a patch or discuss it with someone somewhere first? I am not very competent in this kind of topics and how it's usually being done.

If you can get a smaller reproduce, you can report this on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93623
The key is extern thread_local double precise_now (__thread will paper over the bug) which has a undefined weak symbol _ZTH* which will be instrumented by -fsanitize=undefined.

Thanks for the suggestion. Do you mean that specific issue (it does not seem to be related to our subject at first glance) or reporting to GCC's Bugzilla in general?
As for __thread, it's a nice catch. I think can be used as a workaround until the issue is solved. Also, it exaplains why GCC has decided to instrument a non-weak symbol.

I forgot to say but I have tried -O0/-O2/-O3 for all of 14.0, 15.0, and trunk from official releases. Not reproducible...
I don't know whether Ubuntu has made some patching which may change the behavior.

Here's a godbolt link: https://godbolt.org/z/3vMW8vz1s

OK, I used 15.0.6 which doesn't use addq kek@GOTTPOFF(%rip), %rbx.
On Clang&LLVM side, D131716 fixed the bug.

Link-time TLS optimization is a protocol between the compiler and the linker.
The linker just blindly trusts that the compiler output is legal and causes the situation: not updating eflags due to add=>lea transition causes a condition to fail.

While I tend agree with this statement, I have to admit ELF ABI for TLS seems to be language agnostic. If it is possible to write an assembly program that does not violate the ABI but triggers the problem, it is outside the scope of both GCC and Clang. I quickly listed through the "ELF Handling For Thread-Local Storage" and couldn't find an explicit prohibition against dependency on the side effects of accessing IE variables.

No, "ELF Handling For Thread-Local Storage" is not authoritative for x86-64. We should use "System V Application Binary Interface AMD64 Architecture Processor Supplement" "TLS linker optimization" as the canonical reference, which mentions Table 10.17: IE -> LE Code Transition With Lea.

With these said, since this patch looks like a code simplification, if we can get GNU ld do something similar, we can revisit this patch.
I am not so sure whether changing lea to add (which will update eflags) may cause other unknown problems, though.

Should I submit a patch or discuss it with someone somewhere first? I am not very competent in this kind of topics and how it's usually being done.

OK, now that I re-checked the x86-64 ABI, I think this change will likely not be accepted on binutils side :(
You can just file a bug on https://gcc.gnu.org/bugzilla/show_bug.cgi and there are folks working on binutils, too.

If you can get a smaller reproduce, you can report this on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93623
The key is extern thread_local double precise_now (__thread will paper over the bug) which has a undefined weak symbol _ZTH* which will be instrumented by -fsanitize=undefined.

Thanks for the suggestion. Do you mean that specific issue (it does not seem to be related to our subject at first glance) or reporting to GCC's Bugzilla in general?
As for __thread, it's a nice catch. I think can be used as a workaround until the issue is solved. Also, it exaplains why GCC has decided to instrument a non-weak symbol.

artem abandoned this revision.Aug 7 2023, 7:07 AM