Page MenuHomePhabricator

[PowerPC][LLD] Detecting and fixing missing TLS relocation on __tls_get_addr
Needs ReviewPublic

Authored by stefanp on Tue, Nov 17, 3:35 AM.

Details

Reviewers
nemanjai
MaskRay
NeHuang
sfertile
espindola
Group Reviewers
Restricted Project
Summary

This is a replacement and extension for this patch:
https://reviews.llvm.org/D85994

The standard representation of Thread Local Storage (TLS) is as follows:

addis r3, r2, x@got@tlsgd@ha // R_PPC64_GOT_TLSGD16_HA
addi r3, r3, x@got@tlsgd@l // R_PPC64_GOT_TLSGD16_LO
bl __tls_get_addr(x@tlsgd) // R_PPC64_TLSGD followed by R_PPC64_REL24
nop

The above is specified in the Power PC ELFv2 ABI in section:
3.7.3.1. General Dynamic TLS Model

However, there are two deviations form the above that have been seen in actual
code.

  1. A direct call to __tls_get_addr.
bl __tls_get_addr
nop
  1. Missing R_PPC64_TLSGD relocation.
addis r3, r2, x@got@tlsgd@ha // R_PPC64_GOT_TLSGD16_HA
addi r3, r3, x@got@tlsgd@l // R_PPC64_GOT_TLSGD16_LO
bl __tls_get_addr // R_PPC64_REL24
nop

Both forms are technically not valid according to the ABI. No new code should
be producing either of the two forms.

However, a direct call does not cause lld to produce incorrect code gen. If
the user is careful enough to correctly setup the inputs to the function the
code will work. Also, it appears that some implementations of ld.so do this
direct call. Therefore, this patch silently allows for the simple direct call.

The second situation where the R_PPC64_TLSGD relocation is missing is a
different story. Generating the second code sequence will cause lld to produce
incorrect code gen when relaxing the sequence to Local Exec. The first two
relocations (R_PPC64_GOT_TLSGD16_HA and R_PPC64_GOT_TLSGD16_LO) will be relaxed
correctly but since the call is missing a relocation it will be left as-is
which will ultimately result in an incorrect TLS address being computed. This
patch tries to address the problem of the missing R_PPC64_TLSGD relocaiton.

An option has been added --add-missing-tls-reloc.
By default if the option is not specified lld will report an error or a warning
when the missing relocation is encountered. If the option is specified then lld
will try to add the missing relocation to the __tls_get_addr call.

Diff Detail

Event Timeline

stefanp created this revision.Tue, Nov 17, 3:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
stefanp requested review of this revision.Tue, Nov 17, 3:35 AM
stefanp added a reviewer: Restricted Project.Tue, Nov 17, 3:36 AM

I remember that I have questioned about the error checking when the patch was initially brought up. The introduction of a new linker option for an incorrect case where compiler will not generate such code makes me feel more uneasy about it. Surely the linker can detect such errors but does it really have to? With assembly the user can play all sorts of tricks which will not work, but is that the duty of the linker?

NeHuang added inline comments.Wed, Nov 18, 3:45 PM
lld/test/ELF/ppc64-tls-add-missing-gdld.s
30

do we need to add two test cases here to cover?

  • add missing tls relocation for __tls_get_addr@notoc(x@tlsgd)
  • add missing tls relocation for __tls_get_addr@notoc(x@tlsld)
sfertile added inline comments.Thu, Nov 19, 8:36 AM
lld/ELF/Relocations.cpp
1316

I don't think this is something you can reliably determine. For example consider the code:

# r30 is previously saved to the stack.
# setup r3 for the call to __tls_get_addr
 addis 3, 2, i@got@tlsgd@ha
 addi 3, 3, i@got@tlsgd@l
# Load unrelated variable into a non-volatile register so it will stay live across the call.
 addis 30, 2, j@toc@ha
 addi 30, j@toc@l(30)
 # call __tls_get_addr. Relocation I - 2 is NOT one of the relocations used to setup the argument.
 bl __tls_get_addr(i@tlsgd)
 nop
stefanp edited the summary of this revision. (Show Details)Fri, Nov 20, 7:16 AM

I remember that I have questioned about the error checking when the patch was initially brought up. The introduction of a new linker option for an incorrect case where compiler will not generate such code makes me feel more uneasy about it. Surely the linker can detect such errors but does it really have to? With assembly the user can play all sorts of tricks which will not work, but is that the duty of the linker?

I agree that we cannot generate this sequence from C source using clang. However, the reason I am still trying to detect this problem is because we know that we have older versions of the XLC compiler that actually produced the incorrect setup with the missing relocation and we end up with bad code gen when linking in binaries with that missing relocation. These incorrectly compiled binaries do exist in real life and the users may have no idea that they have been compiled incorrectly.

lld/ELF/Relocations.cpp
1316

I agree.
However, in the above situation we are no worse off than we are now. Right now we fail to detect all such situations. If we add this patch we will detect most of these situations but as you mentioned above we can miss a few.

If we want to get them all we have to also flag the direct call to __tls_get_addr but I'm trying to avoid that situation.

Another option is to search up until I find a relocation that is either

  1. @tlsgd which indicates an error.

OR

  1. Function call which indicates we should stop the search without an error.
  2. Finished the search and nothing was found of the other cases in which case we stop without an error.
lld/test/ELF/ppc64-tls-add-missing-gdld.s
30

Yes I can add the @notoc versions.
Good catch. I missed those.

Both forms are technically not valid according to the ABI. No new code should be producing either of the two forms.

FWIW currently all ppc64 ld.so implementations I know (glibc/musl/FreeBSD rtld-elf) need to call a raw __tls_get_addr. This is a requisite.

The second situation where the R_PPC64_TLSGD relocation is missing is a different story. Generating the second code sequence will cause lld to produce incorrect code gen when relaxing the sequence to Local Exec.

I agree with the assertion but the complexity makes me feel uneasy. If it is a property which can be checked simply/efficiently/relatively prevailing in the wild, sure we should add it. Such examples include: absolute relocations referencing a non-absolute symbol; non-absolute relocation referencing an absolute symbol in -pie/-shared mode; relocations referencing local symbols defined in a discarded COMDAT; ...

These diagnostics are all fairly common and checking have little overhead in code and runtime costs.
However, this is not true for the code added in this patch. The long enumeration of relocation types make
the readable LLD bring closer to the hard-to-read bfd/elf*-ppc.c
String comparison coming with the check as common as type == R_PPC64_REL24 is harmful to the performance.

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

(In addition, binutils does not have --add-missing-tls-reloc.)

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

I don't really see it as an improvement in usability to have to post-process object files, potentially introducing another possible point of failure vs. providing an option to the linker.

(In addition, binutils does not have --add-missing-tls-reloc.)

I think that a departure from option compatibility is reasonable here if we can provide an option that will:

  • Allow us to handle the missing relocation when the option is specified
  • Not introduce any performance overhead when the option is not specified (i.e. the default behaviour)

Note that I am not saying this option achieves both of those goals.
It would appear to me that we really have 3 situations here that are kind of at odds with each other:

  1. Typical compiler-introduced calls to __tls_get_addr with the necessary relocations for the symbol
  2. The weird usage in ld.so with a missing relocation that does not require a relocation as there is no symbol
  3. A non-ABI compliant compiler-introduced call with a value associated with a symbol in the parameter register but a missing relocation

The current state is that the linker does not produce errors on any of those but produces incorrect code for 3. Adding this patch introduces performance overhead by default and handles 3. by either emitting an error or fixing up the relocation to avoid producing incorrect code. It has the further issue that it assumes the immediately preceding relocation was for setting up the parameter to __tls_get_addr which may not be the case as @sfertile pointed out.
So can we do something like this:

  • Leave default behaviour as it is now
  • Provide an option to search for and fix up missing relocations and error out if it cannot

Case 2. above would not work with this option, but that really doesn't seem like a problem as builds that do that will never specify the option. Furthermore, we will continue to emit incorrect code for case 3. above by default, but that is also not a problem as the documentation can suggest using the option whenever some objects are produced by older versions of the XL compiler.

MaskRay added a comment.EditedTue, Nov 24, 1:44 PM

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

I don't really see it as an improvement in usability to have to post-process object files, potentially introducing another possible point of failure vs. providing an option to the linker.

(In addition, binutils does not have --add-missing-tls-reloc.)

I think that a departure from option compatibility is reasonable here if we can provide an option that will:

  • Allow us to handle the missing relocation when the option is specified
  • Not introduce any performance overhead when the option is not specified (i.e. the default behaviour)

Note that I am not saying this option achieves both of those goals.
It would appear to me that we really have 3 situations here that are kind of at odds with each other:

  1. Typical compiler-introduced calls to __tls_get_addr with the necessary relocations for the symbol
  2. The weird usage in ld.so with a missing relocation that does not require a relocation as there is no symbol
  3. A non-ABI compliant compiler-introduced call with a value associated with a symbol in the parameter register but a missing relocation

The current state is that the linker does not produce errors on any of those but produces incorrect code for 3. Adding this patch introduces performance overhead by default and handles 3. by either emitting an error or fixing up the relocation to avoid producing incorrect code. It has the further issue that it assumes the immediately preceding relocation was for setting up the parameter to __tls_get_addr which may not be the case as @sfertile pointed out.
So can we do something like this:

  • Leave default behaviour as it is now
  • Provide an option to search for and fix up missing relocations and error out if it cannot

Case 2. above would not work with this option, but that really doesn't seem like a problem as builds that do that will never specify the option. Furthermore, we will continue to emit incorrect code for case 3. above by default, but that is also not a problem as the documentation can suggest using the option whenever some objects are produced by older versions of the XL compiler.

Looks like the proposed option needs to be tri-state? (a) no diagnostic (the default behavior, this is needed to link glibc/musl/FreeBSD ld.so) (b) call handleTlsRelocation (this looks out-of-place to me and I would really hope we don't have this) (c) error. It looks like (b) and (c) can have false positives as well.

If GNU ld is going to add a similar option, I hope we can be consistent. So can someone also make a request on the mailing list binutils@sourceware.org ?
I would hope ppc64 is part of the option name to make it clear this is a ppc64 specific workaround.

I don't think adding an option to GNU ld is appropriate because it does the slow thing by default. Stefan is working on some test cases to illustrate GNU ld's behaviour so I think we should start with what it does and see whether we want to match behaviour and whether we want to do so under option control.

I don't think adding an option to GNU ld is appropriate because it does the slow thing by default. Stefan is working on some test cases to illustrate GNU ld's behaviour so I think we should start with what it does and see whether we want to match behaviour and whether we want to do so under option control.

My request about raising this on binutils is less about implementing the functionality for GNU ld, but more for reaching a consensus on (a) the value of this option (as you can see, I am still doubtful; but acceptance on their side can make me feel more happy to accept it) (b) the naming of the option. If they decide to add an option, that is fine; it don't, hope they will not use a different option name for the same purpose (c) getting opinions on what may be missed from the current design.

stefanp added a comment.EditedWed, Nov 25, 7:45 PM

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.
The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.
If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Thanks. I did not know that GNU ld already has the functionality. That does justify the motivation behind the patch.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.

Thanks. This makes sense.

The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.

Sounds good, if we don't implement a fullblown heuristic (which can be overkill and become maintenance burden) to reduce false positives.

If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

Oh, I missed the case that the current patch implemented TLS relaxation. This is indeed unnecessary if the idea is just to make legacy code work, not to make them also efficient.
If not implementing TLS relaxation can make the code simpler, then go for it!

Another thing: R_PPC64_REL24_NOTOC is new. Does it really need the workaround?

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.
The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.
If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

Instead of naming the option --add-*, we could also consider --fix-* (some arm/aarch64 specific options start with --fix-*, as you can see from Options.td)