This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][LLD] Detecting and fixing missing TLS relocation on __tls_get_addr
AbandonedPublic

Authored by stefanp on Nov 17 2020, 3:35 AM.

Details

Reviewers
nemanjai
MaskRay
NeHuang
sfertile
espindola
Group Reviewers
Restricted Project
Summary

This is a replacement and extension for this patch:
https://reviews.llvm.org/D85994

The standard representation of Thread Local Storage (TLS) is as follows:

addis r3, r2, x@got@tlsgd@ha // R_PPC64_GOT_TLSGD16_HA
addi r3, r3, x@got@tlsgd@l // R_PPC64_GOT_TLSGD16_LO
bl __tls_get_addr(x@tlsgd) // R_PPC64_TLSGD followed by R_PPC64_REL24
nop

The above is specified in the Power PC ELFv2 ABI in section:
3.7.3.1. General Dynamic TLS Model

However, there are two deviations form the above that have been seen in actual
code.

  1. A direct call to __tls_get_addr.
bl __tls_get_addr
nop
  1. Missing R_PPC64_TLSGD relocation.
addis r3, r2, x@got@tlsgd@ha // R_PPC64_GOT_TLSGD16_HA
addi r3, r3, x@got@tlsgd@l // R_PPC64_GOT_TLSGD16_LO
bl __tls_get_addr // R_PPC64_REL24
nop

A direct call does not cause lld to produce incorrect code gen. If
the user is careful enough to correctly setup the inputs to the function the
code will work. Also, it appears that some implementations of ld.so do this
direct call.

The second situation where the R_PPC64_TLSGD relocation is missing is a
different story. Generating the second code sequence will cause lld to produce
incorrect code gen when relaxing the sequence to Local Exec. The first two
relocations (R_PPC64_GOT_TLSGD16_HA and R_PPC64_GOT_TLSGD16_LO) will be relaxed
correctly but since the call is missing a relocation it will be left as-is
which will ultimately result in an incorrect TLS address being computed.

This patch tries to address the problem of the missing R_PPC64_TLSGD relocaiton.

An option has been added --fix-ppc64-tls-reloc.
By default if the option is not specified lld will behave just as it has done in the past.
If the option is specified then lld will check all calls to __tls_get_addr for the missing relocation.
If the relocation is missing then TLS relaxation will be turned off so the the incorrect
code is not produced.

Diff Detail

Event Timeline

stefanp created this revision.Nov 17 2020, 3:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
stefanp requested review of this revision.Nov 17 2020, 3:35 AM
stefanp added a reviewer: Restricted Project.Nov 17 2020, 3:36 AM

I remember that I have questioned about the error checking when the patch was initially brought up. The introduction of a new linker option for an incorrect case where compiler will not generate such code makes me feel more uneasy about it. Surely the linker can detect such errors but does it really have to? With assembly the user can play all sorts of tricks which will not work, but is that the duty of the linker?

NeHuang added inline comments.Nov 18 2020, 3:45 PM
lld/test/ELF/ppc64-tls-add-missing-gdld.s
31

do we need to add two test cases here to cover?

  • add missing tls relocation for __tls_get_addr@notoc(x@tlsgd)
  • add missing tls relocation for __tls_get_addr@notoc(x@tlsld)
sfertile added inline comments.Nov 19 2020, 8:36 AM
lld/ELF/Relocations.cpp
1316

I don't think this is something you can reliably determine. For example consider the code:

# r30 is previously saved to the stack.
# setup r3 for the call to __tls_get_addr
 addis 3, 2, i@got@tlsgd@ha
 addi 3, 3, i@got@tlsgd@l
# Load unrelated variable into a non-volatile register so it will stay live across the call.
 addis 30, 2, j@toc@ha
 addi 30, j@toc@l(30)
 # call __tls_get_addr. Relocation I - 2 is NOT one of the relocations used to setup the argument.
 bl __tls_get_addr(i@tlsgd)
 nop
stefanp edited the summary of this revision. (Show Details)Nov 20 2020, 7:16 AM

I remember that I have questioned about the error checking when the patch was initially brought up. The introduction of a new linker option for an incorrect case where compiler will not generate such code makes me feel more uneasy about it. Surely the linker can detect such errors but does it really have to? With assembly the user can play all sorts of tricks which will not work, but is that the duty of the linker?

I agree that we cannot generate this sequence from C source using clang. However, the reason I am still trying to detect this problem is because we know that we have older versions of the XLC compiler that actually produced the incorrect setup with the missing relocation and we end up with bad code gen when linking in binaries with that missing relocation. These incorrectly compiled binaries do exist in real life and the users may have no idea that they have been compiled incorrectly.

lld/ELF/Relocations.cpp
1316

I agree.
However, in the above situation we are no worse off than we are now. Right now we fail to detect all such situations. If we add this patch we will detect most of these situations but as you mentioned above we can miss a few.

If we want to get them all we have to also flag the direct call to __tls_get_addr but I'm trying to avoid that situation.

Another option is to search up until I find a relocation that is either

  1. @tlsgd which indicates an error.

OR

  1. Function call which indicates we should stop the search without an error.
  2. Finished the search and nothing was found of the other cases in which case we stop without an error.
lld/test/ELF/ppc64-tls-add-missing-gdld.s
31

Yes I can add the @notoc versions.
Good catch. I missed those.

Both forms are technically not valid according to the ABI. No new code should be producing either of the two forms.

FWIW currently all ppc64 ld.so implementations I know (glibc/musl/FreeBSD rtld-elf) need to call a raw __tls_get_addr. This is a requisite.

The second situation where the R_PPC64_TLSGD relocation is missing is a different story. Generating the second code sequence will cause lld to produce incorrect code gen when relaxing the sequence to Local Exec.

I agree with the assertion but the complexity makes me feel uneasy. If it is a property which can be checked simply/efficiently/relatively prevailing in the wild, sure we should add it. Such examples include: absolute relocations referencing a non-absolute symbol; non-absolute relocation referencing an absolute symbol in -pie/-shared mode; relocations referencing local symbols defined in a discarded COMDAT; ...

These diagnostics are all fairly common and checking have little overhead in code and runtime costs.
However, this is not true for the code added in this patch. The long enumeration of relocation types make
the readable LLD bring closer to the hard-to-read bfd/elf*-ppc.c
String comparison coming with the check as common as type == R_PPC64_REL24 is harmful to the performance.

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

(In addition, binutils does not have --add-missing-tls-reloc.)

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

I don't really see it as an improvement in usability to have to post-process object files, potentially introducing another possible point of failure vs. providing an option to the linker.

(In addition, binutils does not have --add-missing-tls-reloc.)

I think that a departure from option compatibility is reasonable here if we can provide an option that will:

  • Allow us to handle the missing relocation when the option is specified
  • Not introduce any performance overhead when the option is not specified (i.e. the default behaviour)

Note that I am not saying this option achieves both of those goals.
It would appear to me that we really have 3 situations here that are kind of at odds with each other:

  1. Typical compiler-introduced calls to __tls_get_addr with the necessary relocations for the symbol
  2. The weird usage in ld.so with a missing relocation that does not require a relocation as there is no symbol
  3. A non-ABI compliant compiler-introduced call with a value associated with a symbol in the parameter register but a missing relocation

The current state is that the linker does not produce errors on any of those but produces incorrect code for 3. Adding this patch introduces performance overhead by default and handles 3. by either emitting an error or fixing up the relocation to avoid producing incorrect code. It has the further issue that it assumes the immediately preceding relocation was for setting up the parameter to __tls_get_addr which may not be the case as @sfertile pointed out.
So can we do something like this:

  • Leave default behaviour as it is now
  • Provide an option to search for and fix up missing relocations and error out if it cannot

Case 2. above would not work with this option, but that really doesn't seem like a problem as builds that do that will never specify the option. Furthermore, we will continue to emit incorrect code for case 3. above by default, but that is also not a problem as the documentation can suggest using the option whenever some objects are produced by older versions of the XL compiler.

MaskRay added a comment.EditedNov 24 2020, 1:44 PM

I am still questioning the motivation behind this change. If there are indeed ABI issues from older compilers,
can they add a wrapper around the linker if the check is deemed useful?

I don't really see it as an improvement in usability to have to post-process object files, potentially introducing another possible point of failure vs. providing an option to the linker.

(In addition, binutils does not have --add-missing-tls-reloc.)

I think that a departure from option compatibility is reasonable here if we can provide an option that will:

  • Allow us to handle the missing relocation when the option is specified
  • Not introduce any performance overhead when the option is not specified (i.e. the default behaviour)

Note that I am not saying this option achieves both of those goals.
It would appear to me that we really have 3 situations here that are kind of at odds with each other:

  1. Typical compiler-introduced calls to __tls_get_addr with the necessary relocations for the symbol
  2. The weird usage in ld.so with a missing relocation that does not require a relocation as there is no symbol
  3. A non-ABI compliant compiler-introduced call with a value associated with a symbol in the parameter register but a missing relocation

The current state is that the linker does not produce errors on any of those but produces incorrect code for 3. Adding this patch introduces performance overhead by default and handles 3. by either emitting an error or fixing up the relocation to avoid producing incorrect code. It has the further issue that it assumes the immediately preceding relocation was for setting up the parameter to __tls_get_addr which may not be the case as @sfertile pointed out.
So can we do something like this:

  • Leave default behaviour as it is now
  • Provide an option to search for and fix up missing relocations and error out if it cannot

Case 2. above would not work with this option, but that really doesn't seem like a problem as builds that do that will never specify the option. Furthermore, we will continue to emit incorrect code for case 3. above by default, but that is also not a problem as the documentation can suggest using the option whenever some objects are produced by older versions of the XL compiler.

Looks like the proposed option needs to be tri-state? (a) no diagnostic (the default behavior, this is needed to link glibc/musl/FreeBSD ld.so) (b) call handleTlsRelocation (this looks out-of-place to me and I would really hope we don't have this) (c) error. It looks like (b) and (c) can have false positives as well.

If GNU ld is going to add a similar option, I hope we can be consistent. So can someone also make a request on the mailing list binutils@sourceware.org ?
I would hope ppc64 is part of the option name to make it clear this is a ppc64 specific workaround.

I don't think adding an option to GNU ld is appropriate because it does the slow thing by default. Stefan is working on some test cases to illustrate GNU ld's behaviour so I think we should start with what it does and see whether we want to match behaviour and whether we want to do so under option control.

I don't think adding an option to GNU ld is appropriate because it does the slow thing by default. Stefan is working on some test cases to illustrate GNU ld's behaviour so I think we should start with what it does and see whether we want to match behaviour and whether we want to do so under option control.

My request about raising this on binutils is less about implementing the functionality for GNU ld, but more for reaching a consensus on (a) the value of this option (as you can see, I am still doubtful; but acceptance on their side can make me feel more happy to accept it) (b) the naming of the option. If they decide to add an option, that is fine; it don't, hope they will not use a different option name for the same purpose (c) getting opinions on what may be missed from the current design.

stefanp added a comment.EditedNov 25 2020, 7:45 PM

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.
The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.
If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Thanks. I did not know that GNU ld already has the functionality. That does justify the motivation behind the patch.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.

Thanks. This makes sense.

The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.

Sounds good, if we don't implement a fullblown heuristic (which can be overkill and become maintenance burden) to reduce false positives.

If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

Oh, I missed the case that the current patch implemented TLS relaxation. This is indeed unnecessary if the idea is just to make legacy code work, not to make them also efficient.
If not implementing TLS relaxation can make the code simpler, then go for it!

Another thing: R_PPC64_REL24_NOTOC is new. Does it really need the workaround?

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

I have done some testing with gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 and GNU ld (GNU Binutils for Ubuntu) 2.30.

It appears that ld.bfd by default fixes up the relocation if it is missing and then does the relaxation correctly. I don't need to specify any options for this to be done.
As a simple example:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

The bl __tls_get_addr is missing the relocation ld.bfd still relaxes it correctly.
However, with a more complex example that I wrote by hand:

Test1:
.LCF0:
  addis 2,12,.TOC.-.LCF0@ha
  addi 2,2,.TOC.-.LCF0@l
  .localentry     Test1,.-Test1
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  addis 3,2,x@got@tlsgd@ha
  addi 3,3,x@got@tlsgd@l
  bl __tls_get_addr
  nop

in the above case ld.bfd does not recognize the pattern for the TLS and so it turns off TLS relaxations completely.
The reason I brought up these examples is because I feel that detecting the difference between the two cases in lld would be unreasonably complicated an probably not worth it. As a result I feel that it may not be feasible to do the same thing that ld.bfd is doing.

Here is my new proposal:
We can add an option to lld. Unfortunately this option will not exist in ld.bfd because they wouldn't need it.
The option is off by default.
If the option is off then lld will behave exactly the same way that it always has. No error detection for missing relocations.
If the option is on then lld will check all calls to __get_tls_addr and if it finds a missing relocation it will turn off TLS relaxations. It won't try to add the missing relocation and it won't throw an error.
This design will not cause problems to default behaviour (and it shouldn't add overhead to the default link either) but it will also allow linking with code that was generated with the legacy compiler if the user specifies the option.

How does this new idea sound?
If everyone agrees on this idea (or a variant of this idea) I will rework the patch like this.

Instead of naming the option --add-*, we could also consider --fix-* (some arm/aarch64 specific options start with --fix-*, as you can see from Options.td)

stefanp updated this revision to Diff 308352.Nov 30 2020, 6:21 AM

Completely changed the way that the patch works.

Added a new option to the patch --fix-ppc64-tls-reloc
The option is off by default.
If the option is off then LLD behaves the way it has in the past.
If the option is on then LLD on PPC64 will check all calls to __tls_get_addr. If
any of those calls are missing a relocation then TLS relaxation is turned off.

stefanp edited the summary of this revision. (Show Details)Nov 30 2020, 6:26 AM
sfertile added inline comments.Nov 30 2020, 8:23 AM
lld/ELF/Relocations.cpp
1335

We still need to issue a diagnostic. While disabling tls relaxations allows us to produce a well formed binary, ideally the users would recompile the objects with invalid tls sequences. Without a warning they won't know some of the input files have problems.

lld/test/ELF/ppc64-call-tls-get-addr-missing.s
7

I'm trying to figure out the utility of this test. Referencing an undefined symbol should obviously be an error. Is the test meant to show we don't perform tls relaxations? If so I think the other tests cover that.

lld/test/ELF/ppc64-tls-add-missing-gdld.s
1

The name of this file should be changed to match the direction change, or if it doesn't add anything that isn't already covered by ppc64-tls-missing-gdld.s it should be removed.

stefanp updated this revision to Diff 308747.Dec 1 2020, 12:55 PM

Since the scope of the patch has changed two of the tests were no longer
required so I have removed them.
Added a warning for the case where TLS relaxation is disabled due to a missing
relocation.

lld/test/ELF/ppc64-call-tls-get-addr-missing.s
7

Yes this test was meant to show that the relaxations were not done. However, I agree with you this is probably already covered by the other tests. I'll remove this test.

lld/test/ELF/ppc64-tls-add-missing-gdld.s
1

Yes, This test pretty much covers the same thing as ppc64-tls-missing-gdld.s.
I'm going to remove it as well.

Gentle Ping

MaskRay added inline comments.Dec 8 2020, 9:37 AM
lld/ELF/Relocations.cpp
1324

The two if conditions can be combined.

1328

The parens are unneeded.

1331

config->isMips64EL -> false

lld/test/ELF/ppc64-call-tls-get-addr-exec.s
11

Add .globl __tls_get_addr to make it match the real case.

lld/test/ELF/ppc64-call-tls-get-addr.s
2

Consider combining the two tests with split-file

stefanp updated this revision to Diff 310526.Dec 9 2020, 7:13 AM

Merged if statement and did a bit of cleanup according to comments.
Merged two test cases.

MaskRay added inline comments.Dec 9 2020, 10:16 AM
lld/ELF/InputSection.cpp
1055–1056

This is not correct. The relocation will not be handled at all. ppc64-tls-missing-gdld.s does not test -no-pie or -pie so you did not notice the issue.

Consider dropping the TLS part and I'll try taking a stab at it.

lld/test/ELF/ppc64-call-tls-get-addr-exec.s
2

Since the test now has -shared RUN lines. -exec is no longer a suitable suffix.

lld/test/ELF/ppc64-tls-missing-gdld.s
4–11

GD to IE/LE relaxation only happens for -no-pie or -pie.

7

Can some portion of the comment be retained?

I think the code should be extracted to scanRelocs: check whether there are R_TLSGD_GOT relocations without R_PPC64_TLSGD/R_PPC64_TLSLD. This may be an entire rewrite. I'm working on it.

I think the code should be extracted to scanRelocs: check whether there are R_TLSGD_GOT relocations without R_PPC64_TLSGD/R_PPC64_TLSLD. This may be an entire rewrite. I'm working on it.

I think I understand what you are looking for. It's my patch so I should probably be doing the rewrite myself.
I don't want you to spend more time than you have to on this...

I think the code should be extracted to scanRelocs: check whether there are R_TLSGD_GOT relocations without R_PPC64_TLSGD/R_PPC64_TLSLD. This may be an entire rewrite. I'm working on it.

I think I understand what you are looking for. It's my patch so I should probably be doing the rewrite myself.
I don't want you to spend more time than you have to on this...

... but do you think D92959 is complete now?

I think the code should be extracted to scanRelocs: check whether there are R_TLSGD_GOT relocations without R_PPC64_TLSGD/R_PPC64_TLSLD. This may be an entire rewrite. I'm working on it.

I think I understand what you are looking for. It's my patch so I should probably be doing the rewrite myself.
I don't want you to spend more time than you have to on this...

... but do you think D92959 is complete now?

Yes, I see you have already written it.
Thank you for your help.
In that case I'll move to your patch and add comments there.

stefanp abandoned this revision.Jan 5 2021, 1:05 PM

Since D92959 is now in I am going to abandon this patch.