This is an archive of the discontinued LLVM Phabricator instance.

[ELF][PPC64] Suppress toc-indirect to toc-relative relaxation if R_PPC64_TOC16_LO is seen
ClosedPublic

Authored by MaskRay on Apr 18 2020, 10:51 AM.

Details

Summary

The current implementation assumes that R_PPC64_TOC16_HA is always followed
by R_PPC64_TOC16_LO_DS. This can break with R_PPC64_TOC16_LO:

// Load the address of the TOC entry, instead of the value stored at that address
addis 3, 2, .LC0@tloc@ha  # R_PPC64_TOC16_HA
addi  3, 3, .LC0@tloc@l   # R_PPC64_TOC16_LO
blr

which is used by boringssl's util/fipstools/delocate/delocate.go
https://github.com/google/boringssl/blob/master/crypto/fipsmodule/FIPS.md has some documentation.
In short, this tool converts an assembly file to avoid any potential relocations.
The distance to an input .toc is not a constant after linking, so it cannot use an addis;ld pair.
Instead, it jumps to a stub which loads the TOC entry address with addis;addi.

This patch checks the presence of R_PPC64_TOC16_LO and suppresses
toc-indirect to toc-relative relaxation if R_PPC64_TOC16_LO is seen.
This approach is conservative and loses some relaxation opportunities but is easy to implement.

addis 3, 2, .LC0@toc@ha  # no relaxation
addi  3, 3, .LC0@toc@l   # no relaxation
li    9, 0
addis 4, 2, .LC0@toc@ha  # can relax but suppressed
ld    4, .LC0@toc@l(4)   # can relax but suppressed

Also note that interleaved R_PPC64_TOC16_HA and R_PPC64_TOC16_LO_DS is
possible and this patch accounts for that.

addis 3, 2, .LC1@toc@ha  # can relax
addis 4, 2, .LC2@toc@ha  # can relax
ld    3, .LC1@toc@l(3)   # can relax
ld    4, .LC2@toc@l(4)   # can relax

Diff Detail

Event Timeline

MaskRay created this revision.Apr 18 2020, 10:51 AM

Interesting. I wasn't aware that taking the address of a toc entry was allowed when I implemented the toc-optimizations 😦 . I'm not sure we can peek at just the next relocation though to see if we can optimize or not. For example consider:

  addis 4, 2, .LC1@toc@ha  # R_PPC64_TOC16_HA, should be optimized.
  addis 3, 2, .LC0@toc@ha  # R_PPC64_TOC16_HA, should *NOT* be optimized.
  ld    4,    .LC1@toc@l(4)  # R_PPC64_TOC16_LO_DS, should be optimzied
  addi  3, 3, .LC0@toc@l   # R_PPC64_TOC16_LO, should *NOT* be optimized.
  blr

AES_encrypt:
  .long 0xabcd1234

  .data
Sym: 
  .long 0

.LC0:
  .tc AES_encrypt[TC], AES_encrypt
.LC1:
  .tc Sym[TC], Sym

My initial though are we will have to scan all the .toc relocs upfront to determine which are optimizable, but I need to spend a bit more time thinking about this to make sure I fully understand the problem.

I don't really know GO, but what in the language ends up producing assembly that takes the address of the toc entry for a symbol?

MaskRay added a comment.EditedApr 21 2020, 1:22 PM

Interesting. I wasn't aware that taking the address of a toc entry was allowed when I implemented the toc-optimizations 😦 . I'm not sure we can peek at just the next relocation though to see if we can optimize or not. For example consider:

  addis 4, 2, .LC1@toc@ha  # R_PPC64_TOC16_HA, should be optimized.
  addis 3, 2, .LC0@toc@ha  # R_PPC64_TOC16_HA, should *NOT* be optimized.
  ld    4,    .LC1@toc@l(4)  # R_PPC64_TOC16_LO_DS, should be optimzied
  addi  3, 3, .LC0@toc@l   # R_PPC64_TOC16_LO, should *NOT* be optimized.
  blr

AES_encrypt:
  .long 0xabcd1234

  .data
Sym: 
  .long 0

.LC0:
  .tc AES_encrypt[TC], AES_encrypt
.LC1:
  .tc Sym[TC], Sym

My initial though are we will have to scan all the .toc relocs upfront to determine which are optimizable, but I need to spend a bit more time thinking about this to make sure I fully understand the problem.

I don't really know GO, but what in the language ends up producing assembly that takes the address of the toc entry for a symbol?

boringssl delocate (there is some documentation on https://github.com/google/boringssl/blob/master/crypto/fipsmodule/FIPS.md ; my understanding is very limited, too) is the only tool that takes the address of a TOC entry. In short, this tool converts an assembly file to avoid any potential relocations. The distance to an input .toc is not a constant, so it cannot use an addis;ld pair. It calls a stub function (where relocations can be freely used) and the stub loads the address of the TOC entry in r3. The stub function has the TOC16_HA/TOC16_LO code sequence.

I believe all of our internal ppc targets can be built if we use --no-toc-optimize. After this bug fixed, there may be another bug related to delocate generated assembly, but no other tool generates such strange TOC16_HA/TOC16_LO pairs.

For interleaved R_PPC64_TOC16_* symbols, I agree that it is indeed a problem. I think we have to count on compilers/users not writing such assembly. GNU ld appears to use some heuristics. This patch brings us on par to some extent.

Thanks. I'm reading through the docs you linked and will have a look at the delocate tool.

agl added a subscriber: agl.Apr 23 2020, 7:01 PM

Sorry about delocate. It is profoundly weird, motivated purely by compliance with FIPS regulations, and we don't want it to cause any problems.

It has been a couple of years since I last had all the POWER stuff in my head but, if delocate can change to make things easier for you, please let us(*) know. The POWER ABI is also non-trivial and it's possible that we're doing invalid things that have only worked because of luck.

(*) agl at google dot com.

In D78431#2000967, @agl wrote:

Sorry about delocate. It is profoundly weird, motivated purely by compliance with FIPS regulations, and we don't want it to cause any problems.

No need to apologize, the linked documentation does a good job explaining why delocate does what it does, and the tests make it really easy to see what the transformations for Power are in practice. When I saw that the tests have input assembly that take the address of a toc-entry it got me thinking about if the gcc section-anchor option could produce code from source that takes the address of a toc-entry.

The POWER ABI is also non-trivial and it's possible that we're doing invalid things that have only worked because of luck.

All the transformations I looked at were ABI compliant, and I don't think there is anything in the ABI disallowing taking the address of a toc-entry. Both BFD and Gold disable toc-optimizations for these instructions so I would say that is pretty solid evidence showing it is a supported construct.

FWIW, I think I am OK with us landing a patch like this that will selectively unbreak the code that delocate emits. I'm going to poke around a bit more today though to see how hard a more general solution might be before approving this patch.

sfertile accepted this revision as: sfertile.Apr 28 2020, 7:09 AM

LGTM. Unfortunately a more general solution looks to be quite difficult/intrusive and I'm not sure if it is worth it.

This revision is now accepted and ready to land.Apr 28 2020, 7:09 AM
MaskRay updated this revision to Diff 260722.Apr 28 2020, 12:11 PM
MaskRay edited the summary of this revision. (Show Details)

Update description

This revision was automatically updated to reflect the committed changes.

I had to pull this as it breaks the PPC mutlistage LLD bot. Consider the following input:

addis r3, r2, .LC0@toc@ha  # No longer gets optimized
addis r4, r2, .LC1@toc@ha
ld  r3, .LC0@toc@l(r3)
ld r4, .LC1@toc@l(r4)

I should have caught this in the initial review, so I am sorry for the churn.

I think I have an alternative 'big-hammer' approach that will work and not be too intrusive to the rest of LLD.

There is a bit free in the Symbol class. We can use that to indicate if we have to disable toc optimizations through this symbol. In ScanRelocs if we find a R_PPC64_TOC16_LO relocation to a toc-entry, we set the bit on the referenced sym. That will disable toc-optimizations to any symbol referenced through that objects toc-section.

MaskRay added a comment.EditedApr 29 2020, 9:38 AM

I had to pull this as it breaks the PPC mutlistage LLD bot. Consider the following input:

addis r3, r2, .LC0@toc@ha  # No longer gets optimized
addis r4, r2, .LC1@toc@ha
ld  r3, .LC0@toc@l(r3)
ld r4, .LC1@toc@l(r4)

I should have caught this in the initial review, so I am sorry for the churn.

Oh, I did not know this can be interleaved. Can you point me to an example?

I think I have an alternative 'big-hammer' approach that will work and not be too intrusive to the rest of LLD.

There is a bit free in the Symbol class. We can use that to indicate if we have to disable toc optimizations through this symbol. In ScanRelocs if we find a R_PPC64_TOC16_LO relocation to a toc-entry, we set the bit on the referenced sym. That will disable toc-optimizations to any symbol referenced through that objects toc-section.

We have a spare bit after Symbol::scriptDefined. Indeed it seems that a stateless transformation is problematic. Adding logic to scanRelocs is probably the best approach.

@sfertile Your revert does not include Differential Revision: . I guess if it included, the revision would be automatically re-opened, and Commits: would list the associated revert commits.

MaskRay reopened this revision.Apr 29 2020, 9:39 AM
This revision is now accepted and ready to land.Apr 29 2020, 9:39 AM

@sfertile Your revert does not include Differential Revision: . I guess if it included, the revision would be automatically re-opened, and Commits: would list the associated revert commits.

Cool, I didn't realize we had that functionality. I'll make sure to include it from now on.

MaskRay updated this revision to Diff 260987.Apr 29 2020, 12:24 PM
MaskRay retitled this revision from [ELF][PPC64] Don't perform toc-indirect to toc-relative relaxation for R_PPC64_TOC16_HA not followed by R_PPC64_TOC16_LO_DS to [ELF][PPC64] Suppress toc-indirect to toc-relative relaxation if R_PPC64_TOC16_LO is seen.
MaskRay edited the summary of this revision. (Show Details)

New approach

MaskRay updated this revision to Diff 260992.Apr 29 2020, 12:35 PM

Improve comments

Harbormaster completed remote builds in B55158: Diff 260987.
sfertile accepted this revision.Apr 30 2020, 6:28 AM

One minor comment but otherwise LGTM. Thanks for fixing this.

lld/ELF/Relocations.cpp
1325

We should check that the symbol is a section, and that the sections name is ".toc" before adding it to the map as well.

MaskRay updated this revision to Diff 261252.Apr 30 2020, 9:06 AM

Check the section name is .toc

MaskRay marked an inline comment as done.Apr 30 2020, 9:10 AM
MaskRay updated this revision to Diff 261256.Apr 30 2020, 9:14 AM
MaskRay edited the summary of this revision. (Show Details)

Improve description

This revision was automatically updated to reflect the committed changes.