This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix issue where binary uses a .got but is missing a .TOC.
ClosedPublic

Authored by stefanp on Nov 13 2020, 7:25 AM.

Details

Summary

From the PowerPC ELFv2 ABI section 4.2.3. Global Offset Table.

The GOT consists of an 8-byte header that contains the TOC base (the first TOC
base when multiple TOCs are present), followed by an array of 8-byte addresses.

Due to the introduction of PC Relative code it is now possible to require a GOT
without having a .TOC. symbol in the object that is being linked. Since LLD uses
the .TOC. symbol to determine whether or not a GOT is required the GOT header is
not setup correctly and the 8-byte header is missing.

This patch allows the Power PC GOT setup to happen when an element is added to
the GOT instead of at the very begining. When this header is added a .TOC.
symbol is also added.

Diff Detail

Event Timeline

stefanp created this revision.Nov 13 2020, 7:25 AM
stefanp requested review of this revision.Nov 13 2020, 7:25 AM
stefanp added a reviewer: Restricted Project.
MaskRay added inline comments.Nov 16 2020, 11:24 PM
lld/ELF/SyntheticSections.cpp
710

Can you use an approach similar to GotPltSection::hasGotPltOffset?

stefanp added inline comments.Nov 17 2020, 4:03 AM
lld/ELF/SyntheticSections.cpp
710

I'm sorry but I'm not sure what you are looking for.
I assume you mean GotPltSection::hasGotPltOffRel.

Are you looking to replace

if (config->emachine != EM_PPC64 || ElfSym::globalOffsetTable)
    return;

with something like:

if (hasGotHeader || ElfSym::globalOffsetTable)
    return;

Where I can then set hasGotHeader from the GotSection constructor and/or PPC specific code?

Or:
Should I set numEntries += target->gotHeaderEntriesNum; from PPC specific code?

NeHuang added inline comments.Nov 18 2020, 1:14 PM
lld/test/ELF/ppc64-ld-got-dtprel.s
28

is the computation proper here? To me, &.got[2] is different from &.got[1]

MaskRay added inline comments.Nov 18 2020, 2:14 PM
lld/ELF/SyntheticSections.cpp
710

We should drop setupPPCDelayedInit and compute the size of GotSection with an approach similar to GotPltSection::hasGotPltOffset

stefanp added inline comments.Nov 23 2020, 4:36 AM
lld/ELF/SyntheticSections.cpp
710

I cannot find GotPltSection::hasGotPltOffset it is not in the code.

There are a couple of ways that I could remove

numEntries += target->gotHeaderEntriesNum;

from this function including adding it to Symbol::getGotOffset() in the same way that Symbol::getGotPltOffset() does. However, I still have to add the .TOC. symbol and so I cannot remove setupPPCDelayedInit completely.

I'm sorry but I'm still confused about this comment...

@MaskRay
I'm sorry to bug you but I still don't understand what you are looking for.

stefanp added inline comments.Feb 1 2021, 12:31 PM
lld/test/ELF/ppc64-ld-got-dtprel.s
28

You are correct the comment there is wrong.
It should probably be:

## hi(j@got@dtprel) = (&.got[2] - .got >> 16) & 0xffff = -1

I will fix that.

stefanp updated this revision to Diff 320564.Feb 1 2021, 12:37 PM

Updated incorrect comment in test case.

I'm sorry @MaskRay I don't want to keep bugging you but I really don't understand what you meant by your comment with respect to GotPltSection::hasGotPltOffset.

Sorry, I missed your early comments.

Which test can demonstrate the bug? My idea is that instead of creating setupPPCDelayedInit, we should let in.got->hasGotOffRel to convey that .got is needed.
See Relocations.cpp:1397

Sorry, I missed your early comments.

Which test can demonstrate the bug? My idea is that instead of creating setupPPCDelayedInit, we should let in.got->hasGotOffRel to convey that .got is needed.
See Relocations.cpp:1397

Thank you for looking at this again!
I think I understand what you are looking for.

Let me start with the example test case. Pretty much any test that requires a GOT will work as long as we use PC Relative and don't have a TOC.
Take globalvar.c:

 int glob_int=0;
int function() {
  return glob_int;
}

If we compile it for Power 10:

clang -O3 -mcpu=pwr10 -c globalvar.c -fPIC
ld.lld --shared globalvar.o -o globalvar.so

We can then dump out some information on the GOT for globalvar.so.

$ llvm-readelf -r globalvar.so
Relocation section '.rela.dyn' at offset 0x2a0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000020380  0000000200000014 R_PPC64_GLOB_DAT       0000000000030388 glob_int + 0
$ llvm-readelf -x .got globalvar.so
Hex dump of section '.got':
0x00020380 80830200 00000000                   ........

The problem with the output is that the glob_int is placed right at the top of the .got section. The TOC base is missing and according to the ABI we require it in all cases that require a .got.
This becomes a problem when we link globalvar.so later on with something that expects that TOC base to exist.

Now with respect to the suggestion you made. The in.got->hasGotOffRel is not set in the example from above.
The reason is that:

R_PPC64_GOT_PCREL34	glob_int

And R_PPC64_GOT_PCREL34 is turned into R_GOT_PC which is not part of: oneof<R_GOTONLY_PC, R_GOTREL, R_PPC64_TOCBASE, R_PPC64_RELAX_TOC>(expr).
This is why I added setupPPCDelayedInit that would run when we actively tried to add something to the GOT. We do not want to try to add a header to the GOT if the R_GOT_PC ends up being relaxed away and the rest of the GOT is empty.

In the past this wasn't really a problem because .TOC. would be defined any time we needed a GOT because we would always access the GOT with the TOC pointer. But now with PC Relative that is not the case.

I'm not sure how to proceed in this case. I have no way of knowing whether or not a GOT entry will be required when I create the GotSection object and I need a header in all situations where the GOT has something in it.

Gentle ping.

Could there be a simpler approach where we always emit a .got section on PPC64: In addReservedSymbols if we fail to resolve .TOC. then we create it?

Could there be a simpler approach where we always emit a .got section on PPC64: In addReservedSymbols if we fail to resolve .TOC. then we create it?

We could do that and it would certainly solve the problem. The issue I have with that is that we will now have a .got for everything we link including things that we used to link in the past that did not require it. It seems odd to say that since the design of the linker won't allow us to do this easily we will generate this section whether we need it or not.

Ping.
I would like to restart the discussion on this patch.
I understand that the design of this fix is not ideal however I'm not sure of a better solution to this.
Adding the GOT to all linked binaries seems like too much. However, using hasGotOffRel isn't going to cover all of the test cases as it will not be set in all of the cases. The code needs to add a first symbol to the GOT in all cases where we use a GOT but we don't want to add a GOT if nothing else goes into it.

Could there be a simpler approach where we always emit a .got section on PPC64: In addReservedSymbols if we fail to resolve .TOC. then we create it?

We could do that and it would certainly solve the problem. The issue I have with that is that we will now have a .got for everything we link including things that we used to link in the past that did not require it. It seems odd to say that since the design of the linker won't allow us to do this easily we will generate this section whether we need it or not.

I'll need to go over this again to refresh my memory. As a quick recap though:

  • On PPC, if we create a .got, then the first entry must be the tocbase.
  • If .TOC. is not defined by one of the input objects, then we do not increment the number of .got entries here
  • If we synthetically define .TOC. in the linker, then we end up with a .got section in some cases where we would otherwise not need one.

Is it possible to change the .got section to always add gotHeaderEntriesNum in its constructor for PPC64, and remove the section late if all it contains is the single entry and .TOC. is not defined?

Sorry, I missed your early comments.

Which test can demonstrate the bug? My idea is that instead of creating setupPPCDelayedInit, we should let in.got->hasGotOffRel to convey that .got is needed.
See Relocations.cpp:1397

Thank you for looking at this again!
I think I understand what you are looking for.

Let me start with the example test case. Pretty much any test that requires a GOT will work as long as we use PC Relative and don't have a TOC.
Take globalvar.c:

 int glob_int=0;
int function() {
  return glob_int;
}

If we compile it for Power 10:

clang -O3 -mcpu=pwr10 -c globalvar.c -fPIC
ld.lld --shared globalvar.o -o globalvar.so

We can then dump out some information on the GOT for globalvar.so.

$ llvm-readelf -r globalvar.so
Relocation section '.rela.dyn' at offset 0x2a0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000020380  0000000200000014 R_PPC64_GLOB_DAT       0000000000030388 glob_int + 0
$ llvm-readelf -x .got globalvar.so
Hex dump of section '.got':
0x00020380 80830200 00000000                   ........

Please add this as a lit test as part of this patch.

stefanp updated this revision to Diff 333384.Mar 25 2021, 12:01 PM

Rebased the patch to the latest top of trunk and added the test case.

I'll need to go over this again to refresh my memory. As a quick recap though:

  • On PPC, if we create a .got, then the first entry must be the tocbase.
  • If .TOC. is not defined by one of the input objects, then we do not increment the number of .got entries here
  • If we synthetically define .TOC. in the linker, then we end up with a .got section in some cases where we would otherwise not need one.

Yes, you got it!

Is it possible to change the .got section to always add gotHeaderEntriesNum in its constructor for PPC64, and remove the section late if all it contains is the single entry and .TOC. is not defined?

I see what you mean. An option might be to modify this function:

bool GotSection::isNeeded() const {
  // We need to emit a GOT even if it's empty if there's a relocation that is
  // relative to GOT(such as GOTOFFREL).
  return numEntries || hasGotOffRel;
}

On power PC we can say that we don't need the GOT if the number of entries is one and if the .TOC. is not defined. We now always add the GOT header but if we only have a GOT header at the end then we can say that we don't need the GOT. Does does that sound ok?

I can try to rework the patch to have it do this.

MaskRay added a comment.EditedMar 26 2021, 3:44 PM

Apologies for my late response.

I'll need to go over this again to refresh my memory. As a quick recap though:

  • On PPC, if we create a .got, then the first entry must be the tocbase.
  • If .TOC. is not defined by one of the input objects, then we do not increment the number of .got entries here
  • If we synthetically define .TOC. in the linker, then we end up with a .got section in some cases where we would otherwise not need one.

Yes, you got it!

Thanks:) I need to refresh my memory as well.

Is it possible to change the .got section to always add gotHeaderEntriesNum in its constructor for PPC64, and remove the section late if all it contains is the single entry and .TOC. is not defined?

I see what you mean. An option might be to modify this function:

bool GotSection::isNeeded() const {
  // We need to emit a GOT even if it's empty if there's a relocation that is
  // relative to GOT(such as GOTOFFREL).
  return numEntries || hasGotOffRel;
}

On power PC we can say that we don't need the GOT if the number of entries is one and if the .TOC. is not defined. We now always add the GOT header but if we only have a GOT header at the end then we can say that we don't need the GOT. Does does that sound ok?

I can try to rework the patch to have it do this.

Sounds good.

Eventually I think we may need to do something like if we want to have better compatibility with GNU ld. A rudimentary patch like the following needs test updates for ppc32/ppc64/riscv.
(I haven't thought enough about the interaction with the GOT symbol yet.)

I can check whether your ppc64 specific patch will make it closer to the below ... and if so, make our code looks like that once your patch is committed:)

diff --git i/lld/ELF/SyntheticSections.cpp w/lld/ELF/SyntheticSections.cpp
index 9a875bd7ec3e..1f9a741b206a 100644
--- i/lld/ELF/SyntheticSections.cpp
+++ w/lld/ELF/SyntheticSections.cpp
@@ -648,7 +648,3 @@ GotSection::GotSection()
                        ".got") {
-  // If ElfSym::globalOffsetTable is relative to .got and is referenced,
-  // increase numEntries by the number of entries used to emit
-  // ElfSym::globalOffsetTable.
-  if (ElfSym::globalOffsetTable && !target->gotBaseSymInGotPlt)
-    numEntries += target->gotHeaderEntriesNum;
+  numEntries += target->gotHeaderEntriesNum;
 }
@@ -694,3 +690,3 @@ bool GotSection::isNeeded() const {
   // relative to GOT(such as GOTOFFREL).
-  return numEntries || hasGotOffRel;
+  return numEntries > target->gotHeaderEntriesNum || ElfSym::globalOffsetTable || hasGotOffRel;
 }
stefanp updated this revision to Diff 334141.Mar 30 2021, 6:42 AM

Updated to always add the header to the got on PPC64.

If the .got only contains the header it is not needed and so old linking
behaviour will remain the same.

I was not able to add the .TOC. symbol to the patch. Adding the symbol in the
constructor did no allow me remove it later and adding it to either
finalizeContents or isNeeded is too late and it won't appear in the symbol table
in the object that is linked.

However, I do not believe that the symbol is necessary to fix the bug. Even
without the symbol the .got has space for the .TOC. at the top.

stefanp updated this revision to Diff 334144.Mar 30 2021, 6:50 AM

Removed extra whitespace.

MaskRay accepted this revision.Mar 30 2021, 1:02 PM
MaskRay added inline comments.
lld/test/ELF/ppc64-check-missing-tocbase.s
37

I know there is prior art but why can't this be written as t`.reloc ., R_PPC64_PCREL_OPT, 0` before pld?

This revision is now accepted and ready to land.Mar 30 2021, 1:02 PM
stefanp added inline comments.Mar 31 2021, 11:23 AM
lld/test/ELF/ppc64-check-missing-tocbase.s
37

The ABI does require that the .reloc come after the glob_int@got@pcrel relocation because the R_PPC64_PCREL_OPT relocation is an optimization opportunity related to that previous relocation. Therefore, I don't want to swap the order of the two.
It would be in section 3.5.4. Relocation Descriptions. It is a new relocation so I don't know if you have the text. So, here it is:

This relocation specifies that the instruction at r_offset and the instruction at r_offset + r_addend
may be optimized by the linker; the compiler must guarantee that register lifetimes are such that
the optimization is safe. In code sequences where this relocation is valid, the first instruction also
has another relocation at r_offset. The R_PPC64_PCREL_OPT entry occurs immediately after
that relocation in the table of relocations. See Section 3.6.3.3, “Displacement Optimization for PC-Relative Accesses” [95] for more details.