[ELF][MIPS] Multi-GOT implementation
AcceptedPublic

Authored by atanasyan on Mar 31 2017, 3:36 AM.

Details

Summary

Almost all entries inside MIPS GOT are referenced by signed 16-bit index. Zero entry lies approximately in the middle of the GOT. So the total number of GOT entries cannot exceed ~16384 for 32-bit architecture and ~8192 for 64-bit architecture. This limitation makes impossible to link rather large application like for example LLVM+Clang. There are two workaround for this problem. The first one is using the -mxgot compiler's flag. It enables using a 32-bit index to access GOT entries. But each access requires two assembly instructions two load GOT entry index to a register. Another workaround is multi-GOT. This patch implements it.

Here is a brief description of multi-GOT for detailed one see the following link https://dmz-portal.mips.com/wiki/MIPS_Multi_GOT.

If the sum of local, global and tls entries is less than 64K only single got is enough. Otherwise, multi-got is created. Series of primary and multiple secondary GOTs have the following layout:

- Primary GOT
    Header
    Local entries
    Global entries
    Relocation only entries
    TLS entries

- Secondary GOT
    Local entries
    Global entries
    TLS entries
...

All GOT entries required by relocations from a single input file entirely belong to either primary or one of secondary GOTs. To reference GOT entries each GOT has its own _gp value points to the "middle" of the GOT. In the code this value loaded to the register which is used for GOT access.

MIPS 32 function's prologue:

lui     v0,0x0
0: R_MIPS_HI16  _gp_disp
addiu   v0,v0,0
4: R_MIPS_LO16  _gp_disp

MIPS 64 function's prologue:

lui     at,0x0
14: R_MIPS_GPREL16  main

Dynamic linker does not know anything about secondary GOTs and cannot use a regular MIPS mechanism for GOT entries initialization. So we have to use an approach accepted by other architectures and create dynamic relocations R_MIPS_REL32 to initialize global entries (and local in case of PIC code) in secondary GOTs. But ironically MIPS dynamic linker requires GOT entries and correspondingly ordered dynamic symbol table entries to deal with dynamic relocations. To handle this problem relocation-only section in the primary GOT contains entries for all symbols referenced in global parts of secondary GOTs. Although the sum of local and normal global entries of the primary got should be less than 64K, the size of the primary got (including relocation-only entries can be greater than 64K, because parts of the primary got that overflow the 64K limit are used only by the dynamic linker at dynamic link-time and not by 16-bit gp-relative addressing at run-time.

The patch affects common LLD code in the following places:

  • Added new hidden -mips-got-size flag. This flag required to set low maximum size of a single GOT to be able to test the implementation using small test cases.
  • Added InputFile argument to the getRelocTargetVA function. The same symbol referenced by GOT relocation from different input file might be allocated in different GOT. So result of relocation depends on the file.
  • Added new ctor to the DynamicReloc class. This constructor records settings of dynamic relocation which used to adjust address of 64kb page lies inside a specific output section.

With the patch LLD is able to link all LLVM+Clang+LLD applications and libraries for MIPS 32/64 targets.

Diff Detail

Repository
rLLD LLVM Linker
There are a very large number of changes, so older changes are hidden. Show Older Changes
ruiu added a comment.Mar 31 2017, 6:20 PM

This is not your fault, but I have to say that this MIPS GOT layout is very odd, too different from other architectures, and too complicated. I want to avoid supporting this unless I'm convinced that it is absolutely necessary. It seems to me that MIPS needs a clean, common new ABI. Only the MIPS ABI imposes a lot of restrictions on the size of GOT sections and the order of GOT section members, even though MIPS as a processor is an ordinary RISC ISA. This change would really hurt maintainability of LLD which I already found some MIPS-specific behavior is hard to keep it right when editing code for all the other architectures.

I wonder what is the performance penalty you would have to pay when you use the -mxgot option. With the option, you'll need three instructions as opposed to a single instruction to access an GOT entry. Does that actually make observable difference in performance?

ELF/DriverUtils.cpp
31

Do you need this?

emaste added a comment.Apr 3 2017, 8:13 AM

One other note, I tried linking the FreeBSD base system on mips64 with this patch applied (since we require mxgot or multigot to be able to link the entirety of the base system), and have a reproducible segfault in lld. A reproducer is at https://people.freebsd.org/~emaste/lld/lld-mips.tar. It seems the crash is reproducible after reverting the patch, so I'll submit a bug for it shortly.

I also tried a debug build with and without this patch (twice each), and reproduced the ELF/mips-got-and-copy.s issue with the patch and not without. Since it didn't happen for you I'll try to collect some more detail about the failure.

I also tried a debug build with and without this patch (twice each), and reproduced the ELF/mips-got-and-copy.s issue with the patch and not without. Since it didn't happen for you I'll try to collect some more detail about the failure.

Thanks a lot for your help. BTW what tools and libraries do you use to build LLD: Clang (what version?) or gcc, libc++ or libstdc++?

emaste added a comment.Apr 3 2017, 9:13 AM

Thanks a lot for your help. BTW what tools and libraries do you use to build LLD: Clang (what version?) or gcc, libc++ or libstdc++?

This is the integrated tool-chain in FreeBSD-current, with one non-default configuration: using LLD as /usr/bin/ld. So it's Clang, LLD, and libc++ 4.0.0, ELF Tool Chain objcopy / strip, and FreeBSD's libc & libthr (== libpthread).

I compared the tmp objects in the failing and passing case using diffoscope and noticed one unexpected difference, although it does not appear to be root cause of the failure.

With stat() differences elided I see:

--- fail
+++ work
├── file list
│ @@ -1,3 +1,4 @@
│ +mips-got-and-copy.s.tmp.exe
│  mips-got-and-copy.s.tmp.o
│  mips-got-and-copy.s.tmp.so
│  mips-got-and-copy.s.tmp.so.o
├── mips-got-and-copy.s.tmp.so
├── readelf --wide --symbols {}
│ │ @@ -1,16 +1,16 @@
│ │  
│ │  Symbol table '.dynsym' contains 6 entries:
│ │     Num:    Value  Size Type    Bind   Vis      Ndx Name
│ │       0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
│ │ -     1: 00010008     0 FUNC    GLOBAL DEFAULT    7 foo1
│ │ -     2: 00010004     0 FUNC    GLOBAL DEFAULT    7 foo0
│ │ +     1: 00010000     0 NOTYPE  GLOBAL DEFAULT    7 _foo
│ │ +     2: 00020000     4 OBJECT  GLOBAL DEFAULT    8 data0
│ │       3: 00020004     4 OBJECT  GLOBAL DEFAULT    8 data1
│ │ -     4: 00020000     4 OBJECT  GLOBAL DEFAULT    8 data0
│ │ -     5: 00010000     0 NOTYPE  GLOBAL DEFAULT    7 _foo
│ │ +     4: 00010004     0 FUNC    GLOBAL DEFAULT    7 foo0
│ │ +     5: 00010008     0 FUNC    GLOBAL DEFAULT    7 foo1
│ │  
│ │  Symbol table '.symtab' contains 8 entries:
│ │     Num:    Value  Size Type    Bind   Vis      Ndx Name
│ │       0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
│ │       1: 00028000     0 NOTYPE  LOCAL  HIDDEN   ABS _gp
│ │       2: 000001c0     0 NOTYPE  LOCAL  HIDDEN     5 _DYNAMIC
│ │       3: 00010000     0 NOTYPE  GLOBAL DEFAULT    7 _foo
├── readelf --wide --decompress --hex-dump=.hash {}
│ │ @@ -1,7 +1,7 @@
│ │  
│ │  Hex dump of section '.hash':
│ │ -  0x00000188 00000006 00000006 00000002 00000001 ................
│ │ -  0x00000198 00000004 00000003 00000000 00000005 ................
│ │ +  0x00000188 00000006 00000006 00000004 00000005 ................
│ │ +  0x00000198 00000002 00000003 00000000 00000001 ................
│ │    0x000001a8 00000000 00000000 00000000 00000000 ................
│ │    0x000001b8 00000000 00000000                   ........

so for some reason the sort order for .dynsym is reversed between the two cases.

atanasyan updated this revision to Diff 94058.Apr 4 2017, 5:45 AM
  • Removed unused MIPS specific fields from the SymbolBody class.
  • Update comments.
atanasyan updated this revision to Diff 94587.Apr 7 2017, 11:11 PM
  • Rebase the patch
  • Fix GOT entry allocation for symbols which have related dynamic copy relocations
grimar added a subscriber: grimar.Apr 10 2017, 2:31 AM
In D31528#715906, @ruiu wrote:

This is not your fault, but I have to say that this MIPS GOT layout is very odd, too different from other architectures, and too complicated. I want to avoid supporting this unless I'm convinced that it is absolutely necessary. It seems to me that MIPS needs a clean, common new ABI. Only the MIPS ABI imposes a lot of restrictions on the size of GOT sections and the order of GOT section members, even though MIPS as a processor is an ordinary RISC ISA. This change would really hurt maintainability of LLD which I already found some MIPS-specific behavior is hard to keep it right when editing code for all the other architectures.

I wonder what is the performance penalty you would have to pay when you use the -mxgot option. With the option, you'll need three instructions as opposed to a single instruction to access an GOT entry. Does that actually make observable difference in performance?

-mxgot does not seem to be enough to run many binaries on FreeBSD mips64. I think this is because local static variables are still accessed using a single 16 bit offset and for larger programs this is not sufficient. I am currently trying to build postgres for MIPS FreeBSD and without the multigot patch I can't link it although all source files are being compiled with -mxgot.

atanasyan updated this revision to Diff 96282.Apr 22 2017, 5:44 AM
atanasyan added a reviewer: rafael.
  • Rebased against the trunk
  • Added a comment
atanasyan updated this revision to Diff 99012.May 15 2017, 8:58 AM
  • Rebased against the trunk

Ping?

Rebased against the trunk.

atanasyan updated this revision to Diff 102523.Jun 14 2017, 4:24 AM
  • Rebased against the trunk
atanasyan updated this revision to Diff 105586.Jul 6 2017, 10:11 PM

Rebased against the trunk.

Is there any chance of getting review for this patch on the next week or so?

atanasyan updated this revision to Diff 109116.Aug 1 2017, 7:42 AM

Rebased against the trunk.

Any chance to get review before LLD 6 or 7 release?

Rebased against the trunk.

We have been using this patch successfully for building CheriBSD (FreeBSD running on the CHERI CPU) for the last few months. Without it we can't build the base system even if we use -mxgot for every object.
The same problem happens when building upstream FreeBSD for MIPS64 so it would be great if this patch could be merged before the next LLD release.

brad added a subscriber: brad.Sep 16 2017, 5:21 PM

Rebased against the trunk.

atanasyan updated this revision to Diff 118583.Oct 11 2017, 3:41 AM
  • Rebased against the trunk.
  • Celebrated the first half a year of the patch reviewing.
atanasyan updated this revision to Diff 119139.Oct 16 2017, 5:19 AM

Rebased against the trunk.

ruiu added a comment.Oct 16 2017, 10:07 AM

I'll make more changes to Relocations.cpp and other files, so you might want to rebase all at once.

brad added a comment.Dec 10 2017, 3:05 PM

Rebase and get this in for 6?

atanasyan updated this revision to Diff 126363.Dec 11 2017, 7:41 AM

Rebased against the trunk.

I would be really great if this patch could be added to LLD soon. I believe this is the last change needed before we can use LLD to build the whole FreeBSD MIPS base system.

Could you rebase this on the latest master please? I tried updating our LLD fork (which includes this patch) to Jan 1st and am now getting three test failures:

Failing Tests (3):
    lld :: ELF/mips-got-redundant.s
    lld :: ELF/mips-got16.s
    lld :: ELF/mips-mgot.s

I have not debugged yet whether if this is because I didn't resolve the merge conflicts correctly or whether LLD has changed in a way that breaks the tests.

As this patch is essential for being able to link any of our code I did some bisection and it turns out that the tests start failing after rL320472 (D38361).

It is not obvious to me why that commit would break the multigot implementation but here are my findings so far:

Before that commit the mips .got section is 112 bytes (good) for the mips-mgot.s test and afterwards it's only 108 bytes (bad). It also makes rela.dyn go from 184 (good) bytes to 96 bytes (bad).

Passing test case before rL320472:

Contents of section .got:
 60000 00000000 80000000 00010000 00010030  ...............0
 60010 00020000 00030000 00040000 00050000  ................
 60020 00060000 00070000 00000000 00000000  ................
 60030 00000004 00000000 00000000 00000000  ................
 60040 00000000 00020000 00030000 00040000  ................
 60050 00050000 00060000 00070000 00000000  ................
 60060 00000000 00000000 00000000 00000000  ................

Broken GOT afterwards:

Contents of section .got:
 60000 00000000 80000000 00020000 00030000  ................
 60010 00010000 00010030 00000000 00000004  .......0........
 60020 00000000 00000000 00000000 00000000  ................
 60030 00020000 00030000 00040000 00050000  ................
 60040 00000000 00070000 00000000 00000000  ................
 60050 00000000 00000000 00000000 00000000  ................
 60060 00000000 00000000 00000000           ............

As this patch is essential for being able to link any of our code I did some bisection and it turns out that the tests start failing after rL320472 (D38361).

I found the reason of the regression. Working on the fix.

atanasyan updated this revision to Diff 130896.Jan 22 2018, 7:47 AM

Rebased against the trunk and fixed calculation of a GOT entries number.

Few quick minor comments/suggestions.

ELF/DriverUtils.cpp
31

Seems unused.

ELF/InputFiles.h
113

May be type could be Optional<> ?

ELF/SyntheticSections.cpp
729

Does not seem assert is needed here. It anyways will either assert or crash on error,
not a huge difference probably. I would simplify:

if (Sym->isTls())
    return G.Tls.find(Sym)->second * Config->Wordsize;
if (Sym->IsPreemptible)
    return G.Global.find(Sym)->second * Config->Wordsize;
...

The same in getGlobalDynOffset and other places.

ELF/SyntheticSections.h
342

Looks it can be MapVector<InputFile *, FileGot>.
Then probably you will not need to add InputFile::MipsGotIndex member.

408

Excessive space: uint64_t OffsetInSec

I'm going to submit updated patch soon.

ELF/DriverUtils.cpp
31

Without this line I get the following error message because Options.td includes the HelpHidden flag:

In file included from /home/simon/work/lld/git/tools/lld/ELF/DriverUtils.cpp:48:
tools/lld/ELF/Options.inc:201:87: error: use of undeclared identifier 'HelpHidden'; did you mean 'llvm::opt::HelpHidden'?
ELF/InputFiles.h
113

OK

ELF/SyntheticSections.cpp
729

OK. Good point.

ELF/SyntheticSections.h
342

While we collect requests for GOT entries, each input file containing GOT relocations has its own entry in the Gots container. But in the MipsGotSection::build method number of separated GOTs is reduced my merging and multiple files might start to refer a single GOT. So we need something like MapVector<InputFile *, FileGot *>. That makes more complicated at least GOT writing because we will not be able to iterate the Gots container and write each entry but have to extract unique GOT.

408

OK

atanasyan updated this revision to Diff 131063.Jan 23 2018, 7:09 AM
  • Addressing review comments
  • Fixing some typos

We have a chance to celebrate the first anniversary of this patch in a couple of months. Is it possible to escape this dubious holiday?

We have a chance to celebrate the first anniversary of this patch in a couple of months. Is it possible to escape this dubious holiday?

I certainly hope so! In FreeBSD we've been using lld as the linker for arm64 for a little while, and the migration is in various states of progress for amd64 (should be complete soon), armv7, and i386. MIPS is probably the next architecture for us to start working on switching if this patch lands.

Rebased against the trunk.

Thank you very much for updating this patch! I have just made the switch in CheriBSD to completely stop using ld.bfd and so far everything I have tested is working.

ELF/SyntheticSections.cpp
912

This could probably use the 4 argument overload since there is no addend.

921

Same for the following 3 calls.

ELF/SyntheticSections.h
407

RelType Type

atanasyan updated this revision to Diff 134620.Feb 16 2018, 7:52 AM
  • Addressing review comments

Ping? In less than four weeks we will celebrate the first anniversary of this patch.

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 9:51 AM

Ping? Any chance to push this patch this year?

Rebased against the trunk.

Let's celebrate the first anniversary of the patch.

emaste added a comment.Apr 9 2018, 1:21 PM

Happy belated birthday Multi-GOT patch

atanasyan updated this revision to Diff 142354.Apr 13 2018, 1:50 AM

Rebased against the trunk.

Thank you very much for updating this patch! I will be merging from upstream LLVM soon and having this rebased makes the merging so much easier.

(Also happy belated birthday D31528. Hopefully it will not have a second birthday...)

Rebased against the trunk.

atanasyan updated this revision to Diff 145685.Tue, May 8, 7:06 AM

Rebased against the trunk.

arichardson accepted this revision.Wed, May 23, 1:22 AM

As I have mentioned before, having this patch is the difference between LLD being completely unusable for our purposes (linking FreeBSD MIPS n64) and being able to replace BFD.

We have been carrying this in our fork for about a year now and I have not noticed any issues. Thank you very much for updating the patch so frequently since otherwise merging from upstream would be much harder.

@ruiu any chace you could review this? The patch also has the advantage that it removes two MIPS-specific members from Symbol and moves all the MIPS GOT handling to the MipsGotSection class instead.

This revision is now accepted and ready to land.Wed, May 23, 1:22 AM