This is an archive of the discontinued LLVM Phabricator instance.

Paths

- lld/trunk/ELF/OutputSections.h
- lld/trunk/ELF/OutputSections.cpp
- lld/trunk/test/ELF/mips-got-redundant.s

Differential D18349

[ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT
ClosedPublic

Authored by atanasyan on Mar 22 2016, 5:59 AM.


Details

Reviewers

ruiu
rafael

Commits

rGd2980d3e206e: [ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT
rLLD264730: [ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT
rL264730: [ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT

Summary

A local symbol that requires a GOT entry is initialized with a "page" address: the high 16 bits of the sum of the symbol value and the relocation addend (computed as (value + 0x8000) & ~0xffff). In the relocation scanning phase the final values of symbols are unknown, so to reduce the number of allocated GOT entries we use the following trick. Save all output sections referenced by GOT relocations during the relocation scanning phase. Then, later in the GotSection::finalize method, calculate the number of "pages" required to cover all saved output sections and allocate the appropriate number of GOT entries. We assume the worst case: each 64kb page of the output section has at least one GOT relocation against it.
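The counting step done in `GotSection::finalize` can be sketched as a standalone function. This is a minimal illustration, not the lld code: `OutSecInfo` and `mipsLocalPageEntriesUpperBound` are hypothetical names, and only the per-section arithmetic `(Size + 0x8000 + 0xfffe) / 0xffff` is taken from the diff below.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output section referenced by a GOT
// relocation against a local symbol; only its size matters for the bound.
struct OutSecInfo {
  uint64_t Size;
};

// Upper bound on the number of local "page" GOT entries. Each saved output
// section contributes the same amount the patch computes in finalize():
// (Size + 0x8000 + 0xfffe) / 0xffff, i.e. the worst case where every
// 64kb page of the section is the target of some GOT relocation.
uint64_t mipsLocalPageEntriesUpperBound(const std::vector<OutSecInfo> &Secs) {
  uint64_t Entries = 0;
  for (const OutSecInfo &S : Secs)
    Entries += (S.Size + 0x8000 + 0xfffe) / 0xffff;
  return Entries;
}
```

With this arithmetic an 8k section costs one entry, while a 1 MiB section reserves 17 entries even if every relocation against it targets the same page.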

Diff Detail

Repository: rL LLVM

Event Timeline

atanasyan updated this revision to Diff 51277. Mar 22 2016, 5:59 AM

atanasyan retitled this revision to [ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT.

atanasyan updated this object.

atanasyan added reviewers: ruiu, rafael.

atanasyan set the repository for this revision to rL LLVM.

atanasyan added a project: lld.

atanasyan added a subscriber: llvm-commits.

That is still more than needed, no?

A single symbol pointing to an 8k page would create 2 GOT entries but
only needs one. Is that correct?

Cheers,
Rafael

In D18349#380312, @rafael wrote:

> That is still more than needed, no?

Yes. If symbols point only to the first 64kb page of a large output section, we still allocate enough "page" entries to cover the whole section.

> A single symbol pointing to an 8k page would create 2 GOT entries but
> only needs one. Is that correct?

I do not consider local symbols individually at all. If a single symbol points to an 8k page, we get one redundant entry. That is not good, but acceptable. If more than one symbol points to adjacent addresses, the exact number of required GOT entries depends on the final values of these addresses. In the best case they all belong to the same final 64kb page and have the same high 16 bits. In the worst case we need two page addresses. To get an optimal result we would have to calculate the number of required GOT entries after all output sections are finalized and aligned.
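To make the best and worst cases concrete: once layout is final, the exact number of local page entries is the number of distinct page addresses among the GOT targets (symbol value plus addend). The sketch below is illustrative only; `mipsPageAddr` and `exactPageEntries` are hypothetical helpers, and the page formula is the `(value + 0x8000) & ~0xffff` quoted in the patch's `finalize` comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Page address stored in a local MIPS GOT entry for a target address
// (symbol value + addend), per the formula in the patch's comment.
uint64_t mipsPageAddr(uint64_t Va) {
  return (Va + 0x8000) & ~uint64_t(0xffff);
}

// Exact number of page entries once every target address is known:
// targets that map to the same page address can share one GOT entry.
std::size_t exactPageEntries(const std::vector<uint64_t> &TargetVas) {
  std::set<uint64_t> Pages;
  for (uint64_t Va : TargetVas)
    Pages.insert(mipsPageAddr(Va));
  return Pages.size();
}
```

Two targets 8 bytes apart need one entry at `{0x10000, 0x10008}` but two at `{0x17ffc, 0x18004}`, because the second pair straddles a page boundary after the 0x8000 bias is added.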

BTW, what result do you get with gold/bfd?

Cheers,
Rafael

In D18349#380355, @rafael wrote:

> I wonder if, now that SymbolBodies are allocated for local symbols, we
> can compute the exact answer.
>
> Would this work?
>
> - Represent the .got section with just a vector of symbol bodies.
> - Make sure .got is placed after all sections that a GOT entry can point to.
> - Lay out all sections that go before .got.
> - We can now compute the values of the symbols.

There are two problems with this approach (though it would give the ideal result):

- We need to create a local symbol body for each unique combination of a local symbol referenced by a GOT relocation and an addend. The number of such combinations might be very large. We would also need to read the relocation addend in the relocation scanning phase.
- Some end users, such as embedded developers, may need to place sections in a specific order.

> BTW, what result do you get with gold/bfd?

Almost all test cases from the LLVM test suite show similar results. For example (size of the .got section):
7zip-benchmark: BFD: 0x17a4 LLD: 0x177c
tramp3d-v4: BFD: 0x11a4 LLD: 0x11b8

There are a couple of exceptions: mafft/pairlocalalign and NPB-serial/is/is. These executables have a large .bss section, and the patch produces a GOT that is bigger than necessary. But in any case it is better than the current trunk implementation.

Rebased the patch.

Ping?

> There are two problems with this approach (though it would give the ideal result):
>
> We need to create a local symbol body for each unique combination of a local symbol referenced by a GOT relocation and an addend. The number of such combinations might be very large. We would also need to read the relocation addend in the relocation scanning phase.

Well, it is one (SymbolBody, Addend) pair per relocation.

> Some end users, such as embedded developers, may need to place sections in a specific order.
>
>> BTW, what result do you get with gold/bfd?
>
> Almost all test cases from the LLVM test suite show similar results. For example (size of the .got section):
> 7zip-benchmark: BFD: 0x17a4 LLD: 0x177c
> tramp3d-v4: BFD: 0x11a4 LLD: 0x11b8
>
> There are a couple of exceptions: mafft/pairlocalalign and NPB-serial/is/is. These executables have a large .bss section, and the patch produces a GOT that is bigger than necessary. But in any case it is better than the current trunk implementation.

I guess the question is more: do you know in which cases they produce
a better result?

Sending a few more comments via Phab.

Cheers,
Rafael

OK, I think this is probably fine.

If we want to compute a better result in the future and still support a GOT that points forward in the file, I think we can:

- Delay the scan just a bit so we know the position of each symbol in its output section.
- Compute the offset in the output section that each GOT entry will have.
- Use that to compute an upper bound on the number of entries.
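The last step of this suggestion can be sketched under one assumption: the scan knows each GOT target's offset within its output section, but not the section's final address. This is a hypothetical helper, not lld code. If the distinct offsets span `Span` bytes, then wherever the section lands they fall into at most `ceil(Span / 0x10000) + 1` distinct 64kb pages (the 0x8000 bias only shifts the unknown base), and never into more pages than there are distinct targets; the smaller of the two is a valid upper bound.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical: upper bound on local page entries for one output section
// when only the offsets of GOT targets inside that section are known.
uint64_t pageEntriesBoundFromOffsets(const std::vector<uint64_t> &Offsets) {
  if (Offsets.empty())
    return 0;
  std::set<uint64_t> Distinct(Offsets.begin(), Offsets.end());
  uint64_t Span = *Distinct.rbegin() - *Distinct.begin();
  // A range of Span bytes at an arbitrary base touches at most
  // ceil(Span / 0x10000) + 1 pages of 64kb.
  uint64_t SpanBound = (Span + 0xffff) / 0x10000 + 1;
  return std::min<uint64_t>(Distinct.size(), SpanBound);
}
```

Unlike the per-section bound in the patch, this stays small when all relocations hit a narrow range of a large section, such as the large .bss case mentioned above.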

Can you just upload a new version with the inline comments fixed?

ELF/OutputSections.cpp
Line 209 (on Diff #51392): The description is not exactly true. You are computing an upper bound, not the exact number. Given that, you are not assuming that the "relocations are spread" uniformly. It is just that the worst case is for every page in the section to have a relocation pointing at it.
ELF/Writer.cpp
Line 1096 (on Diff #51392): Do you have a testcase for this? If not, could you leave it for a follow-up patch?

In D18349#384685, @rafael wrote:

>> Almost all test cases from the LLVM test suite show similar results. For example (size of the .got section):
>> 7zip-benchmark: BFD: 0x17a4 LLD: 0x177c
>> tramp3d-v4: BFD: 0x11a4 LLD: 0x11b8
>>
>> There are a couple of exceptions: mafft/pairlocalalign and NPB-serial/is/is. These executables have a large .bss section, and the patch produces a GOT that is bigger than necessary. But in any case it is better than the current trunk implementation.
>
> I guess the question is more: do you know in which cases they produce
> a better result?

Suppose that you have a large output section, and all relocations against this section target only a small range of addresses in it. In that case bfd and gold generate only a few GOT entries. This patch, in contrast, generates a GOT entry for each 64kb of the output section.

Fixed the comment that describes how we count the number of required GOT entries.
Removed the changes from Writer.cpp. I reviewed this code again and realized that the changes can be rolled back. The only section (except .dynamic, .got.plt, etc.) that can modify its sh_size in the finalize method is MergeOutputSection. But such sections always go before the .got section in the OutputSections container, because we support only read-only merge sections and read-only sections go before writable ones. So it is not necessary to postpone finalizing the .got section. It would be nice to have an assert to check that, but I cannot find a way to do it without significant modification of the code.
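The ordering invariant described above (every section whose size is only settled in its own finalize call comes before .got) can at least be stated as a check over an ordered section list. This is a hypothetical sketch with a made-up `SecDesc` record, not a proposal for the actual lld data structures; `SizeSetInFinalize` stands for sections such as MergeOutputSection, with .dynamic, .got.plt and similar left out.

```cpp
#include <string>
#include <vector>

// Hypothetical description of an output section, listed in output order.
struct SecDesc {
  std::string Name;
  bool SizeSetInFinalize; // e.g. a merge section
};

// Returns true if no section whose size is only known after its own
// finalize() call appears after .got, so the GOT sees final sizes when
// it computes its page-entry bound.
bool sizesFinalBeforeGot(const std::vector<SecDesc> &Ordered) {
  bool SeenGot = false;
  for (const SecDesc &S : Ordered) {
    if (S.Name == ".got") {
      SeenGot = true;
      continue;
    }
    if (SeenGot && S.SizeSetInFinalize)
      return false;
  }
  return true;
}
```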

LGTM. Thanks.

This revision is now accepted and ready to land. Mar 29 2016, 6:26 AM

Thanks for the review.

Closed by commit rL264730: [ELF][MIPS] Reduce number of redundant entries in the local part of MIPS GOT (authored by atanasyan). Mar 29 2016, 7:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                       Size
lld/trunk/ELF/OutputSections.h             3 lines
lld/trunk/ELF/OutputSections.cpp           44 lines
lld/trunk/test/ELF/mips-got-redundant.s    15 lines

Diff 51901

lld/trunk/ELF/OutputSections.h

 //===- OutputSections.h ------------------------------------------ C++ --===//
 //
 // The LLVM Linker
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLD_ELF_OUTPUT_SECTIONS_H
 #define LLD_ELF_OUTPUT_SECTIONS_H

 #include "Config.h"

 #include "lld/Core/LLVM.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/MC/StringTableBuilder.h"
 #include "llvm/Object/ELF.h"

 namespace lld {
 namespace elf {

 class SymbolBody;
 template <class ELFT> class SymbolTable;
@@ 95 unchanged lines not shown @@ public:
   unsigned getMipsLocalEntriesNum() const;

   uintX_t getTlsIndexVA() { return Base::getVA() + TlsIndexOff; }

 private:
   std::vector<const SymbolBody *> Entries;
   uint32_t TlsIndexOff = -1;
   uint32_t MipsLocalEntries = 0;
+  // Output sections referenced by MIPS GOT relocations.
+  llvm::SmallPtrSet<const OutputSectionBase<ELFT> *, 10> MipsOutSections;
   llvm::DenseMap<uintX_t, size_t> MipsLocalGotPos;

   uintX_t getMipsLocalEntryAddr(uintX_t EntryValue);
 };

 template <class ELFT>
 class GotPltSection final : public OutputSectionBase<ELFT> {
   typedef typename ELFT::uint uintX_t;
@@ 439 unchanged lines not shown @@

lld/trunk/ELF/OutputSections.cpp

@@ 109 unchanged lines not shown @@ if (Config->EMachine == EM_MIPS) {
     //
     // If a symbol is preemptible we need help of dynamic linker to get its
     // final address. The corresponding GOT entries are allocated in the
     // "global" part of GOT. Entries for non preemptible global symbol allocated
     // in the "local" part of GOT.
     //
     // See "Global Offset Table" in Chapter 5:
     // ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf
-    //
-    // FIXME (simon): Now LLD allocates GOT entries for each
-    // "local symbol+addend" pair. That should be fixed to reduce size
-    // of generated GOT.
-    if (Sym.isPreemptible())
-      Sym.MustBeInDynSym = true;
-    else {
+    if (Sym.isLocal()) {
+      // At this point we do not know final symbol value so to reduce number
+      // of allocated GOT entries do the following trick. Save all output
+      // sections referenced by GOT relocations. Then later in the `finalize`
+      // method calculate number of "pages" required to cover all saved output
+      // section and allocate appropriate number of GOT entries.
+      auto *OutSec = cast<DefinedRegular<ELFT>>(&Sym)->Section->OutSec;
+      MipsOutSections.insert(OutSec);
+      return;
+    }
+    if (!Sym.isPreemptible()) {
+      // In case of non-local symbols require an entry in the local part
+      // of MIPS GOT, we set GotIndex to 1 just to accent that this symbol
+      // has the GOT entry and escape creation more redundant GOT entries.
+      // FIXME (simon): We can try to store such symbols in the `Entries`
+      // container. But in that case we have to sort out that container
+      // and update GotIndex assigned to symbols.
+      Sym.GotIndex = 1;
       ++MipsLocalEntries;
       return;
     }
+    // All preemptible symbols with MIPS GOT entries should be represented
+    // in the dynamic symbols table.
+    Sym.MustBeInDynSym = true;
   }
   Sym.GotIndex = Entries.size();
   Entries.push_back(&Sym);
 }

 template <class ELFT> bool GotSection<ELFT>::addDynTlsEntry(SymbolBody &Sym) {
   if (Sym.hasGlobalDynIndex())
     return false;
@@ 50 unchanged lines not shown @@
 }

 template <class ELFT>
 unsigned GotSection<ELFT>::getMipsLocalEntriesNum() const {
   return Target->GotHeaderEntriesNum + MipsLocalEntries;
 }

 template <class ELFT> void GotSection<ELFT>::finalize() {
+  for (const OutputSectionBase<ELFT> *OutSec : MipsOutSections) {
+    // Calculate an upper bound of MIPS GOT entries required to store page
+    // addresses of local symbols. We assume the worst case - each 64kb
+    // page of the output section has at least one GOT relocation against it.
+    // Add 0x8000 to the section's size because the page address stored
+    // in the GOT entry is calculated as (value + 0x8000) & ~0xffff.
+    MipsLocalEntries += (OutSec->getSize() + 0x8000 + 0xfffe) / 0xffff;
+  }
   this->Header.sh_size =
       (Target->GotHeaderEntriesNum + MipsLocalEntries + Entries.size()) *
       sizeof(uintX_t);
 }

 template <class ELFT> void GotSection<ELFT>::writeTo(uint8_t *Buf) {
   Target->writeGotHeader(Buf);
   for (std::pair<uintX_t, size_t> &L : MipsLocalGotPos) {
@@ 1,141 unchanged lines not shown @@

 // Orders symbols according to their positions in the GOT,
 // in compliance with MIPS ABI rules.
 // See "Global Offset Table" in Chapter 5 in the following document
 // for detailed description:
 // ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf
 static bool sortMipsSymbols(const std::pair<SymbolBody *, unsigned> &L,
                             const std::pair<SymbolBody *, unsigned> &R) {
-  if (!L.first->isInGot() || !R.first->isInGot())
-    return R.first->isInGot();
+  // Sort entries related to non-local preemptible symbols by GOT indexes.
+  // All other entries go to the first part of GOT in arbitrary order.
+  bool LIsInLocalGot = !L.first->isInGot() || !L.first->isPreemptible();
+  bool RIsInLocalGot = !R.first->isInGot() || !R.first->isPreemptible();
+  if (LIsInLocalGot || RIsInLocalGot)
+    return !RIsInLocalGot;
   return L.first->GotIndex < R.first->GotIndex;
 }

 template <class ELFT> void SymbolTableSection<ELFT>::finalize() {
   if (this->Header.sh_size)
     return; // Already finalized.

   this->Header.sh_size = getNumSymbols() * sizeof(Elf_Sym);
@@ 304 unchanged lines not shown @@

lld/trunk/test/ELF/mips-got-redundant.s

@@ 19 unchanged lines not shown @@
 # ^-- loc2, loc3, loc4
 # CHECK-NEXT: }
 # CHECK-NEXT: Entry {
 # CHECK-NEXT: Address: 0x20010
 # CHECK-NEXT: Access: -32736
 # CHECK-NEXT: Initial: 0x40008
 # ^-- glb1
 # CHECK-NEXT: }
+# CHECK-NEXT: Entry {
+# CHECK-NEXT: Address: 0x20014
+# CHECK-NEXT: Access: -32732
+# CHECK-NEXT: Initial: 0x0
+# CHECK-NEXT: }
+# CHECK-NEXT: Entry {
+# CHECK-NEXT: Address: 0x20018
+# CHECK-NEXT: Access: -32728
+# CHECK-NEXT: Initial: 0x0
+# CHECK-NEXT: }
+# CHECK-NEXT: Entry {
+# CHECK-NEXT: Address: 0x2001C
+# CHECK-NEXT: Access: -32724
+# CHECK-NEXT: Initial: 0x0
+# CHECK-NEXT: }
 # CHECK-NEXT: ]

   .text
   .globl foo
 foo:
   lw $t0, %got(loc1)($gp)
   addi $t0, $t0, %lo(loc1)
   lw $t0, %got(loc2)($gp)
@@ 23 unchanged lines not shown @@
