Details

Commits

rG18821b60b084: [ELF] Generate symbol assignments for predefined symbols
rLLD312305: [ELF] Generate symbol assignments for predefined symbols
rL312305: [ELF] Generate symbol assignments for predefined symbols

Summary

The problem with symbol assignments in implicit linker scripts is that
they can refer to synthetic symbols such as _end, _etext or _edata. The
values of these symbols are currently fixed only after all linker script
commands have been processed, so these assignments end up using
non-final, and hence invalid, values.

Rather than fixing the symbol values after all commands have been
processed, we change the logic to generate symbol assignment commands
that set the values of these symbols while the commands are processed.
This ensures that the values are correct by the time any reference to
these symbols is processed, and it is equivalent to defining these
symbols explicitly in the linker script, as BFD ld does.
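The ordering problem can be illustrated with a toy model. The types below (`OutputSec`, `Assignment`, `run`) are hypothetical simplifications, not lld's `BaseCommand` hierarchy. Commands are evaluated strictly in order, and an assignment only sees the symbol values that exist at that moment, with 0 standing in for a "not yet final" value. Fixing `_end` after the whole run roughly mirrors the old fix-up step, while placing a generated `_end = .` assignment right after the last section lets a later `foo = _end + 5` observe the final value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified model of a linker-script command list.
struct OutputSec {
  std::string Name;
  uint64_t Size;
};

using SymbolTable = std::map<std::string, uint64_t>;

struct Assignment {
  std::string Name;
  // The expression sees the symbol table and "." as they are when it runs.
  std::function<uint64_t(const SymbolTable &, uint64_t)> Expr;
};

using Command = std::variant<OutputSec, Assignment>;

// Processes commands in order: sections advance ".", assignments are
// evaluated at the current point.
SymbolTable run(const std::vector<Command> &Cmds, uint64_t Start) {
  SymbolTable Syms;
  uint64_t Dot = Start;
  for (const Command &C : Cmds) {
    if (const auto *Sec = std::get_if<OutputSec>(&C)) {
      Dot += Sec->Size;
    } else {
      const Assignment &A = std::get<Assignment>(C);
      uint64_t V = A.Expr(Syms, Dot);
      Syms[A.Name] = V;
    }
  }
  return Syms;
}

// Reads a symbol, yielding 0 if it has not been assigned yet.
uint64_t lookup(const SymbolTable &S, const std::string &Name) {
  auto It = S.find(Name);
  return It == S.end() ? 0 : It->second;
}

uint64_t dotExpr(const SymbolTable &, uint64_t Dot) { return Dot; }
uint64_t fooExpr(const SymbolTable &S, uint64_t) {
  return lookup(S, "_end") + 5;
}

// Old style: _end is fixed up only after every command has run, so
// "foo = _end + 5" has already observed a stale value.
uint64_t fooOldStyle() {
  std::vector<Command> Cmds = {OutputSec{".text", 0x10},
                               Assignment{"foo", fooExpr}};
  SymbolTable Syms = run(Cmds, 0x1000);
  Syms["_end"] = 0x1010; // too late for foo
  return lookup(Syms, "foo");
}

// New style: "_end = ." is generated right after the last section, so it
// is evaluated before anything that references it.
uint64_t fooNewStyle() {
  std::vector<Command> Cmds = {OutputSec{".text", 0x10},
                               Assignment{"_end", dotExpr},
                               Assignment{"foo", fooExpr}};
  return lookup(run(Cmds, 0x1000), "foo");
}
```

With this model, `fooOldStyle()` returns 5 while `fooNewStyle()` returns 0x1015.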

Diff Detail

Repository: rL LLVM

Event Timeline

phosek created this revision. Aug 21 2017, 4:01 PM

Herald added a subscriber: emaste. Aug 21 2017, 4:01 PM

phosek updated this revision to Diff 112085. Aug 21 2017, 4:02 PM

grimar added a subscriber: grimar. Aug 22 2017, 12:57 AM

ruiu added inline comments. Aug 22 2017, 10:37 AM

ELF/LinkerScript.h
78 ↗	(On Diff #112085)	I think I'd name this `InSections`. But do you actually need this? I believe this is essentially a flag indicating whether an element is a direct child of the top-level element or not. But can't we distinguish BaseCommands enclosed in an OutputSection from BaseCommands that are top-level elements?
ELF/Writer.cpp
227 ↗	(On Diff #112085)	Add a comment.

phosek added inline comments. Aug 22 2017, 2:55 PM

ELF/LinkerScript.h
78 ↗	(On Diff #112085)	That's what I tried initially. The problem is that `Opt.Commands` would also contain the synthetic symbol assignments created by `fabricateDefaultCommands`, and those have to be processed in `assignAddresses` because they modify `.`, so we would need some way to distinguish them from other top-level symbol assignments. One option would be to create a new subclass, e.g. `SyntheticSymbolAssignment` or even `DotAssignment`, which would let `assignAddresses` tell the two kinds of assignments apart. The other option would be to just add an attribute to `SymbolAssignment`. Any opinion or preference?

phosek updated this revision to Diff 112286. Aug 22 2017, 7:29 PM

phosek marked an inline comment as done.

phosek added inline comments. Aug 22 2017, 7:44 PM

ELF/LinkerScript.h
78 ↗	(On Diff #112085)	It turned out to be even more complicated than that: in `SECTIONS { .text : { *(.text) } foo = ABSOLUTE(.) + 0x100; .got : { *(.got) } }`, the `foo = ABSOLUTE(.) + 0x100` assignment still ends up in `Script->Opt.Commands`, but we have to process it early because it uses `.`. I have added the `InSections` attribute to `BaseCommand` because LLD allows `ASSERT` in input linker scripts, which means an assertion can also refer to predefined symbols. However, neither BFD ld nor gold allows that. If we followed the same model, we could move `InSections` to `SymbolAssignment`.

phosek updated this revision to Diff 112404. Aug 23 2017, 11:09 AM

ruiu added inline comments. Aug 23 2017, 10:09 PM

ELF/LinkerScript.h
78 ↗	(On Diff #112085)	I'm not very sure, but creating `DotAssignment` sounds like a good idea, as assignments to `.` are very different from symbol assignments despite their superficial resemblance. It might be worth a try.
78 ↗	(On Diff #112085)	We do not aim to extend the linker script language beyond what GNU linkers support, and we do not make any guarantees about the current undocumented behavior (lld supports only the documented behavior of GNU linkers). So I think I'd move `InSections` to `SymbolAssignment`.
78 ↗	(On Diff #112085)	The first comment is a stale comment. I don't know why it was sent out this time, but please ignore it.

I have moved InSections to SymbolAssignment and AssertCommand; is there anything else? I might still try to factor DotAssignment out of SymbolAssignment, but I'd prefer to do that as a separate change.
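As a rough sketch of the `InSections` approach discussed in these earlier revisions, a flag on `SymbolAssignment` lets address assignment pick out the assignments written inside `SECTIONS` (in script order, alongside output sections) while leaving top-level assignments for a later pass. The classes and `namesForAssignAddresses` below are hypothetical stand-ins, `AssertCommand` is not modeled, and `dynamic_cast` replaces LLVM's `dyn_cast` to keep the example self-contained.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for lld's command classes.
struct BaseCommand {
  virtual ~BaseCommand() = default;
};

struct OutputSection : BaseCommand {
  std::string Name;
  explicit OutputSection(std::string N) : Name(std::move(N)) {}
};

struct SymbolAssignment : BaseCommand {
  std::string Name;
  bool InSections; // true if written inside SECTIONS { ... }
  SymbolAssignment(std::string N, bool In)
      : Name(std::move(N)), InSections(In) {}
};

// Returns, in their original order, the names of the top-level commands
// that would be handled while addresses are assigned: output sections and
// assignments that appeared inside SECTIONS. Other assignments are skipped
// here and would be processed afterwards.
std::vector<std::string>
namesForAssignAddresses(const std::vector<std::unique_ptr<BaseCommand>> &Cmds) {
  std::vector<std::string> Out;
  for (const auto &C : Cmds) {
    if (const auto *Sec = dynamic_cast<const OutputSection *>(C.get())) {
      Out.push_back(Sec->Name);
    } else if (const auto *A = dynamic_cast<const SymbolAssignment *>(C.get())) {
      if (A->InSections)
        Out.push_back(A->Name);
    }
  }
  return Out;
}
```

The alternative raised in the thread, a separate `DotAssignment` subclass, would replace the boolean test with a type test in the same loop.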

This patch is perhaps ok as-is, but I came up with another idea: we may be able to reorder commands so that all outside-sections commands follow sections commands. Then maybe we can just process them in-order. What do you think?

In D36986#851730, @ruiu wrote:

This patch is perhaps ok as-is, but I came up with another idea: we may be able to reorder commands so that all outside-sections commands follow sections commands. Then maybe we can just process them in-order. What do you think?

When you say process them in order, do you mean all of them, including the outside-sections ones? The problem right now is that if an outside-sections command refers to any predefined symbol, that symbol won't have a correct value until fixPredefinedSymbols has run, which can only happen after we have processed all sections commands. So we would need some way to stage the processing: sections commands first, then fixPredefinedSymbols, and then outside-sections commands. I was thinking that an even better option might be to simply split the sections and outside-sections commands into two different lists rather than keeping them all in Opt.Commands; processing them separately then becomes even easier.

Ah, you are right that reordering elements won't fix the problem. The idea to have two separate lists sounds like a good idea.

In D36986#852125, @ruiu wrote:

Ah, you are right that reordering elements won't fix the problem. The idea to have two separate lists sounds like a good idea.

I tried to separate the lists, but that also turned out to be tricky because of ordering. For example, in `foo = 0x1 SECTIONS { bar = foo } baz = bar`, the symbols outside and inside SECTIONS have to be added to the symbol table in that particular order, which is difficult when they're in separate lists. Given that, I think the current solution, albeit not pretty, is still the simplest one.
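The ordering constraint in `foo = 0x1 SECTIONS { bar = foo } baz = bar` can be reproduced with a toy evaluator. Everything below is hypothetical: assignments are reduced to "constant" or "copy another symbol", and `splitOrder` models the two-list idea by processing the `SECTIONS` list first and the outside list second. Evaluating the single script-ordered list succeeds, while the split order fails because `bar = foo` runs before `foo` exists; processing the outside list first would break `baz = bar` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: "Name = Constant" when Ref is empty,
// otherwise "Name = Ref".
struct Assign {
  std::string Name;
  std::string Ref;
  uint64_t Constant;
  bool InSections; // written inside SECTIONS { ... }?
};

// Evaluates assignments in the given order. Returns std::nullopt if an
// assignment references a symbol that has not been defined yet.
std::optional<std::map<std::string, uint64_t>>
evaluate(const std::vector<Assign> &Order) {
  std::map<std::string, uint64_t> Syms;
  for (const Assign &A : Order) {
    if (A.Ref.empty()) {
      Syms[A.Name] = A.Constant;
      continue;
    }
    auto It = Syms.find(A.Ref);
    if (It == Syms.end())
      return std::nullopt; // referenced symbol not defined yet
    Syms[A.Name] = It->second;
  }
  return Syms;
}

// Two-list processing: everything inside SECTIONS first, then the rest.
std::vector<Assign> splitOrder(const std::vector<Assign> &Script) {
  std::vector<Assign> Inside, Outside;
  for (const Assign &A : Script)
    (A.InSections ? Inside : Outside).push_back(A);
  Inside.insert(Inside.end(), Outside.begin(), Outside.end());
  return Inside;
}
```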

I take my last comment back; I think I came up with a much better solution. I'll send a patch as soon as I clean it up a bit.

Done; let me know if the new implementation makes sense. I can probably clean up the code a bit more (possibly combining the two Set* lambdas into one).

phosek updated this revision to Diff 112773. Aug 25 2017, 6:04 PM

ruiu added inline comments. Aug 28 2017, 10:44 AM

ELF/Writer.cpp
205 ↗	(On Diff #112773)	Can you briefly describe in a comment what this function is supposed to do?
927 ↗	(On Diff #112773)	Can you add a comment to explain what this function is supposed to do?

phosek updated this revision to Diff 112986. Aug 28 2017, 4:14 PM

phosek marked 2 inline comments as done.

phosek edited the summary of this revision.

ping

LGTM

I have a slight concern about this patch because it makes lld a bit more interpret-y. The GNU BFD linker has a notion of an "internal" linker script which drives the entire linking process, and I don't like that idea because it's too abstract for my taste. In lld, we take a more direct approach to creating the result: everything is driven directly by code, and the flow of control is basically not controlled by scripts. This patch seems like a deviation from that design.

That being said, this patch does what it should in the simplest possible way, so I'm fine with it. I'm just expressing my thoughts.

ELF/Writer.cpp
204 ↗	(On Diff #112986)	I'd explain what "predefined" means a bit more, as this comment essentially restates the function name.
927 ↗	(On Diff #112986)	predefined symbols (e.g. _end or _etext)

This revision is now accepted and ready to land. Aug 30 2017, 3:07 PM

Thanks! I understand your concern, but I think this change is in line with what LLD already does, e.g. for output sections, which are also converted to commands. The main difference between BFD ld and LLD is that in BFD ld even the default behavior is driven by a linker script, while in LLD we only construct an internal representation that is equivalent to some imaginary linker script. In a way, the BaseCommand hierarchy is becoming LLD's IR.

Closed by commit rL312305: [ELF] Generate symbol assignments for predefined symbols (authored by phosek). Aug 31 2017, 7:24 PM

This revision was automatically updated to reflect the committed changes.

Diff 113503

lld/trunk/ELF/Writer.cpp

```diff
@@ ... @@ private:
   void copyLocalSymbols();
   void addSectionSymbols();
   void addReservedSymbols();
   void createSections();
   void forEachRelSec(std::function<void(InputSectionBase &)> Fn);
   void sortSections();
   void finalizeSections();
   void addPredefinedSections();
+  void addPredefinedSymbols();
 
   std::vector<PhdrEntry *> createPhdrs();
   void removeEmptyPTLoad();
   void addPtArmExid(std::vector<PhdrEntry *> &Phdrs);
   void assignFileOffsets();
   void assignFileOffsetsBinary();
   void setPhdrs();
   void fixSectionAlignments();
@@ ... @@ if (!Script->Opt.HasSections && !Config->Relocatable)
     fixSectionAlignments();
 
   // If -compressed-debug-sections is specified, we need to compress
   // .debug_* sections. Do it right now because it changes the size of
   // output sections.
   parallelForEach(OutputSections,
                   [](OutputSection *Sec) { Sec->maybeCompress<ELFT>(); });
 
+  // Generate assignments for predefined symbols (e.g. _end or _etext)
+  // before assigning addresses. These symbols may be referred to from
+  // the linker script and we need to ensure they have the correct value
+  // prior evaluating any expressions using these symbols.
+  addPredefinedSymbols();
+
   Script->assignAddresses();
   Script->allocateHeaders(Phdrs);
 
   // Remove empty PT_LOAD to avoid causing the dynamic linker to try to mmap a
   // 0 sized region. This has to be done late since only after assignAddresses
   // we know the size of the sections.
   removeEmptyPTLoad();
 
@@ ... @@ template <class ELFT> void Writer<ELFT>::createSections() {
   Script->fabricateDefaultCommands();
   sortBySymbolsOrder<ELFT>();
   sortInitFini(findSection(".init_array"));
   sortInitFini(findSection(".fini_array"));
   sortCtorsDtors(findSection(".ctors"));
   sortCtorsDtors(findSection(".dtors"));
 }
 
+// This function generates assignments for predefined symbols (e.g. _end or
+// _etext) and inserts them into the commands sequence to be processed at the
+// appropriate time. This ensures that the value is going to be correct by the
+// time any references to these symbols are processed and is equivalent to
+// defining these symbols explicitly in the linker script.
+template <class ELFT> void Writer<ELFT>::addPredefinedSymbols() {
+  PhdrEntry *Last = nullptr;
+  PhdrEntry *LastRO = nullptr;
+  PhdrEntry *LastRW = nullptr;
+  for (PhdrEntry *P : Phdrs) {
+    if (P->p_type != PT_LOAD)
+      continue;
+    Last = P;
+    if (P->p_flags & PF_W)
+      LastRW = P;
+    else
+      LastRO = P;
+  }
+
+  auto Make = [](DefinedRegular *S) {
+    auto *Cmd = make<SymbolAssignment>(
+        S->getName(), [=] { return Script->getSymbolValue("", "."); }, "");
+    Cmd->Sym = S;
+    return Cmd;
+  };
+
+  auto IsSection = [](OutputSection *Sec) {
+    return [=](BaseCommand *Base) { return Base == Sec; };
+  };
+
+  auto IsNoBits = [](BaseCommand *Base) {
+    if (auto *Sec = dyn_cast<OutputSection>(Base))
+      return Sec->Type == SHT_NOBITS;
+    return false;
+  };
+
+  if (Last) {
+    // _end is the first location after the uninitialized data region.
+    auto E = Script->Opt.Commands.end();
+    auto I = Script->Opt.Commands.begin();
+    I = std::find_if(I, E, IsSection(Last->Last));
+    if (I != E) {
+      if (ElfSym::End1)
+        Script->Opt.Commands.insert(++I, Make(ElfSym::End1));
+      if (ElfSym::End2)
+        Script->Opt.Commands.insert(++I, Make(ElfSym::End2));
+    }
+  }
+  if (LastRO) {
+    // _etext is the first location after the last read-only loadable segment.
+    auto E = Script->Opt.Commands.end();
+    auto I = Script->Opt.Commands.begin();
+    I = std::find_if(I, E, IsSection(LastRO->Last));
+    if (I != E) {
+      if (ElfSym::Etext1)
+        Script->Opt.Commands.insert(++I, Make(ElfSym::Etext1));
+      if (ElfSym::Etext2)
+        Script->Opt.Commands.insert(++I, Make(ElfSym::Etext2));
+    }
+  }
+  if (LastRW) {
+    // _edata points to the end of the last non SHT_NOBITS section.
+    auto E = Script->Opt.Commands.end();
+    auto I = Script->Opt.Commands.begin();
+    I = std::find_if(std::find_if(I, E, IsSection(LastRW->First)), E, IsNoBits);
+    if (I != E) {
+      if (ElfSym::Edata2)
+        I = Script->Opt.Commands.insert(I, Make(ElfSym::Edata2));
+      if (ElfSym::Edata1)
+        I = Script->Opt.Commands.insert(I, Make(ElfSym::Edata1));
+    }
+  }
+}
+
 // We want to find how similar two ranks are.
 // The more branches in getSectionRank that match, the more similar they are.
 // Since each branch corresponds to a bit flag, we can just use
 // countLeadingZeros.
 static int getRankProximityAux(OutputSection *A, OutputSection *B) {
   return countLeadingZeros(A->SortRank ^ B->SortRank);
 }
 
@@ ... @@ if (Config->Relocatable)
     return ET_REL;
   return ET_EXEC;
 }
 
 // This function is called after we have assigned address and size
 // to each section. This function fixes some predefined
 // symbol values that depend on section address and size.
 template <class ELFT> void Writer<ELFT>::fixPredefinedSymbols() {
-  // _etext is the first location after the last read-only loadable segment.
-  // _edata is the first location after the last read-write loadable segment.
-  // _end is the first location after the uninitialized data region.
-  PhdrEntry *Last = nullptr;
-  PhdrEntry *LastRO = nullptr;
-  PhdrEntry *LastRW = nullptr;
-  for (PhdrEntry *P : Phdrs) {
-    if (P->p_type != PT_LOAD)
-      continue;
-    Last = P;
-    if (P->p_flags & PF_W)
-      LastRW = P;
-    else
-      LastRO = P;
-  }
-
-  auto Set = [](DefinedRegular *S, OutputSection *Cmd, uint64_t Value) {
-    if (S) {
-      S->Section = Cmd;
-      S->Value = Value;
-    }
-  };
-
-  if (Last) {
-    Set(ElfSym::End1, Last->First, Last->p_memsz);
-    Set(ElfSym::End2, Last->First, Last->p_memsz);
-  }
-  if (LastRO) {
-    Set(ElfSym::Etext1, LastRO->First, LastRO->p_filesz);
-    Set(ElfSym::Etext2, LastRO->First, LastRO->p_filesz);
-  }
-  if (LastRW) {
-    Set(ElfSym::Edata1, LastRW->First, LastRW->p_filesz);
-    Set(ElfSym::Edata2, LastRW->First, LastRW->p_filesz);
-  }
-
   if (ElfSym::Bss)
     ElfSym::Bss->Section = findSection(".bss");
 
   // Setup MIPS _gp_disp/__gnu_local_gp symbols which should
   // be equal to the _gp symbol's value.
   if (Config->EMachine == EM_MIPS && !ElfSym::MipsGp->Value) {
     // Find GP-relative section with the lowest address
     // and use this address to calculate default _gp value.
```
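The placement rules in `addPredefinedSymbols` can be restated on a plain vector. This is a sketch with hypothetical types and names (`Cmd`, `CmdList`, `findSectionCmd`, `addPredefinedAssignments`); section names are passed in directly instead of being derived from `PhdrEntry` objects, and only one alias is inserted per rule, whereas the diff inserts both variants (e.g. `ElfSym::End1` and `ElfSym::End2`). The rules are: `_end` goes right after the last section of the last `PT_LOAD`, `_etext` right after the last section of the last read-only `PT_LOAD`, and `_edata` right before the first `NOBITS` section at or after the first section of the last writable `PT_LOAD`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, flattened view of a command list: each entry is either an
// output section (with its NOBITS flag) or a generated "sym = ." assignment.
struct Cmd {
  std::string Name;
  bool IsSection;
  bool NoBits;
};

using CmdList = std::vector<Cmd>;

static Cmd assignDot(const std::string &Sym) { return {Sym, false, false}; }

static CmdList::iterator findSectionCmd(CmdList &Cmds, const std::string &Name) {
  return std::find_if(Cmds.begin(), Cmds.end(), [&](const Cmd &C) {
    return C.IsSection && C.Name == Name;
  });
}

// Inserts "_end = .", "_etext = ." and "_edata = ." following the rules
// described above. Missing sections simply cause that rule to be skipped.
void addPredefinedAssignments(CmdList &Cmds, const std::string &LastLoad,
                              const std::string &LastRO,
                              const std::string &FirstRW) {
  // Each step searches again: std::vector::insert invalidates iterators at
  // and after the insertion point (all of them if the vector reallocates).
  auto I = findSectionCmd(Cmds, LastLoad);
  if (I != Cmds.end())
    Cmds.insert(I + 1, assignDot("_end"));

  I = findSectionCmd(Cmds, LastRO);
  if (I != Cmds.end())
    Cmds.insert(I + 1, assignDot("_etext"));

  I = findSectionCmd(Cmds, FirstRW);
  I = std::find_if(I, Cmds.end(),
                   [](const Cmd &C) { return C.IsSection && C.NoBits; });
  if (I != Cmds.end())
    Cmds.insert(I, assignDot("_edata"));
}
```

For a layout of `.text`, `.data` and `.bss` (NOBITS), the sketch yields `.text, _etext, .data, _edata, .bss, _end`, which lines up with the `edata-etext.s` expectations below: `_edata` at the end of `.data` and `_end` after `.bss`.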

lld/trunk/test/ELF/edata-etext.s

```diff
@@ ... @@
 # CHECK-NEXT: Idx Name Size Address Type
 # CHECK-NEXT: 0 00000000 0000000000000000
 # CHECK-NEXT: 1 .text 00000001 0000000000201000 TEXT DATA
 # CHECK-NEXT: 2 .data 00000002 0000000000202000 DATA
 # CHECK-NEXT: 3 .bss 00000006 0000000000202004 BSS
 # CHECK: SYMBOL TABLE:
 # CHECK-NEXT: 0000000000000000 UND 00000000
 # CHECK-NEXT: 0000000000202002 .data 00000000 _edata
-# CHECK-NEXT: 000000000020200a .data 00000000 _end
+# CHECK-NEXT: 000000000020200a .bss 00000000 _end
 # CHECK-NEXT: 0000000000201001 .text 00000000 _etext
 # CHECK-NEXT: 0000000000201000 .text 00000000 _start
 
 # RUN: ld.lld -r %t.o -o %t2
 # RUN: llvm-objdump -t %t2 | FileCheck %s --check-prefix=RELOCATABLE
 # RELOCATABLE: 0000000000000000 UND 00000000 _edata
 # RELOCATABLE-NEXT: 0000000000000000 UND 00000000 _end
 # RELOCATABLE-NEXT: 0000000000000000 UND 00000000 _etext
```

lld/trunk/test/ELF/linkerscript/symbol-reserved.s

```diff
@@ ... @@
 # ALIGN-ADD: 0000000000000012 ABS 00000000 .hidden newsym
 
 # RUN: echo "PROVIDE_HIDDEN(newsym = ALIGN(11, 8) - 10);" > %t.script
 # RUN: ld.lld -o %t1 %t.script %t
 # RUN: llvm-objdump -t %t1 | FileCheck --check-prefix=ALIGN-SUB %s
 # ALIGN-SUB: 0000000000000006 ABS 00000000 .hidden newsym
 
 # RUN: echo "PROVIDE_HIDDEN(newsym = ALIGN(_end, CONSTANT(MAXPAGESIZE)) + 5);" > %t.script
+# RUN: ld.lld -o %t1 %t %t.script
+# RUN: llvm-objdump -t %t1 | FileCheck --check-prefix=RELATIVE %s
+# RELATIVE: 0000000000202005 .text 00000000 .hidden newsym
+# RELATIVE: 0000000000201007 .text 00000000 _end
+
+# RUN: echo "PROVIDE_HIDDEN(newsym = ALIGN(_end, CONSTANT(MAXPAGESIZE)) + 5);" > %t.script
 # RUN: ld.lld -o %t1 --script %p/Inputs/symbol-reserved.script %t %t.script
 # RUN: llvm-objdump -t %t1 | FileCheck --check-prefix=RELATIVE-ADD %s
 # RELATIVE-ADD: 0000000000001005 .text 00000000 .hidden newsym
 # RELATIVE-ADD: 0000000000000007 .text 00000000 .hidden _end
 
 # RUN: echo "PROVIDE_HIDDEN(newsym = ALIGN(_end, CONSTANT(MAXPAGESIZE)) - 5);" > %t.script
 # RUN: ld.lld -o %t1 --script %p/Inputs/symbol-reserved.script %t %t.script
 # RUN: llvm-objdump -t %t1 | FileCheck --check-prefix=RELATIVE-SUB %s
 # RELATIVE-SUB: 0000000000000ffb .text 00000000 .hidden newsym
 # RELATIVE-SUB: 0000000000000007 .text 00000000 .hidden _end
 
 .global _start
 _start:
   lea newsym(%rip),%rax
```
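The expected `newsym` addresses in the `RELATIVE*` checks follow from simple alignment arithmetic on `_end`. The sketch below assumes `CONSTANT(MAXPAGESIZE)` is 0x1000, which is what the expected values imply rather than something the test states, and models `ALIGN(x, a)` as rounding `x` up to a power-of-two boundary `a`. The helper names `alignTo` and `newsym` are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// ALIGN(Value, Align): round Value up to the next multiple of Align.
// Align is assumed to be a power of two.
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Value of "newsym = ALIGN(_end, CONSTANT(MAXPAGESIZE)) + Delta".
// A negative Delta wraps modulo 2^64, which gives the expected subtraction.
constexpr uint64_t newsym(uint64_t End, uint64_t MaxPageSize, int64_t Delta) {
  return alignTo(End, MaxPageSize) + static_cast<uint64_t>(Delta);
}

// RELATIVE: _end = 0x201007 -> ALIGN gives 0x202000, plus 5.
static_assert(newsym(0x201007, 0x1000, 5) == 0x202005, "RELATIVE");
// RELATIVE-ADD / RELATIVE-SUB: _end = 0x7 -> ALIGN gives 0x1000, +/- 5.
static_assert(newsym(0x7, 0x1000, 5) == 0x1005, "RELATIVE-ADD");
static_assert(newsym(0x7, 0x1000, -5) == 0xffb, "RELATIVE-SUB");
```

The new `RELATIVE` check only holds if `_end` already has its final value (0x201007) when the `PROVIDE_HIDDEN` expression is evaluated, which is the ordering this patch provides.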

This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Generate symbol assignments for predefined symbols
Closed, Public