This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Generate symbol assignments for predefined symbols
ClosedPublic

Authored by phosek on Aug 21 2017, 4:01 PM.

Details

Summary

The problem with symbol assignments in implicit linker scripts is that
they can refer synthetic symbols such as _end, _etext or _edata. The
value of these symbols is currently fixed only after all linker script
commands are processed, so these assignments will be using non-final and
hence invalid value.

Rather than fixing the symbol values after all command processing have
finished, we instead change the logic to generate symbol assignment
commands that set the value of these symbols while processing the
commands, this ensures that the value is going to be correct by the time
any reference to these symbol is processed and is equivalent to defining
these symbols explicitly in linker script as BFD ld does.

Diff Detail

Repository
rL LLVM

Event Timeline

phosek created this revision.Aug 21 2017, 4:01 PM
phosek updated this revision to Diff 112085.Aug 21 2017, 4:02 PM
grimar added a subscriber: grimar.Aug 22 2017, 12:57 AM
ruiu added inline comments.Aug 22 2017, 10:37 AM
ELF/LinkerScript.h
78 ↗(On Diff #112085)

I think I'd name this InSections.

But do you actually need this? This is I believe virtually a flag indicating whether an element is a direct child of the top element or not. But we can distinguish BaseCommands enclosed in an OutputSections from BaseCommands that are top-level elements, no?

ELF/Writer.cpp
227 ↗(On Diff #112085)

Add a comment.

phosek added inline comments.Aug 22 2017, 2:55 PM
ELF/LinkerScript.h
78 ↗(On Diff #112085)

That's what I tried initially, the problem is that Opt.Commands would also contain synthetic symbol assignments created by fabricateDefaultCommands and those have to be processed in assignAddresses because they modify ., so we would need some way to distinguish those commands from other top level symbol assignments. One option would be to create a new subclass e.g. SyntheticSymbolAssignment or even DotAssignment which would then allow distinguishing the two types of assignments in assignAddresses. The other option would be to just add an attribute to SymbolAssignment. Any opinion/preference?

phosek updated this revision to Diff 112286.Aug 22 2017, 7:29 PM
phosek marked an inline comment as done.
phosek added inline comments.Aug 22 2017, 7:44 PM
ELF/LinkerScript.h
78 ↗(On Diff #112085)

Turned out it's even more complicated than that, in SECTIONS { .text : { *(.text) } foo = ABSOLUTE(.) + 0x100; .got : { *(.got) } } the foo = ABSOLUTE(.) is still going to end up in Scripts->Opt.Commands, but we have to process it early because it uses ..

I have added the InSections attribute to BaseCommand because in LLD, we allow ASSERT in input linker scripts, which means it can also refer to predefined symbols. However, neither BFD ld nor gold does allow that. If we followed the same model, we could move InSections to SymbolAssignment.

phosek updated this revision to Diff 112404.Aug 23 2017, 11:09 AM
ruiu added inline comments.Aug 23 2017, 10:09 PM
ELF/LinkerScript.h
78 ↗(On Diff #112085)

I'm not very sure, but creating DotAssignment sounds like a good idea, as assignments to . are very different from symbol assignments despite their superficial resemblance. It might be worth a try.

78 ↗(On Diff #112085)

We do not aim to extend the linker script that GNU linkers, and we do not guarantee that the current undocumented behavior (lld supports only the documented behavior of GNU linkers). So, I think I'd move InSections to SymbolAssignment.

78 ↗(On Diff #112085)

The first comment is a stale comment. I don't know why it is sent out this time, but please ignore that.

phosek marked an inline comment as done.Aug 24 2017, 11:29 AM

I have moved InSections to SymbolAssignment and AssertCommand, is there anything else? I might still try to refactor DotAssignment out from SymbolAssignment, but I'd prefer to do it as a separate change.

ruiu edited edge metadata.Aug 24 2017, 12:08 PM

This patch is perhaps ok as-is, but I came up with another idea: we may be able to reorder commands so that all outside-sections commands follow sections commands. Then maybe we can just process them in-order. What do you think?

In D36986#851730, @ruiu wrote:

This patch is perhaps ok as-is, but I came up with another idea: we may be able to reorder commands so that all outside-sections commands follow sections commands. Then maybe we can just process them in-order. What do you think?

When you say process them in-order, do you mean all of them, including the outside-sections one? The problem right now is that if outside-sections one refer to any predefined symbol, those symbols won't have a correct value until fixPredefinedSymbols ran, which can only happen after we processed all sections commands. So we would need some way to stage the processing into sections commands, then fixPredefinedSymbols and then outside-sections commands. I was thinking that maybe even better would be to simply split the sections and outside-sections commands into two different lists rather then keeping them all in Opt.Commands, processing them separately then becomes even easier.

ruiu added a comment.Aug 24 2017, 5:18 PM

Ah, you are right that reordering elements won't fix the problem. The idea to have two separate lists sounds like a good idea.

In D36986#852125, @ruiu wrote:

Ah, you are right that reordering elements won't fix the problem. The idea to have two separate lists sounds like a good idea.

I tried to separate the lists, but that also turned out to be tricky because of ordering, e.g. in foo = 0x1 SECTIONS { bar = foo } baz = bar the symbols outside and inside SECTIONS have to be added to the symbol table in that particular order which is difficult when they're in separate lists. Given that, I think the current solution albeit not pretty is still the simplest one.

I take my last comment back, I think I came up with a much better solution, I'll send a patch as soon as I clean it up a bit.

phosek updated this revision to Diff 112765.Aug 25 2017, 5:23 PM
phosek retitled this revision from [ELF] Handle assignments outside SECTIONS command separately to [ELF] Generate symbol assignments for predefined symbols.
phosek edited the summary of this revision. (Show Details)

Done, let me know if the new implementation makes sense. I can probably cleanup the code a bit more (possible combine the two Set* lambdas into one).

phosek updated this revision to Diff 112773.Aug 25 2017, 6:04 PM
ruiu added inline comments.Aug 28 2017, 10:44 AM
ELF/Writer.cpp
205 ↗(On Diff #112773)

Can you briefly mention as a comment about what this function is supposed to do?

927 ↗(On Diff #112773)

Can you add a comment to explain what this function is supposed to do?

phosek updated this revision to Diff 112986.Aug 28 2017, 4:14 PM
phosek marked 2 inline comments as done.
phosek edited the summary of this revision. (Show Details)
ruiu accepted this revision.Aug 30 2017, 3:07 PM

LGTM

I have a little concern about this patch because this patch makes lld a bit more interpret-y. GNU bfd linker has a notion of "internal" linker script which drives the entire linking process, and I don't like the idea because that's too abstract to my taste. In lld, we take more direct approach to create a result: everything is directly driven by code and the flow of control is basically not controlled by scripts. This patch seems like a deviation from that design.

That being said, this patch looks what it should do in a simplest possible way, so I think I'm fine with this. I'm just expressing my thought.

ELF/Writer.cpp
204 ↗(On Diff #112986)

I'd explain what is "predefined" a bit more, as this comment is essentially the same as the function name.

927 ↗(On Diff #112986)

predefined symbols (e.g. _end or _etext)

This revision is now accepted and ready to land.Aug 30 2017, 3:07 PM
phosek updated this revision to Diff 113334.Aug 30 2017, 4:26 PM
phosek marked 2 inline comments as done.

Thanks! I understand your concern but I think this change is in line with what LLD already does e.g. for output sections which are also converted to commands. The main difference between BFD ld and LLD is that in BFD ld, even the default behavior is driven by the linker script while in LLD we only construct the internal representation that is equivalent to some imaginary linker script... in a way, the BaseCommand hierarchy is becoming LLD's IR.

This revision was automatically updated to reflect the committed changes.