This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Prototype of possible linkerscript redesign.
AbandonedPublic

Authored by grimar on May 5 2016, 7:53 AM.

Details

Reviewers
ruiu
rafael
Summary

Patch implements command driven linker script logic.
ScriptConfiguration class contains only a list of commands after
applying. No more any other temporarily representation.

That should allow to implement any scriopt feature I believe since
at any point it is possible to iterate over commands list and
process them accordinly.

Also one of the point is the creation of sections at linker side
and separating logic of script from any other part of linker
as much as possible.

Diff Detail

Event Timeline

grimar updated this revision to Diff 56288.May 5 2016, 7:53 AM
grimar retitled this revision from to [ELF] - Prototype of possible linkerscript redesign..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar updated this object.
grimar added subscribers: llvm-commits, grimar.
emaste added a subscriber: emaste.May 5 2016, 8:28 AM
ruiu edited edge metadata.May 5 2016, 6:39 PM

Before going into details, can I ask you if you think this is going to be able to support all the linker script features? Is there any concerns/limitations on it?

grimar added a comment.May 6 2016, 4:15 AM
In D19976#423173, @ruiu wrote:

Before going into details, can I ask you if you think this is going to be able to support all the linker script features? Is there any concerns/limitations on it?

I think this approach should be pretty efficient.
I reviewed again freebsd script and I think all commands it uses can be implemented with this approach without
hacks or anything alike. I can not imagine a real limitations. The way it works seems very natural to me,
because of 2 main steps that are separated and do not cross at all.

  1. Creating sections and processing all commands inside declaration to finally get

a output section as a black box for future use. This allows to make any internal layout and
support every internal command (it seems). The fact that location counter
meaning inside declaration differs from meaning outside very helpfull here and strengthens my faith.
For example PROVIDE inside declarations is easy to support because we know (.) value that is offset here,
we can move it, align or whatever we want, we can place sections in the order we want, sort them, keep or discard,
that all is also easy because on each step we have direct access to linkerscipt commands and have no any other
task except "create output section".

  1. Assign addresses and processing external to sections declarations commands is very comfortable to do now.

For example PROVIDE on that step requires global (.) value and that is what easily we can give it.
We do not need to bother here with internals of sections, because external commands can not affect them and
we already have sections created at step 1, so we skip all that stuff during processing.
We don't need to sort them here, as it is assumed they are already sorted after step 1.
And again we have direct access to linkerscript sections commands on this step as well
(actually current code gives ability to read/interpret them at any point of program), so we definetely should
be able to implement everything we might want, since there is no any information lost.

Now about my conserns.
The only I have is about processing predefined sections.
Now in this patch they are not processed, so they are just added to the end of outputsections list
in addPredefinedSections(). I think since script is responsible for full layout now, them should be processed
on its side either. So I mean if we see (for example):

.got            : { *(.got) }
.got.plt        : { *(.got.plt) }

Then we want to place Out<ELFT>::Got and Out<ELFT>::GotPlt in that order in some generalized way.
We can probably just add all predefined sections to the end
and then sort the list at the end of step 1, since we know their names and we have matching pattern.
Or we can assume that ".got" is always Out<ELFT>::Got and so on, or process them in some nicer way.
So that is definetely possible, just requires some attention for clean implementation.
I would leave that for futher patches.

davide added a subscriber: davide.Jul 4 2016, 8:19 AM

Rui, can you please take a look when you get a chance? Thanks!

davide added a comment.Jul 4 2016, 8:24 AM

Thanks for taking care of this, George.
The first question that comes to my mind is: how is this code going to support expression relative to sections? When we do symbol assignment a symbol may be relative to a section so we need to be able to fetch the section (which was a little bit of a stretch in the current code).

My second question is:
Can you defend the ability of your code being able to implement cleanly more complex constructs? e.g.

KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
KEEP (*(SORT(.dtors.*)))
grimar added a comment.Jul 4 2016, 9:36 AM

Thanks for taking care of this, George.
The first question that comes to my mind is: how is this code going to support expression relative to sections? When we do symbol assignment a symbol may be relative to a section so we need to be able to fetch the section (which was a little bit of a stretch in the current code).

Core idea of the patch turns around LayoutParser class that has run(Callback C) method. It can be used in any place of code,
providing the ability to retrive and proccess linker script section's tag commands that are actual for the place of call.
So speaking about expressions relative to sections: patch has LinkerScript<ELFT>::createSections() method that
main intend is to create all sections and process all commands that are inside output section declararions (including expressions and everything else).
Since location pointer inside output section declaration is relative to section and we have access to all linkerscript commands at that point,
we should be able to process them and perform any layout and expression calculations we want. Just because we have everything for that at this point.

My second question is:
Can you defend the ability of your code being able to implement cleanly more complex constructs? e.g.

KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
KEEP (*(SORT(.dtors.*)))

We should be able to do that because will have access to all sections tag commands of script at any point and will be able to process them using custom callback we put to LayoutParser::run(Callback C). We might need to extend enum CommandKind with more types of commands and update the callback(s) properly.
What nice here that linkerscript code will be updated only in part of adding new commands to CommandKind.
And callback(s) can handle everything else.
Speaking about how clean it can be: everything will depend on callback implementation here. I believe it should be possible to implement nice and clean ones.

ruiu added a comment.Jul 7 2016, 5:35 PM

I'll take a look at this tomorrow.

In D19976#477596, @ruiu wrote:

I'll take a look at this tomorrow.

Thanks Rui.

ruiu added a comment.Jul 8 2016, 5:22 PM

This is a fairly large patch -- you moved code between files and add new features. I'd appreciate if you do this incrementally in future.

ELF/LinkerScript.cpp
283

This seems tricky. Why do you have to parse linker scripts while processing expressions? I don't see a reason to use a callback function here. It seems to me that we can instead parse all scripts to save commands to a vector and evaluate them in this function.

ELF/MarkLive.cpp
185 ↗(On Diff #56288)

What's wrong with this code? Why did you have to change the way we handle KEEP commands? The new code seems more complicated than before.

ELF/OutputSections.h
686

Move the definition to the .cpp file.

727–740

Can you move them to the .cpp?

ELF/Writer.cpp
109

This function is called only when a given section is discarded, so checking !IS || IS->Live seems redundant.

111

Remove llvm::.

625–626

Can you move this to createSections?

629–630

Now the function gets much smaller, we don't give a long name to this variable.

634

*F.get() -> *F

655
for (OutputSectionBase<ELFT> *Sec : createRegularSections()) {
818–822

Move this after BuildId so that you can write

if (needsInterpSection())
  OutputSections.insert(OutputSections.begin(), Out<ELFT>::Interp);
ELF/Writer.h
31 ↗(On Diff #56288)

This function should belong to InputSections.h instead of this header.

In D19976#479048, @ruiu wrote:

This is a fairly large patch -- you moved code between files and add new features. I'd appreciate if you do this incrementally in future.

I think that was what I did initially.
D19977, D19981, D20104 patches were posted earlier to remove noise from this one.
I pointed that in comments for them, also this one depends on first 2 at least,
so if we land them I`ll be able to rebase this patch and reduce the amount of changes.

grimar added a comment.EditedJul 10 2016, 11:49 PM

Rui, thanks for review.
I noticed that most of comments are relative to code movements that I was need to do for this patch.
So some times it is even not my changes at all.
That changes belongs to 3 patches mentioned above: D19977, D19981, D20104.
May be it would be reasonable for me to rebase these 3 and land them at first to remove noise from this patch ?

grimar retitled this revision from [ELF] - Prototype of possible linkerscript redesign. to [ELF, WIP] - Prototype of possible linkerscript redesign..Jul 12 2016, 4:53 AM
grimar updated this object.
grimar edited edge metadata.

So I am trying to rebase this now, I hope to finish and update it tomorrow.

Yes, please rebase so we can review more easily.

Yes, please rebase so we can review more easily.

Need a bit more time unfortunately, I`ll update tomorrow I guess.

grimar updated this revision to Diff 64093.Jul 14 2016, 11:52 PM
  • Rebased/updated the patch, addressed review comments.

So I rebased this. Currently there are 11 testcases failing, that what also was before the rebase and I am working on fix for that. But the patch still shows the concept I am suggesting.

ELF/LinkerScript.cpp
283

Agree. That was overcomplication.

ELF/MarkLive.cpp
185 ↗(On Diff #56288)

I reverted that place and implemented shouldKeep() to keep the previous interface.

Though, I'm quite new to lld development, I think this code can be simplified. Here is my version of createSections()
It simply iterates over SectionRule array, selects input sections matching each rule, and adds them output section

template <class ELFT>
OutputSectionList LinkerScript<ELFT>::createSections(OutputSectionFactory<ELFT> &Factory) {
   OutputSectionList Result;
  typedef std::unique_ptr<elf::ObjectFile<ELFT>> ObjectFileUP;

  // Select input sections matching rule and add them to corresponding 
  // output section. Section rules are processed in order they're listed
  // in script, so correct input section order is maintained by design
  for (SectionRule &R : Opt.Sections) {
    for (const ObjectFileUP &F : Symtab<ELFT>::X->getObjectFiles()) {
      for (InputSectionBase<ELFT> *S : F->getSections()) {
          if (!elf::isDiscarded(S) && globMatch(R.SectionPattern, S->getSectionName())) 
             // Adds input section to output section collection
             addInputSection(Factory, S, R.Dest, Result);
       }
    }
  }
  // Add all other input sections, which are not listed in script
  for (const ObjectFileUP &F : Symtab<ELFT>::X->getObjectFiles()) {
    for (InputSectionBase<ELFT> *C : F->getSections()) {
      if (C->OutSec == nullptr)
        addInputSection(Factory, C, getOutputSectionName(C), Result);
    }
  }
  return Result;
}

Then in Writer<ELFT> I'm dispatching to correct version of createSections(), just like you're doing.

It's several times shorter and passes all unit tests.
I might be missing some important strategic move, which is introduced by your patch, so let's wait and see what others will say.

ELF/LinkerScript.cpp
205–218

I wonder, if we can live without script parser refactoring for the very first iteration.
May be this can be done as a separate review?

279

I wonder why do you now mix SectionsRule (which is now InputSectionKind) and SectionsCommand (now OutputSectionKind) into one Command object? Every time you iterate Opt.LayoutParser->Commands, you need either rules or commands, never both. May be it makes sense to have two separate collections, like we had before?

318

You seem to ignore comment in original version of assignAddresses:

// Find all the sections with required name. There can be more than
// ont section with such name, if the alignment, flags or type
// attribute differs.
448

Do we need to insert to Assigned here? Looks like, we're doing the final processing step

465

See my previous comment about mixing section rules and commands.

Though, I'm quite new to lld development, I think this code can be simplified. Here is my version of createSections()

Can you post a patch then ? I think it is fine to have few visions and chose the most appropriate. Though my one removes a lot of things, so I belive it is useful change.

ELF/LinkerScript.cpp
205–218

Not sure here as this patch itself in my understanding a kind of refactoring.

279

I am not sure what is better. Just afraid that to have 2 collections is a overcomplication itself. Though I`ll think about that.

318

Yes, thanks, this patch uses a bit outdated code. I am going to update it when fix the testcases, though still interested to see general comments.

448

Yes, I think I can remove that, thanks.

majnemer added inline comments.
ELF/LinkerScript.cpp
49

Why not just use an ArrayRef<StringRef>?

81

Ditto.

282

Consider using llvm::find_if, it's a little more concise.

Plan to update this in a hour or two, have one testcase failture left. Also changed lot of things.
Generally I must admit that version of Eugene Leviant looks simplier, though our patches has different
purposed..
Mine about redesign in general and can take much longer time to submit (if it will be).
Also it will be even a bit larger after following update.
His patch solves the problem of section order still.
So Eugene, I can still suggest for you to post the patch if you want, and this one can be landed later if you need the feature
you are solving.

grimar updated this revision to Diff 64263.Jul 17 2016, 12:09 PM
  • Addressed review comments.
  • Reworked the patch. I did a cleanup because found that it is can be written simplier that I tried before.
  • Fixed all testcases except one. Unfortunately I was unable to fix the last yet, it is ELF/arm-thumb-interwork-thunk.s, which fails with "Virtual address is not in any segment" llvm error. Since testcase is just about 1.5 week old I had no chance to find the reason quickly. Though I think single testcase should not be major problem for now since it is WIP patch that is subject of change. I will try to fix it as soon I can.
ELF/LinkerScript.cpp
49

Fixed

81

Fixed

282

Fixed, thanks for hint, I did not know about llvm::find_if

evgeny777 added inline comments.Jul 17 2016, 12:18 PM
test/ELF/linkerscript-va.s
10–12

Why were section order and addresses changed?

grimar added inline comments.Jul 17 2016, 12:26 PM
test/ELF/linkerscript-va.s
10–12

Because section tag is empty here. There is no defined order since all sections just processed as orphans.
Also there is no sorting except by name anymore.
Address changed because there is more PT_LOADs after that sometimes.

That is expected change of this patch, though I think we want to discuss that. I just remember that for previous redesign there were review comments about that we probably want to make user to provide full layout if he/she needs the some exact order. Otherwise it can be changed.

I think FreeBSD script also provides the full layout. Do you have some thoughts about that ?

we probably want to make user to provide full layout if he/she needs the some exact order

Yes, I think it's OK to rely on the user to fully specify the layout if providing a linker script. (And revisit later on if it turns out there's a lot of software that relies on ld/gold defaults for unspecified sections.)

we probably want to make user to provide full layout if he/she needs the some exact order

Yes, I think it's OK to rely on the user to fully specify the layout if providing a linker script. (And revisit later on if it turns out there's a lot of software that relies on ld/gold defaults for unspecified sections.)

That sounds good because I guess that implementing heuristic that gold uses for orphans sections not only lead to code complication, but also a bit hard for understanding. I remember I was confused with that fact, I think just strict following of what is written in script is much more simple to understand for users.

Can you post a patch then ? I think it is fine to have few visions and chose the most appropriate

Added review
https://reviews.llvm.org/D22455

evgeny777 added inline comments.Jul 18 2016, 4:54 AM
ELF/LinkerScript.cpp
206

Looks like findSection is not called by anyone. I suggest removing it

445–454

This code doesn't sort orphan sections, which might make sense on some occasions.
What is the reason for that?

grimar updated this revision to Diff 64308.Jul 18 2016, 5:35 AM
  • Addressed review comments + minor cleanup changes.
ELF/LinkerScript.cpp
206

Right, missed it, removed.

445–454

Because from my understanding we do not need to care about orphans.
I tried to separate the logic of script from anything other as much I could. Just I think it
is easier to process them separatelly.
So since we do not care about orphans, we do not need to sort them.

I am going to simplify and update this in about a day

grimar updated this revision to Diff 64661.Jul 20 2016, 4:41 AM
  • Rebased to head.
  • Simplified and updated.
  • ELF/arm-thumb-interwork-thunk.s is still fails, but that is the only testcase left and I think that is relative to the fact that orphans are not sorted anymore (see few testcases updates because of that also). I am working on fixing that last testcase.

But in general that is in my opinion ok for review, I finished everything I planned.

grimar retitled this revision from [ELF, WIP] - Prototype of possible linkerscript redesign. to [ELF] - Prototype of possible linkerscript redesign..Jul 20 2016, 4:44 AM
grimar updated this object.
grimar updated this object.
grimar updated this revision to Diff 64664.Jul 20 2016, 5:06 AM
  • Removed unrelated changes.
  • Minor changes.
grimar updated this revision to Diff 64666.Jul 20 2016, 5:12 AM
  • Was too hurry to remove "unrelated changes", them are needed. Restored.
grimar updated this revision to Diff 64681.Jul 20 2016, 7:00 AM
  • Restored previously removed logic of sorting orphans. That allows all testcases to pass without any changes. We might want to remove that later, but I see no harm in that for now and also it fits this design well.

Is it any different from https://reviews.llvm.org/D22455 after your changes?

ELF/LinkerScript.cpp
224

You don't need to pass Symtab. You can just use Symtab<ELFT>::X

256

Ditto.

269

I have strong guess, that Rui will ask you to use lambda instead of addSection :)

Is it any different from https://reviews.llvm.org/D22455 after your changes?

Not sure I understood the question, though there are lot of.
This one makes LS to be command driven like mentioned in summary.

ScriptConfiguration does not contain any members required for intermediate representation,
just
std::vector<Command> Commands;
which is used when needed to get info from.
It fully separates LS from other parts of linker.

ruiu added a comment.Jul 20 2016, 7:32 AM

It seems that this overlaps too much with Eugene's changes. There would be lots of conflicts. It's probably faster if you can split it up in a series of incremental patches and commit them one by one because while it is being review, the base code is being updated.

ELF/LinkerScript.h
38

We shouldn't need an end marker. This struct represents a parsing result of a linker script and basically should form a parse tree. If you need an end marker, it means you are not creating a tree.

grimar added inline comments.Jul 20 2016, 8:09 AM
ELF/LinkerScript.h
38

Yes, I do not.
CommandKind here is a just plain sequence of commands for parcing.

grimar added a comment.EditedJul 20 2016, 8:21 AM
In D19976#489779, @ruiu wrote:

It seems that this overlaps too much with Eugene's changes. There would be lots of conflicts. It's probably faster if you can split it up in a series of incremental patches and commit them one by one because while it is being review, the base code is being updated.

Ok, let me prepare a first small patch, to see if I understood correctly what you want. I`ll try to prepare it in a next few hours.

In D19976#489779, @ruiu wrote:

It seems that this overlaps too much with Eugene's changes. There would be lots of conflicts. It's probably faster if you can split it up in a series of incremental patches and commit them one by one because while it is being review, the base code is being updated.

Ok, let me prepare a first small patch, to see if I understood correctly what you want. I`ll try to prepare it in a next few hours.

I prepared some kind of sample. Rui, could you take a look, is that what you expect to see as a tree ? D22581

grimar abandoned this revision.Jul 20 2016, 11:51 PM

Another approach was used finally: D22604