This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Linkerscript: add InputSectionDescription command to LS parser.
ClosedPublic

Authored by grimar on Jul 21 2016, 2:27 AM.

Details

Summary

This adds InputSectionDescription command to represent
the input section declaration.

This leads to next cleanup:
SectionRule removed.
ScriptConfiguration::Sections mamber removed.
LinkerScript<ELFT>::getOutputSection() removed.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 64843.Jul 21 2016, 2:27 AM
grimar retitled this revision from to [ELF] - Linkerscript: add InputSectionDescription command to LS parser..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar, evgeny777.
evgeny777 added inline comments.Jul 21 2016, 2:32 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

It looks like a slow-down compared with previous one, no?

grimar added inline comments.Jul 21 2016, 2:38 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

That fits in new design. I am sure we will spend time on optimization of LS later,
when will be able to run benchmarks. And probably many places will need tweaking,
for now I don't think it is critical to change until major slowdown is proven.

evgeny777 added inline comments.Jul 21 2016, 2:47 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

As far as I understand if --gc-sections command line switch is set, then markLive() calls LinkerScript::shouldKeep() for every input section. There could easily be millions of input sections in large projects like WebKit or LLVM itself. It might not be a good idea to sacrifice performance that much in favor of design improvements.

grimar added inline comments.Jul 21 2016, 2:54 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

I think the most slow here is the call of globMatch. Amount of calls remained the same.

evgeny777 added inline comments.Jul 21 2016, 3:30 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

Well, the time complexity in your new code is O(I*R), I - number of sections, R - number of rules
Imagine you don't have any kept sections and don't call globMatch. Let's say you have 1 million sections and 10 rules total in script
Your for statements are actually for_each, at average you need about 20 instructions (with -O2 and 100 without optimization) for a single iteration (this is just 'for' statement without body).
Let's say you'll need 50 instructions for each rule (just assumption - can't tell the exact number), so

1 000 000 * 10 * 50 = 500 000 000
On 2 Ghz processor and 1 clock cycle per instruction

t = (5 * 10^8) / (2*10^9) = 0.25 sec.
for doing nothing

Is it worth that?

ruiu added inline comments.Jul 21 2016, 3:36 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

Yeah, I have a feeling that we should keep KeptSections mostly for simplicity. The reason why we want to construct an AST in a way that that reflects the structure of a given linker script is because we want to construct output sections in the same order as they appear in a linker script. On the other hand, "KEEP" seems to have been just piggybacking the grammar which they already had. It doesn't belong to this much deep data structure.

250 ↗(On Diff #64843)

I think you still want to iterate over SECTIONS subcommands. Eventually we want to create output sections and add input sections to them while we are visiting SECTIONS subcommands so that the results are naturally sorted by the same order as they appear in a linker script, so that we don't need to sort the results afterwards.

745 ↗(On Diff #64843)

Use auto.

754 ↗(On Diff #64843)

Ditto

grimar updated this revision to Diff 64854.Jul 21 2016, 4:16 AM
grimar updated this object.
  • Addressed review comments.
evgeny777 added inline comments.Jul 21 2016, 4:22 AM
ELF/LinkerScript.h
45 ↗(On Diff #64854)

See comment above:

This enum represents what we can observe in SECTIONS tag of script.
(https://sourceware.org/binutils/docs/ld/SECTIONS.html#SECTIONS)

If you follow the link:

Each sections-command may of be one of the following:

- an ENTRY command (see Entry command)
- a symbol assignment (see Assignments)
- an output section description
- an overlay description
grimar added inline comments.Jul 21 2016, 4:24 AM
ELF/LinkerScript.cpp
220 ↗(On Diff #64843)

Reverted to KeptSections.

250 ↗(On Diff #64843)

I see. Done. Looks a little bit more complicated now though.

745 ↗(On Diff #64843)

Done.

754 ↗(On Diff #64843)

Done.

ELF/LinkerScript.h
45 ↗(On Diff #64854)

Well, comment is depricated. I guess we do not want to treat SectionsCommandKind as written anymore, so it can be removed completely,
Rui, what do you think ?

ruiu added inline comments.Jul 21 2016, 4:35 AM
ELF/LinkerScript.cpp
202 ↗(On Diff #64854)

Do you still need this function?

247 ↗(On Diff #64854)

OutSec as a name for a command is very confusing.

249 ↗(On Diff #64854)

Flip if conditions and do continue to reduce indentation levels.

ELF/LinkerScript.h
45 ↗(On Diff #64854)

We should just remove this comment and leave the link to the documentation.

76 ↗(On Diff #64854)

Now you can remove this.

grimar updated this revision to Diff 64860.Jul 21 2016, 5:16 AM
  • Addressed review comments.
ELF/LinkerScript.cpp
202 ↗(On Diff #64854)

getOutputSection() was used in LinkerScript<ELFT>::isDiscarded().
I changed logic to avoid the using as "/DISCARD/" can just be processed as
special section kind.

247 ↗(On Diff #64854)

Changed to OutSec.

249 ↗(On Diff #64854)

Done.

ELF/LinkerScript.h
45 ↗(On Diff #64854)

Done.

76 ↗(On Diff #64854)

Done.

grimar added inline comments.Jul 21 2016, 5:28 AM
ELF/LinkerScript.cpp
247 ↗(On Diff #64854)

I mean to OutCmd, sorry.

ruiu added inline comments.Jul 21 2016, 5:37 AM
ELF/LinkerScript.cpp
240 ↗(On Diff #64860)

Again, InSec is a confusing name.

249–253 ↗(On Diff #64860)

Regular for-each loop is probably easier to read.

ELF/LinkerScript.h
93–94 ↗(On Diff #64860)

Remove.

grimar updated this revision to Diff 64865.Jul 21 2016, 5:53 AM
  • Addressed review comments.
ELF/LinkerScript.cpp
240 ↗(On Diff #64860)

Fixed.

249–253 ↗(On Diff #64860)

Changed.

ELF/LinkerScript.h
93–94 ↗(On Diff #64860)

Done.

grimar updated this object.Jul 21 2016, 5:54 AM
ruiu added inline comments.Jul 21 2016, 6:01 AM
ELF/LinkerScript.cpp
234 ↗(On Diff #64865)

This change will make this function too long, but that's probably okay for now. I'll try to split it up.

259 ↗(On Diff #64865)

break at end of a for-loop?

261–262 ↗(On Diff #64865)

If you need {}, please put {} all inner loops.

grimar added inline comments.Jul 21 2016, 6:06 AM
ELF/LinkerScript.cpp
259 ↗(On Diff #64865)

Yes, break. I need to exit this loop after first AddInputSec. As we do not want to add the section again if there are multiple pattern matches.
That was why I used std::find_if in previous iteration.
May be worth to revert to previous version ?

ruiu added inline comments.Jul 21 2016, 6:09 AM
ELF/LinkerScript.cpp
259 ↗(On Diff #64865)

You can define a function taking two parameters, a stringref and an arrayref of stringref, which returns true if the former string glob-matches one of the latter strings.

grimar added inline comments.Jul 21 2016, 6:27 AM
ELF/LinkerScript.cpp
259 ↗(On Diff #64865)

What about next ?

static bool hasGlobMatch(StringRef T, ArrayRef<StringRef> L) {
  for (StringRef S : L)
    if (globMatch(S, T))
      return true;
  return false;
}
grimar updated this revision to Diff 64871.Jul 21 2016, 6:35 AM
  • Addressed review comments.
ruiu added inline comments.Jul 21 2016, 6:36 AM
ELF/LinkerScript.cpp
259 ↗(On Diff #64865)

That's what I meant. (a few nits are: I'd name the function match because "GlobMatch" is not an object or a thing, rename T Pattern, and swap T and L to match the parameter order of globMatch.)

grimar added inline comments.Jul 21 2016, 6:54 AM
ELF/LinkerScript.cpp
266 ↗(On Diff #64871)

Afraid I am a bit lost in this puzzle :)
Is that what you mean ?

static bool match(StringRef T, ArrayRef<StringRef> Pattern) {
  for (StringRef S : Pattern)
    if (globMatch(S, T))
      return true;
  return false;
}
ruiu accepted this revision.Jul 21 2016, 7:01 AM
ruiu edited edge metadata.

LGTM with the change.

ELF/LinkerScript.cpp
266 ↗(On Diff #64871)
  1. I want parameters of similar functions be ordered in a consistent way. However, your new function takes [Pattern Target] although globMatch takes [Target Pattern]
  2. T is too short and doesn't convey it is a glob pattern rather.
  3. Apparently Pattern is not a right name for ArrayRef<StringRef> that doesn't contain patterns

    static bool match(StringRef Pattern, ArrayRef<StringRef> Arr) { for (StringRef S : Arr) if (globMatch(Pattern, S)) return true; return false; }
This revision is now accepted and ready to land.Jul 21 2016, 7:01 AM
grimar added inline comments.Jul 21 2016, 7:05 AM
ELF/LinkerScript.cpp
266 ↗(On Diff #64871)

Ok, now I see what you meant. Sorry for my confusion and thanks a lot for all reviews !

grimar added inline comments.Jul 21 2016, 7:29 AM
ELF/LinkerScript.cpp
266 ↗(On Diff #64871)
// Returns true if S matches T. S can contain glob meta-characters.
// The asterisk ('*') matches zero or more characters, and the question
// mark ('?') matches one character.
bool elf::globMatch(StringRef S, StringRef T) {
  for (;;) {
    if (S.empty())
      return T.empty();
    if (S[0] == '*') {
      S = S.substr(1);
      if (S.empty())
        // Fast path. If a pattern is '*', it matches anything.
        return true;
      for (size_t I = 0, E = T.size(); I < E; ++I)
        if (globMatch(S, T.substr(I)))
          return true;
      return false;
    }
    if (T.empty() || (S[0] != T[0] && S[0] != '?'))
      return false;
    S = S.substr(1);
    T = T.substr(1);
  }
}
  1. According to comments S is pattern. So it takes [Pattern, Target].
  2. My function searches Name in a list of wildcards, that mean second is a list of patterns, so ArrayRef<StringRef> *contains* patterns.

For function to work as I need, it needs to be:

if (globMatch(S, Pattern))

instead of

if (globMatch(Pattern, S))

And that is what inconsistent with globMatch,
which has Pattern as first argument.

I am really not trying to argue here. I`ll commit it and just please adjust the naming as you want after that.

This revision was automatically updated to reflect the committed changes.