This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Sort by input order within an input section description
ClosedPublic

Authored by MaskRay on Nov 9 2020, 11:23 PM.

Details

Summary

According to
https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html#Input-Section-Basics
for *(.a .b), the order should match the input order:

  • for ld 1.o 2.o, sections from 1.o precede sections from 2.o
  • within a file, .a and .b appear in the section header table order

This patch implements the behavior. The interaction with SORT* and --sort-section is:

Matched sections are ordered by radix sort with the keys being (SORT*, --sort-section, input order),
where SORT* (if present) is most significant.

Note, multiple SORT* within an input section description has undocumented and
confusing behaviors in GNU ld:
https://sourceware.org/pipermail/binutils/2020-November/114083.html
Therefore multiple SORT* is not the focus for this patch but
this patch still strives to have an explainable behavior.

As an example, we partition SORT(a.*) b.* c.* SORT(d.*), into
SORT(a.*) | b.* c.* | SORT(d.*) and perform sorting within groups. Sections
matched by patterns between two SORT* are sorted by input order. If
--sort-alignment is given, they are sorted by --sort-alignment, breaking tie by
input order.

This patch also allows a section to be matched by multiple patterns, previously
duplicated sections could occupy more space in the output and had erroneous zero bytes.

The patch is in preparation for support for
*(SORT_BY_INIT_PRIORITY(.init_array.* .ctors.*)) *(.init_array .ctors),
which will allow LLD to mix .ctors*/.init_array* like GNU ld (gold's --ctors-in-init-array)
PR44698 and PR48096

Diff Detail

Event Timeline

MaskRay created this revision.Nov 9 2020, 11:23 PM
MaskRay requested review of this revision.Nov 9 2020, 11:23 PM

This patch also fixes 2 bugs:
The second SORT in *(SORT(...) SORT(...)) is incorrectly parsed as a file pattern
A section may be added multiple times if it multiple patterns match it

Can these bug fixes be separated to their own reviews (at least the first one seems can be independent)?

lld/ELF/LinkerScript.cpp
436

I am not sure that doing 2 lookups in the DenseMap on each predicate call is a good idea.

445–449

So, with it it is also possible to do lookup in pos twice:
one here and one with pos[sec] |= 1;

It looks a bit dirty. I think usually double lookup situation is avoided.

472

Looks a bit hacky approach to me.

477

You don't need to call sortByPosition here? It is always sorted I believe.

493

I've experimented with the implementation and got the following
version that addresses all my concerns/comments. What do you think?

// Compute and remember which sections the InputSectionDescription matches.
std::vector<InputSectionBase *>
LinkerScript::computeInputSections(const InputSectionDescription *cmd,
                                   ArrayRef<InputSectionBase *> sections) {
  std::vector<InputSectionBase *> ret;
  SetVector<size_t> indices; // Parallel to ret.

  // Collects all sections that satisfy constraints of Cmd.
  size_t sizeLastSort = 0;
  for (const SectionPattern &pat : cmd->sectionPatterns) {
    size_t sizeBefore = ret.size();

    for (size_t I = 0, E = sections.size(); I != E; ++I) {
      InputSectionBase *sec = sections[I];
      if (!sec->isLive() || sec->parent || indices.contains(I))
        continue;

      // For -emit-relocs we have to ignore entries like
      //   .rela.dyn : { *(.rela.data) }
      // which are common because they are in the default bfd script.
      // We do not ignore SHT_REL[A] linker-synthesized sections here because
      // want to support scripts that do custom layout for them.
      if (isa<InputSection>(sec) &&
          cast<InputSection>(sec)->getRelocatedSection())
        continue;

      // Check the name early to improve performance in the common case.
      if (!pat.sectionPat.match(sec->name))
        continue;

      if (!cmd->matchesFile(sec->file) || pat.excludesFile(sec->file) ||
          (sec->flags & cmd->withFlags) != cmd->withFlags ||
          (sec->flags & cmd->withoutFlags) != 0)
        continue;

      indices.insert(I);
      ret.push_back(sec);
    }

    if (pat.sortOuter == SortSectionPolicy::Default)
      continue;

    sortInputSections(
        MutableArrayRef<InputSectionBase *>(ret).slice(sizeBefore),
        pat.sortOuter, pat.sortInner);
    sizeLastSort = ret.size();
  }

  std::vector<size_t> indicesV = indices.takeVector();
  llvm::sort(MutableArrayRef<size_t>(indicesV).slice(sizeLastSort),
             [&](size_t l, size_t r) { return l < r; });

  for (size_t I = sizeLastSort, E = ret.size(); I != E; ++I)
    ret[I] = sections[indicesV[I]];

  sortInputSections(
      MutableArrayRef<InputSectionBase *>(ret).slice(sizeLastSort),
      config->sortSection, SortSectionPolicy::None);
  return ret;
}
MaskRay updated this revision to Diff 304226.Nov 10 2020, 9:07 AM
MaskRay marked 2 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Comments

Move a bugfix to D91180

lld/ELF/LinkerScript.cpp
472

This could be another DenseSet, but that would have more overhead. So I reused the map.

477

This is the key. We need to sort patterns between two SORT by position.

Similarly, we need to sort the trailing patterns outside of the loop (if the last pattern is not SORT)

MaskRay marked 2 inline comments as done.Nov 10 2020, 9:07 AM
MaskRay updated this revision to Diff 304228.Nov 10 2020, 9:11 AM

Fix a comment

MaskRay updated this revision to Diff 304231.Nov 10 2020, 9:22 AM
MaskRay marked an inline comment as done.

Optimize

MaskRay updated this revision to Diff 304238.Nov 10 2020, 9:41 AM

Take into account --sort-section and add a test.

Harbormaster completed remote builds in B78321: Diff 304228.

It took a lot of staring at that bit of code to work out what was going on. If I have understood it correctly. In particular I got confused with the sizeBefore and sizeLastSort with sizeBefore being the larger value. It took me a while to spot the continue. If I've understood correctly:

  • sizeBefore is really sizeOfRetBeforeNextSort
  • sizeLastSort is really sizeOfRetAfterPreviousSort

If possible could we use Prev rather than Last as it is more specific. Last can mean the last one we went by, or the last one in the list. I strongly recommend adding something like Next to sizeBefore so that it is clear that sizeBefore >= sizeLastSort.

It would be good to write a comment explaining how the partitions work as there are a lot of sorts, some of the indices and some of the sections.

MaskRay updated this revision to Diff 304398.Nov 10 2020, 9:55 PM

Improve comments
Change size* names

psmith added inline comments.Nov 11 2020, 3:17 AM
lld/ELF/LinkerScript.cpp
482

Thanks for the update, I think the new names are much better. One small nit, I think in sizeBeforePat should be sizeBeforeCurrPat in

ret[sizeBeforePat,ret.size())
grimar added inline comments.Nov 11 2020, 3:23 AM
lld/ELF/LinkerScript.cpp
430

The more common name in LLD for this would be seen

MaskRay updated this revision to Diff 304603.Nov 11 2020, 11:26 AM
MaskRay marked 2 inline comments as done.
MaskRay edited the summary of this revision. (Show Details)

Improve comments

MaskRay retitled this revision from [ELF] Use input order instead of pattern order within an input section description to [ELF] Sort by input order within an input section description.Nov 11 2020, 11:27 AM

Thanks for the comments. I have improved comments and rewritten a significant portion of the description.
I added this paragraph:

Note, multiple SORT* within an input section description has undocumented and confusing behaviors in GNU ld: https://sourceware.org/pipermail/binutils/2020-November/114083.html
Therefore multiple SORT* is not the focus for this patch but this patch still strives to have an explainable behavior.

grimar accepted this revision.Nov 11 2020, 11:40 PM

LGTM. Please wait for final word from Peter.

This revision is now accepted and ready to land.Nov 11 2020, 11:40 PM

Thanks for the updates, LGTM too.

MaskRay edited the summary of this revision. (Show Details)Nov 12 2020, 8:48 AM