This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Fix of 22906, referencing __start or __stop should keep the section from GC.
ClosedPublic

Authored by grimar on Feb 22 2016, 4:36 AM.

Details

Summary

This fixes the https://llvm.org/bugs/show_bug.cgi?id=22906 bug.

In GNU Binutils, a reference to start or stop is sufficient to prevent the section from being garbage collected.

This patch is a bit different in behavior: we just do not reclaim sections if their section names are valid C identifiers.
So it does not matter of start/stop symbols are referenced.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 48667.Feb 22 2016, 4:36 AM
grimar retitled this revision from to [ELF] - Fix of 22906, referencing __start or __stop should keep the section from GC..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
emaste added a subscriber: emaste.Feb 22 2016, 5:13 AM
emaste added inline comments.Feb 22 2016, 5:18 AM
ELF/MarkLive.cpp
83 ↗(On Diff #48667)

Is the input->output section name mapping necessary? Those section names are not valid C identifiers anyhow.

ed added a subscriber: ed.Feb 22 2016, 6:05 AM

Unfortunately, this patch doesn't apply cleanly against LLD 3.8.0-rc2. If you generate a patch against that version of LLD, I'll perform a package build to see if it was fixing the issue I was running in to specifically. Thanks!

ruiu edited edge metadata.Feb 22 2016, 2:54 PM

Although I understand the point of this patch, is this significant? Do you need this to link some specific program?

ed added a comment.Feb 23 2016, 12:19 AM
In D17502#359151, @ruiu wrote:

Although I understand the point of this patch, is this significant? Do you need this to link some specific program?

Yes. The unit testing binary for CloudABI's C library: https://github.com/NuxiNL/cloudlibc.

Each of the individual unit tests place a structure in a separate section, so they are registered automatically. The unit testing binary's entry point then iterates over all of the structures placed in this section and invokes the callbacks. Relevant sources:

https://github.com/NuxiNL/cloudlibc/blob/master/src/include/testing.h#L74-L101
https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/testing/testing_execute.c#L28-L30

CloudABI recently switched over to LLD on x86-64. The only regression we run in to compared to GNU ld is this specific issue.

In D17502#358468, @ed wrote:

Unfortunately, this patch doesn't apply cleanly against LLD 3.8.0-rc2. If you generate a patch against that version of LLD, I'll perform a package build to see if it was fixing the issue I was running in to specifically. Thanks!

Yes, I tried to port it for LLD 3.8.0-rc2, but found that it is not that easy as seems. There are changes in logic of getOutputSectionName(), for example. In newer code it references linker script methods, in older - internal fields of Writer. So it does not looks possible to easy move this method from Writer class like was done in this patch.
So I am not sure it is worth to implement it for old code as there are too many code changes already, sorry.

ELF/MarkLive.cpp
83 ↗(On Diff #48667)

I think it is. Thats mirrors logic of

void Writer<ELFT>::addStartStopSymbols(OutputSectionBase<ELFT> *Sec) {
  StringRef S = Sec->getName();
  if (!isValidCIdentifier(S))
    return;
...

which uses output section name to create start/stop identificator name.
Also since we can use linker script to place sections to any output section, it seems to be correct for me.

emaste added inline comments.Feb 23 2016, 5:13 AM
ELF/MarkLive.cpp
83 ↗(On Diff #48667)

Ah yes, sorry - I glossed over the arbitrary mapping that might come from the linker script and only looked at the built-in set of .text. etc.

ruiu added inline comments.Feb 23 2016, 3:13 PM
ELF/MarkLive.cpp
83 ↗(On Diff #48667)

This still seems to be overkill and would probably be slow. I don't feel that we really have to worry about a combination of start/stop and linker script. If you are using linker scripts, you can easily keep all sections using KEEP command, and you don't really have to depend on start/stop symbols because you can freely define any symbols using linker scripts.

I'd prefer a simpler implementation. You can check only input section names instead of output section names by adding this code to isReserved in MarkLive.cpp.

default:
  StringRef S = Sec->getSectionName();

  // We do not want to reclaim sections if they can be referred
  // by __start_* and __stop_* symbols.
  if (isValidCIdentifier(S))
    return true;

  return S.startswith(".ctors") || S.startswith(".dtors") ||
         S.startswith(".init") || S.startswith(".fini") ||
         S.startswith(".jcr");
}
grimar updated this revision to Diff 48898.Feb 24 2016, 2:19 AM
grimar edited edge metadata.
  • Applyed Rui's suggestion.
grimar added inline comments.Feb 24 2016, 2:22 AM
ELF/MarkLive.cpp
83 ↗(On Diff #48667)

But don't you want at least to check that there are start_/stop symbols referenced to keep sections alive ?
And also we can check if we are using LS then:

  if (!usingScript() && isValidCIdentifier(S)) {
  if (SymbolBody *B = Symtab->find(("__start_" + S).str()))
    if (B->isUndefined())
      return true;
  if (SymbolBody *B = Symtab->find(("__start_" + S).str()))
    if (B->isUndefined())
      return true;
    return false;
}

Or do you mean Symtab->find() would be slow ? But as far I understand there are not too much sections that are isValidCIdentifiers.

In D17502#358468, @ed wrote:

Unfortunately, this patch doesn't apply cleanly against LLD 3.8.0-rc2. If you generate a patch against that version of LLD, I'll perform a package build to see if it was fixing the issue I was running in to specifically. Thanks!

Hello Ed ! Updated version patch fits LLD 3.8.0-rc2 fine. It is in attachment.

ed added a comment.Feb 24 2016, 3:39 AM

Hi George,

Hello Ed ! Updated version patch fits LLD 3.8.0-rc2 fine. It is in attachment.

That is awesome. I've just done some experiments with your patch for 3.8.0-rc2 and it looks like I can get our unit testing binary to link. The unit testing binary also seems to work all right. Thanks a lot!

I've just included this patch into our package building system and switched it back to using LLD on x86-64 instead of using GNU ld:

https://github.com/NuxiNL/cloudabi-ports/commit/2629d0907a2324d1c257e7412485d02f247e2ae6

ruiu added a comment.Feb 24 2016, 10:49 AM

LGTM with a nit.

ELF/MarkLive.cpp
90 ↗(On Diff #48898)

Yes, I wanted to avoid hash table lookup here, and also the code duplication (although it is just 8 lines of code) was my concern. This is a small corner case so I think we want to avoid overdesigning stuff but just make it work with a minimal amount of work.

test/ELF/startstop-gccollect.s
1 ↗(On Diff #48898)

You want to remove start and stop symbols from the test and just verify that we do not reclaim sections if their section names are valid C identifiers.

grimar added inline comments.Feb 24 2016, 11:13 PM
test/ELF/startstop-gccollect.s
1 ↗(On Diff #48898)

That was already done in latest diff, no ? :)
start_/stop_ are just mentioned in comments to explain the behavior logic.

This revision was automatically updated to reflect the committed changes.
grimar updated this object.Feb 25 2016, 12:52 AM