This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Do not keep symbols if they referenced only from discarded sections.
ClosedPublic

Authored by ikudrin on Oct 11 2017, 2:26 AM.

Diff Detail

Event Timeline

ikudrin created this revision.Oct 11 2017, 2:26 AM
grimar added a subscriber: grimar.Oct 11 2017, 7:58 AM
ikudrin updated this revision to Diff 120200.Oct 25 2017, 1:12 AM

Ping.

  • Rebased on the tip.
  • Slightly improved tests.
ruiu edited edge metadata.Oct 25 2017, 8:44 AM

I wonder if you really had to add a new flag to the symbol. includeInDynsym can return false if a symbol is in a dead section, no?

The symbol is not in a dead section but referenced from it. Right now I don't know any existing way to say if a symbol is referenced only from dead sections or not.

ruiu added inline comments.Oct 26 2017, 3:24 PM
ELF/Writer.cpp
1289–1299

I don't think this is logically correct. If includeInDynsym returns true, the symbol should be included in a .dynsym regardless of other conditions. You should modify other place so that includeInDynsym works correctly.

ikudrin added inline comments.Oct 27 2017, 6:05 AM
ELF/Writer.cpp
1289–1299

includeInDynsym is used when computing IsPreemptible. This flag is calculated before calling scanRelocations because it is used there. And with my approach, we can say that a symbol is useless only after scanRelocations is finished and the symbol has not been used anywhere.

includeInDynsym is also called in SymbolTable::handleDynamicList, even before GC. At that point, we can't say if a symbol is going to become unreferenced after GC or not.

What if we just rename includeInDynsym to something like maybeIncludeInDynsym?

ruiu added inline comments.Oct 27 2017, 12:50 PM
ELF/Writer.cpp
1289–1299

Still it doesn't feel right. It feels like is a violation of constraints each step in lld should maintain. This complex and subtle decision shouldn't be made at this late stage.

You are fixing an issue when -gc-section is given. In the garbage collector, you visit all relocations to see whether something is alive or not. So, can't you do everything in that phase?

ikudrin updated this revision to Diff 122219.Nov 9 2017, 3:17 AM
ikudrin retitled this revision from [ELF] Avoid keeping useless undefined symbols in .dynsym. to [ELF] Do not keep symbols if they referenced only from discarded sections..
ikudrin edited the summary of this revision. (Show Details)
  • Rebase on the top.
  • Add flag 'Live' to Symbol.
  • Determine most of the alive symbols during GC phase.
ruiu added inline comments.Nov 14 2017, 12:20 AM
ELF/MarkLive.cpp
65 ↗(On Diff #122219)

Why do you have to exclude weak symbols? Please expand (or replace) the comment to explain why this code should be like this.

281 ↗(On Diff #122219)

Can't you just use getSymbols instead of getLocalSymbols?

310–312 ↗(On Diff #122219)

Why do you need to keep such symbols? It seems like practically you can just ignore such symbols.

ikudrin added inline comments.Nov 16 2017, 3:40 AM
ELF/MarkLive.cpp
65 ↗(On Diff #122219)

What about this?

// Weak symbols should not cause adding DT_NEEDED.

Anyway, it's mostly a replacement for the code removed from SymbolTable::addShared().

281 ↗(On Diff #122219)

Not all symbols come from the object files. For example, in test/ELF/wrap.s, symbol __real_foo would disappear if we didn't process Symtab->getSymbols(). So, we have to process symbols from Symtab. Additionally, local symbols from object files should also be processed, but we don't need to scan global symbols again because they are already processed at the first step.

310–312 ↗(On Diff #122219)

It's a replacement for the removed logic from includeInSymtab(). Now we have Live flag set for all existing symbols after GC phase.

There might be several cases when a symbol references an alive section, but we haven't seen it in relocations. For example:

$ cat a.s
.global _start, foo, bar
.text
_start:
	bl foo
	bl baz
.section .text.foo,"ax"
foo:
	nop
bar:
	nop
baz:
	nop
$ as a.s -o a.o
$ readelf -r a.o
Relocation section '.rela.text' at offset 0x178 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000000  00090000011b R_AARCH64_CALL26  0000000000000000 foo + 0
000000000004  00060000011b R_AARCH64_CALL26  0000000000000000 .text.foo + 8
$ ld.lld --gc-sections a.o -o a.out

The question is, should symbols bar and baz be kept or not? As you can see, bar points to some location inside the section, but is not involved in any relocation. For baz, things are a bit different; it is used in the code but in the corresponding relocation a section symbol is used instead.

The current behavior of lld and other linkers (ld and gold) is consistent, as they all keep such symbols. I believe that, in any case, this patch should not change that because it is intended to solve another issue.

ruiu added a comment.Nov 20 2017, 1:22 AM

I feel like this patch is a bit too complicated, but I can't really point out why I felt like that. This patch actually does two different things, no? Do you mind if I ask you to split it? I think one patch can be something like this https://reviews.llvm.org/D40240, and you can create another patch not to add dead symbols to the symbol table later.

ikudrin updated this revision to Diff 123596.Nov 20 2017, 7:57 AM
ikudrin edited the summary of this revision. (Show Details)
  • Extract corrections in the existing tests into D40253.
  • Use handling of DT_NEEDED from D40240.
  • Don't try to maintain actual Live flag for all symbols. Just mark symbols used in relocations in live sections and make the final decision later in includeInSymtab.
In D38790#930021, @ruiu wrote:

I feel like this patch is a bit too complicated, but I can't really point out why I felt like that. This patch actually does two different things, no? Do you mind if I ask you to split it? I think one patch can be something like this https://reviews.llvm.org/D40240, and you can create another patch not to add dead symbols to the symbol table later.

The main reason I did it that way is that ELF/gc-sections-shared.s test stated that an external symbol bar2 and DT_NEEDED entry for the corresponding library should be both eighter preserved or dropped. Thus, if dealing with DT_NEEDED is extracted into a separate patch, the test has to be adjusted to that intermediate state accordingly.

ruiu added a comment.Nov 20 2017, 11:23 PM

Can you upload a patch again? It seems like this patch is not against master but against my patch.

ikudrin updated this revision to Diff 123727.Nov 21 2017, 12:36 AM
  • Do not depend on D40240.
  • Mark test/ELF/gc-sections-shared.s to be in an intermediate state.
ikudrin updated this revision to Diff 124139.Nov 24 2017, 2:38 AM
  • Rebased on the tip.
  • This is mostly Rafael's version of the patch.
  • Added a check for a weak symbol when setting IsNeeded.
  • Added the test to check that a weak symbol doesn't cause adding DT_NEEDED; this test might be extracted to a separate patch.
This revision was automatically updated to reflect the committed changes.