Such symbols introduce unwanted dependencies.
In some cases, keeping them may even result in failing linking an executable.
Details
Diff Detail
Event Timeline
I wonder if you really had to add a new flag to the symbol. includeInDynsym can return false if a symbol is in a dead section, no?
The symbol is not in a dead section but referenced from it. Right now I don't know any existing way to say if a symbol is referenced only from dead sections or not.
ELF/Writer.cpp | ||
---|---|---|
1275–1281 | I don't think this is logically correct. If includeInDynsym returns true, the symbol should be included in a .dynsym regardless of other conditions. You should modify other place so that includeInDynsym works correctly. |
ELF/Writer.cpp | ||
---|---|---|
1275–1281 | includeInDynsym is used when computing IsPreemptible. This flag is calculated before calling scanRelocations because it is used there. And with my approach, we can say that a symbol is useless only after scanRelocations is finished and the symbol has not been used anywhere. includeInDynsym is also called in SymbolTable::handleDynamicList, even before GC. At that point, we can't say if a symbol is going to become unreferenced after GC or not. What if we just rename includeInDynsym to something like maybeIncludeInDynsym? |
ELF/Writer.cpp | ||
---|---|---|
1275–1281 | Still it doesn't feel right. It feels like is a violation of constraints each step in lld should maintain. This complex and subtle decision shouldn't be made at this late stage. You are fixing an issue when -gc-section is given. In the garbage collector, you visit all relocations to see whether something is alive or not. So, can't you do everything in that phase? |
- Rebase on the top.
- Add flag 'Live' to Symbol.
- Determine most of the alive symbols during GC phase.
ELF/MarkLive.cpp | ||
---|---|---|
65 | Why do you have to exclude weak symbols? Please expand (or replace) the comment to explain why this code should be like this. | |
276 | Can't you just use getSymbols instead of getLocalSymbols? | |
300–302 | Why do you need to keep such symbols? It seems like practically you can just ignore such symbols. |
ELF/MarkLive.cpp | ||
---|---|---|
65 | What about this? // Weak symbols should not cause adding DT_NEEDED. Anyway, it's mostly a replacement for the code removed from SymbolTable::addShared(). | |
276 | Not all symbols come from the object files. For example, in test/ELF/wrap.s, symbol __real_foo would disappear if we didn't process Symtab->getSymbols(). So, we have to process symbols from Symtab. Additionally, local symbols from object files should also be processed, but we don't need to scan global symbols again because they are already processed at the first step. | |
300–302 | It's a replacement for the removed logic from includeInSymtab(). Now we have Live flag set for all existing symbols after GC phase. There might be several cases when a symbol references an alive section, but we haven't seen it in relocations. For example: $ cat a.s .global _start, foo, bar .text _start: bl foo bl baz .section .text.foo,"ax" foo: nop bar: nop baz: nop $ as a.s -o a.o $ readelf -r a.o Relocation section '.rela.text' at offset 0x178 contains 2 entries: Offset Info Type Sym. Value Sym. Name + Addend 000000000000 00090000011b R_AARCH64_CALL26 0000000000000000 foo + 0 000000000004 00060000011b R_AARCH64_CALL26 0000000000000000 .text.foo + 8 $ ld.lld --gc-sections a.o -o a.out The question is, should symbols bar and baz be kept or not? As you can see, bar points to some location inside the section, but is not involved in any relocation. For baz, things are a bit different; it is used in the code but in the corresponding relocation a section symbol is used instead. The current behavior of lld and other linkers (ld and gold) is consistent, as they all keep such symbols. I believe that, in any case, this patch should not change that because it is intended to solve another issue. |
I feel like this patch is a bit too complicated, but I can't really point out why I felt like that. This patch actually does two different things, no? Do you mind if I ask you to split it? I think one patch can be something like this https://reviews.llvm.org/D40240, and you can create another patch not to add dead symbols to the symbol table later.
The main reason I did it that way is that ELF/gc-sections-shared.s test stated that an external symbol bar2 and DT_NEEDED entry for the corresponding library should be both eighter preserved or dropped. Thus, if dealing with DT_NEEDED is extracted into a separate patch, the test has to be adjusted to that intermediate state accordingly.
Can you upload a patch again? It seems like this patch is not against master but against my patch.
- Do not depend on D40240.
- Mark test/ELF/gc-sections-shared.s to be in an intermediate state.
- Rebased on the tip.
- This is mostly Rafael's version of the patch.
- Added a check for a weak symbol when setting IsNeeded.
- Added the test to check that a weak symbol doesn't cause adding DT_NEEDED; this test might be extracted to a separate patch.
Why do you have to exclude weak symbols? Please expand (or replace) the comment to explain why this code should be like this.