This change to llvm-objcopy preserves the ELF section sh_link to .symtab
so long as none of the symbol table indices have been changed.
Previously, any invocation of llvm-objcopy including a "no-op" would
clear any section sh_link to .symtab.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sorry for the delay. This looks good me. Is the intention to make ld.lld --icf=safe work after certain llvm-objcopy operations? If so, it'd be good to state the motivation in the summary.
I think the change is good for other sh_link->SHT_SYMTAB sections.
I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.
I wonder whether we still consider changing the design of ELF SHT_LLVM_ADDRSIG (. Mach-O uses relocation-based __addrsig (https://discourse.llvm.org/t/problems-with-mach-o-address-significance-table-generation/63392 https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#:~:text=icf).
llvm/lib/ObjCopy/ELF/ELFObject.cpp | ||
---|---|---|
693–694 | Add {} for if and else. See https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements If the manual loop unswitching necessary? Can we just keep the else part? | |
2533 | ||
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test | ||
21 |
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test | ||
---|---|---|
9 | Perhaps add a comment to cover all the symbol index affecting operations, something like: A section with sh_link referencing SHT_SYMTAB indicates that its content may use the old symbol indices. |
Sorry for the delay. This looks good me. Is the intention to make ld.lld --icf=safe work after certain llvm-objcopy operations? If so, it'd be good to state the motivation in the summary.
I think the change is good for other sh_link->SHT_SYMTAB sections.
No, this was not specifically done for SHT_LLVM_ADDRSIG but for some downstream sections.
I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.
I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?
llvm/lib/ObjCopy/ELF/ELFObject.cpp | ||
---|---|---|
693–694 |
It's not needed but felt like a reasonable optimisation. Would you rather drop the optimisation? |
The section content may be a digest of existing symbols. Adding a symbol nullifies its assumption that its content is comprehensive.
However, I don't have an example that this breaks.
llvm/lib/ObjCopy/ELF/ELFObject.cpp | ||
---|---|---|
693–694 | I'd drop this optimization. It likely doesn't affect performance. There are complex symbol table operations elsewhere.. |
I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.
I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?
The section content may be a digest of existing symbols. Adding a symbol nullifies its assumption that its content is comprehensive.
However, I don't have an example that this breaks.
Yes, the number of symbols could differ. I guess resetting sh_link to 0 would be the safer option but on the other hand if the user is making such modifications then it could be considered at their own risk.
Thanks. The argument for not resetting sh_link to 0 for --add-symbol looks good to me.
llvm/lib/ObjCopy/ELF/ELFObject.cpp | ||
---|---|---|
432 | * -> & for a non-nullable pointer. |
One question; otherwise looks good.
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test | ||
---|---|---|
22 | I'm probably missing something, but why does stripping the last symbol result in an index change (and therefore sh_link 0)? |
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test | ||
---|---|---|
22 | It's treated as a change to the indices because there's now less than there were before, i.e. the last symbol could have been referenced by the linked section. It is a bit conservative because you could add and then remove those symbols and still have all the "original" symbols but that would complicate the patch and seems an unusual use (less) case in reality. |
* -> & for a non-nullable pointer.