This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][ELF] Preserve sh_link to .symtab when applicable
ClosedPublic

Authored by andrewng on May 18 2023, 4:10 AM.

Details

Summary

This change to llvm-objcopy preserves the ELF section sh_link to .symtab
so long as none of the symbol table indices have been changed.
Previously, any invocation of llvm-objcopy including a "no-op" would
clear any section sh_link to .symtab.

Diff Detail

Event Timeline

andrewng created this revision.May 18 2023, 4:10 AM
andrewng requested review of this revision.May 18 2023, 4:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2023, 4:10 AM
MaskRay accepted this revision.EditedMay 25 2023, 11:00 AM

Sorry for the delay. This looks good me. Is the intention to make ld.lld --icf=safe work after certain llvm-objcopy operations? If so, it'd be good to state the motivation in the summary.
I think the change is good for other sh_link->SHT_SYMTAB sections.

I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.

I wonder whether we still consider changing the design of ELF SHT_LLVM_ADDRSIG (. Mach-O uses relocation-based __addrsig (https://discourse.llvm.org/t/problems-with-mach-o-address-significance-table-generation/63392 https://maskray.me/blog/2020-11-15-explain-gnu-linker-options#:~:text=icf).

llvm/lib/ObjCopy/ELF/ELFObject.cpp
693–694

Add {} for if and else. See https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

If the manual loop unswitching necessary? Can we just keep the else part?

2533
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test
21
This revision is now accepted and ready to land.May 25 2023, 11:00 AM
MaskRay added inline comments.May 25 2023, 11:07 AM
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test
9

Perhaps add a comment to cover all the symbol index affecting operations, something like:

A section with sh_link referencing SHT_SYMTAB indicates that its content may use the old symbol indices.
If symbol indices changed, reset sh_link to 0 to inform tools like linkers that the link is invalidated.

Sorry for the delay. This looks good me. Is the intention to make ld.lld --icf=safe work after certain llvm-objcopy operations? If so, it'd be good to state the motivation in the summary.
I think the change is good for other sh_link->SHT_SYMTAB sections.

No, this was not specifically done for SHT_LLVM_ADDRSIG but for some downstream sections.

I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.

I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?

llvm/lib/ObjCopy/ELF/ELFObject.cpp
693–694

If the manual loop unswitching necessary? Can we just keep the else part?

It's not needed but felt like a reasonable optimisation. Would you rather drop the optimisation?

Sorry for the delay. This looks good me. Is the intention to make ld.lld --icf=safe work after certain llvm-objcopy operations? If so, it'd be good to state the motivation in the summary.
I think the change is good for other sh_link->SHT_SYMTAB sections.

No, this was not specifically done for SHT_LLVM_ADDRSIG but for some downstream sections.

I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.

I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?

The section content may be a digest of existing symbols. Adding a symbol nullifies its assumption that its content is comprehensive.
However, I don't have an example that this breaks.

llvm/lib/ObjCopy/ELF/ELFObject.cpp
693–694

I'd drop this optimization. It likely doesn't affect performance. There are complex symbol table operations elsewhere..

I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.

I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?

The section content may be a digest of existing symbols. Adding a symbol nullifies its assumption that its content is comprehensive.
However, I don't have an example that this breaks.

Yes, the number of symbols could differ. I guess resetting sh_link to 0 would be the safer option but on the other hand if the user is making such modifications then it could be considered at their own risk.

I have a comment whether we should make new symbols (while existing symbol indices are unchanged) reset sh_link to zero.

I did consider this but decided not to reset sh_link to 0. Do you have a reason for why resetting sh_link to 0 in this scenario would be better/preferable?

The section content may be a digest of existing symbols. Adding a symbol nullifies its assumption that its content is comprehensive.
However, I don't have an example that this breaks.

Yes, the number of symbols could differ. I guess resetting sh_link to 0 would be the safer option but on the other hand if the user is making such modifications then it could be considered at their own risk.

Thanks. The argument for not resetting sh_link to 0 for --add-symbol looks good to me.

andrewng updated this revision to Diff 526067.May 26 2023, 8:09 AM

Updated for review comments and suggestions.

andrewng marked 5 inline comments as done.May 26 2023, 8:10 AM
MaskRay accepted this revision.May 26 2023, 8:53 AM
MaskRay added inline comments.
llvm/lib/ObjCopy/ELF/ELFObject.cpp
432

* -> & for a non-nullable pointer.

andrewng updated this revision to Diff 526143.May 26 2023, 11:33 AM
andrewng marked an inline comment as done.
jhenderson accepted this revision.May 30 2023, 12:58 AM

One question; otherwise looks good.

llvm/test/tools/llvm-objcopy/ELF/symtab-link.test
22

I'm probably missing something, but why does stripping the last symbol result in an index change (and therefore sh_link 0)?

andrewng added inline comments.Jun 5 2023, 11:22 AM
llvm/test/tools/llvm-objcopy/ELF/symtab-link.test
22

It's treated as a change to the indices because there's now less than there were before, i.e. the last symbol could have been referenced by the linked section. It is a bit conservative because you could add and then remove those symbols and still have all the "original" symbols but that would complicate the patch and seems an unusual use (less) case in reality.