The patch allows strip of entire symbol table in executable or DSO files with --strip-unneeded. This seems to be what GNU objcopy is doing.
Details
- Reviewers
rupprecht jhenderson jakehehrlich • espindola alexander-shaposhnikov - Commits
- rGc7e6d14c6c30: [llvm-objcopy] Allow strip symtab in executables and DSOs
rL366787: [llvm-objcopy] Allow strip symtab in executables and DSOs
rG194f16b3548b: [llvm-objcopy] Allow strip symtab from executables and DSOs
rL365193: [llvm-objcopy] Allow strip symtab from executables and DSOs
Diff Detail
Event Timeline
Does GNU remove all the symbols or remove the symbol table entirely? The two are slightly different. If the latter, we should look at whether it's specific to the --strip-unneeded switch or more generally when all symbols end up getting discarded.
test/tools/llvm-objcopy/ELF/strip-unneeded2.test | ||
---|---|---|
1 ↗ | (On Diff #198612) | Perhaps worth renaming this test to something like strip-unneeded-all-symbols.test. This test could also do with a brief comment at the top stating its purpose. |
9 ↗ | (On Diff #198612) | Nit: reduce the unnecessary padding amount, here and below. |
20 ↗ | (On Diff #198612) | What's the point of having DynamicSymbols here? |
29–32 ↗ | (On Diff #198612) | I think you can simplify this test by removing most of these fields. The only relevant fields are the Binding, Section and Name fields, I believe. Also, what happens for undefined STB_GLOBALs in GNU objcopy? There should be a test for that case here. |
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
424 | What does GNU objcopy do for other ET_* types, e.g. ET_CORE or ET_* values in the OS and PROC-specific ranges? Could this be inverted to != ET_REL? |
I'm confused as to what this is trying to accomplish. This really just marks every symbol as unneeded in a DSO and keeps the behavior for other cases. In that case why not strip the symbol table all togethor? This is a very round about way to accomplish that.
Can you document exactly what behavior you're seeing?
I'm confused as to what this is trying to accomplish.
I'm trying to strip symtab, because it's not needed in executables. We have .dynsym for dynamic linker, but it is not even readed by llvm-objcopy, so all symbols are coming from .symtab
This really just marks every symbol as unneeded in a DSO and keeps the behavior for other cases. In that case why not strip the symbol table all togethor? This is a very round about way to accomplish that.
This probably makes sense as well, but marking symbols unneeded seems more logical - one can figure out what is stripped looking at the single place in the code.
Can you document exactly what behavior you're seeing?
GNU objcopy strips all symbols from symtab in non-relocatable (ET_DYN or ET_EXEC) objects if those symbols are not referenced by relocations.
My understanding is that in DSOs and executables all static relocations should have been resolved by static linker, so we shouldn't bother about them.
For details please look at filter_symbols function in objcopy.c
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
---|---|---|
424 | May be. GNU objcopy considers everything which is not ET_DYN or ET_EXEC relocatable. See filter_symbols in objcopy.c int relocatable = (abfd->flags & (EXEC_P | DYNAMIC)) == 0; |
I'm still a bit unclear. Does it strip symbols that are duplicates of symbols in .dynsym? Does it just get rid of the the whole symbol table no matter what?
I think --strip-unneeded already strips debug sections as well so you can't "look in one place" to figure out what it does. You'll need extra stripping options regardless. If there is no documented behavior of GNU objcopy leaving .symtab in the ET_DYN and ET_EXEC cases we should just strip the symbol table all together.
Also what "strip unneeded" means is a bit unclear. There unfortunately isn't a spec anywhere nor is the use case implied by "needed" defined. We just have to match GNU objcopy. Arguments, which are quite reasonable otherwise, like ".symtab isn't needed to run the binary" don't target the right goal. If all you need is to run the binary you should use --strip-sections which gets rid of *everything* except what is needed to run/dynamically link the binary. The reality is that --strip-ubneeded is some hodge podge of behaviors we're just trying to match.
I'm still a bit unclear. Does it strip symbols that are duplicates of symbols in .dynsym? Does it just get rid of the the whole symbol table no matter what?
For ET_DYN/EXEC GNU objcopy strips all symbols from .symtab when they're not referenced by relocations. Given DYN/EXEC is a product of a static linker there're no such
symbols, so effectively all symbols from .symtab are stripped. As a result symtab section becomes empty and is not emitted to output binary.
I think --strip-unneeded already strips debug sections as well
llvm-objcopy --help ... -strip-unneeded Remove all symbols not needed by relocations ...
Also what "strip unneeded" means is a bit unclear.
I think it's pretty clear. You remove symbols not referenced by relocations compacting both .symtab and .strtab
There unfortunately isn't a spec anywhere nor is the use case implied by "needed" defined. We just have to match GNU objcopy.
I couldn't find any docs on the subject, so I can only suggest taking a look at GNU objcopy filter_symbols.
I'm still confused. If the behavior intended only has to do with the existence of relocations why are we making this change? A clear difference of behavior needs to be pointed out and that hasn't been made clear.
I'm still confused. If the behavior intended only has to do with the existence of relocations why are we making this change? A clear difference of behavior needs to be pointed out and that hasn't been made clear.
llvm-objcopy never strips STT_SECTION, STB_GLOBAL and SHN_UNDEF symbols, not matter if they referenced by relocations or not. This is right thing to do in ET_REL, but doesn't make sense in ET_DYN/EXEC.
See isUnneededSymbol.
GNU strip does this. In a really simple test -- build a dummy "int main() { return 0; }" executable, then run strip/llvm-strip --strip-unneeded -- GNU strip removes .symtab and .strtab, whereas llvm-strip does not. The .symtab/.strtab sections are trivial (only containing the null symbol), so we should do the final step and remove it. (Both leave in .shstrtab though).
So if we're going for better GNU strip emulation, this patch doesn't even go far enough, although it is a step in the right direction.
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
---|---|---|
424 | Seconding what James said, I think we should refactor this out to a separate method. That way, any downstream users using llvm-objcopy with custom object file types that should be considered relocatable have a narrow/well defined place to put their local modifications and avoid merge conflicts any time this part is changed. Ideally, this Object would be the one from libObject (ELFObjectFile<ELFT>::isRelocatableObject() in llvm/Object/ELFObjectFile.h), which I'm sure downstream users already have a local change for. Sadly, it's not, but downstream users can just add a second local patch. I'm fine with either way, slightly preferring the way you have it here just because it probably emulates GNU objcopy better, but checking Type != ET_REL should also work. |
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
---|---|---|
424 | In case I wasn't specific enough: by "refactor this" I mean just the (Obj.Type == ET_EXEC || Obj.Type == ET_DYN) check. |
- Addressed comments
- Trivial .symtab and .strtab are now removed completely from non-relocatable objects
A number of my test comments haven't been addressed yet. Please do so.
tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
646–649 | I'm not sure I see the point of this function. The symbol at index 0 is guaranteed to be the null symbol. | |
651 | I'd be inclined to call this "isEmpty" or simply "empty". Whilst not strictly empty, the null symbol is not significant. | |
1792 | Missing trailing full stop. | |
1796 | names, in such case -> names. In such a |
tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
651 | In fact SymbolTableSection::empty() already exists and is implemented like this (only checking size == 1, not verifying it's null) | |
1791–1793 | I don't think this part needs to be guarded by isRelocatable(). Even GNU objcopy will drop the symbol table if it's empty (only contains the null symbol). Demo: $ cat /tmp/yaml.txt --- !ELF FileHeader: Class: ELFCLASS64 Data: ELFDATA2LSB Type: ET_REL Machine: EM_X86_64 Sections: - Name: .text Type: SHT_PROGBITS Symbols: $ yaml2obj /tmp/yaml.txt > /tmp/null.o $ objcopy /tmp/null.o /tmp/null-copy.o $ readelf -SW /tmp/null.o /tmp/null-copy.o File: /tmp/null.o There are 5 section headers, starting at offset 0x40: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .text PROGBITS 0000000000000000 000180 000000 00 0 0 0 [ 2] .symtab SYMTAB 0000000000000000 000180 000018 18 3 1 8 [ 3] .strtab STRTAB 0000000000000000 000198 000001 00 0 0 1 [ 4] .shstrtab STRTAB 0000000000000000 000199 000021 00 0 0 1 ... File: /tmp/null-copy.o There are 3 section headers, starting at offset 0x58: Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 [ 1] .text PROGBITS 0000000000000000 000040 000000 00 0 0 1 [ 2] .shstrtab STRTAB 0000000000000000 000040 000011 00 0 0 1 ... If I remove isRelocatable() here, I can match that behavior. |
test/tools/llvm-objcopy/ELF/strip-unneeded2.test | ||
---|---|---|
20 ↗ | (On Diff #198612) | The point is to test that dynamic symbols are not touched by --strip-unneeded run on executable. Do you think it's not needed? |
29–32 ↗ | (On Diff #198612) |
Removal. .symtab is not used by dynamic loader, so it's not needed. Undefined STB_GLOBAL symbols are placed in .dynsym by static linker for runtime resolution. |
tools/llvm-objcopy/ELF/Object.cpp | ||
1791–1793 |
Relocation sections are typically linked to .symtab so you can't remove it even if it is empty. It is still possible to check object for relocation sections and then check those sections link field, but it seems like an overkill, IMO. |
The point is to test that dynamic symbols are not touched by --strip-unneeded run on executable. Do you think it's not needed?
It's not needed. We don't do anything in llvm-objcopy to the dynamic symbols ever, so I don't think we need to test this in this specific case.
test/tools/llvm-objcopy/ELF/no-symbol-relocation.test | ||
---|---|---|
9 | Why are you changing this? | |
test/tools/llvm-objcopy/ELF/strip-unneeded-all-symbols.test | ||
3 | eliminate the static | |
4 | by the dynamic loader. | |
18–21 | As noted in an earlier diff, dynamic symbols aren't related to this change, so can be deleted (see out-of-line comment). | |
24 | The type is irrelevant to the behaviour, so it can be deleted. | |
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
424 | Why is this here? Is it tested? | |
tools/llvm-objcopy/ELF/Object.cpp | ||
1791–1793 | Probably worth a comment in the code to this effect. I don't feel strongly like we need to drop symbol tables in relocatable files. | |
1792 | Ping here. |
test/tools/llvm-objcopy/ELF/no-symbol-relocation.test | ||
---|---|---|
9 | Original yaml generates executable with relocation section. This triggers an error when this patch is applied, because llvm-objcopy attempts to strip symtab and fails because of the linked relocation section. | |
tools/llvm-objcopy/ELF/ELFObjcopy.cpp | ||
424 | I don't think I understand the question. Do you mean the check for !Obj.isRelocatable()? If so it's required to strip all symbols from static symbol table in executable. This in turn, will make static symbol empty and eventually it will be removed in removeUnneededSections |
tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
1791 | This should not be a method on Object. The goal is to keep Object highly generic. Please see how to remove this from Object's interface. | |
1804–1806 | I realize this is contrary to James's recommendation that this be a separate function but it would be nice if we used the same removeSections call to do this. This isn't critical however. |
llvm/trunk/tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
1998 ↗ | (On Diff #208155) | I think this method is overly specific. Is there a reason that the user can't invoke removeSections themselves? |
I'm seeing an error pulling this into our toolchain when trying to copy some files:
llvm-objcopy [file] --strip-unneeded /tmp/foo llvm-objcopy: error: '[file]': not stripping symbol '' because it is named in a relocation
Going to figure out what the issue is and submit a fix
This is an issue for fully linked executables that still have relocation sections, e.g. because it was linked with the --emit-relocs flag:
$ echo 'int main() { return 0; }' | clang -Wl,--emit-relocs -x c - -o foo && llvm-objcopy --strip-unneeded foo llvm-objcopy: error: 'foo': not stripping symbol '__gmon_start__' because it is named in a relocation
Temporarily reverted in r365712 due to the noted issue -- I don't think the fix should be too hard, but didn't seem trivial enough to just fix forward
Fixed issue with --emit-relocs. Existing test ELF/no-symbol-relocation.test checks this.
Sorry, I wanted to patch this in first and rerun tests to make sure the internal build step this broke doesn't have other issues -- I'll reply today
Why are you changing this?