This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Allow strip symtab in executables and DSOs
ClosedPublic

Authored by evgeny777 on May 8 2019, 3:25 AM.

Diff Detail

Event Timeline

evgeny777 created this revision.May 8 2019, 3:25 AM

Does GNU remove all the symbols or remove the symbol table entirely? The two are slightly different. If the latter, we should look at whether it's specific to the --strip-unneeded switch or more generally when all symbols end up getting discarded.

test/tools/llvm-objcopy/ELF/strip-unneeded2.test
1 ↗(On Diff #198612)

Perhaps worth renaming this test to something like strip-unneeded-all-symbols.test. This test could also do with a brief comment at the top stating its purpose.

9 ↗(On Diff #198612)

Nit: reduce the unnecessary padding amount, here and below.

20 ↗(On Diff #198612)

What's the point of having DynamicSymbols here?

29–32 ↗(On Diff #198612)

I think you can simplify this test by removing most of these fields. The only relevant fields are the Binding, Section and Name fields, I believe. Also, what happens for undefined STB_GLOBALs in GNU objcopy? There should be a test for that case here.

tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

What does GNU objcopy do for other ET_* types, e.g. ET_CORE or ET_* values in the OS and PROC-specific ranges? Could this be inverted to != ET_REL?

I'm confused as to what this is trying to accomplish. This really just marks every symbol as unneeded in a DSO and keeps the behavior for other cases. In that case why not strip the symbol table all togethor? This is a very round about way to accomplish that.

Can you document exactly what behavior you're seeing?

evgeny777 marked an inline comment as done.May 13 2019, 3:51 AM

I'm confused as to what this is trying to accomplish.

I'm trying to strip symtab, because it's not needed in executables. We have .dynsym for dynamic linker, but it is not even readed by llvm-objcopy, so all symbols are coming from .symtab

This really just marks every symbol as unneeded in a DSO and keeps the behavior for other cases. In that case why not strip the symbol table all togethor? This is a very round about way to accomplish that.

This probably makes sense as well, but marking symbols unneeded seems more logical - one can figure out what is stripped looking at the single place in the code.

Can you document exactly what behavior you're seeing?

GNU objcopy strips all symbols from symtab in non-relocatable (ET_DYN or ET_EXEC) objects if those symbols are not referenced by relocations.
My understanding is that in DSOs and executables all static relocations should have been resolved by static linker, so we shouldn't bother about them.
For details please look at filter_symbols function in objcopy.c

tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

May be. GNU objcopy considers everything which is not ET_DYN or ET_EXEC relocatable. See filter_symbols in objcopy.c

int relocatable = (abfd->flags & (EXEC_P | DYNAMIC)) == 0;

I'm still a bit unclear. Does it strip symbols that are duplicates of symbols in .dynsym? Does it just get rid of the the whole symbol table no matter what?

I think --strip-unneeded already strips debug sections as well so you can't "look in one place" to figure out what it does. You'll need extra stripping options regardless. If there is no documented behavior of GNU objcopy leaving .symtab in the ET_DYN and ET_EXEC cases we should just strip the symbol table all together.

Also what "strip unneeded" means is a bit unclear. There unfortunately isn't a spec anywhere nor is the use case implied by "needed" defined. We just have to match GNU objcopy. Arguments, which are quite reasonable otherwise, like ".symtab isn't needed to run the binary" don't target the right goal. If all you need is to run the binary you should use --strip-sections which gets rid of *everything* except what is needed to run/dynamically link the binary. The reality is that --strip-ubneeded is some hodge podge of behaviors we're just trying to match.

I'm still a bit unclear. Does it strip symbols that are duplicates of symbols in .dynsym? Does it just get rid of the the whole symbol table no matter what?

For ET_DYN/EXEC GNU objcopy strips all symbols from .symtab when they're not referenced by relocations. Given DYN/EXEC is a product of a static linker there're no such
symbols, so effectively all symbols from .symtab are stripped. As a result symtab section becomes empty and is not emitted to output binary.

I think --strip-unneeded already strips debug sections as well

llvm-objcopy --help
...
-strip-unneeded         Remove all symbols not needed by relocations
...

Also what "strip unneeded" means is a bit unclear.

I think it's pretty clear. You remove symbols not referenced by relocations compacting both .symtab and .strtab

There unfortunately isn't a spec anywhere nor is the use case implied by "needed" defined. We just have to match GNU objcopy.

I couldn't find any docs on the subject, so I can only suggest taking a look at GNU objcopy filter_symbols.

I'm still confused. If the behavior intended only has to do with the existence of relocations why are we making this change? A clear difference of behavior needs to be pointed out and that hasn't been made clear.

I'm still confused. If the behavior intended only has to do with the existence of relocations why are we making this change? A clear difference of behavior needs to be pointed out and that hasn't been made clear.

llvm-objcopy never strips STT_SECTION, STB_GLOBAL and SHN_UNDEF symbols, not matter if they referenced by relocations or not. This is right thing to do in ET_REL, but doesn't make sense in ET_DYN/EXEC.
See isUnneededSymbol.

In that case why not strip the symbol table all togethor?

GNU strip does this. In a really simple test -- build a dummy "int main() { return 0; }" executable, then run strip/llvm-strip --strip-unneeded -- GNU strip removes .symtab and .strtab, whereas llvm-strip does not. The .symtab/.strtab sections are trivial (only containing the null symbol), so we should do the final step and remove it. (Both leave in .shstrtab though).

So if we're going for better GNU strip emulation, this patch doesn't even go far enough, although it is a step in the right direction.

tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

Seconding what James said, I think we should refactor this out to a separate method. That way, any downstream users using llvm-objcopy with custom object file types that should be considered relocatable have a narrow/well defined place to put their local modifications and avoid merge conflicts any time this part is changed.

Ideally, this Object would be the one from libObject (ELFObjectFile<ELFT>::isRelocatableObject() in llvm/Object/ELFObjectFile.h), which I'm sure downstream users already have a local change for. Sadly, it's not, but downstream users can just add a second local patch.

I'm fine with either way, slightly preferring the way you have it here just because it probably emulates GNU objcopy better, but checking Type != ET_REL should also work.

rupprecht added inline comments.Jun 6 2019, 4:39 PM
tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

In case I wasn't specific enough: by "refactor this" I mean just the (Obj.Type == ET_EXEC || Obj.Type == ET_DYN) check.

evgeny777 updated this revision to Diff 204072.Jun 11 2019, 7:55 AM
  • Addressed comments
  • Trivial .symtab and .strtab are now removed completely from non-relocatable objects

A number of my test comments haven't been addressed yet. Please do so.

tools/llvm-objcopy/ELF/Object.cpp
646–649

I'm not sure I see the point of this function. The symbol at index 0 is guaranteed to be the null symbol.

651

I'd be inclined to call this "isEmpty" or simply "empty". Whilst not strictly empty, the null symbol is not significant.

1792

Missing trailing full stop.
"remove trivial" -> "remove an empty".

1796

names, in such case -> names. In such a

rupprecht added inline comments.Jun 12 2019, 3:09 PM
tools/llvm-objcopy/ELF/Object.cpp
651

In fact SymbolTableSection::empty() already exists and is implemented like this (only checking size == 1, not verifying it's null)

1791–1793

I don't think this part needs to be guarded by isRelocatable(). Even GNU objcopy will drop the symbol table if it's empty (only contains the null symbol).

Demo:

$ cat /tmp/yaml.txt
--- !ELF
FileHeader:
  Class:           ELFCLASS64
  Data:            ELFDATA2LSB
  Type:            ET_REL
  Machine:         EM_X86_64
Sections:
  - Name:          .text
    Type:          SHT_PROGBITS
Symbols:
$ yaml2obj /tmp/yaml.txt > /tmp/null.o
$ objcopy /tmp/null.o /tmp/null-copy.o
$ readelf -SW /tmp/null.o /tmp/null-copy.o

File: /tmp/null.o
There are 5 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000180 000000 00      0   0  0
  [ 2] .symtab           SYMTAB          0000000000000000 000180 000018 18      3   1  8
  [ 3] .strtab           STRTAB          0000000000000000 000198 000001 00      0   0  1
  [ 4] .shstrtab         STRTAB          0000000000000000 000199 000021 00      0   0  1
...
File: /tmp/null-copy.o
There are 3 section headers, starting at offset 0x58:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00      0   0  1
  [ 2] .shstrtab         STRTAB          0000000000000000 000040 000011 00      0   0  1
...

If I remove isRelocatable() here, I can match that behavior.

evgeny777 marked 3 inline comments as done.Jun 13 2019, 4:42 AM
evgeny777 added inline comments.
test/tools/llvm-objcopy/ELF/strip-unneeded2.test
20 ↗(On Diff #198612)

The point is to test that dynamic symbols are not touched by --strip-unneeded run on executable. Do you think it's not needed?

29–32 ↗(On Diff #198612)

Also, what happens for undefined STB_GLOBALs in GNU objcopy?

Removal. .symtab is not used by dynamic loader, so it's not needed. Undefined STB_GLOBAL symbols are placed in .dynsym by static linker for runtime resolution.

tools/llvm-objcopy/ELF/Object.cpp
1791–1793

I don't think this part needs to be guarded by isRelocatable()

Relocation sections are typically linked to .symtab so you can't remove it even if it is empty. It is still possible to check object for relocation sections and then check those sections link field, but it seems like an overkill, IMO.

evgeny777 updated this revision to Diff 204490.Jun 13 2019, 4:56 AM

Addressed comments

The point is to test that dynamic symbols are not touched by --strip-unneeded run on executable. Do you think it's not needed?

It's not needed. We don't do anything in llvm-objcopy to the dynamic symbols ever, so I don't think we need to test this in this specific case.

test/tools/llvm-objcopy/ELF/no-symbol-relocation.test
9

Why are you changing this?

test/tools/llvm-objcopy/ELF/strip-unneeded-all-symbols.test
3

eliminate the static

4

by the dynamic loader.

18–21

As noted in an earlier diff, dynamic symbols aren't related to this change, so can be deleted (see out-of-line comment).

24

The type is irrelevant to the behaviour, so it can be deleted.

tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

Why is this here? Is it tested?

tools/llvm-objcopy/ELF/Object.cpp
1791–1793

Probably worth a comment in the code to this effect. I don't feel strongly like we need to drop symbol tables in relocatable files.

1792

Ping here.

evgeny777 marked 6 inline comments as done.Jun 14 2019, 1:46 AM
evgeny777 added inline comments.
test/tools/llvm-objcopy/ELF/no-symbol-relocation.test
9

Original yaml generates executable with relocation section. This triggers an error when this patch is applied, because llvm-objcopy attempts to strip symtab and fails because of the linked relocation section.

tools/llvm-objcopy/ELF/ELFObjcopy.cpp
424

I don't think I understand the question. Do you mean the check for !Obj.isRelocatable()? If so it's required to strip all symbols from static symbol table in executable. This in turn, will make static symbol empty and eventually it will be removed in removeUnneededSections

jakehehrlich added inline comments.Jun 14 2019, 1:56 PM
tools/llvm-objcopy/ELF/Object.cpp
1791

This should not be a method on Object. The goal is to keep Object highly generic. Please see how to remove this from Object's interface.

1804–1806

I realize this is contrary to James's recommendation that this be a separate function but it would be nice if we used the same removeSections call to do this. This isn't critical however.

evgeny777 updated this revision to Diff 205269.Jun 18 2019, 1:22 AM

removeUnneededSections ripped from Object interface.

I don't think I have any more comments. Thanks!

rupprecht accepted this revision.Jul 3 2019, 3:16 PM
This revision is now accepted and ready to land.Jul 3 2019, 3:16 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2019, 5:10 AM

Sorry looking now.

jakehehrlich reopened this revision.Jul 8 2019, 3:51 PM
jakehehrlich added inline comments.
llvm/trunk/tools/llvm-objcopy/ELF/Object.cpp
1998 ↗(On Diff #208155)

I think this method is overly specific. Is there a reason that the user can't invoke removeSections themselves?

This revision is now accepted and ready to land.Jul 8 2019, 3:51 PM

(sorry I just realized this was already landed, I was out when it was bumped).

I'm seeing an error pulling this into our toolchain when trying to copy some files:

llvm-objcopy [file] --strip-unneeded /tmp/foo                                                                                                                        
llvm-objcopy: error: '[file]': not stripping symbol '' because it is named in a relocation

Going to figure out what the issue is and submit a fix

This is an issue for fully linked executables that still have relocation sections, e.g. because it was linked with the --emit-relocs flag:

$ echo 'int main() { return 0; }' | clang -Wl,--emit-relocs -x c - -o foo && llvm-objcopy --strip-unneeded foo                                                                                         
llvm-objcopy: error: 'foo': not stripping symbol '__gmon_start__' because it is named in a relocation

Temporarily reverted in r365712 due to the noted issue -- I don't think the fix should be too hard, but didn't seem trivial enough to just fix forward

evgeny777 updated this revision to Diff 209217.Jul 11 2019, 7:42 AM

Fixed issue with --emit-relocs. Existing test ELF/no-symbol-relocation.test checks this.

evgeny777 requested review of this revision.Jul 18 2019, 3:04 AM

@rupprecht Do you have any comments on update? I'm going to re-commit it.

Sorry, I wanted to patch this in first and rerun tests to make sure the internal build step this broke doesn't have other issues -- I'll reply today

rupprecht accepted this revision.Jul 19 2019, 11:50 AM

Everything looks good for the tests that were broken before, thanks for waiting!

This revision is now accepted and ready to land.Jul 19 2019, 11:50 AM
This revision was automatically updated to reflect the committed changes.