This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Fix the behavior of --strip-* and --keep-symbol
ClosedPublic

Authored by alexander-shaposhnikov on May 17 2018, 9:32 PM.

Details

Summary

I've played with binutils objcopy --strip-all --keep-symbol foo,
so if the symbol table contains the symbol "foo"
it's not getting remove.

Test plan: make check-all

Diff Detail

Repository
rL LLVM

Event Timeline

jakehehrlich added inline comments.May 18 2018, 12:54 AM
tools/llvm-objcopy/llvm-objcopy.cpp
226 ↗(On Diff #147438)

If possible I'd prefer this come after and I'd prefer not using the functions in removeSymbols to keep state. I think it's sufficient to just check that SymbolsToKeep is non-empty for the single explicit section keep.

228 ↗(On Diff #147438)

Not something I want to fix here but I just realized that passing thought the symbol table every time is really inefficient for most use cases. We should somehow change that.

253 ↗(On Diff #147438)

Also same thing here. Could you add TODOs to only remove symbols when there is an option that effects them?

260 ↗(On Diff #147438)

I think we can just make Strips imply DiscardAll when a symbol is kept instead of doing this. Alternatively StripAll and StripAllGNU should cause each symbol to be removed and then only remove the symbol table if its empty in those options (which is closer to what GNU objcopy does anyway).

296 ↗(On Diff #147438)

Instead of duplicating all of these you can add this once at the end as a single keep step

Proposal aggregated from my comments

if (!Config.SymbolsToKeep.empty())
  RemovePred = [RemovePred, &Obj](const SectionBase &Sec] {
    if ( /* section is symbol table or symbol table's string table */)
      return false;
    return RemovePred(Sec);
  }

Hi !
I have a few thoughts about option conflicts handling.

First, for example, if we take localize-hidden.test test, and we apply -R .rel.text -N undefLocal, this way we would trigger an error. I know that objcopy is behaving the same as this patch behaves, but I was wondering if we could/want to do better ?

Then, for the example you gave, it seems that the behavior is not matching what you're saying. For instance, still with localize-hidden.test with --strip-all -K defaultGlobal, every other global/weak symbols are kept, which is not the case with objcopy.
Last, if you really test this, you will notice that it won't work, because there will be an error complaining about a symbol referenced by a relocation. This error happens because you are trying to remove undefLocal symbol, which is not permitted while it is a relocatable symbol.

I think this change is not sufficient to handle this problem (and more generally option conflicts), and it is creating other problems we might at least be aware of.

tools/llvm-objcopy/llvm-objcopy.cpp
296 ↗(On Diff #147438)

that's what i originally did,
it doesn't work correctly, or, at least not what binutils objcopy does.

tools/llvm-objcopy/llvm-objcopy.cpp
260 ↗(On Diff #147438)
>StripAll and StripAllGNU should cause each symbol to be removed -
  • it turns out that if at least one symbol specified by -K is kept, than all the local symbols are kept as well, that's what both objcopy and strip do, originally i didn't notice that and the code was much simpler. Anyway - i'll look into this tomorrow, kinda tired and want to sleep, maybe i sent this diff too quickly, want to think a bit more.
296 ↗(On Diff #147438)

i will look again tomorrow

tools/llvm-objcopy/llvm-objcopy.cpp
226 ↗(On Diff #147438)

@jakehehrich, Jake, so basically binutils objcopy does the following (as i already tried to explain in my previous comment) (if I'm not mistaken or not missing smth): if at least one symbol specified by -K is kept, then all the local symbols are kept as well, and the symbols table / strings table are not getting removed.
So it looks like if we want to repeat this logic (in particular regarding the local symbols), we can either keep track of this state (what's going on right now), or we can scan the symbols beforehand and check if it contains any of the symbols specified by -K.
(checking if the symbols table is empty or not is not sufficient since (as i mentioned above) we are not allowed to remove the local symbols anyway, so it'll be non-empty anyway in this case.
What are your thoughts / preferences ?

I might have missed something, but why do we need to keep _all_ of the local symbols, if one is kept? Surely that's completely unnecessary? And why local symbols and not global or weak symbols? It seems pretty straightforward that --strip-all --K <some symbol> should produce a symbol table with exactly one symbol in it (plus the null symbol).

@jhenderson,
i don't know why binutils objcopy is doing it.

[alexshap@devvm1372.frc2 ~/local] cat a.c
int xx = 12;
int yy() { return 11; }
[alexshap@devvm1372.frc2 ~/local] gcc -c -g a.c -o a.o
[alexshap@devvm1372.frc2 ~/local] objcopy --strip-all --keep-symbol yy a.o b.o
[alexshap@devvm1372.frc2 ~/local] ~/llvm-readobj.sh -symbols b.o | grep Local

Binding: Local (0x0)
Binding: Local (0x0)
Binding: Local (0x0)
Binding: Local (0x0)
Binding: Local (0x0)
Binding: Local (0x0)
Binding: Local (0x0)

[alexshap@devvm1372.frc2 ~/local] ~/llvm-readobj.sh -symbols b.o | grep yy

Name: yy (1)

[alexshap@devvm1372.frc2 ~/local] ~/llvm-readobj.sh -symbols b.o | grep xx
[alexshap@devvm1372.frc2 ~/local]

so basically we can implement this option without trying to repeat binutils behavior,
what would you say to this ? if so I can update the patch (and actually the code will be simpler)

In D47052#1106208, @alexshap wrote:

so basically we can implement this option without trying to repeat binutils behavior,
what would you say to this ? if so I can update the patch (and actually the code will be simpler)

I'm of the opinion that if the GNU behaviour doesn't make sense, we shouldn't follow it blindly, so I'd be happy in a divergence here, but I'm not a regular GNU tools user, so it might be wise to get @jakehehrlich's thoughts on this before proceeding.

In D47052#1106208, @alexshap wrote:

so basically we can implement this option without trying to repeat binutils behavior,
what would you say to this ? if so I can update the patch (and actually the code will be simpler)

I'm of the opinion that if the GNU behaviour doesn't make sense, we shouldn't follow it blindly, so I'd be happy in a divergence here, but I'm not a regular GNU tools user, so it might be wise to get @jakehehrlich's thoughts on this before proceeding.

I agree. This dosn't make sense to me. We should only keep the explicitly kept symbol and anything it relays on.

Update according to the previous discussion

jakehehrlich accepted this revision.May 21 2018, 5:40 PM

LGTM but please wait on james

This revision is now accepted and ready to land.May 21 2018, 5:40 PM

Fix typo in the comment

jhenderson accepted this revision.May 22 2018, 1:49 AM

LGTM, with a few nits.

tools/llvm-objcopy/llvm-objcopy.cpp
227 ↗(On Diff #147936)

Nit: effects -> affects

Also, I think you've wrapped unnecessarily. Clang-format doesn't "undo" line breaks in comments to match up to the column width.

381–384 ↗(On Diff #147936)

See my earlier comments re. comment wrapping.

symbols table -> symbol table (x2)
strings table -> string table

386 ↗(On Diff #147936)

I don't think you need Config in the capture list?

tools/llvm-objcopy/llvm-objcopy.cpp
386 ↗(On Diff #147936)

you are right, Config should not be captured here.

alexander-shaposhnikov edited the summary of this revision. (Show Details)May 22 2018, 10:46 AM
This revision was automatically updated to reflect the committed changes.