This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add --strip-all option to llvm-objcopy
ClosedPublic

Authored by jakehehrlich on Nov 7 2017, 3:59 PM.

Details

Summary

This change adds a slightly less extreme form of stripping. It should remove any section that starts with ".debug" and should remove any symbol table or relocations. In general this strips out most of the stuff you don't need to execute but leaves a number of things around. This behavior has been designed to be compatible with GNU strip/objcopy --strip-all so that anywhere you currently use --strip-all you should be able to use llvm-objcopy as a drop in replacement.

Diff Detail

Repository
rL LLVM

Event Timeline

jakehehrlich created this revision.Nov 7 2017, 3:59 PM
jhenderson edited edge metadata.Nov 8 2017, 3:09 AM

I took a look at GNU objcopy's --strip-all behaviour and it is slightly different to what you have implemented here. I haven't been able to pin down the exact semantics yet, but I noticed that it kept the .comment section, for example. My suspicion is that it explicitly only strips the .debug and .symtab sections, along with the symbol string table, if it is not shared as the section header table.

Otherwise, the code change looks fine to me.

test/tools/llvm-objcopy/strip-all.test
1 ↗(On Diff #122006)

This test should really have some other sections in it. What sections are actually being stripped here (I assume the symbol table is)? What about debug sections, which are also stripped by GNU objcopy --strip-all?

I took a look at GNU objcopy's --strip-all behaviour and it is slightly different to what you have implemented here. I haven't been able to pin down the exact semantics yet, but I noticed that it kept the .comment section, for example. My suspicion is that it explicitly only strips the .debug and .symtab sections, along with the symbol string table, if it is not shared as the section header table.

Otherwise, the code change looks fine to me.

My testing led me to think that it was just stripping non-allocated sections but you're right about .comment not being stripped; I didn't catch that. After a bunch more testing (using yaml2obj to generate various sections) the behavoir seems to be the following.

  1. Remove relocation sections
  2. Remove symbol table
  3. Remove any section that starts with "debug_" (for instance all standard "debug_*" sections are removed but "debug_blarg" and "debug_foo" are also removed)

Everything else seems to be kept (.comment and .random_section_name are both kept for instance). In practice this only seems to differ for .comment but one can imagine this being an issue for other sorts of added information as well. --strip-all should most likely mirror this behavior for consistency sake I as well. I'll add this but I think I'll add --strip-non-allocated as well to do what this currently does.

test/tools/llvm-objcopy/strip-all.test
1 ↗(On Diff #122006)

Debug sections aren't allocated so they'll be stripped. In this case .symtab, and .strtab are stripped but .shstrtab is not stripped. The maning behind this test was just to check that only the two allocated sections were kept and .shstrtab was kept since the section header table was kept. I could include more allocated and non-allocated sections I suppose but it seems a bit redundant. Now the fact that the .comment section is being kept is news to me, I'm not sure what rule I'm supposed to be following to make that happen.

Oh and one more behavoir I found. It will never remove an allocated section, even if it is named debug_*. This makes a lot of sense for dynamic symbol tables and relocation and it makes more sense than not for debug sections.

  1. Changed the removal criteria to be conservative rather than aggressive

I improved the test to test the removal cases I mentioned.

Nice work with that investigation.

I think the point I raised at line 177 of llvm-objcopy.cpp suggests that we need a bit more testing of chaining of stripping options. I think you should consider adding a test that performs --remove-section followed by --strip-all (and/or the other way around).

test/tools/llvm-objcopy/strip-all.test
1 ↗(On Diff #122006)

Expanding the test to cover the different cases like you have done is good - we want to at the least have coverage of each of the different branch cases. However, there is no relocation section, so there probably should be. Ideally, what I'd like is for there to be a section of each different type that can be stripped, along with SHF_ALLOC versions of all those sections, but I could be persuaded that might be over the top.

So, in addition to the cases you've already got, that means a SHT_REL, and a SHT_RELA, and also SHF_ALLOC SHT_REL, SHT_RELA, and SHT_STRTAB (and for completeness a SHF_ALLOC SHT_SYMTAB, although I doubt that this case will ever occur in practice).

tools/llvm-objcopy/llvm-objcopy.cpp
85 ↗(On Diff #122160)

Given the change in behaviour, I think the help text needs updating. I guess "Removes symbol, relocation, and debug information" should probably be fine.

177 ↗(On Diff #122160)

Won't this and the next "return false" override any other "RemovePred" specified earlier? They should probably be "return RemovePred(Sec)" or similar, otherwise you can't do "--strip-all --remove-section=.text" or similar.

So I'm wondering if we should be following what GNU objcopy does here. After talking with Roland for a bit I'm back to thinking that removing non-allocated sections is the way to go. What I think we want --strip-all to do is minimize the size of a fully linked executable while maintaining any relevant section headers of the content still in the file. For instance I want .dyn* sections to be kept so that I can run --strip-all on shared objects and still link against them. In general I'm not sure we should be mimicking what GNU objcopy does just for the sake of doing what GNU objcopy does as long as the feature in question is attached with some understandable intent. I think --strip-all has just such an intent; namely it should be used when you want to reduce the size of a fully linked object but you don't want to change what can be done with the object with the exception of debugging.

The goal of --strip-all is to reduce the size while maintaining different degrees of functionality and --strip-all feels some position in this trade off between functionality and size. Here's how I see the intent of various options

--strip-sections -> Any information not needed for loading should be removed
--strip-all -> (Unsure but loadability should be preserved, the section headers should be preserved for content still in the file, and a shared object should still be linkable against)
--strip-debug -> Any information only needed by the debugger should be removed.
--strip-uneeded -> This one is actually very precisely defined unlike the rest. You should remove all symbols not needed for relocation.

Roland McGrath pointed out to me that .gnu.warning.* sections should probably be kept by --strip-all. Roland seems to think that at least at first removing non-allocated sections and then selectively keeping certain special sections (like .gnu.warning.*) would be a better way to go than what GNU objcopy does.

tools/llvm-objcopy/llvm-objcopy.cpp
177 ↗(On Diff #122160)

Yea this is a problem. This is not a consistent way to behave.

There's a general problem here of how we should resolve disagreements between predicates. So a predicate should have 3 stances on the removal of a section a) remove b) keep or c) don't care. option c) can't cause a contradiction but a) and b) can conflict if used. We could take a general stance and say that only a) and c) are allowed then rather than returning "false" in any given case we just return the result of the previous RemovePred. That actually sounds the most sensible to me. Order of arguments wouldn't matter if we adopt that stance. For sections there are not "but keep this" operations so removals should compose I think.

The only other option I can see is to make order matter and to process the options by their order. I'm generally in favor of order not mattering however so I think I should replace any "return false" with "return RemovePred(...)" instead. I don't really strongly hold that opinion however so I would be happy doing things in a specific order as well. This code as it stands is order agnostic but resolves contradictions in a specific and bad way. It should either a) now have flags that contradict each other or b) allow later flags to overrule earlier flags. I'm in favor of a) unless GNU objcopy does b)

jakehehrlich retitled this revision from [llvm-objcopy] Add --strip-all option to remove all non-allocated sections but keep section header table to [llvm-objcopy] Add --strip-all option to llvm-objcopy.
jakehehrlich edited the summary of this revision. (Show Details)

Ok I think this should just follow what GNU objcopy does for mass consistency but I'm also going to add --only-keep-allocated to do what I intended this to do in the first place

  1. I couldn't add dynamic relocations because yaml2obj will only give a relocation section a link for a non-dynamic symbol table and llvm-objcopy throws an error if you give it a dynamic relocation section without a dynamic symbol table. I added every other section requested though.
  2. Changed how StripAll is handled so that it dosn't keep allocated sections that are removed by a previous flag.
davide added a subscriber: davide.Nov 10 2017, 2:21 PM
jhenderson accepted this revision.Nov 13 2017, 2:56 AM

Ok I think this should just follow what GNU objcopy does for mass consistency but I'm also going to add --only-keep-allocated to do what I intended this to do in the first place

That sounds like a reasonable plan. I'm a big fan of drop-in replacements, as long as it doesn't particularly harm the interface, because it makes migration a lot easier. I could imagine a case where somebody expects .comment to be kept, but if it isn't, a breakage might occur somewhere down a potentially very long build pipeline.

  1. I couldn't add dynamic relocations because yaml2obj will only give a relocation section a link for a non-dynamic symbol table and llvm-objcopy throws an error if you give it a dynamic relocation section without a dynamic symbol table. I added every other section requested though.

I find it slightly surprising that yaml2obj automatically assumes relocation sections with SHF_ALLOC to be dynamic relocation sections, given we're working with an ELF of ET_REL type. Given that though, this is okay.

LGTM.

tools/llvm-objcopy/llvm-objcopy.cpp
177 ↗(On Diff #122160)

I prefer the "order doesn't matter" approach for now. We might have to be careful when we add any "keep" options though, e.g. what happens if you do -R .text -j .text? Is it the same as -j .text -R .text? Something to poke GNU objcopy with, I think. If we're lucky GNU treats one or the other as higher priority, ignoring order. If we're unlucky, we're going to need to completely change this area of code to be some kind of argument dispatching mechanism.

This revision is now accepted and ready to land.Nov 13 2017, 2:56 AM

I find it slightly surprising that yaml2obj automatically assumes relocation sections with SHF_ALLOC to be dynamic relocation sections, given we're working with an ELF of ET_REL type. Given that though, this is okay.

It assumes them to be *non*-dynamic and llvm-objcopy complains when a dynamic relocation uses a non-dynamic symbol table. So it's llvm-objcopy making this assumption (but yaml2obj is making the opposite assumption unfortunately). Dynamic relocations don't really make sense in ET_REL unfortunately so I'm not sure either is in the wrong here. There's a change that just got approved in yaml2obj that adds dynamic symbol table support so we're not too far off from having a way to test lots of .dyn* stuff with it. We're just missing .dynamic and the ability to set the link for dynamic relocations to .dynsym instead of .symtab. After thats all working I'd like to go back and improve some of these tests so that we can use yaml2obj more widely.

This revision was automatically updated to reflect the committed changes.