This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Improve section removal
AbandonedPublic

Authored by evgeny777 on Feb 12 2019, 6:04 AM.

Details

Summary

This patch changes the segment size when the last section in the segment is removed. A segment can also be removed completely when it no longer contains any sections (this behavior differs from GNU objcopy, which leaves the empty segment in place and emits a warning).

This patch changes the way section and segment layout is calculated and is a base for --update-section, --pad-to and a few other options.

Event Timeline

evgeny777 created this revision. Feb 12 2019, 6:04 AM
evgeny777 edited the summary of this revision. Feb 12 2019, 9:59 AM

Can we get an explicit real use case for this or a case where not removing these sections is detrimental? I believe this to be fundamentally wrong otherwise. Imagine what would happen if you first ran --strip-sections and then ran literally anything again. It's also been a primary tenet so far that we do not modify program headers. That's how we so trivially guarantee that stripping does not affect run-time behavior. This diff changes that.

The one use case I'm aware of that would modify program headers is adding a program header so that a special chunk of data, which isn't available until after the binary is linked, can be used at runtime (because it is in part a way to verify that the correct binary is loaded or something like that; I'm not sure about the details).

cc @mcgrathr who can comment more in depth on how this works in GNU objcopy and how useful it is. He already noted that anything relying on segment resizing behavior from GNU objcopy is already fragile.

Can we get an explicit real use case

Well, my understanding is that when someone removes a section from the final image, he knows what he's doing. The patch allows reducing the executable size when that is possible.
It also now calculates segment file and memory sizes based on section offsets and sizes.
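
To make the size calculation concrete, here is a minimal Python sketch of deriving a segment's file and memory sizes from the sections it contains. It is not the code from this patch: the `Section` class and the `segment_sizes` helper are hypothetical names used only for illustration, and the sketch assumes that sections of type SHT_NOBITS (such as .bss) occupy memory but no file space.

```python
from dataclasses import dataclass

SHT_NOBITS = 8  # ELF section type for sections with no file contents (e.g. .bss)


@dataclass
class Section:
    # Hypothetical, simplified view of an ELF section header.
    name: str
    sh_type: int
    addr: int    # virtual address
    offset: int  # file offset
    size: int


def segment_sizes(p_offset, p_vaddr, sections):
    """Return (p_filesz, p_memsz) that just covers the given sections."""
    file_end = p_offset
    mem_end = p_vaddr
    for s in sections:
        # Every section contributes to the in-memory extent.
        mem_end = max(mem_end, s.addr + s.size)
        # Only sections with file contents contribute to the file extent.
        if s.sh_type != SHT_NOBITS:
            file_end = max(file_end, s.offset + s.size)
    return file_end - p_offset, mem_end - p_vaddr


data = Section(".data", 1, addr=0x401000, offset=0x1000, size=0x100)
bss = Section(".bss", SHT_NOBITS, addr=0x401100, offset=0x1100, size=0x80)

before = segment_sizes(0x1000, 0x401000, [data, bss])
after = segment_sizes(0x1000, 0x401000, [data])  # .bss removed
```

In this example, removing the trailing .bss section shrinks only the memory size, because .bss never contributed to the file size.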

FYI GNU objcopy can not only shrink, but also expand segments, recalculating section and segment addresses. See --update-section.
My plan was to implement GNU objcopy options and (possibly) some that I personally find useful, i.e.:

  • Adding/modifying symbols in the dynamic symbol table (.dynsym)
  • Adding/modifying dynamic table entries

This requires segment modification as well. Are there any reasons why segments shouldn't be modified by llvm-objcopy, except that it might break something?

Some things that can go wrong if we allow segments to be modified such that their size or addresses change:

  1. Dynamic relocations no longer patch the right area (potentially even patching memory outside the program's address space).
  2. Dynamic tags could end up referencing incorrect addresses.
  3. Code could end up referencing incorrect addresses.
  4. The entry symbol may no longer point at a valid address.
  5. Unlabelled space that has meaning (e.g. following section header stripping) may be discarded.
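
Items 1 and 4 in the list above can be illustrated with a small Python sketch that checks whether addresses recorded elsewhere in the file (dynamic relocation targets, the entry point) still fall inside a loadable segment after a resize. The `LoadSegment` class and `dangling_addresses` function are hypothetical names for illustration only and are not part of llvm-objcopy.

```python
from dataclasses import dataclass


@dataclass
class LoadSegment:
    # Hypothetical, simplified view of a PT_LOAD program header.
    vaddr: int
    memsz: int

    def contains(self, addr):
        return self.vaddr <= addr < self.vaddr + self.memsz


def dangling_addresses(segments, addresses):
    """Return the addresses that no longer fall inside any loadable segment."""
    return [a for a in addresses if not any(s.contains(a) for s in segments)]


# Addresses referenced from elsewhere: two relocation targets and the entry point.
referenced = [0x401010, 0x401180, 0x400100]

original = [LoadSegment(0x400000, 0x1000), LoadSegment(0x401000, 0x200)]
shrunk = [LoadSegment(0x400000, 0x1000), LoadSegment(0x401000, 0x100)]

ok = dangling_addresses(original, referenced)
broken = dangling_addresses(shrunk, referenced)
```

With the second segment shrunk from 0x200 to 0x100 bytes, the relocation target at 0x401180 is left pointing outside every loadable segment, which is the failure described in item 1.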

None of these are absolute reasons why we can't modify program headers, but I agree that we must have a concrete use case before we go ahead and implement manipulating phdrs in this way. I will say that it is probably safe to increase the size of a segment (both file and memory size), as long as the addresses of other segments are unaffected. This includes adding new segments entirely.
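
The condition for growing a segment safely can be sketched as a simple check. This is a hedged illustration rather than anything llvm-objcopy implements: `can_grow_in_place` is a hypothetical helper, it considers only PT_LOAD segments (other segment types such as PT_DYNAMIC legitimately overlap a PT_LOAD), and it looks only at virtual addresses, so a real check would also need to validate file offsets.

```python
def can_grow_in_place(load_segments, index, new_memsz):
    """Check whether one PT_LOAD segment can grow without touching the others.

    load_segments is a list of (vaddr, memsz) tuples; index selects the
    segment to grow. Returns False for a shrink or for any overlap with
    another segment's address range.
    """
    vaddr, memsz = load_segments[index]
    if new_memsz < memsz:
        return False  # shrinking is the risky case discussed above
    new_end = vaddr + new_memsz
    for i, (other_vaddr, other_memsz) in enumerate(load_segments):
        if i == index:
            continue
        # Half-open ranges [vaddr, new_end) and [other_vaddr, other_end) overlap.
        if vaddr < other_vaddr + other_memsz and other_vaddr < new_end:
            return False
    return True


segments = [(0x400000, 0x1000), (0x402000, 0x1000)]

fits = can_grow_in_place(segments, 0, 0x2000)      # ends exactly at 0x402000
collides = can_grow_in_place(segments, 0, 0x2001)  # runs into the next segment
shrinks = can_grow_in_place(segments, 0, 0x800)
```

Growth that stops at or before the start of the next segment leaves every other address unchanged, which is why it avoids the problems listed earlier.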

Using llvm-objcopy to reduce memory usage post-link seems like the wrong thing to do, because it can only safely remove sections that are known to be unused, and it's a bit of a sledge-hammer, because section concatenation has already been performed by the linker. A safer and much more appropriate place for this would be at link time, using switches such as --gc-sections. Using llvm-objcopy on the objects pre-link would also be possible here.

Removing empty segments is also sometimes not a no-op in terms of semantic behaviour (see e.g. the PT_GNU_STACK segment type). Even removing PT_LOAD segments may not be wise, since some loaders may explicitly expect a certain number of segments, although I agree that it is probably a feature that could be implemented, e.g. via a switch.
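
A conservative policy along these lines might look like the following Python sketch. The function name and the `allow_load_removal` parameter are hypothetical; the parameter merely stands in for the opt-in switch suggested above and does not correspond to any existing llvm-objcopy option. The sketch keeps every non-PT_LOAD segment, because a segment such as PT_GNU_STACK covers no bytes yet still carries meaning through its flags.

```python
PT_LOAD = 1
PT_GNU_STACK = 0x6474E551


def may_drop_empty_segment(p_type, p_filesz, p_memsz, allow_load_removal=False):
    """Decide whether an empty segment may be removed under a cautious policy."""
    if p_filesz != 0 or p_memsz != 0:
        return False  # not empty, so never a removal candidate
    if p_type == PT_LOAD:
        # Empty PT_LOAD segments are dropped only on explicit request,
        # since some loaders may expect a particular number of segments.
        return allow_load_removal
    # Everything else (PT_GNU_STACK and friends) is kept: being empty
    # does not mean being meaningless.
    return False


keep_stack = may_drop_empty_segment(PT_GNU_STACK, 0, 0, allow_load_removal=True)
keep_load = may_drop_empty_segment(PT_LOAD, 0, 0)
drop_load = may_drop_empty_segment(PT_LOAD, 0, 0, allow_load_removal=True)
```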

evgeny777 abandoned this revision. Oct 30 2019, 9:40 AM