[llvm-objcopy] Add support for --gap-fill and --pad-to options
Authored by asmith on Sep 17 2019, 8:42 PM.



This adds two new options to match GNU objcopy:

--gap-fill is used to fill gaps between two adjacent loadable sections with a specified 8 bit value.
--pad-to will pad the last loadable section to the pad-to address with the value specified by --gap-fill if present or zero by default.

Overview of how this works in llvm-objcopy:

  1. Resize all the segments if needed.
  2. Adjust section/segment layout (i.e. file offset) due to overlap across sections in the same segment or overlap across segments.
  3. Set the parent segment's content.
  4. Set the segment child's content.
  5. Set the section's content.
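
As a rough illustration of the semantics the two options are meant to have (not of the patch's implementation), the sketch below flattens a list of loadable sections into one image. The `Section` struct and the `flatten` helper are hypothetical names invented for this example; llvm-objcopy's real classes are different, and segments are ignored entirely here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a loadable section: a start address plus its bytes.
struct Section {
  uint64_t Addr;
  std::vector<uint8_t> Data;
};

// Flatten sections (sorted by address, non-overlapping) into one image.
// Gaps between adjacent sections are filled with GapFill. If PadTo lies
// beyond the end of the last section, the image is extended up to PadTo with
// the same byte; a PadTo at or below the current end is a no-op.
std::vector<uint8_t> flatten(const std::vector<Section> &Sections,
                             uint8_t GapFill = 0, uint64_t PadTo = 0) {
  std::vector<uint8_t> Image;
  if (Sections.empty())
    return Image;
  const uint64_t Base = Sections.front().Addr;
  for (const Section &S : Sections) {
    assert(S.Addr >= Base + Image.size() && "sections must be sorted, disjoint");
    Image.resize(S.Addr - Base, GapFill); // fill the gap before this section
    Image.insert(Image.end(), S.Data.begin(), S.Data.end());
  }
  if (PadTo > Base + Image.size())
    Image.resize(PadTo - Base, GapFill); // pad after the last section
  return Image;
}
```

With a two-byte section at 0x1000 and a one-byte section at 0x1004, `flatten(S, 0xFF)` produces five bytes with two 0xFF bytes in the gap, and `flatten(S, 0xFF, 0x1008)` appends three more fill bytes.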

Any comments from the llvm-objcopy community?

Commented regarding tests and nits. Could you update docs/CommandGuide/llvm-objcopy.rst as well?


Is it possible to use yaml2obj instead of adding a binary file? I think we just need a few sections to make sure that this patch works well. For example (not tested):

  --- !ELF
  FileHeader:
    Class:           ELFCLASS64
    Data:            ELFDATA2LSB
    Type:            ET_EXEC
    Machine:         EM_X86_64
  Sections:
    - Name:            .text
      Type:            SHT_PROGBITS
      Content:         "AABBCCDD"
    - Name:            .foo
      Type:            SHT_PROGBITS
      Content:         "EEFF"
    - Name:            .bar
      Type:            SHT_NOBITS

nit: please avoid auto and use uint64_t.





asmith updated this revision to Diff 221229. Sun, Sep 22, 12:23 PM

Address reviewers' comments about auto

asmith updated this revision to Diff 221232. Sun, Sep 22, 12:31 PM

Add new options to objcopy documentation

asmith marked 3 inline comments as done. Sun, Sep 22, 12:35 PM
asmith added inline comments.

That's a good suggestion. Will look into that.

Hui added inline comments. Sun, Sep 22, 8:49 PM

This patch resizes the section and its containing segments (parent or child segment). One main part of the test is to check the section-to-segment mapping from the PHDR. However, yaml2obj can't keep program headers.

Thanks for the patch! I haven't had time to review the meat of it yet, but will try to get back to it in the next couple of days. One quick question in the meantime: what's the motivation behind this? Is it to improve GNU compatibility?


I think you can change this slightly to "... to the ''<address>'' ...", using the formatting of the reference to <section> in --only-section above (i.e. back ticks instead of apostrophes).


:option:''--gap-fill'' (using backticks instead of apostrophes)


This and a couple of tests look like an unrelated change?


Why do this and the other tests require shell? I don't see anything in here that requires it (od and cmp are both available in GnuWin32 tools, for example).


We tend in the newer binary tools tests to use '##' for comments to separate them from test commands. Please could you update.


This patch resizes the section and its containing segments (parent or child segment). One main part of the test is to check the section-to-segment mapping from the PHDR. However, yaml2obj can't keep program headers.

I'm not sure I understand what you mean here? yaml2obj supports program headers (it has done for a couple of years now I think). See for example llvm/test/tools/yaml2obj/program-header.yaml etc as well as various llvm-objcopy tests using program headers (e.g. llvm/test/tools/llvm-objcopy/segment-test-remove-section.test).


addr -> Addr

I'm guessing this and the above error return Expected. Would you mind not using auto here to make it clear, please?

Hui added inline comments. Mon, Sep 23, 10:20 AM

This diagnostic message is probably emitted when the layout of resized sections was not done correctly. However, it is misleading: the 'Size' is supposed to be printed in hex, but it is not.

asmith marked an inline comment as done. Mon, Sep 23, 11:32 AM

Thanks for the patch! I haven't had time to review the meat of it yet, but will try to get back to it in the next couple of days. One quick question in the meantime: what's the motivation behind this? Is it to improve GNU compatibility?

I need these options to use llvm-objcopy when building uboot.


This is a bug fix. The value wasn’t printed correctly.

Hui added inline comments. Mon, Sep 23, 2:51 PM

You are right. Actually, I was thinking about using obj2yaml to get a YAML file and then feeding it to yaml2obj in the test. Unfortunately, obj2yaml doesn't support dumping the program headers yet (a plausibly easy feature to add, but not in place yet), so yaml2obj would produce no PHDR output.

asmith updated this revision to Diff 221461. Mon, Sep 23, 9:11 PM

Address comments on comment character #, auto and rst file.

asmith marked 7 inline comments as done. Mon, Sep 23, 9:12 PM

One major question to answer is how --gap-fill should interact with llvm-objcopy's preservation of segment contents that are not covered by existing sections (see llvm/test/tools/llvm-objcopy/preserve-segment-contents.test for an example of the behaviour). I'd be inclined to have --gap-fill overwrite the data in this case, and a test should be written to show that the old data is no longer written (see below for more details).

Some other questions/comments:

  1. What should be the behaviour for gaps between the start of a segment and the start of the first section? I feel like --gap-fill should fill that gap too (the same goes for the gap between the end of the last section and the end of the segment).
  2. In relation to the preserve contents issue above, what should happen if a section is removed? Currently it fills it with zeroes. I think it should instead fill it with --gap-fill bytes (see writeSegmentData).
  3. For --gap-fill, I think a simpler approach than what you have might be to change how Segments are written in writeSegmentData: if --gap-fill is not specified, use the existing contents of the segment, otherwise use --gap-fill's value.
  4. For --pad-to, what should happen with SHT_NOBITS sections? The description you've written suggests that they should have their size increased to pad to the corresponding address, but this isn't the behaviour I see with GNU objcopy.
  5. Also for --pad-to, I feel like your code is more complicated than it needs to be. I feel like the relevant changes would be most appropriate in or immediately before layoutSegments - identify what the last segment is and then resize it and the last section in it. I don't think you need to create new sections or segments, although you'll need to identify how you can increase the size.
  6. What does GNU objcopy do for non-progbits sections that are last in the loadable segment, e.g. .dynamic sections?
  7. What does GNU objcopy do if the last such section is nested inside multiple segments (e.g. because it's a TLS section)? I expect it to resize all such segments, which might not be trivial to achieve using llvm-objcopy's current architecture (sections are only in the top-most parent segment, and don't know about other segments they are in).
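
The simpler approach suggested in point 3 could look roughly like the sketch below. It is only an illustration under assumptions: `SectionData`, `FillOption` and `buildSegment` are hypothetical names, not llvm-objcopy's actual `writeSegmentData` or its types, and the current zero-filling of removed sections mentioned in point 2 is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for llvm-objcopy's real classes.
struct SectionData {
  uint64_t Offset;            // offset of the section within the segment
  std::vector<uint8_t> Bytes; // section contents
};

struct FillOption {
  bool Specified; // true if --gap-fill was given
  uint8_t Value;  // the fill byte, meaningful only if Specified
};

// Build the bytes written for one segment. Without --gap-fill, bytes not
// covered by a remaining section keep their original value; with it, they
// are set to the fill value instead.
std::vector<uint8_t> buildSegment(const std::vector<uint8_t> &Original,
                                  const std::vector<SectionData> &Sections,
                                  FillOption Fill) {
  std::vector<uint8_t> Out =
      Fill.Specified ? std::vector<uint8_t>(Original.size(), Fill.Value)
                     : Original;
  for (const SectionData &S : Sections) {
    assert(S.Offset <= Out.size() && S.Bytes.size() <= Out.size() - S.Offset &&
           "section must lie inside the segment");
    // Section contents always win over the background bytes.
    std::copy(S.Bytes.begin(), S.Bytes.end(), Out.begin() + S.Offset);
  }
  return Out;
}
```

A section that has been removed is simply absent from `Sections`, so under this sketch its old range receives the fill byte whenever --gap-fill is given, which is the behaviour points 2 and 3 argue for.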

I agree it's a bug fix, but it's a bug fix not directly related to this issue. Please create a separate patch for it.


In general, we prefer to create the YAML file by hand, as obj2yaml output for an object is significantly more verbose than is required to test the behaviour. If after looking at existing examples of how to write program headers for yaml2obj, you need more assistance in creating a test input, I'm happy to help - I've had plenty of experience crafting them to test odd cases.


This line is too long and should be broken up over multiple lines, like the other switch names. I think there's a clang-format invocation to do that for tablegen files, but I'm not sure what it is.

--gap-fill should likely overwrite all interstitial data, which includes data at the end of loadable segments, not just the data between sections that we previously preserved. I view the --gap-fill use case as being a use case for things like lld's trap filling. Yes, this violates the strict interpretation of "don't overwrite segment contents", but it's OK in this case I think. Making binaries more secure is a good thing!
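
Taking "interstitial data" to mean bytes covered by a segment but not by any section, the ranges that --gap-fill would overwrite can be computed as in the sketch below. `Range` and `interstitial` are hypothetical names for illustration only; nothing here reflects llvm-objcopy's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Half-open byte range [first, second) of file offsets.
using Range = std::pair<uint64_t, uint64_t>;

// Return the parts of Segment that are not covered by any range in Sections.
// Sections may be unsorted and may overlap each other.
std::vector<Range> interstitial(Range Segment, std::vector<Range> Sections) {
  assert(Segment.first <= Segment.second && "malformed segment range");
  std::sort(Sections.begin(), Sections.end());
  std::vector<Range> Gaps;
  uint64_t Cursor = Segment.first; // first byte not yet accounted for
  for (const Range &S : Sections) {
    // Clip the section to the segment's bounds.
    uint64_t Begin = std::max(S.first, Segment.first);
    uint64_t End = std::min(S.second, Segment.second);
    if (Begin >= End)
      continue; // empty, or entirely outside the segment
    if (Begin > Cursor)
      Gaps.push_back(Range(Cursor, Begin));
    Cursor = std::max(Cursor, End);
  }
  if (Cursor < Segment.second)
    Gaps.push_back(Range(Cursor, Segment.second)); // tail of the segment
  return Gaps;
}
```

Note that the leading gap (segment start to first section) and the trailing gap (last section to segment end) fall out of the same loop, so they are treated no differently from gaps between sections.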

--pad-to is more tricky and requires altering the segment size which I'm not so fond of. We have to ensure that sections never fall outside of a segment or overlap another section when this happens.

What's the use case for these two flags? Where has it come up in actual practice that these were needed? I think they basically exist to cover an issue left by the linker that lld no longer leaves. I'm particularly hesitant to add --pad-to.

"Interstitial data" is data covered by a segment but not by a section. Gap fill should IMO set the bytes for all of that space. This is consistent with James's recommendation for how things should work. Combined with --pad-to when padding to a page-aligned address, this can for instance ensure that all executable bytes contain either valid code or a trap instruction.

Another axiom I'd like to hold if possible for the use case: no new segment overlapping should occur, and it's an error if --pad-to causes this. Technically this is valid and can occur, and the internals handle it, so I'm not adding a hard no here.

Dear god, let's throw an error for now if this has to touch a TLS segment/sections... that's a crazy rabbit hole where the rules for how everything works change. When someone hits that case we can think about it then.

If the last PT_LOAD segment's memsize and filesize don't agree, then using --pad-to is invalid and should be an error.
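
The error conditions proposed so far for --pad-to (mismatched sizes on the last PT_LOAD, touching a TLS segment, creating a new segment overlap) could be checked along the lines of the sketch below. The `Segment` struct and `checkPadTo` are hypothetical and heavily simplified, and the TLS and overlap tests are one possible reading of the comments above rather than an agreed design.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a program header.
struct Segment {
  bool IsLoad; // PT_LOAD
  bool IsTLS;  // PT_TLS
  uint64_t VAddr;
  uint64_t FileSize;
  uint64_t MemSize;
};

// Returns an empty string if padding the last PT_LOAD segment up to PadTo
// looks acceptable, otherwise a description of the problem.
std::string checkPadTo(const std::vector<Segment> &Segments, uint64_t PadTo) {
  const Segment *Last = nullptr; // PT_LOAD with the highest address
  for (const Segment &S : Segments)
    if (S.IsLoad && (!Last || S.VAddr > Last->VAddr))
      Last = &S;
  if (!Last)
    return "no PT_LOAD segment to pad";
  if (Last->FileSize != Last->MemSize)
    return "last PT_LOAD segment has differing file and memory sizes";
  assert(Last->VAddr <= UINT64_MAX - Last->FileSize && "address overflow");
  const uint64_t OldEnd = Last->VAddr + Last->FileSize;
  if (PadTo <= OldEnd)
    return ""; // nothing to pad
  for (const Segment &S : Segments) {
    if (&S == Last)
      continue;
    // A TLS segment reaching the end of the last PT_LOAD would need resizing.
    if (S.IsTLS && S.VAddr < OldEnd && S.VAddr + S.FileSize >= OldEnd)
      return "padding would have to resize a TLS segment";
    // A segment starting inside the padded region would newly overlap.
    if (S.VAddr >= OldEnd && S.VAddr < PadTo)
      return "padding would create a new segment overlap";
  }
  return "";
}
```

Returning a message rather than aborting keeps the sketch self-contained; in llvm-objcopy itself such a check would presumably report through the tool's usual error mechanism.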

I'm still not satisfied I've thought of all the error conditions so expect more as I think about this more carefully.


I'd say "set all bytes not covered by an allocated section but covered by a PT_LOAD segment to <val>" or something explicit to that effect. GNU objcopy's format-agnostic documentation is far too vague to guess the results from.


We should be explicit and say "pad the last SHF_ALLOC section ...."