This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Exclude empty sections in IHexWriter output
ClosedPublic

Authored by mciantyre on Apr 26 2021, 3:52 PM.

Details

Summary

IHexWriter was evaluating a section's physical address when deciding if
that section should be written to an output. This approach does not
account for a zero-sized section that has the same physical address as a
sized section. The behavior varies from GNU objcopy, and may result in a
HEX file that does not include all program sections.

The IHexWriter now excludes zero-sized sections when deciding what
should be written to the output. This affects the contents of the
writer's Sections collection; we will not try to insert multiple
sections that could have the same physical address. The behavior seems
consistent with GNU objcopy, which always excludes empty sections,
no matter the address.

The new test case evaluates the IHexWriter behavior when provided a
variety of empty sections that overlap or append a filled section. See
the input file's comments for more information. Given that test input,
and the change to the IHexWriter, GNU objcopy and llvm-objcopy produce
the same output.

Diff Detail

Event Timeline

mciantyre created this revision.Apr 26 2021, 3:52 PM
mciantyre requested review of this revision.Apr 26 2021, 3:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2021, 3:52 PM

First time LLVM contributor. Let me know if there are any issues with this submission.

The extra empty section in this patch's test case approximates what I saw in some of my embedded development. This patch allows me to drop GNU objcopy, though I'll admit it hasn't had more real-world testing beyond one example. Thanks for your time and consideration.

Thanks for the patch! I'm not particularly familiar with IHex myself. Others in the reviewing list might be able to help out better.

Could you post an example of the output before your change, and after, so that we can see the difference? Also, what happens to empty sections in GNU objcopy when they don't share a physical address with another section? Finally, can you explain why switching the ordering from an address-based ordering to an offset-based ordering is safe?

llvm/tools/llvm-objcopy/ELF/Object.h
371 ↗(On Diff #340677)

Any particular reason you've wrapped this function in a distinct namespace?

The logic is likely similar to D79229 for -O binary. See the tests contributed in that patch. ihex-elf-segments.yaml adopting a similar style can probably make it easier to read.

MaskRay retitled this revision from [llvm-obcopy] Keep ihex sections with same address to [llvm-objcopy] Keep ihex sections with same address.Apr 28 2021, 4:21 PM

Thanks for the review, and for all the guidance to get to this point.

Could you post an example of the output before your change, and after, so that we can see the difference?

See the attached before.hex and after.hex. They're derived from input.elf, my real-world test case. before.hex is missing the entire .apps section, which is available in after.hex. I observed that the location of the zero-sized .storage section prevented llvm-objcopy from emitting .apps. Given the same input, GNU objcopy produces after.hex.

what happens to empty sections in GNU objcopy when they don't share a physical address with another section?

When a section is empty, GNU objcopy doesn't emit any HEX output, regardless of its location.

can you explain why switching the ordering from an address-based ordering to an offset-based ordering is safe?

Thanks for bringing this up. I hadn't considered how the comparator would affect not just inclusion, but also ordering. I think it's OK. From my limited study, a HEX file contains lines of numbers resembling

[absolute address]
[relative address][data for section]
[relative address][data]
[relative address][data]
...
[another absolute address]
[relative address][data for next section]
[relative address][data]
...

The section ordering in the HEX file shouldn't matter, since a section always describes its absolute address, followed by data at relative addresses. I gave it a quick test by manually relocating sections in my test case's HEX output. My flashing tool and embedded system still worked as expected.

I have low confidence. I'm not too familiar with HEX either, nor these kinds of ELF details. If we're concerned, I'm happy to bring back the previous IHexWriter::SectionCompare comparator, and introduce a check for section indices that would indicate unique sections, similar to Segment's comparator.

llvm/tools/llvm-objcopy/ELF/Object.h
371 ↗(On Diff #340677)

The comparator was previously private to the Segment class. Now that it's shared with the IHexWriter, I wanted a way to signal "this is an implementation detail" to those who include this header. I usually use namespace details for this, but namespace internal had more occurrences throughout the LLVM code base.

mciantyre marked an inline comment as not done.May 2 2021, 7:41 AM
jhenderson added a subscriber: evgeny777.

Thanks for the review, and for all the guidance to get to this point.

Could you post an example of the output before your change, and after, so that we can see the difference?

See the attached before.hex and after.hex. They're derived from input.elf, my real-world test case. before.hex is missing the entire .apps section, which is available in after.hex. I observed that the location of the zero-sized .storage section prevented llvm-objcopy from emitting .apps. Given the same input, GNU objcopy produces after.hex.

Thanks, that helped me understand the problem a bit better.

what happens to empty sections in GNU objcopy when they don't share a physical address with another section?

When a section is empty, GNU objcopy doesn't emit any HEX output, regardless of its location.

Okay, thanks. I think it would be a good idea for us to expand the empty section testing for ihex. @MaskRay mentioned something similar earlier. Perhaps a dedicated ihex-empty-sections.test would be better than mixing it all in with the "main" one. In the test case, I'd recommend we have a mixture of sections that share their offset or address with the start and end of other sections, plus at least one that doesn't share with any other section, to show it isn't emitted either. I think you'd need 9 empty sections plus two or three non-empty sections to cover all the bases. The empty sections would be "does not share offset or address with any section", "shares start address", "shares start offset", "shares end address" and "shares end offset", with each of the "sharing" cases actually having two cases where the section appears before it in the section header table, and appears after it in the section header table.

can you explain why switching the ordering from an address-based ordering to an offset-based ordering is safe?

Thanks for bringing this up. I hadn't considered how the comparator would affect not just inclusion, but also ordering. I think it's OK. From my limited study, a HEX file contains lines of numbers resembling

[absolute address]
[relative address][data for section]
[relative address][data]
[relative address][data]
...
[another absolute address]
[relative address][data for next section]
[relative address][data]
...

The section ordering in the HEX file shouldn't matter, since a section always describes its absolute address, followed by data at relative addresses. I gave it a quick test by manually relocating sections in my test case's HEX output. My flashing tool and embedded system still worked as expected.

I have low confidence. I'm not too familiar with HEX either, nor these kinds of ELF details. If we're concerned, I'm happy to bring back the previous IHexWriter::SectionCompare comparator, and introduce a check for section indices that would indicate unique sections, similar to Segment's comparator.

Thanks for the explanation. It makes reasonable sense to me. I wonder whether it would just be simpler to filter out empty ihex sections entirely though, i.e. not adding them to the Sections set at all? There's a ShouldWrite lambda that would be easy to expand to cover this. I'd write the test coverage as described above, and confirm that GNU objcopy never writes the empty section. Assuming that's the case, it seems harmless to switch? @MaskRay/@rupprecht/@evgeny777, does that make sense to you?

evgeny777 added inline comments.May 10 2021, 4:19 AM
llvm/tools/llvm-objcopy/ELF/Object.h
371 ↗(On Diff #340677)

You can use anonymous namespace here. Both classes are in the same TU, so no need for SectionCompare to have external visibility.

evgeny777 added inline comments.May 10 2021, 4:28 AM
llvm/tools/llvm-objcopy/ELF/Object.cpp
180

With this change you start using section offset/index instead of address for IHexWriter. Why? Is this something GNU objcopy does?

Sounds great, I'll get to work on that dedicated test case. Excluding all empty sections sounds reasonable, but I'll fully assess GNU objcopy's behaviors before making a recommendation.

llvm/tools/llvm-objcopy/ELF/Object.cpp
180

I didn't consider the address- vs offset-ordering implications of this change. My goal was to share code, and I was hopeful that if it works for Segment, it might work here. See the discussion around James' previous remarks for ways to undo this.

I've so far treated GNU objcopy as a black box, and I'm not sure how it's generating HEX outputs.

evgeny777 added inline comments.May 16 2021, 11:42 PM
llvm/tools/llvm-objcopy/ELF/Object.cpp
180

Given that GNU objcopy creates ihex segments in sorted (by physical address) order, I have concerns about this change. ELF segment has designated p_paddr field, so section physical addresses can be of different order than section offsets. I don't know if (and how often) this happens in reality though.

mciantyre updated this revision to Diff 347221.EditedMay 22 2021, 6:04 PM
mciantyre retitled this revision from [llvm-objcopy] Keep ihex sections with same address to [llvm-objcopy] Exclude empty sections in IHexWriter output.
mciantyre edited the summary of this revision. (Show Details)

Implement James' suggested tests; exclude zero-sized sections from output.

I skipped the "does not share offset or address with any section" test case. There's a similar section in the existing ihex-elf-sections2.yaml test input, part of ihex-writer.test.

mciantyre added inline comments.May 22 2021, 6:06 PM
llvm/tools/llvm-objcopy/ELF/Object.cpp
180

Reverted the comparator change. IHex sections are still ordered by physical address.

This revision is now accepted and ready to land.May 23 2021, 1:37 AM

I skipped the "does not share offset or address with any section" test case. There's a similar section in the existing ihex-elf-sections2.yaml test input, part of ihex-writer.test.

Would it make sense to move that test case into the new test? That way all empty section beahviour is localised to a single test.

llvm/test/tools/llvm-objcopy/ELF/Inputs/ihex-elf-empty-sections.yaml
1 ↗(On Diff #347221)

As this input is only used for a single test, I think it makes more sense to inline this directly in that test.

38 ↗(On Diff #347221)
70 ↗(On Diff #347221)

Perhaps also explain which cases each section is testing with comments, like you do above.

107 ↗(On Diff #347221)
114 ↗(On Diff #347221)
123 ↗(On Diff #347221)
125 ↗(On Diff #347221)
132 ↗(On Diff #347221)
139 ↗(On Diff #347221)
146 ↗(On Diff #347221)
153 ↗(On Diff #347221)
llvm/test/tools/llvm-objcopy/ELF/ihex-writer-empty-sections.test
3

You probably want something like --implicit-check-not={{.}} in your FileCheck command, so that no other lines can appear in the output before are after the bits you are actually checking. If you do that, you can omit the trailing CHECK-EMPTY. At the moment, it looks like you could have output before your first checked line.

5

Nit: In newer llvm-objcopy tests, we use ## for comments to distinguish them from FileCheck and lit directives.

mciantyre updated this revision to Diff 348626.May 29 2021, 4:11 AM

Would it make sense to move that test case into the new test? That way all empty section beahviour is localised to a single test.

Agreed. Moved the test case into the new test.

mciantyre marked 2 inline comments as done.May 29 2021, 4:14 AM
mciantyre added inline comments.
llvm/test/tools/llvm-objcopy/ELF/ihex-writer-empty-sections.test
5

Favored ## throughout, including comments of the inlined YAML.

mciantyre marked 12 inline comments as done.May 29 2021, 4:16 AM
mciantyre added inline comments.
llvm/test/tools/llvm-objcopy/ELF/Inputs/ihex-elf-empty-sections.yaml
1 ↗(On Diff #347221)

Inlined in the latest revision.

Sorry for the delay - I was off work last week. Some minor comments, otherwise basically this is good.

llvm/test/tools/llvm-objcopy/ELF/ihex-writer-empty-sections.test
35

Whilst not required by yaml2obj, it is useful to have this leading ---, because it is required when there are multiple YAML documents in the same file. Adding it now will allow a future expansion of the test with a different YAML input without needing to change this line.

41

Nit: we don't usually leave blank lines within the YAML, as it may make it less clear that this is all one YAML document.

50

I think you can omit the AddressAlign field throughout this YAML.

58

I'm guessing the exact content isn't really important, in which case, I'd just go with something that's 2 or 4 bytes long for brevity (e.g. "0123"). Same goes elsewhere.

83

I think this is a little clearer.

mciantyre updated this revision to Diff 351281.Jun 10 2021, 3:32 PM
mciantyre marked an inline comment as done.

Thanks for all the feedback. I'm happy to keep this in review if we're missing anyone's thoughts.

As a reminder: I don't have commit access. Let me know if there's anything you need from me before committing.

mciantyre marked 5 inline comments as done.Jun 10 2021, 3:32 PM
mciantyre added inline comments.
llvm/test/tools/llvm-objcopy/ELF/ihex-writer-empty-sections.test
58

Simplified contents, and corrected addresses throughout.

mciantyre marked an inline comment as done.Jun 10 2021, 3:34 PM
jhenderson accepted this revision.Jun 11 2021, 12:10 AM

LGTM. I'm not in a good position to push this myself right now, due to some other pending work. @evgeny777/@MaskRay (or anybody else), could you push this for @mciantyre, please?

@mciantyre, we'll need your name and email address that you want to appear in the git log in order to push this on your behalf.

Thanks all. I'll use Ian McIntyre <ianpmcintyre@gmail.com> for name and email.

MaskRay added inline comments.Jun 12 2021, 12:10 PM
llvm/tools/llvm-objcopy/ELF/Object.cpp
2671

Adding parentheses around != and > are not common. I'll delete them.

MaskRay accepted this revision.Jun 12 2021, 12:15 PM
MaskRay updated this revision to Diff 351677.Jun 12 2021, 12:15 PM

minor adjustment

MaskRay updated this revision to Diff 351678.Jun 12 2021, 12:22 PM

reflow test comments using 80-column text width

This revision was landed with ongoing or failed builds.Jun 12 2021, 12:23 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.Jun 12 2021, 12:50 PM
llvm/tools/llvm-objcopy/ELF/Object.cpp
180

Unordered p_paddr can happen with AT(addr) in a linker script. I agree that GNU objcopy sorts sections by physical address for -O binary and -O ihex output.

Created D104186 for simplification.