This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Change -O binary to respect section removal and behave like GNU objcopy
ClosedPublic

Authored by jakehehrlich on Nov 6 2017, 8:12 PM.

Details

Summary

The original -O binary implementation just copied segment data from the object and dumped it into a file. This doesn't take into account any operations performed on objects such as section removal. GNU objcopy has some specific behavior that we'd also like to respect. For instance using -O binary and -j <some_section> will dump <some_section> to a file. This change implements GNU objcopy style -O binary to as close of an approximation as I can determine.

Diff Detail

Repository
rL LLVM

Event Timeline

jakehehrlich created this revision.Nov 6 2017, 8:12 PM
jhenderson edited edge metadata.Nov 7 2017, 2:47 AM

Missing test for removing the first section in the first segment?

test/tools/llvm-objcopy/binary-remove-end.test
44 ↗(On Diff #121836)

To make things slightly less confusing, could you make od use hex for the address, please (-Ax). Ditto for the other tests.

jhenderson added inline comments.Nov 7 2017, 3:09 AM
tools/llvm-objcopy/Object.cpp
714 ↗(On Diff #121836)

I wonder if it might be useful to have an is_sorted assert here? What do you think?

Alternatively, why not just sort them here as were doing before?

715 ↗(On Diff #121836)

I think we can be a little more specific, and say "one past the end of the last segment" in this comment. Unless I've missed something?

Nit: here and elsewhere layed -> laid.

742 ↗(On Diff #121836)

Given that we now have a function called "LayoutSegments", I don't think you need to say "in some consistent fashion" here, since being laid out implies to me that the corresponding function has been called.

788 ↗(On Diff #121836)

I'm not sure you need to have this and the next comment - the function names are quite clear as to what they do.

792 ↗(On Diff #121836)

we need align -> we need to align.

857 ↗(On Diff #121836)

I'm not a massive fan of this duplicate comment, but I also see that it's not necessarily trivial to remove the duplication. About the best idea I had was to pass a predicate into OrderSegments, which has the responsibility of returning the copied and sorted vector, filtered according to the predicate. If this sounds sensible to you, that would be a good way of getting rid of some of this duplication.

Looking further down, this sounds quite similar to what you are proposing with the LayoutSections in your new TODO. If you want to defer this to later, please add a TODO here as well.

870 ↗(On Diff #121836)

Why do we need to std::unique this list?

873 ↗(On Diff #121836)

This first sentence doesn't quite make sense - segment's don't have a sh_offset.

894 ↗(On Diff #121836)

cosntructing -> constructing

905 ↗(On Diff #121836)

I'd be explicit, and say that "total file size". I'm not sure I understand the second sentence at all (and therefore why we can't just use Offset).

jakehehrlich added inline comments.
tools/llvm-objcopy/Object.cpp
714 ↗(On Diff #121836)

Adding the assert seems like a good idea. I separated sorting out because I need to know the OriginalOffset of the first segment in order to give LayoutSegments the starting offset.

857 ↗(On Diff #121836)

Yeah This should be solvable by the same filter range idea that I propose below. I'll add a TODO for that since I think I'd like the filter range to reside in Iterator.h not inside this code.

870 ↗(On Diff #121836)

Notice that we construct OrderedSegments by adding the ParentSegment of each section. There might be a single segment for multiple sections so we need to dedup these so that LayoutSegment doesn't do crazy things.

905 ↗(On Diff #121836)

Say the last section of the last segment was removed. Then Offset will be past the last section but to be consistent with GNU objcopy we need TotalSize to be such that the output ends at the end of the section, not the end of the segment. Offset == TotalSize is true if there either isn't a gap at the end of the section or if an extra allocated section is added that isn't in a segment.

jhenderson added inline comments.Nov 8 2017, 2:50 AM
test/tools/llvm-objcopy/two-seg-remove-end.test
59 ↗(On Diff #121962)

It's not particularly important, but this and several of the other tests look like they're not testing for the correct number of digits in the address.

tools/llvm-objcopy/Object.cpp
708 ↗(On Diff #121962)

I don't think you need to repeat "This function" at the start of the second sentence. Simply "It", or combining the two sentences would suffice.

739 ↗(On Diff #121962)

Please comment what exactly is returned here. By my understanding, it will be the end offset of the last segment, if there are no sections outside segments, or the end offset of the last section.

768 ↗(On Diff #121962)

it's -> its

903 ↗(On Diff #121962)

const auto &

857 ↗(On Diff #121836)

Sounds fair. Thinking about it, this loop is basically just a std::copy_if followed by a std::transform.

870 ↗(On Diff #121836)

Okay, makes sense. Please add a comment to this effect.

905 ↗(On Diff #121836)

Ok, I understand. I still think the comment needs rewriting though. How about something like:

"Now that every section has been laid out we just need to compute the total file size. This might not be the same as the offset returned by LayoutSections, because we want to truncate the last segment to the end of its last section, to match GNU objcopy's behaviour."

  1. Fixed some comments
  2. Added reference in loop over AllocatedSections
jakehehrlich added inline comments.Nov 8 2017, 12:28 PM
tools/llvm-objcopy/Object.cpp
903 ↗(On Diff #121962)

AllocatedSections is a vector of pointers so we shouldn't need to add that. Is there a reason in to add the reference that I'm not aware of? It's not as if it harms anything by being there, I'm just curious. I went ahead and added it in case there's some reason it should be done so that this doesn't block an LGTM.

Still quite a few comment changes, and one or two that were missed, so I'm going to ask you to update them again, before giving a LGTM. Sorry!

tools/llvm-objcopy/Object.cpp
708 ↗(On Diff #121962)

Okay, my bad. That didn't come out quite as I meant/felt it would, and it now feels a little unnatural. As I believe that the LLVM standard is for full sentence, that first sentence needs to be either "This function finds a..." or "Find a...". Also, you've got "of list" when you need "list of".

Depending on which of the two suggestions you use for the first sentence, you should change the start of the second and third sentences accordingly. If the first approach is used, don't change the second sentence and merge the two: "It assumes ... OrderSegments, and returns ...". If using the second approach, you need to put "This function" instead of the first "It". Optionally, you can then fold the two sentences together as in the previous case.

739 ↗(On Diff #121962)

You've used too many "returns" in this comment now. Should be "It returns either the ... ParentSegment, or an offset ..."

903 ↗(On Diff #121962)

Good point, I missed this. Feel free to remove it again, if you prefer.

870 ↗(On Diff #121836)

Ping here.

905 ↗(On Diff #121836)

Ping here.

Fixed more comments

jhenderson accepted this revision.Nov 13 2017, 3:01 AM

Three more minor points, but LGTM with the suggested fixes.

tools/llvm-objcopy/Object.cpp
736 ↗(On Diff #122386)

layout of -> layout for

869 ↗(On Diff #122386)

stranges -> strange

708 ↗(On Diff #121962)

OrderSegments and It returns -> OrderSegments and returns

This revision is now accepted and ready to land.Nov 13 2017, 3:01 AM

Whoops! Turns out I added a bug in this change where certain kinds of cycles in ParentSegment could form because there wasn't a check to make sure that compareSegments(Parent, Child) held if Child->ParentSegment was nullptr. This caused -O binary to fail in our kernel dump. I fixed this and added a test case. Additionally the way I modified the first segment wasn't exactly right.

@jhenderson could you please review the additional changes I made?

  1. Added additional compareSegment check to make sure that bad ParentSegments are not set
  2. Set alignment to zero because sometimes the first used segment won't even have an OriginalOffset and Align that let's us place it at offset 0x0 but we need to force that to happen regardless. This is also checked by the new test.
jakehehrlich requested review of this revision.Nov 13 2017, 6:54 PM
jakehehrlich edited edge metadata.

Oops, I should have spotted that as well.

Although I think the alignment change is tested as part of parent-loop-check.test, I think it should have its own explicit test, to make it easier to see what is being tested. Please could you add one.

test/tools/llvm-objcopy/parent-loop-check.test
1 ↗(On Diff #122762)

Could you add a comment to the top of this test case to explain what is special about this case, please.

  1. Added test for setting alignment of first segment to zero
  2. Added description of the parent cycle test.
jhenderson accepted this revision.Nov 15 2017, 8:57 AM

As mentioned inline, I'd like the new test renaming slightly, but otherwise LGTM.

test/tools/llvm-objcopy/binary-align-first.test
1 ↗(On Diff #122896)

The test itself is fine, but I think it needs a different name, and a brief comment explaining the purpose of the test (i.e. that the first segment is always at offset zero). Possible new test name: binary-first-seg-offset-zero.test (but other names are aacceptable, as long as they convey some form of meaning).

This revision is now accepted and ready to land.Nov 15 2017, 8:57 AM
This revision was automatically updated to reflect the committed changes.