The original -O binary implementation just copied segment data from the object and dumped it into a file. This doesn't take into account any operations performed on objects such as section removal. GNU objcopy has some specific behavior that we'd also like to respect. For instance using -O binary and -j <some_section> will dump <some_section> to a file. This change implements GNU objcopy style -O binary to as close of an approximation as I can determine.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Missing test for removing the first section in the first segment?
| test/tools/llvm-objcopy/binary-remove-end.test | ||
|---|---|---|
| 45 | To make things slightly less confusing, could you make od use hex for the address, please (-Ax). Ditto for the other tests. | |
| tools/llvm-objcopy/Object.cpp | ||
|---|---|---|
| 705 | I wonder if it might be useful to have an is_sorted assert here? What do you think? Alternatively, why not just sort them here as were doing before? | |
| 706 | I think we can be a little more specific, and say "one past the end of the last segment" in this comment. Unless I've missed something? Nit: here and elsewhere layed -> laid. | |
| 737 | Given that we now have a function called "LayoutSegments", I don't think you need to say "in some consistent fashion" here, since being laid out implies to me that the corresponding function has been called. | |
| 783 | I'm not sure you need to have this and the next comment - the function names are quite clear as to what they do. | |
| 787 | we need align -> we need to align. | |
| 850–861 | I'm not a massive fan of this duplicate comment, but I also see that it's not necessarily trivial to remove the duplication. About the best idea I had was to pass a predicate into OrderSegments, which has the responsibility of returning the copied and sorted vector, filtered according to the predicate. If this sounds sensible to you, that would be a good way of getting rid of some of this duplication. Looking further down, this sounds quite similar to what you are proposing with the LayoutSections in your new TODO. If you want to defer this to later, please add a TODO here as well. | |
| 866 | Why do we need to std::unique this list? | |
| 869 | This first sentence doesn't quite make sense - segment's don't have a sh_offset. | |
| 890 | cosntructing -> constructing | |
| 901 | I'd be explicit, and say that "total file size". I'm not sure I understand the second sentence at all (and therefore why we can't just use Offset). | |
| tools/llvm-objcopy/Object.cpp | ||
|---|---|---|
| 705 | Adding the assert seems like a good idea. I separated sorting out because I need to know the OriginalOffset of the first segment in order to give LayoutSegments the starting offset. | |
| 850–861 | Yeah This should be solvable by the same filter range idea that I propose below. I'll add a TODO for that since I think I'd like the filter range to reside in Iterator.h not inside this code. | |
| 866 | Notice that we construct OrderedSegments by adding the ParentSegment of each section. There might be a single segment for multiple sections so we need to dedup these so that LayoutSegment doesn't do crazy things. | |
| 901 | Say the last section of the last segment was removed. Then Offset will be past the last section but to be consistent with GNU objcopy we need TotalSize to be such that the output ends at the end of the section, not the end of the segment. Offset == TotalSize is true if there either isn't a gap at the end of the section or if an extra allocated section is added that isn't in a segment. | |
| test/tools/llvm-objcopy/two-seg-remove-end.test | ||
|---|---|---|
| 59 | It's not particularly important, but this and several of the other tests look like they're not testing for the correct number of digits in the address. | |
| tools/llvm-objcopy/Object.cpp | ||
| 708 | I don't think you need to repeat "This function" at the start of the second sentence. Simply "It", or combining the two sentences would suffice. | |
| 739 | Please comment what exactly is returned here. By my understanding, it will be the end offset of the last segment, if there are no sections outside segments, or the end offset of the last section. | |
| 768 | it's -> its | |
| 850–861 | Sounds fair. Thinking about it, this loop is basically just a std::copy_if followed by a std::transform. | |
| 866 | Okay, makes sense. Please add a comment to this effect. | |
| 901 | Ok, I understand. I still think the comment needs rewriting though. How about something like: "Now that every section has been laid out we just need to compute the total file size. This might not be the same as the offset returned by LayoutSections, because we want to truncate the last segment to the end of its last section, to match GNU objcopy's behaviour." | |
| 903 | const auto & | |
| tools/llvm-objcopy/Object.cpp | ||
|---|---|---|
| 903 | AllocatedSections is a vector of pointers so we shouldn't need to add that. Is there a reason in to add the reference that I'm not aware of? It's not as if it harms anything by being there, I'm just curious. I went ahead and added it in case there's some reason it should be done so that this doesn't block an LGTM. | |
Still quite a few comment changes, and one or two that were missed, so I'm going to ask you to update them again, before giving a LGTM. Sorry!
| tools/llvm-objcopy/Object.cpp | ||
|---|---|---|
| 708 | Okay, my bad. That didn't come out quite as I meant/felt it would, and it now feels a little unnatural. As I believe that the LLVM standard is for full sentence, that first sentence needs to be either "This function finds a..." or "Find a...". Also, you've got "of list" when you need "list of". Depending on which of the two suggestions you use for the first sentence, you should change the start of the second and third sentences accordingly. If the first approach is used, don't change the second sentence and merge the two: "It assumes ... OrderSegments, and returns ...". If using the second approach, you need to put "This function" instead of the first "It". Optionally, you can then fold the two sentences together as in the previous case. | |
| 739 | You've used too many "returns" in this comment now. Should be "It returns either the ... ParentSegment, or an offset ..." | |
| 866 | Ping here. | |
| 901 | Ping here. | |
| 903 | Good point, I missed this. Feel free to remove it again, if you prefer. | |
Whoops! Turns out I added a bug in this change where certain kinds of cycles in ParentSegment could form because there wasn't a check to make sure that compareSegments(Parent, Child) held if Child->ParentSegment was nullptr. This caused -O binary to fail in our kernel dump. I fixed this and added a test case. Additionally the way I modified the first segment wasn't exactly right.
@jhenderson could you please review the additional changes I made?
- Added additional compareSegment check to make sure that bad ParentSegments are not set
- Set alignment to zero because sometimes the first used segment won't even have an OriginalOffset and Align that let's us place it at offset 0x0 but we need to force that to happen regardless. This is also checked by the new test.
Oops, I should have spotted that as well.
Although I think the alignment change is tested as part of parent-loop-check.test, I think it should have its own explicit test, to make it easier to see what is being tested. Please could you add one.
| test/tools/llvm-objcopy/parent-loop-check.test | ||
|---|---|---|
| 1 ↗ | (On Diff #122762) | Could you add a comment to the top of this test case to explain what is special about this case, please. | 
- Added test for setting alignment of first segment to zero
- Added description of the parent cycle test.
As mentioned inline, I'd like the new test renaming slightly, but otherwise LGTM.
| test/tools/llvm-objcopy/binary-align-first.test | ||
|---|---|---|
| 1 ↗ | (On Diff #122896) | The test itself is fine, but I think it needs a different name, and a brief comment explaining the purpose of the test (i.e. that the first segment is always at offset zero). Possible new test name: binary-first-seg-offset-zero.test (but other names are aacceptable, as long as they convey some form of meaning). | 
To make things slightly less confusing, could you make od use hex for the address, please (-Ax). Ditto for the other tests.