This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add --update-section
ClosedPublic

Authored by leonardchan on Oct 19 2021, 6:43 PM.

Details

Summary

This is another attempt at D59351 which attempted to add --update-section, but with some heuristics for adjusting segment/section offsets/sizes in the event the data copied into the section is larger than the original size of the section. We are opting to not support this case. GNU's objcopy was able to do this because the linker and objcopy are tightly coupled enough that segment reformatting was simpler. This is not the case with llvm-objcopy and lld where they like to be separated.

This will attempt to copy data into the section without changing any other properties of the parent segment (if the section is part of one).

Diff Detail

Event Timeline

leonardchan created this revision.Oct 19 2021, 6:43 PM
leonardchan requested review of this revision.Oct 19 2021, 6:43 PM
jhenderson requested changes to this revision.Oct 20 2021, 12:52 AM

High-level question: does GNU objcopy change the section size for sections that are updated?

Overall, this approach seems reasonable, but there are some issues.

llvm/test/tools/llvm-objcopy/ELF/update-section.test
4

Nit: Newer tests in the llvm-binutils use ## for comments, to help them stand out from RUN and CHECK lines.

Should we also have a test a case without a segment?

9

Sorry, you've completely lost me with this comment. I don't know what this case is trying to test.

15

I think this does mean the section size as recorded in the section header is reduced though, right? Probably say "This will only overwrite the start of the original data and adjust the section header."

Should we also have a test case without a segment?

20

Nit: this and the next comment are missing trailing full stops.

23

If there's more than one type, I think we should test each individual type. If there's only one, just state that particular type in the comments.

26

This seems like it should only be an issue for segment sections, right? Probably need to say that in the comment and then have a test case where it works for sections not in segments.

39

The flags aren't important for this test, so I'd delete them.

41

I don't think the AddressAlign is important either. Delete it.

43

This section doesn't appear to be used, so delete it.

If it's important to the test case either name it accordingly or add a comment.

49

This section doesn't appear to be used, so delete it.

If it's important to the test case either name it accordingly or add a comment.

63

Flags aren't used. Delete them.

65

PAddr isn't used. Delete it.

66

I don't believe this Align is used either, so delete it.

69

Why do we need 2 program headers?

160

What about ERR2?

llvm/tools/llvm-objcopy/ConfigManager.cpp
885–887

This check needs a test case.

889–891

This check needs a test case.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
553

Use function_ref instead of std::function.

558–559

Is there existing testing for this case?

695

Nit: here (and in the existing code above for AddSection), I wouldn't use auto. If I'm not mistaken, Flag is a StringRef, meaning that the const & is pointless (StringRef is designed for passing by value after all), but the auto hides this fact.

Unrelated to that: I wonder if we should have a test case where a user sepcifies the same section for both --add-section and --update-section. To me, the logical behaviour is add it and then use the update section contents, regardless of the command-line option order, but really it should just be whatever GNU objcopy does. The same possibly applies with --remove-section, although I think you probably only need the first test, because we already test that --remove-section and --add-section do what you might expect.

llvm/tools/llvm-objcopy/ELF/Object.cpp
2110–2115

The LLVM style guide states that we don't use auto unless the type is obvious from the immediate context, or for iterators (see https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable). I think the only auto that should be present in this block is the one in the for loop. But then, why is this a traditional style for loop rather than a range-based for loop?

2116

This assert will fire if you specify --update-section with a section that isn't in a segment, if I'm not mistaken.

2142

Here and with the other error messages, the LLVM coding standard says error messages should generally not start with a capital letter or end with a full stop. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages.

2144

Perhaps pull *It into its own variable, as it's repeated 3 times, and the meaning isn't particularly clear, especially in isolation.

llvm/tools/llvm-objcopy/ELF/Object.h
497

I feel like this would be better named as hasContents. After all, that's really the defining feature of SHT_NOBITS sections. It should probably also include SHT_NULL sections, since they are "inactive" and so their values are meaningless.

1031

I don't think a SmallVector here is necessarily that sensible: I expect the majority of section content updates to be non-trivial in size, so 16 seems quite small. It may be better to just use std::vector (but I could be persuaded otherwise).

This revision now requires changes to proceed.Oct 20 2021, 12:52 AM

High-level question: does GNU objcopy change the section size for sections that are updated?

Looks like it does. It will change the section size to whatever the size of the file was. If the section happens to be part of a segment, then that segment (FileSize/MemSize) will increase/decrease by however much larger/smaller the file is compared to the original section size. (Example: given an 8 byte section, then using a 6 byte file will decrease the section size to 6 and reduce the segment size by 2. Given a 10 byte file, the section size increases to 10 and the segment size increases by 2).

For this implementation, we just want to avoid the hassle of resizing a segment. Currently this patch throws an error for a file larger than the section, but kinda just "accepts" files smaller than the section (effectively doing a short memcpy only overwriting part of the section without resizing). If in the future, there's a demand for resizing sections/segments, then maybe it would be more desirable to update this patch such that it will only work on same-size files and throw an error if the file is smaller. This way, the behavior can still be defined down the line rather than saying it's just a short memcpy for now. Thoughts?

In the meantime, I'll address code comments that aren't too crucial to design.

High-level question: does GNU objcopy change the section size for sections that are updated?

Looks like it does. It will change the section size to whatever the size of the file was. If the section happens to be part of a segment, then that segment (FileSize/MemSize) will increase/decrease by however much larger/smaller the file is compared to the original section size. (Example: given an 8 byte section, then using a 6 byte file will decrease the section size to 6 and reduce the segment size by 2. Given a 10 byte file, the section size increases to 10 and the segment size increases by 2).

For this implementation, we just want to avoid the hassle of resizing a segment. Currently this patch throws an error for a file larger than the section, but kinda just "accepts" files smaller than the section (effectively doing a short memcpy only overwriting part of the section without resizing). If in the future, there's a demand for resizing sections/segments, then maybe it would be more desirable to update this patch such that it will only work on same-size files and throw an error if the file is smaller. This way, the behavior can still be defined down the line rather than saying it's just a short memcpy for now. Thoughts?

In the meantime, I'll address code comments that aren't too crucial to design.

I think we can leave the segment size, to avoid having to change segments in general (I could see an argument for shrinking or even possibly growing a segment, if the section is last, but I don't think we want to go down that route in this patch). I do think it's important to change the section size, though, in my opinion. You can just leave behind the old data in the part of the segment that is no longer covered by a section header, or do what happens when a section is explicitly removed from a segment.

Now that I think about it, there should be nothing stopping you growing a section that isn't in a segment - llvm-objcopy should be able to happily update the other section offsets. I think we should only reject growing sections if they are part of a segment.

leonardchan marked 19 inline comments as done.
leonardchan edited the summary of this revision. (Show Details)

I think we can leave the segment size, to avoid having to change segments in general (I could see an argument for shrinking or even possibly growing a segment, if the section is last, but I don't think we want to go down that route in this patch). I do think it's important to change the section size, though, in my opinion. You can just leave behind the old data in the part of the segment that is no longer covered by a section header, or do what happens when a section is explicitly removed from a segment.

Done.

Now that I think about it, there should be nothing stopping you growing a section that isn't in a segment - llvm-objcopy should be able to happily update the other section offsets. I think we should only reject growing sections if they are part of a segment.

Done. Added a test case for this.

llvm/test/tools/llvm-objcopy/ELF/update-section.test
4

Done. Also added a case with a section not part of a segment.

9

This should test that if we overwrite a given section (.text) with new data, then rewrite it back (from the second --update-section), then it should be the exact same as the original file.

Perhaps I can just add this explanation as the comment.

69

This was part of leftovers from D59351. Removed since it looks like we don't need them either.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
558–559

There should be. This isn't new code and was just moved out of handleArgs.

695

Added cases for working with --add-section. I haven't experimented too deeply with --add-section + --update-section on GNU objcopy, but it looks like it's functionally equivalent to just adding a section, then updating it normally as if it were an already existing section.

llvm/tools/llvm-objcopy/ELF/Object.cpp
2110–2115

I think I've grown a habit of reserving range-based for loops only for vector-like structures and using traditional loops for everything else. Updated.

2116

Ah, thanks for catching. I believe this shouldn't fire now that we handle sections not part of segments.

llvm/tools/llvm-objcopy/ELF/Object.h
1031

That could be the case. I've personally never used --update-section so I don't know how many times/sections someone will update in a single invocation. Our specific use case I believe is to just update a single .checksum section.

Regardless, changed to std::vector.

jhenderson added inline comments.Nov 9 2021, 12:44 AM
llvm/test/tools/llvm-objcopy/ELF/update-section-error.test
1 ↗(On Diff #385658)

Up to you, but these days, we tend to add error case checks for argument parsing in the same test as testing the normal behaviour of the option.

llvm/test/tools/llvm-objcopy/ELF/update-section.test
5

It might be useful to name these files "%t.smaller", "%t.same" and "%t.bigger" or something similar, so that it's obvious from the specific run line what the test is testing.

9

"Show that if we overwrite...".

Do both operations actually happen, or is it just "last one wins"? In particular, I'm thinking of the case where you attempt to grow a section in a segment (which should fail) and then update it again back to its original (or smaller) size, in the same invocation. In this case, you could make an argument that it should just work, because you can ignore all but the last --update-section for any given named section. (I can also see an argument that both are applied though, so you'd get an error). Thoughts?

53

Perhaps call this ".in_segment", for full detail.

55

You may be able to get away with the default Address for this section and in the program header. Double-check the generated object to make sure the section does appear in the segment, but I think it should just work.

57

Perhaps call this ".unsupported_type", for full detail.

Since it's not in a segment, it doesn't need an address.

61

Perhaps call this ".not_in_segment", for full detail.

Since it's not in a segment, it doesn't need an address.

72

I may be being stupid, but the new size looks like it's 4 bytes? Are the section data bytes characters or hex numbers?

118

Probably need to state why it can't be updated here.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
553

Do you need the explicit llvm::?

llvm/tools/llvm-objcopy/ELF/Object.cpp
2112

Use ArrayRef<uint8_t> here.

2146

I'd delete this blank line.

llvm/tools/llvm-objcopy/ELF/Object.h
498

Need a test case for SHT_NULL.

leonardchan marked 17 inline comments as done.
leonardchan added inline comments.
llvm/test/tools/llvm-objcopy/ELF/update-section-error.test
1 ↗(On Diff #385658)

Fair. Moved to the original test file.

llvm/test/tools/llvm-objcopy/ELF/update-section.test
9

I see. Personally I'd prefer getting the error on the size growing and not allowing last-one-wins. I think having a situation where someone does --update-section=.text=larger.txt --update-section=.text=smaller.txtpassing, then deciding to remove --update-section=.text=smaller.txt (which leads to the error) be kind of confusing. I'd like to avoid allowing any bad usage "hidden". Added a test for this.

72

Those are character bytes.

118

Updated the error message.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
553

Nope

leonardchan marked 2 inline comments as done.Nov 9 2021, 5:04 PM
jhenderson added inline comments.Nov 10 2021, 3:38 AM
llvm/test/tools/llvm-objcopy/ELF/update-section.test
15

It's a bit fiddly, but I think you should test the bytes that appear immediately after this section to show that the contents are unchanged. It's not necessarily an issue if they are/aren't changed, but I think we need to make a conscious decision should this behaviour ever be changed, hence the test.

Also, some more testing that occurred to me: we should have a section immediately following the to-be-shrunk section, and show that its offset hasn't changed (or more specifically, its offset relative to the segment start), to show that the shrinking doesn't cause the other section to bunch up.

35

Here and elsewhere, I'd recommend breaking up these run lines as they're getting quite long, as per the inline edit. You don't need to be strict about the length, but the 80 character limit for code is a good rough guideline.

43

Don't need to re-echo this line, since you already have %t.larger. Maybe move all the echoes up to the start of the file, so you only have the llvm-objcopy run lines by this point.

52

Perhaps add a brief comment before these two cases saying something like "test option parsing failures" or something to that effect.

57

Actually, now that you've got a SHT_NULL type, I'd call it .nobits_type. I believe it also doesn't need a size?

llvm/tools/llvm-objcopy/ELF/Object.cpp
2149

Maybe "does not have contents"? That seems to be more the specific issue here.

leonardchan marked 5 inline comments as done.
leonardchan added inline comments.
llvm/test/tools/llvm-objcopy/ELF/update-section.test
15

It's a bit fiddly, but I think you should test the bytes that appear immediately after this section to show that the contents are unchanged. It's not necessarily an issue if they are/aren't changed, but I think we need to make a conscious decision should this behaviour ever be changed, hence the test.

I think I may want to push back on this part since this feels like it would be testing against undefined behavior (by checking for a value that we don't explicitly set/guarantee). I think the user just wants to ensure that the smaller chunk is written (and the size is updated), but not necessarily what the remaining data is. It just so happens that under the hood, we do sequential memcpy's that result on the old data still being there, but end-to-end-behavior-wise, I think we don't want to explicitly test the implementation.

I see that the test comment actually points this out, which is my bad. So I'll update this.

(I'll also admit that I can't find an easy way to test this other than doing a hexdump and inspecting specific bits, which seems a bit complex :)

Also, some more testing that occurred to me: we should have a section immediately following the to-be-shrunk section, and show that its offset hasn't changed (or more specifically, its offset relative to the segment start), to show that the shrinking doesn't cause the other section to bunch up.

Added a test for this.

leonardchan marked an inline comment as done.Nov 10 2021, 1:48 PM
jhenderson added inline comments.
llvm/test/tools/llvm-objcopy/ELF/update-section.test
15

Elsewhere, llvm-objcopy's rules for what gets written in the gaps within segments which are not covered by sections are well defined. Specifically, the existing contents are preserved, unless a section has been explicitly removed, in which case zeroes are written. The logic for the former is that there may be important data stored outside a section, such as interrupt instructions to prevent overrunning the end of a function or similar. The latter makes sense because you've deliberately removed a section, so you shouldn't leave its data lying around (maybe it was removed for reasons of secrecy etc). Arguably, the same case could be made for --update-section, but I could see it either way.

I've used od in the past to dump hex bytes of segments. See, for example, preserve-segment-contents.test. It's not actually that hard, as long as you know the offset (NB: I'm going off memory of that test, which might be incorrect, but there should be other examples if I'm wrong).

As an aside, if and when --gap-fill is ever implemented, it should definitely overwrite the data left behind when shrinking the section. Also, if --pad-to is implemented, it may be possible to use similar logic to allow --update-section to grow the section (otherwise we're in a weird situation where --pad-to can be used to grow the section, but not --update-section!). Not suggesting this as part of this patch, but I'm going to raise these points on D67689 too.

96–99

Sorry for not bringing this up until now, but I'm a bit nervous about not using -NEXT in these checks. I know they're not in adjacent lines, so it's a little tricky, and it's fairly unlikely you'll get spurious passes by the checks matching following sections, but it still makes me twitchy. It may also degrade failure reports from FileCheck, as it might start matching at a later (incorrect) point.

I'd suggest perhaps the following pattern:

# NORMAL:      Size:
# NORMAL-SAME:      {{ }}8{{$}}

The addition of the {{ }} and {{$}} ensure that the digit isn't part of a wider number (otherwise it would match 6789, for example, not just 8.

Similar comments apply for the Offset and Size fields elsewhere below. For the Content, you probably want to do a similar approach too, to ensure you're matching that section's contents, and not some other section.

leonardchan marked an inline comment as done.
leonardchan added inline comments.
llvm/test/tools/llvm-objcopy/ELF/update-section.test
15

Ah, thank you for the clarifications and tests I can reference. Added appropriate test cases.

96–99

This is my first time learning of the -SAME directive. Very useful! Updated tests.

jhenderson accepted this revision.Nov 16 2021, 12:45 AM

I was just thinking about whether it would be useful to have an additional test case that shows that the section after a shortened not-in-segment section is moved to compress space (and therefore remaining bytes are overwritten. I'm not 100% certain it's needed, so up to you.

Otherwise, LGTM.

llvm/test/tools/llvm-objcopy/ELF/update-section.test
29
This revision is now accepted and ready to land.Nov 16 2021, 12:45 AM
This revision was automatically updated to reflect the committed changes.
leonardchan marked an inline comment as done.