This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add --dump-section
ClosedPublic

Authored by paulsemel on Jul 30 2018, 5:12 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

paulsemel created this revision.Jul 30 2018, 5:12 AM
paulsemel added a comment.EditedJul 30 2018, 5:14 AM

I know that there is this only-keep option that is supposed to be able to have similar behavior but :

  • It doesn't work at all (segfault). I made it work only with the dump-section.test, otherwise with real binaries it segfault.
  • This one is compliant with GNU binutils command line, and is not file format dependent. (even tho llvm-objcopy only handles ELF for the moment)

Thanks! I've been meaning to implement this forever. For instance this is the only way to dump non-allocated sections. I've run into cases debugging things where that was useful and had to use objcopy.

tools/llvm-objcopy/Object.h
622

If you use ELFSectionWriter you don't need to keep track of this information.

tools/llvm-objcopy/llvm-objcopy.cpp
241

It probably makes more sense to use llvm::objcopy::Object and to use an ELFSectionWriter to dump the contents.

498

You can just iterate over these instead of checking.

499–517

Could you wrap this in a function somehow?

Can you file a bug against me on the segfault? That sounds like a serious issue that needs to be addressed.

jhenderson added inline comments.Jul 31 2018, 2:30 AM
tools/llvm-objcopy/ObjcopyOpts.td
102

Nit: content -> contents

Also, it looks like we're being inconsistent with whether help text should have trailing full stops. I think the majority has none, so you might want to remove it here, and we can do a NFC tidy up later to bring the rest into line with each other (I could easily be persuaded that actually they should all have them - we should aim for consistency with other parts of LLVM).

tools/llvm-objcopy/llvm-objcopy.cpp
498

To be fair, this is the same as how the other options have been implemented.

500–501

I'd prefer less use of auto around here.

paulsemel marked 3 inline comments as done.Jul 31 2018, 3:35 AM
paulsemel added inline comments.
tools/llvm-objcopy/Object.h
622

Actually, the fact is that I truly think the way I am doing permits to be fully file format independent, which is a very good point for the expansion of the tool.

And I actually do not see problems doing this way because :

  • I compared with GNU objcopy, and it seems that this option is doing first, which means that we cannot really interfer with the changes made to the binary.
  • I couldn't really trigger an option that modifies the section and where the dump-section output is changed according to the modifications we made.
tools/llvm-objcopy/llvm-objcopy.cpp
241

See my comment above

498

Yes, my point was to follow what's being done in the rest of the code, but I can change I don't really matter.

paulsemel updated this revision to Diff 158202.Jul 31 2018, 3:36 AM

Added Jake and James' suggestions

jakehehrlich added inline comments.Jul 31 2018, 1:24 PM
tools/llvm-objcopy/Object.h
622

Hmm those facts are a bit at odds with what I've asked I suppose. The problem, as I see it, is the awkward extension of the Reader interface and the fact that this action is being embedded deep into some code that really has nothing todo with it. My solution appears not to be right either.

The fact that the dump-section happens before modification throws a wrench in my idea. Also thinking back on my use cases what I really always wanted was to know exactly what was in the binary, not what llvm-objcopy/strip modified it to become (which can happen even if no modifications are made just because layout can change). So I think dump-section happening before modification makes sense; this isn't GNU objcopy being strange again. In that case I think it would be ideal to use ExecuteElfObjcopyOnBinary and to handle this option there instead of in HandleArgs. Doing this in HandleArgs requires the kind of plumbing you've used here. Doing it before HandleArgs 1) makes it clear this is to happen independent of HandleArgs and 2) Negates the need for this plumbing which are the cheif issues as I see them.

As a further note, this would not work with an upcoming change to Reader when BinaryReader is implemented. So this change is at odds with the intended abstraction and an abstraction that is going to be used in the near future. So I can't allow the adding of Bin here like this (not unless another solution is presented...coincidentally Alex, James and I agreed on a refactoring scheme that I've yet to implement that would in fact resolve this but it would also change the code in such extensive ways that I'm not sure we'd even have the same issue anymore).

tools/llvm-objcopy/llvm-objcopy.cpp
498

Ah indeed. I think I've decided I find that silly. Do it this way for now. I might change that in the future. I remember the review where we did this.

So here's an idea. It is ELF specific but it extends the SectionBase interface in a way I've considered for a while and would make both this and another change quite easy. If we add an OriginalData field (ArrayRef<uint8_t>) to SectionBase we can dump that inside of HandleArgs.

paulsemel updated this revision to Diff 158482.Aug 1 2018, 2:41 AM

Added OriginData field as suggested by Jake

So here's an idea. It is ELF specific but it extends the SectionBase interface in a way I've considered for a while and would make both this and another change quite easy. If we add an OriginalData field (ArrayRef<uint8_t>) to SectionBase we can dump that inside of HandleArgs.

My first instinct on looking at the code as I considered this is that this should be handled in a way similar to how SplitDWOToFile works, i.e. to do it at the start of HandleArgs before any manipulation of Object takes place. I think this would mean that we wouldn't need any additional methods, because it would happen before any modifications occurred. And we have ELFSectionWriter that would achieve what we want, I believe? So just create a specific ELFSectionWriter for the dumping of sections and use that.

I'm not completely opposed to using another method to do it, but it seems like anything else we do would just add additional complexity?

tools/llvm-objcopy/llvm-objcopy.cpp
498

yeah, i remember that discussion - at some point (and probably for a particular place - i don't remember that exactly) i thought this would provide some benefits
(like early skipping some expensive checks), but later I took a closer look - in comparison with I/O + memory allocations + ELF parsing everything else in llvm-objcopy is almost invisible, so yeah, i think that wasn't right and code clarity is more important here. I think in any case we can refactor it on a separate diff (and it's really easy to do / not a big deal).

Due to the soon encroaching change where -I binary will be implemented I'm very much in favor of the OriginalData field being added to SectionBase. Really seems to handle everything very cleanly IMO.

Due to the soon encroaching change where -I binary will be implemented I'm very much in favor of the OriginalData field being added to SectionBase. Really seems to handle everything very cleanly IMO.

Okay, go for it. I have nerves around lifetime issues with references like that, but I think the structure means that the Reader (and therefore the file it reads) will be around long enough for it to not matter.

So should I go for it guys ?

This LGTM other than one issue

My first instinct on looking at the code as I considered this is that this should be handled in a way similar to how SplitDWOToFile works, i.e. to do it at the start of HandleArgs before any manipulation of Object takes place. I think this would mean that we wouldn't need any additional methods, because it would happen before any modifications occurred. And we have ELFSectionWriter that would achieve what we want, I believe? So just create a specific ELFSectionWriter for the dumping of sections and use that.

I'm not completely opposed to using another method to do it, but it seems like anything else we do would just add additional complexity?

For the record I like this. It runs contrary to the requirement that --dump-section use the original unmodified data however. For instance despite the fact that we try our best (except on section order? I might fix that) to maintain the original file details we can still change things in doing things that way.

test/tools/llvm-objcopy/dump-section.test
4

Can you also dump a non-allocated section with --dump-section to show why its far more useful than the -O binary -j hack?

tools/llvm-objcopy/Object.cpp
858 ↗(On Diff #158482)

Check for nobits here and use a size of zero if the type is nobits.

paulsemel updated this revision to Diff 158940.Aug 3 2018, 1:11 AM
paulsemel marked 2 inline comments as done.

Added Jake's suggestions.

jhenderson added inline comments.Aug 3 2018, 2:00 AM
test/tools/llvm-objcopy/dump-section.test
4

I'd also like a test showing what happens when you try to dump a nobits section.

5

Rather than use the implicitly generated .symtab, I'd prefer it if you create a simple small non-alloc section, with explicit contents (assuming this shows what @jakehehrlich is discussing above).

paulsemel updated this revision to Diff 159024.Aug 3 2018, 9:14 AM
paulsemel marked 2 inline comments as done.

Added James suggested tests

Should I land this guys ?

jhenderson accepted this revision.Aug 6 2018, 5:51 AM

LGTM, with the suggested changes, but please get @jakehehrlich to confirm he's happy with the test changes.

test/tools/llvm-objcopy/dump-section.test
32

Nit: Move this to the line immediately after the earlier "CHECK".

tools/llvm-objcopy/llvm-objcopy.cpp
246

I think this is a little unusual formatting of an error message, using '[' and '-'. I think a more common style would be to use '"' and ':':

'Can't dump section ".foo": it has no contents'

Also, llvm-objcopy's style is generally to use upper-case for the first letter in the error message.

This revision is now accepted and ready to land.Aug 6 2018, 5:51 AM
paulsemel updated this revision to Diff 159337.Aug 6 2018, 10:52 AM
paulsemel marked 2 inline comments as done.

Added Jame's suggestions

jakehehrlich accepted this revision.Aug 8 2018, 10:25 PM

oh god I apologize. I was wondering why this wasn't already submitted.

This revision was automatically updated to reflect the committed changes.