Add the --dump-section option to llvm-objcopy.
I know that there is this only-keep option that is supposed to be able to have similar behavior but :
- It doesn't work at all (segfault). I made it work only with the dump-section.test, otherwise with real binaries it segfault.
- This one is compliant with GNU binutils command line, and is not file format dependent. (even tho llvm-objcopy only handles ELF for the moment)
Thanks! I've been meaning to implement this forever. For instance this is the only way to dump non-allocated sections. I've run into cases debugging things where that was useful and had to use objcopy.
If you use ELFSectionWriter you don't need to keep track of this information.
It probably makes more sense to use llvm::objcopy::Object and to use an ELFSectionWriter to dump the contents.
You can just iterate over these instead of checking.
Could you wrap this in a function somehow?
Nit: content -> contents
Also, it looks like we're being inconsistent with whether help text should have trailing full stops. I think the majority has none, so you might want to remove it here, and we can do a NFC tidy up later to bring the rest into line with each other (I could easily be persuaded that actually they should all have them - we should aim for consistency with other parts of LLVM).
To be fair, this is the same as how the other options have been implemented.
I'd prefer less use of auto around here.
Actually, the fact is that I truly think the way I am doing permits to be fully file format independent, which is a very good point for the expansion of the tool.
And I actually do not see problems doing this way because :
See my comment above
Yes, my point was to follow what's being done in the rest of the code, but I can change I don't really matter.
Hmm those facts are a bit at odds with what I've asked I suppose. The problem, as I see it, is the awkward extension of the Reader interface and the fact that this action is being embedded deep into some code that really has nothing todo with it. My solution appears not to be right either.
The fact that the dump-section happens before modification throws a wrench in my idea. Also thinking back on my use cases what I really always wanted was to know exactly what was in the binary, not what llvm-objcopy/strip modified it to become (which can happen even if no modifications are made just because layout can change). So I think dump-section happening before modification makes sense; this isn't GNU objcopy being strange again. In that case I think it would be ideal to use ExecuteElfObjcopyOnBinary and to handle this option there instead of in HandleArgs. Doing this in HandleArgs requires the kind of plumbing you've used here. Doing it before HandleArgs 1) makes it clear this is to happen independent of HandleArgs and 2) Negates the need for this plumbing which are the cheif issues as I see them.
As a further note, this would not work with an upcoming change to Reader when BinaryReader is implemented. So this change is at odds with the intended abstraction and an abstraction that is going to be used in the near future. So I can't allow the adding of Bin here like this (not unless another solution is presented...coincidentally Alex, James and I agreed on a refactoring scheme that I've yet to implement that would in fact resolve this but it would also change the code in such extensive ways that I'm not sure we'd even have the same issue anymore).
Ah indeed. I think I've decided I find that silly. Do it this way for now. I might change that in the future. I remember the review where we did this.
So here's an idea. It is ELF specific but it extends the SectionBase interface in a way I've considered for a while and would make both this and another change quite easy. If we add an OriginalData field (ArrayRef<uint8_t>) to SectionBase we can dump that inside of HandleArgs.
My first instinct on looking at the code as I considered this is that this should be handled in a way similar to how SplitDWOToFile works, i.e. to do it at the start of HandleArgs before any manipulation of Object takes place. I think this would mean that we wouldn't need any additional methods, because it would happen before any modifications occurred. And we have ELFSectionWriter that would achieve what we want, I believe? So just create a specific ELFSectionWriter for the dumping of sections and use that.
I'm not completely opposed to using another method to do it, but it seems like anything else we do would just add additional complexity?
yeah, i remember that discussion - at some point (and probably for a particular place - i don't remember that exactly) i thought this would provide some benefits
Due to the soon encroaching change where -I binary will be implemented I'm very much in favor of the OriginalData field being added to SectionBase. Really seems to handle everything very cleanly IMO.
Okay, go for it. I have nerves around lifetime issues with references like that, but I think the structure means that the Reader (and therefore the file it reads) will be around long enough for it to not matter.
This LGTM other than one issue
For the record I like this. It runs contrary to the requirement that --dump-section use the original unmodified data however. For instance despite the fact that we try our best (except on section order? I might fix that) to maintain the original file details we can still change things in doing things that way.
Can you also dump a non-allocated section with --dump-section to show why its far more useful than the -O binary -j hack?
|858 ↗||(On Diff #158482)|
Check for nobits here and use a size of zero if the type is nobits.
I'd also like a test showing what happens when you try to dump a nobits section.
Rather than use the implicitly generated .symtab, I'd prefer it if you create a simple small non-alloc section, with explicit contents (assuming this shows what @jakehehrlich is discussing above).
LGTM, with the suggested changes, but please get @jakehehrlich to confirm he's happy with the test changes.
Nit: Move this to the line immediately after the earlier "CHECK".
I think this is a little unusual formatting of an error message, using '[' and '-'. I think a more common style would be to use '"' and ':':
'Can't dump section ".foo": it has no contents'
Also, llvm-objcopy's style is generally to use upper-case for the first letter in the error message.