This is an initial implementation of no-op passthrough copying of COFF with objcopy.
I've tried to base the design on the ELF backend, although some of the extra classes (such as Reader vs COFFReader) might not make much sense here.
Differential D54939
[llvm-objcopy] Initial COFF support mstorsjo on Nov 27 2018, 2:08 AM. Authored by
Details This is an initial implementation of no-op passthrough copying of COFF with objcopy. I've tried to base the design on the ELF backend, although some of the extra classes (such as Reader vs COFFReader) might not make much sense here.
Diff Detail
Event TimelineComment Actions Please do split this into a separate pass-through change and option-hook up. It will keep the scale of the review down by quite a bit. Hooking up the options one or two at a time after that would then make reviewing each of those easier too. I assume you've seen D54384, where @alexshap is doing the same for MachO? It would make a lot of sense for the two approaches to mirror each other. I noticed you're planning on adding pre-canned binaries for the tests. Is it possible to use yaml2obj or similar instead? Comment Actions Ok, will split that out. As for adding other options one at a time, I probably can't do it one option at a time, but I'll try to split it into smaller sets of options with the same mechanics at least.
I've seen it and looked at it a little for inspiration with this one as well. The general structure of the ELF/COFF/MachO subdirs are the same, but the actual objects and functions used for rebuilding objects obviously differ depending on the formats' structure etc.
For most practical functional tests, yes. I have one test for the plain passthrough copying which does a byte-by-byte comparison between the input and output, and tools that generate object files/executables have some freedom in exactly how they are laid out. The input files I use right now have been generated with llvm-mc/lld/msvc, but I can try to see how many of them end up identical if passed via yaml. Comment Actions Removed the changes that do actual transformations on the binary. Still using hardcoded input files, as it will take a while to sort out what exact differences are produced if going via yaml (I ran into one yaml2obj bug so far), will look more into that later. Comment Actions FWIW, I can easily change the simple object file inputs into yaml. Executables can also be made into yaml, when the serialization format of VirtualAddress for sections is clarified. (Right now it's broken if you convert to yaml and back.) For executables, I currently pad code segments with 0xcc, to match what LLD does, but yaml2obj doesn't do that, so I'd have to stop doing that if I want a test that matches byte for byte the input data. For bigobj inputs, yaml2obj doesn't produce that unless the bigobj format strictly is needed. So I can't easily test that with yaml unless I create an object file with over 64k sections. So at least for that, a binary input files more or less is wanted. Comment Actions i will get to this diff early next week, sorry about the delay. A small side note - probably the simple changes to include/llvm/Object/COFF.h can be factored out from this diff and reviewed separately (and probably much faster). Comment Actions Ok - splitting them out isn't hard, but I dunno what to make as test for them, if they are going in as such. Comment Actions It looks to me like they should be unit-tested. There is already a set of unit tests for LLVMObject, so it shouldn't be too hard to add them, I'd think. Comment Actions I'm not really a COFF expert, so I haven't yet looked at the details of this change. It broadly looks like the right direction though. If I get time, I'll try to start reviewing the details, but it would be better if somebody with more familiarity looked at them.
Comment Actions Split the large methods in Object.cpp into a bunch of smaller ones, and added a bit more comments. Moved the COFF header getter implementations into the cpp file. Suggested changes not (yet) done:
Currently, the llvm-objcopy output matches bytewise what lld outputs for executables (except for string tables), and what the LLVM codegen outputs for normal object files. If testing with files synthesized from yaml, yaml2obj needs 3 changes; padding executable sections, skipping 4 byte section content alignment as yaml2obj does but MC doesn't, and using the MC StringTableBuilder. The former two are trivial to change, but the latter requires adding an MC dependency to yaml2obj, which I'm not sure is wanted. Comment Actions On the whole ELFObject vs Object thing. I'm in favor of just using Object but I don't really care too much. Having both the namespace and the name prefix is kind of verbose. A while back we had two kinds of Objects instead of just one but this was determined overtime to be a mistake for the ELF backend. If you want to do conversions between formats, even if the format is "binary" you'll probably want to avoid having two kinds of Object or if you do make sure you provide a sufficiently rich base interface to them to allow most code to be written in terms of that.
Comment Actions Also don't focus on byte for byte accuracy, test semantically so that we can make layout changes independent of content changes. It's clear to me now that we shouldn't even bother attempting exact binary matches even in the first patch. Comment Actions Sure, in general it's not necessary to have byte for byte accuracy, but as we write a new file, I've tried to mimic the layout of MC and LLD as those should be sane and actually used in the wild. The test itself doesn't need to check for such accuracy of course though. I guess I could check e.g. the input/output with e.g. llvm-readobj -file-headers or something like that, to check that the output generally looks sane.
Comment Actions @rnk Can you have a look at this one wrt the COFF specifics, and how I'm storing/reassembling it? Comment Actions Removed the RFC tag as it has been through a few rounds of discussion already, and I'd like to see actual progress towards getting it merged. I changed the tests to use yaml files as input as requested. In order to have sensible tests that actually check that the file as a whole is copied correctly, without tediously writing lots of individual CHECK lines for llvm-readobj+FileCheck, I'm using obj2yaml on both inputs and outputs, and a plain cmp on the output from obj2yaml. Later patches that actually do some transformation can of course use more specific and targeted checks with llvm-readobj+FileCheck. Ping @rnk, can you give this a review from a generic COFF perspective? Ping @alexshap, do you have time to have a look? Ping @jakehehrlich, can you follow-up on the discussion where you suggested renaming fields? Can you, if possible, have a look on the patch as a whole, wrt to how I read structs from the input file, keeping them mostly packed in the same kinds of structs as in the binary file. Comment Actions sorry about the delay, I've also pinged some other people who know coff better than I do (at the moment). Comment Actions I guess that could be doable. For the actual stripping operations that I've already implemented (but holding off posting until this is merged), most of the code that does modifications actually is outside of the Object class (so far), so the Object class itself is mostly a dumb container. This, because modifications don't (so far) easily map down to simple individual operations that would fit into individual methods. E.g. removing a section requires updating symbols as well - so so far I've kept it as one series of modifications in COFFObjcopy.cpp. But that's Do you have any opinion on the extra Reader/Writer abstract base classes (vs COFFReader/COFFWriter)? I copied that design from ELF, but I'm not sure if it makes sense here as we have much fewer variants of everything. Comment Actions that's fine, my point was to "unload" some code from Object.cpp, in particular, the serialization/deserialization into separate files. (probably it won't change that much in the future, but the code for modifying Object will evolve as more features will be added). Comment Actions I don't have time to go over this with a fine tooth comb, but I took a look, and all the COFF parts make sense. The string table and the use of "/%d" for long section names is all correct. Comment Actions Split Object.cpp/h into separate files for Reader and Writer as suggested by @alexshap. Comment Actions A few minor cleanups and tweaks to the includes and file headers for the split Reader/Writer. Comment Actions I've looked at the code, there are few minor nits, but to me this code looks like a pretty good start, many thanks for working on this ))
Comment Actions Thanks for the review!
Comment Actions Removed the IsBigObj field, and only storing one single coff file header and one pe32 header struct, and converting to/from the intermediate form in the reader/writer, as suggested/hinted by @alexshap. Comment Actions My last day in work until the New Year is on Wednesday, and unfortunately I don't think I'll get a chance to look further at it until then. I'm happy to defer to other people's judgement on whether this is good, and can review it post-commit if needed. Comment Actions Ping @alexshap - Any reply to your last discussion point and my update of the patch relating to it? Ping @jakehehrlich - Any objections to this, or can I commit with @alexshap's and @rnk's approval? Comment Actions on my side i think this is fine, I don't want to block this diff. Comment Actions High level structure/interface looks good to me. I don't see in obvious code level issues. I'll trust @rnk's assessment of the COFF specific parts of this. I haven't reviewed the testing inputs but it looks like you're able to copy some substantial object files which looks like a solid start. I'll trust that James and Alex have looked at enough of this to fill in the gaps that I might have missed. We can start iterating now if there are any issues. Land it.
|