This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] [COFF] Add support for removing sections
ClosedPublic

Authored by mstorsjo on Jan 14 2019, 2:47 PM.

Details

Summary

This is designed similarly to removing symbols; each section gets a unique id, which is used to map symbols to sections through removals/additions, which is then mapped back to proper sequential indices when writing out.

Compared to removal of symbols, section definition symbols add a bit of extra code, as they contain section numbers in the section aux data as well. For associative comdat symbols, that needs to be updated just like the normal section number.

I've tried to keep the removal behaviour sensible, that is: When removing a section, all symbols pointing to that section is removed. (This matches GNU objcopy.) If there were relocations against those symbols in a section that is kept, writing errors out. (GNU objcopy on the other hand will just leave the relocation pointing at any random symbol that is left, but I prefer erroring out in this case.)

If the target section of a associative comdat is removed, the associative comdat can also be removed automatically (as nothing would include it). Nothing that really is used in the wild I guess, but makes the case of manually removing a specific section more convenient.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Jan 14 2019, 2:47 PM
rupprecht added inline comments.Jan 15 2019, 4:01 PM
test/tools/llvm-objcopy/COFF/remove-section.yaml
29 ↗(On Diff #181654)

Can you keep the RUN sections together?

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
32 ↗(On Diff #181654)

Use explicit capture of &Config

tools/llvm-objcopy/COFF/Object.cpp
99 ↗(On Diff #181654)

Use explicit capture of &ToRemove, &RemovedSections

117 ↗(On Diff #181654)

I think it might be clearer to use an iterative approach, and possibly be able to remove the Internal var, e.g. logically something like this:

do {
  Sections.erase(...);
  Symbols.erase(...);
  ToRemove = [](...){ return AssociatedSections.count(Sec.UniqueId) == 1; }
} while (!AssociatedSections.empty());
updateSections();
updateSymbols();

I dunno though. It might be a little awkward, but recursive stuff just seems a little harder to debug, so it might be worth it.

tools/llvm-objcopy/COFF/Object.h
81 ↗(On Diff #181654)

Internal is not very clear -- i.e. it's clear that it's for internal use, but not really clear as to how it affects the call. (Ditto for removeSections).

87 ↗(On Diff #181654)

nit: extra blank line

111 ↗(On Diff #181654)

nit: extra blank line

tools/llvm-objcopy/COFF/Reader.cpp
116 ↗(On Diff #181654)

Consider having the error flow as the final condition instead of in the middle

mstorsjo updated this revision to Diff 182001.Jan 16 2019, 3:06 AM
mstorsjo marked 11 inline comments as done.

Applied @rupprecht's suggestions.

test/tools/llvm-objcopy/COFF/remove-section.yaml
29 ↗(On Diff #181654)

Sure - I tried to group them with the corresponding checks, but I see why it can be good to have all the RUN commands at the top as well.

tools/llvm-objcopy/COFF/Object.cpp
117 ↗(On Diff #181654)

Ok, that works - by updating the ToRemove predicate it becomes pretty palatable.

A second non-recursive alternative would to just remove sections and symbols for the associated sections once. I'm not entirely sure if it's valid (or supported by other tools) to have chains of associated sections, but handling that came for free in the recursive form (and by updating ToRemove).

mstorsjo updated this revision to Diff 182137.Jan 16 2019, 1:43 PM

Tweaked the scope of lambdas in removeSections to avoid use-after-free of stack variables in the implementation of the lambda captures, as pointed out by asan.

rupprecht accepted this revision.Jan 16 2019, 2:25 PM

It'd be nice if someone else could take a pass too, especially for any COFF-specific aspects, but LGTM

This revision is now accepted and ready to land.Jan 16 2019, 2:25 PM

It'd be nice if someone else could take a pass too, especially for any COFF-specific aspects, but LGTM

Thanks! @rnk - can you give this a peek? The main COFF specific details are updating section definition aux symbols if present, tracking the target of associative comdat sections. And if removing the target of an associative comdat, the associated section can be removed as well, to avoid leaving it dangling.

Adding another COFF-knowledgeable reviewer

tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

general comment (plus maybe others (Jake, James, Jordan) add their opinion): I know that smth like this ("finalize*" methods) has been going on in the llvm-objcopy for ELF for a while (in particular, the abstract class Writer has had the method "finalize" from the early beginning), however, this brings up the following question: what is the contract between Object and Writer ? in particular, to what extent an instance of Object should be "ready" to be consumable by a writer.

mstorsjo marked an inline comment as done.Jan 17 2019, 11:22 PM
mstorsjo added inline comments.
tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

Well, it's not very strictly typed out, but in general, while the Object class tries to maintain internal consistency as far as possible, some of the fields in the file format level structs have broken-out forms in the Symbol/Section/Relocation classes. In these cases, the broken-out field is the correct one that is maintained, and the corresponding field in the file format level struct is only set once the final layout of the file is known.

@rnk or @smeenai - This is awaiting an ack from a COFF perspective.

tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

so this is still a concern for me (and the existing ELF code too). So basically several people have mentioned interest in building a library from the code in llvm-objcopy, I'm thinking more and more about the current interfaces & design. Let me try to explain what makes me worried: so basically the idea of having these separate abstractions (Reader, Object, Writer) provides two important "extension points": we can read multiple input formats and we can write multiple output formats (at least potentially). For example, if one day smb decides to implement YAMLWriter i think it'll be unfortunate if he has to reimplement all this finalize* logic. Another thought - this introduces more subtle coupling between these classes (the assumptions which Writer makes about Object are non-obvious, at least for me). I think the same applies to ELF. @jakehehrlich , @jhenderson, @rupprecht - maybe you have some other thoughts / I'm missing something ?

rnk accepted this revision.Jan 18 2019, 2:42 PM

@rnk or @smeenai - This is awaiting an ack from a COFF perspective.

Going forward, I'd say that @mstosjo has enough COFF expertise that we don't need an additional reviewer for that. Please keep getting reviews from @rupprecht and other objcopy owners so the COFF port fits well into the objcopy design, but otherwise, forge onwards without waiting for me. Feel free to ping me if you have specific COFF questions.

mstorsjo marked an inline comment as done.Jan 18 2019, 2:46 PM
mstorsjo added inline comments.
tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

Very fair points, and it might be good to move some amount of finalization back from Writer into Object.

Are you ok with me moving forward with this patch though, and we can look at that as a separate refactoring concern at a later point?

alexander-shaposhnikov added inline comments.
tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

yeah, I'm okay with that (don't want to block this diff), I just wanted to take advantage of this discussion to bring this question up / get some feedback from other people.

rupprecht added inline comments.Jan 18 2019, 3:00 PM
tools/llvm-objcopy/COFF/Writer.cpp
43 ↗(On Diff #182137)

Sure, there's definitely opportunity for cleanup...

  • The split between write() and finalize() doesn't seem that important -- finalize() is always followed by write(), I don't see why we can't just combine them.
  • Having some sort of finalize/write logic in writers seems necessary; different ways of outputting the object is going to have different requirements (e.g. final bookkeeping)
  • But, perhaps Object itself should have a finalize() method for generic things (e.g. assigning sections indices). The balance of what goes in an Object vs Writer finalize method is going to be more of an art than a science, I think.
  • I think intra-coupling is actually the worse culprit, e.g. look at BinaryELFBuilder (to use my own code as a bad example), which requires its own private methods to be called in a specific error.
  • Also I'm not sure how much (if at all) we can generalize this -- creating a generic Object super interface that is unique across ELF/COFF/MachO will likely be too restrictive. Probably something we should just enforce in code review.

All that said, I don't think anything blocks this patch, but it's good to bring it up.

This revision was automatically updated to reflect the committed changes.

@jakehehrlich , @jhenderson, @rupprecht - maybe you have some other thoughts / I'm missing something ?

Just a quick drive-by comment to add to what @rupprecht said. Conceptually to me, the Object should be "final" when all of its details are consistent to represent a valid Object. Section indices are a good example, since there are sections within the Object, but until finalize is called, they don't have valid index values. As a result, a finalize method on the Object probably makes sense to achieve this. Writer meanwhile wants to handle the extra information that is specific to the file format, and not really generic in any meaningful sense (e.g. program headers). I agree that it's going to be hard to have a hard-and-fast rule though.