This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] [COFF] Add support for removing symbols
ClosedPublic

Authored by mstorsjo on Dec 19 2018, 1:26 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Dec 19 2018, 1:26 AM

One broad comment is that we should probably implement these in different patches. Just pick one for now and we can build up to the others.

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
56 ↗(On Diff #178846)

Do these really all do the same thing?

jakehehrlich added inline comments.Dec 19 2018, 2:48 PM
tools/llvm-objcopy/COFF/COFFObjcopy.cpp
136 ↗(On Diff #178846)

The two styles by which removeSymbols and removeSections were written was a historical accident in ELF. We should pick one and stick to it here. I was going to say I don't really care which but I'll retract that statement say that using the style that was used in removeSymbols is to be preferred. I think it will both perform better and be less confusing to incoming devs.

144 ↗(On Diff #178846)

I think supporters of checking empty() prior to is_contianed changed their minds. This should be removed in all checks.

tools/llvm-objcopy/COFF/Object.cpp
35 ↗(On Diff #178846)

Can sections reference each other? Can a symbol reference a section? A significant portion of the complexity in removeSection in ELF comes from the fact that we handle those sorts of cases. If you remove a section then depending on circumstance we either remove the referencing object as well or throw an error because you weren't *also* removing the reference.

40 ↗(On Diff #178846)

This seems like it is perhaps something the Writer should do? I'm not clear on the intention here.

48 ↗(On Diff #178846)

Again don't follow the error handling that I used in ELF code. You don't need to pass the otherwise useless Filename if you return an error because you can add that information when you report the error in whatever handles this. Also sometimes it won't even be a file (see -I binary for instance)

58 ↗(On Diff #178846)

I know I reportError or call error all of the place, but it's best to avoid that and instead propogate the Error back. I forgot to check for that in the previous patch. Doing this makes writing tools easy but makes converting this into a library hard.

70 ↗(On Diff #178846)

Ditto on filename and reporting and comment about RawSymbolTable not belonging in general.

100 ↗(On Diff #178846)

Why can't this be a loop over Sec.Relocs? The only blocker seems to be the use of Sec.RelocTargets[I] but it seems that's just a parallel array which is force you to write it like this.

tools/llvm-objcopy/COFF/Object.h
31 ↗(On Diff #178846)

Depending on what you mean here, this makes me think this should not be necessary. Can you elaborate?

33 ↗(On Diff #178846)

Aside from weather these should actually exist, (Idx probably should from my experience but I'm skeptical about RelocTargets) aren't you maintaining a parallel array array here? Why not just have a proper Reloc struct and put both the coff_relocation and the Target in it? That would improve some code I'm going to complain about below.

44 ↗(On Diff #178846)

Yeah this was the sort of thing we eventually compromised on but we agreed that it was a result of poor structure in the ELF code. Ideally this sort of thing should not be in Symbol; we just didn't find a way around it that didn't require significant upheaval in the code. Since we're in an earlier stage with COFF code we can find a better way that storing this with the Symbol itself perhaps?

65 ↗(On Diff #178846)

Seems like a field that only the writer should ever know about.

71 ↗(On Diff #178846)

Can you elaborate on what the intended semantics of this is? I'd kind of prefer the interface of Object be that you can construct a blank object, build it up step by step having a valid object along the way as you go, and then write it when possible. The exception to this is when parts of an Object reference each other you need to allow for temporary invalidity of some form or another while the cycle is being closed.

72 ↗(On Diff #178846)

If possible its most ideal to keep this true as you go rather than setting it all at once. We found that wasn't easily possible to do correctly in the ELF code 1) That might not be true here but even if it is 2) A more generic interface to this would be to use updateSymbols which is the sort of generic operation on an Object that I'd like to see used in favor of something specific like this.

74 ↗(On Diff #178846)

ditto.

75 ↗(On Diff #178846)

Can you describe what this does a bit more?

mstorsjo marked 12 inline comments as done.Dec 20 2018, 2:01 AM

Thanks for the thorough review!

One broad comment is that we should probably implement these in different patches. Just pick one for now and we can build up to the others.

I found them rather entangled, as removing sections also requires removing the corresponding symbols for those sections. But I can try to make this only removal of symbols, as an absolutely minimal first step.

Also, most of the comments seem to be related to the design of the data structure (or lack thereof). I'll try to restructure it according to the general direction I got from these comments, towards the goal of maintaining a correct state along the way as far as possible, moving more of the reassembly of correct state into the writer.

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
56 ↗(On Diff #178846)

As far as I've seen, yes, with respect to sections. With respect to symbols, they do slightly different things though. (In the original objcopy, with things abstracted via libbfd, they probably have slightly different logic, but when mapped down to COFF, this is what I've discovered so far at least.)

136 ↗(On Diff #178846)

Ok, I'll try to rewrite that part (but if I get the patch split, that's in a later patch then).

144 ↗(On Diff #178846)

Ok, will do gladly.

tools/llvm-objcopy/COFF/Object.cpp
35 ↗(On Diff #178846)

A symbol references a section, and a relocation can reference a symbol. There are cases where sections can reference each other (associative comdat symbols) that I haven't implemented yet - I don't think GNU objcopy does either (it doesn't know about the associative comdat concept at all, afaik). Getting those right would of course be ideal, but they're a bit out of scope of what I'm aiming at initially (basic stripping functionality).

Most of the functionality in this patch also revolves around this - removing symbols that belong to a removed section, etc, but it's rather spread out, and the Object class API isn't easy to use for constructing things from scratch, and updates requires coordinating a few steps.

40 ↗(On Diff #178846)

It initializes helper tables that currently only are used during the transformation stage. But I think the general remedy to most of the suggested changes is to make more of these permanent members of the data structures, updated on the go as far as possible, and moving more of the logic into the reader/writer.

48 ↗(On Diff #178846)

Ok, that makes sense. I didn't copy this design from the ELF code actually, I independently made the same bad design...

100 ↗(On Diff #178846)

Yes, moving the target into the Relocs vector would get rid of that.

tools/llvm-objcopy/COFF/Object.h
31 ↗(On Diff #178846)

I mean that they're not essential for reading/writing, but necessary as intermediate helpers when remapping symbol indices. So strictly, this could also be kept in separate datastructure within COFFObjcopy, as it doesn't have to be persisted to the writing stage. But it's more convenient to have them go along with the Section struct here.

33 ↗(On Diff #178846)

Sure, that should be doable.

44 ↗(On Diff #178846)

I don't instinctively think of a better way of storing it - can you elaborate on what you would have in mind?

65 ↗(On Diff #178846)

No, the write (as it currently stands) doesn't need to know about this. Currently, before invoking the writer, all relocations have correct symbol table indices set up.

A COFF symbol can have a number of auxillary symbol table entries following, with metadata about that preceding symbol. The Symbols array here only consists of logical symbols (so that when removing one symbol, the corresponding extra entries are gone along with it), and this allows looking up what Symbol a raw symbol table index (from a relocation) refers to. The writer currently just writes out the symbol table index in the relocation as is.

71 ↗(On Diff #178846)

That sounds like a great goal for the Object class API, my design so far is too much of a plain container with the logic scattered outside of it.

mstorsjo updated this revision to Diff 179052.Dec 20 2018, 5:44 AM
mstorsjo retitled this revision from [llvm-objcopy] [COFF] Add support for removing symbols and sections to [llvm-objcopy] [COFF] Add support for removing symbols.
mstorsjo edited the summary of this revision. (Show Details)

Split out code for removing sections to a later patch. Maintaining a map from symbol name to Symbol object within the Object class, and keeping the relocation target symbol name in a Reloc struct together with the coff_relocation struct. This hopefully addresses most of @jakehehrlich's feedback.

mstorsjo marked 10 inline comments as done.Dec 20 2018, 5:49 AM

Marked a few more comments done (for things that I moved to a later patch). The main thing I didn't change much is the Referenced field, which I update with a helper function before it's needed. I could make the Reader call it originally to avoid having to do it in COFFObjcopy.cpp as well, in order to try to keep it up to date at all times.

jakehehrlich marked 2 inline comments as done.Dec 20 2018, 4:05 PM

Head's up I'm about to go home for the evening and I'm pretty well off for until the new year after that. I think others are in a similar boat. I'm not ready to LGTM this yet and I don't have time to give more review so this might have to wait until January 2nd.

I found them rather entangled, as removing sections also requires removing the corresponding symbols for those sections. But I can try to make this only removal of symbols, as an absolutely minimal first step.

Yeah I think given how simple section stripping seems to be for COFF relative to ELF this might make sense. I'm fine with just minimizing the number of options a bit. I think we can come back to minimizing handleArgs after the new years

tools/llvm-objcopy/COFF/Object.cpp
19 ↗(On Diff #179052)

I was thinking of updateSymbols as more of an operation the user could take on each symbol. See updateSymbols in the ELF code for instance. This seems like something that can be done at finalization time.

35 ↗(On Diff #178846)

Awesome! Good to hear COFF doesn't (at least yet) have that complexity.

48 ↗(On Diff #178846)

I copied it from LLD so you're not alone.

tools/llvm-objcopy/COFF/Object.h
44 ↗(On Diff #178846)

I don't have anything specifically in mind, I don't remember the review too well. It's kind of formats specific. For instance a symbol can be "referenced" by more than one section. It might be nice to have sections list all the symbols that they reference (except the symbol table which is the owner not the referencer) but this could cause O(n^2) running times. I was hoping that some quirk or insight you might have into the nature of COFF might lead us toward a means of avoiding storing weather this was referenced or not while maintaining something about as efficient. Had I done a complete redesign of the ELF code with this in mind we might wound up with something like a graph that could be walked to take out all the things that we actually wind up using and stripping would just reduce the set of initial nodes for instance; That's a crazy different direction but maybe something less extreme is possible here.

72 ↗(On Diff #178846)

Oh other thing, I think we did this on the section level (which probably doesn't make sense here) but it was called "markSymbols" to mimic "mark and sweep" garbage collection. I'd prefer that name but it hardly matters.

test/tools/llvm-objcopy/COFF/X86/remove-local-symbols.s
1 ↗(On Diff #179052)

so I'm confused - what decision has been made regarding tests (sorry, many people have commented, I'd like to understand the logic):
all the new tests will use assembly while yaml is deprecated, we will use both yaml and assembly, we will use yaml and assembly only in exceptional cases, or what ?
As far as I'm concerned - i want consistency + functionality, at least in the future.

mstorsjo marked 4 inline comments as done.Dec 21 2018, 1:47 AM

Head's up I'm about to go home for the evening and I'm pretty well off for until the new year after that. I think others are in a similar boat. I'm not ready to LGTM this yet and I don't have time to give more review so this might have to wait until January 2nd.

Ok, thanks for the reviews so far, let's pick it up again after new year's. I realized a few shortcomings in my approach so far (that had gone unnoticed while I've kept my patchset in use for a couple weeks but that just now cropped up) - the fact that I can't use symbol names as unique keys, and that this tool really does run on objects with associative comdats in the build setups I'm testing, so I'll try to revise it further until then.

test/tools/llvm-objcopy/COFF/X86/remove-local-symbols.s
1 ↗(On Diff #179052)

I'm not aware of any other decision other than preferring yaml over binary object files.

I'd strongly prefer to be able to use both yaml and assembly for these tests; yaml is ideal for dumped output from a full toolchain, including debug info and everything that a real object file would contain. Assembly is better for the cases when I want to construct a specific setup to test certain cases, where this keeps the test much smaller and easier to read and understand. And for executables instead of object files, yaml is the only option as we can't rely on lld.

tools/llvm-objcopy/COFF/Object.cpp
19 ↗(On Diff #179052)

Some of this can be done at finalization, but the map from symbol name to object pointer needs to be up to date when we do setReferencedSymbols or markSymbols if I rename it that way. With the current setup, of only supporting removing symbols and nothing else, it's enough if the Reader sets this up as well, but if we do multiple consecutive operations on the symbol vector (once we add removing of sections) we need to do it after each one of them.

However I just realized that there's fatal flaw with this in the current form; symbol names aren't necessarily unique within a COFF object file - there's a local symbol for each section, and there can be multiple sections with the same name. In the COFF files on disk, the symbol table index is the unique key that differentiates them, but when we rearrange the symbol table (e.g. on removals) I'll probably have to add some other unique id to be able to hook them up correctly back again after removals.

tools/llvm-objcopy/COFF/Object.h
44 ↗(On Diff #178846)

Well the alternative, as I see it, would be to store a backreference in each symbol, indicating what relocations refer to it, and that pretty much doubles the storage needed for all relocations. It probably creates a nice walkable graph in all directions, but also feels rather overengineered.

72 ↗(On Diff #178846)

Ok, I can rename it to markSymbols.

mstorsjo marked an inline comment as done.Dec 30 2018, 12:17 PM

I realized a few shortcomings in my approach so far (that had gone unnoticed while I've kept my patchset in use for a couple weeks but that just now cropped up) - the fact that I can't use symbol names as unique keys

I'm updating the patch to work around this now, by adding a different unique key to each symbol, to avoid having to use the non-unique name for that. Incidentally I noticed that neither obj2yaml, llvm-readobj or llvm-objdump were capable of differentiating between two symbols with the same name when dumping relocations; I'm adding support for that in D56140 for llvm-readobj, and will amend the tests here a bit once that lands, to properly verify that aspect as well.

tools/llvm-objcopy/COFF/Object.cpp
35 ↗(On Diff #178846)

Actually, associative comdats are a case where one section (indirectly, via a section definition symbol) references another section. Since this patch only handles symbol removal so far, I'm not touching that quite yet. Once I do, I'll probably first just let the removal do what it does and then error out at the end if there are dangling references - which should save quite a bit of complexity and be just fine for the basic stripping functionality (and shouldn't exclude implementing it differently later if desired).

mstorsjo updated this revision to Diff 179739.Dec 30 2018, 12:19 PM

Did a few suggested changes, renaming setReferencedSymbols to markSymbols, and using a unique key instead of the non-unique symbol name. Will amend the tests for that case once D56140 lands.

jhenderson added inline comments.Jan 2 2019, 3:38 AM
test/tools/llvm-objcopy/COFF/X86/remove-local-symbols.s
1 ↗(On Diff #179052)

Yes, to my knowledge, we haven't agreed anything formal re. YAML versus assembly, but I personally would prefer YAML wherever possible. From experience, at least for ELF, assembly often doesn't provide enough low-level control which we need to produce objects that have the properties we want in the input (e.g. what kinds of relocations get emitted etc). The comment about calling otherfunc not resulting in a relocation is a perfect example of this. It also removes the need for the "PRE" checks.

Also, I might be wrong, but wouldn't making this use YAML allow us to drop the X86-requirement here?

20–21 ↗(On Diff #179739)

This will produce a false positive, if otherfunc is present, because you've got nothing checking the total number of symbols. I don't know if COFF has a convenient way of testing this, but I would recommend doing so. Otherwise, you should put a -NEXT suffix on this line (and similarly in the above block) - you should also show that there are no symbols before this list if doing things this way, since llvm-objcopy might have reordered the symbols to put otherfunc first instead of removing it.

Could we also test that two different local symbols are removed? That shows that we remove all locals not just the first.

test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s
22 ↗(On Diff #179739)

My comments apply from the remove-local-symbols test apply equally well here. We need to more tightly show that there are no symbols in other places. Similarly, it might be nice to make this YAML based if possible.

test/tools/llvm-objcopy/COFF/X86/remove-symbol2.s
6–10 ↗(On Diff #179739)

I don't think this check really gives us anything in this case.

12 ↗(On Diff #179739)

It would be nice if this test checked that the file name appears in the error. The symbol name should probably also be quoted in some fashion. At least on ELF, it is possible to have symbol names with spaces in, so the error can be a bit ambiguous!

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
38 ↗(On Diff #179739)

There seems to be an awful lot of code in this function that I don't think is tested. Perhaps it would make more sense to implement it in more patches (e.g. basic explicit symbol removal, then keep, then strip all or whatever etc), and test each part individually.

52 ↗(On Diff #179739)

This isn't a parsing failure. Use a more appropriate error value, like invalid_argument.

tools/llvm-objcopy/COFF/Object.cpp
38–39 ↗(On Diff #179739)

Referenced is initialised to false. Do you need to reset it here as well? I don't envisage a situation where we should need to do this more than once on a single object.

tools/llvm-objcopy/COFF/Object.h
27–28 ↗(On Diff #179739)

Are these constructors really necessary? Won't they be generated by default?

46–47 ↗(On Diff #179739)

Don't abbreviate unnecessarily (Idx -> Index). Do we really need both separately? If so, perhaps better names would be OriginalIndex and Index or similar, and then use the input symbol index as the key.

70 ↗(On Diff #179739)

This comment seems to indicate an encapsulation issue to me. If this function has to be called any time the vector is updated, perhaps the vector should be private and accessed only through member functions?

mstorsjo marked 9 inline comments as done.Jan 2 2019, 1:09 PM
mstorsjo added inline comments.
test/tools/llvm-objcopy/COFF/X86/remove-local-symbols.s
1 ↗(On Diff #179052)

Yes, using YAML would remove the dependency on X86 being enabled, and I agree that it makes some things less ambiguous. As for writing (and reading) the tests I prefer assembly, but if there's strong opposition I can make them YAML instead and just use assembly as source locally when crafting the tests.

There is, however, at least one case in a test (that depends on D56140) which is impossible to make using YAML as input at the moment; the COFF YAML format doesn't distinguish between multiple symbols with the same name for relocations. To fix that, the COFF YAML format would need to be changed to use symbol indices instead of names for relocations (mirroring what actually is stored in the files).

20–21 ↗(On Diff #179739)

Indeed, that's true. I somehow thought I tested this stricter than this, but apparently I misremembered.

Yes, some sort of stricter check would indeed be good. I think the current order actually is produced by llvm-nm and not dependent on the source file itself, but avoiding such assumptions is of course better.

Using -NEXT would indeed help, but is there anything that would force a match on the first line? llvm-nm doesn't print anything at all before the symbols. llvm-readobj -symbols is way too verbose for a neat test here (it prints 5+ lines per symbol), but llvm-objdump -t turns out to be very compact and detailed, so that's probably even better than this. That output also has got a nice start header that makes the use of -NEXT exact.

test/tools/llvm-objcopy/COFF/X86/remove-symbol2.s
6–10 ↗(On Diff #179739)

Indeed, this part of the test is pretty pointless.

12 ↗(On Diff #179739)

Ok, will make the code quote it and have the test look for a filename as well.

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
38 ↗(On Diff #179739)

Fair enough, I can do that.

52 ↗(On Diff #179739)

Ok, will use a different error code.

tools/llvm-objcopy/COFF/Object.cpp
38–39 ↗(On Diff #179739)

I'd rather have the method be self-contained enough to clear it here, in case the caller would call this method multiple times. If it doesn't, the method leaks the limitation that it can only be called once, which in itself also is rather encapsulation.

To make things safer perhaps the field should be initialized to true, to indicate that all symbols are referenced unless we've called this and actually know for sure?

tools/llvm-objcopy/COFF/Object.h
46–47 ↗(On Diff #179739)

Ok, I'll expand the unnecessarily terse name.

As for Key vs OriginalIndex - yes, input symbol index would work just as well for key (and that's pretty much what I use right now anyway). But as @jakehehrlich wants to have the Object API usable even for constructing object files from scratch, naming it OriginalIndex feels a bit limiting.

70 ↗(On Diff #179739)

Yes, that'd be a nicer abstraction. That leaves things a bit open wrt how to use it from the Reader though; I could have an add method which adds one symbol at a time and updates the map after each one. That ends up with O(n^2) for the map updates. Alternatively the Reader produces a vector locally and there'd be an addSymbols(vector<Symbol>) method here, which adds all at once and does one update of the map at the end.

What do you think?

mstorsjo updated this revision to Diff 179944.Jan 2 2019, 2:44 PM

Updating with most of the clear issues fixed, removed code for all other stripping modes than removing explicitly named symbols. Using llvm-objdump -t instead of llvm-nm for printing symbol lists, using -NEXT to avoid the risk of any missed symbols. Using llvm::errc::invalid_argument in one place, renamed RawIdx to RawIndex, added quoting around the symbol name for the erorr message, extended the error test to check for this and the file name.

Other issues that still are under discussion are not yet changed.

mstorsjo updated this revision to Diff 180003.Jan 3 2019, 12:42 AM

Updated and extended the testcase after llvm-readobj was improved. The testcase in remove-symbols1.s no longer works with yaml input, with current yaml2obj/obj2yaml tools/formats.

llvm-readobj -symbols is way too verbose for a neat test here (it prints 5+ lines per symbol)

Not that it's really relevant here, I think, but llvm-readelf (aka GNU output style for llvm-readobj) produces single line output for each symbol. I've not looked, but I'm guessing that llvm-readobj doesn't have an equivalent for COFF. Maybe that's worth taking a look at at some point? We actually tend to use the verbose output style for the llvm-objcopy ELF tests though.

Yes, using YAML would remove the dependency on X86 being enabled, and I agree that it makes some things less ambiguous. As for writing (and reading) the tests I prefer assembly, but if there's strong opposition I can make them YAML instead and just use assembly as source locally when crafting the tests.

I don't feel strongly about it, but I would prefer YAML, because of the extra control granted.

There is, however, at least one case in a test (that depends on D56140) which is impossible to make using YAML as input at the moment; the COFF YAML format doesn't distinguish between multiple symbols with the same name for relocations. To fix that, the COFF YAML format would need to be changed to use symbol indices instead of names for relocations (mirroring what actually is stored in the files).

Hmm... that's an interesting case. I think it accepting arbitrary indexes would be nice. IIRC, the ELF YAML format allows for some fields specifying either a name or index, and if an index, it doesn't do any checking, which allows for crafting invalid inputs to test error handling. Having a symmetry here would make a lot of sense.

test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s
11–12 ↗(On Diff #180003)

Unless there's a strong need, I'd just drop the (16)/(15) bit at the end, as it allows folding these two lines together, and doesn't really give us anything, since there's only one "ptr" symbol.

47 ↗(On Diff #180003)

Could you largely fold this and the PRE block together, to reduce risk of copy/paste issues, and make clear the difference?

You could do this by sharing the majority of the checks, but having a single difference for the removed symbol:

...
# SYMBOLS-NEXT: foo
# SYMBOLS-PRE-NEXT: otherfunc
# SYMBOLS-NEXT: otherfunc2
...
tools/llvm-objcopy/COFF/COFFObjcopy.cpp
32 ↗(On Diff #180003)

You might want to simplify this if in this first version to only check the SymbolsToRemove.

40–42 ↗(On Diff #180003)

This comment doesn't really make sense in the patch's current form.

tools/llvm-objcopy/COFF/Object.cpp
38–39 ↗(On Diff #179739)

Okay, I see your point, so it's fine what you have done. I have issues both with it being set to true and to false, because each would result in technically incorrect state for a period of time until the function is called. Perhaps it would be best to simply not initialise Referenced at all, until markSymbols is called, since the value can't be relied on to be safe without that call.

tools/llvm-objcopy/COFF/Object.h
46–47 ↗(On Diff #179739)

Okay, no problem. I'd marginally prefer UniqueID to Key, I think, but I'm okay if you prefer it this way around.

70 ↗(On Diff #179739)

I think an addSymbols function as you describe would make a lot of sense. A future change might also want a single addSymbol function too, but that's just a minor thing that can be added when required.

mstorsjo marked 14 inline comments as done.Jan 3 2019, 2:14 PM

There is, however, at least one case in a test (that depends on D56140) which is impossible to make using YAML as input at the moment; the COFF YAML format doesn't distinguish between multiple symbols with the same name for relocations. To fix that, the COFF YAML format would need to be changed to use symbol indices instead of names for relocations (mirroring what actually is stored in the files).

Hmm... that's an interesting case. I think it accepting arbitrary indexes would be nice. IIRC, the ELF YAML format allows for some fields specifying either a name or index, and if an index, it doesn't do any checking, which allows for crafting invalid inputs to test error handling. Having a symmetry here would make a lot of sense.

Ok, I'll try to have a look at that and whether it's possible to extend the COFF YAML format similarly.

test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s
11–12 ↗(On Diff #180003)

The point I tried to make in keeping it explicit, is that llvm-objcopy updated the symbol index in this relocation. Granted, if it wouldn't have, the symbol name would have changed as well (or it would have ended up as a broken relocation), but I tried to show it clearly that this reloc actually does refer to a symbol that changed index.

47 ↗(On Diff #180003)

Oh, right, yes - will do.

tools/llvm-objcopy/COFF/COFFObjcopy.cpp
32 ↗(On Diff #180003)

Right, yes, will fix.

40–42 ↗(On Diff #180003)

Oops, will fix.

tools/llvm-objcopy/COFF/Object.cpp
38–39 ↗(On Diff #179739)

Ok, I can leave it uninitialized, and then sanitizers and checkers can point out if it's ever used uninitialized when it shouldn't.

tools/llvm-objcopy/COFF/Object.h
27–28 ↗(On Diff #179739)

Sorry, I forgot to reply on this one yesterday. They're indeed unnecessary if using the right syntax for initializing, I didn't think of the new C++11 syntax for doing that without explicit constructors.

46–47 ↗(On Diff #179739)

I can change it to UniqueId, that's what I originally called it, but later changed it into Key because I thought it was better - I'll change it back.

mstorsjo updated this revision to Diff 180140.Jan 3 2019, 2:16 PM
mstorsjo marked 6 inline comments as done.

Updated the details discussed with @jhenderson, except for the still-discussed overly verbose relocation symbol index in the test, and still using assembly instead of yasm for the test, pending extending obj2yaml/yaml2obj.

mstorsjo marked an inline comment as done.Jan 3 2019, 2:19 PM
mstorsjo added inline comments.
tools/llvm-objcopy/COFF/Object.h
72 ↗(On Diff #180140)

Oops, I forgot to clang-format this one - done locally but won't repost a new patch just for that.

jhenderson added inline comments.Jan 4 2019, 2:16 AM
test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s
11–12 ↗(On Diff #180003)

Okay, that seems reasonable. Might be worth a small comment to show that's what you're doing (i.e. showing that the index has been changed).

tools/llvm-objcopy/COFF/Object.cpp
20–23 ↗(On Diff #180140)

Not sure it's actually any simpler, but you could use std::transform, to do this. Approximate untested code:

std::transform(NewSymbols.begin(),
  NewSymbols.end(),
  std::back_inserter(Symbols),
  [&NextSymbolUniqueID](Symbol&S){ S.UniqueId = NextSymbolUniqueId++;});

Okay, now that I type it out, this is probably less readable too, so feel free to completely ignore it!

tools/llvm-objcopy/COFF/Object.h
64 ↗(On Diff #180140)

Could this return an ArrayRef?

67 ↗(On Diff #180140)

This can take an ArrayRef.

72 ↗(On Diff #180140)

I'd move this method to be alongside getSymbols.

27–28 ↗(On Diff #179739)

I'm happy that you changed this, but I now realise that I misread the original, and probably wouldn't have bothered commenting if I'd realised it was a conversion constructor, not a generic copy constructor!

mstorsjo marked 8 inline comments as done.Jan 4 2019, 1:42 PM
mstorsjo added inline comments.
test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s
11–12 ↗(On Diff #180003)

Ok, adding a comment here.

tools/llvm-objcopy/COFF/Object.h
64 ↗(On Diff #180140)

Yup, will change.

67 ↗(On Diff #180140)

Done.

72 ↗(On Diff #180140)

Ok, will do.

mstorsjo updated this revision to Diff 180315.Jan 4 2019, 1:44 PM
mstorsjo marked 4 inline comments as done.

Changed a few cases of std::vector to ArrayRef as suggested, moved the getMutableSymbols method next to getSymbols as suggested, added a comment about the verbose tests, and changed the test input to yaml (depending on D56294).

jhenderson added inline comments.Jan 7 2019, 3:03 AM
test/tools/llvm-objcopy/COFF/Inputs/remove-symbols.yaml
38 ↗(On Diff #180315)

Similar to my comments in D56294, how much of this YAML is actually necessary to test symbol removal? We really only want the very minimum needed for this particular case (and we can add more for other tests or too this test as we add more cases).

test/tools/llvm-objcopy/COFF/remove-symbol1.test
1 ↗(On Diff #180315)

Please rename this to strip-symbol.test, to match the long form of the switch name. Also, please add a case using the long form. This could be a second run of llvm-objcopy on the same input, and show that you get the same output. Finally, the general pattern in most ELF tests is to make the edits out-of-line (i.e. specify a different output file to the input file), and then to show that the input file was not itself modified by the operation. This last point is probably only important because of the second point re. the long form switch name etc.

test/tools/llvm-objcopy/COFF/remove-symbol2.test
1 ↗(On Diff #180315)

Rename this test to something meaningful, e.g. strip-referenced-symbol.test

In fact, I wonder if we should mirror the equivalent ELF test (strip-reloc-symbol.test) and error message as much as possible? What do you think?

mstorsjo updated this revision to Diff 180466.Jan 7 2019, 5:03 AM

Stripped down the yaml input, renamed tests as suggested (matching the ELF tests), adjusted the error message to match the one used in the ELF case.

jhenderson accepted this revision.Jan 7 2019, 5:28 AM

LGTM, but please wait for the others to confirm they're happy.

This revision is now accepted and ready to land.Jan 7 2019, 5:28 AM

@jakehehrlich - Any further comments, or can this go in?

tools/llvm-objcopy/COFF/Writer.h
25 ↗(On Diff #180466)

btw, i forgot to ask it earlier - why do we need this base class + protected fields ?
maybe just roll it into COFFWriter ? (the hierarchy of Readers (in ELF and MachO) basically facilitates handling different kinds of input (object file vs binary input (not a regular object file)), but why is this base class necessary ?

tools/llvm-objcopy/COFF/Object.h
84 ↗(On Diff #180466)

i was thinking about these 3 guys for some time,
i don't have a strong opinion - however, for what it's worth, they all should be in some consistent state, maybe there should be an extra abstraction which would encapsulate them ? (and Object would own that abstraction (for example)). There is some interference with relocations, but we can expose some methods to enable that "communication".

mstorsjo marked 2 inline comments as done.Jan 7 2019, 2:44 PM
mstorsjo added inline comments.
tools/llvm-objcopy/COFF/Object.h
84 ↗(On Diff #180466)

Well they're private at least, to me that's at least some sort of encapsulation. I'd rather not overengineer that part at this point, in case later patches turn out to need things wrt these I haven't thought of yet.

tools/llvm-objcopy/COFF/Writer.h
25 ↗(On Diff #180466)

They're not really necessary - I brought them along while copying the general structure from ELF, and I tried to ask for opinions during review whether I should keep or remove them, but I don't think anybody commented on it. I can do an NFC patch to remove them.

alexander-shaposhnikov added inline comments.
tools/llvm-objcopy/COFF/Writer.h
25 ↗(On Diff #180466)

yeah, sounds good, thanks

rupprecht accepted this revision.Jan 10 2019, 12:57 PM

Just some nits, but I think this is good to commit.

tools/llvm-objcopy/COFF/Object.cpp
20 ↗(On Diff #180466)

Symbol& S?

31 ↗(On Diff #180466)

Given the number of symbols may be large and you know the exact size you want to add, you might want to avoid reallocation by changing SymbolMap.clear() with SymbolMap = DenseMap<xxx>(Symbols.size()).

tools/llvm-objcopy/COFF/Reader.cpp
87 ↗(On Diff #180466)

Similarly you can set an initial size of COFFObj.getRawNumberOfSymbols() here

mstorsjo marked 3 inline comments as done.Jan 10 2019, 1:09 PM

Just some nits, but I think this is good to commit.

Ok - should I still wait for @jakehehrlich, or do you happen to know about his whereabouts?

tools/llvm-objcopy/COFF/Object.cpp
20 ↗(On Diff #180466)

No, intentionally not a reference here. The point is that we have an ArrayRef<Symbol> from the caller, which essentially is read-only, and we mutate the Symbol objects as we insert them into the Object class' internal vector.

31 ↗(On Diff #180466)

Ok, I'll try that.

tools/llvm-objcopy/COFF/Reader.cpp
87 ↗(On Diff #180466)

Ok, will do. That'd be Symbols.reserve(COFFObj.getRawNumberOfSymbols()) right, as it can't be done via the constructor directly?

Just some nits, but I think this is good to commit.

Ok - should I still wait for @jakehehrlich, or do you happen to know about his whereabouts?

IMO you can go ahead and submit now -- Jake (or anyone else) can always do a post-commit review if there are issues with this patch.

tools/llvm-objcopy/COFF/Reader.cpp
87 ↗(On Diff #180466)

Sounds right to me

This revision was automatically updated to reflect the committed changes.