Add support for removing sections and symbols.
Details
Diff Detail
Event Timeline
Self note: I still need to look at coverage of the unit tests.
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
75 | entries, similarly -> entries. Similary | |
129 | non remove index -> next index that hasn't been removed The comment should say what happens if there are no such indexes before the end of the table. |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
132 | You need to assert that CurrentIndex is valid. std::find_if might be an easier to read function to use here. | |
139 | Could this be based on getNextIndex()? You could check first if the first element has not been removed, and then call getNextIndex(0); | |
146 | non removed entries -> entries that haven't been removed | |
147 | Maybe getEffectiveIndex? Relative doesn't really sound right. What is it relative to? | |
155–156 | "different to size() because size() returns the number of entries that haven't been removed." | |
157 | Perhaps call this getEndIndex() | |
337 | Superfluous overload. | |
345 | Superfluous overload. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
397 | Missing trailing full stop. | |
422 | This shouldn't be auto. Can it be done inline in the check? | |
427 | This shouldn't be auto. | |
438 | Can this be done inline in the check? | |
442 | Missing trailing full stop. | |
465 | Probably requires a comment saying why "6". | |
478 | "index. It" But really, I'm not sure you should have these tests here. We don't usually test assertions. If this a likely situation to occur, then perhaps the methods should return an Expected. |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
415 | Not sure, but it might be interesting to add some dynamic symbols and show that iterating over those isn't affected by the regular table changes. | |
430 | I think it would be interesting to see what happens when you remove the second symbol too, leaving only the null entry. I also think it would be interesting to test what happens if you try to remove the null entry (I think this should probably be an error). | |
491 | Again, a test case for trying to remove the null section header might be interesting (it should probably be an error). | |
501–502 | Rather than looking at just one, you should probably loop over all of the remaining sections, including the null one. |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
415 | Wow! Great catch! There was actually a bug here with my moveSymbolNext not differentiating between the two symbol tables. Thanks :) | |
430 | I haven't done this or the similar comment for sections yet. Do you think the removeSymbol method should return Error or should this just be an assert? |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
129 | You've lost the "the" between "Gets" and "next". This part hasn't been addressed:
| |
131 | Why the cast? And what is the type of Range? I don't think you want auto here. | |
132 | What happens if std::find_if returns Range.end()? Won't that result in you dereferencing something invalid? | |
152 | You misunderstood my previous comment. I only wanted you to change the comment from the word "different". In other words, the full text should be: /// Returns the index of the last element. This is different to size() /// because size() returns the number of entries that haven't been removed. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
430 | Good question. I'd probably just assert for now. | |
481–482 | added by yaml2obj |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
131 | Because templates are very finicky :) The alternative is I could do seq<uint64_t>(CurrentIndex + 1, Mappings.size()). Do you have a preference either way? | |
132 | I had to check this to make sure, *llvm::seq(0, 10).end() yields 10, so it just returns the index in the end state, so just Mappings.size(). | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
430 | Ok, I've added assertions to removeSymbol and removeSection. I should not test these explicitly because it is an assert, right? |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
134–138 | *llvm::find_if(seq(CurrentIndex + 1, (uint64_t)Mappings.size()), [this](uint64_t Index) { ... }); | |
189–190 | Many of these methods are now linear, meaning this library isn't feasible to operate on files that have a large number of symbols (especially if you end up calling these methods in a loop and go quadratic). I think this is fine to experiment with but should be noted, so it can be fixed before adding uses of it. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
481–485 | I think writing a test matcher would make this more self-documenting than a comment. Writing a test helper (instead of a full blown matcher) would also work; you might not need a thorough matcher outside of this test. |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
189–190 | FWIW, I think size is the only method which gets its complexity changed, this can be solved by just keeping a running count, and having add and remove methods change Size. Does that sound ok? Methods like getMutableSection index into the table without any checks to deleted entries so those are all O(1), I'm trying to think of others which became O(N^2) but I cannot think of them. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
481–485 | I created this MatchesRange template function. It's not as cool as what you were suggesting but I couldn't figure out how to make a gtest matcher. |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
129–131 | Hmm... thinking about it, how about "Gets the index of the next entry that hasn't ..." and "If there is no entry after ..." Also "this returns the index one past the last entry" to be more precise. | |
134 | Maybe the cast indicates that these indexes should be size_t everywhere, since they're indexes into a container? What do you think? | |
189–190 | Keeping a running count would make sense to me, if it improves the complexity of other functions. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
23 | Do you need the MatchLen parameter? I don't think you do... | |
28 | std::distance already returns a size_t. No need to cast it. | |
31 | const T& maybe? | |
430 | Exactly. Don't test asserts. | |
487 | Match -> SectionsMatch |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
481–485 | You could consider something like this: std::vector<StringRef> Sections; transform(MutableObject.sections(), [](??? S) { return S->getName(); }, std::back_inserter(Sections)); EXPECT_THAT(Sections, testing::ElementsAre(".foo", ".bar", ...)); The reason I would prefer that over your current solution is because this will give you a mostly reasonable error message if the check fails, whereas here you'll just get "false is not true"(tm). (You could even make a utility function to hide the transform stuff and make the check a one-liner.) |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
189–190 | getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear. See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690) That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures. |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
134 | Sounds good. I have changed to size_t it does make more sense. | |
189–190 |
I think that MutableTable::size() isn't actually even used anywhere anymore I just updated it because it was used previously but its use was replaced by getEndIndex(). Should I just remove it then? Keeping a count of the number of elements would only improve the complexity of size() I think.
Oh, right.
FWIW currently the deletion of a symbol I think is linear (the use of remove_if) and has the cost of copying the elements where this doesn't have that same cost. Certainly it will be slower if iteration happens many times after.
Absolutely :) | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
28 | I thought so too but it returns the iterators difference_type which on most containers is ptrdiff_t. That's a really strange decision by the committee I think, I don't see the rational of not using size_t given distance will never be negative. | |
481–485 | Oh that's much better, before when I didn't know the order yaml2obj was putting the sections in I had to use lldb because all I got was "false is not true" like you said. Thanks!
I'll probably do this later when I use it in more than one place. |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
28 | For random-access iterators, it is valid to plug the begin and end iterators into std::distance in reverse order and get a negative value in return (but only since c++11). |
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
129–131 | You missed this bit:
| |
134 |
+1 to this | |
189–190 | If size() is not used, get rid of it and update the getEndIndex() comment. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
28 | You're both right. My bad. |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
505–507 | These checks should also use the same pattern above (checking the whole list of section names), with the common bits extracted to a utility method. (A gmock matcher would be nice, but the API to write a new matcher is insanely hard to get right). |
I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
28 | Wow I never knew that | |
135–145 | Should I just remove all of this? |
Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might. If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).
Right, the sequence is:
- PrintTo() is found by overload resolution, if there's no specialized version, it falls back to the generic template
- the generic template checks whether it's a container (by looking for nested iterator and const_iterator types)
- for non-containers, it attempts to use operator<< if it exists, falling back to "n-byte object XX-XX-XX-XX" otherwise
- for containers, it prints each element in the same manner
So overloading operator<< doesn't bypass the container detection, but overloading PrintTo() does. And this is what gtest does for std::string, in gtest-printers.h. I think we should do the same as a local modification to internal/custom/gtest-printers.h.
If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).
This has bothered at various times, but I never really investigated how to fix it. I'll try to put together a patch.
llvm/include/llvm/Object/MutableELFObject.h | ||
---|---|---|
155–156 | This comment needs updating. | |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
135–136 | Is this needed now that you have the following lines? | |
135–145 | It looks to me to be largely redundant now (though you should still keep the "change the section name in the middle" bit). | |
486–488 | Do you need this, given you have the previous line? | |
500–502 | Do you need this? | |
510–512 | Do you need this? |
llvm/unittests/Object/MutableELFObjectTest.cpp | ||
---|---|---|
22–23 | Is T ever going to be something besides StringRef? If not, removing that template parameter may be better. | |
138–142 | Can SectionNames be inlined to avoid variable reuse? EXPECT_THAT(collect<StringRef>(MutableObject.sections(), getSectionName), ::testing::ElementsAre("", ".sec0", ".sec1", ".sec2", ".symtab", ".strtab", ".shstrtab")); |
entries, similarly -> entries. Similary