Page MenuHomePhabricator

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 3]
AcceptedPublic

Authored by abrachet on Aug 1 2019, 11:31 PM.

Details

Summary

Add support for removing sections and symbols.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Aug 13 2019, 9:03 AM
llvm/include/llvm/Object/MutableELFObject.h
132

You need to assert that CurrentIndex is valid.

std::find_if might be an easier to read function to use here.

139

Could this be based on getNextIndex()? You could check first if the first element has not been removed, and then call getNextIndex(0);

146

non removed entries -> entries that haven't been removed

147

Maybe getEffectiveIndex? Relative doesn't really sound right. What is it relative to?

155–156

"different to size() because size() returns the number of entries that haven't been removed."

157

Perhaps call this getEndIndex()

337

Superfluous overload.

345

Superfluous overload.

llvm/unittests/Object/MutableELFObjectTest.cpp
397

Missing trailing full stop.

422

This shouldn't be auto. Can it be done inline in the check?

427

This shouldn't be auto.

438

Can this be done inline in the check?

442

Missing trailing full stop.

465

Probably requires a comment saying why "6".

478

"index. It"

But really, I'm not sure you should have these tests here. We don't usually test assertions. If this a likely situation to occur, then perhaps the methods should return an Expected.

jhenderson added inline comments.Aug 14 2019, 8:27 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
415

Not sure, but it might be interesting to add some dynamic symbols and show that iterating over those isn't affected by the regular table changes.

430

I think it would be interesting to see what happens when you remove the second symbol too, leaving only the null entry. I also think it would be interesting to test what happens if you try to remove the null entry (I think this should probably be an error).

491

Again, a test case for trying to remove the null section header might be interesting (it should probably be an error).

501–502

Rather than looking at just one, you should probably loop over all of the remaining sections, including the null one.

abrachet updated this revision to Diff 215319.Aug 14 2019, 9:25 PM

Addressed review comments

abrachet marked 20 inline comments as done.Aug 14 2019, 9:30 PM
abrachet added inline comments.
llvm/unittests/Object/MutableELFObjectTest.cpp
415

Wow! Great catch! There was actually a bug here with my moveSymbolNext not differentiating between the two symbol tables. Thanks :)

430

I haven't done this or the similar comment for sections yet. Do you think the removeSymbol method should return Error or should this just be an assert?

jhenderson added inline comments.Aug 15 2019, 4:42 AM
llvm/include/llvm/Object/MutableELFObject.h
129

You've lost the "the" between "Gets" and "next".

This part hasn't been addressed:

The comment should say what happens if there are no such indexes before the end of the table.

131

Why the cast? And what is the type of Range? I don't think you want auto here.

132

What happens if std::find_if returns Range.end()? Won't that result in you dereferencing something invalid?

152

You misunderstood my previous comment. I only wanted you to change the comment from the word "different". In other words, the full text should be:

/// Returns the index of the last element. This is different to size()
/// because size() returns the number of entries that haven't been removed.
llvm/unittests/Object/MutableELFObjectTest.cpp
430

Good question. I'd probably just assert for now.

481–482

added by yaml2obj

abrachet updated this revision to Diff 215420.Aug 15 2019, 9:42 AM
abrachet marked an inline comment as done.

Addressed review comments

abrachet marked 11 inline comments as done.Aug 15 2019, 9:51 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
131

Because templates are very finicky :) The alternative is I could do seq<uint64_t>(CurrentIndex + 1, Mappings.size()). Do you have a preference either way?

132

I had to check this to make sure, *llvm::seq(0, 10).end() yields 10, so it just returns the index in the end state, so just Mappings.size().

llvm/unittests/Object/MutableELFObjectTest.cpp
430

Ok, I've added assertions to removeSymbol and removeSection. I should not test these explicitly because it is an assert, right?

rupprecht added inline comments.Fri, Aug 16, 1:55 PM
llvm/include/llvm/Object/MutableELFObject.h
134–138

*llvm::find_if(seq(CurrentIndex + 1, (uint64_t)Mappings.size()), [this](uint64_t Index) { ... });

189–190

Many of these methods are now linear, meaning this library isn't feasible to operate on files that have a large number of symbols (especially if you end up calling these methods in a loop and go quadratic). I think this is fine to experiment with but should be noted, so it can be fixed before adding uses of it.

llvm/unittests/Object/MutableELFObjectTest.cpp
481–485

I think writing a test matcher would make this more self-documenting than a comment.
e.g.
EXPECT_THAT(MutableObject, SectionsAre({".remove_me", ..., }));
which would check both std::distance and each section name by iterating over all sections.

Writing a test helper (instead of a full blown matcher) would also work; you might not need a thorough matcher outside of this test.

abrachet updated this revision to Diff 215735.Fri, Aug 16, 7:07 PM
abrachet marked 3 inline comments as done.

Addressed review comments

abrachet marked 4 inline comments as done.Fri, Aug 16, 7:12 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
189–190

FWIW, I think size is the only method which gets its complexity changed, this can be solved by just keeping a running count, and having add and remove methods change Size. Does that sound ok?

Methods like getMutableSection index into the table without any checks to deleted entries so those are all O(1), I'm trying to think of others which became O(N^2) but I cannot think of them.

llvm/unittests/Object/MutableELFObjectTest.cpp
481–485

I created this MatchesRange template function. It's not as cool as what you were suggesting but I couldn't figure out how to make a gtest matcher.

jhenderson added inline comments.Mon, Aug 19, 2:31 AM
llvm/include/llvm/Object/MutableELFObject.h
129–131

Hmm... thinking about it, how about "Gets the index of the next entry that hasn't ..." and "If there is no entry after ..."

Also "this returns the index one past the last entry" to be more precise.

134

Maybe the cast indicates that these indexes should be size_t everywhere, since they're indexes into a container? What do you think?

189–190

Keeping a running count would make sense to me, if it improves the complexity of other functions.

llvm/unittests/Object/MutableELFObjectTest.cpp
23

Do you need the MatchLen parameter? I don't think you do...

28

std::distance already returns a size_t. No need to cast it.

31

const T& maybe?

430

Exactly. Don't test asserts.

487

Match -> SectionsMatch

labath added inline comments.Mon, Aug 19, 7:29 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
481–485

You could consider something like this:

std::vector<StringRef> Sections;
transform(MutableObject.sections(), [](??? S) { return S->getName(); }, std::back_inserter(Sections));
EXPECT_THAT(Sections, testing::ElementsAre(".foo", ".bar", ...));

The reason I would prefer that over your current solution is because this will give you a mostly reasonable error message if the check fails, whereas here you'll just get "false is not true"(tm). (You could even make a utility function to hide the transform stuff and make the check a one-liner.)

rupprecht added inline comments.Mon, Aug 19, 9:58 AM
llvm/include/llvm/Object/MutableELFObject.h
189–190

getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear.

See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)

That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.

abrachet updated this revision to Diff 216032.Mon, Aug 19, 6:12 PM

Addressed review comments.

abrachet marked 13 inline comments as done.Mon, Aug 19, 6:34 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
134

Sounds good. I have changed to size_t it does make more sense.

189–190

Keeping a running count would make sense to me, if it improves the complexity of other functions.

I think that MutableTable::size() isn't actually even used anywhere anymore I just updated it because it was used previously but its use was replaced by getEndIndex(). Should I just remove it then? Keeping a count of the number of elements would only improve the complexity of size() I think.

getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear.

Oh, right.

See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)

FWIW currently the deletion of a symbol I think is linear (the use of remove_if) and has the cost of copying the elements where this doesn't have that same cost. Certainly it will be slower if iteration happens many times after.

That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.

Absolutely :)

llvm/unittests/Object/MutableELFObjectTest.cpp
28

I thought so too but it returns the iterators difference_type which on most containers is ptrdiff_t. That's a really strange decision by the committee I think, I don't see the rational of not using size_t given distance will never be negative.

481–485

Oh that's much better, before when I didn't know the order yaml2obj was putting the sections in I had to use lldb because all I got was "false is not true" like you said. Thanks!

You could even make a utility function to hide the transform stuff and make the check a one-liner.

I'll probably do this later when I use it in more than one place.

MaskRay added inline comments.Mon, Aug 19, 10:42 PM
llvm/include/llvm/Object/MutableELFObject.h
134

uint64_t -> size_t change should be made to Part 1 D64281.

labath added inline comments.Tue, Aug 20, 1:48 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
28

For random-access iterators, it is valid to plug the begin and end iterators into std::distance in reverse order and get a negative value in return (but only since c++11).

jhenderson added inline comments.Tue, Aug 20, 3:00 AM
llvm/include/llvm/Object/MutableELFObject.h
129–131

You missed this bit:

Also "this returns the index one past the last entry" to be more precise.

134

uint64_t -> size_t change should be made to Part 1 D64281.

+1 to this

189–190

If size() is not used, get rid of it and update the getEndIndex() comment.

llvm/unittests/Object/MutableELFObjectTest.cpp
28

You're both right. My bad.

rupprecht accepted this revision.Tue, Aug 20, 3:22 PM
rupprecht added inline comments.
llvm/unittests/Object/MutableELFObjectTest.cpp
505–507

These checks should also use the same pattern above (checking the whole list of section names), with the common bits extracted to a utility method.

(A gmock matcher would be nice, but the API to write a new matcher is insanely hard to get right).

This revision is now accepted and ready to land.Tue, Aug 20, 3:22 PM
abrachet updated this revision to Diff 216318.Tue, Aug 20, 7:15 PM
abrachet marked 3 inline comments as done.

Addressed comments

abrachet marked 8 inline comments as done.Tue, Aug 20, 7:21 PM

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

llvm/unittests/Object/MutableELFObjectTest.cpp
28

Wow I never knew that

135–145

Should I just remove all of this?

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might. If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might.

Right, the sequence is:

  • PrintTo() is found by overload resolution, if there's no specialized version, it falls back to the generic template
  • the generic template checks whether it's a container (by looking for nested iterator and const_iterator types)
  • for non-containers, it attempts to use operator<< if it exists, falling back to "n-byte object XX-XX-XX-XX" otherwise
  • for containers, it prints each element in the same manner

So overloading operator<< doesn't bypass the container detection, but overloading PrintTo() does. And this is what gtest does for std::string, in gtest-printers.h. I think we should do the same as a local modification to internal/custom/gtest-printers.h.

If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).

This has bothered at various times, but I never really investigated how to fix it. I'll try to put together a patch.

jhenderson added inline comments.Wed, Aug 21, 2:15 AM
llvm/include/llvm/Object/MutableELFObject.h
155–156

This comment needs updating.

llvm/unittests/Object/MutableELFObjectTest.cpp
135–136

Is this needed now that you have the following lines?

135–145

It looks to me to be largely redundant now (though you should still keep the "change the section name in the middle" bit).

486–488

Do you need this, given you have the previous line?

500–502

Do you need this?

510–512

Do you need this?

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might.

This has bothered at various times, but I never really investigated how to fix it. I'll try to put together a patch.

D66520 addresses this.

rupprecht added inline comments.Wed, Aug 21, 11:31 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
22–23

Is T ever going to be something besides StringRef? If not, removing that template parameter may be better.

138–142

Can SectionNames be inlined to avoid variable reuse?

EXPECT_THAT(collect<StringRef>(MutableObject.sections(), getSectionName),
            ::testing::ElementsAre("", ".sec0", ".sec1", ".sec2", ".symtab",
                                   ".strtab", ".shstrtab"));
abrachet updated this revision to Diff 216448.Wed, Aug 21, 12:53 PM
abrachet marked an inline comment as done.

Addressed comments

abrachet marked 7 inline comments as done.Wed, Aug 21, 12:53 PM
jhenderson accepted this revision.Thu, Aug 22, 3:34 AM

No more comments from me. LGTM.