This is an archive of the discontinued LLVM Phabricator instance.

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 3]
AcceptedPublic

Authored by abrachet on Aug 1 2019, 11:31 PM.

Details

Summary

Add support for removing sections and symbols.

Diff Detail

Event Timeline

abrachet created this revision.Aug 1 2019, 11:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2019, 11:31 PM
abrachet updated this revision to Diff 213525.Aug 5 2019, 11:38 PM

Use new MutableTable. Add support for deletion.

abrachet planned changes to this revision.Aug 6 2019, 10:18 PM
abrachet updated this revision to Diff 214505.Aug 9 2019, 11:59 PM
abrachet edited the summary of this revision. (Show Details)
abrachet added subscribers: MaskRay, labath.

Self note: I still need to look at coverage of the unit tests.

llvm/include/llvm/Object/MutableELFObject.h
74

entries, similarly -> entries. Similary

128

non remove index -> next index that hasn't been removed

The comment should say what happens if there are no such indexes before the end of the table.

jhenderson added inline comments.Aug 13 2019, 9:03 AM
llvm/include/llvm/Object/MutableELFObject.h
131

You need to assert that CurrentIndex is valid.

std::find_if might be an easier to read function to use here.

138

Could this be based on getNextIndex()? You could check first if the first element has not been removed, and then call getNextIndex(0);

145

non removed entries -> entries that haven't been removed

146

Maybe getEffectiveIndex? Relative doesn't really sound right. What is it relative to?

154–155

"different to size() because size() returns the number of entries that haven't been removed."

156

Perhaps call this getEndIndex()

346

Superfluous overload.

354

Superfluous overload.

llvm/unittests/Object/MutableELFObjectTest.cpp
400

Missing trailing full stop.

425

This shouldn't be auto. Can it be done inline in the check?

430

This shouldn't be auto.

441

Can this be done inline in the check?

445

Missing trailing full stop.

468

Probably requires a comment saying why "6".

481

"index. It"

But really, I'm not sure you should have these tests here. We don't usually test assertions. If this a likely situation to occur, then perhaps the methods should return an Expected.

jhenderson added inline comments.Aug 14 2019, 8:27 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
418

Not sure, but it might be interesting to add some dynamic symbols and show that iterating over those isn't affected by the regular table changes.

433

I think it would be interesting to see what happens when you remove the second symbol too, leaving only the null entry. I also think it would be interesting to test what happens if you try to remove the null entry (I think this should probably be an error).

494

Again, a test case for trying to remove the null section header might be interesting (it should probably be an error).

504–505

Rather than looking at just one, you should probably loop over all of the remaining sections, including the null one.

abrachet updated this revision to Diff 215319.Aug 14 2019, 9:25 PM

Addressed review comments

abrachet marked 20 inline comments as done.Aug 14 2019, 9:30 PM
abrachet added inline comments.
llvm/unittests/Object/MutableELFObjectTest.cpp
418

Wow! Great catch! There was actually a bug here with my moveSymbolNext not differentiating between the two symbol tables. Thanks :)

433

I haven't done this or the similar comment for sections yet. Do you think the removeSymbol method should return Error or should this just be an assert?

jhenderson added inline comments.Aug 15 2019, 4:42 AM
llvm/include/llvm/Object/MutableELFObject.h
128

You've lost the "the" between "Gets" and "next".

This part hasn't been addressed:

The comment should say what happens if there are no such indexes before the end of the table.

130

Why the cast? And what is the type of Range? I don't think you want auto here.

131

What happens if std::find_if returns Range.end()? Won't that result in you dereferencing something invalid?

151

You misunderstood my previous comment. I only wanted you to change the comment from the word "different". In other words, the full text should be:

/// Returns the index of the last element. This is different to size()
/// because size() returns the number of entries that haven't been removed.
llvm/unittests/Object/MutableELFObjectTest.cpp
433

Good question. I'd probably just assert for now.

484–485

added by yaml2obj

abrachet updated this revision to Diff 215420.Aug 15 2019, 9:42 AM
abrachet marked an inline comment as done.

Addressed review comments

abrachet marked 11 inline comments as done.Aug 15 2019, 9:51 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
130

Because templates are very finicky :) The alternative is I could do seq<uint64_t>(CurrentIndex + 1, Mappings.size()). Do you have a preference either way?

131

I had to check this to make sure, *llvm::seq(0, 10).end() yields 10, so it just returns the index in the end state, so just Mappings.size().

llvm/unittests/Object/MutableELFObjectTest.cpp
433

Ok, I've added assertions to removeSymbol and removeSection. I should not test these explicitly because it is an assert, right?

rupprecht added inline comments.Aug 16 2019, 1:55 PM
llvm/include/llvm/Object/MutableELFObject.h
133–137

*llvm::find_if(seq(CurrentIndex + 1, (uint64_t)Mappings.size()), [this](uint64_t Index) { ... });

191–196

Many of these methods are now linear, meaning this library isn't feasible to operate on files that have a large number of symbols (especially if you end up calling these methods in a loop and go quadratic). I think this is fine to experiment with but should be noted, so it can be fixed before adding uses of it.

llvm/unittests/Object/MutableELFObjectTest.cpp
484–488

I think writing a test matcher would make this more self-documenting than a comment.
e.g.
EXPECT_THAT(MutableObject, SectionsAre({".remove_me", ..., }));
which would check both std::distance and each section name by iterating over all sections.

Writing a test helper (instead of a full blown matcher) would also work; you might not need a thorough matcher outside of this test.

abrachet updated this revision to Diff 215735.Aug 16 2019, 7:07 PM
abrachet marked 3 inline comments as done.

Addressed review comments

abrachet marked 4 inline comments as done.Aug 16 2019, 7:12 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
191–196

FWIW, I think size is the only method which gets its complexity changed, this can be solved by just keeping a running count, and having add and remove methods change Size. Does that sound ok?

Methods like getMutableSection index into the table without any checks to deleted entries so those are all O(1), I'm trying to think of others which became O(N^2) but I cannot think of them.

llvm/unittests/Object/MutableELFObjectTest.cpp
484–488

I created this MatchesRange template function. It's not as cool as what you were suggesting but I couldn't figure out how to make a gtest matcher.

jhenderson added inline comments.Aug 19 2019, 2:31 AM
llvm/include/llvm/Object/MutableELFObject.h
128–130

Hmm... thinking about it, how about "Gets the index of the next entry that hasn't ..." and "If there is no entry after ..."

Also "this returns the index one past the last entry" to be more precise.

133

Maybe the cast indicates that these indexes should be size_t everywhere, since they're indexes into a container? What do you think?

191–196

Keeping a running count would make sense to me, if it improves the complexity of other functions.

llvm/unittests/Object/MutableELFObjectTest.cpp
23

Do you need the MatchLen parameter? I don't think you do...

28

std::distance already returns a size_t. No need to cast it.

31

const T& maybe?

433

Exactly. Don't test asserts.

490

Match -> SectionsMatch

labath added inline comments.Aug 19 2019, 7:29 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
484–488

You could consider something like this:

std::vector<StringRef> Sections;
transform(MutableObject.sections(), [](??? S) { return S->getName(); }, std::back_inserter(Sections));
EXPECT_THAT(Sections, testing::ElementsAre(".foo", ".bar", ...));

The reason I would prefer that over your current solution is because this will give you a mostly reasonable error message if the check fails, whereas here you'll just get "false is not true"(tm). (You could even make a utility function to hide the transform stuff and make the check a one-liner.)

rupprecht added inline comments.Aug 19 2019, 9:58 AM
llvm/include/llvm/Object/MutableELFObject.h
191–196

getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear.

See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)

That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.

abrachet updated this revision to Diff 216032.Aug 19 2019, 6:12 PM

Addressed review comments.

abrachet marked 13 inline comments as done.Aug 19 2019, 6:34 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
133

Sounds good. I have changed to size_t it does make more sense.

191–196

Keeping a running count would make sense to me, if it improves the complexity of other functions.

I think that MutableTable::size() isn't actually even used anywhere anymore I just updated it because it was used previously but its use was replaced by getEndIndex(). Should I just remove it then? Keeping a count of the number of elements would only improve the complexity of size() I think.

getNextIndex() is also linear (with the linear factor being # of removed symbols), making getFirstIndex() linear, which makes all the *_begin() iterator methods linear.

Oh, right.

See rL354597 for a real world motivation with llvm-objcopy. An easy way to need to "remove" a lot of symbols is to extract a single section with llvm-objcopy --only-section .foo which will implicitly delete symbols not in that section. If the number of symbols not in that section is large, it will be slow if you keep having to iterate over all removed symbols. (See the test case here if you're interested: https://reviews.llvm.org/D58296?vs=on&id=187690)

FWIW currently the deletion of a symbol I think is linear (the use of remove_if) and has the cost of copying the elements where this doesn't have that same cost. Certainly it will be slower if iteration happens many times after.

That said, there isn't necessarily a need to prematurely optimize this. I think it's fine to just document that the methods are linear, and if we run into regressions when using it on real scenarios, we can rework the data structures.

Absolutely :)

llvm/unittests/Object/MutableELFObjectTest.cpp
28

I thought so too but it returns the iterators difference_type which on most containers is ptrdiff_t. That's a really strange decision by the committee I think, I don't see the rational of not using size_t given distance will never be negative.

484–488

Oh that's much better, before when I didn't know the order yaml2obj was putting the sections in I had to use lldb because all I got was "false is not true" like you said. Thanks!

You could even make a utility function to hide the transform stuff and make the check a one-liner.

I'll probably do this later when I use it in more than one place.

MaskRay added inline comments.Aug 19 2019, 10:42 PM
llvm/include/llvm/Object/MutableELFObject.h
133

uint64_t -> size_t change should be made to Part 1 D64281.

labath added inline comments.Aug 20 2019, 1:48 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
28

For random-access iterators, it is valid to plug the begin and end iterators into std::distance in reverse order and get a negative value in return (but only since c++11).

jhenderson added inline comments.Aug 20 2019, 3:00 AM
llvm/include/llvm/Object/MutableELFObject.h
128–130

You missed this bit:

Also "this returns the index one past the last entry" to be more precise.

133

uint64_t -> size_t change should be made to Part 1 D64281.

+1 to this

191–196

If size() is not used, get rid of it and update the getEndIndex() comment.

llvm/unittests/Object/MutableELFObjectTest.cpp
28

You're both right. My bad.

rupprecht accepted this revision.Aug 20 2019, 3:22 PM
rupprecht added inline comments.
llvm/unittests/Object/MutableELFObjectTest.cpp
508–510

These checks should also use the same pattern above (checking the whole list of section names), with the common bits extracted to a utility method.

(A gmock matcher would be nice, but the API to write a new matcher is insanely hard to get right).

This revision is now accepted and ready to land.Aug 20 2019, 3:22 PM
abrachet updated this revision to Diff 216318.Aug 20 2019, 7:15 PM
abrachet marked 3 inline comments as done.

Addressed comments

abrachet marked 8 inline comments as done.Aug 20 2019, 7:21 PM

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

llvm/unittests/Object/MutableELFObjectTest.cpp
28

Wow I never knew that

130–151

Should I just remove all of this?

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might. If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might.

Right, the sequence is:

  • PrintTo() is found by overload resolution, if there's no specialized version, it falls back to the generic template
  • the generic template checks whether it's a container (by looking for nested iterator and const_iterator types)
  • for non-containers, it attempts to use operator<< if it exists, falling back to "n-byte object XX-XX-XX-XX" otherwise
  • for containers, it prints each element in the same manner

So overloading operator<< doesn't bypass the container detection, but overloading PrintTo() does. And this is what gtest does for std::string, in gtest-printers.h. I think we should do the same as a local modification to internal/custom/gtest-printers.h.

If it bothers you too much you can use a vector<string> on the LHS of the match ( i wouldn't use const char * because I am not sure if the returned StringRefs are guaranteed to be null terminated, and its best to not rely on that).

This has bothered at various times, but I never really investigated how to fix it. I'll try to put together a patch.

jhenderson added inline comments.Aug 21 2019, 2:15 AM
llvm/include/llvm/Object/MutableELFObject.h
154–155

This comment needs updating.

llvm/unittests/Object/MutableELFObjectTest.cpp
129–130

Is this needed now that you have the following lines?

130–151

It looks to me to be largely redundant now (though you should still keep the "change the section name in the middle" bit).

489–491

Do you need this, given you have the previous line?

503–505

Do you need this?

513–515

Do you need this?

I've been using ::testing::ElementsAre(...) against a vector of StringRef a lot, when it fails the output is really bad. Like this: element #1 is equal to { '.' (46, 0x2E), 's' (115, 0x73), 'e' (101, 0x65), 'c' (99, 0x63), '0' (48, 0x30) }, Is there a way to provide formatting for types to gtest? Or should I just be using const char *?

Yeah, there is. It involves writing either an operator<< or a PrintTo function. The StringRef class already has an operator<<, but the problem here is that gtest wants to print it as a container instead of using that operator<<. I don't know if there's anything we can do here, but @sammccall might.

This has bothered at various times, but I never really investigated how to fix it. I'll try to put together a patch.

D66520 addresses this.

rupprecht added inline comments.Aug 21 2019, 11:31 AM
llvm/unittests/Object/MutableELFObjectTest.cpp
22–23

Is T ever going to be something besides StringRef? If not, removing that template parameter may be better.

132–136

Can SectionNames be inlined to avoid variable reuse?

EXPECT_THAT(collect<StringRef>(MutableObject.sections(), getSectionName),
            ::testing::ElementsAre("", ".sec0", ".sec1", ".sec2", ".symtab",
                                   ".strtab", ".shstrtab"));
abrachet updated this revision to Diff 216448.Aug 21 2019, 12:53 PM
abrachet marked an inline comment as done.

Addressed comments

abrachet marked 7 inline comments as done.Aug 21 2019, 12:53 PM
jhenderson accepted this revision.Aug 22 2019, 3:34 AM

No more comments from me. LGTM.