Page MenuHomePhabricator

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 1]
AcceptedPublic

Authored by abrachet on Jul 6 2019, 1:18 AM.

Details

Summary

This initial patch adds a MutableELFObject class for doing mutations on sections of an ELFObjectFile.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Aug 7 2019, 8:55 AM
llvm/include/llvm/Object/MutableELFObject.h
160

Probably this should be handleError(MutSecOrErr.takeError());

abrachet updated this revision to Diff 213933.Aug 7 2019, 9:54 AM
abrachet marked 9 inline comments as done.

Addressed review comments

abrachet marked 12 inline comments as done.Aug 7 2019, 9:59 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
118

Yes it is used and thus tested implicitly. How would I go about testing the private interface? I could make it protected and then inherit from it in the unit test, but that doesn't sound great. FWIW this is almost identical to the getHeader method in ELFFile. https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Object/ELF.h#L108

122

Yes. It is used for every public method on SectionRef that gets tested in the unit test. Is this enough or should I test it directly as well?

144

I did an explicit copy /I think/ in the unit test now. I have an ELFObjectFile * and then deference it and assign to an ObjectFile &. Does this make a copy or does the compiler omit it and just assign the address?

169

This was referring to moveSectionNext() before.

It does get used, it is an overload of the function that content_iterator::operator ++() calls. It must be overriden because ELFObjectFile::moveSectionNext() iterates over sections not with their index but their mapped address. This uses index into MutableTable.

llvm/unittests/Object/MutableELFObjectTest.cpp
43 ↗(On Diff #213791)

I don't think so. Then both will be MutableELFObject's and the same virtual methods will be called for each, I believe. I think testing the methods on an ELFObjectFile and MutableELFObject test that overloads for MutableELFObject behave the same as the methods in ELFObjectFile when the sections haven't been changed.

78 ↗(On Diff #213791)

Thanks!

114 ↗(On Diff #213791)

I have ASSERT_EQ(NumSections, 7); it ended up being 7 sections because of additional sections that yaml2obj added. Is this what you meant?

192 ↗(On Diff #213791)

I don't actually need the L it turns out. Without making it unsigned I get this warning comparison of integers of different signs on clang.

abrachet marked 3 inline comments as done.Aug 7 2019, 10:03 AM

Aargh, you pushed your latest update whilst I was doing my last review, so I can't see what are the latest changes!

added a better overload to public methods which take a SectionRef so they can be used in for-each loops.

What is this bit referring to?

I added getMutableSection(SectionRef) and changed getMutableSection(section_iterator Sec) to just call that with a *Sec. Maybe I should remove the section_iterator overload though?

labath added inline comments.Aug 7 2019, 11:50 AM
llvm/include/llvm/Object/MutableELFObject.h
144

Kind of the whole point of references is that they allow you to "refer" to other objects without making copies of them. If you want to make a copy, drop the & and assign it to an ObjectFile (or ELFObjectFile, or something)

abrachet updated this revision to Diff 213985.Aug 7 2019, 1:43 PM

Create a new ObjectFile in unit test

abrachet marked 2 inline comments as done.Aug 7 2019, 1:44 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
144

Oh yeah. What am I talking about!

abrachet updated this revision to Diff 214062.Aug 7 2019, 9:26 PM
abrachet marked an inline comment as done.

Added doxygen comments to private interface. NFC

jhenderson added inline comments.Aug 8 2019, 6:42 AM
llvm/include/llvm/Object/MutableELFObject.h
21

Comment?

44

How about a comment for MutableELFObject?

87

"regardless if it" sounds off to me. How about "regardless of whether it"

118

Sorry, missed that getHeader was private.

Being able to modify the header contents is a required thing however, so you might want to consider how this could be done (possibly a different change though).

122

I have no idea which method this was referring to any more...

141

overrie -> override

142

wether -> whether

169

Sorry, I missed that moveSectionNext() is an override.

llvm/unittests/Object/MutableELFObjectTest.cpp
114 ↗(On Diff #213791)

Yes, that's right.

192 ↗(On Diff #213791)

Sounds like clang isn't doing a good enough job here - it's a spurious warning, because 2 is a positive integer literal so can clearly be handled properly in this comparison.

Check elsewhere, but I thought it was more common in the code base to use lower-case for literal suffixes when they're needed.

54 ↗(On Diff #214062)

Rather than reusing the same variable, create a new variable here. Then don't bother with the ObjFile variable above. It doesn't give us anything.

88 ↗(On Diff #214062)

Don't use auto here.

141 ↗(On Diff #214062)

EXPECT_EQ -> ASSERT_EQ to avoid dereferencing an invalid iterator later on.

I think I either misread the code last time, as I don't know why I wanted this check here. However, you do show that the std::distance from section_begin to section_end is still correct after you called getMutableSection, but not until after you've done the iteration, meaning that your iterator could be invalid part way through your calls to compareSectionName. Move the later std::distance check up to here instead.

abrachet marked 2 inline comments as done.Aug 8 2019, 6:52 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
122

Oops I think I responded to this before adding the comments so I forgot to add a what the comment was originally referring to.

It was getSection(DataRefImpl). Its the function that I made virtual from ELFObjectFile. All the tests that I have now like FirstSec->getSize() test this method

abrachet updated this revision to Diff 214305.Aug 8 2019, 11:09 PM
abrachet marked an inline comment as done.

Addressed review comments

abrachet marked 17 inline comments as done.Aug 8 2019, 11:28 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
21

Not really sure what to say, is that enough you think?

44

Same as above, not quite sure what to say.

llvm/unittests/Object/MutableELFObjectTest.cpp
192 ↗(On Diff #213791)

FWIW I think there is there's a function template instantiation somewhere in these macros and so its doing unsigned_expr == signed_expr not unsigned_expr == 2.

54 ↗(On Diff #214062)

I think that it does give something, The ObjectFiles are made from the same yaml so I'm testing that the public methods return the same values. Its a whole thing because ObjectFile has no copy constructor. Also we don't want to use the ObjectFile after moving it. I also still think that doing something like this

MutableELFObject<ELF64LE> MutableObject(std::move(*MutELFObjFile));
ELFObjectFile<ELF64LE> *ElfObject = &MutableObject;

// test the interface

will never produce any differences so it doesn't test that MutableELFObject has the same behavior as ELFObjectFile on an unmodified object file.

141 ↗(On Diff #214062)

Is this what you mean? And then remove the one on 156? Sorry wasn't quite sure.

jhenderson added inline comments.Aug 9 2019, 8:43 AM
llvm/include/llvm/Object/MutableELFObject.h
21

Why would you use this class? What is the overall aim here?

It probably wants to say something like it is used by MutableELFObject to provide a mutable version of an ELFObjectFile Section or something to that effect.

44

I think this comment needs to explain why this class exists. Essentially it should say why we can't use ELFObjectFile directly for our purposes.

llvm/unittests/Object/MutableELFObjectTest.cpp
54 ↗(On Diff #214062)

I think you misunderstood. Having MutELFObjFile is fine, as is constructing two objects from the YAML. ObjFile doesn't need to exist however. You can use ELFObjFile directly now that you're not reusing the local variable for the creation of MutableObject, so the two variables you operate on are ELFObjFile (created directly from YAML) and MutableObject (created indirectly from YAML via MutELFObjFile).

141 ↗(On Diff #214062)

Sorry, I obviously wasn't clear enough, and I realised something else whilst writing this comment.

At the time of writing, the std::distance at line 127 (checks the section count before mutating), and the one at line 142 (checks it after mutating) are the ones that should exist. The one at line 155 is unnecessary, since you check it at line 142. The one at line 140 is necessary because it shows that the Iter variable is still valid after getting a mutable section.

abrachet updated this revision to Diff 214503.Aug 9 2019, 9:24 PM
abrachet marked 3 inline comments as done.

Addressed review comments

abrachet marked 6 inline comments as done.Aug 9 2019, 9:26 PM
jhenderson accepted this revision.Aug 12 2019, 9:02 AM

Aside from a comment change, this looks good to me. I'd wait for others to approve too though, and I wouldn't necessarily land it even then until some later patches are ready too, although I'm not sure about that.

llvm/include/llvm/Object/MutableELFObject.h
46

I'd change this slightly:

... an ELFObjectFile. The ELFObjectFile class only provides ...

This revision is now accepted and ready to land.Aug 12 2019, 9:02 AM
rupprecht added inline comments.Aug 12 2019, 2:27 PM
llvm/include/llvm/Object/MutableELFObject.h
24

You should be able to use the more common:

using Elf_Shdr = typename ELFT::Shdr;
82–85

"OriginalValues" is overloaded, so it's not clear which one is being referenced in the constructor

A common hacky practice in LLVM is to name the constructor arg "TheOriginalValues" or something -- that's fine to use if you can't come up with any other different name.

135–136

ditto

llvm/unittests/Object/MutableELFObjectTest.cpp
88–89 ↗(On Diff #214503)

Is it expected that an error here should not cause the test to fail? I think you want an assertion that Expect doesn't have an error instead.

abrachet updated this revision to Diff 214738.Aug 12 2019, 4:32 PM

Addressed review comments

abrachet marked 5 inline comments as done.Aug 12 2019, 4:33 PM
jhenderson accepted this revision.Aug 14 2019, 8:08 AM

Latest changes LGTM.

rupprecht accepted this revision.Aug 15 2019, 11:16 AM
MaskRay added inline comments.Aug 19 2019, 10:13 PM
llvm/include/llvm/Object/MutableELFObject.h
85

llvm::transform(OriginalValues, ...)

114

I think such Out of bounds assertions are probably not very necessary. They will certainly segfault if the user does have out-of-bounds accesses.

117
Mappings[Index] = MappingType(NewValues.size(), MappingType::New);
NewValues.emplace_back(std::forward<Args>(Arguments)...);
151

typo: overridden

the section_iterators in MutableELFObject -> MutableELFObject::section_begin ? (Just name it explicitly)

185

for (SectionRef Sec : MutObj.sections())

SectionRef has just 2 words. It is usually passed by value.

188
if (auto MutSecOrErr = MutObj.getMutableSection(Sec))
  then;
else
  handleError(...);
190

typo: sh_addralign

jhenderson added inline comments.Aug 20 2019, 2:34 AM
llvm/include/llvm/Object/MutableELFObject.h
114

Won't that entirely depend on how the memory is laid out, compiler-generated checks etc? A segmentation fault hardly provides any useful information, whereas at least the assertion shows what's going wrong in builds with assertions enabled...

MaskRay added inline comments.Aug 20 2019, 2:48 AM
llvm/include/llvm/Object/MutableELFObject.h
114

I think if the user of this library makes an out-of-bounds access, the program will very quickly segfault with a clear stack trace (symbolization can happen with a symbol table, even if debug info is absent) that the problem is related to the use of MutableELFObject.

There are lots of ways to detect errors: -D_GLIBCXX_DEBUG / asan / ...

We can add some assert() in some tricky places (they also serve as documentation of some invariants) but here I think it is probably not very necessary.

rupprecht added inline comments.Aug 20 2019, 12:49 PM
llvm/include/llvm/Object/MutableELFObject.h
114

I think if the user of this library makes an out-of-bounds access, the program will very quickly segfault with a clear stack trace

Having dealt with many prod crashes in the past, I strongly disagree. C++ optimizations can often make the crash happen far away from the actual bug.

I think it's much more common to run tools in debug mode than in asan, so it's useful to have these.

FYI, you'll have to rebase this whole patch series after rL368826 due to Section::getName() changing signature

abrachet updated this revision to Diff 216260.Aug 20 2019, 3:01 PM

Addressed review comments

abrachet marked 7 inline comments as done.Aug 20 2019, 3:03 PM

FYI, you'll have to rebase this whole patch series after rL368826 due to Section::getName() changing signature

I was safe for this patch it seems but not others, thanks for the heads up.

llvm/include/llvm/Object/MutableELFObject.h
114

I haven't removed the assertions in this diff. I think worst case it looks ugly and I agree that they aren't super necessary but they have helped me track bugs down before so I think the benefit outweighs the cost.

abrachet updated this revision to Diff 216301.Aug 20 2019, 5:59 PM

Use size_t for indexes not uint64_t

abrachet updated this revision to Diff 216305.Aug 20 2019, 6:11 PM

Removed superfluous overloads from this patch not later ones

jhenderson accepted this revision.Aug 21 2019, 1:37 AM

Latest round of changes LGTM.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

abrachet marked an inline comment as done.Aug 21 2019, 6:36 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
148

Actually, here is an example, MutableTable::operator [](size_t) being called with DataRefImpl::p which is uint64_t.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

You should use size_t and consider casting for cases using uint64_t. In a 32-bit build, size_t is usually a 32-bit number (i.e. unsigned int), so theoretically, the compiler could produce warnings for truncation issues when converting from a uint64_t. Since the main thing these indexes are used for is for accessing a vector, the type should match the argument type of std::vector<T>::operator[], which is (usually) a size_t.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

You should use size_t and consider casting for cases using uint64_t. In a 32-bit build, size_t is usually a 32-bit number (i.e. unsigned int), so theoretically, the compiler could produce warnings for truncation issues when converting from a uint64_t. Since the main thing these indexes are used for is for accessing a vector, the type should match the argument type of std::vector<T>::operator[], which is (usually) a size_t.

@MaskRay In the past I remember you finding usages on certain types (or similar) in the code base. Is there some kind of tool you use that could search every time a function that takes a size_t but is called with a unit64_t? Do you use clang-query for these kinds of things?