This is an archive of the discontinued LLVM Phabricator instance.

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 1]
AcceptedPublic

Authored by abrachet on Jul 6 2019, 1:18 AM.

Details

Summary

This initial patch adds a MutableELFObject class for doing mutations on sections of an ELFObjectFile.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
abrachet marked 6 inline comments as done.Jul 25 2019, 5:05 PM

Sorry to barge in here, but I couldn't resist to not spread the knowledge of more advanced googletest features. EXPECT_EQ is perfectly safe to use with StringRefs as it just defers to their operator==, and (as I mentioned in the other review), we have better facilities for checking the state of Expected and Error values.

Not at all thanks for the comments I really appreciate them!

llvm/include/llvm/Object/MutableObject.h
18

Not sure what to do here. I think eventually I will have a MutableObject base class, and it would seem silly to make a MutableRange.h. What do you think I should do?

22

FWIW, I think the MMU's on X86 and AArch64 both only support pointer sizes of 48 bits or similar, in either case less than 64, so pointers cannot be this large. I'm not sure this would be safe for 32 bit architectures though, but we have PointerIntPair in ADT, so this pattern is used in LLVM.

I would also add that moving the bool out effectively doubles the structs size because presumably the compiler will word align it, but then reading those members is faster now.

In either case it is crazy to worry about something like this in this stage! So I have done away with the bit field.

abrachet marked 11 inline comments as done.Jul 25 2019, 5:06 PM
This comment was removed by abrachet.
labath added inline comments.Jul 26 2019, 12:28 AM
llvm/include/llvm/Object/MutableObject.h
22

Note that PointerIntPair uses the *low* bits of the pointer to store data, which is a generally safe (depending on your reading of the c++ standard) thing to do for aligned values. 64-bit pointers are indeed smaller, but for instance there's an arm64 pointer authentication extension, which uses those extra bits to store pointer "signatures". And yeah, there are no free bits on 32-bit pointers. :)

47

I guess this assert isn't needed now that you store the full uintptr_t?

llvm/unittests/Object/MutableELFObjectTest.cpp
166–170 ↗(On Diff #211857)

using llvm::zip(ObjFile.sections(), MutableObject.sections()) is a bit cleaner way to handle parallel iteration (but don't forget to assert that the range sizes are the same).

With something like EXPECT_THAT(MutableObject.sections(), testing::ContainerEq(ObjFile.sections()) this check would be a one-liner, but it would require a bit more plumbing to make sure the elements are comparable, so it may not be worth it if this is just a one-off thing...

grimar added inline comments.Jul 26 2019, 1:54 AM
llvm/include/llvm/Object/MutableELFObject.h
28

Since all members are public, should this be a struct?

llvm/unittests/Object/MutableELFObjectTest.cpp
21 ↗(On Diff #211857)

I'd suggest adding a short description comment before each unit test
(about what this test intended to do) just like we often do in a regular test files.

jhenderson added inline comments.Jul 26 2019, 5:59 AM
llvm/include/llvm/Object/ELFObjectFile.h
794 ↗(On Diff #211857)

I'm not sure I understand why you've made this change from the previous version? I don't think it gives anything.

llvm/include/llvm/Object/MutableObject.h
18

I'd probably put it in MutableELFObject.h for now and move it out in a later change when needed.

llvm/unittests/Object/MutableELFObjectTest.cpp
43 ↗(On Diff #211857)

This still has the unnecessary trailing return type.

94 ↗(On Diff #211857)

You need an ASSERT here that there are the correct number of sections.

abrachet updated this revision to Diff 211946.Jul 26 2019, 8:31 AM

Addressed review comments.

abrachet marked 9 inline comments as done.Jul 26 2019, 8:35 AM
abrachet added inline comments.
llvm/include/llvm/Object/ELFObjectFile.h
794 ↗(On Diff #211857)

Because I didn't know you could do ELFObjectFile<ELFT>::isBerkeleyText() to not call the overrided method! Oops.

llvm/include/llvm/Object/MutableELFObject.h
28

For some reason I thought I remembered the style guide saying only POD types should be struct and others should be class regardless of access modifiers.

llvm/unittests/Object/MutableELFObjectTest.cpp
166–170 ↗(On Diff #211857)

Thanks never knew about zip!

jhenderson added inline comments.Jul 29 2019, 8:25 AM
llvm/include/llvm/Object/ELFObjectFile.h
801 ↗(On Diff #211946)

Are you sure you want to be calling the base-class method though? Isn't that going to go wrong if Sec is a modified or added section?

llvm/include/llvm/Object/MutableELFObject.h
21

I've been thinking about how this class is used by its clients. To me, it doesn't seem great that clients have to know whether a given member is new or not (i.e. whether it is stored in NewValues etc). I think it's okay to provide makeMutable to get a mutable reference to one of the original members, but I also feel like getNew should not be in the public interface. Instead, you should provide a function that can take a key or something to look things up with, possibly a MappingType, and return the corresponding item as a const variable.

25

New -> IsNew

This is a) easier to read, and b) going to play nicer with the upcoming variable name changes.

llvm/unittests/Object/MutableELFObjectTest.cpp
21 ↗(On Diff #211946)

sections -> section's

140 ↗(On Diff #211946)

You probably also need to show that if makeMutable has been called, calls to the various functions in the interface reference the mutable version. Currently, you only do this for the section name.

abrachet marked 5 inline comments as done.Jul 29 2019, 9:40 AM
abrachet added inline comments.
llvm/include/llvm/Object/ELFObjectFile.h
801 ↗(On Diff #211946)

When these functions are called, they get called with a DataRef pointing to the Elf_Shdr whether they are new or not. I was segfaulting here before because in ELFObjectFile the DataRef's point to the Elf_Shdr but in MutableELFObject they are indexes into the MutableRange. I couldn't find a way for the DataRefs to be interpreted the same way between the two classes. This isn't ideal though.

llvm/include/llvm/Object/MutableELFObject.h
21

I had something like this originally I think, and I agree that right now it isn't very ergonomic. The problem is that the original and new entries are of completely different types. I haven't found a great way to do this. What I don't do right now but might be better is require that the new type, T have a conversion operator. But no matter what, there needs to be a public method to get a pointer to the new type. This would require that MutableRange also have a template parameter to the original type it is pointing to (unless the conversion was to uintptr_t i suppose).

abrachet updated this revision to Diff 212274.Jul 29 2019, 10:41 PM

Addressed review comments

abrachet marked 4 inline comments as done.Jul 29 2019, 10:42 PM
rupprecht added inline comments.Jul 30 2019, 2:19 PM
llvm/include/llvm/Object/MutableELFObject.h
27

nit: it's simpler if all the params (Ptr/IsNew) are consistently in the same order (for class members, constructor arg order, and member initializers)

60–61

These should have an assertion (in debug mode) that Index is in range

67

Mapping.Ptr is already a uintptr_t, is the reinterpret_cast necessary?

90

std::memcpy

111–112

Is toDataRef(reinterpret_cast<uintptr_t>(...)) redundant? (Converting from pointer -> uintptr_t -> pointer)

135

I think this is a potentially-invalid reference, as B is std::move-d for the previous member. It looks like this works because the move constructor for ELFObjectFile is implemented as a copy constructor, although that is unusual (a typical implementation would swap() members):

template <class ELFT>
ELFObjectFile<ELFT>::ELFObjectFile(ELFObjectFile<ELFT> &&Other)
    : ELFObjectFile(Other.Data, Other.EF, Other.DotDynSymSec,
                    Other.DotSymtabSec, Other.ShndxTable) {}

I'm curious if you can test this theory by "fixing" the move constructor to be something like:

template <class ELFT>
ELFObjectFile<ELFT>::ELFObjectFile(ELFObjectFile<ELFT> &&Other) {
  using std::swap;
  swap(Data, Other.Data);
  swap(EF, Other.EF);
  swap(DotDynSymSec, Other.DotDynSymSec);
  swap(DotSymtabSec, Other.DotSymtabSec);
  swap(ShndxTable, Other.ShndxTable);
}

(no need to check it in though)

I think you can just use section_begin()/section_end() directly here, and it will correctly call the methods from the parent class (the vtable doesn't update until later)

136

What's being captured by [&] here?

llvm/unittests/Object/MutableELFObjectTest.cpp
47 ↗(On Diff #212274)

This should check the error code too

48 ↗(On Diff #212274)

Can you avoid the copy here, and just return StringRef?

108 ↗(On Diff #212274)

nit: ZeroData is kind of misleading since this is not 0, but rather an ASCII '0', i.e. this is {48, 48, 48, 48}.

At any rate, it doesn't matter that it's "zero", any random data works. Testing against something non-zero is probably healthier for tests too. Maybe just rename it to something else and use different bytes (e.g. to test byte ordering)

168–169 ↗(On Diff #212274)

This macro seems small enough that it should just be typed out

190–191 ↗(On Diff #212274)

You can put both Iter and End in the for header as described here: http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

for (auto Iter = MutableObject.section_begin(), End = MutableObject.section_end();
     Iter != End; ++Iter)

(taking this out-of-line due to length)

The problem is that the original and new entries are of completely different types.

What bit is this referring to? The MutableRange is for wrapping a container (vector) of Ts, and the new values are stored as Ts.

I don't want to need to know whether I'm dealing with a new or original member at any point outside the class itself. Indeed, if I can hide the MappingType completely from the public interface, that's even better. If I'm looking up an item, I want to have an index/key into the range, which transparently works on fetching the new or old member depending on where it is in the range. That will require some plumbing to know which container (the underlying one or the "new" one) is referring to.

What about the following?

template <typename T>
class MutableRange {
  ArrayRef<T> Originals;
  std::map<size_t Optional<T>> Mapping;
  using value_type = Mapping::value_type;

public:
  MutableRange(ArrayRef<T> Container) : Originals(Container) {
    for (size_t I = 0, Size = Container.size(); I != Size; ++I)
      Mapping.insert(std::make_pair(I, None));
  }

  void push_back(T t) {
    Mapping.insert(value_type(Mapping.size(), T));
  }

  const T& operator[](size_t Index) const {
    assert(Mapping.count(Index) != 0);
    if (Mapping[Index] == None)
      return Originals[Index];
    return Mapping[Index];
  }

  T& getMutable(size_t Index) {
    assert(Mapping.count(Index) != 0);
    if (Mapping[Index] == None)
      Mapping[Index] = Originals[Index];
    return Mapping[Index];
  }

  void remove(T t) {
    size_t Removed = 0;
    for (size_t I = 0, Removed = 0, Size = Mapping.size(); I != Size; ++I) {
      const T& Value = Mapping[I] ? *Mapping[I] : Originals[I];
      if (Value == t) {
        Mapping.erase(I);
        ++Removed;
        continue;
      }
      if (Removed == 0)
        continue;
      Mapping[I-Removed] = Mapping[I];
    }
  }
};

I think this satisfies the requirements of the container: it provides access to the underlying container member, until it gets modified, even if members before it have been removed. It also provides a way to get a modifiable version. You could even use a different key instead of size_t e.g. a pointer, by keeping a record of the mapping between pointer and index, though that is a little trickier. If you use a pointer, you could replace the Optional with a std::unique_ptr. In turn, that might be enough to avoid co-opting the DataRefImpl behaviour so that the base class behaviour will just work (because the pointer will point to a real section, just the real section is stored in this map).

Does this work?

llvm/include/llvm/Object/ELFObjectFile.h
801 ↗(On Diff #211946)

Oh, I realised I was mistaken as to which class we are in. I really don't like this change here at all. The base class shouldn't need to know that it has to call the base class version of the method. It should just work. Effectively the design here now means the base class has to know that there's a subclass that might change the meaning of some of its functions under-the-hood, and guard against that.

The solution might be to make getSection a virtual method and override it in your sub-class. What do you think of that?

(won't quote because of length but I'll tag you so you know I am responding) @jhenderson
That does work, and was what I was thinking of before. Also I quite like keeping pointers so it just works with the methods in the base class. But I'm still not sure what type T should be. If we use sections as an example, when changing the name it isn't possible (or feasible, I suppose that sh_name could be a pointer (not into the string table but a pointer into the processes memory) for modified sections). Right now, the original type is just an Elf_Shdr, or rather the mapping type points to one, but for a modified section, it indexes into the vector where to find the T, which is a wrapper around an Elf_Shdr but for example for its name it has a StringRef so it can be easily changed. This is how llvm-objcopy does it, but it copies every single section whether or not they will ever be modified. I agree that it is ugly for the users of the class to ever have to deal with the MappingType, it isn't ideal, but I don't know how to do this unless we copy every single one like is done in llvm-objcopy, and at that point I would suggest the Section type just have an Elf_Shdr * to the original and just keep a vector of Sections.

Something I played around with before was having a template parameter to the original type and then having methods with return OrigType & and require that T have a conversion function to 'OrigType`. This would be cleaner for the user but also avoid copying every original type into the wrapper type T. This is all moot if you don't think that copying is that big of a bottle neck, @MaskRay mentioned this. After all this is how its done for sections in llvm-objcopy. But I am still shooting for being able to use this class with symbols too so there may be more concern about this there.

abrachet updated this revision to Diff 212725.Jul 31 2019, 10:08 PM

Addressed review comments.

abrachet updated this revision to Diff 212726.Jul 31 2019, 10:13 PM

Fixed silly mistake

abrachet marked 14 inline comments as done.Jul 31 2019, 11:10 PM

Again responding to your comment @jhenderson but not quoting due to length.

I'll try to explain MutableRange further and why certain decisions were made, upon reading my previous response I don't think I explained well. I fear that this spells the writing on the wall for the current implementation because if I need to explain it isn't written very well and won't be maintainable!

Again for convenience I'll talk about sections. When it is initially constructed the std::vector<T> NewValues has a size of 0, the std::vector<MappingType> Mappings has as a size of as many sections in the original file. The MappingTypes will look like this: {.IsNew = false, .Ptr = <pointer into the file>}, but on modification that mapping looks like {.IsNew = true, .Ptr = <index into NewValues>}. This means that for unmodified sections there is an overhead of sizeof(MappingType), not sizeof(T) so doing modifications on only some sections should be faster, or at least there is less copying and uses less memory.

Of course there's no arguing that it leads to what is pretty ugly (and obviously causes a lot of confusion) solution. To get to the Elf_Shdr* from a new section there is a lot of indirection. DataRef => Mapping => NewValues => Elf_Shdr. And then for an unmodified one it is DataRef => Mapping => Elf_Shdr.

I still think that modified sections cannot be just an Elf_Shdr and there isn't a clean way to hold either an Elf_Shdr or a MutableELFSection. I had played around with a union, which didn't work because MutableELFSection had a non trivial copy constructor, but this still requires that the uses of MutableRange know to distinguish between the original type and the modified type. I will also say that although it is ugly, and it would be ideal if what you call the plumbing was inside of MutableRange, that plumbing exists in MutableELFObject, but it isn't exposed outside of the class. MutableELFObject abstracts this and works the same way as its predecessors.

llvm/include/llvm/Object/ELFObjectFile.h
801 ↗(On Diff #211946)

What do you think of that?

I think that saves so much boiler plate code! Can't believe I didn't see this earlier thank you. Also, I remember @jakehehrlich saying how I was previously doing it was not good, this is much better now. I now only have to override moveSectionNext, getSectionName, getSectionIndex and getSectionContents

llvm/include/llvm/Object/MutableELFObject.h
135

I couldn't test to see if it breaks with your example of swapping members because ELFObjectFile has no default constructor so all I could use were Others members to initialize and then the swap is just swapping two equal values.

I think you can just use section_begin()/section_end() directly here, and it will correctly call the methods from the parent class (the vtable doesn't update until later)

This didn't work for me, my program never exited (I gave it ~10 seconds assuming a segfault was coming). I am still using B.section_begin()/end(). I also tried ELFObjectFile<ELFT>::section_begin() and that compiled but caused a crash. Tried to create the OwningArrayRef with nonsense data OwningArrayRef(this=0x0000000106006280, Size=72057594037927936)

ObjectTests(66699,0x10c97d5c0) malloc: can't allocate region
*** mach_vm_map(size=72057594037927936) failed (error code=3)
ObjectTests(66699,0x10c97d5c0) malloc: *** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
lebedev.ri resigned from this revision.Aug 1 2019, 3:11 PM
abrachet updated this revision to Diff 213178.Aug 2 2019, 11:10 PM
abrachet marked an inline comment as done.

Complete overhaul of how MutableRange works. It is now called MutableTable which seemed more fitting. It has a much cleaner interface for MutableELFObject while maintaining low overhead.

abrachet updated this revision to Diff 213500.Aug 5 2019, 5:24 PM

Fixed the reason for needing to make a member mutable

Since you're introducing a new class in a public header, you should write doxygen comments for at least the public/protected interface, and quite possibly any private bits too. I think if you are careful with those comments too, it will help explain the design of the classes.

I've not looked at the unit tests today. I'll do that tomorrow. The interface looks a lot cleaner than it did when I last looked at this. Hopefully adding comments will make it much clearer what's going on. If you make MutableTable a sub-class, then my concerns about the leaky implementation (i.e. needing to decide between fetching the section properties from the table or the base version) are significantly lessened.

llvm/include/llvm/Object/MutableELFObject.h
20

Perhaps this should be nested inside MutableELFObject for now, since it is effectively an implementation detail? Obviously, it would move to MutableObject if that class ever exists.

56

For what do you envisage this function being useful?

62

How about having this return a reference, rather than a pointer? That would seem to be a more natural fit.

71

What is this function useful for?

Returning an Optional would be a nicer interface than returning a nullptr, IMO.

92

Does this function need to exist as a free function, or can it be a private method of MutableELFObject?

122

Do you need this to be a friend, or can you just pass in another argument or two (if needed) into the constructor of MutableELFSection?

128

protected? Does this class have any subclasses?

144

Use explicit here. Should this be an r-value reference? What do you think?

161–162

You're going to need to do something with the Error inside Expected here. This function should probably return an Expected too, so that you can pass that error up to a higher level to be handled.

169

What is this function for?

abrachet updated this revision to Diff 213791.Aug 6 2019, 9:35 PM

Addressed review comments

abrachet marked 15 inline comments as done.Aug 6 2019, 9:52 PM

Since you're introducing a new class in a public header, you should write doxygen comments for at least the public/protected interface, and quite possibly any private bits too. I think if you are careful with those comments too, it will help explain the design of the classes.

I've not looked at the unit tests today. I'll do that tomorrow. The interface looks a lot cleaner than it did when I last looked at this. Hopefully adding comments will make it much clearer what's going on. If you make MutableTable a sub-class, then my concerns about the leaky implementation (i.e. needing to decide between fetching the section properties from the table or the base version) are significantly lessened.

I'm still working on the comments for the private interface although they are coming. I do have just the one for the one public method not inherited from ObjectFile.

llvm/include/llvm/Object/MutableELFObject.h
56

This was referencing getOriginal before.

In symbols for example, the top half of the DataRef is the section index of the symbol table. In that case I use getOriginal to see its sh_type. It wouldn't make sense to use the updated section list in this case I think.

71

This was referencing getIfNew previously.

It's for methods which could not use just the OrigType to do their job properly. For example, getSectionName returns the Name member when the section has been modified because there is no way to describe the change to ELFObjectFile through its sh_name field.

Optional does not work with references so far as I can tell. Is this correct? A copy would be too costly for MutableELFSections which have an OwningArrayRef and std::string.

144

Sounds fine to me. In the unit tests I move the ELFObjectFile but then continue to use it because as @rupprecht pointed out previously the move constructor is just a copy constructor. Is it safe to do this or should I explicitly copy?

169

It advances the section_iterator. MutableELFObject uses a different iterator to that of ELFObjectFile which I belive would have a moveSectionNext that did something like Sec.p += sizeof(Elf_Shdr) as those just point directly to the section header.

abrachet updated this revision to Diff 213912.Aug 7 2019, 8:38 AM
abrachet marked 4 inline comments as done.

Realized from the code example in doxygen comment that it was very unergonomic, added a better overload to public methods which take a SectionRef so they can be used in for-each loops.

jhenderson added inline comments.Aug 7 2019, 8:50 AM
llvm/include/llvm/Object/MutableELFObject.h
71

You're right about Optional, sorry for the noise.

getSectionName currently calls getConstIfNew, so this function doesn't need to exist, right?

118

Does this need to exist? If so, it needs unit-testing.

122

Does this need to exist? If so, it needs unit-testing.

144

Is it safe to do this or should I explicitly copy?

This doesn't sound safe to me - you need to copy or just simply not use the object after it's been moved. a) I wouldn't be surprised if this is technically undefined behaviour and the compiler makes assumptions about the state of the instance after the move. b) It's fragile to future changes.

169

At the moment, it doesn't look like it's used, so it shouldn't exist. Re-add it when it is needed. However, perhaps a safer thing to do is define your own iterator class, as @jakehehrlich said that does this internally.

llvm/unittests/Object/MutableELFObjectTest.cpp
21–22 ↗(On Diff #213791)

I'd rephrase the second half of this sentence as "methods work the same in both ELFObjectFile and MutableELFObject".

43 ↗(On Diff #213791)

Isn't the solution here to just make a second local variable that's a pointer to an ELFObjectFile constructed from MutableObject?

ELFObjectFile *ElfObject = &MutableObject;

46 ↗(On Diff #213791)

ObjFileSecCount?

48 ↗(On Diff #213791)

MutObjSecCount?

50 ↗(On Diff #213791)

This should be ASSERT_EQ, since the rest of the test assumes there's the same number.

78 ↗(On Diff #213791)

IIRC, this will still abort if there is an error, since the error itself hasn't been handled. The correct thing to do would be to a) check there's no Error, and then b) call consumeError().

114 ↗(On Diff #213791)

You need to assert somewhere that NumSections is as expected.

125 ↗(On Diff #213791)

You need to show the iterator is still valid after this, by checking std::distance.

192 ↗(On Diff #213791)

I'm pretty sure you don't need the UL suffix here. Same below.

Aargh, you pushed your latest update whilst I was doing my last review, so I can't see what are the latest changes!

added a better overload to public methods which take a SectionRef so they can be used in for-each loops.

What is this bit referring to?

jhenderson added inline comments.Aug 7 2019, 8:55 AM
llvm/include/llvm/Object/MutableELFObject.h
160

Probably this should be handleError(MutSecOrErr.takeError());

abrachet updated this revision to Diff 213933.Aug 7 2019, 9:54 AM
abrachet marked 9 inline comments as done.

Addressed review comments

abrachet marked 12 inline comments as done.Aug 7 2019, 9:59 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
118

Yes it is used and thus tested implicitly. How would I go about testing the private interface? I could make it protected and then inherit from it in the unit test, but that doesn't sound great. FWIW this is almost identical to the getHeader method in ELFFile. https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/Object/ELF.h#L108

122

Yes. It is used for every public method on SectionRef that gets tested in the unit test. Is this enough or should I test it directly as well?

144

I did an explicit copy /I think/ in the unit test now. I have an ELFObjectFile * and then deference it and assign to an ObjectFile &. Does this make a copy or does the compiler omit it and just assign the address?

169

This was referring to moveSectionNext() before.

It does get used, it is an overload of the function that content_iterator::operator ++() calls. It must be overriden because ELFObjectFile::moveSectionNext() iterates over sections not with their index but their mapped address. This uses index into MutableTable.

llvm/unittests/Object/MutableELFObjectTest.cpp
43 ↗(On Diff #213791)

I don't think so. Then both will be MutableELFObject's and the same virtual methods will be called for each, I believe. I think testing the methods on an ELFObjectFile and MutableELFObject test that overloads for MutableELFObject behave the same as the methods in ELFObjectFile when the sections haven't been changed.

78 ↗(On Diff #213791)

Thanks!

114 ↗(On Diff #213791)

I have ASSERT_EQ(NumSections, 7); it ended up being 7 sections because of additional sections that yaml2obj added. Is this what you meant?

192 ↗(On Diff #213791)

I don't actually need the L it turns out. Without making it unsigned I get this warning comparison of integers of different signs on clang.

abrachet marked 3 inline comments as done.Aug 7 2019, 10:03 AM

Aargh, you pushed your latest update whilst I was doing my last review, so I can't see what are the latest changes!

added a better overload to public methods which take a SectionRef so they can be used in for-each loops.

What is this bit referring to?

I added getMutableSection(SectionRef) and changed getMutableSection(section_iterator Sec) to just call that with a *Sec. Maybe I should remove the section_iterator overload though?

labath added inline comments.Aug 7 2019, 11:50 AM
llvm/include/llvm/Object/MutableELFObject.h
144

Kind of the whole point of references is that they allow you to "refer" to other objects without making copies of them. If you want to make a copy, drop the & and assign it to an ObjectFile (or ELFObjectFile, or something)

abrachet updated this revision to Diff 213985.Aug 7 2019, 1:43 PM

Create a new ObjectFile in unit test

abrachet marked 2 inline comments as done.Aug 7 2019, 1:44 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
144

Oh yeah. What am I talking about!

abrachet updated this revision to Diff 214062.Aug 7 2019, 9:26 PM
abrachet marked an inline comment as done.

Added doxygen comments to private interface. NFC

jhenderson added inline comments.Aug 8 2019, 6:42 AM
llvm/include/llvm/Object/MutableELFObject.h
21

Comment?

44

How about a comment for MutableELFObject?

87

"regardless if it" sounds off to me. How about "regardless of whether it"

118

Sorry, missed that getHeader was private.

Being able to modify the header contents is a required thing however, so you might want to consider how this could be done (possibly a different change though).

122

I have no idea which method this was referring to any more...

141

overrie -> override

142

wether -> whether

169

Sorry, I missed that moveSectionNext() is an override.

llvm/unittests/Object/MutableELFObjectTest.cpp
54 ↗(On Diff #214062)

Rather than reusing the same variable, create a new variable here. Then don't bother with the ObjFile variable above. It doesn't give us anything.

88 ↗(On Diff #214062)

Don't use auto here.

141 ↗(On Diff #214062)

EXPECT_EQ -> ASSERT_EQ to avoid dereferencing an invalid iterator later on.

I think I either misread the code last time, as I don't know why I wanted this check here. However, you do show that the std::distance from section_begin to section_end is still correct after you called getMutableSection, but not until after you've done the iteration, meaning that your iterator could be invalid part way through your calls to compareSectionName. Move the later std::distance check up to here instead.

114 ↗(On Diff #213791)

Yes, that's right.

192 ↗(On Diff #213791)

Sounds like clang isn't doing a good enough job here - it's a spurious warning, because 2 is a positive integer literal so can clearly be handled properly in this comparison.

Check elsewhere, but I thought it was more common in the code base to use lower-case for literal suffixes when they're needed.

abrachet marked 2 inline comments as done.Aug 8 2019, 6:52 AM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
122

Oops I think I responded to this before adding the comments so I forgot to add a what the comment was originally referring to.

It was getSection(DataRefImpl). Its the function that I made virtual from ELFObjectFile. All the tests that I have now like FirstSec->getSize() test this method

abrachet updated this revision to Diff 214305.Aug 8 2019, 11:09 PM
abrachet marked an inline comment as done.

Addressed review comments

abrachet marked 17 inline comments as done.Aug 8 2019, 11:28 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
21

Not really sure what to say, is that enough you think?

44

Same as above, not quite sure what to say.

llvm/unittests/Object/MutableELFObjectTest.cpp
54 ↗(On Diff #214062)

I think that it does give something, The ObjectFiles are made from the same yaml so I'm testing that the public methods return the same values. Its a whole thing because ObjectFile has no copy constructor. Also we don't want to use the ObjectFile after moving it. I also still think that doing something like this

MutableELFObject<ELF64LE> MutableObject(std::move(*MutELFObjFile));
ELFObjectFile<ELF64LE> *ElfObject = &MutableObject;

// test the interface

will never produce any differences so it doesn't test that MutableELFObject has the same behavior as ELFObjectFile on an unmodified object file.

141 ↗(On Diff #214062)

Is this what you mean? And then remove the one on 156? Sorry wasn't quite sure.

192 ↗(On Diff #213791)

FWIW I think there is there's a function template instantiation somewhere in these macros and so its doing unsigned_expr == signed_expr not unsigned_expr == 2.

jhenderson added inline comments.Aug 9 2019, 8:43 AM
llvm/include/llvm/Object/MutableELFObject.h
21

Why would you use this class? What is the overall aim here?

It probably wants to say something like it is used by MutableELFObject to provide a mutable version of an ELFObjectFile Section or something to that effect.

44

I think this comment needs to explain why this class exists. Essentially it should say why we can't use ELFObjectFile directly for our purposes.

llvm/unittests/Object/MutableELFObjectTest.cpp
54 ↗(On Diff #214062)

I think you misunderstood. Having MutELFObjFile is fine, as is constructing two objects from the YAML. ObjFile doesn't need to exist however. You can use ELFObjFile directly now that you're not reusing the local variable for the creation of MutableObject, so the two variables you operate on are ELFObjFile (created directly from YAML) and MutableObject (created indirectly from YAML via MutELFObjFile).

141 ↗(On Diff #214062)

Sorry, I obviously wasn't clear enough, and I realised something else whilst writing this comment.

At the time of writing, the std::distance at line 127 (checks the section count before mutating), and the one at line 142 (checks it after mutating) are the ones that should exist. The one at line 155 is unnecessary, since you check it at line 142. The one at line 140 is necessary because it shows that the Iter variable is still valid after getting a mutable section.

abrachet updated this revision to Diff 214503.Aug 9 2019, 9:24 PM
abrachet marked 3 inline comments as done.

Addressed review comments

abrachet marked 6 inline comments as done.Aug 9 2019, 9:26 PM
jhenderson accepted this revision.Aug 12 2019, 9:02 AM

Aside from a comment change, this looks good to me. I'd wait for others to approve too though, and I wouldn't necessarily land it even then until some later patches are ready too, although I'm not sure about that.

llvm/include/llvm/Object/MutableELFObject.h
46

I'd change this slightly:

... an ELFObjectFile. The ELFObjectFile class only provides ...

This revision is now accepted and ready to land.Aug 12 2019, 9:02 AM
rupprecht added inline comments.Aug 12 2019, 2:27 PM
llvm/include/llvm/Object/MutableELFObject.h
24

You should be able to use the more common:

using Elf_Shdr = typename ELFT::Shdr;
82–85

"OriginalValues" is overloaded, so it's not clear which one is being referenced in the constructor

A common hacky practice in LLVM is to name the constructor arg "TheOriginalValues" or something -- that's fine to use if you can't come up with any other different name.

135–136

ditto

llvm/unittests/Object/MutableELFObjectTest.cpp
88–89 ↗(On Diff #214503)

Is it expected that an error here should not cause the test to fail? I think you want an assertion that Expect doesn't have an error instead.

abrachet updated this revision to Diff 214738.Aug 12 2019, 4:32 PM

Addressed review comments

abrachet marked 5 inline comments as done.Aug 12 2019, 4:33 PM
jhenderson accepted this revision.Aug 14 2019, 8:08 AM

Latest changes LGTM.

rupprecht accepted this revision.Aug 15 2019, 11:16 AM
MaskRay added inline comments.Aug 19 2019, 10:13 PM
llvm/include/llvm/Object/MutableELFObject.h
85

llvm::transform(OriginalValues, ...)

114

I think such Out of bounds assertions are probably not very necessary. They will certainly segfault if the user does have out-of-bounds accesses.

117
Mappings[Index] = MappingType(NewValues.size(), MappingType::New);
NewValues.emplace_back(std::forward<Args>(Arguments)...);
151

typo: overridden

the section_iterators in MutableELFObject -> MutableELFObject::section_begin ? (Just name it explicitly)

185

for (SectionRef Sec : MutObj.sections())

SectionRef has just 2 words. It is usually passed by value.

188
if (auto MutSecOrErr = MutObj.getMutableSection(Sec))
  then;
else
  handleError(...);
190

typo: sh_addralign

jhenderson added inline comments.Aug 20 2019, 2:34 AM
llvm/include/llvm/Object/MutableELFObject.h
114

Won't that entirely depend on how the memory is laid out, compiler-generated checks etc? A segmentation fault hardly provides any useful information, whereas at least the assertion shows what's going wrong in builds with assertions enabled...

MaskRay added inline comments.Aug 20 2019, 2:48 AM
llvm/include/llvm/Object/MutableELFObject.h
114

I think if the user of this library makes an out-of-bounds access, the program will very quickly segfault with a clear stack trace (symbolization can happen with a symbol table, even if debug info is absent) that the problem is related to the use of MutableELFObject.

There are lots of ways to detect errors: -D_GLIBCXX_DEBUG / asan / ...

We can add some assert() in some tricky places (they also serve as documentation of some invariants) but here I think it is probably not very necessary.

rupprecht added inline comments.Aug 20 2019, 12:49 PM
llvm/include/llvm/Object/MutableELFObject.h
114

I think if the user of this library makes an out-of-bounds access, the program will very quickly segfault with a clear stack trace

Having dealt with many prod crashes in the past, I strongly disagree. C++ optimizations can often make the crash happen far away from the actual bug.

I think it's much more common to run tools in debug mode than in asan, so it's useful to have these.

FYI, you'll have to rebase this whole patch series after rL368826 due to Section::getName() changing signature

abrachet updated this revision to Diff 216260.Aug 20 2019, 3:01 PM

Addressed review comments

abrachet marked 7 inline comments as done.Aug 20 2019, 3:03 PM

FYI, you'll have to rebase this whole patch series after rL368826 due to Section::getName() changing signature

I was safe for this patch it seems but not others, thanks for the heads up.

llvm/include/llvm/Object/MutableELFObject.h
114

I haven't removed the assertions in this diff. I think worst case it looks ugly and I agree that they aren't super necessary but they have helped me track bugs down before so I think the benefit outweighs the cost.

abrachet updated this revision to Diff 216301.Aug 20 2019, 5:59 PM

Use size_t for indexes not uint64_t

abrachet updated this revision to Diff 216305.Aug 20 2019, 6:11 PM

Removed superfluous overloads from this patch not later ones

jhenderson accepted this revision.Aug 21 2019, 1:37 AM

Latest round of changes LGTM.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

abrachet marked an inline comment as done.Aug 21 2019, 6:36 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
148

Actually, here is an example, MutableTable::operator [](size_t) being called with DataRefImpl::p which is uint64_t.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

You should use size_t and consider casting for cases using uint64_t. In a 32-bit build, size_t is usually a 32-bit number (i.e. unsigned int), so theoretically, the compiler could produce warnings for truncation issues when converting from a uint64_t. Since the main thing these indexes are used for is for accessing a vector, the type should match the argument type of std::vector<T>::operator[], which is (usually) a size_t.

I've been thinking about this, is size_t safe to use? Some methods from inherited classes use uint64_t on my machine its fine because even unsigned long and unsigned long long have the same width and sign but on a 32 bit machine I think that assigning to a size_t (which is guaranteed to be word size, I think) from a uint64_t might produce a warning. I don't think there are any examples in this patch, but this seemed like the best place to ask.

You should use size_t and consider casting for cases using uint64_t. In a 32-bit build, size_t is usually a 32-bit number (i.e. unsigned int), so theoretically, the compiler could produce warnings for truncation issues when converting from a uint64_t. Since the main thing these indexes are used for is for accessing a vector, the type should match the argument type of std::vector<T>::operator[], which is (usually) a size_t.

@MaskRay In the past I remember you finding usages on certain types (or similar) in the code base. Is there some kind of tool you use that could search every time a function that takes a size_t but is called with a unit64_t? Do you use clang-query for these kinds of things?