Page MenuHomePhabricator

[llvm-objcopy] [WIP] Librarify llvm-objcopy
AbandonedPublic

Authored by abrachet on Jun 12 2019, 12:44 AM.

Details

Summary

This is for my GSoC project to librarify llvm-objcopy. See bug 41044

The current code is in a very rough state and is a work in progress (WIP). It will of course be different however this is just for a check in and giving an example of renaming sections. This approach does so by keeping a map of modified sections which allows sections to be immutable but also seemingly modified in place. For example, if we rename section at index 1, then the iterator (hypothetical at this point but see MutableObject::getSection() ) at 1 will expose the new one. This lets sections be both immutable and rewritten to the file in their original order. Also I like that in this way methods like renameSection aren't themselves virtual, of course relying on virtual methods but I think this is a good start for code reuse.

I have never been great at designing class hierarchies and would love tips on how to structure the relationship between the classes. Jake had previously suggested having a hierarchy based on semantic meaning not implementation details, clearly there is a lot of work to be done here, I'm not blind to the fact that this code is a very rough draft.

Of note, something like SectionBase::clone() feels really bad to me. I don't really know a better way to do this though.

I would also say to ignore access modifiers for now, and also that everything is in the header file, I know this isn't conformant with the coding standards but I figured this was easier to keep in one file for now.

Diff Detail

Event Timeline

abrachet created this revision.Jun 12 2019, 12:44 AM
Herald added a project: Restricted Project. · View Herald Transcript

At least with renaming a section, what is stopping you just modifying SectionBase.Name?

More generally, I guess I don't understand why you need to make a copy of a section when you want to update it rather than just modifying it in-place?

Also, I'd steer away from a single Object base class that is intended to be shared between formats. There's a good reason why we don't do this in llvm-objcopy currently: the file formats are different enough that trying to have a common interface is messy.

My feeling is that you should focus on the structure required to make changes and then lay out a file to be written. If you look at how llvm-objcopy was originally developed, it started off as just doing enough to copy everything, and then hooks were added to allow modifications of different parts as time went on. However, I'm not saying that's the only way to design this, so I'm open to alternatives, if you've got a good reason to do it a different way.

ormris added a subscriber: ormris.Jun 12 2019, 10:35 AM
seiya added a subscriber: seiya.Jun 12 2019, 5:48 PM

At least with renaming a section, what is stopping you just modifying SectionBase.Name?
More generally, I guess I don't understand why you need to make a copy of a section when you want to update it rather than just modifying it in-place?

The current code uses some OrginalXXX fields, also Jake had said in response to an email of mine on llvm-dev (although I cannot find his response on Google groups so I'm not quite sure how to link it) that there were many things that were left mutable in the current code. This seemed like it could be used to keep the original file if it ever needed to be referenced again after modification, also leaving things like sections immutable, only changed by adding a new one. FWIW Jake had also said that OriginalXXX were only used in objcopy and wouldn't make sense to prioritize for a library, but I thought this might be a clever way to incorporate that.

Also, I'd steer away from a single Object base class that is intended to be shared between formats.

Do you think there should be no common base class like there is in the current llvm-objcopy code? I would say that libObject has quite a bit in the ObjectFile (and in SymbolicFile and Binary) class and leaves only the truly file dependent routines in their respective classes. Currently what I have, in MutableObject take renameSection for example depends on the classes inheriting from SectionBase to fulfill their contracts leaving differences there, but leaving common code in renameSection. Is the alternative you are suggesting how the current code is? You had said that the current code is not a library in LLVM terms, what is the biggest factor making this true?

My feeling is that you should focus on the structure required to make changes and then lay out a file to be written.

Great, I'll do this next. Thank you for all the feedback, James!

abrachet marked an inline comment as done.Jun 13 2019, 1:02 AM
abrachet added inline comments.
llvm/tools/llvm-objcopy/MutableObject/MutableObject.h
7 ↗(On Diff #204221)

Ignore this, I just wanted access to ObjectFile::base(), it didn't even work and I forgot to remove it.

Okay, I think I see what you're getting at now. By creating a copy, you don't need to have the OriginalX fields. I think this makes sense now, okay.

(although I cannot find his response on Google groups so I'm not quite sure how to link it)

There's an archive for the email threads which I usually use to link to past llvm-dev emails: http://lists.llvm.org/pipermail/llvm-dev/

FWIW Jake had also said that OriginalXXX were only used in objcopy and wouldn't make sense to prioritize for a library, but I thought this might be a clever way to incorporate that.

I think you'll find once you start trying to write file layout code that you'll need access to the "Original" values at various points. That's why they exist in llvm-objcopy. I don't think we should leave laying the file out to the user (or at least, I think we should have a function that they can choose to call to lay it out, if they want to follow the default layout routine), so you'll need the data.

Do you think there should be no common base class like there is in the current llvm-objcopy code? I would say that libObject has quite a bit in the ObjectFile (and in SymbolicFile and Binary) class and leaves only the truly file dependent routines in their respective classes.

I'll admit that I haven't thought too much about the differences between the file formats, and how libObject is designed. Thinking about it a bit more, I guess it might be a nice design to have effectively a mutable mirror that sits on top of libObject, with a parallel hierarchy like you have started. I don't have strong feelings either way though. I guess I want to know how you envisage a regular user using the library as a whole, because I suspect we might have slightly different visions here that could explain some of the differences.

Is the alternative you are suggesting how the current code is? You had said that the current code is not a library in LLVM terms, what is the biggest factor making this true?

I assume by current code you are referring to the existing llvm-objcopy code. At the moment, the code is in a location that is not intended to be used outside the llvm-objcopy tree (llvm-strip is similar enough that it is effectively just a clone of llvm-objcopy). A library needs to live in llvm/include and llvm/lib, so that different tools can link against it. In my mind, the only part that should really be outside the library is the driver components in llvm-objcopy.cpp (and maybe CopyConfig.cpp, but I'm not convinced on that point). However, that being said, your design sounds like it might provide a bit more flexibility for library users in the future, so please keep going with it.

abrachet updated this revision to Diff 205509.Jun 18 2019, 10:54 PM

Adds an iterator which iterates over sections only exposing the updated ones and skipping over removed ones. This is used to write an (incomplete but still readable by llvm-readelf) file. Next I will work to move this code to make it much cleaner because it really needs it, moving it into lib/ and include/ directories.

You're going to need to account for program headers in your layout code. If you take a look at llvm-objcopy, a lot of the layout work is done based on those, rather than the sections, because sections within segments should move with their segment.

llvm/tools/llvm-objcopy/MutableObject/MutableObject.h
38 ↗(On Diff #205509)

You should consider using smart pointers rather than new. Probably a unique_ptr is sufficient.

44 ↗(On Diff #205509)

Maybe "clone" isn't the right name for this function? It's doing more than cloning.

324 ↗(On Diff #205509)

The MC library already has a StringTableBuilder class that does a lot of this work, so you should probably use that, rather than writing things from scratch.

383–384 ↗(On Diff #205509)

It probably doesn't make sense for the elf header to be updated in this method. That should be derivable right before writing the header to disk. Possible exception is that we need the section header offset somewhere.

390 ↗(On Diff #205509)

writeToDisk is probably the wrong name for a function that writes to a buffer... ;)

abrachet updated this revision to Diff 207485.Tue, Jul 2, 1:21 AM

Large rework moving towards adding support for segments.

abrachet marked an inline comment as done.Tue, Jul 2, 1:26 AM
abrachet added inline comments.
llvm/lib/Object/MutableObject.cpp
56–62

I need help here. I just can't get it to compile. I guess I'm not understanding templates correctly? The constructor looks like this MutableObjectRange<T>::MutableObjectRange(UpdadableList<T>). I want the compiler to deduce the template parameter to type T when I pass an UpdadableList<T>. Either way even if I declare the template parameter like I do on line 61, it still doesn't compile. Any tips?

jhenderson added inline comments.Tue, Jul 2, 3:23 AM
llvm/include/llvm/Object/MutableELFObject.h
1

Even though this is WIP, you should probably include the LLVM license at the top of the new files, as this is technically an LLVM contribution already.

20

Is there a particular reason this takes both a Name and a section header?

46

FWIW, I don't think this assert adds anything.

llvm/lib/Object/CMakeLists.txt
18

This is causing me a problem on my Windows machine at least - CMake doesn't like it when there's no such file. Did you forget to add it?

llvm/lib/Object/MutableObject.cpp
56–62

MutableObjectRange's constructor takes a non-const reference: MutableObjectRange(UpdatableList<Iterable> &List);. However, sections() and segments() are listed as const methods, which makes the members of MutableObject (i.e. Sections and Segments) const, and therefore the input arguments don't match up with the signature (passing in a const & when a plain & is expected. Changing the signature to const & was enough ot fix this issue.

You also need to be explicit about the template you are using, as MutableObjectRange is a type. It's the same as a std::vector for example. You wouldn't expect the following to compile, even though it could deduce the constructor argument: return std::vector(1234). Methods can have template argument deduction, but not types. Thus if the constructor itself was templated, you might reasonable expect the template arguments to be deducible.

65

This and getMutableSegment aren't returning a value.

abrachet abandoned this revision.Sun, Jul 21, 8:51 PM