This is an archive of the discontinued LLVM Phabricator instance.

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 5]
Needs ReviewPublic

Authored by abrachet on Aug 10 2019, 11:21 PM.

Details

Summary

Micro patch to add ability to modify the files header

Diff Detail

Event Timeline

abrachet created this revision.Aug 10 2019, 11:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2019, 11:21 PM

I think you need to be careful that you any methods in ELFObjectFile etc that query the header return the updated value. For example, I see there's a getStartAddress function, which returns the value of e_entry. If you change e_entry, you'll want this to return the updated value, I think.

abrachet updated this revision to Diff 215533.Aug 15 2019, 8:16 PM

Made a virtual getHeader method in ELFObjectFile

I'm concerned with the getHeader() call changes in ELFObjectFile, mostly because again it's introducing a very obvious point of failure. What if somebody added a new piece of functionality to the base class to get another piece of data from the header? There's a good change they'll call EF.getHeader(). Say the new function is called getFoo(): if another user then decides they want to call getFoo() on a MutableELFObject, then they'll end up using the unmodified header rather than the modified one, which could cause surprising bugs. I don't have an obvious answer to this, but I'm beginning to wonder if we should be using a different style of inheritance, one that only makes the functions available that we know to be safe from a MutableELFObject's point of view. This might be something like private inheritance. I'm not sure though, as it would mean providing shim functions to re-publicise the "safe" functions we care about. What are your thoughts?

llvm/include/llvm/Object/MutableELFObject.h
356

Why do you need both this and getHeader()?

357

Rename this to getHeader() to ensure sensible function overloading, and have it return a pointer to the header, like the other version.

abrachet updated this revision to Diff 216047.Aug 19 2019, 7:59 PM

Change getHeader() to return a reference not pointer.

abrachet marked 3 inline comments as done.Aug 19 2019, 8:00 PM

I'm concerned with the getHeader() call changes in ELFObjectFile, mostly because again it's introducing a very obvious point of failure. What if somebody added a new piece of functionality to the base class to get another piece of data from the header? There's a good change they'll call EF.getHeader(). Say the new function is called getFoo(): if another user then decides they want to call getFoo() on a MutableELFObject, then they'll end up using the unmodified header rather than the modified one, which could cause surprising bugs. I don't have an obvious answer to this, but I'm beginning to wonder if we should be using a different style of inheritance, one that only makes the functions available that we know to be safe from a MutableELFObject's point of view. This might be something like private inheritance. I'm not sure though, as it would mean providing shim functions to re-publicise the "safe" functions we care about. What are your thoughts?

Hmm that's a lot to think about. I don't love private inheritance because I think it would end up being a lot of boiler plate for the methods that are fine either way, I think.

Certainly it's true that EF.getHeader() would be gravitated towards for future changes. I think if a getFoo() is added and its something that MutableELFObject would want, then either way there needs to be a change either to use getHeader not EF.getHeader() or add getFoo() in MutableELFObject. Of course there's a difference because the bug couldn't exist if getFoo() isn't available.

You would know more about maintaining something like this so I'll go with whatever you (or others) have to say.

llvm/include/llvm/Object/MutableELFObject.h
357

I have moved to have a separate getMutableHeader that returns a mutable reference.

MaskRay added inline comments.Aug 19 2019, 10:58 PM
llvm/include/llvm/Object/MutableELFObject.h
91

If you use size_t in D64281, you shouldn't need the diff here.

Did you forget to rebase on latest Part 1,2,3?

grimar added inline comments.Aug 20 2019, 2:16 AM
llvm/include/llvm/Object/ELFObjectFile.h
501

I'd suggest using a reference here and in the other places. It looks more natural
as Header can't be nullptr.

onst Elf_Ehdr &Header = getHeader();

I've been trying and failing to come up with a good solution to the design. I only see three options:

  1. Adding the getHeader function as this patch does and accepting some fragility. In practice, usages within LLVM are probably fine, because we'll have testing, but it might not work as well for those who wish to use the library in their own tools.
  2. Using private inheritance or making the ELFObjectFile a private member of MutableELFObject instead of a base class. This would require MutableELFObject to opt in into functions that we know to be safe to use, and wouldn't allow usage of other functions. This is safer, but could result in a fair amount of extra code. This might need to be combined with 1) too to fix the specific issues this patch addresses.
  3. Something more radical involving bigger changes in ELFObjectFile to prevent accessing the wrong header within the base class. I don't have anything specific in mind here at the moment.

Does anybody else have any thoughts on this?

llvm/unittests/Object/MutableELFObjectTest.cpp
378

Why has this changed from your previous (correct) version?

416

Why has this changed from your previous (correct) version?

507–508

Bad rebase?

639

Bad rebase?

abrachet updated this revision to Diff 216328.Aug 20 2019, 9:22 PM
abrachet marked an inline comment as done.

Addressed comments

abrachet marked 6 inline comments as done.Aug 20 2019, 9:23 PM

I've been trying and failing to come up with a good solution to the design. I only see three options:

  1. Adding the getHeader function as this patch does and accepting some fragility. In practice, usages within LLVM are probably fine, because we'll have testing, but it might not work as well for those who wish to use the library in their own tools.
  2. Using private inheritance or making the ELFObjectFile a private member of MutableELFObject instead of a base class. This would require MutableELFObject to opt in into functions that we know to be safe to use, and wouldn't allow usage of other functions. This is safer, but could result in a fair amount of extra code. This might need to be combined with 1) too to fix the specific issues this patch addresses.
  3. Something more radical involving bigger changes in ELFObjectFile to prevent accessing the wrong header within the base class. I don't have anything specific in mind here at the moment.

Does anybody else have any thoughts on this?

I'll wait to see if anyone else chimes in before working on this

jhenderson added inline comments.Aug 21 2019, 2:06 AM
llvm/include/llvm/Object/ELFObjectFile.h
385

Adding a comment here saying make sure to use this function instead of calling the EF.getHeader() function directly is probably a good idea.

  1. Using private inheritance or making the ELFObjectFile a private member of MutableELFObject instead of a base class. This would require MutableELFObject to opt in into functions that we know to be safe to use, and wouldn't allow usage of other functions. This is safer, but could result in a fair amount of extra code. This might need to be combined with 1) too to fix the specific issues this patch addresses.

This approach (not private inheritance, but making ELFObjectFile a private member + exposing specific accessors) seems to be the least fragile approach, but also the most verbose one, e.g. you may have to implement/override all the methods of EF that are used in subclasses.

llvm/include/llvm/Object/ELFObjectFile.h
385

Not sure I agree with changing this to a ref; keeping the type as a pointer would probably make this patch (and any potential down-tree fixes) simpler, even if it can't ever be null. I don't feel strongly though.

llvm/include/llvm/Object/MutableELFObject.h
357

This is logically the same thing as getHeader, is there a reason it needs to have a different name?

  1. Using private inheritance or making the ELFObjectFile a private member of MutableELFObject instead of a base class. This would require MutableELFObject to opt in into functions that we know to be safe to use, and wouldn't allow usage of other functions. This is safer, but could result in a fair amount of extra code. This might need to be combined with 1) too to fix the specific issues this patch addresses.

This approach (not private inheritance, but making ELFObjectFile a private member + exposing specific accessors) seems to be the least fragile approach, but also the most verbose one, e.g. you may have to implement/override all the methods of EF that are used in subclasses.

I have been thinking about this a lot recently. I do really like that MutableELFObject inherits from ELFObjectFile I think it reuses the most amount of code possible while still doing what we want. I think it isn't worth it to use MutableELFObject as a private member that it would be too verbose. Do you think it is sufficiently less fragile to warrant making the larger change? Better to be sure now when the code is much easier to change with all patches still open :)

I did have a question about private inheritance and what is safe to do. I think the most useful methods for llvm-objcopy are actually in the SectionRef SymbolRef and SegmentRef types so really the only methods I can see that need to be publicly exposed are the iterator returning ones. The problem is (not for SegmentRef because I can change that) those keep a pointer to an ObjectFile. From what I have tested, the virtual table is still constructed correctly and we can reinterpret_cast<ObjectFile*>(this) when creating Segment/SymbolRef's and the proper overrides will be called. Is this safe? If not then I guess there would be no difference in how much boilerplate we would need to add between using inheritance and having an ELFObjectFile as a member.

Lastly, how should I go about doing this? Should I switch to private inheritance in this patch or in Part 1 D64281?

I have been thinking about this a lot recently. I do really like that MutableELFObject inherits from ELFObjectFile I think it reuses the most amount of code possible while still doing what we want. I think it isn't worth it to use MutableELFObject as a private member that it would be too verbose. Do you think it is sufficiently less fragile to warrant making the larger change? Better to be sure now when the code is much easier to change with all patches still open :)

I was discussing this issue with a colleague yesterday, to see if there's a clear path forward. He suggested that it might have made more sense to just modify the ELFObjectFile interface directly, rather than adding the MutableELFObject class. That would render the concerns we have a little less of an issue, since it would more clearly be a bug in the base class if the wrong things were called. You could also mess about with more objects to hide away the dangerous methods by doing this. I don't have a strong feeling about any of this either way. I think with sufficient testing and a large comment, you could probably get away without making the changes we've talked about, but it might be worth it still.

I did have a question about private inheritance and what is safe to do. I think the most useful methods for llvm-objcopy are actually in the SectionRef SymbolRef and SegmentRef types so really the only methods I can see that need to be publicly exposed are the iterator returning ones. The problem is (not for SegmentRef because I can change that) those keep a pointer to an ObjectFile. From what I have tested, the virtual table is still constructed correctly and we can reinterpret_cast<ObjectFile*>(this) when creating Segment/SymbolRef's and the proper overrides will be called. Is this safe? If not then I guess there would be no difference in how much boilerplate we would need to add between using inheritance and having an ELFObjectFile as a member.

I'm not massively familiar with how private inheritance works really, but the essential point is that a user of a subclass will never know that it's inheriting from the base class. I believe the virtual table will still exist, if you reinterpret_cast to the base class, and so it's safe enough to work with the base class this way, but it feels a bit wrong to be doing so, as it circumvents the whole point of why we are doing the inheritance this way at all. One advantage of private inheritance over a private member, I believe, is that you should be able to use the using directive to pull base-class functions into the public interface of the sub-class: https://softwareengineering.stackexchange.com/a/244910

Lastly, how should I go about doing this? Should I switch to private inheritance in this patch or in Part 1 D64281?

As this is a structural change that is not specific to this change (as in it doesn't introduce new functionality itself), it makes more sense to do the changes in part 1, if you're going to do them, but we should be sure it's the right thing to do first.

I was discussing this issue with a colleague yesterday, to see if there's a clear path forward. He suggested that it might have made more sense to just modify the ELFObjectFile interface directly, rather than adding the MutableELFObject class.

So remove MutableELFObject and put it in ELFObjectFile? This might let ELFObjectFile no longer depend on ELFFile but I'm not entirely sure on that. It's also worth noting that as @rupprecht has pointed out many methods are now linear not constant so without doing some changes (could even be a boolean to track whether any modifications have been made) it probably shouldn't be used yet.

You could also mess about with more objects to hide away the dangerous methods by doing this.

Is this in reference to the earlier point about just modifying ELFObjectFile?

I believe the virtual table will still exist, if you reinterpret_cast to the base class, and so it's safe enough to work with the base class this way, but it feels a bit wrong to be doing so, as it circumvents the whole point of why we are doing the inheritance this way at all.

I think it isn't that bad to cast this to a base class if we know the methods that will be called in that context will always be safe to call. It is much more controlled this way but still bugs can happen if the base class methods get change. I don't feel too strongly.

One advantage of private inheritance over a private member, I believe, is that you should be able to use the using directive to pull base-class functions into the public interface of the sub-class:

Oh that's very useful I didn't know about that.

I was discussing this issue with a colleague yesterday, to see if there's a clear path forward. He suggested that it might have made more sense to just modify the ELFObjectFile interface directly, rather than adding the MutableELFObject class.

So remove MutableELFObject and put it in ELFObjectFile? This might let ELFObjectFile no longer depend on ELFFile but I'm not entirely sure on that. It's also worth noting that as @rupprecht has pointed out many methods are now linear not constant so without doing some changes (could even be a boolean to track whether any modifications have been made) it probably shouldn't be used yet.

Yes, I was suggesting that an alternative design might be to simply expand the ELFObjectFile interface, but you raise a good point - I forgot about the complexity issue. We certainly wouldn't want to impair the complexity of the existing ELFObjectFile methods, at least unless we started to make modifications. The point about not relying on ELFFile isn't really the issue here. In fact, the class could for some situations. The important bit is that it is now ELFObjectFile's responsibility to be able to handle mutations, which means that changes to the ELFObjectFile class's code must support that, whereas in the current proposal, changes have to be made to ELFObejctFile's code to support a subclass which it otherwise doesn't care about.

You could also mess about with more objects to hide away the dangerous methods by doing this.

Is this in reference to the earlier point about just modifying ELFObjectFile?

Yes. You could add additional objects that wrap the ELFFile class and return either the original header or a mutated one, similar to what you are doing with MutableELFObject already, but the difference is that the wrapper object in this case goes in the base class, and thus it's never possible to access the dodgy method.

I believe the virtual table will still exist, if you reinterpret_cast to the base class, and so it's safe enough to work with the base class this way, but it feels a bit wrong to be doing so, as it circumvents the whole point of why we are doing the inheritance this way at all.

I think it isn't that bad to cast this to a base class if we know the methods that will be called in that context will always be safe to call. It is much more controlled this way but still bugs can happen if the base class methods get change. I don't feel too strongly.

reinterpret_cast is basically saying "pretend this thing here is actually this thing" which technically works in some situations, but bypasses the whole point of hiding methods via private inheritance. The function might be safe at the moment to call, but it doesn't guarantee it in the future, as the subclass hasn't got tests for that particular case. There may be other subtle correctness issues too.