This is an archive of the discontinued LLVM Phabricator instance.

[Object] Create MutableELFObject Class for Doing Mutations on ELFObjectFiles [Part 6]
Needs ReviewPublic

Authored by abrachet on Aug 11 2019, 11:08 PM.

Details

Summary

Added support for segments

Diff Detail

Event Timeline

abrachet created this revision.Aug 11 2019, 11:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2019, 11:08 PM

How about some testing for changing program header properties, e.g. size or address?

llvm/include/llvm/Object/MutableELFObject.h
204

Any particular reason you've moved this line?

211–220

This should be in one of the earlier changes probably, if it can be.

239–240

Did you mean to remove this?

299

What is the point of this and the less-than operator?

300

Just do:

return Object == Other.Object && Index == Other.Index;
302

Blank line before this. Also, is it clang-formatted?

313

What's the point of this function?

315–321

I'm not convinced this patch should attempt to add sections to segments. The reason for this is because they're not an intrinsic part of segments, but rather an artefact of the layout of an ELF object. Relatedly, this doesn't handle segments within other segments, so feels incomplete. I'd defer it to a patch that handles laying out an ELF.

349–350

Did you mean to remove this? Looks like a bad rebase to me.

355–356

Did you mean to remove this? Looks like a bad rebase to me.

387–390

Did you mean to remove this? Looks like a bad rebase to me.

412

Comment?

Also Seg -> Segment

llvm/unittests/Object/MutableELFObjectTest.cpp
464

Seems unrelated?

518–521

Seems unrelated?

532

Seems unrelated?

774

I'd have at least 2 different segments here and test that both can be read in.

abrachet updated this revision to Diff 216052.Aug 19 2019, 8:58 PM

Addressed comments

abrachet marked 14 inline comments as done.Aug 19 2019, 9:00 PM

Ahh sorry about all those unrelated changes! Not sure what happened there probably a bad rebase like you said.

llvm/include/llvm/Object/MutableELFObject.h
299

They are needed by content_iterator same with moveNext

315–321

Ok, I'll move it out to a part 7.

MaskRay added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
70

The ctor is not really necessary.

212

This should be added to Part I D64281.

296

You can use uint64_t Index = 0; (did you mean size_t Index = 0;?) and delete the ctor.

417

Delete &&. See my comment in Part 4 D66062.

jhenderson added inline comments.Aug 20 2019, 2:44 AM
llvm/include/llvm/Object/MutableELFObject.h
519

Can this be inlined into getMutableSegment?

520

Is there a particular reason this implementation requires getIfNew, whereas the symbol and section versions don't?

llvm/unittests/Object/MutableELFObjectTest.cpp
831

Changing the address of a segment is very unlikely to ever actually happen, I'd think. I'd change the p_offset field instead.

abrachet updated this revision to Diff 216331.Aug 20 2019, 10:08 PM
abrachet marked 2 inline comments as done.

Addressed comments

abrachet marked 9 inline comments as done.Aug 20 2019, 10:16 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
70

It eventually gets called by an emplace_back

212

I had this in an earlier patch but then moved it out because it wasn't needed until now. I don't have a preference either way which patch it goes in.

296

Hmm am I missing something on how to construct objects? C++ doesn't have the nice C syntax like SegmentRef S = {.Object = nullptr, .Index = 0}; Is there a way to leave out these boiler plate constructors that I don't know of?

520

Yes, although I'm not sure if you'll like it :) Creating a MutableELFSegment is more expensive than the others because it takes a list of all sections that reside in it. So using getIfNew allows to not have to calculate that if it is already been made mutable then it just returns a mutable reference to a MutableELFSegment. I say you might not like that because what I described doesn't happen in this patch but only will in Part 7.

abrachet updated this revision to Diff 216332.Aug 20 2019, 10:16 PM
abrachet marked 2 inline comments as done.

Use the right diff (!)

Posting this here since I've not seen a patch for Part 7 yet. I may have said this elsewhere already, but I don't think it's a good idea to have sections as members of segments. I think you'll find the layout algorithm easier to work with when using a parent relationship, rather than a child one. Note that this is how llvm-objcopy is already written. That should allow you to handle multiple levels of nesting/overlapping etc without too much difficulty. That would also, I believe, make the Segment and Section implementations much closer to each other.

llvm/include/llvm/Object/MutableELFObject.h
296

Do you ever need the default value of Index? It seems to me to be unnecessary.

abrachet updated this revision to Diff 216550.Aug 21 2019, 9:22 PM

Added parent segment to both section and segment types and methods to find parent segments

That should allow you to handle multiple levels of nesting/overlapping etc without too much difficulty.

Hmm if a segment had a list of its child segments, then couldn't layout be done recursively on each child segment? Would this not be more straight forward?

I think what I currently have should move more in the direction you are suggesting. I have a concern with my current implementation, its if I should use the original program headers or the updated ones. It's possible to change a segments vaddr, although you have mentioned unlikely. This would be problematic to use sectionInSegment after changing the program headers because then it would be obviously wrong. So right now I use the original program headers. Also something to look out for, I use parent segments as indexes not pointers and those are ssize_t not size_t because the alternative of using one past the last segment like an end iterator was too ugly than using -1 to denote no parent. I could also use size_t's max I suppose.

That should allow you to handle multiple levels of nesting/overlapping etc without too much difficulty.

Hmm if a segment had a list of its child segments, then couldn't layout be done recursively on each child segment? Would this not be more straight forward?

What happens if a segment is inside another segment? You only want to lay it out if it hasn't been laid out already, so you'd have to find a way to track that. On the other hand, if you only lay out things that have no parent (and then lay out the things with parents based on their parent's layout), you don't have this issue. Keeping track of children is not necessarily bad, as it allows you to lay the children out at the same time as the parent, but it's not as important (you can always do two passes for example).

I think what I currently have should move more in the direction you are suggesting. I have a concern with my current implementation, its if I should use the original program headers or the updated ones. It's possible to change a segments vaddr, although you have mentioned unlikely. This would be problematic to use sectionInSegment after changing the program headers because then it would be obviously wrong. So right now I use the original program headers.

Using the original program headers is probably the right thing to do, as that ensures the original relative layout is preserved.

Also something to look out for, I use parent segments as indexes not pointers and those are ssize_t not size_t because the alternative of using one past the last segment like an end iterator was too ugly than using -1 to denote no parent. I could also use size_t's max I suppose.

See my inline suggestion about using Optional. size_t's max would be my next suggestion.

llvm/include/llvm/Object/MutableELFObject.h
262–263

How about using Optional<size_t> instead of -1? I think that would more clearly convey the meaning and would avoid the weirdness of using ssize_t everywhere.

267

Could you not just move SegmentRef's definition earlier?

270

I'd keep this and findSectionsParentSegment() next to each other, and I'd seriously consider just calling them both findParentSegment().

543–544

Something you may wish to consider is effectively flattening the nesting, so that only top-level segments are parents, i.e. if a segment has a parent, then the parent cannot have a parent itself. This may make the layout algorithm simpler, but could probably be left until you're at that point.

Same comment applies to sections.

550

Is this identical to the llvm-objcopy version? If not, could you highlight how it's different, please.

574

Delete the blank line here.

You need to be careful. It looks to me like a segment could end up as its own parent here, if I'm not misreading the code.

Also, you should consider taking alignment requirements into account, in the event that two segments have the same size and offset. See https://bugs.llvm.org/show_bug.cgi?id=42963.

Also, consider the following bug in your sectionWithinSegment code: https://bugs.llvm.org/show_bug.cgi?id=41126.

abrachet updated this revision to Diff 216719.Aug 22 2019, 2:28 PM

Use Optional<size_t> to represents parent segments. Addressed other comments.

abrachet marked 7 inline comments as done.Aug 22 2019, 2:43 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
262–263

Thats much better :)

543–544

Something like if (segmentWithinSegment(Child, Phdr) && findParentSegment(Phdr) != None)?

Same comment applies to sections.

I think I am misunderstanding something, do sections have anything to do whether a segment can have a parent? There is no overlapping sections I thought so I don't think this relates to that. Sorry about the confusion!

574

You need to be careful. It looks to me like a segment could end up as its own parent here, if I'm not misreading the code.

Indeed, thanks fixed that.

Also, you should consider taking alignment requirements into account, in the event that two segments have the same size and offset. See https://bugs.llvm.org/show_bug.cgi?id=42963.

I have added this it happens on line 941 of the unit test, is that what you mean?

Also, consider the following bug in your sectionWithinSegment code: https://bugs.llvm.org/show_bug.cgi?id=41126.

I think I have this:

|-Segment-|
    |-Section-|

tested and it properly says that the section is in the segment. There was no change needed for that.
To get yaml2obj to output that I just made the sections size larger than the segment and put the section in the segment. I had trouble with the other two examples in that bug report because Offset: was not recognized for sections? So I don't know how to test these two cases:

    |-Segment-|
|-Section-|

  |-Segment-|
|---Section---|

but I know for sure they will not be reported as being in the segment.

Could you write some comments in the test code that show what the cases are that you are testing, similar to how you've written it in your inline comments. This will make it easier to understand what is going on.

llvm/include/llvm/Object/MutableELFObject.h
543–544

I'm thinking that if the segment layout looks something like the following:

| Section1 | Section2 | Section3 | Section4 |
| Segment1            | Segment2 |
| Segment3                                  |

Everything should have Segment3 as their parent (except Segment3 itself of course).

If you don't flatten the tree, then sections 1, 2, and 3 will have parents of Segment 1 or 2 which will make it harder to perform layout, as you effectively have to chase the parents pointer to find the top-level element to determine where to move them.

I think I am misunderstanding something, do sections have anything to do whether a segment can have a parent? There is no overlapping sections I thought so I don't think this relates to that. Sorry about the confusion!

Yes, I think you misunderstood. I was just referring to the fact that the parent of a section should be the top-most parent in the tree.

549

except it uses

550

and not

574

Indeed, thanks fixed that.

I think you've missed an issue - what happens if two segments have the same offset, size and alignment? In that case, neither will be treated as a parent of the other, and the algorithm falls apart.

I had trouble with the other two examples in that bug report because Offset: was not recognized for sections? So I don't know how to test these two cases:

I think there's a ShOffset field available to set the claimed offset of a section (the actual data will still be written where it would be). The alternative to that is to effectively force the section's offset by setting the AddressAlign and Address fields, so that yaml2obj has to place it at a certain position. You'll see quite a few examples of this within tests. Once you have forced a section to a specific position, you can then just set the ProgramHeader fields accordingly.

575

I'd rename Parent to PotentialParent and Child to Segment, because the Parent isn't necessarily a Parent yet (and therefore the Child isn't necessarily a Child of that segment).

abrachet updated this revision to Diff 217033.Aug 24 2019, 9:51 PM
abrachet marked an inline comment as done.

Addressed review comments

abrachet marked 9 inline comments as done.Aug 24 2019, 9:54 PM
abrachet added inline comments.
llvm/include/llvm/Object/MutableELFObject.h
574

I think you've missed an issue - what happens if two segments have the same offset, size and alignment? In that case, neither will be treated as a parent of the other, and the algorithm falls apart.

Ok, now it is finally fixed. It isn't very pretty because I need to pass the indexes of each segment but I just choose whichever segment is first to be the parent.

I'm out of time for today on this review. I'll come back to it tomorrow.

llvm/include/llvm/Object/MutableELFObject.h
521–522

A comment briefly explaining this would be helpful.

527–530

Some code comments explaining what is going on here would be useful.

539–540

An explanation with some comments as to why you can have this early out would make this clearer.

579

ChildOffset -> SegmentIndex
ParentOffset -> ParentIndex

Write a comment at the start of this function outlining the behaviour, please.

llvm/unittests/Object/MutableELFObjectTest.cpp
923–924

doesnt -> doesn't

But really, I'm a little confused what this comment is trying to say. I think you want to say something about offsets, because it's confusing what you're talking about in this comment. It appears you're saying 4 doesn't come before 5, but it's earlier in the program header list, which seems to contradict what you're saying.