This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] Support dumping program headers.
Changes Planned · Public

Authored by rupprecht on May 22 2019, 5:33 PM.

Details

Summary

This change dumps all program headers for ELF files. This will be useful for checking in yaml files instead of binaries/cores for programs (e.g. lldb) that need to run tests on files with program headers.

Event Timeline

rupprecht created this revision. May 22 2019, 5:33 PM
Herald added a project: Restricted Project. May 22 2019, 5:33 PM

Unfortunately, I don't think this will be enough to make obj2yaml really useful for handling program headers. The interesting thing about program headers is their interconnection with the data described by section headers, and by storing those values verbatim, you're completely ignoring that aspect. This may be enough to capture a file if the ELF file was already produced by yaml2obj and the yaml hasn't been modified in any way, as then yaml2obj will likely lay things out the same way and the program headers will come out "right". However, if the ELF file was produced by some other tool (including an older version of yaml2obj), then the reconstituted program headers will likely point to garbage.

So, I believe a more sophisticated solution is needed here. I think we will need to somehow capture the segment-to-section relationship symbolically, much like yaml2obj allows you to state the sections which are to be contained in a segment, and then adjusts the program header's offset and size fields accordingly. However, that is not going to be trivial, because there are a number of edge cases to consider:

  • the segments can contain data not covered by any section, due to alignment or other considerations. The most extreme case of this is ELF core files, which contain no sections, so all data is accessible only through program headers. So we'll probably need a way to specify (parts of) segment content directly.
  • this means that yaml2obj will need to be able to generate and allocate space for this kind of data in its output. That may mean changing the algorithm it uses to allocate space for the section data, but I don't really have that part thought out.
  • the PT_PHDR header is particularly amusing, because it is self-referencing. However, I don't think we use this header for anything right now, so it's probably not too important what we do with it.
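
To make the first edge case concrete, here is a minimal Python sketch (not part of the patch) of how a dumper might split a segment's file range into the sections it fully contains and the leftover bytes that no section covers. The `Section` tuple and the `split_segment` helper are hypothetical names, and the sketch assumes sections do not overlap and all occupy file space (so there is no SHT_NOBITS handling).

```python
from typing import List, NamedTuple, Tuple


class Section(NamedTuple):
    name: str
    offset: int  # file offset of the section contents
    size: int    # size of the contents in the file


def split_segment(seg_offset: int, seg_filesz: int,
                  sections: List[Section]) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (contained section names, uncovered (offset, size) gaps)."""
    seg_end = seg_offset + seg_filesz
    # Keep only sections that lie entirely inside the segment's file range.
    inside = sorted(
        (s for s in sections
         if s.offset >= seg_offset and s.offset + s.size <= seg_end),
        key=lambda s: s.offset,
    )
    names: List[str] = []
    gaps: List[Tuple[int, int]] = []
    cursor = seg_offset
    for s in inside:
        if s.offset > cursor:
            # Bytes between the previous member and this section.
            gaps.append((cursor, s.offset - cursor))
        names.append(s.name)
        cursor = max(cursor, s.offset + s.size)
    if cursor < seg_end:
        # Trailing bytes after the last contained section.
        gaps.append((cursor, seg_end - cursor))
    return names, gaps
```

For a core file with no sections, the whole segment comes back as a single gap, which is the case where the segment content would have to be specified directly.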

I agree with @labath, I don't think this approach is quite right. We should avoid emitting a fixed Offset field in the obj2yaml output if we can, and we should definitely link program headers with their sections (i.e. via the Sections: member of a program header). Perhaps we need to update the whole paradigm, to allow for arbitrary data in the list of "Sections"? Something like:

Sections:
  - Data: '12345678'
  - Section: .text
  - Data: 'abcdef90'
  - Section: .another.text

And I'd probably rename "Sections" to "Members" or similar.

I've been using the FileSize and Offset fields up to now because there isn't a sensible alternative for arbitrary data in program headers not covered by a section, and something like this would make the tests I've written more robust. It would also allow a cleaner obj2yaml output, I think.
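
As a rough illustration of how such a member list could replace hard-coded Offset and FileSize values, here is a small Python sketch that walks the members in order and derives the segment's extent. Everything in it is hypothetical: the `layout_members` name, the dict shapes, and the assumption that members are placed back to back with no alignment padding are illustrative only and do not describe what yaml2obj does.

```python
from typing import Dict, List, Tuple


def layout_members(start: int, members: List[Dict[str, str]],
                   section_sizes: Dict[str, int]) -> Tuple[int, int, Dict[str, int]]:
    """Place members back to back from `start`.

    Returns (segment offset, segment file size, section name -> offset).
    """
    cursor = start
    section_offsets: Dict[str, int] = {}
    for member in members:
        if "Data" in member:
            # Raw bytes written as a hex string, two characters per byte.
            cursor += len(bytes.fromhex(member["Data"]))
        else:
            name = member["Section"]
            section_offsets[name] = cursor
            cursor += section_sizes[name]
    return start, cursor - start, section_offsets
```

With the four members from the example above and made-up section sizes, the segment's offset and file size fall out of the member list rather than being written by hand.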

rupprecht planned changes to this revision. Jun 6 2019, 4:46 PM

Ack -- those suggestions sound good, I'll revive this patch when I have a proposal for smarter program headers.