This is an archive of the discontinued LLVM Phabricator instance.

ELF: create "container" sections from PT_LOAD segments
ClosedPublic

Authored by labath on Dec 21 2018, 6:03 AM.

Details

Summary

This is the result of the discussion in D55356, where it was suggested
as a solution to representing the addresses that logically belong to a
module in memory, but are not a part of any of its sections.

The ELF PT_LOAD segments are similar to the MachO "load commands",
except that the relationship between them and the object file sections
is a bit weaker. While in the MachO case, the sections belonging to a
specific segment are placed directly inside it in the object file
logical structur, in the ELF case, the sections and segments form two
separate hierarchies. This means that it is in theory possible to create
an elf file where only a part of a section would belong to some segment
(and another part to a different one). However, I am not aware of any
tool which would produce such a file (and most tools will have problems
ingesting them), so this means it is still possible to follow the MachO
model and make sections children of the PT_LOAD segments.

In case we run into (corrupt?) files with overlapping sections, I have
added code (and tests) which adjusts the sizes and/or drops the offending
sections in order to present a reasonable image to the upper layers of
LLDB. This is mostly done for completeness, as I don't anticipate
running into this situation in the real world. However, if we do run
into it, and the current behavior is not suitable for some reason, we
can implement this logic differently.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Dec 21 2018, 6:03 AM
clayborg added inline comments.Dec 21 2018, 7:11 AM
source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1841–1844 ↗(On Diff #179278)

Use existing range code from Range.h?

#include "lldb/Utility/Range.h"
typedef Range<lldb::addr_t, lldb::addr_t> SegmentAddressInfo;
1846–1850 ↗(On Diff #179278)

Use SegmentAddressInfo using either of:

struct SectionAddressInfo : public SegmentAddressInfo {
  SectionSP Segment;
};

or

struct SectionAddressInfo {
  SegmentAddressInfo Range;
  SectionSP Segment;
};
1954 ↗(On Diff #179278)

Maybe add square brackets around section name? "PT_LOAD[0]" "PT_LOAD[1]"? I am fine either way, just throwing this out there

source/Plugins/ObjectFile/ELF/ObjectFileELF.h
229 ↗(On Diff #179278)

Make this static and move to ObjectFileElf.cpp? It doesn't need to be a method.

it is in theory possible to create an elf file where only a part of a section would belong to some segment (and another part to a different one).

ELF standard: "An object file segment contains one or more sections.", "An object file segment comprises one or more sections"

joerg added a comment.Dec 24 2018, 4:37 PM

it is in theory possible to create an elf file where only a part of a section would belong to some segment (and another part to a different one).

ELF standard: "An object file segment contains one or more sections.", "An object file segment comprises one or more sections"

I wouldn't rule out smart tools splitting a section into a read-only and read-write part on linking. So no, I certainly wouldn't assume any direct correspondance between segments and sections.

ELF standard: "An object file segment contains one or more sections.", "An object file segment comprises one or more sections"

I wouldn't rule out smart tools splitting a section into a read-only and read-write part on linking. So no, I certainly wouldn't assume any direct correspondance between segments and sections.

That's interesting. Do you know of any such tool doing that? I am asking because that might give us a clue on how to handle that situation. In your example, it might be cleanest to split the offending section in two, and place the parts in appropriate segments (with the right permissions). However, I don't want to add code doing that if there is no known tool producing such files. If there aren't any (and I hope that there aren't, since it does appear to violate the spec), then I think we should just do the minimal amount of work necessary to process these (corrupt?) files without doing something completely bogus (like crashing). I think the current implementation would fit that bill.

labath updated this revision to Diff 179541.Dec 27 2018, 2:35 AM
labath marked 4 inline comments as done.

Address review comments:

  • move SegmentID() to the cpp file
  • use lldb_private::Range (not lldb_utility::Range) for segment and sections ranges
  • use brackets in segment names (PT_LOAD[0])
joerg added a comment.Dec 27 2018, 8:11 AM

I foundamentally don't understand what you are trying to do. You can either look at the executable from a segment perspective or from a section perspective. But trying to mix the views is bound to give bogus results.

I foundamentally don't understand what you are trying to do.

This is stemming from a discussion on a different patch. If you're interested in the full background you can look it up here https://reviews.llvm.org/D55356#1327099 (note that some of it happened on the mailing list, so it may not have made it into phabricator), but the short story is this: I needed a way to represent the base address of a module in lldb. Although that seems trivial, it is made complicated by the fact that lldb assumes that this address will belong to one of the module's sections. In the ELF case that typically won't be the case, as the load address will typically point to the first byte of the elf header (although it can point to anything really). Now, after learning this, my first reaction was also, "well, duh, we need to change that assumption", but then I realized that this will actually be correct if you consider treat segments to be "sections" as well (as then the load address is just the first byte of the first section). The nice part about this is that (to my surprise) perfectly matches what lldb is doing for MachO files. And if we assume that the elf spec that Jan quoted is true, then this is also a perfectly valid thing to do.

Also, even though this wasn't my motivation, I believe an even bigger impact of this will be greater accuracy in the way lldb attributes addresses. Right now, if you ask lldb about an a module's address which doesn't belong to any of the module's sections (e.g. the inter-section padding, or the padding commonly present at the beginning of the first segment), then lldb will say it doesn't have a clue. With this patch, it will correctly report that this address belongs to the module and was loaded by the given PT_LOAD segment.

You can either look at the executable from a segment perspective or from a section perspective. But trying to mix the views is bound to give bogus results.

But will it really? I mean, it's technically possible to create an elf file with inconsistent section/segment views, but the only way I know of doing that involves using a hex editor (and, then, I could create bogus MachO files the same way). As far as I know, gnu linker scripts https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_23.html#SEC23 only allow you to put the whole section into a segment or none of it. yaml2obj (I know that's not a very convincing data point) has the same restriction. With a hex-editor, I can create many inconsistencies using the section headers alone, but I'm not worried about those either, if there isn't a legitimate use case for e.g. sections with overlapping VM addresses. Obviously, throwing program headers into the mix adds a whole new level of possible inconsistencies, but I am not sure if these ever occur in practice.

labath added a comment.Jan 8 2019, 4:22 AM

Greg, what do you think about the current version of this patch.

Joerg, I propose to check the patch in in the current form, as this solves existing issues in the lldb's ELF representation and improves the overall consistency of lldb's representation of object files. I'm not opposed to changing things again if we find a use case that this breaks, but I can do that only once I know what that use case is (and that might involve making changes to the generic representation of modules in lldb).

clayborg accepted this revision.Jan 8 2019, 8:27 AM

LGTM

This revision is now accepted and ready to land.Jan 8 2019, 8:27 AM
This revision was automatically updated to reflect the committed changes.