This is an archive of the discontinued LLVM Phabricator instance.

ObjectFilePECOFF: Create a "container" section spanning the entire module image
ClosedPublic

Authored by labath on Jan 10 2019, 4:02 AM.

Details

Summary

This is coming from the discussion in D55356 (the most interesting part
happened on the mailing list, so it isn't reflected on the review page).

In short the issue is that lldb assumes that all bytes of a module image
in memory will be backed by a "section". This isn't the case for PECOFF
files because the initial bytes of the module image will contain the
file header, which does not correspond to any normal section in the
file. In particular, this means it is not possible to implement
GetBaseAddress function for PECOFF files, because that's supposed point
to the first byte of that header.

If my (limited) understanding of how PECOFF files work is correct, then
the OS is expecded to load the entire module into one continuous chunk
of memory. The address of that chunk (+/- ASLR) is given by the "image
base" field in the COFF header, and it's size by "image size". All of
the COFF sections are then loaded into this range.

If that's true, then we can model this behavior in lldb by creating a
"container" section to represent the entire module image, and then place
other sections inside that. This would make be consistent with how MachO
and ELF files are modelled (except that those can have multiple
top-level containers as they can be loaded into multiple discontinuous
chunks of memory).

This change required a small number of fixups in the PDB plugins, which
assumed a certain order of sections within the object file (which
obivously changes now). I fix this by changing the lookup code to use
section IDs (which are unchanged) instead of indexes. This has the nice
benefit of removing spurious -1s in the plugins as the section IDs in
the pdbs match the 1-based section IDs in the COFF plugin.

Besides making the implementation of GetBaseAddress possible, this also
improves the lookup of addresses in the gaps between the object file
sections, which will now be correctly resolved as belonging to the
object file.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Jan 10 2019, 4:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 5:24 AM

Could you please take a look at this? This is currently blocking me from making progress with breakpad symbol files.

My knowledge of PECOFF is even more limited, but it's my understanding that the ImageBase is the _preferred_ address for the module. That doesn't guarantee that's the actual address it would be loaded at. Not just because of ASLR but also because there may be conflicts with modules that have already been loaded. Is GetBaseAddress supposed to return the actual base address or the preferred one?

clayborg added inline comments.Feb 11 2019, 9:26 AM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
1370 ↗(On Diff #181022)

How fast is this? Do we need a local cache so we aren't looking up a section for each symbol? Maybe a locally cached vector since sections are represented by indexes?

labath marked an inline comment as done.Feb 12 2019, 12:09 AM

My knowledge of PECOFF is even more limited, but it's my understanding that the ImageBase is the _preferred_ address for the module. That doesn't guarantee that's the actual address it would be loaded at. Not just because of ASLR but also because there may be conflicts with modules that have already been loaded. Is GetBaseAddress supposed to return the actual base address or the preferred one?

Well, it kind of returns both. The returned Address object stores the address in a section-relative form. So, if you ask it for the "file address", it will return the "address, as known to the object file", or the "preferred load address", or however you like to call it. OTOH, you can also ask it for the "load address" for a specific target, which will consult the target for the load address of the section, and return the actual load address in a specific target. This is the main reason why I needed to create the extra container section sitting on top of everything (though that has other benefits too).

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
1370 ↗(On Diff #181022)

It's not particularly fast (linear search), but the number of sections is generally small. FindSectionByID is also used in other places for Symtab construction (e.g. ObjectFileELF; ObjectFileMachO does some complicated thing, which I believe involves caching), so it doesn't seem to be too bad. If it turns out we need to speed up the lookup here, then I think it should be done a bit more generically, so that all users can benefit from this.

clayborg accepted this revision.Feb 12 2019, 7:04 AM
clayborg added inline comments.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
1370 ↗(On Diff #181022)

I think this is actually ok because PECOFF files generally don't have many (if any) symbols in them. Unless we did the symbol table out of another location in the file that isn't the standard PECOFF symbol table.

This revision is now accepted and ready to land.Feb 12 2019, 7:04 AM
lemo accepted this revision.Feb 12 2019, 10:30 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2019, 11:17 PM