This is an archive of the discontinued LLVM Phabricator instance.

WIP: [lldb][COFF] Enhance symtab loading of symbol and export tables
AbandonedPublic

Authored by alvinhochun on Sep 18 2022, 5:29 AM.

Details

Summary

This reimplements ObjectFilePECOFF::ParseSymtab with the following
changes:

  • When the image has both an export table and a COFF symbol table, the symbols obtained from the symbol table no longer get erased.
  • Remove manual data extraction in favour of using what COFFObjectFile provides.
  • Mark symbols from the export table as "External" instead of "Debug".
  • Support DLL forwarder exports (marked as re-exported).
  • Handle absolute symbols in the symbol table.
  • Heuristically set some symbol types. Symbols in the symbol table starting in __imp_ (dllimport IAT reference) or .refptr. (mingw stub) are marked as "Data", because their locations store a pointer address.
  • When a symbol in the symbol table is a duplicate of an exported symbol, its info will be synchronized with the exported symbol, then it will be marked as "Additional" type to avoid unwanted repetition when used in commands like disassemble.

Diff Detail

Event Timeline

alvinhochun created this revision.Sep 18 2022, 5:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 5:29 AM
alvinhochun requested review of this revision.Sep 18 2022, 5:29 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 18 2022, 5:29 AM

So previously, LLDB essentially used the COFF symbol table for executables, but only the list of exported symbols for DLLs, ignoring (or, reading and then overwriting) the symbol table for any DLL with exports? Then this certainly does look like a good direction.

Overall, I think this does look nice, but it's certainly hard to rewrite and wrap one's head around. When you've settled what you want to achieve, can you try to split it up into a short series of patches, so that the initial rewrite patch doesn't add a ton of extra functionality, but only covers the rewrite and appending instead of wiping symbols?

lldb/source/Commands/CommandObjectTarget.cpp
1558

Looks like this is a stray change unrelated to the rest (although it does seem correct).

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
787

If we mean to append to the symbol table here, shouldn't we start with i = symtab.size() or similar? Otherwise we'd start overwriting the existing symbols from the start of the table?

819

What about the case when a symbol is exported with a different name than the local symbol? (This is doable with def files e.g. as ExportedName = LocalName iirc.) Is it worth to have a map of address -> exported symbol, to use instead of the raw name?

917

If it's relevant, we can look up what section the exported address is in, and check if the section is executable or not.

lldb/source/Plugins/SymbolFile/Symtab/SymbolFileSymtab.cpp
84

For ease of review (when removing the WIP status), I think it'd be easier to wrap the head around, if new features like these were deferred to a separate patch.

Thanks for the review. Yes I shall split the changes into smaller pieces to aid review.

lldb/source/Commands/CommandObjectTarget.cpp
1558

Oh right, this is relevant for https://reviews.llvm.org/D131705#inline-1292343. I'll make sure not to mix in this change for the final patch.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
787

I made symtab.Extend return a pointer to the first item of the appended range, so counting from 0 does not overwrite existing symbols.

819

Indeed it is a good idea to match symbols with different export names to synchronize the symbol types. I probably should use a vector<pair<address, export_name>> sorted by address for lookup instead.

917

Right, this is something we can do too.

mstorsjo added inline comments.Sep 19 2022, 3:43 AM
lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
787

Ah, I see - ok, good then, otherwise I was wondering how this was working at all :-)

819

Is it even relevant to keep the name in the vector in that case? As you'd only be looking up on address anyway, and based on that you can get the symbol name from the pointed-at symbol anyway, right?

alvinhochun added inline comments.Sep 19 2022, 3:44 AM
lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
819

Ah, I guess I meant to say the symbol index there...

Separated the first part with some new changes here: https://reviews.llvm.org/D134196

alvinhochun abandoned this revision.Oct 1 2022, 11:00 PM

Handled in separate changes.