Page MenuHomePhabricator

WIP: [lldb][COFF] Enhance symtab loading of symbol and export tables

Authored by alvinhochun on Sep 18 2022, 5:29 AM.



This reimplements ObjectFilePECOFF::ParseSymtab with the following

  • When the image has both an export table and a COFF symbol table, the symbols obtained from the symbol table no longer get erased.
  • Remove manual data extraction in favour of using what COFFObjectFile provides.
  • Mark symbols from the export table as "External" instead of "Debug".
  • Support DLL forwarder exports (marked as re-exported).
  • Handle absolute symbols in the symbol table.
  • Heuristically set some symbol types. Symbols in the symbol table starting in __imp_ (dllimport IAT reference) or .refptr. (mingw stub) are marked as "Data", because their locations store a pointer address.
  • When a symbol in the symbol table is a duplicate of an exported symbol, its info will be synchronized with the exported symbol, then it will be marked as "Additional" type to avoid unwanted repetition when used in commands like disassemble.

Diff Detail

Unit TestsFailed

60,040 msx64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest

Event Timeline

alvinhochun created this revision.Sep 18 2022, 5:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2022, 5:29 AM
alvinhochun requested review of this revision.Sep 18 2022, 5:29 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 18 2022, 5:29 AM

So previously, LLDB essentially used the COFF symbol table for executables, but only the list of exported symbols for DLLs, ignoring (or, reading and then overwriting) the symbol table for any DLL with exports? Then this certainly does look like a good direction.

Overall, I think this does look nice, but it's certainly hard to rewrite and wrap one's head around. When you've settled what you want to achieve, can you try to split it up into a short series of patches, so that the initial rewrite patch doesn't add a ton of extra functionality, but only covers the rewrite and appending instead of wiping symbols?


Looks like this is a stray change unrelated to the rest (although it does seem correct).


If we mean to append to the symbol table here, shouldn't we start with i = symtab.size() or similar? Otherwise we'd start overwriting the existing symbols from the start of the table?


What about the case when a symbol is exported with a different name than the local symbol? (This is doable with def files e.g. as ExportedName = LocalName iirc.) Is it worth to have a map of address -> exported symbol, to use instead of the raw name?


If it's relevant, we can look up what section the exported address is in, and check if the section is executable or not.


For ease of review (when removing the WIP status), I think it'd be easier to wrap the head around, if new features like these were deferred to a separate patch.

Thanks for the review. Yes I shall split the changes into smaller pieces to aid review.


Oh right, this is relevant for I'll make sure not to mix in this change for the final patch.


I made symtab.Extend return a pointer to the first item of the appended range, so counting from 0 does not overwrite existing symbols.


Indeed it is a good idea to match symbols with different export names to synchronize the symbol types. I probably should use a vector<pair<address, export_name>> sorted by address for lookup instead.


Right, this is something we can do too.

mstorsjo added inline comments.Sep 19 2022, 3:43 AM

Ah, I see - ok, good then, otherwise I was wondering how this was working at all :-)


Is it even relevant to keep the name in the vector in that case? As you'd only be looking up on address anyway, and based on that you can get the symbol name from the pointed-at symbol anyway, right?

alvinhochun added inline comments.Sep 19 2022, 3:44 AM

Ah, I guess I meant to say the symbol index there...

Separated the first part with some new changes here:

alvinhochun abandoned this revision.Oct 1 2022, 11:00 PM

Handled in separate changes.