This is an archive of the discontinued LLVM Phabricator instance.

[lldb][PDB] Add ObjectFile PDB plugin
ClosedPublic

Authored by zequanwu on Oct 20 2020, 11:16 AM.

Details

Summary

To allow loading PDB file with target symbols add command.

Diff Detail

Event Timeline

zequanwu created this revision.Oct 20 2020, 11:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2020, 11:16 AM
zequanwu requested review of this revision.Oct 20 2020, 11:16 AM
zequanwu added a comment.EditedOct 20 2020, 11:26 AM

How can I start lldb at specified path so that target symbols add bar.pdb could find the pdb path? (For testing)

zequanwu updated this revision to Diff 299525.Oct 20 2020, 5:48 PM
  • Add GetArchitecture method, similar to ObjectFilePECOFF
  • Add a test case.
zequanwu updated this revision to Diff 299527.Oct 20 2020, 5:51 PM

elaborate CHECK

I think this is a good start. See inline comments for details. In addition to the test for the "target symbols add" flow it would be good to have a separate test for the ObjectFilePDB functionality. You can look at existing tests in test/Shell/ObjectFile for inspiration.

lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
11

You should be able to delete this line too clang-format should know to put this file first.

46

There's also a third copy of this code in ProcessMinidump (MinidumpParser.cpp). I think it's time to factor this out somehow. We could either create a new file in the Utility module, or put this functionality in directly inside the UUID class -- I'm not currently sure what's better.

173

I'm pretty sure this check is bogus and can never be false.

177–198

Could this use the same algorithm as ObjectFilePDB::GetArchitecture ?

208–210

Leave this unimplemented. PDBs don't have sections and are not loaded into memory, so the function does not apply.

216–217

This shouldn't be necessary. What "normally" happens here is that an object file is adding own sections *into* the "unified" section list -- not the other way around. Since we're pretending that the pdb file has no sections (and that's kind of true) I think you should just return an empty section list here, and not touch the unified list at all.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
145

revert spurious changes.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

Instead of changing loadMatchingPDBFile I think we ought to change this to something like:

if (auto *pdb = llvm::dyn_cast<ObjectFilePDB>(m_objfile_sp)) {
  file_up = pdb->giveMeAPDBFile(); 
  // PS: giveMeAPDBFile is a new pdb-specific public method on ObjectFilePDB
  // PS: possibly this should not return a PDBFile, but a PDBIndex directly -- I don't know the details but the general idea is that it could/should reuse the pdb parsing state that the objectfile class has already created
} else 
  file_up = do_the_loadMatchingPDBFile_fallback();

The reason for that is that it avoids opening the pdb file twice (once in the object file and once here). Another reason is that I believe the entire loadMatchingPDBFile function should disappear. Finding the pdb file should not be the job of the symbol file class -- we have symbol vendors for that. However, we should keep the fallback for now to keep the patch small...

lldb/test/Shell/SymbolFile/NativePDB/load-pdb.cpp
15

I'd just pass commands to lldb directly via -o.

zequanwu updated this revision to Diff 299826.Oct 21 2020, 5:01 PM
zequanwu marked 4 inline comments as done.
  • address comments.
  • add tests.
  • move loadPDBFile from SymbolFileNativePDB.cpp to ObjectFilePDB.cpp and use it to create unique_ptr of PDBFile so that we can move it around.
zequanwu marked an inline comment as done.Oct 21 2020, 5:02 PM
zequanwu added inline comments.
lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
177–198

No. This part is the same algorithm as the part in ObjectFilePECOFF::GetModuleSpecifications, https://github.com/llvm/llvm-project/blob/master/lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp#L194-L215.

The ObjectFilePDB::GetArchitecture is same as ObjectFilePECOFF::GetArchitecture.

208–210

This is actually necessary. It's called at SymbolFileNativePDB::InitializeObject. Otherwise, setting breakpoint won't work.

216–217

The reason why I copy section list from unified_section_list to m_sections_up is to let GetBaseAddress be able to return the right base address.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

From what I see, it seems like PDBFile is designed to be created when opening the pdb file. It doesn't allow copying it around.
NativeSession holds a unique_ptr of PDBFile and NativeSession::getPDBFile returns the reference of object, so creating unique_ptr from it is not allowed.

I moved loadPDBFile to ObjectFilePDB as static function, so we can move PDBFile easily.

labath added inline comments.Oct 22 2020, 2:33 AM
lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
177–198

I'm not sure this kind of consistency actually helps. For example, in Module::MatchesModuleSpec, module's own architecture (as provided by ObjectFile(PECOFF)::GetArchitecture) is compared to the one in the ModuleSpec (generally coming from ObjectFile(PDB)::GetModuleSpecifications.

If there's any mismatch between the two ways of computing a ArchSpec, then there will be trouble, even though they are "symmetric".

That said, I think we can keep this like it is in this patch, but I'd like to clean up both implementations in a another patch.

208–210

So, that's a bug in SymbolFileNativePDB::InitializeObject. It should be calling m_objfile_sp->GetModule()->GetObjectFile()->GetBaseAddress() (as done in e.g. SymbolFileBreakpad::GetBaseFileAddress) instead of m_objfile_sp->GetBaseAddress(). That will get you the base address of the main (exe) object file instead of the pdb. Up until now, that distinction was not important because m_objfile_sp was always the main file...

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

You don't need to copy it -- you can just have the two objects reference the same instance (stealing the PDBFile object from under ObjectFilePDB seems suboptimal). Probably the simplest way to achieve that is to change the PdbIndex class to use a raw pointer (and ObjectFilePDB to provide one). Then SymbolFilePDB can have an additional unique_ptr<PDBFile> member, which may or may not be empty (depending on whether it fetches object from ObjectFilePDB or loads it itself). Eventually, as we transition to always fetching the PDBFile from ObjectFilePDB, that member will disappear completely.

Also, I wasn't expecting you to take the giveMeAPDBFile name literally. :) I was deliberatly using a funky name as a placeholder because I wasn't sure of how exactly this would work -- I was trying to illustrate the general concept, and not a precise implementation. If what, I said in the previous paragraph works, then a "normal" name like GetPDBFile should work just fine...

Sorry about the confusion.

301–310

You should be able to delete this block now -- the code you're writing now supersedes (and enhances) it. If the module has a SymbolFileSpec, then this file will be loaded (as an ObjectFilePDB) in SymbolVendor::FindPlugin, and that instance will be passed to the SymbolFilePDB constructor.

lldb/test/Shell/ObjectFile/PDB/object.test
3 ↗(On Diff #299826)

Could we generate the pdb directly from yaml (perhaps via llvm-pdbutil yaml2pdb)? That way we could have a hardcoded UUID and explicit test for its parsing.

lldb/test/Shell/ObjectFile/PDB/symbol.test
4 ↗(On Diff #299826)

This is SymbolFile functionality, so it'd be better placed in Shell/SymbolFile/PDB

zequanwu updated this revision to Diff 300133.Oct 22 2020, 6:28 PM
zequanwu marked 4 inline comments as done.

Address some comments.

zequanwu added inline comments.Oct 22 2020, 6:28 PM
lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
177–198

So, we want to refactor out the common part of ObjectFilePECOFF::GetArchitecture and ObjectFilePDB::GetModuleSpecifications to keep them consistent? I will do it in another patch.

208–210

Thanks for pointing it out. I'll update it and leave GetBaseAddress and CreateSections as unimplemented.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

Then SymbolFilePDB can have an additional unique_ptr<PDBFile> member

I think you meant SymbolFileNativePDB, but why do we need this? It already has a member unique_ptr<PDBIndex>. I might misunderstood.

I changed it to have loadPDBFile return raw pointer and ObjectFilePDB has a raw pointer member and PdbIndex uses raw pointer.

lldb/test/Shell/ObjectFile/PDB/object.test
3 ↗(On Diff #299826)

I noticed when I use llvm-pdbutil pdb2yaml to generate yaml file from pdb file and then use llvm-pdbutil yaml2pdb to get pdb file back, the two pdb files are not the same. Also using lldb-test symbols with the pdb created from yaml file, got the error: error: Module has no symbol vendor. This doesn't happen if I use the original pdb file.

lldb/test/Shell/ObjectFile/PDB/symbol.test
4 ↗(On Diff #299826)

I put it here because I saw symbol.yaml which uses lldb-test symbols exists in Shell/ObjectFile/PECOFF. And the folder Shell/SymbolFile seems to contain tests for SymbolFile plugins.

labath added inline comments.Oct 23 2020, 1:13 AM
lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
177–198

I'd actually say all four functions (ObjectFile{PECOFF,PDB}::{GetArchitecture,GetModuleSpecifications}) should use the same algorithm, but lets leave that for the other patch.

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
42 ↗(On Diff #300133)

What's up with the reference?

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

Yeah, but who's now responsible for deleting the PDBFile object? You just leak it, right?

My proposal was the following:

  • ObjectFilePDB is responsible for freeing the PDBFile object that it creates (but it can borrow that object to the SymbolFile)
  • SymbolFileNativePDB is responsible for the object that it creates (which happens only when it's not borrowing the PDBFile). This is what the extra member is for.
  • PDBIndex is *not* responsible for freeing anything

The imagined implementation was something like:

class ObjectFilePDB {
  std::unique_ptr<PDBFile> m_file_up; // Always non-null
  PDBFile &GetPDBFile() { return *m_file_up; }
};
class SymbolFilePDB {
  std::unique_ptr<PDBIndex> m_pdb_index;
  std::unique_ptr<PDBFile> m_file_up; // May be null
};
SymbolFilePDB::CalculateAbilities() {
  ...
  PDBFile *pdb_file;
  if (loading_from_objfile)
    pdb_file = &objfile->GetPDBFile();
  else {
    m_file_up = loadMatchingPDBFile(...);
    pdb_file = m_file_up.get();
  }
  m_index = PdbIndex::Create(*pdb_file);
}

Other implementations are possible as well...

lldb/test/Shell/ObjectFile/PDB/object.test
3 ↗(On Diff #299826)

Yeah, unfortunately it looks like pdb2yaml does not support some pdb streams (though I also wonder if lldb is not being too strict about the kinds of streams it expects). It would be nice to fix that one day.

Nonetheless, the current pdb2yaml functionality seems sufficient for the purposes of this particular test, so I think we should use that here. The hardcoded uuid here may be brittle as it depends on the exact algorithm that lld uses to compute it.

lldb/test/Shell/ObjectFile/PDB/symbol.test
4 ↗(On Diff #299826)

The fact that that test uses the symbols function of lldb-test is not ideal, but the functionality it tests (symbol table parsing) is definitely an ObjectFile feature. OTOH, Parsing of Types, CompileUnits and Functions definitely falls into the SymbolFile purview. Given that the other test can be moved to yaml2pdb, and you don't have to worry about duplicating inputs, I'd move this test to the SymbolFile folder

That said, I'm not sure what is it that you actually wanted to test here. What's being done here that is not covered by the load-pdb.test? You could potentially add -o "b main" (to force symfile parsing) and -o "image dump symfile" to the other test if you wanted to check that the pdb can actually be parsed this way.

4 ↗(On Diff #300133)

This is pretty strange way to use the pdb file and I'm surprised that it actually works (it probably shouldn't). The pdb file should not normally be the main object file of a module. You probably want something like lldb-test symbols %t.exe -symbol-file %t.pdb.

lldb/test/Shell/SymbolFile/NativePDB/load-pdb.cpp
15

It's simpler to just pass the executable as an argument instead of wrapping it in -o "target create".

zequanwu updated this revision to Diff 300388.Oct 23 2020, 12:54 PM
zequanwu marked 3 inline comments as done.

Address comments.

lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
177–198

It's interesting that GetArchitecture and GetModuleSpecifications in ObjectFilePECOFF currently do not have the same algorithm, but works fine.

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
42 ↗(On Diff #300133)

Removed reference.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
299

Thanks for declaration.

lldb/test/Shell/ObjectFile/PDB/object.test
3 ↗(On Diff #299826)

I switched to use yaml2pdb. The uuid is swapped byte order when reading from pdb.

lldb/test/Shell/ObjectFile/PDB/symbol.test
4 ↗(On Diff #299826)

I'll put the testing of symfile parsing in load-pdb.cpp.

This looks pretty good, both the patch and Pavel's insights. I don't see much to comment on that Pavel didn't already catch.

lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp
169

For me, the name spec is confusing, because this code is mostly dealing with ModuleSpecs but spec is a reference to the ArchSpec of the module_spec. Perhaps module_arch would make this clearer. Then the code below would follow a pattern like:

module_arch.SetTriple("blah");
specs.Append(module_spec);
labath accepted this revision.Oct 26 2020, 3:41 AM

lgtm

lldb/source/Plugins/ObjectFile/PDB/CMakeLists.txt
7

I don't see this using any Host functionality.

lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
145

do revert this before committing.

855

this too

lldb/test/Shell/ObjectFile/PDB/object.test
1 ↗(On Diff #300388)

Consider "inlining" the yaml input into this file.

This revision is now accepted and ready to land.Oct 26 2020, 3:41 AM
zequanwu updated this revision to Diff 300720.Oct 26 2020, 10:28 AM
zequanwu marked 4 inline comments as done.

Address comments.

This revision was landed with ongoing or failed builds.Oct 26 2020, 10:29 AM
This revision was automatically updated to reflect the committed changes.

It has regressed buildbots:

SymbolFile/NativePDB/load-pdb.cpp
lldb-x86_64-fedora: http://lab.llvm.org:8014/#/builders/14/builds/1006
lldb-x86_64-debian: http://lab.llvm.org:8011/#/builders/68/builds/782

It has regressed buildbots:

SymbolFile/NativePDB/load-pdb.cpp
lldb-x86_64-fedora: http://lab.llvm.org:8014/#/builders/14/builds/1006
lldb-x86_64-debian: http://lab.llvm.org:8011/#/builders/68/builds/782

Just put a fix here: https://reviews.llvm.org/rG4b83747ab1577b7899116b0ddb75f05de2471694
This should fix it.

load-pdb is still failing on the Windows bot. Can you please commit a fix soon or revert the change?

load-pdb is still failing on the Windows bot. Can you please commit a fix soon or revert the change?

Okay, I'm looking at it now

load-pdb is still failing on the Windows bot. Can you please commit a fix soon or revert the change?

It's slash vs back slash thing. Put a fix at https://reviews.llvm.org/rG779deb9750a4853485ac7beca86f518b067ad6d6