This is an archive of the discontinued LLVM Phabricator instance.

Implement some functions in NativeSession.
ClosedPublic

Authored by akhuang on Apr 14 2020, 10:06 AM.

Details

Summary

This change implements readFromExe, and calculating VA and RVA, which
are some of the functionalities that will be used for native PDB reading
for llvm symbolizer.

bug: https://bugs.llvm.org/show_bug.cgi?id=41795

Diff Detail

Event Timeline

akhuang created this revision.Apr 14 2020, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2020, 10:06 AM

I'll also add unit tests for these

The load address stuff looks fine.

I think the bit that tries to find the PDB file probably needs more flexibility or a bypass for the LLDB and Crash folks. I think we should include Pavel (labath) for his opinion on that bit. I recall there was lots of discussion about this in the Crash/Lexan meetings long ago.

llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
103

I think the LLDB folks might be concerned about searching the file system for the corresponding PDB file. The debugger folks are probably going to want a way to override that for remote debugging, symbol servers, virtual file systems, etc.

Perhaps the right thing to do is to have an API that takes paths to both the EXE and the PDB. If everything loads, great, otherwise return an error. Then you could have an API like this that tries to find the PDB, but rather than having it then load the session, it just forwards the PDB path to the two-path API.

That would make it possible for a client to get the automatic behavior you have here while also enabling LLDB to be explicit about where the PDB should be.

Per my previous comment, I've added Pavel as a reviewer.

akhuang updated this revision to Diff 257554.Apr 14 2020, 4:29 PM

Add unit tests

The load address stuff looks fine.

I think the bit that tries to find the PDB file probably needs more flexibility or a bypass for the LLDB and Crash folks. I think we should include Pavel (labath) for his opinion on that bit. I recall there was lots of discussion about this in the Crash/Lexan meetings long ago.

If I am not mistaken the "native" PDB plugin (which is the "future" for lldb ) does not use this code path at all -- it goes directly for the llvm::pdb::***Stream classes. As such, I don't think any special hooks need to be added here. Which isn't to say that searching the CWD unconditionally is a particularly nice interface for a library, but it is pretty similar to what the llvm::DWARFContext does...

rnk added inline comments.Apr 15 2020, 8:14 AM
llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
81

Maybe rename Path to ExePath so the code below that replaces the exe filename with the pdb filename is a bit more readable.

98

I see what you mean that it was mostly unimplemented. :)

103

Building on Adrian's idea, I think the API might look kind of like this:

struct PdbSearchOptions {
  std::string ExePath;
  // Options we can add later:
  // std::vector<std::string> SearchDirs;
  // bool UseEnvNtSymbolPath = false;
  // llvm::VirtualFileSystem *SearchVFS = nullptr; // Way to remap paths
};
Expected<std::string> searchForPdb(const PdbSearchOptions &Opts) {
  ...
}

The search would return the first existing file with matching PDB magic.

The process of searching for a PDB file can be quite involved. Here are some things that we may implement one day:

  • _NT_SYMBOL_PATH: an environment variable with PDB search paths
  • Symbol servers: Maybe we will want to learn to talk to MS symbol servers to download PDBs

Even if we don't implement the symsrv protocol, it would be useful to teach LLVM how to find PDBs in a symbol server cache. For example, I have these kernel32.dll PDBs in C:\symbols, which windbg put them:

$ ls /c/symbols/kernel32.pdb/*
/c/symbols/kernel32.pdb/548CAA89F1FCF0193D12D2DC02F8920E1:
kernel32.pdb

/c/symbols/kernel32.pdb/63816243EC704DC091BC31470BAC48A31:
kernel32.pdb

If llvm-symbolizer wants to be able to symbolize system DLLs, that will be our best bet.

178

Checking the sign bit makes sense, but these seem like simpler ways to express that:

  • (int32_t) RVA < 0
  • RVA > INT32_MAX (unsure on macro name)
186

If we can subtract the virtual address out of it, are you sure it is an RVA? It seems like it's supposed to be a virtual address... If it's a virtual address, though, I would expect it to require a uint64_t. Hm.

llvm/unittests/DebugInfo/PDB/Inputs/SimpleTest.cpp
4 ↗(On Diff #257554)

I tried to think of how to use yaml2pdb to avoid the binary files, but I don't like any of my solutions. I'm OK with the binary files.

akhuang updated this revision to Diff 258841.Apr 20 2020, 2:17 PM
akhuang marked 3 inline comments as done.

Address comments about adding a separate path to search for the PDB.

akhuang added inline comments.Apr 20 2020, 2:18 PM
llvm/lib/DebugInfo/PDB/Native/NativeSession.cpp
186

I think the section header virtual addresses are also relative to the load address.

akhuang updated this revision to Diff 258864.Apr 20 2020, 4:33 PM

Fix loadPdbFile function to take reference to Allocator

amccarth accepted this revision.Apr 20 2020, 4:43 PM

Thanks. I appreciate the separation of concerns with regard to finding and loading of the PDB.

This revision is now accepted and ready to land.Apr 20 2020, 4:43 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: MaskRay.EditedApr 21 2020, 11:56 AM

Hi, llvm/object/COFF.h does not work on a case-sensitive filesystem. I fixed it in 4927ae085807731eb4052e0a682443fe9399b512. BTW, you can strip unneeded Phabricator tags (Subscribers: Tags: Reviewers:) with

arcfilter () {
        arc amend
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend -F -
}

@MaskRay Oh, whoops. Thanks for fixing!

MaskRay added a comment.EditedApr 21 2020, 12:16 PM

http://45.33.8.238/linux/15836/step_12.txt

After fixing the -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on problem locally, the unittest (unittests/DebugInfo/PDB/DebugInfoPDBTests) still fails.

(
An erroring Expected must be explicitly checked to avoid -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on failures.

if (Expected<std::unique_ptr<PDBFile>> File = loadPdbFile(PdbPath, Allocator))
  return std::string(PdbPath);
else
  return File.takeError();

)

hliao added a subscriber: hliao.Apr 21 2020, 1:21 PM

http://45.33.8.238/linux/15836/step_12.txt

After fixing the -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on problem locally, the unittest (unittests/DebugInfo/PDB/DebugInfoPDBTests) still fails.

(
An erroring Expected must be explicitly checked to avoid -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on failures.

if (Expected<std::unique_ptr<PDBFile>> File = loadPdbFile(PdbPath, Allocator))
  return std::string(PdbPath);
else
  return File.takeError();

)

Even more, as the test reads the PDB file path from that EXE file (llvm-project/llvm/unittests/DebugInfo/PDB/Inputs/SimpleTest.exe), that path is an absolute path in Windows format (C:\\src\\llvm-project\\llvm\\unittests\\DebugInfo\\PDB\\Inputs\\SimpleTest.pdb). On Linux builds, that always fail and, on Windows, we are only luck if we have the same dir layout by chance. Could you fix that by taking relative path and disable that test on non-Windows platforms?

Actually, I will just revert for now.

Relanded in 23609331472be22be3be134f6facd8f086c5ea49 so that tests pass on Linux. I changed the PDB searching part to specify the path style, so that it correctly finds the filename on Linux.