Page MenuHomePhabricator

Create a new symbol file plugin for cross-platform PDB symbolization
ClosedPublic

Authored by zturner on Oct 8 2018, 3:51 PM.

Details

Summary

The existing SymbolFilePDB only works on Windows, as it is written against a closed-source Microsoft SDK that ships with their debugging tools.

There are several reasons we want to bypass this and go straight to the bits of the PDB, but just to list a few:

  1. More room for optimization. We can't see inside the implementation of the Microsoft SDK, so we don't always know if we're doing things in the most efficient way possible. For example, setting a breakpoint on main of a big program currently takes several seconds. With the implementation here, the time is unnoticeable.
  1. We want to be able to symbolize Windows minidumps even if not on Windows. Someone should be able to debug Windows minidumps as if they were on Windows, given that no running process is necessary.

This patch is a very crude first attempt at filling out some of the basic pieces.

I've implemented FindFunctions, ParseCompileUnitLineTable, and ResolveSymbolContext for a limited subset of possible parameter values. Basically, enough to get it to display me something nice for the breakpoint location.

The test I've added, incidentally, is able to load a .exe into LLDB and set a breakpoint by name even on non-Windows. This is really cool because it allows us to run the test on every platform thusly increasing our test coverage, and also serving as proof that this actually does work anywhere. Of course it doesn't make sense to actually run this program on non Windows and you'll get an error if you try to do so, but all we try to do with this test is set the breakpoint and verify that symbols are resolved correctly.

I'll add many more tests of this nature later.

For now, this plugin is enabled always on non-Windows, and by setting the environment variable LLDB_USE_NATIVE_PDB_READER=1 on Windows. Eventually, once it's at parity with the Windows implementation, we'll delete the Windows DIA-based implementation.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Oct 8 2018, 3:51 PM
zturner updated this revision to Diff 168723.Oct 8 2018, 3:56 PM

Fix the test. I accidentally clang-formatted it which messed up the check lines. Turn clang-format off for this file.

I can't say I have done a particularly thorough review, but code seems well structured, and it looks like I could make sense of it if I went through the effort of figuring out what various pdb record types mean. These are the things that came to mind while looking at this.

Also, hooray for being able to open pdb files on non-windows.

lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp
1 ↗(On Diff #168723)

It seems llvm has a special .clang-format file in the test folder which sets ColumnLimit: 0. Maybe we should do the same?

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
37–38 ↗(On Diff #168723)

Maybe propagate this error to the caller and have the caller log the error when ignoring it?

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
53 ↗(On Diff #168723)

Is this actually used?

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
78–85 ↗(On Diff #168723)

I am wondering whether it wouldn't be better to use an lldb setting instead of an env var (plugin.symbol-file.pdb.native ?). For this to work, we'd need to delay to decision of which plugin to use to until CreateInstance time, but that shouldn't be too hard.

It's great! Especially thanks for comments!

I haven't looked the whole patch yet, I'll finish it tomorrow, ok?

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
307 ↗(On Diff #168723)

Why shouldn't we pass a type here? Isn't it unrelated to the new plugin?

I looked through the code and it mostly looks good. I had a couple of nits and a couple of actual questions (especially about asserts). I agree with Aleksandr the comments are going to make a big difference in maintenance/improvements going forward (until they eventually get out of sync with the code...).

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
149 ↗(On Diff #168723)

Nit: I've noticed you initialized the variables some places, but not all places. I'm all for initializing always.

153 ↗(On Diff #168723)

If the assert fails, does this mean that the symbol map won't be built correctly? Should it be an actual error instead of just an assert?

lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h
131 ↗(On Diff #168723)

This assert looks like it really should be an error instead of just an assert.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
341 ↗(On Diff #168723)

Did you mean to leave this comment here? Maybe with a FIXME?

592 ↗(On Diff #168723)

It looks like a couple more functions should have FIXME comments as they are not yet implemented.

zturner added inline comments.Oct 9 2018, 10:04 AM
lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp
1 ↗(On Diff #168723)

We could do the same, but it won't remove the need to turn clang-format off. Especially when we're running CHECk statements against line numbers in the file, we don't want clang-format modifying the file. Even if we set column limit to 0 here, clang-format will write main on 1 line as int main(int argc, char **argv) { return 0; }. That's an interesting test case, but it's a different test case than what we have here where we want the line number to point to the first line of the function that *isn't* the declaration. It might even be interesting to have a third test case for

int main(int argc, char **argv)
{
  return 0;
}
lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
37–38 ↗(On Diff #168723)

These are a bit tricky. In all cases, there is only possible thing the caller can do, which is to give up. I don't think the caller particularly cares about the details of the error that occurred, the only way to handle it is to say "we're not going to use this plugin". So I think it's fine to consume the error and just return a generic failure here.

153 ↗(On Diff #168723)

I think the only way this assert would fail is if two symbols in the PDB were indicated as having the same address. The only time we call BuildAddrToSymbolMap is if cci.m_symbols_by_va.empty() is true. Then, we only insert items into it at the bottom of this loop. So over the course of the loop, the same address would have had to have been added already.

If the debug info says two symbols have the same address, there isn't much we can do other than to change this to some kind of multimap, but that just pushes the decision of how to handle the incorrect debug info to the caller, which doesn't have any better strategy for resolving this kind of ambiguity. That said, down below we call insert which assumes the item is not already in the map. We could change that to a try_emplace, then assert that the insert succeeded. This would have the effect of asserting in the same case that this asserts, but the program still "working" (it would just drop anything other than the first symbol with a given address).

lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h
131 ↗(On Diff #168723)

This is a record which comes from the debug info, which technically is user input. On the other hand, we need some layer at which we can assume debug info has been sanitized. Propagating errors through multiple layers of infrastructure code just complicates the control flow and doesn't really add any value. The idea is that you should have sanitized the record before you call this function.

As it turns out, I only call this function in one place, and I didn't sanitize the input yet, partly because there are only two existing implementations of things that can write PDB files, and neither one of them will emit buggy CodeView records here. But if you think it's better, I can add a sanitization check to the caller.

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
53 ↗(On Diff #168723)

Nope, thanks! Neither is some of the other stuff. I started by copying all of SymbolFilePDB.cpp and then removing code. But I missed this include. I'll remove other dead code as well and re-upload.

341 ↗(On Diff #168723)

My bad. This was from the SymbolFilePDB.cpp which I started with as the template for implementing this file. And I didn't remove this comment.

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
78–85 ↗(On Diff #168723)

I thought about this but decided against it. There are two reasons I didn't want to do this:

  1. On non-Windows platforms, the setting doesn't make any sense. There is no other option.
  2. It doesn't seem like something we want to "support" opting into or out of. It's either in experimental / development mode, like right now, in which case the only reason you have to use it is if you're working on it. Or it's complete, in which case we just delete the old one and make everything use the new one with no alternative.

So I couldn't come up with a valid use case to justify having it be a setting.

307 ↗(On Diff #168723)

Nice catch! We actually should pass a type here. The reason I'm not doing it is because that would require implementing the PDBASTParser class. Obviously we need to do that eventually, but for now it wasn't necessary in order for LLDB to not complain. So I was trying to keep the patch small and just implement the minimum amount needed for *something* to work. But yes, we will need to do this.

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
307 ↗(On Diff #168723)

But here is the SymbolFilePDB part, not SymbolFileNativePDB, and PDBASTParser is already implemented for it? And we even retrieve the type in the line 297... If I understand right, it will be never called if LLDB_USE_NATIVE_PDB_READER is on?

zturner added inline comments.Oct 9 2018, 10:23 AM
lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
307 ↗(On Diff #168723)

Oh, yea you are right. I changed this in SymbolFilePDB because I was testing whether LLDB would still display something nice. THen I forgot to change it back. So yea, I'll revert this back to how it was. Sorry about that

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
307 ↗(On Diff #168723)

That's ok :)

lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h
131 ↗(On Diff #168723)

Yes, I think adding a sanitization check to the caller is a good idea. Detecting the error earlier (and when it's potentially actionable) should make maintenance down the line easier.

zturner updated this revision to Diff 168871.Oct 9 2018, 2:10 PM

Addressed all issues pointed out by various people, and added a few more tests. There's only a limited amount of stuff we can do without a running process, so I'll probably have to resort to lldb-test soon to be able to print more detailed information.

zturner added inline comments.Oct 9 2018, 2:13 PM
lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
571–572 ↗(On Diff #168871)

@stella.stamenova I added the sanitization check here for the record. In theory we can provide an overload of this function for every record type that has special requirements. For now I've only implemented the function for this record type and all other types return true.

This revision is now accepted and ready to land.Oct 9 2018, 2:15 PM
lemo added a comment.Oct 9 2018, 2:29 PM

Yay! Nice to see this is coming along, thanks Zach!

Looks good to me, a few comments inline.

lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp
11 ↗(On Diff #168723)

just an idea: add a negative test as well? (break -s n non_existent_function)

12 ↗(On Diff #168723)

Is the offset and address going to be stable enough to rely on it for testing?

lldb/source/Plugins/SymbolFile/NativePDB/CMakeLists.txt
12 ↗(On Diff #168723)

nit: fix indentation

lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.cpp
84 ↗(On Diff #168723)

what's the meaning of this check?

103 ↗(On Diff #168723)

is the intention to count unique kinds? can we have duplicates?

121 ↗(On Diff #168723)

is C++ 17 available? structured bindings can make this more readable:

const auto& [it, inserted] = ...;
if (!inserted)

...
lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.h
34 ↗(On Diff #168723)

add a brief description of the struct?

71 ↗(On Diff #168723)

ditto: class description comment?

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

lldbassert() ?

(same comment for the rest of assert()s in the CR)

37–38 ↗(On Diff #168723)

how do you feel about adding case-specific logging?

122 ↗(On Diff #168723)

class? (not just for the notation clarity, but there's no reason to make the visitor members public, right?

149 ↗(On Diff #168723)

+1

151 ↗(On Diff #168723)

C++ 17 structured bindings?

186 ↗(On Diff #168723)

but is that what we really want? shouldn't we be looking for the "inner-most" range which contains the address?

191 ↗(On Diff #168723)

Ok, I see what we're doing. Wouldn't make more sense to iterate backward? (efficiency wise and more importantly to locate the most "inner" range, right?)

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.h
58 ↗(On Diff #168723)

initialize these raw pointers in-class (NSDMI) ?

llvm::pdb::DbiStream *m_dbi = nullptr;

95 ↗(On Diff #168723)

comment?

lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h
162 ↗(On Diff #168723)

should't the last argument be sizeof(result) ?

also, what is the meaning of this opaque id?

lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
124 ↗(On Diff #168723)

I'd prefer a dedicated SegmentAndOffset struct rather than the generic pair<>, what do you think?

lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
86 ↗(On Diff #168723)

add logging?

116 ↗(On Diff #168723)

logging, please?

128 ↗(On Diff #168723)

ditto: logging? (very valuable for someone trying to troubleshoot symbol loading issues w/o debugging lldb)

131 ↗(On Diff #168723)

sizeof(guid) ?

@lemo Thanks for the detailed review! I'll fix the suggestions and upload a new patch in a bit.

lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp
12 ↗(On Diff #168723)

No, but it sounds like you're looking at an old version of the diff. I removed this regex'ed all the address checks out.

lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.cpp
121 ↗(On Diff #168723)

Sadly no, LLVM is still on C++11.

lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

lldbassert is nice to validate things that you expect should happen but want to log an error if the condition is not held. assert is better when you want to guarantee invariants of your API. In this case I use assert because the precondition of this function is "file is non-null". lldbassert wouldn't help here, because the very next line of code is to just dereference the pointer and we'd crash anyway.

37–38 ↗(On Diff #168723)

Logging is possible. But I think there's such a thing as too verbose logging. There's a balance between code simplicity and providing useful information in the log file. This code is called at a stage when LLDB still hasn't even figured out that the PDB symbol file plugin is ultimately the right plugin for the job. It's *asking* us if we can handle it. So if we log an error and say "Invalid file, PDB doesn't have a DBI stream" it seems too verbose in my opinion. Imagine if LLDB tried the DWARF plugin first (because it just goes down the list of looking for anyone who claims they can handle this file), and DWARF logged an error saying "Invalid DWARF debug information format", but it's not even DWARF in the first place.

I don't know what the best thing to do here is other than to say "we can't handle this", for which a single nullptr return value is sufficient.

122 ↗(On Diff #168723)

Well it's a local class defined inside of a member function, so in a sense it's already "private". But you're right, it could be a class.

186 ↗(On Diff #168723)

The purpose of this function is to find *all* symbols that occupy a given address. Imagine a function which contains a nested block. Just because you call this function with an address inside the block doesn't necessarily mean you want the block. If we had to choose just 1 symbol to return, I'd rather choose the outermost symbol, because from there it's easier to determine all the inner symbols. But going the other direction is more complicated. This function however just returns a list of all of them and lets the caller decide what to do with that information. We could, for example, build something on top of this to allow more granular selection, but that would still need to call into this and then filter the list down.

191 ↗(On Diff #168723)

See above, but most inner range is not necessarily what we're looking for. In this patch, for example, the goal is to find the function symbol. So if this returned the inner most range, it might return a block symbol and we'd still need to do more work to find the function symbol. So this function just returns every symbol that contains the address, and lets the caller filter that down further (if desired).

lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h
162 ↗(On Diff #168723)

sizeof(result) == sizeof(m_uid). This can even be static_asserted, due to the construction of the PdbSymUid class. See the comment at the top of this header file for a description.

The idea is that LLDB represents symbols, types, compilands, and everything else using a 64-bit integer. So we partition a 64-bit integer into a 1-byte tag field that says what kind of symbol it is, and then 48 bytes of type-specific data. When LLDB asks for a 64-bit unique identifier for a particular thing, we call this function, which is basically treating the 64-bit structure as a 64-bit integer. The 1-byte tag field guarantees uniqueness across all symbol types.

lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp
124 ↗(On Diff #168723)

Reasonable, I'll fix this.

zturner updated this revision to Diff 168889.Oct 9 2018, 3:29 PM

Addressed suggestions from Leonard.

lemo accepted this revision.Oct 9 2018, 3:42 PM
lemo added inline comments.
lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

I believe that lldbassert has been fixed (lldb_private::lldb_assert() unconditionally ends with calling abort())

zturner added inline comments.Oct 9 2018, 3:57 PM
lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

Hmm, that's new. So it looks to me like the only difference between assert and lldbassert is that you can't turn lldbassert off in release mode. Do I have that right? Do we care about that?

lemo added inline comments.Oct 9 2018, 4:04 PM
lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

my main concern is consistency - having multiple flavors of assert can be confusing (one should not have to think too much about the subtleties around the various flavors of "assert")

so my 2c: use lldbassert() in the LLDB codebase

zturner updated this revision to Diff 168897.Oct 9 2018, 4:13 PM

Use lldbassert. The main reason I didn't use this originally is because it didn't used to be a hard assert. But since it seems like it is now, I've no problem standardizing on this.

(FWIW, I actually think we should call llvm::sys::report_fatal_error instead of abort, but we can have that discussion another time.

labath added inline comments.Oct 10 2018, 12:11 AM
lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp
34 ↗(On Diff #168723)

It has been "fixed", though I'm not sure if people who were advocating for "lldbassert" actually saw the "fix".

37–38 ↗(On Diff #168723)

My main reason for suggesting this was simplicity. Having this function return an error means you can turn a bunch of:

if (!Foo) {
  consumeError(Foo.takeError());
  return nullptr;
}

into

if (!Foo)
  return Foo.takeError();

Then, in the caller (which is in a better position to determine what to do with the error imho), you can discard the error with a single consumeError call (or a log message, or whatever).

153 ↗(On Diff #168723)

It was my understanding we don't want malformed debug info to ever crash the debugger: https://reviews.llvm.org/D18646#388424 (last paragraph).

lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
78–85 ↗(On Diff #168723)

That's fine. I'll leave that up to you.

aleksandr.urakov accepted this revision.Oct 10 2018, 6:28 AM

Thank you!

This revision was automatically updated to reflect the committed changes.

Not sure if you've noticed this yet, but a couple of the new tests are failing on Linux and one of them is failing on Windows as well. The failures on Linux are in disassembly.cpp and source-list.cpp and source-list.cpp is also failing on Windows. Let me know if you need logs from the failures.

Ugh, I put absolute paths in the test. I can fix it in a few hours, i had
to leave tge office early. otherwise you’re free to revert, sorry :-/

Hui added a subscriber: Hui.Oct 10 2018, 5:19 PM
Hui added inline comments.
lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp
559 ↗(On Diff #168723)

will be good to add a 'pdb' or 'npdb' log channel for any lookup logging etc.

I took a quick look at the FileCheck issue and it looks like these have been failing since mid-September!

That’s when the change to use filecheck inside the LLDB tests as a function went in: https://reviews.llvm.org/D50751. The change adds a new required input to dotests.py and updates the cmake files to pass it as well as the xcode project, but any of the bots that are calling dotests.py directly now need to pass the new input too.

Vedant, since you made the change, how do you feel about logging a warning instead of hard failing inside dotests.py and then hardfailing any tests that use filecheck when it doesn’t exist? That would limit the failures just to those tests instead of all of the tests on these bots. The bot owners will have to update the scripts as well.

Zachary, if you can’t get a local repro of the windows or linux failure with your changes, I can get you the logs from our test runs.

Thanks,
-Stella

From: Zachary Turner <zturner@google.com>
Sent: Wednesday, October 10, 2018 8:23 PM
To: reviews+D53002+public+9ea58f85247e05d8@reviews.llvm.org
Cc: aleksandr.urakov@jetbrains.com; aaron.lee.smith <aaron.lee.smith@gmail.com>; Stella Stamenova <stilis@microsoft.com>; pavel@labath.sk; mosescu@google.com; mgorny@gentoo.org; arphaman@gmail.com; llvm-commits@lists.llvm.org; llvm@inglorion.net
Subject: Re: [PATCH] D53002: Create a new symbol file plugin for cross-platform PDB symbolization

vsk added a comment.Oct 11 2018, 3:06 PM

I took a quick look at the FileCheck issue and it looks like these have been failing since mid-September!

That’s when the change to use filecheck inside the LLDB tests as a function went in: https://reviews.llvm.org/D50751. The change adds a new required input to dotests.py and updates the cmake files to pass it as well as the xcode project, but any of the bots that are calling dotests.py directly now need to pass the new input too.

Vedant, since you made the change, how do you feel about logging a warning instead of hard failing inside dotests.py and then hardfailing any tests that use filecheck when it doesn’t exist? That would limit the failures just to those tests instead of all of the tests on these bots. The bot owners will have to update the scripts as well.

Sounds good, I'll send out a patch. Sorry, I didn't count on there being so many different ways the test harness is invoked.

Zachary, if you can’t get a local repro of the windows or linux failure with your changes, I can get you the logs from our test runs.

Thanks,
-Stella

From: Zachary Turner <zturner@google.com>
Sent: Wednesday, October 10, 2018 8:23 PM
To: reviews+D53002+public+9ea58f85247e05d8@reviews.llvm.org
Cc: aleksandr.urakov@jetbrains.com; aaron.lee.smith <aaron.lee.smith@gmail.com>; Stella Stamenova <stilis@microsoft.com>; pavel@labath.sk; mosescu@google.com; mgorny@gentoo.org; arphaman@gmail.com; llvm-commits@lists.llvm.org; llvm@inglorion.net
Subject: Re: [PATCH] D53002: Create a new symbol file plugin for cross-platform PDB symbolization