This is an archive of the discontinued LLVM Phabricator instance.

Add SymbolFilePDB with basic support for line tables.
ClosedPublic

Authored by zturner on Feb 17 2016, 6:23 PM.

Details

Summary

This is a first attempt at getting SymbolFilePDB working with line table support. A few notes:

  • This won't compile until a corresponding patch goes in on the LLVM side. The patch is done on my side, but not committed yet.
  • Right now only line tables and compilation units are supported. In particular, this means that "target modules dump line-table <file>" works, and displays output like this:
(lldb) file d:\src\llvmbuild\ninja\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe
Current executable set to 'd:\src\llvmbuild\ninja\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe' (i686).
(lldb) target modules dump line-table test-pdb.cpp
Line table for test-pdb.cpp in `test-pdb.exe
0x00401020: :4
0x00401030: :6
0x00401033: :8
0x00401035: :9
  • I checked in some unittests with 2 precompiled binaries and a PDB file for testing purposes. I opted for unittests and checked-in binaries here for a number of reasons.
    1. The SB API does not provide a way to test something this low level.
    2. Obviously not every platform has the ability to create PDB files.
    3. With the help of some magic compiler switches, the binaries are very small.
    4. At some point or another this will be unavoidable, as we will want to test "PDB file generated by MSVC on Windows can be read on Linux"
    5. I don't feel comfortable checking in something with no tests.

Diff Detail

Event Timeline

zturner updated this revision to Diff 48268.Feb 17 2016, 6:23 PM
zturner retitled this revision from to Add SymbolFilePDB with basic support for line tables..
zturner updated this object.
zturner added a reviewer: clayborg.
zturner updated this object.
zturner updated this object.
zturner added a subscriber: lldb-commits.

Added a few notes to clarify things that might not be obvious upon a first look at the patch.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
82

Note that this entire plugin is supposed to compile and be registered on every platform, including platforms that don't support PDB (i.e. which, for now, is all non-Windows platforms). But this function will simply return an error in those case, and return 0 for the abilities.

Later, if and when PDB reading support is implemented in such a way as to not require Windows, we need only change the first argument and to loadDataForExe and we will have PDB support on non windows platforms with no other changes required.

119–120

This is a mistake, it does contain the language. I will need to implement this, although for all intents and purposes it doesn't matter because it's C++ in all cases we care about anyway.

375–376

This comment is no longer valid. It returns the source file name. Pretend you didn't see this.

unittests/SymbolFile/PDB/Inputs/test-dwarf.cpp
2–3

Need to fix this comment to include the clang++ command line instead of the cl command line.

unittests/SymbolFile/PDB/SymbolFilePDBTests.cpp
95

This will cause the 2 PDB-specific tests to run only on Windows.

clayborg requested changes to this revision.Feb 18 2016, 5:16 PM
clayborg edited edge metadata.

So the main issue is to fix the:

uint32_t
SymbolFilePDB::ResolveSymbolContext(const lldb_private::FileSpec &file_spec, uint32_t line, bool check_inlines,
                                    uint32_t resolve_scope, lldb_private::SymbolContextList &sc_list)

So that it searches all compile units when "check_inlines" is true. See inlined comments.

Feel free to add default implementations of any SymbolFile functions that you overrode that aren't doing anything. Especially for things you don't plan to implement. SymbolFileSymtab can them rely on these new default implementations.

source/Initialization/SystemInitializerCommon.cpp
97–98

Should we make a SBHostOS::Initialize() and move this code and the windows specific #include statements into that file? Not sure how `#if defined()` up this file is, but it is something to think about.

202–204

Call Host::Terminate() and move ::CoUninitialize(); over into the windows specific Host::Terminate()?

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
83–84

how about reversing this just to be safe:

if (error == llvm::PDB_ErrorCode::Success)
    return CompileUnits | LineTables;
83–85

how about reversing this just to be safe:

if (error == llvm::PDB_ErrorCode::Success)
    return CompileUnits | LineTables;

Do you also need to clear m_session in the else clause?

86

If you reverse logic above, then this becomes:

return 0;
93

Do you need to check m_session?

99–101

Do you want to cache the compile unit count, or is this really cheap to call over and over?

120

It will be interesting to see how you solve this one if there is no language in the compile unit. I would be shame to have to iterate through all functions and check for C++... Or look for classes or other things that are defined in the current compile unit.

147

Is this really a 32 bit value for 32 and 64 bit binaries?

163–237

Feel free to add default implementations to the base SymbolFile class if you don't plan to add your own versions of any functions. We have this same kind of code over in SymbolFileSymtab and I would love to get rid of 100 different "return invalid value" functions.

257–258

You need to still look through all line tables if "check_inlines" is true. Someone might ask for "vector" as the FileSpec (no directory) and 12 for the line table. What we do in DWARF is grab all compile units, get the "LineTable*" from each compile unit, and look to see if "file_spec" is in the support files by asking the support files to find the index for "file_spec" and if the file index is not UINT32_MAX, then we iterate through all line table entries and extract the entries that match. See the SymbolFileDWARF::ResolveSymbolContext() for more details.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.h
183

Please rename to "m_session_up" to indicate it is a unique_ptr.

This revision now requires changes to proceed.Feb 18 2016, 5:16 PM
zturner added inline comments.Feb 18 2016, 5:28 PM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
99–101

Most of these methods are really cheap. In fact, the entire process of caching stuff in SymbolContexts and whatnot is probably unnecessary for us in theory because while it may be cheaper performance wise to cache it the way we're doing, it's not *that* expensive to just query the file, and when you consider the memory usage of caching it a second time, it's not clear it's a good tradeoff.

That said, there's not really a way around this with the current architecture, so I only cache where necessary (can always add more later if performance becomes an issue)

120

Yea, my comment was wrong. I can fix this, I just didn't realize it at the time.

257–258

So does check_inlines really mean check_header_files? I kind of ignored the argument because it wasn't obvious to me what it did based on the name.

What is supposed to happen if someone specifies vector as the FileSpec, but vector is #included from 10 different compile units? Will this return line information from all 10 of those compile units?

I'm not using support files for anything right now because I can't figure out if it makes sense or if we need it. The PDB tells us, for each compile unit, the list of all source files that contribute to that compile unit. It sounds like that's what you're using support files for, but in my case I'm comfortable (at least for now) simply querying the PDB every time.

Just to make sure I understand, is it safe to say that:

  1. If check_inlines is false, sc_list should return with exactly 1 SymbolContext with m_comp_unit set to the main source file?
  2. The same for the check_inlines is false calse -- we still just have SymbolContext but we add line table entries from all additional files that contribute to the compilation unit?

When will sc_list end up with multiple entries?

There are problems with the source file IDs you are using as they currently must be indexes into the compile unit's support files. See inline comments for more details.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
147

Again, is this really only a 32 bit value??? This seems wrong.

150

So line table entries get stored as lldb_private::LineTable::Entry values. The "source_id" you use is currently assumed to be an index into your compile unit's support files (CompileUnit::GetSupportFiles()). So you will either need to:

  • make each CompileUnit create the support files array and translate your "source_id" over into an index into the compile units FileSpecList that will get returned from CompileUnit::GetSupportFiles().
  • Change CompileUnit so that it no longer has a "FileSpecList &CompileUnit::GetSupportFiles()" function and make a "const FileSpec *CompileUnit::GetSupportFilebyID (lldb::user_id_t file_id);" and convert all users of "FileSpecList &CompileUnit::GetSupportFiles()" over to the new "const FileSpec *CompileUnit::GetSupportFilebyID (lldb::user_id_t file_id);"

So does a PDB file have a source file array itself that is accessed by a zero or one based index ID? If so it might be possible to keep GetSupportFiles(), else you might have better luck switching us over to "const FileSpec *CompileUnit::GetSupportFilebyID (lldb::user_id_t file_id);" so we allow each SymbolFile to determine how it speaks about files. In DWARF, we would continue to use the file index as the file_id, and PDB would end up using the "getSourceFileId()". Hopefully all source file accesses use this same source file ID??

257–258

So does check_inlines really mean check_header_files? I kind of ignored the argument because it wasn't obvious to me what it did based on the name.

So yes "check_inlines" means to check header files.

What is supposed to happen if someone specifies vector as the FileSpec, but vector is #included from 10 different compile units? Will this return line information from all 10 of those compile units?

Yes, and there might be multiple vector line 10 entries in each compile unit, so you may end up with many.

I'm not using support files for anything right now because I can't figure out if it makes sense or if we need it. The PDB tells us, for each compile unit, the list of all source files that contribute to that compile unit. It sounds like that's what you're using support files for, but in my case I'm comfortable (at least for now) simply querying the PDB every time.

That is fine as long as it is quick and also supports looking up things by basename. Does PDB support that? Most of the time you won't be getting a full path since users rarely type a full path.

You are specifying a source_id for your line entries when you create your line tables, and you will need to have them be valid indexes into your support files otherwise, your LineTable entries won't ever be able to lookup the file... See above comment in your line table code...

Just to make sure I understand, is it safe to say that:

If check_inlines is false, sc_list should return with exactly 1 SymbolContext with m_comp_unit set to the main source file?

You would get one SymbolContext per compile unit whose path matches the file_spec _and_ contains a line number. There might be multiple "Foo.cpp" files in different directories withing one PDB file, so you might still end up finding N matches. So your existing code can be used when check_inlines if false as long as the PDB can search by fullname and by basename. I am guessing it doesn't though, am I wrong?

The same for the check_inlines is false calse -- we still just have SymbolContext but we add line table entries from all additional files that contribute to the compilation unit?
When will sc_list end up with multiple entries?

You always look through all compile units. If check_inlines is false, then you make sure "file_spec" matches (by basename if only basename is specified, or by full path if a directory and filename are valid inside file_spec. If the file matches, then look for any lines that match and return ALL instances of them. Auto inlining of a function inside a compile unit might cause there to be many entries for line 10.

If check_inlines is true, then you must find all inlined entries that match. This means the compile unit itself, or any other line entries whose file matches must be considered. This is the default scenario. With each matching file, you must find all of the matches. Such would be the case for "vector" line 10 since you might have 20 std::vector::append() instantiations in the current compile unit and you would want to stop at each of these instances if you said "b vector:10".

If someone specifies "vector" and line 10, then yes, it should return line info for all matches for "vector" line 10 from all compile units. This is the default thing the user wants so we need to make sure looking up inline entries works well and is performant. If you have a way to query the PDB for all matches by file and line that can also do inline files, feel free to use that and then ask each line entry for its file address, and then look up the file address in your LineTable entries using the:

SymbolFilePDB::ResolveSymbolContext(const lldb_private::Address &so_addr, uint32_t resolve_scope,
                                    lldb_private::SymbolContext &sc)

So it depends on how well PDB supports lookups by file and line. And also be sure to handle the case where file_spec has only a basename.

I didn't respond to all your comments because I don't have the code in front of me. I'll take a look at the rest of the comments tomorrow.

source/Initialization/SystemInitializerCommon.cpp
202–204

Yea this seems like a reasonable way to go. I'll do this.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
147

Will have to check. I think you're right though that it's a 64-bit value. relative virtual address is 32-bit though, that's probably what confused me.

150

The file itself is a black box. But the SDK the operating system provides to query the PDB does have its own notion of ids that it hands out. It's unfortunate that I can't use this. Your second solution of adding a CompileUnit::GetSupportFileByID might work, although it's kind of invasive change. I'll see how much work is involved tomorrow.

257–258

Just to make sure I understand what "support files" are.

If I have this file:

// foo.cpp
#include "foo.h"
#include <vector>
int main(int argc, char **argv) {
    return 0;
}

and I run clang.exe foo.cpp -o foo.exe, and someone calls ResolveSymbolContext("foo.cpp", ...), then in this scenario:

  1. The compilation unit is foo.cpp
  2. The support files are foo.h, vector, and any other headers indirectly included by those headers.
  3. If check_inlines is false, the SymbolContextList should only have one entry for foo.cpp, and all the line table should contain all entries from foo.cpp that match the line number.
  4. If check_inlines is true, the SymbolContextList should still only have one entry for foo.cpp, but the line table should contain all entries from foo.cpp that match *and* all entries from all support files (as defined in #2) that match.

Also, the code as written should already handle the case where only basename is specified (the SDK handles this case transparently)

(BTW, it would be nice if there were a way to write unit tests for all the things you're describing. All the different ways of calling it with different combinations of arguments, and verifying the output is as expected. It's a lot to wrap your head around trying to turn words into code, but if you have a test that just fails, then it's easy to look at it and see what it expects).

General things to know about SymbolFiles:

  • We use SymbolContext objects to refer to a specific symbol context: module, compile unit, function, deepest block, line entry and symbol.
  • A load address should produce 1 symbol context. Never more.
  • A file address can produce many symbol contexts. Think about looking up address 0x1000. On MacOSX all shared libraries have their file virtual addresses such that they start at 0x0, so many functions shortly after zero, so address 0x1000 can produce many matches since file addresses aren't unique
  • file + line can match many symbol contexts. Think about what the user would expect if they typed: (lldb) b vector:123
  • SymbolFile objects produce IDs for compile units, functions, variables, types, and many other things by creating these objects with a "lldb::user_id_t" that makes sense for the SymbolFile itself such that if they are given this ID back and asked to parse further information, they should quickly be able to use that number to get back to the debug info and parse more. For DWARF we use the DIE offset. The only thing that isn't correctly abstracted right now is the file IDs, they are currently expected to be indexes into the support files. This works well for DWARF, but probably not for other formats. We can change this if needed as I mentioned before.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
99–101

I was just talking about caching the number of compile units. You will be asked to create each individual lldb_private::CompileUnit by index with:

lldb::CompUnitSP
SymbolFilePDB::ParseCompileUnitAtIndex(uint32_t index)

And you will be asked only once for a compile unit by index since the symbol vendor will cache those for you.

258

Just to make sure I understand what "support files" are.

If I have this file:

// foo.cpp
#include "foo.h"
#include <vector>
int main(int argc, char **argv) {

return 0;

}
and I run clang.exe foo.cpp -o foo.exe, and someone calls ResolveSymbolContext("foo.cpp", ...), then in this scenario:

The compilation unit is foo.cpp
The support files are foo.h, vector, and any other headers indirectly included by those headers.

The support files for DWARF will be "foo.cpp" as the first file (this is just the way DWARF does it) and foo.h vector and any other header files that were used will also be in there. So your support files will need to include your compile unit source file because you will create some line entries that point to your main source file.

If check_inlines is false, the SymbolContextList should only have one entry for foo.cpp, and all the line table should contain all entries from foo.cpp that match the line number.

SymbolContextList would have N entries inside of it where the SymbolContextList.comp_unit is always a compile unit whose name matches the "file_spec". The LineEntry in each symbol context will also have a file index that matches the SupportFiles index for the "foo.cpp" source file.

If check_inlines is true, the SymbolContextList should still only have one entry for foo.cpp, but the line table should contain all entries from foo.cpp that match *and* all entries from all support files (as defined in #2) that match.

No there will be one SymbolContext for each line entry that matches. All of the SymbolContext objects will have LineEntry whose file index will produce a file that is "foo.cpp" when the file is extracted by index from the compile units support files.

So a quick example in main.cpp:

 1	#include <vector>
 2	
 3	int main()
 4	{
 5	    std::vector<int> ints;
 6	    ints.push_back(123);
 7	    ints.push_back(234);
 8	}
 9	
10	int foo()
11	{
12	    std::vector<long> longs;
13	    longs.push_back(123);
14	    longs.push_back(234);
15	}

If vector<T>::push_back is on vector:234, and the user asks to set a breakpoint:

(lldb) b vector:234

Lets says our support files look like:

support_files[0] = main.cpp
support_files[1] = /usr/include/.../vector

You would have 4 symbol contexts in your SymbolContextList:

SymbolContext[0]:
   module = a.out
   compile_unit = main.cpp
   function = main
   block = std::vector<int>::push_back() with call_file = 0 and call_line = 6 (main.cpp:6)
   line_entry = file_idx = 1, line = 234, address = 0x1000

SymbolContext[1]:
   module = a.out
   compile_unit = main.cpp
   function = main
   block = std::vector<int>::push_back() with call_file = 0 and call_line = 7 (main.cpp:7)
   line_entry = file_idx = 1, line = 234, address = 0x1010

SymbolContext[2]:
   module = a.out
   compile_unit = main.cpp
   function = foo
   block = std::vector<long>::push_back() with call_file = 0 and call_line = 13 (main.cpp:13)
   line_entry = file_idx = 1, line = 234, address = 0x2000

SymbolContext[3]:
   module = a.out
   compile_unit = main.cpp
   function = foo
   block = std::vector<long>::push_back() with call_file = 0 and call_line = 14 (main.cpp:14)
   line_entry = file_idx = 1, line = 234, address = 0x2010

Note the compile unit always is the compile unit that the code exists in. The block is a lldb_private::Block that has inlined function info that tells us about std::vector inlined instantiation and also tells us the inline call stack. In LLDB inlined functions are just lexical blocks that have inline function info (inlined function name + call file + call line). There could be many inlined blocks above your current block as well..

Also, the code as written should already handle the case where only basename is specified (the SDK handles this case transparently)

(BTW, it would be nice if there were a way to write unit tests for all the things you're describing. All the different ways of calling it with different combinations of arguments, and verifying the output is as expected. It's a lot to wrap your head around trying to turn words into code, but if you have a test that just fails, then it's easy to look at it and see what it expects).

We should be covering this using API tests that make an example like the one above and setting breakpoints in the inline header file. We have many tests for this. It is hard to get compilers to generate things that you can put in unit tests since they often change.

zturner marked 6 inline comments as done.Feb 19 2016, 1:15 PM
zturner added inline comments.
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
83–84

I wrote it this way because then I only have to construct a non-zero value by ORing enums together once. So I don't have to worry about keeping the different return values in sync with each other. Not a big deal either awy, but that was my thinking.

83–85

Shouldn't need to clear m_session. If it returns an error, m_session will not have been initialized by the API.

93

Shouldn't need to. m_session will be valid if and only if the error code returned in CalculateAbilities was non-zero, and CalculateAbilities will return non-zero if and only if that call succeeded as well. So we should never get here unless CalculateAbilities returned non-zero, which implies the session is valid.

147

I checked, it is indeed a 64-bit value. Thanks for catching this.

150

In CompileUnit.h, it says this:

/// A representation of a compilation unit, or compiled source file.
/// The UserID of the compile unit is specified by the SymbolFile
/// plug-in and can have any value as long as the value is unique
/// within the Module that owns this compile units.

Are you sure it's an index into the support files? It sounds like maybe the DWARF plugin treats it as such, but it's not assumed to be such by any generic code.

zturner updated this revision to Diff 48544.Feb 19 2016, 1:16 PM
zturner edited edge metadata.

This should address all (I think) issues except for the one about check_inlines, which I'll do in a followup. Can you respond to my comment in a previous update about the uniqueness / index-ness of the comp_unit id?

zturner added inline comments.Feb 19 2016, 1:58 PM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
150

Ahh i see the problem. The problem is not the value Im' specifying for the id of the compile unit, but the value I'm specifying in the argument to the constructor of LineEntry, which is clearly documented as an index.

Using the CompileUnit::GetSupportFileByID seems difficult based on a cursory glance over how it's used, so for now I will just do some extra work to add support files and map ids to indices.

Later we can think about whether it's worth it to do that refactor to get rid of the assumption that they're indices.

clayborg requested changes to this revision.Feb 19 2016, 2:02 PM
clayborg edited edge metadata.

So your LineTable::Entry objects that you create are still invalid since you didn't implement the support files.

All comments became unaligned but you did comment that:

In CompileUnit.h, it says this:

// A representation of a compilation unit, or compiled source file.
// The UserID of the compile unit is specified by the SymbolFile
// plug-in and can have any value as long as the value is unique
// within the Module that owns this compile units.

Are you sure it's an index into the support files? It sounds like maybe the DWARF plugin treats it as such, but it's not assumed to be such by any generic code.

I wasn't talking about the CompileUnit's UserID, I was talking about file indexes that are in the line tables that you create. The CompileUnit's UserID is what ever you want it to be and this is correct and will work for any debug info. The line tables use file indexes for files in the LineTable::Entry structures and they rely on the file index being a valid index into the CompilerUnit::GetSupportFiles() FileSpecList.

source/Host/windows/HostInfoWindows.cpp
37

Should be:

::CoUninitialize();
This revision now requires changes to proceed.Feb 19 2016, 2:02 PM
zturner updated this revision to Diff 49442.Feb 29 2016, 5:56 PM
zturner edited edge metadata.

Note this should compile now on every platform. Look over the unit tests to see my assumptions about how the API should work. Hopefully this makes reviewing things easier (it definitely made getting things to work easier)

clayborg requested changes to this revision.Mar 1 2016, 10:10 AM
clayborg edited edge metadata.

One general comment is the use of "auto". Although it makes the code shorter, it does make it quite a bit less readable. I will leave the decision to you since this is your code, but in general I think this is where auto is less than it is cracked up to be.

source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
283–286

So looking at SymbolFileDWARF::ResolveSymbolContext() I see it is very close to being SymbolFile agnostic... We would make SymbolFileDWARF::ResolveSymbolContext() into SymbolFile::ResolveSymbolContext() and clean it up to just use virtual SymbolFile calls. Then all SymbolFile plug-ins wouldn't need to implement this function. The basic flow of the function in DWARF is to iterate through all compile units. If check_inlines is true or the compile unit matches, grab the support files via lldb_private::CompileUnit::GetSupportFiles() and see if "file_spec" is in that support files list and find the one and only index for that file. If the index is valid, then get the LineTable from the compile unit via lldb_private::CompileUnit::GetLineTable(). Then find all matching entries. So with a quick refactor, all we need new SymbolFile instances to implement is GetSupportFiles() (which calls SymbolFile::ParseCompileUnitSupportFiles()) and CompileUnit::GetLineTable() (which calls into SymbolFile::ParseCompileUnitLineTable()). What do you think? This also helps others implementing new SymbolFile classes to not have to worry about the check_inlines thing.

292–293

So if file_spec is "vector", this function will return all compile units that have line table entries that match "vector"? It doesn't seem like this is correct. If "check_inlines" is true, I would expect that you need to traverse all compilands?

429–449

"auto" really makes it hard to read this code to figure out what each variable actually is from someone that doesn't know the code. I will leave it up to you to do what you will with this, but this is where auto falls down for me.

This revision now requires changes to proceed.Mar 1 2016, 10:10 AM

Regarding the refactoring of ResolveSymbolContext to a lower level. It
seems like a worthwhile refactor but probably one that should be done as an
independent CL. It seems like it has potential to open up a bit of a rats
nest so to speak, and if something ends up breaking as a result, we can
revert just the targeted refactor instead of the entirety of PDB support.

zturner added inline comments.Mar 2 2016, 10:44 AM
source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp
292–293

I was under the impression that this is exactly what it *should* do. If someone says they want to set a breakpoint in line 100 of <vector>, then this would need to find all compilands that have line contributions from <vector>, and put a breakpoint anywhere that line 100 of <vector> contributes to the source.

So basically if file_spec is <vector> here, then this would return an iterator for all source files that have #include <vector>. If you had this code:

// foo.cpp
#include <vector>

std::vector<int> g_vec;
// bar.cpp
#include "bar.h"
std::vector<int> g_vec;
// bar.h
#include <vector>
// baz.cpp
// empty file

So in the above example, that line would return an iterator with 2 compilands, one for foo.cpp and one for bar.cpp. Because <vector> contributes to both of those source files.

If check_inlines is false, we then skip the file unless it is actually <vector> itself (which of course it never would be, because you would never "compile" <vector>). And if it's true, we create a SymbolContext for it.

So in short, we don't need to traverse *all* compilands, only those that have line contributions from <vector>.

My guess is that DWARF doesn't make this easy to figure out, so you have to traverse all the lines and figure this out manually. But in PDB it's easy to get, we just say "give me all compilands that have some lines from <vector>, and it magically narrows down the resulting compilands to the exact right set.

clayborg accepted this revision.Mar 2 2016, 10:49 AM
clayborg edited edge metadata.

You are right. I am slowly seeing that PDB was designed well to be consumed as a user would want to, not compressed to the point of obfuscation and very difficult to extract data from like DWARF.

So then we really should leave things the way they are and let "SymbolFilePDB::ResolveSymbolContext(const lldb_private::FileSpec &, uint32_t, bool, uint32_t, lldb_private::SymbolContextList &)" do the search because it can do so more efficiently than the manual search we do with DWARF.

This revision is now accepted and ready to land.Mar 2 2016, 10:49 AM

You are right. I am slowly seeing that PDB was designed well to be consumed as a user would want to, not compressed to the point of obfuscation and very difficult to extract data from like DWARF.

So then we really should leave things the way they are and let "SymbolFilePDB::ResolveSymbolContext(const lldb_private::FileSpec &, uint32_t, bool, uint32_t, lldb_private::SymbolContextList &)" do the search because it can do so more efficiently than the manual search we do with DWARF.

Yea, in fact I suspect that when we start having large modules with thousands of lines of pre-processed source things will perform really poorly with PDB under the current design. Because it already exposes an API to return stuff in a manner convenient for being consumed, but then we have to convert it to LLDB's format for being consumed. So the extra overhead done by the underlying API will have been wasted.

I still need to fix the issue regarding termination entries before I can submit, but that sounds like the last issue. I'll try to get that worked out and then submit. Thanks!

I looked into this a little bit, it's going to be a couple days of work for me to get termination entries set correctly. Can I go in as-is? I know you already Accepted the revision, just want to double check.

I need to go make some changes to llvm-pdbdump on the LLVM side (it's just a tool we use for exploratory purposes like llvm-dwarfdump) to learn a little bit more about how the API behaves in order to confidently set the termination entry.

So I was thinking I can just make a followup patch to fix that issue.

fwiw, here's my current output showing that things behave correctly for the test case I've created.

(lldb) target create D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe
Current executable set to 'D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe' (i686).
(lldb) target modules dump line-table test-pdb.cpp
Line table for d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp in `test-pdb.exe
0x00401040: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:7
0x00401043: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:8
0x00401045: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:9
0x00401050: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:13
0x00401054: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:14
0x00401070: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:15
0x00401080: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:5
0x00401083: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:6
0x00401089: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:7
0x00401090: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:9
0x00401093: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:10
0x004010a2: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:11
clayborg requested changes to this revision.Mar 2 2016, 1:01 PM
clayborg edited edge metadata.

As long as you don't mind your last line entry potentially being really large because it isn't terminated?

(lldb) target create D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe
Current executable set to 'D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe' (i686).
(lldb) target modules dump line-table test-pdb.cpp
Line table for d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp in `test-pdb.exe
0x00401040: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:7
0x00401043: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:8
0x00401045: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:9
0x00401050: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:13
0x00401054: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:14
0x00401070: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:15
0x00401080: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:5
0x00401083: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:6
0x00401089: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:7
0x00401090: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:9
0x00401093: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:10
0x004010a2: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:11

How large is the last line entry going to be? It isn't capped at all. You at least need a termination entry for the last entry before this can go in. You probably just need to find the function for file address 0x004010a2 and get the end address of the function and use that to terminate.

This revision now requires changes to proceed.Mar 2 2016, 1:01 PM

That should be easy enough. If we make the simplifying assumption "all functions are represented contiguously in memory with no gaps", then I can just create a termination entry at start_of_function + num_bytes_in_function.

Obviously this is a wrong assumption in the general case, but it's at least true for trivial throwaway functions compiled in debug mode, which is enough to at least say "simple cases work"

Is this what I should be seeing?

(lldb) target create D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe
Current executable set to 'D:\src\llvm\tools\lldb\unittests\SymbolFile\PDB\Inputs\test-pdb.exe' (i686).
(lldb) target modules dump line-table test-pdb.cpp
Line table for d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp in `test-pdb.exe
0x00401040: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:7
0x00401043: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:8
0x00401045: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:9
0x00401047: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:9

0x00401050: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:13
0x00401054: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:14
0x00401070: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:15
0x00401073: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.cpp:15

0x00401080: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:5
0x00401083: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:6
0x00401089: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:7
0x0040108b: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb-nested.h:7

0x00401090: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:9
0x00401093: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:10
0x004010a2: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:11
0x004010a4: d:\src\llvm\tools\lldb\unittests\symbolfile\pdb\inputs\test-pdb.h:11

There is an additional line entry now which corresponds to the termination entry. The penultimate entry (in this example anyway) is always the first byte of the epilogue, and the last entry is the first byte of non-function data.

Each group of lines separated by an empty line represents one function in the named file. Does this seem right?

clayborg accepted this revision.Mar 2 2016, 1:45 PM
clayborg edited edge metadata.

I guess you already tackled the termination entry! That is exactly what I wanted to see... Sounds like line tables are done!

This revision is now accepted and ready to land.Mar 2 2016, 1:45 PM

The empty line signifies that the previous entry was a termination entry.

zturner closed this revision.Mar 2 2016, 2:17 PM
chying added a subscriber: chying.Mar 2 2016, 5:10 PM

Seems this patch breaks OSX build. I guess the new files were not added to xcode project file.
Could you please take a look?
http://lab.llvm.org:8011/builders/lldb-x86_64-darwin-13.4/builds/8892/steps/ninja%20build%20local/logs/stdio

jingham added a subscriber: jingham.Mar 2 2016, 5:26 PM

Fixed with r262543

Jim

This is already fixed. Update your sources and let us know if things are working.

chying added a comment.Mar 2 2016, 5:55 PM

Yes, it is fixed now. Thank you!