This is an archive of the discontinued LLVM Phabricator instance.

SymbolFileBreakpad: Add line table support
ClosedPublic

Authored by labath on Jan 11 2019, 5:08 AM.

Details

Summary

This patch teaches SymbolFileBreakpad to parse the line information in
breakpad files and present it to lldb.

The trickiest question here was what kind of "compile units" to present
to lldb, as there really isn't enough information in breakpad files to
correctly reconstruct those.

A couple of options were considered

  • have the entire file be one compile unit
  • have one compile unit for each FILE record
  • have one compile unit for each FUNC record

The main drawback of the first approach is that all of the files would
be considered "headers" by lldb, and so they wouldn't be searched if
target.inline-breakpoint-strategy=never. The single compile unit would
also be huge, and there isn't a good way to name it.

The second approach will create mostly correct compile units for cpp
files, but it will still be wrong for headers. However, the biggest
drawback here seemed to be the fact that this can cause a compile unit
to change mid-function (for example when a function from another file is
inlined or another file is #included into a function). While I don't
know of any specific thing that would break in this case, it does sound
like a thing that we should avoid.

In the end, we chose the third option, as it didn't seem to have any
major disadvantages, though it was not ideal either. One disadvantage
here is that this generates a large number of compile units, and there
is still a question on how to name it. We chose to simply name it after
the first line record in that function. This should be correct 99.99% of
the time, though it can produce somewhat strange results if the very
first line record comes from an #included file.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Jan 11 2019, 5:08 AM
labath marked an inline comment as done.Jan 11 2019, 5:13 AM
labath added inline comments.
source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp
150–151 ↗(On Diff #181252)

Note that here i set file=column=line=0 for the terminal entry, which isn't consistent with the dwarf plugin for instance (it puts there whatever falls out of the state automaton, which most likely means the values from the previous entry). AFAICT, this shouldn't be a problem, because the terminal entry is there to just determine the range of the last real entry.

So LLDB treats compile units special depending on the inline strategy. Because of this, I outlined some examples of how and why we should create a compile unit per "FUNC" token. Let me know if anything above was unclear

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp
119 ↗(On Diff #181252)

I would vote to make 1 compile unit for each "FUNC". The hard part will be to select the right source file for each function. I would just start by selecting the first line entry for the function. Compile units are expected to give a list of support files, and in this case, you would make a set of files that are in the line entries only. So if you had:

MODULE Linux x86_64 761550E08086333960A9074A9CE2895C0 a.out
INFO CODE_ID E05015768680393360A9074A9CE2895C
FILE 0 /tmp/a.c
FILE 1 /tmp/b.c
FILE 2 /usr/include/foo.h
FUNC b0 10 0 func1
b0 1 1 0
b1 1 2 0
b2 1 2 1
b4 1 3 1
FUNC b0 10 0 func2
b0 1 1 1
b1 1 2 1
b2 1 2 2

We would have a compile unit for "func1" with a cu for "/tmp/a.c" and with support files:
support_file[0] = "/tmp/a.c"
support_file[1] = "/usr/include/foo.h"

We would have a compile unit for "func2" with a cu for "/tmp/b.c" and with support files:

support_file[0] = "/tmp/b.c"
support_file[1] = "/usr/include/foo.h"

The main reason for make individual compile units, is LLDB treats a compile unit specially when settings breakpoints depending on if we ask for inline functions to be set. If we set the following setting:

(lldb) settings set target.inline-breakpoint-strategy never

Then we will check the name of the compile unit to ensure it matches. So if we did:

(lldb) b a.c:12

This would only work if the actual lldb_private::CompileUnit has a FileSpec that matches "a.c".

122 ↗(On Diff #181252)

return the count of the number of "FUNC" objects

123–125 ↗(On Diff #181252)

parse a compile unit per function. We might want to cache all "FILE" entries in a list inside the SymbolFileBreakpad so we can easily pull out the FileSpecs when creating each compile unit.

Also, each compile unit's ID can be the line number in the breakpad file to the "FUNC" entry. This allows easy access to each "FUNC" entry in the breakpad file when we are asked to parse more information about it (get compile unit support files, or any parsing of info for a compile unit.

158–179 ↗(On Diff #181252)

It would be nice to put this parsing code into the lldb_private::breakpad::Line" class I talked about in the other patch?

It would be great if this code looked like:

lldb_private::breakpad::Line bp_line(line);
switch (bp_line.GetToken()) {
  case Token::Func:
    // Create the compile unit and store into cu_sp
    cu_sp = bp_line.CreateCompileUnit();
    break;
  case Token::Line: {
    addr_t address;
    size_t size;
    uint32_t line_num, file_num;
    if (bp_line.ParseLineEntry(address, size, line_num, file_num)) {
      // Discontiguous entries. Finish off the previous sequence and reset.
      if (next_addr && *next_addr != address)
        finish_sequence();
      line_table_up->AppendLineEntryToSequence(
          line_seq_up.get(), address, line_num, 0, file_num, true, false, false, false,  false);
    }
199 ↗(On Diff #181252)

This function will need to populate support_files for a given FUNC as mentioned above

234 ↗(On Diff #181252)

We need to search each compile unit to see which compile unit contains the address now.

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.h
99 ↗(On Diff #181252)

Remove whitespace only change

labath planned changes to this revision.Jan 17 2019, 5:08 AM

I think I understand what you mean. I'll try to refactor this to create a compile unit for each function.

labath updated this revision to Diff 184103.Jan 29 2019, 9:18 AM

It took me a while to get back to this, but I believe this version now does what
we want. It creates a compile unit for each function. The compile units are
created as soon as the symbol file is initialized (it's needed to get the number
of compile units). The global list of support files is also parsed at this time
(needed to get the compile unit name). The list of support files and line tables
is created lazily for those compile units that need them.

A new element here is the "bookmark" class, which allows us to efficiently go
back the to place where the line tables for a given function/compile unit are
defined and parse those.

I added a couple of new tests to exercise handling of odd or broken minidump
files.

Let me know what you think.

labath added a comment.Feb 5 2019, 7:20 AM

Greg, what do you think about the new approach in this patch?

I like the way you did the compile units and the line tables and support file list. It would be nice to change this to do things more lazily. Right now we are parsing all compile unit data into CompUnitData structures and then passing their info along to the lldb_private::CompileUnit when we need to. We can be more lazy, see inlined comments.

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp
279 ↗(On Diff #184103)

Do we need to iterate over the file multiple times here? We do it once here, and then once on line 260.

281–300 ↗(On Diff #184103)

Seems like we should just populate the m_compile_units data with address range to character file offset here? When we are asked to create a compile unit, we do this work by going to the "lldb_private::CompileUnit::GetID()" which will return the file offset and we just head there and start parsing?

314 ↗(On Diff #184103)

This seems like where we would do the heavy parsing code that are in the initialize object function. .Get the character file offset from m_compile_units and just parse what we need here? It will cause less work for us in the initialize object call then and we can be lazier

326 ↗(On Diff #184103)

From the compile unit, if GetID() returns the character file offset to the FUNC or first LINE, then we don't need the preparsed CompUnitData? We can just parse the line table here if and only if we need to

332 ↗(On Diff #184103)

Ditto above

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.h
151–154 ↗(On Diff #184103)

Why do we need more than just a file character offset here?

182–188 ↗(On Diff #184103)

Seems like if we just pick the file character offset of the function or the function's line entries as the lldb_private::CompileUnit user ID (lldb::user_id_t) then we don't really need this class? We just create the compile unit as a lldb_private::CompileUnit and our symbol file parser will fill in the rest? Any reason we need this CompUnitData class?

212 ↗(On Diff #184103)

Could this just be:

using CompUnitMap = RangeDataVector<lldb::addr_t, lldb::addr_t, lldb::offset_t>;

Where offset_t is the character fie offset for the first line of the FUNC entry? Any reason to use CompUnitData instead of just creating lldb_private::CompileUnit objects?

214 ↗(On Diff #184103)

Use FileSpecList?

labath marked 11 inline comments as done.Feb 5 2019, 9:19 AM

Thanks for the review Greg. See my responses inline. I'm going to try incorporating the changes tomorrow.

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp
279 ↗(On Diff #184103)

The two loops iterate over different parts of the file. The first one goes through the FILE records, and this one does the FUNC records. So the iteration here is efficient because we already know where different kinds of records are located in the file.

(of course, to figure out where these records are located, we've had to go through it once already (in ObjectFileBreakpad), so we still have to make two passes over this data in general. However, that is pretty much unavoidable if we want to do lazy (i.e. random access) into the file as it doesn't have any kind of index to start with.)

281–300 ↗(On Diff #184103)

I think I could avoid creating the CompileUnit object here. However, I will still need to do the parsing here, as I will need to figure out the number of compile units first (best I might be able to achieve is to delay this until GetNumCompileUnits() time).

326 ↗(On Diff #184103)

That is pretty much what happens here. CompUnitData construct the line table (almost) lazily. It doesn't preparse. The reason I have this indirection, is that the creation of line tables is coupled with the creation of the support file list:

  • in order to build the line table, I (obviously) need to go through the LINE records
  • however, I also need to go through the LINE records in order to build the CU file list, because I need to know what files are actually used in this "CU"

It seemed like a good idea to me to avoid parsing the LINE records twice. So what I've done is that on the first call to (ReleaseLineTable|ReleaseSupportFiles), CompUnitData will parse both things. Then, the second call will return the already parsed data. That seems like a good tradeoff to me as these two items are generally used together (one is fairly useless without the other).

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.h
151–154 ↗(On Diff #184103)

That's because ObjectFileBreakpad breaks down the file into sections (so all FILE records would go into one section, PUBLIC records into another, etc.). This means that we don't need any "bookmarks" when we want to jump straight to the PUBLIC records for instance, but it does mean we need two coordinates (section, offset) when we want to jump to a specific record within a section.

182–188 ↗(On Diff #184103)

I'd like the keep to enable the CompUnitData for conjugated parsing of line tables and support files. I think I can get rid of the compile unit field inside it, but that would mean relying on the SymbolVendor to conjure up the CompUnitSP when I need it, which is a bit of an odd dependency. (I need to conjure up the pointer from somewhere in the ResolveSymbolContext functions).

214 ↗(On Diff #184103)

FileSpecList doesn't have the resize method. I use that to implement parsing of the FILE records, since the file records theoretically don't have to come in order, so I just resize the vector to fit the largest number I've encountered.

lemo accepted this revision.Feb 5 2019, 11:57 AM

The latest version looks good to me. Please update the description (it still says it uses the one CU per symbols file)

include/lldb/Core/FileSpecList.h
72 ↗(On Diff #184103)

nit: if you have move assignment-operator also add the move constructor

This revision is now accepted and ready to land.Feb 5 2019, 11:57 AM
labath updated this revision to Diff 185533.Feb 6 2019, 5:38 AM
labath marked 2 inline comments as done.

Tried to make parsing as lazy as possible. GetNumSections() will count the
number of FUNC records, but will not create CompileUnit objects. FILE records
will be parsed when we create the first compile unit. The support files and line
tables will be parsed when the first of them is requested for a given CU.

I've removed the CU shared pointer from the CompUnitData structure. Instead, I go through the symbol vendor to fetch the CU SP.

Added the move constructor for FileSpecList.

Please take another look.

labath edited the summary of this revision. (Show Details)Feb 6 2019, 5:59 AM

Just bounds check "index" in parse compile unit and this is good to go

source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp
196 ↗(On Diff #185533)

Validate "index" first? We will crash if someone iterates over too many CUs?

labath updated this revision to Diff 185557.Feb 6 2019, 7:40 AM
labath edited the summary of this revision. (Show Details)

Add a bounds check to the GetCompileUnitAtIndex method

clayborg accepted this revision.Feb 6 2019, 8:00 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 5:42 AM