This is an archive of the discontinued LLVM Phabricator instance.

[LLDB] Allows overwriting the existing line entry at given file address when inserting
AbandonedPublic

Authored by zequanwu on Dec 22 2021, 4:27 PM.

Details

Reviewers
labath
clayborg
Summary

It seems like that line entries in line table are assumed to always have
distinct file address. This allows overwriting the existing line entry if the
given file address is same as existing one.

The reason for the patch is that line table in PDB doesn't have line entries for
inlined functions. We can only get line info for inlined functions via parsing
inlined call site debug info (S_INLINESITE). This allows us to insert line
entries when we parsing S_INLINESITE, which could have line with the same
starting file address. For example:

$ llvm-pdbutil dump --symbols -l --modi=0 a.pdb 
                           Lines                            
============================================================
  0001:00000000-00000031, line/addr entries = 3
    19 00000000 !   20 00000004 !   21 00000024 ! 
                          Symbols                           
============================================================
...
    312 | S_INLINESITE [size = 28]
           inlinee = 0x17E0 (foo), parent = 88, end = 432
             0B44      code 0x4 (+0x4) line 2 (+2)
             0B26      code 0xA (+0x6) line 3 (+1)
             0B4A      code 0x14 (+0xA) line 5 (+2)
             0410      code end 0x24 (+0x10)
             0B37      code 0x2B (+0x7) line 4 (-1)
             0406      code end 0x31 (+0x6)
...

From line table, we only know 0x4 - 0x23 maps to line 20. When parsing
S_INLINESITE, we would know 0x4 - 0x9 maps to function foo declaration line + 2.

Diff Detail

Event Timeline

zequanwu requested review of this revision.Dec 22 2021, 4:27 PM
zequanwu created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2021, 4:27 PM
zequanwu edited the summary of this revision. (Show Details)Dec 22 2021, 4:36 PM

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

lldb/source/Symbol/LineTable.cpp
56–57

*pos = entry ?

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

Maybe it's better to use a set to store the line entries, ordering just by the file address so that insertion is cheaper? Currently, it compares other fields if two lines have same file address (https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/LineTable.cpp#L150). Is it necessary? I think line entries in line table always have distinct file address.

labath requested changes to this revision.Dec 28 2021, 7:04 AM

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

We can also try to come up with a completely different way to represent line tables in lldb. The currently implementation is clearly dwarf-centric, and probably not best suited for the pdb use case. But (due to the breakpoint use case) I don't think we can avoid having a mode which does a complete parse of whatever we consider to be a line table. The only thing we can do is to avoid parsing everything for the case where that is not necessary.

Maybe it's better to use a set to store the line entries, ordering just by the file address so that insertion is cheaper? Currently, it compares other fields if two lines have same file address (https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/LineTable.cpp#L150). Is it necessary? I think line entries in line table always have distinct file address.

Given the above, I think that a sorted vector is an optimal data structure for the current state of the world. It's possible we don't need to sort by the other fields, but I don't think that's the biggest issue right now.

This revision now requires changes to proceed.Dec 28 2021, 7:04 AM
zequanwu added a comment.EditedDec 28 2021, 5:52 PM

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

I think the inline info should be reflected in the line table. Otherwise image lookup -a ... cannot correctly show the file and line info for inlined functions. My plan is to keep a sorted (by address) set of line entries before adding into line sequence. Every time we add a new line entry into the set while parsing inline info, replace the existing entry with new entry if they have same address.

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

I think the inline info should be reflected in the line table. Otherwise image lookup -a ... cannot correctly show the file and line info for inlined functions. My plan is to keep a sorted (by address) set of line entries before adding into line sequence. Every time we add a new line entry into the set while parsing inline info, replace the existing entry with new entry if they have same address.

Line tables in LLDB are currently encoded such that they only need the deepest line table entry for a given address. Line tables in DWARF encode only these line table entries. The inlined call stack is decoded by looking at the lldb_private::Block objects and looking at their inline information using:

const InlineFunctionInfo *Block::GetInlinedFunctionInfo() const;

The inline function info has a "Declaration" object that describes the call site which is accessed via:

Declaration &InlineFunctionInfo::GetCallSite();

A declarations a file + line + column. So maybe the best thing to do would be to make sure that we only emit the deepest line table entries and possibly encode the call site info into the Block objects?

zequanwu abandoned this revision.Jan 11 2022, 4:15 PM