This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lldb/
-
include/lldb/Symbol/
-
lldb/
-
Symbol/
-
LineTable.h
-
source/Symbol/
-
Symbol/
1
LineTable.cpp

Differential D116195

[LLDB] Allows overwriting the existing line entry at given file address when inserting
AbandonedPublic

Authored by zequanwu on Dec 22 2021, 4:27 PM.

Download Raw Diff

Details

Reviewers

labath
clayborg

Summary

It seems like that line entries in line table are assumed to always have
distinct file address. This allows overwriting the existing line entry if the
given file address is same as existing one.

The reason for the patch is that line table in PDB doesn't have line entries for
inlined functions. We can only get line info for inlined functions via parsing
inlined call site debug info (S_INLINESITE). This allows us to insert line
entries when we parsing S_INLINESITE, which could have line with the same
starting file address. For example:

$ llvm-pdbutil dump --symbols -l --modi=0 a.pdb 
                           Lines                            
============================================================
  0001:00000000-00000031, line/addr entries = 3
    19 00000000 !   20 00000004 !   21 00000024 ! 
                          Symbols                           
============================================================
...
    312 | S_INLINESITE [size = 28]
           inlinee = 0x17E0 (foo), parent = 88, end = 432
             0B44      code 0x4 (+0x4) line 2 (+2)
             0B26      code 0xA (+0x6) line 3 (+1)
             0B4A      code 0x14 (+0xA) line 5 (+2)
             0410      code end 0x24 (+0x10)
             0B37      code 0x2B (+0x7) line 4 (-1)
             0406      code end 0x31 (+0x6)
...

From line table, we only know 0x4 - 0x23 maps to line 20. When parsing
S_INLINESITE, we would know 0x4 - 0x9 maps to function foo declaration line + 2.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

zequanwu requested review of this revision.Dec 22 2021, 4:27 PM

zequanwu created this revision.

Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2021, 4:27 PM

Herald added a subscriber: lldb-commits. · View Herald Transcript

Harbormaster completed remote builds in B140454: Diff 395944.Dec 22 2021, 4:32 PM

Herald added a subscriber: JDevlieghere. · View Herald TranscriptDec 22 2021, 4:32 PM

zequanwu edited the summary of this revision. (Show Details)Dec 22 2021, 4:36 PM

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

lldb/source/Symbol/LineTable.cpp
56–57	`*pos = entry` ?

In D116195#3207672, @labath wrote:

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

Maybe it's better to use a set to store the line entries, ordering just by the file address so that insertion is cheaper? Currently, it compares other fields if two lines have same file address (https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/LineTable.cpp#L150). Is it necessary? I think line entries in line table always have distinct file address.

In D116195#3208476, @zequanwu wrote:

In D116195#3207672, @labath wrote:

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

We can also try to come up with a completely different way to represent line tables in lldb. The currently implementation is clearly dwarf-centric, and probably not best suited for the pdb use case. But (due to the breakpoint use case) I don't think we can avoid having a mode which does a complete parse of whatever we consider to be a line table. The only thing we can do is to avoid parsing everything for the case where that is not necessary.

Maybe it's better to use a set to store the line entries, ordering just by the file address so that insertion is cheaper? Currently, it compares other fields if two lines have same file address (https://github.com/llvm/llvm-project/blob/main/lldb/source/Symbol/LineTable.cpp#L150). Is it necessary? I think line entries in line table always have distinct file address.

Given the above, I think that a sorted vector is an optimal data structure for the current state of the world. It's possible we don't need to sort by the other fields, but I don't think that's the biggest issue right now.

This revision now requires changes to proceed.Dec 28 2021, 7:04 AM

In D116195#3211346, @labath wrote:

In D116195#3208476, @zequanwu wrote:

In D116195#3207672, @labath wrote:

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

I think the inline info should be reflected in the line table. Otherwise image lookup -a ... cannot correctly show the file and line info for inlined functions. My plan is to keep a sorted (by address) set of line entries before adding into line sequence. Every time we add a new line entry into the set while parsing inline info, replace the existing entry with new entry if they have same address.

In D116195#3211906, @zequanwu wrote:

In D116195#3211346, @labath wrote:

In D116195#3208476, @zequanwu wrote:

In D116195#3207672, @labath wrote:

How are you planning to make use of this functionality?

I'm asking because I'm wondering if it wouldn't be better to do this kind of processing in the PDB code, and then hand this class a finished list of line entries. Inserting entries into the middle of a vector is expensive, which is why our dwarf code no longer uses this function (it uses the vector<LineSequence> constructor instead). If we could get pdb to do something similar, then we could get rid of this function altogether.

My plan was to insert entries when parsing inlined call site(S_INLINESITE ) in ParseBlocksRecursive, which usually happens after ParseLineTable. In PDB, line entries in inlined call site often have same file addresses as line entries in line table, and the former is better since it describes lines inside inlined function rather than lines in caller. And we could have nested inlined call sites. The line entries from inner call sites would always overwrite the line entries from the outer call sites if they have the same file address.

So, I'm afraid that won't work. You shouldn't be modifying the line tables after ParseLineTable returns. Lldb (currently) considers the line tables to be either fully parsed or not parsed at all. There's no way to "partially" parse a line table. Attempting to do so will result in various strange effects, including file+line breakpoints resolving differently depending on which blocks have been parsed.

I think you'll need to decide whether you want the inline info to be reflected in the line table or not. If yes, then you'll need to parse it from within ParseLineTable, at which point, you can probably do the deduplication outside of the LineTable class -- we can talk about how to make that easier.

I think the inline info should be reflected in the line table. Otherwise image lookup -a ... cannot correctly show the file and line info for inlined functions. My plan is to keep a sorted (by address) set of line entries before adding into line sequence. Every time we add a new line entry into the set while parsing inline info, replace the existing entry with new entry if they have same address.

Line tables in LLDB are currently encoded such that they only need the deepest line table entry for a given address. Line tables in DWARF encode only these line table entries. The inlined call stack is decoded by looking at the lldb_private::Block objects and looking at their inline information using:

const InlineFunctionInfo *Block::GetInlinedFunctionInfo() const;

The inline function info has a "Declaration" object that describes the call site which is accessed via:

Declaration &InlineFunctionInfo::GetCallSite();

A declarations a file + line + column. So maybe the best thing to do would be to make sure that we only emit the deepest line table entries and possibly encode the call site info into the Block objects?

zequanwu abandoned this revision.Jan 11 2022, 4:15 PM

Revision Contents

Path

Size

lldb/

include/

lldb/

Symbol/

LineTable.h

3 lines

source/

Symbol/

LineTable.cpp

12 lines

Diff 395944

lldb/include/lldb/Symbol/LineTable.h

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	public:
/// \see Address::DumpStyle		/// \see Address::DumpStyle
// void		// void
// AddLineEntry (const LineEntry& line_entry);		// AddLineEntry (const LineEntry& line_entry);

// Called when you can't guarantee the addresses are in increasing order		// Called when you can't guarantee the addresses are in increasing order
void InsertLineEntry(lldb::addr_t file_addr, uint32_t line, uint16_t column,		void InsertLineEntry(lldb::addr_t file_addr, uint32_t line, uint16_t column,
uint16_t file_idx, bool is_start_of_statement,		uint16_t file_idx, bool is_start_of_statement,
bool is_start_of_basic_block, bool is_prologue_end,		bool is_start_of_basic_block, bool is_prologue_end,
bool is_epilogue_begin, bool is_terminal_entry);		bool is_epilogue_begin, bool is_terminal_entry,
		bool overwrite_existing = false);

// Used to instantiate the LineSequence helper class		// Used to instantiate the LineSequence helper class
static std::unique_ptr<LineSequence> CreateLineSequenceContainer();		static std::unique_ptr<LineSequence> CreateLineSequenceContainer();

// Append an entry to a caller-provided collection that will later be		// Append an entry to a caller-provided collection that will later be
// inserted in this line table.		// inserted in this line table.
static void AppendLineEntryToSequence(LineSequence *sequence, lldb::addr_t file_addr,		static void AppendLineEntryToSequence(LineSequence *sequence, lldb::addr_t file_addr,
uint32_t line, uint16_t column,		uint32_t line, uint16_t column,
▲ Show 20 Lines • Show All 332 Lines • Show Last 20 Lines

lldb/source/Symbol/LineTable.cpp

	Show All 35 Lines
	// Destructor			// Destructor
	LineTable::~LineTable() = default;			LineTable::~LineTable() = default;

	void LineTable::InsertLineEntry(lldb::addr_t file_addr, uint32_t line,			void LineTable::InsertLineEntry(lldb::addr_t file_addr, uint32_t line,
	uint16_t column, uint16_t file_idx,			uint16_t column, uint16_t file_idx,
	bool is_start_of_statement,			bool is_start_of_statement,
	bool is_start_of_basic_block,			bool is_start_of_basic_block,
	bool is_prologue_end, bool is_epilogue_begin,			bool is_prologue_end, bool is_epilogue_begin,
	bool is_terminal_entry) {			bool is_terminal_entry,
				bool overwrite_existing) {
	Entry entry(file_addr, line, column, file_idx, is_start_of_statement,			Entry entry(file_addr, line, column, file_idx, is_start_of_statement,
	is_start_of_basic_block, is_prologue_end, is_epilogue_begin,			is_start_of_basic_block, is_prologue_end, is_epilogue_begin,
	is_terminal_entry);			is_terminal_entry);

	LineTable::Entry::LessThanBinaryPredicate less_than_bp(this);			LineTable::Entry::LessThanBinaryPredicate less_than_bp(this);
	entry_collection::iterator pos =			entry_collection::iterator pos =
	llvm::upper_bound(m_entries, entry, less_than_bp);			llvm::upper_bound(m_entries, entry, less_than_bp);
				if (overwrite_existing && pos != m_entries.begin()) {
				--pos;
				if (pos->file_addr == file_addr) {
				uint32_t idx = std::distance(m_entries.begin(), pos);
				m_entries[idx] = entry;
				labathUnsubmitted Not Done Reply Inline Actions `pos = entry` ? labath:* `*pos = entry` ?
				return;
				}
				}
	// Stream s(stdout);			// Stream s(stdout);
	// s << "\n\nBefore:\n";			// s << "\n\nBefore:\n";
	// Dump (&s, Address::DumpStyleFileAddress);			// Dump (&s, Address::DumpStyleFileAddress);
	m_entries.insert(pos, entry);			m_entries.insert(pos, entry);
	// s << "After:\n";			// s << "After:\n";
	// Dump (&s, Address::DumpStyleFileAddress);			// Dump (&s, Address::DumpStyleFileAddress);
	}			}

	▲ Show 20 Lines • Show All 438 Lines • Show Last 20 Lines