This is an archive of the discontinued LLVM Phabricator instance.

Differential D104488

Create synthetic symbol names on demand to improve memory consumption and startup times.
Closed, Public

Authored by clayborg on Jun 17 2021, 2:31 PM.

Details

Reviewers

labath
aprantl
JDevlieghere
jingham
wallace

Commits

rGd77ccfdc7218: Create synthetic symbol names on demand to improve memory consumption and…
rGda6384fbb9fb: Add beginning of LLVM's GettingStarted to GitHub readme

Summary

This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.

Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.

This patch fixes the issue by:

Not adding a name to synthetic symbols at creation time; the name is generated dynamically when it is accessed.
Not adding synthetic symbol names to the name indexes, and instead catching this special case at name lookup time. Users won't typically set breakpoints on or look up these synthetic names, but support was added to do the lookup in case it does happen.
Removing the object file basename from the generated names so that the names can be shared in the constant string pool.
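
To illustrate the string pool point, here is a minimal sketch, not LLDB's actual ConstString implementation. `SketchStringPool`, `OldStyleName` and `NewStyleName` are hypothetical names; the old format mirrors the removed `GetNextSyntheticSymbolName()` shown in the diff, and the new format is the prefix followed by a number. Real UserIDs differ per module, so the sharing in practice depends on how much the ID ranges overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical miniature of a constant string pool: each distinct string is
// stored exactly once, no matter how many symbols refer to it.
class SketchStringPool {
public:
  void Intern(const std::string &s) { m_strings.insert(s); }
  std::size_t UniqueCount() const { return m_strings.size(); }

private:
  std::unordered_set<std::string> m_strings;
};

// Old scheme: prefix + running index + "$$" + object file basename.
// The basename makes every name unique to its module.
std::string OldStyleName(uint32_t idx, const std::string &basename) {
  return "___lldb_unnamed_symbol" + std::to_string(idx) + "$$" + basename;
}

// New scheme: prefix + number only, so equal numbers in different modules
// map to the same pooled string.
std::string NewStyleName(uint32_t id) {
  return "___lldb_unnamed_symbol" + std::to_string(id);
}
```

With two modules that each contribute numbers 1 through 3, the old scheme would store six distinct strings in this sketch and the new scheme three.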

Prior to this fix the startup times for a large application were:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)

After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)

The names of the symbols are auto-generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string. This is only done when the name is requested from a synthetic symbol that has no name.
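
The on-demand naming described above can be sketched roughly as follows. This is a simplified model, not the patch itself: `LazySymbol` is a hypothetical stand-in for `lldb_private::Symbol`, and a plain `std::string` replaces the `Mangled` member. It shows why the cached name has to be `mutable` when the accessor is `const`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for lldb_private::Symbol; only the lazy naming
// behavior from the summary is modeled.
class LazySymbol {
public:
  LazySymbol(uint32_t uid, bool is_synthetic, std::string name = {})
      : m_uid(uid), m_is_synthetic(is_synthetic), m_name(std::move(name)) {}

  static const char *GetSyntheticSymbolPrefix() {
    return "___lldb_unnamed_symbol";
  }

  // Every read goes through the accessor, which fills in the name first.
  const std::string &GetName() const {
    SynthesizeNameIfNeeded();
    return m_name;
  }

  // True once a name is actually stored in the symbol.
  bool HasStoredName() const { return !m_name.empty(); }

private:
  void SynthesizeNameIfNeeded() const {
    if (m_is_synthetic && m_name.empty()) {
      // The ID must have been assigned before a name can be derived from it.
      assert(m_uid != UINT32_MAX);
      m_name = GetSyntheticSymbolPrefix() + std::to_string(m_uid);
    }
  }

  uint32_t m_uid;
  bool m_is_synthetic;
  mutable std::string m_name; // mutable so the const accessor can cache it
};
```

A synthetic symbol constructed with ID 4242 and no name stores nothing until `GetName()` is first called; symbols that already have a name are returned unchanged.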

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

clayborg created this revision. Jun 17 2021, 2:31 PM

Herald added subscribers: kristof.beyls, emaste. Jun 17 2021, 2:31 PM

clayborg requested review of this revision. Jun 17 2021, 2:31 PM

Herald added a project: Restricted Project. Jun 17 2021, 2:31 PM

Herald added subscribers: lldb-commits, MaskRay.

lgtm

I wonder if it's lldb's style to rename m_mangled to m_mangled_do_not_use (or something like that) to prevent people from using it in the future.

lldb/source/Symbol/Symtab.cpp
639	these queries

walli99 added a commit: rGda6384fbb9fb: Add beginning of LLVM's GettingStarted to GitHub readme. Jun 18 2021, 2:36 AM

Harbormaster completed remote builds in B109806: Diff 352848. Jun 18 2021, 4:50 AM

Fixed a case where the accessor was being used twice in the same function and fixed a typo in a comment.

wallace accepted this revision. Jun 18 2021, 5:25 PM

This revision is now accepted and ready to land. Jun 18 2021, 5:25 PM

Harbormaster completed remote builds in B110011: Diff 353120. Jun 19 2021, 1:03 PM

Are the Symbol ID's for unnamed symbols the same each time you read in a symbol file? While the unnamed_symbol symbol names are not significant, it would be good if you were crashing in __lldb_unnamed_symbol111 on one lldb run, you would also crash in the same unnamed symbol when you crashed again.

shafik added a subscriber: shafik. Jun 21 2021, 4:06 PM

shafik added inline comments.

lldb/include/lldb/Symbol/Symtab.h
222	We should add a Doxygen comment for this member function. I know we are not consistent with doing this, but for new stuff we should do this and fix the rest when we can while refactoring. Thank you! I thought about this because I noticed we are returning `0` and we had an explicit comment about what it meant, and this is where it really belongs. I also noticed we use `UINT32_MAX` but we don't seem to have an alias for that either.
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
2815–2816	Since we are touching this, can we move to using the parameter name in the comment style as documented here: https://llvm.org/docs/CodingStandards.html#comment-formatting We even have a clang-tidy check to verify this: https://clang.llvm.org/extra/clang-tidy/checks/bugprone-argument-comment.html e.g. `/*name=*/llvm::StringRef(), /*type=*/eSymbolTypeCode, ...`
lldb/source/Symbol/Symtab.cpp
649	`/*Radix=*/10`
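
As a small illustration of the argument-comment style requested above, here is a sketch with a hypothetical `MakeSymbol` helper and `SymbolRecord` type (the real `Symbol` constructor takes many more parameters). Each literal argument is preceded by a `/*param=*/` comment naming the parameter it binds to, which is the form the clang-tidy `bugprone-argument-comment` check compares against the declaration.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, simplified symbol record used only for this example.
struct SymbolRecord {
  uint32_t id;
  std::string name;
  bool external;
  bool is_artificial;
};

// Hypothetical factory with several same-typed parameters, the situation
// in which bare `true, false, true` arguments become hard to read.
SymbolRecord MakeSymbol(uint32_t symID, std::string name, bool external,
                        bool is_artificial) {
  return SymbolRecord{symID, std::move(name), external, is_artificial};
}

SymbolRecord MakeEntryPointSymbol(uint32_t next_id) {
  // The comment text matches the parameter name in the declaration above.
  return MakeSymbol(/*symID=*/next_id,
                    /*name=*/std::string(), // Name is generated lazily.
                    /*external=*/true,
                    /*is_artificial=*/true);
}
```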

In D104488#2830992, @jingham wrote:

Are the Symbol ID's for unnamed symbols the same each time you read in a symbol file? While the unnamed_symbol symbol names are not significant, it would be good if you were crashing in __lldb_unnamed_symbol111 on one lldb run, you would also crash in the same unnamed symbol when you crashed again.

Yes, symbol IDs are consistent as they encode the UserID of the symbol as the number, which will be the same on each run as long as the binary doesn't change. The UserID for synthetic symbols always starts with the last valid actual symbol index from the main symbol table. So the numbers are just as good as they were before; they just don't start at 1 anymore, they start at the size of the actual symbol table.
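
The numbering scheme described in this reply can be sketched as below. `AssignSyntheticIDs` is a hypothetical helper, not a function from the patch, and it assumes the first synthetic ID equals the number of real symbols, as the reply states. Because the result depends only on the contents of the symbol table, reading the same unchanged binary again yields the same IDs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: hands out IDs for synthetic symbols, continuing
// where the real symbol table's indexes end.
std::vector<uint32_t> AssignSyntheticIDs(uint32_t num_real_symbols,
                                         std::size_t num_synthetic) {
  std::vector<uint32_t> ids;
  ids.reserve(num_synthetic);
  uint32_t next_id = num_real_symbols; // assumed starting point
  for (std::size_t i = 0; i < num_synthetic; ++i)
    ids.push_back(next_id++);
  return ids;
}
```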

Excellent, the specific number is unimportant so long as they are consistent. This seems a side effect of the way they are computed, might be good to drop a judicious comment somewhere saying why it's important...

Fix comments from review:

Add in-line C-style comment for parameter to symbol creation.
Fixed the entry point address to encode the offset correctly for the ELF e_entry address
Add header doc for the new Symtab function

clayborg marked 3 inline comments as done. Jun 21 2021, 10:29 PM

clayborg added inline comments.

lldb/source/Symbol/Symbol.cpp
573–582	This is the comment Jim Ingham asked for.

Harbormaster completed remote builds in B110338: Diff 353550. Jun 21 2021, 11:10 PM

Some comments on comments...

lldb/include/lldb/Symbol/Symtab.h
224	This comment is hard to read. I think it's mostly because you describe the implementation before the reason for it. Maybe this would be clearer like: "We generate unique names for synthetic symbols so that users can look them up by name when needed. But because doing so is uncommon in normal debugger use, we trade off some performance at lookup time for faster symbol table building by detecting these symbols and generating their names lazily, rather than adding them to the normal symbol indexes. This function does the job of first consulting the indexes, and if that fails checking whether the symbol has the synthetic symbol prefix and generating the correct synthetic name if it does."
lldb/source/Symbol/Symbol.cpp
575	are -> is or maybe: starts with the synthetic symbol prefix, followed by a unique number
576	is -> of so this reads: Typically the UserID of a real symbol is ...
578	I don't think you need the implementation detail here, you are stating policy. Starting from "Typically" I think something like the following is more direct: Typically the UserID of a real symbol is the symbol table index of the symbol in the object file's symbol table(s), so it will be the same every time you read in the object file. We want the same persistence for synthetic symbols so that users can identify them across multiple debug sessions, to understand crashes in those symbols and to reliably set breakpoints on them.
lldb/source/Symbol/Symtab.cpp
638	so -> to
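
The lookup fallback discussed in these comments (consult the name indexes first, then check for the synthetic prefix) needs a way to recover the ID from a name. A minimal sketch of that parsing step follows; `ParseSyntheticSymbolID` is a hypothetical function, and it uses the standard `std::from_chars` with base 10 rather than whatever LLVM helper the patch actually calls.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Returns the ID encoded in a synthetic symbol name, or std::nullopt if the
// name does not consist of the prefix followed only by decimal digits.
std::optional<uint32_t> ParseSyntheticSymbolID(std::string_view name) {
  constexpr std::string_view prefix = "___lldb_unnamed_symbol";
  if (name.substr(0, prefix.size()) != prefix)
    return std::nullopt;
  name.remove_prefix(prefix.size());
  if (name.empty())
    return std::nullopt;

  uint32_t id = 0;
  const char *end = name.data() + name.size();
  auto [ptr, ec] = std::from_chars(name.data(), end, id, /*base=*/10);
  // Reject parse errors, overflow, and trailing non-digit characters.
  if (ec != std::errc() || ptr != end)
    return std::nullopt;
  return id;
}
```

A caller would only reach this step after the normal name index lookup found nothing, and would then search the symbol table for a synthetic symbol with the returned ID.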

Fixed comments per Jim Ingham's comments.

Thanks!

Anyone have any other comments? Or is this good to go?

Harbormaster completed remote builds in B110451: Diff 353715. Jun 22 2021, 12:01 PM

Anyone have any objections now?

I'm fine with the change.

Sounds good, can someone accept the patch?

good to go

aprantl added inline comments. Jun 25 2021, 12:18 PM

lldb/source/Symbol/Symtab.cpp
652	`if (!symbol)` ?

This revision was landed with ongoing or failed builds. Jun 28 2021, 6:05 PM

Closed by commit rGd77ccfdc7218: Create synthetic symbol names on demand to improve memory consumption and… (authored by clayborg).

This revision was automatically updated to reflect the committed changes.

clayborg added a commit: rGd77ccfdc7218: Create synthetic symbol names on demand to improve memory consumption and….

clayborg mentioned this in rG323bcbdba0e6: Fix buildbot failure after https://reviews.llvm.org/D104488. Jun 28 2021, 6:12 PM

clayborg mentioned this in rG42c05ed8beb2: Fix failing tests after https://reviews.llvm.org/D104488. Jun 28 2021, 8:00 PM

This rev has caused multiple test failures on LLDB Arm/AArch64 buildbots.

https://lab.llvm.org/buildbot/#/builders/17/builds/8504
https://lab.llvm.org/buildbot/#/builders/96/builds/9110
https://lab.llvm.org/buildbot/#/builders/96/builds/9111

This broke the windows lldb bot and the follow up changes did not fix it:

https://lab.llvm.org/buildbot/#/builders/83/builds/7748

Can you have a look?

stella.stamenova added a reverting change: rGbb2cfca2f323: Revert D104488 and friends since it broke the windows bot. Jun 29 2021, 12:59 PM

clayborg mentioned this in D105160: Create synthetic symbol names on demand to improve memory consumption and startup times. Jun 29 2021, 4:32 PM

Revision Contents

Path                                                        Size
lldb/include/lldb/Symbol/ObjectFile.h                       2 lines
lldb/include/lldb/Symbol/Symbol.h                           24 lines
lldb/include/lldb/Symbol/Symtab.h                           20 lines
lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp        64 lines
lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp   6 lines
lldb/source/Symbol/ObjectFile.cpp                           10 lines
lldb/source/Symbol/Symbol.cpp                               40 lines
lldb/source/Symbol/Symtab.cpp                               38 lines

Diff 355077

lldb/include/lldb/Symbol/ObjectFile.h

[...]	protected:
/// \param[in] new_arch		/// \param[in] new_arch
/// The architecture this module will be set to.		/// The architecture this module will be set to.
///		///
/// \return		/// \return
/// Returns \b true if the architecture was changed, \b		/// Returns \b true if the architecture was changed, \b
/// false otherwise.		/// false otherwise.
bool SetModulesArchitecture(const ArchSpec &new_arch);		bool SetModulesArchitecture(const ArchSpec &new_arch);

ConstString GetNextSyntheticSymbolName();

static lldb::DataBufferSP MapFileData(const FileSpec &file, uint64_t Size,		static lldb::DataBufferSP MapFileData(const FileSpec &file, uint64_t Size,
uint64_t Offset);		uint64_t Offset);

private:		private:
ObjectFile(const ObjectFile &) = delete;		ObjectFile(const ObjectFile &) = delete;
const ObjectFile &operator=(const ObjectFile &) = delete;		const ObjectFile &operator=(const ObjectFile &) = delete;
};		};

[...]

lldb/include/lldb/Symbol/Symbol.h

[...]	public:

ConstString GetDisplayName() const;		ConstString GetDisplayName() const;

uint32_t GetID() const { return m_uid; }		uint32_t GetID() const { return m_uid; }

lldb::LanguageType GetLanguage() const {		lldb::LanguageType GetLanguage() const {
// TODO: See if there is a way to determine the language for a symbol		// TODO: See if there is a way to determine the language for a symbol
// somehow, for now just return our best guess		// somehow, for now just return our best guess
return m_mangled.GuessLanguage();		return GetMangled().GuessLanguage();
}		}

void SetID(uint32_t uid) { m_uid = uid; }		void SetID(uint32_t uid) { m_uid = uid; }

Mangled &GetMangled() { return m_mangled; }		Mangled &GetMangled() {
		SynthesizeNameIfNeeded();
		return m_mangled;
		}

const Mangled &GetMangled() const { return m_mangled; }		const Mangled &GetMangled() const {
		SynthesizeNameIfNeeded();
		return m_mangled;
		}

ConstString GetReExportedSymbolName() const;		ConstString GetReExportedSymbolName() const;

FileSpec GetReExportedSymbolSharedLibrary() const;		FileSpec GetReExportedSymbolSharedLibrary() const;

void SetReExportedSymbolName(ConstString name);		void SetReExportedSymbolName(ConstString name);

bool SetReExportedSymbolSharedLibrary(const FileSpec &fspec);		bool SetReExportedSymbolSharedLibrary(const FileSpec &fspec);
[...]	public:

bool IsExternal() const { return m_is_external; }		bool IsExternal() const { return m_is_external; }

void SetExternal(bool b) { m_is_external = b; }		void SetExternal(bool b) { m_is_external = b; }

bool IsTrampoline() const;		bool IsTrampoline() const;

bool IsIndirect() const;		bool IsIndirect() const;

bool IsWeak() const { return m_is_weak; }		bool IsWeak() const { return m_is_weak; }

void SetIsWeak (bool b) { m_is_weak = b; }		void SetIsWeak (bool b) { m_is_weak = b; }

bool GetByteSizeIsValid() const { return m_size_is_valid; }		bool GetByteSizeIsValid() const { return m_size_is_valid; }

lldb::addr_t GetByteSize() const;		lldb::addr_t GetByteSize() const;

void SetByteSize(lldb::addr_t size) {		void SetByteSize(lldb::addr_t size) {
m_size_is_valid = size > 0;		m_size_is_valid = size > 0;
[...]	lldb::DisassemblerSP GetInstructions(const ExecutionContext &exe_ctx,
const char *flavor,		const char *flavor,
bool prefer_file_cache);		bool prefer_file_cache);

bool GetDisassembly(const ExecutionContext &exe_ctx, const char *flavor,		bool GetDisassembly(const ExecutionContext &exe_ctx, const char *flavor,
bool prefer_file_cache, Stream &strm);		bool prefer_file_cache, Stream &strm);

bool ContainsFileAddress(lldb::addr_t file_addr) const;		bool ContainsFileAddress(lldb::addr_t file_addr) const;

		static llvm::StringRef GetSyntheticSymbolPrefix() {
		return "___lldb_unnamed_symbol";
		}

protected:		protected:
// This is the internal guts of ResolveReExportedSymbol, it assumes		// This is the internal guts of ResolveReExportedSymbol, it assumes
// reexport_name is not null, and that module_spec is valid. We track the		// reexport_name is not null, and that module_spec is valid. We track the
// modules we've already seen to make sure we don't get caught in a cycle.		// modules we've already seen to make sure we don't get caught in a cycle.

Symbol *ResolveReExportedSymbolInModuleSpec(		Symbol *ResolveReExportedSymbolInModuleSpec(
Target &target, ConstString &reexport_name,		Target &target, ConstString &reexport_name,
lldb_private::ModuleSpec &module_spec,		lldb_private::ModuleSpec &module_spec,
lldb_private::ModuleList &seen_modules) const;		lldb_private::ModuleList &seen_modules) const;

		void SynthesizeNameIfNeeded() const;

uint32_t m_uid =		uint32_t m_uid =
UINT32_MAX; // User ID (usually the original symbol table index)		UINT32_MAX; // User ID (usually the original symbol table index)
uint16_t m_type_data = 0; // data specific to m_type		uint16_t m_type_data = 0; // data specific to m_type
uint16_t m_type_data_resolved : 1, // True if the data in m_type_data has		uint16_t m_type_data_resolved : 1, // True if the data in m_type_data has
// already been calculated		// already been calculated
m_is_synthetic : 1, // non-zero if this symbol is not actually in the		m_is_synthetic : 1, // non-zero if this symbol is not actually in the
// symbol table, but synthesized from other info in		// symbol table, but synthesized from other info in
// the object file.		// the object file.
[...]	uint16_t m_type_data_resolved : 1, // True if the data in m_type_data has
m_demangled_is_synthesized : 1, // The demangled name was created should		m_demangled_is_synthesized : 1, // The demangled name was created should
// not be used for expressions or other		// not be used for expressions or other
// lookups		// lookups
m_contains_linker_annotations : 1, // The symbol name contains linker		m_contains_linker_annotations : 1, // The symbol name contains linker
// annotations, which are optional when		// annotations, which are optional when
// doing name lookups		// doing name lookups
m_is_weak : 1,		m_is_weak : 1,
m_type : 6; // Values from the lldb::SymbolType enum.		m_type : 6; // Values from the lldb::SymbolType enum.
Mangled m_mangled; // uniqued symbol name/mangled name pair		mutable Mangled m_mangled; // uniqued symbol name/mangled name pair
AddressRange m_addr_range; // Contains the value, or the section offset		AddressRange m_addr_range; // Contains the value, or the section offset
// address when the value is an address in a		// address when the value is an address in a
// section, and the size (if any)		// section, and the size (if any)
uint32_t m_flags = 0; // A copy of the flags from the original symbol table,		uint32_t m_flags = 0; // A copy of the flags from the original symbol table,
// the ObjectFile plug-in can interpret these		// the ObjectFile plug-in can interpret these
};		};

} // namespace lldb_private		} // namespace lldb_private

#endif // LLDB_SYMBOL_SYMBOL_H		#endif // LLDB_SYMBOL_SYMBOL_H

lldb/include/lldb/Symbol/Symtab.h

[...]	case eVisibilityExtern:
return m_symbols[idx].IsExternal();		return m_symbols[idx].IsExternal();

case eVisibilityPrivate:		case eVisibilityPrivate:
return !m_symbols[idx].IsExternal();		return !m_symbols[idx].IsExternal();
}		}
return false;		return false;
}		}

		/// A helper function that looks up full function names.
		///
		/// We generate unique names for synthetic symbols so that users can look
		/// them up by name when needed. But because doing so is uncommon in normal
		/// debugger use, we trade off some performance at lookup time for faster
		/// symbol table building by detecting these symbols and generating their
		/// names lazily, rather than adding them to the normal symbol indexes. This
		/// function does the job of first consulting the name indexes, and if that
		/// fails it extracts the information it needs from the synthetic name and
		/// locates the symbol.
		///
		/// @param[in] symbol_name The symbol name to search for.
		///
		/// @param[out] indexes The vector if symbol indexes to update with results.
		///
		/// @returns The number of indexes added to the index vector. Zero if no
		/// matches were found.
		uint32_t GetNameIndexes(ConstString symbol_name,
		std::vector<uint32_t> &indexes);

void SymbolIndicesToSymbolContextList(std::vector<uint32_t> &symbol_indexes,		void SymbolIndicesToSymbolContextList(std::vector<uint32_t> &symbol_indexes,
SymbolContextList &sc_list);		SymbolContextList &sc_list);

void RegisterMangledNameEntry(		void RegisterMangledNameEntry(
uint32_t value, std::set<const char *> &class_contexts,		uint32_t value, std::set<const char *> &class_contexts,
std::vector<std::pair<NameToIndexMap::Entry, const char *>> &backlog,		std::vector<std::pair<NameToIndexMap::Entry, const char *>> &backlog,
RichManglingContext &rmc);		RichManglingContext &rmc);

[...]

lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp

[...]	if (auto gdd_objfile_section_list = gdd_obj_file->GetSectionList()) {
eSectionTypeELFSymbolTable, true);		eSectionTypeELFSymbolTable, true);
if (module_section_sp)		if (module_section_sp)
unified_section_list.ReplaceSection(module_section_sp->GetID(),		unified_section_list.ReplaceSection(module_section_sp->GetID(),
symtab_section_sp);		symtab_section_sp);
else		else
unified_section_list.AddSection(symtab_section_sp);		unified_section_list.AddSection(symtab_section_sp);
}		}
}		}
}		}
}		}

std::shared_ptr<ObjectFileELF> ObjectFileELF::GetGnuDebugDataObjectFile() {		std::shared_ptr<ObjectFileELF> ObjectFileELF::GetGnuDebugDataObjectFile() {
if (m_gnu_debug_data_object_file != nullptr)		if (m_gnu_debug_data_object_file != nullptr)
return m_gnu_debug_data_object_file;		return m_gnu_debug_data_object_file;

SectionSP section =		SectionSP section =
GetSectionList()->FindSectionByName(ConstString(".gnu_debugdata"));		GetSectionList()->FindSectionByName(ConstString(".gnu_debugdata"));
[...]	if (m_symtab_up == nullptr) {
if (CalculateType() == eTypeExecutable) {		if (CalculateType() == eTypeExecutable) {
ArchSpec arch = GetArchitecture();		ArchSpec arch = GetArchitecture();
auto entry_point_addr = GetEntryPointAddress();		auto entry_point_addr = GetEntryPointAddress();
bool is_valid_entry_point =		bool is_valid_entry_point =
entry_point_addr.IsValid() && entry_point_addr.IsSectionOffset();		entry_point_addr.IsValid() && entry_point_addr.IsSectionOffset();
addr_t entry_point_file_addr = entry_point_addr.GetFileAddress();		addr_t entry_point_file_addr = entry_point_addr.GetFileAddress();
if (is_valid_entry_point && !m_symtab_up->FindSymbolContainingFileAddress(		if (is_valid_entry_point && !m_symtab_up->FindSymbolContainingFileAddress(
entry_point_file_addr)) {		entry_point_file_addr)) {
uint64_t symbol_id = m_symtab_up->GetNumSymbols();		uint64_t symbol_id = m_symtab_up->GetNumSymbols();
Symbol symbol(symbol_id,		// Don't set the name for any synthetic symbols, the Symbol
GetNextSyntheticSymbolName().GetCString(), // Symbol name.		// object will generate one if needed when the name is accessed
eSymbolTypeCode, // Type of this symbol.		// via accessors.
true, // Is this globally visible?		SectionSP section_sp = entry_point_addr.GetSection();
false, // Is this symbol debug info?		Symbol symbol(
false, // Is this symbol a trampoline?		/*symID=*/symbol_id,
true, // Is this symbol artificial?		/*name=*/llvm::StringRef(), // Name will be auto generated.
entry_point_addr.GetSection(), // Section where this		/*type=*/eSymbolTypeCode,
// symbol is defined.		/*external=*/true,
0, // Offset in section or symbol value.		/*is_debug=*/false,
0, // Size.		/*is_trampoline=*/false,
false, // Size is valid.		/*is_artificial=*/true,
false, // Contains linker annotations?		/*section_sp=*/section_sp,
0); // Symbol flags.		/*offset=*/entry_point_addr.GetOffset(),
		/*size=*/0, // FDE can span multiple symbols so don't use its size.
		/*size_is_valid=*/false,
		/*contains_linker_annotations=*/false,
		/*flags=*/0);
m_symtab_up->AddSymbol(symbol);		m_symtab_up->AddSymbol(symbol);
// When the entry point is arm thumb we need to explicitly set its		// When the entry point is arm thumb we need to explicitly set its
// class address to reflect that. This is important because expression		// class address to reflect that. This is important because expression
// evaluation relies on correctly setting a breakpoint at this		// evaluation relies on correctly setting a breakpoint at this
// address.		// address.
if (arch.GetMachine() == llvm::Triple::arm &&		if (arch.GetMachine() == llvm::Triple::arm &&
(entry_point_file_addr & 1))		(entry_point_file_addr & 1))
m_address_class_map[entry_point_file_addr ^ 1] =		m_address_class_map[entry_point_file_addr ^ 1] =
[...]	if (symbol) {
symbol->SetByteSize(size);		symbol->SetByteSize(size);
symbol->SetSizeIsSynthesized(true);		symbol->SetSizeIsSynthesized(true);
}		}
} else {		} else {
SectionSP section_sp =		SectionSP section_sp =
section_list->FindSectionContainingFileAddress(file_addr);		section_list->FindSectionContainingFileAddress(file_addr);
if (section_sp) {		if (section_sp) {
addr_t offset = file_addr - section_sp->GetFileAddress();		addr_t offset = file_addr - section_sp->GetFileAddress();
const char *symbol_name = GetNextSyntheticSymbolName().GetCString();
uint64_t symbol_id = ++last_symbol_id;		uint64_t symbol_id = ++last_symbol_id;
		// Don't set the name for any synthetic symbols, the Symbol
		// object will generate one if needed when the name is accessed
		// via accessors.
Symbol eh_symbol(		Symbol eh_symbol(
symbol_id, // Symbol table index.		/*symID=*/symbol_id,
symbol_name, // Symbol name.		/*name=*/llvm::StringRef(), // Name will be auto generated.
eSymbolTypeCode, // Type of this symbol.		/*type=*/eSymbolTypeCode,
true, // Is this globally visible?		/*external=*/true,
false, // Is this symbol debug info?		/*is_debug=*/false,
false, // Is this symbol a trampoline?		/*is_trampoline=*/false,
true, // Is this symbol artificial?		/*is_artificial=*/true,
section_sp, // Section in which this symbol is defined or null.		/*section_sp=*/section_sp,
offset, // Offset in section or symbol value.		/*offset=*/offset,
0, // Size: Don't specify the size as an FDE can		/*size=*/0, // FDE can span multiple symbols so don't use its size.
false, // Size is valid: cover multiple symbols.		/*size_is_valid=*/false,
false, // Contains linker annotations?		/*contains_linker_annotations=*/false,
0); // Symbol flags.		/*flags=*/0);
new_symbols.push_back(eh_symbol);		new_symbols.push_back(eh_symbol);
}		}
}		}
return true;		return true;
});		});

for (const Symbol &s : new_symbols)		for (const Symbol &s : new_symbols)
symbol_table->AddSymbol(s);		symbol_table->AddSymbol(s);
[...]

lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp

[...]	if (num_synthetic_function_symbols > 0) {
next_symbol_file_addr &= THUMB_ADDRESS_BIT_MASK;		next_symbol_file_addr &= THUMB_ADDRESS_BIT_MASK;
symbol_byte_size = std::min<lldb::addr_t>(		symbol_byte_size = std::min<lldb::addr_t>(
next_symbol_file_addr - symbol_file_addr,		next_symbol_file_addr - symbol_file_addr,
section_end_file_addr - symbol_file_addr);		section_end_file_addr - symbol_file_addr);
} else {		} else {
symbol_byte_size = section_end_file_addr - symbol_file_addr;		symbol_byte_size = section_end_file_addr - symbol_file_addr;
}		}
sym[sym_idx].SetID(synthetic_sym_id++);		sym[sym_idx].SetID(synthetic_sym_id++);
sym[sym_idx].GetMangled().SetDemangledName(		// Don't set the name for any synthetic symbols, the Symbol
GetNextSyntheticSymbolName());		// object will generate one if needed when the name is accessed
		// via accessors.
		sym[sym_idx].GetMangled().SetDemangledName(ConstString());
sym[sym_idx].SetType(eSymbolTypeCode);		sym[sym_idx].SetType(eSymbolTypeCode);
sym[sym_idx].SetIsSynthetic(true);		sym[sym_idx].SetIsSynthetic(true);
sym[sym_idx].GetAddressRef() = symbol_addr;		sym[sym_idx].GetAddressRef() = symbol_addr;
add_symbol_addr(symbol_addr.GetFileAddress());		add_symbol_addr(symbol_addr.GetFileAddress());
if (symbol_flags)		if (symbol_flags)
sym[sym_idx].SetFlags(symbol_flags);		sym[sym_idx].SetFlags(symbol_flags);
if (symbol_byte_size)		if (symbol_byte_size)
sym[sym_idx].SetByteSize(symbol_byte_size);		sym[sym_idx].SetByteSize(symbol_byte_size);
[...]

lldb/source/Symbol/ObjectFile.cpp

[...]	if (!name.empty()) {
} else if (name.startswith(".objc_class_name_")) {		} else if (name.startswith(".objc_class_name_")) {
// ObjC v1		// ObjC v1
return lldb::eSymbolTypeObjCClass;		return lldb::eSymbolTypeObjCClass;
}		}
}		}
return symbol_type_hint;		return symbol_type_hint;
}		}

ConstString ObjectFile::GetNextSyntheticSymbolName() {
llvm::SmallString<256> name;
llvm::raw_svector_ostream os(name);
ConstString file_name = GetModule()->GetFileSpec().GetFilename();
++m_synthetic_symbol_idx;
os << "___lldb_unnamed_symbol" << m_synthetic_symbol_idx << "$$"
<< file_name.GetStringRef();
return ConstString(os.str());
}

std::vector<ObjectFile::LoadableData>		std::vector<ObjectFile::LoadableData>
ObjectFile::GetLoadableData(Target &target) {		ObjectFile::GetLoadableData(Target &target) {
std::vector<LoadableData> loadables;		std::vector<LoadableData> loadables;
SectionList *section_list = GetSectionList();		SectionList *section_list = GetSectionList();
if (!section_list)		if (!section_list)
return loadables;		return loadables;
// Create a list of loadable data from loadable sections		// Create a list of loadable data from loadable sections
size_t section_count = section_list->GetNumSections(0);		size_t section_count = section_list->GetNumSections(0);
[...]

lldb/source/Symbol/Symbol.cpp

[...]	Symbol::Symbol(uint32_t symID, const Mangled &mangled, SymbolType type,
bool size_is_valid, bool contains_linker_annotations,		bool size_is_valid, bool contains_linker_annotations,
uint32_t flags)		uint32_t flags)
: SymbolContextScope(), m_uid(symID), m_type_data(0),		: SymbolContextScope(), m_uid(symID), m_type_data(0),
m_type_data_resolved(false), m_is_synthetic(is_artificial),		m_type_data_resolved(false), m_is_synthetic(is_artificial),
m_is_debug(is_debug), m_is_external(external), m_size_is_sibling(false),		m_is_debug(is_debug), m_is_external(external), m_size_is_sibling(false),
m_size_is_synthesized(false),		m_size_is_synthesized(false),
m_size_is_valid(size_is_valid || range.GetByteSize() > 0),		m_size_is_valid(size_is_valid || range.GetByteSize() > 0),
m_demangled_is_synthesized(false),		m_demangled_is_synthesized(false),
m_contains_linker_annotations(contains_linker_annotations),		m_contains_linker_annotations(contains_linker_annotations),
m_is_weak(false), m_type(type), m_mangled(mangled), m_addr_range(range),		m_is_weak(false), m_type(type), m_mangled(mangled), m_addr_range(range),
m_flags(flags) {}		m_flags(flags) {}

Symbol::Symbol(const Symbol &rhs)		Symbol::Symbol(const Symbol &rhs)
: SymbolContextScope(rhs), m_uid(rhs.m_uid), m_type_data(rhs.m_type_data),		: SymbolContextScope(rhs), m_uid(rhs.m_uid), m_type_data(rhs.m_type_data),
m_type_data_resolved(rhs.m_type_data_resolved),		m_type_data_resolved(rhs.m_type_data_resolved),
m_is_synthetic(rhs.m_is_synthetic), m_is_debug(rhs.m_is_debug),		m_is_synthetic(rhs.m_is_synthetic), m_is_debug(rhs.m_is_debug),
m_is_external(rhs.m_is_external),		m_is_external(rhs.m_is_external),
m_size_is_sibling(rhs.m_size_is_sibling), m_size_is_synthesized(false),		m_size_is_sibling(rhs.m_size_is_sibling), m_size_is_synthesized(false),
(45 lines not shown)	void Symbol::Clear() {
m_addr_range.Clear();		m_addr_range.Clear();
}		}

bool Symbol::ValueIsAddress() const {		bool Symbol::ValueIsAddress() const {
return m_addr_range.GetBaseAddress().GetSection().get() != nullptr;		return m_addr_range.GetBaseAddress().GetSection().get() != nullptr;
}		}

ConstString Symbol::GetDisplayName() const {		ConstString Symbol::GetDisplayName() const {
return m_mangled.GetDisplayDemangledName();		return GetMangled().GetDisplayDemangledName();
}		}

ConstString Symbol::GetReExportedSymbolName() const {		ConstString Symbol::GetReExportedSymbolName() const {
if (m_type == eSymbolTypeReExported) {		if (m_type == eSymbolTypeReExported) {
// For eSymbolTypeReExported, the "const char *" from a ConstString is used		// For eSymbolTypeReExported, the "const char *" from a ConstString is used
// as the offset in the address range base address. We can then make this		// as the offset in the address range base address. We can then make this
// back into a string that is the re-exported name.		// back into a string that is the re-exported name.
intptr_t str_ptr = m_addr_range.GetBaseAddress().GetOffset();		intptr_t str_ptr = m_addr_range.GetBaseAddress().GetOffset();
(66 lines not shown)	void Symbol::GetDescription(Stream *s, lldb::DescriptionLevel level,
} else {		} else {
if (m_size_is_sibling)		if (m_size_is_sibling)
s->Printf(", sibling = %5" PRIu64,		s->Printf(", sibling = %5" PRIu64,
m_addr_range.GetBaseAddress().GetOffset());		m_addr_range.GetBaseAddress().GetOffset());
else		else
s->Printf(", value = 0x%16.16" PRIx64,		s->Printf(", value = 0x%16.16" PRIx64,
m_addr_range.GetBaseAddress().GetOffset());		m_addr_range.GetBaseAddress().GetOffset());
}		}
ConstString demangled = m_mangled.GetDemangledName();		ConstString demangled = GetMangled().GetDemangledName();
if (demangled)		if (demangled)
s->Printf(", name=\"%s\"", demangled.AsCString());		s->Printf(", name=\"%s\"", demangled.AsCString());
if (m_mangled.GetMangledName())		if (m_mangled.GetMangledName())
s->Printf(", mangled=\"%s\"", m_mangled.GetMangledName().AsCString());		s->Printf(", mangled=\"%s\"", m_mangled.GetMangledName().AsCString());
}		}

void Symbol::Dump(Stream *s, Target *target, uint32_t index,		void Symbol::Dump(Stream *s, Target *target, uint32_t index,
Mangled::NamePreference name_preference) const {		Mangled::NamePreference name_preference) const {
s->Printf("[%5u] %6u %c%c%c %-15s ", index, GetID(), m_is_debug ? 'D' : ' ',		s->Printf("[%5u] %6u %c%c%c %-15s ", index, GetID(), m_is_debug ? 'D' : ' ',
m_is_synthetic ? 'S' : ' ', m_is_external ? 'X' : ' ',		m_is_synthetic ? 'S' : ' ', m_is_external ? 'X' : ' ',
GetTypeAsString());		GetTypeAsString());

// Make sure the size of the symbol is up to date before dumping		// Make sure the size of the symbol is up to date before dumping
GetByteSize();		GetByteSize();

ConstString name = m_mangled.GetName(name_preference);		ConstString name = GetMangled().GetName(name_preference);
if (ValueIsAddress()) {		if (ValueIsAddress()) {
if (!m_addr_range.GetBaseAddress().Dump(s, nullptr,		if (!m_addr_range.GetBaseAddress().Dump(s, nullptr,
Address::DumpStyleFileAddress))		Address::DumpStyleFileAddress))
s->Printf("%*s", 18, "");		s->Printf("%*s", 18, "");

s->PutChar(' ');		s->PutChar(' ');

if (!m_addr_range.GetBaseAddress().Dump(s, target,		if (!m_addr_range.GetBaseAddress().Dump(s, target,
(95 lines not shown)	if (!m_type_data_resolved) {
}		}
}		}
return m_type_data;		return m_type_data;
}		}
return 0;		return 0;
}		}

bool Symbol::Compare(ConstString name, SymbolType type) const {		bool Symbol::Compare(ConstString name, SymbolType type) const {
if (type == eSymbolTypeAny || m_type == type)		if (type == eSymbolTypeAny || m_type == type) {
return m_mangled.GetMangledName() == name ||		const Mangled &mangled = GetMangled();
m_mangled.GetDemangledName() == name;		return mangled.GetMangledName() == name ||
		mangled.GetDemangledName() == name;
		}
return false;		return false;
}		}

#define ENUM_TO_CSTRING(x) \		#define ENUM_TO_CSTRING(x) \
case eSymbolType##x: \		case eSymbolType##x: \
return #x;		return #x;

const char *Symbol::GetTypeAsString() const {		const char *Symbol::GetTypeAsString() const {
(146 lines not shown)

lldb::addr_t Symbol::GetLoadAddress(Target *target) const {		lldb::addr_t Symbol::GetLoadAddress(Target *target) const {
if (ValueIsAddress())		if (ValueIsAddress())
return GetAddressRef().GetLoadAddress(target);		return GetAddressRef().GetLoadAddress(target);
else		else
return LLDB_INVALID_ADDRESS;		return LLDB_INVALID_ADDRESS;
}		}

ConstString Symbol::GetName() const { return m_mangled.GetName(); }		ConstString Symbol::GetName() const { return GetMangled().GetName(); }

ConstString Symbol::GetNameNoArguments() const {		ConstString Symbol::GetNameNoArguments() const {
return m_mangled.GetName(Mangled::ePreferDemangledWithoutArguments);		return GetMangled().GetName(Mangled::ePreferDemangledWithoutArguments);
}		}

lldb::addr_t Symbol::ResolveCallableAddress(Target &target) const {		lldb::addr_t Symbol::ResolveCallableAddress(Target &target) const {
if (GetType() == lldb::eSymbolTypeUndefined)		if (GetType() == lldb::eSymbolTypeUndefined)
return LLDB_INVALID_ADDRESS;		return LLDB_INVALID_ADDRESS;

Address func_so_addr;		Address func_so_addr;

(50 lines not shown)	if (disassembler_sp) {
return true;		return true;
}		}
return false;		return false;
}		}

bool Symbol::ContainsFileAddress(lldb::addr_t file_addr) const {		bool Symbol::ContainsFileAddress(lldb::addr_t file_addr) const {
return m_addr_range.ContainsFileAddress(file_addr);		return m_addr_range.ContainsFileAddress(file_addr);
}		}

		void Symbol::SynthesizeNameIfNeeded() const {
		if (m_is_synthetic && !m_mangled) {
		// Synthetic symbol names don't mean anything, but they do uniquely
		// identify individual symbols so we give them a unique name. The name
		// starts with the synthetic symbol prefix, followed by a unique number.
		jingham: are -> is or maybe: starts with the synthetic symbol prefix, followed by a unique number
		// Typically the UserID of a real symbol is the symbol table index of the
		jingham: is -> of so this reads: Typically the UserID of a real symbol is ...
		// symbol in the object file's symbol table(s), so it will be the same
		// every time you read in the object file. We want the same persistence for
		jingham: I don't think you need the implementation detail here, you are stating policy. Starting from "Typically" I think something like the following is more direct: Typically the UserID of a real symbol is the symbol table index of the symbol in the object file's symbol table(s), so it will be the same every time you read in the object file. We want the same persistence for synthetic symbols so that users can identify them across multiple debug sessions, to understand crashes in those symbols and to reliably set breakpoints on them.
		// synthetic symbols so that users can identify them across multiple debug
		// sessions, to understand crashes in those symbols and to reliably set
		// breakpoints on them.
		llvm::SmallString<256> name;
		clayborg: This is the comment Jim Ingham asked for.
		llvm::raw_svector_ostream os(name);
		os << GetSyntheticSymbolPrefix() << GetID();
		m_mangled.SetDemangledName(ConstString(os.str()));
		}
		}
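
The on-demand naming in `SynthesizeNameIfNeeded()` above can be illustrated outside of LLDB with a minimal sketch. `SketchSymbol` and its members are hypothetical stand-ins, not LLDB classes, and the prefix string is assumed to match the `"___lldb_unnamed_symbol"` text used by the old code. The point is only that the name is built on first access from the prefix and the symbol ID, so unnamed synthetic symbols cost no string storage until someone asks for their name.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a symbol; it models only the lazy-name idea.
class SketchSymbol {
public:
  SketchSymbol(uint32_t id, bool is_synthetic, std::string name = "")
      : m_id(id), m_is_synthetic(is_synthetic), m_name(std::move(name)) {}

  // Assumed prefix, taken from the string the old code emitted.
  static const char *SyntheticPrefix() { return "___lldb_unnamed_symbol"; }

  // Returns the stored name. For an unnamed synthetic symbol the name is
  // generated on first access as prefix + ID and then cached.
  const std::string &GetName() const {
    if (m_is_synthetic && m_name.empty())
      m_name = SyntheticPrefix() + std::to_string(m_id);
    return m_name;
  }

  // True once a name is stored, either given up front or generated lazily.
  bool HasStoredName() const { return !m_name.empty(); }

private:
  uint32_t m_id;
  bool m_is_synthetic;
  mutable std::string m_name; // mutable so the const accessor can fill it in
};
```

Because the generated name depends only on the ID, two sessions that assign the same ID to a symbol would produce the same name, which is the persistence property the code comment asks for.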

lldb/source/Symbol/Symtab.cpp

(295 lines not shown)	if (!m_name_indexes_computed) {
RichManglingContext rmc;		RichManglingContext rmc;
for (uint32_t value = 0; value < num_symbols; ++value) {		for (uint32_t value = 0; value < num_symbols; ++value) {
Symbol *symbol = &m_symbols[value];		Symbol *symbol = &m_symbols[value];

// Don't let trampolines get into the lookup by name map If we ever need		// Don't let trampolines get into the lookup by name map If we ever need
// the trampoline symbols to be searchable by name we can remove this and		// the trampoline symbols to be searchable by name we can remove this and
// then possibly add a new bool to any of the Symtab functions that		// then possibly add a new bool to any of the Symtab functions that
// lookup symbols by name to indicate if they want trampolines.		// lookup symbols by name to indicate if they want trampolines.
if (symbol->IsTrampoline())		if (symbol->IsTrampoline() || symbol->IsSynthetic())
continue;		continue;

// If the symbol's name string matched a Mangled::ManglingScheme, it is		// If the symbol's name string matched a Mangled::ManglingScheme, it is
// stored in the mangled field.		// stored in the mangled field.
Mangled &mangled = symbol->GetMangled();		Mangled &mangled = symbol->GetMangled();
if (ConstString name = mangled.GetMangledName()) {		if (ConstString name = mangled.GetMangledName()) {
name_to_index.Append(name, value);		name_to_index.Append(name, value);

(310 lines not shown)	void Symtab::SortSymbolIndexesByValue(std::vector<uint32_t> &indexes,

// Remove any duplicates if requested		// Remove any duplicates if requested
if (remove_duplicates) {		if (remove_duplicates) {
auto last = std::unique(indexes.begin(), indexes.end());		auto last = std::unique(indexes.begin(), indexes.end());
indexes.erase(last, indexes.end());		indexes.erase(last, indexes.end());
}		}
}		}

		uint32_t Symtab::GetNameIndexes(ConstString symbol_name,
		std::vector<uint32_t> &indexes) {
		auto &name_to_index = GetNameToSymbolIndexMap(lldb::eFunctionNameTypeNone);
		const uint32_t count = name_to_index.GetValues(symbol_name, indexes);
		if (count)
		return count;
		// Synthetic symbol names are not added to the name indexes, but they start
		// with a prefix and end with the symbol UserID. This allows users to find
		jingham: so -> to
		// these symbols without having to add them to the name indexes. These
		wallace: these queries
		// queries will not happen very often since the names don't mean anything, so
		// performance is not paramount in this case.
		llvm::StringRef name = symbol_name.GetStringRef();
		// Strip the synthetic prefix if the name starts with it.
		if (!name.consume_front(Symbol::GetSyntheticSymbolPrefix()))
		return 0; // Not a synthetic symbol name

		// Extract the user ID from the symbol name
		user_id_t uid = 0;
		if (getAsUnsignedInteger(name, /*Radix=*/10, uid))
		shafik: `/*Radix=*/10`
		return 0; // Failed to extract the user ID as an integer
		Symbol *symbol = FindSymbolByID(uid);
		if (symbol == nullptr)
		aprantl: `if (!symbol)` ?
		return 0;
		const uint32_t symbol_idx = GetIndexForSymbol(symbol);
		if (symbol_idx == UINT32_MAX)
		return 0;
		indexes.push_back(symbol_idx);
		return 1;
		}
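
The fallback path in `GetNameIndexes()` above comes down to recognizing the synthetic prefix and decoding the decimal ID that follows it. Here is a minimal standalone sketch of that parsing step. `ParseSyntheticSymbolID` is a hypothetical helper, not an LLDB function, and the prefix is assumed to be `"___lldb_unnamed_symbol"`. It uses only the standard library instead of `llvm::StringRef::consume_front` and `getAsUnsignedInteger`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true and sets `id` when `name` is the assumed
// synthetic prefix followed by one or more decimal digits; otherwise returns
// false and leaves `id` untouched. Overflow is not checked in this sketch.
bool ParseSyntheticSymbolID(const std::string &name, uint64_t &id) {
  static const std::string prefix = "___lldb_unnamed_symbol";

  // Not a synthetic symbol name if it does not start with the prefix.
  if (name.compare(0, prefix.size(), prefix) != 0)
    return false;

  // The prefix alone carries no ID.
  if (name.size() == prefix.size())
    return false;

  // Everything after the prefix must be a base-10 number.
  uint64_t value = 0;
  for (std::size_t i = prefix.size(); i < name.size(); ++i) {
    const char c = name[i];
    if (c < '0' || c > '9')
      return false;
    value = value * 10 + static_cast<uint64_t>(c - '0');
  }

  id = value;
  return true;
}
```

A caller would then map the decoded ID back to a symbol, which is what the patch does with `FindSymbolByID` and `GetIndexForSymbol`. Since this path runs only when the regular name index has no match, a linear parse like this should not affect the common case.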

uint32_t Symtab::AppendSymbolIndexesWithName(ConstString symbol_name,		uint32_t Symtab::AppendSymbolIndexesWithName(ConstString symbol_name,
std::vector<uint32_t> &indexes) {		std::vector<uint32_t> &indexes) {
std::lock_guard<std::recursive_mutex> guard(m_mutex);		std::lock_guard<std::recursive_mutex> guard(m_mutex);

LLDB_SCOPED_TIMER();		LLDB_SCOPED_TIMER();
if (symbol_name) {		if (symbol_name) {
if (!m_name_indexes_computed)		if (!m_name_indexes_computed)
InitNameIndexes();		InitNameIndexes();

auto &name_to_index = GetNameToSymbolIndexMap(lldb::eFunctionNameTypeNone);		return GetNameIndexes(symbol_name, indexes);
return name_to_index.GetValues(symbol_name, indexes);
}		}
return 0;		return 0;
}		}

uint32_t Symtab::AppendSymbolIndexesWithName(ConstString symbol_name,		uint32_t Symtab::AppendSymbolIndexesWithName(ConstString symbol_name,
Debug symbol_debug_type,		Debug symbol_debug_type,
Visibility symbol_visibility,		Visibility symbol_visibility,
std::vector<uint32_t> &indexes) {		std::vector<uint32_t> &indexes) {
std::lock_guard<std::recursive_mutex> guard(m_mutex);		std::lock_guard<std::recursive_mutex> guard(m_mutex);

LLDB_SCOPED_TIMER();		LLDB_SCOPED_TIMER();
if (symbol_name) {		if (symbol_name) {
const size_t old_size = indexes.size();		const size_t old_size = indexes.size();
if (!m_name_indexes_computed)		if (!m_name_indexes_computed)
InitNameIndexes();		InitNameIndexes();

auto &name_to_index = GetNameToSymbolIndexMap(lldb::eFunctionNameTypeNone);
std::vector<uint32_t> all_name_indexes;		std::vector<uint32_t> all_name_indexes;
const size_t name_match_count =		const size_t name_match_count =
name_to_index.GetValues(symbol_name, all_name_indexes);		GetNameIndexes(symbol_name, all_name_indexes);
for (size_t i = 0; i < name_match_count; ++i) {		for (size_t i = 0; i < name_match_count; ++i) {
if (CheckSymbolAtIndex(all_name_indexes[i], symbol_debug_type,		if (CheckSymbolAtIndex(all_name_indexes[i], symbol_debug_type,
symbol_visibility))		symbol_visibility))
indexes.push_back(all_name_indexes[i]);		indexes.push_back(all_name_indexes[i]);
}		}
return indexes.size() - old_size;		return indexes.size() - old_size;
}		}
return 0;		return 0;
(450 lines not shown)