This is an archive of the discontinued LLVM Phabricator instance.

Differential D118812

[lldb] Add a setting to skip long mangled names
Abandoned, Public

Authored by JDevlieghere on Feb 2 2022, 9:50 AM.

Details

Reviewers

jingham
clayborg
labath

Summary

Libraries that rely heavily on templates (e.g. Boost) can generate extremely long symbol names, with mangled names tens of thousands of characters long. These symbols take a long time to demangle and can result in demangled names that are several megabytes in size. This patch adds a setting to skip these symbols when indexing the symbol table, to speed up launch/attach times and keep memory usage in check.

I arbitrarily picked 1000 as the default value, which seems large enough not to affect most workflows.

Event Timeline

JDevlieghere requested review of this revision. Feb 2 2022, 9:50 AM

JDevlieghere created this revision.

Harbormaster completed remote builds in B147175: Diff 405314. Feb 2 2022, 9:53 AM

Any chance you might want a limit on the size of the demangled name too? (might be worth considering what the most densely encoded mangled name is (ie: what's the longest name that could be produced by a 10k long mangled name? and see if that's worth having another cutoff for))

Is it worth trying to come up with a limit that's not arbitrarily picked?

Inline comment by kastiglione on lldb/source/Symbol/Symtab.cpp, line 300: technically `demangling_limit`

In D118812#3291109, @dblaikie wrote:

Any chance you might want a limit on the size of the demangled name too? (might be worth considering what the most densely encoded mangled name is (ie: what's the longest name that could be produced by a 10k long mangled name? and see if that's worth having another cutoff for))

Ironically, lldb seldom cares about most of the goo in these long demangled names. At this point, we are building up our fast-lookup "name indexes". We really only care about extracting the fully scoped names of the methods. When we get around to doing smart matching on overloads, we can still pull out all the matches to the method name, and then do the overload match on the results. That should be sufficiently efficient, and obviate the need to do any fancy indexing based on overloads. So most of the work of demangling these names is not being used anyway.

So what would be the better solution for lldb on the demangling front would be a way to tell the demangler "only extract the full method name, and don't bother producing the argument list or return values". But I have no idea how easy that would be in the demangler.

Note, I am not suggesting this as a replacement for this patch, since this fixes some really silly cases that template abuse has inflicted on us. This is a longer term suggestion.

JDevlieghere edited the summary of this revision. Feb 2 2022, 10:51 AM

JDevlieghere edited the summary of this revision.

In D118812#3291303, @jingham wrote:

Ironically, lldb seldom cares about most of the goo in these long demangled names. At this point, we are building up our fast-lookup "name indexes". We really only care about extracting the fully scoped names of the methods. When we get around to doing smart matching on overloads, we can still pull out all the matches to the method name, and then do the overload match on the results. That should be sufficiently efficient, and obviate the need to do any fancy indexing based on overloads. So most of the work of demangling these names is not being used anyway.

So what would be the better solution for lldb on the demangling front would be a way to tell the demangler "only extract the full method name, and don't bother producing the argument list or return values". But I have no idea how easy that would be in the demangler.

I think there's an API level of the demangler in LLVM designed for rewriting demangled names (@rsmith created/implemented that, I think) - I'm not sure if it's structured to allow lazy parsing/stopping after you get the base name, for instance, but maybe...

Can we put a limit on the length of the kinds of names we are willing to demangle in the first place? How long are some of these names _prior_ to demangling? It would be great if we could skip demangling names that are too long to begin with. That would allow us to skip trying to create the demangled name in the first place which is part of the memory problem right? Once the mangled name has been added to the ConstString pool it is already too late and we won't save any memory.

In D118812#3291937, @clayborg wrote:

Can we put a limit on the length of the kinds of names we are willing to demangle in the first place? How long are some of these names _prior_ to demangling? It would be great if we could skip demangling names that are too long to begin with. That would allow us to skip trying to create the demangled name in the first place which is part of the memory problem right? Once the mangled name has been added to the ConstString pool it is already too late and we won't save any memory.

Yep, that's exactly what this patch does: it checks the length of the mangled name and skips demangling if it is too long. It only does this while building the index though, so that if we need the demangled name elsewhere we'll still cache it in the string pool.
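
For readers skimming the thread, the index-time check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SymbolsToIndex` is a hypothetical standalone helper, not an LLDB function, and plain `std::string` stands in for LLDB's `Mangled`/`ConstString` types. The one behavior taken from the patch is that a limit of 0 means "no limit" and that only the mangled length is inspected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not LLDB API): returns the positions of the symbols
// that would be kept while building a name index.
// `limit` mirrors the proposed setting: 0 means "no limit".
std::vector<std::size_t>
SymbolsToIndex(const std::vector<std::string> &mangled_names,
               std::uint64_t limit) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < mangled_names.size(); ++i) {
    // Only the mangled length is inspected; nothing is demangled here.
    if (limit != 0 && mangled_names[i].size() > limit)
      continue;
    kept.push_back(i);
  }
  return kept;
}
```

Because the check happens only in this loop, any later on-demand request for a single demangled name would be unaffected, which matches the distinction drawn in the comment above.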

In D118812#3291482, @dblaikie wrote:

I think there's an API level of the demangler in LLVM designed for rewriting demangled names (@rsmith created/implemented that, I think) - I'm not sure if it's structured to allow lazy parsing/stopping after you get the base name, for instance, but maybe...

We should definitely look into that as a general optimization for indexing the string table, and it would make sense in combination with D118814. For this particular patch, we're trying to avoid demangling at all if the symbol is too long, so unless a partial demangle is really cheap (it might be) we'd still want to exclude symbols based on their mangled length.

In D118812#3291950, @JDevlieghere wrote:

Yep, that's exactly what this patch does: it checks the length of the mangled name and skips demangling if it is too long. It only does this while building the index though, so that if we need the demangled name elsewhere we'll still cache it in the string pool.

Yikes sorry, I read line Symtab.cpp:311 as if it was demangling the name first and then checking _its_ length...

The setting leads one to believe that we don't ever demangle names longer than X characters, but it really means "don't auto demangle when indexing the symbol table". Should we actually stop the names from ever being demangled in "Mangled::GetDemangledName()" to force us to never try and demangle these kinds of names ever? We would need to have the setting modify a static value in that Mangled class when the setting gets changed so that the Mangled::GetDemangledName() function wouldn't have to try and access the settings each time someone wanted to demangle something. Then we will still display the mangled name in stack traces, but we won't ever waste time trying to demangle these names. When they are that long, they are really not useful at all for me at least.
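
A minimal sketch of the "static value updated when the setting changes" idea, assuming a stand-in class. `ToyMangled`, `SetDemanglingLimit`, and `FakeDemangle` are hypothetical names invented for illustration; none of them exist in LLDB, and the real `Mangled` class works with `ConstString` rather than `std::string`.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a Mangled-like class (not LLDB's Mangled).
class ToyMangled {
public:
  // Would be called from a settings-changed callback, so the hot path
  // below never has to consult the settings machinery.
  static void SetDemanglingLimit(std::uint64_t limit) {
    s_limit.store(limit, std::memory_order_relaxed);
  }

  explicit ToyMangled(std::string mangled) : m_mangled(std::move(mangled)) {}

  // Returns an empty string when the mangled name exceeds the limit,
  // so a caller can fall back to showing the mangled name.
  std::string GetDemangledName() const {
    const std::uint64_t limit = s_limit.load(std::memory_order_relaxed);
    if (limit != 0 && m_mangled.size() > limit)
      return std::string();
    return FakeDemangle(m_mangled);
  }

private:
  // Placeholder for a real demangler call.
  static std::string FakeDemangle(const std::string &mangled) {
    return "demangled(" + mangled + ")";
  }

  static std::atomic<std::uint64_t> s_limit; // 0 means "no limit".
  std::string m_mangled;
};

std::atomic<std::uint64_t> ToyMangled::s_limit{0};
```

The atomic is an assumption made here so that a settings change on one thread is safe to read from another; the comment above only asks for "a static value".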

In D118812#3291969, @clayborg wrote:

Yikes sorry, I read line Symtab.cpp:311 as if it was demangling the name first and then checking _its_ length...

The setting leads one to believe that we don't ever demangle names longer than X characters, but it really means "don't auto demangle when indexing the symbol table". Should we actually stop the names from ever being demangled in "Mangled::GetDemangledName()" to force us to never try and demangle these kinds of names ever? We would need to have the setting modify a static value in that Mangled class when the setting gets changed so that the Mangled::GetDemangledName() function wouldn't have to try and access the settings each time someone wanted to demangle something. Then we will still display the mangled name in stack traces, but we won't ever waste time trying to demangle these names. When they are that long, they are really not useful at all for me at least.

Yeah, I considered that, but I noticed some uses of the demangled names in the language plugins and I wasn't sure if that was going to break. If we think that it never makes sense to demangle these names, I'm happy to move it into Mangled and make it so that we never demangle these.

If we can distinguish between "when we handle all mangled names" and "when we handle one" I think we should continue to demangle names in the "when we handle one" case, since you never know when somebody really might need to look at the whole name and that won't be super expensive even in the worst case. But OTOH that probably happens seldom, so if there's no good way to distinguish, I'm also fine with never demangling over some limit.

In D118812#3291954, @JDevlieghere wrote:

We should definitely look into that as a general optimization for indexing the string table, and it would make sense in combination with D118814. For this particular patch, we're trying to avoid demangling at all if the symbol is too long, so unless a partial demangle is really cheap (it might be) we'd still want to exclude symbols based on their mangled length.

The most expensive step in demangling is the actual construction of the demangled string. It's fairly easy to make that exponential (because the output string can be exponentially larger than the input). The construction of the AST (well, a kind of DAG, actually) should always be linear.

And extracting the name this way will also save us from having to do another parse of the demangled name (to extract the base name), so it's double goodness. I don't think the actual extraction should be that hard. The trickiest part is understanding the way in which the names are encoded so that you know what to look for.

In D118812#3291109, @dblaikie wrote:

(ie: what's the longest name that could be produced by a 10k long mangled name? and see if that's worth having another cutoff for)

I can create a 1MB demangled name from a string of just 123 bytes. By my estimate, a 1000-char mangled name could produce a demangled name whose length would have 43 digits. 10k mangled => 10^430 demangled, etc.
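
To illustrate why back-references make this blow-up possible, here is a toy sketch. It deliberately does not implement the Itanium ABI: the grammar (`i`, `P`, and single-digit back-references) and the names `ToyDemangler`, `MakeToyMangled`, and `ToyDemangle` are all invented for this example. Each extra level costs two input characters but roughly doubles the output, so the numbers it produces are not comparable to the estimates above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy grammar (NOT the Itanium ABI):
//   'i'       -> "int"
//   'P' X Y   -> "Pair<X, Y>"; the result is remembered as a substitution
//   '0'..'9'  -> the n-th remembered substitution
struct ToyDemangler {
  const std::string &in;
  std::size_t pos = 0;
  std::vector<std::string> subs;

  explicit ToyDemangler(const std::string &s) : in(s) {}

  std::string Parse() {
    assert(pos < in.size());
    const char c = in[pos++];
    if (c == 'i')
      return "int";
    if (c == 'P') {
      std::string first = Parse();
      std::string second = Parse();
      std::string out = "Pair<" + first + ", " + second + ">";
      subs.push_back(out);
      return out;
    }
    // Back-reference: copies a previously built string in full, which is
    // where the exponential growth of the output comes from.
    assert(c >= '0' && c <= '9');
    const std::size_t idx = static_cast<std::size_t>(c - '0');
    assert(idx < subs.size());
    return subs[idx];
  }
};

// Builds a toy mangled name with `depth` nested pairs (1 <= depth <= 11,
// because back-references are limited to a single digit here).
std::string MakeToyMangled(int depth) {
  assert(depth >= 1 && depth <= 11);
  std::string s(static_cast<std::size_t>(depth), 'P');
  s += "ii";
  for (int i = 0; i + 1 < depth; ++i)
    s += static_cast<char>('0' + i);
  return s;
}

std::string ToyDemangle(const std::string &mangled) {
  ToyDemangler d(mangled);
  return d.Parse();
}
```

In this toy, a name of depth k has 2k + 1 input characters and 11 * 2^k - 8 output characters, so the 21-character input for depth 10 expands to 11256 characters. Building only the tree of nodes would stay linear in the input; it is materializing the string that is expensive, which is the point made in the earlier comment about the AST.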

In D118812#3292884, @labath wrote:

And extracting the name this way will also save us from having to do another parse of the demangled name (to extract the base name), so it's double goodness. I don't think the actual extraction should be that hard.

Actually, I now see that we already have code which does just that. The mangled.DemangleWithRichManglingInfo call will use the "partial" demangler to extract the interesting pieces of the mangled name. It will also save that demangled name, but it only does that to avoid another demangling operation. We could easily make it skip that step. And I see that the other patch does just that...

So I guess my question is: what does this patch buy us vs. just doing the second patch alone? It seems like it would be nice to be able to let the user do break set -n foo and have it stop on Something<RidiculouslyLongTemplate<...>>::foo.

Is it maybe because then the Something<RidiculouslyLongTemplate<...>> will be stored as a part of foo's "context"? If so, maybe we could have it avoid storing the context instead? Or even better: store a simplified version of the scope with template gunk above some level removed?

JDevlieghere mentioned this in D118814: [lldb] Don't keep demangled names in memory after indexing. Feb 3 2022, 8:35 AM

Thanks Pavel. I should've dug a little deeper in the rich mangling context. After giving it a second look I think you're absolutely right and we can drop this patch in favor of D118814.

Revision Contents

Path                                  Size
lldb/include/lldb/Core/ModuleList.h   1 line
lldb/source/Core/CoreProperties.td    4 lines
lldb/source/Core/ModuleList.cpp       6 lines
lldb/source/Symbol/Symtab.cpp         9 lines

Diff 405314

lldb/include/lldb/Core/ModuleList.h

[...]
 class ModuleListProperties : public Properties {

   void UpdateSymlinkMappings();

 public:
   ModuleListProperties();

   FileSpec GetClangModulesCachePath() const;
   bool SetClangModulesCachePath(const FileSpec &path);
+  uint64_t GetDemanglingLimit() const;
   bool GetEnableExternalLookup() const;
   bool SetEnableExternalLookup(bool new_value);
   bool GetEnableLLDBIndexCache() const;
   bool SetEnableLLDBIndexCache(bool new_value);
   uint64_t GetLLDBIndexCacheMaxByteSize();
   uint64_t GetLLDBIndexCacheMaxPercent();
   uint64_t GetLLDBIndexCacheExpirationDays();
   FileSpec GetLLDBIndexCachePath() const;
[...]

lldb/source/Core/CoreProperties.td

 include "../../include/lldb/Core/PropertiesBase.td"

 let Definition = "modulelist" in {
   def EnableExternalLookup: Property<"enable-external-lookup", "Boolean">,
     Global,
     DefaultTrue,
     Desc<"Control the use of external tools and repositories to locate symbol files. Directories listed in target.debug-file-search-paths and directory of the executable are always checked first for separate debug info files. Then depending on this setting: On macOS, Spotlight would be also used to locate a matching .dSYM bundle based on the UUID of the executable. On NetBSD, directory /usr/libdata/debug would be also searched. On platforms other than NetBSD directory /usr/lib/debug would be also searched.">;
   def ClangModulesCachePath: Property<"clang-modules-cache-path", "FileSpec">,
     Global,
     DefaultStringValue<"">,
     Desc<"The path to the clang modules cache directory (-fmodules-cache-path).">;
+  def DemanglingLimit: Property<"demangling-max-length", "UInt64">,
+    Global,
+    DefaultUnsignedValue<1000>,
+    Desc<"The maximum length of the mangled symbol name. Mangled symbols that exceed this threshold will not be demangled when indexing the symbol table. A value of 0 means no limit.">;
   def SymLinkPaths: Property<"debug-info-symlink-paths", "FileSpecList">,
     Global,
     DefaultStringValue<"">,
     Desc<"Debug info path which should be resolved while parsing, relative to the host filesystem.">;
   def EnableLLDBIndexCache: Property<"enable-lldb-index-cache", "Boolean">,
     Global,
     DefaultFalse,
     Desc<"Enable caching for debug sessions in LLDB. LLDB can cache data for each module for improved performance in subsequent debug sessions.">;
[...]

lldb/source/Core/ModuleList.cpp

[...]
 FileSpec ModuleListProperties::GetLLDBIndexCachePath() const {
   return m_collection_sp
       ->GetPropertyAtIndexAsOptionValueFileSpec(nullptr, false,
                                                 ePropertyLLDBIndexCachePath)
       ->GetCurrentValue();
 }

+uint64_t ModuleListProperties::GetDemanglingLimit() const {
+  const uint32_t idx = ePropertyDemanglingLimit;
+  return m_collection_sp->GetPropertyAtIndexAsUInt64(
+      nullptr, idx, g_modulelist_properties[idx].default_uint_value);
+}
+
 bool ModuleListProperties::SetLLDBIndexCachePath(const FileSpec &path) {
   return m_collection_sp->SetPropertyAtIndexAsFileSpec(
       nullptr, ePropertyLLDBIndexCachePath, path);
 }

 bool ModuleListProperties::GetEnableLLDBIndexCache() const {
   const uint32_t idx = ePropertyEnableLLDBIndexCache;
   return m_collection_sp->GetPropertyAtIndexAsBoolean(
[...]

lldb/source/Symbol/Symtab.cpp

[...]
   if (!m_name_indexes_computed) {
     name_to_index.Reserve(num_symbols);

     // The "const char *" in "class_contexts" and backlog::value_type::second
     // must come from a ConstString::GetCString()
     std::set<const char *> class_contexts;
     std::vector<std::pair<NameToIndexMap::Entry, const char *>> backlog;
     backlog.reserve(num_symbols / 2);

+    const uint64_t mangling_limit =
+        ModuleList::GetGlobalModuleListProperties().GetDemanglingLimit();
+
     // Instantiation of the demangler is expensive, so better use a single one
     // for all entries during batch processing.
     RichManglingContext rmc;
     for (uint32_t value = 0; value < num_symbols; ++value) {
       Symbol *symbol = &m_symbols[value];
+      Mangled &mangled = symbol->GetMangled();
+
+      if (mangling_limit)
+        if (mangled.GetMangledName().GetLength() > mangling_limit)
+          continue;

       // Don't let trampolines get into the lookup by name map If we ever need
       // the trampoline symbols to be searchable by name we can remove this and
       // then possibly add a new bool to any of the Symtab functions that
       // lookup symbols by name to indicate if they want trampolines. We also
       // don't want any synthetic symbols with auto generated names in the
       // name lookups.
       if (symbol->IsTrampoline() || symbol->IsSyntheticWithAutoGeneratedName())
         continue;

       // If the symbol's name string matched a Mangled::ManglingScheme, it is
       // stored in the mangled field.
-      Mangled &mangled = symbol->GetMangled();
       if (ConstString name = mangled.GetMangledName()) {
         name_to_index.Append(name, value);

         if (symbol->ContainsLinkerAnnotations()) {
           // If the symbol has linker annotations, also add the version without
           // the annotations.
           ConstString stripped = ConstString(
               m_objfile->StripLinkerSymbolAnnotations(name.GetStringRef()));
[...]

Inline comment by kastiglione on the `mangling_limit` declaration: technically `demangling_limit`