This is an archive of the discontinued LLVM Phabricator instance.

[lldb] add new overload for SymbolFile::FindTypes that accepts a scope
AcceptedPublic

Authored by lassefolger on Nov 26 2021, 4:06 AM.

Details

Summary

Searching for fully qualified types in a SymbolFile can be inefficient
if it contains many types with the same basename. This is because the
SymbolFile does not offer and an interface to search for qualified
types. To get qualified typed you first need to query all types with the
same basename and then filter based on the scope.

This change introduces a new overload that accepts a scope which the
SymbolFile implementations can use to return filtered by scope.

This change also includes an implementation for DWARF that uses this
scope for filtering. All other implementation use the overload without
scope and filter afterwards.

The motivation for this change is a performance issue when findings
types in targets that contain many types (>1000) with the same basename.
In the two scenarios I tested this change on, the change improves the
time it takes to find a time from 86s -> 12s and from 12s -> 2s,
respectively.

Diff Detail

Event Timeline

lassefolger requested review of this revision.Nov 26 2021, 4:06 AM
lassefolger created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2021, 4:06 AM

rebased on nfc

werat added inline comments.Nov 26 2021, 4:28 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2468

Can GetQualifiedName return NULL string?

2468–2469

We can skip this check altogether if the scope is empty.

2469

There's no need to calculate length on every iteration.

Also you can make sc an llvm::StringRef and do something like StringRef(s).startswith(sc).

shafik added a subscriber: shafik.Nov 30 2021, 10:47 AM

Is there a specific use case that motivated this feature?

Is there a specific use case that motivated this feature?

The motivation is performance. If there are many (>1000) types with the same base name this implementation performs much better than the current implementation.
This is mostly important for the first access (because later ones use the caches results). In our scenarios it reduced the time for the first access from 86s to 6s and from 12s to 2s. And the access for subsequest accesses from 0.15 to 0.08 and 0.12 to 0.06.

The primary reason is that Type::GetQualifiedName is executed for every type with the same basename when the types are filtered at the end. This triggers type completion each type which is very expensive.

I'm still working on this commit.

rebased on nfc and completed implementation to expose function in API

lassefolger edited the summary of this revision. (Show Details)Dec 5 2021, 1:15 AM
lassefolger added reviewers: werat, teemperor.
lassefolger marked 3 inline comments as done.Dec 5 2021, 1:16 AM
lassefolger added inline comments.Dec 7 2021, 5:49 AM
lldb/source/Symbol/SymbolFile.cpp
148

I need to change this. This was correct in the first iteration but is no longer correct in the current iteration of this change.

werat added inline comments.Dec 7 2021, 7:24 AM
lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2461

nit: maybe call it scope_ref?

2470

llvm::StringRef(storage) is slightly more efficient here than llvm::StringRef(qn). The latter uses strlen in the constructor to calculate the length of the source string.

lassefolger marked 2 inline comments as done.

fixed comment

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2470

I wasn't sure about the semantics of die.GetQualifiedName().
It does not document whether it's guaranteed that storage contains the results. FWIW, the current implementation does that.
I changed it to use the storage.

rebase patch

werat accepted this revision.Feb 4 2022, 3:45 AM

Looks good to me, I'm going to land it soon unless there are objections from other reviewers.

This revision is now accepted and ready to land.Feb 4 2022, 3:45 AM
JDevlieghere added inline comments.Feb 4 2022, 10:24 AM
lldb/include/lldb/Symbol/SymbolFile.h
238–240
242

Does the scope need to be a ConstString? These strings are kept in memory forever and we should be mindful of their use. Could this a StringRef?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2466

Can this go into if (has_scope) {?

shafik accepted this revision.Feb 4 2022, 11:57 AM

LGTM after addressing Jonas's comments.

lassefolger marked 2 inline comments as done.

update according to comments

lldb/include/lldb/Symbol/SymbolFile.h
242

I don't think it has to be a ConstString. I used a ConstString for consistency with name.
Let me change it to StringRef

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2466

I don't mind putting it in there but thought reusing the same storage would avoid allocations.
Moving it into the function will lead to an allocation each time the function is called and the qualified name is too long for small string optimization.

Sorry I am late to making comments on this patch.

This patch wants to add "llvm::StringRef scope" to the FindTypes calls, but this is what "const CompilerDeclContext &parent_decl_ctx" is actually for. If "parent_decl_ctx" is set, it is a declaration context that each type must exist in. So the CompilerDeclContext might be something like "namespace foo" and it might have a parent decl context of "namespace bar". This is currently how filtering types that exist in a decl context works.

The "scope" seems redundant in light of this no? Also, a string based "scope" isn't accurate enough to correctly identify the exact scoping of a type as "a:🅱️:some_type" might be "namespace a -> class b -> some_type" or "namespace a -> namespace b -> some_type".

The one difficult thing about this is currently the "parent_decl_ctx" that is a parameter only works if the decl context comes from the current module. Each module creates types using its own TypeSystem subclass. Those types are vended as CompilerType objects which contain the TypeSystem pointer and a void * for the type. This allows each TypeSystem to create types. For clang compiled code we use lldb_private::TypeSystemClang and we create types as types in a clang::ASTContext. We can also specify declaration contexts by using CompilerDeclContext objects. Each CompilerDeclContext object has a "TypeSystem *" and a "void *" inside of it. Each expression makes its own TypeSystem as well, and one for each expression and one in the target. It should be possible to fix the issue of needing the compiler decl context to come from the current module by having the code that compares the decl context make by using:

ConstString CompilerDeclContext::GetScopeQualifiedName() const;

Or we can modify the CompilerDeclContext class to get the current name:

ConstString CompilerDeclContext::GetName() const;

And then also add an API to return an enumeration for the kind of the declaration context (Namespace, Class, Struct, Union, etc).

Sorry I am late to making comments on this patch.

This patch wants to add "llvm::StringRef scope" to the FindTypes calls, but this is what "const CompilerDeclContext &parent_decl_ctx" is actually for. If "parent_decl_ctx" is set, it is a declaration context that each type must exist in. So the CompilerDeclContext might be something like "namespace foo" and it might have a parent decl context of "namespace bar". This is currently how filtering types that exist in a decl context works.

The "scope" seems redundant in light of this no? Also, a string based "scope" isn't accurate enough to correctly identify the exact scoping of a type as "a:🅱️:some_type" might be "namespace a -> class b -> some_type" or "namespace a -> namespace b -> some_type".

The one difficult thing about this is currently the "parent_decl_ctx" that is a parameter only works if the decl context comes from the current module. Each module creates types using its own TypeSystem subclass. Those types are vended as CompilerType objects which contain the TypeSystem pointer and a void * for the type. This allows each TypeSystem to create types. For clang compiled code we use lldb_private::TypeSystemClang and we create types as types in a clang::ASTContext. We can also specify declaration contexts by using CompilerDeclContext objects. Each CompilerDeclContext object has a "TypeSystem *" and a "void *" inside of it. Each expression makes its own TypeSystem as well, and one for each expression and one in the target. It should be possible to fix the issue of needing the compiler decl context to come from the current module by having the code that compares the decl context make by using:

ConstString CompilerDeclContext::GetScopeQualifiedName() const;

Or we can modify the CompilerDeclContext class to get the current name:

ConstString CompilerDeclContext::GetName() const;

And then also add an API to return an enumeration for the kind of the declaration context (Namespace, Class, Struct, Union, etc).

The idea is that the scope is a hint which implementations can use to improve the performance.
However, I see that this redundancy might be confusing. I have not looked to much into the ComplierDeclContext.
I'll try to take another look later this week.

FWIW there has been a discussion between werat@ and teemperor@ about this on Discord about this topic in general. The results was to use a string rather than a CompilerDeclContext. Although the discussion didn't go into detail why this is:
https://discord.com/channels/636084430946959380/636732809708306432/849218944581632001

I'm still planning to look into this just couldn't find the time this week and the last. Sorry for the delay.

Sorry for the very late response.

I looked into using the CompilerDeclContext().

This does not work in this case because the context is not set in this specific use case and I'm not sure if it case be set.
It is initialized with an invalid context when the function is called.

The entry point in our use case is lldb::SBTarget::FindTypes(const char *). A stacktrace looks like this:
SymbolFileDWARF::FindTypes()
lldb_private::Module::FindTypes_Impl()
lldb_private::ModuleList::FindTypes()
lldb::SBTarget::FindTypes()

Given that the input is just a C-string, I'm not sure how to derive a CompilerDeclContext() from that.
What we are trying to optimize is a call like:
lldb.target.FindTypes("Some::Namespace::Type")

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 1:27 AM