This is an archive of the discontinued LLVM Phabricator instance.

Make only one function that needs to be implemented when searching for types.
AcceptedPublic

Authored by clayborg on Nov 12 2022, 5:55 PM.

Details

Summary

This patch started off trying to fix Module::FindFirstType() as it sometimes didn't work. The issue was the SymbolFile plug-ins didn't do any filtering of the matching types they produced, and they only looked up types using the type basename. This means if you have two types with the same basename, your type lookup can fail when only looking up a single type. We would ask the Module::FindFirstType to lookup "Foo::Bar" and it would ask the symbol file to find only 1 type matching the basename "Bar", and then we would filter out any matches that didn't match "Foo::Bar". So if the SymbolFile found "Foo::Bar" first, then it would work, but if it found "Baz::Bar" first, it would return only that type and it would be filtered out.

Discovering this issue lead me to think of the patch Alex Langford did a few months ago that was done for finding functions, where he allowed SymbolFile objects to make sure something fully matched before parsing the debug information into an AST type and other LLDB types. So this patch aimed to allow type lookups to also be much more efficient.

As LLDB has been developed over the years, we added more ways to to type lookups. These functions have lots of arguments. This patch aims to make one API that needs to be implemented that serves all previous lookups:

  • Find a single type
  • Find all types
  • Find types in a namespace

This patch introduces a TypeMatch class that contains all of the settings and state needed for all previous type lookups. It contain a vector of CompilerContext objects that can fully or partially specify the lookup that needs to take place. This is controlled by the TypeMatch.m_exact_match boolean.

If you just want to lookup all types with a matching basename, regardless of the containing context, you can specify just a single CompilerContext entry that has a name and a CompilerContextKind mask of CompilerContextKind::AnyType.

Or you can fully specify the exact context to use when doing lookups like:
CompilerContextKind::Namespace "std"
CompilerContextKind::Class "foo"
CompilerContextKind::Typedef "size_type"

This change expands on the clang modules code that already used a vector<CompilerContext> items, but it modifies it to work with user type lookups. The clang modules type lookup is still an option that can be enabled o n the TypeMatch object.

This mirrors the most recent addition of type lookups that took a vector<CompilerContext> that allowed lookups to happen for the expression parser in certain places.

Prior to this we had the following APIs in Module:

void
Module::FindTypes(ConstString type_name, bool exact_match, size_t max_matches,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeList &types);

void
Module::FindTypes(llvm::ArrayRef<CompilerContext> pattern, LanguageSet languages,
                  llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
                  TypeMap &types);

void Module::FindTypesInNamespace(ConstString type_name,
                                  const CompilerDeclContext &parent_decl_ctx,
                                  size_t max_matches, TypeList &type_list);

The new Module API is much simpler. It gets rid of all three above functions and replaces them with:

void Module::FindTypes(TypeMatch &match);

The TypeMatch class contains all of the needed settings:

  • The vector<CompilerContext> that allow efficient lookups in the symbol file classes since they can look at basename matches only realize fully matching types. Before this any basename that matched was fully realized only to be removed later by code outside of the SymbolFile layer which could cause many types to be realized when they didn't need to.
  • If the lookup is exact or not. If not exact, then the compiler context must match the bottom most items that match the compiler context, otherwise it must match exactly
  • The max number of matches
  • If the compiler context match is for clang modules or not. Clang modules matches include a Module compiler context kind that allows types to be matched only from certain modules and these matches are not needed when d oing user type lookups.
  • An optional list of languages to use to limit the search to only certain languages
  • The set of SymbolFile objects that have already been searched
  • The matching type list for any matches that are found

The benefits of this approach are:

  • Simpler API, and only one API to implement in SymbolFile classes
  • Replaces the FindTypesInNamespace that used a CompilerDeclContext as a way to limit the search, but this only worked if the TypeSystem matched the current symbol file's type system, so you couldn't use it to lookup a type in another module
  • Fixes a serious bug in our FindFirstType functions where if we were searching for "foo::bar", and we found a "baz::bar" first, the basename would match and we would only fetch 1 type using the basename, only to drop it from the matching list and returning no results

Diff Detail

Event Timeline

clayborg created this revision.Nov 12 2022, 5:55 PM
Herald added a project: Restricted Project. · View Herald Transcript
clayborg requested review of this revision.Nov 12 2022, 5:55 PM
Herald added a project: Restricted Project. · View Herald Transcript
clayborg updated this revision to Diff 475032.Nov 13 2022, 5:09 PM

Added a test case for the failing cases where there are two types in a namespace or other declaration context and only one would previously be found, and the other would fail.

clayborg updated this revision to Diff 475208.Nov 14 2022, 10:37 AM

Remove unused local variables.

Thanks for looking into this. I'm not sure I like how TypeMatch is both input and output of the type search. What do you think about making an immutable "TypeQuery" object that describes the search characteristics and have the API return a TypeMap? Apart from making it easier to reason about it would, e.g., allow caching the result of a type lookup. I'm not sure what to du about the "visited" map in that case. Maybe it could be a mutable member of the query object. Ideally the TypeQuery object would have only constructors and getters and no methods that change its state.

Thanks for looking into this. I'm not sure I like how TypeMatch is both input and output of the type search. What do you think about making an immutable "TypeQuery" object that describes the search characteristics and have the API return a TypeMap?

The main issue is TypeMap has outlawed copy constructors and assignment operators for some reason. And the reason to contain all of the results in the object itself is because we often call a type search on a specific module first and then fall back to other modules. Having the TypeMap be part of the TypeMatch means the TypeMatch object can make decisions on if it is done or not (found the max number of matches, etc) and we only need to pass one object into each call and avoid copying and merging TypeMap objects which will just add inefficiency. So think of an API where we have:

TypeMap Module::FindTypes(TypeMatch &match);

If we want to search a module list for a type using this API, we will need to be able to pass the existing TypeMap result into each call to the above API.

TypeMap ModuleList::FindTypes(TypeMatch &match) {
  TypeMap result;
  for (auto module: m_modules) {
    TypeMap module_result = module.FindTypes(match);
    result.InsertUnique(module_result); // Copying things from the module_result to result will be a bit inefficient here
    // And now to check if the type match is done we need to pass this TypeMap into the match object
    if (match.Done(result))
      return;
  }
}

Right now the TypeMatch object has all of the state inside of it and it doesn't allow for coding mistakes, like if above someone instead of running "match.Done(result)" accidentally did "match.Done(module_result)" (using the wrong TypeMap to check if the matching is done. This is why I ended up with a single object that doesn't allow for any mistakes when using the APIs.

Let me know if you feel strongly about modifying this.

Apart from making it easier to reason about it would, e.g., allow caching the result of a type lookup. I'm not sure what to du about the "visited" map in that case. Maybe it could be a mutable member of the query object.

Ideally the TypeQuery object would have only constructors and getters and no methods that change its state.

I will try and get rid of the SetMatchContext functions and use only constructors and see how it looks.

I just got time to run a quick comparison to see how much more efficient this patch is over what we currently have when doing lookups. Since we only resolve types whose context matches now, we parse many orders of magnitude fewer types. I added the following log line into SymbolFileDWARF::ParseType(...):

LLDB_LOG(GetLog(LLDBLog::Symbols), "ParseType for die @ {0:x} {1} (\"{2}\")", die.GetOffset(), die.GetTagAsCString(), die.GetName());

And in this fixed LLDB we see we find resolve 10 types when searching a debug version of LLDB when searching for "llvm::StringRef::iterator":

(lldb) target create "../Debug/bin/lldb"
Current executable set to '/Users/gclayton/Documents/src/lldb/main/Debug/bin/lldb' (arm64).
(lldb) log enable lldb symbol
(lldb) type lookup llvm::StringRef::iterator
 ParseType for die @ 0x2265 DW_TAG_typedef ("iterator")
 ParseType for die @ 0x2187 DW_TAG_class_type ("StringRef")
 ParseType for die @ 0x12e DW_TAG_typedef ("iterator")
 ParseType for die @ 0x50 DW_TAG_class_type ("StringRef")
 ParseType for die @ 0x8fb0d DW_TAG_pointer_type ("")
 ParseType for die @ 0x8fb12 DW_TAG_const_type ("")
 ParseType for die @ 0x8fb18 DW_TAG_base_type ("char")
 ParseType for die @ 0x28efa DW_TAG_pointer_type ("")
 ParseType for die @ 0x28eff DW_TAG_const_type ("")
 ParseType for die @ 0x28f05 DW_TAG_base_type ("char")
const char *
const char *
(lldb) quit

So we parse 10 types now, where as before we parsed 2242541 types. I picked the typename "llvm::StringRef::iterator" because the previous code would have done a lookup on "iterator" and then weed out the resolved types before returning to the user. The old LLDB will every single "iterator" in any STL class, or any LLVM classes. And along the way it will resolve the containing classes and all of the types needed for the methods in the class, etc.

So this type lookup is 5 orders of magnitude better than we used to be!

clayborg updated this revision to Diff 475310.Nov 14 2022, 4:47 PM

Make TypeMatch immutable (const) in type searches. Moves the TypeMap out and pass it in by reference to it can be populated in the FindTypes(...) methods that plug-ins override. This keeps the TypeMatch object const so it can't change, and allows results to be efficienctly populated by passing the TypeMap by reference to call sites.

I still have a few more questions about the interface design here. May main goal is to eliminate as much mutable state and objects being passed by reference as makes sense (and no more :-).
In that vain I'm trying to get rid of all Set...() methods and all places where we pass a mutable reference to something. The accumulating TypeMap may be a good example where that isn't possible, though I wonder how common global exhaustive searches are and if we usually wouldn't stop in most cases after we found the first match?

I appreciate splitting it up into

FindTypes(match, results);

My hope was for match to be completely immutable and passed by value (const reference if need be), so I wonder if the m_searched_symbol_files dictionary shouldn't be part of the result object?
Are all the setters really necessary? Could we just have a bunch of static "create...()" members that instantiate an immutable search specification?

As far as the naming is concerned, what's currently called TypeMatch should maybe be called TypeQuery, since match seems to imply a result?

Why is FindFirstType a member of TypeMatch? I would have expected a function called FindFirstType to take a TypeMatch search specification as an argument?

lldb/include/lldb/Symbol/CompilerDecl.h
93

Why can't this be a return value? The context objects are tiny.

lldb/include/lldb/Symbol/CompilerDeclContext.h
66

same here

lldb/include/lldb/Symbol/Type.h
185

An ArrayRef is meant to be passed by value.

197

if?

241

Is this necessary, or could this be rolled into the constructor?

306

This sentence is missing at least one word. (I also don't quite understand the purpose of this flag)

I still have a few more questions about the interface design here. May main goal is to eliminate as much mutable state and objects being passed by reference as makes sense (and no more :-).
In that vain I'm trying to get rid of all Set...() methods and all places where we pass a mutable reference to something. The accumulating TypeMap may be a good example where that isn't possible, though I wonder how common global exhaustive searches are and if we usually wouldn't stop in most cases after we found the first match?

We look up many matches in the expression parser when we have a context to use for the lookup, and then we just have a base type name and we must return all of the results. Also people might want to lookup all of the types for a qualified name. So multiple matches are needed and need to span across modules so that expression lookups can get all of the information they need.

I appreciate splitting it up into

FindTypes(match, results);

My hope was for match to be completely immutable and passed by value (const reference if need be), so I wonder if the m_searched_symbol_files dictionary shouldn't be part of the result object?
Are all the setters really necessary? Could we just have a bunch of static "create...()" members that instantiate an immutable search specification?

As far as the naming is concerned, what's currently called TypeMatch should maybe be called TypeQuery, since match seems to imply a result?

I will rename TypeMatch to TypeQuery and try to see if I can put all mutable parts into a TypeQueryResults,

Why is FindFirstType a member of TypeMatch? I would have expected a function called FindFirstType to take a TypeMatch search specification as an argument?

It just sets all of the settings needed to do the TypeMatch correctly and then does the query. I avoids a bunch of duplicated code all over the place where call locations were creating a TypeMatch object with some query, then setting the max number of matches to 1 and then doing the type query. So it is a convenience function and allows us to modify anything needed when finding the first type and all locations that call this won't ever need to change if we modify the TypeMatch API.

I will try and break things up further and see if I can get rid of set accessors of TypeMatch (soon to be TypeQuery. I want to avoid having too many parameters to a constructor or create function, but it depends on how these queries are used all over. Let me do one more round of cleanup and you can let me know what you think,

clayborg updated this revision to Diff 476917.Nov 21 2022, 8:30 AM

Renamed TypeMatch to TypeQuery and made a new TypeResults class that contains the type query results and map of which symbol files were visited. Got rid of the max number of matches in favor if a boolean that indicates if we want to just find one type.

Adrian: so it doesn't seem to make sense to try and get rid of all set accessors because it really complicates the way the TypeQuery is used. We have places where a type query is being create and modified as we parse options from a command line command and we need to change the type query. I have gotten rid of most of the set accessors and mainly try to use one of 3 or 4 constructors now. There is only 1 set accessor now. Let me know what you think.

I think the void FindTypes(const TypeQuery &query, TypeResults &results); API looks like a reasonable compromise. I have a bunch of smaller questions inside, but otherwise I think this can work.

lldb/include/lldb/Core/Module.h
435

Why is this not taking a TypeQuery that wraps a name?

442–444

copy&paste error?

lldb/include/lldb/Core/ModuleList.h
350–357

needs update now

373

same question — why is this not a TypeQuery?

lldb/include/lldb/Symbol/CompilerDecl.h
90

declaration

93

ping?

lldb/include/lldb/Symbol/Type.h
85

Can we get rid of this now?

231

Would you mind sorting this closer to the constructors, so it doesn't appear in the middle of the list of functions that FindTypes implementation would use?

we could even wrap the setup / match functions in a doxygen group
/ \{
/ \}

275

Could this return an ArrayRef<CompilerContext>?

clayborg added inline comments.Nov 30 2022, 9:36 PM
lldb/include/lldb/Core/Module.h
435

This function doesn't need to exist as long as the "TypeSP TypeQuery::FindFirstType(Module)" function is ok to have around.

442–444

yep!

lldb/include/lldb/Core/ModuleList.h
373

Like in Module.h we can get rid of this as long as the convenience functions in TypeQuery::FindFirstType() are ok.

lldb/include/lldb/Symbol/CompilerDecl.h
90

will fix

93

Will change!

lldb/include/lldb/Symbol/Type.h
85

I believe so! I will change "default" to "delete".

241

I will add it to the constructor.

306

I will reword

aprantl added inline comments.Dec 1 2022, 9:05 AM
lldb/include/lldb/Core/Module.h
435

It would be good to have some consistency in the interface:

  • Either TypeQuery provides Find* methods that take a Module
  • or Module provides Find* methods that take a TypeQuery.

Why does this interface exist instead of having a "FindFirstOnly" flag in TypeQuery? Is it because of the overhead of having to dig the type out of a TypeMap?

Otherwise we could just have one method

Module::FindTypes(const TypeQuery &query, TypeResults &results);

that can be called like this

module_sp->FindTypes(TypeQuery::createFindFirstByName("name", e_exact_match), result);
TypeSP type = result.GetTypeAtIndex(0); /// better have a special method for this too
lldb/include/lldb/Symbol/Type.h
296

/// This is the result of a \ref TypeQuery.

299

document what the bool flag does?

clayborg updated this revision to Diff 479399.Dec 1 2022, 1:13 PM

Addressed review comments:

  • remove FindFirstType from Module and ModuleList
  • fix comment typos
  • remove TypeQuery::SetModuleSearch(...) accessor and add bool to constructors
clayborg updated this revision to Diff 479408.Dec 1 2022, 1:27 PM

Update headerdoc on the TypeResults class and methods.

clayborg added inline comments.Dec 1 2022, 8:36 PM
lldb/include/lldb/Core/Module.h
435

See what you think of the current code and let me know any changes you would like to see.

lldb/include/lldb/Symbol/CompilerDecl.h
93

I looked around LLVM code and there are many accessors that use this technique. Otherwise we need to enforce an exact SmallVector type as the return code and we can't use "llvm::SmallVectorImpl<lldb_private::CompilerContext>". I can change this to return a "llvm::SmallVector<CompilerContext, 4>" if needed? But that locks us into a specific small vector with a specific size of preached items.

clayborg added inline comments.Dec 1 2022, 8:41 PM
lldb/include/lldb/Symbol/CompilerDecl.h
93

change "preached" to "preset" above...

I have only two concerns left:

  1. IMHO it would be better if find_one (which I think of as a search *input*) would live in TypeQuery instead of TypeResult. It feels wrong to use TypeResult to pass something *in*.
  2. It's inconsistent that Module has a FindTypes(TypeQuery) method and TypeQuery has a Find(First)Type(Module) method.

Could we just move find_one into TypeQuery and get rid of the Find methods in TypeQuery.
Let me know if I'm misunderstanding something.

lldb/include/lldb/Symbol/CompilerDecl.h
93

Thanks for checking — that makes sense.

lldb/source/Symbol/Type.cpp
149

IMHO it would be better if find_one (which I think of as a search *input*) would live in TypeQuery instead of TypeResult. It feels wrong to use TypeResult to pass something *in*.

clayborg updated this revision to Diff 479788.Dec 2 2022, 5:48 PM

Address review comments:

  • move "find_one" to the TypeQuery object instead of in the TypeResults
  • add "find_one" to all TypqQuery constructors
  • add accessors to get/set the "find_one" on a TypeQuery
  • remove the TypeResults::FindFirstType(...) variants.
aprantl accepted this revision.Dec 5 2022, 12:57 PM

Thanks for bearing with me. I think this is good now. One last suggestion I have is to turn the bools into an enum, so we can't accidentally mix their order up. Otherwise this LGTM!

lldb/include/lldb/Symbol/Type.h
139

Should this be an enum a la TypeQuery::e_exact_match | e_first_only?

275

We shouldn't need this if it's in the constructor?

This revision is now accepted and ready to land.Dec 5 2022, 12:57 PM
clayborg updated this revision to Diff 481034.Dec 7 2022, 1:12 PM

Change boolean parameters to constructors to use a bitmask enumeration for readability of code.

@aprantl can you check this latest patch and accept again if you like what you see? All tests passed.

aprantl accepted this revision.Dec 8 2022, 4:44 PM
aprantl added inline comments.
lldb/include/lldb/Symbol/Type.h
101

this comment is out of date and should probably just be deleted.

270

Can we just delete this accessor?

Update: There are some new failures after someone recently added new tests that involve simple template types. I need to fix my new lookup routine to do the right thing in light of these new tests. The main issue is currently if you have a "foo<int>", the accelerator tables and DWARF contain "foo" only. This causes the expression parser to be able to lookup "foo" as it is parsing an expression that has "auto a = foo<int>()" as it will actually return a specialized "foo<int>" as the result since the accelerator table will point to the "foo<int>" type. We need to outlaw these raw lookups from working. Worse yet, if there are multiple instantiations of "foo<T>", it will return all of them and cause LLDB to crash later in some AST copying code.

clayborg updated this revision to Diff 492891.Jan 27 2023, 1:27 PM

Updated to latest sources and fixed issues from a recently added test for simple template types that showed some issues in type lookup. We previously would allow results from a lookup like "Foo" to match a type that was in the accelerator ables as "Foo", but for a type that was really "Foo<int>". This latest addition to the patch fixes these lookups from incorrectly succeeding.