This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Filter DIEs based on qualified name when possible
ClosedPublic

Authored by bulbazord on Jul 13 2022, 12:06 PM.

Details

Summary

Context:
When setting a breakpoint by name, we invoke Module::FindFunctions to find the function(s) in question. However, we use a Module::LookupInfo to first process the user-provided name and figure out exactly what we're looking for. When we actually perform the function lookup, we search for the basename. After performing the search, we then filter out the results using Module::LookupInfo::Prune. For example, given a::b::foo we would first search for all instances of foo and then filter out the results to just names that have a::b::foo in them. As one can imagine, this involves a lot of debug info processing that we do not necessarily need to be doing.

Some numbers:
Debugging LLDB and placing a breakpoint on llvm::itanium_demangle::StringView::begin without this change takes approximately 70 seconds and resolves 31,920 DIEs. With this change, placing the breakpoint takes around 30 seconds and resolves 8 DIEs.

Diff Detail

Event Timeline

bulbazord created this revision.Jul 13 2022, 12:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 12:06 PM
Herald added a subscriber: arphaman. · View Herald Transcript
bulbazord requested review of this revision.Jul 13 2022, 12:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 12:06 PM

Well.. first of let me say that the performance gain is impressive. I wouldn't have expected to gain that much with a relatively simple change.

Now, as for the implementation, I have two main questions:

  • Do we really need the GetQualifiedNameWithParams function? My impression was that we've been moving towards ignoring function arguments for name matching, and CPlusPlusLanguage::DemangledNameContainsPath does not look like it is making use of that. Note that I don't think that switching to GetQualifiedName is the right approach either. I think we should use the mangled name (GetMangledName) first, and only fall back to GetQualifiedName if that is unavailable (you can look up how it works during SymbolContext construction, but I believe it is roughly like that. (Incidentally, that will bring in the function arguments through the back door for some/most functions, but that isn't my motivation.)
  • This patch is duplicating some of the logic inside Module::LookupInfo::Prune. I get that we can't reuse it as-is (because we'd need a SymbolContext object), but it seems like that function only cares about the GetFunctionName portion of the object, so it seems like it should be possible to refactor it such that one can reuse it from within this new context (just pass in a name instead of a full SymbolContext). (I guess this part could be a separate patch)
lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp
819–822

This is an interesting approach, but I'd just use llvm::ListSeparator for that.

Well.. first of let me say that the performance gain is impressive. I wouldn't have expected to gain that much with a relatively simple change.

Thank you! I am also very surprised, it turned out to be a worthwhile time investment.

  • Do we really need the GetQualifiedNameWithParams function? My impression was that we've been moving towards ignoring function arguments for name matching, and CPlusPlusLanguage::DemangledNameContainsPath does not look like it is making use of that. Note that I don't think that switching to GetQualifiedName is the right approach either. I think we should use the mangled name (GetMangledName) first, and only fall back to GetQualifiedName if that is unavailable (you can look up how it works during SymbolContext construction, but I believe it is roughly like that. (Incidentally, that will bring in the function arguments through the back door for some/most functions, but that isn't my motivation.)

My reasoning for writing GetQualifiedNameWithParams instead of just using GetQualifiedName is because of matching overloaded names. It's valid for a user to set a breakpoint like b a_function(int) to disambiguate any other functions named a_function. Without the parameter list, we would be setting a breakpoint on every a_function which is not correct if you specify the argument list. I originally used GetQualifiedName but TestBreakOnOverload failed...
That being said, I'm not sure how we would use a mangled name to perform the match. We're specifically matching on user input for breakpoints, so if a user inputs something like StringView::begin, how would we match that against mangled names? We would need some way to partially mangle StringView::begin and match that against the mangled name from the DIE, right? It would probably be faster to match against a mangled name though, so if there is a way to make that work that would be perfect.

  • This patch is duplicating some of the logic inside Module::LookupInfo::Prune. I get that we can't reuse it as-is (because we'd need a SymbolContext object), but it seems like that function only cares about the GetFunctionName portion of the object, so it seems like it should be possible to refactor it such that one can reuse it from within this new context (just pass in a name instead of a full SymbolContext). (I guess this part could be a separate patch)

Specifically you're suggesting a refactor of Module::LookupInfo::Prune so that we can deduplicate the logic right? I'd actually like to see if we can get rid of Module::LookupInfo::Prune altogether so we don't have to post-process search results but instead filter them as we find them. I hope I've accomplished at least some of that with this change, but getting rid of it entirely would require a few more changes in Symtab and in the other SymbolFile implementations (I think...).

jingham added a comment.EditedJul 14 2022, 6:09 PM

It always bugged me that break set -n some::long::path::a_common_word was going to realize all the debug info for functions called "a_common_word". In large projects there can be a lot of those. So I'm not 100% surprised that this gives us a good speedup (but happy!)

I certainly don't want us to give up on setting breakpoints by overload as well as function name. This is currently inconvenient because we don't have a "fuzzy argument matcher", instead you have to type the arguments exactly as the demangler would. That shouldn't be all that hard to implement, and even without that it's a fairly useful feature when you've got lots of overloads. The alternative, setting the breakpoint on the name and then disabling the overloads you don't want, is tedious and makes us look a little weak...

I'm not familiar enough with the DWARF parser to make useful comments on that part of the change, but the rest looks fine, and this is a great idea.

BTW, this is a side comment, but given that it's fairly common in the STL at least to have functions with a bevy of defaulted arguments that very few people specify or even really know are part of the function signature, it would be neat to do things like:

(lldb) break set -n foo::bar::baz(int, *)

or something, meaning, first argument is an int, I don't care about the others.

Well.. first of let me say that the performance gain is impressive. I wouldn't have expected to gain that much with a relatively simple change.

Thank you! I am also very surprised, it turned out to be a worthwhile time investment.

  • Do we really need the GetQualifiedNameWithParams function? My impression was that we've been moving towards ignoring function arguments for name matching, and CPlusPlusLanguage::DemangledNameContainsPath does not look like it is making use of that. Note that I don't think that switching to GetQualifiedName is the right approach either. I think we should use the mangled name (GetMangledName) first, and only fall back to GetQualifiedName if that is unavailable (you can look up how it works during SymbolContext construction, but I believe it is roughly like that. (Incidentally, that will bring in the function arguments through the back door for some/most functions, but that isn't my motivation.)

My reasoning for writing GetQualifiedNameWithParams instead of just using GetQualifiedName is because of matching overloaded names. It's valid for a user to set a breakpoint like b a_function(int) to disambiguate any other functions named a_function. Without the parameter list, we would be setting a breakpoint on every a_function which is not correct if you specify the argument list. I originally used GetQualifiedName but TestBreakOnOverload failed...
That being said, I'm not sure how we would use a mangled name to perform the match. We're specifically matching on user input for breakpoints, so if a user inputs something like StringView::begin, how would we match that against mangled names? We would need some way to partially mangle StringView::begin and match that against the mangled name from the DIE, right? It would probably be faster to match against a mangled name though, so if there is a way to make that work that would be perfect.

I'm sorry, that wasn't fully clear. What I meant to say was to take the *mangled* name, then *demangle* it, and perform the matching that way. This would ensure that we are matching against the name that we are eventually going to use. This is particularly important if we're going to be matching function arguments (where, it seems, I have misunderstood what we are doing). Matching full function signatures is hard enough, so we should at least be consistent so that if the user copies a name from some lldb output into a breakpoint command, then it will match. Right now, GetQualifiedNameWithParams ignores template arguments so it will produce something like llvm::find(std::vector<llvm::PassRegistrationListener*, std::allocator<llvm::PassRegistrationListener*>>&, llvm::PassRegistrationListener* const&) instead of auto llvm::find<std::vector<llvm::PassRegistrationListener*, std::allocator<llvm::PassRegistrationListener*>>&, llvm::PassRegistrationListener*>(std::vector<llvm::PassRegistrationListener*, std::allocator<llvm::PassRegistrationListener*>>&, llvm::PassRegistrationListener* const&) -- although, ironically, the user might actually prefer matching the first one.

In order to avoid mismatches like this, I think we should merge this name-producing code with the code which computes the "real" function names.

  • This patch is duplicating some of the logic inside Module::LookupInfo::Prune. I get that we can't reuse it as-is (because we'd need a SymbolContext object), but it seems like that function only cares about the GetFunctionName portion of the object, so it seems like it should be possible to refactor it such that one can reuse it from within this new context (just pass in a name instead of a full SymbolContext). (I guess this part could be a separate patch)

Specifically you're suggesting a refactor of Module::LookupInfo::Prune so that we can deduplicate the logic right? I'd actually like to see if we can get rid of Module::LookupInfo::Prune altogether so we don't have to post-process search results but instead filter them as we find them. I hope I've accomplished at least some of that with this change, but getting rid of it entirely would require a few more changes in Symtab and in the other SymbolFile implementations (I think...).

Well.. my main point was to centralize the code doing the filtering. Whether that code lives in Module::LookupInfo::Prune or somewhere else is not particularly important, but I would like to avoid having multiple copies of the logic, as that is bound to create inconsistencies. Obvously the way we *derive* the name will be different for DWARF vs. PDB vs. symtab, but I'd hope that the actual matching part can stay the same.

bulbazord updated this revision to Diff 445629.Jul 18 2022, 2:40 PM
bulbazord edited the summary of this revision. (Show Details)

I have factored out part of Module::LookupInfo::Prune into its own method Module::LookupInfo::NameMatchesLookupInfo which I re-use in ProcessFunctionDIE instead of copying code.

I also have switched to taking the mangled name of a DIE and demangled it to get the fully qualified name. This doesn't seem to have regressed performance at all, so I'm happy to go with this moving forward. I've deleted the GetQualifiedNameWithParams I had written previously since it shouldn't be needed anymore.

Friendly ping :)

clayborg added inline comments.Jul 26 2022, 2:38 PM
lldb/source/Core/Module.cpp
730

So this function can end up being called with an empty function_name when there is no mangled name (see comment in DWARFIndex.cpp). If there is no language then we return true, but what does a language plug-in do in DemangledNameContainsPath? The same thing?

I guess this means if you lookup "foo::erase" and you get a DIE that has "erase" as its DW_AT_name, but the DIE has no DW_AT_mangled (yes, some C++ debug info is emitted without a mangled name), that it will match any DIE that has a DW_AT_name that matches "erase" regardless of the actual decl context?

lldb/source/Plugins/SymbolFile/DWARF/DWARFIndex.cpp
37

So this looks like this can be called with an empty named since not all DIEs have mangled names. Do we not want to try and calculate the demangled name here if the mangled name is empty?

bulbazord marked an inline comment as done.Jul 26 2022, 2:50 PM
bulbazord added inline comments.
lldb/source/Plugins/SymbolFile/DWARF/DWARFIndex.cpp
37

Yes, we probably do want to calculate the demangled name here. The previous version of this patch had a function GetQualifiedNameWithParams that tried to calculate the fully qualified demangled name but I think it probably needed a little more work. I can bring that back unless you can think of a different way to do it.

clayborg added inline comments.Jul 26 2022, 3:00 PM
lldb/source/Plugins/SymbolFile/DWARF/DWARFIndex.cpp
37

You can avoid doing this demangling or calculating the demangled name if the name in the lookup_info is already mangled? Like if the name lookup is "_Z3foo", there is no need to demangle and see if that matches, so there could be some good efficiency gains by trying not to always demangle (expensive) or calculate the demangled named (even more expensive) if we are looking for a mangled name in lookup_info

Sorry for dropping off.

Using the mangled name helped, but I am still moderately worried about that the name that's going to be matched here is different from the "real" name that gets computed later. I think it's important to avoid subtle variations in the names computed in two places, and that's only possible with a single copy of the code (*that* was my biggest issue with the GetQualifiedNameWithParams function). Functions without a mangled name are relatively common (all extern "C" functions, and *all* functions if compiled with -gsce) and I think it'd be confusing if we used one name for matching, and then displayed another one.

The exact string is probably not as important as consistency so I think we could replace the code I linked to with whatever can be produced with the information you have available here.

bulbazord updated this revision to Diff 448472.Jul 28 2022, 4:24 PM
  • Expanded on NameMatchesLookupInfo to handle different scenarios when names we are comparing are mangled/demangled.
  • Extracted functionality from DWARFASTParserClang to construct a demangled name from the DIE's context. This is now exposed through SymbolFileDWARF so that we can get a demangled name of a DIE consistently with one piece of code.
bulbazord marked 3 inline comments as done.Jul 28 2022, 4:29 PM
bulbazord added inline comments.
lldb/source/Core/Module.cpp
730

I moved the empty check to the top since we always want to keep unnamed symbols.

labath added a comment.Aug 1 2022, 6:19 AM

I like how you've extracted the name construction functionality. Just one (inline) comment about the filtering.

lldb/source/Core/Module.cpp
740

I think this is overly aggressive. _Z3foov could be a method name in some particularly sadistic class. I think you can do this optimization only in one direction: if the names match, then return true without consulting the language plugin. At that point, I don't think you even need to check whether the names are mangled.

labath added inline comments.Aug 1 2022, 6:21 AM
lldb/source/Core/Module.cpp
740

Actually, this could go wrong even for names like _Zonk, since GetManglingScheme does not check that the string is an actually valid mangled name -- just that it looks like one from very far away.

bulbazord marked an inline comment as done.Aug 1 2022, 2:53 PM
bulbazord added inline comments.
lldb/source/Core/Module.cpp
740

Then in that case, perhaps this method should always take a demangled name? If it can take mangled names, it seems like a giant pain and potentially pretty expensive to figure out if it's mangled or not, especially given that in the sadistic case a name could look mangled but actually be the demangled name. Am I following your thought correctly?

labath added inline comments.Aug 2 2022, 3:00 AM
lldb/source/Core/Module.cpp
740

Not exactly. The way that lookup works in lldb (or at least, how I understand it) is that the user gives you a "name" and then you go on to match that name against various "things". Those things include stuff like the mangled name, full demangled name, basename, etc. The set of things to match can be constrained by through extra arguments (e.g. br set --fullname vs. br set --basename), but I don't think we have a way to say "match against mangled names _only_" -- things like br set --fullname _Z3foov will both match the function whose mangled name is that string (i.e. foo()) and the function which has that as its demangled name (_Z3foov() aka. _Z7_Z3foovv).

So I'd say this should just match both names without trying to be particularly efficient about it (obviously it can skip the demangle step if the mangled name already matches).

I'd also say it's possible I don't understand the context in which this function is invoked...

bulbazord updated this revision to Diff 449350.Aug 2 2022, 11:08 AM
bulbazord edited the summary of this revision. (Show Details)

Simplify Module::LookupInfo::NameMatchesLookupInfo. Perform an initial check to see if the name matches, and if not, do the more expensive demangling operation.

labath accepted this revision.Aug 3 2022, 5:24 AM
This revision is now accepted and ready to land.Aug 3 2022, 5:24 AM