This is an archive of the discontinued LLVM Phabricator instance.

[demangler] Add a "partial" demangling API for LLDB
ClosedPublic

Authored by erik.pilkington on Mar 19 2018, 6:47 PM.

Details

Summary

lldb-dev thread: http://lists.llvm.org/pipermail/lldb-dev/2018-January/013186.html

Currently, the demangler in ItaniumDemangler works by parsing the mangled name into an AST-like format, then traverses that AST to produce the demangled name. Some users (LLDB) consume this demangled name by writing custom parsers. This is redundant (in a very hot section, AFAIK), as the name has to be printed and the re-parsed. For large demangled names with many (potentially nested) <template-param>s and substitutions, the demangler has to produce a massive symbol that LLDB has to dig through in order to find the small amount of information that it needs. Just parsing to an AST expands no substitutions, and should run in about O(strlen of the mangled name).

This patch adds a partial demangling API that parses the mangled name into the AST-like format without printing and provides some functions to access interesting information from that AST. If LLDB used this instead of char*s to investigate the demangled name, it would be cleaner and save time. (AFAIK, I've never touched lldb!)

I've put this patch up super early, I still want to add more tests and make it more robust. I'm hoping that this would be a good place to find out if this is what the LLDB people actually want. Does anyone have any thoughts?

Thanks for taking a look!
Erik

Diff Detail

Event Timeline

davide added a subscriber: davide.Mar 19 2018, 6:55 PM

It would be awesome if you could benchmark on some pathological testcases (e.g. the one Greg was pointing out)

_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_

That seems great. This would be the backing for the CPlusPlusLanguage::MethodName class (in lldb in Source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.h). It looks like a great fit for that.

I can't think of anything else we really wanted out of these names that you aren't currently providing. Greg might. But this looks like a great start to me.

Seems like it might be nice to replace all "char *Buf, size_t *N" arguments, where "char *" is returned with a llvm::raw_ostream and let the user use what ever backing / memory allocation scheme they want instead of forcing re-allocations? Probably best to include a person that has contributed to this file in the LLVM community so they can comment on the changes here.

llvm/include/llvm/Demangle/Demangle.h
42

Use "llvm::StringRef MangledName" instead of "const char *"?

46

Might be better to use a "llvm::raw_ostream &out" instead of just a plain buffer? Seems like llvm::raw_ostream can be backed by what ever kind of buffer you want it to be backed (llvm::raw_string_ostream is a nice thing to use with this that uses a std::string as the destination buffer).

54

Maybe rename to getFunctionDeclContext? or getFunctionDeclContextName?

64

s/provably/probably/ in comment and function name. Maybe "couldBeMemberFunction" might be a better name?

69

Can you document what a special name is here?

Great stuff, thanks for doing this BTW

It would be awesome if you could benchmark on some pathological testcases (e.g. the one Greg was pointing out)

As of r327690, the pathological symbol Greg posted is much smaller, due to a bug in how the demangler parsed generic lambda parameters. Now, partially demangling that symbol and getting the basename "Subscribe" is about 20x faster than fully demangling the name. Just for fun I removed the fix and benchmarked it again and it was more in the ballpark of hundreds of thousands of times faster. For more realistic symbols (test case being all the symbols in LLVM), partially demangling and extracting the basename runs in about 40% less time than fully demangling. Partially demangling and splitting up the entire name runs in about as long as the full demangle, but it should be faster if LLDB avoids parsing the demangled name. Depending on how LLDB is consuming these names, it might be profitable to do a first pass to find the base names of all the symbols in the program, then lazily demangle the other components as needed.

Seems like it might be nice to replace all "char *Buf, size_t *N" arguments, where "char *" is returned with a llvm::raw_ostream and let the user use what ever backing / memory allocation scheme they want instead of forcing re-allocations? Probably best to include a person that has contributed to this file in the LLVM community so they can comment on the changes here.

@rafael: Any thoughts? I think it would be nice to use LLVM data structures to express this API and have the no-dependencies demangler live in libcxxabi.

Thanks!

llvm/include/llvm/Demangle/Demangle.h
64

That wasn't a typo, this function checks if the name is for sure a non-static member function, so it conservatively returns false for a::b(). I suppose isDefinitelyMemberFunction() might be more clear.

clayborg added inline comments.Mar 20 2018, 1:09 PM
llvm/include/llvm/Demangle/Demangle.h
64

Maybe just "isMemberFunction()" would be clear enough? Don't think Definitely needs to be in the name.

labath added a subscriber: labath.Mar 21 2018, 3:24 AM

This looks great. Thank you for doing this.

llvm/include/llvm/Demangle/Demangle.h
64

I think that having something in the name which indicates this is just a best-effort guess is a good idea.

Alternatively, we could replace this function with something more well-defined (getFunctionQualifiers ? hasFunctionQualifiers?) and leave the business of guessing the member-ness of the function to lldb.

erik.pilkington marked 2 inline comments as done.

In this new patch:

  • Rename some functions, add some comments
  • Support for local names
  • More testcases
erik.pilkington removed a subscriber: rafael.

Use Rafael's Phab account.

llvm/include/llvm/Demangle/Demangle.h
42

On second thought, I don't think we can use LLVM data structures for this. libSupport depends on libDemangle, so using any LLVM stuff would create a circular dependency. (I'm basing this off the commit message for r328072)

46

This has the same problem as above, but also raw_ostream isn't compatible with our how the AST is printed (Sometimes the printer look back at the last char).

54

Sure, I called this getFunctionDeclContextName() in the new patch.

64

I agree it would be better to indicate that this is only a best effort guess. hasFunctionQualifiers sounds good, I renamed this function to that in the new patch.

erik.pilkington retitled this revision from [demangler] WIP: Add a "partial" demangling API for LLDB to [demangler] Add a "partial" demangling API for LLDB.

Rebase. In this new patch:

  • Add some more tests.
  • Minor cleanup.

I think I'm happy with the implementation of this now, so I'll land it as long as this API looks okay.

Looks good to me, but probably need someone that works on LLVM fulltime to give the final ok

I don't have objections to this one either. Maybe @majnemer / @espindola / @rnk / @Bigcheese have comment on the API?

I'll CC @dexonsmith and @compnerd too, they've helped with demangler reviews in the past.

I don't have an opinion on this.

dexonsmith accepted this revision.Apr 11 2018, 2:50 PM

I'd prefer to have this be as close as possible to the libcxxabi demangler (and use the same data structures) so that it's easy to detect when they diverge. If we ever move to monorepo hopefully there will be a way to factor them out into a single implementation.

Since that's where this ended up, LGTM!

This revision is now accepted and ready to land.Apr 11 2018, 2:50 PM
This revision was automatically updated to reflect the committed changes.