This is an archive of the discontinued LLVM Phabricator instance.

Use rich mangling information in Symtab::InitNameIndexes()
AbandonedPublic

Authored by sgraenitz on Jul 30 2018, 8:07 AM.

Details

Summary

This review is about getting your feedback for the patch. If it doesn't make it in this form, I can adjust everything that's necessary and open a new review once I am done. So please don't hesitate to share your honest opinions.

In preparation for this review, there were two earlier ones:

Primary goals for this patch are:
(1) Use ItaniumPartialDemangler's rich mangling info for building LLDB's name index.
(2) Make a uniform interface so that Symtab doesn't get involved with mangling details too much.
(3) Improve indexing performance.

In order to achive (1) and (2) I added two classes:

  • RichManglingInfo offers a uniform interface to query symbol properties like getFunctionDeclContextName() or isCtorOrDtor(). It can switch between different providers internally. At the moment it supports llvm::ItaniumPartialDemangler and lldb_private::CPlusPlusLanguage::MethodName (legacy/fallback implementation).
  • RichManglingSpec handles configuration and lifetime of RichManglingInfos. It is likely stack-allocated and can be reused for multiple queries during batch processing.

These classes are used for wrapping the input and output of DemangleWithRichManglingInfo(), our new function for explicit demangling. It will return a properly initialized RichManglingInfo on success, or otherwise null:

RichManglingInfo *Mangled::DemangleWithRichManglingInfo(RichManglingSpec, SkipMangledNameFn)

Thus RichManglingInfo does not need to support a None-state (it's not accessible in this state). In order to avoid an extra heap allocation per invocation for storing the result of DemangleWithRichManglingInfo(), the actual instance is owned by RichManglingSpec. This aids (3) and we want to use a single RichManglingSpec instance for the entire index anyway (it also owns the IPD). An efficient filtering function SkipMangledNameFn contributes here too and helps to mimic the original behavior of InitNameIndexes.

The old implementation only parsed and indexed Itanium mangled names. The new RichManglingInfo can be easily extended for various mangling schemes and languages.

One problem with the implementation of RichManglingInfo is the inaccessibility of class CPlusPlusLanguage::MethodName (defined in source/Plugins/Language/..), from within any header in the Core components of LLDB. The rather hacky solution is to store a type erased pointer and cast it to the correct type on access in the cpp - see RichManglingInfo::get<ParserT>(). Not sure if there's a better way to do it. IMHO CPlusPlusLanguage::MethodName should be a top-level class in order to enable forward delcarations (but that is a rather big change I guess).

I also found a few minor bugs/smells, which I will mark with inline comments. First simple profiling shows a good speedup. target create clang now takes 0.64s on average (over 5 runs). Before the change I observed runtimes between 0.76s an 1.01s. This is still no bulletproof data (I only ran it on one machine!), but it's a promising indicator I think.

What do you think?
Is this too unconventional?
Do you have ideas for improvements?

Event Timeline

sgraenitz created this revision.Jul 30 2018, 8:07 AM
sgraenitz edited the summary of this revision. (Show Details)Jul 30 2018, 8:10 AM
sgraenitz added inline comments.Jul 30 2018, 8:24 AM
include/lldb/Symbol/Symtab.h
42

We don't need a None-case here.

57

This is the hackiest point I guess.

105

^^^^^ May have its own header & cpp

include/lldb/Utility/ConstString.h
357 ↗(On Diff #157967)

Fixing a related issue: There is no way to determine whether or not the internal string is null or empty. In fact, operator! does the same as the above IsEmpty(). The Mangled::GetDemangledName() function, however, thinks there would be a difference and wants to benefit from it. The fixed version should be correct now.

source/Core/Mangled.cpp
199

Fixing bug: This is no dead code, but well, maybe in a rare branch.

367

Using the difference between null and empty.

398

Using the difference between null and empty.

source/Symbol/Symtab.cpp
220

This uses a raw C-string instead of llvm::StringRef in order to achieve O(1) runtime.

I haven't read through this in detail yet, but I think this is a good start!

The part I'm not sure about is whether the RichManglingInfo vs. RichManglingSpec distinction brings any value. I mean, the lifetime of the first is tied to the lifetime of the second, and the Spec class can only have one active Info instance at any given moment. So you might as well just have one class, pass that to DemangleWithRichManglingInfo, and then query the same object when the call returns. The current interface with createItaniumInfo et al. makes it seem like one could call it multiple times in sequence, stashing the results, and then doing some post-processing on them.

I'll have to think about the C++::MethodName issue a bit more, but in general, I don't think moving that class to a separate file is a too disruptive change. If it means we don't have to mess with untyped pointers, then we should just do it. (Ideally, I wouldn't want the common code to reference that plugin at all, but that ship has already sailed, so I don't think this patch should be predicated on fixing that.)

include/lldb/Symbol/Symtab.h
80–82

This is implied by the deleted copy operations.

source/Symbol/Symtab.cpp
220

If you changed the caller to use StringRef too (it seems possible at a first glance) then this would still be O(1)

271–289

Could these return StringRef instead of C strings?

zturner added inline comments.
include/lldb/Symbol/Symtab.h
57

We have llvm::Any. Perhaps you want to use that here instead of void*?

sgraenitz added inline comments.Jul 30 2018, 9:41 AM
source/Symbol/Symtab.cpp
274

@erik.pilkington Is it acceptable/good practice to pass (nullptr, 0) here? At the moment this safes some lines of initialization checks for m_IPD_buf and m_IPD_size.

clayborg added inline comments.
include/lldb/Core/Mangled.h
23–32

move any forward decls to lldb-forward.h and remove all manual forward declarations in lldb_private from here.

include/lldb/Symbol/Symtab.h
25

move to separate files RichManglingInfo.h and RichManglingInfo.cpp

92

move to separate files RichManglingInfo.h and RichManglingInfo.cpp

Thanks or the quick reviews! Follow-ups inline.

include/lldb/Symbol/Symtab.h
57

Thanks. I will check that.

80–82

Which are implicitly deleted too, due to the existence of the destructor right? Does LLVM/LLDB have some kind of convention for it? I like to be explicit on ctors&assignment ("rule of 5"), because it aids error messages, but I would be fine with following the existing convention here.

source/Symbol/Symtab.cpp
220

Right, thanks there's a ConstString::GetStringRef(). Perfect.

271–289

Yes. So far it's simply the closest superset of the two interfaces, but I will try using StringRef where possible.

source/Symbol/Symtab.cpp
274

Sure, thats fine! Those parameters act the same way as buf and size in __cxa_demangle.

getFunctionBaseName will return nullptr if the mangled name isn't a function. Is it a precondition of this function that m_IPD stores a function? If not, it looks like you'll leak the buffer.

sgraenitz added inline comments.Jul 30 2018, 10:27 AM
include/lldb/Symbol/Symtab.h
57

@zturner Where is llvm::Any? Expected it in ADT or Support, but can't find it. IIUC llvm::Optional does something similar, but uses its own optional_detail::OptionalStorage. Same for llvm::Expected. Or is it a very recent addition?

zturner added inline comments.Jul 30 2018, 10:29 AM
include/lldb/Symbol/Symtab.h
57

It's pretty recent. I was actually the one who added it, about maybe 2 weeks ago. It's in include/llvm/ADT/Any.h

sgraenitz added inline comments.Jul 30 2018, 11:41 AM
source/Symbol/Symtab.cpp
274

Oh that is a very good note. I had it as a precondition in the client function in Symtab. When I removed that and started to just check the result for nullptr, I didn't think about the buffer. Gonna fix it, the generalized interface shouldn't have that precondition anyway. Thanks!

Simple fixes

Moved forward decls

sgraenitz marked 3 inline comments as done.Jul 30 2018, 12:25 PM

Sorry, I accidentally added the tests from https://reviews.llvm.org/D49909 also to this review. I will clean this up tomorrow.

sgraenitz updated this revision to Diff 158206.Jul 31 2018, 3:56 AM

Remove test code, that I added accidentally. Move RichManglingInfo and RichManglingSpec to their own header and cpp.

sgraenitz marked 3 inline comments as done.Jul 31 2018, 4:09 AM

The part I'm not sure about is whether the RichManglingInfo vs. RichManglingSpec distinction brings any value. [...] So you might as well just have one class, pass that to DemangleWithRichManglingInfo, and then query the same object when the call returns.

The idea here was that DemangleWithRichManglingInfo() acts like a gate keeper. If it succeeds it provides read-only access to the updated RichManglingInfo in RichManglingSpec, otherwise it returns null. IMHO the value behind it is that RichManglingInfo does not need to handle a NoInfo case next to ItaniumPartialDemangler and PluginCxxLanguage in every single getter. Instead it's just not accessible in that state. (Plus: there is no maintenance functions that confuse the public interface of RichManglingInfo.) I don't know how to do this with only one class.

Maybe RichManglingSpec is not the perfect name. What about renaming it to RichManglingContext?

I mean, the lifetime of the first is tied to the lifetime of the second, and the Spec class can only have one active Info instance at any given moment.

Yes, this was handy and avoids extra heap-allocations. const RichManglingInfo * was intended to clarify: lifetimes are handled elsewhere.

The current interface with createItaniumInfo et al. makes it seem like one could call it multiple times in sequence, stashing the results, and then doing some post-processing on them.

Yes, I can see that this is implicit knowledge. Do you have an idea how to make this more explicit? Rename to SetItaniumInfo() maybe?

In the end, I definitely prefer this approach over having a NoInfo state in RichManglingInfo. What do you think?

I think there is still something wrong with the diff. I can't see any of the callers of e.g. createItaniumInfo but I can see the function on both LHS and RHS of the diff (which shouldn't be the case as it's a new function). It looks like you uploaded just an ammending patch instead of the entire work. Can you fix that?

The part I'm not sure about is whether the RichManglingInfo vs. RichManglingSpec distinction brings any value. [...] So you might as well just have one class, pass that to DemangleWithRichManglingInfo, and then query the same object when the call returns.

The idea here was that DemangleWithRichManglingInfo() acts like a gate keeper. If it succeeds it provides read-only access to the updated RichManglingInfo in RichManglingSpec, otherwise it returns null. IMHO the value behind it is that RichManglingInfo does not need to handle a NoInfo case next to ItaniumPartialDemangler and PluginCxxLanguage in every single getter. Instead it's just not accessible in that state. (Plus: there is no maintenance functions that confuse the public interface of RichManglingInfo.) I don't know how to do this with only one class.

Maybe RichManglingSpec is not the perfect name. What about renaming it to RichManglingContext?

I mean, the lifetime of the first is tied to the lifetime of the second, and the Spec class can only have one active Info instance at any given moment.

Yes, this was handy and avoids extra heap-allocations. const RichManglingInfo * was intended to clarify: lifetimes are handled elsewhere.

The current interface with createItaniumInfo et al. makes it seem like one could call it multiple times in sequence, stashing the results, and then doing some post-processing on them.

Yes, I can see that this is implicit knowledge. Do you have an idea how to make this more explicit? Rename to SetItaniumInfo() maybe?

In the end, I definitely prefer this approach over having a NoInfo state in RichManglingInfo. What do you think?

Yes, I can see what you mean here. Neither of the solutions is particularly appealing. I guess if I were implementing this, I'd go with the "invalid state" option, though I am not sure why, as usually I am opposed to invalid states. Maybe we can leave this to the discretion of the implementor (you).

include/lldb/Symbol/Symtab.h
80–82

As far as I know, the presence of a destructor has no impact on the state of copy/move operations, so you still need to delete the copy operations explicitly.

I don't know if there is an official policy on explicitly deleting move operations, but I don't remember seeing that style anywhere. However, I don't care much about that either.

sgraenitz added inline comments.Jul 31 2018, 5:28 AM
include/lldb/Symbol/Symtab.h
80–82

The generation of the implicitly-defined copy constructor is deprecated if T has a user-defined destructor or user-defined copy assignment operator.

https://en.cppreference.com/w/cpp/language/copy_constructor

Actually, copy has no implications here and move won't work on the const pointer. Thus I will just remove it :)

source/Core/Mangled.cpp
325

I think there is still something wrong with the diff. I can't see any of the callers of e.g. createItaniumInfo

Weird. The caller is here, but not shown as a change anymore..

sgraenitz updated this revision to Diff 158231.Jul 31 2018, 5:52 AM

Fix potential leak of m_IPD_buf. Use llvm::Any instead of void*.
Rename: RichManglingSpec -> RichManglingContext, RichManglingContext::CreateXyInfo() -> RichManglingContext::SetXyInfo()

sgraenitz marked 4 inline comments as done.Jul 31 2018, 5:57 AM
sgraenitz abandoned this revision.Jul 31 2018, 8:17 AM

I think there is still something wrong with the diff. I can't see any of the callers of e.g. createItaniumInfo

Weird. The caller is here, but not shown as a change anymore..

I created a new review where all my changes are marked in green and red: https://reviews.llvm.org/D50071
If you have any more feedback, please let me know. I will keep the new one open for a few days, so Jim can review it when he is back from vacation.