This is an archive of the discontinued LLVM Phabricator instance.

Change UniqueCStringMap to use ConstString as the key
ClosedPublic

Authored by scott.smith on Apr 20 2017, 3:14 PM.

Details

Summary

UniqueCStringMap "sorts" the entries for fast lookup, but really it only cares about uniqueness. ConstString can be compared by pointer alone, rather than with strcmp, resulting in much faster comparisons. Change the interface to take ConstString instead, and propagate use of the type to the callers where appropriate.

Diff Detail

Repository
rL LLVM

Event Timeline

scott.smith created this revision.Apr 20 2017, 3:14 PM
scott.smith edited the summary of this revision. (Show Details)

I'd recommend Greg Clayton as a reviewer.

You should also add lldb-commits as a subscriber, fwiw, so comments/questions/etc are cc:'ed to the -commits list.

clayborg requested changes to this revision.Apr 20 2017, 9:00 PM

Very close. A few misuses of ConstString and this will be good to go.

include/lldb/Interpreter/Property.h
43

This shouldn't be const-ified, Only names of things like variables, enumerators, typenames, paths, and other strings that are going to be uniqued should go into the ConstStringPool

include/lldb/Symbol/ObjectFile.h
569

Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,.

576–577

Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,.

583

Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,.

808–811

This actually doesn't need to change. Since it is merely stripping off parts of the string, this should really stay as a StringRef and it should return a StringRef. Change to use StringRef for return and for argument, or revert.

include/lldb/Utility/ConstString.h
481

I really don't like the prospect of crashing the debugger if someone calls this with an invalid index. Many people ship with asserts on. I would either make it safe if it is indeed just for reporting or remove the assert.

source/Interpreter/Property.cpp
252

Revert this. Description shouldn't be a ConstString.

269

Revert this. Description shouldn't be a ConstString.

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
324–326

StringRef is better to use for modifying simple strings and looking for things. Actually looking at this class it is using ConstString all over the place for putting strings together. For creating strings we should use lldb_private::StreamString. For stripping stuff off and grabbing just part of a string, seeing if a string starts with, ends with, contains, etc, llvm::StringRef. So I would vote to just change it back, or fix the entire class to use lldb_private::StreamString

345–348

Ditto

354

Ditto

363

Ditto

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
1811–1816

Switch to use StringRef as return and for argument or revert.

source/Plugins/ObjectFile/ELF/ObjectFileELF.h
155–157

Switch to use StringRef as return and for argument or revert.

This revision now requires changes to proceed.Apr 20 2017, 9:00 PM

At a high level, I think there might be a misunderstanding on what I'm attempting to do. It isn't to convert things that weren't ConstString into things that are. It is instead to utilize the fact that we have all these ConstString in order to improve the performance of various operations by retaining the fact that they are ConstString. While a conversion from ConstString to StringRef is relatively cheap (at least after a separate change I have to remove the hashing of the string and the lock operation), the conversion back to ConstString is expensive. If we pass around StringRef to functions that need ConstString, then the performance will remain poor.

include/lldb/Interpreter/Property.h
43

ok but note the original code - it's already storing m_name and m_description as ConstString. All I'm doing is exposing the underlying type. Would you prefer I change m_name and m_description to be std::string instead? Otherwise it won't actually save anything.

Also note that later uses take the names are put into m_name_to_index in the option parser, which is a UniqueCString, which requires ConstString. I don't know the code well enough to know whether all options or only some options go through this, so I could see it being worth it to only convert to ConstString at that point (for example, see source/Interpreter/OptionValueProperties.cpp in this review).

include/lldb/Symbol/ObjectFile.h
808–811

The only user of StripLinkerSymbolAnnotations immediately converts it to a ConstString. Changing this back to using StringRef would mean an extra lookup for architectures that do not override this function.

include/lldb/Utility/ConstString.h
481

This code is copied from StringRef, which is what lldb used before this change. Do you still want me to revert the assert?

Though IMO it should be changed to <=, not <, so that the null terminator can be read safely.

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
324–326

I can revert this fragment, but just note this won't affect the number of ConstStrings that are generated. m_impl has ConstStrings in it, and type_name is a ConstString, so yes they can be converted to StringRef just to call StringRef::contains, or we can leave them as ConstString and utilize strstr. Or, I can add ConstString::contains so the interface to ConstString and StringRef is more similar.

345–348

In line 352 below, m_impl is a UniqueCStringMap, so all keys are ConstString. The only way to look up in a UnqiueCStringMap is using ConstString, so matching_key should remain a ConstString to prevent an unnecessary lookup. Everything else is following from that.

labath added a subscriber: labath.Apr 25 2017, 7:16 AM
labath added inline comments.
include/lldb/Interpreter/Property.h
43

I'd suggest keeping these as StringRef as well. The option parser does not do anything performance critical (which probably means it shouldn't be using UniqueCStringMap in the first place, but that is a different discussion...).

include/lldb/Utility/ConstString.h
481

I think that asserting here is fine... There's no way you can make code that likes to tread off the end of an array safe by changing this function.

However, I am not really fond of the idea of ConstString taking over StringRef functionality. I think we should stick to requiring stringref conversions when doing string manipulation. Maybe if you reduce the number of ConstString occurences according to other comments, this will not be that necessary anymore (?)

scott.smith edited edge metadata.
scott.smith marked 21 inline comments as done.

address review comments

include/lldb/Symbol/ObjectFile.h
808–811

reverting this has no measurable performance impact on my test, so even though the caller has a ConstString, and will convert the result to a ConstString, I will revert this.

include/lldb/Utility/ConstString.h
481

The indexing is used in Symtab::InitNameIndexes, where is dealing with ConstString. IMO converting to a StringRef there is a bad idea, as it means there will be two variables representing the same thing (one ConstString, one StringRef), which makes it easy to update one and not the other.

But there is no performance difference, at least if review https://reviews.llvm.org/D32306 is committed (which removes hashing and locking when getting the length). Otherwise it's a >1% penalty to convert to StringRef in InitNameIndexes.

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
324–326

turns out this code is unused, filing a separate review to remove it.

scott.smith marked an inline comment as done.Apr 27 2017, 10:43 AM

Can I get a re-review on this? Thanks.

Can I get a re-review on this? Thanks.

Ping - please rereview!

clayborg accepted this revision.May 1 2017, 9:38 AM

We can iterate on this.

This revision is now accepted and ready to land.May 1 2017, 9:38 AM

We can iterate on this.

Thank you! Also, can you please push this? I don't have commit access.

This revision was automatically updated to reflect the committed changes.