UniqueCStringMap "sorts" the entries for fast lookup, but really it only cares about uniqueness. ConstString can be compared by pointer alone, rather than with strcmp, resulting in much faster comparisons. Change the interface to take ConstString instead, and propagate use of the type to the callers where appropriate.
Details
- Reviewers
clayborg - Group Reviewers
Restricted Project - Commits
- rG4d35d6b3b38b: Change UniqueCStringMap to use ConstString as the key
rLLDB301908: Change UniqueCStringMap to use ConstString as the key
rL301908: Change UniqueCStringMap to use ConstString as the key
Diff Detail
- Repository
- rL LLVM
Event Timeline
You should also add lldb-commits as a subscriber, fwiw, so comments/questions/etc are cc:'ed to the -commits list.
Very close. A few misuses of ConstString and this will be good to go.
include/lldb/Interpreter/Property.h | ||
---|---|---|
43 | This shouldn't be const-ified, Only names of things like variables, enumerators, typenames, paths, and other strings that are going to be uniqued should go into the ConstStringPool | |
include/lldb/Symbol/ObjectFile.h | ||
569 | Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,. | |
576–577 | Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,. | |
583 | Remove whitespace changes. Do as separate checkin if desired. This helps keep the noise down in the change in case things need to be cherry picked,. | |
808–811 | This actually doesn't need to change. Since it is merely stripping off parts of the string, this should really stay as a StringRef and it should return a StringRef. Change to use StringRef for return and for argument, or revert. | |
include/lldb/Utility/ConstString.h | ||
481 | I really don't like the prospect of crashing the debugger if someone calls this with an invalid index. Many people ship with asserts on. I would either make it safe if it is indeed just for reporting or remove the assert. | |
source/Interpreter/Property.cpp | ||
252 | Revert this. Description shouldn't be a ConstString. | |
269 | Revert this. Description shouldn't be a ConstString. | |
source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp | ||
324–326 | StringRef is better to use for modifying simple strings and looking for things. Actually looking at this class it is using ConstString all over the place for putting strings together. For creating strings we should use lldb_private::StreamString. For stripping stuff off and grabbing just part of a string, seeing if a string starts with, ends with, contains, etc, llvm::StringRef. So I would vote to just change it back, or fix the entire class to use lldb_private::StreamString | |
345–348 | Ditto | |
354 | Ditto | |
363 | Ditto | |
source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp | ||
1811–1816 | Switch to use StringRef as return and for argument or revert. | |
source/Plugins/ObjectFile/ELF/ObjectFileELF.h | ||
155–157 | Switch to use StringRef as return and for argument or revert. |
At a high level, I think there might be a misunderstanding on what I'm attempting to do. It isn't to convert things that weren't ConstString into things that are. It is instead to utilize the fact that we have all these ConstString in order to improve the performance of various operations by retaining the fact that they are ConstString. While a conversion from ConstString to StringRef is relatively cheap (at least after a separate change I have to remove the hashing of the string and the lock operation), the conversion back to ConstString is expensive. If we pass around StringRef to functions that need ConstString, then the performance will remain poor.
include/lldb/Interpreter/Property.h | ||
---|---|---|
43 | ok but note the original code - it's already storing m_name and m_description as ConstString. All I'm doing is exposing the underlying type. Would you prefer I change m_name and m_description to be std::string instead? Otherwise it won't actually save anything. Also note that later uses take the names are put into m_name_to_index in the option parser, which is a UniqueCString, which requires ConstString. I don't know the code well enough to know whether all options or only some options go through this, so I could see it being worth it to only convert to ConstString at that point (for example, see source/Interpreter/OptionValueProperties.cpp in this review). | |
include/lldb/Symbol/ObjectFile.h | ||
808–811 | The only user of StripLinkerSymbolAnnotations immediately converts it to a ConstString. Changing this back to using StringRef would mean an extra lookup for architectures that do not override this function. | |
include/lldb/Utility/ConstString.h | ||
481 | This code is copied from StringRef, which is what lldb used before this change. Do you still want me to revert the assert? Though IMO it should be changed to <=, not <, so that the null terminator can be read safely. | |
source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp | ||
324–326 | I can revert this fragment, but just note this won't affect the number of ConstStrings that are generated. m_impl has ConstStrings in it, and type_name is a ConstString, so yes they can be converted to StringRef just to call StringRef::contains, or we can leave them as ConstString and utilize strstr. Or, I can add ConstString::contains so the interface to ConstString and StringRef is more similar. | |
345–348 | In line 352 below, m_impl is a UniqueCStringMap, so all keys are ConstString. The only way to look up in a UnqiueCStringMap is using ConstString, so matching_key should remain a ConstString to prevent an unnecessary lookup. Everything else is following from that. |
include/lldb/Interpreter/Property.h | ||
---|---|---|
43 | I'd suggest keeping these as StringRef as well. The option parser does not do anything performance critical (which probably means it shouldn't be using UniqueCStringMap in the first place, but that is a different discussion...). | |
include/lldb/Utility/ConstString.h | ||
481 | I think that asserting here is fine... There's no way you can make code that likes to tread off the end of an array safe by changing this function. However, I am not really fond of the idea of ConstString taking over StringRef functionality. I think we should stick to requiring stringref conversions when doing string manipulation. Maybe if you reduce the number of ConstString occurences according to other comments, this will not be that necessary anymore (?) |
address review comments
include/lldb/Symbol/ObjectFile.h | ||
---|---|---|
808–811 | reverting this has no measurable performance impact on my test, so even though the caller has a ConstString, and will convert the result to a ConstString, I will revert this. | |
include/lldb/Utility/ConstString.h | ||
481 | The indexing is used in Symtab::InitNameIndexes, where is dealing with ConstString. IMO converting to a StringRef there is a bad idea, as it means there will be two variables representing the same thing (one ConstString, one StringRef), which makes it easy to update one and not the other. But there is no performance difference, at least if review https://reviews.llvm.org/D32306 is committed (which removes hashing and locking when getting the length). Otherwise it's a >1% penalty to convert to StringRef in InitNameIndexes. | |
source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp | ||
324–326 | turns out this code is unused, filing a separate review to remove it. |
This shouldn't be const-ified, Only names of things like variables, enumerators, typenames, paths, and other strings that are going to be uniqued should go into the ConstStringPool