This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lldb/source/Utility/
-
source/
-
Utility/
2/2
ConstString.cpp
-
llvm/
-
include/llvm/ADT/
-
llvm/
-
ADT/
2/2
StringMap.h
-
lib/Support/
-
Support/
2/2
StringMap.cpp

Differential D122974

prevent ConstString from calling djbHash() more than once
ClosedPublic

Authored by dblaikie on Apr 2 2022, 5:46 AM.

Download Raw Diff

Details

Reviewers

clayborg
chandlerc
llunak

Summary

Pool::GetConstCStringWithStringRef() may call djbHash() up to 3 times for the same string: First from Pool::hash() when deciding which StringMap in the pool to use, then from read-locked StringMap::find() and then possibly from StringMap::insert() if insertion is needed.

If LLDB's index cache is enabled and everything is cached, calls to djbHash() are ~30% of the total CPU time here when I profile lldb start. Modifying StringMap to allow explicitly passing in a precomputed hash and calling djbHash() just once reduces startup CPU time by ~19%. Explicitly passing in the hash will also allow LLDB to cache the value in a cache file, thus avoiding computing it completely (another commit).

Diff Detail

Unit TestsFailed

	Time	Test
	60,160 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,120 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,110 ms	x64 debian > LLVM.CodeGen/NVPTX::wmma.py

Event Timeline

llunak created this revision.Apr 2 2022, 5:46 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2022, 5:46 AM

Herald added subscribers: dexonsmith, hiraditya. · View Herald Transcript

llunak requested review of this revision.Apr 2 2022, 5:46 AM

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 2 2022, 5:46 AM

Herald added subscribers: llvm-commits, lldb-commits. · View Herald Transcript

Harbormaster completed remote builds in B157587: Diff 419971.Apr 2 2022, 7:12 AM

Herald added a subscriber: JDevlieghere. · View Herald TranscriptApr 2 2022, 7:12 AM

The ConstString/StringPool is pretty hot so I'm very excited about making it faster.

I'm slightly concerned about the two hash values (the "full" hash vs the single byte one). That's not something that was introduced in this patch, but passing it around adds an opportunity to get it wrong.

I'm wondering if we could wrap this in a struct and pass that around:

struct HashedStringRef {
  const llvm::StringRef str;
  const unsigned full_hash;
  const uint8_t hash; 
  HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
}

It looks like we always need both except in the StringMap implementation, but if I'm reading the code correctly we'll have constructed it in ConstString already anyway?

In D122974#3424654, @JDevlieghere wrote:

I'm slightly concerned about the two hash values (the "full" hash vs the single byte one). That's not something that was introduced in this patch, but passing it around adds an opportunity to get it wrong.

If this is aimed at me, then I don't know what you mean here. The single-byte hash is not passed around, it's used only locally in the two places that my change modifies and one more place where StringMap is not accessed (so no full hash needed there). I see no easy way for it to go wrong, and IMO the HashedStringRef struct would just complicate the code.

LGTM from a LLDB perspective.

In principle, I like this a lot, though I'm not sure if this is the best way to integrate with with the StringMap class. Adding @dblaikie for thoughts on that.

Some background:
LLDB uses a string pool, and during (parallel) dwarf indexing, this string pool is the biggest bottleneck. To reduce contention, lldb actually uses a set of string pools, where it first computes a hash of the string (I believe this is the one-byte hash that Jonas is referring to) to determine which pool (and which mutex) to use, and then performs the actual container operation. This approach has resulted in considerable speedup, but it means the string is getting hashed more than once. Luboš is trying to remove the redundant hashes and speed it up even further.

Ping @dblaikie ?

llunak mentioned this in D123390: [demangler][NFC] OperatorInfo table unit test.Apr 12 2022, 12:50 AM

Ping.

(sorry, this slipping through somehow - it's on my radar now)

In D122974#3424654, @JDevlieghere wrote:
The ConstString/StringPool is pretty hot so I'm very excited about making it faster.

I'm slightly concerned about the two hash values (the "full" hash vs the single byte one). That's not something that was introduced in this patch, but passing it around adds an opportunity to get it wrong.

I'm wondering if we could wrap this in a struct and pass that around:
struct HashedStringRef {
  const llvm::StringRef str;
  const unsigned full_hash;
  const uint8_t hash; 
  HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
}
It looks like we always need both except in the StringMap implementation, but if I'm reading the code correctly we'll have constructed it in ConstString already anyway?

I'm sort of with you here - I don't think this API feature needs to affect lldb much - the scope of these externally-computed hashes is small, but the StringMap API could be more robust/less error prone by having a struct like this. Specifically StringMap::hash could/should return an immutable struct that contains the StringRef and the hash (not that that's bulletproof - the data beneath the StringRef could still be modified externally - but the same is true of any hashing data structure - the keys might be shallow or have mutable state, etc) - and that immutable object must be passed back to the hash-providing insertion functions. (these functions could/should probably still then assert that the hash is correct in case of that underlying mutation - though someone could argue that's overkill and I'd be open to having that discussion).

By immutable I mean that the caller can't go and modify either the StringRef or hash to make these out of sync. These handles are probably still copy assignable. The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call StringMap to produce one.

lldb/source/Utility/ConstString.cpp
106–109	total aside, but it might be nice to refactor `m_string_pools[h]` out into a named variable here - accessing it 3 times and in a context where it's super important that it's the same thing in all 3 places, seems like it'd be easier to read with a named variable - might make it clear which things need to happen under the lock and which ones don't, etc. (oh, and the 4th and 5th use a few lines down - and I think that'd be the only uses of `h`, so `h` could go away and `hash` could be called inline in the array index instead, maybe?)

In D122974#3480686, @dblaikie wrote:

In D122974#3424654, @JDevlieghere wrote:

...

struct HashedStringRef {
  const llvm::StringRef str;
  const unsigned full_hash;
  const uint8_t hash; 
  HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
}

...

The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call StringMap to produce one.

But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

(*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.

(these functions could/should probably still then assert that the hash is correct in case of that underlying mutation - though someone could argue that's overkill and I'd be open to having that discussion).

I'd prefer to have that discussion. If your argument is perfection, then let's do the full monty.

In D122974#3481269, @llunak wrote:
In D122974#3480686, @dblaikie wrote:

In D122974#3424654, @JDevlieghere wrote:

...
struct HashedStringRef {
  const llvm::StringRef str;
  const unsigned full_hash;
  const uint8_t hash; 
  HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
}
...

The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call StringMap to produce one.

But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

I think that David's point was that you would use a (probably static) StringMap method to produce the hash object, and then you use the hash object to select the correct map, and also to insert the key in the map. Something like:

auto hash_obj = MyStringMap::Hash(key);
MyStringMap &map = maps[hash_obj.one_char_hash];
map.insert(has_obj, key, value);

That should only produce one hash computation, right?

(*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.

I think that's because it's trying to produce a value that is not correlated to the value used inside the individual objects (so that you don't overload some buckets of the maps while keeping others empty). However, there are other ways to achieve that. Since StringMap is always using a power of two for the number of buckets, I think it would be sufficient to use an odd number of StringMap instances (say 255). Using the modulo operation to select the StringMap should select spread out the complete hashes evenly among to individual maps.

And it would mean that the StringMap does not have to concern itself with the one-char hashes (which definitely seems like a layering violation).

(these functions could/should probably still then assert that the hash is correct in case of that underlying mutation - though someone could argue that's overkill and I'd be open to having that discussion).

I'd prefer to have that discussion. If your argument is perfection, then let's do the full monty.

I guess we could put those checks under #ifdef EXPENSIVE_CHECKS (?)

In D122974#3481852, @labath wrote:

In D122974#3481269, @llunak wrote:

But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

I think that David's point was that you would use a (probably static) StringMap method to produce the hash object, and then you use the hash object to select the correct map, and also to insert the key in the map. Something like:

...

That should only produce one hash computation, right?

Yes, it would, but that's not my point. If StringMap insists on computing the hash itself and doesn't allow outside code to provide it, then that prevents the possible LLDB optimization saving the remaing 10-15% startup CPU cost by not computing the hash at all. If StringMap allows that, then this HashedStringRef idea can be easily circumvented by passing the hash directly, so it seems rather pointless. And the relevant LLDB code from this change is so small that creating HashedStringRef just for that seems like an overkill.

(*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.

I think that's because it's trying to produce a value that is not correlated to the value used inside the individual objects (so that you don't overload some buckets of the maps while keeping others empty). However, there are other ways to achieve that. Since StringMap is always using a power of two for the number of buckets, I think it would be sufficient to use an odd number of StringMap instances (say 255). Using the modulo operation to select the StringMap should select spread out the complete hashes evenly among to individual maps.

By different seed I meant that DWARF and StringMap use different seed for DJB hash. And that's only because LLVM code historically used the worse 0 seed and some tests rely on that (7b319df29bf4ebe690ca0c41761e46d8b0081293). But it seems that LLDB doesn't even read .debug_names entries into memory, so this having anything to do with DWARF debuginfo is actually moot, it'd only matter for LLDB index cache, which could use any hash it wants.

But you're possibly talking about something unrelated here.

In D122974#3482540, @llunak wrote:

In D122974#3481852, @labath wrote:

In D122974#3481269, @llunak wrote:

But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

I think that David's point was that you would use a (probably static) StringMap method to produce the hash object, and then you use the hash object to select the correct map, and also to insert the key in the map. Something like:

...

That should only produce one hash computation, right?

Yes, it would, but that's not my point. If StringMap insists on computing the hash itself and doesn't allow outside code to provide it, then that prevents the possible LLDB optimization saving the remaing 10-15% startup CPU cost by not computing the hash at all. If StringMap allows that, then this HashedStringRef idea can be easily circumvented by passing the hash directly, so it seems rather pointless. And the relevant LLDB code from this change is so small that creating HashedStringRef just for that seems like an overkill.

Interesting. I don't know if I missed this somewhere, but could explain what kind of a map operation can lldb perform without computing the hash at least once?

(*) Well, not exactly, the seed is different. Doesn't seem an impossible problem to solve though.

I think that's because it's trying to produce a value that is not correlated to the value used inside the individual objects (so that you don't overload some buckets of the maps while keeping others empty). However, there are other ways to achieve that. Since StringMap is always using a power of two for the number of buckets, I think it would be sufficient to use an odd number of StringMap instances (say 255). Using the modulo operation to select the StringMap should select spread out the complete hashes evenly among to individual maps.

By different seed I meant that DWARF and StringMap use different seed for DJB hash. And that's only because LLVM code historically used the worse 0 seed and some tests rely on that (7b319df29bf4ebe690ca0c41761e46d8b0081293). But it seems that LLDB doesn't even read .debug_names entries into memory, so this having anything to do with DWARF debuginfo is actually moot, it'd only matter for LLDB index cache, which could use any hash it wants.

But you're possibly talking about something unrelated here.

Yeah, I thought that the we used a different seed (or hash?) when computing the StringMap index -- I believe that was the case at some point, but I guess it was changed since then...

In D122974#3482556, @labath wrote:

Interesting. I don't know if I missed this somewhere, but could explain what kind of a map operation can lldb perform without computing the hash at least once?

All of them :). Computing is not the only way of obtaining the hash of a string. LLDB index cache (settings set symbols.enable-lldb-index-cache true) stores a big bunch of strings and when loading them back into ConstString the next time djbHash() is ~15% of total CPU LLDB startup time. That could be saved if the cache cached the hash values too.

In D122974#3482720, @llunak wrote:

In D122974#3482556, @labath wrote:

Interesting. I don't know if I missed this somewhere, but could explain what kind of a map operation can lldb perform without computing the hash at least once?

All of them :). Computing is not the only way of obtaining the hash of a string. LLDB index cache (settings set symbols.enable-lldb-index-cache true) stores a big bunch of strings and when loading them back into ConstString the next time djbHash() is ~15% of total CPU LLDB startup time. That could be saved if the cache cached the hash values too.

If the string pool caches the hash value, we could actually write out the hash in the cache file to speed up loading. I didn't want to re-hash every string when I saved the strings out to the cache file due to the cost, but it would be interesting to look into to see if it improves performance.

llunak mentioned this in D124704: [lldb] save ConstString hash value to cache files too.Apr 29 2022, 3:42 PM

In D122974#3483203, @clayborg wrote:

If the string pool caches the hash value, we could actually write out the hash in the cache file to speed up loading.

The patch doing that is D124704 (except for the string pool caching).

I didn't want to re-hash every string when I saved the strings out to the cache file due to the cost, but it would be interesting to look into to see if it improves performance.

I can measure 10% startup time saved when everything is already cached. Funnily enough, profiler consistently claims that saving caches got faster too (I already use D122975).

In D122974#3483647, @llunak wrote:

I can measure 10% startup time saved when everything is already cached. Funnily enough, profiler consistently claims that saving caches got faster too (I already use D122975).

Never mind, the saving was a bug. But the loading number is valid.

Used a temporary variable instead of repeated 'm_string_pools[h]'.
Added assert that the passed-in hash value matches, guarded by EXPENSIVE_CHECKS. It will assert also hashes computed by StringMap itself, but checking only values passed from outside would mean adding a number of *Impl functions and adding asserts in a number of places.

llunak updated this revision to Diff 427188.May 4 2022, 9:10 PM

Harbormaster completed remote builds in B162824: Diff 427188.May 4 2022, 10:34 PM

dexonsmith removed a subscriber: dexonsmith.May 5 2022, 9:39 AM

clayborg requested changes to this revision.May 5 2022, 11:28 AM

clayborg added inline comments.

lldb/source/Utility/ConstString.cpp
176–177	Can we rename these functions to something other than "hash"? It is a bit confusing to have a function called "hash" that returns a uint8_t when what this is really doing is getting use the string pool index. Maybe "GetPoolIndex"? I could see this function accidentally being used to hash a string instead of using StringPool::hash()
llvm/include/llvm/ADT/StringMap.h
69	Use "uint32_t" instead of unsigned to clarify the exact width of the integer. Or create a "HashType" typedef and then use that.
106
llvm/lib/Support/StringMap.cpp
83
141

This revision now requires changes to proceed.May 5 2022, 11:28 AM

add a function for selecting a pool in ConstString
use uint32_t for hash in StringMap API

Harbormaster completed remote builds in B165719: Diff 431216.May 22 2022, 1:51 AM

clayborg accepted this revision.May 23 2022, 1:13 PM

This revision is now accepted and ready to land.May 23 2022, 1:13 PM

In D122974#3481269, @llunak wrote:
In D122974#3480686, @dblaikie wrote:

In D122974#3424654, @JDevlieghere wrote:

...
struct HashedStringRef {
  const llvm::StringRef str;
  const unsigned full_hash;
  const uint8_t hash; 
  HashedStringRef(llvm::StringRef s) : str(s), full_hash(djbHash(str)), hash(hash(str)) {}
}
...

The external code shouldn't be able to create their own (ctor private/protected, etc) - the only way to get one is to call StringMap to produce one.

But what if I don't want StringMap to compute the hash at all? There's still that 10-15% of CPU time spent in djbHash() when debug info uses exactly[*] that (and LLDB's index cache could be modified to store it). Which means it can be useful for StringMap to allow accepting precomputed hash, and then what purpose will that HashedStringRef have?

If that happens, yeah, there's probably no point protecting this API - though I'd be more cautious of that change/any change that supports serializing the hash as this could end up causing StringMap to have to have stability guarantees that might be unfavorable for other uses (see ABSL maps that intentionally randomize/reseed each instance to ensure users aren't depending on undocumented guarantees - this gives them the flexibility to update the hash algorithm with something better in the future without breaking users who might be serializing the hashes/relying on iteration order/etc)

If we want a structure that can use a stable hash - it might be at that point that we move the hash support entirely out of StringMap and make it pluggable, with the default implementation doing djbHash as before, and even the new "stable" one doing that too, but doing it explicitly/documenting what guarantees it requires (stability within a major version? Across major versions?)

& then I guess what the API looks like is still an open question - perhaps the default trait (the one without any stability guarantees) could have a private implementation and expose that to StringMap via friendship. The stable implementation can have a public implementation for hashing & then an API like the one proposed where they can be passed into StringMap (yeah, without any of the handle safety previously proposed) - and assert when it's passed in that it matches what the trait provides (under EXPENSIVE_CHECKS, probably).

In D122974#3532822, @dblaikie wrote:

If we want a structure that can use a stable hash

Not for D124704. It doesn't make sense to require a stable hash algorithm for an internal cache file. All that it requires is that StringMap provides the hash value for a string, which it does with the public hash(). If that implementation changes, the EXPENSIVE_CHECKS assert will find that when unittesting the LLDB parts, and in that case the versioned cache file can have its version increased and problem solved. Even in case StringMap implementation changes in a way that no longer makes it feasible to store the precomputed hash value it's simpler to just dump the optimization.

So no need to overcomplicate this. As for the case of somebody else trying to rely on the stability of the hash, I can solve that by adding a comment that says it's not stable (and then that somebody else can go to the lengths of making it stable if needed).

Added documentation for StringMap::hash(), including an explicit comment saying that the implementation is not guaranteed to be stable.

Harbormaster completed remote builds in B165998: Diff 431597.May 24 2022, 12:55 AM

clayborg accepted this revision.May 24 2022, 12:04 PM

It doesn't make sense to require a stable hash algorithm for an internal cache file.

It's at least a stronger stability requirement than may be provided before - like: stable across process boundaries, at least, by the sounds of it? yeah?
& then still raises the question for me what about minor version updates, is it expected to be stable across those?

It'd be a bit subtle to have to know when to go and update the lldb cache version number because this hash function has changed, for instance. It might be more suitable to have lldb explicitly request a known hash function rather than the generic one (even if they're identical at the moment) so updates to LLVM's core libraries don't subtly break the hashing/cache system here. (I would guess there's no cross-version hash testing? So it seems like such a change could produce fairly subtle breakage that would slip under the radar fairly easily?)

In D122974#3535669, @dblaikie wrote:

It doesn't make sense to require a stable hash algorithm for an internal cache file.

It's at least a stronger stability requirement than may be provided before - like: stable across process boundaries, at least, by the sounds of it? yeah?

It's meant to be the same for the same library binary.

& then still raises the question for me what about minor version updates, is it expected to be stable across those?

Is anything expected to be stable across those? If so, that doesn't seem to be the reality of it (a random quick check finds two ABI changes just between 14.0.4 and 14.0.5, e6de9ed37308e46560243229dd78e84542f37ead and 53433dd0b5034681e1866e74651e8527a29e9364).

It'd be a bit subtle to have to know when to go and update the lldb cache version number because this hash function has changed, for instance. It might be more suitable to have lldb explicitly request a known hash function rather than the generic one (even if they're identical at the moment) so updates to LLVM's core libraries don't subtly break the hashing/cache system here. (I would guess there's no cross-version hash testing? So it seems like such a change could produce fairly subtle breakage that would slip under the radar fairly easily?)

D124704 adds a unittest that compares StringMap::hash() to a known hardcoded value, so whenever the hash implementation changes, it won't be possible to unittest LLDB with that change, and that will be the time to change the lldb cache version. If the theory is that this should keep working even with the library changing without LLDB rebuild, then as I wrote above that theory needs double-checking first. And additionally a good question to ask would be if it's really a good idea to do incompatible implementation changes to a class that has half of its functionality inline. Finally, there have been already attempts to change the hash function to use the better non-zero seed (D97396), and they were reverted because something depended on the hash function not changing, so I don't see it that likely for the hash function to change just like that in a minor update.

But if after all this that's still the theory, I guess StringMap could get an additional template parameter specifying the hash function, if it's actually worth the effort.

@clayborg : BTW, this is exactly the reason why I wanted the cache files header to include a hash value for a known string.

In D122974#3536342, @llunak wrote:

In D122974#3535669, @dblaikie wrote:

It doesn't make sense to require a stable hash algorithm for an internal cache file.

It's at least a stronger stability requirement than may be provided before - like: stable across process boundaries, at least, by the sounds of it? yeah?

It's meant to be the same for the same library binary.

& then still raises the question for me what about minor version updates, is it expected to be stable across those?

Is anything expected to be stable across those? If so, that doesn't seem to be the reality of it (a random quick check finds two ABI changes just between 14.0.4 and 14.0.5, e6de9ed37308e46560243229dd78e84542f37ead and 53433dd0b5034681e1866e74651e8527a29e9364).

My udnerstanding is that generally ABI stability is intended to be guaranteed across minor releases (& those patches don't look like ABI breaks to me - the first one changes the mangling of intended-to-be-local symbols so they don't collide, so it should cause valid programs to link when they previously failed to link. The second seems to add a new target that wasn't present - so nothing to break against?) but that's probably orthogonal to the stability of the map/cache/expectations of folks releasing and using lldb.

It'd be a bit subtle to have to know when to go and update the lldb cache version number because this hash function has changed, for instance. It might be more suitable to have lldb explicitly request a known hash function rather than the generic one (even if they're identical at the moment) so updates to LLVM's core libraries don't subtly break the hashing/cache system here. (I would guess there's no cross-version hash testing? So it seems like such a change could produce fairly subtle breakage that would slip under the radar fairly easily?)

D124704 adds a unittest that compares StringMap::hash() to a known hardcoded value, so whenever the hash implementation changes, it won't be possible to unittest LLDB with that change, and that will be the time to change the lldb cache version.

Ah, good stuff - doesn't guarantee that any hash change necessarily breaks the test, but certainly helps/seems like a good idea, thanks!

If the theory is that this should keep working even with the library changing without LLDB rebuild, then as I wrote above that theory needs double-checking first. And additionally a good question to ask would be if it's really a good idea to do incompatible implementation changes to a class that has half of its functionality inline.

Sorry, I wasn't meaning to discuss dynamic library versioning issues/mismatches, just static/fully matched versions.

Finally, there have been already attempts to change the hash function to use the better non-zero seed (D97396), and they were reverted because something depended on the hash function not changing, so I don't see it that likely for the hash function to change just like that in a minor update.

That something seems to have been another part of lldb - and that's the sort of change I'd like to enable/not make harder by avoiding more dependence on this ordering/hashing.

But if after all this that's still the theory, I guess StringMap could get an additional template parameter specifying the hash function, if it's actually worth the effort.

Yeah, a little traits class or the like is roughly what I'd picture.

In D122974#3538413, @dblaikie wrote:

In D122974#3536342, @llunak wrote:

D124704 adds a unittest that compares StringMap::hash() to a known hardcoded value, so whenever the hash implementation changes, it won't be possible to unittest LLDB with that change, and that will be the time to change the lldb cache version.

Ah, good stuff - doesn't guarantee that any hash change necessarily breaks the test, but certainly helps/seems like a good idea, thanks!

I think that does guarantee it in practice. If changing a string hash implementation does not change the result for a randomly chosen non-trivial input, then either the implementation is compatible or it's very poor, so I don't see why either of these should be a practical concern. But if the highly improbable case is a concern for some reason I fail to see, I can make the test check two different inputs, which should make the case virtually impossible.

If the theory is that this should keep working even with the library changing without LLDB rebuild, then as I wrote above that theory needs double-checking first. And additionally a good question to ask would be if it's really a good idea to do incompatible implementation changes to a class that has half of its functionality inline.

Sorry, I wasn't meaning to discuss dynamic library versioning issues/mismatches, just static/fully matched versions.

Then I still don't know what the problem is supposed to be. If the StringMap hash implementation ever changes, the necessary LLDB rebuild will detect this, the relevant LLDB parts will get adjusted and problem solved.

Ping.

If the theory is that this should keep working even with the library changing without LLDB rebuild, then as I wrote above that theory needs double-checking first. And additionally a good question to ask would be if it's really a good idea to do incompatible implementation changes to a class that has half of its functionality inline.

Sorry, I wasn't meaning to discuss dynamic library versioning issues/mismatches, just static/fully matched versions.

Then I still don't know what the problem is supposed to be. If the StringMap hash implementation ever changes, the necessary LLDB rebuild will detect this, the relevant LLDB parts will get adjusted and problem solved.

What I mean is if the cache is used across statically linked versions - eg: cache is created, someone installs an update to lldb, then the cache is read back and misinterprets the hashes in the cache if the hash algorithm had changed between versions.

Finally, there have been already attempts to change the hash function to use the better non-zero seed (D97396), and they were reverted because something depended on the hash function not changing, so I don't see it that likely for the hash function to change just like that in a minor update.

That something seems to have been another part of lldb - and that's the sort of change I'd like to enable/not make harder by avoiding more dependence on this ordering/hashing.

But if after all this that's still the theory, I guess StringMap could get an additional template parameter specifying the hash function, if it's actually worth the effort.

Yeah, a little traits class or the like is roughly what I'd picture.

In D122974#3567278, @dblaikie wrote:

Then I still don't know what the problem is supposed to be. If the StringMap hash implementation ever changes, the necessary LLDB rebuild will detect this, the relevant LLDB parts will get adjusted and problem solved.

What I mean is if the cache is used across statically linked versions - eg: cache is created, someone installs an update to lldb, then the cache is read back and misinterprets the hashes in the cache if the hash algorithm had changed between versions.

That is not going to happen, that updated LLDB will be using a different cache version. That's what the part you're replying to means. You cannot have two LLDB builds using the same cache version but different hash implementations, as long as you do not ignore LLDB unittests failing.

Ping.

Herald added a subscriber: Michael137. · View Herald TranscriptJun 15 2022, 12:12 AM

Finally, there have been already attempts to change the hash function to use the better non-zero seed (D97396), and they were reverted because something depended on the hash function not changing, so I don't see it that likely for the hash function to change just like that in a minor update.

That something seems to have been another part of lldb - and that's the sort of change I'd like to enable/not make harder by avoiding more dependence on this ordering/hashing.

But if after all this that's still the theory, I guess StringMap could get an additional template parameter specifying the hash function, if it's actually worth the effort.

Yeah, a little traits class or the like is roughly what I'd picture.

^

^ I think it's still worthwhile/necessary to separate LLDB's use case/hashing algorithm choice from LLVM's so LLVM's code can be changed to be more change resilient in a way that LLDB's cannot (eg: random seeds will never be usable by LLDB but may be for LLVM).

In D122974#3567444, @llunak wrote:

In D122974#3567278, @dblaikie wrote:

Then I still don't know what the problem is supposed to be. If the StringMap hash implementation ever changes, the necessary LLDB rebuild will detect this, the relevant LLDB parts will get adjusted and problem solved.

What I mean is if the cache is used across statically linked versions - eg: cache is created, someone installs an update to lldb, then the cache is read back and misinterprets the hashes in the cache if the hash algorithm had changed between versions.

That is not going to happen, that updated LLDB will be using a different cache version. That's what the part you're replying to means. You cannot have two LLDB builds using the same cache version but different hash implementations, as long as you do not ignore LLDB unittests failing.

Fair enough - probably good to have some commentary in the test cases that makes it really clear that if the test needs to be updated then the version needs to be updated. (is that patch already posted? Could you link to that comment from this review?)

In D122974#3586352, @dblaikie wrote:

^ I think it's still worthwhile/necessary to separate LLDB's use case/hashing algorithm choice from LLVM's so LLVM's code can be changed to be more change resilient in a way that LLDB's cannot (eg: random seeds will never be usable by LLDB but may be for LLVM).

Possibly, but that can be done in another patch. Preferably by somebody who has an actual use case for it.

Fair enough - probably good to have some commentary in the test cases that makes it really clear that if the test needs to be updated then the version needs to be updated. (is that patch already posted? Could you link to that comment from this review?)

Already done. D124704 , https://reviews.llvm.org/D124704#change-yXGqjEgMMr0u .

Ping.

dblaikie mentioned this in rG67c631d283fc: [ADT][StringMap] Add ability to precompute and reuse the string hash.Mon, Dec 11, 4:07 PM

Ended up coming back to this and committing a version of this in:
rG67c631d283fc96d652304199cd625be426b98f8e
rG2e197602305be18b963928e6ae024a004a95af6d

dblaikie commandeered this revision.Tue, Dec 12, 2:59 PM

dblaikie closed this revision.

dblaikie edited reviewers, added: llunak; removed: dblaikie.

Revision Contents

Path

Size

lldb/

source/

Utility/

ConstString.cpp

48 lines

llvm/

include/

llvm/

ADT/

StringMap.h

51 lines

lib/

Support/

StringMap.cpp

14 lines

Diff 431597

lldb/source/Utility/ConstString.cpp

Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	if (ccstr != nullptr) {
// Since the entry is read only, and we derive the entry entirely from		// Since the entry is read only, and we derive the entry entirely from
// the pointer, we don't need the lock.		// the pointer, we don't need the lock.
const StringPoolEntryType &entry = GetStringMapEntryFromKeyData(ccstr);		const StringPoolEntryType &entry = GetStringMapEntryFromKeyData(ccstr);
return entry.getKey().size();		return entry.getKey().size();
}		}
return 0;		return 0;
}		}

StringPoolValueType GetMangledCounterpart(const char *ccstr) const {		StringPoolValueType GetMangledCounterpart(const char *ccstr) {
if (ccstr != nullptr) {		if (ccstr != nullptr) {
const uint8_t h = hash(llvm::StringRef(ccstr));		const PoolEntry &pool = selectPool(llvm::StringRef(ccstr));
llvm::sys::SmartScopedReader<false> rlock(m_string_pools[h].m_mutex);		llvm::sys::SmartScopedReader<false> rlock(pool.m_mutex);
return GetStringMapEntryFromKeyData(ccstr).getValue();		return GetStringMapEntryFromKeyData(ccstr).getValue();
}		}
return nullptr;		return nullptr;
}		}

const char GetConstCString(const char cstr) {		const char GetConstCString(const char cstr) {
if (cstr != nullptr)		if (cstr != nullptr)
return GetConstCStringWithLength(cstr, strlen(cstr));		return GetConstCStringWithLength(cstr, strlen(cstr));
return nullptr;		return nullptr;
}		}

const char GetConstCStringWithLength(const char cstr, size_t cstr_len) {		const char GetConstCStringWithLength(const char cstr, size_t cstr_len) {
if (cstr != nullptr)		if (cstr != nullptr)
return GetConstCStringWithStringRef(llvm::StringRef(cstr, cstr_len));		return GetConstCStringWithStringRef(llvm::StringRef(cstr, cstr_len));
return nullptr;		return nullptr;
}		}

const char *GetConstCStringWithStringRef(const llvm::StringRef &string_ref) {		const char *GetConstCStringWithStringRef(const llvm::StringRef &string_ref) {
if (string_ref.data()) {		if (string_ref.data()) {
const uint8_t h = hash(string_ref);		const uint32_t string_hash = StringPool::hash(string_ref);
		PoolEntry &pool = selectPool(string_hash);

{		{
llvm::sys::SmartScopedReader<false> rlock(m_string_pools[h].m_mutex);		llvm::sys::SmartScopedReader<false> rlock(pool.m_mutex);
auto it = m_string_pools[h].m_string_map.find(string_ref);		auto it = pool.m_string_map.find(string_ref, string_hash);
if (it != m_string_pools[h].m_string_map.end())		if (it != pool.m_string_map.end())
		dblaikieAuthorUnsubmitted Done Reply Inline Actions total aside, but it might be nice to refactor `m_string_pools[h]` out into a named variable here - accessing it 3 times and in a context where it's super important that it's the same thing in all 3 places, seems like it'd be easier to read with a named variable - might make it clear which things need to happen under the lock and which ones don't, etc. (oh, and the 4th and 5th use a few lines down - and I think that'd be the only uses of `h`, so `h` could go away and `hash` could be called inline in the array index instead, maybe?) dblaikie: total aside, but it might be nice to refactor `m_string_pools[h]` out into a named variable…
return it->getKeyData();		return it->getKeyData();
}		}

llvm::sys::SmartScopedWriter<false> wlock(m_string_pools[h].m_mutex);		llvm::sys::SmartScopedWriter<false> wlock(pool.m_mutex);
StringPoolEntryType &entry =		StringPoolEntryType &entry =
*m_string_pools[h]		*pool.m_string_map
.m_string_map.insert(std::make_pair(string_ref, nullptr))		.insert(std::make_pair(string_ref, nullptr), string_hash)
.first;		.first;
return entry.getKeyData();		return entry.getKeyData();
}		}
return nullptr;		return nullptr;
}		}

const char *		const char *
GetConstCStringAndSetMangledCounterPart(llvm::StringRef demangled,		GetConstCStringAndSetMangledCounterPart(llvm::StringRef demangled,
const char *mangled_ccstr) {		const char *mangled_ccstr) {
const char *demangled_ccstr = nullptr;		const char *demangled_ccstr = nullptr;

{		{
const uint8_t h = hash(demangled);		const uint32_t demangled_hash = StringPool::hash(demangled);
llvm::sys::SmartScopedWriter<false> wlock(m_string_pools[h].m_mutex);		PoolEntry &pool = selectPool(demangled_hash);
		llvm::sys::SmartScopedWriter<false> wlock(pool.m_mutex);

// Make or update string pool entry with the mangled counterpart		// Make or update string pool entry with the mangled counterpart
StringPool &map = m_string_pools[h].m_string_map;		StringPool &map = pool.m_string_map;
StringPoolEntryType &entry = *map.try_emplace(demangled).first;		StringPoolEntryType &entry =
		*map.try_emplace_with_hash(demangled, demangled_hash).first;

entry.second = mangled_ccstr;		entry.second = mangled_ccstr;

// Extract the const version of the demangled_cstr		// Extract the const version of the demangled_cstr
demangled_ccstr = entry.getKeyData();		demangled_ccstr = entry.getKeyData();
}		}

{		{
// Now assign the demangled const string as the counterpart of the		// Now assign the demangled const string as the counterpart of the
// mangled const string...		// mangled const string...
const uint8_t h = hash(llvm::StringRef(mangled_ccstr));		PoolEntry &pool = selectPool(llvm::StringRef(mangled_ccstr));
llvm::sys::SmartScopedWriter<false> wlock(m_string_pools[h].m_mutex);		llvm::sys::SmartScopedWriter<false> wlock(pool.m_mutex);
GetStringMapEntryFromKeyData(mangled_ccstr).setValue(demangled_ccstr);		GetStringMapEntryFromKeyData(mangled_ccstr).setValue(demangled_ccstr);
}		}

// Return the constant demangled C string		// Return the constant demangled C string
return demangled_ccstr;		return demangled_ccstr;
}		}

const char GetConstTrimmedCStringWithLength(const char cstr,		const char GetConstTrimmedCStringWithLength(const char cstr,
Show All 11 Lines	for (const auto &pool : m_string_pools) {
llvm::sys::SmartScopedReader<false> rlock(pool.m_mutex);		llvm::sys::SmartScopedReader<false> rlock(pool.m_mutex);
const Allocator &alloc = pool.m_string_map.getAllocator();		const Allocator &alloc = pool.m_string_map.getAllocator();
stats.bytes_total += alloc.getTotalMemory();		stats.bytes_total += alloc.getTotalMemory();
stats.bytes_used += alloc.getBytesAllocated();		stats.bytes_used += alloc.getBytesAllocated();
}		}
return stats;		return stats;
}		}

protected:		protected:
uint8_t hash(const llvm::StringRef &s) const {
uint32_t h = llvm::djbHash(s);
return ((h >> 24) ^ (h >> 16) ^ (h >> 8) ^ h) & 0xff;
}

struct PoolEntry {		struct PoolEntry {
		clayborgUnsubmitted Done Reply Inline Actions Can we rename these functions to something other than "hash"? It is a bit confusing to have a function called "hash" that returns a uint8_t when what this is really doing is getting use the string pool index. Maybe "GetPoolIndex"? I could see this function accidentally being used to hash a string instead of using StringPool::hash() clayborg: Can we rename these functions to something other than "hash"? It is a bit confusing to have a…
mutable llvm::sys::SmartRWMutex<false> m_mutex;		mutable llvm::sys::SmartRWMutex<false> m_mutex;
StringPool m_string_map;		StringPool m_string_map;
PoolEntry() : m_string_map(16384) {}		PoolEntry() : m_string_map(16384) {}
};		};

std::array<PoolEntry, 256> m_string_pools;		std::array<PoolEntry, 256> m_string_pools;

		PoolEntry &selectPool(const llvm::StringRef &s) {
		return selectPool(StringPool::hash(s));
		}

		PoolEntry &selectPool(uint32_t h) {
		return m_string_pools[((h >> 24) ^ (h >> 16) ^ (h >> 8) ^ h) & 0xff];
		}
};		};

// Frameworks and dylibs aren't supposed to have global C++ initializers so we		// Frameworks and dylibs aren't supposed to have global C++ initializers so we
// hide the string pool in a static function so that it will get initialized on		// hide the string pool in a static function so that it will get initialized on
// the first call to this static function.		// the first call to this static function.
//		//
// Note, for now we make the string pool a pointer to the pool, because we		// Note, for now we make the string pool a pointer to the pool, because we
// can't guarantee that some objects won't get destroyed after the global		// can't guarantee that some objects won't get destroyed after the global
▲ Show 20 Lines • Show All 157 Lines • Show Last 20 Lines

llvm/include/llvm/ADT/StringMap.h

Show All 11 Lines

//===----------------------------------------------------------------------===//

#ifndef LLVM_ADT_STRINGMAP_H

#define LLVM_ADT_STRINGMAP_H

#include "llvm/ADT/StringMapEntry.h"

#include "llvm/ADT/iterator.h"

#include "llvm/Support/AllocatorBase.h"

#include "llvm/Support/DJB.h"

#include "llvm/Support/PointerLikeTypeTraits.h"

#include <initializer_list>

#include <iterator>

namespace llvm {

template <typename ValueTy> class StringMapConstIterator;

template <typename ValueTy> class StringMapIterator;

Show All 27 Lines

protected:

StringMapImpl(unsigned InitSize, unsigned ItemSize);

unsigned RehashTable(unsigned BucketNo = 0);

/// LookupBucketFor - Look up the bucket that the specified string should end

/// up in. If it already exists as a key in the map, the Item pointer for the

/// specified bucket will be non-null. Otherwise, it will be null. In either

/// case, the FullHashValue field of the bucket will be set to the hash value

/// of the string.

unsigned LookupBucketFor(StringRef Key);

unsigned LookupBucketFor(StringRef Key) {

return LookupBucketFor(Key, hash(Key));

}

/// Overload that explicitly takes precomputed hash(Key).

unsigned LookupBucketFor(StringRef Key, uint32_t FullHashValue);

clayborgUnsubmitted

Done

/// Overload that explicitly takes precomputed hash(Key).

- unsigned LookupBucketFor(StringRef Key, unsigned FullHashValue);

+ unsigned LookupBucketFor(StringRef Key, uint32_t FullHashValue);

/// FindKey - Look up the bucket that contains the specified key. If it exists

Use "uint32_t" instead of unsigned to clarify the exact width of the integer. Or create a "HashType" typedef and then use that.

clayborg: Use "uint32_t" instead of unsigned to clarify the exact width of the integer. Or create a…

/// FindKey - Look up the bucket that contains the specified key. If it exists

/// in the map, return the bucket number of the key. Otherwise return -1.

/// This does not modify the map.

int FindKey(StringRef Key) const;

int FindKey(StringRef Key) const { return FindKey(Key, hash(Key)); }

/// Overload that explicitly takes precomputed hash(Key).

int FindKey(StringRef Key, uint32_t FullHashValue) const;

/// RemoveKey - Remove the specified StringMapEntry from the table, but do not

/// delete it. This aborts if the value isn't in the table.

void RemoveKey(StringMapEntryBase *V);

/// RemoveKey - Remove the StringMapEntry for the specified key from the

/// table, returning it. If the key is not in the table, this returns null.

StringMapEntryBase *RemoveKey(StringRef Key);

Show All 12 Lines

public:

}

unsigned getNumBuckets() const { return NumBuckets; }

unsigned getNumItems() const { return NumItems; }

bool empty() const { return NumItems == 0; }

unsigned size() const { return NumItems; }

/// Returns the hash value that will be used for the given string.

clayborgUnsubmitted

Done

unsigned size() const { return NumItems; }

- static unsigned hash(StringRef Key) { return llvm::djbHash(Key, 0); }

+ static uint32_t hash(StringRef Key) { return llvm::djbHash(Key, 0); }

void swap(StringMapImpl &Other) {

clayborg:

/// This allows precomputing the value and passing it explicitly

/// to some of the functions.

/// The implementation of this function is not guaranteed to be stable

/// and may change.

static uint32_t hash(StringRef Key) { return llvm::djbHash(Key, 0); }

void swap(StringMapImpl &Other) {

std::swap(TheTable, Other.TheTable);

std::swap(NumBuckets, Other.NumBuckets);

std::swap(NumItems, Other.NumItems);

std::swap(NumTombstones, Other.NumTombstones);

}

};

▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines

const_iterator end() const {

return const_iterator(TheTable + NumBuckets, true);

}

iterator_range<StringMapKeyIterator<ValueTy>> keys() const {

return make_range(StringMapKeyIterator<ValueTy>(begin()),

StringMapKeyIterator<ValueTy>(end()));

}

iterator find(StringRef Key) {

iterator find(StringRef Key) { return find(Key, hash(Key)); }

int Bucket = FindKey(Key);

iterator find(StringRef Key, uint32_t FullHashValue) {

int Bucket = FindKey(Key, FullHashValue);

if (Bucket == -1)

return end();

return iterator(TheTable + Bucket, true);

}

const_iterator find(StringRef Key) const {

const_iterator find(StringRef Key) const { return find(Key, hash(Key)); }

int Bucket = FindKey(Key);

const_iterator find(StringRef Key, uint32_t FullHashValue) const {

int Bucket = FindKey(Key, FullHashValue);

if (Bucket == -1)

return end();

return const_iterator(TheTable + Bucket, true);

}

/// lookup - Return the entry for the specified key, or a default

/// constructed value if no such entry exists.

ValueTy lookup(StringRef Key) const {

▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines

bool insert(MapEntryTy *KeyValue) {

return true;

}

/// insert - Inserts the specified key/value pair into the map if the key

/// isn't already in the map. The bool component of the returned pair is true

/// if and only if the insertion takes place, and the iterator component of

/// the pair points to the element with key equivalent to the key of the pair.

std::pair<iterator, bool> insert(std::pair<StringRef, ValueTy> KV) {

return try_emplace(KV.first, std::move(KV.second));

return try_emplace_with_hash(KV.first, hash(KV.first),

std::move(KV.second));

}

std::pair<iterator, bool> insert(std::pair<StringRef, ValueTy> KV,

uint32_t FullHashValue) {

return try_emplace_with_hash(KV.first, FullHashValue, std::move(KV.second));

}

/// Inserts elements from range [first, last). If multiple elements in the

/// range have keys that compare equivalent, it is unspecified which element

/// is inserted .

template <typename InputIt> void insert(InputIt First, InputIt Last) {

for (InputIt It = First; It != Last; ++It)

insert(*It);

Show All 16 Lines

std::pair<iterator, bool> insert_or_assign(StringRef Key, V &&Val) {

return Ret;

}

/// Emplace a new element for the specified key into the map if the key isn't

/// already in the map. The bool component of the returned pair is true

/// if and only if the insertion takes place, and the iterator component of

/// the pair points to the element with key equivalent to the key of the pair.

template <typename... ArgsTy>

std::pair<iterator, bool> try_emplace(StringRef Key, ArgsTy &&... Args) {

std::pair<iterator, bool> try_emplace(StringRef Key, ArgsTy &&...Args) {

unsigned BucketNo = LookupBucketFor(Key);

return try_emplace_with_hash(Key, hash(Key), std::forward<ArgsTy>(Args)...);

}

template <typename... ArgsTy>

std::pair<iterator, bool> try_emplace_with_hash(StringRef Key,

uint32_t FullHashValue,

ArgsTy &&...Args) {

unsigned BucketNo = LookupBucketFor(Key, FullHashValue);

StringMapEntryBase *&Bucket = TheTable[BucketNo];

if (Bucket && Bucket != getTombstoneVal())

return std::make_pair(iterator(TheTable + BucketNo, false),

false); // Already exists in map.

if (Bucket == getTombstoneVal())

--NumTombstones;

Bucket = MapEntryTy::Create(Key, Allocator, std::forward<ArgsTy>(Args)...);

▲ Show 20 Lines • Show All 149 Lines • Show Last 20 Lines

llvm/lib/Support/StringMap.cpp

//===--- StringMap.cpp - String Hash table map implementation -------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===//

// This file implements the StringMap class.

//===----------------------------------------------------------------------===//

#include "llvm/ADT/StringMap.h"

#include "llvm/Support/DJB.h"

#include "llvm/Support/MathExtras.h"

using namespace llvm;

/// Returns the number of buckets to allocate to ensure that the DenseMap can

/// accommodate \p NumEntries without need to grow().

static inline unsigned getMinBucketToReserveForEntries(unsigned NumEntries) {

// Ensure that "NumEntries * 4 < NumBuckets * 3"

▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines

void StringMapImpl::init(unsigned InitSize) {

NumBuckets = NewNumBuckets;

}

/// LookupBucketFor - Look up the bucket that the specified string should end

/// up in. If it already exists as a key in the map, the Item pointer for the

/// specified bucket will be non-null. Otherwise, it will be null. In either

/// case, the FullHashValue field of the bucket will be set to the hash value

/// of the string.

unsigned StringMapImpl::LookupBucketFor(StringRef Name) {

unsigned StringMapImpl::LookupBucketFor(StringRef Name,

uint32_t FullHashValue) {

clayborgUnsubmitted

Done

unsigned StringMapImpl::LookupBucketFor(StringRef Name,

- unsigned FullHashValue) {

+ uint32_t FullHashValue) {

#ifdef EXPENSIVE_CHECKS

clayborg:

#ifdef EXPENSIVE_CHECKS

assert(FullHashValue == hash(Name));

#endif

// Hash table unallocated so far?

if (NumBuckets == 0)

init(16);

unsigned FullHashValue = djbHash(Name, 0);

unsigned BucketNo = FullHashValue & (NumBuckets - 1);

unsigned *HashTable = getHashTable(TheTable, NumBuckets);

unsigned ProbeAmt = 1;

int FirstTombstone = -1;

while (true) {

StringMapEntryBase *BucketItem = TheTable[BucketNo];

// If we found an empty bucket, this key isn't in the table yet, return it.

Show All 35 Lines

while (true) {

// probing and has good cache behavior in the common case.

++ProbeAmt;

}

/// FindKey - Look up the bucket that contains the specified key. If it exists

/// in the map, return the bucket number of the key. Otherwise return -1.

/// This does not modify the map.

int StringMapImpl::FindKey(StringRef Key) const {

int StringMapImpl::FindKey(StringRef Key, uint32_t FullHashValue) const {

clayborgUnsubmitted

Done

/// This does not modify the map.

- int StringMapImpl::FindKey(StringRef Key, unsigned FullHashValue) const {

+ int StringMapImpl::FindKey(StringRef Key, uint32_t FullHashValue) const {

if (NumBuckets == 0)

clayborg:

if (NumBuckets == 0)

return -1; // Really empty table?

unsigned FullHashValue = djbHash(Key, 0);

#ifdef EXPENSIVE_CHECKS

assert(FullHashValue == hash(Key));

#endif

unsigned BucketNo = FullHashValue & (NumBuckets - 1);

unsigned *HashTable = getHashTable(TheTable, NumBuckets);

unsigned ProbeAmt = 1;

while (true) {

StringMapEntryBase *BucketItem = TheTable[BucketNo];

// If we found an empty bucket, this key isn't in the table yet, return.

if (LLVM_LIKELY(!BucketItem))

▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

prevent ConstString from calling djbHash() more than onceClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 431597

lldb/source/Utility/ConstString.cpp

llvm/include/llvm/ADT/StringMap.h

llvm/lib/Support/StringMap.cpp

prevent ConstString from calling djbHash() more than once
ClosedPublic