This is an archive of the discontinued LLVM Phabricator instance.

Change ConstString to support massive multi-threaded access
ClosedPublic

Authored by tberghammer on Oct 12 2015, 7:33 AM.

Details

Summary

Change ConstString to support massive multi-threaded access

Previously ConstString had a single mutex guarding the global string
pool for each access what become a bottleneck when using it with a
large number of threads.

This CL distributes the strings to 256 individual string pools based on
a simple hash function to eliminate the bottleneck and speed up the
multi-thread access.

The goal of the change is to prepare to multi-threaded symbol parsing code
to speed up the symbol parsing speed.

Diff Detail

Event Timeline

tberghammer retitled this revision from to Change ConstString to support massive multi-threaded access.
tberghammer updated this object.
tberghammer added reviewers: labath, clayborg.
tberghammer added a subscriber: lldb-commits.
labath edited edge metadata.Oct 12 2015, 9:11 AM

Looks reasonable to me, but please with for ok from clayborg.

zturner added inline comments.
source/Core/ConstString.cpp
175

Did you consider changing this to an llvm::RWMutex?

clayborg requested changes to this revision.Oct 12 2015, 1:38 PM
clayborg edited edge metadata.

It would be nice to compute one hash per string and use that during insertion. I really like the patch, but can we avoid two string hashes?

source/Core/ConstString.cpp
156–162

Is there a way we can use the hash that llvm uses for its string pool here? We are calculating two hashes: one for this to see which pool it will go into, and then another when the string is hashed into the string pool object. It would be nice if we can calculate the hash once and maybe do/add a string pool insertion that the pool with use to verify the string is in the pool or insert it using the supplied hash?

This revision now requires changes to proceed.Oct 12 2015, 1:38 PM
tberghammer added inline comments.Oct 12 2015, 3:08 PM
source/Core/ConstString.cpp
156–162

I don't see any reasonable way to avoid using 2 hash function without re-implementing llvm::StringMap with multi-threaded support in mind with per bucket mutextes. One of the issue is that llvm::StringMap don't have any interface where we can specify a hash value for an insert to avoid the calculation of the hash. The other problem is that we want to use a different hash function for selecting the pool and then selecting the bucket to achieve a uniform distribution between the buckets inside the StringMap. I am already a little bit concerned because we use 2 very similar hash function (StringMap use the LSB of llvm::HashString) what can cause some performance degradation.

I think a nice solution would be to use a hash map with built in multi-threaded support, or even better with a lock-free implementation (lock/unlock takes a lot of time) but I don't think implementing it would worth the effort.

175

I haven't tried it, but I don't see any easy way to use it because we use a single StringMap::insert call to read and possibly write to the map. If we want to get the advantage out from RWMutex then we should split it into a StringMap::find and then a StringMap::insert call what is doing 2 lookup.

zturner added inline comments.Oct 12 2015, 3:21 PM
source/Core/ConstString.cpp
175

That's not a big deal though is it? Average case for lookup and insert are both constant, so doing 2 operations is still constant. Over time as more and more strings get added, the probability of finding any given string increases, meaning you will converge towards having more reads than writes and the additional concurrency gained from doing a read followed by a write far outweighs the overhead of the extra constant-time operation.

You could do it with double checked locking. For example:

lock.acquire_read();
if (value = map.find(x))
     return value;
lock.acquire_write();
if (value = map.find(x))
    return value;
map.insert(x);
return x;
tberghammer edited edge metadata.

Use llvm::sys::RWMutex

Using it have a minor performance hit for debug info parsing because of the intensive write operations, but should have a positive impact on all reads (most access after debug info parsing). If the performance hit for writes isn't acceptable we can add a flag to the ConstString constructor hinting that we are creating a new string so we don't have to make a check in this case, but I prefer to don't do it until we can prove it is necessary.

Sounds reasonable

clayborg accepted this revision.Oct 13 2015, 12:51 PM
clayborg edited edge metadata.

I would still like to get down to hashing the string once at some point, but we can start with this and iterate on it.

This revision is now accepted and ready to land.Oct 13 2015, 12:51 PM
This revision was automatically updated to reflect the committed changes.