This is an archive of the discontinued LLVM Phabricator instance.


Differential D13652

Change ConstString to support massive multi-threaded access
ClosedPublic

Authored by tberghammer on Oct 12 2015, 7:33 AM.


Details

Reviewers

clayborg
labath

Commits

rG3fe5ce0b3e37: Change ConstString to support massive multi-threaded access
rLLDB250289: Change ConstString to support massive multi-threaded access
rL250289: Change ConstString to support massive multi-threaded access

Summary

Change ConstString to support massive multi-threaded access

Previously ConstString had a single mutex guarding the global string
pool on every access, which became a bottleneck when it was used from a
large number of threads.

This CL distributes the strings across 256 individual string pools,
selected by a simple hash function, to eliminate the bottleneck and
speed up multi-threaded access.
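
The sharding idea described above can be illustrated with a minimal, self-contained sketch. It is not the patch itself: `ShardedStringPool`, `Intern`, `Size` and `ExampleShardedUsage` are hypothetical names, `std::unordered_set` stands in for `llvm::StringMap`, and `std::hash` stands in for the pool-selection hash.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the patch's Pool class: 256 independent shards,
// each with its own mutex, so threads interning unrelated strings rarely
// wait on the same lock.
class ShardedStringPool {
public:
    // Returns a pointer that is identical for equal strings.
    const char *Intern(const std::string &s) {
        Shard &shard = m_shards[ShardIndex(s)];
        std::lock_guard<std::mutex> guard(shard.mutex);
        // insert() returns the existing element if the string is already
        // present; unordered_set never relocates stored elements, so the
        // returned pointer stays valid for the lifetime of the pool.
        return shard.strings.insert(s).first->c_str();
    }

    // Total number of distinct strings across all shards.
    std::size_t Size() const {
        std::size_t n = 0;
        for (const Shard &shard : m_shards) {
            std::lock_guard<std::mutex> guard(shard.mutex);
            n += shard.strings.size();
        }
        return n;
    }

private:
    struct Shard {
        mutable std::mutex mutex;
        std::unordered_set<std::string> strings;
    };

    static constexpr std::size_t kShardCount = 256;

    // Assumption: std::hash is used here only for illustration.
    static std::size_t ShardIndex(const std::string &s) {
        return std::hash<std::string>{}(s) % kShardCount;
    }

    std::array<Shard, kShardCount> m_shards;
};

// Usage: equal strings intern to the same pointer.
inline void ExampleShardedUsage() {
    ShardedStringPool pool;
    const char *a = pool.Intern("main");
    const char *b = pool.Intern(std::string("ma") + "in");
    assert(a == b);
    assert(pool.Size() == 1);
}
```

Two threads block each other only when their strings hash to the same shard, which is the effect the summary relies on.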

The goal of the change is to prepare for multi-threaded symbol parsing
code, which should speed up symbol parsing.

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 37109. Oct 12 2015, 7:33 AM

tberghammer retitled this revision to Change ConstString to support massive multi-threaded access.

tberghammer updated this object.

tberghammer added reviewers: labath, clayborg.

tberghammer added a subscriber: lldb-commits.

Looks reasonable to me, but please wait for an OK from clayborg.

zturner added a subscriber: zturner. Oct 12 2015, 11:41 AM

zturner added inline comments.

source/Core/ConstString.cpp
175	Did you consider changing this to an `llvm::RWMutex`?

tberghammer mentioned this in D13662: Make dwarf parsing multi-threaded. Oct 12 2015, 12:56 PM

It would be nice to compute one hash per string and use that during insertion. I really like the patch, but can we avoid two string hashes?

source/Core/ConstString.cpp
156–162	Is there a way we can use the hash that llvm uses for its string pool here? We are calculating two hashes: one here to see which pool the string will go into, and then another when the string is hashed into the string pool object. It would be nice if we could calculate the hash once and maybe add a string pool insertion that the pool will use to verify the string is in the pool, or insert it, using the supplied hash.

This revision now requires changes to proceed. Oct 12 2015, 1:38 PM

tberghammer added inline comments. Oct 12 2015, 3:08 PM

source/Core/ConstString.cpp
156–162	I don't see any reasonable way to avoid using 2 hash functions without re-implementing llvm::StringMap with multi-threaded support in mind, with per-bucket mutexes. One issue is that llvm::StringMap doesn't have any interface where we can specify a hash value for an insert to avoid recalculating the hash. The other problem is that we want to use a different hash function for selecting the pool and for selecting the bucket, to achieve a uniform distribution between the buckets inside the StringMap. I am already a little bit concerned because we use 2 very similar hash functions (StringMap uses the LSB of llvm::HashString), which can cause some performance degradation. I think a nice solution would be to use a hash map with built-in multi-threaded support, or even better a lock-free implementation (lock/unlock takes a lot of time), but I don't think implementing it would be worth the effort.
175	I haven't tried it, but I don't see any easy way to use it because we use a single StringMap::insert call to read and possibly write to the map. If we want to get the advantage of RWMutex then we should split it into a StringMap::find and then a StringMap::insert call, which does 2 lookups.
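
To make the pool-selection hash from the 156–162 thread concrete, here is a small sketch of the byte folding the patch applies to a 32-bit hash value. `BernsteinHash` is only a stand-in: the real `llvm::HashString` may differ in seed and final mixing. `FoldToByte` and `PoolIndex` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for llvm::HashString: a Bernstein-style (times 33) string hash.
// Assumption: the real function may use a different seed or final mixing.
inline uint32_t BernsteinHash(const std::string &s) {
    uint32_t h = 0;
    for (unsigned char c : s)
        h = h * 33u + c;
    return h;
}

// Same folding as the patch's hash(): XOR the four bytes of the 32-bit value
// so that the upper bytes also influence the 8-bit pool index.
inline uint8_t FoldToByte(uint32_t h) {
    return static_cast<uint8_t>(((h >> 24) ^ (h >> 16) ^ (h >> 8) ^ h) & 0xff);
}

// Index of the pool (0..255) a string would be stored in.
inline uint8_t PoolIndex(const std::string &s) {
    return FoldToByte(BernsteinHash(s));
}

// Usage: the index depends on all four bytes of the hash, not just the
// lowest one.
inline void ExampleFoldUsage() {
    assert(FoldToByte(0x12345678u) == 0x08); // 0x12 ^ 0x34 ^ 0x56 ^ 0x78
    assert(FoldToByte(0x000000ffu) == 0xff);
    assert(FoldToByte(0x0000ff00u) == 0xff);
}
```

If the pool index were simply the low byte of the hash, every string in a given pool would share those low bits, and a map that picks buckets from the same low bits would see them cluster. Folding in the upper bytes is one way to reduce that correlation, though, as the comment notes, the two hashes remain closely related.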

zturner added inline comments. Oct 12 2015, 3:21 PM

source/Core/ConstString.cpp
175	That's not a big deal though, is it? Average case for lookup and insert are both constant, so doing 2 operations is still constant. Over time, as more and more strings get added, the probability of finding any given string increases, meaning you will converge towards having more reads than writes, and the additional concurrency gained from doing a read followed by a write far outweighs the overhead of the extra constant-time operation. You could do it with double-checked locking. For example:
	lock.acquire_read();
	if (value = map.find(x)) return value;
	lock.acquire_write();
	if (value = map.find(x)) return value;
	map.insert(x);
	return x;
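
The double-checked pattern suggested here can be written as a runnable sketch using the standard library instead of LLVM's classes. Assumptions: `RWStringPool`, `Intern` and `ExampleRWUsage` are hypothetical names, `std::shared_timed_mutex` stands in for `llvm::sys::RWMutex`, and `std::unordered_set` stands in for `llvm::StringMap`. Unlike the pseudocode, the read lock is released before the write lock is taken, because a standard shared lock cannot be upgraded in place.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>

// Hypothetical single-pool example of "read first, then write if missing".
class RWStringPool {
public:
    const char *Intern(const std::string &s) {
        {
            // Fast path: many readers may hold the shared lock at once.
            std::shared_lock<std::shared_timed_mutex> read_lock(m_mutex);
            auto it = m_strings.find(s);
            if (it != m_strings.end())
                return it->c_str();
        } // The shared lock is released here.

        // Slow path: exclusive access. Another thread may have inserted the
        // string between the two locks; insert() handles that by returning
        // the existing element, which is the second "check".
        std::unique_lock<std::shared_timed_mutex> write_lock(m_mutex);
        return m_strings.insert(s).first->c_str();
    }

private:
    // C++14 type; std::shared_mutex (C++17) would work the same way here.
    std::shared_timed_mutex m_mutex;
    std::unordered_set<std::string> m_strings;
};

// Usage: the second call takes only the read path.
inline void ExampleRWUsage() {
    RWStringPool pool;
    const char *first = pool.Intern("symbol");
    const char *second = pool.Intern("symbol");
    assert(first == second);
}
```

The cost is a second lookup whenever a string is new, which is the trade-off weighed in this thread: it slows write-heavy phases slightly and lets lookups of existing strings run concurrently.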

Use llvm::sys::RWMutex

Using it has a minor performance hit for debug info parsing because of the intensive write operations, but it should have a positive impact on all reads (most accesses after debug info parsing). If the performance hit for writes isn't acceptable, we can add a flag to the ConstString constructor hinting that we are creating a new string, so that we don't have to make the check in this case, but I prefer not to do it until we can prove it is necessary.

Sounds reasonable

I would still like to get down to hashing the string once at some point, but we can start with this and iterate on it.

This revision is now accepted and ready to land. Oct 13 2015, 12:51 PM

Closed by commit rL250289: Change ConstString to support massive multi-threaded access (authored by tberghammer). Oct 14 2015, 3:40 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
source/Core/ConstString.cpp	99 lines

Diff 37258

source/Core/ConstString.cpp

//===-- ConstString.cpp -----------------------------------------*- C++ -*-===//		//===-- ConstString.cpp -----------------------------------------*- C++ -*-===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
#include "lldb/Core/ConstString.h"		#include "lldb/Core/ConstString.h"
#include "lldb/Core/Stream.h"		#include "lldb/Core/Stream.h"
#include "lldb/Host/Mutex.h"		#include "lldb/Host/Mutex.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
		#include "llvm/ADT/StringExtras.h"
		#include "llvm/Support/RWMutex.h"

#include <mutex> // std::once		#include <array>
		#include <mutex>

using namespace lldb_private;		using namespace lldb_private;


class Pool		class Pool
{		{
public:		public:
typedef const char * StringPoolValueType;		typedef const char * StringPoolValueType;
typedef llvm::StringMap<StringPoolValueType, llvm::BumpPtrAllocator> StringPool;		typedef llvm::StringMap<StringPoolValueType, llvm::BumpPtrAllocator> StringPool;
typedef llvm::StringMapEntry<StringPoolValueType> StringPoolEntryType;		typedef llvm::StringMapEntry<StringPoolValueType> StringPoolEntryType;

//------------------------------------------------------------------
// Default constructor
//
// Initialize the member variables and create the empty string.
//------------------------------------------------------------------
Pool () :
m_mutex (Mutex::eMutexTypeRecursive),
m_string_map ()
{
}

//------------------------------------------------------------------
// Destructor
//------------------------------------------------------------------
~Pool ()
{
}


static StringPoolEntryType &		static StringPoolEntryType &
GetStringMapEntryFromKeyData (const char *keyData)		GetStringMapEntryFromKeyData (const char *keyData)
{		{
char *ptr = const_cast<char*>(keyData) - sizeof (StringPoolEntryType);		char *ptr = const_cast<char*>(keyData) - sizeof (StringPoolEntryType);
return *reinterpret_cast<StringPoolEntryType*>(ptr);		return *reinterpret_cast<StringPoolEntryType*>(ptr);
}		}

size_t		size_t
(27 lines not shown)	SetMangledCounterparts (const char *key_ccstr, const char *value_ccstr)
return false;		return false;
}		}

const char *		const char *
GetConstCString (const char *cstr)		GetConstCString (const char *cstr)
{		{
if (cstr)		if (cstr)
return GetConstCStringWithLength (cstr, strlen (cstr));		return GetConstCStringWithLength (cstr, strlen (cstr));
return NULL;		return nullptr;
}		}

const char *		const char *
GetConstCStringWithLength (const char *cstr, size_t cstr_len)		GetConstCStringWithLength (const char *cstr, size_t cstr_len)
{		{
if (cstr)		if (cstr)
{		return GetConstCStringWithStringRef(llvm::StringRef(cstr, cstr_len));
Mutex::Locker locker (m_mutex);		return nullptr;
llvm::StringRef string_ref (cstr, cstr_len);
StringPoolEntryType& entry = *m_string_map.insert (std::make_pair (string_ref, (StringPoolValueType)NULL)).first;
return entry.getKeyData();
}
return NULL;
}		}

const char *		const char *
GetConstCStringWithStringRef (const llvm::StringRef &string_ref)		GetConstCStringWithStringRef (const llvm::StringRef &string_ref)
{		{
if (string_ref.data())		if (string_ref.data())
{		{
Mutex::Locker locker (m_mutex);		uint8_t h = hash (string_ref);
StringPoolEntryType& entry = *m_string_map.insert (std::make_pair (string_ref, (StringPoolValueType)NULL)).first;
		{
		llvm::sys::SmartScopedReader<false> rlock(m_string_pools[h].m_mutex);
		auto it = m_string_pools[h].m_string_map.find (string_ref);
		if (it != m_string_pools[h].m_string_map.end())
		return it->getKeyData();
		}

		llvm::sys::SmartScopedWriter<false> wlock(m_string_pools[h].m_mutex);
		StringPoolEntryType& entry = *m_string_pools[h].m_string_map.insert (std::make_pair (string_ref, nullptr)).first;
return entry.getKeyData();		return entry.getKeyData();
}		}
return NULL;		return nullptr;
}		}

const char *		const char *
GetConstCStringAndSetMangledCounterPart (const char *demangled_cstr, const char *mangled_ccstr)		GetConstCStringAndSetMangledCounterPart (const char *demangled_cstr, const char *mangled_ccstr)
{		{
if (demangled_cstr)		if (demangled_cstr)
{		{
Mutex::Locker locker (m_mutex);		llvm::StringRef string_ref (demangled_cstr);
		uint8_t h = hash (string_ref);
		llvm::sys::SmartScopedWriter<false> wlock(m_string_pools[h].m_mutex);

// Make string pool entry with the mangled counterpart already set		// Make string pool entry with the mangled counterpart already set
StringPoolEntryType& entry = *m_string_map.insert (std::make_pair (llvm::StringRef (demangled_cstr), mangled_ccstr)).first;		StringPoolEntryType& entry = *m_string_pools[h].m_string_map.insert (std::make_pair (string_ref, mangled_ccstr)).first;

// Extract the const version of the demangled_cstr		// Extract the const version of the demangled_cstr
const char *demangled_ccstr = entry.getKeyData();		const char *demangled_ccstr = entry.getKeyData();
// Now assign the demangled const string as the counterpart of the		// Now assign the demangled const string as the counterpart of the
// mangled const string...		// mangled const string...
GetStringMapEntryFromKeyData (mangled_ccstr).setValue(demangled_ccstr);		GetStringMapEntryFromKeyData (mangled_ccstr).setValue(demangled_ccstr);
// Return the constant demangled C string		// Return the constant demangled C string
return demangled_ccstr;		return demangled_ccstr;
}		}
return NULL;		return nullptr;
}		}

const char *		const char *
GetConstTrimmedCStringWithLength (const char *cstr, size_t cstr_len)		GetConstTrimmedCStringWithLength (const char *cstr, size_t cstr_len)
{		{
if (cstr)		if (cstr)
{		{
const size_t trimmed_len = std::min<size_t> (strlen (cstr), cstr_len);		const size_t trimmed_len = std::min<size_t> (strlen (cstr), cstr_len);
return GetConstCStringWithLength (cstr, trimmed_len);		return GetConstCStringWithLength (cstr, trimmed_len);
}		}
return NULL;		return nullptr;
}		}

//------------------------------------------------------------------		//------------------------------------------------------------------
// Return the size in bytes that this object and any items in its		// Return the size in bytes that this object and any items in its
// collection of uniqued strings + data count values takes in		// collection of uniqued strings + data count values takes in
// memory.		// memory.
//------------------------------------------------------------------		//------------------------------------------------------------------
size_t		size_t
MemorySize() const		MemorySize() const
{		{
Mutex::Locker locker (m_mutex);
size_t mem_size = sizeof(Pool);		size_t mem_size = sizeof(Pool);
const_iterator end = m_string_map.end();		for (const auto& pool : m_string_pools)
for (const_iterator pos = m_string_map.begin(); pos != end; ++pos)
{		{
mem_size += sizeof(StringPoolEntryType) + pos->getKey().size();		llvm::sys::SmartScopedReader<false> rlock(pool.m_mutex);
		for (const auto& entry : pool.m_string_map)
		mem_size += sizeof(StringPoolEntryType) + entry.getKey().size();
}		}
return mem_size;		return mem_size;
}		}

protected:		protected:
//------------------------------------------------------------------		uint8_t
// Typedefs		hash(const llvm::StringRef &s)
//------------------------------------------------------------------		{
typedef StringPool::iterator iterator;		uint32_t h = llvm::HashString(s);
typedef StringPool::const_iterator const_iterator;		return ((h >> 24) ^ (h >> 16) ^ (h >> 8) ^ h) & 0xff;
		}

//------------------------------------------------------------------		struct PoolEntry
// Member variables		{
//------------------------------------------------------------------		mutable llvm::sys::SmartRWMutex<false> m_mutex;
mutable Mutex m_mutex;
StringPool m_string_map;		StringPool m_string_map;
};		};

		std::array<PoolEntry, 256> m_string_pools;
		};

//----------------------------------------------------------------------		//----------------------------------------------------------------------
// Frameworks and dylibs aren't supposed to have global C++		// Frameworks and dylibs aren't supposed to have global C++
// initializers so we hide the string pool in a static function so		// initializers so we hide the string pool in a static function so
// that it will get initialized on the first call to this static		// that it will get initialized on the first call to this static
// function.		// function.
//		//
// Note, for now we make the string pool a pointer to the pool, because		// Note, for now we make the string pool a pointer to the pool, because
// we can't guarantee that some objects won't get destroyed after the		// we can't guarantee that some objects won't get destroyed after the
// global destructor chain is run, and trying to make sure no destructors		// global destructor chain is run, and trying to make sure no destructors
// touch ConstStrings is difficult. So we leak the pool instead.		// touch ConstStrings is difficult. So we leak the pool instead.
//----------------------------------------------------------------------		//----------------------------------------------------------------------
static Pool &		static Pool &
StringPool()		StringPool()
{		{
static std::once_flag g_pool_initialization_flag;		static std::once_flag g_pool_initialization_flag;
static Pool *g_string_pool = NULL;		static Pool *g_string_pool = nullptr;

std::call_once(g_pool_initialization_flag, [] () {		std::call_once(g_pool_initialization_flag, [] () {
g_string_pool = new Pool();		g_string_pool = new Pool();
});		});

return *g_string_pool;		return *g_string_pool;
}		}

(20 lines not shown)	ConstString::operator < (const ConstString& rhs) const

llvm::StringRef lhs_string_ref (m_string, StringPool().GetConstCStringLength (m_string));		llvm::StringRef lhs_string_ref (m_string, StringPool().GetConstCStringLength (m_string));
llvm::StringRef rhs_string_ref (rhs.m_string, StringPool().GetConstCStringLength (rhs.m_string));		llvm::StringRef rhs_string_ref (rhs.m_string, StringPool().GetConstCStringLength (rhs.m_string));

// If both have valid C strings, then return the comparison		// If both have valid C strings, then return the comparison
if (lhs_string_ref.data() && rhs_string_ref.data())		if (lhs_string_ref.data() && rhs_string_ref.data())
return lhs_string_ref < rhs_string_ref;		return lhs_string_ref < rhs_string_ref;

// Else one of them was NULL, so if LHS is NULL then it is less than		// Else one of them was nullptr, so if LHS is nullptr then it is less than
return lhs_string_ref.data() == NULL;		return lhs_string_ref.data() == nullptr;
}		}

Stream&		Stream&
lldb_private::operator << (Stream& s, const ConstString& str)		lldb_private::operator << (Stream& s, const ConstString& str)
{		{
const char *cstr = str.GetCString();		const char *cstr = str.GetCString();
if (cstr)		if (cstr)
s << cstr;		s << cstr;
(98 lines not shown)