I've noticed (what looked like) deadlocking behavior while running some
Python code that registers categories and formatters from a separate
thread. I ran lldb and my scripts through Helgrind and got (among other
unrelated reports) the report below.
In summary, there are two mutexes (TypeCategoryMap::m_map_mutex and
FormatManager::m_language_categories_mutex) that sometimes get
acquired in different order. This change attempts to fix that by
reducing the scope of the locks that appear in the Helgrind report.
- TypeCategoryMap::Add locks m_map_mutex and keeps holding it while notifying listener. The listener class shouldn't touch any data in TypeCategoryMap (because everything is private), except by using the class' own methods, and all of them acquire the mutex on entry.
This change makes it release the lock after m_map_mutex is updated, but before the listener is invoked.
- FormatManager::GetCategoryForLanguage does a map lookup (holding m_language_categories_mutex) and, if the lookup fails, constructs a new LanguageCategory. This also results in listeners being called while still holding the mutex.
This patch changes it so that the lock is only held during initial lookup. If lookup fails, the new category will be created and only then we'll acquire the lock again to insert the new category into the map.
Here's the Helgrind report for reference:
==3574084== ---------------------------------------------------------------- ==3574084== ==3574084== Thread #6: lock order "0x1472C070 before 0x1472C110" violated ==3574084== ==3574084== Observed (incorrect) order is: acquisition of lock at 0x1472C110 ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2F1E04: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:28) ==3574084== by 0xB2D8A57: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:423) ==3574084== by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79) ==3574084== by 0xAE47DFD: lldb::SBDebugger::CreateCategory(char const*) (llvm/lldb/source/API/SBDebugger.cpp:1535) ==3574084== by 0xAF7BA0B: _wrap_SBDebugger_CreateCategory(_object*, _object*) (LLDBWrapPython.cpp:24738) ==3574084== by 0x14E77483: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0) ==3574084== by 0x14E33A07: _PyObject_MakeTpCall (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0) ==3574084== ==3574084== followed by a later acquisition of lock at 0x1472C070 ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2D6DDA: lldb_private::FormatManager::Changed() (llvm/lldb/source/DataFormatters/FormatManager.cpp:118) ==3574084== by 0xB2F1E46: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:31) ==3574084== by 0xB2D8A57: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:423) ==3574084== by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79) ==3574084== by 0xAE47DFD: lldb::SBDebugger::CreateCategory(char const*) (llvm/lldb/source/API/SBDebugger.cpp:1535) ==3574084== by 0xAF7BA0B: _wrap_SBDebugger_CreateCategory(_object*, _object*) (LLDBWrapPython.cpp:24738) ==3574084== by 0x14E77483: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0) ==3574084== ==3574084== Required order was established by acquisition of lock at 0x1472C070 ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2D92A3: lldb_private::FormatManager::GetCategoryForLanguage(lldb::LanguageType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:586) ==3574084== by 0xB2DAB24: std::shared_ptr<lldb_private::TypeFormatImpl> lldb_private::FormatManager::Get<std::shared_ptr<lldb_private::TypeFormatImpl> >(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:620) ==3574084== by 0xB2D93C0: lldb_private::FormatManager::GetFormat(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:671) ==3574084== by 0xB2D3565: lldb_private::DataVisualization::GetFormat(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:33) ==3574084== by 0xB2AC762: lldb_private::ValueObject::UpdateFormatsIfNeeded() (llvm/lldb/source/Core/ValueObject.cpp:217) ==3574084== by 0xB2AC245: lldb_private::ValueObject::UpdateValueIfNeeded(bool) (llvm/lldb/source/Core/ValueObject.cpp:116) ==3574084== by 0xB2FD1B2: lldb_private::ValueObjectPrinter::GetMostSpecializedValue() (llvm/lldb/source/DataFormatters/ValueObjectPrinter.cpp:101) ==3574084== ==3574084== followed by a later acquisition of lock at 0x1472C110 ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2F20C4: lldb_private::TypeCategoryMap::Get(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:134) ==3574084== by 0xB2D89A6: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:417) ==3574084== by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79) ==3574084== by 0xB75D5B2: lldb_private::CPlusPlusLanguage::GetFormatters()::$_0::operator()() const (llvm/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:1246) ==3574084== by 0xB75D55C: void std::__invoke_impl<void, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(std::__invoke_other, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61) ==3574084== by 0xB75D52C: std::__invoke_result<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>::type std::__invoke<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96) ==3574084== by 0xB75D4FF: std::call_once<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(std::once_flag&, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&)::{lambda()#1}::operator()() const (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:776) ==3574084== ==3574084== Lock at 0x1472C070 was first observed ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2D6DDA: lldb_private::FormatManager::Changed() (llvm/lldb/source/DataFormatters/FormatManager.cpp:118) ==3574084== by 0xB2F1E46: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:31) ==3574084== by 0xB2F1DB6: lldb_private::TypeCategoryMap::TypeCategoryMap(lldb_private::IFormatChangeListener*) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:23) ==3574084== by 0xB2D9511: lldb_private::FormatManager::FormatManager() (llvm/lldb/source/DataFormatters/FormatManager.cpp:689) ==3574084== by 0xB2D34A1: GetFormatManager() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:16) ==3574084== by 0xB2D3458: lldb_private::DataVisualization::ForceUpdate() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:20) ==3574084== by 0xE514F28: lldb_private::CommandObjectScript::DoExecute(llvm::StringRef, lldb_private::CommandReturnObject&) (llvm/lldb/source/Commands/CommandObjectScript.cpp:121) ==3574084== Address 0x1472c070 is 120 bytes inside data symbol "_ZZL16GetFormatManagervE16g_format_manager" ==3574084== ==3574084== Lock at 0x1472C110 was first observed ==3574084== Lock at 0x1472C110 was first observed ==3574084== at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942) ==3574084== by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749) ==3574084== by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811) ==3574084== by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108) ==3574084== by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229) ==3574084== by 0xB2F1E04: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:28) ==3574084== by 0xB2F1DB6: lldb_private::TypeCategoryMap::TypeCategoryMap(lldb_private::IFormatChangeListener*) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:23) ==3574084== by 0xB2D9511: lldb_private::FormatManager::FormatManager() (llvm/lldb/source/DataFormatters/FormatManager.cpp:689) ==3574084== by 0xB2D34A1: GetFormatManager() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:16) ==3574084== by 0xB2D3458: lldb_private::DataVisualization::ForceUpdate() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:20) ==3574084== by 0xE514F28: lldb_private::CommandObjectScript::DoExecute(llvm::StringRef, lldb_private::CommandReturnObject&) (llvm/lldb/source/Commands/CommandObjectScript.cpp:121) ==3574084== by 0xB3AF4B3: lldb_private::CommandObjectRaw::Execute(char const*, lldb_private::CommandReturnObject&) (llvm/lldb/source/Interpreter/CommandObject.cpp:1020) ==3574084== Address 0x1472c110 is 280 bytes inside data symbol "_ZZL16GetFormatManagervE16g_format_manager" ==3574084== ==3574084== ==3574084== ----------------------------------------------------------------
Forgive me if I am speaking out of turn, but do we need to check again here for whether lang_type is in the map? In other words, it seems possible that (in some zany situation) while we are doing the memory allocation (line 593) someone else has come along and added that category since we released the lock on it in 592.
I hope that this is helpful. Please (again) forgive me if I am speaking out of turn.