This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Tighten the scope of a couple of locks in DataFormatters.
AbandonedPublic

Authored by jgorbe on May 23 2022, 1:59 PM.

Details

Summary

I've noticed (what looked like) deadlocking behavior while running some
Python code that registers categories and formatters from a separate
thread. I ran lldb and my scripts through Helgrind and got (among other
unrelated reports) the report below.

In summary, there are two mutexes (TypeCategoryMap::m_map_mutex and
FormatManager::m_language_categories_mutex) that sometimes get
acquired in different order. This change attempts to fix that by
reducing the scope of the locks that appear in the Helgrind report.

  • TypeCategoryMap::Add locks m_map_mutex and keeps holding it while notifying listener. The listener class shouldn't touch any data in TypeCategoryMap (because everything is private), except by using the class' own methods, and all of them acquire the mutex on entry.

    This change makes it release the lock after m_map_mutex is updated, but before the listener is invoked.
  • FormatManager::GetCategoryForLanguage does a map lookup (holding m_language_categories_mutex) and, if the lookup fails, constructs a new LanguageCategory. This also results in listeners being called while still holding the mutex.

    This patch changes it so that the lock is only held during initial lookup. If lookup fails, the new category will be created and only then we'll acquire the lock again to insert the new category into the map.

Here's the Helgrind report for reference:

==3574084== ----------------------------------------------------------------
==3574084==
==3574084== Thread #6: lock order "0x1472C070 before 0x1472C110" violated
==3574084==
==3574084== Observed (incorrect) order is: acquisition of lock at 0x1472C110
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2F1E04: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:28)
==3574084==    by 0xB2D8A57: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:423)
==3574084==    by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79)
==3574084==    by 0xAE47DFD: lldb::SBDebugger::CreateCategory(char const*) (llvm/lldb/source/API/SBDebugger.cpp:1535)
==3574084==    by 0xAF7BA0B: _wrap_SBDebugger_CreateCategory(_object*, _object*) (LLDBWrapPython.cpp:24738)
==3574084==    by 0x14E77483: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0)
==3574084==    by 0x14E33A07: _PyObject_MakeTpCall (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0)
==3574084==
==3574084==  followed by a later acquisition of lock at 0x1472C070
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2D6DDA: lldb_private::FormatManager::Changed() (llvm/lldb/source/DataFormatters/FormatManager.cpp:118)
==3574084==    by 0xB2F1E46: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:31)
==3574084==    by 0xB2D8A57: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:423)
==3574084==    by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79)
==3574084==    by 0xAE47DFD: lldb::SBDebugger::CreateCategory(char const*) (llvm/lldb/source/API/SBDebugger.cpp:1535)
==3574084==    by 0xAF7BA0B: _wrap_SBDebugger_CreateCategory(_object*, _object*) (LLDBWrapPython.cpp:24738)
==3574084==    by 0x14E77483: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0)
==3574084==
==3574084== Required order was established by acquisition of lock at 0x1472C070
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2D92A3: lldb_private::FormatManager::GetCategoryForLanguage(lldb::LanguageType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:586)
==3574084==    by 0xB2DAB24: std::shared_ptr<lldb_private::TypeFormatImpl> lldb_private::FormatManager::Get<std::shared_ptr<lldb_private::TypeFormatImpl> >(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:620)
==3574084==    by 0xB2D93C0: lldb_private::FormatManager::GetFormat(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/FormatManager.cpp:671)
==3574084==    by 0xB2D3565: lldb_private::DataVisualization::GetFormat(lldb_private::ValueObject&, lldb::DynamicValueType) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:33)
==3574084==    by 0xB2AC762: lldb_private::ValueObject::UpdateFormatsIfNeeded() (llvm/lldb/source/Core/ValueObject.cpp:217)
==3574084==    by 0xB2AC245: lldb_private::ValueObject::UpdateValueIfNeeded(bool) (llvm/lldb/source/Core/ValueObject.cpp:116)
==3574084==    by 0xB2FD1B2: lldb_private::ValueObjectPrinter::GetMostSpecializedValue() (llvm/lldb/source/DataFormatters/ValueObjectPrinter.cpp:101)
==3574084==
==3574084==  followed by a later acquisition of lock at 0x1472C110
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2F20C4: lldb_private::TypeCategoryMap::Get(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:134)
==3574084==    by 0xB2D89A6: lldb_private::FormatManager::GetCategory(lldb_private::ConstString, bool) (llvm/lldb/source/DataFormatters/FormatManager.cpp:417)
==3574084==    by 0xB2D380B: lldb_private::DataVisualization::Categories::GetCategory(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl>&, bool) (llvm/lldb/source/DataFormatters/DataVisualization.cpp:79)
==3574084==    by 0xB75D5B2: lldb_private::CPlusPlusLanguage::GetFormatters()::$_0::operator()() const (llvm/lldb/source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp:1246)
==3574084==    by 0xB75D55C: void std::__invoke_impl<void, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(std::__invoke_other, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61)
==3574084==    by 0xB75D52C: std::__invoke_result<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>::type std::__invoke<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96)
==3574084==    by 0xB75D4FF: std::call_once<lldb_private::CPlusPlusLanguage::GetFormatters()::$_0>(std::once_flag&, lldb_private::CPlusPlusLanguage::GetFormatters()::$_0&&)::{lambda()#1}::operator()() const (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:776)
==3574084==
==3574084==  Lock at 0x1472C070 was first observed
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2D6DDA: lldb_private::FormatManager::Changed() (llvm/lldb/source/DataFormatters/FormatManager.cpp:118)
==3574084==    by 0xB2F1E46: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:31)
==3574084==    by 0xB2F1DB6: lldb_private::TypeCategoryMap::TypeCategoryMap(lldb_private::IFormatChangeListener*) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:23)
==3574084==    by 0xB2D9511: lldb_private::FormatManager::FormatManager() (llvm/lldb/source/DataFormatters/FormatManager.cpp:689)
==3574084==    by 0xB2D34A1: GetFormatManager() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:16)
==3574084==    by 0xB2D3458: lldb_private::DataVisualization::ForceUpdate() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:20)
==3574084==    by 0xE514F28: lldb_private::CommandObjectScript::DoExecute(llvm::StringRef, lldb_private::CommandReturnObject&) (llvm/lldb/source/Commands/CommandObjectScript.cpp:121)
==3574084==  Address 0x1472c070 is 120 bytes inside data symbol "_ZZL16GetFormatManagervE16g_format_manager"
==3574084==
==3574084==  Lock at 0x1472C110 was first observed
==3574084==  Lock at 0x1472C110 was first observed
==3574084==    at 0x484777F: mutex_lock_WRK (hg_intercepts.c:942)
==3574084==    by 0xADFF5E2: __gthread_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749)
==3574084==    by 0xADFF5B4: __gthread_recursive_mutex_lock(pthread_mutex_t*) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:811)
==3574084==    by 0xAE00D34: std::recursive_mutex::lock() (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/mutex:108)
==3574084==    by 0xADFFEB2: std::lock_guard<std::recursive_mutex>::lock_guard(std::recursive_mutex&) (bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229)
==3574084==    by 0xB2F1E04: lldb_private::TypeCategoryMap::Add(lldb_private::ConstString, std::shared_ptr<lldb_private::TypeCategoryImpl> const&) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:28)
==3574084==    by 0xB2F1DB6: lldb_private::TypeCategoryMap::TypeCategoryMap(lldb_private::IFormatChangeListener*) (llvm/lldb/source/DataFormatters/TypeCategoryMap.cpp:23)
==3574084==    by 0xB2D9511: lldb_private::FormatManager::FormatManager() (llvm/lldb/source/DataFormatters/FormatManager.cpp:689)
==3574084==    by 0xB2D34A1: GetFormatManager() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:16)
==3574084==    by 0xB2D3458: lldb_private::DataVisualization::ForceUpdate() (llvm/lldb/source/DataFormatters/DataVisualization.cpp:20)
==3574084==    by 0xE514F28: lldb_private::CommandObjectScript::DoExecute(llvm::StringRef, lldb_private::CommandReturnObject&) (llvm/lldb/source/Commands/CommandObjectScript.cpp:121)
==3574084==    by 0xB3AF4B3: lldb_private::CommandObjectRaw::Execute(char const*, lldb_private::CommandReturnObject&) (llvm/lldb/source/Interpreter/CommandObject.cpp:1020)
==3574084==  Address 0x1472c110 is 280 bytes inside data symbol "_ZZL16GetFormatManagervE16g_format_manager"
==3574084==
==3574084==
==3574084== ----------------------------------------------------------------

Diff Detail

Event Timeline

jgorbe created this revision.May 23 2022, 1:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 1:59 PM
Herald added a subscriber: pengfei. · View Herald Transcript
jgorbe requested review of this revision.May 23 2022, 1:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 1:59 PM
jgorbe edited the summary of this revision. (Show Details)May 23 2022, 2:04 PM
hawkinsw added inline comments.
lldb/source/DataFormatters/FormatManager.cpp
595

Forgive me if I am speaking out of turn, but do we need to check again here for whether lang_type is in the map? In other words, it seems possible that (in some zany situation) while we are doing the memory allocation (line 593) someone else has come along and added that category since we released the lock on it in 592.

I hope that this is helpful. Please (again) forgive me if I am speaking out of turn.

jgorbe added inline comments.Jun 3 2022, 6:44 PM
lldb/source/DataFormatters/FormatManager.cpp
595

Not at all! Thank you for your comment! :)

You are absolutely right, I had overlooked this case. Another annoying consequence is that in this case the racing threads would create multiple LanguageCategory objects. The LanguageCategory constructor calls Enable which in turn ends up calling the Changed method of the listener, so even if we check the map before insertion I think we'd still see the listener receiving multiple notifications that the category was enabled, which is not ideal.

I don't know how to properly fix this. I'll give it some more thought.

hawkinsw added inline comments.Jun 4 2022, 3:01 AM
lldb/source/DataFormatters/FormatManager.cpp
595

Glad that it was helpful! I will keep an eye on this PR and be ready to help again if you can find a way out of that tricky conundrum!! :-)

Will

jgorbe added a comment.Jun 7 2022, 3:09 PM

I've been experimenting a little bit and it seems that I can avoid the deadlocks I was seeing by applying only the modification to TypeCategoryMap::Add. This would avoid the problem that @hawkinsw pointed out in my change to FormatManager::GetCategoryForLanguage. However, the following problem could still happen with the change to TypeCategoryMap::Add:

  1. Thread A acquires the lock, runs m_map[key1] = value1, and releases the lock
  2. Thread B acquires the lock, runs m_map[key2] = value2, and releases the lock
  3. Thread A runs listener->Changed()
  4. Thread B runs listener->Changed()

So we would have two changes, the listener would be called twice, but both listener calls would see the same state with the two changes already applied, instead of observing each change individually. I'll keep thinking to see if I can fix this elsewhere.

I've been experimenting a little bit and it seems that I can avoid the deadlocks I was seeing by applying only the modification to TypeCategoryMap::Add. This would avoid the problem that @hawkinsw pointed out in my change to FormatManager::GetCategoryForLanguage. However, the following problem could still happen with the change to TypeCategoryMap::Add:

  1. Thread A acquires the lock, runs m_map[key1] = value1, and releases the lock
  2. Thread B acquires the lock, runs m_map[key2] = value2, and releases the lock
  3. Thread A runs listener->Changed()
  4. Thread B runs listener->Changed()

So we would have two changes, the listener would be called twice, but both listener calls would see the same state with the two changes already applied, instead of observing each change individually. I'll keep thinking to see if I can fix this elsewhere.

I am *so* sorry I haven't had a chance to come back to work on this change until now, but I did have one thought that I will investigate this PM. I will keep you posted if I find anything. Again, I am sorry for the radio silence.

I've been experimenting a little bit and it seems that I can avoid the deadlocks I was seeing by applying only the modification to TypeCategoryMap::Add. This would avoid the problem that @hawkinsw pointed out in my change to FormatManager::GetCategoryForLanguage. However, the following problem could still happen with the change to TypeCategoryMap::Add:

  1. Thread A acquires the lock, runs m_map[key1] = value1, and releases the lock
  2. Thread B acquires the lock, runs m_map[key2] = value2, and releases the lock
  3. Thread A runs listener->Changed()
  4. Thread B runs listener->Changed()

So we would have two changes, the listener would be called twice, but both listener calls would see the same state with the two changes already applied, instead of observing each change individually. I'll keep thinking to see if I can fix this elsewhere.

[disclaimer: I suggested this approach offline]

Yes, that could happen, but is that a problem? That sequence of actions is indistinguishable (to the object receiving the Changed notifications) from an action which modifies two keys, followed by a no-op action.

I know we currently don't have such actions, but I don't see why should the receiver care about that. The notifications don't carry any information about the change anyway, so the only way it could notice that is if it made a snapshot of the state each time it is called and then done a diff. If we really wanted it to receive granular update informations like this, we would have designed the interface differently.

jgorbe abandoned this revision.Sep 16 2022, 11:29 AM