This is an archive of the discontinued LLVM Phabricator instance.

[DWARFSymbolFile] Add the module lock where necessary and assert that we own it.
ClosedPublic

Authored by JDevlieghere on Sep 26 2018, 4:30 AM.

Details

Summary

As discussed with Greg at the dev meeting, we need to ensure we have the module lock in the SymbolFile. Usually the symbol file is accessed through the symbol vendor which ensures that the necessary locks are taken. However, there are a few methods that are accessed by the expression parser and were lacking the lock.

This patch adds the locking where necessary and everywhere else asserts that we actually already own the lock.

Diff Detail

Event Timeline

JDevlieghere created this revision.Sep 26 2018, 4:30 AM

It is the SymbolVendor's job to do the locking. And in many cases it already is. I stopped with inlined comments after a few comments as it would apply to this entire patch.

It would be great if we can replace all of the locations where you added lock guards with lldbasserts to assert that the lock is already taken. Then we might be able to catch places where the locks are not being respected. But in general the symbol vendor should take the lock, use N number of SymbolFile instances to complete the request, and then release the lock. Makes things easier on the lldb_private::SymbolFile subclasses this way.

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
263–264

This shouldn't require locking as it is just handing out a type list ivar from one place or another.

275–276

This is not a SymbolFile override. It shouldn't require the locking.

439–440

This shouldn't require locking as it is just handing out a type system from one place or another.

860–861

SymbolVendor::GetNumCompileUnits() already takes the module lock for us. All other internal calls to this should be protected as well

867–868

SymbolVendor already takes the module lock for us

880–881

SymbolVendor requests that lead to this call should take the lock for us...

It is the SymbolVendor's job to do the locking. And in many cases it already is. I stopped with inlined comments after a few comments as it would apply to this entire patch.

Thanks Greg. Honestly I don't know the code well enough to be sure where we *need* a lock and where we can get away without. Since it's a recursive mutex I figured there's no harm in playing it safe. Not everything goes through the symbol vendor though, for example the backtrace below is the result of a corruption and it looks like we're calling directly into SymbolFileDWARF.

>  1 com.apple.LLDB.framework       0x02f8dcf5 bool llvm::DenseMapBase<llvm::DenseMap<void*, DIERef, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, DIERef> >, void*, DIERef, llvm::DenseMapInfo<void*>, llvm::detail::DenseMapPair<void*, DIERef> >::LookupBucketFor<void*>(void* const&, llvm::detail::DenseMapPair<void*, DIERef> const*&) const + 45
   2 com.apple.LLDB.framework       0x02f8e91e llvm::DenseMapBase<llvm::DenseMap<DWARFDebugInfoEntry const*, clang::Decl*, llvm::DenseMapInfo<DWARFDebugInfoEntry const*>, llvm::detail::DenseMapPair<DWARFDebugInfoEntry const*, clang::Decl*> >, DWARFDebugInfoEntry const*, clang::Decl*, llvm::DenseMapInfo<DWARFDebugInfoEntry const*>, llvm::detail::DenseMapPair<DWARFDebugInfoEntry const*, clang::Decl*> >::FindAndConstruct(DWARFDebugInfoEntry const*&&) + 28
   3 com.apple.LLDB.framework       0x02f81f94 DWARFASTParserClang::ParseTypeFromDWARF(lldb_private::SymbolContext const&, DWARFDIE const&, lldb_private::Log*, bool*) + 6144
   4 com.apple.LLDB.framework       0x03150f75 SymbolFileDWARF::ParseType(lldb_private::SymbolContext const&, DWARFDIE const&, bool*) + 209
   5 com.apple.LLDB.framework       0x0314a70e SymbolFileDWARF::GetTypeForDIE(DWARFDIE const&, bool) + 486
   6 com.apple.LLDB.framework       0x03149ee2 SymbolFileDWARF::ResolveType(DWARFDIE const&, bool, bool) + 68
   7 com.apple.LLDB.framework       0x02f88728 DWARFASTParserClang::GetClangDeclContextForDIE(DWARFDIE const&) + 176
   8 com.apple.LLDB.framework       0x02f8c0ae DWARFASTParserClang::GetDeclContextForUIDFromDWARF(DWARFDIE const&) + 24
   9 com.apple.LLDB.framework       0x031636fe DWARFDIE::GetDeclContext() const + 60
  10 com.apple.LLDB.framework       0x03149dd5 SymbolFileDWARF::GetDeclContextForUID(unsigned long long) + 53
  11 com.apple.LLDB.framework       0x0315c6ed SymbolFileDWARFDebugMap::GetDeclContextForUID(u

It would be great if we can replace all of the locations where you added lock guards with lldbasserts to assert that the lock is already taken. Then we might be able to catch places where the locks are not being respected. But in general the symbol vendor should take the lock, use N number of SymbolFile instances to complete the request, and then release the lock. Makes things easier on the lldb_private::SymbolFile subclasses this way.

That sounds like a really good idea, I'll definitely do that.

My first observations:

  • We call ResolveTypeUID without going through the symbol vendor (similar to GetDeclContextForUID in the stack trace). This is easy to fix, just add locking to these functions.
  • The module in the symbol vendor is different from the one in the symbol file. This seems more tricky and I'm not sure if this expected?

I looked at TestCallCPPFunction where my newly added assertion triggered in ParseCompileUnitSupportFiles which I found odd since we're taking the module lock in symbol vendor.

frame #4: 0x000000010590c083 _lldb.so`lldb_private::SymbolFile::AssertModuleLock(this=0x000000010388ac00) at SymbolFile.cpp:160
   157    // We assert that we have to module lock by trying to acquire the lock from a
   158    // different thread. Note that we must abort if the result is true to
   159    // guarantee correctness.
-> 160    assert(std::async(
   161               std::launch::async,
   162               [this] {
   163                 return this->GetObjectFile()->GetModule()->GetMutex().try_lock();
(lldb) p GetObjectFile()->GetModule()
(lldb::ModuleSP) $1 = std::__1::shared_ptr<lldb_private::Module>::element_type @ 0x0000000118905378 strong=1 weak=1 {
  __ptr_ = 0x0000000118905378
}
(lldb) fr sel 7
frame #7: 0x000000010591035b _lldb.so`lldb_private::SymbolVendor::ParseCompileUnitSupportFiles(this=0x0000000100184f90, sc=0x00007ffeefbf79a8, support_files=0x0000000118904d70) at SymbolVendor.cpp:171
   168    if (module_sp) {
   169      std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());
   170      if (m_sym_file_ap.get())
-> 171        return m_sym_file_ap->ParseCompileUnitSupportFiles(sc, support_files);
   172    }
   173    return false;
   174  }
(lldb) p module_sp
(lldb::ModuleSP) $0 = std::__1::shared_ptr<lldb_private::Module>::element_type @ 0x0000000100189ec0 strong=6 weak=18 {
  __ptr_ = 0x0000000100189ec0
}
  • Add assertions per Greg's suggestion.
xbolva00 added inline comments.
source/Symbol/SymbolFile.cpp
164

lldbassert?

JDevlieghere retitled this revision from [DWARFSymbolFile] Adds the module lock to all virtual methods from SymbolFile to [DWARFSymbolFile] Add the module lock where necessary and assert that we own it..
JDevlieghere edited the summary of this revision. (Show Details)
clayborg requested changes to this revision.Oct 22 2018, 10:27 AM

See inlined comments

include/lldb/Symbol/SymbolFile.h
98

Might add a comment here stating something like "Symbols file subclasses should override this to return the Module that owns the TypeSystem that this symbol file modifies type information in".

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2022–2025

Possible race condition and crasher here. Best coded as:

lldb::ModuleSP module_sp(m_debug_map_module_wp.lock());
if (module_sp)
  return module_sp->GetMutex();
return GetObjectFile()->GetModule()->GetMutex();

Otherwise some other thread could clear "m_debug_map_module_wp" after "m_debug_map_module_wp.expired()" is called, but before "m_debug_map_module_wp.lock()" is called and we would crash.

source/Symbol/SymbolFile.cpp
160–168

We should make this entire thing and all uses of it into a macro so we can enable it for Debug builds, but disable it for Release builds and have no extra code in release builds. We really shouldn't be starting threads up all the time for every function in SymbolFile subclasses for Release builds. It is ok for Debug builds or maybe manually enable this when we run into issues, but this is too expensive to leave in for all builds.

This revision now requires changes to proceed.Oct 22 2018, 10:27 AM
JDevlieghere marked 3 inline comments as done.

Address Greg's feedback.

This revision is now accepted and ready to land.Oct 22 2018, 12:44 PM
This revision was automatically updated to reflect the committed changes.

I think we can close this after rL344945?