Calling symbolization directly from stopTheWorld was causing deadlock.
For libc dep systems, symbolization uses dl_iterate_phdr, which acquire a
dl write lock. It could deadlock if the lock is already acquired by one of
suspended.
Details
Diff Detail
Event Timeline
How hard to reproduce this issue?
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 | Could you please point me to that code? |
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 | thanks for looking, its here This deadlock is almost similar to this one https://github.com/google/sanitizers/issues/1364 |
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 |
Actually I want to see call stack from "Stack depot" to use dl_iterate_phdr. I didn't expect it's used from there. |
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 | its call stack was similar to this one: #0 __lll_lock_wait (futex=futex@entry=0x7ff0eebae990 <_rtld_global+2352>, private=0) at lowlevellock.c:52 #1 0x00007ff0ee97e131 in __GI___pthread_mutex_lock (mutex=0x7ff0eebae990 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115 #2 0x00007ff0ee6bb231 in __GI___dl_iterate_phdr (callback=0x558daa8feef0 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, data=0x7ff082c7cca0) at dl-iteratephdr.c:40 #3 0x0000558daa8feee5 in __sanitizer::ListOfModules::init() () #4 0x0000558daa90515e in __sanitizer::Symbolizer::FindModuleForAddress(unsigned long) () #5 0x0000558daa904d18 in __sanitizer::Symbolizer::SymbolizePC(unsigned long) () #6 0x0000558daa9035e8 in __sanitizer::StackTrace::Print() const () #7 0x0000558daa8e2437 in __asan::HeapProfile::Print(unsigned long, unsigned long) () #8 0x0000558daa8e22d0 in __asan::MemoryProfileCB(__sanitizer::SuspendedThreadsList const&, void*) () #9 0x0000558daa900e16 in __sanitizer::TracerThread(void*) () #10 0x0000558daa8f8a03 in __sanitizer::internal_clone(int (*)(void*), void*, int, void*, int*, void*, int*) () |
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 | Thanks, I see, so it's StackTrace::Print() with symbolization, not the Depot. Maybe nicer solution is to move these from here and lsan into __sanitizer::StopTheWorld? |
compiler-rt/lib/asan/asan_memory_profile.cpp | ||
---|---|---|
143 | Thanks, that's great idea. Other option I was thinking was making lsan's LockDefStuffAndStopTheWorld function more generic, accepting (void* instead of CheckForLeaksParam *), but due to lsan specific StopTheWorld implementation for fuchsia, dropped that idea as well. Sorry, I'm still learning sanitizers code, so if I intercept anything wrong please correct me. |
It seems that this change causes the test suite to hang: https://lab.llvm.org/buildbot/#/builders/37/builds/21181 and later.
ah, its hanging when running with internal symbolizer.
Not sure if its due to this change, if it is than possibly one of acquired locks here are causing deadlock with internal symbolizer.
Is there is any way I can try reproducing it locally? Maybe in meantime we can try by reverting this change.
I'm so sorry for this trouble.
I'm sorry, please disregard my latest comment, I was looking at the wrong diff. I've eventually bisected the hang I was experiencing on Gentoo to https://reviews.llvm.org/rG27c4777f41d2. That said, I don't know if the buildbot failure is related to this commit or that one.
This appears to be breaking the builtbot: https://lab.llvm.org/buildbot/#/builders/37/builds/21181
I did try to reproduce the problem locally, but it seemed to pass. There may be an issue with our buildbot hosts.
I will check into that separately.
https://lab.llvm.org/buildbot/#/builders/37/builds/21181
We still can't safely symbolize from stop the world (and we don't symbolize in lsan as well)
We should:
- remove LockStuffAndStopTheWorld from here
- Lock allocator and call __lsan::ForEachChunk(ChunkCallback, &hp) under that lock
- unlock allocator and printout HeapProfile
something like
diff --git a/compiler-rt/lib/asan/asan_memory_profile.cpp b/compiler-rt/lib/asan/asan_memory_profile.cpp index 6c2b7404a279..ba2a7ecde4ef 100644 --- a/compiler-rt/lib/asan/asan_memory_profile.cpp +++ b/compiler-rt/lib/asan/asan_memory_profile.cpp @@ -14,7 +14,6 @@ #include "sanitizer_common/sanitizer_common.h" #include "sanitizer_common/sanitizer_stackdepot.h" #include "sanitizer_common/sanitizer_stacktrace.h" -#include "sanitizer_common/sanitizer_stoptheworld.h" #include "lsan/lsan_common.h" #include "asan/asan_allocator.h" @@ -100,12 +99,12 @@ static void ChunkCallback(uptr chunk, void *arg) { FindHeapChunkByAllocBeg(chunk)); } -static void MemoryProfileCB(const SuspendedThreadsList &suspended_threads_list, - void *argument) { +static void MemoryProfileCB(uptr top_percent, uptr max_number_of_contexts) { HeapProfile hp; + __lsan::LockAllocator(); __lsan::ForEachChunk(ChunkCallback, &hp); - uptr *Arg = reinterpret_cast<uptr*>(argument); - hp.Print(Arg[0], Arg[1]); + __lsan::UnlockAllocator(); + hp.Print(top_percent, max_number_of_contexts); if (Verbosity()) __asan_print_accumulated_stats(); @@ -122,10 +121,7 @@ SANITIZER_INTERFACE_ATTRIBUTE void __sanitizer_print_memory_profile(uptr top_percent, uptr max_number_of_contexts) { #if CAN_SANITIZE_LEAKS - uptr Arg[2]; - Arg[0] = top_percent; - Arg[1] = max_number_of_contexts; - __sanitizer::StopTheWorld(__asan::MemoryProfileCB, Arg); + __asan::MemoryProfileCB(top_percent, max_number_of_contexts); #endif // CAN_SANITIZE_LEAKS } } // extern "C"
Could you please point me to that code?