This is an archive of the discontinued LLVM Phabricator instance.

asan_memory_profile: Fix for deadlock in memory profiler code.
ClosedPublic

Authored by sanjeet-karan-singh on Mar 27 2023, 10:59 AM.

Details

Summary

Calling symbolization directly from stopTheWorld was causing deadlock.
For libc dep systems, symbolization uses dl_iterate_phdr, which acquire a
dl write lock. It could deadlock if the lock is already acquired by one of
suspended.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 10:59 AM
sanjeet-karan-singh requested review of this revision.Mar 27 2023, 10:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 10:59 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

How hard to reproduce this issue?

compiler-rt/lib/asan/asan_memory_profile.cpp
131

Could you please point me to that code?

compiler-rt/lib/asan/asan_memory_profile.cpp
131
vitalybuka added inline comments.Mar 27 2023, 11:32 AM
compiler-rt/lib/asan/asan_memory_profile.cpp
131

thanks for looking, its here
https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp#L737

This deadlock is almost similar to this one https://github.com/google/sanitizers/issues/1364

Actually I want to see call stack from "Stack depot" to use dl_iterate_phdr. I didn't expect it's used from there.

compiler-rt/lib/asan/asan_memory_profile.cpp
131

its call stack was similar to this one:

#0  __lll_lock_wait (futex=futex@entry=0x7ff0eebae990 <_rtld_global+2352>, private=0) at lowlevellock.c:52
#1  0x00007ff0ee97e131 in __GI___pthread_mutex_lock (mutex=0x7ff0eebae990 <_rtld_global+2352>) at ../nptl/pthread_mutex_lock.c:115
#2  0x00007ff0ee6bb231 in __GI___dl_iterate_phdr (callback=0x558daa8feef0 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, data=0x7ff082c7cca0) at dl-iteratephdr.c:40
#3  0x0000558daa8feee5 in __sanitizer::ListOfModules::init() ()
#4  0x0000558daa90515e in __sanitizer::Symbolizer::FindModuleForAddress(unsigned long) ()
#5  0x0000558daa904d18 in __sanitizer::Symbolizer::SymbolizePC(unsigned long) ()
#6  0x0000558daa9035e8 in __sanitizer::StackTrace::Print() const ()
#7  0x0000558daa8e2437 in __asan::HeapProfile::Print(unsigned long, unsigned long) ()
#8  0x0000558daa8e22d0 in __asan::MemoryProfileCB(__sanitizer::SuspendedThreadsList const&, void*) ()
#9  0x0000558daa900e16 in __sanitizer::TracerThread(void*) ()
#10 0x0000558daa8f8a03 in __sanitizer::internal_clone(int (*)(void*), void*, int, void*, int*, void*, int*) ()
vitalybuka added inline comments.Mar 27 2023, 12:26 PM
compiler-rt/lib/asan/asan_memory_profile.cpp
131

Thanks, I see, so it's StackTrace::Print() with symbolization, not the Depot.

Maybe nicer solution is to move these from here and lsan into __sanitizer::StopTheWorld?

compiler-rt/lib/asan/asan_memory_profile.cpp
131

Thanks, that's great idea.
But, I think these locks are specific to caller functions, like here we know we need dl-iter lock (which may not be case for other StopTheWorld use cases).

Other option I was thinking was making lsan's LockDefStuffAndStopTheWorld function more generic, accepting (void* instead of CheckForLeaksParam *), but due to lsan specific StopTheWorld implementation for fuchsia, dropped that idea as well.

Sorry, I'm still learning sanitizers code, so if I intercept anything wrong please correct me.

vitalybuka accepted this revision.Mar 31 2023, 8:36 PM
This revision is now accepted and ready to land.Mar 31 2023, 8:36 PM
vitalybuka edited the summary of this revision. (Show Details)Mar 31 2023, 8:39 PM
This revision was landed with ongoing or failed builds.Mar 31 2023, 9:11 PM
This revision was automatically updated to reflect the committed changes.
mgorny added a subscriber: mgorny.Apr 1 2023, 3:29 AM

It seems that this change causes the test suite to hang: https://lab.llvm.org/buildbot/#/builders/37/builds/21181 and later.

mgorny added a comment.Apr 1 2023, 5:57 AM

It seems that this change causes the test suite to hang: https://lab.llvm.org/buildbot/#/builders/37/builds/21181 and later.

Hmm, actually, it might be a change in LLVM or Clang. I'm bisecting.

mgorny added a comment.Apr 1 2023, 6:33 AM

Ok, I've given up on bisect. I don't have the hardware to do this.

ah, its hanging when running with internal symbolizer.
Not sure if its due to this change, if it is than possibly one of acquired locks here are causing deadlock with internal symbolizer.
Is there is any way I can try reproducing it locally? Maybe in meantime we can try by reverting this change.
I'm so sorry for this trouble.

mgorny added a comment.Apr 3 2023, 4:16 AM
This comment was removed by mgorny.
mgorny added a comment.Apr 3 2023, 4:28 AM

I'm sorry, please disregard my latest comment, I was looking at the wrong diff. I've eventually bisected the hang I was experiencing on Gentoo to https://reviews.llvm.org/rG27c4777f41d2. That said, I don't know if the buildbot failure is related to this commit or that one.

kda added a subscriber: kda.Apr 3 2023, 5:23 PM

This appears to be breaking the builtbot: https://lab.llvm.org/buildbot/#/builders/37/builds/21181

I did try to reproduce the problem locally, but it seemed to pass. There may be an issue with our buildbot hosts.
I will check into that separately.

vitalybuka reopened this revision.Apr 3 2023, 11:07 PM

https://lab.llvm.org/buildbot/#/builders/37/builds/21181

We still can't safely symbolize from stop the world (and we don't symbolize in lsan as well)

We should:

  1. remove LockStuffAndStopTheWorld from here
  2. Lock allocator and call __lsan::ForEachChunk(ChunkCallback, &hp) under that lock
  3. unlock allocator and printout HeapProfile
This revision is now accepted and ready to land.Apr 3 2023, 11:07 PM

something like

diff --git a/compiler-rt/lib/asan/asan_memory_profile.cpp b/compiler-rt/lib/asan/asan_memory_profile.cpp
index 6c2b7404a279..ba2a7ecde4ef 100644
--- a/compiler-rt/lib/asan/asan_memory_profile.cpp
+++ b/compiler-rt/lib/asan/asan_memory_profile.cpp
@@ -14,7 +14,6 @@
 #include "sanitizer_common/sanitizer_common.h"
 #include "sanitizer_common/sanitizer_stackdepot.h"
 #include "sanitizer_common/sanitizer_stacktrace.h"
-#include "sanitizer_common/sanitizer_stoptheworld.h"
 #include "lsan/lsan_common.h"
 #include "asan/asan_allocator.h"
 
@@ -100,12 +99,12 @@ static void ChunkCallback(uptr chunk, void *arg) {
       FindHeapChunkByAllocBeg(chunk));
 }
 
-static void MemoryProfileCB(const SuspendedThreadsList &suspended_threads_list,
-                            void *argument) {
+static void MemoryProfileCB(uptr top_percent, uptr max_number_of_contexts) {
   HeapProfile hp;
+  __lsan::LockAllocator();
   __lsan::ForEachChunk(ChunkCallback, &hp);
-  uptr *Arg = reinterpret_cast<uptr*>(argument);
-  hp.Print(Arg[0], Arg[1]);
+  __lsan::UnlockAllocator();
+  hp.Print(top_percent, max_number_of_contexts);
 
   if (Verbosity())
     __asan_print_accumulated_stats();
@@ -122,10 +121,7 @@ SANITIZER_INTERFACE_ATTRIBUTE
 void __sanitizer_print_memory_profile(uptr top_percent,
                                       uptr max_number_of_contexts) {
 #if CAN_SANITIZE_LEAKS
-  uptr Arg[2];
-  Arg[0] = top_percent;
-  Arg[1] = max_number_of_contexts;
-  __sanitizer::StopTheWorld(__asan::MemoryProfileCB, Arg);
+  __asan::MemoryProfileCB(top_percent, max_number_of_contexts);
 #endif  // CAN_SANITIZE_LEAKS
 }
 }  // extern "C"
sanjeet-karan-singh edited the summary of this revision. (Show Details)

Great Idea, Thankyou so much @vitalybuka, just updated diff

vitalybuka accepted this revision.Apr 4 2023, 4:40 PM