This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] Don't miss threads by ThreadSuspender
ClosedPublic

Authored by vitalybuka on May 7 2018, 1:21 AM.

Details

Summary

Enumerating /proc/<pid>/task/ dir Linux may stop if thread is dead. In this case
we miss some threads and report false memory leaks.
To solve this issue we repeat enumeration if the last thread is dead.
Do detect dead threads same way as proc_task_readdir we use
/proc/<pid>/task/<tid>/status.

Similarly it also ends enumeration of if proc_fill_cache fails, but in this case
Linux sets inode to 1 (Bad block).

Diff Detail

Event Timeline

vitalybuka created this revision.May 7 2018, 1:21 AM

What if there is no short read and the entire list is read in one syscall - does that mean that we've got a consistent thread list for some point of time in the past? If so, then iterating until the list stops changing and ignoring any attempts that ended in short reads should work.

What if there is no short read and the entire list is read in one syscall - does that mean that we've got a consistent thread list for some point of time in the past? If so, then iterating until the list stops changing and ignoring any attempts that ended in short reads should work.

I already tried that. It's possible to have no short read but still small list. Weird that even /proc/<pid>/status:Threads contains small value which match this incomplete list.

vitalybuka added a comment.EditedMay 7 2018, 2:00 PM

here I explain what I see WITHOUT this patch:
I have small program which creates 30 threads, which exit almost immediately. And another thread which runs __lsan_do_recoverable_leak_check().

Here the last successful __lsan_do_recoverable_leak_check

SuspendAllThreads
 ThreadLister::ListThreads
  internal_getdents: 840
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6929
   /proc/6563/task/6930
   /proc/6563/task/6931
   /proc/6563/task/6932
   /proc/6563/task/6933
   /proc/6563/task/6934
   /proc/6563/task/6935
   /proc/6563/task/6936
   /proc/6563/task/6939
   /proc/6563/task/6940
   /proc/6563/task/6941
   /proc/6563/task/6942
   /proc/6563/task/6943
   /proc/6563/task/6944
   /proc/6563/task/6945
   /proc/6563/task/6946
   /proc/6563/task/6949
   /proc/6563/task/6950
   /proc/6563/task/6951
   /proc/6563/task/6952
   /proc/6563/task/6955
   /proc/6563/task/6956
   /proc/6563/task/6957
   /proc/6563/task/6958
   /proc/6563/task/6959
   /proc/6563/task/6960
   /proc/6563/task/6961
   /proc/6563/task/6966
   /proc/6563/task/6968
  internal_getdents: 0
 ThreadLister::ListThreads
  internal_getdents: 840
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6929
   /proc/6563/task/6930
   /proc/6563/task/6931
   /proc/6563/task/6932
   /proc/6563/task/6933
   /proc/6563/task/6934
   /proc/6563/task/6935
   /proc/6563/task/6936
   /proc/6563/task/6939
   /proc/6563/task/6940
   /proc/6563/task/6941
   /proc/6563/task/6942
   /proc/6563/task/6943
   /proc/6563/task/6944
   /proc/6563/task/6945
   /proc/6563/task/6946
   /proc/6563/task/6949
   /proc/6563/task/6950
   /proc/6563/task/6951
   /proc/6563/task/6952
   /proc/6563/task/6955
   /proc/6563/task/6956
   /proc/6563/task/6957
   /proc/6563/task/6958
   /proc/6563/task/6959
   /proc/6563/task/6960
   /proc/6563/task/6961
   /proc/6563/task/6966
   /proc/6563/task/6968
  internal_getdents: 0

Leak check iterated list twice with same list of thread and suspended following:

Suspended
 suspended: 6563
 suspended: 6564
 suspended: 6565
 suspended: 6566
 suspended: 6929
 suspended: 6930
 suspended: 6931
 suspended: 6932
 suspended: 6933
 suspended: 6934
 suspended: 6935
 suspended: 6936
 suspended: 6939
 suspended: 6940
 suspended: 6941
 suspended: 6942
 suspended: 6943
 suspended: 6944
 suspended: 6945
 suspended: 6946
 suspended: 6949
 suspended: 6950
 suspended: 6951
 suspended: 6952
 suspended: 6955
 suspended: 6956
 suspended: 6957
 suspended: 6958
 suspended: 6959
 suspended: 6960
 suspended: 6961
 suspended: 6966
 suspended: 6968

Note threads 6958 and 6968.

The next call to __lsan_do_recoverable_leak_check()

SuspendAllThreads
 ThreadLister::ListThreads
  internal_getdents: 240
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6930
   /proc/6563/task/6942
   /proc/6563/task/6944
   /proc/6563/task/6950
  internal_getdents: 0

Some threads are gone including 6958 and 6968.
The second internal_getdents returned 0 so it looks like a complete list.

Do one more iterations:

 ThreadLister::ListThreads
  internal_getdents: 168
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6958
  internal_getdents: 0
Suspended
 suspended: 6563
 suspended: 6564
 suspended: 6565
 suspended: 6566

Thread 6958 is exiting so we failed to suspend it and we return from SuspendAllThreads.

However if we call ThreadLister one more time (just for debugging, without actually suspending):

ThreadLister::ListThreads
 internal_getdents: 168
  /proc/6563/task/6563
  /proc/6563/task/6564
  /proc/6563/task/6565
  /proc/6563/task/6566
  /proc/6563/task/6968
 internal_getdents: 0

6968 is back and we didn't suspend it.

Then we proceed leak checking and get:

==6563==ERROR: LeakSanitizer: detected memory leaks
Objects leaked above:
0x613000072500 (352 bytes)

If we ignore the leak and continue, next call to __lsan_do_recoverable_leak_check will see and suspend 6968 thread again.

SuspendAllThreads
 ThreadLister::ListThreads
  internal_getdents: 168
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6968
  internal_getdents: 0
 ThreadLister::ListThreads
  internal_getdents: 168
   /proc/6563/task/6563
   /proc/6563/task/6564
   /proc/6563/task/6565
   /proc/6563/task/6566
   /proc/6563/task/6968
  internal_getdents: 0
Suspended
 suspended: 6563
 suspended: 6564
 suspended: 6565
 suspended: 6566
 suspended: 6968

So obviously increasing iteration highly reduce probability of missing threads but not guaranty correctness.
Reason is probably this place https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L3562
When it hits dying thread, it exits as list is complete.

vitalybuka planned changes to this revision.May 7 2018, 9:39 PM

Looks like I can assume that kernel stops enumerating on the first !pid_alive()
So I can assume that if the last thread in the returned list is !pid_alive() I need one more iteration.
I have no access to pid_alive() but according kernel source PPid of such thread is going to be set to 0.

Seems this reliably works. I will update the patch.

different approach

remove unrelated changes

Nice!

compiler-rt/lib/sanitizer_common/sanitizer_file.cc
169 ↗(On Diff #145739)

I don't like this code duplication at all.

compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
954 ↗(On Diff #145739)

Would it be better to resize the buffer and start from scratch until we get the entire directory in one go? Otherwise we can still miss a thread that was added before the current directory cursor, I guess - or is that impossible?

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc
226–227

Is this because you want to suspend threads even when you've got an incomplete response? If would be cleaner to just loop while(Incomplete) ListThreads() and then handle the other two return codes.

vitalybuka marked 2 inline comments as done.May 8 2018, 12:35 PM
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_file.cc
169 ↗(On Diff #145739)

Let me resolve this a separate patch.

compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
954 ↗(On Diff #145739)

If it's not asan (thread registry is blocked there) threads may continue create new threads. So better to stop whatever is known.

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc
226–227

It would be cleaner, but not sure about probability of issues.
Also if you just reread immediatly, it will likely return same list because the dying thread is still there. Suspending threads we give some time to kernel to remove it from list.

vitalybuka edited the summary of this revision. (Show Details)May 8 2018, 12:36 PM
eugenis added inline comments.May 8 2018, 12:50 PM
compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
954 ↗(On Diff #145739)

Yes, if the read stopped on a dead/dying thread.
No, if it stopped because it ran out of buffer space.
Basically, we want the final iteration to succeed in a single system call, without any dead threads, and without any threads that have not been stopped on previous iterations. Otherwise we don't know if we've stopped everything.

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc
226–227

Agreed, it can't hurt to stop what we can as soon as we can.

vitalybuka updated this revision to Diff 145842.May 8 2018, 7:18 PM
vitalybuka marked 6 inline comments as done.

try to avoid short read

While there, listing threads on NetBSD has to be done differently. If I'm looking correctly this code will be used for this OS too... and fail.

But we can skip it, I will address it in future.

eugenis accepted this revision.May 9 2018, 3:32 PM
eugenis added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
935 ↗(On Diff #145843)

s/termination/terminated

942 ↗(On Diff #145843)

I can't parse this comment.
s/with/we, s/about/above ?

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc
229

Maybe rename to smth like "bool retry" ? Otherwise it is not clear why an incomplete list results in added_threads = true.

This revision is now accepted and ready to land.May 9 2018, 3:32 PM
This revision was automatically updated to reflect the committed changes.
Herald added a subscriber: Restricted Project. · View Herald TranscriptMay 9 2018, 9:06 PM
vitalybuka marked 3 inline comments as done.May 9 2018, 9:23 PM
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
942 ↗(On Diff #145843)

I restricted the code and comments https://reviews.llvm.org/rL331953

vitalybuka marked 2 inline comments as done.May 9 2018, 9:26 PM
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux.cc
942 ↗(On Diff #145843)

restricted -> restructed