This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][tsan] Fix: Leak of ThreadSignalContext memory mapping when destroying fibers.
ClosedPublic

Authored by Florian on Mar 12 2020, 8:51 AM.

Details

Summary

When creating and destroying fibers in tsan a thread state is created and destroyed. Currently, a memory mapping is leaked with each fiber (in __tsan_destroy_fiber). This causes applications with many short running fibers to crash or hang because of linux vm.max_map_count.

The root of this is that ThreadState holds a pointer to ThreadSignalContext for handling signals. The initialization and destruction of it is tied to platform specific events in tsan_interceptors_posix and missed when destroying a fiber (specifically, SigCtx is used to lazily create the ThreadSignalContext in tsan_interceptors_posix). This patch cleans up the memory by makinh the ThreadState create and destroy the ThreadSignalContext.

The relevant code causing the leak with fibers is the fiber destruction:

void FiberDestroy(ThreadState *thr, uptr pc, ThreadState *fiber) {
  FiberSwitchImpl(thr, fiber);
  ThreadFinish(fiber);
  FiberSwitchImpl(fiber, thr);
  internal_free(fiber);
}

Diff Detail

Event Timeline

Florian created this revision.Mar 12 2020, 8:51 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 12 2020, 8:51 AM
Herald added subscribers: llvm-commits, Restricted Project, dberris. · View Herald Transcript

Hi Florian,

I am on the current HEAD (ce6c95aacae0cf5f8de8780de7f22de5f31a07dd), I applied this patch. But I am getting this error from 'ninja check-tsan`. Does it pass for you?

FAIL: ThreadSanitizer-x86_64 :: fiber_cleanup.cpp (199 of 400)

  • TEST 'ThreadSanitizer-x86_64 :: fiber_cleanup.cpp' FAILED ****

Script:

: 'RUN: at line 1'; /home/dvyukov/src/llvm/build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -gline-tables-only -I/home/dvyukov/src/llvm/compiler-rt/test/tsan/../ -std=c++11 -I/home/dvyukov/src/llvm/compiler-rt/test/tsan/../ -nostdinc++ -I/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1 -O1 /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp -o /home/dvyukov/src/llvm/build/projects/compiler-rt/test/tsan/X86_64Config/Output/fiber_cleanup.cpp.tmp && /home/dvyukov/src/llvm/build/projects/compiler-rt/test/tsan/X86_64Config/Output/fiber_cleanup.cpp.tmp 2>&1 | FileCheck /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp

Exit Code: 1

Command Output (stderr):

/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `count_memory_mappings()':
/home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: undefined reference to `std::1::to_string(int)'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: undefined reference to `std::
1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: undefined reference to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: undefined reference to `std::
1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:23: undefined reference to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: undefined reference to `std::
1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:/home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:14: more undefined references to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::~basic_string()' follow
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `fiber_iteration()':
/home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:33: undefined reference to `std::
1::mutex::lock()'
/usr/bin/ld: /home/dvyukov/src/llvm/compiler-rt/test/tsan/fiber_cleanup.cpp:34: undefined reference to `std::1::mutex::unlock()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::
1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> > std::1::operator+<char, std::1::char_traits<char>, std::1::allocator<char> >(std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >&&, char const*)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/string:4250: undefined reference to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::append(char const*)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::
1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> > std::1::operator+<char, std::1::char_traits<char>, std::1::allocator<char> >(char const*, std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >&&)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/string:4233: undefined reference to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::insert(unsigned long, char const*)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::
1::basic_ifstream<char, std::1::char_traits<char> >::basic_ifstream(std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> > const&, unsigned int)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:1231: undefined reference to `std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:1231: undefined reference to `std::1::basic_ios<char, std::1::char_traits<char> >::~basic_ios()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_ifstream<char, std::1::char_traits<char> >::~basic_ifstream()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/iosfwd:144: undefined reference to `std::1::basic_ios<char, std::1::char_traits<char> >::~basic_ios()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_ifstream<char, std::1::char_traits<char> >::~basic_ifstream()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/iosfwd:144: undefined reference to `std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_filebuf<char, std::1::char_traits<char> >::~basic_filebuf()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:409: undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::~basic_streambuf()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:409: undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::~basic_streambuf()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::codecvt<char, char, mbstate_t> const& std::1::use_facet<std::1::codecvt<char, char, mbstate_t> >(std::1::locale const&)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/__locale:250: undefined reference to `std::1::codecvt<char, char, mbstate_t>::id'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/__locale:250: undefined reference to `std::1::locale::use_facet(std::1::locale::id&) const'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::throw_bad_cast()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/typeinfo:345: undefined reference to `std::bad_cast::bad_cast()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_ios<char, std::1::char_traits<char> >::basic_ios()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:675: undefined reference to `vtable for std::1::basic_ios<char, std::1::char_traits<char> >'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:675: undefined reference to `vtable for std::1::basic_ios<char, std::1::char_traits<char> >'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_filebuf<char, std::1::char_traits<char> >::basic_filebuf()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:213: undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::basic_streambuf()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:305: undefined reference to `std::1::locale::~locale()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:307: undefined reference to `std::
1::locale::~locale()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:307: undefined reference to `std::1::locale::~locale()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/fstream:311: undefined reference to `std::
1::basic_streambuf<char, std::1::char_traits<char> >::~basic_streambuf()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::
1::ios_base::ios_base()':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:344: undefined reference to `vtable for std::1::ios_base'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:344: undefined reference to `vtable for std::
1::ios_base'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_ios<char, std::1::char_traits<char> >::init(std::1::basic_streambuf<char, std::1::char_traits<char> >*)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:712: undefined reference to `std::1::ios_base::init(void*)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `bool std::
1::has_facet<std::1::codecvt<char, char, mbstate_t> >(std::1::locale const&)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/
locale:242: undefined reference to `std::1::codecvt<char, char, mbstate_t>::id'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/__locale:242: undefined reference to `std::1::locale::has_facet(std::1::locale::id&) const'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_streambuf<char, std::1::char_traits<char> >::getloc() const':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/streambuf:149: undefined reference to `std::1::locale::locale(std::1::locale const&)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::ios_base::setstate(unsigned int)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:548: undefined reference to `std::
1::ios_base::clear(unsigned int)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::basic_istream<char, std::1::char_traits<char> >& std::1::getline<char, std::1::char_traits<char>, std::1::allocator<char> >(std::1::basic_istream<char, std::1::char_traits<char> >&, std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >&, char)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/istream:1511: undefined reference to `std::1::basic_istream<char, std::1::char_traits<char> >::sentry::sentry(std::1::basic_istream<char, std::1::char_traits<char> >&, bool)'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/istream:1532: undefined reference to `std::1::basic_string<char, std::1::char_traits<char>, std::1::allocator<char> >::push_back(char)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::
1::basic_ios<char, std::1::char_traits<char> >::widen(char) const':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:778: undefined reference to `std::
1::ios_base::getloc() const'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:778: undefined reference to `std::1::locale::~locale()'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/ios:778: undefined reference to `std::
1::locale::~locale()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o: in function `std::1::ctype<char> const& std::1::use_facet<std::1::ctype<char> >(std::1::locale const&)':
/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/__locale:250: undefined reference to `std::1::ctype<char>::id'
/usr/bin/ld: /home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1/
locale:250: undefined reference to `std::1::locale::use_facet(std::1::locale::id&) const'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x10): undefined reference to `typeinfo for std::1::basic_istream<char, std::1::char_traits<char> >'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x18): undefined reference to `std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x20): undefined reference to `std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x38): undefined reference to `typeinfo for std::1::basic_istream<char, std::1::char_traits<char> >'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x40): undefined reference to `virtual thunk to std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE[_ZTCNSt3114basic_ifstreamIcNS_11char_traitsIcEEEE0_NS_13basic_istreamIcS2_EE]+0x48): undefined reference to `virtual thunk to std::1::basic_istream<char, std::1::char_traits<char> >::~basic_istream()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTINSt3114basic_ifstreamIcNS_11char_traitsIcEEEE[_ZTINSt3114basic_ifstreamIcNS_11char_traitsIcEEEE]+0x10): undefined reference to `typeinfo for std::1::basic_istream<char, std::1::char_traits<char> >'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE[_ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE]+0x48): undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::showmanyc()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE[_ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE]+0x50): undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::xsgetn(char*, long)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE[_ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE]+0x60): undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::uflow()'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE[_ZTVNSt3113basic_filebufIcNS_11char_traitsIcEEEE]+0x70): undefined reference to `std::1::basic_streambuf<char, std::1::char_traits<char> >::xsputn(char const*, long)'
/usr/bin/ld: /tmp/fiber_cleanup-0ec261.o:(.rodata._ZTINSt3113basic_filebufIcNS_11char_traitsIcEEEE[_ZTINSt3113basic_filebufIcNS_11char_traitsIcEEEE]+0x10): undefined reference to `typeinfo for std::1::basic_streambuf<char, std::1::char_traits<char> >'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)

: 'RUN: at line 1'; /home/dvyukov/src/llvm/build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -gline-tables-only -I/home/dvyukov/src/llvm/compiler-rt/test/tsan/../ -std=c++11 -I/home/dvyukov/src/llvm/compiler-rt/test/tsan/../ -nostdinc++ -I/home/dvyukov/src/llvm/build/projects/compiler-rt/lib/tsan/libcxx_tsan_x86_64/include/c++/v1 -O1 /home/dvyukov/src/llvm/compiler-rt/test/tsan/inlined_memcpy_race.cpp -o /home/dvyukov/src/llvm/build/projects/compiler-rt/test/tsan/X86_64Config/Output/inlined_memcpy_race.cpp.tmp && /home/dvyukov/src/llvm/compiler-rt/test/tsan/deflake.bash 10 /home/dvyukov/src/llvm/build/projects/compiler-rt/test/tsan/X86_64Config/Output/inlined_memcpy_race.cpp.tmp 2>&1 | FileCheck /home/dvyukov/src/llvm/compiler-rt/test/tsan/inlined_memcpy_race.cpp

dvyukov added inline comments.Mar 20 2020, 10:14 AM
compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
904

I wonder if it's OK to use tctx here. We already called ctx->thread_registry->FinishThread(thr->tid) and tctx is owner by thread_registry. So by now it probably can be already reused for another thread.

I would appreciate feedback if this way of fixing the leak is ok.

It looks ok to me.
As you noted below the shutdown sequence is quite messy...

Also, I think it would be worthwhile to more closely look at the lifecycle of ThreadState (i.e. it uses no constructor/destructor, thus requiring manual callbacks for cleanup) and how OS-Threads/user level fibers are differentiated in the codebase. I would be happy to contribute more if someone could point me at the right place to discuss this issue.

thread-sanitizer@googlegroups.com mailing list is the right place. Or, we can discuss prototype changes here.

Thanks

Florian updated this revision to Diff 252260.Mar 24 2020, 4:06 AM
Florian marked 2 inline comments as done.
  • The clean-up callback for the platform code is moved to a different part of the thread destruction sequence.
  • Convert the test case to C instead of C++

Hello Dmitry,

thanks for your feedback. The test compiles fine on my machine, it looks like your linker is missing C++ symbols. I converted the test to be C instead of C++ and hope it works now.

Regarding the shutdown-sequence:

Firstly, tsan_rtl_proc.h defines clean methods for creating and destroying a Process class (memory allocation and initialization).
Processor *ProcCreate()
void ProcDestroy(Processor *proc)
Maybe it would be beneficial to do the same for the thread state?

Secondly, I got a question regarding responsibilities in ThreadStates. The classes ThreadState and ThreadContext seem to represent a logical thread, i.e. a logical strand of control. Process represents a 'real' thread, i.e. an operating system controlled execution entity like a single pthread. Would it be a worthwhile effort for me to pull responsibilities of logical and 'real' threads apart?

Lastly, if I plan to make changes, is it ok to introduce features like virtual classes and smart pointers (on non hot paths) or are there restrictions on these in the codebase?

Best regards,
Florian

compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
904

I looked into it and tctx can indeed be re-used by another thread at this point leading to a race.

The thread context is an essential part of an alive thread state, therefore I moved the call to the PlatformThreadFinish clean up call into the OnFinished callback of the thread state. In here, the thread context is still valid. My only concern is that we then perform a rather 'heavy' clean up work while holding the mutex of the ThreadRegistry (if threads are created/destroyed at a moderate pace this should be fine).

Would it be a worthwhile effort for me to pull responsibilities of logical and 'real' threads apart?

Yes. The Processor entity was introduced rather late in runtime evolution and only few most critical things were moved to the Processor from ThreadState. Perhaps more things can be moved.
Note tsan is also used for Go where Processor maps to a "logical processor" (a unit of GOMAXPROCS) and ThreadState maps to a goroutine. That was the main motivation for the split.

Maybe it would be beneficial to do the same for the thread state?

Generally yes. A good abstraction is good. But the devil may be in details :)
Also note that Processor is pure abstract, while ThreadState has several platform-dependent clean up parts.
Also the shutdown sequence is the hardest part, the order of things was crafted very carefully.

dvyukov added a comment.EditedMar 25 2020, 3:41 AM

Lastly, if I plan to make changes, is it ok to introduce features like virtual classes and smart pointers (on non hot paths) or are there restrictions on these in the codebase?

If it results in a real improvement and does not degrade performance of fast paths.
Fast paths were crafted very carefully to avoid any single excessive instruction. A virtual call on most fast paths is unacceptable.
Also it needs to make things simpler rather than harder. For example, reference counting may make things more obscure and non-transparent, lead to leaks and ordering bugs. Though, of course, absence of refcounting may lead to all the same problems :)
So I guess it will mostly depend on case-by-case basis. For example, we use scoped locks extensively and that's very handy and I guess prevented lots of bugs.

Lastly, if I plan to make changes, is it ok to introduce features like virtual classes and smart pointers (on non hot paths) or are there restrictions on these in the codebase?

If it results in a real improvement and does not degrade performance of fast paths.
Fast paths were crafted very carefully to avoid any single excessive instruction. A virtual call on most fast paths is unacceptable.
Also it needs to make things simpler rather than harder. For example, reference counting may make things more obscure and non-transparent, lead to leaks and ordering bugs. Though, of course, absence of refcounting may lead to all the same problems :)
So I guess it will mostly depend on case-by-case basis. For example, we use scoped locks extensively and that's very handy and I guess prevented lots of bugs.

Speaking of unnecessary complexity and refactoring.

The current design of ThreadRegistry always confused me. Reading this change, it makes it very hard to follow logic.

We call ThreadFinish, which calls thread_registry->FinishThread which is in different directory, which always unconditionally calls virtual ThreadContext::OnFinished which is in a different part of tsan_rtl_thread.cc. I always need to open 5 files and do constant switching just to recover this part each time.

We need to remove OnFinished entirely and just do whatever we do there after the thread_registry->FinishThread call right in ThreadFinish.

dvyukov added inline comments.Mar 25 2020, 4:09 AM
compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
904

We could avoid holding the lock around the cleanup by passing ThreadType to the callback. That's the only bit it needs from tctx.
Currently we know this bit, so we don't access tctx at all. With the new code forget this bit when call ThreadFinish and then we try to restore the knowledge of what we are destroying by looking at tctx.

Florian marked an inline comment as done.Mar 25 2020, 5:16 AM

Would it be a worthwhile effort for me to pull responsibilities of logical and 'real' threads apart?

Yes. The Processor entity was introduced rather late in runtime evolution and only few most critical things were moved to the Processor from ThreadState. Perhaps more things can be moved.
Note tsan is also used for Go where Processor maps to a "logical processor" (a unit of GOMAXPROCS) and ThreadState maps to a goroutine. That was the main motivation for the split.

So essentially in Go a ThreadState is used as a Fiber (which they call goroutine and schedule internally)? This makes it behave the same as fibers in C++: you have worker threads that switch between logical fibers.

Does tsan then run its race-detection logic on a per ThreadState/per logical string of execution basis? If so, a clean separation between operating system threads and logical threads would remove the ThreadType::Fiber and treat all ThreadStates equally as a logical string of execution. However, I fear that such a separation of concerns would require too much rework...

compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp
904

Yes, good point. I actually had that one typed out as one possible solution to the race. I can submit the updated diff later.

Lastly, if I plan to make changes, is it ok to introduce features like virtual classes and smart pointers (on non hot paths) or are there restrictions on these in the codebase?

If it results in a real improvement and does not degrade performance of fast paths.
Fast paths were crafted very carefully to avoid any single excessive instruction. A virtual call on most fast paths is unacceptable.
Also it needs to make things simpler rather than harder. For example, reference counting may make things more obscure and non-transparent, lead to leaks and ordering bugs. Though, of course, absence of refcounting may lead to all the same problems :)
So I guess it will mostly depend on case-by-case basis. For example, we use scoped locks extensively and that's very handy and I guess prevented lots of bugs.

Speaking of unnecessary complexity and refactoring.

The current design of ThreadRegistry always confused me. Reading this change, it makes it very hard to follow logic.

We call ThreadFinish, which calls thread_registry->FinishThread which is in different directory, which always unconditionally calls virtual ThreadContext::OnFinished which is in a different part of tsan_rtl_thread.cc. I always need to open 5 files and do constant switching just to recover this part each time.

We need to remove OnFinished entirely and just do whatever we do there after the thread_registry->FinishThread call right in ThreadFinish.

Yes, I absolutely agree. It took me quite a while to figure out that OnFinished is part of the cleanup sequence.
I think it would be beneficial to collect all clean-up code in these main lifecycle functions and make all code that wants to change a ThreadState use them. This way one can follow what happens at specific points of a thread's lifetime by the steps listed in the corresponding function.

If platform specific cleanups are needed on ThreadState transitions, they can get explicit platform callbacks.

int ThreadCreate(ThreadState *thr, uptr pc, uptr uid, bool detached);
void ThreadStart(ThreadState *thr, int tid, tid_t os_id,
                 ThreadType thread_type);
void ThreadFinish(ThreadState *thr);
...

Processor *ProcCreate();
void ProcDestroy(Processor *proc);
void ProcWire(Processor *proc, ThreadState *thr);
void ProcUnwire(Processor *proc, ThreadState *thr);

It probably is quite messy to get all nuances right when refactoring. I will give it a first experimental go if my spare time allows this week :)

Should we take this bigger discussion on refactoring to the mailing list and keep this review to fixing the leak?

dvyukov accepted this revision.Mar 25 2020, 9:06 AM
This revision is now accepted and ready to land.Mar 25 2020, 9:06 AM

So essentially in Go a ThreadState is used as a Fiber (which they call goroutine and schedule internally)?

In Go goroutine what is Thread in C.

Does tsan then run its race-detection logic on a per ThreadState/per logical string of execution basis?

Yes, race detection logic is tied to ThreadState.
Processors is unrelated to logic, it's merely an implementation abstraction for better resource utilization.

Should we take this bigger discussion on refactoring to the mailing list and keep this review to fixing the leak?

Yes, it would be better.

FTR, Go entry points are in compiler-rt/lib/tsan/go/tsan_go.cpp

dyung added a subscriber: dyung.Mar 25 2020, 6:36 PM
wolfgangp added inline comments.
compiler-rt/test/tsan/fiber_cleanup.cpp
30

This caused a lint complaint about using a constant instead of sizeof(proc_file_name).
It was breaking some builds.

Fixed with this commit.

I've reverted this commit to turn our bot green again.

To github.com:llvm/llvm-project.git
   e06d707aa2a..6430707196b  master -> master

It appears this broke the LLDB bot: http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/13127/testReport/junit/

Can you please take a look?

Florian, please take a look. There are some crashes on darwin.

It appears this broke the LLDB bot: http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/13127/testReport/junit/

Can you please take a look?

Florian, please take a look. There are some crashes on darwin.

I found the issue, but I will take some more time to follow all platform shutdown sequences before submitting again. Sorry for the trouble I've caused.

Florian updated this revision to Diff 254782.Apr 3 2020, 7:07 AM
Florian edited the summary of this revision. (Show Details)

Making the ThreadFinish call into platform specific code caused all kinds of further issues.
Now the ThreadState itself owns and manages the ThreadSignalContext, as it is required on all supported platforms anyways.

I ran check-tsan multiple times on both my Mac and Linux machine. I also ran clang-tidy and clang-format over it.
I would appreciate it if you could also run the test suite, Dmitry. I am a little scared of breaking something again.

This is lots of changes. This makes me nervous and makes it harder for me approve this. Tsan is currently in maintenance mode with no official staffing and I don't remember most of this code well. Keeping changes "safer" makes it much easier to handle.
Say, you reorder thr->~ThreadState() and StatAggregate. And this has no explanation and is hidden in the code movement, so not even really visible. This makes me wonder why, and if you have given this enough consideration and are there other similar hidden changes...
Also moving all this ThreadSignalContext/SignalDesc/ucontext_buffer_t/768/129/65 into abstract logic layer makes me wince. It was specifically kept as "interceptors" stuff for a 8 years. I think the same can be achieved by providing a callback in tsan_interceptors_posix.cc with no major code movement.
If you restrict this to just fixing the bug, it will make both of us less scared of breaking things :)
Refactrorings may go in separate subsequent changes with own proper descriptions. This is much better for review, tests and these are also a unit of rollback. So if just a refactoring is rolled back, that's much better.

Florian updated this revision to Diff 256034.Apr 8 2020, 9:11 AM

I integrated your feedback and put a separate callback for just the clean up. To not do any further refactoring I placed it in OnFinished where also other resources of ThreadState are freed.
The cleanup accesses non of the previously freed resources and does not free any resources needed in the destructor called after it or the stat aggregation, which are the only functions accessing ThreadState afterwards.

PS: It is my very first time contributing to a bigger repository/project and I have only limited experience with big codebases. I really try to get this right and not cause any issues, maybe I was a bit over-eager with my refactoring. Thank you for all your patience and feedback :)

I integrated your feedback and put a separate callback for just the clean up. To not do any further refactoring I placed it in OnFinished where also other resources of ThreadState are freed.
The cleanup accesses non of the previously freed resources and does not free any resources needed in the destructor called after it or the stat aggregation, which are the only functions accessing ThreadState afterwards.

PS: It is my very first time contributing to a bigger repository/project and I have only limited experience with big codebases. I really try to get this right and not cause any issues, maybe I was a bit over-eager with my refactoring. Thank you for all your patience and feedback :)

Thanks! It's now way more obvious what is the problem, what is the fix and what are effects of this change.
Refactorings are good, but sometimes they better go in separately for purposes of review/description/history/testing/reverts/etc.
Thanks for your persistence!