This is an archive of the discontinued LLVM Phabricator instance.

[lsan] Don't crash on ThreadRegistry::threads_ data race
ClosedPublic

Authored by vitalybuka on Apr 13 2023, 5:48 PM.

Details

Summary

Comment "No lock needed" in CurrentThreadContext was wrong.
Concurent ThreadRegistry::CreateThread can resize and relocate
ThreadRegistry::threads_ the same time CurrentThreadContext reads it.

To mitigate lock cost we store ThreadContext* instead of tid in
THREADLOCAL cache, we can tid from the ThreadContext*.

Diff Detail

Event Timeline

vitalybuka created this revision.Apr 13 2023, 5:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 5:48 PM
Herald added a subscriber: Enna1. · View Herald Transcript
vitalybuka requested review of this revision.Apr 13 2023, 5:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 13 2023, 5:48 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
vitalybuka added inline comments.Apr 13 2023, 6:31 PM
compiler-rt/lib/lsan/lsan_thread.cpp
61

I guess this is going to crash OSX:

Hi, thanks Vitaly!

AFAIR, thread_local (and TLS in general) require macOS 10.12 / iOS 10 / watchOS 3 or greater.

Therefore, can we hold off on this a bit?

@thetruestblue is looking into bumping the minimal sanitizer deployment targets for Apple platforms. Blue, when you do that, can you circle back here and apply this patch? This change is a good example for a sanity check: it doesn't work right now but should work afterwards.

I'll update my patch, but is there any news on bumping min OSX version?

grandinj added inline comments.
compiler-rt/lib/lsan/lsan_thread.cpp
61

Just #ifdef this out on those platforms, and use the original code?

MaskRay added a subscriber: MaskRay.EditedApr 14 2023, 12:14 AM

LGTM. The description can be made clearer. One thread may run ThreadRegistry::CreateThread and resize threads_. It's unsafe for another thread to read threads_ concurrently.

kstoimenov accepted this revision.Apr 14 2023, 7:30 AM
This revision is now accepted and ready to land.Apr 14 2023, 7:30 AM
vitalybuka retitled this revision from [lsan] Don't crash on data race to [lsan] Don't crash on ThreadRegistry::threads_ data race.Apr 14 2023, 4:37 PM
vitalybuka edited the summary of this revision. (Show Details)

store context in thread local

MaskRay accepted this revision.Apr 14 2023, 4:47 PM
MaskRay added inline comments.
compiler-rt/lib/lsan/lsan_linux.cpp
22–24

Doesn't clang-format prefer ThreadContextLsanBase * to ThreadContextLsanBase*?

vitalybuka added inline comments.Apr 14 2023, 4:58 PM
compiler-rt/lib/lsan/lsan_linux.cpp
22–24

lsan is configured as
BasedOnStyle: Google
AllowShortIfStatementsOnASingleLine: false
IndentPPDirectives: AfterHash

MaskRay added inline comments.Apr 14 2023, 5:02 PM
compiler-rt/lib/lsan/lsan_linux.cpp
22–24

OK. I see both A * and A* in the diff context :)

This revision was landed with ongoing or failed builds.Apr 17 2023, 3:33 PM
This revision was automatically updated to reflect the committed changes.

Hi, this test thread_context_crash.cpp failed on mac:
Log:

FAIL: LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp (66201 of 70710)
******************** TEST 'LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';      /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/./bin/clang  --driver-mode=g++ -O0  -arch x86_64 -stdlib=libc++ -mmacosx-version-min=10.12 -isysroot /opt/s/w/ir/cache/osx_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.3.sdk  -gline-tables-only -fsanitize=leak -I/opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/../ -O3 -pthread /opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/TestCases/thread_context_crash.cpp -o /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp &&  /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp 100
--
Exit Code: 1

Command Output (stderr):
--
Undefined symbols for architecture x86_64:
  "__ZN6__lsan16GetCurrentThreadEv", referenced from:
      __Z9null_funcPv in thread_context_crash-a63dee.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Hi, this test thread_context_crash.cpp failed on mac:
Log:

FAIL: LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp (66201 of 70710)
******************** TEST 'LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';      /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/./bin/clang  --driver-mode=g++ -O0  -arch x86_64 -stdlib=libc++ -mmacosx-version-min=10.12 -isysroot /opt/s/w/ir/cache/osx_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.3.sdk  -gline-tables-only -fsanitize=leak -I/opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/../ -O3 -pthread /opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/TestCases/thread_context_crash.cpp -o /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp &&  /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp 100
--
Exit Code: 1

Command Output (stderr):
--
Undefined symbols for architecture x86_64:
  "__ZN6__lsan16GetCurrentThreadEv", referenced from:
      __Z9null_funcPv in thread_context_crash-a63dee.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

do you have a link to the builder?

Hi, this test thread_context_crash.cpp failed on mac:
Log:

FAIL: LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp (66201 of 70710)
******************** TEST 'LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';      /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/./bin/clang  --driver-mode=g++ -O0  -arch x86_64 -stdlib=libc++ -mmacosx-version-min=10.12 -isysroot /opt/s/w/ir/cache/osx_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.3.sdk  -gline-tables-only -fsanitize=leak -I/opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/../ -O3 -pthread /opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/TestCases/thread_context_crash.cpp -o /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp &&  /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp 100
--
Exit Code: 1

Command Output (stderr):
--
Undefined symbols for architecture x86_64:
  "__ZN6__lsan16GetCurrentThreadEv", referenced from:
      __Z9null_funcPv in thread_context_crash-a63dee.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

do you have a link to the builder?

Yes, it's here: https://ci.chromium.org/ui/p/chromium/builders/try/mac_upload_clang/3593/overview

I haven't confirmed this patch is the cause, but it looks very likely. I'm trying to setup the environment for build llvm on my mac to confirm it.

Hi, this test thread_context_crash.cpp failed on mac:
Log:

FAIL: LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp (66201 of 70710)
******************** TEST 'LeakSanitizer-Standalone-x86_64 :: TestCases/thread_context_crash.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';      /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/./bin/clang  --driver-mode=g++ -O0  -arch x86_64 -stdlib=libc++ -mmacosx-version-min=10.12 -isysroot /opt/s/w/ir/cache/osx_sdk/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.3.sdk  -gline-tables-only -fsanitize=leak -I/opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/../ -O3 -pthread /opt/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/test/lsan/TestCases/thread_context_crash.cpp -o /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp &&  /opt/s/w/ir/cache/builder/src/third_party/llvm-build/Release+Asserts/runtimes/runtimes-bins/compiler-rt/test/lsan/X86_64LsanConfig/TestCases/Output/thread_context_crash.cpp.tmp 100
--
Exit Code: 1

Command Output (stderr):
--
Undefined symbols for architecture x86_64:
  "__ZN6__lsan16GetCurrentThreadEv", referenced from:
      __Z9null_funcPv in thread_context_crash-a63dee.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

do you have a link to the builder?

Yes, it's here: https://ci.chromium.org/ui/p/chromium/builders/try/mac_upload_clang/3593/overview

I haven't confirmed this patch is the cause, but it looks very likely. I'm trying to setup the environment for build llvm on my mac to confirm it.

Yes. it's very likey, but seems strange, no other OSX bots complained.
Not sure why linker is unhappy. Could this be an issue with incremental build, e.g. it's linking agains older library?
I will disabled darwin for now.

Not sure why linker is unhappy. Could this be an issue with incremental build, e.g. it's linking agains older library?

The builder does full build on a clean building directory every time.

I will disabled darwin for now.

Thanks.

Just realized that my macbook is M1, but that only fails on x86_64.

Yes, it's here: https://ci.chromium.org/ui/p/chromium/builders/try/mac_upload_clang/3593/overview

I haven't confirmed this patch is the cause, but it looks very likely. I'm trying to setup the environment for build llvm on my mac to confirm it.

Yes. it's very likey, but seems strange, no other OSX bots complained.
Not sure why linker is unhappy. Could this be an issue with incremental build, e.g. it's linking agains older library?
I will disabled darwin for now.

We are seeing this on our Apple internal CI bots as well. Thanks for disabling.