Page MenuHomePhabricator

[sanitizer] Simplify GetTls with dl_iterate_phdr on Linux and use it on musl/FreeBSD
ClosedPublic

Authored by MaskRay on Mar 19 2021, 12:56 AM.

Details

Summary

... so that FreeBSD specific GetTls/glibc specific pthread_self code can be
removed. This also helps FreeBSD arm64/powerpc64 which don't have GetTls
implementation yet.

GetTls is the range of

  • thread control block and optional TLS_PRE_TCB_SIZE
  • static TLS blocks plus static TLS surplus

On glibc, lsan requires the range to include
pthread::{specific_1stblock,specific} so that allocations only referenced by
pthread_setspecific can be scanned.

This patch uses dl_iterate_phdr to collect TLS blocks. Find the one
with dlpi_tls_modid==1 as one of the initially loaded module, then find
consecutive ranges. The boundaries give us addr and size.

This allows us to drop the glibc internal _dl_get_tls_static_info and
InitTlsSize. However, huge glibc x86-64 binaries with numerous shared objects
may observe time complexity penalty, so exclude them for now. Use the simplified
method with non-Android Linux for now, but in theory this can be used with *BSD
and potentially other ELF OSes.

This removal of RISC-V __builtin_thread_pointer makes the code compilable more
compiler versions (added in Clang in 2020-03, added in GCC in 2020-07).

This simplification enables D99566 for TLS Variant I architectures.

Note: as of musl 1.2.2 and FreeBSD 12.2, dlpi_tls_data returned by
dl_iterate_phdr is not desired: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254774
This can be worked around by using __tls_get_addr({modid,0}) instead
of dlpi_tls_data. The workaround can be shared with the workaround for glibc<2.25.

This fixes some tests on Alpine Linux x86-64 (musl)

test/lsan/Linux/cleanup_in_tsd_destructor.c
test/lsan/Linux/fork.cpp
test/lsan/Linux/fork_threaded.cpp
test/lsan/Linux/use_tls_static.cpp
test/lsan/many_tls_keys_thread.cpp

test/msan/tls_reuse.cpp

and test/lsan/TestCases/many_tls_keys_pthread.cpp on glibc aarch64.

The number of sanitizer test failures does not change on FreeBSD/amd64 12.2.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vitalybuka added inline comments.Tue, Mar 23, 4:51 PM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
309–328
326–327

what is going to happens with gaps > alignment?
If it's impossible maybe just CHECK() for that
otherwise probably better to change GetThreadStackAndTls to return vector and let caller decide

vitalybuka added inline comments.Tue, Mar 23, 4:56 PM
compiler-rt/lib/asan/asan_rtl.cpp
571

Actually PoisonShadow has CHECKs to I don't mind to keep this and below as is if it does not trigger.

MaskRay marked 5 inline comments as done.Tue, Mar 23, 6:11 PM
MaskRay added inline comments.
compiler-rt/lib/asan/asan_rtl.cpp
571

I'll adjust top-bottom but keep bottom untouched.

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
299

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces The llvm-project coding standard does not use anonymous namespace for static functions. Does compiler-rt do it differently?

326–327

A dynamic loader does not make gaps > alignment for static TLS blocks (which would simply waste space). The code uses alignment to check whether two consecutive TLS ranges should be merged.

If a range is out of reach, it means it belongs to a dynamic TLS block and there should not be any CHECK.

MaskRay updated this revision to Diff 332841.Tue, Mar 23, 6:18 PM
MaskRay marked 3 inline comments as done.

address comments

MaskRay added inline comments.Tue, Mar 23, 6:20 PM
compiler-rt/lib/asan/asan_thread.cpp
324–325

Hmm, I'll move the round to where stack_top_ is initialized.

vitalybuka added inline comments.Tue, Mar 23, 11:45 PM
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
299

Oh, I forgot this one.
Originally sanitizers were made with Google Style. Still it uses statics more often then unnamed namespace.
Lets keep static.

340

check-sanitizer gets

/usr/bin/ld: SANITIZER_TEST_OBJECTS.sanitizer_linux_test.cpp.x86_64.o: in function `__sanitizer::thread_self_offset_test_func(void*)':
/usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelf()'
/usr/bin/ld: /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelfOffset()'
/usr/bin/ld: /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelf()'
/usr/bin/ld: SANITIZER_TEST_OBJECTS.sanitizer_linux_test.cpp.x86_64.o: in function `__sanitizer::SanitizerLinux_ThreadSelfOffset_Test::TestBody()':
/usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelf()'
/usr/bin/ld: /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelfOffset()'
/usr/bin/ld: /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelf()'
/usr/bin/ld: SANITIZER_TEST_OBJECTS.sanitizer_linux_test.cpp.x86_64.o: in function `__sanitizer::thread_descriptor_size_test_func(void*)':
/usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:208: undefined reference to `__sanitizer::ThreadSelf()'rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp:193: undefined reference to `__sanitizer::ThreadSelf()'

also it's still declared in sanitizer_linux.h and used in unittest

MaskRay updated this revision to Diff 333119.Wed, Mar 24, 2:19 PM

Delete check-sanitizer found tests.

Hack powerpc64 to use 1664. For unknown reasons it is needed for an internal test.

MaskRay marked an inline comment as done.Wed, Mar 24, 2:25 PM

By intercepting pthread_setspecific, we can track thread-specific data keys for lsan in a better way and likely will be able to get rid of ThreadDescriptorSize() and TlsPreTcbSize().

compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
340

Thanks. I did not know check-sanitizer. I used check-asan check-msan check-hwasan check-tsan check-ubsan check-cfi check-lsan ...

MaskRay marked an inline comment as done.Wed, Mar 24, 2:26 PM
vitalybuka accepted this revision.Wed, Mar 24, 9:11 PM
vitalybuka added inline comments.
compiler-rt/lib/sanitizer_common/tests/sanitizer_linux_test.cpp
188

Please keep ThreadDescriptorSize like this
ThreadDescriptorSize is version specific so we want to know if it executed on system with unexpected size

This revision is now accepted and ready to land.Wed, Mar 24, 9:11 PM
MaskRay updated this revision to Diff 333210.Wed, Mar 24, 10:46 PM
MaskRay marked 2 inline comments as done.

add back a unit test

MaskRay updated this revision to Diff 333457.Thu, Mar 25, 4:22 PM

Explain why we need the 1664 static TLS surplus.

Performed a large-scale testing internally and it looks good.

MaskRay edited the summary of this revision. (Show Details)Thu, Mar 25, 4:54 PM
This revision was landed with ongoing or failed builds.Thu, Mar 25, 9:55 PM
This revision was automatically updated to reflect the committed changes.
phosek added a subscriber: phosek.Fri, Apr 2, 5:38 PM

After this change, we're also seeing a leak being reported in BoringSSL when running on Debian 9 (stretch) which appears to be a false positive, see fxbug.dev/73487 for more details.

MaskRay added a comment.EditedFri, Apr 2, 5:55 PM

After this change, we're also seeing a leak being reported in BoringSSL when running on Debian 9 (stretch) which appears to be a false positive, see fxbug.dev/73487 for more details.

Is the false positive fixed by 58c62fd9768594ec8dd57e8320ba2396bf8b87e5 [sanitizer] Improve accuracy of GetTls on x86/s390? And note that the change only affects Linux, so Fuchsia is not affected.

I have found an approach to work around the Ubuntu 16.04 glibc 2.23 bug https://sourceware.org/pipermail/libc-alpha/2021-April/124697.html without making the code dirty.
I am testing it.

MaskRay reopened this revision.Sat, Apr 3, 4:51 PM
This revision is now accepted and ready to land.Sat, Apr 3, 4:51 PM
MaskRay updated this revision to Diff 335117.Sat, Apr 3, 4:56 PM
MaskRay retitled this revision from [sanitizer] Simplify GetTls with dl_iterate_phdr to [sanitizer] Simplify GetTls with dl_iterate_phdr on Linux.
MaskRay edited the summary of this revision. (Show Details)

Fix Ubuntu 16.04 glibc 2.23

This revision was landed with ongoing or failed builds.Sun, Apr 4, 3:36 PM
This revision was automatically updated to reflect the committed changes.
jyknight added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
430

This seems not OK to call this here. I'm seeing an ASAN "stack-use-after-scope" arising from this change, which I believe is because of calling this function during thread setup, before the stack bounds have been set in asan.

#0 0x5620b775273b in __interceptor_confstr compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:3713:5
#1 0x5620b77128ea in __sanitizer::GetLibcVersion(int*, int*, int*) compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:174:14
#2 0x5620b77129c5 in __sanitizer::GetTls(unsigned long*, unsigned long*) compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:424:7
#3 0x5620b7712e89 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:525:3
#4 0x5620b77a8f5d in __asan::AsanThread::SetThreadStackAndTls(__asan::AsanThread::InitOptions const*) compiler-rt/lib/asan/asan_thread.cpp:308:3
#5 0x5620b77a8b32 in __asan::AsanThread::Init(__asan::AsanThread::InitOptions const*) compiler-rt/lib/asan/asan_thread.cpp:235:3
#6 0x5620b77a9076 in __asan::AsanThread::ThreadStart(unsigned long long) compiler-rt/lib/asan/asan_thread.cpp:266:3

Probably should revert the change for now.

DavidSpickett added a subscriber: DavidSpickett.EditedTue, Apr 6, 3:34 AM

I've disabled many_tls_keys_pthread.cpp for AArch64 as it requires this change (https://reviews.llvm.org/rG34f8a7f93c98b7c189f9684d0d207ab42aca166f). The commit that enabled it wasn't caught in the initial revert.

If/when you reland it would make sense to include enabling this test as part of it. (passes on our bot when this change is included)

MaskRay reopened this revision.Wed, Apr 7, 11:17 AM
This revision is now accepted and ready to land.Wed, Apr 7, 11:17 AM
MaskRay updated this revision to Diff 335877.Wed, Apr 7, 11:18 AM
MaskRay retitled this revision from [sanitizer] Simplify GetTls with dl_iterate_phdr on Linux to [sanitizer] Simplify GetTls with dl_iterate_phdr on Linux and use it on musl/FreeBSD.
MaskRay edited the summary of this revision. (Show Details)

Fold D99871 into it

Keep InitTlsSize

MaskRay updated this revision to Diff 337901.Thu, Apr 15, 2:28 PM
MaskRay edited the summary of this revision. (Show Details)

Special case Linux x86-64

MaskRay updated this revision to Diff 337912.Thu, Apr 15, 2:48 PM
MaskRay edited the summary of this revision. (Show Details)

Update description

This revision was landed with ongoing or failed builds.Thu, Apr 15, 3:35 PM
This revision was automatically updated to reflect the committed changes.
morehouse added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp
198

This is triggering "unused variable" errors on Android: https://lab.llvm.org/buildbot/#/builders/77/builds/5507

/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:198:12: error: unused variable 'g_use_dlpi_tls_data' [-Werror,-Wunused-variable]
static int g_use_dlpi_tls_data;

PPC bots are also segfaulting during some MSan test after this patch: https://lab.llvm.org/buildbot/#/builders/19/builds/3665, https://lab.llvm.org/buildbot/#/builders/105/builds/8438

I'm not 100% sure, but I think it's likely this patch is the culprit.

******************** TEST 'UBSan-MemorySanitizer-powerpc64le :: TestCases/ImplicitConversion/integer-truncation.c' FAILED ********************
...
Exit Code: 254
Command Output (stderr):
--
clang-13: error: unable to execute command: Segmentation fault (core dumped)
clang-13: error: linker command failed due to signal (use -v to see invocation)

As mentioned above, this seems to be causing segfaults on PPC buildbots; did some builds on a few revisions and this seems to be the cause.

MaskRay added a subscriber: amyk.EditedFri, Apr 16, 9:26 AM

PPC bots are also segfaulting during some MSan test after this patch: https://lab.llvm.org/buildbot/#/builders/19/builds/3665, https://lab.llvm.org/buildbot/#/builders/105/builds/8438

I'm not 100% sure, but I think it's likely this patch is the culprit.

******************** TEST 'UBSan-MemorySanitizer-powerpc64le :: TestCases/ImplicitConversion/integer-truncation.c' FAILED ********************
...
Exit Code: 254
Command Output (stderr):
--
clang-13: error: unable to execute command: Segmentation fault (core dumped)
clang-13: error: linker command failed due to signal (use -v to see invocation)

Yes, I've reported it to @amyk . There may be a GNU ld 2.30 bug. I have tested the patch on a Ubunbu 18.04 machine with GNU ld 2.31: there is no segfault.
I think the bot may need to upgrade binutils - this is not the first time I've seen compiler/linker crashes on the ppc64le build bot - I don't even know how to work around it if we don't have more information about the nature.

The -Wunused-variable has been fixed by 376db8eaef3a06cf963eba9e8009048dc93689f0

@MaskRay
Seems that it caused
https://bugs.llvm.org/show_bug.cgi?id=50017
cannot link on s390x

could you please have a look?