HomePhabricator

[sanitizer] Simplify GetTls with dl_iterate_phdr

Authored by MaskRay on Mar 25 2021, 9:55 PM.

Description

[sanitizer] Simplify GetTls with dl_iterate_phdr

GetTls is the range of

  • thread control block and optional TLS_PRE_TCB_SIZE
  • static TLS blocks plus static TLS surplus

On glibc, lsan requires the range to include
pthread::{specific_1stblock,specific} so that allocations only referenced by
pthread_setspecific can be scanned.

This patch uses dl_iterate_phdr to collect TLS ranges. Find the one
with dlpi_tls_modid==1 as one of the initially loaded module, then find
consecutive ranges. The boundaries give us addr and size.

This allows us to drop the glibc internal _dl_get_tls_static_info and
InitTlsSize entirely. Use the simplified method with non-Android Linux for
now, but in theory this can be used with *BSD and potentially other ELF OSes.

In the future, we can move ThreadDescriptorSize code to lsan (and consider
intercepting pthread_setspecific) to avoid hacks in generic code.

See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage
across various sanitizers.

Differential Revision: https://reviews.llvm.org/D98926

Event Timeline

jsji added a subscriber: jsji.Mar 26 2021, 5:56 PM

@MaskRay Looks like this has caused large number of failures in sanitizer on PowerPC bulidbot https://lab.llvm.org/buildbot/#/builders/57/builds/5613
Can you have a look or revert? Thanks!

@MaskRay Looks like this has caused large number of failures in sanitizer on PowerPC bulidbot https://lab.llvm.org/buildbot/#/builders/57/builds/5613
Can you have a look or revert? Thanks!

I have sent a message to powerllvm@ca.ibm.com earlier today. Hope someone can provide more information but I don' think this patch regresses powerpc behaviors.

The failures are some check-tsan tests.
However, sanitizer-ppc64le-linux
(https://lab.llvm.org/buildbot/#/builders/19) is good. My testing on
Debian GLIBC 2.28-10 is also good.
And I have confidence that it should even pass with very old glibc
like 2.19 or even older.

92: FATAL: ThreadSanitizer CHECK failed: /home/buildbots/ppc64le-clang-test/clang-ppc64le/llvm/compiler-rt/lib/tsan/rtl/tsan_platform_linux.cpp:447 "((thr_beg)) >= ((tls_addr))" (0x3fffb30bfd80, 0xfffffffffffff8b0)

From the log, tls_addr is -1872, which means in
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp : GetTls,
after GetStaticTlsRange(addr, size), addr is 0 (ThreadDescriptorSize()
is 1872), but I am confused how this can happen.
Can someone debug a check-tsan test a bit and check how this happens??

I have sent a message to powerllvm@ca.ibm.com earlier today. Hope someone can provide more information but I don' think this patch regresses powerpc behaviors.

We received a message at 6:28 EDT on a Friday. Given the notification came after business hours on a Friday, would you be able to pull this to bring the bot back to green and we can resume investigation on Monday?

From the log, tls_addr is -1872, which means in
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp : GetTls,
after GetStaticTlsRange(addr, size), addr is 0 (ThreadDescriptorSize()
is 1872), but I am confused how this can happen.
Can someone debug a check-tsan test a bit and check how this happens??

We can help investigate this on Monday.

I have sent a message to powerllvm@ca.ibm.com earlier today. Hope someone can provide more information but I don' think this patch regresses powerpc behaviors.

We received a message at 6:28 EDT on a Friday. Given the notification came after business hours on a Friday, would you be able to pull this to bring the bot back to green and we can resume investigation on Monday?

If you mean revert I'd prefer not... For my listed evidence I tend to think this is some configuration issue.
This touches many files and I don't think we should let the configuration issue cause churn to it.

Can check-tsan be disabled on ppc64le-clang-test?

From the log, tls_addr is -1872, which means in
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp : GetTls,
after GetStaticTlsRange(addr, size), addr is 0 (ThreadDescriptorSize()
is 1872), but I am confused how this can happen.
Can someone debug a check-tsan test a bit and check how this happens??

We can help investigate this on Monday.

I have sent a message to powerllvm@ca.ibm.com earlier today. Hope someone can provide more information but I don' think this patch regresses powerpc behaviors.

We received a message at 6:28 EDT on a Friday. Given the notification came after business hours on a Friday, would you be able to pull this to bring the bot back to green and we can resume investigation on Monday?

If you mean revert I'd prefer not... For my listed evidence I tend to think this is some configuration issue.
This touches many files and I don't think we should let the configuration issue cause churn to it.

Can check-tsan be disabled on ppc64le-clang-test?

From the log, tls_addr is -1872, which means in
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp : GetTls,
after GetStaticTlsRange(addr, size), addr is 0 (ThreadDescriptorSize()
is 1872), but I am confused how this can happen.
Can someone debug a check-tsan test a bit and check how this happens??

We can help investigate this on Monday.

31e541e37587100a5b21378380f54c028fda2d04 made https://lab.llvm.org/buildbot/#/builders/76/builds/2066 happy:)

@nemanjai @jsji The root issue causing *addr == 0 (most check-tsan tests have the issue) is still very appreciated.. (i.e. locally reverting 31e541e37587100a5b21378380f54c028fda2d04 can break https://lab.llvm.org/buildbot/#/builders/76)

TLS handling in compiler-rt is fragile and I am trying to keep the logic simple and maintainable. Simplifying the code is an important step.

  • InitTlsSize's estimate of the size of static TLS blocks is generally inaccurate for non-x86.
  • upstream glibc (without the help of } else if ((tls_beg % 4096) == sizeof(Glibc_2_19_tls_header)) { code in lib/sanitizer_common/sanitizer_tls_get_addr.cpp
jsji added subscribers: Conanap, amyk.Apr 1 2021, 1:42 PM

@nemanjai @jsji The root issue causing *addr == 0 (most check-tsan tests have the issue) is still very appreciated.. (i.e. locally reverting 31e541e37587100a5b21378380f54c028fda2d04 can break https://lab.llvm.org/buildbot/#/builders/76)

@MaskRay Sorry to reply late.
Yes, @Conanap and @amyk will help investigate the root cause and let you know .

amyk added a comment.Apr 6 2021, 9:23 AM

Hi @MaskRay,

I saw that the patch, along with the temporary workaround for clang-ppc64le-linux has been reverted. I also saw that you committed https://reviews.llvm.org/rGec575e3b0a462ff7a3d23d0f39a22147606050de without a work around for PPC, although that has also been reverted at this time.
Was your recent patch still problematic for PPC and would you still like us to investigate it?

Hi @MaskRay,

I saw that the patch, along with the temporary workaround for clang-ppc64le-linux has been reverted. I also saw that you committed https://reviews.llvm.org/rGec575e3b0a462ff7a3d23d0f39a22147606050de without a work around for PPC, although that has also been reverted at this time.
Was your recent patch still problematic for PPC and would you still like us to investigate it?

Hi @amyk, if you can verify the reverted https://reviews.llvm.org/rGec575e3b0a462ff7a3d23d0f39a22147606050de works on clang-ppc64le-linux, it'd be nice...
Seems so from the previous runs?