This is an archive of the discontinued LLVM Phabricator instance.

tsan: add debugging code for ptrace test failures
ClosedPublic

Authored by dvyukov on Oct 28 2021, 4:02 AM.

Details

Summary

Debugging of crashes on powerpc after commit:
c80604f7a3 ("tsan: remove real func check from interceptors")
Somehow replacing if with DCHECK leads to strange failures in:
SanitizerCommon-tsan-powerpc64le-Linux :: Linux/ptrace.cpp
https://lab.llvm.org/buildbot/#/builders/105
https://lab.llvm.org/buildbot/#/builders/121
https://lab.llvm.org/buildbot/#/builders/57

The hypothesis is that something writes out-of-bounds
into pt_regs on stack and that corrupts internal tsan state.

Event Timeline

dvyukov created this revision. Oct 28 2021, 4:02 AM
dvyukov requested review of this revision. Oct 28 2021, 4:02 AM
melver accepted this revision. Oct 28 2021, 4:27 AM

This is very strange.

@amyk
Just in case, is there a way to get ssh access to a powerpc environment to repro these kinds of failures? There have been a number of powerpc-related issues, and we just can't repro them. :-/

This revision is now accepted and ready to land. Oct 28 2021, 4:27 AM

Yes, this is very strange :)

The failure looks as if the ScopedInterceptor object on the stack was corrupted by the time the interceptor returned (after the real syscall). I don't know whether this is actually the case, but if the syscall assumes that pt_regs is larger than it is and overwrites more memory, that would explain the failure mode perfectly.
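
To make the hypothesis concrete, here is a minimal sketch of how an overrun of this kind can be detected with canary bytes placed around a buffer. It is not the debugging code from this revision, and none of the names (GuardedBuffer, FakeWriter, SimulateOverrun) exist in tsan; they are hypothetical, and FakeWriter only stands in for a callee that assumes a larger buffer than the caller provided.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of tsan): a payload area surrounded by
// guard zones filled with a known canary byte.
template <std::size_t PayloadSize, std::size_t GuardSize = 64>
class GuardedBuffer {
 public:
  GuardedBuffer() {
    std::memset(bytes_, kCanary, sizeof(bytes_));
    std::memset(payload(), 0, PayloadSize);
  }

  // Pointer handed to the callee; it is supposed to write at most
  // PayloadSize bytes starting here.
  unsigned char *payload() { return bytes_ + GuardSize; }

  // Number of guard bytes (before and after the payload) that no longer
  // hold the canary value.
  std::size_t CorruptedGuardBytes() const {
    std::size_t corrupted = 0;
    for (std::size_t i = 0; i < GuardSize; i++) {
      if (bytes_[i] != kCanary) corrupted++;
      if (bytes_[GuardSize + PayloadSize + i] != kCanary) corrupted++;
    }
    return corrupted;
  }

 private:
  static constexpr unsigned char kCanary = 0xAB;
  unsigned char bytes_[GuardSize + PayloadSize + GuardSize];
};

// Stand-in for a callee that believes the destination is assumed_size
// bytes long (an assumption made purely for illustration).
inline void FakeWriter(unsigned char *dst, std::size_t assumed_size) {
  std::memset(dst, 0x11, assumed_size);
}

// Hands a 32-byte payload to a writer that assumes assumed_size bytes and
// reports how many guard bytes were clobbered.
inline std::size_t SimulateOverrun(std::size_t assumed_size) {
  constexpr std::size_t kPayload = 32;
  constexpr std::size_t kGuard = 64;
  // Keep the simulated overrun inside the backing array so the demo
  // itself stays well defined.
  assert(assumed_size <= kPayload + kGuard);
  GuardedBuffer<kPayload, kGuard> buf;
  FakeWriter(buf.payload(), assumed_size);
  return buf.CorruptedGuardBytes();
}
```

In this sketch a writer that assumes exactly the payload size leaves the guards intact, while one that assumes 8 bytes more clobbers 8 guard bytes. On a real stack those bytes would belong to whatever object happens to sit next to the buffer, which is the failure mode suspected here.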

amyk added a comment. Oct 28 2021, 7:27 AM

Hi @melver @dvyukov,

I've checked and we do have a Power environment that can be accessed. I've sent an e-mail to you both regarding this.

amyk added a comment. Oct 29 2021, 7:51 AM

Hi @melver @dvyukov,

Just wanted to check in - I believe I've granted access to a PPC machine. Were we able to determine if this patch resolves the issues seen on the buildbot, and would this patch be ready to commit?

dvyukov added a comment. Edited Oct 29 2021, 8:18 AM

Hi Amy,

I only get to this point:

# to get newer cmake
export PATH=/usr/sbin:$PATH
CC=/home/fedora/gcc/install/gcc-7.1.0/bin/gcc CXX=/home/fedora/gcc/install/gcc-7.1.0/bin/g++ cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt;libcxx;libcxxabi;lld" -GNinja ../llvm
...
CMake Error in /home/llvmguest/dvyukov/llvm-project/libcxx/benchmarks/CMakeLists.txt:
  The compiler feature "cxx_std_20" is not known to CXX compiler
  version 7.1.0.

I can't find any other gcc/clang on the machine.

amyk added a comment. Oct 29 2021, 8:34 AM

I'm sorry that you were having trouble building on the machine. I can look into that further.
In any case, I just tested this patch on the machine I am on right now, and it resolves the SanitizerCommon-tsan-powerpc64le-Linux :: Linux/ptrace.cpp failure.
Since this patch resolves the failure, can it be committed? Or were you planning to investigate the problem further on the machine? If so, and this patch cannot be committed yet, perhaps we can revert the offending patch first and then continue the investigation.

This revision was automatically updated to reflect the committed changes.

Thanks for testing. I've landed this change.

Marco helped me to figure out the command that works. For the record:

export PATH=/usr/sbin:$PATH
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;" -GNinja ../llvm
ninja && ninja check-sanitizer

It's now building...

amyk added a comment. Oct 29 2021, 9:07 AM

Great, thank you very much Dmitry. Really appreciate it!

I've reproduced the failure and confirmed that this change "fixes" it.
I see some stack corruption, and it's affected by small unrelated changes like adding Printf's. I've tried dumping the corrupted memory, etc., but it did not give me any clues.
At this point I suspect a gcc/glibc/kernel bug. I don't think it makes sense to spend more time debugging this with gcc 6.1 and kernel 4.11. Thousands of bugs have been fixed since then.