This is an archive of the discontinued LLVM Phabricator instance.

[lldb/test] Enable faulthandler in dotest
ClosedPublic

Authored by rupprecht on Sep 14 2020, 2:41 PM.

Details

Summary

Register the faulthandler module so we can see what lldb tests are doing when they misbehave (e.g. run under a test runner that sets a timeout). This will print a stack trace for the following signals:

  • SIGSEGV, SIGFPE, SIGABRT, SIGBUS, and SIGILL (via faulthandler.enable())
  • SIGTERM (via faulthandler.register(SIGTERM)) [This is what our test runners sends when it times out].

The only signal we currently handle is SIGINT (via unittest2.signals.installHandler()) so there should be no overlap added by this patch.

Because this import is not available until python3, and the register() method is not available on Windows, this is enabled defensively.

This should have absolutely no effect when tests are passing (or even normally failing), but can be observed by running this while ninja is running:

kill -s SIGTERM $(ps aux | grep dotest.py | head -1 | awk '{print $2}')

Diff Detail

Event Timeline

rupprecht created this revision.Sep 14 2020, 2:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2020, 2:41 PM
rupprecht requested review of this revision.Sep 14 2020, 2:41 PM
JDevlieghere accepted this revision.Sep 16 2020, 1:59 PM

I think this will be very useful. LGTM!

This revision is now accepted and ready to land.Sep 16 2020, 1:59 PM
This revision was automatically updated to reflect the committed changes.

I'm late to the party, but I actually don't think this is a good change as it disables the normal LLDB backtraces. The current test errors we see on Green Dragon look now like this:

Assertion failed: (size() >= N && "Dropping more elements than exist"), function drop_front, file /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/ADT/StringRef.h, line 655.
Fatal Python error: Aborted

Current thread 0x000000010e0f9dc0 (most recent call first):
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/lib/python3.7/site-packages/lldb/__init__.py", line 2717 in HandleCommand
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2129 in runCmd
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/test/API/commands/apropos/with-process/TestAproposWithProcess.py", line 43 in test_apropos_with_process
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 413 in runMethod
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 383 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 458 in __call__
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 117 in _wrapped_run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 115 in _wrapped_run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 85 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 66 in __call__
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/runner.py", line 165 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/dotest.py", line 1017 in run_suite
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/test/API/dotest.py", line 7 in <module>

It's practically impossible to say from the backtrace which commit is to blame for the crash as I only know we run the command apropos env and then we crash in StringRef.

Do we have a way of getting both the LLVM and the Python backtrace?

teemperor reopened this revision.Nov 13 2020, 3:31 PM
This revision is now accepted and ready to land.Nov 13 2020, 3:31 PM
teemperor requested changes to this revision.Nov 13 2020, 3:31 PM
This revision now requires changes to proceed.Nov 13 2020, 3:31 PM

I'm late to the party, but I actually don't think this is a good change as it disables the normal LLDB backtraces. The current test errors we see on Green Dragon look now like this:

Assertion failed: (size() >= N && "Dropping more elements than exist"), function drop_front, file /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/ADT/StringRef.h, line 655.
Fatal Python error: Aborted

Current thread 0x000000010e0f9dc0 (most recent call first):
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/lib/python3.7/site-packages/lldb/__init__.py", line 2717 in HandleCommand
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2129 in runCmd
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/test/API/commands/apropos/with-process/TestAproposWithProcess.py", line 43 in test_apropos_with_process
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 413 in runMethod
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 383 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 458 in __call__
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 117 in _wrapped_run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 115 in _wrapped_run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 85 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 66 in __call__
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/runner.py", line 165 in run
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/dotest.py", line 1017 in run_suite
  File "/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/lldb/test/API/dotest.py", line 7 in <module>

It's practically impossible to say from the backtrace which commit is to blame for the crash as I only know we run the command apropos env and then we crash in StringRef.

Do we have a way of getting both the LLVM and the Python backtrace?

I'm not really sure what you mean by the LLDB backtrace or how this is inhibiting it.

IIUC, the faulthandler here is triggered when the test *receives* a signal killing it (e.g. if it hangs and lit or whatever kills it after some timeout). If the LLDB process that this calls crashes, it shouldn't affect that. SWIG maybe is complicating things here.

Do you have a way to reproduce the issue you're talking about? e.g. "git checkout <bad commit> && ninja check-lldb-api"?

I think Raphael is referring to the LLVM's PrettyStackTrace which prints a backtrace when you crash. Other than the driver I don't think we enable that in LLDB, so maybe LLVM was registering them by default if you had no signal handler installed?

By default the LLVM signal handlers propagate the signal once they're done with it, so I see no reason why we couldn't have both. I think it should work if we ensure libLLDB calls llvm::EnablePrettyStackTrace(), but if we go that route the behavior should be opt in, libraries shouldn't be setting signal handlers by default.

+1 to what Jonas said.

You can reproduce this by just adding an abort(); call at the start of CommandInterpreter::HandleCommand and then for example like the TestApropos.py test.

If the LLDB process that this calls crashes, it shouldn't affect that.

LLDB is loaded as a Python module *inside* the Python process, so Python and LLDB actually do share a process/signals (that's why when LLDB crashes in a Python test, you actually see the 'Python' process crashing. It's just a loaded shared library).

Thanks, I'll take a look at that.

For SIGTERM (issued by test runners to handle timeouts), the stack trace is printed via faulthandler.register(signal.SIGTERM, chain=True). The chain=True should cause previous signal handlers registered for SIGTERM to execute.

For all other signal types, this is enabled with faulthandler.enable(), and there doesn't appear to be any equivalent option AFAICT, so it's plausible that it's overwriting the signal handler. Maybe we could replace that with calls to faulthandler.register() for each signal, but hopefully there's a cleaner way.

At head:

$ ninja check-lldb-api-commands-apropos-basic
[2/3] Running lit suite /home/rupprecht/src/llvm-project/lldb/test/API/commands/apropos/basic
FAIL: lldb-api :: commands/apropos/basic/TestApropos.py (1 of 1)
******************** TEST 'lldb-api :: commands/apropos/basic/TestApropos.py' FAILED ********************
Script:
--
/usr/bin/python3.8 /home/rupprecht/src/llvm-project/lldb/test/API/dotest.py <...> -p TestApropos.py
--
Exit Code: -6

Command Output (stdout):
--
lldb version 12.0.0 (https://github.com/llvm/llvm-project.git revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4)
  clang revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4
  llvm revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4

--
Command Output (stderr):
--
Fatal Python error: Aborted

Current thread 0x00007f1541aea740 (most recent call first):
  File "/home/rupprecht/src/llvm-build/dev/lib/python2.7/dist-packages/lldb/__init__.py", line 3352 in HandleCommand
  File "/home/rupprecht/src/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2129 in runCmd
  File "/home/rupprecht/src/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1949 in setUp
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 377 in run
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/case.py", line 458 in __call__
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 117 in _wrapped_run
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 115 in _wrapped_run
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 85 in run
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/suite.py", line 66 in __call__
  File "/home/rupprecht/src/llvm-project/lldb/third_party/Python/module/unittest2/unittest2/runner.py", line 165 in run
  File "/home/rupprecht/src/llvm-project/lldb/packages/Python/lldbsuite/test/dotest.py", line 1013 in run_suite
  File "/home/rupprecht/src/llvm-project/lldb/test/API/dotest.py", line 7 in <module>

--

********************
********************
Failed Tests (1):
  lldb-api :: commands/apropos/basic/TestApropos.py

With registerFaulthandler() commented out:

$ ninja check-lldb-api-commands-apropos-basic
[2/3] Running lit suite /home/rupprecht/src/llvm-project/lldb/test/API/commands/apropos/basic
FAIL: lldb-api :: commands/apropos/basic/TestApropos.py (1 of 1)
******************** TEST 'lldb-api :: commands/apropos/basic/TestApropos.py' FAILED ********************
Script:
--
/usr/bin/python3.8 /home/rupprecht/src/llvm-project/lldb/test/API/dotest.py <...> -p TestApropos.py
--
Exit Code: -6

Command Output (stdout):
--
lldb version 12.0.0 (https://github.com/llvm/llvm-project.git revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4)
  clang revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4
  llvm revision 703ef17e7a0a0f51e1d000bb1f71ad437a9933e4

--

********************
********************
Failed Tests (1):
  lldb-api :: commands/apropos/basic/TestApropos.py

So, it doesn't seem like the LLVM signal handler is being enabled, at least not by default -- maybe it has something to do with my cmake config.

My cmake config appears to be correct. I tried adding this to CommandInterpreter::HandleCommand:

lldbassert(ENABLE_BACKTRACES && "back traces are not enabled!");
lldbassert(false && "crash!");

Running LLDB directly does what you describe:

$ bin/lldb
(lldb) help
lldb: /home/rupprecht/src/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1655: bool lldb_private::CommandInterpreter::HandleCommand(const char *, lldb_private::LazyBool, lldb_private::CommandReturnObject &, lldb_private::ExecutionContext *, bool, bool): Assertion `false && "crash!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: bin/lldb 
 #0 0x00000000003a283a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:563:11
 #1 0x00000000003a2a0b PrintStackTraceSignalHandler(void*) /home/rupprecht/src/llvm-project/llvm/lib/Support/Unix/Signals.inc:630:1
 #2 0x00000000003a102b llvm::sys::RunSignalHandlers() /home/rupprecht/src/llvm-project/llvm/lib/Support/Signals.cpp:70:5

However, when I disable the faulthandler added here and run the lldb tests, I only see the "Assertion `false && "crash!"' failed" part. So: yes, something in the lldb test suite is messing with the signal handler, but it's not this (or it's not *only* this).
I recall running across other signal handling code in the lldb test framework (in the unittest2 bits), I'll see if I can find that again.

teemperor accepted this revision.Nov 14 2020, 12:36 AM

Yeah I don't think this revision is to blame as just disabling the code here doesn't bring back the backtraces. Doesn't explain what happened to them, but that's probably on me to figure out. Really sorry for the noise, closing this again.

This revision is now accepted and ready to land.Nov 14 2020, 12:36 AM
teemperor closed this revision.Nov 14 2020, 12:36 AM

Yeah I don't think this revision is to blame as just disabling the code here doesn't bring back the backtraces. Doesn't explain what happened to them, but that's probably on me to figure out. Really sorry for the noise, closing this again.

No worries, this is an entirely reasonable change to suspect.

Not sure if you saw, but further on down there is also:

# Install the control-c handler.
unittest2.signals.installHandler()

However, removing that in addition to this still does not seem to get the C++ backtraces working. I'm guessing there's a call to EnablePrettyStackTrace that used to be happening and was accidentally removed.