This change introduces the concept of a platform-specific, pre-kill-hook mechanism. If a platform defines the hook, then the hook gets called right after a timeout is detected in a test run, but before the process is killed.
The pre-kill-hook mechanism works as follows:
- When a timeout is detected in the process_control.ProcessDriver class that runs the per-test lldb process, a new overridable on_timeout_pre_kill() method is called on the ProcessDriver instance.
- The concurrent test driver's derived ProcessDriver overrides this method. It checks whether a module named "lldbsuite.pre_kill_hook.{platform-system-name}" exists, where platform-system-name is replaced with platform.system().lower():
- If that module doesn't exist, the rest of the new behavior is skipped.
- If that module does exist, it is loaded, and the method "do_pre_kill(process_id, output_stream)" is called. If that method throws an exception, we log it and skip any further pre-kill processing for that process.
- The process_id arg of the do_pre_kill function is the process id as returned by the ProcessDriver.pid property.
- The output_stream arg of the do_pre_kill function takes a file-like object. Output to be collected from doing any processing on the process-to-be-killed should be written into the file-like object. The current impl uses a six.StringIO and then writes this output to {TestFilename}-{pid}.sample in the session directory.
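The lookup-and-dispatch steps above can be sketched roughly as follows. This is an illustration only, not the code in this change: run_pre_kill_hook and its system_name and package parameters are hypothetical names, and io.StringIO stands in for the six.StringIO used by the current impl.

```python
import importlib
import io
import platform


def run_pre_kill_hook(pid, system_name=None, package="lldbsuite.pre_kill_hook"):
    """Return the text collected by the platform hook, or None if no hook ran.

    Hypothetical helper: the name and the system_name/package parameters are
    assumptions made for this sketch.
    """
    system_name = system_name or platform.system().lower()
    module_name = "{}.{}".format(package, system_name)

    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # No hook module for this platform: skip the new behavior entirely.
        return None

    output_stream = io.StringIO()
    try:
        module.do_pre_kill(pid, output_stream)
    except Exception as error:
        # Log the failure and do no further pre-kill processing.
        print("pre-kill hook {} failed: {}".format(module_name, error))
        return None

    # The caller would write this text to {TestFilename}-{pid}.sample.
    return output_stream.getvalue()
```

Returning None for both "no hook module" and "hook failed" keeps the caller simple: it only writes a .sample file when there is output to write.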
Platforms where platform.system() is "Darwin" will get a pre-kill action that runs the 'sample' program on the lldb that has timed out. That data will be collected on CI and analyzed to determine what is happening during timeouts. (This has an advantage over a core file in that it is much smaller and that it clearly demonstrates any liveness of the process, if there is any.)
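For illustration, a Darwin hook along these lines might look like the sketch below. It is not necessarily what this change implements. It assumes 'sample' is invoked as "sample <pid> <seconds> -file <path>". The sample_seconds and run parameters are hypothetical additions (run exists only so the command runner can be swapped out), and the temp-file approach is one possible choice.

```python
import os
import subprocess
import tempfile


def do_pre_kill(process_id, output_stream, sample_seconds=3, run=subprocess.call):
    """Sample the timed-out process and copy the report into output_stream.

    sample_seconds and run are assumptions made for this sketch.
    """
    # Have 'sample' write its report to a temp file that we can read back.
    fd, report_path = tempfile.mkstemp(suffix=".sample")
    os.close(fd)
    try:
        command = ["sample", str(process_id), str(sample_seconds),
                   "-file", report_path]
        run(command)
        with open(report_path, "r", errors="replace") as report:
            output_stream.write(report.read())
    finally:
        # Clean up the temp file whether or not sampling succeeded.
        os.remove(report_path)
```

If 'sample' is missing or fails, the exception propagates to the caller, which logs it as described above.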
I will also hunt around on Linux to see if something akin to 'sample' is available. If so, it would be nice to hook something up for that.
I suspect we will need to tweak this a bit. We need to be able to dispatch on more than just the host platform.system(). It may be sufficient to pass along the test platform info as an argument.