Page MenuHomePhabricator

add hook for calling platform-dependent pre-kill action on a timed out test

Authored by tfiala on Sep 22 2016, 4:21 PM.



This change introduces the concept of a platform-specific, pre-kill-hook mechanism. If a platform defines the hook, then the hook gets called right after a timeout is detected in a test run, but before the process is killed.

The pre-kill-hook mechanism works as follows:

  • When a timeout is detected in the process_control.ProcessDriver class that runs the per-test lldb process, a new overridable on_timeout_pre_kill() method is called on the ProcessDriver instance.
  • The concurrent test driver's derived ProcessDriver overrides this method. It looks to see if a module called "lldbsuite.pre_kill_hook.{platform-system-name}" module exists, where platform-system-name is replaced with platform.system().lower():
    • If that module doesn't exist, the rest of the new behavior is skipped.
    • If that module does exist, it is loaded, and the method "do_pre_kill(process_id, output_stream)" is called. If that method throws an exception, we log it and we ignore further processing of the pre-killed process.
    • The process_id arg of the do_pre_kill function is the process id as returned by the property.
    • The output_stream arg of the do_pre_kill function takes a file-like object. Output to be collected from doing any processing on the process-to-be-killed should be written into the file-like object. The current impl uses a six.StringIO and then writes this output to {TestFilename}-{pid}.sample in the session directory.

Platforms where platform.system() is "Darwin" will get a pre-kill action that runs the 'sample' program on the lldb that has timed out. That data will be collected on CI and analyzed to determine what is happening during timeouts. (This has an advantage over a core in that it is much smaller and that it clearly demonstrates any liveness of the process, if there is any).

I will also hunt around on Linux to see if there might be something akin to 'sample' that might be available. If so, it would be nice to hook something up for that.

Diff Detail

Event Timeline

tfiala updated this revision to Diff 72222.Sep 22 2016, 4:21 PM
tfiala retitled this revision from to add hook for calling platform-dependent pre-kill action on a timed out test.
tfiala updated this object.
tfiala added reviewers: clayborg, labath.
tfiala added a subscriber: lldb-commits.
tfiala added inline comments.

I suspect we will need to tweak this a bit. We need to be able to dispatch on more than just the host platform.system(). It may be sufficient to pass along the test platform info as an argument.

Greg also had the idea of having a fallback mechanism that uses a newly-spun-up lldb to attach to the to-be-killed process, and retrieves the threads and backdtraces, to dump out a compact description. That's nice in that it should work on any host that has a working lldb with python support.

clayborg accepted this revision.Sep 22 2016, 5:04 PM
clayborg edited edge metadata.
This revision is now accepted and ready to land.Sep 22 2016, 5:04 PM
tfiala updated this object.Sep 22 2016, 9:43 PM
tfiala edited edge metadata.
labath accepted this revision.Sep 23 2016, 1:45 AM
labath edited edge metadata.

Sounds like a useful thing to have. I've found turning on logging very helpful when looking for these issues, as it can tell you what was happening in the past, in addition to the current state (also it allows you to compare the logs from a successful and unsuccessful run).


Is this actually used anywhere?

tfiala added inline comments.Sep 23 2016, 7:52 AM

No - I originally was parsing some options out of it in the handler, but I no longer am doing that. I will take this out of the final. It's easy enough to add in later if we ever need it.

tfiala updated this revision to Diff 72294.Sep 23 2016, 9:07 AM
tfiala edited edge metadata.

I'm about to check this in. I just wanted to put up my final change, which includes the following:

  • - documents the new pre_kill_hook package in lldbsuite.
  • added a runner_context/context_dict argument to the pre-kill-hook logic. This dictionary will contain:
    • entry 'args': array containing configuration.args
    • entry 'platform_name': contains configuration.lldb_platform_name
    • entry 'platform_url': contains configuration.lldb_platform_url
    • entry 'platform_working_dir': contains configuration.lldb_platform_working_dir

I pass the dictionary in to decouple the pre_kill_hook package from the test runner configuration module. (Also, the configuration module logic is not guaranteed to be run on any of those test queue workers, which may be in separate processes when using the multiprocessing* test runner strategies).

tfiala closed this revision.Sep 23 2016, 9:19 AM

Closed by r282258