This is an archive of the discontinued LLVM Phabricator instance.

add hook for calling platform-dependent pre-kill action on a timed out test
ClosedPublic

Authored by tfiala on Sep 22 2016, 4:21 PM.


Details

Reviewers

clayborg
labath

Summary

This change introduces the concept of a platform-specific, pre-kill-hook mechanism. If a platform defines the hook, then the hook gets called right after a timeout is detected in a test run, but before the process is killed.
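The ordering described above is a template-method hook: detect the timeout, let an overridable method inspect the still-live process, then kill it. Below is a minimal sketch of that shape. It assumes a heavily simplified driver, and `MiniProcessDriver`, `handle_timeout`, `kill`, and `SamplingDriver` are hypothetical names, not the real `process_control.ProcessDriver` API.

```python
class MiniProcessDriver(object):
    """Hypothetical, simplified stand-in for process_control.ProcessDriver."""

    def __init__(self, pid):
        self.pid = pid
        self.events = []  # records what happened, in order

    def on_timeout_pre_kill(self):
        """Overridable hook; the base implementation does nothing."""

    def kill(self):
        # A real driver would signal the child process here.
        self.events.append("kill")

    def handle_timeout(self):
        # Let subclasses inspect the still-running process first...
        try:
            self.on_timeout_pre_kill()
        except Exception as exc:
            # ...but a failing hook must never prevent the kill.
            self.events.append("hook failed: {}".format(exc))
        self.kill()


class SamplingDriver(MiniProcessDriver):
    """Subclass that uses the hook, as the concurrent test driver does."""

    def on_timeout_pre_kill(self):
        self.events.append("pre-kill inspection of pid {}".format(self.pid))
```

Because the base hook is a no-op, drivers that do not override it behave exactly as before; only the subclass gains the extra step.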

The pre-kill-hook mechanism works as follows:

When a timeout is detected in the process_control.ProcessDriver class that runs the per-test lldb process, a new overridable on_timeout_pre_kill() method is called on the ProcessDriver instance.

The concurrent test driver's derived ProcessDriver overrides this method. It checks whether a module named "lldbsuite.pre_kill_hook.{platform-system-name}" exists, where platform-system-name is replaced with platform.system().lower():
- If that module doesn't exist, the rest of the new behavior is skipped.
- If that module does exist, it is loaded, and the method "do_pre_kill(process_id, output_stream)" is called. If that method throws an exception, we log it and we ignore further processing of the pre-killed process.
- The process_id arg of the do_pre_kill function is the process id as returned by the ProcessDriver.pid property.
- The output_stream arg of the do_pre_kill function takes a file-like object. Any output collected while processing the process-to-be-killed should be written into that object. The current implementation uses a six.StringIO and then writes this output to {TestFilename}-{pid}.sample in the session directory.
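The lookup and output-naming steps in the list above can be sketched with the standard library alone. `find_pre_kill_hook` and `sample_path_for` are hypothetical helper names; in this change the equivalent logic lives inline in the concurrent test driver's `on_timeout_pre_kill()` override.

```python
import importlib
import os
import platform


def find_pre_kill_hook(package="lldbsuite.pre_kill_hook", system_name=None):
    """Return the hook module for the platform, or None if there is none.

    The module name is "{package}.{platform.system().lower()}", so on macOS
    ("Darwin") this resolves to lldbsuite.pre_kill_hook.darwin.
    """
    if system_name is None:
        system_name = platform.system()
    module_name = "{}.{}".format(package, system_name.lower())
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # No hook for this platform: skip the pre-kill step entirely.
        return None


def sample_path_for(session_dir, test_filename, pid):
    """Path for the hook's captured output: {TestFilename}-{pid}.sample."""
    return os.path.join(session_dir, "{}-{}.sample".format(test_filename, pid))
```

One caveat of catching `ImportError` this broadly: an existing hook module that itself fails to import will also be treated as "no hook for this platform".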

Platforms where platform.system() is "Darwin" will get a pre-kill action that runs the 'sample' program on the lldb that has timed out. That data will be collected on CI and analyzed to determine what is happening during timeouts. (This has two advantages over a core dump: it is much smaller, and it clearly shows any activity still happening in the process.)

I will also hunt around on Linux to see if there might be something akin to 'sample' that might be available. If so, it would be nice to hook something up for that.
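As one possible starting point for that search, a hypothetical `linux.py` hook could avoid external tools entirely and dump what the kernel exposes under `/proc`: the process status plus each thread's scheduler state. This is only a sketch under that assumption. It does not produce call stacks the way `sample` does, and the `proc_root` parameter is an addition here purely for testability.

```python
import os


def do_pre_kill(process_id, runner_context, output_stream, proc_root="/proc"):
    """Hypothetical Linux pre-kill hook: dump /proc status and thread states.

    runner_context is accepted to match the hook signature but is unused here.
    """
    pid_dir = os.path.join(proc_root, str(process_id))

    # Overall process info (name, state, memory, thread count, ...).
    with open(os.path.join(pid_dir, "status")) as status_file:
        output_stream.write(status_file.read())

    # One line per thread with its single-letter state (R, S, D, ...).
    task_dir = os.path.join(pid_dir, "task")
    for tid in sorted(os.listdir(task_dir), key=int):
        with open(os.path.join(task_dir, tid, "stat")) as stat_file:
            stat = stat_file.read()
        # Format is "tid (comm) state ..."; comm may itself contain spaces
        # or parentheses, so locate the state after the *last* ')'.
        state = stat[stat.rindex(")") + 2]
        output_stream.write("thread {}: state {}\n".format(tid, state))
```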

Diff Detail

Event Timeline

tfiala updated this revision to Diff 72222.Sep 22 2016, 4:21 PM

tfiala retitled this revision to "add hook for calling platform-dependent pre-kill action on a timed out test".

tfiala updated this object.

tfiala added reviewers: clayborg, labath.

tfiala added a subscriber: lldb-commits.

tfiala added inline comments.

packages/Python/lldbsuite/test/dosep.py
239–246	I suspect we will need to tweak this a bit. We need to be able to dispatch on more than just the host platform.system(). It may be sufficient to pass along the test platform info as an argument.

Greg also had the idea of a fallback mechanism that uses a newly spun-up lldb to attach to the to-be-killed process and retrieve its threads and backtraces, dumping out a compact description. That's nice in that it should work on any host that has a working lldb with Python support.
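One way both ideas could fit together, sketched under assumptions: try a module named after the test platform from the runner context first, then the host `platform.system()`, then a generic fallback such as the lldb-attach approach. The function name, the `remote-linux` to `remote_linux` sanitizing, and the `generic` module name are all hypothetical; nothing like this exists in the diff.

```python
import platform


def candidate_hook_modules(runner_context, host_system=None,
                           package="lldbsuite.pre_kill_hook"):
    """Return hook module names to try, most specific first (hypothetical)."""
    if host_system is None:
        host_system = platform.system()
    names = []
    test_platform = (runner_context or {}).get("platform_name")
    if test_platform:
        # Platform names such as "remote-linux" are not valid module names.
        names.append(test_platform.lower().replace("-", "_"))
    names.append(host_system.lower())
    # Hypothetical fallback, e.g. attach with a fresh lldb and dump backtraces.
    names.append("generic")
    # Keep the order but drop duplicates (platform "Linux" on a Linux host).
    unique = []
    for name in names:
        if name not in unique:
            unique.append(name)
    return ["{}.{}".format(package, name) for name in unique]
```

The caller would import each candidate in turn and use the first one that exists.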

clayborg accepted this revision.Sep 22 2016, 5:04 PM

clayborg edited edge metadata.

This revision is now accepted and ready to land.Sep 22 2016, 5:04 PM

tfiala updated this object.Sep 22 2016, 9:43 PM

tfiala edited edge metadata.

Sounds like a useful thing to have. I've found turning on logging very helpful when looking for these issues, as it can tell you what was happening in the past, in addition to the current state (also it allows you to compare the logs from a successful and unsuccessful run).

packages/Python/lldbsuite/test/test_runner/process_control.py
513	Is this actually used anywhere?

tfiala added inline comments.Sep 23 2016, 7:52 AM

packages/Python/lldbsuite/test/test_runner/process_control.py
513	No - I originally was parsing some options out of it in the handler, but I no longer am doing that. I will take this out of the final. It's easy enough to add in later if we ever need it.

I'm about to check this in. I just wanted to put up my final change, which includes the following:

README.md - documents the new pre_kill_hook package in lldbsuite.

Added a runner_context/context_dict argument to the pre-kill-hook logic. This dictionary will contain:
- entry 'archs': array containing configuration.archs
- entry 'platform_name': contains configuration.lldb_platform_name
- entry 'platform_url': contains configuration.lldb_platform_url
- entry 'platform_working_dir': contains configuration.lldb_platform_working_dir

I pass the dictionary in to decouple the pre_kill_hook package from the test runner configuration module. (Also, the configuration module logic is not guaranteed to be run on any of those test queue workers, which may be in separate processes when using the multiprocessing* test runner strategies).
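A minimal sketch of building that dictionary, using the key names from darwin.py's docstring. A `types.SimpleNamespace` stands in for the real `configuration` module, and `build_runner_context` is a hypothetical name. Because the result holds only plain values, it pickles cleanly, which is what lets it reach worker processes under the multiprocessing strategies without those workers importing `configuration`.

```python
import pickle
import types


def build_runner_context(config):
    """Copy the platform-related settings into a plain, picklable dict."""
    return {
        "archs": list(config.archs),
        "platform_name": config.lldb_platform_name,
        "platform_url": config.lldb_platform_url,
        "platform_working_dir": config.lldb_platform_working_dir,
    }


# Stand-in for the test suite's configuration module.
fake_config = types.SimpleNamespace(
    archs=["x86_64"],
    lldb_platform_name=None,
    lldb_platform_url=None,
    lldb_platform_working_dir=None,
)

context = build_runner_context(fake_config)
# Round-trip through pickle, as happens when handing work to another process.
restored = pickle.loads(pickle.dumps(context))
```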

Closed by r282258

Revision Contents

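Path                                                             Size
packages/Python/lldbsuite/pre_kill_hook/README.md                55 lines
packages/Python/lldbsuite/pre_kill_hook/__init__.py              1 line
packages/Python/lldbsuite/pre_kill_hook/darwin.py                46 lines
packages/Python/lldbsuite/pre_kill_hook/tests/__init__.py
packages/Python/lldbsuite/pre_kill_hook/tests/test_darwin.py     107 lines
packages/Python/lldbsuite/test/dosep.py                          151 lines
packages/Python/lldbsuite/test/test_runner/process_control.py    18 lines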

Diff 72294

packages/Python/lldbsuite/pre_kill_hook/README.md

This file was added.

				# pre\_kill\_hook package

				## Overview

				The pre\_kill\_hook package provides a per-platform method for running code
				after a test process times out but before the concurrent test runner kills the
				timed-out process.

				## Detailed Description of Usage

				If a platform defines the hook, then the hook gets called right after a timeout
				is detected in a test run, but before the process is killed.

				The pre-kill-hook mechanism works as follows:

				* When a timeout is detected in the process_control.ProcessDriver class that
				runs the per-test lldb process, a new overridable on\_timeout\_pre\_kill() method
				is called on the ProcessDriver instance.

				* The concurrent test driver's derived ProcessDriver overrides this method. It
				checks whether a module named
				"lldbsuite.pre\_kill\_hook.{platform-system-name}" exists, where
				platform-system-name is replaced with platform.system().lower() (e.g.
				"Darwin" selects the darwin.py module).

				* If that module doesn't exist, the rest of the new behavior is skipped.

				* If that module does exist, it is loaded, and the method
				"do\_pre\_kill(process\_id, context\_dict, output\_stream)" is called. If
				that method throws an exception, we log it and we ignore further processing
				of the pre-killed process.

				* The process\_id argument of the do\_pre\_kill function is the process id as
				returned by the ProcessDriver.pid property.

				* The output\_stream argument of the do\_pre\_kill function takes a file-like
				object. Output to be collected from doing any processing on the
				process-to-be-killed should be written into the file-like object. The
				current implementation uses a six.StringIO and then writes this output to
				{TestFilename}-{pid}.sample in the session directory.

				* Platforms where platform.system() is "Darwin" will get a pre-kill action that
				runs the 'sample' program on the lldb that has timed out. That data will be
				collected on CI and analyzed to determine what is happening during timeouts.
				(This has two advantages over a core dump: it is much smaller, and it
				clearly shows any activity still happening in the process.)

				## Running the tests

				To run the tests in the pre\_kill\_hook package, open a console, change into
				this directory and run the following:

				```
				python -m unittest discover
				```

packages/Python/lldbsuite/pre_kill_hook/__init__.py

This file was added.

"""Initialize the package."""

packages/Python/lldbsuite/pre_kill_hook/darwin.py

This file was added.

"""Provides a pre-kill method to run on macOS."""
from __future__ import print_function

# system imports
import subprocess
import sys

# third-party module imports
import six


def do_pre_kill(process_id, runner_context, output_stream, sample_time=3):
    """Samples the given process id, and puts the output to output_stream.

    @param process_id the local process to sample.

    @param runner_context a dictionary of details about the architectures
        and platform on which the given process is running. Expected keys are
        archs (array of architectures), platform_name, platform_url, and
        platform_working_dir.

    @param output_stream file-like object that should be used to write the
        results of sampling.

    @param sample_time specifies the time in seconds that should be captured.
    """

    # Validate args.
    if runner_context is None:
        raise Exception("runner_context argument is required")
    if not isinstance(runner_context, dict):
        raise Exception("runner_context argument must be a dictionary")

    # We will try to run sample on the local host only if there is no URL
    # to a remote.
    if "platform_url" in runner_context and (
            runner_context["platform_url"] is not None):
        import pprint
        sys.stderr.write(
            "warning: skipping timeout pre-kill sample invocation because we "
            "don't know how to run on a remote yet. runner_context={}\n"
            .format(pprint.pformat(runner_context)))
        # Actually skip, as the warning says; don't sample a local pid.
        return

    # universal_newlines=True makes check_output return text rather than
    # bytes on Python 3, so the result can be written to a six.StringIO.
    output = subprocess.check_output(['sample', six.text_type(process_id),
                                      str(sample_time)],
                                     universal_newlines=True)
    output_stream.write(output)

packages/Python/lldbsuite/pre_kill_hook/tests/__init__.py

This file was added.

This is an empty file.

packages/Python/lldbsuite/pre_kill_hook/tests/test_darwin.py

This file was added.

"""Test the pre-kill hook on Darwin."""
from __future__ import print_function

# system imports
from multiprocessing import Process, Queue
import platform
import re
from unittest import main, TestCase

# third party
from six import StringIO


def do_child_process(child_work_queue, parent_work_queue, verbose):
    import os

    pid = os.getpid()
    if verbose:
        print("child: pid {} started, sending to parent".format(pid))
    parent_work_queue.put(pid)
    if verbose:
        print("child: waiting for shut-down request from parent")
    child_work_queue.get()
    if verbose:
        print("child: received shut-down request. Child exiting.")


class DarwinPreKillTestCase(TestCase):

    def __init__(self, methodName):
        super(DarwinPreKillTestCase, self).__init__(methodName)
        self.process = None
        self.child_work_queue = None
        self.verbose = False

    def tearDown(self):
        if self.verbose:
            print("parent: sending shut-down request to child")
        if self.process:
            self.child_work_queue.put("hello, child")
            self.process.join()
        if self.verbose:
            print("parent: child is fully shut down")

    def test_sample(self):
        # Ensure we're Darwin.
        if platform.system() != 'Darwin':
            self.skipTest("requires a Darwin-based OS")

        # Start the child process.
        self.child_work_queue = Queue()
        parent_work_queue = Queue()
        self.process = Process(target=do_child_process,
                               args=(self.child_work_queue, parent_work_queue,
                                     self.verbose))
        if self.verbose:
            print("parent: starting child")
        self.process.start()

        # Wait for the child to report its pid. Then we know we're running.
        if self.verbose:
            print("parent: waiting for child to start")
        child_pid = parent_work_queue.get()

        # Sample the child process.
        from darwin import do_pre_kill
        context_dict = {
            "archs": [platform.machine()],
            "platform_name": None,
            "platform_url": None,
            "platform_working_dir": None
        }

        if self.verbose:
            print("parent: running pre-kill action on child")
        output_io = StringIO()
        do_pre_kill(child_pid, context_dict, output_io)
        output = output_io.getvalue()

        if self.verbose:
            print("parent: do_pre_kill() wrote the following output:", output)
        self.assertIsNotNone(output)

        # We should have a line with:
        # Process: .* [{pid}]
        process_re = re.compile(r"Process:[^[]+\[([^]]+)\]")
        match = process_re.search(output)
        self.assertIsNotNone(match, "should have found process id for "
                             "sampled process")
        self.assertEqual(1, len(match.groups()))
        self.assertEqual(child_pid, int(match.group(1)))

        # We should see a Call graph: section.
        callgraph_re = re.compile(r"Call graph:")
        match = callgraph_re.search(output)
        self.assertIsNotNone(match, "should have found the Call graph section "
                             "in sample output")

        # We should see a Binary Images: section.
        binary_images_re = re.compile(r"Binary Images:")
        match = binary_images_re.search(output)
        self.assertIsNotNone(match, "should have found the Binary Images "
                             "section in sample output")


if __name__ == "__main__":
    main()
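The output checks in test_sample can be exercised without a Darwin host by running the same regular expression and section checks against canned text. The report below is a hand-written placeholder with only the three markers the test looks for, not real `sample` output, and `check_sample_output` is a hypothetical helper.

```python
import re

# Same pattern as test_sample: "Process:" ... "[<pid>]".
PROCESS_RE = re.compile(r"Process:[^[]+\[([^]]+)\]")


def check_sample_output(output, expected_pid):
    """Return True if output looks like a sample report for expected_pid."""
    match = PROCESS_RE.search(output)
    if match is None or int(match.group(1)) != expected_pid:
        return False
    return "Call graph:" in output and "Binary Images:" in output


# Placeholder text, not captured from the real tool.
FAKE_REPORT = """\
Process:         python [4242]

Call graph:
    ...call tree...

Binary Images:
    ...image list...
"""
```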

packages/Python/lldbsuite/test/dosep.py

[40 unchanged lines not shown]
import multiprocessing.pool		import multiprocessing.pool
import os		import os
import platform		import platform
import re		import re
import signal		import signal
import sys		import sys
import threading		import threading

		from six import StringIO
from six.moves import queue		from six.moves import queue

# Our packages and modules		# Our packages and modules
import lldbsuite		import lldbsuite
import lldbsuite.support.seven as seven		import lldbsuite.support.seven as seven

from . import configuration		from . import configuration
from . import dotest_args		from . import dotest_args
from lldbsuite.support import optional_with		from lldbsuite.support import optional_with
from lldbsuite.test_event import dotest_channels		from lldbsuite.test_event import dotest_channels
from lldbsuite.test_event.event_builder import EventBuilder		from lldbsuite.test_event.event_builder import EventBuilder
from lldbsuite.test_event import formatter		from lldbsuite.test_event import formatter

from .test_runner import process_control		from .test_runner import process_control

# Status codes for running command with timeout.		# Status codes for running command with timeout.
eTimedOut, ePassed, eFailed = 124, 0, 1		eTimedOut, ePassed, eFailed = 124, 0, 1

		g_session_dir = None
		g_runner_context = None
output_lock = None		output_lock = None
test_counter = None		test_counter = None
total_tests = None		total_tests = None
test_name_len = None		test_name_len = None
dotest_options = None		dotest_options = None
RESULTS_FORMATTER = None		RESULTS_FORMATTER = None
RUNNER_PROCESS_ASYNC_MAP = None		RUNNER_PROCESS_ASYNC_MAP = None
RESULTS_LISTENER_CHANNEL = None		RESULTS_LISTENER_CHANNEL = None
[147 unchanged lines not shown; context: def on_process_exited(self, command, output, was_timeout, exit_status):]
# Save off the results for the caller.		# Save off the results for the caller.
self.results = (		self.results = (
self.file_name,		self.file_name,
exit_status,		exit_status,
passes,		passes,
failures,		failures,
unexpected_successes)		unexpected_successes)

		def on_timeout_pre_kill(self):
		# We're just about to have a timeout take effect. Here's our chance
		# to do a pre-kill action.

		# For now, we look to see if the lldbsuite.pre_kill_hook package has a
		# module for our platform.
		module_name = "lldbsuite.pre_kill_hook." + platform.system().lower()
		import importlib
		try:
		module = importlib.import_module(module_name)
		except ImportError:
		# We don't have one for this platform. Skip.
		sys.stderr.write("\nwarning: no timeout handler module: " +
		module_name + "\n")
		return

		# Try to run the pre-kill-hook method.
		try:
		# Run the pre-kill command.
		output_io = StringIO()
		module.do_pre_kill(self.pid, g_runner_context, output_io)

		# Write the output to a filename associated with the test file and
		# pid.
		basename = "{}-{}.sample".format(self.file_name, self.pid)
		sample_path = os.path.join(g_session_dir, basename)
		with open(sample_path, "w") as output_file:
		output_file.write(output_io.getvalue())
		except Exception as e:
		sys.stderr.write("caught exception while running "
		"pre-kill action: {}\n".format(e))
		return

def is_exceptional_exit(self):		def is_exceptional_exit(self):
"""Returns whether the process returned a timeout.		"""Returns whether the process returned a timeout.

Not valid to call until after on_process_exited() completes.		Not valid to call until after on_process_exited() completes.

@return True if the exit is an exceptional exit (e.g. signal on		@return True if the exit is an exceptional exit (e.g. signal on
POSIX); False otherwise.		POSIX); False otherwise.
"""		"""
[392 unchanged lines not shown; context: for root, _, files in os.walk(dir_root, topdown=False):]
tests = [		tests = [
(filename, os.path.join(root, filename))		(filename, os.path.join(root, filename))
for filename in files		for filename in files
if is_test_filename(root, filename)]		if is_test_filename(root, filename)]
if tests:		if tests:
found_func(root, tests)		found_func(root, tests)


def initialize_global_vars_common(num_threads, test_work_items):		def initialize_global_vars_common(num_threads, test_work_items, session_dir,
global total_tests, test_counter, test_name_len		runner_context):
		global g_session_dir, g_runner_context, total_tests, test_counter
		global test_name_len

total_tests = sum([len(item[1]) for item in test_work_items])		total_tests = sum([len(item[1]) for item in test_work_items])
test_counter = multiprocessing.Value('i', 0)		test_counter = multiprocessing.Value('i', 0)
test_name_len = multiprocessing.Value('i', 0)		test_name_len = multiprocessing.Value('i', 0)
		g_session_dir = session_dir
		g_runner_context = runner_context
if not (RESULTS_FORMATTER and RESULTS_FORMATTER.is_using_terminal()):		if not (RESULTS_FORMATTER and RESULTS_FORMATTER.is_using_terminal()):
print(		print(
"Testing: %d test suites, %d thread%s" %		"Testing: %d test suites, %d thread%s" %
(total_tests,		(total_tests,
num_threads,		num_threads,
(num_threads > 1) *		(num_threads > 1) *
"s"),		"s"),
file=sys.stderr)		file=sys.stderr)
update_progress()		update_progress()


def initialize_global_vars_multiprocessing(num_threads, test_work_items):		def initialize_global_vars_multiprocessing(num_threads, test_work_items,
		session_dir, runner_context):
# Initialize the global state we'll use to communicate with the		# Initialize the global state we'll use to communicate with the
# rest of the flat module.		# rest of the flat module.
global output_lock		global output_lock
output_lock = multiprocessing.RLock()		output_lock = multiprocessing.RLock()

initialize_global_vars_common(num_threads, test_work_items)		initialize_global_vars_common(num_threads, test_work_items, session_dir,
		runner_context)


def initialize_global_vars_threading(num_threads, test_work_items):		def initialize_global_vars_threading(num_threads, test_work_items, session_dir,
		runner_context):
"""Initializes global variables used in threading mode.		"""Initializes global variables used in threading mode.

@param num_threads specifies the number of workers used.		@param num_threads specifies the number of workers used.

@param test_work_items specifies all the work items		@param test_work_items specifies all the work items
that will be processed.		that will be processed.

		@param session_dir the session directory where test-run-specific files are
		written.

		@param runner_context a dictionary of platform-related data that is passed
		to the timeout pre-kill hook.
"""		"""
# Initialize the global state we'll use to communicate with the		# Initialize the global state we'll use to communicate with the
# rest of the flat module.		# rest of the flat module.
global output_lock		global output_lock
output_lock = threading.RLock()		output_lock = threading.RLock()

index_lock = threading.RLock()		index_lock = threading.RLock()
index_map = {}		index_map = {}

def get_worker_index_threading():		def get_worker_index_threading():
"""Returns a 0-based, thread-unique index for the worker thread."""		"""Returns a 0-based, thread-unique index for the worker thread."""
thread_id = threading.current_thread().ident		thread_id = threading.current_thread().ident
with index_lock:		with index_lock:
if thread_id not in index_map:		if thread_id not in index_map:
index_map[thread_id] = len(index_map)		index_map[thread_id] = len(index_map)
return index_map[thread_id]		return index_map[thread_id]

global GET_WORKER_INDEX		global GET_WORKER_INDEX
GET_WORKER_INDEX = get_worker_index_threading		GET_WORKER_INDEX = get_worker_index_threading

initialize_global_vars_common(num_threads, test_work_items)		initialize_global_vars_common(num_threads, test_work_items, session_dir,
		runner_context)


def ctrl_c_loop(main_op_func, done_func, ctrl_c_handler):		def ctrl_c_loop(main_op_func, done_func, ctrl_c_handler):
"""Provides a main loop that is Ctrl-C protected.		"""Provides a main loop that is Ctrl-C protected.

The main loop calls the main_op_func() repeatedly until done_func()		The main loop calls the main_op_func() repeatedly until done_func()
returns true. The ctrl_c_handler() method is called with a single		returns true. The ctrl_c_handler() method is called with a single
int parameter that contains the number of times the ctrl_c has been		int parameter that contains the number of times the ctrl_c has been
[130 unchanged lines not shown; context: if workers is not None and len(workers) > 0:]
# We're not done if we still have workers left.		# We're not done if we still have workers left.
return False		return False
if async_map is not None and len(async_map) > 0:		if async_map is not None and len(async_map) > 0:
return False		return False
# We're done.		# We're done.
return True		return True


def multiprocessing_test_runner(num_threads, test_work_items):		def multiprocessing_test_runner(num_threads, test_work_items, session_dir,
		runner_context):
"""Provides hand-wrapped pooling test runner adapter with Ctrl-C support.		"""Provides hand-wrapped pooling test runner adapter with Ctrl-C support.

This concurrent test runner is based on the multiprocessing		This concurrent test runner is based on the multiprocessing
library, and rolls its own worker pooling strategy so it		library, and rolls its own worker pooling strategy so it
can handle Ctrl-C properly.		can handle Ctrl-C properly.

This test runner is known to have an issue running on		This test runner is known to have an issue running on
Windows platforms.		Windows platforms.

@param num_threads the number of worker processes to use.		@param num_threads the number of worker processes to use.

@param test_work_items the iterable of test work item tuples		@param test_work_items the iterable of test work item tuples
to run.		to run.

		@param session_dir the session directory where test-run-specific files are
		written.

		@param runner_context a dictionary of platform-related data that is passed
		to the timeout pre-kill hook.
"""		"""

# Initialize our global state.		# Initialize our global state.
initialize_global_vars_multiprocessing(num_threads, test_work_items)		initialize_global_vars_multiprocessing(num_threads, test_work_items,
		session_dir, runner_context)

# Create jobs.		# Create jobs.
job_queue = multiprocessing.Queue(len(test_work_items))		job_queue = multiprocessing.Queue(len(test_work_items))
for test_work_item in test_work_items:		for test_work_item in test_work_items:
job_queue.put(test_work_item)		job_queue.put(test_work_item)

result_queue = multiprocessing.Queue(len(test_work_items))		result_queue = multiprocessing.Queue(len(test_work_items))

[88 unchanged lines not shown; context: while not done:]
# is complete.		# is complete.
if len(channel_map) > 0:		if len(channel_map) > 0:
# We still have an asyncore channel running. Not done yet.		# We still have an asyncore channel running. Not done yet.
done = False		done = False

return map_results		return map_results


def multiprocessing_test_runner_pool(num_threads, test_work_items):		def multiprocessing_test_runner_pool(num_threads, test_work_items, session_dir,
		runner_context):
# Initialize our global state.		# Initialize our global state.
initialize_global_vars_multiprocessing(num_threads, test_work_items)		initialize_global_vars_multiprocessing(num_threads, test_work_items,
		session_dir, runner_context)

manager = multiprocessing.Manager()		manager = multiprocessing.Manager()
worker_index_map = manager.dict()		worker_index_map = manager.dict()

pool = multiprocessing.Pool(		pool = multiprocessing.Pool(
num_threads,		num_threads,
initializer=setup_global_variables,		initializer=setup_global_variables,
initargs=(output_lock, test_counter, total_tests, test_name_len,		initargs=(output_lock, test_counter, total_tests, test_name_len,
dotest_options, worker_index_map))		dotest_options, worker_index_map))

# Start the map operation (async mode).		# Start the map operation (async mode).
map_future = pool.map_async(		map_future = pool.map_async(
process_dir_worker_multiprocessing_pool, test_work_items)		process_dir_worker_multiprocessing_pool, test_work_items)
return map_async_run_loop(		return map_async_run_loop(
map_future, RUNNER_PROCESS_ASYNC_MAP, RESULTS_LISTENER_CHANNEL)		map_future, RUNNER_PROCESS_ASYNC_MAP, RESULTS_LISTENER_CHANNEL)


def threading_test_runner(num_threads, test_work_items):		def threading_test_runner(num_threads, test_work_items, session_dir,
		runner_context):
"""Provides hand-wrapped pooling threading-based test runner adapter		"""Provides hand-wrapped pooling threading-based test runner adapter
with Ctrl-C support.		with Ctrl-C support.

This concurrent test runner is based on the threading		This concurrent test runner is based on the threading
library, and rolls its own worker pooling strategy so it		library, and rolls its own worker pooling strategy so it
can handle Ctrl-C properly.		can handle Ctrl-C properly.

@param num_threads the number of worker processes to use.		@param num_threads the number of worker processes to use.

@param test_work_items the iterable of test work item tuples		@param test_work_items the iterable of test work item tuples
to run.		to run.

		@param session_dir the session directory where test-run-specific files are
		written.

		@param runner_context a dictionary of platform-related data that is passed
		to the timeout pre-kill hook.
"""		"""

# Initialize our global state.		# Initialize our global state.
initialize_global_vars_threading(num_threads, test_work_items)		initialize_global_vars_threading(num_threads, test_work_items, session_dir,
		runner_context)

# Create jobs.		# Create jobs.
job_queue = queue.Queue()		job_queue = queue.Queue()
for test_work_item in test_work_items:		for test_work_item in test_work_items:
job_queue.put(test_work_item)		job_queue.put(test_work_item)

result_queue = queue.Queue()		result_queue = queue.Queue()

[31 unchanged lines not shown]

# Reap the test results.		# Reap the test results.
test_results = []		test_results = []
while not result_queue.empty():		while not result_queue.empty():
test_results.append(result_queue.get(block=False))		test_results.append(result_queue.get(block=False))
return test_results		return test_results


def threading_test_runner_pool(num_threads, test_work_items):		def threading_test_runner_pool(num_threads, test_work_items, session_dir,
		runner_context):
# Initialize our global state.		# Initialize our global state.
initialize_global_vars_threading(num_threads, test_work_items)		initialize_global_vars_threading(num_threads, test_work_items, session_dir,
		runner_context)

pool = multiprocessing.pool.ThreadPool(num_threads)		pool = multiprocessing.pool.ThreadPool(num_threads)
map_future = pool.map_async(		map_future = pool.map_async(
process_dir_worker_threading_pool, test_work_items)		process_dir_worker_threading_pool, test_work_items)

return map_async_run_loop(		return map_async_run_loop(
map_future, RUNNER_PROCESS_ASYNC_MAP, RESULTS_LISTENER_CHANNEL)		map_future, RUNNER_PROCESS_ASYNC_MAP, RESULTS_LISTENER_CHANNEL)


def asyncore_run_loop(channel_map):		def asyncore_run_loop(channel_map):
try:		try:
asyncore.loop(None, False, channel_map)		asyncore.loop(None, False, channel_map)
except:		except:
# Swallow it, we're seeing:		# Swallow it, we're seeing:
# error: (9, 'Bad file descriptor')		# error: (9, 'Bad file descriptor')
# when the listener channel is closed. Shouldn't be the case.		# when the listener channel is closed. Shouldn't be the case.
pass		pass


def inprocess_exec_test_runner(test_work_items):		def inprocess_exec_test_runner(test_work_items, session_dir, runner_context):
# Initialize our global state.		# Initialize our global state.
initialize_global_vars_multiprocessing(1, test_work_items)		initialize_global_vars_multiprocessing(1, test_work_items, session_dir,
		runner_context)

# We're always worker index 0		# We're always worker index 0
global GET_WORKER_INDEX		global GET_WORKER_INDEX
GET_WORKER_INDEX = lambda: 0		GET_WORKER_INDEX = lambda: 0

# Run the listener and related channel maps in a separate thread.		# Run the listener and related channel maps in a separate thread.
# global RUNNER_PROCESS_ASYNC_MAP		# global RUNNER_PROCESS_ASYNC_MAP
global RESULTS_LISTENER_CHANNEL		global RESULTS_LISTENER_CHANNEL
[126 unchanged lines not shown; context: def find(pattern, path):]
result = []		result = []
for root, dirs, files in os.walk(path):		for root, dirs, files in os.walk(path):
for name in files:		for name in files:
if fnmatch.fnmatch(name, pattern):		if fnmatch.fnmatch(name, pattern):
result.append(os.path.join(root, name))		result.append(os.path.join(root, name))
return result		return result


def get_test_runner_strategies(num_threads):		def get_test_runner_strategies(num_threads, session_dir, runner_context):
"""Returns the test runner strategies by name in a dictionary.		"""Returns the test runner strategies by name in a dictionary.

@param num_threads specifies the number of threads/processes		@param num_threads specifies the number of threads/processes
that will be used for concurrent test runners.		that will be used for concurrent test runners.

		@param session_dir specifies the session dir to use for
		auxiliary files.

		@param runner_context a dictionary of details on the architectures and
		platform used to run the test suite. This is passed along verbatim to
		the timeout pre-kill handler, allowing that decoupled component to do
		process inspection in a platform-specific way.

@return dictionary with key as test runner strategy name and		@return dictionary with key as test runner strategy name and
value set to a callable object that takes the test work item		value set to a callable object that takes the test work item
and returns a test result tuple.		and returns a test result tuple.
"""		"""
return {		return {
# multiprocessing supports ctrl-c and does not use		# multiprocessing supports ctrl-c and does not use
# multiprocessing.Pool.		# multiprocessing.Pool.
"multiprocessing":		"multiprocessing":
(lambda work_items: multiprocessing_test_runner(		(lambda work_items: multiprocessing_test_runner(
num_threads, work_items)),		num_threads, work_items, session_dir, runner_context)),

# multiprocessing-pool uses multiprocessing.Pool but		# multiprocessing-pool uses multiprocessing.Pool but
# does not support Ctrl-C.		# does not support Ctrl-C.
"multiprocessing-pool":		"multiprocessing-pool":
(lambda work_items: multiprocessing_test_runner_pool(		(lambda work_items: multiprocessing_test_runner_pool(
num_threads, work_items)),		num_threads, work_items, session_dir, runner_context)),

# threading uses a hand-rolled worker pool much		# threading uses a hand-rolled worker pool much
# like multiprocessing, but instead uses in-process		# like multiprocessing, but instead uses in-process
# worker threads. This one supports Ctrl-C.		# worker threads. This one supports Ctrl-C.
"threading":		"threading":
(lambda work_items: threading_test_runner(num_threads, work_items)),		(lambda work_items: threading_test_runner(
		num_threads, work_items, session_dir, runner_context)),

# threading-pool uses threading for the workers (in-process)		# threading-pool uses threading for the workers (in-process)
# and uses the multiprocessing.pool thread-enabled pool.		# and uses the multiprocessing.pool thread-enabled pool.
# This does not properly support Ctrl-C.		# This does not properly support Ctrl-C.
"threading-pool":		"threading-pool":
(lambda work_items: threading_test_runner_pool(		(lambda work_items: threading_test_runner_pool(
num_threads, work_items)),		num_threads, work_items, session_dir, runner_context)),

# serial uses the subprocess-based, single process		# serial uses the subprocess-based, single process
# test runner. This provides process isolation but		# test runner. This provides process isolation but
# no concurrent test execution.		# no concurrent test execution.
"serial":		"serial":
inprocess_exec_test_runner		(lambda work_items: inprocess_exec_test_runner(
		work_items, session_dir, runner_context))
}		}
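Each entry in the strategy table is a closure over `work_items`. `session_dir` and `runner_context` are captured when `get_test_runner_strategies()` runs, so callers that only hold a strategy name never need to pass them along. A minimal, self-contained sketch of the same pattern follows. `fake_runner()` and the tuple it returns are hypothetical stand-ins for the real runner functions.

```python
def fake_runner(num_threads, work_items, session_dir, runner_context):
    """Hypothetical stand-in for a real runner such as threading_test_runner."""
    # Return a tuple so the caller can see which arguments reached the runner.
    return (num_threads, len(work_items), session_dir,
            runner_context["platform_name"])


def get_strategies(num_threads, session_dir, runner_context):
    """Build a name -> callable table; each lambda closes over shared args."""
    return {
        "threading": (lambda work_items: fake_runner(
            num_threads, work_items, session_dir, runner_context)),
        # The serial runner takes no thread count, so it gets its own wrapper.
        "serial": (lambda work_items: fake_runner(
            1, work_items, session_dir, runner_context)),
    }


strategies = get_strategies(4, "/tmp/session", {"platform_name": "host"})
print(strategies["threading"](["a.py", "b.py"]))  # (4, 2, '/tmp/session', 'host')
print(strategies["serial"](["a.py"]))             # (1, 1, '/tmp/session', 'host')
```

Wrapping `inprocess_exec_test_runner` in a lambda, as the new `"serial"` entry does, gives every entry the same one-argument call shape.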


def _remove_option(		def _remove_option(
args, long_option_name, short_option_name, takes_arg):		args, long_option_name, short_option_name, takes_arg):
"""Removes option and related option arguments from args array.		"""Removes option and related option arguments from args array.

This method removes all short/long options that match the given		This method removes all short/long options that match the given
... 163 unchanged lines not shown; context: def default_test_runner_name(num_threads):
else:		else:
# For everyone else, use the ctrl-c-enabled threading support.		# For everyone else, use the ctrl-c-enabled threading support.
# Should use fewer system resources than the multiprocessing		# Should use fewer system resources than the multiprocessing
# variant.		# variant.
test_runner_name = "threading"		test_runner_name = "threading"
return test_runner_name		return test_runner_name


def rerun_tests(test_subdir, tests_for_rerun, dotest_argv):		def rerun_tests(test_subdir, tests_for_rerun, dotest_argv, session_dir,
		runner_context):
# Build the list of test files to rerun. Some future time we'll		# Build the list of test files to rerun. Some future time we'll
# enable re-run by test method so we can constrain the rerun set		# enable re-run by test method so we can constrain the rerun set
# to just the method(s) that were issued within a file.		# to just the method(s) that were issued within a file.

# Sort rerun files into subdirectories.		# Sort rerun files into subdirectories.
print("\nRerunning the following files:")		print("\nRerunning the following files:")
rerun_files_by_subdir = {}		rerun_files_by_subdir = {}
for test_filename in tests_for_rerun.keys():		for test_filename in tests_for_rerun.keys():
... 23 unchanged lines not shown; context: def rerun_tests(test_subdir, tests_for_rerun, dotest_argv, session_dir,
# Do not update legacy counts, I am getting rid of		# Do not update legacy counts, I am getting rid of
# them so no point adding complicated merge logic here.		# them so no point adding complicated merge logic here.
rerun_thread_count = 1		rerun_thread_count = 1
# Force the parallel test runner to choose a multi-worker strategy.		# Force the parallel test runner to choose a multi-worker strategy.
rerun_runner_name = default_test_runner_name(rerun_thread_count + 1)		rerun_runner_name = default_test_runner_name(rerun_thread_count + 1)
print("rerun will use the '{}' test runner strategy".format(		print("rerun will use the '{}' test runner strategy".format(
rerun_runner_name))		rerun_runner_name))

runner_strategies_by_name = get_test_runner_strategies(rerun_thread_count)		runner_strategies_by_name = get_test_runner_strategies(
		rerun_thread_count, session_dir, runner_context)
rerun_runner_func = runner_strategies_by_name[		rerun_runner_func = runner_strategies_by_name[
rerun_runner_name]		rerun_runner_name]
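One detail in this lookup: indexing a dict with `[]` raises `KeyError` for a missing key, so the `is None` check on the next lines only fires if a strategy was explicitly registered as `None`. Because the name comes from `default_test_runner_name()`, a miss is unlikely in practice. If the intended error message matters, a lookup built on `dict.get()` covers both cases. `lookup_runner()` below is a hypothetical helper, not part of the patch.

```python
def lookup_runner(strategies, name):
    """Return the runner callable for name, or raise with a readable message."""
    # dict.get() returns None for a missing key instead of raising KeyError,
    # so one check covers both "not registered" and "registered as None".
    runner = strategies.get(name)
    if runner is None:
        raise Exception(
            "failed to find rerun test runner function named '{}'".format(name))
    return runner


strategies = {"threading": lambda work_items: len(work_items)}
print(lookup_runner(strategies, "threading")(["a.py"]))  # 1
try:
    lookup_runner(strategies, "bogus")
except Exception as err:
    print(err)  # failed to find rerun test runner function named 'bogus'
```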
if rerun_runner_func is None:		if rerun_runner_func is None:
raise Exception(		raise Exception(
"failed to find rerun test runner "		"failed to find rerun test runner "
"function named '{}'".format(rerun_runner_name))		"function named '{}'".format(rerun_runner_name))

walk_and_invoke(		walk_and_invoke(
... 64 unchanged lines not shown; context: def main(num_threads, test_subdir, test_runner_name, results_formatter):
system_info = " ".join(platform.uname())		system_info = " ".join(platform.uname())

# Figure out which test files should be enabled for expected		# Figure out which test files should be enabled for expected
# timeout		# timeout
expected_timeout = getExpectedTimeouts(dotest_options.lldb_platform_name)		expected_timeout = getExpectedTimeouts(dotest_options.lldb_platform_name)
if results_formatter is not None:		if results_formatter is not None:
results_formatter.set_expected_timeouts_by_basename(expected_timeout)		results_formatter.set_expected_timeouts_by_basename(expected_timeout)

		# Set up the test runner context. This is a dictionary of information that
		# will be passed along to the timeout pre-kill handler and allows for loose
		# coupling of its implementation.
		runner_context = {
		"archs": configuration.archs,
		"platform_name": configuration.lldb_platform_name,
		"platform_url": configuration.lldb_platform_url,
		"platform_working_dir": configuration.lldb_platform_working_dir,
		}
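Because the handler receives `runner_context` as a plain dictionary, it can read only the keys it understands and fall back gracefully when one is missing or `None`. That tolerance is what keeps the two sides loosely coupled. A minimal sketch follows, using a hypothetical `describe_target()` helper inside a handler. Treating a non-empty `platform_url` as a remote platform is an assumption for illustration, not something the patch states.

```python
def describe_target(runner_context):
    """Summarize where the inferior runs, tolerating missing or None entries."""
    # .get() keeps the handler working if the driver adds or drops keys later.
    archs = runner_context.get("archs") or []
    platform_name = runner_context.get("platform_name") or "host"
    platform_url = runner_context.get("platform_url")
    # Assumption for this sketch: a non-empty URL means a remote platform.
    location = "remote ({})".format(platform_url) if platform_url else "local"
    return "{} {} [{}]".format(platform_name, location, ", ".join(archs))


context = {
    "archs": ["x86_64"],
    "platform_name": None,
    "platform_url": None,
    "platform_working_dir": None,
}
print(describe_target(context))  # host local [x86_64]
print(describe_target({"platform_name": "remote-linux",
                       "platform_url": "connect://10.0.0.2:1234"}))
```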

# Figure out which testrunner strategy we'll use.		# Figure out which testrunner strategy we'll use.
runner_strategies_by_name = get_test_runner_strategies(num_threads)		runner_strategies_by_name = get_test_runner_strategies(
		num_threads, session_dir, runner_context)

# If the user didn't specify a test runner strategy, determine		# If the user didn't specify a test runner strategy, determine
# the default now based on number of threads and OS type.		# the default now based on number of threads and OS type.
if not test_runner_name:		if not test_runner_name:
test_runner_name = default_test_runner_name(num_threads)		test_runner_name = default_test_runner_name(num_threads)

if test_runner_name not in runner_strategies_by_name:		if test_runner_name not in runner_strategies_by_name:
raise Exception(		raise Exception(
... 30 unchanged lines not shown; context: if results_formatter is not None:

# Check if the number of files exceeds the max cutoff. If so,		# Check if the number of files exceeds the max cutoff. If so,
# we skip the rerun step.		# we skip the rerun step.
if rerun_file_count > configuration.rerun_max_file_threshold:		if rerun_file_count > configuration.rerun_max_file_threshold:
print("Skipping rerun: max rerun file threshold ({}) "		print("Skipping rerun: max rerun file threshold ({}) "
"exceeded".format(		"exceeded".format(
configuration.rerun_max_file_threshold))		configuration.rerun_max_file_threshold))
else:		else:
rerun_tests(test_subdir, tests_for_rerun, dotest_argv)		rerun_tests(test_subdir, tests_for_rerun, dotest_argv,
		session_dir, runner_context)

# The results formatter - if present - is done now. Tell it to		# The results formatter - if present - is done now. Tell it to
# terminate.		# terminate.
if results_formatter is not None:		if results_formatter is not None:
results_formatter.send_terminate_as_needed()		results_formatter.send_terminate_as_needed()

timed_out = set(timed_out)		timed_out = set(timed_out)
num_test_files = len(passed) + len(failed)		num_test_files = len(passed) + len(failed)
... 84 unchanged lines not shown, to end of file
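The `runner_context` docstring says the dictionary is handed to a pre-kill handler that inspects the process in a platform-specific way. Per the review summary, the handler is looked up as a module named after `platform.system().lower()` under `lldbsuite.pre_kill_hook`, and a missing module simply means no pre-kill action. Here is a sketch of that lookup with `importlib`. The package name is a parameter, `load_pre_kill_hook()` is hypothetical, and the three-argument `do_pre_kill` signature is an assumption made for this example.

```python
import importlib
import io
import platform
import sys
import types


def load_pre_kill_hook(package, system_name=None):
    """Return the hook module package.<system_name>, or None if there is none."""
    if system_name is None:
        # platform.system() gives e.g. "Darwin", "Linux", "Windows".
        system_name = platform.system().lower()
    try:
        return importlib.import_module("{}.{}".format(package, system_name))
    except ImportError:
        # No hook for this platform: the caller skips the pre-kill step.
        return None


# Register an in-memory fake hook package so the sketch runs without files.
fake_pkg = types.ModuleType("demo_hooks")
fake_pkg.__path__ = []  # an empty __path__ marks it as a package
fake_mod = types.ModuleType("demo_hooks.testos")
# Assumed hook signature for this sketch: (process_id, runner_context, stream).
fake_mod.do_pre_kill = lambda pid, context, stream: stream.write(
    "pid {}\n".format(pid))
sys.modules["demo_hooks"] = fake_pkg
sys.modules["demo_hooks.testos"] = fake_mod

hook = load_pre_kill_hook("demo_hooks", "testos")
out = io.StringIO()
if hook is not None:
    hook.do_pre_kill(1234, {"platform_name": None}, out)
print(repr(out.getvalue()))                           # 'pid 1234\n'
print(load_pre_kill_hook("demo_hooks", "no_such_os"))  # None
```

Catching `ImportError` this broadly also hides import errors raised inside a hook module that does exist; comparing the exception's `name` attribute with the module name being loaded is one way to tell the two cases apart.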

packages/Python/lldbsuite/test/test_runner/process_control.py

... 477 unchanged lines not shown, from start of file; context: class ProcessDriver(object):
# =============================================		# =============================================

def on_process_started(self):		def on_process_started(self):
pass		pass

def on_process_exited(self, command, output, was_timeout, exit_status):		def on_process_exited(self, command, output, was_timeout, exit_status):
pass		pass

		def on_timeout_pre_kill(self):
		"""Called after the timeout interval elapses but before the process is killed.

		This method gives derived classes a chance to act on the process
		before it is killed. For example, this would be a good spot to run
		a program that samples the process to see what it was doing (or not
		doing).

		Do not attempt to reap the process (i.e. use wait()) in this method.
		That will interfere with the kill mechanism and return code processing.
		"""
		pass
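A derived driver might override the hook roughly as follows. This sketch is hypothetical throughout: `HookBase` stands in for `ProcessDriver`, the `handler` callable stands in for a platform hook, and the `{test_filename}-{pid}.sample` file name follows the convention described in the review summary. In keeping with the docstring's warning, the override only reads the pid and never calls `wait()`.

```python
import io
import os
import tempfile


class HookBase(object):
    """Hypothetical stand-in for ProcessDriver; the hook is a no-op."""

    def on_timeout_pre_kill(self):
        pass


class SamplingDriver(HookBase):
    """Hypothetical derived driver that saves pre-kill output to session_dir."""

    def __init__(self, pid, test_filename, session_dir, handler):
        self.pid = pid
        self.test_filename = test_filename
        self.session_dir = session_dir
        self.handler = handler  # callable(pid, output_stream)

    def on_timeout_pre_kill(self):
        # Buffer the handler's output in memory. Only self.pid is read here:
        # no wait(), so the kill and return-code logic stay untouched.
        output = io.StringIO()
        try:
            self.handler(self.pid, output)
        except Exception as err:
            # Record the failure and let the caller carry on with the kill.
            output.write("pre-kill handler failed: {}\n".format(err))
        path = os.path.join(
            self.session_dir,
            "{}-{}.sample".format(self.test_filename, self.pid))
        with open(path, "w") as sample_file:
            sample_file.write(output.getvalue())


with tempfile.TemporaryDirectory() as session_dir:
    driver = SamplingDriver(
        4242, "TestFoo.py", session_dir,
        lambda pid, out: out.write("sampled {}\n".format(pid)))
    driver.on_timeout_pre_kill()
    with open(os.path.join(session_dir, "TestFoo.py-4242.sample")) as f:
        sample_text = f.read()
print(repr(sample_text))  # 'sampled 4242\n'
```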

def write(self, content):		def write(self, content):
# pylint: disable=no-self-use		# pylint: disable=no-self-use
# Intended - we want derived classes to be able to override		# Intended - we want derived classes to be able to override
# this and use any self state they may contain.		# this and use any self state they may contain.
sys.stdout.write(content)		sys.stdout.write(content)

# ==============================================================		# ==============================================================
# Operations used to drive processes. Clients will want to call		# Operations used to drive processes. Clients will want to call
# one of these.		# one of these.
# ==============================================================		# ==============================================================

def run_command(self, command):		def run_command(self, command):
# Start up the child process and the thread that does the		# Start up the child process and the thread that does the
# communication pump.		# communication pump.
self._start_process_and_io_thread(command)		self._start_process_and_io_thread(command)
		labath: Is this actually used anywhere?
		tfiala (author): No - I originally was parsing some options out of it in the handler, but I no longer am doing that. I will take this out of the final. It's easy enough to add in later if we ever need it.

# Wait indefinitely for the child process to finish		# Wait indefinitely for the child process to finish
# communicating. This indicates it has closed stdout/stderr		# communicating. This indicates it has closed stdout/stderr
# pipes and is done.		# pipes and is done.
self.io_thread.join()		self.io_thread.join()
self.returncode = self.process.wait()		self.returncode = self.process.wait()
if self.returncode is None:		if self.returncode is None:
raise Exception(		raise Exception(
... 126 unchanged lines not shown; context: class ProcessDriver(object):
def _wait_with_timeout(self, timeout_seconds, command, want_core):		def _wait_with_timeout(self, timeout_seconds, command, want_core):
# Allow up to timeout seconds for the io thread to wrap up.		# Allow up to timeout seconds for the io thread to wrap up.
# If that completes, the child process should be done.		# If that completes, the child process should be done.
completed_normally = self.done_event.wait(timeout_seconds)		completed_normally = self.done_event.wait(timeout_seconds)
if completed_normally:		if completed_normally:
# Reap the child process here.		# Reap the child process here.
self.returncode = self.process.wait()		self.returncode = self.process.wait()
else:		else:

		# Allow derived classes to do some work after we detected
		# a timeout but before we touch the timed-out process.
		self.on_timeout_pre_kill()

# Prepare to stop the process		# Prepare to stop the process
process_terminated = completed_normally		process_terminated = completed_normally
terminate_attempt_count = 0		terminate_attempt_count = 0

# Try as many attempts as we support for trying to shut down		# Try as many attempts as we support for trying to shut down
# the child process if it's not already shut down.		# the child process if it's not already shut down.
while not process_terminated:		while not process_terminated:
terminate_attempt_count += 1		terminate_attempt_count += 1
... 61 unchanged lines not shown, to end of file
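The ordering in `_wait_with_timeout()` matters: the hook runs while the timed-out child is still alive and unreaped, and only afterwards does the terminate loop start. The sketch below reproduces that ordering with `subprocess.Popen.wait(timeout=...)` instead of the patch's IO thread and `done_event`. `TimeoutDriver`, `RecordingDriver`, and `run()` are hypothetical names for this example.

```python
import subprocess
import sys


class TimeoutDriver(object):
    """Hypothetical driver: wait with a timeout, run the hook, then kill."""

    def __init__(self):
        self.process = None
        self.returncode = None

    def on_timeout_pre_kill(self):
        pass  # derived classes inspect self.process here; never wait() on it

    def run(self, command, timeout_seconds):
        """Run command; return True if it timed out and had to be killed."""
        self.process = subprocess.Popen(command)
        try:
            self.returncode = self.process.wait(timeout=timeout_seconds)
            return False  # completed normally
        except subprocess.TimeoutExpired:
            # Timed out: let derived classes look before touching the process.
            self.on_timeout_pre_kill()
            self.process.kill()
            self.returncode = self.process.wait()  # reap only after the kill
            return True


class RecordingDriver(TimeoutDriver):
    """Records what the hook could see, without reaping the child."""

    def __init__(self):
        super(RecordingDriver, self).__init__()
        self.pid_at_hook = None
        self.unreaped_at_hook = None

    def on_timeout_pre_kill(self):
        self.pid_at_hook = self.process.pid
        # Popen.returncode stays None until wait()/poll() reaps the child.
        self.unreaped_at_hook = self.process.returncode is None


slow = RecordingDriver()
timed_out = slow.run([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)
print(timed_out, slow.unreaped_at_hook, slow.returncode)

fast = RecordingDriver()
fast_timed_out = fast.run([sys.executable, "-c", "pass"], 30)
print(fast_timed_out, fast.returncode)  # False 0
```

The exact return code of the killed child is platform-dependent (a negative signal number on POSIX), which is why the sketch only prints it.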