Details

Reviewers

vitalybuka
modocache
glider
dvyukov
MatzeB
beanz
ddunbar

Commits

Summary

Running the lit tests and unit tests of ASan and TSan on macOS performs very badly with a high number of threads. The cause is xnu (the macOS kernel), which currently doesn't handle mapping and unmapping of sanitizer shadow regions (reserved VM regions that are several terabytes large) very well. The situation is so bad that increasing the number of threads actually increases the total testing time. The macOS buildbots are affected by this. Note that we can't easily limit the number of sanitizer testing threads without affecting the rest of the tests.

This patch adds a special "group" to lit and limits the number of concurrently running tests in this group. This helps solve the contention problem while still allowing other tests to run at full parallelism: running lit with -j8 will still run with 8 threads, and parallelism is limited only for sanitizer tests.
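
The idea can be illustrated with a small standalone sketch. This is not lit's code: `run_all`, `GROUP_LIMITS`, and the test tuples are hypothetical names chosen for illustration. It only shows how a per-group semaphore caps concurrency inside one group while the worker pool keeps its full width for everything else.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical group table: group name -> max concurrently running tests.
GROUP_LIMITS = {"sanitizer": 3}


def run_all(tests, jobs):
    """Run (name, group) pairs on `jobs` workers; return peak concurrency per group."""
    semaphores = {g: threading.Semaphore(n) for g, n in GROUP_LIMITS.items()}
    lock = threading.Lock()
    running = {}  # group -> number of tests currently executing
    peak = {}     # group -> highest value `running` ever reached

    def run_one(test):
        name, group = test
        sem = semaphores.get(group)  # None for tests outside any group
        if sem:
            sem.acquire()
        try:
            with lock:
                running[group] = running.get(group, 0) + 1
                peak[group] = max(peak.get(group, 0), running[group])
            time.sleep(0.01)  # stand-in for the real test body
            with lock:
                running[group] -= 1
        finally:
            if sem:
                sem.release()

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(run_one, tests))
    return peak


tests = [("asan-%d" % i, "sanitizer") for i in range(12)]
tests += [("other-%d" % i, "") for i in range(12)]
peak = run_all(tests, jobs=8)
```

In this sketch a worker that is waiting on the group semaphore still occupies one of the 8 worker slots, which is also how a semaphore acquired inside the test-execution function behaves.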

Diff Detail

Repository: rL LLVM

Event Timeline

kubamracek updated this revision to Diff 83457. Jan 6 2017, 4:27 PM

kubamracek retitled this revision from to [lit] Limit parallelism of sanitizer tests on Darwin.

kubamracek updated this object.

kubamracek added reviewers: dvyukov, glider, beanz, vitalybuka, ddunbar.

kubamracek set the repository for this revision to rL LLVM.

kubamracek added a project: Restricted Project.

kubamracek added subscribers: llvm-commits, zaks.anna, cmatthews.

Herald added a reviewer: modocache. Jan 6 2017, 4:28 PM

LGTM

We noticed issues with slow mmaps/munmaps on Darwin as well.
Looks fine to me, but I am not very experienced with lit/python.

Closed by commit rL292231: [lit] Limit parallelism of sanitizer tests on Darwin [llvm part] (authored by kuba.brecka). Jan 17 2017, 9:26 AM

This revision was automatically updated to reflect the committed changes.

LLVM part landed in r292231, compiler-rt part in r292232.

MatzeB added a subscriber: MatzeB. Jan 17 2017, 10:15 AM

MatzeB added inline comments.

llvm/trunk/utils/lit/lit/main.py
329	Mentioning the sanitizer and the magic number 3 in the generic lit code seems wrong to me. This should really be in the compiler-rt lit.site.cfg somehow.

Right. I've reverted this (there was also a failure on the Windows bot), let me submit another patch for review.

Updating patch. darwin_sanitizer_parallelism_group_func is now only used on Darwin (should fix Windows bot breakage) and the magic number 3 is moved to compiler-rt (into test/lit.common.cfg and unittests/lit.common.unit.cfg).

Generally looks good to me, nitpicks below.

You may consider using the word "pool" instead of "parallelism_group" to match the terminology of the ninja builder.

projects/compiler-rt/test/asan/lit.cfg
245–246 ↗	(On Diff #84855)	could be if config.host_os == 'Darwin' and config.target_arch in ["x86_64", "x86_64h"]:
projects/compiler-rt/unittests/lit.common.unit.cfg
39–41 ↗	(On Diff #84855)	`return "darwin-64bit-sanitizer" if "x86_64" in test.file_path else ""`
utils/lit/lit/run.py
178 ↗	(On Diff #84855)	This should rather be `dict()` or left out
182 ↗	(On Diff #84855)	Supporting callbacks here feels overengineered, but if you have to...
185 ↗	(On Diff #84855)	I would move the name lookup into the `try:` part, so that for an invalid name lit doesn't shutdown completely.
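
The suggestion for line 185 can be sketched in isolation. The helper below is hypothetical (`execute_guarded` and the result tuples are not lit APIs); it only shows that doing the semaphore lookup inside the `try:` block turns an unknown group name into an ordinary failed result instead of an exception that escapes the worker.

```python
import threading
import traceback


def execute_guarded(group, semaphores, body):
    """Run body() under the group's semaphore; report errors as a result tuple."""
    semaphore = None
    try:
        # The lookup happens inside the try block, so an unknown group name
        # raises KeyError here and is reported as "UNRESOLVED" below.
        if group:
            semaphore = semaphores[group]
        if semaphore:
            semaphore.acquire()
        return ("PASS", body())
    except KeyboardInterrupt:
        raise
    except Exception:
        return ("UNRESOLVED", traceback.format_exc())
    finally:
        # Only release what was actually acquired.
        if semaphore:
            semaphore.release()


semaphores = {"darwin-64bit-sanitizer": threading.Semaphore(3)}
ok = execute_guarded("darwin-64bit-sanitizer", semaphores, lambda: "done")
bad = execute_guarded("no-such-group", semaphores, lambda: "done")
```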

Thanks for reviewing this! I'll submit another patch.

utils/lit/lit/run.py
182 ↗	(On Diff #84855)	I tried really hard not to use a callback, but unfortunately for compiler-rt unittests (gtest-style), we can't just set `config.parallelism_group` at lit launch time -- we only know the architecture after test discovery. I'm open to suggestions if you can think of a better way of handling this.

MatzeB added inline comments. Jan 18 2017, 11:00 AM

utils/lit/lit/run.py
182 ↗	(On Diff #84855)	I don't understand exactly what is happening with the gtests here. Do you expect unittests for architectures other than `config.host_os` being present? I wouldn't expect us to have infrastructure here to even run tests for targets other than the host...

kubamracek added inline comments. Jan 18 2017, 11:29 AM

utils/lit/lit/run.py
182 ↗	(On Diff #84855)	We have i386 and x86_64 unittests. Finding which tests are 64-bit can only be done after test discovery.
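
A minimal sketch of why a callable helps here, under stated assumptions: the `SimpleNamespace` objects stand in for lit's test and config objects, the file paths are made up, and the function body follows the one-liner suggested in the review rather than the committed compiler-rt code. `resolve_group` mirrors the `callable(pg)` check in `execute_test`.

```python
from types import SimpleNamespace


def darwin_sanitizer_parallelism_group_func(test):
    # Assumed rule from the review suggestion: 64-bit unit tests are
    # recognized by "x86_64" appearing in the test's file path.
    return "darwin-64bit-sanitizer" if "x86_64" in test.file_path else ""


def resolve_group(test):
    """Return the group name, calling the config value if it is a callable."""
    pg = test.config.parallelism_group
    if callable(pg):
        pg = pg(test)
    return pg


# One shared config for the unit-test suite; the group is decided per test.
unit_config = SimpleNamespace(
    parallelism_group=darwin_sanitizer_parallelism_group_func)
t64 = SimpleNamespace(config=unit_config,
                      file_path="/hypothetical/build/Asan-x86_64-Test")
t32 = SimpleNamespace(config=unit_config,
                      file_path="/hypothetical/build/Asan-i386-Test")

# A suite whose architecture is known up front can use a plain string.
static_config = SimpleNamespace(parallelism_group="darwin-64bit-sanitizer")
t_static = SimpleNamespace(config=static_config,
                           file_path="/hypothetical/test.cc")

groups = [resolve_group(t) for t in (t64, t32, t_static)]
```

With a plain string, every test sharing the config would land in the same group; the callable lets the i386 and x86_64 binaries discovered under one config be separated.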

MatzeB added inline comments. Jan 18 2017, 11:48 AM

utils/lit/lit/run.py
182 ↗	(On Diff #84855)	Ah. Well, I don't have any better suggestion than the callback right now.

Updating patch, addressing review comments.

kubamracek marked 8 inline comments as done. Jan 19 2017, 2:34 PM

LGTM

This revision is now accepted and ready to land. Jan 19 2017, 4:30 PM

Closed by commit rL292548: [lit] Limit parallelism of sanitizer tests on Darwin [llvm part, take 2] (authored by kuba.brecka). Jan 19 2017, 4:35 PM

This revision was automatically updated to reflect the committed changes.

Thanks! LLVM part in r292548, compiler-rt part in r292549.

Diff 85058

llvm/trunk/utils/lit/lit/LitConfig.py

(18 lines not shown) class LitConfig(object):
     easily.
     """

     def __init__(self, progname, path, quiet,
                  useValgrind, valgrindLeakCheck, valgrindArgs,
                  noExecute, debug, isWindows,
                  params, config_prefix = None,
                  maxIndividualTestTime = 0,
-                 maxFailures = None):
+                 maxFailures = None,
+                 parallelism_groups = []):
         # The name of the test runner.
         self.progname = progname
         # The items to add to the PATH environment variable.
         self.path = [str(p) for p in path]
         self.quiet = bool(quiet)
         self.useValgrind = bool(useValgrind)
         self.valgrindLeakCheck = bool(valgrindLeakCheck)
         self.valgrindUserArgs = list(valgrindArgs)
(21 lines not shown) def __init__(self, progname, path, quiet,
                 self.valgrindArgs.append('--leak-check=full')
             else:
                 # The default is 'summary'.
                 self.valgrindArgs.append('--leak-check=no')
             self.valgrindArgs.extend(self.valgrindUserArgs)

         self.maxIndividualTestTime = maxIndividualTestTime
         self.maxFailures = maxFailures
+        self.parallelism_groups = parallelism_groups

     @property
     def maxIndividualTestTime(self):
         """
         Interface for getting maximum time to spend executing
         a single test
         """
         return self._maxIndividualTestTime
(83 lines not shown)

llvm/trunk/utils/lit/lit/TestingConfig.py

(100 lines not shown) def load_from_path(self, path, litConfig):
                     path, traceback.format_exc()))

         self.finish(litConfig)

     def __init__(self, parent, name, suffixes, test_format,
                  environment, substitutions, unsupported,
                  test_exec_root, test_source_root, excludes,
                  available_features, pipefail, limit_to_features = [],
-                 is_early = False):
+                 is_early = False, parallelism_group = ""):
         self.parent = parent
         self.name = str(name)
         self.suffixes = set(suffixes)
         self.test_format = test_format
         self.environment = dict(environment)
         self.substitutions = list(substitutions)
         self.unsupported = unsupported
         self.test_exec_root = test_exec_root
         self.test_source_root = test_source_root
         self.excludes = set(excludes)
         self.available_features = set(available_features)
         self.pipefail = pipefail
         # This list is used by TestRunner.py to restrict running only tests that
         # require one of the features in this list if this list is non-empty.
         # Configurations can set this list to restrict the set of tests to run.
         self.limit_to_features = set(limit_to_features)
         # Whether the suite should be tested early in a given run.
         self.is_early = bool(is_early)
+        self.parallelism_group = parallelism_group

     def finish(self, litConfig):
         """finish() - Finish this config object, after loading is complete."""

         self.name = str(self.name)
         self.suffixes = set(self.suffixes)
         self.environment = dict(self.environment)
         self.substitutions = list(self.substitutions)
(18 lines not shown)

llvm/trunk/utils/lit/lit/main.py

(320 lines not shown) def main_with_tmp(builtinParameters):

     isWindows = platform.system() == 'Windows'

     # Create the global config object.
     litConfig = lit.LitConfig.LitConfig(
         progname = os.path.basename(sys.argv[0]),
         path = opts.path,
         quiet = opts.quiet,
         useValgrind = opts.useValgrind,
         valgrindLeakCheck = opts.valgrindLeakCheck,
         valgrindArgs = opts.valgrindArgs,
         noExecute = opts.noExecute,
         debug = opts.debug,
         isWindows = isWindows,
         params = userParams,
         config_prefix = opts.configPrefix,
         maxIndividualTestTime = maxIndividualTestTime,
-        maxFailures = opts.maxFailures)
+        maxFailures = opts.maxFailures,
+        parallelism_groups = {})

     # Perform test discovery.
     run = lit.run.Run(litConfig,
                       lit.discovery.find_tests_for_inputs(litConfig, inputs))

     # After test discovery the configuration might have changed
     # the maxIndividualTestTime. If we explicitly set this on the
     # command line then override what was set in the test configuration
(245 lines not shown)

llvm/trunk/utils/lit/lit/run.py

(171 lines not shown) class Run(object):
     This class represents a concrete, configured testing run.
     """

     def __init__(self, lit_config, tests):
         self.lit_config = lit_config
         self.tests = tests

     def execute_test(self, test):
+        pg = test.config.parallelism_group
+        if callable(pg): pg = pg(test)

         result = None
-        start_time = time.time()
+        semaphore = None
         try:
+            if pg: semaphore = self.parallelism_semaphores[pg]
+            if semaphore: semaphore.acquire()
+            start_time = time.time()
             result = test.config.test_format.execute(test, self.lit_config)

             # Support deprecated result from execute() which returned the result
             # code and additional output as a tuple.
             if isinstance(result, tuple):
                 code, output = result
                 result = lit.Test.Result(code, output)
             elif not isinstance(result, lit.Test.Result):
                 raise ValueError("unexpected result from test execution")

+            result.elapsed = time.time() - start_time
         except KeyboardInterrupt:
             raise
         except:
             if self.lit_config.debug:
                 raise
             output = 'Exception during script execution:\n'
             output += traceback.format_exc()
             output += '\n'
             result = lit.Test.Result(lit.Test.UNRESOLVED, output)
-        result.elapsed = time.time() - start_time
+        finally:
+            if semaphore: semaphore.release()

         test.setResult(result)

     def execute_tests(self, display, jobs, max_time=None,
                       use_processes=False):
         """
         execute_tests(display, jobs, [max_time])

(16 lines not shown) def execute_tests(self, display, jobs, max_time=None,
         """

         # Choose the appropriate parallel execution implementation.
         consumer = None
         if jobs != 1 and use_processes and multiprocessing:
             try:
                 task_impl = multiprocessing.Process
                 queue_impl = multiprocessing.Queue
+                sem_impl = multiprocessing.Semaphore
                 canceled_flag = multiprocessing.Value('i', 0)
                 consumer = MultiprocessResultsConsumer(self, display, jobs)
             except:
                 # multiprocessing fails to initialize with certain OpenBSD and
                 # FreeBSD Python versions: http://bugs.python.org/issue3770
                 # Unfortunately the error raised also varies by platform.
                 self.lit_config.note('failed to initialize multiprocessing')
                 consumer = None
         if not consumer:
             task_impl = threading.Thread
             queue_impl = queue.Queue
+            sem_impl = threading.Semaphore
             canceled_flag = LockedValue(0)
             consumer = ThreadResultsConsumer(display)

+        self.parallelism_semaphores = {k: sem_impl(v)
+            for k, v in self.lit_config.parallelism_groups.items()}

         # Create the test provider.
         provider = TestProvider(queue_impl, canceled_flag)
         handleFailures(provider, consumer, self.lit_config.maxFailures)

         # Queue the tests outside the main thread because we can't guarantee
         # that we can put() all the tests without blocking:
         # https://docs.python.org/2/library/multiprocessing.html
         # e.g: On Mac OS X, we will hang if we put 2^15 elements in the queue
(50 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[lit] Limit parallelism of sanitizer tests on Darwin
ClosedPublic
