This is an archive of the discontinued LLVM Phabricator instance.

[lit] Support sharding testsuites, for parallel execution.
ClosedPublic

Authored by graydon on Jan 16 2017, 5:27 PM.


Details

Reviewers

modocache
ddunbar

Commits

rGae5d7bb4f5a2: [lit] Support sharding testsuites, for parallel execution.
rL292417: [lit] Support sharding testsuites, for parallel execution.

Summary

This change equips lit.py with two new options, --num-shards=M and
--run-shard=N (set by default from env vars LIT_NUM_SHARDS and LIT_RUN_SHARD).

The options must be used together, and N must be in 0..M-1.

Together these options affect only test selection: they partition the testsuite
into M equal-sized "shards", then select only the Nth shard. They can be used
in a cluster of test machines to achieve a very crude (static) form of
parallelism, with minimal configuration work.
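The option handling described in this summary can be sketched as a small standalone script. This is an illustration only, not lit's code: the function name `parse_shard_options` and the `environ` parameter are hypothetical, while the option names, environment variables, and error messages are taken from the diff on this page. It relies on the fact that argparse applies `type` to string defaults.

```python
import argparse


def parse_shard_options(argv, environ):
    """Parse --num-shards/--run-shard, falling back to environment variables.

    Hypothetical helper for illustration; returns (M, N) or None.
    """
    parser = argparse.ArgumentParser(prog="shard-sketch")
    # argparse runs `type` over string defaults, so an environment value
    # such as "3" arrives as the int 3. A missing variable leaves None.
    parser.add_argument("--num-shards", dest="numShards", metavar="M",
                        type=int, default=environ.get("LIT_NUM_SHARDS"))
    parser.add_argument("--run-shard", dest="runShard", metavar="N",
                        type=int, default=environ.get("LIT_RUN_SHARD"))
    opts = parser.parse_args(argv)

    if opts.numShards is None and opts.runShard is None:
        return None  # sharding was not requested
    if opts.numShards is None or opts.runShard is None:
        parser.error("--num-shards and --run-shard must be used together")
    if opts.numShards <= 0:
        parser.error("--num-shards must be positive")
    if not 0 <= opts.runShard < opts.numShards:
        parser.error("--run-shard must be between 0 and --num-shards (exclusive)")
    return opts.numShards, opts.runShard


print(parse_shard_options(["--num-shards=3", "--run-shard=1"], {}))  # (3, 1)
print(parse_shard_options([], {"LIT_NUM_SHARDS": "3", "LIT_RUN_SHARD": "0"}))  # (3, 0)
```

Passing the environment as a plain dict keeps the sketch testable without touching `os.environ`; `parser.error` exits with status 2, as argparse normally does.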

Diff Detail

Build Status

Buildable 2966
Build 2966: arc lint + arc unit

Event Timeline

graydon created this revision. Jan 16 2017, 5:27 PM

Herald added a reviewer: modocache. Jan 16 2017, 5:27 PM

LGTM, this seems like a great idea!

utils/lit/lit/main.py
436	Would it be better to shard in a round robin fashion? There is some tendency for tests to be clumped by where they are defined, and where they are defined to be (weakly) correlated with how long they take to run, so that would distribute long running tests across machines, which should help reduce the deviation between total testing time among shards.
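The load-balancing argument in this comment can be illustrated with a toy model. Everything here is assumed for the sake of the example: the durations are made up (four slow tests clumped at the start of discovery order, then eight fast ones), and the helper names are hypothetical rather than taken from lit.

```python
def contiguous(durations, num_shards, shard):
    """Take "the next N tests": one contiguous block per shard."""
    size = -(-len(durations) // num_shards)  # ceiling division
    return durations[shard * size:(shard + 1) * size]


def round_robin(durations, num_shards, shard):
    """Take "every Nth test", starting at the shard index."""
    return durations[shard::num_shards]


def shard_totals(select, durations, num_shards):
    """Total running time of each shard under a given selection scheme."""
    return [sum(select(durations, num_shards, n)) for n in range(num_shards)]


# Made-up durations in seconds: slow tests are clumped together.
durations = [10] * 4 + [1] * 8

print(shard_totals(contiguous, durations, 3))   # [40, 4, 4]
print(shard_totals(round_robin, durations, 3))  # [22, 13, 13]
```

In this toy case the slowest shard drops from 40 to 22 seconds under round-robin; real testsuites will behave differently depending on how strongly duration correlates with location.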

This revision is now accepted and ready to land. Jan 17 2017, 11:37 AM

graydon added inline comments. Jan 17 2017, 1:48 PM

utils/lit/lit/main.py
436	Considered it, but decided against based on the (possibly wrong) guess that the discovery-clumping order would have better locality in terms of what test-prerequisites are built, tested, and hot-in-cache. If you think round-robin will work better overall, I'm happy to change it.

ddunbar added inline comments. Jan 17 2017, 2:01 PM

utils/lit/lit/main.py
436	What do you mean by test prerequisites? lit currently doesn't really do any shared work on a per-test basis that could be cached. One other advantage of the current clumping is you are more likely to get deterministic assignments to machines, which is a blessing and a curse. The blessing means you won't have weird configuration changes that users might not think to check, the curse means you are less likely to shake such things out. I'm ok with the current patch unless you feel swayed the other direction.

graydon added inline comments. Jan 17 2017, 9:33 PM

utils/lit/lit/main.py
436	I meant things like, say, if there is a module that gets cached as a .pcm between a bunch of tests against it, or a .dylib or .a that's only generated on-demand for running tests, there might be advantage in only running them in one spot. I agree the likelihood of the same test running on the same machine over multiple runs might be either good or bad. My gut suggests it's better to actually have them move around some, to shake nondeterminism bugs out. Hard to say. I just redid the code to support round-robin assignment (and fixed some bugs) and think it actually reads a bit nicer, and thinking it over I think it might be a bit more useful as a smoke-test or profiling mode for users as well (i.e. you can run --num-shards=100 --run-shard=1 to run an evenly-distributed 1% of the testsuite against a wip change). Will post revised patch once I've adjusted tests.
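The smoke-test idea at the end of this comment can be checked on a synthetic testsuite. The test names below are invented (10 directories of 100 tests each), and the slice is only one plausible way to express round-robin selection, not necessarily what the revised patch does.

```python
# Invented suite: 1000 test names in sorted order, grouped by directory.
tests = sorted("dir%02d/test%03d" % (d, t) for d in range(10) for t in range(100))

# Round-robin selection for --num-shards=100 --run-shard=1.
sample = tests[1::100]

# Directories touched by the sample.
dirs = sorted({name.split("/")[0] for name in sample})

print(len(sample))  # 10
print(len(dirs))    # 10
```

Here the 1% sample contains one test from every directory, whereas a contiguous block of 10 tests would come from a single directory.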

Update to round-robin sharding

The previous approach would split a testsuite like [1, 2, 3, 4, 5] into 3
shards [1, 2], [3, 4], and [5]. This change will split it into 3 shards
[1, 4], [2, 5], and [3]. That is, it takes "every Nth test" rather than
"the next N tests" for each shard.
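Both selection schemes can be reproduced on the five-element example. The contiguous version follows the ceiling-based arithmetic in Diff 84615 further down this page; the round-robin version uses a stride slice, which is an assumption, since the revised patch itself is not shown here. The function names are hypothetical.

```python
import math


def contiguous_shard(tests, num_shards, run_shard):
    """Previous approach: "the next N tests", with a ceiling-sized block."""
    shard_size = int(math.ceil(float(len(tests)) / num_shards))
    begin = run_shard * shard_size
    return tests[begin:begin + shard_size]


def round_robin_shard(tests, num_shards, run_shard):
    """Updated approach: "every Nth test", starting at the shard index."""
    return tests[run_shard::num_shards]


tests = [1, 2, 3, 4, 5]
print([contiguous_shard(tests, 3, n) for n in range(3)])   # [[1, 2], [3, 4], [5]]
print([round_robin_shard(tests, 3, n) for n in range(3)])  # [[1, 4], [2, 5], [3]]
```

Either way every test lands in exactly one shard; only the grouping differs.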

Also fixed the tests to actually run FileCheck.

Harbormaster completed remote builds in B3023: Diff 84797. Jan 17 2017, 9:36 PM

Closed by commit rL292417: [lit] Support sharding testsuites, for parallel execution. (authored by graydon). Jan 18 2017, 10:23 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                            Size
docs/CommandGuide/lit.rst       17 lines
utils/lit/lit/main.py           36 lines
utils/lit/tests/selecting.py    90 lines

Diff 84615

docs/CommandGuide/lit.rst

	.. option:: --max-time=N			.. option:: --max-time=N

	Spend at most ``N`` seconds (approximately) running tests and then terminate.			Spend at most ``N`` seconds (approximately) running tests and then terminate.

	.. option:: --shuffle			.. option:: --shuffle

	Run the tests in a random order.			Run the tests in a random order.

				.. option:: --num-shards=M

				Divide the set of selected tests into ``M`` equal-sized subsets or
				"shards", and run only one of them. Must be used with the
				``--run-shard=N`` option, which selects the shard to run. The environment
				variable ``LIT_NUM_SHARDS`` can also be used in place of this
				option. These two options provide a coarse mechanism for partitioning large
				testsuites, for parallel execution on separate machines (say in a large
				testing farm).

				.. option:: --run-shard=N

				Select which shard to run, assuming the ``--num-shards=M`` option was
				provided. The two options must be used together, and the value of ``N``
				must be in the range ``0..M-1``. The environment variable
				``LIT_RUN_SHARD`` can also be used in place of this option.

	ADDITIONAL OPTIONS			ADDITIONAL OPTIONS
	------------------			------------------

	.. option:: --debug			.. option:: --debug

	Run :program:`lit` in debug mode, for debugging configuration issues and			Run :program:`lit` in debug mode, for debugging configuration issues and
	:program:`lit` itself.			:program:`lit` itself.


utils/lit/lit/main.py

import platform		import platform
import random		import random
import re		import re
import sys		import sys
import time		import time
import argparse		import argparse
import tempfile		import tempfile
import shutil		import shutil
		import math

import lit.ProgressBar		import lit.ProgressBar
import lit.LitConfig		import lit.LitConfig
import lit.Test		import lit.Test
import lit.run		import lit.run
import lit.util		import lit.util
import lit.discovery		import lit.discovery

def main_with_tmp(builtinParameters):
selection_group.add_argument("-i", "--incremental",		selection_group.add_argument("-i", "--incremental",
help="Run modified and failing tests first (updates "		help="Run modified and failing tests first (updates "
"mtimes)",		"mtimes)",
action="store_true", default=False)		action="store_true", default=False)
selection_group.add_argument("--filter", metavar="REGEX",		selection_group.add_argument("--filter", metavar="REGEX",
help=("Only run tests with paths matching the given "		help=("Only run tests with paths matching the given "
"regular expression"),		"regular expression"),
action="store", default=None)		action="store", default=None)
		selection_group.add_argument("--num-shards", dest="numShards", metavar="M",
		help="Split testsuite into M pieces and only run one",
		action="store", type=int,
		default=os.environ.get("LIT_NUM_SHARDS"))
		selection_group.add_argument("--run-shard", dest="runShard", metavar="N",
		help="Run the Nth shard of the testsuite",
		action="store", type=int,
		default=os.environ.get("LIT_RUN_SHARD"))

debug_group = parser.add_argument_group("Debug and Experimental Options")		debug_group = parser.add_argument_group("Debug and Experimental Options")
debug_group.add_argument("--debug",		debug_group.add_argument("--debug",
help="Enable debugging (for 'lit' development)",		help="Enable debugging (for 'lit' development)",
action="store_true", default=False)		action="store_true", default=False)
debug_group.add_argument("--show-suites", dest="showSuites",		debug_group.add_argument("--show-suites", dest="showSuites",
help="Show discovered test suites",		help="Show discovered test suites",
action="store_true", default=False)		action="store_true", default=False)
def main_with_tmp(builtinParameters):
# Then select the order.		# Then select the order.
if opts.shuffle:		if opts.shuffle:
random.shuffle(run.tests)		random.shuffle(run.tests)
elif opts.incremental:		elif opts.incremental:
sort_by_incremental_cache(run)		sort_by_incremental_cache(run)
else:		else:
run.tests.sort(key = lambda t: (not t.isEarlyTest(), t.getFullName()))		run.tests.sort(key = lambda t: (not t.isEarlyTest(), t.getFullName()))

		# Then optionally restrict our attention to a shard of the tests.
		if (opts.numShards is not None) or (opts.runShard is not None):
		if (opts.numShards is None) or (opts.runShard is None):
		parser.error("--num-shards and --run-shard must be used together")
		if opts.numShards <= 0:
		parser.error("--num-shards must be positive")
		if (opts.runShard < 0) or (opts.runShard >= opts.numShards):
		parser.error("--run-shard must be between 0 and --num-shards (exclusive)")
		nTests = len(run.tests)
		# Take a ceiling so that any remainder is spread over all the shards,
		# rather than accumulating it all in the last one.
		shard_size = int(math.ceil(float(nTests) / opts.numShards))
		shard_begin = opts.runShard * shard_size
		if shard_begin >= nTests:
		litConfig.note('Selecting shard %d/%d = size 0/%d' %
		(opts.runShard, opts.numShards, nTests))
		run.tests = []
		else:
		# If there was a remainder, final shard might be a little short.
		shard_end = min(nTests, shard_begin + shard_size)
		litConfig.note('Selecting shard %d/%d = size %d/%d = range [%d, %d]' % \
		(opts.runShard, opts.numShards,
		(shard_end - shard_begin), nTests,
		# note: lit reports test numbers starting from 1
		shard_begin+1, shard_end))
		run.tests = run.tests[shard_begin:shard_end]

# Finally limit the number of tests, if desired.		# Finally limit the number of tests, if desired.
if opts.maxTests is not None:		if opts.maxTests is not None:
run.tests = run.tests[:opts.maxTests]		run.tests = run.tests[:opts.maxTests]

# Don't create more threads than tests.		# Don't create more threads than tests.
opts.numThreads = min(len(run.tests), opts.numThreads)		opts.numThreads = min(len(run.tests), opts.numThreads)

# Because some tests use threads internally, and at least on Linux each		# Because some tests use threads internally, and at least on Linux each

utils/lit/tests/selecting.py

This file was added.

				# RUN: %{lit} %{inputs}/discovery | FileCheck --check-prefix=CHECK-BASIC %s
				# CHECK-BASIC: Testing: 5 tests


				# Check that regex-filtering works
				#
				# RUN: %{lit} --filter 'o[a-z]e' %{inputs}/discovery | FileCheck --check-prefix=CHECK-FILTER %s
				# CHECK-FILTER: Testing: 2 of 5 tests


				# Check that maximum counts work
				#
				# RUN: %{lit} --max-tests 3 %{inputs}/discovery | FileCheck --check-prefix=CHECK-MAX %s
				# CHECK-MAX: Testing: 3 of 5 tests


				# Check that sharding partitions the testsuite in a way that distributes the
				# rounding error nicely (i.e. 5/3 => 2 2 1, not 1 1 3 or whatever)
				#
				# RUN: %{lit} --num-shards 3 --run-shard 0 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD0-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD0-OUT < %t.out %s
				# CHECK-SHARD0-ERR: note: Selecting shard 0/3 = size 2/5 = range [1, 2]
				# CHECK-SHARD0-OUT: Testing: 2 of 5 tests
				#
				# RUN: %{lit} --num-shards 3 --run-shard 1 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD1-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD1-OUT < %t.out %s
				# CHECK-SHARD1-ERR: note: Selecting shard 1/3 = size 2/5 = range [3, 4]
				# CHECK-SHARD1-OUT: Testing: 2 of 5 tests
				#
				# RUN: %{lit} --num-shards 3 --run-shard 2 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD2-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD2-OUT < %t.out %s
				# CHECK-SHARD2-ERR: note: Selecting shard 2/3 = size 1/5 = range [5]
				# CHECK-SHARD2-OUT: Testing: 1 of 5 tests


				# Check that sharding via env vars works.
				#
				# RUN: env LIT_NUM_SHARDS=3 LIT_RUN_SHARD=0 %{lit} %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD0-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD0-OUT < %t.out %s
				# CHECK-SHARD0-ERR: note: Selecting shard 0/3 = size 2/5 = range [1, 2]
				# CHECK-SHARD0-OUT: Testing: 2 of 5 tests
				#
				# RUN: env LIT_NUM_SHARDS=3 LIT_RUN_SHARD=1 %{lit} %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD1-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD1-OUT < %t.out %s
				# CHECK-SHARD1-ERR: note: Selecting shard 1/3 = size 2/5 = range [3, 4]
				# CHECK-SHARD1-OUT: Testing: 2 of 5 tests
				#
				# RUN: env LIT_NUM_SHARDS=3 LIT_RUN_SHARD=2 %{lit} %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD2-ERR < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD2-OUT < %t.out %s
				# CHECK-SHARD2-ERR: note: Selecting shard 2/3 = size 1/5 = range [5]
				# CHECK-SHARD2-OUT: Testing: 1 of 5 tests


				# Check that providing more shards than tests results in 1 test per shard
				# until we run out, then 0.
				#
				# RUN: %{lit} --num-shards 100 --run-shard 2 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD-BIG-ERR1 < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD-BIG-OUT1 < %t.out %s
				# CHECK-SHARD-BIG-ERR1: note: Selecting shard 2/100 = size 1/5 = range [3, 3]
				# CHECK-SHARD-BIG-OUT1: Testing: 1 of 5 tests
				#
				# RUN: %{lit} --num-shards 100 --run-shard 5 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD-BIG-ERR2 < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD-BIG-OUT2 < %t.out %s
				# CHECK-SHARD-BIG-ERR2: note: Selecting shard 5/100 = size 0
				# CHECK-SHARD-BIG-OUT2: Testing: 0 of 5 tests
				#
				# RUN: %{lit} --num-shards 100 --run-shard 50 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD-BIG-ERR3 < %t.err %s
				# FileCheck --check-prefix=CHECK-SHARD-BIG-OUT3 < %t.out %s
				# CHECK-SHARD-BIG-ERR3: note: Selecting shard 50/100 = size 0
				# CHECK-SHARD-BIG-OUT3: Testing: 0 of 5 tests


				# Check that range constraints are enforced
				#
				# RUN: not %{lit} --num-shards 0 --run-shard 2 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD-ERR < %t.err %s
				# CHECK-SHARD-ERR: error: --num-shards must be positive
				#
				# RUN: not %{lit} --num-shards 3 --run-shard 3 %{inputs}/discovery >%t.out 2>%t.err
				# FileCheck --check-prefix=CHECK-SHARD-ERR2 < %t.err %s
				# CHECK-SHARD-ERR2: error: --run-shard must be between 0 and --num-shards (exclusive)