[lit] Support sharding testsuites, for parallel execution.
Closed, Public

Authored by graydon on Jan 16 2017, 5:27 PM.

Details

Summary

This change equips lit.py with two new options, --num-shards=M and
--run-shard=N (set by default from env vars LIT_NUM_SHARDS and LIT_RUN_SHARD).

The options must be used together, and N must be in 0..M-1.

Together these options affect only test selection: they partition the testsuite
into M roughly equal-sized "shards", then select only the Nth shard. They can be used
in a cluster of test machines to achieve a very crude (static) form of
parallelism, with minimal configuration work.
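
A minimal sketch of the option handling described above, not the code from the patch. The helper names `parse_shard_options` and `_env_int` are hypothetical, and using `argparse` is an assumption made for illustration; only the option names, the environment variable names, and the constraints (both options together, N in 0..M-1) come from the summary.

```python
import argparse
import os


def _env_int(environ, name):
    """Hypothetical helper: read an integer default from the environment."""
    value = environ.get(name)
    return int(value) if value else None


def parse_shard_options(argv, environ=None):
    """Return (num_shards, run_shard), or (None, None) if sharding is off.

    Illustrative only: mirrors the rules stated in the summary.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-shards", type=int,
                        default=_env_int(environ, "LIT_NUM_SHARDS"))
    parser.add_argument("--run-shard", type=int,
                        default=_env_int(environ, "LIT_RUN_SHARD"))
    opts = parser.parse_args(argv)

    # The two options must be used together.
    if (opts.num_shards is None) != (opts.run_shard is None):
        raise ValueError("--num-shards and --run-shard must be used together")
    if opts.num_shards is not None:
        if opts.num_shards < 1:
            raise ValueError("--num-shards must be positive")
        # N must be in 0..M-1.
        if not 0 <= opts.run_shard < opts.num_shards:
            raise ValueError("--run-shard must be in 0..num-shards-1")
    return opts.num_shards, opts.run_shard
```

On a cluster, each machine would then be given the same M and a distinct N, either on the command line or through the two environment variables.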

Diff Detail

Repository
rL LLVM

Event Timeline

graydon created this revision. Jan 16 2017, 5:27 PM
ddunbar accepted this revision. Jan 17 2017, 11:37 AM

LGTM, this seems like a great idea!

utils/lit/lit/main.py
436 ↗(On Diff #84615)

Would it be better to shard in a round robin fashion? There is some tendency for tests to be clumped by where they are defined, and where they are defined to be (weakly) correlated with how long they take to run, so that would distribute long running tests across machines, which should help reduce the deviation between total testing time among shards.
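
To make the load-balancing argument concrete, here is a small illustration with made-up durations in which the slow tests are clumped at the end of discovery order. The function `shard_totals` and the numbers are hypothetical and are not taken from lit or from the patch.

```python
def shard_totals(durations, num_shards, round_robin):
    """Total runtime per shard under either assignment scheme (illustrative)."""
    if round_robin:
        # Shard n takes every num_shards-th test, starting at index n.
        shards = [durations[n::num_shards] for n in range(num_shards)]
    else:
        # Shard n takes one contiguous chunk of ceil(len / num_shards) tests.
        chunk = -(-len(durations) // num_shards)
        shards = [durations[n * chunk:(n + 1) * chunk]
                  for n in range(num_shards)]
    return [sum(shard) for shard in shards]


# Six fast tests followed by three slow ones, in discovery order.
durations = [1] * 6 + [10] * 3

print(shard_totals(durations, 3, round_robin=False))  # [3, 3, 30]
print(shard_totals(durations, 3, round_robin=True))   # [12, 12, 12]
```

With contiguous chunks one machine ends up with all of the slow tests, while round-robin spreads them evenly. Real suites will be less clear-cut than this toy input.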

This revision is now accepted and ready to land. Jan 17 2017, 11:37 AM
graydon added inline comments. Jan 17 2017, 1:48 PM
utils/lit/lit/main.py
436 ↗(On Diff #84615)

Considered it, but decided against it based on the (possibly wrong) guess that the discovery-clumping order would have better locality in terms of what test-prerequisites are built, tested, and hot-in-cache. If you think round-robin will work better overall, I'm happy to change it.

ddunbar added inline comments. Jan 17 2017, 2:01 PM
utils/lit/lit/main.py
436 ↗(On Diff #84615)

What do you mean by test prerequisites? lit currently doesn't really do any shared work on a per-test basis that could be cached.

One other advantage of the current clumping is you are more likely to get deterministic assignments to machines, which is a blessing and a curse. The blessing means you won't have weird configuration changes that users might not think to check, the curse means you are less likely to shake such things out.

I'm ok with the current patch unless you feel swayed the other direction.

graydon added inline comments. Jan 17 2017, 9:33 PM
utils/lit/lit/main.py
436 ↗(On Diff #84615)

I meant things like, say, if there is a module that gets cached as a .pcm between a bunch of tests against it, or a .dylib or .a that's only generated on-demand for running tests, there might be advantage in only running them in one spot.

I agree the likelihood of the same test running on the same machine over multiple runs might be either good or bad. My gut suggests it's better to actually have them move around some, to shake nondeterminism bugs out. Hard to say.

I just redid the code to support round-robin assignment (and fixed some bugs) and think it actually reads a bit nicer, and thinking it over I think it might be a bit more useful as a smoke-test or profiling mode for users as well (i.e. you can run --num-shards=100 --run-shard=1 to run an evenly-distributed 1% of the testsuite against a wip change). Will post revised patch once I've adjusted tests.

graydon updated this revision to Diff 84797. Jan 17 2017, 9:36 PM

Update to round-robin sharding

The previous approach would split a testsuite like [1, 2, 3, 4, 5] into 3
shards [1, 2], [3, 4], and [5]. This change will split it into 3 shards
[1, 4], [2, 5], and [3]. That is, it takes "every Nth test" rather than
"the next N tests" for each shard.
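
The two selection schemes can be sketched in a few lines of Python. This is an illustration of the behaviour described here, not the code in the diff: the function names are hypothetical, and the 0-based shard index follows the "N must be in 0..M-1" rule from the summary.

```python
def contiguous_shard(tests, num_shards, run_shard):
    """Previous approach: "the next N tests", in consecutive chunks."""
    chunk = -(-len(tests) // num_shards)  # ceiling division
    return tests[run_shard * chunk:(run_shard + 1) * chunk]


def round_robin_shard(tests, num_shards, run_shard):
    """Revised approach: "every Nth test", starting at index run_shard."""
    return tests[run_shard::num_shards]


tests = [1, 2, 3, 4, 5]
print([contiguous_shard(tests, 3, n) for n in range(3)])
# [[1, 2], [3, 4], [5]]
print([round_robin_shard(tests, 3, n) for n in range(3)])
# [[1, 4], [2, 5], [3]]
```

In both schemes every test lands in exactly one shard, so running all M shards covers the whole testsuite.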

Also fixed the tests to actually run FileCheck.

This revision was automatically updated to reflect the committed changes.