This is an archive of the discontinued LLVM Phabricator instance.

[lit] Limit parallelism of sanitizer tests on Darwin
ClosedPublic

Authored by kubamracek on Jan 6 2017, 4:27 PM.

Details

Summary

Running lit tests and unit tests of ASan and TSan on macOS has very bad performance when running with a high number of threads. This is caused by xnu (the macOS kernel), which currently doesn't handle mapping and unmapping of sanitizer shadow regions (reserved VM which are several terabytes large) very well. The situation is so bad that increasing the number of threads actually makes the total testing time larger. The macOS buildbots are affected by this. Note that we can't easily limit the number of sanitizer testing threads without affecting the rest of the tests.

This patch adds a special "group" into lit, and limits the number of concurrently running tests in this group. This helps solve the contention problem, while still allowing other tests to run in full, that means running lit with -j8 will still with 8 threads, and parallelism is only limited in sanitizer tests.

Diff Detail

Repository
rL LLVM

Event Timeline

kubamracek updated this revision to Diff 83457.Jan 6 2017, 4:27 PM
kubamracek retitled this revision from to [lit] Limit parallelism of sanitizer tests on Darwin.
kubamracek updated this object.
kubamracek set the repository for this revision to rL LLVM.
kubamracek added a project: Restricted Project.
dvyukov edited edge metadata.Jan 7 2017, 2:34 AM

LGTM

We noticed that issues with slow mmaps/munmaps on darwin as well.
Looks fine to me, but I am not very experienced with lit/python.

This revision was automatically updated to reflect the committed changes.

LLVM part landed in r292231, compiler-rt part in r292232.

MatzeB added a subscriber: MatzeB.Jan 17 2017, 10:15 AM
MatzeB added inline comments.
llvm/trunk/utils/lit/lit/main.py
329

Mentioning the sanitizer and the magic number 3 in the generic lit code seems wrong to me.
This should really be in the compiler-rt lit.site.cfg somehow.

kubamracek reopened this revision.Jan 17 2017, 10:18 AM

Mentioning the sanitizer and the magic number 3 in the generic lit code seems wrong to me.
This should really be in the compiler-rt lit.site.cfg somehow.

Right. I've reverted this (there was also a failure on the Windows bot), let me submit another patch for review.

kubamracek added a reviewer: MatzeB.

Updating patch. darwin_sanitizer_parallelism_group_func is now only used on Darwin (should fix Windows bot breakage) and the magic number 3 is moved to compiler-rt (into test/lit.common.cfg and unittests/lit.common.unit.cfg).

MatzeB edited edge metadata.Jan 18 2017, 10:41 AM

Generally looks good to me, nitpicks below.

You may consider using the word "pool" instead of "parallelism_group" to match the terminology of the ninja builder.

projects/compiler-rt/test/asan/lit.cfg
245–246 ↗(On Diff #84855)

could be

if config.host_os == 'Darwin' and config.target_arch in ["x86_64", "x86_64h"]:
projects/compiler-rt/unittests/lit.common.unit.cfg
39–41 ↗(On Diff #84855)

return "darwin-64bit-sanitizer" if "x86_64" in test.file_path else ""

utils/lit/lit/run.py
178 ↗(On Diff #84855)

This should rather be dict() or left out

182 ↗(On Diff #84855)

Supporting callbacks here feels overengineered, but if you have to...

185 ↗(On Diff #84855)

I would move the name lookup into the try: part, so that for an invalid name lit doesn't shutdown completely.

Thanks for reviewing this! I'll submit another patch.

utils/lit/lit/run.py
182 ↗(On Diff #84855)

I tried really hard not to use a callback, but unfortunately for compiler-rt unittests (gtest-style), we can't just set config.parallelism_group at lit launch time -- we only know the architecture after test discovery. I'm open to suggestions if you can think of a better way of handling this.

MatzeB added inline comments.Jan 18 2017, 11:00 AM
utils/lit/lit/run.py
182 ↗(On Diff #84855)

I don't understand exactly what is happening with the gtests here. Do you expect unittests for architectures other than config.host_os being present?
I wouldn't expect us to have infrastructure here to even run tests for targets other than the host...

kubamracek added inline comments.Jan 18 2017, 11:29 AM
utils/lit/lit/run.py
182 ↗(On Diff #84855)

We have i386 and x86_64 unittests. Finding which tests are 64-bit can only be done after test discovery.

MatzeB added inline comments.Jan 18 2017, 11:48 AM
utils/lit/lit/run.py
182 ↗(On Diff #84855)

Ah. Well, I don't have any better suggestion than the callback right now.

kubamracek updated this revision to Diff 85037.Jan 19 2017, 2:33 PM

Updating patch, addressing review comments.

kubamracek marked 8 inline comments as done.Jan 19 2017, 2:34 PM
MatzeB accepted this revision.Jan 19 2017, 4:30 PM

LGTM

This revision is now accepted and ready to land.Jan 19 2017, 4:30 PM
This revision was automatically updated to reflect the committed changes.

Thanks! LLVM part in r292548, compiler-rt part in r292549.