This is an archive of the discontinued LLVM Phabricator instance.

Clean temp directories before running lit
ClosedPublic

Authored by zturner on Jun 27 2017, 6:11 PM.

Details

Summary

Currently, lit creates a little "sandbox" for each test to use by doing two things:

  1. Creating a directory in the build folder for each test directory in the source tree.
  2. Within the folder in directory 1, creating a unique file prefix that can be appended to to create unique per-test filenames that an individual test can use.

No effort is made to clean these files up between test runs, which creates two problems.

First is that we just leak files. If a test changes, or gets removed, or moves to a different location, all the files remain in the old location. (Very) gradually this can contribute to out of disk space errors on build bots, but in general it's just bad practice.

More sinister is that leaked files from previous runs can lead to false positives in tests.

This patch makes lit clean up the output directory before each run. One existing design choice made this tricky: A tests' temporary folder is of the form <build-dir>/test/SourceDir/Output. We can't just rmtree the entire <build-dir>/test because there are some lit configuration files in there. So we would have to enumerate the top-level test directory deleting only certain ones.

To simplify this, I inverted the heirarchy, so that now it's <build-dir>/test/Output/SourceDir. this way we can just indiscriminately rmtree anything under Output without worrying about what it is.

A few tests on Windows show that the rmtree takes about 1.75 seconds with a full output tree, which is a negligible portion of the overall test suite run. On Linux I expect the rmtree to be even faster.

There is one test failure with this patch, in llvm-symbolizer.test. For some reason it's demangling symbols differently, which is quite bizarre. Still looking into this, but throwing up this patch early for high level comments.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Jun 27 2017, 6:11 PM
zturner updated this revision to Diff 104331.Jun 27 2017, 6:13 PM

Two files from an unrelated CL snuck in. Removed those.

More sinister is that leaked files from previous runs can lead to false positives in tests.

I've also seen this problem; thanks for working on a fix.

Does this always remove the entire Output directory if you're only running a subset of the tests?

More sinister is that leaked files from previous runs can lead to false positives in tests.

I've also seen this problem; thanks for working on a fix.

Does this always remove the entire Output directory if you're only running a subset of the tests?

No, I actually tried to be smart about it. The following pseudocode describes the algorithm used

paths = list(unique_set(<all output paths>))
sort(paths, by number of components in path ascending)
for (path in paths) {
  if exists(path)
    rmtree(path)
  mkdir(path)
}

the unique'ifying is because you have have A/foo.test and A/bar.test which are both in A. Sorting by number of components in path is because otherwise you might do something like

rmtree("foo/bar")
mkdir("foo/bar")
rmtree("foo")
mkdir("foo")

and then foo/bar is gone too. So we rmtree the highest level folder that must be rmtree'd first, and we only ever remove trees that are necessary to remove based on the subset of tests you're running.

More sinister is that leaked files from previous runs can lead to false positives in tests.

I've also seen this problem; thanks for working on a fix.

Does this always remove the entire Output directory if you're only running a subset of the tests?

No, I actually tried to be smart about it. The following pseudocode describes the algorithm used

paths = list(unique_set(<all output paths>))
sort(paths, by number of components in path ascending)
for (path in paths) {
  if exists(path)
    rmtree(path)
  mkdir(path)
}

the unique'ifying is because you have have A/foo.test and A/bar.test which are both in A. Sorting by number of components in path is because otherwise you might do something like

rmtree("foo/bar")
mkdir("foo/bar")
rmtree("foo")
mkdir("foo")

and then foo/bar is gone too. So we rmtree the highest level folder that must be rmtree'd first, and we only ever remove trees that are necessary to remove based on the subset of tests you're running.

Sounds great.

rnk edited edge metadata.Jun 28 2017, 9:27 AM

No, I actually tried to be smart about it. The following pseudocode describes the algorithm used

Nice, that's really cool. I figured we'd just pay 1.7s the first time we remove a test tree after 'check-all', but then pay less after running just llvm/test/MyAwesomeSuite because there were less output files. This is better, though. :)

rnk added inline comments.Jun 28 2017, 10:16 AM
llvm/utils/lit/lit/run.py
254 ↗(On Diff #104331)

The time module is already available at global scope

260 ↗(On Diff #104331)

Can you add a comment, something like "Sort the list by number of path components so that parent directories come before their children."

264 ↗(On Diff #104331)

Let's just import shutil at global scope and do shutil.rmtree.

271 ↗(On Diff #104331)

Should we print this at all? If we decide we want it, should we print with litConfig.note()?

zturner updated this revision to Diff 104529.Jun 28 2017, 3:30 PM

Updated with fixes from dblaikie. All tests pass now.

zturner updated this revision to Diff 104672.Jun 29 2017, 10:08 AM

Update with suggestions from rnk. With this patch all tests pass.

rnk accepted this revision.Jun 29 2017, 3:42 PM

Code looks good to me as is. Improving the symbolizer test seems like something we can handle as needed or later.

This revision is now accepted and ready to land.Jun 29 2017, 3:42 PM
This revision was automatically updated to reflect the committed changes.