This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] [test] Avoid race conditions between tests regarding temp directories
ClosedPublic

Authored by mstorsjo on Mar 16 2021, 6:07 AM.

Details

Summary

Prior to e0d01294bc124211a8ffb55e69162eb34a242680, all tests used a
random directory name, but now it is deterministic, based on the
test name. This change was done under the assumption that the filename
portion of the cwd is unique across tests that use the filesystem
test temporary directories.

When running tests locally, the cwd of the test is something like
"<build-dir>/test/<test path>/Output/copy_assign.pass.cpp.dir",
and the filename portion, "copy_assign.pass.cpp.dir", is used as
base for the temp directory names.

The change noted that there's a risk for race conditions if multiple
threads within one test try to create temp directories in parallel, but
that doesn't really happen in practice.

However, if running tests with a large number of parallel workers,
multiple tests with the same filename portion, e.g. "copy_assign.pass.cpp.dir",
can run in parallel, leading to race conditions across processes.

Therefore, add a hash of the full cwd to distinguish such cases
from each other.

Secondly, don't use two separate levels of temporary directories
(<base>/static_env.0). When cleaning up, only the individual
directory is removed, leaving the empty intermediate directory
behind littering the temp directory.

Diff Detail

Event Timeline

mstorsjo requested review of this revision.Mar 16 2021, 6:07 AM
mstorsjo created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 6:07 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.
libcxx/test/support/filesystem_test_helper.h
324–331

How about (suggested edit)? My logic is that if this piece of code ever produces a collision EVER, then we need a programmer to revisit this piece of code; there's no sense in trying to make the code "work around" a collision, because the whole point is that collisions shouldn't happen.

However, I still don't understand why there's a collision here in the first place. You say "As some multiple test subdirectories contain tests with the same file names..." but shouldn't we just be using the entire (unique) path, then? Why (and where) are we throwing away the parts of the path that aren't the filename?

Also, is it possible that we could still have collisions when the same llvm-lit process spawns two copies of the same test (does it ever do that? idk) or when two different llvm-lit processes spawn the same test at the same time (do we support running two llvm-lit runs simultaneously? idk)

mstorsjo added inline comments.Mar 16 2021, 7:26 AM
libcxx/test/support/filesystem_test_helper.h
324–331

No, we can't just add an incrementing integer and forcibly require it to be unique - it's a race condition.

Originally this used random names and all was well - but to simplify building in environments without randomness available, it was chosen to use cwd.filename() as uniquefier, under the assumption that cwd.filename() is unique within the set of tests that use these helpers.

At this point, cwd would be something like <build-dir>/test/std/input.output/filesystems/class.directory_iterator/directory_iterator.members/Output/copy_assign.pass.cpp.dir. The assumption made in the commit that removed randomness was that cwd.filename(), i.e. copy_assign.pass.cpp.dir would be unique (i.e. the only possible race condition is if there's multiple instances in parallel within the same test), but in this case, there's 3 copy_ssign.pass.cpp under test/std/input.output/filesystems.

Including the full path from cwd on top of tmp is a huge mess (also see D98702 about an issue with leftover intermediate directories - we don't want to create a deep tree of possibly leftover intermediate directoreis).

Adding a small bit of randomness should guard against the few testcases with similar names hitting the same dir at the same time.

The alternative woule perhaps be to create the temp dir tree directly under cwd instead of under the system temp root - maybe that would be ok? That would at least avoid littering that dir excessively. But maybe there's other reasons why the filesystem tests wants to keep these temp files outside of the normal test file tree?

(One run of llvm-lit normally doesn't launch the same test twice afaik, but two runs of llvm-lit in parallel on the same tree would trigger the same issues. I don't think we explicitly forbid being able to run two instances of llvm-lit with overlapping sets of tests at the same time - being able to do that sounds handy to me.)

libcxx/test/support/filesystem_test_helper.h
324–331

it's a race condition

Do you mean because the integer isn't atomic? I thought of that, but I figured that incrementing an unsigned int is at least vastly more likely to be safe in practice than calling operator() on an mt19937, which is what you're proposing to do in this patch right now.

Would it make you feel better to stick a std::atomic around that unsigned (on platforms that support atomic)? And/or have I misunderstood the race you're worried about?

Including the full path from cwd on top of tmp is a huge mess

Is it, though? Can't you just replace('/', '-') (and maybe also replace('\\', '-') for Windows) to turn any full path into a single suitable directory name? Or is the problem that it quickly becomes longer than PATH_MAX? Could we take the hash of the whole path name, using SHA1 if we've got it handy, or std::hash<std::string> if we haven't?

mstorsjo added inline comments.Mar 16 2021, 8:39 AM
libcxx/test/support/filesystem_test_helper.h
324–331

No, there's no race within the process (unless tests use this in threads, but I don't think any test do) so atomicity doesn't matter, it's races between processes that is the issue.

I guess a hash of the cwd could work for deterministically distinguishing between cases without creating an unwieldy path though...

Just curious but can't we use tmpnam?

Just curious but can't we use tmpnam?

Maybe - that relies on some sort of randomness I guess, and I dunno if it exists in the restricted environments where one would build with random device disabled. On the other hand, the test environment probably still is fully featured.

tmpnam itself seems to be kinda discouraged/deprecated though...

libcxx/test/support/filesystem_test_helper.h
324–331

Okay, so how about

fs::path const cwd = utils::getcwd();
fs::path const tmp = fs::temp_directory_path();
std::string base = cwd.filename().string();
size_t i = std::hash<std::string>()(cwd.string());
fs::path p = tmp / (base + "-static_env." + std::to_string(i));
while (utils::exists(p.string())) { ...
mstorsjo updated this revision to Diff 331177.Mar 17 2021, 1:42 AM
mstorsjo retitled this revision from [libcxx] [test] Readd randomness to temporary directory names to [libcxx] [test] Avoid race conditions between tests regarding temp directories.
mstorsjo edited the summary of this revision. (Show Details)

Squashed the two changes, using a hash of the full cwd instead of random chars.

mstorsjo updated this revision to Diff 331203.Mar 17 2021, 3:31 AM

Reupload to rerun CI (with the preceding patch detached in phabricator so it won't be applied here)

curdeius accepted this revision as: curdeius.Mar 19 2021, 12:55 AM
curdeius added a subscriber: curdeius.

LGTM.
It might come handy indeed, I also sometimes run lit in parallel from two different directories (e.g. one for clang and one for gcc), but haven't encountered this particular problem.

Quuxplusone accepted this revision.Mar 19 2021, 6:41 AM

LGTM at this point.

This revision is now accepted and ready to land.Mar 19 2021, 6:41 AM