This is an archive of the discontinued LLVM Phabricator instance.

[libc] Resolve race condition in sub-process test runner.
ClosedPublic

Authored by PaulkaToast on Mar 16 2020, 9:55 PM.

Details

Summary

There seems to be a race condition between the pipe closing and the child process death. Likely these two events are not atomic on some versions of linux.

With the removal of WNOHANG we eliminate the race condition, however if the child closes the pipe intentionally then it could result in the test runner hanging. I find this situation less likely, where as I experience failures locally with this race condition rather consistently.

Diff Detail

Event Timeline

PaulkaToast created this revision.Mar 16 2020, 9:55 PM
sivachandra marked an inline comment as done.Mar 16 2020, 10:15 PM
sivachandra added inline comments.
libc/utils/testutils/ExecuteFunctionUnix.cpp
73

At this point, this assert is probably pointless.

sivachandra accepted this revision.Mar 16 2020, 10:16 PM
This revision is now accepted and ready to land.Mar 16 2020, 10:16 PM
abrachet accepted this revision.Mar 16 2020, 10:40 PM

LGTM. Sorry about that, I never caught this failing so thanks for catching it!

however if the child closes the pipe intentionally then it could result in the test runner hanging. I find this situation less likely, where as I experience failures locally with this race condition rather consistently.

Agreed I don't think the death test would run ::close(4), and this is a more controlled environment so I think it is fine. If we want we can do the more traditional pattern of setting an alarm and sigwaiting for either SIGALRM or SIGCHLD, but I find signals very hacky.

sivachandra marked an inline comment as done.Mar 17 2020, 10:58 AM
sivachandra added inline comments.
libc/utils/testutils/ExecuteFunctionUnix.cpp
70–75

BTW, it is worth adding a comment here that Pid will not be recycled until the call to waitpid and so waiting on it is safe even after the process actually exits. Every time I read code like this, I end up having to Google for Pid recycling rules.

Add comment explaining why we wait on pid.

abrachet accepted this revision.Mar 17 2020, 1:27 PM
This revision was automatically updated to reflect the committed changes.