This is an archive of the discontinued LLVM Phabricator instance.

Enhance test runner timeout logic to detect successful completion of spawned process when stdout/stderr are shared to still-existing child processes
AbandonedPublic

Authored by tfiala on Oct 1 2015, 1:12 PM.

Details

Reviewers
zturner
labath
Summary

This makes one of the internal process_control tests go green. It covers the case where spawned process P1 itself spawns a child process C1, shared stdout/stderr file handles with C1, and then P1 terminates.

Prior to this change, we would end up timing out rather than detecting the immediate termination of P1 because we would wait for the stdout/stderr file handles to wrap up.

Now we wait on a condition variable that is set via a thread parked on subprocess.Popen.wait() and another on subprocess.Popen.communicate(). This allows us to catch the scenario above. There's an additional thread (for the thread parked on wait()). Both the wait() and the communicate() threads wait efficiently, so this should be minimal cost.

Diff Detail

Event Timeline

tfiala updated this revision to Diff 36287.Oct 1 2015, 1:12 PM
tfiala retitled this revision from to Enhance test runner timeout logic to detect successful completion of spawned process when stdout/stderr are shared to still-existing child processes.
tfiala updated this object.
tfiala added reviewers: labath, zturner.
tfiala added a subscriber: lldb-commits.
tfiala added a comment.Oct 1 2015, 3:48 PM

Hmm, I'm getting a few more tests to time out now with this change:

TIMEOUT: LLDB (suite) :: TestCallStopAndContinue.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestCommandScript.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestFrames.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestInlineStepping.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestReturnValue.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestStepNoDebug.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestThreadAPI.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestThreadJump.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestThreadStepping.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)
TIMEOUT: LLDB (suite) :: TestValueMD5Crash.py (Linux rad 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64 x86_64)

I'll need to drill into that.

tfiala added a comment.Oct 1 2015, 5:07 PM

On OS X the only timeout I'm getting is:

TIMEOUT: LLDB (suite) :: TestValueObjectRecursion.py (Darwin tfiala-mbp-01.local 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:45 PDT 2015; root:xnu-3247.10.11~1/DEVELOPMENT_X86_64 x86_64 i386)

tfiala added a comment.Oct 1 2015, 6:05 PM

The OS X one seems to be related to load (not terribly surprising). I'm seeing timeouts as I crank up the --threads. But I don't think that's the whole story. On the Linux side, I've been cranking down the threads and I'm still seeing it.

In light of the more fundamental issue I just filed here:
https://llvm.org/bugs/show_bug.cgi?id=25019

I'm going to address that piece first (upstream) before I take this part on here. That part needs to be straightened out first. The then I'll come back here.

labath edited edge metadata.Oct 2 2015, 6:51 AM

The OS X one seems to be related to load (not terribly surprising). I'm seeing timeouts as I crank up the --threads. But I don't think that's the whole story. On the Linux side, I've been cranking down the threads and I'm still seeing it.

In light of the more fundamental issue I just filed here:
https://llvm.org/bugs/show_bug.cgi?id=25019

I'm going to address that piece first (upstream) before I take this part on here. That part needs to be straightened out first. The then I'll come back here.

I think that is a reasonable course of action. Having three threads servicing the same process seems wasteful/hard to maintain and has a very big race potential.

tfiala added a comment.Oct 2 2015, 8:31 AM

I think that is a reasonable course of action. Having three threads servicing the same process seems wasteful/hard to maintain and has a very big race potential.

Yeah I'm not liking how complex this is getting.

I'll have more comments once I address the other issue depending on how the solution falls out.

tfiala abandoned this revision.Oct 2 2015, 2:39 PM

I've fixed:
https://llvm.org/bugs/show_bug.cgi?id=25019

I think for now I am not interested in trying to tackle the intent of this change as it unduly complicates the timeout detection logic.

I am okay with saying:
"If you run a process P1, and that process creates child processes C1..CN, and shares the stdout/stderr file handles from P1 to C1..CN, and if P1 exits, we don't detect the exit until all stdout/stderr handles shared with C1..CN are closed." That's just a bad test if it is leaving children around. It will time out.

Addressing that in Python should be possible, as I was working towards here, but I don't see that as being worthwhile. If we didn't time out (as was the case prior to an earlier fix yesterday), that would be an issue. But that's no longer the case.

labath added a comment.Oct 5 2015, 1:55 AM

I've fixed:
https://llvm.org/bugs/show_bug.cgi?id=25019

I think for now I am not interested in trying to tackle the intent of this change as it unduly complicates the timeout detection logic.

I am okay with saying:
"If you run a process P1, and that process creates child processes C1..CN, and shares the stdout/stderr file handles from P1 to C1..CN, and if P1 exits, we don't detect the exit until all stdout/stderr handles shared with C1..CN are closed." That's just a bad test if it is leaving children around. It will time out.

Sounds good to me. This motivates people to write correct tests, which I think is good. My main concern was not leaving those children around after we time out, which I believe you fixed already.

I've fixed:
https://llvm.org/bugs/show_bug.cgi?id=25019

I think for now I am not interested in trying to tackle the intent of this change as it unduly complicates the timeout detection logic.

I am okay with saying:
"If you run a process P1, and that process creates child processes C1..CN, and shares the stdout/stderr file handles from P1 to C1..CN, and if P1 exits, we don't detect the exit until all stdout/stderr handles shared with C1..CN are closed." That's just a bad test if it is leaving children around. It will time out.

Sounds good to me. This motivates people to write correct tests, which I think is good. My main concern was not leaving those children around after we time out, which I believe you fixed already.

Yep, great thanks!

I just closed the bugzilla ticket for this feature as WONTFIX. If we really want this later, we can revisit.