This is an archive of the discontinued LLVM Phabricator instance.

Add ctrl-c support to parallel dotest.py.
ClosedPublic

Authored by tfiala on Sep 4 2015, 3:21 PM.

Details

Reviewers
zturner
clayborg
Summary

This change does the following:

  • If Ctrl-C is hit once, the parallel dotest.py runner will stop any further work from being started, but will wait for the existing tests in progress to complete.
  • If Ctrl-C is hit a second time, the parallel dotest.py will kill the process trees for the child worker threads and the inferior dotest.py processes they spawned.

In both cases, the report infrastructure is left intact to report on whatever work was completed before it stopped. Thus, if 42 tests ran when the test was keyboard interrupted, the results for those 42 tests will be displayed.

See https://llvm.org/bugs/show_bug.cgi?id=24709

Diff Detail

Event Timeline

tfiala updated this revision to Diff 34084.Sep 4 2015, 3:21 PM
tfiala retitled this revision from to Add ctrl-c support to parallel dotest.py..
tfiala updated this object.
tfiala added reviewers: clayborg, zturner.
tfiala added a subscriber: lldb-commits.
zturner edited edge metadata.Sep 4 2015, 3:32 PM

Ctrl+C once doesn't work on Windows with this patch. It seems to continue starting up new processes, and after all of them are done, I'm left with the original set of Python processes not doing any work, just sitting there. If I Ctrl+C again everything does.

Note that when I press the first Ctrl+C, I don't even get the print that says First Keyboard Interrupt received.

clayborg accepted this revision.Sep 4 2015, 3:35 PM
clayborg edited edge metadata.

Looks fine. Zach, do you need to make some modifications for Windows? Or should we just check this in and you can fix windows with a separate patch?

This revision is now accepted and ready to land.Sep 4 2015, 3:35 PM
zturner added a subscriber: zturner.Sep 4 2015, 3:37 PM

I can try to put a little time into figuring out what's wrong. I'd rather
it not go in broken though, so let me take a look first and we can figure
out what to do after that.

tfiala added a comment.Sep 4 2015, 3:38 PM

Ctrl+C once doesn't work on Windows with this patch. It seems to continue starting up new processes, and after all of them are done, I'm left with the original set of Python processes not doing any work, just sitting there. If I Ctrl+C again everything does.

Okay. So its like the first Ctrl-C is essentially not even handled, but the second Ctrl-C does the nuclear option and kills everything. Is that right?

Note that when I press the first Ctrl+C, I don't even get the print that says First Keyboard Interrupt received.

That kinda sounds like these lines might be suspect:

# Shut off interrupt handling in the child process.
signal.signal(signal.SIGINT, signal.SIG_IGN)

(at line 224-225). If they're not doing anything on Windows, then it is possible that the Ctrl-C could get gobbled by one of the child multiprocessing workers (?)

I'll dig around and see if there's something different about how Windows handles disabling Ctrl-C. (If you know anything there, that'd be helpful as well).

Worst case we could check os type and skip the soft kill on NT?

tfiala added a comment.Sep 4 2015, 3:38 PM

I can try to put a little time into figuring out what's wrong. I'd rather
it not go in broken though, so let me take a look first and we can figure
out what to do after that.

Yep absolutely.

Hmm, yea the problem is worse than I thought. Even if I don't do Ctrl+C at
all (I just run the test suite like I always have and let it finish) it's
just leaving a bunch of stale processes at the end without them shutting
down on their own.

I believe that python is also weird in that if you want to catch a CTRL+C it has to be executing python code (not in a python code the entered C/C++ code) otherwise the CTRL+C might not get to the python itself. This might be only a problem in the embedded python interpreter though so I could be wrong on this.

Were you running dosep.py on Windows before as the way to run your test suite? Or were you running dotest.py (no multi-threading)?

tfiala added a comment.Sep 4 2015, 3:54 PM

Hmm, yea the problem is worse than I thought. Even if I don't do Ctrl+C at
all (I just run the test suite like I always have and let it finish) it's
just leaving a bunch of stale processes at the end without them shutting
down on their own.

Ok, and it doesn't do that prior to the patch? (i.e. this isn't just a case of the processes slowly falling off at the end?)

There's a point where all the worker processes are joined, which should mean they're done. So either the join is failing to complete (maybe because something in the process isn't wrapping up right), or the join isn't causing them to die.

Is the initial test runner itself finishing up?

Correct, it doesn't do that prior to the patch. It looks like it's never
exiting this loop:

try:
    for worker in workers:
        worker.join()
except:

either when a Ctrl+C happens, or when all the processes finish. I guess
it's stuck in there for some reason. So even the test runner doesn't
finish. Still investigating.

Also to answer Greg's question, yes we were using dosep before. The
problem seems to be related to swtiching from multiprocess.Pool to doing
the pumping manually.

tfiala added a comment.Sep 4 2015, 4:00 PM

Correct, it doesn't do that prior to the patch. It looks like it's never
exiting this loop:

try:
    for worker in workers:
        worker.join()
except:

either when a Ctrl+C happens, or when all the processes finish. I guess
it's stuck in there for some reason. So even the test runner doesn't
finish. Still investigating.

Okay let me go ahead and make a few changes that might help and resend the patch. Back in a few...

tfiala updated this revision to Diff 34089.Sep 4 2015, 4:27 PM
tfiala edited edge metadata.

Specify tight queue sizes for the job description queue and the job results queue.

This *might* stop unintentional queue blocking when adding items to the queue if they were sized too small by default. Might be a possible difference between Windows and OS X/Linux.

tfiala added a comment.Sep 4 2015, 4:29 PM

Hey Zachary,

Can you give the latest patch a try? That might stop some unintentional blockage as adding items to those two queues would be blocking.

I'm going to be relocating home in a minute, but the other thing I'm going to try is to go back to the multiprocessing.Pool() and see if the Ctrl-C woes that others hit are maybe somehow not an issue for us. If that's the case, then maybe the Pool() piece will take care of whatever isn't happy on the Windows side with the manual pooling.

Tried out this patch, unfortunately I'm seeing the same thing. The very
first call to worker.join() is never returning.

It's unfortunate that it's so hard to debug this stuff, do you have any
suggestions for how I can try to nail down what the child dotest instance
is actually doing? I wonder if it's blocking somewhere in its script, or
if this is some quirk of the multiprocessing library's dynamic invocation /
whatever magic is does.

How much of an effort would it be to make the switch to threads now? The
main thing we'd have to do is get rid of all of the globals in dotest, and
make a DoTest class or something.

As a last resort, you can bring back the pool.map logic behind an OS check,
and use the multiprocess.Process logic for other platforms. But it would
be great to have less platform specific branches and logic, not more.

I'm willing to bet you could reproduce this on a Windows VM with a trash
script that all it does is create some processes using the same structure
of loop you're using here. Not sure if that's an option for you. I don't
mind the back and forth diagnostics either, just whatever is easiest for
you.

tfiala added a comment.Sep 4 2015, 5:10 PM

Tried out this patch, unfortunately I'm seeing the same thing. The very
first call to worker.join() is never returning.

It's unfortunate that it's so hard to debug this stuff, do you have any
suggestions for how I can try to nail down what the child dotest instance
is actually doing? I wonder if it's blocking somewhere in its script, or
if this is some quirk of the multiprocessing library's dynamic invocation /
whatever magic is does.

How much of an effort would it be to make the switch to threads now? The
main thing we'd have to do is get rid of all of the globals in dotest, and
make a DoTest class or something.

It's a bit more work than I want to take on right now. I think we really may want to keep the multiprocessing and just not exec out to dotest.py for a third-ish time for each inferior.

As a last resort, you can bring back the pool.map logic behind an OS check,
and use the multiprocess.Process logic for other platforms. But it would
be great to have less platform specific branches and logic, not more.

I'm willing to bet you could reproduce this on a Windows VM with a trash
script that all it does is create some processes using the same structure
of loop you're using here. Not sure if that's an option for you. I don't
mind the back and forth diagnostics either, just whatever is easiest for
you.

Yeah I will have to get a Windows LLDB setup going again. I've got one at home so I can try with that in the near future here. For the moment I'm going to throw up one with multiprocessing.Pool again and throw it over the wall after I make sure Ctrl-C is working with it here (OS X and Linux).

I will try to get the Pool-based version going now. Back as soon as I have that!

tfiala updated this revision to Diff 34112.EditedSep 5 2015, 12:38 PM
tfiala added a subscriber: dawn.

The latest patch does the following:

  • adds a test runner strategy layer for the isolated, concurrent test support (in dosep.py).

The currently-supported test runners are:

  1. multiprocessing

This is a multiprocessing-based parallel test runner that supports ctrl-c through rolling its own worker pool. This one is known to currently have an issue on Windows.

  1. multiprocessing-pool

This one is exactly like what previously existed - multiprocessing based using multiprocessing.Pool() to do the work. It does not handle Ctrl-C at all (and performs poorly, just like before, when the user does press Ctrl-C).

  1. serial

This one is the same test runner used that does no concurrent test running but does use subprocess to provide process isolation to each of the test suite files.

It is not expected that a user specify test-runner-name: sane defaults are provided. The defaults used when no test runner name is given are:

  • Use serial if the number of threads determined to be used is 1. (Likely due to the --threads argument on modern hardware).
  • If the os.name is 'nt', use multiprocessing-pool.
  • Otherwise, use multiprocessing.

@zturner - this should provide the net result of 'no behavior change' on Windows. If you have a chance to verify that with no special options, that'd be great.

Note I plan to make one more change to add in the threading-based test runner before I'm totally done here, so this is just WIP. Zachary, if you want to wait for the "threading" test runner, that seems reasonable. I hope to have that diff in later today.

@dawn, I did add in the logic that says 'if you specify -v and using the dosep-style test runner, assume --output-on-success is specified.

That last patch also fixes a regression in the previous patch when --threads 1 was specified --- the serial test runner in dosep.py was busted and I missed that.

I won't be able to test this until Tuesday, but thanks for working on it

tfiala added a comment.Sep 5 2015, 3:18 PM

I won't be able to test this until Tuesday, but thanks for working on it

No worries. Have a great weekend!

(And maybe I'll have a working Windows setup before then).

tfiala updated this revision to Diff 34114.Sep 5 2015, 4:11 PM

This change adds the threading-module-based parallel test runner strategy. It is not selected by default. It is pool-based.

The threading-pool and multiprocessing-pool strategies are 19.8% slower than the newer hand-wrapped ctrl-c-supporting, hand-wrapped multiprocessing pool approach. I'm going to keep the multiprocessing strategy turned on for everything except Windows NT.

I may later try to hand-wrap the pooling on the threading side as well, since it looks like the slowdown may actually be the Pool provided by multiprocessing.pool (which we were always using, so the multiprocessing test runner strategy represents a potential 20% win for same-class Linux boxes vs. the previous approach currently at top of tree). These numbers were on a 24-core Linux box using 24 processes/threads (the defaults for the box).

If I get this working on a Windows box with the current defaults, I'll check it in.

tfiala updated this revision to Diff 34186.Sep 7 2015, 9:09 PM

Adds "threading" test-runner strategy, which mirrors the "multithreading" runner in terms of supporting Ctrl-C and hand-rolling the worker model. Like multiprocessing over multiprocessing-pool, threading outperforms threading-pool.

Both threading and multiprocessing operate at roughly the same speed (within 10s of ms) on a 24 core Linux box. Both these two test runners are substantially faster than the versions that use the Pool class --- the Pool-class versions are 20-25% slower than the hand-rolled worker pool versions, likely due to the more even on-demand scheduling and coarse grain and uneven nature of our tasks being run.

tfiala added a comment.EditedSep 7 2015, 9:38 PM

@zturner, at this point you should be able to run this and see no change on Windows (assuming I did the os check correctly). The Windows test runner is set to be the previous multithreading-pool strategy. For everyone else, they'll get the multiprocessing strategy by default that supports Ctrl-C.

You can also check the threading-pool and the threading strategy to see if you get any speedup on Windows. The only speed difference I saw was that the -pool versions were slower than the non-pool versions, but otherwise no wall-clock difference in the threading vs. multiprocessing versions. I saw a slight (~1.4%) reduction in system time on the threading vs. multiprocessing, but that didn't translate to an overall speedup. Also, it is more difficult to get the Ctrl-C behavior correct in the threading vs. multiprocessing scenario. So unless there was a specific reason to want to use threading over multiprocessing (or the perf difference was significant on some other OS), I'd stick with multiprocessing-based.

On OS X there was at least one test that would hang each run, so I didn't get any kind of real timing numbers there since everything was always maxed out by the hanging test. We'll definitely stick with the multiprocessing test runner until we have some data that suggests a worthy reason for changing.

I'd like to get this in as soon as this is working on your end, Zachary, as I have other changes backed up behind it.

Thanks!

Adrian, can you verify this in the morning? Basically just trying to
ensure that ninja check-lldb still works as it did before. There's a
chance I'm going to be OOO tomorrow (or at the very best late) due to
something unexpected.

After applying the patch, I get three additional test case failures on Windows. I'm trying to figure out now which ones.

Is there something specific I'm supposed to be trying, with regard to Ctrl+C itself.

If the patch is busted, pretty much every test should start failing. As
long as ninja check-lldb actually runs, completes, and behaves pretty much
the same way as before, that's pretty much all that needs to be verified.

Ctrl+C shouldn't behave any differently with this patch, the idea was to
refactor it so that on Windows it would be identical to pre-patch, but on
other platforms there would be improved Ctrl+C handling

Ninja check-lldb works.

But...

Before the patch, there are 53 failing test suites and 71 failing test cases.
After the patch there are 54 failing test suites and 74 failing test cases.

Diffing the list of failed suites at the end of the output shows only one difference: TestCPPBreakpoints.py is listed twice after the patch.

That may be unrelated to the patch, but it seems consistent.

Can you confirm with 2 runs before and 2 runs after the patch that you see
the same results every time?

Todd, can you think of a reason why this might happen? I don't think it's
worth holding the patch up too long over this because I want to see the
other improvements come through, but if you can think of an obvious cause
for this Todd (or if you feel like looking into it before I get back
tomorrow Adrian) we could probably fix it before it goes in. Either In any
case, I'm willing to let it go in this way and fix it tomorrow if necessary.

I'm re-running the tests now, but I agree this is probably not worth holding up this patch.

tfiala added a comment.Sep 8 2015, 1:46 PM

Can you confirm with 2 runs before and 2 runs after the patch that you see
the same results every time?

Todd, can you think of a reason why this might happen?

Hmm, I vaguely recall finding what looked like a bug in the pre-change code where we might have been losing a count on something, so we might be counting something we previously didn't. But I went through so much code on this in the last 72 hours that I might be misremembering now.

What I can say is that if we're miscounting something (possibly newly so), it shouldn't be too hard to find and fix when we list everything. (i.e. if we're counting something, we should be able to say what it was that was counted, and if there's something being misrepresented, that should be relatively easy to track down).

worth holding the patch up too long over this because I want to see the
other improvements come through, but if you can think of an obvious cause
for this Todd (or if you feel like looking into it before I get back
tomorrow Adrian) we could probably fix it before it goes in. Either In any
case, I'm willing to let it go in this way and fix it tomorrow if necessary.

That would be great.

Let me know when you're comfortable with it going in, Zachary. And thanks for trying it, Adrian!

Diffing the list of failed suites at the end of the output shows only one difference: TestCPPBreakpoints.py is listed twice after the patch.

We should be able to hunt that down. What is your command line?

It seems to be general flakiness. In three runs without the patch, I got the lower numbers every time. In three runs with the patch, I got higher numbers twice and the lower numbers once.

I have no objection to this patch based on this.

Cool, lgtm as well. Sorry for the holdup

tfiala added a comment.Sep 8 2015, 2:25 PM

Cool, lgtm as well. Sorry for the holdup

Absolutely no worries. Thanks for checking, Zachary!

It would also be good if we could get either the --test-runner-name with "threading" or "mulltiprocessing" working on Windows at some point (i.e. look into that original failure when ctrl-c was added), if for no other reason than I suspect you'd get a performance win on Windows based on other platforms. You might also find, even if you're stuck with using the pool implementation, that "threading-pool" might just be faster than "multiprocessing-pool" on Windows. The threading-pool test runner strategy should be identical in behavior on Windows to the multiprocessing-pool strategy that you're using over there, with the diff of using "threading" rather than the "multiprocessing" module for the underlying implementation.

I'll get this checked in. Thanks!

tfiala closed this revision.Sep 8 2015, 3:24 PM

Committed here:

Sending        test/dosep.py
Sending        test/dotest.py
Sending        test/dotest_args.py
Transmitting file data ...
Committed revision 247084.
tfiala added a comment.Sep 8 2015, 4:06 PM

On OS X there was at least one test that would hang each run, so I didn't get any kind of real timing numbers there since everything was always maxed out by the hanging test.

Well wonders abound. On my OS X setup, I am not seeing the hanging tests with TOT from mid day. And, surprisingly, the '--test-runner-name threading' is no longer grabbing the parallel test runner's Python global interpreter lock on the exec. This means the threads are really working in parallel as originally intended last year. This might be related to changes since OS X 10.9 when I was last messing with this. The better news is that, unlike Linux, the 'threading' test runner *is* a speed improvement over the 'multiprocessing' test runner: on a 4 core MBP mid 2013 with 8 GB RAM using 8 threads, I shaved off 16% (~1.2 times speedup) on my runtime using 'threading' over 'multiprocessing'.

I'm not sure if this is related to the version of OS X and the bundled Python, so I may do some experimentation before considering flipping it to default to 'threading' for OS X.