This is an archive of the discontinued LLVM Phabricator instance.

Roll dosep.py parallel test runner into dotest.py command line
ClosedPublic

Authored by tfiala on Sep 2 2015, 10:12 PM.

Details

Reviewers
zturner
clayborg
Summary

This is a change discussed in http://reviews.llvm.org/D12416.

This change does the following:

  • Removes the need to call dosep.py directly. Indeed, calling dosep.py directly now errors out (intentionally).
  • Moves the arguments that dosep.py used to take into a Parallel execution options group of the dotest.py options.
  • dotest.py now takes an optional --no-multiprocess option to shut off the multiprocess test runner. If not specified (i.e. the default), the test runner runs it in the same manner as dosep.py would have run it.
  • The unnamed option (trailing argument) that dosep.py took was a test-dir-relative subdir to limit multiprocess test traversal within. This has been moved to the dotest.py --test-subdir option since dotest.py already handled unnamed options for something else.
  • cmake and configure-based builds have been updated to use dotest.py or dotest.py --no-multiprocess in place of calling dosep.py.

I intend to do further cleanup here as follow up work, but I wanted to get the outside experience modified to "multiprocess by default, one interface" to the outside user.

Diff Detail

Event Timeline

tfiala updated this revision to Diff 33901.EditedSep 2 2015, 10:12 PM
tfiala retitled this revision from to Roll desep.py parallel test runner into dotest.py command line.
tfiala updated this object.
tfiala added a reviewer: clayborg.
tfiala added a subscriber: lldb-commits.

Removes the need to call dosep.py directly. Indeed, calling dosep.py now errors out (intentionally).

Calling it *directly* errors out. dotest.py still uses it under the covers at the moment, which is why I haven't removed it.

tfiala updated this object.Sep 2 2015, 10:15 PM
tfiala retitled this revision from Roll desep.py parallel test runner into dotest.py command line to Roll dosep.py parallel test runner into dotest.py command line.

Are there any plans to actually just kill dosep.py at some point? I know you mentioned you want to do further cleanup, so I'm wondering if this is just an incremental step towards a dosep'less world.

I'll review this in more detail tomorrow, but please hold off on comitting until I get a chance to test everything on Windows and make sure everything still works. (I assume other people will want to do the same for their platforms)

tfiala added a subscriber: tfiala.Sep 2 2015, 11:23 PM

Yep, all good. I was planning on waiting for feedback first.

I've checked on Linux but don't have a Windows box available at the moment.

I very much like the direction this is going in. We will need to update our buildbots when this lands (we need to run dosep manually to setup remote testing), but I have tested this manually and didn't see any problems.

test/dosep.py
503

You should print a \n after this message.

tfiala added a comment.Sep 3 2015, 7:36 AM

I very much like the direction this is going in. We will need to update our buildbots when this lands (we need to run dosep manually to setup remote testing), but I have tested this manually and didn't see any problems.

Good catch, indeed!

tfiala added inline comments.Sep 3 2015, 7:38 AM
test/dosep.py
503

Yes, good catch.

tfiala updated this revision to Diff 33951.Sep 3 2015, 9:26 AM

Added newline to dosep.py error sent via sys.stderr.write() when called directly.

tfiala added a comment.Sep 3 2015, 9:32 AM

Also, I should mention this was tested on OS X in addition to Linux.

Let me know when you get a chance to run it on Windows, Zachary.

Are there any plans to actually just kill dosep.py at some point?

Hmm not seeing my response to this from my phone last night, so at the cost of potentially repeating:
Absolutely. I think there is a fair amount of cleanup that could happen to the test running code. I'd expect later to see the dosep.py functionality turn into an LLDBTestRunner class or something similar, with a concrete impl for the single-serialized running and a separate one for multithreaded running. There are many areas that could be cleaned up (both in terms of code organization and things like pep8/pylint --- a few of which I took care of in this change but the majority of which I left unchanged).

My goal for this particular change was to be minimally different at the implementation level to simplify the transition --- get the user interface right (no dosep.py from the user's point of view). In using a minimal implementation change, I'm also hopeful to avoid things like breaking the Windows build.

clayborg accepted this revision.Sep 3 2015, 10:29 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Sep 3 2015, 10:29 AM
zturner edited edge metadata.Sep 3 2015, 10:36 AM

About to take a look at this

About to take a look at this

Great, thanks.

zturner added inline comments.Sep 3 2015, 11:36 AM
test/CMakeLists.txt
65

Is this necessary? dosep already takes a -t argument which specifies the number of threads, and if the number of threads is 1 it disables the thread pool. Is it possible to delete this option and just pass -t=1 instead?

tfiala added inline comments.Sep 3 2015, 11:47 AM
test/CMakeLists.txt
65

Right now there's an implementation difference between the two:

--no-multiprocess literally switches from using the dosep parallel test runner logic to using the old dotest.py test runner logic. Specifying -t/--threads THREADCOUNT where THREADCOUNT is 1 will use the multiprocess test runner with a single process queue. So, if somebody wanted to have the extra test subdir filtering of the dosep logic (the test-subdir arg) but for some reason wanted to debug or otherwise run it serially, they'd be out of luck if we adopted a -t=1 == --no-multiprocess stance.

What I'd recommend is we keep the two options for now since there is behavior we would otherwise lose at the moment. But, when we do the refactor to test runners, we get rid of all the bits that are special to the multi test runner (e.g. the test-subdir and the --output-on-success flags), at which point your idea that "--no-multiprocess is unnecessary" should be true, and we can remove it then.

Sound reasonable?

Ok, seems fine. No other issues on my side

Ok, seems fine. No other issues on my side

Great, thanks. I'm going to get this in.

tfiala closed this revision.Sep 3 2015, 12:00 PM

Submitted here:

$ svn commit
Sending        test/CMakeLists.txt
Sending        test/Makefile
Sending        test/dosep.py
Sending        test/dotest.py
Sending        test/dotest_args.py
Sending        www/test.html
Transmitting file data ......
Committed revision 246794.
chying added a comment.Sep 3 2015, 3:06 PM

Sorry, coming late to this thread.
This change works ok on linux and darwin systems.

But it has some problem on Windows, the main code has to be protected with the following statement on Windows,
if name == 'main':
Otherwise it creates subprocesses recursively.

I didn't see this when I ran ninja check-lldb, but Adrian is seeing the
same thing on his machine for some reason.

I actually am seeing this now, I'm not sure why I didn't see it earlier,
the only thing I can think of is that the patch didn't apply successfully
even though I thoguht it did.

When I stop at this line:

if num_threads > 1:
    pool = multiprocessing.Pool(
        num_threads,
        initializer=setup_global_variables,
        initargs=(output_lock, test_counter, total_tests, test_name_len,
                  dotest_options))

in a debugger and inspect the value of dotest_options,
dotest_options.no_multiprocessing is set to False. Is this correct? It
seems like for the child dotest instances, it shouldn't be trying to
recursively do this multiprocessing?

tfiala added a comment.Sep 3 2015, 3:30 PM

Sorry, coming late to this thread.
This change works ok on linux and darwin systems.

Hi @chying,

But it has some problem on Windows, the main code has to be protected with the following statement on Windows,
if name == 'main':
Otherwise it creates subprocesses recursively.

Can you tell me more about this? How are you calling it? Which one (dotest.py or dosep.py) is going recursively out of control? On what kind of machine? Did you used to run it with dosep,py or dotest.py? If you used to use dotest.py, you'll start seeing a bunch of dotest.py test runners (one per core, which you can control with the -t or --threads option). If you're on a machine that has 48 cores, you'll get 48 dotest runner processes. Are you sure that it is not running?

If it is not running and is truly recursing out of control, can you capture the command line args for several of the processes? All of the inferior dotest.py worker bees should have been kicked off with the --inferior flag added. If *not*, then something sounds like it is going wrong.

Do you have a patch that fixes this? (I'm not sure exactly which part you're wanting to protect).

One more question - I recall sometimes having .pyc files not get cleared out properly, and get used even when they're stale on some platforms in the past. Is it possible you have stale .pyc files that possibly don't have the updated code (i.e. .py file is fresh, but .pyc file is getting used accidentally)? Easiest way to rule that out is to blow away the .pyc files in the directory. Having stale files in this case could go really badly.

Thanks!

tfiala added a comment.EditedSep 3 2015, 3:33 PM

I actually am seeing this now, I'm not sure why I didn't see it earlier,
the only thing I can think of is that the patch didn't apply successfully
even though I thoguht it did.

When I stop at this line:

if num_threads > 1:
    pool = multiprocessing.Pool(
        num_threads,
        initializer=setup_global_variables,
        initargs=(output_lock, test_counter, total_tests, test_name_len,
                  dotest_options))

in a debugger and inspect the value of dotest_options,
dotest_options.no_multiprocessing is set to False. Is this correct? It
seems like for the child dotest instances, it shouldn't be trying to
recursively do this multiprocessing?

That flag is only meaningful to the initial invocation. It would be False if the user didn't specify it on the command line in the highest-level process that is driving the parallel test runner. That looks fine.

The children know their inferiors by an --inferior flag, and don't try to recurse if either the no_multiprocess or the --inferior flags are set. The parallel test runner arranges to always pass in the --inferior flag.

I'm testing a fix for this locally. Hold tight

chying added a comment.Sep 3 2015, 3:51 PM

Sorry, coming late to this thread.
This change works ok on linux and darwin systems.

But it has some problem on Windows, the main code has to be protected with the following statement on Windows,
if name == 'main':
Otherwise it creates subprocesses recursively.

Sorry, coming late to this thread.
This change works ok on linux and darwin systems.

Hi @chying,

But it has some problem on Windows, the main code has to be protected with the following statement on Windows,
if name == 'main':
Otherwise it creates subprocesses recursively.

Can you tell me more about this? How are you calling it? Which one (dotest.py or dosep.py) is going recursively out of control? On what kind of machine? Did you used to run it with dosep,py or dotest.py? If you used to use dotest.py, you'll start seeing a bunch of dotest.py test runners (one per core, which you can control with the -t or --threads option). If you're on a machine that has 48 cores, you'll get 48 dotest runner processes. Are you sure that it is not running?

I call dotest directly, like "python.exe dotest.py --executable ..." on a Windows 7 machine.
I set LLDB_TEST_THREADS=8, and see 8 test runners from task manager. That has no problem.
But it seems each thread keeps calling dotest->dosep->dotest-... recursively.

If it is not running and is truly recursing out of control, can you capture the command line args for several of the processes? All of the inferior dotest.py worker bees should have been kicked off with the --inferior flag added. If *not*, then something sounds like it is going wrong.

Do you have a patch that fixes this? (I'm not sure exactly which part you're wanting to protect).

I don't have a patch yet, because the functions in dotest.py is pretty scattered.
I'm not sure which part to guard yet. I'm still looking at it.

One more question - I recall sometimes having .pyc files not get cleared out properly, and get used even when they're stale on some platforms in the past. Is it possible you have stale .pyc files that possibly don't have the updated code (i.e. .py file is fresh, but .pyc file is getting used accidentally)? Easiest way to rule that out is to blow away the .pyc files in the directory. Having stale files in this case could go really badly.

Yes, I had a clean run.

Hi Ying, check the patch I just posted, I think it fixes it.

tfiala added a comment.Sep 3 2015, 3:56 PM

Yes, I had a clean run.

Thanks for all the details, @chying! It looks like Zachary found the culprit. I'm looking forward to hear if his patch fixes it for you.