Python uses stdio from the C runtime for file descriptors and pipes. On Windows, the CRT limits a process to 512 open file descriptors by default. See https://msdn.microsoft.com/en-us/library/6e3b887c.aspx
The parent dotest process ends up with several FDs for each process running a test in parallel. At about 37-38 logical cores we started hitting this limit regularly.
This patch works around the problem by capping the number of threads at 32 on Windows.
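
A minimal sketch of the cap, for illustration only. The names WINDOWS_MAX_THREADS and default_thread_count are hypothetical and are not taken from dotest itself; the platform and cpu_count parameters exist only so the logic can be exercised without a Windows machine.

```python
import multiprocessing
import sys

# Cap from the description above: 32 threads on Windows.
# The constant name is hypothetical, not an actual dotest identifier.
WINDOWS_MAX_THREADS = 32


def default_thread_count(platform=None, cpu_count=None):
    """Return how many parallel test workers to start.

    Both arguments default to the values of the running system.
    """
    if platform is None:
        platform = sys.platform
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()

    # The CRT's 512-descriptor limit only applies on Windows, so other
    # platforms keep one worker per logical core.
    if platform == "win32":
        return min(cpu_count, WINDOWS_MAX_THREADS)
    return cpu_count
```

With this sketch, a 40-core Windows machine would get 32 workers, while machines with 32 or fewer cores and all non-Windows hosts are unaffected.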