This is a very preliminary change which adds support for /MP (Build with Multiple Processes) in clang-cl. Doc for /MP is here.
Support for -j in the clang driver could also be added along the way.
I would simply like some general advice on this change. I don't plan to commit it as is. If the general idea seems good, I'll split it into several smaller pieces.
The change that introduces explicit return codes (Program.h / enum ProgramReturnCode) should probably be discussed separately, although if you have an opinion about it, please share it.
Some timings
Full rebuild of LLVM + Clang + LLD (at r341847), Release, optimized tablegen. VS2017 15.8.3, Ninja 1.8.2, CMake 3.12.2
Config 1 : Intel Xeon Haswell 6 cores / 12 HW threads, 3.5 GHz, 15M cache, 128 GB RAM, SSD 550 MB/s
| MSBuild, MSVC /MP | 56 min 43 sec | 2 parallel msbuild |
| MSBuild, Clang + LLD | 2 h 40 min | 2 parallel msbuild |
| MSBuild, Clang /MP + LLD | 51 min 8 sec | 2 parallel msbuild |
| Ninja, MSVC | 37 min | |
| Ninja, Clang [1] | 31 min 52 sec | |
Config 2 : Intel Xeon Skylake 18 cores / 36 HW threads, x2 (Dual CPU), 72 HW threads total, 2.3 GHz, 24.75M cache, 128 GB RAM, NVMe 4.6 GB/s
| MSBuild, MSVC /MP | 12 min 8 sec | 32 parallel msbuild |
| MSBuild, Clang + LLD | 29 min 12 sec | 32 parallel msbuild |
| MSBuild, Clang /MP + LLD | 9 min 22 sec | 32 parallel msbuild |
| Ninja, MSVC | 7 min 35 sec | |
| Ninja, Clang [1] | 11 min | |
[1] Clang compiled with Clang.
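To put the MSBuild rows in perspective, the timings above can be turned into speedup ratios. The snippet below is only a sketch of that arithmetic; the helper names (toSeconds, speedup) are made up for illustration and are not part of the patch. With the numbers from the tables, Clang /MP + LLD comes out roughly 3.1x faster than Clang + LLD without /MP on both configurations.

```cpp
#include <cassert>

// Illustrative helpers only (not from the patch): convert a wall-clock
// time to seconds and compute how many times faster a candidate build is.
constexpr int toSeconds(int hours, int minutes, int seconds) {
  return hours * 3600 + minutes * 60 + seconds;
}

constexpr double speedup(int baselineSeconds, int candidateSeconds) {
  return static_cast<double>(baselineSeconds) / candidateSeconds;
}

// Config 1 (Haswell): MSBuild, Clang + LLD (2 h 40 min)
// versus MSBuild, Clang /MP + LLD (51 min 8 sec).
constexpr double kConfig1Speedup =
    speedup(toSeconds(2, 40, 0), toSeconds(0, 51, 8));

// Config 2 (dual Skylake): MSBuild, Clang + LLD (29 min 12 sec)
// versus MSBuild, Clang /MP + LLD (9 min 22 sec).
constexpr double kConfig2Speedup =
    speedup(toSeconds(0, 29, 12), toSeconds(0, 9, 22));
```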
Some remarks:
- Ninja is faster at discovering cmake "features" (Looking for ... - found), whereas the Visual Studio generator goes through an MSBuild instance for each "feature", which makes it very slow.
- Ninja seems to have better job allocation than MSBuild, which compiles each vcproj monolithically, one at a time.
- Ninja + Clang on the Skylake is significantly worse than Ninja + MSVC, probably because of the Windows 10 thread scheduler, which is known to under-perform on high core counts, see here. Clang is more exposed to this because, for each file compiled, clang-cl.exe invokes an additional -cc1 child process.
- Each new clang-cl.exe child process creation costs between 60 and 100 ms (that is, the time spent between the parent's CreateProcess() and the first file accessed by the child .exe). In contrast, MSBuild + Clang /MP uses only one parent .exe per vcproject.
It seems nice to save a syscall and not query the number of cores when an explicit value has already been given.
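Something along these lines is what I have in mind, as a rough sketch only. resolveJobCount is a hypothetical name, not a function from the patch, and std::thread::hardware_concurrency() is used here as a portable stand-in for whatever core-count query the driver would really use. The point is simply that the query is skipped when an explicit count was passed.

```cpp
#include <cassert>
#include <optional>
#include <thread>

// Hypothetical helper (not from the patch): pick the number of parallel
// jobs. An explicit, non-zero value wins and avoids the core-count query.
unsigned resolveJobCount(std::optional<unsigned> explicitJobs) {
  if (explicitJobs && *explicitJobs > 0)
    return *explicitJobs; // explicit count given, no system query needed

  // No explicit count: only now ask the system. The standard allows
  // hardware_concurrency() to return 0 when the value is unknown, so fall
  // back to a single job in that case.
  unsigned detected = std::thread::hardware_concurrency();
  return detected > 0 ? detected : 1;
}
```

With this shape, resolveJobCount(8u) returns 8 without touching the system, while resolveJobCount(std::nullopt) falls back to the detected core count.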