This is a very preliminary change that adds support for the clang-cl /MP option (build with multiple processes). Doc for /MP is here.
Support for -j to the clang driver could also be added along the way.
I would simply like general advice on this change. I don't plan to commit it as is. If the general idea seems good, I'll split it into several smaller pieces.
The change that introduces explicit return codes (Program.h / enum ProgramReturnCode) should probably be discussed separately, although if you have an opinion about it, please share it.
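To make the /MP idea concrete for reviewers, here is a minimal, portable sketch of the scheduling pattern: a fixed number of workers pull inputs from a shared queue until it is empty. This is not the code in this patch. `runJobs` and `compileOne` are hypothetical names, and threads stand in for the child processes a real /MP implementation would spawn. Treating a job count of 0 as "use the hardware concurrency" is an assumption modelled on how MSVC behaves when /MP is given without a number.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not part of the patch: runs `compileOne` over every
// input using at most `maxJobs` concurrent workers, and returns the number of
// inputs whose job reported a non-zero exit code.
// Threads are used here only as a stand-in for child processes.
std::size_t runJobs(const std::vector<std::string> &inputs, unsigned maxJobs,
                    const std::function<int(const std::string &)> &compileOne) {
  if (maxJobs == 0)
    maxJobs = std::thread::hardware_concurrency(); // may itself return 0
  if (maxJobs == 0)
    maxJobs = 1;

  std::atomic<std::size_t> next{0};     // index of the next unclaimed input
  std::atomic<std::size_t> failures{0}; // jobs that returned non-zero

  auto worker = [&] {
    for (;;) {
      const std::size_t i = next.fetch_add(1);
      if (i >= inputs.size())
        return; // queue drained
      if (compileOne(inputs[i]) != 0)
        failures.fetch_add(1);
    }
  };

  // Never start more workers than there are inputs.
  const std::size_t workers =
      std::min<std::size_t>(maxJobs, inputs.size());
  std::vector<std::thread> pool;
  pool.reserve(workers);
  for (std::size_t t = 0; t < workers; ++t)
    pool.emplace_back(worker);
  for (auto &t : pool)
    t.join();
  return failures.load();
}
```

Because workers claim inputs one at a time, a slow translation unit does not hold up the rest of the queue; the same shape would apply to a -j option on the clang driver.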
Full rebuild of LLVM + Clang + LLD (at r341847), Release, optimized tablegen. VS2017 15.8.3, Ninja 1.8.2, CMake 3.12.2
Config 1 : Intel Xeon Haswell 6 cores / 12 HW threads, 3.5 GHz, 15M cache, 128 GB RAM, SSD 550 MB/s
| Build | Time | Notes |
| --- | --- | --- |
| MSBuild, MSVC /MP | 56 min 43 sec | 2 parallel msbuild |
| MSBuild, Clang + LLD | 2 h 40 min | 2 parallel msbuild |
| MSBuild, Clang /MP + LLD | 51 min 8 sec | 2 parallel msbuild |
| Ninja, Clang | 31 min 52 sec | |
Config 2 : Intel Xeon Skylake 18 cores / 36 HW threads, x2 (Dual CPU), 72 HW threads total, 2.3 GHz, 24.75M cache, 128 GB RAM, NVMe 4.6 GB/s
| Build | Time | Notes |
| --- | --- | --- |
| MSBuild, MSVC /MP | 12 min 8 sec | 32 parallel msbuild |
| MSBuild, Clang + LLD | 29 min 12 sec | 32 parallel msbuild |
| MSBuild, Clang /MP + LLD | 9 min 22 sec | 32 parallel msbuild |
| Ninja, MSVC | 7 min 35 sec | |
| Ninja, Clang | 11 min | |
 Clang compiled with Clang.
- Ninja is faster at CMake feature detection (Looking for ... - found), whereas the Visual Studio generator goes through an MSBuild instance for each feature check, which makes it very slow.
- Ninja seems to allocate jobs better than MSBuild, which compiles each vcproj monolithically, one by one.
- Ninja + Clang on the Skylake machine is significantly slower than Ninja + MSVC, probably because of the Windows 10 thread scheduler, which is known to underperform at high core counts, see here. Clang is hit harder because, for each file compiled, clang-cl.exe invokes an additional -cc1 child process.
- Creating each new clang-cl.exe child process costs 60-100 ms (the time between the parent's CreateProcess() call and the first file access by the child .exe). In contrast, MSBuild + Clang /MP uses only one parent .exe per vcproject.