This is a very preliminary change that adds support for the `clang-cl` `/MP` option (Build with Multiple Processes). The documentation for `/MP` is [[ https://msdn.microsoft.com/en-us/library/bb385193.aspx | here ]].
Support for `-j` to the `clang` driver could also be added along the way.
I would simply like some general advice on this change. **I don't plan to commit this as it is**. If the general idea seems good, I'll split it into several smaller pieces.
The change that introduces explicit return codes (`Program.h / enum ProgramReturnCode`) should probably be discussed separately, although if you have an opinion about it, please share it.
== Some timings ==
Full rebuild of LLVM + Clang + LLD (at r341847), Release, optimized tablegen. VS2017 15.8.3, Ninja 1.8.2, CMake 3.12.2
**Config 1:** Intel Xeon Haswell 6 cores / 12 HW threads, 3.5 GHz, 15M cache, 128 GB RAM, SSD 550 MB/s
| MSBuild, MSVC | **(56 min 43 sec)** | 2 parallel msbuild
| MSBuild, Clang + LLD | **(2 h 40 min)** | 2 parallel msbuild
| MSBuild, Clang **/MP** + LLD | **(51 min 8 sec)** | 2 parallel msbuild
| Ninja, MSVC | **(37 min)** |
| Ninja, Clang [1] | **(31 min 52 sec)** |
**Config 2:** Intel Xeon Skylake 18 cores / 36 HW threads, x2 (Dual CPU), 72 HW threads total, 2.3 GHz, 24.75M cache, 128 GB RAM, NVMe 4.6 GB/s
| MSBuild, MSVC | **(12 min 8 sec)** | 32 parallel msbuild
| MSBuild, Clang + LLD | **(29 min 12 sec)** | 32 parallel msbuild
| MSBuild, Clang **/MP** + LLD | **(9 min 22 sec)** | 32 parallel msbuild
| Ninja, MSVC | **(7 min 35 sec)** |
| Ninja, Clang [1] | **(11 min)** |
[1] Clang compiled with Clang.
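To make the effect of `/MP` on the MSBuild rows easier to compare across the two machines, here is a small sketch that turns the timings above into speedup ratios. The helper names (`toSeconds`, `speedup`, `printSpeedups`) are made up for this illustration and are not part of the patch; the only inputs are the wall-clock times from the tables.

```cpp
#include <cassert>
#include <cstdio>

// Convert a wall-clock time from the tables above into seconds.
constexpr int toSeconds(int hours, int minutes, int seconds) {
  return hours * 3600 + minutes * 60 + seconds;
}

// Ratio of the build time without /MP to the build time with /MP.
double speedup(int baselineSeconds, int mpSeconds) {
  assert(baselineSeconds > 0 && mpSeconds > 0);
  return static_cast<double>(baselineSeconds) / mpSeconds;
}

// Print the MSBuild "Clang + LLD" vs "Clang /MP + LLD" ratio for both configs.
void printSpeedups() {
  // Config 1: 2 h 40 min without /MP, 51 min 8 sec with /MP.
  std::printf("Config 1: %.2fx\n",
              speedup(toSeconds(2, 40, 0), toSeconds(0, 51, 8)));
  // Config 2: 29 min 12 sec without /MP, 9 min 22 sec with /MP.
  std::printf("Config 2: %.2fx\n",
              speedup(toSeconds(0, 29, 12), toSeconds(0, 9, 22)));
}
```

Calling `printSpeedups()` gives a ratio of roughly 3.1x on both configurations, so under MSBuild the gain from `/MP` looks similar on the 12-thread and the 72-thread machine.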
Some remarks:
- Ninja is better at discovering CMake "features" (`Looking for ... - found`), whereas the Visual Studio generator goes through an MSBuild instance for each "feature", which makes it very slow.
- Ninja seems to have better job allocation than MSBuild, which compiles each vcproj monolithically, one by one.
- Ninja + Clang on the Skylake is significantly worse than Ninja + MSVC, probably because of the poor Windows 10 thread scheduler, which is known to under-perform at high core counts ([[ https://www.phoronix.com/scan.php?page=article&item=2990wx-linux-windows&num=4 | see here ]]). The scheduler matters here because, for each file compiled, clang-cl.exe invokes an additional -cc1 child process.
- Each new clang-cl.exe child process creation costs between 60 and 100 ms (that is, the time between the parent's `CreateProcess()` call and the first file access by the child .exe). In contrast, MSBuild + Clang /MP uses only one parent .exe per vcproj.
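As a rough back-of-the-envelope sketch of what the 60 to 100 ms per child process could add up to over a full build: the function below is hypothetical and not part of the patch. It assumes one extra child process per translation unit and assumes the creation cost spreads perfectly evenly across the parallel jobs, which real scheduling will not do. The translation unit count is an input, since it was not measured here.

```cpp
#include <cassert>

// Estimated wall-clock overhead, in seconds, from creating child processes.
struct OverheadRange {
  double minSeconds;
  double maxSeconds;
};

// Assumption: one extra -cc1 child per translation unit, and the creation
// cost is divided evenly among `parallelJobs` (an idealization).
OverheadRange childProcessOverhead(int translationUnits, int parallelJobs) {
  assert(translationUnits >= 0 && parallelJobs > 0);
  const double minCostMs = 60.0;  // lower bound quoted above
  const double maxCostMs = 100.0; // upper bound quoted above
  OverheadRange range;
  range.minSeconds = translationUnits * minCostMs / 1000.0 / parallelJobs;
  range.maxSeconds = translationUnits * maxCostMs / 1000.0 / parallelJobs;
  return range;
}
```

For example, with a purely illustrative 3000 translation units and 12 parallel jobs, `childProcessOverhead(3000, 12)` yields about 15 to 25 seconds of wall-clock time. The real cost depends on how the scheduler behaves under load.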