This is a very preliminary change that adds `clang-cl` support for `/MP` (Build with Multiple Processes). The documentation for `/MP` is [[ https://msdn.microsoft.com/en-us/library/bb385193.aspx | here ]].
Support for `-j` to the `clang` driver could also be added along the way.
I would simply like some general advice on this change. **I don't plan to commit this as it is**. If the general idea seems good, I'll cut it down into several smaller pieces.
The change about having explicit return codes (`Program.h / enum ProgramReturnCode`) should probably be discussed separately, although if you have an opinion about that, please share it.
== Some timings ==
//(I ran each configuration several times to ensure the figures are stable)//

Full rebuild of LLVM + Clang + LLD (at r341847), Release, optimized tablegen. VS2017 15.8.3, Ninja 1.8.2, CMake 3.12.2.

**Config 1:** (Intel Xeon Haswell 6 cores / 12 HW threads, 3.5 GHz, 15M cache, 128 GB RAM, SSD 550 MB/s)
| MSBuild, MSVC | **(56min 43sec)** | 2 parallel msbuild |
| MSBuild, Clang + LLD | **(2hour 40min)** | 2 parallel msbuild |
| MSBuild, Clang **/MP** + LLD | **(51min 8sec)** | 2 parallel msbuild |
| Ninja, MSVC | **(37min)** | |
| Ninja, Clang | **(31min 52sec)** | |

Running clang-cl.exe compiled with VS2017 15.8.3 at r341847.

**Config 2:** (Intel Xeon Skylake 18 cores / 36 HW threads, x2 (Dual CPU), 72 HW threads total, 2.3 GHz, 24.75M cache, 128 GB RAM, NVMe 4.6 GB/s)
| MSBuild, MSVC | **(12min 8sec)** | 32 parallel msbuild |
| MSBuild, Clang + LLD | **(29min 12sec)** | 32 parallel msbuild |
| MSBuild, Clang **/MP** + LLD | **(9min 22sec)** | 32 parallel msbuild |
| Ninja, MSVC | **(7min 35sec)** | |
| Ninja, Clang | **(11min)** | |

Clang compiled with Clang.
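To make the MSBuild figures above easier to compare, here is a small sketch that converts them to minutes and computes the ratio between configurations. The helper names (`minutes`, `speedup`, `TIMINGS`) are only illustrative and are not part of the patch; the durations are copied from the timings above.

```python
def minutes(hours=0, mins=0, secs=0):
    """Convert a wall-clock duration to minutes."""
    return hours * 60 + mins + secs / 60


# MSBuild timings copied from the two configurations above.
TIMINGS = {
    "Config 1": {
        "MSVC": minutes(mins=56, secs=43),
        "Clang + LLD": minutes(hours=2, mins=40),
        "Clang /MP + LLD": minutes(mins=51, secs=8),
    },
    "Config 2": {
        "MSVC": minutes(mins=12, secs=8),
        "Clang + LLD": minutes(mins=29, secs=12),
        "Clang /MP + LLD": minutes(mins=9, secs=22),
    },
}


def speedup(config, baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    rows = TIMINGS[config]
    return rows[baseline] / rows[candidate]


if __name__ == "__main__":
    for config in TIMINGS:
        for baseline in ("Clang + LLD", "MSVC"):
            ratio = speedup(config, baseline, "Clang /MP + LLD")
            print(f"{config}: /MP vs {baseline}: {ratio:.2f}x")
```

With these numbers, Clang `/MP` under MSBuild comes out roughly 3.1x faster than Clang without `/MP` on both machines, and about 1.1x (Config 1) and 1.3x (Config 2) faster than MSVC.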
Please add anyone who might want to review this. Many thanks in advance!

Some remarks:
- Ninja is better at discovering CMake "features" (`Looking for ... - found`), whereas the Visual Studio generator goes through an MSBuild instance for each "feature", which makes it very slow.
- Ninja seems to have better job allocation than MSBuild, which compiles each vcproj monolithically, one by one.
- Ninja + Clang on the Skylake is significantly worse than Ninja + MSVC, probably because of the poor Windows 10 thread scheduler, which is known to under-perform on high core counts, [[ https://www.phoronix.com/scan.php?page=article&item=2990wx-linux-windows&num=4 | see here ]]. The reason is that, for each file compiled, clang-cl.exe invokes an additional -cc1 child process.
- Each new clang-cl.exe child process creation costs between 60 and 100 ms (that is the time spent between the parent's `CreateProcess()` and the first file accessed by the child .exe). In contrast, MSBuild + Clang /MP uses only one parent .exe per vcproject.
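The process-creation remark can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration under stated assumptions: the file and project counts are hypothetical (they are not measurements of the LLVM tree), the 60-100 ms cost is applied to every process creation, and the `/MP` case is assumed to need one parent per project plus one child per file. The result is cumulative time summed over all processes, not wall-clock time, since process creations overlap across cores.

```python
def process_counts(n_files, n_projects):
    """Return (without_mp, with_mp) process creations for a full build.

    Assumption: without /MP, each file gets a clang-cl.exe driver plus
    a -cc1 child; with /MP, each project gets one parent and each file
    gets one child.
    """
    without_mp = 2 * n_files
    with_mp = n_projects + n_files
    return without_mp, with_mp


def spawn_overhead_seconds(n_processes, cost_ms):
    """Cumulative process-creation time, in seconds."""
    return n_processes * cost_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the arithmetic.
    n_files, n_projects = 3000, 100
    without_mp, with_mp = process_counts(n_files, n_projects)
    for cost_ms in (60, 100):  # range quoted in the remark above
        before = spawn_overhead_seconds(without_mp, cost_ms)
        after = spawn_overhead_seconds(with_mp, cost_ms)
        print(f"{cost_ms} ms/process: {before:.0f}s without /MP, "
              f"{after:.0f}s with /MP")
```

Under these assumptions the number of process creations drops by almost half, which matches the direction of the measured gain but is not meant to explain its full size.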