The issue with std::thread::hardware_concurrency is that it forwards to libc, and some implementations (like glibc) don't take the thread's CPU affinity into account.
With this change, an LLVM program that is only allowed to run on 2 cores will use 2 threads, even if the machine has 32 cores.
This makes benchmarking a lot easier, and should also help when someone doesn't want compilation to use all cores, for example.
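The idea can be sketched roughly as follows. This is a minimal sketch and not the code from this change. It assumes Linux, where `sched_getaffinity` fills a `cpu_set_t` with the CPUs the calling thread may run on and `CPU_COUNT` counts them. On other platforms, or if the call fails, it falls back to `std::thread::hardware_concurrency()`. The helper name `affinityAwareConcurrency` is hypothetical.

```cpp
// Sketch: count the CPUs this thread may actually run on, honoring the
// affinity mask (e.g. as set by `taskset`), instead of all CPUs in the machine.
// Assumes Linux/glibc for the affinity path; other platforms use the fallback.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for sched_getaffinity / CPU_COUNT in <sched.h>
#endif
#include <cassert>
#include <thread>
#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper name, for illustration only; not an LLVM API.
unsigned affinityAwareConcurrency() {
#if defined(__linux__)
  cpu_set_t Set;
  CPU_ZERO(&Set);
  // pid 0 means "the calling thread". A fixed-size cpu_set_t covers
  // CPU_SETSIZE (1024) CPUs; on larger machines the call can fail with
  // EINVAL, in which case we fall through to the fallback below.
  if (sched_getaffinity(0, sizeof(Set), &Set) == 0) {
    int Count = CPU_COUNT(&Set);
    if (Count > 0)
      return static_cast<unsigned>(Count);
  }
#endif
  // hardware_concurrency() may return 0 if the value is not computable,
  // so never report fewer than one thread.
  unsigned N = std::thread::hardware_concurrency();
  return N != 0 ? N : 1;
}
```

If a program built with this helper runs under `taskset -c 0,1`, it should report 2 on Linux regardless of how many cores the machine has. A plain `hardware_concurrency()` call on glibc reports the machine-wide count instead.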