When --threads= is unspecified, we set it to
parallel::strategy.compute_thread_count(), which uses
sched_getaffinity (Linux), cpuset_getaffinity (FreeBSD), or
std::thread::hardware_concurrency (other platforms).
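For reference, a minimal sketch (illustration only, not the lld driver code)
of querying that default through LLVM's Support APIs; compute_thread_count()
honors the process affinity mask where the platform exposes one:

    #include "llvm/Support/Threading.h"
    #include "llvm/Support/raw_ostream.h"

    int main() {
      // Affinity-aware logical CPU count used as the default concurrency.
      unsigned N = llvm::hardware_concurrency().compute_thread_count();
      llvm::outs() << "default concurrency: " << N << "\n";
    }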
Extensive testing on many machines (configurations spanning
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators {native,mimalloc,rpmalloc})
with varying workloads showed that when the concurrency is larger than 16,
linking is slower than with --threads=16, because the parallelism overhead
outweighs the gains. This is particularly harmful on machines with many cores
or when the link job competes with other jobs.
Therefore, cap parallel::strategy at 16 when --threads= is unspecified.
For some workloads, raising the concurrency from 8 to 16 already yields
almost no improvement.
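A hedged sketch of the cap, assuming the driver configures
llvm::parallel::strategy (llvm/Support/Parallel.h) when --threads= is not
given; the exact location in lld/ELF/Driver.cpp and the surrounding option
handling are omitted, and the helper name is hypothetical:

    #include <algorithm>
    #include "llvm/Support/Parallel.h"
    #include "llvm/Support/Threading.h"

    // Hypothetical helper: called only when --threads= is unspecified.
    static void setDefaultConcurrency() {
      unsigned HW = llvm::hardware_concurrency().compute_thread_count();
      // Cap the default at 16 threads; an explicit --threads=N still
      // overrides this.
      llvm::parallel::strategy = llvm::hardware_concurrency(std::min(HW, 16u));
    }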
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
This is the "default" value that I was referring to. I wasn't suggesting that this should be capped. But assuming that this is just passed on to llvm::heavyweight_hardware_concurrency(), then it should be fine as is because llvm::heavyweight_hardware_concurrency() should sort out any CPU affinity restrictions.
AFAIK, the "heavyweight" version only counts "real" cores and ignores any SMT ones. So on most X86 CPUs it will be half as many threads. Now that some X86 CPUs have a mix of fast and low-power cores, I'm not too sure what "heavyweight" is or should be in these scenarios.