As discussed in https://reviews.llvm.org/D87805#2491679, the MergedInfo::recs vector can get quite large, and resizing it while remapping types creates unnecessary copies and virtual memory page faults.
This patch consistently saves approx. 0.6 sec (out of 18 sec) on a large dataset (400 MB EXE, 2 GB PDB).
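To illustrate the cost described above, here is a minimal sketch (not the code from this patch) that counts how often a growing byte vector has to reallocate, with and without reserving its final size up front. The function name countReallocations and its parameters are hypothetical, and a plain std::vector<std::uint8_t> merely stands in for MergedInfo::recs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only: appends `chunks` blocks of
// `chunkSize` zero bytes to a vector and returns how many times the vector
// reallocated, detected through a change in capacity().
std::size_t countReallocations(std::size_t chunks, std::size_t chunkSize,
                               bool reserveFirst) {
  std::vector<std::uint8_t> recs; // stand-in for a large record buffer

  // With the final size known in advance, one allocation is enough.
  if (reserveFirst)
    recs.reserve(chunks * chunkSize);

  std::size_t reallocations = 0;
  std::size_t lastCapacity = recs.capacity();

  for (std::size_t i = 0; i < chunks; ++i) {
    // Append one block; if capacity is exhausted, the vector allocates a
    // larger buffer and copies every existing byte into it.
    recs.insert(recs.end(), chunkSize, std::uint8_t{0});
    if (recs.capacity() != lastCapacity) {
      ++reallocations;
      lastCapacity = recs.capacity();
    }
  }
  return reallocations;
}
```

Calling countReallocations(1000, 64, true) reports no reallocations, while the same call with reserveFirst set to false reports at least one; the exact count depends on the standard library's growth policy. Each reallocation copies the whole buffer and touches freshly mapped pages, which is the overhead this change aims to avoid.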
With 12 hyper-threads:
Benchmark #1: before\lld-link.exe @link.rsp /threads:12
  Time (mean ± σ):     17.939 s ±  1.215 s    [User: 2.7 ms, System: 3.5 ms]
  Range (min … max):   15.537 s … 18.597 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp /threads:12
  Time (mean ± σ):     17.298 s ±  1.511 s    [User: 1.4 ms, System: 8.9 ms]
  Range (min … max):   15.512 s … 18.513 s    10 runs

With 36 hyper-threads (thus using only one CPU socket):
Benchmark #1: before\lld-link.exe @link.rsp /threads:36
  Time (mean ± σ):     17.787 s ±  0.747 s    [User: 4.2 ms, System: 5.6 ms]
  Range (min … max):   15.666 s … 18.059 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp /threads:36
  Time (mean ± σ):     17.102 s ±  1.323 s    [User: 2.6 ms, System: 4.0 ms]
  Range (min … max):   15.175 s … 18.023 s    10 runs

With 72 hyper-threads (using two CPU sockets; slower because kernel locks now cross CPUs):
Benchmark #1: before\lld-link.exe @link.rsp
  Time (mean ± σ):     18.085 s ±  0.764 s    [User: 2.7 ms, System: 3.3 ms]
  Range (min … max):   15.918 s … 18.444 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp
  Time (mean ± σ):     17.453 s ±  1.147 s    [User: 2.7 ms, System: 8.7 ms]
  Range (min … max):   15.766 s … 18.246 s    10 runs