As discussed in https://reviews.llvm.org/D87805#2491679, the MergedInfo::recs vector can get quite large, and resizing it while remapping types creates unnecessary copies and virtual memory page faults.
This patch consistently saves approximately 0.6 s (out of 18 s) on a large dataset (400 MB EXE, 2 GB PDB).
With 12 hyper-threads:
Benchmark #1: before\lld-link.exe @link.rsp /threads:12
  Time (mean ± σ):     17.939 s ±  1.215 s    [User: 2.7 ms, System: 3.5 ms]
  Range (min … max):   15.537 s … 18.597 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp /threads:12
  Time (mean ± σ):     17.298 s ±  1.511 s    [User: 1.4 ms, System: 8.9 ms]
  Range (min … max):   15.512 s … 18.513 s    10 runs
With 36 hyper-threads (thus using only one CPU socket):
Benchmark #1: before\lld-link.exe @link.rsp /threads:36
  Time (mean ± σ):     17.787 s ±  0.747 s    [User: 4.2 ms, System: 5.6 ms]
  Range (min … max):   15.666 s … 18.059 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp /threads:36
  Time (mean ± σ):     17.102 s ±  1.323 s    [User: 2.6 ms, System: 4.0 ms]
  Range (min … max):   15.175 s … 18.023 s    10 runs
With 72 hyper-threads (using two CPU sockets, slower because kernel locks now cross CPUs):
Benchmark #1: before\lld-link.exe @link.rsp
  Time (mean ± σ):     18.085 s ±  0.764 s    [User: 2.7 ms, System: 3.3 ms]
  Range (min … max):   15.918 s … 18.444 s    10 runs

Benchmark #2: after\lld-link.exe @link.rsp
  Time (mean ± σ):     17.453 s ±  1.147 s    [User: 2.7 ms, System: 8.7 ms]
  Range (min … max):   15.766 s … 18.246 s    10 runs
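
For readers unfamiliar with the technique, the general idea of sizing the vector once before appending can be sketched as below. This is only an illustration under stated assumptions, not the code from this patch: appendAllReserved and chunks are hypothetical names, and a plain std::vector<std::uint8_t> stands in for MergedInfo::recs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the patch): append every chunk to `recs`
// while reallocating the underlying buffer at most once.
void appendAllReserved(std::vector<std::uint8_t> &recs,
                       const std::vector<std::vector<std::uint8_t>> &chunks) {
  // First pass: compute the final size so the capacity is known up front.
  std::size_t total = recs.size();
  for (const auto &chunk : chunks)
    total += chunk.size();

  // Single allocation (if any) for everything that follows.
  recs.reserve(total);
  const std::size_t capacityAfterReserve = recs.capacity();

  // Second pass: appending within the reserved capacity never reallocates.
  for (const auto &chunk : chunks)
    recs.insert(recs.end(), chunk.begin(), chunk.end());

  assert(recs.size() == total);
  assert(recs.capacity() == capacityAfterReserve);
}
```

The cost is the extra pass needed to compute the total size, which is the trade-off against the copies and page faults described above.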
I know this is a bit of a stretch - but given the somewhat non-trivial extra work required to determine the capacity, I'm wondering if this issue might be partly due to this incremental/repeated resizing.
I'm guessing that maybe the resize is thwarting the usual growth function of the vectors (e.g., resize brings the size up to exactly the size specified, whereas repeated push_back hits the usual growth factor to offset the next push_backs, etc.). The notes here: https://en.cppreference.com/w/cpp/container/vector/reserve discuss this sort of issue with reserve, though I'm pretty sure it applies to resize as well.
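
One way to check this guess on a given standard library is to count capacity changes under each growth strategy. The sketch below is a hypothetical micro-experiment, not code from lld: Growth and countReallocations are made-up names. The C++ standard does not specify the growth policy of resize or reserve, so the counts depend on the implementation and no particular result is assumed here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical experiment: three ways of growing a vector by `chunk`
// elements at a time.
enum class Growth { PushBack, Resize, ReserveExact };

// Returns how many times the capacity changed (a proxy for reallocations)
// while growing an empty vector in `steps` increments of `chunk` elements.
std::size_t countReallocations(Growth mode, std::size_t steps,
                               std::size_t chunk) {
  std::vector<int> v;
  std::size_t reallocs = 0;
  std::size_t lastCapacity = v.capacity();
  auto note = [&] {
    if (v.capacity() != lastCapacity) {
      ++reallocs;
      lastCapacity = v.capacity();
    }
  };

  for (std::size_t i = 0; i < steps; ++i) {
    switch (mode) {
    case Growth::PushBack: // element by element, usual growth factor
      for (std::size_t j = 0; j < chunk; ++j) {
        v.push_back(0);
        note();
      }
      break;
    case Growth::Resize: // grow the size directly
      v.resize(v.size() + chunk);
      note();
      break;
    case Growth::ReserveExact: // request exactly the needed capacity first
      v.reserve(v.size() + chunk);
      note();
      v.resize(v.size() + chunk);
      note();
      break;
    }
  }
  return reallocs;
}
```

Calling countReallocations(Growth::Resize, 1000, 16) and comparing it with the PushBack and ReserveExact variants would show whether resize on that toolchain grows geometrically or only to the requested size.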
If it's easy to do, I'd be curious to know whether removing the fix in this patch, and instead changing this resize+memcpy to loop+push_back or... oh, reference invalidation concerns. What about recs.insert(recs.begin(), recs.end()) ? Hmm, yeah, seems the spec for std::vector doesn't allow that, so I guess the nearest code would be: