Profiling shows a fair amount of time spent in updateReferences.
Flamegraph: https://people.freebsd.org/~davide/llvm/lld.svg
The following patch makes updateReferences() run in parallel.
Some numbers from linking clang with lld on FreeBSD 11:
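Averaged over the five runs below, this patch shaves about 14 seconds of wall-clock time (roughly 84.3s down to 70.5s), around 16% less for this workload, at the cost of somewhat higher user time.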
Unpatched:
real 1m23.951s user 2m43.226s sys 0m17.333s
real 1m25.684s user 2m45.534s sys 0m16.882s
real 1m22.960s user 2m40.704s sys 0m16.323s
real 1m24.512s user 2m44.209s sys 0m17.052s
real 1m24.591s user 2m43.689s sys 0m17.472s
Patched:
real 1m12.069s user 2m58.951s sys 0m17.113s
real 1m8.631s user 3m4.934s sys 0m17.344s
real 1m11.175s user 3m1.927s sys 0m16.072s
real 1m11.530s user 3m3.798s sys 0m15.587s
real 1m8.922s user 3m3.654s sys 0m16.541s
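As a sanity check on the timings above, here is a small sketch that averages the `real` times of the five runs and computes the fraction of wall-clock time saved. The helper names (`meanSeconds`, `timeReduction`) and the two constant vectors are made up for illustration; the values are the `real` columns converted to seconds.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of timing samples, in seconds.
double meanSeconds(const std::vector<double> &samples) {
  assert(!samples.empty());
  double total = std::accumulate(samples.begin(), samples.end(), 0.0);
  return total / static_cast<double>(samples.size());
}

// Fraction of wall-clock time saved when going from `before` to `after`.
double timeReduction(double before, double after) {
  assert(before > 0.0);
  return (before - after) / before;
}

// "real" times from the runs above, converted to seconds.
const std::vector<double> kUnpatched = {83.951, 85.684, 82.960, 84.512, 84.591};
const std::vector<double> kPatched = {72.069, 68.631, 71.175, 71.530, 68.922};
```

With these inputs the means come out near 84.3s and 70.5s, a difference of about 14 seconds, or about 16% less wall-clock time.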
I think this change is not thread-safe because of this line: _deadAtoms.insert is not safe to call concurrently from multiple threads.
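One possible way to address this, shown only as a minimal sketch and not as what the patch does, is to guard the insertion with a mutex. `DeadAtomSet` and `insertInParallel` are hypothetical stand-ins (using plain `int` ids and `std::thread`), not lld's actual types or its parallel helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for the resolver's dead-atom set; not lld's real type.
class DeadAtomSet {
public:
  // Serializes concurrent inserts so the underlying std::set is never
  // mutated by two threads at once.
  void insert(int atomId) {
    std::lock_guard<std::mutex> lock(mutex_);
    atoms_.insert(atomId);
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return atoms_.size();
  }

private:
  mutable std::mutex mutex_;
  std::set<int> atoms_;
};

// Inserts the ids [0, count) from numThreads worker threads, each taking a
// strided slice of the range, and returns the final size of the set.
std::size_t insertInParallel(int count, unsigned numThreads) {
  assert(numThreads > 0);
  DeadAtomSet dead;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&dead, t, count, numThreads] {
      for (int id = static_cast<int>(t); id < count;
           id += static_cast<int>(numThreads))
        dead.insert(id);
    });
  }
  for (std::thread &worker : workers)
    worker.join();
  return dead.size();
}
```

A lock on every insert can eat into the speedup if many atoms are dead. An alternative under the same assumptions would be for each worker to collect into a local container and merge the results once the parallel loop has finished.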