Reduces time to link PGO-instrumented net_unittests.exe by 11% (9.766s ->
8.672s, best of three). Reduces peak memory by 65.7MB (2142.71MB ->
2076.95MB).
Use a more compact struct, BulkPublic, for faster sorting. Sort in
parallel. Construct the hash buckets in parallel. Try to use one vector
to hold all the publics instead of copying them from one to another.
Allocate all the memory needed to serialize publics up front, and then
serialize them in place in parallel.
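
For illustration only, the "allocate up front, then serialize in place in parallel" step might look roughly like the sketch below. It is not the code from this patch: `Pub`, `serializeAll`, and the NUL-terminated-name record layout are hypothetical stand-ins, and plain `std::thread` is used in place of LLVM's own parallel helpers. The idea is that one sequential pass fixes every record's offset, so the worker threads write to disjoint ranges of a single buffer and need no locking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a public symbol record; not LLD's BulkPublic.
struct Pub {
  std::string name;
};

// Serializes every record as its NUL-terminated name into one buffer that is
// allocated once, then filled by several threads.
std::vector<char> serializeAll(const std::vector<Pub> &pubs,
                               unsigned numThreads = 4) {
  if (numThreads == 0)
    numThreads = 1;

  // Pass 1 (sequential): exclusive prefix sum of the record sizes gives each
  // record its final offset and the total buffer size.
  std::vector<std::size_t> offsets(pubs.size());
  std::size_t total = 0;
  for (std::size_t i = 0; i < pubs.size(); ++i) {
    offsets[i] = total;
    total += pubs[i].name.size() + 1; // +1 for the terminating NUL
  }

  // Single allocation for all serialized records.
  std::vector<char> buf(total);

  // Pass 2 (parallel): each thread copies a contiguous range of records into
  // its own, non-overlapping part of the buffer.
  auto work = [&](std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
      std::memcpy(buf.data() + offsets[i], pubs[i].name.c_str(),
                  pubs[i].name.size() + 1);
  };

  std::vector<std::thread> threads;
  std::size_t chunk = (pubs.size() + numThreads - 1) / numThreads;
  for (std::size_t b = 0; b < pubs.size(); b += chunk)
    threads.emplace_back(work, b, std::min(b + chunk, pubs.size()));
  for (std::thread &t : threads)
    t.join();
  return buf;
}
```

The offsets are computed sequentially because each one depends on the sizes of all earlier records; only the copying is spread across threads.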

Maybe reserve in advance to avoid reallocations?
    unsigned count{};
    symTab->forEachSymbol([&](Symbol *s) {
      auto *def = dyn_cast<Defined>(s);
      count += (def && def->isLive() && def->getChunk());
    });
    publics.reserve(count);

Do you think there would be a gain from doing the creation in parallel? (You would use .resize instead of .reserve in that case.)
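
To try the count-then-reserve idea outside of LLD, a minimal self-contained sketch follows. `Sym` and `collectPublics` are hypothetical names invented for this example, with two booleans standing in for `isLive()` and `getChunk()`; none of this is LLD's actual API. The first loop counts the qualifying symbols, so the second loop can append without triggering a reallocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a symbol; `live` and `hasChunk` play the role of
// def->isLive() and def->getChunk() in the snippet above.
struct Sym {
  bool live;
  bool hasChunk;
  int id;
};

// Collects the ids of all live symbols that have a chunk, reserving the exact
// capacity first so that push_back never reallocates.
std::vector<int> collectPublics(const std::vector<Sym> &syms) {
  // Counting pass: a bool converts to 0 or 1 when added to the counter.
  std::size_t count = 0;
  for (const Sym &s : syms)
    count += (s.live && s.hasChunk);

  std::vector<int> publics;
  publics.reserve(count);

  // Filling pass: same predicate as the counting pass.
  for (const Sym &s : syms)
    if (s.live && s.hasChunk)
      publics.push_back(s.id);
  return publics;
}
```

A parallel variant would need `resize` rather than `reserve`, since concurrent `push_back` calls on one vector are not safe, and each worker would also need to know which output slots it owns.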