- Change Symbol::flags to a std::atomic<uint16_t> so that concurrent scanning threads can set flag bits without a lock
- Add llvm::parallel::threadIndex, a thread-local non-negative integer identifying the current worker thread
- Add relocsVec to part.relaDyn and part.relrDyn so that relative relocations can be added from multiple threads without a mutex
- Change -z nocombreloc to place relative relocations at the end (an arbitrary choice), and disable parallelism in that mode to keep the output deterministic.
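The three mechanisms in the list above (atomic symbol flags, a per-thread index, and per-thread relocation buffers) can be illustrated with the following minimal sketch. It is not lld's code: `Symbol`, `RelaDyn`, `RelativeReloc`, the flag values `NEEDS_GOT`/`NEEDS_PLT`, and `scanInParallel` are hypothetical stand-ins, and the thread-local `threadIndex` here only models the role of `llvm::parallel::threadIndex`. Each thread sets flags with `fetch_or` and appends only to its own buffer, so no mutex is needed; the buffers are merged after all threads join.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical flag bits for illustration; not lld's actual values.
enum : uint16_t { NEEDS_GOT = 1u << 0, NEEDS_PLT = 1u << 1 };

struct Symbol {
  // Atomic so that concurrent scanning threads can set bits without a mutex.
  std::atomic<uint16_t> flags{0};
};

// Hypothetical stand-in for llvm::parallel::threadIndex: a per-thread,
// non-negative index used to select a private buffer.
thread_local unsigned threadIndex = 0;

struct RelativeReloc {
  uint64_t offset;
};

// Hypothetical model of a dynamic relocation section with per-thread buffers.
struct RelaDyn {
  std::vector<std::vector<RelativeReloc>> relocsVec;

  explicit RelaDyn(unsigned numThreads) : relocsVec(numThreads) {}

  // Each thread appends only to its own slot, so no lock is needed.
  void addRelativeReloc(RelativeReloc r) {
    assert(threadIndex < relocsVec.size());
    relocsVec[threadIndex].push_back(r);
  }

  // Called after all threads have joined. Sorting by offset makes the result
  // independent of how work was distributed among threads.
  std::vector<RelativeReloc> merge() const {
    std::vector<RelativeReloc> out;
    for (const auto &v : relocsVec)
      out.insert(out.end(), v.begin(), v.end());
    std::sort(out.begin(), out.end(),
              [](const RelativeReloc &a, const RelativeReloc &b) {
                return a.offset < b.offset;
              });
    return out;
  }
};

// Simulated scan: thread t handles relocations [t*n, (t+1)*n).
std::vector<RelativeReloc> scanInParallel(std::vector<Symbol> &syms,
                                          unsigned numThreads, unsigned n) {
  RelaDyn relaDyn(numThreads);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      threadIndex = t;
      for (unsigned i = 0; i < n; ++i) {
        uint64_t idx = uint64_t(t) * n + i;
        uint64_t symIdx = idx % syms.size();
        // Even-indexed symbols need a GOT entry, odd ones a PLT entry.
        uint16_t bit = symIdx % 2 == 0 ? NEEDS_GOT : NEEDS_PLT;
        syms[symIdx].flags.fetch_or(bit, std::memory_order_relaxed);
        relaDyn.addRelativeReloc({idx * 8});
      }
    });
  }
  // join() orders all worker writes before the merge below.
  for (auto &w : workers)
    w.join();
  return relaDyn.merge();
}
```

Sorting in `merge()` is one way to get a deterministic order in this sketch; in the patch itself, -z nocombreloc instead disables parallelism to keep its output deterministic.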
MIPS and PPC64 rely on global state during relocation scanning, so they keep serial scanning.
Speed-up with mimalloc and --threads=8 on an Intel Skylake machine:
- clang (Release): 1.27x as fast
- clang (Debug): 1.06x as fast
- chrome (default): 1.05x as fast
- scylladb (default): 1.04x as fast
Speed-up with glibc malloc and --threads=16 on a ThunderX2 (AArch64):
- clang (Release): 1.31x as fast
- scylladb (default): 1.06x as fast
Why not define a copy constructor?