When linking a 1.2G output (nearly no debug info, 2846621 dynamic relocations) using --threads=8, I measured:

    9.131462 Total ExecuteLinker
    1.449913 Total Write output file
    1.445784 Total Write sections
    0.657152 Write sections {"detail":".rela.dyn"}
This change decreases the .rela.dyn write time from 0.66s to 0.25s, a speedup of about 4% in total link time.
- The parallelSort is slow because of expensive r_sym/r_offset computation. Cache the values.
- The iteration is slow. Move r_sym/r_addend computation ahead of time and parallelize it.
With the change, the new encodeDynamicReloc is cheap (0.05s), so there is no need to parallelize it.
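The caching idea can be sketched outside the linker as follows. This is only an illustration, not the code in this change: the Reloc struct, expensiveSymIndex, expensiveOffset, computeRaw and sortRelocs are hypothetical names, the two "expensive" functions are trivial stand-ins, and the (r_sym, r_offset) sort key is an assumption. In the linker the precompute loop would be a parallel loop and the sort a parallel sort; plain std:: equivalents are used here to keep the sketch self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a dynamic relocation record.
struct Reloc {
  uint32_t symId;    // input to the (pretend) costly symbol index lookup
  uint64_t secBase;  // input to the (pretend) costly offset computation
  uint64_t offInSec;
  uint64_t r_offset; // cached by computeRaw()
  uint32_t r_sym;    // cached by computeRaw()
};

// Trivial stand-ins for computations that are expensive in a real linker.
static uint32_t expensiveSymIndex(const Reloc &r) { return r.symId * 2 + 1; }
static uint64_t expensiveOffset(const Reloc &r) {
  return r.secBase + r.offInSec;
}

// Compute each value exactly once per relocation. Every iteration touches
// only its own element, so this loop is trivially parallelizable.
void computeRaw(std::vector<Reloc> &relocs) {
  for (Reloc &r : relocs) {
    r.r_sym = expensiveSymIndex(r);
    r.r_offset = expensiveOffset(r);
  }
}

// The comparator now reads cached fields instead of recomputing them on each
// of the O(n log n) comparisons.
void sortRelocs(std::vector<Reloc> &relocs) {
  computeRaw(relocs);
  std::sort(relocs.begin(), relocs.end(), [](const Reloc &a, const Reloc &b) {
    return std::tie(a.r_sym, a.r_offset) < std::tie(b.r_sym, b.r_offset);
  });
}
```

The point of the split is that the costly work drops from once per comparison to once per element, and the later encoding step only has to copy already-computed values.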
Just thinking about making the enum's underlying type uint8_t. Not entirely sure it will make a lot of difference in this case, but you may want to reorder some of the fields so that they minimise padding between items. For example, put addend before r_sym. Kind may also benefit from being last.
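To make the padding suggestion concrete, here is a small sketch with a hypothetical field set (not the actual struct under review). The names Loose, Tight, KindInt and Kind8 are invented for the example, and the sizes in the comments assume a typical 64-bit ABI where 8-byte integers are 8-byte aligned; other targets may differ.

```cpp
#include <cassert>
#include <cstdint>

enum KindInt { AddendOnlyI, AgainstSymbolI };               // usually int-sized
enum class Kind8 : uint8_t { AddendOnly, AgainstSymbol };   // one byte

// Fields interleaved by size: each 4-byte member is followed by padding so
// that the next 8-byte member is aligned. Typically 40 bytes on 64-bit ABIs.
struct Loose {
  uint32_t r_sym;
  uint64_t r_offset;
  KindInt kind;
  int64_t addend;
  uint32_t type;
};

// 8-byte members first, then 4-byte members, with the one-byte kind last.
// Typically 32 bytes on 64-bit ABIs.
struct Tight {
  uint64_t r_offset;
  int64_t addend;
  uint32_t r_sym;
  uint32_t type;
  Kind8 kind;
};

static_assert(sizeof(Kind8) == 1, "enum with uint8_t underlying type");
static_assert(sizeof(Tight) <= sizeof(Loose),
              "reordering should never make the struct larger here");
```

With millions of relocations, a few bytes saved per record can add up, though as the comment says it may not make a measurable difference in this case.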