Instead of storing a pointer, store the members we need.
The reason for doing this is that it makes it far easier to create synthetic sections. It also avoids reading data from files multiple times, which might help with cross-endian linking and host architectures with slow unaligned access.
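To make the idea concrete, here is a minimal sketch of the approach, not the actual patch. The names RawShdr, SectionSketch and fromBytes are hypothetical; RawShdr only mirrors the 64-bit ELF section header layout. It assumes the file has the same byte order as the host, so a real cross-endian link would also need a byte swap at the one place the header is read.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 64-bit ELF section header as laid out in a file.
struct RawShdr {
  uint32_t sh_name;
  uint32_t sh_type;
  uint64_t sh_flags;
  uint64_t sh_addr;
  uint64_t sh_offset;
  uint64_t sh_size;
  uint32_t sh_link;
  uint32_t sh_info;
  uint64_t sh_addralign;
  uint64_t sh_entsize;
};

// Hypothetical section class: instead of keeping a pointer into the mapped
// file, it keeps copies of the few header members it needs.
struct SectionSketch {
  // These members correspond to fields of the ELF section header.
  uint64_t Flags;     // sh_flags
  uint64_t Entsize;   // sh_entsize
  uint32_t Type;      // sh_type
  uint32_t Alignment; // sh_addralign, narrowed to 32 bits in this sketch

  // A synthetic section needs no header in any file: just pass the values.
  SectionSketch(uint64_t Flags, uint32_t Type, uint32_t Alignment,
                uint64_t Entsize)
      : Flags(Flags), Entsize(Entsize), Type(Type), Alignment(Alignment) {}

  // A section from a file reads its header exactly once. memcpy is used so
  // that the read is valid even if Buf is not suitably aligned.
  // Assumption: file byte order equals host byte order (no swap done here).
  static SectionSketch fromBytes(const unsigned char *Buf) {
    RawShdr H;
    std::memcpy(&H, Buf, sizeof(H));
    return SectionSketch(H.sh_flags, H.sh_type,
                         static_cast<uint32_t>(H.sh_addralign), H.sh_entsize);
  }
};
```

After construction, every later query (Flags, Type, and so on) is a plain load of a host-order integer, whether the section came from a file or was created by the linker.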
There are obvious compacting opportunities, but this already has mixed results even on native x86_64 linking.
There is also the possibility of better refactoring the code for handling common symbols, but this already shows that a custom class is not necessary.
In summary, I would like to commit this and iterate if you guys are OK with it.
The perf numbers I got on native x86_64 are:
firefox
master 7.309622414 patch 7.285605127 1.00329653976x faster
firefox-gc
master 7.510372903 patch 7.485952007 1.00326222984x faster
chromium
master 5.310620915 patch 5.269296067 1.0078425747x faster
chromium fast
master 2.062932807 patch 2.080311297 1.00842416677x slower
the gold plugin
master 0.356284971 patch 0.356714755 1.00120629281x slower
clang
master 0.602845631 patch 0.604443573 1.00265066531x slower
llvm-as
master 0.034642356 patch 0.034852631 1.00606988162x slower
the gold plugin fsds
master 0.3861195 patch 0.388452457 1.00604205952x slower
clang fsds
master 0.688210298 patch 0.689806352 1.00231913705x slower
llvm-as fsds
master 0.0320734 patch 0.032240692 1.005215911x slower
scylla
master 3.249120693 patch 3.246897164 1.00068481658x faster
I am building lld on a big-endian power8 and will try to benchmark a cross link of an x86_64 binary.
Please add a comment to note that they correspond to Elf_Shdr.