https://maskray.me/blog/2022-01-16-archives-and-start-lib
For every definition in an extracted archive member, we intern the symbol twice:
once for the archive index entry and once for the .o symbol table after
extraction. This is inefficient.
Symbols in a --start-lib ObjFile/BitcodeFile are only interned once because the
result is cached in symbols[i].
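
To illustrate the difference in lookup counts, here is a minimal standalone
sketch. It is not lld's code: Symtab, internCount, linkAsArchive and
linkAsLazyObject are hypothetical names, and the model assumes every member is
extracted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy model only; none of these names are lld's real API.
struct Symbol {
  std::string name;
  bool lazy;
};

struct Symtab {
  std::unordered_map<std::string, Symbol> map;
  std::size_t internCount = 0; // number of hash table lookups performed

  // Look up (or create) the symbol for `name`. Pointers to elements of an
  // unordered_map stay valid across rehashing, so callers may cache them.
  Symbol *intern(const std::string &name) {
    ++internCount;
    auto it = map.emplace(name, Symbol{name, true}).first;
    return &it->second;
  }
};

// Old archive path: one lookup per archive index entry, then another lookup
// per definition when the extracted member's symbol table is parsed.
std::size_t linkAsArchive(const std::vector<std::string> &defs) {
  Symtab symtab;
  for (const std::string &name : defs)
    symtab.intern(name); // archive index entry
  for (const std::string &name : defs)
    symtab.intern(name)->lazy = false; // .o symbol table after extraction
  return symtab.internCount;
}

// --start-lib-style path: intern once while parsing lazily, cache the result
// in symbols[i], and reuse the cached pointer on extraction.
std::size_t linkAsLazyObject(const std::vector<std::string> &defs) {
  Symtab symtab;
  std::vector<Symbol *> symbols(defs.size());
  for (std::size_t i = 0; i < defs.size(); ++i)
    symbols[i] = symtab.intern(defs[i]);
  for (Symbol *sym : symbols) {
    assert(sym && "every slot was filled by the lazy parse");
    sym->lazy = false; // extraction: no second lookup needed
  }
  return symtab.internCount;
}
```

In this model the archive path performs two lookups per definition and the
lazy-object path performs one.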
Handle an archive using the --start-lib code path instead. ArchiveFile and
LazyArchive can then be removed. For many projects the archive member
extraction ratio is high, so this is a net performance win: linking a Release
build of clang is 1.01x as fast.
Note: --start-lib scans symbols in the same order that llvm-ar adds them to the
index, so in the common case the semantics should be identical. If the archive
symbol table was created in a different order, or is incomplete, this strategy
may have different semantics. Such cases are considered user error.
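
A rough sketch of why the order matters, under the simplifying assumption that
the first lazy definition seen for a name decides which member is extracted.
Member, extractViaIndex and extractViaMemberScan are hypothetical names for
illustration, not lld functions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model only; none of these names are lld's real API.
struct Member {
  std::string name;
  std::vector<std::string> defs; // symbols defined by this member
};

// An archive index modeled as (symbol name, member name) pairs in index order.
using Index = std::vector<std::pair<std::string, std::string>>;

// Old behavior (simplified): the first index entry for `undef` picks the member.
std::string extractViaIndex(const Index &index, const std::string &undef) {
  assert(!undef.empty() && "symbol name must be non-empty");
  for (const auto &entry : index)
    if (entry.first == undef)
      return entry.second;
  return ""; // not in the index: nothing is extracted
}

// New behavior (simplified): members are scanned in archive order, as with
// --start-lib, and the first member defining `undef` is picked.
std::string extractViaMemberScan(const std::vector<Member> &members,
                                 const std::string &undef) {
  assert(!undef.empty() && "symbol name must be non-empty");
  for (const Member &m : members)
    for (const std::string &def : m.defs)
      if (def == undef)
        return m.name;
  return "";
}
```

With an index listed in member order both functions pick the same member. With
a reordered index they can disagree, and with an incomplete index the member
scan can find a definition that the index lookup misses.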
The "is neither ET_REL nor LLVM bitcode" error is changed to a warning.
Previously an archive could contain such members without a diagnostic, so using
a warning prevents breakage.
- For some tests, diagnostics that previously omitted the archive member name are improved: "b.a:" => "b.a(b.o):".
- no-obj.s: the link is now allowed, matching GNU ld.
- archive-no-index.s: the "is neither ET_REL nor LLVM bitcode" diagnostic is demoted to a warning.
- incompatible.s: even when an archive member is not extracted, we may report an "incompatible with" error.
I recently decreased sizeof(SymbolUnion) by 8 and decreased memory usage quite a
bit, so retaining symbols for un-extracted archive members should not cause a
memory usage problem.
Maybe worth expanding this some more: "--start-lib and --end-lib scan symbols in the same order that llvm-ar adds them, so in the common case the semantics should be identical. If the archive symbol table was created in a different order, or is incomplete, this strategy has different semantics. Such cases are considered user error."