String merging is one of the most time-consuming functions in lld.
This patch parallelizes it to speed it up. On my 2-socket, 20-core,
40-thread Xeon E5-2680 @ 2.8 GHz machine, this patch shortens the
clang debug build link time from 7.11s to 5.16s. That's a 27%
improvement and is quite noticeable. Under these test conditions,
lld is now 4x faster than gold.
Event Timeline
lld/ELF/SyntheticSections.cpp | ||
---|---|---|
2240 | Would it be better to calculate Begin and End for each shard instead of iterating over all pieces and using constructions like Sec->Hashes[I] % NumShards == ShardId? Something like: size_t TaskSize = Sec->Pieces.size() / NumShards; size_t I = TaskSize * ShardId; size_t E = (ShardId == NumShards - 1) ? Sec->Pieces.size() : I + TaskSize; |
lld/ELF/SyntheticSections.cpp | ||
---|---|---|
2240 | That won't work because it isn't guaranteed to produce a minimal table. |
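To illustrate the exchange above, here is a minimal sketch of the two partitioning schemes. It is not the lld code: the helper names and the use of std::unordered_set and std::hash are assumptions made for the example, and NumShards is assumed to be nonzero. With contiguous ranges, the same string can land in two shards and be stored twice; with hash-based sharding, equal strings always map to the same shard, so each unique string is stored once.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from lld): total number of table entries when
// each shard deduplicates only its own contiguous range of pieces.
std::size_t entriesWithRangeShards(const std::vector<std::string> &Pieces,
                                   std::size_t NumShards) {
  std::size_t TaskSize = Pieces.size() / NumShards;
  std::size_t Total = 0;
  for (std::size_t ShardId = 0; ShardId < NumShards; ++ShardId) {
    std::size_t I = TaskSize * ShardId;
    std::size_t E = (ShardId == NumShards - 1) ? Pieces.size() : I + TaskSize;
    std::unordered_set<std::string> Table; // per-shard string table
    for (; I < E; ++I)
      Table.insert(Pieces[I]);
    Total += Table.size();
  }
  return Total;
}

// Hypothetical helper (not from lld): total number of table entries when
// a piece belongs to the shard selected by its hash.
std::size_t entriesWithHashShards(const std::vector<std::string> &Pieces,
                                  std::size_t NumShards) {
  std::hash<std::string> Hasher;
  std::size_t Total = 0;
  for (std::size_t ShardId = 0; ShardId < NumShards; ++ShardId) {
    std::unordered_set<std::string> Table; // per-shard string table
    for (const std::string &S : Pieces)
      if (Hasher(S) % NumShards == ShardId)
        Table.insert(S);
    Total += Table.size();
  }
  return Total;
}
```

For the input {"a", "b", "a", "b"} with two shards, the range-based version stores four entries in total, because both ranges contain "a" and "b", while the hash-based version stores two. The hash-based version pays for this by having every shard scan all pieces, which is the cost the question was trying to avoid.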
lld/ELF/SyntheticSections.cpp | ||
---|---|---|
2240 | What I want to say is that the spec says: "SHF_MERGE is an optional flag indicating a possible optimization. The link-editor is allowed to perform the optimization, or to ignore the optimization." MergeNoTailSection is used for -O1, so it is not used for full optimization. That means users accept a larger output size than is theoretically possible in exchange for a faster link time. If such a change does not noticeably reduce the link time, it makes no sense, though. |
- Implemented a better algorithm. Unlike the previous one, the new algorithm has little overhead in the single-core case.
@ruiu, this happened to fail check-lld on -m32 (i686). Seems x86-64 is fine.
http://bb.pgr.jp/builders/test-lld-i686-linux-RA/builds/538
Could you investigate it please? I will investigate tomorrow.
Would it be better to calculate Begin and End for each shard instead of iterating over all pieces and using constructions like Sec->Hashes[I] % NumShards == ShardId?
Something like: