Page MenuHomePhabricator

Parallelize string merging.
ClosedPublic

Authored by ruiu on Sep 25 2017, 6:04 PM.

Details

Summary

String merging is one of the most time-consuming functions in lld.
This patch parallelize it to speed it up. On my 2-socket 20-core
40-threads Xeon E5-2680 @ 2.8 GHz machine, this patch shorten the
clang debug build link time from 7.11s to 5.16s. It's a 27%
improvement and actually pretty noticeable. In this test condition,
lld is now 4x faster than gold.

Diff Detail

Repository
rL LLVM

Event Timeline

ruiu created this revision.Sep 25 2017, 6:04 PM
ruiu edited the summary of this revision. (Show Details)Sep 25 2017, 6:07 PM
grimar added a subscriber: grimar.Sep 27 2017, 1:59 AM
grimar added inline comments.
lld/ELF/SyntheticSections.cpp
2241 ↗(On Diff #116627)

Will it be better to calculate Begin and End instead of iterating over all pieces and using constructions like Sec->Hashes[I] % NumShards == ShardId ?
Something like:

size_t TaskSize = Sec->Pieces.size() / NumShards;
size_t I = TaskSize * ShardId;
size_t E = (ShardId == NumShards - 1) ? Sec->Pieces.size() : I + TaskSize;
ruiu added inline comments.Sep 27 2017, 8:33 PM
lld/ELF/SyntheticSections.cpp
2241 ↗(On Diff #116627)

That won't work because it doesn't guarantee to produce a minimum table.

grimar added inline comments.Sep 28 2017, 5:10 AM
lld/ELF/SyntheticSections.cpp
2241 ↗(On Diff #116627)

What I want to say is that spec says: "SHF_MERGE is an optional flag indicating a possible optimization. The link-editor is allowed to perform the optimization, or to ignore the optimization."

MergeNoTailSection is used for -O1. So for full optimization it is not used and that means users agree to have larger output size than is possible in theory, but have faster link time.
In that sence it can be acceptable not to guatantee a minimum table as it anyways not minimum possible one.

if such change does not noticably reduces the link time, it make no sence though.

ruiu updated this revision to Diff 117082.Sep 28 2017, 7:33 PM
  • Implemented a better algorithm. The new algorithm has little overhead on a single core use case unlike the previous one.
ruiu updated this revision to Diff 117084.Sep 28 2017, 7:45 PM
  • Fix a few minor errors.
ruiu edited the summary of this revision. (Show Details)Sep 28 2017, 8:06 PM
This revision was automatically updated to reflect the committed changes.

@ruiu, this happened to fail check-lld on -m32 (i686). Seems x86-64 is fine.
http://bb.pgr.jp/builders/test-lld-i686-linux-RA/builds/538

Could you investigate it please? I will investigate tomorrow.