This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Parallelize linker optimization hint processing
ClosedPublic

Authored by BertalanD on Sep 7 2022, 10:56 AM.

Details

Summary

This commit moves the parsing of linker optimization hints into
ARM64::applyOptimizationHints. This lets us avoid allocating memory
for holding the parsed information, and moves work out of
ObjFile::parse, which is not parallelized at the moment.

This change reduces the overhead of processing LOHs to 25-30 ms when
linking Chromium Framework on my M1 machine; previously it took close to
100 ms.

There's no statistically significant change in runtime for a --threads=1
link.

Performance figures with all 8 cores utilized:

    N           Min           Max        Median           Avg        Stddev
x  20     3.8027232     3.8760762     3.8505335     3.8454145   0.026352574
+  20     3.7019017     3.8660538     3.7546209     3.7620371   0.032680043
Difference at 95.0% confidence
	-0.0833775 +/- 0.019
	-2.16823% +/- 0.494094%
	(Student's t, pooled s = 0.0296854)

Diff Detail

Event Timeline

BertalanD created this revision.Sep 7 2022, 10:56 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
BertalanD requested review of this revision.Sep 7 2022, 10:56 AM
int3 added a subscriber: int3.Sep 7 2022, 1:24 PM

This change reduces the overhead of processing LOHs to 15-20 ms

Could you include what the previous overhead was?

lld/MachO/Arch/ARM64.cpp
552–557

nit: how about declaring this outside the loop and have it take p as a param?

maybe the compiler can optimize the code as-is, but it would be nice not to have to wonder about it + having parameters passed explicitly instead of by capture tends to make for more readable code (at least when the param list is small)

569–570

it would be nice to avoid macros where possible, especially when they change control flow... IMO this is clearer and doesn't take any extra lines

lld/test/MachO/invalid/invalid-loh.s
7–8

can rm this now that split-file is no longer used

BertalanD updated this revision to Diff 458569.Sep 7 2022, 2:52 PM
BertalanD edited the summary of this revision. (Show Details)

Addressed review comments, tweaked the comment on applyOptimizationHints and updated the description to include that the previous overhead was close to 100 milliseconds.

BertalanD marked 3 inline comments as done.Sep 7 2022, 2:55 PM
BertalanD added inline comments.
lld/MachO/Arch/ARM64.cpp
569–570

Oh, right. This can just be a lambda in the outermost scope that captures section as sectionAddr.

BertalanD updated this revision to Diff 458733.Sep 8 2022, 7:28 AM
BertalanD marked an inline comment as done.

Removed ObjFile::optimizationHints.

oontvoo added a subscriber: oontvoo.Sep 8 2022, 8:15 AM
oontvoo added inline comments.
lld/MachO/Arch/ARM64.cpp
551–556

nit: does this have to be a lambda? can't it be a simple static function?

each spelling of a lambda will be a unique type - multipling with different instantiation of this forEachHint's template, we may end up with a quite a handful types.

708

delete this?

BertalanD updated this revision to Diff 458766.Sep 8 2022, 8:57 AM

Changed the readValuelambda into a standalone function and removed a stray #undef.

BertalanD marked 2 inline comments as done.Sep 8 2022, 8:58 AM
BertalanD added inline comments.
lld/MachO/Arch/ARM64.cpp
708

oops

thakis accepted this revision.Sep 11 2022, 2:40 PM
thakis added a subscriber: thakis.

Nice!

lld/MachO/Arch/ARM64.cpp
675

Does this save a lot of time? Running the loop below and early-returning every time (if there's no AdrpAdrp) feels like it should be fairly fast, and most realistic programs have AdrpAdrps anyways (…right?). I'd probably remove this early return and this variable.

This revision is now accepted and ready to land.Sep 11 2022, 2:40 PM
BertalanD marked an inline comment as done.Sep 12 2022, 9:30 AM
BertalanD added inline comments.
lld/MachO/Arch/ARM64.cpp
675

I checked the usual Chromium Framework repro file. Of the 29,126 objects that have *any* LOHs, only 4127 actually contain the AdrpAdrp kind. IMO it's worth keeping this check. Reading all these ULEB-encoded integers a second time is not a trivial task.

int3 accepted this revision.Sep 12 2022, 9:53 AM

lgtm

thakis added inline comments.Sep 12 2022, 10:11 AM
lld/MachO/Arch/ARM64.cpp
675

Sgtm

Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 8:39 AM