This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Optimize getRelocAttrs()
AbandonedPublic

Authored by int3 on Mar 12 2021, 6:05 PM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Summary

Investigation of PR49480 showed that D95121 caused about a 5.0% speed
regression when linking chromium_framework. That diff introduces a (very
useful) additional layer of abstraction over relocations, so the perf
overhead is not too surprising. The diff is pretty large, and perf
didn't give me any great hints, so I just went with optimizing the
likely candidate -- getRelocAttrs(). I managed to claw back about
1.4% of perf this way. Making the relocAttrsArray a global and
devirtualizing getRelocAttrs() gave most of the win; I also marked
the array range check with LLVM_UNLIKELY for good measure.

The numbers above are quoted for chromium_framework (from the tarball in
PR48657).

    N           Min           Max        Median           Avg        Stddev
x  20           4.5          4.66          4.56        4.5715   0.044871161
+  20          4.42          4.61           4.5        4.5075   0.053001986
Difference at 95.0% confidence
        -0.064 +/- 0.0314295
        -1.39998% +/- 0.68751%
        (Student's t, pooled s = 0.0491052)

I also measured v8_unittests:

    N           Min           Max        Median           Avg        Stddev
x  20          0.62          0.65          0.64        0.6355  0.0082557795
+  20          0.61          0.63          0.62         0.616  0.0059824304
Difference at 95.0% confidence
        -0.0195 +/- 0.00461426
        -3.06845% +/- 0.726084%
        (Student's t, pooled s = 0.00720928)

The v8 difference is likely larger because it doesn't use an order file.
Symbol ordering is actually one of the most expensive steps when linking
chromium_framework, and probably is a target for further optimization.

The other hotspot is the assignment of relocations to subsections. I'm
curious as to whether replacing the RB-tree in std::map with a radix
trie would be an improvement...

Diff Detail

Event Timeline

int3 created this revision.Mar 12 2021, 6:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 6:05 PM
int3 requested review of this revision.Mar 12 2021, 6:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 6:05 PM
int3 edited the summary of this revision. (Show Details)Mar 12 2021, 6:06 PM
int3 added a comment.Mar 13 2021, 7:20 AM

I'm aware :) the reason why we're using it here is because we need upper_bound(). That said, a sorted vector (as mentioned in the manual) might help here... parseSymbols interleaves inserts and queries to the SubsectionMaps, but parseRelocations only does queries, so we could convert the map to a vector in the latter case.

int3 planned changes to this revision.Mar 15 2021, 11:28 AM
int3 added a subscriber: smeenai.

maybe using templates would be better here, similar to LLD-ELF's ELFT type. Thanks @smeenai for the idea

int3 abandoned this revision.Mar 25 2021, 10:59 AM

This is kind of ugly, and I think there's lower-hanging performance fruit, so we can revisit this later