This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Parallelize UUID hash computation
ClosedPublic

Authored by int3 on Mar 24 2021, 9:51 AM.

Details

Reviewers
MaskRay
oontvoo
Group Reviewers
Restricted Project
Commits
rG9b6dde8af8f0: [lld-macho] Parallelize UUID hash computation
Summary

This reuses the approach (and some code) from LLD-ELF.

It's a decent win when linking chromium_framework on a Mac Pro (3.2 GHz 16-Core Intel Xeon W):

    N           Min           Max        Median           Avg        Stddev
x  20          4.58          4.83          4.66        4.6685   0.066591844
+  20          4.42          4.61           4.5         4.505    0.04751731
Difference at 95.0% confidence
        -0.1635 +/- 0.0370242
        -3.5022% +/- 0.793064%
        (Student's t, pooled s = 0.0578462)

The output binary is 381MB.

Diff Detail

Event Timeline

int3 created this revision.Mar 24 2021, 9:51 AM
Herald added a project: Restricted Project. · View Herald Transcript
int3 requested review of this revision.Mar 24 2021, 9:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 24 2021, 9:51 AM
oontvoo added inline comments.
lld/MachO/Writer.cpp
921–925

Looking at the implementation of xxHash64, I think the reinterpret_cast<uint8_t *> is UB.
Specifically, it tried to iterate the array by +8.

How about this?

ELF uses the same chunk size. I am wondering, why the chunk size is independent of the file size and the number of cores? E.g. why bother with chunks if you have 1 core? Or if if you link a 1GB file with 10 cores, you probably have too many chunks?

Would you mind adding the file size and the number of cores to the summary?

int3 added a comment.Mar 24 2021, 12:02 PM

Would you mind adding the file size and the number of cores to the summary?

Ah yeah I'd forgotten to do that, good catch

lld/MachO/Writer.cpp
921–925

Looking at the implementation of xxHash64, I think the reinterpret_cast<uint8_t *> is UB.
Specifically, it tried to iterate the array by +8.

Can you elaborate? I'm not sure I follow.

As for using write64le to write the intermediate values, I know LLD-ELF does it, but seems pretty superfluous to me. I guess it means that we'll get the same hash values if we run LLD on a big-endian host machine, but ... when will that ever happen? And if someone tried that, I'm pretty sure lots of things would break anyway, since we don't have any contbuilds covering this.

int3 edited the summary of this revision. (Show Details)Mar 24 2021, 12:04 PM
int3 planned changes to this revision.Mar 24 2021, 12:06 PM

Let me see what I can do about the chunks.

oontvoo added inline comments.Mar 24 2021, 12:49 PM
lld/MachO/Writer.cpp
921–925

Can you elaborate? I'm not sure I follow.
As for using write64le <..... >

I guess write64le is not strictly required :)

I was getting more at the reinterpret_cast. There is no *guarantee* that casting uint64_t* to a uint8_t* won't cause aliasing violations. In practice, it's probably fine. If possible, though, it'd be nice to avoid it. (And it *is* possible in this case)

int3 added inline comments.Mar 24 2021, 2:07 PM
lld/MachO/Writer.cpp
921–925

Hmm interesting. I did some reading up on aliasing rules, and if I understand things correctly, casting to a char type is fine.

From https://github.com/cplusplus/draft/raw/master/papers/n4659.pdf, section 6.10 [basic.lval] point 8:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined
...
a char, unsigned char, or std::byte type.

int3 updated this revision to Diff 333181.Mar 24 2021, 5:33 PM

size chunks based on number of cores

int3 added a comment.Mar 24 2021, 5:33 PM

The initial numbers were noisy, I've updated the commit message with better data

int3 added a comment.Mar 30 2021, 7:15 AM

bump. Can I get a stamp?

oontvoo accepted this revision.Mar 30 2021, 10:13 AM

LGTM

lld/MachO/Writer.cpp
920

It seems strategy is only initted to hardware_concurrency (in Driver.cpp) if --threads is set.
Do we care to set set it unconditionally?

921–925

Not the hill to die on but I'll add that uint8_t isn't *guaranteed* to be unsigned char. Yes, in practice code in LLVM seems to make this kind of assumption. so .... ¯\_(ツ)_/¯

This revision is now accepted and ready to land.Mar 30 2021, 10:13 AM
int3 marked an inline comment as done.Mar 31 2021, 12:39 PM
int3 added inline comments.
lld/MachO/Writer.cpp
920

The default-constructed strategy will return the right value for compute_thread_count() too, so we don't need to

This revision was landed with ongoing or failed builds.Mar 31 2021, 12:49 PM
This revision was automatically updated to reflect the committed changes.
int3 marked an inline comment as done.
MaskRay added inline comments.Dec 16 2021, 11:56 PM
lld/MachO/Writer.cpp
920

Is this approach faster than fixed-size 1MiB chunks used by lld/ELF?