This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Deduplicate names of local symbols only with -O2
ClosedPublic

Authored by MaskRay on Jan 30 2022, 6:23 PM.

Details

Summary

The deduplication requires a DenseMap the same size as the local part of
.strtab. I optimized it in e20544543478b259eb09fa0a253d4fb1a5525d9e but it is
still quite slow.

For a Release build of clang, deduplication makes .strtab 1.1% smaller and makes the link 3% slower.
For chrome, deduplication makes .strtab 0.1% smaller and makes the link 6% slower.

I suggest that we only perform the optimization with -O2 (default is -O1).
Not deduplicating local symbol names will simplify parallel symbol table write.
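
To illustrate the trade-off described above, here is a minimal sketch of a string table builder with an optional deduplication map. It is not LLD's code: the class name StrTabSketch and its members are hypothetical, and std::unordered_map stands in for llvm::DenseMap so the example is self-contained.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a .strtab builder; not LLD's actual class.
class StrTabSketch {
public:
  explicit StrTabSketch(bool dedup) : dedup(dedup) {}

  // Appends `name` (NUL-terminated) and returns its offset in the table.
  // With dedup enabled, a name seen before reuses its earlier offset.
  std::size_t add(std::string_view name) {
    if (dedup) {
      auto it = offsets.find(std::string(name));
      if (it != offsets.end())
        return it->second;
    }
    std::size_t off = data.size();
    data.append(name);
    data.push_back('\0');
    if (dedup)
      offsets.emplace(std::string(name), off);
    return off;
  }

  std::size_t size() const { return data.size(); }

private:
  bool dedup;
  // ELF string tables begin with a single NUL byte at offset 0.
  std::string data = std::string(1, '\0');
  // Grows with the number of distinct names; this is the extra cost.
  std::unordered_map<std::string, std::size_t> offsets;
};
```

With deduplication every added name costs a hash lookup and the map holds one entry per distinct name; without it, adding a name is a plain append and the table is larger by the size of the repeated names.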

Event Timeline

MaskRay created this revision. Jan 30 2022, 6:23 PM
MaskRay requested review of this revision. Jan 30 2022, 6:23 PM
Herald added a project: Restricted Project. Jan 30 2022, 6:23 PM

I'm OK with the change. It looks like a good trade-off for non-release links. I think it will be worth an entry in the release notes. Something like: "LLD no longer deduplicates local symbol names at the default optimization level of -O1. This results in a larger output ELF file but a faster link time. Use optimization level -O2 to restore the deduplication."

MaskRay updated this revision to Diff 404738. Jan 31 2022, 2:42 PM

Thanks for the note. Added to ReleaseNotes.rst.

MaskRay edited the summary of this revision. Jan 31 2022, 11:49 PM
peter.smith accepted this revision. Feb 1 2022, 1:42 AM

LGTM, thanks for the release note.

This revision is now accepted and ready to land. Feb 1 2022, 1:42 AM
MaskRay edited the summary of this revision. Feb 1 2022, 10:09 AM
This revision was automatically updated to reflect the committed changes.

This .strtab deduplication makes parallel .symtab write difficult. I suspect we may need to disable it entirely in the future.
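
One way to see why deduplication gets in the way: when no names are shared, the position of each input file's local names in .strtab depends only on the sizes of the files before it, so the start offsets can be computed up front and each file's names written independently. The sketch below is an assumption-laden illustration, not LLD's implementation; the function name fileStartOffsets and the vector-of-vectors input are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: for each input file, compute where its local symbol
// names would start in .strtab if nothing is deduplicated.
std::vector<std::size_t>
fileStartOffsets(const std::vector<std::vector<std::string>> &files) {
  std::vector<std::size_t> starts;
  starts.reserve(files.size());
  std::size_t off = 1; // offset 0 holds the leading NUL byte
  for (const auto &names : files) {
    starts.push_back(off);
    for (const auto &n : names)
      off += n.size() + 1; // each name is followed by a NUL terminator
  }
  return starts;
}
```

Once the start offsets are known, writers for different files touch disjoint byte ranges. With deduplication, the offset of a name depends on whether an earlier file already added it, which requires a shared lookup structure.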

Thanks for making this optimization. I suspect disabling it entirely is acceptable because the deduplication could be done via post-processing. There's going to be a trade-off between the features offered and the amount of parallelism. I think that's fine as long as the speedup is reasonable. Users are evidently willing to go to great lengths to restructure their builds to get the most out of the build tools (game code often uses unity builds, for example).