This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Deduplicate CFStrings
ClosedPublic

Authored by int3 on Jun 28 2021, 11:35 AM.

Details

Reviewers
gkm
Group Reviewers
Restricted Project
Commits
rGac2dd06b91ae: [lld-macho] Deduplicate CFStrings
Summary

__cfstring is a special literal section, so instead of breaking it up
at symbol boundaries, we break it up at fixed-width boundaries (since
each literal is the same size). Symbols can only occur at one of those
boundaries, so this is strictly more powerful than
.subsections_via_symbols.

With that in place, we then run the section through ICF.

This change is about perf-neutral when linking chromium_framework.

Diff Detail

Event Timeline

int3 created this revision.Jun 28 2021, 11:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: mgrang. · View Herald Transcript
int3 requested review of this revision.Jun 28 2021, 11:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2021, 11:35 AM
int3 edited the summary of this revision. (Show Details)Jul 1 2021, 6:31 AM
int3 updated this revision to Diff 355866.Jul 1 2021, 7:05 AM

don't create extra InputSections unless ICF is enabled

int3 edited the summary of this revision. (Show Details)Jul 1 2021, 7:06 AM
gkm added a comment.Jul 1 2021, 4:16 PM

LGTM

lld/MachO/InputFiles.cpp
287
lld/test/MachO/cfstring-dedup.s
76

Length of UTF-8 "foo" is 3, but length of UTF-16 "foo" (or rather "foo\0") is 4? I do see that l_.str.2 has an extra 0 terminator. Is that detail relevant? If so, it could use a comment.

gkm accepted this revision.Jul 1 2021, 4:16 PM
This revision is now accepted and ready to land.Jul 1 2021, 4:16 PM
int3 added inline comments.Jul 1 2021, 4:40 PM
lld/test/MachO/cfstring-dedup.s
76

yeah, the idea was to check that we would correctly handle utf-16 strings with null bytes in them. TBH, utf-8 *should* support encoding null bytes too, but 1) ld64 doesn't handle that correctly either and 2) clang will generate utf-16 CFStrings if their literal contains a null byte. Yeah, I'll add all that in a comment...

This revision was automatically updated to reflect the committed changes.