This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Deduplicate CFStrings during ICF
ClosedPublic

Authored by int3 on Feb 18 2022, 8:38 AM.

Details

Reviewers
Roger
Group Reviewers
Restricted Project
Commits
rG8ec103393300: [lld-macho] Deduplicate CFStrings during ICF
Summary

__cfstring has embedded addends that foil ICF's hashing / equality
checks. (We can ignore embedded addends when doing ICF because the same
information gets recorded in our Reloc structs.) Therefore, in order to
properly dedup CFStrings, we create a mutable copy of the CFString and
zero out the embedded addends before performing any hashing / equality
checks.

(We did in fact have a partial implementation of CFString deduplication
already. However, it only worked when the cstrings they point to are at
identical offsets in their object files.)

I anticipate this approach can be extended to other similar
statically-allocated struct sections in the future.

In addition, we previously treated all references with differing addends
as unequal. This is not true when the references are to literals:
different addends may point to the same literal in the output binary. In
particular, __cfstring has such references to __cstring. I've
adjusted ICF's equalsConstant logic accordingly, and I've added a few
more tests to make sure the addend-comparison code path is adequately
covered.

Fixes https://github.com/llvm/llvm-project/issues/51281.

Diff Detail

Event Timeline

int3 created this revision.Feb 18 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2022, 8:38 AM
int3 requested review of this revision.Feb 18 2022, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2022, 8:38 AM
int3 edited the summary of this revision. (Show Details)Feb 18 2022, 8:41 AM
int3 updated this revision to Diff 410010.Feb 18 2022, 1:24 PM
int3 edited the summary of this revision. (Show Details)

update

Roger accepted this revision.Mar 7 2022, 1:36 PM
Roger added a subscriber: Roger.

As I understand, we are zero-ing out the addends of all cfstring sections because we are moving the addend check from the section comparisons to the relocation comparisons. All relocation comparisons (not just those for cfsections) will now check addend equality. LGTM

lld/MachO/ICF.cpp
129

For the Undefined case, it looks like we're making a functional change from returning sa == sb to always returning false. Was the previous behavior a bug? Is this Undefined symbol a symbol that needs to be linked to a definition in another source file?

This revision is now accepted and ready to land.Mar 7 2022, 1:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 1:36 PM
int3 marked an inline comment as done.Mar 7 2022, 7:32 PM
int3 added inline comments.
lld/MachO/ICF.cpp
129

oops, no, the previous behavior was intended. Let me fix this and add a test so we don't regress

int3 updated this revision to Diff 413686.Mar 7 2022, 7:32 PM
int3 marked an inline comment as done.

update

This revision was automatically updated to reflect the committed changes.