__cfstring has embedded addends that foil ICF's hashing / equality
checks. (We can ignore embedded addends when doing ICF because the same
information gets recorded in our Reloc structs.) Therefore, in order to
properly dedup CFStrings, we create a mutable copy of the CFString and
zero out the embedded addends before performing any hashing / equality
checks.
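
The zeroing step can be illustrated with a minimal, self-contained sketch.
It is not the actual lld code: the `Reloc` struct below is a simplified,
hypothetical stand-in, and `zeroEmbeddedAddends` is an illustrative name.
The sketch assumes each relocation records the offset and width of the
field it fixes up, and that the addend has already been copied out of the
section data into the relocation record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified relocation record (not lld's real Reloc).
struct Reloc {
  uint32_t offset; // byte offset of the relocated field in the section data
  uint8_t length;  // log2 of the field width: 2 -> 4 bytes, 3 -> 8 bytes
  int64_t addend;  // addend already extracted from the section data
};

// Return a mutable copy of `data` in which every relocated field is zeroed,
// so that hashing and equality checks ignore the embedded addends.
std::vector<uint8_t> zeroEmbeddedAddends(const std::vector<uint8_t> &data,
                                         const std::vector<Reloc> &relocs) {
  std::vector<uint8_t> copy(data);
  for (const Reloc &r : relocs) {
    size_t width = size_t{1} << r.length;
    // Skip malformed relocations that would write past the end of the data.
    if (r.offset + width <= copy.size())
      std::memset(copy.data() + r.offset, 0, width);
  }
  return copy;
}
```

Two records that differ only in the bytes covered by relocations compare
equal after this step. Any difference in the remaining fields (for a
CFString, something like the flags or the length) still distinguishes
them.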

(We did in fact already have a partial implementation of CFString
deduplication. However, it only worked when the cstrings that the
CFStrings pointed to were at identical offsets in their object files.)

I anticipate this approach can be extended to other similar
statically-allocated struct sections in the future.

In addition, we previously treated all references with differing addends
as unequal. That is incorrect when the references are to literals:
different addends may point to the same literal in the output binary. In
particular, __cfstring has such references to __cstring. I've
adjusted ICF's equalsConstant logic accordingly, and I've added a few
more tests to make sure the addend-comparison code path is adequately
covered.
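
To illustrate why addends alone cannot decide equality for references to
literals, here is a small sketch. It does not reproduce the real
equalsConstant logic. `LiteralSection`, `LiteralRef`, `resolve`, and
`refsEqual` are hypothetical names, and the model assumes that each input
literal section knows where its literals start and where each one lands in
the output after deduplication.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one object file's literal section (e.g. __cstring).
struct LiteralSection {
  // Sorted input offsets at which each literal starts; assumed non-empty
  // with the first entry equal to 0.
  std::vector<uint64_t> pieceStarts;
  // Output offset assigned to each literal after deduplication.
  std::vector<uint64_t> outputOffsets;
};

// A reference into a literal section: the section plus an addend (offset).
struct LiteralRef {
  const LiteralSection *isec;
  uint64_t addend;
};

// Map a reference to the output offset it will point to.
uint64_t resolve(const LiteralRef &r) {
  const std::vector<uint64_t> &starts = r.isec->pieceStarts;
  // Find the last literal that starts at or before the addend.
  auto it = std::upper_bound(starts.begin(), starts.end(), r.addend);
  size_t idx = static_cast<size_t>(it - starts.begin()) - 1;
  return r.isec->outputOffsets[idx] + (r.addend - starts[idx]);
}

// Two references are equal if they resolve to the same output location,
// regardless of whether their addends match.
bool refsEqual(const LiteralRef &a, const LiteralRef &b) {
  return resolve(a) == resolve(b);
}
```

For example, if one object file contains "foo\0bar\0" and another contains
only "bar\0", the references with addend 4 into the first and addend 0 into
the second both resolve to the single deduplicated "bar" in the output, so
they compare equal under this model even though the addends differ.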
For the Undefined case, it looks like we're making a functional change from returning sa == sb to always returning false. Was the previous behavior a bug? Is this Undefined symbol a symbol that needs to be linked to a definition in another source file?