Otherwise, it doesn't take into account things like whether the symbol
is defined or undefined, and whether symbols are indirect references
(re-exports) or not.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
So _symbol_difference is used to check for added or removed symbols. changed_symbols is the bit of logic that detects if the properties of the symbol changed.
What's the issue this is trying to address?
The problem this is trying to address is that we don't take into account the addition of an indirect symbol (a re-export). Try this:
cat <<EOF > old.sym {'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'} EOF cat <<EOF > new.sym {'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'} {'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'} EOF ./libcxx/utils/sym_diff.py old.sym new.sym
You'll see that despite the fact that we added an indirect symbol, we consider both symbol lists to be the same. This was discovered while working on https://reviews.llvm.org/D60424, which adds such re-exports.
Looking at it again, it looks like the "problem" is that we don't support duplicate symbol names in the list. For example, we use a set in _symbol_differences.
Is there a reason for not simply normalizing these list (sorting each line and also sorting each JSON entry), and then simply running diff on it? It seems to me that this would take into account all the "aspects" related to a symbol, automatically.
We shouldn't emit tables with duplicate symbols from sym_extract. Instead we should meaningfully merge duplicate symbols in the extractor as the appear.
U means the symbol has been used but not defined. I is an indirect definition. The definition should take precedence.
This case also occurs using sym_check on static libraries when one TU uses a symbol and another defines it.
I have a patch sitting around that does this. Let me clean it up and send it out.
I mean, listing both a U entry and a I entry is what nm does if I'm not mistaken. Why are we trying to do something so significantly different from nm? It seems to me that what this script should do is essentially diff <(nm -jg old.dylib) <(nm -jg new.dylib) while taking into account symbol types and a few other things.
LGTM. This fix is correct given the current state of things.
I'll work on improving things in the longer term.