This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Make sure that the symbol differ takes into account symbol types
ClosedPublic

Authored by ldionne on Apr 8 2019, 11:54 AM.

Details

Summary

Otherwise, it doesn't take into account things like whether the symbol
is defined or undefined, and whether symbols are indirect references
(re-exports) or not.

Diff Detail

Repository
rL LLVM

Event Timeline

ldionne created this revision.Apr 8 2019, 11:54 AM

So _symbol_difference is used to check for added or removed symbols. changed_symbols is the bit of logic that detects if the properties of the symbol changed.

What's the issue this is trying to address?

So _symbol_difference is used to check for added or removed symbols. changed_symbols is the bit of logic that detects if the properties of the symbol changed.

What's the issue this is trying to address?

The problem this is trying to address is that we don't take into account the addition of an indirect symbol (a re-export). Try this:

cat <<EOF > old.sym
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

cat <<EOF > new.sym
{'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'}
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

./libcxx/utils/sym_diff.py old.sym new.sym

You'll see that despite the fact that we added an indirect symbol, we consider both symbol lists to be the same. This was discovered while working on https://reviews.llvm.org/D60424, which adds such re-exports.

Looking at it again, it looks like the "problem" is that we don't support duplicate symbol names in the list. For example, we use a set in _symbol_differences.

Is there a reason for not simply normalizing these list (sorting each line and also sorting each JSON entry), and then simply running diff on it? It seems to me that this would take into account all the "aspects" related to a symbol, automatically.

So _symbol_difference is used to check for added or removed symbols. changed_symbols is the bit of logic that detects if the properties of the symbol changed.

What's the issue this is trying to address?

The problem this is trying to address is that we don't take into account the addition of an indirect symbol (a re-export). Try this:

cat <<EOF > old.sym
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

cat <<EOF > new.sym
{'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'}
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

./libcxx/utils/sym_diff.py old.sym new.sym

You'll see that despite the fact that we added an indirect symbol, we consider both symbol lists to be the same. This was discovered while working on https://reviews.llvm.org/D60424, which adds such re-exports.

Looking at it again, it looks like the "problem" is that we don't support duplicate symbol names in the list. For example, we use a set in _symbol_differences.

We shouldn't emit tables with duplicate symbols from sym_extract. Instead we should meaningfully merge duplicate symbols in the extractor as the appear.
U means the symbol has been used but not defined. I is an indirect definition. The definition should take precedence.
This case also occurs using sym_check on static libraries when one TU uses a symbol and another defines it.

I have a patch sitting around that does this. Let me clean it up and send it out.

We shouldn't emit tables with duplicate symbols from sym_extract. Instead we should meaningfully merge duplicate symbols in the extractor as the appear.
U means the symbol has been used but not defined. I is an indirect definition. The definition should take precedence.
This case also occurs using sym_check on static libraries when one TU uses a symbol and another defines it.

I have a patch sitting around that does this. Let me clean it up and send it out.

I mean, listing both a U entry and a I entry is what nm does if I'm not mistaken. Why are we trying to do something so significantly different from nm? It seems to me that what this script should do is essentially diff <(nm -jg old.dylib) <(nm -jg new.dylib) while taking into account symbol types and a few other things.

EricWF accepted this revision.Apr 13 2019, 4:02 PM

LGTM. This fix is correct given the current state of things.

I'll work on improving things in the longer term.

This revision is now accepted and ready to land.Apr 13 2019, 4:02 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2019, 7:03 AM