This is an archive of the discontinued LLVM Phabricator instance.

Differential D60416

[libc++] Make sure that the symbol differ takes into account symbol types
Closed · Public

Authored by ldionne on Apr 8 2019, 11:54 AM.

Details

Reviewers

EricWF

Commits

rCXX358408: [libc++] Make sure that the symbol differ takes into account symbol types
rGf3e4f24ed749: [libc++] Make sure that the symbol differ takes into account symbol types
rL358408: [libc++] Make sure that the symbol differ takes into account symbol types

Summary

Otherwise, the differ doesn't take into account things like whether a symbol
is defined or undefined, or whether it is an indirect reference (re-export).

Diff Detail

Repository: rL LLVM

Event Timeline

ldionne created this revision. Apr 8 2019, 11:54 AM

Herald added a project: Restricted Project. Apr 8 2019, 11:54 AM

Herald added subscribers: libcxx-commits, dexonsmith, jkorous, christof.

Harbormaster completed remote builds in B30192: Diff 194185. Apr 8 2019, 11:54 AM

So _symbol_difference is used to check for added or removed symbols. changed_symbols is the bit of logic that detects if the properties of the symbol changed.

What's the issue this is trying to address?

In reply to D60416#1461648 (@EricWF):

The problem this is trying to address is that we don't take into account the addition of an indirect symbol (a re-export). Try this:

cat <<EOF > old.sym
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

cat <<EOF > new.sym
{'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'}
{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}
EOF

./libcxx/utils/sym_diff.py old.sym new.sym

You'll see that despite the fact that we added an indirect symbol, we consider both symbol lists to be the same. This was discovered while working on https://reviews.llvm.org/D60424, which adds such re-exports.
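The same check can be reproduced in a few lines of Python. This is a minimal sketch that re-implements the pre-patch, name-only logic of _symbol_difference (the removed lines in the diff.py excerpt at the end of this page) under the hypothetical name symbol_difference_by_name, so that it runs outside the libcxx tree. It illustrates the comparison only; it is not the sym_diff.py driver itself.

```python
# Hypothetical copy of the pre-patch _symbol_difference logic, which keys
# symbols by name only. Renamed so it is clearly not the libcxx function.
def symbol_difference_by_name(lhs, rhs):
    lhs_names = set(n['name'] for n in lhs)
    rhs_names = set(n['name'] for n in rhs)
    diff_names = lhs_names - rhs_names
    return [n for n in lhs if n['name'] in diff_names]


# The two symbol lists from old.sym and new.sym above.
old = [
    {'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'},
]
new = [
    {'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'},
    {'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'},
]

added = symbol_difference_by_name(new, old)    # entries only in new
removed = symbol_difference_by_name(old, new)  # entries only in old
print(added, removed)  # both empty: the new 'I' entry goes unnoticed
```

Because both lists contain exactly one distinct name, the set difference is empty in both directions.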

Looking at it again, the "problem" seems to be that we don't support duplicate symbol names in the list. For example, _symbol_difference builds a set of names.

Is there a reason not to simply normalize these lists (sorting the keys within each JSON entry, then sorting the entries themselves) and then run diff on them? It seems to me that this would automatically take into account every "aspect" of a symbol.
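A minimal sketch of that normalize-then-diff idea, assuming one symbol entry per line as in the old.sym and new.sym files above. normalize and diff_symbol_lists are hypothetical helpers, not part of sym_check. Since the entries shown above are Python dict literals (single quotes, True/False) rather than strict JSON, the sketch parses them with ast.literal_eval, and Python's difflib stands in for the diff command.

```python
import ast
import difflib


def normalize(lines):
    """Return a sorted list of canonical, key-sorted symbol entries."""
    entries = [ast.literal_eval(line) for line in lines if line.strip()]
    # Sort the keys inside each entry, then sort the entries themselves.
    return sorted(repr(sorted(e.items())) for e in entries)


def diff_symbol_lists(old_lines, new_lines):
    """Unified diff of two normalized symbol lists (empty if equal)."""
    return list(difflib.unified_diff(normalize(old_lines), normalize(new_lines),
                                     fromfile='old.sym', tofile='new.sym',
                                     lineterm=''))


old_lines = [
    "{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}",
]
new_lines = [
    "{'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'}",
    "{'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'}",
]

for line in diff_symbol_lists(old_lines, new_lines):
    print(line)
```

Every field of an entry participates in the comparison, so the added 'I' entry shows up as a single "+" line without any type-specific logic.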

In reply to D60416#1461770 (@ldionne):

We shouldn't emit tables with duplicate symbols from sym_extract. Instead, we should meaningfully merge duplicate symbols in the extractor as they appear.
U means the symbol has been used but not defined. I is an indirect definition. The definition should take precedence.
This case also occurs using sym_check on static libraries when one TU uses a symbol and another defines it.

I have a patch sitting around that does this. Let me clean it up and send it out.
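A minimal sketch of the merge rule described above, assuming each entry carries the is_defined flag seen in the .sym files. merge_duplicate_symbols is a hypothetical name; this is not the patch mentioned above. When several entries share a name, the first one is kept unless a later defined entry replaces one that is still undefined.

```python
def merge_duplicate_symbols(symbols):
    """Collapse entries that share a name into a single entry.

    Hypothetical helper (not part of sym_check): a defined entry, such as
    an 'I' (indirect) or 'T' (text) symbol, takes precedence over an
    undefined 'U' reference to the same name.
    """
    merged = {}  # name -> chosen entry; dicts keep insertion order
    for sym in symbols:
        current = merged.get(sym['name'])
        # Record the first entry for a name; afterwards, only let a
        # definition replace an entry that is still undefined.
        if current is None or (sym['is_defined'] and not current['is_defined']):
            merged[sym['name']] = sym
    return list(merged.values())


# new.sym from the reproduction: the 'I' definition wins over the 'U' use.
new = [
    {'is_defined': True, 'name': '___cxa_throw_bad_array_new_length', 'type': 'I'},
    {'is_defined': False, 'name': '___cxa_throw_bad_array_new_length', 'type': 'U'},
]
print(merge_duplicate_symbols(new))

# Static-library case: one TU uses _foo, another defines it.
# (_foo is a made-up name for illustration.)
static_lib = [
    {'is_defined': False, 'name': '_foo', 'type': 'U'},
    {'is_defined': True, 'name': '_foo', 'type': 'T'},
]
print(merge_duplicate_symbols(static_lib))
```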

In reply to D60416#1462291 (@EricWF):

I mean, listing both a U entry and an I entry is what nm does, if I'm not mistaken. Why are we trying to do something so significantly different from nm? It seems to me that what this script should do is essentially diff <(nm -jg old.dylib) <(nm -jg new.dylib) while taking into account symbol types and a few other things.

LGTM. This fix is correct given the current state of things.

I'll work on improving things in the longer term.

This revision is now accepted and ready to land. Apr 13 2019, 4:02 PM

Closed by commit rL358408: [libc++] Make sure that the symbol differ takes into account symbol types (authored by ldionne). Apr 15 2019, 7:03 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Apr 15 2019, 7:03 AM

Herald added a subscriber: llvm-commits.

Revision Contents

Path                                           Size
libcxx/trunk/utils/libcxx/sym_check/diff.py    6 lines

Diff 195176

libcxx/trunk/utils/libcxx/sym_check/diff.py

 # -*- Python -*- vim: set syntax=python tabstop=4 expandtab cc=80:
 #===----------------------------------------------------------------------===##
 #
 # Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 # See https://llvm.org/LICENSE.txt for license information.
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 #
 #===----------------------------------------------------------------------===##
 """
 diff - A set of functions for diff-ing two symbol lists.
 """

 from libcxx.sym_check import util


 def _symbol_difference(lhs, rhs):
-    lhs_names = set((n['name'] for n in lhs))
-    rhs_names = set((n['name'] for n in rhs))
+    lhs_names = set(((n['name'], n['type']) for n in lhs))
+    rhs_names = set(((n['name'], n['type']) for n in rhs))
     diff_names = lhs_names - rhs_names
-    return [n for n in lhs if n['name'] in diff_names]
+    return [n for n in lhs if (n['name'], n['type']) in diff_names]


 def _find_by_key(sym_list, k):
     for sym in sym_list:
         if sym['name'] == k:
             return sym
     return None
