TestNSDictionarySynthetic sets up an NSURL that does not initialize its
_baseURL member. When the test runs and we print out the NSURL, we
print the garbage memory pointed to by the _baseURL member, like:
_baseURL = 0x0800010020004029 @"d��qX"
and this can cause a Python Unicode decoding error like:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 10309: invalid start byte
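
To illustrate, here is a minimal Python reproduction of that decode
failure (a sketch: these bytes are invented, the real ones come from
whatever uninitialized memory _baseURL points to):

  # 0xa0 is a UTF-8 continuation byte, so it can never start a
  # sequence and strict decoding raises.
  garbage = b'd\xa0qX'  # stand-in for the uninitialized _baseURL bytes
  try:
      garbage.decode('utf8')
  except UnicodeDecodeError as e:
      print(e)  # ... can't decode byte 0xa0 in position 1: invalid start byte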
There's a discrepancy here because lldb's StringPrinter facility tries
to print only "printable" sequences (see: isprint32()), whereas Python
rejects the StringPrinter output as invalid UTF-8. For the specific
error seen above, lldb's isprint32(0xa0) = true, even though 0xa0 is
not really "printable" in the usual sense.
The problem is that lldb and Python disagree on what exactly is
"printable". Both have dismayingly hand-rolled UTF-8 validation code
(cf. _Py_DecodeUTF8Ex), and I can't really tell which one is more
correct.
I tried replacing lldb's isprint32() with a call to libc's iswprint():
this satisfied Python, but broke emoji printing :|.
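
For comparison, Python's own str.isprintable() shows the needle any
replacement for isprint32() would have to thread: emoji must stay
printable while U+00A0 is rejected (this sketch shows Python's notion
of "printable", not anything lldb actually calls):

  print('\U0001F600'.isprintable())  # True: emoji must keep printing
  print('\xa0'.isprintable())        # False: U+00A0 is a separator (Zs)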
Now, I believe that lldb (and Python too) ought to just call into some
battle-tested UTF library, and that we shouldn't aim for compatibility
with Python's strict Unicode decoding mode until then.
FWIW, I ran this test under an ASanified lldb hundreds of times but
didn't turn up any other issues.
rdar://62941711
Review note: maybe add a comment explaining why we use 'replace', given
that the lldb and Python implementations disagree and this works for
now.
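
Assuming the workaround is decoding lldb's output with errors='replace'
on the Python side (as the review note above suggests), a sketch with
the same invented bytes as before:

  # Invalid bytes become U+FFFD instead of raising, so the test no
  # longer trips over lldb/Python disagreements about UTF-8 validity.
  garbage = b'd\xa0qX'
  print(garbage.decode('utf8', errors='replace'))  # prints 'd\ufffdqX'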