This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lldb/trunk/packages/Python/lldbsuite/
-
trunk/
-
packages/
-
Python/
-
lldbsuite/
-
support/
-
encoded_file.py
-
test/
-
lang/cpp/char1632_t/
-
cpp/
-
char1632_t/
-
TestChar1632T.py
-
lldbtest.py
-
tools/lldb-server/
-
lldb-server/
-
TestGDBRemoteMemoryRead.py

Differential D16736

Always write the session log file in UTF-8
ClosedPublic

Authored by zturner on Jan 29 2016, 1:50 PM.

Download Raw Diff

Details

Reviewers

tberghammer
tfiala
labath

Commits

rGe5a7990dbe8d: Always write the session file in UTF-8.
rLLDB259379: Always write the session file in UTF-8.
rL259379: Always write the session file in UTF-8.

Summary

Hopefully the 10th time's a charm on this one.

This patch attempts to solve the Python 2 / Python 3 incompatibilities by introducing a new encoded_file abstraction that we use instead of io.open(). The problem with the builtin implementation of io.open is that read and write accept and return unicode objects, which are not always convenient to work with in Python 2. We solve this by making encoded_file.open() return the same object returned by io.open() but with hooked read() and write() methods. These hooked methods will accept binary or text data, and conditionally convert what it gets to a unicode object using the correct encoding. When calling read() it also does any conversion necessary to convert the output back into the native string type of the running python version.

I actually tested this one on both Windows and Linux and it works fine. So hopefully this is closer to a working solution!

Diff Detail

Repository: rL LLVM

Event Timeline

zturner updated this revision to Diff 46426.Jan 29 2016, 1:50 PM

zturner retitled this revision from to Always write the session log file in UTF-8.

zturner updated this object.

zturner added reviewers: tfiala, labath, tberghammer.

zturner added a subscriber: lldb-commits.

Herald added a subscriber: aemerson. · View Herald TranscriptJan 29 2016, 1:50 PM

I'll give it a run!

Running the test suite now. So far so good (it didn't blow up immediately like it has for other attempts). I'll post back when the test run completes.

The only test I see fail with this is:
TestGDBRemoteMemoryRead.py:

ERROR: test_memory_read_dwarf (TestGDBRemoteMemoryRead.MemoryReadTestCase)

Traceback (most recent call last):

File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2289, in dwarf_test_method
  return attrvalue(self)
File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py", line 37, in test_memory_read
  self.match("process plugin packet send x%x,%x" % (pc, size), ["response:", memory])
File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2615, in match
  self.runCmd(str, msg=msg, trace = (True if trace else False), check = not error)
File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2591, in runCmd
  print(self.res.GetError(), file=sbuf)
File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 267, in __exit__
  print(self.getvalue(), file=self.session)
File "/Users/tfiala/lldb/tot/git-svn/lldb/packages/Python/lldbsuite/support/encoded_file.py", line 32, in impl
  s = s.decode(encoding)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
  return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 90: invalid start byte

Hmm, that almost looks to me like it's finding a real bug. Which it
probably is, because higher up the callstack it's reading the response of a
memory send. I'm guessing that the response includes non-printable
characters (raw memory data) that wasn't being written before but was being
silently ignored, and now it's giving an error.

Can you try passing errors='replace' to encoded_file.open()? Then open the
log file and see what's in it.

In D16736#339889, @zturner wrote:

Hmm, that almost looks to me like it's finding a real bug. Which it
probably is, because higher up the callstack it's reading the response of a
memory send. I'm guessing that the response includes non-printable
characters (raw memory data) that wasn't being written before but was being
silently ignored, and now it's giving an error.

Can you try passing errors='replace' to encoded_file.open()? Then open the
log file and see what's in it.

Yep, I can give that a try. It'll end up being later (not at a computer now) but I'll give it a shot.

In D16736#339925, @tfiala wrote:

In D16736#339889, @zturner wrote:

Hmm, that almost looks to me like it's finding a real bug. Which it
probably is, because higher up the callstack it's reading the response of a
memory send. I'm guessing that the response includes non-printable
characters (raw memory data) that wasn't being written before but was being
silently ignored, and now it's giving an error.

Can you try passing errors='replace' to encoded_file.open()? Then open the
log file and see what's in it.

Yep, I can give that a try. It'll end up being later (not at a computer now) but I'll give it a shot.

I'm going to have a look at trying this modification now.

I'm going to have a look at trying this modification now.

I'm getting the same error with the replace.

Here is the patch (okay the whole encoded_file.py) I was able to use to get past this - which ultimately looks to be an error in the match result printing of a raw byte buffer (never meant to be unicode printable) in the result stream (i.e. the error I was getting looked to be purely a by-product of printing match results that succeeded, not failed, but the unicode decoding introduced the actual failure point):

"""
                     The LLVM Compiler Infrastructure

This file is distributed under the University of Illinois Open Source
License. See LICENSE.TXT for details.

Prepares language bindings for LLDB build process.  Run with --help
to see a description of the supported command line arguments.
"""

# Python modules:
import io

# Third party modules
import six

def _encoded_read(old_read, encoding):
    def impl(size):
        result = old_read(size)
        # If this is Python 2 then we need to convert the resulting `unicode` back
        # into a `str` before returning
        if six.PY2:
            result = result.encode(encoding)
        return result
    return impl

def _encoded_write(old_write, encoding):
    def impl(s):
        # If we were asked to write a `str` (in Py2) or a `bytes` (in Py3) decode it
        # as unicode before attempting to write.
        if isinstance(s, six.binary_type):
            try:
                s = s.decode(encoding)
            except UnicodeDecodeError as decode_err:
                import sys
                sys.stderr.write("error: unicode decode failed on raw string '{}': '{}'".format(s, decode_err))
                s = u"Could not decode unicode string, see stderr for details"
        return old_write(s)
    return impl

'''
Create a Text I/O file object that can be written to with either unicode strings or byte strings
under Python 2 and Python 3, and automatically encodes and decodes as necessary to return the
native string type for the current Python version
'''
def open(file, encoding, mode='r', buffering=-1, errors=None, newline=None, closefd=True):
    wrapped_file = io.open(file, mode=mode, buffering=buffering, encoding=encoding,
                           errors=errors, newline=newline, closefd=closefd)
    new_read = _encoded_read(getattr(wrapped_file, 'read'), encoding)
    new_write = _encoded_write(getattr(wrapped_file, 'write'), encoding)
    setattr(wrapped_file, 'read', new_read)
    setattr(wrapped_file, 'write', new_write)
    return wrapped_file

It just adds a try/except block around the Unicode decode. Is is highly likely that might not run on Python 3 - i.e. ping pong this back into a Python 3 error. I may try to bring this up on Windows to see if that does actually happen.

In any event, the right fix here probably is to have displays of matched/expected text for known-to-be binary data *not* try to print results in the expect-string-match code since these are just going to have no way of being valid. The other way to go (perhaps better) would be to put some kind of safe wrapper around the byte compares, so that they are tested as ASCII-ified output, or use an entirely different mechanism here.

If this works on Windows the way I fixed this up, you can go ahead and check this in. (I no longer get failures with the code change I made above). If it doesn't work but you can tweak that slightly, feel free to go ahead and do that as well. In the meantime I am going to see if I can get the binary aspect of the matching handled properly (i.e. not done as string compares). This might be one of my tests. (It's at least in the goop of lldb-server tests that I had written 1.5 to 2 years ago).

In D16736#340302, @tfiala wrote:

In any event, the right fix here probably is to have displays of matched/expected text for known-to-be binary data *not* try to print results in the expect-string-match code since these are just going to have no way of being valid.

i.e. a test fix. I think we still want to keep the unicode decode guard in, as it could easily be something printing binary-based data like we had here, but the culprit under this is basically the method the test is using to verify binary data.

In fact I'd say you could do this, now that I've looked at the test:

diff --git a/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py b/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py
index 7b974e5..9f80abe 100644
--- a/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py
+++ b/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py
@@ -34,7 +34,7 @@ class MemoryReadTestCase(TestBase):
             error = lldb.SBError()
             memory = process.ReadMemory(pc, size, error)
             self.assertTrue(error.Success())
-            self.match("process plugin packet send x%x,%x" % (pc, size), ["response:", memory])
+            # self.match("process plugin packet send x%x,%x" % (pc, size), ["response:", memory])
             self.match("process plugin packet send m%x,%x" % (pc, size), ["response:", binascii.hexlify(memory)])

         process.Continue()

(i.e. comment out the direct memory compare with strings). That'll take a little bit of work to re-work, but the 'm' hexlified version is sufficient for verifying memory reads for now until we can rework the test to compare the bytes directly instead of counting on print-style matches. You could comment it out and file a bug on it to fix.

TestGDBRemoteMemoryRead.py is my test and I gotta admit it is somewhat hackish. If this is the only issue that is stopping this from going through, then please xfail it, and i'll rewrite it in some other way.

Just that x command needs to be commented out. The m command is fine as is.

-Todd

Closed by commit rL259379: Always write the session file in UTF-8. (authored by zturner). · Explain WhyFeb 1 2016, 10:17 AM

This revision was automatically updated to reflect the committed changes.

This change appears to have caused a test error (rather than a test failure) on the OS X builder. See here (master builder step):
http://lab.llvm.org:8080/green/job/LLDB/16318/

and here (build and test run step):
http://lab.llvm.org:8080/green/job/lldb_build_test/16209/

It didn't get reported since the test collector Jenkins step is somehow behaving differently than other scenarios and wasn't marking the build as failed. (I have since fixed this morning).

These look to be some kind of Unicode encoding failure, which means the test is probably doing something that isn't Unicode safe. I'll have a look at it.

In D16736#341705, @tfiala wrote:

This change appears to have caused a test error (rather than a test failure) on the OS X builder.

It's not clear to me yet why I didn't see this at home, but my first guess is that I have all my consoles set to utf-8 encoding, so if this is terminal related, I may have masked the issue that way. In any event, I'll have a look.

Skipped TestWatchLocation.py test method on OS X with r259526 while I investigate.

Revision Contents

Path

Size

lldb/

trunk/

packages/

Python/

lldbsuite/

support/

encoded_file.py

48 lines

test/

lang/

cpp/

char1632_t/

TestChar1632T.py

1 line

lldbtest.py

4 lines

tools/

lldb-server/

TestGDBRemoteMemoryRead.py

3 lines

Diff 46556

lldb/trunk/packages/Python/lldbsuite/support/encoded_file.py

				"""
				The LLVM Compiler Infrastructure

				This file is distributed under the University of Illinois Open Source
				License. See LICENSE.TXT for details.

				Prepares language bindings for LLDB build process. Run with --help
				to see a description of the supported command line arguments.
				"""

				# Python modules:
				import io

				# Third party modules
				import six

				def _encoded_read(old_read, encoding):
				def impl(size):
				result = old_read(size)
				# If this is Python 2 then we need to convert the resulting `unicode` back
				# into a `str` before returning
				if six.PY2:
				result = result.encode(encoding)
				return result
				return impl

				def _encoded_write(old_write, encoding):
				def impl(s):
				# If we were asked to write a `str` (in Py2) or a `bytes` (in Py3) decode it
				# as unicode before attempting to write.
				if isinstance(s, six.binary_type):
				s = s.decode(encoding)
				return old_write(s)
				return impl

				'''
				Create a Text I/O file object that can be written to with either unicode strings or byte strings
				under Python 2 and Python 3, and automatically encodes and decodes as necessary to return the
				native string type for the current Python version
				'''
				def open(file, encoding, mode='r', buffering=-1, errors=None, newline=None, closefd=True):
				wrapped_file = io.open(file, mode=mode, buffering=buffering, encoding=encoding,
				errors=errors, newline=newline, closefd=closefd)
				new_read = _encoded_read(getattr(wrapped_file, 'read'), encoding)
				new_write = _encoded_write(getattr(wrapped_file, 'write'), encoding)
				setattr(wrapped_file, 'read', new_read)
				setattr(wrapped_file, 'write', new_write)
				return wrapped_file

lldb/trunk/packages/Python/lldbsuite/test/lang/cpp/char1632_t/TestChar1632T.py

Show All 19 Lines	def setUp(self):
# Call super's setUp().		# Call super's setUp().
TestBase.setUp(self)		TestBase.setUp(self)
# Find the line number to break for main.cpp.		# Find the line number to break for main.cpp.
self.source = 'main.cpp'		self.source = 'main.cpp'
self.lines = [ line_number(self.source, '// breakpoint1'),		self.lines = [ line_number(self.source, '// breakpoint1'),
line_number(self.source, '// breakpoint2') ]		line_number(self.source, '// breakpoint2') ]

@expectedFailureIcc # ICC (13.1) does not emit the DW_TAG_base_type for char16_t and char32_t.		@expectedFailureIcc # ICC (13.1) does not emit the DW_TAG_base_type for char16_t and char32_t.
@expectedFailureWindows("llvm.org/pr24489: Name lookup not working correctly on Windows")
def test(self):		def test(self):
"""Test that the C++11 support for char16_t and char32_t works correctly."""		"""Test that the C++11 support for char16_t and char32_t works correctly."""
self.build()		self.build()
exe = os.path.join(os.getcwd(), "a.out")		exe = os.path.join(os.getcwd(), "a.out")

# Create a target by the debugger.		# Create a target by the debugger.
target = self.dbg.CreateTarget(exe)		target = self.dbg.CreateTarget(exe)
self.assertTrue(target, VALID_TARGET)		self.assertTrue(target, VALID_TARGET)
▲ Show 20 Lines • Show All 54 Lines • Show Last 20 Lines

lldb/trunk/packages/Python/lldbsuite/test/lldbtest.py

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
# Third-party modules		# Third-party modules
import unittest2		import unittest2
from six import add_metaclass		from six import add_metaclass
from six import StringIO as SixStringIO		from six import StringIO as SixStringIO
from six.moves.urllib import parse as urlparse		from six.moves.urllib import parse as urlparse
import six		import six

# LLDB modules		# LLDB modules
		import use_lldb_suite
import lldb		import lldb
from . import configuration		from . import configuration
from . import lldbtest_config		from . import lldbtest_config
from . import lldbutil		from . import lldbutil
from . import test_categories		from . import test_categories
		from lldbsuite.support import encoded_file

from .result_formatter import EventBuilder		from .result_formatter import EventBuilder

# dosep.py starts lots and lots of dotest instances		# dosep.py starts lots and lots of dotest instances
# This option helps you find if two (or more) dotest instances are using the same		# This option helps you find if two (or more) dotest instances are using the same
# directory at the same time		# directory at the same time
# Enable it to cause test failures and stderr messages if dotest instances try to run in		# Enable it to cause test failures and stderr messages if dotest instances try to run in
# the same directory simultaneously		# the same directory simultaneously
▲ Show 20 Lines • Show All 1,410 Lines • ▼ Show 20 Lines	def setUp(self):
self.forkedProcessPids = []		self.forkedProcessPids = []

# Create a string buffer to record the session info, to be dumped into a		# Create a string buffer to record the session info, to be dumped into a
# test case specific file if test failure is encountered.		# test case specific file if test failure is encountered.
self.log_basename = self.getLogBasenameForCurrentTest()		self.log_basename = self.getLogBasenameForCurrentTest()

session_file = "{}.log".format(self.log_basename)		session_file = "{}.log".format(self.log_basename)
# Python 3 doesn't support unbuffered I/O in text mode. Open buffered.		# Python 3 doesn't support unbuffered I/O in text mode. Open buffered.
self.session = open(session_file, "w")		self.session = encoded_file.open(session_file, "utf-8", mode="w")

# Optimistically set __errored__, __failed__, __expected__ to False		# Optimistically set __errored__, __failed__, __expected__ to False
# initially. If the test errored/failed, the session info		# initially. If the test errored/failed, the session info
# (self.session) is then dumped into a session specific file for		# (self.session) is then dumped into a session specific file for
# diagnosis.		# diagnosis.
self.__cleanup_errored__ = False		self.__cleanup_errored__ = False
self.__errored__ = False		self.__errored__ = False
self.__failed__ = False		self.__failed__ = False
▲ Show 20 Lines • Show All 1,314 Lines • Show Last 20 Lines

lldb/trunk/packages/Python/lldbsuite/test/tools/lldb-server/TestGDBRemoteMemoryRead.py

Show All 28 Lines	def test_memory_read(self):
self.assertTrue(process, PROCESS_IS_VALID)		self.assertTrue(process, PROCESS_IS_VALID)
self.assertEqual(process.GetState(), lldb.eStateStopped, "Process is stopped")		self.assertEqual(process.GetState(), lldb.eStateStopped, "Process is stopped")

pc = process.GetSelectedThread().GetSelectedFrame().GetPC()		pc = process.GetSelectedThread().GetSelectedFrame().GetPC()
for size in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:		for size in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
error = lldb.SBError()		error = lldb.SBError()
memory = process.ReadMemory(pc, size, error)		memory = process.ReadMemory(pc, size, error)
self.assertTrue(error.Success())		self.assertTrue(error.Success())
self.match("process plugin packet send x%x,%x" % (pc, size), ["response:", memory])		# Results in trying to write non-printable characters to the session log.
		# self.match("process plugin packet send x%x,%x" % (pc, size), ["response:", memory])
self.match("process plugin packet send m%x,%x" % (pc, size), ["response:", binascii.hexlify(memory)])		self.match("process plugin packet send m%x,%x" % (pc, size), ["response:", binascii.hexlify(memory)])

process.Continue()		process.Continue()
self.assertEqual(process.GetState(), lldb.eStateExited, "Process exited")		self.assertEqual(process.GetState(), lldb.eStateExited, "Process exited")