This is an archive of the discontinued LLVM Phabricator instance.

Fix lldb-server test suite for python3
ClosedPublic

Authored by labath on Feb 13 2019, 6:56 AM.

Details

Summary

This patch finishes the python3-ification of the lldb-server test suite.
It reverts the partial attempt in r352709 to encode/decode the string
via utf8 before writing to the socket. This wasn't enough because the
gdb-remote protocol can sometimes (but not very often) carry binary
data, and the utf8 codec chokes on that. Instead I add utility functions
to the "seven" module for performing "identity" transformations on the
byte data. This basically drills back the hole in the python type system
that the string/bytes distinction was supposed to plug. That is not
ideal, but was the best solution of the alternatives I could come up
with. The options I considered were:

  • make use of the type system to add type safety to the test suite: This required making a lot of changes to the test suite, since most of the strings would now become byte objects instead, and it was not even fully clear to me where to draw the line. One extreme solution would be to just use byte objects everywhere, as the protocol doesn't support non-ascii characters anyway. However, this appeared to be: a) weird, because most of the protocol actually deals with strings, but we would have to prefix everything with 'b' b) clunky, because the handling of the bytes objects is sufficiently different in PY2 and PY3 (e.g. b'a'[0] is a string in PY2, but an int in PY3).
  • using the latin1 codec (which gives an identity transformation for the first 256 code points of unicode) instead of the custom bytes_to_string functions. This almost could work, but it was still slightly different between python 2 and 3, because in PY2 in would return a unicode object, which would then cause problems when combined with regular strings if it contained 8-bit chars.

With this in mind, I think the best solution for the time being is to
just coerce everything into the string type as early as possible, and
have things proceed indentically on both python versions. Once we stop
supporting python3, we can revisit the idea of using bytes objects more
prevasively.

Event Timeline

labath created this revision.Feb 13 2019, 6:56 AM
packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py
415

Maybe from __future__ import division to make sure the / operator behaves consistently across versions? That way it may uncover missed / to // conversion?

davide accepted this revision.Feb 13 2019, 10:09 AM

LGTM

This revision is now accepted and ready to land.Feb 13 2019, 10:09 AM
stella.stamenova requested changes to this revision.Feb 13 2019, 10:26 AM
stella.stamenova added a subscriber: stella.stamenova.
stella.stamenova added inline comments.
packages/Python/lldbsuite/support/seven.py
28

There are other places in the lldb test suite and lit where similar functions are defined. Rather than just adding new ones here, can you normalize them and perhaps use your functions everywhere instead of ending up with two (or three) solutions?

One place that comes to mind is lit\helper\build.py: to_string. There are others as well that implement conversions in-place and usually use isinstance to determine what to do.

This revision now requires changes to proceed.Feb 13 2019, 10:26 AM

Thanks for the review Stella. I was hoping someone would step in and tell me that there is a better way to do that. However :), I see two problems with your proposal:

  • build.py is a standalone file, and so it is not easy for it to share code with other stuff. This can be solved with some use_lldb_suite magic or moving the script some place else, but then comes the second problem:
  • I am not sure these two use cases are actually compatible. Here I specifically want to "reinterpret_cast" binary bytes without doing any sort of encoding. The only place where to_string is used in build.py currently is on a value retrieved from the windows registry. I'm not 100% sure, but it seems to me that using utf8 encoding (which is what to_string is doing) is precisely the right thing to do there.

With that in mind, I propose to do the following:

  • leave build.py alone for now
  • rename the new functions I'm introducing here to something less generic. Maybe bitcast_to_string and bitcast_to_bytes?

WDYT?

labath updated this revision to Diff 186823.Feb 14 2019, 5:16 AM
labath marked an inline comment as done.
  • rename the string conversion functions and explain their operation in more detail
  • add "from future import division" to ensure consistency between python versions
packages/Python/lldbsuite/test/tools/lldb-server/lldbgdbserverutils.py
415

That sounds like a good idea. I didn't know about that. Thanks.

stella.stamenova accepted this revision.Feb 14 2019, 9:58 AM

Thanks for the review Stella. I was hoping someone would step in and tell me that there is a better way to do that. However :), I see two problems with your proposal:

  • build.py is a standalone file, and so it is not easy for it to share code with other stuff. This can be solved with some use_lldb_suite magic or moving the script some place else, but then comes the second problem:
  • I am not sure these two use cases are actually compatible. Here I specifically want to "reinterpret_cast" binary bytes without doing any sort of encoding. The only place where to_string is used in build.py currently is on a value retrieved from the windows registry. I'm not 100% sure, but it seems to me that using utf8 encoding (which is what to_string is doing) is precisely the right thing to do there.

With that in mind, I propose to do the following:

  • leave build.py alone for now
  • rename the new functions I'm introducing here to something less generic. Maybe bitcast_to_string and bitcast_to_bytes?

WDYT?

Thanks, Pavel. That sounds good and it will make it clear that they two functions are distinct.

This revision is now accepted and ready to land.Feb 14 2019, 9:58 AM

Thank you for reviewing this.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 11:46 PM