This is an archive of the discontinued LLVM Phabricator instance.

Python bindings: Fix handling of file bodies with multi-byte characters
AbandonedPublic

Authored by mheinzler on Apr 17 2018, 3:10 PM.

Details

Reviewers
kristina
serge-sans-paille
Group Reviewers
Restricted Project
Summary

With python3 there is a difference between the length of the string and the length of the utf-8 encoded bytes array. To not cut off characters at the end when the string contains multi-byte characters, the length of file contents that gets passed to clang needs to be calculated from their bytes representation.

I also added a test case that catches this. I needed to add the coding line at the top of the test unit to make python2 work with the embedded Unicode character. Alternatively we could replace the character with /uXXXX, but then there would be other problems with python2.

Diff Detail

Repository
rC Clang

Event Timeline

mheinzler created this revision.Apr 17 2018, 3:10 PM

Would you mind re-uploading these patches with full context (with diff -U99999). 3 lines of context around changes makes this very difficult to review. Also I would suggest testing for Python version and using appropriate semantics (using encode in case of Python3). Not entirely sure if the bindings are supposed to be compatible with Python3, but I don't see any harm. However explicit tests that avoid changing functionality of existing Python2.7 code would be better in my opinion.

mheinzler updated this revision to Diff 165728.Sep 17 2018, 2:35 AM

Sorry, here's the diff for the whole files.

The b function defined at the top of the file already does what you suggest. For python2 it returns the string unchanged, for python3 it calls encode. So there shouldn't be any change at all for python2.

mheinzler abandoned this revision.Feb 3 2019, 3:08 AM

I'm closing this because it has been fixed in master by:
https://reviews.llvm.org/D56429

Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: arphaman. · View Herald Transcript