This is an archive of the discontinued LLVM Phabricator instance.

[lit] Clean up internal diff's encoding handling
ClosedPublic

Authored by jdenny on Oct 8 2019, 1:49 PM.

Details

Summary

As suggested by rnk at D67643#1673043, instead of reading files
multiple times until an appropriate encoding is found, read them once
as binary, and then try to decode what was read.

For Python >= 3.5, don't fail when attempting to decode the
diff_bytes output in order to print it.

Avoid failures for Python 2.7 used on some Windows bots by
transforming diff output with lit.util.to_string before writing it
to stdout.

Finally, add some tests for encoding handling.

Diff Detail

Event Timeline

jdenny created this revision.Oct 8 2019, 1:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2019, 1:49 PM
Herald added a subscriber: delcypher. · View Herald Transcript
rnk accepted this revision.Oct 8 2019, 2:19 PM

lgtm

I'm guessing you've tested with Python 2.7 and 3.5, and that's probably what matters.

Thanks for working on this, and sorry for ever expanding scope of the suggested refactorings.

llvm/utils/lit/lit/builtin_commands/diff.py
36 ↗(On Diff #223935)

try / except UnicodeDecodeError is an exciting code pattern, but I guess it's the existing behavior. :)

43 ↗(On Diff #223935)

Extra logging?

63 ↗(On Diff #223935)

Ditto

This revision is now accepted and ready to land.Oct 8 2019, 2:19 PM
jdenny marked 2 inline comments as done.Oct 8 2019, 2:43 PM
In D68664#1700456, @rnk wrote:

I'm guessing you've tested with Python 2.7 and 3.5, and that's probably what matters.

I have 2.7.15 and 3.6.8, but I assume that's sufficient. With each, I've run check-lit and manually tried diff.py. I still need to try check-all before pushing everything.

Thanks for working on this, and sorry for ever expanding scope of the suggested refactorings.

It all needed to be done. I just ran out of time last month. Thanks for the reviews.

llvm/utils/lit/lit/builtin_commands/diff.py
36 ↗(On Diff #223935)

I'm not quite sure how to interpret this remark. Indeed, it's the existing behavior. Do you recommend an alternative?

43 ↗(On Diff #223935)

Hmm. I'm being sloppy. I'll remove. Thanks.

jdenny updated this revision to Diff 224051.Oct 9 2019, 7:56 AM

Removed commented code pointed out during review.

Changed diff.decode(errors="replace") to diff.decode(errors="backslashreplace") so the test suite doesn't fail at sys.stdout.write(diff) when running with python3. The test's commands worked fine when running interactively from a shell prompt, apparently because stdout is then different. Sorry, I must have forgotten to re-run the actual test suite with python3 after making some change here.

In any case, backslashreplace results in output like the following, where \xff represents the bytes that cannot be rendered:

 foo
-bar
+bar\xff\xff
 baz

That seems better than ignore, which just drops them altogether:

 foo
-bar
+bar
 baz
jdenny marked 2 inline comments as done.Oct 9 2019, 7:58 AM
This revision was automatically updated to reflect the committed changes.
jdenny updated this revision to Diff 225137.Oct 15 2019, 5:01 PM
jdenny edited the summary of this revision. (Show Details)

Rebased onto master so it doesn't depend on D66574 anymore. Thus, it modifies TestRunner.py instead of diff.py.

Added lit.util.to_string calls to transform diff output before writing to stdout. I built in a local Windows 10 to test this. With Python 2.7 and without to_string, I was able to reproduce at least some of the previous bot failures, and to_string fixes them. I also tried Python 3.6, and it works fine for me.