This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/utils/lit/lit/builtin_commands/diff.py
- llvm/utils/lit/tests/Inputs/shtest-shell/diff-encodings.txt
- llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.bin
- llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf16
- llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf8
- llvm/utils/lit/tests/max-failures.py
- llvm/utils/lit/tests/shtest-shell.py

Differential D68664

[lit] Clean up internal diff's encoding handling
Closed · Public

Authored by jdenny on Oct 8 2019, 1:49 PM.


Details

Reviewers

probinson
stella.stamenova
bd1976llvm
jlpeyton
rnk
mgorny

Commits

rGf095b8c425ec: [lit] Clean up internal diff's encoding handling
rL375018: [lit] Clean up internal diff's encoding handling
rGe4f11a31927e: Reland r374389: [lit] Clean up internal diff's encoding handling
rL374649: Reland r374389: [lit] Clean up internal diff's encoding handling
rG19e6bb25f05f: [lit] Clean up internal diff's encoding handling
rL374389: [lit] Clean up internal diff's encoding handling

Summary

As suggested by rnk at D67643#1673043, instead of reading files
multiple times until an appropriate encoding is found, read them once
as binary, and then try to decode what was read.
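A minimal sketch of the read-once-then-decode idea in plain Python 3. The names `read_lines_binary`, `decode_lines`, `load_for_diff`, and the `encodings` parameter are hypothetical illustrations, not lit's API. The default fallback order (the locale's preferred encoding, then UTF-8, then raw bytes) mirrors `compareTwoFiles` in this revision.

```python
import locale


def read_lines_binary(paths):
    """Read each file exactly once, keeping every line as raw bytes."""
    filelines_bin = []
    for path in paths:
        with open(path, 'rb') as f:
            filelines_bin.append(f.readlines())
    return filelines_bin


def decode_lines(filelines_bin, encoding):
    """Decode all lines of all files; raises UnicodeDecodeError on any failure."""
    return [[line.decode(encoding) for line in lines] for lines in filelines_bin]


def load_for_diff(paths, encodings=None):
    """Return (mode, filelines), where mode is the encoding used or 'binary'.

    Hypothetical helper: the bytes are read once and only decoding is retried.
    """
    if encodings is None:
        # Same order as compareTwoFiles: locale's preferred encoding, then UTF-8.
        encodings = (locale.getpreferredencoding(False), 'utf-8')
    filelines_bin = read_lines_binary(paths)
    for encoding in encodings:
        try:
            return encoding, decode_lines(filelines_bin, encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate on the same bytes
    # Nothing decoded cleanly: fall back to comparing raw bytes.
    return 'binary', filelines_bin
```

One difference from the revision: this sketch catches only `UnicodeDecodeError` at each step, whereas `compareTwoFiles` wraps the UTF-8 attempt in a bare `except:`.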

For Python >= 3.5, don't fail when attempting to decode the
diff_bytes output in order to print it.
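A minimal sketch of printable `difflib.diff_bytes` output on Python 3.5+. The helper name `binary_unified_diff` is hypothetical; `difflib.diff_bytes` and `bytes.decode` are standard library. It assumes `errors="backslashreplace"`, the handler a later comment on this revision settles on, so undecodable bytes appear as `\xNN` text instead of raising `UnicodeDecodeError`.

```python
import difflib


def binary_unified_diff(lines_a, lines_b, name_a, name_b):
    """Unified diff of two lists of byte lines, returned as printable str lines."""
    diffs = difflib.diff_bytes(difflib.unified_diff, lines_a, lines_b,
                               name_a.encode(), name_b.encode())
    # A plain .decode() would raise UnicodeDecodeError on bytes such as b'\xff';
    # backslashreplace renders each one as the four characters "\xff" instead.
    return [diff.decode(errors='backslashreplace') for diff in diffs]
```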

Avoid failures for Python 2.7 used on some Windows bots by
transforming diff output with lit.util.to_string before writing it
to stdout.
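The summary does not show `lit.util.to_string` itself. As a hedged illustration of the idea only, here is a hypothetical helper, `to_native_str`, that normalizes a value to the running interpreter's native `str` type: bytes are decoded on Python 3, and a Python 2 `unicode` object would be encoded to UTF-8 bytes, which are `str` on Python 2. This is not lit's source, and the exact Windows/Python 2.7 failure it works around is an assumption here.

```python
def to_native_str(value):
    """Hypothetical helper: return `value` as the interpreter's native str.

    Modeled loosely on the purpose described for lit.util.to_string; this is
    not lit's implementation.
    """
    if isinstance(value, str):
        # Python 3 text, or a Python 2 byte string: already native.
        return value
    if isinstance(value, bytes):
        # Only reachable on Python 3, where bytes and str are distinct types.
        return value.decode('utf-8', errors='backslashreplace')
    encode = getattr(value, 'encode', None)
    if encode is None:
        raise TypeError('cannot convert %s to str' % type(value).__name__)
    # Python 2 `unicode`: UTF-8 bytes are the native str there.
    return encode('utf-8')
```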

Finally, add some tests for encoding handling.

Diff Detail

Event Timeline

jdenny created this revision. (Oct 8 2019, 1:49 PM)

Herald added a project: Restricted Project. (Oct 8 2019, 1:49 PM)

Herald added a subscriber: delcypher.

jdenny added a parent revision: D66574: [lit] Make internal diff work in pipelines. (Oct 8 2019, 1:52 PM)

jdenny added a child revision: D67643: [lit] Extend internal diff to support `-` argument.

jdenny mentioned this in D67643: [lit] Extend internal diff to support `-` argument. (Oct 8 2019, 2:15 PM)

rnk accepted this revision.

lgtm

I'm guessing you've tested with Python 2.7 and 3.5, and that's probably what matters.

Thanks for working on this, and sorry for ever expanding scope of the suggested refactorings.

llvm/utils/lit/lit/builtin_commands/diff.py
Line 36: try / except UnicodeDecodeError is an exciting code pattern, but I guess it's the existing behavior. :)
Line 43: Extra logging?
Line 63: Ditto

This revision is now accepted and ready to land. (Oct 8 2019, 2:19 PM)

In D68664#1700456, @rnk wrote:

I'm guessing you've tested with Python 2.7 and 3.5, and that's probably what matters.

I have 2.7.15 and 3.6.8, but I assume that's sufficient. With each, I've run check-lit and manually tried diff.py. I still need to try check-all before pushing everything.

Thanks for working on this, and sorry for ever expanding scope of the suggested refactorings.

It all needed to be done. I just ran out of time last month. Thanks for the reviews.

llvm/utils/lit/lit/builtin_commands/diff.py
Line 36: I'm not quite sure how to interpret this remark. Indeed, it's the existing behavior. Do you recommend an alternative?
Line 43: Hmm. I'm being sloppy. I'll remove. Thanks.

Removed commented code pointed out during review.

Changed diff.decode(errors="replace") to diff.decode(errors="backslashreplace") so the test suite doesn't fail at sys.stdout.write(diff) when running with python3. The test's commands worked fine when run interactively from a shell prompt, apparently because stdout then uses a different encoding. Sorry, I must have forgotten to re-run the actual test suite with python3 after making some change here.

In any case, backslashreplace results in output like the following, where each \xff represents a byte that cannot be decoded:

 foo
-bar
+bar\xff\xff
 baz

That seems better than ignore, which just drops them altogether:

 foo
-bar
+bar
 baz
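The three decode error handlers discussed above can be compared directly in a small illustrative sketch (not code from the revision). It assumes the test suite's stdout used a narrow, strict encoding such as ASCII; the comment only says stdout "is then different", so that part is an interpretation. Under that assumption, `errors="replace"` yields U+FFFD characters that a strict ASCII stream cannot encode, while `errors="backslashreplace"` yields plain ASCII and `errors="ignore"` silently drops the bytes. `strict_stream_can_write` is a hypothetical helper name.

```python
RAW = b'bar\xff\xff\n'

# How each decode error handler treats the two undecodable 0xff bytes.
replaced = RAW.decode('utf-8', errors='replace')          # U+FFFD per bad byte
escaped = RAW.decode('utf-8', errors='backslashreplace')  # literal "\xff" text
dropped = RAW.decode('utf-8', errors='ignore')            # bad bytes vanish


def strict_stream_can_write(text, encoding):
    """True if a text stream using `encoding` with strict errors accepts `text`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


for name, text in (('replace', replaced), ('backslashreplace', escaped),
                   ('ignore', dropped)):
    print(name, ascii(text), strict_stream_can_write(text, 'ascii'))
```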

jdenny marked 2 inline comments as done. (Oct 9 2019, 7:58 AM)

Closed by commit rG19e6bb25f05f: [lit] Clean up internal diff's encoding handling (authored by jdenny). (Oct 10 2019, 10:43 AM)

This revision was automatically updated to reflect the committed changes.

jdenny mentioned this in D68839: [lit] Fix internal diff's --strip-trailing-cr and use it. (Oct 10 2019, 3:14 PM)

jdenny mentioned this in rL374648: Reland r374388: [lit] Make internal diff work in pipelines. (Oct 12 2019, 4:59 AM)

jdenny mentioned this in rL374649: Reland r374389: [lit] Clean up internal diff's encoding handling.

jdenny mentioned this in rL374650: Reland r374390: [lit] Extend internal diff to support `-` argument.

jdenny mentioned this in rL374651: Reland r374392: [lit] Extend internal diff to support -U.

jdenny mentioned this in rL374652: [lit] Fix internal diff's --strip-trailing-cr and use it.

jdenny mentioned this in rGdaf42dc36dc2: Reland r374388: [lit] Make internal diff work in pipelines.

jdenny mentioned this in rGe4f11a31927e: Reland r374389: [lit] Clean up internal diff's encoding handling.

jdenny mentioned this in rG32096a86b240: Reland r374390: [lit] Extend internal diff to support `-` argument.

jdenny mentioned this in rG92a8294f9eda: Reland r374392: [lit] Extend internal diff to support -U.

jdenny mentioned this in rG0f80927316c7: [lit] Fix internal diff's --strip-trailing-cr and use it.

jdenny mentioned this in rL374657: [lit] Try again to fix new tests that fail on Windows bots. (Oct 12 2019, 8:00 AM)

jdenny mentioned this in rG1f5823b78803: [lit] Try again to fix new tests that fail on Windows bots.

Rebased onto master so it doesn't depend on D66574 anymore. Thus, it modifies TestRunner.py instead of diff.py.

Added lit.util.to_string calls to transform diff output before writing to stdout. To test this, I built on a local Windows 10 system. With Python 2.7 and without to_string, I was able to reproduce at least some of the previous bot failures, and to_string fixes them. I also tried Python 3.6, and it works fine for me.

jdenny removed a parent revision: D66574: [lit] Make internal diff work in pipelines. (Oct 15 2019, 5:04 PM)

jdenny edited child revisions, added: D68839: [lit] Fix internal diff's --strip-trailing-cr and use it; removed: D67643: [lit] Extend internal diff to support `-` argument.

jdenny mentioned this in rL375020: [lit] Fix internal diff's --strip-trailing-cr and use it. (Oct 16 2019, 10:28 AM)

jdenny mentioned this in rG2622419c78c2: [lit] Fix internal diff's --strip-trailing-cr and use it.

Revision Contents

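Path (size)

llvm/utils/lit/lit/builtin_commands/diff.py (53 lines)
llvm/utils/lit/tests/Inputs/shtest-shell/diff-encodings.txt (9 lines)
llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.bin (binary)
llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf16 (binary)
llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf8 (3 lines)
llvm/utils/lit/tests/max-failures.py (2 lines)
llvm/utils/lit/tests/shtest-shell.py (54 lines)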

Diff 223935

llvm/utils/lit/lit/builtin_commands/diff.py

	 import difflib
	 import functools
	 import getopt
	+import locale
	 import os
	 import sys

	 class DiffFlags():
	     def __init__(self):
	         self.ignore_all_space = False
	         self.ignore_space_change = False
	         self.unified_diff = False
	         self.recursive_diff = False
	         self.strip_trailing_cr = False

	 def getDirTree(path, basedir=""):
	     # Tree is a tuple of form (dirname, child_trees).
	     # An empty dir has child_trees = [], a file has child_trees = None.
	     child_trees = []
	     for dirname, child_dirs, files in os.walk(os.path.join(basedir, path)):
	         for child_dir in child_dirs:
	             child_trees.append(getDirTree(child_dir, dirname))
	         for filename in files:
	             child_trees.append((filename, None))
	         return path, sorted(child_trees)

	 def compareTwoFiles(flags, filepaths):
	-    compare_bytes = False
	-    encoding = None
	     filelines = []
	     for file in filepaths:
	-        try:
	-            with open(file, 'r') as f:
	-                filelines.append(f.readlines())
	-        except UnicodeDecodeError:
	-            try:
	-                with io.open(file, 'r', encoding="utf-8") as f:
	-                    filelines.append(f.readlines())
	-                encoding = "utf-8"
	-            except:
	-                compare_bytes = True
	+        with open(file, 'rb') as file_bin:
	+            filelines.append(file_bin.readlines())

	-    if compare_bytes:
	-        return compareTwoBinaryFiles(flags, filepaths)
	-    else:
	-        return compareTwoTextFiles(flags, filepaths, encoding)
	+    try:
	+        return compareTwoTextFiles(flags, filepaths, filelines,
	+                                   locale.getpreferredencoding(False))
	+    except UnicodeDecodeError:
	+        try:
	+            return compareTwoTextFiles(flags, filepaths, filelines, "utf-8")
	+        except:
	+            return compareTwoBinaryFiles(flags, filepaths, filelines)

	-def compareTwoBinaryFiles(flags, filepaths):
	-    filelines = []
	-    for file in filepaths:
	-        with open(file, 'rb') as f:
	-            filelines.append(f.readlines())
	-
	+def compareTwoBinaryFiles(flags, filepaths, filelines):
	+    #sys.stderr.write("Trying as binary....\n")
	     exitCode = 0
	     if hasattr(difflib, 'diff_bytes'):
	         # python 3.5 or newer
	         diffs = difflib.diff_bytes(difflib.unified_diff, filelines[0], filelines[1], filepaths[0].encode(), filepaths[1].encode())
	-        diffs = [diff.decode() for diff in diffs]
	+        diffs = [diff.decode(errors="replace") for diff in diffs]
	     else:
	         # python 2.7
	         if flags.unified_diff:
	             func = difflib.unified_diff
	         else:
	             func = difflib.context_diff
	         diffs = func(filelines[0], filelines[1], filepaths[0], filepaths[1])

	     for diff in diffs:
	         sys.stdout.write(diff)
	         exitCode = 1
	     return exitCode

	-def compareTwoTextFiles(flags, filepaths, encoding):
	+def compareTwoTextFiles(flags, filepaths, filelines_bin, encoding):
	+    #sys.stderr.write("Trying with encoding {}....\n".format(encoding))
	     filelines = []
	-    for file in filepaths:
	-        if encoding is None:
	-            with open(file, 'r') as f:
	-                filelines.append(f.readlines())
	-        else:
	-            with io.open(file, 'r', encoding=encoding) as f:
	-                filelines.append(f.readlines())
	+    for lines_bin in filelines_bin:
	+        lines = []
	+        for line_bin in lines_bin:
	+            line = line_bin.decode(encoding=encoding)
	+            lines.append(line)
	+        filelines.append(lines)

	     exitCode = 0
	     def compose2(f, g):
	         return lambda x: f(g(x))

	     f = lambda x: x
	     if flags.strip_trailing_cr:
	         f = compose2(lambda line: line.rstrip('\r'), f)
	(141 unchanged lines not shown)

llvm/utils/lit/tests/Inputs/shtest-shell/diff-encodings.txt

This file was added.

	# Check that diff falls back to binary mode if it cannot decode a file.

	# RUN: diff -u diff-in.bin diff-in.bin
	# RUN: diff -u diff-in.utf16 diff-in.bin && false || true
	# RUN: diff -u diff-in.utf8 diff-in.bin && false || true
	# RUN: diff -u diff-in.bin diff-in.utf8 && false || true

	# Fail so lit will print output.
	# RUN: false

llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.bin

This binary file was added.

llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf16

This binary file was added.

llvm/utils/lit/tests/Inputs/shtest-shell/diff-in.utf8

This file was added.

	foo
	bar
	baz

llvm/utils/lit/tests/max-failures.py

	 # Check the behavior of --max-failures option.
	 #
	 # RUN: not %{lit} -j 1 -v %{inputs}/max-failures > %t.out
	 # RUN: not %{lit} --max-failures=1 -j 1 -v %{inputs}/max-failures >> %t.out
	 # RUN: not %{lit} --max-failures=2 -j 1 -v %{inputs}/max-failures >> %t.out
	 # RUN: not %{lit} --max-failures=0 -j 1 -v %{inputs}/max-failures 2>> %t.out
	 # RUN: FileCheck < %t.out %s
	 #
	 # END.

	-# CHECK: Failing Tests (27)
	+# CHECK: Failing Tests (28)
	 # CHECK: Failing Tests (1)
	 # CHECK: Failing Tests (2)
	 # CHECK: error: Option '--max-failures' requires positive integer

llvm/utils/lit/tests/shtest-shell.py

	(28 unchanged lines not shown)
	 # CHECK: FAIL: shtest-shell :: colon-error.txt
	 # CHECK: *** TEST 'shtest-shell :: colon-error.txt' FAILED ***
	 # CHECK: $ ":"
	 # CHECK: # command stderr:
	 # CHECK: Unsupported: ':' cannot be part of a pipeline
	 # CHECK: error: command failed with exit status: 127
	 # CHECK: ***

	+# CHECK: FAIL: shtest-shell :: diff-encodings.txt
	+# CHECK: *** TEST 'shtest-shell :: diff-encodings.txt' FAILED ***
	+
	+# CHECK: $ "diff" "-u" "diff-in.bin" "diff-in.bin"
	+# CHECK-NOT: error
	+
	+# CHECK: $ "diff" "-u" "diff-in.utf16" "diff-in.bin"
	+# CHECK: # command output:
	+# CHECK-NEXT: ---
	+# CHECK-NEXT: +++
	+# CHECK-NEXT: @@
	+# CHECK-NEXT: {{^ .f.o.o.$}}
	+# CHECK-NEXT: {{^-.b.a.r.$}}
	+# CHECK-NEXT: {{^\+.b.a.r..}}
	+# CHECK-NEXT: {{^ .b.a.z.$}}
	+# CHECK: error: command failed with exit status: 1
	+# CHECK: $ "true"
	+
	+# CHECK: $ "diff" "-u" "diff-in.utf8" "diff-in.bin"
	+# CHECK: # command output:
	+# CHECK-NEXT: ---
	+# CHECK-NEXT: +++
	+# CHECK-NEXT: @@
	+# CHECK-NEXT: -foo
	+# CHECK-NEXT: -bar
	+# CHECK-NEXT: -baz
	+# CHECK-NEXT: {{^\+.f.o.o.$}}
	+# CHECK-NEXT: {{^\+.b.a.r..}}
	+# CHECK-NEXT: {{^\+.b.a.z.$}}
	+# CHECK: error: command failed with exit status: 1
	+# CHECK: $ "true"
	+
	+# CHECK: $ "diff" "-u" "diff-in.bin" "diff-in.utf8"
	+# CHECK: # command output:
	+# CHECK-NEXT: ---
	+# CHECK-NEXT: +++
	+# CHECK-NEXT: @@
	+# CHECK-NEXT: {{^\-.f.o.o.$}}
	+# CHECK-NEXT: {{^\-.b.a.r..}}
	+# CHECK-NEXT: {{^\-.b.a.z.$}}
	+# CHECK-NEXT: +foo
	+# CHECK-NEXT: +bar
	+# CHECK-NEXT: +baz
	+# CHECK: error: command failed with exit status: 1
	+# CHECK: $ "true"
	+
	+# CHECK: $ "false"
	+
	+# CHECK: ***
	+
	 # CHECK: FAIL: shtest-shell :: diff-error-1.txt
	 # CHECK: *** TEST 'shtest-shell :: diff-error-1.txt' FAILED ***
	 # CHECK: $ "diff" "-B" "temp1.txt" "temp2.txt"
	 # CHECK: # command stderr:
	 # CHECK: Unsupported: 'diff': option -B not recognized
	 # CHECK: error: command failed with exit status: 1
	 # CHECK: ***

	(195 unchanged lines not shown)
	 # CHECK: *** TEST 'shtest-shell :: rm-error-3.txt' FAILED ***
	 # CHECK: Exit Code: 1
	 # CHECK: ***

	 # CHECK: PASS: shtest-shell :: rm-unicode-0.txt
	 # CHECK: PASS: shtest-shell :: sequencing-0.txt
	 # CHECK: XFAIL: shtest-shell :: sequencing-1.txt
	 # CHECK: PASS: shtest-shell :: valid-shell.txt
	-# CHECK: Failing Tests (27)
	+# CHECK: Failing Tests (28)