This is an archive of the discontinued LLVM Phabricator instance.

[lit] Extend internal diff to support `-` argument
ClosedPublic

Authored by jdenny on Sep 16 2019, 5:12 PM.

Details

Summary

When using lit's internal shell, RUN lines like the following
accidentally execute an external diff instead of lit's internal
diff:

# RUN: program | diff file -

Such cases exist now, in clang/test/Analysis for example. We are
preparing patches to ensure lit's internal diff is called in such
cases, which will then fail because lit's internal diff doesn't
recognize - as a command-line option. This patch adds support for
- to mean stdin.

Diff Detail

Event Timeline

jdenny created this revision.Sep 16 2019, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2019, 5:12 PM
Herald added a subscriber: delcypher. · View Herald Transcript
rnk added inline comments.Sep 17 2019, 11:36 AM
llvm/utils/lit/lit/builtin_commands/diff.py
32–34

This function seems fragile to me. It reads each file three times:

  1. without an encoding (system default?), may fail
  2. with a utf-8 encoding, may fail
  3. inside compareTwo*Files, reads them yet again

I think the best way to be portable between Python 2 & 3 will probably be to work in bytes as much as possible.

I think you can use this pattern to get a new file descriptor for stdin that reads bytes:

fileno = sys.stdin.fileno()
if fileno is not None:
    new_stdin = os.fdopen(os.dup(fileno), 'rb') # read in binary

Here's how I would do it:

  1. read both inputs completely in binary (can't fail)
  2. try decoding the entire file in a few encodings (try default, try utf-8, if that fails, stick with bytes and diff_bytes)
  3. split file on u'\n' or b'\n' as appropriate, perhaps stripping trailing '\r' if present, depending on flags

Does that seem reasonable?

jdenny marked an inline comment as done.Sep 20 2019, 6:56 AM
jdenny added inline comments.
llvm/utils/lit/lit/builtin_commands/diff.py
32–34

I don't know the motivation by the original implementation. Your plan sounds better. Sorry, but I don't have time to implement and test this right now. Hopefully soon. Thanks for the review.

jdenny marked an inline comment as done.Oct 8 2019, 1:45 PM
jdenny added inline comments.
llvm/utils/lit/lit/builtin_commands/diff.py
32–34

I think you can use this pattern to get a new file descriptor for stdin that reads bytes:

fileno = sys.stdin.fileno()
if fileno is not None:

I don't see documentation saying that sys.stdin.fileno might return None. Is that possible?

jdenny updated this revision to Diff 223941.Oct 8 2019, 2:15 PM
jdenny edited the summary of this revision. (Show Details)

Rebased onto D68664, and extended its encoding tests for -.

jdenny marked an inline comment as done.Oct 8 2019, 2:16 PM
rnk accepted this revision.Oct 8 2019, 2:26 PM

lgtm

llvm/utils/lit/lit/builtin_commands/diff.py
32–34

I think I just copied it from stack overflow here: https://stackoverflow.com/questions/32199552/what-is-sys-stdin-fileno-in-python

I guess in this case os.dup would fail, which is a reasonable failure mode.

This revision is now accepted and ready to land.Oct 8 2019, 2:26 PM
jdenny updated this revision to Diff 225336.Oct 16 2019, 5:58 PM

Rebased onto the most recent D66574.

jdenny updated this revision to Diff 225717.Oct 18 2019, 4:42 PM

Rebased onto updated D66574.

When echo appears in a pipeline, lit calls an external echo, which terminates lines with \r\n on Windows. Otherwise, lit uses its internal echo, which terminates lines with \n, just as does lit's out-of-process builtin cat. Because lit's internal diff is sensitive to this difference, after echo foo > %t.foo, do cat %t.foo | diff - %t.foo not echo foo | diff - %t.foo in lit's test suite in this patch.

jdenny marked an inline comment as done.Oct 18 2019, 4:43 PM