# [lit] Implement "-r" option for builtin "diff" command + a test using that.ClosedPublicActions

Authored by Dor1s on Jan 5 2018, 9:43 AM.

# Details

Summary

That would allow to recursively compare directories in tests using
"diff -r" on Windows in a similar way as it can be done on Linux or Mac.

# Diff Detail

Build Status
 Buildable 13640 Build 13640: arc lint + arc unit

### Event Timeline

Dor1s created this revision.Jan 5 2018, 9:43 AM
vsk added a comment.Jan 5 2018, 10:44 AM

I haven't looked too closely, but I think this test might fail:

RUN: mkdir -p %t/A %t/B
RUN: touch %t/A/only_in_a %t/B/only_in_b
RUN: not diff -r %t/A %t/B

If there are tests for lit itself, it might be useful to check this one in.

Dor1s added a comment.Jan 5 2018, 10:56 AM
In D41776#968593, @vsk wrote:

I haven't looked too closely, but I think this test might fail:

RUN: mkdir -p %t/A %t/B
RUN: touch %t/A/only_in_a %t/B/only_in_b
RUN: not diff -r %t/A %t/B

If there are tests for lit itself, it might be useful to check this one in.

That's a great point! I should add names for non empty files as well! Also will look for TestRunner tests, thanks Vedant!

utils/lit/lit/TestRunner.py
378

Does os.walk have well-defined ordering? What if contents of dir1 are visited in a different order than dir2?

386

Do we also want to compare file names?

E.g., what if we have

dir1/a  # Contents: 12345
dir2/b  # Contents: 12345

Won't this implementation consider dir1 and dir2 to be the same?

zturner requested changes to this revision.Jan 5 2018, 11:03 AM

Since a user of lit is going to expect diff -r to "just work", then it needs to match the semantics of what the tool diff -r actually does. The patch here will print "something" and return an error code if there are differences, but it won't print the same thing as the actual diff utility when used with -r. I think if we implement this in lit, we need to make the output look the same. For example, a person might try to use this to verify that two directories differ only in one file. (for example, one has an a.txt and another has a b.txt). So they might run diff -r and filecheck the output against this:

Only in d:\foo: a.txt
Only in d:\bar: b.txt

But the code here won't print that. In fact, if the *contents* of a.txt and b.txt are the same but just the names are different, then I think (from inspecting this patch) that the diff would actually succeed, since it just appends file contents to a buffer.

So I think if we want to implement diff -r here, you need to make sure it exactly matches the output of the unix tool for various inputs.

This revision now requires changes to proceed.Jan 5 2018, 11:03 AM
Dor1s updated this revision to Diff 128776.Jan 5 2018, 11:18 AM

Fix a bug pointed out by vsk + add a test for lit.

Dor1s marked an inline comment as done.Jan 5 2018, 11:21 AM

Thanks for the comments, I'll work more on it in order to follow "diff" utility default output.

utils/lit/lit/TestRunner.py
378

I believe so, but either way it looks like this needs more work in order to provide the same output as a regular "diff" utility does.

386

Right, thanks!

Dor1s updated this revision to Diff 128800.Jan 5 2018, 2:28 PM
Dor1s marked an inline comment as done.

Implement the same output as produced by regular diff utility + added tests for that.

Dor1s updated this revision to Diff 128805.Jan 5 2018, 2:35 PM
Dor1s marked 3 inline comments as done.

Use None for empty directories instead of a fake string.

Dor1s added a comment.Jan 5 2018, 2:35 PM

utils/lit/lit/TestRunner.py
378

Resolved now, as I put filepaths and contents into a dict.

utils/lit/tests/Inputs/shtest-shell/diff-r.txt
27

I was hoping to use FileCheck --check-prefix and different prefixes, but for some reason I'm getting weird errors, e.g.:

...
$"not" "diff" "-r" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir1" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/Output/diff-r.txt.tmp/dir2"$ "FileCheck" "--check-prefix=ERROR1" "/usr/local/google/home/mmoroz/Projects/llvm/build/utils/lit/tests/Inputs/shtest-shell/diff-r.txt"
# command stderr:
# ERROR1: Only in %t/dir1: dir1unique
^
<stdin>:1:1: note: scanning from here
^
<stdin>:1:106: note: possible intended match here
^

error: command failed with exit status: 1

I'm fine to proceed with grep, if reviewers don't mind.

utils/lit/tests/Inputs/shtest-shell/diff-r.txt
27

You can't use substitutions like %t in a check prefix. Those are expanded only in run lines. The general solution for this is to wrap that part in a regex. So for example, instead of this:

ERROR1: Only in %t/dir1: dir1unique

You would write this:

ERROR1: Only in {{.*}}dir1: dir1unique

That said, I would really prefer if we could use the check lines instead of grep, since that's the canonical way of writing tests and there are other advantages as well (CHECK-NEXT, etc)

Dor1s updated this revision to Diff 128808.Jan 5 2018, 2:40 PM

Use logical OR for exitCode assignment to avoid overwriting its value from 1 to 0.

Dor1s updated this revision to Diff 128810.Jan 5 2018, 2:49 PM
Dor1s marked 2 inline comments as done.

Use FileCheck with different prefixes for different errors instead of grep.

utils/lit/tests/Inputs/shtest-shell/diff-r.txt
27

That totally works! Thank you, Zachary!

I've tried that before as well, but I put string into double-quotes, so the regex didn't work for me neither :)

Dor1s marked 2 inline comments as done.Jan 5 2018, 2:51 PM
utils/lit/lit/TestRunner.py
383–385

Would it be more efficient to just save off the paths instead of reading the lines up front? You can imagine a case where a folder has a ton of files, but the other folder has none of those files. Reading all those lines up front and making a strong reference to them is going to consume a huge amount of memory. But you might not even need the contents of the file because the file might not exist in the other folder anyway, in which case you can early out on it. It seems like all you need to do is build up a tuple like (dirname, child_dirs, files), where child_dirs is itself a list of tuples of the exact same form. But don't read the contents until you know you have to do the comparison, and throw the contents away before you do the next comparison, to guarantee your'e not loading an entire directory tree's worth of file contents into memory at once.

395

s/compate/compare/

418

s/compate/compare/

424–427

This looks a little strange to me. stdout.write doesn't add a newline does it? Wouldn't this mash everything together on a single line? Any reason not to use print here?

432

s/compate/compare/

Dor1s updated this revision to Diff 128859.Jan 6 2018, 4:25 PM
Dor1s marked 4 inline comments as done.

An optimized version that works, but needs some cleanup + more tests.

Dor1s marked an inline comment as done.Jan 6 2018, 4:27 PM
utils/lit/lit/TestRunner.py
383–385

Agreed. Implemented a solution that does that and follows behavior of GNU diffutils better. I'll do a cleanup a bit later and ask for another round of review :)

424–427

Good catch, print sounds good. Thanks!

Dor1s updated this revision to Diff 128954.Jan 8 2018, 10:55 AM
Dor1s marked 3 inline comments as done.

Did a cleanup + extended the test.

Dor1s added a comment.Jan 8 2018, 11:07 AM

By the way, I'm having some troubles running shtest-shell on Linux. Is there any proper way to run it, or maybe we have a test bot that can do it for me? In the worst case I can try that on Windows, but it would take some extra time...

zturner accepted this revision.Jan 8 2018, 1:48 PM

This looks good to me. One minor suggestion is to make sure this works with Python 3. If you're already confident it does, you don't have to do anything. But we do have at least one bot running Python 3, and i've been hit with Py3 specific errors before and had to fix it while the bot was broken. Either way, up to you. Thanks for working on this!

By the way, I'm having some troubles running shtest-shell on Linux. Is there any proper way to run it, or maybe we have a test bot that can do it for me? In the worst case I can try that on Windows, but it would take some extra time...

Someone else will need to answer this one, as I have really no idea :-/ Sorry

This revision is now accepted and ready to land.Jan 8 2018, 1:48 PM
Dor1s added a comment.Jan 9 2018, 9:23 AM

Thanks for the review and for the heads up. I've spent enough time trying to set up a work environment on Windows, but still can't say that it works fine. I'll land this and see how it goes on the bots.

Dor1s added a comment.Jan 9 2018, 9:39 AM

Thanks for the review and for the heads up. I've spent enough time trying to set up a work environment on Windows, but still can't say that it works fine. I'll land this and see how it goes on the bots.

Well, I figured out some troubles locally, will fix it before landing.

Dor1s updated this revision to Diff 129119.Jan 9 2018, 10:19 AM

Split the tests in a way that they would really work

Dor1s added a comment.Jan 9 2018, 10:19 AM
1. For some reason, it appears to be important to use stdout.write instead of print, so replacing these.
1. Commands implemented in TestRunner.py can't be used as part of a pipeline, this is why there are several diff-error-* tests. I don't see a better way to test negative scenarios rather than split my test into diff-r-error-* and check their output in shtest-shell.py as it's done for existing tests
Dor1s closed this revision.Jan 9 2018, 10:24 AM
Dor1s added a comment.Jan 9 2018, 10:24 AM

Committing, fingers crossed.